An Introduction to Nonlinear Principal Component Analysis

Adam Monahan (monahana@uvic.ca)
School of Earth and Ocean Sciences, University of Victoria

Overview

Dimensionality reduction
Principal Component Analysis
Nonlinear PCA
  theory
  implementation
Applications of NLPCA
  Lorenz attractor
  NH Tropospheric LFV
Conclusions


Dimensionality Reduction

Climate datasets made up of time series at individual stations/geographical locations
Typical dataset has P ∼ O(10³) time series
Organised structure in atmosphere/ocean flows
  ⇒ time series at different locations not independent
  ⇒ data does not fill out isotropic cloud of points in R^P, but clusters around lower-dimensional surface (reflecting the "attractor")
Goal of dimensionality reduction in climate diagnostics is to characterise such structures in climate datasets
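
A minimal Python sketch of the kind of dataset meant here (sizes and names are arbitrary illustrative choices): P time series all driven, nonlinearly, by a single hidden signal, so the points cluster around a curved one-dimensional structure in R^P rather than filling an isotropic cloud.

```python
import numpy as np

rng = np.random.default_rng(0)

P, N = 10, 2000                     # number of "stations" and of time samples
t = rng.uniform(-1.0, 1.0, N)       # hidden one-dimensional coordinate

# Each "station" responds nonlinearly to the same underlying signal,
# so the N points cluster around a curved 1-D structure in R^P.
weights = rng.standard_normal((2, P))
X = np.outer(t, weights[0]) + np.outer(t**2, weights[1])
X += 0.1 * rng.standard_normal((N, P))   # small isotropic noise

print(X.shape)   # (2000, 10): far from an isotropic cloud in R^10
```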


Dimensionality Reduction

Realising this goal has both theoretical and practical difficulties:
Theoretical:
  what is the precise definition of "structure"?
  how to formulate an appropriate statistical model?
Practical:
  many important observational climate datasets quite short, with O(10)−O(1000) statistical degrees of freedom
  what degree of "structure" can be robustly diagnosed with existing data?


Principal Component Analysis

A classical approach to dimensionality reduction: principal component analysis (PCA)
Look for M-dimensional hyperplane approximation, optimal in least-squares sense

  X(t) = Σ_{k=1}^{M} ⟨X(t), e_k⟩ e_k + ε(t)

minimising E{||ε||²}
  inner product often (not always) simple dot product
Vectors e_k are the empirical orthogonal functions (EOFs)


Principal Component Analysis

[figure slide]

Principal Component Analysis

Operationally, EOFs are found as eigenvectors of covariance matrix (in appropriate norm)
PCA optimally efficient characterisation of Gaussian data
More generally: PCA provides optimally parsimonious data compression for any dataset whose distribution lies along orthogonal axes
But what if the underlying low-dimensional structure is curved rather than straight? (cigars vs. bananas)
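
A minimal NumPy sketch of this recipe, assuming anomalies and the ordinary Euclidean inner product: EOFs as eigenvectors of the covariance matrix, principal components as the projections ⟨X, e_k⟩, and the rank-M hyperplane reconstruction from the formula above.

```python
import numpy as np

def pca_approximation(X, M):
    """X: (N, P) data matrix (rows = times); returns EOFs, PCs, rank-M reconstruction."""
    Xa = X - X.mean(axis=0)                  # anomalies
    C = np.cov(Xa, rowvar=False)             # (P, P) covariance matrix
    evals, evecs = np.linalg.eigh(C)         # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]          # sort descending
    eofs = evecs[:, order[:M]]               # leading M EOFs (columns)
    pcs = Xa @ eofs                          # principal components <X, e_k>
    Xhat = pcs @ eofs.T + X.mean(axis=0)     # M-dimensional hyperplane approximation
    frac_var = evals[order[:M]].sum() / evals.sum()   # fraction of variance explained
    return eofs, pcs, Xhat, frac_var
```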


Nonlinear Low-Dimensional Structure

[figure slide]

Nonlinear PCA

An approach to diagnosing nonlinear low-dimensional structure is Nonlinear PCA (NLPCA)
Goal: find functions (with M < P)

  s_f : R^P → R^M,   f : R^M → R^P

such that

  X(t) = (f ∘ s_f)(X(t)) + ε(t)

where
  E{||ε||²} is minimised
  f(λ) ∼ approximation manifold
  λ(t) = s_f(X(t)) ∼ manifold parameterisation (time series)


Nonlinear PCA

[Figure: schematic of the NLPCA mappings, with λ(t) = s_f(X(t)) and X̂(t) = (f ∘ s_f)(X(t)). From Monahan, Fyfe, and Pandolfo (2003)]

Nonlinear PCA

As with PCA, "fraction of variance explained" is a measure of quality of approximation
PCA is a special case of NLPCA
When implemented, NLPCA should reduce to PCA if:
  data is Gaussian
  not enough data is available to robustly characterise non-Gaussian structure
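
A small sketch of this measure for any approximation X̂(t), linear or nonlinear, taken here as one minus the ratio of mean squared error to total variance about the mean (the helper name is illustrative).

```python
import numpy as np

def fraction_of_variance_explained(X, Xhat):
    """1 - E{||eps||^2} / E{||X - mean(X)||^2} for an approximation Xhat of X."""
    Xa = X - X.mean(axis=0)
    eps = X - Xhat
    return 1.0 - np.mean(np.sum(eps**2, axis=1)) / np.mean(np.sum(Xa**2, axis=1))
```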


NLPCA: Implementation

Implemented NLPCA using neural networks (convenient, not necessary)
Parameter estimation more difficult than for PCA
PCA model is linear in statistical parameters:

  Y = MX

so variational problem has unique analytic solution
NLPCA model nonlinear in model parameters, so solution
  may not be unique
  must be found through numerical minimisation
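
One simple way to realise this, sketched below under stated assumptions (a single-hidden-layer autoencoder with an M-dimensional bottleneck; the layer sizes and the helper names make_nlpca / nlpca_fit are illustrative, not the exact configuration used here): all weights are found by numerical minimisation of E{||ε||²} from a random starting point.

```python
import numpy as np
from scipy.optimize import minimize

def make_nlpca(P, M=1, H=4):
    """Encoder s_f: R^P -> R^M and decoder f: R^M -> R^P with tanh hidden layers."""
    sizes = [(H, P), (H,), (M, H), (M,),      # encoder weights/biases
             (H, M), (H,), (P, H), (P,)]      # decoder weights/biases
    splits = np.cumsum([np.prod(s) for s in sizes])[:-1]

    def unpack(theta):
        return [a.reshape(s) for a, s in zip(np.split(theta, splits), sizes)]

    def model(theta, X):
        W1, b1, W2, b2, W3, b3, W4, b4 = unpack(theta)
        lam = np.tanh(X @ W1.T + b1) @ W2.T + b2      # lambda(t) = s_f(X(t))
        Xhat = np.tanh(lam @ W3.T + b3) @ W4.T + b4   # f(lambda(t))
        return lam, Xhat

    def cost(theta, X):
        _, Xhat = model(theta, X)
        return np.mean(np.sum((X - Xhat) ** 2, axis=1))   # E{||eps||^2}

    n_par = int(splits[-1] + np.prod(sizes[-1]))
    return model, cost, n_par

def nlpca_fit(X, M=1, H=4, seed=0):
    model, cost, n_par = make_nlpca(X.shape[1], M, H)
    rng = np.random.default_rng(seed)
    theta0 = 0.1 * rng.standard_normal(n_par)                   # random initial parameters
    res = minimize(cost, theta0, args=(X,), method="L-BFGS-B")  # numerical minimisation
    return model, res.x, res.fun
```

Because the cost is nonlinear in the parameters, different starting points can converge to different minima; hence the ensemble of restarts described in the parameter-estimation slides below.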


NLPCA: Parameter Estimation

Two fundamental issues regarding parameter estimation common to all statistical models:
Reproducibility:
  model must be robust to the introduction of new data
  new observations shouldn't fundamentally change model
Classifiability:
  model must be robust to details of optimisation procedure
  model shouldn't depend on initial parameter values


NLPCA: Synthetic Gaussian Data

[Figure: synthetic Gaussian data]

Applications of NLPCA: Lorenz Attractor

[Figures: scatterplots of the Lorenz attractor and its low-dimensional approximations]
  1D PCA approximation (60%)
  1D NLPCA approximation (76%)
  2D PCA approximation (94%)
  2D NLPCA approximation (97%)
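
For context, a sketch (with illustrative integration settings) of generating Lorenz-63 data and computing the variance explained by its 1D PCA approximation; the percentages quoted above come from the original analysis, not from this snippet.

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, xyz, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = xyz
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# integrate the Lorenz-63 system and discard an initial transient
sol = solve_ivp(lorenz, (0.0, 100.0), [1.0, 1.0, 1.0],
                t_eval=np.linspace(0.0, 100.0, 10000))
X = sol.y.T[1000:]                               # (N, 3) trajectory

# fraction of variance explained by the 1D PCA approximation
Xa = X - X.mean(axis=0)
evals = np.linalg.eigvalsh(np.cov(Xa, rowvar=False))
print(f"1D PCA fraction of variance: {evals.max() / evals.sum():.2f}")
```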

Applications of NLPCA: NH Tropospheric LFV

[Figures: EOF1 and EOF2 of 10-day lowpass-filtered 500 hPa geopotential height]
[Figure: 1D NLPCA approximation: spatial structure (PCA: 14.8%; NLPCA: 18.4%)]
[Figure: 1D NLPCA approximation: pdf of time series]
[Figure: 1D NLPCA approximation: regime maps]
[Figure: 1D NLPCA approximation: interannual variability]

NLPCA: Limitations and Drawbacks

Parameter estimation in NLPCA (as in any nonlinear statistical model) must be done very carefully to ensure robust approximation
  ⇒ analysis time-consuming, data hungry
  ⇒ insufficiently careful analysis leads to spurious results (e.g. Christiansen, 2005)
Theoretical underpinning of NLPCA is weak
  ⇒ no "rigorous" theory of sampling variability
Information theory may provide new tools with
  better sampling properties
  better theoretical basis


Conclusions

Traditional PCA optimal for dimensionality reduction only if data distribution falls along orthogonal axes
Can define nonlinear generalisation, NLPCA, which can robustly characterise nonlinear low-dimensional structure in datasets
NLPCA approximations can provide a fundamentally different characterisation of data than PCA approximations
Implementation of NLPCA difficult and lacking in underlying theory; represents a first attempt at a big (and challenging) problem


Acknowledgements

William Hsieh (UBC)
Lionel Pandolfo (UBC)
John Fyfe (CCCma)
Qiaobin Teng (CCCma)
Benyang Tang (JPL)

Parameter Estimation in NLPCA

An ensemble approach was taken
For a large number N (∼ 50) of trials:
  data was randomly split into training and validation sets (taking autocorrelation into account)
  a random initial parameter set was selected
For each ensemble member, iterative minimisation procedure carried out until either:
  error over training data stopped changing
  error over validation data started increasing
Method does not look for global error minimum
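
A sketch of one ensemble member under these rules, assuming a cost(theta, X) function and parameter count such as those in the earlier NLPCA sketch; a contiguous validation block stands in (crudely) for the autocorrelation-aware split, and short warm-started optimiser segments stand in for the iterative minimisation.

```python
import numpy as np
from scipy.optimize import minimize

def fit_ensemble_member(cost, n_par, X, seed, block=0.3, max_rounds=200):
    """One trial: random block train/validation split, random initial parameters,
    stop when training error stops changing or validation error starts rising."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    start = rng.integers(0, N - int(block * N))          # contiguous validation block
    val_idx = np.arange(start, start + int(block * N))   # crude nod to autocorrelation
    trn_idx = np.setdiff1d(np.arange(N), val_idx)
    X_trn, X_val = X[trn_idx], X[val_idx]

    theta = 0.1 * rng.standard_normal(n_par)             # random initial parameter set
    prev_trn, prev_val = np.inf, np.inf
    for _ in range(max_rounds):
        res = minimize(cost, theta, args=(X_trn,),       # a few optimiser steps at a time
                       method="L-BFGS-B", options={"maxiter": 10})
        theta = res.x
        trn_err, val_err = cost(theta, X_trn), cost(theta, X_val)
        if val_err > prev_val or abs(prev_trn - trn_err) < 1e-8:
            break                                        # early stopping
        prev_trn, prev_val = trn_err, val_err
    return theta, trn_err, val_err
```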


Parameter Estimation in NLPCA

Ensemble member becomes candidate model if

  ⟨||ε||²⟩_validation ≤ ⟨||ε||²⟩_training

Candidate models compared:
  if they share same shape and orientation ⇒ approximation is robust
  if they differ in shape and orientation ⇒ approximation is not robust
If approximation not robust, model simplified & procedure repeated until robust model found
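
A sketch of the candidate screening, assuming each ensemble member is summarised as a (parameters, training error, validation error) tuple as in the previous sketch; correlating candidate reconstructions is a crude stand-in for comparing shape and orientation, and the threshold is an arbitrary illustrative choice.

```python
import numpy as np

def select_candidates(members):
    """Keep members whose validation error does not exceed their training error.
    members: list of (theta, train_err, val_err) tuples."""
    return [m for m in members if m[2] <= m[1]]

def candidates_agree(reconstructions, min_corr=0.9):
    """Crude robustness check: do all candidate approximations look alike?
    reconstructions: list of equally shaped (N, P) arrays X_hat, one per candidate."""
    flat = [r.ravel() for r in reconstructions]
    corr = np.corrcoef(flat)            # pairwise correlations between candidates
    return corr.min() >= min_corr
```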


Parameter Estimation in NLPCA

Procedure will ultimately yield the PCA solution if no robust non-Gaussian structure is present
Such a careful procedure is necessary to avoid finding spurious non-Gaussian structure


Applications of NLPCA: Tropical Pacific SST

[Figure: EOF patterns]
[Figures: 1D NLPCA approximation: spatial structure]