An Introduction to Nonlinear Principal Component Analysis (p. 1/33)
Adam Monahan (monahana@uvic.ca)
School of Earth and Ocean Sciences, University of Victoria

Overview (p. 2/33)
- Dimensionality reduction
- Principal Component Analysis
- Nonlinear PCA
  - theory
  - implementation
- Applications of NLPCA
  - Lorenz attractor
  - NH Tropospheric LFV
- Conclusions


Dimensionality Reduction (p. 3/33)
- Climate datasets are made up of time series at individual stations/geographical locations
- A typical dataset has P ~ O(10^3) time series
- Organised structure in atmosphere/ocean flows
  ⇒ time series at different locations are not independent
  ⇒ data do not fill out an isotropic cloud of points in R^P, but cluster around a lower-dimensional surface (reflecting the “attractor”)
- The goal of dimensionality reduction in climate diagnostics is to characterise such structures in climate datasets


Dimensionality Reduction (p. 4/33)
Realising this goal has both theoretical and practical difficulties:
- Theoretical:
  - what is the precise definition of “structure”?
  - how to formulate an appropriate statistical model?
- Practical:
  - many important observational climate datasets are quite short, with O(10)–O(1000) statistical degrees of freedom
  - what degree of “structure” can be robustly diagnosed with existing data?


Principal Component Analysis (p. 5/33)
- A classical approach to dimensionality reduction is principal component analysis (PCA)
- Look for an M-dimensional hyperplane approximation, optimal in the least-squares sense:

      X(t) = Σ_{k=1}^{M} ⟨X(t), e_k⟩ e_k + ε(t)

  minimising E{||ε||^2}
- The inner product is often (not always) the simple dot product
- The vectors e_k are the empirical orthogonal functions (EOFs)


Principal Component Analysis (p. 6/33)
[figure slide]

Principal Component Analysis (p. 7/33)
- Operationally, EOFs are found as eigenvectors of the covariance matrix (in an appropriate norm)
- PCA is an optimally efficient characterisation of Gaussian data
- More generally: PCA provides optimally parsimonious data compression for any dataset whose distribution lies along orthogonal axes
- But what if the underlying low-dimensional structure is curved rather than straight? (cigars vs. bananas)

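The eigen-decomposition described on this slide is easy to illustrate. Below is a minimal NumPy sketch (not part of the original talk; the synthetic data and the function name pca_approximation are my own) of computing EOFs as eigenvectors of the sample covariance matrix and forming the M-term reconstruction from the formula on p. 5:

```python
import numpy as np

def pca_approximation(X, M):
    """Project the data onto the leading M EOFs (eigenvectors of the
    sample covariance matrix) and return the reconstruction, EOFs and PCs.

    X : array of shape (N, P) -- N samples of a P-dimensional field
    M : number of EOFs to retain (M < P)
    """
    Xc = X - X.mean(axis=0)                 # work with anomalies
    C = np.cov(Xc, rowvar=False)            # P x P covariance matrix
    evals, evecs = np.linalg.eigh(C)        # eigh: C is symmetric
    order = np.argsort(evals)[::-1]         # sort by decreasing variance
    eofs = evecs[:, order[:M]]              # leading M EOFs (columns)
    pcs = Xc @ eofs                         # principal components <X, e_k>
    X_hat = pcs @ eofs.T + X.mean(axis=0)   # M-dimensional reconstruction
    return X_hat, eofs, pcs

# Example with synthetic data: 1000 samples of a 10-dimensional field
rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(10), np.diag(np.arange(10, 0, -1)), size=1000)
X_hat, eofs, pcs = pca_approximation(X, M=2)
```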

Nonlinear Low-Dimensional Structure (p. 8/33)
[figure slide]

Nonlinear PCA (p. 9/33)
- An approach to diagnosing nonlinear low-dimensional structure is Nonlinear PCA (NLPCA)
- Goal: find functions (with M < P)

      s_f : R^P → R^M,   f : R^M → R^P

  such that

      X(t) = (f ∘ s_f)(X(t)) + ε(t)

  where
  - E{||ε||^2} is minimised
  - f(λ) ~ approximation manifold
  - λ(t) = s_f(X(t)) ~ manifold parameterisation (time series)


Nonlinear PCA (p. 10/33)
[Schematic of the NLPCA mappings: the encoding s_f takes X(t) to λ(t) = s_f(X(t)), and the decoding f returns the approximation X̂(t) = (f ∘ s_f)(X(t)). From Monahan, Fyfe, and Pandolfo (2003).]

Nonlinear PCA (p. 11/33)
- As with PCA, “fraction of variance explained” is a measure of the quality of the approximation
- PCA is a special case of NLPCA
- When implemented, NLPCA should reduce to PCA if:
  - data is Gaussian
  - not enough data is available to robustly characterise non-Gaussian structure

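Since “fraction of variance explained” is the quality measure used for both PCA and NLPCA approximations here, a small helper (my own naming, assuming the conventional residual-variance form of the ratio) could be:

```python
import numpy as np

def fraction_of_variance_explained(X, X_hat):
    """FVE = 1 - E{||X - X_hat||^2} / E{||X - mean(X)||^2}.

    X, X_hat : arrays of shape (N, P); FVE equals 1 for a perfect fit
    and approaches 0 for an approximation no better than the mean.
    """
    resid = np.sum((X - X_hat) ** 2)
    total = np.sum((X - X.mean(axis=0)) ** 2)
    return 1.0 - resid / total
```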

NLPCA: Implementation (p. 12/33)
- Implemented NLPCA using neural networks (convenient, not necessary)
- Parameter estimation more difficult than for PCA
- PCA model is linear in statistical parameters:

      Y = MX

  so variational problem has unique analytic solution
- NLPCA model nonlinear in model parameters, so solution
  - may not be unique
  - must be found through numerical minimisation

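The talk states that NLPCA was implemented with neural networks but gives no code. One common realisation of the mappings s_f and f is a bottleneck autoencoder with one hidden layer on each side; the sketch below is written under that assumption (layer sizes, tanh activations and function names are mine, not the author's implementation):

```python
import numpy as np

def init_params(P, H, M, rng):
    """Random weights for encoder s_f: R^P -> R^M and decoder f: R^M -> R^P,
    each with one hidden layer of H tanh units (bottleneck-autoencoder form)."""
    s = 0.1
    return {
        "W1": s * rng.standard_normal((P, H)), "b1": np.zeros(H),
        "W2": s * rng.standard_normal((H, M)), "b2": np.zeros(M),
        "W3": s * rng.standard_normal((M, H)), "b3": np.zeros(H),
        "W4": s * rng.standard_normal((H, P)), "b4": np.zeros(P),
    }

def s_f(X, p):
    """Encoder: nonlinear projection onto the M-dimensional manifold coordinate."""
    return np.tanh(X @ p["W1"] + p["b1"]) @ p["W2"] + p["b2"]

def f(lam, p):
    """Decoder: map manifold coordinates back into R^P."""
    return np.tanh(lam @ p["W3"] + p["b3"]) @ p["W4"] + p["b4"]

def reconstruction_error(X, p):
    """E{||X - (f o s_f)(X)||^2}, the quantity NLPCA minimises."""
    X_hat = f(s_f(X, p), p)
    return np.mean(np.sum((X - X_hat) ** 2, axis=1))
```

The weights would then be found by numerical minimisation of reconstruction_error, which is exactly the nonlinear-in-parameters estimation problem the slide contrasts with the analytic PCA solution.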

NLPCA: Parameter Estimation (p. 13/33)
Two fundamental issues regarding parameter estimation are common to all statistical models:
- Reproducibility:
  - model must be robust to the introduction of new data
  - new observations shouldn’t fundamentally change the model
- Classifiability:
  - model must be robust to details of the optimisation procedure
  - model shouldn’t depend on initial parameter values


NLPCA: Synthetic Gaussian Data (p. 14/33)
Synthetic Gaussian data [figure]

Applications of NLPCA: Lorenz Attractor (p. 15/33)
Scatterplots [figure]

Applications of NLPCA: Lorenz Attractor (p. 16/33)
1D PCA approximation (60%) [figure]

Applications of NLPCA: Lorenz Attractor (p. 17/33)
1D NLPCA approximation (76%) [figure]

Applications of NLPCA: Lorenz Attractor (p. 18/33)
2D PCA approximation (94%) [figure]

Applications of NLPCA: Lorenz Attractor (p. 19/33)
2D NLPCA approximation (97%) [figure]
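
The variance fractions above are quoted from the talk. As an illustration only, a trajectory of the Lorenz (1963) system to which PCA or NLPCA could be applied might be generated as follows (standard parameter values assumed; the step size, trajectory length and initial condition are arbitrary choices of mine):

```python
import numpy as np

def lorenz_trajectory(n_steps=20000, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz (1963) system with a simple fourth-order
    Runge-Kutta scheme and return the trajectory as an (n_steps, 3) array."""
    def rhs(v):
        x, y, z = v
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    traj = np.empty((n_steps, 3))
    v = np.array([1.0, 1.0, 1.0])
    for i in range(n_steps):
        k1 = rhs(v)
        k2 = rhs(v + 0.5 * dt * k1)
        k3 = rhs(v + 0.5 * dt * k2)
        k4 = rhs(v + dt * k3)
        v = v + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        traj[i] = v
    return traj

X = lorenz_trajectory()   # data matrix on which 1D/2D PCA or NLPCA fits can be compared
```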

Applications of NLPCA: NH Tropospheric LFV (p. 20/33)
EOF1 and EOF2: 10-day lowpass-filtered 500 hPa geopotential height EOFs [figures]

Applications of NLPCA: NH Tropospheric LFV (p. 21/33)
1D NLPCA Approximation: spatial structure (PCA: 14.8%; NLPCA: 18.4%) [figure]

Applications of NLPCA: NH Tropospheric LFV (p. 22/33)
1D NLPCA Approximation: pdf of time series [figure]

Applications of NLPCA: NH Tropospheric LFV (p. 23/33)
1D NLPCA Approximation: regime maps [figure]

Applications of NLPCA: NH Tropospheric LFV (p. 24/33)
1D NLPCA Approximation: interannual variability [figure]

NLPCA: Limitations and Drawbacks (p. 25/33)
- Parameter estimation in NLPCA (as in any nonlinear statistical model) must be done very carefully to ensure robust approximation
  ⇒ analysis time-consuming, data hungry
  ⇒ insufficiently careful analysis leads to spurious results (e.g. Christiansen, 2005)
- Theoretical underpinning of NLPCA is weak
  ⇒ no “rigorous” theory of sampling variability
- Information theory may provide new tools with
  - better sampling properties
  - better theoretical basis


Conclusions (p. 26/33)
- Traditional PCA optimal for dimensionality reduction only if data distribution falls along orthogonal axes
- Can define nonlinear generalisation, NLPCA, which can robustly characterise nonlinear low-dimensional structure in datasets
- NLPCA approximations can provide a fundamentally different characterisation of data than PCA approximations
- Implementation of NLPCA difficult and lacking in underlying theory; represents a first attempt at a big (and challenging) problem


Acknowledgements (p. 27/33)
- William Hsieh (UBC)
- Lionel Pandolfo (UBC)
- John Fyfe (CCCma)
- Qiaobin Teng (CCCma)
- Benyang Tang (JPL)

Parameter Estimation in NLPCA (p. 28/33)
- An ensemble approach was taken
- For a large number N (~50) of trials:
  - data was randomly split into training and validation sets (taking autocorrelation into account)
  - a random initial parameter set was selected
- For each ensemble member, iterative minimisation procedure carried out until either:
  - error over training data stopped changing
  - error over validation data started increasing
- Method does not look for global error minimum

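The ensemble procedure above can be written schematically as follows. This is a sketch under my own assumptions: fit_step performs one iteration of the numerical minimisation and error evaluates the mean squared reconstruction error, neither of which is specified in the talk; the block length, split fraction and stopping tolerances are illustrative defaults.

```python
import numpy as np

def train_ensemble(X, init_params, fit_step, error, n_trials=50,
                   block_len=30, max_iter=5000, patience=20, seed=0):
    """Ensemble of NLPCA fits: each trial uses a random train/validation
    split (contiguous blocks, to respect autocorrelation) and random
    initial parameters, and stops when the training error stops changing
    or the validation error starts increasing (early stopping)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    blocks = [np.arange(i, min(i + block_len, n)) for i in range(0, n, block_len)]
    members = []
    for _ in range(n_trials):
        order = rng.permutation(len(blocks))            # random block split
        split = int(0.8 * len(blocks))
        train = np.concatenate([blocks[i] for i in order[:split]])
        valid = np.concatenate([blocks[i] for i in order[split:]])
        params = init_params(rng)                       # random initial parameters
        best_val, stale, prev_train = np.inf, 0, np.inf
        for _ in range(max_iter):
            params = fit_step(params, X[train])
            e_train, e_val = error(params, X[train]), error(params, X[valid])
            if e_val < best_val:
                best_val, stale = e_val, 0
            else:
                stale += 1                              # validation error increasing
            if stale > patience or abs(prev_train - e_train) < 1e-8:
                break                                   # stop this trial
            prev_train = e_train
        members.append({"params": params, "train": e_train, "valid": e_val})
    return members
```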

Parameter Estimation in NLPCA (p. 29/33)
- Ensemble member becomes candidate model if

      ⟨||ε||^2⟩_validation ≤ ⟨||ε||^2⟩_training

- Candidate models compared:
  - if they share same shape and orientation ⇒ approximation is robust
  - if they differ in shape and orientation ⇒ approximation is not robust
- If approximation not robust, model simplified & procedure repeated until robust model found

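Continuing the sketch above, the candidate-model filter on this slide is a one-line test (my own code; the comparison of candidate shapes and orientations is a qualitative step in the talk and is not automated here):

```python
def candidate_models(members):
    """Keep ensemble members whose validation error does not exceed their
    training error: <||eps||^2>_validation <= <||eps||^2>_training."""
    return [m for m in members if m["valid"] <= m["train"]]
```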

Parameter Estimation in NLPCA (p. 30/33)
- Procedure will ultimately yield the PCA solution if no robust non-Gaussian structure is present
- Such a careful procedure is necessary to avoid finding spurious non-Gaussian structure


Applications of NLPCA: Tropical Pacific SST (p. 31/33)
EOF Patterns [figure]

Applications of NLPCA: Tropical Pacific SST (p. 32/33)
1D NLPCA Approximation: spatial structure [figure]

Applications of NLPCA: Tropical Pacific SST (p. 33/33)
1D NLPCA Approximation: spatial structure [figure]

