An Introduction to Nonlinear Principal Component Analysis

Adam Monahan (monahana@uvic.ca)
School of Earth and Ocean Sciences, University of Victoria
Overview

- Dimensionality reduction
- Principal Component Analysis
- Nonlinear PCA: theory, implementation
- Applications of NLPCA: Lorenz attractor, NH Tropospheric LFV
- Conclusions
Dimensionality Reduction

- Climate datasets are made up of time series at individual stations/geographical locations
- A typical dataset has P ~ O(10^3) time series
- Organised structure in atmosphere/ocean flows
  ⇒ time series at different locations are not independent
  ⇒ data do not fill out an isotropic cloud of points in R^P, but cluster around a lower-dimensional surface (reflecting the "attractor")
- The goal of dimensionality reduction in climate diagnostics is to characterise such structures in climate datasets
Dimensionality Reduction

Realising this goal has both theoretical and practical difficulties:

- Theoretical:
  - what is the precise definition of "structure"?
  - how to formulate an appropriate statistical model?
- Practical:
  - many important observational climate datasets are quite short, with O(10)-O(1000) statistical degrees of freedom
  - what degree of "structure" can be robustly diagnosed with existing data?
Principal Component Analysis

A classical approach to dimensionality reduction is principal component analysis (PCA). Look for the M-dimensional hyperplane approximation, optimal in the least-squares sense,

    X(t) = Σ_{k=1}^{M} ⟨X(t), e_k⟩ e_k + ε(t)

- minimising E{||ε||²}
- the inner product is often (but not always) the simple dot product

The vectors e_k are the empirical orthogonal functions (EOFs).
- Operationally, the EOFs are found as the eigenvectors of the covariance matrix (in an appropriate norm); a minimal sketch follows this list
- PCA is the optimally efficient characterisation of Gaussian data
- More generally: PCA provides optimally parsimonious data compression for any dataset whose distribution lies along orthogonal axes
- But what if the underlying low-dimensional structure is curved rather than straight? (cigars vs. bananas)
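The eigenvector construction above can be made concrete. Below is a minimal sketch on synthetic data; the dataset, dimensions, and variable names are illustrative assumptions, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a climate dataset: N times, P locations.
N, P, M = 1000, 20, 3
X = rng.standard_normal((N, P)) @ rng.standard_normal((P, P))
X = X - X.mean(axis=0)                 # remove the time mean

C = (X.T @ X) / (N - 1)                # sample covariance matrix
evals, evecs = np.linalg.eigh(C)       # eigh: C is symmetric
order = np.argsort(evals)[::-1]        # sort by descending variance
evals, E = evals[order], evecs[:, order]

pcs = X @ E[:, :M]                     # principal components <X(t), e_k>
X_hat = pcs @ E[:, :M].T               # M-dimensional hyperplane approximation
frac = evals[:M].sum() / evals.sum()
print(f"Leading {M} EOFs explain {100 * frac:.1f}% of the variance")
```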
Nonlinear Low-Dimensional Structure

[figure]
Nonlinear PCA

An approach to diagnosing nonlinear low-dimensional structure is Nonlinear PCA (NLPCA).

Goal: find functions (with M < P)

    s_f : R^P → R^M,   f : R^M → R^P

such that

    X(t) = (f ∘ s_f)(X(t)) + ε(t)

where

- E{||ε||²} is minimised
- f(λ) ~ the approximation manifold
- λ(t) = s_f(X(t)) ~ the manifold parameterisation (a time series)
Nonlinear PCA

[Figure: schematic of the NLPCA mappings s_f and f, with λ(t) = s_f(X(t)) and X̂(t) = (f ∘ s_f)(X(t)). From Monahan, Fyfe, and Pandolfo (2003).]
Nonlinear PCA

- As with PCA, the "fraction of variance explained" is a measure of the quality of the approximation
- PCA is a special case of NLPCA
- When implemented, NLPCA should reduce to PCA if:
  - the data are Gaussian, or
  - not enough data are available to robustly characterise non-Gaussian structure
NLPCA: Implementation

- Implemented NLPCA using neural networks (convenient, but not necessary); a minimal sketch follows this list
- Parameter estimation is more difficult than for PCA
- The PCA model is linear in its statistical parameters,

      Y = MX,

  so the variational problem has a unique analytic solution
- The NLPCA model is nonlinear in its parameters, so the solution:
  - may not be unique
  - must be found through numerical minimisation
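One common neural-network realisation of NLPCA is a Kramer-style autoassociative network: an encoder (playing the role of s_f) feeding a narrow bottleneck of M units, and a decoder (playing the role of f) reconstructing the input. The sketch below is a minimal illustration, assuming scikit-learn's MLPRegressor as a stand-in; the library, network sizes, and synthetic "banana" data are assumptions, not the talk's implementation:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Synthetic curved ("banana") data in the plane, for illustration only.
t = rng.uniform(-1.0, 1.0, 500)
X = np.column_stack([t, t**2]) + 0.05 * rng.standard_normal((500, 2))
X = X - X.mean(axis=0)

# hidden(tanh) -> bottleneck (M = 1) -> hidden(tanh) -> linear output.
# Encoder layers play the role of s_f, decoder layers the role of f.
net = MLPRegressor(hidden_layer_sizes=(8, 1, 8), activation="tanh",
                   solver="adam", max_iter=5000, random_state=0)
net.fit(X, X)                       # autoassociative: target = input

X_hat = net.predict(X)              # (f o s_f)(X): points on the fitted curve
resid = X - X_hat
fve = 1.0 - resid.var(axis=0).sum() / X.var(axis=0).sum()
print(f"Fraction of variance explained: {fve:.2f}")
```

The bottleneck width M sets the dimension of the approximation manifold; with a linear activation throughout, this construction collapses back to PCA.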
NLPCA: Parameter Estimation

Two fundamental issues regarding parameter estimation are common to all statistical models:

- Reproducibility:
  - the model must be robust to the introduction of new data
  - new observations shouldn't fundamentally change the model
- Classifiability:
  - the model must be robust to the details of the optimisation procedure
  - the model shouldn't depend on the initial parameter values
NLPCA: Synthetic Gaussian Data

[Figure: NLPCA applied to synthetic Gaussian data.]
Applications of NLPCA: Lorenz Attractor

[Figures: scatterplots of the attractor; 1D PCA approximation (60% of variance); 1D NLPCA approximation (76%); 2D PCA approximation (94%); 2D NLPCA approximation (97%).]
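Comparable test data are easy to generate. The sketch below integrates the Lorenz system with the standard parameters (σ = 10, ρ = 28, β = 8/3) and computes the 1D PCA approximation of the attractor; the talk does not state its integration details, so the exact variance fraction will differ somewhat from the slide values:

```python
import numpy as np
from scipy.integrate import solve_ivp

def lorenz(t, xyz, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = xyz
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

# Integrate long enough to sample the attractor; discard the transient.
sol = solve_ivp(lorenz, (0.0, 200.0), [1.0, 1.0, 1.0],
                t_eval=np.linspace(20.0, 200.0, 20000))
X = sol.y.T - sol.y.T.mean(axis=0)

# 1D PCA approximation: leading eigenvector of the covariance matrix.
evals, evecs = np.linalg.eigh(X.T @ X / (len(X) - 1))
e1 = evecs[:, np.argmax(evals)]
X1 = np.outer(X @ e1, e1)           # projection onto the best-fit line
frac = evals.max() / evals.sum()
print(f"1D PCA approximation explains {100 * frac:.0f}% of the variance")
```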
Applications of NLPCA: NH Tropospheric LFV (low-frequency variability)

[Figures: EOF1 and EOF2 of 10-day lowpass-filtered 500 hPa geopotential height; spatial structure of the 1D NLPCA approximation (variance explained: PCA 14.8%, NLPCA 18.4%); pdf of the 1D NLPCA time series; regime maps; interannual variability.]
NLPCA: Limitations and Drawbacks

- Parameter estimation in NLPCA (as in any nonlinear statistical model) must be done very carefully to ensure a robust approximation
  ⇒ the analysis is time-consuming and data hungry
  ⇒ insufficiently careful analysis leads to spurious results (e.g. Christiansen, 2005)
- The theoretical underpinning of NLPCA is weak
  ⇒ no "rigorous" theory of sampling variability
- Information theory may provide new tools with:
  - better sampling properties
  - a better theoretical basis
Conclusions

- Traditional PCA is optimal for dimensionality reduction only if the data distribution falls along orthogonal axes
- A nonlinear generalisation, NLPCA, can be defined that can robustly characterise nonlinear low-dimensional structure in datasets
- NLPCA approximations can provide a fundamentally different characterisation of the data than PCA approximations
- Implementation of NLPCA is difficult and lacking in underlying theory; it represents a first attempt at a big (and challenging) problem
Acknowledgements

William Hsieh (UBC), Lionel Pandolfo (UBC), John Fyfe (CCCma), Qiaobin Teng (CCCma), Benyang Tang (JPL)
Parameter Estimation in NLPCA

An ensemble approach was taken (sketched in code after this list):

- For a large number N (~50) of trials:
  - the data were randomly split into training and validation sets (taking autocorrelation into account)
  - a random initial parameter set was selected
- For each ensemble member, the iterative minimisation procedure was carried out until either:
  - the error over the training data stopped changing, or
  - the error over the validation data started increasing
- The method does not look for the global error minimum
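A rough sketch of this ensemble procedure, reusing the autoassociative network from the implementation section. Everything here is an assumption for illustration: block_split is a hypothetical helper standing in for an autocorrelation-aware train/validation split, and MLPRegressor's built-in early stopping stands in for the talk's two stopping criteria:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def block_split(X, n_blocks=10, val_frac=0.3, rng=None):
    """Assign contiguous blocks to training/validation sets: a crude way
    to respect autocorrelation (an assumption, not the talk's method)."""
    rng = rng or np.random.default_rng()
    blocks = np.array_split(np.arange(len(X)), n_blocks)
    val_ids = rng.choice(n_blocks, int(val_frac * n_blocks), replace=False)
    val = np.concatenate([blocks[i] for i in val_ids])
    train = np.setdiff1d(np.arange(len(X)), val)
    return X[train], X[val]

def fit_ensemble(X, n_trials=50):
    candidates = []
    for trial in range(n_trials):
        rng = np.random.default_rng(trial)
        X_tr, X_va = block_split(X, rng=rng)
        net = MLPRegressor(hidden_layer_sizes=(8, 1, 8), activation="tanh",
                           early_stopping=True, max_iter=5000,
                           random_state=trial)  # random initial parameters
        net.fit(X_tr, X_tr)
        err_tr = np.mean((X_tr - net.predict(X_tr)) ** 2)
        err_va = np.mean((X_va - net.predict(X_va)) ** 2)
        if err_va <= err_tr:   # candidate-model criterion (next slide)
            candidates.append(net)
    return candidates
```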
Parameter Estimation in NLPCA

An ensemble member becomes a candidate model if

    ⟨||ε||²⟩_validation ≤ ⟨||ε||²⟩_training

The candidate models are then compared:

- if they share the same shape and orientation ⇒ the approximation is robust
- if they differ in shape and orientation ⇒ the approximation is not robust

If the approximation is not robust, the model is simplified and the procedure repeated until a robust model is found.
Parameter Estimation in NLPCA

- The procedure will ultimately yield the PCA solution if no robust non-Gaussian structure is present
- Such a careful procedure is necessary to avoid finding spurious non-Gaussian structure
Applications of NLPCA: Tropical Pacific SST

[Figures: EOF patterns; spatial structure of the 1D NLPCA approximation (two slides).]