VECTOR PROJECTIONS

Date post: 05-Feb-2016
Page 1: Vector Projections

VECTOR PROJECTIONS

The projection of a vector y onto a vector x: the length of the projection is

L_{y,x} = (x · y) / ‖x‖

and the residual component of y is perpendicular (90°) to x.
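The projection formula above can be checked numerically. A minimal sketch in Python/NumPy (the course itself uses R; the example vectors here are made up):

```python
import numpy as np

x = np.array([3.0, 1.0])   # vector projected onto
y = np.array([2.0, 2.0])   # vector being projected

# projection length of y onto x: L = (x . y) / ||x||
L = np.dot(x, y) / np.linalg.norm(x)

# projection vector and residual component
proj = L * x / np.linalg.norm(x)
residual = y - proj

# the residual is perpendicular (90 degrees) to x
print(np.isclose(np.dot(residual, x), 0.0))  # True
```

The zero dot product between the residual and x is exactly the 90° condition on the slide.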

Page 2: Vector Projections

MATRIX OPERATION: INVERSE MATRIX

Important for solving a set of linear equations is the matrix operation that defines the inverse of a matrix.

X⁻¹: inverse matrix of X

X⁻¹ X = I

where I is the identity matrix: all entries on the diagonal are 1, all others 0 (here for a 3 × 3 matrix).

Page 3: Vector Projections

MATRIX OPERATION: INVERSE MATRIX

Important for solving a set of linear equations is the matrix operation that defines the inverse of a matrix.

X⁻¹: inverse matrix of X

Not all matrices have an inverse matrix, and there is no simple rule for how to calculate the entries of an inverse matrix!

We skip the formal mathematical aspects and note here only the important facts:

For symmetric square matrices, like covariance matrices or correlation matrices, the inverse exists.

X⁻¹ X = I, where I is the identity matrix
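The defining property X⁻¹ X = I can be illustrated numerically. A sketch in Python/NumPy, using a made-up symmetric (covariance-like) matrix:

```python
import numpy as np

# symmetric square matrix (covariance-like), so the inverse exists
C = np.array([[2.0, 0.5],
              [0.5, 1.0]])

C_inv = np.linalg.inv(C)

# multiplying back recovers the identity matrix
# (1s on the diagonal, 0s elsewhere)
print(np.allclose(C_inv @ C, np.eye(2)))  # True
```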

Page 4: Vector Projections

SUMMARY

Simple Linear Regression
Principal Component Analysis

Page 5: Vector Projections

SUMMARY

2-dimensional sample space:

Simple Linear Regression: minimizes the summed squared errors (measured in the vertical direction between the fitted regression line and the observed data points).

Principal Component Analysis: finds the direction of the vector that maximizes the variance of the data projected onto this vector.
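That maximizing direction is the leading eigenvector of the covariance matrix. A small Python/NumPy sketch on made-up, correlated 2-D data:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
# second coordinate strongly correlated with the first
data = np.column_stack([x1, 2.0 * x1 + rng.normal(scale=0.5, size=500)])
data -= data.mean(axis=0)                 # center the sample

cov = data.T @ data / len(data)           # covariance matrix (scaled by n)
eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
v1 = eigvecs[:, -1]                       # leading eigenvector

# the variance of the projections onto v1 equals the largest eigenvalue;
# no other direction gives a larger projected variance
print(np.isclose(np.var(data @ v1), eigvals[-1]))  # True
```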

Page 6: Vector Projections

REGRESSION ANALYSIS IN R

Simple linear regression in R: the function res <- lm(y ~ x) calculates the linear regression line. It returns a number of useful additional statistical measures of the quality of the regression line.

Page 7: Vector Projections

Regression line using res$fitted

Page 8: Vector Projections

Residuals (errors) res$residuals

Remember: we assumed that the errors are uncorrelated with the 'predictor' variable x. It is recommended to check that the errors themselves do NOT have an organized structure when plotted over x.

Page 9: Vector Projections

Histogram of residuals (errors) hist(res$residuals)

Remember: we assumed that the errors are uncorrelated with the 'predictor' variable x. It is recommended to check also whether the errors follow a Gaussian (bell-shaped) distribution.

Note: the function fgauss() is defined in myfunctions.R [call source("scripts/myfunctions.R")]

Page 10: Vector Projections

LINEAR REGRESSION STATISTICS

When applying linear regression, a number of test statistics are calculated in R's lm() function.

Regression parameter: the slope of the regression line.

Statistical significance: the smaller the value, the higher the significance of the linear relationship (slope > 0).

Correlation coefficient between the fitted y-values and the observed y-values.
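The quantities lm() reports can be reproduced by hand. A Python/NumPy sketch (synthetic data with an assumed true slope of 2) of the slope, the fitted values (the analogue of res$fitted), the residuals (res$residuals), and the fitted-vs-observed correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=100)  # true slope 2, intercept 1

slope, intercept = np.polyfit(x, y, 1)    # least-squares regression line
fitted = slope * x + intercept            # analogue of res$fitted
residuals = y - fitted                    # analogue of res$residuals

# correlation between fitted and observed y-values (quality of the fit)
r = np.corrcoef(fitted, y)[0, 1]
print(slope, r)
```

With this much signal relative to noise, the estimated slope lands close to 2 and the correlation is high.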

Page 11: Vector Projections

LINEAR REGRESSION: USE LINEAR REGRESSION WITH CAUTION!

Outliers can have a large effect and suggest a linear relationship where there is none! One can test for the influence of single outlier observations.

The sample space is important! If you only observed x and y in a limited range or a subdomain of the sample space, extrapolation can give misleading results.

Page 12: Vector Projections

LINEAR REGRESSION: THE DANGER OF USING LINEAR REGRESSION!

Outliers can have a large effect and suggest a linear relationship where there is none! One can test for the influence of single outlier observations.

The sample space is important! If you only observed x and y in a limited range or a subdomain of the sample space, extrapolation can give misleading results.

Page 13: Vector Projections

MULTIPLE LINEAR REGRESSION

Source: http://reliawiki.org/index.php/Multiple_Linear_Regression_Analysis (figures retrieved April 2014)

Predictand (e.g. Albany Airport temperature anomalies)

Predictors, e.g.: temperatures from nearby stations; or indices of large-scale climate modes like the El Niño-Southern Oscillation and the North Atlantic Oscillation; or prescribed time-dependent functions like a linear trend, periodic oscillations, polynomials.

Random error (noise)

Page 14: Vector Projections

MULTIPLE LINEAR REGRESSION


Write a set of linear equations, one for each observation in the sample (e.g. for each year of temperature observations).

Page 15: Vector Projections

MULTIPLE LINEAR REGRESSION


Or, in short matrix notation:

y = X β + ε

Page 16: Vector Projections

MULTIPLE LINEAR REGRESSION


y = X β + ε

The mathematical problem we need to solve is:

Given all the observations of the predictand (stored in vector y) and the predictor variables (stored in matrix X), we want to find simultaneously a proper scaling factor for each predictor variable, such that the fitted estimated values minimize the sum of the squared errors.

Sizes of the vectors/matrices: y is n × 1, X is n × k, β is k × 1, ε is n × 1.

Page 17: Vector Projections

MULTIPLE LINEAR REGRESSION


y = X β + ε

y = X β (fitted values, with the errors dropped)

β = (Xᵀ X)⁻¹ Xᵀ y

Sizes of the vectors/matrices: β is k × 1; Xᵀ X is (k × n)(n × k) = k × k, so (Xᵀ X)⁻¹ is k × k; Xᵀ y is (k × n)(n × 1) = k × 1.

We find here the covariance matrix (scaled by n) of the predictor variables. The '−1' indicates another fundamentally important matrix operation: the inverse of a matrix.

Xᵀ y is the covariance (scaled by n) of all predictors with the predictand.

Page 18: Vector Projections

MULTIPLE LINEAR REGRESSION


y = X β + ε

y = X β (fitted values, with the errors dropped)

β = (Xᵀ X)⁻¹ Xᵀ y

Sizes of the vectors/matrices: β is k × 1; Xᵀ X is (k × n)(n × k) = k × k, so (Xᵀ X)⁻¹ is k × k; Xᵀ y is (k × n)(n × 1) = k × 1.

The resulting k × 1 matrix (i.e. vector) contains a proper scaling factor for each predictor. In other words: multiple linear regression is a weighted sum of the predictors (after conversion into units of the predictand y).
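The formula β = (Xᵀ X)⁻¹ Xᵀ y can be verified directly. A Python/NumPy sketch with synthetic data (n = 100 observations, k = 3 predictors, and made-up true scaling factors):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = rng.normal(size=(n, k))                        # predictor matrix, n x k
beta_true = np.array([1.5, -2.0, 0.5])             # assumed true scaling factors
y = X @ beta_true + rng.normal(scale=0.1, size=n)  # predictand with noise

# beta = (X^T X)^-1 X^T y  ->  one scaling factor per predictor (k x 1)
beta = np.linalg.inv(X.T @ X) @ X.T @ y

# the fitted values are the weighted sum of the predictors
fitted = X @ beta
print(np.allclose(beta, beta_true, atol=0.1))  # True (up to the noise level)
```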

Page 19: Vector Projections

EXAMPLE: MULTIPLE LINEAR REGRESSION WITH 2 PREDICTORS

The scatter cloud shows a linear dependence of the values in y along the two predictor dimensions x1 and x2.

Page 20: Vector Projections

TIPS FOR MULTIPLE LINEAR REGRESSION (MLR)

General rule: work with as few predictors as possible (every time you add a new predictor, you increase the risk of over-fitting the model).

Observe how well the fitted values and the observed values match (correlation).

Choose predictors that provide independent information about the predictand.

The problem of collinearity: if the predictors are all highly correlated among each other, then the MLR can become very ambiguous (because it gets harder to calculate accurately the inverse of the covariance matrix).

Last but not least: the regression coefficients from the MLR are not 'unique'. If you add or remove one predictor, all regression coefficients can change.
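The collinearity warning can be made concrete: when two predictors are nearly identical, Xᵀ X becomes ill-conditioned, so its inverse (and hence the regression coefficients) is numerically unreliable. A sketch in Python/NumPy with made-up data:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=1e-3, size=200)   # nearly a copy of x1 -> collinear
X = np.column_stack([x1, x2])

# a huge condition number of X^T X means that inverting it amplifies
# small data perturbations into large coefficient changes
cond = np.linalg.cond(X.T @ X)
print(cond > 1e4)  # True
```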

Page 21: Vector Projections

PRINCIPAL COMPONENT ANALYSIS Global Sea Surface Temperatures

From voluntary ship observations; the colors show the percentage of months with at least one observation in a 2 by 2 degree grid box.

From a paper in Annual Review of Marine Science (2010).

Page 22: Vector Projections

PRINCIPAL COMPONENT ANALYSIS Global Sea Surface Temperatures

Climatology 1982-2008

Red areas mark regions with the highest SST variability.

Page 23: Vector Projections

PRINCIPAL COMPONENT ANALYSIS Global Sea Surface Temperatures

Principal Component Analysis (PCA)

(Empirical Orthogonal Functions (EOF))

The first (leading) eigenvector: eigenvectors now form a geographic pattern. Grid boxes with high positive values and those with large negative values covary out of phase (negative correlation). Green regions show only small variations in this eigenvector #1.

The principal component is a time series showing the temporal evolution of the SST variations. This mode is associated with the El Niño-Southern Oscillation.

