VECTOR PROJECTIONS
[Figure: projection of vector y onto vector x; the projection meets x at a 90° angle]
Length of the projection of y onto x: L_{y,x} = (x · y) / ‖x‖
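As a sketch of the projection formula above (the vectors x and y here are made up for illustration):

```r
# Sketch (not from the slides): projecting a vector y onto a vector x in R.
x <- c(3, 0)
y <- c(2, 2)

# Length of the projection of y onto x: L_{y,x} = (x . y) / ||x||
L <- sum(x * y) / sqrt(sum(x * x))

# The projected vector points along the unit vector x / ||x||
proj <- L * x / sqrt(sum(x * x))

L     # length of the projection
proj  # the projected vector; y - proj is perpendicular to x
```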
MATRIX OPERATION: INVERSE MATRIX
Important for solving a set of linear equations is the matrix operation that defines the inverse of a matrix.
X⁻¹: the inverse matrix of X, with X⁻¹ X = I
where I is the identity matrix: all entries on the diagonal are 1, all others 0
(here for a 3 x 3 matrix)
Not all matrices have an inverse matrix, and there is no simple rule for how to calculate the entries of an inverse matrix!
We skip the formal mathematical aspects and note here only the important facts:
For symmetric square matrices like covariance matrices or correlation matrices, the inverse exists (provided no variable is an exact linear combination of the others).
X⁻¹ X = I, where I is the identity matrix.
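A minimal sketch in R, assuming the covariance matrix is non-singular; `solve()` is R's built-in matrix-inverse function:

```r
# Sketch: in R, solve(A) returns the inverse of a square matrix A.
# Covariance matrices of non-degenerate data are symmetric and invertible.
set.seed(1)
X <- matrix(rnorm(200), ncol = 2)   # 100 observations, 2 variables
C <- cov(X)                         # symmetric 2 x 2 covariance matrix

Cinv <- solve(C)                    # inverse matrix
round(Cinv %*% C, 10)               # the identity matrix I
```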
SUMMARY: SIMPLE LINEAR REGRESSION / PRINCIPAL COMPONENT ANALYSIS
In a 2-dimensional sample space:
Simple Linear Regression: minimizes the summed squared errors (measured in the vertical direction between the fitted regression line and the observed data points).
Principal Component Analysis: finds the direction of the vector that maximizes the variance of the data projected onto this vector.
REGRESSION ANALYSIS IN R
Simple linear regression in R: the function res <- lm(y ~ x) calculates the linear regression line. It returns a number of useful additional statistical measures of the quality of the regression line.
Regression line: res$fitted
Residuals (errors): res$residuals
Remember: we assumed that the errors are uncorrelated with the 'predictor' variable x. It is recommended to check that the errors themselves do NOT have an organized structure when plotted over x.
Histogram of residuals (errors): hist(res$residuals)
Remember: we assumed that the errors are uncorrelated with the 'predictor' variable x. It is also recommended to check whether the errors follow a Gaussian (bell-shaped) distribution.
Note: the function fgauss() is defined in myfunctions.R [call source("scripts/myfunctions.R")]
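A sketch of this workflow with simulated data (the variable names and simulated values are illustrative, not from the course scripts):

```r
# Sketch: fit a simple linear regression and run the two residual checks
# described above, on simulated data.
set.seed(42)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.5 * x + rnorm(50, sd = 1)

res <- lm(y ~ x)

fitted_line <- res$fitted      # the fitted regression line
errors      <- res$residuals   # the residuals (errors)

# Check 1: residuals should show no organized structure over x
plot(x, errors)

# Check 2: residuals should look roughly Gaussian (bell-shaped)
hist(res$residuals)
```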
LINEAR REGRESSION STATISTICS
When applying linear regression, a number of test statistics are calculated by R's lm() function:
Regression parameter (slope): the slope of the regression line
Statistical significance (p-value): the smaller the value, the higher the significance of the linear relationship (slope different from 0)
Correlation coefficient between the fitted y-values and the observed y-values
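These statistics can be extracted from `summary()` directly; a sketch with simulated data (the values are illustrative):

```r
# Sketch: extracting the test statistics listed above from summary(lm(...)).
set.seed(7)
x <- 1:30
y <- 1.5 * x + rnorm(30, sd = 3)
res <- lm(y ~ x)
s <- summary(res)

slope   <- s$coefficients["x", "Estimate"]   # regression parameter (slope)
p_value <- s$coefficients["x", "Pr(>|t|)"]   # statistical significance
r       <- cor(res$fitted, y)                # fitted vs. observed correlation

# r^2 equals the R-squared reported by summary()
c(slope, p_value, r^2, s$r.squared)
```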
LINEAR REGRESSION: USE LINEAR REGRESSION WITH CAUTION!
Outliers can have a large effect and suggest a linear relationship where there is none! The influence of single outlier observations can be tested.
The sample space is important! If you only observed x and y in a limited range or a subdomain of the sample space, extrapolation can give misleading results.
MULTIPLE LINEAR REGRESSION
Source: http://reliawiki.org/index.php/Multiple_Linear_Regression_Analysis (figures retrieved April 2014)
Predictand (e.g. Albany Airport temperature anomalies)
Predictors, e.g.: temperatures from nearby stations; or indices of large-scale climate modes like the El Niño Southern Oscillation and the North Atlantic Oscillation; or prescribed time-dependent functions like a linear trend, periodic oscillations, polynomials
Random error (noise)
Write a set of linear equations, one for each observation in the sample (e.g. for each year of temperature observations).
Or, in short matrix notation: y = X β + ε
y = X β + ε
The mathematical problem we need to solve is:
Given all the observations of the predictand (stored in vector y) and the predictor variables (stored in matrix X), we want to find, simultaneously, a proper scaling factor for each predictor variable such that the fitted (estimated) values minimize the sum of the squared errors.
Size of the vectors / matrices: y is n x 1, X is n x k, β is k x 1, ε is n x 1
y = X β + ε
ŷ = X β̂ (fitted values)
β̂ = (Xᵀ X)⁻¹ Xᵀ y
Size of the vectors / matrices: β̂ is k x 1; Xᵀ X is (k x n)(n x k) = k x k; Xᵀ y is (k x n)(n x 1) = k x 1
In Xᵀ X we find the covariance matrix (scaled by n) of the predictor variables. The '⁻¹' indicates another fundamentally important matrix operation: the inverse of a matrix.
Xᵀ y is the covariance (scaled by n) of all predictors with the predictand.
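The closed-form solution can be checked against lm() directly; a sketch with simulated data (the variable names and true coefficients are made up):

```r
# Sketch: the closed-form solution beta = (X^T X)^{-1} X^T y, compared
# against R's lm(). The first column of X is all ones (the intercept).
set.seed(3)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)

X    <- cbind(1, x1, x2)                  # n x k design matrix
beta <- solve(t(X) %*% X) %*% t(X) %*% y  # k x 1 vector of coefficients

cbind(beta, coef(lm(y ~ x1 + x2)))        # both columns agree
```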
The resulting k x 1 matrix (i.e. a vector) contains a proper scaling factor for each predictor. In other words: multiple linear regression is a weighted sum of the predictors (after conversion into the units of the predictand y).
EXAMPLE: MULTIPLE LINEAR REGRESSION WITH 2 PREDICTORS
The scatter cloud shows a linear dependence of the values in y along the two predictor dimensions x1 and x2.
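A sketch of such a 2-predictor fit on simulated data (the true coefficients below are made up for illustration):

```r
# Sketch: fit a 2-predictor multiple linear regression and check how well
# the fitted values match the observed values.
set.seed(11)
x1 <- runif(200); x2 <- runif(200)
y  <- 3 * x1 + 1.5 * x2 + rnorm(200, sd = 0.2)

res <- lm(y ~ x1 + x2)
r   <- cor(res$fitted, y)   # close to 1: the fit captures the scatter cloud
r
```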
TIPS FOR MULTIPLE LINEAR REGRESSION (MLR)
General rule: work with as few predictors as possible. (Every time you add a new predictor, you increase the risk of over-fitting the model.)
Observe how well the fitted values and observed values match (correlation).
Choose predictors that provide independent information about the predictand.
The problem of collinearity: if the predictors are all highly correlated among each other, the MLR can become very ambiguous (because it gets harder to calculate the inverse of the covariance matrix accurately).
Last but not least: the regression coefficients from the MLR are not 'unique'. If you add or remove one predictor, all regression coefficients can change.
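The collinearity problem can be demonstrated in a few lines; a sketch with simulated data (the near-duplicate predictor x2 is constructed for illustration):

```r
# Sketch: collinearity makes coefficients unstable. x2 is nearly a copy
# of x1, so X^T X is close to singular and the coefficients become ambiguous.
set.seed(5)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)    # highly correlated second predictor
y  <- x1 + rnorm(n, sd = 0.5)

coef(lm(y ~ x1))        # stable slope near the true value 1
coef(lm(y ~ x1 + x2))   # coefficients can swing wildly, even in sign
kappa(cbind(1, x1, x2)) # a large condition number signals collinearity
```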
PRINCIPAL COMPONENT ANALYSIS: GLOBAL SEA SURFACE TEMPERATURES
From voluntary ship observations; colors show the percentage of months with at least one observation in a 2 by 2 degree grid box.
From a paper in Annual Review of Marine Science (2010)
Climatology 1982-2008
Red areas mark regions with the highest SST variability.
Principal Component Analysis (PCA)
(also called Empirical Orthogonal Functions (EOF))
The first leading eigenvector: the eigenvectors now form a geographic pattern. Grid boxes with high positive values and grid boxes with large negative values are covarying out of phase (negative correlation). Green regions show only small variations in this eigenvector #1.
The principal component is a time series showing the temporal evolution of the SST variations. This mode is associated with the El Niño - Southern Oscillation.
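PCA is available in R via prcomp(); a sketch on simulated data with a shared oscillation standing in for the ENSO-like mode (the data and loadings are made up for illustration):

```r
# Sketch: PCA with prcomp(). Rows play the role of time steps, columns the
# role of grid boxes, as in the SST example above. Grid boxes with loadings
# of opposite sign covary out of phase.
set.seed(9)
steps <- 1:200
mode  <- sin(2 * pi * steps / 40)             # a shared oscillation
data  <- outer(mode, c(1, -1, 0.5, -0.5)) +   # out-of-phase "grid boxes"
         matrix(rnorm(200 * 4, sd = 0.2), ncol = 4)

pca <- prcomp(data)
pca$rotation[, 1]   # eigenvector #1: the "geographic" pattern (loadings)
pca$x[, 1]          # principal component #1: the time series
summary(pca)$importance["Proportion of Variance", 1]  # variance explained
```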