Multiple Linear Regression.

Date post: 31-Dec-2015
Upload: robert-larson
Transcript
Page 1: Multiple Linear Regression.

April 19, 2023 AGR206 1

Multiple Linear Regression. Concept and uses. Model and assumptions. Intrinsically linear models. Model development and validation. Problem areas.

Non-normality. Heterogeneous variance. Correlated errors. Influential points and outliers. Model inadequacies. Collinearity. Errors in X variables.

Page 2: Multiple Linear Regression.

Concept & Uses.

REGRESSION

ANOVA

Description restricted to data set. Did biomass increase with pH in the sample?

Prediction of Y. How much biomass do we expect to find under given soil conditions?

Extrapolation for new conditions: can we predict biomass in other estuaries?

Estimation and understanding. How much does biomass change per unit change in pH while controlling for other factors?

Control of process: requires causality. Can we create sites with certain biomass by changing the pH?

Page 3: Multiple Linear Regression.

Body fat example in JMP.

Three variables (X1, X2, X3) were measured to predict body fat % (Y) in people.

Random sample of people. Y was measured by an expensive and very accurate method (assume it reveals true % fat).

• X1: thickness of triceps skinfold
• X2: thigh circumference
• X3: midarm circumference

Data file: Bodyfat.jmp

Page 4: Multiple Linear Regression.

H0's or values “of interest”

Does thickness of triceps skinfold contribute significantly to the prediction of fat content?

What is the CI for fat content for a person whose X’s have been measured?

Do I have more or less fat than last summer? Do I have more fat than recommended?

Page 5: Multiple Linear Regression.

Model and Assumptions.

Linear, additive model to relate Y to p independent variables.

• Note: here, p is number of variables, but some authors use p for number of parameters, which is one more than variables due to the intercept.

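$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip} + \varepsilon_i$

• where the $\varepsilon_i$ are normal and independent random variables with common variance $\sigma^2$.

In matrix notation the model and solution are exactly the same as for SLR: $Y = X\beta + \varepsilon$, $b = (X'X)^{-1}(X'Y)$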

All equations from SLR apply without change.
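The matrix solution can be sketched numerically. The following is a minimal Python/NumPy illustration, not part of the original slides; the function name `fit_ols` and the data are hypothetical, and the data are generated without noise so that the estimates reproduce the chosen coefficients.

```python
import numpy as np

def fit_ols(X, y):
    """Return b = (X'X)^{-1} X'y for a design matrix X that already
    contains a leading column of ones for the intercept."""
    # Solve the normal equations (X'X) b = X'y instead of forming the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data: n = 6 observations, p = 2 predictors, no error term,
# generated from y = 1 + 2*x1 - 3*x2.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
y = 1.0 + 2.0 * x1 - 3.0 * x2

# Design matrix: intercept column followed by the predictors.
X = np.column_stack([np.ones_like(x1), x1, x2])
b = fit_ols(X, y)  # estimates of (beta0, beta1, beta2)
```

Solving the linear system is used here only because it is numerically safer than an explicit inverse; the result is the same b as in the formula above.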

Page 6: Multiple Linear Regression.

Linear models

Linear and intrinsically linear models. Linearity refers to the parameters.

The model can involve any functions of the X's, as long as those functions have no parameters that must be estimated.

A linear model does not always produce a hyperplane.

$Y_i = \beta_0 + \beta_1 f_1(X_{i1}) + \dots + \beta_p f_p(X_{ip}) + \varepsilon_i$

Polynomial regression is a special case where the functions are powers of X.
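As a small illustration of linearity in the parameters, the sketch below fits a quadratic with ordinary linear least squares. It is an assumed example in Python/NumPy with made-up, noise-free data, not material from the slides.

```python
import numpy as np

# Hypothetical data from a quadratic, y = 2 - x + 0.5*x^2, without error.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2.0 - x + 0.5 * x**2

# The columns are fixed functions of x (1, x, x^2). None of them has a
# parameter to be tuned, so the model is still linear in the betas.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary linear least squares recovers the three coefficients.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The fitted response is a curve in x, not a straight line, which is one way a linear model can fail to produce a hyperplane in the original variables.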

Page 7: Multiple Linear Regression.

Matrix Equations
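
The equations on this slide did not survive in the transcript. The block below lists the standard least-squares matrix results, written with p as the number of X variables (so the error degrees of freedom are n - p - 1). It is an assumption that these match what the slide showed.

```latex
\begin{aligned}
Y &= X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I) \\
b &= (X'X)^{-1} X'Y \\
\hat{Y} &= Xb = HY, \qquad H = X (X'X)^{-1} X' \\
e &= Y - \hat{Y} = (I - H)\,Y \\
\mathrm{MSE} &= \frac{e'e}{n - p - 1} \\
s^2\{b\} &= \mathrm{MSE}\,(X'X)^{-1}
\end{aligned}
```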

Page 8: Multiple Linear Regression.

Extra Sum of Squares

Effects of order of entry on SS.

The 4 types of SS.

Partial correlation.
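
The effect of order of entry can be sketched with sequential (Type I) sums of squares. This is a hedged Python/NumPy illustration with hypothetical data; the helper `sse` is not from the slides. Each extra sum of squares is the drop in residual SS when a variable is added to the variables already in the model.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return float(r @ r)

# Hypothetical data with two correlated predictors.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([1.0, 1.0, 2.0, 3.0, 3.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 7.0, 9.0])
one = np.ones_like(y)

sse_0 = sse(np.column_stack([one]), y)           # intercept only (= SSTotal)
sse_1 = sse(np.column_stack([one, x1]), y)
sse_2 = sse(np.column_stack([one, x2]), y)
sse_12 = sse(np.column_stack([one, x1, x2]), y)

# Order of entry: x1 first, then x2.
ss_x1 = sse_0 - sse_1             # SSR(X1)
ss_x2_given_x1 = sse_1 - sse_12   # SSR(X2 | X1)

# Order of entry: x2 first, then x1.
ss_x2 = sse_0 - sse_2             # SSR(X2)
ss_x1_given_x2 = sse_2 - sse_12   # SSR(X1 | X2)
```

Both orders add up to the same total regression SS, but because x1 and x2 are correlated, the SS credited to each variable is much larger when it enters first than when it enters last.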

Page 9: Multiple Linear Regression.

Extra Sum of Squares: body fat

Page 10: Multiple Linear Regression.

Response plane and error

[Figure: response plane of Y over X1 and X2, showing an observation Yi and its expected value E{Yi}.]

The response surface in more than 3D is a hyperplane.

Page 11: Multiple Linear Regression.

Model development

What variables to include depends on the objective:

• Descriptive: no need to reduce the number of variables.
• Prediction and estimation of $\hat{Y}$: OK to reduce for economical use.
• Estimation of $\beta$ and understanding: sensitive to deletions; may bias MSE and the estimates of $\beta$. No real solution other than getting more data from a better experiment. (Sorry!)

Page 12: Multiple Linear Regression.

Variable Selection

Effects of elimination of variables:

• MSE is positively biased unless the true $\beta$ for the variables eliminated is 0.
• $\hat{\beta}$ and $\hat{Y}$ are biased unless the previous condition holds or the variables eliminated are orthogonal to those retained.
• Variance of estimated parameters and predictions is usually lower.
• There are conditions for which the mean squared error of the reduced model (variance plus bias$^2$) is smaller.
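
The bias from dropping a variable, and the orthogonal exception, can be illustrated with a small noise-free sketch in Python/NumPy. The data and the helper `slope_of_x1` are hypothetical. Without an error term, any shift in the coefficient is due entirely to the omitted variable.

```python
import numpy as np

def slope_of_x1(x1, x2, drop_x2):
    """Fit y on x1 (and on x2 unless dropped) and return the x1 coefficient.
    y is generated without error from y = 1 + 2*x1 + 3*x2."""
    y = 1.0 + 2.0 * x1 + 3.0 * x2
    cols = [np.ones_like(x1), x1] if drop_x2 else [np.ones_like(x1), x1, x2]
    b, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return float(b[1])

x1 = np.array([-3.0, -1.0, 1.0, 3.0])

# x2 correlated with x1: dropping it moves the x1 coefficient away from 2.
x2_corr = np.array([-2.0, -1.0, 0.0, 3.0])
# x2 centered and orthogonal to x1: dropping it leaves the coefficient at 2.
x2_orth = np.array([1.0, -1.0, -1.0, 1.0])

b1_full = slope_of_x1(x1, x2_corr, drop_x2=False)
b1_drop_corr = slope_of_x1(x1, x2_corr, drop_x2=True)
b1_drop_orth = slope_of_x1(x1, x2_orth, drop_x2=True)
```

In this sketch the full model returns the true slope of 2, the model that drops the correlated x2 absorbs part of its effect into the x1 coefficient, and the model that drops the orthogonal x2 still returns 2.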

Page 13: Multiple Linear Regression.

Criteria for variable selection

R2 - Coefficient of determination. $R^2 = SSReg/SSTotal$

MSE or MSRes - Mean squared residuals. If all X's are in the model, it estimates $\sigma^2$.

R2adj - Adjusted R2. $R^2_{adj} = 1 - MSE/MSTo = 1 - \frac{n-1}{n-p}\,\frac{SSE}{SSTo}$

Mallows' Cp. $C_p = SSRes/MSE_{Full} + 2p - n$

In the last two formulas, p is the number of parameters (variables plus intercept).
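
As a worked check of these criteria, the sketch below evaluates them with the sums of squares from the Spartina ANOVA output shown later in this transcript (n = 45 from the total DF of 44, and p = 16 parameters for 15 variables plus the intercept). The function names are hypothetical, and the code is plain Python.

```python
def r_squared(sse, ssto):
    # R2 = SSReg/SSTotal, written as 1 - SSE/SSTo.
    return 1.0 - sse / ssto

def adjusted_r_squared(sse, ssto, n, p):
    # p counts parameters (intercept included).
    return 1.0 - ((n - 1) / (n - p)) * (sse / ssto)

def mallows_cp(sse_subset, mse_full, n, p):
    # p counts the parameters of the subset model being evaluated.
    return sse_subset / mse_full + 2 * p - n

# Values copied from the Spartina ANOVA output.
n, p = 45, 16
sse, ssto, mse_full = 2801379.9, 19170963.2, 96599.307

r2 = r_squared(sse, ssto)
r2_adj = adjusted_r_squared(sse, ssto, n, p)
cp_full = mallows_cp(sse, mse_full, n, p)  # for the full model, Cp equals p
```

Rounded to four decimals, `r2` and `r2_adj` agree with the R-square (0.8539) and Adj R-sq (0.7783) values in the SAS output. For the full model Cp reduces to p, so Cp is only informative when comparing subset models.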

Page 14: Multiple Linear Regression.

Example

Page 15: Multiple Linear Regression.

Checking assumptions.

Note that although we have many X’s, errors are still in a single dimension.

Residual analysis is performed as for SLR, sometimes repeated over different X's.

• Normality. Use PROC UNIVARIATE with the NORMAL option. Remedy: transform.
• Homogeneity of variance. Plot errors vs. each X. Remedies: transform, or use weighted least squares.
• Independence of errors.
• Adequacy of model. Plot errors. Lack-of-fit (LOF) test.
• Influence and outliers. Use the INFLUENCE option in PROC REG.
• Collinearity. Use the COLLINOINT option of PROC REG.

Page 16: Multiple Linear Regression.

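Code for PROC REG

/* Add COLIN, a noisy linear combination of pH, acidity and salinity. */
data s00.spart2;
  set s00.spartina;
  colin = 2*ph + 0.5*acid + sal + rannor(23);
run;

proc reg data=s00.spart2;
  /* Full model with residual, influence and collinearity diagnostics. */
  model bmss = colin h2s sal eh7 ph acid p k ca mg na mn zn cu nh4
        / r influence vif collinoint stb partial;
run;
  /* Regress COLIN on its components. */
  model colin = ph sal acid;
run;
quit;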

Page 17: Multiple Linear Regression.

Spartina ANOVA output

Model: MODEL1
Dependent Variable: BMSS

Analysis of Variance

Source     DF   Sum of Squares    Mean Square   F Value   Prob>F
Model      15       16369583.2    1091305.552    11.297   0.0001
Error      29        2801379.9      96599.307
C Total    44       19170963.2

Root MSE    310.80429    R-square   0.8539
Dep Mean   1000.80000    Adj R-sq   0.7783
C.V.         31.05558

Page 18: Multiple Linear Regression.

Parameters and VIF

Parameter Estimates

                  Parameter    Standard    T for H0:               Standardized      Variance
Variable  DF       Estimate       Error  Parameter=0  Prob > |T|       Estimate     Inflation
INTERCEP   1    3809.233562    3038.081        1.254      0.2199     0.00000000    0.00000000
COLIN      1    -178.317065      58.718       -3.037      0.0050    -1.06227792   24.28364757
H2S        1       0.336242       2.656        0.127      0.9001     0.01563626    3.02785626
SAL        1     150.513276      61.960        2.429      0.0216     0.84818417   24.19556405
EH7        1       2.288694       1.785        1.282      0.2099     0.12813770    1.98216733
PH         1     486.417077     306.756        1.586      0.1237     0.91891994   66.64921013
ACID       1     -24.816449     109.856       -0.226      0.8229    -0.09422943   34.53131689
P          1       0.153015       2.417        0.063      0.9500     0.00639498    2.02507775
K          1      -0.733250       0.439       -1.668      0.1061    -0.33059243    7.79660017
CA         1      -0.137163       0.111       -1.230      0.2286    -0.35706572   16.72702792
MG         1      -0.318586       0.243       -1.308      0.2010    -0.45340287   23.82835726
NA         1      -0.005294       0.022       -0.239      0.8127    -0.05520175   10.57323219
MN         1      -4.279887       4.836       -0.885      0.3835    -0.15872971    6.38589662
ZN         1     -26.270852      19.452       -1.351      0.1873    -0.32953283   11.81574077
CU         1     346.606818      99.295        3.491      0.0016     0.54452366    4.82931410
NH4        1       0.539373       3.061        0.176      0.8614     0.03862822    9.53842459
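
For readers who want to see what the variance inflation column measures, here is a minimal Python/NumPy sketch under stated assumptions: the function `vif` and the data are hypothetical and this is not the PROC REG implementation. It uses the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on all the other predictors.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (predictors only,
    no intercept column). An intercept is added to each auxiliary fit."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        ss_res = resid @ resid
        ss_tot = np.sum((target - target.mean()) ** 2)
        out[j] = ss_tot / ss_res  # equals 1 / (1 - R_j^2)
    return out

# Hypothetical predictors: w is nearly u + v, mimicking how COLIN was built
# from other variables; z is almost unrelated to the rest.
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
v = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
w = u + v + np.array([0.3, -0.2, 0.1, 0.4, -0.3, 0.2, -0.4, -0.1])
z = np.array([1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0])

vifs = vif(np.column_stack([u, v, w, z]))
```

In this made-up example the near-linear combination w receives a very large VIF while z stays close to 1, which is the same pattern to look for in the Variance Inflation column of the PROC REG output.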

