Multiple Regression
→ One outcome, many explaining variables
Example: Ultrasound scanning shortly before birth (1-3 days before)
OBS WEIGHT BPD AD
1 2350 88 92
2 2450 91 98
3 3300 94 110
. . . .
. . . .
. . . .
105 3550 92 116
106 1173 72 73
107 2900 92 104
(BPD: Head diameter; AD: Stomach circumference)
Objectives could be:
• Prediction, construction of normal regions for diagnostic use (as here)
• Calculation of causal relationships for intervention use
• Scientific insight
First we look at a single covariate, bpd:
The statistical model for a simple linear regression was
Yᵢ = α + β·xᵢ + εᵢ,   εᵢ ~ N(0, σ²) independent
Here there is a marked deviation from linearity!
How does that look in model checking?
Model checking
Statistical model:
Yᵢ = α + β·xᵢ + εᵢ,   εᵢ ~ N(0, σ²) independent
What do we have to check here?
• linearity
• variance homogeneity
• deviations from normality (distance to the line)
Note:
• No assumption of normality for the xᵢ!
• Independence between the Yᵢ is checked by inspecting:
– Are there several observations from the same individual?
– Are there persons from the same family? Twins?
Model checking consists of
• graphical checks, typically with the residuals
• perhaps formal tests
Residual: a quantity which expresses the discrepancy between the observed and the expected (predicted, fitted) value.
There are 4 types of residuals to choose from:
1. ordinary: vertical distance of the observation to the line, observed minus fitted value:
ε̂ᵢ = yᵢ − ŷᵢ
2. standardized (student): ordinary, normalized with the standard deviation
3. press: observed minus predicted, but in a model where the current observation has been excluded from the estimation process
4. rstudent (studentized): normalized press residuals
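All four types can be saved directly with the OUTPUT statement of PROC REG (a sketch only; the data set name secher is an assumption):

proc reg data=secher;
   model weight = bpd;
   output out=resids
          p=fitted          /* fitted values             */
          r=ordinary        /* 1. ordinary residual      */
          student=standard  /* 2. standardized residual  */
          press=press       /* 3. press residual         */
          rstudent=rstud;   /* 4. studentized (rstudent) */
run;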
Problems with the ordinary residuals
We have assumed that
εᵢ ~ N(0, σ²), independent,
so we would expect the same to hold for the residuals ε̂ᵢ = yᵢ − ŷᵢ.
→ This is not so!
• They are not independent (they sum up to 0) – not so important if there are sufficiently many
• They don’t all have the same variance:
Var(ε̂ᵢ) = σ²·(1 − hᵢᵢ)
where
hᵢᵢ = 1/n + (xᵢ − x̄)²/Sₓₓ
denotes the leverage of the ith observation
Standardized residuals
rᵢ = ε̂ᵢ / (s·√(1 − hᵢᵢ)),   Var(rᵢ) ≈ 1
Diagnostic residuals
Here the observations (xᵢ, yᵢ) are excluded one after another. For calculating the ith residual, the resulting fitted value (from the model without (xᵢ, yᵢ)) is used, either in raw form (press) or in normalized form (rstudent).
Advantages and disadvantages:
• Nice to have residuals which preserve the units/scale (types 1 and 3)
• Easiest to find outliers if observations are excluded one after another (types 3 and 4)
• Best to normalize, if the observations are included and one cannot draw...
Thus, in multiple regression type 2 should be preferred to type 1.
Residual plots
Residuals (of a suitably chosen type) are plotted vs.
• the explaining variables xᵢ
– to check linearity
• the fitted values ŷᵢ
– to check variance homogeneity and normality of the errors
• ’normal scores’, i.e. a probability plot or histogram
– to check normality
→ The first two plots should look disordered, i.e. unsystematic.
→ The probability plot should lie on a straight line.
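In the program editor these plots could be produced along the following lines (a sketch; it assumes the diagnostics data set resids created above and SAS 9.2 or later for PROC SGPLOT):

proc sgplot data=resids;      /* residuals vs. covariate: linearity */
   scatter x=bpd y=standard;
   refline 0 / axis=y;
run;
proc sgplot data=resids;      /* residuals vs. fitted: variance homogeneity */
   scatter x=fitted y=standard;
   refline 0 / axis=y;
run;
proc univariate data=resids;  /* probability plot: normality */
   qqplot standard / normal(mu=est sigma=est);
run;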
Residual plots in ANALYST
Many of the plots can be produced via Statistics/Regression/Linear, by clicking Plots/Residual, where, for example, Ordinary Residual vs. Predicted is chosen.
[Figure: ordinary residuals vs. predicted value of weight]
Several graphs for model checking
[Figures: model-check plots vs. predicted value of weight]
Linearity
If linearity does not hold, the model will be misleading and uninterpretable.
Ways out:
• add more covariates, e.g.
– a quadratic term: bpd²
weight = α + β₁·bpd + β₂·bpd²
Test of linearity: H₀: β₂ = 0
– ad (multiple regression)
• transform variables by
– logarithms
– square root
– inverse
• non-linear regression
A clear deviation from linearity can be seen with the test of the quadratic term:
New variable: cbpd2=(bpd-90)**2
Statistics/Regression/Linear, choose weight as Dependent, bpd and cbpd2 as Explanatory
(or use Statistics/Regression/Simple
and choose Quadratic)
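Equivalently, in the program editor (a sketch; the data set name secher is an assumption, and the centred term cbpd=bpd-90 is used, matching the output below):

data secher2;
   set secher;
   cbpd  = bpd - 90;    /* centred to avoid numerical instability */
   cbpd2 = cbpd**2;     /* quadratic term                         */
run;

proc reg data=secher2;
   model weight = cbpd cbpd2;   /* test of linearity: H0: beta2 = 0 */
run;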
Dependent Variable: weight
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 2 34103611.113 17051805.556 108.081 0.0001
Error 104 16407889.953 157768.17262
C Total 106 50511501.065
Root MSE 397.20042 R-square 0.6752
Dep Mean 2739.09346 Adj R-sq 0.6689
C.V. 14.50116
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 2720.981236 42.94411387 63.361 0.0001
CBPD 1 117.631510 9.72368306 12.097 0.0001
CBPD2 1 2.232942 0.63718640 3.504 0.0007
Quadratic regression
weight = 2721 + 117.63·(bpd−90) + 2.23·(bpd−90)²
(’−90’: to avoid numerical instability; 90 is ’in the middle’...)
Prediction limits (chosen under Plots in Statistics/Regression/Simple):
Variance homogeneity
(constant variance / constant standard deviation)
Var(εᵢ) = σ²,   i = 1, …, n
If there is no (rough) variance homogeneity, the estimation will be inefficient
(we obtain an unnecessarily large variance).
Which alternatives do we have?
• constant relative standard deviation = constant coefficient of variation
CV(X) = SD(X)/E(X)
– often constant if small positive quantities, e.g. concentrations, are measured
– will lead to a trumpet-shaped residual plot
– way out: transform the outcome (Yᵢ) by a logarithm
• Compound experiment
– e.g., several instruments or laboratories
Normality assumption
Remember: Only the error terms are assumed to be normally distributed, neither the outcome nor the covariates!
The normality assumption
• is not crucial for the fit itself: the least squares method yields the ’best’ estimates at any rate
• is a formal prerequisite for the t distribution of the test statistics, but really only normality of the estimate β̂ is needed, and this is often (approximately) given if there are sufficiently many observations, due to:
The central limit theorem,
which states that sums or other functions of many observations become ’more and more’ normally distributed.
Transformations
• logarithms, square root, inverse
Why take logarithms?
• of the explaining variable
– for obtaining linearity: if successive doublings have a constant effect, use logarithms to base 2!
• of the response / outcome
– for obtaining linearity
– for obtaining variance homogeneity:
Var(log(Y)) ≈ Var(Y)/E(Y)²
(by the delta method, Var(g(Y)) ≈ g′(E(Y))²·Var(Y), here with g = log),
i.e. a constant coefficient of variation of Y means a constant variance of log(Y), the natural logarithm
After log2-transformation of weight:
[Figure: residual plot vs. predicted value of lweight]
After log2-transformation of both weight and bpd:
[Figure: residual plot vs. predicted value of lweight]
Multiple regression
DATA: n persons, p measurements for each:
person x1 . . . xp y
1 x11 . . . x1p y1
2 x21 . . . x2p y2
3 x31 . . . x3p y3
. . . . . . . .
n xn1 . . . xnp yn
The linear regression model with p explaining variables is given by:
yᵢ = β₀ + β₁xᵢ₁ + ⋯ + βₚxᵢₚ + εᵢ
(yᵢ: response; β₀ + β₁xᵢ₁ + ⋯ + βₚxᵢₚ: mean value, the regression function; εᵢ: biological variation)
Parameters:
β₀  intercept
β₁, …, βₚ  regression coefficients
Graphical Illustration
Graphs/Scatter Plot/Three-Dimensional, under Display choose Needles/Pillar
proc g3d;
scatter bpd*ad=weight / shape=’pillar’ size=0.5;
run;
[Figure: 3-D scatter plot of weight against bpd and ad]
Regression model
yᵢ = β₀ + β₁xᵢ₁ + ⋯ + βₚxᵢₚ + εᵢ,   i = 1, …, n
Usual assumptions:
εᵢ ~ N(0, σ²), independent
Least squares method:
S(β₀, β₁, …, βₚ) = Σᵢ₌₁ⁿ (yᵢ − β₀ − β₁xᵢ₁ − ⋯ − βₚxᵢₚ)²
→ minimize with respect to β₀, …, βₚ
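For completeness (a standard result, not shown on the original slide): with X the n×(p+1) design matrix whose first column is all ones, the minimizer and its variance are
β̂ = (XᵀX)⁻¹Xᵀy,   Var(β̂) = σ²(XᵀX)⁻¹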
Example: Secher’s data with birth weight as a linear function of both bpd and ad
Analyst: Statistics/Regression/Linear, with
weight as Dependent, bpd and ad as Explanatory
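The same fit in the program editor (a sketch; the data set name secher is an assumption):

proc reg data=secher;
   model weight = bpd ad;   /* multiple regression on both covariates */
run;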
The REG Procedure
Dependent Variable: weight
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 2 40736854 20368427 216.72 <.0001
Error 104 9774647 93987
Corrected Total 106 50511501
Root MSE 306.57298 R-Square 0.8065
Dependent Mean 2739.09346 Adj R-Sq 0.8028
Coeff Var 11.19250
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t|
Intercept 1 -4628.11813 455.98980 -10.15 <.0001
bpd 1 37.13292 7.61510 4.88 <.0001
ad 1 39.76305 4.16394 9.55 <.0001
→ Strongly significant effect of both covariates.
→ But: Are the model assumptions fulfilled?
Model checks for the untransformed model
[Figures: residual plots vs. predicted value of weight]
Assessment of the model:
• Normality holds roughly, but with some single quite large positive deviations, which could argue for a logarithmic transformation of weight.
• Perhaps a light trumpet shape in the plot of residuals vs. predicted values, but note that the observations are not equally distributed over the x axis.
• Linearity does not hold well, mainly due to the earliest born babies.
• Theoretical arguments from clinical experts suggest a logarithmic transformation of both covariates.
Logarithmic transformation of the data:
lweight=log2(weight)
lbpd=log2(bpd)
lad=log2(ad)
Statistics/Regression/Linear,
choose lweight as Dependent,
lbpd and lad as Explanatory
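As a program-editor sketch (the data set name secher is an assumption; log2 is SAS’s base-2 logarithm function):

data lsecher;
   set secher;
   lweight = log2(weight);
   lbpd    = log2(bpd);
   lad     = log2(ad);
run;

proc reg data=lsecher;
   model lweight = lbpd lad;
run;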
Dependent Variable: Lweight
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 2 14.95054 7.47527 314.925 0.0001
Error 104 2.46861 0.02374
C Total 106 17.41915
Root MSE 0.15407 R-square 0.8583
Dep Mean 11.36775 Adj R-sq 0.8556
C.V. 1.35530
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 -8.456359 0.95456918 -8.859 0.0001
LBPD 1 1.551943 0.22944935 6.764 0.0001
LAD 1 1.466662 0.14669097 9.998 0.0001
Test of hypotheses
Is AD of importance if BPD is already in the model?
H₀: β₂ = 0
Here we have β̂₂ = 1.467 (SE(β̂₂) = 0.147), and thus the t test yields
t = β̂₂ / SE(β̂₂) = 9.998 ~ t₁₀₄,   p < 0.0001
95% confidence interval for β₂:
c.i. = β̂₂ ± t(97.5%, n−p−1) · SE(β̂₂)
     = 1.467 ± 1.984 × 0.147 = (1.175, 1.759)
But:
The β̂ⱼ are correlated, unless the explaining variables are independent
Fitted values
log2(weight) = −8.46 + 1.47·log2(ad) + 1.55·log2(bpd)
i.e.
weight = 0.0028 × ad^1.47 × bpd^1.55
→ If bpd is increased by 10%, this corresponds to multiplying the weight by
1.1^1.55 = 1.16
i.e. an increase by 16%, if ad is kept fixed
Example for calculations
For ad=113 and bpd=88, we would expect
log2(weight) = −8.46 + 1.47 × log2(113) + 1.55 × log2(88)
            = −8.46 + 1.47 × 6.82 + 1.55 × 6.46
            = 11.58
→ Expected birth weight: 2^11.58 = 3061 g
• Actually observed birth weight: 3400 g
• Residual: 3400 g − 3061 g = 339 g
Uncertainty in prediction
Note: The log scale results in a constant relative uncertainty:
2^(±1.96×0.154) = (0.81, 1.23)
This means that with 95% probability the birth weight lies somewhere between 19% under and 23% over the predicted value.
(Here we have cheated a bit: we have neglected the estimation uncertainty in the β̂’s themselves.)
Marginal vs. multiple models
Marginal models:
The response is considered with each single explaining variable on its own.
Multiple regression model:
The response is considered with both explaining variables together.
Estimates for these models (with corresponding standard errors in parentheses):

Model        β̂₀, int.   β̂₁, lbpd        β̂₂, lad        s       R²
lbpd alone   -10.223     3.332 (0.202)   -              0.215   0.72
lad alone     -3.527     -               2.237 (0.111)  0.184   0.80
lbpd + lad    -8.456     1.552 (0.229)   1.467 (0.147)  0.154   0.86
Interpretation of the coefficient β₁ for lbpd
• Marginal model: change in lweight if the covariate lbpd is changed by 1 unit, i.e. if bpd is doubled
• Multiple regression model: change in lweight if the covariate lbpd is changed by 1 unit, but where all other covariates (here only lad) are kept fixed
We say that we have corrected (or adjusted) for the effects of the other covariates in the model.
The difference between the two models can be quite drastic, since the covariates are typically related:
– If one of them is changed, the others are also changed
Goodness-of-fit measure
R² = SS(Model) / SS(Total)
“How large is the proportion of variation explained by the model?”
(here: 0.8583, i.e. 85.83%)
Problem of interpretation if the covariates are controlled (as for the correlation coefficient).
R² increases with the number of covariates, even if these are not important!
Adjusted R²:
R²adj = 1 − MS(Residual) / MS(Total)
(here: 0.8556)
Model checking
• Plots:
– residuals vs. each covariate separately (linearity)
– residuals vs. predicted values (variance homogeneity)
– probability plot (normality)
• Tests:
– generalized vs. simple models
– curvature: square term, cubic term, ...
– interaction: product term?
• Influential observations
– modified residuals
– Cook’s distance
Model checks for the log2-transformed model
[Figures: residual plots vs. predicted value of lweight]
Regression diagnostics
Are the conclusions supported by the whole data set?
Or are there observations with rather large influence on the results?
Leverage = potential influence (hat matrix, in SAS called Hat Diag or H). If there is only one covariate, this is simply:
hᵢᵢ = 1/n + (xᵢ − x̄)²/Sₓₓ
Observations with extreme x values can have a large influence on the results,
... but not necessarily!
→ no problem if they lie ’nicely’ in relation to the regression line, i.e. if they have a small residual
→ For example:
[Figure: scatter plot with regression line; an extreme-x point lying close to the line]
Influential observations
→ are those which have a combination of
• high leverage
• a large residual
Regression diagnostics
• Leave out the ith person and find new estimates β̂₀⁽ⁱ⁾, β̂₁⁽ⁱ⁾ and β̂₂⁽ⁱ⁾
• Calculate Cook’s distance, a compound measure of the change in the parameter estimates
• Split Cook’s distance into its coordinates and state: by how many SEs does β̂₁ (for example) change if the ith person is left out?
What to do with influential observations?
• leave them out?
• quote a measure of their influence?
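Cook’s distance and related quantities can be requested directly in PROC REG (a sketch; the log-transformed data set lsecher from above is an assumption):

proc reg data=lsecher;
   model lweight = lbpd lad / influence;   /* prints hat diag, dffits, dfbetas */
   output out=diag cookd=cook h=leverage;  /* Cook's distance and leverage     */
run;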
Diagnostics: Cook’s distance
[Figure: Cook’s distance for each observation]
Outliers
Observations which don’t fit the relationship
• not necessarily influential
• not necessarily with a large residual
What to do with outliers?
• look closer at them, they are often quite interesting
When can we leave them out?
• if they lie far outside, i.e. have a high leverage
– keep in mind to distinguish the corresponding conclusions!
• if one can find a reason
– and then all of these would be left out!
Model checking and Diagnostics in Analyst
Many graphics can be produced directly in the regression setting in Analyst, under Plots/Residual or Plots/Diagnostics.
If further plots are wanted (e.g. a plot of Cook’s distance), one should create a new data set:
In the regression setting, go into
• Save Data
• tick Create and save diagnostics data
• choose (click Add) the quantities to be saved (typically Predicted, Residual, Student, Rstudent, Cookd, Press)
• Double-click at Diagnostics Table in the project tree
• Save this by clicking File/Save as By SAS Name
• Open it for further use by File/Open By SAS Name
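The same diagnostics data set can be created in the program editor (a sketch, using the OUTPUT keywords corresponding to the quantities above):

proc reg data=lsecher;
   model lweight = lbpd lad;
   output out=diagdata
          p=predicted r=residual
          student=student rstudent=rstudent
          cookd=cookd press=press;
run;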
Example (DGA p.338)
Which explaining variables have a marginal effect on the response PEmax?
Are these (Age, Height, Weight, FEV1, FRC) the variables which should be included in the multiple regression model?
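The correlation table below could be produced along these lines (a sketch; the data set name cf is an assumption):

proc corr data=cf;
   var age sex height weight bmp fev1 rv frc tlc pemax;
run;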
Correlations
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 25
AGE SEX HEIGHT WEIGHT BMP
AGE 1.00000 -0.16712 0.92605 0.90587 0.37776
0.0 0.4246 0.0001 0.0001 0.0626
SEX -0.16712 1.00000 -0.16755 -0.19044 -0.13756
0.4246 0.0 0.4234 0.3619 0.5120
HEIGHT 0.92605 -0.16755 1.00000 0.92070 0.44076
0.0001 0.4234 0.0 0.0001 0.0274
WEIGHT 0.90587 -0.19044 0.92070 1.00000 0.67255
0.0001 0.3619 0.0001 0.0 0.0002
BMP 0.37776 -0.13756 0.44076 0.67255 1.00000
0.0626 0.5120 0.0274 0.0002 0.0
FEV1 0.29449 -0.52826 0.31666 0.44884 0.54552
0.1530 0.0066 0.1230 0.0244 0.0048
RV -0.55194 0.27135 -0.56952 -0.62151 -0.58237
0.0042 0.1895 0.0030 0.0009 0.0023
FRC -0.63936 0.18361 -0.62428 -0.61726 -0.43439
0.0006 0.3797 0.0009 0.0010 0.0300
TLC -0.46937 0.02423 -0.45708 -0.41847 -0.36490
0.0179 0.9085 0.0216 0.0374 0.0729
PEMAX 0.61347 -0.28857 0.59922 0.63522 0.22951
0.0011 0.1618 0.0015 0.0006 0.2698
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 25
FEV1 RV FRC TLC PEMAX
AGE 0.29449 -0.55194 -0.63936 -0.46937 0.61347
0.1530 0.0042 0.0006 0.0179 0.0011
SEX -0.52826 0.27135 0.18361 0.02423 -0.28857
0.0066 0.1895 0.3797 0.9085 0.1618
HEIGHT 0.31666 -0.56952 -0.62428 -0.45708 0.59922
0.1230 0.0030 0.0009 0.0216 0.0015
WEIGHT 0.44884 -0.62151 -0.61726 -0.41847 0.63522
0.0244 0.0009 0.0010 0.0374 0.0006
BMP 0.54552 -0.58237 -0.43439 -0.36490 0.22951
0.0048 0.0023 0.0300 0.0729 0.2698
FEV1 1.00000 -0.66586 -0.66511 -0.44299 0.45338
0.0 0.0003 0.0003 0.0266 0.0228
RV -0.66586 1.00000 0.91060 0.58914 -0.31555
0.0003 0.0 0.0001 0.0019 0.1244
FRC -0.66511 0.91060 1.00000 0.70440 -0.41721
0.0003 0.0001 0.0 0.0001 0.0380
TLC -0.44299 0.58914 0.70440 1.00000 -0.18162
0.0266 0.0019 0.0001 0.0 0.3849
PEMAX 0.45338 -0.31555 -0.41721 -0.18162 1.00000
0.0228 0.1244 0.0380 0.3849 0.0
Note in particular the correlations between age, height and weight!
Model selection
(chosen in Model under Regression/Linear):
• Forward selection: include each time the most significant covariate
→ Final model: WEIGHT BMP FEV1
• Backward elimination: start with all covariates, then drop each time the least significant
→ Final model: WEIGHT BMP FEV1
This looks quite stable!?
But:
What if WEIGHT had been logarithmically transformed from the start?
→ Then we would have obtained the final model: AGE FEV1
Rule of thumb:
There should be at least 10 times as many observations as parameters in the model.
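Automatic selection can also be run in the program editor (a sketch; the data set name cf and the stay level are assumptions, 0.10 being the SAS default for backward elimination):

proc reg data=cf;
   model pemax = age sex height weight bmp fev1 rv frc tlc
         / selection=backward slstay=0.10;   /* drop the least significant each step */
run;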
If all 9 covariates are included:
Dependent: pemax
Explanatory:
age sex height weight bmp fev1 rv frc tlc
Dependent Variable: PEMAX
Analysis of Variance
Sum of Mean
Source DF Squares Square F Value Prob>F
Model 9 17101.39040 1900.15449 2.929 0.0320
Error 15 9731.24960 648.74997
C Total 24 26832.64000
Root MSE 25.47057 R-square 0.6373
Dep Mean 109.12000 Adj R-sq 0.4197
C.V. 23.34180
Parameter Estimates
Parameter Standard T for H0:
Variable DF Estimate Error Parameter=0 Prob > |T|
INTERCEP 1 176.058206 225.89115895 0.779 0.4479
AGE 1 -2.541960 4.80169881 -0.529 0.6043
SEX 1 -3.736781 15.45982182 -0.242 0.8123
HEIGHT 1 -0.446255 0.90335490 -0.494 0.6285
WEIGHT 1 2.992816 2.00795743 1.490 0.1568
BMP 1 -1.744944 1.15523751 -1.510 0.1517
FEV1 1 1.080697 1.08094746 1.000 0.3333
RV 1 0.196972 0.19621362 1.004 0.3314
FRC 1 -0.308431 0.49238994 -0.626 0.5405
TLC 1 0.188602 0.49973514 0.377 0.7112
Backward elimination
Table of successive p values:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
age 0.604 0.632 0.519 0.616 - - - - -
sex 0.812 - - - - - - - -
height 0.628 0.649 0.550 0.600 0.557 - - - -
weight 0.157 0.143 0.072 0.072 0.040 0.000 0.000 0.000 0.001
bmp 0.152 0.140 0.060 0.056 0.035 0.024 0.019 0.098 -
fev1 0.333 0.108 0.103 0.036 0.024 0.014 0.043 - -
rv 0.331 0.323 0.347 0.326 0.228 0.146 - - -
frc 0.540 0.555 0.638 - - - - - -
tlc 0.711 0.669 - - - - - - -
(Altman stops with step no. 7.)
Graph of successive p values
[Figure: the successive p values from the table above, plotted step by step]
Remarks on model selection
• Problem of multiple testing!
• Avoid unclear problem formulations (many covariates which express more or less the same)
• Linearly related covariates:
– What can we say about the ’winners’?
– Were they significant all the time, or did they come in suddenly?
– In the latter case they could also have been excluded while they were insignificant . . .
– Estimates become very unstable
• Usual recommendations:
– Backward elimination
– Calculation of all models
– Cross-validation: estimate the model with a part of the data, then try it out on the rest.
What happens if an explaining variable is excluded?
• The fit gets worse, i.e. the residual sum of squares gets larger.
• The number of degrees of freedom (for the residual SS) increases.
• The estimate s² of the residual variance σ² can either increase or decrease:
s² = Σᵢ (yᵢ − ŷᵢ)² / (n − p − 1)
• The proportion of variation which is explained by the model, R², decreases. This is compensated in the adjusted coefficient of determination, R²adj.
→ As criteria for the model fit we can use either s² or R²adj.
Marginal models:
• Model 1: pemax vs. height
• Model 2: pemax vs. weight
Multiple regression model:
• Model 3: pemax vs. height and weight
Model   β̂₀        β̂₁ (height)    β̂₂ (weight)    s       R²      R²adj
1       -33.276   0.932 (0.260)   -              27.34   0.3591  0.33
2        63.546   -               1.187 (0.301)  26.38   0.4035  0.38
3        47.355   0.147 (0.655)   1.024 (0.787)  26.94   0.4049  0.35

• Each of the two explaining variables has some importance, as seen from the marginal models.
• In the multiple regression model it looks as if neither of them has any importance.
• This means that at least one of them is important, but it is difficult to say which. It looks rather as if it would be weight...
Options in Statistics/Regression/Linear:
• Model:
– Forward selection
– Backward elimination
• Statistics:
– clb: confidence limits for estimates
– corrb: correlation between estimates
– stb: standardized coefficients: effect of changing the covariate by 1 SD
• Statistics/Tests:
– collin: collinearity diagnostics
– vif: variance inflation factor: variance increase due to collinearity
– tol: tolerance: 1 − R² for the regression of one covariate on the others
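These correspond to MODEL statement options in PROC REG (a sketch):

proc reg data=cf;
   model pemax = age sex height weight bmp fev1 rv frc tlc
         / clb stb vif tol collin;
run;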
If we add clb, stb, vif and tol, we obtain:
Parameter Estimates
Standardized Variance
Variable DF Estimate Tolerance Inflation
Intercept 1 0 . 0
age 1 -0.38460 0.04581 21.82984
sex 1 -0.05662 0.44064 2.26941
height 1 -0.28694 0.07166 13.95493
weight 1 1.60200 0.02093 47.78130
bmp 1 -0.62651 0.14053 7.11575
fev1 1 0.36190 0.18452 5.41951
rv 1 0.50671 0.09489 10.53805
frc 1 -0.40327 0.05833 17.14307
tlc 1 0.09571 0.37594 2.65999
Parameter Estimates
Variable DF 95% Confidence Limits
Intercept 1 -305.41740 657.53381
age 1 -12.77654 7.69262
sex 1 -36.68861 29.21505
height 1 -2.37171 1.47920
weight 1 -1.28704 7.27268
bmp 1 -4.20727 0.71739
fev1 1 -1.22329 3.38468
rv 1 -0.22125 0.61519
frc 1 -1.35794 0.74107
tlc 1 -0.87656 1.25376
The quantities calculated for each observation are best saved in a new data set, and then one can look at descriptive quantities for these, e.g.:
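A sketch of the corresponding call (the diagnostics data set diagdata and the variable names are assumptions):

proc means data=diagdata mean min max;
   var resid stresid press residud leverage cook inflpred;
run;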
The MEANS Procedure
Variable Label Mean
--------------------------------------------------------------------
resid Residual 2.50111E-14
stresid Studentized Residual 0.0193870
press Residual without Current Observation 1.2483399
residud Studentized Residual without Current Obs 0.0073219
leverage Leverage 0.4000000
cook Cook’s D Influence Statistic 0.0643761
inflpred Standard Influence on Predicted Value 0.0477590
--------------------------------------------------------------------
Variable Label Minimum
--------------------------------------------------------------------
resid Residual -37.3376860
stresid Studentized Residual -1.7680347
press Residual without Current Observation -60.7098868
residud Studentized Residual without Current Obs -1.9197970
leverage Leverage 0.1925968
cook Cook’s D Influence Statistic 0.000558647
inflpred Standard Influence on Predicted Value -1.7428452
--------------------------------------------------------------------
Variable Label Maximum
--------------------------------------------------------------------
resid Residual 33.4051731
stresid Studentized Residual 1.7053874
press Residual without Current Observation 56.4819549
residud Studentized Residual without Current Obs 1.8350344
leverage Leverage 0.5806599
cook Cook’s D Influence Statistic 0.2582067
inflpred Standard Influence on Predicted Value 1.5251936
--------------------------------------------------------------------
Selected diagnostic plots
[Two diagnostic plots vs. observation number]
Collinearity
→ The covariates are linearly related. There will always be some relationship between them
(hopefully not too strong),
except in designed trials (e.g. in agricultural trials).
Symptoms of collinearity:
• Some of the covariates are strongly correlated
• Some parameter estimates have quite large standard errors
• All covariates in the multiple regression model are insignificant, but R² is nevertheless large
• There are large changes in the estimates if one covariate is excluded from the model
• There are large changes in the estimates if an observation is excluded from the model
• The results differ from the expectations.