04/22/23 http://numericalmethods.eng.usf.edu 1
Adequacy of Linear Regression Models
http://numericalmethods.eng.usf.edu
Transforming Numerical Methods Education for STEM Undergraduates
Data
[Figure: scatter plot of the data, y vs x]
Is this adequate?
[Figure: Straight Line Model, the data with a fitted straight line, y vs x]
Quality of Fitted Data
• Does the model describe the data adequately?
• How well does the model predict the response variable?
Linear Regression Models
• Limit our discussion to the adequacy of straight-line regression models
Four checks
1. Plot the data and the model.
2. Find the standard error of estimate.
3. Calculate the coefficient of determination.
4. Check if the model meets the assumption of random errors.
Example: Check the adequacy of the straight line model for given data
T (°F)    α (μin/in/°F)
-340      2.45
-260      3.58
-180      4.52
-100      5.28
-20       5.86
60        6.36
$\alpha = a_0 + a_1 T$
END
1. Plot the data and the model
Data and model
[Figure: α (μin/in/°F) vs T (°F), the six data points and the fitted straight line]
$\alpha(T) = 6.0325 + 0.0096964\,T$
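The fitted line can be reproduced with a least-squares fit. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and do not come from the slides.

```python
import numpy as np

# Data from the example table: T in °F, alpha in μin/in/°F
T = np.array([-340.0, -260.0, -180.0, -100.0, -20.0, 60.0])
alpha = np.array([2.45, 3.58, 4.52, 5.28, 5.86, 6.36])

# np.polyfit returns coefficients from the highest power down: [a1, a0]
a1, a0 = np.polyfit(T, alpha, 1)

# Model values at the data points, ready to plot against the data
alpha_model = a0 + a1 * T

print(f"alpha(T) = {a0:.4f} + {a1:.7f} T")
```

Plotting `alpha` and `alpha_model` against `T` on the same axes gives the first adequacy check.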
END
2. Find the standard error of estimate
Standard error of estimate
$s_{\alpha/T} = \sqrt{\dfrac{S_r}{n-2}}$
where
$S_r = \sum_{i=1}^{n} (\alpha_i - a_0 - a_1 T_i)^2$
Standard Error of Estimate
$T_i$     $\alpha_i$   $a_0 + a_1 T_i$   $\alpha_i - a_0 - a_1 T_i$
-340      2.45         2.7357            -0.28571
-260      3.58         3.5114            0.068571
-180      4.52         4.2871            0.23286
-100      5.28         5.0629            0.21714
-20       5.86         5.8386            0.021429
60        6.36         6.6143            -0.25429
$\alpha(T) = 6.0325 + 0.0096964\,T$
Standard Error of Estimate
$S_r = 0.25283$
$s_{\alpha/T} = \sqrt{\dfrac{S_r}{n-2}} = \sqrt{\dfrac{0.25283}{6-2}} = 0.25141$
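The same numbers can be checked numerically. This is a short sketch, assuming NumPy and refitting the line rather than typing in the rounded coefficients; the names `S_r` and `s_alpha_T` simply mirror the symbols above.

```python
import numpy as np

# Data from the example table: T in °F, alpha in μin/in/°F
T = np.array([-340.0, -260.0, -180.0, -100.0, -20.0, 60.0])
alpha = np.array([2.45, 3.58, 4.52, 5.28, 5.86, 6.36])

# Straight-line fit alpha = a0 + a1*T
a1, a0 = np.polyfit(T, alpha, 1)

# Residuals between the observed and the predicted values
residuals = alpha - (a0 + a1 * T)

# Sum of the squares of the residuals
S_r = float(np.sum(residuals**2))

# Standard error of estimate; n - 2 because two constants were fitted
n = len(T)
s_alpha_T = float(np.sqrt(S_r / (n - 2)))

print(f"S_r = {S_r:.5f}, s = {s_alpha_T:.5f}")
```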
Standard Error of Estimate
$\text{Scaled Residual} = \dfrac{\alpha_i - a_0 - a_1 T_i}{s_{\alpha/T}}$
[Figure: plot vs T]
Scaled Residuals
$\text{Scaled Residual} = \dfrac{\text{Residual}}{\text{Standard Error of Estimate}} = \dfrac{\alpha_i - a_0 - a_1 T_i}{s_{\alpha/T}}$
95% of the scaled residuals need to be in [-2,2]
Scaled Residuals
$T_i$     $\alpha_i$   Residual     Scaled Residual
-340      2.45         -0.28571     -1.1364
-260      3.58         0.068571     0.27275
-180      4.52         0.23286      0.92622
-100      5.28         0.21714      0.86369
-20       5.86         0.021429     0.085235
60        6.36         -0.25429     -1.0115
$s_{\alpha/T} = 0.25141$
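The scaled-residual check can be sketched as below, assuming NumPy; `fraction_inside` is an illustrative name for the share of scaled residuals that fall in [-2, 2].

```python
import numpy as np

# Data from the example table: T in °F, alpha in μin/in/°F
T = np.array([-340.0, -260.0, -180.0, -100.0, -20.0, 60.0])
alpha = np.array([2.45, 3.58, 4.52, 5.28, 5.86, 6.36])

# Straight-line fit and its residuals
a1, a0 = np.polyfit(T, alpha, 1)
residuals = alpha - (a0 + a1 * T)

# Standard error of estimate
s_alpha_T = np.sqrt(np.sum(residuals**2) / (len(T) - 2))

# Scaled residual = residual / standard error of estimate
scaled = residuals / s_alpha_T

# Share of scaled residuals inside [-2, 2]; the check asks for about 95%
fraction_inside = float(np.mean(np.abs(scaled) <= 2.0))

print(np.round(scaled, 4), fraction_inside)
```

With only six points the 95% criterion is coarse: a single scaled residual outside [-2, 2] would already drop the share to about 83%.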
END
3. Find the coefficient of determination
Coefficient of determination
$S_r = \sum_{i=1}^{n} (\alpha_i - a_0 - a_1 T_i)^2$
$S_t = \sum_{i=1}^{n} (\alpha_i - \bar{\alpha})^2$
$r^2 = \dfrac{S_t - S_r}{S_t}$
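Sum of squares of the residuals between data and mean
$S_t = \sum_{i=1}^{n} (y_i - \bar{y})^2$
[Figure: data points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_i, y_i), \ldots, (x_n, y_n)$ and the horizontal line $y = \bar{y}$, with the deviation $y_i - \bar{y}$ marked]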
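Sum of squares of the residuals between observed and predicted
$S_r = \sum_{i=1}^{n} (y_i - a_0 - a_1 x_i)^2$
[Figure: data points $(x_1, y_1), (x_2, y_2), (x_3, y_3), \ldots, (x_i, y_i), \ldots, (x_n, y_n)$ and the regression line, with the residual $E_i = y_i - a_0 - a_1 x_i$ marked]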
Limits of Coefficient of Determination
$r^2 = \dfrac{S_t - S_r}{S_t}$
$0 \le r^2 \le 1$
Calculation of St
$T_i$     $\alpha_i$   $\alpha_i - \bar{\alpha}$
-340      2.45         -2.2250
-260      3.58         -1.0950
-180      4.52         -0.15500
-100      5.28         0.60500
-20       5.86         1.1850
60        6.36         1.6850
$\bar{\alpha} = 4.6750$
$S_t = 10.783$
Calculation of Sr
$T_i$     $\alpha_i$   $a_0 + a_1 T_i$   $\alpha_i - a_0 - a_1 T_i$
-340      2.45         2.7357            -0.28571
-260      3.58         3.5114            0.068571
-180      4.52         4.2871            0.23286
-100      5.28         5.0629            0.21714
-20       5.86         5.8386            0.021429
60        6.36         6.6143            -0.25429
$S_r = 0.25283$
Coefficient of determination
$r^2 = \dfrac{S_t - S_r}{S_t} = \dfrac{10.783 - 0.25283}{10.783} = 0.97655$
Correlation coefficient
$r = \sqrt{\dfrac{S_t - S_r}{S_t}} = 0.98820$
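A compact numerical check of $S_t$, $S_r$, $r^2$ and $r$ is sketched below, assuming NumPy. Giving $r$ the sign of the fitted slope $a_1$ is the usual convention for a straight-line fit; it is stated here as an assumption because the slides pose the sign as a question.

```python
import numpy as np

# Data from the example table: T in °F, alpha in μin/in/°F
T = np.array([-340.0, -260.0, -180.0, -100.0, -20.0, 60.0])
alpha = np.array([2.45, 3.58, 4.52, 5.28, 5.86, 6.36])

# Straight-line fit alpha = a0 + a1*T
a1, a0 = np.polyfit(T, alpha, 1)

# Sum of squares about the mean and about the regression line
S_t = float(np.sum((alpha - alpha.mean())**2))
S_r = float(np.sum((alpha - (a0 + a1 * T))**2))

# Coefficient of determination
r2 = (S_t - S_r) / S_t

# Correlation coefficient; sign taken from the slope (assumed convention)
r = float(np.sign(a1) * np.sqrt(r2))

print(f"S_t = {S_t:.3f}, S_r = {S_r:.5f}, r^2 = {r2:.5f}, r = {r:.5f}")
```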
How do you know if r is positive or negative ?
What does a particular value of |r| mean?
0.8 to 1.0 - Very strong relationship
0.6 to 0.8 - Strong relationship
0.4 to 0.6 - Moderate relationship
0.2 to 0.4 - Weak relationship
0.0 to 0.2 - Weak or no relationship
Caution in use of $r^2$
• Increase in spread of regressor variable (x) in y vs. x increases $r^2$
• Large regression slope artificially yields high $r^2$
• Large $r^2$ does not measure appropriateness of the linear model
• Large $r^2$ does not imply regression model will predict accurately
[Figure: Final Exam Grade vs Pre-Req GPA]
END
4. Model meets assumption of random errors
Model meets assumption of random errors
• Residuals are negative as well as positive
• Variation of residuals as a function of the independent variable is random
• Residuals follow a normal distribution
• There is no autocorrelation between the data points.
Thermal expansion coefficient vs temperature
T       α
60      6.36
40      6.24
20      6.12
0       6.00
-20     5.86
-40     5.72
-60     5.58
-80     5.43
-100    5.28
-120    5.09
-140    4.91
-160    4.72
-180    4.52
-200    4.30
-220    4.08
-240    3.83
-280    3.33
-300    3.07
-320    2.76
-340    2.45
Data and model
[Figure: α vs T, data points and the fitted straight line]
$\alpha = 6.0248 + 0.0093868\,T$
Plot of Residuals
[Figure: Residual vs T]
Histograms of Residuals
Check for Autocorrelation
• Find the number of times, q, the sign of the residual changes for the n data points.
• If (n-1)/2-√(n-1) ≤ q ≤ (n-1)/2+√(n-1), you most likely do not have an autocorrelation.
$\dfrac{22-1}{2} - \sqrt{22-1} \le q \le \dfrac{22-1}{2} + \sqrt{22-1}$
$5.9174 \le q \le 15.083$
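The sign-change test can be sketched as follows, assuming NumPy. The function names are hypothetical, the residuals are assumed to be ordered by the independent variable, and skipping residuals that are exactly zero is a choice made here, not something the slides specify.

```python
import numpy as np

def count_sign_changes(residuals):
    """Number of times q the residual changes sign along the data."""
    signs = np.sign(np.asarray(residuals, dtype=float))
    signs = signs[signs != 0]  # assumption: exact zeros are skipped
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

def autocorrelation_bounds(n):
    """Limits (n-1)/2 - sqrt(n-1) and (n-1)/2 + sqrt(n-1) on q."""
    half = (n - 1) / 2
    spread = np.sqrt(n - 1)
    return float(half - spread), float(half + spread)

def passes_sign_change_check(residuals):
    """True if q lies inside the limits for n = len(residuals)."""
    low, high = autocorrelation_bounds(len(residuals))
    return bool(low <= count_sign_changes(residuals) <= high)

# Limits for the n = 22 data points used on the slide
low, high = autocorrelation_bounds(22)
print(f"{low:.4f} <= q <= {high:.3f}")
```

A count q below the lower limit suggests long runs of same-signed residuals, which is the typical signature of a model that misses curvature in the data.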
Is there autocorrelation?
$5.9174 \le q \le 15.083$
[Figure: Residual vs T]
y vs x fit and residuals
n=40
(n-1)/2-√(n-1) ≤ q ≤ (n-1)/2+√(n-1)
Is 13.3 ≤ 21 ≤ 25.7? Yes!
y vs x fit and residuals
n=40
(n-1)/2-√(n-1) ≤ q ≤ (n-1)/2+√(n-1)
Is 13.3 ≤ 2 ≤ 25.7? No!
END
What polynomial model to choose if one needs to be chosen?
First Order of Polynomial
[Figure: Polynomial Regression of order 1, y vs x]
$y = a_0 + a_1 x + a_2 x^2 + \ldots + a_m x^m$
Second Order Polynomial
[Figure: Polynomial Regression of order 2, y vs x]
$y = a_0 + a_1 x + a_2 x^2 + \ldots + a_m x^m$
Which model to choose?
[Figure: y vs x]
Optimum Polynomial
[Figure: Optimum Order of Polynomial, $S_r / [n - (m+1)]$ vs order of polynomial, m]
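The quantity $S_r / [n - (m+1)]$ can be tabulated for several polynomial orders. The sketch below assumes NumPy and reuses the six-point thermal expansion example for illustration, so its values will not match the plotted data set. The order is usually taken where this quantity is smallest or stops decreasing appreciably; that reading is an assumption here, since the slide shows only the plot.

```python
import numpy as np

# Six-point example data: T in °F, alpha in μin/in/°F (illustration only)
T = np.array([-340.0, -260.0, -180.0, -100.0, -20.0, 60.0])
alpha = np.array([2.45, 3.58, 4.52, 5.28, 5.86, 6.36])
n = len(T)

# S_r / [n - (m + 1)] for each polynomial order m
criterion = {}
for m in range(1, 5):  # orders 1..4; m must satisfy n - (m + 1) > 0
    coeffs = np.polyfit(T, alpha, m)
    S_r = float(np.sum((alpha - np.polyval(coeffs, T))**2))
    criterion[m] = S_r / (n - (m + 1))

for m, value in criterion.items():
    print(f"m = {m}: S_r/[n-(m+1)] = {value:.6g}")
```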
THE END
Effect of an Outlier
Effect of Outlier
[Figure: data and fitted line]
$y = 2x$, $R^2 = 1$
Effect of Outlier
[Figure: data with an outlier and fitted line]
$y = 3.2727x - 5.0909$, $R^2 = 0.6879$
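The effect can be illustrated with a small experiment, assuming NumPy. The data below are hypothetical: ten points on y = 2x with the last one moved to 50. They are not the data behind the plots, so the fitted coefficients will differ from those shown above.

```python
import numpy as np

def fit_line_r2(x, y):
    """Least-squares straight line; returns (a0, a1, r^2)."""
    a1, a0 = np.polyfit(x, y, 1)
    S_r = np.sum((y - (a0 + a1 * x))**2)
    S_t = np.sum((y - y.mean())**2)
    return float(a0), float(a1), float((S_t - S_r) / S_t)

# Hypothetical data: points that lie exactly on y = 2x
x = np.arange(1.0, 11.0)
y_clean = 2.0 * x

# Same data with one hypothetical outlier at the last point
y_outlier = y_clean.copy()
y_outlier[-1] = 50.0

a0_c, a1_c, r2_clean = fit_line_r2(x, y_clean)
a0_o, a1_o, r2_outlier = fit_line_r2(x, y_outlier)

print(f"clean:   y = {a1_c:.4f}x + {a0_c:.4f}, r^2 = {r2_clean:.4f}")
print(f"outlier: y = {a1_o:.4f}x + {a0_o:.4f}, r^2 = {r2_outlier:.4f}")
```

A single displaced point changes both the slope and the intercept and pulls $r^2$ well below 1, which is the behavior the two plots contrast.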