Chapter 8 Conclusion
Load the data:
teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3040/data/str4.csv")
attach(teachdata)
Variables:
head(teachdata)
sublunch score str avginc hiel
1 2.0408 690.80 17.88991 22.690001 0
2 47.9167 661.20 21.52466 9.824000 0
3 76.3226 643.60 18.69723 8.978000 1
4 77.0492 647.70 17.35714 8.978000 0
5 78.4270 640.85 18.67133 9.080333 1
6 86.9565 605.55 21.40625 10.415000 1
hiel = 1 if 10% or more of the class is learning English; = 0 otherwise
In a previous lecture, it was argued that avginc might have a non-linear
relationship with score:
plot(avginc, score, xlim = c(5,60), ylim = c(600,710))
[Figure: scatter plot of score (600–710) against avginc (5–60)]
Two ways to deal with this non-linear effect are (i) a polynomial
regression model, and (ii) taking logs.
(i) Polynomials. Create the new variables:
avginc2 = avginc^2
avginc3 = avginc^3
eqcubic = lm(score ~ avginc + avginc2 + avginc3)
summary(eqcubic)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.001e+02 5.830e+00 102.937 < 2e-16 ***
avginc 5.019e+00 8.595e-01 5.839 1.06e-08 ***
avginc2 -9.581e-02 3.736e-02 -2.564 0.0107 *
avginc3 6.855e-04 4.720e-04 1.452 0.1471
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.71 on 416 degrees of freedom
Multiple R-squared: 0.5584, Adjusted R-squared: 0.5552
F-statistic: 175.4 on 3 and 416 DF, p-value: < 2.2e-16
Let’s plot the cubic regression function:
par(new = TRUE)
curve(600.1 + 5.019*x - 0.09581*x^2 + 0.0006855*x^3, xlim =
c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 2)
[Figure: scatter plot of score against avginc, with the fitted cubic regression function overlaid in red]
(ii) Logarithms:
eqlog = lm(score ~ log(avginc))
summary(eqlog)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 557.832 4.200 132.81 <2e-16 ***
log(avginc) 36.420 1.571 23.18 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.62 on 418 degrees of freedom
Multiple R-squared: 0.5625, Adjusted R-squared: 0.5615
F-statistic: 537.4 on 1 and 418 DF, p-value: < 2.2e-16
This is the “lin-log” model. Interpretation: a 1% increase in avginc is
associated with a 0.364-point increase in score (36.420 ÷ 100).
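As a quick arithmetic check of this interpretation (a sketch; 36.420 is the estimated coefficient above):

```r
# Lin-log model: score = b0 + b1*log(avginc). A 1% increase multiplies
# avginc by 1.01, so the predicted change in score is b1*log(1.01) ≈ b1/100.
b1 = 36.420
b1 * log(1.01)  # approximately 0.362
```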
Add this regression to the plot:
par(new = TRUE)
curve(557.832 + 36.42*log(x), xlim = c(5,60), ylim = c(600,710),
ylab = "", xlab = "", col = 3)
legend("bottomright", c("Cubic", "Lin-Log"), pch ="__", col=c(2,3))
[Figure: scatter plot of score against avginc, with the fitted cubic (red) and lin-log (green) regression functions and a legend]
Do you like the cubic or lin-log model better? What are the
advantages/disadvantages?
We will proceed by using log(avginc) – this is a simpler model (fewer estimated
βs) and fits better (higher R̄²). But first, to revise omitted variable bias, let’s see
what happens if we leave log(avginc) out of the regression.
eq1 = lm(score ~ str + hiel + sublunch)
summary(eq1)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 701.3732 4.6496 150.846 < 2e-16 ***
str -1.0312 0.2379 -4.334 1.84e-05 ***
hiel -3.8601 1.0470 -3.687 0.000257 ***
sublunch -0.5636 0.0192 -29.355 < 2e-16 ***
Now add log(avginc):
eq2 = lm(score ~ str + hiel + sublunch + log(avginc))
summary(eq2)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 659.35563 7.62805 86.438 < 2e-16 ***
str -0.77244 0.22934 -3.368 0.000828 ***
hiel -5.79113 1.03517 -5.594 4.03e-08 ***
sublunch -0.41701 0.02835 -14.708 < 2e-16 ***
log(avginc) 11.81992 1.74920 6.757 4.77e-11 ***
How have the results changed? What is going on here?
O.V.B. occurs if:
(i) avginc is associated with score (it is! Note the small p-value)
(ii) avginc is correlated with the included regressor str
Regressor      (1)        (2)
str            -1.03**    -0.77**
               (0.24)     (0.23)
hiel           -3.86**    -5.79**
               (1.05)     (1.04)
sublunch       -0.56**    -0.42**
               (0.02)     (0.03)
log(avginc)               11.81**
                          (1.75)
Intercept      701.37**   659.36**
               (4.65)     (7.63)
R̄²            0.7726     0.7946
** significant at 1% level, * significant at 5% level
Does str have a different effect for classes with a high % of English learners?
We can use an interaction term to allow, and test, for this possibility.
Create the interaction term:
hielstr = hiel*str
Add this to the model:
eq3 = lm(score ~ str + hiel + hielstr + sublunch + log(avginc))
summary(eq3)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 653.66612 8.89113 73.519 < 2e-16 ***
str -0.53103 0.30039 -1.768 0.0778 .
hiel 5.49821 9.13897 0.602 0.5478
hielstr -0.57767 0.46463 -1.243 0.2145
sublunch -0.41138 0.02869 -14.337 < 2e-16 ***
log(avginc) 12.12447 1.76513 6.869 2.38e-11 ***
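As an aside, the interaction can also be built directly in R’s formula interface, without creating hielstr by hand. A sketch on simulated data (the data below are illustrative, not the teachdata set):

```r
# a:b in an R formula adds the product of a and b as a regressor,
# so str:hiel is equivalent to the hand-made hielstr variable.
set.seed(1)
d = data.frame(str = rnorm(50, mean = 20), hiel = rbinom(50, 1, 0.5))
d$score = 650 - d$str + rnorm(50)
m1 = lm(score ~ str + hiel + str:hiel, data = d)
m2 = lm(score ~ str + hiel + I(str * hiel), data = d)
# the two fits have identical coefficient estimates
```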
Which coefficient should we be testing to see if str has a different effect for
classes with many English learners? What do we conclude?
H0: str has the same effect on score for classes with a high and low % of English
learners
HA: str has a different effect on score for classes with a high and low % of
English learners
Or…
H0: β_hiel×str = 0
HA: β_hiel×str ≠ 0
t = -1.243, p-val = 0.2145
We fail to reject H0.
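The reported p-value can be reproduced from the t-statistic (a sketch; model (3) has n − k − 1 = 420 − 5 − 1 = 414 degrees of freedom):

```r
# Two-sided p-value for t = -1.243 with 414 degrees of freedom
2 * pt(-abs(-1.243), df = 414)  # approximately 0.2145
```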
How would you test the hypothesis that str has no effect on score, using model
(3)?
Regressor      (1)        (2)        (3)
str            -1.03**    -0.77**    -0.53
               (0.24)     (0.23)     (0.30)
hiel           -3.86**    -5.79**    5.50
               (1.05)     (1.04)     (9.1)
hiel×str                             -0.58
                                     (0.47)
sublunch       -0.56**    -0.42**    -0.411**
               (0.02)     (0.03)     (0.029)
log(avginc)               11.81**    12.12**
                          (1.75)     (1.8)
Intercept      701.37**   659.36**   653.7**
               (4.65)     (7.63)     (8.9)
R̄²            0.7726     0.7946     0.7949
** significant at 1% level, * significant at 5% level
The relationship between str and score may be non-linear. We will use a
polynomial regression model to allow, and test, for this. I leave out the
interaction term hiel×str (for now) since it was previously found to be
insignificant.
str2 = str^2
str3 = str^3
eq4 = lm(score ~ str + str2 + str3 + hiel + sublunch + log(avginc))
summary(eq4)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 252.05089 165.82433 1.520 0.12928
str 64.33886 25.46223 2.527 0.01188 *
str2 -3.42388 1.29374 -2.646 0.00844 **
str3 0.05929 0.02174 2.728 0.00665 **
hiel -5.47399 1.03187 -5.305 1.84e-07 ***
sublunch -0.42006 0.02814 -14.928 < 2e-16 ***
log(avginc) 11.74818 1.73446 6.773 4.34e-11 ***
Regressor      (1)        (2)        (3)        (4)
str            -1.03**    -0.77**    -0.53      64.33**
               (0.24)     (0.23)     (0.30)     (25.5)
str2                                            -3.42**
                                                (1.29)
str3                                            0.06**
                                                (0.02)
hiel           -3.86**    -5.79**    5.50       -5.47**
               (1.05)     (1.04)     (9.10)     (1.03)
hiel×str                             -0.58
                                     (0.47)
sublunch       -0.56**    -0.42**    -0.41**    -0.42**
               (0.02)     (0.03)     (0.03)     (0.03)
log(avginc)               11.81**    12.12**    11.75**
                          (1.75)     (1.77)     (1.73)
Intercept      701.37**   659.36**   653.66**   252.05
               (4.65)     (7.63)     (8.89)     (165.82)
R̄²            0.7726     0.7946     0.7949     0.7982
R²             0.7742     0.7966     0.7974     0.8011
** significant at 1% level, * significant at 5% level
Test if the relationship between str and score is linear or non-linear:
H0: β_str2 = 0 and β_str3 = 0
HA: β_str2 ≠ 0 and/or β_str3 ≠ 0
We must use an F-test since we have multiple restrictions!
Formula for F-statistic:
F = ((R²_U − R²_R) / q) / ((1 − R²_U) / (n − k_U − 1))
What is the unrestricted model?
What is the restricted model?
What is q?
What is n?
What is k_U?
We need R², not R̄²!
F = ((0.8011 − 0.7966) / 2) / ((1 − 0.8011) / (420 − 6 − 1)) = 4.67
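The same calculation can be done in R, along with the exact 5% critical value (a sketch using the R² values from the table):

```r
# F-test of H0: beta_str2 = 0 and beta_str3 = 0 (q = 2 restrictions), using
# R-squared from the unrestricted model (4) and the restricted model (2)
Fstat = ((0.8011 - 0.7966) / 2) / ((1 - 0.8011) / (420 - 6 - 1))
Fstat                          # approximately 4.67
qf(0.95, df1 = 2, df2 = 413)   # exact 5% critical value, close to 3.00
```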
The 5% critical values for an F-test with large n are:
q 5% crit. value
1 3.84
2 3.00
3 2.60
4 2.37
5 2.21
We reject the null at the 5% level.
Use R to calculate the F-stat and p-val:
anova(eq4, eq2)
Model 1: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)
Model 2: score ~ str + hiel + sublunch + log(avginc)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 413 30257
2 415 30939 -2 -682.05 4.6549 0.01002 *
So far, model (4) seems the most appropriate.
Let’s reconsider if the effect of str on score is different for classes with a high
percentage of English learners. Before, we found no difference (using model 3),
but this might change now that we have model 4.
Again, the strategy is:
• have the dummy variable hiel interact with all terms involving str
• this allows for the “marginal effect” to differ between the two groups
• testing to see if the estimated coefficients on the interaction terms are
jointly equal to zero is equivalent to testing that there is no difference
between the two groups
Create the new interaction terms:
hielstr2 = hiel*str2
hielstr3 = hiel*str3
Add the interaction terms to model (4):
eq5 = lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 +
hielstr3 + sublunch + log(avginc))
summary(eq5)
Regressor      (1)        (2)        (3)        (4)        (5)
str            -1.03**    -0.77**    -0.53      64.33**    83.70**
               (0.24)     (0.23)     (0.30)     (25.46)    (29.69)
str2                                            -3.42**    -4.38**
                                                (1.29)     (1.51)
str3                                            0.06**     0.08**
                                                (0.02)     (0.03)
hiel           -3.86**    -5.79**    5.50       -5.47**    816.08
               (1.05)     (1.04)     (9.10)     (1.03)     (434.61)
hiel×str                             -0.58                 -123.28
                                     (0.47)                (66.35)
hiel×str2                                                  6.12
                                                           (3.35)
hiel×str3                                                  -0.10
                                                           (0.06)
sublunch       -0.56**    -0.42**    -0.41**    -0.42**    -0.42**
               (0.02)     (0.03)     (0.03)     (0.03)     (0.03)
log(avginc)               11.81**    12.12**    11.75**    11.80**
                          (1.75)     (1.77)     (1.73)     (1.75)
Intercept      701.37**   659.36**   653.66**   252.05     122.35
               (4.65)     (7.63)     (8.89)     (165.82)   (192.18)
R̄²            0.7726     0.7946     0.7949     0.7982     0.7988
R²             0.7742     0.7966     0.7974     0.8011     0.8031
** significant at 1% level, * significant at 5% level
Test to see if str has a different effect for classes with many English learners.
We must test all terms involving hiel and str together.
H0: β_hiel×str = 0 and β_hiel×str2 = 0 and β_hiel×str3 = 0
HA: not H0
Or…
H0: model (4)
HA: model (5)
F = ((0.8031 − 0.8011) / 3) / ((1 − 0.8031) / (420 − 9 − 1)) = 1.38
Compared to the 2.60 critical value, we fail to reject the null. Model (4) is
preferred.
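The corresponding calculation in R (a sketch; q = 3 interaction restrictions, and model (5) has k_U = 9 regressors):

```r
# F-test of the three interaction coefficients: model (5) is unrestricted,
# model (4) is restricted
Fstat = ((0.8031 - 0.8011) / 3) / ((1 - 0.8031) / (420 - 9 - 1))
Fstat  # approximately 1.4, well below the 2.60 critical value
```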
How would you test if str matters, using model (4)?
Summary
There doesn’t appear to be a substantial difference in the effect of str on score
for classes with many English learners.
A hypothesis test involving model (4) indicates the relationship between str and
score is non-linear.
Using F-tests, the null hypothesis that str has no effect on score is rejected in all
models.
Interpreting the model
Note: using the values from the table will induce too much rounding error.
If str = 20, then reducing str to 19 would improve score by a predicted:
ŝcore|str=19 − ŝcore|str=20
= [64.33886 × (19) − 3.42388 × (19)² + 0.05929 × (19)³]
− [64.33886 × (20) − 3.42388 × (20)² + 0.05929 × (20)³]
= 1.54
If str = 15, the same calculation gives ŝcore|str=14 − ŝcore|str=15 = −2.46: the
fitted cubic is actually increasing in str below roughly str = 16, so reducing str
there is predicted to lower score. Likewise, going from str = 20 to str = 10
changes the predicted score by −31.26; str = 10 is well below most of the
observed str values, so this is extrapolation and should not be taken too
seriously.
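These predicted changes can be checked in R using the model (4) coefficients (a sketch; only the str terms matter, since the other regressors cancel in the difference):

```r
# Contribution of the str polynomial to the predicted score in model (4)
cubic = function(s) 64.33886*s - 3.42388*s^2 + 0.05929*s^3
cubic(19) - cubic(20)  # predicted gain from reducing str from 20 to 19, about 1.54
```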
How much does it cost to double the number of teachers? Note that for every
1% increase in avginc, score is predicted to increase by 0.1175 (= 11.74818 ÷ 100).