Chapter 8 Conclusion - University of Manitobahome.cc.umanitoba.ca/~godwinrt/3040/overheads/test...


Chapter 8 Conclusion

Load the data:

teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3040/data/str4.csv")
attach(teachdata)

Variables:

head(teachdata)

  sublunch  score      str    avginc hiel
1   2.0408 690.80 17.88991 22.690001    0
2  47.9167 661.20 21.52466  9.824000    0
3  76.3226 643.60 18.69723  8.978000    1
4  77.0492 647.70 17.35714  8.978000    0
5  78.4270 640.85 18.67133  9.080333    1
6  86.9565 605.55 21.40625 10.415000    1

hiel = 1 if 10% or more of the class is learning English; = 0 otherwise


In a previous lecture, it was argued that avginc might have a non-linear

relationship with score:

plot(avginc, score, xlim = c(5,60), ylim = c(600,710))

[Figure: scatter plot of score (600 to 710) against avginc (5 to 60)]

Two ways to deal with this non-linear effect are by (i) using a polynomial regression model, and (ii) taking logs.

(i) Polynomials. Create the new variables:

avginc2 = avginc^2
avginc3 = avginc^3

eqcubic = lm(score ~ avginc + avginc2 + avginc3)

summary(eqcubic)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 6.001e+02 5.830e+00 102.937 < 2e-16 ***

avginc 5.019e+00 8.595e-01 5.839 1.06e-08 ***

avginc2 -9.581e-02 3.736e-02 -2.564 0.0107 *

avginc3 6.855e-04 4.720e-04 1.452 0.1471

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.71 on 416 degrees of freedom

Multiple R-squared: 0.5584, Adjusted R-squared: 0.5552

F-statistic: 175.4 on 3 and 416 DF, p-value: < 2.2e-16
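As an aside, the new columns aren't strictly necessary: wrapping the powers in I() inside the lm() formula gives an identical fit. A minimal sketch, using simulated data so it stands alone (eqA, eqB, and the simulated variables are hypothetical, not the class data):

```r
# I() inside the formula fits the same cubic as manually created columns
set.seed(1)
avginc = runif(100, 5, 60)
score = 600 + 5*avginc - 0.1*avginc^2 + rnorm(100)
avginc2 = avginc^2
avginc3 = avginc^3
eqA = lm(score ~ avginc + avginc2 + avginc3)           # the approach above
eqB = lm(score ~ avginc + I(avginc^2) + I(avginc^3))   # no new columns needed
all.equal(unname(coef(eqA)), unname(coef(eqB)))        # TRUE
```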


Let’s plot the cubic regression function:

par(new = TRUE)

curve(600.1 + 5.019*x - 0.09581*x^2 + 0.0006855*x^3, xlim = c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 2)

[Figure: the scatter plot of score against avginc with the fitted cubic regression function overlaid in red]

(ii) Logarithms:

eqlog = lm(score ~ log(avginc))

summary(eqlog)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 557.832 4.200 132.81 <2e-16 ***

log(avginc) 36.420 1.571 23.18 <2e-16 ***

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.62 on 418 degrees of freedom

Multiple R-squared: 0.5625, Adjusted R-squared: 0.5615

F-statistic: 537.4 on 1 and 418 DF, p-value: < 2.2e-16

This is the “lin-log” model. Interpretation: a 1% increase in avginc is associated with a 0.364-point increase in score (36.420/100 ≈ 0.364).
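The 0.364 figure is the standard lin-log approximation: for a 1% change in avginc, the predicted change in score is roughly the coefficient divided by 100. A quick check of the arithmetic:

```r
b = 36.420      # estimated coefficient on log(avginc)
b * log(1.01)   # exact predicted change for a 1% increase: about 0.362
b / 100         # the approximation used above: 0.364
```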

Add this regression to the plot:


par(new = TRUE)

curve(557.832 + 36.42*log(x), xlim = c(5,60), ylim = c(600,710), ylab = "", xlab = "", col = 3)

legend("bottomright", c("Cubic", "Lin-Log"), pch = "__", col = c(2,3))

[Figure: scatter plot of score against avginc with both fitted curves overlaid; legend: Cubic (red), Lin-Log (green)]

Do you like the cubic or lin-log model better? What are the

advantages/disadvantages?

We will proceed by using log(avginc): this is a simpler model (fewer estimated βs) and fits better (higher R̄²). But first, to revise omitted variable bias, let’s see what happens if we leave log(avginc) out of the regression.

eq1 = lm(score ~ str + hiel + sublunch)

summary(eq1)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 701.3732 4.6496 150.846 < 2e-16 ***

str -1.0312 0.2379 -4.334 1.84e-05 ***

hiel -3.8601 1.0470 -3.687 0.000257 ***

sublunch -0.5636 0.0192 -29.355 < 2e-16 ***


Now add log(avginc):

eq2 = lm(score ~ str + hiel + sublunch + log(avginc))

summary(eq2)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 659.35563 7.62805 86.438 < 2e-16 ***

str -0.77244 0.22934 -3.368 0.000828 ***

hiel -5.79113 1.03517 -5.594 4.03e-08 ***

sublunch -0.41701 0.02835 -14.708 < 2e-16 ***

log(avginc) 11.81992 1.74920 6.757 4.77e-11 ***

How have the results changed? What is going on here?

O.V.B. occurs if:

(i) avginc is associated with score (It is! Note the small p-value)

(ii) avginc is associated with str
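Both conditions can be illustrated with a small simulation (hypothetical variables x1 and x2, not the class data): when the omitted variable satisfies (i) and (ii), the coefficient on the included regressor is biased.

```r
set.seed(2)
x1 = rnorm(500)
x2 = 0.7*x1 + rnorm(500)            # (ii) the omitted variable is associated with x1
y  = 1 + 2*x1 + 3*x2 + rnorm(500)   # (i) and it is associated with y (true beta1 = 2)
coef(lm(y ~ x1))["x1"]              # omitting x2: biased upward, near 2 + 0.7*3
coef(lm(y ~ x1 + x2))["x1"]         # including x2: close to the true value 2
```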

Regressor     (1)        (2)
str           -1.03**    -0.77**
              (0.24)     (0.23)
hiel          -3.86**    -5.79**
              (1.05)     (1.04)
sublunch      -0.56**    -0.42**
              (0.02)     (0.03)
log(avginc)              11.81**
                         (1.75)
Intercept     701.37**   659.36**
              (4.65)     (7.63)
R̄²            0.7726     0.7946

** significant at 1% level, * significant at 5% level


Does str have a different effect for classes with a high % of English learners?

We can use an interaction term to allow, and test, for this possibility.

Create the interaction term:

hielstr = hiel*str

Add this to the model:

eq3 = lm(score ~ str + hiel + hielstr + sublunch + log(avginc))

summary(eq3)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 653.66612 8.89113 73.519 < 2e-16 ***

str -0.53103 0.30039 -1.768 0.0778 .

hiel 5.49821 9.13897 0.602 0.5478

hielstr -0.57767 0.46463 -1.243 0.2145

sublunch -0.41138 0.02869 -14.337 < 2e-16 ***

log(avginc) 12.12447 1.76513 6.869 2.38e-11 ***
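A side note on R style: the interaction can also be written directly in the formula, so a separate hielstr column isn't strictly needed (a sketch, assuming the data are still attached; eq3b is a hypothetical name, and hiel:str adds only the interaction term, while hiel*str would also add both main effects):

```r
eq3b = lm(score ~ str + hiel + hiel:str + sublunch + log(avginc))
summary(eq3b)   # identical fit to eq3
```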


Which coefficient should we be testing to see if str has a different effect for

classes with many English learners? What do we conclude?

H0: str has the same effect on score for classes with a high and low % of English learners

HA: str has a different effect on score for classes with a high and low % of English learners

Or…

H0: β_hiel×str = 0
HA: β_hiel×str ≠ 0

t = -1.243, p-val = 0.2145

We fail to reject H0.

How would you test the hypothesis that str has no effect on score, using model

(3)?
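One way to carry out that joint test (H0: β_str = 0 and β_hiel×str = 0), assuming the data are still attached as above: estimate a restricted model that drops both terms involving str, then compare with anova() (eq3r is a hypothetical name):

```r
eq3r = lm(score ~ hiel + sublunch + log(avginc))  # restricted: no str terms
anova(eq3r, eq3)                                  # F-test with q = 2 restrictions
```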

Regressor     (1)        (2)        (3)
str           -1.03**    -0.77**    -0.53
              (0.24)     (0.23)     (0.30)
hiel          -3.86**    -5.79**    5.50
              (1.05)     (1.04)     (9.10)
hiel×str                            -0.58
                                    (0.47)
sublunch      -0.56**    -0.42**    -0.41**
              (0.02)     (0.03)     (0.03)
log(avginc)              11.81**    12.12**
                         (1.75)     (1.77)
Intercept     701.37**   659.36**   653.66**
              (4.65)     (7.63)     (8.89)
R̄²            0.7726     0.7946     0.7949

** significant at 1% level, * significant at 5% level


The relationship between str and score may be non-linear. We will use a

polynomial regression model to allow, and test, for this. I leave out the

interaction term hiel×str (for now) since it was previously found to be

insignificant.

str2 = str^2

str3 = str^3

eq4 = lm(score ~ str + str2 + str3 + hiel + sublunch + log(avginc))

summary(eq4)

Estimate Std. Error t value Pr(>|t|)

(Intercept) 252.05089 165.82433 1.520 0.12928

str 64.33886 25.46223 2.527 0.01188 *

str2 -3.42388 1.29374 -2.646 0.00844 **

str3 0.05929 0.02174 2.728 0.00665 **

hiel -5.47399 1.03187 -5.305 1.84e-07 ***

sublunch -0.42006 0.02814 -14.928 < 2e-16 ***

log(avginc) 11.74818 1.73446 6.773 4.34e-11 ***

Regressor     (1)        (2)        (3)        (4)
str           -1.03**    -0.77**    -0.53      64.33**
              (0.24)     (0.23)     (0.30)     (25.46)
str2                                           -3.42**
                                               (1.29)
str3                                           0.06**
                                               (0.02)
hiel          -3.86**    -5.79**    5.50       -5.47**
              (1.05)     (1.04)     (9.10)     (1.03)
hiel×str                            -0.58
                                    (0.47)
sublunch      -0.56**    -0.42**    -0.41**    -0.42**
              (0.02)     (0.03)     (0.03)     (0.03)
log(avginc)              11.81**    12.12**    11.75**
                         (1.75)     (1.77)     (1.73)
Intercept     701.37**   659.36**   653.66**   252.05
              (4.65)     (7.63)     (8.89)     (165.82)
R̄²            0.7726     0.7946     0.7949     0.7982
R²            0.7742     0.7966     0.7974     0.8011

** significant at 1% level, * significant at 5% level


Test if the relationship between str and score is linear or non-linear:

H0: β_str2 = 0 and β_str3 = 0

HA: β_str2 ≠ 0 and/or β_str3 ≠ 0

We must use an F-test since we have multiple restrictions!

Formula for the F-statistic:

F = [(R²_U − R²_R) / q] / [(1 − R²_U) / (n − k_U − 1)]

What is the unrestricted model?

What is the restricted model?

What is q?

What is n?

What is k_U?

We need R², not R̄²!

F = [(0.8011 − 0.7966) / 2] / [(1 − 0.8011) / (420 − 6 − 1)] = 4.67

The 5% critical values for an F-test with large n are:

q 5% crit. value

1 3.84

2 3.00

3 2.60

4 2.37

5 2.21

We reject the null at the 5% level.
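The calculation above is easy to reproduce in R, which also gives an exact p-value (a sketch using the R² values reported in the table):

```r
R2u = 0.8011; R2r = 0.7966    # unrestricted (4) and restricted (2) R-squared
q = 2; n = 420; k = 6         # restrictions, observations, unrestricted regressors
Fstat = ((R2u - R2r)/q) / ((1 - R2u)/(n - k - 1))
Fstat                         # about 4.67
1 - pf(Fstat, q, n - k - 1)   # p-value, near 0.01
```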


Use R to calculate the F-stat and p-val:

anova(eq4, eq2)

Model 1: score ~ str + str2 + str3 + hiel + sublunch + log(avginc)

Model 2: score ~ str + hiel + sublunch + log(avginc)

Res.Df RSS Df Sum of Sq F Pr(>F)

1 413 30257

2 415 30939 -2 -682.05 4.6549 0.01002 *

So far, model (4) seems the most appropriate.


Let’s reconsider if the effect of str on score is different for classes with a high

percentage of English learners. Before, we found no difference (using model 3),

but this might change now that we have model 4.

Again, the strategy is:

• have the dummy variable hiel interact with all terms involving str

• this allows for the “marginal effect” to differ between the two groups

• testing whether the estimated coefficients on the interaction terms are jointly equal to zero is equivalent to testing that there is no difference between the two groups


Create the new interaction terms:

hielstr2 = hiel*str2

hielstr3 = hiel*str3

Add the interaction terms to model (4):

eq5 = lm(score ~ str + str2 + str3 + hiel + hielstr + hielstr2 + hielstr3 + sublunch + log(avginc))

summary(eq5)

Regressor     (1)        (2)        (3)        (4)        (5)
str           -1.03**    -0.77**    -0.53      64.33**    83.70**
              (0.24)     (0.23)     (0.30)     (25.46)    (29.69)
str2                                           -3.42**    -4.38**
                                               (1.29)     (1.51)
str3                                           0.06**     0.08**
                                               (0.02)     (0.03)
hiel          -3.86**    -5.79**    5.50       -5.47**    816.08
              (1.05)     (1.04)     (9.10)     (1.03)     (434.61)
hiel×str                            -0.58                 -123.28
                                    (0.47)                (66.35)
hiel×str2                                                 6.12
                                                          (3.35)
hiel×str3                                                 -0.10
                                                          (0.06)
sublunch      -0.56**    -0.42**    -0.41**    -0.42**    -0.42**
              (0.02)     (0.03)     (0.03)     (0.03)     (0.03)
log(avginc)              11.81**    12.12**    11.75**    11.80**
                         (1.75)     (1.77)     (1.73)     (1.75)
Intercept     701.37**   659.36**   653.66**   252.05     122.35
              (4.65)     (7.63)     (8.89)     (165.82)   (192.18)
R̄²            0.7726     0.7946     0.7949     0.7982     0.7988
R²            0.7742     0.7966     0.7974     0.8011     0.8031

** significant at 1% level, * significant at 5% level

Test to see if str has a different effect for classes with many English learners.

We must test all terms involving hiel and str together.

H0: β_hiel×str = 0 and β_hiel×str2 = 0 and β_hiel×str3 = 0

HA: not H0

Or…

H0: model (4)
HA: model (5)

F = [(0.8031 − 0.8011) / 3] / [(1 − 0.8031) / (420 − 9 − 1)] = 1.38

Compared to the 2.60 critical value, we fail to reject the null. Model (4) is

preferred.
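As with the earlier test, anova() reproduces this F-statistic exactly (a sketch, assuming eq4 and eq5 were estimated as above):

```r
anova(eq4, eq5)   # F-test of the three hiel interaction terms; F is about 1.4
```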

How would you test if str matters, using model (4)?


Summary

There doesn’t appear to be a substantial difference in the effect of str on score

for classes with many English learners.

A hypothesis test involving model (4) indicates the relationship between str and

score is non-linear.

Using F-tests, the null hypothesis that str has no effect on score is rejected in all

models.


Interpreting the model

Note: use the full-precision coefficients from the regression output; the rounded values in the table induce too much rounding error.

If str = 20, then reducing str to 19 would improve score by a predicted:

score_hat|str=19 − score_hat|str=20

= [64.33886 × (19) − 3.42388 × (19)² + 0.05929 × (19)³]

− [64.33886 × (20) − 3.42388 × (20)² + 0.05929 × (20)³]

= 1.54
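These predicted differences involve only the str terms of model (4), since the other regressors cancel when differencing. A short sketch reproducing the calculation (g is a hypothetical helper name):

```r
# str-polynomial part of model (4), using full-precision coefficients
g = function(s) 64.33886*s - 3.42388*s^2 + 0.05929*s^3

g(19) - g(20)   # about 1.54: predicted improvement from reducing str from 20 to 19
```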

If str = 15, however, the estimated cubic implies that reducing str to 14 would lower score by a predicted 2.46, and going from str = 20 to str = 10 would lower it by about 31.3: the fitted polynomial turns downward for small str, where there are few observations, a caution about extrapolating polynomial models.

How much does it cost to double the number of teachers? Note that for every 1% increase in avginc, score is predicted to increase by 0.1175 (= 11.748/100).