STAT 540: Data Analysis and Regression (riczw/teach/STAT540_F15/...)

An Overview

Inferences about β1.

1 Sampling distribution of β̂1.

2 Sampling distribution of (β̂1 − β1)/s{β̂1}.

3 Confidence interval for β1.

4 Hypothesis testing.

Inferences about β0.

Estimation and prediction (with respect to some x0).

ANOVA approach.

Coefficient of determination: R².

W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 63

Page 2: STAT 540: Data Analysis and Regressionriczw/teach/STAT540_F15/...Proof: We have already shown that the estimators are linear since We have already shown that b 1 is unbiased, i.e.,

Inferences about β1

Recall the simple linear regression model

Yi = β0 + β1Xi + εi, εi ∼ iid N(0, σ²),

for i = 1, . . . , n.

Recall that the LS (and ML) estimate of β1 is

β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)².

As we will show, β̂1 is normal with

E(β̂1) = β1 and Var(β̂1) = σ² / Σ_{i=1}^n (Xi − X̄)².
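As a quick numerical illustration (not part of the slides), the estimate can be computed directly from its definition; the x and y values below are made up for the sketch.

```python
# Least-squares slope and intercept for simple linear regression,
# computed from the definitions on this slide. Toy data (hypothetical).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)                       # sum of (Xi - Xbar)^2
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # cross-products

b1 = sxy / sxx          # beta1-hat
b0 = ybar - b1 * xbar   # beta0-hat
```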

Preliminary Results Concerning β1

Note that

β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²

= Σ_{i=1}^n (Xi − X̄)Yi / Σ_{i=1}^n (Xi − X̄)²

= Σ_{i=1}^n [ (Xi − X̄) / Σ_{j=1}^n (Xj − X̄)² ] Yi.

(The second equality uses Σ_{i=1}^n (Xi − X̄)Ȳ = 0.)

Express β̂1 as a linear combination of {Yi}:

β̂1 = Σ_{i=1}^n ki Yi, where ki = (Xi − X̄) / Σ_{j=1}^n (Xj − X̄)².

Preliminary Results Concerning β1

Now we compute Σ_{i=1}^n ki, Σ_{i=1}^n kiXi, and Σ_{i=1}^n ki²:

Σ_{i=1}^n ki = 0, Σ_{i=1}^n kiXi = 1, and Σ_{i=1}^n ki² = 1 / Σ_{i=1}^n (Xi − X̄)².
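The three sums can be checked numerically; a minimal sketch, with hypothetical x values:

```python
# Numerical check of the three identities for k_i = (Xi - Xbar) / sum_j (Xj - Xbar)^2:
#   sum k_i = 0,  sum k_i X_i = 1,  sum k_i^2 = 1 / sum_i (Xi - Xbar)^2.
x = [0.5, 1.0, 2.0, 3.5, 4.0]   # toy values (hypothetical)
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
k = [(xi - xbar) / sxx for xi in x]

sum_k = sum(k)
sum_kx = sum(ki * xi for ki, xi in zip(k, x))
sum_k2 = sum(ki ** 2 for ki in k)
```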

Sampling Distribution of β1

β̂1 follows a normal distribution. Why? (It is a linear combination of the independent normal Yi's.)

Mean of β̂1: E(β̂1) = β1.

Variance of β̂1: Var(β̂1) = σ² / Σ_{i=1}^n (Xi − X̄)².

Completing Proof of Gauss-Markov Theorem

Theorem: Under the simple linear regression model with errors of mean zero and constant variance (but not necessarily normal), β̂0 and β̂1 are BLUE (Best Linear Unbiased Estimators): they have minimum variance among all linear unbiased estimators.

Proof:

We have already shown that the estimators are linear, since β̂1 = Σ_{i=1}^n ki Yi.

We have already shown that β̂1 is unbiased, i.e., E(β̂1) = β1.

It remains to show minimum variance among all linear unbiased estimators.

Var(β̂1) is minimal among unbiased linear estimators

Let an arbitrary linear unbiased estimator be of the form β̃1 = Σ_{i=1}^n ci Yi, where the constants ci satisfy Σ_{i=1}^n ci = 0 and Σ_{i=1}^n ciXi = 1 (required for unbiasedness).

Note that Var{β̃1} = σ² Σ_{i=1}^n ci².

Var(β̂1) is minimal among unbiased linear estimators

Also note that Σ_{i=1}^n ki di = 0, where di = ci − ki. Why? Show it below.

Thus Var{β̃1} = σ² Σ_{i=1}^n ki² + σ² Σ_{i=1}^n di² ≥ Var(β̂1), with equality iff di = 0 for all i.

Estimate of Var(β̂1)

Recall that

Var{β̂1} = σ² / Σ_{i=1}^n (Xi − X̄)².

The estimated variance is

s²{β̂1} = [Σ_{i=1}^n (Yi − Ŷi)² / (n − 2)] / Σ_{i=1}^n (Xi − X̄)² = MSE / Σ_{i=1}^n (Xi − X̄)².

We will show

(β̂1 − β1) / s{β̂1} ∼ t_{n−2}.

Sampling Distribution of (β̂1 − β1)/s{β̂1}

Definition: If Zi ∼ independent N(µi, σi²) for i = 1, . . . , k, then

Σ_{i=1}^k ((Zi − µi)/σi)² ∼ χ²_k (chi-square distribution with k degrees of freedom).

Definition (KNNL A.44): A t random variable with ν degrees of freedom results from the expression

t_ν = z / √(q_ν/ν),

where z and q_ν are independent standard normal and χ²_ν random variables, respectively.

Sampling Distribution of (β̂1 − β1)/s{β̂1}

Note that

(β̂1 − β1) / √(Var{β̂1}) = (β̂1 − β1) / √(σ² / Σ_{i=1}^n (Xi − X̄)²) ∼ N(0, 1).

Also

s²{β̂1} / Var{β̂1} = [MSE / Σ_{i=1}^n (Xi − X̄)²] / [σ² / Σ_{i=1}^n (Xi − X̄)²] = MSE / σ² = SSE / [(n − 2)σ²] ∼ χ²_{n−2} / (n − 2).

Thus,

(β̂1 − β1) / s{β̂1} = [(β̂1 − β1) / √(Var{β̂1})] / √(s²{β̂1} / Var{β̂1}) ∼ t_{n−2}.

Sampling Distribution of (β̂1 − β1)/s{β̂1}

The last conclusion on the previous slide only holds if SSE/σ² is independent of β̂0 and β̂1.

This is given in Theorem 2.11 of KNNL for the simple linear regression model and is proven in general in STAT 640.

Confidence Interval for β1

Denote s{β̂1} = √(s²{β̂1}). Recall that

(β̂1 − β1) / s{β̂1} ∼ t_{n−2}.

The (1 − α) CI for β1 is

β̂1 ± t(1 − α/2; n − 2) s{β̂1},

where t(1 − α/2; n − 2) is the 100(1 − α/2)th percentile of t_{n−2}.

See KNNL pp. 1317–1318, Table B.2. (Or use qt(0.975, df) in R.)

Interpretation of the CI. For example, a 95% CI for β1 is (–, –).

- If we repeated the study 100 times and created 100 CIs for β1, we would expect about 95 of these intervals to include the true value of β1.

- The method used to construct this interval has a 5% error rate.

Confidence Interval for β1

In the advertising example (n = 7), recall that

Σ_{i=1}^n Xi = 24.40, Σ_{i=1}^n Xi² = 107.42, Σ_{i=1}^n XiYi = 154.07,

Σ_{i=1}^n Yi = 35.50, Σ_{i=1}^n Yi² = 222.03.

Thus

β̂1 = (154.07 − 24.40 × 35.50/7) / (107.42 − 24.40 × 24.40/7) = 30.33 / 22.37 = 1.356,

β̂0 = 35.50/7 − 1.356 × 24.40/7 = 0.345.

Compute MSE:

MSE = SSE / (n − 2) = Σ_{i=1}^n (Yi − Ŷi)² / (n − 2) = 0.86 / 5 = 0.172.
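These hand computations can be reproduced from the summary statistics alone; a sketch in Python (the sums and SSE = 0.86 are the slide's values):

```python
# beta-hat estimates for the advertising example from the summary sums.
n = 7
sum_x, sum_x2 = 24.40, 107.42
sum_y = 35.50
sum_xy = 154.07

sxy = sum_xy - sum_x * sum_y / n   # corrected cross-product, ~30.33
sxx = sum_x2 - sum_x ** 2 / n      # corrected sum of squares, ~22.37
b1 = sxy / sxx                     # slope estimate, ~1.356
b0 = sum_y / n - b1 * sum_x / n    # intercept estimate, ~0.345
mse = 0.86 / (n - 2)               # SSE = 0.86 from the slide; MSE = 0.172
```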

Confidence Interval for β1

Compute the standard deviation estimate

s{β̂1} = √(MSE / Σ_{i=1}^n (Xi − X̄)²) = √(0.172 / 22.37) = 0.0877.

For a 95% CI, α = 0.05 and

t(1 − α/2; n − 2) = t(0.975; 5) = 2.571.

The 95% CI for β1 is

β̂1 ± t(1 − α/2; n − 2) s{β̂1} = 1.356 ± 2.571 × 0.0877 = 1.356 ± 0.225 = (1.13, 1.58).
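The interval computation can be sketched as follows, plugging in the slide's values (the critical value t(0.975; 5) = 2.571 is hard-coded from Table B.2; in R it would be qt(0.975, 5)):

```python
import math

# 95% CI for beta1 in the advertising example.
b1 = 1.356
mse = 0.172
sxx = 22.37
n = 7

se_b1 = math.sqrt(mse / sxx)     # s{beta1-hat}, ~0.0877
t_crit = 2.571                   # t(0.975; 5) from Table B.2
half_width = t_crit * se_b1      # ~0.225
ci = (b1 - half_width, b1 + half_width)   # ~(1.13, 1.58)
```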

Review of Hypothesis Testing

Recall the two types of errors.

- Type I: Reject H0 when H0 is true.

- Type II: Fail to reject H0 when H0 is false.

Level of significance: α = P(Type I error).

Power = 1 − P(Type II error) = 1 − β (here β denotes the Type II error probability, not a regression coefficient).

Recall the p-value.

- A p-value is the probability of observing a sample outcome as extreme as or more extreme than the observed outcome, under the assumption that H0 is true.

- A small p-value provides evidence against H0.

- It is misleading to say that p-value = 0. Report p-value ≤ 0.0001 instead.

When we choose α, we control P(Type I error), but the choice affects β too. We cannot choose both α and β (without manipulating n), so we choose to control α (the more important one).

Review of Hypothesis Testing

One-sided versus two-sided tests.

CI versus hypothesis testing: for the t-test given above, we can state the conclusion of an α-level test in terms of a (1 − α)100% CI. If 0 is contained in the (1 − α)100% CI, then we fail to reject H0.

When writing up a hypothesis test for this class, always include:

- Hypotheses in statistical and practical terms.

- Test statistic.

- Decision rule and p-value.

- Conclusion in the context of the problem. Use wording such as "reject H0" or "fail to reject H0". Do not use "accept H0".

Hypothesis Testing for β1

A test of interest (why?) is:

H0 : β1 = 0 versus Ha : β1 ≠ 0.

Recall that

(β̂1 − β1) / s{β̂1} ∼ t_{n−2}.

Thus an α-level test is based on the test statistic

t* = (β̂1 − 0) / s{β̂1}.

Decision rule: if |t*| > t(1 − α/2; n − 2), reject H0; otherwise do not reject H0.

p-value = 2 × P(T > |t*|), where T ∼ t_{n−2}.

Hypothesis Testing for β1

Revisit the advertising example and test

H0 : β1 = 0 versus Ha : β1 ≠ 0.

The test statistic is

t* = β̂1 / s{β̂1} = 1.356 / 0.0877 = 15.46.

Compared with t5, the p-value is

2 × P(t5 > 15.46) < 0.0001.

Thus we reject H0: there is strong evidence of a positive linear relationship between advertising expenditure and sales.
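A sketch of the test computation, using the slide's values:

```python
# Two-sided t-test of H0: beta1 = 0 for the advertising example.
b1 = 1.356
se_b1 = 0.0877

t_star = b1 / se_b1     # ~15.46
t_crit = 2.571          # t(0.975; 5)
reject = abs(t_star) > t_crit   # True: reject H0 at the 5% level
```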

Hypothesis Testing for β1

In general, for testing H0 : β1 = βh versus Ha : β1 ≠ βh, use the test statistic

t* = (β̂1 − βh) / s{β̂1}

and proceed as before.

For testing H0 : β1 ≤ βh versus Ha : β1 > βh:

- Decision rule: if t* > t(1 − α; n − 2), reject H0; otherwise do not reject H0.

- p-value = P(T > t*), where T ∼ t_{n−2}.

For testing H0 : β1 ≥ βh versus Ha : β1 < βh:

- Decision rule: if t* < t(α; n − 2), reject H0; otherwise do not reject H0.

- p-value = P(T < t*), where T ∼ t_{n−2}.

Inferences about β0

Recall that

β̂0 = Ȳ − β̂1X̄.

It can be shown that β̂0 is normal with

E(β̂0) = β0 and σ²{β̂0} = Var(β̂0) = σ² [1/n + X̄² / Σ_{i=1}^n (Xi − X̄)²].

(Left as HW.)

The estimated variance is

s²{β̂0} = MSE [1/n + X̄² / Σ_{i=1}^n (Xi − X̄)²].

Inferences about β0

It can be shown that the sampling distribution is

(β̂0 − β0) / s{β̂0} ∼ t_{n−2}.

CIs and hypothesis tests for β0 follow as those for β1.

Note the case of β0 = 0.

In practice, never drop β0 from the model unless there is a scientific reason; however, one is rarely interested in the actual value of β0.

Types of Prediction and Estimation

Estimate the mean response E(Yh) at a given level X = Xh.

Predict a new observation Yh(new) at a given level X = Xh.

Predict the mean of m new observations, all at a given level Xh.

Estimate a confidence band for the regression line at several (or all) values Xh.

Data Example: muscle mass (HW 1 (7))

Recall that Y = muscle mass and X = age, with fitted regression line

Ŷ = 156.35 − 1.19X.

What is the population mean muscle mass for a 55-year-old person?

What should we predict for the muscle mass of a 55-year-old person randomly selected from the population?

In both cases, the estimate is

Ŷ = 156.35 − 1.19 × 55 = 90.9,

but the uncertainty is larger in the second case.

Estimation of E(Yh)

Let Xh be the level of X at which we want to estimate the mean response. Xh may or may not be one of the observed values, but it should be within the range of {Xi}.

E(Yh) = the mean response at Xh.

The estimate of E(Yh) is Ŷh = β̂0 + β̂1Xh.

Derivation of Var(Ŷh)

Three results are used in the derivation.

Ŷh = β̂0 + β̂1Xh = Ȳ + β̂1(Xh − X̄).

Var(a1Y1 + a2Y2) = a1² Var(Y1) + a2² Var(Y2) + 2a1a2 Cov(Y1, Y2).

Cov(Ȳ, β̂1) = 0.

Derivation of Var(Ŷh)

Thus, σ²{Ŷh} = σ² [1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²].

Inference for E(Yh)

The variance of Ŷh is

σ²{Ŷh} = Var(Ŷh) = σ² [1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²].

The estimated variance is

s²{Ŷh} = MSE [1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²].

Note that

(Ŷh − E(Yh)) / s{Ŷh} ∼ t_{n−2}.

The (1 − α) CI for E(Yh) is

Ŷh ± t(1 − α/2; n − 2) s{Ŷh}.

Inference for E(Yh)

In the advertising example, suppose Xh = 6.

The estimate of the mean sales at Xh = 6 is

Ŷh = β̂0 + β̂1Xh = 0.345 + 1.356 × 6 = 8.48.

The estimated standard deviation is

s{Ŷh} = √(MSE [1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²]) = √0.172 × √(1/7 + (6 − 3.486)²/22.37) = 0.271.

The 95% CI for the mean sales at Xh = 6 is

Ŷh ± t(1 − α/2; n − 2) s{Ŷh} = 8.48 ± 2.571 × 0.271 = 8.48 ± 0.70 = (7.78, 9.18).
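The computation can be sketched as follows (all inputs are the slide's values; the critical value 2.571 is hard-coded):

```python
import math

# 95% CI for the mean sales E(Yh) at Xh = 6 in the advertising example.
n = 7
b0, b1 = 0.345, 1.356
mse, sxx = 0.172, 22.37
xbar = 24.40 / n
xh = 6.0

yh = b0 + b1 * xh                                            # point estimate, ~8.48
se_mean = math.sqrt(mse * (1 / n + (xh - xbar) ** 2 / sxx))  # s{Yh-hat}, ~0.271
t_crit = 2.571                                               # t(0.975; 5)
ci = (yh - t_crit * se_mean, yh + t_crit * se_mean)          # ~(7.78, 9.18)
```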

Inference for Yh(new)

Xh = the "new" value of X.

- In the previous case, Xh might also be "new" in the sense that it was not a value in the dataset. But here we are talking about a new or hypothetical single person (i.e., experimental/observational unit).

Yh(new) = the "new" response (as yet unobserved).

The best point prediction of Yh(new) is Ŷh = β̂0 + β̂1Xh.

It predicts the new individual to be at the mean for everyone else with Xh.

Inference for Yh(new)

The best estimate of the prediction error is

Yh(new) − Ŷh.

Note that Yh(new) and Ŷh are independent.

The variance of the prediction error is σ²{pred} = σ² [1 + 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²].

Inference for Yh(new)

The estimated variance is

s²{pred} = MSE [1 + 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²].

Note that

(Ŷh − Yh(new)) / s{pred} ∼ t_{n−2}.

The (1 − α) prediction interval (PI) for Yh(new) is

Ŷh ± t(1 − α/2; n − 2) s{pred}.

Prediction of Yh(new)

In the advertising example, again suppose Xh = 6.

The predicted Yh(new) is

Ŷh = β̂0 + β̂1Xh = 0.345 + 1.356 × 6 = 8.48.

The estimated standard deviation of the prediction error is

s{pred} = √(MSE [1 + 1/n + (Xh − X̄)² / Σ_{i=1}^n (Xi − X̄)²]) = √0.172 × √(1 + 1/7 + (6 − 3.486)²/22.37) = 0.495.

The 95% PI for Yh(new) is

Ŷh ± t(1 − α/2; n − 2) s{pred} = 8.48 ± 2.571 × 0.495 = 8.48 ± 1.27 = (7.21, 9.75).
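The same sketch as for the mean-response CI, with the extra "1 +" term for the new observation's own variability (all inputs from the slides):

```python
import math

# 95% prediction interval for a new observation at Xh = 6, advertising example.
n = 7
b0, b1 = 0.345, 1.356
mse, sxx = 0.172, 22.37
xbar = 24.40 / n
xh = 6.0

yh = b0 + b1 * xh                                                # point prediction, ~8.48
se_pred = math.sqrt(mse * (1 + 1 / n + (xh - xbar) ** 2 / sxx))  # s{pred}, ~0.495
t_crit = 2.571                                                   # t(0.975; 5)
pi = (yh - t_crit * se_pred, yh + t_crit * se_pred)              # ~(7.21, 9.75)
```

Note the PI is wider than the CI for the mean at the same Xh, reflecting the added variance of a single new observation.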

Analysis of Variance (ANOVA) Approach

The idea is to partition the variation into

SS Total = SS Model + SS Error

Why partition the variation?

1 Weigh different sources of variation.

2 Hypothesis testing.

3 Comparison of models.

Partitioning Deviation of Each Observation

Yi − Ȳ (total deviation) = (Ŷi − Ȳ) (deviation of fitted value from mean) + (Yi − Ŷi) (deviation of observation from fitted value).

If the {Ŷi − Ȳ} are large in relation to the {Yi − Ŷi}, then the regression relation explains (or accounts for) a large proportion of the total variation in {Yi}.

If the {Ŷi − Ȳ} are small in relation to the {Yi − Ŷi}, then the regression relation explains (or accounts for) a small proportion of the total variation in {Yi}.

Partitioning Total Sum of Squares

Σ_{i=1}^n (Yi − Ȳ)² [SSTO] = Σ_{i=1}^n (Ŷi − Ȳ)² [SSR] + Σ_{i=1}^n (Yi − Ŷi)² [SSE].

SSTO = Σ_{i=1}^n (Yi − Ȳ)² is the total sum of squares.

- A measure of total variation in the data (compare to variance).

SSR = Σ_{i=1}^n (Ŷi − Ȳ)² is the regression sum of squares.

- The larger the SSR in relation to SSTO, the larger the proportion of variability in the Yi's accounted for by the regression relation.

SSE = Σ_{i=1}^n (Yi − Ŷi)² is the error sum of squares.

- The greater the variation of the Yi's around the fitted regression line, the larger the SSE.

Partitioning Total Sum of Squares

SSTO = SSR + SSE,

where

1 SSTO = Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − (Σ_{i=1}^n Yi)²/n, df = n − 1.

2 SSR = Σ_{i=1}^n (Ŷi − Ȳ)² = β̂1² Σ_{i=1}^n (Xi − X̄)² = β̂1² [Σ_{i=1}^n Xi² − (Σ_{i=1}^n Xi)²/n], df = 1.

3 SSE = Σ_{i=1}^n (Yi − Ŷi)² = SSTO − SSR, df = n − 2.

Partitioning Degrees of Freedom

Σ_{i=1}^n (Yi − Ȳ)² [df = n − 1] = Σ_{i=1}^n (Ŷi − Ȳ)² [df = 1] + Σ_{i=1}^n (Yi − Ŷi)² [df = n − 2].

SSTO df = n − 1: Ȳ is used to estimate µY.

SSE df = n − 2: β̂0, β̂1 are used to estimate β0, β1.

Reasons to partition the df?

- Compute MSE and MSR.

- See STAT 640.

Expected Mean Squares E(MSE)

Define

MSE = SSE / (n − 2).

Since SSE/σ² ∼ χ²(n − 2), we have

E(MSE) = σ².

Expected Mean Squares E(MSR)

Define MSR = SSR/1. Recalling SSR = β̂1² Σ_{i=1}^n (Xi − X̄)², we have

E(β̂1²) = σ² / Σ_{i=1}^n (Xi − X̄)² + β1².

Why? (E(β̂1²) = Var(β̂1) + [E(β̂1)]².)

Thus

E(MSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².

- Observe that when β1 = 0, E(MSR) = σ².

Expected Mean Squares

Thus, for testing H0 : β1 = 0 versus Ha : β1 ≠ 0, use the test statistic

F* = (SSR/1) / (SSE/(n − 2)) = MSR / MSE.

It can be shown that under H0 : β1 = 0,

F* = MSR / MSE ∼ F_{1,n−2}.

Thus we can perform an F-test instead of a t-test.

In fact, if T ∼ t_ν, then T² ∼ F_{1,ν}.

- Thus the F-test is equivalent to a two-sided t-test for H0 : β1 = 0 versus Ha : β1 ≠ 0.

Example SSTO

SSTO = Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − (Σ_{i=1}^n Yi)²/n = 222.03 − (35.50)²/7 = 41.99,

df = n − 1 = 6.

Example SSR and SSE

SSR = Σ_{i=1}^n (Ŷi − Ȳ)² = β̂1² [Σ_{i=1}^n Xi² − (Σ_{i=1}^n Xi)²/n] = 1.356² × (107.42 − 24.40²/7) = 41.13,

df = 1.

SSE = Σ_{i=1}^n (Yi − Ŷi)² = SSTO − SSR = 41.99 − 41.13 = 0.86,

df = n − 2 = 5.
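The decomposition can be reproduced from the summary sums (all inputs are the slide's values):

```python
# ANOVA decomposition for the advertising example from the summary sums.
n = 7
sum_x, sum_x2 = 24.40, 107.42
sum_y, sum_y2 = 35.50, 222.03
b1 = 1.356

ssto = sum_y2 - sum_y ** 2 / n             # total SS, ~41.99 (df = 6)
ssr = b1 ** 2 * (sum_x2 - sum_x ** 2 / n)  # regression SS, ~41.13 (df = 1)
sse = ssto - ssr                           # error SS, ~0.86 (df = 5)
```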

General Linear Test Approach

Consider the full model

Yi = β0 + β1Xi + εi, εi ∼ iid N(0, σ²),

and obtain SSE(F).

Consider the reduced model when β1 = 0,

Yi = β0 + εi, εi ∼ iid N(0, σ²),

and obtain SSE(R).

It can be shown that SSE(F) ≤ SSE(R) (intuitively, why?).

In addition, under H0 : β1 = 0,

F* = [(SSE(R) − SSE(F)) / (dfR − dfF)] / [SSE(F) / dfF] ∼ F(dfR − dfF, dfF).

Example

To test H0 : β1 = 0, the F test statistic is

F* = MSR / MSE = 41.13 / 0.172 = 239.13.

Compare with F(1, 5); the p-value is

P(F(1, 5) > F*) = P(F(1, 5) > 239.13) < 0.0001.

Same conclusion as in the t test.
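A sketch checking the F statistic and the T² ∼ F(1, ν) relation with the slide's rounded values (small discrepancies are due to rounding):

```python
# F-test of H0: beta1 = 0 and its equivalence to the two-sided t-test.
msr = 41.13          # SSR / 1
mse = 0.172          # SSE / (n - 2)
f_star = msr / mse   # ~239.13

t_star = 15.46       # t statistic from the earlier slide
# For T ~ t_nu, T^2 ~ F(1, nu), so t*^2 should be close to F*.
diff = abs(t_star ** 2 - f_star)
```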

ANOVA Table

Summarize results using an ANOVA table.

Source          SS    df     MS           F
Regression (X)  SSR   1      SSR/1        MSR/MSE
Error           SSE   n − 2  SSE/(n − 2)  –
Total           SSTO  n − 1  –            –

For the advertising example, n = 7:

Source          SS     df  MS     F
Ad expenditure  41.13  1   41.13  239.13
Error           0.86   5   0.172  –
Total           41.99  6   –      –

Coefficient of Determination R2

Recall that

- SSTO measures the variation in the Yi about Ȳ (not taking the Xi into account).

- SSE measures the variation in the Yi after accounting for the linear relationship between X and Y.

- SSTO − SSE = SSR is a measure of the reduction in variation due to the regression of Y on X.

Define the coefficient of determination as

R² = SSR / SSTO = 1 − SSE / SSTO.

Coefficient of Determination R2

In the advertising example,

R² = 41.13 / 41.99 = 0.9791.

Interpret R² as the proportion of variation in the Yi's explained by the linear regression relationship between X and Y.

0 ≤ R² ≤ 1.

Reported as the "Multiple R-squared" in R summary output.

Relation to the sample correlation coefficient for the simple linear regression model (only):

r = sign(β̂1) √R².

Can you show that?
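A sketch of the computation (sums of squares from the ANOVA slides; the small difference from the printed 0.9791 comes from intermediate rounding):

```python
import math

# R^2 and the sample correlation r for the advertising example.
ssr, ssto = 41.13, 41.99
r2 = ssr / ssto               # the slide reports 0.9791 with more precise inputs
b1 = 1.356
r = math.copysign(math.sqrt(r2), b1)   # r = sign(beta1-hat) * sqrt(R^2)
```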

Limitations of R2

A high R² does not guarantee that useful predictions can be made.

A low R² does not imply a lack of association.

- You can get R² near zero even when there is a strong (or perfect) relationship between X and Y.

  - The sample correlation only measures a LINEAR relationship.

  - E.g., X ∼ N(0, 1), Y = sin(X²); try it yourself.

- Outliers.

Alternative measures: Spearman correlation (a nonparametric statistic), lowess-based R², etc.
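The suggested experiment can be sketched as follows (simulation choices such as the seed and sample size are arbitrary):

```python
import math
import random

# X ~ N(0,1) and Y = sin(X^2) have a perfect functional relationship,
# yet their sample (linear) correlation is near zero, since sin(X^2)
# is an even function of X and X is symmetric about 0.
random.seed(1)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]
y = [math.sin(xi ** 2) for xi in x]

xbar = sum(x) / n
ybar = sum(y) / n
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n
sx = math.sqrt(sum((a - xbar) ** 2 for a in x) / n)
sy = math.sqrt(sum((b - ybar) ** 2 for b in y) / n)
r = cov / (sx * sy)   # close to 0 despite the deterministic dependence
```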

Correlation Analysis

Correlation analysis (Section 2.11 of KNNL) is closely related to regression analysis.

Regression analysis:

- One variable is the response Y.

- One variable is the predictor X.

- Models the conditional distribution of Y given X.

- The distribution of X is not relevant.

- Y given X and X given Y are not the same.

Correlation analysis:

- Both variables are response variables.

- We want to measure the association between the two variables.

- ρX,Y and ρY,X are the same.

Bivariate Normal Distribution

The bivariate normal distribution is an example of a joint distribution for two continuous random variables.

We say X and Y have a bivariate normal distribution with parameters µx, µy, σx², σy², ρ if the probability density is

f(x, y) = [1 / (2π σx σy √(1 − ρ²))] exp{ −[zx² − 2ρ zx zy + zy²] / [2(1 − ρ²)] },

where zx = (x − µx)/σx and zy = (y − µy)/σy.

Interpretation of the parameters:

1 µx = mean of X

2 µy = mean of Y

3 σx² = variance of X

4 σy² = variance of Y

5 ρ = correlation of X and Y

Definition of the Correlation Coefficient

ρ = Cov(X, Y)/(σxσy) is called the correlation coefficient.

Recall that Var(X) = E{(X − µx)²} and Cov(X, Y) = E{(X − µx)(Y − µy)}.

Properties:

1 ρ ∈ [−1, 1].

2 |ρ| = 1 implies that X and Y have a perfect linear relationship (perfect correlation).

3 Independence of X and Y implies that ρ = 0.

4 Conversely, ρ = 0 implies only the absence of a linear relationship in general, but it does imply independence when (X, Y) is bivariate normal.

Bivariate Normal Distribution

The density is constant on ellipses (contour plots).

The marginal distributions are normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²).

The conditional distributions are normal: Y | X = x ∼ N(µy + ρ(σy/σx)(x − µx), σy²(1 − ρ²)).

Note the relationship of the conditional distribution of Y given X = x to the simple linear regression model.

Bivariate Normal Distribution

Two motivations for simple linear regression:

1 Bivariate normal observations.

2 x is fixed (not necessarily normal) and Y | x is normal.

We can relate the bivariate normal and simple linear regression parameters: β1 = ρ σy/σx, β0 = µy − β1µx, and σ² = σy²(1 − ρ²).

Inference for a Correlation Coefficient

The maximum likelihood estimate is the sample correlation coefficient (Pearson correlation)

This estimate replaces population quantities with sample quantities:

r = Σ_i (x_i − x̄)(y_i − ȳ) / √( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² )

Test H0 : ρ = 0 versus H1 : ρ ≠ 0

- Equivalent to testing β1 = 0 in regression
- t = r √(n − 2) / √(1 − r²) has a t-distribution with n − 2 df under H0
- This is exactly the t-test for H0 : β1 = 0
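A minimal sketch of this computation (standard-library Python; the function name is ours, not from the slides):

```python
import math

def corr_t_stat(x, y):
    """Sample correlation r and the statistic t = r*sqrt(n-2)/sqrt(1-r^2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))  # sum of cross-products
    sxx = sum((a - xbar) ** 2 for a in x)                     # sum of squares for x
    syy = sum((b - ybar) ** 2 for b in y)                     # sum of squares for y
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t  # compare t with a t distribution on n - 2 df
```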


Inference for a Correlation Coefficient

Remaining inference procedures assume bivariate normality and rely on Fisher's z-transformation:

Z = (1/2) log((1 + r)/(1 − r))

Approximately, Z ∼ N( (1/2) log((1 + ρ)/(1 − ρ)), 1/(n − 3) )

Var(Z) does not depend on ρ!

Good approximation when n > 25, why?
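In code (a sketch; note that `math.atanh` and `math.tanh` are exactly the forward and inverse transformations):

```python
import math

def fisher_z(r):
    """Fisher z-transformation: (1/2) * log((1 + r) / (1 - r)), i.e. atanh(r)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def fisher_z_inverse(z):
    """Inverse transformation: (exp(2z) - 1) / (exp(2z) + 1), i.e. tanh(z)."""
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)
```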


Inference for a Correlation Coefficient

Construct a 100(1 − α)% confidence interval for ρ

CI for (1/2) log((1 + ρ)/(1 − ρ)): z ± z_{1−α/2} √(1/(n − 3)), where z is the Fisher transform of r

Obtain an approximate confidence interval for ρ by applying the inverse transformation to the endpoints of this interval
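A sketch of the whole procedure (assuming the critical value z_{1−α/2} is supplied, e.g. 1.96 for a 95% interval; the function name is ours):

```python
import math

def corr_ci(r, n, z_crit=1.96):
    """Approximate CI for rho: Fisher-transform r, add/subtract
    z_crit * sqrt(1/(n - 3)), then back-transform the endpoints with tanh."""
    z = math.atanh(r)                             # (1/2) log((1 + r) / (1 - r))
    half_width = z_crit * math.sqrt(1 / (n - 3))  # sd of Z is sqrt(1/(n - 3))
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

With r = 0.518 and n = 12 (the grain example later in these slides) this gives roughly (−0.08, 0.84).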


Inference for a Correlation Coefficient

Test H0 : ρ = ρ0 versus H1 : ρ ≠ ρ0

The test statistic is

√(n − 3) [ (1/2) log((1 + r)/(1 − r)) − (1/2) log((1 + ρ0)/(1 − ρ0)) ]

Obtain the p-value from comparison to a standard normal distribution
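This statistic is one line of Python (a sketch; the function name is ours, and `math.atanh` equals (1/2) log((1 + r)/(1 − r))):

```python
import math

def corr_z_stat(r, rho0, n):
    """Z statistic for H0: rho = rho0, to be compared with N(0, 1)."""
    return math.sqrt(n - 3) * (math.atanh(r) - math.atanh(rho0))
```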


Example: Yields of Broadbalk Wheat (bu/acre)

Source: R. A. Fisher, Statistical Methods for Research Workers, 14th ed., page 137.

Same two plots used in each of n = 12 years

- Plot 1: fertilized with nitrate of soda, Xi = yield in i-th year
- Plot 2: same amount of N as sulfate of ammonia, Yi = yield in i-th year


Grain Example

Summary statistics

Sample size: n = 12

Sample means: x̄ = 35.1825 and ȳ = 29.3541

Sums of squares: Σ_{i=1}^{12} (x_i − x̄)² = 346.184 and Σ_{i=1}^{12} (y_i − ȳ)² = 612.285

Sum of cross-products: Σ_{i=1}^{12} (x_i − x̄)(y_i − ȳ) = 238.5449

Sample correlation: r = 0.518


Test H0 : ρ = 0 versus H1 : ρ ≠ 0

t = 0.518 √(12 − 2) / √(1 − 0.518²) = 1.92, which under H0 follows t_{12−2} = t_{10}; the two-sided p-value is 0.0844

Conclusion: There is some evidence of a positive correlation in yields, but it is not conclusive. Why?

- n = 12 is a small sample size
- The pair of yields observed in 1877 is somewhat inconsistent with the pattern observed in other years. Check the accuracy of the 1877 data.
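As a cross-check (a Python sketch; variable names are ours), the t statistic can be recomputed from the summary statistics alone:

```python
import math

# Summary statistics from the grain example
sxx, syy, sxy, n = 346.184, 612.285, 238.5449, 12

r = sxy / math.sqrt(sxx * syy)                    # sample correlation, ~0.518
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t statistic on n - 2 = 10 df, ~1.92
```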


Confidence Interval for ρ

Apply the Fisher z-transformation

z = (1/2) log((1 + 0.518)/(1 − 0.518)) = 0.5736

z_lower = 0.5736 − 1.96 √(1/9) = −0.0797

z_upper = 0.5736 + 1.96 √(1/9) = 1.2269

Apply the inverse transformation to each endpoint:

( (−1 + exp(2(−0.0797)))/(1 + exp(2(−0.0797))), (−1 + exp(2(1.2269)))/(1 + exp(2(1.2269))) ) ⇒ (−0.0795, 0.8417)
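The arithmetic above can be verified in a few lines (a sketch; `math.tanh` equals the inverse transformation (−1 + e^{2z})/(1 + e^{2z})):

```python
import math

z = 0.5 * math.log((1 + 0.518) / (1 - 0.518))  # Fisher z of r = 0.518, ~0.5736
half = 1.96 * math.sqrt(1 / 9)                 # n - 3 = 9, so sd of Z is 1/3
lo, hi = math.tanh(z - half), math.tanh(z + half)
# close to the slide's (-0.0795, 0.8417); tiny differences come
# from the slide rounding z to 0.5736 before back-transforming
```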
