An Overview
Inferences about β1.
1 Sampling distribution of β̂1.
2 Sampling distribution of (β̂1 − β1)/s{β̂1}.
3 Confidence interval for β1.
4 Hypothesis testing.
Inferences about β0.
Estimation and prediction (with respect to some x0).
ANOVA approach.
Coefficient of determination: R2.
W. Zhou (Colorado State University) STAT 540 July 6th, 2015 1 / 63
Inferences about β1
Recall the simple linear regression model
    Yi = β0 + β1Xi + εi,   εi ∼ iid N(0, σ²),
for i = 1, . . . , n.
Recall that the LS and ML estimate of β1 is
    β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)².
As we will show, β̂1 is normal with
    E(β̂1) = β1   and   Var(β̂1) = σ² / Σ_{i=1}^n (Xi − X̄)².
Preliminary Results Concerning β1
Note that
    β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²
       = Σ_{i=1}^n (Xi − X̄)Yi / Σ_{i=1}^n (Xi − X̄)²   (using Σ_{i=1}^n (Xi − X̄) = 0)
       = Σ_{i=1}^n [ (Xi − X̄) / Σ_{j=1}^n (Xj − X̄)² ] Yi.
Express β̂1 as a linear combination of the {Yi}:
    β̂1 = Σ_{i=1}^n ki Yi,   where   ki = (Xi − X̄) / Σ_{j=1}^n (Xj − X̄)².
Preliminary Results Concerning β1
Now we compute Σ_{i=1}^n ki, Σ_{i=1}^n ki Xi, and Σ_{i=1}^n ki².
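These sums are left blank for lecture; for reference, the standard results, which follow directly from the definition of ki and the fact that Σ_{i=1}^n (Xi − X̄) = 0, are:

```latex
\sum_{i=1}^n k_i = 0, \qquad
\sum_{i=1}^n k_i X_i = 1, \qquad
\sum_{i=1}^n k_i^2 = \frac{1}{\sum_{i=1}^n (X_i - \bar X)^2},
\quad \text{where } k_i = \frac{X_i - \bar X}{\sum_{j=1}^n (X_j - \bar X)^2}.
```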
Sampling Distribution of β̂1
β̂1 follows a normal distribution. Why?
The mean of β̂1, E(β̂1), is
The variance of β̂1, Var(β̂1), is
Completing the Proof of the Gauss–Markov Theorem
Theorem: Under the simple linear regression model with errors that have mean zero and constant variance (but are not necessarily normal), β̂0 and β̂1 are BLUE (Best Linear Unbiased Estimators): they have minimum variance among all linear unbiased estimators.
Proof:
We have already shown that the estimators are linear, since
We have already shown that β̂1 is unbiased, i.e., E(β̂1) = β1.
It remains to show minimum variance among all linear unbiased estimators.
Var(β̂1) is minimal among unbiased linear estimators
Let an arbitrary linear unbiased estimator be of the form β̃1 = Σ_{i=1}^n ci Yi,
where the ci are constants that satisfy
Note that Var{β̃1} is
Var(β̂1) is minimal among unbiased linear estimators
Also note that Σ_{i=1}^n ki di = 0, where di = ci − ki. Why? Show it below.
Thus
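For reference, a sketch of the standard completion of the argument, writing ci = ki + di (so the di measure the departure from the LS weights):

```latex
\operatorname{Var}(\tilde\beta_1)
= \sigma^2 \sum_{i=1}^n c_i^2
= \sigma^2 \Bigl( \sum_{i=1}^n k_i^2 + 2\sum_{i=1}^n k_i d_i + \sum_{i=1}^n d_i^2 \Bigr)
= \operatorname{Var}(\hat\beta_1) + \sigma^2 \sum_{i=1}^n d_i^2
\;\ge\; \operatorname{Var}(\hat\beta_1),
```

with equality if and only if all di = 0, i.e., β̃1 = β̂1.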
Estimate of Var(β̂1)
Recall that
    Var{β̂1} = σ² / Σ_{i=1}^n (Xi − X̄)².
The estimated variance is
    s²{β̂1} = [Σ_{i=1}^n (Yi − Ŷi)²/(n − 2)] / Σ_{i=1}^n (Xi − X̄)² = MSE / Σ_{i=1}^n (Xi − X̄)².
We will show
    (β̂1 − β1)/s{β̂1} ∼ t_{n−2}.
Sampling Distribution of (β̂1 − β1)/s{β̂1}
Definition: If Zi ∼ independent N(µi, σi²) for i = 1, . . . , k, then
    Σ_{i=1}^k [(Zi − µi)/σi]² ∼ χ²_k   (chi-square distribution with k degrees of freedom).
Definition (KNNL A.44): A t random variable with ν degrees of freedom results from the expression
    t_ν = z / √(q_ν/ν),
where z and q_ν are independent standard normal and χ²_ν random variables, respectively.
Sampling Distribution of (β̂1 − β1)/s{β̂1}
Note that
    (β̂1 − β1) / √[σ² / Σ_{i=1}^n (Xi − X̄)²] ∼ N(0, 1).
Also,
    s²{β̂1}/Var{β̂1} = [MSE/Σ_{i=1}^n (Xi − X̄)²] / [σ²/Σ_{i=1}^n (Xi − X̄)²] = MSE/σ² = SSE/[(n − 2)σ²] ∼ χ²_{n−2}/(n − 2).
Thus,
    (β̂1 − β1)/s{β̂1} = [(β̂1 − β1)/√Var{β̂1}] / √[s²{β̂1}/Var{β̂1}] ∼ t_{n−2}.
Sampling Distribution of (β̂1 − β1)/s{β̂1}
The last conclusion on the previous slide only holds if SSE/σ² is independent of β̂0 and β̂1.
This is given in Theorem (2.11) of KNNL for the simple linear regression model and is proven in general in STAT 640.
Confidence Interval for β1
Denote s{β̂1} = √(s²{β̂1}). Recall that
    (β̂1 − β1)/s{β̂1} ∼ t_{n−2}.
The (1 − α) CI for β1 is
    β̂1 ± t(1 − α/2; n − 2) s{β̂1},
where t(1 − α/2; n − 2) is the (1 − α/2)100th percentile of t_{n−2}.
See KNNL pp. 1317–1318, Table B.2. (Or use qt(0.975, df) in R for a 95% CI.)
Interpretation of CI. For example, a 95% CI for β1 is (–,–).
- If we repeated the study 100 times and created 100 CIs for β1, we would expect about 95 of these intervals to include the true value of β1.
- The method used to construct this interval has a 5% error rate.
Confidence Interval for β1
In the advertising example (n = 7), recall that
    Σ_{i=1}^n Xi = 24.40,   Σ_{i=1}^n Xi² = 107.42,   Σ_{i=1}^n XiYi = 154.07,
    Σ_{i=1}^n Yi = 35.50,   Σ_{i=1}^n Yi² = 222.03.
Thus
    β̂1 = (154.07 − 24.40 × 35.50/7) / (107.42 − 24.40 × 24.40/7) = 30.33/22.37 = 1.356,
    β̂0 = 35.50/7 − 1.356 × 24.40/7 = 0.345.
Compute MSE:
    MSE = SSE/(n − 2) = Σ_{i=1}^n (Yi − Ŷi)²/(n − 2) = 0.86/5 = 0.172.
Confidence Interval for β1
Compute the standard deviation estimate:
    s{β̂1} = √[MSE / Σ_{i=1}^n (Xi − X̄)²] = √(0.172/22.37) = 0.0877.
For a 95% CI, α = 0.05 and
    t(1 − α/2; n − 2) = t(0.975; 5) = 2.571.
The 95% CI for β1 is
    β̂1 ± t(1 − α/2; n − 2) s{β̂1} = 1.356 ± 2.571 × 0.0877 = 1.356 ± 0.225 = (1.13, 1.58).
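These hand computations are easy to check with a short script; a minimal sketch in Python, using the summary statistics and the critical value t(0.975; 5) = 2.571 from the slides:

```python
from math import sqrt

# Advertising example summary statistics (from the slides), n = 7
n = 7
Sx, Sxx, Sxy, Sy = 24.40, 107.42, 154.07, 35.50

Sxx_c = Sxx - Sx**2 / n        # centered sum of squares: sum (Xi - Xbar)^2
Sxy_c = Sxy - Sx * Sy / n      # centered cross products
b1 = Sxy_c / Sxx_c             # LS estimate of beta1
MSE = 0.172                    # from the slides: SSE/(n-2) = 0.86/5
se_b1 = sqrt(MSE / Sxx_c)      # estimated standard error of beta1-hat
t_crit = 2.571                 # t(0.975; 5), from Table B.2
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
print(round(b1, 3), round(se_b1, 4), [round(c, 2) for c in ci])
```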
Review of Hypothesis Testing
Recall the two types of errors.
- Type I: Reject H0 when H0 is true.
- Type II: Fail to reject H0 when H0 is false.
Level of significance: α = P(Type I error).
Power = 1 − P(Type II error) = 1 − β.
Recall the p-value.
- A p-value is the probability of observing a sample outcome as extreme as or more extreme than the observed outcome, under the assumption that H0 is true.
- A small p-value provides evidence against H0.
- It is misleading to say that p-value = 0. Report p-value ≤ 0.0001 instead.
When we choose α, we control P(Type I error), but the choice affects β too. We cannot choose both α and β (without manipulating n), so we choose to control α (the more important one).
Review of Hypothesis Testing
1-sided versus 2-sided tests.
CI versus hypothesis testing: for the t-test given above, we can state the conclusion of an α-level test in terms of a (1 − α)100% CI. If 0 is contained in the (1 − α)100% CI, then we fail to reject H0.
When writing up a hypothesis test for this class, always include:
- Hypotheses in statistical and practical terms.
- Test statistic.
- Decision rule and p-value.
- Conclusion in the context of the problem. Use wording such as “reject H0” or “fail to reject H0”. Do not use “accept H0”.
Hypothesis Testing for β1
A test of interest (why?) is:
    H0: β1 = 0   versus   Ha: β1 ≠ 0.
Recall that
    (β̂1 − β1)/s{β̂1} ∼ t_{n−2}.
Thus an α-level test is based on the test statistic
    t* = (β̂1 − 0)/s{β̂1}.
Decision rule: if |t*| > t(1 − α/2; n − 2), reject H0; otherwise do not reject H0.
p-value = 2 × P(T > |t*|), where T ∼ t_{n−2}.
Hypothesis Testing for β1
Revisit the advertising example and test
    H0: β1 = 0   versus   Ha: β1 ≠ 0.
The test statistic is
    t* = β̂1/s{β̂1} = 1.356/0.0877 = 15.46.
Compared with t5, the p-value is
    2 × P(t5 > 15.46) < 0.0001.
Thus we reject H0: there is strong evidence of a positive linear relationship between advertising expenditure and sales.
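As a quick check of the test statistic and decision rule, a sketch in Python with the values from the slides:

```python
# Test H0: beta1 = 0 for the advertising example (values from the slides)
b1, se_b1, n = 1.356, 0.0877, 7
t_star = b1 / se_b1              # test statistic; compare with t_{n-2} = t_5
t_crit = 2.571                   # t(0.975; 5)
reject = abs(t_star) > t_crit    # alpha = 0.05 decision rule
print(round(t_star, 2), reject)
```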
Hypothesis Testing for β1
In general, for testing H0: β1 = βh versus Ha: β1 ≠ βh, use the test statistic
    t* = (β̂1 − βh)/s{β̂1}
and proceed as before.
For testing H0: β1 ≤ βh versus Ha: β1 > βh:
- Decision rule: if t* > t(1 − α; n − 2), reject H0; otherwise do not reject H0.
- p-value = P(T > t*), where T ∼ t_{n−2}.
For testing H0: β1 ≥ βh versus Ha: β1 < βh:
- Decision rule: if t* < t(α; n − 2), reject H0; otherwise do not reject H0.
- p-value = P(T < t*), where T ∼ t_{n−2}.
Inferences about β0
Recall that
    β̂0 = Ȳ − β̂1X̄.
It can be shown that β̂0 is normal with
    E(β̂0) = β0   and   σ²{β̂0} = Var(β̂0) = σ² [1/n + X̄²/Σ_{i=1}^n (Xi − X̄)²].
(Left as HW.)
The estimated variance is
    s²{β̂0} = MSE [1/n + X̄²/Σ_{i=1}^n (Xi − X̄)²].
Inferences about β0
It can be shown that the sampling distribution is
    (β̂0 − β0)/s{β̂0} ∼ t_{n−2}.
CIs and hypothesis tests for β0 follow as those for β1.
Note the case of β0 = 0.
In practice, never drop β0 from the model unless there is a scientific reason. However, one is rarely interested in the actual value of β0.
Types of Prediction and Estimation
Estimate the mean response E(Yh) for a given level X = Xh.
Predict a new observation Yh(new) for a given level X = Xh.
Predict the mean of m new observations, all at a given level Xh.
Estimate a confidence band for the regression line at several (or all) values Xh.
Data Example: muscle mass (HW 1 (7))
Recall that Y = muscle mass and X = age, with the fitted regression line
    Ŷ = 156.35 − 1.19X.
What is the population mean muscle mass for a 55-year-old person?
What should we predict for the muscle mass of a 55-year-old person randomly selected from the population?
In both cases, the estimate is
    Ŷ = 156.35 − 1.19 × 55 = 90.9,
but the uncertainty is larger in the second case.
Estimation of E(Yh)
Let Xh be the level of X for which we want to estimate the mean response.
Xh may or may not be one of the observed values, but it should be within the range of the {Xi}.
E(Yh) = the mean response at Xh.
The estimate of E(Yh) is Ŷh = β̂0 + β̂1Xh.
Derivation of Var(Ŷh)
Three results are used in the derivation.
    Ŷh = Ȳ + β̂1(Xh − X̄)
    Var(a1Y1 + a2Y2) = a1² Var(Y1) + a2² Var(Y2) + 2a1a2 Cov(Y1, Y2)
    Cov(Ȳ, β̂1) = 0
Derivation of Var(Ŷh)
Thus, σ²{Ŷh} is
Inference for E(Yh)
The variance of Ŷh is
    σ²{Ŷh} = Var(Ŷh) = σ² [1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)²].
The estimated variance is
    s²{Ŷh} = MSE [1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)²].
Note that
    (Ŷh − E(Yh))/s{Ŷh} ∼ t_{n−2}.
The (1 − α) CI for E(Yh) is
    Ŷh ± t(1 − α/2; n − 2) s{Ŷh}.
Inference for E(Yh)
In the advertising example, suppose Xh = 6.
The estimate of the mean sales at Xh = 6 is
    Ŷh = β̂0 + β̂1Xh = 0.345 + 1.356 × 6 = 8.48.
The estimated standard deviation is
    s{Ŷh} = √{MSE [1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)²]}
          = √0.172 × √[1/7 + (6 − 3.486)²/22.37] = 0.271.
The 95% CI for the mean sales at Xh = 6 is
    Ŷh ± t(1 − α/2; n − 2) s{Ŷh} = 8.48 ± 2.571 × 0.271 = 8.48 ± 0.70 = (7.78, 9.18).
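A sketch of the same computation in Python (inputs from the slides; rounding at intermediate steps can shift the last digit of the interval endpoints relative to the hand calculation):

```python
from math import sqrt

# 95% CI for the mean sales at Xh = 6 (advertising example)
n, Xh, Xbar = 7, 6, 24.40 / 7
Sxx_c, MSE = 22.37, 0.172              # sum (Xi - Xbar)^2 and MSE, from the slides
Yh = 0.345 + 1.356 * Xh                # fitted mean response at Xh
se_mean = sqrt(MSE * (1 / n + (Xh - Xbar) ** 2 / Sxx_c))
t_crit = 2.571                         # t(0.975; 5)
ci = (Yh - t_crit * se_mean, Yh + t_crit * se_mean)
print(round(Yh, 2), round(se_mean, 3), ci)
```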
Inference for Yh(new)
Xh = the “new” value of X.
- In the previous case, Xh might also be “new” in the sense that it was not a value in the dataset. But here we are talking about a new or hypothetical single person (i.e., experimental/observational unit).
Yh(new) = the “new” response (as yet unobserved).
The best point prediction of Yh(new) is Ŷh = β̂0 + β̂1Xh.
That is, we predict the new individual's response to be the estimated mean response for everyone else with X = Xh.
Inference for Yh(new)
The prediction error is
    Yh(new) − Ŷh.
Note that Yh(new) and Ŷh are independent.
The variance of the prediction error, σ²{pred}, is
Inference for Yh(new)
The estimated variance is
    s²{pred} = MSE [1 + 1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)²].
Note that
    (Yh(new) − Ŷh)/s{pred} ∼ t_{n−2}.
The (1 − α) prediction interval (PI) for Yh(new) is
    Ŷh ± t(1 − α/2; n − 2) s{pred}.
Prediction of Yh(new)
In the advertising example, again suppose Xh = 6.
The predicted value of Yh(new) is
    Ŷh = β̂0 + β̂1Xh = 0.345 + 1.356 × 6 = 8.48.
The estimated standard deviation of the prediction error is
    s{pred} = √{MSE [1 + 1/n + (Xh − X̄)²/Σ_{i=1}^n (Xi − X̄)²]}
            = √0.172 × √[1 + 1/7 + (6 − 3.486)²/22.37] = 0.495.
The 95% PI for Yh(new) is
    Ŷh ± t(1 − α/2; n − 2) s{pred} = 8.48 ± 2.571 × 0.495 = 8.48 ± 1.27 = (7.21, 9.75).
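The PI differs from the CI for the mean only through the extra “1 +” term in the variance; a sketch in Python with the slides' inputs:

```python
from math import sqrt

# 95% prediction interval for a single new observation at Xh = 6
n, Xh, Xbar = 7, 6, 24.40 / 7
Sxx_c, MSE = 22.37, 0.172
Yh = 0.345 + 1.356 * Xh
# prediction-error variance adds the extra "1 +" term for the new observation
se_pred = sqrt(MSE * (1 + 1 / n + (Xh - Xbar) ** 2 / Sxx_c))
t_crit = 2.571                          # t(0.975; 5)
pi = (Yh - t_crit * se_pred, Yh + t_crit * se_pred)
print(round(se_pred, 3), [round(p, 2) for p in pi])
```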
Analysis of Variance (ANOVA) Approach
The idea is to partition the variation into
SS Total = SS Model + SS Error
Why partition the variation?
1 Weigh different sources of variation.
2 Hypothesis testing.
3 Comparison of models.
Partitioning Deviation of Each Observation
    Yi − Ȳ   =   (Ŷi − Ȳ)   +   (Yi − Ŷi)
    total dev    dev of fitted from mean    dev of obs from fitted
If the {Ŷi − Ȳ} are large in relation to the {Yi − Ŷi}, then the regression relation explains (or accounts for) a large proportion of the total variation in the {Yi}.
If the {Ŷi − Ȳ} are small in relation to the {Yi − Ŷi}, then the regression relation explains (or accounts for) a small proportion of the total variation in the {Yi}.
Partitioning Total Sum of Squares
    Σ_{i=1}^n (Yi − Ȳ)²  =  Σ_{i=1}^n (Ŷi − Ȳ)²  +  Σ_{i=1}^n (Yi − Ŷi)²
         SSTO                    SSR                     SSE
SSTO = Σ_{i=1}^n (Yi − Ȳ)² is the total sum of squares.
- A measure of total variation in the data (compare to variance).
SSR = Σ_{i=1}^n (Ŷi − Ȳ)² is the regression sum of squares.
- The larger SSR is in relation to SSTO, the larger the proportion of variability in the Yi's accounted for by the regression relation.
SSE = Σ_{i=1}^n (Yi − Ŷi)² is the error sum of squares.
- The greater the variation of the Yi's around the fitted regression line, the larger the SSE.
Partitioning Total Sum of Squares
    SSTO = SSR + SSE,
where
1 SSTO = Σ_{i=1}^n (Yi − Ȳ)² = Σ_{i=1}^n Yi² − (Σ_{i=1}^n Yi)²/n,   df = n − 1;
2 SSR = Σ_{i=1}^n (Ŷi − Ȳ)² = β̂1² Σ_{i=1}^n (Xi − X̄)² = β̂1² [Σ_{i=1}^n Xi² − (Σ_{i=1}^n Xi)²/n],   df = 1;
3 SSE = Σ_{i=1}^n (Yi − Ŷi)² = SSTO − SSR,   df = n − 2.
Partitioning Degrees of Freedom
    Σ_{i=1}^n (Yi − Ȳ)²  =  Σ_{i=1}^n (Ŷi − Ȳ)²  +  Σ_{i=1}^n (Yi − Ŷi)²
       df = n − 1             df = 1                  df = n − 2
SSTO df = n − 1: Ȳ is used to estimate µY.
SSE df = n − 2: β̂0, β̂1 are used to estimate β0, β1.
Reasons to partition df?
- Compute MSE and MSR.
- See STAT 640.
Expected Mean Squares: E(MSE)
Define
    MSE = SSE/(n − 2).
Since SSE/σ² ∼ χ²(n − 2), we have
    E(MSE) = σ².
Expected Mean Squares: E(MSR)
Define MSR = SSR/1. Recalling SSR = β̂1² Σ_{i=1}^n (Xi − X̄)², we have
    E(β̂1²) = σ²/Σ_{i=1}^n (Xi − X̄)² + β1².
Why?
Thus
    E(MSR) = σ² + β1² Σ_{i=1}^n (Xi − X̄)².
- Observe that when β1 = 0, E(MSR) = σ².
Expected Mean Squares
Thus, for testing H0: β1 = 0 versus Ha: β1 ≠ 0, use the test statistic
    F* = (SSR/1) / (SSE/(n − 2)) = MSR/MSE.
It can be shown that under H0: β1 = 0,
    F* = MSR/MSE ∼ F_{1,n−2}.
Thus we can perform an F-test instead of a t-test.
In fact, if T ∼ t_ν, then T² ∼ F_{1,ν}.
- Thus the F-test is equivalent to the two-sided t-test for H0: β1 = 0 versus Ha: β1 ≠ 0.
Example: SSTO
    SSTO = Σ_{i=1}^n (Yi − Ȳ)²
         = Σ_{i=1}^n Yi² − (Σ_{i=1}^n Yi)²/n
         = 222.03 − (35.50)²/7
         = 41.99,
    df = n − 1 = 6.
Example: SSR and SSE
    SSR = Σ_{i=1}^n (Ŷi − Ȳ)²
        = β̂1² [Σ_{i=1}^n Xi² − (Σ_{i=1}^n Xi)²/n]
        = 1.356² × (107.42 − 24.40²/7)
        = 41.13,
    df = 1.
    SSE = Σ_{i=1}^n (Yi − Ŷi)²
        = SSTO − SSR
        = 41.99 − 41.13 = 0.86,
    df = n − 2 = 5.
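The partition can be verified numerically from the summary sums; a sketch in Python:

```python
# ANOVA partition for the advertising example (summary sums from the slides)
n = 7
Sy, Syy = 35.50, 222.03
Sx, Sxx = 24.40, 107.42
b1 = 1.356

SSTO = Syy - Sy**2 / n              # total sum of squares, df = n - 1
SSR = b1**2 * (Sxx - Sx**2 / n)     # regression sum of squares, df = 1
SSE = SSTO - SSR                    # error sum of squares, df = n - 2
print(round(SSTO, 2), round(SSR, 2), round(SSE, 2))
```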
General Linear Test Approach
Consider the full model
    Yi = β0 + β1Xi + εi,   εi ∼ iid N(0, σ²),
and obtain SSE(F).
Consider the reduced model when β1 = 0,
    Yi = β0 + εi,   εi ∼ iid N(0, σ²),
and obtain SSE(R).
It can be shown that SSE(F) ≤ SSE(R) (intuitively, why?).
In addition, under H0: β1 = 0,
    F* = {[SSE(R) − SSE(F)]/(dfR − dfF)} / [SSE(F)/dfF] ∼ F(dfR − dfF, dfF).
Example
To test H0: β1 = 0, the F test statistic is
    F* = MSR/MSE = 41.13/0.172 = 239.13.
Compare with F(1, 5); the p-value is
    P(F(1, 5) > F*) = P(F(1, 5) > 239.13) < 0.0001.
Same conclusion as in the t test.
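The t/F equivalence can be seen numerically; a sketch in Python with values from the slides (they agree up to rounding of the inputs):

```python
# F test for H0: beta1 = 0 (advertising example; MSR, MSE from the slides)
MSR, MSE = 41.13, 0.172
F_star = MSR / MSE      # compare with F(1, 5)
t_star = 15.46          # two-sided t statistic from the earlier slide
# Up to rounding of the inputs, F* equals the square of the t statistic
print(round(F_star, 2), round(t_star**2, 2))
```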
ANOVA Table
Summarize results using an ANOVA table.
    Source           SS     df     MS           F
    Regression (X)   SSR    1      SSR/1        MSR/MSE
    Error            SSE    n − 2  SSE/(n − 2)  –
    Total            SSTO   n − 1  –            –
For the advertising example, n = 7:
    Source           SS     df  MS     F
    Ad expenditure   41.13  1   41.13  239.13
    Error            0.86   5   0.172  –
    Total            41.99  6   –      –
Coefficient of Determination R²
Recall that
- SSTO measures the variation in the Yi about Ȳ (which does not take Xi into account).
- SSE measures the variation in the Yi after accounting for the linear relationship between X and Y.
- SSTO − SSE = SSR is a measure of the reduction in variation due to the regression of Y on X.
Define the coefficient of determination as
    R² = SSR/SSTO = 1 − SSE/SSTO.
Coefficient of Determination R²
In the advertising example,
    R² = 41.13/41.99 = 0.9791.
Interpret R² as the proportion of variation in the Yi's explained by the linear regression relationship between X and Y.
0 ≤ R² ≤ 1.
Reported as the “Multiple R-squared” in R summary output.
Relation to the sample correlation coefficient, for the simple linear regression model (only):
    r = sign(β̂1) √R².
Can you show that?
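A sketch of this relation in Python; note the slides' 0.9791 uses unrounded sums of squares, so the rounded inputs here give approximately 0.9795:

```python
from math import sqrt, copysign

# R^2 and its relation to the sample correlation (advertising example)
SSR, SSTO, b1 = 41.13, 41.99, 1.356   # rounded values from the slides
R2 = SSR / SSTO                       # = 1 - SSE/SSTO; slides report 0.9791
r = copysign(sqrt(R2), b1)            # r = sign(b1-hat) * sqrt(R^2)
print(round(R2, 4), round(r, 3))
```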
Limitations of R²
A high R² does not guarantee that useful predictions can be made.
A low R² does not imply a lack of association.
- You can get R² near zero even when there is a strong (or perfect) relationship between X and Y.
  * The sample correlation only measures a LINEAR relationship.
  * E.g., X ∼ N(0, 1), Y = sin(X²); try it yourself.
- Outliers.
Alternative measures: Spearman correlation (using nonparametric statistics), lowess R², etc.
Correlation Analysis
Correlation analysis (Section 2.11 of KNNL) is closely related to regression analysis.
Regression analysis:
- One variable is the response Y.
- One variable is the predictor X.
- Model the conditional distribution of Y given X.
- The distribution of X is not relevant.
- Y given X and X given Y are not the same.
Correlation analysis:
- Both variables are response variables.
- Want to measure the association between two variables.
- ρX,Y and ρY,X are the same.
Bivariate Normal Distribution
The bivariate normal distribution is an example of a joint distribution for two continuous random variables.
We say X and Y have a bivariate normal distribution with parameters µx, µy, σx², σy², ρ if the probability density is
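The density is left blank on the slide; for reference, the standard bivariate normal density is:

```latex
f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
\exp\!\left\{ -\frac{1}{2(1-\rho^2)}
\left[ \left(\frac{x-\mu_x}{\sigma_x}\right)^2
- 2\rho \left(\frac{x-\mu_x}{\sigma_x}\right)\!\left(\frac{y-\mu_y}{\sigma_y}\right)
+ \left(\frac{y-\mu_y}{\sigma_y}\right)^2 \right] \right\}.
```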
Interpretation of parameters:
1 µx = mean of X
2 µy = mean of Y
3 σx² = variance of X
4 σy² = variance of Y
5 ρ = correlation of X and Y
Definition of the Correlation Coefficient
ρ = Cov(X, Y)/(σxσy) is called the correlation coefficient.
Recall that Var(X) = E{(X − µx)²} and Cov(X, Y) = E{(X − µx)(Y − µy)}.
Properties:
1 ρ ∈ [−1, 1].
2 |ρ| = 1 implies that X and Y have a perfect linear relationship (perfect correlation).
3 Independence of X and Y implies that ρ = 0.
4 Conversely, ρ = 0 implies independence only when (X, Y) is bivariate normal.
Bivariate Normal Distribution
The density is constant on ellipses (contour plots).
The marginal distributions are normal: X ∼ N(µx, σx²) and Y ∼ N(µy, σy²).
The conditional distributions are normal: Y | X = x ∼ N(µy + ρ(σy/σx)(x − µx), σy²(1 − ρ²)), and symmetrically for X | Y = y.
Note the relationship of the conditional distribution of Y given X = x to the simple linear regression model.
Bivariate Normal Distribution
Two motivations for simple linear regression:
1 Bivariate normal observations.
2 x is fixed (not necessarily normal) and Y | x is normal.
We can relate the bivariate normal and simple linear regression parameters:
    β1 = ρ σy/σx,   β0 = µy − β1µx,   σ² = σy²(1 − ρ²).
Inference for a Correlation Coefficient
The maximum likelihood estimate is the sample correlation coefficient (Pearson correlation).
This estimate replaces population quantities with sample quantities.
Test H0: ρ = 0 versus H1: ρ ≠ 0.
- Equivalent to testing β1 = 0 in regression.
- t = r√(n − 2)/√(1 − r²) has a t-distribution with n − 2 df.
- This is exactly the t-test for H0: β1 = 0.
Inference for a Correlation Coefficient
The remaining inference procedures assume bivariate normality and rely on Fisher's z-transformation:
    Z = (1/2) log((1 + r)/(1 − r)),
    Z ∼ N( (1/2) log((1 + ρ)/(1 − ρ)), 1/(n − 3) ).
Var(Z) does not depend on ρ!
Good approximation when n > 25. Why?
Inference for a Correlation Coefficient
Construct a 100(1 − α)% confidence interval for ρ.
The CI for (1/2) log((1 + ρ)/(1 − ρ)) is Z ± z(1 − α/2) √(1/(n − 3)).
Obtain an approximate confidence interval for ρ by applying the inverse transformation to the endpoints of the previous confidence interval.
Inference for a Correlation Coefficient
Test H0: ρ = ρ0 versus H1: ρ ≠ ρ0.
The test statistic is
    √(n − 3) [ (1/2) log((1 + r)/(1 − r)) − (1/2) log((1 + ρ0)/(1 − ρ0)) ].
Obtain the p-value from comparison to a standard normal distribution.
Example: Yields of Broadbalk Wheat (bu/acre)
Source: R. A. Fisher, Statistical Methods for Research Workers, 14th ed., p. 137.
The same two plots were used in each of n = 12 years.
- Plot 1: fertilized with nitrate of soda; Xi = yield in the i-th year.
- Plot 2: same amount of N as sulfate of ammonia; Yi = yield in the i-th year.
Grain Example
Summary statistics:
Sample size: n = 12.
Sample means: x̄ = 35.1825 and ȳ = 29.3541.
Sums of squares: Σ_{i=1}^{12} (xi − x̄)² = 346.184, Σ_{i=1}^{12} (yi − ȳ)² = 612.285.
Sum of cross products: Σ_{i=1}^{12} (xi − x̄)(yi − ȳ) = 238.5449.
Sample correlation: r = 0.518.
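The sample correlation follows directly from these sums; a sketch in Python:

```python
from math import sqrt

# Sample correlation from the summary statistics (grain example)
Sxx = 346.184        # sum of (xi - xbar)^2
Syy = 612.285        # sum of (yi - ybar)^2
Sxy = 238.5449       # sum of (xi - xbar)(yi - ybar)
r = Sxy / sqrt(Sxx * Syy)
print(round(r, 3))
```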
Test H0: ρ = 0 versus H1: ρ ≠ 0
t = (0.518 √(12 − 2))/√(1 − 0.518²) = 1.92; compare with t_{12−2} = t_{10}; the p-value is 0.0844.
Conclusion: there is some evidence of a positive correlation in yields, but it is not conclusive. Why?
- n = 12 is a small sample size.
- The pair of yields observed in 1877 is somewhat inconsistent with the pattern observed in other years. Check the accuracy of the 1877 data.
Confidence Interval for ρ
Apply the Fisher z-transformation:
    z = (1/2) log((1 + 0.518)/(1 − 0.518)) = 0.5736,
    z_lower = 0.5736 − 1.96 √(1/9) = −0.0797,
    z_upper = 0.5736 + 1.96 √(1/9) = 1.2269.
Apply the inverse transformation:
    ( [−1 + exp(2 × (−0.0797))]/[1 + exp(2 × (−0.0797))], [−1 + exp(2 × 1.2269)]/[1 + exp(2 × 1.2269)] ) ⇒ (−0.0795, 0.8417).
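The full interval computation can be scripted; a sketch in Python (r = 0.518 and n = 12 from the example):

```python
from math import log, exp, sqrt

# 95% CI for rho via Fisher's z-transformation (grain example)
r, n = 0.518, 12
z = 0.5 * log((1 + r) / (1 - r))     # Fisher z = atanh(r)
half = 1.96 * sqrt(1 / (n - 3))      # normal critical value times SD of Z
lo, hi = z - half, z + half

def inv(z):                          # inverse transform: tanh(z)
    return (exp(2 * z) - 1) / (exp(2 * z) + 1)

print(round(z, 4), round(inv(lo), 4), round(inv(hi), 4))
```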