
Mahler’s Guide to

Regression

Sections 1-4:

1 Fitting a Straight Line with No Intercept
2 Fitting a Straight Line with an Intercept
3 Residuals
4 Dividing the Sum of Squares into Two Pieces

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-A

New England Actuarial Seminars
Howard Mahler
POB 315, Sharon, MA 02067
[email protected]
www.neas-seminars.com


Mahler’s Guide to Regression
Copyright 2006 by Howard C. Mahler.

While these study guides were written for the VEE-Applied Statistical Methods Exam given by the Casualty Actuarial Society, they should be of value to anyone learning regression. They should also help those trying to refresh their memories about a particular idea. No knowledge specific to actuarial work is assumed.

The material on the regression portion of the VEE-Applied Statistical Methods Exam is covered.1 The material on time series, which is also on this exam, is not covered.2

Information in bold or sections whose title is in bold are more important for passing the exam. Larger bold type indicates it is extremely important.

Starred sections, subsections, and questions should not be needed to directly answer exam questions and should be skipped on first reading. They are provided to aid the reader’s overall understanding of the subject, and to be useful in practical applications.

For those who have trouble getting through the material, concentrate on the sections in bold.

Highly Recommended problems (about 1/6 of the total) are double underlined. Recommended problems (about 1/6 of the total) are underlined. Do at least the Highly Recommended problems your first time through. It is important that you do problems when learning a subject, and then some more problems a few weeks later.

The points assigned to each problem are based on 100 points for a four-hour exam. 1-point problems are shorter than typical exam questions. 2- and 3-point problems are similar in length to typical exam questions. 4-point problems are longer than typical exam questions. Solutions to problems are given at the end.3

The following tables will be provided to the candidate with the exam: Normal Distribution, Chi-square Distribution, t-Distribution, and F-Distribution.

1 Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld. Chapters 1, 3, 4, 5, 6 (excluding Appendix 6.1), and Sections 8.1, 8.2, 10.1. Sections 8.1, 8.2, and 10.1, covered in my Sections 29-31 and 38, were added to the syllabus in 2005.
2 Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld. Chapters 15, 16 (excluding Appendix 16.1), 17 (excluding Appendix 17.1), and 18, cover time series.
3 Note that problems include both some written by me and some from past exams. The latter are copyright by the Casualty Actuarial Society and Society of Actuaries, or the Institute of Actuaries and Faculty of Actuaries, and are reproduced here solely to aid students in studying for exams. The solutions and comments are solely the responsibility of the author. While some of the comments may seem critical of certain questions, this is intended solely to aid you in studying and in no way is intended as a criticism of the many volunteers who work extremely long and hard to produce quality exams. In some cases I’ve rewritten these questions in order to match the notation in the current Syllabus.


Section #   Pages     Section Name

A   1    5-17      Fitting a Straight Line with No Intercept
    2    18-29     Fitting a Straight Line with an Intercept
    3    30-32     Residuals
    4    33-41     Dividing the Sum of Squares into Two Pieces

B   5    42-51     R-Squared
    6    52-59     Corrected R-Squared
    7    60-67     Normal Distribution
    8    68-71     Assumptions of Linear Regression
    9    72-84     Properties of Estimators

C   10   85-95     Variances and Covariances
    11   96-102    t-Distribution
    12   103-114   t-test
    13   115-121   Confidence Intervals for Estimated Parameters

D   14   122-135   F Distribution
    15   136-147   Testing the Slope, Two Variable Model
    16   148-155   Hypothesis Testing
    17   156-158   A Simulation Experiment *

E   18   159-167   Three Variable Regression Model
    19   168-180   Matrix Form of Multiple Regression
    20   181-193   Tests of Slopes, Multiple Regression
    21   194-207   Additional Tests of Slopes

F   22   208-225   Additional Models
    23   226-236   Dummy Variables
    24   237-240   Piecewise Linear Regression

G   25   241-249   Weighted Regression
    26   250-256   Heteroscedasticity
    27   257-263   Tests for Heteroscedasticity
    28   264-274   Correcting for Heteroscedasticity

H   29   275-283   Serial Correlation
    30   284-295   Durbin-Watson Statistic
    31   296-302   Correcting for Serial Correlation
    32   303-306   Multicollinearity

I   33   307-323   Forecasting
    34   324-332   Testing Forecasts
    35   333-342   Forecasting with Serial Correlation

J   36   343-349   Standardized Coefficients
    37   350-354   Elasticity
    38   355-360   Partial Correlation Coefficients
    39   361-367   Regression Diagnostics *
    40   368       Stepwise Regression *
    41   369       Stochastic Explanatory Variables *
    42   370-373   Generalized Least Squares *
    43   374-385   Nonlinear Estimation

K   44   386-399   Generalized Linear Models *
    45   400-415   Important Ideas and Formulas

L        416-447   Solutions to Problems, Sections 1-12
M        448-487   Solutions to Problems, Sections 13-21
N        488-529   Solutions to Problems, Sections 22-32
O        530-569   Solutions to Problems, Sections 33-44

The CAS/SOA did not release the 5/02, 5/03, and 5/04 exams. Only the first VEE exam was released.

Sample Exam Q.40: statements C, D, and E are from chapter 7 of Pindyck & Rubinfeld, no longer on the Syllabus. 5/00, Q.16 and 11/00 Q.35 can be answered using ideas specifically discussed in chapter 7 of Pindyck & Rubinfeld, no longer on the syllabus. However, they can also be answered from first principles.


Course 4 and VEE Exam Questions by Section of this Study Aid

Section   Sample   5/00   11/00   5/01   11/01   11/02   11/03   11/04   VEE 8/05

[The body of this cross-reference table did not survive conversion and is omitted here. Each past exam question is in any case labeled with its exam and question number where it appears in the problem sets below.]


Section 1, Fitting a Straight Line with No Intercept

Assume we have the following heights of eight fathers and their adult sons (in inches):4

Father   Son
53       56
54       58
57       61
58       60
61       63
62       62
63       65
66       64

Here is a graph of this data:

[Figure: scatter plot of son’s height (vertical axis, 56 to 64 inches) against father’s height (horizontal axis, 54 to 66 inches).]

There appears to be a relationship between the height of the father, X, and the height of his son, Y. A taller father seems to be more likely to have a taller son.

4 There are only 8 pairs of observations solely in order to keep things simple.


Straight Line with No Intercept:

Let us assume Y = βX. We want to determine the “best” value of β. The most common way to do so is to minimize the sum of the squared differences between the height of each son estimated by our equation, βXi, and the actual height of that son, Yi.5

Sum of Squared Errors = Σ(Yi - βXi)².

Exercise: If β = 1.01, what is the sum of squared errors?
[Solution: Σ(Yi - βXi)² = (56 - 53.53)² + (58 - 54.54)² + (61 - 57.57)² + (60 - 58.58)² + (63 - 61.61)² + (62 - 62.62)² + (65 - 63.63)² + (64 - 66.66)² = 43.12.]

Here is a graph of the sum of squared errors, as a function of β:

[Figure: plot of the sum of squared errors (vertical axis, roughly 35 to 55) as a function of β (horizontal axis, 1.01 to 1.06).]

The smallest sum of squared errors corresponds to β ≅ 1.03. We refer to 1.03 as the least squares estimate of the slope, β.

We determine the least squares estimate of β algebraically, by setting equal to zero the partial derivative with respect to β of the sum of squared errors:6

0 = ∂Σ(Yi - βXi)²/∂β = -2Σ(Yi - βXi)Xi. ⇒ 0 = ΣXiYi - βΣXi². ⇒ βΣXi² = ΣXiYi. ⇒ β = ΣXiYi / ΣXi².

5 Minimizing squared differences is not the only criterion that could be used. For example, one could instead minimize the absolute differences, which produces different results. See Figure 1.3 in Pindyck and Rubinfeld. See also Section 12.4.2 in Loss Models.
6 We treat β as the only variable.


Exercise: Use the above equation in order to determine the least squares estimate of β.
[Solution: ΣXiYi = (53)(56) + ... + (66)(64) = 29063. ΣXi² = 53² + ... + 66² = 28228.
Estimate of β = ΣXiYi / ΣXi² = 29063/28228 = 1.02958 ≅ 1.03.]

The estimated value of beta is usually written with a ^ over it, ^β. In this case ^β = 1.03. Estimated values of other quantities are written in a similar manner.

We do not expect any model to exactly predict the height of a son from the height of his father; therefore we include an error term in the model. The model we have been using is usually written Y = βX + ε, or Yi = βXi + εi, where εi is an error term.

In general, for the least squares fit to the linear model with no intercept, Y = βX + ε:

^β = ΣXiYi / ΣXi².
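As a quick illustration (mine, not part of the original guide), here is a minimal Python sketch that applies this formula to the father-and-son heights above and reproduces ^β ≅ 1.03.

# Least squares slope for the no-intercept model Y = beta * X,
# using the eight father/son heights from the text.
fathers = [53, 54, 57, 58, 61, 62, 63, 66]   # X_i
sons    = [56, 58, 61, 60, 63, 62, 65, 64]   # Y_i

sum_xy = sum(x * y for x, y in zip(fathers, sons))   # Sum of X_i * Y_i = 29063
sum_xx = sum(x * x for x in fathers)                 # Sum of X_i^2   = 28228

beta_hat = sum_xy / sum_xx
print(sum_xy, sum_xx, round(beta_hat, 5))            # 29063 28228 1.02958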

Here is a graph of the least squares line fit to the data on heights, with ^β = 1.03:

[Figure: the data on heights together with the fitted line Son = 1.03 × Father; father’s height on the horizontal axis (54 to 66 inches), son’s height on the vertical axis (54 to 68 inches).]


Residuals:

The estimated height of the sons is written as ^Y: ^Yi = 1.03Xi. The difference between each son’s height and his height estimated by the model is the error, referred to as the residual.

Residual = actual - estimated.

The residual for son i is written as ^εi. ^εi ≡ Yi - ^Yi.

Exercise: What are the residuals for the fitted model ^Yi = 1.03Xi?
[Solution: ^εi = 56 - 54.59, 58 - 55.62, 61 - 58.71, 60 - 59.74, 63 - 62.83, 62 - 63.86, 65 - 64.89, 64 - 67.98 = 1.41, 2.38, 2.29, .26, .17, -1.86, .11, -3.98.
Comment: Note that these residuals do not sum to zero.7]

Here is a plot of these residuals:

[Figure: plot of these residuals (vertical axis, -4 to 2) against father’s height (horizontal axis, 54 to 66 inches).]

Exercise: For the fitted model ^Yi = 1.03Xi, what is the sum of squared errors?
[Solution: Σ^εi² = 1.41² + 2.38² + 2.29² + .26² + .17² + (-1.86)² + .11² + (-3.98)² = 32.3.
Comment: This matches the result shown previously in a graph.]

The sum of squared errors is referred to as the Error Sum of Squares, or ESS.8

ESS ≡ Σ^εi² = Σ(Yi - ^Yi)².

In this case, ESS = 32.3.

7 As will be discussed later, when there is an intercept, the residuals sum to zero.
8 ESS is the sum of squared errors for a fitted model, as opposed to the sum of squared errors for any value of β.
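To make this concrete, here is a short Python sketch (my own illustration, not from the guide) that reproduces the residuals and the ESS of about 32.3 for ^β = 1.03, and confirms that these residuals do not sum to zero.

# Residuals and Error Sum of Squares for the no-intercept fit Y-hat = 1.03 * X.
fathers = [53, 54, 57, 58, 61, 62, 63, 66]   # X_i
sons    = [56, 58, 61, 60, 63, 62, 65, 64]   # Y_i
beta_hat = 1.03

fitted    = [beta_hat * x for x in fathers]           # Y-hat_i
residuals = [y - f for y, f in zip(sons, fitted)]     # eps-hat_i = Y_i - Y-hat_i

print([round(e, 2) for e in residuals])          # [1.41, 2.38, 2.29, 0.26, 0.17, -1.86, 0.11, -3.98]
print(round(sum(residuals), 2))                  # 0.78 -- the residuals do not sum to zero
print(round(sum(e * e for e in residuals), 1))   # 32.3 -- the Error Sum of Squares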


Unbiased Estimator:

For the one variable linear regression model with no intercept, Yi = βXi + εi:

We assume E[εi] = 0; each error term has a mean of zero.

Then E[Yi] = E[βXi + εi] = βXi.

^β = ΣXiYi / ΣXi².

E[^β] = ΣXiE[Yi] / ΣXi² = ΣXiβXi / ΣXi² = βΣXi² / ΣXi² = β.

Thus, ^β is an unbiased estimator of the slope β.

Expected Value of Residuals:

^εi = Yi - ^Yi = Yi - ^βXi.

E[^εi] = E[Yi] - E[^β]Xi = βXi - βXi = 0.

Thus the expected value of each residual is zero.

However, it is important to note that the observed residuals will usually be nonzero. One is interested in the variance of each residual around its expected value.

Variances of Residuals:*9

Assume we have:
i   Xi   Var(εi)
1   1    1
2   3    5
3   8    10

Exercise: Fit the model Yi = βXi + εi.
[Solution: ^β = ΣXiYi / ΣXi² = (Y1 + 3Y2 + 8Y3)/74.]

^ε1 = Y1 - X1^β = Y1 - (1)(Y1 + 3Y2 + 8Y3)/74 = (73Y1 - 3Y2 - 8Y3)/74.

9 See 4, 11/03, Q.29.


Assuming εi and εj are independent,10 then Yi and Yj are independent.

Var[^ε1] = Var[(73Y1 - 3Y2 - 8Y3)/74] = (73²Var[ε1] + 3²Var[ε2] + 8²Var[ε3])/74² = (73²(1) + 3²(5) + 8²(10))/74² = 1.098.

Note that since E[^ε1] = 0, E[^ε1²] = Var[^ε1] = 1.098.

Exercise: What is Var[^ε2]?
[Solution: ^ε2 = Y2 - X2^β = Y2 - (3)(Y1 + 3Y2 + 8Y3)/74 = (65Y2 - 3Y1 - 24Y3)/74.
Var[^ε2] = Var[(65Y2 - 3Y1 - 24Y3)/74] = (65²Var[ε2] + 3²Var[ε1] + 24²Var[ε3])/74² = (65²(5) + 3²(1) + 24²(10))/74² = 4.911.]

Formula for the Variance of the Residuals:*

One can derive a general formula for Var[^εi] as follows.

E[Yi²] = Var[Yi] + E[Yi]² = Var[εi] + β²Xi².

Yi and Yj are independent ⇒ E[YiYj] = Cov[Yi, Yj] + E[Yi]E[Yj] = 0 + βXiβXj = β²XiXj, i ≠ j.

^β = ΣXiYi / ΣXi².

E[^β²] = E[ΣΣXiYiXjYj]/(ΣXi²)² = ΣXi²Var[εi]/(ΣXi²)² + ΣΣβ²Xi²Xj²/(ΣXi²)² = ΣXi²Var[εi]/(ΣXi²)² + β².

E[Yj^β] = E[YjΣXiYi]/ΣXi² = XjVar[εj]/ΣXi² + β²ΣXjXi²/ΣXi² = XjVar[εj]/ΣXi² + Xjβ².

^εi = Yi - ^βXi.

E[^εi²] = E[Yi²] + Xi²E[^β²] - 2XiE[Yi^β] = Var[εi] + β²Xi² + Xi²ΣXj²Var[εj]/(ΣXj²)² + Xi²β² - 2Xi²Var[εi]/ΣXj² - 2Xi²β².

Var[^εi] = E[^εi²] = Var[εi] + Xi²ΣXj²Var[εj]/(ΣXj²)² - 2Xi²Var[εi]/ΣXj².

10 In the absence of serial correlation, we assume that the error terms are independent. Serial correlation will be discussed in a subsequent section.


Exercise: What is Var[^ε3]?
[Solution: Var[^ε3] = Var[ε3] + X3²ΣXj²Var[εj]/(ΣXj²)² - 2X3²Var[ε3]/ΣXj² = 10 + 8²{(1²)(1) + (3²)(5) + (8²)(10)}/74² - (2)(8²)(10)/74 = .720.
Alternately, ^ε3 = Y3 - X3^β = Y3 - (8)(Y1 + 3Y2 + 8Y3)/74 = (10Y3 - 8Y1 - 24Y2)/74.
Var[^ε3] = Var[(10Y3 - 8Y1 - 24Y2)/74] = (10²Var[ε3] + 8²Var[ε1] + 24²Var[ε2])/74² = (10²(10) + 8²(1) + 24²(5))/74² = .720.]
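As a numerical check (my own, not part of the guide), the following Python sketch plugs the example’s X values and error variances into the general formula for Var[^εi] and reproduces the three variances computed above.

# Var[eps-hat_i] = Var[eps_i] + X_i^2 * (Sum_j X_j^2 Var[eps_j]) / (Sum_j X_j^2)^2
#                            - 2 * X_i^2 * Var[eps_i] / Sum_j X_j^2,
# for the example X = (1, 3, 8) with Var(eps) = (1, 5, 10).
X       = [1, 3, 8]
var_eps = [1, 5, 10]

sxx  = sum(x * x for x in X)                         # Sum of X_j^2 = 74
sxxv = sum(x * x * v for x, v in zip(X, var_eps))    # Sum of X_j^2 * Var[eps_j] = 686

var_resid = [v + x * x * sxxv / sxx ** 2 - 2 * x * x * v / sxx
             for x, v in zip(X, var_eps)]
print([round(v, 3) for v in var_resid])              # [1.098, 4.911, 0.72]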

If all of the Var[εi] are equal, Var[εi] = σ², then:11

Var[^εi] = Var[εi] + Xi²ΣXj²Var[εj]/(ΣXj²)² - 2Xi²Var[εi]/ΣXj² = σ² + Xi²ΣXj²σ²/(ΣXj²)² - 2Xi²σ²/ΣXj² = σ²(1 - Xi²/ΣXj²) = σ²(Σj≠i Xj²)/ΣXj².

E[^εi²] = Var[^εi] = σ²(1 - Xi²/ΣXj²).

E[ESS] = E[Σ^εi²] = ΣE[^εi²] = Σi σ²(1 - Xi²/ΣXj²) = σ²(N - 1).

Thus ESS/(N - 1) is an unbiased estimator of σ².

Covariances of Residuals:*

E[^ε1^ε2] = E[(Y1 - ^βX1)(Y2 - ^βX2)] = E[Y1Y2] + X1X2E[^β²] - X2E[Y1^β] - X1E[Y2^β] =
β²X1X2 + X1X2ΣXi²Var[εi]/(ΣXi²)² + X1X2β² - X1X2Var[ε1]/ΣXi² - X1X2Var[ε2]/ΣXi² - 2X1X2β² =
X1X2ΣXi²Var[εi]/(ΣXi²)² - X1X2(Var[ε1] + Var[ε2])/ΣXi².

Cov[^ε1, ^ε2] = E[^ε1^ε2] - E[^ε1]E[^ε2] = X1X2ΣXi²Var[εi]/(ΣXi²)² - X1X2(Var[ε1] + Var[ε2])/ΣXi².

In the example, Cov[^ε1, ^ε2] = (1)(3)(686)/74² - (1)(3)(1 + 5)/74 = .1326.

Corr[^ε1, ^ε2] = .1326/√((1.098)(4.911)) = .057.12

In general, Cov[^εi, ^εj] = XiXjΣXk²Var[εk]/(ΣXk²)² - XiXj(Var[εi] + Var[εj])/ΣXk².

11 Homoscedasticity is the term used for the situation in which all of the error terms have the same variance. Homoscedasticity and heteroscedasticity will be discussed in a subsequent section.
12 While ε1 and ε2 are independent, the same is not true of the observed residuals.


Exercise: What is Corr[^ε1, ^ε3]?
[Solution: Cov[^ε1, ^ε3] = (1)(8)(686)/74² - (1)(8)(1 + 10)/74 = -.1870.
Corr[^ε1, ^ε3] = -.1870/√((1.098)(.720)) = -.210.]

Exercise: What is Corr[^ε2, ^ε3]?
[Solution: Cov[^ε2, ^ε3] = (3)(8)(686)/74² - (3)(8)(5 + 10)/74 = -1.8583.
Corr[^ε2, ^ε3] = -1.8583/√((4.911)(.720)) = -.988.]

For this example, the variance-covariance matrix of the residuals is:

(  1.098    .133   -.187 )
(   .133   4.911  -1.858 )
(  -.187  -1.858    .720 )

If all of the Var[εi] are equal, Var[εi] = σ², then:

Cov[^εi, ^εj] = XiXjΣXk²σ²/(ΣXk²)² - XiXj(σ² + σ²)/ΣXk² = -σ²XiXj/ΣXk².

Corr[^εi, ^εj] = -XiXj/√((Σk≠i Xk²)(Σk≠j Xk²)).

Simulation:

Assume we have Yi = 2Xi + εi, with εi independent and Normal with mean zero, and:

i   Xi   Var(εi)
1   1    1
2   3    5
3   8    10

We can simulate this situation as follows:
1. Simulate ε1, ε2, and ε3.
2. Calculate Yi = 2Xi + εi.
3. Fit a regression, ^β = ΣXiYi / ΣXi².
4. Calculate ^Yi = ^βXi.
5. Calculate ^εi = Yi - ^Yi.


For example, let -1.272, -.620, and .574, be 3 independent random Standard Normals. ε1 = -1.272√1 = -1.272. ε2 = -.620√5 = -1.386. ε3 = .574√10 = 1.815.

Y1 = 2X1 + ε1 = (2)(1) - 1.272 = .728. Y2 = (2)(3) - 1.386 = 4.614. Y3 = (2)(8) + 1.815 = 17.815.

^β = ΣXiYi / ΣXi² = 157.1/74 = 2.123.

^Y1 = (2.123)(1) = 2.123. ^Y2 = (2.123)(3) = 6.369. ^Y3 = (2.123)(8) = 16.984.

^ε1 = .728 - 2.123 = -1.395. ^ε2 = 4.614 - 6.369 = -1.755. ^ε3 = 17.815 - 16.984 = .831.
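The five steps above can be carried out in a few lines of Python. This sketch (my addition) reruns the worked example with the same three standard normal draws; small differences in the last digit relative to the text come from intermediate rounding.

import math

X       = [1, 3, 8]
var_eps = [1, 5, 10]
beta    = 2.0
z       = [-1.272, -0.620, 0.574]                          # step 1: standard normal draws (from the text)

eps    = [zi * math.sqrt(v) for zi, v in zip(z, var_eps)]  # scale each draw to Var(eps_i)
Y      = [beta * x + e for x, e in zip(X, eps)]            # step 2: Y_i = 2 X_i + eps_i
b_hat  = sum(x * y for x, y in zip(X, Y)) / sum(x * x for x in X)   # step 3: fitted slope
fitted = [b_hat * x for x in X]                            # step 4: Y-hat_i
resid  = [y - f for y, f in zip(Y, fitted)]                # step 5: eps-hat_i

print(round(b_hat, 3))                # 2.123
print([round(r, 3) for r in resid])   # [-1.395, -1.755, 0.832]; the text's 0.831 reflects rounding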

Exercise: Let 2.388, -.849, and -2.315, be 3 independent random Standard Normals. Simulate the above situation and determine the residuals.
[Solution: ε1 = 2.388√1 = 2.388. ε2 = -.849√5 = -1.898. ε3 = -2.315√10 = -7.321.
Y1 = (2)(1) + 2.388 = 4.388. Y2 = (2)(3) - 1.898 = 4.102. Y3 = (2)(8) - 7.321 = 8.679.
^β = ΣXiYi / ΣXi² = 86.126/74 = 1.164.
^Y1 = (1.164)(1) = 1.164. ^Y2 = (1.164)(3) = 3.492. ^Y3 = (1.164)(8) = 9.312.
^ε1 = 4.388 - 1.164 = 3.224. ^ε2 = 4.102 - 3.492 = .610. ^ε3 = 8.679 - 9.312 = -.633.]

Notice that each time we perform this simulation we get a different set of Yi's, a different fitted slope, and a different set of residuals. If we ran this simulation 1000 times, we would get a set of 1000 different values for ^ε1. Var[^ε1] measures the variance of ^ε1 around its expected value of zero.

If we ran this simulation 1000 times, we would also get a set of 1000 different values for ^β.

Var[^β] measures the variance of ^β around its expected value of β = 2.13

13 The variance of fitted regression parameters will be discussed subsequently.


Problems:

Use the following 4 observations for the next 3 questions:
X: 4 7 13 19
Y: 5 15 22 35

1.1 (1 point) Via least squares, fit to the above observations the following model: Y = βX + ε.
What is the fitted value of β?
(A) 1.4 (B) 1.5 (C) 1.6 (D) 1.7 (E) 1.8

1.2 (2 points) For the model fit in the previous question, what is the Error Sum of Squares?
(A) 11 (B) 12 (C) 13 (D) 14 (E) 15

1.3 (2 points) For the model Y = 2X, what is the sum of squared errors?
(A) 30 (B) 35 (C) 40 (D) 45 (E) 50

1.4 (2 points) You are given:
(i) The model is Yi = βXi + εi, i = 1, 2, 3.
(ii)  i   Xi   Var(εi)
      1   1    1
      2   5    2
      3   10   4
(iii) The ordinary least squares residuals are ^εi = Yi - ^βXi, i = 1, 2, 3.
Determine E(^ε2² | X1, X2, X3).
(A) 1.7 (B) 1.8 (C) 1.9 (D) 2.0 (E) 2.1

1.5 (1 point) Via ordinary least squares, the model Y = βX + ε is fit to the following data:
X: 1 5 10 25
Y: 5 15 50 100
Determine ^β.
(A) 3.9 (B) 4.0 (C) 4.1 (D) 4.2 (E) 4.3

1.6 (2 points) You are given the following data on the appraised values and sale prices of six homes, in thousands of dollars:
Appraised Value: 170 213 68 66 96 137
Sale Price: 180 245 85 88 132 156
Fit a least squares line with no intercept. What is the estimated sale price of a home appraised at 300?
(A) 340 (B) 342 (C) 344 (D) 346 (E) 348


1.7 (2 points) Fit a least squares line with no intercept to the following data:
X: -2 -1 0 1 2 3 4 5
Y: -12 -7 0 6 14 21 24 31
What is the slope of the fitted line?
(A) 6.1 (B) 6.2 (C) 6.3 (D) 6.4 (E) 6.5

1.8 (3 points) You are given the following information on the SAT scores for 10 students.
English: 630 700 540 610 580 670 710 630 580 760
Math: 570 710 570 580 610 640 660 640 670 720
Fit via least squares the model: Math Score = β(English Score).
What is the fitted value of β?
A. 0.98 B. 0.99 C. 1.00 D. 1.01 E. 1.02

1.9 (1 point) Given the following information:
ΣXi = -1015. ΣYi = -1410. ΣXi² = 191,711. ΣYi² = 123,526. ΣXiYi = 36,981. n = 20.
Determine the value of β fitted via least squares for the following model: Yi = βXi + ε.
A. Less than 0.18
B. At least 0.18, but less than 0.19
C. At least 0.19, but less than 0.20
D. At least 0.20, but less than 0.21
E. 0.21 or more

1.10 (2, 5/85, Q.19) (1.5 points) For the data (x1, y1) = (1, 2) and (x2, y2) = (5, 3) and the model E(Y) = βx, the least squares estimate of β is:
A. 1/4 B. 17/26 C. 17/13 D. 17/6 E. 4

*1.11 (4, 5/00, Q.16) (2.5 points) You are given:
(i) x1 = -2, x2 = -1, x3 = 0, x4 = 1, x5 = 2.
(ii) The true model for the data is y = 10x + 3x² + ε.
(iii) The model fitted to the data is y = β*x + ε*.
Determine the expected value of the least-squares estimator of β*.
(A) 6 (B) 7 (C) 8 (D) 9 (E) 10

*1.12 (4, 11/00, Q.35) (2.5 points) You are analyzing a large set of observations from a population.
The true underlying model is: y = 0.1t - z + ε.
You fit a two-variable model to the observations, obtaining: y = 0.3t + ε*.
You are given: Σt = 0. Σt² = 16. Σz = 0. Σz² = 9.
Estimate the correlation coefficient between z and t.
(A) -0.7 (B) -0.6 (C) -0.5 (D) -0.4 (E) -0.3


1.13 (IOA 101, 9/03, Q.14) (12 points) Consider a linear regression model in which responses Yi are uncorrelated and have expectations βXi and common variance σ² (i = 1, ..., n); i.e., Yi is modeled as a linear regression through the origin:
E(Yi | Xi) = βXi and V(Yi | Xi) = σ² (i = 1, ..., n).
(i) (3.75 points) (a) Show that the least squares estimator of β is ^β1 = ΣXiYi/ΣXi².
(b) Derive the expectation and variance of ^β1 under the model.
(ii) (3 points) An alternative to the least squares estimator in this case is: ^β2 = ΣYi/ΣXi = Ȳ/X̄.
(a) Derive the expectation and variance of ^β2 under the model.
(b) Show that the variance of the estimator ^β2 is at least as large as that of the least squares estimator ^β1.
(iii) (5.25 points) Now consider an estimator ^β3 of β which is a linear function of the responses; i.e., an estimator which has the form ^β3 = ΣaiYi, where a1, ..., an are constants.
(a) Show that ^β3 is unbiased for β if ΣaiXi = 1, and that the variance of ^β3 is Σai²σ².
(b) Show that the estimators ^β1 and ^β2 above may be expressed in the form ^β3 = ΣaiYi and hence verify that ^β1 and ^β2 satisfy the condition for unbiasedness in (iii)(a).
(c) It can be shown that, subject to the condition ΣaiXi = 1, the variance of ^β3 is minimized by setting ai = Xi/ΣXi². Comment on this result.

1.14 (4, 11/03, Q.29) (2.5 points) You are given:
(i) The model is Yi = βXi + εi, i = 1, 2, 3.
(ii)  i   Xi   Var(εi)
      1   1    1
      2   2    9
      3   3    16
(iii) The ordinary least squares residuals are ^εi = Yi - ^βXi, i = 1, 2, 3.
Determine E(^ε1² | X1, X2, X3).
(A) 1.0 (B) 1.8 (C) 2.7 (D) 3.7 (E) 7.6


1.15 (VEE-Applied Statistics Exam, 8/05, Q.4) (2.5 points) You are given: Yi = β + βXi + εi.
Determine the least-squares estimate of β.
(A) ΣYi / ΣXi
(B) ΣYi / Σ(1 + Xi)
(C) ΣXiYi / ΣXi²
(D) Σ(1 + Xi)Yi / Σ(1 + Xi)²
(E) Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²


Section 2, Fitting a Straight Line with an Intercept

In the previous section we fit a straight line with no intercept, to the heights of fathers and their sons. In this section we will include an intercept in the model.

Let us assume Y = α + βX + ε, where X is the height of the father and Y is the height of his son. This model with one independent variable and one intercept, is called the two-variable regression model. We want to determine the best values of α and β, those that minimize the sum of the squared differences between the height of each son estimated by our equation α + βXi, and the actual height of that son Yi. This is called the ordinary least squares regression.14

Sum of Squared Errors = Σ(Yi - α - βXi)².

We would determine the least squares estimates of α and β algebraically, by setting equal to zero the partial derivatives with respect to α and β of the sum of squared errors.

0 = ∂Σ(Yi - α - βXi)²/∂α = -2Σ(Yi - α - βXi). ⇒ 0 = ΣYi - Σα - ΣβXi. ⇒ αN + βΣXi = ΣYi, where N is the number of observations.

0 = ∂Σ(Yi - α - βXi)²/∂β = -2Σ(Yi - α - βXi)Xi. ⇒ 0 = ΣXiYi - αΣXi - ΣβXiXi. ⇒ αΣXi + βΣXi² = ΣXiYi.

Exercise: Use the above equations in order to determine the least squares estimates of α and β for the fathers and sons example.
[Solution: ΣXiYi = (53)(56) + ... + (66)(64) = 29063.
ΣXi² = 53² + ... + 66² = 28228.
N = number of observations = 8.
ΣXi = 53 + ... + 66 = 474. ΣYi = 56 + ... + 64 = 489.
Therefore, 8α + 474β = 489 and 474α + 28228β = 29063.
Therefore, α = {(489)(28228) - (29063)(474)} / {(8)(28228) - 474²} = 27630/1148 = 24.07, and
^β = {(29063)(8) - (489)(474)} / {(8)(28228) - 474²} = 718/1148 = .6254.]

Thus the result of this regression is: ^Yi = 24.07 + .6254Xi.

For example, the fitted height of the first son is: ^Y1 = 24.07 + .6254X1 = 24.07 + (.6254)(53) = 57.216. This of course differs somewhat from the actual height of the first son, which is 56.

14 The term “regression” was introduced by Francis Galton in the 1880s, referring to his analysis of the heights of adult children versus the heights of their parents.


Here is a graph of the least squares line with intercept (solid) and that without intercept (dashed), each fit to the same data on heights:

[Figure: the data on heights together with the fitted line with intercept (solid) and the fitted line without intercept (dashed); father’s height on the horizontal axis (54 to 66 inches), son’s height on the vertical axis (54 to 68 inches).]

The line with intercept (solid) seems to fit better than that without intercept (dashed). However, this will always be the case, since the line with no intercept is just a special case of that with intercept, with α = 0. How to determine whether the line with intercept is a significantly better fit will be discussed subsequently.

We obtained two equations in two unknowns:
αN + βΣXi = ΣYi, where N is the number of observations.
αΣXi + βΣXi² = ΣXiYi.

The solution is:

α = {ΣYiΣXi² - ΣXiΣXiYi} / {NΣXi² - (ΣXi)²}, or α = Ȳ - ^βX̄.

^β = {NΣXiYi - ΣXiΣYi} / {NΣXi² - (ΣXi)²}.

These are the easiest formulas to use when one is given summary statistics such as ΣXiYi.
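For illustration (not part of the original guide), here is a brief Python sketch that applies these summary-statistic formulas to the fathers-and-sons example and reproduces α = 24.07 and ^β = .6254.

# Two-variable least squares fit from summary statistics (heights example).
n      = 8
sum_x  = 474      # Sum of X_i
sum_y  = 489      # Sum of Y_i
sum_xx = 28228    # Sum of X_i^2
sum_xy = 29063    # Sum of X_i * Y_i

beta_hat  = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
alpha_hat = sum_y / n - beta_hat * sum_x / n      # alpha = Y-bar - beta-hat * X-bar

print(round(beta_hat, 4), round(alpha_hat, 2))    # 0.6254 24.07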


Using the Functions of the Calculator:

Provided you are given the individual data rather than the summary statistics, the allowed electronic calculators will fit a least squares straight line with an intercept.

Father   Son
53       56
54       58
57       61
58       60
61       63
62       62
63       65
66       64

Using the TI-30X-IIS, one would fit a straight line with intercept as follows:
2nd STAT CLRDATA ENTER
2nd STAT 2-VAR ENTER (Use the arrow keys if necessary to select 2-VAR rather than 1-VAR.)
DATA X1 = 53 Y1 = 56, X2 = 54 Y2 = 58, etc., X8 = 66 Y8 = 64, ENTER
STATVAR

Various outputs are displayed. Use the arrow keys to scroll through them.
n = 8 (number of pairs of data)
X̄ = 59.25 (sample mean of X)
Sx = 4.5277 (square root of the sample variance of X, computed with n - 1 in the denominator)
σx = 4.2353 (square root of the variance of X, computed with n in the denominator)
Ȳ = 61.125 (sample mean of Y)
Sy = 3.0443 (square root of the sample variance of Y, computed with n - 1 in the denominator)
σy = 2.8477 (square root of the variance of Y, computed with n in the denominator)
ΣX = 474   ΣX² = 28228   ΣY = 489   ΣY² = 29955   ΣXY = 29063
a = 0.6254 (slope)
b = 24.07 (intercept)
r = 0.93019 (sample correlation coefficient between X and Y)


Deviations Form:

While these are perfectly valid solutions, when given individual data some people find it easier to work with the variables in deviations form.

Exercise: What is the mean height of the fathers?
[Solution: (53 + 54 + 57 + 58 + 61 + 62 + 63 + 66)/8 = 474/8 = 59.25.]

Exercise: What is the mean height of the sons?
[Solution: (56 + 58 + 61 + 60 + 63 + 62 + 65 + 64)/8 = 489/8 = 61.125.]

The mean of a variable is written as that variable with a bar over it. The mean of X is X̄.
Mean height of fathers = X̄ = 59.25.
Mean height of sons = Ȳ = 61.125.

To convert a variable to deviations form, one subtracts its mean. A variable in deviations form is written with a lowercase rather than a capital letter.

xi = Xi - X̄.

Exercise: What are xi and yi?
[Solution: xi = Xi - X̄ = (53, 54, 57, 58, 61, 62, 63, 66) - 59.25 = (-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75).
yi = Yi - Ȳ = (56, 58, 61, 60, 63, 62, 65, 64) - 61.125 = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).]

Σxi = ΣXi - NX̄ = NX̄ - NX̄ = 0.

Verify that in this case both xi and yi sum to zero. In general, the sum of any variable in deviations form is zero. Therefore, its mean is also zero.

Variables in deviations form always have a mean of zero.


Least Squares Regression in Deviations Form:

We have assumed the model Yi = α + βXi + εi, i = 1, 2, ..., N. Adding up the N equations and dividing by N we get: Ȳ = α + βX̄ + Σεi/N. We have no reason to believe the average error is positive or negative; let’s assume it is zero.15

Then we would expect that: Ȳ = α + ^βX̄. ⇒ α = Ȳ - ^βX̄.16

One could verify that this is true in general for the solutions given previously for α and ^β.

In any case, when we set the partial derivative of the squared error with respect to α equal to zero, we got: αN + ^βΣXi = ΣYi ⇒ Ȳ = α + ^βX̄ ⇒ α = Ȳ - ^βX̄.

Exercise: For the regression fit to heights, verify that α = Ȳ - ^βX̄.
[Solution: α = 24.07. ^β = .6254. X̄ = 59.25. Ȳ = 61.125.
24.07 = 61.125 - (.6254)(59.25).]

We can take the original model and convert it to deviations form:

Yi = α + βXi + εi = Ȳ - βX̄ + βXi + εi. ⇒ Yi - Ȳ = β(Xi - X̄) + εi. ⇒ yi = βxi + εi.

In deviations form we get the same equation, except with no intercept. Based on the previous section, the least squares fit is: ^β = Σxiyi / Σxi².

In deviations form, the least squares regression for the two-variable (linear) regression model, Yi = α + βXi + εi, has solution:

^β = Σxiyi / Σxi²

α = Ȳ - ^βX̄.

Exercise: Using deviations form, fit the least squares regression to the data on heights.
[Solution: xi = (-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75). yi = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).
Σxi² = 143.5. Σxiyi = 89.75. ^β = Σxiyi / Σxi² = 89.75/143.5 = .625.
α = Ȳ - ^βX̄ = 61.125 - (.625)(59.25) = 24.1.
Comment: This matches the result obtained previously.]
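A short Python sketch of the deviations-form calculation (my addition, not from the guide); the small difference from the 24.1 above comes only from the text rounding ^β to .625.

# Fit in deviations form: x_i = X_i - X-bar, y_i = Y_i - Y-bar,
# beta-hat = Sum x_i y_i / Sum x_i^2, alpha = Y-bar - beta-hat * X-bar.
fathers = [53, 54, 57, 58, 61, 62, 63, 66]
sons    = [56, 58, 61, 60, 63, 62, 65, 64]

x_bar = sum(fathers) / len(fathers)       # 59.25
y_bar = sum(sons) / len(sons)             # 61.125
x = [xi - x_bar for xi in fathers]        # deviations; they sum to zero
y = [yi - y_bar for yi in sons]

beta_hat  = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)   # 89.75 / 143.5
alpha_hat = y_bar - beta_hat * x_bar

print(round(beta_hat, 4), round(alpha_hat, 2))    # 0.6254 24.07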

15 Assumptions behind least squares regression will be discussed subsequently.
16 This is a good way to remember this formula.


A Shortcut when using Deviations Form:*

Σxiyi = Σxi(Yi - Ȳ) = ΣxiYi - ȲΣxi = ΣxiYi - Ȳ(0) = ΣxiYi.

Therefore, ^β = ΣxiYi / Σxi².

This can save some time on an exam, by avoiding having to calculate yi = Yi - Ȳ. One would still have to calculate Ȳ, in order to calculate α = Ȳ - ^βX̄.

S Notation:*

Another notation that some people find useful is:

SXX = Σ(Xi - X̄)² = ΣXi² - (ΣXi)²/N.

SYY = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/N.

SXY = Σ(Xi - X̄)(Yi - Ȳ) = ΣXiYi - ΣXiΣYi/N.

Exercise: For the regression of heights example, calculate SXX, SYY, and SXY.
[Solution: SXX = 143.5, SYY = 64.875, and SXY = 89.75.]

Then, ^β = SXY/SXX.

For the regression of heights example, ^β = 89.75/143.5 = .625.

As before, α = Ȳ - ^βX̄.

The sample variance of X is: SXX/(N-1).

The sample variance of Y is: SYY/(N-1).

The sample covariance of X and Y is: SXY/(N-1).

The sample correlation of X and Y is: SXY/√(SXXSYY).

For the regression of heights example, r = 89.75/√((143.5)(64.875)) = 0.9302.


Relation of Fitted Slope to Covariances or Correlations:

The sample variance of X is: sX² = Σ(Xi - X̄)²/(N - 1) = Σxi²/(N - 1).
The sample covariance of X and Y is: Cov[X, Y] = Σ(Xi - X̄)(Yi - Ȳ)/(N - 1) = Σxiyi/(N - 1).

Therefore, ^β = Σxiyi/Σxi² = Cov[X, Y] / Var[X].

^β = Cov[X, Y] / Var[X].

Exercise: The sample variance of X is 125. The sample covariance of X and Y is 167.
What is ^β in a two variable linear regression?
[Solution: ^β = Cov[X, Y] / Var[X] = 167/125 = 1.336.]

Note that the sample correlation coefficient is:
r = Cov[X, Y]/(sXsY) = {Σ(Xi - X̄)(Yi - Ȳ)/(N - 1)} / √{(Σ(Xi - X̄)²/(N - 1))(Σ(Yi - Ȳ)²/(N - 1))} = Σxiyi / √(Σxi²Σyi²).

Therefore, ^β = Σxiyi / Σxi² = r√(Σyi²/Σxi²) = r sY/sX.

^β = r sY/sX.

For the heights example, r = .9302, sX = 4.528, sY = 3.044, and ^β = (.9302)(3.044)/4.528 = .625.

Exercise: The sample correlation of X and Y is -.4. The sample standard deviation of X is 5.

The sample standard deviation of Y is 10. What is ^β in a two variable linear regression?

[Solution: ^β = rsY/sX = -.4(10/5) = -.8.]


Problems:

Use the following 4 observations for the next 2 questions:
X: 0 4 8 12
Y: 834 889 916 950

2.1 (2 points) Via least squares, fit to the above observations the following model: Y = α + βX + ε. What is the fitted value of β?
(A) 9.0 (B) 9.2 (C) 9.4 (D) 9.6 (E) 9.8

2.2 (1 point) Via least squares, fit to the above observations the following model: Y = α + βX + ε. What is the fitted value of α?
(A) 800 (B) 810 (C) 820 (D) 830 (E) 840

2.3 (1 point) The sample covariance of X and Y is -413. The sample variance of X is 512. What is ^β in a two variable linear regression?
(A) -0.8 (B) -0.7 (C) -0.6 (D) -0.5 (E) -0.4

2.4 (3 points) You fit a two-variable linear regression to the following 5 observations:
X: 1 2 3 4 5
Y: 202 321 404 480 507
What is the predicted value of Y when X = 7?
(A) 650 (B) 670 (C) 690 (D) 710 (E) 730

2.5 (1 point) The sample correlation of X and Y is 0.6. The sample variance of X is 36. The sample variance of Y is 64. What is ^β in a two variable linear regression?
(A) 0.6 (B) 0.8 (C) 1.0 (D) 1.2 (E) 1.4

2.6 (2 points) Use the following 4 observations:
X: -1 1 3 5
Y: 3 4 7 6
Fit a least squares straight line with intercept and use it to estimate y for x = 6.
(A) 7.0 (B) 7.2 (C) 7.4 (D) 7.6 (E) 7.8

2.7 (3 points) Use the following information:
Year (t)   Loss Ratio (Y)
1          82
2          78
3          80
4          73
5          77
You fit the following model: Y = α + βt + ε.
What is the estimated Loss Ratio for year 7?
(A) 71 (B) 72 (C) 73 (D) 74 (E) 75


2.8 (2 points) For each of five policy years an actuary has estimated the ultimate losses based on the information available at the end of that policy year.
Policy Year   Estimated   Actual Ultimate
1991          45          43
1992          50          58
1993          55          63
1994          60          76
1995          65          78
Let Xt be the actuary’s estimate and Yt be the actual ultimate.
Fit the ordinary least squares model, Yt = α + βXt.

2.9 (2 points) You are given the following data on the number of exams and the salaries of seven actuaries in the land of Elbonia:
Number of Exams: 2 3 3 4 2 4 3
Salaries: 50 63 56 66 60 82 71
Fit a least squares line with intercept. What is the estimated salary of an actuary with 5 exams?
(A) 83 (B) 84 (C) 85 (D) 86 (E) 87

2.10 (3 points) You are given the following data for 10 taxi drivers. For each driver you are given the number of moving traffic violations during three years and the sum of their basic limit losses for Bodily Injury Liability Insurance (in $1000) during the following three years.
Violations: 0 0 0 0 1 1 1 2 3 5
Losses: 10 0 43 0 35 0 80 0 58 64
Fit a least squares line with intercept. What are the estimated losses for a taxi driver with 4 moving violations?
(A) 49 (B) 51 (C) 53 (D) 55 (E) 57

2.11 (2 points) Given the following information:
ΣXi = 351. ΣYi = 15,227. ΣXi² = 6201. ΣYi² = 9,133,797. ΣXiYi = 204,296. n = 26.
Determine the least squares equation for the following model: Yi = β0 + β1Xi + ε.
A. ^Yi = 601.2 - 1.153Xi.
B. ^Yi = 570.1 + 1.153Xi.
C. ^Yi = 597.4 - 0.867Xi.
D. ^Yi = 573.9 + 0.867Xi.
E. None of the above


2.12 (3 points) A linear regression, Y = α + βX, is fit to a set of observations (Xi, Yi), where X is in feet and Y is in dollars. α = 37 and ^β = 2.4.
If instead X had been in meters, 1 meter = 3.28 feet, and Y had been in yen, 1 yen = 116 dollars, what would have been the fitted model?

2.13 (2 points) Use the following information:
Year (t)   Claim Frequency (Y)
1          3.18%
2          3.12%
3          3.30%
4          3.39%
5          3.41%
You fit via least squares the following model: Y = α + βt.
What is the fitted claim frequency for year 7?
(A) 3.51% (B) 3.53% (C) 3.55% (D) 3.57% (E) 3.59%

2.14 (2 points) You are given the following information on six women, each 35 years old and five foot four inches tall:
Weight: 142 146 156 163 170 177
Household income ($000): 52 49 48 47 43 42
Determine the least squares equation for the following model: Yi = β0 + β1Xi + ε.
What is the fitted value of β1?
A. -0.27 B. -0.25 C. -0.23 D. -0.21 E. -0.19

2.15 (3 points) A linear regression, Y = α + βX, is fit to a set of observations (Xi, Yi).
What would be the effect on the fitted regression if a constant c had been added to each Xi?
What would be the effect on the fitted regression if instead a constant c had been added to each Yi?

2.16 (2 points) For each of 10 insureds, you are given the number of claims in year 1 and the number of claims in year 2.
Insured: 1 2 3 4 5 6 7 8 9 10
Year 1:  0 0 0 0 0 1 1 1 1 1
Year 2:  0 0 0 1 1 0 0 1 1 1
Fit a least squares line with intercept, using the number of claims in year 1 as the independent variable and the number of claims in year 2 as the dependent variable. What is the estimated future claim frequency for an insured with one claim in the most recent year?
(A) 45% (B) 50% (C) 55% (D) 60% (E) 65%


2.17 (2 points) Given the following information:
ΣXi = 153. ΣYi = 727. ΣXi² = 2016. ΣYi² = 17972. ΣXiYi = 4002. n = 62.
Fit the following model via least squares: Yi = α + βXi + ε.
For the fitted model, what value of Y corresponds to X = 20?
A. Less than 35
B. At least 35, but less than 36
C. At least 36, but less than 37
D. At least 37, but less than 38
E. At least 38

2.18 (3 points) For a set of 10,000 private passenger automobile insureds you are given their claim counts in 2004 and 2005.

                      2004 Claim Count
2005 Claim Count      0      1      2     Total
        0           8300    740     40     9080
        1            750    100      8      858
        2             50     10      2       62
      Total         9100    850     50    10,000

Fit a least squares regression, Y = α + βX, where X is the claim count in 2004 and Y is the claim count in 2005. Joe had 2 claims in 2004. Use this regression in order to estimate Joe’s expected claim frequency in 2005.
A. 0.14 B. 0.16 C. 0.18 D. 0.20 E. 0.22

2.19 (Course 120 Sample Exam #2, Q.1) (2 points) You fit the model Yi = α + βXi + εi to the following data:
i:   1   2   3
Xi:  1   3   4
Yi:  2   Y2  5
You determine that α = 5/7. Calculate Y2.
(A) 0 (B) 1 (C) 2 (D) 3 (E) 4

2.20 (Course 120 Sample Exam #2, Q.7) (2 points) You are given the following information about a simple linear regression fit to 10 observations:
ΣXi = 20. ΣYi = 100. Σ(Xi - X̄)²/9 = 4. Σ(Yi - Ȳ)²/9 = 64. (All sums run from i = 1 to 10.)
You are also given that the simple correlation coefficient r = -0.98.
Determine the predicted value of Y when X = 5.
(A) -10 (B) -2 (C) 11 (D) 30 (E) 37


2.21 (IOA 101, 9/01, Q.4) (1.5 points) Let (Xi, Yi), i = 1, …, n, denote a set of n pairs of points, with X̄ the sample mean of the Xs and Ȳ the sample mean of the Ys. Assuming the usual expressions for the estimated coefficients, verify that the least squares fitted regression line of Y on X passes through the point (X̄, Ȳ).

2.22 (IOA 101, 9/03, Q.6) (1.5 points) Show that the slope of the regression line fitted by least squares to the three points (0, 0), (1, y), (2, 2) is 1 for all values of y.

2.23 (CAS3, 5/05, Q.27) (2.5 points) Given the following information:
ΣXi = 144. ΣYi = 1,742. ΣXi² = 2,300. ΣYi² = 312,674. ΣXiYi = 26,696. n = 12.
Determine the least squares equation for the following model: Yi = β0 + β1Xi + ε.
A. ^Yi = -0.73 + 12.16Xi
B. ^Yi = -8.81 + 12.16Xi
C. ^Yi = 283.87 + 10.13Xi
D. ^Yi = 10.13 + 12.16Xi
E. ^Yi = 23.66 + 10.13Xi

2.24 (CAS3, 5/06, Q.9) (2.5 points) The following summary statistics are available with respect to a random sample of seven observations of the price of gasoline, Y, versus the price of oil, X:
ΣXi = 315. ΣXi² = 14,875. ΣYi = 12.8. ΣYi² = 24.3. ΣXiYi = 599.5.
Use the available information and a linear regression model of the form Y = α + βX to calculate the predicted cost of gasoline if the price of oil reaches $75.
A. Less than $2.85
B. At least $2.85, but less than $2.90
C. At least $2.90, but less than $2.95
D. At least $2.95, but less than $3.00
E. At least $3.00


Section 3, Residuals

Continuing the example from the previous section, the fitted height of a son is: ^Yi = 24.07 + .6254Xi, where Xi is the height of his father.

As discussed previously, the difference between each son’s height and his height estimated by the model is the residual.

Residual = actual - estimated. ^εi ≡ Yi - ^Yi.

Exercise: What are the residuals for the fitted model ^Yi = 24.07 + .6254Xi?
[Solution: ^εi = 56 - 57.216, 58 - 57.842, 61 - 59.718, 60 - 60.343, 63 - 62.219, 62 - 62.845, 65 - 63.470, 64 - 65.346 = -1.216, .158, 1.282, -.343, .781, -.845, 1.530, -1.346.]

For the two variable linear regression model with an intercept:

^εi = Yi - ^Yi = Yi - α - ^βXi = Yi - (Ȳ - ^βX̄) - ^βXi = yi - ^βxi.

Σ^εi = Σ(yi - ^βxi) = Σyi - ^βΣxi = 0 - ^β(0) = 0.

For the linear regression model with an intercept, the sum of the residuals is always zero.17

This provides a good check of your work.

For the current example, Σ^εi = -1.216 + .158 + 1.282 - .343 + .781 - .845 + 1.530 - 1.346 = 0.001, zero subject to rounding.

Error Sum of Squares:

Exercise: For the fitted model ^Yi = 24.07 + .6254Xi, what is the sum of squared errors?
[Solution: Σ^εi² = (-1.216)² + .158² + 1.282² + (-.343)² + .781² + (-.845)² + 1.530² + (-1.346)² = 8.741.]

The sum of squared errors = Error Sum of Squares = ESS ≡ Σ^εi² = Σ(Yi - ^Yi)².

In this case, ESS = 8.741.

The error sum of squares will be discussed further in the section on Analysis of Variance.

17 This is not necessarily true for a model with no intercept.
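The following Python sketch (my illustration, not from the guide) computes the residuals of the fitted line with intercept directly from the data, confirming that they sum to zero and that the ESS is about 8.74.

# Residuals and ESS for the heights model with an intercept.
fathers = [53, 54, 57, 58, 61, 62, 63, 66]
sons    = [56, 58, 61, 60, 63, 62, 65, 64]
n = len(fathers)

x_bar, y_bar = sum(fathers) / n, sum(sons) / n
beta_hat  = sum((x - x_bar) * (y - y_bar) for x, y in zip(fathers, sons)) \
            / sum((x - x_bar) ** 2 for x in fathers)
alpha_hat = y_bar - beta_hat * x_bar

resid = [y - (alpha_hat + beta_hat * x) for x, y in zip(fathers, sons)]
print([round(r, 3) for r in resid])          # close to the residuals listed in the text
print(round(sum(resid), 6))                  # 0.0 up to floating-point rounding
print(round(sum(r * r for r in resid), 3))   # 8.742 -- the Error Sum of Squares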


Other Properties of Residuals:*18

One can prove that the residuals are uncorrelated with X.

Corr[^ε, X] = Cov[^ε, X]/√(Var[^ε]Var[X]). Cov[^ε, X] = E[^εX] - E[^ε]E[X]. Since the mean of the residuals is always zero, the numerator of Corr[^ε, X] is:

Cov[^ε, X] = E[^εX] = Σ^εi(Xi - X̄) = Σ^εixi.

In the current example, Σ^εixi is: (-6.25)(-1.216) + (-5.25)(.158) + (-2.25)(1.282) + (-1.25)(-.343) + (1.75)(.781) + (2.75)(-.845) + (3.75)(1.53) + (6.75)(-1.346) = .01, or zero subject to rounding.

In general, ^εi = Yi - ^Yi = Yi - α - ^βXi = Yi - (Ȳ - ^βX̄) - ^βXi = yi - ^βxi.

Σ^εixi = Σ(yi - ^βxi)xi = Σxiyi - ^βΣxi² = 0, since ^β = Σxiyi / Σxi².

Therefore, Corr[^ε, X] = 0.

As will be seen when we discuss analysis of variance, the difference between the fitted Y and the mean of Y, ^Yi - Ȳ, is also of interest.

^Yi - Ȳ = α + ^βXi - Ȳ = Ȳ - ^βX̄ + ^βXi - Ȳ = ^βxi.

Σ^εi(^Yi - Ȳ) = Σ^εi ^βxi = ^βΣ^εixi = ^β(0) = 0.

Thus, ^Y - Ȳ and ^ε are uncorrelated.

Exercise: In the current example, compute Σ^εi(^Yi - Ȳ).
[Solution: ^Yi - Ȳ = 57.22 - 61.125, 57.84 - 61.125, 59.72 - 61.125, 60.34 - 61.125, 62.22 - 61.125, 62.84 - 61.125, 63.47 - 61.125, 65.35 - 61.125 = -3.91, -3.28, -1.41, -.78, 1.09, 1.72, 2.34, 4.22.
Σ^εi(^Yi - Ȳ) = (-3.91)(-1.216) + (-3.28)(.158) + (-1.41)(1.282) + (-.78)(-.343) + (1.09)(.781) + (1.72)(-.845) + (2.34)(1.53) + (4.22)(-1.346) = -.006, or zero subject to rounding.]

18 See Appendix 3.2 of Pindyck and Rubinfeld.


Problems:

3.1 (1 point) A regression is fit to 5 observations. The first four residuals are: 12, -4, -9, and 6. What is the error sum of squares?
A. 220 B. 240 C. 260 D. 280 E. 300

3.2 (3 points) A two-variable regression is fit to the following 4 observations:
t: 1 2 3 4
Y: 30 40 55 60
What is the error sum of squares?
(A) Less than 16
(B) At least 16, but less than 17
(C) At least 17, but less than 18
(D) At least 18, but less than 19
(E) At least 19

3.3 (2 points) A two-variable regression is fit to 5 observations. The first four values of the independent variable X and the residuals are as follows:
i:    1       2       3       4
Xi:   7       12      15      21
^εi:  1.017   0.409   -0.557  -2.487
What is X5?
A. 29 B. 30 C. 31 D. 32 E. 33

3.4 (3 points) A two-variable regression is fit to 5 observations. The first 4 values of the dependent variable Y and the corresponding fitted values ^Y are as follows:
i:    1        2        3        4
Yi:   13       25       36       40
^Yi:  18.036   22.989   30.419   40.325
What is Y5?
A. 48 B. 49 C. 50 D. 51 E. 52


Section 4, Dividing the Sum of Squares into Two Pieces

As will be discussed, one can divide the Total Sum of Squares (TSS) into two pieces: the Regression Sum of Squares (RSS) and Error Sum of Squares (ESS).19

Sample Variance:

Exercise: X1 and X2 are two independent, identically distributed variables, with mean µ and variance σ². X̄ = (X1 + X2)/2. What is the expected value of (X1 - X̄)² + (X2 - X̄)²?
[Solution: (X1 - X̄)² + (X2 - X̄)² = (X1/2 - X2/2)² + (X2/2 - X1/2)² = 2(X1 - X2)²/4 = X1²/2 + X2²/2 - X1X2.
E[(X1 - X̄)² + (X2 - X̄)²] = E[X1²/2 + X2²/2 - X1X2] = (σ² + µ²)/2 + (σ² + µ²)/2 - µ² = σ².]

Thus {(X1 - X̄)² + (X2 - X̄)²}/(2 - 1) = (X1 - X̄)² + (X2 - X̄)² is an unbiased estimator of σ².

In general, with N independent, identically distributed variables Xi, Σ(Xi - X̄)²/(N - 1) is an unbiased estimator of the variance.

Σ(Xi - X̄)²/(N - 1) is called the sample variance of X.

The sample variance has in its numerator the sum of squared differences between each element and the mean. The denominator of the sample variance is the number of elements minus one.20 With this denominator, the sample variance is an unbiased estimator of the underlying variance, when the underlying mean is unknown.21

Exercise: The heights of the eight sons were: 56, 58, 61, 60, 63, 62, 65, and 64. What is the sample variance of the heights of these sons?
[Solution: Ȳ = 61.125. Sample Variance ≡ Σ(Yi - Ȳ)²/(N - 1) = {(56 - 61.125)² + (58 - 61.125)² + (61 - 61.125)² + (60 - 61.125)² + (63 - 61.125)² + (62 - 61.125)² + (65 - 61.125)² + (64 - 61.125)²} / (8 - 1) = 64.875/7 = 9.27.]

19 This is similar to the ideas behind Analysis of Variance (ANOVA). See, for example, Probability and Statistical Inference, by Hogg and Tanis. Similar ideas also apply to Buhlmann Credibility. See “Credibility” by Mahler and Dean.
20 As will be discussed subsequently, the number of degrees of freedom associated with the sum of squares in the numerator is N - 1.
21 The (non-sample) variance, Σ(Yi - Ȳ)²/N = 2nd moment - square of the mean, is a biased estimator of the true underlying variance.


Using the Functions of the Calculator to Compute Sample Means and Variances:

Using the TI-30X-IIS, one could work as follows with the sample of size eight: 56, 58, 61, 60, 63, 62, 65, and 64.

2nd STAT CLRDATA ENTER
2nd STAT 1-VAR ENTER (Use the arrow keys if necessary to select 1-VAR rather than 2-VAR.)
DATA X1 = 56 Freq = 1, X2 = 58 Freq = 1, X3 = 61 Freq = 1, X4 = 60 Freq = 1, X5 = 63 Freq = 1, X6 = 62 Freq = 1, X7 = 65 Freq = 1, X8 = 64 Freq = 1, ENTER
STATVAR

Various outputs are displayed. Use the arrow keys to scroll through them.
n = 8 (number of data points)
X̄ = 61.125 (sample mean of X)
Sx = 3.044316 (square root of the sample variance of X)
σx = 2.847696 (square root of the variance of X, computed with n in the denominator)
ΣX = 489   ΣX² = 29955

Sx² = 3.044316² = 9.268.


Total Sum of Squares:

The Total Sum of Squares, or TSS, is defined as the sum of squared differences between Yi and Ȳ.

TSS ≡ Σ(Yi - Ȳ)² = Σyi² = ΣYi² - (ΣYi)²/N = SYY.

Note that while both TSS and ESS involve squared differences from the observations of the dependent variable, Yi, in the case of the total sum of squares we subtract the mean, Ȳ, while in the case of the error sum of squares we subtract the estimated height, ^Yi.

TSS is just the numerator of the sample variance of Y.

In this example, TSS = (56 - 61.125)² + (58 - 61.125)² + (61 - 61.125)² + (60 - 61.125)² + (63 - 61.125)² + (62 - 61.125)² + (65 - 61.125)² + (64 - 61.125)² = 64.875.

The TSS quantifies the total variation in the observations of the dependent variable. In the case of a series of experiments, TSS would measure the total variation in outcomes.

Error Sum of Squares:

Recall that the Error Sum of Squares, or ESS, is:22

ESS ≡ Σ^εi² = Σ(Yi - ^Yi)².

As computed previously, for this example, ESS = 8.741.

Since Σ^εi = 0, ESS is the numerator of the variance of ^εi.

Other Ways to write ESS for the two-variable model:*

Since ^εi = Yi - ^Yi = Yi - α - ^βXi = Yi - (Ȳ - ^βX̄) - ^βXi = yi - ^βxi, and ^β = Σxiyi / Σxi²:

ESS = Σ^εi² = Σ(yi - ^βxi)² = Σyi² + ^β²Σxi² - 2^βΣxiyi = Σyi² + (Σxiyi/Σxi²)²Σxi² - 2(Σxiyi/Σxi²)Σxiyi = Σyi² - (Σxiyi)²/Σxi² = Σyi² - ^βΣxiyi.

Σ^εiYi = Σ^εi(α + ^βXi + ^εi) = αΣ^εi + ^βΣ^εiXi + Σ^εi² = α(0) + ^β(0) + Σ^εi² = Σ^εi² = ESS.

22 The Error Sum of Squares, ESS, is also sometimes called the residual sum of squares.


Regression Sum of Squares:

There is a third sum of squared differences that is of importance.

The Regression Sum of Squares or RSS23 is defined as the sum of squared differences between the fitted values and the mean of Y.

RSS = Σ(Ŷi - Ȳ)2.

Exercise: For the fitted model of heights, Ŷi = 24.07 + .6254Xi, what is the RSS?

[Solution: Ŷi - Ȳ = 57.216 - 61.125, 57.842 - 61.125, 59.718 - 61.125, 60.343 - 61.125, 62.219 - 61.125, 62.845 - 61.125, 63.470 - 61.125, 65.346 - 61.125 = -3.909, -3.283, -1.407, -0.782, 1.094, 1.720, 2.345, 4.221.
RSS = 3.909² + 3.283² + 1.407² + 0.782² + 1.094² + 1.720² + 2.345² + 4.221² = 56.121.]

In this example, RSS = 56.121.

Σε̂i = 0 ⇒ Σ(Yi - Ŷi) = 0 ⇒ ΣYi = ΣŶi ⇒ the mean of Ŷi is Ȳ.

Exercise: For this example, verify that the mean of Ŷi is 61.125 = Ȳ.

[Solution: ΣŶi = 57.216 + 57.842 + 59.718 + 60.343 + 62.219 + 62.845 + 63.470 + 65.346 = 488.999. 488.999/8 = 61.125.]

Therefore, RSS = Σ(Ŷi - Ȳ)2 is the numerator of the variance of Ŷi.

RSS = Σ(Ŷi - Ȳ)2 = Σ(Ŷi - Ȳ)(Yi - Ȳ - ε̂i) = Σ(Ŷi - Ȳ)(Yi - Ȳ) - Σ(Ŷi - Ȳ)ε̂i = Σ(Ŷi - Ȳ)(Yi - Ȳ), since Ŷi - Ȳ and ε̂i are uncorrelated.

Therefore, RSS = Σ(Ŷi - Ȳ)(Yi - Ȳ) = the numerator of the correlation of Ŷ and Y.

For this example, one can verify that Σ(Ŷi - Ȳ)(Yi - Ȳ) = 56.121 = RSS.

23 The RSS is also sometimes called the sum of squares associated with the model as opposed to the error.


Other Ways to write RSS for the two-variable model:*

ΣYi = ΣŶi ⇒ RSS = Σ(Ŷi - Ȳ)2 = ΣŶi2 - (ΣŶi)2/N.

Ŷi - Ȳ = α̂ + β̂Xi - Ȳ = (Ȳ - β̂X̄) + β̂Xi - Ȳ = β̂xi.

RSS = Σ(Ŷi - Ȳ)2 = β̂2Σxi2 = (Σxiyi/Σxi2)2Σxi2 = (Σxiyi)2/Σxi2 = β̂Σxiyi.

TSS = RSS + ESS:

For this example, TSS = 64.875, RSS = 56.121, and ESS = 8.741. Note that RSS + ESS = 64.862, equal to TSS subject to rounding.

In general for a regression model with an intercept, the Total Sum of Squares is equal to the Regression Sum of Squares plus the Error Sum of Squares.

TSS = RSS + ESS.

The total variation has been broken into two pieces: that explained by the regression model, RSS, and that unexplained by the regression model, ESS. This very important result holds for any linear regression model with an intercept, whether it is the two-variable model such as in this example, or a multivariable regression model to be discussed subsequently.
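As a minimal numerical check of this decomposition, assuming the fathers' heights underlying this example are 53, 54, 57, 58, 61, 62, 63, and 66 (the values implied by the deviations xi used later in this guide), the following Python sketch fits the two-variable regression and verifies that TSS = RSS + ESS:

    # Minimal sketch: fit the least squares line and verify TSS = RSS + ESS.
    X = [53, 54, 57, 58, 61, 62, 63, 66]   # fathers' heights (assumed example data)
    Y = [56, 58, 61, 60, 63, 62, 65, 64]   # sons' heights

    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n

    Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
    Sxx = sum((x - x_bar) ** 2 for x in X)

    beta_hat = Sxy / Sxx                    # about 0.6254
    alpha_hat = y_bar - beta_hat * x_bar    # about 24.07

    Y_hat = [alpha_hat + beta_hat * x for x in X]

    TSS = sum((y - y_bar) ** 2 for y in Y)                 # 64.875
    ESS = sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat))    # about 8.74
    RSS = sum((yh - y_bar) ** 2 for yh in Y_hat)           # about 56.13

    print(round(TSS, 3), round(RSS + ESS, 3))              # both 64.875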

Proof of TSS = RSS + ESS:*

Yi - Ȳ = (Yi - Ŷi) + (Ŷi - Ȳ) = ε̂i + (Ŷi - Ȳ).

(Yi - Ȳ)2 = ε̂i2 + (Ŷi - Ȳ)2 + 2ε̂i(Ŷi - Ȳ).

TSS = Σ(Yi - Ȳ)2 = Σε̂i2 + Σ(Ŷi - Ȳ)2 + 2Σε̂i(Ŷi - Ȳ).

It has been shown previously that ε̂i and Ŷi - Ȳ have a correlation of zero and Σε̂i(Ŷi - Ȳ) = 0. Thus the final term drops out and:

TSS = Σε̂i2 + Σ(Ŷi - Ȳ)2 = ESS + RSS.

Note that the final term dropping out followed from a result that was proven for a regression model with an intercept. Analysis of Variance is not generally applied to a model without an intercept.

Alternately, for the two-variable model:

RSS + ESS = β̂Σxiyi + (Σyi2 - β̂Σxiyi) = Σyi2 = TSS.


Degrees of Freedom:

Each of these sums of squares has a number of “Degrees of Freedom” associated with it. The number of degrees of freedom will be needed in order to perform t-tests and F-tests.

Exercise: There were four observations. In deviations form, y1 = -6, y2 = -3, and y3 = 2. The value of y4 is unreadable because a coworker spilled coffee on the report. What is TSS?
[Solution: In deviations form, the sum of the yi is zero. Therefore, the missing y4 must be: 7.
TSS = Σyi2 = 6² + 3² + 2² + 7² = 98.]

In this exercise, we can compute TSS only knowing three out of the four yi. In that sense, TSS only depends on 3 pieces of information. Therefore, we say TSS has 3 degrees of freedom. Another way to look at the same thing, is that TSS has 4 squared terms, but there is one linear constraint on the yi: Σyi = 0. This linear constraint results in a loss of one degree of freedom, and therefore we have: 4 - 1 = 3 degrees of freedom.

In any case, in general, if we have N points, TSS has N -1 degrees of freedom.24

Now RSS = Σ(Ŷi - Ȳ)2 = β̂2Σxi2. Treating the xi as known, we need only β̂, one piece of information depending on the yi, the outcomes of the experiment. Therefore, RSS has 1 degree of freedom, for the two-variable model.

Since TSS = RSS + ESS, (number of d.f. for TSS) = (number of d.f. for RSS) + (number of d.f. for ESS). Therefore, ESS has N - 2 degrees of freedom, for the two-variable model. The number of degrees of freedom for ESS is the number of points minus the number of fitted parameters (including the fitted intercept.)

Exercise: A linear regression is fit to two observations. What is ESS?
[Solution: A line passes through any two points, therefore the regression line perfectly fits the data. Therefore, the residuals are zero, and ESS = 0.]

With 2 observations, ESS has 2 - 2 = 0 degrees of freedom, for the two-variable model.This is consistent with the result of the above exercise. ESS is automatically zero in this case, regardless of the particular observations. We need zero pieces of information in order to determine ESS in this case.

24 TSS is the numerator of the sample variance, while its degrees of freedom, N - 1, is the denominator of the sample variance.


Degrees of Freedom, Multivariable Regression:

When we subsequently discuss the multivariable regression model, the following more general formulas will hold:

Source of Variation    Sum of Squares    Degrees of Freedom
Model                  RSS               k - 1
Error                  ESS               N - k
Total                  TSS               N - 1

Where N is the number of points, and k is the number of variables including the intercept (k = 2 for the two-variable model with one slope and an intercept.)

Note that TSS = RSS + ESS, while N - 1 = (k - 1) + (N - k).

Exercise: For the model fit to heights, Ŷi = 24.07 + .6254Xi, what are the degrees of freedom?
[Solution: There are 8 points, N = 8. There are two variables, including the intercept, k = 2.
RSS has k - 1 = 2 - 1 = 1 degree of freedom. ESS has N - k = 8 - 2 = 6 degrees of freedom.
TSS has N - 1 = 8 - 1 = 7 degrees of freedom. Note 7 = 1 + 6.]

ANOVA Table:

When you run a regression program on a computer, it will usually print out an Analysis of Variance (ANOVA) Table.25

For example, for the two-variable model (k = 2) fit to heights, with eight observations (N = 8), the ANOVA Table might look like:26

Source of Variation    Sum of Squares27    Degrees of Freedom    Mean Square
Model                  56.13               1                     56.13
Error                  8.74                6                     1.46
Total                  64.87               7                     9.27

Note that: RSS + ESS = 56.13 + 8.74 = 64.87 = TSS. 1 + 6 = 7. 8.74/6 = 1.46. 64.87/7 = 9.27 = sample variance of Y.

This ANOVA table was for a two-variable regression model. For a multivariable regression model, the ANOVA table would look similar, with of course the appropriate degrees of freedom.

25 Those who have not done so, will probably benefit from running such a program a few times. Most such programs will print out many values related to items on the Syllabus, such as residuals, ESS, RSS, TSS, t-statistics, F-Statistics, Durbin-Watson Statistics, variance-covariance matrices, etc.
26 Different computer programs may arrange things slightly differently. Also some additional information is probably shown relating to items we have yet to discuss. This ANOVA table was produced by Mathematica.
27 The values for the sums of squares differ slightly from those shown previously, due to the lack of intermediate rounding in the calculations underlying what is shown here.
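
One way to assemble such an ANOVA table from the sums of squares is sketched below in Python; the helper name and layout are illustrative only, not the output of any particular regression package:

    def anova_table(RSS, ESS, N, k):
        # N = number of observations, k = number of variables including the intercept.
        TSS = RSS + ESS
        rows = [("Model", RSS, k - 1), ("Error", ESS, N - k), ("Total", TSS, N - 1)]
        print(f"{'Source':<8}{'Sum Sq':>10}{'d.f.':>8}{'Mean Sq':>10}")
        for name, ss, df in rows:
            print(f"{name:<8}{ss:>10.2f}{df:>8}{ss / df:>10.2f}")

    anova_table(RSS=56.13, ESS=8.74, N=8, k=2)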


Problems:

4.1 (1 point) For a two variable model (slope and intercept) fit to 25 points, what are the degrees of freedom associated with the three sums of squares?

Use the following information for the next two questions:
For a multivariable regression, you have the following ANOVA Table, with certain items left blank:
Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square
Model                  1020                                    255
Error                                                          7
Total                  1230

4.2 (1 point) How many observations were there?
(A) 30 or less (B) 35 (C) 40 (D) 45 (E) 50 or more

4.3 (1 point) How many variables were there in the regression, including the intercept?
(A) 2 (B) 3 (C) 4 (D) 5 (E) 6 or more

4.4 (1 point) A regression model with 4 variables (3 slopes and one intercept) has been fit to 50 observations. What are the degrees of freedom associated with the Total Sum of Squares, Regression Sum of Squares, and Error Sum of Squares?

4.5 (10 points) You are given the following 17 observations:
X:   0     25     50     75    100    125    150    175    200    225    250    275
Y: 4.90   7.41   6.19   5.57   5.17   6.89   7.05   7.11   6.19   8.28   4.84   8.29
X: 300    325    350    375    395
Y: 8.91   8.54  11.79  12.12  11.02
Fit a two-variable linear regression.
Graph the data and the fitted line.
Graph the residuals.
Put together the ANOVA Table, showing the sum of squares and the degrees of freedom.
(You may use a computer, but do not use a regression software package.
After completing your work, you may then check it using a regression software package.)

4.6 (1 point) A linear regression has been fit to 10 points, (Xi, Yi).

The fitted intercept is α̂. The fitted slope is β̂.
Σ(α̂ + β̂Xi - Ȳ)2 = 49. The sample variance of Y is 8. Determine Σ(α̂ + β̂Xi - Yi)2.

(A) 19 (B) 20 (C) 21 (D) 22 (E) 23


Use the following information for the next 5 questions:
A linear regression, X = α + βY, is fit to 20 observations, (Xi, Yi).

ΣXi = 42, ΣXi2 = 101, ΣYi = 76, ΣYi2 = 310, ΣXiYi = 167.

4.7 (2 points) Determine β̂.

4.8 (2 points) Determine α̂.

4.9 (2 points) Determine TSS, the total sum of squares.

4.10 (2 points) Determine RSS, the regression sum of squares.

4.11 (2 points) Determine ESS, the error sum of squares.

4.12 (165, 11/88, Q.2) (1.7 points) You are given the following table:
Xi    E[Yi]    ei
0     2.0      1.0
1     3.5      1.5
2     5.0     -2.0
3     6.5      0.5
where: (i) E[Yi] is the sequence of true values to be estimated.
(ii) ei are particular realizations of the error random variables εi.
(iii) Yi are the corresponding particular observations.
(iv) Ŷi are obtained by linear regression of Yi on Xi, including an intercept.
Determine Σ(Yi - Ŷi)2, where the sum runs from i = 0 to 3.
(A) 4 (B) 6 (C) 8 (D) 10 (E) 12
Note: The original exam question has been rewritten.


Mahler’s Guide to

Regression

Sections 5-9:

5 R-Squared
6 Corrected R-Squared
7 Normal Distribution
8 Assumptions of Linear Regression
9 Properties of Estimators

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-B

New England Actuarial Seminars          Howard Mahler
POB 315                                 [email protected]
Sharon, MA, 02067
www.neas-seminars.com


Section 5, R-Squared

For the two-variable model fit to heights, the ANOVA Table was:

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square
Model                  56.13             1                     56.13
Error                  8.74              6                     1.46
Total                  64.87             7                     9.27

RSS is the amount of variation explained by the model. Thus in this example, RSS/TSS = 56.13/64.87 = 86.5% of the total variation has been explained by the regression model. R-Squared is the percentage of variation explained by the regression model.28

R2 ≡ RSS/TSS ≡ Σ(Ŷi - Ȳ)2 / Σ(Yi - Ȳ)2. In this example, R2 = 56.13/64.87 = .865.

R2 = RSS/TSS = 1 - ESS/TSS = 1 - Σε̂i2/Σyi2.

These formulas apply to the multiple-variable case as well as the two variable case.

0 ≤ R2 ≤ 1.29

R2 = 1 when the observed points fall exactly on the fitted line. R2 = 0 when the regression explains none of the variation in the dependent variable.

There are a number of additional ways to write R2.

For example, since as shown previously, for the two-variable model, RSS = β̂2Σxi2:

R2 = RSS/TSS = β̂2Σxi2/Σyi2.30

Exercise: Verify the above formula in the case of the regression of heights.
[Solution: xi = (-6.25, -5.25, -2.25, -1.25, 1.75, 2.75, 3.75, 6.75). yi = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).
Σxi2 = 143.5. Σyi2 = 64.875. Σxiyi = 89.75. β̂ = Σxiyi/Σxi2 = 89.75/143.5 = .6254.
β̂2Σxi2/Σyi2 = (.6254²)(143.5/64.875) = .865 = R2.]

28 R-Squared is sometimes called the coefficient of determination. In the case of multiple regression R-Squared is sometimes called the coefficient of multiple determination.
29 This restriction on the value of R-squared does not apply to a regression without an intercept.
30 See 4, 11/02, Q.30. This formula does not hold for the multiple-variable model.


Correlations:

Since for the two-variable model, β̂ = Σxiyi/Σxi2:

R2 = β̂2Σxi2/Σyi2 = β̂Σxiyi/Σyi2 = (Σxiyi)2/(Σxi2Σyi2) = Corr[X, Y]2.

For the 2-variable model, R2 is the square of the correlation between X and Y.

Exercise: What is the correlation of the heights of the fathers and sons?
[Solution: Corr[X, Y] = Σxiyi/√(Σxi2Σyi2) = 89.75/√((143.5)(64.875)) = .9302.]

For this regression of heights, Corr[X, Y]2 = .9302² = .865 = R2.
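
Continuing the Python sketch from earlier (same assumed heights for the fathers), one can check numerically that RSS/TSS equals the squared sample correlation of X and Y:

    import math

    X = [53, 54, 57, 58, 61, 62, 63, 66]   # fathers' heights (assumed example data)
    Y = [56, 58, 61, 60, 63, 62, 65, 64]   # sons' heights

    n = len(X)
    x_bar, y_bar = sum(X) / n, sum(Y) / n
    Sxx = sum((x - x_bar) ** 2 for x in X)
    Syy = sum((y - y_bar) ** 2 for y in Y)
    Sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))

    r = Sxy / math.sqrt(Sxx * Syy)           # correlation of X and Y: about 0.9302
    R2_from_sums = (Sxy ** 2 / Sxx) / Syy    # RSS/TSS for the two-variable model

    print(round(r ** 2, 3), round(R2_from_sums, 3))   # both 0.865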

Corr[Y, Ŷ]2 = R2:*

As was shown previously, RSS = Σ(Ŷi - Ȳ)(Yi - Ȳ) = the numerator of the correlation of Y and Ŷ, and the mean of Ŷ is Ȳ.

Therefore, Corr[Y, Ŷ] = RSS/√(Σ(Yi - Ȳ)2 Σ(Ŷi - Ȳ)2) = RSS/√(TSS RSS) = √(RSS/TSS).

Thus, Corr[Y, Ŷ]2 = RSS/TSS = R2.

R2 is the square of the correlation between Y and Ŷ.31

Exercise: For the regression of the heights of the fathers and sons, what is the correlation between Y and Ŷ?

[Solution: yi = Yi - Ȳ = (-5.125, -3.125, -.125, -1.125, 1.875, .875, 3.875, 2.875).
Σyi2 = 64.875 = TSS. Mean of Ŷ is 61.125 = Ȳ.
Ŷi - Ȳ = (-3.909, -3.283, -1.407, -0.782, 1.094, 1.720, 2.345, 4.221).
Σ(Ŷi - Ȳ)2 = 56.121 = RSS.
Σ(Yi - Ȳ)(Ŷi - Ȳ) = (-5.125)(-3.909) + (-3.125)(-3.283) + (-.125)(-1.407) + (-1.125)(-.782) + (1.875)(1.094) + (.875)(1.720) + (3.875)(2.345) + (2.875)(4.221) = 56.127.
Corr[Y, Ŷ] = 56.127/√((64.875)(56.121)) = .9302.
Comment: For this regression of heights, Corr[Y, Ŷ]2 = .9302² = .865 = R2.]

31 This is what is meant by: R-Squared is the square of the multiple correlation coefficient.


Various Formulas for R2:

For either the two-variable or multiple-variable model:

R2 ≡ RSS/TSS ≡ Σ(Ŷi - Ȳ)2 / Σ(Yi - Ȳ)2.

R2 = the percentage of variation explained by the model.

R2 = RSS/TSS = 1 - ESS/TSS = 1 - Σε̂i2/Σyi2.

R2 = 1 - ESS/TSS = 1 - (N - k)s2/TSS.32

R2 = (k-1)Fk-1,N-k / {(k-1)Fk-1,N-k + N - k}.33

R2 = 1 - (1 - R̄2)(N - k)/(N - 1).34

For the two-variable model only:

RSS = β̂2Σxi2 = β̂Σxiyi.

R2 = RSS/TSS = β̂2Σxi2/Σyi2 = β̂Σxiyi/Σyi2 = (Σxiyi)2/(Σxi2Σyi2) = Corr[X, Y]2.

Therefore, R2 = 0 ⇔ β̂ = 0. A small fitted slope ⇒ a small R2.

S Notation:*

R2 = Corr[X, Y]2 = {SXY/√(SXX SYY)}2 = SXY2/(SXX SYY).

⇒ RSS = R2 TSS = {SXY2/(SXX SYY)}SYY = SXY2/SXX = {ΣXiYi - ΣXiΣYi/N}2 / {ΣXi2 - (ΣXi)2/N}.

For the regression of heights example, SXX = 143.5, SYY = 64.875, and SXY = 89.75.

RSS = SXY2/SXX = 89.75²/143.5 = 56.13, matching the previous result subject to rounding.

ESS = TSS - RSS = SYY - SXY2/SXX = 64.875 - 89.75²/143.5 = 8.74.

32 Where k is the number of variables including the intercept. As discussed in subsequent sections, s2 ≡ ESS/(N-k) = estimated variance of the error terms. See pages 65 and 88 of Pindyck and Rubinfeld.
33 Where k is the number of variables including the intercept. As discussed in subsequent sections, Fk-1,N-k = {RSS/(k-1)}/{ESS/(N-k)} = {TSS R2/(k-1)}/{TSS(1 - R2)/(N-k)} = {R2/(1 - R2)}{(N-k)/(k-1)}, follows an F Distribution with k-1 and N-k degrees of freedom. See Equation 4.12 in Pindyck and Rubinfeld.
34 Where k is the number of variables including the intercept. R̄2, the corrected R2, is discussed subsequently in this section.


Interpretation of R2:

A large R2 indicates a good fit. A small R2 indicates a poor fit.

In the two variable model, R2 is the square of the correlation between X and Y. Thus an R2 close to 1, means the correlation between X and Y is close to ±1. However we need to distinguish between a large correlation (either positive or negative) and causation.

For example, it seems reasonable to assume that the height of the father is indirectly responsible in part for the height of his son. Yet we would get the same R2 for a regression with the heights of the sons as the independent variable and the heights of the fathers as the dependent variable.

Exercise: Fit a regression model Y = α + βX + ε, where X is the height of the son and Y is the height of his father. [Solution: Remembering, Xi is now the heights of the sons, Σxi2 = 64.875. Σxiyi = 89.75.

β̂ = Σxiyi/Σxi2 = 89.75/64.875 = 1.383. α̂ = Ȳ - β̂X̄ = 59.25 - (1.383)(61.125) = -25.29.]

Here is a graph of the new regression, Father’s Height = -25.29 + (1.383)(Son’s Height):

[Graph: scatterplot of the data with the fitted line; Son's height (56 to 66) on the horizontal axis and Father's height (54 to 66) on the vertical axis.]

R2 = .865 = Corr[sons, fathers]2, the same as for the previous regression of son’s heights as a function of father’s height.


Here is a comparison of the two regression lines, with the previous regression, Son’s Height = 24.07 + (.6254)(Father’s Height), as a solid line, and the new regression, Father’s Height = -25.29 + (1.383)(Son’s Height) as a dotted line:

[Graph: the two fitted regression lines plotted together; Father's height (54 to 66) on the horizontal axis and Son's height (56 to 66) on the vertical axis.]

While the two regressions are similar they are not identical. They would only be identical if R2 were equal to 1.

Limitations of R2:35

Since they have the same R2, there is no way to choose between these two regressions on this basis. Based on other than statistical considerations, one might assume that the father’s height helps to determine his son’s height, rather than vice versa. In general, a high R2 indicates a high correlation, which may or may not be related to causality.

Quite often time series of costs will have high correlations, resulting in R2 ≥ .90 when one regresses one on the other. For example, if one regressed average claim costs for automobile insurance as a function of average claim costs for fire insurance, one would likely get a very high R2. Both of these series tend to increase over time at a somewhat similar rate, thus they will likely have a very high correlation. Neither causes the other, although they probably have causes in common.

35 See “The Usefulness of the R2 Statistic”, by Ross Fonticella, CAS Forum Winter 1998, and “A Statistical Note on Trend Factors: The Meaning of R-Squared,” by D. Lee Barclay, CAS Forum Fall 1991.


Similarly, you might also get a high R2 if you regressed average claim costs for automobile insurance in New York as function of the cost of living in England. Yet the cost of living in England would not be a very sensible choice of explanatory variable for automobile insurance severities in New York. In contrast, a consumer price index of the costs of repairing an automobile in the northeast United States might be a good variable to use to try to explain a portion of the movement of automobile insurance claim costs in New York.

So sometimes a high R2 does not indicate a meaningful regression result. On the other hand, sometimes a low R2 will result, even when the independent variable(s) do explain a useful portion of the variation of the dependent variable.

First, the dependent variable may be subject to a large amount of random fluctuation, so it is hard for any model to fit the observations. This could occur if one was trying to fit the observed loss ratios for a small book of business. Second, it may be that the R2 would be higher if additional causes of the dependent variable were added to the regression.

The value of R2 may depend on how a model is stated. Two models with essentially the same information, may have different R2.

Exercise: Fit a regression of the difference between the height of the father and his son, as a function of the height of the father. Determine R2.[Solution: Height of Son - Height of Father = 24.079 - .3746(Height of Father). RSS = 20.13. ESS = 8.74. R2 = 20.13/(20.13 + 8.74) = .672.]

This regression contains the exact same information as our original regression: Height of Son = 24.079 + .6254(Height of Father). However, the original regression had an R2 of .865 rather than .697. The residuals and ESS are the same for both models, but the TSS is smaller for the second model. Thus the R-Squared for two essentially identical models can be significantly different. This highlights one of the problems of relying solely on R2 in order to decide between models.
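
A short Python sketch (again assuming the heights 53, 54, 57, 58, 61, 62, 63, 66 for the fathers) makes the point concrete: the two specifications produce identical residuals but different values of R2, because the dependent variable, and hence TSS, changes:

    X = [53, 54, 57, 58, 61, 62, 63, 66]   # fathers' heights (assumed example data)
    Y = [56, 58, 61, 60, 63, 62, 65, 64]   # sons' heights

    def fit_and_r2(X, Y):
        # Least squares fit of Y on X; returns (residuals, R2).
        n = len(X)
        x_bar, y_bar = sum(X) / n, sum(Y) / n
        b = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / sum((x - x_bar) ** 2 for x in X)
        a = y_bar - b * x_bar
        resid = [y - (a + b * x) for x, y in zip(X, Y)]
        TSS = sum((y - y_bar) ** 2 for y in Y)
        return resid, 1 - sum(e ** 2 for e in resid) / TSS

    resid1, r2_1 = fit_and_r2(X, Y)                              # Son regressed on Father
    resid2, r2_2 = fit_and_r2(X, [y - x for x, y in zip(X, Y)])  # (Son - Father) regressed on Father

    print(round(r2_1, 3), round(r2_2, 3))                              # 0.865 versus about 0.697
    print(all(abs(e1 - e2) < 1e-9 for e1, e2 in zip(resid1, resid2)))  # True: identical residuals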

As one adds more and more relevant variables, one is usually able to get a regression that seems to fit very well. As one adds more variables, R2 increases.

However, we can also overfit the data. For example, if there are only five observations, we can exactly fit them with the model: Y = β1 + β2X + β3X2 + β4X3 + β5X4 + ε.

The principle of parsimony states we do not wish to use more parameters than needed to get the job done. While we would like a large R2, we would also like to use few parameters. These are countervailing goals. Thus one should not directly compare the R2 for two models with different numbers of parameters. As discussed in the next section, one can directly compare the corrected R2, R̄2, for two models.


Pure Error:*36

When there are repeated observations with the same value of X, usually the corresponding Y values differ. In this situation, no model can perfectly fit the data. Therefore, the maximum possible R2 is less than one.

One can quantify the “pure error” that results from these differing Y values for the same value of X. Then one can divide the Error Sum of Squares between “lack of fit” and “pure error”. The model cannot explain the “pure error”.

One could then compare the R2 for a model to the maximum possible R2 in this situation. For example, if the maximum possible R2 were 0.60, then an R2 of 0.56 is reasonably high.

36 See Section 2.1 of Applied Regression Analysis, by Draper and Smith, not on the syllabus.


Problems:

5.1 (1 point) A two-variable linear regression has its regression sum of squares equal to 124 and error sum of squares equal to 21. What is R2?
(A) .77 (B) .79 (C) .81 (D) .83 (E) .85

5.2 (3 points) You are given the following five observations:
X: 1 2 3 4 5
Y: 1 1 2 2 4
For a two-variable linear regression fit to this data, what is R2?
(A) 0.82 (B) 0.84 (C) 0.86 (D) 0.88 (E) 0.90

5.3 (1 point) A linear regression is fit, Y = α + βX. For the same set of data, another linear regression is also fit, X = γ + δY.
Determine the value of β̂δ̂.

5.4 (2 points) You fit a two-variable regression model:
β̂ = 17.25. Σ(Xi - X̄)2 = 37.0. Σ(Yi - Ȳ)2 = 20019.
Determine R2.
(A) 0.55 (B) 0.60 (C) 0.65 (D) 0.70 (E) 0.75

Use the following information for the next two questions:
X: 0 1 2 5
Y: 2 5 11 18

5.5 (3 points) Using the method of least squares, you fit the model Yi = α + βXi + εi.
Determine the value of R2 for this model.
A. 93% B. 94% C. 95% D. 96% E. 97%

5.6 (3 points) Using the method of least squares, you fit the model (Xi - Yi) = α + βXi + εi.
Determine the value of R2 for this model.
A. 93% B. 94% C. 95% D. 96% E. 97%

5.7 (2 points) You fit a two-variable regression model:
Σ(Xi - X̄)2 = 1660. Σ(Yi - Ȳ)2 = 899. Σ(Xi - X̄)(Yi - Ȳ) = 1022.
Determine R2.
(A) 0.55 (B) 0.60 (C) 0.65 (D) 0.70 (E) 0.75


5.8 (Course 120 Sample Exam #2, Q.3) (2 points) You fit a simple linear regression to five pairs of observations. The residuals for the first four observations are 0.4, -0.3, 0.0, -0.7, and the estimated variance of the dependent variable Y is Σ(Yi - Ȳ)2/(N - 1) = 1.5.
Calculate R2.
(A) 0.82 (B) 0.84 (C) 0.86 (D) 0.88 (E) 0.90

5.9 (Course 4 Sample Exam, Q.29) (2.5 points) You wish to determine the relationship between sales (Y) and the number of radio advertisements broadcast (X). Data collected on four consecutive days is shown below.

Day   Sales   Number of Radio Advertisements
1     10      2
2     20      2
3     30      3
4     40      3

Using the method of least squares, you determine the estimated regression line:

Ŷ = -25 + 20X.

Determine the value of R2 for this model.

5.10 (IOA 101, 9/00, Q.6) (1.5 points) Suppose that the linear regression model
Y = α + βX + ε
is fitted to data (Xi, Yi): i = 1, 2, … , n, where Y is the salary of a company manager and X (years) is the number of years of relevant experience of that manager.
State the units of measurement (if any) of
(a) α̂, the estimate of α,
(b) β̂, the estimate of β,
(c) R2, the coefficient of determination of the fit.

5.11 (IOA 101, 9/02, Q.5) (2.75 points) Suppose that a line is fitted by least squares to a set of data, (Xi, Yi), i =1, 2, ..., n, which has sample correlation coefficient r.

Let the fitted value at Xi be denoted Ŷi.
Show that the sample correlation coefficient of the data (Yi, Ŷi), i = 1, 2, ..., n, that is, of the observed and fitted y values, is also equal to r.


5.12 (4, 11/02, Q.5) (2.5 points) You fit the following model to eight observations:
Y = α + βX + ε
You are given:
β̂ = 2.065, Σ(Xi - X̄)2 = 42, Σ(Yi - Ȳ)2 = 182.
Determine R2.
(A) 0.48 (B) 0.62 (C) 0.83 (D) 0.91 (E) 0.98

5.13 (4, 11/02, Q.30) (2.5 points)
Which of the following is not an objection to the use of R2 to compare the validity of regression results under alternative specifications of a multiple linear regression model?
(A) The F statistic used to test the null hypothesis that none of the explanatory variables helps explain variation of Y about its mean is a function of R2 and degrees of freedom.
(B) Increasing the number of independent variables in the regression equation can never lower R2 and is likely to raise it.
(C) When the model is constrained to have zero intercept, the ratio of regression sum of squares to total sum of squares need not lie within the range [0,1].
(D) Subtracting the value of one of the independent variables from both sides of the regression equation can change the value of R2 while leaving the residuals unaffected.
(E) Because R2 is interpreted assuming the model is correct, it provides no direct procedure for comparing alternative specifications.

5.14 (VEE-Applied Statistics Exam, 8/05, Q.9) (2.5 points) The method of ordinary least squares is used to fit the following two models to the same data set:
Model I: Yi = α1 + β1Xi + ε1i
Model II: (Xi - Yi) = α2 + β2Xi + ε2i
Which of (A), (B), (C), and (D) is false?
(A) α̂1 = -α̂2
(B) β̂1 + β̂2 = 1
(C) Σε̂1i2 = Σε̂2i2
(D) R2 for Model I is equal to R2 for Model II.
(E) None of (A), (B), (C), and (D) is false.


Section 6, Corrected R2:

The corrected R2, R̄2, adjusts for the number of variables used.37

Recall that R2 = 1 - ESS/TSS = 1 - Σε̂i2/Σyi2. The definition of R̄2 is similar, except it uses sample variances rather than sums of squared errors.

R̄2 ≡ 1 - (sample variance of residuals)/(sample variance of Y) = 1 - {Σε̂i2/(N-k)}/{Σyi2/(N-1)} = 1 - (ESS/TSS)(N - 1)/(N - k) = 1 - (1 - R2)(N - 1)/(N - k).

1 - R̄2 = (1 - R2)(N - 1)/(N - k), where N is the number of observations, and k is the number of variables including the intercept.
Note that the correction factor of (N - 1)/(N - k) = (# degrees of freedom associated with TSS) / (# degrees of freedom associated with ESS).

For the two-variable model, 1 - R̄2 = (1 - R2)(N - 1)/(N - 2).

Exercise: For the regression of the heights of the fathers and sons, what is R̄2?

[Solution: N = 8, k = 2 and R2 = .865. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .865)(7/6) = .1575.
R̄2 = .843.]
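
A one-line Python helper makes the adjustment explicit; the values below reproduce the heights example (R2 = 0.865, N = 8, k = 2), and the function name is illustrative only:

    def corrected_r2(r2, n, k):
        # Adjusted R-squared: 1 - (1 - R2)(N - 1)/(N - k).
        return 1 - (1 - r2) * (n - 1) / (n - k)

    print(corrected_r2(0.865, 8, 2))   # 0.8425, i.e. about 0.843 as in the exercise above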

For k = 1, a one-variable model, R̄2 = R2.

In general, R̄2 ≤ R2.

As k, the number of variables, increases, the correction due to the factor of (N - 1)/(N - k) also increases. Thus as one adds more variables, R̄2 may either increase or decrease. If R2 is small, and the correction factor is big due to using lots of variables relative to the number of observations, then it may turn out that R̄2 < 0.

Exercise: For a multiple regression with 5 slopes plus an intercept fit to 11 observations, R2 = 0.3. What is the corrected R2?
[Solution: N = 11. k = 6. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .3)(10/5) = 1.4. R̄2 = -0.4.]

One can usefully compare the corrected R2’s of different regressions.
All other things being equal, we prefer the model with the largest R̄2.

As will be discussed subsequently, one can perform F-Tests, in order to check the significance of adding variables to a model.
37 The corrected R2 is also called the “adjusted R2” or “R2 adjusted for degrees of freedom.”


Exercise: The following seven regression models, each with intercept and different numbers of explanatory variables, have been fit to the same set of 10 observations:
Model   Variables in the Model   ESS
I       X2                       5.58
II      X3                       7.54
III     X4                       6.51
IV      X2, X3                   5.21
V       X2, X4                   4.53
VI      X3, X4                   3.72
VII     X2, X3, X4               3.31
The estimated variance of the dependent variable Y is 2.20.
Which model has the best corrected R2, R̄2?
[Solution: TSS = (N - 1)(sample variance of Y) = (9)(2.20) = 19.8. R2 = 1 - ESS/TSS.
1 - R̄2 = (1 - R2)(N - 1)/(N - k), where k is the number of variables including the intercept.
Model   k   ESS    TSS    R2      corrected R2
I       2   5.58   19.8   0.718   0.683
II      2   7.54   19.8   0.619   0.572
III     2   6.51   19.8   0.671   0.630
IV      3   5.21   19.8   0.737   0.662
V       3   4.53   19.8   0.771   0.706
VI      3   3.72   19.8   0.812   0.758
VII     4   3.31   19.8   0.833   0.749
Model VI has the largest R̄2.
Comment: For a fixed number of variables, the model with the smallest ESS has the best R̄2. For two variables, model I is best. For three variables, model VI is best. Thus in order to answer this question, there is no reason to compute R̄2 for models IV and V.]
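
The table in the solution can be reproduced directly from each model's ESS. A minimal Python sketch, using TSS = 19.8 and N = 10 from the exercise:

    models = {
        "I":   (2, 5.58), "II": (2, 7.54), "III": (2, 6.51), "IV": (3, 5.21),
        "V":   (3, 4.53), "VI": (3, 3.72), "VII": (4, 3.31),
    }
    N, TSS = 10, 19.8

    best = None
    for name, (k, ESS) in models.items():
        r2 = 1 - ESS / TSS
        r2_bar = 1 - (1 - r2) * (N - 1) / (N - k)
        print(f"{name:>4}: R2 = {r2:.3f}, corrected R2 = {r2_bar:.3f}")
        if best is None or r2_bar > best[1]:
            best = (name, r2_bar)

    print("Largest corrected R2:", best[0])   # Model VI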

Principle of Parsimony:*

The principle of parsimony states that one should not make more assumptions than the minimum needed.38 As applied here, one should not use more independent variables in the model than get the job done.39 Adding additional independent variables usually increases R2, although rarely R2 remains the same.

One should not use an additional variable that results in a reduction in R̄2. Many actuaries would not use an additional independent variable unless there was a substantial improvement in R̄2. We will discuss subsequently how to test whether model coefficients are zero.

38 Also called Occam’s razor.
39 As applied to fitting size of loss distributions, one should not use more parameters than get the job done.


The Expected Value of R2:*

Since 1 ≥ R2 ≥ 0, the expected value of R2 is positive, even if ρ = 0 in the two variable model.

Since R̄2 has been corrected for degrees of freedom, if ρ = 0 in the two-variable model, or more generally if in a multiple regression the dependent variable is independent of each of the independent variables, then we expect that E[R̄2] = 0.40

R2 = 1 - (1 - R̄2)(N - k)/(N - 1).

ρ = 0 ⇒ E[R̄2] = 0 ⇒ E[R2] = 1 - (1 - E[R̄2])(N - k)/(N - 1) = 1 - (N - k)/(N - 1) = (k - 1)/(N - 1).

Exercise: A linear regression is fit to 6 observations. If X and Y are independent, what is the expected value of R2?
[Solution: E[R2] = (k - 1)/(N - 1) = (2 - 1)/(6 - 1) = 20%.]

Thus in spite of there being no relationship between X and Y in this case, E[R2] = 20%.When one has few observations, N is small, or one has fit many variables, k is large, the expected value of R2 is relatively large.

Exercise: A multiple regression with 6 variables is fit to 6 observations. The dependent variable is independent of each of the independent variables. What is the expected value of R2?
[Solution: E[R2] = (k - 1)/(N - 1) = (6 - 1)/(6 - 1) = 1.
Comment: This model will fit perfectly, even if there is no actual relationship between any of the independent variables and the dependent variable. R2 = 1.]

The Distribution of R2:*

R2 = RSS/TSS = RSS/(RSS + ESS) = (RSS/ESS) / {(RSS/ESS) + 1}.

Define the statistic F = (RSS/ν1)/(ESS/ν2), where ν1 = k - 1, the number of degrees of freedom of RSS, and ν2 = N - k, the number of degrees of freedom of ESS.41

Then R2 = ν1F/(ν1F + ν2). ⇒ F = (ν2/ν1)R2/(1 - R2).

For the Classical Normal Linear Regression Model, it can be shown that if all of the actual slopes of the model are zero, in other words the dependent variable is independent of each of the independent variables, then the F-statistic follows an F-Distribution with ν1 and ν2 degrees of freedom. 42

40 This result will be demonstrated below. Recall that unlike R2, R̄2 can be either positive or negative.
41 As discussed subsequently, this F-statistic will be used to test hypotheses about the slopes of multiple regression models.
42 The assumptions of the regression models are discussed in a subsequent section.


The distribution of F is in terms of an incomplete Beta Function: β[ν1/2, ν2/2; ν1x/(ν2 + ν1x)].43

Applying a change of variables, the distribution of R2 is:
β[ν1/2, ν2/2; ν1(ν2/ν1)R2/(1 - R2) / {ν2 + ν1(ν2/ν1)R2/(1 - R2)}] = β[ν1/2, ν2/2; R2].

Thus if all of the actual slopes of the model are zero, R2 follows a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2, b = ν2/2 = (N - k)/2, and θ = 1.44

For example, for a linear regression fit to 6 observations, if ρ = 0, R2 follows a Beta Distribution with a = (2 - 1)/2 = 1/2, b = (6 - 2)/2 = 2, and θ = 1, with density graphed below:45

[Graph: the density of this Beta Distribution, with R2 from 0 to 1 on the horizontal axis and the probability density (up to about 17.5) on the vertical axis; the density declines from a spike near R2 = 0 to zero at R2 = 1.]

A Beta Distribution as per Loss Models has mean: θa/(a + b).
Therefore, E[R2] = a/(a + b) = (ν1/2)/(ν1/2 + ν2/2) = ν1/(ν1 + ν2) = (k - 1)/(N - 1).

This matches the result discussed previously, for the case where the dependent variable is independent of each of the independent variables.

Exercise: For a linear regression fit to 6 observations, if ρ = 0, what is the expected value of R2?[Solution: E[R2] = (k - 1)/(N - 1) = (2 - 1)/(6 - 1) = 0.20.]43 The F-Distribution is discussed in a subsequent section.44 See Section 5.3 of Applied Regression Analysis, by Draper and Smith.45 This density is: .75(1 - x)/√x, 0 ≤ x ≤ 1.


A Beta Distribution as per Loss Models has second moment: θ2a(a + 1)/{(a + b)(a + b + 1)}, and variance: θ2ab/{(a + b)2(a + b + 1)}.
Therefore, Var[R2] = ν1ν2 / {(ν1 + ν2)2(ν1 + ν2 + 2)/2} = 2(k - 1)(N - k)/{(N - 1)2(N + 1)}.

Exercise: For a linear regression fit to 6 observations, if ρ = 0, what is the variance of R2?
[Solution: Var[R2] = 2(k - 1)(N - k)/{(N - 1)2(N + 1)} = (2)(2 - 1)(6 - 2)/{(6 - 1)2(6 + 1)} = 0.0457.
Comment: The standard deviation is: 0.214.]

The Expected Value of Corrected R2:*

R̄2 ≡ 1 - (1 - R2)(N - 1)/(N - k).

E[R̄2] = 1 - (1 - E[R2])(N - 1)/(N - k).

When the dependent variable is independent of each of the independent variables:

E[R̄2] = 1 - {1 - (k - 1)/(N - 1)}(N - 1)/(N - k) = 1 - {(N - k)/(N - 1)}{(N - 1)/(N - k)} = 0.

This demonstrates a key advantage of R̄2 as compared to R2. When the dependent variable is independent of each of the independent variables, in other words when the model actually explains none of the variation of the dependent variable, E[R̄2] = 0, while E[R2] > 0.


Problems:

Use the following information for the next two questions:
Source        Sum of Squares    Degrees of Freedom
Regression    335.2             3
Error         102.5             6
Total         437.7             9

6.1 (1 point) Determine R2.
(A) .77 (B) .79 (C) .81 (D) .83 (E) .85

6.2 (1 point) Determine R̄2, the corrected R2.
(A) .61 (B) .63 (C) .65 (D) .67 (E) .69

6.3 (2 points) Several regression models, each with intercept and different sets of explanatory variables, have been fit to the same set of 15 observations in order to explain Workers Compensation Insurance Claim Frequencies. The explanatory variables used were: Log of Employment (E), Log of Unemployment Rate (U), Log of Waiting Period for Benefits (W), and a Cost Containment Dummy Variable (C).
Model   Variables in the Model   Error Sum of Squares
I       E, U                     0.0131
II      E, W                     0.0123
III     U, W                     0.0144
IV      E, U, C                  0.0115
V       E, W, C                  0.0117
VI      U, W, C                  0.0118
VII     E, U, W, C               0.0106

The estimated variance of the dependent variable, the log of the claim frequencies, is .0103.

Which model has the best R̄2?
A. II B. III C. IV D. VI E. VII

* 6.4 (3 points) A three variable linear regression, Y = β1 + β2X2 + β3X3 + ε, has been fit to 11 observations. If β2 = β3 = 0, what is the distribution of R2? What is its density, mean, and variance?


6.5 (3 points) Two linear regressions, with slope β and intercept α, have been fit to two different sets of 19 observations. For both sets of observations, the sample variance of the values of the independent variables is 37. For both sets of observations, the sample variance of the values of the dependent variables is 13. For the first set of observations Σ(Xi - X̄)(Yi - Ȳ) is greater than for the second set of observations. Which of the following statements is not true?
A. β̂ is greater for the first regression than it is for the second regression.
B. α̂ is greater for the first regression than it is for the second regression.
C. s2 is smaller for the first regression than it is for the second regression.
D. R2 is greater for the first regression than it is for the second regression.
E. R̄2 is greater for the first regression than it is for the second regression.

6.6 (2 points) Which of the following is not equal to R2?
A. RSS/TSS.
B. Σ(Ŷi - Ȳ)2 / Σ(Yi - Ȳ)2.
C. R̄2(N - k)/(N - 1).
D. The percentage of variation explained by the regression model.
E. 1 - Σε̂i2/Σyi2.

6.7 (2 points) A multiple regression model is fit to some data. Then an additional independent variable is added to the model, and the regression is fit to the same data.
Which of the following statements is true?
A. R2 may decrease.
B. R̄2 may decrease.
C. ESS may increase.
D. RSS may decrease.
E. None of A, B, C, and D are true.

6.8 (2 points) A linear regression model with slope and intercept is fit to 6 observations. R̄2 = 0.64. If the same form of model is fit to 10 new similar observations, what is the expected value of R2?
A. 0.66 B. 0.67 C. 0.68 D. 0.69 E. 0.70


* 6.9 (3 points) A two variable linear regression, Y = α + βX + ε, has been fit to 3 observations. If β = 0, what is the distribution of R2? What is its density, mean, and variance?

6.10 (Course 120 Sample Exam #1, Q.9) (2 points) You are given:
Source        Sum of Squares    Degrees of Freedom
Regression    1115.11           2
Error         138.89            5
Total         1254.00           7

Determine R̄2, the corrected R2.
(A) 0.84 (B) 0.89 (C) 0.93 (D) 0.97 (E) 1.00

6.11 (Course 120 Sample Exam #3, Q.4) (2 points) You fit the regression model Yi = α + βXi + εi to 11 observations. You are given that R2 = 0.85.
Determine R̄2, the corrected R2.
(A) 0.77 (B) 0.79 (C) 0.80 (D) 0.83

6.12 (4, 5/00, Q. 31) (2.5 points) You fit the following model to 48 observations:
Y = β1 + β2X2 + β3X3 + β4X4 + ε
You are given:
Source of Variation    Degrees of Freedom    Sum of Squares
Regression             3                     103,658
Error                  44                    69,204

Calculate R̄2, the corrected R2.
(A) 0.57 (B) 0.58 (C) 0.59 (D) 0.60 (E) 0.61


Section 7, Normal Distribution

The Normal Distribution is a bell-shaped symmetric distribution. Its two parameters are its mean µ and its standard deviation σ. f(x) = exp[-(x-µ)2/(2σ2)] / {σ(2π).5}, -∞ < x < ∞. The sum of two independent Normal Distributions is also a Normal Distribution, with the sum of the means and variances. If X is normally distributed, then so is aX + b, but with mean aµ + b and standard deviation |a|σ. If one standardizes a normally distributed variable by subtracting µ and dividing by σ, then one obtains a unit normal with mean 0 and standard deviation of 1.

Normal Distribution

Support: ∞ > x > -∞        Parameters: ∞ > µ > -∞ (location parameter), σ > 0 (scale parameter)

D.f.: F(x) = Φ((x - µ)/σ)

P.d.f.: f(x) = φ((x - µ)/σ) = (1/(σ√2π)) exp(-[(x - µ)2]/[2σ2])

Mean = µ        Variance = σ2
Skewness = 0 (distribution is symmetric)        Kurtosis = 3
Mode = µ        Median = µ
Method of Moments: µ = µ1′, σ = (µ2′ - µ1′2).5
Percentile Matching: Set gi = Φ−1(pi), then σ = (x1 - x2)/(g1 - g2), µ = x1 - σg1
Method of Maximum Likelihood: Same as Method of Moments

Sample Distribution, µ = 10 and σ = 5:

[Graph: the density of this Normal Distribution, plotted from about -10 to 30 on the horizontal axis, with density values up to about 0.08 on the vertical axis.]

The density of the Unit Normal is denoted by φ(x) = exp[-x2 / 2 ] / (2π).5, -∞ < x < ∞.


The corresponding distribution function is denoted by Φ(x).

The following table is similar to that attached to the exam and shows the values of the Unit Normal Distribution Φ(x) with mean of 0 and variance of 1. In order to use this table one must first standardize an approximately normal variable by subtracting its mean and dividing by its standard deviation. For x < 0 one must make use of symmetry: Φ(x) = 1 - Φ(-x).

NORMAL DISTRIBUTION TABLE

The first table below gives values of the distribution function, Φ (x), of the standard normal distribution for selected values of x. The integer part of x is given in the top row, and the first decimal place of x is given in the left column.

x     0        1        2        3
0.0   0.5000   0.8413   0.9772   0.9987
0.1   0.5398   0.8643   0.9821   0.9990
0.2   0.5793   0.8849   0.9861   0.9993
0.3   0.6179   0.9032   0.9893   0.9995
0.4   0.6554   0.9192   0.9918   0.9997
0.5   0.6915   0.9332   0.9938   0.9998
0.6   0.7257   0.9452   0.9953   0.9998
0.7   0.7580   0.9554   0.9965   0.9999
0.8   0.7881   0.9641   0.9974   0.9999
0.9   0.8159   0.9713   0.9981   1.0000

This second table provides the x values that correspond to some selected values of Φ(x).

Φ(x)    x
0.800   0.842
0.850   1.036
0.900   1.282
0.950   1.645
0.975   1.960
0.990   2.326
0.995   2.576


Central Limit Theorem:

Let X1, X2, ..., Xn be a series of independent, identically distributed variables, with finite mean and variance. Let X̄n = the average of X1, X2, ..., Xn.

Then as n approaches infinity, X̄n approaches a Normal Distribution.

If each Xi has mean µ and variance σ2, then X̄n has mean µ and variance σ2/n.

Therefore, (X̄n - µ)/(σ/√n) approaches a unit Normal as n approaches infinity.

For example, assume X follows a uniform distribution from 0 to 10. This has mean of 5 and variance of 10²/12 = 8.333.

The average of 20 independent random draws from this uniform distribution, has a mean of 5 and a variance of 8.333/20 = .41665. The average of these 20 values is approximately Normal, with this mean and variance.

Exercise: What is the probability that the average of these 20 random draws is less than 4?
[Solution: Prob[X̄ < 4] ≅ Φ((4 - 5)/√.41665) = Φ(-1.549) = 6.1%.]

The sum of 20 independent random draws from this uniform distribution, has a mean of (20)(5) = 100 and a variance of (20)(8.333) = 166.67. The sum of these 20 values is approximately Normal, with this mean and variance.

Exercise: What is the probability that the sum of these 20 random draws is less than 80?
[Solution: Prob[ΣXi < 80] ≅ Φ((80 - 100)/√166.67) = Φ(-1.549) = 6.1%.
Comment: Prob[ΣXi < 80] = Prob[X̄ < 4].]
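
Both probabilities can be checked numerically; the standard normal distribution function is available through the error function. A minimal Python sketch:

    import math

    def phi(z):
        # Standard normal distribution function.
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    mu, var = 5, 100 / 12   # uniform on (0, 10): mean 5, variance 8.333
    n = 20

    # Average of 20 draws: mean 5, variance var/n.
    print(round(phi((4 - mu) / math.sqrt(var / n)), 3))        # about 0.061

    # Sum of 20 draws: mean 100, variance 20 * var.
    print(round(phi((80 - n * mu) / math.sqrt(n * var)), 3))   # the same probability, about 0.061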


Kurtosis:*

The kurtosis is defined as the fourth central moment divided by the square of the variance. As with the skewness, the kurtosis is a dimensionless quantity (both the numerator and denominator are in dollars to the fourth power), which describes the shape of the distribution. Since the fourth central moment is always non-negative, so is the kurtosis. Large kurtosis corresponds to a heavier-tailed curve, and vice versa.

Exercise: Compute the 4th central moment of a Normal Distribution.
[Solution: ∫x=-∞ to ∞ (x-µ)4 (1/(σ√2π)) exp(-[(x-µ)2]/[2σ2]) dx = (2/(σ√2π)) ∫z=0 to ∞ σ4z2 exp(-z/2) (.5σz-.5) dz
= (σ4/√2π) ∫z=0 to ∞ z1.5 exp(-z/2) dz = (σ4/√2π) Γ(2.5) (1/2)-2.5 = (σ4/√π)(1.5)Γ(1.5)(22) = (σ4/√π)(6)(.5√π) = 3σ4.
Note we made a change of variables z = ((x-µ)/σ)2 and got an integral involving a complete Gamma function.]

Exercise: Compute the kurtosis of a Normal Distribution with parameters µ and σ.
[Solution: The kurtosis is defined as the fourth central moment divided by the square of the variance = 3σ4/(σ2)2 = 3.]

All Normal Distributions have a kurtosis of 3. Thus curves with a kurtosis more than 3 are heavier-tailed than a Normal Distribution. Rather than kurtosis, some people use Excess, which is just kurtosis - 3. Thus the Normal Distribution has an Excess of 0. Curves with positive excess are heavier-tailed than the Normal Distribution.

Exercise: Let µ1′, µ2′, µ3′, and µ4′ be the first four moments (around the origin) of a distribution. What is the 4th central moment of this distribution?

[Solution: The 4th central moment is: E[(X-µ1′)4] = E[X4 - 4µ1′X3 + 6µ1′2X2 - 4µ1′3X + µ1′4] =

E[X4] - 4µ1′E[X3] + 6µ1′2E[X2] - 4µ1′3E[X] + µ1′4 = µ4′ - 4µ1′µ3′+ 6µ1′2µ2′ - 4µ1′3µ1′ + µ1′4 = µ4′

- 4µ1′µ3′ + 6µ1′2µ2′ - 3µ1′4.]

Approximations to the Normal Distribution:*46

Φ(x) ≅ 1 - φ(x){.4361836t - .1201676t2 + .9372980t3}, where t = 1/(1 + .33267x).

46 See pages 103-104 of Simulation by Ross or 26.2.16 in Handbook of Mathematical Functions.


Testing the Normality of Residuals:

One could test whether the residuals are Normally distributed using the methods one can apply to any data and distribution.47 One could use graphical techniques such as:histogram, ogive, or p-p plots. One could use statistical tests such as the K-S Statistic.

However, there are special techniques one can apply for the Normal Distribution.If the graph of the residuals does not look roughly symmetric, they are probably not Normally distributed.48 If the residuals are Normally Distributed one also expects the skewness of the residuals to be close to zero and the kurtosis to be close to 3, those of a Normal Distribution.

Jarque-Bera Statistic:*

One can combine these ideas into a statistical test using the Jarque-Bera statistic:49 JB = (N/6)(Skewness2 + (Kurtosis - 3)2/4).

The JB Statistic has a Chi-Square Distribution with 2 degrees of freedom,50 which is an Exponential Distribution with a mean of 2.51 If the residuals are Normal, we expect the JB Statistic to be close to zero, since the observed skewness should be close to zero and the observed kurtosis should be close to 3. If the JB Statistic is sufficiently large, then we reject the hypothesis that the residuals are Normal.

Exercise: For a regression fit to 100 points, the residuals have a skewness of -0.6 and a kurtosis of 3.8. Compute the Jarque-Bera statistic and test the null hypothesis that the residuals are Normally Distributed.
[Solution: JB = (100/6)(.6² + .8²/4) = 8.67.
The p-value of the test is the chance that the JB statistic would be this large or larger if the null hypothesis were true. If H0 is true, JB follows an Exponential Distribution with mean 2 and therefore, p-value = exp(-8.67/2) = 1.3%. Using the Chi-Square table for 2 degrees of freedom, since 7.38 < 8.67 < 9.21, we reject H0 at 2.5% and do not reject at 1%.]

47 See “Mahler’s Guide to Fitting Loss Distributions.”
48 The Normal Distribution is symmetric. Note that the mean of the residuals is always zero.
49 See page 47 of Econometric Models and Economic Forecasts.
50 The skewness of a sample from a Normal Distribution is asymptotically Normal with mean zero and variance of 6/n. Therefore, (N/6)Skewness2 for large samples is approximately a Unit Normal squared. The kurtosis of a sample from a Normal Distribution is asymptotically Normal with mean 3 and variance of 24/n. Therefore, (N/24)(Kurtosis - 3)2 for large samples is approximately a Unit Normal squared. See Statistical Methods by Snedecor and Cochran or Volume 1 of Kendall’s Advanced Theory of Statistics. Also the correlation between the skewness and kurtosis from Normal Samples is asymptotically (27/n). Thus for large samples, the two terms in the JB Statistic are approximately independent, and their sum is approximately the sum of two independent unit Normals squared, which is a Chi-Square Distribution with 2 degrees of freedom. See Volume 1 of Kendall’s Advanced Theory of Statistics.
51 In general, a Chi-Square Distribution with ν degrees of freedom is a Gamma Distribution with α = ν/2 and θ = 2. See “Mahler’s Guide to Conjugate Priors.”
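
A minimal Python sketch of this test; the function name is illustrative, and the p-value uses the fact that a Chi-Square Distribution with 2 degrees of freedom is an Exponential with mean 2:

    import math

    def jarque_bera(n, skewness, kurtosis):
        # Jarque-Bera statistic and its large-sample p-value.
        jb = (n / 6) * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
        p_value = math.exp(-jb / 2)   # Chi-Square with 2 d.f. is Exponential with mean 2
        return jb, p_value

    jb, p = jarque_bera(100, -0.6, 3.8)
    print(round(jb, 2), round(p, 3))   # 8.67 and 0.013: reject normality at 2.5%, not at 1%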


Problems:

7.1 (1 point) For a Standard Normal Distribution, with mean 0 and standard deviation of 1, which of the following is a symmetric 90% confidence interval?
A. [-1.282, 1.282] B. [-1.645, 1.645] C. [-1.960, 1.960] D. [-2.326, 2.326] E. None of A, B, C, or D.

7.2 (1 point) X has a Normal Distribution with µ = 7 and σ = 4. What is the probability that 6 ≤ x ≤ 12?
A. 49% B. 50% C. 51% D. 52% E. 53%

7.3 (3 points) For which of the following situations are the residuals of the regression most likely to be Normally Distributed?
Regression   Second Moment       Third Moment        Fourth Moment
             of the Residuals    of the Residuals    of the Residuals
A.           5                   2                   80
B.           5                   1                   50
C.           5                   0                   100
D.           5                   -2                  50
E.           5                   -1                  70

7.4 (1 point) β̂ = 13 and Var[β̂] = 9. Using the Normal Approximation, which of the following is a symmetric 95% confidence interval for β?
A. [9.1, 16.9] B. [8.1, 17.9] C. [7.1, 18.9] D. [6.1, 19.9] E. None of A, B, C, or D.

7.5 (2, 5/83, Q. 19) (1.5 points) Let X1, X2, and X3 be a random sample from a normal

distribution with mean µ ≠ 0 and variance σ2 = 1/24. What are the values of a and b, respectively, in order for L = aX1 + 4X2 + bX3, to have a standard normal distribution? A. a = -2, b = -2 B. a = -2, b = 2 C. a = -1, b= -3 D. a = 2, b = 2 E. Cannot be determined from the given information

7.6 (2, 5/88, Q. 29) (1.5 points) A symmetric 98% confidence interval is needed for µ, the mean of a normal population whose variance is 10. What is the smallest sample size required so that the length of the confidence interval will be no more than 3? A. 5 B. 7 C. 25 D. 30 E. 242

7.7 (2, 5/90, Q. 3) (1.7 points) Let X1, X2, . . . , X36 and Y1, Y2, . . . , Y49, be independent random samples from distributions with means µX = 30.4 and µY = 32.1 and with standard deviations σX = 12 and σY = 14.
What is the approximate value of P[X̄ > Ȳ]?
A. 0.27 B. 0.34 C. 0.50 D. 0.66 E. 0.73


7.8 (2, 5/92, Q. 37) (1.7 points) A random sample X1, . . . , Xn is taken from a normal distribution with mean µ and variance 12. A symmetric 95% confidence interval is needed for µ. What is the smallest sample size for which the length of the desired confidence interval is less than or equal to 5?
A. 3 B. 7 C. 8 D. 62 E. 89

7.9 (2, 2/96, Q.2) (1.7 points) Let X be a normal random variable with mean 0 and variance a > 0. Calculate P[X² < a].
A. 0.34 B. 0.42 C. 0.68 D. 0.84 E. 0.90

7.10 (2, 2/96, Q.45) (1.7 points) The weights of the animals in a population are normally distributed with variance 144. A random sample of 16 of the animals is taken. The mean weight of the sample is 200 pounds. Calculate the lower bound of the symmetric 90% confidence interval for the mean weight of the population.
A. 140.96 B. 194.12 C. 194.75 D. 195.08 E. 198.77

7.11 (1, 5/00, Q.6) (1.9 points) Two instruments are used to measure the height, h, of a tower. The error made by the less accurate instrument is normally distributed with mean 0 and standard deviation 0.0056h. The error made by the more accurate instrument is normally distributed with mean 0 and standard deviation 0.0044h.
Assuming the two measurements are independent random variables, what is the probability that their average value is within 0.005h of the height of the tower?
(A) 0.38 (B) 0.47 (C) 0.68 (D) 0.84 (E) 0.90

7.12 (1, 5/00, Q.9) (1.9 points) The total claim amount for a health insurance policy follows a distribution with density function e^(-x/1000)/1000, for x > 0.
The premium for the policy is set at 100 over the expected total claim amount.
If 100 policies are sold, what is the approximate probability that the insurance company will have claims exceeding the premiums collected?
(A) 0.001 (B) 0.159 (C) 0.333 (D) 0.407 (E) 0.460

7.13 (1, 5/00, Q.19) (1.9 points) In an analysis of healthcare data, ages have been rounded to the nearest multiple of 5 years. The difference between the true age and the rounded age is assumed to be uniformly distributed on the interval from -2.5 years to 2.5 years. The healthcare data are based on a random sample of 48 people.What is the approximate probability that the mean of the rounded ages is within 0.25 years of the mean of the true ages?(A) 0.14 (B) 0.38 (C) 0.57 (D) 0.77 (E) 0.88

7.14 (1, 11/00, Q.19) (1.9 points) Claims filed under auto insurance policies follow a normal distribution with mean 19,400 and standard deviation 5,000.What is the probability that the average of 25 randomly selected claims exceeds 20,000?(A) 0.01 (B) 0.15 (C) 0.27 (D) 0.33 (E) 0.45


7.15 (1, 5/01, Q.19) (1.9 points) A company manufactures a brand of light bulb with a lifetime in months that is normally distributed with mean 3 and variance 1. A consumer buys a number of these bulbs with the intention of replacing them successively as they burn out. The light bulbs have independent lifetimes.
What is the smallest number of bulbs to be purchased so that the succession of light bulbs produces light for at least 40 months with probability at least 0.9772?
(A) 14 (B) 16 (C) 20 (D) 40 (E) 55

7.16 (1, 5/03, Q.13) (2.5 points) A charity receives 2025 contributions. Contributions are assumed to be independent and identically distributed with mean 3125 and standard deviation 250. Calculate the approximate 90th percentile for the distribution of the total contributions received.(A) 6,328,000 (B) 6,338,000 (C) 6,343,000 (D) 6,784,000 (E) 6,977,000


Section 8, Assumptions of Linear Regression52

There are a number of assumptions of the two-variable and multivariable regression models.

Linear Relationship:

In the two variable model we assume: Yi = α + βXi + εi,

where α is the intercept, β is the slope, and εi is the ith error term.

As will be discussed subsequently, a similar relationship holds for the multivariable model. For example, for the three variable model: Yi = β1 + β2X2i + β3X3i + εi.

For the four variable model: Yi = β1 + β2X2i + β3X3i + β4X4i + εi.

As will be discussed subsequently, via change of variables one can obtain other relationships between the independent variable(s) and the dependent variable.

Meaning of the Error Terms:

The error terms, εi, are random variables. They represent the variation in the dependent variable caused by independent variables not included in the model and/or random fluctuation.

For example, in the heights example, the height of the mother probably explains some of the variation in the heights of sons that is not explained solely by the heights of the fathers. Nevertheless, a model that included the heights of both parents would still have an error term, since there are other factors that affect height, as well as a random element.

Many items important in insurance, such as the number of claims, aggregate dollars of loss, etc., have a large purely random component.

Nonstochastic Independent Variables:

We assume the values of the independent variable(s) are observed known values. In actuarial work, we usually have little or no control over what values of X have been observed.

52 See Sections 3.1 and 4.1 of Pindyck and Rubinfeld.


Error Terms Have Mean of Zero:

We assume E[εi] = 0, for all i.

This implies that the linear regression estimator for Yi, α + βXi, is unbiased:

E[Yi] = E[α + βXi + εi] = E[α + βXi ] + E[εi] = E[α + βXi].53

Homoscedasticity:

We assume the error terms have constant variance; Var[εi] = σ² for all i. This is equivalent to assuming the Yi have constant variance. This is called homoscedasticity. We will subsequently discuss what to do when we have heteroscedasticity rather than homoscedasticity.

Independent Errors:

We assume the error terms are independent.54 This is equivalent to assuming the Yi are independent.

This implies that Cov[εi, εj] = 0 for i ≠ j. Since E[ε] = 0, this implies E[εiεj] = 0 for i ≠ j. We will subsequently discuss what to do when the error terms are autocorrelated and thus not independent.

Classical Linear Regression Model:

The above five assumptions are made in the Classical Linear Regression Model:
1. Yi = α + βXi + εi.
2. Xi are known fixed values.
3. E[εi] = 0 for all i.
4. Var[εi] = σ² for all i.
5. εi and εj independent for i ≠ j.

53 For the model with an intercept, this would be equivalent to an assumption that all of the error terms have the same expected value. If that expected value were not zero, we could absorb it into the intercept and get a new model with E[εi*] = 0, for all i.

54 While the errors are independent, the residuals ε̂i are not, since as discussed previously they are constrained to sum to zero.
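To make these assumptions concrete, here is a minimal sketch in Python (using numpy; the particular X values, the true parameters α = 4, β = 0.6, σ = 2, and the seed are hypothetical, chosen only for illustration) that simulates data satisfying the Classical Linear Regression Model and then fits the least squares intercept and slope.

import numpy as np

rng = np.random.default_rng(seed=1)

# Assumption 2: the X values are fixed, known values.
X = np.arange(1.0, 21.0)                       # 20 fixed observation points
alpha_true, beta_true, sigma = 4.0, 0.6, 2.0   # hypothetical true parameters

# Assumptions 3-5: errors have mean 0, constant variance sigma^2,
# and are independent of one another.
eps = rng.normal(loc=0.0, scale=sigma, size=len(X))

# Assumption 1: the linear relationship between X and Y.
Y = alpha_true + beta_true * X + eps

# Least squares estimates, computed in deviations form.
x = X - X.mean()
y = Y - Y.mean()
beta_hat = (x * y).sum() / (x * x).sum()
alpha_hat = Y.mean() - beta_hat * X.mean()
print(alpha_hat, beta_hat)                     # should be near 4 and 0.6

Repeating the simulation with fresh errors would give slightly different estimates each time; the behavior of the estimators over such repeated use is the subject of a later section.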


No Multicollinearity:

In the multivariate version, we add an assumption that no exact linear relationship exists between two or more of the independent variables. In other words, we assume that the matrix whose rows are X2, X3, ..., Xk has rank k -1, the number of independent variables other than the constant. We will subsequently discuss multicollinearity or approximate multicollinearity.

Classical Normal Linear Regression Model:

If we add an assumption, we get the Classical Normal Linear Regression Model:
6. The error terms are Normally Distributed.

Thus Yi is Normally Distributed with mean α + βXi and variance σ².

It is this assumption of normality that will allow the very important use of t-tests and F-tests, to be discussed subsequently.55

55 In practical applications it is only required that the errors be approximately normally distributed.


Problems:

8.1 (2 points) In the Classical Linear Regression Model, Yi = α + βXi + εi, which of the

statements following are false?A. The Xs are nonstochastic.B. E[εi] = 0.

C. E[εi εj] = δij, where δij = 0 if i ≠ j and 1 if i = j.

D. Var[εi] = Var[εj].E. Statements A, B, C, and D are all true.

8.2 (3 points) In the Classical Linear Regression Model, determine the correlation between X and Y.


Section 9, Properties of Estimators56

We are interested in what is expected to happen if we were to use an estimator again and again to make a particular estimate. The errors that would result from the repeated use of a procedure are what we refer to when we discuss the qualities of an estimator. Various desirable properties of estimators will be discussed.

Estimates versus Estimators:

Using linear regression to estimate the salary of an actuary with 5 exams is an example of an estimator.

If for a particular set of data, α̂ = 35.5 and β̂ = 9.5, then 35.5 + (5)(9.5) = 83 is an example of an estimate.

An estimator is a procedure used to estimate a quantity of interest. An estimate is the result of using an estimator.

An estimator is a random variable or random function. An estimate is a number or a function.

Point Estimators:

A point estimator provides a single value, or point estimate, as an estimate of a quantity of interest.57 An example of a point estimator would be to take α̂ + 5β̂, for the parameters of a fitted linear regression, as the estimate of the value of Y when X is 5.

One wants point estimators: to be unbiased, to be consistent, to be efficient, and to have a small mean squared error.

Bias:

The Bias of an estimator is the expected value of the estimator minus the true value. An unbiased estimator has a Bias of zero.

The sample variance, Σ(Xi - X̄)²/(N - 1), is a good example of an unbiased estimator.

56 While Section 2.3 of Pindyck and Rubinfeld is not on the syllabus, these ideas are. See also Loss Models by Klugman, Panjer, and Willmot.
57 Point estimates differ from interval estimates. A point estimate of β might be 17. An interval estimate of β might be: 17 ± 5, with 90% confidence.


Exercise: Demonstrate that the sample variance is an unbiased estimator for a random sample of size 3.
[Solution: Let X1, X2, and X3 be three independent identically distributed variables.
X̄ = (X1 + X2 + X3)/3.
(X1 - X̄)² = (2X1/3 - X2/3 - X3/3)² = 4X1²/9 + X2²/9 + X3²/9 - 4X1X2/9 - 4X1X3/9 + 2X2X3/9.
E[(X1 - X̄)²] = E[4X1²/9 + X2²/9 + X3²/9 - 4X1X2/9 - 4X1X3/9 + 2X2X3/9] =
(4/9)E[X²] + (1/9)E[X²] + (1/9)E[X²] - (4/9)E[X]E[X] - (4/9)E[X]E[X] + (2/9)E[X]E[X] =
(2/3)E[X²] - (2/3)E[X]² = (2/3)Var[X].
E[(X1 - X̄)²] = E[(X2 - X̄)²] = E[(X3 - X̄)²] = (2/3)Var[X].
The expected value of the sample variance is:
E[{(X1 - X̄)² + (X2 - X̄)² + (X3 - X̄)²}/(3 - 1)] = {(2/3)Var[X] + (2/3)Var[X] + (2/3)Var[X]}/2 = Var[X].]

ΣXi = NX̄ ⇒ Σ(Xi - X̄) = 0. Thus if one knows any N - 1 of the Xi - X̄, then one knows the remaining one. We lose a degree of freedom; we have N - 1 rather than N degrees of freedom. Therefore, in order to obtain an unbiased estimator of the variance, we put N - 1 rather than N in the denominator.
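This can also be checked by simulation; the following sketch (assuming, arbitrarily, a Normal population with variance 9 and samples of size 5) shows the N - 1 denominator giving an average estimate near the true variance, while the N denominator is too small on average.

import numpy as np

rng = np.random.default_rng(seed=7)
true_var, n, trials = 9.0, 5, 200_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))
dev2 = (samples - samples.mean(axis=1, keepdims=True)) ** 2

unbiased = dev2.sum(axis=1) / (n - 1)   # sample variance, divisor N - 1
biased = dev2.sum(axis=1) / n           # same sum of squares, divisor N

print(unbiased.mean())   # close to 9
print(biased.mean())     # close to 9(n - 1)/n = 7.2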

Asymptotically Unbiased:

For an asymptotically unbiased estimator, as the number of data points N → ∞, the bias approaches zero.58 In other words, as the sample size approaches infinity, the expected value of the estimator approaches the true value of the quantity being estimated.

Σ(Xi - X̄)²/N = {(N - 1)/N}(sample variance).
The sample variance is an unbiased estimator of the variance, and therefore the expected value of the sample variance is Var[X].
Thus, Σ(Xi - X̄)²/N has an expected value of Var[X](N - 1)/N, which goes to Var[X] as N → ∞.

Therefore, Σ(Xi - X̄)²/N is an asymptotically unbiased estimator of the variance.

Consistency:

When based on a large number of observations, a consistent estimator, also called weakly consistent, has a very small probability that it will differ by a large amount from the true value.59 Let ψn be the estimator with a sample size of n and c be the true value; then ψ is a consistent estimator if, given any ε > 0:

limit as n → ∞ of Probability[ |ψn - c| < ε ] = 1.

58 See Definition 9.6 in Loss Models.
59 A consistent estimator may also be defined as one that converges stochastically to the true value.


Most estimators used by actuaries are consistent. For example the sample mean is a consistent estimator of the underlying mean assuming the data are independent draws from a single distribution with finite mean.60

Exercise: Use as an estimator of the mean the first element of a data set of size N. Is this estimator consistent?
[Solution: The estimate resulting from this estimator does not depend on the sample size N. The probability of a large error is independent of N and therefore does not approach zero as N approaches infinity. This estimator is not consistent.
Comment: This is a stupid estimator to use.]
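The contrast between these two estimators can be seen numerically; in the sketch below (the Exponential population with mean 10, the tolerance ε = 1, and the sample sizes are all hypothetical choices), the probability that the sample mean misses the true mean by more than ε shrinks as N grows, while for the first-element estimator it does not.

import numpy as np

rng = np.random.default_rng(seed=3)
true_mean, eps, trials = 10.0, 1.0, 10_000

for n in (10, 100, 1000):
    data = rng.exponential(true_mean, size=(trials, n))
    sample_mean = data.mean(axis=1)   # consistent estimator of the mean
    first_obs = data[:, 0]            # the estimator from the exercise above
    p_mean = np.mean(np.abs(sample_mean - true_mean) > eps)
    p_first = np.mean(np.abs(first_obs - true_mean) > eps)
    print(n, round(p_mean, 3), round(p_first, 3))
    # The probability of an error larger than 1 falls toward zero for the
    # sample mean, but stays roughly constant for the first-element estimator.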

Mean Squared Errors:

The mean square error (MSE) of an estimator is the expected value of the squared difference between the estimate and the true value. The smaller the MSE, the better the estimator, all else equal.

MSE(θ̂) = Var(θ̂) + [Bias(θ̂)]².

The mean squared error is equal to the variance plus the square of the bias. Thus for an unbiased estimator, the mean square error is equal to the variance.

When there is a tradeoff between the bias and variance (efficiency) of an estimator, one may look to minimize the mean squared error.
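The decomposition can be verified numerically. In the sketch below (a hypothetical setup: a Normal population with mean 5, and the deliberately biased estimator 0.9 times the sample mean), the simulated mean squared error matches the variance plus the square of the bias.

import numpy as np

rng = np.random.default_rng(seed=11)
mu, sigma, n, trials = 5.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(trials, n))
estimates = 0.9 * samples.mean(axis=1)   # a deliberately biased estimator of mu

mse_direct = np.mean((estimates - mu) ** 2)
var_plus_bias2 = estimates.var() + (estimates.mean() - mu) ** 2
print(mse_direct, var_plus_bias2)
# Both are near the theoretical MSE: variance (0.81)(4/10) = 0.324,
# bias -0.5, so MSE = 0.324 + 0.25 = 0.574.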

Variances:

Let ψn be an asymptotically unbiased estimator whose variance goes to zero as n, the number of data points, goes to infinity. Then as n goes to infinity, since both the variance and bias go to zero, the Mean Squared Error, MSE(ψn) = Var(ψn) + [Bias(ψn)]², also goes to zero.

Let c be the true value; then for any ε > 0:

MSE(ψn) = ∫(ψn - c)² f(ψn) dψn ≥ Probability[ |ψn - c| ≥ ε ]ε² ≥ 0.

Therefore, since as n goes to infinity MSE(ψn) goes to zero:

limit as n → ∞ of Probability[ |ψn - c| ≥ ε ] = 0.

Therefore, an asymptotically unbiased estimator, whose variance goes to zero as the number of data points goes to infinity, is consistent (weakly consistent).

60 The Law of Large Numbers. See for example, An Introduction to Probability Theory and Its Applications by Feller, or A First Course in Probability by Ross.


For example, Σ(xi - x̄)²/n, summed over i = 1 to n, is an asymptotically unbiased estimator of the variance. For a distribution with finite fourth moment, the variance of this estimator goes to zero. Therefore, in that case, Σ(xi - x̄)²/n is a consistent estimator of the variance.

The minimal variance for an unbiased estimator of a parameter that is not in the support of a density is the Rao-Cramer lower bound:

-1/{n E[∂² ln f(x)/∂θ²]} = 1/{n E[(∂ ln f(x)/∂θ)²]}.

Thus an unbiased estimator of a parameter has the smallest MSE among unbiased estimators if and only if its variance attains the Rao-Cramer lower bound.

Efficiency:

For an unbiased estimator, the variance is equal to the mean squared error. Thus an unbiased estimator has the smallest MSE among unbiased estimators if and only if it has the smallest variance among unbiased estimators.

An unbiased estimator is efficient if for a given sample size it has the smallest variance of any unbiased estimator.

The efficiency of an unbiased estimator is:(the Rao-Cramer lower bound)/(Variance of that estimator). Thus an unbiased estimator with a variance equal to the Rao-Cramer lower bound is 100% efficient.

Exercise: The Rao-Cramer lower bound is 23. What is the efficiency of an unbiased estimator with Mean Squared Error = 37? [Solution: Efficiency = 23/37 = 62.2%.]

Maximum Likelihood estimators usually have an efficiency less than 100% for finite sample sizes. However, the efficiency of Maximum Likelihood estimators approaches 100% as the sample size increases towards infinity; they are asymptotically efficient.


Linear Regression Estimators:

Under the assumptions of the classical linear regression model61, the least squares estimator of the slope is unbiased. In other words, the expected value of the estimate of the slope is equal to the true slope:62

E[β̂] = β.

This holds for both the two-variable model and the multi-variable model, provided the assumed model is the correct model.

The ordinary least squares estimator is consistent. It is consistent even with heteroscedasticity and/or serial correlation of errors, to be discussed subsequently.

Under the assumptions of the classical linear regression model, the regression estimators of the intercept and slope(s) are the most efficient linear unbiased estimators.63 This is the Gauss-Markov Theorem. In other words, the ordinary least squares estimators are the best linear unbiased estimators, BLUE; they have the smallest variance among linear unbiased estimators.

If the assumptions of the classical linear regression model do not hold then the ordinary least squares estimator is not necessarily efficient. Specifically with heteroscedasticity and/or serial correlation of errors, to be discussed subsequently, the ordinary least squares estimator is not efficient.64

When Relevant Variables Are Omitted from Linear Regression Models:*65

Let us assume that Y = β1 + β2X2 + β3X3 + ε, or in deviations form y = β2x2 + β3x3 + ε. However, we inadvertently exclude the independent variable X3 from our model.

Then β̂2 = Σx2iyi/Σx2i² = Σx2i(β2x2i + β3x3i + εi)/Σx2i² = β2 + β3Σx2ix3i/Σx2i² + Σx2iεi/Σx2i² =
β2 + β3Cov[X2, X3]/Var[X2] + Σx2iεi/Σx2i².

E[β̂2] = β2 + β3Cov[X2, X3]/Var[X2].

Bias = E[β̂2] - β2 = β3Cov[X2, X3]/Var[X2] = β3Corr[X2, X3]√(Var[X3]/Var[X2]).

61 For unbiasedness, only the first three assumptions are needed: linearity, X not stochastic, and E[ε] = 0.
62 See equation 3.6 of Econometric Models and Economic Forecasts.
63 We do not require normal errors for the Gauss-Markov Theorem to hold.
64 See pages 147 and 159 of Pindyck and Rubinfeld.
65 See Section 7.3.1 of Pindyck and Rubinfeld, not on the Syllabus.


Thus, when a relevant variable is (unknowingly) omitted from the model, then the estimator of the slope will be biased, unless the omitted variable is uncorrelated with the included variables.66

Correlation of X2 and X3    Sign of β3    Bias of β̂2
Positive                    Positive      Upwards
Positive                    Negative      Downwards
Zero                        Any           None
Negative                    Positive      Downwards
Negative                    Negative      Upwards

We note that the above bias does not depend on the sample size. Thus as N → ∞, the probability of large absolute error does not go to zero. β̂2 is not a consistent estimator of β2.

Thus, an example of an inconsistent estimator is the regression estimator of the slope, when a relevant variable is (unknowingly) omitted from the model.67

As discussed in a subsequent section, the omission of relevant variables from a linear regression model of time series, often leads to positive serial correlation of errors.In that case, the estimators of the slopes are no longer efficient.
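A short simulation illustrates the result. In the sketch below (hypothetical model: Y = 1 + 2X2 + 3X3 + ε, with X3 positively correlated with X2), regressing Y on X2 alone gives a slope estimate centered near 2 + 3Cov[X2, X3]/Var[X2] = 3.5 rather than 2, and the bias is just as large for the larger sample size.

import numpy as np

rng = np.random.default_rng(seed=5)
trials = 5_000

for n in (50, 500):
    slopes = np.empty(trials)
    for t in range(trials):
        x2 = rng.normal(0.0, 1.0, n)
        x3 = 0.5 * x2 + rng.normal(0.0, 1.0, n)          # X3 correlated with X2
        y = 1.0 + 2.0 * x2 + 3.0 * x3 + rng.normal(0.0, 1.0, n)
        d2, dy = x2 - x2.mean(), y - y.mean()
        slopes[t] = (d2 * dy).sum() / (d2 * d2).sum()    # regress Y on X2 only
    print(n, slopes.mean())   # near 3.5 for both n: biased and not consistent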

When Irrelevant Variables Are Included in Linear Regression Models:*68

If one includes an extra variable or variables in a linear regression model that do not help explain the dependent variable, then the least squares estimators of the coefficients are:
1. Still unbiased.
2. Still consistent.
3. No longer efficient.

Specifically, if irrelevant variables are included, the variance of the estimated slope of a relevant variable will increase.

Specification Error:*69

Specification Error or Modeling Error occurs when one (inadvertently) uses the wrong model for a real world situation. specification error ⇔ (inadvertently) used inappropriate model or assumptions.

We have discussed two examples, either using an irrelevant variable or omitting relevant variables. Another example of specification error is if one assumes a linear model, when the real world relationship is not linear.

66 This bias is also present when more than one variable is omitted.
67 This problem does not occur if the omitted variable is uncorrelated with the included variables.
68 See Section 7.3.2 of Pindyck and Rubinfeld, not on the Syllabus.
69 See Section 7.3 of Pindyck and Rubinfeld, not on the Syllabus.


Other Types of Errors:*70

Even if one has specified the correct model there are other types of errors.

Due to random fluctuation in the data used to fit the model, which depends on the size of the sample, there is parameter error. The fitted parameters vary around their actual values.

If we then use the fitted model for forecasting, due to random fluctuations the observed value will vary around its mean. This is usually referred to as process error.

As will be discussed in a subsequent section, both parameter error and process error contribute to forecasting error.

Variances of Estimated Parameters:

We will subsequently discuss how to estimate variances of estimated parameters such as ^β.

As will be discussed, variances such as Var[^β] are used to get standard errors used in t-tests.

Therefore, it is important to be able to get good estimates of the variances of estimated parameters.

Under the assumptions of the classical linear regression model, the usual estimators of these variances are unbiased, consistent, and efficient. However, when heteroscedasticity is present, this is no longer true. Heteroscedasticity-consistent estimators (HCE), to be discussed subsequently, provide unbiased, and consistent estimators of variances of estimated parameters, when heteroscedasticity is present.

Heteroscedasticity-consistent estimators are not efficient. Efficient estimators of variances of estimated parameters are obtained via weighted regression, to be discussed subsequently.

Heteroscedasticity-consistent estimators are consistent in the presence of heteroscedasticity of unknown form. In order to apply weighted regression to correct for heteroscedasticity, one must know or estimate how the variances vary across the observations.

70 Not on the syllabus. Different ways to categorize errors are used by various authors.


Loss Functions:*

Assume we observe values of: 12, 3, 38, 5, 8. If one wanted to estimate the next value, one might take either the empirical mean of 13.2 or the empirical median of 8. Which of these two estimates is “better” depends on which criterion one uses to decide the question. As we’ll see, either of these estimates or some other estimate might be considered “best”.

Which estimator of a quantity of interest is the “best” depends on what criterion is used. Commonly one wants to minimize the expected value of the “error”. Depending on the error or loss function one wishes to minimize, one gets different estimators. Let xi be the observations of the quantity we wish to estimate, and let α be the estimate of that quantity of interest which minimizes a given loss function.71

If the loss function is Squared Deviations, then the best estimator is the Mean.

∂(Σ(α - xi)²)/∂α = 2Σ(α - xi), which equals 0 when α = Σxi/n = mean.

The mean of the conditional distribution minimizes the expected squared error.

If the loss function is Absolute Deviations, then the best estimator is the Median.

∂(Σ|α - xi|)/∂α = Σ sgn(α - xi), where sgn(y) equals 1 when y > 0, equals -1 when y < 0, and equals 0 when y = 0. This partial derivative of the absolute deviations has an expected value equal to zero when there is an equal chance that α > xi or α < xi. This occurs when α is the median.

The median of the conditional distribution minimizes the expected absolute error.
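For the five observed values above, a quick numerical check (a sketch; the grid of candidate estimates is arbitrary) confirms both claims: the sum of squared deviations is smallest at the mean, 13.2, and the sum of absolute deviations is smallest at the median, 8.

import numpy as np

x = np.array([12.0, 3.0, 38.0, 5.0, 8.0])
candidates = np.linspace(0.0, 40.0, 4001)   # candidate estimates, step 0.01

sq_loss = ((candidates[:, None] - x) ** 2).sum(axis=1)
abs_loss = np.abs(candidates[:, None] - x).sum(axis=1)

print(candidates[sq_loss.argmin()], x.mean())        # both 13.2
print(candidates[abs_loss.argmin()], np.median(x))   # both 8.0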

Note that the squared error loss function (dashed line) counts extreme errors more heavily than does the absolute error function (solid line.)

[Figure: the squared error loss function (dashed) and the absolute error loss function (solid), plotted for errors between -2 and 2.]

71 Note that multiplying a loss function by a constant does not change where that function is minimized. Thus one gets the same “best” estimator.


Problems:

9. 1 (1 point) For a linear regression model, which of the following statements about estimators of the slope parameters are true?(A) Heteroscedasticity can produce biased estimators.(B) Serial correlation can produce biased estimators.(C) Heteroscedasticity can produce inconsistent estimators.(D) Serial correlation can produce inconsistent estimators.(E) None of A, B, C, or D.

9.2 (3 points) X has a 70% chance of being 0 and a 30% chance of being 10.What is the variance of X?For a sample of size two, compute the expected value of the sample variance by listing all possible samples of size two.

9.3 (1 point) A variable is omitted from a linear regression model. The omitted variable is correlated with the included variables. Which of the following statements about estimators of the slope parameters are true?1. They are unbiased.2. They are consistent.3. They are efficient.A. None of 1, 2, or 3 B. 1, 2 C. 1, 3 D. 2, 3 E. None of A, B, C, or D

9.4 (3 points) You are given two estimators, α and β, of the same unknown quantity.
(i) Bias of α is -1. Variance of α is 5.
(ii) Bias of β is 2. Variance of β is 3.
(iii) The correlation of α and β is 0.6.
Estimator γ is a weighted average of the two estimators α and β, such that: γ = wα + (1 - w)β.
Determine the value of w that minimizes the mean squared error of γ.
A. Less than 0.4
B. At least 0.4, but less than 0.5
C. At least 0.5, but less than 0.6
D. At least 0.6, but less than 0.7
E. 0.7 or more

9.5 (1 point) A variable that does not help explain the dependent variable is included in a linear regression model. Which of the following statements about estimators of the slope parameters are true?1. They are unbiased.2. They are consistent.3. They are efficient.A. None of 1, 2, or 3 B. 1, 2 C. 1, 3 D. 2, 3 E. None of A, B, C, or D


9.6 (2 points) You are given:
x            0     10    100
Pr[X = x]    0.6   0.3   0.1
For a sample of size n, X̄ = ΣXi/n, and the population variance is estimated by Σ(Xi - X̄)²/n. When n = 6, calculate the bias of this estimator of the variance.
A. Less than -100
B. At least -100, but less than -50
C. At least -50, but less than 50
D. At least 50, but less than 100
E. At least 100

* 9.7 (2 points) One is fitting a size of loss distribution to some data.Ignoring any effects of inflation, give examples in this context of specification error (modeling error), parameter error, and process error.

9.8 (3 points) Y = 10 + 3X2 + 2X3 + ε, where each εi has mean of zero.
Observations are made for the following pairs of X2 and X3: (0, 0), (1, 1), (2, 3).
A linear regression Y = α + βX2 + ε is fit to the observations, with X3 omitted from the model.
Determine the expected values of α̂ and β̂.

9.9 (2, 5/83, Q.33) (1.5 points) Let X1, X2, . . . , Xn be a random sample of size n ≥ 2 from a Poisson distribution with mean λ. Consider the following three statistics as estimators of λ.
I. X̄ = ΣXi/n
II. Σ(Xi - X̄)²/(n - 1)
III. 2X1 - X2
Which of these statistics are unbiased?
A. I only B. II only C. III only D. I, II, and III
E. The correct answer is not given by A, B, C, or D

9.10 (4B, 5/92, Q.2) (1 point) Which of the following are true?
1. The expected value of an unbiased estimator of a parameter is equal to the true value of the parameter.
2. If an estimator is asymptotically unbiased, the probability that an estimate based on n observations differs from the true parameter by more than some fixed amount converges to zero as n grows large.
3. A consistent estimator is one with a minimal variance.
A. 1 only B. 3 only C. 1 and 2 only D. 1, 2, and 3
E. The correct answer is not given by A, B, C, or D


9.11 (4B, 11/92, Q.8) (1 point) You are given the following information:
X is a random variable whose distribution function has parameter α = 2.00.
Based on n random observations of X you have determined:
• E[α1] = 2.05 where α1 is an estimator of α having variance = 1.025.
• E[α2] = 2.05 where α2 is an estimator of α having variance = 1.050.
• As n increases to ∞, P[|α1 - α| > ε] approaches 0 for any ε > 0.
Which of the following are true?
1. α1 is an unbiased estimator of α.
2. α2 has a smaller Mean Squared Error than α1.
3. α1 is a consistent estimator of α.
A. 1 only B. 2 only C. 3 only D. 1, 3 only E. 2, 3 only

9.12 (4B, 11/93, Q.13) (3 points) You are given the following:• Two instruments are available for measuring a particular (non-zero) distance.• X is the random variable representing the measurement using the first instrument and Y is the random variable representing the measurement using the second instrument.• X and Y are independent.• E[X] = 0.8m; E[Y] = m; Var[X] = m2; and Var[Y] = 1.5m2 where m is the true distance.Consider the class of estimators of m which are of the form Z = αX + βY.Within this class of estimators of m, determine the value of α that makes Z an unbiased estimator with smallest mean squared error.A. Less than 0.45 B. At least 0.45, but less than 0.50C. At least 0.50, but less than 0.55D. At least 0.55, but less than 0.60E. At least 0.60

9.13 (4B, 5/95, Q.27) (2 points) Two different estimators, ψ and φ, are available for estimating the parameter, β, of a given loss distribution. To test their performance, you have conducted 75 simulated trials of each estimator, using β = 2, with the following results:
Σψi = 165, Σψi² = 375, Σφi = 147, Σφi² = 312, where each sum runs from i = 1 to 75.
Let MSE(ψ) = the mean squared error of estimator ψ.
Let MSE(φ) = the mean squared error of estimator φ.
In this simulation, what is MSE(ψ)/MSE(φ)?
A. Less than 0.50
B. At least 0.50, but less than 0.65
C. At least 0.65, but less than 0.80
D. At least 0.80, but less than 0.95
E. At least 0.95, but less than 1.00


9.14 (4B, 5/96, Q.12) (1 point) Which of the following must be true of a consistent estimator?
1. It is unbiased.
2. For any small quantity ε, the probability that the absolute value of the deviation of the estimator from the true parameter value is less than ε tends to 1 as the number of observations tends to infinity.
3. It has minimal variance.
A. 1 B. 2 C. 3 D. 2, 3 E. 1, 2, 3

9.15 (4B, 11/96, Q.21) (2 points) You are given the following:
• The expectation of a given estimator is 0.50.
• The variance of this estimator is 1.00.
• The bias of this estimator is 0.50.
Determine the mean square error of this estimator.
A. 0.75 B. 1.00 C. 1.25 D. 1.50 E. 1.75

9.16 (4B, 11/97 Q.13) (2 points) You are given the following:
• The random variable X has the density function f(x) = e^(-x), 0 < x < ∞.
• A loss function is given by |X - k|, where k is a constant.
Determine the value of k that will minimize the expected loss.
A. ln 0.5 B. 0 C. ln 2 D. 1 E. 2

9.17 (4, 11/01, Q.35) (2.5 points) You observe N independent observations from a process whose true model is: Yi = α + βXi + εi.
You are given:
(i) Zi = Xi², for i = 1, 2, ..., N.
(ii) b∗ = Σ(Zi - Z̄)(Yi - Ȳ)/Σ(Zi - Z̄)(Xi - X̄).
Which of the following is true?
(A) b∗ is a nonlinear estimator of β.
(B) b∗ is a heteroscedasticity-consistent estimator (HCE) of β.
(C) b∗ is a linear biased estimator of β.
(D) b∗ is a linear unbiased estimator of β, but not the best linear unbiased estimator (BLUE) of β.
(E) b∗ is the best linear unbiased estimator (BLUE) of β.


9.18 (CAS3, 5/05, Q.21) (2.5 points) An actuary obtains two independent, unbiased estimates, Y1 and Y2, for a certain parameter. The variance of Y1 is four times that of Y2. A new unbiased estimator of the form k1Y1 + k2Y2 is to be constructed. What value of k1 minimizes the variance of the new estimate? A. Less than 0.18 B. At least 0.18, but less than 0.23 C. At least 0.23, but less than 0.28 D. At least 0.28, but less than 0.33 E. 0.33 or more

9.19 (4, 5/05, Q.16) (2.9 points) For the random variable X, you are given:
(i) E[X] = θ, θ > 0.
(ii) Var(X) = θ²/25.
(iii) θ̂ = {k/(k+1)}X, k > 0.
(iv) MSE(θ̂) = 2[bias(θ̂)]².
Determine k.
(A) 0.2 (B) 0.5 (C) 2 (D) 5 (E) 25

9.20 (CAS3, 5/06, Q.3) (2.5 points) Mrs. Actuarial Gardner has used a global positioning system to lay out a perfect 20-meter by 20-meter gardening plot in her back yard. Her husband, Mr. Actuarial Gardner, decides to estimate the area of the plot. He paces off a single side of the plot and records his estimate of the length. He repeats this experiment an additional 4 times along the same side. Each trial is independent and follows a Normal distribution with mean 20 meters and a standard deviation of 2 meters. He then averages his results and squares that number to estimate the total area of the plot.Which of the following is a true statement regarding Mr.Gardener’s method of estimating the area?A. On average, it will underestimate the true area by at least 1 square meter.B. On average, it will underestimate the true area by less than 1 square meter.C. On average, it is an unbiased method.D. On average, it will overestimate the true area by less than 1 square meter.E. On average, it will overestimate the true area by at least 1 square meter.


Mahler’s Guide to

Regression

Sections 10-14:

10 Variances and Covariances
11 t-Distribution
12 t-test
13 Confidence Intervals for Estimated Parameters

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCASCopyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-C

New England Actuarial Seminars Howard MahlerPOB 315 [email protected], MA, 02067www.neas-seminars.com


Section 10, Variances and Covariances

As with any model, it is useful to estimate the variances and covariances of the estimated parameters of the linear regression model.

Estimating the Variance of the Regression:

One assumption of the Classical Linear Regression Model is that Var[εi] = σ² for all i. In order to estimate the variances and covariances of the estimated parameters of the regression model, the key step is to estimate σ², the variance of the regression.
An unbiased estimator of σ² is the residual variance, s²:

s² = Σε̂i²/(N - k) = ESS/(N - k) = estimated variance of the regression.

In order to estimate the variance of the regression, we divide the error sum of squares by its number of degrees of freedom: N - k = number of data points minus number of parameters estimated (including the intercept).72

For the two-variable model: s² = Σε̂i²/(N - 2) = ESS/(N - 2).

Exercise: Estimate the variance of the regression of heights discussed previously.[Solution: As determined previously for this example, ESS = 8.741. There are 8 fathers and 8 sons. Therefore, s2 = ESS/(N - 2) = 8.741/(8 - 2) = 1.457.]

s is called the standard error of the regression.

72 This is analogous to the calculation of a sample variance, which has denominator N-1 rather than N.


An Example Showing that s2 is an Unbiased Estimator of σ2:73 *

Take a particular example of a two-variable regression. Let X1 = 1, X2 = 2, and X3 = 3.

Let α = 4 and β = 5.⁷⁴ Thus E[Y1] = 9, E[Y2] = 14, E[Y3] = 19.

Let each error term have a distribution of 50% chance of -1 and 50% chance of +1.⁷⁵ σ² = 1. As usual, the error terms are independent.

There are then 8 equally likely sets of values for Y: (8, 13, 18), (8, 13, 20), (8, 15, 18), (8, 15, 20), (10, 13, 18), (10, 13, 20), (10, 15, 18), (10, 15, 20).

For each of these sets of values of Y one can fit a regression and calculate ESS. In each case, in deviations form, x = (-1, 0, 1) and Σxi² = 2.

In deviations form, TSS = Σyi², RSS = (Σxiyi)²/Σxi², and ESS = Σyi² - (Σxiyi)²/Σxi².

Y            y                     Σyi²    Σxiyi   ESS
8, 13, 18    -5, 0, 5              50      10      0
8, 13, 20    -17/3, -2/3, 19/3     218/3   12      2/3
8, 15, 18    -17/3, 4/3, 13/3      158/3   10      8/3
8, 15, 20    -19/3, 2/3, 17/3      218/3   12      2/3
10, 13, 18   -11/3, -2/3, 13/3     98/3    8       2/3
10, 13, 20   -13/3, -4/3, 17/3     158/3   10      8/3
10, 15, 18   -13/3, 2/3, 11/3      98/3    8       2/3
10, 15, 20   -5, 0, 5              50      10      0

E[ESS] = (0 + 2/3 + 8/3 + 2/3 + 2/3 + 8/3 + 2/3 + 0)/8 = 1.
E[s²] = E[ESS/(N - 2)] = 1/(3 - 2) = 1 = σ².

73 The fact that E[ESS] = (N - k)σ² will be proved in a subsequent section.
74 Of course we do not usually know the actual slope and intercept.
75 While this has mean of zero, it is not Normally distributed. The result does not depend on the distributional form of ε.
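The enumeration above can be reproduced directly; the sketch below (no assumptions beyond the example itself) loops over the eight equally likely error combinations, fits the regression to each, and averages the resulting ESS.

import numpy as np
from itertools import product

X = np.array([1.0, 2.0, 3.0])
alpha, beta = 4.0, 5.0

ess_values = []
for errs in product([-1.0, 1.0], repeat=3):   # the 8 equally likely outcomes
    Y = alpha + beta * X + np.array(errs)
    x, y = X - X.mean(), Y - Y.mean()
    b = (x * y).sum() / (x * x).sum()
    a = Y.mean() - b * X.mean()
    resid = Y - (a + b * X)
    ess_values.append((resid ** 2).sum())

print(np.round(ess_values, 4))   # 0, 2/3, 8/3, 2/3, 2/3, 8/3, 2/3, 0 in some order
print(np.mean(ess_values))       # 1.0, so E[s^2] = E[ESS/(N - 2)] = 1 = sigma^2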


Straight Line with No Intercept:

In the case of fitting a straight line with no intercept, β̂ = ΣXiYi/ΣXi².

Var[β̂] = Var[ΣXiYi/ΣXi²] = ΣXi²Var[Yi]/(ΣXi²)² = σ²ΣXi²/(ΣXi²)² = σ²/ΣXi².

Thus one estimates the variance of the slope as:

Var[β̂] = s²/ΣXi² = {ESS/(N - 1)}/ΣXi².

Exercise: Estimate the variance of the slope in the regression of heights with no intercept.
[Solution: As determined previously for this example with no intercept, ESS = 32.3.
s² = ESS/(N - k) = 32.3/(8 - 1) = 4.614. ΣXi² = 53² + ... + 66² = 28,228.
Var[β̂] = s²/ΣXi² = 4.614/28228 = 0.000163. We had β̂ = 1.03.]

For a 95% confidence interval, the critical value on the t-table with 8 - 1 = 7 d.f. is 2.365.
β̂ = 1.03 ± 2.365√0.000163 = 1.03 ± 0.03.


Two Variable Regression, Variance of the Estimated Slope:

For any finite sample of data, the estimated slope will vary around the true slope of the model due to random fluctuations in values of the dependent variable.

Under the assumptions of the Classical Linear Regression Model76 discussed previously, the variance of the least squares estimator of the slope of the two-variable regression model is:
(Variance of errors)/(Variance of X)/(number of data points):77

Var[β̂] = (σ²/Var[X])/N = σ²/Σxi².78

More random noise ⇔ larger σ2 ⇔ worse estimate of slope.

Independent Variable over wider domain ⇔ larger Var[X] ⇔ better estimate of slope.

More observations ⇔ larger N ⇔ better estimate of slope.

As is usual, the variance of the estimate goes down as the inverse of the number of observations.

The more noise, as quantified by Var[ε] = σ2, the worse the estimate.79

The more spread apart the places at which the observations take place, as quantified by Var[X], the better the estimate of the slope.80 If the independent variable were time, then we expect to get a better estimate of the slope of the line if we observe at times 0 and 100, rather than at times 49 and 51. Observations at times 0 and 100 would provide more useful information about the slope than would observations at times 49 and 51.
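This can be illustrated with a short simulation; the sketch below (hypothetical values: σ = 1, a true slope of 0.5, and two observations per regression) compares observing at times 0 and 100 with observing at times 49 and 51.

import numpy as np

rng = np.random.default_rng(seed=2)
alpha, beta, sigma, trials = 1.0, 0.5, 1.0, 100_000

for X in (np.array([0.0, 100.0]), np.array([49.0, 51.0])):
    x = X - X.mean()
    eps = rng.normal(0.0, sigma, size=(trials, 2))
    Y = alpha + beta * X + eps
    y = Y - Y.mean(axis=1, keepdims=True)
    slopes = (x * y).sum(axis=1) / (x * x).sum()
    print(X, slopes.var(), sigma**2 / (x * x).sum())
# Observing at 0 and 100: Var[beta_hat] = 1/5000 = 0.0002.
# Observing at 49 and 51: Var[beta_hat] = 1/2 = 0.5, a far less precise estimate.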

Two Variable Regression, Variance of the Estimated Intercept:

There is a somewhat different formula for the variance of the estimated intercept:81

Var[α̂] = σ²ΣXi²/(NΣxi²) = (ΣXi²/N)(σ²/Σxi²) = E[X²]σ²/Σxi².

76 The assumption that the errors are Normally distributed is not used.
77 See equation 3.7 of Econometric Models and Economic Forecasts.
78 Var[X] = Σxi²/N.
79 * Thus σ² acts somewhat analogously to the Expected Value of the Process Variance in credibility.
80 * Thus Var[X] acts somewhat analogously to the Variance of the Hypothetical Means in credibility.
81 See equation 3.9 of Econometric Models and Economic Forecasts.


Covariances and Correlations:

The Covariance of two variables X and Y is defined by: Cov[X, Y] = E[XY] - E[X]E[Y]. Since Cov[X, X] = E[X²] - E[X]E[X] = Var[X], the covariance is a generalization of the variance.

Covariances have the following useful properties: Cov[X, aY] = aCov[X, Y]. Cov[X, Y] = Cov[Y, X]. Cov[X, Y + Z] = Cov[X, Y] + Cov[X, Z].

The Correlation of two random variables is defined in terms of their covariances:
Corr[X, Y] = Cov[X, Y]/√(Var[X]Var[Y]). The correlation is always between -1 and +1.
Corr[X, X] = 1. Corr[X, -X] = -1. Corr[X, Y] = Corr[Y, X].

Two Variable Regression, Covariance of the Estimated Slope and Intercept:

The covariance of the estimated slope and intercept is:82

Cov[α̂, β̂] = -σ²X̄/Σxi².

Estimating the Variances and Covariances of the Two Variable Regression:83

We use s2, the estimated variance of the regression, in order to estimate the variances of the estimated slope and intercept as well as their covariance.84

Var[α̂] = s²ΣXi²/(NΣxi²) = (ΣXi²/N)(s²/Σxi²) = E[X²]s²/Σxi².

Var[β̂] = s²/Σxi².

Cov[α̂, β̂] = -s²X̄/Σxi².

82 See equation 3.10 of Econometric Models and Economic Forecasts.
83 The multiple-variable case will be covered in a subsequent section.
84 See pages 63-64 of Pindyck and Rubinfeld.


Exercise: Estimate these variances and covariance for the regression of heights discussed previously.
[Solution: As determined previously for this example, s² = 1.457. N = 8. X̄ = 59.25.
ΣXi² = 28,228. Σxi² = 143.5. Therefore,
Var[α̂] = s²ΣXi²/(NΣxi²) = (1.457)(28228)/{(8)(143.5)} = 35.83.
Var[β̂] = s²/Σxi² = 1.457/143.5 = 0.01015.
Cov[α̂, β̂] = -s²X̄/Σxi² = -(1.457)(59.25)/143.5 = -0.6016.]

Thus for this heights example, the variance-covariance matrix is:

[ Var[α̂]       Cov[α̂, β̂] ]   [ 35.83     -0.6016  ]
[ Cov[α̂, β̂]   Var[β̂]     ] = [ -0.6016   0.01015  ]

In general for the two-variable model, the variance-covariance matrix is:

[ Var[α̂]       Cov[α̂, β̂] ]   [ s²ΣXi²/(NΣxi²)   -s²X̄/Σxi² ]                [ ΣXi²/N   -X̄ ]
[ Cov[α̂, β̂]   Var[β̂]     ] = [ -s²X̄/Σxi²        s²/Σxi²    ] = (s²/Σxi²) [ -X̄        1  ]

The above formulas for Var[α̂], Var[β̂], and Cov[α̂, β̂] for the two-variable model are special cases of those for multiple regression discussed subsequently. Some may find it easier to memorize the general matrix form of the variance-covariance matrix, Var[β̂] = s²(X'X)⁻¹, discussed subsequently for the multiple-variable case.
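Both routes can be checked against each other on a small data set; in the sketch below the X and Y values are made up purely for illustration, and the scalar formulas above are compared with the matrix form s²(X'X)⁻¹.

import numpy as np

# Hypothetical data, for illustration only.
X = np.array([2.0, 5.0, 8.0, 9.0, 12.0])
Y = np.array([10.0, 6.0, 11.0, 13.0, 15.0])
N = len(X)

# Fit the two-variable regression.
x, y = X - X.mean(), Y - Y.mean()
b = (x * y).sum() / (x * x).sum()
a = Y.mean() - b * X.mean()
resid = Y - (a + b * X)
s2 = (resid ** 2).sum() / (N - 2)   # estimated variance of the regression

# Scalar formulas for the two-variable model.
var_a = s2 * (X ** 2).sum() / (N * (x ** 2).sum())
var_b = s2 / (x ** 2).sum()
cov_ab = -s2 * X.mean() / (x ** 2).sum()

# Matrix form: s^2 (X'X)^{-1}, with a column of ones for the intercept.
design = np.column_stack([np.ones(N), X])
vc = s2 * np.linalg.inv(design.T @ design)

print(var_a, var_b, cov_ab)
print(vc)   # the 2x2 matrix contains the same three values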

S Notation:*

One can also put the variance of the estimated slope in the S.. notation discussed previously.

Var[β̂] = s²/Σxi² = {ESS/(N - 2)}/SXX = (TSS - RSS)/{(N - 2)SXX} = (SYY - SXY²/SXX)/{(N - 2)SXX} =
{SYY/SXX - (SXY/SXX)²}/(N - 2).

For the regression of heights example, SXX = 143.5, SYY = 64.875, and SXY = 89.75.

Var[β̂] = {64.875/143.5 - (89.75/143.5)²}/(8 - 2) = 0.01015.


Correlation of Estimated Slope and Intercept:

Corr[α̂, β̂] ≡ Cov[α̂, β̂]/√(Var[α̂]Var[β̂]) = -X̄/√(E[X²]).

Thus if the mean of X is positive, then the estimates of α and β are negatively correlated. Therefore, if the mean of X is positive, then if one of the coefficients is overestimated, the other is more likely to be underestimated, and vice-versa.

If instead, the mean of X is negative, then the estimates of α and β are positively correlated. Therefore, if the mean of X is negative, then if one of the coefficients is overestimated, the other is more likely to be overestimated.

Exercise: Estimate Corr[α̂, β̂] for the regression of heights discussed previously.
[Solution: Corr[α̂, β̂] = Cov[α̂, β̂]/√(Var[α̂]Var[β̂]) = -0.6016/√{(35.83)(0.01015)} = -0.9976.]

For the heights example, the slope and intercept are almost perfectly negatively correlated.

Distribution of the Fitted Coefficients:

β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²}, a linear combination of the Yi, independent Normals.

Therefore, β̂ is Normally Distributed, with mean β and variance σ²/Σxi².85

α̂ = {ΣYiΣXi² - ΣXiΣXiYi}/{NΣXi² - (ΣXi)²}, a linear combination of the Yi, independent Normals.

Therefore, α̂ is Normally Distributed, with mean α and variance E[X²]σ²/Σxi².86

α̂ and β̂ are jointly Bivariate Normally Distributed, with correlation -X̄/√(E[X²]).

Coefficient of Variation:*

The coefficient of variation can be written in terms of the mean and second moment:
E[X²]/E[X]² = 1 + CVX².87 Therefore, Corr[α̂, β̂] = -1/√(1 + CVX²). The smaller the coefficient of variation of the independent variable X, the closer this correlation is to -1. The larger the CV of X, the closer this correlation is to 0.

85 Recall that β̂ is an unbiased estimator of β. The variance has been given previously.
86 Recall that α̂ is an unbiased estimator of α. The variance has been given previously.
87 Coefficient of Variation ≡ Standard Deviation / Mean.


Exercise: The Coefficient of Variation of X is 3. What is Corr[α̂, β̂]?
[Solution: Corr[α̂, β̂] = -1/√(1 + CVX²) = -1/√10 = -0.316.]

Exercise: For the heights example, what is the coefficient of variation of the fathers' heights?
[Solution: X̄ = 59.25. Σxi² = 143.5. Var[X] = 143.5/8 = 17.9375. CVX = (√17.9375)/59.25 = 0.0715.]

For the heights example, -1/√(1 + CVX²) = -0.9975, matching the previously estimated correlation of α̂ and β̂, subject to rounding.

Standard Errors of the Estimated Parameters:

The square root of the variance of an estimated parameter is its standard error.

sβ̂ = √Var[β̂] = s/√Σxi² = standard error of the estimate of β.

sα̂ = √Var[α̂] = standard error of the estimate of α.

Exercise: For the heights regression, what are the standard errors of the slope and intercept?
[Solution: sβ̂ = √Var[β̂] = √0.01015 = 0.101. sα̂ = √Var[α̂] = √35.83 = 5.99.]

The larger sβ̂, the more dispersed β̂ is around its mean.
The larger sα̂, the more dispersed α̂ is around its mean.
The larger s, the more dispersed the error terms, ε̂i.

The t-statistic for the slope, to be discussed subsequently, is:

t = β̂/sβ̂ = t-statistic for testing the hypothesis β = 0.

For the heights regression, β̂ = 0.6254, sβ̂ = 0.101, and t = 0.6254/0.101 = 6.2.

For the heights example, a typical output from a regression program would look like this:88

Parameter Table Estimate SE T-Stat p-Value1 24.0679 5.98553 4.02102 0.00695079x 0.625436 0.100765 6.2069 0.000806761

88 This is from Mathematica. The p-values of the t-statistics will be discussed subsequently.


Problems:

10.1 (8 points) Fit a two-variable regression to the following data:
X    2    5    8    9
Y    10   6    11   13
Determine α̂, β̂, R², R̄², s², sα̂, sβ̂, Cov[α̂, β̂], and Corr[α̂, β̂].

Use the following information for the next 3 questions:
A two-variable regression has been fit to 100 data points. X̄ = 122. ΣXi² = 5,158,000. ESS = 533,000.

10.2 (2 points) Calculate sβ̂, the standard error of β̂.
A. less than .02
B. at least .02 but less than .03
C. at least .03 but less than .04
D. at least .04 but less than .05
E. at least .05

10.3 (2 points) Calculate sα̂, the standard error of α̂.
A. less than 5
B. at least 5 but less than 6
C. at least 6 but less than 7
D. at least 7 but less than 8
E. at least 8

10.4 (2 points) Calculate Cov[α̂, β̂].
A. less than -.4
B. at least -.4 but less than -.3
C. at least -.3 but less than -.2
D. at least -.2 but less than -.1
E. at least -.1

10.5 (2 points) For a two-variable regression based on five observations, you are given:
Y = 0, 3, 6, 10, 8 and s² = 2.554. Determine R².
(A) 0.84 (B) 0.86 (C) 0.88 (D) 0.90 (E) 0.92

10.6 (2 points) A linear regression model, Yi = a + bXi + εi, has been fit to 15 observations.
Let yi = Yi - Ȳ. Σyi² = 169. The estimated variance of the regression, s² = 4.
Determine R̄².
(A) 0.65 (B) 0.66 (C) 0.67 (D) 0.68 (E) 0.69


10.7 (2 points) For a two-variable regression based on 30 observations, σ² = 255.
Determine the minimum possible value for the variance of the estimated intercept, Var[α̂].
(A) 8.0 (B) 8.5 (C) 9.0 (D) 9.5 (E) 10.0

10.8 (2 points) A linear regression model with 5 explanatory variables (4 independent variables plus the intercept) has been fit to 15 observations. Σε̂i² = 1036.
If we had fit instead twice as many observations, which are similar to these observations except due to random fluctuation, what is the expected value of Σε̂i²?
A. less than 2000
B. at least 2000, but less than 2200
C. at least 2200, but less than 2400
D. at least 2400, but less than 2600
E. at least 2600

Use the following information for the next 8 questions:
A linear regression, X = α + βY, is fit to 30 observations, (Xi, Yi).
ΣXi = 44, ΣXi² = 81, ΣYi = 106, ΣYi² = 410, ΣXiYi = 173.

10.9 (2 points) Determine β̂.

10.10 (2 points) Determine α̂.

10.11 (3 points) Determine R².

10.12 (2 points) Determine s².

10.13 (2 points) Determine sβ̂.

10.14 (2 points) Determine sα̂.

10.15 (2 points) Determine Cov[α̂, β̂].

10.16 (2 points) Determine Corr[α̂, β̂].

10.17 (2 points) For a two-variable regression based on five observations, you are given:
Ŷ = 16.1, 15.4, 13.3, 12.6, 10.5 and s² = 8.356. Determine R².
(A) 0.45 (B) 0.50 (C) 0.55 (D) 0.60 (E) 0.65


10.18 (1 point) Bob and Ray each fit a linear regression X = α + βY, where X = age of male, and Y = weight. Bob's data set consists of 60 boys equally split between ages 11, 12, and 13.
Ray's data set consists of 60 boys equally split between ages 10, 11, 12, 13, and 14.
Which regression has a larger mean squared error in its estimate of the slope?

10.19 (4, 5/01, Q.40) (2.5 points) For a two-variable regression based on seven observations, you are given:
(i) Σ(Xi - X̄)² = 2000.
(ii) Σε̂i² = 967.
Calculate sβ̂, the standard error of β̂.
(A) 0.26 (B) 0.28 (C) 0.31 (D) 0.33 (E) 0.35

10.20 (4, 11/04, Q.35) (2.5 points) Which of the following statements regarding the ordinary least squares fit of the model Y = α + βX + ε is false?
(A) The lower the ratio of the standard error of the regression s to the mean of Y, the more closely the data fit the regression line.
(B) The precision of the slope estimator decreases as the variation of the X's increases.
(C) The residual variance s² is an unbiased as well as consistent estimator of the error variance σ².
(D) If the mean of X is positive, then an overestimate of α is likely to be associated with an underestimate of β.
(E) β̂ is an unbiased estimator of β.

10.21 (VEE-Applied Statistics Exam, 8/05, Q.13) (2.5 points) You fit the model Yi = α + βXi + εi to twenty observations. You are given:
Error sum of squares (ESS) = 2000
ΣXi = -300
ΣXi² = 6000
Determine Cov(α̂, β̂).
(A) 0.7 (B) 0.8 (C) 0.9 (D) 1.0 (E) 1.1

10.22 (2 points) In the previous question, determine Var(α̂) and Var(β̂).


Section 11, t-Distribution

As will be discussed subsequently, the t-Distribution, also called the Student’s t-distribution, can be used to get confidence intervals for fitted regression coefficients and to test the significance of fitted regression coefficients.

The t-distribution depends on one parameter, ν, the number of degrees of freedom. The t-distribution is symmetric around 0, with support from -∞ to ∞.For example here is a graph of the density of a t-distribution for ν = 4.

[Graph: density of the t-distribution with ν = 4.]

The density for ν = 4 is: f(x) = (3/8)(1 + x²/4)^(-2.5), -∞ < x < ∞.

Relation to the Normal Distribution:

The t-distribution is heavier-tailed than the Normal Distribution. For large absolute values of x, the density of the t-distribution is larger than the Standard Normal Distribution.

Here is a graph of a t-distribution for 4 degrees of freedom, compared to that of a Standard Normal Distribution (shown dashed). Note the way that the t-distribution is heavier-tailed than the Normal Distribution; the Normal approaches the x-axis more quickly.

[Graph: densities of the t-distribution with 4 degrees of freedom and of the Standard Normal Distribution (dashed).]


Here is a table of the densities of the t-distribution at 3:

ν:       5        10       15       25       100      500
f(3):    0.0173   0.0114   0.0091   0.0073   0.0051   0.0046

As ν increases, the t-distribution gets lighter-tailed. As ν → ∞, the density of the t-distribution at 3 approaches φ(3) = 0.0044, the density at 3 of the Standard Normal Distribution.
As ν → ∞, the t-Distribution → the Standard Normal Distribution.
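These tabulated densities can be reproduced numerically. Below is a minimal sketch in Python, assuming the scipy library is available (scipy is not part of the Syllabus; this is only a check of the figures above):

    # Compare the density of the t-distribution at x = 3 to the Standard Normal density.
    from scipy.stats import t, norm

    for nu in [5, 10, 15, 25, 100, 500]:
        print(nu, round(t.pdf(3, df=nu), 4))   # 0.0173, 0.0114, ..., 0.0046, as in the table
    print("Normal:", round(norm.pdf(3), 4))    # 0.0044, the limiting value as nu -> infinity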

Using the t-table attached to the exam, for some values of ν, here are the values at which the t-distribution is 95%, so that there is a total of 10% in both tails:

ν:       1       2       5       10      25      120     ∞
value:   6.314   2.920   2.015   1.812   1.708   1.658   1.645

The value shown in the table for ν = ∞ is for the Standard Normal Distribution. Φ(1.645) = .95. For a Standard Normal Distribution, ±1.645 is a 90% confidence interval.

The survival function at 1.645 for the Standard Normal Distribution is: 1 - .95 = 5%.The survival function for a t-distribution at 1.645 is larger than 5%, since it has a heavier tail.For example, for ν = 10, from the t-table S(1.812) = 5%, while using a computer S(1.645) = 6.55% > 5%. Here is a graph of the t-distribution for 10 degrees of freedom:

[Graph: density of the t-distribution with 10 degrees of freedom, with 5% in each tail.]

For the t-distribution, here is a graph of the survival function at 1.645, as a function of ν:

[Graph: S(1.645) for the t-distribution, as a function of ν.]

As ν increases, S(1.645) → 5%; the tail gets lighter and approaches a Normal Distribution.


Summary of Student’s t Distribution:*

Support: -∞ < x < ∞. Parameters: ν = positive integer.
D.f.: F(x) = β[ν/2, 1/2; ν/(ν + x²)]/2 for x ≤ 0.
F(x) = 1 - β[ν/2, 1/2; ν/(ν + x²)]/2 for x ≥ 0.

For ν = 1: F(x) = .5 + ArcTan[x]/π, -∞ < x < ∞.
For ν = 2: F(x) = .5 + x/{2√(2 + x²)}, -∞ < x < ∞.

P.d.f.: f(x) = 1/{√ν β[ν/2, 1/2] (1 + x²/ν)^((ν+1)/2)},
where β[ν/2, 1/2] = Γ(1/2)Γ(ν/2)/Γ((ν+1)/2).
β[ν/2, 1/2] = (ν/2 - 1)! / {((ν-1)/2)((ν-3)/2)...(1/2)}, for ν even.
β[ν/2, 1/2] = π(ν/2 - 1)(ν/2 - 2)...(1/2) / ((ν-1)/2)!, for ν odd.

For ν = 1: f(x) = (1/π)/(1 + x²), -∞ < x < ∞.
For ν = 2: f(x) = 2^(-1.5)(1 + x²/2)^(-1.5) = (2 + x²)^(-1.5), -∞ < x < ∞.

Moments: E[X^n] = ν^(n/2) (n-1)(n-3)...(3)(1) / {(ν-2)(ν-4)...(ν-n)}, for n even, ν > n.
E[X^n] = 0, for n odd.

Mean = 0. Variance = ν/(ν - 2), ν > 2. [89]
Skewness = 0 (symmetric). Kurtosis = 3 + 6/(ν - 4), ν > 4. [90]
Mode = 0. Median = 0.

If U is a Unit Normal variable and χ² follows an independent chi-square distribution with ν degrees of freedom, then U/√(χ²/ν) follows a t-distribution with ν degrees of freedom. [91]

[89] As ν goes to infinity, the variance approaches 1, that of a Standard Normal Distribution. When it exists, the variance is greater than 1.
[90] As ν goes to infinity, the kurtosis approaches 3, that of a Standard Normal Distribution. When it exists, the kurtosis is greater than 3. The t-distribution is heavier-tailed than the Standard Normal Distribution.
[91] A chi-square distribution with ν degrees of freedom is the sum of ν squares of independent Unit Normals. See Sections 2.4.2 and 2.4.3 of Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.


Relation to the Beta Distribution:*

The t-Distribution can be written in terms of the Incomplete Beta Function.

For parameter ν, the density is:
f(x) = {Γ((ν+1)/2) / (Γ(ν/2)√(πν))} (1 + x²/ν)^(-(ν+1)/2) = {1/(√ν β(ν/2, 1/2))} (1 + x²/ν)^(-(ν+1)/2), -∞ < x < ∞.

The Distribution Function is: [92]
F(x) = β[ν/2, 1/2; ν/(ν + x²)]/2 for x ≤ 0, and F(x) = 1 - β[ν/2, 1/2; ν/(ν + x²)]/2 for x ≥ 0.

Exercise: In terms of an Incomplete Beta Function, what is the t-Distribution with 12 degrees of freedom, at -2.179?
[Solution: β[12/2, 1/2; 12/(12 + 2.179²)]/2 = β[6, .5; .7165]/2.]

It turns out that β[6 , .5, .7165] = .050. Therefore, for the t-Distribution with 12 degrees of freedom, the distribution function at -2.179 is .050/2 = 2.5%. Similarly, at 2.179 it is 97.5%. (The t-distribution is symmetric.) Thus there is 5% probability outside at ±2.179. That is why 2.179 appears in the table of the t-distribution for 12 degrees of freedom and a total of 5% probability in the tails, (2.5% in each tail.)
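This Incomplete Beta relation can also be checked numerically; a minimal sketch in Python, assuming scipy is available (scipy's beta.cdf is the regularized Incomplete Beta Function used here):

    from scipy.stats import beta, t

    nu, x = 12, 2.179
    arg = nu / (nu + x**2)            # 12/(12 + 2.179^2) = 0.7165
    print(beta.cdf(arg, 6, 0.5))      # beta[6, .5; .7165], about 0.050
    print(t.cdf(-x, df=nu))           # about 0.025, half of the value above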

[92] See page 96 of Loss Models or Section 26.7 of the Handbook of Mathematical Functions by Abramowitz, et al.


t-Table:

The rows of the t-table attached to the exam are the number of degrees of freedom.For most exam questions, one determines the number of degrees of freedom, and then looks at the appropriate row of the table, ignoring all of the other rows.

The values in each row are the sum of the area in both the righthand and lefthand tails.For example, for 5 degrees of freedom, there is a total of 10% below -2.015 and above +2.015.

[Graph: density of the t-distribution with 5 degrees of freedom, with 5% in each tail beyond ±2.015.]

There is 5% area in the lefthand tail below -2.015. There is also 5% area in the righthand tail above 2.015. In other words, for the t-Distribution with 5 degrees of freedom, the 5th percentile is -2.015, and the 95th percentile is 2.015. There is 90% area between -2.015 and 2.015.
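Entries of the t-table can be reproduced as percentiles; a minimal sketch in Python, assuming scipy is available:

    from scipy.stats import t

    # For 5 degrees of freedom and a total of 10% in both tails, the critical value
    # is the 95th percentile of the t-distribution:
    print(t.ppf(0.95, df=5))    # about 2.015
    print(t.ppf(0.05, df=5))    # about -2.015, by symmetry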

For 5 degrees of freedom, similarly there is a total area of 2% below -3.365 and above +3.365.

[Graph: density of the t-distribution with 5 degrees of freedom, with 1% in each tail beyond ±3.365.]


Percentage Points of the t Distribution


Area in both tails (α)
ν        0.10     0.05     0.02     0.01
1        6.314    12.706   31.821   63.657
2        2.920    4.303    6.965    9.925
3        2.353    3.182    4.541    5.841
4        2.132    2.776    3.747    4.604
5        2.015    2.571    3.365    4.032
6        1.943    2.447    3.143    3.707
7        1.895    2.365    2.998    3.499
8        1.860    2.306    2.896    3.355
9        1.833    2.262    2.821    3.250
10       1.812    2.228    2.764    3.169
11       1.796    2.201    2.718    3.106
12       1.782    2.179    2.681    3.055
13       1.771    2.160    2.650    3.012
14       1.761    2.145    2.624    2.977
15       1.753    2.131    2.602    2.947
16       1.746    2.120    2.583    2.921
17       1.740    2.110    2.567    2.898
18       1.734    2.101    2.552    2.878
19       1.729    2.093    2.539    2.861
20       1.725    2.086    2.528    2.845
21       1.721    2.080    2.518    2.831
22       1.717    2.074    2.508    2.819
23       1.714    2.069    2.500    2.807
24       1.711    2.064    2.492    2.797
25       1.708    2.060    2.485    2.787
26       1.706    2.056    2.479    2.779
27       1.703    2.052    2.473    2.771
28       1.701    2.048    2.467    2.763
29       1.699    2.045    2.462    2.756
30       1.697    2.042    2.457    2.750
40       1.684    2.021    2.423    2.704
60       1.671    2.000    2.390    2.660
120      1.658    1.980    2.358    2.617
∞        1.645    1.960    2.326    2.576


Problems:

11.1 (1 point) For a t-distribution with 16 degrees of freedom, what is the distribution function at 2.583?
(A) .95 (B) .975 (C) .98 (D) .99 (E) .995

11.2 (1 point) For a t-distribution with 6 degrees of freedom, what is Prob[t < -3.5]?
(A) Less than 0.5%
(B) At least 0.5%, but less than 1%
(C) At least 1%, but less than 2.5%
(D) At least 2.5%, but less than 5%
(E) At least 5%

11.3 (1 point) For a t-distribution with 7 degrees of freedom, what is the distribution function at -1.895?
(A) .01 (B) .02 (C) .05 (D) .10 (E) .20

11.4 (1 point) For a t-distribution with 27 degrees of freedom, what is Prob[|t| < 2]?
(A) Less than 90%
(B) At least 90%, but less than 95%
(C) At least 95%, but less than 98%
(D) At least 98%, but less than 99%
(E) At least 99%

11.5 (1 point) For a t-distribution with 7 degrees of freedom, what is the distribution function at -1.895?(A) .01 (B) .02 (C) .05 (D) .10 (E) .20

11.6 (2, 5/83, Q. 42) (1.5 points) Let X1, X2, X3, and X4 be independent random variables having a normal distribution with mean 0 and variance 1. The distribution of (X1 + X4)/√(X2² + X3²) is the same as that of aY where:
A. a = 1 and Y has a t-distribution with 1 degree of freedom
B. a = 1 and Y has a t-distribution with 2 degrees of freedom
C. a = 1/√2 and Y has a t-distribution with 2 degrees of freedom
D. a = √2 and Y has a t-distribution with 2 degrees of freedom
E. a = 2 and Y has a t-distribution with 2 degrees of freedom

11.7 (2, 5/92, Q.26) (1.7 points) Let Z1 and Z2 be independent and identically distributed normal random variables with mean 0 and variance 1. If W = Z1/√(Z2²), then what is the number w0 for which P[W < w0] is closest to .95?
A. 1.64 B. 2.92 C. 3.84 D. 5.99 E. 6.31


Section 12, t-test

The t-distribution can be used to provide confidence intervals for an estimated mean, and to test whether two samples come from Normal Distributions with the same mean. These ideas are preliminary to ideas covered on this exam, which will be discussed in subsequent sections. Some of you may find it helpful to review these preliminary ideas, even though they should not be directly tested on your exam.

Confidence Intervals for an Estimated Mean:

If one wants a confidence interval for an estimated mean and one knows the variance, then one can use the Normal Distribution.

Exercise: Based on 20 observations from a variable with variance 49, the observed mean is 31. What is a 95% confidence interval for the mean?
[Solution: The mean of 20 observations has variance 49/20 = 2.45. For the Standard Normal, Φ(1.960) = .975. Therefore, ±1.960 standard deviations would have 2.5% outside on either tail, and 95% probability inside. Take 31 ± 1.960√2.45 = 31 ± 3.07 = [27.93, 34.07].]

In general with known variance, if we want a confidence interval of probability P, we take X̄ ± y√(σ²/n), where Φ(y) = (1 + P)/2.

If one does not know the variance, the t-distribution is used instead.
The interval estimate is: the sample mean ± t0 √(sample variance)/√N,
where t0 is such that for the Student’s t distribution with N - 1 degrees of freedom,
F(t0) - F(-t0) = desired confidence.

For example, for 10 points, in order to get a 95% confidence interval for the mean one would look up the Student’s t for 9 degrees of freedom. The probability that the absolute value of t is greater than 2.262 is 5%. Therefore for 10 data points, the sample mean ±2.262 standard

deviations / √10 is an approximate 95% confidence interval for the mean.

Exercise: Let 0, 2, 4, 5, 6, 6, 8, 9, 9, 12 be a random sample from a Normal Distribution with unknown mean and variance. What is an approximate 95% confidence interval for the mean?
[Solution: The point estimate of the mean is: (0 + 2 + 4 + 5 + 6 + 6 + 8 + 9 + 9 + 12)/10 = 6.1.
The second moment is: (0 + 2² + 4² + 5² + 6² + 6² + 8² + 9² + 9² + 12²)/10 = 48.7.
Thus the sample variance is: (10/9)(48.7 - 6.1²) = 12.76. The sample standard deviation is: √12.76 = 3.57. For 10 data points, we have 10 - 1 = 9 degrees of freedom. Consulting the t-table, the critical value for 5% and 9 degrees of freedom is 2.262.
Therefore, the sample mean ±2.262 standard deviations/√10 is an approximate 95% confidence interval for the mean. An approximate 95% confidence interval for the mean is:
6.1 ± t0 s/√n = 6.1 ± (2.262)(3.57)/√10 = 6.1 ± 2.55 = (3.55, 8.65).]


In general, if we want a confidence interval for the mean of probability 1 - α, we take X̄ ± t√(S²/n), where S² is the sample variance and t is the critical value for the t-distribution with n - 1 degrees of freedom and α area in both tails. This critical value is in the t-table attached to the exam.

Exercise: Based on 20 observations from a variable, the observed mean is 31 and the sample variance is 49. What is a 95% confidence interval for the mean?
[Solution: The mean of 20 observations has variance: 49/20 = 2.45. For 20 observations we have 20 - 1 = 19 degrees of freedom, the denominator of the sample variance. Consulting the t-table, for 19 degrees of freedom and 5% total area in both tails, the critical value for the t-distribution is 2.093. Take 31 ± 2.093√2.45 = 31 ± 3.28 = [27.72, 34.28].]
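The same interval can be computed directly; a minimal sketch in Python, assuming scipy is available:

    from math import sqrt
    from scipy.stats import t

    n, xbar, s2 = 20, 31, 49                       # sample size, sample mean, sample variance
    t_crit = t.ppf(0.975, df=n - 1)                # 2.093: 19 degrees of freedom, 5% in both tails
    half_width = t_crit * sqrt(s2 / n)             # 2.093 * sqrt(2.45) = 3.28
    print(xbar - half_width, xbar + half_width)    # about (27.72, 34.28)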

We note that when we have an unknown variance, the use of the t-distribution results in a somewhat wider interval than the use of the Normal Distribution would have, if S² = σ².
In addition, in the case of an unknown variance, S² is an estimate of this unknown variance.
In both cases, the confidence interval is approximate.

In the case with known variance, we do not always require that the variable being observed be Normal, but rather that its average can be approximated by a Normal Distribution. Similarly, in the case with unknown variance, in order to employ the t-distribution to get a confidence interval, we do not always require that the variable being observed be Normal, but rather that its average is approximately Normal.

Exercise: Based on 200 observations from a variable, the observed mean is 47 and the sample variance is 112. What is a 95% confidence interval for the mean?
[Solution: The mean of 200 observations has variance: 112/200 = 0.56. For 200 observations we have 200 - 1 = 199 degrees of freedom. Consulting the t-table, for 199 degrees of freedom and 5% total area in both tails, the critical value for the t-distribution is approximately 1.960. Take 47 ± 1.960√0.56 = 47 ± 1.47 = [45.53, 48.47].
Comment: For large samples, the t-distribution is approximately Normal. Therefore, the critical values in the t-table for ∞ degrees of freedom are those for the Normal Distribution.]


The t-Statistic, and Testing Hypotheses about the Mean:

For the null hypothesis H0: µ = µ0, the test statistic is the t-statistic:
t = (X̄ - µ0)/(S/√n).
If H0 is true, then the t-statistic follows a t-distribution with n - 1 degrees of freedom.

For example, let 1, 4, 6, 9 be a sample of size four from a Normal Distribution.
Then X̄ = 20/4 = 5. S² = {(1 - 5)² + (4 - 5)² + (6 - 5)² + (9 - 5)²}/3 = 11.333.

Exercise: Test the hypothesis that H0: µ = 9 versus H1: µ ≠ 9.

[Solution: t = (X̄ - µ0)/(S/√n) = 2(X̄ - µ0)/S = (2)(5 - 9)/√11.333 = -2.376.
For 3 d.f., for a 2-sided test, the 10% critical value is 2.353 and the 5% critical value is 3.182.
2.353 < 2.376 < 3.182. ⇒ Reject H0 at 10% and do not reject at 5%.]

Therefore, the t-statistic and the t-table can be used to test H0: µ = µ0, versus H1: µ ≠ µ0, using a two-sided test.

Exercise: A sample of size 25 from a Normal Distribution, has a mean of 8 and sample variance of 17. Test the hypothesis that H0: µ = 6 versus H1: µ ≠ 6.

[Solution: t = (X̄ - µ0)/(S/√n) = (8 - 6)/√(17/25) = 2.425.
For 24 d.f., for a 2-sided test, the 5% critical value is 2.064 and the 2% critical value is 2.492. 2.064 < 2.425 < 2.492. ⇒ Reject H0 at 5% and do not reject at 2%.]

If H1: µ ≠ µ0, reject H0: µ = µ0 at a significance level of α if |t| ≥ tα.
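The two-sided test in the exercise above can be reproduced numerically; a minimal sketch in Python, assuming scipy is available:

    from math import sqrt
    from scipy.stats import t

    n, xbar, s2, mu0 = 25, 8, 17, 6                # H0: mu = 6
    t_stat = (xbar - mu0) / sqrt(s2 / n)           # 2.425
    p_value = 2 * t.sf(abs(t_stat), df=n - 1)      # two-sided p-value
    print(t_stat, p_value)                         # p is between 2% and 5%, as found above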


Relationship to Confidence Intervals:

Testing the hypothesis that µ takes on a particular value µ0, is equivalent to testing whether

that value µ0 is in the appropriate confidence interval for µ.

Assume one has a sample of size 29 from a Normal Distribution, with X̄ = -0.64, and S²/29 = 0.18². Then for 29 - 1 = 28 degrees of freedom, the critical values of the t-distribution are:

Area in both tails (α)
ν        0.10     0.05     0.02     0.01
28       1.701    2.048    2.467    2.763

Therefore, we can get the following confidence intervals for µ:
90% confidence interval: -.64 ± (1.701)(.18) = -.64 ± .31 = [-0.95, -0.33].
95% confidence interval: -.64 ± (2.048)(.18) = -.64 ± .37 = [-1.01, -0.27].
98% confidence interval: -.64 ± (2.467)(.18) = -.64 ± .44 = [-1.08, -0.20].
99% confidence interval: -.64 ± (2.763)(.18) = -.64 ± .49 = [-1.13, -0.15].

[Graph: the nested 90%, 95%, 98%, and 99% confidence intervals for µ.]

Zero is not in the 99% confidence interval for µ. Therefore, there is less than a 1 - 99% = 1% probability that µ has a value at least as far (on either side) from X̄ as 0. Therefore, if H0 is the hypothesis that µ = 0, and H1 is µ ≠ 0, then we can reject H0 at 1%.

On the other hand, -.25 is in the 98% confidence interval but not in the 95% confidence interval. Therefore, for the hypothesis that µ = -.25, we reject at 5% but do not reject at 2%.

In general, if µ0 is not within the P confidence interval for µ, then reject at significance level 1 - P the hypothesis that µ = µ0 in favor of the alternative µ ≠ µ0. [93]

[93] Confidence levels are large, such as 90%, 95%, or 99%, while significance levels are small, such as 1%, 5%, or 10%.


One-Sided Tests:

If the alternative hypothesis is either H1: µ < µ0 or H1: µ > µ0, then one performs a one-sided

test.

Exercise: A sample of size 25 from a Normal Distribution, has a mean of 8 and sample variance of 17. Test the hypothesis that H0: µ = 6 versus H1: µ > 6.

[Solution: t = (X̄ - µ0)/(S/√n) = (8 - 6)/√(17/25) = 2.425.
For 24 d.f., for a 1-sided test, the 2.5% critical value is 2.064 and the 1% critical value is 2.492. 2.064 < 2.425 < 2.492. ⇒ Reject H0 at 2.5% and do not reject at 1%.
Alternately, as discussed below, t = 2.425 is in the shaded tail with 2.5% probability. ⇒ Reject H0 at 2.5%.

[Graphs: the t density with the righthand 2.5% tail beyond 2.064 shaded, and with the righthand 1% tail beyond 2.492 shaded.]

t = 2.425 is not in the shaded tail with 1% probability. ⇒ Do not reject H0 at 1%.]

One needs to recall that the values at the top of the columns in the t-table are the sum of the areas in both tails. In the above exercise, there is a total area of 5% below -2.064 and above 2.064. Performing the one-sided test, we are interested in the 2.5% in the righthand tail above 2.064.

When performing a one-sided test, one is interested in the area in one tail, and therefore you should halve the values at the top of the columns in the t-table.
Put another way:
If H1: µ > µ0, reject H0: µ = µ0 at a significance level of α if t ≥ t_2α.
If H1: µ < µ0, reject H0: µ = µ0 at a significance level of α if -t ≥ t_2α.

Exercise: A sample of size 15 from a Normal Distribution, has a mean of 7 and sample variance of 12. Test the hypothesis that H0: µ = 10 versus H1: µ < 10.

[Solution: t = (X̄ - µ0)/(S/√n) = (7 - 10)/√(12/15) = -3.354.
For 14 degrees of freedom, for a 1-sided test, the 0.5% critical value is 2.977. 3.354 > 2.977. ⇒ Reject H0 at 0.5%.]
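For a one-sided test, only one tail is used; a minimal sketch in Python for the exercise above, assuming scipy is available:

    from math import sqrt
    from scipy.stats import t

    n, xbar, s2, mu0 = 15, 7, 12, 10               # H0: mu = 10 versus H1: mu < 10
    t_stat = (xbar - mu0) / sqrt(s2 / n)           # -3.354
    p_value = t.cdf(t_stat, df=n - 1)              # lefthand tail only; less than 0.005
    print(t_stat, p_value)                         # consistent with rejecting H0 at 0.5%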


Testing Whether Two Samples from Normal Distributions Have the Same Mean:* [94]

Assume you have the loss ratios (losses divided by premiums) for two similar insurers writing the same line of business in a state.

Loss Ratios (%)
Year    Insurer A    Insurer B
1       72.2         71.2
2       68.3         76.1
3       72.6         78.3
4       70.1         77.8
5       69.4         73.0

Assume that each set of five loss ratios is a sample from a Normal Distribution. Further assume that the two Normal Distributions have the same (unknown) variance. [95]

We wish to test the hypothesis that the two insurers have the same expected loss ratio, in other words that the two Normal Distributions have the same mean.

We test the hypothesis H0 that the mean of the difference in expected loss ratios is zero, versus the alternate that it is not.

The five differences in loss ratio are: 1.0, -7.8, -5.7, -7.7, -3.6. [96]
The mean of the five differences is: -4.76.
The second moment of the five differences is: (1² + 7.8² + 5.7² + 7.7² + 3.6²)/5 = 33.316.
The sample variance of the five differences is: (5/4)(33.316 - 4.76²) = 13.323.
The estimated variance of the mean difference is: 13.323/5 = 2.6646.

The t-statistic of the test of H0 is: sample mean / √(sample variance / n) = -4.76/√2.6646 = -2.916.

We perform a two-sided t-test.
The number of degrees of freedom = the sample size - 1 = denominator of the sample variance = 4. From the t-table:

ν        0.10     0.05     0.02     0.01
4        2.132    2.776    3.747    4.604

Since 2.776 < 2.916 < 3.747, we reject H0 at 5% and do not reject H0 at 2%.

[94] See for example, Statistical Methods, by Snedecor and Cochran.
[95] These assumptions lead to the difference being Normally Distributed, which is required for the t-test to be statistically valid.
[96] The result of the t-test would be the same, regardless of in which order we took the differences.
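This paired comparison can also be run directly on the loss ratios; a minimal sketch in Python, assuming scipy is available:

    from scipy.stats import ttest_rel

    insurer_a = [72.2, 68.3, 72.6, 70.1, 69.4]
    insurer_b = [71.2, 76.1, 78.3, 77.8, 73.0]

    t_stat, p_value = ttest_rel(insurer_a, insurer_b)   # paired (matched-years) t-test
    print(t_stat)      # about -2.916, as computed above
    print(p_value)     # two-sided p-value, between 2% and 5%, consistent with the conclusion above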


Problems:

12.1 (2 points) One observes a sample of 10 values from a variable which has a variance of 80: 18, 24, 33, 34, 30, 35, 39, 12, 18, 30. Determine the upper end of the symmetric 95% confidence interval for the mean of this variable.
(A) Less than 33
(B) At least 33, but less than 34
(C) At least 34, but less than 35
(D) At least 35, but less than 36
(E) At least 36

12.2 (3 points) One observes a sample of 10 values from a variable: 18, 24, 33, 34, 30, 35, 39, 12, 18, 30. Determine the upper end of the symmetric 95% confidence interval for the mean of this variable.
(A) Less than 33
(B) At least 33, but less than 34
(C) At least 34, but less than 35
(D) At least 35, but less than 36
(E) At least 36

12.3 (2 points) Let 0, 3, 4, 4, 6, 9, 9, 13 be a random sample from a distribution with unknown mean and variance. Which of the following is an approximate 90% confidence interval for the mean of this distribution?
A. (3.93, 8.07)
B. (4.06, 7.94)
C. (3.28, 8.72)
D. (3.41, 8.59)
E. None of A, B, C, or D.

12.4 (2 points) A random sample of eleven observations yields the values: 8, 14, 18, 20, 21, 22, 26, 30, 42, 55, and 96. ΣXi = 352. ΣXi² = 17,490.
Assume the sample is taken from a Normal Distribution, with unknown mean and variance.
Determine a 95% confidence interval for the mean.
A. (19.6, 44.4) B. (17.3, 46.7) C. (15.4, 48.6) D. (15.2, 48.8) E. (15.0, 49.0)

12.5 (2 points) A sample of size 20 from a Normal Distribution, has a sample mean of -6 and sample variance of 46. Test the hypothesis that H0: µ = -2 versus H1: µ < -2.
Which of the following is true?
A. Reject H0 at 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.


Use the following information for the next three questions:

An insurer is investigating whether to introduce a program to inspect all the homes it insures for homeowners insurance, with hopes that by then inducing homeowners to repair damaged roofs, eliminate fire hazards, etc., it will lead to a reduction in losses.To test the program, in each of eight counties the insurer inspects at random half the homes it insures and provides the homeowners with the appropriate advice. It then collects data on the loss ratios the following year for both sets of homes in each county.

County:                 1    2    3    4    5    6    7    8
With new Program:      56   52   52   49   59   56   60   56
Without new Program:   49   57   64   53   64   68   68   66

Each loss ratio is losses divided by premiums shown as a percent. For example, the displayed loss ratio of 56 means losses were 56% of premiums.

12.6 (3 points) Use the t-distribution to test the hypothesis that the two samples have the same mean.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

12.7 (1 point) Let H0 be the hypothesis that the expected loss ratio with inspections is greater than or equal to that without inspections. Let H1 be the hypothesis that the expected loss ratio with inspections is less than that without inspections.
A. Reject H0 at 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

12.8 (1 point) Spreading the cost of this inspection program over several years, the cost would be equivalent to 1.5% of premiums (for those homes to which it was applied). Let H0 be the hypothesis that the expected loss ratio with inspections plus the cost of inspections is greater than or equal to the expected loss ratio without inspections. Let H1 be the hypothesis that the expected loss ratio with inspections plus the cost of inspections is less than the expected loss ratio without inspections. Which of the following is true?
A. Reject H0 at 0.5%.
B. Do not reject H0 at 0.5%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.


12.9 (2 points) A sample of size 15 from a Normal Distribution, has a sample mean of 60 and sample variance of 33. Test the hypothesis that H0: µ = 55 versus H1: µ ≠ 55.
Which of the following is true?
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

12.10 (3 points) Five pairs of college graduates are selected who are otherwise similar except for gender. Their starting salaries are as follows:
Pair      1    2    3    4    5
Male     43   65   54   35   86
Female   38   66   48   35   76
The starting salaries of males and females are each Normally Distributed.
H0: the mean starting salary of males is equal to that of females.
H1: the mean starting salary of males is greater than that of females.
A. Reject H0 at 1/2%.
B. Do not reject H0 at 1/2%. Reject H0 at 1%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 2.5%. Reject H0 at 5%.
E. Do not reject H0 at 5%.

12.11 (2, 5/83, Q. 47) (1.5 points) Let X1, X2, . . ., X11 be a random sample of size 11 from a normal distribution with unknown mean µ and unknown variance σ² > 0. If Σxi = 132 and Σ(xi - x̄)² = 99, then for what value of k is
(12 - k√.90, 12 + k√.90) a 90% confidence interval for µ?
A. 1.36 B. 1.37 C. 1.64 D. 1.80 E. 1.81

12.12 (2, 5/85, Q. 8) (1.5 points) Let x1, x2, x3, x4 be the values of a random sample from a normal distribution with unknown mean µ and unknown variance σ² > 0. The null hypothesis H0: µ = 10 is to be tested against the alternative H1: µ ≠ 10 at a significance level (size) of .05 using the Student’s t-statistic. If the resulting sample mean is X̄ = 15.84 and s² = Σ(Xi - X̄)²/3 = 16, then what are the critical t-value and the decision reached?
A. t = 2.13: reject H0
B. t = 2.35: do not reject H0
C. t = 2.78: reject H0
D. t = 3.18: do not reject H0
E. t = 3.18: reject H0


12.13 (2, 5/85, Q. 17) (1.5 points) Let X1, . . ., X9 be a random sample from a normal distribution with unknown mean µ and unknown variance σ² > 0.
Let X̄ = ΣXi/9 and S² = Σ(Xi - X̄)²/8, with the sums running from i = 1 to 9.
What is P[(X̄ - µ) < .62S]?
A. 0.050 B. 0.100 C. 0.500 D. 0.900 E. 0.950

12.14 (2, 5/85, Q. 41) (1.5 points) Let X1, . . . , X10 be the values of a random sample from a normal distribution with unknown mean µ and unknown variance σ² > 0. Let x̄ be the sample mean, and let s² = (1/9)Σ(xi - x̄)².
Which of the following is a 95% confidence interval for µ?
A. (x̄ - 2.26 s/√10, x̄ + 2.26 s/√10)
B. (x̄ - 2.26 s/√9, x̄ + 2.26 s/√9)
C. (x̄ - 2.23 s/√10, x̄ + 2.23 s/√10)
D. (x̄ - 2.23 s/√9, x̄ + 2.23 s/√9)
E. (x̄ - 1.83 s/√10, x̄ + 1.83 s/√10)

12.15 (4, 5/87, Q.58) (2 points) Let 0, 3, 5, 5, 5, 6, 8, 8, 9, 11 be a random sample from a distribution.
Which of the following is an approximate 95% confidence interval for the mean?
A. (3.738, 8.262) B. (3.855, 8.145) C. (3.934, 8.066) D. (4.040, 7.960) E. Cannot be determined.

12.16 (2, 5/88, Q. 19) (1.5 points) Let X1, X2, ..., X9 be a random sample from a normal distribution with mean µ and variance σ² > 0. The null hypothesis H0: µ = 50 is tested against the alternative H1: µ > 50 at a significance level (size) of .025.
If X̄ = 52.53 and (1/8)Σ(Xi - X̄)² = (3.3)², with the sum running from i = 1 to 9, what is the value of the Student's t-statistic and its critical value?
A. 0.77; 2.26 B. 0.77; 2.31 C. 2.30; 1.96 D. 2.30; 2.26 E. 2.30; 2.31

12.17 (2, 5/88, Q. 39) (1.5 points) A random sample of size 3 from a normal distribution yielded the values 12, 8 and 10. A 95% confidence interval for µ based on the standard t-statistic is of the form (k, ∞). What is k?
A. 4.2 B. 5.0 C. 6.6 D. 7.3 E. 8.1


12.18 (2, 5/90, Q. 6) (1.7 points) Let (X1, Y1), (X2, Y2), (X3, Y3), be a random sample of

paired observations from distributions with means µX and µY respectively, and with positive

variances. The null hypothesis H0: µX = µY is to be tested against the alternative H1: µX ≠ µY, using the Student’s t statistic based on the difference scores Xi - Yi. If the significance level (size) of the test is .05, and the value of the test statistic is 4.10, what is the critical value of this test and what is the decision reached? A. 2.92, reject H0 B. 3.18, reject H0 C. 4.30, reject H0 D. 3.18, do not reject H0 E. 4.30, do not reject H0

12.19 (4, 5/90, Q.38) (2 points) The following observations:
2, 0, 4, 4, 6, 3, 1, 5, 6, 9
are taken from a normal distribution with mean µ and variance σ².
Which of the following is an approximate 90% confidence interval for the mean µ?
A. 2.84 ≤ µ ≤ 5.16 B. 2.83 ≤ µ ≤ 5.17 C. 2.82 ≤ µ ≤ 5.18
D. 2.45 ≤ µ ≤ 5.55 E. 2.43 ≤ µ ≤ 5.57

12.20 (2, 5/92, Q.23) (1.7 points) In a random sample of 15 residents of Tampa, the time (in minutes) spent commuting to work has a sample mean of 47.21 and an unbiased sample variance of 135. If commute times are normally distributed, then what is the shortest 90% confidence interval for the mean commute time? A. (41.93, 52.49) B. (42.29, 52.13) C. (43.16, 51.26) D. (43.37, 51.05) E. (45.84, 48.57)

12.21 (2, 5/92, Q. 43) (1.7 points) Let X1, . . . , X9 and Y1, . . . , Y9 be random samples from independent normal distributions with common mean µ and variances σX² > 0 and σY² > 0.
Let Zi = Yi - Xi for i = 1, ..., 9, Z̄ = ΣZi/9, and SZ² = Σ(Zi - Z̄)²/8, with the sums running from i = 1 to 9.
What is the value of c such that P[Z̄/SZ ≤ c] = .95?
A. .207 B. .547 C. .620 D. 1.640 E. 1.860

12.22 (2, 2/96, Q.6) (1.7 points) Let (X1, Y1), ..., (X8, Y8) be a random sample from a bivariate normal distribution with means µx and µy and nonzero variances. The null hypothesis H0: µx = µy is rejected in favor of the alternate hypothesis H1: µx ≠ µy if:
√8 |X̄ - Ȳ| / √[(1/7)Σ{(Xi - Yi) - (X̄ - Ȳ)}²] > k, with the sum running from i = 1 to 8.
Determine the value of k for which the significance level (size) of the test is 0.05.
A. 1.64 B. 1.90 C. 1.96 D. 2.31 E. 2.37


12.23 (4B, 11/96, Q.24) (3 points) The random variable X has a lognormal distribution, with parameters µ and σ. A random sample of four observations of X yields the values: 2, 8, 13, and 27. Determine a 90% confidence interval for µ.
A. (0.867, 3.449) B. (0.989, 3.328) C. (1.040, 3.276) D. (1.145, 3.171) E. (1.256, 3.061)

12.24 (IOA 101, 9/00, Q.4) (2.25 points) Suppose that a random sample of nine observations is taken from a normal distribution with mean µ = 0. Let X̄ and S² denote the sample mean and variance respectively. Determine the probability that the value of X̄ exceeds that of S, i.e. determine P(X̄ > S).


Section 13, Confidence Intervals for Estimated Parameters

One can use the t-distribution to get confidence intervals for estimated parameters.

Exercise: One has fit a two-variable regression to 30 observations.
α̂ = 37.2. β̂ = -0.64. s_α̂ = 0.53. s_β̂ = 0.18.
Determine a 95% confidence interval for the intercept. Determine a 95% confidence interval for the slope.
[Solution: For 30 - 2 = 28 degrees of freedom, for 1 - 95% = 5% area in both tails, the critical value for the t-distribution is 2.048. The 95% confidence interval for the intercept is:
α̂ ± t s_α̂ = 37.2 ± (2.048)(.53) = 37.2 ± 1.1 = [36.1, 38.3].
The 95% confidence interval for the slope is:
β̂ ± t s_β̂ = -.64 ± (2.048)(.18) = -.64 ± .37 = [-1.01, -0.27].]

In general, in order to get a confidence interval for a regression parameter, one uses the critical value for N - k degrees of freedom. [97] In order to cover probability 1 - α, take the t-distribution critical value corresponding to α area in both tails.
Then the confidence interval is: β̂ ± t s_β̂.

Exercise: For the heights example with 8 observations, β̂ = .6254 and s_β̂ = .101.
Construct a 99% confidence interval for the slope.
[Solution: The critical value at 1% for the t-distribution with 6 degrees of freedom is 3.707. Therefore a 99% confidence interval for the slope is: .6254 ± (3.707)(.101) = .6254 ± .3744 = [.25, 1.00].]

[97] k = number of variables including the intercept. k = 2 for the two-variable model.
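The interval in the heights example can be reproduced numerically; a minimal sketch in Python, assuming scipy is available:

    from scipy.stats import t

    beta_hat, s_beta, df = 0.6254, 0.101, 6     # fitted slope, its standard error, N - k = 8 - 2
    t_crit = t.ppf(0.995, df)                   # 3.707: 1% total area in both tails
    print(beta_hat - t_crit * s_beta,
          beta_hat + t_crit * s_beta)           # about (0.25, 1.00)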


Problems:

13.1 (1 point) You fit the regression model Yi = α + βXi + εi to 25 observations.
β̂ = 1.73. s_β̂ = 0.24.
Determine the lower limit of the symmetric 90% confidence interval for the slope parameter.
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

13.2 (1 point) You fit the regression model Yi = α + βXi + εi to 12 observations. s_α̂ = 126.
Determine the width of the symmetric 98% confidence interval for the intercept.
(A) 500 (B) 550 (C) 600 (D) 650 (E) 700

Use the following information for the next two questions:
The following model: Y = α + βX + ε, has been fit to 10 observations.
α̂ = 6.93. β̂ = 2.79. ΣXi² = 385. ΣYi² = 5893.
Σxi² = Σ(Xi - X̄)² = 82.5. Σyi² = Σ(Yi - Ȳ)² = 920. Σε̂i² = Σ(Yi - Ŷi)² = 276.

13.3 (2 points) Determine the shortest 99% confidence interval for α.
(A) (-8.0, 21.9) (B) (-6.5, 20.4) (C) (-5.0, 18.9) (D) (-3.5, 17.4) (E) (-2.0, 15.9)

13.4 (2 points) Determine the shortest 99% confidence interval for β.
(A) (1.4, 4.2) (B) (1.2, 4.4) (C) (1.0, 4.6) (D) (.8, 4.8) (E) (.6, 5.0)

Use the following information for the next two questions:
You fit the following model to 12 observations: Y = α + βX + ε.
You are given: α̂ = 9.88. β̂ = 2.36.
Σ(Xi - X̄)² = 1283. Σ(Yi - Ŷi)² = 272. ΣYi = 390.

13.5 (2 points) Determine the upper limit of the symmetric 90% confidence interval for α.
(A) 12.8 (B) 13.0 (C) 13.2 (D) 13.4 (E) 13.6

13.6 (2 points) Determine the upper limit of the symmetric 90% confidence interval for β.
(A) 2.6 (B) 2.8 (C) 3.0 (D) 3.2 (E) 3.4

13.7 (8 points) You are given the following 6 pairs of observations:
X:   10   25   50   100   250   500
Y:   60   40   50    30    10     0
Fit the regression model: Y = α + βX + ε.
Determine 98% confidence intervals for α and β.


Use the following information for the next two questions:
You fit a two-variable linear regression model to 14 pairs of observations. You are given:
The sample mean of the independent variable is 13.86.
The sum of squared deviations from the mean of the independent variable is 3096.
The sample mean of the dependent variable is 25.86.
The sum of squared deviations from the mean of the dependent variable is 6748.
The ordinary least-squares estimate of the slope parameter is 0.643.
The regression sum of squares (RSS) is 1279.

13.8 (2 points) Determine the lower limit of the symmetric 95% confidence interval for the intercept parameter.(A) -3 (B) -2 (C) -1 (D) 0 (E) 1

13.9 (2 points) Determine the lower limit of the symmetric 95% confidence interval for the slope parameter.(A) -0.6 (B) -0.4 (C) -0.2 (D) 0 (E) 0.2

13.10 (Course 120 Sample Exam #1, Q.3) (2 points) You fit the regression model Yi = α + βXi + εi to 10 observations (Xi, Yi).
You determine: Σ(Yi - Ŷi)² = 2.79. Σ(Xi - X̄)² = 180. Σ(Yi - Ȳ)² = 152.40. X̄ = 6. Ȳ = 7.78.
Determine the width of the shortest symmetric 95% confidence interval for α.
(A) 1.1 (B) 1.2 (C) 1.3 (D) 1.4 (E) 1.5

13.11 (Course 120 Sample Exam #3, Q.2) (2 points) You fit a regression model Yi = α + βXi + εi to 12 observations.
You determine that the symmetric 95% confidence interval for β is (1.2, 3.8) and that Σ(Xi - X̄)² = 0.826.
Determine the residual variance, s².
(A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5


13.12 (IOA 101, 4/00, Q.16) (9.75 points)
The table below contains measurements on the strengths of beams. The width and height of each beam was fixed but the lengths varied. Data are available on the length (cm) and strength (Newtons) of each beam.

Length, l   x = log l   Strength, p   y = log p   Fitted value   Residual
7           1.946       11775         9.374       9.379          -0.005
7           1.946       11275         9.330       9.379          -0.049
9           2.197        8400         9.036       9.055          -0.019
9           2.197        8200         9.012       9.055          -0.043
12          2.485        6100         8.716       8.684           0.032
12          2.485        6050         8.708       8.684           0.024
14          2.639        5200         8.556       8.486           0.070
18          2.890        3750         8.230       8.162           0.068
18          2.890        3650         8.202       8.162           0.040
20          2.996        3275         8.094       8.026           0.068
20          2.996        3175         8.063       8.026           0.037
24          3.178        2200         7.696       7.791          -0.095
24          3.178        2125         7.662       7.791          -0.129

Σx = 34.023, Σx² = 91.3978, Σy = 110.679, Σxy = 286.6299

It is thought that P and L satisfy the law P = k/L where k is a constant, so
log P = log k - log L, i.e. Y = log k - X.

A graph of log P against log L is displayed below.

The simple linear regression model y = α + βx has been fitted to the data, and the fitted values and residuals are recorded in the table above.


(i) (2.25 points) Use the data summaries above to calculate the least squares estimates α̂ of α and β̂ of β. Show all work.
(ii) (5.25 points) Assuming the usual normal linear regression model
(a) estimate the error variance σ²,
(b) calculate a 95% confidence interval for β, and
(c) discuss briefly whether the data are consistent with the relationship P = k/L.
(iii) (2.25 points) Plot the residuals of the model against X and comment on the information contained in the plot.

13.13 (4, 11/00, Q.5) (2.5 points) You are investigating the relationship between per capita consumption of natural gas and the price of natural gas. You gathered data from 20 cities and constructed the following model: Y = α + βX + ε, where Y is per capita consumption, X is the price, and ε is a normal random error term.

You have determined: α̂ = 138.561. β̂ = -1.104. ΣXi² = 90,048. ΣYi² = 116,058.
Σxi² = Σ(Xi - X̄)² = 10,668. Σyi² = Σ(Yi - Ȳ)² = 20,838. Σε̂i² = Σ(Yi - Ŷi)² = 7,832.
Determine the shortest 95% confidence interval for β.
(A) (-2.1, -0.1) (B) (-1.9, -0.3) (C) (-1.7, -0.5) (D) (-1.5, -0.7) (E) (-1.3, -0.9)

13.14 (2 points) In the previous question, determine the corrected R², R̄².
(A) 0.54 (B) 0.56 (C) 0.58 (D) 0.60 (E) 0.62

13.15 (4, 11/01, Q.5) (2.5 points) You fit the following model to eight observations:
Y = α + βX + ε.
You are given:
β̂ = -35.69.
Σ(Xi - X̄)² = 1.62.
Σ(Yi - Ŷi)² = 2394.
Determine the symmetric 90-percent confidence interval for β.
(A) (–74.1, 2.7) (B) (–66.2, –5.2) (C) (–63.2, –8.2) (D) (–61.5, –9.9) (E) (–61.0, –10.4)


13.16 (IOA 101, 4/02, Q.14) (13.5 points) The table below gives the numbers of deaths nx in a year in groups of women aged x years. The exposures of the groups, denoted Ex, are also given (the exposure is essentially the number of women alive for the year in question). The values of the death rates yx, where yx = nx/Ex, and the log(death rates), denoted wx, are also given.

age x   number of deaths nx   exposure Ex   yx = nx/Ex   wx = log yx
70      30                    426           0.07042      -2.6532
71      38                    471           0.08068      -2.5173
72      38                    454           0.08370      -2.4805
73      53                    482           0.10996      -2.2077
74      59                    445           0.13258      -2.0205
75      61                    423           0.14421      -1.9365
76      82                    468           0.17521      -1.7417
77      96                    430           0.22326      -1.4994

Σx = 588, Σx² = 43260, Σw = -17.0568, Σw² = 37.5173, Σxw = -1246.7879

(i) (2.25 points) A scatter plot of yx against x is shown below.

[Scatter plot of yx against x, for ages 70 to 77.]

Draw a scatter plot of wx against x and comment briefly on the two scatter plots and the relationships displayed.
(ii) (9 points)
(a) Calculate the least squares fit regression line in which wx is modeled as the response and x as the explanatory variable.
(b) Draw the fitted line on your scatter plot of wx against x.
(c) Calculate a 95% confidence interval for the slope coefficient of the regression model of wx on x, adopting the assumptions of the usual “normal regression model”.
(d) Calculate the fitted values for the number of deaths for the group aged 71 years and the group aged 76 years.
(iii) (2.25 points) Explain briefly the relationship between the fitting procedure used in part (ii) and a model which states that the number of deaths Nx is a random variable with mean Ex b c^x for some constants b and c.


13.17 (IOA 101, 9/02, Q.13) (9.75 points) The table below gives the frequency of coronary heart disease by age group. The table also gives the age group midpoint (x) and y = ln[p/(1-p)], where p denotes the proportion in an age group with coronary heart disease.

Coronary Heart Disease
Age group   x      Yes   No   n    y
20-29       25      1     9   10   -2.19722
30-34       32.5    2    13   15   -1.87180
35-39       37.5    3     9   12   -1.09861
40-44       42.5    5    10   15   -0.69315
45-49       47.5    6     7   13   -0.15415
50-54       52.5    5     3    8    0.51083
55-59       57.5   13     4   17    1.17865
60-69       65      8     2   10    1.38629

Σx = 360; Σx² = 17437.5; Σy = -2.9392; Σy² = 13.615; Σxy = -9.0429.

Consider the regression model y = α + βx.
(a) Draw a scatterplot of y against x, and comment on the appropriateness of the suggested model.
(b) Calculate the least squares fitted regression line of y on x.
(c) Calculate a 99% confidence interval for the slope parameter.
(d) Discuss whether there are differences in the probability of having coronary heart disease for the different age groups with reference to the confidence interval obtained in (ii)(c).
Comment: Only part ii of the original past exam question is shown here.

13.18 (4, 11/02, Q.38) (2.5 points) You fit a two-variable linear regression model to 20 pairs of observations. You are given:
(i) The sample mean of the independent variable is 100.
(ii) The sum of squared deviations from the mean of the independent variable is 2266.
(iii) The ordinary least-squares estimate of the intercept parameter is 68.73.
(iv) The error sum of squares (ESS) is 5348.
Determine the lower limit of the symmetric 95% confidence interval for the intercept parameter.
(A) -273 (B) -132 (C) -70 (D) -8 (E) -3

13.19 (2 points) In the previous question, what is the width of a 95% confidence interval for the slope parameter?(A) 1.5 (B) 1.6 (C) 1.7 (D) 1.8 (E) 1.9


Mahler’s Guide to

Regression

Sections 14-17:

14 F Distribution
15 Testing the Slope, Two Variable Model
16 Hypothesis Testing
17 A Simulation Experiment *

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-D


Section 14, F-Distribution

The F-distribution is used in many tests of hypotheses of regression models. We will start with a review of the variance ratio test, which is preliminary to the ideas on your exam.

F-Test or Variance Ratio Test: [98]

Assume you have 5 observations of a variable X: 5, 5, 7, 8, 10. Then the sample mean is: (5 + 5 + 7 + 8 + 10)/5 = 7. The sample variance is: {(5 - 7)² + (5 - 7)² + (7 - 7)² + (8 - 7)² + (10 - 7)²}/(5 - 1) = 4.5.

Exercise: Given three observations of the variable Y: 8, 12, 22, what are the mean and sample variance?
[Solution: Mean is: (8 + 12 + 22)/3 = 14. Sample Variance is: {(8 - 14)² + (12 - 14)² + (22 - 14)²}/(3 - 1) = 52.]

If we assume X is Normally distributed and Y is Normally distributed, then we can apply an F-Test, in order to test the hypothesis that X and Y have the same variance. The test statistic is the ratio of the sample variances, with the larger one in the numerator: [99]
(Sample Variance of Y)/(Sample Variance of X) = 52/4.5 = 11.56.

Consulting the F-Table, the critical value for 5% for 2 and 4 degrees of freedom [100] is 6.94. Therefore, since 11.56 > 6.94, if the null hypothesis is true, there is less than a 5% chance of seeing a variance ratio of 11.56 or higher. Doing a two-sided test, there is less than a 10% chance of seeing a variance ratio this high, with either sample variance larger. Thus we reject the null hypothesis at a 10% level (two-sided test.)

Consulting the F-Table, the critical value for 1% for 2 and 4 degrees of freedom [101] is 18.00. Therefore, since 11.56 < 18.00, if the null hypothesis is true, there is more than a 1% chance of seeing a variance ratio of 11.56 or higher. Doing a two-sided test, there is more than a 2% chance of seeing a variance ratio this high, with either sample variance larger. Thus we do not reject the null hypothesis at a 2% level (two-sided test.)
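The critical values used above can be reproduced numerically; a minimal sketch in Python, assuming scipy is available:

    from scipy.stats import f

    ratio = 52 / 4.5                        # 11.56: larger sample variance over smaller
    print(f.ppf(0.95, dfn=2, dfd=4))        # about 6.94, the 5% critical value
    print(f.ppf(0.99, dfn=2, dfd=4))        # about 18.00, the 1% critical value
    print(f.sf(ratio, dfn=2, dfd=4))        # one-tailed probability of a ratio at least this large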

While this is the original and most common use of the F-Distribution, this is not tested on your exam! Rather, as will be discussed in later sections, you may be asked to apply the F-Distribution to test various hypotheses about the slope coefficients of regressions. The F-Test depends on assuming X and Y are each (approximately) Normal.

[98] See for example Section 2.4.4 of Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.
[99] One takes the ratio of the larger sample variance to that of the smaller sample variance. One expects the ratio to be near one if the null hypothesis is true. H0: σY² = σX².
[100] The sample variance of Y in the numerator has 2 degrees of freedom, the denominator of its sample variance. The sample variance of X in the denominator has 4 degrees of freedom, the denominator of its sample variance.
[101] The sample variance of Y in the numerator has 2 degrees of freedom, the denominator of its sample variance. The sample variance of X in the denominator has 4 degrees of freedom, the denominator of its sample variance.

HCMSA-F06-Reg-D, Mahler’s Guide to Regression, 7/11/06, Page 122

Page 127: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

Relationship to the Chi-Square Distribution:*

In the case above, if X is Normal, then the numerator of the sample variance of X is a sum of squared Normals, each with mean zero and the same variance. Thus if we divide by the variance we get a Chi-Square distribution. [102] Thus the numerator of this sample variance is σX² times a Chi-Square distribution with 4 degrees of freedom. The sample variance of X is therefore (σX²/4) times a Chi-Square distribution with 4 degrees of freedom.

Similarly, the sample variance of Y is (σY²/2) times a Chi-Square distribution with 2 degrees of freedom. [103]

(Sample Variance of Y)/(Sample Variance of X) = (σY²/σX²)(Chi-Square 2 d.f./2)/(Chi-Square 4 d.f./4).

If σX² = σY², then this ratio follows an F-Distribution with 2 and 4 degrees of freedom.

In general, if χ1² follows a chi-square distribution with ν1 degrees of freedom and χ2² follows an independent chi-square distribution with ν2 degrees of freedom, [104] then (χ1²/ν1)/(χ2²/ν2) follows an F-distribution, with ν1 and ν2 degrees of freedom.

F-Statistic:

In the example above, the numerator of the F-Statistic was the sample variance of Y, a sum of squares divided by its number of degrees of freedom. Similarly, the denominator of the F-Statistic was the sample variance of X, a sum of squares divided by its number of degrees of freedom.

Generally an F-Statistic will involve in the numerator some sort of sum of squares divided by its number of degrees of freedom. In the denominator will be another sum of squares divided by its number of degrees of freedom.

When applied to regression models, the particular numerator and denominator of the computed F-Statistic will depend on the particular hypothesis being tested. However, they will all have this same basic form.

[102] A sum of squares of Unit Normals is a Chi-Square Distribution, a special case of the Gamma Distribution. We lose one degree of freedom because the sum of Xi - X̄ is zero, and therefore, knowing the value of n-1 of these terms, we know the value of the last one.
[103] This is where the assumption that Y is Normal is used.
[104] For example, in the variance ratio test, χ1² would be the estimated variance from a sample of ν1 drawn from a Normal Distribution and χ2² would be the estimated variance from an independent sample of ν2 drawn from a second Normal Distribution.

HCMSA-F06-Reg-D, Mahler’s Guide to Regression, 7/11/06, Page 123

Page 128: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

Graphs:

The F-Distribution has two parameters, ν1 and ν2, each integers.

Here is a graph of the F-Distribution for ν1 = 2 and ν2 = 4: [105]

[Graph: density of the F-Distribution with ν1 = 2 and ν2 = 4.]

The F-Distribution is a heavy-tailed distribution. [106] The F-Distribution is skewed to the right; it has positive skewness.
Here is a graph of the F-Distribution for ν1 = 12 and ν2 = 9: [107]

[Graph: density of the F-Distribution with ν1 = 12 and ν2 = 9, with 95% of the probability below 3.07.]

It turns out that the F-Distribution for 12 and 9 degrees of freedom, at 3.07 is .950; the survival function is 5%. That is why for the 5% significance level, for ν1 = 12 and ν2 = 9, 3.07 appears in the F-Table attached to the exam. At 5.11 the survival function is 1%, and 5.11 appears in the F-Table for the 1% significance level, for ν1 = 12 and ν2 = 9.

[105] In terms of an Incomplete Beta Function, F(x) = β[2/2, 4/2; 2x/(4 + 2x)].
[106] In the F-Statistic, if the numerator is unusually large and the denominator is unusually small, then their ratio can be very big.
[107] In terms of an Incomplete Beta Function, F(x) = β[12/2, 9/2; 12x/(9 + 12x)].


Summary of the F Distribution:* [108]

Support: 0 < x < ∞. Parameters: ν1 = positive integer, ν2 = positive integer.
D.f.: F(x) = β[ν1/2, ν2/2; ν1x/(ν2 + ν1x)] = 1 - β[ν2/2, ν1/2; ν2/(ν2 + ν1x)].
F_ν1,ν2(x) = 1 - F_ν2,ν1(1/x).

P.d.f.: f(x) = ν1^(ν1/2) ν2^(ν2/2) x^(ν1/2 - 1) / {(ν2 + ν1x)^((ν1+ν2)/2) β[ν1/2, ν2/2]},
where β[ν1/2, ν2/2] = Γ(ν1/2)Γ(ν2/2)/Γ((ν1+ν2)/2).

Moments: E[X^n] = (ν2/ν1)^n Γ(ν1/2 + n)Γ(ν2/2 - n)/{Γ(ν1/2)Γ(ν2/2)}, ν2 > 2n.

Mean = ν2/(ν2 - 2), ν2 > 2. Variance = 2ν2²(ν1 + ν2 - 2)/{ν1(ν2 - 2)²(ν2 - 4)}, ν2 > 4.
Skewness = 2^1.5 (ν2 - 4)^.5 (2ν1 + ν2 - 2)/{ν1^.5 (ν2 - 6)(ν1 + ν2 - 2)^.5}, ν2 > 6.
Kurtosis = 3 + 12{(ν2 - 4)(ν2 - 2)² + ν1(ν1 + ν2 - 2)(5ν2 - 22)}/{ν1(ν2 - 6)(ν2 - 8)(ν1 + ν2 - 2)}, ν2 > 8.
Mode = (ν2/ν1)(ν1 - 2)/(ν2 + 2), ν1 > 2; Mode = 0 for ν1 ≤ 2.

The F-Distribution is a heavy tailed distribution on 0 to ∞, with a righthand tail somewhat similar to a Pareto Distribution with α = ν2/2.

Exercise: What is the density of an F-Distribution with ν1 = 4 and ν2 = 6?
[Solution: β[ν1/2, ν2/2] = β[2, 3] = Γ(2)Γ(3)/Γ(5) = (1!)(2!)/(4!) = (1)(2)/24 = 1/12.
f(x) = 4² 6³ x / {(6 + 4x)^5 β[2, 3]} = 1296x/(3 + 2x)^5, x > 0.]

Relation to the t-distribution:

The F-Distribution for ν1 = 1 is related to the t-distribution.
Prob[F-Distribution with 1 and ν degrees of freedom > c²] = Prob[absolute value of t-distribution with ν degrees of freedom > c].

For example, Prob[F-Distribution with 1 and 4 degrees of freedom > 2.776² = 7.71] = 5% =
Prob[absolute value of t-distribution with 4 degrees of freedom > 2.776].
The critical values for 5% in the column of the F-table for ν1 = 1 are the squares of the critical values for the two-sided t-test for 5%. Similarly, the critical values for 1% in the column of the F-table for ν1 = 1 are the squares of the critical values for the two-sided t-test for 1%.
Prob[F-Distribution with 1 and 4 degrees of freedom > 4.604² = 21.20] = 1% =
Prob[absolute value of t-distribution with 4 degrees of freedom > 4.604].

108 Also called the F Ratio Distribution or the Variance-Ratio Distribution. The F comes from the last name of the statistician R. A. Fisher, who devised the variance ratio test.


Relation to the Pareto distribution:*

The F-Distribution for ν1 = 2 is a Pareto Distribution with α = ν2/2 and θ = ν2/2.

For example, for ν1 = 2 and ν2 = 1, the F-Distribution has survival function:
S(x) = {(1/2)/(1/2 + x)}^(1/2) = 1/√(1 + 2x).

Exercise: For ν1 = 2 and ν2 = 1, determine the critical values for 5%, 2.5%, and 1%.
[Solution: .05 = 1/√(1+2x). ⇒ x = 199.5. .025 = 1/√(1+2x). ⇒ x = 799.5. .01 = 1/√(1+2x). ⇒ x = 4999.5.
Comment: These match the critical values shown in the F-Table.]

Exercise: For ν1 = 2 and ν2 = 2, determine the critical values for 5%, 2.5%, and 1%.

[Solution: For a Pareto distribution with α = 1 and θ = 1, S(x) = 1/(1 + x). .05 = 1/(1 + x). ⇒ x = 19. .025 = 1/(1 + x). ⇒ x = 39. .01 = 1/(1 + x). ⇒ x = 99.
Comment: These match the critical values shown in the F-Table.]

More generally, an F-Distribution with ν1 and ν2 degrees of freedom is a Generalized Pareto Distribution, as per Loss Models, with τ = ν1/2, α = ν2/2, θ = ν2/ν1.

Therefore, the F-Distribution for ν2 = 2 is an Inverse Pareto Distribution with τ = ν1/2 and θ = 2/ν1. S(x) = 1 - {ν1x/(ν1x + ν2)}^(ν1/2).

Exercise: For ν1 = 4 and ν2 = 2, determine the critical values for 5%, 2.5%, and 1%.

[Solution: For an Inverse Pareto distribution with τ = 2 and θ = .5, S(x) = 1 - {x/(x + .5)}².
.95 = {x/(x + .5)}². ⇒ x = 19.25. .975 = {x/(x + .5)}². ⇒ x = 39.25. .99 = {x/(x + .5)}². ⇒ x = 99.25.
Comment: These match the critical values shown in the F-Table.]
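The Pareto connection can also be checked against a statistical library. A sketch assuming SciPy (not part of the original Study Aid):

from scipy import stats

# For nu1 = 2 and nu2 = 2, S(x) = 1/(1 + x), a Pareto with alpha = 1 and theta = 1.
for p, expected in [(0.05, 19), (0.025, 39), (0.01, 99)]:
    print(stats.f.ppf(1 - p, 2, 2), expected)          # critical values 19, 39, 99

# For nu1 = 2 and nu2 = 1, S(x) = 1/sqrt(1 + 2x), so the survival function at 199.5 is 5%.
print(stats.f.sf(199.5, 2, 1), (1 + 2 * 199.5) ** -0.5)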

Limits:*109

As ν1 → ∞, the survival function of the F-Distribution at y approaches the distribution function of

a Chi-Square Distribution with ν2 degrees of freedom at ν2/y.

For example for ν2 = 7, as ν1 → ∞, the critical value at 5% of the F-Distribution is 3.23;

for ν2 = 7, as ν1 → ∞, the survival function of the F-Distribution at 3.23 is 5%. For a Chi-Square Distribution with 7 degrees of freedom, the distribution function at 7/3.23 = 2.17 is 5%, as shown in the Chi-Square Table.

109 See for example, Handbook of Mathematical Functions, edited by Abramowitz and Stegun.


As ν2 → ∞, the survival function of the F-Distribution at y approaches the survival function of a

Chi-Square Distribution with ν1 degrees of freedom at yν1.

For example, for a Chi-Square Distribution with 20 degrees of freedom, the survival function at

31.41 is 5%, as shown in the Chi-Square Table. Therefore, for ν1 = 20, as ν2 → ∞, the critical value at 5% of the F-Distribution approaches 31.41/20 = 1.571.

Using a computer for large values of ν2, the 5% critical values for ν1 = 20 are:

ν2:                  10     20     100    1000   10,000   100,000
5% critical value:   2.77   2.12   1.676  1.581  1.572    1.571
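These limits can be confirmed numerically; the sketch below assumes SciPy and reproduces the 1.571 limit for ν1 = 20 (it is not part of the original Study Aid):

from scipy import stats

# As nu2 -> infinity, the 5% critical value of the F-Distribution with (20, nu2) degrees of freedom
# approaches the 5% critical value of the Chi-Square with 20 degrees of freedom, divided by 20.
print(stats.chi2.ppf(0.95, 20) / 20)           # 31.41 / 20 = 1.571
for nu2 in [10, 20, 100, 1000, 10000, 100000]:
    print(nu2, stats.f.ppf(0.95, 20, nu2))     # 2.77, 2.12, 1.68, 1.58, 1.57, 1.57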

F-Table:

In order to enter the Table of the F-Distribution, one needs to know ν1 and ν2, where ν1 is the number of degrees of freedom associated with the numerator and ν2 is the number of degrees of freedom associated with the denominator. The columns correspond to ν1 while the rows correspond to ν2.

ν1 ⇔ number of degrees of freedom associated with the numerator ⇔ columns of table.

ν2 ⇔ number of degrees of freedom associated with the denominator ⇔ rows of table.

Listed in the table are the critical values for 5% and 1%.110

For example, for ν1 = 5 and ν2 = 3, the critical value for 5% is 9.01. In other words, the

distribution function at 9.01, F5,3(9.01) = .95 = 1 - .05. For ν1 = 5 and ν2 = 3, the critical value for 1% is 28.24. In other words, the distribution function at 28.24, F5,3(28.24) = .99 = 1 - .01.

For ν1 = 5 and ν2 = 3, the entries in the F-table look as follows: 9.01 and 28.24.

110 While the tables attached to your exam contain only these two critical values, another table might contain additional critical values. Exact p-values can be calculated via computer.
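If a needed critical value or significance level is not in the printed table, it can be computed directly. A short sketch assuming SciPy (not part of the original Study Aid), reproducing the ν1 = 5, ν2 = 3 entries discussed above:

from scipy import stats

nu1, nu2 = 5, 3                       # nu1 = numerator d.f. (columns), nu2 = denominator d.f. (rows)
print(stats.f.ppf(0.95, nu1, nu2))    # 9.01,  the critical value for 5%
print(stats.f.ppf(0.99, nu1, nu2))    # 28.24, the critical value for 1%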


Percentage Points of the F-Distribution

For each value of ν2, the first row gives the critical value for 5% (italic type in the original table) and the second row gives the critical value for 1% (bold face type in the original table).

          ν1:   1      2      3      4      5      6      7      8      9      10     11     12

ν2 = 1   5%:   161    200    216    225    230    234    237    239    241    242    243    244
         1%:   4052   4999   5403   5625   5764   5859   5928   5981   6022   6056   6082   6106

ν2 = 2   5%:   18.51  19.00  19.16  19.25  19.30  19.33  19.36  19.37  19.38  19.39  19.40  19.41
         1%:   98.49  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.41  99.42

ν2 = 3   5%:   10.13  9.55   9.28   9.12   9.01   8.94   8.88   8.84   8.81   8.78   8.76   8.74
         1%:   34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.34  27.23  27.13  27.05

ν2 = 4   5%:   7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.93   5.91
         1%:   21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.54  14.45  14.37

ν2 = 5   5%:   6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.78   4.74   4.70   4.68
         1%:   16.26  13.27  12.06  11.39  10.97  10.67  10.45  10.29  10.15  10.05  9.96   9.89

ν2 = 6   5%:   5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   4.03   4.00
         1%:   13.74  10.92  9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.79   7.72

ν2 = 7   5%:   5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.63   3.60   3.57
         1%:   12.25  9.55   8.45   7.85   7.46   7.19   7.00   6.84   6.71   6.62   6.54   6.47

ν2 = 8   5%:   5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.34   3.31   3.28
         1%:   11.26  8.65   7.59   7.10   6.63   6.37   6.19   6.03   5.91   5.82   5.74   5.67

ν2 = 9   5%:   5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.13   3.10   3.07
         1%:   10.56  8.02   6.99   6.42   6.06   5.80   5.62   5.47   5.35   5.26   5.18   5.11

ν2 = 10  5%:   4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.97   2.94   2.91
         1%:   10.04  7.56   6.55   5.99   5.64   5.39   5.21   5.06   4.95   4.85   4.78   4.71

ν2 = 11  5%:   4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.86   2.82   2.79
         1%:   9.65   7.20   6.22   5.67   5.32   5.07   4.88   4.74   4.63   4.54   4.46   4.40

ν2 = 12  5%:   4.75   3.88   3.49   3.26   3.11   3.00   2.92   2.85   2.80   2.76   2.72   2.69
         1%:   9.33   6.93   5.95   5.41   5.06   4.82   4.65   4.50   4.39   4.30   4.22   4.16

ν2 = 13  5%:   4.67   3.80   3.41   3.18   3.02   2.92   2.84   2.77   2.72   2.67   2.63   2.60
         1%:   9.07   6.70   5.74   5.20   4.86   4.62   4.44   4.30   4.19   4.10   4.02   3.96

Note: This is only 1/4 of the table attached to the exam, which also contains critical values for larger numbers of degrees of freedom than shown here.


Hypothesis Testing:

The null hypothesis, H0, is that the variances of the variables associated with the numerator and denominator of an F-Test are equal. One can apply the F-Test as either a one or a two sided test. If the alternate hypothesis, H1, is that the variables have different variances, then one applies a two sided test. If instead the alternate hypothesis, H1, is that the variable or sum of squares in the numerator has the larger variance, then one applies a one sided test. This is generally the case for the tests applied to regression models, and therefore one applies a one-sided F-test.

Assume for a particular one-sided hypothesis test111, the F-Statistic with ν1 = 5 and ν2 = 3 is 15.4. The critical values shown in the F-Table for 5% and 1% are 9.01 and 28.24. Then since 9.01 < 15.4 < 28.24, we can reject the hypothesis at 5% and do not reject at 1%. The p-value is somewhere in between 1% and 5%.112 A table of critical values for ν1 = 5 and ν2 = 3:113

p-value = α:          5%      1%
critical value = c:   9.01    28.24

Then one rejects to the left and does not reject to the right. If instead the F-statistic were 7.3, since 7.3 < 9.01, one would not reject the null hypothesis at 5%. If instead the F-statistic were 31.7, since 31.7 > 28.24, one would reject the null hypothesis at 1%.

If the null hypothesis is true, then we are unlikely to observe a very large F-Statistic. However, there is always some positive probability, due to random fluctuation, of producing “unusual” observations. For example, with ν1 = 5 and ν2 = 3, there is only a 1% chance of observing an F-Statistic of 28.24 or more, if the null hypothesis is true.

Large F-Statistic ⇒ reject the null hypothesis.

Small F-Statistic ⇒ do not reject the null hypothesis.

With ν1 = 5 and ν2 = 3, if the F-Statistic is less than 9.01, do not reject at 5%. If the F-Statistic is between 9.01 and 28.24, then reject at 5% and do not reject at 1%. If the F-Statistic is greater than 28.24, then reject at 1%.

111 There are a number of different applications of the F-test, as discussed in subsequent sections. However, once one has computed the correct F-Statistic and determined the corresponding degrees of freedom ν1 and ν2, then a one-sided test proceeds in the same manner.
112 Using a computer, the p-value corresponding to 15.4 is 2.4%.
113 Put in a similar format to a row of the Chi-Square Table.
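A sketch of this one-sided F-test in code (assuming SciPy; the 15.4, ν1 = 5, ν2 = 3 numbers are those of the example above; not part of the original Study Aid):

from scipy import stats

F_stat, nu1, nu2 = 15.4, 5, 3
p_value = stats.f.sf(F_stat, nu1, nu2)         # upper tail (survival function) probability
print(p_value)                                 # about 0.024, between 1% and 5%
print(F_stat > stats.f.ppf(0.95, nu1, nu2))    # True: reject H0 at the 5% significance level
print(F_stat > stats.f.ppf(0.99, nu1, nu2))    # False: do not reject H0 at the 1% significance level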


Additional Critical Values:*

If one wanted to, using a computer one could construct the following table of critical values for ν1 = 5 and ν2 = 3:

p-value = α:          5%      2.5%    1%      0.5%
critical value = c:   9.01    14.88   28.24   45.39

Then one rejects to the left and does not reject to the right. The table attached to your exam only contains the critical values for 5% and 1%, so one can not be as precise in conducting an F-test, as one could be with the above table.


Problems:

14.1 (1 point) The F-Statistic is 30.03. ν1 = 7 and ν2 = 3.

What is the p-value of a one-sided F-test?

14 .2 (1 point) F-Statistic = 2.72 for 10 degrees of freedom in the numerator and 14 degrees of freedom in the denominator. For a one-sided F-Test, at what level, if any, do you reject the null hypothesis?

14.3 (1 point) For ν1 = 6, the distribution function of the F-Distribution at 3 is .95. What is ν2?

14.4 (1 point) Use the F-Table in order to determine the critical value for the 1% significance level (two sided t-test) for the t-statistic with 10 degrees of freedom.

14.5 (1 point) What is the 99th percentile of the F-Distribution with 6 and 4 degrees of freedom?

14.6 (1 point) What is the 5th percentile of the F-Distribution with 8 and 3 degrees of freedom?

14.7 (2 points) Let Z1, Z2, Z3, Z4, and Z5 be independent Standard Normal Distributions, each with mean zero and standard deviation one.
Determine Prob[Z1² + Z2² ≥ 20.55(Z3² + Z4² + Z5²)].

14 .8 (1 point) What is the 95th percentile of the F-Distribution with 7 and 11 degrees of freedom?

14.9 (3 points) You have a sample of size 4 from a Normal Distribution: 5, 7, 2, 3.You have a sample of size 3 from another Normal Distribution: 4, 1, 15. Test the null hypothesis H0: the two distributions have the same variance, versus the alternative H1: the variance of the second distribution is greater than the variance of the first distribution.

14.10 (2 points) Let X1, X2, . . . , X12 be a random sample obtained from a normal distribution with unknown mean µX and unknown variance σX² > 0.
Let Y1, Y2, . . . , Y15 be a random sample obtained independently from a normal distribution with unknown mean µY and unknown variance σY² > 0.
The statistic W = Σ(Xi - X̄)²/Σ(Yi - Ȳ)² is to be used to test the null hypothesis H0: σX² = σY² versus the alternative hypothesis H1: σX² > σY².
If H0 is rejected when W > C, and the significance level of the test is .01, then C must equal:
A. 2.4 B. 2.6 C. 2.8 D. 3.0 E. 3.2


14.11 (3 points) For a sample of size 25 from a Normal Distribution: Σ Xi = 255 and Σ Xi² = 2867.
For a sample of size 20 from a Normal Distribution: Σ Yi = 212 and Σ Yi² = 2368.
Test the null hypothesis H0: the two distributions have the same variance, versus the alternative H1: the variance of the first distribution is greater than the variance of the second distribution.
Which of the following is true?
A. The F statistic has 19 and 24 degrees of freedom and H0 is rejected at the .05 level, but not rejected at the .01 level.
B. The F statistic has 24 and 19 degrees of freedom and H0 is rejected at the .05 level, but not rejected at the .01 level.
C. The F statistic has 19 and 24 degrees of freedom and H0 is rejected at the .01 level.
D. The F statistic has 24 and 19 degrees of freedom and H0 is rejected at the .01 level.
E. None of A, B, C, or D

14.12 (2 points) W, X, and Y are three independent samples, each from Normal Distributions. Each sample is of size 13. The sample variance of W is 598. The sample variance of X is 1787. The sample variance of Y is 3560. You test the hypothesis σW² = σX² versus σW² < σX².
You also test the hypothesis σX² = σY² versus σX² < σY².
Which of the following is true?
A. Do not reject σW² = σX² at 5%, and do not reject σX² = σY² at 5%.
B. Reject σW² = σX² at 5% but not at 1%, and do not reject σX² = σY² at 5%.
C. Reject σW² = σX² at 1%, and reject σX² = σY² at 5% but not at 1%.
D. Reject σW² = σX² at 1%, and reject σX² = σY² at 1%.
E. None of A, B, C, or D

14.13 (2 points) Let X1, X2, . . . , X100 be a random sample obtained from a normal distribution with unknown mean µX and unknown variance σX² > 0. The sample variance of X is 722.
Let Y1, Y2, . . . , Y200 be a random sample obtained independently from a normal distribution with unknown mean µY and unknown variance σY² > 0. The sample variance of Y is 1083.
Test the null hypothesis H0: σX² = σY² versus the alternative hypothesis H1: σY² > σX².
For the F-Distribution with ν1 and ν2 degrees of freedom, F(x) = β[ν1/2, ν2/2; ν1x/(ν2 + ν1x)].
Determine the p-value of this test.
A. β[50, 100; .429]
B. 1 - β[50, 100; .429]
C. β[100, 50; .750]
D. 1 - β[100, 50; .750]
E. None of A, B, C, or D


14.14 (2 points) Let X1, X2, . . . , X11 be a random sample obtained from a normal distribution with unknown mean µX and unknown variance σX² > 0. The sample variance of X is 189.
Let Y1, Y2, . . . , Y7 be a random sample obtained independently from a normal distribution with unknown mean µY and unknown variance σY² > 0. The sample variance of Y is 37.
Test the null hypothesis H0: σX² = b σY² versus the alternative hypothesis H1: σX² > b σY².
At the 5% significance level, what is the largest value of b, such that one rejects H0?
A. 1.25 B. 1.30 C. 1.35 D. 1.40 E. 1.45

14.15 (3 points) You have two independent samples from LogNormal Distributions.The first sample is: 1000, 1500, 3000, 25,000, and 500,000.The second sample is: 500, 1000, and 2000.You test the hypothesis that the two LogNormal Distributions have the same σ parameter, versus the alternate that they do not. Which of the following is true?A. H0 is rejected at the 1% significance level.B. H0 is rejected at the 2% significance levelC. H0 is rejected at the 5% significance levelD. H0 is rejected at the 10% significance levelE. H0 is not rejected at the 10% significance level.

14.16 (2, 5/83, Q. 29) (1.5 points) Let X1, X2, . . . , X10 be a random sample obtained from a normal distribution with unknown mean µX and unknown variance σX² > 0.
Let Y1, Y2, . . . , Y6 be a random sample obtained independently from a normal distribution with known mean µY = 0 and unknown variance σY² > 0.
The statistic W = Σ(Xi - X̄)²/ΣYi² is to be used to test the null hypothesis H0: σX² = σY² versus the alternative hypothesis H1: σX² > σY².
If H0 is rejected when W > C, and the significance level (size) of the test is .05, then C must equal:
A. 4.10 B. 6.09 C. 6.15 D. 8.28 E. 8.59

14.17 (2, 5/85, Q. 30) (1.5 points) Let X1, . . . , X6 and Y1, . . . , Y8 be independent random samples from a normal distribution with mean 0 and variance 1.
Let Z = (4/3)(X1² + . . . + X6²)/(Y1² + . . . + Y8²).
What is the 99th percentile of the distribution of Z?
A. 6.37 B. 7.46 C. 8.10 D. 16.81 E. 20.09

14.18 (2, 5/88, Q. 13) (1.5 points) Let X1, . . . , X4 and Y1, . . . , Y4 be independent random samples from the same normal distribution with unknown mean and variance. For what value of k does k(X̄ - Ȳ)²/{Σ(Xi - X̄)² + Σ(Yi - Ȳ)²} have an F-distribution?
A. 3 B. 6 C. 8 D. 12 E. 16


14.19 (2, 5/90, Q. 20) (1.7 points) Let X, Y, and Z be independent normally distributed random variables with E(X) = 2, E(Y) = 1, E(Z) = 2, and common variance σ² > 0.
Let W = 4c(X - 2)²/{(Y - 1)² + (Z - 2)²}.
For what value of c will W have an F-distribution with 1 and 2 degrees of freedom?
A. 0.25 B. 0.50 C. 1 D. 2 E. 4

14.20 (2, 5/90, Q. 34) (1.7 points) Let X1, X2, . . . , X9 be a random sample from a normal distribution with mean 0 and variance 4, and let Y1, Y2, . . . , Y8 be an independent random sample from a normal distribution with mean 0 and variance 9.
P[(X1² + . . . + X9²)/(Y1² + . . . + Y8²) > c] = .010.
Determine the value of c.
A. 2.66 B. 2.96 C. 3.42 D. 5.91 E. 6.84
Note: This former exam question has been rewritten.

14.21 (2, 5/92, Q. 30) (1.7 points) Independent random samples of size 9 and 6 are taken from two normal populations with variances σ1² > 0 and σ2² > 0, respectively. Let S1² and S2² be the unbiased sample variances. The null hypothesis H0: 2σ1² = σ2² is to be tested against the alternative H1: 2σ1² > σ2² using the test statistic W = S1²/S2².
What is the critical value for a test of size .05?
A. 2.05 B. 2.41 C. 3.86 D. 4.82 E. 9.64

14.22 (2, 2/96, Q.14) (1.7 points) Let X1, . . . , X7 and Y1, . . . , Y14 be independent random samples from normal distributions with common mean µ = 30 and common variance σ² > 0.
The statistic W = 2(Ȳ - 30)²/(X̄ - 30)² has an F distribution with c and d degrees of freedom.
Determine c and d.
A. c = 1, d = 1 B. c = 6, d = 13 C. c = 7, d = 14
D. c = 13, d = 6 E. c = 14, d = 7

14.23 (IOA 101, 4/00, Q.4) (2.25 points) Consider the following three probability statements concerning an F variable with 6 and 12 degrees of freedom.(a) P(F6,12 > 0.250) = 0.95(b) P(F6,12 < 4.82) = 0.99(c) P(F6,12 < 0.130) = 0.01State, with reasons, whether each of these statements is true.


14.24 (IOA 101, 9/01, Q.13) (12 points) Twenty overweight executives take part in an experiment to compare the effectiveness of two exercise methods, A (isometric), and B (isotonic). They are allocated at random to the two methods, ten to isometric, ten to isotonic methods. After several weeks, the reductions in abdomen measurements are recorded in centimeters with the following results:
A (isometric method): 3.1 2.1 3.3 2.7 3.4 2.7 2.7 3.0 3.0 1.6
B (isotonic method): 4.5 4.1 2.7 2.2 4.7 2.2 3.6 3.0 3.3 3.4
(i) (6.75) (a) Plot the data for the two exercise methods on a single diagram.

Comment on whether the response values for each exercise method are well modeled by normal random variables.

(b) Perform a test to investigate whether the assumption of equal variability for the responses for the two exercise methods is reasonable.

(c) Perform a t-test to investigate whether these data support the claim that the isotonic method is more effective than the other method.

(ii) (5.25) (a) Determine a two-sided 95% confidence interval for the difference in the means for the two exercise methods.

(b) Assuming that the two sets of 10 measurements are taken from normal populations with the same variance, determine a 95% confidence interval for the common standard deviation, leaving equal probability in each tail.


Section 15, Testing the Slope, Two Variable Model

The t-statistic can be used to test the slope of a regression.

Testing Whether the Slope is Zero:

The most common null hypothesis is that β = 0. H0: β = 0, with alternative hypothesis H1: β ≠ 0.

If β = 0, then we expect ^β to be close to zero. However, due to random fluctuations in the

values of Y, even if the actual slope is zero, the estimated slope will be somewhat different than zero.

As before, βs is the standard deviation of the estimate of β. If H0 is true, we expect ^β/ βs to be close to zero. Let the test statistic be t = ^β/ βs. If the actual slope is zero, it is unlikely that t will have a large absolute value. Therefore, if the absolute value of t = ^β/ βs is sufficiently large, then we reject H0, and conclude that the slope is nonzero.

For the heights example, we had:

Parameter Table:
             Estimate     Standard Error    T-Stat     p-Value
1            24.0679      5.98553           4.02102    0.00695079
x            0.625436     0.100765          6.2069     0.000806761

t = ^β/ βs = .625436/.100765 = 6.207 = t-statistic for testing the hypothesis β = 0.

This t-statistic follows a t-distribution with N - 2 = 8 - 2 = 6 degrees of freedom.114

We compare the t-statistic to the t-table for 6 degrees of freedom.

Area in both tails (α):    0.1      0.05     0.02     0.01
ν = 6:                     1.943    2.447    3.143    3.707

Since 6.207 > 3.707 we reject H0 at the 1% significance level.

Exercise: If the t-statistic had instead been 3, what conclusion would have been drawn?[Solution: Since 2.447 < 3 < 3.143, we reject H0 at 5% and do not reject at 2%.]

114 Since we have assumed Normal errors, ^β is Normally Distributed with mean β. For a random sample from a Normal

variable, the sample mean divided by the sample standard deviation follows a t-distribution, with number of degrees of freedom equal to the sample size minus one. The analogous situation here is somewhat more complicated. We lose two degrees of freedom, since we are using the data to estimate both the slope and intercept.
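The same t-test can be reproduced in a few lines. A sketch assuming SciPy (the numbers are those of the heights example above; not part of the original Study Aid):

from scipy import stats

beta_hat, se_beta, N = 0.625436, 0.100765, 8
df = N - 2                                           # 6 degrees of freedom
t_stat = beta_hat / se_beta                          # 6.207
p_value = 2 * stats.t.sf(abs(t_stat), df)            # two-sided p-value: about 0.0008
print(t_stat, p_value)
print(abs(t_stat) > stats.t.ppf(1 - 0.01 / 2, df))   # True: reject H0: beta = 0 at the 1% level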


Exercise: If the t-statistic had instead been -2, what conclusion would have been drawn?[Solution: Since 1.943 < 2 < 2.447, we reject H0 at 10% and do not reject at 5%.]

Since the alternative hypothesis is H1: β ≠ 0, the t-test is two sided. We reject H0 if the t-statistic is unusually large or small. Compare the absolute value of the t-statistic to the critical values in the t-table, for the appropriate number of degrees of freedom. Reject to the left and do not reject to the right.

Most Common t-test for the 2-variable model:
1. H0: β = 0. H1: β ≠ 0.
2. t = ^β/ βs.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

In general, the p-value or probability value of a statistical test is: p-value = Prob[test statistic takes on a value equal to its calculated value or a value less in

agreement with H0 (in the direction of H1) | H0].

The p-value of this test is the sum of the area in both tails. From the t-table attached to the exam, since 6.207 > 3.707 we can determine that the p-value is less than 1%. Using a computer one can determine that the p-value is .00081. In other words, for a t-distribution with 6 degrees of freedom, 2S(6.207) = .00081.

Exercise: If the t-statistic had instead been 3, what would be the p-value of the test?[Solution: Since 2.447 < 3 < 3.143, the p-value would be between 2% and 5%.]

Assuming β = 0, for this example the probability of seeing a |t| ≥ 3 is the sum of the areas in the two tails of a t-distribution with 6 degrees of freedom, Prob[t ≤ -3] + Prob[t ≥ 3]:115

[Figure: the density of the t-distribution with 6 degrees of freedom, with the two tails below -3 and above 3 shaded.]

115 Using a computer, the area in each tail is 1.20%. Therefore, the p-value is 2.40%.


In general, if the p-value is less than the chosen significance level, then we reject H0. In the above exercise with t = 3, we would reject H0 at a significance level of 5%, but not reject H0 at a significance level of 2%.

S Notation:*

One can also put the t-statistic variance in the S.. notation discussed previously.

As discussed previously, ^β = SXY/SXX, and Var[^β] = {SYY/SXX - (SXY/SXX)²}/(N - 2).

Therefore, t = ^β/ βs = (SXY/SXX)√(N - 2) / √{SYY/SXX - (SXY/SXX)²} = SXY√(N - 2) / √{SXX SYY - SXY²}.

For the regression of heights example, SXX = 143.5, SYY = 64.875, and SXY = 89.75.

t = 89.75√(8 - 2) / √{(143.5)(64.875) - 89.75²} = 6.2069.
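A quick numerical check of this S.. form of the t-statistic (a sketch in Python; the values are those of the heights example):

from math import sqrt

S_XX, S_YY, S_XY, N = 143.5, 64.875, 89.75, 8
t = S_XY * sqrt(N - 2) / sqrt(S_XX * S_YY - S_XY ** 2)
print(t)    # 6.2069, matching the t-statistic from the regression output above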

More General Test:

One can also test the hypothesis that β takes on a certain nonzero value. For example in the

heights example, take H0 to be that β = 0.5. Then t = (^β - .5)/ βs = (.6254 - .5)/.10077 = 1.244.

Since 1.244 < 1.943, we do not reject this hypothesis at the 10% level.

General t-test, 2-variable model:
1. H0: a particular regression parameter takes on a certain value b. H1: H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

Exercise: In the heights example, test the hypothesis that the intercept is 40.

[Solution: t = ( α - 40)/sα = (24.07 - 40)/5.986 = -2.661. Since 2.447 < 2.661 < 3.143, we reject at 5% and do not reject at 2%.]

As will be discussed in a subsequent section, one can apply the t-test to individual parameters in the multiple-variable case in a similar manner, with the number of degrees of freedom = N - k.


Relationship to Confidence Intervals:

Testing the hypothesis that β takes on a particular value b is equivalent to testing whether that value b is in the appropriate confidence interval for β, centered at ^β.

Assume one has fit a two-variable regression to 30 observations, with ^β = -0.64, and βs = 0.18. Then for 30 - 2 = 28 degrees of freedom, the critical values of the t-distribution are:

Area in both tails (α):    0.10     0.05     0.02     0.01
ν = 28:                    1.701    2.048    2.467    2.763

Therefore, we can get the following confidence intervals for the slope:90% confidence interval: -.64 ± (1.701)(.18) = -.64 ± .31 = [-0.95, -0.33].95% confidence interval: -.64 ± (2.048)(.18) = -.64 ± .37 = [-1.01, -0.27].98% confidence interval: -.64 ± (2.467)(.18) = -.64 ± .44 = [-1.08, -0.20]. 99% confidence interval: -.64 ± (2.763)(.18) = -.64 ± .49 = [-1.13, -0.15].

[Figure: the 90%, 95%, 98%, and 99% confidence intervals for β, shown as nested intervals on a number line running from -1.2 to 0.]

Zero is not in the 99% confidence interval for β. Therefore, there is less than a 1 - 99% = 1%

probability that β has a value at least as far (on either side) from ^β as 0. Therefore, if H0 is the

hypothesis that β = 0, then we can reject H0 at 1%.

On the other hand, -.25 is in the 98% confidence interval but not in the 95% confidence interval. Therefore, for the hypothesis that β = -.25, we reject at 5% but do not reject at 2%.

In general, if b is not within the P confidence interval for β, then reject at significance level 1- P the hypothesis that β = b.
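These intervals, and the equivalence with the t-test, can be reproduced with a short sketch (assuming SciPy; the ^β = -0.64, βs = 0.18, N = 30 values are those of the example above; not part of the original Study Aid):

from scipy import stats

beta_hat, se, N = -0.64, 0.18, 30
df = N - 2
for level in [0.90, 0.95, 0.98, 0.99]:
    c = stats.t.ppf(1 - (1 - level) / 2, df)           # two-sided critical value of the t-distribution
    print(level, beta_hat - c * se, beta_hat + c * se)
# A hypothesized value b is rejected at significance level 1 - P exactly when
# b falls outside the P confidence interval, as stated above.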


Exercise: In the heights example, construct 95% and 98% confidence intervals for the intercept.

[Solution: α = 24.07. sα = 5.986. The critical value for 5% and 6 degrees of freedom is 2.447. Therefore a 95% confidence interval for the intercept is: 24.07 ± (2.447)(5.986) = 24.07 ± 14.65 = [9.42, 38.72].Similarly, a 98% confidence interval for the intercept is: 24.07 ± (3.143)(5.986) = 24.07 ± 18.81 = [5.26, 42.88]. ]

The value 40 is not in the 95% confidence interval; therefore, we reject at 5% the hypothesis that the intercept is 40. On the other hand, the value 40 is in the 98% confidence interval; therefore, we do not reject at 2% the hypothesis that the intercept is 40. This matches the result obtained previously using the t-statistic; the two methods are equivalent.

F-Test:

As will be discussed extensively for the multiple-variable case, the F-Test can also be used to test the slopes of a regression. However, applying the F-Test to a single slope is equivalent to the t-test with t = √F.

As discussed subsequently for the multiple-variable case, one form of the F-Statistic = {(R2UR - R2R)/q} / {(1 - R2UR)/(N - k)}, with q and N - k degrees of freedom.

In the two-variable model for testing the hypothesis β = 0: k = 2, q = dimension of the restriction = 1,
R2R = restricted R2 = percent of variation explained by using just an intercept = 0,
R2UR = unrestricted R2 = R2 of the two variable model.

For the 2-variable model, F = (N - 2)R2/(1 - R2), with 1 and N - 2 degrees of freedom.

Exercise: You fit the following model to 12 observations: Y = α + βX + ε. R2 = 0.80.Calculate the value of the F statistic used to test for a linear relationship.[Solution: F = (N - 2)R2/(1 - R2) = (12 - 2)(.8)/(1 - .8) = 40.]

Consulting the F-Table for 1 and 12 - 2 = 10 degrees of freedom, since 10.04 < 40, we reject at 1% the hypothesis that β = 0.

Alternately, t = √F = √40 = 6.325. Consulting the t-table for 10 degrees of freedom,since 3.169 < 6.325, we reject at 1% the hypothesis that β = 0.116

116 Note that 3.169, the critical value for the t-test at 1%, equals √10.04, the critical value for the F-test at 1%. See the section on the F-Distribution, for a discussion of its relationship to the t-distribution when ν1 = 1.
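A sketch of this equivalence for the R2 = 0.80, N = 12 exercise (assuming SciPy; not part of the original Study Aid):

from math import sqrt
from scipy import stats

R2, N = 0.80, 12
F = (N - 2) * R2 / (1 - R2)              # 40, with 1 and N - 2 = 10 degrees of freedom
t = sqrt(F)                              # 6.325
print(F, t)
print(stats.f.ppf(0.99, 1, N - 2))       # 10.04, the 1% critical value for the F-test
print(stats.t.ppf(1 - 0.01 / 2, N - 2))  # 3.169 = sqrt(10.04), the two-sided 1% critical value for the t-test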


Exercise: At the 5% significance level, for 15 observations, for the 2-variable regression model, determine for which values of R2 you would reject the hypothesis that β = 0.
[Solution: For 1 and 15 - 2 = 13 degrees of freedom, the critical value for 5% is 4.67.
F = 13R2/(1 - R2). Thus we reject if 13R2/(1 - R2) > 4.67. ⇔ R2 > .264.]

As discussed subsequently for the multiple-variable case, another equivalent form of the F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)}, with q and N - k degrees of freedom.

In the two-variable model for testing the hypothesis β = 0: k = 2, q = dimension of the restriction = 1,
ESSR = error sum of squares if using just an intercept = TSS,
ESSUR = unrestricted ESS = ESS of the two variable model.
Therefore, F-Statistic = {(TSS - ESS)/1} / {ESS/(N - 2)} = (N - 2)RSS/ESS.

For the 2-variable model, F = (N - 2)RSS/ESS, with 1 and N - 2 degrees of freedom.

Exercise: You fit the following model to 12 observations: Y = α + βX + ε.You determine that ESS = 200 and RSS = 800.Calculate the value of the F statistic used to test for a linear relationship.[Solution: F = (N - 2)RSS/ESS = (12 - 2)(800)/(200) = 40.Comment: R2 = RSS/TSS = 800/(800 + 200) = .80, matching a previous exercise.]


Problems:

15 .1 (1 point) You fit the 2-variable linear regression model, Y = α + βX + ε,

to 20 observations. ^β = 13. βs = 7. Test the hypothesis H0: β = 0 versus H1: β ≠ 0.

A. Reject H0 at 1%.B. Do not reject H0 at 1%. Reject H0 at 2%.C. Do not reject H0 at 2%. Reject H0 at 5%.D. Do not reject H0 at 5%. Reject H0 at 10%.E. Do not reject H0 at 10%.

Use the following information for the next 3 questions:
You fit the following model to 15 observations: Y = α + βX + ε.
Σ(Xi - X̄)² = 24.88.
Σ(Xi - X̄)(Yi - Ȳ) = 1942.1.
Σ(Yi - Ŷi)² = 282,750.
Σ(Yi - Ȳ)² = 434,348.

15.2 (1 point) What is ^β?

(A) 70 (B) 72 (C) 74 (D) 76 (E) 78

15.3 (2 points) Let H0 be the hypothesis that β = 0. Which of the following is true?A. Reject H0 at 1%.B. Do not reject H0 at 1%. Reject H0 at 2%.C. Do not reject H0 at 2%. Reject H0 at 5%.D. Do not reject H0 at 5%. Reject H0 at 10%.E. Do not reject H0 at 10%.

15.4 (1 point) Determine the upper end of a symmetric 95% confidence interval for β.(A) 142 (B) 143 (C) 144 (D) 145 (E) 146

15 .5 (2 points) You fit the 2-variable linear regression model, Y = α + βX + ε, to 27 observations.The total sum of squares (TSS) is 44 and the regression sum of squares (RSS) is 9. Test the hypothesis H0: β = 0 versus H1: β ≠ 0.A. Reject H0 at 1%.B. Do not reject H0 at 1%. Reject H0 at 2%.C. Do not reject H0 at 2%. Reject H0 at 5%.D. Do not reject H0 at 5%. Reject H0 at 10%.E. Do not reject H0 at 10%.


Use the following information for the next 8 questions:
Year (t)    Loss Ratio (Y)
1           82
2           78
3           80
4           73
5           77
You fit the following model: Y = α + βt + ε.

15.6 (3 points) What is the estimated Loss Ratio for year 7?(A) 71 (B) 72 (C) 73 (D) 74 (E) 75

15.7 (2 points) What is the variance of α?(A) Less than 7.5(B) At least 7.5, but less than 8.0(C) At least 8.0, but less than 8.5(D) At least 8.5, but less than 9.0(E) At least 9.0

15.8 (2 points) Determine the absolute value of the t-statistic in order to test whether β = 0.(A) Less than 1.5(B) At least 1.5, but less than 2.0(C) At least 2.0, but less than 2.5(D) At least 2.5, but less than 3.0(E) At least 3.0

15.9 (1 point) What is the p-value for the t-test of the hypothesis that β = 0?(A) Less than 1%(B) At least 1%, but less than 2%(C) At least 2%, but less than 5%(D) At least 5%, but less than 10%(E) At least 10%

15.10 (2 points) What is the covariance of α and ^β?

(A) -2.5 (B) -2.0 (C) -1.5 (D) -1.0 (E) -0.5

15.11 (1 point) What is the correlation of α and ^β?

(A) -0.9 (B) -0.8 (C) -0.7 (D) -0.6 (E) -0.5

* 15.12 (2 points) Using the delta method, estimate the standard deviation of the forecast of the expected value for year 7.(A) 3.0 (B) 3.2 (C) 3.4 (D) 3.6 (E) 3.8

* 15.13 (2 points) As t → ∞, what is the limit of the coefficient of variation of the forecast of the expected value for year t?(A) -0.8 (B) -0.7 (C) -0.6 (D) -0.5 (E) -0.4


15.14 (1 point) You fit a linear regression, Y = α + βX + ε, with X in feet.

If instead X were put in inches, with 12 inches per foot, what would be the effect on ^β and the

t-statistic to test whether β is zero?

15.15 (2 points) You fit the 2-variable linear regression model, Y = α + βX + ε, to 15 observations. R2 = .45. Test the hypothesis H0: β = 0 versus H1: β ≠ 0.A. Reject H0 at 1%.B. Do not reject H0 at 1%. Reject H0 at 2%.C. Do not reject H0 at 2%. Reject H0 at 5%.D. Do not reject H0 at 5%. Reject H0 at 10%.E. Do not reject H0 at 10%.

15.16 (1 point) You are given the following confidence intervals for the intercept parameter of a regression:90% confidence interval: [110, 150].95% confidence interval: [104, 156].98% confidence interval: [96, 164]. 99% confidence interval: [90, 170].Let H0 be the hypothesis that the intercept parameter is 100.Which of the following is true?A. Reject H0 at 1%.B. Do not reject H0 at 1%. Reject H0 at 2%.C. Do not reject H0 at 2%. Reject H0 at 5%.D. Do not reject H0 at 5%. Reject H0 at 10%.E. Do not reject H0 at 10%.

15.17 (1 point) You fit the 2-variable linear regression model, Y = α + βX + ε,

to 300 observations. ^β = -2.74. βs = 1.30. Test the hypothesis H0: β = 0 versus H1: β < 0.

A. Reject H0 at 1/2%.B. Do not reject H0 at 1/2%. Reject H0 at 1%.C. Do not reject H0 at 1%. Reject H0 at 2.5%.D. Do not reject H0 at 2.5%. Reject H0 at 5%.E. Do not reject H0 at 5%.

15.18 (1 point) You fit the following model: Y = α + βX + ε.

^β = -1.27. βs = 0.57.

Calculate the value of the F statistic used to test for a linear relationship.(A) 3.5 (B) 4.0 (C) 4.5 (D) 5.0 (E) 5.5


15.19 (1 point) You fit a linear regression, Y = α + βX + ε, with Y in euros. If instead Y were put in dollars, with 1.3 dollars per euro, what would be the effect on ^β and the t-statistic to test whether β is zero?

15.20 (2 points) You fit the following model to 15 observations: Y = α + βX + ε.

You determine that the corrected R2, R̄2, equals 0.72.
Calculate the value of the F statistic used to test for a linear relationship.
(A) 29 (B) 31 (C) 33 (D) 35 (E) 37

15.21. You fit a linear regression to 25 observations via least squares: Y = α + βX + ε.

Let Ŷi be the fitted values.
Σ(Xi - X̄)² = 42.65.
Σ(Xi - X̄)(Yi - Ȳ) = 302.1.
Σ(Yi - Ŷi)² = 7502.

Let H0 be the hypothesis that β = 0. Which of the following is true?A. Reject H0 at 1%.B. Do not reject H0 at 1%. Reject H0 at 2%.C. Do not reject H0 at 2%. Reject H0 at 5%.D. Do not reject H0 at 5%. Reject H0 at 10%.E. Do not reject H0 at 10%.

15.22 (Course 120 Sample Exam #1, Q.2) (2 points) You fit the simple linear regression

model to 47 observations and determine ^Y = 1.0 + 1.2X. The total sum of squares (TSS) is 54

and the regression sum of squares (RSS) is 7. Determine the value of the t statistic for testing H0: β = 0 versus H1: β ≠ 0.(A) 0.4 (B) 1.2 (C) 2.2 (D) 2.6 (E) 6.7

15.23 (Course 120 Sample Exam #3, Q.3) (2 points) You fit a simple linear regression to seven observations. You determine: ESS = 218.680, and F = 2.088. Calculate R2. (A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7

15.24 (4, 5/00, Q.1) (2.5 points) You fit the following model to 20 observations:Y = α + βX + εYou determine that R2 = 0.64.Calculate the value of the F statistic used to test for a linear relationship.(A) Less than 30(B) At least 30, but less than 33(C) At least 33, but less than 36(D) At least 36, but less than 39(E) At least 39


15.25 (IOA, 4/03, Q.12) (13.5 points) The following data give the invoiced amounts for work carried out on 12 jobs performed by a plumber in private customers’ houses. The durations of the jobs are also given.
duration x (hours):   1   1   2   3    4    4    5    6    7    8    9   10
amount y (£):        45  65  80  95  100  125  145  180  180  210  330  240
Σ xi = 60, Σ xi² = 402, Σ yi = 1795, Σ yi² = 343,725, Σ xiyi = 11,570.
The plumber claims to calculate his total charge for each job on the basis of a fixed charge for showing up plus an hourly rate for the time spent working on the job.
(i) (3.75 points) (a) Draw a scatterplot of the data on graph paper and comment briefly on your plot.
(b) The equation of the fitted regression line of y on x is y = 22.4 + 25.4x, and the coefficient of determination is R2 = 87.8% (you are not asked to verify these results). Draw the fitted line on your scatterplot.
(ii) (9.75 points) (a) Calculate the fitted regression line of invoiced amount on duration of job using only the 11 pairs of values remaining after excluding the invoice for which x = 9 and y = 330.
(b) Calculate the coefficient of determination of the fit in (ii)(a) above.
(c) Add the second fitted line to your scatterplot, distinguishing it clearly from the first line you added (in part (i)(b) above).
(d) Comment on the effect of omitting the invoice for which x = 9 and y = 330.
(e) Carry out a test to establish whether or not the slope in the model fitted in (ii)(a) above is consistent with a rate of £25 per hour for work performed.


15.26 (IOA 101, 4/04, Q.14) (15.75 points) Forensic scientists use various methods for determining the likely time of death from post-mortem examination of human bodies. A recently suggested objective method uses the concentration of a compound (3-methoxytyramine or 3-MT) in a particular part of the brain. In a study of the relationship between post-mortem interval and the concentration of 3-MT, samples of the appropriate part of the brain were taken from coroners’ cases for which the time of death had been determined from eye-witness accounts. The intervals (x; in hours) and concentrations (y; in parts per million) for 18 individuals who were found to have died from organic heart disease are given in the following table. For the last two individuals (numbered 17 and 18 in the table), there was no eye-witness testimony directly available, and the time of death was established on the basis of other evidence including knowledge of the individuals’ activities.

Observation number    Interval (x)    Concentration (y)
1                      5.5            3.26
2                      6.0            2.67
3                      6.5            2.82
4                      7.0            2.80
5                      8.0            3.29
6                     12.0            2.28
7                     12.0            2.34
8                     14.0            2.18
9                     15.0            1.97
10                    15.5            2.56
11                    17.5            2.09
12                    17.5            2.69
13                    20.0            2.56
14                    21.0            3.17
15                    25.5            2.18
16                    26.0            1.94
17                    48.0            1.57
18                    60.0            0.61

Σ x = 337, Σ x² = 9854.5, Σ y = 42.98, Σ y² = 109.7936, Σ xy = 672.8.
In this investigation you are required to explore the relationship between concentration (regarded as the response/dependent variable) and interval (regarded as the explanatory/independent variable).
(i) (3.75 points) Construct a scatterplot of the data. Comment on any interesting features of the data and discuss briefly whether linear regression is appropriate to model the relationship between concentration of 3-MT and the interval from death.
*(ii) (3.75 points) Calculate the correlation coefficient for the data, and use it to test the null hypothesis that the population correlation coefficient is equal to zero.
(iii) (3.75 points) Calculate the equation of the least-squares fitted regression line, and use it to estimate the concentrations of 3-MT: (a) after 1 day and (b) after 2 days.
Comment briefly on the reliability of these estimates.
(iv) (4.5 points) Calculate a 99% confidence interval for the slope of the regression line. Using this confidence interval, test the hypothesis that the slope of the regression line is equal to zero. Comment on your answer in relation to the answer given in part (ii) above.


Section 16, Hypothesis Testing117

You should know how to apply hypothesis testing to the coefficients of regression models, using the t-distribution or F-Distribution.118 It is also a good idea to know some of the general terminology.

Testing a Slope, an Example:

The previously discussed application of the t-statistic to test a slope of a regression is an example of Hypothesis Testing. For the example involving the heights of fathers and sons, as

discussed previously, t = ^β/ βs = .625436/.100765 = 6.207.

The steps of hypothesis testing are:

1. Choose a level of significance. (Example: level of significance = 1%.)

2. Formulate the statistical model. (Example: the Normal Linear Regression Model holds.)

3. Specify the null hypothesis H0 and the alternative hypothesis H1. (Example: H0: β = 0; H1: β ≠ 0.)

4. Select a test statistic whose behavior is known. (Example: the t-statistic computed above follows a t-distribution with 6 degrees of freedom.119)

5. Find the appropriate critical region. (Example: the critical region or rejection region is |t| ≥ 3.707.120)

6. Compute the test statistic on the assumption that H0 is true. (Example: the test statistic is t = 6.207.)

7. Draw conclusions. If the test statistic lies in the critical region, then reject the null hypothesis. (Example: the test statistic is in the critical region, since 6.207 ≥ 3.707, so reject H0 at 1%.)

117 This material is on the syllabus of CAS Exam 3 and Joint Exam 4/C. See Section 2.5 of Pindyck and Rubinfeld. See also Probability and Statistical Inference by Hogg and Tanis, Introduction to Mathematical Statistics by Hogg, McKean and Craig, or Section 9.4 of Loss Models. 118 As will be discussed subsequently, in a multiple regression model, one can use the F-Distribution in order to test the null hypothesis that all of the slopes are zero119 N - k = 8 - 2 = 6 degrees of freedom.120 Consulting the t table for 6 d.f. and a total of 1% area in both tails.

HCMSA-F06-Reg-D, Mahler’s Guide to Regression, 7/11/06, Page 148

Page 153: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

Null Hypothesis:

In general, in hypothesis testing one tests the null hypothesis H0 versus an alternative hypothesis H1. It is important which hypothesis is H0 and which is H1.121

In the example above, the null hypothesis was that β = 0. A large absolute value of the t-statistic means it is unlikely H0 is true and therefore we would reject H0.122

Note that hypothesis tests are set up to disprove something, H0, rather than prove

something. In the above example, the test is set up to disprove that β = 0.

For example, a dry sidewalk is evidence it did not rain. On the other hand a wet sidewalk might be caused by rain or something else such as a sprinkler system. A wet sidewalk can not prove that it rained, but a dry sidewalk is evidence that it did not rain.

Similarly, a large absolute value of the t-statistic is evidence that the data was not drawn from the given distribution, and may lead one to reject the null hypothesis. On the other hand, small absolute values of the t-statistic result in one not rejecting the null hypothesis; a small absolute value of the t-statistic does not prove the null hypothesis is true. If for example β = 0.1, a small absolute value of the t-statistic may result due to random fluctuations, particularly if we have only a small number of observations.123

We do not reject H0 unless there is sufficient evidence to do so. This is similar to the legal concept of innocent (not guilty) until proven guilty. A trial does not prove one innocent.

Technically, one should not use the term “accept H0”. Nevertheless, it is common for actuaries, including perhaps some members of the exam committee, to use the terms “do not reject H0” and “accept H0” synonymously. For many actuaries in common usage: do not reject ⇔ accept.

Test Statistic:

A hypothesis test needs a test statistic whose distribution is known. In the above example, the test statistic was the t-statistic, and one consults the t-table. In other statistical tests, one would use the F-Table, Normal Table, Chi-Square Table, etc.

Critical Values:

The critical values are the values used to decide whether to reject H0. For example, in the above test, the critical value (for 1% and 6 degrees of freedom) was 3.707. We reject H0 if |t| ≥ 3.707.

121 If the universe of possibility is divided in a manner that includes a boundary, the null hypothesis must include the boundary. 122 There are many other hypothesis tests, such as the t-test of means from Normal Distributions, the Chi-Square Goodness of Fit Test, the Likelihood Ratio Test, the Kolmogorov-Smirnov Test, etc.123 See the next section for a simulation experiment, illustrating this point.


The critical value(s) form the boundary (other than ±∞) of the rejection or critical region.

critical region ⇔⇔⇔⇔ if test statistic is in this region then we reject H0.

Significance Level:

The significance level, α, of the test is a probability level selected prior to performing the test. In the above example, 1% was selected. Using the t table attached to the exam, one can perform tests at significance levels of 10%, 5%, 2%, and 1%. For example, a significance level of 5% uses the column listed as a total of 5% area in both tails.

If Prob[test statistic will take on a value at least as unusual as the computed value | H0 is true] is less than the significance level chosen, then we reject the H0. If not, we do not reject H0.

The result of any hypothesis test depends on the significance level chosen. Therefore, in practical applications the choice of the significance level is usually important.

Exercise: A linear regression has been fit to 12 observations. We test H0: β = 0. t = 2.6.What conclusions do you draw at different significance levels?[Solution: There are 12 - 2 = 10 degrees of freedom. The critical values for 10%, 5%, 2%, and 1%, shown in the t table for 10 degrees of freedom are: 1.812, 2.228, 2.764, and 3.169. Since 2.6 > 1.812, reject H0 at 10%. Since 2.6 > 2.228, reject H0 at 5%. Since 2.6 < 2.764, do not reject H0 at 2%. Since 2.6 < 3.169, do not reject H0 at 1%.]

The results of this exercise would usually be reported as: reject H0 at 5%, do not reject at 2%. Since we reject at 5%, we also automatically reject at 10%. Since we do not reject at 2%, we also automatically do not reject at 1%.

Types of Errors:*

There are two important types of errors that can result when performing hypothesis testing:124

Type I Error Reject H0 when it is true.Type II Error Do not reject H0 when it is false.

Exercise: A linear regression has been fit to 15 observations.

We are testing H0: β = 0, by computing the t-statistic, t = ^β/ βs .

We will reject H0 when |t| ≥ 2.650.If we reject, what is the probability of making a Type I error?

124 We are assuming you set everything up correctly. These errors are due to the random fluctuations present in all data sets and the incomplete knowledge of the underlying risk process which led one to perform a hypothesis test in the first place.

HCMSA-F06-Reg-D, Mahler’s Guide to Regression, 7/11/06, Page 150

Page 155: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

[Solution: If H0 is true, then this t-statistic follows a t-distribution with 15 - 2 = 13 degrees of freedom. Consulting the t-table, if H0 is true, then there is a 2% chance that |t| ≥ 2.650, due to random fluctuation in the limited sample represented by the observed data.In other words, the significance level of this test is 2%.We reject when |t| ≥ 2.650, for example, if t = -2.9. Prob[|t| ≥ 2.9] < 2%. The probability of making a Type I error is 2% or less.]

In general, rejecting H0 at a significance level of α, means the probability of a Type I error is at

most α.

p-value:

The p-value = Prob[test statistic takes on a value less in agreement with H0 than its calculated value].

If the p-value is less than the chosen significance level, then we reject H0.

Exercise: A linear regression has been fit to 7 observations.

We are testing H0: β = 0, by computing the t-statistic, t = ^β/ βs .

If t = -3.6, What is the p-value?[Solution: There are 7 - 2 = 5 degrees of freedom. Since the critical value for 2% is 3.365 and the critical value for 1% is 4.032, and 3.365 < 3.6 < 4.032, the p-value is between 1% and 2%. Reject H0 at 2%, do not reject at 1%.Comment: Using a computer, the p-value is 1.55%.]
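The exact p-value in this exercise can be computed directly (a sketch assuming SciPy; not part of the original Study Aid):

from scipy import stats

t_stat, df = -3.6, 5
p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
print(p_value)                              # about 0.0155, between 1% and 2%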

Power of a Test:*

The power of a test is the probability of rejecting the null hypothesis, when H1 is true.

Prob[Type II error] = 1 - Power of the test = probability of failing to reject H0 when it is false.Thus, everything else equal, large power of a test is good.

Decision               H0 true                      H0 False
Reject H0              Type I Error ⇔ p-value       Correct ⇔ Power
Do not reject H0       Correct ⇔ 1 - p-value        Type II Error ⇔ 1 - Power

In general, there is a trade-off between Type I and Type II errors. Making the probability of one type of error smaller, usually makes the probability of the other type of error larger.

The larger the data set, the easier it is to reject H0 when it is false. The larger the data set, the more powerful a given test, all else being equal.
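The power of the t-test for the slope, and hence the probability of a Type II error, can be estimated by simulation. The rough sketch below is not from the original Study Aid; it assumes NumPy and SciPy, and the true slope of 0.1, error standard deviation of 1, intercept of 3, and choice of X values are arbitrary illustrative assumptions.

import numpy as np
from scipy import stats

def estimated_power(N, beta=0.1, sigma=1.0, alpha=0.05, trials=2000, seed=1):
    # Simulate the two-variable model Y = 3 + beta*X + error, and record how often the
    # two-sided t-test rejects H0: beta = 0 at significance level alpha.
    rng = np.random.default_rng(seed)
    x = np.arange(N, dtype=float)
    critical = stats.t.ppf(1 - alpha / 2, N - 2)
    rejections = 0
    for _ in range(trials):
        y = 3.0 + beta * x + rng.normal(0.0, sigma, N)
        fit = stats.linregress(x, y)
        if abs(fit.slope / fit.stderr) > critical:
            rejections += 1
    return rejections / trials

# Larger samples make the test more powerful, i.e. reduce the probability of a Type II error:
for N in [10, 30, 100]:
    print(N, estimated_power(N))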


A Ratemaking Example:*

The probabilities of the these two types of errors are important in some work by actuaries.

For example, using the information from credit reports, an insurer can calculate a “credit score” for an individual.125 Let us assume that Allen the actuary fit a regression model to data for the insureds for his insurer and found that for a personal line of insurance, such as automobile or homeowners insurance, a higher (better) credit score was associated with lower expected total insurance claim payments.126 127 The actuary computes the p-value for a test of whether the slope associated with the credit score is zero, versus negative.

How small should this p-value be, before the insurer uses credit scores for pricing? There is no single right answer.

If Allen were the first actuary to do such a test, then he probably would want a small p-value, such as for example 10%, before recommending the introduction of a new rating variable such as credit scores. The slope would also have had to be large in absolute value, in other words credit scores would have to have a large effect on expected insurance costs, before the insurer would bother using credit scores as a rating variable.128

In some states, the use of credit scores to price insurance might be controversial. For the first insurer to propose the use of credit scores for pricing, an insurance regulator in such a state might require a very small p-value such as 1%, before approving the use of credit scores.129 If the p-value were 10%, then even if a lower credit score is not associated with a higher expected cost, after taking into account the other rating variables, there is a 10% chance that one would see a statistic at least as large as that gotten by Allen.

If many other actuaries and many other insurers had gotten similar results, then the p-value for Allen’s test may be irrelevant. More simply, from an underwriting standpoint, if all of your competitors are using credit scores in pricing, then the p-value for Allen’s test is irrelevant.

125 Among the many items from credit reports that may be used to calculate a credit score for an individual are: late payments, bad debts, and financial leverage. See “A View Inside the Black Box: A Review and Analysis of Personal Lines Insurance Credit Scoring Models Filed in the State of Virginia,” by Cheng-sheng Peter Wu and John R. Lucker, Winter 2004 CAS Forum.
126 Other variables were included in the model that also affect expected total claim payments.
127 A Generalized Linear Model, to be discussed in a subsequent section, might have been used instead of a linear regression model.
128 There are many criteria for the use of a rating variable. See “Risk Classification,” by Robert J. Finger, Foundations of Casualty Actuarial Science.
129 Some insurance regulators are not swayed by facts, but some are. Some regulators would actually have their staffs carefully review the results of an actuarial study, and the result of that review would affect the regulator's decision.


Another Ratemaking Example:*

Workers Compensation has different classes, which are charged different amounts for insurance. The most important part of ratemaking is to estimate the expected pure premium, in other words the expected dollars of loss per exposure insured.130

Let us assume, that the pure premium for Wire Goods Manufacturing indicated by the most recent 5 years of data is 115% of the average for all Manufacturing classes.

Let H0: the expected pure premium for the Wire Goods Manufacturing class is the same as that for the average of all Manufacturing classes.

If one charges the Wire Goods Manufacturing class more than the average rate for all Manufacturing Classes, when the expected cost for this class is not higher than average, then one is making a Type I error.

One might be able to perform some sort of a simulation experiment in order to estimate the p-value.131 Let us assume this estimated p-value is 25%.

Most statisticians would not reject H0, when the p-value is as large as 25%. However, an actuary is not just worried about the probability of a Type I error, in this case 25%, he is also worried about the probability of a Type II error.

If one charges the Wire Goods Manufacturing class the average rate for all Manufacturing classes, when the expected cost for this class is higher than average, then one is making a Type II error.

Let H1: the expected pure premium for the Wire Goods Manufacturing class is 115% of the average pure premium of all Manufacturing classes.

One might estimate the probability of a Type II error via a simulation experiment. The probability of making a Type II error might also be 25%.

It seems that one will have a 25% probability of making an error, no matter which of the two choices one makes. However, rather than use either the average pure premium or 115% of the average pure premium for all Manufacturing classes, actuaries would use a value somewhere in between, via the use of Credibility.132

Also, the magnitude of the difference in indicated pure premium and therefore indicated price is very important. An actuary would have been unconcerned by the practical implications of an indicated pure premium only 1% higher than average, but would have been very concerned if the indicated pure premium had been either twice or half of average.

130 Regression is not used to estimate these pure premiums.
131 An example of a simulation experiment is in a subsequent section.
132 See for example, “Credibility” by Howard C. Mahler and Curtis Gary Dean in Foundations of Casualty Actuarial Science, and “Workers Compensation Classification Credibilities,” by Howard C. Mahler in the Fall 1999 CAS Forum.


Problems:

16.1 (1 point) Which of the following statements about hypothesis testing is false?
A. The p-value is the probability, given H0 is true, that the test statistic takes on a value equal to its calculated value or a value less in agreement with H0 (in the direction of H1).
B. When testing whether β = 0, if the t-statistic is 3, for 7 degrees of freedom the p-value is less than 2%.
C. If the p-value is less than the chosen significance level, then we reject H0.
D. The p-value is the chance of a Type II error.
E. None of the above statements is false.

Use the following information for the next three questions:
One has fit a regression model with 2 variables (1 independent variable plus the intercept).
One is testing the hypothesis H0: β = 0, versus the alternative hypothesis H1: β ≠ 0.

16.2 (1 point) With 15 observations, what is the critical region for a test at a 5% significance level?

16.3 (1 point) With 30 observations, what is the critical region for a test at a 5% significance level?

16.4 (1 point) Compare the probability of a Type II error for the tests in the two previous questions, all else being equal.

16.5 (3 points) Captain James T. Kirk, interstellar explorer, has discovered the new planet of Slubovia. Captain Kirk believes that the heights of adult males of humanoid species are Normally Distributed with an average of 175 centimeters. Captain Kirk beams down to the planet, to visit the capital city and talk to the native Slubs. The first 5 Slubs Kirk observes have heights of: 150, 153, 160, 171, and 176 centimeters.
ΣXi = 810, ΣXi² = 131,726.
Perform a statistical test of H0: µ = 175 versus H1: µ ≠ 175.
Science officer Spock points out to Kirk a number of possible problems with this test.
Briefly discuss some of them.

16.6 (4, 5/87, Q.50) (1 point) Which of the following are true regarding hypothesis tests?
1. The test statistic has a probability of α of falling in the critical region when H0 is true, where α is the level of significance.
2. One should reject the H0 when the test statistic falls outside of the critical region.
3. The fact that the test criteria is not significant proves that the null hypothesis is true.
A. 1   B. 2   C. 3   D. 1, 2   E. 1, 3


16.7 (CAS3, 5/05, Q.24) (2.5 points) Which of the following statements about hypothesis testing are true?
1. A Type I error occurs if H0 is rejected when it is true.
2. A Type II error occurs if H0 is rejected when it is true.
3. Type I errors are always worse than Type II errors.
A. 1 only   B. 2 only   C. 3 only   D. 1 and 3 only   E. 2 and 3 only


Section 17, A Simulation Experiment:*

By reviewing a simulation of the situation in which we test the slope of a linear regression, some people will get a better understanding of the hypothesis testing concepts discussed previously.

Assume the model Yi = 50 + βXi + εi, where the εi are independent Normals with µ = 0 and σ = 3.133

Take for example, X = (0, 1, 5, 10, 25, 100).

We will use the t-statistic to test H0: β = 0.

When the Null Hypothesis is True:

Take β = 0. Simulate a random set of εi and corresponding Yi.134

Fit a linear regression with intercept, and record t = ^β / s_^β.

For example, let the first simulated set of errors be: (1.22272, -0.936812, -2.95579, 1.23242, 1.3919, 5.60513).
Then the corresponding set of Yi is: (51.2227, 49.0632, 47.0442, 51.2324, 51.3919, 55.6051).

Exercise: What is the fitted regression, and t = ^β / s_^β?

[Solution: ^α = 49.4691, ^β = 0.0620192, s_^β = 0.0202704, and t = 0.0620192/0.0202704 = 3.05959.
Comment: An unusually large value of t. The corresponding p-value is only 3.8%.]

We repeat this process 10,000 times, recording in each case the value of the t-statistic.135 Since the null hypothesis is true (the actual slope is zero), most of the values of t are of small absolute value. However, there are some unusual values of t.136

The ten smallest values of t are: -17.6622, -13.2387, -11.9001, -11.6023, -10.9685, -10.6137, -10.4323, -8.979, -8.9156, -8.03695.

The ten largest values of t are: 6.87149, 7.40265, 7.61979, 7.64125, 7.81643, 8.08546, 8.29908, 8.60589, 9.45159, 12.7397.
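Here is a minimal sketch of this simulation in Python with numpy (an assumption; any language with a Normal random number generator would do). It repeats the steps above: simulate the errors, fit the two-variable regression, and record the t-statistic of the fitted slope.

    import numpy as np

    rng = np.random.default_rng(seed=1)          # seed chosen arbitrarily
    X = np.array([0., 1., 5., 10., 25., 100.])
    N, k = len(X), 2
    n_sims, beta_true, sigma = 10000, 0.0, 3.0   # null hypothesis is true: beta = 0

    t_stats = np.empty(n_sims)
    for i in range(n_sims):
        Y = 50 + beta_true * X + rng.normal(0, sigma, N)
        x, y = X - X.mean(), Y - Y.mean()        # deviations form
        beta_hat = (x @ y) / (x @ x)
        resid = Y - (Y.mean() + beta_hat * x)    # same as Y - (alpha_hat + beta_hat*X)
        s2 = (resid @ resid) / (N - k)
        s_beta = np.sqrt(s2 / (x @ x))
        t_stats[i] = beta_hat / s_beta
    # With beta = 0, the t_stats should follow a t-distribution with N - k = 4 d.f.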

133 The values of the intercept and variance of the regression were chosen solely for illustrative purposes.
134 One does not need to know the details of how to simulate a Normal Distribution. See Simulation by Ross.
135 Each simulation run has different simulated errors, different Yi, a different fitted regression, and a different t-statistic.
136 When there is such an unusual value of t, we would reject H0 even though it is true, making a so-called Type I error.


Here is a histogram of the 10,000 values of the t-statistic:

[Histogram omitted: t-statistic values from about -4 to 4, density scale 0 to about 0.4.]

Here is a comparison to the density function of the t-distribution with 6 - 2 = 4 degrees of freedom:

[Figure omitted: density of the t-distribution with 4 degrees of freedom, for comparison with the histogram above.]

The simulated results seem to be a reasonable match to this t-distribution.


When the Null Hypothesis is Not True:

A similar set of 10,000 simulations was performed, except with β = 0.1.

Here is a histogram of the 10,000 values of the t-statistic:137

[Histogram omitted: t-statistic values from about 0 to 15, density scale 0 to about 0.25.]

Since the actual slope is not zero, in other words the null hypothesis is false, many of the values of t have a large absolute value.

However, note that there are still some values of t near zero. For example, there are 929 cases out of 10,000 where |t| ≤ 1.5. Thus there are a significant number of simulated situations where we would have failed to reject H0, even though it was false.138
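A sketch of how one might estimate the power of this test by simulation, again in Python/numpy (assumed), using β = 0.1 and a 5% two-sided test:

    import numpy as np
    from scipy.stats import t

    rng = np.random.default_rng(seed=2)
    X = np.array([0., 1., 5., 10., 25., 100.])
    N, k, sigma, beta_true = len(X), 2, 3.0, 0.1
    crit = t.ppf(0.975, N - k)            # 5% two-sided critical value, about 2.776

    n_sims, rejections = 10000, 0
    for _ in range(n_sims):
        Y = 50 + beta_true * X + rng.normal(0, sigma, N)
        x = X - X.mean()
        beta_hat = (x @ (Y - Y.mean())) / (x @ x)
        resid = Y - (Y.mean() + beta_hat * x)
        s_beta = np.sqrt((resid @ resid) / (N - k) / (x @ x))
        if abs(beta_hat / s_beta) > crit:
            rejections += 1
    print(rejections / n_sims)   # estimated power; 1 minus this estimates Prob[Type II error]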

In general, hypothesis tests are set up to disprove something. In this case, we reject H0 when, if H0 were true, there would be only a small probability of seeing a t-statistic as unusual as the one observed or more unusual. Failing to reject H0 means that either H0 is true or there is insufficient evidence to demonstrate that H0 is false.

137 This distribution is not symmetric. When H0 is false, the t-statistic follows what is called a non-central t-distribution.
138 This is what is called a Type II Error. The power of a statistical test is: 1 - Prob[Type II Error] = Prob[reject H0 | H1]. The power of this test would have been larger if there had been more than 6 observations, β had been larger than 0.1, or σ had been smaller than 3.


Mahler’s Guide to

Regression

Sections 18-21:

18 Three Variable Regression Model
19 Matrix Form of Multiple Regression
20 Tests of Slopes, Multiple Regression Model
21 Additional Tests of Slopes

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-E

New England Actuarial Seminars    Howard Mahler
POB 315    [email protected], MA, 02067
www.neas-seminars.com


Section 18, Three Variable Regression Model

Tristate Insurance writes private passenger automobile insurance in three states, New York, New Jersey, and Connecticut, and has 10 agents. Let X2 = the percentage of business written by each agent that is from New York. Let X3 = the percentage of business written by each agent that is from New Jersey.139 Let Y = the loss ratio for each agent.

Agent     X2      X3      Y
1        100%      0%     75%
2         90%     10%     78%
3         70%      0%     71%
4         65%     10%     73%
5         50%     50%     79%
6         50%     35%     75%
7         40%     10%     65%
8         30%     70%     82%
9         15%     20%     72%
10        10%     10%     66%

Let’s fit via regression the linear model: Y = β1 + β2X2 + β3X3 + ε.

Since there are three variables including the intercept (two independent variables plus the constant term), the formulas are somewhat more complicated than for the two-variable model.

As usual, define the variables in deviation form: x2 = X2 - X̄2, x3 = X3 - X̄3, and y = Y - Ȳ.

Agent    X2     X3     Y      x2      x3      y       x2x3      x2y      x3y
1       100      0     75     48    -21.5    1.4    -1032.0     67.2    -30.1
2        90     10     78     38    -11.5    4.4     -437.0    167.2    -50.6
3        70      0     71     18    -21.5   -2.6     -387.0    -46.8     55.9
4        65     10     73     13    -11.5   -0.6     -149.5     -7.8      6.9
5        50     50     79     -2     28.5    5.4      -57.0    -10.8    153.9
6        50     35     75     -2     13.5    1.4      -27.0     -2.8     18.9
7        40     10     65    -12    -11.5   -8.6      138.0    103.2     98.9
8        30     70     82    -22     48.5    8.4    -1067.0   -184.8    407.4
9        15     20     72    -37     -1.5   -1.6       55.5     59.2      2.4
10       10     10     66    -42    -11.5   -7.6      483.0    319.2     87.4
Sum                                                  -2480.0    463.0    751.0
Avg.   52.0   21.5   73.6

Σx2ix3i = -2480, Σx2iyi = 463, and Σx3iyi = 751. Similarly, Σx2i² = 8010.0 and Σx3i² = 4802.5.

139 Connecticut represents the remaining percentage, 100% - (X2 + X3).


The fitted least squares coefficients are:140

^β2 = {Σx2iyi Σx3i² - Σx3iyi Σx2ix3i} / {Σx2i² Σx3i² - (Σx2ix3i)²}
= {(463)(4802.5) - (751)(-2480)} / {(8010)(4802.5) - (-2480)²} = 4,086,038/32,317,625 = .126.

^β3 = {Σx3iyi Σx2i² - Σx2iyi Σx2ix3i} / {Σx2i² Σx3i² - (Σx2ix3i)²}
= {(751)(8010) - (463)(-2480)} / {(8010)(4802.5) - (-2480)²} = 7,163,750/32,317,625 = .222.

^β1 = Ȳ - ^β2 X̄2 - ^β3 X̄3 = 73.6 - (.126)(52.0) - (.222)(21.5) = 62.3.141

^Y = 62.3 + .126X2 + .222X3.
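As a check on this arithmetic, here is a minimal sketch in Python with numpy (an assumption) that reproduces the fitted coefficients from the deviation-form formulas above:

    import numpy as np

    X2 = np.array([100, 90, 70, 65, 50, 50, 40, 30, 15, 10], dtype=float)
    X3 = np.array([  0, 10,  0, 10, 50, 35, 10, 70, 20, 10], dtype=float)
    Y  = np.array([ 75, 78, 71, 73, 79, 75, 65, 82, 72, 66], dtype=float)

    x2, x3, y = X2 - X2.mean(), X3 - X3.mean(), Y - Y.mean()
    den = (x2 @ x2) * (x3 @ x3) - (x2 @ x3) ** 2
    b2 = ((x2 @ y) * (x3 @ x3) - (x3 @ y) * (x2 @ x3)) / den
    b3 = ((x3 @ y) * (x2 @ x2) - (x2 @ y) * (x2 @ x3)) / den
    b1 = Y.mean() - b2 * X2.mean() - b3 * X3.mean()
    print(b1, b2, b3)   # approximately 62.26, 0.126, 0.222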

Thus, the model would seem to indicate that the larger X2, the portion of business in New York, the higher the loss ratio, and the larger X3, the portion of business in New Jersey, the higher the loss ratio.

The fitted regression is a plane, as shown in the following three dimensional graph:

[Three-dimensional graph omitted: the fitted plane of loss ratio (L.R.) as a function of the percentage of business in New York (N.Y.) and in New Jersey (N.J.), each axis running from 0 to 100.]

If an agent wrote all of its business in New York, then X2 = 100 and X3 = 0, and the predicted loss ratio is ^β1 + 100^β2 = 74.9.

140 See equations 4.3 to 4.5 of Econometric Models and Economic Forecasts. As discussed in the next section, it is also possible to perform this regression in matrix form.
141 Therefore, the fitted regression goes through the point at which the independent variables and the dependent variable are equal to their means.


If an agent wrote all of its business in New Jersey, then X2 = 0 and X3 = 100, and the predicted loss ratio is ^β1 + 100^β3 = 84.5.

If an agent wrote all of its business in Connecticut, then X2 = 0 and X3 = 0, and the predicted loss ratio is ^β1 = 62.3.

For example, for the second agent the fitted loss ratio is: 62.3 + (.126)(90) + (.222)(10) = 75.86.

As in the two-variable case, these estimators of the slopes are unbiased. In general, in multiple regression, the least squares estimators of the slopes are unbiased.142

R2:

^Y = 74.90, 75.86, 71.11, 72.69, 79.66, 76.34, 69.53, 81.57, 68.59, 65.74.

Y = 75, 78, 71, 73, 79, 75, 65, 82, 72, 66.

Note that the mean of ^Y = 73.6 = Ȳ.

TSS = Σ(Yi - Ȳ)² = 264.4.

RSS = Σ(^Yi - Ȳ)² = 225.01.

R² = RSS/TSS = 225.01/264.4 = 0.851.

1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .851)(10 - 1)/(10 - 3) = .192.

R̄² = .808.

R² can also be computed as the square of the correlation between Y and ^Y:

Corr[Y, ^Y] = 0.9225. Corr[Y, ^Y]² = .851 = R².
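A brief sketch (Python with numpy, an assumption) of these R² calculations, using the fitted values ^Y listed above:

    import numpy as np

    Y     = np.array([75, 78, 71, 73, 79, 75, 65, 82, 72, 66], dtype=float)
    Y_hat = np.array([74.90, 75.86, 71.11, 72.69, 79.66, 76.34, 69.53, 81.57, 68.59, 65.74])

    TSS = np.sum((Y - Y.mean()) ** 2)          # about 264.4
    RSS = np.sum((Y_hat - Y.mean()) ** 2)      # about 225.0
    R2 = RSS / TSS                             # about 0.851
    N, k = len(Y), 3
    R2_bar = 1 - (1 - R2) * (N - 1) / (N - k)  # about 0.808
    corr = np.corrcoef(Y, Y_hat)[0, 1]
    print(R2, R2_bar, corr ** 2)               # corr squared matches R2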

Based on this model, the mix of business by state seems to explain some of the variation of loss ratios between the agents.143 We have computed how much of the variation was explained by this model.

In general, one wants to determine whether the results of a regression are statistically significant. As with the two-variable regression model, as will be discussed subsequently, one can test the significance of the fitted coefficients using the t-test and F-Test. One preliminary step is to compute the variances and covariances of the fitted parameters.

142 Provided the assumed form of the model is correct.
143 Since these loss ratios are from finite data sets, at least some of the variation is due to random fluctuation in the aggregate loss process.


Variances and Covariances:144

For the three variable model, one can write the variances and covariances of the slopes in terms of the simple correlation of x2 and x3:

r_X2X3 = Σx2ix3i/√(Σx2i² Σx3i²) = -2480/√((8010.0)(4802.5)) = -.3999.

As in the two-variable model, one can estimate the variance of the regression:
s² = ESS/(N - k) = (TSS - RSS)/(10 - 3) = (264.4 - 225.01)/7 = 5.627.

Var[^β2] = s²/{(1 - r_X2X3²) Σx2i²} = 5.627/{(1 - .3999²)(8010)} = .000836.

Var[^β3] = s²/{(1 - r_X2X3²) Σx3i²} = 5.627/{(1 - .3999²)(4802.5)} = .001395.

Notice that as the correlation of the two independent variables increases in absolute value, the variance of the regression parameters increases. For X2 and X3 independent, their correlation is zero, and their sample correlation is close to zero. For X2 and X3 independent, the values of X2, X3, and Y provide more information than when X2 and X3 are highly correlated, and we get a better estimate of the coefficients. For r_X2X3 close to zero, Var[^β2] and Var[^β3] are smaller than for r_X2X3 close to ±1.

Agent     X2     X3      X2²      X3²     X2X3
1        100      0    10000        0        0
2         90     10     8100      100      900
3         70      0     4900        0        0
4         65     10     4225      100      650
5         50     50     2500     2500     2500
6         50     35     2500     1225     1750
7         40     10     1600      100      400
8         30     70      900     4900     2100
9         15     20      225      400      300
10        10     10      100      100      100
Sum      520    215    35050     9425     8700

Cov[^β2, ^β3] = -r_X2X3 s² / {(1 - r_X2X3²) √(Σx2i² Σx3i²)}
= -(-.3999)(5.627)/{(1 - .3999²) √((8010.0)(4802.5))} = .000432.

144 See equations 4.6 to 4.8 of Econometric Models and Economic Forecasts. As discussed in the next section, it is also possible to use the matrix form to compute these elements of the variance-covariance matrix. Econometric Models and Economic Forecasts does not include the formulas for Var[^β1], Cov[^β1, ^β2], and Cov[^β1, ^β3]. These are derived in the next section from the matrix formula for the covariance matrix.


Corr[^β2, ^β3] = Cov[^β2, ^β3]/√(Var[^β2] Var[^β3]) = .000432/√((.000836)(.001395)) = .400 = -r_X2X3, subject to rounding.

Var[^β1] = s²{ΣX2i² ΣX3i² - (ΣX2iX3i)²}/{N Σx2i² Σx3i² (1 - r_X2X3²)}
= (5.627){(35050)(9425) - 8700²}/{(10)(8010)(4802.5)(1 - .3999²)} = 4.434.

Cov[^β1, ^β2] = s²{ΣX3i ΣX2iX3i - ΣX2i ΣX3i²}/{N Σx2i² Σx3i² (1 - r_X2X3²)}
= (5.627){(215)(8700) - (520)(9425)}/{(10)(8010)(4802.5)(1 - .3999²)} = -.05277.

Cov[^β1, ^β3] = s²{ΣX2i ΣX2iX3i - ΣX3i ΣX2i²}/{N Σx2i² Σx3i² (1 - r_X2X3²)}
= (5.627){(520)(8700) - (215)(35050)}/{(10)(8010)(4802.5)(1 - .3999²)} = -.05244.
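Here is a minimal sketch (Python/numpy, assumed) that reproduces these variances and covariances from the sums used above:

    import numpy as np

    X2 = np.array([100, 90, 70, 65, 50, 50, 40, 30, 15, 10], dtype=float)
    X3 = np.array([  0, 10,  0, 10, 50, 35, 10, 70, 20, 10], dtype=float)
    N, s2 = 10, 5.627

    x2, x3 = X2 - X2.mean(), X3 - X3.mean()
    r = (x2 @ x3) / np.sqrt((x2 @ x2) * (x3 @ x3))             # about -0.40
    var_b2 = s2 / ((1 - r**2) * (x2 @ x2))                      # about 0.000836
    var_b3 = s2 / ((1 - r**2) * (x3 @ x3))                      # about 0.001395
    cov_b23 = -r * s2 / ((1 - r**2) * np.sqrt((x2 @ x2) * (x3 @ x3)))   # about 0.000432
    D = N * (x2 @ x2) * (x3 @ x3) * (1 - r**2)
    var_b1 = s2 * ((X2 @ X2) * (X3 @ X3) - (X2 @ X3)**2) / D    # about 4.43
    print(var_b1, var_b2, var_b3, cov_b23)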


Problems:

Use the following information for the next three questions:
You fit the following model to four observations: Yi = β1 + β2X2i + β3X3i + εi, i = 1, 2, 3, 4.
You are given:
i     X2i    X3i
1      2      4
2      5      8
3      7     10
4     10     14

18.1 (3 points) The least squares estimator of β2 is expressed as ^β2 = Σ wiYi, where the sum is over i = 1 to 4.
Determine (w1, w2, w3, w4).
(A) (0.5, -2.5, 2.5, -0.5)
(B) (0.5, 2.5, -2.5, -0.5)
(C) (0.5, 2, -2, -0.5)
(D) (-0.5, 2, -2, 0.5)
(E) None of A, B, C, or D.

18.2 (2 points) If Y = (10, 8, 14, 20), determine ^β3.

A. -10 B. -7 C. 0 D. 10 E. 16

18.3 (1 point) If Y = (10, 8, 14, 20), determine ^β1.

A. -10 B. -7 C. 0 D. 10 E. 16


Use the following information for the next four questions:
You are given the multiple linear regression model Yi = β1 + β2X2i + β3X3i + εi, fit to 32 observations. All sums below run from i = 1 to 32.
Σ(X2i - X̄2)² = 23,266.   Σ(X3i - X̄3)² = 250.   Σ(X2i - X̄2)(X3i - X̄3) = -612.
Σ(Yi - ^Yi)² = 516,727.

18.4 (1 point) Determine Var[^β2].
A. 0.8   B. 1.0   C. 1.2   D. 1.4   E. 1.6

18.5 (1 point) Determine Var[^β3].
A. 65   B. 70   C. 75   D. 80   E. 85

18.6 (1 point) Determine Cov[^β2, ^β3].
A. 0.5   B. 1.0   C. 1.5   D. 2.0   E. 2.5

18.7 (1 point) Determine the estimate of the standard deviation of the least-squares estimate of the sum of β2 and β3.
A. 8.6   B. 8.7   C. 8.8   D. 8.9   E. 9.0

18.8 (165, 5/89, Q.15) (1.7 points) You observed the following four students who sat for an examination:
Student    Hours Studied At Home    Hours Studied At Library    Score on The Exam
I                    0                          0                        0
II                   0                        100                       30
III                100                          0                       40
IV                 100                        100                       80
Expected scores are to be obtained by using the regression approach to fit a plane to the observed scores. Determine the number of hours of study at home required to have an expected score of 60, if 75 hours are studied at the library.
(A) 75   (B) 81   (C) 86   (D) 93   (E) 100


18.9 (Course 120 Sample Exam #1, Q.7) (2 points) You are given the multiple linear regression model Yi = β2X2i + β3X3i + εi. The values of X2i and X3i have been scaled so that ΣXji = 0 and ΣXji² = 1, for j = 2, 3 (sums over i).
You are also given:
(i) Var[^β2] is 4s²/3.
(ii) The regression of X2 on X3 has negative slope.
Determine the correlation coefficient between X2 and X3.

18.10 (Course 120 Sample Exam #2, Q.2) (2 points) You fit the model Yi = β1 + β2X2i + β3X3i + εi to the following data:
Y     X2     X3
1     -1     -1
2      1     -1
4     -1      1
3      1      1
Determine ^β2.
(A) 0   (B) 1   (C) 2   (D) 3   (E) 4

18.11 (4, 5/00, Q.35) (2.5 points) You fit the following model to 30 observations:
Y = β1 + β2X2 + β3X3 + ε
(i) s² = 10
(ii) r_X2X3 = 0.5
(iii) Σ(X2 - X̄2)² = 4
(iv) Σ(X3 - X̄3)² = 8
Determine the estimate of the standard deviation of the least-squares estimate of the difference between β2 and β3.
(A) 1.7   (B) 2.2   (C) 2.7   (D) 3.2   (E) 3.7


18.12 (4, 11/01, Q.13) (2.5 points) You fit the following model to four observations:
Yi = β1 + β2X2i + β3X3i + εi, i = 1, 2, 3, 4
You are given:
i     X2i    X3i
1     -3     -1
2     -1      3
3      1     -3
4      3      1
The least squares estimator of β3 is expressed as ^β3 = Σ wiYi, where the sum is over i = 1 to 4.
Determine (w1, w2, w3, w4).
(A) (-0.15, -0.05, 0.05, 0.15)
(B) (-0.05, 0.15, -0.15, 0.05)
(C) (-0.05, 0.05, -0.15, 0.15)
(D) (-0.3, -0.1, 0.1, 0.3)
(E) (-0.1, 0.3, -0.3, 0.1)

18.13 (2 points) In the previous question, the least squares estimator of β2 is expressed as ^β2 = Σ wiYi, where the sum is over i = 1 to 4. Determine (w1, w2, w3, w4).


Section 19, Matrix Form of Multiple Regression145

One can also perform regression using matrix methods. This is particularly useful as the number of variables increases. Data on loss ratios by agent was previously fit via regression:
Agent     X2      X3      Y
1        100%      0%     75%
2         90%     10%     78%
3         70%      0%     71%
4         65%     10%     73%
5         50%     50%     79%
6         50%     35%     75%
7         40%     10%     65%
8         30%     70%     82%
9         15%     20%     72%
10        10%     10%     66%

As an example, let’s use matrix methods to fit the same regression.

To these 10 observations, fit the model: Y = β1 + β2X2 + β3X3 + ε.

The first step is to list the so-called design matrix, in which the first column consists of ones, corresponding to the constant term in the model, and the remainder of each row is the values of the independent variables for an observation. Thus, for example, the second observation has X2 = 90 and X3 = 10, and therefore the second row of the design matrix is (1, 90, 10).

The design matrix is called X. The column vector of the dependent variable, which in this case is the loss ratio by agent, is called Y.

     (1  100   0)          (75)
     (1   90  10)          (78)
     (1   70   0)          (71)
     (1   65  10)          (73)
X =  (1   50  50)    Y =   (79)
     (1   50  35)          (75)
     (1   40  10)          (65)
     (1   30  70)          (82)
     (1   15  20)          (72)
     (1   10  10)          (66)

Note that the fourth row of X is: 1, X2,4 , X3,4. So the subscripts of the elements of the design matrix do not follow the usual convention of row followed by column.

145 See Appendix 4.3 in Econometric Models and Economic Forecasts.


β is a column vector of the coefficients to be fit. ( β1 )

β = ( β2 )

( β3 )

In matrix form, the model equations are: Y = Xβ + ε.

For example the second row corresponds to the second observation.Y2 = β1 + β2X2,2 + β3X3,2 + ε, or 78 = β1 + β290 + β310 + ε.

X’ is the transpose of the matrix X, with the rows and columns interchanged:

       (  1   1   1   1   1   1   1   1   1   1)
X’ =   (100  90  70  65  50  50  40  30  15  10)
       (  0  10   0  10  50  35  10  70  20  10)

The next step is to multiply X’ times X:

        ( 10    520    215)
X’X =   (520  35050   8700)
        (215   8700   9425)

For example, the third element of the second row is:
(100)(0) + (90)(10) + (70)(0) + (65)(10) + (50)(50) + (50)(35) + (40)(10) + (30)(70) + (15)(20) + (10)(10) = 8700.

X’X is called the cross product matrix. The cross product matrix is always square and symmetric. The 1,1 element is the number of observations. The other elements in the first row and first column are sums of the independent variables. The remaining elements are of the form ΣXjiXli.146

We need to take the inverse of X’X.147

            ( .787979     -.00937724   -.00931922  )
(X’X)-1 =   (-.00937724    .000148603   .0000767383)
            (-.00931922    .0000767383  .00247852  )

For example, the second element of the first row is: -{(520)(9425) - (215)(8700)}/323,176,250 = -.00937724, where 323,176,250 is the determinant of the cross product matrix.148

146 This is the dot product of the vectors Xj and Xl.
147 In general, the cross product matrix will have an inverse unless two or more of the independent variables are linearly related. Such multicollinearity will be discussed subsequently. If the cross product matrix does not have an inverse, then one can not perform linear regression using all of these independent variables. In that case, one would need to drop one or more of the independent variables from the originally proposed model equation.
148 The determinant is: (10){(35050)(9425) - (8700)(8700)} - (520){(520)(9425) - (215)(8700)} + (215){(520)(8700) - (215)(35050)} = 323,176,250.


Except in special cases with lots of zeros in the matrix, I do not believe you should be able to take inverses of three by three matrices, or larger, on the exam!

We need to multiply the transpose of the design matrix times the column vector of dependent variables:

        (  736)
X’Y =   (38735)
        (16575)

The element in the first row of X’Y is the sum of the loss ratios. The element in the second row is ΣX2iYi, the sum of the product of the portion in New York and the loss ratio.

It turns out that the fitted regression coefficients are: (X’X)-1X’Y.

               ( .787979     -.00937724   -.00931922  ) (  736)     (62.2596)
(X’X)-1 X’Y =  (-.00937724    .000148603   .0000767383) (38735)  =  ( .126434)
               (-.00931922    .0000767383  .00247852  ) (16575)     ( .221667)

^β1 = 62.3, ^β2 = .126, ^β3 = .222, matching the result obtained previously.

In general, the fitted regression coefficients are:149 ^β = (X’X)-1X’Y.
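A minimal sketch of this matrix computation in Python with numpy (the software choice is an assumption):

    import numpy as np

    X2 = np.array([100, 90, 70, 65, 50, 50, 40, 30, 15, 10], dtype=float)
    X3 = np.array([  0, 10,  0, 10, 50, 35, 10, 70, 20, 10], dtype=float)
    Y  = np.array([ 75, 78, 71, 73, 79, 75, 65, 82, 72, 66], dtype=float)

    X = np.column_stack([np.ones_like(X2), X2, X3])   # design matrix
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y                      # about (62.26, 0.126, 0.222)

    N, k = X.shape
    resid = Y - X @ beta_hat
    s2 = (resid @ resid) / (N - k)                    # about 5.63
    cov_beta = s2 * XtX_inv                           # variance-covariance matrix of beta_hat
    print(beta_hat)
    print(cov_beta)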

Variance-Covariance Matrix:

As computed in the previous section, the estimated variance of the regression is:
s² = ESS/(N - k) = (TSS - RSS)/(10 - 3) = (264.4 - 225.01)/7 = 5.627.

The variance-covariance matrix of the estimated coefficients is:

                              ( .787979     -.00937724   -.00931922  )
Var[^β] = s²(X’X)-1 = (5.627) (-.00937724    .000148603   .0000767383)
                              (-.00931922    .0000767383  .00247852  )

            ( 4.434    -.05277   -.05244 )
Var[^β] =   (-.05277    .000836   .000432)
            (-.05244    .000432   .001395)

Note that the above matrix, as with all variance-covariance matrices, is symmetric.

Var[^β1] = 4.434, Var[^β2] = .000836, Var[^β3] = .001395.

Cov[^β1, ^β2] = -.05277, Cov[^β1, ^β3] = -.05244, Cov[^β2, ^β3] = .000432.

149 See Equation A4.12 in Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.


The matrix of correlations of ^β is:

             (  1     -.867   -.667)
Corr[^β] =   (-.867     1      .400)
             (-.667    .400     1  )

For example, Corr[^β1, ^β2] = -.05277/√((4.434)(.000836)) = -.867.

Model with One Slope and No Intercept, Variance-Covariance Matrix:

For the one variable model (slope and no intercept), the design matrix has a single column containing X. In other words, X is a column vector. X’X = ΣXi². (X’X)-1 = 1/ΣXi².

Var[^β] = s²(X’X)-1 = s²/ΣXi², matching the result shown in a previous section.

Two Variable Model, Variance-Covariance Matrix:

For the two variable model (slope and intercept), the design matrix has 1s in the first column and Xi in the second column:

     (1   X1)
X =  (1   X2)
     (1   X3)
     (... ...)

       ( 1    1    1  ...)
X’ =   (X1   X2   X3  ...)

        ( N      ΣXi )
X’X =   (ΣXi    ΣXi² )

            ( ΣXi²   -ΣXi)
(X’X)-1 =   (-ΣXi      N ) / {N ΣXi² - (ΣXi)²}

Now, N ΣXi² - (ΣXi)² = N{ΣXi² - N X̄²} = N Σ(Xi - X̄)² = N Σxi².

Therefore, the variance-covariance matrix of the fitted parameters is:

               ( ΣXi²   -ΣXi)
s²(X’X)-1 = s² (-ΣXi      N ) / (N Σxi²).

Therefore, Var[^α] = s² ΣXi²/(N Σxi²), Cov[^α, ^β] = -s² ΣXi/(N Σxi²) = -s² X̄/Σxi², and Var[^β] = s² N/(N Σxi²) = s²/Σxi². This matches the formulas discussed previously.


Three Variable Model, Variance-Covariance Matrix:*

For the three variable model (two slopes and an intercept), the design matrix has 1s in the first column, X2 in the second column, and X3 in the third column:

     (1   X21   X31)
X =  (1   X22   X32)
     (1   X23   X33)
     (... ...   ...)

       (  1     1     1   ...)
X’ =   (X21   X22   X23   ...)
       (X31   X32   X33   ...)

        ( N       ΣX2i      ΣX3i   )
X’X =   (ΣX2i     ΣX2i²     ΣX2iX3i)
        (ΣX3i     ΣX2iX3i   ΣX3i²  )

            (ΣX2i²ΣX3i² - (ΣX2iX3i)²    ΣX3iΣX2iX3i - ΣX2iΣX3i²    ΣX2iΣX2iX3i - ΣX3iΣX2i²)
(X’X)-1 =   (ΣX3iΣX2iX3i - ΣX2iΣX3i²    NΣX3i² - (ΣX3i)²           ΣX2iΣX3i - NΣX2iX3i    ) / D
            (ΣX2iΣX2iX3i - ΣX3iΣX2i²    ΣX2iΣX3i - NΣX2iX3i        NΣX2i² - (ΣX2i)²       )

Where D is the determinant of X’X.

D = N{ΣX2i²ΣX3i² - (ΣX2iX3i)²} - ΣX2i{ΣX2iΣX3i² - ΣX3iΣX2iX3i} + ΣX3i{ΣX2iΣX2iX3i - ΣX3iΣX2i²}
= {NΣX2i² - (ΣX2i)²}{NΣX3i² - (ΣX3i)²}/N - {NΣX2iX3i - ΣX2iΣX3i}²/N
= N{Σx2i²Σx3i² - (Σx2ix3i)²} = N Σx2i² Σx3i² (1 - r_X2X3²),

where r_X2X3 = correlation of X2 and X3 = Σx2ix3i/√(Σx2i²Σx3i²).

The variance-covariance matrix of the fitted parameters is s²(X’X)-1.

Therefore, Var[^β1] = s²{ΣX2i² ΣX3i² - (ΣX2iX3i)²}/{N Σx2i² Σx3i² (1 - r_X2X3²)}.

Var[^β2] = s²{NΣX3i² - (ΣX3i)²}/{N Σx2i² Σx3i² (1 - r_X2X3²)} = s² N Σx3i²/{N Σx2i² Σx3i² (1 - r_X2X3²)}
= s²/{Σx2i² (1 - r_X2X3²)}.

Var[^β3] = s²{NΣX2i² - (ΣX2i)²}/{N Σx2i² Σx3i² (1 - r_X2X3²)} = s²/{Σx3i² (1 - r_X2X3²)}.

Cov[^β1, ^β2] = s²{ΣX3i ΣX2iX3i - ΣX2i ΣX3i²}/{N Σx2i² Σx3i² (1 - r_X2X3²)}.

Cov[^β1, ^β3] = s²{ΣX2i ΣX2iX3i - ΣX3i ΣX2i²}/{N Σx2i² Σx3i² (1 - r_X2X3²)}.

Cov[^β2, ^β3] = s²{ΣX2iΣX3i - NΣX2iX3i}/{N Σx2i² Σx3i² (1 - r_X2X3²)}
= -s² N Σx2ix3i/{N Σx2i² Σx3i² (1 - r_X2X3²)} = -s² r_X2X3/{(1 - r_X2X3²) √(Σx2i² Σx3i²)}.

This matches the formulas discussed previously for the three variable model.

The Regression Passes Through the Point Where All Variables Take on Their Mean:*

As has been discussed previously, in the two variable regression model (one independent variable plus intercept) the fitted line passes through the point (X̄, Ȳ). A similar result holds for multiple regressions.

^β = (X’X)-1X’Y. ⇒ (X’X)^β = X’Y.

For concreteness, let us assume 3 independent variables. Then the design matrix is:

     (1   X21   X31   X41)
X =  (1   X22   X32   X42)
     (1   X23   X33   X43)
     (... ...   ...   ...)

       (  1     1     1   ...)
X’ =   (X21   X22   X23   ...)
       (X31   X32   X33   ...)
       (X41   X42   X43   ...)

Then

        ( N       ΣX2i      ΣX3i      ΣX4i   )
X’X =   (ΣX2i     ΣX2i²     ΣX2iX3i   ΣX2iX4i)
        (ΣX3i     ΣX2iX3i   ΣX3i²     ΣX3iX4i)
        (ΣX4i     ΣX2iX4i   ΣX3iX4i   ΣX4i²  )

        (ΣYi   )
X’Y =   (ΣYiX2i)
        (ΣYiX3i)
        (ΣYiX4i)


Thus the first component of the matrix equation (X’X)^β = X’Y is:

N^β1 + ΣX2i ^β2 + ΣX3i ^β3 + ΣX4i ^β4 = ΣYi.

⇒ ^β1 + ^β2 X̄2 + ^β3 X̄3 + ^β4 X̄4 = Ȳ.

The fitted regression passes through the point at which all of the variables are equal to their means. This nice property holds in general for multiple regressions.

Division of the Variance into Two Pieces in Matrix Form:*

One can also write the various sums of squares in matrix form:

TSS = Y’Y - N Ȳ².

RSS = ^β’X’X^β - N Ȳ² = ^β’X’X(X’X)-1X’Y - N Ȳ² = ^β’X’Y - N Ȳ².

ESS = Y’Y - ^β’X’X^β = Y’Y - ^β’X’Y.

For the agents loss ratio example:

TSS = Y’Y - N Ȳ² = 54434 - 10(73.6²) = 264.4.150

                                                 (  736)
RSS = ^β’X’Y - N Ȳ² = (62.2596  .126434  .221667)(38735) - 10(73.6²) = 54394.6 - 54169.6 = 225.0.
                                                 (16575)

ESS = Y’Y - ^β’X’Y = 54434 - 54394.6 = 39.4.

s² = ESS/(N - k) = 39.4/(10 - 3) = 5.63.

Matching the previous results, subject to rounding.

150 Note that ΣYi = 736, which can be computed directly, but is also the first element of X’Y. Ȳ = 736/10 = 73.6.
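A short sketch (Python/numpy, assumed) of these matrix-form sums of squares for the same data:

    import numpy as np

    X2 = np.array([100, 90, 70, 65, 50, 50, 40, 30, 15, 10], dtype=float)
    X3 = np.array([  0, 10,  0, 10, 50, 35, 10, 70, 20, 10], dtype=float)
    Y  = np.array([ 75, 78, 71, 73, 79, 75, 65, 82, 72, 66], dtype=float)
    X = np.column_stack([np.ones(10), X2, X3])
    b = np.linalg.solve(X.T @ X, X.T @ Y)

    N, Ybar = len(Y), Y.mean()
    TSS = Y @ Y - N * Ybar**2             # about 264.4
    RSS = b @ (X.T @ Y) - N * Ybar**2     # about 225.0
    ESS = Y @ Y - b @ (X.T @ Y)           # about 39.4
    print(TSS, RSS, ESS, ESS / (N - 3))   # the last value is s^2, about 5.63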


Hat Matrix:*151

^Y = X^β = X(X’X)-1X’Y = HY, where H = X(X’X)-1X’ is called the hat matrix.

H’ = {X(X’X)-1X’}’ = (X’)’{(X’X)-1}’X’ = X{(X’X)’}-1X’ = X(X’X)-1X’ = H.152 Thus H is symmetric.

H² = X(X’X)-1X’X(X’X)-1X’ = X(X’X)-1X’ = H.153

HX = X(X’X)-1X’X = X. ⇒ (I - H)X = 0.

^ε = Y - ^Y = Y - HY = (I - H)Y.

Covariance Matrix of the Residuals:*

E[^ε] = E[(I - H)Y] = (I - H)E[Y] = (I - H)E[Xβ + ε] = (I - H)Xβ + (I - H)E[ε] = 0β + 0 = 0.154

Therefore, ^ε - E[^ε] = (I - H)Y - 0 = (I - H)Y.
Using the fact that (I - H)X = 0, this can be rewritten as:
^ε - E[^ε] = (I - H)Y - (I - H)Xβ = (I - H)(Y - Xβ) = (I - H)ε.155

(^ε - E[^ε])(^ε - E[^ε])’ = {(I - H)ε}{(I - H)ε}’ = (I - H)εε’(I - H)’ = (I - H)εε’(I’ - H’) = (I - H)εε’(I - H).

Therefore, the covariance matrix of the residuals is:
V(^ε) = E[(^ε - E[^ε])(^ε - E[^ε])’] = E[(I - H)εε’(I - H)] = (I - H)E[εε’](I - H).

However, E[εi] = 0, the εi each have variance σ², and are mutually independent, and therefore E[εiεj] = σ²δij. ⇒ E[εε’] = σ²I.

⇒ V(^ε) = (I - H)σ²I(I - H) = (I - H)(I - H)σ² = (I - 2H + H²)σ² = (I - H)σ².

Thus the covariance matrix of the residuals is (I - H)σ².156
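Here is a minimal numerical sketch (Python/numpy, assumed) of the hat matrix and of this residual covariance result, for a small straight-line design chosen only for illustration:

    import numpy as np

    Xcol = np.array([1., 2., 3., 4., 5.])             # illustrative X values (assumed)
    X = np.column_stack([np.ones(5), Xcol])           # straight line with intercept
    H = X @ np.linalg.inv(X.T @ X) @ X.T              # hat matrix

    print(np.allclose(H, H.T))                        # True: H is symmetric
    print(np.allclose(H @ H, H))                      # True: H is idempotent
    sigma2 = 9.0                                      # illustrative sigma^2 (assumed)
    cov_resid = (np.eye(5) - H) * sigma2              # covariance matrix of the residuals
    print(np.round(cov_resid, 3))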

151 See Section 8.1 of Applied Regression Analysis by Draper and Smith.
152 The transpose of a matrix reverses the rows and columns. The transpose of the product of matrices is the product of the transposes in the opposite order. The transpose of X’X is X’X. Also, (M-1)’ = (M’)-1.
153 Thus H is idempotent.
154 This is a vector result. E[^εi] = 0 for each i.
155 Recall that in matrix form the model is: Y = Xβ + ε.
156 This result used the assumptions of homoscedasticity and independent errors.


Exercise: For a linear regression through the origin, Yi = βXi + εi, determine the covariance matrix of the residuals.
[Solution: The design matrix has only one column, consisting of the Xi. X’X = ΣXk².
H = X(X’X)-1X’ = XX’/ΣXk². Hij = XiXj/ΣXk².
(I - H)σ² = (δij - XiXj/ΣXk²)σ². Var[^εi] = σ²(1 - Xi²/ΣXk²). Cov[^εi, ^εj] = -σ²XiXj/ΣXk².
Comment: This matches a result in a previous section. H is an N by N matrix.]

Expected Value of ESS:*157

As demonstrated previously, E[^εi] = 0. Therefore, E[^εi²] = Var[^εi].

E[ESS] = E[Σ^εi²] = ΣE[^εi²] = ΣVar[^εi] = Tr[V[^ε]].158

E[ESS] = Tr[V[^ε]] = Tr[(I - H)σ²] = (Tr[I] - Tr[H])σ².

I is the N by N identity matrix with trace N.

We will use the fact that, assuming the matrices A and B are compatible in size:
Tr[AB] = Σi (AB)ii = Σi Σj aij bji = Σj Σi bji aij = Σj (BA)jj = Tr[BA].

Tr[H] = Tr[X(X’X)-1X’] = Tr[X’X(X’X)-1] = Tr[Ik] = k.159

E[ESS] = (Tr[I] - Tr[H])σ2 = (N - k)σ2.

Therefore, s2 = ESS/(N - k) is an unbiased estimator of σ2.

Note that this result did not depend on the distributional form of the errors. It did depend on the errors each having a mean of zero and variance of σ², and on the errors being mutually independent.
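A quick simulation check of this unbiasedness (Python/numpy, assumed; Normal errors are used here only for convenience, since the result does not depend on the distributional form):

    import numpy as np

    rng = np.random.default_rng(seed=3)
    Xcol = np.array([0., 1., 5., 10., 25., 100.])
    X = np.column_stack([np.ones(6), Xcol])
    N, k, sigma = 6, 2, 3.0

    ess_values = []
    for _ in range(20000):
        Y = 50 + 0.1 * Xcol + rng.normal(0, sigma, N)
        b = np.linalg.solve(X.T @ X, X.T @ Y)
        resid = Y - X @ b
        ess_values.append(resid @ resid)
    print(np.mean(ess_values) / (N - k))   # should be close to sigma^2 = 9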

157 See Section 19.9 of Volume 2 of Kendall’s Advanced Theory of Statistics, by Stuart and Ord.
158 Where the trace of a matrix is the sum of the elements along its diagonal.
159 Using the above result with A = X(X’X)-1 and B = X’. Note that the design matrix X is an N by k matrix, so that X’X, the cross product matrix, is k by k. Therefore, X’X(X’X)-1 is the k by k identity matrix.


Problems:

Use the following information for a regression model for the next 4 questions:

        ( 1133.200 )
X’Y =   (12273.244 )
        (46085.1007)
        ( 1738.0862)

            ( 0.52648584      0.0070505743   -0.00080896501   -0.370750118)
(X’X)-1 =   ( 0.0070505743    0.0041832832   -0.00071877348   -0.014609706)
            (-0.00080896501  -0.00071877348   0.00071110259   -0.012779797)
            (-0.370750118    -0.014609706    -0.012779797      0.69152462 )

Y’Y = 73990.3. N = 20.

19.1 (4 points) What are the fitted coefficients?

19.2 (2 points) What is the estimated variance of the regression?

19.3 (2 points) What is the covariance matrix of the fitted parameters?

19.4 (3 points) What are R² and R̄²?

19.5 (2 points) You fit the multiple regression model Yi = β1 + β2X2i + β3X3i + εi to a set of 32 observations. You determine:

            ( 1.695     -0.00773    -0.0571   )
(X’X)-1 =   (-0.00773    0.0000459   0.0001125)
            (-0.0571     0.0001125   0.00428  )
Total Sum of Squares (TSS) = 4,799,790.
Regression Sum of Squares (RSS) = 4,283,063.
Determine the estimated standard error of: 100^β2 + 10^β3.

(A) 110 (B) 120 (C) 130 (D) 140 (E) 150

19.6 (1 point) Demonstrate that the matrix form of regression matches the equation for the fitted slope of the regression model with no intercept.

19.7 (2 points) A multiple regression has been fit to 25 observations:

^Y = 11 - 4X2 + 7X3 - 12X4.

Σ X2i = 148. Σ X3i = 201. Σ X4i = 82.

Determine Ȳ.

19.8 (3 points) Demonstrate that the matrix form of regression matches the equations for the fitted slope and intercept of the two variable regression model.


Use the following information for the next five questions:
You fit the following model to four observations: Yi = β2X2i + β3X3i + εi, i = 1, 2, 3, 4.
You are given:
i     X2i    X3i    Yi
1      2      4     10
2      5      8      8
3      7     10     14
4     10     14     20

19.9 (2 points) Determine ^β2 .

A. -2.0 B. -1.8 C. -1.6 D. -1.4 E. -1.2

19.10 (2 points) Determine ^β3.

A. 2.3 B. 2.5 C. 2.7 D. 2.9 E. 3.1

19.11 (2 points) Determine Var[^β2 ].

A. 6 B. 8 C. 10 D. 12 E. 14

19.12 (2 points) Determine Var[^β3].

A. 1 B. 3 C. 5 D. 7 E. 9

19.13 (2 points) Determine Cov[^β2, ^β3].

A. -10 B. -8 C. -6 D. -4 E. -2

* 19.14 (4 points) For a linear regression, Yi = α + βXi + εi, determine the covariance matrix of the residuals.

* 19.15 (4 points) In the previous question, if there are a total of 5 observations, with X1 = 1, X2 = 2, X3 = 3, X4 = 4, and X5 = 5, determine the covariance matrix of the residuals.

19.16 (Course 120 Sample Exam #1, Q.4) (2 points) You fit the multiple regression model Yi = β1 + β2X2i + β3X3i + εi to a set of data. You determine:

            ( 6.1333   -0.0733   -0.1933)
(X’X)-1 =   (-0.0733    0.0087   -0.0020)
            (-0.1933   -0.0020    0.0087)

s² = 280.1167.

Determine the estimated standard error of ^β2 - ^β3.

(A) 1.9 (B) 2.2 (C) 2.5 (D) 2.8 (E) 3.1


19.17 (Course 120 Sample Exam #3, Q.6) (2 points) You fit the multiple regression model Yi = β1 + β2X2i + β3X3i + β4X4i + εi to 30 observations. You are given:
Y’Y = 7995.

            ( 2.8195   -0.0286   -0.0755   -0.0263)
(X’X)-1 =   (-0.0286    0.0027    0.0010   -0.0014)
            (-0.0755    0.0010    0.0035   -0.0010)
            (-0.0263   -0.0014   -0.0010    0.0032)

        ( 261.5)          ( 5.22)
X’Y =   (4041.5)    ^β =  ( 1.62)
        (6177.5)          ( 0.21)
        (5707.0)          (-0.45)

Determine the length of the symmetric 95% confidence interval for β3.
(A) 0.3   (B) 0.6   (C) 0.7   (D) 1.5   (E) 1.8

19.18 (4, 11/03, Q.36) (2.5 Points) For the model Yi = β1 + β2X2i + β3X3i + β4X4i + εi, you are given:
(i) N = 15
(ii)
            (13.66   -0.33    2.05   -6.31)
(X’X)-1 =   (-0.33    0.03    0.11    0.00)
            ( 2.05    0.11    2.14   -2.52)
            (-6.31    0.00   -2.52    4.32)
(iii) ESS = 282.82.
Calculate the standard error of ^β3 - ^β2.
(A) 6.4   (B) 6.8   (C) 7.1   (D) 7.5   (E) 7.8


19.19 (4, 11/04, Q.3) (2.5 points) You are given:
(i) Y is the annual number of discharges from a hospital.
(ii) X is the number of beds in the hospital.
(iii) Dummy D is 1 if the hospital is private and 0 if the hospital is public.
(iv) The proposed model for the data is Y = β1 + β2X + β3D + ε.
(v) To correct for heteroscedasticity, the model Y/X = β1/X + β2 + β3D/X + ε/X is fitted to N = 393 observations, yielding ^β2 = 3.1, ^β1 = -2.8 and ^β3 = 28.
(vi) For the fit in (v) above, the matrix of estimated variances and covariances of ^β2, ^β1 and ^β3 is:
( 0.0035    -0.1480     0.0357 )
(-0.1480    21.6520   -16.9185 )
( 0.0357   -16.9185    38.8423 )
Determine the upper limit of the symmetric 95% confidence interval for the difference between the mean annual number of discharges from private hospitals with 500 beds and the mean annual number of discharges from public hospitals with 500 beds.
(A) 6   (B) 31   (C) 37   (D) 40   (E) 67

19.20 (2 points) In the previous question, determine the lower limit of the symmetric 99% confidence interval for the difference between the mean annual number of discharges from private hospitals with 300 beds and the mean annual number of discharges from public hospitals with 400 beds.
(A) -311   (B) -309   (C) -307   (D) -305   (E) -303

19.21 (VEE-Applied Statistics Exam, 8/05, Q.2) (2.5 points) You are given:
(i) Y is the annual number of discharges from a hospital.
(ii) X is the number of beds in the hospital.
(iii) Dummy variable D is 1 if the hospital is private and 0 if the hospital is public.
(iv) The classical three-variable linear regression model β1 + β2X + β3D + ε is fitted to N cases using ordinary least squares.
(v) The matrix of estimated variances and covariances of ^β1, ^β2, and ^β3 is:
( 1.89952   -0.00364   -0.82744)
(-0.00364    0.00001   -0.00041)
(-0.82744   -0.00041    2.79655)
Determine the standard error of ^β1 + 600^β2.
(A) 1.06   (B) 1.13   (C) 1.38   (D) 1.90   (E) 2.35


Section 20, Tests of Slopes, Multiple Regression Model

One can test hypotheses about the slopes of multiple regression models in a similar manner to that discussed for the two-variable model. One can apply the t-test to individual parameters in the multiple-variable case in the same manner. The number of degrees of freedom is N - k, where k is the number of variables including the intercept. The t-test is a special case of the F-Test, which can be used to test more than one slope simultaneously.

An Example of a Four Variable Regression Model:

In order to give a concrete example to discuss, assume the following 8 observations of three independent variables, four variables when we include the intercept in the regression, and one dependent variable.

X2    X3    X4      Y
-2     1    -4      6
 1    -1     0      8
 3     4     4     33
 6    -4     8     14
11     0    12     40
15     8    16    118
17    -8    20      2
20    -6    24     61

To these 8 observations, fit the model:160 Y = β1 + β2X2 + β3X3 + β4X4 + ε.

     (1   -2    1   -4)          (  6)
     (1    1   -1    0)          (  8)
     (1    3    4    4)          ( 33)
X =  (1    6   -4    8)    Y =   ( 14)
     (1   11    0   12)          ( 40)
     (1   15    8   16)          (118)
     (1   17   -8   20)          (  2)
     (1   20   -6   24)          ( 61)

        (  8     71    -6     80)          ( 282)
X’X =   ( 71   1085  -151   1260)   X’Y =  (3643)
        ( -6   -151   198   -196)          ( 636)
        ( 80   1260  -196   1472)          (4092)

160 With the aid of a computer. Due to the number of observations and independent variables, calculating the fitted coefficients would be much too time consuming to do on the exam. However, questions involving for example the tests of slopes can be asked. In any case, this example can serve as a useful review of the concepts discussed previously.


            ( 0.378621   -0.162062    0.005565    0.118885)
(X’X)-1 =   (-0.162062    0.276362   -0.022578   -0.230759)
            ( 0.005565   -0.022578    0.007869    0.020071)
            ( 0.118885   -0.230759    0.020071    0.194415)

                     (6.3974)
^β = (X’X)-1X’Y =    (2.4626)
                     (6.4560)
                     (1.1839)

^β1 = 6.3974, ^β2 = 2.4626, ^β3 = 6.4560, ^β4 = 1.1839.

For the fitted model, for example, the first predicted value is:
^Y1 = ^β1 + ^β2X2,1 + ^β3X3,1 + ^β4X4,1 = 6.3974 + (2.4626)(-2) + (6.4560)(1) + (1.1839)(-4) = 3.19.

The residuals of this regression, ^εi, are:

X2    X3    X4     Yi       ^Yi     ^εi = Yi - ^Yi    ^Yi - Ȳ     Yi - Ȳ
-2     1    -4      6      3.19          2.81          -32.06     -29.25
 1    -1     0      8      2.40          5.60          -32.85     -27.25
 3     4     4     33     44.34        -11.34            9.09      -2.25
 6    -4     8     14      4.82          9.18          -30.43     -21.25
11     0    12     40     47.69         -7.69           12.44       4.75
15     8    16    118    113.93          4.07           78.68      82.75
17    -8    20      2     20.29        -18.29          -14.96     -33.25
20    -6    24     61     45.33         15.67           10.08      25.75

Ȳ = (6 + 8 + 33 + 14 + 40 + 118 + 2 + 61)/8 = 35.25.

Error Sum of Squares = ESS = residual variation ≡ Σ^εi² = Σ(Yi - ^Yi)² =
2.81² + 5.60² + (-11.34)² + 9.18² + (-7.69)² + 4.07² + (-18.29)² + 15.67² = 908.1.

Regression Sum of Squares = RSS = explained variation ≡ Σ(^Yi - Ȳ)² =
(-32.06)² + (-32.85)² + 9.09² + (-30.43)² + 12.44² + 78.68² + (-14.96)² + 10.08² = 9786.

Total Sum of Squares = TSS ≡ Σ(Yi - Ȳ)² =
(-29.25)² + (-27.25)² + (-2.25)² + (-21.25)² + 4.75² + 82.75² + (-33.25)² + 25.75² = 10693.5.

ESS + RSS = 908 + 9786 = 10694 = TSS.

R² ≡ RSS/TSS = explained variation / total variation = 9786/10694 = .915.


The estimated variance of the regression is:
s² ≡ sample variance of ^εi = Σ^εi²/(N - k) = ESS/(N - k)
= {2.81² + 5.60² + (-11.34)² + 9.18² + (-7.69)² + 4.07² + (-18.29)² + 15.67²}/(8 - 4) = 908.1/4 = 227.0.

Sample Variance of Y = Σ(Yi - Ȳ)²/(N - 1) = TSS/(N - 1) = 10693.5/(8 - 1) = 1527.6.

R̄² = Corrected R² ≡ 1 - s²/Var[Y] = 1 - 227.0/1527.6 = .851.

Note that 1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .915)(8 - 1)/(8 - 4) = (.085)(7/4) = .149.

The variance-covariance matrix of the estimated coefficients is:

                              ( 0.378621   -0.162062    0.005565    0.118885)
Var[^β] = s²(X’X)-1 = (227.0) (-0.162062    0.276362   -0.022578   -0.230759)
                              ( 0.005565   -0.022578    0.007869    0.020071)
                              ( 0.118885   -0.230759    0.020071    0.194415)

            ( 85.96   -36.79    1.26    26.99)
Var[^β] =   (-36.79    62.75   -5.13   -52.39)
            (  1.26    -5.13    1.79     4.56)
            ( 26.99   -52.39    4.56    44.14)

Note that the above matrix, as with all variance-covariance matrices, is symmetric.

Var[^β1] = 85.96, Var[^β2] = 62.75, Var[^β3] = 1.79, Var[^β4] = 44.14.

Cov[^β1, ^β2] = -36.79, Cov[^β1, ^β3] = 1.26, Cov[^β1, ^β4] = 26.99,
Cov[^β2, ^β3] = -5.13, Cov[^β2, ^β4] = -52.39, Cov[^β3, ^β4] = 4.56.

For example, Corr[^β1, ^β2] = -36.79/√((85.96)(62.75)) = -.50.

The matrix of correlations of ^β is:

             (  1     -.50    .10    .44 )
Corr[^β] =   (-.50      1    -.48   -.995)
             ( .10    -.48     1     .51 )
             ( .44    -.995   .51     1  )


The standard errors of the estimated regression coefficients are:

s_^β1 = √Var[^β1] = √85.96 = 9.27,   s_^β2 = √Var[^β2] = √62.75 = 7.92,
s_^β3 = √Var[^β3] = √1.79 = 1.34,   s_^β4 = √Var[^β4] = √44.14 = 6.64.

The More General the Model, the Smaller the ESS:

The above model, but with β2 fixed at zero, is a special case of the above model, with X2 not entering into the model. The fitted regression with ^β1 = 6.3974, ^β2 = 2.4626, ^β3 = 6.4560, ^β4 = 1.1839 was determined so as to have the least sum of squared errors; in other words, for the given observations, it has the smallest ESS over all possible values of β1, β2, β3, and β4. A more restricted model with β2 fixed at zero has an ESS that can not be larger than that of the unrestricted model.

Minimizing over a larger set, we do at least as well as, and usually better than, minimizing over a subset.161 For example, Susan is the youngest employee in the actuarial department at the Regressive Insurance Company. We know that the youngest employee of the whole Regressive Insurance Company is the same age or younger than Susan, because the actuarial department is a subset of the Regressive Insurance Company.

So in general, adding additional variables to a linear regression model creates a more general model, and will decrease the ESS (on rare occasions the ESS will stay the same).162 The question is whether this improvement in ESS is significant. As for the two-variable model, this can be determined by t-tests and F-tests, as will be discussed.

Testing Individual Coefficients:

As in the two-variable case, we can use the t-test in order to test the hypothesis that an individual coefficient is zero. We have N - k = 8 - 4 = 4 degrees of freedom.

The t-statistics are:163

^β1/s_^β1 = 6.3974/9.27 = .690 ⇒ p-value = .528.
^β2/s_^β2 = 2.4626/7.92 = .311 ⇒ p-value = .771.
^β3/s_^β3 = 6.4560/1.34 = 4.83 ⇒ p-value = .0085.
^β4/s_^β4 = 1.1839/6.64 = .178 ⇒ p-value = .867.

161 This is similar to an idea covered on Exam 4/C. The maximum likelihood Gamma has to have a likelihood at least as large as the maximum likelihood Exponential fit to the same data, since the Exponential Distribution is a special case of the Gamma Distribution with alpha = 1.
162 Since TSS = RSS + ESS, and the total sum of squares depends only on the data, not the model, as we add additional variables to a regression model, the RSS will increase (on rare occasions the RSS will stay the same).
163 p-values obtained via computer, allowing more accuracy than using the t-table.

The critical value at 10% (two-sided test) for the t-distribution for 4 degrees of freedom is 2.132.
Since .690 < 2.132, we do not reject at 10% the hypothesis that β1 = 0.
Since .311 < 2.132, we do not reject at 10% the hypothesis that β2 = 0.
Since .178 < 2.132, we do not reject at 10% the hypothesis that β4 = 0.

The critical value at 1% (two-sided test) for the t-distribution for 4 degrees of freedom is 4.604.
Since 4.83 > 4.604, we reject at 1% the hypothesis that β3 = 0. Saying the same thing somewhat differently, for 4 degrees of freedom and a 1% significance level, the critical region is |t| > 4.604. Since 4.83 is in the critical region, we reject at 1% (two-sided test) the hypothesis that β3 = 0.
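A brief sketch (Python with scipy, assumed) of these t-tests, starting from the fitted coefficients and standard errors quoted above:

    import numpy as np
    from scipy.stats import t

    beta_hat = np.array([6.3974, 2.4626, 6.4560, 1.1839])
    se       = np.array([9.27,   7.92,   1.34,   6.64  ])
    df = 8 - 4                                    # N - k degrees of freedom

    t_stats = beta_hat / se
    p_values = 2 * t.sf(np.abs(t_stats), df)
    print(np.round(t_stats, 3))                   # about 0.690, 0.311, 4.83, 0.178
    print(np.round(p_values, 4))                  # about 0.528, 0.771, 0.0085, 0.867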

As discussed previously, in general, if the test statistic is in the critical region or rejection region, then we reject the null hypothesis. The critical region depends on the significance level and the type of test: t-test, F-Test, etc.

Exercise: For a 5 variable regression model fit to 25 observations, what is the critical region or rejection region for testing the hypothesis that β1 = 0, at a 10% level (2-sided test)?
[Solution: There are N - k = 25 - 5 = 20 degrees of freedom. For a 2-sided test, the critical value for 10% in the t-table is 1.725. The critical region is: |t| > 1.725.]


Exercise: For a 5 variable regression model fit to 25 observations, ^β2 = 23.2 and s_^β2 = 9.7.
Test the hypothesis that β2 = 0.
[Solution: t = ^β2/s_^β2 = 23.2/9.7 = 2.4. There are N - k = 25 - 5 = 20 degrees of freedom.
For a 2-sided test, the critical value for 5% in the t-table is 2.086, and for 2% is 2.528.
Since 2.086 < 2.4 < 2.528, we reject the null hypothesis at 5% and do not reject it at 2%.]

One can also test whether an individual slope has a specific value b. The statistic is then

(^β - b)/ βs , which reduces to the previous case when b = 0. If the value to be tested is not zero,

then most commonly it will be 1.

Exercise: For a 3 variable regression model fit to 43 observations, ^β2= .67 and ^s

β2 = .12.

Test the hypothesis that β2 = 1.

[Solution: t = (^β2-1) / ^s

β2 = (1 - .67)/.12 = -2.75.

There are N - k = 43 - 3 = 40 degrees of freedom. For a 2-sided test, the critical value for 1% (two-sided test) in the t-table is 2.704. Since |-2.75| > 2.704, we reject the null hypothesis at 1%. We conclude (at a 1% significance level) that β2 ≠ 1.]

Summary of the general t-test:1. H0: a particular regression parameter takes on certain value b. H1: H0 is not true.2. t = (estimated parameter - b)/standard error of the parameter.3. If H0 is true, then t follows a t-distribution. 4. Number of degrees of freedom = N - k. 5. Compare the absolute value of the t-statistic to the critical values in the

t-table, for the appropriate number of degrees of freedom. 6. Reject to the left and do not reject to the right.
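The study guide itself uses only the t-table, but the general t-test above is simple to script. Below is a minimal Python sketch (my own illustration, not from the text; scipy is assumed, and the function name t_test_coefficient is made up). The numbers reproduce the test of β3 from the four-variable regression above.

# Minimal sketch of the general t-test for a single regression coefficient.
# Illustrative only; not part of the study guide.
from scipy.stats import t

def t_test_coefficient(estimate, std_error, hypothesized_value, df):
    # t-statistic and two-sided p-value for H0: parameter = hypothesized_value
    t_stat = (estimate - hypothesized_value) / std_error
    p_value = 2 * t.sf(abs(t_stat), df)   # two-sided tail probability
    return t_stat, p_value

# Example from the text: H0: beta3 = 0, estimate 6.4560, standard error 1.34, N - k = 4.
t_stat, p_value = t_test_coefficient(6.4560, 1.34, 0, df=4)
print(t_stat, p_value)   # roughly 4.82 and .009, matching the text's 4.83 and .0085 up to rounding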


Testing Whether All of the Slopes are Zero:

However, one may also be interested in whether two or more slope coefficients are all zero. This involves using the F-Statistic.164 165

Let’s test the hypothesis that all of the slope coefficients are zero in the prior example of a four variable regression: H0: β2 = β3 = β4 = 0.

The F-Statistic is: [RSS/(k-1)] / [ESS/(N - k)] = (9786/3)/(908.2/4) = 3262/227.0 = 14.37.

Note that the numerator and denominator are each sums of squares, divided by their degrees of freedom. The denominator is the estimated variance of the regression, s2 = ESS/(N - k) = 227.0.

For the F-Distribution with 3 and 4 degrees of freedom, the critical values at 5% and 1% are 6.59 and 16.69. Since 6.59 < 14.37 < 16.69, we reject the null hypothesis at 5% and do not reject the null hypothesis at 1%.

The Analysis of Variance (ANOVA) Table is:
          DF   Sum of Sq   Mean Sq   F Ratio   P Value
Model      3       9786       3262     14.37      .013
Error      4        908        227
Total      7      10694

Note that the F-Statistic can also be calculated as:
F-Statistic = [R2/(1 - R2)][(N - k)/(k - 1)] = (.915/(1 - .915))(4/3) = 14.4.

To test the hypothesis that all of the slope coefficients are zero compute the F-Statistic = [RSS/(k-1)] / [ESS/(N - k)] = [R2/(1 - R2)][(N - k)/(k - 1)], which if H0 is true follows an F-Distribution with ν1 = k - 1 and ν2 = N - k.
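As a quick check of the arithmetic above, here is a minimal Python sketch (my own addition, not from the text) that computes this F-Statistic and its p-value from RSS, ESS, N, and k, using the values of the four-variable example.

# Sketch: F-test that all slope coefficients are zero, from RSS, ESS, N, and k.
# Values are from the four-variable example in the text; scipy is an assumption.
from scipy.stats import f

RSS, ESS, N, k = 9786.0, 908.2, 8, 4
F = (RSS / (k - 1)) / (ESS / (N - k))   # = 3262 / 227.0 = 14.37
p_value = f.sf(F, k - 1, N - k)         # upper-tail probability
print(F, p_value)                       # about 14.4 and 0.013, as in the ANOVA table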

Exercise: For a 5 variable regression model fit to 25 observations, RSS = 1382 and TSS = 1945. Test the hypothesis that β2 = β3 = β4 = β5 = 0.
[Solution: F-Statistic = [RSS/(k-1)] / [ESS/(N - k)] = [1382/(5-1)] / [(1945 - 1382)/(25-5)] = 12.27.
          DF   Sum of Sq   Mean Sq   F Ratio
Model      4       1382      345.5     12.27
Error     20        563       28.15
Total     24       1945
Alternately, R2 = RSS/TSS = 1382/1945 = .711. F = (R2/(1 - R2))(N - k)/(k - 1) = 12.3.
There are k - 1 = 4 and N - k = 20 degrees of freedom.
The critical value at 5% is 2.87, and the critical value at 1% is 4.43. 4.43 < 12.3 ⇒ Reject the hypothesis at 1%.]

164 The t-test is a special case of the F-Test.
165 R2 determines F and vice-versa. As discussed previously, if all the slopes are zero, R2 follows a Beta Distribution. Therefore, if one had a table of incomplete beta functions, one could perform an equivalent statistical test using R2.


Distribution of Sums of Squares:*166

TSS = Σ(Yi - Ȳ)2 = Σyi2. If the εi are Normal and independent, then so are the Yi.

Assume each εi has variance σ2; we assume homoscedasticity.

If all of the slopes are zero, then each of the Yi has the same mean, and TSS/σ2 = Σ(Yi - Ȳ)2/σ2 looks like a sum of squared Unit Normals. The sum of the squares of N independent Unit Normals is a Chi-Square Distribution with N degrees of freedom. However, the yi are not independent since they sum to zero. Therefore, we lose one degree of freedom.167

If β2 = β3 = ... = βk = 0, then TSS/σ2 follows a Chi-Square Distribution with N - 1 degrees of freedom.168

Similarly, if β2 = β3 = ... = βk = 0, RSS/σ2 = Σ(^Yi - Ȳ)2/σ2 follows a Chi-Square Distribution with k - 1 degrees of freedom.169

ESS/σ2 = Σ^εi2/σ2 = Σ(Yi - ^Yi)2/σ2 follows a Chi-Square Distribution with N - k degrees of freedom. Therefore, E[ESS/σ2] = N - k,170 so E[s2] = E[ESS/(N-k)] = σ2, and s2 = ESS/(N-k) is an unbiased estimator of σ2.171

If β2 = β3 = ... = βk = 0, then RSS/σ2 and ESS/σ2 have independent Chi-Square Distributions, and
F = [RSS/(k - 1)] / [ESS/(N - k)] = [(RSS/σ2)/(k - 1)] / [(ESS/σ2)/(N - k)] follows an F-Distribution with k - 1 and N - k degrees of freedom.

166 See Section 19.11 of Volume 2 of Kendall’s Advanced Theory of Statistics or Appendix 2 of Statistical Methods for Forecasting by Abraham and Ledolter.
167 If all the slopes are zero, then TSS is the numerator of the sample variance of Y, where the Yi are random samples from the same Normal Distribution. An important statistical result is that in this case, S2(N - 1)/σ2 has a Chi-Square Distribution with N-1 degrees of freedom; which is the same as saying TSS/σ2 has a Chi-Square Distribution with N-1 degrees of freedom. See Theorem 6.3-4 in Hogg and Tanis, or Theorem 3.6.1 in Hogg, McKean, and Craig.
168 If all the slopes are not zero, then TSS/σ2 follows a noncentral Chi-Square Distribution.
169 If all the slopes are not zero, then RSS/σ2 follows a noncentral Chi-Square Distribution.
170 The mean of a Chi-Square Distribution is equal to its number of degrees of freedom.
171 The fact that s2 is an unbiased estimator of σ2 was proven in a previous section, without using the assumption of Normally Distributed Errors. See also Section 19.9 of Volume 2 of Kendall’s Advanced Theory of Statistics.
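These distributional claims are easy to check numerically. The following short Monte Carlo sketch is my own illustration (not from the text): it generates Normal errors with all slopes equal to zero and verifies that ESS/σ2 averages about N - k, consistent with s2 = ESS/(N - k) being unbiased for σ2.

# Simulation sketch (not from the study guide): with Normal errors and all slopes zero,
# ESS/sigma^2 should average N - k, so s^2 = ESS/(N - k) is unbiased for sigma^2.
import numpy as np

rng = np.random.default_rng(0)
N, k, sigma = 8, 4, 5.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, k - 1))])  # fixed design: intercept + 3 regressors

values = []
for _ in range(5000):
    Y = 10.0 + sigma * rng.normal(size=N)            # all slopes zero; intercept 10
    beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    ESS = np.sum((Y - X @ beta_hat) ** 2)
    values.append(ESS / sigma**2)

print(np.mean(values))   # should be close to N - k = 4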


Problems:

20.1 (1 point) A 4-variable model has been fit to 30 points. The estimated first slope parameter ^β1 = -4.421, with standard error 2.203. Test the hypothesis that β1 = 0.

20.2 (1 point) A 6-variable model has been fit to 36 observations. The estimated third slope

parameter ^β3 = 3.13, with standard error .816. Test the hypothesis that β3 = 1.

20.3 (2 points) A 3-variable model (including intercept) has been fit to 15 observations. Regression Sum of Squares (RSS) = 5,018,232. Total Sum of Squares (TSS) = 8,618,428. Test the hypothesis that β2 = β3 = 0.

Use the following information for the next two questions:

For a linear regression model: Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + ε, fit to 23 observations,

R2 = .912.

20.4 (1 point) What is R̄2, the adjusted R2?
(A) .86 (B) .87 (C) .88 (D) .89 (E) .90

20.5 (1 point) What is the value of the F-statistic used to test the hypothesis β2 = β3 = β4 = β5 = 0?
(A) 47 (B) 49 (C) 51 (D) 53 (E) 55

20.6 (2 points) The F-Statistic used to test the hypothesis that all of the slopes are zero is calculated as which of the following?
A. The explained variation divided by the total variation
B. The explained variance divided by the total variance
C. The explained variation divided by the unexplained variation
D. The explained variance divided by the unexplained variance
E. None of A, B, C, or D is true.

* 20.7 (2 points) A multiple linear regression model with k variables has been fit to N observations, N > k. You compute F = [RSS/(k-1)] / [ESS/(N - k)]. Which of the following is not a necessary condition for F to follow an F-Distribution?
A. Homoscedasticity
B. Independent errors
C. All of the actual slopes are zero.
D. The error terms are Normally Distributed.
E. All of the above are necessary.


Use the following information from a three variable regression for the next 4 questions:

N = 15. ^β1 = -20.352, ^β2 = 13.3504, ^β3 = 243.714.

           ( 426076   -2435   -36703)
Var[^β] =  (  -2435    58.85    41.99)
           ( -36703    41.99     4034)

20.8 (1 point) Test the hypothesis that β1 = 1500.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.9 (1 point) Test the hypothesis that β2 = 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.10 (1 point) Test the hypothesis that β3 = 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.11 (3 points) What is the upper limit of a symmetric 95% confidence interval for β1 + 50β2 + 10β3?
A. 3425 B. 3450 C. 3475 D. 3500 E. 3525

20.12 (2 points) You fit the 2-variable linear regression model, Y = α + βX + ε, to 16 observations. ΣXi2 = 1018. ^β = -1.9. ^α = 27.
The t-statistic for testing β = 0 is -2.70.
Test the hypothesis H0: α = 10 versus H1: α ≠ 10.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.


Use the following information for the next 6 questions:
A multiple regression model has been fit to 50 observations.
Coefficient   Fitted Value   Standard Error
β0                88              49
β1                 0.031           0.012
β2                -0.72            0.46
The error sum of squares is 63 and R2 = .84.

20.13 (1 point) Test the hypothesis that β0 = 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.14 (1 point) Test the hypothesis that β1 = 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.15 (1 point) Test the hypothesis that β2 = 0.
A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.

20.16 (1 point) Determine the Standard Error of the regression.
A. 1.0 B. 1.2 C. 1.4 D. 1.6 E. 1.8

20.17 (2 points) Determine the adjusted R2 of the regression.

20.18 (2 points) Test the hypothesis that β0 = β1 = β2 = 0.


Use the following information for the next three questions:
One has fit a regression model with 5 variables (4 independent variables plus the intercept).
One is testing the hypothesis H0: β2 = β3 = β4 = β5 = 0, versus the alternative hypothesis that H0 is false.

20.19 (1 point) With 15 observations, what is the critical region for a test at a 5% significance level?

20.20 (1 point) With 30 observations, what is the critical region for a test at a 5% significance level?

20.21 (1 point) Compare the probability of a Type II error for the tests in the two previous questions, all else being equal.

20.22 (Course 120 Sample Exam #2, Q.4) (2 points) You apply all possible regression models to a set of five observations with three explanatory variables. You determine ESS, the sum of squared errors (or residuals), for each of the models:
Model   Variables in the Model   ESS
I       X2                       5.85
II      X3                       8.45
III     X4                       6.15
IV      X2, X3                   5.12
V       X2, X4                   4.35
VI      X3, X4                   1.72
VII     X2, X3, X4               0.07
You also determine that the estimated variance of the dependent variable Y is 2.2.
Calculate the value of the F statistic for testing the significance of adding the variable X4 to the model Yi = β1 + β2X2i + εi.
(A) 0.3 (B) 0.7 (C) 1.0 (D) 1.4 (E) 1.7

20.23 (Course 120 Sample Exam #2, Q.5) (2 points) You perform a regression of Y on X2 and X3.
You determine: ^Yi = 20.0 - 1.5X2i - 2.0X3i.
Source       Sum of Squares   Degrees of Freedom   Mean Sum of Squares   F-Ratio
Regression         42                 2                    21             5.25
Error              12                 3                     4
Total              54                 5

           ( 4/3   -1/4   -1/3)
(X’X)-1 =  (-1/4   1/16     0 )
           (-1/3     0     2/3)
Calculate the value of the t statistic for testing the null hypothesis H0: β3 = 1.
(A) -0.9 (B) -1.2 (C) -1.8 (D) -3.0 (E) -5.0


20.24 (Course 120 Sample Exam #3, Q.8) (2 points) You fit the following model to 10 observations: Y = β1 + β2X2 + β3X3 + ε.
You are given: RSS = 61.3. TSS = 128.
You then fit the following new model, with the additional variable X4, to the same data:
Y = β1 + β2X2 + β3X3 + β4X4 + ε. For this new model, you determine: RSS = 65.6, TSS = 128.
Calculate the value of the F statistic to test H0: β4 = 0.
(A) 0.01 (B) 0.41 (C) 1.76 (D) 4.30 (E) 10.40


Section 21, Additional Tests of Slopes

This section continues the discussion of tests of hypotheses about the slopes of multiple regression models.

Testing Whether Subgroups of Slopes are Zero:

In a similar manner to testing whether all of the slopes are zero, one can also use the F-Distribution to test whether groups of slope coefficients are equal to zero. For example, in the 4 variable regression model discussed in the previous section, take H0: β2 = β4 = 0.

Then run a regression for the new model: Y = β1 + β3X3 + ε.

The result is ^β1 = 38.5349, ^β3 = 4.3798.

The residuals of this regression, ^εi, are:
X3     Yi      ^Yi     ^εi = Yi - ^Yi
 1      6     42.91      -36.91
-1      8     34.16      -26.16
 4     33     56.05      -23.05
-4     14     21.02       -7.02
 0     40     38.53        1.47
 8    118     73.57       44.43
-8      2      3.50       -1.50
-6     61     12.26       48.74

Error Sum of Squares = ESS = residual variation ≡ Σ^εi2 = Σ(Yi - ^Yi)2 = 6982.

ESSUR = Error Sum of Squares of the Unrestricted model = 908.
ESSR = Error Sum of Squares of the Restricted model = 6982.
q = dimension of the restriction = variables for unrestricted model - variables for restricted model = 4 - 2 = 2.
N = number of observations = 8. k = variables for the unrestricted model = 4.
Then the F-statistic is:172 [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(6982 - 908)/2] / [908/(8 - 4)] = 3037/227 = 13.37.
This is an F-Statistic with 2 and 4 degrees of freedom.

Using the table with ν1 = 2 and ν2 = 4, the critical values are 6.94 and 18.00 for 5% and 1% respectively. Since 6.94 < 13.37 < 18.00, we reject at 5% and do not reject at 1% the null hypothesis, that β2 = β4 = 0.

172 The numerator and denominator are each sums of squares, divided by their degrees of freedom. See formula 5.20 in Pindyck and Rubinfeld.


In general when testing whether some set of slope coefficients are all zero, or whether some linear relationship holds between two slope coefficients:173
F-Statistic = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)].

The ESS for the restricted model is greater than or equal to the ESS for the unrestricted model; the F-Statistic is always non-negative.

Since TSS = RSS + ESS, and the total sum of squares depends only on the data, not the model, we can rewrite the numerator of this F-Statistic as a difference of Regression Sums of Squares, but in the opposite order: F-Statistic = [(RSSUR - RSSR)/q] / [ESSUR/(N - k)].

The case of testing whether all the slope coefficients are zero is a special case of the above, with q = k - 1, ESSUR = ESS, RSSUR = RSS, and RSSR = 0, so that F-Statistic = [RSS/(k-1)] / [ESS/(N - k)].

Since ESS = (1 - R2)TSS, and the total sum of squares depends only on the data, not the model, we can rewrite this F-Statistic in terms of R2.
F = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [((1 - R2R)TSS - (1 - R2UR)TSS)/q] / [(1 - R2UR)TSS/(N - k)].
F-Statistic = [(R2UR - R2R)/q] / [(1 - R2UR)/(N - k)].174
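A minimal Python sketch of this restricted-versus-unrestricted F-test follows (my own illustration, not from the text; the function name restriction_f_test is made up). The example values are those of the test of H0: β2 = β4 = 0 worked above.

# Sketch of the restricted vs. unrestricted F-test; illustrative only.
from scipy.stats import f

def restriction_f_test(ess_restricted, ess_unrestricted, q, N, k):
    # F = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)], with q and N - k degrees of freedom
    F = ((ess_restricted - ess_unrestricted) / q) / (ess_unrestricted / (N - k))
    return F, f.sf(F, q, N - k)

# Example from the text: H0: beta2 = beta4 = 0, ESSR = 6982, ESSUR = 908, q = 2, N = 8, k = 4.
print(restriction_f_test(6982, 908, 2, 8, 4))   # F about 13.4; p-value between 1% and 5%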

Exercise: Test the null hypothesis that β2 = β3 = 0.

[Solution: Run a regression for the new model: Y = β1 + β4X4 + ε.
The result is ^β1 = 16.3214, ^β4 = 1.8929.
The residuals of this regression, ^εi, are: -2.75, -8.32, 9.11, -17.46, 0.96, 71.39, 52.18, -0.75.
Error Sum of Squares = ESS = residual variation ≡ Σ^εi2 = 8286.
ESSUR = Error Sum of Squares of the Unrestricted model = 908.
Then the F-statistic is: [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(8286 - 908)/2] / [908/(8 - 4)] = 3689/227 = 16.25.

Using the table with ν1 = 2 and ν2 = 4, the critical values are 6.94 and 18.00 for 5% and 1% respectively. Since 6.94 < 16.25 < 18.00, we reject at 5% and do not reject at 1% the null hypothesis, that β2 = β3 = 0.]

173 This is a specific example of what is called a Wald Test.
174 See formula 5.21 in Pindyck and Rubinfeld.


t-test as a special case of the F-Test:175

If one applies the F-Test to a single slope, it is equivalent to the previously discussed t-test.

For example, in the four-variable regression example, one could use the F-Test to test the hypothesis that β3 = 0. There is one restriction, so that q = 1. N - k = 8 - 4 = 4.

F-Statistic = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = (4)(ESSR - ESSUR)/ESSUR = (4)(ESSR/ESSUR - 1), with 1 and 4 degrees of freedom.
ESSUR = Error Sum of Squares of the Unrestricted model = 908.

Run a regression for the restricted model: Y = β1 + β2X2 + β4X4 + ε.

The result is ^β1 = 1.83186, ^β2 = 20.9849, ^β4 = -15.2823.
The residuals of this regression, ^εi, are: -14.99, -14.82, 29.34, 8.52, -9.284, 45.91, -50.93, 6.24.
Error Sum of Squares = ESSR ≡ Σ^εi2 = 6205.

F-Statistic = (4)(6205/908 - 1) = 23.33, with 1 and 4 degrees of freedom.

Consulting the F-Table, since 21.20 < 23.33, we reject the hypothesis at the 1% level. This is the same conclusion as drawn previously based on the t-test.

Using a computer, one can determine that the p-value is .0085. This is the same p-value as determined previously based on the t-statistic.

In fact the t-statistic was 4.83 = √23.33 = square root of the F-Statistic.

In general, applying the F-Test to a single slope is equivalent to the t-test with t = √F.

175 See the section on the F-Distribution, for a discussion of its relationship to the t-distribution when ν1 =1.


Linear Relationship Between Slope Coefficients:

In a similar manner, one can also use the F-distribution to test the hypothesis that the slope coefficients satisfy a specific linear relationship.

For example one can test the hypothesis H0: β2 + β4 = 1.176

To obtain the restricted model, substitute β4 = 1 - β2 to yield the model:

Y - X4 = β1 + β2(X2 - X4) + β3X3 + ε.
X2   X3   X4     Y    X2-X4   Y-X4
-2    1   -4     6      2      10
 1   -1    0     8      1       8
 3    4    4    33     -1      29
 6   -4    8    14     -2       6
11    0   12    40     -1      28
15    8   16   118     -1     102
17   -8   20     2     -3     -18
20   -6   24    61     -4      37

Run a regression for the new model: ^β1 = 18.7371, ^β2 = -10.5707, ^β3 = 7.1723.

The residuals of this regression, ^εi, are: 5.23, 7.01, -29.00, -5.19, -1.31, 15.31, -11.07, 19.01.
Error Sum of Squares = ESS = residual variation ≡ Σ^εi2 = 1665.
ESSUR = Error Sum of Squares of the Unrestricted model = 908.
The restriction is one dimensional; q = 1. Then the F-statistic is:
[(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(1665 - 908)/1] / [908/(8 - 4)] = 757/227 = 3.33.

Using the table with ν1 = 1 and ν2 = 4, the critical values are 7.71 and 21.20 for 5% and 1%

respectively. Since 3.33 < 7.71, we do not reject at 5% the null hypothesis, that β2 + β4 = 1.

As before, F-Statistic = (ESSR - ESSUR)/q / ESSUR/(N - k).
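To see the substitution trick in code, here is a sketch (my own illustration, using numpy and scipy, which are not part of the guide) that tests H0: β2 + β4 = 1 on the eight observations used above, by fitting the unrestricted model and the restricted model for Y - X4 and comparing their Error Sums of Squares.

# Sketch of testing H0: beta2 + beta4 = 1 by substituting beta4 = 1 - beta2 (data from the text).
import numpy as np
from scipy.stats import f

X2 = np.array([-2, 1, 3, 6, 11, 15, 17, 20], dtype=float)
X3 = np.array([1, -1, 4, -4, 0, 8, -8, -6], dtype=float)
X4 = np.array([-4, 0, 4, 8, 12, 16, 20, 24], dtype=float)
Y  = np.array([6, 8, 33, 14, 40, 118, 2, 61], dtype=float)

def ess(design, response):
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return np.sum((response - design @ beta) ** 2)

ones = np.ones(len(Y))
ess_ur = ess(np.column_stack([ones, X2, X3, X4]), Y)          # unrestricted: about 908
ess_r  = ess(np.column_stack([ones, X2 - X4, X3]), Y - X4)    # restricted: about 1665
F = ((ess_r - ess_ur) / 1) / (ess_ur / (8 - 4))               # about 3.3, as in the text
print(F, f.sf(F, 1, 4))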

Exercise: For the four variable regression example, test the hypothesis β3 = β4.

[Solution: To obtain the restricted model, substitute β3 = β4 to yield the model:

Y = β1 + β2X2 + β3X3 + β3X4 + ε. ⇔ Y = β1 + β2X2 + β3(X3 + X4) + ε.
Fitting a regression to the restricted model: ^β1 = 10.082, ^β2 = -4.306, ^β3 = 6.853. ESSR = 1080. ESSUR = 908.
F-Statistic = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(1080 - 908)/1] / [908/(8 - 4)] = .76.
For 1 and 4 degrees of freedom, the critical value for 5% is 7.71. Since .76 < 7.71, we do not reject at 5% the null hypothesis that β3 = β4.
Comment: With so few observations, it is difficult to reject hypotheses.]

176 See 4, 11/01, Q.21.


The Equality of the Coefficients of Two Regressions:177

We can also use the F-Statistic to test the equality of the coefficients of two similar regressions fit to different data sets.

Previously we fit the model:178 Y = β1 + β2X2 + β3X3 + β4X4 + ε, to 8 observations:
X2   X3   X4     Y
-2    1   -4     6
 1   -1    0     8
 3    4    4    33
 6   -4    8    14
11    0   12    40
15    8   16   118
17   -8   20     2
20   -6   24    61

The result was ^β1 = 6.3974, ^β2 = 2.4626, ^β3 = 6.4560, ^β4 = 1.1839, with ESS = 908.

Let’s assume this data was from geographical region A.
Let’s assume we have 6 similar observations from a neighboring geographical region B:
X2   X3   X4     Y
-5    2    3    18
 7   -3    5    -5
13    5   11    98
19   -1    9    47
24    6   14   109
26    3    2   121

Exercise: With the aid of a computer, fit a linear regression to these 6 observations from Region B.

[Solution: ^β1 = 21.1185, ^β2 = 2.7666, ^β3 = 10.5036, ^β4 = -2.2079, with ESS = 310.]

If we assume that the variances of the two models are equal179, we can pool the data into one model.

Exercise: With the aid of a computer, fit a linear regression to the combined 14 observations from both regions.

[Solution: ^β1 = 7.4757, ^β2 = 2.8453, ^β3 = 6.7521, ^β4 = .6758, with ESS = 2248.]

177 See Section 5.3.3 of Econometric Models and Economic Forecasts.
178 With the aid of a computer.
179 One can test for heteroscedasticity.


We note that ESSC = 2248 > 1218 = 908 + 310 = ESSA + ESSB. With separate models, and therefore 4 extra parameters, we are able to reduce the ESS and get a better fit. However, we can always get a model that appears to fit better by adding additional parameters!

Is it better to use the separate models for regions A and B, or use the combined model C for both regions? One can do an F-Test in order to test H0: model C applies to both regions. ⇔ H0: the coefficients for regions A and B are the same.

This is just an application of ideas previously discussed. One takes C as the restricted model, and separate A plus separate B as the unrestricted model.
There are 8 + 6 = 14 total observations ⇒ N = 14.
There are four restrictions (equality of the 4 coefficients in the two regions) ⇒ q = 4.
There are 8 coefficients being fit in the unrestricted model (4 in region A and 4 in region B) ⇒ k = 8.
ESSR = ESSC. ESSUR = ESSA + ESSB.
F = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(ESSC - ESSA - ESSB)/4] / [(ESSA + ESSB)/6] = [(2248 - 908 - 310)/4] / [(908 + 310)/6] = 1.27.
This F-statistic has 4 and 6 degrees of freedom, with critical value at 5% of 4.53.
Since 1.27 < 4.53, we do not reject H0.180
In general, F = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(ESSC - ESSA - ESSB)/k] / [(ESSA + ESSB)/(NA + NB - 2k)], with k and NA + NB - 2k degrees of freedom.181

For the above example, F = [(2248 - 908 - 310)/4] / [(908 + 310)/(8 + 6 - (2)(4))] = 1.27, with 4 and 8 + 6 - (2)(4) = 6 degrees of freedom.
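A short Python sketch of this test for pooling two data sets follows (my own illustration, not from the text; the function name pooling_f_test is made up). The inputs are the Error Sums of Squares for regions A and B and for the combined fit, as computed above.

# Sketch of the test for equality of the coefficients of two regressions (a Chow-type test).
from scipy.stats import f

def pooling_f_test(ess_combined, ess_a, ess_b, k, n_a, n_b):
    # F = [(ESSC - ESSA - ESSB)/k] / [(ESSA + ESSB)/(NA + NB - 2k)]
    num = (ess_combined - ess_a - ess_b) / k
    den = (ess_a + ess_b) / (n_a + n_b - 2 * k)
    F = num / den
    return F, f.sf(F, k, n_a + n_b - 2 * k)

print(pooling_f_test(2248, 908, 310, k=4, n_a=8, n_b=6))   # F about 1.27; do not reject H0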

Exercise: The same two-variable linear regression model is fit to similar data from two different insurers. For the model fit to 13 observations from the first insurer ESS = 792. For the model fit to 17 observations from the second insurer ESS = 951. When the same model is fit to all 30 observations, ESS = 2370. Test the hypothesis that the model for the first insurer is identical to that for the second insurer.
[Solution: F = [(ESSC - ESS1 - ESS2)/k] / [(ESS1 + ESS2)/(N1 + N2 - 2k)] = [(2370 - 792 - 951)/2] / [(792 + 951)/(13 + 17 - (2)(2))] = (627/2)/(1743/26) = 4.68, with 2 and 26 degrees of freedom. For 2 and 26 degrees of freedom, the 5% critical value is 3.37 and the 1% critical value is 5.53. Since 3.37 < 4.68 < 5.53, we reject the hypothesis at the 5% level, but not at the 1% level.]

180 With 4 independent variables including the constant, and so few observations, it is difficult to reject H0.
181 See Equation 5.25 in Econometric Models and Economic Forecasts.


Problems:

Use the following information for the next 6 questions:
Six models have been fit to 25 observations:
Model I: Y = β1 + β2X2 + β3X3 + β4X4 + ε

Model II: Y = β1 + β2(X2 + X3) + β4X4 + ε

Model III: Y = β1 + β4X4 + ε

Model IV: Y - X3 = β1 + β2(X2 - X3) + β4X4 + ε

Model V: Y = β1 + β2(X2 + X3 + X4) + ε

Model VI: Y - X4 = β1 + β2(X2 - X4) + β3X3 + ε
You are given:
Model   Error Sum of Squares (ESS)
I       2721
II      3024
III     3763
IV      3406
V       3245
VI      3897

21.1 (2 points) Calculate the value of the F statistic used to test the hypothesis H0: β2 = β3 = 0.

(A) 2 (B) 3 (C) 4 (D) 5 (E) 6

21.2 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.3 (2 points) Calculate the value of the F statistic used to test the hypothesis H0: β2 = β3.(A) 2 (B) 3 (C) 4 (D) 5 (E) 6

21.4 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.5 (2 points) Calculate the value of the F statistic used to test the hypothesis H0: β2 + β4 = 1.

(A) 8 (B) 9 (C) 10 (D) 11 (E) 12

21.6 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?


Use the following information for the next two questions:
For each of five policy years an actuary has estimated the ultimate losses based on the information available at the end of that policy year.
Policy Year   Estimated   Actual Ultimate
1991              45            43
1992              50            58
1993              55            63
1994              60            76
1995              65            78

21.7 (3 points) Let Xt be the actuary’s estimate and Yt be the actual ultimate.

Fit the ordinary least squares model, Yt = α + βXt + εt, and determine the ESS.

21.8 (3 points) The null hypothesis is H0: α = 0, β = 1.Perform an F test of the null hypothesis.

21.9 (3 points) You fit the model Y = β1 + β2X2 + β3X3 + ε, separately to similar data from two states.
For the regression to the 30 observations from the state of Southern Exposure, the error sum of squares is: 2573.
For the regression to the 20 observations from the state of Northern Exposure, the error sum of squares is: 2041.
For a similar regression to the 50 observations from both states combined, the error sum of squares is: 5735.
Which of the following is the F-statistic for testing the hypothesis that the models fit separately in the two states are the same?
(A) Less than 2
(B) At least 2, but less than 3
(C) At least 3, but less than 4
(D) At least 4, but less than 5
(E) At least 5

21.10 (2 points) For a regression model containing all the independent variables:
Source of Variation   Degrees of Freedom   Sum of Squares
Regression                    5                72,195
Error                        33                22,070
You exclude some of the independent variables, and for this revised regression model:
Source of Variation   Degrees of Freedom   Sum of Squares
Regression                    3                63,021
Determine the F ratio to use to test the hypothesis that the coefficients for the excluded variables are all equal to zero.
(A) 6 (B) 7 (C) 8 (D) 9 (E) 10


21.11 (3 points) Data Minor works at the Fitz Insurance Company (FIC). He has access to a large data base of information on long haul trucking insureds for which FIC writes commercial automobile insurance.
For each of 16 such large insureds, Data has their claim frequency over the most recent 3 years and 25 different characteristics that vary across these insureds.
Over the weekend, Data has his computer fit every possible 6 variable linear regression model (5 independent variables and an intercept).
The computer ranks these regressions by R2; the largest R2 is 0.92.
On Monday, Data very proudly presents this regression model to his boss Ernest Checca.
What are some things Ernest should check or consider?

21.12 (Course 120 Sample Exam #1, Q.5 & Course 4 Sample Exam, Q.35) (2.5 points) To determine the relationship of salary (Y) to years of experience (X2) for both men (X3 = 1) and women (X3 = 0) you fit the model Yi = β1 + β2X2i + β3X3i + β4X2iX3i + εi to a set of observations from a sample of 11 employees. For this model you are given:
Source of Variation   Degrees of Freedom   Sum of Squares
Regression                    3               330.0117
Error                         7                12.8156
You also fit the model Yi = β*1 + β*2X2i + ε*i to the observations. For this model you are given:
Source of Variation   Degrees of Freedom   Sum of Squares
Regression                    1               315.0992
Error                         9                27.7281
Determine the F ratio to use to test whether the linear relationship between salary and years of experience is identical for men and women.
(A) 0.6 (B) 2.0 (C) 3.5 (D) 4.1 (E) 6.2

21.13 (Course 120 Sample Exam #1, Q.6 & Course 4 Sample Exam, Q.12) (2.5 points) To predict insurance sales using eight independent variables you fit two regression models based on 27 observations.
The first model contains all eight independent variables. For this model you are given:
Source of Variation   Degrees of Freedom   Sum of Squares
Regression                    8               115,175
Error                        18                76,893
The second model contains only the first two independent variables. For this model you are given:
Source of Variation   Degrees of Freedom   Sum of Squares
Regression                    2                65,597
Error                        24               126,471
Determine the F ratio to use to test the hypothesis that the coefficients for the third through the eighth independent variables are all equal to zero.
(A) 5.8 (B) 4.5 (C) 2.6 (D) 1.9 (E) 1.6


21.14 (4, 5/00, Q.9) (2.5 points) The following models are fitted to 30 observations:
Model I: Y = β1 + β2X2 + ε
Model II: Y = β1 + β2X2 + β3X3 + β4X4 + ε
You are given:
(i) Σ(Y - Ȳ)2 = 160
(ii) Σ(X2 - X̄2)2 = 10
(iii) For Model I, ^β2 = -2
(iv) For Model II, R2 = 0.70
Determine the value of the F statistic used to test that β3 and β4 are jointly equal to zero.
(A) Less than 15
(B) At least 15, but less than 18
(C) At least 18, but less than 21
(D) At least 21, but less than 24
(E) At least 24


21.15 (4, 11/00, Q.21) (2.5 points) You are given the following two regression models, each based on a different population of data:
Model A: Yi = A1 + A2X2i + A3X3i + εi where i = 1, 2, ..., 30.
Model B: Yj = B1 + B2X2j + B3X3j + εj where j = 1, 2, ..., 50.
You assume that the variances of the two models are equal and pool the data into one model:
Model G: Yp = G1 + G2X2p + G3X3p + εp where p = 1, 2, ..., 80.
You calculate the adjusted R2, denoted R̄2model, and the error sum of squares, denoted ESSmodel, for all three models.
Which of the following is the F statistic for testing the hypothesis that Model A is identical to Model B?
(A) F3,74 = [(ESSG - ESSA - ESSB)/3] / [(ESSA + ESSB)/74]
(B) F6,77 = [(ESSG - ESSA - ESSB)/6] / [(ESSA + ESSB)/77]
(C) F6,74 = [(ESSG - ESSA - ESSB)/6] / [(ESSA + ESSB)/74]
(D) F3,74 = [(R̄2G - R̄2A - R̄2B)/3] / [(R̄2A + R̄2B)/74]
(E) F6,77 = [(R̄2G - R̄2A - R̄2B)/6] / [(R̄2A + R̄2B)/77]


21.16 (4, 11/01, Q.21) (2.5 Points) Three models have been fit to 20 observations:
Model I: Y = β1 + β2X2 + β3X3 + ε
Model II: Y = β1 + β2(X2 + X3) + ε
Model III: Y - X3 = β1 + β2(X2 - X3) + ε
You are given:
Model   ESS
I       484
II      925
III     982
Calculate the value of the F statistic used to test the hypothesis H0: β2 + β3 = 1.
(A) Less than 15
(B) At least 15, but less than 16
(C) At least 16, but less than 17
(D) At least 17, but less than 18
(E) At least 18

21.17 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.18 (2 Points) Using the information in question 4, 11/01, Q.21, calculate the value of the F statistic used to test the hypothesis H0: β2 = β3.
(A) Less than 15
(B) At least 15, but less than 16
(C) At least 16, but less than 17
(D) At least 17, but less than 18
(E) At least 18

21.19 (1 point) In the prior question, at what level, if any, do you reject the null hypothesis?

21.20 (4, 11/02, Q.27) (2.5 Points) For the multiple regression model Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + β6X6 + ε, you are given:
(i) N = 3,120
(ii) TSS = 15,000
(iii) H0: β4 = β5 = β6 = 0
(iv) R2UR = 0.38
(v) RSSR = 5,565
Determine the value of the F statistic for testing H0.
(A) Less than 10
(B) At least 10, but less than 12
(C) At least 12, but less than 14
(D) At least 14, but less than 16
(E) At least 16


21.21 (4, 11/03, Q.20) (2.5 Points) At the beginning of each of the past 5 years, an actuary has forecast the annual claims for a group of insureds. The table below shows the forecasts (X) and the actual claims (Y). A two-variable linear regression model is used to analyze the data.

t   Xt    Yt
1   475   254
2   254   463
3   463   515
4   515   567
5   567   605

You are given:
(i) The null hypothesis is H0: α = 0, β = 1.
(ii) The unrestricted model fit yields ESS = 69,843.
Which of the following is true regarding the F test of the null hypothesis?
(A) The null hypothesis is not rejected at the 0.05 significance level.
(B) The null hypothesis is rejected at the 0.05 significance level, but not at the 0.01 level.
(C) The numerator has 3 degrees of freedom.
(D) The denominator has 2 degrees of freedom.
(E) The F statistic cannot be determined from the information given.

21.22 (4, 11/04, Q.19) (2.5 points) You are given the following information about a linear regression model:
(i) The unit of measurement is a region, and the number of regions in the study is 37.
(ii) The dependent variable is a measure of workers’ compensation frequency, while the three independent variables are a measure of employment, a measure of unemployment rate and a dummy variable indicating the presence or absence of vigorous cost-containment efforts.
(iii) The model is fitted separately to the group of 18 largest regions and to the group of 19 smallest regions (by population). The ESS resulting from the first fit is 4053, while the ESS resulting from the second fit is 2087.
(iv) The model is fitted to all 37 regions, and the resulting ESS is 10,374.
The null hypothesis to be tested is that the pooling of the regions into one group is appropriate. Which of the following is true?
(A) The F statistic has 4 numerator degrees of freedom and 29 denominator degrees of freedom, and it is statistically significant at the 5% significance level.
(B) The F statistic has 4 numerator degrees of freedom and 29 denominator degrees of freedom, and it is not statistically significant at the 5% significance level.
(C) The F statistic has 4 numerator degrees of freedom and 33 denominator degrees of freedom, and it is not statistically significant at the 5% significance level.
(D) The F statistic has 8 numerator degrees of freedom and 33 denominator degrees of freedom, and it is statistically significant at the 5% significance level.
(E) The F statistic has 8 numerator degrees of freedom and 33 denominator degrees of freedom, and it is not statistically significant at the 5% significance level.


21.23 (VEE-Applied Statistics Exam, 8/05, Q.7) (2.5 points) You use the model Y = α + βX + ε to analyze the following data:
i   Xi    Yi
1    1   2.8
2    2   2.9
3    3   3.6
4    4   4.7
5    5   6.2
You are given:
(i) The null hypothesis is H0: α = β.
(ii) The unrestricted model fit yields α = 1.46 and β = 0.86.
(iii) The restricted model fit yields α = β = 0.993.
Which of the following is true regarding the F test of the null hypothesis?
(A) The null hypothesis is not rejected at the 5% significance level.
(B) The null hypothesis is rejected at the 5% significance level but not at the 1% level.
(C) The numerator has 2 degrees of freedom.
(D) The denominator has 2 degrees of freedom.
(E) The F statistic cannot be determined from the information given.


Mahler’s Guide to

Regression

Sections 22-24:

22 Additional Models
23 Dummy Variables
24 Piecewise Linear Regression

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-F

New England Actuarial Seminars          Howard Mahler
POB 315                                 [email protected]
Sharon, MA, 02067
www.neas-seminars.com


Section 22, Additional Models

One can create additional models via change of variables and other techniques. Change of variables can be used to convert models into those that are linear in the parameters.

For example, Y = 10 X1³ X2² (error) is a multiplicative model. ln(Y) = ln(10) + 3ln(X1) + 2ln(X2) + ln(error) is an equivalent model, linear in its parameters.

More generally, such a model would be written as: ln(Y) = α + β ln(X1) + γ ln(X2) + ln(error), and the linear regression techniques could be applied. If the errors of the original relationship are LogNormal, then those of the transformed relationship are Normal.

Exponential Regression:

If one assumes that on average costs increase at $5 per year, and they are $100 at time 0, then a reasonable model is: Yt = 100 + 5t + errort. If instead, we assume that costs increase on

average 5% per year, then a reasonable model is: Yt = (100)(1.05^t)(errort).

This second model can be rewritten: ln Yt = ln(100) + t ln(1.05) + ln(errort).

This is of the form: Zt = α + βt + εt, where Zt = ln Yt. By a change of variables we have

managed to transform the original model into a two-variable linear regression model. If the errors of the original relationship are LogNormal, then those of the transformed relationship are Normal.

This is usually referred to as an exponential regression, since the original model can be written as Yt = exp[ln(100) + t ln(1.05) + ln(errort)].

Exponential regression: ln[Yi] = α + βXi + εi. ⇔ Yi = exp[α + βXi + εi].

More generally, an exponential model with more independent variables would be: Y = exp[β1 + β2X2 + ...+ βnXn + ln(error)] ⇔ lnY = β1 + β2X2 + ...+ βnXn + ln(error).

Exercise: Fit an exponential regression to the following data.
Year: 1 2 3 4 5
Average Claim Cost: 300 320 345 370 400
[Solution: Fit a linear regression to the natural log of the claim sizes:
ln(300) = 5.704, ln(320) = 5.768, ln(345) = 5.844, ln(370) = 5.914, ln(400) = 5.991.
X̄ = 3. x = -2, -1, 0, 1, 2. Σxi2 = 10.
Σxiyi = (-2)(5.704) + (-1)(5.768) + (0)(5.844) + (1)(5.914) + (2)(5.991) = 0.72.
^β = Σxiyi / Σxi2 = .72/10 = .072. ^α = Ȳ - ^βX̄ = 5.844 - (.072)(3) = 5.628.]


If Y is the average claim cost, then the fitted model is: ln(Y) = 5.628 + .072t.
Exponentiating both sides, and noting that e^5.628 = 278.1 and that e^.072 = 1.075, this model is equivalent to: Y = 278.1(1.075^t). This represents a constant annual inflation rate of 7.5%.

In general, exponential regression ⇔ constant percentage rate of inflation.182

Using this exponential regression, one could predict the average claim cost in the future.
For example, for year 6, the predicted average claim cost is: 278.1(1.075^6) = 429.
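The exponential regression above is just a straight-line fit to ln(Y). Here is a minimal Python sketch (my own illustration, not part of the guide) that reproduces the fitted 7.5% trend and the year 6 prediction of about 429.

# Sketch of the exponential regression: fit a straight line to ln(Y), then exponentiate.
import numpy as np

t = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([300, 320, 345, 370, 400], dtype=float)

slope, intercept = np.polyfit(t, np.log(y), 1)   # least squares line through (t, ln y)
print(np.exp(intercept), np.exp(slope))          # about 278 and 1.075: Y = 278.1(1.075^t)
print(np.exp(intercept + slope * 6))             # predicted year 6 cost, about 429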

Fitting a Quadratic:

Assume we have the following heights of eight fathers and their adult sons (in inches):
Father   Son
  53      56
  54      58
  57      61
  58      60
  61      63
  62      62
  63      65
  66      64
Here is a graph of the least squares line fit in a previous section, y = .6254x + 24.07:

[Graph: the eight (Father, Son) height pairs with the fitted least squares line y = .6254x + 24.07; Father's height on the horizontal axis, Son's height on the vertical axis.]

182 While the expected percent changes are the same, due to the random element the observed changes will vary.


One can also fit higher order polynomials by least squares. Here is an example of fitting a second degree polynomial, y = β1 + β2x + β3x2, to this same data.

Let Vi = Xi2 and vi = Vi - E[X2]. In deviations form, the fitted least squares coefficients are:183
^β2 = (Σxiyi Σvi2 - Σviyi Σxivi) / (Σxi2 Σvi2 - (Σxivi)2).
^β3 = (Σviyi Σxi2 - Σxiyi Σxivi) / (Σxi2 Σvi2 - (Σxivi)2).
^β1 = Ȳ - ^β2 X̄ - ^β3 E[X2].

E[X2] = (53² + 54² + 57² + 58² + 61² + 62² + 63² + 66²)/8 = 3528.5.

vi = (53², 54², 57², 58², 61², 62², 63², 66²) - 3528.5 = (-719.5, -612.5, -279.5, -164.5, 192.5, 315.5, 440.5, 827.5).

   xi       yi      xiyi       vi      xivi      viyi      xi²        vi²
-6.25   -5.125    32.031   -719.5    4496.9    3687.4    39.062   517680.2
-5.25   -3.125    16.406   -612.5    3215.6    1914.1    27.562   375156.2
-2.25   -0.125     0.281   -279.5     628.9      34.9     5.062    78120.2
-1.25   -1.125     1.406   -164.5     205.6     185.1     1.562    27060.2
 1.75    1.875     3.281    192.5     336.9     360.9     3.062    37056.2
 2.75    0.875     2.406    315.5     867.6     276.1     7.562    99540.2
 3.75    3.875    14.531    440.5    1651.9    1706.9    14.062   194040.2
 6.75    2.875    19.406    827.5    5585.6    2379.1    45.562   684756.2
Sum  0        0   89.750      0.0   16989.0   10544.5   143.500  2013410.0

^β2 = (Σxiyi Σvi2 - Σviyi Σxivi) / (Σxi2 Σvi2 - (Σxivi)2) =
[(89.75)(2013410) - (10544.5)(16989)] / [(143.5)(2013410) - (16989)²] = 1563037/298214 = 5.24133.

^β3 = (Σviyi Σxi2 - Σxiyi Σxivi) / (Σxi2 Σvi2 - (Σxivi)2) =
[(10544.5)(143.5) - (89.75)(16989)] / [(143.5)(2013410) - (16989)²] = -11627/298214 = -.0389888.

X̄ = 59.25. Ȳ = 61.125.
^β1 = Ȳ - ^β2 X̄ - ^β3 E[X2] = 61.125 - (5.24133)(59.25) - (-.0389888)(3528.5) = -111.852.

183 Below I will show how to fit a quadratic in matrix form and via the Normal Equations. You can use whichever form you find most convenient.


Here is a graph of the fitted polynomial, y = -111.852 + 5.24133x - 0.0389888x2:

[Graph: the same data with the fitted quadratic; Father's height on the horizontal axis, Son's height on the vertical axis.]

One convenient way to fit least squares polynomials is to write the regression equations in matrix form, as discussed in a previous section. Here is the same example in matrix form. The design matrix, X, has as its columns the values of 1, Xi and Xi2. The matrix Y has a single column with Yi in it.

     (1  53  2809)        (56)
     (1  54  2916)        (58)
     (1  57  3249)        (61)
X =  (1  58  3364)   Y =  (60)
     (1  61  3721)        (63)
     (1  62  3844)        (62)
     (1  63  3969)        (65)
     (1  66  4356)        (64)

Let X’ be the transpose of X, in other words with the rows and columns interchanged.

        (     8        474       28228)
X’X =   (   474      28228     1689498)
        ( 28228    1689498   101615908)

            ( 5872.61    -199.014      1.67752   )
(X’X)-1 =   ( -199.014      6.75156   -0.0569692 )
            (    1.67752   -0.0569692  0.000481198)

        (    489)
X’Y =   (  29063)
        (1735981)


                     (-111.852   )
^β = (X’X)-1 X’Y =   (   5.24133 )
                     (  -0.0389888)

The fitted polynomial is: y = -111.852 + 5.24133x - 0.0389888x2, matching the previous result.

This matrix form can be applied in a similar manner to higher order polynomials. Of course, many software programs can fit regressions.184
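For example, here is a minimal Python sketch (my own illustration, not from the guide) of the matrix-form fit ^β = (X'X)-1X'Y for the father/son data, which reproduces the fitted quadratic above.

# Sketch of the matrix-form quadratic fit beta = (X'X)^(-1) X'Y for the father/son heights.
import numpy as np

fathers = np.array([53, 54, 57, 58, 61, 62, 63, 66], dtype=float)
sons    = np.array([56, 58, 61, 60, 63, 62, 65, 64], dtype=float)

X = np.column_stack([np.ones(len(fathers)), fathers, fathers**2])   # columns: 1, X, X^2
beta = np.linalg.solve(X.T @ X, X.T @ sons)                         # solves the Normal Equations
print(beta)   # approximately [-111.852, 5.24133, -0.0389888], as in the text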

This matrix form is just a set of k linear equations in k unknowns (here, three equations in the three parameters), called the Normal Equations. In general, the Normal Equations can be obtained by writing the expression for the sum of squared errors, and setting equal to zero the partial derivative with respect to each of the parameters.

Exercise: Obtain the Normal Equations for this example.

[Solution: Σ(Yi - ^Yi)2 = Σ(Yi - β1 - Xiβ2 - Xi2β3)2.
0 = ∂Σ(Yi - ^Yi)2/∂β1 = -2Σ(Yi - β1 - Xiβ2 - Xi2β3). ⇒ Nβ1 + ΣXiβ2 + ΣXi2β3 = ΣYi.
0 = ∂Σ(Yi - ^Yi)2/∂β2 = -2ΣXi(Yi - β1 - Xiβ2 - Xi2β3). ⇒ ΣXiβ1 + ΣXi2β2 + ΣXi3β3 = ΣYiXi.
0 = ∂Σ(Yi - ^Yi)2/∂β3 = -2ΣXi2(Yi - β1 - Xiβ2 - Xi2β3). ⇒ ΣXi2β1 + ΣXi3β2 + ΣXi4β3 = ΣYiXi2.
N = 8. ΣXi = 474. ΣXi2 = 28228. ΣXi3 = 1,689,498. ΣXi4 = 101,615,908.
ΣYi = 489. ΣXiYi = 29,063. ΣXi2Yi = 1,735,981.
Therefore, in this example, the Normal Equations are:
8β1 + 474β2 + 28228β3 = 489.
474β1 + 28228β2 + 1689498β3 = 29063.
28228β1 + 1689498β2 + 101615908β3 = 1735981.
Comment: These are the same equations as obtained in the matrix form.]

One can solve these three linear equations in three unknowns in the usual manner.

The first two equations imply: 1148β2 + 135912β3 = 718.

The second two equations imply: 4002068β2 + 474790848β3 = 2464630.

Therefore, 4002068(718 - 135912β3)/1148 + 474790848β3 = 2464630. ⇒ β3 = -.03898878.

⇒ β2 = 5.2413267. ⇒ β1 = -111.8516966.

The fitted polynomial is: y = -111.852 + 5.24133x - 0.0389888x2, matching the previous result.

184 For example, I used PolynomialFit in Mathematica in order to fit least squares polynomials.


Testing the Quadratic Fit:

Exercise: Determine the Error Sum of Squares for the linear and quadratic fits.

[Solution: ^εi = Yi - ^Yi. ESS = Σ^εi2.

                 Linear                Square of    Quadratic               Square of
 Xi   Yi   Fitted Value   Residual     Residual   Fitted Value   Residual    Residual
 53   56      57.216       -1.216        1.479       56.419       -0.419       0.176
 54   58      57.842        0.158        0.025       57.488        0.512       0.262
 57   61      59.718        1.282        1.644       60.229        0.771       0.594
 58   60      60.343       -0.343        0.118       60.987       -0.987       0.974
 61   63      62.219        0.781        0.609       62.792        0.208       0.043
 62   62      62.845       -0.845        0.714       63.238       -1.238       1.531
 63   65      63.470        1.530        2.340       63.605        1.395       1.945
 66   64      65.346       -1.346        1.813       64.241       -0.241       0.058
Sum                          0.000        8.742                     0.001       5.583

Note that the residuals for the quadratic fit would sum to zero except for rounding.]

It should be noted that the second degree least squares polynomial has a smaller sum of squared errors at 5.583 than the least squares straight line at 8.742. Since a polynomial of order m - 1 is a special case of a polynomial of order m, the best polynomial of order m does at least as well and in most cases better than the best polynomial of order m - 1. However, the principle of parsimony states one should only continue to increase the degree of the polynomial as long as one gets a “significantly” better fit.

Exercise: Perform an F-Test of whether the quadratic regression is significantly better than the linear regression.
[Solution: The restriction of going from the quadratic to the linear model is one dimensional.
F = [(ESSR - ESSUR)/q] / [ESSUR/(N - k)] = [(8.742 - 5.583)/1] / [5.583/(8 - 3)] = 2.829.
Perform a one sided F-Test at 1 and 5 d.f. 6.61 is the critical value at 5%. Since 2.829 < 6.61 we do not reject the simpler linear model at the 5% level.
Comment: Using a computer, the p-value is: 15.3%.
This is equivalent to performing a t-test of the hypothesis that β3 = 0.]

Thus in this case, with only 8 data points, the more complicated quadratic model is not significantly better than the simpler linear model.
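Here is a short Python sketch (my own illustration, not from the guide) that reproduces this comparison: it computes the Error Sums of Squares for the linear and quadratic fits and the resulting F-Statistic of about 2.8, with its p-value of about 15%.

# Sketch of the F-test comparing the linear and quadratic fits to the heights data.
import numpy as np
from scipy.stats import f

fathers = np.array([53, 54, 57, 58, 61, 62, 63, 66], dtype=float)
sons    = np.array([56, 58, 61, 60, 63, 62, 65, 64], dtype=float)

def ess(design, response):
    beta, *_ = np.linalg.lstsq(design, response, rcond=None)
    return np.sum((response - design @ beta) ** 2)

ones = np.ones(len(fathers))
ess_linear    = ess(np.column_stack([ones, fathers]), sons)               # about 8.742
ess_quadratic = ess(np.column_stack([ones, fathers, fathers**2]), sons)   # about 5.583
F = (ess_linear - ess_quadratic) / (ess_quadratic / (8 - 3))              # about 2.83
print(F, f.sf(F, 1, 5))                                                   # p-value about 0.153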

As discussed previously for the linear fit of the heights example, we had:185
Parameter   Estimate    SE         T-Stat    p-Value
1           24.0679     5.98553    4.02102   0.00695079
x            0.625436   0.100765   6.2069    0.000806761

185 The p-values of the t-statistics were calculated on a computer.


Exercise: Test each of the coefficients of the fitted quadratic, in order to see if it is significantly different from zero.
[Solution: s2 = ESS/(N - k) = 5.583/(8 - 3) = 1.1166.
Calculate the correlation of the two variables X and X2 = V:
rX2X3 = Σx2ix3i/√(Σx2i2 Σx3i2) = 16989/√[(143.5)(2013410)] = .999484.

Var[^β2] = s2/[(1 - rX2X3²)Σx2i2] = (1.1166)/[(1 - .999484²)(143.5)] = 7.539.
^sβ2 = √7.539 = 2.746. t = ^β2/^sβ2 = 5.24133/2.746 = 1.909.
At N - k = 8 - 3 degrees of freedom, the 10% critical value is 2.015.
Thus we do not reject H0: β2 = 0 at the 10% level. (Using a computer, the p-value is 11.5%.)

Var[^β3] = s2/[(1 - rX2X3²)Σx3i2] = (1.1166)/[(1 - .999484²)(2013410)] = .0005375.
^sβ3 = √.0005375 = .02318. t = -0.0389888/.02318 = -1.682.
Since 1.682 < 2.015, we do not reject H0: β3 = 0 at the 10% level. (The p-value is 15.3%.)

 X      X²        X³           X⁴
53     2809     148,877     7,890,481
54     2916     157,464     8,503,056
57     3249     185,193    10,556,001
58     3364     195,112    11,316,496
61     3721     226,981    13,845,841
62     3844     238,328    14,776,336
63     3969     250,047    15,752,961
66     4356     287,496    18,974,736
Sum   28,228  1,689,498   101,615,908

ΣX2i2 = ΣXi2 = 28,228. ΣX3i2 = Σ(Xi2)2 = ΣXi4 = 101,615,908.
ΣX2iX3i = ΣXi(Xi2) = ΣXi3 = 1,689,498.

Var[^β1] = s2[ΣX2i2 ΣX3i2 - (ΣX2iX3i)2] / [N Σx2i2 Σx3i2 (1 - rX2X3²)] =
(1.1166)[(28,228)(101,615,908) - 1,689,498²] / [8(1 - .999484²)(143.5)(2013410)] = 6560.
^sβ1 = √6560 = 80.99. t = -111.852/80.99 = -1.381.

Thus in this case, ^β3 is not significantly different from 0, and as determined before, the more

complicated model quadratic model is not significantly better than the simpler linear model.186

Many would find it easier to use the matrix form discussed previously, in order to compute the variance-covariance matrix, since one need not memorize individual formulas.187

186 Note that the F-test was equivalent to the t-test of the fitted β3. In both cases the p-value was 15.3%.187 Note that even for the three variable model and only a few data points, carrying out all of the calculations oneself takes too long for exam questions.

HCMSA-F06-Reg-F, Mahler’s Guide to Regression, 7/12/06, Page 214

Page 221: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

Exercise: For this quadratic fit, determine the variance-covariance matrix as s2(X’X)-1. [Solution: From the previous calculation of the fitted coefficients using the matrix form:

(5872.61 -199.014 1.67752)(X’X)-1 = (-199.014 6.75156 -0.0569692)

(1.67752 -0.0569692 0.000481198)From the previous solution, s2 = ESS/(N - k) = 5.583/(8 - 3) = 1.1166.

(6557.4 -222.22 1.8731)s2(X’X)-1 = (-222.22 7.5388 -.063612)

(1.8731 -.063612 .00053731)

Therefore, Var[^β1] = 6557.4, Var[

^β2 ] = 7.5388, Var[

^β3] = .00053731,

Cov[^β1,

^β2 ] = -222.22, Cov[

^β1,

^β3] = 1.8731, Cov[

^β2 ,

^β3] = -.063612.

Comment: The variances match those calculated previously, subject to rounding.]

A List of Some Models:188

Linear model: Y = β1 + β2X2 + β3X3 + ε. Example: Y = 3 + 4X2 + 5X3 + 7X4 + ε.

Quadratic model: Y = β1 + β2X + β3X2 + ε. Example: Y = 1 + 2X + 8X2 + ε.

Polynomial model: Y = β1 + β2X + β3X2 + β4X3 + ε.

Example: Y = -4 + 3X + 6X2 + 2X3 + ε.

Log-Log model: ln Y = β1 + β2 lnX2 + β3 lnX3 + ε. Example: ln Y = 4 + 6 lnX2 + 7 lnX3 + ε.

Equivalent to: Y = γ1X2γ2 X3γ

3 ε∗, where ε = ln(ε∗).

Multiplicative model: Y = β1X2β2 X3β

3 + ε. Example: Y = 0.3X22.2 X31.5 + ε.

Exponential model: Y = exp[β1 + β2X2 + β3X3 + ε]. Example: Y = exp[5 + .1X + ε]

Equivalent to: lnY = β1 + β2X2 + β3X3 + ε.

Reciprocal Model: Y = 1/β1 + β2X2 + β3X3 + ε. Example: Y = 1/7 + 9X2 + 6X3 + ε

Equivalent to: 1/Y = β1 + β2X2 + β3X3 + ε.

Semilog Model: Y = β1 + β2 lnX2 + β3lnX3 + ε. Example: Y = -10 + 3lnX + ε

Interaction Model: Y = β1 + β2X2 + β3X3 + β4X2X3 + ε.

Example: Y = 20 + 4X2 + 3X3 + 0.7X2X3 + ε.

188 See pages 118 to 119 of Pindyck and Rubinfeld


Bias:*

Assume we have lnY = β1 + β2X2 + ε, with εi Normal with mean zero and variance σ2, and the other assumptions of the linear regression model.

Then Y = exp[β1 + β2X2 + ε] = exp[β1 + β2X2] e^ε. Thus the errors are multiplicative rather than additive. e^ε is LogNormal with parameters µ = 0 and σ. E[e^ε] = exp[σ2/2] > 1.

ln^Y is an unbiased estimator of lnY. ⇔ E[ln^Y] = E[lnY].

However, this does not imply that E[^Y] = E[Y].

^Y is not an unbiased estimator of Y.

Unbiasedness is not necessarily preserved under change of variables.
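A small simulation makes the size of this bias concrete. The sketch below (Python/NumPy, illustrative only, with σ = 0.5 chosen arbitrarily) confirms that the average of e^ε is close to exp[σ²/2] ≈ 1.13 rather than 1, so exponentiating an unbiased estimate of ln Y understates E[Y] on average.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5

# eps is Normal(0, sigma^2), so exp(eps) is Lognormal with mu = 0.
eps = rng.normal(0.0, sigma, size=1_000_000)
print(np.exp(eps).mean())       # close to exp(sigma^2/2) = 1.133..., not 1
print(np.exp(sigma**2 / 2))     # the theoretical mean of exp(eps)
```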


Problems:

22.1 (3 points) Fit the model Y = a b^t (error) to the following data.
Time: 1 2 3 4 5
Average Claim Cost: 117 132 136 149 151
What is the fitted value of b?
A. 1.065 B. 1.070 C. 1.075 D. 1.080 E. 1.085

22.2 (4 points) For homeowner’s insurance, you are given the following series of average written premium per home insured at current rate level:
1991 1156.83
1992 1152.34
1993 1153.64
1994 1150.22
1995 1144.52
1996 1150.11
1997 1164.21
1998 1178.57
1999 1193.75
Fit an exponential regression. Use this fitted model to predict the average written premium per home insured at current rate level for the year 2002.
A. 1170 B. 1180 C. 1190 D. 1200 E. 1210

22.3 (10 points) Fit the model Y = β1 + β2X2 + β3X3 + β4X2X3 + ε to 15 observations:
X2  X3  Y
29  12  2841
21   8  1876
62  10  2934
18  10  1552
40  11  3065
50  11  3670
65   5  2005
44   8  3215
17   8  1930
70   6  2010
20   9  3111
29   9  2882
15   5  1683
14   7  1817
33  12  4066
Test whether X2 and X3 interact.
(You may use a computer to help with the computations.
You may check your work using a regression package.)


Use the following information for the next 5 questions:
The model Y = β1 + β2X + β3X² + ε is fit to the following data:
X: 11.7 25.3 90.2 213.0 10.2 17.6 32.6 81.3 141.5 285.7
Y: 15.3 9.3 6.5 6.0 15.7 10.0 8.6 6.4 5.6 6.0

22.4 (2 points) What is ^β1?
(A) Less than 15
(B) At least 15, but less than 16
(C) At least 16, but less than 17
(D) At least 17, but less than 18
(E) At least 18

22.5 (3 points) What is ^β2?
(A) Less than -2
(B) At least -2, but less than -1
(C) At least -1, but less than 0
(D) At least 0, but less than 1
(E) At least 1

22.6 (3 points) What is ^β3?
(A) Less than .0001
(B) At least .0001, but less than .0002
(C) At least .0002, but less than .0003
(D) At least .0003, but less than .0004
(E) At least .0004

22.7 (3 points) What is the estimated variance of this regression?
(A) Less than 3.0
(B) At least 3.0, but less than 3.5
(C) At least 3.5, but less than 4.0
(D) At least 4.0, but less than 4.5
(E) At least 4.5

22.8 (3 points) Test the hypothesis that β3 = 0.

A. Reject H0 at 1%.
B. Do not reject H0 at 1%. Reject H0 at 2%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 5%. Reject H0 at 10%.
E. Do not reject H0 at 10%.


22.9 (2 points) A regression model Y = β1 + β2X2 + β3X3 + β4X2X3 + ε, where Y is the auction price of an antique, X2 is the age of an antique, and X3 is the number of bidders on the antique, has been fit, with: ^β1 = 300, ^β2 = 0.9, ^β3 = -90, ^β4 = 1.3.
What is the expected change in auction price for an increase of 2 in the number of bidders?

22.10 (4 points) You are given the following data on 6 planets. For each planet you are given X, how far it is from the sun (in astronomical units), and Y, how many days it takes to go around the sun once.
X: 0.39 0.72 1.00 1.52 5.20 9.54
Y: 88 225 365 687 4333 10,759
Fit via least squares the model Y = aX^b. Estimate how many days it would take a planet at a distance of 19.2 to go around the sun.
(A) 23,000 (B) 25,000 (C) 27,000 (D) 29,000 (E) 31,000

22.11 (5 points) You are given the following 12 values of a consumer price index:
Third Quarter 2002   122.0
Fourth Quarter 2002  123.9
First Quarter 2003   124.9
Second Quarter 2003  127.0
Third Quarter 2003   131.5
Fourth Quarter 2003  132.6
First Quarter 2004   135.3
Second Quarter 2004  137.6
Third Quarter 2004   139.7
Fourth Quarter 2004  141.2
First Quarter 2005   143.1
Second Quarter 2005  148.0
Fit via least squares the model ln(Y) = a + bt. Use the fitted model to predict the value of the consumer price index in the Third Quarter of 2006.
A. 154 B. 156 C. 158 D. 160 E. 162

22.12 (2 points) You use the method of least squares to fit the model Yi = α + β√Xi to the following data:
Xi: 1 3 4 4 7
Yi: 0 1 3 5 6

Determine the least squares estimate ^β.

(A) 3.1 (B) 3.3 (C) 3.5 (D) 3.7 (E) 3.9


22.13 (5 points) Use the following 4 observations:
X: -1 1 3 5
Y: 3 4 7 6
Fit a least squares quadratic, Y = β1 + β2X + β3X², and use it to estimate y for x = 6.
(A) 6.0 (B) 6.2 (C) 6.4 (D) 6.6 (E) 6.8

22.14 (5 points) Polynomials are fit via least squares to the following mortality data:
Age  Mortality per 1000     Age  Mortality per 1000
27   3.89                   67   40.74
32   2.45                   72   59.55
37   2.49                   77   86.02
42   3.81                   82   145.42
47   6.34                   87   172.15
52   10.49                  92   230.80
57   15.94                  97   271.60
62   26.91
The results are as follows:
Straight line: y = -150.442 + 3.58626x.
2nd degree: y = 166.68 - 8.05695x + 0.0938969x².
3rd degree: y = 2.70544 + 1.34511x - 0.0695867x² + 0.000878944x³.
4th degree: y = -229.788 + 19.3897x - 0.559033x² + 0.00642612x³ - 0.0000223677x⁴.
5th degree: y = 509.078 - 52.7309x + 2.10662x² - 0.0404085x³ + 0.000370816x⁴ - .00000126833x⁵.

Fitted Polynomial   Error Sum of Squares
Straight line       24005.9
2nd degree          1273.7
3rd degree          582.3
4th degree          433.7
5th degree          282.9

Based on a significance level of 5%, which polynomial should one use?
A. Straight line B. 2nd degree C. 3rd degree D. 4th degree E. 5th degree

22.15 (4 points) In the previous question, compute and compare both R² and R̄² for the various models, including a 6th and 7th degree polynomial.
The ESS for the sixth degree polynomial is 278.2.
The ESS for the seventh degree polynomial is 271.2.

22.16 (2, 5/83, Q. 9) (1.5 points) For the regression model Yi = b ln(xi) + εi, i = 1, 2, assume that x1 = e and x2 = e², and that ε1 and ε2 are independent random variables with mean zero and unknown variance σ².
What is the least squares estimator for b based on (x1, Y1) and (x2, Y2)?
A. (Y1 + 2Y2)/3 B. (Y1 + Y2)/3 C. (Y1 + 2Y2)/5 D. (exp[Y1] + exp[Y2])/3 E. (exp[Y1] + exp[2Y2])/3


22.17 (165, 11/86, Q.11) (1.8 points) You are given:
  x     x²      ln(ux)    x ln(ux)
  38    1444    -6.23     -236.74
  39    1521    -6.04     -235.56
  40    1600    -6.00     -240.00
  41    1681    -5.96     -244.36
  42    1764    -5.77     -242.34
  200   8010    -30.00    -1199.00
where the ux are the observed values of the force of mortality µx.
Based on a Gompertz form, S(x) = exp[-m(c^x - 1)], fitted by linear regression, estimate ln(µ41).
(A) -6.115 (B) -6.100 (C) -6.000 (D) -5.900 (E) -5.885

22.18 (165, 11/87, Q.8) (2.1 points) You are fitting the model µ[x]+r = B d^r C^(x+r) to observed forces of mortality. The values u[x]+r (x = 21, 22; r = 0, 1, 2, 3) are the logs of the observed forces of mortality. Define:
SS = Σr=0 to 3 Σx=21 to 22 (u[x]+r - λ1 - λ2r - λ3x)².
One of the normal equations to solve for the least-squares estimates of λ1, λ2 and λ3 is:
fλ1 + gλ2 + hλ3 = Σr=0 to 3 Σx=21 to 22 u[x]+r.
Determine f + g + h.
(A) 158 (B) 175 (C) 192 (D) 316 (E) 384

22.19 (2, 5/88, Q. 15) (1.5 points) Let (x1, y1), (x2, y2), . . . , (xn, yn) be pairs of observations. The curve y = θe^x is to be fitted to this data set. What is the least squares estimate for θ?
(All sums below run from i = 1 to n.)
A. Σyi / Σexp[xi]
B. Σyi exp[xi] / Σexp[2xi]
C. Σyi exp[xi] / Σexp[xi]
D. Σyi / Σexp[2xi]
E. Σ(yi / exp[xi])


22.20 (165, 11/89, Q.11) (1.7 points) You wish to fit the Gompertz form, µx = Bc^x, using linear regression. You are given:
x   ln(µx)
1   -3.10
2   -3.07
3   -3.06
Determine ln(c).
(A) 0.010 (B) 0.015 (C) 0.020 (D) 0.025 (E) 0.030

22.21 (165, 11/90, Q.19) (1.9 points) You are given the following initial estimates, u[x]+t, and graduated values, v[x]+t, which were obtained by fitting the form A + Bx + Ct² by least squares:

Initial Estimates                 Graduated Values
         t                                  t
x      0      1      2            x      0         1         2
0    u[0]    1.5    1.8           0     1.4     v[0]+1      2.0
1     1.5    1.8    2.3           1    v[1]       1.8     v[1]+2
2     1.8    2.3    2.4           2    v[2]    v[2]+1     v[2]+2

Determine u[0].
(A) 0.6 (B) 1.4 (C) 1.5 (D) 2.3 (E) 2.4

22.22 (165, 5/91, Q.16) (1.9 points) A Gompertz form, S(x) = exp[-m(c^x - 1)], has been fitted by linear regression. You are given:
  x       ln(ux)    x ln(ux)
  1       -8.37     -8.37
  2       -8.29     -16.58
  3       -8.21     -24.63
  4       -8.12     -32.48
  5       -8.01     -40.05
  Total   -41.00    -122.11
where the ux are the observed values of the force of mortality µx.
Determine the estimate of ln µ4.
(A) -8.15 (B) -8.13 (C) -8.11 (D) -8.09 (E) -8.07


22.23 (2, 5/92, Q.7) (1.7 points) Let (x1, y1), . . . , (xn, yn) be n pairs of observations.
The curve θ ln(x) is to be fitted to this data set. What is the least squares estimate for θ?
(All sums below run from i = 1 to n.)
A. Σyi ln(xi) / Σ2 ln(xi)
B. Σyi ln(xi) / Σln(xi)²
C. Σyi ln(xi) / Σln(xi)
D. Σyi / Σln(xi)²
E. Σyi / Σln(xi)

22.24 (2, 2/96, Q.9) (1.7 points) You are given the model E(Yi) = θ(xi + xi²) and the following data:
i   xi   yi
1   1    4
2   2    8
3   3    14

Calculate the least squares estimate of θ. A. 13/92 B. 28/23 C. 13/10 D. 9/2 E. 56/5

22.25 (165, 5/96, Q.17) (1.9 points) You are given:
x    ln ux
5    -3.9
10   -3.3
15   -2.8
where ux is the observed force of mortality.
The force of mortality is fit by linear regression to the form µx = kx^b.
Determine the estimate of ln µ15.
(A) -2.88 (B) -2.84 (C) -2.80 (D) -2.76 (E) -2.72


22.26 (165, 11/97, Q.17) (1.9 points) You are fitting the model µ[x]+r = B d^r C^(x+r) to observed forces of mortality. The values u[x]+r (x = 11, 12; r = 0, 1, 2, 3) are the logs of the observed forces of mortality. Define:
SS = Σr=0 to 3 Σx=11 to 12 (u[x]+r - λ1 - λ2r - λ3x)².
One of the normal equations to solve for the least-squares estimates of λ1, λ2 and λ3 is:
fλ1 + gλ2 + hλ3 = Σr=0 to 3 Σx=11 to 12 u[x]+r.
Determine f + g + h.
(A) 43 (B) 89 (C) 112 (D) 124 (E) 140

22.27 (Course 120 Sample Exam #1, Q.1) (2 points) You fit the regression model Yi = βXi² + εi to n observations (Xi, Yi).
Which of the following is the correct expression for the least squares estimate of β?
(A) ΣYiXi² / ΣXi⁴
(B) ΣYi ΣXi² / ΣXi⁴
(C) Ȳ ΣXi² / ΣXi⁴
(D) ΣYi / ΣXi²
(E) ΣYiXi / ΣXi²

22.28 (Course 120 Sample Exam #3, Q.1) (2 points) You use the method of least squares to fit the model Yi = α + βXi² + εi to the following data:
Xi: 0 0 1 2 2
Yi: 2 4 8 16 20

Determine the least squares estimate ^β.

(A) 1.0 (B) 2.4 (C) 3.7 (D) 4.6 (E) 5.5


22.29 (CAS3, 11/05, Q.9) (2.5 points)
The following information is known about average claim sizes:
Year   Average Claim Size
1      $1,020
2      1,120
3      1,130
4      1,210
5      1,280
Average claim sizes, Y, in year X are modeled by: Y = α e^(βX).
Using linear regression to estimate α and β, calculate the predicted average claim size in year 6.
A. Less than $1,335
B. At least $1,335, but less than $1,340
C. At least $1,340, but less than $1,345
D. At least $1,345, but less than $1,350
E. At least $1,350


Section 23, Dummy Variables189

A dummy variable is one that is discrete rather than continuous. Most commonly a dummy variable takes on only the values 0 or 1.

For example, Bergen Insurance has a special set of insurance agents, who are members of its “Gold Circle” program. Bergen Insurance believes the loss ratios from business written through these Gold Circle agents is better than that from its other agents. One can fit a regression using a dummy variable to test this hypothesis.

Let Xi = 1 if an agent is a Gold Circle agent.
Let Xi = 0 if an agent is not a Gold Circle agent.
Then X is an example of a dummy variable.
Let Yi = the loss ratio for agent i.

A simple model is: Yi = β1 + β2Xi + εi.

Then for the Gold Circle Agents, Xi = 1 and the expected loss ratio is β1 + β2.

For the other agents, Xi = 0 and the expected loss ratio is β1.

Thus β2 measures the difference in expected loss ratio between Gold Circle Agents and other agents.

Exercise: The above regression model is fit to data for 22 Gold Circle Agents and 40 other agents. The results are: ^β1 = 73.1, ^β2 = -7.2, sβ1 = 3.4, sβ2 = 2.9.

Test H0, the hypothesis that β2 = 0. The alternative is β2 ≠ 0, Gold Circle agents are different.

[Solution: t = ^β2/sβ2 = -7.2/2.9 = -2.483.
To test H0 one performs a two-sided t-test with 62 - 2 = 60 degrees of freedom.
Since 2.390 < 2.483 < 2.660, reject H0 at 2% and do not reject H0 at 1%.]

Gold Circle agents are better than others ⇔ their expected loss ratio is lower: β2 < 0.

Let H0 be the hypothesis that β2 ≥ 0. Test H0 versus the alternative that β2 < 0.

To test H0, one performs a one-sided t-test with 60 degrees of freedom, t = -2.483.
Since 2.390 < 2.483 < 2.660, reject H0 at 2%/2 = 1% and do not reject H0 at 1%/2 = 0.5%.

A somewhat more complex model would take into account other things about each agent. For example, we might take into account how many years the agent has been writing insurance for Bergen Insurance.

Such a model could be: Yi = β1 + β2X2i + β3X3i + εi, where X2 is the Gold Circle Dummy,

X3 is the years with Bergen Insurance, and Y is the loss ratio.

189 See Section 5.2 and Appendix 5.1 of Pindyck and Rubinfeld.


Exercise: The above regression model is fit to data for 22 Gold Circle Agents and 40 other agents. The results are: ^β1 = 80.6, ^β2 = -3.7, ^β3 = -.53, sβ1 = 2.6, sβ2 = 2.2, sβ3 = .17.

Let H0 be the hypothesis that β2 ≥ 0. Test H0 versus the alternative that β2 < 0.

[Solution: t = ^β2/sβ2 = -3.7/2.2 = -1.682.
To test H0 one performs a one-sided t-test with 62 - 3 = 59 degrees of freedom. Using the nearest tabulated row of the t-table (60 degrees of freedom), since 1.671 < 1.682 < 2.000, reject H0 at 10%/2 = 5% and do not reject H0 at 5%/2 = 2.5%.]
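For those who wish to experiment, here is a minimal sketch (Python/NumPy) of fitting the simpler one-dummy model and computing the t-statistic for ^β2. The agent-level data are simulated, since only the fitted results of the exercises are given above, so the output will not reproduce those exact numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated loss ratios for 22 Gold Circle agents and 40 other agents
# (illustrative numbers only).
gold  = rng.normal(66.0, 12.0, size=22)   # Gold Circle agents
other = rng.normal(73.0, 12.0, size=40)   # all other agents

y = np.concatenate([gold, other])
d = np.concatenate([np.ones(22), np.zeros(40)])   # dummy: 1 = Gold Circle

# Fit Y = b1 + b2*D + error by ordinary least squares.
X = np.column_stack([np.ones_like(d), d])
N, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

# t-statistic for H0: b2 = 0, with N - k = 60 degrees of freedom.
resid = y - X @ beta
s2 = resid @ resid / (N - k)
se = np.sqrt(np.diag(s2 * XtX_inv))
print(beta, se, beta[1] / se[1])
```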

One can use more than one dummy variable in a model. For example, let us assume you divide the towns (and cities) in a state into three categories: Urban, Suburban, and Rural. You wish to see whether your loss ratios are significantly different between the different types of towns of the state.190
Let Yi = the loss ratio for town i.
Let X2i = 1 if the town is urban and 0 otherwise.
Let X3i = 1 if the town is suburban and 0 otherwise.

Then we have the following table:191
Type       X2   X3
Urban      1    0
Suburban   0    1
Rural      0    0

Exercise: You fit the model Yi = β1 + β2X2i + β3X3i + εi, and ^β1 = 77.2, ^β2 = -3.7, ^β3 = 1.4.
Assuming the insurer does not alter its practices, what are the expected loss ratios for the three categories?
[Solution: For Rural, X2 = 0 and X3 = 0, so the mean is β1 = 77.2.

For Urban, X2 = 1 and X3 = 0, so the mean is β1 + β2 = 77.2 - 3.7 = 73.5.

For Suburban, X2 = 0 and X3 = 1, so the mean is β1 + β3 = 77.2 + 1.4 = 78.6.]

One can do a joint test of the significance of all of the fitted coefficients for the dummy variables. One uses the F-Test as described in a previous section.

For example, let us assume that the above regression is fit to 25 towns, with error sum of squares of 412 and total sum of squares of 577. Then the F-Statistic to test whether β2 = β3 = 0 is:
{(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {RSS/(k-1)} / {ESS/(N - k)} = {(577 - 412)/2} / {412/(25 - 3)} = 4.41,
with 2 and 22 degrees of freedom. Since 3.44 < 4.41 < 5.72, we reject the hypothesis that β2 = β3 = 0 at the 5% level and do not reject at the 1% significance level.

190 For example you might charge more in an urban area and expect more losses. However, the expected loss ratio might be the same in an urban area as for a rural area.
191 There are equally good equivalent ways to use dummy variables to deal with this situation. For example, one could instead have a dummy variable for rural and one for suburban.
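The arithmetic of that F-test is easily scripted. The short sketch below (Python) simply restates the calculation using the sums of squares quoted above.

```python
# F-test that all dummy coefficients are zero: compare the restricted model
# (intercept only, ESS_R = TSS = 577) to the unrestricted model (ESS_UR = 412).
ess_r, ess_ur = 577.0, 412.0
q = 2            # number of restrictions (beta2 = beta3 = 0)
n, k = 25, 3     # towns and fitted parameters in the unrestricted model

f_stat = ((ess_r - ess_ur) / q) / (ess_ur / (n - k))
print(f_stat)    # about 4.41, with 2 and 22 degrees of freedom
```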


Dummy variables can also be useful when there is a one time effect. For example, assume claim frequencies have been changing at a constant rate. Then a good model would be:
Y = α + βt + ε.

For example, if α = .17 and β = .001, then the graph of frequency versus time is:

[Graph: frequency versus time, a straight line rising from about 0.171 at t = 1 to 0.184 at t = 14.]

Instead, assume that at time 5 there is a one time change in the level of claim frequency.192 Also assume that the expected rate of change in frequency is the same, both before and after the change. Then a good model would be: Y = α1 + α2D + βt + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if α1 = .17, α2 = -.02, and β = .001, then the graph of frequency versus time is:

[Graph: frequency versus time, rising linearly, with a one time drop of 0.02 at t = 5 and the same slope before and after the drop.]


If instead we assume an event that produced at t = 5 a change in the expected rate of change in frequency, then a model would be: Y = α1 + β1t + β2(t-5)D + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if α1 = .17, β1 = .001, and β2 = -.0005, then the graph of frequency versus time is:193

[Graph: frequency versus time, continuous, with slope .001 before t = 5 and the flatter slope .0005 after t = 5.]

We can combine these two situations; if instead we assume an event that produced at t = 5 both a one time change in the level and a change in the expected rate of change in frequency, then a good model would be: Y = α1 + α2D + β1t + β2tD + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if α1 = .17, α2 = -.01, β1 = .001, and β2 = -.0005, then the graph of frequency versus time is:194

[Graph: frequency versus time, with both a one time drop in level at t = 5 and a flatter slope after t = 5.]

193 Note that this graph is continuous. This is an example of piecewise linear regression, to be discussed in a subsequent section.
194 This is an example of the “switching regression method”, discussed in Section 5.4.1 of Pindyck and Rubinfeld.
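The sketch below (Python/NumPy, with simulated data and the illustrative parameter values used above) shows how the design matrix for this combined model is built: one column for the intercept shift D and one for the slope shift tD.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate frequencies from the combined model with a break at t = 5
# (parameter values echo the illustration above; the noise is made up).
t = np.arange(1, 15, dtype=float)
D = (t >= 5).astype(float)
y = 0.17 - 0.01 * D + 0.001 * t - 0.0005 * t * D + rng.normal(0, 0.001, t.size)

# Design matrix with an intercept shift (D) and a slope shift (t*D).
X = np.column_stack([np.ones_like(t), D, t, t * D])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # estimates of alpha1, alpha2, beta1, beta2
```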


One can apply the same ideas to exponential regressions.

Exercise: For the dependent variable, Y, you calculate the average claim costs on closed claims by year during 1990-99. You define the variable X as the year.
You also define a variable D as:
D = 0 for years 1996 and prior, and D = 1 for years 1997 and later.
What is the assumption behind each of the following models:195
(A) Y = α1^D β1^X ε
(B) Y = α1 α2^D β1^X ε
(C) Y = α1 β1^X β2^(XD) ε
(D) Y = α1 α2^D β1^X β2^(XD) ε
(E) Y = α1 α2^D X^β1 ε
[Solution: (A) ln Y = D ln(α1) + X ln(β1) + ε: constant expected percent rate of inflation over the whole period, one time change in average cost between 1996 and 1997. The problem with this model is that the fitted cost in 1990 is β1^1990, unlikely to be meaningful.
(B) ln Y = ln(α1) + D ln(α2) + X ln(β1) + ε: constant expected percent rate of inflation over the whole period, one time change in average cost between 1996 and 1997.
(C) ln Y = ln(α1) + X ln(β1) + XD ln(β2) + ε: constant expected percent rate of inflation over the period 1990 to 1996 and another (possibly) different constant expected percent rate of inflation over the period 1997 to 1999.
(D) ln Y = ln(α1) + D ln(α2) + X ln(β1) + XD ln(β2) + ε: combines the assumptions of B and C.
(E) ln Y = ln(α1) + D ln(α2) + β1 ln(X) + ε: a one time change in average cost between 1996 and 1997, and otherwise expected costs are a power of the year (unlikely to be a useful model).]

195 See 4, 5/01, Q.24.


Problems:

23.1 (2 points) A tort reform law was passed in a state to be effective on January 1, 1995. For liability insurance, you believe the law immediately reduced claim severity as well as reducing the future rate of inflation. However, you assume there was a constant rate of inflation before the law was effective and another but lower constant rate after the law was effective. You use a multiple regression model to test these assumptions.
For the dependent variable, Y, you calculate the average claim costs on liability claims in this state by year during 1991-1998.
You define the variable X as the year, with 1 corresponding to 1991.
You also define a variable D as:
D = 0 for years 1994 and prior, and D = 1 for years 1995 and later.
Assume a lognormal error component. Which of the following models would be used to test the assumptions?
(A) Y = α1^D β1^X ε
(B) Y = α1 α2^D β1^X ε
(C) Y = α1 β1^X β2^(XD) ε
(D) Y = α1 α2^D β1^X β2^(XD) ε
(E) None of the above.

Use the following information for the next two questions:
A model has been fit of private passenger automobile liability insurance costs for a certain territory and class:
^Yi = 100 - 2Xi + 10D1i + 4D2i - XiD1i + 3D1iD2i, where
Xi = age of the car in years (0 = new, 1 = one year old, etc.),
D1i = 1 if the car is an SUV, and 0 otherwise,
D2i = 1 if the car has a cell phone installed, and 0 otherwise.

23.2 (2 points) What is the expected difference in cost between a 7 year old SUV with a cell phone and a 2 year old car that is not an SUV and that has no cell phone?(A) -5 (B) 0 (C) 5 (D) 10 (E) 15

23.3 (2 points) For this class and territory you have the following information:
Not SUV and no cell phone: 3000 cars with an average age of 6.7.
Not SUV and with cell phone: 4000 cars with an average age of 6.1.
SUV and no cell phone: 1000 cars with an average age of 4.9.
SUV and with cell phone: 2000 cars with an average age of 4.4.
What is the overall average cost per car?
(A) 93.0 (B) 93.5 (C) 94.0 (D) 94.5 (E) 95.0


Use the following information for the next 7 questions:
The following data has been collected on 100 corporate executives:
X2 = years of experience

X3 = years of education

X4 = 1 if male and 0 if female

X5 = number of employees supervised

X6 = corporate assets

Y = ln(annual salary)
The following different models have been fit, with the corresponding values of R²:
Y = 9.86 + .0436X2 + .0309X3 + .117X4 + .000326X5 + .00239X6 - .000635X2² + .000302X4X5;
R² = .9401. (All terms included in the regression.)
Y = 10.29 + .0281X3 + .122X4 + .000301X5 + .00155X6 - .000923X2² + .000269X4X5;
R² = .8524. (No X2 term.)
Y = 10.34 + .0404X2 + .145X4 + .000302X5 + .00260X6 - .000510X2² + .000221X4X5;
R² = .8685. (No X3 term.)
Y = 9.92 + .0438X2 + .0316X3 + .000121X5 + .00254X6 - .000652X2² + .000571X4X5;
R² = .9336. (No X4 term.)
Y = 9.96 + .0432X2 + .0306X3 - 0.0101X4 + .00262X6 - .000635X2² + .000626X4X5;
R² = .9289. (No X5 term.)
Y = 10.26 + .0403X2 + .0317X3 + .138X4 + .000379X5 - .000507X2² + .000266X4X5;
R² = .9212. (No X6 term.)
Y = 10.02 + .0269X2 + .0298X3 + .123X4 + .000326X5 + .00204X6 + .000274X4X5;
R² = .9264. (No X2² term.)
Y = 9.81 + .0433X2 + .0301X3 + .228X4 + .000543X5 + .00229X6 - .000605X2²;
R² = .9331. (No X4X5 term.)
Y = 10.43 + .0306X3 + .00383X4 + .0000552X5 + .00216X6 + .000648X4X5;
R² = .3605. (No X2 and X2² terms.)
Y = 9.89 + .0410X2 + .0298X3 + .000394X5 + .00301X6 - .000494X2²;
R² = .7695. (No X4 and X4X5 terms.)
Y = 10.04 + .0385X2 + .0230X3 + .186X4 + .00297X6 - .000423X2²;
R² = .8234. (No X5 and X4X5 terms.)

23.4 (1 point) Determine the value of the F statistic used to test whether gender (male/female) is significant in determining salary.

23.5 (1 point) Determine the value of the F statistic used to test whether years of education is significant in determining salary.


23.6 (1 point) Determine the value of the F statistic used to test whether years of experience is significant in determining salary.

23.7 (1 point) Determine the value of the F statistic used to test whether corporate assets is significant in determining salary.

23.8 (1 point) Determine the value of the F statistic used to test whether number of employees supervised is significant in determining salary.

23.9 (1 point) Determine the value of the F statistic used to test whether the interactive term between gender and number of employees supervised is significant in determining salary.

23.10 (1 point) Determine the value of the F statistic used to test whether the term involving the square of the years of experience is significant in determining salary.

23.11 (2 points) Some people believe that soda machines in schools leads to an increase in obesity among the students.
You have access to information on 283 public high schools across the state, including the average weight of the students in each school.
The null hypothesis is that the average weight is the same regardless of whether a school has soda machines, while the alternative hypothesis is that the average weight is higher for schools with soda machines. Briefly discuss how you would set up a regression model to test this hypothesis.

23.12 (5 points) You are considering linear regression models of the annual amount spent on clothing by single persons of ages 26 to 30 living in Manhattan.
Let X = annual pretax income of the individual.
Let D = 1 if the individual is female (and zero if male).
Let Y = annual amount spent on clothing by the individual.
List five common but different linear regression models using an intercept, X, and possibly D. Briefly explain the assumptions behind each.

23.13 (2 points) Let Y be the size of loss. Let X2 = 1 if the loss occurred in the Spring and 0 otherwise. Let X3 = 1 if the loss occurred in the Summer and 0 otherwise. Let X4 = 1 if the loss occurred in the Fall and 0 otherwise.

You fit the model Yi = β1 + β2X2i + β3X3i + β4X4i + εi to 1000 losses.
The error sum of squares (ESS) is 1,143,071.
The total sum of squares (TSS) is 1,155,820.
Determine the value of the F statistic used to test whether season is significant in determining size of loss.
A. Less than 2.5
B. At least 2.5, but less than 3.0
C. At least 3.0, but less than 3.5
D. At least 3.5, but less than 4.0
E. 4.0 or more


23.14 (Course 120 Sample Exam #2, Q.9) (2 points) An insurance company uses a model to predict Yi = daily phone sales for personnel in New York, Chicago, Portland. The model is:
^Yi = 12 + 3Xi + (2)(Xi - 5)D1i - (2)(Xi - 2)D2i + (Xi - 5)D3i,
where Xi = salesperson’s years of experience,
D1i = 1 if Xi ≥ 5, and 0 otherwise,
D2i = 1 if New York, and 0 otherwise,
D3i = 1 if Chicago, and 0 otherwise.
Calculate the predicted phone sales for a salesperson with 7 years of experience located in Chicago.
(A) 32 (B) 33 (C) 35 (D) 37 (E) 39

23.15 (Course 120 Sample Exam #3, Q.7) (2 points) The following model is used to estimate the amount of fire damage Y (in thousands):
^Yi = 8 + 5X1i + 2(X1i - 4)X2i + 9X3i - 2X1iX3i
where X1i = the distance from the nearest fire station, in kilometers,
X2i = 1 if X1i ≥ 4 and 0 otherwise,
X3i = 1 if the city is A and 0 if the city is B.
For the fires that took place at least 4 kilometers from the fire station, determine the distance from the nearest fire station for which the average fire damage for city A and city B is the same.
(A) 4.25 (B) 4.50 (C) 8.50 (D) 28.00 (E) 29.00

23.16 (4, 5/01, Q.5) (2.5 Points) A professor ran an experiment in three sections of a psychology course to show that the more digits in a number, the more difficult it is to remember. The following variables were used in a multiple regression:
X2 = number of digits in the number
X3 = 1 if student was in section 1, 0 otherwise
X4 = 1 if student was in section 2, 0 otherwise
Y = percentage of students correctly remembering the number
You are given:
(i) A total of 42 students participated in the study.
(ii) The regression equation Y = β1 + β2X2 + β3X2² + β4X3 + β5X4 + ε was fit to the data and resulted in R² = 0.940.
(iii) A second regression equation Y = γ1 + γ2X2 + γ3X2² + ε was fit to the data and resulted in R² = 0.915.
Determine the value of the F statistic used to test whether class section is a significant variable.
(A) 5.4 (B) 7.3 (C) 7.7 (D) 7.9 (E) 8.3


23.17 (4, 5/01, Q.24) (2.5 points) Your claims manager has asserted that a procedural change in the claims department implemented on January 1, 1997 immediately reduced claim severity by 20 percent. You use a multiple regression model to test this assertion.
For the dependent variable, Y, you calculate the average claim costs on closed claims by year during 1990-99. You define the variable X as the year.
You also define a variable D as:
D = 0 for years 1996 and prior, and D = 1 for years 1997 and later.
Assuming a lognormal error component and constant inflation over the entire period, which of the following models would be used to test the assertion?
(A) Y = α1^D β1^X ε
(B) Y = α1 α2^D β1^X ε
(C) Y = α1 β1^X β2^(XD) ε
(D) Y = α1 α2^D β1^X β2^(XD) ε
(E) Y = α1 α2^D X^β1 ε

23.18 (4, 11/02, Q.20) (2.5 points) You study the impact of education and number of children on the wages of working women using the following model:
Y = a + b1E + b2F + c1G + c2H + ε
where Y = ln(wages),
E = 1 if the woman has not completed high school, 0 if the woman has completed high school, -1 if the woman has post-secondary education,
F = 1 if the woman has completed high school, 0 if the woman has not completed high school, -1 if the woman has post-secondary education,
G = 1 if the woman has no children, 0 if the woman has 1 or 2 children, -1 if the woman has more than 2 children,
H = 1 if the woman has 1 or 2 children, 0 if the woman has no children, -1 if the woman has more than 2 children.
Determine the expected difference between ln(wages) of a working woman who has post-secondary education and more than 2 children and ln(wages) of the average for all working women.
(A) a - b1 - b2
(B) b1 + b2
(C) -b1 - b2
(D) a - b1 - b2 + c2
(E) -b1 - b2 - c1 - c2


23.19 (4, 11/03, Q.5) (2.5 points)
For the model Yi = α + βXi + εi, where i = 1, 2,...,10, you are given:
(i) Xi = 1, if the ith individual belongs to a specified group; 0, otherwise.
(ii) 40 percent of the individuals belong to the specified group.
(iii) The least squares estimate of β is ^β = 4.
(iv) Σ(Yi - ^α - ^βXi)² = 92.
Calculate the t statistic for testing H0: β = 0.
(A) 0.9 (B) 1.2 (C) 1.5 (D) 1.8 (E) 2.1

23.20 (4, 11/03, Q.9) (2.5 points) You are given:
(i) Ytij is the loss for the jth insured in the ith group in Year t.
(ii) Ȳti is the mean loss in the ith group in Year t.
(iii) Xij = 0, if the jth insured is in the first group (i = 1); 1, if the jth insured is in the second group (i = 2).
(iv) Y2ij = δ + φY1ij + θXij + εij, where i = 1, 2 and j = 1, 2,..., n.
(v) Ȳ21 = 30, Ȳ22 = 37, Ȳ11 = 40, Ȳ12 = 41.
(vi) ^φ = 0.75.
Determine the least-squares estimate of θ.
(A) 5.25 (B) 5.50 (C) 5.75 (D) 6.00 (E) 6.25


Section 24, Piecewise Linear Regression196

Piecewise Linear Regression uses a model made up of a series of straight line segments, with the entire model continuous.

For example, assume some event has produced at time = 5 a change in the expected rate of change in claim frequency, then a model would be: Y = β1 + β2t + β3(t - 5)D + ε, where D is 0 for t < 5 and D is 1 for t ≥ 5.

For example, if β1 = .17, β2 = .001, and β3 = -.0005, then the graph of frequency versus time is:

[Graph: frequency versus time, continuous, with slope .001 before t = 5 and slope .0005 after t = 5.]

Note that the graph is continuous; the term β3(t - 5)D is zero at time = 5.
While the slope before t = 5 is .001, after t = 5 the slope is: .001 - .0005 = .0005.

In general, a piecewise linear regression with one structural break at time s could be written:Y = β1 + β2t + β3(t - s)D + ε, where D is 0 for t < s and D is 1 for t ≥ s.

A piecewise linear regression with two structural breaks at times s1 and s2 could be written:

Y = β1 + β2t + β3(t - s1)D1 + β4(t - s2)D2 + ε, where D1 is 0 for t < s1 and D1 is 1 for t ≥ s1, and D2 is 0 for t < s2 and D2 is 1 for t ≥ s2.

One can apply the same idea to exponential regressions, with ln(Y) piecewise linear.
For example, if claim severity were 1000 at time 0, and increasing at 5% per year before time = 4 and at 3% per year after time = 4, then an appropriate model would be:
Y = 1000(1.05^t)(1.03^(D(t-4)))ε, where D = 0 for t < 4 and D = 1 for t ≥ 4.

196 See Section 5.4 of Pindyck and Rubinfeld.


Checking Whether the Slope Changes:

If we have a two piece linear model with one structural break, then the slopes of the two line segments are (usually) not equal. One question of interest is whether the two slopes are significantly different.

For example, assume that 25 data points have been fit to the model: Y = β1 + β2t + β3(t - 10)D + ε, where D is 0 for t < 10 and D is 1 for t ≥ 10. ^β1 = 50, ^β2 = 3, ^β3 = 2.

Then the slope before time 10 is β2 and after time 10 is β2 + β3.

The two slopes would be the same if β3 = 0. Thus to test whether the slopes on the two

segments are different, we apply a t-test to test the hypothesis β3 = 0.

Exercise: In the above example, if sβ3 = .7, test whether the two slopes are different.
[Solution: H0 is β3 = 0. t = ^β3/sβ3 = 2/.7 = 2.857, with 25 - 3 = 22 degrees of freedom.
Since 2.819 < 2.857, we reject H0 at 1%. At the 1% level the two slopes are significantly different.]
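A minimal sketch of this piecewise fit and slope test follows (Python/NumPy). The data are simulated from the illustrative parameter values above, since the 25 actual data points are not given, so the fitted values and t-statistic will differ from the exercise.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data with a structural break at t = 10 (illustrative only).
t = np.arange(1, 26, dtype=float)
D = (t >= 10).astype(float)
y = 50 + 3 * t + 2 * (t - 10) * D + rng.normal(0, 4, t.size)

# Piecewise linear model: Y = b1 + b2*t + b3*(t - 10)*D + error.
X = np.column_stack([np.ones_like(t), t, (t - 10) * D])
N, k = X.shape
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

# t-statistic for H0: b3 = 0 (no change in slope), with N - k = 22 d.f.
resid = y - X @ beta
s2 = resid @ resid / (N - k)
t_b3 = beta[2] / np.sqrt(s2 * XtX_inv[2, 2])
print(beta, t_b3)
```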

Splines:*

Spline Functions are a generalization of piecewise linear models. One still requires continuity, but no longer requires that each segment be a straight line.197 There are usually smoothness requirements, such as equality of first derivatives, or first and second derivatives, at the points of joining.

197 See for example, Section 15.3-15.6 of Loss Models.


Problems:

24.1 (2 points) There are 30 observations of average claim costs over time.
You have fit the regression model:
Y = β1(β2^t)(β3^(D(t-6)))ε, where D is 0 for t < 6 and D is 1 for t ≥ 6.
^β1 = 450, ^β2 = 1.07, ^β3 = 1.02. sβ1 = 3.9, sβ2 = .016, sβ3 = .011.
At what level are the rates of inflation significantly different before and after time 6?
A. 10% B. 5% C. 2% D. 1% E. None of A, B, C, or D

Use the following 15 observations for the next 2 questions:
X: 1150 840 900 800 1070 1220 980 1300 520 670 1420 850 1000 910 1230
Y: 1.29 2.20 2.26 2.38 1.77 1.25 1.87 0.71 2.90 2.63 0.55 2.31 1.90 2.15 1.20

24.2 (7 points) Fit a piecewise linear regression model with a structural break at 1000.

24.3 (3 points) Test whether the effect of X on Y is significantly different before and after 1000.

Use the following information for the next two questions:
Let X be the age of driver and Y be the claim frequency for automobile insurance.
You are to fit a piecewise linear regression model to a large set of observations.
You assume the slope changes at age 27 and at age 60.
For your set of observations let:
a = ΣXi, for Xi < 27. b = ΣXi², for Xi < 27. c = ΣYi, for Xi < 27. d = ΣXiYi, for Xi < 27.
e = ΣXi, for 27 ≤ Xi < 60. f = ΣXi², for 27 ≤ Xi < 60.
g = ΣYi, for 27 ≤ Xi < 60. h = ΣXiYi, for 27 ≤ Xi < 60.
j = ΣXi, for 60 ≤ Xi. k = ΣXi², for 60 ≤ Xi. l = ΣYi, for 60 ≤ Xi. m = ΣXiYi, for 60 ≤ Xi.
s = number of observations for Xi < 27.
t = number of observations for 27 ≤ Xi < 60.
u = number of observations for 60 ≤ Xi.

24.4 (2 points) Write out the form of the model.

24.5 (8 points) Derive the set of linear equations to be solved in order to fit the model, in terms of the given observed quantities.


24.6 (Course 120 Sample Exam #1, Q.13) (2 points) You are given the following model.
E(Ct) = β1 + β2Yt, for 0 < t ≤ t0,
E(Ct) = (β1 - β3Yt0) + (β2 + β3)Yt, for t0 ≤ t ≤ t1,
E(Ct) = (β1 - β3Yt0 - β4Yt1) + (β2 + β3 + β4)Yt, for t > t1.
Which of the following is true?
(A) This model is not considered a spline function.
(B) This model is discontinuous, with two structural breaks.
(C) This model is continuous, with one structural break.
(D) Dummy variables are used to account for shifts in the intercept.
(E) This model is continuous, with two structural breaks.


Mahler’s Guide to

Regression

Sections 25-28:

25 Weighted Regression
26 Heteroscedasticity
27 Tests for Heteroscedasticity
28 Correcting for Heteroscedasticity


Section 25, Weighted Regression

In a weighted regression, we weight some of the observations more heavily than others.

Sometimes weighted regressions are used when one has some reason to count certain observations more heavily. For example one might weight more recent observations more heavily. In an insurance study, one might weight more heavily observations from states, insurers, classifications, employers, policies, etc., that are more similar to whatever one was studying.

Weighted regressions come up in the study of credibility. The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights.198

However, as will be discussed in a subsequent section, the chief use of weighted regressions on this exam is to correct for the presence of heteroscedasticity.

Model with No Intercept:

Let us assume we have three observations: X = 1 and Y = 2, X = 2 and Y = 6, X = 5 and Y = 11. We can fit the model Yi = βXi + εi, by ordinary least squares regression.

^β = ΣXiYi / ΣXi² = 69/30 = 2.3.

What if we wish to weight some of these three observations more heavily than others.For example, assume we wish to weight the second observation twice as much as the first, and the third observation three times as much as the first.

In other words, let us minimize the weighted squared error:
(^Y1 - 2)² + 2(^Y2 - 6)² + 3(^Y3 - 11)² = (β - 2)² + 2(2β - 6)² + 3(5β - 11)².
Setting the partial derivative of the weighted squared error with respect to β equal to zero (and dividing by 2):
0 = (β - 2) + (2)(2)(2β - 6) + (3)(5)(5β - 11).
⇒ ^β = {2 + (2)(2)(6) + (3)(5)(11)} / {1 + (2)(2)(2) + (3)(5)(5)} = 191/84 = 2.274.

In general, one can perform a weighted regression by minimizing the weighted sum of squared errors: Σwi(Yi - ^Yi)² = Σwi(Yi - βXi)².
Setting the partial derivative with respect to β equal to zero:
0 = -2ΣwiXi(Yi - βXi) ⇒ ^β = ΣwiXiYi / ΣwiXi².

198 Buhlmann Credibility is covered on joint Exam 4/C.


For the model with no intercept, for a weighted regression with weights wi:
^β = ΣwiXiYi / ΣwiXi².
Note that when all the weights are equal, the weighted regression reduces to the unweighted case, ^β = ΣXiYi / ΣXi².

Exercise: Apply the above formulas to the previous example of a weighted regression.
[Solution: w = (1, 2, 3). X = (1, 2, 5). Y = (2, 6, 11).
ΣwiXiYi = (1)(1)(2) + (2)(2)(6) + (3)(5)(11) = 191.
ΣwiXi² = (1)(1²) + (2)(2²) + (3)(5²) = 84.
^β = ΣwiXiYi / ΣwiXi² = 191/84 = 2.274.
Comment: One can take w = (1/6, 2/6, 3/6) if one prefers, without affecting the result.]

Exercise: You have 6 observations: (1, 2), (2, 6), (2, 6), (5, 11), (5, 11), (5, 11).
Fit the model Yi = βXi + εi, by ordinary least squares regression.
[Solution: ΣXi² = 1² + (2)(2²) + (3)(5²) = 84. ΣXiYi = (1)(2) + (2)(2)(6) + (3)(5)(11) = 191.
^β = ΣXiYi / ΣXi² = 191/84 = 2.274.]

This is the same result as for the weighted regression. In general, when the weights are integer, or proportional to integers, one can pretend one had different numbers of repeated copies of the actual observations.

Two Variable Model:

Let’s apply these ideas to the two variable model: Yi = α + βXi + εi. The weighted sum of squared errors is:

Σwi(Yi - ^Yi)² = Σwi(Yi - α - βXi)².

We minimize this sum of squared errors by setting the partial derivatives with respect to α and β equal to zero.

0 = Σwi(Yi - α - βXi). ⇒ αΣwi = ΣwiYi - βΣwiXi.
⇒ α = {ΣwiYi - ^βΣwiXi} / Σwi, where we have not necessarily assumed Σwi = 1.
0 = Σwi(Yi - α - βXi)Xi. ⇒ 0 = ΣwiXiYi - αΣwiXi - ^βΣwiXi².


Substituting α into the second equation, we can solve for ^β:

^β = {ΣwiXiYi - (ΣwiXi)(ΣwiYi)/Σwi} / {ΣwiXi² - (ΣwiXi)²/Σwi}.

α = {ΣwiYi - ^βΣwiXi} / Σwi.

One can always divide the weights by a constant so that Σwi = 1.
If we assume that Σwi = 1, then these equations become:
^β = {ΣwiXiYi - (ΣwiXi)(ΣwiYi)} / {ΣwiXi² - (ΣwiXi)²}.
α = ΣwiYi - ^βΣwiXi.

α = Y - ^β X . If all the weights are equal, wi = 1/N, the weighted regression reduces to the

unweighted regression, as it should.

Exercise: Fit a weighted regression (with slope and intercept) to the following data:
Xi   Yi   wi
3    10   2/11
7    6    3/11
20   4    6/11
[Solution: ^β = {ΣwiXiYi - (ΣwiXi)(ΣwiYi)} / {ΣwiXi² - (ΣwiXi)²} = {60.545 - (13.364)(5.6364)} / {233.18 - 13.364²} = -.271.
α = ΣwiYi - ^βΣwiXi = 5.6364 - (-.271)(13.364) = 9.26.
The weighted regression is: ^Yi = 9.26 - .271Xi.]
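The exercise above can be checked with a few lines of code. The sketch below (Python/NumPy) reproduces ^β ≈ -0.271 and α ≈ 9.26.

```python
import numpy as np

# Data from the exercise above.
X = np.array([3.0, 7.0, 20.0])
Y = np.array([10.0, 6.0, 4.0])
w = np.array([2/11, 3/11, 6/11])          # weights summing to 1

swx, swy = w @ X, w @ Y                   # weighted means of X and Y
swxy, swxx = w @ (X * Y), w @ (X * X)

beta  = (swxy - swx * swy) / (swxx - swx**2)
alpha = swy - beta * swx
print(alpha, beta)    # about 9.26 and -0.271, matching the text
```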


Deviations Form:

Just as in the unweighted case, one can put weighted regressions into deviations form. However, rather than subtracting the straight average, one subtracts the weighted average from each variable. Assuming that Σwi = 1:

xi = Xi - ΣwiXi. yi = Yi - ΣwiYi.

^β = Σwixiyi / Σwixi².

α = ΣwiYi - ^βΣwiXi.

If wi = 1/N, then these formulas for the weighted regression reduce to the formulas for an unweighted regression.

For example, redoing the previous exercise in deviations form:
X = 3, 7, 20. Y = 10, 6, 4. w = 2/11, 3/11, 6/11.
ΣwiXi = 13.364. xi = Xi - ΣwiXi = -10.364, -6.364, 6.636.
ΣwiYi = 5.6364. yi = Yi - ΣwiYi = 4.3636, .3636, -1.6364.
Σwixiyi = -14.78. Σwixi² = 54.60.
^β = Σwixiyi / Σwixi² = -14.78/54.60 = -.271.
α = ΣwiYi - ^βΣwiXi = 5.6364 - (-.271)(13.364) = 9.26.
The weighted regression is: ^Yi = 9.26 - .271Xi, matching the previous result.

Multiple Regression:*

Weighted regression is just a special case of Generalized Least Squares (GLS), to be discussed in a subsequent section. Weighted least squares can be applied in the same manner to the multiple regression model as to the two variable model, but it is usually easier to handle this as a special case of Generalized Least Squares in matrix form.


The Relationship of Bayes Analysis and Buhlmann Credibility:*199

The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. The slope of this weighted least squares line to the Bayesian Estimates is the Buhlmann Credibility. Buhlmann Credibility is the Least Squares approximation to the Bayesian Estimates.

For example, assume the following information:
Observation           1       2       3       4       5       6       7       8
A Priori Probability  0.2125  0.2125  0.2125  0.2125  0.0625  0.0625  0.0125  0.0125
Bayesian Estimate     2.853   2.853   2.853   2.853   3.7     3.7     4.5     4.5

The weights to be used are the a priori probabilities of each observation.

Put the variables in deviations form, by subtracting the weighted average from each variable: xi = Xi - ΣwiXi, yi = Yi - ΣwiYi.
w = 0.2125, 0.2125, 0.2125, 0.2125, 0.0625, 0.0625, 0.0125, 0.0125.
X = 1, 2, 3, 4, 5, 6, 7, 8.
ΣwiXi = 3 = a priori mean.
x = X - ΣwiXi = -2, -1, 0, 1, 2, 3, 4, 5.
Y = 2.853, 2.853, 2.853, 2.853, 3.7, 3.7, 4.5, 4.5.
ΣwiYi = 3.
y = Y - ΣwiYi = -.147, -.147, -.147, -.147, .7, .7, 1.5, 1.5.

Then if the least squares line is Y = α + βX, ^β = Σwixiyi / Σwixi² = .45/2.6 = .173.
α = ΣwiYi - ^βΣwiXi = 3 - (3)(.173) = (3)(.827) = 2.481.
^Yi = 2.481 + .173Xi, where Xi is the observation.

The slope of the line is the credibility assigned to one observation, Z = 17.3%. The fitted weighted regression line is the estimates using Buhlmann Credibility: Z(observation) + (1-Z)(a priori mean). This is true in general.

199 Bayes Analysis and Buhlmann credibility are covered on joint Exam 4/C. See for example “Credibility,” by Mahler and Dean. This example is taken from “Mahler’s Guide to Buhlmann Credibility and Bayesian Analysis.”


Problems:

25.1 (2 points) Determine the slope of a weighted regression, with slope and intercept, fit to the following data.
X    Y   Weight
0    2   1/4
3    4   1/2
10   6   1/4
A. 0.31 B. 0.33 C. 0.35 D. 0.37 E. 0.39

25.2 (3 points) You are given the following information:
X   Y    Weight
1   15   60%
4   30   30%
9   50   10%
Fit a weighted regression, with slope and intercept, to this data.
What is the fitted value of Y for X = 10?
(A) 55 (B) 56 (C) 57 (D) 58 (E) 59

25.3 (2 points) Determine the slope of a weighted regression, with no intercept, fit to the following data.
X    Y    Weight
1    3    30%
5    8    40%
10   13   20%
20   32   10%
A. 1.46 B. 1.48 C. 1.50 D. 1.52 E. 1.54

25.4 (165, 5/89, Q.2) (1.7 points) You believe that the true probability of success in a single play of a game is directly proportional to the age of the player. You are given the following observed experience:
Player   Age   Number of Successes   Number of Plays, wi
1        20    25                    100
2        25    28                    112
3        30    30                    100
where: (i) ui = the observed proportion of successes; and (ii) vi = the graduated proportion of successes.
The fit measure, F = Σi=1 to 3 wi(ui - vi)², is to be minimized subject to the prior opinion concerning the true probability of success.
Determine v1.
(A) 0.210 (B) 0.213 (C) 0.218 (D) 0.250 (E) 0.265


* 25.5 (4, 5/90, Q.57) (3 points) Let X1 be the outcome of a single trial and let E[X2 | X1] be the expected value of the outcome of a second trial as described in the table below.
Outcome   Initial Probability   Bayesian Estimate
K         of Outcome            E[X2 | X1 = K]
0         1/3                   1
3         1/3                   6
12        1/3                   8
Which of the following represents the Buhlmann credibility estimates corresponding to the Bayesian estimates (1, 6, 8)?
A. (3, 5, 10) B. (2, 4, 10) C. (2.5, 4.0, 8.5) D. (1.5, 3.375, 9.0) E. (1, 6, 8)

25.6 (165, 11/90, Q.1) (1.9 points) You are given the following exposures, nx, and observed values, ux:

x   nx    ux
0   300   3
1   200   6
2   100   11
Revised estimates, vx, are to be determined such that:
(i) ∆vx = a, for x = 0, 1; and
(ii) the sum of the squared deviations, weighted by exposures, is minimized.
Determine v1.
(A) 6.3 (B) 6.6 (C) 6.7 (D) 10.4 (E) 10.7

25.7 (165, 11/90, Q.15) (1.9 points) You are using the least squares method, weighted by exposures, to develop a rate of mortality, qx = a(x + 1/2). You are given:
x    Exposure   Deaths
30   300        3
40   400        10
50   300        15
Determine a.
(A) 0.00069 (B) 0.00070 (C) 0.00072 (D) 0.00074 (E) 0.00075


* 25.8 (4B, 11/93, Q.24) (3 points) You are given the following:
• An experiment consists of three possible outcomes, R1 = 0, R2 = 2, and R3 = 14.
• The a priori probability distribution for the experiment's outcome is:
Outcome, Ri   Probability, Pi
0             2/3
2             2/9
14            1/9
• For each possible outcome, Bayesian analysis was used to calculate predictive estimates, Ei, for the second observation of the experiment. The predictive estimates are:
              Bayesian Analysis Predictive
Outcome, Ri   Estimate Ei Given Outcome Ri
0             7/4
2             55/24
14            35/12
• The Buhlmann credibility factor after one experiment is 1/12.
Determine the values for the parameters a and b that minimize the expression:
Σi=1 to 3 Pi(a + bRi - Ei)²
A. a = 1/12; b = 11/12
B. a = 1/12; b = 22/12
C. a = 11/12; b = 1/12
D. a = 22/12; b = 1/12
E. a = 11/12; b = 11/12

25.9 (165, 11/94, Q.3) (1.9 points) You are given the following exposures nx, observed values ux and graduated values vx:
x   nx   ux   vx
1   1    4    v1
2   1    6    v2
3   2    u3   10
Graduated values vx are determined such that:
(i) vx = ax + b; and
(ii) the sum of the squared deviations, weighted by exposures, is minimized.
Determine v2.
(A) 6.0 (B) 6.2 (C) 6.4 (D) 6.6 (E) 6.8
Note: The original exam question has been revised.


25.10 (165, 11/94, Q.15) (1.9 points) You are using the least squares method, weighted by exposures, to fit the functional form qx = a(x + 1/3). You are given:

x    Exposures   Deaths
10   30          1
20   90          5
30   80          7

Determine 1000a. (A) 2.84 (B) 2.87 (C) 2.90 (D) 2.93 (E) 2.96

* 25.11 (4, 11/02, Q.7) (2.5 points) You are given the following information about a credibility model:
First         Unconditional   Bayesian Estimate of
Observation   Probability     Second Observation
1             1/3             1.50
2             1/3             1.50
3             1/3             3.00
Determine the Bühlmann credibility estimate of the second observation, given that the first observation is 1.
(A) 0.75 (B) 1.00 (C) 1.25 (D) 1.50 (E) 1.75


Section 26, Heteroscedasticity

One of the assumptions underlying ordinary least squares regression is that the error terms εi are random variables with the same variance. Actually we assumed the εi were independent, identically distributed Normal variables, each with mean zero; however, for now we are focusing on the assumption that they have the same variance.

We use the following terms:

Variances of εi are all equal ⇔ Homoscedasticity.

Variances of εi are not all equal ⇔ Heteroscedasticity.200

An assumption underlying ordinary least squares regression is that the σi² are all equal. The null hypothesis is that there is homoscedasticity. The alternate hypothesis is that there is heteroscedasticity.

An Example of Heteroscedasticity: 201

Assume that for each of 80 towns, we have data on their annual claim frequency over the last 4 years, and the exposures (car-years) over the same period of time. In each pair shown below, the exposures are followed by the observed number of claims per 10,000 exposures:

4092, 261, 4401, 218, 5164, 267, 5687, 215, 5847, 173, 6003, 173, 6196, 211, 6219, 241, 6524, 302, 6698, 199, 7244, 213, 7924, 235, 8473, 250, 8546, 274, 8923, 236, 9107, 238, 10113, 254, 10341, 266, 11740, 272, 11919, 214, 11972, 285, 12020, 291, 12387, 284, 12653, 253, 13560, 210, 13893, 205, 14403, 271, 14906, 321, 16178, 243, 16280, 270, 16611, 266, 17653, 265, 18421, 307, 18506, 269, 18575, 272, 21036, 340, 21125, 309, 21972, 289, 23576, 280, 23800, 324, 25599, 339, 28942, 369, 29773, 329, 30778, 366, 31897, 310, 32738, 346, 35708, 404, 42345, 418, 44382, 385, 46051, 394, 47386, 446, 52763, 454, 54881, 479, 60044, 491, 63511, 543, 66620, 539, 69062, 520, 71807, 494, 72231, 542, 77854, 595, 81597, 609, 92432, 648, 98188, 644, 100133, 715, 104217, 703, 111460, 750, 123870, 761, 124017, 794, 129975, 862, 132996, 879, 139876, 877, 140738, 922, 141963, 934, 148211, 935, 159978, 982, 167914, 1033, 180206, 1109, 185566, 1143, 194448, 1163, 211189, 1250.

For example, the first town had 4092 exposures and 107 claims, for a claim frequency of 107/4092 = .0261 or 261 claims per 10,000 exposures. The final town had 211,189 exposures and a claim frequency of 12.50%.
200 “Hetero” ⇔ differing, as in heterogeneous. “Homo” ⇔ similar, as in homogeneous. “Scedastic” from to scatter.
201 This example is very loosely based on Private Passenger Automobile Insurance in Massachusetts. The behavior of the actual data is more complicated. See “The Construction of Automobile Rating Territories in Massachusetts,” by Robert Conger, PCAS 1987.


Here is a graph of this data:

[Graph: observed claim frequency per 10,000 exposures (vertical axis, 200 to 1200) plotted against exposures (horizontal axis, up to about 200,000) for the 80 towns.]

It appears as if the larger towns tend to have higher frequencies.

Therefore, we fit via ordinary least squares the model Yi = α + βXi + εi, where Xi is the number of exposures for town i, and Yi is the observed claim frequency per 10,000 exposures for town i.

The result is ^α = 194.4 and ^β = .00499. R2 = .9903. R̄2 = .9902. s2 = 764.

sα = 4.285. t-statistic for the intercept is: 194.4/4.285 = 45.4.
sβ = .00005594. t-statistic for the slope is: .00499/.00005594 = 89.2.

Source    DF    Sum of Squares    Mean Square    F-Statistic
Model      1    6,083,320         6,083,320      7960
Error     78       59,610               764
Total     79    6,142,930

Covariance Matrix =
( 18.36         -0.000166      )
( -0.000166     .00000000313   )

Corr[^α, ^β] = -.693.

Durbin-Watson Statistic is 1.719.202

Based on the t-statistics and the F-Statistic, the slope and intercept both appear to be significantly different than zero. R2 is extremely high; the regression line accounts for most of the variation in frequency between the towns.

202 As discussed subsequently, the Durbin-Watson statistic tests for serial correlation. In this case the Durbin-Watson statistic of 1.719 is sufficiently close to 2, so as not to indicate serial correlation.


Here is a graph of this regression, ^Yi = 194.4 + .00499Xi, versus the data:

[Graph: the fitted line plotted against the observed frequencies per 10,000 exposures (vertical axis, 200 to 1200) versus exposures (horizontal axis, up to about 200,000).]

So far everything seems okay. However, here is a graph of the squared residuals:

[Graph: squared residuals (vertical axis, up to about 3000) plotted against exposures (horizontal axis, up to about 200,000).]

The magnitude of the residuals appears to have some tendency to be larger on average for smaller towns. This corresponds to our intuition, that the observed frequencies of smaller towns would be more affected by random fluctuation. Thus we have a suspicion that the variance of the errors εi is not constant. Rather, we suspect that σi2 increases as the number of exposures decreases.


Simulating Heteroscedasticity:*

Assume for example we wish to simulate the model: Yi = 95 + .05Xi + εi.

We are given the values for a series of observations of X, and want to simulate a corresponding series of values of Y.

It is assumed that εi is Normally Distributed with mean zero.

If the model is homoscedastic with Var[εi] = 900, then we simulate this model as follows:203
Yi = 95 + .05Xi + 30(Random Standard Normal Distribution),
where for each Yi we simulate a new independent random draw from a Normal Distribution with mean zero and standard deviation 1.

In the case of the heteroscedastic model, instead of multiplying the random draws from a Standard Normal by the same value of σ for each Yi, σi varies.

Yi = 95 + .05Xi + σi(Random Standard Normal Distribution).

If the model is heteroscedastic with, for example, Var[εi] = .36Xi and StdDev[εi] = .6√Xi, then we simulate this model as follows:
Yi = 95 + .05Xi + .6√Xi (Random Standard Normal Distribution),
where for each Yi we simulate a new independent random draw from a Normal Distribution with mean zero and standard deviation 1.

Exercise: Let 1.670, -0.518, 0.299, be three independent random draws from a Standard Normal Distribution. X1 = 1711, X2 = 3124, and X3 = 4502. For the heteroscedastic example

with Var[εi] = .36Xi, simulate Y1, Y2, and Y3.

[Solution: Y1 = 95 + (.05)(1711) + (.6)(√1711)(1.670) = 222.00.
Y2 = 95 + (.05)(3124) + (.6)(√3124)(-0.518) = 233.83.
Y3 = 95 + (.05)(4502) + (.6)(√4502)(0.299) = 332.14.]
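A minimal Python sketch of this heteroscedastic simulation (numpy assumed available; the seed is arbitrary and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(1)           # arbitrary seed, illustrative only
    X = np.array([1711., 3124., 4502.])      # the X values from the exercise
    # Var[eps_i] = .36 X_i, so StdDev[eps_i] = .6 sqrt(X_i)
    eps = 0.6 * np.sqrt(X) * rng.standard_normal(len(X))
    Y = 95 + 0.05 * X + eps                  # the model Y_i = 95 + .05 X_i + eps_i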

203 Here one is not interested in the details of how the computer simulates a random draw from a Normal Distribution. How to simulate a Normal Distribution is explained for example in Simulation by Ross.


Estimated Variances and Covariances:

The usual formula for the variance of the estimated slope for the two-variable model is:

Var[^β] = s2/Σxi2, where s2 = Σ^εi2/(N - 2) = ESS/(N - 2).

If there is homoscedasticity this estimate is unbiased and consistent. However, if there is heteroscedasticity, then the usual estimator of the variance of ^β is biased and inconsistent.

For the two variable model, when there is heteroscedasticity, Var[^β] = Σxi2σi2/(Σxi2)2,204 compared to Var[^β] = σ2/Σxi2 with homoscedasticity. Thus when heteroscedasticity is present, the ordinary least squares estimators of these variances are biased, inconsistent, and inefficient.

Derivation of Var[^β]:*

For the two variable model, ^β = ΣxiYi / Σxi2.

Var[^β] = Var[ΣxiYi / Σxi2] = Σxi2Var[Yi] / (Σxi2)2 = Σxi2σi2 / (Σxi2)2.205

If all the σi are equal, in other words if we have homoscedasticity, then this reduces to the usual: Var[^β] = σ2/(Σxi2).

If instead of an unweighted regression, we perform a weighted regression with wi = 1/σi2, then ^β = ΣwixiYi / Σwixi2 = Σ(xiYi/σi2) / Σ(xi2/σi2).206

In this case of a weighted regression, Var[^β] = Var[Σ(xiYi/σi2) / Σ(xi2/σi2)]
= Σ(xi2Var[Yi]/σi4) / (Σxi2/σi2)2 = Σ(xi2σi2/σi4) / (Σxi2/σi2)2 = Σ(xi2/σi2) / (Σxi2/σi2)2 = 1 / Σ(xi2/σi2).

204 See Equation 6.3 in Pindyck and Rubinfeld.
205 The xi are assumed to be known and therefore we can treat them as constants. We have used the fact that the variance of a variable times a constant is the variance of the variable times the square of the constant.
206 As will be discussed, this is one way to correct for heteroscedasticity, if one can determine how the variances vary.


Problems:

26.1 (1 point) Define homoscedasticity and heteroscedasticity.

26.2 (2 points) Regressions are fit to five different time series, each with 100 observations. Of the following five graphs of the squares of the residuals, which of them most clearly indicates the presence of heteroscedasticity?

[Five graphs, labeled A through E, each showing squared residuals plotted against time (1 to 100).]


26.3 (8 points) Let X = (0, 5, 10). Yi = 3 + 2Xi + εi.

ε1 has a 50% chance of being -1 and a 50% chance of being +1.

ε2 has a 50% chance of being -2 and a 50% chance of being +2.

ε3 has a 50% chance of being -4 and a 50% chance of being +4.

ε1, ε2, and ε3 are mutually independent. List all possible observed sets of Y.

For each set determine α, ^β, and ESS.

Determine the average values over the sets.


Section 27, Tests for Heteroscedasticity207

There are number of tests for heteroscedasticity in errors: the Goldfeld-Quandt Test, the Breusch-Pagan Test, and the White Test. The latter two are very similar.

Goldfeld-Quandt Test for Heteroscedasticity:

One way to test for heteroscedasticity in errors is the Goldfeld-Quandt test.

First one needs to find some variable that one believes is related to σi2, the variance of εi. In the town example, we believe such a variable to be the exposures.

Next, rank the observations according to that variable, from smallest to largest assumed σi2.

In this case, we would rank the towns from most to fewest exposures.

Run two separate regressions. One regression on the 32 largest towns, and then another regression on the 32 smallest towns, omitting the middle 80/5 = 16 towns.208

For the 32 largest towns, the regression results were:

^α = 180.6 and ^β = .00510. R2 = .9918. R̄2 = .9916. s2 = 513.

sα = 10.08. t-statistic for the intercept is 17.9. sβ = .00008464. t-statistic for the slope is 60.2.

Source    DF    Sum of Squares    Mean Square    F-Statistic
Model      1    1,860,890         1,860,890      3626
Error     30       15,395               513
Total     31    1,876,295

For the 32 smallest towns, the regression results were:

^α = 209.8 and ^β = .00360. R2 = .160. R̄2 = .132. s2 = 1109.

sα = 16.32. t-statistic for the intercept is 12.9. sβ = .001505. t-statistic for the slope is 2.39.

Source    DF    Sum of Squares    Mean Square    F-Statistic
Model      1     6331              6331           5.71
Error     30    33,283             1109
Total     31    39,614

In order to test for homoscedasticity, we compare the ESS for the second regression to the error sum of squares for the first regression. The test statistic is:

{(ESS for second regression)/(32 - 2)} / {(ESS for first regression)/(32 - 2)} = 33283/15395 = 2.16. Assuming the εi are independent, identically distributed normal variables, each with mean zero, this test statistic has an F-Distribution, with 30 and 30 degrees of freedom.209

207 See Section 6.1.2 of Pindyck and Rubinfeld.
208 Usually one omits some observations in the middle and then fits a regression to each half of the remaining data.
209 In each of the numerator and denominator, we had 32 observations and fit 2 coefficients.


For 30 and 30 degrees of freedom, the critical value at 5% is 1.84, and the critical value at 1% is 2.30.210 Since 1.84 < 2.16 < 2.30, we reject at 5% the null hypothesis that the variances of the errors are all equal, but do not reject it at 1%. At a 5% significance level, the Goldfeld-Quandt Test indicates heteroscedasticity.

In general, the Goldfeld-Quandt Test proceeds as follows:

0. Test H0 that σi2, the variance of εi, is the same for all i.

1. Find a variable that seems to be related to σi2, by graphing the squared residuals, or other techniques.

2. Order the observations in assumed increasing order of σi2, based on the relationship from step 1.

3. Omitting d observations in the middle, run a regression on the first (N - d)/2 observations, with assumed smaller σi2.

4. Run a regression on the last (N - d)/2 observations, with assumed larger σi2.

5. (ESS from step 4)/(ESS from step 3) has an F-Distribution, with (N - d)/2 - k and (N - d)/2 - k degrees of freedom.
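A minimal Python sketch of these steps for the two-variable model (numpy assumed; x and y are arrays already ordered by assumed increasing σi2, and d is the number of middle observations omitted):

    import numpy as np

    def ols_ess(x, y):
        # fit Y = a + b X by ordinary least squares and return the error sum of squares
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
        a = y.mean() - b * x.mean()
        return np.sum((y - (a + b * x))**2)

    def goldfeld_quandt(x, y, d):
        m = (len(y) - d) // 2                 # observations in each half
        ess_small = ols_ess(x[:m], y[:m])     # half with assumed smaller variances
        ess_large = ols_ess(x[-m:], y[-m:])   # half with assumed larger variances
        return ess_large / ess_small          # F statistic with (m - 2, m - 2) degrees of freedom

The returned F statistic is then compared to the tabled F critical value, as in the town example above.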

210 Using a somewhat larger F-Table than attached to the exam.


Breusch-Pagan Test for Heteroscedasticity:

Another test for heteroscedasticity is the Breusch-Pagan test.

As with the Goldfeld-Quandt test, we first need to find a variable that seems to be related to σi2. In the town example, we have already identified exposures.

We use the regression that was fit to the town data previously: ^Yi = 194.4 + .00499Xi.

Take σ2 = ESS/N = 59610/80 = 745.211

Run a linear regression of ^εi2/σ2 on exposures. The result is:
intercept = 1.291, and slope = -5.5 x 10^-6.
R2 = .0485. R̄2 = .0363. RSS = 7.34. ESS = 144.141. F = 3.97.

Assuming the εi are independent, identically distributed normal variables, each with mean zero, RSS/2 has a Chi-Square Distribution, with 1 degree of freedom.212

RSS/2 = 3.67. For the Chi-Square, the critical value for 1 degree of freedom at 5% is 3.84. Since 3.67 < 3.84, we do not reject the null hypothesis of homoscedasticity at 5%.213

In general, the Breusch-Pagan Test proceeds as follows:

0. Test H0 that σi2, the variance of εi, is the same for all i.

1. Find a variable(s) that seems to be related to σi2, by graphing the squared residuals, or other techniques.

2. Run the assumed regression model. Note the residuals ^εi and let σ2 = ESS/N.

3. Run a regression of ^εi2/σ2 from step 2 on the variable(s) from step 1.

4. RSS/2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal to the number of variables from step 1, not counting an intercept.

One does not have to assume a linear relationship between ^εi2/σ2 and the exposures.

For example, let us assume ^εi2/σ2 = a + b/exposures + error term.

Running this regression results in: intercept = .608, and slope = 6628. RSS = 11.75.

RSS/2 = 5.88. Since 5.02 < 5.88 < 6.64, we reject the null hypothesis of homoscedasticity at 2.5% and do not reject the null hypothesis at 1%.
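A minimal Python sketch of the Breusch-Pagan steps above, in the one-variable form used in the town example (numpy assumed; resid are the residuals of the original fit and z is the variable believed to be related to σi2):

    import numpy as np

    def breusch_pagan(resid, z):
        u = resid**2 / np.mean(resid**2)      # eps_hat^2 / sigma^2, with sigma^2 = ESS/N
        b = np.sum((z - z.mean()) * (u - u.mean())) / np.sum((z - z.mean())**2)
        a = u.mean() - b * z.mean()
        rss = np.sum((a + b * z - u.mean())**2)   # regression sum of squares of this second regression
        return rss / 2                        # compare to a Chi-Square with 1 degree of freedom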

211 For calculating σ2 in this test, the denominator is N rather than N - k.
212 Assuming the variances are related to a set of p independent variables, then RSS/2 has a Chi-Square Distribution, with p degrees of freedom.
213 The critical value at 10% turns out to be 2.71. Since 3.67 > 2.71, we reject the null hypothesis at 10%.


White Test for Heteroscedasticity:

A third test for heteroscedasticity in errors is the White test, which is very similar to the Breusch-Pagan test.

As with the Goldfeld-Quandt test and Breusch-Pagan test, we first need to find a variable that seems to be related to σi2. In this example, we have already identified exposures.

We use the regression that was fit to the town data previously: ^Yi = 194.4 + .00499Xi.

Run a linear regression of ^εi2 on exposures. The result is:214
intercept = 961.9, and slope = -.00409. R2 = .0485.

Assuming the εi are independent, identically distributed normal variables, each with mean zero, N R2 has a Chi-Square Distribution, with 1 degree of freedom.215

N R2 = (80)(.0485) = 3.88. For the Chi-Square, the critical value for 1 degree of freedom at 5% is 3.84. Since 3.88 > 3.84, we reject the null hypothesis of homoscedasticity at 5%.216

In general, the White Test proceeds as follows:

0. Test H0 that σi2, the variance of εi, is the same for all i.

1. Find a variable(s) that seems to be related to σi2, by graphing the squared residuals, or other techniques.

2. Run the assumed regression model. Note the residuals.

3. Run a regression of ^εi2 from step 2 on the variable(s) from step 1.

4. N R2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal to the number of variables from step 1, not counting an intercept.

One does not have to assume a linear relationship between ^εi2 and the exposures.

For example, let us assume ^εi2 = a + b/exposures + error term.

Running this regression results in: intercept = 453, and slope = 4.94 million. R2 = .0776.

N R2 = (80)(.0776) = 6.21. Since 5.02 < 6.21 < 6.64, we reject the null hypothesis of homoscedasticity at 2.5% and do not reject the null hypothesis at 1%.
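A minimal Python sketch of the White test in the same one-variable form used above (numpy assumed); it differs from the Breusch-Pagan sketch only in not dividing by σ2 and in using N R2 as the test statistic:

    import numpy as np

    def white_test(resid, z):
        u = resid**2                          # squared residuals, not scaled by sigma^2
        b = np.sum((z - z.mean()) * (u - u.mean())) / np.sum((z - z.mean())**2)
        a = u.mean() - b * z.mean()
        r2 = np.sum((a + b * z - u.mean())**2) / np.sum((u - u.mean())**2)
        return len(u) * r2                    # N R^2, compare to a Chi-Square with 1 degree of freedom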

214 Note that since the only difference at this stage from the Breusch-Pagan test was not dividing each ^εi2 by 745 = σ2, the R2 values are the same.
215 Assuming the variances are related to a set of p independent variables, then NR2 has a Chi-Square Distribution, with p degrees of freedom.
216 The critical value at 2.5% is 5.02. Since 3.88 < 5.02, we do not reject the null hypothesis at 2.5%.


Problems:

27.1 (2 points) You are given 75 observations, which you have ordered based on some variable which is believed to be related to the size of the variances of the errors. A four variable linear regression was fit to the first 30 observations, with total sum of squares of 1313 and R2 = .985. A four variable linear regression was fit to the last 30 observations, with total sum of squares of 1696 and R2 = .980. What conclusion do you draw with respect to heteroscedasticity?

27.2 (1 point) Which of the following statements about tests for heteroscedasticity are false?
A. The null hypothesis is that there is heteroscedasticity.
B. In the White Test, the test statistic is the number of observations times R2 for a regression.
C. In the Breusch-Pagan Test, the test statistic is a Regression Sum of Squares divided by 2.
D. In the Goldfeld-Quandt Test, the test statistic involves a ratio of Error Sum of Squares.
E. None of A, B, C, or D is false.

Use the following 10 observations for the next 4 questions:
X    1    5    6    8    9    13    14    18    23    25
Y    3    5    9    11   13   16    16    23    30    27

27.3 (4 points) Fit a two variable regression model. What is the fitted value of Y, corresponding to X = 23?
(A) Less than 27
(B) At least 27, but less than 28
(C) At least 28, but less than 29
(D) At least 29, but less than 30
(E) At least 31

27.4 (6 points) Apply the Breusch-Pagan Test for heteroscedasticity, assuming the variance of the errors is related to X. At which level do you reject the null hypothesis?
(A) 5%  (B) 2.5%  (C) 1%  (D) 0.5%  (E) None of A, B, C, or D

27.5 (5 points) Apply the White Test for heteroscedasticity, assuming the variance of the errors is related to X. At which level do you reject the null hypothesis?
(A) 5%  (B) 2.5%  (C) 1%  (D) 0.5%  (E) None of A, B, C, or D

27.6 (6 points) Calculate the F statistic used in the Goldfeld-Quandt test for heteroscedasticity. Omit the middle 1/5 of observations.
(A) 1.0  (B) 1.5  (C) 2.0  (D) 2.5  (E) 3.0


Use the following information for the next 3 questions:
The following 20 observations have been fit to a regression with the result:
-44.0947 + 11.5339x - 0.0637749x2.
Xi      Yi       Residual
7.7     47       6.06484
7.9     43.2     0.157038
8.2     45.1     -1.0951
9.6     52.7     -8.0533
9.9     54.7     -9.14039
10      61.3     -3.56687
10.2    85       18.0840
11.2    68       -9.08513
11.5    103.8    23.6890
12      73.3     -11.8286
12.3    74.2     -13.9239
12.4    75.2     -13.9197
13.2    97.5     0.459264
13.5    70.7     -29.2901
13.8    144.3    41.3721
14.5    148      38.2617
15.2    86.8     -29.6861
16.3    158.1    31.1364
17.6    87.6     -51.5472
18.7    171.2    21.9121

27.7 (6 points) Perform the Goldfeld-Quandt test for heteroscedasticity. Omit the middle 1/5 of observations.

27.8 (6 points) Perform the White test for heteroscedasticity.

27.9 (6 points) Perform the Breusch-Pagan test for heteroscedasticity.

27.10 (1 point) Match the tests for heteroscedasticity with the distribution for their test statistic.
1. Goldfeld-Quandt Test       f. F Distribution
2. Breusch-Pagan Test         t. t-Distribution
3. White Test                 x. Chi-Square Distribution
A. 1f, 2t, 3x
B. 1f, 2x, 3t
C. 1t, 2x, 3f
D. 1x, 2f, 3t
E. None of A, B, C, or D


27.11 (Course 120 Sample Exam #1, Q.14) (2 points) You are given the following:
Group    Xi      Yi
1        5.0     1.0
1        5.0     2.0
1        5.0     2.0
2        10.0    3.0
2        10.0    3.2
2        10.0    3.5
3        15.0    4.0
3        15.0    4.2
3        15.0    4.6
4        20.0    4.6
4        20.0    5.0
4        20.0    5.8

You are to test for heteroscedasticity in errors between the first two groups and the second two groups, assuming all groups are to be included in the calculation.
A linear regression was fit to the first two groups with result: ^Yi = 0.10 + .313Xi, with RSS = 3.68 and ESS = .79.
A linear regression was fit to the second two groups with result: ^Yi = 1.67 + .173Xi, with RSS = 1.13 and ESS = .93.
Calculate the F statistic used in the Goldfeld-Quandt test.
(A) 0.3  (B) 0.7  (C) 1.2  (D) 1.7  (E) 2.2

27.12 (VEE-Applied Statistics Exam, 8/05, Q.11) (2.5 points) You fit the model Yi = α + βXi + εi to a data set with N observations. You test the null hypothesis that the error terms are homoscedastic against the alternative hypothesis that Var(εi) = σi2 = γ + δXi + ηXi2. Which of the following statements is false?
(A) A valid test is done by running the model ^εi2 = γ + δXi + ηXi2 + νi on the residuals and referring the resulting value of NR2 to a chi-square distribution with 2 degrees of freedom.
(B) A valid test is done by running the model ^εi2 = γ + δXi + ηXi2 + νi on the residuals and referring the resulting value of RSS/2 to a chi-square distribution with 2 degrees of freedom.
(C) A valid test is done by running the model ^εi2/σ2 = γ + δXi + ηXi2 + νi on the residuals and referring the resulting value of RSS/2 to a chi-square distribution with 2 degrees of freedom.
(D) If ^β is the ordinary least-squares estimator of β and the alternative hypothesis is true, Var(^β) = Σxi2σi2/(Σxi2)2.
(E) If γ = δ = 0, the procedure for testing homoscedasticity developed by Goldfeld and Quandt can be applied.


Section 28, Correcting for Heteroscedasticity

Exercise: A random variable is divided by its standard deviation. What is the variance of the new variable that results?

[Solution: Let Var[X] = σ2. Then Var[X/σ] = Var[X]/ σ2 = σ2/ σ2 = 1.]

Thus if we divide any variable by its standard deviation, we can get a new variable with variance of 1.

Model with No Intercept:

We will first consider how we would correct for the presence of heteroscedasticity in the case of a simple model with one variable and no intercept.

Suppose the variance of ε1 is 4 and the variance of ε2 is 36. Then if these were the first two

error terms of a regression, we transform to ε1/2 and ε2 /6, in order to get variables with variance 1. In this manner we have transformed a situation with differing variances of the error terms, into one where the errors have equal variances. Of course to preserve the model we need to also divide X1 and Y1 by 2, and X2 and Y2 by 6.

Assume Yi = βXi + εi, with variance of ε1 = 4, variance of ε2 = 36, and variance of ε3 = 64.

Then we would revise the model to:
Y1/2 = βX1/2 + ε1/2, Y2/6 = βX2/6 + ε2/6, and Y3/8 = βX3/8 + ε3/8.

The revised model is equivalent to the original model; it has the same slope β.However, the adjusted model is homoscedastic; the errors each have a variance of 1.

Exercise: For the above situation, if X = 10, 36, 104 and Y = 20, 78, 168, what are the estimates of β, prior to and after making the above adjustment.

[Solution: Prior to adjustment: ΣXi2 = 12212. ΣXiYi = 20480. ^β = 20480/12212 = 1.68.

After adjusting, X = 10/2, 36/6, 104/8 = 5, 6, 13 and Y = 20/2, 78/6, 168/8 = 10, 13, 21.

ΣXi2 = 230. ΣXiYi = 401. ^β = 401/230 = 1.74.]

Prior to adjustment, the estimate of the slope is 1.68. However, there was heteroscedasticity. As discussed previously, therefore while this estimate is unbiased and consistent, it is not efficient. It does not have the smallest expected squared error among unbiased linear estimators.

After adjustment, the estimate of the slope is 1.74. The adjustment has removed the heteroscedasticity. The assumptions behind ordinary least squares hold for the adjusted model, and therefore this estimate is unbiased, consistent, and efficient. It has the smallest expected squared error among unbiased linear estimators. The estimate from the adjusted model is better.


Note that in terms of the original X and Y, the estimate of the slope after the adjustment to correct for heteroscedasticity is:

^β = {(10/2)(20/2) + (36/6)(78/6) + (104/8)(168/8)} / {(10/2)2 + (36/6)2 + (104/8)2} = Σ(Xi/σi)(Yi/σi) / Σ(Xi/σi)2 = ΣwiXiYi / ΣwiXi2, where wi = (1/σi2)/Σ(1/σi2).

In this case, the weights wi are: (1/4, 1/36, 1/64)/(1/4 + 1/36 + 1/64) = (144, 16, 9)/169 = .852, .095, .053. It is as if we count the first observation more heavily and the last observation less heavily. This is an example of a weighted regression, as discussed in a previous section.

In a weighted regression we weight some of the observations more heavily than others. Equivalently we pretend as if we have different numbers of repeated copies of the actual observations.

Exercise: You have 169 observations. For 144 of them X = 10 and Y = 20, for 16 of them X = 36 and Y = 78, and for the last 9 of them X = 104 and Y = 168. Fit the model Yi = βXi + εi, by ordinary least squares regression.

[Solution: ΣXi2 = (144)(102) + (16)(362) + (9)(1042) = 132480.

ΣXiYi = (144)(10)(20) + (16)(36)(78) + (9)(104)(168) = 230976. ^β = 230976/132480 = 1.74.]

This is the same result as for the model adjusted to correct for the effects of heteroscedasticity. The chief use of weighted regressions on this exam is to correct for the presence of heteroscedasticity.

In order to adjust for heteroscedasticity, we use a weighted regression, in which we weight each observation by wi, with wi proportional to 1/σi2, the inverse of the variance of the error εi.

For the model with no intercept:
^β = Σ(Xi/σi)(Yi/σi) / Σ(Xi/σi)2 = ΣwiXiYi / ΣwiXi2, where wi = (1/σi2)/Σ(1/σi2).

Sometimes one is given Var(εi) = σi2 as a function of an independent variable.

Exercise: You are given that Var(εi) = σ2Xi and the εi’s are uncorrelated.

You fit the regression model Yi = βXi + εi.

Determine the weighted least squares estimate of β.


[Solution: Adjust each variable by dividing by StdDev[εi] = σ√Xi.
Yi/(σ√Xi) = βXi/(σ√Xi) + εi/(σ√Xi), or equivalently Yi/√Xi = β√Xi + εi/√Xi. The errors now have constant variance, and the least squares solution is: ^β = Σ(√Xi)(Yi/√Xi) / Σ(√Xi)2 = ΣYi/ΣXi.
Alternately, wi = (1/σ2Xi)/Σ(1/σ2Xi) = (1/Xi)/Σ(1/Xi).
^β = ΣwiXiYi / ΣwiXi2 = {ΣYi/Σ(1/Xi)} / {ΣXi/Σ(1/Xi)} = ΣYi/ΣXi.]

σ2, the proportionality constant in the formula for Var(εi), had no effect on the estimate of the

slope. In this example, we only needed to know that Var(εi) was proportional to Xi. In general,

we only need to know the relationship between σi2 and Xi up to a proportionality constant. We

are only interested in the relative sizes of σi2.

In the above exercise, with the variance of εi proportional to Xi, the Yi associated with large Xi had a larger variance. Therefore, it is harder to estimate the Yi associated with large Xi with the same accuracy as those Yi associated with small Xi. We are less concerned with a given size error in predicting Y when X is large, than the same size error when X is small. Therefore, it makes sense to weight those squared errors associated with large X less heavily in a sum of squared errors.

This is exactly what is done in the weighted regression as used to correct for heteroscedasticity. In the weighted sum of squares, each term is weighted inversely to its variance. By dividing by each variance, we have standardized the squared errors so they are on a comparable scale and can be usefully added up.

Two Variable Model:

Let’s apply these ideas to the two variable model: Yi = α + βXi + εi. If we believe there is heteroscedasticity, we adjust the original model: Yi/σi = α/σi + βXi/σi + εi/σi. This adjusted model is homoscedastic.

The sum of squared errors for the adjusted model is:
Σ{Yi/σi - (α/σi + βXi/σi)}2 = Σ(Yi - α - βXi)2/σi2.

We minimize this sum of squared errors by setting the partial derivatives with respect to α and β equal to zero.

0 = Σ(Yi - α - βXi)/σi2. ⇒ αΣ(1/σi2) = ΣYi/σi2 - βΣXi/σi2.

⇒ ^α = ΣwiYi - ^βΣwiXi, where wi = (1/σi2)/Σ(1/σj2).


0 = Σ(Yi - α - βXi)Xi/σi2. ⇒ 0 = ΣXiYi/σi2 - αΣXi/σi2 - βΣXi2/σi2. ⇒

0 = ΣwiXiYi - ^αΣwiXi - ^βΣwiXi2, where wi = (1/σi2)/Σ(1/σj2).

Substituting ^α into the second equation, we can solve for ^β:

^β = {ΣwiXiYi - ΣwiXiΣwiYi} / {ΣwiXi2 - (ΣwiXi)2}, where wi = (1/σi2)/Σ(1/σj2).

^α = ΣwiYi - ^βΣwiXi.

These are the equations for weighted regression, when Σwi = 1, discussed in a previous section.

Exercise: Fit a weighted regression to the following data:
Xi    Yi    Var[εi]
3     10    3
7     6     2
20    4     1
[Solution: wi = (1/σi2)/Σ(1/σi2) = (1/3, 1/2, 1)/(1/3 + 1/2 + 1) = 2/11, 3/11, 6/11.
^β = {ΣwiXiYi - ΣwiXiΣwiYi} / {ΣwiXi2 - (ΣwiXi)2} = {60.545 - (13.364)(5.6364)} / {233.18 - (13.364)2} = -.271.
^α = ΣwiYi - ^βΣwiXi = 5.6364 - (-.271)(13.364) = 9.26.
The weighted regression is: ^Yi = 9.26 - .271Xi.]

Deviations Form:

As discussed in a previous section, one could instead use the equations in deviations form:

xi = Xi - ΣwiXi
yi = Yi - ΣwiYi

^β = Σwixiyi / Σwixi2.

^α = ΣwiYi - ^βΣwiXi.

In order to correct for heteroscedasticity, wi = (1/σi2) / Σ(1/σj2).
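A minimal Python sketch of this weighted regression in deviations form, reproducing the exercise above (numpy assumed):

    import numpy as np

    X   = np.array([3., 7., 20.])
    Y   = np.array([10., 6., 4.])
    var = np.array([3., 2., 1.])              # Var[eps_i] from the exercise

    w = (1 / var) / np.sum(1 / var)           # weights sum to 1
    x = X - np.sum(w * X)                     # deviations from the weighted mean of X
    y = Y - np.sum(w * Y)                     # deviations from the weighted mean of Y
    beta  = np.sum(w * x * y) / np.sum(w * x**2)
    alpha = np.sum(w * Y) - beta * np.sum(w * X)
    # beta is about -.271 and alpha about 9.26, matching the fitted line Y = 9.26 - .271 X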


Adjusting the Town Example for Heteroscedasticity:

For the example of frequencies by town, there was heteroscedasticity, with the variance of the errors increasing with exposures.

As one example, it was assumed that ^εi2 = a + b/(exposures for town i) + error term.

Running this regression resulted in: intercept = 453, and slope = 4.94 million. Based on this regression, let us assume that:
σi2 = 450 + 5 million/(exposures for town i).

The original model is: Yi = α + βXi + εi

The adjusted model is: Yi/σi = α/σi + βXi/σi + εi/σi.

The weights in the weighted regressions are: wi = (1/σi2)/ Σ(1/σj2).

The smallest town with 4092 exposures, has w1 = .004954, a medium sized town with 23,800 exposures has w40 = 0.012548, while the largest town with 211,189 exposures, has w80 = .0174858. It is not unreasonable that one is giving more weight to the data from larger towns.

^β = {ΣwiXiYi - ΣwiXiΣwiYi} / {ΣwiXi2 - (ΣwiXi)2} = {5.2434 x 10^7 - (67003)(528.33)} / {7.8922 x 10^9 - (67003)2} = .005006.

^α = ΣwiYi - ^βΣwiXi = 528.33 - (.005006)(67003) = 192.9.

The results of the weighted regression are:

^α = 192.9 and ^β = .00501. R2 = .9923. R̄2 = .9922. s2 = 1.022.

sα = 4.43. t-statistic for the intercept is 43.5. sβ = .0000499. t-statistic for the slope is 100.

Durbin-Watson Statistic: 1.72.

Covariance Matrix =
( 19.64         -0.000167      )
( -0.000167     .00000000249   )

The weighted regression is: ^Yi = 192.9 + .00501Xi. This is only slightly different than the unweighted regression gotten previously, ^Yi = 194.4 + .00499Xi.

Here is a comparison of the two models for three towns:
Town    Exposures    Unweighted Regression    Weighted Regression
1       4092         214.8                    213.4
40      23,800       313.2                    312.1
80      211,189      1248.2                   1251.0


For example, for the smallest town, the fitted claim frequencies per exposure (rather than per 10000 exposures) are: 2.148% and 2.134%. So while there are small differences, in this case there is no practical difference between the weighted and unweighted regressions.
Here is a plot of this weighted regression line versus the data:217

[Graph: the weighted regression line plotted against the data, with frequency per exposure (0.02 to 0.12) on the vertical axis and exposures (up to about 200,000) on the horizontal axis.]

217 Frequencies are shown as number of claims per exposure, rather than per 10,000 exposures.


Heteroscedasticity-Consistent Estimators:218

Heteroscedasticity-consistent estimators (HCE), provide unbiased, and consistent estimators of variances of estimated parameters, when heteroscedasticity is present.

Exercise: Fit the linear regression model Yi = α + βXi + εi to the following data:
Y    1    2    6    11
X    0    3    5    8

Estimate Var[^β].

[Solution: X̄ = (0 + 3 + 5 + 8)/4 = 4. x = (-4, -1, 1, 4). Ȳ = (1 + 2 + 6 + 11)/4 = 5. y = (-4, -3, 1, 6).
^β = Σxiyi/Σxi2 = 44/34 = 22/17. ^α = Ȳ - ^βX̄ = 5 - (22/17)(4) = -3/17.
^Yi = ^α + ^βXi = (-3/17, 63/17, 107/17, 173/17).
^εi = Yi - ^Yi = (20/17, -29/17, -5/17, 14/17).
s2 = Σ^εi2/(N - 2) = 5.0588/2 = 2.529. Var[^β] = s2/Σxi2 = 2.529/34 = .0744.]

Heteroscedasticity-consistent estimators are based on the more general equation:219
Var[^β] = Σxi2σi2 / (Σxi2)2, using ^εi to estimate σi.

Var[^β] ≅ Σxi2^εi2 / (Σxi2)2.

Exercise: In the previous exercise, determine the heteroscedasticity-consistent estimator of Var[^β].
[Solution: Var[^β] = Σxi2^εi2 / (Σxi2)2 =
{(-4)2(20/17)2 + (-1)2(-29/17)2 + (1)2(-5/17)2 + (4)2(14/17)2} / {(-4)2 + (-1)2 + (1)2 + (4)2}2
= 35.993/(34)2 = .0311.
Comment: This differs from the previous estimate of Var[^β], which assumed homoscedasticity.
Unlike in the use of weighted regression, here there was no need to specify how the variance of εi depends on Xi.]
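A minimal Python sketch reproducing both variance estimates from these two exercises (numpy assumed):

    import numpy as np

    X = np.array([0., 3., 5., 8.])
    Y = np.array([1., 2., 6., 11.])

    x = X - X.mean()
    beta  = np.sum(x * (Y - Y.mean())) / np.sum(x**2)    # 22/17
    alpha = Y.mean() - beta * X.mean()                   # -3/17
    resid = Y - (alpha + beta * X)

    var_usual = np.sum(resid**2) / (len(Y) - 2) / np.sum(x**2)   # about .0744, assumes homoscedasticity
    var_hce   = np.sum(x**2 * resid**2) / np.sum(x**2)**2        # about .0311, heteroscedasticity-consistent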

Heteroscedasticity-consistent estimators are not efficient. Efficient estimators of variances of estimated parameters are obtained via weighted regression.

Heteroscedasticity-consistent estimators are consistent in the presence of heteroscedasticity of unknown form. In order to apply weighted regression to correct for heteroscedasticity, one must know or estimate how the variances vary across the observations.
218 See page 152 of Pindyck and Rubinfeld.
219 See Equation 6.3 in Pindyck and Rubinfeld.


Applying Heteroscedasticity-Consistent Estimators to the Town Example:*

The simplest and most commonly used HCE is in matrix form:220

Var[^β] ≅ (X’X)^-1 X’ Diag[^εi2] X (X’X)^-1, where X is the design matrix.

In the two-variable case, this is equivalent to:

Var[^α] ≅ {Var[X]2 Σ^εi2 + X̄2 Σxi2^εi2 - 2 Var[X] X̄ Σxi^εi2} / (Σxi2)2.

Var[^β] ≅ Σxi2^εi2 / (Σxi2)2.

Cov[^α, ^β] ≅ {Var[X] Σxi^εi2 - X̄ Σxi2^εi2} / (Σxi2)2.

In the town example, this heteroscedasticity-consistent estimator yields the following covariance matrix:
( 20.71        -.000164       )
( -.000164     .00000000212   )

This is fairly close to the covariance matrix from the weighted regression performed above:
( 19.64        -0.000167      )
( -0.000167    .00000000249   )

As well as the covariance matrix from the unweighted regression:
( 18.36        -0.000166      )
( -0.000166    .00000000313   )

An HCE recommended for samples of size less than 250 is in matrix form:221

Var[^β] ≅ (X’X)^-1 X’ Diag[^εi2/(1 - hii)2] X (X’X)^-1,
where hii are the diagonal elements of the “hat matrix”, H = X(X’X)^-1X’.
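A minimal numpy sketch of this sandwich form (the design matrix X is assumed to include a column of ones for the intercept; the small-sample hii adjustment is noted in a comment):

    import numpy as np

    def hce_covariance(X, resid):
        # (X'X)^-1 X' Diag[resid^2] X (X'X)^-1
        # for the small-sample version, replace resid**2 by resid**2 / (1 - h_ii)**2,
        # where h_ii is the diagonal of the hat matrix X (X'X)^-1 X'
        XtX_inv = np.linalg.inv(X.T @ X)
        meat = X.T @ np.diag(resid**2) @ X
        return XtX_inv @ meat @ XtX_inv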

220 Pindyck and Rubinfeld does not contain formulas for HCEs. See “A heteroscedastic-consistent covariance matrix estimator and a direct test of heteroscedasticity,” by H. White, Econometrica, 48 (1980).
221 See “Some heteroscedasticity consistent covariance matrix estimators with improved finite sample properties,” by J.G. MacKinnon and H. White, Journal of Econometrics, 29 (1985).


Problems:

Use the following information for the next two questions:
Yi = α + βXi + εi. Var(εi) = Xi/10.
i    Xi     Yi
1    10     10
2    40     40
3    160    100
4    250    125

28.1 (3 points) Determine the weighted least squares estimate of β.
(A) Less than 0.3
(B) At least 0.3, but less than 0.4
(C) At least 0.4, but less than 0.5
(D) At least 0.5, but less than 0.6
(E) At least 0.6

28.2 (2 points) Determine the weighted least squares estimate of α.
(A) Less than 5
(B) At least 5, but less than 6
(C) At least 6, but less than 7
(D) At least 7, but less than 8
(E) At least 8

Use the following information for the next two questions:
Ten independent loss ratios Y1, Y2, ..., Y10 are described by the model Yt = α + εt.
Y1 + Y2 + Y3 = 225. Y4 + Y5 + Y6 + Y7 = 290. Y8 + Y9 + Y10 = 205.

28.3 (1 point) Determine the ordinary least squares estimator of α.
(A) 70.5  (B) 71.0  (C) 71.5  (D) 72.0  (E) 72.5

28.4 (3 points) Var(εt) = 1/3, t = 1, 2, 3; Var(εt) = 1/5, t = 4, 5, 6, 7; Var(εt) = 1/8, t = 8, 9, 10.
Determine the weighted least squares estimator of α.
(A) 70.5  (B) 71.0  (C) 71.5  (D) 72.0  (E) 72.5

Use the following 15 observations for the next two questions:
X:  10  10  10  10  10  15  15  15  15  15  20  20  20  20  20
Y:  15  23  11  14  18  19  29  20  35  24  26  48  27  38  39

28.5 (3 points) Fit the ordinary least squares model, Yi = α + βXi + εi.

28.6 (6 points) Fit the weighted least squares model, Yi = α + βXi + εi, assuming Var(εi) is

proportional to Xi2.


28.7 (4 points) You fit the linear regression model Yi = βXi + εi to the following data:
Y    3    9    14
X    1    4    10
Determine the heteroscedasticity-consistent estimator of Var[^β].
(A) .013  (B) .014  (C) .015  (D) .016  (E) .017

28.8 (2 points) You fit the model Y = α + βX + ε.
The error variance is inversely proportional to X.
Which of the following models corrects for this form of heteroscedasticity?
(A) YX^(1/4) = αX^(1/4) + βX^(5/4) + ε∗
(B) YX^(1/4) = α + βX^(5/4) + ε∗
(C) YX^(1/2) = αX^(1/2) + βX^(3/2) + ε∗
(D) YX^(-1/4) = αX^(-1/4) + βX^(3/4) + ε∗
(E) YX^(-1/2) = αX^(-1/2) + βX^(1/2) + ε∗

28.9 (Course 120 Sample Exam #1, Q.10) (2 points) You fit the regression model Yi = βXi + εi to the following data:
Y    3    8    15
X    1    4    9
You are given that Var(εi) = σ2Xi and the εi’s are uncorrelated.
Determine the weighted least squares estimate of β.
(A) 1.68  (B) 1.70  (C) 1.73  (D) 1.86  (E) 2.22

28.10 (Course 120 Sample Exam #3, Q.9) (2 points) You fit the model Yi = βXi + εi to the following observations:
X    1    2    3    4    5
Y    8    12   24   36   55
Determine ^β, the least squares estimate of β, when the error variance is proportional to X2.
(A) 8.4  (B) 9.0  (C) 9.5  (D) 9.8  (E) 10.1

28.11 (4, 11/00, Q.31) (2.5 points) You are given:
(i) yi = βxi + εi
    Var(εi) = (xi/2)2
(ii) i    xi    yi
     1    1     8
     2    2     5
     3    3     3
     4    4     -4
Determine the weighted least squares estimate of β.
(A) 0.4  (B) 0.9  (C) 1.4  (D) 2.0  (E) 2.6


28.12 (4, 5/01, Q.21) (2.5 points) Twenty independent loss ratios Y1, Y2, ..., Y20 are described by the model Yt = α + εt,
where: Var(εt) = 0.4, t = 1, 2, ..., 8, and Var(εt) = 0.6, t = 9, 10, ..., 20.
You are given:
Ȳ1 = (Y1 + Y2 + ... + Y8)/8
Ȳ2 = (Y9 + Y10 + ... + Y20)/12.
Determine the weighted least squares estimator of α in terms of Ȳ1 and Ȳ2.
(A) 0.3 Ȳ1 + 0.7 Ȳ2
(B) 0.4 Ȳ1 + 0.6 Ȳ2
(C) 0.5 Ȳ1 + 0.5 Ȳ2
(D) 0.6 Ȳ1 + 0.4 Ȳ2
(E) 0.7 Ȳ1 + 0.3 Ȳ2

28.13 (4, 11/01, Q.28) (2.5 points) You fit the model Y = α + βX + ε.
The error variance is proportional to X^(-1/2).
Which of the following models corrects for this form of heteroscedasticity?
(A) YX^(1/4) = αX^(1/4) + βX^(5/4) + ε∗
(B) YX^(1/4) = α + βX^(5/4) + ε∗
(C) YX^(1/2) = αX^(1/2) + βX^(3/2) + ε∗
(D) YX^(-1/4) = αX^(-1/4) + βX^(3/4) + ε∗
(E) YX^(-1/2) = αX^(-1/2) + βX^(1/2) + ε∗

28.14 (4, 11/04, Q.23) (2.5 points) The model Yi = βXi + εi is fitted to the following observations:
X      Y
1.0    1.0
4.5    9.0
7.0    20.1
You are given: Var(εi) = σ2Xi.
Determine the weighted least-squares estimate of β.
(A) Less than 2.5
(B) At least 2.5, but less than 2.8
(C) At least 2.8, but less than 3.1
(D) At least 3.1, but less than 3.4
(E) At least 3.4


Mahler’s Guide to Regression

Sections 29-32:
29 Serial Correlation
30 Durbin-Watson Statistic
31 Correcting for Serial Correlation
32 Multicollinearity

VEE-Applied Statistical Methods Exam
prepared by Howard C. Mahler, FCAS. Copyright 2006 by Howard C. Mahler.
Study Aid F06-Reg-H


Section 29, Serial Correlation

Assume one has the following time series, Y1, Y2, ... , Y20: 547, 628, 778, 759, 823, 772, 904, 971, 974, 916, 1043, 932, 959, 1077, 998, 1003, 1048, 1371, 1414, 1568.

Exercise: Fit a linear regression to this time series.
[Solution: X = 1, 2, ..., 20. X̄ = 10.5. Σxi2 = 665. Ȳ = 974.25. Σxiyi = 25,289.5.
^β = Σxiyi/Σxi2 = 25289.5/665 = 38.0. ^α = Ȳ - ^βX̄ = 575.]

Exercise: What are the residuals, ^εt, for this regression?
[Solution: The fitted ^Yt are: 613, 651, 689, 727, 765, 803, 841, 879, 917, 955, 993, 1031, 1069, 1107, 1145, 1183, 1221, 1259, 1297, 1335.
^εt = Yt - ^Yt = -66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110, -30, -147, -180, -173, 112, 117, 233.]

Here is a graph of these residuals:

[Graph: the 20 residuals plotted against time (1 to 20), ranging from about -180 to +233.]

There seems to be some tendency for a positive residual to follow a positive residual, and a negative residual to follow a negative residual.


This apparent correlation is also shown in a graph of ^εt-1 versus ^εt:

There are more points in the lower left and upper right quadrants. Such a lower-left and upper-right pattern indicates positive serial correlation. On the other hand, a lower-right and upper-left pattern would indicate negative serial correlation.

The visual impression of the above graph of ^εt-1 versus ^εt can be quantified by calculating the sample correlation between ^εt-1 and ^εt, which is positive in this example.

The average of the residuals for 1 to 19 is: -12.00.
The average of the residuals for 2 to 20 is: 3.74.

The sample correlation between ^εt-1 and ^εt is:222

Σ(^εt-1 + 12)(^εt - 3.74) / √{Σ(^εt-1 + 12)2 Σ(^εt - 3.74)2} = 100,382/√{(168,094)(220,498)} = .52,

where each sum runs from t = 2 to 20.

The residuals appear to be positively correlated with their value one time period earlier.223

222 See for example, equation 2.9 in Econometric Models and Economic Forecasts.
223 The data for this time series was in fact simulated from a model with positive serial correlation with ρ = .5.


First Order Serial Correlation:

One of the assumptions of the ordinary least squares regression was that the errors were independent. If they were independent the correlation should be zero. It appears that in this case this assumption is violated. Rather we seem to have positive serial correlation.224

Let ρ = Corr[εt-1, εt]. If ρ > 0, then we have positive (first order) serial correlation.225 If ρ < 0, then we have negative (first order) serial correlation. Positive serial correlation is common in time series.

ρ > 0 ⇔ successive residuals tend to be alike.
ρ < 0 ⇔ successive residuals tend to be unalike.
ρ = 0 ⇔ successive residuals are approximately independent.226

Model of First Order Serial Correlation:

εt = ρεt-1 + νt,
where |ρ| ≤ 1, εt is Normal with mean zero and standard deviation σε, and νt is Normal with mean zero and standard deviation σν, with νt independent of εt-1.227

σε2 = Var[εt] = Var[ρεt-1 + νt] = ρ2Var[εt-1] + Var[νt] = ρ2σε2 + σν2.228

⇒ Var[εt] = σε2 = σν2/(1 - ρ2).229

Cov[εt-1, εt] = E[εt-1εt] = E[εt-1(ρεt-1 + νt)] = E[ρεt-12 + εt-1νt] = ρE[εt-12] + E[εt-1νt]
= ρVar[εt-1] + ρE[εt-1]2 + E[εt-1]E[νt] = ρσε2.230

⇒ Corr[εt-1, εt] = Cov[εt-1, εt]/√(Var[εt-1]Var[εt]) = ρσε2/σε2 = ρ.

Thus ρ is indeed the correlation between successive errors.

Similarly, Corr[εt-2, εt] = ρ2, Corr[εt-3, εt] = ρ3, and Corr[εt-d, εt] = ρd.231

224 Due to random fluctuation in this relatively small sample, the calculated simple correlation is not necessarily a good indication of the underlying correlation of errors.
225 Serial correlation is discussed more extensively with respect to time series.
226 Even if the underlying errors are independent, the residuals are not independent of each other, because they sum to zero.
227 See Equation 6.12 in Pindyck and Rubinfeld.
228 Where we have assumed homoscedasticity. Var[εt-1] = Var[εt].
229 See Equation 6.13 in Pindyck and Rubinfeld.
230 See Equation 6.14 in Pindyck and Rubinfeld. We have used the fact that νt and εt-1 are independent, and each have mean of zero.
231 See Equations 6.15 and 6.16 in Pindyck and Rubinfeld.


Effects of Positive Serial Correlation on Ordinary Regression Estimators:

1. Still unbiased.
2. Still consistent.
3. No longer efficient.232
4. Standard error of regression is biased downwards.
5. Overestimate precision of estimates of model coefficients.
6. Some tendency to reject H0: β = 0, when one should not.
7. R2 is biased upwards.

Exercise: For the regression done above what are R2 and R̄2?
[Solution: RSS ≡ Σ(^Yi - Ȳ)2 = 960,261. TSS = Σ(Yi - Ȳ)2 = 1,186,860.
R2 = RSS/TSS = .81. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (.19)(19)/18 = .20. R̄2 = .80.]

These estimates of the goodness of fit are biased upwards due to the positive serial correlation. The fitted regression is likely to explain less than 80% of the actual underlying variation.

Exercise: For the regression done above what are the t-statistics?

[Solution: s2 = Σ^εi2/(N - 2) = 225,119/18 = 12,507.
Var[^α] = s2ΣXi2/(NΣxi2) = (12507)(2870)/{(20)(665)} = 2699.
sα = √2699 = 51.95. In order to test the hypothesis that α = 0, t = ^α/sα = 575/51.95 = 11.1.
Var[^β] = s2/Σxi2 = 12507/665 = 18.81. sβ = √18.81 = 4.34.
In order to test the hypothesis that β = 0, t = ^β/sβ = 38.0/4.34 = 8.8.]

Based on these very large t-statistics one would ordinarily conclude that each of the coefficients is significantly different than zero. However, the standard error of the regression is biased downwards, due to the positive serial correlation. Therefore, the absolute values of the estimated t-statistics are likely to be too large. One can not trust any conclusions one might draw from these calculated t-statistics.

The statistic commonly used to test for the presence of serial correlation is the Durbin-Watson statistic. After the Durbin-Watson statistic is discussed, methods of correcting for serial correlation will be discussed.

232 No longer does ordinary least squares have the smallest variance of unbiased estimators. This property depended on an assumption that the errors were independent and had equal variance.


Here are some examples of simulated series of εt , with time on the horizontal axis:

[Three graphs of simulated series of εt plotted against time, for ρ = 0.7, ρ = 0, and ρ = -0.7.]


and here are the corresponding plots of ^εt-1 versus ^εt:

[Three scatter plots of ^εt-1 versus ^εt, for ρ = 0.7, ρ = 0.0, and ρ = -0.7.]


Simulating Serial Correlation:*

Assume for example we wish to simulate the model: Yi = 22 + 7Xi + εi.

We are given the values for a series of observations of X, and want to simulate a corresponding series of values of Y.

It is assumed that εi is Normally Distributed with mean zero.

If the model has Var[εi] = 100, with no serial correlation, then we simulate this model as follows:233

Yi = 22 + 7Xi + 10(Random Standard Normal Distribution),
where for each Yi we simulate a new independent random draw from a Normal Distribution with mean zero and standard deviation 1.

Assume instead that the model has first order serial correlation with for example ρ = .6, and as before Var[εi] = 100. Then we simulate this model as follows:

εi = .6εi-1 + 8(Random Standard Normal Distribution),

Yi = 22 + 7Xi + εi,

where for each εi we simulate a new independent random draw from a Normal Distribution with mean zero and standard deviation 1. In order to initialize, we let ε0 = 10(Random Standard Normal Distribution).

Exercise: Let -0.849, 0.931, 1.988, and -1.253, be four independent random draws from a Standard Normal Distribution. X1 = 1.7, X2 = 3.1, and X3 = 4.5. Simulate Y1, Y2, and Y3.

[Solution: ε0 = 10(-.849) = -8.49. ε1 = (.6)(-8.49) + (8)(.931) = 2.35.

ε2 = (.6)(2.35) + (8)(1.988) = 17.31. ε3 = (.6)(17.31) + (8)(-1.253) = .36.Y1 = 22 + (7)(1.7) + 2.35 = 36.25. Y2 = 22 + (7)(3.1) + 17.31 = 61.01.Y3 = 22 + (7)(4.5) + .36 = 53.86.]

In general, εi = ρεi-1 + (Random Standard Normal Distribution)σ√(1 - ρ2), where σ2 is the

variance of ε.234 Initially we let ε0 = σ(Random Standard Normal Distribution).

Exercise: If εi = .6εi-1 + 8(Random Standard Normal Distribution), what is Var[εi]?

[Solution: Var[εi] = (.6)2Var[εi-1] + (8)2(1) = .36Var[εi] + 64. ⇒ Var[εi] = 64/.64 = 100.]

Thus in the example above, we do in fact have Var[εi] = 100 as desired.
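A minimal Python sketch of this serially correlated simulation (numpy assumed; the seed is arbitrary and purely illustrative):

    import numpy as np

    rng = np.random.default_rng(7)            # arbitrary seed, illustrative only
    rho, sigma = 0.6, 10.0                    # Var[eps] = sigma^2 = 100
    X = np.array([1.7, 3.1, 4.5])

    eps = sigma * rng.standard_normal()       # initialize eps_0
    Y = []
    for x in X:
        # eps_i = rho * eps_{i-1} + sigma * sqrt(1 - rho^2) * (standard normal draw)
        eps = rho * eps + sigma * np.sqrt(1 - rho**2) * rng.standard_normal()
        Y.append(22 + 7 * x + eps)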

233 Here one is not interested in the details of how the computer simulates a random draw from a Normal Distribution. How to simulate a Normal Distribution is explained for example in Simulation by Ross.
234 See Equation 6.12 in Econometric Models and Economic Forecasts by Pindyck and Rubinfeld.


Problems:

29.1 (1 point) Which of the following is not an effect of positive serial correlation on regressions?
A. Ordinary least squares regression estimators are no longer efficient.
B. Ordinary least squares regression estimators are no longer consistent.
C. The standard error of the regression is biased downwards.
D. The estimate of the portion of variation explained by the regression is biased upwards.
E. There will be a tendency to reject the null hypothesis when in fact it should not be rejected.

Use the following information for the next two questions:

Y = 4 + 3t + εt, t = 1, 2, 3, 4, 5.

Prob[εt = 1] = 50% = Prob[εt = -1].

29.2 (3 points) If Corr[εt-1, εt] = 1, determine the expected value of the Error Sum of Squares.

29.3 (3 points) If Corr[εt-1, εt] = -1, determine the expected value of the Error Sum of Squares.

29.4 (1 point) According to Pindyck and Rubinfeld in Econometric Models and Economic Forecasts, which of the following statements about serial correlation is false?
A. When there is first order serial correlation, the errors in one time period are correlated directly with errors in the ensuing period.
B. Serial correlation may be positive or negative.
C. Positive serial correlation frequently occurs in time series studies.
D. A likely cause of negative serial correlation is the high degree of correlation over time that is present in the cumulative effects of omitted variables.
E. When there is negative serial correlation, a negative error is likely to be followed by a positive error in the ensuing period.

29.5 (1 point) There is first order serial correlation, εt = ρεt-1 + νt.

ρ = 0.7. Var[ν] = 40. Determine Var[ε].A. 12 B. 20 C. 28 D. 78 E. 133.


29.6 (1 point) The regression model Yt = α + βXt + εt is fit to five different time series.

Which of the following graphs of ^εt-1 versus ^εt indicates negative serial correlation?

[Five scatter plots of ^εt-1 versus ^εt, labeled A through E.]


Section 30, Durbin-Watson Statistic

Exercise: For the regression fit in the previous section, compute Σ(^εt - ^εt-1)2 / Σ^εt2.
[Solution: ^εt = Yt - ^Yt = -66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110, -30, -147, -180, -173, 112, 117, 233.
Σ(^εt - ^εt-1)2 = 192,533. Σ^εt2 = 225,119. Σ(^εt - ^εt-1)2 / Σ^εt2 = .855.]

This is an example of the Durbin-Watson statistic, which is used to test for serial correlation. If the value of the DW statistic is far from 2, such as .855, that indicates the likely presence of serial correlation. The Durbin-Watson statistic is computed from the residuals as follows:235

DW = Σt=2 to N (^εt - ^εt-1)2 / Σt=1 to N ^εt2.

Exercise: Yt = 23, 25, 33, 39, 46, 44. ^Yt = 22.6, 27.5, 32.5, 37.5, 42.5, 47.4.
Compute the Durbin-Watson statistic.
[Solution: ^εt = Yt - ^Yt = .4, -2.5, .5, 1.5, 3.5, -3.4. Σ^εt2 = 32.7.
^εt - ^εt-1 = -2.9, 3.0, 1.0, 2.0, -6.9. Σ(^εt - ^εt-1)2 = 70.0.
DW = Σ(^εt - ^εt-1)2 / Σ^εt2 = 70.0/32.7 = 2.14.]
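A minimal Python sketch of this calculation (numpy assumed), using the residuals of the exercise above:

    import numpy as np

    def durbin_watson(resid):
        # sum over t = 2..N of (e_t - e_{t-1})^2, divided by sum over t = 1..N of e_t^2
        return np.sum(np.diff(resid)**2) / np.sum(resid**2)

    resid = np.array([0.4, -2.5, 0.5, 1.5, 3.5, -3.4])
    print(durbin_watson(resid))               # about 2.14, as in the exercise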

The DW Statistic has the following properties:
0 < DW < 4.
No serial correlation ⇔ DW near 2.
Positive serial correlation ⇔ DW small.
Negative serial correlation ⇔ DW large.
DW ≅ 2(1 - ρ), where ρ is the correlation coefficient of adjacent errors.

One can use the Durbin-Watson statistic to test the null hypothesis that there is no (first order) serial correlation ⇔ ρ = 0. One would need values from the appropriate statistical table, which is not attached to the exam. The values depend on the number of observations, N, and the number of explanatory variables excluding the constant term, k.236 237

235 See Equation 6.22 in Econometric Models and Economic Forecasts. Note that the numerator has only N-1 terms, while the denominator has N terms.
236 Note that k in the DW table is the number of explanatory variables excluding the constant, while elsewhere in Econometric Models and Economic Forecasts k is the number of explanatory variables including the constant.
237 As N increases, the range of DWs where we do not reject H0 gets narrower. As k increases, the range of DWs where we do not reject H0 gets wider. As N increases, the range of DWs that results in an indeterminate result decreases. The distribution of the DW statistic is complicated. It not only depends on N, k, and the chosen significance level, it also depends on X, the design matrix of independent variables. See Kendall’s Advanced Theory of Statistics, Volume 2.


Using Table 5 at the back of Econometric Models and Economic Forecasts, at a 5% significance level, for the two-variable regression model, one explanatory variable excluding the constant (k = 1), fit to 20 observations (N = 20): dl = 1.20 and du = 1.41. These values, d-lower and d-upper, are compared to the DW statistic and conclusions are drawn as follows:
H0: no (first order) serial correlation ⇔ ρ = 0.
0 < DW < 1.20 ⇒ reject H0, positive serial correlation.
1.20 < DW < 1.41 ⇒ indeterminate.238
1.41 < DW < 4 - 1.41 = 2.59 ⇒ do not reject H0.
2.59 < DW < 4 - 1.20 = 2.80 ⇒ indeterminate.239
2.80 < DW < 4 ⇒ reject H0, negative serial correlation.

Exercise: A multiple regression model with 4 explanatory variables excluding the constant term has been fit to 50 observations. You test the hypothesis that there is no first order serial correlation of the errors. How does your conclusion depend on the Durbin-Watson Statistic?
Hint: Looking in Table 5 in Econometric Models and Economic Forecasts, at a 5% significance level, for k = 4 and N = 50: dl = 1.38 and du = 1.72.
[Solution: 0 < DW < 1.38 ⇒ reject H0, positive serial correlation.
1.38 < DW < 1.72 ⇒ no conclusion on H0 (but not negative serial correlation.)
1.72 < DW < 4 - 1.72 = 2.28 ⇒ do not reject H0.
2.28 < DW < 4 - 1.38 = 2.62 ⇒ no conclusion on H0 (but not positive serial correlation.)
2.62 < DW < 4 ⇒ reject H0, negative serial correlation.]
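This decision rule is easy to automate once dl and du have been looked up. A minimal Python sketch (my own; the function name is illustrative):

def dw_conclusion(dw, dl, du):
    # Five-region decision rule for the Durbin-Watson test.
    if dw < dl:
        return "reject H0: positive serial correlation"
    if dw < du:
        return "indeterminate (but not negative serial correlation)"
    if dw <= 4 - du:
        return "do not reject H0"
    if dw <= 4 - dl:
        return "indeterminate (but not positive serial correlation)"
    return "reject H0: negative serial correlation"

# For k = 4 and N = 50 at the 5% level (dl = 1.38, du = 1.72, as in the exercise above):
for dw in (1.1, 1.5, 2.0, 2.5, 2.9):
    print(dw, "->", dw_conclusion(dw, 1.38, 1.72))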

As the number of observations increases, the size of the indeterminate regions decreases.240

A Simulation Experiment:*

I performed a simulation experiment on the model: Yt = 50 + 5t + εt, t = 1, 2, ..., 20.
I assumed the errors were independent and had variance 100.

I simulated Y1, Y2, ..., Y20, fit a linear regression, and computed the Durbin-Watson Statistic. Since the Ys were simulated with no serial correlation, we expect DW to be close to 2. When I performed the experiment 10 times, I got the following values of the DW Statistic, sorted from smallest to largest: 1.35, 1.75, 1.99, 2.03, 2.08, 2.21, 2.36, 2.36, 2.80, 2.85. While most of the values are close to 2, some are not that close.
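This experiment is easy to replicate. Below is a sketch of one way to do it in Python with numpy (my own code; the seed is arbitrary and the DW values obtained will of course differ from run to run):

import numpy as np

rng = np.random.default_rng(1)          # arbitrary seed
t = np.arange(1, 21, dtype=float)

def simulated_dw():
    # Simulate Y_t = 50 + 5t + e_t with independent Normal errors of variance 100,
    # fit a straight line, and return the Durbin-Watson statistic of the residuals.
    y = 50 + 5 * t + rng.normal(0, 10, size=20)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

print(sorted(round(simulated_dw(), 2) for _ in range(10)))   # ten values, mostly near 2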

Using Table 5 at the back of Econometric Models and Economic Forecasts, at a 5% significance level, for the two-variable regression model, one explanatory variable excluding the constant (k = 1), fit to 20 observations (N =20): dl = 1.20 and du = 1.41.

238 No conclusion on H0 (but not negative serial correlation.)
239 No conclusion on H0 (but not positive serial correlation.)
240 Some people use the rule of thumb that at least 50 observations of a time series are needed before the Durbin-Watson test is likely to provide worthwhile conclusions.


Since 1.20 < 1.35 < 1.41, for the smallest value the test would have been inconclusive241; there might be positive serial correlation or no serial correlation, but there is not negative serial correlation. Since 1.41 < 1.75 < 2.59, in this case we would not reject the null hypothesis that there is no serial correlation. Since 2.59 = 4 - 1.41 < 2.80 ≤ 2.80 = 4 - 1.20, for the value 2.80 the test would have been inconclusive; there might be negative serial correlation or no serial correlation, but there is not positive serial correlation. For the largest value, 2.85 > 2.80 = 4 - 1.20, we would reject H0 and conclude negative serial correlation, even though the data were simulated with no serial correlation.

Here is a graph of DW for 100 such simulations, with horizontal lines at dl = 1.20, du = 1.41, 4 - du = 2.59, and 4 - dl = 2.80:
[Scatterplot of the 100 simulated DW values, roughly between 1 and 3, not reproduced.]

The 100 simulations were divided into:
2 cases with 0 < DW < 1.20 ⇒ reject H0, positive serial correlation.
3 cases with 1.20 < DW < 1.41 ⇒ no conclusion (however, not negative serial correlation.)
86 cases with 1.41 < DW < 2.59 ⇒ do not reject H0.
5 cases with 2.59 < DW < 2.80 ⇒ no conclusion (however, not positive serial correlation.)
4 cases with 2.80 < DW < 4 ⇒ reject H0, negative serial correlation.

Since the table from which dl and du were taken was for a 5% significance level, we would expect about 5 cases out of 100 where we reject H0 even though H0 is in fact true.242 For this particular simulation, there are 6 cases where we reject H0 when we should not have.243
241 Assuming one did not know how the data had been simulated.
242 In other words, the probability of a Type I Error is 5%.
243 At a 1% significance level it turns out dl = .95, du = 1.15. See Kendall’s Advanced Theory of Statistics, Volume 2, Appendix Table 11. For this same experiment, at the 1% level, there are no cases where one rejects H0, 6 indeterminate cases, and 94 cases where one does not reject H0.


Next, I simulated a similar situation, but with positive serial correlation with ρ = 0.5. As discussed previously, one can simulate this situation as follows:
εt = .5εt-1 + νt, where the νt are independent, Normally distributed with mean 0 and variance 75.

Var[εt] = Var[νt]/(1 - ρ²) = 75/(1 - .5²) = 100.

Take ε1 to be Normal with mean 0 and variance 100.
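As a sketch of the recipe just described (my own code, using numpy; the seed is arbitrary), first order serially correlated errors with ρ = 0.5 and Var[ε] = 100 can be generated as follows:

import numpy as np

rng = np.random.default_rng(7)        # arbitrary seed
rho, n = 0.5, 20
var_eps = 100.0
var_v = var_eps * (1 - rho ** 2)      # 75, the variance of the independent innovations

eps = np.empty(n)
eps[0] = rng.normal(0, np.sqrt(var_eps))            # e_1 ~ Normal(0, 100)
for i in range(1, n):
    eps[i] = rho * eps[i - 1] + rng.normal(0, np.sqrt(var_v))

y = 50 + 5 * np.arange(1, n + 1) + eps              # a series with positive serial correlation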

When I performed the experiment 10 times, I got the following values of the DW Statistic, sorted from smallest to largest: .80, .91, .93, 1.14, 1.30, 1.31, 1.34, 1.52, 1.62, 1.63. Since ρ = .5, we expect the Durbin-Watson statistics to be close to 2(1 - ρ) = 1. Using dl = 1.20 and du = 1.41, we would have 4 cases (DW < 1.20) where we (correctly) conclude there is positive serial correlation, 3 cases (1.20 < DW < 1.41) where we are not sure whether there is positive or no serial correlation, and 3 cases (1.41 < DW < 2.59) where we do not reject the null hypothesis that there is no serial correlation.

Here is a graph of DW for 100 such simulations, with horizontal lines at dl = 1.20, du = 1.41, and 4 - du = 2.59:
[Scatterplot of the 100 simulated DW values, roughly between 0.5 and 2.5, not reproduced.]

There were many simulation runs which resulted in Durbin-Watson statistics in the indeterminate region, 1.20 < DW < 1.41, in the region where we reject the null hypothesis in favor of positive serial correlation, DW < 1.20, and in the region where we do not reject the null hypothesis, 1.41 < DW < 2.59. With only 20 observations, it is often the case that the Durbin-Watson test will lead to no conclusion.


I performed a similar experiment, but with negative serial correlation, ρ = - .5. (A positive error is more likely to be followed by a negative error and vice-versa.)

Here is a graph of DW for 100 such simulations, with horizontal lines at du = 1.41, 4 - du = 2.59, and 4 - dl = 2.80:
[Scatterplot of the 100 simulated DW values, roughly between 1.5 and 3.5, not reproduced.]

In the experiments with negative serial correlation DW is more likely to be large, while in the experiments with positive serial correlation DW is more likely to be small. However, as with any statistic, for finite sample sizes DW is subject to random fluctuation. So with negative serial correlation, we can get a calculated DW < 2.

Lagged Dependent Variables:244

The model Yt = α + βYt-1 + γXt + εt, contains a lagged dependent variable. The value of Y is assumed to depend among other things on the value of Y during the previous period.245 Assuming we observe Xt and Yt-1, we could use the fitted model to forecast Yt.

While the Durbin-Watson test can be used when a lagged dependent variable is present in the regression, the DW statistic will often be close to 2, even when there is serial correlation. The usual tables of critical values for the DW Statistic, dl and du, are not valid when there is a lagged variable. Using the DW statistic when there is a lagged dependent variable, one is unlikely to find serial correlation when it exists.244 See Section 6.2.3 of Pindyck and Rubinfeld. 245 For example, Y might be automobile insurance claim frequency and X might be the price of gasoline in real dollars.


When there is a lagged dependent variable, the Durbin-Watson test is biased towards not rejecting the null hypothesis of no serial correlation.246 Pindyck and Rubinfeld present two alternatives to the Durbin-Watson test for use when there is a lagged variable. The first is the Durbin h-test.

Durbin’s h-test:

Let us assume we have fit the model Yt = α + βYt-1 + γXt + εt to 200 observations and get a

Durbin-Watson Statistic of 1.74 and Var[^β] = .0009. Then it turns out that

h = (1 - DW/2)√(N/(1 - N Var[^β])) = (1 - 1.74/2)√(200/(1 - (200)(.0009))) = (.13)(15.62) = 2.03,

has a Standard Normal Distribution if H0 is true. Therefore, since Φ(1.960) = .975 and 2.03 > 1.960, we can reject the null hypothesis at a 5% (two sided test) or 2.5% (one sided test).247

When there is a lagged variable with coefficient β, then the Durbin h-statistic:248

h = (1 - DW/2)√(N/(1 - N Var[^β])),

where N is the number of observations and DW is the Durbin-Watson Statistic, has a Standard Normal Distribution if H0: no serial correlation, is true. This is not valid if N Var[^β] ≥ 1.249

Exercise: One has fit the model Yt = α + βYt-1 + εt to 100 observations of a time series and the

Durbin-Watson Statistic is 1.68 and Var[^β] = .0012. H0 is that there is no serial correlation

and H1 is that there is positive serial correlation. Using Durbin’s h-test, what conclusion do you draw?

[Solution: h = (1 - DW/2)√(N/(1- N Var[^β])) = 1.706. Φ(1.645) = .95 and Φ(1.960) = .975.

1.645 < 1.706 < 1.960. We reject H0 at 5% and do not reject H0 at 2.5% (one-sided test.)]
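The h-statistic is simple to compute once DW and Var[^β] are available. A minimal Python sketch (mine), reproducing the 1.706 of the exercise above:

import math

def durbin_h(dw, n, var_beta):
    # Durbin's h-statistic; only valid when N * Var[beta_hat] < 1.
    if n * var_beta >= 1:
        raise ValueError("h-test not valid: N * Var[beta_hat] >= 1")
    return (1 - dw / 2) * math.sqrt(n / (1 - n * var_beta))

h = durbin_h(dw=1.68, n=100, var_beta=0.0012)
print(round(h, 3))   # about 1.706; compare with the one-sided Normal critical values 1.645 (5%) and 1.960 (2.5%)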

A Second Technique when one has a Lagged Dependent Variable:

The second technique for dealing with the situation with a lagged dependent variable involves fitting a regression to the residuals. If the original model is Yt = α + βYt-1 + γXt + εt, then we fit: ^εt = a + ρ^εt-1 + bYt-1 + gXt + errort. We then apply the usual t-test to ^ρ. If ^ρ is significantly different from zero, then we reject the null hypothesis of no serial correlation.

246 Nevertheless, it is still often used in this situation.
247 A one-sided test is performed if the alternative hypothesis is positive serial correlation. A two-sided test is performed if the alternative hypothesis is serial correlation.
248 See Section 6.2.3 of Pindyck and Rubinfeld.

249 If N Var[^β] ≥ 1, one would be taking the square root of a negative number.


Exercise: One has fit the model Yt = α + βYt-1 + εt to 125 observations of a time series.

Then using the residuals, one fits ^εt = a + ρ^εt-1 + bYt-1 + errort. ^ρ = -.23 and Var[^ρ] = .017.
What conclusion do you draw?
[Solution: t = -.23/√.017 = -1.76. We fit 124 residuals, with 3 parameters, for 121 degrees of freedom. The critical values at 10% and 5% are about 1.66 and 1.98. 1.66 < 1.76 < 1.98. We reject H0 at 10% (two-sided test) and do not reject H0 at 5% (two-sided test). At 10%, we conclude there is serial correlation.]
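A sketch of this second technique in Python (my own code, using numpy least squares; for a model that also contains Xt one would add that series as a further regressor). It returns the estimate of ρ, its t-statistic, and the degrees of freedom for the t-test:

import numpy as np

def lagged_residual_test(resid, y):
    # Regress the residuals on an intercept, the lagged residuals, and the lagged dependent variable,
    # then return (rho_hat, t-statistic of rho_hat, degrees of freedom).
    e_t, e_lag, y_lag = resid[1:], resid[:-1], y[:-1]
    X = np.column_stack([np.ones_like(e_t), e_lag, y_lag])
    coef, _, _, _ = np.linalg.lstsq(X, e_t, rcond=None)
    dof = len(e_t) - X.shape[1]
    s2 = np.sum((e_t - X @ coef) ** 2) / dof
    cov = s2 * np.linalg.inv(X.T @ X)
    rho_hat = coef[1]
    return rho_hat, rho_hat / np.sqrt(cov[1, 1]), dof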

Some Intuition with respect to the Durbin Watson Statistic:

Σt=2 to N (^εt - ^εt-1)² = Σt=2 to N ^εt² - 2Σt=2 to N ^εt^εt-1 + Σt=1 to N-1 ^εt² = 2Σt=1 to N ^εt² - 2Σt=2 to N ^εt^εt-1 - ^ε1² - ^εN².

If εt-1 and εt have a correlation of zero, then Cov[εt-1, εt] = 0
⇒ E[εt-1 εt] = E[εt-1] E[εt] = (0)(0) = 0.

If ρ = 0, and N is not small, then E[^εt^εt-1] ≅ 0, and the sum of cross products is small.250

DW = Σt=2 to N (^εt - ^εt-1)² / Σt=1 to N ^εt² ≅ 2Σt=1 to N ^εt² / Σt=1 to N ^εt² = 2.

If instead ρ > 0, then E[^εt^εt-1] > 0, and DW < 2.
If ρ = 1, then E[^εt^εt-1] ≅ E[^εt²], and DW ≅ 0.
If ρ = -1, then E[^εt^εt-1] ≅ -E[^εt²], and DW ≅ 4.

250 Since the residuals sum to zero, they are not independent.


Problems:

30.1 (1 point) The Durbin-Watson Statistic is 2.9. Assuming there is serial correlation of the errors, estimate ρ, the correlation coefficient between successive errors.
(A) Less than -0.6
(B) At least -0.6, but less than -0.2
(C) At least -0.2, but less than 0.2
(D) At least 0.2, but less than 0.6
(E) At least 0.6

30.2 (1 point) A multiple regression model with 3 explanatory variables excluding the constant term has been fit to 40 observations. You test the hypothesis that there is no first order serial correlation of the errors. At a 5% significance level, for k = 3 and N = 40: dl = 1.34 and du = 1.66. The Durbin-Watson Statistic is 2.41.
Which of the following statements is true?
A. Do not reject the null hypothesis.
B. Reject the null hypothesis; negative serial correlation present.
C. Reject the null hypothesis; positive serial correlation present.
D. Result indeterminate.
E. None of A, B, C, or D.

30.3 (3 points) Use the following information:
Year (t)   Loss Ratio (Y)
1          82
2          78
3          80
4          73
5          77
You have fit the following model: Y = 82.5 - 1.5t.
What is the Durbin-Watson Statistic?
(A) 3.0  (B) 3.2  (C) 3.4  (D) 3.6  (E) 3.8

30.4 (2 points) One has fit the model Yt = α + βYt-1 + γXt + εt to 50 observations of a time series. The Durbin-Watson Statistic is 1.46. Var[^α] = .17. Var[^β] = .009. Var[^γ] = .014.
The null hypothesis H0 is that there is no serial correlation, and the alternative hypothesis H1 is that there is positive serial correlation. Using Durbin’s h-test, what conclusion do you draw?
A. Reject H0 at 0.005.
B. Do not reject H0 at 0.005; reject H0 at 0.010.
C. Do not reject H0 at 0.010; reject H0 at 0.025.
D. Do not reject H0 at 0.025; reject H0 at 0.050.
E. Do not reject H0 at 0.050.


30.5 (1 point) According to Pindyck and Rubinfeld in Econometric Models and Economic Forecasts, which of the following statements about the Durbin-Watson Statistic is false?
A. A small value of the Durbin-Watson Statistic is associated with positive serial correlation.
B. The Durbin-Watson Statistic should be used to test the regression model Yt = α + βYt-1 + εt, for serial correlation.
C. A value of the Durbin-Watson Statistic near 2 is associated with no serial correlation.
D. The Durbin-Watson Statistic is less than 4.
E. The Durbin-Watson Statistic is greater than 0.

30.6 (2 points) One has fit the model Yt = α + βYt-1 + γXt + εt to 250 observations of a time series. The Durbin-Watson Statistic is 1.78. Var[^α] = .0814. Var[^β] = .0023. Var[^γ] = .0009.
H0 is that there is no serial correlation and H1 is that there is positive serial correlation. Using Durbin’s h-test, what conclusion do you draw?
A. Do not reject H0 at 5%.
B. Do not reject H0 at 2.5%. Reject H0 at 5%.
C. Do not reject H0 at 1%. Reject H0 at 2.5%.
D. Do not reject H0 at 0.5%. Reject H0 at 1%.
E. Reject H0 at 0.5%.

30.7 (1 point) One has fit the model Yi = α + βYi-1 + γXi + εi to 45 observations.
Then using the residuals of this fit, one fits ^εi = a + ρ^εi-1 + bYi-1 + gXi + errori.
^a = 172, Var[^a] = 3207. ^ρ = .35, Var[^ρ] = .031. ^b = .88, Var[^b] = .30. ^g = -2.3, Var[^g] = .82.
H0 is that there is no serial correlation and H1 is that there is serial correlation. What conclusion do you draw?
A. Do not reject H0 at 10%.
B. Do not reject H0 at 5%. Reject H0 at 10%.
C. Do not reject H0 at 2%. Reject H0 at 5%.
D. Do not reject H0 at 1%. Reject H0 at 2%.
E. Reject H0 at 1%.

For the next 3 questions, use the following 28 values for a time series at regular intervals:
2545, 2469, 2392, 2193, 1901, 1718, 1645, 1546, 1433, 1289, 1136, 1041, 1000, 984, 964, 955, 969, 949, 926, 880, 839, 822, 812, 802, 797, 782, 762, 759.
These 28 values sum to 35,310.

30.8 (4 points) Fit the regression model, Y = α + βt + ε.

30.9 (4 points) Compute the Durbin-Watson Statistic.

30.10 (1 point) Determine the approximate value of the sample autocorrelation coefficient measuring the association between consecutive residuals.


30.11 (2 points) A linear regression has been fit to a time series with 30 observations.
^ε1 = -7. ^ε30 = 11.
Σt=1 to 30 ^εt² = 2422. Σt=2 to 30 ^εt^εt-1 = 801.
Compute the Durbin-Watson Statistic.
(A) Less than 1.3
(B) At least 1.3, but less than 1.4
(C) At least 1.4, but less than 1.5
(D) At least 1.5, but less than 1.6
(E) At least 1.6

30.12 (7 points) You are given the following information by quarter on gas prices (inflation adjusted dollars per gallon) and automobile insurance claims frequencies.
Gas Price: 1.96 2.21 2.13 2.30 2.57 2.42 2.46 2.68 2.69 2.72 2.75
Frequency: .0423 .0383 .0385 .0378 .0364 .0339 .0347 .0402 .0369 .0364 .0385

Gas Price: 2.90 2.94 3.01 2.98 2.86 2.91 3.04 3.16 3.29 3.38
Frequency: .0338 .0362 .0372 .0376 .0367 .0342 .0344 .0351 .0355 .0322

Gas Price: 3.40 3.45 3.52 3.51 3.67 3.62 3.68 3.74 3.88 3.92
Frequency: .0329 .0327 .0302 .0291 .0257 .0282 .0315 .0308 .0339 .0345
Using a regression program on a computer, fit the following model: Yi = α + βYi-1 + γXi + εi, where X is gas prices and Y is claim frequency.
Use Durbin’s h test in order to test the null hypothesis H0 that there is no serial correlation versus the alternative hypothesis H1 that there is positive serial correlation.

30.13 (5 points) Average premiums for homeowners insurance follow the model Y = 400 + 20t + ε.
Due to an underwriting cycle, ε1 = 0, ε2 = 10, ε3 = 20, ε4 = 10, ε5 = 0, ε6 = -10, ε7 = -20, ε8 = -10, ε9 = 0, ε10 = 10, ε11 = 20, ε12 = 10, ε13 = 0, ε14 = -10, ε15 = -20, ε16 = -10, ε17 = 0.
With the aid of a computer, for t = 1, 2, ..., 17, determine Yt, and fit a linear regression of Y as a function of t. Determine the values of: R², the corrected R², the t-statistics, the F-Statistic, and the Durbin-Watson Statistic.


30.14 (Course 120 Sample Exam #1, Q.8) (2 points) You fit a simple linear regression model to the following eight observations obtained on consecutive days:
Day   Y    X
1     11   2
2     20   2
3     30   3
4     39   3
5     51   4
6     59   4
7     70   5
8     80   5
Using the method of least squares, you determine the estimated regression line ^Y = -25 + 20X.
Determine the value of the Durbin-Watson statistic.
(A) 3.5  (B) 3.6  (C) 3.7  (D) 3.8  (E) 3.9

30.15 (Course 120 Sample Exam #3, Q.10) (2 points) You have performed a simple regression analysis and determined that the value of the Durbin-Watson test statistic is 0.8.
Determine the approximate value of the sample autocorrelation coefficient measuring the association between consecutive residuals.
(A) 0.2  (B) 0.3  (C) 0.4  (D) 0.6  (E) 0.8

30.16 (Course 4 Sample Exam, Q.30) (2.5 points) You wish to determine the relationship between sales (Y) and the number of radio advertisements broadcast (X). Data collected on four consecutive days is shown below.

Day   Sales   Number of Radio Advertisements
1     10      2
2     20      2
3     30      3
4     40      3

Using the method of least squares, you determine the estimated regression line:

^Y = -25 + 20X

Determine the value of the Durbin-Watson statistic for this model.


30.17 (4, 5/00, Q.24) (2.5 points) You are given the following linear regression results:
t   Actual   Fitted
1   77.0     77.6
2   69.9     70.6
3   73.2     70.9
4   72.7     72.7
5   66.1     67.1
Estimate the lag 1 serial correlation coefficient for the residuals, using the Durbin-Watson statistic.
(A) Less than -0.2
(B) At least -0.2, but less than -0.1
(C) At least -0.1, but less than 0.0
(D) At least 0.0, but less than 0.1
(E) At least 0.1

30.18 (VEE-Applied Statistics Exam, 8/05, Q.14) (2.5 points) You are given:
(i) The model is St = α + βSt-1 + γAt + εt, where S denotes monthly sales and A denotes monthly advertising expenses.
(ii) A regression fit with T = 36 yields (standard errors in parentheses):
^St = 7.5 + 0.5St-1 + 0.2At
     (0.5)  (0.1)    (0.03)
s = 4.3, R² = 0.94, DW = 1.2
Determine the outcome of Durbin's h test for the presence of serial correlation in the errors.
(A) The test does not reject the null hypothesis of no serial correlation at the 10% significance level.
(B) The test rejects the null hypothesis of no serial correlation at the 10% significance level, but not at the 5% significance level.
(C) The test rejects the null hypothesis of no serial correlation at the 5% significance level, but not at the 2.5% significance level.
(D) The test rejects the null hypothesis of no serial correlation at the 2.5% significance level, but not at the 1% significance level.
(E) The test rejects the null hypothesis of no serial correlation at the 1% significance level.


Section 31, Correcting for Serial Correlation:

We will discuss two methods of correcting for serial correlation when it exists: the Hildreth-Lu Procedure and the Cochrane-Orcutt Procedure.

Hildreth-Lu Procedure:

One method of correcting for serial correlation is the Hildreth-Lu procedure.

Recall that when the following time series, Y1, Y2, ... , Y20: 547, 628, 778, 759, 823, 772, 904, 971, 974, 916, 1043, 932, 959, 1077, 998, 1003, 1048, 1371, 1414, 1568, was fit by linear regression, there was evidence of positive first order serial correlation. The Durbin-Watson Statistic was .855. Since DW ≅ 2(1 - ρ), ρ ≅ 1 - DW/2 = 1 - .855/2 = .57.

Thus we assume ρ is approximately .57. We can try various values for ρ, such as: .50, .55, .60, .65, and see which one “works best.”

Let Xt* = Xt - ρXt-1, and Yt* = Yt - ρYt-1. As discussed below, we hope this transformation will remove much of the serial correlation present in the original regression.

For example, for ρ = .55, X2* = X2 - .55X1 = 2 - (.55)(1) = 1.45.

X∗ = 1.45, 1.90, 2.35, 2.80, 3.25, 3.70, 4.15, 4.60, 5.05, 5.50, 5.95, 6.40, 6.85, 7.30, 7.75, 8.20, 8.65, 9.10, 9.55.
Y2* = Y2 - .55Y1 = 628 - (.55)(547) = 327.15.251

Y∗ = 327.15, 432.6, 331.1, 405.55, 319.35, 479.4, 473.8, 439.95, 380.3, 539.2, 358.35, 446.4, 549.55, 405.65, 454.1, 496.35, 794.6, 659.95, 790.3.

The transformed regression equation is: Y∗ = α(1 - ρ) + βX∗ = .45α + βX∗, where α and β are the intercept and slope of the equation that relates to the original variables, Y = α + βX.

Exercise: Fit a linear regression to X∗ and Y∗. Determine ESS.

[Solution: .45^α = 255.655, and ^β = 40.4421. ^εt = 12.8537, 100.105, -19.5942, 36.6568, -67.7421, 74.1089, 50.31, -1.73895, -79.5879, 61.1132, -137.936, -68.0847, 16.8663, -145.233, -114.982, -90.9305, 189.121, 36.2716, 148.423. ESS = Σ^εt² = 160,238.]

Thus ^α = 255.655 / .45 = 568.12 and ^β = 40.44.
In terms of the original variables: ^Yt = 568.12 + 40.44t.

Proceeding in the same way for various values of ρ, the regressions of X∗ and Y∗ are:

251 There is no Y1*. After the transformation, there are only 19 rather than 20 values.


ρ      intercept   slope     ESS
.50    286.479     39.7930   161,681
.55    255.655     40.4421   160,238
.60    224.669     41.2535   159,588
.65    193.452     42.2967   159,732

The smallest Error Sum of Squares corresponds to ρ = .60.252 Thus we could use the results of this regression: intercept = (1 - .60)^α = 224.669 and ^β = 41.2535. ^α = 224.669/.4 = 561.67.
In terms of the original variables: ^Yt = 561.67 + 41.25t.
Note this differs somewhat from the linear regression fit to the original variables: ^Yt = 575 + 38.0t.

Exercise: Use both the original regression and the result of the Hildreth-Lu procedure in order to forecast Y22.
[Solution: Using the original regression, Y22 = 575 + (38.0)(22) = 1411.
Using the Hildreth-Lu procedure, Y22 = 561.67 + (41.25)(22) = 1469.]

The Reason for the Transformed Variables and the Transformed Regression Equation:*

Assume Yt = α + βXt + εt, with Corr[εt-1, εt] = ρ, Corr[εt-2, εt] = ρ², and Var[εt] = σ².253

Then Yt - ρYt-1 = α + βXt + εt - ρ(α + βXt-1 + εt-1) = α(1 - ρ) + βXt* + εt*, where
Xt* = Xt - ρXt-1, and εt* = εt - ρεt-1.

Var[εt*] = Var[εt - ρεt-1] = Var[εt] + ρ²Var[εt-1] - 2ρCov[εt, εt-1] = σ² + ρ²σ² - 2ρ²σ² = (1 - ρ²)σ².

Cov[εt-1*, εt*] = Cov[εt-1 - ρεt-2, εt - ρεt-1]
= Cov[εt-1, εt] + ρ²Cov[εt-2, εt-1] - ρCov[εt-1, εt-1] - ρCov[εt, εt-2] = ρσ² + ρ²ρσ² - ρσ² - ρρ²σ² = 0.

Thus the εt* have constant variance and no first order serial correlation.
If Yt* = Yt - ρYt-1 and Xt* = Xt - ρXt-1, then the transformed equation is: Y∗ = α(1 - ρ) + βX∗.

Summary of the Hildreth-Lu procedure:254

0. Choose a grid of likely values for -1 ≤ ρ ≤ 1. For each such value of ρ:

1. Let Xt* = Xt - ρXt-1, and Yt* = Yt - ρYt-1.

2. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.

252 We might then create a new finer grid of values of ρ centered on .6, and try to refine our estimate of ρ.
253 This is the covariance structure we expect with homoscedasticity and first order serial correlation.
254 Shown for the case with a constant plus one independent variable. The procedure is the same for more than one independent variable, except one must transform each independent variable in the same manner.


3. The best regression has the smallest Error Sum of Squares (ESS).
If desired, then refine the grid of values for ρ, and again perform steps 1, 2, and 3.
Translate the transformed equation back to the original variables: Yt = α + βXt.
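A minimal sketch of this grid search in Python (my own code, using numpy). Applied to the time series above, with t as the independent variable, it should reproduce, up to rounding, the ρ = .60 fit found above:

import numpy as np

def hildreth_lu(x, y, grid):
    # Grid search over rho; for each rho fit y* = alpha*(1 - rho) + beta*x* and keep the smallest ESS.
    best = None
    for rho in grid:
        x_star = x[1:] - rho * x[:-1]
        y_star = y[1:] - rho * y[:-1]
        slope, const = np.polyfit(x_star, y_star, 1)
        ess = np.sum((y_star - (const + slope * x_star)) ** 2)
        if best is None or ess < best[0]:
            best = (ess, rho, const / (1 - rho), slope)   # translate back: alpha = const / (1 - rho)
    ess, rho, alpha, beta = best
    return rho, alpha, beta, ess

y = np.array([547, 628, 778, 759, 823, 772, 904, 971, 974, 916, 1043, 932, 959,
              1077, 998, 1003, 1048, 1371, 1414, 1568], dtype=float)
x = np.arange(1, 21, dtype=float)
print(hildreth_lu(x, y, [0.50, 0.55, 0.60, 0.65]))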

Cochrane-Orcutt Procedure:

Another method of correcting for serial correlation is the Cochrane-Orcutt procedure.

When the following time series, Y1, Y2, ... , Y20: 547, 628, 778, 759, 823, 772, 904, 971, 974, 916, 1043, 932, 959, 1077, 998, 1003, 1048, 1371, 1414, 1568, was fit by linear regression,

the residuals, εt were: -66, -23, 89, 32, 58, -31, 63, 92, 57, -39, 50, -99, -110, -30, -147, -180, -173, 112, 117, 233. There was evidence of positive first order serial correlation.

Exercise: Fit a linear regression ^εt = ρ^εt-1 + error.
[Solution: Since this is a model with no intercept, ^ρ = Σ^εt-1^εt / Σ^εt-1² = {(-66)(-23) + ... + (117)(233)} / {(-66)² + ... + 117²} = 99530/170830 = .58.]

Thus our first estimate of the lag 1 serial correlation coefficient is .58.
Let Xt* = Xt - .58Xt-1, and Yt* = Yt - .58Yt-1. As discussed previously, we hope this transformation, the same one used in the Hildreth-Lu procedure, will remove much of the serial correlation present in the original regression.

X2* = X2 - .58X1 = 2 - (.58)(1) = 1.42.

X∗ = 1.42, 1.84, 2.26, 2.68, 3.10, 3.52, 3.94, 4.36, 4.78, 5.20, 5.62, 6.04, 6.46, 6.88, 7.30, 7.72, 8.14, 8.56, 8.98.
Y2* = Y2 - .58Y1 = 628 - (.58)(547) = 310.74.255

Y∗ = 310.74, 413.76, 307.76, 382.78, 294.66, 456.24, 446.68, 410.82, 351.08, 511.72, 327.06, 418.44, 520.78, 373.34, 424.16, 466.26, 763.16, 618.82, 747.88.

The transformed regression equation is: Y∗ = α(1 - ρ) + βX∗ = .42α + βX∗, where α and β are the intercept and slope of the equation that relates to the original variables.

Exercise: Fit a linear regression to X∗ and Y∗.

[Solution: intercept = .42^α = 237.1, and ^β = 40.9.]

Thus ^α = 237.1 / .42 = 564.5 and ^β = 40.9.
In terms of the original variables: ^Yt = 564.5 + 40.9t.

255 There is no Y1*. After the transformation, there are only 19 rather than 20 values.


Exercise: For this revised equation compute ^Yt and ^εt.
[Solution: ^Yt = 605.4, 646.3, 687.2, 728.1, 769.0, 809.9, 850.8, 891.7, 932.6, 973.5, 1014.4, 1055.3, 1096.2, 1137.1, 1178.0, 1218.9, 1259.8, 1300.7, 1341.6, 1382.5.
^εt = -58.4, -18.3, 90.8, 30.9, 54.0, -37.9, 53.2, 79.3, 41.4, -57.5, 28.6, -123.3, -137.2, -60.1, -180.0, -215.9, -211.8, 70.3, 72.4, 185.5.]

One could perform another iteration of this whole procedure.

Exercise: Fit a linear regression ^εt = ρ^εt-1 + error.
[Solution: ^ρ = Σ^εt-1^εt / Σ^εt-1² = 123969/203949 = .61.]

Thus our second estimate of the lag 1 serial correlation coefficient is .61, compared to our first estimate of .58.

Let Xt* = Xt - .61Xt-1, and Yt* = Yt - .61Yt-1.
X∗ = 1.39, 1.78, 2.17, 2.56, 2.95, 3.34, 3.73, 4.12, 4.51, 4.9, 5.29, 5.68, 6.07, 6.46, 6.85, 7.24, 7.63, 8.02, 8.41.
Y∗ = 294.33, 394.92, 284.42, 360.01, 269.97, 433.08, 419.56, 381.69, 321.86, 484.24, 295.77, 390.48, 492.01, 341.03, 394.22, 436.17, 731.72, 577.69, 705.46.

The transformed regression equation is: Y∗ = α(1 - ρ) + βX∗ = (1 - .61)α + βX∗, where α and β are the intercept and slope of the equation that relates to the original variables.

Exercise: Fit a linear regression to X∗ and Y∗.

[Solution: intercept = .39^α = 218.45 and ^β = 41.44.]

Thus ^α = 218.45 / .39 = 560.13 and ^β = 41.44.
In terms of the original variables: ^Yt = 560.13 + 41.44t.

Exercise: For this revised equation compute ^Yt and ^εt.
[Solution: ^Yt = 601.57, 643.01, 684.45, 725.89, 767.33, 808.77, 850.21, 891.65, 933.09, 974.53, 1015.97, 1057.41, 1098.85, 1140.29, 1181.73, 1223.17, 1264.61, 1306.05, 1347.49, 1388.93.
^εt = -54.57, -15.01, 93.55, 33.11, 55.67, -36.77, 53.79, 79.35, 40.91, -58.53, 27.03, -125.41, -139.85, -63.29, -183.73, -220.17, -216.61, 64.95, 66.51, 179.07.]


One could perform yet another iteration of this whole procedure.

Fitting a linear regression ^εt = ρ^εt-1 + error:
^ρ = Σ^εt-1^εt / Σ^εt-1² = 127058/209607 = .61.

Since ^ρ seems to have converged, we can exit the procedure.256

The resulting model is: ^Yt = 560.13 + 41.44t.

Summary of the Cochrane-Orcutt procedure:257

1. Fit a linear regression and get the resulting residuals ^εt.
2. Estimate the serial correlation coefficient: ^ρ = Σ^εt-1^εt / Σ^εt-1².
3. Let Xt* = Xt - ρXt-1, and Yt* = Yt - ρYt-1.
4. Fit by regression the transformed equation Y∗ = α(1 - ρ) + βX∗.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the resulting residuals ^εt.
6. Estimate the serial correlation coefficient: ^ρ = Σ^εt-1^εt / Σ^εt-1².258
7. Unless the value of ^ρ seems to have converged or enough iterations have been performed, return to step 3.259

256 One could instead take ^ρ to a little more accuracy if desired, such as ^ρ = .606.
257 Shown for the case with a constant plus one independent variable. The procedure is the same for more than one independent variable, except one must transform each independent variable in the same manner.
258 Note that since we are fitting a linear regression between ^εt-1 and ^εt, the denominator only includes the sum of squares of ^εt-1, with the square of the final error missing. This is in contrast to equations 16.23 and 18.15 in Pindyck and Rubinfeld, related to time series, which involve all of the squared differences. Note that the denominator of the Durbin-Watson Statistic includes the sum of the squares of all the residuals.
259 Pindyck and Rubinfeld suggest one exit when the change in ^ρ is less than either .01 or .005, or when one has performed 10 or 20 iterations.
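A sketch of the whole iteration in Python (my own code, using numpy). Applied to the same x and y arrays as in the Hildreth-Lu sketch above, it should reproduce, up to rounding, the fit ^Yt = 560.13 + 41.44t with ^ρ ≅ .61:

import numpy as np

def cochrane_orcutt(x, y, tol=0.005, max_iter=20):
    slope, intercept = np.polyfit(x, y, 1)          # step 1: ordinary least squares fit
    rho_old = None
    for _ in range(max_iter):
        resid = y - (intercept + slope * x)
        rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)   # steps 2 and 6
        if rho_old is not None and abs(rho - rho_old) < tol:             # step 7
            break
        rho_old = rho
        x_star = x[1:] - rho * x[:-1]                                    # step 3
        y_star = y[1:] - rho * y[:-1]
        slope, const = np.polyfit(x_star, y_star, 1)                     # step 4
        intercept = const / (1 - rho)                                    # step 5
    return intercept, slope, rho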


Problems:

31.1 (2 points) You are given the following linear regression results:
t   Actual   Fitted
1   68.9     69.9
2   73.2     72.8
3   76.8     75.6
4   79.0     78.5
5   80.3     81.4
Determine the estimated lag 1 serial correlation coefficient after one iteration of the Cochrane-Orcutt procedure.
(A) -.10  (B) -0.05  (C) 0  (D) .05  (E) .10

31.2 (1 point) For a time series with 100 observations, you wish to estimate the serial correlation for the model Yt = α + βt + εt. Y = (21.268, 15.557, 4.8898, 10.878, 14.122, 33.743, 10.395, 5.158, 17.910, 38.840, ...). Using the Hildreth-Lu procedure, for ρ = .3, what is Y6*?
(A) 20  (B) 25  (C) 30  (D) 35  (E) 40

31.3 (1 point) For a time series with 100 observations, you have transformed the variables and fit a linear regression as in the Hildreth-Lu procedure.
ρ     ESS
.3    9808.5
.4    9712.4
.5    9844.4
.6    10204.4
.7    10792.5
Determine the estimated lag 1 serial correlation coefficient.
(A) 0.3  (B) 0.4  (C) 0.5  (D) 0.6  (E) 0.7


Use the following information for the next 4 questions:
One has 3 years of monthly values of the Consumer Price Index for Medical Care, t = 1, 2, 3, ..., 36: 216.6, 217.9, 218.4, 218.9, 219.3, 219.8, 220.8, 221.6, 222.1, 222.9, 223.5, 223.8, 225.2, 226.2, 226.6, 227.0, 227.4, 227.8, 228.7, 229.2, 229.4, 230.1, 230.5, 230.6, 231.8, 232.7, 233.4, 233.8, 234.2, 234.4, 234.8, 235.2, 235.4, 235.8, 236.4, 237.1.
Use a computer to help you with the calculations.

31.4 (4 points) Fit an exponential regression to this data, ln(CPI) = α + βt.

31.5 (3 points) What is the Durbin-Watson Statistic for this regression?

31.6 (8 points) Apply the Cochrane-Orcutt procedure, in order to correct for serial correlation.

31.7 (8 points) Apply the Hildreth-Lu procedure, in order to correct for serial correlation.

31.8 (4, 11/00, Q.12) (2.5 points) You are given the following linear regression results:
t   Actual   Fitted
1   77.0     77.6
2   69.9     70.6
3   73.2     70.9
4   72.7     72.7
5   66.1     67.1
Determine the estimated lag 1 serial correlation coefficient after one iteration of the Cochrane-Orcutt procedure.
(A) –0.3  (B) –0.2  (C) –0.1  (D) 0.0  (E) 0.1

31.9 (4, 5/01, Q.33) (2.5 points) An actuary uses annual premium income from the previous year as the independent variable and loss ratio in the current year as the dependent variable in a two-variable linear regression model. Using 20 years of data, the actuary estimates the model slope coefficient with the ordinary least-squares estimator ^β and does not take into account that the error terms in the model follow an AR(1) model with first-order autocorrelation coefficient ρ > 0.
Which of the following statements is false?
(A) The estimator ^β is biased.
(B) The estimator ^β is consistent.
(C) The R² probably gives an overly optimistic picture of the success of the regression.
(D) The estimator of the standard error of ^β is biased downward.
(E) Use of the Cochrane-Orcutt procedure would have produced a consistent estimator of the model slope with variance probably smaller than the variance of ^β.


Section 32, Multicollinearity260

In the multiple regression model, we assumed no linear relationship among the independent variables. Specifically, we assumed the k by k cross product matrix X’X had full rank k, so that its matrix inverse exists.

If X’X has rank less than k, then we have what is called perfect multicollinearity. In the simplest case of perfect multicollinearity, two independent variables are collinear, i.e. proportional. In the case of perfect multicollinearity, the equations to estimate the parameters will not work; in this case the inverse of X’X does not exist. Less extreme are situations where, even though the inverse of X’X exists, independent variables are highly correlated.

There is a high degree of multicollinearity when some of the independent variables or combinations of independent variables are highly correlated.261 This creates problems.

A high degree of multicollinearity generally results in large variances of the estimated parameters.262 It also generally results in high covariances between estimated parameters. A high degree of multicollinearity, usually leads to unreliable estimates of the regression parameters.

Signs of Possible Multicollinearity:263

1. If several parameters have high standard errors and dropping one or more variables from the equation lowers the remaining standard errors.
2. High R², (and significant F-Statistic,) with few significant t-statistics.
3. High simple correlations between pairs of variables may indicate multicollinearity.
4. A high “condition number.”264
5. Sign of a fitted parameter opposite of what is expected for one or more parameters.
6. When a variable is excluded from the model, one has R² ≥ .95.

Adding an additional independent variable to a regression model does not improve its performance if this variable is highly correlated with either one or more of the current variables or a linear combination of the current variables.

As will be discussed subsequently, multiple regression is often used in order to estimate the elasticity of the dependent variable with respect to each independent variable. If there is a high degree of multicollinearity, then such estimates of elasticities are unreliable or even meaningless.

260 See Section 4.4 of Pindyck and Rubinfeld.
261 This is often described as X’X being an ill-conditioned matrix; one can also say the data is ill-conditioned. In this case, the determinant of X’X will be very small.
262 As discussed previously, in the three variable model, the variance of the fitted slopes increases when the two independent variables are highly correlated.
263 See Section 4.4.3 of Pindyck and Rubinfeld. See also Section 16.4 of Applied Regression Analysis by Draper and Smith.
264 Mentioned in Pindyck and Rubinfeld. For an explanation, see for example Section 16.5 of Applied Regression Analysis by Draper and Smith.


For example, let us assume the dependent variable is the miles per gallon for a automobile.Vehicle weight and horsepower of the engine, might each be useful independent variables for a regression model. However, if these two variables are highly correlated, one should not use them both in the model. Perhaps instead, weight and the ratio of weight to horsepower could be used together in the model.

Variance Inflation Factors:*

If we regress the ith independent variable against all of the other independent variables, then the Variance Inflation Factor is: VIFi = 1/(1 - Ri²) ≥ 1.265 If one or more of the VIFs is large, that is an indication of multicollinearity.
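As a sketch (my own, in Python with numpy), the VIFs can be computed as the diagonal of the inverse of the correlation matrix of the independent variables, as noted in the footnote:

import numpy as np

def variance_inflation_factors(X):
    # X: an (observations x variables) array of independent variables, without a constant column.
    corr = np.corrcoef(X, rowvar=False)       # correlation matrix of the independent variables
    return np.diag(np.linalg.inv(corr))       # diagonal elements equal 1 / (1 - R_i^2)

# Illustration: two nearly collinear variables produce large VIFs.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.1, size=50)
x3 = rng.normal(size=50)
print(variance_inflation_factors(np.column_stack([x1, x2, x3])))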

Average Claim Costs:*

Insurance average claims costs are often modeled via consumer price indices (CPI). However, different CPIs usually have a high correlation; when one CPI increases by a lot, the other CPI is more likely to increase by a lot. Therefore, if we used two or more CPIs in a regression model, we are likely to run into the problem of multicollinearity.

For example, for homeowners insurance, average claim costs are related to both the overall rate of inflation and the cost of construction (in order to repair or replace damaged homes.) Rather than use two CPIs separately, one might instead use a 45%-55% weighting of the Consumer Price Index for all items and a Construction Cost Index.266 The selected weights would approximately reflect the mix of the costs of items for which the claims are paying.

Classifications for Ratemaking:*

In insurance ratemaking, insureds are divided into classifications.267 Insureds classified similarly will be charged the same or similar rates.268

For example, in automobile insurance one might use the location, characteristics of the driver, and characteristics of the vehicle, in order to classify an insured. Since these three types of variables are not highly correlated, we get a more accurate rate, than if correlated variables were used.

265 See Section 16.4 of Applied Regression Analysis by Draper and Smith. The VIFs are the diagonal elements of the inverse of the correlation matrix.
266 See “Homeowners Insurance Pricing” by Mark Homan in the 1990 CAS Discussion Paper Program.
267 See for example “Risk Classification” by Robert Finger, in Foundations of Casualty Actuarial Science.
268 While linear regression is usually not used to determine classification rates, the problem of multicollinearity still applies.


Problems:

32.1 (3 points) You are given the following observations of 10 young boys:
Age      Height   Weight   Chest Depth    Maximal Oxygen Uptake
(years)  (cms.)   (kgs.)   (centimeters)  (milliliters per kilogram)
X2       X3       X4       X5             Y
8.4      132.0    29.1     14.4           1.54
8.7      135.5    29.7     14.5           1.74
8.9      127.7    28.4     14.0           1.32
9.9      131.1    28.8     14.2           1.50
9.0      130.0    25.9     13.6           1.46
7.7      127.6    27.6     13.9           1.35
7.3      129.9    29.0     14.0           1.53
9.9      138.1    33.6     14.6           1.71
9.3      126.6    27.7     13.9           1.27
8.1      131.8    30.8     14.5           1.50
The correlation matrix between the independent variables is:
( 1      .327    .231    .166 )
( .327   1       .790    .791 )
( .231   .790    1       .881 )
( .166   .791    .881    1    )

The model Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + ε, was fit:
Parameter   Estimate    Standard Error   T-Statistic   P-Value
β1          -4.775      .8628            -5.534        0.3%
β2          -0.03521    .01539           -2.289        7.1%
β3          0.05164     .006215          8.308         0.04%
β4          -0.02342    .01343           -1.744        14.2%
β5          0.03449     .08524           0.4046        70.3%
R² = .967. Corrected R² = .941. s² = .001385. Durbin-Watson = 2.753.
        D.F.   Sum of Squares   Mean Square   F-Ratio   P-Value
RSS     4      0.20604          0.0515        37        0.07%
ESS     5      0.00692          0.00138
TSS     9      0.21296

            ( 0.744      -0.00122     -0.00144      0.00910      -0.0572    )
            ( -0.00122    0.000237    -0.0000274   -0.0000178     0.000231  )
Var[^β] =   ( -0.00144   -0.0000274    0.0000386   -0.0000235    -0.000191  )
            ( 0.00910    -0.0000178   -0.0000235    0.000180     -0.000784  )
            ( -0.0572     0.000231    -0.000191    -0.000784      0.00727   )

Are there signs of possible multicollinearity? If so, what are they? What additional steps, if any, might you take to test for multicollinearity?


32.2 (1 point) The Hoffa Insurance Company is studying the loss ratios of long haul trucking firms for which it is writing Workers Compensation Insurance in the United States.
One fits a multiple linear regression model with: Y = loss ratio for the most recent year, X2 = written premium, X3 = number of trucks, X4 = log of number of years insured with Hoffa Insurance, and dummy variables D1 to D49 indicating whether or not there is exposure in each state (excluding Hawaii and Alaska, but including the District of Columbia.)
List one specific reason why multicollinearity could be expected to cause a problem.

32.3 (3 points) Four different multiple regressions have each been fit to the same 20 observations. Each regression has the same dependent variable, but uses different combinations of the independent variables X2, X3, X4, and X5. In which case is there an indication of multicollinearity?

A. RSS = 773. ESS = 227. ^β2 = 44, ^β3 = 1.3, ^β4 = -2.4. s(^β2) = 16, s(^β3) = 0.7, s(^β4) = 0.5.
B. RSS = 798. ESS = 202. ^β2 = 40, ^β3 = 1.5, ^β5 = 0.35. s(^β2) = 18, s(^β3) = 0.6, s(^β5) = 0.16.
C. RSS = 769. ESS = 231. ^β2 = 37, ^β4 = -1.9, ^β5 = 0.41. s(^β2) = 25, s(^β4) = 1.4, s(^β5) = 0.23.
D. RSS = 785. ESS = 215. ^β3 = 1.7, ^β4 = -2.8, ^β5 = 0.45. s(^β3) = 0.7, s(^β4) = 1.1, s(^β5) = 0.24.
E. None of A, B, C, or D has an indication of multicollinearity.


Mahler’s Guide to Regression

Sections 33-35:
33 Forecasting
34 Testing Forecasts
35 Forecasting with Serial Correlation

VEE-Applied Statistical Methods Exam

prepared by Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-I

New England Actuarial Seminars, www.neas-seminars.com


Section 33, Forecasting

One can use regression to forecast the value of the dependent variable for a value of the independent variable, or in the case of a multiple regression a set of values of the independent variables. Assuming we have selected the right model, there are two different reasons why the forecasted value will not equal the observed value. First, the coefficients of the model have been estimated from a finite sample of data. Second, the model assumes there is a random element to the observation.

A Time Series Example:

Linear regression models are often fit to time series.

For example assume you have the following 5 years of loss ratios:269

Year (t)   Loss Ratio (Y)
1          84
2          80
3          81
4          76
5          74

Exercise: Fit the two-variable linear regression model: Y = α + βt + ε, to the above data.
[Solution: x = -2, -1, 0, 1, 2. Ȳ = 79. y = 5, 1, 2, -3, -5. ^β = Σxiyi/Σxi² = -24/10 = -2.4. ^α = Ȳ - ^βX̄ = 79 - (-2.4)(3) = 86.2.]

Exercise: What is the standard error of this regression?

[Solution: ^Yt = 83.8, 81.4, 79, 76.6, 74.2. ^εt = Yt - ^Yt = .2, -1.4, 2, -0.6, -0.2.
s² = Σ^εt² / (N - 2) = 6.4/3 = 2.133. s = 1.461.]

Exercise: What are the standard errors of the slope and intercept?

[Solution: Var[^α] = s²ΣXi² / (NΣxi²) = (2.133)(55)/((5)(10)) = 2.346. sα = √2.346 = 1.532.
Var[^β] = s² / Σxi² = 2.133/10 = .2133. sβ = √.2133 = .462.]

Exercise: What is the covariance of the estimated slope and intercept?

[Solution: Cov[^α, ^β] = -s²X̄ / Σxi² = -(2.133)(3)/10 = -.640.]

Corr[^α, ^β] = Cov[^α, ^β] / √(Var[^α]Var[^β]) = -.640/√((2.346)(.2133)) = -.905.

269 Loss Ratio = Losses/Premium. The 80 shown here corresponds to a loss ratio of 80%. These loss ratios have presumably been brought to ultimate and adjusted for any rate changes.
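The quantities in these exercises can be reproduced with a few lines of code. A minimal Python sketch (my own, using numpy and the formulas above):

import numpy as np

t = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([84, 80, 81, 76, 74], dtype=float)
n = len(t)

x = t - t.mean()                                           # deviations from the mean
beta = np.sum(x * (y - y.mean())) / np.sum(x ** 2)         # -2.4
alpha = y.mean() - beta * t.mean()                         # 86.2

resid = y - (alpha + beta * t)
s2 = np.sum(resid ** 2) / (n - 2)                          # 2.133

var_alpha = s2 * np.sum(t ** 2) / (n * np.sum(x ** 2))     # 2.346
var_beta = s2 / np.sum(x ** 2)                             # 0.2133
cov_alpha_beta = -s2 * t.mean() / np.sum(x ** 2)           # -0.640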


Predicting the Future:

Sometimes linear regression models fit to time series are used in order to forecast the future.270

In the above example, the forecasted loss ratio for year 6 is: ^α + 6^β = 86.2 + (-2.4)(6) = 71.8.

This is a point estimate as opposed to an interval estimate, which can be obtained as follows.

Using the general properties of variances, the variance of this forecast is:271

Var[^α + 6^β] = Var[^α] + 6²Var[^β] + (2)(6)Cov[^α, ^β] = 2.346 + (36)(.2133) + (12)(-.640) = 2.345.

Thus the standard error of the forecasted loss ratio for year 6 is: √2.345 = 1.531.

With 5 - 2 = 3 degrees of freedom, a 95% confidence interval from the t-table is: ±3.182 standard deviations.272 Thus an approximate 95% confidence interval for this forecast is: 71.8 ± (3.182)(1.531) = 71.8 ± 4.9.

Exercise: What would the forecasted loss ratio be for year 8? What is the standard error of this forecast?
[Solution: estimate = 86.2 + (-2.4)(8) = 67.0.
Var[^α + 8^β] = 2.346 + (8²)(.2133) + (8)(2)(-.640) = 5.757. Standard error = √5.757 = 2.399.]

Thus an approximate 95% confidence interval for the forecasted loss ratio for year 8 is: 67.0 ± (3.182)(2.399) = 67.0 ± 7.6. This confidence interval is wider. In general as one tries to forecast further out into the future, the confidence intervals are wider.

One can derive a formula for the variance of such a forecast. Assume one is forecasting at the value X of the independent variable; then the estimate is ^α + ^βX, with variance:
Var[^α] + X²Var[^β] + 2XCov[^α, ^β] = s²ΣXi²/(NΣxi²) + X²s²/Σxi² - 2Xs²X̄/Σxi²
= s²{(Σxi² + NX̄²)/(NΣxi²) + (X² - 2XX̄)/Σxi²} = s²{1/N + (X̄² - 2XX̄ + X²)/Σxi²} = s²{1/N + x²/Σxi²}.

When we have observed N data points, with independent variable Xi, with mean X̄:

Variance of forecast at x = X - X̄ is: s²{1/N + x²/Σxi²}.273

270 Generally linear regression (or exponential regression) could be used to forecast for a short period of time beyond the observed data. It would not be used to forecast the distant future.
271 Var[aX + bY] = a²Var[X] + b²Var[Y] + 2abCov[X,Y]. This is also a special case of the delta method.
272 A sum of 5% area in the two tails, with 3 d.f. For the 2-variable model, the degrees of freedom are N-2.
273 See for example, Section 3.1 of Applied Regression Analysis, by Draper and Smith.


As X gets further from the mean of the values of the independent variable contained in the observations, X̄, x = X - X̄ gets larger, and the variance of the forecast increases.

Near X̄, the forecast is not affected very much by the value of ^β, while far from X̄, the forecast is affected a great deal by the value of ^β. Therefore, when far from X̄, the uncertainty in the estimate of the slope leads to the large variance of the forecast.

Exercise: Use this alternate formula in order to estimate the variance of the forecasted loss ratio for year 8.
[Solution: x = 8 - 3 = 5. N = 5. s² = 2.133. Σxi² = 10. s²{1/N + x²/Σxi²} = 5.759.]

This matches the result obtained previously. For this example, the variance of a forecast at time t is: s²{1/N + x²/Σxi²} = (2.133){1/5 + (t - 3)²/10}. Therefore, a 95% confidence interval for this forecast is: 86.2 - 2.4t ± (3.182)(1.461)√{1/5 + (t - 3)²/10} = 86.2 - 2.4t ± (4.65)√{.2 + (t - 3)²/10}.

Here is a graph of the observed loss ratios, the regression line (solid), and 95% confidence intervals for the forecasted loss ratios (dashed):
[Graph of loss ratio versus time, for times 1 through 10, with the confidence bands widening as time increases; not reproduced.]

We have predicted the expected value of the loss ratio for each time. However, we have yet to consider the fact that for a given time the observed loss ratio will vary around its expected value.


Random Nature of the Model:

Let us assume we knew that the correct model was Y = α + βt + ε, with α = 86, β = -2 and ε is Normally Distributed with mean zero and variance 1.5.274 Note that these are not values estimated from a finite data set, but are the assumed true values.

Then the expected loss ratio for t = 6 is: 86 - (2)(6) = 74. Therefore, the loss ratio for t = 6 is Normally Distributed with mean 74 and variance 1.5. If one simulated this situation, the observed loss ratio at time 6 would vary around 74. The mean squared difference between 74 and the observed loss ratio at time 6 is 1.5.

Thus even when there is no error due to estimating the coefficients of the regression model from data, the forecast would not be equal to the observation; the mean squared forecast error would be σ2, the variance of ε.

Forecast Errors:

Combining the two causes of forecast error, the variance of the forecast and the variance of the model: Mean Squared Forecast Error at x = X - X̄ is: σ²{1 + 1/N + x²/Σxi²}.275

Taking into account both the error in the estimated coefficients and the random nature of the model, and estimating as usual σ2 by s2:

Mean Squared Forecast Error at x = X - X̄ is: sf² = s²{1 + 1/N + x²/Σxi²}.

For the time series example, at time 6, the mean squared forecast error is: (2.133){1 + 1/5 + (6 - 3)²/10} = 4.479.

Note that this is the variance of the forecast at time 6, 2.345 as calculated previously, plus s2 = 2.133, subject to rounding. In general, the mean squared forecast error is the sum of s2 and the variance of the forecast.

Exercise: For the time series example, what is the mean squared forecast error at time 7?
[Solution: x = 7 - 3 = 4. N = 5. s² = 2.133. Σxi² = 10. s²{1 + 1/N + x²/Σxi²} = 5.972.]

If one estimated the regression from data at time 1, 2, ..., T, and one is forecasting at time T+1, the Mean Squared Forecast Error is: sf² = s²{1 + 1/T + (XT+1 - X̄)²/Σ(Xi - X̄)²}.276

The mean squared forecast error is smallest when we try to predict the value of the dependent variable at the mean of the independent variable.

274 We assume that the εi are independent and they each have the same variance.
275 See Equation 8.19 in Pindyck and Rubinfeld.
276 See Equation 8.22 in Pindyck and Rubinfeld.


When predicting the future in this example, here is a graph of the mean squared forecast error as a function of time:

[Graph: Time (6 to 10) on the horizontal axis, MSE (4 to 12) on the vertical axis.]

As we attempt to forecast further into the future, the mean squared forecast error increases.

Normalized Errors:

In the previous example, the forecasted loss ratio for year 6 was: 86.2 + (-2.4)(6) = 71.8, with a mean squared forecast error of 4.479. If the loss ratio for year 6 turned out to be 75, then the forecast error was 71.8 - 75 = -3.2. One can normalize this error by dividing by sf: -3.2/√4.479 = -1.512.

Normalized Forecast Error = λ = (ŶT+1 - YT+1)/sf.²⁷⁷

The normalized forecast error follows a t-distribution with N - k degrees of freedom.In this example, N = 5 and k = 2, so the t-distribution has 3 degrees of freedom.
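
As an illustration (mine, not from the text), the normalized error of -1.512 can be compared to the t-distribution with 3 degrees of freedom; the sketch below assumes scipy is available.

```python
from math import sqrt
from scipy.stats import t as t_dist

forecast, observed, msfe = 71.8, 75.0, 4.479   # values from the example above
lam = (forecast - observed) / sqrt(msfe)       # normalized forecast error, about -1.512

# Two-sided p-value under a t-distribution with N - k = 5 - 2 = 3 degrees of freedom.
p_value = 2 * t_dist.cdf(-abs(lam), df=3)
print(lam, p_value)   # |lambda| = 1.512 is well inside the 95% critical value of 3.182
```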

Confidence Intervals:

We can use the mean squared forecast error to create confidence intervals for the future observations.

For example, at time 6 the mean squared forecast error was calculated as 4.479.

The forecasted loss ratio for time 6 is: α̂ + 6β̂ = 86.2 + (-2.4)(6) = 71.8.

With 3 degrees of freedom, a 95% confidence interval from the t-table is: ±3.182 standard deviations. Thus an approximate 95% confidence interval for the observed loss ratio is: 71.8 ± (3.182)√4.479 = 71.8 ± 6.7.
277 See Equation 8.20 in Pindyck and Rubinfeld.


Note this is wider than the 95% confidence interval for the expected loss ratio, calculated previously as: 71.8 ± 4.9. The observed loss ratios will vary around the expected loss ratio, and therefore their confidence interval is wider.

Exercise: For the time series example, determine a 95% confidence interval for the observed loss ratio at time 9.
[Solution: The forecast is: 86.2 - (9)(2.4) = 64.6. x = 9 - 3 = 6. N = 5. s² = 2.133. Σxi² = 10.
Mean squared forecast error = s²(1 + 1/N + x²/Σxi²) = 2.133(1 + 0.2 + 3.6) = 10.238.
64.6 ± (3.182)√10.238 = 64.6 ± 10.2.]

Here is a graph of the observed loss ratios, the regression line (solid), 95% confidence intervals for the expected future loss ratios (shorter dashes), and 95% confidence intervals for the observed future loss ratios (longer dashes):

[Graph: time (2 to 10) on the horizontal axis, L.R. (50 to 80) on the vertical axis.]

These same ideas apply to regressions that do not involve time, and to multiple regression.


Three Variable Example:*278

We have previously discussed an example with two independent variables and an intercept.

Agent   X2     X3    Y
1       100%    0    75%
2        90%   10%   78%
3        70%    0%   71%
4        65%   10%   73%
5        50%   50%   79%
6        50%   35%   75%
7        40%   10%   65%
8        30%   70%   82%
9        15%   20%   72%
10       10%   10%   66%

X2 = the % of business written from New York. X3 = the % of business written from New Jersey. Y = the loss ratio for each agent.

The fitted regression was: ^Y = 62.3 + .126X2 + .222X3.

One can use this model to predict the loss ratio for an agent not included in the sample.This is an example where we are not necessarily predicting the future.This would be called a cross-section model.279

Exercise: What is the predicted loss ratio for an agent with 25% of his business written in New York and 60% in New Jersey?[Solution: 62.3 + (.126)(25) + (.222)(60) = 78.77.]

As discussed previously, for this fitted model, Y = β̂1 + β̂2X2 + β̂3X3 + ε:
Var[β̂1] = 4.434. Var[β̂2] = .000836. Var[β̂3] = .001395.
Cov[β̂1, β̂2] = -.05277. Cov[β̂1, β̂3] = -.05244. Cov[β̂2, β̂3] = .000432.

Exercise: What is the variance of the forecast in the previous exercise?
[Solution: Var[β̂1 + 25β̂2 + 60β̂3] = Var[β̂1] + 25²Var[β̂2] + 60²Var[β̂3] + (2)(25)Cov[β̂1, β̂2] +
(2)(60)Cov[β̂1, β̂3] + (2)(25)(60)Cov[β̂2, β̂3] = 4.434 + (625)(.000836) + (3600)(.001395) +
(50)(-.05277) + (120)(-.05244) + (3000)(.000432) = 2.343.]

The calculation of the variance of this forecast can also be done in matrix form.

278 See Appendix 8.1 of Pindyck and Rubinfeld, not on the syllabus.279 Estimating classification relativities would be a good example of a cross-section model.


As discussed previously, for this example:
Var[β̂] = s²(X'X)⁻¹ =
(  4.434     -.05277   -.05244  )
( -.05277     .000836   .000432 )
( -.05244     .000432   .001395 )

Performing the matrix multiplication (1, 25, 60) Var[β̂] (1, 25, 60)ᵀ = 2.343 involves the same arithmetic as performed in this exercise. In general, the variance of the forecast at the vector Xf is: Xf s²(X'X)⁻¹Xf'.

The t-distribution with 10 - 3 = 7 degrees of freedom has a critical value of 1.895 for a total of 10% area in both tails. Thus an approximate 90% confidence interval for this forecast is: 78.77 ± 1.895√2.343 = 78.8 ± 2.9.

However, as in the two variable case, there is inherent randomness in the model. Each observed loss ratio includes ε, which is Normally Distributed with variance σ2. As discussed previously, for this example, σ2 was estimated as: s2 = ESS/(N - k) = 39.389/(10 - 3) = 5.627.

Therefore, the mean squared error for this forecast is: 5.627 + 2.343 = 7.970.
In general, the mean squared forecast error is: s²(1 + Xf(X'X)⁻¹Xf').²⁸⁰
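
A small numpy sketch (my own check, not from the text) reproduces the 2.343 and 7.970 above from the covariance matrix Var[β̂] = s²(X'X)⁻¹ given earlier:

```python
import numpy as np

# Var[beta-hat] = s^2 (X'X)^-1 for the three variable example.
V = np.array([[ 4.434,   -0.05277,  -0.05244 ],
              [-0.05277,  0.000836,  0.000432],
              [-0.05244,  0.000432,  0.001395]])
s2 = 5.627                        # estimated variance of the regression
Xf = np.array([1.0, 25.0, 60.0])  # forecast point: 25% New York, 60% New Jersey

var_forecast = Xf @ V @ Xf        # variance of the forecast, about 2.343
print(var_forecast, s2 + var_forecast)   # mean squared forecast error, about 7.970
```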

In analogy to the two variable case, the mean squared forecast error is smallest when we try to predict the value of the dependent variable at the means of all of the independent variables.281

Exercise: For this example, what is the mean squared error of the forecast at the means of all of the independent variables?
[Solution: The mean of X2 is 52.0. The mean of X3 is 21.5.
(1, 52, 21.5) Var[β̂] (1, 52, 21.5)ᵀ = .562.
Adding s², the mean squared error of the forecast is: .562 + 5.627 = 6.189.
Comment: 6.189 < 7.970.]

280 This formula reduces to the previously discussed formula in the case of the two variable model, Mean Squared Forecast Error = s²[1 + 1/T + (XT+1 - X̄)²/Σ(Xi - X̄)²].
281 See equation A8.17 in Pindyck and Rubinfeld, not on the syllabus.


Problems:

Use the following information for the next 4 questions:
You are given the following frequencies for Workers Compensation Medical Only claims (per 10000 worker-weeks) for each of five years:
Year:       1       2       3       4       5
Frequency:  86,756  85,601  83,474  75,307  72,083

33 .1 (2 points) Fit a linear regression, with intercept, to this data.What is the forecasted frequency for year 8?A. 61,000 B. 62,000 C. 63,000 D. 64,000 E. 65,000

33 .2 (2 points) What is the estimated variance of this regression?A. 5 million B. 6 million C. 7 million D. 8 million E. 9 million

33.3 (2 points) Determine a 95% confidence interval for the expected frequency in year 8.
What is the upper end of this interval?
A. 70,000 B. 72,000 C. 74,000 D. 76,000 E. 78,000

33.4 (2 points) Determine a 95% confidence interval for the observed frequency in year 8.
What is the lower end of this interval?
A. 43,000 B. 45,000 C. 47,000 D. 49,000 E. 51,000

33.5 (4 points) You are given the following information on 12 apple trees:
Size of Crop (hundreds of apples), X    Percentage of Wormy Apples, Y
 8    59
 6    58
11    56
22    53
14    50
17    45
18    43
24    42
19    39
23    38
26    30
40    27
A linear regression with intercept has been fit to the above data: Ŷ = 64.247 - 1.013X.
Based on this regression, determine a 95% confidence interval for the percentage of wormy fruit one would observe from a tree that produced 3500 apples (X = 35). What is the lower end of this confidence interval?
A. 11 B. 12 C. 13 D. 14 E. 15


33.6 (3 points) You are given the following severities for each of five years:
Year:      1     2     3     4     5
Severity:  6117  6873  8148  8246  9112
A regression has been fit to the natural logs of these severities:
ln[severity] = 8.64516 + 0.0979168t. s² = 0.00191151.
Determine the upper end of a 90% confidence interval for the expected severity in year 8.
A. 14,500 B. 14,700 C. 14,900 D. 15,100 E. 15,300

Use the following information for the next three questions:

For Actuarial Exam Q, based on the data for many students you have fit the following multiple regression model: Y = 18.40 + 2.04X2 + 0.1120X3 + .09371X4,
where X2 = number of times Exam Q has been taken previously,
X3 = number of hours studied at work,
X4 = number of hours studied at home, and
Y = student's score on Exam Q out of 100.
The covariance matrix of the fitted coefficients is:
(  .304     -.0167     .00104     .00136   )
( -.0167     .00452    .000239    .000220  )
(  .00104    .000239   .0000936   .0000628 )
(  .00136    .000220   .0000628   .0001017 )
The estimated variance of the regression is 20.38.

33.7 (3 points) Determine a 95% confidence interval for the expected score of a student who has taken Exam Q twice before, has studied 100 hours at work and 200 hours at home.What is the upper end of that interval?A. 54 B. 55 C. 56 D. 57 E. 58

33.8 (3 points) Determine a 90% confidence interval for the observed score of a student who has not taken Exam Q before, has studied 150 hours at work and 50 hours at home.What is the upper end of that interval?A. 48 B. 49 C. 50 D. 51 E. 52

33.9 (4 points) Assume that a score of 54.2 or more is required to pass Exam Q. What is the probability of passing for a student who has taken Exam Q once before, has studied 50 hours at work and 250 hours at home?A. 13% B. 15% C. 17% D. 19% E. 21%

33.10 (4 points) You are given the following mortality data for males at age 70, at four points in time:
Year:         1970  1980  1990  2000
100,000 q70:  3580  3302  3007  2694
Fit a linear regression with intercept to this data. Determine the lower end of a 99% confidence interval for the expected value of 100,000q70 in 2005.
A. 2370 B. 2390 C. 2410 D. 2430 E. 2450


Use the following information for the next two questions:

For private passenger automobile bodily injury liability insurance, the size of loss is assumed to be LogNormal. Let X = weight (in 1000s of pounds) of the automobile driven by the insured, and Y = size of loss. You fit the regression model lnY = α + βX + ε to 4000 observations.
ΣXi = 13,283. ΣXi² = 54,940.

The results are: α̂ = 8.743. β̂ = .136. ESS = 3285. RSS = 11,781.

33.11 (3 points) What is the lower end of a 90% confidence interval for the expected size of loss for an automobile that weighs 5000 pounds?A. 11,200 B. 11,400 C. 11,600 D. 11,800 E. 12,000

33.12 (2 points) What is the upper end of a 90% confidence interval for the observed size of loss for an automobile that weighs 5000 pounds?A. 35,000 B. 40,000 C. 45,000 D. 50,000 E. 55,000

Use the following information for the next two questions:
The finishing times for the Boston Marathon from 1980 to 2005 were: 2:12:11, 2:09:26, 2:08:52, 2:09:00, 2:10:34, 2:14:05, 2:07:51, 2:11:50, 2:08:43, 2:09:06, 2:08:19, 2:11:06, 2:08:14, 2:09:33, 2:07:15, 2:09:22, 2:09:15, 2:10:34, 2:07:34, 2:09:52, 2:09:47, 2:09:43, 2:09:02, 2:10:11, 2:10:37, 2:11:45.
This data was converted to number of seconds beyond two hours.
For example, 2:11:45, which is 2 hours, 11 minutes, and 45 seconds, corresponds to 705.
Then a linear regression was fit to this data, with the following results, where t = 0 for 1980:
Finish Time = 596.496 - 0.86735t.
Covariance Matrix of the estimated parameters:
( 1301.31   -76.548 )
(  -76.548    6.1238 )

33.13 (3 points) Determine the lower end of a 99% confidence interval for the observed finish time in 2006.A. 2:4:50 B. 2:5:00 C. 2:5:10 D. 2:5:20 E. 2:5:30

33.14 (1 point) The finishing time in 2006 turned out to be 2:07:14.Determine the normalized forecast error.A. Less than 0 B. At least 0, but less than 0.5 C. At least 0.5, but less than 1.0 D. At least 1.0, but less than 1.5E. 1.5 or more


33.15 (4 points) You are given the following pure premiums for each of four years:
Year:          1    2    3    4
Pure Premium:  251  247  260  268
Fit a linear regression with intercept to this data. Determine the upper end of a 95% confidence interval for the pure premium observed in year 5.
A. 270 B. 280 C. 290 D. 300 E. 310

33.16 (5 points) For liability insurance you have the following information on the number of claims reported for several Accident Years. (Accident Year 1997 consists of all claims on accidents that occurred during 1997, regardless of when they are reported to the insurer.)
Accident   Number of Claims Reported      Number of Claims Reported
Year       by December 31 of that year    after December 31 of that year
1997       8037                           3312
1998       7948                           3090
1999       7792                           2983
2000       8125                           3211
2001       7936                           3224
You fit the regression model Y = βX + ε, where X = Number of Claims Reported by December 31 of that year, and Y = Number of Claims Reported after December 31 of that year.
8341 claims have been observed for Accident Year 2004, as of December 31, 2004.
You determine an interval such that the probability that the total number of claims that will be reported for Accident Year 2004 is outside this interval is 10%. What is the lower end of that confidence interval?
A. 11,400 B. 11,450 C. 11,500 D. 11,550 E. 11,600

33.17 (2 points) A linear regression Y = α + βX, where X = age of male, and Y = height (in inches), has been fit to a data set consisting of 1000 boys equally split between ages 10, 11, 12, 13, and 14.

α̂ = 30.6. β̂ = 2.44. s² = 9.2.

Briefly discuss the use of this model in order to predict the height of a male of a different age.


Use the following data on the amount property-casualty insurers paid for losses due to catastrophes in the United States for the next two questions:
Year   Losses ($billion)
1991    4.7
1992   23.0
1993    5.6
1994   17.0
1995    8.3
1996    7.4
1997    2.6
1998   10.1
1999    8.3
2000    4.6
2001   26.5
2002    5.9
2003   12.9
2004   27.3
A linear regression was fit to this data, with the following results:
Loss = -1057.7 + 0.535385 Year. Standard Error of the Regression = 8.42436.
Standard Error of the Intercept = 1115.67. Standard Error of the Slope = 0.55853.

33.18 (2 points) Determine a 90% confidence interval for the expected catastrophe losses in 2005.

33.19 (2 points) Determine a 90% confidence interval for the observed catastrophe losses in 2005.


33.20 (IOA 101, 9/00, Q.16) (12.75 points) At the end of the skiing season the tourist board in a mountain region examines the records of ten ski resorts. For each one it obtains the total number (y, thousands) of visitor-days during the season as a measure of the resort's popularity, and the ski-lift capacity (x, thousands), being the maximum number of skiers that can be transported per hour. The resulting data are given in the following table:
Resort:           A     B     C    D     E    F     G    H     I    J
Lift capacity x:  1.9   3.3   1.2  4.2   1.5  2.2   1.0  5.6   1.9  3.8
Visitor-days y:   15.1  22.6  9.2  37.5  8.9  21.1  5.8  41.0  9.2  32.4
Σx = 26.6, Σx² = 91.08, Σy = 202.8, Σy² = 5603.12, Σxy = 707.58.
(i) (1.5 points) Draw a scatterplot of y against x and comment briefly on any relationship between a resort's popularity and its ski-lift capacity.
(ii) (2.25 points) Calculate the correlation coefficient between x and y and comment briefly in the light of your comment in part (i).
(iii) (1.5 points) Calculate the fitted linear regression equation of y on x.
(iv) (4.5 points) (a) Calculate the "total sum of squares" together with its partition into the "regression sum of squares" (RSS) and the "error sum of squares" (ESS).
(b) Use the values in part (iv)(a) above to calculate the coefficient of determination R² and comment briefly on its relationship with the correlation coefficient calculated in part (ii).
(c) Use the values in part (iv)(a) above to calculate an estimate of the error variance σ² in the usual linear regression model.
(v) (3 points) Suppose that a resort can increase its ski-lift capacity by 500 skiers per hour. Estimate the increase in the number of visitor-days it can expect in a season, and specify a standard error for this estimate.


33.21 (IOA 101, 4/01, Q.16) (12 points) The table below gives data on the lean body mass (the weight without fat) and resting metabolic rate for twelve women who were the subjects in a study of obesity. The researchers suspected that metabolic rate is related to lean body mass.

Lean body mass (kg), x    Resting metabolic rate, y
36.1                       995
54.6                      1425
48.5                      1396
42.0                      1418
50.6                      1502
42.0                      1256
40.3                      1189
33.1                       913
42.4                      1124
34.5                      1052
51.1                      1347
41.2                      1204

Σx = 516.4    Σx² = 22,741.34
Σy = 14,821   Σy² = 18,695,125
Σxy = 650,264.8

(i) (1.5 points) Draw a scatter plot of the resting metabolic rate against lean body mass and comment briefly on any relationship.

(ii) (2.25 points) Calculate the least squares fit regression line in which resting metabolic rate is modeled as the response and the lean body mass as the explanatory variable.

(iii) (3.75 points) Determine a 95% confidence interval for the slope coefficient of the model. State any assumptions made.

(iv) (3 points) Use the fitted model to construct 95% confidence intervals for the mean resting metabolic rate when:
(a) the lean body mass is 50 kg
(b) the lean body mass is 75 kg

(v) (1.5 points) Comment on the appropriateness of each of the confidence intervals given in (iv).

33.22 (2 points) In the previous question, IOA 101, 4/01, Q.16, using a computer produce a graph showing the fitted line and 95% confidence intervals for the mean resting metabolic rate when the lean body mass is between 30 and 55 kilograms.

33.23 (2 points) For IOA 101, 4/01, Q.16, produce another graph showing the fitted line and 95% confidence intervals for the observed resting metabolic rate when the lean body mass is between 30 and 55 kilograms.


33.24 (4, 5/01, Q.25) (2.5 points) You have modeled eight loss ratios as Yt = α + βt + εt,

t = 1, 2, ..., 8, where Yt is the loss ratio for year t and εt is an error term.You have determined:

Estimated coefficients: (α̂, β̂) = (0.50, 0.02).
Covariance matrix of the estimated coefficients:
(  0.00055   -0.00010 )
( -0.00010    0.00002 )

Estimate the standard deviation of the forecast for year 10, Ŷ10 = α̂ + 10β̂.
(A) Less than 0.01
(B) At least 0.01, but less than 0.02
(C) At least 0.02, but less than 0.03
(D) At least 0.03, but less than 0.04
(E) At least 0.04

33.25 (2 points) In the previous question, determine a 95% confidence interval for the forecast.


33.26 (IOA 101, 9/01, Q.15) (12.75 points) In a study into employee share ownership plans, data were obtained from ten large insurance companies on the following two variables:
employee satisfaction with the plan (x);
employee commitment to the company (y).
For each company a random sample (of the same size) of employees completed questionnaires in which satisfaction and commitment were recorded on a 1-10 scale, with 1 representing low satisfaction/commitment and 10 representing high satisfaction/commitment. The resulting means provide each company's employees' satisfaction and commitment score. These scores are given in the following table:
Co.  A     B     C     D     E     F     G     H     I     J
x    5.05  4.12  5.38  4.17  3.81  4.47  5.41  4.88  4.64  5.19
y    5.36  4.59  5.42  4.35  4.03  5.34  5.64  4.89  4.52  5.88
Σx = 47.12, Σx² = 224.8554, Σy = 50.02, Σy² = 253.5796, Σxy = 238.3676.
(i) (1.5 points) Draw a scatterplot of y against x and comment briefly on any relationship between employee satisfaction and commitment.
(ii) (2.25 points) Calculate the fitted linear regression equation of y on x.
(iii) (1.5 points) Calculate the coefficient of determination R² and relate its value to your comment in part (i).
(iv) (2.25 points) Assuming the full normal model, calculate an estimate of the error variance σ² and obtain a 95% confidence interval for σ².
(v) (2.25 points) Calculate a 95% confidence interval for the true underlying slope coefficient.
(vi) (3 points) For companies with an employees' satisfaction score of 5.0, calculate an estimate of the expected employees' commitment score together with 95% confidence limits.


Section 34, Testing Forecasts282

One important method of testing models is to see how they would have performed if used in the past.283 One can compare the predictions the model would have made at a given point in time with the eventual outcomes. A better match between prediction and outcome would indicate a model that is more likely to work well in the future.

Boston Marathon Example:

We are given the following finishing times for the Boston Marathon from 1980 to 1992: 2:12:11, 2:09:26, 2:08:52, 2:09:00, 2:10:34, 2:14:05, 2:07:51, 2:11:50, 2:08:43, 2:09:06, 2:08:19, 2:11:06, 2:08:14.

This data was converted to number of seconds beyond two hours.For example, 2:12:11, which is 2 hours, 12 minutes, and 11 seconds, corresponds to 731.X = (731, 566, 532, 540, 634, 845, 471, 710, 523, 546, 499, 666, 494).

Here is a graph of his data:

[Graph: Boston Marathon finishing times in seconds beyond two hours (500 to 850) plotted against year (1980 to 1992).]

A linear regression was fit to this data, with the following results, where t = 0 for 1980:
Finish Time = 650.269 - 7.65385t.
sα̂ = 66.445, with t-statistic = 9.787 and p-value 9 x 10⁻⁷.
sβ̂ = 8.37128, with t-statistic = -0.914 and p-value .380.
R² = 0.0706273. R̄² = -0.0138611. s² = 12754.3.
RSS = 10662. ESS = 140297. F = (10662/(2-1))/(140297/(13-2)) = .833, with p-value .380.
Durbin-Watson Statistic = 2.64.
282 See Section 8.1.2 in Pindyck and Rubinfeld.
283 This very important and useful idea applies to most models used by actuaries, not just regression models.


These statistics indicate among other things that the slope is not significantly different than zero. However, none of this directly examines how well this regression model would have predicted the future.

Root Mean Squared Error:

Here are the finishing times for the Boston Marathon for the next 13 years, 1993 to 2005,not used in fitting the regression: 2:09:33, 2:07:15, 2:09:22, 2:09:15, 2:10:34, 2:07:34,2:09:52, 2:09:47, 2:09:43, 2:09:02, 2:10:11, 2:10:37, 2:11:45.

This data was converted to number of seconds beyond two hours:(573, 435, 562, 555, 634, 454, 592, 587, 583, 542, 611, 637, 705).For example, 2:11:45, which is 2 hours, 11 minutes, and 45 seconds, corresponds to 705.

Here is a graph of this data and the regression model which was fit to the earlier data:

[Graph: observed finishing times in seconds beyond two hours (450 to 700) for 1994 to 2004, together with the regression line fit to the earlier data.]

The forecasts are: 650.269 - 7.65385t = (543.115, 535.461, 527.807, 520.154, 512.500, 504.846, 497.192, 489.538, 481.884, 474.230, 466.577, 458.923, 451.269).

The mean squared error is the mean of the squared differences between the observations and the forecasted values:
[(573 - 543.115)² + ... + (705 - 451.269)²]/13 = 13920.8.

The root mean squared error is: √13920.8 = 118.0 seconds.

Root Mean Squared Forecast Error = √[Σ(forecasti - observationi)²/(# of forecasts)].
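
This calculation is easy to script. Here is a brief Python sketch (my own, not from the text), using the 1993-2005 observations and the forecasts from the line 650.269 - 7.65385t fit to 1980-1992:

```python
import math

observed = [573, 435, 562, 555, 634, 454, 592, 587, 583, 542, 611, 637, 705]
forecast = [650.269 - 7.65385 * t for t in range(13, 26)]   # t = 13, ..., 25 (1993 to 2005)

mse = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
print(mse, math.sqrt(mse))   # about 13,920.8 and 118.0 seconds
```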


All else being equal, the smaller the mean squared error or the root mean squared error, the better the performance of the model.

Evaluating the Qualities of an Estimator:

In this particular use of this estimator, the root mean squared forecast error was 118 seconds, which seems rather large.284 This is just one use of this estimator. Any estimator can do a poor job or good job on occasion. Rather, the errors that would result from the repeated use of a procedure is what is referred to when we discuss the qualities of an estimator.

In this example, one could check this estimator by performing similar regressions over different periods of time. For example, one could see how well a regression applied to the data from 1960 to 1972 would have predicted the results from 1973 to 1985. One could also apply the estimator to other marathons. In an actuarial problem, one would apply the estimator to similar situations, covering different periods of time, different states, etc.

Ex Post Forecasts:285

Using a regression fit to 1980 to 1992 in order to forecast 1993 to 2005 was an example of what is called an ex post forecast. The values that were forecast were already known at the time the forecast was made. Ex post forecasts are used to evaluate a forecasting model.

ex post forecast ⇔ values to be forecast known at time of forecast.

ex ante forecast ⇔ values to be forecast not known at the time of the forecast.

In contrast, an ex ante forecast is used to predict the future. An example of an ex ante forecast would be if in 2005 we applied a regression to the finishing times from 1993 to 2005 in order to predict the finishing time in 2006.

Conditional Forecasts:* 286

When predicting a future finishing time for the marathon, one knows the values of the year, the independent variable to be used in the model. This is called an unconditional forecast.

When predicting the future in an ex ante forecast, sometimes one does not know all of the values of all of the independent variables to be used in the model. For example, we might be using the unemployment rate to predict the claim frequency for workers compensation insurance. If the unemployment is lagged one year, then one would be using the unemployment rate during 2004 to help predict the claim frequency during 2005.

284 A large portion of the error resulted from using the nonzero fitted slope, which was not significantly different from zero.
285 See page 203 of Pindyck and Rubinfeld.
286 See page 203 of Pindyck and Rubinfeld. Section 8.3 in Pindyck and Rubinfeld, not on the syllabus, discusses conditional forecasting.


In January 2005, we would know the unemployment rate during 2004, and therefore this would be an unconditional forecast of the claim frequency for 2005. However, in January 2005, we would not know the unemployment rate during 2005. We could use a prediction of the unemployment rate during 2005, in order to forecast the claim frequency during 2006.This would be an example of a conditional forecast.

In a conditional forecast, not all of the values of the independent variable(s) are known at the time of the forecast. 287

Theil’s Inequality Coefficient:288

In the marathon example, the second moment of the observed finishing times from 1993 to 2005 was: (573² + 435² + 562² + 555² + 634² + 454² + 592² + 587² + 583² + 542² + 611² + 637² + 705²)/13 = 4,354,296/13 = 334,946.

The second moment of the predicted finishing times from 1993 to 2005 was: (543.115² + 535.461² + 527.807² + 520.154² + 512.500² + 504.846² + 497.192² + 489.538² + 481.884² + 474.230² + 466.577² + 458.923² + 451.269²)/13 = 3,224,260/13 = 248,020.

The root mean squared error was 118.0 seconds.

For this example, Theil’s Inequality Coefficient is:118.0/(√248,020 + √334,946) = .110.

In general, U = Theil's Inequality Coefficient = (RMS Forecast Error) / [√(2nd moment of forecasts) + √(2nd moment of observations)].

In general, 0 ≤ Theil's Inequality Coefficient ≤ 1.
Theil's Inequality Coefficient is a relative measure of the root mean squared forecast error.
The smaller Theil's Inequality Coefficient, the better the forecasts.

Dividing the Mean Squared Error into Three Pieces:289

In the marathon example, for 1993 to 2005, the mean squared forecast error was 13920.8.

The mean forecast for 1993 to 2005 was: (543.115 + 535.461 + 527.807 + 520.154 + 512.500 + 504.846 + 497.192 + 489.538 + 481.884 + 474.230 + 466.577 + 458.923 + 451.269)/13 = 6463.5/13 = 497.192.

The mean observation for 1993 to 2005 was: (573 + 435 + 562 + 555 + 634 + 454 + 592 + 587 + 583 + 542 + 611 + 637 + 705)/13 = 7470/13 = 574.615.

287 This would be an additional source of forecast error, beyond that present when making an unconditional forecast.
288 See equation 8.25 in Pindyck and Rubinfeld.
289 See equation 8.26 in Pindyck and Rubinfeld.


As calculated previously, the second moment of the forecasts for 1993 to 2005 was 248,020. Therefore, the variance of the forecasts for 1993 to 2005 is: 248,020 - 497.192² = 820.115. The standard deviation of the forecasts for 1993 to 2005 is: √820.115 = 28.638.

As calculated previously, the second moment of the observations for 1993 to 2005 was 334,946. Therefore, the variance of the observations for 1993 to 2005 is: 334,946 - 574.615² = 4763.602. The standard deviation of the observations for 1993 to 2005 is: √4763.602 = 69.019.

The numerator of the correlation of the forecasts and observations for 1993 to 2005 is: [(543.115 - 497.192)(573 - 574.615) + ... + (451.269 - 497.192)(705 - 574.615)]/13 = -15231.2/13 = -1171.63.

Correlation of the forecasts and observations for 1993 to 2005 is: -1171.63/[(28.638)(69.019)] = -.5928.

[(mean forecast) - (mean observation)]² + [(stddev of forecasts) - (stddev of observations)]² + 2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.) =
(497.192 - 574.615)² + (28.638 - 69.019)² + (2)(1.5928)(28.638)(69.019) = 5994.3 + 1630.6 + 6296.5 = 13,921.
This matches the mean squared error computed previously, subject to rounding.

In general, Mean Squared Forecast Error = [(mean forecast) - (mean observation)]² + [(stddev of forecasts) - (stddev of observations)]² + 2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.).

Demonstration of the Division of the Mean Squared Error into Three Pieces:*290

Let Oi be the m observed values being compared to the m forecasted values Fi.

[(mean forecast) - (mean observation)]² + [(stddev of forecasts) - (stddev of observations)]² + 2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.) =
(F̄ - Ō)² + (σF - σO)² + 2σFσO - 2Σ(Fi - F̄)(Oi - Ō)/m =
F̄² + Ō² - 2F̄Ō + σF² + σO² - 2ΣFiOi/m + 2F̄Ō + 2F̄Ō - 2F̄Ō =
F̄² + σF² + Ō² + σO² - 2ΣFiOi/m = ΣFi²/m + ΣOi²/m - 2ΣFiOi/m =
Σ(Fi - Oi)²/m = Mean Squared Forecast Error.

290 This is an application of a result discussed in a previous section, MSE = Variance + Bias2.


The Proportions of Inequality:291

Using the three pieces of the mean squared error (MSE), one can divide the contributions to Theil’s Inequality Coefficient, U, into three pieces.

Bias proportion of U = UM = Bias²/MSE = [(mean forecast) - (mean observation)]²/MSE.

Variance proportion of U = US = [(stddev of forecasts) - (stddev of observations)]²/MSE.

Covariance proportion of U = UC = 2(1 - correlation of fore. & obser.)(stddev of fore.)(stddev of obser.)/MSE.

In the marathon example:
Bias proportion of U = UM = (497.192 - 574.615)²/13921 = 5994.3/13,921 = .431.
Variance proportion of U = US = (28.638 - 69.019)²/13921 = 1630.6/13,921 = .117.
Covariance proportion of U = UC = (2)(1.5928)(28.638)(69.019)/13921 = 6296.5/13,921 = .452.

Note that .431 + .117 + .452 = 1.000. Since as discussed previously, these numerators add up to the Mean Squared Error, these three proportions of inequality sum to one.

UM + US + UC = 1.
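
Here is a short Python sketch (my own, not from the text) that reproduces U and its three proportions for the marathon example; as in the text, the variances divide by the number of forecasts rather than by the number minus one.

```python
import math

observed = [573, 435, 562, 555, 634, 454, 592, 587, 583, 542, 611, 637, 705]
forecast = [650.269 - 7.65385 * t for t in range(13, 26)]
m = len(observed)

mse = sum((f - o) ** 2 for f, o in zip(forecast, observed)) / m
f_bar, o_bar = sum(forecast) / m, sum(observed) / m
sd_f = math.sqrt(sum(f * f for f in forecast) / m - f_bar ** 2)
sd_o = math.sqrt(sum(o * o for o in observed) / m - o_bar ** 2)
r = sum((f - f_bar) * (o - o_bar) for f, o in zip(forecast, observed)) / (m * sd_f * sd_o)

U = math.sqrt(mse) / (math.sqrt(sum(f * f for f in forecast) / m)
                      + math.sqrt(sum(o * o for o in observed) / m))
UM = (f_bar - o_bar) ** 2 / mse          # bias proportion, about .431
US = (sd_f - sd_o) ** 2 / mse            # variance proportion, about .117
UC = 2 * (1 - r) * sd_f * sd_o / mse     # covariance proportion, about .452
print(U, UM, US, UC, UM + US + UC)       # U is about .110; the proportions sum to 1
```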

The bias proportion measures the error due to forecasting on average either too high or too low. In this case, 43.1% of the mean squared error is due to bias; this is large.A value of the bias proportion of U, UM, greater than 10% or 20%, indicates a systematic bias, and the forecasting model should (probably) be revised.

The variance proportion measures the error due to the forecasted series having a variance that differs from the observed series. In this case, 11.7% of the mean squared error is due to this difference in variances; this is not large. A large variance proportion would have indicated that the forecasting model should (probably) be revised.

The covariance proportion measures the remaining error not due to the other two causes, and is of less concern than the other two proportions.

The ideal distribution of inequality over the three proportions is:UM + US = 0, and UC = 1.292 This is only an ideal, which is never achieved by any realistic estimator. However, we would prefer UM and US to be as small as possible.

291 See equations 8.27 to 8.29 in Pindyck and Rubinfeld.292 Bottom of page 211 in Pindyck and Rubinfeld.


Problems:

Use the following information for the next 6 questions:You are given the following series of 15 years of average rates for Workers Compensation Insurance in Massachusetts, t = 1, 2, ..., 15: 1.630, 1.553, 1.453, 1.340, 1.322, 1.360, 1.341, 1.359, 1.323, 1.302, 1.366, 1.342, 1.334, 1.307, 1.362.For each year, the currently charged rates by class were averaged using that year’s distribution of exposures by class.

34 .1 (3 points) Fit a linear regression versus time to the first ten years of average rates.What is the forecasted average rate for time 15?A. 1.0 B. 1.1 C. 1.2 D. 1.3 E. 1.4

34 .2 (2 points) What is the root mean squared error in using this regression to forecast the average rates for times 11 through 15?A. 0.06 B. 0.09 C. 0.12 D. 0.15 E. 0.18

34 .3 (2 points) What is Theil’s Inequality Coefficient, U, in using this regression to forecast the average rates for times 11 through 15?A. 0.03 B. 0.05 C. 0.07 D. 0.09 E. 0.11

34.4 (2 points) What is UM, the bias proportion of U, in using this regression to forecast the average rates for times 11 through 15?A. 75% B. 80% C. 85% D. 90% E. 95%

34.5 (2 points) What is US, the variance proportion of U, in using this regression to forecast the average rates for times 11 through 15?A. 2% B. 4% C. 8% D. 10% E. 12%

34.6 (2 points) What is UC, the covariance proportion of U, in using this regression to forecast the average rates for times 11 through 15?A. 2% B. 4% C. 8% D. 10% E. 12%

Use the following forecasted and actual values for the next two questions:

Yt^s (forecast)   Yt^a (actual)
 76                 81
106                 93
110                125
142                129

34.7 (2 points) Determine the value of the bias proportion of inequality. (A) 1.0% (B) 1.5% (C) 2.0% (D) 2.5% (E) 3.0%

34.8 (3 points) Determine the value of the variance proportion of inequality. (A) 6% (B) 7% (C) 8% (D) 9% (E) 10%


34.9 (1 point) According to Pindyck and Rubinfeld in Economic Models and Economic Forecasts, which of the following statements is false?A. Forecasting is a principal purpose for constructing regression models.B. A forecast is a quantitative estimate about the likelihood of future events.C. In an ex post forecast some of the values of the dependent variable are unknown. D. An ex ante forecast may be conditional.E. All of A, B, C, and D are true.

Use the following information for the next 5 questions:100 forecasts, F1, F2, ..., F100, are being compared to 100 corresponding observations, O1, O2, ..., O100.

Σ Fi = 872. Σ Fi2 = 11,330. Σ Oi = 981. Σ Oi2 = 15,856. Σ FiOi = 10,281.

34.10 (2 points) What is the root mean squared error of forecasting? A. 7 B. 8 C. 9 D. 10 E. 11

34.11 (1 point) What is Theil’s Inequality Coefficient, U, for these forecasts?A. 0.15 B. 0.20 C. 0.25 D. 0.30 E. 0.35

34.12 (1 point) What is UM, the bias proportion of U, for these forecasts?A. 2% B. 3% C. 4% D. 5% E. 6%

34.13 (2 points) What is US, the variance proportion of U, for these forecasts?A. 2% B. 3% C. 4% D. 5% E. 6%

34.14 (2 points) What is UC, the covariance proportion of U, for these forecasts?A. 89% B. 91% C. 93% D. 95% E. 97%

34.15 (1 point) Let X = a consumer price index of the value of homes in a certain metropolitan area.Let Y = the average homeowners premiums on polices written by the Regressive Insurance Company on homes in that metropolitan area.Both X and Y are monthly series.The values of each series are available within 20 days after the close of the month.For example, the values of each series for June are available by July 20.A regression model is fit, with Yi = α + βXi-2 + ε. For forecasts made on July 23, 2006, which of the following statements are false?A. A prediction of the average premium for June 2006 is an ex post forecast.B. A prediction of the average premium for July 2006 is an ex ante forecast.C. A prediction of the average premium for August 2006 is a unconditional forecast. D. A prediction of the average premium for September 2006 is a conditional forecast. E. All of A, B, C, and D are true.


Use the following information for the next 3 questions:You are given the following series of 18 months of a Consumer Price Index, t = 1, 2, ...,18: 292.6, 293.7, 294.2, 294.6, 295.5, 296.3, 297.6, 298.4, 299.2, 299.9, 300.8, 302.1, 303.6, 306.0, 307.5, 308.3, 309.0, 310.0.

34.16 (3 points) Fit a linear regression versus time to the first 12 values.What is the forecasted value for time 18?A. 307 B. 308 C. 309 D. 310 E. 311

34.17 (2 points) What is the root mean squared error in using this regression to forecast the values for times 13 through 18?A. 1 B. 2 C. 3 D. 4 E. 5

34.18 (2 points) What is Theil’s Inequality Coefficient, U, in using this regression to forecast the values for times 13 through 18?A. 0.005 B. 0.010 C. 0.015 D. 0.020 E. 0.025

34.19 (VEE-Applied Statistics Exam, 8/05, Q.6) (2.5 points)
You are given the following forecasted and actual values:
Yt^s (forecast)   Yt^a (actual)
174                186
193                206
212                227
231                242

Determine the value of the bias proportion of inequality. (A) 0.031 (B) 0.077 (C) 0.800 (D) 0.890 (E) 0.987

34.20 (3 points) In the previous question, determine the value of the covariance proportion of inequality. (A) 0.009 (B) 0.011 (C) 0.013 (D) 0.015 (E) 0.017


Section 35, Forecasting with Serial Correlation293

The following time series exhibits positive serial correlation, which affects how to use regression to best predict the future.

Massachusetts State Average Weekly Wage (SAWW), An Example of a Time Series:294

Period Covered       Average Weekly Wage     Period Covered       Average Weekly Wage
4/1/69 to 3/31/70    131.02                  4/1/87 to 3/31/88    444.20
4/1/70 to 3/31/71    139.38                  4/1/88 to 3/31/89    474.47
4/1/71 to 3/31/72    149.64                  4/1/89 to 3/31/90    490.57
4/1/72 to 3/31/73    155.57                  4/1/90 to 3/31/91    515.52
4/1/73 to 3/31/74    163.80                  4/1/91 to 3/31/92    543.30
4/1/74 to 3/31/75    174.48                  4/1/92 to 3/31/93    565.94
4/1/75 to 3/31/76    186.85                  4/1/93 to 3/31/94    585.66
4/1/76 to 3/31/77    199.31                  4/1/94 to 3/31/95    604.03
4/1/77 to 3/31/78    211.37                  4/1/95 to 3/31/96    631.03
4/1/78 to 3/31/79    227.31                  4/1/96 to 3/31/97    665.55
4/1/79 to 3/31/80    245.48                  4/1/97 to 3/31/98    699.91
4/1/80 to 3/31/81    269.93                  4/1/98 to 3/31/99    749.69
4/1/81 to 3/31/82    297.85                  4/1/99 to 3/31/00    830.89
4/1/82 to 3/31/83    320.29                  4/1/00 to 3/31/01    890.94
4/1/83 to 3/31/84    341.06                  4/1/01 to 3/31/02    882.57
4/1/84 to 3/31/85    360.50                  4/1/02 to 3/31/03    884.46
4/1/85 to 3/31/86    383.57                  4/1/03 to 3/31/04    918.78
4/1/86 to 3/31/87    411.00                  4/1/04 to 3/31/05    958.58

One can fit an exponential regression to the first 33 of the 36 data points, holding out the last three values in order to compare to forecasts. ln[SAWW] = Y = α + βt, where t = 1 for the first annual period, 4/1/69 to 3/31/70. The result is:

α̂ = 4.84291, with standard error of 0.0192764, and t-statistic of 251.235 (p-value = 0).

β̂ = 0.0613192, with standard error of 0.000989292, and t-statistic of 61.9829 (p-value = 0).

R² = 0.991996. R̄² = 0.991737. s² = 0.00292827.

             DF   Sum of Sq   Mean Sq      F Ratio   P Value
Regression    1   11.2501     11.2501      3841.88   0
Error        31   0.0907763   0.00292827
Total        32   11.3408

Durbin Watson Statistic = .160702.

293 See Section 8.2 of Pindyck and Rubinfeld.294 Data from the Massachusetts Department of Unemployment Assistance (formerly the Division of Unemployment and Training.) These values, released each October 1, affect the Workers’ Compensation benefits paid in the state of Massachusetts.


Here is a graph of all of the data and the exponential regression fitted to the first 33 points,
State Average Weekly Wage = exp[4.84291 + 0.0613192t] = (126.84)(1.06324)^t:

[Graph: SAWW (200 to 1000) plotted against t (5 to 35), with the fitted exponential regression curve.]

Residuals:

The residuals of this regression are graphed below:

[Graph: residuals (-0.075 to 0.075) plotted against t (5 to 30).]

The graph of the residuals indicates that they are positively serially correlated. The Durbin-Watson Statistic is .161, strongly indicating positive serial correlation. DW ≅ 2(1- ρ). ⇒ ρ ≅ 1 - DW/2 = 1 - .161/2 = .92.


The residuals of this regression are: -0.0288821, -0.0283473, -0.0186381, -0.041094, -0.0508628, -0.0490181, -0.0418413, -0.0386053, -0.0411758, -0.0297904, -0.0142089, 0.019419, 0.0565271, 0.0678447, 0.0693571, 0.0634714, 0.0641823, 0.0719342, 0.0882966, 0.0929009, 0.0649513, 0.0532401, 0.0444065, 0.0239136, -0.00315424, -0.033589, -0.0511787, -0.0592376, -0.0702188, -0.06283, -0.0213116, -0.0128512, -0.0836094.

One can estimate the serial correlation coefficient as follows:

ρ̂ = Σ ε̂t-1 ε̂t / Σ ε̂t-1² = 0.07957/0.08379 = .950.

As discussed previously, this would be the first step in the Cochrane-Orcutt procedure, one way to correct for serial correlation.
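
A few lines of Python (my own sketch, not from the text) implement this estimator; applied to the 33 residuals listed above it reproduces approximately 0.07957/0.08379 = .950.

```python
residuals = [-0.0288821, -0.0283473, -0.0186381, -0.041094, -0.0508628, -0.0490181,
             -0.0418413, -0.0386053, -0.0411758, -0.0297904, -0.0142089, 0.019419,
             0.0565271, 0.0678447, 0.0693571, 0.0634714, 0.0641823, 0.0719342,
             0.0882966, 0.0929009, 0.0649513, 0.0532401, 0.0444065, 0.0239136,
             -0.00315424, -0.033589, -0.0511787, -0.0592376, -0.0702188, -0.06283,
             -0.0213116, -0.0128512, -0.0836094]

def estimate_rho(e):
    """rho-hat = sum over t of e[t-1]*e[t], divided by sum over t of e[t-1]^2."""
    num = sum(prev * curr for prev, curr in zip(e[:-1], e[1:]))
    den = sum(prev ** 2 for prev in e[:-1])
    return num / den

print(estimate_rho(residuals))   # approximately 0.950
```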

Applying the Cochrane-Orcutt Procedure:295

Continuing the Cochrane-Orcutt procedure, let
Xt* = Xt - ρXt-1 = Xt - .95Xt-1 = (1.05, 1.10, 1.15, ..., 2.6).
Yt* = Yt - ρYt-1 = Yt - .95Yt-1 = (4.87535, 4.9372, 5.00823, ..., 6.78284).

Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.

α̂(1 - ρ) = 0.264477. ⇒ α̂ = 0.264477/(1 - .95) = 5.28954.²⁹⁶  β̂ = 0.0482171.

Translate this transformed equation back to the original variables, Yt = α + βXt, and get the resulting residuals ε̂t:

Ŷ = α̂ + β̂X = 5.28954 + 0.0482171X = (5.33776, 5.38597, 5.43419, ..., 6.8807).
ε̂ = Y - Ŷ = (-0.462407, -0.44877, -0.425959, ..., -0.0978662).

Estimate the serial correlation coefficient:
ρ̂ = Σ ε̂t-1 ε̂t / Σ ε̂t-1² = 2.13757/2.24902 = .950.

Since ρ̂ seems to have converged, we will use the equation
ln[SAWW] = Ŷ = 5.28954 + 0.0482171X, together with ρ̂ = .950, in order to make forecasts.

295 A similar result could have been obtained using instead the Hildreth-Lu procedure.296 When ρ is close to one, the estimated intercept is very sensitive to the estimated intercept of the translated equation. As will be seen, when ρ is close to one, the estimated intercept has only a small effect on the forecast.


Forecasting:

When there is serial correlation in time series, we forecast forwards one step at a time, starting at the last available data point.

In this case the last data point used in the regression was a SAWW of 882.57, corresponding to data from 4/1/01 to 3/31/02 or X = 33. We want to forecast the SAWW corresponding to the next annual period 4/1/02 to 3/31/03 or X = 34.297

One possible estimate of ln[SAWW] is the last observed point plus the slope: ln[882.57] + 0.0482171 = 6.7828 + 0.0482171 = 6.8310.

Another possible estimate is obtained by using the regression equation:

^Y = 5.28954 + 0.0482171X = 5.28954 + (0.0482171)(34) = 6.9289.

We weight this estimate, with weight ρ being given the last observation and weight 1- ρ being given to the regression estimate:298 (.95)(6.8310) + (.05)(6.9289) = 6.8359.

This is equivalent to using the following equation:²⁹⁹

ŶT+1 = ρYT + α̂(1 - ρ) + β̂(XT+1 - ρXT)
= (.95)(6.7828) + (5.28954)(1 - .95) + (0.0482171)(34 - (.95)(33)) = 6.8359.
Corresponding to an SAWW of exp[6.8359] = 930.68.

Continuing, we can forecast the value for X = 35 using the value forecasted for X = 34:
(.95)(6.8359) + (5.28954)(1 - .95) + (0.0482171)(35 - (.95)(34)) = 6.8887.
Corresponding to an SAWW of exp[6.8887] = 981.13.

In general, with serial correlation one successively forecasts ahead one time period at a time:

Forecast for time t+1 = ρ(Forecast for time t) + α̂(1 - ρ) + β̂(t+1 - ρt)
= ρ(β̂ + Forecast for time t) + (1 - ρ)(α̂ + β̂(t+1)),

where α̂, β̂, and ρ̂ have been estimated using one of the procedures to correct for serial correlation, the Cochrane-Orcutt procedure or the Hildreth-Lu procedure.

Exercise: Forecast the values for X = 36, and then 37, using the value forecasted for X = 35.
[Solution: (.95)(6.8887) + (5.28954)(1 - .95) + (0.0482171)(36 - (.95)(35)) = 6.9413.
Corresponding to an SAWW of exp[6.9413] = 1034.11.
(.95)(6.9413) + (5.28954)(1 - .95) + (0.0482171)(37 - (.95)(36)) = 6.9937.
Corresponding to an SAWW of exp[6.9937] = 1089.75.]
297 Ignoring for now that we actually have this data point.
298 This is related to the ideas behind the use of credibility. See the sequential approach to credibility in "Mahler's Guide to Buhlmann Credibility and Bayesian Analysis."
299 See equation 8.34 in Pindyck and Rubinfeld.
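
The one-step-ahead recursion is easily coded. The sketch below (my own, not from the text) starts at the last observed value, ln[882.57], and reproduces the forecasts 6.8359, 6.8887, 6.9413, and 6.9937, along with the corresponding SAWW values.

```python
import math

alpha_hat, beta_hat, rho = 5.28954, 0.0482171, 0.95
forecast = math.log(882.57)            # last observed ln[SAWW], at X = 33

for t in range(34, 38):                # forecast X = 34, 35, 36, 37 one step at a time
    # Forecast(t) = rho*(previous value) + alpha*(1 - rho) + beta*(t - rho*(t - 1))
    forecast = rho * forecast + alpha_hat * (1 - rho) + beta_hat * (t - rho * (t - 1))
    print(t, round(forecast, 4), round(math.exp(forecast), 2))
```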


Comparing Forecasted and Actual SAWW:

X    Actual SAWW   Forecast Corrected for Serial Correlation   Forecast from Original Exponential Regression
34   884.46        930.68                                       1036.37
35   918.78        981.13                                       1101.91
36   958.58        1034.11                                      1171.60
37   ???           1089.75                                      1245.69

In this case, the forecasts are not very good. After decades of steady increase, the last few values in the series did not follow that pattern. Changes in economic conditions can result in changes in the movement of time series such as this one.300

Variance of Forecasts:*301

With serial correlation, one can use similar techniques to those discussed previously, in order to get the variance of the forecasted expected values and the mean squared error of the forecast. However, we need to work with the regression on the transformed variables, X* and Y*.

Y*i+1 = Yi+1 - ρYi. ⇒ Yi+1 = Y*i+1 + ρYi.

Where α̂* and β̂ are the estimated coefficients for the regression on the transformed variables, X* and Y*, one can write the forecasting equations as:

ŶT+1 = ρYT + Ŷ*T+1 = ρYT + α̂* + β̂X*T+1 = ρYT + α̂(1 - ρ) + β̂(XT+1 - ρXT).
ŶT+2 = ρŶT+1 + Ŷ*T+2 = ρŶT+1 + α̂* + β̂X*T+2 = ρ²YT + (1 + ρ)α̂* + β̂(X*T+2 + ρX*T+1).
ŶT+3 = ρŶT+2 + Ŷ*T+3 = ρŶT+2 + α̂* + β̂X*T+3 = ρ³YT + (1 + ρ + ρ²)α̂* + β̂(X*T+3 + ρX*T+2 + ρ²X*T+1).

When the Cochrane-Orcutt Procedure was applied to the SAWW example, the fitted regression was Y* = 0.264477 + 0.0482171X*, with ρ = .95.
The last observed value of ln[SAWW], YT = 6.7828, corresponds to X = 33.

X*T+1 = 34 - (.95)(33) = 2.65. X*T+2 = 35 - (.95)(34) = 2.7. X*T+3 = 36 - (.95)(35) = 2.75.

ŶT+1 = ρYT + α̂* + β̂X*T+1 = (.95)(6.7828) + .264477 + (0.0482171)(2.65) = 6.8359.

300 One should always bear in mind that when one uses regression models for forecasting, one is assuming that the pattern in the past will more or less continue into the future. 301 See pages 216 and 217 of Pindyck and Rubinfeld.


ŶT+2 = ρ²YT + (1 + ρ)α̂* + β̂(X*T+2 + ρX*T+1)
= (.95²)(6.7828) + (1.95)(.264477) + (0.0482171)[2.7 + (.95)(2.65)] = 6.8888.

ŶT+3 = ρ³YT + (1 + ρ + ρ²)α̂* + β̂(X*T+3 + ρX*T+2 + ρ²X*T+1)
= (.95³)(6.7828) + (1 + .95 + .95²)(.264477) + (0.0482171)[2.75 + (.95)(2.7) + (.95²)(2.65)] = 6.9414.
These match the forecasts gotten previously, subject to rounding. Treating the serial correlation coefficient, ρ, as known, one can determine the variances of the forecasted expected values:³⁰²

Var[ŶT+1] = Var[α̂*] + X*T+1²Var[β̂] + 2X*T+1Cov[α̂*, β̂].
Var[ŶT+2] = (1 + ρ)²Var[α̂*] + (X*T+2 + ρX*T+1)²Var[β̂] + 2(1 + ρ)(X*T+2 + ρX*T+1)Cov[α̂*, β̂].
Var[ŶT+3] = (1 + ρ + ρ²)²Var[α̂*] + (X*T+3 + ρX*T+2 + ρ²X*T+1)²Var[β̂] + 2(1 + ρ + ρ²)(X*T+3 + ρX*T+2 + ρ²X*T+1)Cov[α̂*, β̂].

For the regression on the transformed variables, the covariance matrix of α̂* and β̂ is:
(  0.00022733    -0.000117075  )
( -0.000117075    0.0000641505 )

Var[ŶT+1] = 0.00022733 + (2.65²)(0.0000641505) + (2)(2.65)(-0.000117075) = 0.00005733.

Var[ŶT+2] = (1.95²)(0.00022733) + [2.7 + (.95)(2.65)]²(0.0000641505)
+ (2)(1.95)[2.7 + (.95)(2.65)](-0.000117075) = 0.00022848.

Var[ŶT+3] = (1 + .95 + .95²)²(0.00022733) + [2.75 + (.95)(2.7) + (.95²)(2.65)]²(0.0000641505)
+ (2)(1 + .95 + .95²)[2.75 + (.95)(2.7) + (.95²)(2.65)](-0.000117075) = 0.00051241.

In order to get mean squared errors for the forecasts, we add s² = .00043751, from the regression on the transformed variables.
MSE Forecast forward one period: .00043751 + 0.00005733 = .00049484.
MSE Forecast forward two periods: .00043751 + 0.00022848 = .00066599.
MSE Forecast forward three periods: .00043751 + 0.00051241 = .00094992.

302 YT is the last observed value, which is a known constant, and does not contribute to the variance.


The regression on the transformed variables, had 33 - 1 = 32 values, and 32 - 2 = 30 degrees of freedom. For a 95% confidence interval, from the t-table one takes ±2.042 standard deviations.

Thus 95% confidence intervals for the observed ln[SAWW] are:
Forecast forward 1 period: 6.8359 ± 2.042√.00049484 = 6.8359 ± .0454 = (6.7905, 6.8813).
Forecast forward 2 periods: 6.8888 ± 2.042√.00066599 = 6.8888 ± .0527 = (6.8361, 6.9415).
Forecast forward 3 periods: 6.9414 ± 2.042√.00094992 = 6.9414 ± .0629 = (6.8785, 7.0043).

Therefore, 95% confidence intervals for the observed SAWW are:
Forecast forward one period: (889.36, 973.89).³⁰³
Forecast forward two periods: (930.85, 1034.32).³⁰⁴
Forecast forward three periods: (971.17, 1101.36).
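
These confidence intervals can be checked with a short script (my own, not from the text), using the covariance matrix of α̂* and β̂ given above, ρ = .95, s² = .00043751, and the t-value 2.042 for 30 degrees of freedom.

```python
import math

var_a, var_b, cov_ab = 0.00022733, 0.0000641505, -0.000117075
rho, s2, t_crit = 0.95, 0.00043751, 2.042
x_star = [2.65, 2.70, 2.75]           # X*_{T+1}, X*_{T+2}, X*_{T+3}
forecasts = [6.8359, 6.8888, 6.9414]  # forecasted ln[SAWW], from above

for j in (1, 2, 3):
    c = sum(rho ** i for i in range(j))                       # 1 + rho + ... + rho^(j-1)
    x = sum(rho ** i * x_star[j - 1 - i] for i in range(j))   # X*_{T+j} + rho X*_{T+j-1} + ...
    msfe = s2 + c * c * var_a + x * x * var_b + 2 * c * x * cov_ab
    half = t_crit * math.sqrt(msfe)
    lo, hi = forecasts[j - 1] - half, forecasts[j - 1] + half
    print(j, round(msfe, 8), (round(math.exp(lo), 2), round(math.exp(hi), 2)))
```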

An Actuarial Procedure:*

When dealing with such economic time series, an actuary might have fit a regression solely to estimate the annual percentage rate of increase, and then applied that annual increase to the latest observed point.

Using the original regression, the rate of increase is 1.06324, or 6.324% per year.
Using the regression adjusted for serial correlation, the rate of increase is e^0.0482171 = 1.04940, or 4.940% per year.
Applying these increases to the latest observed value of 882.57 (at X = 33):

X    Actual SAWW   Forecast at 4.940%   Forecast at 6.324%
33   882.57
34   884.46        926.17               938.38
35   918.78        971.92               997.73
36   958.58        1019.93              1060.82
37   ???           1070.32              1127.91

When ρ is very close to one, and we are predicting forwards only a few periods, this actuarial procedure gives a result similar to that gotten previously.

Using this previous procedure, with ρ = .95, the forecasts for ln[SAWW] were:
(Forecast for time 34) = .95(Y for time 33) + .05α̂ + 2.65β̂.
(Forecast for time 35) = .95(Forecast for time 34) + .05α̂ + 2.7β̂
= .9025(Y for time 33) + .0975α̂ + 5.2175β̂.
(Forecast for time 36) = .95(Forecast for time 35) + .05α̂ + 2.75β̂
= .857375(Y for time 33) + .142625α̂ + 7.706625β̂.

303 Actual SAWW was 884.46, outside this 95% confidence interval.304 Actual SAWW was 918.78, outside this 95% confidence interval. Given that the previous value was much less than its forecast, one would have expected this value to also be significantly less than its forecast.


Using the actuarial procedure, the forecasts of ln[SAWW] are:

(Forecast for time 34) = (Y for time 33) + β̂.
(Forecast for time 35) = (Y for time 33) + 2β̂.
(Forecast for time 36) = (Y for time 33) + 3β̂.

The difference between the forecasts at time 36 (3 periods ahead) from this actuarial procedure and the previous procedure is:
[(Y for time 33) + 3β̂] - [.857375(Y for time 33) + .142625α̂ + 7.706625β̂]
= .142625[(Y for time 33) - (α̂ + 33β̂)] = (1 - ρ³)[(Y at time 33) - (Ŷ at time 33)]
= (1 - ρ³)(regression residual at time 33).

The difference between the forecasts J periods ahead from this actuarial procedure and the previous procedure is: (1 - ρ^J)(regression residual at the latest observation).
Thus for ρ close to one and J small, the actuarial procedure and the correct procedure give similar results, provided the regression residual at the last observation is not too large. If the regression exactly matches the last observation, then the results of using the two procedures are the same.


Problems:

Use the following information for the next 3 questions:One has a time series of 3 years of monthly points, t = 1, 2, 3, ..., 36, and Y = 88.3, 89.5, 91.1, ..., 135.7.Using the Cochrane-Orcutt procedure, in order to adjust for serial correlation, the serial correlation has been estimated at 0.80, and a regression has been fit: Y = 87.2 + 1.4 t.

35 .1 (2 points) Forecast the value of Y for t = 37.A. 137.5 B. 137.7 C. 137.9 D. 138.1 E. 138.2

35 .2 (2 points) Forecast the value of Y for t = 38. A. 138.8 B. 139.0 C. 139.2 D. 139.4 E. 139.6

35.3 (2 points) Forecast the value of Y for t = 39. A. 140.2 B. 140.4 C. 140.6 D. 140.8 E. 141.0

Use the following information for the next 4 questions:
The population (in millions) of the United States was:
year:        1900  1910  1920   1930   1940   1950   1960   1970   1980   1990   2000
population:  76.2  92.2  106.0  123.2  132.2  151.3  179.3  203.3  226.5  249.6  281.4

35.4 (4 points) Fit an exponential regression to this data, ln(population) = α + βt. Use the model to predict the population (in millions) in 2030.A. 405 B. 410 C. 415 D. 420 E. 425

35.5 (3 points) What is the Durbin-Watson Statistic for this regression? A. 1 B. 5/4 C. 3/2 D. 7/4 E. 2

35.6 (9 points) Apply the Cochrane-Orcutt procedure, in order to correct for serial correlation. Use the revised model to predict the population (in millions) in 2010, 2020, and 2030.Use a computer to help you with the calculations.

35.7 (9 points) Apply the Hildreth-Lu procedure, in order to correct for serial correlation.Use the revised model to predict the population (in millions) in 2010, 2020, and 2030.Use a computer to help you with the calculations.


Use the following information for the next 4 questions:
One has 10 years of monthly values of the Consumer Price Index for All Urban Consumers, t = 1, 2, 3, ..., 120. The value of the Price Index for t = 120 is 190.3.
Using the Hildreth-Lu procedure, in order to adjust for serial correlation, the serial correlation has been estimated as 0.90, and a linear regression has been fit: Y* = 14.981 + .33590 t*,
where Y*i = Yi+1 - .9Yi, and t*i = (i+1) - .9i.

For this regression, R2 = .884 and s2 = .1775.

35.8 (2 points) Forecast the value of Y (the price index) for t = 121.(A) Less than 190.4(B) At least 190.4, but less than 190.5(C) At least 190.5, but less than 190.6(D) At least 190.6, but less than 190.7(E) At least 190.7

35.9 (2 points) Forecast the value of Y for t = 122. (A) Less than 190.7(B) At least 190.7, but less than 190.8(C) At least 190.8, but less than 190.9(D) At least 190.9, but less than 191.0(E) At least 191.0

35.10 (2 points) Forecast the value of Y for t = 123. (A) Less than 190.9(B) At least 190.9, but less than 191.0(C) At least 191.0, but less than 191.1(D) At least 191.1, but less than 191.2(E) At least 191.2

35.11 (3 points) Determine a 95% confidence interval for the value of the price index observed at t = 121.
Hint: Σi² for i = 1 to n is n(n + 1)(2n + 1)/6.


Mahler’s Guide to

Regression

Sections 36-43:

36 Standardized Coefficients
37 Elasticity
38 Partial Correlation Coefficients
39 * Regression Diagnostics
40 * Stepwise Regression
41 * Stochastic Explanatory Variables
42 * Generalized Least Squares
43 Nonlinear Estimation

VEE-Applied Statistical Methods Exam prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-J

New England Actuarial Seminars Howard MahlerPOB 315 [email protected], MA, 02067www.neas-seminars.com


Section 36, Standardized Coefficients305

In the two variable regression model, X and Y may be in different units. For example, X may be the rate of unemployment and Y might be the workers’ compensation insurance claim frequency. In multiple regression some of the independent variables may be in different units. For example, X1 might be the mean annual income and X2 might be the unemployment rate. Even two variables both in dollars might have significantly different means and variances, for example the annual cost of Homeowners losses due to theft and the annual cost of Homeowners losses due to hurricanes.

Therefore, often one standardizes the variables, each independent variable as well as the dependent variable, prior to performing a regression. To standardize a variable, one subtracts its mean and then divides by its standard deviation.306 The standardized variables then each have a mean of 0 and a standard deviation of 1.

Since the standardized variables each have a mean of 0, the intercept vanishes from the regression equation (the intercept is zero.)

Two Variable Model:

In the two variable regression model X∗ = (X - X̄)/sX, and Y∗ = (Y - Ȳ)/sY.

The regression model becomes: Y∗ = β∗X∗ + ε.

In standardized form, the regression goes through the origin; the intercept is zero.

The slope of the standardized regression is related to that of the original regression:

^β* = ^β sX/sY.

The slope is ∆y/∆x, so to standardize we divide the numerator by sY and divide the

denominator by sX; we multiply ^β by sX/sY.

Exercise: You fit a two variable regression Y = α + βt + ε to the data:
Year (t)    Loss Ratio (Y)
1           82
2           78
3           80
4           73
5           77

With result ^β = Σxiyi/Σxi² = -15/10 = -1.5, and ^α = Ȳ - ^βX̄ = 78 - (-1.5)(3) = 82.5.

What is the standardized regression equation?
[Solution: sX² = Σxi²/(5 - 1) = 10/4 = 2.5. sY² = Σyi²/(5 - 1) = (16 + 0 + 4 + 25 + 1)/4 = 11.5.
^β* = ^β sX/sY = -1.5(2.5/11.5)^1/2 = -.70. The standardized regression is: Y∗ = -.70X∗.]
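A small Python sketch of this exercise (standard library only) confirms both the standardized slope and the fact that it equals the sample correlation of X and Y:

# Sketch: standardize the loss ratio example above and check that the
# standardized slope equals the sample correlation of X and Y.
import statistics as st
X = [1, 2, 3, 4, 5]
Y = [82, 78, 80, 73, 77]
n = len(X)
xbar, ybar = st.mean(X), st.mean(Y)
sx, sy = st.stdev(X), st.stdev(Y)                 # sample standard deviations
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(X, Y))
beta = sxy / sum((x - xbar) ** 2 for x in X)      # -1.5
beta_star = beta * sx / sy                        # -1.5 * sqrt(2.5/11.5)
r_xy = sxy / ((n - 1) * sx * sy)
print(round(beta, 2), round(beta_star, 2), round(r_xy, 2))   # -1.5 -0.7 -0.7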

305 See Section 4.5.1 of Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld.306 This is the same way one standardizes a variable in order to use the Standard Normal Table.


Therefore, a year with time one standard deviation above the average time of 3, has a fitted loss ratio 0.7 standard deviation less than average. For example, (6 - 3)/√2.5 = 1.90. Therefore, the forecasted loss ratio for year 6 is: (1.90)(.70) = 1.33 standard deviation below the mean loss ratio. This is: 78 - 1.33√11.5 = 73.5.

Note that this is the same result obtained using the original regression: 82.5 + (6)(-1.5) = 73.5. Thus while a standardized regression allows one to better interpret or compare the impacts of variables, the fitted values are the same.

Note that in this example, the (simple) correlation of X and Y is:
rXY = {Σxiyi/(N-1)} / √[{Σxi²/(N-1)}{Σyi²/(N-1)}] = Σxiyi/√(Σxi² Σyi²) = -15/√{(10)(46)} = -.70 = β∗.

In general, for the two variable regression, the standardized slope is equal to the correlation of X and Y: ^β* = rXY.

Multiple Regression:

In the case of the multiple regression model, in a similar manner the standardized slopes are:

^β*2 = ^β2 sX2/sY = ^β2√(Σx2i²/Σyi²), ^β*3 = ^β3 sX3/sY = ^β3√(Σx3i²/Σyi²), etc.

One can compare the standardized coefficients directly. The larger the absolute value of the standardized coefficient, the more important the corresponding variable is in determining the value of Y.307

Three Variable Model:

In the case of three variables, as shown below, the standardized slopes can be written in terms of correlations:

^β*2 = (rYX2 - rYX3 rX2X3) / (1 - rX2X3²).

^β*3 = (rYX3 - rYX2 rX2X3) / (1 - rX2X3²).

If X2 and X3 are uncorrelated, then rX2X3 = 0, and ^β*2 = rYX2 and ^β*3 = rYX3.

^β2 = {Σx2iyi Σx3i² - Σx3iyi Σx2ix3i} / {Σx2i² Σx3i² - (Σx2ix3i)²} =

{Σx2iyi - Σx3iyi Σx2ix3i/Σx3i²} / {Σx2i² - (Σx2ix3i)²/Σx3i²} =

(rYX2 sX2 sY - rYX3 sY rX2X3 sX2) / (sX2² - rX2X3² sX2²) = sY(rYX2 - rYX3 rX2X3) / {sX2(1 - rX2X3²)}.

^β*2 = ^β2 sX2/sY = (rYX2 - rYX3 rX2X3) / (1 - rX2X3²).

By symmetry, the similar result holds: ^β*3 = (rYX3 - rYX2 rX2X3) / (1 - rX2X3²).
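As a numerical illustration of these formulas, the Python sketch below evaluates the standardized slopes using the simple correlations from the agents’ loss ratio example discussed in Section 38 (rYX2 = .318, rYX3 = .666, rX2X3 = -.400):

# Sketch: standardized slopes of a three variable model computed from the
# three simple correlations (values taken from the agents' example).
r_yx2, r_yx3, r_x2x3 = 0.318, 0.666, -0.400
denom = 1 - r_x2x3 ** 2
beta2_star = (r_yx2 - r_yx3 * r_x2x3) / denom
beta3_star = (r_yx3 - r_yx2 * r_x2x3) / denom
print(round(beta2_star, 3), round(beta3_star, 3))   # 0.696 0.944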

307 However, one must keep in mind that variables may interact in determining the value of Y.


t-statistics:

t = ^β2/sβ2. In standardized form, ^β*2 = ^β2 sX2/sY, and its standard error is correspondingly sβ2 sX2/sY.

Therefore, the t-statistics for the regression in standardized form, ^β*2/(sβ2 sX2/sY) = ^β2/sβ2, are the same as for the original regression.

The Idea Behind Standardizing:*

Let us assume that one of the independent variables is in feet. What if instead of using feet we were to express it in terms of inches? Then all of the values of this independent variable would be multiplied by 12, and its coefficient in the model would be divided by 12.308 However, in terms of inches the standard deviation of this independent variable would be multiplied by 12.The standardized coefficient of this variable would be unaffected.

The same result would hold for a change of scale in the dependent variable. If the dependent variable were expressed in inches rather than feet, then the slopes would all be multiplied by 12. However, the sample standard deviation would also be multiplied by 12. Therefore, the standardized coefficients would be unaffected.

In general, standardized coefficients are unaffected by changes in scale of the variables. The standardized coefficients can be written in terms of correlations, dimensionless quantities unaffected by changes of scale. Therefore, the standardized coefficients are dimensionless quantities, unaffected by changes of scale.

Exercise: Assume the exchange rate is 0.8 dollars per euro.

Assume that when the independent variable X2 is expressed in euros, ^β2 = 100 and ^β*2 = 30.

If instead X2 is expressed in dollars, what are the values of ^β2 and ^β*2?

[Solution: ^β2 is divided by 0.8. The new value of the coefficient is: 100/0.8 = 125.
sX2 is multiplied by 0.8. ^β*2 = ^β2 sX2/sY remains unchanged at 30.]

For example, assume X2 is in dollars and Y is in tons.

Then, ^β2 is in units of tons per dollar, just as ∆Y/∆X2.

sX2 is in dollars, just as X2. sY is in tons, just as Y.

Therefore, ^β*2 = ^β2 sX2/sY is a pure number.

In general, standardized variables and standardized coefficients are unit-less numbers

308 The coefficient is like ∆Y/∆X. If all the X values are multiplied by 12, then the slope is divided by 12.


Problems:

36.1 (3 points) Data on 20 observations have been fit:

^Y = 1470.3 + 0.8145X2 + 0.8204X3 + 13.5287X4.

X̄2 = 9213, X̄3 = 35311.3, X̄4 = 1383.35, Ȳ = 56660.

sα = 5746. sβ2 = .5122. sβ3 = .2112. sβ4 = .5857.

Var[X2] = 28.98 million. Var[X3] = 236.4 million. Var[X4] = .2169 million. Var[Y] = 514.9 million.
Determine the equivalent model if each of the variables is standardized by subtracting its mean and dividing by its standard deviation.

36.2 (3 points) The following linear regression has been fit:

^Y = 177.703 - 0.715143X2 - 0.873252X3 + 31.2728X4 - 17.8078X5 + 9.98376X6. The variance-covariance matrix of the variables in the regression is:

      Y          X2          X3          X4           X5            X6
Y     317.564    -11.2631    -6.42264    6.12579      -0.311231     1.33801
X2    -11.2631   3.90916     0.862264    -0.231626    -0.0822102    -0.193801
X3    -6.42264   0.862264    16.4044     0.307817     -0.0320755    -0.168104
X4    6.12579    -0.231626   0.307817    0.226415     0.0226415     -0.0449236
X5    -0.311231  -0.0822102  -0.0320755  0.0226415    0.0622642     0.000269542
X6    1.33801    -0.193801   -0.168104   -0.0449236   0.000269542   0.246631
Based on the standardized coefficients, which of the variables is the single most important determinant of Y?
A. X2   B. X3   C. X4   D. X5   E. X6

36.3 (3 points) The correlation of X2 and X3 is -.2537.The correlation of X2 and Y is .7296. The correlation of X3 and Y is .3952.

For the model Y = β1 + β2X2 + β3X3 + ε, determine the value of β3* , the standardized coefficient associated with X3.(A) 0.56 (B) 0.58 (C) 0.60 (D) 0.62 (E) 0.64

36.4 (3 points) You are given the following 5 observations:
X: 1 2 3 4 5
Y: 202 321 404 480 507
Determine the linear regression model if each of the variables is standardized by subtracting its mean and dividing by its standard deviation.


36.5 (2 points) The following linear regression has been fit:

^Y = 6.3974 + 2.4642X2 + 6.4560X3 + 1.1839X4. The variance-covariance matrix of the variables in the regression is:

      Y         X2         X3         X4
Y     1527.64   162.893    121.071    181.714
X2    162.893   64.9821    -13.9643   78.5714
X3    121.071   -13.9643   27.6429    -19.4286
X4    181.714   78.5714    -19.4286   96.0000

β^*j is the standardized regression coefficient associated with Xj.

Which of the following is correct?

(A) β^*2 > β^*3 > β^*4

(B) β^*2 > β^*4 > β^*3

(C) β^*3 > β^*4 > β^*2

(D) β^*4 > β^*2 > β^*3

(E) None of A, B, C, or D

36.6 (3 points) Given the following information: ΣXi = 128. ΣYi = 672. ΣXi2 = 1853. ΣYi2 = 14,911. ΣXiYi = 4120. N = 40.

Determine β^* , the standardized regression coefficient. A. Less than 0.60 B. At least 0.60, but less than 0.70 C. At least 0.70, but less than 0.80 D. At least 0.80, but less than 0.90 E. At least 0.90

36.7 (3 points) You are given the following observations:
X: 40 60 80 100 120 140 160
Y: 15.9 18.8 21.6 25.2 28.7 30.4 30.9
Determine the value of β∗, the standardized regression slope.
A. 0.965 B. 0.970 C. 0.975 D. 0.980 E. 0.985


36.8 (4, 11/00, Q.37) (2.5 points) Data on 28 home sales yield the fitted model:

^Y = 43.9 + 0.238X2 - 0.000229X3 + 0.14718X4 - 6.68X5 - 0.269X6, where
Y = sales price of home
X2 = taxes
X3 = size of lot
X4 = square feet of living space
X5 = number of rooms
X6 = age in years
You are given that the estimated variance-covariance matrix (lower-triangular portion) of the variables in the regression is:

      Y           X2          X3            X4          X5     X6
Y     20,041.4
X2    36,909.0    80,964.2
X3    229,662.6   439,511.8   5,923,126.9
X4    71,479.2    129,032.9   907,497.1     300,121.4
X5    127.2       244.5       1,589.3       532.5       1.3
X6    -585.4      -1,420.5    -12,877.4     -1,343.4    0.2    190.9

β^*j is the standardized regression coefficient associated with Xj.

Which of the following is correct?

(A) β^*2 > β^*3 > β^*4

(B) β^*2 > β^*4 > β^*3

(C) β^*3 > β^*4 > β^*2

(D) β^*4 > β^*2 > β^*3

(E) β^*4 > β^*3 > β^*2


36.9 (4, 5/01, Q.13) (2.5 points) Applied to the model Yi = β1 + β2X2i + β3X3i + εi, the method of least squares implies:
Σ(Yi - Ȳ)(X2i - X̄2) = β2Σ(X2i - X̄2)² + β3Σ(X2i - X̄2)(X3i - X̄3)
Σ(Yi - Ȳ)(X3i - X̄3) = β2Σ(X2i - X̄2)(X3i - X̄3) + β3Σ(X3i - X̄3)².
You are given:
(i) rYX2 = Σ(Yi - Ȳ)(X2i - X̄2) / √{Σ(Yi - Ȳ)² Σ(X2i - X̄2)²} = 0.4.
(ii) rYX3 = Σ(Yi - Ȳ)(X3i - X̄3) / √{Σ(Yi - Ȳ)² Σ(X3i - X̄3)²} = 0.9.
(iii) rX2X3 = Σ(X2i - X̄2)(X3i - X̄3) / √{Σ(X2i - X̄2)² Σ(X3i - X̄3)²} = 0.6.
Determine the value of β*2, the standardized coefficient associated with X2.

(A) –0.7 (B) –0.2 (C) 0.3 (D) 0.8 (E) 1.0

36.10 (2 points) In the previous question, determine the value of β3* , the standardized coefficient associated with X3.(A) –0.7 (B) –0.2 (C) 0.3 (D) 0.8 (E) 1.0


Section 37, Elasticity309

The elasticity measures the percent change in the dependent variable for a given percent change in an independent variable. In regressions, the elasticity is most commonly calculated at the mean value of each variable.

For example, in the heights regression example, we had X = Height of father, with X̄ = 59.250, and Y = Height of son, with Ȳ = 61.125.
Fitted regression model: Y = 24.1 + .625X.

Near the mean, the percentage change in height of a father is approximately: ∆X/X̄.

Near the mean, the percentage change in height of a son is approximately: ∆Y/Ȳ.

For this model, ∆Y/∆X = ^β = .625.

(percent change in height of a son)/(percent change in height of a father) ≅ .625X̄/Ȳ = (.625)(59.250/61.125) = .61.

Thus in this case, near the mean, a 1% change in the height of the father is expected to result in about a 0.61% change in the height of the son.

For a simple linear regression, the coefficient of elasticity related to the jth variable is:

Ej = ^βj X̄j / Ȳ.

Exercise: In the three variable regression example involving the loss ratios of agents:

X̄2 = 52.0. X̄3 = 21.5. Ȳ = 73.6. ^Y = 62.3 + .126X2 + .222X3.

What are the elasticities?

[Solution: E2 = ^β2 X̄2/Ȳ = (.126)(52.0)/73.6 = .089. E3 = ^β3 X̄3/Ȳ = (.222)(21.5)/73.6 = .064.]
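The same calculation as a brief Python sketch:

# Sketch: elasticities at the means for the agents' loss ratio model above.
beta2, beta3 = 0.126, 0.222
x2bar, x3bar, ybar = 52.0, 21.5, 73.6
E2 = beta2 * x2bar / ybar
E3 = beta3 * x3bar / ybar
print(round(E2, 3), round(E3, 3))    # about 0.089 and 0.065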

For example, assume X2 is in dollars and Y is in tons.

Then, ^β2 is in units of tons per dollar, just as ∆Y/∆X2.

X̄2 is in dollars, just as X2. Ȳ is in tons, just as Y.

Therefore, E2 = ^β2 X̄2/Ȳ is a pure number.

In general, elasticities are unit-less numbers. Elasticities may be any real number: positive, negative, or zero.

Large absolute value of an elasticity ⇔ the dependent variable is responsive to changes in that independent variable.

309 See Section 4.5.2 of Pindyck and Rubinfeld.


Elasticities at Other Values than the Mean:*

While Pindyck and Rubinfeld concentrate on the elasticity at the mean of the variables, one can also look at the elasticity at other values of the variables.

Ei = (∂Y/∂Xi)(Xi/Y).

For example, let Y = 10 + 3X2 - 5X3. Then E2 = 3X2/(10 + 3X2 - 5X3), and E3 = -5X3/(10 + 3X2 - 5X3).

The values of these elasticities depend on the values of X2 and X3. For example, at X2 = 4 and X3 = 3, E2 = 12/(10 + 12 - 15) = 1.71, and E3 = -15/(10 + 12 - 15) = -2.14.
At X2 = 10 and X3 = -3, E2 = 30/(10 + 30 + 15) = 0.55, and E3 = 15/(10 + 30 + 15) = 0.27.
Thus the percentage change in the dependent variable due to a percentage change in an independent variable depends on where in the domain of the regression we look.

More Complicated Models:

More generally the elasticity of Y with respect to Xi is:

(% change in Y)/(% change in Xi) = (∆Y/Y)/(∆Xi/Xi) ≅ (∂Y/∂Xi)(Xi/Y).310

For example, assume a regression model has been fit: Y = 7 + 3X2 + 4X3, with X̄2 = 10 and X̄3 = 5. ⇒ Ȳ = 7 + 30 + 20 = 57.

At the means, the elasticity of Y with respect to X2 is: (∂Y/∂X2)(X̄2/Ȳ) = 3(10/57) = 0.526.

For this linear model, one could instead calculate E2 = ^β2 X̄2/Ȳ = (3)(10/57) = 0.526.

For a linear model, ∂Y / ∂Xi = βi, and therefore the more general definition of elasticity reduces to that given previously for the linear model.

Exercise: At the means, what is the elasticity of Y with respect to X3?

[Solution: (∂Y / ∂X3)( X3/Y) = 4(5/57) = .351.]

For a different data set, assume a model with an interactive term:Y = 10 - 5X2 + 3X3 + 2X2X3, with X2 = 8, X3 = 11, and Y = 200.

∂Y / ∂X2 = -5 + 2X3. Thus, the elasticity of Y with respect to X2 depends on the level of X3. One could either determine the elasticity Y with respect to X2 for a stated value of X3 or use the average value of X3.For X3 = 11, ∂Y / ∂X2 = 17, and the elasticity Y with respect to X2 is: (17)(8/200) = .680.

Exercise: What is the elasticity of Y with respect to X3, at the average level of X2?

[Solution: ∂Y / ∂X3 = 3 + 2X2 = 3 + (2)(8) = 19. (∂Y / ∂X3)(X3/Y) = (19)(11/200) = 1.045.]

310 This the manner in which elasticities are usually calculated in economics.


For yet another data set, assume the model: ln Y = 4 + 0.5 lnX2 + 0.3 lnX3.
Y = exp[4 + 0.5 lnX2 + 0.3 lnX3].
∂Y/∂X2 = Y(0.5/X2). Taking the partial derivative at the mean of X2,

the elasticity of Y with respect to X2 is: (∂Y/∂X2)(X2/Y) = 0.5.

When a model is estimated in logarithms rather than in levels, the variable coefficients can be interpreted as elasticities.

Exercise: At the means, what is the elasticity of Y with respect to X3?

[∂Y / ∂X3 = Y(0.3/X3). (∂Y / ∂X3)( X3/Y) = 0.3.]

elasticity = (∆Y/Y)/(∆Xi/Xi) ≅ (∆ lnY)/(∆ lnXi) ≅ (∂ lnY / ∂ lnXi). elasticity ≅ change in lnY per change in lnXi.
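This can be checked numerically. The Python sketch below differentiates the log-log model above numerically and recovers the coefficient 0.5 as the elasticity; the evaluation point X2 = 10, X3 = 5 is arbitrary.

# Sketch: numerical check that the coefficient of ln X2 is the elasticity
# of Y with respect to X2 in the log-log model above.
import math
def Y(x2, x3):
    return math.exp(4 + 0.5 * math.log(x2) + 0.3 * math.log(x3))
x2, x3, h = 10.0, 5.0, 1e-4
dY_dx2 = (Y(x2 + h, x3) - Y(x2 - h, x3)) / (2 * h)   # numerical partial derivative
print(round(dY_dx2 * x2 / Y(x2, x3), 3))              # 0.5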

Standardized Coefficients versus Elasticities:*

Both standardized coefficients and elasticities are unit-less quantities that are useful to measure which independent variable is most important in a multiple regression.

When there is a multiplicative relationship between the variables, in other words a linear model in terms of their logarithms, then it makes sense to use elasticities, since they measure the percentage change in the dependent variable based on a percentage change in an independent variable. When a multiplicative relationship holds, the variable coefficients can be interpreted as elasticities.

If on the other hand, the relationship is additive/linear, then it makes somewhat more sense to use standardized coefficients, since they measure the change in the dependent variable based on a change in an independent variable.


Problems:

37.1 (2 points) For a four variable linear regression model:

X̄2 = 8.875. X̄3 = -0.750. X̄4 = 10.000. Ȳ = 35.250.

^β1 = 6.3974, ^β2 = 2.4626, ^β3 = 6.4560, ^β4 = 1.1839.

sβ1 = 9.27, sβ2 = 7.92, sβ3 = 1.34, sβ4 = 6.64.

Rank the absolute values of the elasticities, at the means of each variable, from smallest to largest.
A. |E2| < |E3| < |E4|
B. |E2| < |E4| < |E3|
C. |E3| < |E2| < |E4|
D. |E4| < |E3| < |E2|
E. None of A, B, C, or D.

37.2 (2 points) Data on 20 observations have been fit:

^Y = 1470.3 + 0.8145X2 + 0.8204X3 + 13.5287X4.

X̄2 = 9213, X̄3 = 35311.3, X̄4 = 1383.35, Ȳ = 56660.

sα = 5746. sβ2 = .5122. sβ3 = .2112. sβ4 = .5857.

Var[X2] = 28.98 million. Var[X3] = 236.4 million. Var[X4] = .2169 million. Var[Y] = 514.9 million.
Determine the elasticities, at the means of each variable.

Use the following information for the next three questions:The following model has been fit via least squares:ln Yi = -4.30 - .002D2i + .336 ln(X3i) + .384 X4i + .067D5i - .143D6i + .081D7i + .134 ln(X8i),

where D2, D5, D6, and D7 are dummy variables.

37.3 (1 point) Estimate the elasticity of Y with respect to X3.(A) .134 (B) .336 (C) .384 (D) .857 (E) Can not be determined

37.4 (1 point) Estimate the elasticity of Y with respect to X4.(A) .134 (B) .336 (C) .384 (D) .857 (E) Can not be determined

37.5 (1 point) Estimate the elasticity of Y with respect to X8.(A) .134 (B) .336 (C) .384 (D) .857 (E) Can not be determined


Use the following information for the next two questions:

A regression has been fit to five observations: ^Y = 700 + 60X2 - 3X3.

X2 takes on the values 1, 2, 3, 4, and 5.X3 takes on the values 300, 500, 100, 400, and 200.TSS = 20,000.

37.6 (3 points) Determine the standardized coefficients.Briefly discuss the meaning of your results.

37.7 (3 points) Determine the elasticities at the means of each variable.Briefly discuss the meaning of your results.

37.8 (4, 11/04, Q.27) (2.5 points) You are given the following model:
ln Yt = β1 + β2 ln X2t + β3 ln X3t + β4(ln X2t - ln X2t0)Dt + β5(ln X3t - ln X3t0)Dt + εt
where,
• t indexes the years 1979-93 and t0 is 1990.
• Y is a measure of workers’ compensation frequency.
• X2 is a measure of employment level.
• X3 is a measure of unemployment rate.
• Dt is 0 for t ≤ t0 and 1 for t > t0.
Fitting the model yields:

^β = (4.00, 0.60, -0.10, -0.07, -0.01)'.

Estimate the elasticity of frequency with respect to employment level for 1992.
(A) –0.11   (B) 0.53   (C) 0.60   (D) 0.90   (E) 1.70


Section 38, Partial Correlation Coefficients

The partial correlation coefficient measures the effect of Xj on Y which is not accounted for by the other variables. For the example involving agent’s loss ratios versus percent of business in three states, let’s see how to calculate partial correlation coefficients.

The model ^Y = 62.3 + .126X2 + .222X3, with R2 = 0.851 had been fit to the following data:

Agent    X2     X3    Y
1        100    0     75
2        90     10    78
3        70     0     71
4        65     10    73
5        50     50    79
6        50     35    75
7        40     10    65
8        30     70    82
9        15     20    72
10       10     10    66

Exercise: What are the sample correlations of X2 and X3, X2 and Y, and X3 and Y?

[Solution: rX2X3 = Σx2x3/√{(Σx2²)(Σx3²)} = -.400.

rYX2 = Σx2y/√{(Σx2²)(Σy²)} = .318.

rYX3 = Σx3y/√{(Σx3²)(Σy²)} = .666.]

One can calculate the partial correlation coefficients in terms of these sample correlations.311

The partial correlation coefficient of Y and X2 controlling for X3 is:

rYX2.X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX3²)(1 - rX2X3²)} = .855.

Similarly, the partial correlation coefficient of Y and X3 controlling for X2 is:

rYX3.X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX2²)(1 - rX2X3²)} = .913.
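These two calculations are easy to verify with a short Python sketch:

# Sketch: partial correlation coefficients from the simple correlations
# of the agents' example above.
import math
r_yx2, r_yx3, r_x2x3 = 0.318, 0.666, -0.400
r_yx2_x3 = (r_yx2 - r_yx3 * r_x2x3) / math.sqrt((1 - r_yx3 ** 2) * (1 - r_x2x3 ** 2))
r_yx3_x2 = (r_yx3 - r_yx2 * r_x2x3) / math.sqrt((1 - r_yx2 ** 2) * (1 - r_x2x3 ** 2))
print(round(r_yx2_x3, 3), round(r_yx3_x2, 3))   # 0.855 0.913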

It turns out that one can also calculate the partial correlation coefficients in terms of R2 and sample correlations.312

rYX2.X3² = (R² - rYX3²)/(1 - rYX3²) = (.851 - .666²)/(1 - .666²) = .732. rYX2.X3 = ±.856.

rYX3.X2² = (R² - rYX2²)/(1 - rYX2²) = (.851 - .318²)/(1 - .318²) = .834. rYX3.X2 = ±.913.

Matching the previous results, subject to determining the sign and subject to rounding.
311 See equations 4.16 and 4.17 in Pindyck and Rubinfeld.
312 See equation 4.18 in Pindyck and Rubinfeld.


R2 is the portion of the variation of Y explained by the model, involving both X2 and X3.

(R² - rYX3²)/(1 - rYX3²) = .732 is the percentage of the variation of Y which is accounted for by the part of X2 that is uncorrelated with X3.

In this case, rYX2.X3² = 73.2% of the variation of Y is accounted for by the part of X2 that is uncorrelated with X3.

In general, the square of the partial correlation coefficient measures the percentage of the variation of Y that is accounted for by the part of Xj that is uncorrelated with the other variables.

Exercise: What percentage of the variation of Y is accounted for by the part of X3 that is uncorrelated with X2?

[Solution: rYX .X3 22 = 83.4%]

Using Regressions to Estimate the Partial Correlation Coefficients:

Here is another way to estimate the partial correlation coefficients using regressions.For example, here is a calculation of rYX .X2 3

.

First run the regression of Y on just X3.

^Y = 70.24 + .1564X3.

Next run the regression of X2 on X3.

X2^ = 63.10 - .5164X3.

Eliminate the effect of X3 on Y and X2:

Y* = Y - ^Y and X*2 = X2 - ^X2.

Take the sample correlation of X*2 and Y*.
Corr[X*2, Y*] = {ΣX*2Y* - (ΣX*2)(ΣY*)/N} / √[{Σ(X*2)² - (ΣX*2)²/N}{Σ(Y*)² - (ΣY*)²/N}]
= {850.81 - (.03)(-.03)/10} / √[{6729.3 - .03²/10}{146.96 - .03²/10}] = .856.

The partial correlation coefficient of Y and X2 controlling for X3 is rYX2.X3 = .856, matching the previous result.


X2     X3    Y     Fitted Y   Y*      Fitted X2   X2*      Y*X2*    Y*^2    X2*^2
100    0     75    70.24      4.76    63.10       36.90    175.64   22.66   1361.61
90     10    78    71.80      6.20    57.94       32.06    198.67   38.39   1028.10
70     0     71    70.24      0.76    63.10       6.90     5.24     0.58    47.61
65     10    73    71.80      1.20    57.94       7.06     8.45     1.43    49.90
50     50    79    78.06      0.94    37.28       12.72    11.96    0.88    161.80
50     35    75    75.71      -0.71   45.03       4.97     -3.55    0.51    24.74
40     10    65    71.80      -6.80   57.94       -17.94   122.04   46.29   321.70
30     70    82    81.19      0.81    26.95       3.05     2.47     0.66    9.29
15     20    72    73.37      -1.37   52.77       -37.77   51.67    1.87    1426.72
10     10    66    71.80      -5.80   57.94       -47.94   278.22   33.69   2297.86

Sum                           -0.03                0.03    850.81   146.96  6729.3
Avg.                          -0.003               0.003

One can calculate rYX3.X2 in a similar manner:

1. Run the regression of Y on just X2.

2. Run the regression of X3 on X2.

3. Eliminate the effect of X2 on Y and X3: Y* = Y - ^Y and X*3 = X3 - ^X3.

4. rYX3.X2 is the simple correlation of Y* and X*3.

Exercise: For the agent’s regression, use this technique to calculate rYX .X3 2.

[Solution: Regression of Y on just X2: ^Y = 70.59 + .0578X2.

^Y = 76.37, 75.80, 74.64, 74.35, 73.48, 73.48, 72.91, 72.33, 71.46, 71.17.

Y* = Y - ^Y = -1.37, 2.20, -3.64, -1.35, 5.52, 1.52, -7.91, 9.67, 0.54, -5.17.

Regression of X3 on just X2: X3^ = 37.60 - .3096X2.

X3^ = 6.64, 9.73, 15.93, 17.48, 22.12, 22.12, 25.22, 28.31, 32.96, 34.50.

X*3 = X3 - X3^ = -6.64, 0.27, -15.93, -7.48, 27.88, 12.88, -15.22, 41.69, -12.96, -24.50

rYX .X3 2 = Corr[Y* , X*3 ] = .913, matching the result obtained previously.]


Multiple Regression with More Than Two Independent Variables:*

With three independent variables, one could calculate rYX .X X2 3 4in a similar manner:

1. Run the regression of Y on just X3 and X4.

2. Run the regression of X2 on X3 and X4.

3. Eliminate the effect of X3 and X4 on Y and X2:

Y* = Y - ^Y and X*2 = X2 - X2

^ .

4. rYX .X X2 3 4 is the simple correlation of Y* and X*2.

Relation to the Standardized Coefficients:*

For the three variable model, we had:β2

* = ( rYX2 - rYX3

rX X2 3) /(1 - rX X2 3

2) and β3* = ( rYX3 - rYX2

rX X2 3) /(1 - rX X2 3

2).

rYX .X2 3 = ( rYX2

- rYX3rX X2 3

)/√(1 - rYX32)(1 - rX X2 3

2).

rYX .X3 2 = ( rYX3

- rYX2rX X2 3

)/√(1 - rYX22)(1 - rX X2 3

2).

The numerators of β2* and rYX .X2 3

are the same, as are those of β3* and rYX .X3 2.

β2* = rYX .X2 3

√(1 - rYX32)/(1 - rX X2 3

2).

β3* = rYX .X3 2√(1 - rYX2

2)/(1 - rX X2 32).


Problems:

Use the following information for the next two questions:• The correlation of X2 and X3 is -.2537.• The correlation of X2 and Y is .7296.• The correlation of X3 and Y is .3952.

38.1 (2 points) For the model Y = β1 + β2X2 + β3X3 + ε, determine rYX .X2 3, the partial

correlation of Y with X2.(A) 0.85 (B) 0.88 (C) 0.90 (D) 0.93 (E) 0.95

38.2 (2 points) For the model Y = β1 + β2X2 + β3X3 + ε, determine rYX .X3 2, the partial

correlation of Y with X3.(A) 0.85 (B) 0.88 (C) 0.90 (D) 0.93 (E) 0.95

Use the following information for the next two questions:• For the model Y = β1 + β2X2 + β3X3 + ε, R2 = 0.90.• The correlation of X2 and Y is 0.60.• The correlation of X3 and Y is 0.40.

38.3 (2 points) Determine the absolute value of the partial correlation of Y with X2.(A) 0.86 (B) 0.88 (C) 0.90 (D) 0.92 (E) 0.94

38.4 (2 points) Determine the absolute value of the partial correlation of Y with X3.(A) 0.86 (B) 0.88 (C) 0.90 (D) 0.92 (E) 0.94

38.5 (6 points) For the multiple regression model Y = β1 + β2X2 + β3X3 + β4X4 + β5X5 + ε,you are given:Independent Fitted Partial Standardized Variable Slope Elasticity Correlation Coefficient CoefficientX2 0.00875 .649 .867 .911X3 -1.927 -.337 -.471 -.395X4 -3444 -.062 -.561 -.537X5 2093 .271 .776 .390Briefly interpret each of these values.


38 .6 (Course 120 Sample Exam #3, Q.5) (2 points) You fit a multiple linear regression function relating Y with X2 and X3. The simple correlation coefficients are:rX X2 3

= -.878, rYX2= .970, rYX3

= -0.938.

Calculate rYX .X3 2, the partial correlation of Y with X3.

(A) -0.7 (B) -0.1 (C) 0.5 (D) 0.9 (E) 1.0

38.7 (2 points) In the previous question, calculate rYX .X2 3, the partial correlation of Y with X2.

(A) .86 (B) .88 (C) .90 (D) .92 (E) .94

38 .8 (Course 4 Sample Exam, Q.5) (2.5 points) For the multiple regression model

Yi = β1 + β2X2i + β3X3i + β4X4i + εi,you are given:Independent Partial Standardized ElasticityVariable Correlation Coefficient Coefficient X2 0.64 0.50 0.20X3 -0.04 -0.01 -0.01X4 0.70 0.40 0.60Which of the following is implied by this model?(A) 16% of the variance of Y not accounted for by X2 and X3 is accounted for by X4.(B) An increase of 1 standard deviation in X2 will lead to an increase of 0.64 standarddeviations in Y.(C) An increase of 1% in X2 will lead to an increase of 0.20% in Y.(D) An increase of 1 unit in X3 will lead to a decrease of 0.04 units in Y.(E) X4 is a more important determinant of Y than X2 is.

38.9 (4, 11/02, Q.12) (2.5 points) For the three variables Y, X2 and X3, you are given the following sample correlation coefficients:rYX2

= 0.6

rYX3 = 0.5

rX X2 3 = 0.4

Calculate rYX .X2 3, the partial correlation coefficient between Y and X2.

(A) 0.50 (B) 0.55 (C) 0.58 (D) 0.64 (E) 0.73

38.10 (4, 11/04, Q.11) (2.5 points) For the model Y = β1 + β2X2 + β3X3 + ε, you are given:(i) rYX2

= 0.4

(ii) rYX .X3 2 = -0.4.

Determine R2.(A) 0.03 (B) 0.16 (C) 0.29 (D) 0.71 (E) 0.84


Section 39, Regression Diagnostics*313

There are additional items one can look at in order to examine a fitted model. This section will discuss three of these: studentized residuals, DFBETAS, and Cook’s D.

An Example of a Regression:

Take the set of 14 observations:
X: 1.3 2.0 2.7 3.3 3.7 4.0 4.7 5.0 5.3 5.7 6.0 6.3 6.7 7.0
Y: 2.3 2.8 2.2 3.8 1.7 2.8 3.2 1.8 3.5 3.4 3.2 3.0 5.9 3.9

A linear regression was fit to this data:

^α = 1.6042. sα = 0.6952. tα = 2.307.

^β = 0.3303. sβ = 0.1430. tβ = 2.310.

s² = 0.838. R² = 0.308. R̄² = 0.250.
TSS = 14.5293. RSS = 4.4709. ESS = 10.0584.

Here is a graph of the residuals:


We observe that some of the absolute values of the residuals are large, for example at X = 6.7 the residual is 2.083.

313 See Section 7.4 of Pindyck and Rubinfeld, not on the syllabus. For a more complete discussion, see for example Chapter 8 of Applied Regression Analysis by Draper and Smith.


Studentized Residuals:314

One way to judge the size of the residuals is to “studentize” them.

Let s(i) be the standard error of the regression excluding observation i.
Excluding the 13th observation, (6.7, 5.9), the fitted regression is:
^α = 2.031. sα = 0.5132. ^β = 0.1964. sβ = 0.1094. s² = 0.4310.

Let H = X(X’X)-1X’.315 For the original regression, h13,13 = 0.1842.

Then as discussed in a previous section, the covariance matrix of the residuals is: (I - H)σ2.

Thus an estimate of the variance of ^εi is (1 - hii)s².

Let the studentized residual be: ^εi* = ^εi / {s(i)√(1 - hii)}.316

Therefore, the studentized residual corresponding to the 13th observation is:
2.0827/√{(.4310)(1 - .1842)} = 3.512.

Here is a graph of the studentized residuals:317


314 Also called Externally Studentized Residuals, in order to distinguished them from Internally Studentized Residuals, also called Standardized Residuals. See Section 8.1 of Draper and Smith. 315 X is the design matrix. As discussed previously H is called the hat matrix. See Section 8.1 of Draper and Smith.316 See equation 8.1.18 in Draper and Smith. This seems to differ somewhat from Equation 7.16 in Pindyck and Rubinfeld.317 Many regression software packages will calculate studentized residuals. I used Mathematica.


The studentized residuals have a t-distribution with n - k - 1 degrees of freedom.318

For this example, the studentized residuals have a t-distribution with 14 - 2 - 1 = 11 degrees of freedom. The 5% critical value is 2.201, while the 1% critical value is 3.106. Thus the studentized residual corresponding to the 13th observation of 3.512 is significantly large. One could treat the 13th observation, (6.7, 5.9), as an “outlier”.
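A Python sketch of this calculation (numpy is assumed to be available; each observation is dropped by brute force) reproduces the studentized residual of about 3.51 for the 13th observation:

# Sketch: externally studentized residuals for the 14 point example above.
import numpy as np
x = np.array([1.3, 2.0, 2.7, 3.3, 3.7, 4.0, 4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7, 7.0])
y = np.array([2.3, 2.8, 2.2, 3.8, 1.7, 2.8, 3.2, 1.8, 3.5, 3.4, 3.2, 3.0, 5.9, 3.9])
n, k = len(y), 2
X = np.column_stack([np.ones_like(x), x])              # design matrix
H = X @ np.linalg.inv(X.T @ X) @ X.T                   # hat matrix
resid = y - H @ y
stud = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                           # drop observation i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    s2_i = np.sum((y[keep] - X[keep] @ b_i) ** 2) / (n - 1 - k)   # s(i)^2
    stud[i] = resid[i] / np.sqrt(s2_i * (1 - H[i, i]))
print(round(stud[12], 3))                              # about 3.512 for the 13th point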

An outlier is an observation that is far from the fitted least squares line. One can compare studentized residuals to the t-distribution in order to spot outliers.

The presence of one or more outliers could be due to an inaccurate or inappropriate model. These outliers could also just be unusual observations, which occur from time to time due to random fluctuation. Importantly, these outliers could indicate data errors; i.e., the observations could have been either incorrectly measured or recorded.319

If a large percentage of studentized residuals had a large absolute value compared to the critical values of the t-distribution, that would call into question the assumption of Normally distributed errors.

DFBETAS:320

Let ^β(i) be the estimated slope of the regression excluding observation i.

Let ^β be the estimated slope from the original regression including all observations.

The numerator of DFBETASi is ^β -

^β(i), the difference between the two estimates of the slope.

Let s(i) be the standard error of the regression excluding observation i.Recall that in matrix form, the variance-covariance matrix of the estimated coefficients is:

Var[^β] = s2(X’X)-1.

The denominator of DFBETASi is an estimate of the standard error of ^β(i):

s(i) √(X’X)-12,2, where X is the design matrix including all of the observations.

DFBETASi = {^β - ^β(i)} / (s(i) √[(X’X)-1]2,2).321

318 Assuming the errors are Normally Distributed.319 Data errors are quite common in actuarial work. An actuary should always be very concerned about the quality of the data on which he is relying.320 See Section 7.4.2 of Pindyck and Rubinfeld. Also called Best Fit Parameter Deltas.There are similar DFFITS, which measure the influence of a single observation on its corresponding predicted value of the dependent variable.321 Differs somewhat from Equation 7.17 in Pindyck and Rubinfeld.


For the example, ^β = 0.3303.

            ( 0.576676    -0.111043 )
(X’X)-1 =   (-0.111043     0.0244051)

Excluding the 13th observation, ^β(13) = 0.1964, with s(13)² = 0.4310.
DFBETAS13 = (.3303 - .1964)/√{(.4310)(.0244051)} = 1.306.

DFBETAS are measuring in units of standard errors the change in the estimated slope when a single observation is included rather than excluded from the regression.

Here is a graph of the DFBETAS:322

The DFBETAS corresponding to excluding the 13th observation, (6.7, 5.9), has a much larger absolute value than the others. The 13th observation has a large effect on the estimated slope.

One rule of thumb, is that one should be concerned when the absolute value of a DFBETAS is larger than 2/√N. Since 1.306 > 2/√14 = 0.534, the 13th observation, (6.7, 5.9), has a large effect on the fitted slope.
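A Python sketch of the DFBETAS calculation for this example (numpy assumed available) gives about 1.31 for the 13th observation, well above the 2/√N threshold:

# Sketch: DFBETAS for the slope in the 14 point example above.
import numpy as np
x = np.array([1.3, 2.0, 2.7, 3.3, 3.7, 4.0, 4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7, 7.0])
y = np.array([2.3, 2.8, 2.2, 3.8, 1.7, 2.8, 3.2, 1.8, 3.5, 3.4, 3.2, 3.0, 5.9, 3.9])
n, k = len(y), 2
X = np.column_stack([np.ones_like(x), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y                               # full-sample coefficients
dfbetas = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    s2_i = np.sum((y[keep] - X[keep] @ b_i) ** 2) / (n - 1 - k)
    dfbetas[i] = (beta[1] - b_i[1]) / np.sqrt(s2_i * XtX_inv[1, 1])
print(round(dfbetas[12], 3), round(2 / np.sqrt(n), 2))  # about 1.31 vs. 0.53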

When the absolute value of a DFBETAS is larger than 2/√√√√N, then the corresponding observation has a large effect on the estimated slope.

In a similar manner as was done here for the two variable regression, one can define DFBETAS for each coefficient in a multiple regression:

DFBETASj(i) = {^βj - ^βj(i)}/(s(i) √[(X’X)-1]jj).

322 Many regression software packages will calculate DFBETAS. I used Mathematica.


Cook’s D:323

Since in deviations form, ^β = ΣxiYi/Σxi², those observations for which X is far from X̄ have more effect on the fitted slope. Least squares estimation is sensitive to observations far from the means of the independent variables.

Therefore, an observation is influential in fitting the model if its value of X is far from X̄ and it is also an outlier. One way to spot influential points is via Cook’s D.

Let ^β(i) be the vector of fitted coefficients, excluding observation i.

Let ^Y(i) = X^β(i), the fitted values, when observation i is excluded from the regression.

The numerator of Cook’s D is the squared distance between the fitted values with and without the ith observation: {^Y - ^Y(i)}’{^Y - ^Y(i)}.

The denominator of Cook’s D is a scaling factor: k s².

Di = {^Y - ^Y(i)}’{^Y - ^Y(i)}/(k s²).324

Cook’s D is nonnegative.

For the example, for the original regression, the fitted values are:

^Y = 1.60417 + 0.330323X = (2.03359, 2.26482, 2.49604, 2.69424, 2.82637, 2.92546, 3.15669, 3.25579, 3.35488, 3.48701, 3.58611, 3.6852, 3.81733, 3.91643).

With the 13th observation, (6.7, 5.9), excluded from the regression:325

^Y(13) = 2.03133 + 0.196363X = (2.2866, 2.42406, 2.56151, 2.67933, 2.75787, 2.81678, 2.95424, 3.01315, 3.07205, 3.1506, 3.20951, 3.26842, 3.34696, 3.40587).

{^Y - ^Y(13)}’{^Y - ^Y(13)} = (2.03359 - 2.2866)² + ... + (3.91643 - 3.40587)² = 1.20087.

k = 2 variables (slope plus intercept). For the original regression, s² = 0.838197.

D13 = {^Y - ^Y(13)}’{^Y - ^Y(13)}/(k s²) = 1.20087/{(2)(0.838197)} = 0.716.

323 See Section 8.3 of Applied Regression Analysis by Draper and Smith, not on the syllabus. 324 See Equation 8.3.1 in Applied Regression Analysis by Draper and Smith. 325 We include a fitted value for X = 6.7, even though this observation was not use to fit the parameters.
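A Python sketch of Cook’s D for each observation of this example, computed from the change in the vector of fitted values as defined above (numpy assumed available):

# Sketch: Cook's D for the 14 point example, from the leave-one-out fitted values.
import numpy as np
x = np.array([1.3, 2.0, 2.7, 3.3, 3.7, 4.0, 4.7, 5.0, 5.3, 5.7, 6.0, 6.3, 6.7, 7.0])
y = np.array([2.3, 2.8, 2.2, 3.8, 1.7, 2.8, 3.2, 1.8, 3.5, 3.4, 3.2, 3.0, 5.9, 3.9])
n, k = len(y), 2
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta
s2 = np.sum((y - fitted) ** 2) / (n - k)               # about 0.838
D = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    fitted_i = X @ b_i                                  # fitted values at all 14 X's
    D[i] = np.sum((fitted - fitted_i) ** 2) / (k * s2)
print(round(D[12], 3))                                  # about 0.716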


Here is a graph of the values of Cook’s D:326


The value of Cook’s D for the 13th observation, (6.7, 5.9), is much larger than the others. The 13th observation is very influential.

In general, a large value of Cook’s D relative to the other values, indicates an influential observation.

Cook’s D may be written in other ways.

Di = {^β - ^β(i)}’ X’X {^β - ^β(i)}/(k s²).327

Di = ^εi² hii /{k s² (1 - hii)²}.328

326 Many regression software packages will calculate Cook’s D. I used Mathematica.327 See Equation 8.3.2 in Applied Regression Analysis by Draper and Smith. 328 See Equation 8.3.3 in Applied Regression Analysis by Draper and Smith. hii is the diagonal element of the hat matrix.


Problems:

Use a regression software package and the following information for the next 6 questions:One has the following 21 observations, (Xi, Yi):(15, 95), (26, 71), (10, 83), (9, 91), (15, 102), (20, 87), (18, 93), (11, 100), (8, 104), (20, 94), (7, 113), (9, 96), (10, 83), (11, 84), (11, 102), (10, 100), (12, 105), (42, 57), (17, 121), (11, 86), (10, 100).

39.1 (2 points) Draw a scatterplot of this data.

39.2 (2 points) Fit a least squares regression: Y = α + βX + ε.Add the fitted line to your scatterplot.

39.3 (2 points) Graph the residuals of this regression.

39.4 (2 points) Graph the studentized residuals of this regression.

39.5 (2 points) Graph the DFBETAS of this regression.

39.6 (2 points) Graph the values of Cook’s D of this regression.


Section 40, Stepwise Regression*329

In model building, it is important to choose which independent variables to include. On the one hand, we do not want to exclude important explanatory variables. On the other hand, we wish to keep the model as simple as possible.

Stepwise regression is a technique that can be employed when there are many possible explanatory variables.

One chooses one variable to use first, usually the independent variable with the largest absolute value of its correlation with the dependent variable. Then at each stage one adds the independent variable with the largest absolute value of its partial correlation coefficient with respect to all of the variables already included in the model.

At each stage, one could use the F-Test to decide whether a variable should be eliminated.
One would proceed to add (and possibly eliminate) variables step by step, until no more improvement in R̄² is possible. Then the final model that results contains the set of independent variables that “work best”.

It should be noted that since one is choosing variables from a large set, specifically to get a better fit, one can not then apply the t-test and F-test to the result of the stepwise regression process in order to test hypotheses in the usual manner. However, if one then collects additional similar data, one can fit to this additional data a model of the form determined to be best by stepwise regression, and then apply the t-test and F-test in order to test hypotheses in the usual manner.

329 See page 101 of Pindyck and Rubinfeld. For a more complete explanation, see for example Section 15.2 of Applied Regression Analysis by Draper and Smith.


Section 41, Stochastic Explanatory Variables*330

For example let us assume one regresses the number of claims for each private passenger automobile insured one year versus the number of claims the subsequent year.331 Then both Xi, the number of claims this year, and Yi, the number of claims next year, are stochastic variables; i.e., random variables.

Examples of stochastic variables in insurance include: number of claims, frequency, severity, aggregate loss, loss ratio, and pure premium.

In the classical linear regression model, it was assumed that the independent variable(s) were deterministic rather than random. If we relax this assumption and allow for stochastic independent variables, then provided we make some additional assumptions, many results still hold.

Assume in addition that:
1. The distribution of each independent variable is independent of the true regression parameters.
2. The distribution of each independent variable is independent of the errors of the model.

Then conditional on the given values of the independent variables, all the basic properties of the least squares estimators continue to hold. For example, the estimates of Ordinary Least Squares are conditionally unbiased. However, looked at unconditionally not all these properties continue to hold.

Unconditionally, the Ordinary Least Squares estimator is no longer unbiased, although it is asymptotically unbiased. Ordinary Least Squares is still consistent. Ordinary Least Squares is asymptotically efficient (for very large sample sizes it has the smallest mean squared error.) The least squares estimators are the maximum likelihood estimators of the regression parameters.

Deterministic Independent Variables Stochastic Independent Variables

OLS unbiased OLS asymptotically unbiased

OLS consistent OLS consistent

OLS efficient OLS asymptotically efficient

Not Applicable OLS ⇔ maximum likelihood estimates

330 See Section 5.5 of Pindyck and Rubinfeld.331 See for example, “A Graphical Illustration of Experience Rating Credibilities,” by Howard C. Mahler, PCAS 1998.


Section 42, Generalized Least Squares*332

Generalized Least Squares is a generalization of ordinary least squares regression.Weighted Least Squares, discussed previously, is a special case of Generalized Least Squares.

Variance-Covariance Matrix of the Errors:

Assume we have four observations, then the variance-covariance matrix of the errors is:

(Var[ε1] Cov[ε1, ε2] Cov[ε1, ε3] Cov[ε1, ε4])

(Cov[ε2, ε1] Var[ε2] Cov[ε2, ε3] Cov[ε2, ε4])

(Cov[ε3, ε1] Cov[ε3, ε2] Var[ε3] Cov[ε3, ε4])

(Cov[ε4, ε1] Cov[ε4, ε2] Cov[ε4, ε3] Var[ε4])

With N observations, the variance-covariance matrix is an N by N symmetric matrix with entries Cov[εi, εj].

333 In certain cases discussed previously, this variance-covariance matrix of the errors has a special form.

Classical Linear Regression Model:

In the classical linear regression model, among the assumptions are that: Var[εi] = σ2 for all i (homoscedasticity), and εi and εj independent for i ≠ j.

Therefore, Cov[εi, εi] = σ² and Cov[εi, εj] = 0 for i ≠ j. ⇒ The variance-covariance matrix of the errors is σ²I.

For example, if σ = 3, then for the classical linear regression model, with 4 observations, the variance-covariance matrix of the errors is:

(9 0 0 0)(0 9 0 0)(0 0 9 0)(0 0 0 9)

Heteroscedasticity:

As discussed previously, in the case of heteroscedasticity Var[εi] = σi2, however we still assumed independent errors. Therefore, the variance-covariance matrix of the errors is still diagonal, but the entries along the diagonal are not all equal.

332 See Appendix 6.1 of Pindyck and Rubinfeld, not on the Syllabus.333 Note that the dimension of the variance-covariance matrix only depends on the number of observations, not the number of independent variables.

HCMSA-F06-Reg-J, Mahler’s Guide to Regression, 7/12/06, Page 370

Page 381: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

As an example of heteroscedasticity with 4 observations, the variance-covariance matrix of the errors could be:

(9 0 0 0)(0 11 0 0)(0 0 14 0)(0 0 0 20)

Serial Correlation:

As discussed previously, when there is serial correlation, the errors are no longer independent. In the case of first order serial correlation, Cov[εi, εj] = σ2ρ|i - j|, -1 < ρ < 1.

For four observations, the variance-covariance matrix would look like:

(1    ρ    ρ²   ρ³)
(ρ    1    ρ    ρ²)   σ².
(ρ²   ρ    1    ρ )
(ρ³   ρ²   ρ    1 )

General Case:

These are three important special cases of a general situation in which the variance-covariance matrix of the errors, σ²Ω, can be any positive definite matrix.334 We assume Ω is known, while σ² is usually unknown. We are usually given the covariance matrix up to a proportionality constant.

If X is the design matrix and Y is the vector of observations of the dependent variable,then the fitted parameters using Generalized Least Squares are:335

~β = (X’Ω−1X)-1 X’Ω−1Y.

The variance-covariance matrix of the fitted parameters is:336

Cov[~β ] = σ2(X’Ω−1X)-1.

An unbiased estimator of σ2 is given by:337

( ε ‘ Ω−1 ε )/(N-k).

334 A positive definite matrix is a symmetric, square matrix such that xTAx >0 for every x ≠ 0. All variance-covariances matrices are positive definite.335 See Equation A6.8 in Pindyck and Rubinfeld. In the classical regression model this reduces to the fitted

parameters being: (X’X)-1 X’Y, as discussed previously.336 See Equation A6.9 in Pindyck and Rubinfeld. In the classical regression model this reduces to the

variance-covariance matrix of the fitted parameters being: σ2(X’X)-1, as discussed previously.337 See Equation A6.10 in Pindyck and Rubinfeld. In the classical regression model this reduces to ( ε ‘ ε )/(N-k).
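A Python sketch of these Generalized Least Squares formulas (numpy assumed available; the design matrix, observations, and Ω below are made-up illustrative values, not taken from any example in the text):

# Sketch of the GLS estimator, its covariance matrix, and the estimate of sigma^2.
import numpy as np
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])                       # illustrative design matrix
Y = np.array([2.0, 3.5, 6.0])                    # illustrative observations
rho = 0.5
Omega = np.array([[1.0, rho, rho ** 2],
                  [rho, 1.0, rho],
                  [rho ** 2, rho, 1.0]])         # known up to the factor sigma^2
Oinv = np.linalg.inv(Omega)
beta = np.linalg.inv(X.T @ Oinv @ X) @ X.T @ Oinv @ Y      # GLS estimates
resid = Y - X @ beta
N, k = X.shape
s2 = (resid @ Oinv @ resid) / (N - k)                      # estimate of sigma^2
cov_beta = s2 * np.linalg.inv(X.T @ Oinv @ X)              # covariance of the estimates
print(np.round(beta, 4), round(float(s2), 4))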


Unfortunately without assumptions as to the form of Ω, one can not estimate Ω solely from the observations.

Analysis of Variance, GLS:*

For Generalized Least Squares, TSS = Y’Ω−1Y, RSS = Y’Ω−1X(X’Ω−1X)-1X’Ω−1Y, and
ESS = TSS - RSS = Y’Ω−1Y - Y’Ω−1X(X’Ω−1X)-1X’Ω−1Y.

OLS versus GLS:

If Ω = σ2I, then Generalized Least Squares (GLS) reduces to Ordinary Least Squares (OLS).

If Ω is not σ2I, then the use of Ordinary Least Squares rather than Generalized Least Squares would result in unbiased but inefficient estimates; the Generalized Least Squares estimates would have a smaller variance than the Ordinary Least Squares estimates. In addition, Ordinary Least Squares would result in a biased estimate of the variance-covariance matrix.

Transformation of Equations:*

Let Ω be proportional to the variance-covariance matrix. Then since the variance-covariance matrix is positive definite, so is Ω. Therefore, there exists a matrix H, such that H’H = Ω-1.338 339 In the derivation of the equations for Generalized Least Squares, the original equation is transformed by multiplying everything by this matrix H.

If one has heteroscedasticity, then if for example we have only three observations:

      (σ1²   0     0  )           (1/σ1   0      0   )
Ω =   (0     σ2²   0  )      H =  (0      1/σ2   0   )
      (0     0     σ3²)           (0      0      1/σ3)

If one has first order serial correlation, then if for example we have only three observations:

      (1    ρ    ρ²)              (√(1-ρ²)    0    0)
Ω =   (ρ    1    ρ )         H =  (-ρ          1    0)  / √(1-ρ²)
      (ρ²   ρ    1 )              (0          -ρ    1)

338 H' is the transpose of H; H' has the rows and columns of H reversed.
339 The Choleski Square Root Method is one method of getting C, such that CC' = Ω. Then if H = C⁻¹, H'H = Ω⁻¹. See for example Fundamentals of Numerical Analysis, by Stephen J. Kellison. The Choleski Square Root Method applies to any positive definite matrix.
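Here is a short Python sketch (my own illustration, with made-up numbers) of the transformation idea: multiply Y and the design matrix by H and run ordinary least squares on the transformed data; the result matches the direct GLS formula.

import numpy as np

# Heteroscedastic illustration with three observations; the sigma_i are assumed known.
sigmas = np.array([3.0, 4.0, 5.0])
Omega = np.diag(sigmas ** 2)
H = np.diag(1.0 / sigmas)          # for the diagonal case, H'H = Omega^-1

X = np.column_stack([np.ones(3), np.array([1.0, 2.0, 3.0])])  # design matrix
Y = np.array([2.0, 5.0, 7.0])                                  # made-up observations

# OLS on the transformed equation HY = HX b + H eps
HX, HY = H @ X, H @ Y
b_transformed, *_ = np.linalg.lstsq(HX, HY, rcond=None)

# Direct GLS formula for comparison
Omega_inv = np.linalg.inv(Omega)
b_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ Y)

print(np.allclose(b_transformed, b_gls))  # True: the two approaches agree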


Problems:

42.1 (2 points) If N = 5, σ² = 100 and ρ = 0.6, with first-order serial correlation, what is the covariance matrix of the errors?

Use the following information for the next three questions:
(i) Yi = α + βXi + εi, with Var(εi) = (Xi/2)²
(ii)
 i    Xi    Yi
 1     1     8
 2     2     5
 3     3     3
 4     4    -4

42.2 (2 points) What is the covariance matrix of the errors?

42.3 (3 points) Use the methods of generalized least squares to fit this model.

42.4 (2 points) What are the variances and covariances of the fitted parameters?

Use the following information for the next three questions:
(i) Yi = α + βXi + εi
(ii)
 i    Xi    Yi
 1     1     3
 2     4     8
 3     9    15
(iii) The covariance matrix of the errors is σ²Ω, where σ² is unknown and

Ω = (10   15   20)          Ω⁻¹ = (119   -34   -17)
    (15   40   25)                (-34    20     2)  / 340
    (20   25   90)                (-17     2     7)

42.5 (4 points) Use the methods of generalized least squares to fit this model.

42.6 (3 points) Estimate σ2.

42.7 (3 points) What are the variances and covariances of the fitted parameters?


Section 43, Nonlinear Estimation340

We previously discussed linear models and models that can be transformed into linear models by changing variables.

Exercise: Fit via least squares lnY = βX to the following three observations.
X: 1   2   4
Y: 2   3   5
[Solution: β̂ = ΣXi lnYi / ΣXi² = 9.32812/21 = 0.44420.]

However, there are other models that are inherently nonlinear. It is harder to estimate the parameters for such nonlinear models, than for linear models.

Examples of nonlinear models:
Y = α0 + α1 X1^β1 + α2 X2^β2 + ε.
Y = α1 exp[X1β1] + α2 exp[X2β2] + ε.

One way to fit a nonlinear model is by minimizing the squared errors.341

Determining the fitted parameters for a nonlinear model is more difficult than for the linear case.

Sum of Squared Errors:

For the model Y = e^(βX), the sum of squared errors is: Σ(Yi - exp[βXi])².

Exercise: For the model Y = e^(βX), determine the sum of squared errors for β = 0.5 and the following three observations.
X: 1   2   4
Y: 2   3   5
[Solution: (2 - e^0.5)² + (3 - e^1)² + (5 - e^2)² = 5.910.]

340 See Section 10.1 in Pindyck and Rubinfeld.
341 Another technique is via maximum likelihood, which is covered on Exam 4/C. See Section 10.2 in Pindyck and Rubinfeld, not on the syllabus of this exam.


Here is a graph of the sum of squared errors as a function of β:

[Figure: the sum of squared errors plotted against β for 0.35 ≤ β ≤ 0.5; the vertical scale runs from about 1 to 6, and the curve has its minimum near β = 0.41.]

Solving numerically, the smallest sum of squared errors corresponds to β = 0.411867.

Note this does not match β = 0.44420, the least squares fit for lnY = βX obtained previously. One can convert Y = e^(βX) to a linear form, lnY = βX, by taking logs of both sides. However, minimizing the squared errors of this linear model is not equivalent to minimizing the squared errors of the original nonlinear model.
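This comparison is easy to reproduce numerically; here is a short Python sketch (my own illustration) that minimizes the nonlinear sum of squared errors over β and also computes the fit to the linearized model.

import numpy as np
from scipy.optimize import minimize_scalar

X = np.array([1.0, 2.0, 4.0])
Y = np.array([2.0, 3.0, 5.0])

# Sum of squared errors for the nonlinear model Y = exp(beta * X)
def sse(beta):
    return np.sum((Y - np.exp(beta * X)) ** 2)

result = minimize_scalar(sse, bounds=(0.0, 1.0), method="bounded")
print(result.x)          # approximately 0.4119

# Least squares fit to the linearized model ln(Y) = beta * X, for comparison
beta_linearized = np.sum(X * np.log(Y)) / np.sum(X ** 2)
print(beta_linearized)   # approximately 0.4442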

Normal Equations:

Just as in the linear case, one can write down a series of equations to solve for the least squares parameters, the Normal Equations, by setting the partial derivatives of the sum of squared errors equal to zero. For this example, with the model Y = e^(βX), and the data
X: 1   2   4
Y: 2   3   5

Sum of Squared Errors = S = Σ(Yi - exp[βXi])².

In order to minimize S, we set its derivative with respect to β equal to zero:[342]

0 = ∂S/∂β = -2Σ(Yi - exp[βXi]) Xi exp[βXi] = -2{(2 - e^β)e^β + 2(3 - e^(2β))e^(2β) + 4(5 - e^(4β))e^(4β)}.

⇒ 4e^(7β) - 18e^(3β) - 5e^β - 2 = 0.

342 In general, one would set equal to zero the partial derivative with respect to each of the parameters.There would be as many Normal Equations as there are parameters in the model.


Graphing the expression on the left-hand side of this Normal Equation:

[Figure: 4e^(7β) - 18e^(3β) - 5e^β - 2 plotted against β for 0.35 ≤ β ≤ 0.5; the values run from about -20 to 40, crossing zero near β = 0.41.]

Solving numerically, this Normal Equation is satisfied by β = 0.411867, matching the previous result.

Iterative Linearization Method:

Another method of solution involves approximating the nonlinear model by a linear model.343

For example, let us continue to work with the nonlinear model: Y = e^(βX).

Let f(X) = e^(bX). Then ∂f/∂b = X e^(bX).

Then we have the linear approximation: Y - f(X) ≅ (β - b)∂f/∂b = (β - b) X e^(bX).

For three observations, we have the following 3 linear equations: Yi - f(Xi) = (β - b)Xi exp[bXi]. ⇒Yi - exp[bXi] + bXi exp[bXi] = βXi exp[bXi].

We can treat the lefthand side of the above equation as a constructed dependent variable, and the portion of the righthand side multiplying β as a constructed independent variable.In other words, we can view this as the linear equation without intercept: Vi = βUi.

If for example, we take b = 0.5, then the constructed dependent variable is:Vi = Yi - exp[bXi] + bXi exp[bXi] = Yi + (.5Xi - 1)exp[.5Xi], while the constructed independent variable is: Ui = Xi exp[bXi] = Xi exp[.5Xi].

343 This is similar to the idea behind the delta method, covered on Exam 4/C.


For example, with b = 0.5, for X = 4 and Y = 5, the constructed dependent variable is: 5 + (2 - 1)e² = 12.3891, while the constructed independent variable is: 4e² = 29.5562.

X   Y   Constructed Independent Variable   Constructed Dependent Variable
1   2    1.6487                              1.1756
2   3    5.4366                              3.0000
4   5   29.5562                             12.3891

One can perform a linear regression with no intercept on the constructed variables.
β̂ = {(1.6487)(1.1756) + (5.4366)(3.0000) + (29.5562)(12.3891)}/(1.6487² + 5.4366² + 29.5562²) = 384.42/905.83 = 0.424. Now we can iterate, taking 0.424 as the new value of b.

Exercise: For b = 0.424, construct the dependent and independent variables. Then perform a linear regression with no intercept on the constructed variables.
[Solution: For X = 4 and Y = 5, the constructed dependent variable is: 5 + ((4)(0.424) - 1)e^((4)(0.424)) = 8.7947.

X   Y   Constructed Independent Variable   Constructed Dependent Variable
1   2    1.5281                              1.1198
2   3    4.6699                              2.6451
4   5   21.8084                              8.7947

β̂ = {(1.5281)(1.1198) + (4.6699)(2.6451) + (21.8084)(8.7947)}/(1.5281² + 4.6699² + 21.8084²) = 205.86/499.75 = 0.412.]

To three decimal places, we have converged to the previously determined value of β that minimizes the squared errors.344
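These iterations are easy to automate; here is a short Python sketch of the iterative linearization loop for this one-parameter example (my own illustration, not part of the original text).

import numpy as np

X = np.array([1.0, 2.0, 4.0])
Y = np.array([2.0, 3.0, 5.0])

b = 0.5  # initial guess
for _ in range(10):
    # Constructed independent variable: U_i = X_i exp(b X_i), the derivative of f with respect to b
    U = X * np.exp(b * X)
    # Constructed dependent variable: V_i = Y_i - exp(b X_i) + b X_i exp(b X_i)
    V = Y - np.exp(b * X) + b * X * np.exp(b * X)
    # Linear regression with no intercept of V on U gives the updated value of b
    b = np.sum(U * V) / np.sum(U ** 2)

print(b)  # converges to approximately 0.4119, the least squares value found above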

In general, one would have a nonlinear model with k independent variables and p parameters:Y = f(X1, X2, ..., Xk; β1, β2, ..., βp), which we want to fit by least squares.345

Given initial guesses, β1,0, β2,0, ..., βp,0, we construct the dependent variable:

Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σ βi,0 (∂f/∂βi)0,

and the p independent variables: (∂f/∂βi)0.

Then we solve for least squares values of the coefficients βi.

These solutions are used as the next guesses, βi,1.

We iterate until there is convergence.346

344 If more accuracy was desired one could perform another iteration.
345 See page 268 of Pindyck and Rubinfeld.
346 In general, this iterative linearization method may or may not converge. Convergence may depend on making a good choice of initial values for the parameters.


A More Complicated Example:*347

We are given the following 44 observations:

X:   0   0   2   2   2   2   4   4   4   4   6
Y:  49  49  48  47  48  47  46  46  45  43  45

X:   6   6   8   8   8  10  10  12  12  12  14
Y:  43  43  44  43  43  46  45  42  42  43  41

X:  14  14  16  16  16  18  18  18  20  20  22
Y:  41  40  42  40  40  41  40  41  41  40  40

X:  22  22  24  24  26  28  28  30  30  32  34
Y:  40  38  41  40  40  41  38  40  40  39  39

We wish to fit via least squares the nonlinear model: Y = α + (49 - α)e^(-βX) + ε.

The sum of squared errors is: S = Σ(Yi - α - (49 - α) exp[-βXi])².[348]

Given the data, S is a function of the parameters α and β.We wish to find the values of α and β that minimize S.

Here is a graph of the sum of squared errors, as a function of α and β:

[Figure: surface plot of the sum of squared errors S as a function of α (from 30 to 50) and β (from 0 to 0.2); S ranges from roughly 0 to 3000.]

347 Based on Example 24.3 in Applied Regression Analysis, by Draper and Smith.
348 S is just the sum of the squares of the residuals, which in a linear regression context would be called ESS.


Many software packages will numerically minimize functions such as this one.349 Given a reasonable starting place, the computer program will search for a minimum.350 In this case, the values: α = 39.0140, and β = 0.101633, produce a minimum of 50.0168.

Exercise: Use the least squares fit to predict the value of Y when X = 15.
[Solution: Y = 39.014 + (49 - 39.014)e^(-0.101633X) = 39.014 + 9.986 exp[-(0.101633)(15)] = 41.188.]
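For readers who want to reproduce this fit, here is a short Python sketch using scipy (my own illustration; a footnote to this section notes the author used Mathematica's FindMinimum).

import numpy as np
from scipy.optimize import curve_fit

X = np.array([0, 0, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 12, 12, 12, 14,
              14, 14, 16, 16, 16, 18, 18, 18, 20, 20, 22, 22, 22, 24, 24, 26, 28, 28,
              30, 30, 32, 34], dtype=float)
Y = np.array([49, 49, 48, 47, 48, 47, 46, 46, 45, 43, 45, 43, 43, 44, 43, 43, 46, 45,
              42, 42, 43, 41, 41, 40, 42, 40, 40, 41, 40, 41, 41, 40, 40, 40, 38, 41,
              40, 40, 41, 38, 40, 40, 39, 39], dtype=float)

# The nonlinear model Y = alpha + (49 - alpha) exp(-beta X)
def model(x, alpha, beta):
    return alpha + (49.0 - alpha) * np.exp(-beta * x)

params, cov = curve_fit(model, X, Y, p0=(40.0, 0.1))
print(params)                                  # approximately (39.014, 0.10163)
print(np.sum((Y - model(X, *params)) ** 2))    # minimum sum of squared errors, about 50.0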

Sum of Squared Errors = S = Σ(Yi - α - (49 - α) exp[-βXi])². The Normal Equations are:

0 = ∂S/∂α = 2Σ(Yi - α - (49 - α) exp[-βXi])(-1 + exp[-βXi]).

0 = ∂S/∂β = 2Σ(Yi - α - (49 - α) exp[-βXi])(49 - α)Xi exp[-βXi].

These two equations become:
Σ(Yi - α - (49 - α) exp[-βXi])(exp[-βXi] - 1) = 0.
Σ(Yi - α - (49 - α) exp[-βXi])Xi exp[-βXi] = 0.
One can rewrite these two equations as:
α = -Σ(Yi - 49 exp[-βXi])(exp[-βXi] - 1) / Σ(exp[-βXi] - 1)².
α = -Σ(Yi - 49 exp[-βXi])Xi exp[-βXi] / ΣXi exp[-βXi](exp[-βXi] - 1).

One could eliminate α and numerically solve for β. For example, one could graph the difference of the righthand sides of these equations; where this difference is zero is the fitted value of β, about 0.102.

[Figure: the difference of the right-hand sides of the two equations for α, plotted against β for 0.06 ≤ β ≤ 0.14; it crosses zero near β = 0.102.]

Then using either of the above equations, α is about 39.0.

349 I used the function FindMinimum in Mathematica.
350 One has to beware of finding a local rather than a global minimum. The more parameters one is trying to fit, the harder it becomes to find a global minimum.


We can also use the iterative linearization method, in order to determine the values of α and β that minimize the squared errors.

Let f(X) = a + (49 - a)e^(-bX). Then ∂f/∂a = 1 - e^(-bX), and ∂f/∂b = -(49 - a)X e^(-bX).

Then we have Y - f(X) ≅ (α - a)∂f/∂a + (β - b)∂f/∂b = (α - a)(1 - e^(-bX)) - (β - b)(49 - a)X e^(-bX).

Thus we have 44 linear equations: Yi - f(Xi) = (α - a)(1 - exp[-bXi]) + (β - b)(a - 49)Xi exp[-bXi]. ⇔
Yi - f(Xi) + a(1 - exp[-bXi]) + b(a - 49)Xi exp[-bXi] = α(1 - exp[-bXi]) + β(a - 49)Xi exp[-bXi].

The left-hand side is the constructed dependent variable:
Yi - f(Xi) + a(1 - exp[-bXi]) + b(a - 49)Xi exp[-bXi]
= Yi - a - (49 - a)exp[-bXi] + a(1 - exp[-bXi]) + b(a - 49)Xi exp[-bXi]
= Yi - 49 exp[-bXi] + (a - 49)bXi exp[-bXi].

The righthand side is α times the first constructed independent variable: (1 - exp[-bXi]),

plus β times the second constructed independent variable: (a - 49) Xi exp[-bXi].

For example, given starting values (guesses) of a = 10 and b = 0.2, for X = 0 and Y = 49,the value of the constructed dependent variable is: 49 - 49 + (10 - 49)(.2)(0) = 0.The value of the first constructed independent variable is: 1 - 1 = 0.The value of the second constructed independent variable is: (10 - 49)0 = 0.

Exercise: Given starting values (guesses) of a = 10 and b = 0.2, for X = 34 and Y = 39, determine the value of the constructed variables.
[Solution: The value of the constructed dependent variable is: 39 - 49exp[-(0.2)(34)] + (10 - 49)(0.2)(34)exp[-(0.2)(34)] = 38.650.
The value of the first constructed independent variable is: 1 - exp[-(0.2)(34)] = 0.99889.
The value of the second constructed independent variable is: (10 - 49)(34)exp[-(0.2)(34)] = -1.4769.]

Let Z be the 44 by 2 matrix, with rows given by the values of the constructed independent variables: 1 - exp[-bXi], (a - 49) Xi exp[-bXi].

        (0          0      )
Z =     (...        ...    )
        (0.99889   -1.4769 )

Let the constructed dependent variable be the vector:
Vi = Yi - 49 exp[-bXi] + (a - 49)bXi exp[-bXi], so V = (0, ..., 38.650).


Then these equations in matrix form are: V = Z(α, β)'.

The solution to these matrix equations is: (Z'Z)⁻¹Z'V = (39.7201, 0.1729796).

Thus the new values of α and β are: α1 = 39.7201, and β1 = 0.1729796.

Continuing in this manner one gets a series of values of the parameters:(39.5131, 0.0931486), (39.0293, 0.10189), (39.0142, 0.101638), (39.0140, 0.101633).The process has converged to the same fitted values α = 39.0140, and β = 0.101633, obtained previously.

In general, this iterative linearization method may or may not converge.

Damping Factors:*

It can sometimes help convergence to use a damping factor, 0 < d < 1, so that when one updates at each iteration, one only adds d times the computed difference:

For example, in the previous example, the first computed difference was (29.7201, -0.0270204). With a damping factor of d = .7, the new values of the parameters would be: (10, .2) + (.7)(29.7201, -0.0270204) = (30.8041, 0.181086). Then we would iterate as before, except at each stage only adding 0.7 times the computed difference to get the new values of the parameters.


Steepest Descent Method:*

As with the iterative linearization method, one would start with a set of initial values for the parameters. As before, S = the sum of squared errors. Define the gradient vector of S as(∂S/∂β1, ..., ∂S/∂βn). In the steepest descent method, at each stage one moves in the direction of minus the gradient vector. One determines the distance to move in this direction that produces the smallest value of S. One then iterates, using these new values of the parameters.

In this example, the gradient vector of S is:(∂S/∂α, ∂S/∂β) =

(2Σ(Yi - α - (49 - α) exp[-βXi])(-1 + exp[-βXi]), 2Σ(Yi - α - (49 - α) exp[-βXi])(49 - α)Xi exp[-βXi]).

For example, for initial values of α = 40 and β = 0.1, the gradient vector of S is: (45.5366, -1437.91).

This gradient vector points in the direction of steepest ascent of S at α = 40 and β = 0.1. Minus this gradient vector points in the direction of steepest descent; this direction is:
(-45.5366, 1437.91)/√(45.5366² + 1437.91²) = (-0.0316527, 0.999499).

Therefore, the new values of α and β will be: (40, 0.1) + λ(-0.0316527, 0.999499), where λ is chosen to minimize the value of S along that line. In this case, that value is λ = 0.02935, which results in S = 55.9276 rather than S = 73.7519.

The new values of the parameters are: α = 40 + (0.02935)(-0.0316527) = 39.9991, and β = 0.1 + (0.02935)(0.999499) = 0.129335.

The gradient vector at α = 39.9991 and β = 0.129335 is: (13.6431, 0.504235). Therefore, the steepest descent is in the direction: (-13.6431, -0.504235)/√(13.6431² + 0.504235²) = (-0.999318, -0.0369338).

It turns out that λ = 0.6699 minimizes S. The new values of the parameters are: α = 39.9991 + (0.6699)(-0.999318) = 39.3297, and β = 0.129335 + (0.6699)(-0.0369338) = 0.104815.

We would continue in this manner, until we converged on the minimum.351

351 The steepest descent method can take a large number of iterations to converge. It can be very important to choose a good starting point.See for example, Numerical Recipes, The Art of Scientific Computing, by Press, et. al.
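Here is a short Python sketch of the steepest descent iteration (my own illustration), reusing the X and Y arrays defined in the curve_fit sketch above and using a crude grid search for the step size λ.

import numpy as np

# X and Y below are the 44-observation arrays defined in the curve_fit sketch above.
def sse(alpha, beta, X, Y):
    return np.sum((Y - alpha - (49.0 - alpha) * np.exp(-beta * X)) ** 2)

def gradient(alpha, beta, X, Y):
    resid = Y - alpha - (49.0 - alpha) * np.exp(-beta * X)
    dS_dalpha = 2.0 * np.sum(resid * (-1.0 + np.exp(-beta * X)))
    dS_dbeta = 2.0 * np.sum(resid * (49.0 - alpha) * X * np.exp(-beta * X))
    return np.array([dS_dalpha, dS_dbeta])

def steepest_descent(X, Y, alpha=40.0, beta=0.1, iterations=200):
    params = np.array([alpha, beta])
    for _ in range(iterations):
        g = gradient(params[0], params[1], X, Y)
        direction = -g / np.linalg.norm(g)            # unit vector of steepest descent
        # Crude line search: try a grid of step sizes and keep the one giving the smallest S.
        lambdas = np.linspace(0.0, 1.0, 1001)
        values = [sse(params[0] + lam * direction[0], params[1] + lam * direction[1], X, Y)
                  for lam in lambdas]
        params = params + lambdas[int(np.argmin(values))] * direction
    return params

# steepest_descent(X, Y) slowly approaches the least squares values (39.014, 0.1016).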


R²:

One can compute R2 for a fitted nonlinear model in the same manner as one did for a fitted linear model.

For this example, the fitted parameters were: α = 39.0140 and β = 0.101633.

Ŷ = 39.0140 + 9.9860 e^(-0.101633X).

X = (0, 0, 2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 8, 8, 8, 10, 10, 12, 12, 12, 14, 14, 14, 16, 16, 16, 18, 18, 18, 20, 20, 22, 22, 22, 24, 24, 26, 28, 28, 30, 30, 32, 34).

Y = (49, 49, 48, 47, 48, 47, 46, 46, 45, 43, 45, 43, 43, 44, 43, 43, 46, 45, 42, 42, 43, 41, 41, 40, 42, 40, 40, 41, 40, 41, 41, 40, 40, 40, 38, 41, 40, 40, 41, 38, 40, 40, 39, 39).

^Y = (49, 49, 47.1632, 47.1632, 47.1632, 47.1632, 45.6642, 45.6642, 45.6642, 45.6642, 44.441, 44.441, 44.441, 43.4428, 43.4428, 43.4428, 42.6281, 42.6281, 41.9634, 41.9634, 41.9634, 41.4209, 41.4209, 41.4209, 40.9781, 40.9781, 40.9781, 40.6169, 40.6169, 40.6169, 40.322, 40.322, 40.0814, 40.0814, 40.0814, 39.8851, 39.8851, 39.7249, 39.5941, 39.5941, 39.4874, 39.4874, 39.4003, 39.3293).

The sum of squared differences is: (49 - 49)² + ... + (39 - 39.3293)² = 50.02.

Ȳ = 42.5. Σ(Yi - Ȳ)² = 395.

R² = 1 - 50.02/395 = 0.873.

Evaluating the Nonlinear Fit:*

Unlike the linear case, the residuals cannot be used to get an unbiased estimator of the variance of a nonlinear model. However, one can approximate this variance by that of the linear regression at the final stage of the Iterative Linearization Method.

One could not directly apply the t-test or F-Test to the fitted parameters of a nonlinear model.However, one could examine the results of such tests for the linear regression at the final stage of the Iterative Linearization Method.

Mean Squared Errors of Forecasts:*

One cannot use the formula for the mean squared error of forecasts that applied for the linear model. However, one can use the standard errors of the parameters in the linear regression at the final stage of the Iterative Linearization Method, together with simulation, in order to estimate the mean squared error of forecasts for the nonlinear model.


Problems:

43.1 (2 points) Write down the Normal Equations for the model Y = a + bX^c.

Use the following data for the next 9 questions:
X:   1    2   3   4   5   6   7   8   9  10  11
Y: 920  228  98  51  36  25  19  14  11   9   8

43.2 (4 points) Convert the model Y = a/Xb to a linear model. Fit this linear model to the above data via least squares linear regression.

43.3 (2 points) Write down the sum of squared errors function S for the nonlinear model Y = a/Xb. Write down the Normal Equations for this model.

43.4 (3 points) Use a computer to fit via least squares the nonlinear model, Y = a/Xb.

43.5 (3 points) Determine R2 for the fitted model in the previous question.

43.6 (8 points) For the initial values a = 900 and b = 2, for the nonlinear model Y = a/Xb,for the iterative linearization method, determine the values of the constructed variables, and then determine the resulting values of the parameters from the first iteration.

43.7 (2 points) Write down the sum of squared errors function S for the nonlinear model Y = a/(X + c)b. Write down the Normal Equations for this model.

43.8 (4 points) Use a computer to fit via least squares the nonlinear model, Y = a/(X + c)b.

43.9 (5 points) Verify that the Normal Equations are satisfied at the fitted parameters for the model Y = a/(X + c)b.

43.10 (3 points) Determine R2 for the fitted least squares model, Y = a/(X + c)b.


Use the following information for the next 4 questions:
X:  0    1    2
Y: 1.1  0.7  0.3
The model is Y = 1/(α + X) + ε. You use the iterative linearization method to obtain a nonlinear least-squares estimate of α. The initial value of α is α0 = 1.

43.11 (2 points) Determine the value of the constructed dependent variable in the first iteration.

43.12 (1 point) Determine the value of the constructed independent variable in the first iteration.

43.13 (1 point) Determine the estimate of α that results from the first iteration.

43.14 (4 points) Determine the estimate of α that results from the second iteration.

43.15 (VEE-Applied Statistics Exam, 8/05, Q.10) (2.5 points) You are given:
(i) The model is Y = e^(βX) + ε.
(ii) You use the iterative linearization method to obtain a nonlinear least-squares estimate of β.
(iii) The initial value of β is β0 = 0.1.
Determine the value of the constructed dependent variable in the first iteration when Y = 11.7 and X = 25.
(A) 12   (B) 18   (C) 24   (D) 30   (E) 36

43.16 (2 points) In the previous question, determine the value of the constructed independent variable in the first iteration when Y = 11.7 and X = 25.
(A) 120   (B) 180   (C) 240   (D) 300   (E) 360


Mahler's Guide to Regression

Sections 44-45:
44 *Generalized Linear Models
45 Important Ideas and Formulas

VEE-Applied Statistical Methods Exam
prepared by Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.
Study Aid F06-Reg-K

New England Actuarial Seminars, Howard Mahler, POB 315, [email protected], MA, 02067, www.neas-seminars.com


Section 44, Generalized Linear Models*352

For the generalized linear model:353

A parametric distribution has mean µ, and θ is a vector of additional parameters. µ and θ do not depend on each other. z is the vector of covariates for an individual. β is the vector of coefficients. η(µ) and c(y) are functions.[354] X is the random variable.[355]
F(x | θ, β) = F(x | µ, θ), where µ is such that η(µ) = c(Σβizi).
Generalized Linear Models are fit by maximum likelihood.

Ordinary Linear Regression:

For ordinary linear regression, Y has a Normal Distribution with parameters µ = µ and θ = σ, and both η and c are the identity function. µ = E[Y] = Σ βixi.

The ordinary linear regression model is a special case of the generalized linear model.

In the case of Normally distributed errors, fitting by least squares is equivalent to fitting via maximum likelihood.

Generalized Linear Models allow other forms of the error than Normal, do not require homoscedasticity,356 and allow more forms of relationships between the covariates and the dependent variable.

352 See Section 12.7.3 of Loss Models, not on the syllabus of this exam.
353 See Definition 12.70 in Loss Models. This is more general than the usual definition: g(µ) = Σβizi, where g is the link function.
354 η is called the link function. Common examples of link functions are listed below. In the applications of the Generalized Linear Model you are likely to read about, c is the identity function.
355 We would assume a specific distribution form for X, for example Normal, Exponential, Poisson, Gamma, etc.
356 Homoscedasticity refers to the assumption that Var[εi] = σ² for all i.


A One Variable Example of Linear Regression:

Assume we have the following set of three observations: (1, 1), (2, 2), (3, 9).For example, for x = 3, we observe the value y = 9.We assume Y = β0 + β1 X + ε.357 Then we could fit a linear regression by minimizing the sum of the squared errors.

The predicted values are Ŷ = (β0 + β1, β0 + 2β1, β0 + 3β1).

The errors are: 1 - (β0 + β1), 2 - (β0 + 2β1), 9 - (β0 + 3β1).

The sum of the squared errors is: (1 - β0 - β1)² + (2 - β0 - 2β1)² + (9 - β0 - 3β1)².

We can minimize the sum of the squared errors by setting its partial derivatives equal to zero:0 = 2(1 - β0 - β1) + 2(2 - β0 - 2β1) + 2(9 - β0 - 3β1). ⇒ 24 = 6β0 + 12β1. ⇒ 4 = β0 + 2β1.

0 = 2(1 - β0 - β1) + 4(2 - β0 - 2β1) + 6(9 - β0 - 3β1). ⇒ 64 = 12β0 + 28β1. ⇒ 16 = 3β0 + 7β1.

Solving these two equations in two unknowns: β0 = -4 and β1 = 4.

Thus the least squares line is: ^Y = -4 + 4X.

The predicted values of Y for X = 1, 2, 3 are: ^Y = (0, 4, 8).

Alternately, one can calculate the regression in deviations form.
X̄ = (1 + 2 + 3)/3 = 2. x = X - X̄ = (-1, 0, 1). Ȳ = (1 + 2 + 9)/3 = 4. y = Y - Ȳ = (-3, -2, 5).
β̂1 = Σxiyi/Σxi² = {(-1)(-3) + (0)(-2) + (1)(5)}/{(-1)² + 0² + 1²} = 8/2 = 4.

β̂0 = Ȳ - β̂1X̄ = 4 - (4)(2) = -4.

357 β1 is the slope and β0 is the intercept.


A One Dimensional Example of Generalized Linear Models:358

Let us assume the same set of three observations: (1, 1), (2, 2), (3, 9). We will now call the independent variable the covariate z, which takes on the values 1, 2, and 3 for the observations. While in a generalized linear model one would call the dependent variable X, I will continue to call it Y in order to avoid confusion with the previous example.

In a generalized linear model, Y will have some distributional form. The mean of the distribution will vary with z. However, any other parameters will be constant. I will take what Loss Models refers to as c as the identity function. η(µ) = Σβizi. ⇔ µ = η⁻¹(Σβizi).

For now let us assume the identity link function, η(µ) = µ, so that µ = Σβizi = β0 + β1z.[360] Thus for now we are fitting a straight line.
For the first observation, z = 1, µ = β0 + β1, and y = 1.

For the second observation, z = 2, µ = β0 + 2β1, and y = 2.

For the third observation, z = 3, µ = β0 + 3β1, and y = 9. We will do the fitting via maximum likelihood.Which line we get, depends on the distributional form we assume for Y.

Assume that Y is Normal, with mean µ and standard deviation σ.µ = β0 + β1 z, while σ is the same for all z.

For the Normal Distribution, f(y) = exp[-0.5(y - µ)²/σ²]/(σ√(2π)).
ln f(y) = -0.5(y - µ)²/σ² - 0.5ln(2π) - ln(σ).

The loglikelihood is the sum of the contributions from the three observations:
-0.5(1 - (β0 + β1))²/σ² - 0.5ln(2π) - ln(σ) - 0.5(2 - (β0 + 2β1))²/σ² - 0.5ln(2π) - ln(σ) - 0.5(9 - (β0 + 3β1))²/σ² - 0.5ln(2π) - ln(σ)
= (-0.5/σ²){(1 - (β0 + β1))² + (2 - (β0 + 2β1))² + (9 - (β0 + 3β1))²} - 3ln(σ) - 1.5ln(2π).

To maximize the loglikelihood, we set its partial derivatives equal to zero.
Setting the partial derivative with respect to β0 equal to zero:
0 = (1/σ²){(1 - (β0 + β1)) + (2 - (β0 + 2β1)) + (9 - (β0 + 3β1))}. ⇒ 12 = 3β0 + 6β1. ⇒ 4 = β0 + 2β1.

Setting the partial derivative with respect to β1 equal to zero:
0 = (1/σ²){(1 - (β0 + β1)) + 2(2 - (β0 + 2β1)) + 3(9 - (β0 + 3β1))}.
⇒ 32 = 6β0 + 14β1. ⇒ 16 = 3β0 + 7β1.

358 See page 15 of "A Practitioners Guide to Generalized Linear Models," by Duncan Anderson, Sholom Feldblum, Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi, in the 2004 CAS Discussion Paper Program.
359 It is not clear in this example whether z can take on values other than 1, 2 and 3. These may be the only possible values, or they might be the three values for which we happen to have had an observation. In practical applications, when z is discrete, we would expect to have many observations for each value of z.
360 I have treated z0 as the constant 1 and z1 as the covariate z.


Solving these two equations in two unknowns: β0 = -4 and β1 = 4.

µ = -4 + 4z. This should be interpreted as follows. For a given value of z, Y is Normally Distributed with mean = -4 + 4z and standard deviation independent of z. For example, for z = 3, the mean = 8.Thus for z = 3, the expected value of Y is 8. However, due to random fluctuation, for z = 3 we will observe values of Y varying around the expected value of 8. If we make a very large number of observations of individuals with z = 3, then we expect to observe a Normal Distribution of outcomes with mean 8 and standard deviation σ.361

The fitted line is: ^Y = -4 + 4z.

This is the exact same result as obtained previously for linear regression.For ordinary linear regression, Y has a Normal Distribution with parameters µ = µ and θ = σ, and both η and c are the identity function. µ = E[Y] = Σ βixi. As stated earlier, the ordinary linear regression model is a special case of the generalized linear model.

One could solve for σ by setting the partial derivative of the loglikelihood with respect to σ equal to zero:
0 = (1/σ³){(1 - (β0 + β1))² + (2 - (β0 + 2β1))² + (9 - (β0 + 3β1))²} - 3/σ.

⇒ 3σ² = (1 - (β0 + β1))² + (2 - (β0 + 2β1))² + (9 - (β0 + 3β1))² = 1² + (-2)² + 1² = 6.

⇒ σ² = 2. ⇒ σ = √2.

In the linear regression version of this same example, one would estimate the variance of the regression as: s² = Σε̂i²/(N - 2) = {(1 - 0)² + (2 - 4)² + (9 - 8)²}/(3 - 2) = 6.[362] This is an unbiased estimate of σ², which is not equal to the maximum likelihood estimate, which is biased.

In general, for N observations the estimate of σ² from maximum likelihood will be: Σε̂i²/N. For large N this is very close to s² = Σε̂i²/(N - 2). The maximum likelihood estimator is asymptotically unbiased.
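As a quick numerical check of these statements (my own sketch, not from the text), one can maximize the Normal loglikelihood directly and compare with the least squares fit.

import numpy as np
from scipy.optimize import minimize

z = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 9.0])

# Negative of the Normal loglikelihood; sigma is parameterized on the log scale to keep it positive.
def negative_loglikelihood(params):
    b0, b1, log_sigma = params
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * z
    return np.sum(0.5 * ((y - mu) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2.0 * np.pi))

fit = minimize(negative_loglikelihood, x0=np.array([0.0, 1.0, 0.0]))
b0, b1, sigma = fit.x[0], fit.x[1], np.exp(fit.x[2])
print(b0, b1, sigma)   # approximately -4, 4, and sqrt(2) = 1.414 (the biased ML estimate of sigma)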

Gamma Distribution:* For this example, instead assume that Y is Gamma, with mean µ and shape parameter α.363 µ = β0 + β1 z, while α is the same for all z.

For the Gamma Distribution as per Loss Models, f(y) = θ^(-α) y^(α-1) e^(-y/θ) / Γ(α).

ln f(y) = (α-1)ln(y) - y/θ - αln(θ) - ln(Γ(α)) = (α-1)ln(y) - y/(µ/α) - αln(µ/α) - ln(Γ(α))
= (α-1)ln(y) - αy/µ - αln(µ) + αln(α) - ln(Γ(α)).

361 Note that we do not need to determine σ in order to estimate the mean. How to fit σ will be discussed below.
362 See "Mahler's Guide to Regression" or Econometric Models and Economic Forecasts by Pindyck and Rubinfeld.
363 The mean is αθ, for the Gamma Distribution as per Loss Models.


The loglikelihood is the sum of the contributions from the three observations:
(α-1){ln(1) + ln(2) + ln(9)} - α{1/(β0 + β1) + 2/(β0 + 2β1) + 9/(β0 + 3β1)}
- α{ln(β0 + β1) + ln(β0 + 2β1) + ln(β0 + 3β1)} + 3αln(α) - 3ln(Γ(α)).

To maximize the loglikelihood, we set its partial derivatives equal to zero.
Setting the partial derivative with respect to β0 equal to zero:

0 = α{1/(β0 + β1)² + 2/(β0 + 2β1)² + 9/(β0 + 3β1)²} - α{1/(β0 + β1) + 1/(β0 + 2β1) + 1/(β0 + 3β1)}.

⇒ 1/(β0 + β1)² + 2/(β0 + 2β1)² + 9/(β0 + 3β1)² = 1/(β0 + β1) + 1/(β0 + 2β1) + 1/(β0 + 3β1).

Setting the partial derivative with respect to β1 equal to zero:

0 = α{1/(β0 + β1)² + 4/(β0 + 2β1)² + 27/(β0 + 3β1)²} - α{1/(β0 + β1) + 2/(β0 + 2β1) + 3/(β0 + 3β1)}.

⇒ 1/(β0 + β1)² + 4/(β0 + 2β1)² + 27/(β0 + 3β1)² = 1/(β0 + β1) + 2/(β0 + 2β1) + 3/(β0 + 3β1).

Solving these two equations in two unknowns: β0 = -1.79927 and β1 = 2.74390.364

µ = -1.79927 + 2.74390z. For z = 1, µ = .94463. For z = 2, µ = 3.68853. For z = 3, µ = 6.43243.This differs from what was obtained when one assumed Y was Normal rather than Gamma!

Although it is not needed in order to estimate the means, one can solve for α by maximizing the loglikelihood via computer. The fitted α = 7.00417. This model should be interpreted as follows. For a given value of z, Y is Gamma Distributed with mean = -1.79927 + 2.74390z, and α = 7.00417 independent of z. For example, for z = 3, the mean = 6.43243 and α = 7.00417. This implies that for z = 3 the scale parameter of the Gamma is θ = 6.43243/7.00417 = 0.91837, and the variance of the Gamma is:[365] (7.00417)(0.91837²) = 5.9073.

Thus for z = 3, the expected value of Y is 6.43243. However, due to random fluctuation, for z = 3 we will observe values of Y varying around the expected value of 6.43243. If we make a very large number of observations of individuals with z = 3, then we expect to observe a Gamma Distribution of outcomes with mean 6.43243 and variance 5.9073.

364 I used a computer to solve these two equations. When fitting Generalized Linear Models via maximum likelihood, one almost always needs to use a computer. There are specialized commercial software packages which are specifically designed to work with Generalized Linear Models. See http://www.statsci.org/glm/software.html.
365 For a Gamma Distribution as per Loss Models, the variance is αθ². Since the mean is αθ, the variance = mean²/α.


Poisson Distribution: For this same example, instead assume that Y is Poisson, with mean µ.366 µ = β0 + β1 z.

For the Poisson Distribution as per Loss Models, f(y) = e^(-λ) λ^y / y!.

ln f(y) = -λ + y·ln(λ) - ln(y!) = -µ + y·ln(µ) - ln(y!).
The loglikelihood is the sum of the contributions from the three observations:
-(β0 + β1) - (β0 + 2β1) - (β0 + 3β1) + ln(β0 + β1) + 2ln(β0 + 2β1) + 9ln(β0 + 3β1) - ln(1) - ln(2) - ln(9!).

To maximize the loglikelihood, we set its partial derivatives equal to zero.Setting the partial derivative with respect to β0 equal to zero:

0 = -3 + 1/(β0 + β1) + 2/(β0 + 2β1) + 9/(β0 + 3β1).

Setting the partial derivative with respect to β1 equal to zero:

0 = -6 + 1/(β0 + β1) + 4/(β0 + 2β1) + 27/(β0 + 3β1).

Solving these two equations in two unknowns: β0 = -12/5 = -2.4 and β1 = 16/5 = 3.2.367

µ = -2.4 + 3.2z. For z = 1, µ = 0.8. For z = 2, µ = 4.0. For z = 3, µ = 7.2.This differs from what was obtained when one assumed Y was Normal rather than Poisson!

This model should be interpreted as follows. For a given value of z, Y is Poisson Distributed with mean = -2.4 + 3.2z. For example, for z = 3, the mean = 7.2. However, due to random fluctuation, for z = 3 we will observe values of Y varying around the expected value of 7.2. If we make a very large number of observations of individuals with z = 3, then we expect to observe a Poisson Distribution of outcomes with mean 7.2.

Here is a comparison of the results of the three fitted models:

z   Observed   Normal   Poisson   Gamma
1       1         0       0.8     0.945
2       2         4       4.0     3.689
3       9         8       7.2     6.432

The Poisson assumes that the variance increases with the mean, and therefore less weight is given to the error related to the third observation. Therefore, the Poisson model is less affected by the observation of 9, than is the Normal model. The Gamma assumes that the variance increases with the square of the mean, and therefore even less weight is given to the error related to the third observation. Therefore, the Gamma model is even less affected by the observation of 9, than is the Poisson model.

366 In the case of a Poisson, there are no additional parameters beyond the mean.
367 I used a computer to solve these two equations. One can confirm that these values satisfy these equations.
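The Poisson fit with the identity link can be reproduced numerically; here is a short Python sketch (my own illustration) that minimizes the negative Poisson loglikelihood.

import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

z = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 9.0])

# Negative Poisson loglikelihood with the identity link: mu = b0 + b1*z
def negative_loglikelihood(params):
    b0, b1 = params
    mu = b0 + b1 * z
    if np.any(mu <= 0):            # the identity link requires positive fitted means
        return np.inf
    return np.sum(mu - y * np.log(mu) + gammaln(y + 1.0))

fit = minimize(negative_loglikelihood, x0=np.array([0.5, 2.0]), method="Nelder-Mead")
print(fit.x)   # should be approximately (-2.4, 3.2), matching the solution above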


Using a Different Link Function:*

In this example, let us maintain the assumption of a Poisson Distribution, but instead of the identity link function let us use the log link function.

ln(µ) = Σβizi = β0 + β1z. ⇒ µ = exp[Σβizi] = exp[β0 + β1z].

f(y) = e^(-λ) λ^y / y!.

ln f(y) = -λ + y·ln(λ) - ln(y!) = -µ + y·ln(µ) - ln(y!) = -exp[β0 + β1z] + y(β0 + β1z) - ln(y!).

The loglikelihood is the sum of the contributions from the three observations:
-exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + β0 + β1 + 2(β0 + 2β1) + 9(β0 + 3β1) - ln(1) - ln(2) - ln(9!).

To maximize the loglikelihood, we set its partial derivatives equal to zero.Setting the partial derivative with respect to β0 equal to zero:

0 = -exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + 12.

Setting the partial derivative with respect to β1 equal to zero:

0 = -exp[β0 + β1] - 2exp[β0 + 2β1] - 3exp[β0 + 3β1] + 32.

Thus we have two equations in two unknowns:
exp[β0 + β1]{1 + exp[β1] + exp[2β1]} = 12.
exp[β0 + β1]{1 + 2exp[β1] + 3exp[2β1]} = 32.

Dividing the second equation by the first equation:
{1 + 2exp[β1] + 3exp[2β1]}/{1 + exp[β1] + exp[2β1]} = 32/12 = 8/3.

⇒ exp[2β1] - 2exp[β1] - 5 = 0.

Letting v = exp[β1], this equation is: v² - 2v - 5 = 0, with positive solution v = 1 + √6 = 3.4495.

exp[β1] = 3.4495. ⇒ β1 = 1.238.

⇒ exp[β0] = 12/{exp[β1] + exp[2β1] + exp[3β1]} = 12/{3.4495 + 3.4495² + 3.4495³} = 0.2128.

⇒ β0 = -1.547.

µ = exp[β0 + β1z] = exp[β0]·exp[β1]^z = (0.2128)(3.4495^z).

For z = 1, µ = 0.734. For z = 2, µ = 2.532. For z = 3, µ = 8.735. This differs from the result obtained previously when using the identity link function:

z   Observed   Poisson, Identity Link   Poisson, Log Link
1       1               0.8                 0.734
2       2               4.0                 2.532
3       9               7.2                 8.735
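In practice one would fit such a model with GLM software; here is a short sketch using the Python package statsmodels (my own illustration; in statsmodels the log link is the default link for the Poisson family).

import numpy as np
import statsmodels.api as sm

z = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 9.0])

# Design matrix with an intercept column; Poisson family with its default (log) link.
X = sm.add_constant(z)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()

print(fit.params)      # approximately (-1.547, 1.238): beta0 and beta1
print(fit.predict(X))  # fitted means, approximately (0.734, 2.532, 8.735)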


A Two Dimensional Example of Generalized Linear Models:*368

Let us assume we have two types of drivers, male and female, and two territories, urban and rural. Then there are a total of four combinations of gender and territory.

Let us assume that we have the following observed pure premiums:369

           Urban   Rural
Male        800     500
Female      400     200

Let us assume the following generalized linear model:
Gamma Distribution
Reciprocal link function[370]
z1 = 1 if male. z2 = 1 if female. z3 = 1 if urban and z3 = 0 if rural.

Then 1/µ = Σβizi = β1z1 + β2z2 + β3z3. ⇒ µ = 1/(β1z1 + β2z2 + β3z3).

Therefore, the modeled means are:

           Urban            Rural
Male       1/(β1 + β3)      1/β1
Female     1/(β2 + β3)      1/β2

For the Gamma Distribution as per Loss Models, f(y) = θ^(-α) y^(α-1) e^(-y/θ) / Γ(α).

ln f(y) = (α-1)ln(y) - y/θ - αln(θ) - ln(Γ(α)) = (α-1)ln(y) - y/(µ/α) - αln(µ/α) - ln(Γ(α))
= (α-1)ln(y) - αy/µ - αln(µ) + αln(α) - ln(Γ(α))
= (α-1)ln(y) - αy(β1z1 + β2z2 + β3z3) + αln(β1z1 + β2z2 + β3z3) + αln(α) - ln(Γ(α)).

The loglikelihood is the sum of the contributions from the four observations:
(α-1){ln(800) + ln(400) + ln(500) + ln(200)} - α{800(β1 + β3) + 400(β2 + β3) + 500β1 + 200β2}
+ α{ln(β1 + β3) + ln(β2 + β3) + ln(β1) + ln(β2)} + 4αln(α) - 4ln(Γ(α)).

368 See pages 24 to 28 and Appendix F of “A Practitioners Guide to Generalized Linear Models,” by Duncan Anderson, Sholom Feldblum, Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi, in the 2004 CAS Discussion Paper Program. Note that the same data I use is there assumed to be claim severities.369 For simplicity assume that for each cell we have the same number of exposures. If the exposures varied by cell, then modifications could be made to the generalized linear model in order to take into account the volume of data by cell. All values are for illustrative purposes only.370 One could instead use the log link function, and obtain somewhat different results.


To maximize the loglikelihood, we set its partial derivatives equal to zero.
Setting the partial derivative with respect to β1 equal to zero:
0 = -α(800 + 500) + α{1/(β1 + β3) + 1/β1}. ⇒ 1/(β1 + β3) + 1/β1 = 1300.

Setting the partial derivative with respect to β2 equal to zero:
0 = -α(400 + 200) + α{1/(β2 + β3) + 1/β2}. ⇒ 1/(β2 + β3) + 1/β2 = 800.

Setting the partial derivative with respect to β3 equal to zero:
0 = -α(800 + 400) + α{1/(β1 + β3) + 1/(β2 + β3)}. ⇒ 1/(β1 + β3) + 1/(β2 + β3) = 1200.

Solving these three equations in three unknowns:371 β1 = .00223804, β2 = .00394964, and β3 = -.00106601.

µ = 1/(.00223804z1 + .00394964z2 - .00106601z3).

For Male and Urban: z1 = 1, z2 = 0, z3 = 1, and µ = 1/(.00223804 - .00106601) = 853.22.

For Female and Urban: z1 = 0, z2 = 1, z3 = 1, and µ = 1/(.00394964 - .00106601) = 346.79.

For Male and Rural: z1 = 1, z2 = 0, z3 = 0, and µ = 1/.00223804 = 446.82.

For Female and Rural: z1 = 0, z2 = 1, z3 = 0, and µ = 1/.00394964 = 253.19.

The fitted pure premiums by cell are:[372]

            Urban     Rural     Average
Male        853.22    446.82    650.02
Female      346.79    253.19    299.99
Average     600.00    350.00    475.00

This compares to the observed pure premiums by cell:

            Urban     Rural     Average
Male         800       500       650
Female       400       200       300
Average      600       350       475

Notice how subject to rounding, the averages for male, female, urban, and rural are equal for the fitted and observed. The overall experience of each class and territory has been reproduced by the model.373

371 I used a computer to solve these three equations. There is no need to solve for α in order to calculate the fitted pure premiums by cell.
372 The averages were computed assuming the same number of exposures by cell.
373 This is an example of a relationship between generalized linear models and "Minimum Bias" Methods. See "A Systematic Relationship Between Minimum Bias and Generalized Linear Models", by Stephen J. Mildenhall, PCAS 1999. The Gamma with a log link function, rather than the reciprocal link function, is equivalent to one of the Minimum Bias Methods considered by Mildenhall.
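Here is a short sketch (my own illustration) of fitting this example with the Python package statsmodels; its Gamma family uses the reciprocal (inverse power) link by default, matching the link used above.

import numpy as np
import statsmodels.api as sm

# One row per cell: the columns are z1 (male), z2 (female), z3 (urban); there is no separate intercept.
Z = np.array([[1.0, 0.0, 1.0],    # male, urban
              [0.0, 1.0, 1.0],    # female, urban
              [1.0, 0.0, 0.0],    # male, rural
              [0.0, 1.0, 0.0]])   # female, rural
y = np.array([800.0, 400.0, 500.0, 200.0])

fit = sm.GLM(y, Z, family=sm.families.Gamma()).fit()
print(fit.params)      # approximately (0.0022380, 0.0039496, -0.0010660)
print(fit.predict(Z))  # fitted cell means, approximately (853.2, 346.8, 446.8, 253.2)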


Common Link Functions:

What Loss Models calls η is commonly called the link function. In the actuarial applications of the Generalized Linear Model you are likely to read about, what Loss Models calls c is taken as the identity function. η(µ) = βz. ⇔ µ = η−1(βz).

Common link functions to use include:

Identity       η(µ) = µ                  η⁻¹(y) = y                   µ = βz
Log            η(µ) = ln(µ)              η⁻¹(y) = e^y                 µ = e^(βz)
Logit[374]     η(µ) = ln(µ/(1 - µ))      η⁻¹(y) = e^y/(e^y + 1)       µ = e^(βz)/(e^(βz) + 1)
Reciprocal     η(µ) = 1/µ                η⁻¹(y) = 1/y                 µ = 1/(βz)

It is common to pick the form of the variable X, to be a member of a linear exponential family.375 In that case, there are corresponding commonly chosen link functions.376 These are referred to as the “canonical link functions”.

Distribution Form Canonical Link Function

Normal377 Identity

Poisson378 Log: ln(µ)

Gamma379 Reciprocal: 1/µ

Bernoulli380 Logit: ln(µ/(1 - µ))

Inverse Gaussian 1/µ2

374 Used in Example 12.72 in Loss Models. Useful when 0 < µ < 1, since then -∞ < ln(µ/(1 - µ)) < ∞.
375 See "Mahler's Guide to Conjugate Priors" for a discussion of linear exponential families. Exponential families can be used in Generalized Linear Models, including a "dispersion parameter".
376 While these choices make it easier to fit the generalized linear model, they are not required.
377 For example, ordinary linear regression.
378 Could be used to model claim frequencies or claim counts.
379 Could be used to model claim severities. In that case, one could use the log link function, ln(µ).
380 Could be used to model the probability of policy renewal. The use of the logit link function with the Bernoulli is the idea behind logistic regression.


Actuarial Applications:*381

Generalized Linear Models can be used for many purposes. Among the applications to actuarial work are: determining classification relativities, loss reserving, and studying policy renewal rates.

Further Reading:*382

“A Practitioners Guide to Generalized Linear Models,” by Duncan Anderson, Sholom Feldblum, Claudine Modlin, Dora Schirmacher, Ernesto Schirmacher and Neeza Thandi, in the 2004 CAS Discussion Paper Program.

“Something Old, Something New in Classification Ratemaking with a Novel Use of GLMs for Credit Insurance”, by K. D. Holler, D. B. Sommer, and G. Trahair, CAS Forum, Winter 1999, formerly on the Syllabus for CAS Part 9.

“Using Generalized Linear Models to Build Dynamic Pricing Systems”, by Karl P. Murphy, Michael J. Brockman, and Peter K. Lee, CAS Forum, Winter 2000.

“A Systematic Relationship Between Minimum Bias and Generalized Linear Models”, by Stephen J. Mildenhall, PCAS 1999.

381 See for example the section for Insurance Applications at http://www.statsci.org/glm/bibliog.html382 For a list of books, see for example http://www.statsci.org/glm/books.html


Problems:

Use the following information for the next two questions:
X: 1   5   10   25
Y: 5  15   50  100
Y1, Y2, Y3, Y4 are independently Normally distributed with means µi = βXi, i = 1, 2, ..., 4, and common variance σ².

44.1 (2 points) Determine ^β via maximum likelihood.

(A) 3.9 (B) 4.0 (C) 4.1 (D) 4.2 (E) 4.3

44.2 (3 points) Estimate the standard deviation of ^β.

(A) 0.15 (B) 0.20 (C) 0.25 (D) 0.30 (E) 0.35

44.3 (1 point) Which of the following statements are true?
1. If the errors are Normally distributed, then the method of least squares produces the same fit as the method of maximum likelihood.
2. Ordinary Linear Regression is a special case of Generalized Linear Models.
3. Weighted Least Squares Regression is a special case of Generalized Linear Models.
A. 1   B. 2   C. 3   D. 1, 2, 3   E. None of A, B, C, or D

44.4 (4 points) Assume a set of three observations: For z = 1, we observe 4. For z = 2, we observe 7. For z = 3, we observe 8.Fit to these observations a Generalized Linear Model with a Poisson Distribution and a log link function. In other words, assume that each observation is a Poisson random variable, with mean λ and ln(λ) = β0 + β1z.


Use the following information for the next five questions:
X:  2   5   8   9
Y: 10   6  11  13
Y1, Y2, Y3, Y4 are independently Normally distributed with means µi = β0 + β1Xi, i = 1, 2, ..., 4, and common variance σ².

44.5 (2 points) Determine ^β1 via maximum likelihood.

(A) 0.1 (B) 0.2 (C) 0.3 (D) 0.4 (E) 0.5

44.6 (2 points) Determine ^β0 via maximum likelihood.

(A) 4 (B) 6 (C) 7 (D) 8 (E) 9

44.7 (2 points) Determine σ via maximum likelihood.(A) 1.0 (B) 1.5 (C) 2.0 (D) 2.5 (E) 3.0

44.8 (3 points) Estimate the standard deviation of ^β1.

(A) 0.3 (B) 0.4 (C) 0.5 (D) 0.6 (E) 0.7

44.9 (3 points) Estimate the standard deviation of ^β0.

(A) 2.5 (B) 3.0 (C) 3.5 (D) 4.0 (E) 4.5

44.10 (8 points) You have the following data on reported occurrences of a communicable disease in two areas of the country at 2 month intervals:

Months   Area A   Area B
   2        8       14
   4        8       19
   6       10       16
   8       11       21
  10       14       23
  12       17       27
  14       13       28
  16       15       29
  18       17       33
  20       15       31

Let X1 = ln(months). Let X2 = 0 for Area A and 1 for Area B.
Assume the number of occurrences Yi are Poisson variables with means µi, and ln(µi) = β0 + β1X1i + β2X2i.
Set up the equations to be solved in order to fit this model via maximum likelihood.


44.11 (5 points) You have the following data on the renewal of homeowners insurance policies with the ABC Insurance Company:

Year Insured   Number of Policies   Number of Policies Renewed
     1               1000                     900
     2                900                     820
     3                800                     740
     4                700                     660
     5                600                     580

Let X = number of years insured with ABC Insurance Company.
Assume the number of renewals is Binomial with expected rate of renewal p.
Further you assume that ln[p/(1-p)] = β0 + β1X.
Determine the equations to be solved in order to fit this model via maximum likelihood.

44.12 (4, 11/04, Q.34) (2.5 points) You are given:
(i) The ages and number of accidents for five insureds are as follows:

Insured   X = Age   Y = Number of Accidents
   1         34               2
   2         38               1
   3         45               0
   4         25               3
   5         21               3
Total       163               9

(ii) Y1, Y2, ..., Y5 are independently Poisson distributed with means µi = βXi, i = 1, 2, ..., 5.

Estimate the standard deviation of β̂.
(A) Less than 0.015
(B) At least 0.015, but less than 0.020
(C) At least 0.020, but less than 0.025
(D) At least 0.025, but less than 0.030
(E) At least 0.030


Section 45, Important Ideas and Formulas

Fitting a Straight Line with No Intercept (Section 1)

Least squares fit to the linear model with no intercept, Y = βX + ε: β̂ = ΣXiYi / ΣXi².

Fitting a Straight Line with an Intercept (Section 2)

Two-variable regression model ⇔ 1 independent variable and 1 intercept. Yi = α + βXi + εi.

Ordinary least squares regression: minimize the sum of the squared differences between the estimated and observed values of the dependent variable.

Estimated slope: β̂ = {NΣXiYi - ΣXiΣYi} / {NΣXi² - (ΣXi)²}. α̂ = Ȳ - β̂X̄.

To convert a variable to deviations form, one subtracts its mean.A variable in deviations form is written with a small rather than capital letter.xi = Xi - X. Variables in deviations always have a mean of zero.

In deviations form, the least squares regression to the two-variable (linear) regression model, Yi = α + βXi + εi, has solution:

β̂ = Σxiyi / Σxi² = ΣxiYi / Σxi². α̂ = Ȳ - β̂X̄.

β̂ = Cov[X, Y] / Var[X] = r·sY/sX.

Provided you are given the individual data rather than the summary statistics, the allowed electronic calculators will fit a least squares straight line with an intercept.

Residuals (Section 3)

Residual = actual - estimated. ⇔ ε̂i ≡ Yi - Ŷi.

For the linear regression model with an intercept, the sum of the residuals is always zero.

The sum of squared errors is referred to as the Error Sum of Squares or ESS.

ESS ≡ Σε̂i² = Σ(Yi - Ŷi)².

Corr[ε̂, X] = 0. Corr[Ŷ - Ȳ, ε̂] = 0.


Dividing the Sum of Squares into Two Pieces (Section 4)

Σ(Xi - X̄)²/(N - 1) ⇔ the sample variance of X, an unbiased estimator of the underlying variance, when the underlying mean is unknown.

Total Sum of Squares ≡ TSS ≡ Σ(Yi - Ȳ)² = Σyi². TSS = the numerator of the sample variance of Y.

Regression Sum of Squares ≡ RSS ≡ Σ(Ŷi - Ȳ)².

TSS = RSS + ESS.

The total variation has been broken into two pieces: that explained by the regression model, RSS, and that unexplained by the regression model, ESS.

Source of Variation     Sum of Squares     Degrees of Freedom
Model                   RSS                k - 1
Error                   ESS                N - k
Total                   TSS                N - 1

Where N is the number of points, and k is the number of variables including the intercept (k = 2 for the two-variable model with one slope and an intercept.)

R-Squared (Section 5)

R² = RSS/TSS = 1 - ESS/TSS = 1 - Σε̂i²/Σyi².

R-Squared is the percentage of variation explained by the regression model. 0 ≤ R² ≤ 1. (Not applicable to a model with no intercept.)

Large R² ⇔ good fit, but not necessarily a good model. As one adds more variables, R² increases.

For the Two-Variable Model:

RSS = β̂²Σxi² = β̂Σxiyi = SXY²/SXX.

R² = RSS/TSS = β̂²Σxi²/Σyi² = Corr[X, Y]².

Corrected R-Squared (Section 6)

Corrected R² = R̄² = 1 - (1 - R²)(N - 1)/(N - k), where N is the number of observations, and k is the number of variables including the intercept. R̄² ≤ R².
One can usefully compare the corrected R²'s of different regressions. Larger is better.


Normal Distribution (Section 7)

F(x) = Φ((x - µ)/σ).

f(x) = φ((x - µ)/σ)/σ = (1/(σ√(2π))) exp(-(x - µ)²/(2σ²)).

Mean = µ. Variance = σ². Skewness = 0 (distribution is symmetric).
The sum of two independent Normal Distributions is also a Normal Distribution. If X is normally distributed, then so is aX + b.

Let X1, X2, ..., Xn be a series of independent, identically distributed variables, with finite mean and variance. Let X̄n = the average of X1, X2, ..., Xn.
Then as n approaches infinity, the distribution of X̄n approaches a Normal Distribution.

Assumptions of Linear Regression (Section 8)

Five assumptions are made in the Classical Linear Regression Model:
1. Yi = α + βXi + εi.
2. Xi are known fixed values.
3. E[εi] = 0.
4. Var[εi] = σ² for all i.
5. εi and εj are independent for i ≠ j.
If we add an assumption, we get the Classical Normal Linear Regression Model:
6. The error terms are Normally Distributed.
Therefore, Yi is Normally Distributed with mean α + βXi and variance σ².

In the multivariate version, we add an assumption that no exact linear relationship exists between two or more of the independent variables.

Properties of Estimators (Section 8)

The Bias of an estimator is the expected value of the estimator minus the true value. An unbiased estimator has a Bias of zero. For an asymptotically unbiased estimator, as the number of data points, n → ∞, the bias approaches zero.

When based on a large number of observations, a consistent estimator, has a very small probability that it will differ by a large amount from the true value.

The mean square error (MSE) of an estimator is the expected value of the squared difference between the estimate and the true value. The smaller the MSE, the better the estimator, all else equal. The mean squared error is equal to the variance plus the square of the bias. Thus for an unbiased estimator, the mean square error is equal to the variance.


An unbiased estimator is efficient if for a given sample size it has the smallest variance of any unbiased estimator.

The least squares estimator of the slope is unbiased. The ordinary least squares estimator is consistent, even with heteroscedasticity and/or serial correlation of errors.

Gauss-Markov Theorem: the ordinary least squares estimators are the best linear unbiased estimators (BLUE); they have the smallest variance among linear unbiased estimators. With heteroscedasticity and/or serial correlation of errors, the ordinary least squares estimator is not efficient.

Variances and Covariances (Section 10)

s² = Σε̂i²/(N - k) = ESS/(N - k) = estimated variance of the regression.

s is called the standard error of the regression.

sα̂ = √Var[α̂] = standard error of the estimate of α.

sβ̂ = √Var[β̂] = standard error of the estimate of β.

For the two-variable model:

Var[α̂] = s²ΣXi²/(NΣxi²). Var[β̂] = s²/Σxi².

Cov[α̂, β̂] = -s²X̄/Σxi². Corr[α̂, β̂] = -X̄/√(E[X²]).

If X̄ > 0, then the estimates of α and β are negatively correlated.

β̂ is Normally Distributed, with mean β and variance σ²/Σxi².

α̂ is Normally Distributed, with mean α and variance E[X²]σ²/Σxi².

α̂ and β̂ are jointly Bivariate Normally Distributed, with correlation -X̄/√(E[X²]).

If we simulated the Y values many times, we would get a set of many different values for β̂; Var[β̂] measures the variance of β̂ around its expected value of β.
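As an illustration only (not from the original text), here is a short Python/numpy sketch of these variance and covariance formulas, using the same invented data as in the earlier sketch.

    import numpy as np

    # invented data, purely for illustration
    X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    N, k = len(X), 2
    x = X - X.mean()

    beta = (x * (Y - Y.mean())).sum() / (x * x).sum()
    alpha = Y.mean() - beta * X.mean()
    resid = Y - (alpha + beta * X)

    s2 = (resid ** 2).sum() / (N - k)                   # s^2 = ESS/(N - k)
    var_beta = s2 / (x * x).sum()                       # Var[beta-hat]
    var_alpha = s2 * (X ** 2).sum() / (N * (x * x).sum())
    cov_ab = -s2 * X.mean() / (x * x).sum()
    corr_ab = cov_ab / np.sqrt(var_alpha * var_beta)    # = -Xbar/sqrt(E[X^2])

    print(np.sqrt(var_alpha), np.sqrt(var_beta), corr_ab)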

t-distribution (Section 11)

Support: -∞ < x < ∞. Parameters: ν = positive integer.
Mean = 0. Skewness = 0 (symmetric). Mode = 0. Median = 0.
As ν → ∞, the t-Distribution → the Standard Normal Distribution.


t-test (Section 12)

For a confidence interval for the mean with probability 1 - α, take X̄ ± t√(S²/n), where S² is the sample variance and t is the critical value for the t-distribution with n - 1 degrees of freedom and α area in both tails.

Confidence Intervals for Estimated Parameters (Section 13)

To get a confidence interval for a regression parameter with probability 1 - α, one uses the critical value for the t-distribution with N - k degrees of freedom and α area in both tails. The confidence interval is: β̂ ± t sβ̂.
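For example (invented numbers, for illustration only): in a two-variable model with N = 10 observations, β̂ = 1.2, and sβ̂ = 0.4, the 95% confidence interval uses the t critical value with N - 2 = 8 degrees of freedom and 5% in both tails, 2.306, giving 1.2 ± (2.306)(0.4) = (0.28, 2.12).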

F-Distribution (Section 14)

Support: 0 < x < ∞. Parameters:
ν1 ⇔ number of degrees of freedom associated with the numerator ⇔ columns of table.
ν2 ⇔ number of degrees of freedom associated with the denominator ⇔ rows of table.

Generally an F-Statistic will involve in the numerator some sort of sum of squares divided by its number of degrees of freedom. In the denominator will be another sum of squares divided by its number of degrees of freedom.

For the tests applied to regression models, one applies a one-sided F-test.

Prob[F-Distribution with 1 and ν degrees of freedom > c²] = Prob[absolute value of t-distribution with ν degrees of freedom > c].

Testing the Slope, Two Variable Model (Section 15)

Most common t-test for the 2-variable model:
1. H0: β = 0. H1: β ≠ 0.
2. t = β̂/sβ̂.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.


General t-test, 2-variable model:
1. H0: a particular regression parameter takes on a certain value b. H1: H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - 2.
5. Compare the absolute value of the t-statistic to the critical values in the t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

Applying the F-Test to a single slope is equivalent to the t-test with t = √F.

For the 2-variable model, F = (N - 2)R²/(1 - R²) = (N - 2)RSS/ESS, with 1 and N - 2 degrees of freedom.
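For example (illustrative numbers only): a two-variable regression on N = 20 points with R² = 0.6 gives F = (18)(0.6)/(0.4) = 27 with 1 and 18 degrees of freedom, and the corresponding t-statistic for the slope is √27 = 5.20.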

Hypothesis Testing (Section 16)

One tests the null hypothesis H0 versus an alternative hypothesis H1. Hypothesis tests are set up to disprove something, H0, rather than prove anything.

A hypothesis test needs a test statistic whose distribution is known.
critical values ⇔ the values used to decide whether to reject H0 ⇔ boundaries of the critical region other than ±∞.
critical region ⇔ if the test statistic is in this region then we reject H0.

The significance level, α, of the test is a probability level selected prior to performing the test. If, assuming H0 is true, the probability of a test statistic at least as extreme as the one observed is less than the significance level chosen, then we reject H0. If not, we do not reject H0.

p-value = probability of rejecting H0 even though it is true = probability of a Type I error = Prob[test statistic takes on a value equal to its calculated value or a value less in agreement with H0 (in the direction of H1) | H0].

If the p-value is less than the chosen significance level, then we reject H0. When applying hypothesis testing to test the fit of a model, the larger the p-value the better the fit.

Type I Error ⇔ Reject H0 when it is true. Type II Error ⇔ Do not reject H0 when it is false.

Rejecting H0 at a significance level of α ⇔ the probability of a Type I error is less than α.

Power of the test ≡ probability of rejecting H0 when it is false = 1 - Prob[Type II error]. The larger the data set, the more powerful a given test.


Three Variable Regression Model (Section 18)

Y = β1 + β2X2 + β3X3 + ε.

β̂2 = {Σx2iyi Σx3i² - Σx3iyi Σx2ix3i} / {Σx2i² Σx3i² - (Σx2ix3i)²}.
β̂3 = {Σx3iyi Σx2i² - Σx2iyi Σx2ix3i} / {Σx2i² Σx3i² - (Σx2ix3i)²}.
β̂1 = Ȳ - β̂2X̄2 - β̂3X̄3.

s² = ESS/(N - k) = ESS/(N - 3).

Let rX2X3 denote the sample correlation of X2 and X3.

Var[β̂2] = s² / {(1 - rX2X3²)Σx2i²}.
Var[β̂3] = s² / {(1 - rX2X3²)Σx3i²}.
Cov[β̂2, β̂3] = -rX2X3 s² / {(1 - rX2X3²)√(Σx2i²Σx3i²)}.
Corr[β̂2, β̂3] = Cov[β̂2, β̂3]/√(Var[β̂2]Var[β̂3]) = -rX2X3.

Var[β̂1] = s²{ΣX2i²ΣX3i² - (ΣX2iX3i)²} / {NΣx2i²Σx3i²(1 - rX2X3²)}.
Cov[β̂1, β̂2] = s²{ΣX3iΣX2iX3i - ΣX2iΣX3i²} / {NΣx2i²Σx3i²(1 - rX2X3²)}.
Cov[β̂1, β̂3] = s²{ΣX2iΣX2iX3i - ΣX3iΣX2i²} / {NΣx2i²Σx3i²(1 - rX2X3²)}.

Matrix Form of Linear Regression (Section 19)

X = the design matrix, is an N by k matrix in which the first column consists of ones, corresponding to the constant term in the model, and the remainder of each row is the values of the independent variables for an observation.

fitted parameters = β̂ = (X′X)⁻¹X′Y.

variance-covariance matrix of fitted parameters = Var[β̂] = s²(X′X)⁻¹.
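As an illustration only (not from the original text), here is a minimal Python/numpy sketch of these two matrix formulas; the design matrix and observations are invented.

    import numpy as np

    # invented design: N = 6 observations, k = 3 (intercept plus two regressors)
    X = np.array([[1, 2.0, 1.0],
                  [1, 3.0, 4.0],
                  [1, 5.0, 2.0],
                  [1, 6.0, 5.0],
                  [1, 8.0, 3.0],
                  [1, 9.0, 6.0]])
    Y = np.array([3.1, 5.0, 6.2, 8.4, 9.1, 11.3])
    N, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y                 # (X'X)^-1 X'Y
    resid = Y - X @ beta_hat
    s2 = (resid ** 2).sum() / (N - k)
    cov_beta = s2 * XtX_inv                      # s^2 (X'X)^-1

    print(beta_hat, np.sqrt(np.diag(cov_beta)))  # fitted parameters and standard errors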


Tests of Slopes, Multiple Regression Model, (Section 20)

Summary of the general t-test:
1. H0: a particular regression parameter takes on a certain value b. H1: H0 is not true.
2. t = (estimated parameter - b)/standard error of the parameter.
3. If H0 is true, then t follows a t-distribution.
4. Number of degrees of freedom = N - k.
5. Compare the absolute value of the t-statistic to the critical values in the t-table, for the appropriate number of degrees of freedom.
6. Reject to the left and do not reject to the right.

To test the hypothesis that all of the slope coefficients are zero, compute the
F-Statistic = {RSS/(k - 1)}/{ESS/(N - k)} = {R²/(1 - R²)}{(N - k)/(k - 1)},
which if H0 is true follows an F-Distribution with ν1 = k - 1 and ν2 = N - k.
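For example (illustrative numbers only): with N = 25 observations, k = 3 variables including the intercept, and R² = 0.40, F = (0.40/0.60)(22/2) = 7.33 with 2 and 22 degrees of freedom.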

Additional Tests of Slopes (Section 21)

When testing whether some set of slope coefficients are all zero, or whether some linear relationship holds between two slope coefficients:
F-Statistic = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(R²UR - R²R)/q}/{(1 - R²UR)/(N - k)},
where q is the dimension of the restriction, R ⇔ restricted model, and UR ⇔ unrestricted model.

Applying the F-Test to a single slope is equivalent to the t-test with t = √F.

To test the equality of the coefficients of 2 similar regressions fit to different data sets A and B,
F = {(ESSR - ESSUR)/q}/{ESSUR/(N - k)} = {(ESSC - ESSA - ESSB)/k}/{(ESSA + ESSB)/(NA + NB - 2k)},
with k and NA + NB - 2k degrees of freedom, where C ⇔ the combined data set.

Additional Models (Section 22)

By change of variables, for example taking logs, one can convert certain models into those that are linear in their parameters. Exponential regression: ln[Yi] = α + βXi + εi.
Exponential regression ⇔ constant percentage rate of inflation.

The Normal Equations, one linear equation for each fitted parameter, can be obtained by writing the expression for the sum of squared errors, and setting equal to zero the partial derivative with respect to each of the parameters.


Dummy Variables (Section 23)

A dummy variable is one that is discrete rather than continuous. Most commonly a dummy variable takes on only the values 0 or 1.

Piecewise Linear Regression (Section 24)

Piecewise Linear Regression uses a model made up of a series of straight line segments, with the entire model continuous.

Weighted Regressions (Section 25)

In a weighted regression we weight some of the observations more heavily than others.

One can perform a weighted regression by minimizing the weighted sum of squared errors:

For the model with no intercept: β̂ = ΣwiXiYi/ΣwiXi².

One can put a weighted regression into deviations form, by subtracting the weighted average from each variable: xi = Xi - ΣwiXi, yi = Yi - ΣwiYi.

For the two-variable model: β̂ = Σwixiyi/Σwixi². α̂ = ΣwiYi - β̂ΣwiXi.
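As an illustration only (not from the original text), here is a short Python/numpy sketch of these weighted-regression formulas; the data and weights are invented, with the weights summing to one.

    import numpy as np

    X = np.array([1.0, 2.0, 4.0, 5.0])        # invented data
    Y = np.array([1.2, 2.1, 3.7, 5.1])
    w = np.array([0.4, 0.3, 0.2, 0.1])        # invented weights summing to 1

    # model with no intercept
    beta_no_int = (w * X * Y).sum() / (w * X * X).sum()

    # two-variable model: deviations from the weighted means
    x = X - (w * X).sum()
    y = Y - (w * Y).sum()
    beta = (w * x * y).sum() / (w * x * x).sum()
    alpha = (w * Y).sum() - beta * (w * X).sum()

    print(beta_no_int, alpha, beta)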

Heteroscedasticity (Section 26)

Variances of εi are all equal ⇔ Homoscedasticity.

Variances of εi are not all equal ⇔ Heteroscedasticity.

If there is heteroscedasticity, then the usual estimator of the variance of β̂ is biased and inconsistent.

For the two-variable model, when there is heteroscedasticity, Var[β̂] = Σxi²σi²/(Σxi²)².


Tests for Heteroscedasticity (Section 27)

The Goldfeld-Quandt Test proceeds as follows:
0. Test H0 that σi², the variance of εi, is the same for all i.
1. Find a variable that seems to be related to σi², by graphing the squared residuals, or other techniques.
2. Order the observations in assumed increasing order of σi², based on the relationship from step 1.
3. Run a regression on the first (N - d)/2 observations, with assumed smaller σi².
4. Run a regression on the last (N - d)/2 observations, with assumed larger σi².
5. (ESS from step 4)/(ESS from step 3) has an F-Distribution, with (N - d)/2 - k and (N - d)/2 - k degrees of freedom.
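As an illustration only (not from the original text), here is a Python/numpy sketch of the Goldfeld-Quandt steps; the data are simulated with an invented heteroscedastic error structure.

    import numpy as np

    def ess_of_fit(X, Y):
        # ESS from a two-variable least-squares fit
        x = X - X.mean()
        beta = (x * (Y - Y.mean())).sum() / (x * x).sum()
        alpha = Y.mean() - beta * X.mean()
        return ((Y - alpha - beta * X) ** 2).sum()

    rng = np.random.default_rng(1)
    N, d, k = 40, 8, 2
    X = np.sort(rng.uniform(1, 10, N))              # ordered by assumed increasing variance
    Y = 2 + 3 * X + rng.normal(0, 0.5 * X)          # invented data, error variance grows with X

    m = (N - d) // 2
    ess_low = ess_of_fit(X[:m], Y[:m])              # first (N - d)/2 observations
    ess_high = ess_of_fit(X[-m:], Y[-m:])           # last (N - d)/2 observations
    F = ess_high / ess_low                          # compare to F with (m - k, m - k) d.f.
    print(F, m - k)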

The Breusch-Pagan Test proceeds as follows:
0. Test H0 that σi², the variance of εi, is the same for all i.
1. Find a variable(s) that seems to be related to σi², by graphing the squared residuals, or other techniques.
2. Run the assumed regression model. Note the residuals ε̂i, and let σ̂² = ESS/N.
3. Run a regression of ε̂i²/σ̂² from step 2 on the variable(s) from step 1.
4. RSS/2 from step 3 has a Chi-Square Distribution with number of degrees of freedom equal to the number of variables from step 1, not counting an intercept.

The White Test proceeds as follows:
0. Test H0 that σi², the variance of εi, is the same for all i.
1. Find a variable(s) that seems to be related to σi², by graphing the squared residuals, or other techniques.
2. Run the assumed regression model. Note the residuals.
3. Run a regression of ε̂i² from step 2 on the variable(s) from step 1.
4. N R² from step 3 has a Chi-Square Distribution with number of degrees of freedom equal to the number of variables from step 1, not counting an intercept.
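As an illustration only (not from the original text), a Python/numpy sketch of the White test steps with one suspect variable; the data are simulated with an invented error structure.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 50
    X = rng.uniform(1, 10, N)
    Y = 1 + 2 * X + rng.normal(0, X)        # invented data, error variance grows with X

    # step 2: run the assumed model and keep the squared residuals
    x = X - X.mean()
    beta = (x * (Y - Y.mean())).sum() / (x * x).sum()
    alpha = Y.mean() - beta * X.mean()
    e2 = (Y - alpha - beta * X) ** 2

    # step 3: regress the squared residuals on the suspect variable (here X)
    z = X - X.mean()
    g = (z * (e2 - e2.mean())).sum() / (z * z).sum()
    c = e2.mean() - g * X.mean()
    fit = c + g * X
    R2 = 1 - ((e2 - fit) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()

    print(N * R2)   # step 4: compare N*R^2 to a chi-square with 1 degree of freedom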

Correcting for Heteroscedasticity (Section 28)

In order to adjust for heteroscedasticity, we use a weighted regression, in which we weight each observation by wi, with wi proportional to 1/σi², the inverse of the variance of the error εi.

For the model with no intercept:
β̂ = Σ(Xi/σi)(Yi/σi) / Σ(Xi/σi)² = ΣwiXiYi/ΣwiXi², where wi = (1/σi²)/Σ(1/σj²).


Heteroscedasticity-consistent estimators (HCE) provide unbiased and consistent estimators of the variances of the estimated parameters, when heteroscedasticity is present.

Var[β̂] ≅ Σxi²ε̂i²/(Σxi²)².

Serial Correlation (Section 29)

ρ = Corr[εt-1, εt]. If ρ > 0, then we have positive (first order) serial correlation.

If ρ < 0, then we have negative (first order) serial correlation.
ρ > 0 ⇔ successive residuals tend to be alike.
ρ < 0 ⇔ successive residuals tend to be unalike.

εt = ρεt-1 + νt. Var[εt] = σε² = σν²/(1 - ρ²).

Effects of Positive Serial Correlation on Ordinary Regression Estimators:
1. Still unbiased.
2. Still consistent.
3. No longer efficient.
4. Standard error of regression is biased downwards.
5. Overestimate precision of estimates of model coefficients.
6. Some tendency to reject H0: β = 0, when one should not.
7. R² is biased upwards.

Durbin-Watson Statistic (Section 30)

DW = Σ(ε̂t - ε̂t-1)²/Σε̂t².
no serial correlation ⇔ DW near 2.
positive serial correlation ⇔ DW small.
negative serial correlation ⇔ DW large.
DW ≅ 2(1 - ρ).
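As an illustration only (not from the original text), a short Python/numpy sketch of the Durbin-Watson statistic; the residuals are invented values.

    import numpy as np

    # residuals from some fitted regression (invented values, for illustration)
    e = np.array([0.8, 0.5, 0.6, 0.1, -0.2, -0.5, -0.4, -0.1, 0.3, -1.1])

    DW = ((e[1:] - e[:-1]) ** 2).sum() / (e ** 2).sum()
    rho_hat = 1 - DW / 2          # rough estimate, since DW is approximately 2(1 - rho)
    print(DW, rho_hat)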

When there is a lagged dependent variable, the Durbin-Watson test is biased towards not rejecting the null hypothesis of no serial correlation.

When there is a lagged dependent variable with coefficient β, then the Durbin h-statistic:
h = (1 - DW/2)√{N/(1 - N Var[β̂])}, where N is the number of observations and DW is the Durbin-Watson Statistic, has a Standard Normal Distribution if H0: no serial correlation, is true.

This is not valid if N Var[β̂] ≥ 1.


A second technique for dealing with the situation with a lagged dependent variable involves fitting a regression to the residuals, using the lagged residual ε̂t-1 as an explanatory variable. We then apply the usual t-test to ρ̂, the coefficient of ε̂t-1. If ρ̂ is significantly different from zero, then we reject the null hypothesis of no serial correlation.

Correcting for Serial Correlation (Section 31)

The Hildreth-Lu procedure:
0. Choose a grid of likely values for -1 ≤ ρ ≤ 1. For each such value of ρ:
1. Let Xt* = Xt - ρXt-1, and Yt* = Yt - ρYt-1.
2. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
3. The best regression has the smallest Error Sum of Squares (ESS).
If desired, then refine the grid of values for ρ, and again perform steps 1, 2, and 3.
Translate the transformed equation back to the original variables: Ŷt = α̂ + β̂Xt.

The Cochrane-Orcutt procedure:
1. Fit a linear regression and get the resulting residuals ε̂t.
2. Estimate the serial correlation coefficient: ρ̂ = Σε̂t-1ε̂t / Σε̂t-1².
3. Let Xt* = Xt - ρ̂Xt-1, and Yt* = Yt - ρ̂Yt-1.
4. Fit by regression the transformed equation Y* = α(1 - ρ̂) + βX*.
5. Translate this transformed equation back to the original variables: Ŷt = α̂ + β̂Xt, and get the resulting residuals ε̂t.
6. Estimate the serial correlation coefficient: ρ̂ = Σε̂t-1ε̂t / Σε̂t-1².
7. Unless the value of ρ̂ seems to have converged or enough iterations have been performed, return to step 3.
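As an illustration only (not from the original text), a Python/numpy sketch of the Cochrane-Orcutt iteration; the series is simulated with an invented AR(1) error structure and a fixed number of iterations stands in for a convergence check.

    import numpy as np

    def ols(X, Y):
        x = X - X.mean()
        b = (x * (Y - Y.mean())).sum() / (x * x).sum()
        return Y.mean() - b * X.mean(), b

    rng = np.random.default_rng(3)
    T = 60
    X = np.arange(1.0, T + 1)
    eps = np.zeros(T)
    for t in range(1, T):                        # invented AR(1) errors with rho = 0.7
        eps[t] = 0.7 * eps[t - 1] + rng.normal(0, 1)
    Y = 5 + 0.5 * X + eps

    a, b = ols(X, Y)                             # step 1
    for _ in range(10):                          # steps 2-7, iterated
        e = Y - a - b * X
        rho = (e[:-1] * e[1:]).sum() / (e[:-1] ** 2).sum()
        Xs, Ys = X[1:] - rho * X[:-1], Y[1:] - rho * Y[:-1]
        a_star, b = ols(Xs, Ys)                  # fit Y* = alpha(1 - rho) + beta X*
        a = a_star / (1 - rho)                   # translate back to the original intercept
    print(a, b, rho)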

Multicollinearity (Section 32)

In multiple regression, there is a high degree of multicollinearity when some of the independent variables or combinations of independent variables are highly correlated. A high degree of multicollinearity usually leads to unreliable estimates of the regression parameters.

Forecasting (Section 33)

Variance of the forecast of the expected value at x = X - X̄ is: s²{1/N + x²/Σxi²}.

Mean Squared Forecast Error at x = X - X̄ is: sf² = s²{1 + 1/N + x²/Σxi²} = s² + Variance of forecast at x.


If one estimated the regression from data at times 1, 2, ..., T, and one is forecasting at time T+1, the Mean Squared Forecast Error is: s²{1 + 1/T + (XT+1 - X̄)²/Σ(Xi - X̄)²}.

The mean squared forecast error is smallest when we try to predict the value of the dependent variable at the mean of the independent variable.

Normalized Forecast Error = λ = (ŶT+1 - YT+1)/sf.

One can use the t-distribution to get confidence intervals for forecasted expected values and forecasted observed values.
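For example (invented numbers, for illustration only): with s² = 4, N = 10, Σxi² = 50, and a forecast point with x = XT+1 - X̄ = 5, the mean squared forecast error is sf² = 4{1 + 1/10 + 25/50} = 6.4, so sf = 2.53; an approximate 95% interval for the observed value is ŶT+1 ± (2.306)(2.53), using the t critical value with N - 2 = 8 degrees of freedom.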

Testing Forecasts (Section 34)

When we discuss the qualities of an estimator, we are referring to the errors that would result from the repeated use of the estimation procedure.

ex post forecast ⇔ values to be forecast known at time of forecast.
ex ante forecast ⇔ values to be forecast not known at the time of the forecast.

In a conditional forecast, not all of the values of the independent variable(s) are known at the time of the forecast.

Root Mean Squared Forecast Error = √{Σ(forecasti - observationi)²/(# of forecasts)}.

U = Theil's Inequality Coefficient = (RMS Error)/{√(2nd moment of forecasts) + √(2nd moment of observations)}.
0 ≤ U ≤ 1. The smaller Theil's Inequality Coefficient, the better the forecasts.

Bias proportion of U = UM = {(mean forecast) - (mean observation)}²/MSE.
Variance proportion of U = US = {(stddev of forecasts) - (stddev of observations)}²/MSE.
Covariance proportion of U = UC = 2(1 - correlation of forecasts & observations)(stddev of forecasts)(stddev of observations)/MSE.
UM + US + UC = 1.
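As an illustration only (not from the original text), a Python/numpy sketch of Theil's U and its decomposition; the forecasts and observations are invented, and population (not sample) standard deviations are used so the three proportions sum to one.

    import numpy as np

    f = np.array([10.2, 11.0, 12.1, 13.5, 14.0])     # invented forecasts
    a = np.array([10.0, 11.4, 11.8, 13.0, 14.6])     # invented observed values

    mse = ((f - a) ** 2).mean()
    U = np.sqrt(mse) / (np.sqrt((f ** 2).mean()) + np.sqrt((a ** 2).mean()))

    sf, sa = f.std(), a.std()                        # population standard deviations
    r = ((f - f.mean()) * (a - a.mean())).mean() / (sf * sa)
    UM = (f.mean() - a.mean()) ** 2 / mse
    US = (sf - sa) ** 2 / mse
    UC = 2 * (1 - r) * sf * sa / mse
    print(U, UM, US, UC, UM + US + UC)               # the three proportions sum to 1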

Forecasting with Serial Correlation (Section 35)

With serial correlation one successively forecasts ahead one time period at a time:

Forecast for time t+1 = ρ̂(Forecast for time t) + α̂(1 - ρ̂) + β̂{(t+1) - ρ̂t}
 = ρ̂(β̂ + Forecast for time t) + (1 - ρ̂){α̂ + β̂(t+1)},

where α̂, β̂, and ρ̂ have been estimated using one of the procedures to correct for serial correlation, the Cochrane-Orcutt procedure or the Hildreth-Lu procedure.


Standardized Coefficients (Section 36)

Prior to performing the regression, standardize each variable by subtracting its mean and then dividing by its standard deviation.

For the two-variable model: β̂* = β̂ sX/sY = rXY.

For the three-variable model, the standardized slopes can be written in terms of correlations:

β̂*2 = (rYX2 - rYX3 rX2X3)/(1 - rX2X3²) and β̂*3 = (rYX3 - rYX2 rX2X3)/(1 - rX2X3²).

For the multiple regression model: β̂*2 = β̂2 sX2/sY, β̂*3 = β̂3 sX3/sY, etc.

The larger the absolute value of the standardized coefficient, the more important the corresponding variable is in determining the value of Y.
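For example (invented numbers, for illustration only): if β̂2 = 4.0, sX2 = 2.5, and sY = 20, then β̂*2 = (4.0)(2.5)/20 = 0.5, so a one standard deviation change in X2 moves the fitted Y by half of a standard deviation of Y.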

Elasticity (Section 37)

The elasticity measures the percent change in the dependent variable for a given percent change in an independent variable, near the mean value of each variable.

Ej = β̂j X̄j/Ȳ. elasticity ≅ (∂Y/∂Xj)(X̄j/Ȳ).
|Ej| large ⇔ Y is responsive to changes in Xj.
For a model estimated in logarithms, the slope coefficients are the elasticities.

Partial Correlation Coefficients (Section 38)

In a multiple regression, the partial correlation coefficient measures the effect of Xj on Y which is not accounted for by the other variables. The square of the partial correlation coefficient measures the percentage of the variation of Y that is accounted for by the part of Xj that is uncorrelated with the other variables.

For the three-variable model:

rYX2·X3 = (rYX2 - rYX3 rX2X3)/√{(1 - rYX3²)(1 - rX2X3²)}.

rYX3·X2 = (rYX3 - rYX2 rX2X3)/√{(1 - rYX2²)(1 - rX2X3²)}.

rYX2·X3² = (R² - rYX3²)/(1 - rYX3²). rYX3·X2² = (R² - rYX2²)/(1 - rYX2²).
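For example (invented numbers, for illustration only): if rYX2 = 0.70, rYX3 = 0.50, and rX2X3 = 0.40, then rYX2·X3 = {0.70 - (0.50)(0.40)}/√{(1 - 0.25)(1 - 0.16)} = 0.50/0.794 = 0.63.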


Regression Diagnostics (Section 39) *

s(i) = the standard error of the regression excluding observation i. H = X(X′X)⁻¹X′.

studentized residual = ε̂i* = ε̂i/{s(i)√(1 - hii)}.
An outlier is an observation that is far from the fitted least squares line.
One can compare studentized residuals to the t-distribution in order to spot outliers.

DFBETASi = {β̂ - β̂(i)}/{s(i)√((X′X)⁻¹2,2)}.

When the absolute value of a DFBETAS is larger than 2/√N, then the corresponding observation has a large effect on the estimated slope.

Cook's D = Di = {Ŷ - Ŷ(i)}′{Ŷ - Ŷ(i)}/(k s²).

A large value of Cook's D relative to the other values indicates an influential observation.

Stepwise Regression (Section 40) *

At each stage one adds the independent variable with the largest absolute value of its partial correlation coefficient with respect to all of the variables already included in the model.

Proceed until no more improvement in R̄² is possible.

Stochastic Explanatory Variables (Section 41) *

The Ordinary Least Squares estimator is no longer (unconditionally) unbiased.

Generalized Least Squares (Section 42) *

The variance-covariance matrix of the errors is σ²Ω, with Ω known.

β̃ = (X′Ω⁻¹X)⁻¹X′Ω⁻¹Y.

The variance-covariance matrix of the fitted parameters is: Cov[β̃] = σ²(X′Ω⁻¹X)⁻¹.

An unbiased estimator of σ² is given by: (ε̂′Ω⁻¹ε̂)/(N - k).
If the variance-covariance matrix of the errors is not σ²I, then the use of Ordinary Least Squares rather than Generalized Least Squares would result in unbiased but inefficient estimates.


Nonlinear Estimation (Section 43)

One can fit a nonlinear model by minimizing the squared errors.

Determining the fitted parameters for a nonlinear model is more difficult than for the linear case. Among the methods of determining the fitted parameters for a nonlinear model: solve the Normal Equations, the Iterative Linearization Method, and the Steepest Descent Method.

Iterative Linearization Method:
There is a nonlinear model with k independent variables and p parameters:
Y = f(X1, X2, ..., Xk; β1, β2, ..., βp), which we want to fit by least squares.

Given initial guesses, β1,0, β2,0, ..., βp,0, we construct the dependent variable:

Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σβi,0(∂f/∂βi)0,

and the p independent variables: (∂f/∂βi)0.

Then we solve for the least squares values of the coefficients βi.

These solutions are used as the next guesses, βi,1.

We iterate until there is convergence.

In the nonlinear case, one can compute R2 as in the linear case.

In the nonlinear case, one cannot directly use from the linear case: s² = ESS/(N - k), the t-test, the F-test, and the formula for the mean squared forecast error.

Generalized Linear Models (Section 44) *

The ordinary linear regression model is a special case of the generalized linear model.

In the case of Normally distributed errors, fitting by least squares is equivalent to fitting via maximum likelihood.

F(x | θ, β) = F(x | µ, θ), where µ is such that η(µ) = c(Σβizi). η is called the link function.

Given a distributional form for F, a link function, and data, one fits the coefficients βi via maximum likelihood.


Mahler’s Guide to

Regression

Solutions to Problems
Sections 1-12

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCASCopyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-L

New England Actuarial Seminars Howard MahlerPOB 315 [email protected], MA, 02067

www.neas-seminars.com


Solutions to Problems, Sections 1-12

1.1. E. β̂ = ΣXiYi/ΣXi² = {(4)(5) + (7)(15) + (13)(22) + (19)(35)}/{4² + 7² + 13² + 19²} = 1076/595 = 1.808.

1.2. C. Ŷi = 1.808Xi. ε̂i = Yi - Ŷi.
Xi:   4       7       13      19
Yi:   5       15      22      35
Ŷi:   7.232   12.656  23.504  34.352
ε̂i:   -2.232  2.344   -1.504  0.648
ESS = Σε̂i² = 2.232² + 2.344² + 1.504² + 0.648² = 13.16.

1.3. B.
Xi:     4    7    13   19
Yi:     5    15   22   35
2Xi:    8    14   26   38
error:  -3   1    -4   -3
Sum of squared errors = 3² + 1² + 4² + 3² = 35.
Comment: Note that the sum of squared errors is larger than for the least squares fit.

1.4. C. A model with no intercept, therefore β̂ = ΣXiYi/ΣXi² = (Y1 + 5Y2 + 10Y3)/126.
ε̂2 = Y2 - β̂X2 = Y2 - 5(Y1 + 5Y2 + 10Y3)/126 = (101Y2 - 5Y1 - 50Y3)/126.
E[ε̂2] = 0. E[ε̂2²] = Var[ε̂2] = {101²Var[Y2] + 5²Var[Y1] + 50²Var[Y3]}/126² =
{10201Var[ε2] + 25Var[ε1] + 2500Var[ε3]}/15876 = {(10201)(2) + (25)(1) + (2500)(4)}/15876 = 30427/15876 = 1.92.
Comment: Similar to 4, 11/03, Q.29.

1.5. C. A model with no intercept, therefore β̂ = ΣXiYi/ΣXi² = 3080/751 = 4.10.

1.6. D. β̂ = ΣXiYi/ΣXi² = 128417/111234 = 1.1545. (300)(1.1545) = 346.3.

1.7. C. β̂ = ΣXiYi/ΣXi² = 379/60 = 6.32.

1.8. B. β̂ = ΣXiYi/ΣXi² = {(630)(570) + ... + (760)(720)}/(630² + ... + 760²) = 4,107,900/4,150,900 = .9896.

1.9. C. β̂ = ΣXiYi/ΣXi² = 36,981/191,711 = .193.


1.10. B. β̂ = Σxiyi/Σxi² = {(1)(2) + (5)(3)}/(1² + 5²) = 17/26.

1.11. E. β* = ΣXiYi/ΣXi² = (-2Y1 - Y2 + Y4 + 2Y5)/10.
E[Y1] = E[10x + 3x² + ε] = (10)(-2) + (3)(-2)² + 0 = -8. E[Y2] = -7. E[Y4] = 13. E[Y5] = 32.
E[β*] = -.2E[Y1] - .1E[Y2] + .1E[Y4] + .2E[Y5] = 10.
Comment: Discussed in Section 7.3.1 of Pindyck & Rubinfeld, no longer on the syllabus. Thus even though this question can be answered from first principles, you are unlikely to be asked it on your exam.

1.12. E. The fitted slope = .3 = Σtiyi/Σti² = Σti(.1ti - zi + εi)/16 = .1 - Σtizi/16 + Σtiεi/16.
Since E[εi] = 0, and the errors are assumed to be uncorrelated with t, the expected value of the last term is zero. Therefore, we estimate that:
.3 = .1 - Σtizi/16. ⇒ Σtizi = -3.2. Since Σti = 0, Cov[t, z] = Σtizi/n = -3.2/n.
Since Σti = 0, Var[t] = Σti²/n = 16/n. Since Σzi = 0, Var[z] = Σzi²/n = 9/n.
Corr[t, z] = (-3.2/n)/√{(16/n)(9/n)} = -.267.
Comment: Discussed in Section 7.3.1 of Pindyck & Rubinfeld, no longer on the syllabus. Thus even though this question can be answered from first principles, you are unlikely to be asked it on your exam.

1.13. (i) (a) The sum of squared errors is: Σ(Yi - βXi)².
Setting the derivative with respect to β equal to zero: -2ΣXi(Yi - βXi) = 0. ⇒ β̂1 = ΣXiYi/ΣXi².
(b) E[β̂1] = E[ΣXiYi/ΣXi²] = ΣE[Yi](Xi/ΣXi²) = ΣβXi(Xi/ΣXi²) = βΣXi²/ΣXi² = β.
Var[β̂1] = Var[ΣXiYi/ΣXi²] = ΣVar[Yi](Xi/ΣXi²)² = σ²ΣXi²/(ΣXi²)² = σ²/ΣXi².
(ii) (a) E[β̂2] = E[ΣYi/ΣXi] = ΣE[Yi]/ΣXi = ΣβXi/ΣXi = βΣXi/ΣXi = β.
Var[β̂2] = Var[ΣYi/ΣXi] = ΣVar[Yi]/(ΣXi)² = Σσ²/(ΣXi)² = Nσ²/(ΣXi)².
(b) Var[X] ≥ 0. ⇒ E[X²] ≥ E[X]². ⇔ ΣXi²/N ≥ (ΣXi/N)². ⇔ 1/ΣXi² ≤ N/(ΣXi)².
⇒ Var[β̂1] ≤ Var[β̂2].
(iii) (a) E[β̂3] = E[ΣaiYi] = ΣE[Yi]ai = ΣβXiai = βΣaiXi.
Unbiased if and only if E[β̂3] = β ⇔ ΣaiXi = 1.
Var[β̂3] = Var[ΣaiYi] = ΣVar[Yi]ai² = Σσ²ai² = σ²Σai².
(b) For β̂1, ai = Xi/ΣXi². ΣaiXi = ΣXi²/ΣXi² = 1.
For β̂2, ai = 1/ΣXi. ΣaiXi = ΣXi/ΣXi = 1.
(c) Therefore, the least squares estimator, β̂1, has the smallest variance of any linear unbiased estimator of β.


1.14. B. A model with no intercept, therefore for ordinary least squares:
β̂ = ΣXiYi/ΣXi² = (Y1 + 2Y2 + 3Y3)/14.
ε̂1 = Y1 - β̂X1 = Y1 - (Y1 + 2Y2 + 3Y3)/14 = (13Y1 - 2Y2 - 3Y3)/14.
E[ε̂1] = 0. ⇒ E[ε̂1²] = Var[ε̂1] = {13²Var[Y1] + 2²Var[Y2] + 3²Var[Y3]}/14² =
{169Var[ε1] + 4Var[ε2] + 9Var[ε3]}/196 = {(169)(1) + (4)(9) + (9)(16)}/196 = 349/196 = 1.78.
Comment: Since it is not stated otherwise, it is assumed the εi are independent.
Since the Var(εi) are not equal, this is an example of heteroscedasticity; ordinary least squares is not efficient.
ε̂2 = Y2 - β̂X2 = Y2 - (Y1 + 2Y2 + 3Y3)/7 = (5Y2 - Y1 - 3Y3)/7.
E[ε̂2] = 0. E[ε̂2²] = Var[ε̂2] = {5²Var[Y2] + Var[Y1] + 3²Var[Y3]}/7² =
{25Var[ε2] + Var[ε1] + 9Var[ε3]}/49 = {(25)(9) + 1 + (9)(16)}/49 = 370/49 = 7.55.
ε̂3 = Y3 - β̂X3 = Y3 - 3(Y1 + 2Y2 + 3Y3)/14 = (5Y3 - 3Y1 - 6Y2)/14.
E[ε̂3] = 0. E[ε̂3²] = Var[ε̂3] = {5²Var[Y3] + 3²Var[Y1] + 6²Var[Y2]}/14² =
{25Var[ε3] + 9Var[ε1] + 36Var[ε2]}/196 = {(25)(16) + (9)(1) + (36)(9)}/196 = 733/196 = 3.74.

1.15. D. Let Vi = 1 + Xi. Then Yi = βVi + εi, a line without an intercept.
Therefore, the least-squares estimate of β is: ΣViYi/ΣVi² = Σ(1 + Xi)Yi/Σ(1 + Xi)².
Alternately, the squared error is: Σ(Yi - β - βXi)² = ΣYi² + Nβ² + β²ΣXi² - 2βΣYi + 2β²ΣXi - 2βΣXiYi.
Set the derivative with respect to β equal to zero:
0 = 2Nβ + 2βΣXi² - 2ΣYi + 4βΣXi - 2ΣXiYi.
⇒ β̂ = (ΣYi + ΣXiYi)/(N + ΣXi² + 2ΣXi) = Σ(1 + Xi)Yi/Σ(1 + Xi)².

2.1. C. and 2.2. E. X̄ = 24/4 = 6. x = X - X̄ = -6, -2, 2, 6.
Ȳ = 3589/4 = 897. y = Y - Ȳ = -63, -8, 19, 53. Σxiyi = 750. Σxi² = 80.
β̂ = Σxiyi/Σxi² = 750/80 = 9.375. α̂ = Ȳ - β̂X̄ = 897 - (9.375)(6) = 841.

2.3. A. Sample covariance of X and Y = Σxiyi/(N - 1) = -413.
Sample variance of X = Σxi²/(N - 1) = 512. β̂ = Σxiyi/Σxi² = Cov[X, Y]/Var[X] = -413/512 = -.807.

2.4. C. X̄ = 3. x = X - X̄ = -2, -1, 0, 1, 2. Ȳ = 1914/5 = 383. y = Y - Ȳ = -181, -62, 21, 97, 124. Σxiyi = 769. Σxi² = 10.
β̂ = Σxiyi/Σxi² = 769/10 = 76.9. α̂ = Ȳ - β̂X̄ = 383 - (76.9)(3) = 152.
The predicted value of Y, when X = 7, is: α̂ + 7β̂ = 152 + (7)(76.9) = 691.


2.5. B. β̂ = Σxiyi/Σxi² = r sY/sX = (0.6)(8/6) = 0.8.

2.6. C. X̄ = 2. x = X - X̄ = -3, -1, 1, 3. Ȳ = 5. y = Y - Ȳ = -2, -1, 2, 1. Σxiyi = 12. Σxi² = 20.
β̂ = Σxiyi/Σxi² = 12/20 = .6. α̂ = Ȳ - β̂X̄ = 5 - (.6)(2) = 3.8.
The predicted value of Y, when X = 6, is: α̂ + 6β̂ = 3.8 + (6)(.6) = 7.4.

2.7. B. X̄ = (1 + 2 + 3 + 4 + 5)/5 = 3. xi = Xi - X̄.
Ȳ = (82 + 78 + 80 + 73 + 77)/5 = 78. yi = Yi - Ȳ.
X   Y    x    y    xy   x²
1   82   -2   4    -8   4
2   78   -1   0    0    1
3   80   0    2    0    0
4   73   1    -5   -5   1
5   77   2    -1   -2   4
β̂ = Σxiyi/Σxi² = -15/10 = -1.5. α̂ = Ȳ - β̂X̄ = 78 - (-1.5)(3) = 82.5.
Forecast for year 7 is: α̂ + 7β̂ = 82.5 + (-1.5)(7) = 72.

2.8. X̄ = 55. x = -10, -5, 0, 5, 10. Ȳ = 63.6. y = -20.6, -5.6, -.6, 12.4, 14.4.
Σxi² = 250. Σxiyi = 440. β̂ = 440/250 = 1.76. α̂ = Ȳ - β̂X̄ = -33.2.

2.9. A. X̄ = 21/7 = 3. x = X - X̄ = -1, 0, 0, 1, -1, 1, 0. Ȳ = 448/7 = 64. y = Y - Ȳ = -14, -1, -8, 2, -4, 18, 7. Σxiyi = 38. Σxi² = 4.
β̂ = Σxiyi/Σxi² = 38/4 = 9.5. α̂ = Ȳ - β̂X̄ = 64 - (9.5)(3) = 35.5.
Estimated salary of an actuary with 5 exams is: 35.5 + (5)(9.5) = 83.
Comment: The estimated salary of an actuary with 8 exams is: 35.5 + (8)(9.5) = 111.5. However, one should be cautious about using forecasts for values significantly outside the range of the data, such as 8 exams in this case.

2.10. D. X̄ = 13/10 = 1.3. x = X - X̄ = -1.3, -1.3, -1.3, -1.3, -.3, -.3, -.3, .7, 1.7, 3.7. Ȳ = 290/10 = 29. y = Y - Ȳ = -19, -29, 14, -29, 6, -29, 51, -29, 29, 35. Σxiyi = 232. Σxi² = 24.1.
β̂ = Σxiyi/Σxi² = 232/24.1 = 9.627. α̂ = Ȳ - β̂X̄ = 29 - (9.627)(1.3) = 16.485.
Estimated losses for a taxi driver with 4 moving violations is: 16.485 + (4)(9.627) = 55.0.


2.11. C. β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} =
{(26)(204,296) - (351)(15,227)}/{(26)(6201) - 351²} = -32,981/38,025 = -.86735.
α̂ = Ȳ - β̂X̄ = 15,227/26 - (-.86735)(351/26) = 597.363.
Ŷi = 597.4 - 0.867Xi.
Alternately, Σxi² = Σ(Xi - X̄)² = ΣXi² - NX̄² = ΣXi² - (ΣXi)²/N = 6201 - 351²/26 = 1462.5.
Σxiyi = Σ(Xi - X̄)(Yi - Ȳ) = ΣXiYi - NX̄Ȳ = ΣXiYi - ΣXiΣYi/N = 204,296 - (351)(15,227)/26 = -1268.5.
β̂ = Σxiyi/Σxi² = -1268.5/1462.5 = -.86735.
α̂ = Ȳ - β̂X̄ = 15,227/26 - (-.86735)(351/26) = 597.363.
Comment: Similar to CAS3, 5/05, Q.27.

2.12. If each Xi is multiplied by 3.28 then so is X̄, and in deviations form so is each xi.
If each Yi is multiplied by 116 then so is Ȳ, and in deviations form so is each yi.
Therefore, β̂ = Σxiyi/Σxi² is multiplied by (116)(3.28)/3.28² = 116/3.28 = 35.37.
The new fitted slope is: (35.37)(2.4) = 84.9.
α̂ = Ȳ - β̂X̄. Ȳ is multiplied by 116, and β̂X̄ is multiplied by (116/3.28)(3.28) = 116.
Thus the new α̂ is multiplied by 116.
The new fitted intercept is: (116)(37) = 4292.
New fitted model is: Y = 4292 + 84.9X.
Comment: In general, if X is multiplied by a constant, β̂ is divided by that constant.
In general, if Y is multiplied by a constant, α̂ and β̂ are each multiplied by that constant.

2.13. D. X̄ = (1 + 2 + 3 + 4 + 5)/5 = 3. xi = Xi - X̄.
x = (-2, -1, 0, 1, 2). Σxi² = 10.
ΣxiYi = (-2)(3.18) + (-1)(3.12) + (0)(3.30) + (1)(3.39) + (2)(3.41) = 0.73.
β̂ = ΣxiYi/Σxi² = 0.73/10 = 0.073.
Ȳ = (3.18 + 3.12 + 3.30 + 3.39 + 3.41)/5 = 3.28.
α̂ = Ȳ - β̂X̄ = 3.28 - (.073)(3) = 3.061.
Forecast for year 7 is: α̂ + 7β̂ = 3.061 + (.073)(7) = 3.572%.
Comment: I have used the shortcut, which avoids calculating yi = Yi - Ȳ.


2.14. A. X̄ = (142 + 146 + 156 + 163 + 170 + 177)/6 = 159.
x = (-17, -13, -3, 4, 11, 18). Σxi² = 928. ΣxiYi = -248.
β̂1 = ΣxiYi/Σxi² = -248/928 = -.267.
β̂0 = Ȳ - β̂1X̄ = 46.83 - (-.267)(159) = 89.3.

2.15. If c is added to each Xi, then X̄ is also increased by c, and in deviations form each xi is the same. Therefore, β̂ = Σxiyi/Σxi² remains the same. α̂ = Ȳ - β̂X̄ decreases by cβ̂.
If c is added to each Yi, then Ȳ is also increased by c, and in deviations form each yi is the same. Therefore, β̂ = Σxiyi/Σxi² remains the same. α̂ = Ȳ - β̂X̄ increases by c.

2.16. D. X̄ = 5/10 = .5. xi = Xi - X̄ = (-.5, -.5, -.5, -.5, -.5, .5, .5, .5, .5, .5).
β̂ = ΣxiYi/Σxi² = 0.5/2.5 = .2. Ȳ = 5/10 = .5. α̂ = Ȳ - β̂X̄ = .5 - (.2)(.5) = .4.
Estimated future claim frequency for an insured with 1 claim is: α̂ + β̂(1) = .4 + .2 = 0.6.
Comment: See "A Graphical Illustration of Experience Rating Credibilities," by Howard C. Mahler, PCAS 1998.

2.17. B. β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} = {(62)(4002) - (153)(727)}/{(62)(2016) - 153²} = 1.3476.
α̂ = Ȳ - β̂X̄ = 727/62 - (1.3476)(153/62) = 8.400.
Ŷi = 8.400 + 1.3476Xi.
8.400 + (1.3476)(20) = 35.35.

2.18. D. ΣXi = 850 + (2)(50) = 950. ΣXi² = 850 + (2²)(50) = 1050.
ΣYi = 858 + (2)(62) = 982. ΣXiYi = 100 + (8)(2) + (10)(2) + (2)(4) = 144.
β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} = {(10000)(144) - (950)(982)}/{(10000)(1050) - 950²} = 0.0528.
α̂ = Ȳ - β̂X̄ = 0.0982 - (0.0528)(0.0950) = 0.0932.
0.0932 + (2)(0.0528) = 0.1988.
Comment: Related to the ideas behind Buhlmann Credibility, covered on Exam 4/C. See "A Graphical Illustration of Experience Rating Credibilities," by Howard Mahler, PCAS 1998.


2.19. C. X̄ = 8/3. x = X - X̄ = -5/3, 1/3, 4/3.
Let Y2 = 3z. Then Ȳ = z + 7/3 and y = Y - Ȳ = -1/3 - z, 2z - 7/3, 8/3 - z.
β̂ = Σxiyi/Σxi² = (30 + 9z)/42. α̂ = Ȳ - β̂X̄ = z + 7/3 - (8/3)(30 + 9z)/42 = 3z/7 + 3/7.
But we are given α̂ = 5/7. Therefore, 3z/7 + 3/7 = 5/7. ⇒ Y2 = 3z = 2.
Alternately, minimizing the squared error Σ(Yi - α - βXi)² results in two equations in two unknowns:
αN + βΣXi = ΣYi.  αΣXi + βΣXi² = ΣXiYi.
N = 3, ΣXi = 8, ΣXi² = 26, ΣYi = 7 + Y2, ΣXiYi = 22 + 3Y2. α̂ = 5/7.
Therefore, 15/7 + 8β = 7 + Y2, and 40/7 + 26β = 22 + 3Y2.
Multiplying the first equation by 13 and subtracting the second equation times 4:
35/7 = 3 + Y2. ⇒ Y2 = 2.
Comment: Given an output and asked to solve for the missing input.
You can check your work by taking Y2 = 2, and then getting α̂ = 5/7 and β̂ = 6/7.

2.20. B. Σxi² = (9)(4) = 36. Σyi² = (9)(64) = 576.
-0.98 = r = Σxiyi/√(Σxi²Σyi²) ⇒ Σxiyi = (-.98)√{(36)(576)} = -141.12.
β̂ = Σxiyi/Σxi² = -141.12/36 = -3.92.
α̂ = Ȳ - β̂X̄ = 10 - (2)(-3.92) = 17.84.
For X = 5, Ŷ = 17.84 - (5)(3.92) = -1.76.
Alternately, sX = √4 = 2. sY = √64 = 8. β̂ = r sY/sX = (-.98)(8/2) = -3.92. Proceed as before.

2.21. α̂ = Ȳ - β̂X̄. ⇒ α̂ + β̂X̄ = Ȳ. ⇒ (X̄, Ȳ) is on the fitted line, Ŷ = α̂ + β̂X.

2.22. In deviations form, Σxiyi = Σxi(Yi - Ȳ) = ΣxiYi - ȲΣxi = ΣxiYi - Ȳ(0) = ΣxiYi.
Therefore, β̂ = ΣxiYi/Σxi².
xi = (-1, 0, 1). Σxi² = 2. ΣxiYi = (-1)(0) + (0)(y) + (1)(2) = 2. β̂ = 2/2 = 1.
Alternately, β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} = {(3)(y + 4) - (3)(2 + y)}/{(3)(5) - 3²} = 1.


2.23. E. β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} =
{(12)(26,696) - (144)(1,742)}/{(12)(2,300) - 144²} = 69504/6864 = 10.126.
α̂ = Ȳ - β̂X̄ = 1742/12 - (10.126)(144/12) = 23.655.
Ŷi = 23.66 + 10.13Xi.
Alternately, Σxi² = ΣXi² - (ΣXi)²/N = 2300 - 144²/12 = 572.
Σxiyi = ΣXiYi - ΣXiΣYi/N = 26696 - (144)(1742)/12 = 5792.
β̂ = Σxiyi/Σxi² = 5792/572 = 10.126. α̂ = Ȳ - β̂X̄ = 1742/12 - (10.126)(144/12) = 23.655.

2.24. A. β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} =
{(7)(599.5) - (315)(12.8)}/{(7)(14,875) - 315²} = 164.5/4900 = 0.03357.
α̂ = Ȳ - β̂X̄ = 12.8/7 - (0.03357)(315/7) = 0.3179.
Ŷi = 0.3179 + 0.03357Xi.
The predicted price of oil for X = 75 is: 0.3179 + (0.03357)(75) = $2.836.

3.1. E. The residuals always sum to zero. ⇒ The final residual must be: -(12 - 4 - 9 + 6) = -5.
ESS = Σε̂i² = 12² + 4² + 9² + 6² + 5² = 302.

3.2. C. X̄ = 2.5. x = -1.5, -.5, .5, 1.5. Ȳ = 46.25. y = -16.25, -6.25, 8.75, 13.75.
Σxi² = 5. Σxiyi = 52.5. β̂ = Σxiyi/Σxi² = 10.5. α̂ = Ȳ - β̂X̄ = 20.
Ŷi = 20 + 10.5Xi = 30.5, 41, 51.5, 62. ε̂i = Yi - Ŷi = -.5, -1, 3.5, -2. ESS = Σε̂i² = 17.5.

3.3. B. The residuals add to zero. ⇒ ε̂5 = -(1.017 + .409 - .557 - 2.487) = 1.618.
ε̂ and X have a correlation of zero. ⇒ E[ε̂X] - E[ε̂]E[X] = 0. But E[ε̂] = 0. ⇒ E[ε̂X] = 0.
0 = ΣXiε̂i = (7)(1.017) + (12)(.409) + (15)(-.557) + (21)(-2.487) + X5(1.618). ⇒ X5 = 30.

3.4. A. The first four residuals are: (13, 25, 36, 40) - (18.036, 22.989, 30.419, 40.325) = -5.036, 2.011, 5.581, -.325.
The residuals add to zero. ⇒ ε̂5 = -(-5.036 + 2.011 + 5.581 - .325) = -2.231.
ε̂ and Ŷi - Ȳ have a correlation of zero. But E[ε̂] = 0. ⇒ E[ε̂(Ŷi - Ȳ)] = 0.
0 = Σ(Ŷi - Ȳ)ε̂i = ΣŶiε̂i - ȲΣε̂i = ΣŶiε̂i =
(18.036)(-5.036) + (22.989)(2.011) + (30.419)(5.581) + (40.325)(-.325) + Ŷ5(-2.231). ⇒
Ŷ5 = 50.231. Y5 = ε̂5 + Ŷ5 = -2.231 + 50.231 = 48.


4.1. TSS has N - 1 = 24 degrees of freedom.
RSS has k - 1 = 2 - 1 = 1 degree of freedom.
ESS has N - k = 25 - 2 = 23 degrees of freedom.

4.2. B. & 4.3. D. ESS = TSS - RSS = 1230 - 1020 = 210.
1020/(# degrees of freedom for RSS) = 255. ⇒ # d.f. for RSS = 1020/255 = 4.
k - 1 = 4. ⇒ k = 5.
210/(# degrees of freedom for ESS) = 7. ⇒ # d.f. for ESS = 210/7 = 30.
N - k = 30. ⇒ N = 35.

4.4. TSS has N - 1 = 49 degrees of freedom.
RSS has k - 1 = 4 - 1 = 3 degrees of freedom.
ESS has N - k = 50 - 4 = 46 degrees of freedom.
Comment: Note that 3 + 46 = 49. This multivariable regression model would be written as:
Ŷ = β1 + β2X2 + β3X3 + β4X4, where X2, X3 and X4 are the three independent variables.

4.5. β̂ = Σxiyi/Σxi² = 3640.5/253024 = .0144.
α̂ = Ȳ - β̂X̄ = 7.663 - (.0144)(199.7) = 4.79.
The 17 data points (X, Y) are:
X: 0, 25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 395.
Y: 4.90, 7.41, 6.19, 5.57, 5.17, 6.89, 7.05, 7.11, 6.19, 8.28, 4.84, 8.29, 8.91, 8.54, 11.79, 12.12, 11.02.
ΣXi = 3395, ΣYi = 130.27, X̄ = 199.7, Ȳ = 7.663.
In deviations form: Σxi² = 253024, Σxiyi = 3640.5, Σyi² = 83.15.

Graph of the data and the fitted line: [figure not reproduced]

For example, for X = 100, the estimated Y is: 4.79 + (.0144)(100) = 6.23.
For X = 100, the observed Y is 5.17.
Therefore, the corresponding residual is: 5.17 - 6.23 = -1.06.
Graph of the residuals: [figure not reproduced]

TSS = Σyi² = 83.15.
RSS = Σ(Ŷi - Ȳ)² = (4.79 - 7.663)² + (5.15 - 7.663)² + ... + (10.478 - 7.663)² = 52.38.
Alternately, RSS = β̂Σxiyi = (.0144)(3640.5) = 52.4.
ESS = Σε̂i² = Σ(Yi - Ŷi)² = (4.90 - 4.79)² + (7.41 - 5.15)² + ... + (11.02 - 10.478)² = 30.77.
TSS has degrees of freedom: N - 1 = 17 - 1 = 16.
RSS has degrees of freedom: k - 1 = 2 - 1 = 1.


ESS has degrees of freedom: N - k = 17 - 2 = 15.
Source of Variation    Sum of Squares    Degrees of Freedom
Model (RSS)            52.38             1
Error (ESS)            30.77             15
Total (TSS)            83.15             16
Comment: Once one has TSS and either RSS or ESS, one can get the other one by subtraction. For example, ESS = TSS - RSS = 83.15 - 52.38 = 30.77.

4.6. E. RSS = Σ(Ŷi - Ȳ)² = Σ(α̂ + β̂Xi - Ȳ)² = 49.
TSS = Σ(Yi - Ȳ)² = (10 - 1)(Sample Variance of Y) = (9)(8) = 72.
Σ(α̂ + β̂Xi - Yi)² = Σ(Ŷi - Yi)² = ESS = TSS - RSS = 72 - 49 = 23.

4.7. β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} = {(20)(167) - (42)(76)}/{(20)(101) - 42²} = 0.578.

4.8. α̂ = Ȳ - β̂X̄ = 76/20 - (0.578)(42/20) = 2.586.

4.9. TSS = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/N = 310 - 76²/20 = 21.2.

4.10. RSS = β̂²Σxi² = β̂²{ΣXi² - (ΣXi)²/N} = (0.578²)(101 - 42²/20) = 4.28.

4.11. ESS = TSS - RSS = 21.2 - 4.28 = 16.92.

4.12. B. X̄ = 1.5. x = -1.5, -.5, .5, 1.5. Y = (2, 3.5, 5, 6.5) + (1, 1.5, -2, 0.5) = 3, 5, 3, 7. Ȳ = 4.5. y = -1.5, .5, -1.5, 2.5.
β̂ = Σxiyi/Σxi² = 5/5 = 1. RSS = β̂²Σxi² = (1²)(5) = 5.
TSS = Σyi² = 11. Σ(Yi - Ŷi)² = ESS = TSS - RSS = 11 - 5 = 6.
Alternately, α̂ = Ȳ - β̂X̄ = 4.5 - (1)(1.5) = 3. Ŷi = 3 + Xi = (3, 4, 5, 6).
Σ(Yi - Ŷi)² = (3 - 3)² + (5 - 4)² + (3 - 5)² + (7 - 6)² = 6.
Comment: The true values are on the line 1.5X + 2. The estimated slope of 1.0 is not equal to the true slope of 1.5 due to the random error terms contained in the observations of Y.

5.1. E. R2 = RSS/TSS = RSS/(RSS + ESS) = 124/(124 + 21) = .855.


5.2. A. β̂ = Σxiyi/Σxi² = 7/10 = 0.7. α̂ = Ȳ - β̂X̄ = 2 - (0.7)(3) = -0.1.
X   Y   x    y    x²   xY   y²
1   1   -2   -1   4    -2   1
2   1   -1   -1   1    -1   1
3   2   0    0    0    0    0
4   2   1    0    1    2    0
5   4   2    2    4    8    4
Sums: ΣX = 15, ΣY = 10, Σx² = 10, ΣxY = Σxy = 7, Σy² = 6. X̄ = 3, Ȳ = 2.
TSS = Σyi² = 6. RSS = β̂Σxiyi = (.7)(7) = 4.9. R² = RSS/TSS = 4.9/6 = .817.

5.3. β̂ = r sY/sX. δ̂ = r sX/sY. β̂δ̂ = r².
Comment: Therefore, the product of the two fitted slopes is between 0 and 1.
β̂ and δ̂ have the same sign. If β̂ = δ̂ = 1, then r = 1. If β̂ = δ̂ = -1, then r = -1.
For the 2-variable model, R² is the square of the sample correlation of X and Y, and thus R² = β̂δ̂.
For the heights example, when one regresses X = heights of fathers and Y = heights of sons, the fitted slope is 0.6254. When instead one regresses X = heights of sons and Y = heights of fathers, the fitted slope is 1.383. (0.6254)(1.383) = 0.865 = 0.930² = square of the correlation between the heights of the fathers and sons.

5.4. A. For the 2-variable model, R² = β̂²Σxi²/Σyi² = (17.25²)(37/20019) = .550.

5.5. D. X̄ = 2. x = -2, -1, 0, 3. ΣxiYi = (-2)(2) + (-1)(5) + (0)(11) + (3)(18) = 45.
Σxi² = 14. β̂ = 45/14 = 3.214. Ȳ = 9. α̂ = 9 - (3.214)(2) = 2.572.
X   Y    Fitted   Residual   Squared Residual
0   2    2.572    -0.572     0.327
1   5    5.786    -0.786     0.618
2   11   9.000    2.000      4.000
5   18   18.642   -0.642     0.412
ESS = 5.357. TSS = Σyi² = (2 - 9)² + (5 - 9)² + (11 - 9)² + (18 - 9)² = 150.
R² = 1 - 5.357/150 = 96.43%.


5.6. A. Let V = X - Y = -2, -4, -9, -13.
X̄ = 2. x = -2, -1, 0, 3. ΣxiVi = (-2)(-2) + (-1)(-4) + (0)(-9) + (3)(-13) = -31.
Σxi² = 14. β̂ = -31/14 = -2.214. V̄ = -7. α̂ = -7 - (-2.214)(2) = -2.572.
X   V     Fitted    Residual   Squared Residual
0   -2    -2.572    0.572      0.327
1   -4    -4.786    0.786      0.618
2   -9    -7.000    -2.000     4.000
5   -13   -13.642   0.642      0.412
ESS = 5.357. TSS = Σvi² = (-2 + 7)² + (-4 + 7)² + (-9 + 7)² + (-13 + 7)² = 74.
R² = 1 - 5.357/74 = 92.76%.
Comment: Restate the model in this question: Yi = -α + (1 - β)Xi - εi.
Therefore, the intercept in this question is minus the intercept in the previous question.
Also the slope in this question plus the slope in the previous question sum to one.
The errors in this question are minus those in the previous question; thus the ESS is the same for the two models. However, the TSS are different. Therefore R² for the two models are different, even though the two models contain the same information. See VEE-Applied Statistics Exam, 8/05, Q.9.

5.7. D. For the 2-variable model, R² = (Σxiyi)²/(Σxi²Σyi²) = 1022²/{(1660)(899)} = .700.

5.8. A. Since for a model with an intercept the residuals sum to zero, the fifth residual must be 0.6.
ESS = Σε̂i² = .4² + .3² + 0² + .7² + .6² = 1.1.
TSS = Σ(Yi - Ȳ)² = 1.5(N - 1) = (1.5)(5 - 1) = 6. R² = 1 - ESS/TSS = 1 - 1.1/6 = .817.

5.9.
t   X   Y    Ŷ = -25 + 20X   Ŷ - Ȳ
1   2   10   15              -10
2   2   20   15              -10
3   3   30   35              10
4   3   40   35              10
Ȳ = 25. TSS = Σ(Yt - Ȳ)² = 15² + 5² + 5² + 15² = 500.
RSS = Σ(Ŷt - Ȳ)² = 100 + 100 + 100 + 100 = 400. R² = RSS/TSS = 400/500 = .80.
Alternately, ESS = Σ(Yt - Ŷt)² = (-5)² + 5² + (-5)² + 5² = 100. R² = 1 - ESS/TSS = .80.


5.10. (a) α̂ is in the same units as Y, which is in dollars (or another monetary unit such as pounds, yen, or euros).
(b) β̂ is in the same units as Y/X, which is in dollars per year.
(c) R² is a dimensionless quantity, a pure number without units.
Comment: R is a correlation coefficient, a pure number without units.

5.11. Ŷi = α̂ + β̂Xi = Ȳ + β̂(Xi - X̄). ⇒ The mean of the Ŷi is Ȳ.
Cov[Yi, Ŷi] = Σ(Yi - Ȳ)(Ŷi - Ȳ)/N = Σ(Yi - Ȳ)β̂(Xi - X̄)/N = β̂Cov[Xi, Yi].
Var[Ŷi] = Σ(Ŷi - Ȳ)²/N = Σ{β̂(Xi - X̄)}²/N = β̂²Var[Xi].
Corr[Yi, Ŷi] = β̂Cov[Xi, Yi]/√(Var[Yi]β̂²Var[Xi]) = Cov[Xi, Yi]/√(Var[Yi]Var[Xi]) = Corr[Xi, Yi] = r.

5.12. E. RSS = Σ(Ŷi - Ȳ)² = Σ{β̂(Xi - X̄)}² = β̂²Σxi².
R² = RSS/TSS = β̂²Σxi²/Σyi² = (2.065²)(42)/182 = .984.
Comment: See page 73 of Pindyck and Rubinfeld.

5.13. A. Fk-1,N-k = {RSS/(k - 1)}/{ESS/(N - k)} = {TSS·R²/(k - 1)}/{TSS(1 - R²)/(N - k)} = {R²/(1 - R²)}{(N - k)/(k - 1)}, Equation 4.12 in Pindyck and Rubinfeld, so statement A is true. However, statement A is not an objection to the use of R² to compare the validity of regression results. Statement B is true, see page 89 of Pindyck and Rubinfeld. The corrected R² solves this problem with R². This is related to the principle of parsimony; one does not want to use more variables than are needed to get the job done. Statement C is true, see page 89 of Pindyck and Rubinfeld. Generally, R² is not applied to models without an intercept. Statement D is true, see page 92 of Pindyck and Rubinfeld. Thus R² is to some extent an artifact of how the model relationship is stated, rather than an inherent feature of the model. Statement E is true, see page 89 of Pindyck and Rubinfeld.

5.14. D. Restate the second model: Yi = -α2 + (1 - β2)Xi - ε2i.
Therefore, α̂2 = -α̂1 and β̂1 = 1 - β̂2. Therefore, statements A and B are true.
Models I and II contain the same information; ε2i = -ε1i. Σε1i² = Σε2i². Statement C is true.
Thus the ESS is the same for the two models. However, the TSS are different.
For model one, TSS = Σ(Y - Ȳ)². For model two, TSS = Σ(X - Y - X̄ + Ȳ)².
R² = 1 - ESS/TSS. Therefore R² for the two models are different. Statement D is false.

6.1. A. R² = RSS/TSS = 335.2/437.7 = .766.

6.2. C. 1 - R̄² = (1 - R²)(N - 1)/(N - k) = (1 - .766)(9/6) = .351. R̄² = .649.
Alternately, R̄² = 1 - {ESS/(N - k)}/{TSS/(N - 1)} = 1 - (102.5/6)/(437.7/9) = .649.


6.3. A. TSS = (N - 1)(sample variance of Y) = (14)(.0103) = .1442. R² = 1 - ESS/TSS.
1 - R̄² = (1 - R²)(N - 1)/(N - k), where k is the number of variables including the intercept.
Model   k   ESS      TSS      R²      Corrected R²
I       3   0.0131   0.1442   0.909   0.894
II      3   0.0123   0.1442   0.915   0.900
III     3   0.0144   0.1442   0.900   0.883
IV      4   0.0115   0.1442   0.920   0.898
V       4   0.0117   0.1442   0.919   0.897
VI      4   0.0118   0.1442   0.918   0.896
VII     5   0.0106   0.1442   0.926   0.897
Model II has the largest R̄².
Comment: Based very loosely on "Workers Compensation and Economic Cycles: A Longitudinal Approach", by Hartwig, Retterath, Restrepo, and Kahley, PCAS 1997.

6.4. R² follows a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2 = (3 - 1)/2 = 1, b = ν2/2 = (N - k)/2 = (11 - 3)/2 = 4, and θ = 1.
The density is: {(a + b - 1)!/((a - 1)!(b - 1)!)}(x/θ)^(a-1)(1 - x/θ)^(b-1)/θ, 0 ≤ x ≤ θ, which is: 4(1 - x)³, 0 ≤ x ≤ 1.
Mean is: θa/(a + b) = 1/(1 + 4) = 0.20.
Variance is: θ²ab/{(a + b)²(a + b + 1)} = (1)(4)/{(1 + 4)²(1 + 4 + 1)} = 0.0267.
Comment: [graph of the density not reproduced] Since a ≤ 1, and b > 1, the mode is zero.


6.5. B. Σxi² = Σ(Xi - X̄)² = (sample variance of X)(18), is the same for both regressions.
TSS = Σyi² = Σ(Yi - Ȳ)² = (sample variance of Y)(18), is the same for both regressions.
Σxiyi = Σ(Xi - X̄)(Yi - Ȳ) is larger for the first regression.
⇒ β̂ = Σxiyi/Σxi² is larger for the first regression than it is for the second regression.
R² = correlation² = (Σxiyi)²/(Σxi²Σyi²), is larger for the first regression than it is for the second.
⇒ R̄² = 1 - (1 - R²)(N - 1)/(N - k) = 1 - (1 - R²)(18/17), is larger for the first regression.
ESS = (1 - R²)TSS. Since R² is bigger for the first and TSS is the same, ESS is smaller for the first regression. ⇒ s² = ESS/(N - k) = ESS/17 is smaller for the first regression.
α̂ could be either bigger, the same, or smaller for the first regression.
Since we are not given X̄ and Ȳ, there is no way to determine which of these is the case.

6.6. C. R² = 1 - (1 - R̄²)(N - k)/(N - 1).

6.7. B. The original model is a special case of a model that includes the additional variable, but with the coefficient corresponding to that variable equal to zero. Therefore, when an additional independent variable is added, and a least squares regression is fit, the match between the model and the data is as good or better than it was. Therefore, the regression sum of squares, RSS = Σ(Ŷi - Ȳ)², either is the same or increases. (RSS almost always increases.) Since the data has remained the same, TSS = Σ(Yi - Ȳ)², remains the same. Therefore, ESS = TSS - RSS, remains the same or decreases. (ESS almost always decreases.) Therefore, R² = RSS/TSS, either is the same or increases. (R² almost always increases.)
R̄² = 1 - (1 - R²)(N - 1)/(N - k). When we add a variable, N - k is one less, while R² usually increases. The former reduces R̄², while the latter increases R̄². The net effect is that R̄² can either increase, decrease, or stay the same.
Comment: For example, if one were modeling the annual frequency of claims for Workers Compensation Insurance in California, and the independent variable added to the model was the number of games won each year by the New York Yankees baseball team, one would expect RSS to increase very little, resulting in a decline in R̄², since this additional variable explains nothing significant about what is being modeled. The model including the irrelevant baseball data is inferior to the model without it.

6.8. C. For the new fit, the expected value of R̄² is the same, 0.64.
0.64 = R̄² = 1 - (1 - R²)(N - 1)/(N - k) = 1 - (1 - R²)(9/8). ⇒ R² = 0.680.
Comment: Unlike R², R̄² has been adjusted for degrees of freedom, so its expected value remains the same.


6.9. R2 follows a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2 = (2 - 1)/2 = 1/2,

b = ν2/2 = (N - k)/2 = (3 - 2)/2 = 1/2, and θ = 1.

The density is: {Γ[a + b]/(Γ[a] Γ[b])} (x/θ)^(a-1) (1 - x/θ)^(b-1) /θ, 0 ≤ x ≤ θ,
which is: 1/(π√(x(1 - x))), 0 ≤ x ≤ 1.
Mean is: θa/(a + b) = .5/(.5 + .5) = 0.5.
Variance is: θ²ab/{(a + b)²(a + b + 1)} = (1/2)(1/2)/{(1)²(2)} = 0.125.
Comment: The constant in front of the density, 1/π, follows from Γ[1/2] = √π, a fact for which you are not responsible. With this constant, the density integrates to one over its support (0, 1). Here is a graph of this density:

[Graph of the density of R², horizontal axis R² from 0 to 1, vertical axis density from 0 to about 7; the density is U-shaped, rising toward infinity at both endpoints.]

Since a ≤ 1, and b ≤ 1, the density is bimodal, with modes at zero and one.

6.10. A. R2 = RSS/TSS = 1115.11/1254.00 = .889.
1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .889)(7/5) = .155. R̄2 = .845.
Alternately, R̄2 = 1 - (ESS/(N - k))/(TSS/(N - 1)) = 1 - (138.89/5)/(1254/7) = .845.

6.11. E. 1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (.15)(11 - 1)/(11 - 2) = .167. R̄2 = .833.

6.12. A. R2 = RSS/TSS = RSS/(RSS + ESS) = 103658/(103658 + 69204) = .600.
1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (1 - .600)(48 - 1)/(48 - 4) = (.4)(47/44) = .427. R̄2 = .573.
Alternately, s2 = ESS/(N - k) = 69204/44 = 1572.8.
Sample Variance of Y = Σ(Yi - Ȳ)²/(N - 1) = TSS/(N - 1) = (103658 + 69204)/47 = 3677.9.
R̄2 = 1 - s2/Var[Y] = 1 - 1572.8/3677.9 = .572.
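As a quick numerical check of the R2 and R̄2 formulas used in solutions 6.10 through 6.12, here is a Python sketch (variable names are mine) using only the summary quantities given in 6.12:

    # Sketch: R-squared and adjusted R-squared from RSS = 103,658, ESS = 69,204, N = 48, k = 4.
    RSS, ESS, N, k = 103658.0, 69204.0, 48, 4

    TSS = RSS + ESS
    R2 = RSS / TSS                             # about 0.600
    R2_adj = 1 - (1 - R2) * (N - 1) / (N - k)  # about 0.572 (0.573 with the rounding above)

    print(round(R2, 3), round(R2_adj, 3))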

7.1. B. Φ(1.645) = .95, so ±1.645 standard deviations has 5% on each tail; it covers 90% probability.


7.2. A. Standardize each value by subtracting the mean and dividing by σ.Prob[6 ≤ x ≤ 12] = Φ((12 - 7)/4) - Φ((6 - 7)/4) = Φ(1.25) - Φ(-.25) = .8941 - .4014 = 49.3%.
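The standardization in solution 7.2 can be reproduced with scipy's Normal distribution; the following short Python sketch (not part of the original solution) uses the mean 7 and standard deviation 4 given there:

    # Sketch: Prob[6 <= X <= 12] for X Normal with mean 7 and standard deviation 4.
    from scipy.stats import norm

    mu, sigma = 7.0, 4.0
    prob = norm.cdf((12 - mu) / sigma) - norm.cdf((6 - mu) / sigma)
    print(round(prob, 3))   # about 0.493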

7.3. E. The sum of the residuals is always zero, so they have a first moment of zero.
Therefore, Variance = Second Central Moment = Second Moment.
Therefore, Third Central Moment = E[(X - X̄)³] = E[(X - 0)³] = 3rd moment.
Therefore, Fourth Central Moment = 4th moment.
Skewness = Third Central Moment/Variance^1.5 = Third Moment/Second Moment^1.5.
Kurtosis = Fourth Central Moment/Variance² = Fourth Moment/Second Moment².

Regression   2nd mom.   3rd mom.   4th mom.   Skewness   Kurtosis
A            5          2          80         0.18       3.20
B            5          1          50         0.09       2.00
C            5          0          100        0.00       4.00
D            5          -2         50         -0.18      2.00
E            5          -1         70         -0.09      2.80

For the Normal Distribution the skewness is 0 and the Kurtosis is 3.
Regression E seems to most closely fit these criteria.

7.4. C. Φ(1.960) = .975, so ±1.960 standard deviations has 2.5% on each tail; it covers 95% probability. β̂ ± 1.960 StdDev[β̂] = 13 ± (1.960)(3) = [7.12, 18.88].

7.5. A. 0 = E[L] = E[aX1 + 4X2 + bX3] = aµ + 4µ + bµ. ⇒ a + b = -4.

1 = Var[L] = Var[aX1 + 4X2 + bX3] = a2/24 + 16/24 + b2/24. ⇒ a2 + b2 = 8.⇒ a = -2, b = -2.

7.6. C. Φ(2.326) = .99. The variance of the mean is 10/n. Therefore a 98% confidence interval is: ± (2.326)√(10/n).Set the length equal to 3: 3 = (2)(2.326)√(10/n). ⇒ n = 24.04. ⇒ n = 25.

7.7. A. X is Normal with mean 30.4 and variance: 122/36 = 4.Y is Normal with mean 32.1 and variance: 142/49 = 4.X - Y is Normal with mean 30.4 - 32.1 = -1.7, and variance: 4 + 4 = 8.P[ X > Y ] = P[ X - Y > 0] = 1 - Φ((0 - (-1.7))/√8) = 1 - Φ(.60) = 27.4%.

7.8. C. For a 95% confidence interval we want ± 1.960 standard deviations.X has variance 12/n. So the length of the confidence interval is: (2)(1.960)√(12/n).5 = (2)(1.960)√(12/n). ⇒ n = 7.4. Since n is integer, take n = 8.Comment: For n = 7 the length is: (2)(1.960)√(12/7) = 5.13, too wide.

7.9. C. Prob[X2 < a] = Prob[X2 / a < 1] = Prob[|X|/√a < 1] = 1 - Prob[|X|/√a ≥ 1] = 1 - (Prob[X/√a ≤ -1] + Prob[X/√a ≥ 1]) = 1 - 2Prob[X/√a ≥ 1] = 1 - 2(1 - Φ(1)) = .6826.Comment: X/√a has a Unit Normal Distribution. X2/a has a Chi-Square Distribution with one degree of freedom.


7.10. D. Φ(1.645) = 95% = (1+90%)/2. X is Normal with variance: 144/16 = 9.90% confidence interval = 200 ± (1.645)√9 = (195.065, 204.935).

7.11. D. Let X1 and X2 denote the measurement errors of the two instruments.

(X1 + X2)/2 is Normal with mean 0 and variance: {(0.0056h)² + (0.0044h)²}/4 =

.00001268h2. The standard deviation is: .003561h.Prob[-0.005h ≤ (X1 + X2)/2 ≤ 0.005h] = Φ(.005h/.003561h) - Φ(-.005h/.003561h) =

Φ(1.404) - Φ(-1.404) = 2Φ(1.404) - 1 = (2)(.9196) - 1 = 0.839.

7.12. B. The mean of this exponential is θ = 1000. Its variance is θ² = 1 million.
The sum of losses from 100 policies is approximately Normal with mean 100,000 and variance 100 million. The premium from those 100 policies is (100)(1000 + 100) = 110,000.
Prob[claims > premium] ≅ 1 - Φ((110,000 - 100,000)/√(100 million)) = 1 - Φ(1) = .1587.

7.13. D. This uniform distribution has mean 0 and variance 52/12 = 25/12.The mean for 48 people has mean 0 and variance: (25/12)/48 = .04340.Prob[-.25 ≤ mean ≤ .25] = Φ(.25/√.04340) - Φ(-.25/√.04340) =Φ(1.2) - Φ(-1.2) = 2Φ(1.2) - 1 = (2)(.8849) - 1 = 0.770.

7.14. C. The average has a mean of 19,400 and standard deviation of 5000/√25 = 1000.Prob[average > 20,000] = 1 - Φ((20000 - 19400)/1000) = 1 - Φ(.6) = 1 - 0.7257 = 0.2743.

7.15. B. For n light bulbs, the total is Normal with mean 3n and variance n.Prob[sum ≥ 40] = 1 - Φ((40 - 3n)/√n) = .9772. ⇒ (40 - 3n)/√n = -2.0. ⇒ 40 - 3n = -2√n. ⇒ 3n - 2√n - 40 = 0. √n = (2 ± √484)/6 = 4 or - 10/3. ⇒ n = 16.

7.16. C. The sum of the contributions has mean: (2025)(3125) = 6,328,125 and standard deviation: (250)(√2025) = 11250. Φ(1.282) = .9. 90th percentile ≅ 6,328,125 + (1.282)(11250) = 6,342,548.

8.1. C. For i ≠ j, E[εi εj] = E[εi]E[εj] = (0)(0) = 0, since εi and εj are independent, each with mean

zero. However, E[εi εi] = Var[εi] + E[εi]² = σ2, which is not necessarily equal to 1.
Statements A, B, and D are true, where D is the definition of homoscedasticity.

8.2. Cov[X, Y] = Cov[X, α + βX + ε] = Cov[X, α] + Cov[X, βX] + Cov[X, ε] = 0 + βCov[X, X] + 0 = βVar[X].
Var[Y] = Var[α + βX + ε] = 0 + Var[βX] + Var[ε] = β²Var[X] + σ2.
Corr[X, Y] = βVar[X]/√{Var[X](β²Var[X] + σ2)} = β/√(β² + σ2/Var[X]).
Comment: I have made use of the fact that X and ε are independent, and Var[ε] = σ2.
As σ → ∞, Corr[X, Y] → 0. As β → ∞, Corr[X, Y] → 1. As Var[X] → ∞, Corr[X, Y] → 1.


9.1. E. Heteroscedasticity does not produce biased or inconsistent estimators of the slopeparameters. It does produce inefficient estimators, unless corrected for. (See page 147 of Pindyck and Rubinfeld.) Serial correlation does not produce biased or inconsistent estimators of the slope parameters. It does produce inefficient estimators, unless corrected for. (See page 159 of Pindyck and Rubinfeld.) Comment: Based on Course 4 Sample Exam, question 40, but I have excluded those ideas which are no longer on the Syllabus.

9.2. E[X] = (.7)(0) + (.3)(10) = 3. E[X2] = (.7)(0²) + (.3)(10²) = 30. Var[X] = 30 - 3² = 21.

Sample    Probability      Mean   Sample Variance
0, 0      (.7)(.7) = .49   0      {(0 - 0)² + (0 - 0)²}/(2 - 1) = 0
0, 10     (.7)(.3) = .21   5      {(0 - 5)² + (10 - 5)²}/(2 - 1) = 50
10, 0     (.3)(.7) = .21   5      {(10 - 5)² + (0 - 5)²}/(2 - 1) = 50
10, 10    (.3)(.3) = .09   10     {(10 - 10)² + (10 - 10)²}/(2 - 1) = 0

Expected value of the sample variance is: (.49)(0) + (.21)(50) + (.21)(50) + (.09)(0) = 21.
Comment: Illustrating the general fact that the sample variance, with N - 1 in the denominator, is an unbiased estimator of the variance of the distribution.

9.3. A. They are biased and inconsistent. They need not be efficient.

9.4. C. Cov[α, β] = Corr[α, β]√(Var[α]Var[β]) = (.6)√15 = 2.324.Var[γ] = Var[wα + (1-w)β] = w2Var[α] + (1-w)2Var[β] + 2w(1-w)Cov[α, β] =5w2 + 3(1-w)2 + 4.648w(1-w) = 3.352w2 - 1.352w + 3.Bias[γ] = Bias[wα + (1-w)α] = wBias[α] + (1-w)Bias[β] = (-1)w + (2)(1 - w) = 2 - 3w.MSE[γ] = Var[γ] + Bias[γ]2 = 3.352w2 - 1.352w + 3 + (2 - 3w)2 = 12.352w2 - 13.352w + 7.Setting the derivative with respect to w equal to zero:24.704w - 13.352 = 0. ⇒ w = 13.352/24.704 = .540. Comment: Bias[α] = E[α] - true value. Bias[β] = E[β] - true value. Bias[γ] = E[γ] - true value = wE[α] + (1-w)E[β] - true value = wBias[α] + (1-w)Bias[β].Here is a graph of the Mean Squared Error as a function of w:

[Graph of the Mean Squared Error as a function of w, with w from 0 to 1 on the horizontal axis and MSE from about 3.5 to 7 on the vertical axis.]
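The minimization in solution 9.4 can also be done numerically. Here is a small Python sketch (my own, not part of the original solution) that builds MSE(w) from the given variances (5 and 3), correlation (0.6), and biases (-1 and 2), and locates its minimum on a fine grid:

    # Sketch: locate the w minimizing MSE[w*alpha + (1-w)*beta] from solution 9.4.
    import numpy as np

    w = np.linspace(0.0, 1.0, 100001)
    cov = 0.6 * np.sqrt(5.0 * 3.0)                       # about 2.324
    var = 5.0 * w**2 + 3.0 * (1 - w)**2 + 2 * w * (1 - w) * cov
    bias = -1.0 * w + 2.0 * (1 - w)
    mse = var + bias**2
    print(round(w[np.argmin(mse)], 3))                   # about 0.54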


9.5. B. They are unbiased and consistent. They are no longer efficient.

9.6. A. mean = (0)(.6) + (10)(.3) + (100)(.1) = 13.2nd moment = (02)(.6) + (102)(.3) + (1002)(.1) = 1030. variance = 1030 - 132 = 861.Σ(Xi - X)2/6 = (5/6)(Σ(Xi - X)2 /5) = (5/6)(sample variance).The sample variance is unbiased. ⇒ E[sample variance] = 861. ⇒ E[Σ(Xi - X)2/6] = (5/6)(861) = 717.5.Bias = expected value of the estimator - true value = 717.5 - 861 = -143.5. Comment: This estimator of the variance is biased downwards, since n is in the denominator rather than n - 1.

* 9.7. An example of specification error would be if we fit a Weibull Distribution, while the data was actually drawn from a Gamma Distribution. The fitted parameters will vary around the actual parameters, an example of parameter error. If we use the fitted model to predict the size of the next loss, there will be error due to the fact that the next loss is a random draw from a size of loss distribution; this is an example of process risk.

9.8. Y = 10 + 3X2 + 2X3 + ε. ⇒ Y1 = 10 + ε1. Y2 = 15 + ε2. Y3 = 22 + ε3.
In deviations form, x = (-1, 0, 1), and β̂ = ΣxiYi/Σxi2 = (-Y1 + Y3)/2 = 6 + (ε3 - ε1)/2.
α̂ = Ȳ - β̂X̄ = 15.667 + (ε1 + ε2 + ε3)/3 - {6 + (ε3 - ε1)/2}(1) = 9.667 + (5ε1 + 2ε2 - ε3)/6.
E[α̂] = 9.667 + E[(5ε1 + 2ε2 - ε3)/6] = 9.667.
E[β̂] = 6 + E[(ε3 - ε1)/2] = 6.
Comment: Since a relevant variable has been omitted from the model, the least squares estimators of the coefficients are biased.
Var[X2] = {(0 - 1)² + (1 - 1)² + (2 - 1)²}/3 = 2/3.
Cov[X2, X3] = {(0)(0) + (1)(1) + (2)(3)}/3 - {(0 + 1 + 2)/3}{(0 + 1 + 3)/3} = 7/3 - (1)(4/3) = 1.
E[β̂2] = β2 + β3 Cov[X2, X3]/Var[X2] = 3 + (2)(1)/(2/3) = 6.

9.9. D. λ is both the mean and the variance of the Poisson Distribution.X is an unbiased estimator of the mean and thus of λ.Estimator II is the sample variance, an unbiased estimator of the variance and thus of λ.E[2X1 - X2] = 2E[X1] - E[X2] = 2λ - λ = λ. Thus estimator III is unbiased.

9.10. A. 1. T. Definition of Unbiased. 2. False. For an asymptotically unbiased estimator, the average error goes to zero as n goes to infinity. However, there can still be a large chance of a large error, as long as the errors are of opposite sign. For example, a 50% chance of an error of +1000 and a 50% chance of an error of -1000, gives an average error of 0. 3. False.


9.11. C. 1. False. An unbiased estimator would have its expected value equal to the true value, but 2 ≠ 2.05. 2. False. MSE(α1) = (.052) + 1.025 < (.052) + 1.050 = MSE(α2). 3. True.

9.12. E. Z = αX + βY. E[Z] = αE[X] + βE[Y]. For Z unbiased, E[Z] = m. Thus m = α(.8m) + βm. Thus β = 1 - .8α.

Since X and Y are independent, Var[Z] = α2Var[X] + β2Var[Y] = α2m2 + β21.5m2 = m2(α2 + 1.5(1 - .8α)2) = m2(1.96α2 - 2.4α + 1.5). The mean squared error is the sum of the variance and the square of the bias. Since the bias is zero we need to minimize the variance of Z. ∂Var[Z] / ∂α = m2(3.92α - 2.4). Setting the partial derivative equal to zero: m2(3.92α - 2.4) = 0. Therefore, α = 2.4 / 3.92 = .612....

9.13. B. E[ψ] = 165/75 = 2.2. E[ψ2] = 375/75 = 5. Var[ψ] = 5 - 2.22 = .16.Mean Square error of ψ = Square of Bias plus Variance = (2.2 - 2)2 + .16 = .200.E[φ] = 147/75 = 1.96. E[φ2] = 312/75 = 4.16. Var[φ] = 4.16 - 1.962 = .318.Mean Square error of φ = Square of Bias plus Variance = (1.96 - 2)2 + .318 = .320.MSE(ψ) / MSE(φ) = .200 /.320 = .625.

9.14. B. Statement 2 is the definition of a consistent estimator, and therefore must be true of a consistent estimator. Neither statement 1 or 3 must be true of a consistent estimator.

9.15 C. Mean squared error = variance plus the square of the bias = 1.00 + .52 = 1.25.

9.16. C. This absolute value loss function is minimized by the median. The median of this distribution is the value of x such that .5 = F(x) = 1- e-x. Thus x = ln 2 = .693.

9.17. D. b∗ = Σ(Zi - Z )(Yi - Y )/ Σ(Zi - Z )(Xi - X) =

ΣYi(Zi - Z )/ Σ(Zi - Z )(Xi - X) - (1/N)ΣYiΣ(Zi - Z )/ Σ(Zi - Z )(Xi - X),

which is linear in Yi; b∗ is of the form ΣwiYi.

E[Yi] = α + βXi = Y − βX + βXi = Y + β(Xi - X). E[Yi - Y ] = β(Xi - X).

E[b∗] = Σ(Zi - Z )E[Yi - Y ]/ Σ(Zi - Z )(Xi - X) = Σ(Zi - Z )β(Xi - X)/ Σ(Zi - Z )(Xi - X) = β.

The ordinary least squares estimator, Σxiyi/Σxi2, is the best linear unbiased estimator, so b∗ can not be the best, in other words with minimum mean squared error, of the linear unbiased estimators. Thus b∗ is a linear unbiased estimator of β, but not the best linear unbiased estimator (BLUE) of β. Heteroscedasticity-consistent estimators are concerned with estimating variances of estimated regression parameters, which b∗ is not doing, so choice B is false. Comment: If the true model were: Yi = α + βXi2 + εi, then b∗ would be the best linear unbiased

estimator (BLUE) of β. In deviations form, b∗ = Σzi yi / Σzi xi = Σyi zi / Σzi xi, which is linear in

yi; b∗ is of the form Σwiyi, with wi = zi / Σzi xi. E[b∗] = ΣE[yi] zi / Σzi xi = Σβxi zi / Σzi xi = β.


9.18. B. Let the parameter be θ. E[Y1] = θ. E[Y2] = θ. E[k1Y1 + k2Y2] = θ.

⇒ k1E[Y1] + k2E[Y2] = θ. ⇒ k1θ + k2θ = θ. ⇒ k1 + k2 = 1.

Var[k1Y1 + k2Y2] = k12Var[Y1] + k22Var[Y2] = (1 - k2)2 4Var[Y2] + k22 Var[Y2].Set equal to zero the partial derivative with respect k2:0 = -8(1 - k2) Var[Y2] + 2 k2 Var[Y2]. ⇒ k2 = 4/5. ⇒ k1 = 1/5 = 0.20.Comment: Weight each estimator inversely proportional to its variance.

9.19. D. E[θ̂] = E[{k/(k+1)}X] = {k/(k+1)}E[X] = {k/(k+1)}θ. Bias = E[θ̂] - θ = -θ/(k + 1).
Var[θ̂] = Var[{k/(k+1)}X] = {k/(k+1)}²Var[X] = {k/(k+1)}²θ²/25.
MSE = Var + Bias². We are given that in this case MSE(θ̂) = 2[Bias(θ̂)]². ⇒ Var[θ̂] = Bias².
⇒ {k/(k+1)}²θ²/25 = θ²/(k + 1)². ⇒ k² = 25. ⇒ k = 5.

9.20. D. There are a total of five observations. The average of 5 independent, identically distributed variables has the same mean and 1/5 the variance; it is Normal with mean 20 and variance 4/5 = 0.8.The expected value of the estimator is E[X2], which is the second moment of a Normal with µ = 20 and σ2 = 0.8, which is: 0.8 + 202 = 400.8.The true value of the area is: 202 = 400.The bias is: 400.8 - 400 = 0.8. It will overestimate on average by 0.8 square meters.Comment: Even though we have an unbiased estimator of the length of a side, squaring it does not give an unbiased estimator of the area, since E[X2] ≥ E[X]2. Variance ≥ 0. ⇒ E[X2] ≥ E[X]2.

10.1. X̄ = 24/4 = 6. x = -4, -1, 2, 3. Ȳ = 40/4 = 10. y = 0, -4, 1, 3.
β̂ = Σxiyi/Σxi2 = 15/30 = 0.5. α̂ = Ȳ - β̂X̄ = 10 - (.5)(6) = 7.
Ŷ = α̂ + β̂X = 8, 9.5, 11, 11.5. ε̂ = Y - Ŷ = 2, -3.5, 0, 1.5. ESS = Σε̂i2 = 18.5.
TSS = Σyi2 = 26. Note that RSS = Σ(Ŷ - Ȳ)² = 7.5 = TSS - ESS.
R2 = 1 - ESS/TSS = 1 - 18.5/26 = .288.
1 - R̄2 = (1 - R2)(N - 1)/(N - 2) = (.712)(3/2) = 1.067. ⇒ R̄2 = -.067.
s2 = ESS/(N - 2) = 18.5/(4 - 2) = 9.25.
Var[β̂] = s2/Σxi2 = 9.25/30 = .3083. sβ = √.3083 = .555.
Var[α̂] = s2ΣXi2/(NΣxi2) = (9.25)(174)/((4)(30)) = 13.41. sα = √13.41 = 3.66.
Cov[α̂, β̂] = -s2X̄/Σxi2 = -(9.25)(6)/30 = -1.85.
Corr[α̂, β̂] = Cov[α̂, β̂]/√(Var[α̂]Var[β̂]) = -1.85/√((13.41)(.3083)) = -.910.
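The full set of regression statistics in solution 10.1 can be reproduced with a short Python sketch. The data points below are the ones implied by the deviations shown above (X̄ = 6 with deviations -4, -1, 2, 3, and Ȳ = 10 with deviations 0, -4, 1, 3); the variable names are my own:

    # Sketch: simple linear regression by hand, reproducing solution 10.1.
    import numpy as np

    X = np.array([2.0, 5.0, 8.0, 9.0])
    Y = np.array([10.0, 6.0, 11.0, 13.0])
    N = len(X)

    x, y = X - X.mean(), Y - Y.mean()          # deviations form
    beta = (x * y).sum() / (x * x).sum()       # 0.5
    alpha = Y.mean() - beta * X.mean()         # 7.0

    resid = Y - (alpha + beta * X)
    ESS = (resid ** 2).sum()                   # 18.5
    TSS = (y ** 2).sum()                       # 26.0
    R2 = 1 - ESS / TSS                         # about 0.288
    s2 = ESS / (N - 2)                         # 9.25

    var_beta = s2 / (x * x).sum()                             # about 0.3083
    var_alpha = s2 * (X ** 2).sum() / (N * (x * x).sum())     # about 13.41
    cov_ab = -s2 * X.mean() / (x * x).sum()                   # -1.85
    print(beta, alpha, round(R2, 3), round(var_beta, 4), round(var_alpha, 2), cov_ab)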


10.2. C., 10.3. E., & 10.4. D. s2 = ESS/(N - 2) = 533000/(100 - 2) = 5439.
Σxi2 = Σ(Xi - X̄)² = ΣXi2 - NX̄² = 5,158,000 - (100)(122²) = 3,670,000.
Var[β̂] = s2/Σxi2 = 5439/3,670,000 = .001482. sβ = √.001482 = .0385.
Var[α̂] = s2ΣXi2/(NΣxi2) = (5439)(5,158,000)/((100)(3,670,000)) = 76.44.
sα = √76.44 = 8.74. Cov[α̂, β̂] = -s2X̄/Σxi2 = -(5439)(122)/3,670,000 = -.1808.

10.5. C. Ȳ = (0 + 3 + 6 + 10 + 8)/5 = 5.4.
TSS = (0 - 5.4)² + (3 - 5.4)² + (6 - 5.4)² + (10 - 5.4)² + (8 - 5.4)² = 63.2.
2.554 = s2 = ESS/(5 - 2). ⇒ ESS = 7.662. R2 = 1 - ESS/TSS = 1 - 7.662/63.2 = .879.

10.6. C. 4 = s2 = ESS/(N - 2) = ESS/13. ESS = (4)(13) = 52.
R2 = 1 - ESS/TSS = 1 - 52/169 = 0.692.
R̄2 = 1 - (1 - R2)(N - 1)/(N - k) = 1 - (1 - 0.692)(15 - 1)/(15 - 2) = 0.669.
Alternately, 1 - R̄2 = {ESS/(N - k)}/{TSS/(N - 1)} = s2/{Σyi2/(N - 1)} = 4/(169/14) = 56/169.
R̄2 = 1 - 56/169 = 113/169 = 0.669.

10.7. B. Var[α̂] = σ2ΣXi2/(NΣxi2) = σ2(N E[X2])/{N(N Var[X])} = (σ2/N)E[X2]/Var[X] = (σ2/N)(Var[X] + E[X]²)/Var[X] = (σ2/N)(1 + E[X]²/Var[X]) ≥ σ2/N.
Thus the minimum possible value of Var[α̂] is σ2/N = 255/30 = 8.5.
Comment: The values of the independent variable X are known rather than random.
However, over all possible sets of X at which we could have had observations of Y, the minimum variance of the estimated intercept occurs for those sets in which X̄ = 0.

10.8. D. s2 = ESS/(N - k) = 1036/(15 - 5) = 103.6.
s2 is an unbiased estimator of σ2. ⇒ σ2 = E[s2] = E[Σε̂i2/(N - k)] = E[Σε̂i2]/(N - k).
E[Σε̂i2] = (N - k)σ2, N ≥ k. For the 30 observations, E[Σε̂i2] = (30 - 5)(103.6) = 2590.
Comment: E[Σε̂i2] is not proportional to N, so twice the data does not result in twice the expected ESS. If, for example, we had had only 5 observations, then the regression would have fit perfectly and ESS = 0. With 10 observations, E[Σε̂i2] = (10 - 5)σ2 > 0.

10.9. β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)²} = {(30)(173) - (44)(106)}/{(30)(81) - 44²} = 1.065.

10.10. α̂ = Ȳ - β̂X̄ = 106/30 - (1.065)(44/30) = 1.971.


10.11. R2 = Corr[X, Y]² = (Σxiyi)²/(Σxi2 Σyi2) = {ΣXiYi - ΣXiΣYi/N}²/[{ΣXi2 - (ΣXi)²/N}{ΣYi2 - (ΣYi)²/N}] = SXY²/(SXX SYY).
SXX = 81 - 44²/30 = 16.467. SYY = 410 - 106²/30 = 35.467. SXY = 173 - (44)(106)/30 = 17.533.
R2 = 17.533²/{(16.467)(35.467)} = 0.526.

10.12. TSS = Σ(Yi - Ȳ)² = ΣYi2 - (ΣYi)²/N = 410 - 106²/30 = 35.467.
ESS = (1 - R2)TSS = (1 - 0.526)(35.467) = 16.81.
s2 = ESS/(N - 2) = 16.81/28 = 0.600.

10.13. sβ = √(s2/Σxi2) = √[0.600/{ΣXi2 - (ΣXi)²/N}] = √{0.600/(81 - 44²/30)} = 0.191.
Alternately, Var[β̂] = {SYY/SXX - (SXY/SXX)²}/(N - 2) = {35.467/16.467 - (17.533/16.467)²}/28 = 0.03643. sβ = √0.03643 = 0.191.

10.14. sα = √{(ΣXi2/N)(s2/Σxi2)} = √(81/30) √{0.600/(81 - 44²/30)} = 0.314.

10.15. Cov[α̂, β̂] = -X̄s2/Σxi2 = -(44/30)(0.600)/(81 - 44²/30) = -0.0534.

10.16. Corr[α̂, β̂] = -X̄/√(E[X2]) = -(44/30)/√(81/30) = -0.893.

10.17. A. Ȳ = (16.1 + 15.4 + 13.3 + 12.6 + 10.5)/5 = 13.6.
RSS = (16.1 - 13.6)² + (15.4 - 13.6)² + (13.3 - 13.6)² + (12.6 - 13.6)² + (10.5 - 13.6)² = 20.2.
8.356 = s2 = ESS/(5 - 2). ⇒ ESS = 25.07. R2 = RSS/TSS = 20.2/(20.2 + 25.07) = .446.
Comment: RSS is the numerator of the variance of Ŷ.
The mean of the predicted values, Ŷ, is equal to the mean of the Ys, Ȳ.

10.18. Both estimates of β are unbiased. Therefore, MSE[β̂] = Var[β̂].
Var[β̂] = σ2/Σxi2 = σ2/(N Var[X]).
Since Bob’s variance of X is smaller than Ray’s, Bob’s Var[β̂] is larger than Ray’s.
Comment: All else being equal, the more dispersed the values of the independent variable, the better the estimate of the slope.
Assuming the model is a reasonable one for boys ages 10 to 14, it is probably reasonable to use the model to estimate the average weight of a boy aged 15. It would not be reasonable to use the model to estimate the average weight of a boy aged 2 or a man aged 25! In general, one should be cautious about applying a model outside the range of data used to fit that model.


10.19. C. Error Sum of Squares = ESS ≡ Σε̂i2 = 967.
Estimated variance of the regression = s2 = ESS/(N - k) = 967/(7 - 2) = 193.4.
Σxi2 = Σ(Xi - X̄)² = 2000. sβ2 = s2/Σxi2 = 193.4/2000 = .0967.
Standard error of β̂ = sβ = √.0967 = .311.

10.20. B. Var[β̂] = (σ2/Var[X])/N = σ2/Σxi2.
Independent variable over wider domain ⇔ larger Var[X] ⇔ better estimate of slope.
Therefore, Statement B is exactly wrong; as the variation of the X’s increases one gets a better estimate of the slope.
s measures the dispersion of the error terms associated with the regression line.
Statement A is true, since s/Ȳ is like the estimated coefficient of variation of Y; the smaller this is, the smaller the errors and the better the fit.
Statement C is true; see page 65 in Pindyck and Rubinfeld.
Statement D is true since Cov[α̂, β̂] = -σ2X̄/Σxi2; see equation 3.13 in Pindyck and Rubinfeld.
X̄ > 0 ⇒ Cov[α̂, β̂] < 0.
⇒ an overestimate of α is likely to be associated with an underestimate of β.
Statement E is true; see equation 3.6 in Pindyck and Rubinfeld.

10.21. E. s2 = ESS/(N - 2) = 2000/(20 - 2) = 111.11. X̄ = -300/20 = -15.
Σxi2 = Σ(Xi - X̄)² = ΣXi2 - NX̄² = 6000 - (20)(15²) = 1500.
Cov[α̂, β̂] = -s2X̄/Σxi2 = -(111.11)(-15)/(1500) = 1.111.
Alternately, as discussed in a subsequent section, where X is the design matrix:

XX' = [ N    ΣXi  ]  =  [  20   -300 ]
      [ ΣXi  ΣXi2 ]     [ -300  6000 ]

(XX')-1 = [ 6000  300 ] / {(20)(6000) - (-300)(-300)}  =  [ 0.2   0.01     ]
          [ 300   20  ]                                   [ 0.01  0.000667 ]

Covariance matrix is: s2(XX')-1 = 111.11 [ 0.2   0.01     ]  =  [ 22.222  1.111  ]
                                         [ 0.01  0.000667 ]     [ 1.111   0.0741 ]

Cov[α̂, β̂] = 1.111.

10.22. Continuing the previous solution, Var(α̂) is the upper lefthand element of the covariance matrix, 22.222, while Var(β̂) is the lower righthand element of the covariance matrix, 0.0741.
Alternately, Var[α̂] = s2ΣXi2/(NΣxi2) = (111.11)(6000)/((20)(1500)) = 22.222.
Var[β̂] = s2/Σxi2 = 111.11/1500 = 0.0741.
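The matrix calculation above is easy to reproduce with numpy; the following is a sketch (my own variable names) that needs only the summary statistics given in the problem:

    # Sketch: covariance matrix of (alpha-hat, beta-hat) as s^2 (X'X)^-1, per solutions 10.21-10.22.
    import numpy as np

    N, sum_X, sum_X2, s2 = 20, -300.0, 6000.0, 111.11

    XtX = np.array([[N, sum_X],
                    [sum_X, sum_X2]])
    cov_matrix = s2 * np.linalg.inv(XtX)

    print(np.round(cov_matrix, 3))
    # [[22.222  1.111]
    #  [ 1.111  0.074]]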


11.1. D. From the t-table, for 16 degrees of freedom at 2.583, there is 2% probability in the sum of the tails. Therefore, the distribution function at 2.583 is 99%.

11.2. B. For 6 d.f. the 2% critical value is 3.143. Since 3.5 > 3.143, Prob[t < -3.5] < 2%/2 = 1%.
For 6 degrees of freedom the 1% critical value is 3.707. Since 3.5 < 3.707, Prob[t < -3.5] > 1%/2 = 0.5%.

11.3. C. From the t-table, for 7 degrees of freedom at 1.895, there is 10% probability in the sum of the tails. Therefore, the distribution function at -1.895 is 5%.

11.4. B. For 27 degrees of freedom the 10% critical value is 1.703. ⇒ Prob[|t| < 2] > 90%.For 27 degrees of freedom the 5% critical value is 2.052. ⇒ Prob[|t| < 2] < 95%.

11.5. C. From the t-table, for 7 degrees of freedom at 1.895, there is 10% probability in the sum of the tails. Therefore, the distribution function at -1.895 is 5%.

11.6. B. X1 + X4 is Normal with mean 0 and variance 2. ⇒ (X1 + X4)/√2 is a Unit Normal.
X2² + X3² is Chi-Square with 2 degrees of freedom.
⇒ t = {(X1 + X4)/√2}/√{(X2² + X3²)/2} = (X1 + X4)/√(X2² + X3²) has a t-distribution with 2 degrees of freedom.

11.7. E. W has a t-distribution with one degree of freedom. Prob[W < 6.31] = .95.Comment: From the t-table, for one degree of freedom, Prob[|W| > 6.31] = 10%.⇒ Prob[W > 6.31] = 5%. ⇒ Prob[W < 6.31] = 95%.

12.1. A. X = (18 + 24 + 33 + 34 + 30 + 35 + 39 + 12 + 18 + 30) / 10 = 27.3.Variance of the estimated mean = 80/10 = 8.Using the Normal Distribution, a 95% confidence interval is ± 1.960 standard deviations.27.3 ± 1.960 √8 = 27.3 ± 5.54 = 21.8 to 32.8.

12.2. B. X = (18 + 24 + 33 + 34 + 30 + 35 + 39 + 12 + 18 + 30) / 10 = 27.3.Second moment = (182 + 242 + 332 + 342 + 302 + 352 + 392 + 122 + 182 + 302) / 10 = 815.9.Sample variance = (10/9)(815.9 - 27.32) = 78.46.Variance of the estimated mean = 78.46/10 = 7.846.For 10 - 1 = 9 degrees of freedom , the critical value for 5% sum of the area in both tails is2.262. 27.3 ± 2.262 √7.846 = 27.3 ± 6.34 = 21.0 to 33.6.

12.3. E. The estimated mean is: (0 + 3 + 4 + 4 + 6 + 9 + 9 + 13)/8 = 48 / 8 = 6. The sample variance S2 = (1/7)(62 + 32 + 22 + 22 + 02 + 32 + 32 + 72) = 120 /7 = 17.14. The number of degrees of freedom is 8 - 1 = 7. Since we want a 90% confidence interval, we allow a total of 10% in both tails, so t = 1.895. The approximate interval is 6 ± tS/√n = 6 ± (1.895)(√17.14)/√8 = 6 ± 2.77 = (3.23, 8.77).
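The t-based interval in solution 12.3 can be checked with a short Python sketch (the eight observations are the ones listed in that solution; variable names are mine):

    # Sketch: 90% confidence interval for the mean using the Student's t-distribution.
    import numpy as np
    from scipy.stats import t

    data = np.array([0, 3, 4, 4, 6, 9, 9, 13], dtype=float)
    n = len(data)
    xbar = data.mean()                    # 6.0
    S2 = data.var(ddof=1)                 # 120/7, about 17.14

    t_crit = t.ppf(0.95, df=n - 1)        # about 1.895 (5% in each tail)
    half_width = t_crit * np.sqrt(S2 / n)
    print(round(xbar - half_width, 2), round(xbar + half_width, 2))   # about (3.23, 8.77)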


12.4. D. X̄ = 32. Second moment = 1590.

x       square of x
8       64
14      196
18      324
20      400
21      441
22      484
26      676
30      900
42      1764
55      3025
96      9216

Average   32.00   1590.00

Sample Variance = (11/10)(1590 - 32²) = 622.6. The sample standard deviation S = 24.95. We have 11 - 1 = 10 degrees of freedom, and for a 95% confidence interval we have t = 2.228. Therefore, an approximate 95% confidence interval for the mean is: 32 ± tS/√n = 32 ± (2.228)(24.95)/√11 = 32 ± 16.76 = (15.2, 48.8).

12.5. B. t = ( X - µ)/(S/√n) = (-6 - (-2))/√(46/20) = -2.638.For 19 d.f., for a 1-sided test, the 1% critical value is 2.539 and the 0.5% critical value is 2.861. 2.539 < 2.638 < 2.861. ⇒ reject H0 at 1% and do not reject at 0.5%.

Comment: We perform a one-sided test since H1: µ < -2. Note that X < -2.

If X ≥ -2, then we do not reject in this case.

12.6. C. H0: the two means are the same, or that the mean of the difference is zero.

County   New Program   Old Program   Difference   Square of Difference
1        56            49            7            49
2        52            57            -5           25
3        52            64            -12          144
4        49            53            -4           16
5        59            64            -5           25
6        56            68            -12          144
7        60            68            -8           64
8        56            66            -10          100

Average                               -6.125       70.875

The mean of the differences is: -6.125.
The sample variance of the differences is: (8/7)(70.875 - 6.125²) = 38.125.
t = -6.125/√(38.125/8) = -2.806, with 7 degrees of freedom.
For 7 degrees of freedom, the two-tailed critical values are: 1.895 (10%), 2.365 (5%), 2.998 (2%), and 3.499 (1%).
Since 2.365 < 2.806 < 2.998, reject H0 at 5% and do not reject H0 at 2%.
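The paired t statistic above is simple to verify in Python; the sketch below (my own, using the county values from the table) also shows scipy's built-in paired test, which returns the same statistic along with a two-sided p-value:

    # Sketch: paired t-test for solution 12.6.
    import numpy as np
    from scipy import stats

    new = np.array([56, 52, 52, 49, 59, 56, 60, 56], dtype=float)
    old = np.array([49, 57, 64, 53, 64, 68, 68, 66], dtype=float)

    diff = new - old
    t_stat = diff.mean() / np.sqrt(diff.var(ddof=1) / len(diff))
    print(round(t_stat, 3))               # about -2.806

    print(stats.ttest_rel(new, old))      # same statistic, with a two-sided p-value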


12.7. C. From the previous solution, t = -2.806. We now do a one-sided rather than two-sided t-test. Since 2.365 < 2.806 < 2.998, we reject H0 at 5%/2 = 2.5% and do not reject H0 at 2%/2 = 1%.Comment: At a 2.5% significance level, we have shown that the new program reduces expected losses. In other words, there is less than a 2.5% chance we would have observed the improvements we did see if the new program did not reduce expected losses. While the values in the t-table consider the area in both tails, for this test we only look at the area under the t-distribution to the left of -2.806.

[Graph of the t-distribution density with 7 degrees of freedom, plotted from -4 to 4, with vertical axis up to about 0.3.]

Using a computer, the area to the left of -2.806 is 1.31%, for a p-value of 1.31% for the one-sided test.

12.8. D. From a previous solution, the mean of the differences is: - 6.125.The sample variance of the differences is 38.125. The cost is 1.5.t = (-6.125 + 1.5)/√(38.125/8) = -2.119, with 7 degrees of freedom.

ν 0.1 0.05 0.02 0.01 7 1.895 2.365 2.998 3.499 Doing a one-sided test, since 1.895 < 2.119 < 2.365, we reject H0 at 10%/2 = 5% and do not reject H0 at 5%/2 = 2.5%.

12.9. A. t = ( X - µ)/(S/√n) = (60 - 55)/√(33/15) = 3.371.For 14 d.f., for a 2-sided test, the 1% critical value is 2.977. 3.371 > 2.977. ⇒ reject H0 at 1%.

Comment: A 99% confidence interval for µ is: 60 ± (2.977)√(33/15) = 60 ± 4.416.Since 55 is outside this confidence interval, we can reject H0 at 1%.Using a computer, the p-value of the two-sided test is .46%.

12.10. E. Differences are: 5, -1, 6, 0, 10. Mean of differences is: 4.
Sample variance of differences is: {(5 - 4)² + (-1 - 4)² + (6 - 4)² + (0 - 4)² + (10 - 4)²}/4 = 20.5.
t = 4/√(20.5/5) = 1.975. Perform a one-sided t-test at 4 degrees of freedom.
1.975 < 2.132. Do not reject at 5%.


12.11. E. X = Σxi/n = 132/11 = 12. S2 = Σ(xi - x)2/(n-1) = 99/10 = 9.9.For the t-distribution with 11 - 1 = 10 degrees of freedom, for 10% area in both tails, the critical value is 1.812. 90% confidence interval: X ± t√(S2/n) = 12 ± (1.812)√(9.9/11) = 12 ± (1.812)√.90. Thus k = 1.812.

12.12. D. t = ( X - µ)/√(S2/n) = (15.84 - 10)/√(16/4) = 2.92.For 3 d.f. and 5% area in both tails, the critical value is 3.182.Since 2.92 < 3.182, do not reject at 5%.

12.13. E. t = ( X - µ)/√(S2/n) = 3( X - µ)/S has a t-distribution with 8 degrees of freedom.( X- µ) < .62S. ⇔ ( X- µ)/S < .62. ⇔ t < 1.86. Prob[t < 1.86] = .95.Comment: For 8 d.f. at 1.86 there is 10% area in both tails, or 5% area in the righthand tail.

12.14. A. With a sample size of 10, we have a t-distribution with 10 - 1 = 9 d.f.The critical value for 5% area in both tails is 2.262.The variance of the mean is s2/10.The standard deviation of the mean is s/√10.A 95% confidence interval for µ: x ± 2.26 s/√10.

12.15. A. The point estimate of the mean is: (0 + 3 + 5 + 5 + 5 + 6 + 8 + 8 + 9 + 11)/10 = 6. S2 = (62 + 32 + 12 + 12 + 12 + 02 + 22 + 22 + 32 + 52)/9 = 10. Thus the sample standard deviation S = 3.162. We have 10 - 1 = 9 degrees of freedom, and for a 95% confidence interval we want t = 2.262. An approximate 95% confidence interval for the mean is: 6 ± t S/√n = 6 ± (2.262)(3.162)/√10 = (3.738, 8.262).

12.16. E. t = ( X - m)/(S/√n) = (52.53 - 50)/(3.3/√9) = 2.3.The critical value at 2.5% for a one-sided test t-test with 8 d.f. is 2.306.Comment: Since 2.3 < 2.306, we do not reject at 2.5%.

12.17. C. X̄ = 10. S2 = {(12 - 10)² + (8 - 10)² + (10 - 10)²}/(3 - 1) = 4.
Standard Deviation of the mean is: √(4/3) = 1.155.
For 2 d.f., from the t-table, S(t) = 95% for t = -2.920 standard deviations.
k = 10 - (2.920)(1.155) = 6.63.
Comment: 2.920 leaves 10% outside on both tails, and therefore 5% in the lefthand tail.


12.18. E. Xi - Yi is Normal with mean µX - µY and variance: σ2 = σX2 + σY2.
If H0 is true, Xi - Yi is Normal with mean 0 and variance σ2.
Σ(Xi - Yi) is Normal with mean 0 and variance 3σ2.
Σ(Xi - Yi)/(σ√3) is a Unit Normal.
Σ{(Xi - Yi) - (X̄ - Ȳ)}²/σ2 has a Chi-Square Distribution with 3 - 1 = 2 degrees of freedom.
Thus {Σ(Xi - Yi)/(σ√3)}/√[Σ{(Xi - Yi) - (X̄ - Ȳ)}²/σ2/2] = √1.5 Σ(Xi - Yi)/√[Σ{(Xi - Yi) - (X̄ - Ȳ)}²] has a t-statistic with 2 degrees of freedom.
For 2 degrees of freedom, for a two-sided test, the 5% critical value is 4.303.
Since 4.10 ≤ 4.303, we do not reject at the 5% significance level.
Comment: When doing such a paired test of means, the statistic has a t-distribution with n - 1 degrees of freedom, where n is the number of pairs.

12.19. D. There are 10 points, so we have 10 - 1 = 9 degrees of freedom when using the Student’s t-distribution to get an interval estimate of the mean. For a 90% confidence interval t = 1.833. The point estimate of the mean is: X̄ = (2 + 0 + 4 + 4 + 6 + 3 + 1 + 5 + 6 + 9)/10 = 4. The sample variance is: S2 = {(2-4)² + (0-4)² + (4-4)² + (4-4)² + (6-4)² + (3-4)² + (1-4)² + (5-4)² + (6-4)² + (9-4)²}/9 = 7.111. The 90% confidence interval for the mean is: X̄ ± tS/√n = 4 ± (1.833)(√7.111)/√10 = 4 ± 1.55 = (2.45, 5.55).

12.20. A. The estimated variance of the mean is: 135/15 = 9.Looking at the t-table, with 15 - 1 = 14 degrees of freedom, for a 90% confidence interval we want ±1.761 standard deviations. 47.21 ± (1.761)(3) = (41.93, 52.49).Comment: Since this is a variance estimated from the data, we use the t-distribution rather than the Normal Distribution.

12.21. C. Zi is Normal with mean 0 and variance σX2 + σY2.
Z̄ is Normal with mean 0 and variance (σX2 + σY2)/9.
3Z̄/√(σX2 + σY2) has a unit Normal Distribution.
8SZ2/(σX2 + σY2) is Chi-Square with 9 - 1 = 8 degrees of freedom.
⇒ {3Z̄/√(σX2 + σY2)}/√{SZ2/(σX2 + σY2)} = 3Z̄/SZ has a t-distribution with 8 degrees of freedom.
P[3Z̄/SZ ≤ 1.860] = .95. ⇒ c = 1.860/3 = .620.
Comment: From the t-table, for 8 degrees of freedom, 10% probability on both tails has a critical value of 1.860.

12.22. E. X - Y is Normal with mean µx - µy (and variance σx2 + σy2 - 2ρσxσy). Let Z = X - Y.
√8(X̄ - Ȳ)/√[(1/7)Σ{(Xi - Yi) - (X̄ - Ȳ)}²] = √n Z̄/√{(1/(n-1))Σ(Zi - Z̄)²} = Z̄/(SZ/√n).
This is the form of a t-distribution with n - 1 = 8 - 1 = 7 degrees of freedom.
Prob[|t| > 2.365] = 5%.
Comment: The absolute value results in a two-sided t-test.
For 7 degrees of freedom, for a two-sided 5% significance level, the critical value is 2.365.


12.23. A. If X follows a LogNormal Distribution with parameters µ and σ, then ln(X) follows a Normal Distribution with parameters µ and σ. Thus we have that ln(2) = .693, ln(8) = 2.079, ln(13) = 2.565 and ln(27) = 3.296 are 4 random draws from a Normal Distribution. The point estimate of the mean is: (.693 + 2.079 + 2.565 + 3.296)/4 = 2.158. S2 = {(.693 - 2.158)² + (2.079 - 2.158)² + (2.565 - 2.158)² + (3.296 - 2.158)²}/3 = 1.204. Thus the sample standard deviation S = 1.097. We have 4 - 1 = 3 degrees of freedom, and for a 90% confidence interval we have t = 2.353. Therefore, an approximate 90% confidence interval for the mean is: 2.158 ± tS/√n = 2.158 ± (2.353)(1.097)/√4 = 2.158 ± 1.291 = (.867, 3.449).

12.24. X is Normal with mean 0 and variance σ2/9.Therefore, 3 X/σ is a Standard Normal.8S2/σ2 is Chi-Square with 8 degrees of freedom.Therefore, (3 X/σ)/√S2/σ2 = 3X/S has a t-distribution with 8 degrees of freedom.Prob( X > S) = Prob(3 X/S > 3) = Prob[t with 8 d.f. > 3], which is between 1/2% and 1%.Comment: ( X - µ0)/(S/√n) follows a t-distribution with n-1 d.f.Using the t-table, for 8 degrees of freedom, Prob[t > 2.896] = 1% and Prob[t > 3.355] = 1/2%.Using a computer, the exact probability that t > 3 is 0.85%.

Solutions to problems in the remaining sections appear in Study Guides M, etc.


Mahler’s Guide to

Regression

Solutions to ProblemsSections 13-21

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCASCopyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-M

New England Actuarial Seminars Howard MahlerPOB 315 [email protected], MA, 02067

www.neas-seminars.com


Solutions to Problems, Sections 13-21

13.1. C. For 25 - 2 = 23 degrees of freedom, for 1 - 90% = 10% area in both tails, the critical value for the t-distribution is 1.714. The 90% confidence interval for the slope is:
β̂ ± t·sβ = 1.73 ± (1.714)(.24) = 1.73 ± .41 = [1.32, 2.14].

13.2. E. For 22 - 2 = 10 degrees of freedom, for 1 - 98% = 2% area in both tails, the critical value for the t-distribution is 2.764. The 98% confidence interval is:

α + t sα = α ± (2.764)(126) = α ± 348, of width (2)(348) = 696.

13.3. B. & 13.4. E. s2 = Σε̂i2/(N - 2) = 276/(10 - 2) = 34.5.
Var[α̂] = s2(ΣXi2/N)/Σxi2 = (34.5)(385/10)/82.5 = 16.1. sα = √16.1 = 4.012.
Var[β̂] = s2/Σxi2 = 34.5/82.5 = .418. sβ = √.418 = .647.
For N - 2 = 8 degrees of freedom, for a 1% area in both tails of the t-distribution, t = 3.355.
Therefore, for a 99% confidence interval for α, we want:
α̂ ± 3.355sα = 6.93 ± (3.355)(4.012) = 6.93 ± 13.46 = (-6.5, 20.4).
Therefore, for a 99% confidence interval for β, we want:
β̂ ± 3.355sβ = 2.79 ± (3.355)(.647) = 2.79 ± 2.17 = (.62, 4.96).
Comment: Similar to 4, 11/00, Q.5.

13.5. E. & 13.6. A. ESS = Σε̂i2 = Σ(Yi - Ŷi)² = 272.
s2 = ESS/(N - 2) = 272/(12 - 2) = 27.2.
α̂ = Ȳ - β̂X̄ ⇒ 9.88 = 390/12 - 2.36X̄ ⇒ X̄ = 9.585.
Var[X] = Σ(Xi - X̄)²/N = 1283/12.
However, Var[X] = ΣXi2/N - X̄² ⇒ ΣXi2/N = 1283/12 + 9.585² = 198.8.
Var[α̂] = s2(ΣXi2/N)/Σxi2 = (27.2)(198.8)/1283 = 4.215. sα = √4.215 = 2.05.
Var[β̂] = s2/Σxi2 = 27.2/1283 = .0212. sβ = √.0212 = .146.
For N - 2 = 10 degrees of freedom, for a 10% area in both tails of the t-distribution, t = 1.812.
Therefore, for a 90% confidence interval for α, we want:
α̂ ± 1.812sα = 9.88 ± (1.812)(2.05) = 9.88 ± 3.71 = (6.2, 13.6).
Therefore, for a 90% confidence interval for β, we want:
β̂ ± 1.812sβ = 2.36 ± (1.812)(.146) = 2.36 ± .265 = (2.1, 2.6).
Comment: Similar to 4, 11/01, Q.5.


13.7. β̂ = Σxiyi/Σxi2 = -20008/180021 = -0.111. α̂ = Ȳ - β̂X̄ = 31.7 - (-0.111)(155.8) = 49.0.

X      X^2       Y     x        y       x^2       xY        y^2
10     100       60    -145.8   28.3    21,267    -8,750    802.8
25     625       40    -130.8   8.3     17,117    -5,233    69.4
50     2,500     50    -105.8   18.3    11,201    -5,292    336.1
100    10,000    30    -55.8    -1.7    3,117     -1,675    2.8
250    62,500    10    94.2     -21.7   8,867     942       469.4
500    250,000   0     344.2    -31.7   118,451   0         1002.8

Sum    935   325,725   190   -0.0   -0.0   180,021   -20,008   2683.3
Avg.   155.8           31.7

TSS = Σyi2 = 2683. RSS = β̂Σxiyi = (-.111)(-20008) = 2221.
ESS = TSS - RSS = 2683 - 2221 = 462. s2 = ESS/(N - 2) = 462/(6 - 2) = 115.
Var[α̂] = s2(ΣXi2/N)/Σxi2 = (115)(325725/6)/180021 = 34.68. sα = √34.68 = 5.89.
Var[β̂] = s2/Σxi2 = 115/180021 = .000639. sβ = √.000639 = .0253.
For N - 2 = 4 degrees of freedom, for a 2% area in both tails of the t-distribution, t = 3.747.
Therefore, for a 98% confidence interval for α, we want:
α̂ ± 3.747sα = 49.0 ± (3.747)(5.89) = 49.0 ± 22.1 = (27, 71).
Therefore, for a 98% confidence interval for β, we want:
β̂ ± 3.747sβ = -0.111 ± (3.747)(.0253) = -0.111 ± .0948 = (-.206, -.016).

13.8. D. & 13.9. C. TSS = Σyi2 = 6748. ESS = TSS - RSS = 6748 - 1279 = 5469.
s2 = ESS/(N - 2) = 5469/(14 - 2) = 455.75.
α̂ = Ȳ - β̂X̄ = 25.86 - (.643)(13.86) = 16.95. Var[X] = Σ(Xi - X̄)²/N = 3096/14.
However, Var[X] = ΣXi2/N - X̄² ⇒ ΣXi2/N = 3096/14 + 13.86² = 413.2.
Var[α̂] = s2(ΣXi2/N)/Σxi2 = (455.75)(413.2)/3096 = 60.83. sα = √60.83 = 7.80.
Var[β̂] = s2/Σxi2 = 455.75/3096 = .1472. sβ = √.1472 = .384.
For N - 2 = 12 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.179.
Therefore, for a 95% confidence interval for α, we want:
α̂ ± 2.179sα = 16.95 ± (2.179)(7.80) = 16.95 ± 17.00 = (0, 34).
Therefore, for a 95% confidence interval for β, we want:
β̂ ± 2.179sβ = .643 ± (2.179)(.384) = .643 ± .837 = (-0.2, 1.5).
Comment: Similar to 4, 11/02, Q.38.


13.10. E. s2 = Σ(Yi - Ŷi)²/(N - k) = 2.79/8 = .349.
Variance = 2nd moment - mean². ⇒ Σxi2/N = ΣXi2/N - X̄². ⇒ 180/10 = ΣXi2/10 - 6². ⇒ ΣXi2 = 180 + 360 = 540.
Var[α̂] = s2(ΣXi2/N)/Σxi2 = (.349)(540/10)/180 = .1047. sα = √.1047 = .324.
For 8 degrees of freedom and 95%, the t-statistic is 2.306.
Therefore we want: α̂ ± (2.306)(.324) = α̂ ± .747.
Thus the width of the interval is: (2)(.747) = 1.49.

13.11. C. For 12 - 2 = 10 degrees of freedom and 95% confidence, the t statistic is 2.228.
The confidence interval for β is 2.5 ± 1.3. Therefore, 1.3 = t·sβ. ⇒ sβ = 1.3/2.228 = .583.
Now sβ2 = s2/Σxi2. ⇒ s2 = sβ2 Σxi2 = (.583²)(.826) = .281.

13.12. (i) β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi2 - (ΣXi)²} =
{(13)(286.6299) - (34.023)(110.679)}/{(13)(91.3978) - 34.023²} = -39.443/30.607 = -1.289.
α̂ = Ȳ - β̂X̄ = (110.679/13) - (-1.289)(34.023/13) = 11.887.
(ii) (a) ESS = (-.005)² + (-.049)² + (-.019)² + (-.043)² + (.032)² + (.024)² + (.070)² + (.068)² + (.037)² + (-.095)² + (-.129)² = .049019.
s2 = ESS/(n - 2) = .049019/11 = 0.004456.
(b) Var[X] = 91.3978/13 - (34.023/13)² = 0.1811.
Var[β̂] = s2/(n Var[X]) = 0.004456/{(13)(0.1811)} = .00189.
For a t-distribution with 11 d.f., for 5% area in both tails, the critical value is 2.201. A 95% confidence interval for β is: -1.289 ± 2.201√.00189 = -1.289 ± .096 = -1.385 to -1.193.
(c) If the relationship P = k/L ⇔ lnP = lnk - lnL ⇔ Y = lnk - X is correct, then the slope parameter of the regression line should be -1. Since -1 is not in the 95% confidence interval for β, the data do not support the suggested relationship; we can reject this hypothesis at the 5% significance level.


(iii) The residuals show that the line underestimates in the middle. A straight line doesn’t fit the data very well.

[Plot of the residuals against ln L, with ln L from 2 to 3.2 on the horizontal axis and residuals from about -0.13 to 0.07 on the vertical axis.]

Comment: One could back out α and ^β from the given fitted values.

Check: 11.887 + (-1.289)(1.946) = 9.379. 11.887 + (-1.289)(2.197) = 9.055.Part (ii) (c) is equivalent to a t-test of the hypothesis β = -1.

13.13. D. s2 = Σε̂i2/(N - 2) = 7832/(20 - 2) = 435.11.
Var[β̂] = s2/Σxi2 = 435.11/10,668 = .04079. sβ = √.04079 = .202.
For N - 2 = 18 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.101.
Therefore, for a 95% confidence interval, we want β̂ ± 2.101sβ = -1.104 ± (2.101)(.202) = -1.104 ± .424 = (-1.528, -.680).
Comment: We are given that TSS = Σyi2 = Σ(Yi - Ȳ)² = 20,838 and ESS = Σε̂i2 = 7,832.
Therefore, RSS = TSS - ESS = 20838 - 7832 = 13006.
However, we also have that RSS = β̂Σxiyi = β̂(β̂Σxi2) = β̂²Σxi2 = (-1.104)²(10668) = 13002, matching subject to rounding.

13.14. D. R2 = 1 - ESS/TSS = 1 - 7832/20838 = 1 - .376 = .624.
1 - R̄2 = (1 - R2)(N - 1)/(N - k) = (.376)(20 - 1)/(20 - 2) = .397. R̄2 = .603.


13.15. B. ESS ≡ Σε̂i2 = Σ(Yi - Ŷi)² = 2394.
s2 = ESS/(N - k) = 2394/(8 - 2) = 399.
Σxi2 = Σ(Xi - X̄)² = 1.62.
sβ2 = s2/Σxi2 = 399/1.62 = 246.3. sβ = √246.3 = 15.69.
For 8 - 2 = 6 degrees of freedom, for a 10% two-tailed test using the t-distribution, or a 90% confidence interval, we want ±1.943 standard deviations.
β̂ ± t.95 sβ = -35.69 ± (1.943)(15.69) = (-66.2, -5.2).

13.16. (i) A scatter plot of wx against x:

[Scatter plot of wx against x, with x from 70 to 77 on the horizontal axis and wx from about -2.6 to -1.6 on the vertical axis.]

The relationship of x and y is not linear, while the relationship of x and w appears to be approximately linear.

(ii) β̂ = {NΣXiWi - ΣXiΣWi}/{NΣXi2 - (ΣXi)²} =
{(8)(-1246.7879) - (588)(-17.0568)}/{(8)(43260) - 588²} = 0.16397.
α̂ = W̄ - β̂X̄ = (-17.0568/8) - (0.16397)(588/8) = -14.184.
Fitted line is: wx = -14.184 + 0.16397x.
Alternately, SXX = ΣXi2 - (ΣXi)²/N = 43260 - 588²/8 = 42.
SXW = ΣXiWi - ΣXiΣWi/N = -1246.7879 - (588)(-17.0568)/8 = 6.8869.
β̂ = SXW/SXX = 6.8869/42 = 0.16397. Proceed as before.


(b) A scatter plot of wx against x, showing the regression line:

[Scatter plot of wx against x, with x from 70 to 77 and wx from about -2.6 to -1.6, showing the fitted regression line.]

(c) RSS = SXW2/SXX = 6.88692/42 = 1.129271.

TSS = SWW = ΣWi2 - (ΣWi) 2/N = 37.5173 - 17.05682/8 = 1.150497. ESS = TSS - RSS = 1.150497 - 1.129271 = 0.021226.s2 = ESS/(N-2) = 0.021226/6 = 0.003538.

Var[^β] = s2 /SXX = 0.003538/42 = .000084229.

For the t-distribution with 6 degrees of freedom, in order to have a total of 5% area in both tails,the critical value is 2.447.Thus a 95% confidence interval for β is: 0.1640 ± 2.447√.000084229 = 0.1640 ± .0225, or (0.141, 0.187).(d) The fitted value of w for age 71 is: -14.184 + (0.16397)(71) = -2.542.The fitted value of y for age 71 is: exp[-2.542] = 0.0787.The fitted value of n for age 71 is: (0.0787)(471) = 37.1.The fitted value of w for age 76 is: -14.184 + (0.16397)(76) = -1.722.The fitted value of y for age 76 is: exp[-1.722] = 0.1787.The fitted value of n for age 76 is: (0.1787)(468) = 83.6.(iii) E[Nx] = Exbcx. ⇒ E[Yx] = E[Nx/Ex] = bcx. ⇒ ln(E[Yx]) = ln b + x ln c.

The above regression fitted the model: wx = lnyx = α + βx.

This model was based on an assumption that: E[lnYx] = α + βx.Thus the two models are similar.However, E[lnYx] ≠ ln(E[Yx]), so the two models are not the same.

Comment: If Nx is a random variable with mean Exbcx, then mortality approximately follows

Gompertz’s Law, which has a force of mortality of bcx. See Actuarial Mathematics, Section 3.7.


13.17. (a) A scatter plot of y against x:

[Scatter plot of y against x, with x from about 25 to 65 on the horizontal axis and y from about -2 to 1 on the vertical axis.]

While the relationship of x and y appears to be approximately linear for the middle groups, it seems to deviate from linear for the lowest and highest age groups.(b) SXX = ΣXi2 - (ΣXi)2/N = 17437.5 - 3602/8 = 1237.5.SXY = ΣXiYi - ΣXiΣYi /N = -9.0429 - (360)(-2.9392)/8 = 123.22

^β = SXY/SXX = 123.22/1237.5 = 0.09957.

α = Y - ^βX = (-2.9392/8) - (0.09957)(360/8) = -4.848.

Fitted line is: y = -4.848 + 0.09957x.(c) RSS = SXY2/SXX = 123.222/1237.5 = 12.2692.

TSS = SYY = ΣYi2 - (ΣYi) 2/N = 13.615 - 2.93922/8 = 12.5351. ESS = TSS - RSS = 12.5351 - 12.2692 = 0.2659.s2 = ESS/(N-2) = 0.2659/6 = 0.04432.

Var[^β] = s2/SXX = 0.04432/1237.5 = .00003581.

For the t-distribution with 6 degrees of freedom, in order to have a total of 1% area in both tails,the critical value is 3.707.Thus a 99% confidence interval for β is: 0.09957 ± 3.707√.00003581 = 0.09957 ± .0222, or (0.077, 0.122).(d) If the probability of having coronary heat disease for the different age groups were the same, then the slope would be zero. However, zero is not in the 99% confidence interval for β.Therefore, at a significance level of 1%, we can reject the hypothesis that the probability of having coronary heat disease for the different age groups is the same.Comment: ln[p/(1-p)] is called the logit function, and is used in Generalized Linear Models.


13.18. D. s2 = Σε̂i2/(N - 2) = ESS/(N - 2) = 5348/(20 - 2) = 297.1.
Σxi2 = Σ(Xi - X̄)² = ΣXi2 - NX̄². ΣXi2 = Σxi2 + NX̄² = 2266 + (20)(100²) = 202,266.
Var[α̂] = s2ΣXi2/(NΣxi2) = (297.1)(202266)/((20)(2266)) = 1326. sα = √1326 = 36.4.
For N - 2 = 18 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.101.
Therefore, for a 95% confidence interval, we want: α̂ ± 2.101sα = 68.73 ± (2.101)(36.4) = 68.73 ± 76.5 = (-7.8, 145.2).

13.19. A. Var[β̂] = s2/Σxi2 = 297.1/2266 = .1311. sβ = √.1311 = .362.
For N - 2 = 18 degrees of freedom, for a 5% area in both tails of the t-distribution, t = 2.101.
Therefore, for a 95% confidence interval we want: ±2.101sβ.
The width of the confidence interval is: (2)(2.101)(.362) = 1.52.

14.1. The critical values for 5% and 1% are 8.88 and 27.67 respectively. Since 30.03 > 27.67, the p-value is less than 1%.Comment: Using a computer, the p-value is: 0.89%.

14.2. The critical values for 5% and 1% are 2.60 and 3.94 respectively. Since 2.60 < 2.72 < 3.94, we reject at 5% and do not reject at 1%.

14.3. Finding the appropriate column of the F-Table where the critical value for 5% is 3, ν2 = 12.

14.4. In the F-Table for ν1 = 1 and ν2 = 10, the 1% critical value is 10.04.

Therefore at the 1% significance level with 10 degrees of freedom the t-statistic has a critical value of √10.04 = 3.169. Comment: This is indeed the critical value shown in the t-table.

14.5. As shown in the F-Table, 15.21.

14.6. As shown in the F-Table, for 3 and 8 degrees of freedom, the 95th percentile is 4.07.
Fν1,ν2(x) = 1 - Fν2,ν1(1/x). Therefore 95% = F3,8(4.07) = 1 - F8,3(1/4.07). ⇒ F8,3(.246) = 5%.
The 5th percentile of the F-Distribution with 8 and 3 d.f. is 0.246.
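The percentile relation used in solution 14.6 is easy to confirm with scipy's F distribution; the following Python sketch (my own, not part of the solution) checks both percentiles:

    # Sketch: F(3,8) 95th percentile and F(8,3) 5th percentile are reciprocals.
    from scipy.stats import f

    p95 = f.ppf(0.95, dfn=3, dfd=8)      # about 4.07
    p05 = f.ppf(0.05, dfn=8, dfd=3)      # about 0.246
    print(round(p95, 2), round(p05, 3), round(1 / p95, 3))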

14.7. Z1² + Z2² is Chi-Square with 2 degrees of freedom.
Z3² + Z4² + Z5² is Chi-Square with 3 degrees of freedom.
Therefore, {(Z1² + Z2²)/2}/{(Z3² + Z4² + Z5²)/3} has an F-Distribution with 2 and 3 d.f.
Prob[Z1² + Z2² ≥ 20.55(Z3² + Z4² + Z5²)] = Prob[(Z1² + Z2²)/(Z3² + Z4² + Z5²) ≥ 20.55] =
Prob[{(Z1² + Z2²)/2}/{(Z3² + Z4² + Z5²)/3} ≥ 30.82] = Prob[F2,3 ≥ 30.82] = 1%.

14.8. As shown in the F-Table, 3.01.


14.9. F = second sample variance / first sample variance = 54.33/4.92 = 11.04.

Sample   Observations   Mean   Sample Variance
A        5, 7, 2, 3     4.25   4.92
B        4, 1, 15       6.67   54.33

F has an F-Distribution with 2 and 3 degrees of freedom.
The 5% critical value is 9.55. The 1% critical value is 30.82.
Since 9.55 < 11.04 < 30.82, we reject H0 at 5% and do not reject at 1%.
Comment: The p-value is 4.1%.
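The variance-ratio test and its p-value can be checked with scipy; here is a Python sketch (my own, using the two samples shown in the table above):

    # Sketch: F-test of equal variances for samples A and B in solution 14.9.
    import numpy as np
    from scipy.stats import f

    A = np.array([5, 7, 2, 3], dtype=float)
    B = np.array([4, 1, 15], dtype=float)

    F = B.var(ddof=1) / A.var(ddof=1)             # about 11.05
    p_value = f.sf(F, dfn=len(B) - 1, dfd=len(A) - 1)
    print(round(F, 2), round(p_value, 3))         # about 11.05 and 0.041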

14.10. D. Σ(Xi - X̄)² is (n-1) times the sample variance of X, and thus is σX2 times a Chi-Square distribution with n - 1 = 11 degrees of freedom. Σ(Yi - Ȳ)² is (m-1) times the sample variance of Y, and thus is σY2 times a Chi-Square distribution with m - 1 = 14 degrees of freedom.
If H0 is true, {Σ(Xi - X̄)²/σX2/11}/{Σ(Yi - Ȳ)²/σY2/14} = 14W/11, has an F-Distribution with 11 and 14 degrees of freedom. The critical value for 1% is 3.86.
Prob[14W/11 > 3.86] = 1%. ⇒ Prob[W > 3.03] = 1%.

14.11. E. Mean of X is: 255/25 = 10.2. Second Moment of X is: 2867/25 = 114.68.Variance of X is: 114.68 - 10.22 = 10.64. Sample Variance of X is: (25/24)(10.64) = 11.08.Mean of Y is: 212/20 = 10.6. Second Moment of Y is: 2368/20 = 118.4.Variance of Y is: 118.4 - 10.62 = 6.04. Sample Variance of Y is: (20/19)(6.04) = 6.36.F = (Sample Variance of X)/(Sample Variance of Y) = 11.08/6.36 = 1.74.For 24 and 19 degrees of freedom, the critical value for 5% is 2.11.Since 1.74 < 2.11, we do not reject at 5%.Comment: The “regular” variance has n in the denominator, while the sample variance has n - 1 in the denominator. Thus one way to calculate the sample variance is as:n/(n-1)(Second Moment - Square of the Mean).

14.12. B. To test the hypothesis σW2 = σX2 versus σW2 < σX2, F = 1787/598 = 2.99.For 12 and 12 d.f. the 5% critical value is 2.69 and the 1% critical value is 4.16.2.69 < 2.99 < 4.16. ⇒ Reject σW2 = σX2 at 5% but not at 1%.

To test the hypothesis σX2 = σY2 versus σX2 < σY2, F = 3560/1787 = 1.99.

1.99 < 2.69. ⇒ Do not reject σX2 = σY2 at 5%.

14.13. E. F = 1083/722 = 1.5. Y is in the numerator, and has 200 - 1 = 199 degrees of freedom; ν1 = 199. ν2 = 100 - 1 = 99.
The survival function of the F-Distribution with 199 and 99 degrees of freedom at 1.5 is:
1 - β[199/2, 99/2; (199)(1.5)/(99 + (199)(1.5))] = 1 - β[99.5, 49.5; .751].
Comment: Using a computer, the p-value is 1.22%.
F(x) = β[ν1/2, ν2/2; ν1x/(ν2 + ν1x)] = 1 - β[ν2/2, ν1/2; ν2/(ν2 + ν1x)].
Thus the p-value can also be written as: β[49.5, 99.5; .249].


14.14. A. The sample variance of X divided by σX2 has a Chi-Square Distribution with 10 d.f.

The sample variance of Y divided by σY2 has a Chi-Square Distribution with 6 d.f.

Therefore, (SX2/σX2) / (SY2/σY2) = (σY2/σX2)(SX2/SY2) has an F-Distribution with 10 and 6 d.f.

If H0 is true, then (σY2/σX2)(SX2/SY2) = (1/b)(SX2/SY2) has an F-Distribution with 10 and 6 d.f.

(1/b)(SX2/SY2) = (1/b)(189/37) = 5.108/b. For 10 and 6 d.f., the 5% critical value is 4.06.Provided 5.108/b > 4.06 we reject H0. 1.26 > b. The largest b for which we reject is 1.25.

14.15. E. Taking logarithms, each sample has a Normal Distribution with parameters µ

and σ equal to those of the samples LogNormal Distribution.Sample one: 6.908, 7.313, 8.006, 10.127, 13.122. The mean is 9.095 and the sample variance is 6.607.Sample two: 6.215, 6.908, 7.601.The mean is 6.908 and the sample variance is .480.F = 6.607/.480 = 13.76.We perform a two-sided test, since the alternative is that the two variances are not equal.The (one-sided) 5% critical value for 4 and 2 degrees of freedom is 19.25.13.76 < 19.25 so we do not reject at 10%, for a 2-sided test.Comment: This test does not have a lot of power with such small sample sizes.

14.16. C. Σ(Xi - X)2 is (n-1) times the sample variance of X, and thus is σX2 times a Chi-Square distribution with n - 1 = 9 degrees of freedom. Y has a known mean of zero, therefore ΣYi2 is σY2 times a Chi-Square distribution with m = 6 degrees of freedom.

If H0 is true, (Σ(Xi - X)2/σX2/9) /(ΣYi2/σY2/6) = 2W/3, has an F-Distribution with 9 and 6 degrees of freedom. The critical value for 5% is 4.10.Prob[2W/3 > 4.10] = 5%. ⇒ Prob[W > 6.15] = 5%.

14.17. A. ΣXi2 has a Chi-Square Distribution with 6 degrees of freedom.

ΣYi2 has a Chi-Square Distribution with 8 degrees of freedom.

(ΣXi2/6)/(ΣYi2/8) = (4/3)ΣXi2/ΣYi2 = Z has an F-Distribution with 6 and 8 d. f.Finding the 1% critical value in the F-Table, the 99th percentile of Z is 6.37.

14.18. D. X̄ - Ȳ is Normal with mean 0 and variance: σ2/4 + σ2/4 = σ2/2.
Therefore, (X̄ - Ȳ)²/(σ2/2) is Chi-Square with 1 d.f.
Σ(Xi - X̄)²/σ2 is Chi-Square with 3 d.f.
Σ(Yi - Ȳ)²/σ2 is Chi-Square with 3 d.f.
Therefore, {Σ(Xi - X̄)² + Σ(Yi - Ȳ)²}/σ2 is Chi-Square with 6 d.f.
Therefore, {(X̄ - Ȳ)²/(σ2/2)}/1 divided by [{Σ(Xi - X̄)² + Σ(Yi - Ȳ)²}/σ2]/6 =
12(X̄ - Ȳ)²/{Σ(Xi - X̄)² + Σ(Yi - Ȳ)²} has an F-distribution with 1 and 6 degrees of freedom.
Comment: (n-1)S2/σ2 = Σ(Xi - X̄)²/σ2 is Chi-Square with n - 1 = 4 - 1 = 3 d.f.


14.19. B. (X - 2)2/σ2 is the square of a Unit Normal, or a Chi-Square with 1 d.f. (Y - 1)2/σ2 + (Z - 2)2/σ2 is a Chi-Square with two degrees of freedom.
⇒ {(X - 2)2/σ2}/1 divided by {(Y - 1)2/σ2 + (Z - 2)2/σ2}/2 = 2(X - 2)2/{(Y - 1)2 + (Z - 2)2} has an F-Distribution with 1 and 2 degrees of freedom, thus so would W if 4c = 2. ⇒ c = 1/2.

14.20. B. Xi/2 is a Unit Normal, so ΣXi2/4 (the sum running from i = 1 to 9) is Chi-Square with 9 d.f.
Yj/3 is a Unit Normal, so ΣYj2/9 (the sum running from j = 1 to 8) is Chi-Square with 8 d.f.
{(ΣXi2/4)/9} / {(ΣYj2/9)/8} = 2ΣXi2/ΣYj2 has an F-Distribution with 9 and 8 d.f.
P[2ΣXi2/ΣYj2 > 5.91] = 1%. ⇒ c = 5.91/2 = 2.96.

14.21. B. S12 divided by σ12 has a Chi-Square Distribution with 9 - 1 = 8 d.f.

S22 divided by σ22 has a Chi-Square Distribution with 6 - 1 = 5 d.f.

⇒ (S12/σ12) / (S22/σ22) = (σ22/σ12)W has an F-Distribution with 8 and 5 degrees of freedom. If H0 is true then 2W has an F-Distribution with 8 and 5 d.f. For a one sided F-Test of size 5%, consulting the table, the critical value is 4.82. 2W = 4.82. ⇒ W = 2.41. Comment: We reject when W > 2.41; when σ12 > σ22/2, we expect S12 / S22 to be big.

14.22. A. X - 30 is Normal with mean 0 and variance σ2/7.⇒ ( X - 30)/(σ/√7) is a Unit Normal. ⇒ 7( X - 30)2/σ2 is Chi-Square with one degree of freedom.Y - 30 is Normal with mean 0 and variance σ2/14.⇒ (Y - 30)/(σ/√14) is a Unit Normal. ⇒ 14(Y - 30)2/σ2 is Chi-Square with one degree of freedom.Therefore, (14(Y - 30)2/σ2/1)/(7(X - 30)2/σ2/1) = 2(Y - 30)2/( X - 30)2 has an F distribution with 1 and 1 degrees of freedom.

14.23. Consulting the F-Table for 12 and 6 degrees of freedom, S(4.00) = 5%.
Therefore, using the general fact that Fν1,ν2(x) = 1 - Fν2,ν1(1/x), for 6 and 12 degrees of freedom,
F(1/4.00) = F(.250) = 1 - 5% = 95%. Thus statement a is true.
Consulting the F-Table for 6 and 12 degrees of freedom, S(4.82) = 1%. Thus statement b is true.
Consulting the F-Table for 12 and 6 degrees of freedom, S(7.72) = 1%.
Therefore, using the general fact that Fν1,ν2(x) = 1 - Fν2,ν1(1/x), for 6 and 12 degrees of freedom,
F(1/7.72) = F(.130) = 1%. Thus statement c is true.

HCMSA-F06-Reg-M, Solutions to Regression §13-21, 7/12/06, Page 458

Page 472: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

14.24. (i) (a) Dotplot for Isotonic-Isometric

It seems plausible that each plot could be from a Normal variable.(b) sA2 = 0.3027. sB2 = 0.7734. H0: σA2 = σB2 versus H1: σA2 ≠ σB2.F = .7734/.3027 = 2.555, with 9 and 9 degrees of freedom.Perform a two-sided test. Since 2.555 < 3.18, do not reject H0 at 10%.

(c) sp2 = (9sA2 + 9sB2)/18 = .5381. The sample means are A = 2.76 and B = 3.37.
t = (3.37 - 2.76)/√{.5381(1/10 + 1/10)} = 1.859 with 18 degrees of freedom.
H0: µB = µA versus H1: µB > µA. Perform a one-sided test.
Since 1.734 < 1.859 < 2.101, reject H0 at 5% but not at 2.5%.
(ii) (a) The observed difference is 3.37 - 2.76 = 0.61.
For the t-distribution with 18 degrees of freedom, for 5% area in both tails the critical value is 2.101.
0.61 ± 2.101√{.5381(1/10 + 1/10)} = 0.61 ± 0.69 = (-0.08, 1.30).
(b) 18sp2/σ2 has a Chi-Square Distribution with 18 degrees of freedom.
Using the 2.5th and 97.5th percentiles, 95% = Prob[8.23 < 18sp2/σ2 < 31.53] = Prob[8.23 < 9.6858/σ2 < 31.53] = Prob[9.6858/31.53 < σ2 < 9.6858/8.23] = Prob[0.554 < σ < 1.085].
A 95% confidence interval for σ is (0.554, 1.085).

15.1. D. t -stat = ^β/ βs = 13/7 = 1.857 with 20 - 2 = 18 degrees of freedom.

For 18 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are: 10% 5% 2% 1%1.734 2.101 2.552 2.878Since 1.734 < 1.857 < 2.101, we reject H0 at 10% and do not reject H0 at 5%.

15.2. E. ^β = Σxiyi/Σxi2 = Σ(Xi - X)(Yi -Y) /Σ(Xi - X)2 = 1942.1/24.88 = 78.06.


15.3. C. ESS ≡ Σ^εi2 = Σ(Yi - ^Yi)2 = 282,750. s2 = ESS/(N - k) = 282,750/(15 - 2) = 21,750.
Σxi2 = Σ(Xi - X)2 = 24.88. βs 2 = s2/Σxi2 = 21750/24.88 = 874.2. βs = √874.2 = 29.57.
t = ^β/ βs = 78.06/29.57 = 2.640.
For 15 - 2 = 13 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are:
10%     5%      2%      1%
1.771   2.160   2.650   3.012
Since 2.160 < 2.640 < 2.650, we reject H0 at 5% and do not reject H0 at 2%.
Comment: If one applies the F-Test, F = (RSS/(k-1))/(ESS/(N-k)) = ((434348 - 282750)/1)/(282750/13) = 6.970 = 2.640² = t2.
For 1 and 13 degrees of freedom, since 4.67 < 6.970 < 9.07, we reject at 5% and do not reject at 1%. Since the F-Table has fewer significance levels, we can not use it to determine whether to reject or not at 2%. (The critical value at 2% is 2.650² = 7.02, which is not shown in the F-Table. 6.970 < 7.02, so we would not reject at 2%.)
Note that RSS = TSS - ESS = 434,348 - 282,750 = 151,598, which is also RSS = (Σxiyi)2/Σxi2 = 1942.1²/24.88 = 151,598.

15.4. A. For 15 - 2 = 13 degrees of freedom, for 5% two-tailed test using the t-distribution, or a 95% confidence interval, we want ±2.160 standard deviations.^β ± t.975 βs = 78.06 ± (2.160)(29.57) = (14.2, 141.9).

Comment: Similar to 4, 11/01, Q.5.

15.5. B. ESS = TSS - RSS = 44 - 9 = 35. F = (RSS/1)/(ESS/(N - k)) = 9/(35/(27 - 2)) = 6.429.t -stat = √F = √6.429 = 2.536, with 27 - 2 = 25 degrees of freedom.For 25 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are: 10% 5% 2% 1%1.708 2.060 2.485 2.787Since 2.485 < 2.536 < 2.787, reject H0 at 2% and do not reject H0 at 1%.

15.6. B. X = (1 + 2 + 3 + 4 + 5)/5 = 3. xi = Xi - X.
Y = (82 + 78 + 80 + 73 + 77)/5 = 78. yi = Yi - Y.
X    Y     x     y    xy   x2
1   82    -2     4    -8    4
2   78    -1     0     0    1
3   80     0     2     0    0
4   73     1    -5    -5    1
5   77     2    -1    -2    4
^β = Σxiyi/Σxi2 = -15/10 = -1.5.
α = Y - ^βX = 78 - (-1.5)(3) = 82.5.
Forecast for year 7 is: α + 7^β = 82.5 + (-1.5)(7) = 72.
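Not part of the original solution: a minimal numpy sketch of the same two-variable fit in deviations form, using the five (X, Y) pairs listed above.

    import numpy as np

    X = np.array([1, 2, 3, 4, 5], dtype=float)
    Y = np.array([82, 78, 80, 73, 77], dtype=float)

    x = X - X.mean()
    y = Y - Y.mean()
    beta = (x * y).sum() / (x * x).sum()    # -1.5
    alpha = Y.mean() - beta * X.mean()      # 82.5
    print(alpha + beta * 7)                 # forecast for year 7: 72.0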


15.7. D. ΣXi2 = 1 + 4 + 9 + 16 + 25 = 55.
Xi    Yi    ^Yi     ^εi = Yi - ^Yi
1     82    81       1
2     78    79.5    -1.5
3     80    78       2
4     73    76.5    -3.5
5     77    75       2
s2 = Σ^εi2/(N - 2) = (1 + 2.25 + 4 + 12.25 + 4)/(5 - 2) = 23.5/3 = 7.833.
Var[α] = s2 ΣXi2/(N Σxi2) = (7.833)(55)/((5)(10)) = 8.616.

15.8. B. Var[^β] = s2 /Σxi2 = 7.833/10 = .7833. βs = √.7833 = .885.

t = ^β/ βs = -1.5/.885 = -1.7. |t| = 1.7.

15.9. E. There are 5 -2 = 3 degrees of freedom. The critical value of the two sided t-test at 10% is 2.353. 1.7 < 2.353, so the p-value is greater than 10%Comment: At a 10% significance level, we can not reject the hypothesis that β = 0.Using a computer, the p-value is 18.9%.

15.10. A. Cov[ α,^β] = -s2 X /Σxi2 = -(7.833)(3)/10 = -2.35.

15.11. A. Corr[α, ^β] = Cov[α, ^β]/√(Var[α]Var[^β]) = -2.35/√((8.616)(.7833)) = -.905.
Alternately, E[X2] = 55/5 = 11. Corr[α, ^β] = -X/√(E[X2]) = -3/√11 = -.905.

15.12. E. Let h(α, β) = ^Y7 = α + 7β. ∂h/∂α = 1 and ∂h/∂β = 7. The gradient vector is: (1, 7).
Therefore, the variance of the forecast is:
(transpose of gradient vector)(Variance-Covariance matrix)(gradient vector) =
(1 7)
( 8.616  -2.35 ) (1)
(-2.35   .7833) (7)
= (1 7)
(-7.834)
( 3.133)
= 14.10.
The standard deviation of the forecast is: √14.10 = 3.75.
Alternately, Var[^Y7] = Var[α + 7^β] = Var[α] + Var[7^β] + 2Cov[α, 7^β] = Var[α] + 49Var[^β] + 14Cov[α, ^β] = 8.616 + (49)(.7833) + (14)(-2.35) = 14.10.
√14.10 = 3.75.
Comment: See 4, 5/01, Q.25 and “Mahler’s Guide to Fitting Loss Distributions.”
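Not part of the original solution: a minimal numpy sketch of the gradient-times-covariance-matrix calculation, using the variance and covariance estimates from 15.7 - 15.10.

    import numpy as np

    cov = np.array([[8.616, -2.35],
                    [-2.35, 0.7833]])   # Var[alpha], Cov[alpha, beta]; Var[beta]
    grad = np.array([1.0, 7.0])         # gradient of alpha + 7*beta

    var_forecast = grad @ cov @ grad    # about 14.1
    print(var_forecast, var_forecast ** 0.5)   # standard deviation about 3.75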


15.13. C. ^Yt = α + t^β ≅ t^β, for very large t.
Var[^Yt] = Var[α + t^β] = Var[α] + Var[t^β] + 2Cov[α, t^β] = Var[α] + t2Var[^β] + 2tCov[α, ^β] ≅ t2Var[^β], for very large t.
Coefficient of variation of the forecast for year t ≅ √(t2Var[^β])/(t^β) = βs/^β = .885/(-1.5) = -.59.
Comment: In general, as t → ∞, the limit of the coefficient of variation of ^Yt is βs/^β, the inverse of the t-statistic used for testing the hypothesis β = 0. Even if one would use this regression to predict loss ratios 2 years beyond the most recent observation, an actuary would be unlikely to use it to extrapolate out 10, 100 or even 1000 years! The forecast for year 1000 is -1417.5, not a possible loss ratio! So while this has mathematical validity it has no actuarial meaning.

15.14. ^β is measuring ∆Y/∆X, so multiplying each of the X values by 12 divides ^β by 12.
When a variable is divided by a constant, so is its standard deviation. Therefore, βs is divided by 12. t = ^β/ βs is unaffected.
Comment: In general, t-tests and F-tests are unaffected by changes in scale.

15.15. A. For the 2-variable model, F = (N - 2)R2/(1 - R2) = (15 - 2)(.45)/(1 - .45) = 10.636.t -stat = √F = √10.636 = 3.261 with 15 - 2 = 13 degrees of freedom.For 13 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are: 10% 5% 2% 1%1.771 2.160 2.650 3.012 Since 3.012 < 3.261, reject H0 at 1%.Alternately, the critical value at 1% for the F-Stat with 1 and 13 degrees of freedom is 9.07.Since 9.07 < 10.636, reject H0 at 1%.

15.16. C. Since 100 is not in the 95% confidence interval we reject at 5%.Since 100 is in the 98% confidence interval we do not reject at 2%.

15.17. C. Since we have 300 observations, and 300 - 2 = 298 degrees of freedom, the critical values in the t-table are those for the Normal Distribution.Since H1: β < 0, we perform a one-sided test. t = -2.74/1.30 = -2.108. 1.960 < 2.108 < 2.326.Reject H0 at 2.5%, do not reject H0 at 1%.Alternately, using the Normal Table, for a one-sided test, p-value = Φ[-2.108] = 1 - .9824 = 1.76%. Reject H0 at 2.5%, do not reject H0 at 1%.

15.18. D. t = ^β / βs = -1.27/.57 = -2.23. F = t2 = 4.97.


15.19. ^β is measuring ∆Y/∆X, so multiplying each of the Y values by 1.3 multiplies ^β by 1.3.
When a variable is multiplied by a constant, so is its standard deviation. Therefore, βs is multiplied by 1.3. t = ^β/ βs is unaffected.
Comment: In general, t-tests and F-tests are unaffected by changes in scale.

15.20. E. 1 - (adjusted R2) = (1 - R2)(N-1)/(N-k). ⇒ 1 - .72 = (1 - R2)(14/13). ⇒ R2 = .74.
F = {RSS/(k-1)}/{ESS/(N-k)} = {TSS R2/(k-1)}/{TSS(1 - R2)/(N-k)} = {R2/(1 - R2)}{(N-k)/(k-1)} = (.74/.26)(13/1) = 37.
Comment: Similar to 4, 5/00, Q.1.
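Not part of the original solution: a minimal Python sketch recovering R2 from the adjusted R2 and then the F statistic, with N = 15 and k = 2 as in the problem.

    N, k, r2_adj = 15, 2, 0.72

    r2 = 1 - (1 - r2_adj) * (N - k) / (N - 1)       # 0.74
    F = (r2 / (1 - r2)) * (N - k) / (k - 1)         # 37
    print(r2, F)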

15.21. B. ^β = Σxiyi/Σxi2 = Σ(Xi - X)(Yi - Y)/Σ(Xi - X)2 = 302.1/42.65 = 7.083.
ESS ≡ Σ^εi2 = Σ(Yi - ^Yi)2 = 7502. s2 = ESS/(N - k) = 7502/(25 - 2) = 326.2.
βs 2 = s2/Σxi2 = 326.2/42.65 = 7.648. βs = √7.648 = 2.766.
t = ^β/ βs = 7.083/2.766 = 2.561.
For 25 - 2 = 23 degrees of freedom, for a two-tailed test using the t-distribution, the critical values are:
10%     5%      2%      1%
1.714   2.069   2.500   2.807
Since 2.500 < 2.561 < 2.807, we reject H0 at 2% and do not reject H0 at 1%.

15.22. D. ESS = TSS - RSS = 54 - 7 = 47. F = (RSS/1)/(ESS/(N - k)) = 7/(47/(47 - 2)) = 6.70.
t-stat = √F = √6.70 = 2.59.
Alternately, ^Y - Y = α + ^βX - Y = (Y - ^βX) + ^βX - Y = ^β(X - X) = ^βx.
RSS = Σ(^Y - Y)2 = Σ(^βx)2 = ^β2 Σx2. ⇒ 7 = (1.2²)Σx2 = 1.44Σx2. ⇒ Σx2 = 4.861.
s2 = ESS/(N - k) = (TSS - RSS)/(47 - 2) = 47/45 = 1.044.
Var[^β] = s2/Σx2 = 1.044/4.861 = .2148. βs = √.2148 = .463.
t-stat = ^β/ βs = 1.2/.463 = 2.59.

15.23. A. 2.088 = F = (RSS/1)/(ESS/(7 - 2)) = RSS/(218.680/5). ⇒ RSS = (2.088)(43.736) = 91.32. R2 = RSS/TSS = 91.32/(91.32 + 218.680) = .295.Alternately, 2.088 = F = (N - 2)R2/(1 - R2) = (7 - 2)R2/(1 - R2), ⇒ R2 = .295.

15.24. B. F = RSS/(k-1)/ESS/(N-k) = TSS R2/(k-1)/TSS(1 - R2)/(N-k) = R2/(1 - R2)(N-k)/(k-1) = (.64/.36)(18/1) = 32.Comment: This F-Statistic has 1 and 18 degrees of freedom. Applying the F-Test in this case would be the same as applying the t-test to the slope coefficient. The t-statistic would be √32 = 5.66, with 18 degrees of freedom.


15.25. (i) All of the observations other than (9, 330), appear to fall approximately on a straight line. The observation (9, 330) does not appear to lie on this straight line.

[Scatterplot: amount (50 to 300) on the vertical axis against duration (2 to 10) on the horizontal axis.]

(9, 330) is an “outlier” from the general linear pattern.(ii) Adding the fitted line, y = 22.4 + 25.4x, to the scatterplot:

[The same scatterplot with the fitted line y = 22.4 + 25.4x superimposed.]

(ii) (a) Excluding the observation (9, 330): Σ xi = 60 - 9 = 51, Σ xi2 = 402 - 81 = 321,

Σ yi = 1795 - 330 = 1465, Σ yi2 = 343,725 - 3302 = 234,825, Σ xiyi = 11,570 - (9)(330) = 8600.

^β = NΣXiYi - ΣXiΣYi / NΣXi2 - (ΣXi)2 =

(11)(8600) - (51)(1465)/(11)(321) - 512 = 21.38.

α = Y - ^βX = 1465/11 - (21.38)(51/11) = 34.06.

The fitted line is: y = 34.06 + 21.38 x.

(b) R2 = ^β2 Σ(Xi - X)2/Σ(Yi - Y )2 = 21.382ΣXi2 - (ΣXi)2/N/ΣYi2 - (ΣYi)2/N

= (457.1)(321 - 512/11)/(234825 - 14652/11) = 0.973.


(c) The line fit the 11 observations other than (9, 330) is shown as dashed:

[The same scatterplot with the line fitted excluding (9, 330) shown dashed.]

(d) R2 has increased from 87.8% to 97.3%. Removing the outlier, has resulted in a much better fit for the remaining data.The fitted slope changed from 25.4 to 21.38.(e) H0: β = 25 versus H1: β ≠ 25.

For the second regression, ESS = (1 - R2)TSS = (.027)(234825 - 14652/11) = 1072.s2 = 1072/(11 - 2) = 119.

Var[^β] = s2/Σ(Xi - X)2 = 119/(321 - 512/11) = 1.408.

t = (21.38 - 25)/√1.408 = -3.05.For a two-sided test, with 9 degrees of freedom, the 2% critical value is 2.821 and the 1% critical value is 3.250. 2.821 < 3.05 < 3.250. Reject H0 at 2% and not at 1%.


15.26. (i) The scatterplot indicates that a linear relationship, with a negative slope, seems appropriate.

[Scatterplot: concentration (1 to 3) on the vertical axis against post-mortem interval (10 to 60 hours) on the horizontal axis.]

There are two points with a much higher post-mortem interval than the other observations. Care should be taken, as these two points might have a large impact on the regression results.
(ii) SXX = Σx2 - (Σx)2/N = 9854.5 - 337²/18 = 3545.1111.
SYY = Σy2 - (Σy)2/N = 109.7936 - 42.98²/18 = 7.1669111.
SXY = Σxy - (Σx)(Σy)/N = 672.8 - (337)(42.98)/18 = -131.88111.
r = SXY/√(SXX SYY) = -131.88111/√((3545.1111)(7.1669111)) = -0.827.
Test H0: ρ = 0 versus H1: ρ ≠ 0.
t = r√{(N - 2)/(1 - r2)} = (-0.827)√{(18 - 2)/(1 - 0.827²)} = -5.89.
If H0 is true, t has a t-distribution with n - 2 = 16 degrees of freedom.
The critical value for 1% is 2.921. Since 5.89 > 2.921, reject H0 at 1%.

(iii) ^β = SXY/SXX = -131.88111/3545.1111 = -0.0372.

α = Y - ^βX = 42.98/18 - (-0.0372)(337/18) = 3.084.

For 1 day (x = 24 hours): 3.084 - (0.0372)(24) = 2.19.For 2 days (x = 48 hours): 3.084 - (0.0372)(48) = 1.30.Even though 48 hours is within the range of observed x-values, one should be cautious about the forecast for 48 hours, since there are only 2 observations with x more than 26 hours.(iv) ESS = TSS - RSS = SYY - SXY2/SXX = 7.1669111 - (-131.88111)2/3545.1111 = 2.2608.

s2 = ESS/(N - 2) = 2.2608/(18 - 2) = 0.14130.

Var[^β] = s2/SXX = 0.14130/3545.1111 = 0.00003986.

For 18 - 2 = 16 degrees of freedom, the 1% critical value for the t-distribution is 2.921.99% confidence interval for β: -0.0372 ± 2.921√0.00003986 = -0.0372 ± 0.0184 = (-0.0556, -0.0188).zero is not in this 99% confidence interval for β. ⇒ Reject H0: β = 0 at 1%.


This is consistent with the previous rejection at 1% of the hypothesis that the correlation is 0.
Comment: As shown in a solution to a problem in a previous section, for the linear regression model, Corr[X, Y] = β/√(β2 + σ2/Var[X]). Therefore, in a linear regression, β = 0 if and only if ρ = 0.
The test of ρ = 0 in part (ii) is equivalent to an F-Test of whether the slope is zero.
If β = 0, then F = (N - 2)R2/(1 - R2) has an F-Distribution with 1 and N - 2 degrees of freedom.
Therefore, if β = 0, t = √F has a t-Distribution with N - 2 degrees of freedom.
This test of correlation is discussed for example in Section 8.8 in Probability and Statistical Inference by Hogg and Tanis, or Section 9.7 in Introduction to Mathematical Statistics by Hogg, McKean, and Craig.
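Not part of the original solution: a minimal Python/scipy sketch of the correlation test in part (ii), using only the summary sums quoted above.

    from math import sqrt
    from scipy.stats import t as t_dist

    N = 18
    Sxx = 9854.5 - 337**2 / N
    Syy = 109.7936 - 42.98**2 / N
    Sxy = 672.8 - 337 * 42.98 / N

    r = Sxy / sqrt(Sxx * Syy)                     # about -0.827
    t_stat = r * sqrt((N - 2) / (1 - r**2))       # about -5.89
    crit = t_dist.ppf(0.995, N - 2)               # two-sided 1% critical value, about 2.921
    print(r, t_stat, abs(t_stat) > crit)          # reject H0: rho = 0 at 1%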

16.1. D. The p-value = Prob[Type I error] = Prob[rejecting H0 when it is true]. ⇒ Statement D is false. For 7 degrees of freedom, the 2% critical value is 2.998. 3 > 2.998 ⇒ Statement B is true.

16.2. to 16.4. Compute t = ^β/ βs . t has N - 2 degrees of freedom.

For 15 observations, t has 13 degrees of freedom and the 5% critical value is 2.160.
Critical region is when we reject H0, which is when |t| ≥ 2.160.
For 30 observations, t has 28 degrees of freedom and the 5% critical value is 2.048.
Critical region is when we reject H0, which is when |t| ≥ 2.048.
For a given significance level, the more data, the more powerful the test. With 30 observations the probability of making a Type II error is less than with only 15 observations.
Comment: The probability of a Type I error, rejecting H0 when it is true, is the significance level of the test.

16.5. X = 810/5 = 162. S2 = (5/4)(131726/5 - 1622) = 126.5.t = (162 - 175)/√(126.5/5) = -2.585, with 4 degrees of freedom. Perform a 2-sided test.Since 2.132 < 2.585 < 2.776, reject H0 at 10% and do not reject at 5%.Kirk should have verified that these Slubs were fully grown adults (assuming Slubs do not continue to grow their whole life, as do many reptiles.)Kirk should have verified that these Slubs were males (assuming Slubs have genders, one of which corresponds to the human concept of male.)The test assumes a random sample of 5 out of the entire set of adult male Slubs.However, these five Slubs may have been related, for example they might have been brothers.They may have been the equivalent of a group of jockeys, who are shorter than the average population. The heights of Slubs in the capital might not be a representative sample of the heights over the whole planet.Finally, it would have been preferable to collect a larger (random) sample of data, improving the power of such a test.Comment: I have listed a number of possible problems with Kirk’s test. There may be others.


16.6. A. 1. True. This is the definition of the significance level. 2. False. One should not reject the null hypothesis (at the given level of significance) when the test statistic falls outside of the critical region. 3. False. The fact that the test criteria is not significant merely tells us that the data do not contradict the null hypothesis, rather than proving that H0 is true.

16.7. A. Statement 1 is true. A Type II error occurs if H0 is not rejected when it is false. ⇒ Statement 2 is false.Depending on the situation being modeled, either type of error can be worse.Comment: From a purely statistical point of view, one wants to avoid both types of errors, and neither is inherently worse. However, for a given sample size, decreasing the probability of one type of error increases the probability of the other type of error.

18.1. A. The means of X2 and X3 are 6 and 9. x2 = -4, -1, 1, 4. x3 = -5, -1, 1, 5.
Σx2i2 = (-4)2 + (-1)2 + 12 + 42 = 34. Σx3i2 = (-5)2 + (-1)2 + (1)2 + 52 = 52.
Σx2ix3i = (-4)(-5) + (-1)(-1) + (1)(1) + (4)(5) = 42.
^β2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2}
= {Σx2iyi (52) - (Σx3iyi)(42)} / {(34)(52) - (42)2} = 13Σx2iyi - 10.5Σx3iyi.
We are given ^β2 = Σ wiYi. Thus wi = 13x2i - 10.5x3i.
(w1, w2, w3, w4) = (13)(-4, -1, 1, 4) - (10.5)(-5, -1, 1, 5) = (0.5, -2.5, 2.5, -0.5).
Alternately, using the matrix formulas for multiple regression, ^β = (X'X)-1X'Y:
X =
(1   2   4)
(1   5   8)
(1   7  10)
(1  10  14)
X'X =
(4   24  36)
(24 178 258)
(36 258 376)
(X'X)-1 =
(22.75   16.5  -13.5)
(16.5    13.0  -10.5)
(-13.5  -10.5    8.5)
(X'X)-1X' =
(1.75  -2.75   3.25  -1.25)
(0.5   -2.5    2.5   -0.5 )
(-0.5   2     -2      0.5 )
^β2 = Σ wiYi, with wi the elements of the 2nd row of (X'X)-1X': (0.5, -2.5, 2.5, -0.5).
Comment: Similar to 4, 11/01, Q.13. Note that Σ wi = 0. This is generally true for the slope parameters, but not the intercept parameter. For the intercept parameter, Σ wi = 1.75 - 2.75 + 3.25 - 1.25 = 1. The first column of the design matrix, X, is all ones.
Therefore, the first column of ((X'X)-1X')X consists of the sums of the rows of ((X'X)-1X'), i.e. the sums of the different sets of w's. However ((X'X)-1X')X = the identity matrix, whose first column has a one followed by zeros. Therefore, the sum of the w's for the intercept is 1, and the sum of the w's for a slope parameter is 0.
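Not part of the original solution: a minimal numpy sketch computing the weight matrix (X'X)^(-1)X' for the design matrix above; the second row gives the weights for ^β2, and the row sums illustrate the comment.

    import numpy as np

    X = np.array([[1, 2, 4],
                  [1, 5, 8],
                  [1, 7, 10],
                  [1, 10, 14]], dtype=float)

    W = np.linalg.inv(X.T @ X) @ X.T
    print(W[1])            # (0.5, -2.5, 2.5, -0.5)
    print(W.sum(axis=1))   # 1 for the intercept row, 0 for each slope row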


18.2. B. Y = 13. y = (-3, -5, 1, 7). Σx2iyi = 46. Σx3iyi = 56.
^β3 = {Σx3iyi Σx2i2 - Σx2iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2} = {(56)(34) - (46)(42)} / {(34)(52) - 42²} = -7.
Alternately, using the matrix formulas for multiple regression, with X, X'X, and (X'X)-1 as in the previous solution:
Y = (10, 8, 14, 20)'. X'Y = (52, 358, 524)'. ^β = (X'X)-1X'Y = (16, 10, -7)'.

18.3. E. From a previous solution, ^β2 = .5Y1 - 2.5Y2 + 2.5Y3 - .5Y4 = 10.
^β1 = Y - ^β2(mean of X2) - ^β3(mean of X3) = 13 - (10)(6) - (-7)(9) = 16.

18.4. A., 18.5. C., 18.6. D., 18.7. E.
s2 = ESS/(N - k) = 516727/(32 - 3) = 17,818.
rX2X3 = Σx2ix3i/√(Σx2i2 Σx3i2) = -612/√((23,266)(250)) = -.254.
Var[^β2] = s2/{(1 - rX2X3²)Σx2i2} = 17818/{(1 - .254²)(23,266)} = .819.
Var[^β3] = s2/{(1 - rX2X3²)Σx3i2} = 17818/{(1 - .254²)(250)} = 76.2.
Cov[^β2, ^β3] = -rX2X3 s2/{(1 - rX2X3²)√(Σx2i2 Σx3i2)} = -(-.254)(17818)/{(1 - .254²)√((23,266)(250))} = 2.01.
Var[^β2 + ^β3] = Var[^β2] + Var[^β3] + 2Cov[^β2, ^β3] = .819 + 76.2 + (2)(2.01) = 81.0.
StdDev[^β2 + ^β3] = √81.0 = 9.0.
Comment: Similar to 4, 5/00, Q.35.
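Not part of the original solution: a minimal Python sketch of the same variance and covariance calculations, using only the deviations-form sums quoted in the problem.

    from math import sqrt

    ESS, N, k = 516727, 32, 3
    sum_x2sq, sum_x3sq, sum_x2x3 = 23266, 250, -612

    s2 = ESS / (N - k)                                # about 17818
    r = sum_x2x3 / sqrt(sum_x2sq * sum_x3sq)          # about -0.254

    var_b2 = s2 / ((1 - r**2) * sum_x2sq)             # about 0.819
    var_b3 = s2 / ((1 - r**2) * sum_x3sq)             # about 76.2
    cov_b2b3 = -r * s2 / ((1 - r**2) * sqrt(sum_x2sq * sum_x3sq))   # about 2.01

    var_sum = var_b2 + var_b3 + 2 * cov_b2b3          # about 81.0
    print(sqrt(var_sum))                              # about 9.0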


18.8. B. Y = β1 + β2X2 + β3X3.

Squared error: β12 + (β1 + 100β3 - 30)2 + (β1 + 100β2 - 40)2 + (β1 + 100β2 + 100β3 - 80)2.

Set the partial derivative with respect to β1 equal to zero:

0 = 2β1 + β1 + 100β3 - 30 + β1 + 100β2 - 40 + β1 + 100β2 + 100β3 - 80.

⇒ 2β1 + 100β2 + 100β3 = 75.

Set the partial derivative with respect to β2 equal to zero:

0 = 2100(β1 + 100β2 - 40) + 100(β1 + 100β2 + 100β3 - 80).

⇒ 2β1 + 200β2 + 100β3 = 120.

Set the partial derivative with respect to β3 equal to zero:

0 = 2100(β1 + 100β3 - 30) + 100(β1 + 100β2 + 100β3 - 80).

⇒ 2β1 + 100β2 + 200β3 = 110.

From the first two equations: 100β2 = 120 - 75. ⇒ β2 = .45.

From the last two equations: 100β2 - 100β3 = 120 - 110. ⇒ β3 = β2 - .1 = .35.

⇒ β1 = 60 - 100β2 - 50β3 = -2.5.

^Y = -2.5 + .45(hours at home) + .35(hours at library).60 = -2.5 + .45(hours at home) + (.35)(75). ⇒ hours at home = 80.56.

18.9. A. The variables X2i and X3i have means of zero, so they are in deviations form.
The usual 3 variable regression would give ^β1 = Y - ^β2(mean of X2) - ^β3(mean of X3) = Y.
If we were to rewrite Y in deviations form we would get ^β1 = 0 and the same ^β2 and ^β3; thus the given model with no intercept has the same variance of its estimated slopes as would the usual 3 variable regression model.
Var[^β2] = s2/{Σx2i2 (1 - rX2X3²)}.
We are given Var[^β2] = 4s2/3 and Σx2i2 = 1. ⇒ 4/3 = 1/(1 - rX2X3²). ⇒ rX2X3 = ±0.50.
Both X2 and X3 have standard deviations of 1, therefore the regression of X2 on X3 has slope rX2X3. We are given this slope is negative, so that rX2X3 < 0. rX2X3 = -0.5.
Comment: In general, for a regression of Y on X, the slope is: ^β = Σxiyi/Σxi2 = rXY sY/sX.


18.10. A. The means of X2 and X3 are 0. Σx2i2 = 4. Σx3i2 = 4. Σx2ix3i = 0. Y = 2.5. Σx2iyi = 0. Σx3iyi = 4.
^β2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2} = {(0)(4) - (4)(0)} / {(4)(4) - 02} = 0.
Alternately, using the matrix form of regression:
X =
(1 -1 -1)
(1  1 -1)
(1 -1  1)
(1  1  1)
X'X is the diagonal matrix with entries (4, 4, 4), so (X'X)-1 is diagonal with entries (1/4, 1/4, 1/4).
X'Y = (10, 0, 4)'. ^β = (X'X)-1X'Y = (2.5, 0, 1)'.
^β1 = 2.5, ^β2 = 0, and ^β3 = 1.

18.11. C. Var[^β2] = s2/{(1 - rX2X3²)Σx2i2} = 10/{(1 - .5²)(4)} = 10/3 = 3.33.
Var[^β3] = s2/{(1 - rX2X3²)Σx3i2} = 10/{(1 - .5²)(8)} = 10/6 = 1.67.
Cov[^β2, ^β3] = -rX2X3 s2/{(1 - rX2X3²)√(Σx2i2 Σx3i2)} = -(.5)(10)/{(1 - .5²)√((4)(8))} = -1.1785.
Var[^β2 - ^β3] = Var[^β2] + Var[^β3] - 2Cov[^β2, ^β3] = 3.33 + 1.67 - (2)(-1.1785) = 7.357.
StdDev[^β2 - ^β3] = √7.357 = 2.71.
Alternately, Σx2ix3i = rX2X3 √(Σx2i2 Σx3i2) = (.5)√((4)(8)) = 2√2 = 2.828.
If the variables were rewritten in deviations form, the slope coefficients are the same as the original model, and the intercept is zero. The design matrix x would be the 30 by 2 matrix whose ith row is (x2,i  x3,i), so that
x'x =
(Σx2i2    Σx2ix3i)
(Σx2ix3i  Σx3i2  )
=
(4      2.828)
(2.828  8    )
Variance-Covariance Matrix = s2(x'x)-1 = (10)
( 8     -2.828)
(-2.828  4    )
/24 =
( 3.33  -1.174)
(-1.174  1.67 )
Var[^β2 - ^β3] = Var[^β2] + Var[^β3] - 2Cov[^β2, ^β3] = 3.33 + 1.67 - (2)(-1.174) = 7.35.
StdDev[^β2 - ^β3] = √7.35 = 2.71.
Comment: Note that Corr[^β2, ^β3] = -.5 = -rX2X3.


18.12. B. The means of X2 and X3 are each zero, so they are already in deviations form.
Σx2i2 = (-3)2 + (-1)2 + 12 + 32 = 20. Σx3i2 = (-1)2 + 32 + (-3)2 + 12 = 20.
Σx2ix3i = (-3)(-1) + (-1)(3) + (1)(-3) + (3)(1) = 0.
^β3 = {Σx3iyi Σx2i2 - Σx2iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2} = {Σx3iyi (20) - (Σx2iyi)(0)} / {(20)(20) - 02} = Σx3iyi/20.
We are given ^β3 = Σ wiYi. Thus wi = x3i/20.
(w1, w2, w3, w4) = (-1, 3, -3, 1)/20 = (-0.05, 0.15, -0.15, 0.05).
Alternately, using the matrix formulas for multiple regression, ^β = (X'X)-1X'Y:
X =
(1 -3 -1)
(1 -1  3)
(1  1 -3)
(1  3  1)
X'X is diagonal with entries (4, 20, 20), so (X'X)-1 is diagonal with entries (1/4, 1/20, 1/20).
(X'X)-1X' =
( 1/4    1/4    1/4   1/4 )
(-3/20  -1/20   1/20  3/20)
(-1/20   3/20  -3/20  1/20)
^β3 = Σ wiYi, with wi the elements of the third row of (X'X)-1X': (-0.05, 0.15, -0.15, 0.05).
Comment: Arithmetic simplifies a little due to the terms which turn out to be zero in this case.

18.13. A. Continuing the previous solution:
^β2 = {Σx2iyi Σx3i2 - Σx3iyi Σx2ix3i} / {Σx2i2 Σx3i2 - (Σx2ix3i)2} = {Σx2iyi (20) - (Σx3iyi)(0)} / {(20)(20) - 02} = Σx2iyi/20.
We are given ^β2 = Σ wiYi. Thus wi = x2i/20.
(w1, w2, w3, w4) = (-3, -1, 1, 3)/20 = (-0.15, -0.05, 0.05, 0.15).
Alternately, using the matrix formulas for multiple regression, ^β = (X'X)-1X'Y, ^β2 = Σ wiYi, with wi the elements of the 2nd row of (X'X)-1X': (-0.15, -0.05, 0.05, 0.15).

19.1. ^β = (X'X)-1X'Y = (1.47027, 0.81449, 0.820444, 13.5286)', a column vector.
This is a four variable model, 3 independent variables plus an intercept.
^β1 = 1.470. ^β2 = 0.814. ^β3 = 0.820. ^β4 = 13.53.

19.2. ESS = Y’Y - ^β‘X’Y = 73990.3 - 72986.7 = 1003.6.

s2 = ESS/(N - k) = 1003.6/(20 - 4) = 62.72.Comment: Similar to Course 120 Sample Exam #3, Q.6.


19.3. s2(X'X)-1 =
( 33.0     .442    -0.0507  -23.3  )
( .442     0.262   -0.0451  -0.0916)
(-0.0507  -0.0451   0.0446  -0.802 )
(-23.3    -0.0916  -0.802    43.4  )

19.4. The sum of the Y's is the first element of X'Y: 1133.2. Y = 1133.2/20 = 56.66.
TSS = Y'Y - NY2 = 73990.3 - (20)(56.66²) = 9783.2.
R2 = 1 - ESS/TSS = 1 - 1003.6/9783.2 = .897.
Adjusted R2 = 1 - (1 - R2)(N-1)/(N - k) = 1 - (1 - .897)(20 - 1)/(20 - 4) = .878.
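Not part of the original solution: a minimal Python sketch combining the scalar shortcuts of 19.2 and 19.4, using the quantities given in the problem.

    YtY = 73990.3          # Y'Y
    bXtY = 72986.7         # beta-hat' X'Y
    sum_Y = 1133.2         # first element of X'Y
    N, k = 20, 4

    ESS = YtY - bXtY                               # 1003.6
    s2 = ESS / (N - k)                             # 62.72
    TSS = YtY - N * (sum_Y / N) ** 2               # about 9783.2
    R2 = 1 - ESS / TSS                             # about 0.897
    R2_adj = 1 - (1 - R2) * (N - 1) / (N - k)      # about 0.878
    print(ESS, s2, R2, R2_adj)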

19.5. D. s2 = ESS/(N - k) = (4,799,790 - 4,283,063)/(32 - 3) = 17,818.
Var[^β] = s2(X'X)-1. Therefore, Var[^β2] = (17818)(0.0000459) = .818.
Var[^β3] = (17818)(0.00428) = 76.3. Cov[^β2, ^β3] = (17818)(0.0001125) = 2.005.
Var[100^β2 + 10^β3] = 10000 Var[^β2] + 100 Var[^β3] + 2000 Cov[^β2, ^β3] = (10000)(.818) + (100)(76.3) + (2000)(2.005) = 19,820.
StdDev[100^β2 + 10^β3] = √19820 = 141.
Comment: Similar to Course 120 Sample Exam #1, Q.4.
1 - (adjusted R2) = (1 - R2)(N - 1)/(N - k) = (1 - .895)(20 - 1)/(20 - 4) = .125. Adjusted R2 = .875.

19.6. For the one variable model (slope and no intercept), the design matrix has a single column containing X. In other words, X is a column vector.

X’X = ΣXi2. (X’X)-1 = 1/ΣXi2. X’Y = ΣXiYi.^β = (X’X)-1X’Y = ΣXiYi/ΣXi2.

19.7. Since the regression passes through the point where all the variables are equal to their means, Y = 11 - 4 X2 + 7 X3 - 12 X4 = 11 - (4)(148/25) + (7)(201/25) - (12)(82/25) = 4.24.


19.8. For the two variable model (slope and intercept), the design matrix has 1s in the first column and Xi in the second column:
X =
(1  X1)
(1  X2)
(1  X3)
(... ...)
X' =
(1   1   1  ...)
(X1  X2  X3 ...)
X'X =
(N     ΣXi )
(ΣXi   ΣXi2)
(X'X)-1 =
(ΣXi2  -ΣXi)
(-ΣXi    N )
divided by {N ΣXi2 - (ΣXi)2}.
X'Y = (ΣYi, ΣXiYi), a column vector.
^β = (X'X)-1X'Y = (ΣYiΣXi2 - ΣXiΣXiYi, NΣXiYi - ΣXiΣYi) / {N ΣXi2 - (ΣXi)2}.
The first component of this vector gives the equation for the fitted intercept, and the second component gives the equation for the fitted slope of the two-variable model:
α = {ΣYiΣXi2 - ΣXiΣXiYi} / {NΣXi2 - (ΣXi)2}.
^β = {NΣXiYi - ΣXiΣYi} / {NΣXi2 - (ΣXi)2}.
Comment: The equations in deviations form can be derived from these equations not in deviations form.
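Not part of the original solution: a minimal numpy sketch checking that the matrix formula (X'X)^(-1)X'Y reproduces the usual two-variable intercept and slope. The data here, X = 1 to 5 and the Y values from 15.6, is just an illustrative choice.

    import numpy as np

    Xcol = np.array([1, 2, 3, 4, 5], dtype=float)
    Y = np.array([82, 78, 80, 73, 77], dtype=float)

    X = np.column_stack([np.ones_like(Xcol), Xcol])   # column of ones, then Xi
    beta = np.linalg.inv(X.T @ X) @ X.T @ Y
    print(beta)            # (82.5, -1.5): the alpha and beta found in 15.6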

19.9. C. and 19.10. B. Model with no intercept. Put it in matrix form.
X =
(2   4)
(5   8)
(7  10)
(10 14)
Y = (10, 8, 14, 20)'.
X'X =
(178 258)
(258 376)
X'Y = (358, 524)'.
(X'X)-1 =
(376  -258)
(-258  178)
divided by {(178)(376) - (258)(258)} = 364.
^β = (X'X)-1X'Y = (-1.6044, 2.4945)'.
^β2 = -1.6044. ^β3 = 2.4945.


19.11. E., 19.12. D., and 19.13. A.
^Y = -1.6044X2 + 2.4945X3 = (6.7692, 11.934, 13.7142, 18.879).
ε = Y - ^Y = (10, 8, 14, 20) - (6.7692, 11.934, 13.7142, 18.879) = (3.2308, -3.934, .2858, 1.121).
ESS = Σε2 = 3.2308² + (-3.934)² + .2858² + 1.121² = 27.253.
s2 = ESS/(N - k) = 27.253/(4 - 2) = 13.626.
Var[^β] = s2(X'X)-1 = (13.626)
(376  -258)
(-258  178)
/364 =
(14.075  -9.658)
(-9.658   6.663)
Var[^β2] = 14.075. Var[^β3] = 6.663. Cov[^β2, ^β3] = -9.658.

19.14. The design matrix X has a first column of ones and a second column equal to the Xi.
X'X =
(N     ΣXi )
(ΣXi   ΣXi2)
(X'X)-1 =
(ΣXi2  -ΣXi)
(-ΣXi    N )
divided by {N ΣXi2 - (ΣXi)2} = N Σxi2, which can also be written as
(E[X2]  -X)
(-X      1)
divided by Σxi2.
The ith row of X(X'X)-1 is: (E[X2] - Xi X,  Xi - X) / Σxi2.
H = X(X'X)-1X' has entries: Hij = {E[X2] - Xi X - Xj X + XiXj} / Σxi2.
(I - H)σ2 has entries: {δij - (E[X2] - Xi X - Xj X + XiXj)/Σxi2} σ2.
Var[^εi] = σ2 {1 - (E[X2] - 2Xi X + Xi2)/Σxi2}.
Cov[^εi, ^εj] = -σ2 (E[X2] - Xi X - Xj X + XiXj)/Σxi2.
Comment: The hat matrix H is an N by N matrix.


19.15. Transpose of the design matrix:
X' =
(1 1 1 1 1)
(1 2 3 4 5)
X'X =
(5  15)
(15 55)
(X'X)-1 =
(55 -15)
(-15  5)
/50.
X(X'X)-1 =
( 8 -2)
( 5 -1)
( 2  0)
(-1  1)
(-4  2)
/10.
H = X(X'X)-1X' =
( 6  4  2  0 -2)
( 4  3  2  1  0)
( 2  2  2  2  2)
( 0  1  2  3  4)
(-2  0  2  4  6)
/10.
(I - H)σ2 = σ2
( 4 -4 -2  0  2)
(-4  7 -2 -1  0)
(-2 -2  8 -2 -2)
( 0 -1 -2  7 -4)
( 2  0 -2 -4  4)
/10.
Comment: Var[^ε1] = 0.4σ2. Var[^ε2] = 0.7σ2. Var[^ε3] = 0.8σ2. Var[^ε4] = 0.7σ2. Var[^ε5] = 0.4σ2.
E[ESS] = ΣE[^εi2] = ΣVar[^εi] = (.4 + .7 + .8 + .7 + .4)σ2 = 3σ2 = (5 - 2)σ2 = (N - k)σ2.
The correlation matrix of the residuals is:
(  1   -.76  -.35    0    .5 )
(-.76    1   -.26  -.14    0 )
(-.35  -.26    1   -.26  -.35)
(  0   -.14  -.26    1   -.76)
( .5     0   -.35  -.76    1 )
Note that Corr[^ε1, ^ε5] = 0.5 > 0. For observations with X values near opposite extremes, the corresponding residuals may be positively correlated. It can be shown that if Xi = i, i = 1, 2, ..., N, then
Var[^ε1] = σ2(N - 1)(N - 2)/{N(N+1)} = Var[^εN],
Cov[^ε1, ^εN] = 2σ2(N - 2)/{N(N+1)}, and Corr[^ε1, ^εN] = 2/(N - 1).
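Not part of the original solution: a minimal numpy sketch computing the hat matrix for X = 1, ..., 5 and the resulting residual variances (as multiples of σ2).

    import numpy as np

    X = np.column_stack([np.ones(5), np.arange(1, 6, dtype=float)])
    H = X @ np.linalg.inv(X.T @ X) @ X.T

    resid_var = np.diag(np.eye(5) - H)     # 0.4, 0.7, 0.8, 0.7, 0.4 (times sigma^2)
    print(np.round(H * 10, 6))             # matches the matrix in the solution (times 1/10)
    print(resid_var, resid_var.sum())      # the variances sum to N - k = 3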

19.16. C. Var[^β] = s2(X'X)-1. Var[^β2] = s2(.0087). Var[^β3] = s2(.0087). Cov[^β2, ^β3] = s2(-.0020).
Var[^β2 - ^β3] = Var[^β2] + Var[^β3] - 2Cov[^β2, ^β3] = .0214s2 = (.0214)(280.1167) = 5.994.
The estimated standard error of ^β2 - ^β3 is: √5.994 = 2.45.
Comment: Var[^β2] is s2 times the 2,2 element of (X'X)-1. Cov[^β2, ^β3] is s2 times the 2,3 element of (X'X)-1.


19.17. E. ^Y = X^β = X(X'X)-1X'Y.
^Y'^Y = {X(X'X)-1X'Y}'X(X'X)-1X'Y = Y'X(X'X)-1X'X(X'X)-1X'Y = Y'X(X'X)-1X'Y = Y'X^β.
^Y'Y = (X^β)'Y = ^β'(X'Y) = (5.22, 1.62, 0.21, -0.45)·(261.5, 4041.5, 6177.5, 5707.0) = 6641.4.
Y'^Y = (^Y'Y)' = (^β'X'Y)' = Y'X^β = ^Y'^Y.
ESS = Σ^εi2 = (Y - ^Y)'(Y - ^Y) = Y'Y - ^Y'Y - Y'^Y + ^Y'^Y = Y'Y - ^β'(X'Y) = 7995 - 6641.4 = 1353.6.
s2 = ESS/(N - k) = 1353.6/(30 - 4) = 52.1.
Var[^β3] = s2(3,3 element of (X'X)-1) = (52.1)(.0035) = .182. s_β3 = √.182 = .427.
For the t-distribution, for 26 degrees of freedom, a 95% confidence interval is ±2.056 standard deviations. This has width: (2)(2.056)s_β3 = (2)(2.056)(.427) = 1.76.
Comment: Difficult.

19.18. C. s2 = ESS/(N - k) = 282.82/(15 - 4) = 25.71. The Covariance Matrix is s2(X'X)-1.
Var[^β2] = (25.71)(.03) = .7713. Var[^β3] = (25.71)(2.14) = 55.02. Cov[^β2, ^β3] = (25.71)(.11) = 2.828.
Var[^β3 - ^β2] = Var[^β3] + Var[^β2] - 2Cov[^β2, ^β3] = 55.02 + .7713 - (2)(2.828) = 50.14.
StdErr[^β3 - ^β2] = √50.14 = 7.08.
Comment: Var[^β3 - ^β4] = Var[^β3] + Var[^β4] - 2Cov[^β3, ^β4] = 55.02 + 111.07 - (2)(-64.79) = 295.67. StdErr[^β3 - ^β4] = √295.67 = 17.2.

19.19. D. Y = β1 + 500β2 + β3 for a private hospital with 500 beds, while Y = β1 + 500β2 for a public hospital with 500 beds. The difference is β3. ^β3 = 28. Var[^β3] = 38.8423.
For 393 - 3 = 390 degrees of freedom, the t-distribution for a total of 5% probability in both tails has a critical value of 1.960, the same as the Normal Distribution.
28 ± 1.960√38.8423 = 28 ± 12.2 = (15.8, 40.2).
Comment: The adjustment that was made for heteroscedasticity, similar to 4, 11/00, Q. 31, affects the fitted parameters and the estimated covariance matrix. However, once we have the fitted parameters and estimated covariance matrix, the fact that such an adjustment was made can be ignored for purposes of answering the question that was asked.


19.20. E. Y = β1 + 300β2 + β3 for a private hospital with 300 beds, while Y = β1 + 400β2 for a public hospital with 400 beds. The difference is β3 - 100β2. ^β2 = 3.1. ^β3 = 28. ^β3 - 100^β2 = -282.
Var[^β3] = 38.8423. Var[^β2] = .0035. Cov[^β2, ^β3] = 0.0357.
Var[^β3 - 100^β2] = Var[^β3] + 10000Var[^β2] - 200Cov[^β2, ^β3] = 66.7023.
For 393 - 3 = 390 degrees of freedom, the t-distribution for a total of 1% probability in both tails has a critical value of 2.576, the same as the Normal Distribution.
-282 ± 2.576√66.7023 = -282 ± 21.0 = (-303.0, -261.0).
Comment: For some reason the exam question listed ^β2 before ^β1! The same order was used when giving the variances and covariances. Therefore, the first row and first column of the given matrix refer to ^β2.

19.21. A. Var[^β1 + 600^β2] = Var[^β1] + 600²Var[^β2] + (2)(600)Cov[^β1, ^β2] = 1.89952 + (360,000)(.00001) + (1200)(-.00364) = 1.13152.
The standard error of ^β1 + 600^β2 is: √1.13152 = 1.06373.

20.1. Degrees of Freedom = N - k = 30 - 4 = 26. t = -4.421/2.203 = -2.007.Since 1.706 < 2.007 < 2.056, reject at 10% and do not reject at 5%.

20.2. Degrees of Freedom = N - k = 36 - 6 = 30. t = (3.13 - 1)/.816 = 2.610.Since 2.457 < 2.610 < 2.750, reject at 2% and do not reject at 1%.

20.3. ESS = TSS - RSS = 3,600,196. F = RSS/(k-1)/ESS/(N - k) = 5,018,232/(3-1)/3,600,196/(15 - 3) = 8.363.For 2 and 12 degrees of freedom, the critical values are: 3.88 at 5% and 6.93 at 1%.Since 8.363 > 6.93, we reject the hypothesis at 1%. Comment: Using a computer, the p-value is .53%.

20.4. D. 1 - (adjusted R2) = (1 - R2)(N - 1)/(N - k) = (1 - .912)(23 - 1)/(23 - 5) = .108. Adjusted R2 = .892.

20.5. A. F = {RSS/(k-1)}/{ESS/(N-k)} = {R2 TSS/(k-1)}/{(1-R2)TSS/(N-k)} = {R2/(1-R2)}{(N-k)/(k-1)} = ((23 - 5)/(5 - 1))(.912)/(1 - .912) = 46.64.
Comment: If for example TSS = 1000, then R2 ≡ RSS/TSS, so that RSS = 912. ESS = TSS - RSS = 88. F = {RSS/(k-1)}/{ESS/(N-k)} = (912/4)/(88/18) = 228/4.889 = 46.64.

20.6. D. F = (explained variance)/(unexplained variance).Comment: Statement A is R2.


20.7. E. The definition of the F-Distribution involves the ratio of two independent Chi-Square Distributions. In the derivation of this result, in order to get Chi-Square Distributions, the errors have to be independent Normal Distributions with mean zero and the same variance. Then F follows an F-Distribution, provided H0: all the slopes are zero, is true.Comment: If there were multicollinearity, then the effective value of k would be smaller than the number of variables actually used.

20.8. C. s_β1 = √426076 = 652.7. t = (^β1 - 1500)/s_β1 = (-20.352 - 1500)/652.7 = -2.329.
For 15 - 3 = 12 degrees of freedom, the critical values for 5% and 2% are 2.179 and 2.681.
Since 2.179 < 2.329 < 2.681, we reject at 5% and do not reject the hypothesis at 2%.

20.9. E. s_β2 = √58.85 = 7.671. t = ^β2/s_β2 = 13.3504/7.671 = 1.740.
For 15 - 3 = 12 degrees of freedom, the critical value for 10% is 1.782.
Since 1.740 < 1.782, we do not reject the hypothesis at 10%.

20.10. A. s_β3 = √4034 = 63.51. t = ^β3/s_β3 = 243.714/63.51 = 3.837.
For 15 - 3 = 12 degrees of freedom, the critical value for 1% is 3.055.
Since 3.837 > 3.055, we reject the hypothesis at 1%.

20.11. E. Var[β1 + 50β2 + 10β3] =
Var[β1] + 2500Var[β2] + 100Var[β3] + 100Cov[β1, β2] + 20Cov[β1, β3] + 1000Cov[β2, β3] =
426076 + (2500)(58.85) + (100)(4034) + (100)(-2435) + (20)(-36703) + (1000)(41.99) = 41031.
^β1 + 50^β2 + 10^β3 = -20.352 + (50)(13.3504) + (10)(243.714) = 3084.
For the t-distribution with 15 - 3 = 12 degrees of freedom, the critical value for 5% is 2.179.
Thus a 95% confidence interval is: 3084 ± (2.179)√41031 = 3084 ± 441 = (2643, 3525).

20.12. A. For testing β = 0, t = ^β/ βs. ⇒ βs = -1.9/(-2.70) = 0.7037.
Var[α] = Var[^β] ΣXi2/N = (0.7037²)(1018/16) = 31.507.
For testing α = 10, t = (27 - 10)/√31.507 = 3.029.
For 16 - 2 = 14 degrees of freedom, for a 2-sided test 2.977 is the 1% critical value.
3.029 > 2.977. Reject H0 at 1%.

20.13. D. t = 88/49 = 1.796, for 50 - 3 = 47 degrees of freedom.As shown in the t-table for 47 degrees of freedom, the 10% and 5% critical values are about1.68 and 2.01, so we reject H0 at 10% and do not reject at 5%.

20.14. B. t = 0.031/0.012 = 2.583, for 50 - 3 = 47 degrees of freedom.As shown in the t-table for 47 degrees of freedom, the 2% and 1% critical values are about2.41 and 2.69, so we reject H0 at 2% and do not reject at 1%.


20.15. E. t = -0.72/0.46 = -1.565, for 50 - 3 = 47 degrees of freedom.As shown in the t-table for 47 degrees of freedom, the 10% critical value is about 1.68. 1.565 < 1.68, so we do not reject H0 at 10%.

20.16. B. s2 = ESS/(N - 3) = 63/(50 - 3) = 1.340. s = 1.158.

20.17. 1 - (adjusted R2) = (1 - R2)(N - 1)/(N - k) = (1 - .84)(50 - 1)/(50 - 3) = 0.1668. Adjusted R2 = 0.833.

20.18. R2 = 1 - ESS/TSS. ⇒ .84 = 1 - 63/TSS. ⇒ TSS = 393.75.RSS = R2 TSS = (.84)(393.75) = 330.75.F = RSS/(k - 1)/ESS/(N - k) = 330.75/(3 - 1)/63/(50 - 3) = 123.375 with 2 and 47 degrees of freedom. The 1% critical value is about 5.5. 123.375 > 5.5, so we reject H0 at 1%!

Alternately, F = R2/(1 - R2)(N - k)/(k - 1) = (.84/.16)(47/2) = 123.375. Proceed as before.

20.19. to 20.21. Compute F-Statistic = {RSS/(k-1)}/{ESS/(N - k)} = (RSS/4)/{ESS/(N - 5)}.
F has k - 1 = 4 and N - 5 degrees of freedom.
For 15 observations, F has 4 and 10 degrees of freedom and the 5% critical value is 3.48.
Critical region is when we reject H0, which is when F ≥ 3.48.
For 30 observations, F has 4 and 25 degrees of freedom and the 5% critical value is 2.76.
Critical region is when we reject H0, which is when F ≥ 2.76.
For a given significance level, the more data, the more powerful the test. With 30 observations the probability of making a Type II error is less than with only 15 observations.
Comment: The probability of a Type I error, rejecting H0 when it is true, is the significance level of the test.

20.22. B. ESSUR = ESSV = 4.35. ESSR = ESSI = 5.85. N = 5. q = dimension of restriction = 1.k = independent variables for the unrestricted model = 3.(ESSR - ESSUR)/q/ESSUR/(N - k) = (5.85 - 4.35)/1 /4.35/(5 - 3) = .69 is an F-Statistic with 1 and 2 degrees of freedom. Comment: Sample variance of Y = Σ(Yi - Y )2/(N - 1). ⇒ 2.2 = TSS / 4. ⇒ TSS = 8.8.There is a lot of information given in this question that is not used.We are comparing a model with an intercept, X2 and X4, to a model with an intercept and X2.The unrestricted model has 3 variables including the intercept.The restricted model has 2 variables including the intercept.q is the difference between the number of variables in the unrestricted model and the restricted model. Thus q = 3 - 2 = 1.
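Not part of the original solution: a minimal Python sketch of the restricted-versus-unrestricted F statistic in 20.22, with the error sums of squares quoted above.

    ESS_UR, ESS_R = 4.35, 5.85
    N, k, q = 5, 3, 1

    F = ((ESS_R - ESS_UR) / q) / (ESS_UR / (N - k))   # about 0.69
    print(F)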

20.23. C. ^β3 = -2. s2 = ESS/(N - k) = 12/3 = 4.
The variance-covariance matrix of ^β is: s2(X'X)-1. ⇒ s_β3² = (2/3)s2 = 8/3. ⇒ s_β3 = 1.633.
The t statistic for testing the null hypothesis H0: β3 = 1 is: (^β3 - 1)/s_β3 = (-2 - 1)/1.633 = -1.84.
Comment: For 3 degrees of freedom, since 1.638 < 1.84 < 2.353, we reject H0 at 20% and do not reject H0 at 10%, for a two-tailed test.


20.24. B. The first model has the restriction β4 = 0. ESS = TSS - RSS. Error sum of squares for the unrestricted (second) model = ESSUR = 128 - 65.6 = 62.4. Error sum of squares for the restricted (first) model = ESSR = 128 - 61.3 = 66.7.N = number of observations = 10. q = dimension of restriction = 1.k = independent variables for the unrestricted model = 4.F = (ESSR - ESSUR)/q / ESSUR/(N - k) = (66.7 - 62.4)/1/62.4/(10 - 4) = .41.Comment: The TSS only depends on the data, so it is equal for the two models.

21.1. C. The unrestricted model is Model I with ESS = 2721. To obtain the restricted model, set β2 = β3 = 0 to yield Model III with ESS = 3763. Then, ESSUR

= Error Sum of Squares of the Unrestricted model = 2721.ESSR = Error Sum of Squares of the Restricted model = 3763.q = dimension of the restriction = independent variables for unrestricted model - independent variables restricted model = 4 - 2 = 2. N = number of observations = 25.k = independent variables for the unrestricted model = 4.(ESSR - ESSUR)/q/ESSUR/(N - k) = (3763 - 2721)/2 /2721/(25 - 4) = 4.02 is an F-Statistic with 2 and 21 degrees of freedom.

21.2. 4.02 is an F-Statistic with 2 and 21 degrees of freedom. Using the table with ν1 = 2 and

ν2 = 21, the critical values are 3.47 and 5.78 for 5% and 1% respectively. Since 3.47 < 4.02 < 5.78, we reject at 5% and do not reject at 1% the null hypothesis.

21.3. A. The unrestricted model is Model I with ESS = 2721. To obtain the restricted model, substitute β3 = β2 to yield Model II with ESS = 3024. Then, ESSUR = Error Sum of Squares of the Unrestricted model = 2721.ESSR = Error Sum of Squares of the Restricted model = 3024.q = dimension of the restriction = independent variables for unrestricted model - independent variables restricted model = 4 - 3 = 1. N = number of observations = 25.k = independent variables for the unrestricted model = 4.(ESSR - ESSUR)/q/ESSUR/(N - k) = (3024 - 2721)/1 /2721/(25 - 4) = 2.33 is an F-Statistic with 1 and 21 degrees of freedom.

21.4. 2.33 is an F-Statistic with 1 and 21 degrees of freedom. Using the table with ν1 = 1 and

ν2 = 21, the critical values are 4.32 and 8.02 for 5% and 1% respectively.

Since 2.33 < 4.32, we do not reject at 5% the null hypothesis, that β2 = β3.


21.5. B. The unrestricted model is Model I with ESS = 2721. To obtain the restricted model, substitute β4 = 1 - β2 to yield Model VI with ESS = 3897 . Then, ESSUR = Error Sum of Squares of the Unrestricted model = 2721.ESSR = Error Sum of Squares of the Restricted model = 3897.q = dimension of the restriction = independent variables for unrestricted model - independent variables restricted model = 4 - 3 = 1.N = number of observations = 25.k = independent variables for the unrestricted model = 4.(ESSR - ESSUR)/q/ESSUR/(N - k) = (3897 - 2721)/1 /2721/(25 - 4) = 9.08 is an F-Statistic with 1 and 21 degrees of freedom.

21.6. 9.08 is an F-Statistic with 1 and 21 degrees of freedom. Using the table with ν1 = 1 and ν2 = 21, the critical values are 4.32 and 8.02 for 5% and 1% respectively.
Since 9.08 > 8.02, we reject at 1% the null hypothesis, that β2 + β4 = 1.

21.7. X = 55. x = -10, -5, 0, 5, 10. Y = 63.6. y = -20.6, -5.6, -.6, 12.4, 14.4.
Σxi2 = 250. Σxiyi = 440. ^β = 440/250 = 1.76. α = Y - ^βX = -33.2.
^Y = -33.2 + 1.76X = 46, 54.8, 63.6, 72.4, 81.2.
ε = Y - ^Y = -3, 3.2, -0.6, 3.6, -3.2. ESS = Σεt2 = 42.8.

21.8. Restricting α = 0 and β = 1, ^Yt = Xt. εt = Yt - Xt.
ESSrestricted = Σεt2 = (43 - 45)2 + (58 - 50)2 + (63 - 55)2 + (76 - 60)2 + (78 - 65)2 = 557.
The restriction is two dimensional, q = 2.
F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(557 - 42.8)/2} / {42.8/(5 - 2)} = 18.02.
The numerator has 2 degrees of freedom and the denominator has 5 - 2 = 3 degrees of freedom.
Since 9.55 < 18.02 < 30.82, we reject H0 at 5%, but not at 1%.
Comment: Similar to 4, 11/03, Q.20.

21.9. C. F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (ESSC - ESS1 - ESS2)/k/(ESS1 + ESS2)/(N1 + N2 - 2k) =(5735 - 2573 - 2041)/3(2573 + 2041)/(30 + 20 - (2)(3)) = 373.7/104.9 = 3.56 Comment: Similar to 4, 11/00, Q. 21. The F-statistic has 3 and 44 degrees of freedom.For 3 and 40 degrees of freedom, the critical value at 5% is 2.84, and at 1% it is 4.31. Since 2.84 < 3.56 < 4.31, we reject the hypothesis at 5% and do not reject at 1%.
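Not part of the original solution: a minimal Python/scipy sketch of the Chow-type F statistic in 21.9, printing the statistic together with the 5% critical value for its degrees of freedom.

    from scipy.stats import f

    ESS_pooled, ESS_1, ESS_2 = 5735, 2573, 2041
    N1, N2, k = 30, 20, 3

    num = (ESS_pooled - ESS_1 - ESS_2) / k
    den = (ESS_1 + ESS_2) / (N1 + N2 - 2 * k)
    F = num / den                                   # about 3.56
    print(F, f.ppf(0.95, k, N1 + N2 - 2 * k))       # reject at 5% if F exceeds the critical value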


21.10. B. N - k is the number of degrees of freedom associated with the Error Sum of Squares for the unrestricted model, 33 for the first model.q = dimension of the restriction = the difference in the number of degrees of freedom associated with the regression sum of squares for the unrestricted and restricted models = 5 - 3 = 2. (2 independent variables must have been excluded.)Error Sum of Squares for the unrestricted model = 22,070.TSS = 72,195 + 22,070 = 94,265.Error Sum of Squares for the restricted model = TSS - RSSR = 94,265 - 63,021 = 31,244.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (31,244 - 22,070)/2 /22,070/33 = 4587/668.8= 6.86 is an F-Statistic with 2 and 33 degrees of freedom.

21.11. One initial step would be to examine each independent variable to see whether it seems reasonable that it could affect the claim frequency. If so, does the sign of the slope make sense? Data should go back and test whether each of the coefficients separately are significantly different from zero, using the t-test. Data should go back and test whether groups of the coefficients separately are significantly different from zero using F-Tests.While R2 = 0.92 is rather high, one needs to take into account the enormous number of separate regressions that were run. From 25 characteristics, the number of distinct sets of size 5 is: (25)(24)(23)(22)(21)/5! = 53,130. Thus even if there were no relation between the independent variables and claim frequency, it is not surprising that one of these regressions would show a good match to the observations, and thus have a high R2. One should be somewhat skeptical whenever a very large set of models has been examined, and the one that fits best has been selected.It would be very useful if the model could be tested on a similar data set that had not been used in this selection process.Comment: These are some reasonable things one could say. There are probably others.

2R = 1 - (1 - R2)(N - 1)/(N - k) = 1 - (1 - .92)(16 - 1)/(16 - 6) = 0.88, still rather high.Since all of the regressions used the same number of observations and the same number of

variables, the rankings by R2 are the same as the rankings by 2

R . In order to test H0: β1 = β2 = β3 = β4 = β5 = 0, F = R2/(1 - R2)(N - k)/(k - 1) = (.92/.08)(16 - 6)/(6 - 1) = 23, with 5 and 10 degrees of freedom. The 1% critical value is 5.64. Since 23 > 5.64, if β1 = β2 = β3 = β4 = β5 = 0, the probability of F ≥ 23 is less than 1%.Using a computer, Prob[F ≥ 23 | H0] = 0.0000346612. 1/0.0000346612 = 28,851.Thus we would expect to see F ≥ 23 by random chance, 1 in every 28,851 times.Alternately, as discussed previously, if all of the actual slopes of the model are zero, R2 follows a Beta Distribution as per Loss Models with a = ν1/2 = (k - 1)/2, b = ν2/2 = (N - k)/2, and θ = 1.In this case, a = (6 - 1)/2 = 2.5, and b = (16 - 6)/2 = 5.Thus, for a model that actually explains nothing about claim frequency, the probability that R2 ≥ .92 is: 1 - β[2.5, 5; .92]. Using a computer, 1 - β[2.5, 5; .92] = 0.0000346612. 1/0.0000346612 = 28,851.Thus we would expect to see R2 ≥ .92 by random chance, 1 in every 28,851 times.Thus one is not shocked that one out of 53,130 regressions tested has R2 as big as .92.


(On the one hand, many regressions were very similar, sharing all but one independent variable. So their matches to the observations were not independent of each other.On the other hand, this statistical result assumed independent variables with absolutely no explanatory value whatsoever, a somewhat extreme assumption for practical applications.)

21.12. D. Test the hypothesis H0: β3 = β4 = 0. The first model is unrestricted (UR). The second model is restricted (R).N = 11. k = 4. q = 3 - 1 = 2 = 4 - 2.F = (ESSR - ESSUR)/q / ESSUR/(N - k) = (27.7281 - 12.8156)/2/(12.8156/7) = 4.07.Comment: At 2 and 7 degrees of freedom, the critical value at 5% is 4.74. Since 4.07 < 4.74 we do not reject H0 at 5%. One could figure out the sample has 11 employees, by adding 1 to the sum of the degrees of freedom; 11 = 1 + 3 + 7 = 1 + 1 + 9.

21.13. D. There are eight independent variables. Thus including the intercept, k = 8 + 1 = 9. q = dimension of the restriction = 8 - 2 = 6.N = 27. Error Sum of Squares for the restricted model = 126,471. Error Sum of Squares for the unrestricted model = 76,893. F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (126,471 - 76,893)/6 /76,893/(27 - 9) = 8263/4271.8 = 1.93 is an F-Statistic with 6 and 18 degrees of freedom.Comment: N - k is the number of degrees of freedom associated with the Error Sum of Squares for the unrestricted model, 18 for the first model. q is the difference in the number of degrees of freedom associated with the regression sum of squares for the unrestricted and restricted models. Note that the numerator of the F-Statistic is also: (RSSUR - RSSR)/q =(115,175 - 65,597)/(8 - 2) = 8263.

21.14. C. F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(ESSI - ESSII)/2} / {ESSII/(30 - 4)} = 13(ESSI - ESSII)/ESSII.
We have TSS = Σ(Y - Y)2 = 160. ESSII = (1 - R2II)TSS = (1 - .7)(160) = 48.
For Model I, ^β2 = Σx2iyi/Σx2i2 ⇒ -2 = Σx2iyi/10 ⇒ Σx2iyi = -20.
For Model I, ESSI = Σ(yi - ^β2x2i)2 = Σyi2 + ^β22 Σx2i2 - 2^β2Σx2iyi = 160 + (4)(10) - (2)(-2)(-20) = 120.
F = 13(ESSI - ESSII)/ESSII = (13)(120 - 48)/48 = 19.5.
Alternately, for Model I, which is a two variable model, we have:
R2I = ^β22 Σx2i2/Σyi2 = (4)(10)/160 = .25. ⇒ ESSI = (1 - R2I)TSS = (1 - .25)(160) = 120.
(See page 73 of Pindyck & Rubinfeld.) Proceed as above.
Alternately, F = {(R2UR - R2R)/q} / {(1 - R2UR)/(N - k)}. Proceed as above.

HCMSA-F06-Reg-M, Solutions to Regression §13-21, 7/12/06, Page 484

Page 498: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

21.15. A. One takes G as the restricted model, and A plus B as the unrestricted model.
There are 30 + 50 = 80 total observations ⇒ N = 80.
There are three restrictions: A1 = B1, A2 = B2, A3 = B3 ⇒ q = 3. There are 6 coefficients being fit in the unrestricted model ⇒ k = 6.
ESSR = ESSG. ESSUR = ESSA + ESSB.
F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(ESSG - ESSA - ESSB)/3} / {(ESSA + ESSB)/74}.
This F-statistic has 3 and 74 degrees of freedom.
Comment: See Section 5.3.3 of Pindyck and Rubinfeld. Note that the F-Statistic = {(R²UR - R²R)/q} / {(1 - R²UR)/(N - k)} = {(R²A + R²B - R²G)/3} / {(1 - R²A - R²B)/74}.
When guessing, some people compare the choices and choose those features that show up most often. In this case, three involve ESS, while only two involve R², so one would choose ESS. Two involve 3,74, two involve 6,77, and only one involves 6,74, so one would choose either 3,74 or 6,77. Thus in this case, we would guess either A or B.

21.16. D. The unrestricted model is Model I with ESS = 484. To obtain the restricted model, substitute β3 = 1 - β2 to yield Model III with ESS = 982. Then, ESSUR = Error Sum of Squares of the Unrestricted model = 484.ESSR = Error Sum of Squares of the Restricted model = 982.q = dimension of the restriction = independent variables for unrestricted model - independent variables restricted model = 3 - 2 = 1. N = number of observations = 20.k = independent variables for the unrestricted model = 3.(ESSR - ESSUR)/q/ESSUR/(N - k) = (982 - 484)/1 /484/(20 - 3) = 17.49 is an F-Statistic with 1 and 17 degrees of freedom.Comment: See Equation 5.20 at page 129 of Econometric Models and Economic Forecasts.

21.17. 17.49 is an F-Statistic with 1 and 17 degrees of freedom. Using the table with ν1 = 1

and ν2 = 17, the critical values are 4.45 and 8.40 for 5% and 1% respectively.

Since 17.49 > 8.40, we reject at 1% the null hypothesis, that β2 + β3 = 1.

Comment: Using a computer, the p-value is 0.06%. So we really, really reject the H0.

21.18. B. The unrestricted model is Model I with ESS = 484. To obtain the restricted model, substitute β3 = β2 to yield Model II with ESS = 925. Then, ESSUR = Error Sum of Squares of the Unrestricted model = 484.ESSR = Error Sum of Squares of the Restricted model = 925.q = dimension of the restriction = independent variables for unrestricted model - independent variables restricted model = 3 - 2 = 1. N = number of observations = 20.k = independent variables for the unrestricted model = 3.(ESSR - ESSUR)/q/ESSUR/(N - k) = (925 - 484)/1 /484/(20 - 3) = 15.49 is an F-Statistic with 1 and 17 degrees of freedom.Comment: See Section 5.3.2 of Econometric Models and Economic Forecasts.


21.19. 15.49 is an F-Statistic with 1 and 17 degrees of freedom. Using the table with ν1 = 1

and ν2 = 17, the critical values are 4.45 and 8.40 for 5% and 1% respectively.

Since 15.49 > 8.40, we reject at 1% the null hypothesis, that β2 = β3.Comment: Using a computer, the p-value is 0.11%.

21.20. D. ESSR = TSS - RSSR = 15,000 - 5,565 = 9,435.

ESSUR = (1 - RUR2)TSS = (1 - 0.38)(15,000) = 9,300.F = (ESSR - ESSUR)/q / ESSUR/(N - k) = (9435 - 9300)/3/9300/(3120 - 6) = 15.07.

Alternately, RR2 = RSSR/TSS = 5565/15000 = 0.371.

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (0.38 - 0.371)/3/(1 - 0.38)/(3120 - 6) = 15.07.Comment: This F-Statistic has 3 and 3114 degrees of freedom.

21.21. A. Restricting α = 0 and β = 1, Ŷt = Xt. ε̂t = Yt - Ŷt = Yt - Xt.
ESSrestricted = Σε̂t² = (254 - 475)² + (463 - 254)² + (515 - 463)² + (567 - 515)² + (605 - 567)² = 99,374.
The restriction is two dimensional, q = 2.
F-Statistic = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(99374 - 69843)/2} / {69843/(5 - 2)} = .634.
The numerator has 2 degrees of freedom and the denominator has 5 - 2 = 3 degrees of freedom. Since .634 < 9.55, we do not reject H0 at the 5% significance level.
Comment: Y = α + βX + ε. If α = 0 and β = 1, then Y = X + ε, and the actuary's forecast method is a good one. The actuary's forecast method is to use the current year as the prediction of the next year.

21.22. A. F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (ESSC - ESSA - ESSB)/k/(ESSA + ESSB)/(NA + NB - 2k), with k and NA + NB - 2k degrees of freedom.ESSC = 10,374. ESSA = 4053. ESSB = 2087. k = 4. NA = 18 and NB = 19.F = (10374 - 4053 - 2087)/4)/(4053 + 2087)/(18 + 19 - 8) = 5.00, with 4 and 29 degrees of freedom. From the table the critical value for 5% for 4 and 29 degrees of freedom is less than 2.74. 5.00 > 2.74 so it is statistically significant at the 5% significance level.Comment: I have assumed that the model includes an intercept, so that k = 4. The question should have made it clear whether the model had an intercept. If there were no intercept in the model then k = 3, which does not produce any of the given choices. The particular situation modeled here would require an intercept in order for the model to make sense.The critical value for 5% for 4 and 29 degrees of freedom is 2.70.From the table the critical value for 1% for 4 and 29 degrees of freedom is less than 4.14. 5.00 > 4.14 so the F statistic is statistically significant at the 1% significance level.The critical value for 1% for 4 and 29 degrees of freedom is 4.04.
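As an illustration of the calculation in 21.22 (a sketch of my own, not from the guide), the pooled-versus-separate comparison can be coded directly; the function name chow_f is my own label:

```python
# Illustrative sketch of the pooled-vs-separate F-test; chow_f is my own label.
from scipy.stats import f

def chow_f(ess_pooled, ess_a, ess_b, k, n_a, n_b):
    # k = number of coefficients (including the intercept) in each separate model
    num = (ess_pooled - ess_a - ess_b) / k
    den = (ess_a + ess_b) / (n_a + n_b - 2 * k)
    return num / den

F = chow_f(10374, 4053, 2087, k=4, n_a=18, n_b=19)
print(round(F, 2), round(f.ppf(0.95, 4, 29), 2))   # 5.00 versus the 5% critical value 2.70
```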


21.23. A. For the unrestricted model, ESS = .48² + .28² + .44² + .20² + .44² = 0.736.

X     Y     Unrestricted Model   Error    Restricted Model   Error
1     2.8        2.32            -0.48         1.986         -0.814
2     2.9        3.18             0.28         2.979          0.079
3     3.6        4.04             0.44         3.972          0.372
4     4.7        4.90             0.20         4.965          0.265
5     6.2        5.76            -0.44         5.958         -0.242

For the restricted model, ESS = .814² + .079² + .372² + .265² + .242² = 0.93601.
q = dimension of the restriction = 1.
N = number of data points = 5. k = number of independent variables including intercept = 2.
F = {(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {(.93601 - .736)/1} / {0.736/(5 - 2)} = 0.815.
This F-Statistic has 1 and 3 degrees of freedom.
The 5% critical value for 1 and 3 degrees of freedom is 10.13.
0.815 < 10.13, and therefore we do not reject H0 at 5%.

Solutions to problems in the remaining sections appear in Study Guides N, etc.


Mahler’s Guide to

Regression

Solutions to Problems
Sections 22-32

VEE-Applied Statistical Methods Exam

prepared by

Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-N

New England Actuarial Seminars
Howard Mahler
POB 315, Sharon, MA, 02067
[email protected]
www.neas-seminars.com


Solutions to Problems, Sections 22-32

22.1. A. Fit a linear regression to the natural log of the claim sizes:
ln(117) = 4.762, ln(132) = 4.883, ln(136) = 4.913, ln(149) = 5.004, ln(151) = 5.017.
X̄ = 3. x = -2, -1, 0, 1, 2. Σxi² = 10.
Ȳ = (4.762 + 4.883 + 4.913 + 5.004 + 5.017)/5 = 4.916.
y = Y - Ȳ = -.154, -.033, -.003, .088, .101.
Σxiyi = (-2)(-.154) + (-1)(-.033) + (0)(-.003) + (1)(.088) + (2)(.101) = 0.631.
β̂ = Σxiyi/Σxi² = .631/10 = .0631. b = exp[β̂] = e^.0631 = 1.065.
Comment: α̂ = Ȳ - β̂X̄ = 4.916 - (.0631)(3) = 4.73. a = exp[α̂] = e^4.73 = 113.
The fitted model is: Y = 113(1.065^t).
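A quick way to reproduce this log-linear fit (an illustrative sketch of my own, assuming NumPy; not part of the original solution):

```python
# Illustrative sketch: regress ln(Y) on t, then undo the log transform.
import numpy as np

t = np.array([1, 2, 3, 4, 5])
claim_sizes = np.array([117, 132, 136, 149, 151])
slope, intercept = np.polyfit(t, np.log(claim_sizes), 1)   # least squares line through (t, ln Y)
b, a = np.exp(slope), np.exp(intercept)
print(round(b, 3), round(a, 0))                            # b is about 1.065, a is about 113
```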

22.2. C. Let t be time (in years) and Y = ln(average written premium at current rate level).t = 45/9 = 5. Y = 63.50844/9 = 7.05649.Σt2/N = 285/9 = 31.667. ΣtY/N = 317.75352/9 = 35.30595.variance of t = Σt2/N - t 2 = 31.667 - 52 = 6.667.sample covariance of t and Y = ΣtY/N - t Y = 35.30595 - (5)(7.05649) = .0235.slope of the regression line = .0235/6.667 = .00352.

Year LN ofavg. W.P. 12 month avg

@CRL W.P. @ CRL t^2 t ln(avg w.p)

1 1156.83 7.05344 1 7.053442 1152.34 7.04955 4 14.099103 1153.64 7.05068 9 21.152034 1150.22 7.04771 1 6 28.190835 1144.52 7.04274 2 5 35.213706 1150.11 7.04761 3 6 42.285687 1164.21 7.05980 4 9 49.418598 1178.57 7.07206 6 4 56.576469 1193.75 7.08485 8 1 63.76369

Sum 4 5 10444.19 63.50844 2 8 5 317.75352

intercept of the regression line is: Y - ^β t = 7.05649 - (.00352)(5) = 7.03889.

Therefore, the exponential regression is: average written premium at current rate level = (e7.03889)e.00352t = 1140.1(1.00353t).For year 2002 (t = 12), the estimate is: 1140.1(1.00353t) = 1189.3.


22.3. Using the matrix formulas for multiple regression:
The first three rows, out of fifteen rows in total, of the design matrix X are:
(1  29  12  348)
(1  21   8  168)
(1  62  10  620)

        (  15       527      131      4549 )
X'X =   ( 527     23651     4549    195459 )
        ( 131      4549     1219     42241 )
        (4549    195459    42241   1759219 )

            ( 5.6118      -0.113031     -0.676391     0.0142883   )
(X'X)^-1 =  (-0.113031     0.00282223    0.0140079   -0.000357637 )
            (-0.676391     0.0140079     0.0866804   -0.00188865  )
            ( 0.0142883   -0.000357637  -0.00188865    0.0000487058)

X'Y = (38657, 1413683, 355153, 12885970)'.
β̂ = (X'X)^-1 X'Y = (1041.89, -13.2376, 103.306, 3.62096)'.

ESS = Y'Y - β̂'X'Y = 108,242,671 - 104,911,589 = 3,331,081.
s² = ESS/(N - k) = 3,331,081/(15 - 4) = 302,826.
Var[β̂] = s²(X'X)^-1. Var[β̂4] = (302,826)(0.0000487058) = 14.749.
s_β̂4 = √14.749 = 3.84. t = β̂4/s_β̂4 = 3.621/3.84 = .943.
For 15 - 4 = 11 degrees of freedom, the critical value for 10% is 1.796.
Since .943 < 1.796, we do not reject at 10% the hypothesis that β4 = 0.

Comment: The null hypothesis is that β4 = 0. The alternate hypothesis is that β4 ≠ 0, in other words that X2 and X3 interact.
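The matrix formulas used above are easy to script. The sketch below keeps the three design-matrix rows quoted in the solution but fills in made-up rows and responses, since the full data set is not reproduced here; it illustrates only the mechanics, not the actual numbers of 22.3:

```python
# Illustrative sketch of the matrix formulas; only the first three rows of X are from the
# solution, the last two rows and the Y vector are made up, so the output will NOT match 22.3.
import numpy as np

X = np.array([[1, 29, 12, 348],
              [1, 21,  8, 168],
              [1, 62, 10, 620],
              [1, 40,  9, 360],     # hypothetical row
              [1, 55, 11, 605]],    # hypothetical row
             dtype=float)
Y = np.array([900., 650., 1500., 1000., 1300.])   # hypothetical responses

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ (X.T @ Y)                 # beta-hat = (X'X)^-1 X'Y
ess = Y @ Y - beta @ (X.T @ Y)             # ESS = Y'Y - beta-hat' X'Y
s2 = ess / (len(Y) - X.shape[1])           # s^2 = ESS/(N - k)
se = np.sqrt(s2 * np.diag(XtX_inv))        # standard errors from s^2 (X'X)^-1
print(beta, beta / se)                     # coefficients and their t-statistics
```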

22.4. A., 22.5. C., 22.6. C. Using the matrix formulas for multiple regression:
The first two rows, out of ten rows in total, of the design matrix X are:
(1  11.7  11.7²)
(1  25.3  25.3²)

        (  10       909.1            164015         )
X'X =   ( 909.1    164015            3.71471 x 10^7 )
        (164015    3.71471 x 10^7    9.23334 x 10^9 )

            ( 0.354364       -0.0060632        0.0000180984    )
(X'X)^-1 =  (-0.0060632       0.00017239      -5.85846 x 10^-7 )
            ( 0.0000180984   -5.85846 x 10^-7   2.14376 x 10^-9 )

X'Y = (89.4, 5922.02, 991190)'.
β̂ = (X'X)^-1 X'Y = (13.71, -.1018, .0002735)'.
β̂1 = 13.71, β̂2 = -.1018, β̂3 = .0002735.


22.7. E. ESS = Y'Y - β̂'X'Y = 927.6 - 893.895 = 33.705.
s² = ESS/(N - k) = 33.705/(10 - 3) = 4.815.
Alternately, Ŷ = (12.56, 11.31, 6.75, 4.43, 12.70, 12.01, 10.68, 7.24, 4.78, 6.94).
ε̂ = Y - Ŷ = (2.74, -2.01, -0.25, 1.57, 3.00, -2.01, -2.08, -0.84, 0.82, -0.94).
ESS = Σε̂i² = 33.703. s² = ESS/(N - k) = 33.703/(10 - 3) = 4.815.

22.8. C. Var[β̂] = s²(X'X)^-1. Var[β̂3] = (4.815)(2.14376 x 10^-9) = 1.0322 x 10^-8.
s_β̂3 = √(1.0322 x 10^-8) = .0001016. t = β̂3/s_β̂3 = .0002735/.0001016 = 2.692.
For 10 - 3 = 7 degrees of freedom, the critical values for 5% and 2% are 2.365 and 2.998.
Since 2.365 < 2.692 < 2.998, reject at 5% and do not reject at 2%.

22.9. For a given age of antique, X2, an increase of 2 in the number of bidders increases the auction price Y by: 2β̂3 + 2β̂4X2 = -180 + (2.6)(age).
So if for example the age is 100, then the expected increase in price is 80. If instead the age is 200, then the expected increase in price is 340.
Comment: A general feature of the interactive model is that the effect of a change in one independent variable depends on the level of the other independent variable.

22.10. E. lnY = ln(a) + b ln (X). Therefore, fit a least squares line to lnX and lnY, with intercept ln(a) and slope b.V = ln X = -0.941609, -0.328504, 0, 0.41871, 1.64866, 2.25549.V = 0.508792. v = V - V = -1.4504, -0.837296, -0.508792, -0.0900813, 1.13987, 1.7467.W = ln Y = 4.47734, 5.4161, 5.8999, 6.53233, 8.37402, 9.2835.W = 6.66386. w = W - W = -2.18653, -1.24776, -0.763966, -0.131529, 1.71015, 2.61963.Σ vi wi = 11.14. Σ vi2 = 7.42. b = Σ vi wi /Σ vi2 = 11.14/7.42 = 1.50. ln a = W - b V = 5.90.

For X = 19.2, ln(Y) = 5.90 + (1.5)ln(19.2) = 10.33. Y = e10.33 = 30,638.Comment: The data is for Mercury, Venus, Earth, Mars, Jupiter, and Saturn, the 6 planets known at the time Kepler published his third law of motion, which states that b = 3/2.We used the fitted curve to estimate the year of Uranus, which is actually 30,685 days.(The difference is due to rounding.)


22.11. D. Let T = 1, 2, 3,..., 12. T = 6.5. t = T - T = -5.5, - 4.5, ... , 5.5Taking the logarithms of the consumer price indices:W = lnY = 4.80402, 4.81947, 4.82751, 4.84419, 4.87901, 4.88734, 4.90749, 4.92435, 4.9395,

4.95018, 4.96284, 4.99721. W = 4.89526. w = W - W = -0.0912387, -0.075785, -0.0677464, -0.0510727, -0.0162529, -0.00792269,

0.0122348, 0.0290912, 0.0442375, 0.0549176, 0.0675849, 0.101953.

Σwi ti = 2.45341. Σ ti2 = 143. ^β = Σwi ti/Σ ti2 = .01716.

α = W - ^β T = 4.89526 - (.01716)(6.5) = 4.784.

Second Quarter 2005 ⇔ T = 12. Third Quarter of 2006 ⇔ T = 17.Fitted consumer price index is: exp[4.784 + (.01716)(17)] = e5.076 = 160.1.

22.12. E. Let Z = √X. Yi = α + β√Xi ⇔ Yi = α + βZi.Z = 1, 1.732, 2, 2, 2.646.Z = 1.876. zi = -.876, -.144, .124, .124, .770.

^β = Σziyi / Σzi2 = ΣziYi / Σzi2 = 5.468/1.412 = 3.87


22.13. A.
    (1  -1   1)         (3)
X = (1   1   1)     Y = (4)
    (1   3   9)         (7)
    (1   5  25)         (6)

        ( 4    8   36)
X'X =   ( 8   36  152)
        (36  152  708)

            ( 2384   -192   -80)
(X'X)^-1 =  ( -192   1536  -320) /5120
            (  -80   -320    80)

X'Y = (20, 52, 220)'.
β̂ = (X'X)^-1 X'Y = (20096, 5632, -640)'/5120 = (3.925, 1.1, -.125)'.
The fitted quadratic polynomial is: y = 3.925 + 1.1x - .125x².
The predicted value of Y when X = 6 is: 3.925 + (1.1)(6) - (.125)(6²) = 6.025.
Alternately, the squared error is: Σ{Yi - (β1 + β2Xi + β3Xi²)}².
Setting the partial derivatives with respect to β1, β2, and β3 equal to zero, we get the Normal Equations, three equations in three unknowns:
Nβ1 + ΣXi β2 + ΣXi² β3 = ΣYi.
ΣXi β1 + ΣXi² β2 + ΣXi³ β3 = ΣYiXi.
ΣXi² β1 + ΣXi³ β2 + ΣXi⁴ β3 = ΣYiXi².
4β1 + 8β2 + 36β3 = 20. ⇒ β1 + 2β2 + 9β3 = 5.
8β1 + 36β2 + 152β3 = 52. ⇒ 2β1 + 9β2 + 38β3 = 13.
36β1 + 152β2 + 708β3 = 220. ⇒ 9β1 + 38β2 + 177β3 = 55.
The first two equations imply: 5β2 + 20β3 = 3.
The first and third equations imply: 20β2 + 96β3 = 10. ⇒ 10β2 + 48β3 = 5.
Therefore, 8β3 = -1. ⇒ β3 = -1/8. ⇒ β2 = 1.1. ⇒ β1 = 3.925.
The fitted quadratic polynomial is: y = 3.925 + 1.1x - .125x².
The predicted value of Y when X = 6 is: 3.925 + (1.1)(6) - (.125)(6²) = 6.025.
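The same answer can be reproduced in a few lines (an illustrative sketch of my own; np.polyfit(x, y, 2) would return the same coefficients):

```python
# Illustrative sketch: solve the Normal Equations in matrix form.
import numpy as np

x = np.array([-1., 1., 3., 5.])
y = np.array([3., 4., 7., 6.])
X = np.column_stack([np.ones_like(x), x, x**2])
beta = np.linalg.solve(X.T @ X, X.T @ y)   # the Normal Equations
print(beta)                                # [3.925, 1.1, -0.125]
print(beta @ [1., 6., 36.])                # prediction at X = 6: 6.025
```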


22.14. E. The restriction of going from the quadratic to the linear model is one dimensional.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (24005.9 - 1273.7)/1/(1273.7/(15 - 3) = 214.2. Perform a one sided F-Test at 1 and 12 d.f. 4.75 is the critical value at 5%. Since 214.2 > 4.75 we reject the simpler linear model at the 5% level.The restriction of going from the third degree to the quadratic model is one dimensional.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (1273.7 - 582.3)/1/(582.3/(15 - 4) = 13.06. Perform a one sided F-Test at 1 and 11 d.f. 4.84 is the critical value at 5%. Since 13.06 > 4.84 we reject the simpler quadratic model at the 5% level.The restriction of going from the fourth degree to the third degree model is one dimensional.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (582.3 - 433.7)/1/(433.7/(15 - 5) = 3.43. Perform a one sided F-Test at 1 and 10 d.f. 4.96 is the critical value at 5%. Since 3.43 < 4.96 we do not reject the simpler third degree model at the 5% level.Now compare the third degree model to the fifth degree model.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (582.3 - 282.9)/2/(282.9/(15 - 6) = 4.76. Perform a one sided F-Test at 2 and 9 d.f. 4.26 is the critical value at 5%. Since 4.76 > 4.26 we reject the simpler third degree model in favor of the fifth degree model at the 5% level. Use the fifth degree model.Comment: Based on Table 15.1 and Figures 15.3 to 15.6 in Loss Models.The ESS for the sixth degree polynomial is 278.2.The ESS for the seventh degree polynomial is 271.2.Neither is a significant improvement over the fifth degree polynomial.The restriction of going from the fifth degree to the sixth degree model is one dimensional.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (282.9 - 278.2)/1/(278.2/(15 - 7) = .135. Perform a one sided F-Test at 1 and 8 d.f. 5.32 is the critical value at 5%. Since .135 < 5.32 we do not reject the simpler fifth degree model at the 5% level.The restriction of going from the fifth degree to the seventh degree model is two dimensional.F = (ESSR - ESSUR)/q/ESSUR/(N - k) = (282.9 - 271.2)/2/(271.2/(15 - 8) = .151. Perform a one sided F-Test at 2 and 7 d.f. 4.74 is the critical value at 5%. Since .151 < 4.74 we do not reject the simpler fifth degree model at the 5% level.


22.15. The mean mortality is 71.9067.
TSS = Σ(Yi - Ȳ)² = (3.89 - 71.9067)² + ... + (271.60 - 71.9067)² = 114,035.
R² = 1 - ESS/TSS = 1 - ESS/114035.
The sample variance of the mortalities = 114035/(15 - 1) = 8145.
R̄² ≡ 1 - (sample variance of residuals)/(sample variance of Y) = 1 - {ESS/(N - k)}/(sample variance of Y) = 1 - {ESS/(14 - order)}/8145.

Order      ESS        R²       Corrected R̄²
  1      24005.9    0.78949      0.77328
  2       1273.7    0.98883      0.98697
  3        582.3    0.99489      0.99350
  4        433.7    0.99620      0.99468
  5        282.9    0.99752      0.99614
  6        278.2    0.99756      0.99573
  7        271.2    0.99762      0.99524

The values of R² increase as the order of the equation increases.
In general, R² increases as we add more variables to the model.
The fifth degree polynomial has the best R̄².
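To reproduce the R² and corrected R̄² columns of this table from the ESS values (an illustrative sketch of my own, not part of the original solution):

```python
# Illustrative sketch: recompute the table from the ESS values, TSS = 114,035, N = 15.
tss, n = 114035, 15
ess_by_order = {1: 24005.9, 2: 1273.7, 3: 582.3, 4: 433.7, 5: 282.9, 6: 278.2, 7: 271.2}
for order, ess in ess_by_order.items():
    k = order + 1                                    # coefficients, including the intercept
    r2 = 1 - ess / tss
    r2_bar = 1 - (ess / (n - k)) / (tss / (n - 1))   # corrected R^2
    print(order, round(r2, 5), round(r2_bar, 5))     # corrected R^2 peaks at order 5
```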

22.16. C. Fit a regression with no intercept, taking lnXi as the independent variable.

ln(X1) = ln(e) = 1. ln(X2) = ln(e2) = 2.

b = Σln(Xi)Yi /Σln(Xi)2 = (Y1 + 2Y2)/(1 + 22) = (Y1 + 2Y2)/5.

22.17. D. S(x) = exp[-m(c^x - 1)]. f(x) = -S'(x) = m ln(c) c^x exp[-m(c^x - 1)].
µx = f(x)/S(x) = m ln(c) c^x. ln(µx) = ln(c) x + ln(m) + ln(ln(c)).
Thus under Gompertz law, ln µx is linear in x.
Let Y = ln(µx). Fitting a linear regression:
β̂ = {NΣXiYi - ΣXiΣYi}/{NΣXi² - (ΣXi)²} = {(5)(-1199) - (200)(-30)}/{(5)(8010) - 200²} = 5/50 = 0.1.
α̂ = Ȳ - β̂X̄ = (-30/5) - (.1)(200/5) = -10.
Fitted value of ln(µx) at X = 41 is: -10 + (0.1)(41) = -5.90.
Comment: The fitted µ41 = e^-5.9 = .00274.


22.18. C. To minimize the SS, set the partial derivative with respect to λ1 equal to zero:
0 = -2 Σr=0..3 Σx=21..22 (u[x]+r - λ1 - λ2r - λ3x) = -2 {ΣΣ u[x]+r - λ1 ΣΣ 1 - λ2 ΣΣ r - λ3 ΣΣ x}. ⇒
Σr=0..3 Σx=21..22 u[x]+r = λ1(4)(2) + λ2(2)(0 + 1 + 2 + 3) + λ3(4)(21 + 22) = 8λ1 + 12λ2 + 172λ3.
⇒ f = 8, g = 12, and h = 172. ⇒ f + g + h = 192.
Comment: The other two normal equations result from setting the partial derivatives of SS with respect to λ2 and λ3 equal to zero:
0 = -2 Σr=0..3 Σx=21..22 r (u[x]+r - λ1 - λ2r - λ3x).
0 = -2 Σr=0..3 Σx=21..22 x (u[x]+r - λ1 - λ2r - λ3x).

22.19. B. One can take exp[xi] as the independent variable, and just use the equation for the slope of the least squares line with no intercept:
θ̂ = Σi=1..n yi exp[xi] / Σi=1..n exp[xi]² = Σi=1..n yi exp[xi] / Σi=1..n exp[2xi].

22.20 C. ln(µx) = ln(B) + x ln(c). Let Y = ln(µx), and fit a linear regression.

X = 2. x = (-1, 0, 1). Y = -3.07667. y = (-.02333, .00667, .01667).

ln(c) = ^β = Σ xiyi/ Σ xi2 = .04/2 = .02.

ln(B) = α = Y - ^βX = -3.07667 - (.02)(2) = -3.11667.

22.21. C. A + B0 + C02 = v[0] = 1.4. ⇒ A = 1.4.

A + B1 + C12 = v[1]+1 = 1.8. ⇒ A + B + C = 1.8.

A + B0 + C22 = v[0]+2 = 2.0. ⇒ A + 4C = 2.0.⇒ C = .15. ⇒ B = .25.Now the sum of squared errors is: ΣΣ(v[x]+t - u[x]+t)2 = ΣΣ(A + Bx + Ct2 - u[x]+t)2.Setting the partial derivative with respect to A equal to zero:0 = ΣΣ(A + Bx + Ct2 - u[x]+t) = ΣΣ(v[x]+t - u[x]+t). ⇒ ΣΣv[x]+t = ΣΣu[x]+t.ΣΣv[x]+t = 9A + 3(0 + 1 + 2)B + 3(0 + 1 + 4)C = 17.1.⇒ 17.1 = ΣΣu[x]+t = u[0] + 1.5 + 1.8 + 1.5 + 1.8 + 2.3 + 1.8 + 2.3 + 2.4.⇒ u[0] = 17.1 - 15.6 = 1.5.


22.22. C. S(x) = exp[-m(cx - 1)]. f(x) = -S'(x) = m ln(c) cx exp[-m(cx - 1)]. µx = f(x) / S(x) = m ln(c) cx. ln(µx) = ln(c) x + ln(m) + ln(ln(c)).

Thus under Gompertz law, lnµx is linear in x.

Let Y = ln(µx). Fitting a linear regression:

^β = NΣXiYi - ΣXiΣYi / NΣXi2 - (ΣXi)2 = (5)(-122.11) - (15)(-41)/(5)(55) - 152 = 4.45/50 = 0.089.

α = Y - ^β X = (-41/5) - (.089)(3) = -8.467.

Fitted value of ln(µx) at X = 4 is: -8.467 + (0.089)(4) = -8.111.

Comment: The fitted µ4 = e-8.111 = .000300.

22.23. B. θ is the slope of a least squares line through the origin, treating ln(x) as the independent variable:
θ̂ = Σi=1..n yi ln(xi) / Σi=1..n ln(xi)².

22.24. B.
i     xi     yi     θ(xi + xi²)
1      1      4         2θ
2      2      8         6θ
3      3     14        12θ

The squared error is: (4 - 2θ)² + (8 - 6θ)² + (14 - 12θ)².
Set the derivative with respect to θ equal to zero:
0 = -2{2(4 - 2θ) + 6(8 - 6θ) + 12(14 - 12θ)}. ⇒ θ = 28/23.

22.25. B. ln µx = ln k + b ln(x) = c + b ln(x).

Sum of Squares = (c + b ln(5) + 3.9)2 + (c + b ln(10) + 3.3)2 + (c + b ln(15) + 2.8)2.Set the partial derivative with respect to c equal to zero:0 = 2(c + b ln(5) + 3.9) + (c + b ln(10) + 3.3) + (c + b ln(15) + 2.8). ⇒ 3c + 6.620b + 10 = 0.Set the partial derivative with respect to b equal to zero:0 = 2ln(5)(c + b ln(5) + 3.9) + ln(10)(c + b ln(10) + 3.3) + ln(15)(c + b ln(15) + 2.8).⇒ 6.620c + 15.226b + 21.458 = 0.⇒ b = (3)(21.458) - (6.620)(10)/6.6202 - (3)(15.226) = (-1.826)/(-1.8536) = .985.⇒ c = -5.507.The estimate of ln µ15 is: -5.507 + ln(15)(.985) = -2.84.

Comment: For a Weibull Distribution as per Loss Models, F(x) = 1 - exp(-(x/θ)τ),f(x) = τ(x/θ)τ exp(-(x/θ)τ) /x, and the force of mortality is: τxτ−1/θτ.


22.26. C. To minimize the SS, set the partial derivative with respect to λ1 equal to zero: 3 12

0 = -2 Σ Σ (u[x]+r - λ1 - λ2r - λ3x) = -2 Σ Σu[x]+r - λ1Σ Σ 1 - λ2 Σ Σ r - λ3Σ Σx . ⇒ r =0 x =11 r x r x r x r x 3 12

Σ Σ u[x]+r = λ1(4)(2) + λ2(2)(0 + 1 + 2 + 3) + λ3(4)(11 + 12) = 8λ1 + 12λ2 + 92λ3. ⇒

r =0 x =11 f = 8, g = 12 , and h = 92. ⇒ f + g + h = 112.

22.27. A. Let Zi = Xi², then the model is Yi = βZi + εi. The least squares fit to this model with no intercept is: β̂ = ΣYiZi/ΣZi² = ΣYiXi²/ΣXi⁴.
Alternately, one could minimize: Σ(Yi - βXi²)².
Comment: There are two other Normal Equations, which are gotten by setting the partial derivatives with respect to λ2 and λ3 equal to zero.

22.28. C. Let Z = X2. Yi = α + βXi2 + εi ⇔ Yi = α + βZi + εi.

Z = 9/5. zi = -9/5, -9/5, -4/5, 11/5, 11/5. Y = 10. yi = -8, -6, -2, 6, 10.

Σziyi = 310/5. Σzi2 = 420/25. ^β = Σziyi / Σzi2 = (310/5)/(420/25) = 155/42 = 3.69.

Alternately, the sum of squared errors is: Σ(Yi - α - βXi2)2.

Setting the partial derivative with respect to α equal to zero: 0 = 2Σ(Yi - α - βXi2).

Therefore, ΣYi = Nα + βΣXi2. ⇒ 50 = 5α + 9β.

Setting the partial derivative with respect to β equal to zero: 0 = 2ΣXi2(Yi - α - βXi2).

Therefore, ΣYiXi2 = αΣXi2 + βΣXi4. ⇒ 152 = 9α + 33β.

Solving the two equations: β= 310/84 = 3.69, and α = 282/84.Comment: Unless stated otherwise, we usually transform the variables to achieve a linear form of the equation. In this situation, minimizing the squared errors in the original equation gives the same result as applying a change variables, since the original model is linear in the coefficients.In the case of transforming an exponential relationship into a linear relationship by taking logs of both sides, the result is not the same as minimizing the squared errors in the original equation. In that situation, minimizing the squared errors in the original equation is an example of nonlinear estimation, covered in subsequent section.


22.29. D. Y = α eβX. lnY = lnα + βX.Fit a linear regression between year and the natural log of the claim sizes.In deviations form, x = X - X = -2, -1, 0, 1, 2.^β = Σxi ln Yi/Σ xi2 = (-2)(ln1020) + (-1)(ln1120) + (0)(ln1130) + (1)(ln1210) + (2)(ln1280) / 10

= .05314. lnα = Fitted intercept = average of lnY - ^βX =

(ln1020 + ln1120 + ln1130 + ln1210 + ln1280) / 5 - (.05314)(3) = 7.0463 - .15942 = 6.8869.α = e6.8869 = 979.36. Predicted claim cost for year 6 is: 979.36 exp[(6)(.05314)] = $1347.14.Comment: One can use your electronic calculator to fit a linear regression between year and the natural log of the claim sizes.

23.1. D. For model A, lnY = D lnα1 + X lnβ1 + lnε.
The term D lnα1 incorporates the one-time effect via the use of the dummy variable.
The term X lnβ1 incorporates constant inflation over the entire period.
(Y = α1 β1^X is the basic form of constant inflation.)
If ε is lognormal, then lnε is Normal.
However, unlike model B, Model A does not have an intercept. (Thus under model A, the average claim cost in year 0 is automatically 1.)
For model B, lnY = lnα1 + D lnα2 + X lnβ1 + lnε. This incorporates a one-time effect, but does not incorporate a change in the rate of inflation on 1/1/95.
For model C, lnY = lnα1 + X lnβ1 + XD lnβ2 + lnε. Model C would be used if we had assumed the rate of inflation changed on 1/1/95, without a one-time effect.
For model D, lnY = lnα1 + D lnα2 + X lnβ1 + XD lnβ2 + lnε. Model D is used, since we assumed the rate of inflation changed on 1/1/95 in addition to a one-time immediate change in claim costs.
Comment: For example, if the fitted Model D were Y = 6702(.88^D)(1.057^X)(.981^XD), (assuming the regression statistics were significant), that would indicate a claim cost at time = 0 (1990) of 6702, a one-time reduction of 12% on 1/1/95, an annual inflation rate of 5.7% prior to 1/1/95, and an annual inflation rate of: (1.057)(.981) - 1 = 3.7% after 1/1/95.

23.2. B. For a 7 year old SUV with a cell phone the expected cost is:100 - (2)(7) + (10)(1) + (4)(1) - (7)(1) + (3)(1)(1) = 96.For a 2 year old car that is not an SUV and that has no cell phone the expected cost is:100 - (2)(2) + (10)(0) + (4)(0) - (2)(0) + (3)(0)(0) = 96.The difference in expected costs is zero.Comment: Similar to Course 120 Sample Exam #2, Q.9.For a new car that is not an SUV and has no cell phone, the expected cost is 100.This is not intended as a realistic model of insurance costs.


23.3. A. For not SUV and no cell phone the expected cost is: 100 - 2 age. Averaged over these vehicles: 100 - (2)(6.7) = 86.6. For not SUV and with cell phone the expected cost is: 100 - 2 age + 4 = 104 - 2 age. Averaged over these vehicles: 104 - (2)(6.1) = 91.8. For SUV and no cell phone the expected cost is: 100 - 2 age + 10 - age = 110 - 3 age. Averaged over these vehicles: 110 - (3)(4.9) = 95.3. For SUV and with cell phone the expected cost is: 100 - 2 age + 10 + 4 - age + 3 = 117 - 3 age. Averaged over these vehicles: 117 - (3)(4.4) = 103.8. Overall average = (3000)(86.6)+ (4000)(91.8)+ (1000)(95.3)+ (2000)(103.8)/10000 = 93.0. Alternately, E[Xi] = average age = 5.82. E[D1i] = portion SUV = .3. E[D2i] = portion with cell phones = .6. E[XiD1i] = average age of SUVs = 4.567.E[D1i D2i] = portion that are SUVs with cell phones = .2.Overall average = 100 - (2)(5.82) + (10)(.3) + (4)(.6) - (4.567)(.3) + (3)(.2) = 93.0.

23.4. Comparing the model with no X4 and X4X5 terms to that with both:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .7695)/2 /(1 - .9401)/(100 - 8) = 131.0.This has 2 and 92 degrees of freedom.One can also usefully, compare the model with no X4 and X4X5 terms to that with no X4X5 :

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9331 - .7695)/1 /(1 - .9331)/(100 - 7) = 227.This has 1 and 93 degrees of freedom.

23.5. Comparing the model with no X3 term to that with X3:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .8685)/1 /(1 - .9401)/(100 - 8) = 110.0.This has 1 and 92 degrees of freedom.Comment: The t-statistic is √110.0 = 10.49.

23.6. Comparing the model with no X2 and X22 terms to that with both:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .3605)/2 /(1 - .9401)/(100 - 8) = 445.This has 2 and 92 degrees of freedom.One can also usefully, compare the model with no X2 and X22 terms to that with no X22:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9264 - .3605)/1 /(1 - .9264)/(100 - 7) = 715.This has 1 and 93 degrees of freedom.

23.7. Comparing the model with no X6 term to that with X6:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .9212)/1 /(1 - .9401)/(100 - 8) = 29.0.This has 1 and 92 degrees of freedom.Comment: The t-statistic is √29.0 = 5.39.


23.8. Comparing the model with no X5 and X4X5 terms to that with both:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .8234)/2 /(1 - .9401)/(100 - 8) = 89.6.This has 2 and 92 degrees of freedom.One can also usefully, compare the model with no X5 and X4X5 terms to that with no X4X5 :

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9331 - .8234)/1 /(1 - .9331)/(100 - 7) = 152.5.This has 1 and 93 degrees of freedom.

23.9. Comparing the model with no X4X5 term to that with X4X5:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .9331)/1 /(1 - .9401)/(100 - 8) = 10.75.This has 1 and 92 degrees of freedom.Comment: The t-statistic is √10.75 = 3.28.

23.10. Comparing the model with no X22 term to that with X22:

F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.9401 - .9264)/1 /(1 - .9401)/(100 - 8) = 21.04.This has 1 and 92 degrees of freedom.Comment: The t-statistic is -√21.04 = -4.59. (The fitted coefficient is negative, so t < 0.)

23.11. The model should include many independent variables, in addition to whether or not the school has soda machines. For example dummy variables might include: whether or not the school has candy machines, whether or not the school has any machines besides soda and candy machines for example those that allow students to buy juice or healthy snacks like apples, the average age of the students, the percentage of students that are female, the percentage of students in various ethnic groups, etc. One would probably want one or more variables measuring the socioeconomic status of the students, for example how many students qualify for the federal school lunch program.Some measure of the academic performance of the school might be a useful variable.The important point is that there are many other relevant variables than whether there are soda machines in a school. Unless one accounts for many of them, the model will be incomplete and the results of any test could be spurious.Once one has a relatively good model, one could perform a one-sided t-test on the coefficient of the dummy variable for soda machines, and see whether it is significantly different from zero and positive. Comment: In a practical application one would talk to people who know something about the subject and/or read the literature, in order to obtain ideas for potentially useful variables.


23.12. 1. Y = α + βX + ε.Assumes that gender has no effect.2. Y = α + βX + γD + ε.Assumes the same slope by gender, but the intercept for males is α and for females is α + γ.3. Y = α + βX + γDX + ε.Assumes the same intercept by gender, but the slope for males is β and for females is β + γ.4. Y = α + βX + γD + δDX + ε.Assumes the intercept for males is α and for females is α + γ, while the slope for males is β and for females is β + δ. However, unlike the next case, we do assume the same variance of the error terms for the two genders.5. Y = α1 + β1X + ε1 for males, and Y = α2 + β2X + ε2 for females.We assume two totally separate models, one for each gender.The error terms for the two genders are assumed to have different variances.We estimate two separate regressions.Comment: There are possibly other models that one could list, but these are the five listed at page 124-125 of Pindyck and Rubinfeld. The exact notation used for the coefficients is not important.

23.13. D. The F-Statistic to test whether β2 = β3 = β4 = 0 is:
{(ESSR - ESSUR)/q} / {ESSUR/(N - k)} = {RSS/(k - 1)} / {ESS/(N - k)} = {(1155820 - 1143071)/3} / {1143071/(1000 - 4)} = 3.70, with 3 and 996 degrees of freedom.

Source        Sum of Squares    Degrees of Freedom    Mean Sum of Squares    F-Ratio
Regression          12749                3                  4249.67            3.70
Error             1143071              996                  1147.66
Total             1155820              999

Comment: It turns out that for 3 and 996 degrees of freedom the critical values for 5% and 1% are 2.61 and 3.80. (These are not shown in the table attached to the exam, since ν2 is too large.) Therefore, since 2.61 < 3.70 < 3.80, season is significant at the 5% level, but not at the 1% level.

23.14. E. For 7 years experience D1i = 1. For Chicago D2i = 0 and D3i = 1.

^Y = 12 + (3)(7) + (2)(7 - 5)(1) - (2)(7 - 2)(0) + (7 - 5)(1) = 12 + 21 + 4 + 0 + 2 = 39.

23.15. B. For fires at least 4 kilometers from the fire station X2i = 1.For city A, the average fire damage is: 8 + 5X1i + 2(X1i - 4) + 9 - 2X1i = 9 + 5X1i.For city B, the average fire damage is: 8 + 5X1i + 2(X1i - 4) = 7X1i.Setting the damages equal: 9 + 5X1i = 7X1i ⇒ X1i = 4.5.


23.16. C. In order to test whether class section is a significant variable, we run a regression on the restricted model with the coefficients of the (dummy) variables that determine class section set equal to zero: β4 = β5 = 0.ESSUR = Error Sum of Squares of the Unrestricted model = (1 - R2UR)TSS = .06TSS.

ESSR = Error Sum of Squares of the Restricted model = (1 - R2R)TSS = .085TSS.q = dimension of the restriction = independent variables for unrestricted model - independent variables restricted model = 5 - 3 = 2.N = number of observations = 42.k = independent variables for the unrestricted model = 5.(ESSR - ESSUR)/q/ESSUR/(N - k) = (.085TSS - .06TSS)/2 /.06TSS/(42 - 5) = 7.71 is an F-Statistic with 2 and 37 degrees of freedom.Alternately, F = (R2UR - R2R)/q/(1 - R2UR)/(N - k) = (.940 - .915)/2 /(1 - .940)/(42 - 5) = 7.71.Comment: See formulas 5.20 and 5.21 in Pindyck and Rubinfeld.

23.17. B. For model A, lnY = D lnα1 + X lnβ1 + lnε.
The term D lnα1 incorporates the one-time effect via the use of the dummy variable.
The term X lnβ1 incorporates constant inflation over the entire period.
(Y = α1 β1^X is the basic form of constant inflation.)
If ε is lognormal, then lnε is Normal.
However, unlike model B, Model A does not have an intercept. (Thus under model A, the average claim cost in year 0 is automatically 1.)
For model B, lnY = lnα1 + D lnα2 + X lnβ1 + lnε. This is what we want.
For model C, lnY = lnα1 + X lnβ1 + XD lnβ2 + lnε. Model C would be used if we had assumed the rate of inflation changed on 1/1/97.
For model D, lnY = lnα1 + D lnα2 + X lnβ1 + XD lnβ2 + lnε. Model D would be used if we had assumed the rate of inflation changed on 1/1/97 in addition to a one-time immediate change in claim costs.
For model E, Y = α1 α2^D X^β1 ε. Model E would be used if we had assumed Y increases as an unknown power of X, rather than via a constant rate of inflation.
Comment: For years 1997 and later, Y is multiplied by α2.
If the claims manager's assertion were correct, then α2 ≅ .8. For example, if the fitted Model B were Y = 2345(.83^D)(1.043^X), (assuming the regression statistics were significant), that would indicate a claim cost at time = 0 (presumably 1990) of 2345, a one-time reduction of 17% on 1/1/97, and an annual inflation rate of 4.3%.


23.18. E. If the woman has post-secondary education, then E = -1 and F = -1. If the woman has more than 2 children, then G = -1 and H = -1.Thus the average of ln(wages) over women with post-secondary education and more than 2 children is: a - b1 - b2 - c1 - c2. The average ln(wages) over all women is: a.Therefore, the differential for women with post-secondary education and more than 2 children is: -b1 - b2 - c1 - c2.Comment: See Appendix 5.1 of Pindyck and Rubinfeld. If the woman has not completed high school, then E = 1 and F = 0. If the woman has completed high school (with no post-secondary education), then E = 0 and F = 1. If the woman has post-secondary education, then E = -1 and F = -1. So the combination of the values of E and F describes the amount of education. Another way to accomplish the same goal would be to have two dummy variables with 0 or 1; the first variable for whether high school was completed or not and the second variable for whether there was (some) post-secondary education or not. Which specification is preferable depends on the purpose of the model. Note that I have followed the textbook and taken the overall average as “a”. The overall average depends on the number of women in the study from each category. If there are equal numbers in each category, then the overall average is “a”. If instead we assume in the study for example, 100 women who have not completed high school, 300 people who have completed high school but with no secondary education, and 200 women with post secondary education, then the average is not “a”. In this case, the expected value of the variable E is: (100 - 200)/600 = -1/6 and the second term contributes -b1/6 to the average. Similarly, the expected value of the variable F is: (300 - 200)/600 = 1/6 and the second term contributes b2/6 to the average.

23.19. D. s² = Σε̂i²/(N - k) = 92/(10 - 2) = 11.5.
6 of the 10 Xi are zero, and 4 are 1. X̄ = .4. 6 of the 10 xi are -.4, and 4 are .6.
Σxi² = (6)(0 - .4)² + (4)(1 - .4)² = 2.4.
s_β̂² = s²/Σxi² = 11.5/2.4 = 4.792. s_β̂ = 2.189. t = β̂/s_β̂ = 4/2.189 = 1.827.
Comment: For N - k = 10 - 2 = 8 degrees of freedom, since 1.827 < 1.860 we do not reject H0 at 10%.


23.20. E. For the first group, the predicted value of Y21j is: δ + .75Y11j.
For the second group, the predicted value of Y22j is: δ + .75Y12j + θ.
The sum of the squared errors is:
Σj=1..n (Y21j - δ - .75Y11j)² + Σj=1..n (Y22j - δ - .75Y12j - θ)².
Setting the partial derivative with respect to δ equal to zero:
0 = -2Σ(Y21j - δ - .75Y11j) - 2Σ(Y22j - δ - .75Y12j - θ). ⇒
0 = ΣY21j - Σδ - .75ΣY11j + ΣY22j - Σδ - .75ΣY12j - Σθ. ⇒
30n - nδ - .75(40n) + 37n - nδ - (.75)(41n) - nθ = 0. ⇒ 2δ + θ = 6.25.
Setting the partial derivative with respect to θ equal to zero:
0 = -2Σ(Y22j - δ - .75Y12j - θ). ⇒ 0 = ΣY22j - Σδ - .75ΣY12j - Σθ. ⇒
37n - nδ - (.75)(41n) - nθ = 0. ⇒ δ + θ = 6.25.
Subtracting twice the second equation from the first equation: θ = 6.25.
Comment: The estimated δ = 0. Ȳ21 = 30. ⇒ ΣY21j = 30n.

24.1. A. The rate of inflation before time 6 is β2 and after time 6 is β2β3.
The two rates would be the same if β3 = 1.
Thus we apply a t-test to test the hypothesis β3 = 1.
H0 is β3 = 1. t = (β̂3 - 1)/s_β̂3 = .02/.011 = 1.818, with 30 - 3 = 27 degrees of freedom.
Since 1.703 < 1.818 < 2.052, we reject H0 at 10% and do not reject at 5%. At the 10% level the two rates of inflation are significantly different.


24.2. The desired form of the model is: Y = β1 + β2X + β3(X - 1000)D + ε, where D = 1 for X > 1000 and 0 for X ≤ 1000.
The first three rows, out of fifteen rows in total, of the design matrix X are:
(1  1150  150)
(1   840    0)
(1   900    0)

        (   15       14860       1390  )
X'X =   (14860    15561000    1785100  )
        ( 1390     1785100     395100  )

            ( 3.4589       -0.0039592        0.0057191      )
(X'X)^-1 =  (-0.0039592     4.6652 x 10^-6  -7.1490 x 10^-6 )
            ( 0.0057191    -7.1490 x 10^-6   0.000014711    )

X'Y = (27.37, 24791.1, 1312.4)'.
β̂ = (X'X)^-1 X'Y = (4.024, -.002090, -.001394)'.
Ŷ = 4.02 - .00209X - .00139(X - 1000)D, where D = 1 for X > 1000 and 0 for X ≤ 1000.
Comment: A graph of the data and the fitted model:

[Figure: scatterplot of the data with the fitted piecewise linear model; X runs from about 600 to 1400, Y from about 0.5 to 3.]

24.3. ESS = Y'Y - β̂'X'Y = 56.6105 - 56.5022 = .1083.
s² = ESS/(N - k) = .1083/(15 - 3) = .00903.
Var[β̂] = s²(X'X)^-1.
Variance of the coefficient of (X - 1000)D is: (.00903)(0.000014711) = .0000001328.
Standard Error of the coefficient of (X - 1000)D is: √.0000001328 = .000364.
t = -.00139/.000364 = -3.82. For 12 degrees of freedom, the critical value for 1% is 3.055.
Since 3.82 > 3.055, we reject H0 at 1%. At the 1% level, the effect of X on Y is significantly different before and after 1000.
Comment: Using a computer, the p-value is 0.24%.


24.4. & 24.5. The piecewise linear regression model would have the form:
Y = β1 + β2X + β3(X - 27)D1 + β4(X - 60)D2 + ε, where D1 is 0 for X < 27 and 1 for X ≥ 27, and D2 is 0 for X < 60 and 1 for X ≥ 60.
Squared error: Σ{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}².
Set the partial derivative with respect to β1 equal to zero:
0 = -2Σ{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣYi = Σβ1 + β2ΣXi + β3Σ(D1Xi - 27D1) + β4Σ(D2Xi - 60D2).
⇒ c + g + l = (s + t + u)β1 + (a + e + j)β2 + (e + j - 27t - 27u)β3 + (j - 60u)β4.
Set the partial derivative with respect to β2 equal to zero:
0 = -2ΣXi{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣXiYi = β1ΣXi + β2ΣXi² + β3Σ(D1Xi² - 27D1Xi) + β4Σ(D2Xi² - 60D2Xi).
⇒ d + h + m = (a + e + j)β1 + (b + f + k)β2 + (f + k - 27e - 27j)β3 + (k - 60j)β4.
Set the partial derivative with respect to β3 equal to zero:
0 = -2Σ(Xi - 27)D1{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣXiYiD1 - 27ΣYiD1 = β1Σ(XiD1 - 27D1) + β2Σ(Xi²D1 - 27D1Xi) + β3Σ(D1Xi² - 54D1Xi + 729D1) + β4Σ(D2Xi² - 87D2Xi + 1620D2).
⇒ h + m - 27g - 27l = (e + j - 27t - 27u)β1 + (f + k - 27e - 27j)β2 + (f + k - 54e - 54j + 729t + 729u)β3 + (k - 87j + 1620u)β4.
Set the partial derivative with respect to β4 equal to zero:
0 = -2Σ(Xi - 60)D2{Yi - β1 - β2Xi - β3(Xi - 27)D1 - β4(Xi - 60)D2}.
⇒ ΣXiYiD2 - 60ΣYiD2 = β1Σ(XiD2 - 60D2) + β2Σ(Xi²D2 - 60D2Xi) + β3Σ(D2Xi² - 87D2Xi + 1620D2) + β4Σ(D2Xi² - 120D2Xi + 3600D2).
⇒ m - 60l = (j - 60u)β1 + (k - 60j)β2 + (k - 87j + 1620u)β3 + (k - 120j + 3600u)β4.
Comment: The fitted model might look something like this, with three different slopes:

[Figure: claim frequency versus age, piecewise linear with slope changes at ages 27 and 60, plotted over ages 17 to 80.]


24.6. E. This is a piecewise linear model, a special case of spline functions. Thus A is false.The model has two structural breaks, one at t0 and one at t1. Thus C is false.

At the structural breaks, the slopes change. For example, before t0 the slope is β2 while after

t0 the slope is β2 + β3. The intercepts also change, however, the purpose of the dummy variables is to account for shifts in the slope. Thus D is “false”.At t0 the function is β1 + β2Yt0 = (β1 - β3Yt0) + (β2 + β3)Yt0, thus it is continuous at t0.

At t1 the function is (β1 - β3Yt0) + (β2 + β3)Yt1 = (β1 - β3Yt0 - β4Yt1) + (β2 + β3 + β4)Yt1, thus it is continuous at t1. Thus B is false and E is true.Comment: See pages 136 to 137 of Pindyck and Rubinfeld.

25.1. D. In this case one can just duplicate the point (3, 4) and perform an unweighted regression. Thus the X values are: 0, 3, 3, 10 and the Y values are: 2, 4, 4, 6.
The slope is: {(1/N)ΣXiYi - ((1/N)ΣXi)((1/N)ΣYi)} / {(1/N)ΣXi² - ((1/N)ΣXi)²} = {21 - (4)(4)}/{29.5 - 4²} = 5/13.5 = .370.

   X      Y     XY     X²
   0      2      0      0
   3      4     12      9
   3      4     12      9
  10      6     60    100
Avg.  4   4     21   29.5

Alternately, in deviations form:
xi = Xi - ΣwiXi = (0, 3, 10) - {(1/4)(0) + (1/2)(3) + (1/4)(10)} = (-4, -1, 6).
yi = Yi - ΣwiYi = (2, 4, 6) - {(1/4)(2) + (1/2)(4) + (1/4)(6)} = (-2, 0, 2).
β̂ = Σwixiyi/Σwixi² = {(1/4)(-4)(-2) + (1/2)(-1)(0) + (1/4)(6)(2)} / {(1/4)(-4)² + (1/2)(-1)² + (1/4)(6)²} = 5/13.5 = .370.
α̂ = ΣwiYi - β̂ΣwiXi = 4 - (.370)(4) = 2.52.
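The weighted least squares calculation can be checked directly (an illustrative sketch of my own using the weights 1/4, 1/2, 1/4 of this problem):

```python
# Illustrative sketch of the weighted least squares fit, weights 1/4, 1/2, 1/4.
import numpy as np

X = np.array([0., 3., 10.])
Y = np.array([2., 4., 6.])
w = np.array([1., 2., 1.]) / 4                  # the point (3, 4) gets double weight
xbar, ybar = w @ X, w @ Y
slope = (w * (X - xbar)) @ (Y - ybar) / ((w * (X - xbar)) @ (X - xbar))
intercept = ybar - slope * xbar
print(round(slope, 3), round(intercept, 2))     # 0.370 and 2.52
```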

25.2. B. X = Σ wiXi = 2.7. x = X - X = (-1.7, 1.3, 6.3).

Y = Σ wiYi = 23. y = Y - Y = (-8, 7, 27). Σ wixiyi = (.6)(-1.7)(-8) + (.3)(1.3)(7) + (.1)(6.3)(27) = 27.9.

Σ wixi2 = (.6)(-1.7)2 + (.3)(1.3)2 + (.1)(6.3)2 = 6.21.

slope = Σ wixiyi /Σ wixi2 = 27.9/6.21 = 4.493.

Intercept = Y - (slope) X = 23 - (4.493)(2.7) = 10.87. The fitted value of Y for X = 10 is: 10.87 + (10)( 4.493) = 55.80.

25.3. D. ^β = ΣwiXiYi / ΣwiXi2 =

(.3)(1)(3) + (.4)(5)(8) + (.2)(10)(13) + (.1)(20)(32)/(.3)(1)2 + (.4)(5)2 + (.2)(10)2 + (.1)(20)2 =106.9/70.3 = 1.521.


25.4. A. Let X be the age. Then the model is: u = βX. vi will be the fitted values of u.

Weighted squared error = F = Σwi (ui - βXi)2.

Setting the partial derivative with respect to β equal to zero:0 = -2 Σwi Xi(ui - βXi). ⇒ ΣwiuiXi = βΣwi Xi2. ⇒ β = ΣwiuiXi / Σwi Xi2.

^β = (25)(20) + (28)(25) + (30)(30)/(100)(202) + (112)(252) + (100)(302) =

2100/200000 = .0105. v1 = 20^β = (20)(.0105) = .210.

Comment: A weighted regression with no intercept. "Directly proportional to" ⇔ no intercept.

25.5. C. The Buhlmann Credibility is the slope of the least squares line fit to the Bayesian Estimates. One needs to do a weighted regression with the weights equal to the a priori probabilities; in this case since the a priori probabilities are the same one can perform an unweighted regression. The X values are: 0, 3, 12 and the Y values are: 1, 6, 8.
The slope is: {(1/N)ΣXiYi - ((1/N)ΣXi)((1/N)ΣYi)} / {(1/N)ΣXi² - ((1/N)ΣXi)²} = {38 - (5)(5)}/{51 - 5²} = 0.5.

   X      Y     XY     X²
   0      1      0      0
   3      6     18      9
  12      8     96    144
Avg.  5   5     38     51

Thus the Buhlmann Credibility is .50 and the new estimates are:
(observation)Z + (prior mean)(1 - Z) = (0, 3, 12)(.5) + (5)(1 - .5) = (0, 1.5, 6) + 2.5 = (2.5, 4.0, 8.5).

25.6. B. ∆ vx = a. ⇒ vx is a linear function of x. v = α + βx.The sum of the squared deviations, weighted by exposures is:300(α - 3)2 + 200(α + β - 6)2 + 100(α + 2β - 11)2.Set the partial derivative with respect to α equal to zero:0 = 600(α - 3) + 400(α + β - 6) + 200(α + 2β - 11). ⇒ 3α + 2β = 16.Set the partial derivative with respect to β equal to zero: 0 = 400(α + β - 6) + 400(α + 2β - 11). ⇒ 2α + 3β = 17.Solving, α = 2.8 and β = 3.8.v1 = α + β = 2.8 + 3.8 = 6.6.Alternately, one can weight the first data point three times the third by pretending it appeared three times, and weight the second data point twice the third by pretending it appeared twice. X = (0, 0, 0, 1, 1, 2). X = 2/3. x = (-2/3, -2/3, -2/3, 1/3, 1/3, 4/3).Y = (3, 3, 3, 6, 6, 11). Y = 16/3. y = (-7/3, -7/3, -7/3, 2/3, 2/3, 17/3).^β = Σ xiyi/ Σ xi2 = (114/9)/(30/9) = 3.8. α = Y -

^βX = 16/3 - (3.8)(2/3) = 2.8.

v1 = α + ^β = 2.8 + 3.8 = 6.6.


25.7. D. Weighted Sum of Squared Errors is:300(30.5a - 3/300)2 + 400(40.5a - 10/400)2 + 300(50.5a - 15/300)2.Setting the derivative with respect to a equal to zero: 0 = 2(30.5)(300)(30.5a - .01) + (40.5)(400)(40.5a - .025) + (50.5)(300)(50.5a - .05).a = 12.54/17002.5 = .000738.

25.8. D. Buhlmann Credibility is the least squares linear approximation to the Bayesian analysis result. The given expression is the squared error of a linear estimate. Thus the values of a and b that minimize the given expression correspond to the Buhlmann credibility estimate. In this case, the new estimate using Buhlmann Credibility = (prior mean)(1-Z) + (observation) Z = 2(1 - 1/12) + 1/12(observation) = 22/12 + 1/12(obser.). Therefore a = 22/12 and b = 1/12. Alternately, one can minimize the given expression. One takes the partial derivatives with respect to a and b and sets them equal to zero. Σ 2Pi (a + bRi - Ei) = 0, and Σ 2Pi Ri (a + bRi - Ei) = 0.Therefore, (2/3)(a + b(0) - 7/4) + (2/9)(a + b(2) - 55/24) + (1/9)(a + b(14) - 35/12) = 0 ⇒ a + 2 b = 2, and (2/9)(2)(a + b(2) - 55/24) + (1/9)(14)(a + b(14) - 35/12) = 018 a + 204 b = 300/6 =50 ⇒ 9 a + 102 b = 25. One can either solve these two simultaneous linear equations by matrix methods or try the choices A through E.Comment: Normally one would not be given the Buhlmann credibility factor as was the case here, allowing the first method of solution, which does not use the information given on the values of the Bayesian analysis estimates. Note that the Bayesian estimates balance to the a priori mean of 2: (2/3)(7/4) + (2/9)(55/24) + (1/9)(35/12) = (126 + 55 + 35)/108 = 216/108 = 2.

25.9. E. v3 = 10 = 3a + b. ⇒ b = 10 - 3a. The sum of the squared deviations, weighted by exposures is:1(a + b - 4)2 + 1(2a + b - 6)2 + 2(u3 - 10)2 = (6 - 2a)2 + (4 - a)2 + 2(u3 - 10)2. Set the partial derivative with respect to a equal to zero:0 = -4(6 - 2a) - 2(4 - a). ⇒ a = 32/10 = 3.2.v2 = 2a + b = 10 - a = 10 - 3.2 = 6.8.Comment: b = 10 - 3a = 10 - (3)(3.2) = 0.4. Thus the fitted values are: (3.6, 6.8, 10).The weighted average of the fitted values is: (3.6 + 6.8 + 20)/4 = 7.6.This must be equal to the weighted average of the observed values:7.6 = (4 + 6 + 2u3)/4. ⇒ u3 = 10.2.

25.10. A. q10 = 1/30. q20 = 5/90 = 1/18. q30 = 7/80.

q10 = 31a/3. q20 = 61a/3. q30 = 91a/3. Weighted sum of squares is:30(31a/3 - 1/30)2 + 90(61a/3 - 1/18)2 + 80(91a/3 - 7/80)2 =(10/9)3(31a - 1/10)2 + 9(61a - 1/6)2 + 8(91a - 21/80)2.Setting the partial derivative with respect to a equal to zero:0 = (20/9)93(31a - 1/10) + 549(61a - 1/6) + 728(91a - 21/80).1000a = 1000(9.3 + 91.5 + 191.1)/(2883 + 33489 + 66248) = 291900/102620 = 2.844.


25.11. C. The line formed by the Buhlmann Credibility estimates is the weighted least squares line to the Bayesian estimates, with the a priori probability of each outcome acting as the weights. Since the a priori probabilities are equal we fit an unweighted regression.X = 1, 2, 3. X = 2. x = X - X = -1, 0, 1. Y = 1.5, 1.5, 3. Y = 2. y = Y - Y = -.5, -.5, 1. Σ xiyi = 1.5. Σ xi2 = 2. slope = Σ xiyi /Σ xi2 = 1.5/2 = .75 = Z.

Intercept = Y - (slope) X = 2 - (.75)(2) = .5.Bühlmann credibility estimate of the second observation = .5 + .75(first observation).Given that the first observation is 1, the Bühlmann credibility estimate is: (.5) + (.75)(1) = 1.25.Comment: The Bühlmann credibility estimate given 2 is 2; the estimate given 3 is 2.75.The Bayesian Estimates average to 2, the overall a priori mean. Bayesian estimates are in balance. The Bühlmann Estimates are also in balance; they also average to 2.

26.1. Homoscedasticity is when the errors terms of a regression have a constant variance.Heteroscedasticity is when the errors terms of a regression do not have a constant variance.

26.2. A. For the first graph, the squared residuals tend to be larger for later values, indicating that the errors terms of the regression do not have a constant variance.Comment: Due to random fluctuation, it can be hard to pick out a pattern, even if one exists.


26.3. In deviations form, x = (-5, 0, 5), and β̂ = ΣxiYi/Σxi² = (-5Y1 + 5Y3)/50 = .1(Y3 - Y1) = .1{(3 + (2)(10) + ε3) - (3 + ε1)} = 2 + .1(ε3 - ε1).
α̂ = Ȳ - β̂X̄ = 3 + 2X̄ + (ε1 + ε2 + ε3)/3 - {2 + .1(ε3 - ε1)}X̄ = 3 + (ε1 + ε2 + ε3)/3 - .5(ε3 - ε1) = 3 + (5ε1 + 2ε2 - ε3)/6.

 ε1   ε2   ε3    Fitted β   Fitted α   Residual 1   Residual 2   Residual 3     ESS
  1    2    4      2.3       3.8333     -0.1667       0.3333      -0.1667      0.1667
  1    2   -4      1.5       5.1667      1.1667      -2.3333       1.1667      8.1667
  1   -2    4      2.3       2.5000     -1.5000       3.0000      -1.5000     13.5000
  1   -2   -4      1.5       3.8333     -0.1667       0.3333      -0.1667      0.1667
 -1    2    4      2.5       2.1667      0.1667      -0.3333       0.1667      0.1667
 -1    2   -4      1.7       3.5000      1.5000      -3.0000       1.5000     13.5000
 -1   -2    4      2.5       0.8333     -1.1667       2.3333      -1.1667      8.1667
 -1   -2   -4      1.7       2.1667      0.1667      -0.3333       0.1667      0.1667
Avg.               2         3          -0.0000       0.0000      -0.0000      5.5000

For example, for the first set, Y1 = 3 + (2)(0) + 1 = 4, Y2 = 3 + (2)(5) + 2 = 15, and Y3 = 3 + (2)(10) + 4 = 27.
β̂ = (27 - 4)/10 = 2.3. α̂ = (4 + 15 + 27)/3 - (2.3)(5) = 3.8333.
Ŷ1 = 3.8333. ε̂1 = Ŷ1 - Y1 = 3.8333 - 4 = -0.1667.
Ŷ2 = 3.8333 + (2.3)(5) = 15.3333. ε̂2 = Ŷ2 - Y2 = 15.3333 - 15 = 0.3333.
Ŷ3 = 3.8333 + (2.3)(10) = 26.8333. ε̂3 = Ŷ3 - Y3 = 26.8333 - 27 = -0.1667.
ESS = (-0.1667)² + (0.3333)² + (-0.1667)² = 0.1667.
Comment: Note that the estimates of alpha and beta are both unbiased.
Var[β̂] = {(2.3 - 2)² + (1.5 - 2)² + (2.3 - 2)² + (1.5 - 2)² + (2.5 - 2)² + (1.7 - 2)² + (2.5 - 2)² + (1.7 - 2)²}/8 = 0.17.
This matches the result of using the formula, Var[β̂] = Σxi²σi²/(Σxi²)² = {(-5)²(1²) + (0)²(2²) + (5)²(4²)}/{(-5)² + (0)² + (5)²}² = 425/50² = 0.17.
E[ESS/(N - 2)] = E[ESS] = 5.5. Since the variances of the εi are not equal, there is no σ², and E[ESS/(N - 2)] cannot be equal to it, as would be the case for homoscedasticity.

27.1. Calculate the F statistic used in the Goldfeld-Quandt test for heteroscedasticity in errors. For the first regression, ESS = (1 - R²)TSS = 19.7. For the second regression, ESS = (1 - R²)TSS = 33.9.
F = [(ESS for second regression)/(30 - 4)]/[(ESS for first regression)/(30 - 4)] = 33.9/19.7 = 1.72.
At 26 and 24 degrees of freedom, the critical value at 5% is 1.95. At 26 and 30 degrees of freedom, the critical value at 5% is 1.90. Thus at 26 and 26 degrees of freedom, the critical value at 5% is about 1.93. Since 1.72 < 1.93, we do not reject at 5% the null hypothesis that the variance of the errors is constant.
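A sketch of the Goldfeld-Quandt F ratio formed above; it assumes the two sub-sample regressions have already been run, and simply combines the ESS values and degrees of freedom quoted in the solution:

    # Goldfeld-Quandt F statistic: ratio of the two sub-sample error sums of squares,
    # each divided by its degrees of freedom.
    ess_low, ess_high = 19.7, 33.9   # ESS = (1 - R^2) * TSS for each sub-sample
    df_low, df_high = 30 - 4, 30 - 4
    F = (ess_high / df_high) / (ess_low / df_low)
    print(round(F, 2))   # 1.72; compare to the F table with (26, 26) degrees of freedom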


27.2. A. The null hypothesis is that there is homoscedasticity, not heteroscedasticity. Statements B, C, and D are true.

27.3. B. X̄ = 12.2, x = (-11.2, -7.2, -6.2, -4.2, -3.2, .8, 1.8, 5.8, 10.8, 12.8). Ȳ = 15.3, y = (-12.3, -10.3, -6.3, -4.3, -2.3, .7, .7, 7.7, 14.7, 11.7). Σxiyi = 631.4. Σxi² = 561.6.
β̂ = Σxiyi/Σxi² = 631.4/561.6 = 1.124. α̂ = Ȳ - β̂X̄ = 1.584. α̂ + 23β̂ = 27.4.

27.4. E. For the fitted regression, Ŷ = 1.584 + 1.124X:
Ŷ = (2.708, 7.204, 8.328, 10.576, 11.7, 16.196, 17.32, 21.816, 27.436, 29.684).
ε̂ = Y - Ŷ = (0.292, -2.204, 0.672, 0.424, 1.300, -0.196, -1.320, 1.184, 2.564, -2.684).
ESS = Σε̂i² = 24.225. Let σ̂² = ESS/N = 24.225/10 = 2.4225.
Run a linear regression of ε̂i²/σ̂² on X. X̄ = 12.2.
Z = ε̂i²/σ̂² = (0.0352, 2.005, 0.1864, 0.0742, 0.6976, 0.0159, 0.7193, 0.5787, 2.714, 2.974).
Z̄ = 1.000. z = (-0.9648, 1.005, -0.8136, -0.9258, -0.3024, -0.9841, -0.2807, -0.4213, 1.714, 1.974).
Σxizi = 53.50. Σxi² = 561.6.
β̂ = Σxizi/Σxi² = 53.50/561.6 = .0953. α̂ = Z̄ - β̂X̄ = -.163.
Ẑ = (-0.0677, 0.3135, 0.4088, 0.5994, 0.6947, 1.0759, 1.1712, 1.5524, 2.0289, 2.2195).
RSS = Σ(Ẑ - Z̄)² = 5.10. RSS/2 = 2.55. Compare RSS/2 to the Chi-Square Distribution with 1 degree of freedom. Since 2.55 < 3.84, we do not reject the null hypothesis, homoscedasticity, at 5%.
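The Breusch-Pagan steps of 27.4 translate directly into code. This is a sketch with my own helper names; it returns RSS/2 from the auxiliary regression of the scaled squared residuals on X, to be compared to a Chi-Square with 1 degree of freedom:

    def simple_ols(x, y):
        # Return (intercept, slope) of an ordinary least squares line.
        n = len(x)
        xb, yb = sum(x) / n, sum(y) / n
        slope = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
                / sum((xi - xb) ** 2 for xi in x)
        return yb - slope * xb, slope

    def breusch_pagan_stat(x, resid):
        # RSS/2 from regressing (squared residuals)/(their mean) on x.
        n = len(resid)
        sigma2 = sum(e * e for e in resid) / n
        z = [e * e / sigma2 for e in resid]
        a, b = simple_ols(x, z)
        zb = sum(z) / n
        rss = sum((a + b * xi - zb) ** 2 for xi in x)
        return rss / 2.0   # compare to a Chi-Square with 1 d.f. (3.84 at 5%)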

27.5. A. From the previous solution, for the fitted regression, Ŷ = 1.584 + 1.124X:
ε̂ = Y - Ŷ = (0.292, -2.204, 0.672, 0.424, 1.300, -0.196, -1.320, 1.184, 2.564, -2.684).
Run a linear regression of ε̂i² on X. X̄ = 12.2.
Z = ε̂i² = (0.0853, 4.858, 0.4516, 0.1798, 1.6900, 0.0384, 1.7424, 1.4019, 6.5741, 7.2039).
Z̄ = 2.4225. z = (-2.3372, 2.4351, -1.9709, -2.2427, -0.7325, -2.3841, -0.6801, -1.0206, 4.1516, 4.7814).
Σxizi = 129.6. Σxi² = 561.6. β̂ = Σxizi/Σxi² = 129.6/561.6 = .231. α̂ = Z̄ - β̂X̄ = -.396.
Ẑ = (-0.165, 0.759, 0.990, 1.452, 1.683, 2.607, 2.838, 3.762, 4.917, 5.379).
RSS = Σ(Ẑ - Z̄)² = 29.97. TSS = Σzi² = 68.13. R² = RSS/TSS = 0.440.
N R² = (10)(.440) = 4.40, which has a Chi-Square Distribution with 1 degree of freedom. Since 3.84 < 4.40, we reject the null hypothesis, homoscedasticity, at 5%.
Since 4.40 < 5.02, we do not reject the null hypothesis, homoscedasticity, at 2.5%.


27.6. E. Run two separate regressions, one on the first four observations and one on the last four observations, omitting the middle 10/5 = 2 observations.
For the first four observations: X̄ = 5, x = (-4, 0, 1, 3). Ȳ = 7, y = (-4, -2, 2, 4). Σxiyi = 30. Σxi² = 26.
β̂ = Σxiyi/Σxi² = 30/26 = 1.154. α̂ = Ȳ - β̂X̄ = 1.230.
Ŷ = 1.230 + 1.154X = (2.384, 7.000, 8.154, 10.462).
ε̂ = Y - Ŷ = (.616, -2, .846, .538). ESS = Σε̂i² = 5.38.
For the last four observations: X̄ = 20, x = (-6, -2, 3, 5). Ȳ = 24, y = (-8, -1, 6, 3). Σxiyi = 83. Σxi² = 74.
β̂ = Σxiyi/Σxi² = 83/74 = 1.122. α̂ = Ȳ - β̂X̄ = 1.560.
Ŷ = 1.560 + 1.122X = (17.268, 21.756, 27.366, 29.610).
ε̂ = Y - Ŷ = (-1.268, 1.244, 2.634, -2.610). ESS = Σε̂i² = 16.91.
F = [(ESS for second regression)/(4 - 2)]/[(ESS for first regression)/(4 - 2)] = 16.91/5.38 = 3.14.
Comment: You would order the observations from smallest to largest X, if that had not already been done for you. At 2 and 2 degrees of freedom, the critical value at 5% is 19.0, so we do not reject at 5% the null hypothesis that the variance of the errors is constant. For practical applications of this test, you would want more total observations.

27.7. The absolute value of the residuals seems to be increasing with X, and therefore we keep things in that order.
We fit a regression to the first 8 observations: -35.7999 + 11.4083x - 0.1535489x², with ESS = 567.536.
We fit a regression to the last 8 observations: 46.2803 + 2.09519x + 0.175957x², with ESS = 8859.01.
F = 8859.01/567.536 = 15.61, with 8 - 3 = 5 d.f. and 5 d.f.
The 1% critical value for 5 and 5 d.f. is 10.97.
Since 15.61 > 10.97, we reject at 1% the null hypothesis that there is homoscedasticity.

27.8. Run a linear regression of ε̂i² on X. The result is: intercept = -1374.31, and slope = 155.496. R² = 0.465.
Since the regression of ε̂i² on X has one independent variable (not counting the intercept), N R² has a Chi-Square Distribution with 1 degree of freedom. N R² = (20)(0.465) = 9.30.
For the Chi-Square, the critical value for 1 degree of freedom at 1/2% is 7.88. Since 9.30 > 7.88, we reject the null hypothesis of homoscedasticity at 1/2%.


27.9. For the regression that was fit to the data, ESS = 6.06484² + ... + 21.9121² = 10719.1.
Take σ̂² = ESS/N = 10719.1/20 = 536.
Run a linear regression of ε̂i²/σ̂² on X. The result is: intercept = -2.56401, and slope = 0.290104. RSS = 15.555.
Since the regression of ε̂i²/σ̂² on X has one independent variable (not counting the intercept), RSS/2 has a Chi-Square Distribution with 1 degree of freedom. RSS/2 = 7.78.
For the Chi-Square, the critical value for 1 degree of freedom at 1% is 6.64, and the critical value for 1 degree of freedom at 1/2% is 7.88. Since 6.64 < 7.78 < 7.88, we reject the null hypothesis of homoscedasticity at 1%, but not at 1/2%.

27.10. E. In the Goldfeld-Quandt Test, the test statistic follows an F Distribution. In the Breusch-Pagan Test, the test statistic follows a Chi-Square Distribution. In the White Test, the test statistic follows a Chi-Square Distribution.

27.11. C. F = [(ESS for 2nd regression)/(6 - 2)]/[(ESS for 1st regression)/(6 - 2)] = .93/.79 = 1.18.

27.12. B. Statement A is the White Test; there are 2 d.f. since the formula for the alternative hypothesis for Var(εi) has two independent variables, not counting the intercept.
Statement C is the Breusch-Pagan Test; there are 2 d.f. since the formula for the alternative hypothesis for Var(εi) has two independent variables, not counting the intercept.
Statement B is false; as in Statement C it should be ε̂i²/σ̂².
Statement D is true. See Equation 6.3 in Pindyck and Rubinfeld.
If we assume Var(εi) = ηXi², then we can order the observations in assumed increasing order of Var(εi) and apply the Goldfeld-Quandt test. Statement E is true.


28.1. B. & 28.2. D. We calculate the weights: wi = (1/σi²)/Σ1/σi².
i   Xi    Yi    σi²   1/σi²     wi
1   10    10     1      1     400/541
2   40    40     4     1/4    100/541
3  160   100    16    1/16     25/541
4  250   125    25    1/25     16/541
β̂ = [ΣwiXiYi - (ΣwiXi)(ΣwiYi)]/[ΣwiXi² - (ΣwiXi)²] = [2033.3 - (29.575)(23.105)]/[3401.1 - 29.575²] = .5343.
α̂ = ΣwiYi - β̂ΣwiXi = 23.105 - (.5343)(29.575) = 7.303.
Alternately, to correct for heteroscedasticity, we divide each variable by σi = √Var(εi) = √(Xi/10). The modified variables are:
i   σi    Xi    Yi   Xi/σi   Yi/σi
1    1    10    10     10      10
2    2    40    40     20      20
3    4   160   100     40      25
4    5   250   125     50      25
Minimize the sum of squared errors for the adjusted model: Yi/σi = α/σi + βXi/σi + εi/σi.
The sum of squared errors is: (10 - α/1 - 10β)² + (20 - α/2 - 20β)² + (25 - α/4 - 40β)² + (25 - α/5 - 50β)².
To minimize the squared error, set its partial derivatives with respect to α and β equal to zero:
(10 - α/1 - 10β) + (20 - α/2 - 20β)/2 + (25 - α/4 - 40β)/4 + (25 - α/5 - 50β)/5 = 0, and
10(10 - α/1 - 10β) + 20(20 - α/2 - 20β) + 40(25 - α/4 - 40β) + 50(25 - α/5 - 50β) = 0.
31.25 = 1.3525α + 40β, and 2750 = 40α + 4600β. ⇒ α̂ = 7.303 and β̂ = .5343.
Comment: An unweighted regression would produce the fit: α̂ = 14.8 and β̂ = .47.
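A sketch of the weighted least squares fit of 28.1-28.2, under the solution's assumption that Var(εi) = Xi/10, so the weights are proportional to 1/Var(εi); the variable names are mine:

    X = [10.0, 40.0, 160.0, 250.0]
    Y = [10.0, 40.0, 100.0, 125.0]
    w = [10.0 / xi for xi in X]          # proportional to 1/Var(eps_i) = 10/X_i
    total = sum(w)
    w = [wi / total for wi in w]         # normalize the weights to sum to one

    wx = sum(wi * xi for wi, xi in zip(w, X))
    wy = sum(wi * yi for wi, yi in zip(w, Y))
    wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, X, Y))
    wxx = sum(wi * xi * xi for wi, xi in zip(w, X))

    beta = (wxy - wx * wy) / (wxx - wx * wx)   # 0.5343
    alpha = wy - beta * wx                     # 7.303
    print(round(alpha, 3), round(beta, 4))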

28.3. D. α̂ = Ȳ = (225 + 290 + 205)/10 = 72.0.


28.4. B. Each Yt should be weighted inversely proportionally to its variance:
α̂ = [3(Y1 + Y2 + Y3) + 5(Y4 + Y5 + Y6 + Y7) + 8(Y8 + Y9 + Y10)]/[(3)(3) + (5)(4) + (8)(3)] = [(3)(225) + (5)(290) + (8)(205)]/53 = 3765/53 = 71.0.
Alternately, we need to minimize Σ(Yt - α)²/σt², where σt² = Var(εt).
Taking the partial derivative with respect to α and setting it equal to zero:
0 = -2Σ(Yt - α)/σt². ⇒ αΣ1/σt² = ΣYt/σt². ⇒ α̂ = (ΣYt/σt²)/(Σ1/σt²). Proceed as before.
Alternately, divide each Yt by σt = √Var(εt): Zt = Yt/σt. The new model is Zt = α/σt + εt/σt.
Unlike Y, Z has constant variance of the error terms; the model for Z is homoscedastic. Performing ordinary least squares on a model with a slope but no intercept (treat 1/σt as xt):
α̂ = Σ(Zt/σt)/(Σ1/σt²) = (ΣYt/σt²)/(Σ1/σt²). Proceed as before.
Comment: Similar to 4, 5/01, Q.21.

28.5. X̄ = 15. x = X - X̄. Ȳ = 25.7333. y = Y - Ȳ. Σxiyi = 485. Σxi² = 250.
β̂ = Σxiyi/Σxi² = 485/250 = 1.94. α̂ = Ȳ - β̂X̄ = 25.7333 - (1.94)(15) = -3.367.

28.6. Take the weights proportional to 1/Var(εi); let wi = (1/Xi²)/(Σ1/Xi²).
1/(Σ1/Xi²) = 11.803. wi = 11.803/Xi². ΣwiXi = 11.803 Σ1/Xi = (11.803)(1.08333) = 12.787.
ΣwiYi = 11.803 ΣYi/Xi² = (11.803)(1.81944) = 21.475.
xi = Xi - ΣwiXi = Xi - 12.787. yi = Yi - ΣwiYi = Yi - 21.475. Σwixiyi = 25.986. Σwixi² = 13.544.
β̂ = Σwixiyi/Σwixi² = 25.986/13.544 = 1.919.
α̂ = ΣwiYi - β̂ΣwiXi = 21.475 - (1.919)(12.787) = -3.06.
Comment: The weights should sum to one. One subtracts the weighted average in order to convert to deviations form.

28.7. B. X̄ = (1 + 4 + 10)/3 = 5. x = (-4, -1, 5).
Ȳ = (3 + 9 + 14)/3 = 8.667. y = (-5.667, .333, 5.333).
β̂ = Σxiyi/Σxi² = 49/42 = 7/6. α̂ = Ȳ - β̂X̄ = 26/3 - (7/6)(5) = 17/6.
Ŷi = (4, 7.5, 14.5). ε̂i = Yi - Ŷi = (-1, 1.5, -.5).
Var[β̂] = Σxi²ε̂i²/(Σxi²)² = {(-4)²(-1)² + (-1)²(1.5)² + (5)²(-.5)²}/{(-4)² + (-1)² + (5)²}² = 24.5/42² = .0139.


28.8. C. We want to have the variance of the errors be equal, rather than depend on X. If we multiply by a value, the variance of the errors will be multiplied by that value squared. If we multiply everything by X^(1/2), the variance of the errors of the new model will be proportional to: (X^(1/2))²X⁻¹ = 1. Multiplying the original model, Y = α + βX + ε, by X^(1/2) produces the new model: YX^(1/2) = αX^(1/2) + βX^(3/2) + ε*.
Comment: Similar to 4, 11/01, Q.28. To correct for heteroscedasticity, divide the model by something proportional to the standard deviation of the error. In this case, divide by √(X⁻¹) = X^(-1/2), which is equivalent to multiplying by X^(1/2).

28.9. D. Adjust each variable by dividing by something proportional to StdDev[εi], √Xi.
Yi/√Xi = βXi/√Xi + εi/√Xi = β√Xi + εi/√Xi. The errors now have constant variance and the least squares solution is: β̂ = Σ(√Xi)(Yi/√Xi)/Σ(√Xi)² = ΣYi/ΣXi = 26/14 = 1.86.
Alternately, minimize Σ[(Yi - βXi)/(σ√Xi)]². Setting the derivative with respect to β equal to zero:
0 = -2Σ(√Xi/σ)[(Yi - βXi)/(σ√Xi)] ⇒ 0 = Σ(Yi - βXi) ⇒ β̂ = ΣYi/ΣXi = 26/14 = 1.86.

28.10. A. Adjust each variable by dividing by something proportional to StdDev[εi], Xi.
Yi/Xi = βXi/Xi + εi/Xi = β + εi/Xi. The errors now have constant variance and the least squares solution is: β̂ = Σ(Yi/Xi)/Σ1² = [(8/1) + (12/2) + (24/3) + (36/4) + (55/5)]/5 = 8.4.
Alternately, minimize Σ[(Yi - βXi)/(σXi)]². Setting the derivative with respect to β equal to zero:
0 = -2Σ(1/σ)[(Yi - βXi)/(σXi)] ⇒ 0 = ΣYi/Xi - Nβ ⇒ β̂ = (1/N)ΣYi/Xi = 42/5 = 8.4.

28.11. E. To correct for heteroscedasticity, we divide each variable by σi = √Var(εi) = xi/2.
The modified variables are:
i   xi    σi   xi/σi    yi   yi/σi
1    1    .5      2      8     16
2    2     1      2      5      5
3    3   1.5      2      3      2
4    4     2      2     -4     -2
Since there is no intercept in the model, β̂ = Σ(xi/σi)(yi/σi)/Σ(xi/σi)² = 42/16 = 2.625.
Comment: The variables are not in deviations form, since they do not sum to zero. An intercept of zero is a very poor model for this data. In my opinion, this is a very poor question!


28.12. C. An unweighted average of the Yt would be (8Ȳ1 + 12Ȳ2)/(8 + 12).
However, each Yt should be weighted inversely proportionally to its variance:
α̂ = (8Ȳ1/.4 + 12Ȳ2/.6)/(8/.4 + 12/.6) = (20Ȳ1 + 20Ȳ2)/(20 + 20) = 0.5Ȳ1 + 0.5Ȳ2.
Alternately, we need to minimize Σ(Yt - α)²/σt², where σt² = Var(εt).
Taking the partial derivative with respect to α and setting it equal to zero:
0 = -2Σ(Yt - α)/σt². ⇒ αΣ1/σt² = ΣYt/σt². ⇒ α̂ = (ΣYt/σt²)/(Σ1/σt²).
α̂ = [(Y1 + Y2 + ... + Y8)/.4 + (Y9 + Y10 + ... + Y20)/.6]/[8/.4 + 12/.6] = (8Ȳ1/.4 + 12Ȳ2/.6)/(8/.4 + 12/.6) = (20Ȳ1 + 20Ȳ2)/(20 + 20) = 0.5Ȳ1 + 0.5Ȳ2.
Alternately, divide each Yt by σt = √Var(εt): Zt = Yt/σt.
The new model is Zt = α/σt + εt/σt.
Unlike Y, Z has constant variance of the error terms; the model for Z is homoscedastic. Performing ordinary least squares on a model with a slope but no intercept (treat 1/σt as xt):
α̂ = Σ(Zt/σt)/(Σ1/σt²) = (ΣYt/σt²)/(Σ1/σt²). Proceed as before.
Comment: See pages 148-149 of Pindyck and Rubinfeld.

28.13. A. We want to have the variance of the errors be equal, rather than depend on X. If we multiply by a value, the variance of the errors will be multiplied by that value squared. If we multiply everything by X^(1/4), the variance of the errors of the new model will be proportional to: (X^(1/4))²X^(-1/2) = 1. Multiplying the original model, Y = α + βX + ε, by X^(1/4) produces the new model: YX^(1/4) = αX^(1/4) + βX^(5/4) + ε*.
Comment: To correct for heteroscedasticity, divide the model by something proportional to the standard deviation of the error. In this case, divide by √(X^(-1/2)) = X^(-1/4), which is equivalent to multiplying by X^(1/4).

28.14. A. wi = 1/Var(εi) = 1/(σ²Xi), which is proportional to 1/Xi.
β̂ = ΣwiXiYi/ΣwiXi² = ΣYi/ΣXi = 30.1/12.5 = 2.408.

29.1. B. Ordinary least squares regression estimators remain consistent, even when there is positive serial correlation.


29.2. In deviations form, x = (-2, -1, 0, 1, 2).
β̂ = ΣxiYi/Σxi² = (-2Y1 - Y2 + Y4 + 2Y5)/10 = 3 + (-2ε1 - ε2 + ε4 + 2ε5)/10.
If ε1 = 1, then ε2 = ε3 = ε4 = ε5 = 1. Y = (8, 11, 14, 17, 20). β̂ = 3. α̂ = 14 - (3)(3) = 5. All of the residuals are zero, and ESS = 0.
If instead, ε1 = -1, then ε2 = ε3 = ε4 = ε5 = -1. Y = (6, 9, 12, 15, 18). β̂ = 3. α̂ = 12 - (3)(3) = 3. All of the residuals are zero, and ESS = 0.
In either case ESS = 0.
Comment: ESS/(N - 2) = 0, which underestimates the actual variance of the regression, which is 1. With positive serial correlation, s² = ESS/(N - 2) is biased downwards as an estimator of the actual variance of the regression. While the errors are not Normally Distributed, this feature still holds for this simplified example.

29.3. If ε1 = 1, then ε2 = -1, ε3 = 1, ε4 = -1, ε5 = 1. Then, Y = (8, 9, 14, 15, 20). Ȳ = 13.2.
β̂ = 3 + (-2 + 1 - 1 + 2)/10 = 3. α̂ = 13.2 - (3)(3) = 4.2.
Ŷ = (7.2, 10.2, 13.2, 16.2, 19.2).
ε̂ = Y - Ŷ = (0.8, -1.2, 0.8, -1.2, 0.8). ESS = 4.8.
If instead, ε1 = -1, then ε2 = 1, ε3 = -1, ε4 = 1, ε5 = -1, Y = (6, 11, 12, 17, 18), and again it turns out that ESS = 4.8. In either case, ESS = 4.8.
Comment: ESS/(N - 2) = 4.8/3 = 1.6, which overestimates the actual variance of the regression, which is 1. With negative serial correlation, s² = ESS/(N - 2) is biased upwards as an estimator of the actual variance of the regression.

29.4. D. Statement D should say “positive serial correlation.”Comment: See page 159 of Pindyck and Rubinfeld.

29.5. D. Var[εt] = Var[ν]/(1 - ρ²) = 40/(1 - .7²) = 78.43.
Comment: See Equation 6.13 in Pindyck and Rubinfeld.

29.6. E. In graph E, negative ε̂t-1 are more likely to be associated with positive ε̂t, and positive ε̂t-1 are more likely to be associated with negative ε̂t. The points tend to be in the upper-left and lower-right quadrants, indicating negative serial correlation.
Comment: Graph A indicates positive serial correlation.

30.1. B. 2.9 = DW ≅ 2(1 - ρ). ρ ≅ 1 - 2.9/2 = -.45.

30.2. D. 4 - du = 4 - 1.66 = 2.34 < DW = 2.41 < 4 - dl = 4 - 1.34 = 2.66 ⇒ result indeterminate.


30.3. C.
Xi   Yi    Ŷi    ε̂i = Yi - Ŷi
1    82    81        1
2    78    79.5     -1.5
3    80    78        2
4    73    76.5     -3.5
5    77    75        2
Σε̂t² = 1 + 2.25 + 4 + 12.25 + 4 = 23.5.
Σ(ε̂t - ε̂t-1)² = 2.5² + 3.5² + 5.5² + 5.5² = 79.
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)²/Σε̂t² = 79/23.5 = 3.36.
Comment: There are too few observations to draw any useful conclusions about possible serial correlation.
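Since the Durbin-Watson statistic recurs throughout this section, here is a minimal sketch of the calculation (the residuals shown are those of 30.3):

    def durbin_watson(resid):
        # DW = sum over t of (e_t - e_{t-1})^2, divided by the sum of e_t^2.
        num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
        den = sum(e * e for e in resid)
        return num / den

    print(round(durbin_watson([1.0, -1.5, 2.0, -3.5, 2.0]), 2))   # 3.36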

30.4. B. h = (1 - DW/2)√[N/(1 - N Var[β̂])] = (1 - 1.46/2)√[50/(1 - (50)(0.009))] = (.27)(9.5346) = 2.574. We perform a one-sided Normal Test.
Φ(2.326) = .990 and Φ(2.576) = .995. 2.326 < 2.574 < 2.576. We reject H0 at 1% and do not reject H0 at 0.5%.
Comment: We use Var[β̂] in computing h, since β is the coefficient of the lagged dependent variable.
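Durbin's h statistic of 30.4 (and of 30.6, 30.12, and 30.18) is a one-line formula. A sketch, with the caveat that h is undefined when N Var[β̂] is at least 1; the argument names are mine:

    import math

    def durbin_h(dw, n, var_beta_lagged):
        # h = (1 - DW/2) * sqrt(N / (1 - N * Var[beta_hat of the lagged dependent term]))
        denom = 1.0 - n * var_beta_lagged
        if denom <= 0:
            raise ValueError("h is not defined when N * Var[beta_hat] >= 1")
        return (1.0 - dw / 2.0) * math.sqrt(n / denom)

    print(round(durbin_h(1.46, 50, 0.009), 3))   # 2.574, used in a one-sided Normal test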

30.5. B. The Durbin-Watson Statistic can be used for models with lagged variables; however, since it would be biased against rejection, Pindyck and Rubinfeld recommend other tests.

30.6. E. h = (1 - DW/2)√[N/(1 - N Var[β̂])] = (1 - 1.78/2)√[250/(1 - (250)(.0023))] = 2.668.
Φ(2.576) = .995. 2.668 > 2.576. We reject H0 at 0.5%, a one-sided test.

30.7. B. One applies the t-test to test whether ρ is significantly different from zero. t = .35/√.031 = 1.988. We fit 44 residuals, with 4 parameters, for 40 degrees of freedom. The critical values at 10% and 5% are 1.684 and 2.021. 1.684 < 1.988 < 2.021.We reject H0 at 10% (two-sided test) and do not reject H0 at 5% (two-sided test). At 10%, we conclude there is serial correlation.

30.8. T̄ = 14.5. t = T - T̄. Ȳ = 35310/28 = 1261.07. y = Y - Ȳ. Σtiyi = -113,118. Σti² = 1827.
β̂ = Σtiyi/Σti² = -113,118/1827 = -61.91. α̂ = Ȳ - β̂T̄ = 1261.07 - (-61.91)(14.5) = 2159.


30.9. Ŷ = 2159 - 61.91T = (2097, 2035, 1973, 1911, 1849, 1788, 1726, 1664, 1602, 1540, 1478, 1416, 1354, 1292, 1230, 1168, 1107, 1045, 983, 921, 859, 797, 735, 673, 611, 549, 487, 426).
ε̂ = Y - Ŷ = (448, 434, 419, 282, 52, -70, -81, -118, -169, -251, -342, -375, -354, -308, -266, -213, -138, -96, -57, -41, -20, 25, 77, 129, 186, 233, 275, 333).
Σε̂t² = 1,689,870. Σ(ε̂t - ε̂t-1)² = 141,915.
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)²/Σε̂t² = 141,915/1,689,870 = .084.

30.10. DW ≅ 2(1 - ρ). .084 ≅ 2(1 - ρ). ⇒ ρ ≅ .96.
Comment: A graph of the residuals against t (t = 1 to 28) shows that they are highly serially correlated.
ρ ≅ 1 ⇒ ε̂t-1 ≅ ε̂t ⇒ Σ(ε̂t - ε̂t-1)² ≅ 0 ⇒ DW = Σ(ε̂t - ε̂t-1)²/Σε̂t² ≅ 0.
If instead ρ ≅ -1 ⇒ ε̂t-1 ≅ -ε̂t ⇒ Σ(ε̂t - ε̂t-1)² ≅ Σ(2ε̂t)² = 4Σε̂t² ⇒ DW = Σ(ε̂t - ε̂t-1)²/Σε̂t² ≅ 4.

30.11. A. Summing over t = 2 to 30:
Σ(ε̂t - ε̂t-1)² = Σε̂t² - 2Σε̂tε̂t-1 + Σε̂t-1² = (2422 - 7²) - (2)(801) + (2422 - 11²) = 3072.
DW = Σ(ε̂t - ε̂t-1)²/Σε̂t² = 3072/2422 = 1.268, where the sum in the denominator runs over t = 1 to 30.


30.12. The fitted parameters are: α̂ = 0.0490683, β̂ = -0.0117457, γ̂ = -0.00460421.
The standard errors are respectively: 0.00265772, 0.00900476, 0.000861757.
The t-statistics and p-values are: 18.463 (0%), -5.343 (0%), -1.304 (20.3%).
RSS = .0001787. ESS = .0001500. R² = .544. Adjusted R² = .510. F = 16.09 (0%). The Durbin-Watson Statistic is 1.258.
h = (1 - DW/2)√[N/(1 - N Var[β̂])] = (1 - 1.258/2)√[30/(1 - (30)(0.009004762))] = 2.051.
Φ(1.960) = .975 and Φ(2.326) = .990. 1.960 < 2.051 < 2.326. We reject H0 at 2.5% and do not reject H0 at 1% (one-sided test).

30.13. Y = (420, 450, 480, 490, 500, 510, 520, 550, 580, 610, 640, 650, 660, 670, 680, 710, 740).
α̂ = 407.059. β̂ = 19.2157. s² = 143.268. s_α̂ = 6.0721. s_β̂ = 0.59258.
t_α = 407.059/6.0721 = 67.04. t_β = 19.2157/0.59258 = 32.43.
RSS = 150,651. ESS = 2149. TSS = 152,800. R² = 0.986. Adjusted R² = 0.985.
F = 1052 = 32.43². The Durbin-Watson Statistic is 0.749.
Comment: Unlike the linear regression model, the errors are not random in this example. However, the errors are positively correlated, and therefore DW < 2.
For k = 1 (one explanatory variable excluding the constant) and N = 17, dl = 1.13.
Since 0.749 < 1.13, we reject H0: ρ = 0, at a 5% significance level.
One could add a random component to the errors, with E[ε1] = 0, E[ε2] = 10, etc.

30.14. A.
Day    Y    X   Ŷt = -25 + 20Xt   ε̂t = Yt - Ŷt   ε̂t - ε̂t-1
 1    11    2         15              -4
 2    20    2         15               5              9
 3    30    3         35              -5            -10
 4    39    3         35               4              9
 5    51    4         55              -4             -8
 6    59    4         55               4              8
 7    70    5         75              -5             -9
 8    80    5         75               5             10
DW = Σ(ε̂t - ε̂t-1)²/Σε̂t² = 571/164 = 3.48.

30.15. D. DW ≅ 2(1 - ρ). .8 ≅ 2(1 - ρ). ⇒ ρ ≅ .6.


30.16.
t    X    Y    Ŷ = -25 + 20X   ε̂t = Yt - Ŷt
1    2   10         15             -5
2    2   20         15              5
3    3   30         35             -5
4    3   40         35              5
DW = Σ(ε̂t - ε̂t-1)²/Σε̂t² = [10² + (-10)² + 10²]/[(-5)² + 5² + (-5)² + 5²] = 300/100 = 3.
Comment: The residuals appear to be negatively serially correlated.

30.17. D. The residuals, ε̂t, are: 77 - 77.6 = -.6, 69.9 - 70.6 = -.7, 73.2 - 70.9 = 2.3, 72.7 - 72.7 = 0, and 66.1 - 67.1 = -1.
Durbin-Watson statistic = Σ(ε̂t - ε̂t-1)²/Σε̂t² = [(-.1)² + 3² + (-2.3)² + (-1)²]/[(-.6)² + (-.7)² + 2.3² + 0² + (-1)²] = 15.3/7.14 = 2.143.
2.143 = DW ≅ 2(1 - ρ). ⇒ ρ ≅ -.071.
Comment: DW > 2 is some indication of negative serial correlation.

30.18. E. We are given that the Standard Error of β̂ is 0.1. ⇒ Var[β̂] = .1² = .01.
h = (1 - DW/2)√[N/(1 - N Var[β̂])] = (1 - 1.2/2)√[36/(1 - (36)(.01))] = 3.
h has a Standard Normal Distribution if there is no serial correlation.
If the alternative hypothesis is positive serial correlation, then we do a one-sided test:
1 - Φ(3) = 1 - .9987 = 0.13% < 1%. Reject the null hypothesis of no serial correlation at the 1% significance level.
If the alternative hypothesis is (any) serial correlation, then we do a two-sided test:
2(1 - Φ(3)) = 0.26% < 1%. Reject the null hypothesis of no serial correlation at the 1% significance level.
Comment: See Example 6.7 in Pindyck and Rubinfeld. The exam question really should have specified the alternative hypothesis. One uses Durbin's h test when the model includes a lagged dependent variable on the right side of the equation. We use Var[β̂] in computing h, since β is the coefficient of the lagged dependent variable.


31.1. D.
t   Actual   Fitted   ε̂t
1    68.9     69.9    68.9 - 69.9 = -1.0
2    73.2     72.8    73.2 - 72.8 = 0.4
3    76.8     75.6    76.8 - 75.6 = 1.2
4    79.0     78.5    79.0 - 78.5 = 0.5
5    80.3     81.4    80.3 - 81.4 = -1.1
Fit a linear regression ε̂t = ρε̂t-1 + error.
Since this is a model with no intercept, ρ̂ = Σε̂t-1ε̂t/Σε̂t-1² = [(-1)(.4) + (.4)(1.2) + (1.2)(.5) + (.5)(-1.1)]/[(-1.0)² + .4² + 1.2² + .5²] = .13/2.85 = .046.
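The no-intercept regression of ε̂t on ε̂t-1 used in 31.1 (and again in 31.8) reduces to a single ratio. A sketch:

    def rho_hat(resid):
        # Lag-1 serial correlation estimate: sum(e_{t-1} * e_t) / sum(e_{t-1}^2).
        num = sum(resid[t - 1] * resid[t] for t in range(1, len(resid)))
        den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
        return num / den

    print(round(rho_hat([-1.0, 0.4, 1.2, 0.5, -1.1]), 3))   # 0.046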

31.2. C. Y6* = Y6 - ρ Y5 = 33.743 - (.3)(14.122) = 29.5.

31.3. B. ρ = .4 has the smallest Error Sum of Squares.

31.4. X̄ = 18.5. x = (-17.5, -16.5, ..., 17.5). Y = ln(CPI) = (5.37805, 5.38404, ..., 5.46848).
β̂ = ΣxiYi/Σxi² = 9.92578/3885 = .0025549.
α̂ = Ȳ - β̂X̄ = 195.405/36 - (.0025549)(18.5) = 5.38065.
Comment: Data for 1995 to 1997.

31.5. Ŷ = α̂ + β̂X = 5.38065 + .0025549X = (5.3832, 5.38576, ..., 5.47007, 5.47263).
ε̂ = Y - Ŷ = (-0.005167, -0.001738, ..., -0.004561, -0.004159).
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)²/Σε̂t² = 0.00006859/0.0002635 = .260.
Comment: As is common for time series, positive serial correlation is indicated.


31.6. 1. Fit a regression and get the resulting residuals ε̂t.
From the previous solution, ε̂ = Y - Ŷ = (-0.005167, -0.001738, ..., -0.004561, -0.004159).
2. Estimate the serial correlation coefficient: ρ̂ = Σε̂t-1ε̂t/Σε̂t-1² = [(-0.005167)(-0.001738) + ... + (-0.004561)(-0.004159)]/[(-0.005167)² + ... + (-0.004561)²] = 0.0002072/0.0002462 = .842.
3. Let Xt* = Xt - ρXt-1 = (1.158, 1.316, 1.474, ..., 6.530). (There are now only 35 elements, rather than 36.)
Yt* = Yt - ρYt-1 = (0.855716, 0.85297, 0.853327, ..., 0.866510).
4. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
α̂(1 - ρ) = 0.851256. ⇒ α̂ = 0.851256/(1 - .842) = 5.38770. β̂ = 0.00227827.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the resulting residuals ε̂t.
Ŷ = α̂ + β̂X = 5.38770 + 0.00227827X = (5.38998, 5.39226, 5.39453, ..., 5.46972).
ε̂ = Y - Ŷ = (-0.0119259, -0.0082203, -0.00820657, ..., -0.00123573).
6. Estimate the serial correlation coefficient:
ρ̂ = Σε̂t-1ε̂t/Σε̂t-1² = 0.000585432/0.000691695 = .846.
7. Although one could do another iteration, the value of ρ seems to have converged sufficiently for most purposes.
One can use the equation with ρ = .842: Ŷ = 5.38770 + 0.00227827X.
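A compact sketch of the Cochrane-Orcutt iteration described in steps 1 through 7 above. The function and variable names are mine, and the simple OLS fit is written inline so the sketch is self-contained:

    def cochrane_orcutt(X, Y, n_iter=3):
        # Iterate: fit OLS, estimate rho from the residuals, refit on quasi-differences.
        def ols(x, y):
            xb, yb = sum(x) / len(x), sum(y) / len(y)
            b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
                / sum((xi - xb) ** 2 for xi in x)
            return yb - b * xb, b

        a, b = ols(X, Y)
        rho = 0.0
        for _ in range(n_iter):
            resid = [yi - (a + b * xi) for xi, yi in zip(X, Y)]
            rho = sum(resid[t - 1] * resid[t] for t in range(1, len(resid))) \
                  / sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
            Xs = [X[t] - rho * X[t - 1] for t in range(1, len(X))]
            Ys = [Y[t] - rho * Y[t - 1] for t in range(1, len(Y))]
            a_star, b = ols(Xs, Ys)
            a = a_star / (1.0 - rho)   # the transformed intercept is alpha * (1 - rho)
        return a, b, rho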


31.7. DW ≅ 2(1 - ρ). ⇒ ρ ≅ 1 - DW/2 = 1 - .26/2 = .87.
Thus try ρ = .83, .85, .87, .89, .91.
1. Let Xt* = Xt - ρXt-1, and Yt* = Yt - ρYt-1.
For example, for ρ = .87:
Xt* = (1.13, 1.26, 1.39, 1.52, 1.65, 1.78, 1.91, 2.04, 2.17, 2.3, 2.43, 2.56, 2.69, 2.82, 2.95, 3.08, 3.21, 3.34, 3.47, 3.60, 3.73, 3.86, 3.99, 4.12, 4.25, 4.38, 4.51, 4.64, 4.77, 4.90, 5.03, 5.16, 5.29, 5.42, 5.55).
Yt* = (0.705131, 0.702217, 0.702509, 0.702346, 0.703035, 0.705593, 0.70526, 0.704367, 0.706002, 0.705562, 0.704565, 0.709634, 0.708639, 0.706551, 0.706778, 0.707004, 0.70723, 0.709644, 0.708397, 0.70737, 0.709658, 0.708744, 0.707666, 0.712479, 0.711839, 0.711471, 0.71057, 0.71079, 0.710156, 0.711119, 0.711338, 0.710707, 0.711665, 0.712729, 0.713475).
2. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
For example, for ρ = .87: α̂(1 - ρ) = 0.700669 and β̂ = 0.00221417.
3. The best regression has the smallest Error Sum of Squares (ESS).
ρ:            .83      .85      .87      .89      .91
ESS (10⁻¹⁰): 556198   554812   555378   557898   56237
The best of these is ρ = .85.
4. Refine the grid of values for ρ, and again perform steps 1, 2, and 3:
[Graph: ESS versus ρ for ρ between 0.846 and 0.86; the ESS is minimized near ρ = 0.854.]
Take ρ = .854. For the transformed regression with ρ = .854:
α̂(1 - ρ) = 0.786716. ⇒ α̂ = 0.786716/(1 - .854) = 5.38847. β̂ = 0.00225381.
Translate the transformed equation back to the original variables: Ŷ = α̂ + β̂X = 5.38847 + 0.00225381X.
Comment: The results of using the two different procedures are very similar. How to forecast in the presence of serial correlation is discussed in a subsequent section.
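The Hildreth-Lu search of 31.7 can be sketched the same way; this version returns the grid value of ρ whose transformed regression has the smallest ESS (the names are mine; in practice one reruns it on a finer grid around the winner, as in the solution):

    def hildreth_lu(X, Y, rhos):
        # Return (rho, alpha, beta) minimizing the ESS of Y* = alpha(1 - rho) + beta X*.
        def ols(x, y):
            xb, yb = sum(x) / len(x), sum(y) / len(y)
            b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) \
                / sum((xi - xb) ** 2 for xi in x)
            return yb - b * xb, b

        best = None
        for rho in rhos:
            Xs = [X[t] - rho * X[t - 1] for t in range(1, len(X))]
            Ys = [Y[t] - rho * Y[t - 1] for t in range(1, len(Y))]
            a_star, b = ols(Xs, Ys)
            ess = sum((ys - (a_star + b * xs)) ** 2 for xs, ys in zip(Xs, Ys))
            if best is None or ess < best[0]:
                best = (ess, rho, a_star / (1.0 - rho), b)
        return best[1], best[2], best[3]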


31.8. B.
t   Actual   Fitted   ε̂t
1    77.0     77.6    77.0 - 77.6 = -0.6
2    69.9     70.6    69.9 - 70.6 = -0.7
3    73.2     70.9    73.2 - 70.9 = 2.3
4    72.7     72.7    72.7 - 72.7 = 0
5    66.1     67.1    66.1 - 67.1 = -1.0
Fit a linear regression ε̂t = ρε̂t-1 + error.
Since this is a model with no intercept, ρ̂ = Σε̂t-1ε̂t/Σε̂t-1² = [(-.6)(-.7) + (-.7)(2.3) + (2.3)(0) + (0)(-1)]/[(-.6)² + (-.7)² + 2.3² + 0²] = -1.19/6.14 = -.19.
Thus our first estimate of the lag 1 serial correlation coefficient is -.19.
Comment: The fitted values given in the question are not based on t being the (only) independent variable. When I fit a linear regression with t as the independent variable, I got Yt = 77.48 - 1.9t, which does not match the fitted values given in the question.

31.9. A. At page 159 of Pindyck and Rubinfeld, "Serial correlation does not affect the unbiasedness or consistency of the ordinary least squares estimators, but it does affect their efficiency." Therefore, A is false and B is true. For positive serial correlation, the R² probably gives an overly optimistic picture of the success of the regression, see page 160 of Pindyck and Rubinfeld. C is true. The lower efficiency of the ordinary least-squares estimator is masked; the estimator of the standard error of β̂ is biased downward. D is true. See page 159 of Pindyck and Rubinfeld. The Cochrane-Orcutt procedure is designed to correct for serial correlation, and therefore E is true. See page 163 of Pindyck and Rubinfeld.
Comment: The assumption of independent errors with constant variance was not necessary for the ordinary least-squares estimator of the slope to be unbiased.
In Appendix 3.1, Result 1, E[β̂] = β is derived, without any reference to the variance-covariance matrix of the errors. (We do use the fact that the errors have mean of zero.)


32.1. Corr[Height, Weight] = .790. Corr[Height, Chest Depth] = .791. Corr[Weight, Chest Depth] = .881. The high correlations between these independent variables may indicate multicollinearity. While the R² is high at .967, and the F-Statistic has a very small p-value of 0.07%, 3 of 4 slopes have t-statistics that are not significant at the 5% level. This is a possible sign of multicollinearity. In order to test for multicollinearity, one might calculate the condition number as mentioned by Pindyck and Rubinfeld. In order to test whether multicollinearity is causing problems, one could run regressions with one or more variables dropped, and see the effect on the standard errors.
An older boy is expected to have a higher maximal oxygen intake, yet β̂2 < 0. A boy who weighs more is expected to have larger lungs and to have a higher maximal oxygen intake, yet β̂4 < 0. These unexpected signs of slopes are possible signs of multicollinearity.
Comment: The model Y = β1 + β2X2 + β3X3 + β4X4 + ε was fit:
Parameter   Estimate    Standard Error   T-Statistic   P-Value
β1          -4.503        .5038            -8.939       0.01%
β2          -0.03631      .01405           -2.584       4.2%
β3           0.05254      .005380           9.766       0.006%
β4          -0.01970      .009078          -2.170       7.3%
R² = .966. Adjusted R² = .950. s² = .00119.
        D.F.   Sum of Squares   Mean Square   F-Ratio   P-Value
RSS      4        0.20581         0.0686        58       0.01%
ESS      5        0.00715         0.00119
TSS      9        0.21296
Dropping X5 from the model: R² ≥ .95, the p-value for the F-Test is .01%, and all the individual fitted parameters are now significant at the 10% level. This indicates that multicollinearity was indeed causing problems with the original model.

32.2. X2 and X3 are highly correlated, since a firm with more trucks pays on average more for insurance.
The written premium depends on the rate level charged in each state, so that X2 is probably significantly correlated (positively or negatively) with many of the state dummy variables.
If in the past there has been expansion or contraction by Hoffa Insurance of business written based on in which states a firm has large exposure, then one or more of the state dummy variables could be significantly correlated (positively or negatively) with the number of years insured with Hoffa Insurance.
A trucking firm is probably more likely to have exposures in neighboring states than in states distant from each other. Therefore, some pairs or groups of state dummy variables may be significantly correlated.
Comment: These are four possible answers; there may be others. Your exam is extremely unlikely to have open ended questions like this one.


32.3. C. For each regression, the t-statistic has 20 - 4 = 16 degrees of freedom, and the 10% significance critical value is 1.746. We would like the absolute value of all the t-statistics to be at least 1.746.
For regression A, R² = 773/(773 + 227) = .773. The t-statistics are: 44/16 = 2.75, 1.3/0.7 = 1.857, -2.4/0.5 = -4.8.
For regression B, R² = 798/1000 = .798. The t-statistics are: 40/18 = 2.22, 1.5/0.6 = 2.5, 0.35/0.16 = 2.19.
For regression C, R² = 769/1000 = .769. The t-statistics are: 37/25 = 1.48, -1.9/1.4 = -1.36, 0.41/0.23 = 1.78.
A high R² with few significant t-statistics is an indication of multicollinearity.
For regression D, R² = 785/1000 = .785. The t-statistics are: 1.7/0.7 = 2.42, -2.8/1.1 = -2.55, 0.45/0.24 = 1.88.
Comment: F = [R²/(1 - R²)][(N - k)/(k - 1)] = [R²/(1 - R²)](16/3), with 3 and 16 d.f.
For the smallest R², .769, F = (.769/.231)(16/3) = 17.75. Since the 1% critical value is 5.29, in each case the F-Statistic is significant.
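The informal screens used in 32.1 and 32.3 (high pairwise correlations among the regressors, and a high R² with few significant t-statistics) can be automated. A sketch of the correlation screen only; the helper names are mine, and this is not the condition-number calculation mentioned in the text:

    import math

    def corr(u, v):
        # Pearson correlation of two equal-length lists.
        n = len(u)
        ub, vb = sum(u) / n, sum(v) / n
        cov = sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
        return cov / math.sqrt(sum((ui - ub) ** 2 for ui in u)
                               * sum((vi - vb) ** 2 for vi in v))

    def pairwise_correlations(columns):
        # columns: dict mapping a variable name to its list of values.
        # Values near +1 or -1 are a warning sign of multicollinearity.
        names = sorted(columns)
        return {(a, b): corr(columns[a], columns[b])
                for i, a in enumerate(names) for b in names[i + 1:]}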

Solutions to problems in the remaining sections appear in Study Guide O.


Mahler’s Guide to Regression

Solutions to Problems
Sections 33-44

VEE-Applied Statistical Methods Exam

prepared by
Howard C. Mahler, FCAS
Copyright 2006 by Howard C. Mahler.

Study Aid F06-Reg-O

New England Actuarial Seminars
Howard Mahler
POB 315
Sharon, MA, 02067
[email protected]
www.neas-seminars.com


Solutions to Problems, Sections 33-44

33.1. A. X̄ = 3. x = (-2, -1, 0, 1, 2). Σxi² = 10.
ΣxiYi = -39640. β̂ = ΣxiYi/Σxi² = -39640/10 = -3964.0.
Ȳ = 403221/5 = 80644.2. α̂ = 80644.2 - (-3964.0)(3) = 92,536.
Forecasted frequency for year 8 is: 92,536 + (-3964.0)(8) = 60,824.

33.2. A. Ŷ = (88572, 84608, 80644, 76680, 72716).
ε̂ = Y - Ŷ = (-1816, 993, 2830, -1373, -633).
ESS = Σε̂i² = 14,578,623. s² = ESS/(N - 2) = 14,578,623/(5 - 2) = 4,859,541.

33.3. B. 8 - X̄ = 8 - 3 = 5. Variance of the forecast of the expected frequency in year 8 is:
s²[1/N + x²/Σxi²] = (4,859,541)[1/5 + 5²/10] = 13,120,761.
For the t-distribution with 5 - 2 = 3 degrees of freedom, the critical value for 5% area in both tails is 3.182.
95% confidence interval for the expected frequency in year 8 is: 60,824 ± 3.182√13,120,761 = 60,824 ± 11,526 = (49,298, 72,350).

33.4. C. Mean Squared Error of the forecast of the observed frequency in year 8 is:
s²[1 + 1/N + x²/Σxi²] = (4,859,541)[1 + 1/5 + 5²/10] = 17,980,302.
95% confidence interval for the observed frequency in year 8 is: 60,824 ± 3.182√17,980,302 = 60,824 ± 13,493 = (47,331, 74,317).
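Solutions 33.3 and 33.4 differ only in whether the variance of the error term is added to the variance of the fitted line. A sketch of both intervals, using the year-8 numbers from above (the function and argument names are mine):

    import math

    def forecast_intervals(s2, n, x_dev, sum_x2, point, t_crit):
        # x_dev is X0 - Xbar; sum_x2 is the sum of squared deviations of X.
        var_mean = s2 * (1.0 / n + x_dev ** 2 / sum_x2)        # expected value only
        var_obs = s2 * (1.0 + 1.0 / n + x_dev ** 2 / sum_x2)   # adds the error term's variance
        hm, ho = t_crit * math.sqrt(var_mean), t_crit * math.sqrt(var_obs)
        return (point - hm, point + hm), (point - ho, point + ho)

    # Year-8 frequency forecast of 33.3 and 33.4:
    print(forecast_intervals(4859541.0, 5, 5.0, 10.0, 60824.0, 3.182))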


33.5. E. X̄ = 228/12 = 19. x = (-11, -13, ..., 21). Σxi² = 924. Ŷ = 64.247 - 1.013X.
Ŷ = (56.14, 58.17, 53.10, 41.96, 50.06, 47.03, 46.01, 39.94, 45.00, 40.95, 37.91, 23.73).
ε̂ = Y - Ŷ = (2.86, -0.17, 2.90, 11.04, -0.06, -2.03, -3.01, 2.06, -6.00, -2.95, -7.91, 3.27).
ESS = Σε̂i² = 273.84. s² = ESS/(N - 2) = 273.84/(12 - 2) = 27.384.
Mean Squared Error of the forecast for X = 35: s²[1 + 1/N + x²/Σxi²] = (27.384)[1 + 1/12 + (35 - 19)²/924] = 37.253.
For the t-distribution with 12 - 2 = 10 degrees of freedom, the critical value for 5% area in both tails is 2.228.
The forecast for X = 35 is: 64.247 - (1.013)(35) = 28.792.
95% confidence interval: 28.792 ± 2.228√37.253 = 28.8 ± 13.6 = (15.2, 42.4).
Comment: An example of a cross-section study, as opposed to a time series.
The regression was fit as follows: Ȳ = 540/12 = 45. y = (14, 13, ..., -18). Σxiyi = -936.
β̂ = -936/924 = -1.013. α̂ = 45 - (-1.013)(19) = 64.247.
[Graph: % wormy against size, showing the data, the fitted line, and the 95% confidence interval (dashed).]
Note that the confidence intervals get wider as we get further from X̄ = 19.


33.6. B. The forecasted expected ln[severity] in year 8 is: 8.64516 + (0.0979168)(8) = 9.42849.
X̄ = 3. x = (-2, -1, 0, 1, 2). Σxi² = 10. 8 - X̄ = 8 - 3 = 5.
Variance of the forecast of the expected ln[severity] in year 8 is: s²[1/N + x²/Σxi²] = (0.00191151)[1/5 + 5²/10] = .0051611.
For the t-distribution with 5 - 2 = 3 degrees of freedom, the critical value for 10% area in both tails is 2.353.
90% confidence interval for the expected ln[severity] in year 8 is: 9.42849 ± 2.353√.0051611 = 9.42849 ± .16904 = (9.25945, 9.59753).
Exponentiating, the 90% confidence interval for the expected severity in year 8 is: (10503, 14728).

33.7. E. (1, 2, 100, 200) (Covariance Matrix) (1, 2, 100, 200) = 8.795.The predicted expected score is: 18.40 + (2.04)(2) + (0.1120)(100) + (.09371)(200) = 52.42.95% confidence interval: 52.42 ± (1.960)√8.795 = 52.4 ± 5.8. Upper end is 58.2.

33.8. A. (1, 0, 150, 50) (Covariance Matrix) (1, 0, 150, 50) = 4.05.Adding s2, the mean squared forecast error is: 20.38 + 4.05 = 24.43.The predicted expected score is: 18.40 + (2.04)(0) + (0.1120)(150) + (.09371)(50) = 39.89.90% confidence interval for the observed score: 39.89 ± (1.645)√24.43 = 39.9 ± 8.1.Upper end is 48.0.

33.9. D. (1, 1, 50, 250) (Covariance Matrix) (1, 1, 50, 250) = 9.35.Adding s2, the mean squared forecast error is: 20.38 + 9.35 = 29.73.The predicted expected score is: 18.40 + (2.04)(1) + (0.1120)(50) + (.09371)(250) = 49.47.Prob[Y ≥ 54.2] = 1 - Φ((54.2 - 49.47)/√29.73) = 1 - Φ(.867) = 19.3%.

33.10. D. X̄ = 1985. x = (-15, -5, 5, 15). Σxi² = 500.
ΣxiYi = -14765. β̂ = ΣxiYi/Σxi² = -14765/500 = -29.53.
Ȳ = 12583/4 = 3145.75. α̂ = 3145.75 - (-29.53)(1985) = 61,762.8.
Forecasted value for 2005 is: 61,762.8 + (-29.53)(2005) = 2555.
Ŷ = (3588.7, 3293.4, 2998.1, 2702.8).
ε̂ = Y - Ŷ = (-8.7, 8.6, 8.9, -8.8).
ESS = Σε̂i² = 306.3. s² = ESS/(N - 2) = 306.3/(4 - 2) = 153.2.
Variance of the forecast of the expected value of 100,000q70 in 2005 is:
s²[1/N + x²/Σxi²] = (153.2)[1/4 + (2005 - 1985)²/500] = 160.9.
For the t-distribution with 4 - 2 = 2 degrees of freedom, the critical value for 1% area in both tails is 9.925.
99% confidence interval for the expected value of 100,000q70 in 2005 is: 2555 ± 9.925√160.9 = 2555 ± 126 = (2429, 2681).
Comment: The mortality data was made up.


33.11. E. Since Y is LogNormal, lnY is Normal. For X = 5, the estimate of lnY is: 8.743 + (5)(.136) = 9.423.
X̄ = 13,283/4000 = 3.32075. The second moment of X is: 54,940/4000 = 13.735.
Variance of X is: 13.735 - 3.32075² = 2.7076. Σxi² = (4000)(2.7076) = 10,830.
s² = ESS/(N - 2) = 3285/3998 = .8217.
5 - X̄ = 1.67925. Variance of the forecast at x = X - X̄ is: s²[1/N + x²/Σxi²] = (.8217)(1/4000 + 1.679²/10,830) = .000419.
90% confidence interval for lnY is: 9.423 ± (1.645)√.000419 = 9.423 ± .034.
90% confidence interval for Y is: e^9.389 to e^9.457, i.e., 11,956 to 12,797.
Comment: Not intended as a realistic model of private passenger automobile losses.

33.12. E. Mean Squared Forecast Error at x = X - X̄ is: s²[1 + 1/N + x²/Σxi²] = (.8217)(1 + 1/4000 + 1.679²/10,830) = .8221.
90% confidence interval for lnY is: 9.423 ± (1.645)√.8221 = 9.423 ± 1.492.
90% confidence interval for Y is: e^7.931 to e^10.915, i.e., 2782 to 54,995.
Comment: Due to the process variance of a LogNormal, the 90% confidence interval for the observed size of loss is very wide.

33.13. A. The expected finish time (seconds exceeding 2 hours) in 2006 (t = 2006 - 1980 = 26) is: 596.496 - (0.86735)(26) = 573.95.
X = (0, 1, ..., 25). X̄ = 12.5. x = X - X̄ = (-12.5, -11.5, ..., 12.5). Σxi² = 1462.5.
Var[β̂] = s²/Σxi². ⇒ s² = Var[β̂]Σxi² = (6.1238)(1462.5) = 8956.1.
Variance of the Forecast for 2006 is: Var[α̂ + 26β̂] = Var[α̂] + 26²Var[β̂] + (2)(26)Cov[α̂, β̂] = 1301.31 + (676)(6.1238) + (52)(-76.548) = 1460.50.
Mean Squared Forecast Error for 2006 is: 1460.50 + s² = 1460.50 + 8956.1 = 10,417.
For N - 2 = 24 degrees of freedom, the critical value for the 1% area in both tails for the t-distribution is 2.797.
A 99% confidence interval for the observed finish time (seconds exceeding 2 hours) in 2006: 573.95 ± 2.797√10,417 = 574 ± 285 = (289, 859).
Translating back to hours and minutes: (2:04:49, 2:14:19).
Alternately, the Mean Squared Forecast Error for 2006 (X = 26) is: s²[1 + 1/N + x²/Σxi²] = (8956.1)(1 + 1/26 + 13.5²/1462.5) = 10,417. Proceed as before.
Comment: The slope is not significantly different from zero; for a t-test the p-value is 73%.

33.14. D. The forecast for 2006 is 573.95, with mean squared error 10,417. The observed value for 2006, converting to seconds beyond two hours, is: (60)(7) + 14 = 434.
Normalized forecast error = (573.95 - 434)/√10,417 = 1.37.


33.15. E. X̄ = 2.5. x = (-1.5, -.5, .5, 1.5). Σxi² = 5.
ΣxiYi = 32. β̂ = ΣxiYi/Σxi² = 32/5 = 6.4.
Ȳ = 1026/4 = 256.5. α̂ = 256.5 - (6.4)(2.5) = 240.5.
Forecasted value for year 5 is: 240.5 + (5)(6.4) = 272.5.
Ŷ = (246.9, 253.3, 259.7, 266.1).
ε̂ = Y - Ŷ = (4.1, -6.3, 0.3, 1.9).
ESS = Σε̂i² = 60.2. s² = ESS/(N - 2) = 60.2/(4 - 2) = 30.1.
Mean Squared Error of the forecast of the pure premium observed in year 5 is: s²[1 + 1/N + x²/Σxi²] = (30.1)[1 + 1/4 + (5 - 2.5)²/5] = 75.25.
For the t-distribution with 4 - 2 = 2 degrees of freedom, the critical value for 5% area in both tails is 4.303.
95% confidence interval for the pure premium observed in year 5 is: 272.5 ± 4.303√75.25 = 272.5 ± 37.3 = (235, 310).

33.16. D. β̂ = ΣXiYi/ΣXi² = 126,096,439/317,475,058 = .397185.
ESS = Σ(Yi - Ŷi)² = (3312 - 3192.2)² + (3090 - 3156.8)² + (2983 - 3094.9)² + (3211 - 3227.1)² + (3224 - 3152.1)² = 36,765.
s² = ESS/(5 - 1) = 36,765/4 = 9191.
Var[β̂] = s²/ΣXi² = 9191/317,475,058 = .00002895.
For Accident Year 2004, the expected number of claims reported after December 31, 2004 is: (.397185)(8431) = 3348.7.
Var[8431β̂] = (8431²)(.00002895) = 2058.
The mean squared error of the estimate compared to the observed is: 2058 + 9191 = 11,249.
Expected number of claims for Accident Year 2004 is: 8431 + 3348.7 = 11,779.7.
For the t-distribution with 5 - 1 = 4 degrees of freedom, 2.132 is the critical value for 10% area in both tails.
90% confidence interval is: 11,779.7 ± 2.132√11,249 = 11,780 ± 226 = (11,554, 12,006).


33.17. Assuming the model is a reasonable one for boys ages 10 to 14, it is probably reasonable to use the model to estimate the average height of a boy aged 15: 30.6 + (2.44)(15) = 67.2 inches.
It would not be reasonable to use the model to estimate the average height of a boy aged 0: 30.6 + (2.44)(0) = 30.6 inches!
It would not be reasonable to use the model to estimate the average height of a man aged 60: 30.6 + (2.44)(60) = 177 inches = 14.75 feet!
In general, one should be cautious about applying a model outside the range of data used to fit that model.
Comment: Var[β̂] = s²/(N Var[X]) = 9.2/{(1000)(2)} = 0.0046.
Var[α̂] = E[X²]Var[β̂] = (146)(.0046) = 0.6716.
Cov[α̂, β̂] = -X̄ Var[β̂] = -(12)(.0046) = -0.0552.
The variance of the forecast of the mean height of a boy aged 15 is:
Var[α̂ + 15β̂] = Var[α̂] + 30Cov[α̂, β̂] + 225Var[β̂] = 0.6716 + (30)(-0.0552) + (225)(0.0046) = 0.0506.
Alternately, the variance of the forecast of the mean height of a boy aged 15 is: (9.2)(1/1000 + (15 - 12)²/{(1000)(2)}) = 0.0506.
The mean squared error of the forecast of the height of a boy aged 15 is: (9.2)(1 + 1/1000 + (15 - 12)²/{(1000)(2)}) = 9.25.
Thus almost all of the forecast error would be due to the variance of the distribution of heights of boys aged 15, rather than an error in predicting the average height of a boy aged 15, assuming the model is valid.
By the way, in this case, the error terms are likely to be not quite homoscedastic, with a slightly larger variance of the heights of boys aged 14 than the variance of the heights of boys aged 10.

33.18 & 33.19. The expected catastrophe losses in 2005 are: -1057.7 + (0.535385)(2005) = 15.75.
s² = 8.42436² = 70.9698. N = 14.
X̄ = 1997.5. x = X - X̄ = (-6.5, -5.5, ..., 6.5). Σxi² = 227.5.
Variance of the Forecast at x = 2005 - X̄ = 7.5 is: s²[1/N + x²/Σxi²] = (70.9698)(1/14 + 7.5²/227.5) = 22.617.
For N - 2 = 12 degrees of freedom, the critical value for the 10% area in both tails for the t-distribution is 1.782.
90% confidence interval for the expected catastrophe losses in 2005: 15.75 ± 1.782√22.617 = 15.75 ± 8.47 = (7.3, 24.2).
Mean Squared Forecast Error at x = 2005 - X̄ = 7.5 is: s²[1 + 1/N + x²/Σxi²] = (70.9698)(1 + 1/14 + 7.5²/227.5) = 93.587.
90% confidence interval for the observed catastrophe losses in 2005: 15.75 ± 1.782√93.587 = 15.75 ± 17.24 = (0, 33).
Comment: Catastrophe losses are never negative. There are large errors in the estimated coefficients of the model, and therefore the interval for the expected catastrophe losses is
very wide. Linear regression is not a practical tool for predicting catastrophe losses. Due to the large random fluctuation in catastrophe losses year to year, the interval for the observed catastrophe losses is extremely wide and useless.

33.20. (i) There appears to be a linear relationship with a positive slope.

[Scatterplot of y against x for the ten observations.]

(ii) Var[X] = 91.08/10 - 2.66² = 2.0324. Var[Y] = 5603.12/10 - 20.28² = 149.0336.
Cov[X, Y] = 707.58/10 - (2.66)(20.28) = 16.8132. Corr[X, Y] = 16.8132/√{(2.0324)(149.0336)} = 0.966.
A large positive correlation, agreeing with the previous comment.

(iii) β̂ = [NΣXiYi - ΣXiΣYi]/[NΣXi² - (ΣXi)²] = [(10)(707.58) - (26.6)(202.8)]/[(10)(91.08) - 26.6²] = 1681.32/203.24 = 8.2726.
α̂ = Ȳ - β̂X̄ = 20.28 - (8.2726)(2.66) = -1.725.
The fitted regression is: y = -1.73 + 8.273x.
(iv) (a) TSS = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/N = 5603.12 - 202.8²/10 = 1490.336.
RSS = [ΣXiYi - ΣXiΣYi/N]²/[ΣXi² - (ΣXi)²/N] = 168.132²/20.324 = 1390.886.
Alternately, RSS = TSS R² = TSS Corr[X, Y]² = (1490.336)(0.966²) = 1391.
ESS = TSS - RSS = 1490.336 - 1390.886 = 99.450.
(b) R² = RSS/TSS = 1390.886/1490.336 = 0.933. This is also the square of the correlation between x and y: 0.966² = 0.933.
(c) s² = ESS/(N - 2) = 99.450/8 = 12.43.
(v) Recalling that x, the ski-lift capacity, is given in thousands, a 500 increase in skiers per hour is an increase in x of 0.5.
The expected increase in y is: 0.5β̂ = (0.5)(8.2726) = 4.136, or 4136 visitor days.
Var[β̂] = s²/(N Var[X]) = 12.43/20.324 = 0.6116.
Var[.5β̂] = .5²Var[β̂] = (.25)(0.6116) = 0.1529.
StdDev[.5β̂] = √0.1529 = 0.391, or a standard error of the estimate of 391 visitor days.


33.21. (i) There appears to be a linear relationship between x and y.

[Scatterplot of y against x for the twelve observations.]

(ii) β̂ = [NΣXiYi - ΣXiΣYi]/[NΣXi² - (ΣXi)²] = [(12)(650,264.8) - (516.4)(14,821)]/[(12)(22,741.34) - 516.4²] = 149,613.2/6227.12 = 24.026.
α̂ = Ȳ - β̂X̄ = 14,821/12 - (24.026)(516.4/12) = 201.16.
The fitted line is: Y = 201.2 + 24.03X.
(iii) TSS = Σ(Yi - Ȳ)² = ΣYi² - (ΣYi)²/N = 18,695,125 - 14,821²/12 = 389,955.
RSS = β̂²Σ(Xi - X̄)² = β̂²[ΣXi² - (ΣXi)²/N] = (24.026²)(22,741.34 - 516.4²/12) = 299,550.
Alternately, RSS = Sxy²/Sxx = [ΣXiYi - ΣXiΣYi/N]²/[ΣXi² - (ΣXi)²/N] = [650,264.8 - (516.4)(14,821)/12]²/[22,741.34 - 516.4²/12] = 12,467.8²/518.93 = 299,553.
ESS = TSS - RSS = 389,955 - 299,550 = 90,405.
s² = ESS/(N - 2) = 90,405/10 = 9040.5.
Var[β̂] = s²/(N Var[X]) = 9040.5/[ΣXi² - (ΣXi)²/N] = 9040.5/(22,741.34 - 516.4²/12) = 17.42.
s_β̂ = √17.42 = 4.17.
For 10 degrees of freedom, using the t-table, the critical value for 5% area in both tails is 2.228.
95% confidence interval for β is: 24.03 ± (2.228)(4.17) = (14.7, 33.3).
We have assumed Normal errors with a constant variance.


(iv) Variance of the forecast at x = X - X̄ is: s²[1/N + x²/Σxi²].
X̄ = 516.4/12 = 43.033. Σxi² = Σ(Xi - X̄)² = ΣXi² - (ΣXi)²/N = 22,741.34 - 516.4²/12 = 518.93.
s²[1/N + x²/Σxi²] = (9040.5)(1/12 + x²/518.93).
(a) For X = 50, the forecast is: 201.2 + (24.03)(50) = 1402.7.
For X = 50, x = X - X̄ = 50 - 43.033 = 6.967.
Variance is: (9040.5)(1/12 + 6.967²/518.93) = 1599.
95% confidence interval for the mean resting metabolic rate when X = 50 is: 1402.7 ± (2.228)(√1599) = 1402.7 ± 89.1 = (1314, 1492).
(b) For X = 75, the forecast is: 201.2 + (24.03)(75) = 2003.5.
For X = 75, x = X - X̄ = 75 - 43.033 = 31.967.
Variance is: (9040.5)(1/12 + 31.967²/518.93) = 18,556.
95% confidence interval for the mean resting metabolic rate when X = 75 is: 2003.5 ± (2.228)(√18,556) = 2003.5 ± 303.5 = (1700, 2307).
(v) The confidence interval for X = 50 is okay. However, in the case of X = 75 we are extrapolating significantly outside the values of X contained in the data. Therefore, considerable caution is needed in using this confidence interval.
Comment: Here is a graph of the fitted line and the data:

[Graph of the data and the fitted line, metabolic rate y against mass x.]


33.22. The fitted line is: Y = 201.2 + 24.03X. The ends of the confidence intervals for the mean resting metabolic rate are: 201.2 + 24.03X ± (2.228)√{(9040.5)(1/12 + (X - X̄)²/518.93)}.
[Graph: metabolic rate against mass, showing the fitted line and the confidence band for the mean.]
Note how the confidence intervals are wider further from X̄ = 516.4/12 = 43.03.

33.23. The ends of the confidence intervals for the observed resting metabolic rate are: 201.2 + 24.03X ± (2.228)√{(9040.5)(1 + 1/12 + (X - X̄)²/518.93)}.
[Graph: metabolic rate against mass, showing the fitted line and the band for observed values.]
Note how the confidence intervals for the observed values are much wider than those for the mean values. The observed values vary around the mean value.


33.24. C. Var[Ŷ10] = Var[α̂ + 10β̂] = Var[α̂] + Var[10β̂] + 2Cov[α̂, 10β̂] = Var[α̂] + 100Var[β̂] + 20Cov[α̂, β̂] = .00055 + (100)(0.00002) + (20)(-0.0001) = .00055.
√.00055 = .0235.
Alternately, one can use the delta method. h(α, β) = Y10 = α + 10β. ∂h/∂α = 1 and ∂h/∂β = 10. Therefore, the variance of the forecast is:
(1  10) ( 0.00055  -0.00010) ( 1)
        (-0.00010   0.00002) (10)   =   .00055.
The standard deviation of the forecast is: √.00055 = .0235.
Comment: The delta method, covered on Exam 4/C, only measures the error in forecasting the expected value, not the error due to random fluctuation of the future observation around its expected value. The delta method measures the effect of the random fluctuations contained in the observations used to fit the model, not the random fluctuations of future observations. Note that equation 8.12 in Econometric Models and Economic Forecasts, by Pindyck and Rubinfeld, includes an additional term of σ², the variance of ε, the error term in the model. This term takes into account the additional error due to the random fluctuation of the loss ratio for year 10 around its expected value.

33.25. For 8 data points, we have 8 - 2 = 6 degrees of freedom. For the t-distribution with 6 d.f., for 5% area in both tails, the critical value is 2.447. The forecast is: .50 + (10)(.02) = .700. An approximate 95% confidence interval is: .700 ± (2.447)(.0235) = .700 ± .058.


33.26. (i) There seems to be an increasing and linear relationship.

[Scatterplot of y against x, showing an increasing, roughly linear relationship.]

(ii) β̂ = [(10)(238.3676) - (50.02)(47.12)] / [(10)(224.8554) - 47.12²] = 0.946.
α̂ = Ȳ - β̂X̄ = 5.002 - (0.946)(4.712) = 0.544.
Fitted line is: y = 0.544 + 0.946x.
(iii) R² = β̂² Sxx/Syy = (0.946²)(224.8554 - 47.12²/10)/(253.5796 - 50.02²/10) = 0.748.
Alternately, TSS = 253.5796 - 50.02²/10 = 3.37956.
RSS = Sxy²/Sxx = [238.3676 - (50.02)(47.12)/10]² / (224.8554 - 47.12²/10) = 2.5290.
R² = RSS/TSS = 2.5290/3.37956 = 0.748. This high R² agrees with the apparent linear relationship.
(iv) ESS = TSS - RSS = 3.37956 - 2.5290 = 0.8506. s² = ESS/(N - 2) = 0.8506/8 = 0.1063.
ESS/σ² follows a Chi-Square Distribution with 8 degrees of freedom. There is 95% probability between 2.18 and 17.53 on this Chi-Square Distribution.
0.8506/17.53 = 0.0485 ≤ σ² < 0.8506/2.18 = 0.390. A 95% confidence interval for σ²: (0.0485, 0.390).
(v) Var[β̂] = s²/Sxx = 0.1063/(224.8554 - 47.12²/10) = 0.0376.
For a t-distribution with 8 degrees of freedom, for a total of 5% area in both tails, the critical value is 2.306.
0.946 ± (2.306)√0.0376 = 0.946 ± 0.447 = (0.499, 1.393).
(vi) The forecasted value is: 0.544 + (5)(0.946) = 5.274.
Var[forecast] = s²[1/N + (5 - X̄)²/Sxx] = (0.1063)[1/10 + (5 - 4.712)²/2.82596] = 0.01375.
5.274 ± 2.306√0.01375 = 5.274 ± 0.270 = (5.004, 5.544).
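The arithmetic of 33.26 can be checked with a short Python sketch built only from the summary statistics quoted above; the variable names are mine.

import numpy as np

n = 10
sum_x, sum_y = 47.12, 50.02                       # sums of x4 and y from the problem
sum_xx, sum_yy, sum_xy = 224.8554, 253.5796, 238.3676

Sxx = sum_xx - sum_x**2 / n                       # 2.82596
Syy = sum_yy - sum_y**2 / n                       # 3.37956 = TSS
Sxy = sum_xy - sum_x * sum_y / n                  # 2.67336

beta = Sxy / Sxx                                  # 0.946
alpha = sum_y / n - beta * sum_x / n              # 0.544
rss = Sxy**2 / Sxx                                # 2.529
ess = Syy - rss                                   # 0.851
r2, s2 = rss / Syy, ess / (n - 2)                 # 0.748 and 0.1063

t_crit = 2.306                                    # t, 8 d.f., 5% in both tails
se_beta = np.sqrt(s2 / Sxx)
print(beta - t_crit * se_beta, beta + t_crit * se_beta)        # about (0.50, 1.39)

x0 = 5.0                                          # forecast the mean of Y at X = 5
se_mean = np.sqrt(s2 * (1 / n + (x0 - sum_x / n)**2 / Sxx))
print(alpha + beta * x0 - t_crit * se_mean,
      alpha + beta * x0 + t_crit * se_mean)                    # about (5.00, 5.54)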


34.1. B. For the first ten years, X̄ = 5.5. x = (-4.5, -3.5, ..., 4.5). Σxi² = 82.5. ΣxiYi = -2.4955.
β̂ = ΣxiYi/Σxi² = -2.4955/82.5 = -0.0302485. Ȳ = 1.3983.
α̂ = 1.3983 - (-0.0302485)(5.5) = 1.5647.
Forecasted average rate for time 15 is: 1.5647 + (15)(-0.0302485) = 1.111.

34.2. E. Forecasted average rates for times 11 to 15 are: 1.23194, 1.20169, 1.17144, 1.14119, 1.11094.
Mean squared error is: [(1.23194 - 1.366)² + (1.20169 - 1.342)² + (1.17144 - 1.334)² + (1.14119 - 1.307)² + (1.11094 - 1.362)²]/5 = 0.03092.
Root mean squared error = √0.03092 = 0.176.

34.3. C. 2nd moment of forecasts is: (1.23194² + 1.20169² + 1.17144² + 1.14119² + 1.11094²)/5 = 1.37410.
2nd moment of observations is: (1.366² + 1.342² + 1.334² + 1.307² + 1.362²)/5 = 1.80195.
U = 0.176/(√1.37410 + √1.80195) = 0.070.

34.4. E. Mean of forecasts is: (1.23194 + 1.20169 + 1.17144 + 1.14119 + 1.11094)/5 = 1.17144.
Mean of observations is: (1.366 + 1.342 + 1.334 + 1.307 + 1.362)/5 = 1.3422.
UM = (1.17144 - 1.3422)²/0.03092 = 94.3%.

34.5. A. Standard deviation of forecasts = √(1.37410 - 1.17144²) = 0.04276.
Standard deviation of observations = √(1.80195 - 1.3422²) = 0.02119.
US = (0.04276 - 0.02119)²/0.03092 = 1.5%.

34.6. B. Correlation of the forecasts and the observations is:
[(1.23194 - 1.17144)(1.366 - 1.3422) + (1.20169 - 1.17144)(1.342 - 1.3422) + (1.17144 - 1.17144)(1.334 - 1.3422) + (1.14119 - 1.17144)(1.307 - 1.3422) + (1.11094 - 1.17144)(1.362 - 1.3422)]/5 / [(0.04276)(0.02119)] = (0.00130075/5)/0.0009061 = 0.287.
UC = (2)(1 - 0.287)(0.04276)(0.02119)/0.03092 = 4.2%.
Comment: UM + US + UC = 94.3% + 1.5% + 4.2% = 100%.
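The whole decomposition in 34.2-34.6 can be packaged in a few lines of Python (a sketch assuming numpy; the function name is mine, and the proportions use the divide-by-N standard deviations, as in the solutions above).

import numpy as np

def theil_decomposition(forecast, actual):
    """Root mean squared error, Theil's U, and its bias/variance/covariance proportions."""
    f, a = np.asarray(forecast, float), np.asarray(actual, float)
    mse = np.mean((f - a) ** 2)
    rmse = np.sqrt(mse)
    u = rmse / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(a ** 2)))
    sf = np.sqrt(np.mean((f - f.mean()) ** 2))     # divide-by-N standard deviations
    sa = np.sqrt(np.mean((a - a.mean()) ** 2))
    r = np.mean((f - f.mean()) * (a - a.mean())) / (sf * sa)
    um = (f.mean() - a.mean()) ** 2 / mse          # bias proportion
    us = (sf - sa) ** 2 / mse                      # variance proportion
    uc = 2 * (1 - r) * sf * sa / mse               # covariance proportion
    return rmse, u, um, us, uc

# Forecasts and observations for times 11 to 15, from 34.2-34.6.
print(theil_decomposition([1.23194, 1.20169, 1.17144, 1.14119, 1.11094],
                          [1.366, 1.342, 1.334, 1.307, 1.362]))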

34.7. B. Mean forecast = (76 + 106 + 110 + 142)/4 = 108.5. Mean actual = (81 + 93 + 125 + 129)/4 = 107. Bias = 108.5 - 107 = 1.5.
MSE = [(76 - 81)² + (93 - 106)² + (110 - 125)² + (142 - 129)²]/4 = 588/4 = 147.
Bias proportion of inequality is: Bias²/MSE = 1.5²/147 = 0.0153.
Comment: Similar to VEE-Applied Statistics Exam, 8/05, Q.6.


34.8. A. 2nd moment of forecasted = (76² + 106² + 110² + 142²)/4 = 12,319. Variance of forecasted = 12,319 - 108.5² = 546.75.
2nd moment of actual = (81² + 93² + 125² + 129²)/4 = 11,869. Variance of actual = 11,869 - 107² = 420.
Standard Deviation of forecasted = √546.75 = 23.383. Standard Deviation of actual = √420 = 20.494.
Variance proportion of inequality is: (23.383 - 20.494)²/147 = 0.0568.
Comment: E[(forecast)(actual)] = [(76)(81) + (106)(93) + (110)(125) + (142)(129)]/4 = 12,020.5.
Cov[forecast, actual] = 12,020.5 - (108.5)(107) = 411. Corr[forecast, actual] = 411/[(23.383)(20.494)] = 0.85766.
Covariance proportion of inequality is: 2(1 - 0.85766)(23.383)(20.494)/147 = 0.9280.

34.9. C. In an ex post forecast, all of the values of the dependent variable are known at the time of the forecast, in contrast to an ex ante forecast. Comment: See pages 202-203 of Pindyck and Rubinfeld.

34.10. B. MSE = Σ(Fi - Oi)²/100 = [ΣFi² + ΣOi² - 2ΣFiOi]/100 = [11,330 + 15,856 - (2)(10,281)]/100 = 66.24. RMSE = √66.24 = 8.14.

34.11. E. Second Moment of forecasts = 11,330/100 = 113.30.Second Moment of observations = 15,856/100 = 158.56.U = 8.14/(√113.30 + √158.56) = .350.

34.12. A. F̄ = 872/100 = 8.72. Ō = 981/100 = 9.81. UM = (8.72 - 9.81)²/66.24 = 1.79%.

34.13. D. Standard deviation of forecasts = √(113.30 - 8.72²) = 6.104. Standard deviation of observations = √(158.56 - 9.81²) = 7.895.
US = (6.104 - 7.895)²/66.24 = 4.84%.

34.14. C. Correlation of the forecasts and the observations is:
[Σ(Fi - F̄)(Oi - Ō)/100]/(σF σO) = (ΣFiOi/100 - F̄ Ō)/[(6.104)(7.895)] = [10,281/100 - (8.72)(9.81)]/48.191 = 0.3583.
UC = (2)(1 - 0.3583)(6.104)(7.895)/66.24 = 93.37%.
Comment: UM + US + UC = 1.79% + 4.84% + 93.37% = 100%.

34.15. E. All of these statements are true. Since we have the average premium for June, the forecast of the June premium is ex post. Since we do not yet have the average premium for July, the forecast of the July premium is ex ante.We know the CPI for June, so the forecast of August average premiums is unconditional. We do not yet know the CPI for July; we would need to somehow predict it. Thus the forecast of September average premium is conditional.


34.16. A. For the first 12 values, X̄ = 6.5. x = (-5.5, -4.5, ..., 5.5). Σxi² = 143. ΣxiYi = 120.65.
β̂ = ΣxiYi/Σxi² = 120.65/143 = 0.843706.
Ȳ = 297.075. α̂ = 297.075 - (0.843706)(6.5) = 291.591.
Forecasted value for time 18 is: 291.591 + (18)(0.843706) = 306.778.

34.17. C. Forecasted average rates for times 13 to 18 are: 302.559, 303.403, 304.247, 305.090, 305.934, 306.778.
Mean squared error is: [(302.559 - 303.6)² + (303.403 - 306.0)² + (304.247 - 307.5)² + (305.090 - 308.3)² + (305.934 - 309.0)² + (306.778 - 310.0)²]/6 = 8.083.
Root mean squared error = √8.083 = 2.84.
Comment: The CPI for Medical Care for 2003 and the first half of 2004.

34.18. A. 2nd moment of forecasts is: (302.559² + 303.403² + 304.247² + 305.090² + 305.934² + 306.778²)/6 = 92,825.
2nd moment of observations is: (303.6² + 306.0² + 307.5² + 308.3² + 309.0² + 310.0²)/6 = 94,499.
U = 2.84/(√92,825 + √94,499) = 0.0046.

34.19. E. Mean forecast = (174 + 193 + 212 + 231)/4 = 202.5. Mean actual = (186 + 206 + 227 + 242)/4 = 215.25. Bias = 202.5 - 215.25 = -12.75.
MSE = [(174 - 186)² + (193 - 206)² + (212 - 227)² + (231 - 242)²]/4 = 659/4 = 164.75.
Bias proportion of inequality is: Bias²/MSE = 12.75²/164.75 = 0.9867.

34.20. C. 2nd moment of forecasted = (174² + 193² + 212² + 231²)/4 = 41,457.5. Variance of forecasted = 41,457.5 - 202.5² = 451.25.
2nd moment of actual = (186² + 206² + 227² + 242²)/4 = 46,781.25. Variance of actual = 46,781.25 - 215.25² = 448.6875.
E[(forecast)(actual)] = [(174)(186) + (193)(206) + (212)(227) + (231)(242)]/4 = 44,037.
Cov[forecast, actual] = 44,037 - (202.5)(215.25) = 448.875. Corr[forecast, actual] = 448.875/√[(451.25)(448.6875)] = 0.99757.
Covariance proportion of inequality is: 2(1 - 0.99757)√[(451.25)(448.6875)]/164.75 = 0.0133.
Comment: Standard Deviation of forecasted = √451.25 = 21.2426. Standard Deviation of actual = √448.6875 = 21.1822.
Variance proportion of inequality is: (21.2426 - 21.1822)²/164.75 = 0.000022.

35.1. A. (0.8)(135.7) + (0.2)(87.2) + (1.4)[37 - (36)(0.8)] = 137.48.
Alternately, (0.8)(135.7 + 1.4) + (0.2)[87.2 + (1.4)(37)] = 137.48.

35.2. C. Using the prior solution, (0.8)(137.48) + (0.2)(87.2) + (1.4)[38 - (37)(0.8)] = 139.184.
Alternately, (0.8)(137.48 + 1.4) + (0.2)[87.2 + (1.4)(38)] = 139.184.

35.3. D. Using the prior solution, (0.8)(139.184) + (0.2)(87.2) + (1.4)[39 - (38)(0.8)] = 140.827.
Alternately, (0.8)(139.184 + 1.4) + (0.2)[87.2 + (1.4)(39)] = 140.827.
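The recursion used in 35.1-35.3 is easy to automate. A minimal Python sketch follows; it reads ρ = 0.8, α = 87.2, and β = 1.4 off the solutions above, assumes X advances by 1 each period, and uses a function name of my own.

def ar1_forecast(y_last, x_last, steps, rho, alpha, beta):
    """Iterate Y[T+1] = rho*Y[T] + alpha*(1 - rho) + beta*(X[T+1] - rho*X[T]),
    with X advancing by 1 each period."""
    y, x = y_last, x_last
    out = []
    for _ in range(steps):
        y = rho * y + alpha * (1 - rho) + beta * ((x + 1) - rho * x)
        x += 1
        out.append(y)
    return out

# Starting from Y = 135.7 at X = 36, as in 35.1-35.3.
print(ar1_forecast(135.7, 36, 3, rho=0.8, alpha=87.2, beta=1.4))   # about [137.48, 139.18, 140.83]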


35.4. E. X̄ = 1950. x = (-50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50).
Y = ln(pop.) = (4.333, 4.524, 4.663, 4.814, 4.884, 5.019, 5.189, 5.315, 5.423, 5.520, 5.640).
β̂ = ΣxiYi/Σxi² = 141.06/11000 = 0.01282. α̂ = Ȳ - β̂X̄ = 5.0295 - (0.01282)(1950) = -19.9695.
The predicted population (in millions) in 2030 is: exp[-19.9695 + (0.01282)(2030)] = 426.3.
Comment: An actuary might instead apply the predicted annual change of exp[0.01282] to the latest population in 2000 of 281.4, in order to get a predicted population (in millions) for 2030 of: 281.4 exp[(30)(0.01282)] = 413.4.

35.5. B. Ŷ = α̂ + β̂X = -19.9695 + 0.01282X = (4.3885, 4.5167, 4.6449, 4.7731, 4.9013, 5.0295, 5.1577, 5.2859, 5.4141, 5.5423, 5.6705).
ε̂ = Y - Ŷ = (-0.0555, 0.0073, 0.0181, 0.0409, -0.0173, -0.0105, 0.0313, 0.0291, 0.0089, -0.0223, -0.0305).
Durbin-Watson Statistic = Σ(ε̂t - ε̂t-1)²/Σε̂t² = 0.01121/0.00888 = 1.26.
Comment: Since 1.26 < 2, this indicates the possible presence of positive serial correlation. This is common for time series. Positive serial correlation is expected here, since the population at a given point in time is made up to a large extent of the same people who were in the population ten years earlier. Also the number of births and deaths during a decade depends heavily on the population at the beginning of the decade. There are many factors affecting population growth such as: the age of the population, the rate at which births occur, the mortality rates, the rate of immigration, the rate of emigration, etc.


35.6. 1. Fit a linear regression and get the resulting residuals ε̂t.
From the previous solution, ε̂ = Y - Ŷ = (-0.0555, 0.0073, 0.0181, 0.0409, -0.0173, -0.0105, 0.0313, 0.0291, 0.0089, -0.0223, -0.0305).
2. Estimate the serial correlation coefficient:
ρ̂ = Σε̂t-1ε̂t / Σε̂t-1² = [(-0.0555)(0.0073) + (0.0073)(0.0181) + ... + (-0.0223)(-0.0305)] / [(-0.0555)² + ... + (-0.0223)²] = 0.0012642/0.0079465 = 0.159.
3. Let Xt* = Xt - ρ̂Xt-1 = (1607.90, 1616.31, 1624.72, 1633.13, 1641.54, 1649.95, 1658.36, 1666.77, 1675.18, 1683.59).
Yt* = Yt - ρ̂Yt-1 = (3.83505, 3.94368, 4.07258, 4.11857, 4.24244, 4.39098, 4.48995, 4.57792, 4.65774, 4.76232).
4. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
α̂(1 - ρ̂) = -16.0107. ⇒ α̂ = -16.0107/(1 - 0.159) = -19.038. β̂ = 0.01235.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the resulting residuals ε̂t.
Ŷ = α̂ + β̂X = -19.038 + 0.01235X = (4.4270, 4.5505, 4.6740, 4.7975, 4.9210, 5.0445, 5.1680, 5.2915, 5.4150, 5.5385, 5.6620).
ε̂ = Y - Ŷ = (-0.0940, -0.0265, -0.0110, 0.0165, -0.0370, -0.0255, 0.0210, 0.0235, 0.0080, -0.0185, -0.0220).
6. Estimate the serial correlation coefficient:
ρ̂ = Σε̂t-1ε̂t / Σε̂t-1² = 0.003339/0.0133502 = 0.250.
3. Let Xt* = Xt - ρ̂Xt-1 = Xt - 0.250Xt-1 = (1435.0, 1442.5, 1450.0, 1457.5, 1465.0, 1472.5, 1480.0, 1487.5, 1495.0, 1502.5).
Yt* = Yt - ρ̂Yt-1 = Yt - 0.250Yt-1 = (3.44075, 3.5320, 3.64825, 3.6805, 3.7980, 3.93425, 4.01775, 4.09425, 4.16425, 4.26000).
4. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
α̂(1 - ρ̂) = -14.1561. ⇒ α̂ = -14.1561/(1 - 0.250) = -18.8748. β̂ = 0.01226.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the resulting residuals ε̂t.
Ŷ = α̂ + β̂X = -18.8748 + 0.01226X = (4.4192, 4.5418, 4.6644, 4.7870, 4.9096, 5.0322, 5.1548, 5.2774, 5.4000, 5.5226, 5.6452).
ε̂ = Y - Ŷ = (-0.0862, -0.0178, -0.0014, 0.027, -0.0256, -0.0132, 0.0342, 0.0376, 0.0230, -0.0026, -0.0052).


6. Estimate the serial correlation coefficient:
ρ̂ = Σε̂t-1ε̂t / Σε̂t-1² = 0.0028212/0.012427 = 0.227.
3. Let Xt* = Xt - ρ̂Xt-1 = Xt - 0.227Xt-1 = (1478.70, 1486.43, 1494.16, 1501.89, 1509.62, 1517.35, 1525.08, 1532.81, 1540.54, 1548.27).
Yt* = Yt - ρ̂Yt-1 = Yt - 0.227Yt-1 = (3.54041, 3.63605, 3.75550, 3.79122, 3.91033, 4.04969, 4.13710, 4.21650, 4.28898, 4.38696).
4. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
α̂(1 - ρ̂) = -14.6249. ⇒ α̂ = -14.6249/(1 - 0.227) = -18.9197. β̂ = 0.012287.
5. Translate this transformed equation back to the original variables: Yt = α + βXt, and get the resulting residuals ε̂t.
Ŷ = α̂ + β̂X = -18.9197 + 0.012287X = (4.4256, 4.54847, 4.67134, 4.79421, 4.91708, 5.03995, 5.16282, 5.28569, 5.40856, 5.53143, 5.6543).
ε̂ = Y - Ŷ = (-0.0926, -0.02447, -0.00834, 0.01979, -0.03308, -0.02095, 0.02618, 0.02931, 0.01444, -0.01143, -0.0143).
6. Estimate the serial correlation coefficient:
ρ̂ = Σε̂t-1ε̂t / Σε̂t-1² = 0.00298383/0.0130516 = 0.2286.
7. The value of ρ̂ seems to have converged sufficiently for our purposes. Use the transformed equation translated back to the original variables for ρ̂ = 0.227:
Ŷ = α̂ + β̂X = -18.9197 + 0.012287X.
Y for 2000 is 5.640. Predicted Y for 2010 is: (0.227)(5.640) + (1 - 0.227)(-18.9197) + (0.012287)[2010 - (0.227)(2000)] = 5.774.
Predicted population (in millions) for 2010 is: exp[5.774] = 321.8.
Predicted Y for 2020 is: (0.227)(5.774) + (1 - 0.227)(-18.9197) + (0.012287)[2020 - (0.227)(2010)] = 5.899.
Predicted population (in millions) for 2020 is: exp[5.899] = 364.7.
Predicted Y for 2030 is: (0.227)(5.899) + (1 - 0.227)(-18.9197) + (0.012287)[2030 - (0.227)(2020)] = 6.023.
Predicted population (in millions) for 2030 is: exp[6.023] = 412.8.
Comment: This prediction for 2030 of 412.8 compares to that of 426.3 using the original regression.
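The iteration in 35.6 is mechanical, and a short program makes the convergence of ρ easy to watch. The sketch below assumes numpy; the function and variable names are mine, and applying it to the (year, ln population) data of 35.4 should reproduce values close to the ρ ≈ 0.23 and β̂ ≈ 0.0123 found above.

import numpy as np

def ols(x, y):
    """Ordinary least squares intercept and slope of y on x."""
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - beta * x.mean(), beta

def cochrane_orcutt(x, y, n_iter=10):
    """Iterative Cochrane-Orcutt procedure, following the steps of 35.6."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    alpha, beta = ols(x, y)                       # step 1: ordinary regression
    rho = 0.0
    for _ in range(n_iter):
        resid = y - (alpha + beta * x)            # residuals in the original variables
        rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)   # step 2
        xs = x[1:] - rho * x[:-1]                 # step 3: quasi-differenced data
        ys = y[1:] - rho * y[:-1]
        a_star, beta = ols(xs, ys)                # step 4: intercept estimates alpha*(1 - rho)
        alpha = a_star / (1 - rho)                # step 5: back to the original variables
    return alpha, beta, rho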


35.7. DW ≅ 2(1 - ρ). ⇒ ρ ≅ 1 - DW/2 = 1 - 1.26/2 = 0.37. Thus try ρ = 0.20, 0.30, 0.40, 0.50.
1. Let Xt* = Xt - ρXt-1, and Yt* = Yt - ρYt-1. (There are now only 10 elements, rather than 11.)
For example, for ρ = 0.3: Xt* = (1340, 1347, 1354, 1361, 1368, 1375, 1382, 1389, 1396, 1403).
Yt* = (3.2241, 3.3058, 3.4151, 3.4398, 3.5538, 3.6833, 3.7583, 3.8285, 3.8931, 3.9840).
2. Fit by regression the transformed equation Y* = α(1 - ρ) + βX*.
For example, for ρ = 0.30: α̂(1 - ρ) = -13.137 and β̂ = 0.0122097.
3. The best regression has the smallest Error Sum of Squares (ESS).
ρ:    0.20          0.30          0.40          0.50
ESS:  0.00400584    0.00403586    0.00421587    0.00454589
4. Refine the grid of values for ρ, and again perform steps 1, 2, and 3. Take ρ = 0.18, 0.20, 0.22, 0.24.
ρ:    0.18          0.20          0.22          0.24
ESS:  0.00401784    0.00400584    0.00399984    0.00399985
For ρ = 0.23, ESS = 0.00399909. Take ρ = 0.23. For the transformed regression with ρ = 0.23:
α̂(1 - ρ) = -14.5637. ⇒ α̂ = -14.5637/(1 - 0.23) = -18.9139. β̂ = 0.0122841.
Translate the transformed equation back to the original variables: Y = α + βX = -18.9139 + 0.0122841X.
Y for 2000 is 5.640. Predicted Y for 2010 is: (0.23)(5.640) + (1 - 0.23)(-18.9139) + (0.0122841)[2010 - (0.23)(2000)] = 5.774.
Predicted population (in millions) for 2010 is: exp[5.774] = 321.8.
Predicted Y for 2020 is: (0.23)(5.774) + (1 - 0.23)(-18.9139) + (0.0122841)[2020 - (0.23)(2010)] = 5.899.
Predicted population (in millions) for 2020 is: exp[5.899] = 364.7.
Predicted Y for 2030 is: (0.23)(5.899) + (1 - 0.23)(-18.9139) + (0.0122841)[2030 - (0.23)(2020)] = 6.023.
Predicted population (in millions) for 2030 is: exp[6.023] = 412.8.
Comment: In this case, the result of using the Hildreth-Lu procedure is basically the same as that from using the Cochrane-Orcutt procedure. A graph of the ESS as a function of ρ:

[Graph of ESS as a function of ρ, for ρ from about 0.14 to 0.32; the ESS ranges from about 0.00400 to 0.00406, with the minimum near ρ = 0.23.]


35.8. D. From the regression equation on the transformed variables, α̂(1 - ρ) = 14.981, and β̂ = 0.33590.
Using the last observed value of 190.3, the forecast for t = 121 is: (0.9)(190.3) + 14.981 + (0.33590)[121 - (120)(0.9)] = 190.62.
Comment: Based on the Consumer Price Index for All Urban Consumers for 1995 to 2004.
Used the forecasting equation ŶT+1 = ρYT + α̂(1 - ρ) + β̂(XT+1 - ρXT).

35.9. D. Using the prior solution, (0.9)(190.62) + 14.981 + (0.33590)[122 - (121)(0.9)] = 190.94.

35.10. E. Using the prior solution, (0.9)(190.94) + 14.981 + (0.33590)[123 - (122)(0.9)] = 191.26.

35.11. Use the estimated variance of the regression of the transformed variables, s² = 0.1775. For this regression, N = 119.
t* = i + 1 - 0.9i = (1.1, 1.2, ..., 12.9). t̄* = 7. For the forecast, t* = 13.
Σ(t* - t̄*)² = 5.9² + 5.8² + ... + 0.1² + 0² + 0.1² + ... + 5.8² + 5.9² = 2Σ(i/10)², i = 1 to 59, = 0.02Σi², i = 1 to 59, = (0.02)(59)(60)(119)/6 = 1404.2.
Mean Squared Error of the forecast is: 0.1775[1 + 1/119 + (13 - 7)²/1404.2] = 0.1835.
For the t-distribution with 119 - 2 = 117 degrees of freedom, for 5% area outside on both tails, the critical value is 1.980 (using the value in the table for 120 degrees of freedom).
95% confidence interval: 190.62 ± 1.980√0.1835 = 190.62 ± 0.85 = (189.77, 191.47).
Comment: Beyond what you are likely to be asked on your exam. The observed values for the first three months of 2005 were 190.7, 191.8, and 193.3 (preliminary).

36.1. The standardized coefficients are β̂j* = β̂j sXj/sY.

j   slope     Var[Xj]   StdDev[Xj]   Var[Y]   StdDev[Y]   standardized slope
2   0.8145    28.98     5.383        514.9    22.69       0.1932
3   0.8204    236.4     15.375       514.9    22.69       0.5559
4   13.5287   0.2169    0.4657       514.9    22.69       0.2777

Since the standardized variables each have a mean of 0, the intercept vanishes from the regression equation (the intercept is zero).
The revised model is: Ŷ* = 0.1932X2* + 0.5559X3* + 0.2777X4*.
Where Y* = (Y - Ȳ)/sY, and X2* = (X2 - X̄2)/sX2, etc.

Comment: Similar to 4, 11/00, Q.37.
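A quick numerical check of the standardized coefficients in 36.1 (a sketch assuming numpy; the slopes and variances are those in the table above):

import numpy as np

# Standardized coefficients: beta*_j = beta_j * s_Xj / s_Y.
slopes = np.array([0.8145, 0.8204, 13.5287])       # fitted slopes for X2, X3, X4
sd_x = np.sqrt(np.array([28.98, 236.4, 0.2169]))   # sample standard deviations of X2, X3, X4
sd_y = np.sqrt(514.9)                              # sample standard deviation of Y

print(slopes * sd_x / sd_y)                        # about [0.193, 0.556, 0.278]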


36.2. C. β̂j* = β̂j sXj/sY.
β̂2* = β̂2 sX2/sY = (-0.715143)√(3.90916/317.564) = -0.079344.
β̂3* = (-0.873252)√(16.4044/317.564) = -0.198474.
β̂4* = (31.2728)√(0.226415/317.564) = 0.835033.
β̂5* = (-17.8078)√(0.0622642/317.564) = -0.249353.
β̂6* = (9.98376)√(0.246631/317.564) = 0.278229.
The variable with the largest absolute value of its standardized coefficient is the single most important determinant of Y, which in this case is X4.

36.3. D. β3* = (rYX3 - rYX2 rX2X3)/(1 - rX2X3²) = [0.3952 - (0.7296)(-0.2537)]/[1 - (-0.2537)²] = 0.5803/0.9356 = 0.620.
Comment: Similar to 4, 5/01, Q.13.
β2* = (rYX2 - rYX3 rX2X3)/(1 - rX2X3²) = [0.7296 - (0.3952)(-0.2537)]/[1 - (-0.2537)²] = 0.887.

36.4. X̄ = 3. x = X - X̄ = -2, -1, 0, 1, 2. Ȳ = 1914/5 = 383. y = Y - Ȳ = -181, -62, 21, 97, 124.
Σxiyi = 769. Σxi² = 10. Σyi² = 61,831.
For the two variable regression, the standardized slope is equal to the correlation of X and Y.
β* = rXY = Σxiyi/√(Σxi²Σyi²) = 769/√[(10)(61,831)] = 0.978.
Since the standardized variables each have a mean of 0, the intercept vanishes from the regression equation (the intercept is zero).
The fitted model in standardized form is: Ŷ* = 0.978X*.
Where Y* = (Y - Ȳ)/sY, and X* = (X - X̄)/sX.

36.5. E. β̂j* = β̂j sXj/sY.
β̂2* = β̂2 sX2/sY = (2.4642)√(64.9821/1527.64) = 0.508.
β̂3* = (6.4560)√(27.6429/1527.64) = 0.868. β̂4* = (1.1839)√(96.0000/1527.64) = 0.297.
β̂3* > β̂2* > β̂4*.

Comment: Similar to 4, 11/00, Q.37.

36.6. D. β̂ = [NΣXiYi - ΣXiΣYi]/[NΣXi² - (ΣXi)²] = [(40)(4120) - (128)(672)]/[(40)(1853) - 128²] = 1.3646.
β̂* = β̂sX/sY = 1.3646√{[1853/40 - (128/40)²]/[14911/40 - (672/40)²]} = 0.86.
Alternately, β̂* = r = [ΣXiYi - ΣXiΣYi/N]/√{[ΣXi² - (ΣXi)²/N][ΣYi² - (ΣYi)²/N]}
= [4120 - (128)(672)/40]/√[(1853 - 128²/40)(14911 - 672²/40)] = 1969.6/√[(1443.4)(3621.4)] = 0.86.
Comment: A one standard deviation increase in X results in a 0.86 standard deviation increase in the fitted value of Y. In standardized form, the fitted regression is: Ŷ* = 0.86X*.


36.7. E. X̄ = 100. x = X - X̄ = (-60, -40, -20, 0, 20, 40, 60). Σxi² = 11,200.
Ȳ = 24.5. y = Y - Ȳ = (-8.6, -5.7, -2.9, 0.7, 4.2, 5.9, 6.4). Σyi² = 208.76. Σxiyi = 1506.
β* = rXY = Σxiyi/√(Σxi²Σyi²) = 1506/√[(11,200)(208.76)] = 0.985.

36.8. D. β̂j* = β̂j sXj/sY. β̂2* = 0.238√(80,964.2/20,041.4) = 0.478.
β̂3* = -0.000229√(5,923,126.9/20,041.4) = -0.004. β̂4* = 0.14718√(300,121.4/20,041.4) = 0.570.

j   slope       Var[Xj]       StdDev[Xj]   Var[Y]     StdDev[Y]   standardized slope
2   0.238       80,964.2      284.54       20,041.4   141.6        0.478
3   -0.000229   5,923,126.9   2433.75      20,041.4   141.6       -0.004
4   0.14718     300,121.4     547.83       20,041.4   141.6        0.570
5   -6.68       1.3           1.14         20,041.4   141.6       -0.054
6   -0.269      190.9         13.82        20,041.4   141.6       -0.026

β̂4* = 0.570 > β̂2* = 0.478 > β̂3* = -0.004.

Comment: The variances are on the diagonal of the variance-covariance matrix.

36.9. B. Σyix2i = β2Σx2i² + β3Σx2ix3i, and Σyix3i = β2Σx2ix3i + β3Σx3i².
The solution for β2 is: β2 = [Σx2iyi Σx3i² - Σx3iyi Σx2ix3i] / [Σx2i² Σx3i² - (Σx2ix3i)²].
sY = square root of the sample variance of Y = √[Σ(Yi - Ȳ)²/(N - 1)].
sX2 = √[Σ(X2i - X̄2)²/(N - 1)] = √[Σx2i²/(N - 1)]. ⇒ Σx2i² = (N - 1)sX2².
rYX2 = Σx2iyi/√(Σx2i²Σyi²) = [Σx2iyi/(N - 1)]/(sY sX2) = 0.4. ⇒ Σx2iyi = 0.4(N - 1)sY sX2.
rYX3 = [Σx3iyi/(N - 1)]/(sY sX3) = 0.9. ⇒ Σx3iyi = 0.9(N - 1)sY sX3.
rX2X3 = [Σx2ix3i/(N - 1)]/(sX2 sX3) = 0.6. ⇒ Σx2ix3i = 0.6(N - 1)sX2 sX3.
Therefore, β2 = [0.4(N - 1)sY sX2 (N - 1)sX3² - 0.9(N - 1)sY sX3 0.6(N - 1)sX2 sX3] / [(N - 1)sX2²(N - 1)sX3² - 0.6²(N - 1)²sX2²sX3²] = (0.4sY/sX2 - 0.54sY/sX2)/(1 - 0.36) = -0.22sY/sX2.
Therefore, β2* = β2 sX2/sY = -0.22.
Alternately, Σyix2i = β2Σx2i² + β3Σx2ix3i, and Σyix3i = β2Σx2ix3i + β3Σx3i².
Divide the first equation by √(Σyi²Σx2i²) and get:
Σyix2i/√(Σyi²Σx2i²) = β2√Σx2i²/√Σyi² + β3Σx2ix3i/√(Σyi²Σx2i²) ⇒ rYX2 = β2 sX2/sY + β3 rX2X3 sX3/sY ⇒ rYX2 = β2* + rX2X3 β3* ⇒ 0.4 = β2* + 0.6β3*.
Divide the second equation by √(Σyi²Σx3i²) and get:
Σyix3i/√(Σyi²Σx3i²) = β2Σx2ix3i/√(Σyi²Σx3i²) + β3√Σx3i²/√Σyi² ⇒ rYX3 = β2 rX2X3 sX2/sY + β3 sX3/sY ⇒ rYX3 = rX2X3 β2* + β3* ⇒ 0.9 = 0.6β2* + β3*.
Thus we have two equations in two unknowns, with solution β2* = [0.4 - (0.9)(0.6)]/(1 - 0.6²) = -0.22, and β3* = [0.9 - (0.4)(0.6)]/(1 - 0.6²) = 1.03.
Comment: In general, β2* = (rYX2 - rYX3 rX2X3)/(1 - rX2X3²) = [0.4 - (0.9)(0.6)]/(1 - 0.6²) = -0.22.


36.10. E. β3* = (rYX3 - rYX2 rX2X3)/(1 - rX2X3²) = [0.9 - (0.4)(0.6)]/(1 - 0.6²) = 1.03.

37.1. E. E2 = β̂2 X̄2/Ȳ = (2.4626)(8.875)/35.250 = 0.62.
E3 = β̂3 X̄3/Ȳ = (6.4560)(-0.750)/35.250 = -0.14.
E4 = β̂4 X̄4/Ȳ = (1.1839)(10.000)/35.250 = 0.34. |E3| < |E4| < |E2|.

Comment: Near their means, a one percent change in X2 is expected to produce a 0.62% change in Y. Near their means, a one percent change in X3 is expected to produce a -0.14% change in Y.

37.2. E2 = β̂2 X̄2/Ȳ = (0.8145)(9213)/56,660 = 0.132.
E3 = β̂3 X̄3/Ȳ = (0.8204)(35,311.3)/56,660 = 0.511.
E4 = β̂4 X̄4/Ȳ = (13.5287)(1383.35)/56,660 = 0.330.
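The elasticities in 37.1-37.2 follow the same pattern as the standardized coefficients; a minimal sketch (assuming numpy, with the slopes and means quoted in 37.2):

import numpy as np

# Elasticities at the means: E_j = beta_j * Xbar_j / Ybar.
slopes = np.array([0.8145, 0.8204, 13.5287])   # fitted slopes for X2, X3, X4
x_bar = np.array([9213.0, 35311.3, 1383.35])   # sample means of X2, X3, X4
y_bar = 56660.0                                # sample mean of Y

print(slopes * x_bar / y_bar)                  # about [0.132, 0.511, 0.330]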

37.3. B., 37.4. E., 37.5. A.
Y = exp[-4.30 - 0.002D2i + 0.336 ln(X3i) + 0.384X4i + 0.067D5i - 0.143D6i + 0.081D7i + 0.134 ln(X8i)].
∂Y/∂X3 = Y(0.336/X3). Taking the partial derivative at the means of X3 and Y,
Elasticity of Y with respect to X3 is: (∂Y/∂X3)X̄3/Ȳ = 0.336. Alternately, because the model was estimated in logarithms rather than in levels, the variable coefficients can be interpreted as elasticities. ln(X3) is multiplied by 0.336.
Elasticity of Y with respect to X4 is: (∂Y/∂X4)X̄4/Ȳ = 0.384X̄4, which cannot be determined.
Elasticity of Y with respect to X8 is: (∂Y/∂X8)X̄8/Ȳ = 0.134. Alternately, because the model was estimated in logarithms rather than in levels, the variable coefficients can be interpreted as elasticities. ln(X8) is multiplied by 0.134.


37.6. & 37.7. sY = √[TSS/(N - 1)] = √[20,000/(5 - 1)] = 70.71.
X̄2 = 3. sX2 = √{[(1 - 3)² + (2 - 3)² + (3 - 3)² + (4 - 3)² + (5 - 3)²]/(5 - 1)} = 1.581.
X̄3 = 300. sX3 = √{[(300 - 300)² + (500 - 300)² + (100 - 300)² + (400 - 300)² + (200 - 300)²]/(5 - 1)} = 158.1.
β̂2* = β̂2 sX2/sY = (60)(1.581)/70.71 = 1.34.
β̂3* = β̂3 sX3/sY = (-3)(158.1)/70.71 = -6.71.
Since |β̂3*| > |β̂2*|, X3 is more important than X2 in determining Y.
A one standard deviation increase in X2 results in a 1.34 standard deviation increase in the fitted Y. A one standard deviation increase in X3 results in a 6.71 standard deviation decrease in the fitted Y.
Since the regression passes through the point where each of the variables is equal to its mean, Ȳ = 700 + (60)(3) - (3)(300) = -20.
E2 = β̂2 X̄2/Ȳ = (60)(3)/(-20) = -9.
E3 = β̂3 X̄3/Ȳ = (-3)(300)/(-20) = 45.
Since |E3| > |E2|, near their means Y is more sensitive to changes in X3 than X2.
Near their means, a 1% increase in X2 results in about a 9% decrease in Y.
Near their means, a 1% increase in X3 results in about a 45% increase in Y.

37.8. B. For 1992, Dt = 1.

Y = exp[β1 + β2 ln X2t + β3 ln X3t + β4(ln X2t - ln X2t0) + β5(ln X3t - ln X3t0)].
∂Y/∂X2 = Y(β2/X2t + β4/X2t). Taking the partial derivative at the means of X2 and Y,
elasticity of Y with respect to X2 is: (∂Y/∂X2)X̄2/Ȳ = β2 + β4 = 0.60 - 0.07 = 0.53.
Alternately, because the model was estimated in logarithms rather than in levels, the variable coefficients can be interpreted as elasticities. For 1992, ln(X2) is multiplied by: β2 + β4 = 0.60 - 0.07 = 0.53.
Alternately, elasticity is the percent change in Y due to a 1% change in X. Let x be a given value of X2t. Then the expected value of lnYt is s + 0.53 ln x, where s represents the rest of the equation and 0.53 is the estimated value of β2 + β4. With a 1% increase in x, the new value is s + 0.53 ln(1.01x), which is the old value plus 0.53 ln(1.01) = 0.0052737. Exponentiating indicates that the new Y value will be e^0.0052737 = 1.0052876 times the old value. This is a 0.53% increase, and so the elasticity is 0.53.
Comment: A 1 percent increase in employment leads (approximately) to a 0.53 percent increase in claim frequency. Loosely based on “Workers Compensation and Economic Cycles: A Longitudinal Approach”, by Hartwig, Retterath, Restrepo, and Kahley, PCAS 1997.

38.1. D. rYX2.X3 = (rYX2 - rYX3 rX2X3)/√[(1 - rYX3²)(1 - rX2X3²)] = [0.7296 - (0.3952)(-0.2537)]/√[(1 - 0.3952²)(1 - 0.2537²)] = 0.8299/√[(0.8438)(0.9356)] = 0.934.


38.2. B. rYX3.X2 = (rYX3 - rYX2 rX2X3)/√[(1 - rYX2²)(1 - rX2X3²)] = [0.3952 - (0.7296)(-0.2537)]/√[(1 - 0.7296²)(1 - 0.2537²)] = 0.5803/√[(0.4677)(0.9356)] = 0.877.
Comment: Similar to 4, 11/02, Q.12.

38.3. E. rYX2.X3² = (R² - rYX3²)/(1 - rYX3²) = (0.9 - 0.4²)/(1 - 0.4²) = 0.881. |rYX2.X3| = 0.939.

38.4. D. rYX3.X2² = (R² - rYX2²)/(1 - rYX2²) = (0.9 - 0.6²)/(1 - 0.6²) = 0.844. |rYX3.X2| = 0.919.

38.5. An increase of 1 in X2 will lead to an increase of .00875 in Y.An increase of 1 in X3 will lead to a decrease of 1.927 in Y.An increase of 1 in X4 will lead to a decrease of 3444 in Y.An increase of 1 in X5 will lead to an increase of 2093 in Y.An increase of 1% in X2 will lead to an increase of 0.649% in Y, near the mean value of each variable. An increase of 1% in X3 will lead to a decrease of 0.337% in Y, near the mean value of each variable. An increase of 1% in X4 will lead to a decrease of 0.062% in Y, near the mean value of each variable. An increase of 1% in X5 will lead to an increase of 0.271% in Y, near the mean value of each variable.Since the absolute value of the elasticity associated with X2 is largest, Y is most responsive to changes in X2.Since the absolute value of the elasticity associated with X4 is smallest, Y is least responsive to changes in X4.

0.867² = 75.2% of the variance of Y not accounted for by X3, X4 and X5 is accounted for by X2.
0.471² = 22.2% of the variance of Y not accounted for by X2, X4 and X5 is accounted for by X3.
0.561² = 31.5% of the variance of Y not accounted for by X2, X3 and X5 is accounted for by X4.
0.776² = 60.2% of the variance of Y not accounted for by X2, X3 and X4 is accounted for by X5.
An increase of 1 standard deviation in X2 will lead to an increase of 0.911 standard deviations in Y. An increase of 1 standard deviation in X3 will lead to a decrease of 0.395 standard deviations in Y. An increase of 1 standard deviation in X4 will lead to a decrease of 0.537 standard deviations in Y. An increase of 1 standard deviation in X5 will lead to an increase of 0.390 standard deviations in Y.
With respect to the normalized variables the fitted model is: Ŷ* = 0.911X2* - 0.395X3* - 0.537X4* + 0.390X5*.
Since the absolute value of the standardized coefficient associated with X2 is largest, X2 is the single most important determinant of Y.

38.6. A. rYX3.X2 = (rYX3 - rYX2 rX2X3)/√[(1 - rYX2²)(1 - rX2X3²)] = [-0.938 - (0.97)(-0.878)]/√[(1 - 0.97²)(1 - 0.878²)] = -0.08634/√[(0.0591)(0.2291)] = -0.742.


38.7. B. rYX2.X3 = (rYX2 - rYX3 rX2X3)/√[(1 - rYX3²)(1 - rX2X3²)] = [0.97 - (-0.938)(-0.878)]/√[(1 - 0.938²)(1 - 0.878²)] = 0.1464/√[(0.1202)(0.2291)] = 0.882.

38.8. C. The partial correlation coefficient measures the effect of Xj on Y which is not accounted for by the other variables. Therefore, an increase of 1 unit in X3 will lead to a decrease of 0.04 units in Y that is not accounted for by X2 and X4. An increase of 1 unit in X3 will lead to a change of β̂3 units in Y. D is false.
The square of the partial correlation coefficient measures the percent of the variance of Y, not accounted for by the other variables, that is accounted for by the portion of Xj that is uncorrelated with the other variables. Therefore, 0.7² = 49% of the variance of Y not accounted for by X2 and X3 is accounted for by X4. A is false.
The standardized coefficients are: β̂j* = β̂j sXj/sY. Thus an increase of one standard deviation in Xj will lead to a change of β̂j* standard deviations in (the fitted) Y. Therefore, an increase of 1 standard deviation in X2 will lead to an increase of 0.50 standard deviations in Y. B is false. The standardized coefficient of X2 has the larger absolute value, and therefore X2 is a more important determinant of Y than X4. E is false.
The elasticities are Ej = β̂j X̄j/Ȳ ≅ (∂Y/∂Xj)X̄j/Ȳ = (∂Y/Y)/(∂Xj/X̄j). Near the mean of the variables, a 1% change in Xj will lead to about an Ej% change in Y. Therefore, (near the mean of the variables) an increase of 1% in X2 will lead to an increase of (about) 0.20% in Y. C is True.
Comment: Statement C could have been more carefully worded. See pages 98 to 101 of Pindyck and Rubinfeld.

38.9. A. The partial correlation coefficient of Y and X2 controlling for X3 is:
rYX2.X3 = (rYX2 - rYX3 rX2X3)/√[(1 - rYX3²)(1 - rX2X3²)] = [0.6 - (0.5)(0.4)]/√[(1 - 0.5²)(1 - 0.4²)] = 0.504.

38.10. C. rYX3.X2² = (R² - rYX2²)/(1 - rYX2²).
(-0.4)² = (R² - 0.4²)/(1 - 0.4²). ⇒ R² = 0.2944.


39.1. Scatterplot of the data:

[Scatterplot of Y against X; X from 10 to 40, Y from 60 to 120.]

Comment: Data taken from Table 8.1 in Applied Regression Analysis by Draper and Smith.

39.2. α̂ = 109.874 and β̂ = -1.12699.

[Scatterplot of the data with the fitted line Ŷ = 109.874 - 1.12699X; X from 10 to 40, Y from 60 to 120.]


39.3. Residuals = (2.03099, -9.57213, -15.604, -8.73094, 9.03099, -0.334062, 3.41196, 2.52304, 3.14207, 6.66594, 11.0151, -3.73094, -15.604, -13.477, 4.52304, 1.39605, 8.65003, -5.54031, 30.285, -11.477, 1.39605).
The residual for the 19th observation, (17, 121), has a large absolute value at 30.285.

[Plot of the residuals against X; X from 10 to 40, residuals from about -15 to 30.]

39.4. Studentized Residuals = (0.183968, -0.941583, -1.51081, -0.814263, 0.832863, -0.0306318, 0.311247, 0.229716, 0.28991, 0.61766, 1.05085, -0.342831, -1.51081, -1.27978, 0.413153, 0.127393, 0.798281, -0.845111, 3.60698, -1.07648, 0.127393).
The studentized residual for the 19th observation, (17, 121), has a large absolute value at 3.607.

[Plot of the studentized residuals against X; X from 10 to 40, values from about -1.5 to 3.6.]


39.5. DFBETAS = (0.00328426, -0.334798, 0.192386, 0.127884, 0.0148685, -0.00502936, 0.0326571, -0.0225011, -0.054266, 0.101412, -0.228885, 0.0538434, 0.192386, 0.125357, -0.0404692, -0.0162222, -0.054933, -1.11275, 0.273168, 0.105444, -0.0162222).
The DFBETAS for the 18th observation, (42, 57), of -1.11 has a large absolute value.

[Plot of DFBETAS against X; X from 10 to 40, values from about -1.1 to 0.3.]


39.6. The values of Cook’s D = (0.000897406, 0.081498, 0.0716581, 0.025616, 0.0177437, 0.0000387763, 0.00313057, 0.00166821, 0.00383195, 0.0154395, 0.0548101, 0.00467762, 0.0716581, 0.0475978, 0.00536122, 0.000573585, 0.0178565, 0.678112, 0.223288, 0.0345189, 0.000573585).

[Plot of Cook’s D against X; X from 10 to 40, values from 0 to about 0.68.]

Cook’s D for the 18th observation, (42, 57), of 0.678 is very large compared to the others. Observation 18 is very influential.
Cook’s D for the 19th observation, (17, 121), of 0.223 is somewhat large compared to the others. Observation 19 is somewhat influential.
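The diagnostics plotted in 39.3-39.6 can be computed from the hat matrix. The sketch below is my own Python/numpy implementation, not the author's code; conventions for studentizing residuals differ slightly between texts, so the values may not match the lists above exactly.

import numpy as np

def influence_diagnostics(x, y):
    """Studentized (deleted) residuals, DFBETAS for the slope, and Cook's D for a
    simple linear regression with an intercept."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, k = len(x), 2
    X = np.column_stack([np.ones(n), x])          # design matrix with intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.diag(X @ XtX_inv @ X.T)                # leverages (diagonal of the hat matrix)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                              # ordinary residuals
    s2 = e @ e / (n - k)
    s2_del = ((n - k) * s2 - e**2 / (1 - h)) / (n - k - 1)   # s^2 with observation i deleted
    t = e / np.sqrt(s2_del * (1 - h))             # studentized (deleted) residuals
    cooks_d = e**2 * h / (k * s2 * (1 - h) ** 2)  # Cook's distance
    c = XtX_inv @ X.T                             # influence of each y on each coefficient
    dfbetas_slope = c[1] * e / ((1 - h) * np.sqrt(s2_del * XtX_inv[1, 1]))
    return t, dfbetas_slope, cooks_d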

42.1. For 5 observations, with first order serial correlation, the covariance matrix of the errors is:

     (1   ρ   ρ²  ρ³  ρ⁴)   (100    60    36    21.6   12.96)
     (ρ   1   ρ   ρ²  ρ³)   ( 60   100    60    36     21.6 )
σ² × (ρ²  ρ   1   ρ   ρ²) = ( 36    60   100    60     36   )
     (ρ³  ρ²  ρ   1   ρ )   ( 21.6  36    60   100     60   )
     (ρ⁴  ρ³  ρ²  ρ   1 )   ( 12.96 21.6  36    60    100   )


42.2, 42.3, & 42.4. Var(ε) = (X/2)² = 0.25, 1, 2.25, 4.
For heteroscedasticity, the covariance matrix σ²Ω of the errors is diagonal; take σ² = 1:

    (0.25  0    0     0)
Ω = (0     1    0     0)
    (0     0    2.25  0)
    (0     0    0     4)

    (1  1)
X = (1  2)    X'Ω⁻¹X = (5.694  8.333)    (X'Ω⁻¹X)⁻¹ = ( 0.7385  -0.3846)
    (1  3)             (8.333  16   )                  (-0.3846   0.2628)
    (1  4)

X'Ω⁻¹Y = (37.333)    β̃ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y = (11.42)
         (42    )                            (-3.32)

Cov[β̃] = σ²(X'Ω⁻¹X)⁻¹ = ( 0.7385  -0.3846)
                        (-0.3846   0.2628)

Var[α̂] = 0.7385. Var[β̂] = 0.2628. Cov[α̂, β̂] = -0.3846.

Comment: Based on 4, 11/00, Q. 31. One could get the same answer using weighted regression. In this case we are given the variances, rather than something proportional to the variances. Thus we know σ2Ω. I have taken σ2 = 1. One could instead take σ2 = 1/4 and Ω to be 4 times what I took. The answers would not be affected.
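A numerical sketch of the generalized least squares calculation in 42.2-42.4 (assuming numpy; since Y itself is not reproduced above, the vector X'Ω⁻¹Y is entered directly from the solution):

import numpy as np

# GLS with a known (diagonal) error covariance matrix, sigma^2 = 1.
X = np.array([[1., 1.], [1., 2.], [1., 3.], [1., 4.]])
Omega = np.diag([0.25, 1.0, 2.25, 4.0])        # Var(eps_i) = (X_i/2)^2
W = np.linalg.inv(Omega)

XtWX = X.T @ W @ X                             # [[5.694, 8.333], [8.333, 16]]
cov_beta = np.linalg.inv(XtWX)                 # [[0.7385, -0.3846], [-0.3846, 0.2628]]

XtWY = np.array([37.333, 42.0])                # X' Omega^-1 Y, as computed in the solution
beta_gls = cov_beta @ XtWY                     # about (11.42, -3.32)
print(beta_gls, cov_beta)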

42.5, 42.6, & 42.7.

    (1  1)
X = (1  4)    X'Ω⁻¹X = ( 0.1412  -0.1529)    (X'Ω⁻¹X)⁻¹ = (7.857   0.7143)
    (1  9)             (-0.1529   1.682 )                 (0.7143  0.6593)

X'Ω⁻¹Y = (-0.03529)    β̃ = (X'Ω⁻¹X)⁻¹X'Ω⁻¹Y = (1.429)
         ( 2.388  )                           (1.549)

Ŷ = 1.429 + 1.549X = (2.978, 7.625, 15.370).
ε̂ = Y - Ŷ = (0.022, 0.375, -0.370).
An unbiased estimator of σ² is given by: (ε̂'Ω⁻¹ε̂)/(N - k) = 0.00879/(3 - 2) = 0.00879.

Cov[β̃] = σ²(X'Ω⁻¹X)⁻¹ = (0.0691   0.00628)
                        (0.00628  0.00580)

Var[α̂] = 0.0691. Var[β̂] = 0.00580. Cov[α̂, β̂] = 0.00628.
Comment: Well beyond what you are likely to be asked on an exam! With more than three observations, the matrices are larger, but the concept is the same.

HCMSA-F06-Reg-O, Solutions to Regression §33-44, 7/12/06, Page 560

Page 576: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

43.1. Y = a + bX^c. S = Σ(Yi - a - bXi^c)².
0 = ∂S/∂a = -2Σ(Yi - a - bXi^c). ⇒ Σ(Yi - a - bXi^c) = 0.
0 = ∂S/∂b = -2Σ(Yi - a - bXi^c)Xi^c. ⇒ Σ(Yi - a - bXi^c)Xi^c = 0.
0 = ∂S/∂c = -2Σ(Yi - a - bXi^c)b ln(Xi) Xi^c. ⇒ Σ(Yi - a - bXi^c) ln(Xi) Xi^c = 0.

43.2. Y = a/X^b. lnY = ln(a) - b ln(X). This is a linear model, V = α + βU, where V = lnY, U = lnX, α = ln(a), and β = -b.
Ū = [ln(1) + ln(2) + ... + ln(11)]/11 = ln(11!)/11 = 1.59112.
u = (-1.59112, -0.897972, -0.492507, -0.204825, 0.018319, 0.200641, 0.354791, 0.488323, 0.606106, 0.711466, 0.806776).
V = ln(Y) = (6.82437, 5.42935, 4.58497, 3.93183, 3.58352, 3.21888, 2.94444, 2.63906, 2.3979, 2.19722, 2.07944).
Σui² = 5.55189. ΣuiVi = -11.0581. β̂ = ΣuiVi/Σui² = -1.9918.
V̄ = 3.6210. α̂ = 3.6210 - (-1.9918)(1.59112) = 6.7902.
a = exp[α̂] = 889.08. b = -β̂ = 1.9918.

Comment: Note these parameters do not minimize the sum of squared differences between the fitted and the original data. The sum of these squared differences is 1006.5.

43.3. Y = a/X^b. S = Σ(Yi - a/Xi^b)².
0 = ∂S/∂a = -2Σ(Yi - a/Xi^b)/Xi^b. ⇒ a = (ΣYi/Xi^b)/(Σ1/Xi^2b).
0 = ∂S/∂b = 2Σ(Yi - a/Xi^b) a ln(Xi)/Xi^b. ⇒ a = (ΣYi ln(Xi)/Xi^b)/(Σln(Xi)/Xi^2b).

43.4. From the Normal Equations, we want b such that: (ΣYi/Xi^b)/(Σ1/Xi^2b) = (ΣYi ln(Xi)/Xi^b)/(Σln(Xi)/Xi^2b).
Solving numerically, using b = 1.9918 or b = 2 as an initial value, b = 2.026525. Using either Normal Equation, then a = 920.203.

43.5. Ŷ = a/X^b = 920.203/X^2.026525.

X:        1       2       3      4      5      6      7      8      9      10    11
Observed: 920     228     98     51     36     25     19     14     11     9     8
Fitted:   920.20  225.86  99.31  55.43  35.27  24.37  17.83  13.61  10.72  8.66  7.14

Sum of Squared Differences = (920 - 920.20)² + ... + (8 - 7.14)² = 29.4.
Ȳ = 129. Σ(Y - Ȳ)² = 730,282. R² = 1 - 29.4/730,282 = 0.999960.

HCMSA-F06-Reg-O, Solutions to Regression §33-44, 7/12/06, Page 561

Page 577: Mahler’s Guide to Regression · 30 284-295 Durbin-Watson Statistic 31 296-302 Correcting for Serial Correlation 32 303-306 Multicollinearity Table of Contents Continued on the Next

43.6. f(X) = a/X^b. ∂f/∂a = 1/X^b. ∂f/∂b = -a ln[X]/X^b.
For a = 900 and b = 2, the constructed dependent variable is:
V = Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σβi,0(∂f/∂βi)0 = Y - a/X^b + a(1/X^b) + b(-a ln[X]/X^b) = Y - 1800 ln[X]/X² = (920, -83.9162, -121.722, -104.958, -79.8795, -64.588, -52.4824, -44.4843, -37.8272, -32.4465, -27.6712).
The first constructed independent variable is: 1/X^b = 1/X² = (1, 1/4, 1/9, 1/16, 1/25, 1/36, 1/49, 1/64, 1/81, 1/100, 1/121).
The second constructed independent variable is: -a ln[X]/X^b = -900 ln[X]/X² = (0, -155.958, -109.861, -77.9791, -57.9398, -44.794, -35.7412, -29.2421, -24.4136, -20.7233, -17.8356).
Let Z be the matrix with columns equal to the constructed independent variables.

Z'Z = (  1.0821    -61.4742)      (Z'Z)⁻¹ = (0.991614    0.00118798  )
      (-61.4742   51312.8  )                (0.00118798  0.0000209116)

Z'V = (871.161, 47431.9).
The solution to the matrix equations is: (Z'Z)⁻¹Z'V = (920.204, 2.0268).
Comment: The results of this first iteration are close to the least squares fit: a = 920.203 and b = 2.026525. If more accuracy was needed, one could perform another iteration.

43.7. Y = a/(X + c)^b. S = Σ(Yi - a/(Xi + c)^b)².
0 = ∂S/∂a = -2Σ(Yi - a/(Xi + c)^b)/(Xi + c)^b. ⇒ Σ(Yi - a/(Xi + c)^b)/(Xi + c)^b = 0.
0 = ∂S/∂b = 2Σ(Yi - a/(Xi + c)^b) a ln(Xi + c)/(Xi + c)^b. ⇒ Σ(Yi - a/(Xi + c)^b) ln(Xi + c)/(Xi + c)^b = 0.
0 = ∂S/∂c = -2Σ(Yi - a/(Xi + c)^b) a(-b)/(Xi + c)^(b+1). ⇒ Σ(Yi - a/(Xi + c)^b)/(Xi + c)^(b+1) = 0.

43.8. Using a computer, the least squares fit is: a = 993.2, b = 2.0743, c = 0.037565.
Comment: One can use the values from the solution to a previous question as the starting values: a = 920.203, b = 2.026525, and c = 0. Example adapted from “Extrapolating, Smoothing, and Interpolating Development Factors,” by Richard Sherman, PCAS 1984.

43.9. Σ(Yi - a/(Xi + c)^b)/(Xi + c)^b = Σ(Yi - 993.2/(Xi + 0.037565)^2.0743)/(Xi + 0.037565)^2.0743 = (920 - 993.2/(1 + 0.037565)^2.0743)/(1 + 0.037565)^2.0743 + ... = -0.0555746 + 0.249839 - 0.111144 - 0.217003 + 0.0451745 + 0.0278578 + 0.0288684 + 0.0110243 + 0.00701497 + 0.00580866 + 0.00810089 = -0.00003 ≅ 0.
Σ(Yi - a/(Xi + c)^b) ln(Xi + c)/(Xi + c)^b = Σ(Yi - 993.2/(Xi + 0.037565)^2.0743) ln(Xi + 0.037565)/(Xi + 0.037565)^2.0743 = -0.0020494 + 0.177824 - 0.123487 - 0.302859 + 0.0730437 + 0.0500883 + 0.0563298 + 0.0229761 + 0.0154427 + 0.0133967 + 0.0194527 = 0.00016 ≅ 0.
Σ(Yi - a/(Xi + c)^b)/(Xi + c)^(b+1) = Σ(Yi - 993.2/(Xi + 0.037565)^2.0743)/(Xi + 0.037565)^3.0743 = -0.0535625 + 0.122617 - 0.0365897 - 0.0537461 + 0.00896752 + 0.00461408 + 0.00410204 + 0.0013716 + 0.000776202 + 0.000578692 + 0.000733938 = -0.00014 ≅ 0.


43.10. Ŷ = a/(X + c)^b = 993.2/(X + 0.037565)^2.0743.

X:        1       2       3      4      5      6      7      8      9      10    11
Observed: 920     228     98     51     36     25     19     14     11     9     8
Fitted:   920.06  226.91  99.11  54.92  34.71  23.84  17.35  13.17  10.33  8.31  6.82

Sum of Squared Differences = (920 - 920.06)² + ... + (8 - 6.82)² = 26.4.
Ȳ = 129. Σ(Y - Ȳ)² = 730,282. R² = 1 - 26.4/730,282 = 0.999964.
Comment: By introducing the additional parameter c, the sum of squared differences has been reduced from 29.4 to 26.4.

43.11 to 43.14. f(X) = 1/(α + X). ∂f/∂α = -1/(α + X)².
Constructed dependent variable: V = Y - f(X1, X2, ..., Xk; β1,0, β2,0, ..., βp,0) + Σβi,0(∂f/∂βi)0 = Y - 1/(α + X) - α/(α + X)² = Y - 1/(1 + X) - 1/(1 + X)² = (-0.9, -0.05, -0.1444).
Constructed independent variable: U = -1/(α + X)² = -1/(1 + X)² = (-1, -1/4, -1/9).
α̂ = ΣUiVi/ΣUi² = 0.92855/1.07485 = 0.864.
Constructed dependent variable for the second iteration: V = Y - 1/(α + X) - α/(α + X)² = Y - 1/(0.864 + X) - 0.864/(0.864 + X)² = (-1.21481, -0.0851498, -0.154496).
Constructed independent variable for the second iteration: U = -1/(α + X)² = -1/(0.864 + X)² = (-1.33959, -0.287812, -0.121914).
α̂ = ΣUiVi/ΣUi² = 1.6707/1.89221 = 0.883.
Comment: By doing another iteration, one could determine that to three decimal places the procedure has converged after two iterations.

43.15. D. ∂e^(βX)/∂β = Xe^(βX). For Y = 11.7 and X = 25, the value of the constructed dependent variable for β0 = 0.1 is:
Y - f(X) + β0(∂f/∂β)0 = 11.7 - e^((0.1)(25)) + (0.1)(25)e^((0.1)(25)) = 29.97.
Comment: There would be other pairs of X and Y observed, each of which would have a corresponding value of the constructed dependent variable. Since there is one parameter in the model, the summation in the final term of the formula for the constructed dependent variable only has one term.

43.16. D. For Y = 11.7 and X = 25, the value of the constructed independent variable for β0 = 0.1 is: (∂f/∂β)0 = ∂e^(βX)/∂β = Xe^(βX) = 25 exp[(0.1)(25)] = 304.56.
Comment: There would be other pairs of X and Y observed, each of which would have a corresponding value of the constructed independent variable. Since there is one parameter in the model, there is only one constructed independent variable.


44.1. C. f(y) = exp[-(y - µ)²/(2σ²)]/(σ√(2π)). ln f(Yi) = -(Yi - βXi)²/(2σ²) - ln(σ) - ln(2π)/2.
Loglikelihood is: -Σ(Yi - βXi)²/(2σ²) - n ln(σ) - n ln(2π)/2.
Set the partial derivative of the loglikelihood with respect to β equal to zero:
0 = ΣXi(Yi - βXi)/σ². ⇒ ΣXiYi = βΣXi². ⇒ β̂ = ΣXiYi/ΣXi² = 3080/751 = 4.10.
Comment: Matches the linear regression model with no intercept, β̂ = ΣXiYi/ΣXi².

44.2. B. Set the partial derivative of the loglikelihood with respect to σ equal to zero:
0 = Σ(Yi - βXi)²/σ³ - n/σ. ⇒ σ² = Σ(Yi - βXi)²/n = [(5 - (1)(4.1))² + (15 - (5)(4.1))² + (50 - (10)(4.1))² + (100 - (25)(4.1))²]/4 = 29.58.
β̂ = ΣXiYi/ΣXi². Var[β̂] = Var[ΣXiYi/ΣXi²] = ΣVar[XiYi/ΣXi²] = ΣXi²Var[Yi]/(ΣXi²)² = ΣXi²σ²/(ΣXi²)² = σ²/ΣXi² = 29.58/751 = 0.0394.
StdDev[β̂] = √0.0394 = 0.198.
Comment: In the linear regression version of this same example, one would estimate the variance of the regression as: s² = Σε̂i²/(N - 1) = [(5 - (1)(4.1))² + (15 - (5)(4.1))² + (50 - (10)(4.1))² + (100 - (25)(4.1))²]/3 = 39.4. This is an unbiased estimate of σ², which is not equal to that from maximum likelihood, which is biased.
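The maximum likelihood calculations in 44.1-44.2 can be reproduced directly from the four data points implicit in the residuals above, (1, 5), (5, 15), (10, 50), and (25, 100); a minimal sketch assuming numpy:

import numpy as np

# MLE for the no-intercept normal model Y_i = beta*X_i + eps_i.
X = np.array([1., 5., 10., 25.])
Y = np.array([5., 15., 50., 100.])

beta_hat = np.sum(X * Y) / np.sum(X**2)          # 3080/751 = 4.10
sigma2_ml = np.mean((Y - beta_hat * X) ** 2)     # 29.58 (maximum likelihood, biased)
sd_beta = np.sqrt(sigma2_ml / np.sum(X**2))      # 0.198
print(beta_hat, sigma2_ml, sd_beta)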

44.3. E. Statement #1 is true. Ordinary Linear Regression is a special case of the generalized linear model. Weighted Least Squares Regression, in which one can select whatever weights one wants, is not a special case of the generalized linear model.


44.4. ln(λ) = β0 + β1z. ⇒ λ = exp[β0 + β1z].
For the Poisson Distribution: f(y) = e^(-λ)λ^y/y!.
ln f(y) = -λ + y ln(λ) - ln(y!) = -exp[β0 + β1z] + y(β0 + β1z) - ln(y!).
The loglikelihood is the sum of the contributions from the three observations:
-exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + 4(β0 + β1) + 7(β0 + 2β1) + 8(β0 + 3β1) - ln(4!) - ln(7!) - ln(8!).
To maximize the loglikelihood, we set its partial derivatives equal to zero.
Setting the partial derivative with respect to β0 equal to zero:
0 = -exp[β0 + β1] - exp[β0 + 2β1] - exp[β0 + 3β1] + 19.
Setting the partial derivative with respect to β1 equal to zero:
0 = -exp[β0 + β1] - 2exp[β0 + 2β1] - 3exp[β0 + 3β1] + 42.
Thus we have two equations in two unknowns:
exp[β0 + β1]{1 + exp[β1] + exp[2β1]} = 19.
exp[β0 + β1]{1 + 2exp[β1] + 3exp[2β1]} = 42.
Dividing the second equation by the first equation:
{1 + 2exp[β1] + 3exp[2β1]}/{1 + exp[β1] + exp[2β1]} = 42/19.
⇒ 19 + 38exp[β1] + 57exp[2β1] = 42 + 42exp[β1] + 42exp[2β1].
⇒ 15exp[2β1] - 4exp[β1] - 23 = 0.
Letting v = exp[β1], this equation is: 15v² - 4v - 23 = 0, with positive solution: v = (4 + √1396)/30 = 1.3788.
exp[β1] = 1.3788. ⇒ β1 = 0.3212.
⇒ exp[β0] = 19/{exp[β1] + exp[2β1] + exp[3β1]} = 19/(1.3788 + 1.3788² + 1.3788³) = 3.2197.
⇒ β0 = 1.1693.
λ = exp[β0 + β1z] = exp[β0]exp[β1]^z = (3.2197)(1.3788^z).
For z = 1, λ = 4.439. For z = 2, λ = 6.121. For z = 3, λ = 8.440.
Comment: Beyond what you are likely to be asked on your exam.
An ordinary linear regression fit to these same observations turns out to be: y = 2.333 + 2x, with fitted values: 4.333, 6.333, and 8.333.


44.5. E., 44.6. C. f(y) = exp[-(y - µ)²/(2σ²)]/(σ√(2π)). ln f(Yi) = -(Yi - β0 - β1Xi)²/(2σ²) - ln(σ) - ln(2π)/2.
Loglikelihood is: -Σ(Yi - β0 - β1Xi)²/(2σ²) - n ln(σ) - n ln(2π)/2.
Set the partial derivative of the loglikelihood with respect to β0 equal to zero:
0 = Σ(Yi - β0 - β1Xi)/σ². ⇒ ΣYi = nβ0 + β1ΣXi. ⇒ β0 = Ȳ - β1X̄.
Set the partial derivative of the loglikelihood with respect to β1 equal to zero:
0 = ΣXi(Yi - β0 - β1Xi)/σ². ⇒ ΣXiYi = β0ΣXi + β1ΣXi². ⇒ ΣXiYi = (Ȳ - β1X̄)ΣXi + β1ΣXi².
⇒ β̂1 = [ΣXiYi - ȲΣXi]/[ΣXi² - X̄ΣXi] = [255 - (10)(24)]/[174 - (6)(24)] = 15/30 = 0.5.
⇒ β̂0 = Ȳ - β̂1X̄ = 10 - (0.5)(6) = 7.
Comment: Matches the linear regression model with an intercept. In deviations form:
X̄ = 24/4 = 6. x = X - X̄ = -4, -1, 2, 3. Ȳ = 40/4 = 10. y = Y - Ȳ = 0, -4, 1, 3.
β̂ = Σxiyi/Σxi² = 15/30 = 0.5. α̂ = Ȳ - β̂X̄ = 10 - (0.5)(6) = 7.

44.7. C. Set the partial derivative of the loglikelihood with respect to σ equal to zero:
0 = Σ(Yi - β0 - β1Xi)²/σ³ - n/σ. ⇒ σ² = Σ(Yi - β0 - β1Xi)²/n = Σ(Yi - 7 - (0.5)Xi)²/4 = [(10 - 7 - (0.5)(2))² + (6 - 7 - (0.5)(5))² + (11 - 7 - (0.5)(8))² + (13 - 7 - (0.5)(9))²]/4 = 18.5/4 = 4.625.
σ = √4.625 = 2.15.


44.8. B., 44.9. A. Let x = X - X̄ = -4, -1, 2, 3, and y = Y - Ȳ = 0, -4, 1, 3.
Then, ΣXiYi - ȲΣXi = ΣXjYj - ΣYjΣXi/n = ΣYj(Xj - X̄) = ΣYjxj.
Also, ΣXi² - X̄ΣXi = ΣXi(Xi - X̄) = ΣXixi = Σ(Xi - X̄)xi + ΣX̄xi = Σxi² + X̄Σxi = Σxi² + X̄(0) = Σxi².
β̂1 = [ΣXiYi - ȲΣXi]/[ΣXi² - X̄ΣXi] = ΣYixi/Σxi².
Var[β̂1] = Var[ΣYixi/Σxi²] = ΣVar[Yixi]/(Σxi²)² = Σxi²Var[Yi]/(Σxi²)² = σ²Σxi²/(Σxi²)² = σ²/Σxi² = 4.625/30 = 0.1542. StdDev[β̂1] = √0.1542 = 0.393.
β̂0 = Ȳ - β̂1X̄ = (Y1 + Y2 + Y3 + Y4)/4 - (ΣYixi/Σxi²)(6) = (Y1 + Y2 + Y3 + Y4)/4 - (-4Y1 - Y2 + 2Y3 + 3Y4)(6/30) = 1.05Y1 + 0.45Y2 - 0.15Y3 - 0.35Y4.
Recalling that the Yi are independent and each have variance σ²:
Var[β̂0] = σ²(1.05² + 0.45² + 0.15² + 0.35²) = 1.45σ² = (1.45)(4.625) = 6.706.
StdDev[β̂0] = √6.706 = 2.59.
Comment: One can show in general that Var[β̂] = σ²/Σxi² and Var[α̂] = σ²ΣXi²/(NΣxi²).
While the maximum likelihood results are similar, they do not match linear regression:
Ŷ = α̂ + β̂X = 8, 9.5, 11, 11.5. ε̂ = Y - Ŷ = 2, -3.5, 0, 1.5. ESS = Σε̂i² = 18.5.
s² = ESS/(N - 2) = 18.5/(4 - 2) = 9.25.
Var[β̂] = s²/Σxi² = 9.25/30 = 0.3083. sβ̂ = √0.3083 = 0.555.
Var[α̂] = s²ΣXi²/(NΣxi²) = (9.25)(174)/((4)(30)) = 13.41. sα̂ = √13.41 = 3.66.


44.10. For a Poisson, f(n) = e^(-λ)λ^n/n!. ln f(n) = -λ + n lnλ - ln(n!) = -exp[β0 + β1X1i + β2X2i] + ni(β0 + β1X1i + β2X2i) - ln(ni!).
loglikelihood = -Σexp[β0 + β1X1i + β2X2i] + ΣYi(β0 + β1X1i + β2X2i) + constants.
Setting the partial derivatives of the loglikelihood with respect to β0, β1, and β2 equal to zero:
0 = -Σexp[β0 + β1X1i + β2X2i] + ΣYi.
0 = -ΣX1i exp[β0 + β1X1i + β2X2i] + ΣYiX1i.
0 = -ΣX2i exp[β0 + β1X1i + β2X2i] + ΣYiX2i.
ΣYi = 8 + 8 + 10 + ... + 33 + 31 = 369.
ΣYiX1i = 8ln(2) + 8ln(4) + 10ln(6) + ... + 33ln(18) + 31ln(20) = 872.856.
ΣYiX2i = 14 + 19 + ... + 33 + 31 = 241.
exp[β0 + β1X1i + β2X2i] = exp[β0]exp[β1X1i]exp[β2X2i] = exp[β0]exp[X1i]^β1 exp[β2X2i].
The first equation becomes:
exp[β0]{2^β1 + 4^β1 + ... + 20^β1 + 2^β1 exp[β2] + 4^β1 exp[β2] + ... + 20^β1 exp[β2]} = 369.
⇒ exp[β0](1 + exp[β2]){2^β1 + 4^β1 + 6^β1 + ... + 20^β1} = 369.
The second equation becomes:
exp[β0](1 + exp[β2]){ln(2)2^β1 + ln(4)4^β1 + ln(6)6^β1 + ... + ln(20)20^β1} = 872.856.
The third equation becomes:
exp[β0]exp[β2]{2^β1 + 4^β1 + 6^β1 + ... + 20^β1} = 241.
Comment: Well beyond what you should be asked on your exam! A Poisson variable with a logarithmic link function.
Dividing the 1st and 3rd equations: (1 + exp[β2])/exp[β2] = 369/241. ⇒ exp[β2] = 241/128. ⇒ β2 = ln(241/128) = 0.6328.
Using a computer, the fitted parameters are: β0 = 1.684, β1 = 0.3784, β2 = 0.6328. One can verify that these values satisfy the three equations. Example taken from Applied Regression Analysis by Draper and Smith.


44.11. p/(1 - p) = exp[β0 + β1X]. ⇒ 1/p - 1 = exp[-β0 - β1X].
⇒ p = 1/(1 + exp[-β0 - β1X]). ⇒ 1 - p = exp[-β0 - β1X]/(1 + exp[-β0 - β1X]) = 1/(1 + exp[β0 + β1X]).
For a Binomial, f(n) = p^n(1 - p)^(m-n) m!/[(n!)(m - n)!].
ln f(n) = n lnp + (m - n)ln(1 - p) + ln(m!) - ln(n!) - ln[(m - n)!] = n ln[p/(1 - p)] + m ln(1 - p) + constants = n(β0 + β1X) - m ln[1 + exp[β0 + β1X]] + constants.
loglikelihood = Σni(β0 + β1Xi) - Σmi ln[1 + exp[β0 + β1Xi]] + constants.

Setting the partial derivatives of the loglikelihood with respect to β0 and β1 equal to zero:

0 = Σni - Σmi exp[β0 + β1Xi]/(1 + exp[β0 + β1Xi]).

0 = ΣniXi - Σmi Xi exp[β0 + β1Xi]/(1 + exp[β0 + β1Xi]).

Σni = 900 + 820 + 740 + 660 + 580 = 3700.

ΣniXi = (1)(900) + (2)(820) + (3)(740) + (4)(660) + (5)(580) = 10,300.
The first equation becomes:
3700 = 1000/(1 + exp[-β0 - β1]) + 900/(1 + exp[-β0 - 2β1]) + 800/(1 + exp[-β0 - 3β1]) + 700/(1 + exp[-β0 - 4β1]) + 600/(1 + exp[-β0 - 5β1]).
The second equation becomes:
10,300 = 1000/(1 + exp[-β0 - β1]) + 1800/(1 + exp[-β0 - 2β1]) + 2400/(1 + exp[-β0 - 3β1]) + 2800/(1 + exp[-β0 - 4β1]) + 3000/(1 + exp[-β0 - 5β1]).
Comment: An example of a Logistic Regression; a Binomial with a logit link function.

Using a computer, the maximum likelihood fit is: β0 = 1.885 and β1 = .2455.
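One way to do the "using a computer" step of 44.11 is to minimize the negative loglikelihood numerically. The sketch below assumes scipy and numpy and reads the ni, mi, and Xi off the equations above; the optimizer and starting point are my own choices, not the author's.

import numpy as np
from scipy.optimize import minimize

# Logistic regression: n_i successes out of m_i trials at X_i, ln[p/(1-p)] = b0 + b1*X.
X = np.array([1., 2., 3., 4., 5.])
n = np.array([900., 820., 740., 660., 580.])
m = np.array([1000., 900., 800., 700., 600.])

def neg_loglik(b):
    eta = b[0] + b[1] * X
    # loglikelihood up to constants: sum n*eta - m*ln(1 + exp(eta))
    return -(np.sum(n * eta) - np.sum(m * np.log1p(np.exp(eta))))

fit = minimize(neg_loglik, x0=np.array([0.0, 0.0]), method="Nelder-Mead")
print(fit.x)   # roughly (1.885, 0.2455), matching the values quoted above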

44.12. B. For the Poisson, f(n) = e^(-λ)λ^n/n!. ln f(n) = -λ + n ln(λ) - ln(n!). The loglikelihood is:
-34β + 2 ln(34β) - ln(2!) - 38β + 1 ln(38β) - ln(1!) - 45β + 0 ln(45β) - ln(0!) - 25β + 3 ln(25β) - ln(3!) - 21β + 3 ln(21β) - ln(3!) = -163β + 9 ln(β) + constants.
Setting the partial derivative of the loglikelihood with respect to β equal to zero:
0 = -163 + 9/β. ⇒ β̂ = 9/163. More generally, β̂ = ΣYi/ΣXi = ΣYi/163.
Var[β̂] = Var[ΣYi/163] = Var[ΣYi]/163² = ΣVar[Yi]/163² = Σµi/163² = ΣβXi/163² = βΣXi/163² = β̂(163)/163² = 9/163² = 0.000338.
StdDev[β̂] = √0.000338 = 0.0184.
Alternately, Information ≅ -∂² loglikelihood/∂β² = 9/β².
Var[β̂] ≅ 1/Information = β²/9 = (9/163)²/9. StdDev[β̂] = (9/163)/3 = 0.0184.

Comment: Generalized Linear Model, with a Poisson Distribution and an identity link function. Since Yi is Poisson distributed, Var[Yi] = E[Yi] = µi.

While these solutions are believed to be correct, anyone can make a mistake. If you believe you’ve found something that may be wrong, send any corrections or comments to: Howard Mahler, Email: [email protected]
