The coefficient of determination and its adjusted version in linear regression models

Publisher: Taylor & Francis

Econometric Reviews. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lecr20

Anil K. Srivastava (a), Virendra K. Srivastava (a) & Aman Ullah (b)
(a) Department of Statistics, Lucknow University, Lucknow, India
(b) Department of Economics, University of California, Riverside, CA, U.S.A.
Published online: 16 Feb 2011.

To cite this article: Anil K. Srivastava, Virendra K. Srivastava & Aman Ullah (1995) The coefficient of determination and its adjusted version in linear regression models, Econometric Reviews, 14:2, 229-240, DOI: 10.1080/07474939508800317

To link to this article: http://dx.doi.org/10.1080/07474939508800317


ECONOMETRIC REVIEWS, 14(2), 229-240 (1995)

THE COEFFICIENT OF DETERMINATION AND ITS ADJUSTED VERSION IN LINEAR REGRESSION MODELS

Anil K. Srivastava and Virendra K. Srivastava
Department of Statistics, Lucknow University, Lucknow, India

Aman Ullah
Department of Economics, University of California, Riverside, CA, U.S.A.

Key Words and Phrases: regression models; quadratic forms; coefficient of determination (R2); bias; mean squared error; non-normal errors.

JEL classification: C1, C12, C13

ABSTRACT

This article presents a comparative study of the efficiency properties of the coefficient of determination and its adjusted version in linear regression models when disturbances are not necessarily normal.

1. INTRODUCTION

In applied work, the coefficient of determination ($R^2$) is most commonly used to judge the fit of a linear regression model. But an important and well-known limitation of $R^2$ is that its value rises and approaches one as more and more explanatory variables, be they relevant or not, are included in the model. This is obviously an undesirable feature. In a bid to circumvent this problem, a correction for the degrees of freedom is applied to $R^2$, leading to $R_a^2$, which is popularly known as adjusted $R^2$.
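The limitation described above can be illustrated numerically. The following is a minimal sketch, not part of the original article: the data-generating process, the seed and the helper names `r_squared` and `adjusted_r_squared` are all hypothetical choices made for illustration. It fits a sequence of nested regressions in which only the first regressor is relevant and records how $R^2$ and its degrees-of-freedom-corrected version behave as irrelevant columns are added.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on X plus an intercept."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])          # add the intercept column
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)  # least-squares fit
    resid = y - Z @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def adjusted_r_squared(r2, n, p):
    """Degrees-of-freedom correction with p regressors (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

rng = np.random.default_rng(0)                 # hypothetical seed
n = 30
x_relevant = rng.normal(size=(n, 1))           # the only relevant regressor
noise_cols = rng.normal(size=(n, 8))           # irrelevant regressors
y = 1.0 + 2.0 * x_relevant[:, 0] + rng.normal(size=n)

X_all = np.column_stack([x_relevant, noise_cols])
# Nested fits: the first p columns of X_all, for p = 1, ..., 9
r2_path = [r_squared(y, X_all[:, :p]) for p in range(1, X_all.shape[1] + 1)]
adj_path = [adjusted_r_squared(r2, n, p) for p, r2 in enumerate(r2_path, start=1)]

for p, (r2, adj) in enumerate(zip(r2_path, adj_path), start=1):
    print(f"p={p}: R2={r2:.4f}  adjusted R2={adj:.4f}")
```

In such a run $R^2$ cannot decrease as columns are added, whereas the adjusted value is free to fall when an added column contributes little.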

Copyright 1995 by Marcel Dekker, Inc.


In an interesting paper, observing that $R^2$ and $R_a^2$ have identical probability limits, so that both can be regarded as consistent estimators of a certain population analogue, Cramer (1987) obtained their exact means and variances assuming normality of the disturbances. A general finding that emerges from his analysis is that $R_a^2$ has a relatively smaller bias but a larger variance than $R^2$. However, he did not compare the mean squared errors (MSE) of $R^2$ and $R_a^2$. In a recent paper, Ohtani and Hasegawa (1993) obtained the exact moments of $R^2$ and $R_a^2$ when some regressors in the regression model are proxy variables and the disturbances follow a multivariate-t distribution. Since the exact results were extremely complicated and no analytical comparison of the bias and MSE seemed possible, Ohtani and Hasegawa resorted to numerical calculations for specified values of the parameters. Their results show that if the proxy variable is an important variable, $R_a^2$ can be more unreliable than $R^2$ in small samples in terms of both bias and MSE.

It is now well known that $R^2$ can be expressed as a ratio of quadratic forms. In view of this, the $r$-th order moments of $R^2$, under normality, can easily be written from the moments of the ratio of quadratic forms given in Magnus (1986), Smith (1989, 1993) and Ullah (1990). Further, since $R^2$ is scale invariant, these results will continue to be exact for any spherically symmetric error density, the special cases of which are normal and multivariate t densities. Furthermore, the exact moments of $R^2$ can also be written under the Edgeworth type density of disturbances from the results of, for example, Peters (1989). However, these results, as in the case of Cramer (1987) and Ohtani and Hasegawa (1993), will be extremely complicated functions of unknown parameters and they will not be able to provide any meaningful analytical comparisons of bias and MSE. One will have to resort to evaluating the complicated expressions for specified values of parameters.

In this paper, we study the bias and MSE properties of $R^2$ and $R_a^2$ for the general setup, that is, without imposing restrictions on the form of the distribution of disturbances. On the basis of the relationship between $R^2$ and $R_a^2$, it is shown that the exact bias of $R_a^2$ will always be smaller than that of $R^2$. This result is straightforward and does not need derivation of the exact means under specific distributions or their evaluation for specified parameters. We also provide conditions under which $R_a^2$ dominates $R^2$ in the sense of having lower MSE. To analyse the properties of $R^2$ and $R_a^2$ further, we also develop the large-sample asymptotic expansions of the bias and MSE of $R^2$ and $R_a^2$ when the disturbances are i.i.d. and non-normal. These expansions provide simple expressions, and they give neat analytical conditions for the dominance of $R_a^2$ over $R^2$. Such analytical results, as far as the authors are aware, are not available in the literature. The performance of the large-sample expansions is also examined. In an earlier paper, Ullah and Srivastava (1994) gave the small-$\sigma$ approximate bias of $R^2$ and $R_a^2$. However, neither were the MSE expressions given nor was the dominance condition obtained. Further, in general, the small-$\sigma$ approximations do not provide results for the large-sample approximations. We also note that while the i.i.d. univariate t distribution is a special case of the i.i.d. non-normal distributions considered for our approximate results, the multivariate-t considered in Ohtani and Hasegawa (1993) is not a special case. In view of this and the fact that Ohtani and Hasegawa (1993) consider proxy variables, our analytical MSE results are not directly comparable with their numerical results. Also, they do not obtain any analytical dominance condition, which is the main focus of this paper.

The plan of the paper is as follows. In the next section, we introduce the model and estimators and their properties. Then the results presented in this section are derived in the Appendix.

2. THE ESTIMATORS AND THEIR PROPERTIES

Let us postulate the following linear regression model

$$ y = \alpha e + X\beta + u \tag{1} $$

where $y$ is an $n \times 1$ vector of $n$ observations on the study variable, $e$ is an $n \times 1$ vector with all its elements unity, $X$ is an $n \times p$ full column rank matrix of $n$ observations on $p$ explanatory variables, $\alpha$ and $\beta$ are the associated regression parameters and $u$ is an $n \times 1$ vector of disturbances with mean vector 0 and variance-covariance matrix $\sigma^2 I_n$, $\sigma^2$ being an unknown quantity.

Writing $A = I_n - n^{-1}ee'$, the goodness-of-fit measure $R^2$ ($0 \le R^2 \le 1$) is given by

$$ R^2 = \frac{y'AX(X'AX)^{-1}X'Ay}{y'Ay} \tag{2} $$

while the adjusted version of $R^2$ is obtained by applying a correction for the loss of degrees of freedom in $R^2$ and is given by

$$ R_a^2 = R^2 - \frac{p}{n-p-1}\,(1 - R^2). \tag{3} $$
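As an illustration of the two definitions, the following minimal sketch computes $R^2$ as a ratio of quadratic forms in $y$, using the centering matrix $A = I_n - n^{-1}ee'$, and then applies the degrees-of-freedom correction $R^2 - \frac{p}{n-p-1}(1 - R^2)$. The function name `r2_quadratic_form`, the simulated design and the seed are assumptions made for this example only. The result is cross-checked against the familiar residual-based form $1 - \mathrm{SSR}/\mathrm{SST}$.

```python
import numpy as np

def r2_quadratic_form(y, X):
    """R^2 as y'AX(X'AX)^{-1}X'Ay / y'Ay, plus its adjusted version."""
    n, p = X.shape
    A = np.eye(n) - np.ones((n, n)) / n        # centering matrix A = I - ee'/n
    AX = A @ X
    # y'AX (X'AX)^{-1} X'Ay, using a linear solve instead of an explicit inverse
    explained = (y @ AX) @ np.linalg.solve(X.T @ AX, AX.T @ y)
    total = y @ A @ y
    r2 = explained / total
    r2_adj = r2 - p / (n - p - 1) * (1.0 - r2)
    return r2, r2_adj

rng = np.random.default_rng(1)                 # hypothetical seed and design
n, p = 25, 3
X = rng.normal(size=(n, p))
y = 0.5 + X @ np.array([1.0, -2.0, 0.0]) + rng.normal(size=n)

r2, r2_adj = r2_quadratic_form(y, X)

# Cross-check against the residual-based definition 1 - SSR/SST
Z = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ coef
r2_check = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(r2, r2_adj, r2_check)
```

Forming the $n \times n$ matrix $A$ explicitly mirrors the notation of the text; for large $n$ one would instead subtract column means.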

Cramer (1987; Sec. 3) has pointed out that $R_a^2$ has the same probability limit as $R^2$ provided all the explanatory variables in the model are asymptotically cooperative in the sense that the limiting form of the matrix $n^{-1}(X'AX)$ as $n \to \infty$ is finite and non-singular. Consequently, both $R^2$ and $R_a^2$ can be regarded as consistent estimators of their population counterpart $\theta$ ($0 < \theta < 1$) defined by

$$ \theta = \frac{\beta'X'AX\beta}{n\sigma^2 + \beta'X'AX\beta} \tag{4} $$

which is a sort of 'population' measure of goodness-of-fit.

Let us, therefore, study the efficiency properties of $R^2$ and $R_a^2$ for the general setup, that is, without imposing any restrictions on the distribution of disturbances. For this purpose, we note from (3) that

$$ R_a^2 - R^2 = -\frac{p}{n-p-1}\,(1 - R^2) \le 0. \tag{5} $$

Thus, $E(R_a^2 - R^2) \le 0$, which implies that $E R_a^2 \le E R^2$ or $E R_a^2 - \theta \le E R^2 - \theta$. That is, the bias of $R_a^2$ is always smaller than that of $R^2$,

$$ B(R_a^2) \le B(R^2) \tag{6} $$

where $B(R^2) = E R^2 - \theta$ is the bias of $R^2$.

We note that the result in (6) is an exact result for any distribution of disturbances. In the special cases of normality in Cramer (1987) and multivariate-t in Ohtani and Hasegawa (1993), the result in (6) is reflected in their numerical calculations.

Next, as pointed out by Cramer (1987), from (3)

$$ V(R_a^2) = \left(\frac{n-1}{n-p-1}\right)^2 V(R^2) \ge V(R^2). \tag{7} $$

That is, the exact variance of $R_a^2$ is always larger than the variance of $R^2$.

Now we turn to the MSE of $R^2$ and $R_a^2$. For this, we note from (3) that

$$ (R_a^2 - \theta) = (R^2 - \theta) - r\,(1 - R^2), \qquad r = \frac{p}{n-p-1}. \tag{8} $$

Hence, squaring (8) and taking expectations on both sides we get

$$ M(R_a^2) = M(R^2) - D, \qquad D = r\left[2(1-\theta)\,E(1-R^2) - (2+r)\,E(1-R^2)^2\right] \tag{9} $$

where $M(R^2)$ is the MSE of $R^2$. It therefore follows from (9) that

$$ M(R_a^2) \le M(R^2) \tag{10} $$

provided $D \ge 0$, that is

$$ \frac{E(1-R^2)}{E(1-R^2)^2} \ge \frac{2n-p-2}{2(n-p-1)(1-\theta)} \tag{11} $$

where we note that $E(1-R^2) \ge E(1-R^2)^2$ because $(1-R^2) \ge (1-R^2)^2$. It is clear that the MSE comparison of $R_a^2$ with $R^2$ is not as straightforward as those of bias and variance, and it may be the case that in some situations the MSE of $R_a^2$ may behave well while in other situations the MSE of $R^2$ may perform better. This will be explored below by using large-sample approximations.

In order to get further insight into the condition (11) and the bias and MSE properties of $R^2$ and $R_a^2$, we assume that the elements of $u$ are i.i.d. with first four finite moments $0$, $\sigma^2$, $\sigma^3\gamma_1$ and $\sigma^4(\gamma_2 + 3)$, where $\gamma_1$ and $\gamma_2$ are Pearson's measures of skewness and kurtosis, respectively, of the distribution of disturbances. Thus, we do not specify any form of the distribution of disturbances. In the special case when the disturbances are normal, $\gamma_1 = \gamma_2 = 0$.

Theorem: The large sample asymptotic approximations for the bias of $R^2$ and $R_a^2$ to order $O(n^{-1})$ are given by

and the difference in their respective mean squared errors, viz. $M(R^2)$ and $M(R_a^2)$, to order $O(n^{-2})$ is given by
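The displayed expressions (12) to (14) of the Theorem can be reconstructed, as a sketch rather than the authors' typeset statement, by taking the expectations of the terms $f_{-1/2}$, $f_{-1}$ and $f_{-1/2}^2$ defined in the Appendix. With $\gamma_2 = 0$ the forms below reproduce the approximate values quoted in the numerical comparison with Cramer (1987), for example .034 for the bias of $R^2$ when $n = 5$, $p = 1$, $\theta = .9$. The coefficients of $\gamma_2$ are derived here and should be checked against the published article.

```latex
\begin{align}
B(R^2)   &= \frac{1-\theta}{n}\left[\,p - \theta(1-2\theta) + \theta(1-\theta)\gamma_2\right] \tag{12}\\
B(R_a^2) &= -\frac{\theta(1-\theta)}{n}\left[(1-2\theta) - (1-\theta)\gamma_2\right] \tag{13}\\
M(R^2) - M(R_a^2) &= \frac{p(1-\theta)^2}{n^2}\left[\,p - 2\theta(5-4\theta) + 2\theta(1-2\theta)\gamma_2\right] \tag{14}
\end{align}
```

In this form the skewness measure $\gamma_1$ does not appear at all, and $p$ enters the bias of $R_a^2$ only through $\theta$.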

These results are derived in the Appendix. We observe that the approximate results in the above Theorem hold for normal ($\gamma_2 = 0$) as well as for non-normal ($\gamma_2 \ne 0$) disturbances.

From (12) and (13), it is observed that both $R^2$ and $R_a^2$ are biased estimators of $\theta$. Also, though the bias approximations of both $R^2$ and $R_a^2$ are implicitly affected by $p$ through the parameter $\theta$, the bias approximation of $R_a^2$ does not explicitly contain $p$ while that of $R^2$ does.

We also note that the biases of $R^2$ and $R_a^2$ are not affected by the asymmetry of the disturbances, though kurtosis does have its impact on the magnitude of the bias. However, from (12) and (13),

$$ B(R^2) - B(R_a^2) = \frac{p(1-\theta)}{n} \tag{15} $$

does not depend on either $\gamma_1$ or $\gamma_2$. Thus, the magnitude of the approximate bias of $R^2$ will be larger than that of $R_a^2$ for all i.i.d. disturbances. This is consistent with the exact result in (6), which holds for all distributions, including those which are not i.i.d.

As regards the variances of these estimators, we observe from (12), (13) and (14) that the difference in the variances of $R^2$ and $R_a^2$ up to order $O(n^{-2})$ is given by

$$ d = V(R^2) - V(R_a^2) = -\frac{2p\,\theta(1-\theta)^2}{n^2}\left[4(1-\theta) + \theta(\gamma_2 + 2)\right]. \tag{16} $$

Now this $d$ is always negative because of the positivity of $(\gamma_2 + 2)$ for all types of distributions. This result is also consistent with the exact result in (7), which holds for all distributions, including those which are not i.i.d. Further, from (16), the magnitude of $d$ increases as $p$ grows large. This supports the thesis that the larger the number of explanatory variables in the model, the more inefficient is $R_a^2$ in comparison to $R^2$ according to the criterion of variance to order $O(n^{-2})$, irrespective of the nature of the distribution of the disturbances.

However, variance is not an appropriate criterion for judging the efficiency of biased estimators; the right choice is the mean squared error and, therefore, we should consider this for comparing the performance of $R^2$ and $R_a^2$. Accordingly, it follows from (14) that $R_a^2$ has a smaller mean squared error, to the order of our approximation, than $R^2$ when

This condition also follows by substituting the approximate results for $E(R^2 - 1)$ and $E(R^2 - 1)^2$ in (11); see the Appendix.
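A reconstruction of condition (17), offered as a sketch: requiring the large-sample difference $M(R^2) - M(R_a^2)$ to be non-negative, with the expectations of $f_{-1}$ and $f_{-1/2}^2$ from the Appendix, gives the inequality below. For $\gamma_2 = 0$ its right-hand side never exceeds 3.125, in line with the statement that $p$ exceeding 3 suffices under normality. The sign and size of the $\gamma_2$ term are derived here and should be verified against the published article before being used to assess the non-normal cases.

```latex
\begin{equation}
p \;\ge\; 2\theta\left[\,5 - 4\theta - (1-2\theta)\gamma_2\right] \tag{17}
\end{equation}
```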



Now if the disturbances are mesokurtic, as is the case when they are normally distributed ($\gamma_2 = 0$), the condition (17) reduces to

$$ p \ge 2\theta(5 - 4\theta) \tag{18} $$

which holds true for all values of $\theta$ so long as $p$ exceeds 3; see the following Table.

TABLE

Thus, we observe from (18) that when the distribution of disturbances is mesokurtic (normal), $R_a^2$ is definitely superior to $R^2$ as long as the number of explanatory variables is four or more, in the sense that $R_a^2$ has not only smaller bias but smaller mean squared error too. This result may hold for three or fewer explanatory variables also, provided $\theta$ is small enough; with $\theta \le .2$, two explanatory variables are enough.
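The Table referred to above is not reproduced in this copy. As a rough stand-in, the sketch below tabulates a normal-case bound of the form $2\theta(5 - 4\theta)$, which is an assumed reading of the condition consistent with the statements in the text ($p$ exceeding 3 suffices for every $\theta$, and two regressors suffice when $\theta \le .2$). The grid of $\theta$ values and the function name `normal_case_bound` are hypothetical.

```python
import math

def normal_case_bound(theta):
    """Assumed normal-case bound: p must be at least 2*theta*(5 - 4*theta)."""
    return 2.0 * theta * (5.0 - 4.0 * theta)

thetas = [i / 10 for i in range(1, 10)]          # hypothetical grid 0.1, ..., 0.9
table = [(theta, normal_case_bound(theta), math.ceil(normal_case_bound(theta)))
         for theta in thetas]
for theta, bound, p_min in table:
    print(f"theta={theta:.1f}  bound={bound:.2f}  smallest integer p={p_min}")

# The bound is a concave quadratic in theta; scan a fine grid for its maximum
largest_bound = max(normal_case_bound(i / 1000) for i in range(1, 1000))
print(f"largest bound on the grid: {largest_bound:.3f}")
```

Because the bound peaks at $\theta = 5/8$ and stays below 4, four or more regressors satisfy it for every $\theta$ in $(0, 1)$.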

When the distribution of disturbances departs from normality, it follows from inequality (17) and the table given above that $R_a^2$ is more efficient than $R^2$ for all platykurtic distributions ($-2 \le \gamma_2 < 0$), at least as long as $p$ exceeds three. This result continues to remain true for all leptokurtic distributions ($\gamma_2 > 0$) also, provided $\theta$ does not fall below 0.5, i.e., the model does not fit the data very poorly. When $\theta$ is less than 0.5, so that the model fits poorly, the condition (17) for superiority of $R_a^2$ over $R^2$ may require $p$ to be somewhat larger, depending upon the value of $\gamma_2$ for leptokurtic distributions.

It is thus found that the adjusted $R^2$ ($R_a^2$) is not as unreliable as it appears to be from the variance viewpoint alone.



Now we look into the question of the limitations of the approximate results. For this we make the following observations. It is interesting to note that both the results $B(R^2) \ge B(R_a^2)$ in (15) and $V(R^2) \le V(R_a^2)$ in (16), based on large-$n$ approximations, are the same as the exact results given in (6) and (7), respectively. Also, under the normality assumption, a comparison of the exact bias of $R^2$ given in Table 1 of Cramer (1987) with the corresponding calculations of approximate bias from (12) suggests that the two results are quite close. For example, when $p = 1$ ($k = 2$ in Cramer) and $\theta = .9$, we get the exact bias, with the approximate bias in parentheses, as .036 (.034), .018 (.017), .006 (.006), .002 (.002) for $n$ = 5, 10, 30, 100, respectively. For $p = 2$, $\theta = .9$ and the same $n$ values, in order, we get .057 (.054), .028 (.027), .009 (.009), .003 (.003). Though not reported here, similar results were obtained for other values of $\theta$ considered in Cramer (1987). The same phenomenon occurred in the case of MSE comparisons of the exact versus the approximate results. Again, when $p = 1$ and $\theta = .9$, the exact and approximate values of $D = M(R^2) - M(R_a^2)$ were -.001 (-.0006), -.00018 (-.00015), .0000 (.0000) for $n$ = 5, 10 and 30, respectively. Note that while the approximate $D$ is calculated by using (14), the exact value of $D$ is calculated by noting that we can rewrite $D$ from (9) as $D = r[2(1-\theta)(r+1)B(R^2) - (2+r)M(R^2) - r(1-\theta)^2]$, with $r = p/(n-p-1)$, and using the results in Table 1 of Cramer. It is thus clear that the approximate numerical results in the normal case are very close to the corresponding exact results and they are identical when $n$ is thirty and above. Also, as indicated above, the analytical comparisons of the approximate bias and variance, in the general non-normal case, provide the same results as those based on the exact results. Nevertheless, it remains the subject of a future study to see if the dominance condition in (17) and (18), based on the approximate MSE, will go through for the exact MSE case.
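The approximate values quoted in parentheses above can be recomputed with a few lines of code. The sketch below assumes large-sample expressions of the form $B(R^2) \approx \frac{1-\theta}{n}[p - \theta(1-2\theta) + \theta(1-\theta)\gamma_2]$ and $M(R^2) - M(R_a^2) \approx \frac{p(1-\theta)^2}{n^2}[p - 2\theta(5-4\theta) + 2\theta(1-2\theta)\gamma_2]$. These are reconstructions obtained from the Appendix expansions, not the typeset Theorem, and the function names are hypothetical. Only the normal case $\gamma_2 = 0$ is checked against the text.

```python
def approx_bias_r2(n, p, theta, gamma2=0.0):
    """Assumed O(1/n) bias of R^2 (reconstructed form, see lead-in)."""
    return (1 - theta) / n * (p - theta * (1 - 2 * theta)
                              + theta * (1 - theta) * gamma2)

def approx_mse_gap(n, p, theta, gamma2=0.0):
    """Assumed O(1/n^2) value of M(R^2) - M(R_a^2) (reconstructed form)."""
    return p * (1 - theta) ** 2 / n ** 2 * (
        p - 2 * theta * (5 - 4 * theta) + 2 * theta * (1 - 2 * theta) * gamma2)

sample_sizes = (5, 10, 30, 100)
bias_p1 = [round(approx_bias_r2(n, 1, 0.9), 3) for n in sample_sizes]
bias_p2 = [round(approx_bias_r2(n, 2, 0.9), 3) for n in sample_sizes]
gap_p1 = [approx_mse_gap(n, 1, 0.9) for n in (5, 10, 30)]

print("approximate bias, p=1:", bias_p1)
print("approximate bias, p=2:", bias_p2)
print("approximate D, p=1:", [f"{g:.6f}" for g in gap_p1])
```

Rounded as in the text, these agree with the parenthesised values .034, .017, .006, .002 for $p = 1$, with .054, .027, .009, .003 for $p = 2$, and with -.0006, -.00015, .0000 for $D$.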


APPENDIX

In order to derive the results given in the Theorem, we first state the following results:

$$ \sigma^{-3} E(u'Bu \cdot u') = \gamma_1 e'(I_n * B) \tag{i} $$

$$ \sigma^{-4} E(u'Bu \cdot uu') = \gamma_2 (I_n * B) + (\operatorname{tr} B) I_n + 2B \tag{ii} $$

where $B$ is an $n \times n$ symmetric matrix with non-stochastic elements; see, e.g., Ullah, Srivastava and Chandra (1983, p. 398).
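Results (i) and (ii) can be checked exactly on a small example by enumerating every outcome of a discrete disturbance distribution. The sketch below makes two assumptions that are not stated in the text: the symbol $*$ is read as the Hadamard (element-wise) product, so that $I_n * B$ keeps only the diagonal of $B$, and the two-point distribution and the matrix `B` are arbitrary illustrative choices.

```python
import itertools
import numpy as np

# Hypothetical two-point disturbance with mean zero
values = np.array([-1.0, 2.0])
probs = np.array([2.0 / 3.0, 1.0 / 3.0])

sigma2 = np.sum(probs * values ** 2)                       # variance
gamma1 = np.sum(probs * values ** 3) / sigma2 ** 1.5       # skewness
gamma2 = np.sum(probs * values ** 4) / sigma2 ** 2 - 3.0   # excess kurtosis

n = 3
B = np.array([[2.0, 0.5, -1.0],
              [0.5, 1.0, 0.3],
              [-1.0, 0.3, 4.0]])    # arbitrary symmetric, non-stochastic matrix

# Exact expectations by summing over all 2^n outcomes of the i.i.d. vector u
lhs_i = np.zeros(n)
lhs_ii = np.zeros((n, n))
for idx in itertools.product(range(len(values)), repeat=n):
    u = values[list(idx)]
    prob = np.prod(probs[list(idx)])
    quad = u @ B @ u
    lhs_i += prob * quad * u                    # E(u'Bu . u')
    lhs_ii += prob * quad * np.outer(u, u)      # E(u'Bu . uu')

hadamard = np.eye(n) * B                        # I_n * B (assumed Hadamard product)
rhs_i = sigma2 ** 1.5 * gamma1 * (np.ones(n) @ hadamard)
rhs_ii = sigma2 ** 2 * (gamma2 * hadamard + np.trace(B) * np.eye(n) + 2.0 * B)

print(np.allclose(lhs_i, rhs_i), np.allclose(lhs_ii, rhs_ii))
```

Enumeration is used instead of simulation so that the comparison is free of sampling error.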

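Now writing

$$ w = \left(\frac{u'u}{n} - \sigma^2\right) $$

and substituting (1) in (2), we observe that

$$ (R^2 - \theta) = \frac{\beta'X'AX\beta + 2u'AX\beta + u'AX(X'AX)^{-1}X'Au}{\beta'X'AX\beta + n\sigma^2 + (nw + 2u'AX\beta) - n^{-1}(u'e)^2} - \theta $$

$$ = \frac{(1-\theta)}{n\sigma^2}\left[2(1-\theta)u'AX\beta - \theta n w + u'AX(X'AX)^{-1}X'Au + n^{-1}\theta(u'e)^2\right]\left[1 + \frac{(1-\theta)}{n\sigma^2}\left(nw + 2u'AX\beta - n^{-1}(u'e)^2\right)\right]^{-1}. $$

Expanding the right hand side, we find

$$ (R^2 - \theta) = (1-\theta)(f_{-1/2} + f_{-1}) + O_p(n^{-3/2}) \tag{iii} $$

where, denoting $f_{-r}$ as a term of $O_p(n^{-r})$,

$$ f_{-1/2} = \frac{1}{n\sigma^2}\left[2(1-\theta)u'AX\beta - \theta n w\right] $$

$$ f_{-1} = \frac{1}{n\sigma^2}\,u'\left[AX(X'AX)^{-1}X'A + \frac{\theta}{n}ee' - \frac{4(1-\theta)^2}{n\sigma^2}AX\beta\beta'X'A\right]u + \frac{(1-\theta)}{n^2\sigma^4}\left[\theta n^2 w^2 - 2(1-2\theta)\,nw \cdot u'AX\beta\right]. $$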

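Thus, the bias of $R^2$ to order $O(n^{-1})$ is given by

$$ B(R^2) = (1-\theta)\,E(f_{-1/2} + f_{-1}). \tag{iv} $$

Using (i) and (ii), it is easy to see that

$$ E(f_{-1/2}) = 0, \qquad E(f_{-1}) = \frac{1}{n}\left[p - \theta(1-2\theta) + \theta(1-\theta)\gamma_2\right] $$

which, when substituted in (iv), leads to result (12) of the paper.

For deriving the result (13), we observe from (3) and (iii) that

$$ (R_a^2 - \theta) = (1-\theta)\left(f_{-1/2} + f_{-1} - \frac{p}{n}\right) + O_p(n^{-3/2}). \tag{v} $$

Thus, the bias of $R_a^2$ to order $O(n^{-1})$ is given by

$$ B(R_a^2) = (1-\theta)\,E\left(f_{-1/2} + f_{-1} - \frac{p}{n}\right). \tag{vi} $$

Substituting the expectations, we obtain the result (13).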

Finally, let us consider the difference in mean squared errors of $R^2$ and $R_a^2$, to order $O(n^{-2})$. Using (iii), we have

$$ M(R^2) - M(R_a^2) = \frac{2p(1-\theta)^2}{n}\,E\left(f_{-1/2} + f_{-1} - f_{-1/2}^2 - \frac{p}{2n}\right) + O(n^{-3}). \tag{vii} $$

It is easy to verify that

$$ E(f_{-1/2}^2) = \frac{\theta}{n}\,(4 - 2\theta + \theta\gamma_2). \tag{viii} $$

Substituting it along with the expectations of $f_{-1/2}$ and $f_{-1}$ in (vii), we obtain the result (14) of the paper.

Finally, we note that $E(1 - R^2) = (1-\theta) - B(R^2)$, and $E(1 - R^2)^2 = (1-\theta)^2 - 2(1-\theta)B(R^2) + (1-\theta)^2 E f_{-1/2}^2$ up to $O(n^{-1})$. Substituting these values in (11), we can get the condition in (17).


ACKNOWLEDGEMENTS

The authors are thankful to E. Maasoumi and two referees for their valuable suggestions and constructive comments. The third author gratefully acknowledges the financial support from the Academic Senate, UCR.

REFERENCES

Cramer, J.S. (1987): Mean and Variance of R2 in Small and Moderate Samples, Journal of Econometrics, 35, 253-266.

Magnus, J.R. (1986): The Exact Moments of a Ratio of Quadratic Forms in Normal Variables, Annales d'Economie et de Statistique, 4, 95-109.

Ohtani, K. and H. Hasegawa (1993): On Small-Sample Properties of R2 in a Linear Regression Model with Multivariate t Errors and Proxy Variables, Econometric Theory, 9, 504-515.

Peters, T.A. (1989): The Exact Moments of OLS in Dynamic Regression Models with Non-normal Errors, Journal of Econometrics, 279-305.

Smith, M.D. (1989): On the Expectation of a Ratio of Quadratic Forms in Normal Variables, Journal of Multivariate Analysis, 31, 244-257.

Smith, M.D. (1993): Expectations of Ratios of Quadratic Forms in Normal Variables: Evaluating Some Top-Order Invariant Polynomials, Australian Journal of Statistics, 271-282.

Ullah, A. (1990): Finite Sample Econometrics: A Unified Approach, in R.A.L. Carter, J. Dutta and A. Ullah, eds., Contributions to Econometric Theory and Application, Springer-Verlag, New York, 249-292.

Ullah, A. and V.K. Srivastava (1994): Moments of the Ratio of Quadratic Forms in Non-normal Variables with Econometric Examples, Journal of Econometrics, 62, 129-142.

Ullah, A., V.K. Srivastava and R. Chandra (1983): Properties of Shrinkage Estimators in Linear Regression when Disturbances are not Normal, Journal of Econometrics, 21, 389-402.
