405 ECONOMETRICS

Chapter # 5: TWO-VARIABLE REGRESSION: INTERVAL ESTIMATION AND HYPOTHESIS TESTING

Damodar N. Gujarati

Prof. M. El-Sakka

Dept of Economics, Kuwait University


    Introduction

The theory of estimation consists of two parts: point estimation and interval estimation. We have discussed point estimation thoroughly in the previous two chapters. In this chapter we first consider interval estimation and then take up the topic of hypothesis testing, a topic related to interval estimation.


    INTERVAL ESTIMATION: SOME BASIC IDEAS

Look at the estimated MPC in Ŷi = 24.4545 + 0.5091Xi, which is a single (point) estimate of the unknown population MPC, β2. How reliable is this estimate? A single estimate is likely to differ from the true value, although in repeated sampling its mean value is expected to be equal to the true value.

In statistics the reliability of a point estimator is measured by its standard error. Therefore, we may construct an interval around the point estimator, say within two or three standard errors on either side of the point estimator, such that this interval has, say, a 95 percent probability of including the true parameter value.

Assume that we want to find out how close β̂2 is to β2. We try to find two positive numbers δ and α, the latter lying between 0 and 1, such that the probability that the random interval (β̂2 − δ, β̂2 + δ) contains the true β2 is 1 − α. Symbolically,

Pr (β̂2 − δ ≤ β2 ≤ β̂2 + δ) = 1 − α (5.2.1)

Such an interval, if it exists, is known as a confidence interval.


1 − α is known as the confidence coefficient; and α (0 < α < 1) is known as the level of significance.

The endpoints of the confidence interval are known as the confidence limits (also known as critical values), β̂2 − δ being the lower confidence limit and β̂2 + δ the upper confidence limit.

If α = 0.05, or 5 percent, (5.2.1) would read: the probability that the (random) interval shown there includes the true β2 is 0.95, or 95 percent. The interval estimator thus gives a range of values within which the true β2 may lie.


    It is very important to know the following aspects of interval estimation:

1. Equation (5.2.1) does not say that the probability of β2 lying between the given limits is 1 − α. What (5.2.1) states is that the probability of constructing an interval that contains β2 is 1 − α.

2. The interval (5.2.1) is a random interval; that is, it will vary from one sample to the next because it is based on β̂2, which is random.

3. (5.2.1) means: if, in repeated sampling, confidence intervals like it are constructed a great many times on the 1 − α probability basis, then, in the long run, on average, such intervals will enclose the true value of the parameter in 1 − α of the cases.

4. The interval (5.2.1) is random so long as β̂2 is not known. Once β̂2 is known, the interval (5.2.1) is no longer random; it is fixed. In this situation β2 is either in the fixed interval or outside it, so the probability is either 1 or 0. If the 95% confidence interval were obtained as (0.4268 ≤ β2 ≤ 0.5914), we cannot say the probability is 95% that this interval includes the true β2; that probability is either 1 or 0.


CONFIDENCE INTERVALS FOR REGRESSION COEFFICIENTS β1 AND β2

Confidence Interval for β2

With the normality assumption for ui, the OLS estimators β̂1 and β̂2 are themselves normally distributed with means and variances given therein. Therefore, for example, the variable

Z = (β̂2 − β2)/se(β̂2) = (β̂2 − β2)√(Σxi²)/σ (5.3.1)

is a standardized normal variable. Therefore, we can use the normal distribution to make probabilistic statements about β2 provided σ² is known. An important property of a normally distributed variable with mean μ and variance σ² is that the area under the normal curve between μ ± σ is about 68 percent, that between the limits μ ± 2σ is about 95 percent, and that between μ ± 3σ is about 99.7 percent.


But σ² is rarely known, and in practice it is determined by the unbiased estimator σ̂². If we replace σ by σ̂, (5.3.1) may be written as

t = (β̂2 − β2)/se(β̂2) (5.3.2)

where se(β̂2) now refers to the estimated standard error. Therefore, instead of using the normal distribution, we can use the t distribution to establish a confidence interval for β2 as follows:

Pr (−tα/2 ≤ t ≤ tα/2) = 1 − α (5.3.3)

where tα/2 is the value of the t variable obtained from the t distribution for the α/2 level of significance and n − 2 df; it is often called the critical t value at the α/2 level of significance.


Substitution of (5.3.2) into (5.3.3) yields

Pr [−tα/2 ≤ (β̂2 − β2)/se(β̂2) ≤ tα/2] = 1 − α (5.3.4)

Rearranging (5.3.4), we obtain

Pr [β̂2 − tα/2 se(β̂2) ≤ β2 ≤ β̂2 + tα/2 se(β̂2)] = 1 − α (5.3.5)

Equation (5.3.5) provides a 100(1 − α) percent confidence interval for β2, which can be written more compactly as

100(1 − α)% confidence interval for β2: β̂2 ± tα/2 se(β̂2) (5.3.6)

Arguing analogously, and using (4.3.1) and (4.3.2), we can then write

Pr [β̂1 − tα/2 se(β̂1) ≤ β1 ≤ β̂1 + tα/2 se(β̂1)] = 1 − α (5.3.7)

or, more compactly,

100(1 − α)% confidence interval for β1: β̂1 ± tα/2 se(β̂1) (5.3.8)


Notice an important feature of the confidence intervals given in (5.3.6) and (5.3.8): in both cases the width of the confidence interval is proportional to the standard error of the estimator. That is, the larger the standard error, the larger is the width of the confidence interval. Put differently, the larger the standard error of the estimator, the greater is the uncertainty of estimating the true value of the unknown parameter.


We found that β̂2 = 0.5091, se(β̂2) = 0.0357, and df = 8. If we assume α = 5%, that is, a 95% confidence coefficient, then the t table shows that for 8 df the critical tα/2 = t0.025 = 2.306. Substituting these values in (5.3.5), the 95% confidence interval for β2 is as follows:

0.4268 ≤ β2 ≤ 0.5914 (5.3.9)

Or, using (5.3.6), it is

0.5091 ± 2.306(0.0357)

that is,

0.5091 ± 0.0823 (5.3.10)

The interpretation of this confidence interval is: given the confidence coefficient of 95%, in 95 out of 100 cases intervals like (0.4268, 0.5914) will contain the true β2. But we cannot say that the probability is 95 percent that the specific interval (0.4268 to 0.5914) contains the true β2, because this interval is now fixed; therefore, β2 either lies in it or does not.
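A quick numerical sketch of (5.3.9) and (5.3.10), assuming Python with scipy is available and using only the estimates quoted above:

```python
# Minimal sketch: 95% confidence interval for beta_2 (values from the text)
from scipy import stats

beta2_hat, se_beta2, df = 0.5091, 0.0357, 8   # point estimate, standard error, degrees of freedom
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)       # critical t_{alpha/2} for 8 df, about 2.306
lower = beta2_hat - t_crit * se_beta2
upper = beta2_hat + t_crit * se_beta2
print(f"95% CI for beta_2: ({lower:.4f}, {upper:.4f})")   # about (0.4268, 0.5914)
```

The same sketch applied to β̂1 = 24.4545 with se(β̂1) = 6.4138 reproduces the interval for β1 reported below.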


Confidence Interval for β1

Following (5.3.7), we can verify that the 95% confidence interval for β1 of the consumption-income example is

9.6643 ≤ β1 ≤ 39.2448 (5.3.11)

Or, using (5.3.8), we find it is

24.4545 ± 2.306(6.4138)

that is,

24.4545 ± 14.7902 (5.3.12)

In the long run, in 95 out of 100 cases intervals like (5.3.11) will contain the true β1; again, the probability that this particular fixed interval includes the true β1 is either 1 or 0.


CONFIDENCE INTERVAL FOR σ²

As pointed out before, under the normality assumption, the variable

χ² = (n − 2) σ̂²/σ² (5.4.1)

follows the χ² distribution with n − 2 df. Therefore, we can use the χ² distribution to establish a confidence interval for σ²:

Pr (χ²1−α/2 ≤ χ² ≤ χ²α/2) = 1 − α (5.4.2)

where χ²1−α/2 and χ²α/2 are two critical values of χ² obtained from the chi-square table for n − 2 df.

Substituting χ² from (5.4.1) into (5.4.2) and rearranging the terms, we obtain

Pr [(n − 2) σ̂²/χ²α/2 ≤ σ² ≤ (n − 2) σ̂²/χ²1−α/2] = 1 − α (5.4.3)

which gives the 100(1 − α)% confidence interval for σ².


To illustrate, we obtain σ̂² = 42.1591 and df = 8. If α is chosen at 5 percent, the chi-square table for 8 df gives the following critical values: χ²0.025 = 17.5346 and χ²0.975 = 2.1797. These values show that the probability of a chi-square value exceeding 17.5346 is 2.5 percent and that of exceeding 2.1797 is 97.5 percent. Therefore, the interval between these two values is the 95% confidence interval for χ², as shown diagrammatically in Figure 5.1.

Substituting the data of our example into (5.4.3), the 95% confidence interval for σ² is as follows:

19.2347 ≤ σ² ≤ 154.7336 (5.4.4)

The interpretation of this interval is: if we establish 95% confidence limits on σ² and if we maintain a priori that these limits will include the true σ², we shall be right in the long run 95 percent of the time.


    (Note the skewed characteristic of the chi-square distribution.)
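The limits in (5.4.4) can be reproduced with a similar sketch, again assuming scipy and using the values quoted above:

```python
# Minimal sketch: 95% confidence interval for sigma^2 (values from the text)
from scipy import stats

sigma2_hat, df = 42.1591, 8
alpha = 0.05

chi2_upper = stats.chi2.ppf(1 - alpha / 2, df)   # about 17.5346
chi2_lower = stats.chi2.ppf(alpha / 2, df)       # about 2.1797
lower = df * sigma2_hat / chi2_upper
upper = df * sigma2_hat / chi2_lower
print(f"95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")   # about (19.23, 154.73)
```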


HYPOTHESIS TESTING: THE CONFIDENCE-INTERVAL APPROACH

Two-Sided or Two-Tail Test

To illustrate the confidence-interval approach, look at the consumption-income example. The estimated MPC, β̂2, is 0.5091. Suppose we postulate that

H0: β2 = 0.3 and H1: β2 ≠ 0.3

that is, the true MPC is 0.3 under the null hypothesis, but it is less than or greater than 0.3 under the alternative hypothesis. The alternative hypothesis is a two-sided hypothesis. It reflects the fact that we do not have a strong expectation about the direction in which the alternative hypothesis should move from the null hypothesis.

Is the observed β̂2 compatible with H0? To answer this question, let us refer to the confidence interval (5.3.9). We know that in the long run intervals like (0.4268, 0.5914) will contain the true β2 with 95 percent probability.


Consequently, in the long run such intervals provide a range or limits within which the true β2 may lie with a confidence coefficient of, say, 95%. Therefore, if β2 under H0 falls within the 100(1 − α)% confidence interval, we do not reject the null hypothesis; if it lies outside the interval, we may reject it. This range is illustrated schematically in Figure 5.2.

Decision Rule: Construct a 100(1 − α)% confidence interval for β2. If the β2 under H0 falls within this confidence interval, do not reject H0, but if it falls outside this interval, reject H0.

Following this rule, H0: β2 = 0.3 clearly lies outside the 95% confidence interval given in (5.3.9). Therefore, we can reject the hypothesis that the true MPC is 0.3, with 95% confidence.
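A minimal sketch of this decision rule, reusing the confidence-interval computation shown earlier (the hypothesized value 0.3 and the estimates are those from the text, scipy assumed):

```python
# Minimal sketch of the confidence-interval decision rule
from scipy import stats

beta2_hat, se_beta2, df = 0.5091, 0.0357, 8
beta2_H0 = 0.3                                  # value of beta_2 under H0
alpha = 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df)
lower, upper = beta2_hat - t_crit * se_beta2, beta2_hat + t_crit * se_beta2
decision = "do not reject H0" if lower <= beta2_H0 <= upper else "reject H0"
print(f"95% CI: ({lower:.4f}, {upper:.4f}) -> {decision}")   # reject H0
```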


In statistics, when we reject the null hypothesis, we say that our finding is statistically significant. On the other hand, when we do not reject the null hypothesis, we say that our finding is not statistically significant.

One-Sided or One-Tail Test

Sometimes we have a strong theoretical expectation that the alternative hypothesis is one-sided or unidirectional rather than two-sided. Thus, for our consumption-income example, one could postulate that

H0: β2 ≤ 0.3 and H1: β2 > 0.3

Perhaps economic theory or prior empirical work suggests that the marginal propensity to consume is greater than 0.3. Although the procedure to test this hypothesis can be easily derived from (5.3.5), the actual mechanics are better explained in terms of the test-of-significance approach discussed next.


HYPOTHESIS TESTING: THE TEST-OF-SIGNIFICANCE APPROACH

Testing the Significance of Regression Coefficients: The t Test

An alternative to the confidence-interval method is the test-of-significance approach. It is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis. The decision to accept or reject H0 is made on the basis of the value of the test statistic obtained from the data at hand. Recall that

t = (β̂2 − β2)/se(β̂2)

follows the t distribution with n − 2 df.


Confidence-interval statements such as the following can be made:

Pr [−tα/2 ≤ (β̂2 − β*2)/se(β̂2) ≤ tα/2] = 1 − α (5.7.1)

where β*2 is the value of β2 under H0. Rearranging (5.7.1), we obtain

Pr [β*2 − tα/2 se(β̂2) ≤ β̂2 ≤ β*2 + tα/2 se(β̂2)] = 1 − α (5.7.2)

which gives the interval in which β̂2 will fall with 1 − α probability, given β2 = β*2. The interval (5.7.2) is known as the region of acceptance, and the region(s) outside the confidence interval is (are) called the region(s) of rejection (of H0) or the critical region(s).

Returning to the consumption-income example: we know that β̂2 = 0.5091, se(β̂2) = 0.0357, and df = 8. If we assume α = 5 percent, tα/2 = 2.306. If we let

H0: β2 = β*2 = 0.3 and H1: β2 ≠ 0.3,

(5.7.2) becomes

Pr (0.2177 ≤ β̂2 ≤ 0.3823) = 0.95 (5.7.3)

Since the observed β̂2 lies in the critical region, we reject the null hypothesis that the true β2 = 0.3.


In practice, there is no need to estimate (5.7.2) explicitly. One can compute the t value in the middle of the double inequality given by (5.7.1) and see whether it lies between the critical t values or outside them. For our example,

t = (0.5091 − 0.3)/0.0357 = 5.86 (5.7.4)

which clearly lies in the critical region of Figure 5.4. The conclusion remains the same; namely, we reject H0.

A statistic is said to be statistically significant if the value of the test statistic lies in the critical region. By the same token, a test is said to be statistically insignificant if the value of the test statistic lies in the acceptance region. We can summarize the t test of significance approach to hypothesis testing as shown in Table 5.1.
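The test-of-significance calculation in (5.7.4) can be sketched the same way (values assumed from the text, scipy assumed):

```python
# Minimal sketch of the t test of significance for beta_2
from scipy import stats

beta2_hat, se_beta2, df = 0.5091, 0.0357, 8
beta2_star = 0.3                                # value of beta_2 under H0
alpha = 0.05

t_stat = (beta2_hat - beta2_star) / se_beta2    # about 5.86, as in (5.7.4)
t_crit = stats.t.ppf(1 - alpha / 2, df)         # about 2.306
print(f"t = {t_stat:.2f}; reject H0: {abs(t_stat) > t_crit}")
```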


Testing the Significance of σ²: The χ² Test

As another illustration of the test-of-significance methodology, consider the following variable:

χ² = (n − 2) σ̂²/σ² (5.4.1)

which, as noted previously, follows the χ² distribution with n − 2 df. For the hypothetical example, σ̂² = 42.1591 and df = 8. If we postulate that

H0: σ² = 85 vs. H1: σ² ≠ 85,

Eq. (5.4.1) provides the test statistic for H0. Substituting the appropriate values in (5.4.1), it can be found that under H0, χ² = 3.97. If we assume α = 5%, the critical χ² values are 2.1797 and 17.5346. Since the computed χ² lies between these limits, we do not reject H0. The χ² test of significance approach to hypothesis testing is summarized in Table 5.2.
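A sketch of the χ² test just described, again assuming scipy and the values quoted above:

```python
# Minimal sketch of the chi-square test for sigma^2
from scipy import stats

sigma2_hat, df = 42.1591, 8
sigma2_H0 = 85                                   # value of sigma^2 under H0
alpha = 0.05

chi2_stat = df * sigma2_hat / sigma2_H0          # about 3.97
lo = stats.chi2.ppf(alpha / 2, df)               # about 2.1797
hi = stats.chi2.ppf(1 - alpha / 2, df)           # about 17.5346
print(f"chi2 = {chi2_stat:.2f}; reject H0: {not (lo <= chi2_stat <= hi)}")
```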


HYPOTHESIS TESTING: SOME PRACTICAL ASPECTS

The Meaning of "Accepting" or "Rejecting" a Hypothesis

If, on the basis of a test of significance, we decide to "accept" the null hypothesis, we should not say that we accept H0. Why? To answer this, let us revert to our consumption-income example and assume that H0: β2 (MPC) = 0.50. Now the estimated value of the MPC is β̂2 = 0.5091 with se(β̂2) = 0.0357. Then, on the basis of the t test, we find that t = (0.5091 − 0.50)/0.0357 = 0.25, which is insignificant, say, at α = 5%. Therefore, we say "accept" H0. But now let us assume H0: β2 = 0.48. Applying the t test, we obtain t = (0.5091 − 0.48)/0.0357 = 0.82, which too is statistically insignificant. So now we "accept" this H0. Which of these two null hypotheses is the truth? We do not know. It is therefore preferable to say "do not reject" rather than "accept."


The "Zero" Null Hypothesis and the "2-t" Rule of Thumb

A null hypothesis that is commonly tested in empirical work is H0: β2 = 0, that is, the slope coefficient is zero. This null hypothesis can be easily tested by the confidence-interval or the t-test approach discussed in the preceding sections. But very often such formal testing can be shortcut by adopting the "2-t" rule of significance: if the number of degrees of freedom is 20 or more and if α is set at 0.05, then the null hypothesis β2 = 0 can be rejected if the computed t value [= β̂2/se(β̂2)] exceeds 2 in absolute value.


The rationale for this rule is not too difficult to grasp. From (5.7.1) we know that we will reject H0: β2 = 0 if

t = β̂2/se(β̂2) > tα/2 when β̂2 > 0

or

t = β̂2/se(β̂2) < −tα/2 when β̂2 < 0

or when

|t| = |β̂2/se(β̂2)| > tα/2 (5.8.1)

for the appropriate degrees of freedom.

Now if we examine the t table, we see that for df of about 20 or more a computed t value in excess of 2 (in absolute terms), say, 2.1, is statistically significant at the 5 percent level, implying rejection of the null hypothesis.
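A small sketch illustrating why the rule works: the two-tailed 5 percent critical t values approach 2 as the degrees of freedom grow (scipy assumed):

```python
# Minimal sketch: critical t values behind the "2-t" rule of thumb
from scipy import stats

for df in (8, 20, 30, 60, 120):
    t_crit = stats.t.ppf(0.975, df)     # two-tailed 5 percent critical value
    print(f"df = {df:3d}: t_crit = {t_crit:.3f}")
# For df of about 20 or more the critical value is close to 2, so |t| > 2
# signals significance at the 5 percent level.
```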


Forming the Null and Alternative Hypotheses

Very often the phenomenon under study will suggest the nature of the null and alternative hypotheses. Consider the capital market line (CML) of portfolio theory,

Ei = β1 + β2σi,

where E = expected return on a portfolio and σ = the standard deviation of return, a measure of risk. Since return and risk are expected to be positively related, the natural alternative hypothesis to the null hypothesis that β2 = 0 would be β2 > 0. That is, one would not choose to consider values of β2 less than zero.

Consider the case of the demand for money. Studies have shown that the income elasticity of demand for money has typically ranged between 0.7 and 1.3. Therefore, in a new study of demand for money, if one postulates that the income-elasticity coefficient β2 is 1, the alternative hypothesis could be that β2 ≠ 1, a two-sided alternative hypothesis.

Thus, theoretical expectations or prior empirical work or both can be relied upon to formulate hypotheses.


Choosing α, the Level of Significance

Whether we reject or do not reject H0 depends critically on α, or the probability of committing a Type I error, that is, the probability of rejecting a true hypothesis. But why is α commonly fixed at the 1, 5, or at most 10 percent level? For a given sample size, if we try to reduce a Type I error, a Type II error increases, and vice versa. That is, given the sample size, if we try to reduce the probability of rejecting a true hypothesis, we at the same time increase the probability of accepting a false hypothesis. The only way we can decide about this trade-off is to find out the relative costs of the two types of errors.

Applied econometricians generally follow the practice of setting the value of α at a 1, 5, or at most 10 percent level and choosing a test statistic that makes the probability of committing a Type II error as small as possible. Since one minus the probability of committing a Type II error is known as the power of the test, this procedure amounts to maximizing the power of the test.


The Exact Level of Significance: The p Value

Once a test statistic is obtained in a given example, why not simply go to the appropriate statistical table and find out the actual probability of obtaining a value of the test statistic as large as or larger than that obtained in the example? This probability is called the p value (i.e., probability value), also known as the observed or exact level of significance, or the exact probability of committing a Type I error. More technically, the p value is defined as the lowest significance level at which a null hypothesis can be rejected.

To illustrate, given H0 that the true MPC is 0.3, we obtained a t value of 5.86 in (5.7.4). What is the p value of obtaining a t value of as much as or greater than 5.86? Looking up the t table, for 8 df the probability of obtaining such a t value must be much smaller than 0.001 (one-tail) or 0.002 (two-tail). This observed, or exact, level of significance of the t statistic is much smaller than the conventionally, and arbitrarily, fixed level of significance, such as 1, 5, or 10 percent. If we were to use the p value just computed and reject the null hypothesis, the probability of our committing a Type I error would be only about 0.02 percent, that is, only about 2 in 10,000!


REGRESSION ANALYSIS AND ANALYSIS OF VARIANCE

In Chapter 3 the following identity was developed:

Σyi² = Σŷi² + Σûi² = β̂2²Σxi² + Σûi² (3.5.2)

that is, TSS = ESS + RSS. A study of these components of TSS is known as the analysis of variance (ANOVA).

Associated with any sum of squares is its df, the number of independent observations on which it is based. TSS has n − 1 df because we lose 1 df in computing the sample mean Ȳ. RSS has n − 2 df. ESS has 1 df, which follows from the fact that ESS = β̂2²Σxi² is a function of β̂2 only, since Σxi² is known.

Let us arrange the various sums of squares and their associated df in Table 5.3, which is the standard form of the AOV table, sometimes called the ANOVA table. Given the entries of Table 5.3, we now consider the following variable:


F = (MSS of ESS)/(MSS of RSS) = β̂2²Σxi² / (Σûi²/(n − 2)) = β̂2²Σxi² / σ̂² (5.9.1)

If we assume that the disturbances ui are normally distributed, and if H0 is that β2 = 0, then it can be shown that the F variable of (5.9.1) follows the F distribution with 1 df in the numerator and (n − 2) df in the denominator.


What use can be made of the preceding F ratio? It can be shown that

E(β̂2²Σxi²) = σ² + β2²Σxi² (5.9.2)

and

E(Σûi²/(n − 2)) = E(σ̂²) = σ² (5.9.3)

(Note that β2 and σ² appearing on the right sides of these equations are the true parameters.) Therefore, if β2 is in fact zero, Eqs. (5.9.2) and (5.9.3) both provide us with identical estimates of the true σ². In this situation, the explanatory variable X has no linear influence on Y whatsoever and the entire variation in Y is explained by the random disturbances ui. If, on the other hand, β2 is not zero, (5.9.2) and (5.9.3) will be different and part of the variation in Y will be ascribable to X. Therefore, the F ratio of (5.9.1) provides a test of the null hypothesis H0: β2 = 0.


To illustrate, the ANOVA table for the consumption-income example is as shown in Table 5.4. The computed F value is 202.87. The p value of this F statistic, obtained using electronic statistical tables, is 0.0000001, so the computed F of 202.87 is obviously significant at this level. Therefore, we reject the null hypothesis that β2 = 0.
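The reported p value of the F statistic can be checked with a short sketch (scipy assumed; the F value and degrees of freedom are those quoted above):

```python
# Minimal sketch: p value of the F statistic reported in Table 5.4
from scipy import stats

F_value, df_num, df_den = 202.87, 1, 8
p_value = stats.f.sf(F_value, df_num, df_den)     # upper-tail probability
print(f"p value = {p_value:.2e}")                 # effectively zero, so reject beta_2 = 0
```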


    REPORTING THE RESULTS OF REGRESSION ANALYSIS

Employing the consumption-income example as an illustration:

Ŷi = 24.4545 + 0.5091Xi
se = (6.4138) (0.0357)  r² = 0.9621 (5.11.1)
t = (3.8128) (14.2605)  df = 8
p = (0.002571) (0.000000289)  F1,8 = 202.87

Thus, for 8 df the probability of obtaining a t value of 3.8128 or greater is 0.0026, and the probability of obtaining a t value of 14.2605 or larger is about 0.0000003.

Under the null hypothesis that the true population intercept value is zero, the p value of obtaining a t value of 3.8128 or greater is only about 0.0026. Therefore, if we reject this null hypothesis, the probability of our committing a Type I error is about 26 in 10,000. The true population intercept thus appears to be different from zero.


If the true MPC were in fact zero, our chances of obtaining an MPC of 0.5091 would be practically zero. Hence we can reject the null hypothesis that the true MPC is zero.

Earlier we showed the intimate connection between the F and t statistics, namely, F1,k = tk². Under the null hypothesis that the true β2 = 0, the F value is 202.87 and the t value is about 14.24; as expected, the former value is the square of the latter value.
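A one-line check of the F = t² relationship, using the rounded figures quoted above:

```python
# Minimal sketch: for a single regressor, F(1, k) equals t(k) squared
t_value = 14.2605     # t statistic for beta_2 under H0: beta_2 = 0, from (5.11.1)
F_value = 202.87      # F statistic from Table 5.4
print(t_value ** 2, F_value)   # close; small gaps reflect rounding in the reported figures
```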


    EVALUATING THE RESULTS OF REGRESSION ANALYSIS

How good is the fitted model? We need some criteria with which to answer this question.

First, are the signs of the estimated coefficients in accordance with theoretical or prior expectations? For example, in the income-consumption model the MPC should be positive.

Second, is the relationship statistically significant, as theory says it should be? Here the p value of the estimated t value is extremely small.

Third, how well does the regression model explain the variation in consumption expenditure? One can use r² to answer this question, and here it is very high.


There is one assumption that we would like to check: the normality of the disturbance term, ui.

Normality Tests

Although several tests of normality are discussed in the literature, we will consider just three:

(1) histogram of residuals;

(2) normal probability plot (NPP), a graphical device; and

(3) the Jarque-Bera test.


Histogram of Residuals.

A histogram of residuals is a simple graphic device that is used to learn something about the shape of the probability density function (PDF) of a random variable. If you mentally superimpose the bell-shaped normal distribution curve on the histogram, you will get some idea as to whether a normal PDF approximation may be appropriate.


Normal Probability Plot.

A comparatively simple graphical device is the normal probability plot (NPP). If the variable is in fact from a normal population, the NPP will be approximately a straight line. The NPP is shown in Figure 5.7. We see that the residuals from our illustrative example are approximately normally distributed, because a straight line seems to fit the data reasonably well.
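A sketch of both graphical devices, using hypothetical residuals as a stand-in for the OLS residuals of the example (numpy, scipy, and matplotlib assumed):

```python
# Minimal sketch: histogram and normal probability plot of residuals
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
resid = rng.normal(0, 6, size=10)              # placeholder for the OLS residuals u_hat

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(resid, bins=5)                        # histogram of residuals
ax1.set_title("Histogram of residuals")
stats.probplot(resid, dist="norm", plot=ax2)   # NPP: roughly a straight line if normal
plt.tight_layout()
plt.show()
```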


Jarque-Bera (JB) Test of Normality.

The JB test of normality is an asymptotic, or large-sample, test. It is also based on the OLS residuals. The test first computes the skewness and kurtosis measures of the OLS residuals and uses the following test statistic:

JB = n[S²/6 + (K − 3)²/24] (5.12.1)

where n = sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed variable, S = 0 and K = 3. In that case the value of the JB statistic is expected to be 0.


The JB statistic follows the chi-square distribution with 2 df.

If the computed p value of the JB statistic in an application is sufficiently low, which will happen if the value of the statistic is very different from 0, one can reject the hypothesis that the residuals are normally distributed. But if the p value is reasonably high, which will happen if the value of the statistic is close to zero, we do not reject the normality assumption.

The sample size in our consumption-income example is rather small. If we mechanically apply the JB formula to our example, the JB statistic turns out to be 0.7769. The p value of obtaining such a value from the chi-square distribution with 2 df is about 0.68, which is quite high. In other words, we may not reject the normality assumption for our example. Of course, bear in mind the warning about the sample size.
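A sketch of the JB computation in (5.12.1), again on hypothetical residuals standing in for the OLS residuals (numpy and scipy assumed):

```python
# Minimal sketch: Jarque-Bera statistic as in (5.12.1)
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
resid = rng.normal(0, 6, size=10)        # placeholder for the OLS residuals u_hat

n = resid.size
S = stats.skew(resid)                    # skewness coefficient
K = stats.kurtosis(resid, fisher=False)  # kurtosis; equals 3 for a normal variable
JB = n * (S**2 / 6 + (K - 3)**2 / 24)
p_value = stats.chi2.sf(JB, df=2)        # JB is asymptotically chi-square with 2 df
print(f"JB = {JB:.4f}, p = {p_value:.4f}")
```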


    A CONCLUDING EXAMPLE

Let us return to Example 3.2 about food expenditure in India. Using the data given in (3.7.2) and adopting the format of (5.11.1), we obtain the following expenditure equation:

FoodExpi = 94.2087 + 0.4368 TotalExpi
se = (50.8563) (0.0783)
t = (1.8524) (5.5770)
p = (0.0695) (0.0000)*
r² = 0.3698; df = 53
F1,53 = 31.1034 (p value = 0.0000)* (5.12.2)


As expected, there is a positive relationship between expenditure on food and total expenditure. If total expenditure went up by a rupee, on average, expenditure on food increased by about 44 paise. If total expenditure were zero, the average expenditure on food would be about 94 rupees. The r² value of about 0.37 means that 37 percent of the variation in food expenditure is explained by total expenditure, a proxy for income.

Suppose we want to test the null hypothesis that there is no relationship between food expenditure and total expenditure, that is, that the true slope coefficient β2 = 0.


The estimated value of β2 is 0.4368. If the null hypothesis were true, what is the probability of obtaining a value of 0.4368? Under the null hypothesis, we observe from (5.12.2) that the t value is 5.5770 and the p value of obtaining such a t value is practically zero. In other words, we can reject the null hypothesis. But suppose the null hypothesis were that β2 = 0.5. Now what? Using the t test we obtain

t = (0.4368 − 0.5)/0.0783 = −0.8071

The probability of obtaining a |t| of 0.8071 is greater than 20 percent. Hence we do not reject the hypothesis that the true β2 is 0.5.
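This t test can be sketched with the values reported in (5.12.2) (scipy assumed):

```python
# Minimal sketch: testing H0: beta_2 = 0.5 for the food expenditure regression
from scipy import stats

beta2_hat, se_beta2, df = 0.4368, 0.0783, 53
beta2_H0 = 0.5

t_stat = (beta2_hat - beta2_H0) / se_beta2        # about -0.8071
p_two_tail = 2 * stats.t.sf(abs(t_stat), df)      # well above 0.20
print(f"t = {t_stat:.4f}, p = {p_two_tail:.4f}")  # do not reject H0
```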


Notice that under the null hypothesis that the true slope coefficient is zero, the F value is 31.1034. Under the same null hypothesis, we obtained a t value of 5.5770. If we square this value, we obtain 31.1029, which is about the same as the F value, again showing the close relationship between the t and the F statistic.

Using the estimated residuals from the regression, what can we say about the probability distribution of the error term? The information is given in Figure 5.8. As the figure shows, the residuals from the food expenditure regression seem to be symmetrically distributed.

Application of the Jarque-Bera test shows that the JB statistic is about 0.2576, and the probability of obtaining such a statistic under the normality assumption is about 88 percent. Therefore, we do not reject the hypothesis that the error terms are normally distributed. But keep in mind that the sample size of 55 observations may not be large enough.
