
Information-Theoretic Distribution Tests With Symmetry 2004

Date post: 30-May-2018
  • 8/14/2019 Information-Theoretic Distribution Tests With Symmetry 2004


    Information-Theoretic Distribution Tests with

    Application to Symmetry and Normality

    Thanasis Stengos and Ximing Wu

    March 4, 2004

    Abstract

    We derive distribution free tests based on the Maximum Entropy densities to test

    the null hypotheses of symmetry and normality. The proposed tests are derived from

    maximizing the differential entropy subject to moment constraints. By exploiting the

    equivalence between Maximum Entropy and Maximum Likelihood estimates of the

    exponential families, we can use the conventional Likelihood Ratio, Wald and Lagrange

    Multiplier testing principles in the maximum entropy framework. Monte Carlo evidence

    suggests that they have desirable small sample properties, when compared with the

    standard parametric tests used in the literature, such as the standardized skewness

    coefficient test for symmetry or the Jarque and Bera (1980) test for normality. We

    apply the proposed symmetry tests to test the nominal wage rigidity hypothesis of

    the wage determination process.

    Department of Economics, University of Guelph. Email: [email protected]. We want to thank seminar participants at Penn State University for comments. Financial support from SSHRC of Canada is gratefully acknowledged.

    Corresponding author. Department of Economics, University of Guelph. Ontario, Canada, N1G 2W1. Email: [email protected]; Tel: (519) 824-4120, ext. 53014; Fax: (519) 763-8497.

    JEL code: C1, C12, C16; Key words: distribution test, maximum entropy, symmetry, normality.


    1 Introduction

    There are many parametric and nonparametric tests proposed in the literature to test the

    hypothesis that a distribution is symmetric about a known median. For example, among

    the most widely used parametric tests there is the standardized skewness coefficient test of

    Gupta (1967) and in the category of nonparametric tests those proposed by Fan and Gencay

    (1995) and Ahmad and Li (1997). The latter tests are distribution free based on local kernel

    estimation. They offer an advantage over the more traditional parametric tests such as the

    standardized skewness coefficient test in that they are consistent, since they are based on

    the whole distribution and not simply a part of it. However, as they are based on local

    smoothing methods these tests depend on the choice of bandwidth that is used in density

    estimation, something that may affect their power and size in finite samples.

    In this paper we introduce two alternative distribution free tests based on the maximum

    entropy (ME) densities. Unlike the above mentioned kernel based nonparametric tests, the

    proposed tests do not depend on bandwidth selection. Our proposed tests differ from those

    introduced by Imbens, Spady and Johnson (1998) that minimize the discrete Kullback-Leibler

    information criterion (cross entropy) or other Cressie-Read family statistics subject

    to moment constraints. They are derived from maximizing differential entropy subject to

    moment constraints. By exploiting the equivalence between ME and ML estimates for the

    exponential families, we can use the conventional likelihood ratio (LR), Wald and Lagrange

    Multiplier (LM) testing principles in the maximum entropy framework. Hence, our tests

    share the optimality properties of the standard maximum likelihood based tests. We show

    that the ME approach leads to simple yet powerful tests of symmetry which have desirable

    small sample properties. One of the tests is asymptotically equivalent to the conventional

    skewness test but is always more powerful. We also derive a normality test with similar

    properties that compares favorably with the existing JB test proposed in Jarque and Bera

    (1980) and Bera and Jarque (1981).

    The paper is organized as follows. In the next section we present the information the-

    oretic framework on which we base our analysis. We then proceed to derive our symmetry

    and normality tests and discuss their properties. In the following section we present some

    simulation results and then we present the results of an empirical application using a unique set of

    Canadian wage-change contract data. Finally, before we conclude, we discuss some possible


    extensions.

    2 Information-theoretic distribution test

    The ME principle states that among all the distributions that satisfy certain moment constraints,

    we should choose the one that maximizes Shannon's information entropy. According

    to Jaynes (1957), the ME distribution is uniquely determined as the one which is maximally

    noncommittal with regard to missing information, and that it agrees with what is known,

    but expresses maximum uncertainty with respect to all other matters.

    The ME (maxent) density is obtained by maximizing the entropy subject to some moment

    constraints. Let X be a random variable distributed with a probability density function (pdf)

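    $f(\theta_0)$, and $X_1, X_2, \ldots, X_n$ be an i.i.d. random sample of size $n$ generated according to $f(\theta_0)$.

    We also let $\hat\theta$ be the estimate of $\theta_0$ based on a particular sample realization. We maximize the entropy

    \[ W = -\int f(x, \hat\theta) \log f(x, \hat\theta)\, dx, \]

    subject to

    \[ \int f(x, \hat\theta)\, dx = 1, \qquad \int g_k(x)\, f(x, \hat\theta)\, dx = \hat\mu_k, \quad k = 1, 2, \ldots, K, \]

    where $\hat\mu_k = \frac{1}{n} \sum_{i=1}^n g_k(x_i)$, and $g_k(x)$ is continuous and at least twice differentiable. The solution takes the form

    \[ f(x, \hat\theta) = \exp\Big( -\hat\theta_0 - \sum_{k=1}^K \hat\theta_k g_k(x) \Big). \tag{1} \]

    To ensure $f(x, \hat\theta)$ integrates to one, we set

    \[ \hat\theta_0 = \log \int \exp\Big( -\sum_{k=1}^K \hat\theta_k g_k(x) \Big)\, dx. \]

    The maximized entropy is $W = \hat\theta_0 + \sum_{k=1}^K \hat\theta_k \hat\mu_k$. The maxent density is of the generalized exponential family and can be completely characterized by the moments $E g_k(x)$, $k = 1, 2, \ldots, K$.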


    We call these moments characterizing moments, which are the sufficient statistics of the

    maxent density. A wide range of distributions belong to this family. For example, the Pearson

    family and its extensions described in Cobb et al. (1983), which nest the normal, beta,

    gamma and inverse gamma densities as special cases, are all maxent densities.

    In general, there is no analytical solution for the maxent density, and nonlinear optimization

    is required (Zellner and Highfield (1988), Ormoneit and White (1999) and Wu (2003)).

    We use Lagrange's method to solve this problem by iteratively updating

    \[ \hat\theta^{(t+1)} = \hat\theta^{(t)} + H^{-1} b, \]

    where for the $(t+1)$th stage of the updating, $b_k = \int g_k(x)\, f(x, \hat\theta^{(t)})\, dx - \hat\mu_k$ and the Hessian matrix $H$ takes the form

    \[ H_{k,j} = \int g_k(x)\, g_j(x)\, f(x, \hat\theta^{(t)})\, dx. \]

    The positive definiteness of the Hessian ensures the existence and uniqueness of the solution.¹
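The iterative scheme can be sketched numerically as follows. This is a minimal illustration and not the authors' implementation: the function name `fit_maxent`, the rectangle-rule integration on a fixed grid, the starting values and the step-halving safeguard are all assumptions added here. The constant $g_0(x) = 1$ is included so that the normalizing multiplier is updated together with the others, and the update takes the form $\theta + H^{-1}b$ with $b_k$ equal to the fitted moment minus the sample moment.

```python
import numpy as np

def fit_maxent(g_funcs, mu_hat, grid, theta_init, tol=1e-9, max_iter=200):
    """Newton-type iteration for f(x) = exp(-sum_k theta_k g_k(x)), with g_0 = 1."""
    dx = grid[1] - grid[0]
    # Rows of G are g_0(x) = 1, g_1(x), ..., g_K(x) evaluated on the grid.
    G = np.vstack([np.ones_like(grid)] + [g(grid) for g in g_funcs])
    mu = np.concatenate(([1.0], mu_hat))      # target moments, with mu_0 = 1
    theta = np.asarray(theta_init, dtype=float)

    def potential(th):
        # Convex objective whose gradient with respect to theta is -b.
        return np.exp(-th @ G).sum() * dx + th @ mu

    for _ in range(max_iter):
        f = np.exp(-theta @ G)
        b = (G * f).sum(axis=1) * dx - mu     # fitted minus target moments
        if np.max(np.abs(b)) < tol:
            break
        H = (G * f) @ G.T * dx                # H_kj = integral of g_k g_j f
        step = np.linalg.solve(H, b)
        t = 1.0
        # Step halving (added safeguard): shrink until the objective stops increasing.
        while potential(theta + t * step) > potential(theta) + 1e-12 and t > 1e-8:
            t *= 0.5
        theta = theta + t * step
    return theta

# Matching a zero mean and unit second moment should recover the standard normal.
grid = np.linspace(-8.0, 8.0, 1601)
theta = fit_maxent([lambda x: x, lambda x: x**2], np.array([0.0, 1.0]),
                   grid, theta_init=[1.0, 0.0, 0.3])
```

With only the first two moments imposed, the fitted multipliers should be close to $(\tfrac{1}{2}\log 2\pi,\ 0,\ \tfrac{1}{2})$, which is the standard normal density written in the form of Eq. (1).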

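    Given Eq. (1), we can also estimate $f(x, \theta)$ using MLE. The maximized log-likelihood is

    \[ l = \sum_{i=1}^n \log f(x_i, \hat\theta) = -\sum_{i=1}^n \Big( \hat\theta_0 + \sum_{k=1}^K \hat\theta_k g_k(x_i) \Big) = -n \Big( \hat\theta_0 + \sum_{k=1}^K \hat\theta_k \hat\mu_k \Big) = -nW. \]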

    Therefore, when the distribution is of the generalized exponential family, MLE and ME are

    equivalent. Moreover, they are also equivalent to a method of moments (MM) estimator.

    This ME/MLE/MM estimator only uses the sample characterizing moments.

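    ¹ Let $\gamma = [\gamma_0, \gamma_1, \ldots, \gamma_K]'$ be a non-zero vector and $g_0(x) = 1$; we have

    \[ \gamma' H \gamma = \sum_{k=0}^K \sum_{j=0}^K \gamma_k \gamma_j \int g_k(x)\, g_j(x)\, f(x, \theta)\, dx = \int \Big( \sum_{k=0}^K \gamma_k g_k(x) \Big)^2 f(x, \theta)\, dx > 0. \]

    Hence, $H$ is positive-definite.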


    Although the MLE and ME are equivalent in our case, there are some conceptual dif-

    ferences. For MLE, the restricted estimates are obtained by imposing some constraints on

    the parameters. In contrast, for ME, the dimension of the parameter is determined by the

    number of moment restrictions imposed: the more moment restrictions, the more complex

    the distribution. To reconcile these two methods, we note that a ME estimate with the first

    m moment restrictions has a solution of the form

    \[ f(\hat\theta) = \exp\Big( -\hat\theta_0 - \sum_{k=1}^m \hat\theta_k g_k(x) \Big), \]

    which implicitly sets $\theta_j$, $j = m + 1, m + 2, \ldots$, to zero. Instead, when we impose more

    moment restrictions, say, $\int g_{m+1}(x)\, f(\hat\theta)\, dx = \hat\mu_{m+1}$, we let the data choose the appropriate value of $\hat\theta_{m+1}$.² In this sense, the estimate with more moment restrictions is in fact less restricted, or more flexible. The ME and MLE share the same objective function (up to

    a proportion) which is determined by the moment restrictions of the maximum entropy

    problem. Therefore, we can regard the ME approach as a method of model selection, which

    generates a MLE solution.

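    Consider an $M$-dimensional parameter space $\Theta_M$, and suppose we want to test whether $\theta \in \Theta_m$, a subspace of $\Theta_M$, $\Theta_m \subset \Theta_M$. Because of the equivalence between the ME and MLE, we can use the traditional LR, Wald and LM principles to construct test statistics.³

    For $j = m, M$, let $\hat\theta_j$ be the MLE estimates in $\Theta_j$, and let $l_j$ and $W_j$ be their corresponding log-likelihood and maximized entropy. We have

    \[ \begin{aligned}
    -\int f(\hat\theta_m) \log f(\hat\theta_m)\, dx &= \int \Big( \sum_{k=0}^m \hat\theta_{m,k}\, g_k(x) \Big) f(\hat\theta_m)\, dx \\
    &= \sum_{k=0}^m \hat\theta_{m,k} \int g_k(x)\, f(\hat\theta_m)\, dx = \sum_{k=0}^m \hat\theta_{m,k} \int g_k(x)\, f(\hat\theta_M)\, dx \\
    &= \int \Big( \sum_{k=0}^m \hat\theta_{m,k}\, g_k(x) \Big) f(\hat\theta_M)\, dx = -\int f(\hat\theta_M) \log f(\hat\theta_m)\, dx.
    \end{aligned} \]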

    ² The only case that $\hat\theta_{m+1} = 0$ is when the moment restriction $\int g_{m+1}(x)\, f(\hat\theta_m)\, dx = \hat\mu_{m+1}$ is not binding, or the $(m+1)$th moment is identical to its prediction based on the maxent density $f(\hat\theta_m)$ from the first $m$ moments. In this case, the $(m+1)$th moment does not contain any additional information that will further reduce the entropy.

    ³ Imbens et al. (1998) propose similar tests in the information-theoretic generalized empirical likelihood framework.


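    The third equality follows because the first $m$ moments of $f(\hat\theta_m)$ are identical to those of $f(\hat\theta_M)$. Consequently, the log-likelihood ratio

    \[ \begin{aligned}
    R = -2 (l_m - l_M) &= 2n (W_m - W_M) \\
    &= 2n \Big( -\int f(\hat\theta_m) \log f(\hat\theta_m)\, dx + \int f(\hat\theta_M) \log f(\hat\theta_M)\, dx \Big) \\
    &= 2n \Big( -\int f(\hat\theta_M) \log f(\hat\theta_m)\, dx + \int f(\hat\theta_M) \log f(\hat\theta_M)\, dx \Big) \\
    &= 2n \int f(\hat\theta_M) \log \frac{f(\hat\theta_M)}{f(\hat\theta_m)}\, dx,
    \end{aligned} \]

    which is the Kullback-Leibler distance statistic between $f(\hat\theta_M)$ and $f(\hat\theta_m)$ multiplied by twice the sample size. Therefore, if $f(\theta_m)$ is the true model and nested in $f(\theta_M)$, the quasi-MLE estimate $f(\hat\theta_M)$ is equivalent to the estimate that minimizes the Kullback-Leibler statistic between $f(\theta_M)$ and $f(\theta_m)$, as shown in White (1982).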

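    If we partition $\theta_u = (\theta_m, \theta_{M-m}) = (\theta_{1u}, \theta_{2u})$ for the unrestricted model and similarly $\theta_r = (\theta_{1r}, 0)$ for the restricted model, then the score function is

    \[ S(x, \theta_m, \theta_{M-m}) = \begin{pmatrix} \dfrac{\partial \ln f}{\partial \theta_m} (x \mid \theta_m, \theta_{M-m}) \\[2ex] \dfrac{\partial \ln f}{\partial \theta_{M-m}} (x \mid \theta_m, \theta_{M-m}) \end{pmatrix}, \]

    and the Hessian is

    \[ H(x, \theta_m, \theta_{M-m}) = \begin{pmatrix} \dfrac{\partial^2 \ln f}{\partial \theta_m \partial \theta_m'} (x \mid \theta_m, \theta_{M-m}) & \dfrac{\partial^2 \ln f}{\partial \theta_m \partial \theta_{M-m}'} (x \mid \theta_m, \theta_{M-m}) \\[2ex] \dfrac{\partial^2 \ln f}{\partial \theta_{M-m} \partial \theta_m'} (x \mid \theta_m, \theta_{M-m}) & \dfrac{\partial^2 \ln f}{\partial \theta_{M-m} \partial \theta_{M-m}'} (x \mid \theta_m, \theta_{M-m}) \end{pmatrix}. \]

    If we partition similarly the inverse of the information matrix $I = -E(H)$ as

    \[ I^{-1} = \begin{pmatrix} I^{11} & I^{12} \\ I^{21} & I^{22} \end{pmatrix}, \]

    then the Wald test is defined as

    \[ WD = n\, \hat\theta_{2u}' \big( I^{22} \big)^{-1} \hat\theta_{2u}, \]

    whereas the Lagrange Multiplier test is defined as

    \[ LM = \frac{1}{n} \Big( \sum_{i=1}^n S(x_i, \hat\theta_{1r}, 0) \Big)' I^{22} \Big( \sum_{i=1}^n S(x_i, \hat\theta_{1r}, 0) \Big), \]

    where only the last $M - m$ elements of the summed score, the ones that can be non-zero at the restricted estimate, enter the quadratic form. All the tests are asymptotically equivalent and distributed as $\chi^2$ with $(M - m)$ degrees of freedom.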

    3 Test of Symmetry and Normality

    In this section, we use the proposed ME method to obtain test statistics for symmetry and

    normality. Since the LR and Wald procedures require the estimation of the ME density,

    which in general has no analytical solution and is computationally quite involved, we focus

    on the LM test, which reduces surprisingly to a simple functional form.

    3.1 Symmetry Test

    As before, let $X$ be a random variable distributed with a pdf $f_0$, and $X_1, X_2, \ldots, X_n$ be an

    i.i.d. random sample of size $n$ generated according to $f_0$. The standard test of skewness

    takes the form

    \[ b = \frac{\sqrt{n}\, \hat\mu_3}{\sqrt{\hat\mu_6 - 6 \hat\mu_4 + 9}}, \]

    where $\hat\mu_j = \frac{1}{n} \sum_{i=1}^n x_i^j$. The test statistic $b$ is asymptotically distributed as $N(0, 1)$. Although originally proposed under the assumption of normality, Gupta (1967) shows that the test

    is also applicable without this assumption provided the underlying distribution has finite

    moments up to order six.
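As a rough illustration of the skewness statistic, the sketch below computes $b$ and a two-sided normal p-value. It assumes the sample is first standardized to zero mean and unit variance (using the population standard deviation), which is the scaling under which the constants 6 and 9 in the denominator apply; the function name `skewness_test` is hypothetical.

```python
import math
import numpy as np

def skewness_test(x):
    """Standardized skewness statistic b and its two-sided N(0, 1) p-value."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()          # ddof=0, so mean(z**2) equals 1
    n = z.size
    mu3, mu4, mu6 = (np.mean(z**j) for j in (3, 4, 6))
    b = math.sqrt(n) * mu3 / math.sqrt(mu6 - 6.0 * mu4 + 9.0)
    p_value = math.erfc(abs(b) / math.sqrt(2.0))
    return b, p_value

# A perfectly symmetric sample has zero third moment, hence b = 0.
b_sym, p_sym = skewness_test([-2.0, -1.0, 0.0, 1.0, 2.0])
```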

    3.1.1 Two Alternative Maxent Density Estimators.

    Alternatively, we can approximate $f_0$ by the maxent densities and then use the LM test

    proposed above. In this paper we consider two simple, yet flexible functional forms. If we

    approximate $f_0$ using the maxent density subject to the first four arithmetic moments, the

    solution takes the form

    \[ f_1 = \exp\Big( -\sum_{k=0}^4 \theta_k x^k \Big). \]


    This exponential quartic form was first discussed by Fisher (1922) and studied in the maximum

    entropy framework in Zellner and Highfield (1988), Ormoneit and White (1999) and

    Wu (2003).

    However, instead of using the high order sample moments, whose small sample properties

    may be unreliable, we can use generalized moments of $\sin(x)$ and $\cos(x)$.⁴ The resulting

    density takes the form

    \[ f_2 = \exp\Big( -\sum_{k=0}^2 \theta_k x^k - \theta_3 \sin(x) - \theta_4 \cos(x) \Big). \]

    This is in the spirit of the orthogonal trigonometric polynomials method of Cencov (1962). Since sine and cosine terms are bounded in $[-1, 1]$, these two moments always exist and are not sensitive to outliers. An additional advantage of $f_2$ is its numerical stability in estimation. For large values of $x$, $f_1$, an exponential function where the exponent is quartic, may encounter a numerical overflow with badly chosen initial values of the $\theta$'s. However, $f_2$ does not run into this problem because of the bounded range of $\sin(x)$ and $\cos(x)$.

    Pearson uses

    \[ f'(x) = \frac{(x - a)\, f(x)}{b_0 + b_1 x + b_2 x^2} \]

    to characterize the Pearson family distributions (Stuart and Ord, 1994). This family includes the exponential, normal, beta of first and second kind, and gamma distributions as special cases. Cobb et al. (1983) further generalize the Pearson distributions using

    \[ \frac{f'(x)}{f(x)} = -\frac{g(x)}{v(x)}, \]

    where $g(x)$ is the so-called shape polynomial of $x$ up to order $K$ and $v(x)$ takes the form of $1$, $x$, $x^2$ or $x(1 - x)$. The maxent densities are even more flexible than the Cobb et al. family. Denote $f(x) = \exp(h(x))$; generally no restrictions are imposed on $h(x)$ except that it is continuously differentiable. Following the framework of Cobb et al. (1983), approximating the differentiable density $f(x)$ by $\exp(h(x))$ is equivalent to approximating $f'(x)/f(x)$ by $h'(x)$. The power series and Fourier series are two commonly used approximation methods. One can see that $f_1$ corresponds to a power series approximation of $f$, while $f_2$ is a mixture of both. Gallant (1981) notes that including a linear term helps to considerably reduce the number of sine/cosine terms in a Fourier approximation of a non-periodic function. If a quadratic term is included as well, then curvature restriction may be imposed. Moreover, sine and cosine can be expressed as infinite power series

    \[ \sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots \]

    \[ \cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!} = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots \]

    Therefore, the exponent of $f_2$ is essentially an infinite power series with some coefficient restrictions.

    ⁴ When the domain of $f(x) = \exp\big( -\sum_{j=0}^K \theta_j x^j \big)$ is the real line, $K$ should be an even number and $\theta_K > 0$ to ensure that $f(x)$ is a proper density function, as we require $\lim_{|x| \to \infty} f(x) = 0$. However, for $f_2$, we can have $\theta_3 = 0$ and $\theta_4 = 0$ as the even function $x^2$ is the dominant term in $f_2$. For the test of symmetry based on the third term, the choice of the last characterizing moments is immaterial.

    We proceed to shed some light into how f1 and f2 approximate the underlying distribu-

    tions by conducting two experiments. In the first one, we estimate f1 and f2 from a random

    sample of standard normal variates with sample size 50. We repeat the experiment 1,000

    times. Denote $\hat f_{ij}$ as the estimate from the $j$th experiment, $i = 1, 2$. Because the experiments are independent, we define the average estimate as

    \[ \bar f_i = \Big( \prod_{j=1}^{1000} \hat f_{ij} \Big)^{1/1000} = \exp\Big( -\frac{1}{1000} \sum_{j=1}^{1000} \sum_k \hat\theta_{ikj}\, g_{ik}(x) \Big), \quad i = 1, 2, \]

    where the $\hat\theta$'s are the estimated Lagrange multipliers. For comparison, we also calculate the two-term Edgeworth expansion of each experiment.⁵ The average estimate is also defined

    similarly as the geometric average of each estimate. Figure 1 plots the average estimated

    maxent density and Edgeworth expansion, together with the theoretical density. The plot

    ⁵ The Edgeworth expansion is obtained as

    \[ f(x) = \Big[ 1 + \frac{1}{6} \sqrt{\beta_1}\, H_3(x) + \frac{1}{24} (\beta_2 - 3)\, H_4(x) + \frac{1}{72} \beta_1 H_6(x) \Big] \phi(x), \]

    where $\sqrt{\beta_1}$ and $\beta_2$ are the coefficients of skewness and kurtosis, $H_i$ is the $i$th order Hermite polynomial and $\phi(x)$ is the standard normal density function. Hence, the average estimate is defined as

    \[ \bar f(x) = \frac{1}{n} \sum_{i=1}^n \Big[ 1 + \frac{1}{6} \sqrt{\hat\beta_{1,i}}\, H_3(x) + \frac{1}{24} (\hat\beta_{2,i} - 3)\, H_4(x) + \frac{1}{72} \hat\beta_{1,i} H_6(x) \Big] \phi(x). \]

    suggests that both maxent estimates approximate the underlying distributions well. The

    Edgeworth expansion appears to be slightly closer to the underlying distribution. However,

    the Edgeworth expansion is not a proper density estimate and may have negative values

    for some $x$. In fact, when we evaluate the Edgeworth expansion on the range $[-4, 4]$, in

    787 out of the 1,000 experiments we encounter negative values, which are replaced by some

    arbitrarily small positive numbers (1e-6 in this study). Moreover, the Edgeworth expansion

    usually does not integrate to unity. In contrast, the maxent estimates, by construction, are

    proper densities that are always positive and integrate to one.

    We run a second experiment on $\chi^2$ with 3 degrees of freedom, and the results are plotted

    in Figure 2. One can see that when the underlying distributions are not close to normal, the

    Edgeworth expansion misses badly. In contrast, the maxent densities are considerably closer

    to $f_0$. Moreover, we can exploit the fact that the domain of the distribution is positive to improve

    the approximation by restricting $x$ to be the positive half line or using a $\log(x)$ term in the

    maxent density.⁶
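The claim that the Edgeworth expansion can turn negative is easy to check numerically. The sketch below is illustrative only: it plugs the population skewness and kurtosis of a $\chi^2_3$ variable into the two-term expansion, whereas the experiments described above use sample estimates from standardized draws. The function name `edgeworth_density` is hypothetical, and the probabilists' Hermite polynomials are written out explicitly.

```python
import numpy as np

def edgeworth_density(x, skew, kurt):
    """Two-term Edgeworth expansion around the standard normal density."""
    x = np.asarray(x, dtype=float)
    h3 = x**3 - 3.0 * x
    h4 = x**4 - 6.0 * x**2 + 3.0
    h6 = x**6 - 15.0 * x**4 + 45.0 * x**2 - 15.0
    phi = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    correction = (1.0 + skew / 6.0 * h3 + (kurt - 3.0) / 24.0 * h4
                  + skew**2 / 72.0 * h6)
    return correction * phi

grid = np.linspace(-4.0, 4.0, 801)
# Chi-square with 3 degrees of freedom: skewness sqrt(8/3), kurtosis 3 + 12/3.
f_chi2 = edgeworth_density(grid, skew=np.sqrt(8.0 / 3.0), kurt=3.0 + 12.0 / 3.0)
has_negative = bool((f_chi2 < 0).any())
# With zero skewness and kurtosis 3 the expansion collapses to the normal density.
f_normal = edgeworth_density(grid, skew=0.0, kurt=3.0)
```

A maxent density, being the exponential of a real-valued function, cannot produce such negative values.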

    3.1.2 The symmetry test statistics based on f1 and f2.

    We do not assume normality for our symmetry test. Given the maxent density f(x) , we can

    test the assumption of symmetry by testing if the Lagrange Multipliers associated with the

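    moments that are odd functions, i.e., $E g(x) = -E g(-x)$, are zero. For $f_1$, the information matrix under symmetry takes the form

    \[ I = \begin{pmatrix}
    1 & 0 & 1 & 0 & \mu_4 \\
    0 & 1 & 0 & \mu_4 & 0 \\
    1 & 0 & \mu_4 & 0 & \mu_6 \\
    0 & \mu_4 & 0 & \mu_6 & 0 \\
    \mu_4 & 0 & \mu_6 & 0 & \mu_8
    \end{pmatrix}. \]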

    ⁶ A detailed comparison between maxent estimates and the Edgeworth expansion is not pursued here any further and is left as a topic of future research.

    Under symmetry, the score function for $f_1$ is $S = n\, [0, 0, 0, \hat\mu_3, 0]'$. Then the LM test for symmetry, which is equivalent to testing $\theta_3 = 0$, is defined as

    \[ t_{s1} = \frac{1}{n} S' I^{-1} S = \frac{n\, \hat\mu_3^2}{\hat\mu_6 - \hat\mu_4^2}, \]

    where $t_{s1}$ is distributed as $\chi^2$ with one degree of freedom, and therefore is asymptotically equivalent to $b^2$. Both tests require the existence of the moments up to order 6. Comparing with the conventional skewness test $b = \sqrt{n}\, \hat\mu_3 / \sqrt{\hat\mu_6 - 6 \hat\mu_4 + 9}$, we note that $\hat\mu_4^2 \ge 6 \hat\mu_4 - 9$ (since $(\hat\mu_4 - 3)^2 \ge 0$), which in turn implies that $t_{s1} \ge b^2$. The equality holds only when $\hat\mu_4 = 3$, where the first four moments coincide with those of the standard normal distribution. Otherwise, $t_{s1}$ always has

    higher power than $b$ under the alternative hypothesis of asymmetry.
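The relation between the two statistics can be checked on simulated data. The sketch below is a minimal example, assuming the sample is standardized to zero mean and unit variance with the population standard deviation, so that the sample second moment is exactly one. The function name `symmetry_tests_moments` and the exponential test sample are choices made here for illustration.

```python
import math
import numpy as np

def symmetry_tests_moments(x):
    """Return (b**2, ts1, p-value of ts1) for a sample standardized with ddof=0."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    n = z.size
    mu3, mu4, mu6 = (np.mean(z**j) for j in (3, 4, 6))
    b_squared = n * mu3**2 / (mu6 - 6.0 * mu4 + 9.0)
    ts1 = n * mu3**2 / (mu6 - mu4**2)
    p_value = math.erfc(math.sqrt(ts1 / 2.0))   # upper tail of chi-square(1)
    return b_squared, ts1, p_value

rng = np.random.default_rng(0)
sample = rng.exponential(size=200)              # a skewed alternative
b2, ts1, p1 = symmetry_tests_moments(sample)
```

Because the two denominators differ by $(\hat\mu_4 - 3)^2$, the returned `ts1` is never smaller than `b2` for any sample.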

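    The information matrix of $f_2$ under symmetry is

    \[ I = \begin{pmatrix}
    1 & 0 & 1 & 0 & \mu_c \\
    0 & 1 & 0 & \mu_{1,s} & 0 \\
    1 & 0 & \mu_4 & 0 & \mu_{2,c} \\
    0 & \mu_{1,s} & 0 & \mu_{s^2} & 0 \\
    \mu_c & 0 & \mu_{2,c} & 0 & \mu_{c^2}
    \end{pmatrix}, \]

    where $\mu_c = E[\cos(x)]$, $\mu_{1,s} = E[x \sin(x)]$, $\mu_{2,c} = E[x^2 \cos(x)]$, $\mu_{s^2} = E[\sin(x)^2]$ and $\mu_{c^2} = E[\cos(x)^2]$.

    Now the test for symmetry is equivalent to testing if the Lagrange Multiplier for $\mu_s = E \sin(x)$ is zero. Under the restriction of symmetry, the score function of $f_2$ is $S = n\, [0, 0, 0, \hat\mu_s, 0]'$, and the test statistic is given by

    \[ t_{s2} = \frac{1}{n} S' I^{-1} S = \frac{n\, \hat\mu_s^2}{\hat\mu_{s^2} - \hat\mu_{1,s}^2}, \]

    where $\hat\mu_s = \frac{1}{n} \sum_{i=1}^n \sin(x_i)$, $\hat\mu_{1,s} = \frac{1}{n} \sum_{i=1}^n x_i \sin(x_i)$ and $\hat\mu_{s^2} = \frac{1}{n} \sum_{i=1}^n \sin(x_i)^2$. The test statistic $t_{s2}$ is also asymptotically distributed as $\chi^2$ with one degree of freedom.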
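A minimal sketch of the sine-based statistic follows. As in the information matrix above, it assumes data standardized to zero mean and unit variance (population standard deviation); the function name `symmetry_test_trig` and the two test samples are illustrative choices, not part of the original study.

```python
import math
import numpy as np

def symmetry_test_trig(x):
    """Return ts2 and its chi-square(1) p-value for a sample standardized with ddof=0."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    n = z.size
    mu_s = np.mean(np.sin(z))
    mu_1s = np.mean(z * np.sin(z))
    mu_s2 = np.mean(np.sin(z) ** 2)
    ts2 = n * mu_s**2 / (mu_s2 - mu_1s**2)
    p_value = math.erfc(math.sqrt(ts2 / 2.0))   # upper tail of chi-square(1)
    return ts2, p_value

# A symmetric sample gives a zero sine moment; a large exponential sample does not.
ts2_sym, p_sym = symmetry_test_trig([-2.0, -1.0, 0.0, 1.0, 2.0])
rng = np.random.default_rng(0)
ts2_exp, p_exp = symmetry_test_trig(rng.exponential(size=2000))
```

Only bounded functions of the data enter the numerator, which is the source of the robustness to outliers noted earlier for $f_2$.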


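    propose a test based on $f_2$. Under normality, the information matrix of $f_2$ takes the form

    \[ I = \begin{pmatrix}
    1 & 0 & 1 & 0 & e^{-1/2} \\
    0 & 1 & 0 & e^{-1/2} & 0 \\
    1 & 0 & 3 & 0 & 0 \\
    0 & e^{-1/2} & 0 & \frac{1 - e^{-2}}{2} & 0 \\
    e^{-1/2} & 0 & 0 & 0 & \frac{1 + e^{-2}}{2}
    \end{pmatrix}. \]

    The score function under the normality restriction is $S = n\, [0, 0, 0, \hat\mu_s, \hat\mu_c - e^{-1/2}]'$ and the new LM test is given by

    \[ t_{n2} = \frac{1}{n} S' I^{-1} S = n \left[ \frac{2 \hat\mu_s^2}{1 - 2e^{-1} - e^{-2}} + \frac{2 \big( \hat\mu_c - e^{-1/2} \big)^2}{1 - 3e^{-1} + e^{-2}} \right]. \]

    Both $t_{n1}$ and $t_{n2}$ are asymptotically distributed as $\chi^2$ with two degrees of freedom.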
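The statistic $t_{n2}$ needs only the sample means of $\sin(x)$ and $\cos(x)$. The sketch below is a hedged illustration that assumes standardized data (zero mean, unit variance, population standard deviation); the function name `normality_test_trig` is hypothetical, and the p-value uses the closed-form upper tail of a $\chi^2$ variable with two degrees of freedom.

```python
import math
import numpy as np

def normality_test_trig(x):
    """Return tn2 and its chi-square(2) p-value for a sample standardized with ddof=0."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    n = z.size
    mu_s = np.mean(np.sin(z))
    mu_c = np.mean(np.cos(z))
    e1, e2 = math.exp(-1.0), math.exp(-2.0)
    sine_part = 2.0 * mu_s**2 / (1.0 - 2.0 * e1 - e2)
    cosine_part = 2.0 * (mu_c - math.exp(-0.5)) ** 2 / (1.0 - 3.0 * e1 + e2)
    tn2 = n * (sine_part + cosine_part)
    p_value = math.exp(-tn2 / 2.0)        # P(chi-square(2) > tn2)
    return tn2, p_value

rng = np.random.default_rng(0)
tn2_exp, p_exp = normality_test_trig(rng.exponential(size=2000))
```

The sine term responds to asymmetry and the cosine term to departures of $E\cos(x)$ from its value $e^{-1/2}$ under the standard normal.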

    Under normality, the correlation of $\hat\mu_3$ and $\hat\mu_4$ is practically zero, and so is that of $\hat\mu_s$ and $\hat\mu_c$. However, the correlation of $|\hat\mu_3|$ and $\hat\mu_4$ is 0.52, 0.41 and 0.32 for 10,000 random normal samples with $n = 50$, 100 and 200, while the correlation of $|\hat\mu_s|$ and $\hat\mu_c$ is 0.33, 0.22 and 0.17 for $n = 50$, 100 and 200. Therefore, we expect that $t_{n2}$ converges to the $\chi^2_2$ distribution faster than $t_{n1}$.

    4 Simulations

    In this section, we use Monte Carlo simulations to assess the size and power of the proposed

    tests. Following Randles et al. (1980) and Bai and Ng (2003), we consider well known

    distributions such as the normal, the $t$ and the $\chi^2$, as well as distributions from the generalized

    lambda family. The generalized lambda distribution, which nests a range of symmetric and asymmetric distributions, is defined in terms of the inverse of the cumulative distribution

    \[ F^{-1}(u) = \lambda_1 + \frac{u^{\lambda_3} - (1 - u)^{\lambda_4}}{\lambda_2}, \quad 0 < u < 1. \]
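Because the generalized lambda distribution is defined through its quantile function, draws can be generated by inverse-transform sampling. The sketch below is an illustration under that assumption; the function names `gld_quantile` and `gld_sample` are hypothetical, and the parameter values are those listed for distribution S4 in the simulation design.

```python
import numpy as np

def gld_quantile(u, lam1, lam2, lam3, lam4):
    """Inverse CDF of the generalized lambda distribution."""
    u = np.asarray(u, dtype=float)
    return lam1 + (u**lam3 - (1.0 - u) ** lam4) / lam2

def gld_sample(n, lam1, lam2, lam3, lam4, rng):
    """Inverse-transform sampling: plug U(0, 1) draws into the quantile function."""
    return gld_quantile(rng.uniform(size=n), lam1, lam2, lam3, lam4)

rng = np.random.default_rng(0)
s4 = (0.0, 0.19754, 0.134915, 0.134915)
draws = gld_sample(1000, *s4, rng)
q975 = float(gld_quantile(0.975, *s4))   # close to the normal value of about 1.96
```

With equal shape parameters the quantile function is antisymmetric around $u = 0.5$, so the distribution is symmetric about $\lambda_1$.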

    For both the symmetry and normality test, we consider the following symmetric and

    asymmetric distributions:

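    S1: $N(0, 1)$;

    S2: $t_5$;

    S3: $e_1 I(z \le 0.5) + e_2 I(z > 0.5)$, where $z \sim U(0, 1)$, $e_1 \sim N(-1, 1)$, and $e_2 \sim N(1, 1)$;

    S4: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 0.19754$, $\lambda_3 = 0.134915$, $\lambda_4 = 0.134915$;

    S5: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 0.8$, $\lambda_4 = 0.8$;

    A1: lognormal: $\exp(e)$, $e \sim N(0, 1)$;

    A2: $\chi^2_3$;

    A3: exponential: $-\ln(e)$, $e \sim U(0, 1)$;

    A4: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 1.4$, $\lambda_4 = 0.25$;

    A5: $F^{-1}(u) = \lambda_1 + [u^{\lambda_3} - (1 - u)^{\lambda_4}] / \lambda_2$, $\lambda_1 = 0$, $\lambda_2 = 1$, $\lambda_3 = 0.0075$, $\lambda_4 = 0.03$.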

    For each distribution, we draw 100,000 random samples of size n = 20, 50, 100, 200, 500

    and 1,000 and calculate the symmetry and normality test statistics discussed above. There is

    a large body of work on symmetry and normality tests, for example, Gupta (1967), Randles

    et al. (1980), Bera and Jarque (1981), D'Agostino et al. (1990), Ahmad and Li (1997),

    Bai and Ng (2003) and Bontemps and Meddahi (2003). Our results are comparable to and

    in general more favorable than those of existing studies, especially when the sample size is

    small. Table 1 reports the results of the symmetry test at the 5 per cent level of significance.

    We report the results for sample size up to 200 as the power of the tests is nearly unity for

    $n \ge 500$ for all the asymmetric distributions. The first five rows for symmetric distributions report size and the next five for asymmetric distributions show powers. The size of the tests

    remains stable across different sample sizes. For sample sizes ranging from 20 to 200,

    which are frequently encountered in empirical work, the variation in size for $b$ is no more than

    2%, and that for $t_{s1}$ and $t_{s2}$ is less than 1%. We find that $t_{s1}$ always rejects more often than

    $b$, confirming the results of the previous section that $t_{s1} \ge b^2$. Also in general, $t_{s2}$ tends to reject more often than the other two tests. On the other hand, we observe high power of

    the proposed tests against asymmetric distributions and the power increases rapidly to unity

    with sample size. Overall, ts1 and ts2 are considerably more powerful than b. For example,


    the power of the two maxent tests against the lognormal distribution (A1) is often more

    than twice that of the conventional skewness test when the sample size is small.

    Table 2 reports the results for the normality tests at the 5 per cent significance level.

    The first row reflects the size and the rest show the power of the tests. In terms of size the

    two tests are comparable; however, the second test is generally more powerful. For example,

    for distribution S3, the power of JB test is only 0.06 for n = 200, while that of tn2 is 0.18.

    So is the case for distribution A4, where tn2 is considerably more powerful even when the

    sample size is small. For distribution S4, whose first four moments coincide with those of

    the standard normal distribution, both tests as expected have very low power. Randles et

    al. (1980) and Bai and Ng (2003) also report very low power against S4. Even then though,

    tn2 appears to be slightly more powerful than tn1.

    5 Empirical Application

    The wage determination process is one of the most studied areas of empirical labor economics.

    At the root of this research lies the question of downward nominal rigidity, which, if prevalent,

    would interfere with the functioning of the labor market preventing the efficient reallocation

    of workers from low to high demand areas and inducing unemployment. Furthermore, if

    nominal rigidity is more pervasive in some sectors than in others, similar shocks will have different price and quantity effects. For example, if unions are more resistant to wage cuts

    than the non-union sector, real wage realignment may be more difficult to achieve in the union

    sector. The same may be true in the public sector when compared with the private sector,

    since the former is typically more unionized than the latter. There is an expanding research

    area that studies this issue, see Christofides and Stengos (2003) for a recent review of the

    current literature. The role of symmetry as a gauge of the extent and significance of downward

    nominal rigidity is the subject of considerable debate. Card and Hyslop (1997) note that

    most wage determination models imply symmetry and use the portion of the distribution

    above the median as the no-rigidity counterfactual in their own work. McLaughlin (2000)

    provides reasons other than downward wage rigidity for believing that the wage-change

    distribution may be asymmetric. Whatever the reasons, testing for symmetry or the lack

    of it (asymmetry) is a question of paramount importance in empirical labor economics and

    serves as a first stage in offering a better understanding of wage determination. The starting


    point of most of these studies is the construction of wage-change histograms from data on

    individual agents. However, visual evidence must be filtered through standard statistical

    procedures in order to measure the quantitative significance of certain forces and effects.

    In this context, consideration of the symmetry of the wage-change distribution has played

    a very important role. A number of symmetry measures have been used in this literature.

    These include the standardized skewness coefficient test b, the difference between the median

    and mean, symmetrically differenced histograms and nonparametric symmetry tests, see

    McLaughlin (2000) and Christofides and Stengos (2003) for a discussion of these different

    measures.

    We apply our proposed LM test statistics based on the maxent densities f1 and f2 on

    a set of Canadian public sector union wage contract data for different years in the 1978 to

    1994 period. The union contract data used in this paper concern the annualized change

    over the life of each contract in the base-wage rate agreed to by employers and unions in

    the Canadian public sector. The data, compiled initially by Labor Canada (now Human

    Resources Development Canada), start in 1978 and cover contracts involving employees

    which range in number from 500 to nearly 80,000. Collective bargaining agreements often

    implement an increase in the nominal wage rate at the beginning of the contract and then

    again at yearly intervals. Contract duration (in years) and the number of nominal revisions

    are correlated but not identical and, in any case, the main pattern is one of infrequent wage adjustments. A result of this fact is that, depending on what contract sub-period one chooses

    to focus on, the implied nominal change could be made arbitrarily large or small. Under

    these circumstances, and given that contract duration varies substantially across settlements,

    a natural interval over which to measure wage adjustment is contract duration itself, taking

    care to standardize across contracts by annualizing. In the contract data the private-public

    sector distinction is made by the data collection agency itself and is based on the sources of

    funding for the employer. Thus, health and education contracts are classified as belonging

    to the public sector because these services are not market-provided.

    The data span the high-inflation period of 1978-1982, the medium-inflation period of

    1984-1989 and the low-inflation period of 1990-94. The contract data involve agreements

    which do not contain Cost-of-Living-Allowance (COLA) clauses and, therefore, do not raise

    the additional complication that some of the relevant wage flexibility may come from the

    indexation clause. To avoid spurious correlations that may arise by pooling different years


    together we look at the years at the beginning and end of each of the three mentioned periods.

    The left panel of Table 3 provides some descriptive statistics for the data used in 1978, 1982,

    1984, 1989, 1990 and 1994 and the right panel of Table 3 reports the test statistics for these

    years. The first column is the conventional skewness test b, the next two are the maxent test

    ts1 and ts2. As was expected from the findings in the simulations, the standardized skewness

    coefficient test statistic b is less powerful than the information based test statistics ts1 and

    ts2. In two out of six years b fails to reject the symmetry hypothesis whereas the other two

    test statistics do. These results suggest that the standardized skewness coefficient test would

    give unreliable results. This test statistic rejects the null hypothesis of

    symmetry only in 1978 and 1983, which are high-inflation years. It fails to reject the null during

    the medium inflation years, whereas the two maxent density based statistics reject the null

    during these years. During these years public sector union contracts displayed considerable

    downward nominal rigidity in order to avoid real wage erosion due to inflation.
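For readers who want to reproduce the conventional benchmark, the following Python sketch computes a standardized skewness statistic. It assumes the textbook form, sample skewness scaled by sqrt(n/6), which is asymptotically standard normal under normality; the exact standardization used for b in Table 3 is defined earlier in the paper and may differ. The function names are illustrative.

```python
import math


def standardized_skewness(xs):
    """Sample skewness m3 / m2^(3/2), scaled by sqrt(n / 6).

    Under normality the result is approximately N(0, 1) in large samples.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    if m2 == 0:
        raise ValueError("sample has zero variance")
    return (m3 / m2 ** 1.5) * math.sqrt(n / 6.0)


def rejects_symmetry(xs, critical=1.96):
    """Two-sided test at the 5% level using the normal critical value."""
    return abs(standardized_skewness(xs)) > critical


# Hypothetical data: nine zeros and one large value are strongly right-skewed.
skewed_sample = [0.0] * 9 + [10.0]
decision = rejects_symmetry(skewed_sample)
```

With the 1.96 cutoff, the values of b reported for 1984 and 1989 (-1.90 and 1.89) fall just short of rejection, which matches the pattern described above.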

    6 Extensions

    In addition to their simplicity, a major advantage of the proposed tests is their generality. In

    this section, we briefly discuss some potential extensions of the tests.

    First, we can increase the number of polynomial terms in the maxent density. For f1, we can

    use moments higher than order 4, and for f2, we can use sin(kx) and cos(kx), k = 2, 3, . . . .

    Moreover, we can mix high-order polynomials and trigonometric terms. The derivation of the

    test statistics is straightforward.
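As a rough illustration of how the set of moment functions could be enlarged, the Python sketch below builds lists of polynomial and trigonometric terms and evaluates their sample averages. The helper names `polynomial_terms`, `trig_terms` and `sample_moments` are hypothetical, and the base specifications of f1 and f2 are those given earlier in the paper, not reproduced here.

```python
import math


def polynomial_terms(max_order):
    """Return the functions x -> x**k for k = 1, ..., max_order."""
    return [lambda x, k=k: x ** k for k in range(1, max_order + 1)]


def trig_terms(max_freq):
    """Return sin(kx) and cos(kx) for k = 1, ..., max_freq, interleaved."""
    terms = []
    for k in range(1, max_freq + 1):
        terms.append(lambda x, k=k: math.sin(k * x))
        terms.append(lambda x, k=k: math.cos(k * x))
    return terms


def sample_moments(xs, terms):
    """Average each moment function over the sample."""
    n = len(xs)
    return [sum(g(x) for x in xs) / n for g in terms]


# Mixing polynomial terms up to order 6 with two trigonometric frequencies.
mixed = polynomial_terms(6) + trig_terms(2)
moments = sample_moments([0.1, -0.4, 1.3, 0.7], mixed)
```

These sample averages are the moment constraints that the enlarged maxent density would be required to match.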

    Second, we can use our method of normality test for other distributions. For example,
    the gamma distribution can be characterized as a maxent distribution

    $$f_0(x) = \exp\left(-\lambda_0 - \lambda_1 x - \lambda_2 \log x\right), \quad x > 0.$$

    Because $E x$ and $E \log x$ are the sufficient statistics for gamma distribution, the presence of
    any additional terms in the exponent of $f_0(x)$ rejects the hypothesis that x is distributed
    according to a gamma distribution. Let

    $$f(x) = \exp\left(-\lambda_0 - \lambda_1 x - \lambda_2 \log x - \sum_{k=3}^{K} \lambda_k g_k(x)\right),$$

    the test of $\lambda_k = 0$ for $k \geq 3$ is then the LM test for gamma distribution. The discussions
    in previous section suggest that the natural candidates for $g_k(x)$ may include
    $x^{i+1}$, $\sin(ix)$, $\cos(ix)$, $(\log x)^{i+1}$, $\sin(i \log x)$, $\cos(i \log x)$, $i = 1, 2, 3, \ldots$, or their combinations.
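For concreteness, the general shape of such an LM statistic can be written out using the standard score-test algebra for exponential families. This is a sketch under the assumption of i.i.d. data, not necessarily the exact expression used elsewhere in the paper. The symbols $\hat f_0$ (the gamma density fitted under the null), $\hat d$ and $\hat V$ are introduced here for illustration.

```latex
% Discrepancy between sample and fitted-null moments of the extra terms
\hat d_k = \frac{1}{n}\sum_{i=1}^{n} g_k(x_i) - E_{\hat f_0}\left[g_k(x)\right],
\qquad k = 3, \dots, K.

% \hat V: covariance matrix of (x, \log x, g_3(x), \dots, g_K(x)) under \hat f_0,
% partitioned so that block 11 refers to (x, \log x) and block 22 to the g_k.
LM = n \, \hat d' \left(\hat V_{22} - \hat V_{21} \hat V_{11}^{-1} \hat V_{12}\right)^{-1} \hat d
\;\xrightarrow{d}\; \chi^2_{K-2}
\quad \text{under the null of a gamma distribution.}
```

The components of the score that correspond to $x$ and $\log x$ vanish at the restricted estimate because the fitted gamma density matches those two sample moments, so only the discrepancies in the additional moments enter the statistic.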

    Third, we can generalize our tests to regression residuals within the framework of White

    and McDonald (1980). For time series data or heteroskedastic data, we can use the approach

    of Bai and Ng (2003) or Bontemps and Meddahi (2003). In general, for non-i.i.d. data, to test

    whether the Lagrange multipliers associated with the sample moments gk (x) in the maxent density are

    zero, we need to estimate a Heteroskedasticity and Autocorrelation Consistent (HAC) covariance

    matrix for those moments.
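One common way to obtain such a HAC estimate is the Newey-West estimator with Bartlett weights. The Python sketch below covers the scalar case for a single moment series. The choice of this particular estimator, the function name `newey_west_variance` and the lag parameter `max_lag` are assumptions for illustration; the text does not prescribe a specific HAC method.

```python
def newey_west_variance(series, max_lag):
    """Newey-West long-run variance of a scalar series (Bartlett kernel).

    series  : values of one moment function g_k(x_t), t = 1, ..., n
    max_lag : number of autocovariance lags to include (0 gives the
              ordinary variance with divisor n)
    """
    n = len(series)
    if n == 0 or not 0 <= max_lag < n:
        raise ValueError("need 0 <= max_lag < len(series)")
    mean = sum(series) / n
    u = [v - mean for v in series]  # demeaned moment series

    def autocov(j):
        # j-th sample autocovariance with divisor n
        return sum(u[t] * u[t - j] for t in range(j, n)) / n

    lrv = autocov(0)
    for j in range(1, max_lag + 1):
        weight = 1.0 - j / (max_lag + 1.0)  # Bartlett weight
        lrv += 2.0 * weight * autocov(j)
    return lrv


# Hypothetical alternating series with strong negative autocorrelation.
lrv = newey_west_variance([1.0, -1.0, 1.0, -1.0], max_lag=1)
```

With several moment functions, the same weighting would be applied to autocovariance matrices instead of scalars, and the resulting matrix would replace the i.i.d. covariance in the test statistic.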

    7 Conclusion

    In this paper we derive distribution free tests based on the maximum entropy densities to

    test the null hypotheses of symmetry and normality. The proposed tests are derived from

    maximizing differential entropy subject to moment constraints. By exploiting the equivalence

    between ME and ML estimates for the exponential families, we can use the conventional LR,

    Wald and LM testing principles in the maximum entropy framework. Hence, our tests share

    the optimality properties of the standard ML based tests. We show that the ME approach

    leads to simple yet powerful LM tests of symmetry and normality. The proposed tests have

    desirable small sample properties, when compared with the standard parametric tests used in the literature, such as the standardized skewness coefficient test for symmetry or the

    Jarque and Bera (1980) test for normality. These properties are confirmed by our Monte

    Carlo experiments and empirical application.


    Gupta, M. K., 1967, An asymptotically nonparametric test of symmetry, Annals of Mathematical Statistics,

    38, 849-866.

    Imbens, G. W., R. H. Spady and P. Johnson, 1998, Information theoretic approaches to

    inference in moment condition models, Econometrica, 66, 333-357.

    Jarque, C. and A. Bera, 1980, Efficient tests for normality, homoscedasticity and serial

    independence of regression residuals, Economics Letters, 6, 255-59.

    Jaynes, E. T., 1957, Information theory and statistical mechanics, Physical Review, 106, 620-

    630.

    McLaughlin, K. J., 2000, Testing for asymmetry in the distribution of wage changes, Mimeo.

    Ormoneit, D. and H. White, 1999, An efficient algorithm to compute maximum entropy

    densities, Econometric Reviews, 18(2), 141-67.

    Randles, R.H., M. A. Polocello and D. A. Wolfe, 1980, An asymptotically distribution-free

    test for symmetry versus asymmetry, Journal of the American Statistical Association,

    75, 168-172.

    Shannon, C. E., 1949, The mathematical theory of communication (University of Illinois

    Press: Urbana).

    Stuart, A. and J. K. Ord, 1994, Kendall's advanced theory of statistics, Vol. 1 (Edward

    Arnold), 6th Edition.

    Vasicek, O., 1976, A test for normality based on sample entropy, Journal of the Royal

    Statistical Society, Series B, 38, 54-59.

    White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica, 50,

    1-26.

    White, H. and G. M. McDonald, 1980, Some large-sample tests for nonnormality in the

    linear regression model, Journal of the American Statistical Association, 75, 16-28.

    Wu, X., 2003, Calculation of maximum entropy densities with application to income

    distribution, Journal of Econometrics, 115, 347-354.

    Zellner, A. and R. A. Highfield, 1988, Calculation of maximum entropy distribution and

    approximation of marginal posterior distributions, Journal of Econometrics, 37, 195-209.


    Table 1: Size and Power of Symmetry Test

                n = 20            n = 50            n = 100           n = 200     
               b   ts1   ts2     b   ts1   ts2     b   ts1   ts2     b   ts1   ts2
    S1  0.02  0.05  0.05  0.03  0.04  0.05  0.04  0.05  0.05  0.04  0.05  0.05
    S2  0.03  0.06  0.07  0.03  0.05  0.07  0.03  0.05  0.08  0.03  0.05  0.08
    S3  0.02  0.06  0.06  0.03  0.06  0.06  0.04  0.06  0.06  0.04  0.06  0.06
    S4  0.03  0.05  0.05  0.03  0.04  0.05  0.04  0.05  0.05  0.05  0.05  0.05
    S5  0.03  0.06  0.07  0.04  0.05  0.07  0.03  0.05  0.08  0.03  0.05  0.08
    A1  0.27  0.59  0.70  0.36  0.76  0.99  0.43  0.81  1.00  0.54  0.83  1.00
    A2  0.23  0.35  0.41  0.58  0.71  0.90  0.80  0.89  1.00  0.93  0.97  1.00
    A3  0.29  0.44  0.54  0.58  0.75  0.96  0.76  0.89  1.00  0.91  0.96  1.00
    A4  0.13  0.28  0.30  0.53  0.72  0.73  0.89  0.96  0.96  1.00  1.00  1.00
    A5  0.14  0.23  0.27  0.40  0.52  0.69  0.68  0.79  0.97  0.87  0.93  1.00

    Table 2: Size and Power of Normality Test

             n = 20      n = 50      n = 100     n = 200     n = 500     n = 1000  
           tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2   tn1   tn2
    S1  0.02  0.03  0.04  0.04  0.04  0.04  0.04  0.04  0.05  0.05  0.05  0.05
    S2  0.17  0.18  0.39  0.40  0.63  0.63  0.86  0.86  0.99  1.00  1.00  1.00
    S3  0.01  0.01  0.01  0.01  0.00  0.04  0.06  0.18  0.51  0.68  0.93  0.97
    S4  0.02  0.03  0.03  0.04  0.04  0.04  0.04  0.04  0.03  0.04  0.04  0.04
    S5  0.16  0.18  0.39  0.41  0.63  0.64  0.87  0.88  1.00  1.00  1.00  1.00
    A1  0.72  0.80  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
    A2  0.36  0.44  0.86  0.93  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
    A3  0.48  0.58  0.96  0.99  1.00  1.00  1.00  1.00  1.00  1.00  1.00  1.00
    A4  0.02  0.04  0.08  0.40  0.80  0.97  1.00  1.00  1.00  1.00  1.00  1.00
    A5  0.30  0.35  0.73  0.79  0.97  0.98  1.00  1.00  1.00  1.00  1.00  1.00

    Table 3: Symmetry Test of Wage Adjustment Data

    Year   Mean   Std.    Min    Max       b      ts1      ts2
    1978   7.10   2.48    0.0   19.4    3.34*   15.32*   26.20*
    1983   5.20   3.03   -8.4   16.2   -2.54*    6.72*    8.14*
    1984   3.39   2.25   -0.4   10.8   -1.90     3.90*   14.03*
    1989   5.10   1.93   -0.2   19.8    1.89     4.34*   16.07*
    1990   5.37   2.22   -0.3   15.2    0.24     0.08     2.27
    1994   0.12   2.01   -7.5   17.7    1.12     1.71     0.53

    *: Rejected at 5% significance level.


    Figure 1: Estimated Maxent densities and Edgeworth expansion for standard normal. (Horizontal axis: x; vertical axis: density; curves: Theoretical, Maxent f1, Maxent f2, Edgeworth.)

    Figure 2: Estimated Maxent densities and Edgeworth expansion for $\chi^2$ with d.f. = 3. (Horizontal axis: x; vertical axis: density; curves: Theoretical, Maxent f1, Maxent f2, Edgeworth.)
