Topic 3_2 Some Distribution Theory

Date post: 24-Nov-2015
Upload: david-huang
Description: Some tough distribution theory
Transcript
  • Topic 3

    Some Basic Foundations

    3.2 Some Distribution Theory

    References Stock and Watson, Ch.18 Wooldridge, Appendices D and E. Heij Section 3.4 John Section 3.4 and Appendix B Verb Ch.2 and Appendix B Greene Ch. 4, 5, Appendices A, B

  • Slide 2

    Objective

    To examine the distribution theory underlying hypothesis testing and interval estimation in the linear regression model.

    Assumptions

    $y = X\beta + e$, $E(e) = 0$, $E(ee') = \sigma^2 I_N$, $e \sim N(0, \sigma^2 I_N)$

    $X$ is $N \times K$, nonstochastic, of rank $K$.

    Note: Rank will be defined shortly. It implies there is no exact collinearity and that $X'X$ is nonsingular (its inverse exists).

    The assumptions that $e$ is normally distributed and that $X$ is nonstochastic will be relaxed later.

  • Slide 3

    I will give results in a general form and then relate them to the linear regression model.

    Normal Distribution Results

    Let $x \sim N(\mu, V)$, where $x$ and $\mu$ are $N \times 1$ and $V$ is $N \times N$.

    [The joint distribution of the elements in $x$ is multivariate normal with mean vector $\mu$ and covariance matrix $V$.]

    Then, (1) the marginal distribution of a single element from $x$ is

    $x_i \sim N(\mu_i, v_{ii})$. See POE4, Appendix B for a discussion of joint, marginal and conditional distributions.

  • Slide 4

    (2) The distribution of $y = Ax$, where $y$ is $P \times 1$ and $A$ is $P \times N$, is

    $y \sim N(A\mu, AVA')$. We have seen the result about the mean and covariance matrix before. What might be new is that linear functions of normal random variables are also normal.

    Where relevant

    $b = \beta + (X'X)^{-1}X'e$. Since $e \sim N(0, \sigma^2 I_N)$, it follows that (1) $b - \beta \sim N(0, \sigma^2 (X'X)^{-1})$ and $b \sim N(\beta, \sigma^2 (X'X)^{-1})$.
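As a numerical illustration of this result, the sketch below simulates $b = \beta + (X'X)^{-1}X'e$ for a small invented design matrix and checks the simulated mean and covariance matrix of $b$ against $\beta$ and $\sigma^2(X'X)^{-1}$. All values here are made up for illustration; nothing is taken from the course data.

```python
import numpy as np

# Invented design and parameters, for illustration only.
rng = np.random.default_rng(0)
N, K, sigma = 50, 3, 2.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # fixed (nonstochastic) regressors
beta = np.array([1.0, 0.5, -0.3])

XtX_inv = np.linalg.inv(X.T @ X)
cov_theory = sigma**2 * XtX_inv                 # sigma^2 (X'X)^{-1}

# b = beta + (X'X)^{-1} X'e, replicated over many error draws
reps = 20000
E = rng.normal(scale=sigma, size=(reps, N))
B = beta + E @ X @ XtX_inv                      # each row is one draw of b
cov_sim = np.cov(B, rowvar=False)

mean_err = np.abs(B.mean(axis=0) - beta).max()  # small
cov_err = np.abs(cov_sim - cov_theory).max()    # small
print(mean_err, cov_err)
```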

  • Slide 5

    (2) $b_k \sim N(\beta_k, \sigma^2 a^{kk})$ and $\dfrac{b_k - \beta_k}{\sigma\sqrt{a^{kk}}} \sim N(0,1)$

    where $a^{kk}$ is the k-th diagonal element of $(X'X)^{-1}$.

    (3) $Rb \sim N(R\beta, \sigma^2 R(X'X)^{-1}R')$, where $R$ is $J \times K$ with $J \le K$. Some examples of $R$:

    (a) A single coefficient. (b) Returns to scale in a Cobb-Douglas production function. (c) (b) plus the equality of two input elasticities. (d) All coefficients except the intercept.

  • Slide 6

    Note how R picks up the correct elements of the mean vector and the covariance matrix.

    Chi-square distribution results

    Definition of a chi-square random variable.

    If $z \sim N(0, I_N)$, where $z$ is $N \times 1$, then $z'z \sim \chi^2_{(N)}$.

    Alternatively, if $z_1, z_2, \ldots, z_N$ are independent standard normal random variables, then $\sum_{i=1}^{N} z_i^2$ has a chi-square distribution with $N$ degrees of freedom, $\chi^2_{(N)}$. Also, $E(z'z) = N$ and $\mathrm{var}(z'z) = 2N$.
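A quick Monte Carlo check of $E(z'z) = N$ and $\mathrm{var}(z'z) = 2N$; the dimension, seed and replication count below are arbitrary choices for this sketch.

```python
import numpy as np

# Draws of z'z where z ~ N(0, I_N); expect mean ~ N and variance ~ 2N.
rng = np.random.default_rng(1)
N, reps = 8, 200000
Z = rng.normal(size=(reps, N))
q = (Z**2).sum(axis=1)       # chi-square(N) draws

print(q.mean())              # close to N = 8
print(q.var())               # close to 2N = 16
```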

  • Slide 7

    Theorem

    If $x \sim N(\mu, V)$, where $x$ and $\mu$ are $N \times 1$ and $V$ is $N \times N$, and $V$ is positive definite (nonsingular), then

    $(x - \mu)'V^{-1}(x - \mu) \sim \chi^2_{(N)}$

    To prove this result we need to digress and consider some results from matrix algebra.

  • Slide 8

    Digression on matrix algebra

    Diagonalization of a square symmetric matrix

    Let $A$ be an $(N \times N)$ symmetric matrix; then there exists an orthogonal matrix $P$ such that

    $P'AP = \Lambda$, where $\Lambda$ is a diagonal matrix containing the eigenvalues of $A$.

    An orthogonal matrix $P$ is such that $P'P = I$, and hence $P^{-1} = P'$ and $PP' = I$.

    The columns of P are called the eigenvectors of A.

    Eigenvectors and eigenvalues are also called characteristic vectors and characteristic roots.
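The diagonalization can be verified numerically: `numpy.linalg.eigh` returns the eigenvalues and an orthogonal eigenvector matrix for a symmetric input. The example matrix below is invented.

```python
import numpy as np

# Diagonalization P'AP = Lambda for an arbitrary symmetric example matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
lam, P = np.linalg.eigh(A)           # eigenvalues and orthonormal eigenvectors

Lam = P.T @ A @ P                    # P'AP: diagonal, with the eigenvalues of A
err_diag = np.abs(Lam - np.diag(lam)).max()
err_orth = np.abs(P.T @ P - np.eye(3)).max()
print(err_diag, err_orth)            # both essentially zero
```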

  • Slide 9

    Rank of a matrix

    The rank of a matrix is defined as the maximum number of linearly independent columns (or rows) in the matrix.

    See matrices for 470 in 2010.pdf slides 22-29 for more detail and some examples. [Other references will also have details.]

    $\mathrm{rank} \le \min\{\text{no. of rows, no. of columns}\}$. We will mainly be concerned with the rank of a square symmetric matrix which is either positive definite or positive semidefinite. [positive semidefinite = nonnegative definite]

  • Slide 10

    An $(N \times N)$ matrix with rank $N$ is said to be of full rank. It is nonsingular.

    An $(N \times N)$ matrix with rank less than $N$ is said to be of reduced rank. It is singular.

    If a matrix is positive definite, it is of full rank.

    If a matrix is positive semidefinite, and not positive definite, it is of reduced rank.

    The eigenvalues of a positive definite matrix are all positive.

    The eigenvalues of a positive semidefinite matrix are positive or zero.

  • Slide 11

    The number of nonzero eigenvalues of a square symmetric matrix is equal to its rank.

    Thus,

    If a matrix is positive definite, then (a) it is nonsingular, (b) it is of full rank, and (c) all its eigenvalues are positive.

    If a matrix is positive semidefinite, and not positive definite, then (a) it is singular, (b) it is not of full rank, and (c) some of its eigenvalues are positive, and some are zero; the rank of the matrix is the number of nonzero eigenvalues.

    [To get more details, Johnston and Dinardo, Appendix A is good.]

  • Slide 12

    Where relevant:

    Covariance matrices are positive semidefinite. They are also positive definite unless one of the random variables is a linear combination of the others.

    Matrices like $X'X$ and $X'V^{-1}X$ are positive definite unless we have exact collinearity.

  • Slide 13

    The square root of a positive definite matrix

    Return to the diagonalization of a square symmetric matrix:

    Let $A$ be an $(N \times N)$ symmetric matrix; then there exists an orthogonal matrix $P$ such that

    $P'AP = \Lambda$. Now suppose $A$ is positive definite. Then the diagonal elements of $\Lambda$ are positive.

    Define $\Lambda^{1/2}$ as the matrix obtained by taking the square roots of the diagonal elements of $\Lambda$. Then, $\Lambda^{1/2}\Lambda^{1/2} = \Lambda$, and $P'AP = \Lambda^{1/2}\Lambda^{1/2}$.

  • Slide 14

    $P'AP = \Lambda^{1/2}\Lambda^{1/2}$ implies $PP'APP' = P\Lambda^{1/2}\Lambda^{1/2}P'$, i.e.

    $A = L'L$, where $L = \Lambda^{1/2}P'$. We have shown that for any positive definite matrix $A$, we can always find a matrix $L$ such that $A = L'L$. We can think of $L$ as the matrix square root of $A$.

    $L = \Lambda^{1/2}P'$ is not the only matrix with this property. There are other matrices, $H$ say, such that $A = H'H$. One that is commonly used is the Cholesky decomposition of $A$. Most software has a function for computing the Cholesky decomposition.
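Both square-root constructions can be checked on a small invented positive definite matrix. Note that `numpy.linalg.cholesky` returns a lower-triangular factor $H$ with $A = HH'$, the transpose of the $A = H'H$ convention used above.

```python
import numpy as np

# Two matrix square roots of the same positive definite matrix: the
# eigenvalue construction L = Lambda^{1/2} P', and the Cholesky factor.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
lam, P = np.linalg.eigh(A)
L = np.diag(np.sqrt(lam)) @ P.T       # Lambda^{1/2} P', so A = L'L
H = np.linalg.cholesky(A)             # lower-triangular, A = H H'

err_L = np.abs(L.T @ L - A).max()
err_H = np.abs(H @ H.T - A).max()
print(err_L, err_H)                   # both essentially zero
```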

  • Slide 15

    Another result: $P'AP = \Lambda$ implies

    $\Lambda^{-1/2}P'AP\Lambda^{-1/2} = \Lambda^{-1/2}\Lambda\Lambda^{-1/2} = I$, i.e. $QAQ' = I$

    where $Q = \Lambda^{-1/2}P'$. For any positive definite matrix $A$, there exists a matrix $Q$ such that $QAQ' = I$.

    In all the above relationships, $Q$, $H$, $L$ and $P$ are all nonsingular (of full rank).
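A numerical check of $QAQ' = I$ with $Q = \Lambda^{-1/2}P'$, again on an invented positive definite matrix.

```python
import numpy as np

# Q A Q' should equal the identity when Q = Lambda^{-1/2} P'.
A = np.array([[5.0, 2.0],
              [2.0, 4.0]])
lam, P = np.linalg.eigh(A)
Q = np.diag(1.0 / np.sqrt(lam)) @ P.T

err = np.abs(Q @ A @ Q.T - np.eye(2)).max()
print(err)    # essentially zero
```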

    End of digression.

  • Slide 16

    We can now prove that:

    If $x \sim N(\mu, V)$, where $x$ and $\mu$ are $N \times 1$ and $V$ is $N \times N$, and $V$ is positive definite (nonsingular), then

    $(x - \mu)'V^{-1}(x - \mu) \sim \chi^2_{(N)}$

    If $V$ is positive definite, then $V^{-1}$ is positive definite. (We can prove this result by showing that the eigenvalues of $V^{-1}$ are the reciprocals of the eigenvalues of $V$.)

    Then, there exists a matrix $Q$ such that $V^{-1} = Q'Q$. Thus, $(x - \mu)'V^{-1}(x - \mu) = (x - \mu)'Q'Q(x - \mu) = y'y$, where $y = Q(x - \mu)$.

  • Slide 17

    The mean and covariance matrix of $y = Q(x - \mu)$ are

    $E(y) = E(Q(x - \mu)) = Q(E(x) - \mu) = 0$

    $\mathrm{cov}(y) = E(yy') = E(Q(x - \mu)(x - \mu)'Q') = Q\,E((x - \mu)(x - \mu)')\,Q' = Q\,\mathrm{cov}(x)\,Q' = QVQ' = I$

    The last line follows because $V^{-1} = Q'Q$ implies $V = Q^{-1}(Q')^{-1}$ and hence $QVQ' = QQ^{-1}(Q')^{-1}Q' = I$.

  • Slide 18

    Now, $y$ is normally distributed because $x$ is normally distributed and $y = Q(x - \mu)$ is a linear transformation of $x$. Thus, $y \sim N(0, I)$, and

    $y'y = (x - \mu)'V^{-1}(x - \mu) \sim \chi^2_{(N)}$

    Where relevant

    (1) Since $b \sim N(\beta, \sigma^2 (X'X)^{-1})$, it follows that

    $\dfrac{(b - \beta)'(X'X)(b - \beta)}{\sigma^2} \sim \chi^2_{(K)}$
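The sketch below simulates this quadratic form for an invented design matrix and checks that its mean and variance are close to $K$ and $2K$, as a chi-square(K) variable requires.

```python
import numpy as np

# Simulated draws of (b - beta)'(X'X)(b - beta)/sigma^2; invented data.
rng = np.random.default_rng(2)
N, K, sigma = 40, 3, 1.5
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
XtX = X.T @ X
XtX_inv = np.linalg.inv(XtX)

reps = 50000
E = rng.normal(scale=sigma, size=(reps, N))
D = E @ X @ XtX_inv                            # draws of b - beta
q = np.einsum('ri,ij,rj->r', D, XtX, D) / sigma**2

print(q.mean())    # close to K = 3
print(q.var())     # close to 2K = 6
```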

  • Slide 19

    (2) Since $Rb \sim N(R\beta, \sigma^2 R(X'X)^{-1}R')$, where $Rb$ is $J \times 1$, it follows that

    $\dfrac{(Rb - R\beta)'[R(X'X)^{-1}R']^{-1}(Rb - R\beta)}{\sigma^2} \sim \chi^2_{(J)}$

    (We assume there are no redundant restrictions, which implies $R$ is of rank $J$ and $R(X'X)^{-1}R'$ has full rank.)

    Note: The Wald test statistic for testing $H_0: R\beta = r$ is given by

    $W = \dfrac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)}{\hat{\sigma}^2}$

    When $H_0$ is true, $W$ converges in distribution to $\chi^2_{(J)}$.

  • Slide 20

    Notes:

    1. The above results (1) and (2) could be used for interval estimation and testing hypotheses about $\beta$ if $\sigma^2$ were known. It is, of course, unknown.

    2. One solution to this dilemma is to replace $\sigma^2$ by its estimate $\hat{\sigma}^2$. Doing so leads to the Wald statistic, which has the chi-square distribution asymptotically (in large samples).

    3. Another solution is to derive a slightly modified statistic that has the F-distribution in finite samples.

    We will now proceed in that direction.

  • Slide 21

    Theorem

    If $z \sim N(0, I_N)$, where $z$ is $N \times 1$, and $A$ is an idempotent matrix of rank $G$, then

    $z'Az \sim \chi^2_{(G)}$

    Another matrix result is required to prove this theorem.

  • Slide 22

    Another digression on matrix algebra

    If $A$ is an idempotent matrix, its eigenvalues are 1 or 0, and $\mathrm{rank}(A) = \mathrm{trace}(A)$.

    Proof: If $P$ is the orthogonal matrix that diagonalizes $A$, we have

    $\Lambda\Lambda = P'AP\,P'AP = P'AAP = P'AP = \Lambda$

    Thus,

  • Slide 23

    If each eigenvalue is such that $\lambda_i^2 = \lambda_i$, then $\lambda_i = 0$ or $1$. Now, $\mathrm{tr}(\Lambda) = \mathrm{tr}(P'AP) = \mathrm{tr}(PP'A) = \mathrm{tr}(A)$. This result is true in general, not just for idempotent $A$.

    When $A$ is idempotent,

    $\mathrm{tr}(A) = \mathrm{tr}(\Lambda)$ = the number of eigenvalues of $A$ that are equal to 1 = the number of nonzero eigenvalues of $A$ = the rank of $A$.

    End of digression

  • Slide 24

    Return to theorem

    If $z \sim N(0, I_N)$, where $z$ is $N \times 1$, and $A$ is an idempotent matrix of rank $G$, then $z'Az \sim \chi^2_{(G)}$.

    Proof:

    Let $P$ be the orthogonal matrix that diagonalizes $A$, and let $z = Px$. Then,

    $z'Az = x'P'APx = x'\Lambda x = x'\begin{pmatrix} I_G & 0 \\ 0 & 0 \end{pmatrix}x = \sum_{i=1}^{G} x_i^2$

    If we can show that $x \sim N(0, I)$, the result follows.

  • Slide 25

    $x = P'z$, so $E(x) = P'E(z) = 0$ and $\mathrm{cov}(x) = P'\,\mathrm{cov}(z)\,P = P'I_N P = I$.

    Thus, $x \sim N(0, I)$ and $z'Az = \sum_{i=1}^{G} x_i^2 \sim \chi^2_{(G)}$.

  • Slide 26

    Where relevant

    In the model $y = X\beta + e$, $e \sim N(0, \sigma^2 I_N)$, we previously showed that the least squares residuals $\hat{e} = y - Xb$ were such that

    $\hat{e}'\hat{e} = e'Me = e'[I_N - X(X'X)^{-1}X']e$

    where $M = I_N - X(X'X)^{-1}X'$ is an idempotent matrix with trace $N - K$.

    Since $e \sim N(0, \sigma^2 I_N)$, it follows that $e/\sigma \sim N(0, I_N)$, and

    $\dfrac{\hat{e}'\hat{e}}{\sigma^2} = \dfrac{e'Me}{\sigma^2} \sim \chi^2_{(N-K)}$
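The properties of $M$ used here (idempotency, trace $N - K$, eigenvalues of 0 or 1) can be confirmed numerically for an invented design matrix.

```python
import numpy as np

# M = I - X(X'X)^{-1}X' for an invented X with N = 30, K = 4.
rng = np.random.default_rng(3)
N, K = 30, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)

print(np.trace(M))                   # N - K = 26
print(np.abs(M @ M - M).max())       # essentially zero: M is idempotent
eigs = np.linalg.eigvalsh(M)
rank_M = int(np.round(eigs).sum())   # number of unit eigenvalues
print(rank_M)                        # 26
```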

  • Slide 27

    F-Distribution

    An F random variable is defined as the ratio of two independent chi-square random variables, with each divided by its degrees of freedom.

    $F = \dfrac{\chi^2_1 / df_1}{\chi^2_2 / df_2}$

    The two chi-square random variables of interest to us are

    $\dfrac{(Rb - R\beta)'[R(X'X)^{-1}R']^{-1}(Rb - R\beta)}{\sigma^2} \sim \chi^2_{(J)}$

    $\dfrac{\hat{e}'\hat{e}}{\sigma^2} \sim \chi^2_{(N-K)}$

  • Slide 28

    Their ratio after dividing each by its degrees of freedom is

    $F = \dfrac{(Rb - R\beta)'[R(X'X)^{-1}R']^{-1}(Rb - R\beta)\,/\,J}{\hat{e}'\hat{e}\,/\,(N - K)} = \dfrac{(Rb - R\beta)'[R(X'X)^{-1}R']^{-1}(Rb - R\beta)\,/\,J}{\hat{\sigma}^2} \sim F_{(J,\,N-K)}$

    We have eliminated the unknown $\sigma^2$. Note that $\hat{\sigma}^2 = \dfrac{\hat{e}'\hat{e}}{N - K}$.

    However, we still need to prove the two chi-square distributions are independent.

  • Slide 29

    The two quadratic forms in b and e will be independent if b and e are independent.

    Since b and e are normally distributed, they will be independent if the matrix containing the covariances between all their elements is a zero matrix.

    Using the results $b - \beta = (X'X)^{-1}X'e$ and $\hat{e} = [I - X(X'X)^{-1}X']e$, the covariances between the elements in $b$ and those in $\hat{e}$ are given by

    $E[(b - \beta)\hat{e}'] = (X'X)^{-1}X'\,E(ee')\,[I - X(X'X)^{-1}X'] = \sigma^2\left[(X'X)^{-1}X' - (X'X)^{-1}X'X(X'X)^{-1}X'\right] = 0$
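The zero matrix above is an exact algebraic identity, so it holds to machine precision for any full-rank design matrix; a quick numerical check with invented data:

```python
import numpy as np

# (X'X)^{-1}X'[I - X(X'X)^{-1}X'] is exactly zero, which is the algebra
# behind cov(b, e_hat) = 0; X is an invented design matrix.
rng = np.random.default_rng(5)
N, K = 25, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(N) - X @ XtX_inv @ X.T

C = XtX_inv @ X.T @ M        # equals cov(b, e_hat) up to the factor sigma^2
print(np.abs(C).max())       # essentially zero
```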

  • Slide 30

    Testing hypotheses with 1 or more linear constraints

    $F = \dfrac{(Rb - R\beta)'[R(X'X)^{-1}R']^{-1}(Rb - R\beta)\,/\,J}{\hat{e}'\hat{e}\,/\,(N - K)} \sim F_{(J,\,N-K)}$

    Consider testing $H_0: R\beta = r$ against $H_1: R\beta \neq r$. When $H_0$ is true,

    $F = \dfrac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)\,/\,J}{\hat{e}'\hat{e}\,/\,(N - K)} \sim F_{(J,\,N-K)}$

    and

    $\dfrac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)}{\hat{e}'\hat{e}\,/\,(N - K)} \xrightarrow{d} \chi^2_{(J)}$

    These are the two test statistics given by EViews.
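A Monte Carlo sketch of the F statistic under a true $H_0$. The two restrictions mirror the part (c) example later in the slides, but the design matrix and coefficient vector (chosen so that $R\beta = r$ holds) are otherwise invented.

```python
import numpy as np

# F statistic under a true H0 with J = 2 restrictions; invented data.
rng = np.random.default_rng(4)
N, K, J = 60, 4, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
beta = np.array([1.0, 0.2, 0.2, 0.6])   # beta2 = beta3 and beta2+beta3+beta4 = 1
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 1.0,  1.0, 1.0]])
r = np.array([0.0, 1.0])

XtX_inv = np.linalg.inv(X.T @ X)
RVR = R @ XtX_inv @ R.T

reps = 20000
F = np.empty(reps)
for i in range(reps):
    y = X @ beta + rng.normal(size=N)
    b = XtX_inv @ (X.T @ y)
    s2 = ((y - X @ b)**2).sum() / (N - K)
    d = R @ b - r
    F[i] = (d @ np.linalg.solve(RVR, d) / J) / s2

print(F.mean())   # close to (N-K)/(N-K-2) = 56/54, the mean of an F(2, 56)
```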

  • Slide 31

    Example

  • Slide 32

    We will consider part (c).

    $H_0: R\beta = r$     $H_1: R\beta \neq r$

    $R = \begin{pmatrix} 0 & 1 & -1 & 0 \\ 0 & 1 & 1 & 1 \end{pmatrix}$, $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}$, $r = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$

    $H_0: \beta_2 - \beta_3 = 0$ and $\beta_2 + \beta_3 + \beta_4 = 1$

  • Slide 33

  • Slide 34

    Testing whether all coefficients except the intercept are zero

    In this case $R = [\,0 \;\; I_{K-1}\,]$ and $r = 0$. In terms of the notation in Tutorial 6, QA1, $H_0: \beta_s = 0$. It can be shown that

    $\dfrac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)\,/\,(K - 1)}{\hat{e}'\hat{e}\,/\,(N - K)} = \dfrac{b_s'X_s'DX_s b_s\,/\,(K - 1)}{\hat{\sigma}^2}$

    When $H_0$ is true,

    $\dfrac{b_s'X_s'DX_s b_s\,/\,(K - 1)}{\hat{\sigma}^2} \sim F_{(K-1,\,N-K)}$

  • Slide 35

    Now, from Tutorial 6

    $b_s'X_s'DX_s b_s = y'Dy - \hat{e}'\hat{e} = \sum_{i=1}^{N}(y_i - \bar{y})^2 - \sum_{i=1}^{N}\hat{e}_i^2$

    which is the same as

    regression sum of squares = total sum of squares $-$ sum of squared errors

    SSR = TSS $-$ SSE

    This F-statistic can be written as

    $\dfrac{(TSS - SSE)\,/\,(K - 1)}{SSE\,/\,(N - K)} \sim F_{(K-1,\,N-K)}$
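For any simulated regression, the $(TSS - SSE)$ form of the overall F statistic agrees with the general quadratic form using $R = [0 \; I_{K-1}]$, $r = 0$; a numerical check with invented data:

```python
import numpy as np

# Overall F statistic two ways: sums of squares vs. the quadratic form.
rng = np.random.default_rng(6)
N, K = 80, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.4, -0.2]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
SSE = ((y - X @ b)**2).sum()
TSS = ((y - y.mean())**2).sum()
F_sums = ((TSS - SSE) / (K - 1)) / (SSE / (N - K))

R = np.hstack([np.zeros((K - 1, 1)), np.eye(K - 1)])   # [0 I_{K-1}]
d = R @ b
RVR = R @ XtX_inv @ R.T
F_quad = (d @ np.linalg.solve(RVR, d) / (K - 1)) / (SSE / (N - K))

print(abs(F_sums - F_quad))    # essentially zero
```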

  • Slide 36

    It can be found in the EViews output as follows:

  • Slide 37

    Testing hypotheses with 1 linear constraint: the t-distribution

    Reconsider the example and look at part (b).

    $R = (0 \;\; 1 \;\; 1 \;\; 1)$, $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}$, $r = 1$

    When doing this test, we get a t-value as well as the F and $\chi^2$ values. How is it calculated?

  • Slide 38

  • Slide 39

    In this case we can write

    $\dfrac{(Rb - R\beta)'[R(X'X)^{-1}R']^{-1}(Rb - R\beta)\,/\,J}{\hat{e}'\hat{e}\,/\,(N - K)} = \dfrac{(Rb - R\beta)^2}{\hat{\sigma}^2\,R(X'X)^{-1}R'} = \left[\dfrac{Rb - R\beta}{\hat{\sigma}\sqrt{R(X'X)^{-1}R'}}\right]^2$

    We will see that

    $\dfrac{Rb - R\beta}{\hat{\sigma}\sqrt{R(X'X)^{-1}R'}} \sim t_{(N-K)}$

    and hence that $F_{(1,\,N-K)} = t^2_{(N-K)}$.

  • Slide 40

    Definition

    The ratio of a $N(0,1)$ random variable to the square root of an independent $\chi^2_{(G)}$ random variable divided by its degrees of freedom is a $t_{(G)}$ random variable (a t random variable with $G$ degrees of freedom):

    $t_{(G)} = \dfrac{N(0,1)}{\sqrt{\chi^2_{(G)}\,/\,G}}$

    When $J = 1$,

    $Rb \sim N(R\beta, \sigma^2 R(X'X)^{-1}R')$ and $\dfrac{Rb - R\beta}{\sigma\sqrt{R(X'X)^{-1}R'}} \sim N(0, 1)$

  • Slide 41

    $\dfrac{Rb - R\beta}{\sigma\sqrt{R(X'X)^{-1}R'}} \sim N(0, 1)$

    $\dfrac{\hat{e}'\hat{e}}{\sigma^2} = \dfrac{(N - K)\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{(N-K)}$

    $t_{(N-K)} = \dfrac{\dfrac{Rb - R\beta}{\sigma\sqrt{R(X'X)^{-1}R'}}}{\sqrt{\dfrac{(N - K)\hat{\sigma}^2}{\sigma^2} \Big/ (N - K)}} = \dfrac{Rb - R\beta}{\hat{\sigma}\sqrt{R(X'X)^{-1}R'}}$

    When $H_0: R\beta = r$ is true,

    $t_{(N-K)} = \dfrac{Rb - r}{\hat{\sigma}\sqrt{R(X'X)^{-1}R'}}$

  • Slide 42

    $t_{(N-K)} = \dfrac{Rb - r}{\hat{\sigma}\sqrt{R(X'X)^{-1}R'}}$

    is the t-value computed by EViews when doing the Wald test.

    Using the 3rd coefficient as an example, a routine case is where $R = (0 \;\; 0 \;\; 1 \;\; 0)$ and $r = 0$, in which case we have

    $t = \dfrac{b_3}{\hat{\sigma}\sqrt{a^{33}}}$

    where $a^{33}$ is the 3rd diagonal element of $(X'X)^{-1}$.

    These are the values given routinely on the EViews output.
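A numerical check, with invented regression data, that the Wald t with $R = (0 \; 0 \; 1 \; 0)$ and $r = 0$ reduces to $b_3 / (\hat{\sigma}\sqrt{a^{33}})$:

```python
import numpy as np

# t statistic for H0: beta_3 = 0, computed two equivalent ways.
rng = np.random.default_rng(7)
N, K = 40, 4
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, -0.3]) + rng.normal(size=N)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
s2 = ((y - X @ b)**2).sum() / (N - K)          # sigma_hat^2

Rv = np.array([0.0, 0.0, 1.0, 0.0])
t_wald = (Rv @ b) / np.sqrt(s2 * (Rv @ XtX_inv @ Rv))
t_direct = b[2] / np.sqrt(s2 * XtX_inv[2, 2])  # b_3 / (sigma_hat sqrt(a33))

print(abs(t_wald - t_direct))    # essentially zero
```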

  • Slide 43

    The general F test written as the difference between restricted and unrestricted sums of squared errors

    $F = \dfrac{(Rb - r)'[R(X'X)^{-1}R']^{-1}(Rb - r)\,/\,J}{\hat{e}'\hat{e}\,/\,(N - K)} = \dfrac{(SSE_R - SSE_U)\,/\,J}{SSE_U\,/\,(N - K)}$

    where $SSE_R$ is the sum of squared errors from the model estimated under the restriction that $R\beta = r$, and $SSE_U$ is the sum of squared errors from the unrestricted model.

    For a proof, see Johnston and Dinardo, pp. 90-99, or Greene, pp. 83-92. We have already seen how this is true for the special case $H_0: \beta_s = 0$. We will demonstrate by example for another case.
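The equality of the two F forms can also be verified numerically. The restricted estimator formula used below, $b_R = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - r)$, is the standard restricted least squares result (it is not derived in these slides), and the data and restrictions are invented.

```python
import numpy as np

# Quadratic-form F vs. (SSE_R - SSE_U) F on one invented data set.
rng = np.random.default_rng(8)
N, K, J = 60, 4, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 0.3, 0.3, 0.4]) + rng.normal(size=N)
R = np.array([[0.0, 1.0, -1.0, 0.0],
              [0.0, 1.0,  1.0, 1.0]])
r = np.array([0.0, 1.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
SSE_U = ((y - X @ b)**2).sum()

RVR = R @ XtX_inv @ R.T
d = R @ b - r
b_R = b - XtX_inv @ R.T @ np.linalg.solve(RVR, d)   # restricted LS estimator
SSE_R = ((y - X @ b_R)**2).sum()

F_sse = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))
F_quad = (d @ np.linalg.solve(RVR, d) / J) / (SSE_U / (N - K))
print(abs(F_sse - F_quad))    # essentially zero
```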

  • Slide 44

    The restricted model for part (c), where $\beta_2 = \beta_3$ and $\beta_2 + \beta_3 + \beta_4 = 1$, is

    $\ln\left(\dfrac{PROD}{FERT}\right) = \beta_1 + \beta_2 \ln\left(\dfrac{AREA \cdot LABOR}{FERT^2}\right) + e$

    Its EViews output follows

  • Slide 45

    $SSE_R = 40.60791$. From slide 36, $SSE_U = 40.56536$.

  • Slide 46

    $F = \dfrac{(40.60791 - 40.56536)\,/\,2}{40.56536\,/\,(352 - 4)} = 0.1825$

    which agrees with the result on slide 33.
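The arithmetic on this slide, as a one-line check ($J = 2$ restrictions, $N = 352$ observations, $K = 4$ coefficients):

```python
# F = [(SSE_R - SSE_U)/J] / [SSE_U/(N - K)] with the values from the slides.
SSE_R, SSE_U = 40.60791, 40.56536
J, N, K = 2, 352, 4
F = ((SSE_R - SSE_U) / J) / (SSE_U / (N - K))
print(round(F, 4))    # 0.1825
```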

