1. Basic Econometrics. Course Leader: Prof. Dr.Sc. Vu Thieu. May 2004
2. Basic Econometrics. Introduction: What is Econometrics?
3. Introduction: What is Econometrics?
Definition 1: Economic measurement.
Definition 2: The application of mathematical statistics to economic data in order to lend empirical support to economic mathematical models and to obtain numerical results (Gerhard Tintner, 1968).
4. Introduction: What is Econometrics?
Definition 3: The quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference (P. A. Samuelson, T. C. Koopmans, and J. R. N. Stone, 1954).
5. Introduction: What is Econometrics?
Definition 4: The social science which applies economics, mathematics, and statistical inference to the analysis of economic phenomena (Arthur S. Goldberger, 1964).
Definition 5: The empirical determination of economic laws (H. Theil, 1971).
6. Introduction: What is Econometrics?
Definition 6: A conjunction of economic theory and actual measurements, using the theory and technique of statistical inference as a bridge pier (T. Haavelmo, 1944).
And others.
7. [Figure: Econometrics shown at the intersection of Economic Theory, Mathematical Economics, Economic Statistics, and Mathematical Statistics]
8. Introduction: Why a separate discipline?
Economic theory makes statements that are mostly qualitative in nature, while econometrics gives empirical content to most economic theory.
Mathematical economics expresses economic theory in mathematical form without empirical verification of the theory, while econometrics is mainly interested in the latter.
9. Introduction: Why a separate discipline?
Economic statistics is mainly concerned with collecting, processing, and presenting economic data. It is not concerned with using the collected data to test economic theories.
Mathematical statistics provides many of the tools for economic studies, but econometrics supplies the latter with many special methods of quantitative analysis based on economic data.
10. [Figure repeated: Econometrics at the intersection of Economic Theory, Mathematical Economics, Economic Statistics, and Mathematical Statistics]
11. Introduction: Methodology of Econometrics
(1) Statement of theory or hypothesis:
Keynes stated: consumption increases as income increases, but not by as much as the increase in income. That is, the marginal propensity to consume (MPC) for a unit change in income is greater than zero but less than one.
12. Introduction: Methodology of Econometrics
(2) Specification of the mathematical model of the theory:
Y = β₁ + β₂X; 0 < β₂ < 1
Y = consumption expenditure, X = income; β₁ and β₂ are parameters: β₁ is the intercept and β₂ the slope coefficient.
13. Introduction: Methodology of Econometrics
(3) Specification of the econometric model of the theory:
Y = β₁ + β₂X + u; 0 < β₂ < 1
Y = consumption expenditure; X = income; β₁ and β₂ are parameters: β₁ is the intercept and β₂ the slope coefficient; u is the disturbance or error term, a random (stochastic) variable.
14. Introduction: Methodology of Econometrics
(4) Obtaining data (see Table 1.1, page 6):
Y = personal consumption expenditure, X = gross domestic product, both in billions of US dollars.
15. Introduction: Methodology of Econometrics
(4) Obtaining data

Year   Y (PCE)   X (GDP)
1980   2447.1    3776.3
1981   2476.9    3843.1
1982   2503.7    3760.3
1983   2619.4    3906.6
1984   2746.1    4148.5
1985   2865.8    4279.8
1986   2969.1    4404.5
1987   3052.2    4539.9
1988   3162.4    4718.6
1989   3223.3    4838.0
1990   3260.4    4877.5
1991   3240.8    4821.0
16. Introduction: Methodology of Econometrics
(5) Estimating the econometric model:
Ŷ = −231.8 + 0.7194X (1.3.3)
The MPC was about 0.72: for the sample period, when real income increased by 1 USD, real consumption expenditure increased, on average, by about 72 cents.
Note: a hat symbol (^) above a variable signifies an estimator of the relevant population value.
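A minimal fitting sketch (not part of the original slides) that reproduces the estimates in (1.3.3) by ordinary least squares from the Table 1.1 data above, assuming only NumPy:

import numpy as np

# Y = personal consumption expenditure, X = GDP (billions of US dollars), 1980-1991
Y = np.array([2447.1, 2476.9, 2503.7, 2619.4, 2746.1, 2865.8,
              2969.1, 3052.2, 3162.4, 3223.3, 3260.4, 3240.8])
X = np.array([3776.3, 3843.1, 3760.3, 3906.6, 4148.5, 4279.8,
              4404.5, 4539.9, 4718.6, 4838.0, 4877.5, 4821.0])

slope, intercept = np.polyfit(X, Y, 1)          # least-squares fit of Y on X
print(f"Y^ = {intercept:.1f} + {slope:.4f} X")  # Y^ = -231.8 + 0.7194 X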
17. Introduction: Methodology of Econometrics
(6) Hypothesis testing:
Are the estimates in accord with the expectations of the theory being tested? Is MPC < 1 statistically? If so, it may support Keynes's theory.
Confirmation or refutation of economic theories on the basis of sample evidence is the object of statistical inference (hypothesis testing).
18. Introduction: Methodology of Econometrics
(7) Forecasting or prediction:
Given future value(s) of X, what is the future value of Y?
If GDP = $6000 billion in 1994, what is the forecast consumption expenditure?
Ŷ = −231.8 + 0.7194(6000) = 4084.6
Income multiplier M = 1/(1 − MPC) ≈ 3.57: a decrease (increase) of $1 in investment will eventually lead to a $3.57 decrease (increase) in income.
19. Introduction: Methodology of Econometrics
(8) Using the model for control or policy purposes:
Y = 4000 = −231.8 + 0.7194X ⟹ X ≈ 5882
With MPC = 0.72, an income of $5882 billion will produce an expenditure of $4000 billion. By fiscal and monetary policy, the government can manipulate the control variable X to obtain the desired level of the target variable Y.
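A small follow-up sketch (again not from the slides) for the forecasting and control calculations of slides 18-19, plugging the estimated coefficients of (1.3.3) into the fitted line:

b1, b2 = -231.8, 0.7194
print(b1 + b2 * 6000)    # forecast consumption at GDP = $6000 billion: 4084.6
print((4000 - b1) / b2)  # income needed for a $4000 billion target: about 5882
print(1 / (1 - b2))      # income multiplier M = 1/(1 - MPC): about 3.57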
20. Introduction Methodology of Econometrics
Figure 1.4: Anatomy of economic modelling
1) Economic Theory
2) Mathematical Model of Theory
3) Econometric Model of Theory
4) Data
5) Estimation of Econometric Model
6) Hypothesis Testing
7) Forecasting or Prediction
8) Using the Model for control or policy purposes
21. [Flowchart: Economic Theory → Mathematical Model → Econometric Model → Data Collection → Estimation → Hypothesis Testing → Forecasting → Application in control or policy studies]
22. Basic Econometrics. Chapter 1: THE NATURE OF REGRESSION ANALYSIS
23. 1-1. Historical origin of the term "regression"
The term REGRESSION was introduced by Francis Galton.
There is a tendency for tall parents to have tall children and for short parents to have short children, but the average height of children born of parents of a given height tends to move (or regress) toward the average height in the population as a whole (F. Galton, "Family Likeness in Stature").
24. 1-1. Historical origin of the term "regression"
Galton's law was confirmed by Karl Pearson: the average height of sons of a group of tall fathers was less than their fathers' height, and the average height of sons of a group of short fathers was greater than their fathers' height, thus "regressing" tall and short sons alike toward the average height of all men (K. Pearson and A. Lee, "On the Law of Inheritance").
In Galton's words, this was "regression to mediocrity".
25. 1-2. Modern interpretation of regression analysis
The modern interpretation of regression: regression analysis is concerned with the study of the dependence of one variable (the dependent variable) on one or more other variables (the explanatory variables), with a view to estimating and/or predicting the (population) mean or average value of the former in terms of the known or fixed (in repeated sampling) values of the latter.
Examples: (pages 16-19)
26. Dependent variable Y; explanatory variable(s) X
1. Y = son's height; X = father's height
2. Y = height of boys; X = age of boys
3. Y = personal consumption expenditure; X = personal disposable income
4. Y = demand; X = price
5. Y = rate of change of wages; X = unemployment rate
6. Y = money/income ratio; X = inflation rate
7. Y = % change in demand; X = % change in the advertising budget
8. Y = crop yield; Xs = temperature, rainfall, sunshine, fertilizer
27. 1-3. Statistical vs. deterministic relationships
In regression analysis we are concerned with STATISTICAL DEPENDENCE among variables (not functional or deterministic dependence); we essentially deal with RANDOM or STOCHASTIC variables, i.e., variables with probability distributions.
28. 1-4. Regression vs. causation
Regression does not necessarily imply causation. A statistical relationship cannot logically imply causation: "A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other" (M. G. Kendall and A. Stuart, The Advanced Theory of Statistics).
29. 1-5. Regression vs. correlation
Correlation analysis: the primary objective is to measure the strength or degree of linear association between two variables (both assumed to be random).
Regression analysis: we try to estimate or predict the average value of one variable (the dependent variable, assumed to be stochastic) on the basis of the fixed values of the other variables (independent, non-stochastic).
30. 1-6. Terminology and Notation
Dependent Variable
Explained Variable
Predictand
Regressand
Response
Endogenous
Explanatory Variable(s)
Independent Variable(s)
Predictor(s)
Regressor(s)
Stimulus or control variable(s)
Exogenous variable(s)
31. 1-7. The Nature and Sources of Data for Econometric
Analysis
1) Types of Data :
Time series data;
Cross-sectional data;
Pooled data
2) The Sources of Data
3) The Accuracy of Data
32. 1-8. Summary and Conclusions
1) The key idea behind regression analysis is the statistical dependence of one variable on one or more other variables.
2) The objective of regression analysis is to estimate and/or predict the mean or average value of the dependent variable on the basis of the known (or fixed) values of the explanatory variable(s).
33. 1-8. Summary and Conclusions
3) The success of regression analysis depends on the availability of appropriate data.
4) The researcher should clearly state the sources of the data used in the analysis, their definitions, their methods of collection, any gaps or omissions, and any revisions in the data.
34. Basic Econometrics. Chapter 2: TWO-VARIABLE REGRESSION ANALYSIS: Some Basic Ideas
35. 2-1. A hypothetical example
Total population: 60 families.
Y = weekly family consumption expenditure; X = weekly disposable family income.
The 60 families were divided into 10 groups of approximately the same income level.
Figure 2-1 shows the population regression line (curve): the regression of Y on X.
The population regression curve is the locus of the conditional means, or expectations, of the dependent variable for the fixed values of the explanatory variable X (Fig. 2-2).
39. 2-2. The concept of the population regression function (PRF)
E(Y | X = Xᵢ) = f(Xᵢ) is the population regression function (PRF), or population regression (PR).
In the case of a linear function we have the linear population regression function (or equation, or model):
E(Y | X = Xᵢ) = f(Xᵢ) = β₁ + β₂Xᵢ
40. 2-2. The concept of the population regression function (PRF)
E(Y | X = Xᵢ) = f(Xᵢ) = β₁ + β₂Xᵢ
β₁ and β₂ are the regression coefficients: β₁ is the intercept and β₂ the slope coefficient.
Linearity in the variables
Linearity in the parameters
41. 2-4. Stochastic specification of the PRF
uᵢ = Yᵢ − E(Y | X = Xᵢ), or Yᵢ = E(Y | X = Xᵢ) + uᵢ
uᵢ is the stochastic disturbance or stochastic error term; it is the nonsystematic component.
The component E(Y | X = Xᵢ) is systematic or deterministic: it is the mean consumption expenditure of all families with the same level of income.
The assumption that the regression line passes through the conditional means of Y implies that E(uᵢ | Xᵢ) = 0.
42. 2-5. The significance of the stochastic disturbance term
uᵢ, the stochastic disturbance term, is a surrogate for all variables that are omitted from the model but that collectively affect Y.
There are many reasons why such variables are not included in the model, as follows:
43. 2-5. The significance of the stochastic disturbance term
Why not include as many variables as possible in the model (the reasons for using uᵢ):
+ Vagueness of theory
+ Unavailability of data
+ Core variables vs. peripheral variables
+ Intrinsic randomness in human behavior
+ Poor proxy variables
+ Principle of parsimony
+ Wrong functional form
44. 2-6. The Sample Regression Function (SRF)
Table 2-4: A random sample from the population
Y X
------------------
70 80
65 100
90 120
95 140
110 160
115 180
120 200
140 220
155 240
150 260
------------------
Table 2-5: Another random sample from the population
Y X
-------------------
55 80
88 100
90 120
80 140
118 160
120 180
145 200
135 220
145 240
175 260
--------------------
45. [Figure: Two sample regression lines, SRF1 and SRF2; axes: Weekly Income (X) and Weekly Consumption Expenditure (Y)]
46. 2-6. The sample regression function (SRF)
Fig. 2-3: SRF1 and SRF2
Ŷᵢ = β̂₁ + β̂₂Xᵢ (2.6.1)
Ŷᵢ = estimator of E(Y | Xᵢ)
β̂₁ = estimator of β₁
β̂₂ = estimator of β₂
Estimate = a particular numerical value obtained by the estimator in an application.
SRF in stochastic form: Yᵢ = β̂₁ + β̂₂Xᵢ + ûᵢ, or Yᵢ = Ŷᵢ + ûᵢ (2.6.3)
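As an illustrative sketch (not in the slides), fitting both random samples of Tables 2-4 and 2-5 by least squares shows how SRF1 and SRF2 differ even though both estimate the same PRF; numpy.polyfit is used for the fits:

import numpy as np

X  = np.array([80, 100, 120, 140, 160, 180, 200, 220, 240, 260], float)
Y1 = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], float)  # Table 2-4
Y2 = np.array([55, 88, 90, 80, 118, 120, 145, 135, 145, 175], float)  # Table 2-5

for label, Y in (("SRF1", Y1), ("SRF2", Y2)):
    b2, b1 = np.polyfit(X, Y, 1)                 # slope, intercept
    print(f"{label}: Y^ = {b1:.4f} + {b2:.4f} X")  # two different sample lines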
47. 2-6. The sample regression function (SRF)
The primary objective in regression analysis is to estimate the PRF Yᵢ = β₁ + β₂Xᵢ + uᵢ on the basis of the SRF Yᵢ = β̂₁ + β̂₂Xᵢ + ûᵢ, and to construct the SRF so that β̂₁ is as close to β₁, and β̂₂ as close to β₂, as possible.
48. 2-6. The sample regression function (SRF)
Population regression function (PRF)
Linearity in the parameters
Stochastic PRF
The stochastic disturbance term uᵢ plays a critical role in estimating the PRF
Sample of observations from the population
The stochastic sample regression function (SRF) is used to estimate the PRF
49. 2-7. Summary and Conclusions
The key concept underlying regression analysis is the concept
of the population regression function (PRF).
This book deals with linear PRFs: linear in the unknown parameters. They may or may not be linear in the variables.
50. 2-7. Summary and Conclusions
For empirical purposes, it is the stochastic PRF that matters.
The stochastic disturbance term uᵢ plays a critical role in estimating the PRF.
The PRF is an idealized concept, since in practice one rarely has access to the entire population of interest. Generally, one has a sample of observations from the population and uses the stochastic sample regression function (SRF) to estimate the PRF.
51. Basic Econometrics. Chapter 3: TWO-VARIABLE REGRESSION MODEL: The Problem of Estimation
52. 3-1. The method of ordinary least squares (OLS)
Least-squares criterion: minimize
Σû²ᵢ = Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − β̂₁ − β̂₂Xᵢ)² (3.1.2)
Setting up the normal equations and solving them for β̂₁ and β̂₂ yields the least-squares estimators [see (3.1.6)-(3.1.7)].
The numerical and statistical properties of OLS are as follows:
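The slides point to (3.1.6)-(3.1.7) for the solution of the normal equations; a minimal sketch of those closed-form estimators (β̂₂ = Σxᵢyᵢ/Σx²ᵢ in deviation form, β̂₁ = Ȳ − β̂₂X̄), assuming only NumPy:

import numpy as np

def ols(X, Y):
    x, y = X - X.mean(), Y - Y.mean()   # deviations from the sample means
    b2 = (x * y).sum() / (x * x).sum()  # slope estimator, (3.1.6)
    b1 = Y.mean() - b2 * X.mean()       # intercept estimator, (3.1.7)
    return b1, b2

Applied to the Table 2-4 sample above, ols() reproduces Ŷᵢ = 24.4545 + 0.5091Xᵢ, the line quoted as (3.6.2) later in the deck.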
53. 3-1. The method of ordinary least squares (OLS)
The OLS estimators are expressed solely in terms of observable quantities; they are point estimators.
The sample regression line passes through the sample means of X and Y.
The mean value of the estimated Ŷ is equal to the mean value of the actual Y: E(Y) = E(Ŷ).
The mean value of the residuals ûᵢ is zero: E(ûᵢ) = 0.
The ûᵢ are uncorrelated with the predicted Ŷᵢ and with Xᵢ: Σûᵢ Ŷᵢ = 0 and Σûᵢ Xᵢ = 0.
54. 3-2. The assumptions underlying the method of least squares
Ass 1: Linear regression model (linear in the parameters)
Ass 2: X values are fixed in repeated sampling
Ass 3: Zero mean value of uᵢ: E(uᵢ | Xᵢ) = 0
Ass 4: Homoscedasticity, or equal variance of uᵢ: Var(uᵢ | Xᵢ) = σ² [vs. heteroscedasticity]
Ass 5: No autocorrelation between the disturbances: Cov(uᵢ, uⱼ | Xᵢ, Xⱼ) = 0 for i ≠ j [vs. correlation, + or −]
55. 3-2. The assumptions underlying the method of least squares
Ass 6: Zero covariance between uᵢ and Xᵢ: Cov(uᵢ, Xᵢ) = E(uᵢXᵢ) = 0
Ass 7: The number of observations n must be greater than the number of parameters to be estimated
Ass 8: Variability in the X values: they must not all be the same
Ass 9: The regression model is correctly specified
Ass 10: There is no perfect multicollinearity between the X's
56. 3-3. Precision or standard errors of the least-squares estimates
In statistics the precision of an estimate is measured by its standard error (SE):
var(β̂₂) = σ²/Σx²ᵢ (3.3.1)
se(β̂₂) = √var(β̂₂) (3.3.2)
var(β̂₁) = σ²ΣX²ᵢ/(nΣx²ᵢ) (3.3.3)
se(β̂₁) = √var(β̂₁) (3.3.4)
σ̂² = Σû²ᵢ/(n − 2) (3.3.5)
σ̂ = √σ̂² is the standard error of the estimate
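A companion sketch for the variance and standard-error formulas (3.3.1)-(3.3.5), reusing the ols() helper from the earlier sketch:

import numpy as np

def ols_se(X, Y):
    b1, b2 = ols(X, Y)
    n = len(Y)
    resid = Y - (b1 + b2 * X)
    sigma2 = (resid ** 2).sum() / (n - 2)            # sigma^2 estimate, (3.3.5)
    sum_x2 = ((X - X.mean()) ** 2).sum()             # sum of squared deviations
    var_b2 = sigma2 / sum_x2                         # (3.3.1)
    var_b1 = sigma2 * (X ** 2).sum() / (n * sum_x2)  # (3.3.3)
    return np.sqrt(var_b1), np.sqrt(var_b2)          # (3.3.4), (3.3.2)

For the Table 2-4 sample this gives se(β̂₁) ≈ 6.4138 and se(β̂₂) ≈ 0.0357, the values reported on slide 94.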
57. 3-3. Precision or standard errors of the least-squares estimates
Features of the variances:
+ var(β̂₂) is proportional to σ² and inversely proportional to Σx²ᵢ.
+ var(β̂₁) is proportional to σ² and to ΣX²ᵢ, but inversely proportional to Σx²ᵢ and to the sample size n.
+ cov(β̂₁, β̂₂) = −X̄ var(β̂₂) shows the dependence between β̂₁ and β̂₂: for X̄ > 0 they are negatively correlated.
58. 3-4. Properties of least-squares estimators: The Gauss-Markov Theorem
An OLS estimator is said to be BLUE if:
+ It is linear, that is, a linear function of a random variable such as the dependent variable Y in the regression model;
+ It is unbiased, that is, its average or expected value, E(β̂₂), is equal to the true value β₂;
+ It has minimum variance in the class of all such linear unbiased estimators.
An unbiased estimator with the least variance is known as an efficient estimator.
59. 3-4. Properties of least-squares estimators: The Gauss-Markov Theorem
Gauss-Markov Theorem: Given the assumptions of the classical linear regression model, the least-squares estimators have minimum variance in the class of linear unbiased estimators; that is, they are BLUE.
60. 3-5. The coefficient of determination r²: A measure of goodness of fit
Yᵢ = Ŷᵢ + ûᵢ, or
Yᵢ − Ȳ = Ŷᵢ − Ȳ + ûᵢ, or
yᵢ = ŷᵢ + ûᵢ (note: the mean of Ŷᵢ equals Ȳ)
Squaring both sides and summing (the cross-product term Σŷᵢûᵢ vanishes) gives
Σy²ᵢ = β̂₂²Σx²ᵢ + Σû²ᵢ, or
TSS = ESS + RSS
61. 3-5. The coefficient of determination r²: A measure of goodness of fit
TSS = Σy²ᵢ = total sum of squares
ESS = Σŷ²ᵢ = β̂₂²Σx²ᵢ = explained sum of squares
RSS = Σû²ᵢ = residual sum of squares
1 = ESS/TSS + RSS/TSS, or
1 = r² + RSS/TSS, or r² = 1 − RSS/TSS
62. 3-5. The coefficient of determination r²: A measure of goodness of fit
r² = ESS/TSS is the coefficient of determination; it measures the proportion or percentage of the total variation in Y explained by the regression model.
0 ≤ r² ≤ 1
r = ±√r² is the sample correlation coefficient.
Some properties of r are discussed in the text.
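A short sketch (not from the slides) of r² as defined here, again reusing the ols() helper:

def r_squared(X, Y):
    b1, b2 = ols(X, Y)
    resid = Y - (b1 + b2 * X)
    tss = ((Y - Y.mean()) ** 2).sum()    # total sum of squares
    return 1 - (resid ** 2).sum() / tss  # r^2 = 1 - RSS/TSS

For the Table 2-4 sample this returns about 0.9621, the value reported on slide 94.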
63. 3-5. The coefficient of determination r²: A measure of goodness of fit
3-6. A numerical example (pages 80-83)
3-7. Illustrative examples (pages 83-85)
3-8. Coffee demand function
3-9. Monte Carlo experiments (page 85)
3-10. Summary and conclusions (pages 86-87)
64. Basic Econometrics. Chapter 4: THE NORMALITY ASSUMPTION: Classical Normal Linear Regression Model (CNLRM)
65. 4-2. The normality assumption
The CNLRM assumes that each uᵢ is distributed normally, uᵢ ~ N(0, σ²), with:
Mean: E(uᵢ) = 0 (Ass 3)
Variance: E(u²ᵢ) = σ² (Ass 4)
Cov(uᵢ, uⱼ) = E(uᵢuⱼ) = 0, i ≠ j (Ass 5)
Note: for two normally distributed variables, zero covariance or correlation means independence, so uᵢ and uⱼ are not only uncorrelated but also independently distributed. We can therefore write uᵢ ~ NID(0, σ²): normally and independently distributed.
66. 4-2. The normality assumption
Why the normality assumption?
With a few exceptions, the distribution of the sum of a large number of independent and identically distributed random variables tends to a normal distribution as the number of such variables increases indefinitely.
Even if the number of variables is not very large, or they are not strictly independent, their sum may still be normally distributed.
67. 4-2. The normality assumption
Why the normality assumption?
Under the normality assumption for uᵢ, the OLS estimators β̂₁ and β̂₂ are also normally distributed.
The normal distribution is a comparatively simple distribution involving only two parameters (mean and variance).
68. 4-3. Properties of OLS estimators under the normality assumption
With the normality assumption, the OLS estimators β̂₁, β̂₂, and σ̂² have the following properties:
1. They are unbiased.
2. They have minimum variance. Combining 1 and 2, they are efficient estimators.
3. Consistency: as the sample size increases indefinitely, the estimators converge to their true population values.
69. 4-3. Properties of OLS estimators under the normality assumption
4. β̂₁ is normally distributed: β̂₁ ~ N(β₁, σ²_β̂₁), and Z = (β̂₁ − β₁)/σ_β̂₁ is N(0, 1).
5. β̂₂ is normally distributed: β̂₂ ~ N(β₂, σ²_β̂₂), and Z = (β̂₂ − β₂)/σ_β̂₂ is N(0, 1).
6. (n − 2)σ̂²/σ² is distributed as χ²(n − 2).
70. 4-3. Properties of OLS estimators under the normality assumption
7. β̂₁ and β̂₂ are distributed independently of σ̂². They have minimum variance in the entire class of unbiased estimators, whether linear or not: they are best unbiased estimators (BUE).
8. If uᵢ is N(0, σ²), then Yᵢ is N[E(Yᵢ); Var(Yᵢ)] = N[β₁ + β₂Xᵢ; σ²].
71. Some last points of Chapter 4
4-4. The method of maximum likelihood (ML)
ML is a point-estimation method with some stronger theoretical properties than OLS (Appendix 4.A on pages 110-114).
The estimators of the coefficients β by OLS and ML are identical; they are true estimators of the β's.
(ML estimator of σ²) = Σû²ᵢ/n (a biased estimator)
(OLS estimator of σ²) = Σû²ᵢ/(n − 2) (an unbiased estimator)
As the sample size n gets larger, the two estimators tend to be equal.
72. Some last points of Chapter 4
4-5. Probability distributions related to the normal distribution: the t, χ², and F distributions.
See section 4.5 on pages 107-108, with 8 theorems, and Appendix A on pages 755-776.
4-6. Summary and conclusions: see the 10 conclusions on pages 109-110.
73. Basic Econometrics. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
74. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-1. Statistical prerequisites
See Appendix A for key concepts such as probability, probability distributions, Type I error, Type II error, level of significance, power of a statistical test, and confidence interval.
75. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-2. Interval estimation: some basic ideas
How close is, say, β̂₂ to β₂?
Pr(β̂₂ − δ ≤ β₂ ≤ β̂₂ + δ) = 1 − α (5.2.1)
The random interval (β̂₂ − δ, β̂₂ + δ), if it exists, is known as a confidence interval:
β̂₂ − δ is the lower confidence limit;
β̂₂ + δ is the upper confidence limit.
76. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-2. Interval estimation: some basic ideas
(1 − α) is the confidence coefficient; α, with 0 < α < 1, is the level of significance.
Equation (5.2.1) does not mean that the probability of β₂ lying between the given limits is (1 − α); it means that the probability of constructing an interval that contains β₂ is (1 − α).
(β̂₂ − δ, β̂₂ + δ) is a random interval.
77. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-2. Interval estimation: some basic ideas
In repeated sampling, the intervals will enclose the true value of the parameter in (1 − α)·100% of the cases.
For a specific sample, one cannot say that the probability is (1 − α) that a given fixed interval includes the true β₂.
If the sampling or probability distributions of the estimators are known, one can make confidence-interval statements like (5.2.1).
78. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
80. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-4. Confidence interval for σ²
Pr[(n − 2)σ̂²/χ²_{α/2} ≤ σ² ≤ (n − 2)σ̂²/χ²_{1−α/2}] = 1 − α (5.4.3)
The interpretation of this interval: if we establish (1 − α) confidence limits on σ² and maintain a priori that these limits will include the true σ², we shall be right in the long run (1 − α)·100 percent of the time.
81. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-5. Hypothesis testing: general comments
The stated hypothesis is known as the null hypothesis, H₀.
H₀ is tested against an alternative hypothesis, H₁.
5-6. Hypothesis testing: the confidence-interval approach
One-sided or one-tail test:
H₀: β₂ ≤ β₂* versus H₁: β₂ > β₂*
82. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
Two-sided or two-tail test:
H₀: β₂ = β₂* versus H₁: β₂ ≠ β₂*
β̂₂ − t_{α/2} se(β̂₂) ≤ β₂ ≤ β̂₂ + t_{α/2} se(β̂₂): values of β₂ lying in this interval are plausible under H₀ with 100(1 − α)% confidence.
If β₂* lies in this interval, we do not reject H₀ (the finding is statistically insignificant).
If β₂* falls outside this interval, we reject H₀ (the finding is statistically significant).
83. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-7. Hypothesis Testing:
The test of significance approach
A test of significance is a procedure by which sample results are used to verify the truth or falsity of a null hypothesis.
Testing the significance of a regression coefficient: the t-test
84. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-7. Hypothesis testing: the test-of-significance approach
Table 5-1: Decision rules for the t-test of significance

Type of hypothesis   H₀           H₁           Reject H₀ if
Two-tail             β₂ = β₂*     β₂ ≠ β₂*     |t| > t_{α/2, df}
Right-tail           β₂ ≤ β₂*     β₂ > β₂*     t > t_{α, df}
Left-tail            β₂ ≥ β₂*     β₂ < β₂*     t < −t_{α, df}
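A sketch of the two-tail rule in Table 5-1 for H₀: β₂ = β₂*, reusing the ols() and ols_se() helpers from the earlier sketches; the p-value comes from SciPy (an added dependency, not mentioned in the slides):

from scipy import stats

def t_test_slope(X, Y, b2_star=0.0, alpha=0.05):
    _, b2 = ols(X, Y)
    _, se_b2 = ols_se(X, Y)
    df = len(Y) - 2
    t = (b2 - b2_star) / se_b2      # test statistic
    p = 2 * stats.t.sf(abs(t), df)  # two-tail p-value
    return t, p, p < alpha          # reject H0 if p < alpha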
85. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-7. Hypothesis testing: the test-of-significance approach
Testing the significance of σ²: the χ² test
Under the normality assumption we have
χ² = (n − 2)σ̂²/σ² ~ χ²(n − 2) (5.4.1)
From (5.4.2) and (5.4.3) on page 520 =>
86. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-7. Hypothesis Testing: The test of significance approach
87. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-8. Hypothesis Testing:
Some practical aspects
1) The meaning of Accepting or Rejecting a Hypothesis
2) The Null Hypothesis and the Rule of
Thumb
3) Forming the Null and Alternative
Hypotheses
4) Choosing α, the level of significance
88. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-8. Hypothesis Testing:
Some practical aspects
5) The Exact Level of Significance:
The p-Value [ See page 132 ]
6) Statistical Significance versus
Practical Significance
7) The Choice between Confidence-
Interval and Test-of-Significance
Approaches to Hypothesis Testing
[Warning: Read carefully pages 117-134 ]
89. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-9. Regression analysis and analysis of variance
TSS = ESS + RSS
F = [MSS of ESS]/[MSS of RSS] = β̂₂²Σx²ᵢ/σ̂² (5.9.1)
If the uᵢ are normally distributed and H₀: β₂ = 0 holds, then F follows the F distribution with 1 and n − 2 degrees of freedom.
90. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-9. Regression analysis and analysis of variance
F provides a test statistic for the null hypothesis that the true β₂ is zero: compare this F ratio with the critical F obtained from the F tables at the chosen level of significance, or obtain the p-value of the computed F statistic, to make a decision.
91. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-9. Regression analysis and analysis of variance
Table 5-3: ANOVA for the two-variable regression model

Source of variation       SS                df      MSS
ESS (due to regression)   Σŷ²ᵢ = β̂₂²Σx²ᵢ    1       β̂₂²Σx²ᵢ
RSS (due to residuals)    Σû²ᵢ              n − 2   Σû²ᵢ/(n − 2) = σ̂²
TSS                       Σy²ᵢ              n − 1
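A sketch of the ANOVA F ratio of Table 5-3 and (5.9.1), reusing the ols() helper:

def anova_f(X, Y):
    b1, b2 = ols(X, Y)
    resid = Y - (b1 + b2 * X)
    ess = b2 ** 2 * ((X - X.mean()) ** 2).sum()  # explained sum of squares
    rss = (resid ** 2).sum()                     # residual sum of squares
    return (ess / 1) / (rss / (len(Y) - 2))      # F with (1, n-2) df

For the Table 2-4 sample this gives F ≈ 202.87, matching t² from the t test of the slope.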
92. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-10. Application of regression analysis: the problem of prediction
From the data of Table 3-2 we obtained the sample regression (3.6.2):
Ŷᵢ = 24.4545 + 0.5091Xᵢ,
where Ŷᵢ is the estimator of the true E(Yᵢ).
There are two kinds of prediction, as follows:
93. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-10. Application of regression analysis: the problem of prediction
Mean prediction: prediction of the conditional mean value of Y corresponding to a chosen X, say X₀, that is, the point on the population regression line itself (see pages 137-138 for details).
Individual prediction: prediction of an individual Y value corresponding to X₀ (see pages 138-139 for details).
94. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-11. Reporting the results of regression analysis
An illustration:
Ŷᵢ = 24.4545 + 0.5091Xᵢ (5.1.1)
se = (6.4138) (0.0357)          r² = 0.9621
t  = (3.8128) (14.2405)         df = 8
p  = (0.002517) (0.000000289)   F₁,₈ = 202.87
95. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-12. Evaluating the results of regression analysis
Normality test: the chi-square (χ²) goodness-of-fit test
χ²_{N−1−k} = Σ(Oᵢ − Eᵢ)²/Eᵢ (5.12.1)
Oᵢ = observed residuals (ûᵢ) in interval i
Eᵢ = expected residuals in interval i
N = number of classes or groups; k = number of parameters to be estimated.
If the p-value of the obtained χ²_{N−1−k} is high (i.e., χ²_{N−1−k} is small), the normality hypothesis cannot be rejected.
96. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-12. Evaluating the results of regression analysis
Normality test: the chi-square (χ²) goodness-of-fit test
H₀: uᵢ is normally distributed
H₁: uᵢ is not normally distributed
Calculated χ²_{N−1−k} = Σ(Oᵢ − Eᵢ)²/Eᵢ (5.12.1)
Decision rule: if calculated χ²_{N−1−k} > critical χ²_{N−1−k}, then H₀ can be rejected.
97. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-12. Evaluating the results of regression analysis
The Jarque-Bera (JB) test of normality
This test first computes the skewness (S) and kurtosis (K) of the residuals and uses the statistic
JB = n[S²/6 + (K − 3)²/24] (5.12.2)
where mean x̄ = Σxᵢ/n; SD² = Σ(xᵢ − x̄)²/(n − 1);
S = m₃/m₂^{3/2}; K = m₄/m₂²; mₖ = Σ(xᵢ − x̄)ᵏ/n
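A sketch of the JB statistic exactly as defined above, with the asymptotic chi-square(2) p-value from SciPy:

import numpy as np
from scipy import stats

def jarque_bera(resid):
    n = len(resid)
    d = resid - resid.mean()
    m2 = (d ** 2).sum() / n                    # second central moment
    m3 = (d ** 3).sum() / n
    m4 = (d ** 4).sum() / n
    S = m3 / m2 ** 1.5                         # skewness
    K = m4 / m2 ** 2                           # kurtosis
    jb = n * (S ** 2 / 6 + (K - 3) ** 2 / 24)  # (5.12.2)
    return jb, stats.chi2.sf(jb, df=2)         # statistic and p-value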
98. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-12. (Continued)
Under the null hypothesis H₀ that the residuals are normally distributed, Jarque and Bera showed that in large samples (asymptotically) the JB statistic given in (5.12.2) follows the chi-square distribution with 2 df. If the p-value of the computed chi-square statistic in an application is sufficiently low, one can reject the hypothesis that the residuals are normally distributed; if the p-value is reasonably high, one does not reject the normality assumption.
99. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-13. Summary and Conclusions
1. Estimation and Hypothesis testing constitute the two main
branches of classical statistics
2. Hypothesis testing answers this question: Is a given finding
compatible with a stated hypothesis or not?
3. There are two mutually complementary approaches to answering
the preceding question: Confidence interval and test of
significance.
100. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-13. Summary and Conclusions
4. The confidence-interval approach has a specified probability of including within its limits the true value of the unknown parameter. If the null-hypothesized value lies in the confidence interval, H₀ is not rejected, whereas if it lies outside this interval, H₀ can be rejected.
101. Chapter 5: TWO-VARIABLE REGRESSION: Interval Estimation and Hypothesis Testing
5-13. Summary and Conclusions
5. The significance-test procedure develops a test statistic that follows a well-defined probability distribution (such as the normal, t, F, or chi-square). Once a test statistic is computed, its p-value can easily be obtained.
The p-value of a test is the lowest significance level at which we would reject H₀. It gives the exact probability of obtaining the estimated test statistic under H₀. If the p-value is small, one can reject H₀; but if it is large, one may not reject H₀.
102. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-13. Summary and Conclusions
6. A Type I error is the error of rejecting a true hypothesis; a Type II error is the error of accepting a false hypothesis. In practice, one should be careful in fixing α, the level of significance, i.e., the probability of committing a Type I error (at arbitrary values such as 1%, 5%, or 10%). It is better to quote the p-value of the test statistic.
103. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-13. Summary and Conclusions
7. This chapter introduced the normality test to find out whether uᵢ follows the normal distribution. Since in small samples the t, F, and chi-square tests require the normality assumption, it is important that this assumption be checked formally.
104. Chapter 5 TWO-VARIABLE REGRESSION: Interval Estimation and
Hypothesis Testing
5-13. Summary and Conclusions (ended)
8. If the model is deemed practically adequate, it may be used for forecasting purposes. But one should not go too far outside the sample range of the regressor values; otherwise, forecasting errors can increase dramatically.
105. Basic Econometrics. Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
106. Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
6-1. Regression through the origin
The SRF form of the regression:
Yᵢ = β̂₂Xᵢ + ûᵢ (6.1.5)
Comparison of the two types of regressions:
* the regression-through-the-origin model, and
* the regression with an intercept
107. Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
6-1. Regression through the origin
Comparison of the two types of regressions (O = through the origin, I = with intercept):
β̂₂ = ΣXᵢYᵢ/ΣX²ᵢ (6.1.6) O        vs.   β̂₂ = Σxᵢyᵢ/Σx²ᵢ (3.1.6) I
var(β̂₂) = σ²/ΣX²ᵢ (6.1.7) O      vs.   var(β̂₂) = σ²/Σx²ᵢ (3.3.1) I
σ̂² = Σû²ᵢ/(n − 1) (6.1.8) O       vs.   σ̂² = Σû²ᵢ/(n − 2) (3.3.5) I
108. Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
6-1. Regression through the origin
r² for the regression-through-the-origin model:
Raw r² = (ΣXᵢYᵢ)²/(ΣX²ᵢ ΣY²ᵢ) (6.1.9)
Note: unless there is a very strong a priori expectation, it is advisable to stick to the conventional, intercept-present model. If the intercept turns out to be statistically zero, for practical purposes we have a regression through the origin. If in fact there is an intercept in the model but we insist on fitting a regression through the origin, we would be committing a specification error.
109. Chapter 6 EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION
MODELS
6-1. Regression through the origin
Illustrative Examples:
1) Capital Asset Pricing Model - CAPM (page 156)
2) Market Model (page 157)
3) The Characteristic Line of Portfolio Theory (page 159)
110. Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
6-2. Scaling and units of measurement
Let Yᵢ = β̂₁ + β̂₂Xᵢ + ûᵢ (6.2.1)
Define Y*ᵢ = w₁Yᵢ and X*ᵢ = w₂Xᵢ. Then:
β̂*₂ = (w₁/w₂)β̂₂ (6.2.15)
β̂*₁ = w₁β̂₁ (6.2.16)
σ̂*² = w₁²σ̂² (6.2.17)
var(β̂*₁) = w₁² var(β̂₁) (6.2.18)
var(β̂*₂) = (w₁/w₂)² var(β̂₂) (6.2.19)
r²ₓᵧ = r²ₓ*ᵧ* (6.2.20)
111. Chapter 6: EXTENSIONS OF THE TWO-VARIABLE LINEAR REGRESSION MODEL
6-2. Scaling and units of measurement
From one scale of measurement, one can derive the results based on another scale of measurement. If w₁ = w₂, the intercept and its standard error are both multiplied by w₁. If w₂ = 1 and the scale of Y is changed by w₁, then all coefficients and standard errors are multiplied by w₁. If w₁ = 1 and the scale of X is changed by w₂, then only the slope coefficient and its standard error are multiplied by 1/w₂. Transformation from the (Y, X) scale to the (Y*, X*) scale does not affect the properties of the OLS estimators.
A numerical example: (pages 161, 163-165)
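A quick numerical check of the scaling rules (6.2.15)-(6.2.16), with arbitrary simulated data (not from the book):

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, 50)
Y = 2 + 3 * X + rng.normal(0, 1, 50)     # arbitrary data-generating process
w1, w2 = 100.0, 0.5                      # rescale Y by w1 and X by w2
b2, b1 = np.polyfit(X, Y, 1)
b2s, b1s = np.polyfit(w2 * X, w1 * Y, 1)
print(np.isclose(b2s, (w1 / w2) * b2))   # slope scales by w1/w2 -> True
print(np.isclose(b1s, w1 * b1))          # intercept scales by w1 -> True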
112. 6-3. Functional form of regression model
The log-linear model
Semi-log model
Reciprocal model
113. 6-4. How to measure elasticity: the log-linear model
Exponential regression model:
Yᵢ = β₁Xᵢ^β₂ e^{uᵢ} (6.4.1)
Taking the log to base e of both sides:
lnYᵢ = lnβ₁ + β₂lnXᵢ + uᵢ; setting α = lnβ₁,
lnYᵢ = α + β₂lnXᵢ + uᵢ (6.4.3)
(the log-log, double-log, or log-linear model)
This can be estimated by OLS by letting Y*ᵢ = α + β₂X*ᵢ + uᵢ, where Y*ᵢ = lnYᵢ and X*ᵢ = lnXᵢ.
β₂ measures the ELASTICITY of Y with respect to X, that is, the percentage change in Y for a given (small) percentage change in X.
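A sketch of estimating the elasticity in the log-log model (6.4.3): regress lnY on lnX, and the slope is β₂ (the data below are arbitrary and noise-free, with a true elasticity of −0.7):

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 * X ** -0.7                          # exact power function, (6.4.1) with u = 0
b2, a = np.polyfit(np.log(X), np.log(Y), 1)  # slope of lnY on lnX
print(b2)                                    # recovers the elasticity: -0.7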
114. 6-4. How to measure elasticity: the log-linear model
The elasticity E of a variable Y with respect to a variable X is defined as
E = (dY/dX)·(X/Y) = (% change in Y)/(% change in X)
≈ [(ΔY/Y) × 100]/[(ΔX/X) × 100] = (ΔY/ΔX)·(X/Y) = slope × (X/Y)
An illustrative example: the coffee demand function (pages 167-168).
115. 6-5. Semilog models: log-lin and lin-log models
How to measure the growth rate: the log-lin model
Yₜ = Y₀(1 + r)ᵗ (6.5.1)
lnYₜ = lnY₀ + t·ln(1 + r) (6.5.2)
lnYₜ = β₁ + β₂t, called the constant-growth model (6.5.5), where β₁ = lnY₀ and β₂ = ln(1 + r)
lnYₜ = β₁ + β₂t + uₜ (6.5.6)
This is a semilog, or log-lin, model. The slope coefficient measures the constant proportional or relative change in Y for a given absolute change in the value of the regressor (t):
β₂ = (relative change in regressand)/(absolute change in regressor) (6.5.7)
116. 6-5. Semilog models: log-lin and lin-log models
Instantaneous vs. compound rate of growth:
β₂ is the instantaneous rate of growth;
antilog(β₂) − 1 is the compound rate of growth.
The linear trend model:
Yₜ = β₁ + β₂t + uₜ (6.5.9)
If β₂ > 0 there is an upward trend in Y; if β₂ < 0, a downward trend.
Note: (i) one cannot compare the r² values of models (6.5.5) and (6.5.9) because the regressands in the two models are different; (ii) such models may be appropriate only if the time series is stationary.
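A sketch of the constant-growth model (6.5.6): regress lnY on t; the slope is the instantaneous growth rate and antilog(β₂) − 1 the compound rate (the series below is arbitrary, growing about 3% per period):

import numpy as np

Y = np.array([100.0, 103.0, 106.1, 109.3, 112.6])
t = np.arange(1, 6)
b2, b1 = np.polyfit(t, np.log(Y), 1)
print(b2)               # instantaneous rate of growth, about 0.03
print(np.exp(b2) - 1)   # compound rate of growth, antilog(b2) - 1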
117. 6-5. Semilog models: log-lin and lin-log models
The lin-log model:
Yᵢ = β₁ + β₂lnXᵢ + uᵢ (6.5.11)
β₂ = (change in Y)/(change in lnX) = (change in Y)/(relative change in X) ≈ ΔY/(ΔX/X) (6.5.12)
or ΔY = β₂(ΔX/X) (6.5.13)
That is, the absolute change in Y equals β₂ times the relative change in X.
118. 6-6. Reciprocal models
The reciprocal model:
Yᵢ = β₁ + β₂(1/Xᵢ) + uᵢ (6.5.14)
As X increases indefinitely, the term β₂(1/Xᵢ) approaches zero and Yᵢ approaches the limiting or asymptotic value β₁ (see Figure 6.5 on page 174).
An illustrative example: the Phillips curve for the United Kingdom, 1950-1966.
119. 6-7. Summary of functional forms: Table 6.5 (page 178)

Model                 Equation             Slope (dY/dX)    Elasticity (dY/dX)·(X/Y)
Linear                Y = β₁ + β₂X         β₂               β₂(X/Y) */
Log-linear (log-log)  lnY = β₁ + β₂lnX     β₂(Y/X)          β₂
Log-lin               lnY = β₁ + β₂X       β₂Y              β₂X */
Lin-log               Y = β₁ + β₂lnX       β₂(1/X)          β₂(1/Y) */
Reciprocal            Y = β₁ + β₂(1/X)     −β₂(1/X²)        −β₂(1/(XY)) */
120. 6-7. Summary of functional forms
Note: */ indicates that the elasticity coefficient is variable, depending on the value taken by X or Y or both. When no X and Y values are specified, in practice these elasticities are very often measured at the mean values E(X) and E(Y).
-----------------------------------------------
6-8. A note on the stochastic error term
6-9. Summary and conclusions
(pages 179-180)
121. Basic Econometrics. Chapter 7: MULTIPLE REGRESSION ANALYSIS: The Problem of Estimation
122. 7-1. The three-variable model: notation and assumptions
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ (7.1.1)
β₂ and β₃ are partial regression coefficients.
With the following assumptions:
+ Zero mean value of uᵢ: E(uᵢ | X₂ᵢ, X₃ᵢ) = 0 for each i (7.1.2)
+ No serial correlation: Cov(uᵢ, uⱼ) = 0, i ≠ j (7.1.3)
+ Homoscedasticity: Var(uᵢ) = σ² (7.1.4)
+ Cov(uᵢ, X₂ᵢ) = Cov(uᵢ, X₃ᵢ) = 0 (7.1.5)
+ No specification bias, i.e., the model is correctly specified (7.1.6)
+ No exact collinearity between the X variables (7.1.7) (no perfect multicollinearity in the case of more explanatory variables; if an exact linear relationship exists, the X variables are said to be linearly dependent)
+ The model is linear in the parameters
123. 7-2. Interpretation of the multiple regression equation
E(Yᵢ | X₂ᵢ, X₃ᵢ) = β₁ + β₂X₂ᵢ + β₃X₃ᵢ (7.2.1)
(7.2.1) gives the conditional mean or expected value of Y, conditional upon the given or fixed values of X₂ and X₃.
124. 7-3. The meaning of partial regression coefficients
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + … + βₛXₛᵢ + uᵢ
βₖ measures the change in the mean value of Y per unit change in Xₖ, holding the remaining explanatory variables constant. It gives the direct effect of a unit change in Xₖ on E(Yᵢ), net of the Xⱼ (j ≠ k).
How to control for the true effect of a unit change in Xₖ on Y? (read pages 195-197)
125. 7-4. OLS and ML estimation of the partial regression coefficients
This section (pages 197-201) provides:
1. The OLS estimators in the case of the three-variable regression Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + uᵢ
2. Variances and standard errors of the OLS estimators
3. Eight properties of the OLS estimators (pp. 199-201)
4. An understanding of the ML estimators
126. 7-5. The multiple coefficient of determination R² and the multiple coefficient of correlation R
This section provides:
1. The definition of R² in the context of multiple regression, like r² in the case of two-variable regression.
2. R = √R² is the coefficient of multiple correlation; it measures the degree of association between Y and all the explanatory variables jointly.
3. The variance of a partial regression coefficient:
Var(β̂ₖ) = (σ²/Σx²ₖ)·(1/(1 − R²ₖ)) (7.5.6)
where β̂ₖ is the partial regression coefficient of regressor Xₖ and R²ₖ is the R² of the regression of Xₖ on the remaining regressors.
127. 7-6. Example 7.1: The expectations-augmented Phillips curve for the US (1970-1982)
This section provides an illustration of the ideas introduced in the chapter.
Regression model (7.6.1); the data set is in Table 7.1.
128. 7-7. Simple regression in the context of multiple regression: introduction to specification bias
This section shows what happens when a simple regression is used where a multiple regression is appropriate: it causes specification bias, which will be discussed in Chapter 13.
129. 7-8. R² and the adjusted R²
R² is a non-decreasing function of the number of explanatory variables: an additional X variable will not decrease R².
R² = ESS/TSS = 1 − RSS/TSS = 1 − Σû²ᵢ/Σy²ᵢ (7.8.1)
This can point model-building in the wrong direction by rewarding the addition of irrelevant variables, which motivates an adjusted R² (R̄²) that takes the degrees of freedom into account:
R̄² = 1 − [Σû²ᵢ/(n − k)]/[Σy²ᵢ/(n − 1)] (7.8.2), or
R̄² = 1 − σ̂²/S²_Y (S²_Y is the sample variance of Y)
k = number of parameters including the intercept term.
Substituting (7.8.1) into (7.8.2) gives
R̄² = 1 − (1 − R²)(n − 1)/(n − k) (7.8.4)
For k > 1, R̄² < R²; thus as the number of X variables increases, R̄² increases less than R², and R̄² can be negative.
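A sketch of R² and adjusted R² per (7.8.1) and (7.8.4), given the residuals of a fitted model with k parameters:

import numpy as np

def r2_and_adjusted(resid, Y, k):
    rss = (resid ** 2).sum()
    tss = ((Y - Y.mean()) ** 2).sum()
    n = len(Y)
    r2 = 1 - rss / tss                            # (7.8.1)
    r2_bar = 1 - (1 - r2) * (n - 1) / (n - k)     # (7.8.4)
    return r2, r2_bar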
131. 7-8. R² and the adjusted R²
Comparing two R² values: to compare, the sample size n and the dependent variable must be the same.
Example 7-2: Coffee demand function revisited (page 210).
The game of maximizing the adjusted R²: choosing the model that gives the highest R̄² may be dangerous, for in regression analysis our objective is not that, but to obtain dependable estimates of the true population regression coefficients and draw statistical inferences about them.
One should be more concerned about the logical or theoretical relevance of the explanatory variables to the dependent variable and their statistical significance.
132. 7-9. Partial Correlation Coefficients
This section provides:
1. Explanation of simple and partial correlation
coefficients
2. Interpretation of simple and partial correlation
coefficients
(pages 211-214)
133. 7-10. Example 7.3: The Cobb-Douglas production function: more on functional form
Yᵢ = β₁X₂ᵢ^β₂ X₃ᵢ^β₃ e^{uᵢ} (7.10.1)
Log-transforming this model:
lnYᵢ = lnβ₁ + β₂lnX₂ᵢ + β₃lnX₃ᵢ + uᵢ = β₀ + β₂lnX₂ᵢ + β₃lnX₃ᵢ + uᵢ (7.10.2)
135. Basic Econometrics. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
136. Chapter 8 MULTIPLE REGRESSION ANALYSIS: The Problem of
Inference
8-3. Hypothesis testing in multiple regression:
Testing hypotheses about an individual partial regression
coefficient
Testing the overall significance of the estimated multiple
regression model, that is, finding out if all the partial slope
coefficients are simultaneously equal to zero
Testing that two or more coefficients are equal to one
another
Testing that the partial regression coefficients satisfy
certain restrictions
Testing the stability of the estimated regression model over
time or in different cross-sectional units
Testing the functional form of regression models
137. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-4. Hypothesis testing about individual partial regression coefficients
With the assumption that uᵢ ~ N(0, σ²), we can use the t-test to test a hypothesis about any individual partial regression coefficient:
H₀: β₂ = 0
H₁: β₂ ≠ 0
If the computed |t| value > the critical t value at the chosen level of significance, we may reject the null hypothesis; otherwise, we may not reject it.
138. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-5. Testing the overall significance of a multiple regression: the F-test
For Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + … + βₖXₖᵢ + uᵢ,
to test the hypothesis H₀: β₂ = β₃ = … = βₖ = 0 (all slope coefficients are simultaneously zero) versus H₁: not all slope coefficients are simultaneously zero, compute
F = (ESS/df)/(RSS/df) = [ESS/(k − 1)]/[RSS/(n − k)] (8.5.7)
(k = total number of parameters to be estimated, including the intercept).
If F > F critical = F_α(k − 1, n − k), reject H₀; otherwise, do not reject it.
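A sketch of the overall F test (8.5.7) for a three-variable model, using numpy.linalg.lstsq for the multiple regression; the data and coefficients are arbitrary simulations, not from the book:

import numpy as np

rng = np.random.default_rng(1)
n = 30
X2, X3 = rng.normal(size=n), rng.normal(size=n)
Y = 1 + 2 * X2 + 3 * X3 + rng.normal(size=n)

Z = np.column_stack([np.ones(n), X2, X3])     # design matrix, k = 3 parameters
beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ beta
ess = ((Z @ beta - Y.mean()) ** 2).sum()      # explained sum of squares
rss = (resid ** 2).sum()                      # residual sum of squares
k = Z.shape[1]
F = (ess / (k - 1)) / (rss / (n - k))         # (8.5.7)
print(F)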
139. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-5. Testing the overall significance of a multiple regression
Alternatively, if the p-value of the F obtained from (8.5.7) is sufficiently low, one can reject H₀.
An important relationship between R² and F:
F = (ESS/(k − 1))/(RSS/(n − k)), or
F = [R²/(k − 1)]/[(1 − R²)/(n − k)] (8.5.1)
(see the proof on page 249)
140. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-5. Testing the overall significance of a multiple regression in terms of R²
For Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + … + βₖXₖᵢ + uᵢ,
to test the hypothesis H₀: β₂ = β₃ = … = βₖ = 0 (all slope coefficients are simultaneously zero) versus H₁: not all slope coefficients are simultaneously zero, compute
F = [R²/(k − 1)]/[(1 − R²)/(n − k)] (8.5.13)
(k = total number of parameters to be estimated, including the intercept).
If F > F critical = F_α(k − 1, n − k), reject H₀.
141. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-5. Testing the overall significance of a multiple regression
Alternatively, if the p-value of the F obtained from (8.5.13) is sufficiently low, one can reject H₀.
The incremental or marginal contribution of an explanatory variable:
Let X be the new (additional) regressor on the right-hand side of a regression. Under the usual assumption of the normality of uᵢ and the null hypothesis H₀: β = 0 for its coefficient, it can be shown that the following F ratio follows the F distribution with the respective degrees of freedom.
142. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-5. Testing the overall significance of a multiple regression
F_com = [(R²_new − R²_old)/df₁]/[(1 − R²_new)/df₂] (8.5.18)
where df₁ = number of new regressors, and df₂ = n − number of parameters in the new model.
R²_new is the coefficient of determination of the new regression (after adding X); R²_old is the coefficient of determination of the old regression (before adding X).
143. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
Decision rule:
If F_com > F_{df₁,df₂}, one can reject the H₀ that β = 0 and conclude that the addition of X to the model significantly increases the ESS, and hence the R² value.
When to add a new variable? When the |t| of its coefficient exceeds 1 (or, equivalently, F = t² of that variable exceeds 1).
When to add a group of variables? When adding the group to the model gives an F value greater than 1.
144. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-6. Testing the equality of two regression coefficients
Yᵢ = β₁ + β₂X₂ᵢ + β₃X₃ᵢ + β₄X₄ᵢ + uᵢ (8.6.1)
Test the hypotheses:
H₀: β₃ = β₄, or β₃ − β₄ = 0 (8.6.2)
H₁: β₃ ≠ β₄, or β₃ − β₄ ≠ 0
Under the classical assumptions it can be shown that
t = [(β̂₃ − β̂₄) − (β₃ − β₄)]/se(β̂₃ − β̂₄)
follows the t distribution with (n − 4) df, because (8.6.1) is a four-variable model, or, more generally, with (n − k) df, where k is the total number of parameters estimated, including the intercept term.
se(β̂₃ − β̂₄) = √[var(β̂₃) + var(β̂₄) − 2cov(β̂₃, β̂₄)] (8.6.4)
(see appendix)
145. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
4. If the computed |t| > the critical t at the designated level of significance for the given df, then reject H₀; otherwise, do not reject it. Alternatively, if the p-value of the t statistic from (8.6.5) is reasonably low, one can reject H₀.
Example 8.2: The cubic cost function revisited.
146. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-7. Restricted least squares: testing linear equality restrictions
Yᵢ = β₁X₂ᵢ^β₂ X₃ᵢ^β₃ e^{uᵢ} (7.10.1) and (8.7.1)
Y = output, X₂ = labor input, X₃ = capital input
In log form: lnYᵢ = β₀ + β₂lnX₂ᵢ + β₃lnX₃ᵢ + uᵢ (8.7.2),
with constant returns to scale: β₂ + β₃ = 1 (8.7.3)
147. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-7. Restricted least squares: testing linear equality restrictions
How to test (8.7.3):
The t-test approach (unrestricted): the hypothesis H₀: β₂ + β₃ = 1 can be tested by a t-test.
The F-test approach (restricted least squares, RLS): using, say, β₂ = 1 − β₃ and substituting it into (8.7.2), we get
ln(Yᵢ/X₂ᵢ) = β₀ + β₃ln(X₃ᵢ/X₂ᵢ) + uᵢ (8.7.8),
where Yᵢ/X₂ᵢ is the output/labor ratio and X₃ᵢ/X₂ᵢ is the capital/labor ratio.
148. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-7. Restricted least squares: testing linear equality restrictions
Let Σû²_UR = RSS_UR of the unrestricted regression (8.7.2) and Σû²_R = RSS_R of the restricted regression (8.7.7); m = number of linear restrictions; k = number of parameters in the unrestricted regression; n = number of observations. R²_UR and R²_R are the R² values obtained from the unrestricted and restricted regressions respectively. Then
F = [(RSS_R − RSS_UR)/m]/[RSS_UR/(n − k)]
  = [(R²_UR − R²_R)/m]/[(1 − R²_UR)/(n − k)] (8.7.10)
follows the F distribution with (m, n − k) df.
Decision rule: if F > F_{m,n−k}, reject H₀: β₂ + β₃ = 1.
149. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-7. Restricted least squares: testing linear equality restrictions
Note: R²_UR ≥ R²_R (8.7.11) and Σû²_UR ≤ Σû²_R (8.7.12)
Example 8.3: The Cobb-Douglas production function for the Taiwanese agricultural sector, 1958-1972 (pages 259-260); data in Table 7.3 (page 216).
General F testing (page 260).
Example 8.4: The demand for chicken in the US, 1960-1982; data in exercise 7.23 (page 228).
150. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-8. Comparing two regressions: testing for structural stability of regression models
Table 8.8: Personal savings and income data, UK, 1946-1963 (millions of pounds)
Savings function:
Reconstruction period: Yₜ = λ₁ + λ₂Xₜ + u₁ₜ, t = 1, 2, …, n₁ (8.8.1)
Post-reconstruction period: Yₜ = γ₁ + γ₂Xₜ + u₂ₜ, t = 1, 2, …, n₂ (8.8.2)
where Y is personal savings, X is personal income, the u's are the disturbance terms in the two equations, and n₁ and n₂ are the numbers of observations in the two periods.
151. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-8. Comparing two regressions: testing for structural stability of regression models
+ The structural change may mean that the two intercepts are different, or the two slopes are different, or both are different, or any other suitable combination of the parameters. If there is no structural change, we can pool all the n₁ + n₂ observations and estimate just one savings function:
Yₜ = α₁ + α₂Xₜ + uₜ, t = 1, 2, …, n₁, 1, …, n₂ (8.8.3)
How do we find out whether there is a structural change in the savings-income relationship between the two periods? A popular test is the Chow test, which is simply the F-test discussed earlier:
H₀: λᵢ = γᵢ for all i vs. H₁: λᵢ ≠ γᵢ for some i
152. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-8. Comparing two regressions: testing for structural stability of regression models
+ The assumptions underlying the Chow test:
u₁ₜ and u₂ₜ ~ N(0, σ²): the two error terms are normally distributed with the same variance;
u₁ₜ and u₂ₜ are independently distributed.
Step 1: Estimate (8.8.3) and get its RSS, say S₁, with df = (n₁ + n₂ − k), where k is the number of parameters estimated.
Step 2: Estimate (8.8.1) and (8.8.2) individually and get their RSS, say S₂ and S₃, with df = (n₁ − k) and (n₂ − k) respectively. Call S₄ = S₂ + S₃, with df = (n₁ + n₂ − 2k).
Step 3: S₅ = S₁ − S₄.
153. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-8. Comparing two regressions: testing for structural stability of regression models
Step 4: Given the assumptions of the Chow test, it can be shown that
F = [S₅/k]/[S₄/(n₁ + n₂ − 2k)] (8.8.4)
follows the F distribution with df = (k, n₁ + n₂ − 2k).
Decision rule: if the F computed by (8.8.4) > the critical F at the chosen level of significance α, reject the hypothesis that the regressions (8.8.1) and (8.8.2) are the same, i.e., reject the hypothesis of structural stability. One can also use the p-value of the F obtained from (8.8.4), rejecting H₀ if the p-value is reasonably low.
+ Apply this to the data in Table 8.8 (a sketch of the computation follows below).
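A sketch of steps 1-4 of the Chow test, fitting each regression with numpy.polyfit (k = 2 for an intercept-and-slope savings function; the X1, Y1, X2, Y2 arrays would come from the two subperiods of Table 8.8):

import numpy as np

def rss(X, Y):
    b2, b1 = np.polyfit(X, Y, 1)
    e = Y - (b1 + b2 * X)
    return (e ** 2).sum()

def chow_F(X1, Y1, X2, Y2, k=2):
    S1 = rss(np.concatenate([X1, X2]), np.concatenate([Y1, Y2]))  # pooled RSS
    S4 = rss(X1, Y1) + rss(X2, Y2)         # S2 + S3 from the subperiod fits
    S5 = S1 - S4
    n = len(Y1) + len(Y2)
    return (S5 / k) / (S4 / (n - 2 * k))   # F with (k, n-2k) df, (8.8.4)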
154. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-9. Testing the functional form of regression: choosing between linear and log-linear regression models: the MWD test (MacKinnon, White, and Davidson)
H₀: Linear model: Y is a linear function of the regressors, the X's.
H₁: Log-linear model: lnY is a linear function of the logs of the regressors, the lnX's.
155. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
8-9. Testing the functional form of regression:
Step 1: Estimate the linear model and obtain the estimated Y values; call them Yf (i.e., Ŷ).
Step 2: Estimate the log-linear model and obtain the estimated lnY values; call them lnf (i.e., the fitted lnY).
Step 3: Obtain Z₁ = lnYf − lnf.
Step 4: Regress Y on the X's and Z₁. Reject H₀ if the coefficient of Z₁ is statistically significant by the usual t-test.
Step 5: Obtain Z₂ = antilog(lnf) − Yf.
Step 6: Regress lnY on the lnX's and Z₂. Reject H₁ if the coefficient of Z₂ is statistically significant by the usual t-test.
156. Chapter 8: MULTIPLE REGRESSION ANALYSIS: The Problem of Inference
Example 8.5: The demand for roses (pages 266-267); data in exercise 7.20 (page 225).
8-10. Prediction with multiple regression: follow section 5-10 and the illustration on pages 267-268, using the data set in Table 8.1 (page 241).
8-11. The troika of hypothesis tests: the likelihood ratio (LR), Wald (W), and Lagrange multiplier (LM) tests.
8-12. Summary and conclusions.