Quantity Economics Regression Analysis

Quantity Economics Project: Searching for the best-fitted consumption model

Bachelor's Degree of Finance

10/05/2013

Introduction:

This project builds an econometric linear regression model of the relationship between consumers' expenditure, personal disposable income and interest rates. The data were collected from NAVIDATA: aggregate consumers' expenditure and aggregate personal disposable income, both at 1990 prices, and the interest rate on Treasury bills in the UK between 1972 and 1995.

Q1

To gain an overview of the data, line charts of the different variables were drawn in Stata. The first chart shows a close relationship between income and consumption.

Figure 1: Consumption and Income curves. Consumers' expenditure and personal disposable income (million in 1990 prices) by year, 1970 to 1995. Source: Office for National Statistics licensed under the Open Government Licence v.1.0.

[Chart: Interest Rates curves. Treasury Bill yield (%) by year, 1970 to 1995. Source: Office for National Statistics licensed under the Open Government Licence v.1.0.]

Further analysis of the relationship between these three variables can be conducted with an understanding of the basic Keynesian consumption function and Surrey's theory.

The correlation table also shows how closely income and consumption are related, with a correlation coefficient of 0.9934. Interest rates are negatively correlated with the other two variables.

Table 1

The following questions therefore use several methods to test the theory and the fit of the model: t tests on the individual coefficients, a partial F test, an F test for the overall significance of the model, and R-squared.

Q2

a) The first question used the correlation coefficient to measure the strength of the linear relationship between income and consumption. A regression analysis can now determine which equation fits the data best and measure the fit of the model.

Table 2

Using the estimated coefficients from the Stata regression output, the regression equation can be presented as follows:

Table 3

Y(predicted) = 0.9988*X2 - 2.8500
St. Error:     (0.0264)    (0.7802)
T value:       (37.85)     (-3.60)
P value:       (0.000)     (0.002)

R-squared is an important measure of whether a model fits well. It gives the proportion of the variation in the outcome that the model explains. In this case R-squared is 98.83%, which means that 98.83% of the variation in consumption is explained by income. The coefficient on X2 is 0.9988, so each one-unit increase in income is predicted to raise consumption by 0.9988 units. Because the model has only one explanatory variable, a t test of the hypothesis that its coefficient equals 0 also indicates the fit of the model. The t statistic is calculated with the formula $t = (\hat{b} - 0)/\mathrm{se}(\hat{b})$ and is given in the Stata output. For the first model, t is 37.85, which is far larger than the critical t value of 2.1098 with 17 degrees of freedom. The corresponding p-value is approximately 0, below the 0.05 level of significance. We can therefore conclude that the first model fits well.
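The critical value and p-value quoted above can be checked with Stata's built-in t-distribution functions. The following is a minimal sketch that uses only the figures reported in the text (t = 37.85, 17 degrees of freedom, a two-sided 5% test); the scalar names are illustrative and not part of the original do-file.

```stata
/* Two-sided t test of H0: coefficient = 0, using figures from the text */
scalar t_obs = 37.85     /* t statistic reported for X2 in the first model */
scalar df_res = 17       /* residual degrees of freedom */

/* invttail(df, p) returns the t value with upper-tail probability p */
display "critical t (5%, two-sided) = " invttail(df_res, 0.025)

/* ttail(df, t) returns the upper-tail probability beyond t */
display "two-sided p-value = " 2*ttail(df_res, abs(t_obs))
```

The first line of output should correspond to the critical value of 2.1098 used in the text.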

In the scatter plot with the fitted line, the regression line clearly follows the actual data, and the 95% confidence band covers most of the observations.

Figure 2

R-squared and the t test are now used to examine the fit of the second model.

Table 4

The estimated regression equation for the second model is as follows:

Y(predicted) = 0.3471*X3 + 22.3333
St. Error:     (0.3809)    (4.5552)
T value:       (0.91)      (4.90)
P value:       (0.375)     (0.000)

The R-squared for this model is only 4.66%, which means that less than 5% of the variation in the dependent variable, consumption, is explained by the model based on interest rates. The t statistic, 0.91 in this case, lies in the acceptance region (-2.1098, 2.1098), so we cannot reject the null hypothesis that the coefficient of X3 is zero. We can therefore also conclude that the model does not fit well, as its only variable is not significant.

Table 5

b) Because both models have only one independent variable, each has 17 degrees of freedom: the number of observations minus 2 (one for the variable and one for the constant). The population standard deviation is estimated by the sample standard deviation and the number of observations is smaller than 30, so the standardized estimates of the coefficients a1, b1, a2 and b2 follow a t distribution with 17 degrees of freedom. (The distribution is presented in the following graph.)

Figure 3

Figure 4

The null hypothesis for a1 is that a1 is 0, and the alternative hypothesis is that a1 is not 0. This can be written as H0: a1=0, Ha: a1≠0.

The t tests for the other coefficients have similar hypotheses:

For b1: H0: b1=0, Ha: b1≠0;
For a2: H0: a2=0, Ha: a2≠0;
For b2: H0: b2=0, Ha: b2≠0.

For all four coefficients, the acceptance region is (-2.1098, 2.1098). The critical region is therefore (-∞, -2.1098) ∪ (2.1098, +∞).

Table 6

Comparing the t value of each coefficient with the critical value, only the t value of the coefficient on X3, 0.91, lies in the acceptance region. We therefore reject the null hypothesis at the 5% significance level for a1, b1 and a2, but we cannot reject it for b2. The p-values give the same result, as only the p-value of b2 is greater than the significance level of 0.05.

c) The marginal propensity to consume (MPC) is the increase in consumption for each one-unit increase in income. People always spend a certain amount of money on maintaining essential daily life, and as their income increases they spend part of the additional income. That share is the marginal propensity to consume, and it should be greater than 0 and smaller than 1.

However, our model gives an MPC of 0.9988, which is almost one. In addition, the constant is -2.85, which is smaller than 0. To test whether the estimates agree with the theory, a series of t tests is conducted.

To begin with, two one-sided t tests on the MPC are conducted with α = 0.05.

The first tests whether the MPC is smaller than 1:

H0: b2 <= 1 vs. Ha: b2 > 1

With a 5% significance level and 17 d.f., the critical value is 1.7396, so the critical region is (1.7396, +∞). We reject the null when the t statistic is greater than 1.7396. Using the formula $t = (\hat{b}_2 - 1)/\mathrm{se}(\hat{b}_2)$, the t statistic is -0.04718055. It lies in the acceptance region, so we cannot reject the null at α = 0.05.

Table 7

The second tests whether the MPC is greater than 0:

H0: b2 >= 0 vs. Ha: b2 < 0

Because this tests the left tail, the critical value is -1.7396 and the critical region is (-∞, -1.7396). We reject the null when the t statistic is less than -1.7396. The t statistic, $t = (\hat{b}_2 - 0)/\mathrm{se}(\hat{b}_2) = 37.850918$, is larger than -1.7396 and lies in the acceptance region, so we again cannot reject the null at α = 0.05.

We cannot reject either hypothesis, so according to the sample data the MPC is between 0 and 1. This agrees with the theory.
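The two one-sided tests on the MPC can also be written directly with Stata's stored coefficient and standard error. This is a minimal sketch, assuming the project data are already loaded with the variable names y, x2 and year used in the do-file; the scalar names are illustrative.

```stata
/* One-sided t tests on the MPC, assuming the project data are in memory */
quietly regress y x2 if year <= 1990
scalar t_crit = invttail(e(df_r), 0.05)   /* upper 5% critical value */
scalar t_crit_low = -t_crit               /* lower 5% critical value */

/* H0: b2 <= 1 against Ha: b2 > 1, reject when t exceeds t_crit */
scalar t_upper = (_b[x2] - 1) / _se[x2]
display "t = " t_upper "   critical value = " t_crit

/* H0: b2 >= 0 against Ha: b2 < 0, reject when t falls below t_crit_low */
scalar t_lower = (_b[x2] - 0) / _se[x2]
display "t = " t_lower "   critical value = " t_crit_low
```

Using _b[x2] and _se[x2] avoids extracting the values from a results matrix by position.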

Q3

Y(predicted) = 1.012296*X2 - 0.0818858*X3 - 2.29054
St. Error:     (0.0251396)   (0.0402461)    (0.7677111)
T value:       (40.27)       (-2.03)        (-2.98)
P value:       (0.000)       (0.059)        (0.009)


a) H0: b2 = 0.95 vs. Ha: b2 < 0.95

The significance level is 5% and the model has 16 residual degrees of freedom. The critical region, computed in Stata, is (-∞, -1.7459). Using the formula $t = (\hat{b}_2 - 0.95)/\mathrm{se}(\hat{b}_2)$, the t statistic for the hypothesis is t = 2.4779952. Because the t statistic is greater than the critical value, it lies in the acceptance region, and we cannot reject the null hypothesis at the 5% significance level that b2 is equal to or greater than 0.95.

Table 8
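As a check on the arithmetic, the t statistic can be worked out from the coefficient and standard error reported in the Q3 equation. The hat notation is introduced here for the estimate and is not the original author's.

```latex
t = \frac{\hat{b}_2 - 0.95}{\mathrm{se}(\hat{b}_2)}
  = \frac{1.012296 - 0.95}{0.0251396}
  = \frac{0.062296}{0.0251396}
  \approx 2.478
```

Since 2.478 is not below the left-tail critical value of -1.7459, the statistic falls outside the critical region.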

b) A 95% confidence interval is a region constructed so that, in repeated sampling, it contains the true parameter 95% of the time. The critical t value for a 95% confidence interval is 2.1199, so $P(-2.1199 < (\hat{b}_3 - b_3)/\mathrm{se}(\hat{b}_3) < 2.1199) = 95\%$. So b3 lies between $\hat{b}_3 - 2.1199\,\mathrm{se}(\hat{b}_3)$ and $\hat{b}_3 + 2.1199\,\mathrm{se}(\hat{b}_3)$, yielding the confidence interval (-0.16720351, 0.00343191).
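The interval can be reproduced from the figures in the Q3 equation. The sketch below assumes the interval refers to the coefficient on X3 (the reported bounds match that coefficient and its standard error); the scalar names are illustrative.

```stata
/* 95% confidence interval for the X3 coefficient, using reported values */
scalar b3_hat = -0.0818858           /* estimated coefficient on X3 */
scalar se_b3 = 0.0402461             /* its standard error */
scalar t_crit = invttail(16, 0.025)  /* two-sided 5% critical value, 16 d.f. */

display "lower bound = " b3_hat - t_crit*se_b3
display "upper bound = " b3_hat + t_crit*se_b3
```

Because the interval contains 0, it agrees with the two-sided p-value of 0.059 reported for this coefficient.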

c)

i) The F test for overall model fit also shows whether the model is a good one. The null hypothesis is that all slope coefficients are jointly equal to 0. The F statistic is the mean square of the model divided by the mean square of the residual, $F = \dfrac{MSS/k}{RSS/(n-k-1)}$, where MSS is the model sum of squares, RSS is the residual sum of squares, n is the number of observations and k is the number of slope coefficients. It is 850.72, which is greater than the critical F value at the 5% significance level, 3.6337235, with 2 degrees of freedom in the numerator and 16 in the denominator (calculated with the invFtail function in Stata). In addition, its p-value, approximately 0, is less than 0.05. We can therefore reject the null hypothesis at the 5% significance level and conclude that the model fits the actual data.
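The overall F test can be rebuilt from the results that regress stores. This is a minimal sketch, assuming the project data are loaded with the do-file's variable names; the scalar names are illustrative.

```stata
/* Overall F test rebuilt from stored results of the Q3 regression */
quietly regress y x2 x3 if year <= 1990
scalar ms_model = e(mss) / e(df_m)   /* mean square of the model */
scalar ms_resid = e(rss) / e(df_r)   /* mean square of the residual */

display "F by hand  = " ms_model / ms_resid
display "F reported = " e(F)
display "critical F = " invFtail(e(df_m), e(df_r), 0.05)
display "p-value    = " Ftail(e(df_m), e(df_r), e(F))
```

The first two lines of output should agree, which confirms that the reported F is the ratio of the two mean squares.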

ii) The residual sum of squares (RSS) is the sum of the squared differences between the actual outcomes and the outcomes estimated by the model. It can also be computed as the total sum of squares (TSS) minus the model sum of squares (MSS), $RSS = TSS - MSS$. The smaller the RSS is compared with the TSS, the larger the R-squared and the better the model fit. It is also called the sum of squared errors of prediction. The RSS in this regression model is 3.1024553, which indicates that only a small amount of the variation is left unexplained by the model.
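The definition of the RSS as a sum of squared residuals can be verified against the value Stata stores. This is a sketch under the assumption that the project data are loaded; the variable names resid_q3 and resid_q3_sq are hypothetical.

```stata
/* RSS computed from the residuals and compared with the stored value */
quietly regress y x2 x3 if year <= 1990
predict double resid_q3 if e(sample), residuals
generate double resid_q3_sq = resid_q3^2

quietly summarize resid_q3_sq
display "RSS from residuals = " r(sum)
display "RSS stored by regress = " e(rss)
display "R-squared = " 1 - e(rss) / (e(rss) + e(mss))
```

Restricting predict to e(sample) keeps the 1991 to 1995 observations out of the sum, since they were not used in estimation.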

The residual plot shows the difference between the actual data and the predicted data:

Figure 5

d) The variance-covariance matrix of the estimated coefficients can be obtained from the formula $\widehat{\mathrm{Var}}(\hat{b}) = \hat{\sigma}^2 (X'X)^{-1}$, where $\hat{\sigma}^2$ is the estimated error variance (the residual sum of squares divided by the residual degrees of freedom). The element in the ith row and ith column of $(X'X)^{-1}$, multiplied by $\hat{\sigma}^2$, is the variance of the ith coefficient. For example, the variances of the coefficients on x2 and x3 and of the constant are 0.000632, 0.00161975 and 0.58938034 respectively. The off-diagonal element in position (i, j) is the covariance between the coefficients $\hat{b}_i$ and $\hat{b}_j$.

Table 9
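The link between the variance-covariance matrix and the reported standard errors can be illustrated with the matrix that regress stores in e(V). This is a minimal sketch, assuming the project data are loaded; the matrix name V is illustrative.

```stata
/* Variance-covariance matrix of the Q3 coefficients */
quietly regress y x2 x3 if year <= 1990
matrix V = e(V)
matrix list V

/* The square root of each diagonal element is the standard error */
display "se(x2) = " sqrt(V[1,1]) "   reported: " _se[x2]
display "se(x3) = " sqrt(V[2,2]) "   reported: " _se[x3]
```

Squaring the standard errors in the Q3 equation gives the same check by hand, for example 0.0251396 squared is approximately 0.000632.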

e)
Parameters:  b2            b3              b1
P value:     (0.000)       (0.059)         (0.009)
Decision:    Significant   Insignificant   Significant

(Rule: if the p-value is smaller than the significance level, the sample result is significant enough to reject the null hypothesis.)

The conclusion is that b1 and b2 are significantly different from 0 at α = 0.05, while b3 is not significantly different from 0 at the 5% significance level according to the sample.

f) After generating a time variable t, a new regression can be run as follows:

Table 10


The model:

g) To compare two models with different numbers of variables, adjusted R-squared is used. R-squared is computed as 1 minus the residual sum of squares divided by the total sum of squares. Because the models may have different numbers of variables, adjusted R-squared uses the mean square of the residual and the mean square of the total in place of RSS and TSS. Both models, with and without the time trend, fit well according to their F statistics (both are greater than the critical F value 3.2873821) and their R-squared values (both explain more than 99% of the variation).

However, the model with the time trend performs better on adjusted R-squared: its adjusted R-squared of 99.34% exceeds the 98.95% of the former model. So the second model, which includes the time trend, is the best model so far.
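The verbal definitions of R-squared and adjusted R-squared can be written out as follows. The symbols n (number of observations) and k (number of estimated parameters, including the constant) are notation assumed here, not taken from the original text.

```latex
R^2 = 1 - \frac{RSS}{TSS}, \qquad
\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}
          = 1 - (1 - R^2)\,\frac{n-1}{n-k}
```

The factor (n-1)/(n-k) grows with each added regressor, so adjusted R-squared rises only when a new variable reduces the RSS by enough to offset the lost degree of freedom.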

h) H0: b3 = b4 = 0 (both b3 and b4 are equal to 0)
Ha: b3≠0 or b4≠0 (at least one of the parameters is not equal to 0)

Using the partial F test, the critical value can be found with the invFtail() function in Stata. It is 3.6823203 at the 5% significance level with 2 degrees of freedom in the numerator and 15 in the denominator. The critical region is therefore (3.68, +∞) and the acceptance region is (0, 3.68), so if the F value lies in the critical region we reject the null at the 5% significance level. The F value is calculated with the formula $F = \dfrac{(RSS_R - RSS_U)/q}{RSS_U/df_U}$, where $RSS_R$ and $RSS_U$ are the residual sums of squares of the restricted and unrestricted models, q = 2 is the number of restrictions and $df_U$ = 15 is the residual degrees of freedom of the unrestricted model, yielding F = 8.6045. Because F is greater than 3.68 and its p-value is less than 0.05, we reject the null at the 5% significance level and conclude that at least one of b3 and b4 is significantly different from zero.
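The partial F statistic can be computed by hand from the restricted and unrestricted regressions and compared with Stata's test command. This is a sketch assuming the project data are loaded; capture is used so the block still runs if t has already been generated, and the scalar names are illustrative.

```stata
/* Partial F test of H0: b3 = b4 = 0 from two residual sums of squares */
capture generate t = _n

quietly regress y x2 if year <= 1990        /* restricted model */
scalar rss_r = e(rss)

quietly regress y x2 x3 t if year <= 1990   /* unrestricted model */
scalar rss_u = e(rss)
scalar df_u = e(df_r)

scalar F_partial = ((rss_r - rss_u) / 2) / (rss_u / df_u)
display "partial F  = " F_partial
display "critical F = " invFtail(2, df_u, 0.05)

/* The built-in Wald test of the same restrictions */
test x3 t
```

For linear restrictions in an OLS regression the hand-computed value and the F reported by test should coincide.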

Q4. The model can be written in the following form:

Comparing the forecast with the actual data set, it is clear that the predicted values are relatively higher than the actual values. An unpredicted drop in expenditure after 1991 leads to the higher predicted values. Figure 7 shows the difference between the predictions and the actual data.

Figure 6

Figure 7

Q5.

The log-linear model is based on a regression that uses the logarithms of both the dependent variable and the independent variables. The coefficient of each independent variable represents the elasticity of the dependent variable: the percentage change in the dependent variable for each one percent change in that independent variable.
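The elasticity interpretation follows from differentiating the log-linear equation. The form below is a sketch of the model without the time trend, using the document's Y, X2 and X3; the error term u and the subscripted b's are notation assumed here.

```latex
\ln Y = b_1 + b_2 \ln X_2 + b_3 \ln X_3 + u, \qquad
b_2 = \frac{\partial \ln Y}{\partial \ln X_2}
    = \frac{\partial Y / Y}{\partial X_2 / X_2}
    \approx \frac{\%\Delta Y}{\%\Delta X_2}
```

So a coefficient of, say, 0.9 on the log of income would mean that a 1% rise in income is associated with roughly a 0.9% rise in consumption, holding the interest rate fixed.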

The log-linear models with and without the time trend are listed as follows:

(Data from Table 11)

Table 11

The overall fit of both models can be examined with their F statistics and R-squared. Both models have very small residual sums of squares, 0.008691539 and 0.008442626 respectively, and consequently both have a high R-squared: 98.88% for the model without the time trend and 98.91% for the other. The F statistics are also significantly higher than the critical values, 3.4668 and 3.0984 respectively, so both models fit well. However, the model without the time trend outperforms the other on adjusted R-squared, which is slightly higher at 98.78% against 98.75%.

Data for both models are listed as follows:

Table 12

Q6.

Several variables are considered in Surrey's theory, including inflation and the consumption and income of the previous year (a lag structure). I think those variables can be significant in the consumption model. Inflation measures the change in the price level, and it affects consumption negatively. Duesenberry's theory of the "ratchet effect" explains that the highest past income and consumption (consumption style) also affect present consumption. In addition, I would take GDP into consideration, as it is an important indicator of overall economic conditions and can play a significant role in changes in people's consumption.

The regression output is shown in Table 13.

The model that I have built includes 6 variables: income, interest rates, inflation (X4), GDP, and the income and consumption of the previous year. It is built in log-linear form to reduce the fluctuation of the model and to present the elasticity of consumption with respect to each variable.

The regression output also shows that it is the best-fitting model compared with the models above. It has 24 observations, as the other models do. The R-squared is as high as 0.9982, so 99.82% of the variation in consumption is explained. The adjusted R-squared, which is usually used for comparison, is 0.9976 and outperforms all the other models.

A comparison between the new model and the best model we used in section (5) shows that the new model is the best fit. I conduct a partial F test on the added variables: inflation, GDP, past income and past consumption. The resulting F statistic, 22.01, is greater than the critical value of 2.96 with 4 d.f. in the numerator and 17 d.f. in the denominator. We therefore reject the hypothesis that the coefficients of the new variables are all equal to 0, so these variables cannot be dropped from the model. Of the two models, the new model is chosen.

Conclusion:

This regression analysis examined the relationship between consumption and other variables including income, interest rates, inflation and GDP. We discussed the theories behind the consumption function, such as the Keynesian consumption theory, MPC theory and Surrey's theory. The t test, R-squared and the F test were used to test the significance of the coefficients and the overall model fit. After evaluating different variables, we built a new model and can conclude that income, interest rates, inflation, GDP, and past income and consumption are significant variables in explaining changes in consumers' expenditure.


/* Quantity Economics Project */
clear
cd E:\stata   /* Here is the directory of my project */
capture log close
log using StataProject.log, replace
insheet using StataProject.csv

/***************************** Section One **************************/
replace y = y / 10000
replace x2 = x2 / 10000
lab var year "year"
lab var y "Consumers' expenditure"
lab var x2 "Personal disposable income"
lab var x3 "Interest rates"

/* The first graph for income and consumption */
graph twoway line y x2 year, title("Consumption and Income curves") ///
    subtitle(" ") ytitle("Million in 1990 prices") name(gr1, replace) ///
    note("Source: Office for National Statistics licensed under the Open Government Licence v.1.0.")

/* The second graph for interest rate */
line x3 year, title("Interest Rates curves") subtitle(" ") ///
    ytitle("Treasury Bill yield %") name(gr2, replace) ///
    note("Source: Office for National Statistics licensed under the Open Government Licence v.1.0.")

cor y x2 x3

/***************************** Section Two **************************/
/* Question a. */
regress y x2 if year <= 1990
estimates store model1
estimates table model1, se

/* The third graph for model1 fit */
gr tw (sc y x2) (lfitci y x2) if year <= 1990, ///
    title("Regression model for Income and consumption") subtitle(" ") ///
    ytitle("Consumers' expenditure") name(gr3, replace) ///
    note("Source: Office for National Statistics licensed under the Open Government Licence v.1.0.")

regress y x3 if year <= 1990
est store model2
est table model2, se

/* The fourth graph for model2 fit */
gr tw (sc y x3) (lfitci y x3) if year <= 1990, ///
    title("Regression model for Interest rates and consumption") subtitle(" ") ///
    ytitle("Consumers' expenditure") name(gr4, replace) ///
    note("Source: Office for National Statistics licensed under the Open Government Licence v.1.0.")

/* Question b. */
/* The fifth graph for the t distribution, with both 5% rejection tails shaded */
tw function y=tden(17,x), range(-4 -2.1098) color(gs5) recast(area) ///
    || function y=tden(17,x), range(2.1098 4) color(gs5) recast(area) ///
    || function y=tden(17,x), range(-4 4) legend(off) ytitle("percentage") ///
    title("Student's T distribution curve") subtitle(" ") ///
    note("t distribution with 17 degree of freedom and 5% significance level") ///
    name(gr5, replace)

/* Table for the parameters of both one-variable models */
est table model1 model2, se t p

/* Question c. */
/* Generate a table for the t test */
quietly reg y x2 if year <= 1990
ereturn display
/* The MPC tests use model1 (y on x2); r(coef) holds the coefficient in column 1 and its variance in column 2 */
quietly est table model1
mat a = r(coef)

/* T test on MPC */
display "T test for the null: b2 <= 1 against b2 > 1"
display "t = ( b2(estimated)-1 )/ (se) "
di "t = " (a[1,1]-1)/a[1,2]^0.5
display "T test for the null: b2 >= 0 against b2 < 0"
display "t = ( b2(estimated)-0 )/ (se) "
di "t = " (a[1,1]-0)/a[1,2]^0.5

/***************************** Section Three **************************/
/* Question a. */
reg y x2 x3 if year <= 1990
matrix coef = e(b)
display "Model: " "Y =" coef[1,1] " X2 " coef[1,2] " X3 " coef[1,3]

/* Question b. */
est store model3
quietly est table model3
mat a = r(coef)
display "T test for the null: b2 = 0.95 against b2 < 0.95"
display "t = ( b2(estimated)-0.95 )/ (se) "
di "t = " (a[1,1]-0.95)/a[1,2]^0.5

/* Question c. */
/* The sixth graph for the residuals */
rvfplot, yline(0) title("Residual of predicted consumption") subtitle(" ") ///
    xtitle("Predicted consumption") name(gr6, replace)

/* Question d. */
estat vce

/* Question e. */
est table model3, p

/* Question f. */
gen t = _n
lab var t "time"
reg y x2 x3 t if year <= 1990
est store model4
/* e(b) is ordered x2, x3, t, _cons */
mat b = e(b)
display "Model: " "Y =" b[1,1] "*X2 " b[1,2] "*X3 " b[1,3] "*t " b[1,4]

/* Question h. */
test x3 t

/***************************** Section Four **************************/
/* Fitted values from the last regression (model4), for all years */
predict yhat
lab var yhat "Predicted Consumption"

/* The seventh graph for prediction value */
line y year || line yhat year if year >= 1991, ///
    title("Predicted curve from 1991 to 1995") subtitle(" ") ///
    ytitle("Consumption") legend(off) name(gr7, replace)

list year y yhat if year >= 1991
gen dfy = y - yhat
lab var dfy "Difference between y and yhat"

sc dfy yhat if year > 1990, mlabel(year) ylabel(-3(0.5)1) ///
    xlabel(35(1)40) yline(0) title("Residuals in the predicted period") ///
    subtitle(" ") ytitle("difference between y and yhat") ///
    xtitle("Predicted consumption") name(gr8, replace)

/************************** Section Five ****************************/
gen logy = log(y)
gen logx2 = log(x2)
gen logx3 = log(x3)
lab var logy "Log form of consumption"
lab var logx2 "Log form of income"
lab var logx3 "Log form of interest rates"

/* Log-linear model without the time trend */
reg logy logx2 logx3
est store loglinear1

/* Log-linear model with the time trend */
reg logy logx2 logx3 t
est store loglinear2

est table loglinear1 loglinear2

/***************************** Section Six ***************************/
clear
insheet using q6.csv
replace y = y / 10000
replace x2 = x2 / 10000
replace g = g / 10000
gen logy = log(y)
gen logx2 = log(x2)
gen logx3 = log(x3)
gen logg = log(g)

/* Inflation (x4) as the yearly growth rate of the rpi variable */
gen lagrpi = rpi[_n-1]
gen x4 = rpi/lagrpi-1
gen logx4 = log(x4)

/* Previous year's consumption and income */
gen lagy = y[_n-1]
gen lagx2 = x2[_n-1]
gen loglagy = log(lagy)
gen loglagx2 = log(lagx2)

drop if _n <= 2
gen t = _n
reg logy logx2 logx3 logx4 loglagy loglagx2 logg
/*************************** Finished *************************/

Bibliography

1) Annotated output for stata. UCLA: Statistical Consulting Group. from http://www.ats.ucla.edu/stat/AnnotatedOutput/ (accessed May 01, 2013).

2) Douglas A. Lind, William G. Marchal, Samuel Adam Wathen (2010) Basic Statistics for Business and Economics, Student Edition with Formula Card, 5th edn., McGraw-Hill Higher Education.

3) Thomas R L (1993) Introductory Econometrics: Theory and Applications, 2nd edn., Longman, chapter 10 "Consumption Functions", pp 247-269.

4) Michael Barrow (2009) Statistics for economics, accounting and business studies [electronic resource], 5th edn., Prentice Hall/Financial Times.
