Task 1 (last year): Computer assignment. The dataset busind.dta
contains information on Gross National Income (GNI) per capita and
the number of days to open a business and to enforce a contract in
a sample of 135 countries. It was extracted from the Doing Business
dataset, a dataset collected by the World Bank based on expert
opinions in each country. The variable gnipc measures GNI per
capita in thousands of dollars. The variable daysopen measures the average
number of days needed to open a business in that country, and
daysenforce measures the average number of days needed to enforce a
given type of contract.
Question (i) Find the average GNI per capita, the average number of
days to open a business, and the average number of days to enforce a
contract.
Slide 3
Answer to Question (i) Stata commands:
use busind, clear
su daysenforce daysopen gnipc
Question (ii) In how many countries does it take on average less
than 5 days to open a business? What is the maximum number of days
to open a business in the dataset? In which countries does it take
more than 200 days to open a business?
Slide 4
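Answer to Question (ii) Stata commands:
su daysopen if daysopen < 5
su daysopen
list if daysopen > 200 & !missing(daysopen)
The number of observations reported by the first command is the
number of countries below 5 days, the second command reports the
maximum, and the third lists the countries above 200 days. Stata
treats missing values as larger than any number, hence the
!missing() condition.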
Slide 5
Question (iii) Estimate the following simple regression model:
$gnipc = \beta_0 + \beta_1\, daysopen + u$
Give a careful interpretation of the estimates of $\beta_1$ and
$\beta_0$. Are the signs what you expected them to be?
Slide 6
Answer to Question (iii) Stata command:
reg gnipc daysopen
Slide 7
Question (iv) Question: What kind of factors are contained in
u? Are these likely to be correlated with the number of days to
open a business?
Answer: The factors contained in u are all the factors that explain
GNI per capita apart from the number of days to open a business.
There are many such factors, for example economic institutions,
education, savings, consumption, and R&D. Some of them are likely to
be correlated with the number of days to open a business, such as
the quality of economic institutions.
Slide 8
Question (v) Question: According to this model, what is the
predicted income for a country where it takes 5 days to open a
business? And the predicted income for a country where it takes 200
days to open a business? Show how you can calculate the answers by
hand (once you have obtained the estimation results). Do the
obtained levels of income seem reasonable? Explain.
Slide 9
Answer to Question (v) You can compute predicted values for the
dependent variable in two ways. The first is to display the fitted
value at daysopen=5 and at daysopen=200 after the regression.
Stata commands and their output (in thousands of dollars):
display _b[daysopen]*5+_b[_cons]
10.894018
display _b[daysopen]*200+_b[_cons]
-7.5347099
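The by-hand calculation plugs the chosen number of days into the fitted line. The coefficient values in the sketch below are not read from the regression table; they are backed out from the two displayed predictions, so they are approximate.

```latex
\begin{align*}
\widehat{gnipc} &= \hat\beta_0 + \hat\beta_1\, daysopen \\
\hat\beta_1 &= \frac{-7.5347099 - 10.894018}{200 - 5} \approx -0.0945 \\
\hat\beta_0 &= 10.894018 - 5\,\hat\beta_1 \approx 11.3665 \\
\widehat{gnipc}(5) &\approx 11.3665 - 0.0945 \times 5 \approx 10.89 \\
\widehat{gnipc}(200) &\approx 11.3665 - 0.0945 \times 200 \approx -7.53
\end{align*}
```

Because gnipc is measured in thousands of dollars, the prediction at 200 days is a negative income of roughly 7,500 dollars, which is not a reasonable level. A straight line fitted to the whole sample can produce such values at large values of daysopen.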
Slide 10
Answer to Question (v), continued. The second way is to generate
the fitted values of the dependent variable:
reg gnipc daysopen
predict gnipc_hat
A problem arises with this second method: no observation has
daysopen=200, so gnipc_hat never takes the value predicted for
daysopen=200.
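Two possible workarounds for the missing daysopen=200 observation are sketched below. This is an illustration rather than the course's own solution; it assumes busind.dta is in the working directory and uses only the variable names shown above.

```stata
* Assumes busind.dta is in the working directory.
use busind, clear
reg gnipc daysopen

* Option 1: predictions at chosen values of the regressor.
margins, at(daysopen=(5 200))

* Option 2: add one artificial observation with daysopen = 200.
* The regression was estimated before this observation was added,
* so the estimates are unaffected; predict only evaluates the fitted line.
local newobs = _N + 1
set obs `newobs'
replace daysopen = 200 in `newobs'
predict gnipc_hat
list daysopen gnipc_hat in `newobs'
```

Option 1 also reports a standard error for each prediction, which the display approach does not.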
Slide 11
To illustrate our fitted values, we can draw the OLS regression
line:
scatter gnipc daysopen || lfit gnipc daysopen
Slide 12
Question (vi) Estimate the following simple regression model
and give a careful interpretation of $\beta_1$:
$gnipc = \beta_0 + \beta_1\, daysenforce + u$
Slide 13
Answer to Question (vi) Stata command:
reg gnipc daysenforce
Slide 14
Question (viii)
Slide 15
Question (vii) Comparing the estimates of the models in (iii)
and (vi), which one explains more of the variation in income per
capita across countries? Can you infer whether the duration to open
a business or the duration for enforcing contracts is more strongly
correlated with income per capita?
Answer: The share of the variation in GNI per capita (y) explained
by an independent variable (x) is given by the R². The greater the
R², the more of the variation in y is explained by x. The R² of the
regression of GNI per capita on the number of days to open a
business is about 13%, and the R² of the regression of GNI per
capita on the number of days to enforce a contract is about 21%.
The number of days to enforce a contract therefore explains more of
the variation in GNI per capita than the number of days to open a
business. It also means that the duration for enforcing contracts
is more strongly correlated with income per capita: the correlation
between gnipc and daysenforce is -0.46, and the correlation between
gnipc and daysopen is -0.36.
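The link between the two measures can be checked directly: in a simple regression with an intercept, the R² equals the squared sample correlation between y and x. The short calculation below uses only the correlations quoted above.

```latex
\begin{align*}
R^2 &= r_{xy}^2 \quad \text{(simple regression with an intercept)} \\
\text{daysopen:}\quad & (-0.36)^2 = 0.1296 \approx 13\% \\
\text{daysenforce:}\quad & (-0.46)^2 = 0.2116 \approx 21\%
\end{align*}
```

This is why ranking the two regressions by R² and ranking the two variables by the absolute value of their correlation with gnipc give the same answer.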
Slide 16
Question (viii) Estimate the following simple regression model
and give a careful interpretation of $\beta_1$:
$\ln(gnipc) = \beta_0 + \beta_1\, daysopen + u$
Slide 17
Answer to Question (viii) Stata commands:
gen lngnipc=ln(gnipc)
reg lngnipc daysopen
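As a reminder of how the slope is read when the dependent variable is in logs, here is the standard log-level approximation. The model form is inferred from the Stata commands above (lngnipc regressed on daysopen).

```latex
\begin{align*}
\ln(gnipc) &= \beta_0 + \beta_1\, daysopen + u \\
\%\Delta\, gnipc &\approx (100\,\beta_1)\,\Delta daysopen
\end{align*}
```

So one additional day to open a business is associated with a change of about $100\,\beta_1$ percent in GNI per capita. The approximation is good for small values of $\beta_1\,\Delta daysopen$; the exact percentage change is $100\,(e^{\beta_1 \Delta daysopen} - 1)$.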
Slide 18
Do these results allow you to draw conclusions regarding the
desirability of policies aimed at reducing the number of days for
opening a business in certain developing countries? The dataset
contains 135 countries, and hence does not contain information
about all the countries in the world. Do you think one should
account for that when interpreting the regression results?
Why?
Slide 19
Task 2 (last year): Computer exercise. The dataset nepalind.dta
contains data on 706 15-year-old children in Nepal. The data come
from the 2003 Nepal Living Standards Survey (NLSS), a Living
Standards Measurement Study (LSMS) survey. We want to analyze these
data to understand the number of years of education. Illiteracy and
low levels of education are a major concern in Nepal, so it would be
good to know which factors could explain the education of the
present generation, in order to know what type of policies to
implement. The dataset has some information on characteristics of
the household, of the child, and of the household head. LSMS-type
surveys such as the NLSS are nationally representative surveys that
statistical offices in developing countries conduct with the support
of the World Bank to determine poverty levels, determinants of
poverty, etc. See www.worldbank.org/lsms for more info.
Slide 20
Question 1 Write a paragraph describing the dataset using the
standard descriptive statistics (also called summary statistics, or
D-stats). Add a table with the d-stats.
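One way to produce such a table in Stata is sketched below. It assumes nepalind.dta is in the working directory and uses only the variable names that appear in the regression command for Question 6; the dataset may contain further variables (for example health status) whose names are not shown in these slides.

```stata
* Assumes nepalind.dta is in the working directory.
use nepalind, clear

* Mean, standard deviation and number of non-missing observations,
* one row per variable.
tabstat educ male head_age head_educ nractad nrold r2_supown distschool, ///
    statistics(mean sd n) columns(statistics)
```

The n column is worth reporting alongside the means, because it shows which variables have missing values.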
Slide 21
Slide 22
Question (1) Child characteristics
Male (%)                  52
Health status (%)
  Good                    69.5
  Fair                    30
  Poor                    0.5
Years of education        5.5 (3.6)
Slide 23
Question (1) Household characteristics
Number of household members       6.8 (2.73)
  under 18 years old              3.5 (1.74)
  between 18 and 59               3.0 (1.49)
  60 or older                     0.3 (0.63)
Age of the head                   46 (10.5)
Education of the head             2.8 (4.1)
Land owned (in ha)                0.74 (1.05)
Value of jewelry (in rupees)      13985 (26726)
Distance to school (in hours)     0.29 (0.31)
Number of observations            706
Standard deviations in parentheses
Slide 24
Question 2: Show the distribution of the different values of
years of education in the dataset. Drop the observations that have
values higher than 10. Explain why that might be a smart thing to
do before doing any regression analysis.
Stata command and the note it prints:
hist educ, discrete
(start=0, width=1)
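The drop step itself could look like the sketch below (an illustration, assuming nepalind.dta is in the working directory). Note that Stata treats missing values as larger than any number, so a bare "drop if educ > 10" would also drop children whose education is missing.

```stata
* Assumes nepalind.dta is in the working directory.
use nepalind, clear

* Frequency table of years of education before dropping anything.
tab educ

* How many observations would be dropped?
count if educ > 10 & !missing(educ)

* Drop only observations with a recorded value above 10.
drop if educ > 10 & !missing(educ)

* Distribution after the drop.
hist educ, discrete
```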
Slide 25
Question (3): Specify a model that explains the number of years
of education as a function of the father's age, the number of
active adults (between 18 and 60 years old), the number of
elderly (60 or older), and all other variables you think are
interesting and appropriate. Make sure to include only variables
that are exogenous, and discuss why the variables you include can be
considered exogenous. Estimate the model and give a careful
interpretation of each of the coefficients (sign, size, and
significance!). Do you find any of your results
counterintuitive?
Slide 26
Tips to answer Question (3)
Each variable that you add to the model must be related to educ in
some way and should not violate the zero conditional mean (ZCM)
assumption, i.e., it must be exogenous. Ask yourself:
Is x caused by y, i.e., is reverse causality possible?
Does a third factor determine both x and y? In that case
correlation is not causation, and x is not exogenous.
Are u and x related for some other reason?
Candidate variables: Gender? Head's age? Number of active adults?
Number of elderly? Head's education? Land owned? Distance to school?
Value of jewelry? Number of children? Health?
Slide 27
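A reasonable model to estimate:
$educ = \beta_0 + \beta_1\,\mathit{head\_age} + \beta_2\,\mathit{head\_educ} + \beta_3\,\mathit{nractad} + \beta_4\,\mathit{nrold} + \beta_5\,\mathit{r2\_supown} + \beta_6\,\mathit{distschool} + \beta_7\,\mathit{male} + u$
Expected signs of coefficients? Argue.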
Slide 28
Question 4: What is the minimum significance level at which one
can reject the hypothesis that the age of the household head does
not affect education levels?
The p-value gives the smallest significance level at which a
hypothesis H0 can be rejected. In other words, a low p-value means
that the data provide strong evidence against the tested hypothesis.
The minimum significance level at which one can reject the
hypothesis that the age of the household head does not affect
education levels is given by the p-value of the test of
H0: $\beta_1 = 0$. One can read directly from the Stata output that
this minimum significance level is 1.4%.
Slide 29
Question (5) Do your results allow you to conclude that the
effect of the number of active adults in the household is
different from the effect of the number of elderly? State the null
hypothesis and the alternative hypothesis you are testing, and the
significance level you are considering. Does your answer differ
depending on which significance level you consider?
Slide 30
Answer to Question (5) We need to test the null hypothesis
H0: $\beta_3 = \beta_4$ against H1: $\beta_3 \neq \beta_4$.
In Stata, the command "test" does this after the regression.
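A minimal sketch of that test follows. It assumes the model for Question 3 is the unrestricted regression shown for Question 6, so that the two coefficients being compared are those on nractad (active adults) and nrold (elderly).

```stata
* Assumes nepalind.dta is in the working directory.
use nepalind, clear
reg educ head_age head_educ nractad nrold r2_supown distschool male

* F-test of H0: coefficient on nractad = coefficient on nrold.
test nractad = nrold

* Same hypothesis as a t-test on the difference, with a confidence interval.
lincom nractad - nrold
```

Reject H0 at significance level alpha when the reported p-value is below alpha; comparing the p-value with 1%, 5% and 10% shows whether the conclusion depends on the level chosen.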
Slide 31
Question (6) Test whether the characteristics of the household
head are jointly significant. Show how to do this in Stata, and
calculate the test by hand in 2 different ways. What can you
conclude about the role of household head characteristics on
education of the children?
Slide 32
Answer to Question (6)
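The two by-hand versions are presumably the SSR form and the R² form of the F-statistic for exclusion restrictions. Here $q$ is the number of restrictions (2 here, assuming the head characteristics are head_age and head_educ) and $n-k-1$ is the residual degrees of freedom of the unrestricted model.

```latex
\begin{align*}
F &= \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \\
  &= \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}
\end{align*}
```

The two forms give the same number as long as both models have the same dependent variable and are estimated on the same observations. Under H0, F follows an $F_{q,\,n-k-1}$ distribution.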
Slide 33
Question 6: compute the F-test. Run the unrestricted and restricted
models, and compute either the SSR or the R² form of the
F-statistic.
reg educ head_age head_educ nractad nrold r2_supown distschool male
scalar r2_ur=e(r2)
scalar df=e(df_r)
reg educ nractad nrold r2_supown distschool male
scalar r2_r=e(r2)
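The remaining steps could be completed along the following lines. This is a sketch, not the course's own solution: the scalar names (rsq_ur, ssr_ur, df_ur, n_restr and so on) and the insample marker are chosen here for illustration, and the restricted model is deliberately run on the same observations as the unrestricted one.

```stata
* Assumes nepalind.dta is in the working directory.
use nepalind, clear

* Unrestricted model: store R-squared, SSR and residual degrees of freedom.
reg educ head_age head_educ nractad nrold r2_supown distschool male
scalar rsq_ur = e(r2)
scalar ssr_ur = e(rss)
scalar df_ur  = e(df_r)
gen byte insample = e(sample)

* Built-in joint test, for comparison with the by-hand results.
test head_age head_educ

* Restricted model on the same observations as the unrestricted one.
reg educ nractad nrold r2_supown distschool male if insample
scalar rsq_r = e(r2)
scalar ssr_r = e(rss)

* Two restrictions: the coefficients on head_age and head_educ are zero.
scalar n_restr = 2
scalar F_rsq = ((rsq_ur - rsq_r)/n_restr) / ((1 - rsq_ur)/df_ur)
scalar F_ssr = ((ssr_r - ssr_ur)/n_restr) / (ssr_ur/df_ur)

display "F (R-squared form) = " F_rsq
display "F (SSR form)       = " F_ssr
display "p-value            = " Ftail(n_restr, df_ur, F_rsq)
```

The "if insample" condition matters when head_age or head_educ has missing values: without it the restricted regression would use more observations and the two forms would no longer match the output of test.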
Slide 34
Question (9) Child characteristics
                          Non missing   Missing
Male (%)                  53            48
Health status (%)
  Good                    69.5          74
  Fair                    30            26
  Poor                    0.5           .
Years of education        5.3           6.8
Slide 35
Question (9) Household characteristics
                                  Non missing   Missing
Number of household members       6.9           6.1
  under 18 years old              3.5           2.8
  between 18 and 59               3.0           2.9
  60 or older                     0.3           0.4
Age of the head                   46.5          48.2
Education of the head             2.6           5.5
Land owned (in ha)                0.77          0.29
Value of jewelry (in rupees)      12212         35488
Distance to school (in hours)     0.29          .
Number of observations            600           46