Research Methods Lecture 5 Advanced STATA IAN WALKER Module Leader S2.109 i.walker@warwick.ac.uk.

Post on 31-Mar-2015


Research Methods Lecture 5

Advanced STATA

IAN WALKER, Module Leader

S2.109 i.walker@warwick.ac.uk

Housekeeping announcement

• Stephen Nickell (MPC and LSE): British Academy Keynes Lecture in Economics
– "Practical Issues in UK Monetary Policy 2000-2005"
– Wednesday 2nd November
– Arts Centre Conference Room at 5.30pm
– http://www2.warwick.ac.uk/fac/soc/economics/forums/deptsems/keynes_lecture/

Stat-Transfer

• Use STAT-TRANSFER to convert data between file formats.
• Click on the “Stat Tran 6” shortcut to start it.
• Stat-transfer is “point and click”.
• Just tell it the name and format of the input file
• and the format you want it in.
• Click “transfer”.

Stat Transfer options

• Useful options for creating a manageable dataset from a large one:
– Keep or drop variables
– Change variable format
• E.g. float to integer
– Select observations
• E.g. “where (income + benefits)/famsize < 4500”

• Can be used for reading a large STATA dataset and writing a smaller one

• Avoids doing this in STATA itself
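For comparison, the same kind of subsetting can be sketched inside STATA itself. This is only an illustration: the auto demo data stands in for a "large" dataset, and the variables and the cut-off in the `if` condition are arbitrary choices, not taken from the slides. Run it as a do file so that the temporary file name is defined.

```stata
* Sketch only: the auto demo data stands in for a large dataset.
sysuse auto, clear
tempfile big                 // temporary file name, removed when the do file ends
save `big'

* Read only selected variables and observations from the file on disk
use make price mpg weight if price/weight < 2 using `big', clear

* Shrink storage types where no information is lost (e.g. float to int)
compress
describe
```

With a genuinely large file, `use varlist if exp using filename` avoids loading the unwanted variables and observations into memory at all.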

Practising

• You can import some of Stata’s own demo files using the sysuse command
– E.g. . sysuse auto
• Many datasets are available at specific websites
– E.g. STATA’s own site has all the demo data used in the manual examples
• You can use the webuse command to load the files directly into Stata without copying them locally
. webuse auto /* gets the data from STATA’s own site */
Or point webuse at another directory first (webuse set takes the URL of the directory, not of the file):
. webuse set http://www2.warwick.ac.uk/fac/soc/economics/pg/modules/rm/notes
. webuse auto
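A short practice session along these lines might look as follows. It is a sketch that assumes a working internet connection for the webuse steps and uses only Stata's default download location.

```stata
sysuse dir            // list the demo datasets shipped with Stata
sysuse auto, clear    // load one of them from the local installation
describe

webuse query          // show the URL prefix that webuse currently uses
webuse auto, clear    // fetch auto.dta from that prefix
webuse set            // with no argument, resets the prefix to Stata's default
```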

More help

• You can search the whole of STATA’s online help using . search xxx
• Michigan’s web-based guide to STATA (for SA)
• UCLA resources to help you learn and use STATA:
– including movies and “web-books”
• Consult other user-written guides and tutorials
– Chevalier1, Chevalier2; Princeton; Illinois; Gruhn
• ESDS’s “Stata for LFS”
• Stata’s own resources for learning STATA
– Stata website, journal, library, archive
– http://www.stata.com/links/resources1.html

Web resources

• STATA is web-aware
– E.g. . update /* updates from www.stata.com */
• Statalist is an email listserv discussion group
• The Stata Journal is a refereed journal
– Replaces the old Stata Technical Bulletin (STB)
• SSC Boston College STATA Archive
– Extensive library of programs by Stata users
– Files can be downloaded in Stata using . ssc
• E.g. . ssc install outreg
• Installs the outreg ado file that makes tables pretty
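The help and web commands mentioned above can be tried together. The search keywords here are arbitrary examples, and the ssc and update lines need an internet connection.

```stata
search panel data            // keyword search across help files, FAQs and journal articles
help regress                 // help file for a command you already know by name
ssc describe outreg          // see what the package contains before installing it
ssc install outreg, replace  // install it, or update it if already installed
update query                 // compare the installed Stata with www.stata.com
```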

Always (whatever the software)

• Use lowercase
• Open a log file
• Label your data
• Use the do file editor
• Organise your files
– Separate directories for separate projects
– Archive (zip) data, do and results files when you’re finished
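A do file that follows these habits might start like the sketch below. The log file name and the label text are made up for illustration; the auto demo data is used so that the file runs anywhere.

```stata
capture log close                 // close any log left open by an earlier run
log using analysis.log, replace   // hypothetical log file name

sysuse auto, clear
label data "1978 automobile data, teaching copy"
label variable mpg "Mileage (miles per gallon)"
label define origin_lbl 0 "Domestic" 1 "Foreign"
label values foreign origin_lbl
describe

log close
```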

Customising STATA

• profile.do runs automatically when STATA starts
• Edit it to include commands you want to invoke every time
. set mem 200m
. log using justincase.log, replace
• Define preferences for STATA’s look and feel
– Click on Prefs in menu
• Colours, graph scheme, etc.
• Save window positioning
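As a sketch, a minimal profile.do could look like this. The `set more off` line is an addition not on the slide, and `set mem` is left commented out because Stata 12 and later manage memory automatically.

```stata
* profile.do : runs at start-up if Stata finds it (e.g. in the start-up directory)
set more off                        // do not pause long output (not on the slide)
capture log close                   // harmless if no log is open
log using justincase.log, replace   // overwritten each time Stata starts
* set mem 200m                      // only needed in older versions of Stata
```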

Regression models - I

• Linear regression and related models when the outcome variable is continuous
– OLS, 2SLS, 3SLS, IV, quantile reg, Box-Cox …
• Binary outcome data
– the outcome variable is 0 or 1 (or y/n)
• probit, logit, nested logit, ...
• Multiple outcome data
– the outcome variable is 1, 2, ...
• conditional logit, ordered probit

Regression models - II

• Count data
– the outcome variable is 0, 1, 2, ... occurrences
• Poisson regression, negative binomial
• Choice models
– multinomial choice
– A, B or C
• Multinomial logit, random utility model, unordered probit, nested logit, etc.
• Selection models
– Truncated, censored
• Tobit, Heckman selection models
• linear regression or probit with selection
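To connect these model families to commands, here is a rough sketch using Stata's demo data. The choice of variables is purely illustrative (for example, rep78 is treated as a count only to show the syntax), and the last example assumes the womenwk dataset can be downloaded with webuse.

```stata
sysuse auto, clear
regress price mpg weight        // OLS, continuous outcome
qreg    price mpg weight        // median (quantile) regression
probit  foreign mpg weight      // binary outcome
logit   foreign mpg weight
oprobit rep78 mpg weight        // ordered outcome 1, ..., 5
poisson rep78 mpg               // count-style outcome (syntax illustration only)
tobit   mpg weight, ll(17)      // outcome treated as censored from below at 17

webuse womenwk, clear           // needs internet access
heckman wage educ age, select(married children educ age)
```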

Regression models - III

• STATA supports several special data types.
• Once the type is defined, special commands work
• Time series
– Estimate ARIMA and ARCH models
– Estimators for autocorrelation and heteroscedasticity
– Estimate MA and other smoothers
– Tests for autocorrelation, heteroscedasticity and unit roots: h, d, LM, Q, ADF, P-P …
– TS graphs
. sysuse tsline2, clear
. tsset day
. tsline calories, ttick(28nov2002 25dec2002, tpos(in)) ttext(3470 28nov2002 "Thanks" 3470 25dec2002 "Xmas", orient(vert))

…gives

[Figure: time-series line plot of “Calories consumed” (y-axis, 3400 to 4400) against “Date” (x-axis, 01jan2002 to 01jan2003), with vertical text labels “thanks” and “x-mas” at the two marked dates.]
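Once the data are tsset, the estimation, testing and smoothing commands listed above become available. The following is a sketch on the same demo data; the AR(1) specification, lag lengths and smoothing window are arbitrary choices for illustration, and cal_ma is a made-up variable name.

```stata
sysuse tsline2, clear
tsset day                                      // declare the time variable
arima calories, ar(1)                          // AR(1) model for calories
dfuller calories, lags(1)                      // augmented Dickey-Fuller unit-root test
pperron calories                               // Phillips-Perron unit-root test
corrgram calories, lags(10)                    // autocorrelations and Q statistics
tssmooth ma cal_ma = calories, window(3 1 3)   // 7-day centred moving average
tsline calories cal_ma                         // plot raw and smoothed series
```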

Special data types: survey

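• Non-random sampling means plain OLS can mislead: unweighted estimates may not represent the population, and the usual standard errors ignore the survey design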

• STATA can handle non-random survey data
– see the “svy***” commands
– Example (stratified sample of medical cases):

. webuse nhanes2f, clear

. svyset psuid [pweight=finalwgt], strata(stratid)

. svy: reg zinc age age2 weight female black orace rural

. reg zinc age age2 weight female black orace rural

Number of strata =  31                  Number of obs    =      9189
Number of PSUs   =  62                  Population size  = 1.042e+08
                                        Design df        =        31
                                        F(   7,     25)  =     62.50
                                        Prob > F         =    0.0000
                                        R-squared        =    0.0698
------------------------------------------------------------------------------
             |             Linearized
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.1701161   .0844192    -2.02   0.053    -.3422901     .002058
        age2 |   .0008744   .0008655     1.01   0.320    -.0008907    .0026396
      weight |   .0535225   .0139115     3.85   0.001     .0251499    .0818951
      female |  -6.134161   .4403625   -13.93   0.000    -7.032286   -5.236035
       black |  -2.881813   1.075958    -2.68   0.012    -5.076244    -.687381
       orace |  -4.118051   1.621121    -2.54   0.016    -7.424349   -.8117528
       rural |  -.5386327   .6171836    -0.87   0.390    -1.797387    .7201216
       _cons |   92.47495   2.228263    41.50   0.000     87.93038    97.01952
------------------------------------------------------------------------------

. regress zinc age age2 weight female black orace rural

      Source |       SS       df       MS              Number of obs =    9189
-------------+------------------------------           F(  7,  9181) =   79.72
       Model |  110417.827     7  15773.9753           Prob > F      =  0.0000
    Residual |   1816535.3  9181   197.85811           R-squared     =  0.0573
-------------+------------------------------           Adj R-squared =  0.0566
       Total |  1926953.13  9188  209.724982           Root MSE      =  14.066
------------------------------------------------------------------------------
        zinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   -.090298   .0638452    -1.41   0.157    -.2154488    .0348528
        age2 |  -.0000324   .0006788    -0.05   0.962    -.0013631    .0012983
      weight |   .0606481   .0105986     5.72   0.000     .0398725    .0814237
      female |  -5.021949   .3194705   -15.72   0.000    -5.648182   -4.395716
       black |  -2.311753   .5073536    -4.56   0.000    -3.306279   -1.317227
       orace |  -3.390879   1.060981    -3.20   0.001    -5.470637   -1.311121
       rural |  -.0966462   .3098948    -0.31   0.755    -.7041089    .5108166
       _cons |   89.49465   1.477528    60.57   0.000     86.59836    92.39093
------------------------------------------------------------------------------
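The two sets of estimates are easier to compare side by side. One way to do that, sketched here with the same data and model, is to store each set of results and tabulate them together; m_svy and m_ols are arbitrary names chosen for this illustration.

```stata
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)

svy: regress zinc age age2 weight female black orace rural
estimates store m_svy            // design-based estimates

regress zinc age age2 weight female black orace rural
estimates store m_ols            // unweighted OLS, ignoring the design

* Coefficients and standard errors in adjacent columns
estimates table m_svy m_ols, b(%9.4f) se(%9.4f)
```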

Special data types: duration

• Survival time data
– See the “st***” commands
. stset failtime /* sets the var that defines duration */
• Estimates a wide variety of models to explain duration
– E.g. Weibull “hazard” model: h(t) = lambda * alpha * t^(alpha - 1)

Weibull example ….

• ST regression supports Weibull, Cox PH and other options
. streg load bearings, distribution(weibull)
• After streg you can plot the estimated cumulative hazard with . stcurve, cumhaz (or the hazard itself with . stcurve, hazard)
• STATA allows functions to be plotted by specifying the function:
twoway (function y = .5*x^(-.5), range(0 5) yvarlab("a=.5")) ///
       (function y = 1.5*x^(.5), range(0 5) yvarlab("a=1.5")) ///
       (function y = 1*x^(0), range(0 5) yvarlab("a=1")) ///
       (function y = 2*x, range(0 2) yvarlab("a=2")), ///
       saving(weib1, replace) title("Weibull hazard: lambda=1, alpha varying") ///
       ytitle(hazard) xtitle(t)

gives…..

[Figure: “Weibull hazard: lambda=1, alpha varying”. Hazard (y-axis, 0 to 4) plotted against t (x-axis, 0 to 5), with one curve each for a=.5, a=1, a=1.5 and a=2.]
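A complete duration example could be run as below. The variable names failtime, load and bearings match the slide; the dataset name kva is an assumption (it is the generator-failure data used in Stata's own streg examples) and needs internet access to download.

```stata
webuse kva, clear                            // assumed dataset name, see note above
stset failtime                               // declare the duration variable
streg load bearings, distribution(weibull)   // Weibull regression
stcurve, hazard                              // fitted hazard at the covariate means
stcurve, cumhaz                              // fitted cumulative hazard
```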

Special data types: Panel data

• STATA can handle “panel” data easily
– see the “xt***” commands
• Common commands are
. xtdes    Describe pattern of xt data
. xtsum    Summarize xt data
. xttab    Tabulate xt data
. xtline   Line plots with xt data
. xtreg    Fixed and random effects

Panel data

• An xt dataset looks like this:

  pid   yr_visit    fev   age   sex   height   smokes
  ----------------------------------------------------------
 1071     1991     1.21    25     1       69        0
 1071     1992     1.52    26     1       69        0
 1071     1993     1.32    28     1       68        0
 1072     1991     1.33    18     1       71        1
 1072     1992     1.18    20     1       71        1
 1072     1993     1.19    21     1       71        0

• xt*** commands need to know the variables that identify person and “wave”:
. iis pid
. tis yr_visit
Or use the tsset command
. tsset pid yr_visit, yearly

Panel regression

• Once STATA has been told how to read the data it can perform regressions quite quickly:
. xtreg y x, fe

. xtreg y x, re
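The whole panel workflow can be sketched with one of Stata's downloadable example datasets. Here nlswork and its variables stand in for the pid/yr_visit data on the slide, and xtset is used to declare the panel; in recent versions of Stata it replaces iis, tis and the panel form of tsset.

```stata
webuse nlswork, clear            // needs internet access
xtset idcode year                // person identifier and wave
xtdescribe                       // pattern of participation across waves
xtsum ln_wage age tenure         // between and within variation
xtreg ln_wage age tenure, fe     // fixed effects
xtreg ln_wage age tenure, re     // random effects
```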

Further advice

• See Stephen Jenkins’ excellent course on duration modelling in STATA

• See Steve Pudney’s excellent course on panel data modelling in STATA
– Beware: the dataset is 30MB+