1

CCPR Computing Services
Introduction to Stata

Courtney Engel
October 26, 2007

2

Outline

Stata Command Syntax
Basic Commands
Abbreviations
Missing Values
Combining Data
Using do-files
Basic programming
Special Topics
Getting Help
Updating Stata

3

Stata Syntax

Basic command syntax:

    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]

Brackets = optional portions
Italics = user specified

http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/stataslides10.07.log

4

Stata Syntax, cont.

Complete syntax:

    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]

Example 1 (webuse union)

Stata command:

    . bysort black: summarize age if year >= 80, detail

Result: Summarizes age separately for each value of black, including only observations for which year >= 80, and includes extra detail.

5

Stata Syntax, cont.

Complete syntax:

    [by varlist:] command [varlist] [= exp] [if exp] [in range] [weighttype=weight] [, options]

Example 2 (webuse union)

Stata commands:

    . generate agelt30 = age
    . replace agelt30 = 1 if age < 30
    . replace agelt30 = 0 if age >= 30 & age < .

Result: Variable agelt30 set equal to 1, 0, or missing

Generally [= exp] is used with the commands generate and replace.

    Obs #   age   agelt30
    1       10    1
    2       15    1
    3       .     .
    4       30    0
    5       73    0
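The three-step generate/replace pattern in Example 2 can also be written in one line. The following is a minimal sketch, not part of the original slides: it types in the five ages from the table above instead of loading the union data, and it relies on a logical expression in Stata evaluating to 1 (true) or 0 (false).

```stata
* Sketch: one-line indicator that keeps missing ages missing
clear
input age
10
15
.
30
73
end

* (age < 30) is 1 when true and 0 when false; the if qualifier
* leaves agelt30 missing wherever age is missing
generate byte agelt30 = (age < 30) if age < .
list
```

The if qualifier matters: without it, the missing age would be coded 0, because missing is treated as larger than any number.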

6

Basic Commands – Load “auto” data and look at some vars

Load data from Stata’s website

    webuse auto.dta

Look at dataset

    describe

Summarize some variables

    codebook make headroom, header
    inspect weight length

7

Basic Commands – Load “auto” data and look at some vars

Look at first and last observation

    list make price mpg rep78 if _n==1
    list make price mpg rep78 if _n==_N

Summarize a variable in a table

    table foreign
    table foreign, c(mean mpg sd mpg)
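The c() (contents) option belongs to the table syntax that was current when these slides were written. Stata 17 redesigned the table command, so on recent releases that form may need to be run under version control. As a hedged alternative, and not the author's method, tabstat produces the same group means and standard deviations and does not depend on the table redesign. The sketch uses the copy of the auto data that ships with Stata:

```stata
* Sketch: mean and sd of mpg by foreign, without the table command
sysuse auto, clear
tabstat mpg, by(foreign) statistics(mean sd)
```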

8

Keep/Save a Subset of the Data

“Keep” a subset of the variables in memory

    keep make headroom trunk weight length price

List variables in current dataset

    ds

List string variables in current dataset

    ds, has(type string)

Save current dataset

    save autokeep, replace

9

Generating New Variables

Create new variable = headroom squared

    generate headroom2 = headroom^2

    Obs #   Headroom   Headroom2
    1       10         100
    2       9          81
    3       4          16

Generate numeric from string variable

    encode make, generate(makeNum)
    list make makeNum in 1/5

Can’t tell it’s numeric, but look at “storage type” in describe:

    describe make makeNum

10

Generating New Variables, cont.

Create categorical variable from continuous

Variable “price” is integer-valued with minimum 3291 and max 15906

Generate categorical version - Method 1:

    generate priceCat = 0
    replace priceCat = 1 if price < 5000
    replace priceCat = 2 if price >= 5000 & price < 10000
    replace priceCat = 3 if price >= 10000 & price < .

11

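Generating New Variables, cont.

Generate categorical version of numerical variable: Method 2

    generate priceCat2 = price
    recode priceCat2 (min/5000 = 1) (5000/10000=2) (10000/max=3)

Note: the recode ranges overlap at 5000 and 10000. recode applies the first rule that matches, so a price of exactly 5000 or 10000 would land in the lower category here, whereas Method 1 puts it in the higher one.

Compare price, priceCat, and priceCat2

    table price priceCat
    table priceCat priceCat2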
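The ranges in the recode call overlap at 5000 and 10000. Because price is integer-valued, one way to sidestep the overlap is to end each range one unit earlier. The sketch below is an illustration rather than part of the original slides: it uses the auto data shipped with Stata, rebuilds Method 1 so that it runs on its own, and introduces the hypothetical name priceCat3.

```stata
sysuse auto, clear

* Method 1 from the slides
generate priceCat = 0
replace priceCat = 1 if price < 5000
replace priceCat = 2 if price >= 5000 & price < 10000
replace priceCat = 3 if price >= 10000 & price < .

* recode with non-overlapping ranges (valid because price is integer-valued);
* priceCat3 is a hypothetical name used only in this sketch
recode price (min/4999 = 1) (5000/9999 = 2) (10000/max = 3), generate(priceCat3)

* assert stops with an error if the two versions ever disagree
assert priceCat == priceCat3
```

Using generate() leaves price untouched, so there is no need to copy it into a new variable before recoding.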

12

Variable Labels and Value Labels

Create a description for a variable:

    label variable priceCat "Categorical price"

Create labels to represent variable values:

    label define priceCatLabels 1 "cheap" 2 "mid-range" 3 "expensive"
    label values priceCat priceCatLabels

View results:

    describe
    list price priceCat in 1/10

13

Reshape > Wide to Long

Wide format:

    year   session   order   author1     author2
    2006   P01       3       Biddlecom   Bankole
    2006   P01       4       Anyara      Hinde
    2006   P01       5       Amouzou     Becker

Wide -> Long:

    reshape long author, i(year session order) j(count)

long - reshape from wide to long
author - stem of the variable going from wide to long
i(year session order) - uniquely identifies an observation in wide form
j(count) - variable which will be created to contain the suffix of author, i.e. (1 2)

14

Reshape > Long to Wide

Long format:

    year   session   order   author      count
    2006   P01       3       Biddlecom   1
    2006   P01       3       Bankole     2
    2006   P01       4       Anyara      1
    2006   P01       4       Hinde       2
    2006   P01       5       Amouzou     1
    2006   P01       5       Becker      2

Long -> Wide:

    reshape wide author, i(year session order) j(count)

wide - reshape from long to wide
author - variable to be converted from long to wide
i(year session order) - variables that uniquely identify observations in wide form
j(count) - variable that gives the suffix of author, i.e. (1 2)
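The two reshape commands are inverses of each other. As a minimal sketch, assuming the variable names are all lower case (Stata names are case-sensitive) and typing in the three rows of the wide example by hand, a round trip looks like this:

```stata
* Sketch: wide -> long -> wide round trip on the slide's example rows
clear
input int year str3 session byte order str10 author1 str10 author2
2006 "P01" 3 "Biddlecom" "Bankole"
2006 "P01" 4 "Anyara"    "Hinde"
2006 "P01" 5 "Amouzou"   "Becker"
end

* Three wide rows become six long rows, one per author
reshape long author, i(year session order) j(count)
list, sepby(order)

* Back to the original three rows with author1 and author2
reshape wide author, i(year session order) j(count)
list
```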

15

A few other commands

compress - stores data more efficiently

sort / gsort - ascending / descending observation sort

order - variable order

rename - rename variables

set more on/off - pause after each screen of output (on) or not (off)

16

Abbreviations in Stata

Abbreviating command, option, and variable names: the shortest uniquely identifying name is sufficient

Example: Assume three variables are in use: make, price, mpg

“UN-abbreviated” Stata command:

    . summarize make price

Abbreviated Stata command:

    . su ma p

Exceptions
    describe (d), list (l), and some others
    Commands that change/delete
    Functions implemented by ado-files

17

Missing Values in Stata 8-10

Stata 8 and later versions
    27 representations of numerical “missing”: ., .a, .b, … , .z

Relational comparisons
    Biggest number < . < .a < .b < … < .z

Mathematical functions
    missing + nonmissing = missing

String missing = empty quote: ""

18

Missing Values in Stata - Pitfalls

Pitfall #1
    Missing values changed after Stata 7:

    Stata 7          Stata 8 and later
    varname != .     varname < .
    varname == .     varname >= .

Pitfall #2
    Do NOT:
        . replace weightlt200 = 0 if weight >= 200
    INSTEAD:
        . replace weightlt200 = 0 if weight >= 200 & weight < .
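Pitfall #2 follows from the ordering of missing values: because . is larger than any number, the condition weight >= 200 is also true for missing weights. A minimal sketch on three typed-in observations (the variable wrong is a hypothetical name for the mistaken version):

```stata
clear
input weight
150
250
.
end

* Wrong: . counts as larger than 200, so the missing row is also set to 0
generate byte wrong = 1 if weight < 200
replace wrong = 0 if weight >= 200

* Right: exclude missing values explicitly
generate byte weightlt200 = 1 if weight < 200
replace weightlt200 = 0 if weight >= 200 & weight < .

list
```

In the listing, the third observation has wrong equal to 0 but weightlt200 missing.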

19

Combining Data

Append vs. Merge
    Append – two datasets with same variables, different observations
    Merge – two datasets with same or related observations, different variables

Appending data in Stata
    Example: append.do
    http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/append10.07.log

20

Combining Data - merge and joinby

Demonstrate with two sample datasets: Neighborhood and County samples

One-to-one merge
    onetoone.do
    http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/onetoone10.07.log

One-to-many merge – use match merge
    onetomany.do
    http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/onetomany10.07.log

Many-to-many merge – use joinby
    manytomany.do
    http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/manytomany10.07.log

21

Combining Data

Variable _merge (generated by merge and joinby)

    _merge   Observation in master data   Observation in “using” data
    1        Yes                          No
    2        No                           Yes
    3        Yes                          Yes

Pitfalls
    Merging unsorted data
    Many-to-many using merge instead of joinby
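To see the _merge codes in practice, here is a small sketch with made-up neighborhood and county data (all variable names and values are hypothetical). It assumes Stata 11 or later, where merge takes an explicit 1:1, m:1, or 1:m match type and does not require pre-sorted data. The slides predate that syntax, so the linked do-files will look different.

```stata
* Hypothetical county-level data, saved to a temporary file
clear
input county str10 cname
1 "Alpha"
2 "Beta"
end
tempfile counties
save `counties'

* Hypothetical neighborhood-level data: many neighborhoods per county
clear
input nbhd county
101 1
102 1
201 2
301 3
end

* Many neighborhoods to one county (Stata 11+ syntax)
merge m:1 county using `counties'
tabulate _merge
```

Neighborhood 301 sits in a county that is absent from the county file, so it gets _merge equal to 1. The other three neighborhoods match and get _merge equal to 3.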

22

Do-files

What is a do-file?
    Stata commands can be executed interactively or via a do-file
    A do-file is a text file containing commands that can be read by Stata

Running a do-file within Stata

    . do dofilename.do

23

Do-files

Why use a do-file?
    Documentation
    Communication
    Reproduce interactive session?

Interactive vs. do-files
    Record EVERYTHING to recreate results in your do-file!

Do-files > Documentation

Header

    *Josie Bruin ([email protected])
    *HRS project
    */u/socio/jbruin/HRS/
    *October 5, 2007
    *Stata version 8
    *Purpose: Create and merge two datasets in Stata,
    *  then convert data to SAS
    *Input programs:
    *  HRS/staprog/H2002.do,
    *  HRS/staprog/x2002.do,
    *  HRS/staprog/mergeFiles.do
    *Output:
    *  HRS/stalog/H2002.log,
    *  HRS/stalog/x2002.log,
    *  HRS/stalog/mergeFiles.log
    *  HRS/stadata/Hx2002.dta
    *  HRS/sasdata/Hx2002.sas
    *Special instructions: Check log files for errors
    *  check for duplicates upon new data release

File header includes:
    Name (email)
    Project
    Project location
    Date
    Software Version
    Purpose of program
    Inputs
    Outputs
    Special Instructions

25

Do-files > Comments

Comments
    Lines beginning with * will be ignored
    Words between // and end of line will be ignored

Spanning commands over two lines:
    Words between /* and */ will be ignored, including end of line character
    Words between /// and beginning of next line will be ignored

26

Do-file > End of Line Character

Commands requiring multiple lines
    #delimit ;
        This command tells Stata to read semi-colons as the end-of-line character instead of the carriage return
    Comment out the carriage return with /* at the end of line and */ at the beginning of next
    Comment out the carriage return with ///

27

Do-files > Examples

    webuse auto, clear

    *this is a comment

    #delimit ;
    summarize price mpg rep78
        headroom trunk weight;
    #delimit cr

    summarize price mpg rep78 headroom trunk weight //this is a comment

    summarize price mpg rep78 ///
        headroom trunk weight

    summarize price mpg rep78 /*
        */ headroom trunk weight

28

Saving output

Work in do-files and log your sessions!
    log using filename
        options: replace or append
    log close

Output choices:
    *.log file - ASCII file (text)
    *.smcl file - nicer format for viewing and printing in Stata
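A minimal sketch of a logged session, assuming no log is currently open; mysession is a hypothetical file name, and the text option requests a plain .log file rather than the default .smcl:

```stata
* Open a plain-text log, overwriting any earlier mysession.log
log using mysession, text replace

sysuse auto, clear
summarize price mpg

* Close the log so the file is complete on disk
log close
```

Replacing replace with append would add this session to the end of an existing log instead of overwriting it.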

29

Saving Output, cont.

Graphs are not saved in log files

Export current graph:

    graph export graph.ext

Ex:

    graph export graph.eps

Supported formats: .ps, .eps, .wmf, .emf, .pict

30

Example using local macro

    . local mypath "C:\Documents and Settings\MyStata"

    . display `mypath'
    C:\Documents invalid name
    r(198);

    . display C:\Documents and Settings\MyStata
    C:\Documents invalid name
    r(198);

    . display "`mypath'"
    C:\Documents and Settings\MyStata

31

Example – foreach, return, display

    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        display "Variable `var' has mean `varmean'"
    }

First five observations of the variables in the loop:

       +----------------------------------------+
       |   tenure   hours   wks_work    ln_wage |
       |----------------------------------------|
    1. | .0833333      20         27   1.451214 |
    2. | .1666667      15         27    2.09457 |
    3. |      .25      40         27   1.790204 |
    4. | .0833333      44         10    1.02862 |
    5. | .0833333      20         10   .7409375 |
       +----------------------------------------+

http://www.ccpr.ucla.edu/Computing_Services/Tutorial/Stata/log/constructs10.07.log

32

Example using forvalues, display

    forvalues counter = 1/10 {
        display `counter'
    }

    forvalues counter = 0(2)10 {
        display `counter'
    }

33

Example: forvalues, generating random variables

forvalues j = 1/3 {

generate x`j' = uniform()

generate y`j' = invnormal(uniform())

}

foreach x of varlist x1-x3 y1-y3 {

summarize `x'

}

34

Example – if/else

    foreach var of varlist tenure-ln_wage {
        quietly summarize `var'
        local varmean = r(mean)
        if `varmean' > 10 {
            display "`var' has mean greater than 10"
        }
        else {
            display "`var' has mean less than 10"
        }
    }

35

Special Topic: regular expressions

    webuse auto

List all values of make starting with a capital and containing an additional capital:

    list make if regexm(make, "^[A-Z].+[A-Z].+")

AND ending in a number

    list make if regexm(make, "^[A-Z].+[A-Z].+[0-9]$")

    +---------------+
    | make          |
    |---------------|
    | Merc. XR-7    |
    | Olds Delta 88 |
    +---------------+
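regexm() only reports whether a pattern matches. To pull out the matched text, it can be paired with regexs(). The sketch below goes beyond the slides: it extracts the trailing digits of make into a new string variable with the hypothetical name trailnum, using the auto data shipped with Stata.

```stata
sysuse auto, clear

* regexm() tests for a match; regexs(0) returns the text that matched
generate trailnum = regexs(0) if regexm(make, "[0-9]+$")

* Show only the makes whose name ends in digits
list make trailnum if trailnum != ""
```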

36

Special Topic: Exporting results using outreg

User-written program called outreg
From within Stata, type findit outreg
Very simple!! Basically add one line of code after each regression to export results
For an example of code, see
    http://www.ats.ucla.edu/stat/stata/faq/outreg.htm

37

Getting Help in Stata

help command_name
    abbreviated version of manual

search
    search keywords, local
    search keywords, net
    search keywords, all

findit keywords
    same as search keywords, all

Search Stata Listserver and Stata FAQ

38

Stata Resources

www.stata.com > Resources and Support
    Search Stata Listserver
    Search Stata (FAQ)

Stata Journal (SJ)
    articles for subscribers
    programs free

Stata Technical Bulletin (STB)
    replaced with the Stata Journal
    Articles available for purchase, programs free

Courses (for fee)

39

Updating Stata

    help update
    update all

CCPR’s Cluster and helping your research

Software and Data
    STATA, SAS, R, Compilers, text editors, etc
    HRS, CPS (Unicon version), AddHealth, IFLS, etc

Efficiency
    Your PC is available for other work when you submit a job to the cluster
    Faster processors
    More RAM

Easy to share data, programs, etc. with colleagues via the cluster

Obtain access by requesting an account
    http://lexis.ccpr.ucla.edu/account/request/

Questions/Feedback
    Please email me if you need help in the future
    [email protected]

