Post on 21-Jan-2021
Stata Introduction, 3 h
Presented by Cecilie Dahl
Presentation, data and programs at:
https://www.med.uio.no/helsam/forskning/aktuelt/arrangementer/andre/stata-course-uio.html
Stata introduction
•General use
  • Interface and menu
  • Do-files and syntax
  • Data handling
•Analysis
  • Descriptive
  • Graphs
  • Bivariate
Exercises
Why Stata
•Pro
  • Price
  • Aimed at epidemiology (and economics)
  • Many methods, growing
  • Graphics
  • Structured, programmable
•Con
  • File size < memory (the whole dataset must fit in memory)
Smart working
•Data (.dta)
  • Master file, safe
  • Working file for each project
•Syntax (.do)
  • Work in progress file
  • Manuscript file (Table 1…, Figure 1…, Supplement)
•Output (.smcl or .log)
  • Save or discard
Syntax
•General form
  [bysort varlist:] command [varlist] [if exp] [in range] [, opts]
•Examples
  mean age
  mean age if sex==1
  bysort sex: summarize age
  summarize age, detail
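A minimal sketch of each part of the syntax diagram in turn. It assumes Stata's bundled auto dataset (sysuse auto) as a stand-in for the course data, so the variable names price and foreign are not from the course.

```stata
* Assumption: the bundled auto dataset replaces the course's birth data
sysuse auto, clear

* command varlist
summarize price

* command varlist if exp
summarize price if foreign == 1

* command varlist in range
list make price in 1/5

* command varlist, options
summarize price, detail

* bysort varlist: command varlist
bysort foreign: summarize price
```

The bysort prefix sorts the data by foreign and then repeats the command once per group.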
Use and save data
•Open data
  use "C:\Course\Myfile.dta", clear
•Describe
  describe                 // describe all variables
  list sex age in 1/20     // list observations 1 to 20
•Save data
  save "C:\Course\Myfile.dta", replace
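The open, describe and save cycle can be tried without touching any real file. This sketch assumes the bundled auto dataset and a temporary file (the macro name workcopy is hypothetical) in place of C:\Course\Myfile.dta.

```stata
* Assumption: auto dataset and a temporary file stand in for the course files
sysuse auto, clear

* Describe all variables, then list the first 5 observations
describe
list make price mpg in 1/5

* Save a working copy and reopen it, leaving the master file untouched
tempfile workcopy
save "`workcopy'"
use "`workcopy'", clear
```

Saving to a separate working file follows the "master file, safe" advice: the original data is never overwritten.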
Exercise 1
•Start Stata
•Open the birth data (…birth1.sav)
•Open a new syntax file (Ctrl-9)
•Describe all variables: describe.
• List the first 10 observations of weight, sex and mother’s age (mage)
•Save the syntax file for later use
5-10 min
https://www.med.uio.no/helsam/forskning/aktuelt/arrangementer/andre/stata-course-uio.html
Descriptive
• Continuous
  summarize weight
  summarize weight, detail     // percentiles and more
• Categorical
  tabulate bullied
  tabulate bullied, nolab      // show coding
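A short sketch of the same descriptive commands, assuming the bundled auto dataset. Its weight variable is car weight, not birth weight, and foreign plays the role of the categorical variable.

```stata
* Assumption: auto dataset; weight here is car weight in lbs
sysuse auto, clear

* Continuous variable: mean, SD, min and max
summarize weight

* With detail, percentiles are shown and left behind in r()
summarize weight, detail
scalar iqr = r(p75) - r(p25)
display "Median = " r(p50) ", IQR = " iqr

* Categorical variable: with value labels, then with the underlying codes
tabulate foreign
tabulate foreign, nolabel
```

Results stored in r() after summarize can be reused in later commands instead of retyping numbers from the output.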
Generate, replace
•Index (young men)
  generate index=0
  replace index=1 if sex==1 & age<30
•Young/Old
  generate old=(age>50) if age<.
•Serial numbers
  generate id=_n
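The three patterns above, sketched on the bundled auto dataset (an assumption; the variables foreign, mpg and weight replace sex and age, and the cut-offs are arbitrary).

```stata
* Assumption: auto dataset; cut-offs chosen only for illustration
sysuse auto, clear

* Index: start at 0, then switch on the rows that meet both conditions
generate index = 0
replace index = 1 if foreign == 1 & mpg < 25

* 0/1 indicator from a logical expression; "if weight < ." keeps missing as missing
generate heavy = (weight > 3000) if weight < .

* Serial number: _n is the current observation number
generate id = _n

tabulate index
list id make heavy in 1/5
```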
Dates
•From numeric to date (3 numeric variables into date variable)
  ex: m=12, d=2, y=1987
  generate birth=mdy(m,d,y)
  format birth %td
•From string to date (1 string variable into date variable)
  ex: bstr="02.12.1987"
  generate birth=date(bstr,"DMY")
  format birth %td
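Both conversions can be checked against each other on a tiny typed-in dataset. The second row and the variable names birth1 and birth2 are made up for illustration.

```stata
* Two rows of made-up data: numeric month/day/year plus the same date as a string
clear
input m d y str10 bstr
12  2 1987 "02.12.1987"
 1 15 1990 "15.01.1990"
end

* Numeric parts to date
generate birth1 = mdy(m, d, y)

* String to date; "DMY" gives the order of day, month and year in the string
generate birth2 = date(bstr, "DMY")

* Without a %td format the dates display as day counts since 01jan1960
format birth1 birth2 %td
list

* Both routes should give the same date
assert birth1 == birth2
```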
Exercise 2
•Summarize mother’s age
•Tabulate sex
•Recode sex into sex0 with categories 0, 1
•Generate new gestational age in weeks (the old is in days)
•Summarize the new variable
10 min
Missing
•Note!
  • Missing is represented as "."
  • Missing values are treated as larger than any number
  • age>30 will include missing
  • age>30 if age<. will not
•Test
  replace age=0 if (age==.)
•Change
  replace educ=. if educ==99
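The "missing is a large number" trap is easy to see by counting. This sketch assumes the bundled auto dataset, where rep78 has 5 missing values; the variable code and its 99 sentinel are hypothetical stand-ins for educ.

```stata
* Assumption: auto dataset; rep78 has 5 missing values
sysuse auto, clear

* A bare ">" comparison silently counts the missing values as well
count if rep78 > 3
local with_missing = r(N)

* Adding "& rep78 < ." leaves them out
count if rep78 > 3 & rep78 < .
local without_missing = r(N)

local diff = `with_missing' - `without_missing'
display "Rows that were only included because they are missing: `diff'"

* Change a sentinel code to missing (hypothetical variable with 99 = unknown)
generate code = cond(_n <= 3, 99, 1)
replace code = . if code == 99
count if missing(code)
```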
Describe missing
•Summarize missing
  misstable summarize weight sex gest
•Missing in tables
  tab bullied sex, missing
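The same two commands, sketched on the bundled auto dataset (an assumption). Of the three variables listed, only rep78 has missing values.

```stata
* Assumption: auto dataset stands in for the birth data
sysuse auto, clear

* Count the missing values per variable
misstable summarize price mpg rep78

* Cross-table with missing shown as its own row
tabulate rep78 foreign, missing
```

Without the missing option, tabulate drops the observations with missing rep78 and the table total falls below the number of rows in the data.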
Exercise 3
•Tabulate missing in gestational age (gest) with the misstable command
•Tabulate gest4 versus sex and include missing
•Summarize mage if gest is greater than 260 days
  • Will this include missing in gest?
•Summarize mage if gest is greater than 260 days, excluding missing in gest
10 min
Help
•General
  help command
  findit keyword           // search Stata and the net
•Examples
  help table
  findit aflogit
Many videos on YouTube
Summing up
•Use do-files
  • Run: mark the lines, then Ctrl-D
•Syntax
  command [varlist] [if exp] [in range] [, options]
•Missing
  age>30 if age<.
  generate old=(age>50) if age<.
•Help
  help describe
Oct-19 24
Twoway plots
•Syntax
  twoway (plot1, opts) (plot2, opts), opts
•One plot
  kdensity bw
  scatter bw gest
[Figure: kernel density estimate of birth weight (kernel = epanechnikov, bandwidth = 102.3251)]
[Figure: scatter plot of birth weight against gestational age]
twoway (scatter bw gest) (fpfitci bw gest) (lfit bw gest)
[Figure: "Weight by gestational age", showing scatter, smooth with CI, and line fit; y-axis in gram, x-axis in days]
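A sketch of an overlaid twoway plot, assuming the bundled auto dataset (mpg against car weight) rather than the birth data. Plots are drawn in the order they are listed, so the confidence band comes first here to keep it from covering the points; the graph name fit_demo is hypothetical.

```stata
* Assumption: auto dataset; mpg and weight replace bw and gest
sysuse auto, clear

* Drawn in order: CI band with smooth curve, then the points, then the straight line
twoway (fpfitci mpg weight) ///
       (scatter mpg weight) ///
       (lfit mpg weight),   ///
       name(fit_demo, replace)
```

Where the straight line leaves the confidence band of the smooth curve, the trend is probably not linear.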
Titles
[Figure: scatter plot showing where title, subtitle, ytitle, xtitle and note are placed]
scatter bw gest, title("title") subtitle("subtitle") ///
xtitle("xtitle") ytitle("ytitle") note("note")
Exercise 4
•Make a density plot of birth weight (weight)
•Make a scatter plot of birth weight versus gestational age (gest)
  • Replace the outlier in gestational age (gest) with missing
  • Restrict the plot to gestational age greater than 250 days (hint: if gest>250)
  • Add a linear fit line to the scatter plot to see the trend
  • Add a smoothing curve with confidence interval to the plot (fpfitci) to look for non-linear trend. The order of plots matters
  • Add a title, ytitle and xtitle to the plot
10 min
Two independent samples
[Figure: kernel density of birth weight for each sex; x-axis: Birth weight]
twoway ( kdensity weight if sex==1, lcolor(blue) ) ///
( kdensity weight if sex==2, lcolor(red) )
Do boys and girls have the same mean birth weight?
•Equal means?
•Equal variance?
Test of equal variance:
robvar weight, by(sex)
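The look-first, test-second workflow can be sketched on the bundled auto dataset (an assumption), comparing mpg between domestic and foreign cars instead of birth weight between boys and girls.

```stata
* Assumption: auto dataset; foreign (0/1) plays the role of sex
sysuse auto, clear

* Overlaid densities, one per group
twoway (kdensity mpg if foreign == 0, lcolor(blue)) ///
       (kdensity mpg if foreign == 1, lcolor(red)), ///
       legend(order(1 "Domestic" 2 "Foreign"))

* Robust (Levene-type) tests of equal variances between the groups
robvar mpg, by(foreign)
```

A small p-value from robvar suggests unequal variances, which points toward the unequal option of ttest.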
Two independent samples test
ttest weight, by(sex)            // 2-sample t-test, equal variances
ttest weight, by(sex) unequal    // 2-sample t-test, unequal variances
ttest w1 == w2                   // paired t-test
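A runnable sketch of the two-sample tests, assuming the bundled auto dataset with mpg as the outcome and foreign as the grouping variable.

```stata
* Assumption: auto dataset; mpg by foreign replaces weight by sex
sysuse auto, clear

* Two-sample t-test assuming equal variances
ttest mpg, by(foreign)

* Same comparison without the equal-variance assumption
ttest mpg, by(foreign) unequal

* Stored results: two-sided p-value and the two group means
display "p = " r(p)
display "Group means: " r(mu_1) " and " r(mu_2)
```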
Crosstables
Are boys bullied as much as girls?
•Equal proportions?
tabulate bullied sex, col chi2 nofreq
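A sketch of the same cross-table, assuming the bundled auto dataset. The indicator highmpg and its cut-off of 25 are made up to have a binary outcome to compare across groups.

```stata
* Assumption: auto dataset; highmpg is a made-up binary outcome
sysuse auto, clear
generate highmpg = (mpg > 25) if mpg < .

* Column percentages only (nofreq hides the counts) plus a chi-square test
tabulate highmpg foreign, column chi2 nofreq

* The p-value of the chi-square test is left behind in r(p)
display "p = " r(p)
```

Column percentages answer the question directly: they give the share with the outcome within each group.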
Exercise 5
• The variable “magegr2” contains mother’s age in two groups. Do tab magegr2 and tab magegr2, nolab to find the groups and the coding. An alternative to find coding is to list all labels: label list
• Make a plot of the birth weight distribution for each of the two groups of mother’s age.
• Do a ttest of weight by magegr2. Are the means different?
• Redo the ttest for weight>2000 to get more normal distributions.
  • Are the means different?
  • Are the p-values different?
• Generate an indicator for high birth weight (>4500).
• Make a table of high birth weight by gestgr2 with columns percent and chi-square test
Extra (if you have time)
•Do a help tabstat and look at the statistics options
•Do a tabstat of weight showing N min p25 p50 p75 max, by magegr2
Summing up
•Descriptive
  summarize weight
  tabulate sex
•Graphs
  twoway (plot1, opts) (plot2, opts), opts
•Bivariate
  ttest weight, by(sex)
  tabulate bullied sex, chi2
Save output (Log results)
•Save a portion of the analysis as a .smcl file
  log using "results.smcl"
  …
  log close
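A minimal sketch of logging one block of output. It assumes the bundled auto dataset for the logged commands and writes results.smcl to the current working directory; the plain-text copy made with translate is an optional extra.

```stata
* Close any log left open earlier in the session (no error if none is open)
capture log close

* Everything between "log using" and "log close" is saved
log using "results.smcl", replace
sysuse auto, clear
summarize price mpg
log close

* Optional: convert the log to plain text for use outside Stata
translate "results.smcl" "results.log", replace
```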
Keep plots during session
•Set “tabbed” graphics
  set autotabgraphs on, permanently
•Give each plot a name
  twoway …, name(scatter, replace)
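A sketch of keeping several named graphs alive in one session, assuming the bundled auto dataset. The graph names scatter1 and hist1 are hypothetical.

```stata
* Assumption: auto dataset; graph names are made up
sysuse auto, clear

* Without name(), each new graph replaces the previous one
scatter mpg weight, name(scatter1, replace)
histogram mpg, name(hist1, replace)

* List the graphs held in memory and bring one back to the front
graph dir, memory
graph display scatter1

* Named graphs can also be placed side by side
graph combine scatter1 hist1, name(both, replace)
```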