Basics of Biostatistics for Health Research
Session 3 – February 21, 2013
Dr. Scott Patten, Professor of Epidemiology
Department of Community Health Sciences & Department of Psychiatry
Some General Principles of Data Analysis
• Data cleaning and checking is always the first step after data entry.
• Start with “univariate” analysis (frequencies and their CIs).
• Progress to “bivariate” analysis – does a “dependent” variable differ depending on an “independent” variable?
• The next stage is “multivariate” analysis.
Statistical Errors
• Go to “www.ucalgary.ca/~patten”
• Scroll to the bottom.
• Right click to download the files described as being “for PGME Students”
– One is a dataset
– One is a data dictionary
• Save them on your desktop
Open the Datafile
Comparing Proportions
• We’ve looked at two procedures (e.g. for obesity in men vs. women):
generate obese = bmi
recode obese 0/30=0 30.001/1000=1
prtest obese, by(sex)
Generate Commands Using Logic
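generate obese2 = .
recode obese2 .=0 if bmi <= 30
* Stata treats a missing value as larger than any number,
* so missing bmi must be excluded or it would be coded as obese
recode obese2 .=1 if bmi > 30 & !missing(bmi)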
tab obese obese2
prtest obese2, by(sex)
Generate as a Recode Subcommand
recode bmi (0/30=0) (30.01/1000=1), gen(obese3)
tab obese obese3
Alternative to prtest
• Can use tab with the subcommand “exact”
tab obese sex, exact
Epitab Commands
Risk Ratios
RR = (“risk” in the “exposed”) / (“risk” in the “non-exposed”)
Odds Ratios
OR = (Odds in the “exposed”) / (Odds in the “non-exposed”)
Measures of Association
• The most common ones are ratios..
– RR
– OR
– PR
– IR
• You’ll sometimes see differences as well..
– Risk Difference
Another Alternative…
• The “cs” command is for “cross-sectional” and will give you risk ratios or risk differences
• However, it requires 0 and 1 values.
recode sex (1=0) (2=1), gen(female)
cs obese female, exact
Odds and Proportions
• In our sample, there are…
– 1,560 obese
– 10,015 non-obese
– (and 52 missing)
• The frequency of obesity (prevalence) is 1,560/(1,560 + 10,015)
• The odds are: 1,560/10,015
Odds and Proportions
• In other words…
– If ‘a’ means “have disease” and ‘b’ means “does not have the disease” then…
– Proportion = a / (a + b)
– Odds = a / b
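The arithmetic on these two slides can be checked with Stata's `display` command. This is a small sketch that uses only the counts quoted above (1,560 obese and 10,015 non-obese), so the dataset does not need to be open.

```stata
* Proportion (prevalence) = a / (a + b)
display 1560 / (1560 + 10015)
* Odds = a / b
display 1560 / 10015
* Odds can also be obtained from the proportion: p / (1 - p)
scalar p = 1560 / (1560 + 10015)
display p / (1 - p)
```

The prevalence is about 0.135 and the odds are about 0.156. The two values are close to each other only because obesity is fairly uncommon in this sample.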
Another Alternative…
• The “cc” command is for “case-control” and will give you odds ratios
• However, it requires 0 and 1 values.
cc obese female, exact
A Task for You…
• What is the prevalence of diabetes? (provide a 95% confidence interval for your estimate)
• What is the prevalence of diabetes in men and women? (hint: use “by” in the dialogue box)
• What is the odds ratio for the association of diabetes and obesity?
• What is the risk ratio for the association of diabetes and obesity?
• Is the association statistically significant?
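One possible way to approach these questions from the command line is sketched below. It assumes the variables are named `diabetes`, `sex` and `obese` and that `diabetes` and `obese` are coded 0/1, as they are elsewhere in these slides. The `ci` syntax shown is the older form (Stata 13 and earlier).

```stata
* Prevalence of diabetes with an exact binomial 95% CI
* (Stata 14 and later: ci proportions diabetes)
ci diabetes, binomial
* The same estimate separately for men and women
bysort sex: ci diabetes, binomial
* Odds ratio for diabetes and obesity
cc diabetes obese, exact
* Risk ratio for diabetes and obesity
cs diabetes obese, exact
```

With the `exact` option, `cc` and `cs` also report Fisher's exact p-values, which speak to the last question.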
A More Complex Problem..
• The prevalence of obesity is said to be associated with lower levels of education
Two-way Tables
A Two-way Table
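. tabulate obese educ, chi2

           | 0-11 years  hs or ged  some coll  coll grad+ |
     obese |          1          2          3          4  |     Total
-----------+----------------------------------------------+----------
         0 |      3,850      2,978      1,721      1,229  |     9,778
         1 |        813        417        162        112  |     1,504
-----------+----------------------------------------------+----------
     Total |      4,663      3,395      1,883      1,341  |    11,282

          Pearson chi2(3) = 136.4094   Pr = 0.000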
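The counts in this table can be re-entered with the immediate command `tabi`, which makes it easy to add column percentages without the dataset. This is a sketch, with rows entered as obese = 0 then obese = 1.

```stata
* Rows: obese = 0, obese = 1; columns: education categories 1 to 4
* "column" adds column percentages, "chi2" the Pearson chi-squared test
tabi 3850 2978 1721 1229 \ 813 417 162 112, chi2 column
```

The column percentages for the obese row are roughly 17%, 12%, 9% and 8%, which shows the direction of the association that the chi-squared test alone does not.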
Bar Graphs
• It is under the graphics menu, the dialogue box…
Select Categories..
[Bar graph: mean of obese (vertical axis, 0 to .2) by education category (1 to 4)]
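A bar graph like this one can also be produced from the command line rather than the dialogue box. The sketch below assumes the `obese` and `educ` variables used earlier in the session.

```stata
* The mean of a 0/1 variable is the proportion with the value 1,
* so this plots the prevalence of obesity in each education category
graph bar (mean) obese, over(educ)
```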
Histograms, with “by”
• The pattern of obesity by education is different from that of mean BMI.
• Your Task: use the “by” subcommand with the histogram command to look at the distribution of BMI by education.
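One way to carry out this task from the command line is sketched here, assuming the variables are named `bmi` and `educ` as in the other commands in this session.

```stata
* One histogram panel of BMI per education category
histogram bmi, by(educ)
* Summary statistics to read alongside the panels
tabstat bmi, by(educ) statistics(n mean sd median)
```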
Does BMI Differ by Education?
• If we had two groups we’d use a t-test.
• Our null would be Mean(1) = Mean(2), or as Stata says: Mean(1) – Mean(2) = 0
• But we have > 2 groups, so could try to use ANOVA
– Can think of this test as an extension of the two group t-test
– Assumes normal distribution and equal variances (like the t-test it is “parametric”)
One-Way ANOVA
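Stata Warns of a Problem
. oneway bmi educ

                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      4893.83027      3   1631.27676     99.64     0.0000
 Within groups      184631.134  11278    16.370911
------------------------------------------------------------------------
    Total           189524.965  11281   16.8003692

Bartlett's test for equal variances:  chi2(3) = 195.1798  Prob>chi2 = 0.000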
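The small p-value on Bartlett's test suggests that the equal-variance assumption does not hold. A sketch of two follow-up commands that could be used to look at the group variances directly, assuming the same `bmi` and `educ` variables:

```stata
* Mean, standard deviation and count of BMI in each education category
tabulate educ, summarize(bmi)
* Levene-type tests of equal variances, which are less sensitive
* to non-normality than Bartlett's test
robvar bmi, by(educ)
```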
The Kruskal-Wallis Test
Kruskal-Wallis Output
. kwallis bmi, by(educ)

Kruskal-Wallis equality-of-populations rank test

    educ |    Obs    Rank Sum
  -------+--------------------
       1 |   4663    2.91e+07
       2 |   3395    1.83e+07
       3 |   1883    9.29e+06
       4 |   1341    6.93e+06

chi-squared =   294.007 with 3 d.f.
probability =     0.0001

chi-squared with ties =   294.008 with 3 d.f.
probability =     0.0001
Non-Parametric Tests
• Kruskal-Wallis and its 2 sample version (Wilcoxon Rank Sum Test) require that…
– The variable can be meaningfully ordered, and
– Has a roughly/loosely bell shaped frequency distribution (should have a central tendency)
• Your task: Repeat our analysis from last week in which we compared BMI in men and women, but use Kruskal-Wallis and Wilcoxon’s Rank Sum test.
– Do you get equivalent results?
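The two tests in this task can be run as follows. This is a sketch that assumes the `bmi` and `sex` variables used earlier in the session.

```stata
* Kruskal-Wallis test, here with only two groups
kwallis bmi, by(sex)
* Wilcoxon rank-sum (Mann-Whitney) test
ranksum bmi, by(sex)
```

`kwallis` reports a chi-squared statistic and `ranksum` a z statistic, so it is the p-values that should be compared.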
Comparing Proportions?
• Yes: Fisher’s Exact Test
• No: Parametric Assumptions?
– Yes: Multiple Groups? (Yes: ANOVA; No: t-test)
– No: Multiple Groups? (Yes: Kruskal-Wallis; No: Wilcoxon’s Rank Sum)
Prevalence of Diabetes
. tab diabetes

   diabetic |
        y/n |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     11,097       95.44       95.44
          1 |        530        4.56      100.00
------------+-----------------------------------
      Total |     11,627      100.00

Your Task: try this command: cii 11627 530, exact
(Does your estimate resemble what you get with ci diabetes, exact?)
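The immediate command takes the number of observations followed by the number of "successes". The sketch below uses the syntax shown on the slide, which applies to Stata 13 and earlier; the form for later versions is given as a comment.

```stata
* cii #obs #successes, with an exact binomial confidence interval
cii 11627 530, exact
* In Stata 14 and later the equivalent command is:
*   cii proportions 11627 530, exact
* Point estimate, for comparison with the Percent column of the table
display 530 / 11627
```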
The CI Calculator
The “CC” Calculator
The CC Calculator
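. cc diabetes prevchd

                                                         Proportion
                 |   Exposed   Unexposed  |      Total     Exposed
-----------------+------------------------+------------------------
           Cases |       103         427  |        530       0.1943
        Controls |       739       10358  |      11097       0.0666
-----------------+------------------------+------------------------
           Total |       842       10785  |      11627       0.0724
                 |                        |
                 |      Point estimate    |    [95% Conf. Interval]
                 |------------------------+------------------------
      Odds ratio |         3.380966       |     2.66545    4.25952 (exact)
 Attr. frac. ex. |         .7042266       |    .6248288   .7652318 (exact)
 Attr. frac. pop |         .1368591       |
                 +-------------------------------------------------
                               chi2(1) =   122.89  Pr>chi2 = 0.0000

Your Task: Try the cci command to obtain the OR
Your Task: Can you reproduce these CIs with an immediate command?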
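The immediate form, `cci`, takes the four cell counts of the case-control table in the order a, b, c, d. A sketch using the counts from the output above:

```stata
* cci #a #b #c #d
*   a = exposed cases,    b = unexposed cases
*   c = exposed controls, d = unexposed controls
cci 103 427 739 10358, exact
* Odds ratio by hand: (a*d) / (b*c)
display (103 * 10358) / (427 * 739)
```

The hand calculation gives about 3.38, in line with the odds ratio reported by `cc`.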
Diagnostic Test Metrics
• Sensitivity
• Specificity
• Positive Predictive Value
• Negative Predictive Value
Common Notation for Test Metrics
Formulas for Test Metrics…
Let’s make formulas for Se, Sp, PPV and NPV using this terminology.
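The notation table from the slide is not reproduced here, so the formulas below are written with an assumed and widely used convention: TP and FN are people with the disease who test positive and negative, and FP and TN are people without the disease who test positive and negative.

```latex
\begin{aligned}
\text{Se}  &= \frac{TP}{TP + FN} &\qquad \text{Sp}  &= \frac{TN}{TN + FP} \\[6pt]
\text{PPV} &= \frac{TP}{TP + FP} &\qquad \text{NPV} &= \frac{TN}{TN + FN}
\end{aligned}
```

Se and Sp are proportions among people grouped by true disease status, whereas PPV and NPV are proportions among people grouped by test result.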
(In Class) Assignment for Today
• Our database has random blood glucose (they call it “casual”)
• In these units (mg/dl) about 140 may be used as a cut-point for an “elevated” level
• Create a variable for “elevated” glucose and determine its Se, Sp, PPV and NPV as a diagnostic test for diabetes
• Calculate a confidence interval for each parameter.
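One way the assignment could be approached is sketched below. The variable name `glucose` is a placeholder (check the data dictionary for the real name), the choice of 140 or more rather than more than 140 is an assumption, and `diabetes` is taken to be coded 0/1 as in the earlier table. The `ci` syntax is the older form used elsewhere in these slides.

```stata
* "glucose" is a placeholder name; use the name given in the data dictionary
* Levels of 140 mg/dl or more are coded as elevated (the >= is an assumption)
generate elevated = (glucose >= 140) if !missing(glucose)
* 2x2 table of test result by disease status
tabulate elevated diabetes, row column
* Complements, so that every metric is the mean of a 0/1 variable
generate notelevated = 1 - elevated
generate nondiabetic = 1 - diabetes
* Stata 14 and later: use "ci proportions x" in place of "ci x, binomial"
* Sensitivity: proportion elevated among those with diabetes
ci elevated if diabetes == 1, binomial
* Specificity: proportion not elevated among those without diabetes
ci notelevated if diabetes == 0, binomial
* Positive predictive value: proportion with diabetes among the elevated
ci diabetes if elevated == 1, binomial
* Negative predictive value: proportion without diabetes among the not elevated
ci nondiabetic if elevated == 0, binomial
```

Each `ci` line reports the proportion together with an exact binomial 95% confidence interval.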