Date post: | 21-Dec-2015 |
HS 167 9: Inference About a Proportion
Inference about a proportion
Unit 9
Copyright ©1997 BMJ Publishing Group Ltd.
Greenhalgh, T. BMJ 1997;315:364-366
Our data analysis journey continues …
Types of response variables
Response type:
Quantitative: sums, averages
Categorical: counts, proportions
Prior chapters have focused on quantitative response variables. We now focus on categorical response
variables.
Binary variables
We focus on the most popular type of categorical response, the binary response (a categorical variable with two categories; also called a dichotomous variable).
Examples of binary responses:
CURRENT_SMOKER: yes/no
SEX: male/female
SURVIVED: yes/no
DISEASE_STATUS: case/non-case
One category is arbitrarily labeled a “success”.
Count the number of successes in the sample.
Turn the count into a proportion.
Proportions
$\hat{p} = \dfrac{\text{number of “successes”}}{n}$
$\hat{p}$ is to $p$ as $\bar{x}$ is to $\mu$
The proportion in the sample is denoted "p-hat"
The proportion in the population (parameter) is denoted p
A proportion is an average of 0s and 1s
Example (right): n = 10; X is a binary attribute coded 1 = YES and 0 = NO; $\sum x = 2$
Observation X
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 1
$\sum x = 2$
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{2}{10} = \text{proportion in sample} = \hat{p}$
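The identity above can be checked with a few lines of Python. This is a minimal sketch using the ten observations from the table; the variable names are illustrative only.

```python
# The ten observations from the table, coded 1 = YES and 0 = NO.
x = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]

n = len(x)
successes = sum(x)      # summing 0/1 values counts the successes
p_hat = successes / n   # sample proportion
x_bar = sum(x) / n      # arithmetic mean of the 0/1 values

# The mean of the 0/1 codes and the sample proportion are the same number.
print(successes, p_hat, x_bar)
```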
Incidence proportion and prevalence proportion
Incidence proportion (risk): proportion that develop the condition over a specified time
Prevalence: proportion with the condition at a point in time
Source www.bioteach.ubc.ca/Biomedicine/Smallpox/
Example: “Smoking prevalence”
SRS, n = 57; determine the number of current smokers (“successes”): X = 17
$\hat{p} = \dfrac{17}{57} = 0.2982$
The prevalence of smoking in the sample is 29.8%.
Calculations: carry at least 4 significant digits.
Reporting (APA 2001): convert to percent and report xx.x%.
Inference about the proportion
How good is the sample proportion (p-hat) as an estimate of the population proportion (p)?
To answer this question, consider what would happen if we took many samples of size n from the same population.
This creates the sampling distribution of p-hat.
$\hat{p}$ is to $p$ as $\bar{x}$ is to $\mu$
Binomial Sampling Distribution
The sampling distribution of the number of successes is binomial.
Binomial probabilities are difficult to calculate.
However, the binomial becomes approximately Normal when n is large (central limit theorem).
The figure to the right shows the number of smokers expected in a sample of n = 57 from a population in which p = 0.25. This distribution is binomial and also approximately Normal. We can use a Normal approximation to the binomial when n is large.
[Figure: distribution of the number of successes. Horizontal axis: Number of Successes (0 to 55). Vertical axis ticks: 0 to .14.]
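To see how close the two distributions are for n = 57 and p = 0.25, the sketch below compares exact binomial probabilities with a Normal approximation. The continuity correction (evaluating the Normal curve between k − 0.5 and k + 0.5) is an assumption added here for the comparison and is not part of the slide; the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 57, 0.25
q = 1 - p

def binom_pmf(k):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * q**(n - k)

# Normal curve with the same mean (np) and standard deviation (sqrt(npq)).
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

def normal_approx(k):
    """Normal area between k - 0.5 and k + 0.5 (continuity correction)."""
    return normal.cdf(k + 0.5) - normal.cdf(k - 0.5)

for k in (10, 14, 17, 20):
    print(k, round(binom_pmf(k), 4), round(normal_approx(k), 4))
```

Near the center of the distribution the two columns should agree closely, which is the practical justification for using Normal-based methods when n is large.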
Sampling Distribution of p-hat when n large
When n is large,
$\hat{p} \sim N(p, SE_{\hat{p}})$
where $SE_{\hat{p}} = \sqrt{\dfrac{pq}{n}}$
where $q = 1 - p$
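A small simulation can illustrate this result. The sketch below assumes p = 0.25 and n = 57 (the values used in the smoking illustration), draws 5000 simulated samples, and compares the standard deviation of the simulated p-hats with the formula. The seed and the number of replications are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

p, n = 0.25, 57
se_theory = sqrt(p * (1 - p) / n)   # SE of p-hat from the formula

random.seed(1)                      # arbitrary seed, for reproducibility only

def sample_p_hat():
    """Draw one SRS of size n and return its sample proportion."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

p_hats = [sample_p_hat() for _ in range(5000)]

# The mean of the p-hats should be near p, and their SD near se_theory.
print(round(se_theory, 4), round(mean(p_hats), 4), round(pstdev(p_hats), 4))
```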
Confidence interval for p (plus 4 method)
Take a SRS, count the successes and failures, add two imaginary successes and two imaginary failures to the statistics, put a tilde over these revised statistics:
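$\tilde{X} = X + 2$, $\tilde{n} = n + 4$, and $\tilde{p} = \dfrac{\tilde{X}}{\tilde{n}}$
Then calculate the CI according to this formula:
$\tilde{p} \pm z_{1-\alpha/2} \cdot SE_{\tilde{p}}$ where $SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}\tilde{q}}{\tilde{n}}}$ and $\tilde{q} = 1 - \tilde{p}$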
Example: 17 smokers in n = 57
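$\tilde{X} = X + 2 = 17 + 2 = 19$
$\tilde{n} = n + 4 = 57 + 4 = 61$
$\tilde{p} = \dfrac{19}{61} = .3115$
$\tilde{q} = 1 - .3115 = .6885$
$SE_{\tilde{p}} = \sqrt{\dfrac{\tilde{p}\tilde{q}}{\tilde{n}}} = \sqrt{\dfrac{(.3115)(.6885)}{61}} = .0593$
$z = 1.96$ for 95% confidence
95% CI for $p$ $= \tilde{p} \pm z \cdot SE_{\tilde{p}} = .3115 \pm (1.96)(.0593) = .3115 \pm .1162 = (.1953, .4277)$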
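The plus-four calculation can be sketched as a short Python function. The function name `plus_four_ci` and its parameters are hypothetical, chosen for illustration. It obtains z from the standard library's `NormalDist` rather than from a z table, so results may differ from the hand calculation in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist

def plus_four_ci(x, n, confidence=0.95):
    """Plus-four confidence interval for a proportion.

    x: observed number of successes; n: sample size.
    """
    x_tilde = x + 2                  # add two imaginary successes
    n_tilde = n + 4                  # two successes + two failures
    p_tilde = x_tilde / n_tilde
    q_tilde = 1 - p_tilde
    se = sqrt(p_tilde * q_tilde / n_tilde)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * se
    return p_tilde - margin, p_tilde + margin

# Smoking example: 17 smokers in a sample of 57.
lower, upper = plus_four_ci(17, 57)
print(round(lower, 4), round(upper, 4))
```

For the smoking example the limits should agree with the hand-calculated interval to about three decimal places.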
Sample size requirements
To estimate p with margin of error d use:
$n = \dfrac{z_{1-\alpha/2}^{2} \, p^{*} q^{*}}{d^{2}}$
where z is the Z value for the given level of confidence and p* is an educated guess for the proportion you want to estimate (use p* = 0.5 to get the “safest” estimate), with q* = 1 − p*.
Example: Redo smoking survey; now want 95% CI with margin of error ±.05; assume p* = 0.30 (best available estimate)
$n = \dfrac{(1.96)^{2}(.30)(.70)}{.05^{2}} = 322.7 \rightarrow 323$
Redo study but now want margin of error of ±.03
$n = \dfrac{(1.96)^{2}(.30)(.70)}{.03^{2}} = 896.4 \rightarrow 897$
Sample size calculations are always rounded up.
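The sample size formula, including the round-up rule, can be sketched as follows. The function name `sample_size` and its defaults are hypothetical; z = 1.96 is assumed for 95% confidence, and p* defaults to the “safest” value of 0.5.

```python
from math import ceil

def sample_size(d, p_star=0.5, z=1.96):
    """Sample size needed to estimate p with margin of error d.

    p_star is the educated guess for p; z is the critical value
    for the desired confidence level (1.96 for 95%).
    """
    q_star = 1 - p_star
    n = z**2 * p_star * q_star / d**2
    return ceil(n)   # sample sizes are always rounded up

print(sample_size(0.05, p_star=0.30))  # margin of error ±.05
print(sample_size(0.03, p_star=0.30))  # margin of error ±.03
print(sample_size(0.05))               # "safest" guess p* = 0.5
```

Tightening the margin of error from ±.05 to ±.03 nearly triples the required sample size because d enters the formula squared.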
Hypothesis Test
A. H0: p = p0 vs. H1: p ≠ p0
where p0 represents the proportion specified by the null hypothesis
B. Test statistic
$z_{\text{stat}} = \dfrac{\hat{p} - p_0}{SE_{\hat{p}}}$ where $SE_{\hat{p}} = \sqrt{\dfrac{p_0 q_0}{n}}$
C. P-value (from z table)
D. Significance level
Illustration: Prevalence of smoking in the U.S. (p0) is 0.25. Take a SRS of n = 57 from a community and find 17 smokers. Therefore, p-hat = 17 / 57 = 0.2982. Is this significantly different from 0.25?
A. H0: p = 0.25 vs. H1: p ≠ 0.25
B. Test statistic
$SE_{\hat{p}} = \sqrt{\dfrac{p_0 q_0}{n}} = \sqrt{\dfrac{(.25)(.75)}{57}} = 0.0574$
$z_{\text{stat}} = \dfrac{\hat{p} - p_0}{SE_{\hat{p}}} = \dfrac{.2982 - .25}{.0574} = 0.84$
C. P = 0.4010
D. Evidence against H0 is not significant (retain H0)
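The four steps of the test can be sketched in Python. The function name `one_sample_z_test` is hypothetical. The P-value is computed as a two-sided area from the standard library's `NormalDist` using the unrounded z statistic, so it can differ slightly from a z-table lookup at z = 0.84.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x, n, p0):
    """Two-sided z test of H0: p = p0 given x successes in n trials."""
    p_hat = x / n
    se = sqrt(p0 * (1 - p0) / n)     # SE uses the null value p0, not p-hat
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Smoking illustration: 17 smokers in n = 57, null value p0 = 0.25.
z_stat, p_value = one_sample_z_test(17, 57, 0.25)
print(round(z_stat, 2), round(p_value, 2))
```

Note that the standard error here is based on p0 and q0, in contrast to the confidence interval, where it is based on the plus-four estimate.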
Conditions for Inference
Valid information
SRS
To use Normal-based methods:
For the plus-four confidence interval, n must be 10 or greater
For the z test, $n p_0 q_0 \ge 5$
Illustration: n = 57, p0 = 0.25, q0 = 0.75. Therefore, np0q0 = 57 ∙ 0.25 ∙ 0.75 = 10.7 → “OK”
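The two sample-size conditions can be written as simple checks. This is a minimal sketch; the function names are hypothetical. It covers only the numeric conditions, not the requirements of valid information and simple random sampling.

```python
def plus_four_ok(n):
    """Plus-four confidence interval: n must be 10 or greater."""
    return n >= 10

def z_test_ok(n, p0):
    """z test: n * p0 * q0 must be at least 5."""
    return n * p0 * (1 - p0) >= 5

# Smoking illustration: n = 57, p0 = 0.25.
print(plus_four_ok(57), z_test_ok(57, 0.25), round(57 * 0.25 * 0.75, 1))
```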