Date post: | 20-Nov-2014 |
Upload: | bianca-evangelista |
SAMPLE SIZE CALCULATION
Melchor V.G. Frias, IV
Clinical Epidemiology Unit
Angelo King Medical Research Center
De La Salle Health Sciences Institute
Learning Objectives:
At the end of this session, learners should be able to:
1. Explain the concept/importance of sample size,
2. Explain and apply the concept of hypothesis testing,
3. Apply sample size formulas for descriptive and analytic studies,
4. Identify the requirements for sample size calculation,
5. Apply OPEN EPI/EPIINFO for sample size calculation for cross-sectional, cohort, case-control and experimental studies.
How many subjects are to be included in the sample?
SAMPLE SIZE CALCULATION
Why calculate?
• for planning purposes
• for the "power" of the study (a study with low power has little chance of giving a statistically significant difference)
• for meaningful results (with a small sample, a non-significant result fails to establish that the intervention has no appreciable effect)
How do we calculate sample size?
♦ Using formulas
♦ Using tables of sample sizes
♦ Using statistical calculators (StatCalc of EpiInfo, Open EPI)
Sample size calculation
Things to know:
• type of study: descriptive or analytic?
• proportions or means?
• usual values?
• amount of deviation from the true value?
• clinically important difference?
• confidence level?
• power?
• one-tailed or two-tailed hypotheses?
Hypotheses testing
The first thing to do when given a claim is to write the claim mathematically (if possible), and decide whether the given claim is the null or alternative hypothesis.
Hypotheses testing
If the given claim contains equality, or a statement of no change from the given or accepted condition, then it is the null hypothesis; otherwise, if it represents change, it is the alternative hypothesis.
Hypotheses testing
• hypothesis -- a statement about the population
• null hypothesis (Ho) -- equality
• alternative hypothesis (Ha) --
   two-tailed -- not equal
   one-tailed -- one is greater than the other
Hypotheses testing
"He's dead, Jim," said Dr. McCoy to Captain Kirk.
Hypotheses testing
Mr. Spock, as the science officer, is put in charge of statistically determining the correctness of Bones' statement and deciding the fate of the crew member (to vaporize or to try to revive).
Hypotheses testing
• His first step is to arrive at the hypothesis to be tested.
• Does the statement represent a change in the previous condition?
   Yes, there is change; thus it is the alternative hypothesis, H1.
   No, there is no change; therefore it is the null hypothesis, H0.
Hypotheses testing
The correct answer is that there is change. "Dead" represents a change from the accepted state of "alive". The null hypothesis always represents no change. Therefore, the hypotheses are:
H0 : Patient is alive.
H1 : Patient is not alive (dead).
Hypotheses testing
Possible states of nature (Based on H0)
Patient is alive (H0 true - H1 false )
Patient is dead (H0 false - H1 true)
Hypotheses testing
Decisions are something that you have control over. You may make a correct decision or an incorrect decision. It depends on the state of nature as to whether your decision is correct or in error.
Hypotheses testing
Possible decisions (based on H0) / conclusions (based on claim)
Reject H0 / "Sufficient evidence to say patient is dead"
Fail to Reject H0 / "Insufficient evidence to say patient is dead"
Hypotheses testing
There are four possibilities that can occur based on the two possible states of nature and the two decisions which we can make.
Hypotheses testing
Statisticians never accept the null hypothesis; we only fail to reject it. In other words, we will say that it isn't true, or that we don't have enough evidence to say that it isn't, but we will never say that it is, because someone else might come along with another sample which shows that it isn't, and we don't want to be wrong.
Hypotheses testing - Statistically speaking:
                    State of Nature
Decision          | H0 True                                          | H0 False
Reject H0         | Patient is alive, sufficient evidence of death   | Patient is dead, sufficient evidence of death
Fail to reject H0 | Patient is alive, insufficient evidence of death | Patient is dead, insufficient evidence of death
Hypotheses testing - In English (or Klingon?)
                    State of Nature
Decision          | H0 True                     | H0 False
Reject H0         | Vaporize a live person      | Vaporize a dead person
Fail to reject H0 | Try to revive a live person | Try to revive a dead person
Hypotheses testing - Were you right?
                    State of Nature
Decision          | H0 True              | H0 False
Reject H0         | Type I Error (alpha) | Correct Assessment
Fail to reject H0 | Correct Assessment   | Type II Error (beta)
Hypotheses testing
Which of the two errors is more serious? Type I or Type II?
Hypotheses testing
Which of the two errors is more serious? Type I or Type II?
                    State of Nature
Decision          | H0 True                                                                | H0 False
Reject H0         | Patient is alive, sufficient evidence of death: vaporize a live person | Correct Assessment
Fail to reject H0 | Correct Assessment                                                     | Patient is dead, insufficient evidence of death: revive a dead person
Hypotheses testing
                    Disease actually present
Diagnosis         | No                | Yes
Disease present   | Mis-diagnosis     | Correct diagnosis
Disease absent    | Correct diagnosis | Missed diagnosis
Hypotheses testing
                        Assumption of innocence
Judgment              | True                      | False
Pronounced guilty     | Serious error in judgment | Correct judgment
Pronounced not guilty | Correct judgment          | Error in judgment
Hypotheses testing
Since Type I is (usually) the more serious error, that is the one we concentrate on.
We usually pick alpha to be very small (0.05, 0.01).
Note: alpha is not a Type I error. Alpha is the probability of committing a Type I error. Likewise, beta is the probability of committing a Type II error.
Hypotheses testing
Conclusions
Conclusions are sentence answers which include whether there is enough evidence or not (based on the decision), the level of significance, and whether the original claim is supported or rejected.
Hypotheses testing
Conclusions
Conclusions are based on the original claim, which may be the null or the alternative hypothesis. The decisions are always based on the null hypothesis.
Hypotheses testing - Conclusions
Decision: Reject H0 ("SUFFICIENT")
   Original claim is H0 ("REJECT"): There is sufficient evidence at the alpha level of significance to reject the claim that (insert original claim here).
   Original claim is H1 ("SUPPORT"): There is sufficient evidence at the alpha level of significance to support the claim that (insert original claim here).
Decision: Fail to reject H0 ("INSUFFICIENT")
   Original claim is H0 ("REJECT"): There is insufficient evidence at the alpha level of significance to reject the claim that (insert original claim here).
   Original claim is H1 ("SUPPORT"): There is insufficient evidence at the alpha level of significance to support the claim that (insert original claim here).
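The wording of these conclusions follows a fixed pattern, which can be sketched in Python. This is only an illustration: the function name `conclusion` and its parameters are hypothetical and do not come from the slides.

```python
def conclusion(reject_h0: bool, claim_is_null: bool, alpha: float, claim: str) -> str:
    """Build the conclusion sentence from the decision and the original claim."""
    # Rejecting H0 means the evidence is sufficient; failing to reject means it is not.
    evidence = "sufficient" if reject_h0 else "insufficient"
    # A claim that is H0 can only be rejected; a claim that is H1 can only be supported.
    verb = "reject" if claim_is_null else "support"
    return (f"There is {evidence} evidence at the {alpha} level of "
            f"significance to {verb} the claim that {claim}.")


# Star Trek example: the claim "the patient is dead" is H1, and H0 is rejected.
print(conclusion(True, False, 0.05, "the patient is dead"))
```

Note that the decision alone fixes "sufficient" or "insufficient", while the type of claim alone fixes "reject" or "support".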
Definitions
Null Hypothesis (H0)
• Statement of zero or no change.
• If the original claim includes equality (<=, =, or >=), it is the null hypothesis.
• If the original claim does not include equality (<, not equal, >), then the null hypothesis is the complement of the original claim.
• The null hypothesis always includes the equal sign.
• The decision is based on the null hypothesis.
Definitions
Alternative Hypothesis (H1 or Ha)
• Statement which is true if the null hypothesis is false.
• The type of test (left, right, or two-tail) is based on the alternative hypothesis.
Definitions
One-Tailed (Sided) Test
Definitions
Two-Tailed (Sided) Test
Definitions
Type I error
• Rejecting the null hypothesis when it is true (saying false when true).
• Usually the more serious error.
Type II error
• Failing to reject the null hypothesis when it is false (saying true when false).
Definitions
• alpha (α) - probability of committing a Type I error
• 1-α - the confidence level
• beta (β) - probability of committing a Type II error
• 1-β - power of the study; ability to detect a true difference
Definitions
Significance level (alpha)
• The probability of rejecting the null hypothesis when it is true.
• alpha = 0.05 and alpha = 0.01 are common.
• If no level of significance is given, use alpha = 0.05.
• The level of significance is the complement of the level of confidence in estimation.
Confidence level, Power
Usual values:
α = 0.05, 1-α (confidence level) = 0.95
β = 0.20, 1-β (power) = 0.80
Confidence level, Power
The easiest ways to increase power are to:
• increase the sample size
• increase the desired difference (or effect size)
• relax the significance level required (use a larger alpha), e.g. 10% instead of 5%
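The effect of these three choices can be illustrated with a short Python sketch. It uses the normal approximation for a two-tailed comparison of two means, which is an assumption of this example and not a formula given on this slide; the function name `approx_power` and the numbers are hypothetical.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, difference: float, sd: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed comparison of two means (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)             # upper alpha/2 point
    standard_error = sd * (2 / n_per_group) ** 0.5  # SE of the difference in means
    # Probability of rejecting H0 in the direction of the true difference.
    return nd.cdf(difference / standard_error - z_alpha)


base = approx_power(50, 5, 10, alpha=0.05)
print(f"baseline (n=50, diff=5, alpha=0.05): {base:.2f}")
print(f"larger sample (n=100):               {approx_power(100, 5, 10):.2f}")
print(f"larger difference (diff=8):          {approx_power(50, 8, 10):.2f}")
print(f"alpha relaxed to 0.10:               {approx_power(50, 5, 10, alpha=0.10):.2f}")
```

Each of the three changes raises the power above the baseline, at the cost of more subjects, a less sensitive study, or a higher Type I error rate respectively.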
Definitions
Decision
• A statement based upon the null hypothesis.
• It is either "reject the null hypothesis" or "fail to reject the null hypothesis".
• We will never accept the null hypothesis.
Definitions
Conclusion
• A statement which indicates the level of evidence (sufficient or insufficient), at what level of significance, and whether the original claim is rejected (null) or supported (alternative).
How do we calculate sample size?
A.J. Dobson's formula (SIMPLE RANDOM SAMPLE)
• descriptive studies
   population proportion
   population mean
• analytic studies
   comparing two proportions
   comparing two means
Sample size for descriptive studies
1. Estimation of a population proportion
$$ n = \frac{p\,(100 - p)}{\Delta^{2}} \times f(1-\alpha) $$
where
n = computed sample size
p = estimate of the proportion (in percent)
Δ = the desired half-width of the confidence interval (in percentage points)
1-α = confidence level
Sample size for descriptive studies
1. Estimation of a population proportion
Table 1. Values of f(1-α) for various confidence levels 100(1-α)%
(1-α)   | 0.80  | 0.90  | 0.95  | 0.99
f(1-α)* | 1.642 | 2.706 | 3.842 | 6.635
* f(1-α) is the square of the upper α/2 point of the standard normal distribution
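The footnote can be checked numerically. The sketch below recomputes the table values with Python's standard library; the function name `f_confidence` is made up for this illustration.

```python
from statistics import NormalDist


def f_confidence(confidence: float) -> float:
    """Square of the upper alpha/2 point of the standard normal distribution."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.2f}  {f_confidence(level):.3f}")
```

The computed values agree with Table 1 up to rounding in the third decimal place.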
Sample size for descriptive studies
1. Estimation of a population proportion
A researcher wants to estimate the smoking prevalence in high school students. What is the sample size if it is expected that the smoking prevalence is 15%, and a 95% confidence interval will be used for an interval of 4% (11-19%)?
$$ n = \frac{p\,(100 - p)}{\Delta^{2}} \times f(1-\alpha) $$
Sample size for descriptive studies
1. Estimation of a population proportion
Solution, with p = 15, Δ = 4 and f(0.95) = 3.842:
$$ n = \frac{15\,(100 - 15)}{4^{2}} \times 3.842 \approx 306 $$
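The same calculation can be sketched in Python. The function name `n_for_proportion` and its parameter names are hypothetical, and f(1-α) is computed from the normal distribution instead of being read from Table 1.

```python
from statistics import NormalDist


def n_for_proportion(p_percent: float, half_width_percent: float, confidence: float = 0.95) -> float:
    """Sample size for estimating a proportion, with p and the half-width in percent."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper alpha/2 point
    return p_percent * (100 - p_percent) / half_width_percent ** 2 * z ** 2


# Smoking prevalence example: p = 15%, interval of +/- 4%, 95% confidence.
n = n_for_proportion(15, 4, 0.95)
print(round(n))
```

The result is rounded here to match the slide; in practice the computed value is often rounded up to the next whole subject.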
Sample size for descriptive studies
2. Estimation of a population mean
$$ n = \frac{s^{2}}{\Delta^{2}} \times f(1-\alpha) $$
where
n = computed sample size
s = estimate of the standard deviation of the observations
Δ = the desired half-width of the confidence interval
1-α = confidence level
Sample size for descriptive studies
2. Estimation of a population mean
A researcher wants to estimate the mean serum cholesterol level (mg/100 ml) in a group of men. How many men should be included if he wants to be 90% confident that the estimate of the mean will fall within 10 mg/100 ml of the true value and the standard deviation is estimated to be 40 mg/100 ml?
$$ n = \frac{s^{2}}{\Delta^{2}} \times f(1-\alpha) $$
Sample size for descriptive studies
2. Estimation of a population mean
Solution, with s = 40, Δ = 10 and f(0.90) = 2.706:
$$ n = \frac{40^{2}}{10^{2}} \times 2.706 \approx 43 $$
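A minimal Python sketch of this calculation follows. The function name `n_for_mean` and its parameter names are hypothetical, and f(1-α) is computed from the normal distribution instead of being read from Table 1.

```python
from statistics import NormalDist


def n_for_mean(sd: float, half_width: float, confidence: float = 0.95) -> float:
    """Sample size for estimating a mean to within +/- half_width."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # upper alpha/2 point
    return sd ** 2 / half_width ** 2 * z ** 2


# Cholesterol example: s = 40 mg/100 ml, within 10 mg/100 ml, 90% confidence.
n = n_for_mean(40, 10, 0.90)
print(round(n))
```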
Sample size for analytic studies
1. Hypothesis testing between two proportions
$$ n = \frac{p_1\,(100 - p_1) + p_2\,(100 - p_2)}{(p_1 - p_2)^{2}} \times f(\alpha,\beta) $$
where
n = computed sample size (in each group)
p1, p2 = estimates of the proportion for each group (in percent)
1-α = confidence level
1-β = power of the test
Sample size for analytic studies
1. Hypothesis testing between two proportions
Table 2. Values of f(α,β)*
             Significance level, α
             one-tailed      two-tailed
Power, 1-β | 0.05 | 0.01  | 0.05  | 0.01
0.5        | 2.71 | 5.41  | 3.84  | 6.63
0.8        | 6.18 | 10.04 | 7.85  | 11.68
0.9        | 8.56 | 13.02 | 10.51 | 14.88
* f(α,β) is the square of the sum of the upper β point and the upper α point (for a one-tailed test) or upper α/2 point (for a two-tailed test) of the standard normal distribution
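The footnote can be turned into a few lines of Python that recompute the table. The function name `f_alpha_beta` and its parameters are hypothetical names used only for this sketch.

```python
from statistics import NormalDist


def f_alpha_beta(alpha: float, power: float, two_tailed: bool = True) -> float:
    """Square of the sum of the alpha point and the beta point of the standard normal."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (alpha / 2 if two_tailed else alpha))
    z_beta = nd.inv_cdf(power)  # upper beta point, since power = 1 - beta
    return (z_alpha + z_beta) ** 2


for power in (0.5, 0.8, 0.9):
    row = [f_alpha_beta(a, power, two_tailed=t)
           for t in (False, True) for a in (0.05, 0.01)]
    print(power, " ".join(f"{value:6.2f}" for value in row))
```

Each printed row lists the one-tailed values for α = 0.05 and 0.01, followed by the two-tailed values for the same levels.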
Sample size for analytic studies
1. Hypothesis testing between two proportions
A new antibiotic is to be compared to a standard drug with respect to cure rate of urinary tract infection. The new drug will be considered better than the standard drug if it shows a 5% difference from the cure rate of 80%. How many patients are needed if the investigator wants 90% power and 95% confidence?
$$ n = \frac{p_1\,(100 - p_1) + p_2\,(100 - p_2)}{(p_1 - p_2)^{2}} \times f(\alpha,\beta) $$
Sample size for analytic studies
1. Hypothesis testing between two proportions
Solution, with p1 = 80, p2 = 85 and f(α,β) = 8.56 (one-tailed, α = 0.05, power = 0.90):
$$ n = \frac{80\,(100 - 80) + 85\,(100 - 85)}{(80 - 85)^{2}} \times 8.56 \approx 984 $$
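As a sketch, the formula can be written as a small Python function that takes the tabulated f(α,β) as an input. The function name `n_two_proportions` is hypothetical.

```python
def n_two_proportions(p1: float, p2: float, f: float) -> float:
    """Sample size per group for comparing two proportions given in percent."""
    return (p1 * (100 - p1) + p2 * (100 - p2)) / (p1 - p2) ** 2 * f


# Antibiotic example: 80% vs 85% cure rate,
# f = 8.56 from Table 2 (one-tailed, alpha = 0.05, power = 0.90).
n = n_two_proportions(80, 85, 8.56)
print(round(n))
```

Using an unrounded f(α,β) instead of the two-decimal table value changes the result by less than one subject.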
Sample size for analytic studies
2. Hypothesis testing between two means
$$ n = \frac{2\,s^{2}}{\Delta^{2}} \times f(\alpha,\beta) $$
where
n = computed sample size (in each group)
s = estimate of the standard deviation of the observations, assuming it is the same for each group
Δ = the true difference between the means
1-α = confidence level
1-β = power
Sample size for analytic studies
2. Hypothesis testing between two means
To determine whether an antihypertension therapy can reduce the average blood pressure of some group by 5 mmHg when the standard deviation is 10 mmHg, how many patients are needed for a two-tailed test at the 5% significance level, and power of 90%?
$$ n = \frac{2\,s^{2}}{\Delta^{2}} \times f(\alpha,\beta) $$
Sample size for analytic studies
2. Hypothesis testing between two means
Solution, with s = 10, Δ = 5 and f(α,β) = 10.51 (two-tailed, α = 0.05, power = 0.90):
$$ n = \frac{2\,(10)^{2}}{5^{2}} \times 10.51 \approx 84 $$
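The calculation can be sketched in Python as follows. The function name `n_two_means` is hypothetical, and the tabulated f(α,β) is passed in as an input.

```python
def n_two_means(sd: float, difference: float, f: float) -> float:
    """Sample size per group for comparing two means with a common standard deviation."""
    return 2 * sd ** 2 / difference ** 2 * f


# Blood pressure example: s = 10 mmHg, difference = 5 mmHg,
# f = 10.51 from Table 2 (two-tailed, alpha = 0.05, power = 0.90).
n = n_two_means(10, 5, 10.51)
print(round(n))
```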
Sample size calculation using EPI-Info6
http://www.cdc.gov/epiinfo/Epi6/ei6.htm
STATCALC program
Sample size for analytic studies
2. Hypothesis testing between two means
Two antianemia treatment groups are to be compared in terms of hemoglobin level after treatment. What is the sample size needed if the expected mean hemoglobin (hgb) level after treatment for group A is 132.86 with a standard deviation of 15.34, and the mean hemoglobin level for group B is 127.44 with a standard deviation of 18.23?
http://www.openepi.com/Menu/OpenEpiMenu.htm
Sample size for analytic studies
Case Control Study
Research question: Is there an association between receiving HRT and development of breast CA among women in Dasmarinas, Cavite?
Odds of exposure among diseased = 175/75 = 2.3
Odds of exposure among non-diseased = 25/225 = 0.11
Odds Ratio = 21
You need to have an estimate of the percentage of exposure among the controls and either the odds ratio or the percentage of exposure among cases.
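The odds ratio quoted above can be reproduced with a few lines of Python. The function name `odds_ratio` and its argument names are hypothetical; the counts are read from the two odds on the slide (175 exposed and 75 unexposed among the diseased, 25 exposed and 225 unexposed among the non-diseased).

```python
def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


print(round(odds_ratio(175, 75, 25, 225), 1))
```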
Sample size for analytic studies
Cohort Study
Research question: Is Hib vaccine associated with the development of leukemia among children in Dasmarinas, Cavite?
Incidence of disease among exposed = 150/500 = 0.3
Incidence of disease among unexposed = 400/500 = 0.8
Relative Risk = 0.375
You need to know the percentage of outcome among the unexposed, and either an OR, RR or the percentage of the outcome among the exposed.
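The relative risk quoted above can be checked the same way. The function name `relative_risk` and its argument names are hypothetical; the counts are those shown on the slide.

```python
def relative_risk(cases_exposed: int, total_exposed: int,
                  cases_unexposed: int, total_unexposed: int) -> float:
    """Incidence among the exposed divided by incidence among the unexposed."""
    incidence_exposed = cases_exposed / total_exposed
    incidence_unexposed = cases_unexposed / total_unexposed
    return incidence_exposed / incidence_unexposed


print(round(relative_risk(150, 500, 400, 500), 3))
```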
Calculate sample size: RCT
Example: Efficacy of flubendazole compared to mebendazole in the treatment of trichuriasis among pediatric patients.
Objective: To compare resolution of trichuriasis in pediatric patients given flubendazole and those given mebendazole.
Flubendazole group (Exposed): (+) resolution / (-) resolution
Mebendazole group (Unexposed): (+) resolution / (-) resolution
Calculate sample size: RCT
Expected resolution in each group:
Flubendazole group (Exposed): 75% with resolution
Mebendazole group (Unexposed): 50% with resolution
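As a rough sketch, the two-proportion formula from the analytic studies section can be applied to these percentages. The slide does not state the significance level or power, so the usual values (two-tailed α = 0.05, power = 0.80, f(α,β) = 7.85 from Table 2) are assumed here; the function name `n_two_proportions` is hypothetical.

```python
import math


def n_two_proportions(p1: float, p2: float, f: float) -> float:
    """Sample size per group for comparing two proportions given in percent."""
    return (p1 * (100 - p1) + p2 * (100 - p2)) / (p1 - p2) ** 2 * f


# Assumed, not given on the slide: two-tailed alpha = 0.05 and power = 0.80.
F_ASSUMED = 7.85

n = n_two_proportions(75, 50, F_ASSUMED)
print(math.ceil(n))  # patients per group, rounded up
```

A calculator such as OpenEpi may report somewhat different numbers, since it can apply other formulas or corrections.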
General comments on estimation of sample size
• Compute the sample size as early as possible during the design phase, to estimate the resources required and the feasibility of the study.
• The rarer the condition being investigated, the larger the sample size, all other things being equal.
• Complex data analysis generally requires larger samples than simple analysis.
• In general, longitudinal studies require a larger sample size than case-control and cross-sectional studies.
General comments on estimation of sample size
• The higher the level of accuracy and precision desired for the resulting estimates, the larger the sample size necessary.
• When more than one item or outcome is to be studied, sample sizes are estimated separately for each item. The final sample size will be a compromise between the largest n and the resources available to conduct the study.
Summary
Explained the concept/importance of sample size,
Explained and applied the concept of hypothesis testing,
Applied sample size formulas for descriptive and analytic studies,
Identified the requirements for sample size calculation,
Introduced OPEN EPI/EPIINFO for application in sample size calculation for cross-sectional, cohort, case-control and experimental studies.
Summary
Statistical inference allows us to generalize sample results to the target population.
Sample size is based on:
• the research objectives/design
• sample estimates and variability from previous studies
• power, level of confidence
• operational constraints (time, resources)