Determining the Size of a Sample
Kajal Srivastava, JR-3, SPM Deptt., S.N. Medical College
• Sample definition
• Characteristics of an ideal sample
• Terminologies
• C.I. method of calculating sample size
• Formulas
• Other methods
Contents
A finite set of objects drawn from the population with an aim is called a sample.
Probability sampling
Aim
What is a sample
Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results.
Why determine sample size
True representative
Precision
Unbiased character
Characteristics of an Ideal Sample
Study design
Types of outcome measure
Guess at likely result
Required level of significance
Required precision / power
Before calculating sample size one has to decide on the following:
Types of outcome measures: proportions, rates and means.
Sampling error: an estimate of an outcome measure calculated in an intervention study is subject to sampling error, because it is based on a sample of individuals and not on the whole population of interest.
The confidence interval is a range of plausible values for the true value of the outcome measure.
It is conventional to quote the 95% confidence interval (also called the 95% confidence limits).
There are 95 chances out of 100 that the true value lies within this range.
About 95% of the estimates from samples drawn from a population fall within ± 1.96 × standard error of the true value.
Confidence Interval
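As an illustration of the ± 1.96 × standard error rule, here is a minimal sketch of a 95% confidence interval for a proportion. It assumes the usual normal approximation; the function name and the example figures (40 events among 200 subjects) are hypothetical and not taken from the slides.

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a sample proportion."""
    p = successes / n
    # z is about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # standard error of the sample proportion
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Hypothetical example: 40 events among 200 subjects
low, high = proportion_ci(successes=40, n=200)
print(f"95% CI: {low:.3f} to {high:.3f}")
```

A larger n shrinks the standard error and therefore narrows the interval, which is the link between sample size and precision used in the rest of the deck.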
It is appropriate to test a specific hypothesis about the outcome measure.
Null hypothesis
Significance tests
Determine the P value (probability value), or ‘significance’, of the results.
Significance tests & P value
Type I error (α): rejecting a null hypothesis when it is true.
The probability of this error is the level of significance, α, conventionally set at 0.05. The P value is the probability, if the null hypothesis is true, of obtaining a result at least as extreme as the one observed. When the P value is below this threshold, the null hypothesis is rejected and the groups are concluded to differ.
Type II error (β): failing to reject the null hypothesis when it is actually false. The complement of the probability of a type II error is the statistical power (1-β). Thus the power of a statistical test is the probability of correctly rejecting a null hypothesis when it is false.
Types of error
Power of the study is the probability of finding a statistically significant difference between the two groups when a true difference exists.
The power of a study depends on:
1. The value of the true difference between the study groups (effect size). The greater the effect, the higher the power to detect it as statistically significant for a study of a given size.
2. The study size. The larger the study, the higher the power.
3. The probability level at which a difference will be regarded as ‘statistically significant’.
Power of study
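The three dependencies listed above can be checked numerically. The following is a minimal sketch, assuming a two-sided comparison of two proportions with equal group sizes and the normal approximation (the small contribution of the opposite tail is ignored). The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power to detect a difference between two proportions."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # standard error of the difference under the null hypothesis (pooled)
    se_null = math.sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # standard error of the difference under the alternative hypothesis
    se_alt = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return nd.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

# Hypothetical example: 30% vs 50%
print(power_two_proportions(0.30, 0.50, n_per_group=50))   # smaller study
print(power_two_proportions(0.30, 0.50, n_per_group=100))  # larger study, higher power
print(power_two_proportions(0.30, 0.60, n_per_group=50))   # larger effect, higher power
```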
Two-tailed: the hypothesis is that the sample statistic differs from the population parameter in either direction, lesser or greater (i.e. in both directions).
One-tailed: the hypothesis is that the sample statistic differs from the population parameter in one specified direction only, either lesser or greater (i.e. in a single direction).
One sided and Two sided tests
Usually, it is more important to estimate the effect of the intervention and to specify a confidence interval around the estimate to indicate the likely range, than to test a specific hypothesis.
Therefore, in many situations it may be more appropriate to choose the sample size by setting the width of the confidence interval, rather than to rely on power calculations.
Confidence interval approach: applies the concepts of accuracy, variability, and confidence interval to create a “correct” sample size
• Variability: refers to how similar or dissimilar responses are to a given question
• P (%): share that “have” or “are” or “will do” etc.
• Q (%): 100%-P%, share of “have nots” or “are nots” or “won’t dos” etc.
With Nominal data (i.e. Yes, No), we can conceptualize answer variability with bar charts…the highest variability is 50/50
If we conducted our study over and over, e.g. 1,000 times, we would expect our result to fall within a known range (± 1.96 s.d. of the mean). Based upon this, there are 95 chances in 100 that the true value of the universe statistic (proportion, share, mean) falls within this range.
The Confidence Interval Method of Determining Sample Size
Normal Distribution
± 1.96 × s.d. defines the endpoints for the central 95% of the distribution
The level of confidence we desire that our results be repeated within some known range if we were to conduct the study again, and…
the variability (in responses) in the population and…
the amount of acceptable sample error (desired accuracy) we wish to have and…
the size of the sample.
There is a relationship among:
The reduced power or precision resulting from losses may be avoided by increasing the initial sample size in order to compensate for the expected number of losses.
A 5-20% allowance is generally considered appropriate.
Practical constraints
Allowance for losses
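A minimal sketch of the allowance calculation follows. It assumes one common convention, dividing the calculated size by (1 - expected loss fraction), so that the number remaining after losses still meets the target; some texts instead simply multiply by (1 + loss fraction). The function name and figures are hypothetical.

```python
import math

def inflate_for_losses(n_required, expected_loss):
    """Initial sample size needed so that n_required subjects remain after losses.

    expected_loss is a fraction, e.g. 0.10 for a 10% allowance.
    """
    if not 0 <= expected_loss < 1:
        raise ValueError("expected_loss must be a fraction in [0, 1)")
    return math.ceil(n_required / (1 - expected_loss))

# Hypothetical example: 200 subjects needed, 10% expected losses
print(inflate_for_losses(200, 0.10))
```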
If the sample is small the confidence interval will be very wide and even though it will probably include the null value, it will extend to include large values of the effect measure.
In other words, the study will have failed to establish that the intervention has no appreciable effect.
In case the intervention does have an appreciable effect, a study that is too small will have low power i.e. it will have little chance of giving a statistically significant difference.
Consequences of studies that are too small
Estimating a population proportion: With specified absolute precision
Required information:
- Anticipated population proportion: P (a rough estimate of P is sufficient)
- Desired confidence level
- Absolute precision: d, the total percentage points of error that can be tolerated on each side of the figure obtained
FORMULAE FOR SAMPLE SIZE ESTIMATION
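The formula for a single proportion with absolute precision is not reproduced in the text. A minimal sketch, assuming the standard normal-approximation form n = z² P (1 - P) / d², with P and d expressed as fractions rather than percentages; the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def n_proportion_absolute(p, d, confidence=0.95):
    """Sample size to estimate a proportion p to within +/- d (absolute)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return math.ceil(z**2 * p * (1 - p) / d**2)

# Hypothetical example: P unknown, so the most conservative P = 0.5 is used,
# with 5 percentage points of absolute precision at 95% confidence
print(n_proportion_absolute(p=0.5, d=0.05))
```

Using P = 0.5 gives the largest n for a given d, because P (1 - P) is greatest when variability is 50/50.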
The estimated sample size is applicable only to simple random sampling (SRS).
If another sampling method is used, a larger sample size is likely to be needed because of the design effect.
For a cluster sampling strategy, the sample size estimated as above is multiplied by the design effect, which is defined as the ratio of the variance obtained in the cluster survey to the variance for the same sample size under SRS.
In cluster sampling, a design effect of 2 is commonly assumed.
This means twice as many individuals would have to be studied to obtain the same precision as with SRS.
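The design effect adjustment amounts to a ratio and a multiplication. A minimal sketch, with hypothetical function names and figures:

```python
import math

def design_effect(variance_cluster, variance_srs):
    """Ratio of the cluster-survey variance to the SRS variance at the same sample size."""
    return variance_cluster / variance_srs

def n_with_design_effect(n_srs, deff=2.0):
    """Scale an SRS sample size by the design effect (2 is the value assumed in the text)."""
    return math.ceil(n_srs * deff)

# Hypothetical example: an SRS calculation gave 385 subjects
print(n_with_design_effect(385))            # default design effect of 2
print(n_with_design_effect(385, deff=1.5))  # a smaller, study-specific design effect
```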
Estimating a population proportion: With specified relative precision
Relative precision (ε): the sample result should fall within ε% of the true value.
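A minimal sketch for relative precision, assuming the standard form n = z² (1 - P) / (ε² P), which follows from setting the absolute precision to d = εP in the single-proportion formula. P and ε are fractions; the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def n_proportion_relative(p, eps, confidence=0.95):
    """Sample size to estimate a proportion p to within eps * p (relative precision)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * (1 - p) / (eps**2 * p))

# Hypothetical example: anticipated P = 20%, estimate within 10% of the true value
print(n_proportion_relative(p=0.20, eps=0.10))
```

Rare outcomes (small P) need much larger samples under relative precision, since the tolerated absolute error εP shrinks with P.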
Estimating the difference between two population proportions with specified absolute precision (Two – sample situations)
P1, P2 = anticipated value of the proportions in the two populations.
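A minimal sketch for the two-sample case, assuming equal group sizes and the standard form n = z² [P1 (1 - P1) + P2 (1 - P2)] / d² per group, where d is the absolute precision wanted for the difference P1 - P2. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def n_diff_proportions(p1, p2, d, confidence=0.95):
    """Per-group sample size to estimate P1 - P2 to within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z**2 * variance_sum / d**2)

# Hypothetical example: anticipated proportions of 40% and 30%,
# difference to be estimated within 5 percentage points
print(n_diff_proportions(p1=0.40, p2=0.30, d=0.05))
```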
Required information:
- Anticipated values of the population proportions: P1 and P2
- Level of significance
- Power of the test: 100(1-β)%
Hypothesis testing for two population proportions
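A minimal sketch for the hypothesis-testing case, assuming a two-sided test, equal group sizes and the widely used normal-approximation formula n = [z(1-α/2) √(2 P̄ (1 - P̄)) + z(1-β) √(P1 (1 - P1) + P2 (1 - P2))]² / (P1 - P2)², where P̄ = (P1 + P2) / 2. The function name and example values are hypothetical.

```python
import math
from statistics import NormalDist

def n_test_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of P1 = P2."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # level of significance
    z_beta = nd.inv_cdf(power)           # power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical example: 25% vs 40%, 5% significance level, 80% power
print(n_test_two_proportions(0.25, 0.40))
```

Asking for higher power, or for a smaller difference to be detected, increases the required size.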
Estimating a population mean: With specified absolute precision
Estimating a population mean: With specified relative precision
Estimating difference between means of two populations with specified precision
Hypothesis testing for two population means
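The formulas for the four mean-based situations named above are not reproduced in the text. A minimal sketch, assuming the standard normal-approximation forms with a known (or anticipated) standard deviation σ that is common to both groups, equal group sizes, and a two-sided test. All function names and example values are hypothetical.

```python
import math
from statistics import NormalDist

def _z(prob):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(prob)

def n_mean_absolute(sigma, d, confidence=0.95):
    """n = z^2 * sigma^2 / d^2: estimate a mean to within +/- d."""
    z = _z(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * sigma**2 / d**2)

def n_mean_relative(sigma, mu, eps, confidence=0.95):
    """n = z^2 * sigma^2 / (eps^2 * mu^2): estimate a mean to within eps * mu."""
    z = _z(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * sigma**2 / (eps**2 * mu**2))

def n_diff_means(sigma, d, confidence=0.95):
    """n per group = z^2 * 2 * sigma^2 / d^2: estimate mu1 - mu2 to within +/- d."""
    z = _z(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * 2 * sigma**2 / d**2)

def n_test_two_means(sigma, delta, alpha=0.05, power=0.80):
    """n per group = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2, delta = mu1 - mu2."""
    z_alpha = _z(1 - alpha / 2)
    z_beta = _z(power)
    return math.ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)

# Hypothetical example: sigma = 20 units, anticipated mean = 100 units
print(n_mean_absolute(sigma=20, d=5))
print(n_mean_relative(sigma=20, mu=100, eps=0.05))
print(n_diff_means(sigma=20, d=5))
print(n_test_two_means(sigma=20, delta=10))
```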
There are computer programs available that perform sample size calculations.
In particular, this facility is available in the package ‘Epi Info’, though it does not cover the full range of possibilities.
Other Methods of Sample Size Determination
• Arbitrary “percentage rule of thumb” sample size:
• Arbitrary sample size approaches rely on erroneous rules of thumb (e.g. “n must be at least 5% of the population”).
• Arbitrary sample sizes are simple and easy to apply, but they are neither efficient nor economical. (e.g. Using the “5 percent rule,” if the universe is 12 million, n = 600,000 – a very large and costly result)
Other Methods of Sample Size Determination…cont.
• Conventional sample size specification
• Conventional approach follows some “convention” or number believed somehow to be the right sample size (e.g. 1,000 – 1,200 used for national opinion polls with ± 3% error)
• Using conventional sample size can result in a sample that may be too large or too small.
• Conventional sample sizes ignore the special circumstances of the survey at hand.
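The ± 3% figure quoted for polls of 1,000 to 1,200 can be checked with a minimal sketch, assuming simple random sampling, 95% confidence and the most variable case P = 0.5. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the confidence interval for a proportion from n respondents."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)

for n in (1000, 1200):
    print(n, f"{100 * margin_of_error(n):.1f}%")
```

Both sizes give a margin of roughly 3 percentage points under these assumptions. A survey with a different sampling design or a different required precision would need its own calculation, which is the weakness of the conventional approach noted above.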
Special Sample Size Determination Situations
Sample Size Using Nonprobability Sampling
• When using nonprobability sampling, sample size is unrelated to accuracy, so cost-benefit considerations must be used
References
Sample size determination in health studies: A practical manual. S.K. Lwanga and S. Lemeshow.
Sample size determination in health studies. V.K. Chadha. NTI Bulletin 2006, 42/3&4, 55-62.
Sampling guide. Robert Magnani. Tulane University School of Public Health.
On validity of assumptions while determining sample size. S.B. Sarmukaddam, S.G. Garad. IJCM.
Essentials of biostatistics. Nishi Agarwal.
Methods in biostatistics. B.K. Mahajan.
Thank You