Principles of sample size calculation
Doug Altman
EQUATOR Network, Centre for Statistics in Medicine,
NDORMS, University of Oxford
EQUATOR – OUCAGS training course
25 October 2014
Principle of sample size calculation
The aim should be to have a large enough sample size to have a high probability (power) of detecting a clinically worthwhile treatment effect if it exists
– Larger studies have greater power to detect a beneficial (or detrimental) effect of a given size
– Same principle for all types of study
Main alternatives
– Seek to estimate a quantity with a given precision
– ???
Reaching the wrong conclusion (1)
Consider a clinical trial
May conclude that there is a difference in outcomes between active and control groups, when in fact no such difference exists
Technically called a Type I error
– more usefully called a false-positive result
Probability of making such an error is designated α, commonly known as the significance level
Risk of false-positive conclusion (Type I error) does not decrease as the sample size increases
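The point about the false-positive risk can be checked with a small simulation. This is only a sketch: it assumes a normally distributed outcome with known SD, a two-sided z-test, and illustrative values (mean 140, SD 20, 2000 simulated trials) that are chosen here and are not part of the slides. The function name `false_positive_rate` is hypothetical.

```python
import random
from statistics import NormalDist, fmean


def false_positive_rate(n_per_group, alpha=0.05, sd=20.0, n_trials=2000, seed=1):
    """Simulate trials with NO true difference and return the
    proportion that are declared statistically significant."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sd * (2 / n_per_group) ** 0.5            # SE of difference in means (SD assumed known)
    hits = 0
    for _ in range(n_trials):
        # Both groups are drawn from the same distribution (assumed mean 140)
        active = [rng.gauss(140, sd) for _ in range(n_per_group)]
        control = [rng.gauss(140, sd) for _ in range(n_per_group)]
        z = (fmean(active) - fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_trials


for n in (20, 200):
    print(n, false_positive_rate(n))
```

Under these assumptions both proportions should come out close to 0.05, whether there are 20 or 200 patients per group.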
Reaching the wrong conclusion (2)
May conclude that there is no evidence of a difference in outcomes between active and control groups, when in fact there is such a difference
Technically called a Type II error
– more usefully called a false-negative result
Probability of making such an error is designated β, and 1 − β is commonly known as the statistical power
Risk of missing an important difference (Type II error) decreases as the sample size increases
Type I and Type II errors
                                There really is a difference    There really is no difference
Statistically significant       OK                              Type I error (false +ve)
Statistically non-significant   Type II error (false -ve)       OK
Choice of Type I & II errors (α and β)
The choices of α and power (1 − β) produce greatly different sample sizes
– Conventional: α = 5% and power = 80%
– Better: α = 5% and power = 90%
– Better still: α = 1% and power = 90%
Many clinical trials (and other studies) are far too small
Relationship between sample size and power
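The relationship can be tabulated with a short calculation. The sketch below assumes a two-arm comparison of means with equal group sizes, a two-sided test, and a normal approximation; `approx_power` is a hypothetical helper name, and `delta_std` stands for the standardized difference (difference in means divided by the SD).

```python
from statistics import NormalDist


def approx_power(n_total, delta_std, alpha=0.05):
    """Approximate power of a two-sided test comparing two equal-sized groups.

    n_total   -- total sample size across both groups
    delta_std -- true difference in means divided by the SD
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic: delta_std * sqrt(n_total) / 2
    shift = delta_std * (n_total ** 0.5) / 2
    # Probability of landing in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n_total in (40, 80, 124, 200, 400):
    print(n_total, round(approx_power(n_total, delta_std=0.5), 2))
```

For a fixed standardized difference the power rises steadily with the total sample size; with no true difference (`delta_std=0`) the function returns the significance level, whatever the sample size.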
A simpler way
For α = 0.05 and power of 80%
N ≈ 31/Δ² (total for 2 groups)
For α = 0.05 and power of 90%
N ≈ 42/Δ² (total for 2 groups)
For α = 0.01 and power of 90%
N ≈ 60/Δ² (total for 2 groups)
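These constants are consistent with the usual normal-approximation formula for comparing two means with equal group sizes. The derivation below is a sketch under that assumption, with a two-sided test and Δ taken as the difference in means divided by the SD; the z values are standard normal quantiles.

```latex
N \approx \frac{4\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}}
\qquad
\begin{aligned}
\alpha = 0.05,\ \text{power } 80\%: &\quad 4\,(1.960 + 0.842)^{2} \approx 31.4 \\
\alpha = 0.05,\ \text{power } 90\%: &\quad 4\,(1.960 + 1.282)^{2} \approx 42.0 \\
\alpha = 0.01,\ \text{power } 90\%: &\quad 4\,(2.576 + 1.282)^{2} \approx 59.5
\end{aligned}
```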
Example: Continuous outcome
Two arm RCT
Outcome – systolic blood pressure
Mean in placebo group expected to be 140 mmHg
SD 20 mmHg
Minimum clinical difference is deemed 10 mmHg (δ)
What sample size for α = 0.05 and power of 80%?
Δ = δ/SD = 10/20 = 0.5
Using the simple formula: N (total) ≈ 31/(0.5)² = 124
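The same calculation can be reproduced without rounding the constant. This is a minimal sketch assuming the normal-approximation formula for two equal groups and a two-sided test; `total_sample_size` is a hypothetical helper, not a function from any sample size package.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(delta, sd, alpha=0.05, power=0.80):
    """Total N for a two-arm comparison of means (normal approximation)."""
    z = NormalDist()
    # Unrounded version of the constant in the simple formula (about 31.4 here)
    multiplier = 4 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    n_exact = multiplier / (delta / sd) ** 2
    # Round up to an even total so the two groups are the same size
    return 2 * ceil(n_exact / 2)


print(total_sample_size(delta=10, sd=20))  # prints 126
```

The unrounded constant gives 126 rather than 124, a difference that is negligible next to the uncertainty in the assumed SD and minimum clinical difference.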
Some ways to improve power
Increase sample size
– Extend recruitment period
– Relax inclusion criteria (but this can work against power)
– Make the trial multi-centre, or add further centres
Increase event rate
– Selectively enrol “high-risk” patients
– Use a combined endpoint
– Do not exclude those at most risk of an event (e.g. oldest patients)
http://homepage.stat.uiowa.edu/~rlenth/Power/