Date posted: 26-Mar-2015
Sampling and power analysis in the High Resolution studies
Pamela Minicozzi
Descriptive Studies and Health Planning Unit,
Department of Preventive and Predictive Medicine, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan
High Resolution studies
High Resolution studies collected detailed data from patients' clinical records, so that the influence on survival of factors that are not routinely collected (tumour molecular characteristics, diagnostic investigations, treatment, relapse), as well as differences in standard care, could be analysed.
Problem
- In each country, the population of incident cases for a particular cancer consists of N subjects.
- N is large (so rare cancers are not considered here).
- Since N is large, not all cases can be investigated.

Solution
- Use a representative sample to derive valid conclusions that are applicable to the entire original population.
Two questions
1) What kind of probability sampling should we use?
2) What sample size should we use?
Sampling
Previous High Resolution studies
Samples were representative of:
- 1-year incidence;
- a time interval (e.g. 6 months) within the study period, provided that incidence was complete;
- an administratively defined area covered by cancer registration.
Present High Resolution studies
We want to eliminate variations in the type of sampling between countries and within a single country. This implies more sophisticated sampling.

Main types of probability sampling
Simple random sampling
- Assign a unique number to each element of the study population.
- Determine the sample size.
- Randomly select the population elements using a table of random numbers or a list of numbers generated randomly by a computer.
Advantage: auxiliary information on subjects is not required.
Disadvantage: if subgroups of the population are of particular interest, they may not be included in sufficient numbers in the sample.
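The three steps of simple random sampling can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the High Resolution studies: the function name `simple_random_sample` and the population of 5,000 numbered cases are hypothetical, and the computer-generated random numbers come from Python's standard `random` module.

```python
import random

def simple_random_sample(population_ids, n, seed=None):
    """Select n distinct elements, each with equal probability, without replacement."""
    # Step 1 is assumed done: every element carries a unique identifier.
    if not 0 <= n <= len(population_ids):
        raise ValueError("sample size must be between 0 and the population size")
    rng = random.Random(seed)  # computer-generated random numbers (step 3)
    return rng.sample(population_ids, n)

# Hypothetical population of 5,000 incident cases numbered 1..5000
population = list(range(1, 5001))
sample = simple_random_sample(population, 500, seed=2015)
print(len(sample), len(set(sample)))  # 500 500
```

Fixing the seed makes the selection reproducible, which is useful when the same list has to be re-drawn for auditing.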
Stratified sampling
- Identify the stratification variable(s) and determine the number of strata to be used (e.g. day and month of birth, year of diagnosis, cancer registry).
- Divide the population into strata and determine the sample size of each stratum.
- Randomly select the population elements in each stratum.
Advantage: a more representative sample is obtained.
Disadvantage: requires information on the proportion of the total population belonging to each stratum.
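A possible sketch of stratified sampling with proportional allocation, where each stratum's sample size matches its share of the population (which is why that proportion must be known). The allocation rule, the largest-remainder rounding, and the three hypothetical registries "A", "B", "C" are assumptions chosen for illustration; the slides do not specify how stratum sizes were set.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(cases, stratum_of, total_n, seed=None):
    """Proportional-allocation stratified random sample of size total_n."""
    if not 0 <= total_n <= len(cases):
        raise ValueError("total_n must be between 0 and the population size")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:  # divide the population into strata
        strata[stratum_of(case)].append(case)
    N = len(cases)
    # Ideal (fractional) stratum sizes proportional to each stratum's share
    quotas = {k: total_n * len(v) / N for k, v in strata.items()}
    sizes = {k: int(q) for k, q in quotas.items()}
    # Largest-remainder rounding so the stratum sizes add up to total_n
    shortfall = total_n - sum(sizes.values())
    for k in sorted(quotas, key=lambda k: quotas[k] - sizes[k], reverse=True)[:shortfall]:
        sizes[k] += 1
    sample = []
    for k in sorted(strata):  # simple random sample within each stratum
        sample.extend(rng.sample(strata[k], sizes[k]))
    return sample

# Hypothetical cases (id, registry) from three registries of different size
cases = ([(i, "A") for i in range(3000)]
         + [(i, "B") for i in range(3000, 4500)]
         + [(i, "C") for i in range(4500, 5000)])
sample = stratified_sample(cases, lambda c: c[1], 500, seed=1)
print(Counter(registry for _, registry in sample))  # Counter({'A': 300, 'B': 150, 'C': 50})
```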
Systematic sampling
- Determine the sample size (n); the sampling interval "i" is then N/n.
- Randomly select a number "r" from 1 to "i".
- Select the subjects in positions r, r + i, r + 2i, etc., until the population list is exhausted.
Advantage: eliminates the possibility of autocorrelation.
Disadvantage: only the first element is selected on a probability basis (pseudo-random sampling).
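The systematic procedure can be sketched as below, with sampling interval i = N/n and a random start in the first interval. The handling of a non-integer interval (integer arithmetic on a fractional interval) is an assumption added for generality; when N is a multiple of n it reduces to the slide's rule with 0-based positions. The list of 1,000 cases is hypothetical.

```python
import random

def systematic_sample(population, n, seed=None):
    """Systematic sample of size n with sampling interval i = N / n."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)
    r = rng.randrange(N)  # random start: r / n lies in [0, N / n)
    # Element j is at floor(r/n + j*N/n) = (r + j*N) // n, which is always < N
    return [population[(r + j * N) // n] for j in range(n)]

population = list(range(1, 1001))  # hypothetical ordered list of 1,000 cases
sample = systematic_sample(population, 100, seed=7)
gaps = {b - a for a, b in zip(sample, sample[1:])}
print(len(sample), gaps)  # 100 {10}
```

Only the start `r` is random; every later position is fixed by it, which is the pseudo-random character noted above.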
How many subjects do we need?
The main elements
- Power: the probability that the difference will be detected (e.g. 80%, 90%).
- Significance level (α): the probability that a positive finding is due to chance alone (e.g. 1%, 5%).
Previous pilot studies explored whether some variables were available and could be measured with sufficient precision, and checked the study vision.
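To show how these elements combine, one widely used approximation for a two-sided log-rank test comparing two groups is Schoenfeld's formula for the required number of deaths. It is given here as an illustration; the slides do not state which formula the High Resolution studies used. Here α is the significance level, 1 − β the power, p the proportion of patients in one group, and HR the hazard ratio to be detected:

```latex
d = \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{p\,(1-p)\,(\ln \mathrm{HR})^{2}},
\qquad
n = \frac{d}{\Pr(\text{death during follow-up})}

% Worked example: alpha = 0.05, power = 80%, p = 0.5, HR = 1.5
d = \frac{(1.960 + 0.842)^{2}}{0.25\,(\ln 1.5)^{2}}
  = \frac{7.849}{0.25 \times 0.1644} \approx 191 \text{ deaths}
```

Lower α, higher power, or a hazard ratio closer to 1 all increase the number of deaths, and hence patients, required.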
Previous High Resolution studies
The number of patients was defined on the basis of:
- observed differences in survival and risk of death;
- incidence of the cancer under study;
- difficulties in collecting clinical information;
- available economic resources.
Notwithstanding these constraints, we were able to identify statistically significant relative excess risks of death of up to 1.60 among European countries and up to 1.40 among Italian areas for breast cancer, for which differences in survival are small. The same approach is therefore applicable to other cancers, for which survival differences are larger.
Example for breast cancer (diagnosed 1995-1999)
Plot of power as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level (target power 80%), for sample sizes ranging from 100 to 1000. Reference survival is assumed to be 75% (the overall survival in Europe; range 65-90%).
[Figure not reproduced; value annotated on the plot: 45%]
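Curves like those on this slide can be approximated with the sketch below. It is not the authors' software: it uses Schoenfeld's approximation and assumes proportional hazards (comparison-group survival = reference survival raised to the power HR), no censoring before the survival horizon, and that `n_total` counts patients in both groups. The function name `logrank_power` is hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def logrank_power(n_total, hr, ref_survival, alpha=0.05, alloc=0.5):
    """Approximate power of a two-sided log-rank test (Schoenfeld's formula).

    Assumptions: proportional hazards, so the comparison group's survival is
    ref_survival ** hr; every patient is followed to the survival horizon
    (no censoring); alloc is the fraction of patients in the comparison group.
    """
    surv_cmp = ref_survival ** hr
    p_death = (1 - alloc) * (1 - ref_survival) + alloc * (1 - surv_cmp)
    deaths = n_total * p_death  # expected number of deaths
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(sqrt(deaths * alloc * (1 - alloc)) * abs(log(hr)) - z_alpha)

# Breast cancer example: reference survival 75%, sample sizes 100 to 1000
for n in (100, 250, 500, 1000):
    row = [f"{logrank_power(n, hr, 0.75):.2f}" for hr in (1.2, 1.4, 1.6, 1.8)]
    print(n, row)
```

With high reference survival few patients die, so the power at a given sample size stays modest; this is why breast cancer is the most demanding case.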
Example for colorectal cancer (diagnosed 1995-1999)
Plot of power as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level (target power 80%), for sample sizes ranging from 100 to 1000. Reference survival is assumed to be 50% (the overall survival in Europe; range 30-70%).
[Figure not reproduced; value annotated on the plot: 32%]
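Reading such a plot the other way round, one can ask for the smallest hazard ratio detectable with 80% power at each sample size. The sketch below does this by bisection under the same assumptions as a Schoenfeld-type calculation: 1:1 allocation, proportional hazards, no censoring. The names `logrank_power` and `min_detectable_hr` are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

Z = NormalDist()

def logrank_power(n_total, hr, ref_survival, alpha=0.05):
    """Schoenfeld approximation; 1:1 allocation and no censoring are assumed."""
    p_death = 0.5 * (1 - ref_survival) + 0.5 * (1 - ref_survival ** hr)
    z = sqrt(n_total * p_death * 0.25) * abs(log(hr)) - Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(z)

def min_detectable_hr(n_total, ref_survival, power=0.80, alpha=0.05, hi=10.0):
    """Smallest hazard ratio above 1 that reaches the target power (bisection)."""
    lo = 1.0  # at HR = 1 the power equals alpha / 2, below any sensible target
    if logrank_power(n_total, hi, ref_survival, alpha) < power:
        raise ValueError("target power not reachable below hi")
    for _ in range(60):  # power increases with HR, so bisection converges
        mid = (lo + hi) / 2
        if logrank_power(n_total, mid, ref_survival, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi

# Colorectal cancer example: reference survival 50%
for n in (100, 250, 500, 1000):
    print(n, round(min_detectable_hr(n, 0.50), 2))
```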
Example for lung cancer (diagnosed 1995-1999)
Plot of power as a function of the hazard ratio for a two-sided log-rank test at the 5% significance level (target power 80%), for sample sizes ranging from 100 to 1000. Reference survival is assumed to be 10% (the overall survival in Europe; range 5-20%).
[Figure not reproduced; value annotated on the plot: 30%]
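The required sample size can also be computed directly for a target hazard ratio. This sketch divides Schoenfeld's number of deaths by the expected proportion of deaths; 1:1 allocation, proportional hazards and no censoring are assumptions, not the authors' stated method, and `required_n` is a hypothetical name.

```python
from math import ceil, log
from statistics import NormalDist

def required_n(hr, ref_survival, power=0.80, alpha=0.05):
    """Deaths and total patients (1:1 allocation) for a two-sided log-rank test.

    Schoenfeld's approximation for the required number of deaths, divided by
    the expected proportion of deaths under proportional hazards with no
    censoring (both assumptions).
    """
    z = NormalDist()
    deaths = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / (0.25 * log(hr) ** 2)
    p_death = 0.5 * (1 - ref_survival) + 0.5 * (1 - ref_survival ** hr)
    return ceil(deaths), ceil(deaths / p_death)

# Lung cancer example: reference survival 10%
for hr in (1.2, 1.3, 1.5):
    d, n = required_n(hr, 0.10)
    print(f"HR {hr}: {d} deaths, about {n} patients")
```

Because almost all lung cancer patients die during follow-up, the number of patients needed is close to the number of deaths, whereas for breast cancer it is several times larger.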
Present High Resolution studies
We want to analyse both differences in survival and adherence to standard care. This requires a power analysis for both:
- logistic regression analysis (to analyse the odds of receiving one type of care, typically standard care);
- relative survival analysis (to analyse differences in relative survival and relative excess risks of death).
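For the logistic regression part, a simple starting point is the sample size needed to detect a given odds ratio of receiving standard care between two equal-sized groups. The sketch uses the normal-approximation formula for comparing two proportions, which corresponds to a logistic model with a single binary covariate; the reference proportion of 60% and the odds ratios are hypothetical inputs, and `n_per_group_for_or` is a hypothetical name.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_for_or(p_ref, odds_ratio, power=0.80, alpha=0.05):
    """Patients per group to detect an odds ratio for a binary outcome
    (e.g. receiving standard care) between two equal-sized groups.

    Normal-approximation sample size for two proportions, equivalent to a
    logistic regression with one binary covariate (a simplifying assumption).
    """
    z = NormalDist()
    odds = odds_ratio * p_ref / (1 - p_ref)
    p_cmp = odds / (1 + odds)  # proportion implied by the odds ratio
    p_bar = (p_ref + p_cmp) / 2
    num = (z.inv_cdf(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
           + z.inv_cdf(power) * sqrt(p_ref * (1 - p_ref) + p_cmp * (1 - p_cmp)))
    return ceil(num ** 2 / (p_ref - p_cmp) ** 2)

# Hypothetical: 60% of patients receive standard care in the reference group
for or_ in (0.5, 0.7, 1.5, 2.0):
    print(or_, n_per_group_for_or(0.60, or_))
```

Adjusting for further covariates that are correlated with the group indicator typically increases the required size, so figures from this sketch are a lower bound rather than a final answer.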
Conclusions
Taking into account:
- existing sampling and power methodology;
- experience from previous studies;
- the different coverage of cancer registries;
- available economic resources;
we want to:
- standardize the selection of data;
- include a minimum number of cases that satisfies the statistical considerations related to all the aims of our studies.
Prof. J. S. Long¹ (Regression Models for Categorical and Limited Dependent Variables, 1997) suggests that samples of fewer than 100 cases should be avoided and that 500 observations should be adequate for almost any situation.
¹ Professor of Sociology and Statistics at Indiana University