Advanced topics in study design II: Less commonly used observational study designs

John S. Witte

jwitte@ucsf.edu

Cross-Sectional Studies

– Subjects are all persons in the population at the time of ascertainment or a representative sample.

– Often deal with exposures that cannot change, such as blood type or other invariable personal characteristics.

– Cross-sectional analyses of baseline information in cohort studies provide possible exposure-disease associations that can later be confirmed.

– Cases in a cross-sectional study will overrepresent cases with a long duration of illness and underrepresent those with a short duration of illness.
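
The duration point above can be illustrated with a small calculation. This is a minimal sketch, assuming a steady state in which the number of prevalent cases of each type is proportional to incidence times mean duration; the function name and the numbers are hypothetical.

```python
def prevalent_share(incidence, duration):
    """Share of each case type among prevalent cases.

    Assumes a steady state where the prevalent count of a case type is
    proportional to (incidence rate) x (mean duration of illness).
    """
    weights = [i * d for i, d in zip(incidence, duration)]
    total = sum(weights)
    return [w / total for w in weights]


# Two hypothetical case types arising equally often, lasting 1 vs. 10 years.
shares = prevalent_share(incidence=[0.5, 0.5], duration=[1.0, 10.0])
print(shares)  # short-duration share first, long-duration share second
```

Although both types make up half of the incident cases here, the long-duration type dominates a cross-sectional sample.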

Case or Control Cross Sectional Studies

• Use cases only to determine estimates of disease prevalence etc. among different groups (e.g., defined by geographical region).

• Use controls to estimate exposure prevalence in population.

Cross-sectional Study Strengths

• Relatively feasible and not too time-consuming, since there is no follow-up period (though random sampling in a large population can be expensive and problematic).

• We can study several diseases and/or exposures; thus, it is useful for screening new hypotheses.

• We can describe disease frequency and health needs of a large population; thus, it is useful for health planning.

Hal Morgenstern

Cross-sectional Study Weaknesses

• Potential temporal ambiguity (exposure and disease).

• Possible large measurement error that may be nondifferential (e.g., exposures collected after disease occurs), resulting in biased effect estimates.

• Selection bias possible since prevalent cases occurred before the study is conducted, so disease status can influence the selection of subjects.

• It is inefficient for studying rare or highly fatal diseases or diseases with short durations of expression.

Hal Morgenstern

Repeated Survey

• Combines two or more cross-sectional studies of the same source population at different times. Although we might say that the population is followed in this type of study, individuals are not followed.

• Design is not much better than the simple cross-sectional study for testing etiologic hypotheses.

• Useful for studying population trends or evaluating the effectiveness of population interventions initiated between surveys.

• Can also be used to assess the extent to which a change in disease rate can be explained by changes in specific exposures.

Hal Morgenstern

Survey Follow-up

• Combines a cross-sectional study followed by a cohort study of those individuals who are still at risk of developing the disease.

• This design is used:

• To estimate both the prevalence and incidence rates of a disease in the same source population;

• When it is hard to distinguish between prevalent and incident cases;

• To make baseline assessments to identify persons still at risk of developing the disease (e.g., as a necessary first phase of a cohort study).

Hal Morgenstern

Intervention Follow-up

• Combines an intervention with a cohort study, each part having a different follow-up period and outcome variable.

• The first follow-up period is short and is used to assess the effect of an intervention / exposure on an outcome (not the primary disease).

• The second follow-up period is generally longer and is used to observe disease occurrence.

• This design is useful for examining relationships between acute biological/behavioral responses and chronic health effects.

Hal Morgenstern

Proportionate Study

• Proportional morbidity or mortality study involves data on cases or deaths.

• Special type of case-control (or cross-sectional) study.

• A group of individuals with (or dying from) the index disease of interest is compared with a group of individuals with (or dying from) certain other diseases.

Hal Morgenstern

Example: Proportional mortality study

• Occupational exposure to low-level ionizing radiation on cancer.

• All certified deaths among employees of the Hanford nuclear power facility between 1944 and 1972 were classified by cause of death and exposure status (based on company records of radiation monitoring).

• Proportion of deaths that was exposed to ionizing radiation (i.e., at least one positive badge reading) among male employees, by cause of death (n=3520). Cancers of the reticuloendothelial system (RES) include lymphomas, myelomas, and leukemias.

Mancuso et al. Health Physics 1977; 33:369-385.

Hal Morgenstern

Two-Stage Sampling Case-Control Studies

• Large control sample has some exposure information or a limited amount of information on some relevant variables.

• Subsample selected & more detailed information obtained.

• Useful when relatively inexpensive to obtain exposure information but more expensive to obtain specific covariate information.

• Exposure information has already been collected on the entire population but more detailed information is needed on covariates.

• Special analytic methods are needed to take full advantage of the information collected at both stages.

Revisit Nested Case-Control Studies

• What does the OR from these studies estimate?

• Depends on how the controls are sampled.

• Random at start of follow-up (case-cohort)

*-------------------------------------------------------------------------*
Start FU                                                             End FU

• Density sampling

*-------------------------------------------------------------------------*
Start FU                                                             End FU
       ^          ^            ^             ^^^            ^

• Cumulative sampling

*-------------------------------------------------------------------------*
Start FU                                                             End FU

Nested Case-control Studies

             Cases   Total   Person Time   Controls
Exposed      A1      N1      T1            B1
Unexposed    A0      N0      T0            B0
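
Using the notation in the table, the expected control ratio B1/B0 depends on the sampling scheme: roughly N1/N0 for sampling at the start of follow-up (case-cohort), T1/T0 for density sampling, and (N1 - A1)/(N0 - A0) for cumulative sampling. The sketch below, with a hypothetical cohort and a hypothetical function name, shows that the resulting OR then corresponds to the risk ratio, rate ratio, and odds ratio, respectively.

```python
def expected_or(a1, a0, n1, n0, t1, t0, scheme):
    """Expected case-control OR, (A1/A0) / (B1/B0), for a given control sampling scheme.

    Assumes the control ratio B1/B0 equals its expected value under the scheme.
    """
    if scheme == "case-cohort":    # controls drawn from everyone at start of follow-up
        control_ratio = n1 / n0
    elif scheme == "density":      # controls drawn in proportion to person-time
        control_ratio = t1 / t0
    elif scheme == "cumulative":   # controls drawn from non-cases at end of follow-up
        control_ratio = (n1 - a1) / (n0 - a0)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return (a1 / a0) / control_ratio


# Hypothetical cohort: cases, persons at start, and person-time by exposure.
cohort = dict(a1=200, a0=100, n1=1000, n0=1000, t1=900.0, t0=950.0)

risk_ratio = expected_or(**cohort, scheme="case-cohort")
rate_ratio = expected_or(**cohort, scheme="density")
odds_ratio = expected_or(**cohort, scheme="cumulative")
print(risk_ratio, rate_ratio, odds_ratio)
```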

Case-Only Studies

• Study only cases.

• Use theoretical considerations to construct a distribution of exposure in the source population.

• Use this distribution in place of an observed control series.

1. Case-crossover studies

2. Case-specular Studies

3. Genetic epidemiology

– Hardy-Weinberg Disequilibrium

– Gene x Environment Interaction

Case-Crossover Studies

• One or more time periods are selected as matched “control” periods for the case.

• Compare exposure status at the time of disease onset to the ‘control’ exposure status within the same individual.

• Depends on the assumption that neither exposure nor confounders are changing over time in a systematic way (i.e., in a cyclic manner).

• Exposure must vary over time within individuals

• Exposure must have a short duration and a transient effect
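
With one control period per case, the comparison above reduces to a matched-pair analysis, where the standard odds ratio estimate is the ratio of the two kinds of discordant cases. A minimal sketch under that 1:1 assumption; the function name and the counts are hypothetical.

```python
def case_crossover_or(pairs):
    """Matched-pair OR from (exposed_at_onset, exposed_in_control_period) flags, one pair per case."""
    pairs = list(pairs)
    onset_only = sum(1 for onset, control in pairs if onset and not control)
    control_only = sum(1 for onset, control in pairs if control and not onset)
    if control_only == 0:
        raise ValueError("no cases exposed only in the control period; OR is undefined")
    return onset_only / control_only


# Hypothetical data: concordant cases (both or neither period exposed) drop out.
pairs = (
    [(True, False)] * 12    # exposed only at disease onset
    + [(False, True)] * 4   # exposed only in the control period
    + [(True, True)] * 3
    + [(False, False)] * 20
)
estimate = case_crossover_or(pairs)
print(estimate)
```

Only cases whose exposure differs between the two periods contribute, which is why exposure must vary over time within individuals.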

Example: Physical Exertion & MI

• A number of different exposure periods can be measured.

• One might also use a bidirectional approach to measuring exposures.

Tager, 2000

Limitations of Case-Crossover Studies

• There can be ‘overmatching’ on the exposures which leads to decreased precision of estimates.

• Misclassification could be differential between case and ‘control’ groups if different methods are used to measure exposures or past exposures are more poorly measured.

Case-Specular Design

• Use some physical properties to distinguish controls’ environmental ‘exposures’.

• E.g., in a study of electromagnetic field exposure and disease, measure the distance from the case’s home to electrical wires. Then ‘flip’ the block and measure the distance from the specular home to the electrical wires as the ‘control’ distance.

Genetic Epidemiology Case-Only Studies

• The laws of inheritance may be combined with certain assumptions to derive a population of genotypes.

• Hardy-Weinberg Principle: Genotypes will reflect allele frequency distributions in the general population.

• That is, both allele and genotype frequencies in a population remain constant—they are in equilibrium—from generation to generation unless specific disturbing influences are introduced.

Hardy-Weinberg Disequilibrium

• Expect the cases to have an increased frequency of the disease-causing genetic alleles.

• Study cases only, and look for departures from Hardy-Weinberg equilibrium.

• This suggests chromosomal regions where a disease-causing gene resides.
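
One common way to look for such a departure at a biallelic marker is a 1-degree-of-freedom chi-square test comparing observed genotype counts among cases with the counts expected under Hardy-Weinberg proportions (p^2, 2pq, q^2). A minimal sketch, assuming both alleles are observed; the function name and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A estimated from the sample
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df, written with the complementary error function.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical genotype counts among cases: AA, AB, BB.
stat, p_value = hwe_chi_square(30, 40, 30)
print(stat, p_value)
```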

Case-Parents Transmission Disequilibrium Test (TDT)

[Figure: case-parents trio; parental genotypes M1 M2 and M2 M2]

Transmitted alleles vs. non-transmitted alleles

                     Non-Transmitted Allele
Transmitted Allele   M1      M2
M1                   n11     n12
M2                   n21     n22

TDT = (n12 - n21)^2 / (n12 + n21)

Asymptotically chi-square with 1 degree of freedom

For this one trio:

                     Non-Transmitted Allele
Transmitted Allele   M1      M2
M1                   0       1
M2                   0       1

TDT = (1 - 0)^2 / (1 + 0) = 1

p-value = 0.32
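
The statistic and p-value for the single trio can be reproduced with a few lines. This is a minimal sketch using only the two discordant cells (n12 and n21); the function name is hypothetical.

```python
import math


def tdt(n12, n21):
    """TDT statistic and its asymptotic chi-square (1 df) p-value."""
    if n12 + n21 == 0:
        raise ValueError("no informative transmissions")
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # Upper tail of a chi-square with 1 df, written with the complementary error function.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# The single trio from the table: n12 = 1, n21 = 0.
stat, p_value = tdt(1, 0)
print(stat, round(p_value, 2))
```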

Case-Only for Interactions

                E+             E-
            G+     G-      G+     G-
Case        A11    A10     A01    A00
Control     B11    B10     B01    B00
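
In the notation of the table, the multiplicative interaction (ratio of odds ratios) equals the gene-environment odds ratio among cases divided by the same odds ratio among controls. If genotype and exposure are independent in the source population, the control odds ratio is about 1, so the cases alone suffice. A minimal sketch with hypothetical function names and counts:

```python
def cross_product_or(n11, n10, n01, n00):
    """Cross-product odds ratio for a 2x2 table of genotype by exposure."""
    return (n11 * n00) / (n10 * n01)


def interaction_or(cases, controls):
    """Case-control estimate of interaction: case G-E OR divided by control G-E OR."""
    return cross_product_or(*cases) / cross_product_or(*controls)


# Hypothetical counts, ordered as in the table: (11, 10, 01, 00).
cases = (40, 60, 50, 150)
controls = (20, 80, 50, 200)   # genotype and exposure unassociated in controls

case_only = cross_product_or(*cases)
case_control = interaction_or(cases, controls)
print(case_only, case_control)
```

Here the two estimates agree because the control odds ratio is exactly 1; when genotype and exposure are associated in the population, the case-only estimate is biased.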

Family-Based Association Studies

[Figure: pedigree showing relatives used in family-based designs: siblings, parents, cousins]

Twin Studies

• Compare the disease concordance rates of MZ (identical) and DZ (fraternal) twins.

                    Twin 2
Disease           Yes    No
Twin 1    Yes     A      B
          No      C      D

Concordance = 2A/(2A+B+C)

Then one can estimate heritability of a phenotype.

Example of Twin Study: PCa

Twin   Concordant pairs (A)   Discordant pairs (B+C)   Concordance
MZ     40                     299                      0.21
DZ     20                     584                      0.06

Heritability: 0.42 (0.29-0.50)

Non-shared Environment: 0.58 (0.50-0.67)

Lichtenstein et al. NEJM 2000; 343:78-85.

• Twin registry (Sweden, Denmark, and Finland) 7,231 MZ and 13,769 DZ Twins (male)
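
The concordance column can be checked directly from the pair counts with Concordance = 2A/(2A+B+C). A minimal sketch; the function name is hypothetical, and the heritability estimates come from model fitting in the cited paper and are not reproduced here.

```python
def concordance(concordant_pairs, discordant_pairs):
    """Concordance = 2A / (2A + B + C), with A concordant and B + C discordant pairs."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


mz = concordance(40, 299)
dz = concordance(20, 584)
print(round(mz, 2), round(dz, 2))
```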

Comparison of Designs

                   Rare Recessive   Common     Rare Dominant
                   High Risk        Low Risk   High Risk
Population-based   100%             100%       100%
Case-sibling       69%              51%        50%
Case-cousin        97%              88%        88%
TDT                231%             102%       101%

• Family-based designs can be less efficient than population-based designs.

• Further, family-based designs can require more recruitment efforts.

Witte et al. Am J Epidemiol 1999

Population Stratification

• Confounding bias that may occur if one’s sample is composed of sub-populations with different:

1. allele frequencies; and

2. disease rates (RpR)

• Cases are more likely than controls to arise from the sub-population with the higher baseline disease rate.

• Cases and controls will have different allele frequencies regardless of whether the locus is causal.
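
The mechanism can be made concrete with expected counts. This is a minimal sketch with two hypothetical sub-populations in which the allele has no effect on disease: the odds ratio is 1 within each sub-population, but pooling them produces a spurious association. All names and numbers are assumptions for illustration.

```python
def expected_table(size, carrier_freq, disease_risk):
    """Expected counts (case carriers, case non-carriers, control carriers, control non-carriers)
    when disease risk is the same for carriers and non-carriers (non-causal locus)."""
    carriers = size * carrier_freq
    noncarriers = size - carriers
    return (
        carriers * disease_risk,
        noncarriers * disease_risk,
        carriers * (1 - disease_risk),
        noncarriers * (1 - disease_risk),
    )


def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Sub-population 1: common allele, higher disease rate. Sub-population 2: the reverse.
sub1 = expected_table(10_000, carrier_freq=0.8, disease_risk=0.10)
sub2 = expected_table(10_000, carrier_freq=0.2, disease_risk=0.02)
pooled = tuple(x + y for x, y in zip(sub1, sub2))

or_sub1, or_sub2, or_pooled = odds_ratio(*sub1), odds_ratio(*sub2), odds_ratio(*pooled)
print(or_sub1, or_sub2, or_pooled)
```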

[Figure: diagram relating Sub-population and Disease]

Genomic Control

• Use population-based design, but incorporate into analysis genomic information to adjust for population stratification.

• Genomic control: adjust test statistics for outliers due to population stratification.

• Use unlinked genetic markers.
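
One widely used form of genomic control estimates an inflation factor, lambda, as the median of the 1-df chi-square statistics at the unlinked markers divided by 0.4549 (the median of a chi-square with 1 df), and then divides the candidate test statistic by lambda. A minimal sketch of that form; the function name and the marker statistics are hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_control(unlinked_marker_stats, candidate_stat):
    """Return (lambda, adjusted statistic) using the median-based inflation factor."""
    lam = statistics.median(unlinked_marker_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # do not inflate the statistic when no excess is seen
    return lam, candidate_stat / lam


# Hypothetical chi-square statistics at unlinked markers and at a candidate locus.
lam, adjusted = genomic_control([0.2, 0.5, 0.91, 1.3, 2.0], candidate_stat=8.0)
print(lam, adjusted)
```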

Principal Components: Genetic Matching of Controls

Luca et al. AJHG 2008

Continuum of Assoc Study Designs

Population-based, “Ethnicity” Matched, Genomic-based, Family-based

(Bias ... versus ... efficiency)

[Figure labels: Population Stratification (Subpopulation, Disease); Overmatching (Sharing of genes & envt.); Efficiency; Also, recruitment issues]

Ecologic Studies

Levels of Measurement

Aggregate measures: summaries (e.g. means, proportions) of observations derived from individuals in each group.

Environmental measures: physical characteristics of the place in which members of each group live or work (e.g. air pollution level, hours of sunlight).

Global measures: attributes of groups, organizations, or places for which there is no distinct analogue at the individual level (e.g. population density, level of social disorganization, existence of a specific law, or type of health-care system).

Ecologic Studies- Concepts (continued)

Levels of Analysis- The common level for which data on all variables are reduced and analyzed.

a. Complete ecologic analysis

[Table residue: exposure-by-disease table with unknown (?) interior cells; recoverable margin labels: T1, T0, T++]

b. Partially ecologic analysis

                  Z=1                    Z=0                    Total
               Disease                Disease                Disease
Exposure     +     -     Total      +     -     Total      +     -     Total
   +         ?     ?     M11        ?     ?     M11        ?     ?     T+1
   -         ?     ?     M01        ?     ?     M01        ?     ?     T+0
 Total       N11   N01   T1         N10   N00   T0         T1+   T0+   T++

Levels of Inference- The goal is to make ecologic inferences about effects on group rates (an ecologic effect).

e.g., helmet-use laws

Ecologic effects vs. biologic effects

- There may also be interest in estimating the contextual effect of an ecologic exposure on individual risk.

Commonly found in infectious disease epidemiology

Ecologic Studies- Study Designs

• Multiple-group Design

- The rate of disease is compared among many groups during one period of time to search for spatial patterns.

Example: NCI cancer study

- The rate of disease may be compared between migrants and their offspring and residents of the countries of immigration and emigration.

- Environmental or behavioral risk factors

- Genetic risk factors

Examples: Migrant Study and Multiple-group Analytic Study

Migrant Studies

Weeks, Population. 1999

Example: Standardized Mortality Ratios

                                Japanese
Cancer Site      Japan    Not US Born    US Born    US Caucasians
Stomach (M)      100      72             38         17
Colorectal (F)   100      218            209        483
Breast           100      166            136        591

MacMahon B, Pugh TF. Epidemiology. 1970:178.

Ecologic Studies- Study Designs (continued)

• Time-trend Design

– One group or population is followed over time to assess a possible association between a change in exposure frequency and a change in disease frequency.

– Example: NCI study of artificial sweetener consumption and bladder cancer between 1950 and 1969.

• Mixed Design

– A mixture of the two previous designs. A number of groups or populations are followed over time to assess a possible association between a change in exposure frequency and a change in disease frequency.

– Example: Change in annual CVD mortality rate for males between 1948 and 1964 in 83 British towns by age and water hardness level.

Ecologic Studies- Rationale

– Strengths

1. Low cost and convenience

2. Measurement limitations of individual-level studies

3. Design limitations of individual-level studies

4. Interest in ecologic effects

5. Simplicity of analysis and presentation

Ecologic Studies- Rationale (continued)

• Weaknesses

1. Ecologic fallacy (or bias)

2. Cannot assess confounding or effect modification

3. Temporal ambiguity

4. Migration across groups

5. Collinearity

6. Lack of adequate data
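
The ecologic fallacy (weakness 1) can be illustrated with a small linear ecologic regression. In this sketch, exposure has no effect on any individual (the risk is the same for exposed and unexposed within each group), yet groups with more exposure happen to have higher baseline risk, so regressing group disease rates on exposure prevalence suggests a strong effect. The groups, the numbers, and the rate ratio formula (intercept + slope) / intercept from a linear model are assumptions for illustration.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical groups: exposure prevalence and disease risk.
# Within each group the risk is identical for exposed and unexposed persons.
exposure_prevalence = [0.2, 0.4, 0.6]
group_risk = [0.02, 0.03, 0.04]
individual_rr = 1.0  # true within-group risk ratio, by construction

intercept, slope = ols(exposure_prevalence, group_risk)
# Predicted rate if everyone were exposed divided by the rate if no one were exposed.
ecologic_rr = (intercept + slope) / intercept
print(individual_rr, ecologic_rr)
```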

COMMONLY USED DESIGNS FOR SELECTED STUDY OBJECTIVES

Objective of the Study and Nature of the Disease, with Commonly Used Study Designs

1. Test / screen new etiologic hypotheses regarding several possible risk factors for one disease
   Designs: Case-control, Cross-sectional, Ecologic

2. Test / screen new etiologic hypotheses regarding the effects of a specific exposure on several outcomes
   Designs: Cohort, Ecologic

3. Test or screen new etiologic hypotheses, based on the merging of two or more large data sets, to obtain information on both exposure and disease frequencies
   Designs: Ecologic

4. Identify risk factors for a disease for which we cannot observe the (base) population at risk
   Designs: Selective prevalence, Proportional

5. Study the possible genetic etiology of a disease
   Designs: Family-based

6. Determine whether a disease is likely to have an infectious etiology
   Designs: Space-time cluster

7. Identify environmental risk factors for a remittent disease, or study the possible mutual effects between two diseases
   Designs: Repeated follow-up

Hal Morgenstern

COMMONLY USED DESIGNS FOR SELECTED STUDY OBJECTIVES

Objective of the Study and Nature of the Disease, with Commonly Used Study Designs

8. Identify environmental risk factors for a specific rare disease, which might have a long latent period
   Designs: Retrospective cohort, Case-control

9. Study the relationship between an acute response to an exposure and a chronic health outcome
   Designs: Intervention follow-up

10. Identify risk factors for a relatively frequent disease with a long duration of expression, which often goes undiagnosed or unreported
   Designs: Prospective cohort, Cross-sectional

11. Study the possible effect of an exposure on disease occurrence, where exposure status is likely to be influenced by disease status
   Designs: Prospective cohort, Repeated follow-up

12. Assess the impact of a planned intervention on the health status of a target population
   Designs: Repeated survey, Ecologic

13. Assess the need for health services and facilities in a target population
   Designs: Cross-sectional, Survey follow-up, Repeated survey, Ecologic

Hal Morgenstern

Criteria for Comparing Study Designs

There are three general criteria for evaluating and comparing different study designs.

1. Relevance of the information to the investigator

– Extent to which expected findings will satisfy the specific objectives of the study

– Investigator's desire to estimate specific population parameters

2. Quality or accuracy of the information expected in the data

– Ability of the investigator to determine that the exposure preceded disease occurrence

– Ability of the investigator to eliminate the possibility that the statistical findings were due to various methodological problems or sources of error

3. Cost of the information

– The ultimate worth of a study is the total value of all derived information, now and in the future, relative to the total (direct and indirect) costs of the study
