Confounding and Bias in
Case-Control Studies
Ching-Lan Cheng (鄭靜蘭), Ph.D., Assistant Professor
Institute of Clinical Pharmacy and Pharmaceutical Sciences,
National Cheng Kung University
30th Annual Meeting of the International Society for
Pharmacoepidemiology
Taipei, Taiwan October 23, 2014
Disclosures
• There is no potential conflict of interest relevant to
this presentation
• Materials in this presentation are adapted from
lectures given this year by Dr. Tobias Gerhard
Outline
• Bias that might occur in case-control
studies
– Selection Bias
– Information Bias
• Summary
SELECTION BIAS
Selection Bias
• Selection bias is a systematic error in the
ascertainment of cases or controls in case-control
studies.
• It arises when selection depends on exposure status
differently for cases and controls, which distorts the
exposure-disease association.
Selection Bias
[Figure: population base and study population, each split into with/without disease and exposed/not exposed. Caption: Should include equal proportions from each category]
Selection Bias
[Figure: population base and study population, each split into with/without disease and exposed/not exposed. Caption: Distorted picture of the population base]
Example I: Selection Bias in Case Control Studies
• Imagine a cumulative case-control study conducted in one large
hospital. The study aims to explore whether smoking increases
the risk of experiencing a stroke. Cases are patients admitted
for stroke; controls are patients admitted for anything else. For
the result to be unbiased, the controls need to be
representative of the non-cases in the source population,
particularly with regard to the exposure of interest (smoking).
However, because smokers are also at higher risk than non-smokers
for other diseases that lead to hospitalization (lung
cancer, COPD, etc.), smoking is more common among
hospitalized non-cases than among non-cases in the source
population. This will result in an underestimation of the effect
of smoking on stroke risk.
Unbiased Control Selection

Source Population (Exposure odds in non-cases = 0.5)
              Stroke (Cases)   No Stroke (Controls)
Smoker        60               30000
Non-Smoker    40               60000
True OR = 3.0

Random Sample →

Cumulative Case-Control Study (4:1) (Exposure odds in non-cases = 0.48)
              Stroke (Cases)   No Stroke (Controls)
Smoker        60               130
Non-Smoker    40               270
Estimated OR = 3.1
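The two odds ratios on this slide can be checked with the cross-product formula. The following is a minimal sketch; the function name `odds_ratio` and its argument order are illustrative choices, not part of the original lecture.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio for a 2x2 exposure-by-outcome table."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Source population: smokers vs. non-smokers, stroke vs. no stroke
true_or = odds_ratio(60, 30000, 40, 60000)

# Random 4:1 sample of controls from the non-cases in the source population
sample_or = odds_ratio(60, 130, 40, 270)

print(f"True OR = {true_or:.1f}")
print(f"Estimated OR = {sample_or:.1f}")
```

Because the controls are a random sample of the non-cases, their exposure odds (130/270) stay close to the source population's (30000/60000), so the estimate stays close to the true value.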
Biased Control Selection

Source Population (Exposure odds in non-cases = 0.5)
              Stroke (Cases)   No Stroke (Controls)
Smoker        60               30000
Non-Smoker    40               60000
True OR = 3.0

Hospitalization →
All cases are hospitalized.
Among the possible controls (i.e. the source
population) smokers are more likely to be
hospitalized than non-smokers (1.8% vs. 0.6%).

Hospitalized Population (Exposure odds in non-cases = 1.5)
              Stroke (Cases)   No Stroke (Controls)
Smoker        60               540
Non-Smoker    40               360

Hospitalized Population → sample controls for study (540 → 240, 360 → 160)

Study Population (Exposure odds in non-cases = 1.5)
              Stroke (Cases)   No Stroke (Controls)
Smoker        60               240
Non-Smoker    40               160
Study OR = 1.0

Exposure distribution in study controls ≠ exposure distribution in source population controls
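The hospital-control example can be retraced numerically. The sketch below uses the counts from the slides; the variable and function names are illustrative, and the 400 study controls are computed as expected counts (an assumption made for simplicity, in place of an actual random draw).

```python
# Counts taken from the slides; names are illustrative only.
cases = {"smoker": 60, "non_smoker": 40}
source_non_cases = {"smoker": 30000, "non_smoker": 60000}
hospitalized_non_cases = {"smoker": 540, "non_smoker": 360}


def exposure_odds(group):
    """Odds of smoking within a group."""
    return group["smoker"] / group["non_smoker"]


def odds_ratio(case_group, control_group):
    """Exposure odds in cases divided by exposure odds in controls."""
    return exposure_odds(case_group) / exposure_odds(control_group)


# Probability of hospitalization for reasons other than stroke, by smoking status
hospitalization_risk = {
    k: hospitalized_non_cases[k] / source_non_cases[k] for k in source_non_cases
}

# Sample 4 controls per case from the hospitalized non-cases (expected counts)
n_controls = 4 * sum(cases.values())
n_hospitalized = sum(hospitalized_non_cases.values())
study_controls = {
    k: hospitalized_non_cases[k] * n_controls / n_hospitalized
    for k in hospitalized_non_cases
}

true_or = odds_ratio(cases, source_non_cases)
study_or = odds_ratio(cases, study_controls)

print("Hospitalization risk:", hospitalization_risk)
print("Study controls:", study_controls)
print(f"True OR = {true_or:.1f}, Study OR = {study_or:.1f}")
```

The threefold difference in hospitalization risk between smoking and non-smoking non-cases triples the exposure odds among the controls (0.5 to 1.5), which is exactly enough to cancel a true odds ratio of 3.0.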
Selection Bias in Case Control Studies
• In the example, the selection process for the
controls (sampled from hospitalized patients
instead of randomly sampled from the non-cases
in the source population) changed the
distribution of the exposure of interest (smoking)
in the control patients of the study from the true
distribution in the source population.
Solution → Population-based sampling of controls
Selection Bias in Case Control Studies
[Figure: population base and study population (sampled from hospitalized patients), each split into with/without disease and exposed/not exposed. Smokers w/o stroke → overrepresented in the hospital; nonsmokers w/o stroke → underrepresented in the hospital]
Example II – Prevalent User Bias
• Those who develop outcomes stop taking the drug
(depletion of susceptibles, sick stoppers)
• Prevalent users tend to be healthy adherers and
those who benefit from treatment (healthy users)
• In sum, inclusion of prevalent users will distort the
study population (oversampling of subjects / person-time
at low risk) and result in underestimation of
harms and overestimation of benefits
Solution → New user design
Selection Bias – Prevalent User Bias
[Figure: population base and study population, each split into with/without disease and exposed/not exposed, with “Healthy Users” and “Sick Stoppers” marked]
INFORMATION BIAS
Information Bias
• Often referred to as measurement bias
• Occurs due to poor measurement (classification) of study
variables (exposure)
• Distinguish two basic types of information bias
– Non-differential
- Misclassification between groups is approximately equal
– Differential
- Amount of misclassification differs between groups
Misclassification of Exposure
• Binary, non-differential
– 20% of exposed subjects classified as unexposed (used OTC version of the drug)
– 10% of unexposed subjects classified as exposed (non-compliers)

        AE+      AE-
Exp+    20 ↓4    10 ↓2
Exp-    80 ↑8    90 ↑9

(↓ subjects moved to the Exp- row; ↑ subjects moved to the Exp+ row)
Misclassification of Exposure

Truth
        AE+   AE-
Exp+    20    10
Exp-    80    90
True OR = (20 × 90)/(80 × 10) = 2.25

Observation
        AE+   AE-
Exp+    24    17
Exp-    76    83
Estimated OR = (24 × 83)/(76 × 17) = 1.54

Non-differential misclassification of exposure → Bias towards the null
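The observed table follows from applying the same two misclassification rates to both outcome columns. A minimal sketch, assuming the 20% and 10% rates from the slide; the helper names `misclassify` and `odds_ratio` are hypothetical.

```python
def misclassify(exposed, unexposed, p_exposed_as_unexposed, p_unexposed_as_exposed):
    """Return (observed_exposed, observed_unexposed) after misclassification."""
    moved_down = exposed * p_exposed_as_unexposed      # truly exposed, recorded as unexposed
    moved_up = unexposed * p_unexposed_as_exposed      # truly unexposed, recorded as exposed
    return exposed - moved_down + moved_up, unexposed + moved_down - moved_up


def odds_ratio(exp_ae_pos, exp_ae_neg, unexp_ae_pos, unexp_ae_neg):
    """Cross-product odds ratio: (Exp+,AE+ x Exp-,AE-) / (Exp-,AE+ x Exp+,AE-)."""
    return (exp_ae_pos * unexp_ae_neg) / (unexp_ae_pos * exp_ae_neg)


# True counts per outcome column: (Exp+, Exp-)
truth = {"AE+": (20, 80), "AE-": (10, 90)}

# Non-differential: identical error rates in the AE+ and AE- columns
observed = {col: misclassify(e, u, 0.20, 0.10) for col, (e, u) in truth.items()}

true_or = odds_ratio(truth["AE+"][0], truth["AE-"][0], truth["AE+"][1], truth["AE-"][1])
observed_or = odds_ratio(
    observed["AE+"][0], observed["AE-"][0], observed["AE+"][1], observed["AE-"][1]
)

print("Observed counts:", observed)
print(f"True OR = {true_or:.2f}, Estimated OR = {observed_or:.2f}")
```

Since the error rates do not depend on outcome status, both columns are blurred in the same way and the estimate moves from 2.25 towards 1.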
• Binary, differential → Direction of bias is unpredictable
• Exposure not binary → Direction of bias is unpredictable

Truth
        AE+   AE-
Exp+    20    10
Exp-    80    90
True OR = (20 × 90)/(80 × 10) = 2.25

Differential exposure misclassification I (e.g., recall bias)
Exposed subjects classified as unexposed: 0% of AE+, 30% of AE- (10 ↓3)

Observation I
        AE+   AE-
Exp+    20    7
Exp-    80    93
Estimated OR = (20 × 93)/(80 × 7) = 3.32
Bias away from null

Differential exposure misclassification II
Unexposed subjects classified as exposed: 0% of AE+, 10% of AE- (90 ↑9)

Observation II
        AE+   AE-
Exp+    20    19
Exp-    80    81
Estimated OR = (20 × 81)/(80 × 19) = 1.07
Bias towards null
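Both differential scenarios can be reproduced by letting the error rates differ between the AE+ and AE- columns. This is a sketch under the rates implied by the slide counts (30% of exposed AE- subjects recorded as unexposed in scenario I, 10% of unexposed AE- subjects recorded as exposed in scenario II, no error among AE+); all names are hypothetical.

```python
def misclassify(exposed, unexposed, p_exposed_as_unexposed, p_unexposed_as_exposed):
    """Return (observed_exposed, observed_unexposed) after misclassification."""
    moved_down = exposed * p_exposed_as_unexposed
    moved_up = unexposed * p_unexposed_as_exposed
    return exposed - moved_down + moved_up, unexposed + moved_down - moved_up


def observed_odds_ratio(truth, error_rates):
    """Apply column-specific error rates, then compute the cross-product OR."""
    exp_pos, unexp_pos = misclassify(*truth["AE+"], *error_rates["AE+"])
    exp_neg, unexp_neg = misclassify(*truth["AE-"], *error_rates["AE-"])
    return (exp_pos * unexp_neg) / (unexp_pos * exp_neg)


# True counts per outcome column: (Exp+, Exp-)
truth = {"AE+": (20, 80), "AE-": (10, 90)}

# Error rates per column: (P(exposed recorded as unexposed), P(unexposed recorded as exposed))
no_error = {"AE+": (0.0, 0.0), "AE-": (0.0, 0.0)}
scenario_1 = {"AE+": (0.0, 0.0), "AE-": (0.30, 0.0)}   # AE- subjects under-report exposure
scenario_2 = {"AE+": (0.0, 0.0), "AE-": (0.0, 0.10)}   # AE- subjects over-report exposure

true_or = observed_odds_ratio(truth, no_error)
or_1 = observed_odds_ratio(truth, scenario_1)
or_2 = observed_odds_ratio(truth, scenario_2)

print(f"True OR = {true_or:.2f}")
print(f"Scenario I OR = {or_1:.2f}")
print(f"Scenario II OR = {or_2:.2f}")
```

The same true table yields an estimate above or below 2.25 depending only on which group misreports and in which direction, which is why the direction of bias cannot be predicted in general.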
Misclassification of Confounders
• Adjustment with a binary non-differentially misclassified
confounder reduces bias and produces a partially adjusted
effect estimate that falls between the crude and true effect,
leaving residual confounding (Greenland and Robins, AJE 1985)
– Residual confounding decreases with increasing sensitivity and
specificity of the misclassified confounder
(Savitz and Baron, AJE 1989)
– Necessary assumption (likely to hold in most applications in
epidemiology): the effect of the confounder on the outcome is in the
same direction among the treated and the untreated (i.e., there is no
qualitative interaction between the treatment and the confounder)
(Ogburn and VanderWeele, Epidemiology 2012)
Addressing Misclassification
• Prospective studies with primary data
collection
– Ensure accurate measurement (instruments,
procedures, quality control, etc.)
• Studies that rely on secondary data
– Use validated measures for exposure, outcome,
and confounding factors
– Rule out recall and detection biases
In summary…
• Best remedy for bias is prevention!
• RCTs
– Randomization
– Blinding
– Primary data collection
• Observational Studies
– Sample selection
– Choice of comparator
– Use validated measures
– Statistical analysis