Measures of Associations and of Causal Effects
EPI 200A, October 13, 2009
In-Class Lecture Outline
Non-random relationships between exposure and outcome occurrence
Association versus causal effect (causation)
Overview of measures of relationships (association versus causal effect)
Measures of association
Measures of effect
Attributable, etiologic and other fractions
Analytical epidemiology is about making comparisons.
The aim is to compare the exposed with the unexposed, coming as close to the counterfactual ideal as possible.
We compare frequencies of disease or states of health (usually prevalence or incidence) between populations.
Most measures of association are either relative (for example, relative risks) or absolute (for example, risk differences). The P-value is not a measure of association.
Comparison
When we compare incidence or prevalence data for specific health outcomes, we want these health outcomes to be the same in the groups being compared.
The Kappa Coefficient
Epidemiologists often have to work with imperfect data.
We get health data from existing records, health examinations or interviews. All of these data sources are subject to measurement error or other sources of bias.
We often need to know something about the magnitude of bias.
The Kappa Coefficient cont.
We can get that information if we can compare our recorded health data with a gold standard.
We may compare self-reported data on colic with sound recordings. We may compare self-reported data on ADHD with psychiatric diagnoses.
If we have no gold standard, we may just compare data from different observers.
Kappa Measures
Kappa measures the proportion of non-random agreement between two measures.
Random Agreement
Two people flipping the same coin 1000 times:
                 Person 2
Person 1     Heads   Tails   Total
Heads          250     250     500
Tails          250     250     500
Total          500     500    1000
Kappa is a measure of the proportion of agreement above the agreement expected by chance alone.
Pathologists reading cell smears on cervical cancer
                      Pathologist A
Pathologist B     Positive   Negative   All
Positive                25         12    37
Negative                15        230   245
All                     40        242   282
Proportion of agreement: example
Proportion of observed agreement: (25 + 230)/282 = 0.904
Expected by chance:
Pathologist B positive: 37/282 = 0.131; 0.131 x 40 = 5.25
Pathologist B negative: 245/282 = 0.869; 0.869 x 242 = 210.25
All: 5.25 + 210.25 = 215.50; out of 282: 215.50/282 = 0.764
The observed agreement is 0.904, but 0.764 may be due to chance.
$\kappa = \frac{0.904 - 0.764}{1 - 0.764} = 0.59$
Pathology Example
                  Pathologist A
Pathologist B     POS   NEG
POS                a     b
NEG                c     d
Proportion of observed agreement: $P_o = \frac{a + d}{a + b + c + d}$
Proportion of agreement expected due to chance: $P_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{(a + b + c + d)^2}$
$\kappa = \frac{P_o - P_e}{1 - P_e}$
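A minimal Python sketch of the 2 x 2 kappa calculation, applied to the pathologists' table. The function name `kappa_2x2` is an illustrative assumption, not something from the lecture. Carrying full precision through the calculation gives a kappa of about 0.59.

```python
def kappa_2x2(a, b, c, d):
    """Return (observed agreement, chance agreement, kappa) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    p_obs = (a + d) / n
    # Chance agreement: product of the marginal totals for each category, over n squared.
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return p_obs, p_exp, kappa


# Pathologists reading cell smears: a=25, b=12, c=15, d=230
p_obs, p_exp, kappa = kappa_2x2(25, 12, 15, 230)
print(round(p_obs, 3), round(p_exp, 3), round(kappa, 2))  # 0.904 0.764 0.59
```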
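More levels
                     Observer 2 (j)
Observer 1 (i)     1      2      ...    K      Total
1                  M11    M12    ...    M1K    M1.
2                  M21    M22    ...    M2K    M2.
...
K                  MK1    MK2    ...    MKK    MK.
Total              M.1    M.2    ...    M.K    M
$P_o = \frac{1}{M} \sum_{i=1}^{K} M_{ii}$
$P_e = \frac{1}{M^2} \sum_{i=1}^{K} M_{i\cdot} M_{\cdot i}$
$\kappa = \frac{P_o - P_e}{1 - P_e}$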
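The same calculation for any number of rating levels can be sketched in Python as follows. The function name `cohen_kappa` and the 3 x 3 table are hypothetical illustrations; only the 2 x 2 pathologists' table comes from the lecture.

```python
def cohen_kappa(table):
    """Kappa for a square agreement table given as a list of rows (observer 1 = rows)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Observed agreement: share of all ratings that fall on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / total
    # Chance agreement: sum over levels of (row total x column total) / total squared.
    p_exp = sum(row_totals[i] * col_totals[i] for i in range(k)) / total**2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical three-level ratings (made-up counts, for illustration only)
three_level = [[20, 5, 0],
               [3, 30, 7],
               [1, 4, 30]]
print(round(cohen_kappa(three_level), 2))

# With two levels the function reduces to the 2 x 2 formula
print(round(cohen_kappa([[25, 12], [15, 230]]), 2))  # 0.59
```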
Population Prevalence
Kappa depends not only on the level of agreement, but also on the frequency of the variable being studied.
Pop A: true prevalence 0.50
Pop B: true prevalence 0.01
In both populations, observer 1 is always correct and observer 2 misclassifies 10% of cases and 10% of non-cases.
Population A Example
                 Observer 2
Observer 1     Positive (+)   Negative (-)   Total
Positive (+)        450             50         500
Negative (-)         50            450         500
Total               500            500        1000
$P_o = 0.90$, $P_e = 0.50$, $\kappa = 0.80$
Population B Example
                 Observer 2
Observer 1     Positive (+)   Negative (-)   Total
Positive (+)          9              1          10
Negative (-)         99            891         990
Total               108            892        1000
$P_o = 0.90$, $P_e = 0.884$, $\kappa = 0.14$
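The dependence of kappa on prevalence can be sketched in Python. This assumes, as in the two population examples, that observer 1 is always correct and observer 2 misclassifies a fixed share of both cases and non-cases. The function name and the extra prevalence of 0.10 are illustrative assumptions.

```python
def kappa_for_prevalence(prevalence, error_rate, n=1000):
    """Kappa when observer 1 is always right and observer 2 errs at `error_rate`."""
    true_pos = prevalence * n
    true_neg = n - true_pos
    a = true_pos * (1 - error_rate)  # both observers positive
    b = true_pos * error_rate        # observer 1 positive, observer 2 negative
    c = true_neg * error_rate        # observer 1 negative, observer 2 positive
    d = true_neg * (1 - error_rate)  # both observers negative
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_obs - p_exp) / (1 - p_exp)


# Observed agreement is 0.90 at every prevalence, yet kappa falls as the condition gets rarer.
for prevalence in (0.50, 0.10, 0.01):
    print(prevalence, round(kappa_for_prevalence(prevalence, 0.10), 2))
```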
Interpreting Kappa (guidelines only)
0.81-0.99 almost perfect agreement
0.61-0.80 substantial agreement
0.41-0.60 moderate agreement
0.21-0.40 fair agreement
0.01-0.20 slight agreement
Correlations (associations) can be generated by the design, the conduct of the study, the data you use, other correlated factors (confounders), chance or by causation.
We try to eliminate non-causal explanations; what remains may be causal, but we will never know for certain (the assumptions are mostly untestable without experimentation).
Closed population, no loss to follow-up
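Exposure   N        D     Obs time (person-years)
No         90,000   900   89,550
Yes        10,000   500    9,750
$RR = \frac{R_{\text{exp}}}{R_{\text{not exp}}} = \frac{500/10{,}000}{900/90{,}000} = 5.0$
$IRR = \frac{IR_{\text{exp}}}{IR_{\text{not exp}}} = \frac{500/9{,}750\ \text{years}}{900/89{,}550\ \text{years}} = 5.1$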
When would the two measures be almost the same?
Closed population, no loss to follow-up, cont.
$RD = R_{\text{exp}} - R_{\text{not exp}} = 500/10{,}000 - 900/90{,}000 = 0.040$
$IRD = IR_{\text{exp}} - IR_{\text{not exp}} = 500/9{,}750\ \text{years} - 900/89{,}550\ \text{years} = 0.041\ \text{years}^{-1}$
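The four measures for the closed cohort can be reproduced with a short Python sketch. The dictionary layout and function names are illustrative assumptions; the counts and person-years are those of the cohort table.

```python
cohort = {
    "exposed":   {"n": 10_000, "cases": 500, "person_years": 9_750},
    "unexposed": {"n": 90_000, "cases": 900, "person_years": 89_550},
}


def risk(group):
    """Cumulative incidence: cases divided by persons at risk at baseline."""
    return group["cases"] / group["n"]


def rate(group):
    """Incidence rate: cases divided by person-years of observation."""
    return group["cases"] / group["person_years"]


exposed, unexposed = cohort["exposed"], cohort["unexposed"]
rr = risk(exposed) / risk(unexposed)    # relative risk
irr = rate(exposed) / rate(unexposed)   # incidence rate ratio
rd = risk(exposed) - risk(unexposed)    # risk difference
ird = rate(exposed) - rate(unexposed)   # incidence rate difference, per year

print(round(rr, 1), round(irr, 1), round(rd, 3), round(ird, 3))  # 5.0 5.1 0.04 0.041
```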
Associations are often measured as odds ratios (OR) because of their good statistical properties.
The odds ratio is a ratio between two odds.
$\text{Odds} = \frac{R}{1 - R}$
If R = 0.01; odds = 0.01/0.99 = 0.0101
If R = 0.10; odds = 0.10/0.90 = 0.1111
If R = 0.40; odds = 0.40/0.60 = 0.6667
If R = 0.50; odds = 0.50/0.50 = 1
If R = 0.60; odds = 0.60/0.40 = 1.5000
$OR = \frac{R_{\text{exp}} / (1 - R_{\text{exp}})}{R_{\text{not exp}} / (1 - R_{\text{not exp}})} = \frac{(500/10{,}000) / (1 - 500/10{,}000)}{(900/90{,}000) / (1 - 900/90{,}000)} = \frac{(500/10{,}000) / (9{,}500/10{,}000)}{(900/90{,}000) / (89{,}100/90{,}000)} = \frac{500/9{,}500}{900/89{,}100} = 5.21$
OR ≈ RR if R is small, since then 1 - R ≈ 1.
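A small Python sketch of the odds and odds ratio calculations, which also illustrates how the OR drifts away from the RR as the baseline risk grows. The function names and the baseline risks in the loop are illustrative assumptions.

```python
def odds(r):
    """Convert a risk (proportion) into odds."""
    return r / (1 - r)


def odds_ratio(r_exp, r_unexp):
    return odds(r_exp) / odds(r_unexp)


# Cohort example: risks of 0.05 (exposed) and 0.01 (not exposed)
print(round(odds_ratio(500 / 10_000, 900 / 90_000), 2))  # 5.21

# Hold RR fixed at 5 and raise the baseline risk: the OR moves further from the RR.
for r_unexp in (0.001, 0.01, 0.10):
    r_exp = 5 * r_unexp
    print(r_unexp, round(odds_ratio(r_exp, r_unexp), 2))
```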
A RR of 5 is equivalent to a 400% increase in risk
$RR = \frac{500/10{,}000}{900/90{,}000} = 5$
$\frac{R_{\text{exp}} - R_{\text{not exp}}}{R_{\text{not exp}}} \times 100 = \frac{500/10{,}000 - 900/90{,}000}{900/90{,}000} \times 100 = 400\%$
(RR - 1) x 100 equals the increase in percent, and a RR of 1 equals no association (RD = 0).
Measure   Range       Dimensionality
RR        0, ∞        None
RD        -1, +1      None
IRR       0, ∞        None
IRD       -∞, +∞      Time^-1
If the disease is rare (Δt is short):
$\frac{R_{\text{exp}}}{R_{\text{not exp}}} \approx \frac{IR_{\text{exp}} \times \Delta t}{IR_{\text{not exp}} \times \Delta t} = \frac{IR_{\text{exp}}}{IR_{\text{not exp}}}$
When Δt goes towards 0, RR goes towards IRR.
When Δt goes towards ∞, RR goes towards 1.
Since the upper limit for R is 1, the R for the not exposed sets the upper limit for RR.
If the Rnot exp for an abortion is 0.30, RR cannot be more than 1/0.30 = 3.33.
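The two limits can be illustrated numerically. The sketch below assumes constant incidence rates in a closed population, so that the risk over Δt is 1 - exp(-IR x Δt). That exponential model is an assumption added here, not a formula from the lecture. The rates are taken from the closed-cohort example.

```python
import math

IR_EXP = 500 / 9_750      # cases per person-year among the exposed
IR_UNEXP = 900 / 89_550   # cases per person-year among the not exposed


def risk(rate, dt):
    """Risk over dt years under a constant rate (assumed model): 1 - exp(-rate * dt)."""
    return -math.expm1(-rate * dt)


def risk_ratio(dt):
    return risk(IR_EXP, dt) / risk(IR_UNEXP, dt)


irr = IR_EXP / IR_UNEXP
# Short follow-up: RR is close to IRR. Very long follow-up: both risks approach 1, so RR approaches 1.
for dt in (0.01, 1, 10, 100, 1000):
    print(f"dt = {dt:>7} years: RR = {risk_ratio(dt):.2f} (IRR = {irr:.2f})")

# Ceiling on RR set by the risk among the not exposed
max_rr = 1 / 0.30
print(round(max_rr, 2))  # 3.33
```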
Measures of effect
Hume's two definitions of causality
1. A cause is an object followed by another (E → D), where all objects similar to the first are followed by objects similar to the second.
2. Or in other words, where, if the first object had not been, the second would never have existed.
The second definition inspired counterfactual thinking as the ideal for causal inference, that is, for being able to talk about causal effects.
Causal rate difference = A+/T+ - Ao/To
Causal risk difference = A+/N - Ao/N
Causal difference in average disease-free time = T+/N - To/N
Causal rate ratio = (A+/T+) / (Ao/To)
Causal risk ratio = (A+/N) / (Ao/N) = R+/Ro
Causal odds ratio = (A+/(N-A+)) / (Ao/(N-Ao)) = (R+/S+) / (Ro/So), where S = 1 - R
To define an effect of an exposure is to define it in relation to no exposure under counterfactual conditions.
Cohort                    D     N    T
Women exposed to HRT      A+    N    T+
Counterfactual cohort     Ao    N    To
We are interested in effects but we measure associations.
It is OK to use the word "effect" when you describe aims or hypotheses, but not when you describe actual findings.
Attributable fractions
To describe the potential importance of an exposure, we sometimes use estimates like attributable fractions. Since this term has causal implications, you should use it with care.
If we have estimated an effect of the exposure, what fraction of the disease among the exposed is then related to the exposure?
Let the number of exposed be N+.
Attributable fraction for the exposed:
$AF_{\text{exposed}} = \frac{N_+ (R_+ - R_o)}{N_+ R_+} = \frac{R_+ - R_o}{R_+} = \frac{RD}{R_+} = 1 - \frac{1}{RR} = \frac{RR - 1}{RR}$
Exp   N        D     R      RR
+     10,000   500   0.05   5.0
O     90,000   900   0.01
$AF_{\text{exposed}} = \frac{RR - 1}{RR} = \frac{5 - 1}{5} = 0.80$ or 80%
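As a quick check that the two forms of the attributable fraction for the exposed agree, here is a minimal Python sketch. The function names are illustrative assumptions; the risks are those of the example table.

```python
def af_exposed_from_risks(r_exp, r_unexp):
    """Attributable fraction among the exposed as (R+ - Ro) / R+."""
    return (r_exp - r_unexp) / r_exp


def af_exposed_from_rr(rr):
    """The same quantity written in terms of the relative risk: (RR - 1) / RR."""
    return (rr - 1) / rr


r_exp = 500 / 10_000     # 0.05
r_unexp = 900 / 90_000   # 0.01
print(round(af_exposed_from_risks(r_exp, r_unexp), 2))  # 0.8
print(round(af_exposed_from_rr(r_exp / r_unexp), 2))    # 0.8
```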
The attributable fraction for the population
$AF = \frac{N_+ \times R_+ \times \frac{RR - 1}{RR}}{N_+ \times R_+ + N_o \times R_o}$
$AF = \frac{10{,}000 \times 0.05 \times \frac{5 - 1}{5}}{90{,}000 \times 0.01 + 10{,}000 \times 0.05} = 400/1400 = 0.286$
Closed population
No competing risk
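The population attributable fraction can be sketched in Python as the excess cases among the exposed divided by all cases. The function name is an illustrative assumption, and the sketch relies on the same conditions stated for the formula (closed population, no competing risk).

```python
def population_af(n_exp, r_exp, n_unexp, r_unexp):
    """Share of all cases in the population that is attributed to the exposure."""
    rr = r_exp / r_unexp
    excess_cases = n_exp * r_exp * (rr - 1) / rr      # cases among exposed x AF for the exposed
    all_cases = n_exp * r_exp + n_unexp * r_unexp
    return excess_cases / all_cases


# 10,000 exposed at risk 0.05 and 90,000 not exposed at risk 0.01
print(round(population_af(10_000, 0.05, 90_000, 0.01), 3))  # 0.286
```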
IR per 100,000 years
Exposure       Lung cancer (LC)   CHD
Smokers              140          669
Non-smokers           10          413
$IRR_{LC} = \frac{140}{10} = 14$
$IRR_{CHD} = \frac{669}{413} = 1.62$
$IRD_{LC} = 140 - 10 = 130$ per 100,000 years
$IRD_{CHD} = 669 - 413 = 256$ per 100,000 years
Attributable fraction among exposed
$AF_{LC} = \frac{14 - 1}{14} = 0.929$
$AF_{CHD} = \frac{1.62 - 1}{1.62} = 0.383$
Attributable fraction in the population will depend upon the proportion of smokers in the population.
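20% smoke (O = observed cases, E = cases expected at the non-smokers' rate)
                             Lung cancer        CHD
               N             O      E           O       E
Smokers        200,000       280    20          1338    826
Non-smokers    800,000        80    80          3304    3304
All                          360    100         4642    4130
Pop. attr. fraction, lung cancer: $\frac{360 - 100}{360} = 0.72$
Pop. attr. fraction, CHD: $\frac{4642 - 4130}{4642} = 0.11$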
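60% smoke (O = observed cases, E = cases expected at the non-smokers' rate)
                             Lung cancer        CHD
               N             O      E           O       E
Smokers        600,000       840    60          4014    2478
Non-smokers    400,000        40    40          1652    1652
All                          880    100         5666    4130
Pop. attr. fraction, lung cancer: $\frac{880 - 100}{880} = 0.89$
Pop. attr. fraction, CHD: $\frac{5666 - 4130}{5666} = 0.27$
Check (lung cancer): $\frac{0.6 \times 140 \times \frac{14 - 1}{14}}{0.6 \times 140 + 0.4 \times 10} = \frac{78}{88} = 0.89$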
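How the population attributable fraction moves with the proportion of smokers can be sketched in Python using the incidence rates from the smoking example. The function name and the population size of 1,000,000 (implied by the smoker and non-smoker counts) are stated here as assumptions of the sketch.

```python
# Incidence rates per 100,000 years: (smokers, non-smokers)
RATES = {"lung cancer": (140, 10), "CHD": (669, 413)}


def population_af(p_smoke, rate_smokers, rate_nonsmokers, population=1_000_000):
    """(Observed - expected) / observed, with expected cases at the non-smokers' rate."""
    smokers = p_smoke * population
    nonsmokers = population - smokers
    observed = (smokers * rate_smokers + nonsmokers * rate_nonsmokers) / 100_000
    expected = population * rate_nonsmokers / 100_000
    return (observed - expected) / observed


for p_smoke in (0.20, 0.60):
    for disease, (rate_s, rate_n) in RATES.items():
        print(f"{p_smoke:.0%} smoke, {disease}: {population_af(p_smoke, rate_s, rate_n):.2f}")
```

The attributable fraction among the exposed stays at 0.929 for lung cancer and 0.383 for CHD whatever the smoking prevalence; only the population fraction changes.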
Preventable fraction
Attributable fraction $= \frac{R_1 - R_0}{R_1}$
Let exposure be no exposure; then
preventable fraction $= \frac{R_0 - R_1}{R_0} = 1 - RR$
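A minimal Python sketch of the preventable fraction for a protective exposure. The risks of 0.02 and 0.05 are hypothetical numbers chosen only for illustration, and the function name is an assumption.

```python
def preventable_fraction(r_exposed, r_unexposed):
    """(R0 - R1) / R0, which equals 1 - RR when the exposure lowers the risk."""
    return (r_unexposed - r_exposed) / r_unexposed


# Hypothetical risks: 0.02 among the exposed, 0.05 among the not exposed
r1, r0 = 0.02, 0.05
rr = r1 / r0
print(round(preventable_fraction(r1, r0), 2), round(1 - rr, 2))  # 0.6 0.6
```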
BMJ Oct. 20, 2006
Around 40% of the fall in number of deaths from cancer among US men from 1991 to 2003 can be attributed to the decline in smoking, say researchers from the American Cancer Society.
Etiologic fraction
Attributable fractions have also been called etiologic fractions (EF). If, however, we define the etiologic fraction as the fraction of cases in whom the exposure caused the disease (the disease would not have occurred had they not been exposed), attributable fractions (AFs) need not equal EFs.
If the exposure is a cause for some but a preventive factor for others, the attributable fraction may be 0 and the etiologic fraction > 0.
Etiologic fraction
Assume the genetically susceptible will always get lung cancer if exposed to either cigarette smoke or asbestos. Assume the time sequence for a smoker and the counterfactual non-smoker would always be:
[Figure: timelines for the smoker (smoke, incidence I+) and the counterfactual non-smoker (non-smoking, incidence I0), with component causes G and C1]
If the cohort is unexposed, all cases are attributable to C1.
If the cohort is exposed and G is always present, all cases will be attributable to E (smoking) or to E and G. The etiologic fraction is 100% for the exposed, not $\frac{I_+ - I_0}{I_+}$.
Deterministic model
4 causal types
Type                      If exposed   If not exposed
1. No effect (doomed)     case         case
2. Causative              case         non case
3. Preventative           non case     case
4. No effect (immune)     non case     non case
If all are type 2:
RR = 1/0 = ∞
RD = 1 - 0 = 1
If 10% are type 1, 10% type 2 and 80% type 4 (in a group of 10):
If exposed, 2 get the disease (D)
If not exposed, 1 gets the disease (D)
RR = 2/1 = 2
RD = 0.20 - 0.10 = 0.10
If 50% are type 2 and 50% type 3 (in a group of 10):
If exposed, 5 would get the disease and 5 would not get the disease
If not exposed, 5 would not get the disease and 5 would get the disease
RR = 5/5 = 1
RD = 0.5 - 0.5 = 0
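The three scenarios can be reproduced with a short Python sketch that derives the measures from the mix of causal types. The function name is an illustrative assumption. The etiologic fraction is computed here as the share of exposed cases who are type 2, which is one reading of the definition given earlier and is labeled as such in the code.

```python
def causal_type_measures(p_doomed, p_causative, p_preventive, p_immune):
    """RR, RD, and AF and EF among the exposed, for given proportions of the 4 causal types."""
    assert abs(p_doomed + p_causative + p_preventive + p_immune - 1) < 1e-9
    risk_exposed = p_doomed + p_causative       # types 1 and 2 are cases if exposed
    risk_unexposed = p_doomed + p_preventive    # types 1 and 3 are cases if not exposed
    rd = risk_exposed - risk_unexposed
    rr = risk_exposed / risk_unexposed if risk_unexposed > 0 else float("inf")
    # Both fractions below require at least some cases among the exposed.
    af = rd / risk_exposed                      # attributable fraction among the exposed
    ef = p_causative / risk_exposed             # assumption: EF = share of exposed cases of type 2
    return {"RR": rr, "RD": rd, "AF": af, "EF": ef}


print(causal_type_measures(0.0, 1.0, 0.0, 0.0))   # all type 2
print(causal_type_measures(0.1, 0.1, 0.0, 0.8))   # 10% type 1, 10% type 2, 80% type 4
print(causal_type_measures(0.0, 0.5, 0.5, 0.0))   # 50% type 2, 50% type 3
```

In the last scenario RR = 1 and the attributable fraction is 0, yet every exposed case is a type 2 individual, so the etiologic fraction under this reading is 100%.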
Even in real-life situations where the exposure only causes the disease (no cases are prevented by the exposure), there will be competition over which causal field brings on the disease.