Date post: | 22-Dec-2015 |
Comparing methods for addressing limits of detection in environmental epidemiology
Roni Kobrosly, PhD, MPH
Department of Preventive Medicine
Icahn School of Medicine at Mount Sinai
A familiar diagram…
Environmental Exposure → Internal Dose → Biologically Effective Dose → Altered Structure/Function → Clinical Disease
Biomarker of Exposure
DeCaprio, 1997
Biomarkers and Limits of Detection (LOD)
It is difficult to quantify the concentration because it is so low
[Figure: concentration axis marking the LOD, with higher concentrations above it]
Handling LODs in analysis
• Easiest approach: simply delete these observations
• Problems with this:
o Values < LOD are informative: the analyte may have a concentration anywhere between 0 and the LOD
o Studies are expensive and you lose covariate data!
o Excluding observations from analyses *may* substantially bias results
Chen et al. 2011
Handling LODs in analysis
• Hornung & Reed describe an approach that involves substituting a single value for each observation < LOD
• Three suggested substitutions: LOD/2, LOD/√2, or just LOD
• Problem: Replacing a sizable portion of the data with a single value increases the likelihood of bias and reduces power!
Helsel, 2005; Hughes, 2000; Hornung & Reed, 1990
Citations in Google Scholar
Hornung & Reed, 1990
[Figure: number of publications citing Hornung & Reed (1990) by year, 1990 to 2012; y-axis from 0 to 100]
Comparing LOD methods
• While many studies test individual methods, relatively little work compares the performance of several methods
• Even fewer studies have compared methods in context of multivariable data
• Comparative studies that do exist provide contradictory recommendations. No consensus!
Simulation Study Objectives
• Compare performance of LOD methods when independent variable is subject to limit of detection in multiple regression
• Compare performance across a range of “experimental” conditions
• Create flowchart to aid researchers in their analysis decision making
Statistical Bias
Nat’l Library of Med definition: “Any deviation of results or inferences from the truth”
[Figure: unbiased versus biased estimates]
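In a simulation setting, where the true parameter is known, this definition can be written as a simple difference. The notation below is introduced here for illustration and is not from the slides: $\hat{\beta}_k$ is the coefficient estimated from simulated dataset $k$, $\beta$ is the true value, and $K$ is the number of simulated datasets.

$$\text{Bias}_k = \hat{\beta}_k - \beta, \qquad \overline{\text{Bias}} = \frac{1}{K}\sum_{k=1}^{K}\left(\hat{\beta}_k - \beta\right)$$

An estimator is unbiased when $E[\hat{\beta}] = \beta$, in which case the mean bias should be close to zero for large $K$.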
Variable Definitions
• Four continuous variables:
• Y: Dependent variable (outcome)
• X: Independent variable (exposure, subject to LOD)
• C1, C2: Independent variables (covariates)
6 “Experimental Conditions”
1) Dataset sample size: n = {100, 500}
2) % of exposure variable with values in LOD region:
LOD% = {0.05, 0.25}
3) Distribution of Exposure Variable:
Normal versus Skewed
4) R2 of full model:
R2 = {0.10, 0.20}
5) Strength & direction of exposure-outcome association:
Beta = {-10, 0, 10}
6) Direction of confounding:
Strong Positive, versus Strong Negative, versus None
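To get a feel for the size of this design, the six conditions can be enumerated as a grid. This is a minimal sketch that assumes the conditions are fully crossed, which the slides do not state; the dictionary keys are illustrative names, not the author's.

```python
from itertools import product

# Levels of the six experimental conditions listed above.
# Key names are hypothetical labels chosen for this sketch.
conditions = {
    "n": [100, 500],
    "lod_fraction": [0.05, 0.25],
    "x_distribution": ["normal", "skewed"],
    "r_squared": [0.10, 0.20],
    "beta": [-10, 0, 10],
    "confounding": ["strong positive", "strong negative", "none"],
}

# Assuming a fully crossed design, every combination is one scenario.
scenarios = [
    dict(zip(conditions, levels)) for levels in product(*conditions.values())
]

print(len(scenarios))  # 2 * 2 * 2 * 2 * 3 * 3 = 144
print(scenarios[0])
```

Under that assumption each LOD method would be evaluated in 144 scenarios.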
LOD methods considered
1. Deletion of subjects with LOD values
2. Substitution with LOD/√(2)
3. Substitution with LOD/2
4. Substitution with just LOD value
5. Multiple imputation (King’s Amelia II)
6. MLE-imputation method (Helsel & Krishnamoorthy)
Method 1: Deletion
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8 (deleted)
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0 (deleted)
Method 2: Sub with LOD/√(2)
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0
LOD for X = 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 6.4 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 6.4 12.6 9.0
9.0/√2 = 6.4
Method 3: Sub with LOD/2
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0
LOD for X = 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 4.5 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 4.5 12.6 9.0
9.0/2 = 4.5
Method 4: Sub with just LOD
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0
LOD for X = 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 9.0 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 9.0 12.6 9.0
9.0
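Methods 1 to 4 are simple enough to express in a few lines. The sketch below applies them to the five example rows shown on the slides; the function names and the use of None to mark a value below the LOD are choices made for this illustration.

```python
import math

LOD_X = 9.0  # limit of detection for X in the slide example

# Rows are (Y, X, C1, C2); None marks an X value reported as <LOD.
rows = [
    (167.7, 25.8, 13.5, 12.9),
    (-66.3, 15.9, 11.7, 12.6),
    (50.6, None, 10.4, 10.8),
    (-273.0, 9.5, 11.8, 11.1),
    (156.9, None, 12.6, 9.0),
]


def delete_below_lod(data):
    """Method 1: drop every row whose X is below the LOD."""
    return [row for row in data if row[1] is not None]


def substitute_below_lod(data, value):
    """Methods 2-4: replace every X below the LOD with one fixed value."""
    return [(y, value if x is None else x, c1, c2) for y, x, c1, c2 in data]


deleted = delete_below_lod(rows)
sub_sqrt2 = substitute_below_lod(rows, LOD_X / math.sqrt(2))
sub_half = substitute_below_lod(rows, LOD_X / 2)
sub_lod = substitute_below_lod(rows, LOD_X)

print(len(deleted))               # 3 rows remain
print(round(sub_sqrt2[2][1], 1))  # 6.4
print(sub_half[2][1])             # 4.5
print(sub_lod[2][1])              # 9.0
```

Note that deletion discards the Y, C1, and C2 values on the affected rows as well, which is the loss of covariate data mentioned earlier.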
Method 5: Multiple Imputation
• “Amelia II” by Dr. Gary King
• Assumes pattern of observations below LOD only depends on observed data (not unobserved data)
• Lets you constrain imputed values (very helpful when working with LODs!)
Method 5: Multiple Imputation
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 3.0 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 6.2 12.6 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 2.5 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 6.8 12.6 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 3.3 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 6.3 12.6 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 3.5 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 6.0 12.6 9.0
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 2.8 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 7.2 12.6 9.0
M = 5
Method 5: Multiple Imputation
Run the regression on each of the M = 5 imputed datasets, then combine the estimates:
β1 = 10.1, β2 = 9.5, β3 = 8.3, β4 = 12.1, β5 = 10.4
Combined β = 10.01
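Pooling the per-dataset estimates is usually done with Rubin's rules, where the pooled point estimate is the simple average across imputations. The sketch below applies that average to the five rounded estimates shown on the slide; it is an illustration of the general rule, not a reproduction of the author's calculation. Only the point estimate and the between-imputation variance are computed, because the within-imputation variances are not reported in the slides.

```python
from statistics import mean, variance

# Exposure coefficients from the M = 5 imputed datasets (values from the slide).
betas = [10.1, 9.5, 8.3, 12.1, 10.4]

# Rubin's rules: the pooled point estimate is the mean across imputations.
pooled_beta = mean(betas)

# Between-imputation variance (sample variance of the M estimates).
between_var = variance(betas)

print(round(pooled_beta, 2))  # 10.08
print(between_var)
```

The rounded values average to 10.08, which differs slightly from the 10.01 shown on the slide; the slide does not say how that figure was obtained.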
Method 6: MLE-Imputation
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 <LOD 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 <LOD 12.6 9.0
Assume X is normally distributed; estimate its mean and standard deviation (Sx)
Method 6: MLE-Imputation
Y X C1 C2
167.7 25.8 13.5 12.9
-66.3 15.9 11.7 12.6
50.6 3.2 10.4 10.8
-273.0 9.5 11.8 11.1
156.9 5.8 12.6 9.0
Use the LOD value, the estimated mean, and Sx to randomly generate observations below the LOD
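The two steps can be sketched as follows. This is a generic illustration of the idea, not the Helsel or Krishnamoorthy implementation: it assumes X is normal and left-censored at the LOD, fits the mean and SD by maximum likelihood using an EM algorithm (a technique chosen here for simplicity), and then draws values from the fitted normal truncated below the LOD. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

STD_NORMAL = NormalDist()


def censored_normal_mle(observed, n_censored, lod, iterations=200):
    """Estimate the mean and SD of a normal sample left-censored at lod (EM)."""
    mu, sigma = mean(observed), stdev(observed)  # starting values
    n = len(observed) + n_censored
    sum_x = sum(observed)
    sum_x2 = sum(x * x for x in observed)
    for _ in range(iterations):
        a = (lod - mu) / sigma
        lam = STD_NORMAL.pdf(a) / STD_NORMAL.cdf(a)
        # E-step: expected x and x^2 for a value known only to lie below lod.
        ex = mu - sigma * lam
        ex2 = mu * mu + sigma * sigma - sigma * lam * (lod + mu)
        # M-step: update the mean and SD from the completed statistics.
        mu = (sum_x + n_censored * ex) / n
        sigma = math.sqrt((sum_x2 + n_censored * ex2) / n - mu * mu)
    return mu, sigma


def draw_below_lod(mu, sigma, lod, rng):
    """Draw one value from Normal(mu, sigma) truncated below lod (inverse CDF)."""
    p_lod = STD_NORMAL.cdf((lod - mu) / sigma)
    u = max(rng.random() * p_lod, 1e-12)  # uniform on (0, P(X < lod))
    return mu + sigma * STD_NORMAL.inv_cdf(u)


# Slide example: three detected X values, two below an LOD of 9.0.
observed_x = [25.8, 15.9, 9.5]
lod = 9.0
rng = random.Random(0)

mu_hat, sigma_hat = censored_normal_mle(observed_x, n_censored=2, lod=lod)
imputed = [draw_below_lod(mu_hat, sigma_hat, lod, rng) for _ in range(2)]
print(mu_hat, sigma_hat, imputed)
```

Because the fitted distribution is a plain normal, draws can be negative even though concentrations cannot, which is one way the parametric assumption can fail in practice.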
Two-step Data Generation Process
• 1st Step: Select “true” regression parameters for the following two models:
o $X = \beta_{0,X} + \beta_{C1,X}(C1) + \beta_{C2,X}(C2)$
o $Y = \beta_{0,Y} + \beta_{X}(X) + \beta_{C1,Y}(C1) + \beta_{C2,Y}(C2)$
• 2nd Step: Use “true” parameters to guide the drawing of random numbers
“TRUTH”
Y = 2.8 + 2(X) + 4.5(C1) + 6(C2)
X = 1.3 - 6(C1) + 1.5(C2)
SIMULATED DATASETS: Dataset1.1, Dataset1.2, Dataset1.3
Obs # Y X C1 C2
1 24.67 5.44 -0.28 1.77
2 30.73 9.47 -1.55 -0.81
3 19.39 -0.98 0.96 0.92
4 -9.47 -8.20 1.72 0.49
i yi xi c1i c2i
Create a set of “true” parameters
Y = 2.8 + 2(X) + 4.5(C1) + 6(C2)
Create 1000 simulated datasets for the set of “true” parameters, using a specific set of experimental conditions
Dataset1.1, Dataset1.2, Dataset1.3, …, Dataset1.1000
Apply a LOD correction method and run the regression for each dataset
$\hat{y} = 2.72 + 2.2(X) + 4.2(C1) + 5.98(C2)$
Take the difference between the estimated coefficient and the “true” parameter. Produce 1000 bias estimates with 95% CIs
Bias = 2.2 - 2 = 0.2
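The whole pipeline can be sketched in a few dozen lines. The example below uses the “true” models from the slides but makes several assumptions of its own: covariates are standard normal, the noise standard deviations (3 for X, 5 for Y) are arbitrary, the LOD is set at the 25th percentile of X in each dataset, and only 200 datasets of n = 200 are drawn to keep the runtime short. It compares deletion with substitution of the LOD itself.

```python
import math
import random
from statistics import mean, stdev

TRUE_BETA_X = 2.0  # coefficient on X in the "true" outcome model


def simulate_dataset(n, rng):
    """Draw one dataset from the two "true" models plus normal noise (assumed)."""
    rows = []
    for _ in range(n):
        c1, c2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = 1.3 - 6 * c1 + 1.5 * c2 + rng.gauss(0, 3)
        y = 2.8 + 2 * x + 4.5 * c1 + 6 * c2 + rng.gauss(0, 5)
        rows.append((y, x, c1, c2))
    return rows


def ols(rows):
    """Least-squares coefficients [intercept, X, C1, C2] via normal equations."""
    design = [(1.0, x, c1, c2) for _, x, c1, c2 in rows]
    k = 4
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * row[0] for d, row in zip(design, rows))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def mean_bias_ci(estimates, truth):
    """Mean bias with a normal-approximation 95% confidence interval."""
    biases = [b - truth for b in estimates]
    m = mean(biases)
    half = 1.96 * stdev(biases) / math.sqrt(len(biases))
    return m, m - half, m + half


rng = random.Random(2015)
results = {"deletion": [], "LOD": []}
for _ in range(200):
    data = simulate_dataset(200, rng)
    xs = sorted(row[1] for row in data)
    lod = xs[int(0.25 * len(xs))]  # 25% of X values fall below this LOD
    kept = [row for row in data if row[1] >= lod]
    subbed = [(y, max(x, lod), c1, c2) for y, x, c1, c2 in data]
    results["deletion"].append(ols(kept)[1])
    results["LOD"].append(ols(subbed)[1])

for method, estimates in results.items():
    print(method, mean_bias_ci(estimates, TRUE_BETA_X))
```

The other methods and experimental conditions would slot into the same loop by swapping the data-generating settings or the correction step.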
Help from Minerva
Minerva runtime ~ 5 minutes
Mean bias (with 95% CI) by LOD method (Deletion, LOD/sqrt(2), LOD/2, LOD, Multi Impu, MLE Impu)
[Figure: n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Negative X-Y Association, Negative confounding; y-axis from -2.0 to 8.0]
[Figure: n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Positive X-Y Association, Negative confounding; y-axis from -8.0 to 2.0]
[Figure: n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Negative X-Y Association, No confounding; y-axis from -1.0 to 1.0]
[Figure: n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Positive X-Y Association, No confounding; y-axis from -1.0 to 1.0]
[Figure: n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Negative X-Y Association, Positive confounding; y-axis from -2.0 to 8.0]
[Figure: n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Positive X-Y Association, Positive confounding; y-axis from -8.0 to 2.0]
An overview of results
• Relative bias of methods is highly dependent on experimental conditions (i.e. no simple answers)
• Covariates and confounding matter! Simulations that only consider bivariate X-Y relationships with LODs are limited
Deletion method results
• Surprisingly… provides unbiased estimates across all conditions!
• If sample size is large and LOD% is small, this may be a good option. As LOD% becomes larger, deletion is more costly
• Important caveat: deletion method works well if true associations are linear
Deletion method with linear effects
Bottom 8% of X variable deleted
Substitution method results
• Not surprisingly… these methods are generally terrible!
• Substituting just the LOD is the worst of the substitution methods
• In most scenarios, these methods will bias associations towards the null
• … but they work reasonably well when the distribution is highly skewed, there is no confounding, and LOD% is low
Multiple Imputation results
• Amelia II performs relatively well! Particularly when R2 is higher
• Does well even when LOD% is high
• Problematic when there is no confounding (reason: this indicates there are no/weak associations between variables)
MLE Imputation results
• Associated with severe bias in most cases
• Highly reliant on parametric assumptions and the code is daunting: recommend avoiding this method
• However, performed reasonably well when exposure is normally distributed, no confounding, and LOD% is low
A Case Study…
Sarah’s SFF Analysis
• Study for Future Families (SFF): a multicenter pregnancy cohort study that recruited mothers from 1999-2005
• Sarah Evans’ analysis: prenatal exposure to Bisphenol A (BPA) and neurobehavioral scores in 153 children at ages 6-10
• 28 (18%) children have BPA levels below the LOD
Sarah’s SFF Analysis
• Maternal urinary BPA collected during late pregnancy
• Neurobehavioral scores obtained through School-age Child Behavior Checklist (CBCL).
• Used multiple regression adjusting for child age at CBCL assessment, mother’s education level, family stress, urinary creatinine
[Figure: results for each CBCL scale (Anxiety/Dep, Withdrawn/Dep, Somatic, Social, Thought, Attention, Rule-Break, Aggressive, Internalizing, Externalizing, Total Problems), comparing LOD/sqrt(2) substitution with Deletion; x-axis from -0.6 to 1.0]