Measurement Error and Misclassification in statistical models: Basics and applications
bcam Bilbao
Helmut Küchenhoff, Statistical Consulting Unit
Ludwig-Maximilians-Universität München
Bilbao, 27-05-2019
Measurement Error Part 1 Bilbao 27-05-2019 HK 1 / 50
Schedule
Monday: 1. Introduction and misclassification: basic models
Tuesday: 2. Measurement error: effects and models
Wednesday: 3. Methods for estimation in the presence of measurement error
Thursday: 4. Simulation and extrapolation (SIMEX) for misclassification and measurement error: concept and examples
Friday: 5. Case studies: uncertainty of diagnosis, exposure assessment
Material
- Carroll, R. J., Ruppert, D., Stefanski, L. and Crainiceanu, C.: Measurement Error in Nonlinear Models: A Modern Perspective. Chapman & Hall, London 2006.
- Gustafson, P.: Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments. Chapman & Hall/CRC Press, Boca Raton 2004.
Outline
- Measurement
- Misclassification
- Model
- Effect
- Correction
Measurement
[Image: Museum of Modern Art in Barcelona]
Measurement
"Measurement is the contact of reason with nature." (Henry Margenau)
"Nearly all the grandest discoveries of science have been but the rewards of accurate measurement." (Lord Kelvin)
Measurement is the basis for producing data.
Literature: David Hand: Measurement Theory and Practice: The World through Quantification (Arnold, 2004).
Basics
Peter → 1.84
Stefan → 1.91
Laura → 1.72
Measurement is the assignment of a number to a characteristic of an object, such that this number can be compared with the numbers assigned to other objects.
Measurement: a structure-preserving function (homomorphism)
Peter is smaller than Stefan ⇔ 1.84 < 1.91
Levels of scaling
The level of scaling is defined by the structure among the objects that the measurement preserves:
- only the relation equal / not equal: nominal scale
- smaller than: ordinal scale
- differences: metric scale
- differences and ratios: ratio scale
Accuracy, Validity and Reliability
- Accuracy: general term describing how closely a measurement reproduces the attribute being measured
- Validity: how well the measurement captures the true attribute, or how well it captures the concept it is intended to measure
- Reliability: describes the differences between multiple measurements of the same attribute
Statistical point of view:
- Accuracy: mean squared error
- Validity: bias
- Reliability: correlation or differences between repeated measurements, agreement between two raters
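The statistical reading can be illustrated with a small simulation. This is a sketch under an assumed additive error model $W = X + b + U$. The model, the numbers and the helper name `accuracy_summary` are all hypothetical. The mean squared error splits exactly into squared bias plus error variance, and the correlation between two independent replicates serves as a reliability measure.

```python
import numpy as np

def accuracy_summary(true_values, measured):
    """Return (MSE, bias, error variance); MSE = bias**2 + variance holds exactly."""
    error = np.asarray(measured) - np.asarray(true_values)
    mse = np.mean(error ** 2)   # accuracy
    bias = np.mean(error)       # validity
    var = np.var(error)         # spread of the errors (ddof=0)
    return mse, bias, var

rng = np.random.default_rng(1)
x = rng.normal(170, 10, size=10_000)         # hypothetical true heights in cm
w1 = x + 2 + rng.normal(0, 3, size=x.size)   # assumed model: bias 2, error sd 3
w2 = x + 2 + rng.normal(0, 3, size=x.size)   # independent replicate measurement

mse, bias, var = accuracy_summary(x, w1)
reliability = np.corrcoef(w1, w2)[0, 1]      # correlation between replicates
print(f"MSE={mse:.2f} bias={bias:.2f} variance={var:.2f} reliability={reliability:.3f}")
```

Under this model the replicate correlation is about $100/(100+9) \approx 0.92$. It is unaffected by the constant bias, so reliability alone says nothing about validity.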
Types of measurement
- Representational measurement: measurements relate to existing attributes of the objects. Examples: length, weight, blood parameters
- Pragmatic measurement: an attribute is defined by its measuring procedure and has no 'real' existence beyond that. Examples: pain score, intelligence
Sources of measurement error
- Induced by an instrument (laboratory value, blood pressure)
- Induced by medical doctors or patients
- Induced by the definition, e.g. "long-term mean of daily fat intake"
- Surrogate variables, e.g. the mean exposure in the region where the study participant lives instead of the individual exposure
Misclassification: Examples
- Wrong diagnosis: not diseased instead of diseased
- Wrong answers in a questionnaire: "voted for the Greens", "no drugs", "do not smoke"
- Technical problems, e.g. classification of genes
- Classification by a machine learning tool (e.g. classification of images)
- Problem of definition, e.g. caries
- Randomized response
- Anonymisation of data
General remarks
Before we start thinking about measurement error and misclassification, we should answer the following questions:
- Is there a true value, and how is it defined?
- What is the aim of our study concerning the variable with measurement error (prediction or interpretation, outcome or predictor, ...)?
Notation
We have to distinguish between the true (correctly measured, gold standard) variable and its (possibly incorrect) measurement.
X, W, Z notation (Carroll et al.):
X: correct (unobservable) variable
W: possibly incorrect measurement of X
Z: further correctly measured variables
ξ, X notation (Schneeweiß, Fuller):
ξ: correct (unobservable) variable
X: possibly incorrect measurement of ξ
* notation (HK):
X, Z, Y: correct (unobservable) variables
X*, Z*, Y*: corresponding possibly incorrect measurements
One sample binary
Model for misclassification:
Y: true binary variable, gold standard
Y*: observed value of Y, surrogate
$$P(Y^* = 1 \mid Y = 1) = \pi_{11} \quad \text{(sensitivity)}$$
$$P(Y^* = 0 \mid Y = 0) = \pi_{00} \quad \text{(specificity)}$$
$$P(Y^* = 0 \mid Y = 1) = 1 - \pi_{11} = \pi_{01}$$
$$P(Y^* = 1 \mid Y = 0) = 1 - \pi_{00} = \pi_{10}$$
→ (mis)classification matrix (diffusion matrix)
$$\Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}$$
Effect of misclassification
Naive analysis: simply ignore the misclassification.
We want to estimate $P(Y = 1)$, but we use $\frac{1}{n}\sum_{i=1}^{n} Y_i^*$, which estimates
$$P(Y^* = 1) = \pi_{11} P(Y = 1) + \pi_{10} P(Y = 0)$$
so the bias is
$$P(Y^* = 1) - P(Y = 1) = \pi_{10} P(Y = 0) - \pi_{01} P(Y = 1)$$
→ Examples:
- No bias if $P(Y = 1) = 1/2$ and $\pi_{00} = \pi_{11}$
- Negative bias if $P(Y = 1) = 0.9$ and $\pi_{00} = \pi_{11} = 0.9$: Bias $= -0.1 \cdot 0.9 + 0.1 \cdot 0.1 = -0.08$
- Positive bias if $P(Y = 1) = 0.8$, $\pi_{11} = 0.99$ and $\pi_{00} = 0.9$: Bias $= -0.01 \cdot 0.8 + 0.1 \cdot 0.2 = 0.012$
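The bias formula is easy to check numerically. The sketch below uses hypothetical function names. It evaluates $\pi_{10}P(Y=0) - \pi_{01}P(Y=1)$ for the three examples. It then compares the last one with a simulation in which each true $Y$ is misclassified with sensitivity $\pi_{11}$ and specificity $\pi_{00}$.

```python
import numpy as np

def naive_bias(p, sens, spec):
    """Bias of the naive prevalence estimate: P(Y* = 1) - P(Y = 1)."""
    pi10 = 1 - spec   # P(Y* = 1 | Y = 0)
    pi01 = 1 - sens   # P(Y* = 0 | Y = 1)
    return pi10 * (1 - p) - pi01 * p

def simulate_naive(p, sens, spec, n, rng):
    """Draw true Y, misclassify it, and return the naive mean of Y*."""
    y = rng.random(n) < p
    u = rng.random(n)
    # Y* = 1 with prob. sens if Y = 1, and with prob. 1 - spec if Y = 0
    y_star = np.where(y, u < sens, u >= spec)
    return y_star.mean()

print(naive_bias(0.5, 0.8, 0.8))     # no bias
print(naive_bias(0.9, 0.9, 0.9))     # about -0.08
print(naive_bias(0.8, 0.99, 0.9))    # about 0.012
rng = np.random.default_rng(2019)
print(simulate_naive(0.8, 0.99, 0.9, 200_000, rng) - 0.8)   # close to 0.012
```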
Effect of Misclassification
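Everything can happen, depending on $\pi_{11}$, $\pi_{00}$ and $P(Y = 1)$.
However, if $\pi_{00} = \pi_{11}$ (unrealistic in most cases), then
$$\text{Bias} = (1 - \pi_{00})(1 - 2P(Y = 1))$$
so for $\pi_{00} < 1$:
$$P(Y = 1) > 0.5 \Rightarrow \text{bias} < 0, \qquad P(Y = 1) < 0.5 \Rightarrow \text{bias} > 0$$
Attenuation towards 0.5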
Correction
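Idea: solve the bias equation. Note that $Y^*$ is still a binary variable, so $P(Y^* = 1)$ can be consistently estimated from the observed data.
$$P(Y^* = 1) = \pi_{11} P(Y = 1) + \pi_{10}(1 - P(Y = 1))$$
$$\Rightarrow P(Y = 1) = \frac{P(Y^* = 1) - \pi_{10}}{\pi_{11} + \pi_{00} - 1}$$
Assumptions:
- $\pi_{11}$ and $\pi_{00}$ known
- $\pi_{11} + \pi_{00} > 1$
Variance factor: the variance of the naive estimate is multiplied by $(\pi_{11} + \pi_{00} - 1)^{-2}$.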
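The sketch below implements this correction, often called the Rogan–Gladen estimator. It assumes known $\pi_{11}$ and $\pi_{00}$ and a naive proportion $\hat P(Y^* = 1)$ from $n$ independent observations. The function name and the numbers are hypothetical. The standard error is the binomial one divided by $\pi_{11} + \pi_{00} - 1$, which is the variance factor above.

```python
import math

def corrected_prevalence(p_star_hat, sens, spec, n):
    """Solve P(Y*=1) = sens*p + (1-spec)*(1-p) for p; return (estimate, standard error)."""
    denom = sens + spec - 1            # pi11 + pi00 - 1
    if denom <= 0:
        raise ValueError("need sens + spec > 1")
    p_hat = (p_star_hat - (1 - spec)) / denom
    # binomial standard error of the naive proportion, scaled by 1/denom
    se = math.sqrt(p_star_hat * (1 - p_star_hat) / n) / denom
    return p_hat, se

p_hat, se = corrected_prevalence(0.41, sens=0.9, spec=0.8, n=500)
print(round(p_hat, 3), round(se, 4))
```

The estimate is deliberately not truncated. It falls below 0 when $\hat P(Y^* = 1) < \pi_{10}$ and above 1 when $\hat P(Y^* = 1) > \pi_{11}$.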
Multinomial case
Y is multinomial with categories $1, \dots, k$; $Y^*$ is observed. The error model is given by the classification matrix
$$\Pi = \{\pi_{ij}\}, \qquad \pi_{ij} = P(Y^* = i \mid Y = j).$$
Then we get for the probability vectors
$$P_Y = (p_{y1}, \dots, p_{yk})', \qquad P_{Y^*} = (p^*_{y1}, \dots, p^*_{yk})'$$
$$P_{Y^*} = \Pi \, P_Y$$
The matrix method
The correction method is given by
$$\hat{P}_Y := \Pi^{-1} \hat{P}_{Y^*} \tag{1}$$
Properties:
- The classification matrix has to be known or estimated
- Can give estimated probabilities > 1 or < 0
- Variance calculation is straightforward
- Use the delta method in the case of an estimated $\Pi$, Greenland (1988)
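With numpy the matrix method is a single linear solve. The sketch uses a hypothetical 3-category classification matrix whose columns are $P(Y^* = \cdot \mid Y = j)$. It first recovers $P_Y$ from an exact $P_{Y^*}$. It then shows how an observed vector, e.g. from a small sample, can produce estimates below 0 and above 1.

```python
import numpy as np

# Hypothetical classification matrix: column j is P(Y* = i | Y = j), so columns sum to 1
Pi = np.array([[0.8, 0.1, 0.0],
               [0.2, 0.8, 0.1],
               [0.0, 0.1, 0.9]])
p_true = np.array([0.5, 0.3, 0.2])

p_star = Pi @ p_true                          # forward model: P_Y* = Pi P_Y
p_corrected = np.linalg.solve(Pi, p_star)     # matrix method (1), without forming Pi^-1
print(p_star, p_corrected)

# Observed proportions that no valid P_Y can produce under this Pi
p_star_noisy = np.array([0.02, 0.08, 0.90])
p_from_noisy = np.linalg.solve(Pi, p_star_noisy)
print(p_from_noisy)                           # one entry < 0, one entry > 1
```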
Prevalence estimation from the Signal-Tandmobiel study
- Oral health study involving 4468 children in Flanders
- Y = 1 if the tooth is decayed, missing due to caries, or filled
- 16 examiners, with high misclassification (MC) of Y
- Validation study also used for two regions
- Validation data from 3 validation studies
- Simple correction in two regions: East and West
[Figure: prevalence estimates by year, 1996 to 2001 (vertical axis 0.1 to 0.6)]
[Figure: prevalence estimates by year, 1996 to 2001 (vertical axis 0.1 to 0.6), second panel]
Results
Estimated prevalence using data from the validation study:
- Corrections
- Huge confidence limits
- MC matrix possibly overestimated
Information about misclassification
There are three basic strategies:
- Assumption, external validation data
- Internal validation data
- Replication data
Assumption, external validation data
Examples
- Certain types of diagnosis
- Technical applications
- Results from other studies (be very careful!)
- Interpretation as sensitivity analysis
Note that ignoring misclassification implicitly assumes $\pi_{ij} = 0$ for all $i \neq j$!
Internal validation data
Examples:
- Caries study: examiners were compared to a gold standard
- Checking part of a questionnaire by a doctor
- Ex post check of a diagnosis
Calibration Model
X: true binary variable, gold standard examiner
X*: observed value of X, surrogate
$$P(X = 1 \mid X^* = 1) \quad \text{(positive predictive value)}$$
$$P(X = 0 \mid X^* = 0) \quad \text{(negative predictive value)}$$
can be calculated from the MC matrix and the marginal distribution of X (i.e. from $P(X = 1)$) via Bayes' theorem.
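The calculation is a direct application of Bayes' theorem. The function name and the numbers below are hypothetical, with sensitivity $\pi_{11}$ and specificity $\pi_{00}$.

```python
def predictive_values(sens, spec, prev):
    """PPV and NPV from the MC matrix (sens = pi11, spec = pi00) and P(X = 1)."""
    p_star1 = sens * prev + (1 - spec) * (1 - prev)   # P(X* = 1)
    ppv = sens * prev / p_star1                        # P(X = 1 | X* = 1)
    npv = spec * (1 - prev) / (1 - p_star1)            # P(X = 0 | X* = 0)
    return ppv, npv

print(predictive_values(0.9, 0.8, 0.30))   # hypothetical validation-study values
print(predictive_values(0.9, 0.8, 0.05))   # same MC matrix, lower prevalence
```

The second call shows that the calibration probabilities change with $P(X = 1)$ even when the MC matrix is fixed. They therefore cannot be transferred to a population with a different prevalence.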
Replication
If no gold standard is available, measurements are replicated.
- Requirement: the measurements have to be conditionally independent given the true value
- Identifiability conditions for the multinomial case
Two independent measurements
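We observe $X^*_{i1}, X^*_{i2}$, i.e. a 2 × 2 table:
            X*1 = 0    X*1 = 1
X*2 = 0     n00        n10
X*2 = 1     n01        n11
Assuming conditional independence and the same MC matrix for both measurements, we get:
$$P(X^*_1 = 0, X^*_2 = 0) = P(X = 0)\,\pi_{00}^2 + P(X = 1)\,\pi_{01}^2$$
$$P(X^*_1 = 1, X^*_2 = 1) = P(X = 0)\,\pi_{10}^2 + P(X = 1)\,\pi_{11}^2$$
$$P(X^*_1 = 0, X^*_2 = 1) = P(X = 0)\,\pi_{00}\pi_{10} + P(X = 1)\,\pi_{01}\pi_{11}$$
$$P(X^*_1 = 1, X^*_2 = 0) = P(X^*_1 = 0, X^*_2 = 1)$$
The four cell probabilities sum to 1 and the two off-diagonal cells are equal. This leaves only two independent equations, but there are three unknown parameters ($P(X = 1)$, $\pi_{00}$, $\pi_{11}$)!
⇒ We cannot estimate the MC matrix and $P(X = 1)$!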
Identified problems
Literature about diagnostic tests:
- Three independent measurements: three independent equations, three unknowns. An explicit solution is available.
- Further assumptions, e.g. for errors in haplotype reconstruction: the same MC matrix for each gene
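The counting argument can also be checked numerically. The sketch below computes the rank of the Jacobian of the observable cell probabilities with respect to $(P(X = 1), \pi_{00}, \pi_{11})$, using central differences. The parameter values and function names are hypothetical. A rank below the number of parameters means the parameters are not even locally identified. Global identification with three replicates additionally needs a constraint such as $\pi_{11} + \pi_{00} > 1$ to rule out swapping the labels.

```python
import numpy as np
from itertools import product

def cell_probs(theta, k):
    """P(X*_1, ..., X*_k) for k conditionally independent replicates with a common MC matrix.
    theta = (P(X = 1), pi00, pi11)."""
    p, pi00, pi11 = theta
    probs = []
    for pattern in product([0, 1], repeat=k):
        ones = sum(pattern)
        zeros = k - ones
        given_x0 = (1 - pi00) ** ones * pi00 ** zeros   # P(pattern | X = 0)
        given_x1 = pi11 ** ones * (1 - pi11) ** zeros   # P(pattern | X = 1)
        probs.append((1 - p) * given_x0 + p * given_x1)
    return np.array(probs)

def jacobian_rank(theta, k, h=1e-6, tol=1e-6):
    """Numerical rank of d(cell probabilities)/d(theta), via central differences."""
    theta = np.asarray(theta, dtype=float)
    cols = []
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        cols.append((cell_probs(theta + step, k) - cell_probs(theta - step, k)) / (2 * h))
    return np.linalg.matrix_rank(np.column_stack(cols), tol=tol)

theta = (0.3, 0.9, 0.8)
print(jacobian_rank(theta, k=2), jacobian_rank(theta, k=3))   # 2 (not identified) vs. 3
```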
Kappa Statistics
Basic idea: evaluate agreement and adjust for agreement by chance.
Measuring agreement:
$$\frac{n_{00} + n_{11}}{n}$$
            X*1 = 0    X*1 = 1
X*2 = 0     10         2
X*2 = 1     2          0

            X*1 = 0    X*1 = 1
X*2 = 0     5          2
X*2 = 1     2          5
Same proportion of agreement, but a different situation!
Definition of Kappa
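$$P_o = \frac{n_{00} + n_{11}}{n}$$
$$P_e = \frac{n_{0\cdot}\, n_{\cdot 0}}{n^2} + \frac{n_{1\cdot}\, n_{\cdot 1}}{n^2}$$
$$\kappa = \frac{P_o - P_e}{1 - P_e}$$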
For the two tables above (both with $P_o = 10/14$): $\kappa = -1/6 < 0$ for the first table and $\kappa = 0.428$ for the second.
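The definition translates directly into a few lines of numpy. The function name below is hypothetical. Applied to the two tables, it reproduces the values stated above.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa for a square agreement table of counts (rows: rater 2, columns: rater 1)."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                                  # observed agreement
    p_e = (t.sum(axis=0) * t.sum(axis=1)).sum() / n ** 2   # chance agreement from the margins
    return (p_o - p_e) / (1 - p_e)

print(cohen_kappa([[10, 2], [2, 0]]))   # negative
print(cohen_kappa([[5, 2], [2, 5]]))    # about 0.428
```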
Kappa and MC-Matrix
Kappa depends on the MC matrix and on the marginal distribution $P(X = 1)$.
Left: kappa as a function of $P(X = 1)$ for the fixed MC matrix $\pi_{00} = 0.9$, $\pi_{11} = 0.7$.
Right: sensitivity and specificity combinations that result in $\kappa = 0.5$ for $P(X = 1) = 0.2$.
[Figure: left panel, Kappa against prx = P(X = 1); right panel, curve Kappa = 0.5 in the plane of sens and spec]
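The population value of kappa behind such plots can be computed directly. The sketch assumes two conditionally independent ratings with the same MC matrix, as in the replication model, and uses a hypothetical function name. It computes $P_o$ and $P_e$ from $P(X = 1)$, $\pi_{00}$ and $\pi_{11}$. Algebraically this equals $P(X=1)P(X=0)(\pi_{00}+\pi_{11}-1)^2 / \big(P(X^*=1)P(X^*=0)\big)$, which makes the dependence on the prevalence explicit.

```python
def kappa_from_mc(prev, pi00, pi11):
    """Population kappa for two conditionally independent ratings sharing one MC matrix."""
    m = prev * pi11 + (1 - prev) * (1 - pi00)             # P(X* = 1) for each rating
    p_o = ((1 - prev) * (pi00 ** 2 + (1 - pi00) ** 2)
           + prev * (pi11 ** 2 + (1 - pi11) ** 2))        # P(X*_1 = X*_2)
    p_e = m ** 2 + (1 - m) ** 2                           # chance agreement from the margins
    return (p_o - p_e) / (1 - p_e)

for prev in (0.05, 0.2, 0.5, 0.8):
    print(prev, round(kappa_from_mc(prev, pi00=0.9, pi11=0.7), 3))
```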
Bivariate analysis
Binary exposure: X
Disease status: Y
Measurement of disease: Y*
Model for misclassification:
$$\pi_{110} = P(Y^* = 1 \mid Y = 1, X = 0)$$
$$\pi_{111} = P(Y^* = 1 \mid Y = 1, X = 1)$$
$$\pi_{100} = P(Y^* = 1 \mid Y = 0, X = 0)$$
$$\pi_{101} = P(Y^* = 1 \mid Y = 0, X = 1)$$
Non-differential misclassification if
$$\pi_{110} = \pi_{111} \quad \text{and} \quad \pi_{100} = \pi_{101},$$
i.e. misclassification is independent of exposure.
Effect and correction
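Use the results of the one-sample case:
$$P(Y^* = 1 \mid X = 1) = \pi_{111} P(Y = 1 \mid X = 1) + \pi_{101} P(Y = 0 \mid X = 1)$$
$$P(Y^* = 1 \mid X = 0) = \pi_{110} P(Y = 1 \mid X = 0) + \pi_{100} P(Y = 0 \mid X = 0)$$
If the misclassification is non-differential, write $\pi_{11} = \pi_{110} = \pi_{111}$ and $\pi_{00} = 1 - \pi_{100} = 1 - \pi_{101}$; then
$$P(Y^* = 1 \mid X = 1) - P(Y^* = 1 \mid X = 0) = \left[P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)\right](\pi_{11} + \pi_{00} - 1)$$
- Attenuation towards 0
- Also for the OR
- Correction by the matrix method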
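The following sketch assumes non-differential outcome misclassification and uses hypothetical risks and error rates. It shows the risk difference shrinking by exactly $\pi_{11} + \pi_{00} - 1$ and the odds ratio moving towards 1. It then applies the one-sample matrix correction separately within each exposure group.

```python
def observed_risk(p_true, sens, spec):
    """P(Y* = 1 | X) under non-differential outcome misclassification."""
    return sens * p_true + (1 - spec) * (1 - p_true)

def corrected_risk(p_obs, sens, spec):
    """Matrix-method correction within one exposure group."""
    return (p_obs - (1 - spec)) / (sens + spec - 1)

def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

p1, p0, sens, spec = 0.30, 0.10, 0.85, 0.95    # hypothetical values
q1, q0 = observed_risk(p1, sens, spec), observed_risk(p0, sens, spec)
print("risk difference", p1 - p0, "->", q1 - q0)   # shrunk by sens + spec - 1 = 0.8
print("odds ratio", odds_ratio(p1, p0), "->", odds_ratio(q1, q0))
print("corrected", corrected_risk(q1, sens, spec), corrected_risk(q0, sens, spec))
```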
Misclassification in exposure
We observe X* instead of X.
Model for misclassification:
$$\pi_{110} = P(X^* = 1 \mid X = 1, Y = 0)$$
$$\pi_{111} = P(X^* = 1 \mid X = 1, Y = 1)$$
$$\pi_{100} = P(X^* = 1 \mid X = 0, Y = 0)$$
$$\pi_{101} = P(X^* = 1 \mid X = 0, Y = 1)$$
Non-differential misclassification if $\pi_{110} = \pi_{111}$ and $\pi_{100} = \pi_{101}$, i.e. misclassification is independent of disease. This is fulfilled in most cohort studies, but could be violated in case-control studies.
Example for non differential misclassification error
                Correct            20% of No          20% of No say Yes,
                classification     say Yes            20% of Yes say No
High fat        No      Yes        No      Yes        No      Yes
Cases           450     250        360     340        410     290
Controls        900     100        720     280        740     260
Odds ratio      5.0                2.4                2.0
Attenuation towards OR = 1. Note: everything can happen in the case of differential misclassification.
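The table can be reproduced from the correctly classified counts. The sketch below (hypothetical function names) applies the same misclassification rates to cases and controls, i.e. non-differential misclassification, and works with expected counts.

```python
import numpy as np

def misclassify_exposure(counts, p_no_to_yes, p_yes_to_no):
    """Expected (No, Yes) counts after non-differential exposure misclassification."""
    no, yes = counts
    return np.array([no * (1 - p_no_to_yes) + yes * p_yes_to_no,
                     yes * (1 - p_yes_to_no) + no * p_no_to_yes])

def odds_ratio(cases, controls):
    return (cases[1] * controls[0]) / (cases[0] * controls[1])

cases, controls = np.array([450, 250]), np.array([900, 100])   # correct classification
results = []
for a, b in [(0.0, 0.0), (0.2, 0.0), (0.2, 0.2)]:
    ca, co = misclassify_exposure(cases, a, b), misclassify_exposure(controls, a, b)
    results.append(odds_ratio(ca, co))
    print(ca, co, round(results[-1], 1))
```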
Likelihood
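We assume non-differential misclassification.
$$P(Y = 1, X^* = x^*) = \sum_x P(Y = 1, X^* = x^*, X = x)$$
$$= \sum_x P(Y = 1 \mid X^* = x^*, X = x)\, P(X^* = x^*, X = x)$$
$$= \sum_x P(Y = 1 \mid X = x)\, P(X^* = x^* \mid X = x)\, P(X = x)$$
The last step uses non-differentiality: given the true X, the measurement X* carries no further information about Y.
We have three components of the likelihood:
- Main model: $P(Y = 1 \mid X = x)$
- Measurement model: $P(X^* = x^* \mid X = x)$
- Exposure model: $P(X = x)$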
Observed probabilities
$$P(Y = 1 \mid X^* = x^*) = \frac{P(Y = 1, X^* = x^*)}{P(X^* = x^*)}$$
$$\frac{P(Y = 1 \mid X^* = 1) - P(Y = 1 \mid X^* = 0)}{P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)} = \frac{(\pi_{11} + \pi_{00} - 1)\,P(X = 1)\,P(X = 0)}{P(X^* = 1)\,P(X^* = 0)} < 1$$
See Gustafson (2004), p. 35 ff.: bias towards 0 if $\pi_{11} + \pi_{00} - 1 > 0$.
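The attenuation factor can be checked by building $P(Y = 1 \mid X^*)$ from the three likelihood components. The sketch assumes non-differential exposure misclassification, and the values and function names are hypothetical. It compares the observed-to-true ratio with the closed form. The same factor also equals $P(X = 1 \mid X^* = 1) - P(X = 1 \mid X^* = 0)$, which is why it cannot exceed 1.

```python
def attenuation_factor(prev, pi00, pi11):
    """(pi11 + pi00 - 1) P(X=1) P(X=0) / (P(X*=1) P(X*=0))."""
    m = pi11 * prev + (1 - pi00) * (1 - prev)      # P(X* = 1)
    return (pi11 + pi00 - 1) * prev * (1 - prev) / (m * (1 - m))

def observed_risk_difference(r1, r0, prev, pi00, pi11):
    """P(Y=1|X*=1) - P(Y=1|X*=0) from main model (r1, r0), measurement and exposure model."""
    m = pi11 * prev + (1 - pi00) * (1 - prev)      # P(X* = 1)
    x1_given_star1 = pi11 * prev / m               # P(X = 1 | X* = 1)
    x1_given_star0 = (1 - pi11) * prev / (1 - m)   # P(X = 1 | X* = 0)
    risk_star1 = x1_given_star1 * r1 + (1 - x1_given_star1) * r0
    risk_star0 = x1_given_star0 * r1 + (1 - x1_given_star0) * r0
    return risk_star1 - risk_star0

r1, r0, prev, pi00, pi11 = 0.4, 0.1, 0.3, 0.9, 0.8   # hypothetical values
obs = observed_risk_difference(r1, r0, prev, pi00, pi11)
print(obs / (r1 - r0), attenuation_factor(prev, pi00, pi11))   # the two agree
```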
Misclassification in a confounder
X*: misclassified confounder
Z: exposure
Y: response
Even in the case of non-differential measurement error with respect to Y and Z:
- Bias in both directions possible
- Residual confounding
- See e.g. Savitz and Baron (1989)
Correction methods
- Matrix method: the two-by-two table can be seen as one multinomial variable
- Variance estimation: see Greenland (1988)
- MLE for unrestricted sampling
- Alternatives by Tenenbein (1972)
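Treating the 2 × 2 table as one 4-category variable, the classification matrix for exposure misclassification alone is block diagonal. It equals the Kronecker product of an identity, for the correctly measured disease status, with the exposure MC matrix. The sketch below is a hypothetical illustration. It applies this to the expected counts in the last column of the fat-intake example ("20% of No say Yes, 20% of Yes say No") and recovers the correctly classified table. With real, noisy counts, negative cell estimates are possible.

```python
import numpy as np

# Exposure MC matrix: columns = true exposure (No, Yes), rows = observed exposure
Pi_x = np.array([[0.8, 0.2],
                 [0.2, 0.8]])
# Cells ordered (case & No, case & Yes, control & No, control & Yes);
# disease status is measured without error, hence the identity block structure
Pi = np.kron(np.eye(2), Pi_x)

observed = np.array([410, 290, 740, 260], dtype=float)
corrected = np.linalg.solve(Pi, observed)
cases, controls = corrected[:2], corrected[2:]
odds_ratio = (cases[1] * controls[0]) / (cases[0] * controls[1])
print(corrected.round(1), round(odds_ratio, 2))
```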
Misclassification in regression
General Regression Model
$$E(Y \mid X_1, \dots, X_k) = h(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)$$
h: response function (inverse of the link function)
Misclassification possibly on
- binary covariates: observe X* instead of X
- binary response: observe Y* instead of Y
Handling misclassification in Y in binary regression
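- Hausman et al. (Journal of Econometrics, 1998)
- Neuhaus (Biometrika, 1999)
We observe Y* instead of Y, with misclassification matrix $\Pi$. With $G$ the response function of the binary regression model (e.g. the logistic function):
$$P(Y^* = 1 \mid X) = \pi_{11} G(x'\beta) + (1 - \pi_{00})(1 - G(x'\beta)) = H(x'\beta)$$
$$H(t) = \pi_{11} G(t) + (1 - \pi_{00})(1 - G(t))$$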
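A sketch of the observed regression function for a logistic $G$, assuming known $\pi_{00}$ and $\pi_{11}$. The names are hypothetical. $H$ is squeezed between $1 - \pi_{00}$ and $\pi_{11}$. The corrected log-likelihood simply uses $H$ in place of $G$, and maximizing it with a numerical optimizer, not shown here, gives estimates of $\beta$ that account for the misclassification.

```python
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def observed_response(t, pi00, pi11, G=logistic):
    """H(t) = pi11 G(t) + (1 - pi00)(1 - G(t)) = P(Y* = 1 | x) with t = x'beta."""
    g = G(t)
    return pi11 * g + (1 - pi00) * (1 - g)

def loglik(beta0, beta1, data, pi00, pi11):
    """Log-likelihood of (x, y_star) pairs with H in place of G (one covariate)."""
    total = 0.0
    for x, y_star in data:
        h = observed_response(beta0 + beta1 * x, pi00, pi11)
        total += math.log(h) if y_star == 1 else math.log(1 - h)
    return total

for t in (-20, 0, 20):
    print(t, round(observed_response(t, pi00=0.9, pi11=0.8), 3))   # bounded by 0.1 and 0.8
```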
Observed regression function
[Figure: Logistic regression with misclassified Y; X from −4 to 4 on the horizontal axis, probability 0 to 1 on the vertical axis]
Misclassification in regressors
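One binary regressor, normal outcome:
$$Y = \beta_0 + \beta_1 I(X = 1) + \varepsilon, \qquad \beta_0 = \mu_0, \quad \beta_1 = \mu_1 - \mu_0$$
Naive analysis (assuming non-differential misclassification):
$$E(Y \mid X^* = 0) = P(X = 0 \mid X^* = 0)\,\mu_0 + P(X = 1 \mid X^* = 0)\,\mu_1$$
$$E(Y \mid X^* = 1) = P(X = 0 \mid X^* = 1)\,\mu_0 + P(X = 1 \mid X^* = 1)\,\mu_1$$
These equations can be solved for $\mu_0$ and $\mu_1$ when the MC matrix and $P(X = 0)$ are known (matrix method).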
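The two naive equations form a 2 × 2 linear system. Its coefficients are the calibration probabilities $P(X = x \mid X^*)$, obtained from the MC matrix and $P(X = 1)$ by Bayes' theorem. The sketch below uses hypothetical names and values. It works with expected group means; in practice the observed group means would be plugged in. It computes the naive means and solves the system back for $\mu_0, \mu_1$.

```python
import numpy as np

def naive_and_corrected(mu0, mu1, prev, pi00, pi11):
    """Naive group means E(Y | X*) and their matrix-method correction."""
    m = pi11 * prev + (1 - pi00) * (1 - prev)          # P(X* = 1)
    # rows: X* = 0, X* = 1; columns: P(X = 0 | X*), P(X = 1 | X*)
    A = np.array([[pi00 * (1 - prev) / (1 - m), (1 - pi11) * prev / (1 - m)],
                  [(1 - pi00) * (1 - prev) / m, pi11 * prev / m]])
    naive = A @ np.array([mu0, mu1])                   # E(Y | X* = 0), E(Y | X* = 1)
    corrected = np.linalg.solve(A, naive)              # recovers (mu0, mu1)
    return naive, corrected

naive, corrected = naive_and_corrected(mu0=1.0, mu1=3.0, prev=0.3, pi00=0.8, pi11=0.9)
print("naive slope", naive[1] - naive[0], "corrected slope", corrected[1] - corrected[0])
```

The naive slope equals $(\mu_1 - \mu_0)\,[P(X = 1 \mid X^* = 1) - P(X = 1 \mid X^* = 0)]$, about 1.22 instead of 2 here.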
Likelihood
$$L(Y, X^*) = \sum_x L(Y, X^*, X = x)$$
$$= \sum_x L(Y \mid X^* = x^*, X = x)\, P(X^* = x^*, X = x)$$
$$= \sum_x L(Y \mid X = x)\, P(X^* = x^* \mid X = x)\, P(X = x)$$
For many regression models this likelihood is numerically easy to handle. The misclassification model and its components can be added as further factors.
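The sketch below writes out this observed-data likelihood for the normal-outcome example. It assumes non-differential misclassification, a normal main model, a known MC matrix and a known $P(X = 1)$. The data and function names are hypothetical. For each observation the unknown true $X$ is summed out.

```python
import math

def normal_pdf(y, mean, sd):
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def loglik_misclassified(data, mu, sd, prev, pi00, pi11):
    """Observed-data log-likelihood for (y, x_star) pairs, summing out the true binary X.
    Main model: Y | X = x ~ N(mu[x], sd^2); measurement model: MC matrix; exposure model: P(X = 1)."""
    p_x = (1 - prev, prev)
    # p_xstar_given_x[x][x_star] = P(X* = x_star | X = x)
    p_xstar_given_x = ((pi00, 1 - pi00), (1 - pi11, pi11))
    total = 0.0
    for y, x_star in data:
        lik = sum(normal_pdf(y, mu[x], sd) * p_xstar_given_x[x][x_star] * p_x[x]
                  for x in (0, 1))
        total += math.log(lik)
    return total

data = [(1.2, 0), (2.9, 1), (0.7, 0), (3.4, 1), (2.1, 0)]   # hypothetical observations
print(loglik_misclassified(data, mu=(1.0, 3.0), sd=1.0, prev=0.4, pi00=0.9, pi11=0.85))
```

In an actual analysis this function would be maximized over the main-model parameters. Without misclassification ($\pi_{00} = \pi_{11} = 1$) it reduces to the ordinary likelihood with $X = X^*$.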
Effects of misclassification
- Biased and inconsistent parameter estimates
- In most cases attenuation towards 0
- In complex settings, bias in any direction is possible
- The effect depends on the misclassification matrix
- Similar to the effect of measurement error in continuous variables in regression
Hypothesis testing
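Attenuation →
- Naive tests (e.g. for no true effect in a 2 × 2 table) still have the correct significance level, since under non-differential misclassification no true effect implies no effect in the observed data
- Power reduction
- Sample size calculation has to be corrected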
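The growth in sample size can be sketched with the usual normal-approximation formula for comparing two proportions. The formula is applied once to the true risks and once to the risks of the misclassified outcome. Non-differential outcome misclassification is assumed, and the planning values and function names are hypothetical. This is a rough planning sketch, not a full power analysis.

```python
from statistics import NormalDist

def n_per_group(p1, p0, alpha=0.05, power=0.8):
    """Normal-approximation sample size per group for comparing two proportions."""
    z = NormalDist().inv_cdf
    z_sum = z(1 - alpha / 2) + z(power)
    return z_sum ** 2 * (p1 * (1 - p1) + p0 * (1 - p0)) / (p1 - p0) ** 2

def observed(p, sens, spec):
    """P(Y* = 1) under non-differential outcome misclassification."""
    return sens * p + (1 - spec) * (1 - p)

p1, p0, sens, spec = 0.30, 0.15, 0.85, 0.90    # hypothetical planning values
n_true = n_per_group(p1, p0)                                        # error-free outcome
n_obs = n_per_group(observed(p1, sens, spec), observed(p0, sens, spec))  # misclassified
print(round(n_true), round(n_obs))
```

With these values the required size per group roughly doubles, from about 118 to about 240.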
Outlook
- Use of validation data
- Latent class analysis