
Measurement Error and Misclassification in statistical models: Basics and applications

bcam Bilbao

Helmut Küchenhoff
Statistical Consulting Unit

Ludwig-Maximilians-Universität München

Bilbao, 27-05-2019

Measurement Error Part 1 Bilbao 27-05-2019 HK 1 / 50


Schedule

Monday: 1. Introduction and Misclassification: Basic Models
Tuesday: 2. Measurement error: Effect and Models
Wednesday: 3. Methods for Estimation in the presence of Measurement error
Thursday: 4. Simulation and Extrapolation (SIMEX) for Misclassification and measurement error: Concept and Examples
Friday: 5. Case studies: Uncertainty of diagnosis, Exposure assessment


Material

I Carroll, R. J., D. Ruppert, L. Stefanski and C. Crainiceanu: Measurement Error in Nonlinear Models. A Modern Perspective. Chapman & Hall, London 2006.

I Gustafson, P.: Measurement Error and Misclassification in Statistics and Epidemiology. Impacts and Bayesian Adjustments. Chapman & Hall/CRC Press, Boca Raton 2004.


Outline

I Measurement

I Misclassification

I Model

I Effect

I Correction


Measurement

[Image: Museum of Modern Art in Barcelona]


Measurement

Measurement is the contact of reason with nature (Henry Margenau)
Nearly all the grandest discoveries of science have been but the rewards of accurate measurement (Lord Kelvin)
Measurement is the basis for producing data
Literature: David Hand: Measurement. Theory and practice. The world through quantification. (Arnold, 2004)


Basics

Peter −→ 1.84
Stefan −→ 1.91
Laura −→ 1.72

Measurement is an assignment of a number to a characteristic of an object. This measurement is to be compared with those of other objects.

Measurement: A structure preserving function (homomorphism)

Peter is smaller than Stefan ⇔ 1.84 < 1.91


Levels of scaling

This can be defined by the structure in the objects:
only relation equal / not equal: nominal scale
smaller: ordinal scale
differences: metric scale
differences and ratios: ratio scale


Accuracy, Validity and Reliability

I Accuracy: General term, describing how closely a measurement reproduces the attribute being measured

I Validity: How well the measurement captures the true attribute or how well it captures the concept which is targeted to be measured

I Reliability describes the differences between multiple measurements of an attribute

Statistical point of view:
Accuracy: Mean square error
Validity: Bias
Reliability: correlation or difference, agreement between two raters


Types of measurement

I Representational measurement
Measurements relate to existing attributes of the objects
Examples: Length, weight, blood parameter

I Pragmatic measurement
An attribute is defined by its measuring procedure, no 'real' existence beyond that
Examples: Pain score, intelligence


Sources of measurement error

I Induced by an instrument (laboratory value, blood pressure)

I Induced by medical doctors or patients

I Measurement error induced by definition, e.g. "long term mean of daily fat intake"

I Surrogate variables, e.g. mean of exposure in the region where the study participant lives instead of individual exposure


Misclassification: Examples

I Wrong diagnosis: not diseased instead of diseased

I Wrong answer in a questionnaire: Voted for the greens, No drugs, Do not smoke

I Technical problems, e.g. classification of genes

I Classification by machine learning tool (e.g. classification of images)

I Problem of definition, e.g. Caries

I Randomized response

I Anonymisation of data


General remarks

Before we start thinking about measurement error and misclassification, we should answer the following questions:

I Is there a true value and how is it defined?

I What is the aim of our study concerning the variable with measurement error? (prediction or interpretation, outcome or predictor ...)


Notation

We have to distinguish between the true (correctly measured, gold standard) variable and its (possibly incorrect) measurement

X, W, Z notation (Carroll et al.)
X: correct (unobservable) variable
W: possibly incorrect measurement of X
Z: further correctly measured variables

ξ-X notation (Schneeweiß, Fuller)
ξ: correct (unobservable) variable
X: possibly incorrect measurement of ξ

* notation (HK)
X, Z, Y: correct (unobservable) variables
X∗, Z∗, Y∗: corresponding possibly incorrect measurements


One sample binary

Model for misclassification
Y: true binary variable, gold standard
Y∗: observed value of Y, surrogate

P(Y ∗ = 1|Y = 1) = π11 (Sensitivity)

P(Y ∗ = 0|Y = 0) = π00 (Specificity)

P(Y ∗ = 0|Y = 1) = 1− π11 = π01

P(Y ∗ = 1|Y = 0) = 1− π00 = π10

→ (mis-) classification matrix (diffusion matrix)

$\Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}$


Effect of misclassification

Naive analysis: Simply ignore misclassification
We want to estimate P(Y = 1)
We use $\frac{1}{n}\sum_{i=1}^{n} Y_i^*$

P(Y ∗ = 1) = π11P(Y = 1) + π10P(Y = 0)

P(Y ∗ = 1)− P(Y = 1) = π10P(Y = 0)− π01P(Y = 1)

−→ Examples:

No bias if P(Y = 1) = 1/2 and π00 = π11

Neg. bias if P(Y = 1) = 0.9 and π00 = π11 = 0.9
→ Bias = −0.1 · 0.9 + 0.1 · 0.1 = −0.08

Pos. bias if P(Y = 1) = 0.8 and π11 = 0.99, π00 = 0.9
→ Bias = −0.01 · 0.8 + 0.1 · 0.2 = 0.012
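The bias equation can be checked numerically. The following is a minimal Python sketch; the function name `naive_bias` and the argument names `sens` (for π11) and `spec` (for π00) are chosen here for illustration and are not part of the slides.

```python
def naive_bias(p_true, sens, spec):
    """Bias of the naive estimate: P(Y* = 1) - P(Y = 1).

    p_true: P(Y = 1), sens: pi_11, spec: pi_00.
    """
    # P(Y* = 1) = pi_11 * P(Y = 1) + pi_10 * P(Y = 0), with pi_10 = 1 - pi_00
    p_observed = sens * p_true + (1.0 - spec) * (1.0 - p_true)
    return p_observed - p_true


# The three settings listed above: (P(Y = 1), pi_11, pi_00)
examples = [(0.5, 0.9, 0.9), (0.9, 0.9, 0.9), (0.8, 0.99, 0.9)]
for p, sens, spec in examples:
    bias = naive_bias(p, sens, spec)
    print(f"P(Y=1)={p}, pi11={sens}, pi00={spec}: bias={bias:+.3f}")
```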


Effect of Misclassification

Everything can happen, depending on π11, π00 and P(Y = 1).

However, if π00 = π11 (unrealistic in most cases) then π10 = π01 = 1 − π00 and

Bias = (1 − π00)(1 − 2P(Y = 1))

P(Y = 1) > 0.5 =⇒ bias < 0

P(Y = 1) < 0.5 =⇒ bias > 0

Attenuation towards 0.5


Correction

Idea: Solve the bias equation
Note that Y ∗ is still binomial and P(Y ∗ = 1) can be consistently estimated from the observed data.

P(Y ∗ = 1) = π11P(Y = 1) + π10(1− P(Y = 1))

⇒ P(Y = 1) = (P(Y ∗ = 1)− π10)/(π11 + π00 − 1)

Assumptions

I π11 and π00 known

I π11 + π00 > 1

Variance factor $(\pi_{11} + \pi_{00} - 1)^{-2}$
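A minimal Python sketch of this correction, assuming π11 and π00 are known. The function names and the sample size `n` are illustrative assumptions; the variance is the binomial variance of the naive estimate multiplied by the variance factor.

```python
def corrected_prevalence(p_observed, sens, spec):
    """Solve P(Y* = 1) = pi_11 P(Y = 1) + pi_10 (1 - P(Y = 1)) for P(Y = 1)."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("the correction requires pi_11 + pi_00 > 1")
    return (p_observed - (1.0 - spec)) / denom


def corrected_variance(p_observed, n, sens, spec):
    """Variance of the corrected estimate for known pi_11 and pi_00."""
    naive_variance = p_observed * (1.0 - p_observed) / n
    return naive_variance / (sens + spec - 1.0) ** 2


# An observed proportion of 0.82 with pi_11 = pi_00 = 0.9
print(corrected_prevalence(0.82, 0.9, 0.9))
print(corrected_variance(0.82, n=1000, sens=0.9, spec=0.9))
```

The corrected estimate can fall outside [0, 1] when the observed proportion is below π10 or above π11.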


Multinomial case

Y is multinomial with categories 1, . . . , k.
Y ∗ is observed
The error model is given by the classification matrix

Π = {πij}

with πij = P(Y ∗ = i |Y = j). Then we get for the probability vectors

$P_Y = (p_{y1}, \dots, p_{yk})'$

$P_{Y^*} = (p^*_{y1}, \dots, p^*_{yk})'$

$P_{Y^*} = \Pi \, P_Y$


The matrix method

The correction method is given by

$P_Y := \Pi^{-1} P_{Y^*}$ (1)

Properties

I Classification matrix has to be known or estimated

I Sometimes gives estimated probabilities > 1 or < 0

I Variance calculation straightforward

I Use the delta method in the case of estimated Π, Greenland (1988)
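A minimal sketch of the matrix method with NumPy, assuming a known classification matrix. The 3-category matrix `Pi` and the probability vectors are hypothetical values for illustration only; column j of `Pi` holds P(Y* = i | Y = j), so each column sums to 1.

```python
import numpy as np


def matrix_method(Pi, p_observed):
    """Return Pi^{-1} p_observed, computed by solving the linear system."""
    return np.linalg.solve(np.asarray(Pi, dtype=float),
                           np.asarray(p_observed, dtype=float))


# Hypothetical classification matrix (columns sum to 1)
Pi = np.array([
    [0.90, 0.10, 0.05],
    [0.05, 0.80, 0.05],
    [0.05, 0.10, 0.90],
])
p_true = np.array([0.5, 0.3, 0.2])
p_observed = Pi @ p_true                # distribution of the surrogate Y*
print(matrix_method(Pi, p_observed))    # recovers p_true

# With observed proportions the result can leave [0, 1]:
Pi2 = np.array([[0.9, 0.1], [0.1, 0.9]])
print(matrix_method(Pi2, [0.95, 0.05]))  # second entry is negative
```

Solving the linear system avoids forming the inverse explicitly; the result is the same as in equation (1).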


Prevalence estimation from the Signal-Tandmobiel study

I Oral health study involving 4468 children in Flanders

I Y=1 if the tooth is decayed, missing due to caries or filled

I 16 examiners with high misclassification (MC) of Y

I Validation study also used for two regions

I Validation data from 3 validation studies

I Simple correction in two regions: East and West


[Two figures: plots over the years 1996 to 2001, vertical axis ranging from 0.1 to 0.6]


Results

Estimated prevalence using data from the validation study

I Corrections

I Huge confidence limits

I MC Matrix possibly overestimated


Information about misclassification

There are three basic strategies:

I Assumption, external validation data

I Internal validation data

I Replication data


Assumption, external validation data

Examples

I Certain type of diagnosis

I Technical applications

I Results from other studies (be very careful!)

I Interpretation as sensitivity analysis

Note that ignoring misclassification assumes πij = 0 for i ≠ j!


Internal validation data

Examples

I Caries study: examiners were compared to a gold standard

I Controlling a part of a questionnaire by a doctor

I Ex post check of a diagnosis


Calibration Model

X: true binary variable, gold standard examiner
X∗: observed value of X, surrogate

P(X = 1|X ∗ = 1) (positive predictive value)

P(X = 0|X ∗ = 0) (negative predictive value)

can be calculated from the MC-Matrix and the marginal distribution of X (i.e. from P(X = 1))
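The calculation is an application of Bayes' theorem. A minimal Python sketch, with illustrative names (`prev` for P(X = 1), `sens` for π11, `spec` for π00) that are not taken from the slides:

```python
def predictive_values(prev, sens, spec):
    """Return (P(X = 1 | X* = 1), P(X = 0 | X* = 0)) via Bayes' theorem."""
    # Marginal probability of a positive surrogate, P(X* = 1)
    p_star = sens * prev + (1.0 - spec) * (1.0 - prev)
    ppv = sens * prev / p_star
    npv = spec * (1.0 - prev) / (1.0 - p_star)
    return ppv, npv


ppv, npv = predictive_values(prev=0.2, sens=0.7, spec=0.9)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Unlike sensitivity and specificity, both values change with P(X = 1), so a calibration model estimated in one population does not transfer directly to a population with a different prevalence.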


Replication

If no gold standard is available, measurements are replicated.

I Requirement: measurements have to be conditionally independent given the true value

I Identifiability conditions for multinomial case


Two independent measurements

We observe $X^*_{i1}, X^*_{i2}$, i.e. a 2 × 2 table:

             X*_1 = 0   X*_1 = 1
X*_2 = 0     n00        n10
X*_2 = 1     n01        n11

Assuming independence and constant MC we get:

$P(X^*_1 = 0, X^*_2 = 0) = P(X = 0)\,\pi_{00}^2 + P(X = 1)\,\pi_{01}^2$

$P(X^*_1 = 1, X^*_2 = 1) = P(X = 0)\,\pi_{10}^2 + P(X = 1)\,\pi_{11}^2$

$P(X^*_1 = 0, X^*_2 = 1) = P(X = 0)\,\pi_{00}\pi_{10} + P(X = 1)\,\pi_{01}\pi_{11}$

$P(X^*_1 = 1, X^*_2 = 0) = P(X^*_1 = 0, X^*_2 = 1)$

Two independent equations, but three unknown parameters!
⇒ We cannot estimate the MC-Matrix and P(X=1)!!


Identified problems

Literature about diagnostic tests
Three independent measurements: three independent equations, three unknowns. Explicit solution available
Further assumptions: error in haplotype reconstruction, same MC matrix for each gene


Kappa Statistics

Basic idea: Evaluate agreement and adjust for agreement by chance
Measuring agreement:

$\frac{n_{00} + n_{11}}{n}$

Table 1:
             X*_1 = 0   X*_1 = 1
X*_2 = 0     10         2
X*_2 = 1     2          0

Table 2:
             X*_1 = 0   X*_1 = 1
X*_2 = 0     5          2
X*_2 = 1     2          5

Same proportion of agreement, but different situation !!


Definition of Kappa

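$P_o = \frac{n_{00} + n_{11}}{n}$

$P_e = \frac{n_{0\cdot}\, n_{\cdot 0}}{n^2} + \frac{n_{1\cdot}\, n_{\cdot 1}}{n^2}$

$\kappa = (P_o - P_e)/(1 - P_e)$

Table 1 (κ < 0):
             X*_1 = 0   X*_1 = 1
X*_2 = 0     10         2
X*_2 = 1     2          0

Table 2 (κ = 0.428):
             X*_1 = 0   X*_1 = 1
X*_2 = 0     5          2
X*_2 = 1     2          5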
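The definition translates directly into a few lines of Python. This is a minimal sketch; the function name `kappa` and the table layout (rows: second measurement, columns: first measurement) are conventions chosen here for illustration.

```python
def kappa(table):
    """Kappa for a square agreement table given as a list of rows of counts."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of counts on the diagonal
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement from the products of the marginal totals
    row_totals = [sum(table[i]) for i in range(k)]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_o - p_e) / (1.0 - p_e)


table_1 = [[10, 2], [2, 0]]
table_2 = [[5, 2], [2, 5]]
print(round(kappa(table_1), 3), round(kappa(table_2), 3))
```

Both tables have the same observed agreement of 10/14, yet the chance agreement differs (148/196 versus 1/2), which is why the two kappa values differ in sign.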


Kappa and MC-Matrix

Kappa depends on the MC-Matrix and the marginal distribution P(X=1)
Fixed MC-Matrix π00 = 0.9, π11 = 0.7 (left)
Sensitivity and specificity which result in κ = 0.5 for P(X = 1) = 0.2 (right)

[Figure, left panel: Kappa plotted against prx; right panel: combinations of sens and spec with Kappa=0.5]


Bivariate analysis

Binary exposure: X
Disease status: Y
Measurement of disease: Y∗

Model for misclassification:

π110 = P(Y ∗ = 1|Y = 1,X = 0)

π111 = P(Y ∗ = 1|Y = 1,X = 1)

π100 = P(Y ∗ = 1|Y = 0,X = 0)

π101 = P(Y ∗ = 1|Y = 0,X = 1)

Non differential misclassification if

π110 = π111 and π100 = π101,

i.e. misclassification independent of exposure


Effect and correction

Use the results of the one-sample case:

P(Y ∗ = 1|X = 1) = π111P(Y = 1|X = 1) + π101P(Y = 0|X = 1)

P(Y ∗ = 1|X = 0) = π110P(Y = 1|X = 0) + π100P(Y = 0|X = 0)

If the misclassification error is non differential then:

P(Y ∗ = 1|X = 1)− P(Y ∗ = 1|X = 0) =

[P(Y = 1|X = 1)− P(Y = 1|X = 0)] (π11 + π00 − 1)

I Attenuation to 0

I Also for OR

I Correction by matrix method
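The attenuation of the risk difference can be illustrated numerically. The sketch below assumes non-differential misclassification of Y with known π11 and π00; the risks 0.3 and 0.1 and the values π11 = 0.8, π00 = 0.95 are hypothetical and chosen only for illustration.

```python
def observed_risk(p, sens, spec):
    """P(Y* = 1 | X = x) for true risk p = P(Y = 1 | X = x)."""
    return sens * p + (1.0 - spec) * (1.0 - p)


def corrected_risk_difference(rd_observed, sens, spec):
    """Undo the attenuation factor (pi_11 + pi_00 - 1)."""
    return rd_observed / (sens + spec - 1.0)


sens, spec = 0.8, 0.95           # hypothetical pi_11 and pi_00
p_exposed, p_unexposed = 0.3, 0.1  # hypothetical true risks
rd_observed = (observed_risk(p_exposed, sens, spec)
               - observed_risk(p_unexposed, sens, spec))
print(rd_observed)                                         # attenuated
print(corrected_risk_difference(rd_observed, sens, spec))  # true difference
```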


Misclassification in exposure

We observe X∗ instead of X
Model for misclassification:

π110 = P(X ∗ = 1|X = 1,Y = 0)

π111 = P(X ∗ = 1|X = 1,Y = 1)

π100 = P(X ∗ = 1|X = 0,Y = 0)

π101 = P(X ∗ = 1|X = 0,Y = 1)

Non differential misclassification if
π110 = π111 and π100 = π101,
i.e. misclassification independent of disease
This is fulfilled in most cohort studies, but could be violated in case-control studies


Example for non differential misclassification error

              Correct             20% of No say Yes    20% of No say Yes,
              classification                           20% of Yes say No
high fat      No     Yes          No     Yes           No     Yes
cases         450    250          360    340           410    290
controls      900    100          720    280           740    260
Odds ratio    5.0                 2.4                  2.0

Attenuation to OR = 1
Note: Everything can happen in case of differential misclassification
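The table can be reproduced with a short Python sketch. The helper names `odds_ratio` and `misclassify` are illustrative; the same error rates are applied to cases and controls, which is what makes the misclassification non-differential.

```python
def odds_ratio(cases, controls):
    """Odds ratio from (n_no, n_yes) exposure counts of cases and controls."""
    no_cases, yes_cases = cases
    no_controls, yes_controls = controls
    return (yes_cases * no_controls) / (no_cases * yes_controls)


def misclassify(counts, p_no_to_yes=0.0, p_yes_to_no=0.0):
    """Expected observed (n_no, n_yes) counts after misclassification."""
    n_no, n_yes = counts
    observed_no = n_no * (1 - p_no_to_yes) + n_yes * p_yes_to_no
    observed_yes = n_yes * (1 - p_yes_to_no) + n_no * p_no_to_yes
    return observed_no, observed_yes


cases, controls = (450, 250), (900, 100)
scenarios = {
    "correct classification": (0.0, 0.0),
    "20% of No say Yes": (0.2, 0.0),
    "20% of No say Yes, 20% of Yes say No": (0.2, 0.2),
}
for label, (a, b) in scenarios.items():
    obs_cases = misclassify(cases, a, b)
    obs_controls = misclassify(controls, a, b)
    print(label, obs_cases, obs_controls,
          round(odds_ratio(obs_cases, obs_controls), 1))
```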


Likelihood

We assume non differential misclassification error

$P(Y = 1, X^* = x^*) = \sum_x P(Y = 1, X^* = x^*, X = x)$

$= \sum_x P(Y = 1 \mid X^* = x^*, X = x)\, P(X^* = x^*, X = x)$

$= \sum_x P(Y = 1 \mid X = x)\, P(X^* = x^* \mid X = x)\, P(X = x)$

We have three components of the likelihood:
Main model: P(Y = 1|X = x)
Measurement model: P(X ∗ = x∗|X = x)
Exposure model: P(X = x)


Observed probabilities

$P(Y = 1 \mid X^* = x^*) = \frac{P(Y = 1, X^* = x^*)}{P(X^* = x^*)}$

$\frac{P(Y = 1 \mid X^* = 1) - P(Y = 1 \mid X^* = 0)}{P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)} = \frac{(\pi_{11} + \pi_{00} - 1)\, P(X = 1)\, P(X = 0)}{P(X^* = 1)\, P(X^* = 0)} < 1$

see Gustafson (2004), p. 35 ff.
Bias to 0 if (π11 + π00 − 1) > 0
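The attenuation factor can be verified numerically by computing the observed difference directly from the predictive values. This is a minimal sketch under non-differential misclassification, i.e. P(Y = 1 | X, X*) = P(Y = 1 | X); all numerical inputs are hypothetical.

```python
def attenuation_factor(prev, sens, spec):
    """(pi_11 + pi_00 - 1) P(X=1) P(X=0) / (P(X*=1) P(X*=0))."""
    p_star = sens * prev + (1.0 - spec) * (1.0 - prev)
    return (sens + spec - 1.0) * prev * (1.0 - prev) / (p_star * (1.0 - p_star))


def observed_difference(p1, p0, prev, sens, spec):
    """P(Y=1 | X*=1) - P(Y=1 | X*=0), with p1 = P(Y=1|X=1), p0 = P(Y=1|X=0)."""
    p_star = sens * prev + (1.0 - spec) * (1.0 - prev)
    ppv = sens * prev / p_star                   # P(X=1 | X*=1)
    npv = spec * (1.0 - prev) / (1.0 - p_star)   # P(X=0 | X*=0)
    risk_given_pos = p1 * ppv + p0 * (1.0 - ppv)
    risk_given_neg = p1 * (1.0 - npv) + p0 * npv
    return risk_given_pos - risk_given_neg


p1, p0, prev, sens, spec = 0.4, 0.1, 0.5, 0.8, 0.9
print(observed_difference(p1, p0, prev, sens, spec))
print((p1 - p0) * attenuation_factor(prev, sens, spec))  # same value
```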


Misclassification in a confounder

X∗: Misclassified confounder
Z: Exposure
Y: Response
Even in the case of non differential measurement error with respect to Y and Z:

I Bias in both directions possible

I Residual confounding

I e.g. Savitz and Baron (1989)


Correction methods

I Matrix method: The two by two table can be seen as one multinomial variable

I Variance estimation see Greenland (1988)

I MLE for unrestricted sampling

I Alternatives by Tenenbein (1972)


Misclassification in regression

General Regression Model

E(Y |X1, . . . , Xk) = h(β0 + β1X1 + . . . + βkXk)
h: Link-function

Misclassification possibly on

I binary covariates: Observe X ∗ instead of X

I binary response : Observe Y ∗ instead of Y


Handling misclassification in Y in binary regression

I Hausman et al. (Journal of Econometrics, 1998)

I Neuhaus (Biometrika, 1999)

We observe Y ∗ instead of Y with misclassification matrix Π

P(Y ∗ = 1|X ) = π11G(x ′β) + (1− π00)(1− G(x ′β)) = H(x ′β)

H(t) = π11G(t) + (1− π00)(1− G(t))
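A minimal Python sketch of the observed response function, assuming G is the logistic function (one possible choice; the slides leave G general). The values π11 = 0.9 and π00 = 0.8 are hypothetical.

```python
import math


def G(t):
    """Logistic response function, used here as an example for G."""
    return 1.0 / (1.0 + math.exp(-t))


def H(t, sens, spec):
    """Observed response function: pi_11 G(t) + (1 - pi_00)(1 - G(t))."""
    g = G(t)
    return sens * g + (1.0 - spec) * (1.0 - g)


for t in (-4, -2, 0, 2, 4):
    print(t, round(G(t), 3), round(H(t, sens=0.9, spec=0.8), 3))
```

H no longer ranges over (0, 1): it is bounded below by 1 − π00 and above by π11, so the observed curve is flatter than the true one.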


Observed regression function

[Figure: Logistic regression with misclassified Y; horizontal axis X from −4 to 4, vertical axis from 0 to 1]


Misclassification in regressors

One binary regressor, normal outcome:
Y = β0 + β1I1, β0 = µ0, β1 = µ1 − µ0

Naive analysis:
E(Y |X ∗ = 0) = P(X = 0|X ∗ = 0) ∗ µ0 + P(X = 1|X ∗ = 0) ∗ µ1

E(Y |X ∗ = 1) = P(X = 0|X ∗ = 1) ∗ µ0 + P(X = 1|X ∗ = 1) ∗ µ1

These equations can be solved for µ0 and µ1 when the MC matrix and P(X=0) are known
Matrix Method
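Solving the two equations is a 2 × 2 linear system. The following is a minimal sketch with NumPy, assuming non-differential misclassification and known π11, π00 and P(X = 1); the function names and all numerical values are hypothetical.

```python
import numpy as np


def calibration_matrix(prev, sens, spec):
    """Rows: X* = 0, 1; columns: P(X = 0 | X*), P(X = 1 | X*)."""
    p_star = sens * prev + (1.0 - spec) * (1.0 - prev)   # P(X* = 1)
    ppv = sens * prev / p_star
    npv = spec * (1.0 - prev) / (1.0 - p_star)
    return np.array([[npv, 1.0 - npv],
                     [1.0 - ppv, ppv]])


def naive_means(mu0, mu1, prev, sens, spec):
    """E(Y | X* = 0) and E(Y | X* = 1) implied by the true group means."""
    return calibration_matrix(prev, sens, spec) @ np.array([mu0, mu1])


def corrected_means(m0, m1, prev, sens, spec):
    """Solve the two naive-mean equations for (mu0, mu1)."""
    return np.linalg.solve(calibration_matrix(prev, sens, spec),
                           np.array([m0, m1]))


m0, m1 = naive_means(mu0=1.0, mu1=3.0, prev=0.3, sens=0.8, spec=0.9)
print(m1 - m0)                                    # attenuated difference
print(corrected_means(m0, m1, 0.3, 0.8, 0.9))     # recovers (1.0, 3.0)
```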


Likelihood

$L(Y, X^*) = \sum_x L(Y, X^*, X = x)$

$= \sum_x L(Y \mid X^* = x^*, X = x)\, P(X^* = x^*, X = x)$

$= \sum_x L(Y \mid X = x)\, P(X^* = x^* \mid X = x)\, P(X = x)$

The likelihood is numerically easy to handle for many regression models
Components of the misclassification model can be added.


Effects of misclassification

I Biased and inconsistent estimates for parameters

I In most cases attenuation to 0

I In complex settings bias in any direction possible

I Effect dependent on the misclassification matrix

I Similar to the effect of measurement error in continuous variables in regression


Hypothesis testing

Attenuation →

I Naive tests (e.g. for no true effect in a 2x2 table) still have the correct significance level

I Power reduction

I Sample size calculation has to be corrected


Outlook

I Use of validation data

I Latent class analysis
