Measuring Agreement
Introduction
Different types of agreement:
- Diagnosis by different methods (disease absent or disease present): do both methods give the same results?
- Staging of carcinomas: will different methods lead to the same results? Will different raters lead to the same results?
- Measurements of blood pressure: how consistent are measurements made using different devices, with different observers, or at different times?
Investigating agreement
Need to consider:
- Data type: categorical or continuous
- How are the data repeated? Measuring instrument(s), rater(s), time(s)
- The goal: are ratings consistent? Estimate the magnitude of differences between measurements; investigate factors that affect ratings
- Number of raters
Data type
Categorical:
- Binary: disease absent, disease present
- Nominal: hepatitis (viral A, B, C, D, E, or autoimmune)
- Ordinal: severity of disease (mild, moderate, severe)
Continuous:
- Size of tumour
- Blood pressure
How are data repeated?
Same person, same measuring instrument:
- Different observers: inter-rater reliability
- Same observer at different times: intra-rater reliability (repeatability)
Internal consistency: do the items of a test measure the same attribute?
Measures of agreement
Categorical:
- Kappa (unweighted, weighted, Fleiss')
Continuous:
- Limits of agreement
- Coefficient of variation (CV)
- Intraclass correlation (ICC)
Internal consistency:
- Cronbach's α
Number of raters
- Two
- Three or more
Categorical data: two raters
Kappa: magnitudes commonly quoted
- ≥0.75 excellent; 0.40 to 0.75 fair to good; <0.40 poor
- or: 0 to 0.20 slight; >0.20 to 0.40 fair; >0.40 to 0.60 moderate; >0.60 to 0.80 substantial; >0.80 almost perfect
Degree of disagreement can be included: weighted kappa
- Values close together do not count towards disagreement as much as those further apart
- Linear / quadratic weightings
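As a sketch of how the unweighted and weighted statistics are computed for two raters (pure Python; the function name, weighting scheme via category index distance, and example data are our own illustrations, not from the slides):

```python
def cohen_kappa(r1, r2, weights=None):
    """Cohen's kappa for two raters over the same subjects.

    weights=None gives unweighted kappa; "linear" or "quadratic" give
    weighted kappa, where near-misses count less than distant disagreements.
    """
    cats = sorted(set(r1) | set(r2))
    idx = {c: i for i, c in enumerate(cats)}
    n, k = len(r1), len(cats)
    # observed cell proportions of the k x k agreement table
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    # marginal proportions for each rater
    p1 = [sum(row) for row in obs]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):  # disagreement weight between categories i and j
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j)
        return float(d) if weights == "linear" else float(d * d)

    d_obs = sum(w(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w(i, j) * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp


# illustrative ratings on a 1-5 scale (made-up data, not Example 1)
r1 = [1, 2, 2, 3, 4, 5, 5, 3]
r2 = [1, 2, 3, 3, 4, 5, 4, 3]
```

With ordinal scores whose disagreements are mostly near-misses, weighted kappa is typically larger than unweighted kappa, since distant disagreements are penalised most heavily.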
Categorical data: more than two raters
- Different tests for binomial data and for data with more than two categories
- Online calculators: http://www.vassarstats.net/kappa.html
Example 1: two raters
Scores 1 to 5:
- Unweighted kappa 0.79, 95% CI (0.62 to 0.96)
- Linear weighting 0.84, 95% CI (0.70 to 0.98)
- Quadratic weighting 0.90, 95% CI (0.77 to 1.00)
Example 2
Binomial data; three raters, two ratings each:
- Inter-rater agreement
- Intra-rater agreement
Example 2 ctd.
Inter-rater agreement:
- Kappa(1,2) = 0.865 (P<0.001)
- Kappa(1,3) = 0.054 (P=0.765)
- Kappa(2,3) = -0.071 (P=0.696)
Intra-rater agreement:
- Kappa(1) = 0.800 (P<0.001)
- Kappa(2) = 0.790 (P<0.001)
- Kappa(3) = 0.000 (P=1.000)
Continuous data
- Test for bias
- Check the differences are not related to the magnitude of the measurements
- Calculate the mean and SD of the differences
- Limits of agreement
- Coefficient of variation
- ICC
Test for bias
- Student's paired t test (mean) or Wilcoxon matched-pairs test (median)
- If there is bias, agreement cannot be investigated further
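A minimal sketch of the paired t statistic used here (pure Python; it returns the statistic and its degrees of freedom, from which a P value would be looked up — the function name and example data are illustrative):

```python
import math

def paired_t(x, y):
    """Paired t statistic for bias between two methods on the same subjects."""
    d = [a - b for a, b in zip(x, y)]  # within-subject differences
    n = len(d)
    mean_d = sum(d) / n                # mean difference (the bias)
    s = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))  # SD of differences
    return mean_d / (s / math.sqrt(n)), n - 1  # t statistic, degrees of freedom
```

A t statistic near zero (large P value) means no evidence of systematic bias, so the agreement analysis can proceed.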
Example 3: test for bias
Paired t test: P=0.362, so no evidence of bias
Check differences unrelated to magnitude
[Scatter plot of the differences against the size of the measurements: clearly no relationship]
Calculate mean and SD of the differences

                     N    Mean     Std. Deviation
Difference           17   4.9412   21.72404
Valid N (listwise)   17

(the mean difference is 4.94; s, the SD of the differences, is 21.72)
Limits of agreement
- Lower limit of agreement (LLA) = mean - 1.96×s = -37.6
- Upper limit of agreement (ULA) = mean + 1.96×s = 47.5
- 95% of differences between a pair of measurements for an individual lie in (-37.6, 47.5)
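The slide's limits follow directly from its mean and SD; a pure-Python sketch for computing them from raw paired data (the raw data behind the slide are not shown, so the example data are made up):

```python
import math

def limits_of_agreement(a, b):
    """95% (Bland-Altman) limits of agreement for paired measurements."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    s = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d - 1.96 * s, mean_d + 1.96 * s


# the slide's figures, reproduced from its mean and SD:
lla = 4.9412 - 1.96 * 21.72404   # about -37.6
ula = 4.9412 + 1.96 * 21.72404   # about 47.5
```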
Coefficient of variation
- Measure of the variability of the differences, expressed as a proportion of the average measured value
- Suitable when the error (the differences between pairs) increases with the measured values; the other measures require this not to be the case
- CV = 100 × s ÷ mean of the measurements = 100 × 21.72 ÷ 447.88 = 4.85%
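The slide's arithmetic, plus a sketch of the same calculation from raw paired data (the function name and data layout are our own; the underlying data are again not shown):

```python
import math

def coefficient_of_variation(a, b):
    """SD of the paired differences as a percentage of the overall mean measurement."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean_d = sum(d) / n
    s = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    grand_mean = (sum(a) + sum(b)) / (2 * n)   # mean of all the measurements
    return 100 * s / grand_mean


cv = 100 * 21.72404 / 447.88   # the slide's figures: about 4.85%
```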
Intraclass correlation
- Continuous data; two or more sets of measurements
- A measure of correlation that adjusts for differences in scale
- Several models:
  - Absolute agreement or consistency
  - Raters chosen randomly or the same raters throughout
  - Single or average measures
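As one concrete instance of these models, a sketch of the one-way random-effects, single-measures ICC, often written ICC(1) (the other models are based on two-way ANOVA and differ in detail; the data layout here is an assumption):

```python
def icc_oneway(ratings):
    """One-way random-effects, single-measures ICC.

    ratings: one list of k ratings per subject.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    means = [sum(r) / k for r in ratings]        # per-subject means
    # between-subjects and within-subjects mean squares
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for r, m in zip(ratings, means) for x in r) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect agreement gives an ICC of 1; an ICC near or below 0 means the within-subject (rater) variability swamps the between-subject variability.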
Intraclass correlation
Magnitudes commonly quoted:
- ≥0.75 excellent; 0.40 to 0.75 fair to good; <0.40 poor
Cronbach's α
- Internal consistency of a total score made up of several components
- α ≥ 0.8 good; α ≥ 0.7 adequate
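A sketch of the standard formula, α = k/(k-1) × (1 - Σ component variances ÷ variance of the totals), in pure Python (the data layout, one score list per component, is an assumption):

```python
def cronbach_alpha(items):
    """Cronbach's alpha; items is one list of scores per component (same subjects)."""
    k = len(items)       # number of components
    n = len(items[0])    # number of subjects

    def var(xs):         # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))
```

When the components all rise and fall together across subjects, α approaches 1; weakly related components pull it down.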
Investigating agreement
- Data type
  - Categorical: kappa
  - Continuous: limits of agreement, coefficient of variation, intraclass correlation
- How are the data repeated? Measuring instrument(s), rater(s), time(s)
- Number of raters
  - Two: straightforward
  - Three or more: help!