
STEP BY STEP APPROACH TO EVALUATION AND COMPARISON OF ANALYTICAL METHODS

J M KUYL

Department of Chemical Pathology

NHLS Universitas & UFS

Physicists have a long tradition of building their own equipment, and are often fascinated by its mechanics. Biologists’ fascination is primarily with the mechanics of nature and, for many, the machines themselves are simply tools – complicated ‘black boxes’ that produce the results they need. It doesn’t help that the tools biologists are using may have been designed by physicists, and that the two groups tend to use different jargon.

Nature 2007; 447: 116

INTRODUCTION

• Quantitative analytical methods have become more reliable and more standardized.

• Emphasis has moved away from method development to the selection and evaluation of the commercially available methods that best suit a particular laboratory.

• Commercial kit methods are ready for implementation in the laboratory, often in a “closed” analytical system on a dedicated instrument.

• Furthermore, method evaluation is a costly exercise in terms of reagents, specimens, and labour and time of the professionals doing the evaluating.

• If not done properly, it wastes laboratory revenue and time and, if the method is accepted, may lead to errors in medical decisions based on the results the method generates on patient samples.

Generally, laboratories are so concerned with getting methods up and running that little time or thought is given to selection and evaluation studies.

• The most common scenario is the implementation of readily available commercial kit methods, often in a “closed” analytical system on a dedicated instrument.

• When a new clinical analyzer is included in the overall evaluation process, various instrumental parameters also require evaluation. Information on most of these parameters should be available from the instrument manufacturer, who should also be able to furnish information on what user studies to conduct in estimating these parameters for an individual analyzer.

Establish need

Method selection

Definition of quality goal

Method evaluation

Method development

Implementation

Routine analysis

Quality control practices

Submission of specimen

Result report

Reasons for Selecting a New Method

• to improve accuracy and/or precision over existing methods

• to reduce reagent cost

• to reduce labour cost

• to accommodate a new analyzer or instrument

• to measure a new analyte

METHOD SELECTION

• Evaluation of need

• Application characteristics

• Method characteristics

• Analytical performance characteristics

Scopes of Method Evaluation Studies

• Evaluation is the determination of the analytical performance characteristics of a new method.

• Validation is confirmation by examination and provision of objective evidence that the particular requirements for a specific intended use can be consistently fulfilled.

• Verification is confirmation by examination of objective evidence that specified requirements have been fulfilled.

• Demonstration is a minimum evaluation for a laboratory to use to show that it is able to obtain expected results by following the manufacturer’s instructions. This is appropriate for test systems whose performance characteristics have been well studied and documented.

Method Evaluation and Validation

• Main purpose is error assessment.

• To demonstrate, before reporting patient test results, that the laboratory can obtain performance specifications for accuracy, precision, and reportable range of patient test results comparable to those established by the manufacturer.

• The laboratory must also verify that the manufacturer’s reference range is appropriate for the laboratory’s patient population.

An Overview of Qualitative Terms and Quantitative Measures Related to Method Performance

Qualitative concept | Quantitative measure
Trueness: closeness of agreement of the mean value with the “true value” | Bias: a measure of the systematic error
Precision: repeatability (within run), intermediate precision (long term), reproducibility (interlaboratory) | Imprecision (sd): a measure of the dispersion of random errors
Accuracy: closeness of agreement of a single measurement with the “true value” | Error of measurement: comprises both random and systematic influences

Total analytical error (TEA):

TEA = RE + SE

Constant and proportional errors.

Analytical Sensitivity

• Several terms describe the different aspects of the minimum analytical sensitivity of a method.

• Limit of absence (LoA) is the lowest concentration of analyte that the method can differentiate from zero.

• Limit of detection (LoD) is the minimum concentration of analyte whose presence can be reliably detected, though not necessarily quantified, under defined conditions.

• Functional sensitivity or limit of quantification (LoQ) is the minimum concentration of analyte that can be quantitatively measured reliably under defined conditions. It is commonly taken as the concentration at which the CV = 20%.
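In practice, functional sensitivity is read from a precision profile (CV% plotted against concentration). The Python sketch below assumes replicate CVs have already been measured at several concentrations, sorted from low to high, with CV falling as concentration rises; it linearly interpolates the concentration at which the profile crosses 20%. The function name and the example profile are hypothetical.

```python
from typing import Optional, Sequence


def functional_sensitivity(conc: Sequence[float], cv_pct: Sequence[float],
                           target_cv: float = 20.0) -> Optional[float]:
    """Estimate the concentration at which CV equals target_cv (default 20%).

    conc must be sorted ascending; CV is assumed to decrease with concentration.
    Returns None if the profile never crosses the target.
    """
    points = list(zip(conc, cv_pct))
    for (c0, cv0), (c1, cv1) in zip(points, points[1:]):
        # Find the interval where the CV drops from above the target to at/below it.
        if cv0 >= target_cv >= cv1 and cv0 != cv1:
            # Linear interpolation between the two bracketing points.
            return c0 + (cv0 - target_cv) * (c1 - c0) / (cv0 - cv1)
    return None


# Hypothetical precision profile: concentration (arbitrary units) and CV (%).
profile_conc = [0.05, 0.1, 0.2, 0.4, 0.8]
profile_cv = [45.0, 30.0, 18.0, 10.0, 6.0]
print(functional_sensitivity(profile_conc, profile_cv))
```

Linear interpolation is the simplest choice; fitting a smooth curve through the profile is a common alternative when more points are available.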

Illustration of different aspects of analytical sensitivity or detection limits.

Random Analytical Error (RE)

Factors contributing to random analytical error (RE) are those that affect the reproducibility of measurement. These include:

• instability of the instrument,

• variations in the temperature,

• variations in the reagents and calibrators (and calibration-curve stability),

• variability in handling techniques such as pipetting, mixing, and timing, and

• variability in operators.

These factors superimpose their effects on each other at different times. Some cause rapid fluctuations, and others occur over a longer time. Thus RE has different components of variation that are related to the actual laboratory setting.

Random Analytical Error (RE) Components

• Within-run component of variation (wr)

• Within-day, between-run variation (br)

• Between-day component of variation (bd)

Within-run component of variation (wr)

is caused by specific steps in the procedure:

1. sampling

2. pipetting precision

3. short-term variations in temperature and

4. stability of the instrument.

Within-day, between-run variation (br)

is caused by:

1. instability of the calibration curve,

2. differences in recalibration that occur throughout the day,

3. longer-term variations in the instrument,

4. small changes in the condition of the calibrator and reagents,

5. changes in the condition of the laboratory during the day, and

6. fatigue of the laboratory staff.

Between-day component of variation (bd)

is caused by:

1. daily variations in the instrument,

2. changes in calibrators and reagents (especially if new vials are opened each day), and

3. changes in staff from day to day.

4. Although not a true random component of variation, any drift in the stability of the calibration curve over time also greatly affects bd.

Total Variance of a Method ($s_t^2$)

$$s_t^2 = s_{wr}^2 + s_{br}^2 + s_{bd}^2$$

$$RE = s_t = \sqrt{s_{wr}^2 + s_{br}^2 + s_{bd}^2}$$
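The components can be estimated from replicate results collected over several days. A minimal Python sketch, assuming one run per day and the same number of replicates each day, so that between-run and between-day variation are pooled into a single between-day term (s_br is not separated out); it uses a one-way analysis of variance. The function name and the example data are hypothetical.

```python
import math
from statistics import mean
from typing import Dict, Sequence


def variance_components(days: Sequence[Sequence[float]]) -> Dict[str, float]:
    """One-way ANOVA estimate of within-run, between-day and total SD.

    days: one list of replicate results per day, all of equal length n >= 2.
    With a single run per day, the between-day term also absorbs between-run variation.
    """
    k = len(days)
    n = len(days[0])
    if k < 2 or n < 2 or any(len(d) != n for d in days):
        raise ValueError("need >= 2 days, each with the same number (>= 2) of replicates")
    day_means = [mean(d) for d in days]
    grand = mean(day_means)
    ss_within = sum((x - m) ** 2 for d, m in zip(days, day_means) for x in d)
    ss_between = n * sum((m - grand) ** 2 for m in day_means)
    ms_within = ss_within / (k * (n - 1))
    ms_between = ss_between / (k - 1)
    var_wr = ms_within
    # A negative between-day estimate is truncated at zero, a common convention.
    var_bd = max(0.0, (ms_between - ms_within) / n)
    var_t = var_wr + var_bd  # variances add; SDs do not
    return {"s_wr": math.sqrt(var_wr), "s_bd": math.sqrt(var_bd), "s_t": math.sqrt(var_t)}


# Hypothetical duplicate results on four days (mmol/L).
print(variance_components([[4.90, 4.94], [4.85, 4.89], [5.01, 4.97], [4.92, 4.96]]))
```

Note that the variances, not the SDs, are summed, which matches the equation above.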

Familiarization with the method

• It is essential that operators of the method become thoroughly familiar with the details of the method and instrument operation before the collection of any data that will be used to characterize the method’s performance.

• This may include training by the manufacturer.

• It should be of sufficient duration that, at its completion, the operators can perform all aspects of the method or instrument operation comfortably.

Experiments for Estimating Analytical Errors

Outliers

The importance of daily examination and plotting of comparison-of-methods data cannot be overemphasized, and the data must be carefully examined for outliers.

Definition of an outlier from a regression line, where $y_i$ is the observed result, $Y_i$ the value predicted by the regression line, and $s_{y,x}$ the standard error of the estimate:

$$|y_i - Y_i| > 4\,s_{y,x}$$

Outlier specimens must be detected immediately and reanalyzed by both methods so that the repeat results can correct or confirm the outlier.
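A minimal Python sketch of the 4 s_y,x screen, assuming an ordinary least-squares fit of test-method (y) on comparative-method (x) results; the function name is hypothetical. Because a single gross outlier also inflates s_y,x, the screen is insensitive when there are few data points, which is one more reason to plot the data every day.

```python
import math
from typing import List, Sequence


def regression_outliers(x: Sequence[float], y: Sequence[float], k: float = 4.0) -> List[int]:
    """Return indices i where |y_i - Y_i| > k * s_y,x for an ordinary least-squares fit."""
    n = len(x)
    if n < 3 or n != len(y):
        raise ValueError("need at least 3 paired results")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residuals from the fitted line Y_i = slope * x_i + intercept.
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    # Standard error of the estimate, n - 2 degrees of freedom.
    s_yx = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return [i for i, r in enumerate(residuals) if abs(r) > k * s_yx]
```

Flagged indices identify the specimens to reanalyze by both methods.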

An example evaluation study: Cholesterol in serum.

Step 1: Analytical needs

A rapid procedure with a turnaround time of 30 min, suitable for lipid-clinic requirements. A short turnaround time means that patients do not have to return for treatment based on lipid-profile results.

A sample volume of 200 µL.

An analytical range of 0 to 20 mmol/L.

High throughput.


Analytical Goals

Analyte | Acceptable performance criteria (CLIA 88) | Decision level XC | Allowable error (CLIA 88) | Maximum sd (CLIA 88) (CV%) | Medically based maximum sd (Fraser) (CV%)
Albumin | ± 10% | 35 g/L | 3.5 | 0.9 (2.6%) | 0.5 (1.43%)
Cholesterol | ± 10% | 5.2 mmol/L | 0.52 | 0.13 (2.5%) | 0.14 (2.7%)
Creatinine | ± 15% | 88 µmol/L | 26 | 7.0 (8%) | 1.8 (2.0%)
 | | 265 µmol/L | 40 | 9.7 (3.7%) | 6.2 (2.3%)
Glucose | ± 10% | 2.75 mmol/L | 0.33 | 0.08 (2.9%) | 0.06 (2.2%)
 | | 6.9 mmol/L | 0.69 | 0.18 (2.6%) | 0.15 (2.2%)
 | | 11.0 mmol/L | 1.10 | 0.28 (2.55%) | 0.24 (2.2%)
Hb A1C | | 7.0% | 0.35% | 0.14% |
K | ± 0.5 mmol/L | 3.0 mmol/L | 0.50 | 0.12 (4%) | 0.07 (2.33%)
 | | 6.0 mmol/L | 0.50 | 0.12 (2%) | 0.14 (2.33%)
ALP | ± 30% | 150 U/L | 45 | 11 (7.3%) | 5.1 (3.4%)
CK | ± 30% | 200 U/L | 60 | 15 (7.5%) | 40 (20%)

Step 2: Quality goals

The medical decision levels (XC) of interest for cholesterol analysis are taken as 4.5 mmol/L, below which the risk of CVD is low, and 6.0 mmol/L, above which the risk is high and patients should be actively treated with cholesterol-lowering drugs.

Precision goals for cholesterol are defined as 0.12 mmol/L at 4.5 mmol/L and 0.15 mmol/L at 6.0 mmol/L (2.5%).

Total error goals (TEA) are 0.45 mmol/L at 4.5 mmol/L and 0.60 mmol/L at 6.0 mmol/L (10%).

Total analytical error (TEA) budget for cholesterol:

TEA = 10%; RE = 2.5%; SE = 7.5%

TEA = RE + SE: 10% = 2.5% + 7.5%
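The percentage budget converts to absolute limits at each decision level by multiplying by XC. A minimal Python sketch, with the 10% TEA and 2.5% RE shares taken from the cholesterol goals above and the SE share taken as the remainder; the function name is hypothetical.

```python
from typing import Dict


def error_budget(xc: float, tea_pct: float = 10.0, re_pct: float = 2.5) -> Dict[str, float]:
    """Convert a percentage total-error budget (TEA = RE + SE) into absolute units at XC."""
    se_pct = tea_pct - re_pct  # SE share is whatever remains after RE is allotted
    return {
        "TEA": xc * tea_pct / 100,
        "RE": xc * re_pct / 100,
        "SE": xc * se_pct / 100,
    }


# Cholesterol decision levels from Step 2 (mmol/L).
for level in (4.5, 6.0):
    print(level, error_budget(level))
```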

Step 3: Method selection

Existing laboratory analyzer: Beckman-Coulter LX20.

Cholesterol kit specifically designed for this analyzer.

Senior operator who is familiar with this particular analyzer and is available to do the evaluation.

Step 4: Test material selection

QC material:

Synchron 1: mean [cholesterol] 2.71 mmol/L

Synchron 2: mean [cholesterol] 4.19 mmol/L

Synchron 3: mean [cholesterol] 5.82 mmol/L

Pooled patient serum at two levels, A and B, whose matrix is closest to real patient serum.

20 patient serum samples to be run in parallel with the existing laboratory method.

Step 5: Within-run imprecision

Performed by analyzing 6 aliquots of Synchron 1, 2, and 3 and Pool A and B within a run.

Results:

Material | Mean (mmol/L) | sd (mmol/L) | RE (CV%)
Synchron 1 | 2.69 | 0.028 | 1.04
Synchron 2 | 4.21 | 0.042 | 1.00
Synchron 3 | 5.80 | 0.073 | 1.26
Pool A | 4.89 | 0.057 | 1.17
Pool B | 6.54 | 0.109 | 1.67
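Mean, sd and CV% for each material follow directly from the replicate results. A minimal Python sketch using the standard-library `statistics` module (sample SD with an N - 1 denominator); the function name and the six example aliquot values are hypothetical, not the study's raw data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def replicate_summary(results: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sd, CV%) for replicate results on one material."""
    if len(results) < 2:
        raise ValueError("need at least two replicates")
    m = mean(results)
    sd = stdev(results)  # sample SD, N - 1 in the denominator
    return m, sd, 100.0 * sd / m


# Hypothetical six within-run aliquots of one control material (mmol/L).
aliquots = [2.66, 2.70, 2.72, 2.68, 2.71, 2.67]
m, sd, cv = replicate_summary(aliquots)
print(f"mean={m:.2f} sd={sd:.3f} CV={cv:.2f}%")
```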

Step 5a: Within-run imprecision, testing for acceptable performance

RE (CV%) against the maximum allowable CV% (CLIA 88: 2.5%; Fraser: 2.7%):

• Synchron 1: 1.04% < 2.5% (CLIA 88) and < 2.7% (Fraser)

• Synchron 2: 1.00% < 2.5% (CLIA 88) and < 2.7% (Fraser)

• Synchron 3: 1.26% < 2.5% (CLIA 88) and < 2.7% (Fraser)

• Pool A: 1.17% < 2.5% (CLIA 88) and < 2.7% (Fraser)

• Pool B: 1.67% < 2.5% (CLIA 88) and < 2.7% (Fraser)

All materials pass both limits; proceed with step 5b.


Step 5b: Within-run imprecision, testing for acceptable performance

RE against TEA:

If 4 x RE > TEA, reject the method.

If 4 x RE < TEA, proceed with step 6.

With TEA = 10% for cholesterol, the within-run imprecision of Synchron 1, 2, and 3 and pools A and B each passes the test; even pool B, with the largest CV, gives 4 x 1.67% = 6.68% < 10%.

Proceed to step 6.
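Steps 5a and 5b amount to three comparisons per material. A minimal Python sketch, using the limits quoted in this study (CLIA 88 maximum CV 2.5%, Fraser maximum CV 2.7%, TEA 10%) and the within-run CVs from the Step 5 table; the function and constant names are hypothetical.

```python
from typing import Dict

CLIA_MAX_CV = 2.5    # % maximum CV, CLIA 88, cholesterol
FRASER_MAX_CV = 2.7  # % medically based maximum CV (Fraser), cholesterol
TEA_PCT = 10.0       # % allowable total error, cholesterol


def screen_within_run(cv_pct: float) -> Dict[str, bool]:
    """Step 5a (CV vs. maximum allowable CV) and Step 5b (4 x RE vs. TEA) checks."""
    return {
        "meets_clia_cv": cv_pct < CLIA_MAX_CV,
        "meets_fraser_cv": cv_pct < FRASER_MAX_CV,
        # Step 5b: the method is rejected if 4 x RE exceeds TEA.
        "passes_4re_vs_tea": 4 * cv_pct < TEA_PCT,
    }


within_run_cv = {"Synchron 1": 1.04, "Synchron 2": 1.00, "Synchron 3": 1.26,
                 "Pool A": 1.17, "Pool B": 1.67}
for material, cv in within_run_cv.items():
    checks = screen_within_run(cv)
    print(material, "proceed" if all(checks.values()) else "review", checks)
```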

Step 6: Between-run (day-to-day) precision

Performed by analyzing aliquots of pools A and B over 20 days.

Results:

Material | Mean (mmol/L) | sd (mmol/L) | RE (CV%) | 4 x RE (%)
Pool A | 4.93 | 0.098 | 1.99 < 2.5 | 7.96 < 10
Pool B | 6.49 | 0.135 | 2.08 < 2.5 | 8.32 < 10

Step 7: Confidence intervals for the SD

Factors for computing one-sided confidence intervals for the standard deviation:

Degrees of freedom (N - 1) | A0.05 | A0.95
1 | 0.5103 | 15.947
5 | 0.6721 | 2.089
10 | 0.7391 | 1.593
15 | 0.7747 | 1.437
20 | 0.7979 | 1.358
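Each factor in the table above equals sqrt((N - 1)/χ²), with χ² taken at the appropriate chi-square percentile: A0.05 uses the 95th percentile and A0.95 the 5th. A minimal, dependency-free Python sketch that recomputes the table under this chi-square assumption (which matches the tabulated values); the chi-square routines are written out by hand only to avoid a SciPy dependency (`scipy.stats.chi2.ppf` does the same job), and the function names are hypothetical.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15 and n < 100_000:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_ppf(p: float, df: int) -> float:
    """Inverse chi-square CDF found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def sd_ci_factors(df: int, alpha: float = 0.05):
    """Return (A_alpha, A_1-alpha); multiply sd by these for the one-sided lower and upper limits."""
    a_low = math.sqrt(df / chi2_ppf(1.0 - alpha, df))
    a_high = math.sqrt(df / chi2_ppf(alpha, df))
    return a_low, a_high


for df in (1, 5, 10, 15, 20):
    low, high = sd_ci_factors(df)
    print(f"{df:>2}  {low:.4f}  {high:.3f}")
```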

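Step 7: Confidence-interval estimates of random error, REU and REL (N = 20)

Material | Mean (mmol/L) | sd (mmol/L) | sdU = sd x A0.95 | sdL = sd x A0.05 | REU = 4 x sdU | REL = 4 x sdL
Pool A | 4.93 | 0.098 | 0.133 | 0.078 | 0.532 | 0.312
Pool B | 6.49 | 0.135 | 0.183 | 0.108 | 0.732 | 0.432

REU for pool A (0.532 mmol/L) exceeds 0.493 mmol/L, and REU for pool B (0.732 mmol/L) exceeds 0.649 mmol/L; these limits correspond to TEA = 10% of each pool mean.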
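The REU and REL columns follow from multiplying the sd by the factors and then by 4. A minimal Python sketch using the 20-degree-of-freedom factors from the factor table and TEA = 10% of each pool mean; the function name is hypothetical. Small differences from the tabulated values in the last digit arise because the table rounds sd x A before multiplying by 4.

```python
from typing import Tuple

A_05, A_95 = 0.7979, 1.358  # one-sided SD factors for 20 degrees of freedom (Step 7 table)


def re_interval(sd: float, a_low: float = A_05, a_high: float = A_95,
                k: float = 4.0) -> Tuple[float, float]:
    """Return (RE_L, RE_U) = (k * sd * A0.05, k * sd * A0.95)."""
    return k * sd * a_low, k * sd * a_high


# Pool means and between-day sd from Step 6 (mmol/L).
pools = {"Pool A": (4.93, 0.098), "Pool B": (6.49, 0.135)}
for name, (m, sd) in pools.items():
    re_l, re_u = re_interval(sd)
    tea = 0.10 * m  # TEA = 10% of the pool mean
    print(f"{name}: RE_L={re_l:.3f} RE_U={re_u:.3f} TEA={tea:.3f} RE_U>TEA={re_u > tea}")
```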

Step 8: Validation of linearity or reportable range

Pool C was obtained by combining all serum samples with [cholesterol] > 15 mmol/L.

The following samples were prepared:

Sample 1: specially prepared, with [cholesterol] = 0
Sample 2: 3 parts sample 1 + 1 part pool A
Sample 3: pool A
Sample 4: pool B
Sample 5: 2 parts sample 1 + 2 parts pool C
Sample 6: pool C

Pools analyzed by the Abell-Kendall method (reference method):

Pool | [cholesterol] (mmol/L)
Pool A | 4.88
Pool B | 6.52
Pool C | 16.7

Samples 1, 2, 3, 4, 5, and 6 were analyzed in triplicate in a single run in random order.

Sample | Theoretical X (mmol/L) | Mean Y (mmol/L) | Bias, mmol/L (%)
Sample 1 | 0 | 0.035 | +0.035 (N/A)
Sample 2 | 1.22 | 1.967 | -0.024 (-2.0)
Sample 3 | 4.88 | 4.846 | -0.034 (-0.7)
Sample 4 | 6.52 | 6.47 | -0.05 (-0.77)
Sample 5 | 8.09 | 7.99 | -0.1 (-1.24)
Sample 6 | 16.7 | 16.35 | -0.35 (-2.1)

Figure: Reportable range of serum [cholesterol]; measured mean (Y) against theoretical (X) [cholesterol], 0 to 20 mmol/L. Regression: Y = 0.9565 X + 0.3125, R = 0.9989.

Step 9: Estimation of SE from the linearity study, which is a comparison of the method against the reference method. The following statistics were obtained by linear regression analysis:

Y = 0.956 X + 0.313 mmol/L; sy,x = 0.294 mmol/L

Mean X = 6.235 mmol/L; mean Y = 6.276 mmol/L

Bias = | mean Y – mean X | = 0.041 mmol/L

This is the estimate of SE at the mean of the data.
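The regression statistics can be recomputed from the linearity table. A minimal Python sketch of ordinary least squares using the theoretical (X) and mean (Y) values exactly as printed in the Step 8 table; with those inputs it reproduces the quoted slope (about 0.9565), intercept (about 0.3125), sy,x (about 0.294 mmol/L), R (about 0.9989) and bias (about 0.041 mmol/L). The function name is hypothetical.

```python
import math
from typing import Dict, Sequence

# Theoretical (X) and mean measured (Y) values from the Step 8 linearity table (mmol/L).
X = [0.0, 1.22, 4.88, 6.52, 8.09, 16.7]
Y = [0.035, 1.967, 4.846, 6.47, 7.99, 16.35]


def ols_summary(x: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Ordinary least-squares slope, intercept, s_y,x, R, means and |mean Y - mean X|."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    return {
        "slope": slope,
        "intercept": intercept,
        "s_yx": math.sqrt(ss_res / (n - 2)),  # standard error of the estimate
        "r": sxy / math.sqrt(sxx * syy),      # correlation coefficient
        "mean_x": mx,
        "mean_y": my,
        "bias": abs(my - mx),                 # SE at the mean of the data
    }


print(ols_summary(X, Y))
```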

Step 9: Point estimate of SE at medical decision levels (XC).

For XC = 4.5 mmol/L, YC = 0.956 x 4.5 + 0.313 = 4.615 mmol/L:

SE1 = | YC – XC | = 0.115 mmol/L

Because SE1 < TEA = 0.45 mmol/L, SE1 is acceptable.

For XC = 6.0 mmol/L, YC = 0.956 x 6.0 + 0.313 = 6.049 mmol/L:

SE2 = | YC – XC | = 0.049 mmol/L

Because SE2 < TEA = 0.60 mmol/L, SE2 is acceptable.
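The point estimates at the decision levels evaluate the regression line at XC. A minimal Python sketch using the rounded coefficients quoted in Step 9 (slope 0.956, intercept 0.313) and TEA = 10% of XC; the function name is hypothetical.

```python
from typing import Tuple

SLOPE, INTERCEPT = 0.956, 0.313  # rounded regression coefficients from Step 9
TEA_PCT = 10.0                   # cholesterol TEA, % of the decision level


def se_at_decision_level(xc: float, slope: float = SLOPE,
                         intercept: float = INTERCEPT) -> Tuple[float, float]:
    """Return (YC, SE) with YC = slope * XC + intercept and SE = |YC - XC|."""
    yc = slope * xc + intercept
    return yc, abs(yc - xc)


for xc in (4.5, 6.0):
    yc, se = se_at_decision_level(xc)
    tea = xc * TEA_PCT / 100
    print(f"XC={xc}: YC={yc:.3f} SE={se:.3f} TEA={tea:.2f} acceptable={se < tea}")
```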

Step 10: Point estimate of TE

Criteria for acceptable performance:

TEA > TE = 3 x sd + | YC – XC |

For XC1 = 4.5 mmol/L, YC1 = 4.615 mmol/L and sd = 0.098

TE1 = 3 x 0.098 + 0.115 = 0.409 mmol/L < 0.45 mmol/L

Performance acceptable

For XC2 = 6.0 mmol/L, YC2 = 6.049 mmol/L and sd = 0.135

TE2 = 3 x 0.135 + 0.049 = 0.454 mmol/L < 0.6 mmol/L

Performance acceptable
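The Step 10 criterion can be checked mechanically. A minimal Python sketch using z = 3 as in the criterion above, with sd from Step 6, SE from Step 9 and TEA from Step 2; the function name is hypothetical.

```python
def total_error(sd: float, se: float, z: float = 3.0) -> float:
    """TE = z * sd + |YC - XC|, with z = 3 as in the Step 10 criterion."""
    return z * sd + abs(se)


# (XC, sd, |YC - XC|, TEA), all in mmol/L.
levels = [(4.5, 0.098, 0.115, 0.45), (6.0, 0.135, 0.049, 0.60)]
for xc, sd, se, tea in levels:
    te = total_error(sd, se)
    verdict = "acceptable" if te < tea else "not acceptable"
    print(f"XC={xc}: TE={te:.3f} mmol/L vs TEA={tea:.2f} -> {verdict}")
```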

Step 11: Medical decision chart

Parameter | XC1 | XC2
Level (mmol/L) | 4.5 | 6.0
TEA (mmol/L) | 0.45 | 0.60
SE (mmol/L) | 0.115 | 0.049
RE (mmol/L) | 0.098 | 0.135
RE as % of TEA | 21.8 | 22.5
SE as % of TEA | 25.6 | 8.2
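The chart coordinates are each error expressed as a percentage of TEA. A minimal Python sketch reproducing the Step 11 table; it also prints the sigma metric, (TEA - SE)/sd, a commonly used summary associated with the six sigma performance category that the source itself does not compute. Function names are hypothetical.

```python
from typing import Tuple


def decision_chart_point(tea: float, se: float, sd: float) -> Tuple[float, float]:
    """Return (RE as % of TEA, SE as % of TEA), the coordinates tabulated in Step 11."""
    return 100.0 * sd / tea, 100.0 * se / tea


def sigma_metric(tea: float, se: float, sd: float) -> float:
    """(TEA - SE) / sd: how many SDs fit between the observed bias and the TEA limit."""
    return (tea - se) / sd


for label, tea, se, sd in [("XC1 = 4.5", 0.45, 0.115, 0.098), ("XC2 = 6.0", 0.60, 0.049, 0.135)]:
    re_pct, se_pct = decision_chart_point(tea, se, sd)
    print(f"{label}: RE {re_pct:.1f}% of TEA, SE {se_pct:.1f}% of TEA, "
          f"sigma {sigma_metric(tea, se, sd):.2f}")
```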

Figure: Medical decision chart.

Use of the method decision chart. A method with:

1. Unacceptable performance does not meet the requirement for quality, even when the method is working properly. It is not acceptable for routine operation.

2. Marginal performance provides the desired quality when everything is working correctly, but it is difficult to manage in routine operation and requires a total QC strategy, well-trained operators, aggressive preventive maintenance, etc.

3. Good performance meets the requirement for quality and can be well managed in routine service. It requires a multirule procedure with 4-6 control measurements per run.

4. Six sigma or excellent performance is clearly acceptable and easy to manage in routine service and can be controlled

A comparison of methods experiment is performed to estimate inaccuracy or systematic error.

This is done by analyzing patient samples with the new method (test method) and a comparative method, and then estimating the systematic errors (SE) from the differences observed between the methods.

The systematic differences at the critical medical decision concentrations are the errors of interest.

When possible, a “reference method” should be chosen for the comparative method.

Any differences between a test method and a reference method are assigned to the test method.

Figure: Cholesterol methods comparison plot (N = 20); x-axis: comparative method (mmol/L). Regression: y = 1.0032x - 0.0233, R = 0.999.

Figure: Bland-Altman difference plot; x-axis: [cholesterol] (mmol/L).
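A difference plot can also be summarized numerically. A minimal Python sketch of the conventional Bland-Altman summary, the mean difference (test minus comparative) with 95% limits of agreement at mean ± 1.96 sd of the differences; these statistics are not reported in the source, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Union


def bland_altman(test: Sequence[float],
                 comparative: Sequence[float]) -> Dict[str, Union[float, List[float]]]:
    """Points for a difference plot plus mean difference and 95% limits of agreement."""
    if len(test) != len(comparative) or len(test) < 2:
        raise ValueError("need at least two paired results")
    diffs = [t - c for t, c in zip(test, comparative)]
    averages = [(t + c) / 2 for t, c in zip(test, comparative)]  # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)
    return {
        "averages": averages,
        "differences": diffs,
        "bias": bias,
        "loa_low": bias - 1.96 * sd,
        "loa_high": bias + 1.96 * sd,
    }
```

Plotting `differences` against `averages`, with horizontal lines at `bias`, `loa_low` and `loa_high`, gives the usual difference-plot layout.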

Interpretation of the comparison of methods study:

The differences are relatively small, not more than 2.2% across the concentration range of 2.0-15.0 mmol/L.

The two methods have the same relative accuracy.

One can be substituted for the other.

Recommended Minimum Studies for comparison of methods experiment.

1. Select 40 patient specimens to cover the full working range of the method.

2. Analyze 8 specimens a day within 2 hours by the test and comparative methods.

3. Graph results immediately on a difference plot and inspect for discrepancies.

4. Reanalyze specimens that give discrepant results.

5. Continue the experiment for 5 days if no discrepant results are observed.


6. Continue for another 5 days if discrepancies are observed during the first 5 days.

7. Prepare a comparison plot of all the data to assess the range, outliers, and linearity.

8. Calculate the correlation coefficient; if it is 0.99 or greater, calculate simple linear regression statistics and estimate the systematic error at the medical decision concentrations.

9. Use the medical decision chart to combine the estimates of SE and RE and make a judgment on the total error observed for the method.

NATURE, 18 September 2003

• Monkeys reject unequal pay.

- Sarah Brosnan and Frans de Waal

• Working for peanuts.

- Paul Smaglik
