ORIGINAL ARTICLES

Criteria for monitoring tests were described: validity, responsiveness, detectability of long-term change, and practicality

Katy J.L. Bell a,b,*, Paul P. Glasziou a, Andrew Hayen c, Les Irwig b

a Centre for Research in Evidence-Based Practice (CREBP), Bond University, Gold Coast, QLD 4229, Australia
b Screening and Diagnostic Test Evaluation Program (STEP), School of Public Health, Building A27, University of Sydney, Sydney, NSW 2006, Australia
c School of Public Health and Community Medicine, University of New South Wales, Sydney, NSW 2052, Australia

Accepted 19 July 2013; Published online 1 November 2013

Abstract

Objectives: To describe how evidence from trials and cohort studies may be used to guide choice of test for monitoring patients with chronic disease.

Study Design and Setting: Exploration of potential criteria for choosing the best monitoring test. Criteria are defined and options for assessment measures for test performance on each criterion discussed.

Results: Monitoring in clinical practice occurs in three main phases: before treatment, response to treatment, and long-term monitoring. Four important criteria may be used to choose the best test for monitoring a patient in each of these phases. Clinical validity describes the ability of the test to predict the clinically relevant outcome that we are trying to control or prevent. Responsiveness describes how much the test changes in response to an intervention relative to background random variation. Detectability of long-term change describes the size of changes in the test over the long term relative to background random variation. Practicality describes the ease of use, invasiveness, and cost of the test. Test performance generally requires longitudinal data from trial and/or cohort studies using statistical methods such as those discussed.

Conclusion: Four specific criteria can help clinicians inform evidence-based decisions on which monitoring test to use. © 2014 Elsevier Inc. All rights reserved.

Keywords: Chronic disease; Disease management; Diagnostic tests; Biological markers; Reproducibility of results; Statistical models

1. Introduction

Monitoring is an important element of a patient's long-term care. Consider a 68-year-old patient with early stage chronic kidney disease who has an elevated clinic blood pressure (BP) measurement. A good clinician will wish to do a series of BP measurements to establish a baseline, assess the long-term BP, and assess the need for intervention. If BP treatment is required, then the clinician will wish to monitor the response to treatment to check if it has been adequate and monitor for adverse effects such as hyperkalemia. The patient will need to be monitored long term for not only BP changes but also further decline in renal function.

As this scenario illustrates, clinicians monitor their patients for many reasons: to assess the progress of disease, the need to start or change treatment, the adequacy of treatment, and the development of complications [1]. Monitoring may be summarized as consisting of three important phases: (1) pretreatment monitoring, (2) initial response monitoring, and (3) on-treatment long-term monitoring. These are illustrated in Fig. 1, which shows a theoretical trajectory of a monitoring test over time. Pretreatment monitoring is done to screen individuals for the need to start treatment. As these individuals are usually at lower risk of adverse clinical outcomes from disease, monitoring intervals are often longer during this phase than subsequent ones. Initial response monitoring is done over a relatively short time to assess whether the individual's response to treatment is the same as expected from mean response observed in trials. On-treatment long-term monitoring is done after initial treatment is stabilized to assess whether treatment remains adequate over the long term.
Further clinical intervention may be indicated, for example, because of problems with adherence, lifestyle changes that modify the effect of treatment, or natural progression of the disease and the development of complications.

Conflict of interest: The authors have no potential conflicts of interest to declare.
Funding sources: The authors have received funding from the Australian National Health and Medical Research Council (Program Grant No. 633003, Early Career Fellowship No. APP1013390). The funders had no role in design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript.
* Corresponding author. Tel.: +61-293515994; fax: +61-293515049.
E-mail address: [email protected] (K.J.L. Bell).
0895-4356/$ - see front matter © 2014 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jclinepi.2013.07.015
Journal of Clinical Epidemiology 67 (2014) 152–159

What is new?

Key findings
Monitoring in clinical practice occurs in three main phases: before treatment, response to treatment, and long-term monitoring. Four important criteria may be used to choose the best test for monitoring a patient in each of these phases: clinical validity, responsiveness, long-term change, and practicality. The performance of tests on these criteria may be informed from trial and/or cohort data.

What this adds to what was known?
Evidence can be used to compare different tests' performance across these key criteria to judge which test is best for monitoring patients at each phase of clinical management.

What is the implication and what should change now?
Using methods we describe in this article, clinicians may make evidence-based decisions on which monitoring test to use.

Methods have been described to help choose which tests or markers are best for diagnosis, predicting risk, and treatment decisions [2–4], but there are no accepted approaches for choosing which tests should be used for clinical monitoring. However, just as with diagnosis, there are often a number of different tests available for the clinician to choose from. For example, we might consider the clinical population of patients at high risk for cardiovascular disease (CVD). Should we monitor total cholesterol or low-density lipoprotein (LDL) cholesterol when deciding on the need for/adequacy of statin treatment? Or is there another lipid test that is better to monitor? Should we measure BP in the clinic, or via a 24-hour ambulatory device, or teach patients how to do it themselves at home? Similarly, for patients with osteoporosis, should we monitor bone turnover markers (and if so, which one?) or bone density (which can take several years to change)? If so, how often? After considering key criteria that may be used to help choose the best monitoring test, we will revisit some of these clinical situations at the end of the article.

2. Criteria for good monitoring measurements

To help choose between monitoring tests, a number of criteria may be used [5]. These include (1) clinical validity, (2) responsiveness, (3) detectability of long-term change, and (4) practicality. Clinical validity describes the ability of the test to predict the clinically relevant outcome that we are trying to prevent. For example, with both cholesterol and BP tests, the clinical outcomes we are interested in are cardiovascular events such as myocardial infarction and stroke. Responsiveness describes how much the test changes in response to an intervention relative to background random variation. Detectability of long-term change describes the size of changes in the test over the long term relative to background random variation. Practicality describes the ease of the test in terms of invasiveness, cost, and straightforwardness. This fourth criterion, although important, poses few theoretical questions for evaluation and is included for completeness but will not be considered further here.

These criteria may be considered within the context of the pathway that leads from the prescription of treatment to the outcome that the treatment aims to produce, illustrated in Fig. 2. The top section of Fig. 2 shows the natural history of disease from risk factors to the outcome via intermediate pathology, both early and late. Below this is a pathway that starts with prescription of a drug (or another type of intervention) that aims to alter that natural history of disease and decrease the patient's risk of the clinical outcome. The pathway suggests that choices for monitoring will include the treatment (compliance or blood levels), a proximal consequence of treatment (change in BP, cholesterol, or other marker), or the disease itself (symptoms or signs).

This treatment–risk factor–disease pathway may help us understand the choice of test in the different monitoring phases: before treatment, initial response, and long term. We often may choose the same test for each of the three monitoring phases for ease of comparison over time. However, there is often a trade-off between criteria; for example, as shown in Fig. 2, the most valid test is often not the most responsive one. This means that in some cases, a different test may be chosen for different monitoring phases: for example, a more responsive test for initial response and a more valid test for pretreatment and long term.

3. The three measurement criteria

We now consider each of the specific criteria, as applied to monitoring tests with continuous outcomes, and then apply these to analyze the choice among several options for monitoring of lipid- and BP-lowering treatments.

3.1. Clinical validity: how well do the tests predict patient outcomes?

A test has clinical validity if it predicts the clinical outcome of interest. It must be on the risk factor → outcome pathway, either directly or as a proxy for another marker on the pathway that is less easily measured. As Fig. 2 shows, the later the test is on the pathway (closer to outcome), the more predictive it is likely to be, with the most valid test being a measure of the outcome itself. For example, with our aforementioned 68-year-old patient, following treatment, we might monitor compliance by pill counts, BP, atherosclerotic changes (such as renal stenosis), signs of kidney damage (such as proteinuria), or renal function. The measures along this path have progressively greater importance but take progressively longer for changes to become apparent.

Fig. 1. Phases of monitoring. Adapted from Ref. [1].

Fig. 2. Flow diagram of disease and the effects of treatment. Adapted from Ref. [2]. CHD, coronary heart disease.

Hazard ratios (HRs) are the most obvious statistical measure for capturing how well tests predict clinical outcomes. In their natural form, however, they are of limited use for comparison across tests, because the HR depends on both the units of measurement (which are test specific) and the distributional range of test results. To compare the predictiveness of different tests, a standardized measure may be used, such as the HR over the interquartile range (HR/IQR) or the HR per standard deviation (SD) increase in the test (HR/SD). Using the IQR to standardize relies less on the distributional assumptions of normality that apply when using the SD and thus may be preferred in many instances. However, neither measure is robust against very skewed data, where a transformation may be required before calculating predictiveness measures. Alternative methods of standardization include calculating the HR for the top quartile compared with the bottom quartile [6] (or decile, quintile, or similar). Although this usually gives results with a similar interpretation to standardizing by the IQR (or interdecile range, etc.), less information is used. Instead of estimating risk from the complete data and then calculating the ratio from two points (HR/IQR), the average risk from the top quartile is compared with that from the bottom quartile (HR for top vs. bottom quartile). The HR comparing top with bottom quartile is also likely to be more sensitive to distributional assumptions than HR/IQR.
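Below is a minimal sketch of the standardization just described. It assumes a log-linear (Cox-type) relationship, so that a test's effect is summarized by a single log-HR per unit. Under that assumption, HR/IQR and HR/SD are obtained by scaling the coefficient by the IQR or SD of the test results. The coefficient, the cholesterol values, and the function names are all hypothetical. The conversion factor of about 38.67 mg/dL per mmol/L is the usual one for cholesterol. The usage lines show that re-expressing the same test in different units changes the per-unit HR but leaves both standardized HRs unchanged.

```python
import math
import statistics


def hr_per_iqr(log_hr_per_unit, values):
    """HR for an increase equal to the interquartile range of the test values.

    Assumes a log-linear relation: HR for a change d = exp(log_hr_per_unit * d).
    """
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return math.exp(log_hr_per_unit * (q3 - q1))


def hr_per_sd(log_hr_per_unit, values):
    """HR for a one-SD increase in the test (leans more on approximate normality)."""
    return math.exp(log_hr_per_unit * statistics.stdev(values))


# Hypothetical total cholesterol values (mmol/L) and a hypothetical log-HR per mmol/L.
chol_mmol = [4.1, 4.6, 4.9, 5.2, 5.5, 5.8, 6.1, 6.6, 7.2, 7.9]
beta_mmol = math.log(1.25)

# The same test expressed in mg/dL: values scale up, the per-unit coefficient scales down.
MG_PER_MMOL = 38.67
chol_mgdl = [v * MG_PER_MMOL for v in chol_mmol]
beta_mgdl = beta_mmol / MG_PER_MMOL

print(round(math.exp(beta_mmol), 4), round(math.exp(beta_mgdl), 4))  # per-unit HRs differ
print(round(hr_per_iqr(beta_mmol, chol_mmol), 4), round(hr_per_iqr(beta_mgdl, chol_mgdl), 4))
print(round(hr_per_sd(beta_mmol, chol_mmol), 4), round(hr_per_sd(beta_mgdl, chol_mgdl), 4))
```

A top-versus-bottom-quartile HR would instead require refitting the outcome model on a categorized test. This is why it uses less of the information in the data than scaling a single fitted coefficient.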

To calculate our preferred standardized measure of predictiveness, we first need to choose the patient population that we will use for the data. This may include patients on no treatment (or placebo), patients on active treatment, or both patients on and off treatment.

In some cases, there is evidence that the relationships between monitoring tests and clinical outcome are similar whether patients are on treatment or not. For example, in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) trial, total cholesterol and LDL cholesterol had a similar ability to predict CVD whether the patient was on pravastatin treatment or placebo, after adjustment for other nonlipid risk factors and measurement error [7]. In this case, the same data may be used to compare validity for pretreatment and on-treatment monitoring, and these data may come from individuals on placebo/no treatment or individuals on treatment. If data are from a randomized controlled trial (RCT), it may be better to include individuals from both placebo and treatment groups. This helps ensure a wide range of values for the monitoring test and increases the statistical precision because of the larger data set.

In other cases, the predictiveness of the monitoring test may differ depending on whether the patient is on treatment. For example, in the Framingham study, the relationship between BP and CVD appeared stronger if the patient was on BP-lowering treatment than if they were not [8]. In this case, it may be best to use data from patients not on treatment to decide on the clinical validity of a test for pretreatment monitoring and data from patients on treatment to decide on the clinical validity of a test for on-treatment monitoring. A caveat to this is that in the first couple of years of on-treatment monitoring, data from off-treatment populations may be more informative. This is because in many cases of chronic disease, the effect of the treatment on the monitoring test (e.g., drop in BP) is relatively fast but the effect on the clinical outcome (e.g., CVD) may take a few years to become apparent, and pretreatment levels of the monitoring test may be more predictive during this time. This lag time phenomenon is evident in trial data for a number of chronic diseases, including treatments to lower lipids [9] and BP [10]. In each case, the difference in monitoring test levels between treatment and placebo is maximal after a few months, but the difference in CVD events between the two treatment groups continues to diverge for a number of years.

Other methods of assessing validity such as proportion of treatment explained (PTE) also assess the responsiveness of the monitoring test and are discussed in the following.

3.2. Responsiveness: how clearly and rapidly do the tests change with treatment change?

The responsiveness criterion is especially important for the initial response phase of monitoring soon after a new treatment has been started. Although less obvious, this criterion is also important for both pretreatment and long-term monitoring. For all monitoring phases, we ideally want the test to be responsive to interventions that alter the patient's risk of the clinical outcome. Such interventions may be lifestyle changes in the pretreatment screening phase, pharmacologic treatments in the initial response phase, or measures to improve adherence in the long-term monitoring phase.

Responsiveness describes the amount the test changes from the expected trajectory in response to an intervention. It is dependent on the intervention and where the test is placed in the risk factor → outcome pathway. In general, responsive tests tend to be reversible and are placed earlier in the risk factor → outcome pathway, as illustrated in Fig. 2. For example, lipids and BP normalize in response to lipid- or BP-lowering therapy. However, this is not always the case; sometimes, responsive tests may be less reversible or even irreversible and are placed later in the risk factor → outcome pathway. For example, we may observe a lesser decline in bone mineral density (BMD) for a postmenopausal woman started on bisphosphonate therapy or a slowing of visual field loss for a glaucoma patient started on therapy to lower intraocular pressure.

Related to the concept of responsiveness is the speed of change in response to an intervention. We often want tests that show a rapid response to an intervention. This is obviously a necessity when the change in outcome in response to the intervention is also rapid, for example, risk of hypoglycemia for glucose-lowering drugs (monitor glucose) or bleeding risk for patients on warfarin (monitor international normalized ratio). In other situations in which the change in outcome is much slower, we still often want tests that show a rapid response so that we may quickly judge whether treatment is working as expected, for example, risk of a cardiovascular event (monitor cholesterol and BP).

Not all responsive tests show rapid changes in response to the intervention; in fact, some take months or years to change, for example, HbA1c, left ventricular function, and BMD. Because changes in these tests reflect average treatment effects over a longer period of time, these tests may be preferred for judging effects over the medium to long term.

The concepts of signal and noise are relevant to both response monitoring and long-term monitoring, which follow on from this section. For response monitoring, signal includes both mean change and between-person variation in response. If the between-person variation component of the signal is small, then we may estimate the signal for an individual using the population mean change without needing to monitor (see Refs. [11–13] for examples). If the between-person variation is not small, we are unable to estimate signal on the basis of population mean change alone. We will also need to estimate the individual's true deviation from the mean change, and this is best done where there is a favorable signal-to-noise ratio.
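The last point can be illustrated with a standard random-effects (best linear unbiased prediction) shrinkage formula. This is our own illustration, not a method given in the article. An individual's observed change is pulled toward the population mean change by a weight equal to the share of the observed variance of change that is true between-person variation. The function name and the numbers are hypothetical; the 3% share echoes the systolic BP example discussed later in this section. Normal, constant-variance noise is assumed.

```python
def predicted_true_response(observed_change, mean_change, var_between, var_noise_change):
    """Shrink an individual's observed change toward the population mean change.

    var_between: true between-person variance in response (part of the signal).
    var_noise_change: variance of change due to noise alone (e.g. from a placebo arm).
    """
    # Share of observed variance that is signal: near 0 means "trust the mean change",
    # near 1 means "trust the individual's own observed change".
    reliability = var_between / (var_between + var_noise_change)
    return mean_change + reliability * (observed_change - mean_change)


# Hypothetical systolic BP example (mm Hg): mean response -6.5, and only 3% of the
# observed variance of change is true between-person variation.
observed = -20.0
print(predicted_true_response(observed, -6.5, var_between=3.0, var_noise_change=97.0))

# With a favorable signal-to-noise ratio, the observed change is trusted much more.
print(predicted_true_response(observed, -6.5, var_between=97.0, var_noise_change=3.0))
```

With a 3% share, an observed fall of 20 mm Hg moves the estimate only slightly away from the population mean change. This is the situation in which monitoring adds little beyond the trial average.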

Noise is a result of background random variation within individuals because of measurement error and biological fluctuations. The amount of noise in monitoring tests may not be appreciated by clinicians, and variations due to noise may be wrongly attributed to real change. For example, the SD for random variation in one measurement of systolic blood pressure is approximately 10 mm Hg [12–14]. The SD of the difference between two such measurements is therefore about √2 × 10 ≈ 14 mm Hg. This means that a change of 20 mm Hg between two measurements will often be because of background noise alone, without any real underlying change. We may estimate noise using simple methods such as halving the observed variance of the difference between two measurements made within a short interval [14] (or, for the noise of change, we may simply use the observed variance of the difference between the two measurements). More sophisticated methods include variograms [14–16] and using the residual estimate from mixed models [11–13,16]. A simple method of decreasing noise is to take the mean of multiple measurements. For example, for the BP of our 68-year-old patient, it would be common to use the average of several sets of home measurements.
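The simple noise calculations above can be sketched as follows. The sketch assumes normally distributed noise with constant variance and no true change between the paired measurements. The paired readings are hypothetical, and the function names are ours. The 10 mm Hg SD is the figure quoted above for systolic BP. The second function gives the probability that noise alone produces an apparent change at least as large as a threshold. It also shows how averaging several readings on each occasion reduces that probability.

```python
import math
import statistics


def noise_variance_from_pairs(first, second):
    """Within-person noise variance: half the variance of paired short-interval differences.

    Var(second - first) = 2 * noise variance when no true change occurs between the pair.
    """
    diffs = [b - a for a, b in zip(first, second)]
    return statistics.variance(diffs) / 2


def prob_noise_change_at_least(threshold, noise_sd, readings_per_occasion=1):
    """P(|apparent change| >= threshold) when there is no true change.

    Averaging k readings divides the noise SD by sqrt(k); the difference of two
    independent occasions then has SD sqrt(2) times that.
    """
    sd_of_change = math.sqrt(2) * noise_sd / math.sqrt(readings_per_occasion)
    # Two-sided normal tail: 2 * (1 - Phi(t / sd)) == erfc(t / (sd * sqrt(2)))
    return math.erfc(threshold / (sd_of_change * math.sqrt(2)))


# Hypothetical paired systolic BP readings (mm Hg) taken a few days apart.
day1 = [132, 141, 128, 150, 137, 145]
day2 = [138, 133, 131, 143, 140, 152]
print(round(math.sqrt(noise_variance_from_pairs(day1, day2)), 1))

# Using the ~10 mm Hg SD quoted above: how often does noise alone give a 20 mm Hg change?
print(round(prob_noise_change_at_least(20, 10), 3))  # ≈ 0.157
print(round(prob_noise_change_at_least(20, 10, readings_per_occasion=4), 4))  # ≈ 0.0047
```

Under these assumptions, a 20 mm Hg change between single readings arises from noise alone about 16% of the time. Averaging four readings on each occasion reduces this to about 0.5%.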

We may estimate response-monitoring signal using data from placebo-controlled randomized trials (these are usually drug trials but theoretically could include behavioral change or other nondrug interventions). Just as the true mean response is commonly estimated by comparing the difference in mean changes seen in the active and placebo arms before and after treatment, the true between-person variation in response may be estimated by comparing the difference in variation in changes. Variation in change in the active group results from both true between-person variation and noise, whereas variation in the placebo group results from just noise; thus, we may estimate true between-person variation from the difference in variation between the two groups.
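A small simulation can make the variance-difference argument concrete. It is a sketch under stated assumptions: normally distributed true responses and noise, constant noise variance, and equal noise in both arms. The function names and all parameter values are hypothetical, loosely chosen to resemble the cholesterol example that follows. Each change is computed within a person, so the person's underlying baseline level cancels and is not simulated.

```python
import math
import random
import statistics


def simulate_changes(n, noise_sd, mean_response=0.0, between_sd=0.0, seed=0):
    """Simulated before-to-after changes in one trial arm.

    Each person has a true response drawn from N(mean_response, between_sd**2);
    each of the two measurements adds independent noise with SD noise_sd.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        true_response = rng.gauss(mean_response, between_sd)
        before = rng.gauss(0.0, noise_sd)
        after = true_response + rng.gauss(0.0, noise_sd)
        changes.append(after - before)
    return changes


def estimate_response_signal(active_changes, placebo_changes):
    """Mean response and true between-person variance in response.

    Active arm: variance of change = between-person variance + noise.
    Placebo arm: variance of change = noise only.
    """
    mean_response = statistics.fmean(active_changes) - statistics.fmean(placebo_changes)
    between_var = statistics.variance(active_changes) - statistics.variance(placebo_changes)
    # A negative difference can arise from sampling error alone; truncate at zero.
    return mean_response, max(0.0, between_var)


# Hypothetical: per-measurement noise chosen so the variance of change is 0.4225.
noise_sd = math.sqrt(0.4225 / 2)
placebo = simulate_changes(20000, noise_sd, seed=1)
active = simulate_changes(20000, noise_sd, mean_response=-1.18, between_sd=math.sqrt(0.14), seed=2)

mean_resp, between_var = estimate_response_signal(active, placebo)
print(round(mean_resp, 2), round(between_var, 3))  # compare with -1.18 and 0.14
```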

If only summary data are available, it may be possible to estimate response-monitoring signal using a direct method [17]. For example, Fig. 3 shows the distributions of change in cholesterol concentration after 6 months found in the LIPID trial [9]. The placebo group had a mean increase in total cholesterol of 0.02 mmol/L (variance of change 0.4225 mmol²/L²), and the pravastatin group had a mean decrease in total cholesterol of 1.16 mmol/L (variance of change 0.5625 mmol²/L²). We estimated the mean response to treatment as a decrease in cholesterol concentration of 1.18 mmol/L (0.02 − (−1.16) mmol/L) and the between-person variance of response as 0.14 mmol²/L² (0.5625 − 0.4225 mmol²/L²). Noise may be estimated from the placebo group: SD of change 0.65 mmol/L (variance of change 0.4225 mmol²/L²). Although this method outlines the conceptual framework for estimating response-monitoring signal and noise, in practice, it relies on the distributional assumptions of normality and constant variance. Often, it is difficult to test these assumptions directly, and other data may suggest that they are unlikely to hold (e.g., where measurement error is known to be proportional to level). For these reasons, the modeling approach described next is preferred wherever possible (with a transformation applied to data if necessary).

Fig. 3. Average and variation in change in cholesterol 6 months after starting 40-mg pravastatin. Adapted from Ref. [14].

If individual patient data are available, mixed models may be used to estimate mean and between-person variation in response, as well as noise (e.g., see Refs. [11–13,18]). For example, Fig. 4 shows the distribution of apparent and true changes in BP after starting an angiotensin-converting enzyme (ACE) inhibitor, estimated from a meta-analysis of mixed models from seven placebo-controlled RCTs of ACE inhibitors [13]. The curves give an indication of the signal-to-noise ratio, and in this case, the ratio is rather low (a fact that may not be widely appreciated by clinicians who use clinic BP to monitor response to BP-lowering treatments). The proportion of the observed variance of change in systolic BP after starting treatment that was actually because of true between-person variance in the response to ACE inhibitors (part of the signal) was estimated at only 3%. In contrast, the proportion of observed change that was because of measurement variability and random day-to-day fluctuations (the noise) was estimated at 97%.

These percentages do not account for mean change in systolic BP after starting treatment, which is also a component of the signal; even so, the signal-to-noise ratio is very poor and monitoring systolic BP is unlikely to be helpful in judging response to treatment. For example, we estimated that if treatment had a mean systolic BP-lowering effect of 6.5 mm Hg, it would be necessary to average >90 measurement occasions both before and after starting treatment to be 95% certain that an apparent decrease of >4 mm Hg in systolic BP indicated a true decrease of >4 mm Hg (i.e., to be certain that treatment is having a substantial effect).

Fig. 4. Distribution of apparent and true changes in systolic BP (SBP) after starting an ACE inhibitor (based on meta-analytic estimates from seven RCTs; see Ref. [10]). The pair of normal distribution curves was constructed to have the same height, and the area under each curve is proportional to the SD of change in BP. Only 3% of the apparent variance in systolic BP change is because of true between-person variance in treatment effects [proportion is calculated from between-person variance in treatment effects/(between-person variance in treatment effects + within-person variation in BP change)]. Adapted from Ref. [10].
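The direct method applied to the LIPID summary data can be reproduced with simple arithmetic. The sketch below uses only the four summary numbers quoted above; the function name is ours. The last quantity is the share of the active arm's observed variance of change that is true between-person variation. It is our own arithmetic from those numbers, analogous to the 3% reported for systolic BP, and is not a figure reported for LIPID. The calculation inherits the normality and constant-variance assumptions noted above.

```python
import math


def direct_method(placebo_mean_change, placebo_var_change, active_mean_change, active_var_change):
    """Estimate response-monitoring signal and noise from two-arm summary data."""
    between_var = active_var_change - placebo_var_change
    return {
        "mean_response": active_mean_change - placebo_mean_change,
        "between_person_var": between_var,
        # Noise of change comes from the placebo arm alone.
        "noise_sd_of_change": math.sqrt(placebo_var_change),
        # Proportion of the active arm's observed variance of change that is true signal.
        "share_true_between": between_var / active_var_change,
    }


# LIPID, total cholesterol change at 6 months (mmol/L; variances in mmol^2/L^2).
lipid = direct_method(
    placebo_mean_change=0.02, placebo_var_change=0.4225,
    active_mean_change=-1.16, active_var_change=0.5625,
)
for name, value in lipid.items():
    print(f"{name}: {value:.3f}")
```

This reproduces the values given in the text: a mean response of −1.18 mmol/L, a between-person variance of 0.14 mmol²/L², and a noise SD of change of 0.65 mmol/L. Roughly a quarter (0.14/0.5625 ≈ 0.25) of the observed variance of change in the pravastatin arm would be true between-person variation.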

When there are insufficient data (neither individual patient nor summary data available; data analyzed in 2011) to estimate true variation in response, the largest probable variation in true response may be estimated using the formula: 1 SD of between-person variation in treatment effects = half the mean treatment effect (see Ref. [19]), or mean proportional change divided by coefficient of variation (Glasziou et al., unpublished data). For example, the mean treatment effect of tumor necrosis factor inhibitors on tender joint count among patients with rheumatoid arthritis has been reported as a decrease of 10.5 units [20,21]. Assuming that the minimum treatment effect is greater than zero (so that the mean effect lies at least about 2 SDs above zero), this means that the largest probable SD of variation in response to treatment is 5.3 units (i.e., half the mean effect). The SD for within-person variation for change in tender joint count (i.e., noise) was estimated at 5.7 [22]. From these estimates, we concluded that using tender joint count we are unlikely to be able to disaggregate true treatment effects from the background day-to-day within-person variation.

Responsiveness is also indirectly assessed using the PTE [23] and related methods. These statistical methods were developed to assess the validity of a surrogate outcome in the setting of an RCT of a treatment. PTE aims to estimate how much of the treatment effect on clinical outcome is explained by the treatment effect on the surrogate; this includes both how much the surrogate predicts the clinical outcome and how responsive the surrogate is to treatment. Estimates from the LIPID trial found that most, or all, of the effect pravastatin had on reducing cardiovascular events could be attributed to the combined effects on total cholesterol and high-density lipoprotein (HDL). The estimated PTE for each of the single lipid parameters ranged from 8% (triglycerides) to 87% (apolipoprotein B) [7].

3.3. Detection of long-term change: how well do the tests distinguish long-term true change from random variation?

Long-term change describes the ability of the test to discern true long-term changes in the patient's condition (signal) from short-term measurement variability (noise). This criterion is relevant to pretreatment screening and long-term on-treatment monitoring. Although not directly relevant to initial response monitoring (which is done over the short term), we might still consider it in this setting if, for consistency, we want to choose one test to use for all three monitoring phases.

The signal for long-term change monitoring is the true long-term trend in level within an individual over time. This is a combination of the population mean change and the between-person variability, or individual deviations from the mean change. Noise is the same as for response monitoring: short-term random variation in level within an individual. In response monitoring, the between-person variation component of the signal is often small, rendering monitoring unnecessary. In long-term monitoring, by contrast, there is usually substantial between-person variation in the long-term trends (see Refs. [14,16,24] for examples). This means that we are unable to estimate signal on the basis of population mean change alone and need to also estimate the individual's true deviation from the mean change under conditions of a favorable signal-to-noise ratio.

Signal and noise may be estimated by methods analogous to those used for response monitoring. Between-person variation in long-term trends over time may be estimated by simple methods (e.g., subtracting twice the short-term random variation from the total observed variation of the difference from the baseline to each subsequent time point [14,16]) or through modeling [mixed models, which may include random slopes (trends over time) [11,14,16]].

Another issue to consider is that variance may be dependent on level, and the distribution curves may be non-normal. For example, it is well accepted that the noise component of variance may be level dependent [25]. Other components of total variance (such as between-person variation in baseline levels and trajectories over time) may also be level dependent. This will often result in the residuals and/or random-effects distributions being non-normally distributed. Often a transformation such as natural log may solve both problems and should be done before calculation of the signal-to-noise ratio (for examples, see Refs. [14,18]).

4. Some clinical examples

We applied the aforementioned framework to the clinical situations of monitoring patients at high risk for CVD

[Table fragment; column headings not recovered]
LDL cholesterol                          6a    2   6
HDL cholesterol                          54a   6   3
Total-to-HDL cholesterol ratio           3a    4   1
LDL-to-HDL cholesterol ratio             2a    1   2
Non-HDL cholesterol                      54a   3   4
Estimated absolute cardiovascular risk

    1 ? ?

    Abbreviations: LDL, low-density lipoprotein; HDL, high-density li-poprotein; HR, hazard ratio; IQR, interquartile range; LIPID, Long-Term Intervention with Pravastatin in Ischemic Disease.5 indicates that two tests tied on rankings.a Based on HR/IQR for change in lipid for predicting coronary

    heart disease in the LIPID trial [9].(lipids and BP). For the purposes of illustration, we haveused ranking to compare the tests for each of the criteria.An alternative approach would be to present the actualquantitative measures for each criterion. Although datafrom a single study are used as illustration for both lipidsand blood pressure, ideally we would use data from severalsources or from a systematic review.

    The results for lipids are shown in Table 1. Overall, thebest lipid marker for monitoring appears to be LDL-to-HDL ratio, although other markers that combined two testsalso ranked highly: total-to-HDL cholesterol ratio and non-HDL cholesterol.

    The results for BP measurements are shown in Table 2.All three tests (clinic, ambulatory, and home BP) aim tomeasure the same risk factor: the individuals BP. In thiscase, the differences in ranking for validity are becauseout-of-office measurement of BP offers a more accurate re-flection of the individuals usual BP. This is because of botha more natural environment where measurement takesplace (avoidance of white coat hypertension) and reduc-tion in noise (measurement variability) through averaginga larger number of measurements. In addition, the measuresof variability possible with out-of-office measurement havebeen found to be predictive of CVD independent of meanBP level; this also adds to the superior predictiveness of

    Table 1. Ranking of lipid-monitoring tests

    Lipid Validity ResponsivenessLong-termchange

    Total cholesterol 7a 5 5

  • 158 K.J.L. Bell et al. / Journal of Clinical Epidemiology 67 (2014) 152e159home and ambulatory BP (which perform similarly on thecriteria of validity, responsiveness, and long-term change),the clinician and patient may opt for home BP measure-ment because of its ease of application.

    The criteria that we describe evaluate monitoring fortests, assessed on a continuum. We do this for several rea-sons: many common monitoring tests are reported as con-tinuous; separating signal from noise is most easilydone using continuous measures; and maintaining the con-tinuity of measurements will increase statistical power [27].However, sometimes, changes observed in continuous testsduring monitoring may then be dichotomized or split intomore than two categories, by comparing the level to deci-these types of BP measurement. Both types of out-of-officeBP also outrank clinic BP for both responsiveness and long-term change primarily through the reduction in noise that ispossible with these types of BP measurement.

    Cardiovascular risk estimation (using the Framingham[8] or similar equation) outperforms both lipids and BP interms of validity. Currently, this is only monitored in thepretreatment population to decide who should be startedon lipid- and BP-lowering treatments [26], but theoretically,it could be used for monitoring patients on treatment also.How this test performs on measures of responsiveness andlong-term change is currently unknown.

    5. Discussion and conclusions

    The choice of the best test to monitor a patient may bemade by applying the methods described previously toavailable evidence and then ranking tests in terms of valid-ity, responsiveness, and long-term change. Although notdiscussed in this article, the fourth criterion of practicalityis also important. For example, when choosing between

    Table 2. Ranking of BP-monitoring tests

    BP test Validity ResponsivenessLong-termchange

    Clinic BP 4 3 3Ambulatory BP 52a 51 51Home BP 52a 51 51Estimated absolutecardiovascular risk

    1 ? ?

    Abbreviation: BP, blood pressure.a Based on risk of cardiovascular events in the PAMELA

    study [28].sion thresholds to decide whether there is a need for alter-ing management or early retesting.

    Unless one test dominates on all three criteria, and onpracticality, the choice of test will involve some trade-offbetween the criteria. Depending on the clinical circum-stances, clinicians may give more weight to one of thesecriteria than the others, although validity will usually bethe most important. By making an evidence-based choiceon the monitoring test to use, we can expect better clinicaloutcomes, higher patient and clinician satisfaction, and lesswaste of scarce health resources.
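The half-the-mean rule used in the tender joint count example of Section 3.2 can be motivated by a short argument. The sketch below assumes, as an illustration rather than as the derivation given in Ref. [19], that true individual treatment effects θ_i are approximately normally distributed with mean μ (the mean benefit) and SD σ_θ, so that about 95% of them lie within two SDs of the mean. Requiring the lower end of that range to stay above zero (every patient benefits) then bounds σ_θ:

```latex
\begin{aligned}
\theta_i &\sim N\!\left(\mu,\ \sigma_\theta^{2}\right), \qquad \mu > 0 \quad \text{(mean benefit, e.g., fall in tender joint count)} \\
\mu - 2\sigma_\theta &> 0 \quad \text{(nearly all true effects are beneficial)} \\
\Longrightarrow \quad \sigma_\theta &< \frac{\mu}{2}, \qquad \text{e.g. } \mu = 10.5 \ \Rightarrow\ \sigma_\theta < 5.25 \approx 5.3 .
\end{aligned}
```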
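The comparison behind the tender joint count conclusion in Section 3.2 can be written as a few lines of Python. This is a minimal sketch: the function names are hypothetical, the signal-to-noise ratio is taken here as a ratio of SDs (squaring it gives the corresponding variance ratio), and the half-the-mean bound inherits the assumption that every patient's true effect is above zero.

```python
def largest_probable_response_sd(mean_effect: float) -> float:
    """Upper bound on the between-person SD of true treatment effects:
    half the mean effect (assumes all true effects are beneficial)."""
    return abs(mean_effect) / 2.0


def signal_to_noise(signal_sd: float, noise_sd: float) -> float:
    """Ratio of the between-person SD of true response to the within-person noise SD."""
    return signal_sd / noise_sd


MEAN_DECREASE = 10.5  # mean fall in tender joint count with TNF inhibitors [20,21]
NOISE_SD = 5.7        # within-person SD of change in tender joint count [22]

signal_sd = largest_probable_response_sd(MEAN_DECREASE)
ratio = signal_to_noise(signal_sd, NOISE_SD)

print(f"largest probable signal SD: {signal_sd:.2f}")
print(f"noise SD: {NOISE_SD:.2f}")
print(f"signal-to-noise ratio (SD scale): {ratio:.2f}")
```

With these inputs the largest probable signal SD is 5.25 (5.3 after rounding, as in the text) against a noise SD of 5.7, a ratio of about 0.92. Because the signal figure is an upper bound, the true ratio can only be smaller.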
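The PTE approach discussed in Section 3.2 can be illustrated with simulated trial data. The sketch below is not the analysis used in the LIPID trial [7]; it follows the Freedman-style definition PTE = 1 − β_adjusted / β_unadjusted, where β is the treatment coefficient from a model of the outcome without and with the surrogate. To stay self-contained it uses ordinary least squares on a continuous outcome (via NumPy) instead of the survival or logistic models a trial of clinical events would need; all names and effect sizes are made up.

```python
import numpy as np


def ols_coef(y, *columns):
    """Least-squares coefficients for y ~ intercept + columns (intercept first)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def proportion_treatment_effect_explained(y, treat, surrogate):
    """PTE = 1 - (treatment effect adjusted for the surrogate) / (unadjusted effect)."""
    beta_unadjusted = ols_coef(y, treat)[1]
    beta_adjusted = ols_coef(y, treat, surrogate)[1]
    return 1.0 - beta_adjusted / beta_unadjusted


rng = np.random.default_rng(2013)
n = 20_000
treat = rng.integers(0, 2, size=n).astype(float)  # 1 = active drug, 0 = placebo (simulated)
surrogate = -1.0 * treat + rng.normal(size=n)      # e.g., change in a lipid level (simulated)

# Scenario A: the treatment affects the outcome only through the surrogate.
y_full = 2.0 * surrogate + rng.normal(size=n)
# Scenario B: half of the total treatment effect (-4) bypasses the surrogate.
y_half = 2.0 * surrogate - 2.0 * treat + rng.normal(size=n)

pte_full = proportion_treatment_effect_explained(y_full, treat, surrogate)
pte_half = proportion_treatment_effect_explained(y_half, treat, surrogate)
print(f"PTE, effect entirely via surrogate: {pte_full:.2f}")
print(f"PTE, half of effect via surrogate: {pte_half:.2f}")
```

In the first scenario PTE should come out close to 1 and in the second close to 0.5. With real data, PTE estimates can be imprecise and are not constrained to lie between 0 and 1, so they are best read alongside their confidence intervals.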
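The simple subtraction method for long-term change in Section 3.3 relies on the fact that an observed change from baseline contains two independent measurement errors, so Var(observed change) = Var(true change) + 2 × within-person variance. The sketch below checks this on simulated data. The levels, SDs, and the helper name `true_change_variance` are illustrative assumptions; it also assumes measurement errors are independent of each other and of the true change, and in practice the within-person SD would come from short-term repeat measurements rather than being known.

```python
import numpy as np


def true_change_variance(observed_changes, within_person_sd):
    """Between-person variance of true change since baseline:
    Var(observed change) - 2 * within-person variance, floored at zero
    because sampling error can push the difference below zero."""
    total = np.var(observed_changes, ddof=1)
    return max(total - 2.0 * within_person_sd**2, 0.0)


rng = np.random.default_rng(42)
n_people = 50_000
WITHIN_SD = 0.5       # assumed short-term (noise) SD, arbitrary units
TRUE_CHANGE_SD = 0.3  # assumed between-person SD of true change at this follow-up
MEAN_CHANGE = 0.2     # assumed population mean change

true_baseline = rng.normal(5.0, 1.0, n_people)
true_followup = true_baseline + rng.normal(MEAN_CHANGE, TRUE_CHANGE_SD, n_people)
# Each observed value carries its own independent measurement error.
obs_baseline = true_baseline + rng.normal(0.0, WITHIN_SD, n_people)
obs_followup = true_followup + rng.normal(0.0, WITHIN_SD, n_people)

changes = obs_followup - obs_baseline
# Expected Var(changes) = 0.3**2 + 2 * 0.5**2 = 0.59
est_sd = float(np.sqrt(true_change_variance(changes, WITHIN_SD)))
print(f"mean observed change: {changes.mean():.3f}")
print(f"variance of observed change: {np.var(changes, ddof=1):.3f}")
print(f"estimated SD of true change: {est_sd:.3f}")
print(f"signal-to-noise (SD ratio): {est_sd / WITHIN_SD:.2f}")
```

Repeating the calculation at each follow-up time gives the between-person spread of true trends as a function of time, which can then be compared with the within-person SD. Mixed models with random slopes [11,14,16] estimate the same quantity while using all of each person's measurements jointly.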
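The level dependence of noise mentioned at the end of Section 3.3 (see Ref. [25]) can be seen in a small simulation. The sketch assumes purely proportional (multiplicative) measurement error with a coefficient of variation of about 10%; the levels, sample size, and the helper name `within_person_sds` are arbitrary choices for illustration. On the raw scale the within-person SD grows in proportion to the level, whereas on the natural-log scale it stays roughly constant, which is why a log transformation is suggested before calculating the signal-to-noise ratio.

```python
import numpy as np


def within_person_sds(level, cv, n_pairs, rng):
    """Within-person SD on the raw and natural-log scales, estimated from duplicate
    measurements with multiplicative error of roughly `cv` (an assumption)."""
    m1 = level * np.exp(rng.normal(0.0, cv, n_pairs))
    m2 = level * np.exp(rng.normal(0.0, cv, n_pairs))
    # The SD of a difference of two independent errors is sqrt(2) x the single-error SD.
    raw_sd = np.std(m2 - m1, ddof=1) / np.sqrt(2)
    log_sd = np.std(np.log(m2) - np.log(m1), ddof=1) / np.sqrt(2)
    return raw_sd, log_sd


rng = np.random.default_rng(7)
sds = {level: within_person_sds(level, 0.10, 100_000, rng) for level in (2.0, 5.0, 10.0)}
for level, (raw_sd, log_sd) in sds.items():
    print(f"true level {level:4.1f}: raw-scale SD {raw_sd:.3f}, log-scale SD {log_sd:.3f}")
```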
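One simple way to combine the rankings in Table 1 into an overall ordering is an (optionally weighted) sum of ranks. This is an illustrative sketch, not a procedure the article states it used: ties are entered at their shared rank, estimated absolute cardiovascular risk is omitted because two of its ranks are unknown, and the `weights` parameter is a hypothetical way of giving validity extra weight, as the Discussion suggests clinicians usually will.

```python
# Ranks transcribed from Table 1: (validity, responsiveness, long-term change).
# Tied ranks ("=4") are entered as 4.
TABLE_1_RANKS = {
    "Total cholesterol": (7, 5, 5),
    "LDL cholesterol": (6, 2, 6),
    "HDL cholesterol": (4, 6, 3),
    "Total-to-HDL cholesterol ratio": (3, 4, 1),
    "LDL-to-HDL cholesterol ratio": (2, 1, 2),
    "Non-HDL cholesterol": (4, 3, 4),
}


def rank_sum_order(ranks, weights=(1.0, 1.0, 1.0)):
    """Order tests by a weighted sum of their ranks; a lower score is better."""
    scores = {
        test: sum(w * r for w, r in zip(weights, test_ranks))
        for test, test_ranks in ranks.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


for test, score in rank_sum_order(TABLE_1_RANKS):
    print(f"{score:5.1f}  {test}")
```

With equal weights the LDL-to-HDL cholesterol ratio scores lowest (5), followed by the total-to-HDL cholesterol ratio (8) and non-HDL cholesterol (11), matching the ordering described for Table 1; doubling the weight on validity does not change the leading marker.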
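The noise reduction from averaging out-of-office BP readings described in Section 4 follows the usual rule that the SD of a mean of n independent readings equals the single-reading SD divided by √n. The sketch below assumes independent, equally noisy readings, which overstates the gain because real repeated BP readings are correlated; the 10 mm Hg single-reading SD, the reading counts, and the helper name `sd_of_mean` are hypothetical, not values from PAMELA [28].

```python
import math


def sd_of_mean(single_reading_sd: float, n_readings: int) -> float:
    """Noise SD of the average of n readings, assuming independent errors with equal SD."""
    if n_readings < 1:
        raise ValueError("n_readings must be at least 1")
    return single_reading_sd / math.sqrt(n_readings)


SINGLE_READING_SD = 10.0  # hypothetical within-person SD of one systolic reading (mm Hg)

# Hypothetical scenarios: one clinic reading vs. progressively more home readings.
for n in (1, 2, 4, 28):
    print(f"{n:2d} reading(s): noise SD {sd_of_mean(SINGLE_READING_SD, n):.2f} mm Hg")
```

Because averaging leaves the signal unchanged, the signal-to-noise ratio improves by the same factor of √n under these assumptions.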

Acknowledgments

The authors thank Dr. Clement Loy and Dr. Rafael Perera-Salazar for their helpful comments on an earlier draft of this article.

References

[1] Glasziou PP, Irwig L, Mant D. Monitoring in chronic disease: a rational approach. BMJ 2005;330:644–8.
[2] Bossuyt PM, Irwig L, Craig J, Glasziou P. Comparative accuracy: assessing new tests against existing diagnostic pathways. [Review] [16 refs] [Erratum appears in BMJ 2006;332(7554):1368]. BMJ 2006;332:1089–92.
[3] Hlatky MA, Greenland P, Arnett DK, Ballantyne CM, Criqui MH, Elkind MS, et al. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. [Erratum appears in Circulation 2009;119(25):e606. Note: Hong, Yuling [added]]. Circulation 2009;119:2408–16.
[4] Janes H, Pepe M, Bossuyt P, Barlow W. Measuring the performance of markers for guiding treatment decisions. Ann Intern Med 2011;154:253–9.
[5] Irwig L, Glasziou PP. Choosing the best monitoring tests. In: Glasziou PP, Irwig L, Aronson JK, editors. Evidence-based medical monitoring: from principles to practice. Malden, MA: BMJ Books; 2008:63–74.
[6] Boekholdt SM, Arsenault BJ, Mora S, Pedersen TR, LaRosa JC, Nestel PJ, et al. Association of LDL cholesterol, non-HDL cholesterol, and apolipoprotein B levels with risk of cardiovascular events among patients treated with statins: a meta-analysis. JAMA 2012;307:1302–9.
[7] Simes RJ, Marschner IC, Hunt D, Colquhoun D, Sullivan D, Steward RAH, et al. Relationship between lipid levels and clinical outcomes in the Long-Term Intervention with Pravastatin in Ischemic Disease (LIPID) Trial: to what extent is the reduction in coronary events with pravastatin explained by on-study lipid levels? Circulation 2002;105:1162–9.
[8] D'Agostino RB, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 2008;117:743–53.
[9] The Long-Term Intervention with Pravastatin in Ischaemic Disease (LIPID) Study Group. Prevention of cardiovascular events and death with pravastatin in patients with coronary heart disease and a broad range of initial cholesterol levels. N Engl J Med 1998;339:1349–57.
[10] PROGRESS Collaboration Group. Randomised trial of a perindopril-based blood-pressure-lowering regimen among 6105 individuals with previous stroke or transient ischaemic attack. Lancet 2001;358:1033–41.
[11] Bell KJL, Hayen A, Macaskill P, Irwig L, Craig JC, Ensrud KEE, et al. Value of routine monitoring of bone mineral density after starting bisphosphonate treatment: secondary analysis of trial data. BMJ 2009;338:b2266.
[12] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Irwig L. Mixed models showed no need for initial response monitoring after starting anti-hypertensive therapy. J Clin Epidemiol 2009;62:650–9.
[13] Bell KJL, Hayen A, Macaskill P, Craig JC, Neal BC, Fox KM, et al. Monitoring initial response to angiotensin converting enzyme inhibitor based regimens: an individual patient data meta-analysis from randomised placebo controlled trials. Hypertension 2010;56:533–9.
[14] Keenan K, Hayen A, Neal B, Irwig L. Long term monitoring in patients receiving treatment to lower blood pressure: analysis of data from placebo controlled randomised controlled trial. BMJ 2009;338:b1492.

[15] Shepard D. Reliability of blood pressure measurements: implications for designing and evaluating programs to control hypertension. J Chronic Dis 1981;34:191–209.
[16] Glasziou PP, Irwig L, Heritier S, Simes J, Tonkin A, the LIPID Study Investigators. Monitoring cholesterol levels: measurement error or true change? Ann Intern Med 2008;148:656–61.
[17] Bell KJL, Irwig L, Craig JC, Macaskill P. Use of randomized trials to decide when to monitor response to new treatment. BMJ 2008;336:361–5.
[18] Bell K, Hayen A, Irwig L, Hochberg M, Ensrud K, Cummings S, et al. The potential value of monitoring bone turnover markers among women on alendronate. J Bone Miner Res 2012;27:195–201.
[19] Bell KJL, Irwig L, March L, Hayen A, Macaskill P, Craig JC. Should response rules be used to decide continued subsidy of very expensive drugs? A checklist for decision makers. Pharmacoepidemiol Drug Saf 2010;19(1):99–105.
[20] van de Putte LBA, Atkins C, Malaise M, Sany J, Russell AS, van Riel PLCM. Efficacy and safety of adalimumab as monotherapy in patients with rheumatoid arthritis for whom previous disease modifying antirheumatic drug treatment has failed. Ann Rheum Dis 2004;63:508–16.
[21] Westhoevens R, Yocum D, Han J, Berman A, Strusber I, Geusens P, et al. The safety of infliximab, combined with background treatments, among patients with rheumatoid arthritis and various comorbidities. Arthritis Rheum 2006;54:1075–86.
[22] Lassere M, van der Heijde D, Johnson KR, Boers M, Edmonds J. Reliability of measures of disease activity and disease damage in rheumatoid arthritis: implications for smallest detectable difference, minimal clinically important difference, and analysis of treatment effects in randomized controlled trials. J Rheumatol 2001;28:892–903.
[23] Freedman LS, Graubard BI. Statistical validation of intermediate endpoints for chronic diseases. Stat Med 1992;11:167–78.
[24] Takahashi O, Glasziou PP, Perera R, Shimbo T, Fukui T. Blood pressure re-screening for healthy adults: what is the best measure and interval? J Hum Hypertens 2012;26:540–6. http://dx.doi.org/10.1038/jhh.2011.72.
[25] Bland JM, Altman DG. Measurement error proportional to the mean. BMJ 1996;313:106.
[26] Jackson R, Lawes C, Bennet D, Milne R, Roders A. Treatment with drugs to lower blood pressure and blood cholesterol based on an individual's absolute cardiovascular risk. Lancet 2005;365:434–41.
[27] Spruijt B, Vergouwe Y, Nijman RG, Thompson M, Oostenbrink R. Vital signs should be maintained as continuous variables when predicting bacterial infections in febrile children. J Clin Epidemiol 2013;66:453–7.
[28] Sega R, Facchetti R, Bombelli M, Cesana G, Corrao G, Grassi G, Mancia G. Prognostic value of ambulatory and home blood pressures compared with office blood pressure in the general population: follow-up results from the Pressioni Arteriose Monitorate e Loro Associazioni (PAMELA) study. Circulation 2005;111:1777–83.

K.J.L. Bell et al. / Journal of Clinical Epidemiology 67 (2014) 152–159
