The Spine Journal 9 (2009) 679–689

Same trials, different conclusions: sorting out discrepancies between reviews on interventional procedures of the spine

Roger Chou, MD a,b,*

a Department of Medicine, 3181 SW Sam Jackson Park Road, Mail code BICC, Portland, OR 97239, USA

b Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA

Received 5 January 2009; accepted 8 May 2009

COMMENTARY ON: Levin JH. Prospective, double-blind, randomized placebo-controlled trials in interventional spine: what the highest quality literature tells us. Spine J 2009;9:690–703 (in this issue).

DOI of original article: 10.1016/j.spinee.2008.06.447.

FDA drug/device status: not applicable.

Author disclosures: RC (research support from the American Pain Society).

* Corresponding author. Department of Medicine, 3181 SW Sam Jackson Park Road, Mail code BICC, Portland, OR 97239, USA.

E-mail address: [email protected] (R. Chou)

1529-9430/09/$ – see front matter © 2009 Elsevier Inc. All rights reserved.

doi:10.1016/j.spinee.2009.05.003

Should clinicians recommend interventional spine procedures for patients with back pain, and if so, which procedures and in which patients? These are vexing questions that must be faced by anyone who manages patients with back pain. It is logical to assume that clinical trials on the utility of various interventional spine procedures should help answer these questions. It is also logical to assume that systematic reviews (a specific type of review article that uses methods to ensure comprehensiveness, reduce bias, and enhance transparency) should be the best way to put together the evidence from all of the trials and generate valid conclusions about the utility of particular interventions [1]. Yet clinical trials often report discordant results, and different systematic reviews of the same interventional procedure frequently offer contradictory conclusions [2,3]. For many clinicians, the end result of all this evidence on interventional spine procedures is more rather than less confusion.

In this issue of the Spine Journal, a review article by Levin evaluates the evidence for various interventional spine procedures based on results of randomized, double-blind, placebo-controlled trials [4]. I led a contemporaneous review on lumbar spine interventional procedures commissioned by the American Pain Society (APS) that included many of the same trials [5]. Because a number of conclusions differ between the two reviews, they provide an opportunity to examine how researchers who ostensibly address the same questions, using the same evidence base, can reach discordant conclusions. Clinicians need to understand how and why these differences occur to select the most appropriate review to guide their clinical decision making. This commentary focuses on interventions for low (lumbar) back pain, the subject of the APS review, though similar principles can be applied to the cervical interventions also covered by the Levin review.

How do the conclusions of the reviews differ?

In general, the Levin review came to more positive or favorable conclusions regarding beneficial effects of various lumbar interventional therapies compared with the APS review (Table 1). In three cases (denervation for presumed facet joint or discogenic pain, and intradiscal electrothermal therapy for presumed discogenic pain), the Levin review concluded that the interventional procedure is superior to placebo, though these conclusions were sometimes qualified by or limited to specific patients or interventional techniques. The APS review, on the other hand, found insufficient evidence to determine whether these interventional procedures are associated with benefits. For epidural steroid injections, a similar discrepancy was present when conclusions were limited to trials of the transforaminal approach for acute or subacute radicular pain (the focus of the Levin review). In two cases (facet joint corticosteroid injection for presumed facet joint pain and intradiscal steroid injection for presumed discogenic pain), Levin concluded that there is insufficient evidence to determine benefits, but the APS review concluded that each procedure is not beneficial. Both reviews found insufficient evidence to determine benefits of sacroiliac joint injection for non-spondyloarthropathic, presumed sacroiliac pain, though Levin concluded that it is effective for spondyloarthropathic pain (not addressed by the APS review).

Table 1

Main conclusions from two contemporaneous reviews on lumbar interventional therapies

| Intervention | Main conclusions, Levin review [4] | Main conclusions, American Pain Society review [5] |
|---|---|---|
| Transforaminal epidural corticosteroid injection for acute/subacute radicular pain | Beneficial at short-term and possibly long-term follow-up for pain and for preventing future surgeries | Insufficient evidence to determine benefits |
| Facet joint injection for presumed facet joint pain | Insufficient evidence to determine benefits | Not beneficial |
| Radiofrequency denervation for presumed facet joint pain | Beneficial (using specific technique) | Insufficient evidence to determine benefits |
| Radiofrequency denervation for presumed discogenic back pain | Beneficial | Insufficient evidence to determine benefits |
| Intradiscal electrothermal therapy for presumed discogenic back pain | Beneficial (in selected patients) | Insufficient evidence to determine benefits |
| Intradiscal corticosteroid injection for presumed discogenic back pain | Insufficient evidence to determine benefits | Not beneficial |
| Sacroiliac joint corticosteroid injection for presumed sacroiliac pain | Insufficient evidence to determine benefits for non-spondyloarthropathic back pain; beneficial for spondyloarthropathic back pain | Insufficient evidence to determine benefits for non-spondyloarthropathic back pain (spondyloarthropathic back pain not addressed) |


These discrepancies have important clinical implications. Based on the APS review, facet joint and intradiscal steroid injection should probably not be offered, but according to the Levin review, they may or may not be indicated. Transforaminal epidural steroid injection, radiofrequency denervation, and intradiscal electrothermal therapy are proven therapies according to the Levin review, but not according to the APS review. Fortunately, selecting which review to trust need not be an arbitrary decision, as it is usually possible to determine whether the conclusions of a systematic review are sound based on a careful examination of its methods [6].

Table 2

Quality assessment of systematic reviews

| Criterion | Levin review [4] | American Pain Society review [5] |
|---|---|---|
| Was an "a priori" design provided? The research question and inclusion criteria should be established before the conduct of the review | No | Yes |
| Was there duplicate study selection and data extraction? There should be at least two independent data extractors and a consensus procedure for disagreements should be in place | No | Yes |
| Was a comprehensive literature search performed? At least two electronic sources should be searched. The report must include years and databases used. Key words and/or MESH terms must be stated and where feasible the search strategy should be provided. All searches should be supplemented by consulting current contents, reviews, textbooks, specialized registers, or experts in the particular field of study, and by reviewing the references in the studies found | No (one electronic source) | Yes |
| Were all relevant studies included regardless of publication type or status? The authors should state that they searched for reports regardless of their publication type. The authors should state whether or not they excluded any reports (from the systematic review), based on their publication status, language, etc. | Can't answer | No (limited to English language and published articles) |
| Was a list of studies (included and excluded) provided? A list of included and excluded studies (and reasons for exclusion) should be provided | No | Yes |
| Were the characteristics of the included studies provided? In an aggregated form such as a table, data from the original studies should be provided on the participants, interventions, and outcomes. | No | Yes |
| Was the scientific quality of the included studies assessed and documented? "A priori" methods of assessment should be provided. | No | Yes |
| Was the scientific quality of the included studies used appropriately in formulating conclusions? The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations. | No | Yes |
| Were the methods used to combine the findings of studies appropriate? "A priori" methods should be explicitly stated for synthesizing the findings of studies that take into account the type, number, size, and quality of studies; inconsistency between studies; and magnitude of benefits and these methods should be used to formulate conclusions. | No | Yes |
| Was the likelihood of publication bias assessed? | No | No |
| Were conflicts of interest reported? | No | Yes |

Criteria adapted from the Assessment of Multiple Systematic Reviews (AMSTAR) instrument (Shea et al., 2007 [8]).
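The counts of criteria met by each review can be checked by tallying the ratings in Table 2. The following is only a minimal sketch: the dictionary and function names are hypothetical, the ratings are transcribed from the table in row order, and treating "Can't answer" as not met is an assumption about how the tally is made.

```python
# Ratings transcribed from Table 2, in row order (11 AMSTAR-derived criteria).
# "yes" = criterion met, "no" = not met, "cant_answer" = could not be judged.
RATINGS = {
    "Levin review": ["no", "no", "no", "cant_answer", "no", "no",
                     "no", "no", "no", "no", "no"],
    "APS review": ["yes", "yes", "yes", "no", "yes", "yes",
                   "yes", "yes", "yes", "no", "yes"],
}


def criteria_met(ratings):
    """Count criteria rated 'yes'; any other rating is treated as not met
    (an assumption made for this illustration)."""
    return sum(1 for rating in ratings if rating == "yes")


for review, ratings in RATINGS.items():
    print(f"{review}: {criteria_met(ratings)} of {len(ratings)} criteria met")
```

Under these assumptions the tally gives 0 of 11 criteria for the Levin review and 9 of 11 for the APS review.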

Table 3

Method for grading the overall strength of the evidence for an intervention from an American Pain Society [5] review

| Grade | Definition |
|---|---|
| Good | Evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes (at least two consistent, higher-quality trials). |
| Fair | Evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, size, or consistency of included studies; generalizability to routine practice; or indirect nature of the evidence on health outcomes (at least one higher-quality trial of sufficient sample size; two or more higher-quality trials with some inconsistency; at least two consistent, lower-quality trials, or multiple consistent observational studies with no significant methodological flaws). |
| Poor | Evidence is insufficient to assess effects on health outcomes because of limited number or power of studies, large and unexplained inconsistency between higher-quality trials, important flaws in trial design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes |


Why do the conclusions of the reviews differ?

Assuming that two reviews address the same clinical question and evaluate roughly the same evidence, discrepancies can often be explained by differences in methodological quality [6]. The methodological quality of review articles is critical because lower-quality reviews tend to report more positive conclusions regarding benefits of interventions compared with higher-quality reviews [2,7]. A number of methods are available to assess whether the review used methods to enhance comprehensiveness and transparency while minimizing bias, error, and subjectivity in how studies were identified and analyzed [7,8]. Although details of quality rating methods differ, all include criteria related to use of comprehensive search strategies, application of predefined inclusion/exclusion criteria, appropriate assessments of study quality, and appropriate methods for synthesizing evidence and generating conclusions. Newer instruments (such as the Assessment of Multiple Systematic Reviews, or AMSTAR tool) incorporate additional criteria more recently understood to be important, such as use of dual review to select studies and abstract data, assessment of publication bias, and reporting of conflicts of interest (Table 2) [8].

Both the Levin and the APS review may be considered to be systematic in the sense that each performed searches on electronic databases to identify studies, focused on higher-quality evidence (randomized trials), and generated conclusions described as evidence based. However, details of their review methods otherwise differ dramatically. The Levin review failed to adequately meet any of 11 quality criteria included in the AMSTAR instrument (Table 2). The APS review, on the other hand, met nine criteria. Shortcomings of the APS review were that it did not attempt to evaluate for presence of publication bias because too few trials were available to perform formal assessments [9], and it limited inclusion to English language and published literature (no placebo-controlled, non-English language trials were identified).

Table 4

Definitions for estimating magnitude of effects from an American Pain Society review [5]

| Magnitude of effect | Pain scales | Back-specific functional status | All outcomes |
|---|---|---|---|
| Small/modest | Mean 5- to 10-point improvement on a 100-point VAS or equivalent | Mean 5- to 10-point improvement on the ODI, 1–2 points on the RDQ, or equivalent | SMD, 0.2–0.5 |
| Moderate | Mean 10- to 20-point improvement on a 100-point VAS or equivalent | Mean 10- to 20-point improvement on the ODI, 2–5 points on the RDQ, or equivalent | SMD, 0.5–0.8 |
| Large/substantial | Mean >20-point improvement on a 100-point VAS or equivalent | Mean >20-point improvement on the ODI, >5 points on the RDQ, or equivalent | SMD, >0.8 |

ODI, Oswestry Disability Index; RDQ, Roland-Morris Disability Questionnaire; SMD, standardized mean difference; VAS, visual analog scale.

Table 5

Summary of evidence and main conclusions from systematic reviews of lumbar interventional procedures

| Intervention | Review | Number of placebo-controlled trials (number rated higher quality) | Placebo-controlled trials with ≥100 patients | Total number of trials (placebo controlled and active controlled) | Net benefit vs. placebo (a) | Inconsistency (b) | Overall quality of evidence | Comments |
|---|---|---|---|---|---|---|---|---|
| Transforaminal epidural steroid injection for acute or subacute radiculopathy | APS review [5] | 3 (3) | 2 | 6 | Unable to determine | Yes | Poor | Inconsistent results from higher-quality placebo-controlled trials |
| Transforaminal epidural steroid injection for acute or subacute radiculopathy | Levin review [4] | 3 (quality not assessed) | 2 | 4 | Short-term and possibly long-term benefits | Explained by potential beneficial effects of placebos used in trials | Not graded | Interpreted all trials as potentially positive despite some trials showing no benefit vs. placebo |
| Facet joint steroid injection for presumed facet joint pain | APS review | 2 (1) | 2 | 7 | No effect | No | Fair | No benefit in two trials |
| Facet joint steroid injection for presumed facet joint pain | Levin review | 2 (quality not assessed) | 2 | 2 | Unable to determine | Not assessed | Not graded | Excluded one lower-quality negative trial because it did not use diagnostic blocks to select patients and downplayed results of one higher-quality negative trial |
| Radiofrequency denervation for presumed facet joint pain | APS review | 6 (4) | 0 | 6 | Unable to determine | Yes | Poor | Inconsistent results; one higher-quality trial used an inadequate technique, another had large baseline differences in pain scores |
| Radiofrequency denervation for presumed facet joint pain | Levin review | 4 (quality not assessed) | 0 | 5 | Beneficial | Explained by technical factors in trials | Not graded | Conclusions based on one small positive placebo-controlled trial because of technical issues in other trials |
| Radiofrequency denervation for presumed discogenic pain | APS review | 1 (0) | 0 | 1 | Unable to determine (one trial) | Not applicable | Poor | The single trial was small and rated lower quality |
| Radiofrequency denervation for presumed discogenic pain | Levin review | 1 (quality not assessed) | 0 | 1 | Beneficial | Not applicable | Not graded | Conclusions based on one small positive trial |
| Intradiscal electrothermal therapy for presumed discogenic pain | APS review | 2 (2) | 0 | 2 | Unable to determine | Yes | Poor | Inconsistent results between two higher-quality trials |
| Intradiscal electrothermal therapy for presumed discogenic pain | Levin review | 2 (quality not assessed) | 0 | 2 | Beneficial | Explained by potential methodological issues in one of the trials | Not graded | Conclusions based on one small positive trial |
| Intradiscal steroid injection for presumed discogenic pain | APS review | 3 (1) | 2 | 3 | No effect | No | Good | No benefit shown in three trials |
| Intradiscal steroid injection for presumed discogenic pain | Levin review | 2 (quality not assessed) | 1 | 2 | Unable to determine | Not assessed | Not graded | Findings of no benefit in the two trials called into question because of discography method used to select patients |
| Sacroiliac joint steroid injection for presumed sacroiliac joint pain without spondyloarthropathy | APS review | 1 (1) | 0 | 1 | Substantial (one small trial) | Not applicable | Poor | The only available trial evaluated a periarticular corticosteroid injection |
| Sacroiliac joint steroid injection for presumed sacroiliac joint pain without spondyloarthropathy | Levin review | 0 | Not applicable | 0 | No trials | Not applicable | No evidence | No trial of patients without spondyloarthropathy identified |

(a) In the APS review, determination of net benefit was based on evidence showing the intervention is more effective than placebo or sham therapy for one or more of the following outcomes: pain, functional status, overall improvement, or work status. Versus placebo, small benefit defined as 5–10 points on a 100-point Visual Analog Scale (VAS) for pain (or equivalent), 1–2 points on the Roland-Morris Disability Questionnaire (RDQ), 5–10 points on the Oswestry Disability Index (ODI), or a standardized mean difference (SMD) of 0.2–0.5. Moderate benefit defined as 10–20 points on a VAS for pain, 2–5 points on the RDQ, 10–20 points on the ODI, or a SMD of 0.5–0.8. Large benefit defined as >20 points on a 100-point VAS for pain; >5 points on the RDQ, >20 points on the ODI, or a SMD of >0.8.

(b) In the APS review, inconsistency was defined as <75% of trials reaching consistent conclusions on efficacy (no effect vs. positive effect considered inconsistent).

APS, American Pain Society.

The fact that the higher-quality (APS) review (based on AMSTAR criteria) reached less positive conclusions is consistent with previous research [2,7]. But what was the relative importance of specific methodologic differences? In this case, a critical factor was in how evidence was synthesized. The APS review described predefined methods for grading the evidence for an intervention that incorporated both the quality of evidence (based on number, type, size, and quality of studies; presence of inconsistency; and other factors) (Table 3) and magnitude of clinical benefit (Table 4) [5]. This allows readers to understand how these factors were used to generate conclusions, how confident to be in the results, and how much clinical benefit to expect [10]. The evidence grades are based on the principle that consistent results from a number of higher-quality studies across a broad range of populations increase the certainty that the results of the studies are true (the entire body of evidence would be considered "good quality") [11]. For a "fair-quality" body of evidence, results are sufficient to estimate benefits, but there is uncertainty because findings could reflect true effects or be affected by biases operating across some or all of the studies. There is therefore a greater likelihood that future trials could change or overturn conclusions. For a "poor-quality" body of evidence, there is too much uncertainty to form reliable conclusions. The APS review used a relatively low threshold to define a body of evidence as fair quality: at least one higher-quality trial of sufficient sample size, two or more higher-quality trials with some inconsistency, or at least two lower-quality trials with consistent results.
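Two of the predefined ingredients described for the APS review are simple enough to write down as rules: the magnitude-of-benefit thresholds (small, 5 to 10 points on a 100-point VAS or an SMD of 0.2 to 0.5; moderate, 10 to 20 points or 0.5 to 0.8; large, more than 20 points or more than 0.8) and the definition of inconsistency (fewer than 75% of trials reaching consistent conclusions on efficacy). The sketch below is illustrative only and is not the APS review's actual procedure. The function names are hypothetical, and three choices are assumptions: a value on a shared boundary goes to the lower category, values under the small threshold get their own label, and "consistent conclusions" is read as the share of trials agreeing with the majority.

```python
def magnitude_of_effect(vas_points=None, smd=None):
    """Classify net benefit versus placebo using the stated thresholds.

    vas_points: mean improvement on a 100-point pain VAS.
    smd: standardized mean difference.
    Assumptions: boundary values (10, 20, 0.5, 0.8) fall in the lower
    category; values under the small threshold are labeled separately.
    """
    if vas_points is not None:
        value, (small, moderate, large) = vas_points, (5, 10, 20)
    elif smd is not None:
        value, (small, moderate, large) = smd, (0.2, 0.5, 0.8)
    else:
        raise ValueError("provide vas_points or smd")
    if value > large:
        return "large/substantial"
    if value > moderate:
        return "moderate"
    if value >= small:
        return "small/modest"
    return "below small threshold"


def is_inconsistent(trial_positive, threshold=0.75):
    """Return True if fewer than `threshold` of trials share a conclusion.

    trial_positive: list of booleans, True when a trial found a positive
    effect versus placebo. Returns None for a single trial (not applicable).
    """
    n = len(trial_positive)
    if n < 2:
        return None
    positives = sum(trial_positive)
    majority_share = max(positives, n - positives) / n
    return majority_share < threshold


# Hypothetical inputs for illustration only.
print(magnitude_of_effect(vas_points=15))
print(is_inconsistent([True, False, False]))
```

With these hypothetical inputs, a 15-point mean improvement falls in the moderate band, and three trials split two to one (67% agreement) count as inconsistent under the 75% rule.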

The Levin review, on the other hand, did not formally assess the internal validity (quality) of included trials. The "flaws" discussed in the Levin review frequently refer to issues related to external validity (factors affecting which populations, interventions, and outcomes a trial is likely to apply to [12]), rather than to factors that affect internal validity, or the risk of bias (systematic errors that favor one conclusion over another) [13]. In addition, the Levin review did not describe methods used to synthesize evidence and generate conclusions. This is problematic, particularly because it frequently concluded that interventional procedures are beneficial based on results of one small trial. Such evidence is not reliable for guiding clinical decision making. As stated by Egger and Davey Smith nearly 15 years ago, "Several medium-sized trials of high quality seem necessary to render results trustworthy" [14]. Sparse evidence from small trials results in imprecise estimates, is more subject to publication bias, may not be generalizable to other populations and settings, and is often overturned by subsequent studies [15]. Several conclusions in the Levin review were also made despite the presence of inconsistency between trials (ie, some trials reported benefits of an intervention but others did not). This runs counter to an integral principle of scientific inquiry: the independent reproducibility of research findings [15]. If beneficial results of a trial cannot be reliably replicated in tightly managed trial settings, there is little reason to expect predictable benefits in the far messier world of clinical practice.
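The imprecision of a single small trial can be made concrete with a standard normal-approximation confidence interval for a difference in means. This is a minimal sketch under stated assumptions: two equal-sized arms, a common standard deviation, and a hypothetical SD of 25 points on a 100-point pain scale. None of these numbers comes from the trials discussed here, and the function name is illustrative.

```python
import math


def ci_half_width(sd, n_per_arm, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    difference in means between two equal-sized arms that share a
    common standard deviation (normal approximation)."""
    standard_error = sd * math.sqrt(2.0 / n_per_arm)
    return z * standard_error


# Hypothetical SD of 25 points on a 100-point pain scale.
for n in (20, 50, 200):
    half_width = ci_half_width(sd=25.0, n_per_arm=n)
    print(f"n per arm = {n}: estimate +/- {half_width:.1f} points")
```

Under these assumptions, a trial with 20 patients per arm has an interval of roughly plus or minus 15 points, wider than the whole 10-point span separating a small from a large benefit, whereas 200 patients per arm narrows it to about plus or minus 5 points.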

In addition to not describing methods for synthesizing evidence and generating conclusions, the Levin review also approached trials differently depending on whether they reported positive (the intervention was statistically superior to placebo) or negative (no statistically significant difference) results. Specifically, it states that "in the interpretation of medical literature, the design of negative studies deserves closer evaluation than that of positive studies" [4]. Given that randomization was successful, results met standard statistical significance thresholds, and an appropriate control intervention was used, Levin goes on to assert that "...positive results are positive. Negative results, however, require greater scrutiny to determine if the treatment is truly ineffective." Following this approach, Levin rejected or downplayed several negative trials when formulating conclusions.

For the sake of this commentary, we will use the term "negative study" as defined by Levin, despite the long-standing suggestion that it be abandoned because it implies that the study has shown that there is no difference, whereas usually all that has been demonstrated is an absence of evidence of a difference [16]. In addition, the word "negative" has pejorative connotations, implying that the study does not have anything positive to contribute. In fact, so-called negative studies can provide very useful scientific evidence about what may not work [17,18]. Studies with inadequate statistical power (which can result in Type II error, or finding of no difference when a difference in fact exists) can produce false negative results [19], but it is inappropriate to categorically dismiss their results. Rather, point estimates and confidence intervals should be examined to judge whether it is likely that enhancing statistical power would result in a positive result. In addition, one of the purposes of systematic reviews is to enhance statistical power by looking at multiple trials, so even small studies can contribute information. Similarly, improper selection of patients, interventional techniques, or controls can produce negative results that have questionable generalizability [12] but are nonetheless true (have high internal validity) under the conditions of the trial.
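How small studies contribute information when combined can be illustrated with fixed-effect inverse-variance pooling, a common meta-analytic technique. It is shown here only as an illustration and is not described as the synthesis method of either review. The trial estimates and standard errors below are hypothetical, the function names are illustrative, and the fixed-effect model assumes all trials estimate the same underlying effect.

```python
import math


def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


def ci95(estimate, se):
    """Approximate 95% confidence interval (normal approximation)."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical mean differences (points on a 100-point pain scale)
# from three small trials, each with the same standard error.
estimates = [8.0, 6.0, 10.0]
standard_errors = [6.0, 6.0, 6.0]

for est, se in zip(estimates, standard_errors):
    low, high = ci95(est, se)
    print(f"single trial: {est:.1f} ({low:.1f} to {high:.1f})")

pooled, pooled_se = pool_fixed_effect(estimates, standard_errors)
low, high = ci95(pooled, pooled_se)
print(f"pooled: {pooled:.1f} ({low:.1f} to {high:.1f})")
```

In this hypothetical example each trial's interval includes zero, so each would be called negative on its own, yet the pooled interval excludes zero. Examining point estimates and intervals, rather than counting trials as positive or negative, is what makes that visible.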

From a conceptual standpoint, a broader issue is that the application of a low threshold to reject results of negative trials while accepting positive trials largely on face value is an unbalanced approach to critical appraisal that is inconsistent with evidence that falsely positive trials are in fact very common [15]. Even large, highly cited positive randomized trials report results that are stronger than or contradicted by subsequent trials with disturbing frequency [20]. The possibility of a chance finding of a difference (Type I error, reflected by the p value) is just one of a number of factors known to be associated with spuriously positive results or inflated estimates of benefit. Methodological shortcomings such as inadequate randomization or allocation concealment, inadequate blinding, presence of unequal or high attrition, differential use of cointerventions, and failure to perform intention-to-treat analysis all increase the risk of bias [21,22]. In fact, such flaws have stronger effects in trials that assess subjective outcomes such as pain, compared with trials that assess more objective outcomes [23]. Numerous studies have also shown that publication, selective outcomes reporting, and other related biases (often related to financial or other conflicts of interest) are common and can seriously distort conclusions of systematic reviews [24–28]. One study of nearly 750 clinical trials submitted to ethics committees found that those reporting positive results were more than three times as likely as negative trials to be published [29]. Another study found that of 74 trials on antidepressants submitted to the FDA, 37 of 38 positive trials were published, but only 3 of 36 negative trials [30]. Positive trials also are just as subject to issues related to generalizability as negative trials, as they frequently evaluate highly selected patients in specialized settings [12,31].
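The arithmetic behind the antidepressant example shows how selective publication skews what a reader of the published literature sees. The sketch below uses only the counts quoted above (37 of 38 positive and 3 of 36 negative trials published). The function and variable names are illustrative.

```python
def publication_rate(published, total):
    """Proportion of trials in a group that reached publication."""
    return published / total


positive_rate = publication_rate(37, 38)   # positive trials
negative_rate = publication_rate(3, 36)    # negative trials
rate_ratio = positive_rate / negative_rate

# Share of positive trials among all submitted vs. among published trials.
share_submitted = 38 / (38 + 36)
share_published = 37 / (37 + 3)

print(f"published: {positive_rate:.1%} of positive, {negative_rate:.1%} of negative")
print(f"positive trials were {rate_ratio:.1f} times as likely to be published")
print(f"positive share: {share_submitted:.1%} submitted, {share_published:.1%} published")
```

On these counts, about half of the submitted trials were positive, but positive trials make up more than nine in ten of the published ones, which is the distortion a review restricted to published reports inherits.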

Positive trials also warrant at least as much scrutiny as negative trials because their clinical implications are far greater [32]. For a truly effective treatment, the cost of requiring more research is a delay, not permanent abandonment. Spinal interventional procedures are typically offered electively and do not provide more than moderate average benefits in even the most positive trials, and a number of alternatives supported by relatively strong evidence [33] are available for most patients with back pain. The consequences of a negative trial (to not offer an intervention, to consider an alternative therapy, or to wait for additional research) are relatively low risk to patients. On the other hand, if an ineffective treatment is adopted based on flawed or limited evidence, patients are exposed to all of its attendant harms, costs, and burdens. Furthermore, the necessary research may never be conducted, and may even be labeled by proponents as unethical. This is not just a theoretical concern, as there are a number of historical examples of treatments adopted for low back pain based on weak evidence, only to be abandoned later when it became clear that they were not beneficial, or even harmful [34].

Given the large differences in methods used to assess and synthesize the evidence, it is not surprising that the two reviews reached different conclusions. Table 5 summarizes how various factors were used by each review to synthesize evidence for different lumbar interventional therapies.

Transforaminal epidural steroid injection for acute/subacute radicular pain

Both reviews included three trials of a transforaminal epidural steroid versus a transforaminal placebo (saline or local anesthetic) injection [35–37]. All three trials were rated higher quality by the APS review. The Levin review also included a lower-quality trial of a transforaminal versus interlaminar epidural steroid injection [38]. It concluded that transforaminal epidural steroid injection is beneficial for short-term and possibly long-term pain even though both placebo-controlled trials [35,36] that reported long-term pain found no benefit (one trial found the placebo injection to be superior) and only one [35] of the two trials reported short-term benefit [4]. This interpretation is justified in part by the assertion that negative results in some of the placebo-controlled trials may not be truly negative, because transforaminal placebo injections may have had beneficial effects, even though there are no placebo-controlled trials of transforaminal epidural saline or local anesthetic injection versus a nonepidural placebo to support this assumption. The Levin review also used the active-controlled trial to support the conclusion that transforaminal epidural steroid injections are superior to placebo [38]. However, inclusion of this trial is problematic. Although inclusion and exclusion criteria were not clearly stated by the Levin review, its title indicates that inclusion should have been restricted to placebo-controlled trials. The inclusion of this active-controlled trial appears arbitrary, especially because a negative active-controlled trial of transforaminal epidural steroid injections was excluded [39]. Furthermore, inferences regarding the relative efficacy of transforaminal epidural steroid injection compared with placebo from this trial were based on indirect reasoning (ie, transforaminal epidural steroid injection is superior to interlaminar epidural steroid injection and interlaminar epidural steroid injection is superior or equivalent to placebo, therefore transforaminal epidural steroid injection is superior to placebo). Such reasoning seems logical, but can in fact be quite misleading, particularly if the critical assumption regarding similarity of treatment effects across all trials is not met [40,41]. For this reason, indirectness is routinely considered a reason to downgrade evidence [10].
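One way to see why indirect reasoning carries extra uncertainty is a simple adjusted indirect comparison, in which the indirect estimate is the sum of two direct estimates and their variances add. The sketch below is an illustration only: it is not the method used by either review, all effect sizes and standard errors are hypothetical, and the calculation is meaningful only if the similarity assumption described above holds, which the code cannot check.

```python
import math


def indirect_estimate(d_ab: float, se_ab: float, d_bc: float, se_bc: float):
    """Combine A-vs-B and B-vs-C into an indirect A-vs-C estimate.

    The point estimates add, and so do the variances, so the indirect
    standard error is always at least as large as either direct one.
    """
    d_ac = d_ab + d_bc
    se_ac = math.sqrt(se_ab**2 + se_bc**2)
    return d_ac, se_ac


def ci95(estimate: float, se: float):
    """Approximate 95% confidence interval using a normal approximation."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Hypothetical mean differences in pain score (positive favors the first arm).
d_tf_vs_il, se_tf_vs_il = 1.0, 0.6   # transforaminal vs interlaminar (assumed)
d_il_vs_pl, se_il_vs_pl = 0.5, 0.6   # interlaminar vs placebo (assumed)

d_tf_vs_pl, se_tf_vs_pl = indirect_estimate(
    d_tf_vs_il, se_tf_vs_il, d_il_vs_pl, se_il_vs_pl
)
low, high = ci95(d_tf_vs_pl, se_tf_vs_pl)

print(f"Indirect estimate: {d_tf_vs_pl:.2f} (SE {se_tf_vs_pl:.2f})")
print(f"Approximate 95% CI: {low:.2f} to {high:.2f}")
```

With these assumed inputs the indirect interval is wider than either direct comparison and crosses zero, even though both direct point estimates favor the more "active" arm.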

The APS review found insufficient evidence to determine benefits of transforaminal epidural steroid injections for pain relief because of inconsistent short-term results and lack of long-term benefit in two placebo-controlled trials [35,36]. It also found insufficient evidence to reliably determine effects on subsequent surgery rates, as only one small (n = 55), placebo-controlled trial assessed this outcome [37]. Active-controlled trials that compared the transforaminal to other approaches were reviewed but did not affect conclusions, as they were small and mostly lower quality, with inconsistent results [38,39,42].

Facet joint steroid injection for presumed facet joint pain

The APS review found fair evidence that facet joint steroid injections are not beneficial for presumed facet joint pain, based on two placebo-controlled trials of facet joint steroid injection (one rated higher quality [43] and one rated lower quality [44]) that did not show short-term benefits. The Levin review found insufficient evidence to determine efficacy. One reason for the discrepancy is that the Levin review excluded the lower-quality trial because diagnostic facet joint blocks were not used to select patients [44]. It also downplayed the negative results of the higher-quality trial [43] because a single rather than controlled diagnostic block was used to select patients, 2 cc rather than 1 cc of lidocaine were used for the injection, and 16% of joints were not successfully injected.

The decision to exclude the lower-quality trial was based on the assumption that clinical methods are inadequate to diagnose facet joint pain, and diagnostic facet blocks are necessary. However, no reliable evidence exists to estimate the sensitivity, specificity, or clinical utility of diagnostic blocks [45]. It is impossible to calculate sensitivity and specificity because not only is the correlation between diagnostic facet joint blocks and imaging findings variable, but there is in fact no reliable reference standard for identification of "true" facet joint pain [46]. Furthermore, no studies have shown that use of facet joint blocks (controlled or uncontrolled) to guide choice of therapy improves subsequent clinical outcomes, compared with choosing therapy based on other criteria. The assumption that controlled facet joint blocks are more accurate than uncontrolled blocks is also unproven. Although use of controlled diagnostic facet joint blocks results in fewer positive results compared with uncontrolled blocks, it is unknown what proportion is due to fewer true positives (leading to lower sensitivity) versus fewer false positives (leading to higher specificity). This is important because changes in sensitivity and specificity both affect the likelihood that a positive test is truly associated with disease (the positive likelihood ratio, or sensitivity/(1 − specificity)). There is also no evidence that use of a smaller amount of injectate or a marginally increased rate of successful injections is associated with greater, clinically relevant benefits.
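The point about the positive likelihood ratio can be illustrated with a small calculation. In the sketch below every sensitivity and specificity value is hypothetical, chosen only to show the direction of the effect; as noted above, the true values for diagnostic facet joint blocks cannot be estimated without a reference standard.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


# Hypothetical starting point for an uncontrolled block.
lr_uncontrolled = positive_likelihood_ratio(sensitivity=0.90, specificity=0.60)

# Scenario A: the controlled block removes only false positives
# (specificity rises, sensitivity unchanged).
lr_fewer_false_positives = positive_likelihood_ratio(sensitivity=0.90, specificity=0.80)

# Scenario B: the controlled block removes only true positives
# (sensitivity falls, specificity unchanged).
lr_fewer_true_positives = positive_likelihood_ratio(sensitivity=0.60, specificity=0.60)

print(f"Uncontrolled block (assumed):     LR+ = {lr_uncontrolled:.2f}")
print(f"Fewer false positives (assumed):  LR+ = {lr_fewer_false_positives:.2f}")
print(f"Fewer true positives (assumed):   LR+ = {lr_fewer_true_positives:.2f}")
```

Both scenarios produce fewer positive blocks, yet the likelihood ratio rises in one and falls in the other, which is why a lower positive rate alone does not show that controlled blocks are more accurate.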

Even if the APS review excluded the lower-quality trial because it did not use diagnostic facet joint blocks to select patients, its conclusions would not change, as there would still exist one higher-quality trial [43] with greater than 100 patients showing no benefit, meeting criteria for fair-quality evidence.

Radiofrequency denervation for presumed facet joint pain

Both reviews found inconsistent results from four placebo-controlled trials [47–50] (three rated higher quality [48–50] by the APS review) of radiofrequency denervation for presumed facet joint pain. The Levin review also included an active-controlled trial of continuous versus pulsed radiofrequency denervation (as in the case of transforaminal epidural injections, inferences regarding efficacy vs. placebo were based on indirect reasoning) [51]. None of the trials used controlled diagnostic facet joint blocks to select patients, and interpretation of results is challenging because some of the trials may have used suboptimal techniques. The Levin review concluded that radiofrequency is beneficial, largely based on a single small (n = 30) trial that presumably used the best technique [49]. The APS review, on the other hand, acknowledged that some trials may have had poor external validity because they used suboptimal ablation technique, but concluded that there is insufficient evidence to evaluate beneficial effects because of the inconsistency between higher-quality trials. Even if it accepted the trial that used the superior technique (according to Levin) as the only admissible evidence, conclusions of the APS review that evidence is insufficient to estimate benefits would be unchanged, because they would be based on a single, very small trial [49].

The APS review also included another placebo-controlled trial that was published after the Levin review was conducted [52]. Although this was the only trial to use controlled facet joint blocks to select patients and a radiofrequency ablation technique believed to be optimal, it did not change the conclusions of the APS review. Although the trial found radiofrequency denervation moderately superior to sham treatment for improvement in generalized, back, and leg pain after 6 months, the difference was not statistically significant for back pain (the main symptom thought to be associated with facet pain). In addition, baseline pain scores in the radiofrequency denervation group averaged 1.6 points higher (p < .05 for differences) than in the sham group, which suggests inadequate randomization and could be associated with differential potential for improvement or regression to the mean. In fact, final pain scores in the two groups were identical.

Radiofrequency denervation for presumed discogenic back pain

The Levin review concluded that radiofrequency denervation is beneficial for presumed discogenic back pain, based on one placebo-controlled trial [53]. Based on the same trial, the APS review concluded that evidence is insufficient because it was rated lower quality and enrolled a small sample (n = 49).

Intradiscal electrothermal therapy for presumed discogenic back pain

The Levin review concluded that intradiscal electrothermal therapy is beneficial for presumed discogenic back pain, largely based on one small (n = 64) positive trial [54]. It downplayed the results of a second, negative trial, citing inadequate statistical power, possible baseline differences, and inadequate discography criteria [55]. As in the case of diagnostic facet joint blocks, however, the accuracy and clinical utility of different discography criteria are not established. In addition, clinically significant baseline differences did not in fact appear to be present in this trial, as baseline Low Back Outcome Scores, Oswestry Disability Index scores, and other outcome measures were almost identical [55]. Statistical power is unlikely to have been an important issue in this trial, as it enrolled almost as many patients (n = 57) as the positive trial. Furthermore, interpretation of potentially inadequate statistical power should include an examination of the point estimates and confidence intervals reported by the trial [56]. In this case, there were essentially no differences in point estimates for any outcome (some even slightly favored the placebo group), with small or trivial maximum benefits according to the upper limits of the confidence intervals. Type II error is typically a concern when there is a clinically significant but not statistically significant trend in favor of one group. In this case, enhancement of statistical power would only result in the finding of clinically significant benefits if the 18 additional patients recruited into the trial to meet the original sample size goal of 75 patients were to experience substantially better results from intradiscal electrothermal therapy than the 57 patients already evaluated. This could happen, but is quite unlikely.
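The argument about the 18 additional patients can be made concrete with a rough weighted-average calculation. The sketch below is a simplification under stated assumptions: it treats the overall between-group difference as a patient-weighted average of the difference in the 57 enrolled patients and the 18 not yet enrolled, sets the observed difference to zero (the text describes it as essentially nil), and uses a hypothetical 10-point threshold for clinical significance that is not taken from the trial.

```python
def required_effect_in_additional(
    n_observed: int,
    effect_observed: float,
    n_additional: int,
    target_effect: float,
) -> float:
    """Effect the additional patients would need for the pooled effect to hit the target.

    Solves (n_observed * effect_observed + n_additional * x) / n_total = target_effect
    for x, treating the pooled effect as a patient-weighted average (an assumption).
    """
    if n_additional <= 0:
        raise ValueError("n_additional must be positive")
    n_total = n_observed + n_additional
    return (target_effect * n_total - effect_observed * n_observed) / n_additional


N_OBSERVED = 57          # patients evaluated, from the text
N_ADDITIONAL = 18        # shortfall against the goal of 75, from the text
OBSERVED_EFFECT = 0.0    # assumed: "essentially no differences" in point estimates
TARGET_EFFECT = 10.0     # hypothetical clinically significant difference (points)

needed = required_effect_in_additional(
    N_OBSERVED, OBSERVED_EFFECT, N_ADDITIONAL, TARGET_EFFECT
)
print(f"Difference needed in the additional patients: {needed:.1f} points")
```

Under these assumptions the additional patients would need a between-group difference more than four times the target itself, which illustrates why completing enrollment would be unlikely to change the result.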

The APS review found insufficient evidence to reliably evaluate benefits because of inconsistency between the two small, higher-quality trials [54,55].

Intradiscal steroid injection for presumed discogenic pain

The Levin review included two negative trials of intradiscal steroid injection for presumed discogenic pain, but concluded that evidence is insufficient to evaluate benefits [57,58]. As in the case of intradiscal electrothermal therapy, it downplayed negative results largely based on use of inadequate discography criteria to select patients and also cited long-term follow-up as an inappropriate time interval to assess outcomes. The APS review included these two trials and a third trial [59] not included in the Levin review. One trial was rated higher quality [57]. Based on consistent results from the three trials (two enrolled more than 100 subjects), it concluded that there is good evidence of no benefit at any follow-up period.

Sacroiliac steroid injection for presumed sacroiliac pain

The APS review included one small (n = 24), higher-quality trial of a periarticular steroid injection for presumed sacroiliac pain not associated with spondyloarthropathy [60]. Based on the small sample size, it concluded that there is insufficient evidence to reliably evaluate benefits. This trial was not included in the Levin review.

Discussion

Conclusions of systematic reviews that address the same interventions and are largely based on the same evidence can differ in ways that have important implications for clinical decisions [3,61]. This is often a source of confusion and frustration, as systematic reviews are supposed to bring more objectivity and scientific rigor to the review process [1]. However, these discrepancies need not be a mystery, as readers can often determine for themselves how and why these differences occurred [6]. In this case, the more positive conclusions of the Levin review can be accounted for by several major factors [4]. First, it did not meet current standards for conducting systematic reviews to increase comprehensiveness, enhance transparency, and reduce bias and error. Second, it accepted very weak evidence (one small trial) as sufficient to establish benefits of interventions. Third, it used an unbalanced approach to negative compared with positive trials. In several cases, this resulted in rejection of negative trials (and therefore of inconsistency between trials) based on unproven assumptions or tenuous chains of logic, and reluctance to conclude that there is no evidence of benefit, even when all available trials consistently failed to demonstrate benefits.

The Levin and APS reviews are far from the only examples of discordant systematic reviews for interventional spine procedures. For intradiscal electrothermal therapy, for example, there are only two placebo-controlled trials, yet there are at least four other systematic reviews [62–65]. Among these four reviews, the two with more methodological flaws were also the ones that concluded that intradiscal electrothermal therapy is effective [62,63]. Unlike the Levin review, which focused on randomized trials, a critical shortcoming of both of these systematic reviews is that conclusions were heavily based on pooled results of uncontrolled observational studies, a particularly weak and unreliable form of evidence, and certainly not capable of resolving inconsistencies between well-conducted randomized trials.

So what should a clinician do? The APS review met most methodologic standards and used a more balanced approach to critical appraisal that included predefined and more stringent evidence thresholds. Clinicians who accept the methods of the APS review and the parameters used to grade evidence should not offer facet joint injection and intradiscal steroid injection, as the best currently available evidence failed to demonstrate that they improve patient outcomes, though future research could change these conclusions. For intradiscal electrothermal therapy, radiofrequency denervation, and sacroiliac joint injection, there is insufficient evidence to draw reliable conclusions about benefit. In general, clinicians should prioritize therapies (including noninterventional therapies [33]) supported by higher-quality evidence over those supported by only weak evidence. Not offering therapies supported by weak evidence is consistent with the principle that clinicians should only recommend interventions with proven benefits. Clinicians who do choose to offer these therapies should reserve them for patients with at least moderately severe symptoms despite trials of alternative therapies supported by stronger evidence. In such cases, patients always need to be clearly informed about the substantial uncertainties regarding potential benefits and harms. Decisions regarding transforaminal epidural injections are less straightforward. Based on evidence for epidural steroid injections in general, clinicians may consider them as an option for short-term benefits, but there is inconsistency among higher-quality trials (with some showing no benefits), and there is insufficient evidence to determine whether the transforaminal approach is superior to the interlaminar approach [5].

How can we move beyond a state of uncertainty for most interventional therapies? In short, we need more and better trials. It is time to leave behind the practice of adopting interventional therapies based on sparse or seriously inconsistent evidence. There is no reason to believe that we should accept lower standards of evidence for interventional spine procedures than for other medical interventions, and doing so is not scientifically credible given all that we know about the vagaries of evidence [15]. Levin raises a number of intriguing hypotheses about the accuracy of different diagnostic methods and the relative efficacy of different interventional techniques or methods [4]. Until these hypotheses are tested, however, all inferences about how they might have affected trial results remain speculative. If controlled diagnostic facet joint blocks or provocative discography using specific criteria are thought to be essential to accurately select patients for appropriate procedures, trials that use these methods should be conducted. Similarly, trials that use radiofrequency denervation techniques that are thought to be optimal [66] and are designed to minimize bias are needed. Trials that compare epidural saline or local anesthetic injection versus a dry epidural or soft-tissue needlestick could help determine whether they have therapeutic value, to guide appropriate choices for interventions and placebos in future trials.

Enthusiasm about benefits of interventional procedures based on positive results from single trials is almost always premature, especially when sample sizes are small, effects are moderate, or there are methodological shortcomings. Additional confirmatory trials are almost always necessary to establish the clinical benefits of interventions, by showing that results can be replicated with some degree of certainty. Systematic reviews of the interventional spine literature can be very valuable to clinicians and policymakers when conducted and reported according to published standards [8,67,68]. A current weakness of systematic reviews of interventional spine procedures is that it is very difficult to assess publication bias using statistical and graphical methods because of small numbers of trials and diversity in reported outcomes [9]. However, increased use of trial registries could help in the future detection, and preferably prevention, of publication, outcomes, and other related biases [69].

References

[1] Mulrow CD. Rationale for systematic reviews. BMJ 1994;309:597–9.

[2] Furlan AD, Clarke J, Esmail R, Sinclair S, Irvin E, Bombardier C. A critical review of reviews on the treatment of chronic low back pain. Spine 2001;26:E155–62.

[3] Hopayian K, Mugford M. Conflicting conclusions from two systematic reviews of epidural steroid injections for sciatica: which evidence should general practitioners heed? Br J Gen Pract 1999;49:57–61.

[4] Levin JH. Prospective, double-blind, randomized placebo-controlled trials in interventional spine: what the highest quality literature tells us. Spine J 2009;9:690–703.

[5] Chou R, Atlas S, Stanos S, Rosenquist R. Nonoperative interventional therapies for low back pain: a review of the evidence for an American Pain Society clinical practice guideline. Spine 2009;34:1078–93.

[6] Jadad AR, Cook DJ, Browman GP. A guide to interpreting discordant systematic reviews. CMAJ 1997;156:1411–6.

[7] Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol 1991;44:1271–8.

[8] Shea BJ, Grimshaw JM, Wells GA, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 2007;7:10.

[9] Sterne JAC, Egger M, Smith GD. Systematic review in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001;323:101–5.

[10] Guyatt GH, Oxman AD, Vist GE, et al. GRADE: What is "quality of evidence" and why is it important to clinicians? BMJ 2008;336:995–8.

[11] Guyatt GH, Gutterman D, Baumann MH, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians Task Force. Chest 2006;129:174–81.


[12] Rothwell PM. External validity of randomised controlled trials: "To whom do the results of this trial apply?" Lancet 2005;365:82–93.

[13] Gluud LL. Bias in clinical intervention research. Am J Epidemiol 2006;163:493–501.

[14] Egger M, Smith GD. Misleading meta-analysis. BMJ 1995;310:752–4.

[15] Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:696–701.

[16] Chalmers I. Proposal to outlaw the term "negative trials". BMJ 1985;290:1002.

[17] Connor JT. Positive reasons for publishing negative findings. Am J Gastroenterol 2008;103:2181–3.

[18] Gluud C. "Negative trials" are positive! J Hepatol 1998;28:731–3.

[19] Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error, and sample size in the design and interpretation of the randomized controlled trial. Survey of 71 "negative" trials. N Engl J Med 1978;299:690–4.

[20] Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA 2005;294:218–28.

[21] Moher D, Jones A, Cook DJ, et al. Does quality of randomised trials affect estimates of intervention efficacy reported in meta-analyses? Lancet 1998;352:609–13.

[22] Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12.

[23] Wood L, Egger M, Gluud LL, Schulz KF, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601–6.

[24] Chan A-W, Hrobjartsson A, Haahr MT, et al. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457–65.

[25] Kjaergard LL, Als-Nielsen B. Association between competing interests and authors’ conclusions: epidemiological study of randomised clinical trials published in the BMJ. BMJ 2002;325:249.

[26] Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ 2003;326:1167–70.

[27] Melander H, Ahlqvist-Rastad J, Meijer G, Beermann B. Evidence b(i)ased medicine: selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 2003;326:1171–3.

[28] Sterne J, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000;53:1119–29.

[29] Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640–5.

[30] Turner EH, Matthews AM, Linardatos E, Tell RA, et al. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008;358:252–60.

[31] Haynes B. Can it work? Does it work? Is it worth it? BMJ 1999;319:652–3.

[32] Eddy DM. From theory to practice: principles for making difficult decisions in difficult times. JAMA 1994;271:1792–8.

[33] Chou R, Huffman LH. Non-pharmacologic therapies for acute and chronic low back pain: a review of the evidence for an American Pain Society/American College of Physicians Clinical Practice Guideline. Ann Intern Med 2007;147:492–504.

[34] Deyo RA. Fads in the treatment of low back pain. N Engl J Med 1991;325:1039–40.

[35] Karppinen J, Malmivaara A, Kurunlahti M, et al. Periradicular infiltration for sciatica: a randomized controlled trial. Spine 2001;26:1059–67.

[36] Ng L, Chaudhary N, Sell P. The efficacy of corticosteroids in periradicular infiltration for chronic radicular pain. A randomized, double-blind, controlled trial. Spine 2005;30:857–62.

[37] Riew K, Yin Y, Gilula L, et al. The effect of nerve-root injections on the need for operative treatment of lumbar radicular pain. A prospective, randomized, controlled, double-blind study. J Bone Joint Surg 2000;82-A:1589–93.

[38] Thomas E, Cyteval C, Abiad L, Picot MC, et al. Efficacy of transforaminal versus interspinous corticosteroid injections in discal radiculalgia: a prospective, randomized, double-blind study. Clin Rheumatol 2003;22:299–304.

[39] Kolsi I, Delecrin J, Berthelot JM, et al. Efficacy of nerve root versus interspinous injections of glucocorticoids in the treatment of disk-related sciatica. A pilot, prospective, randomized, double-blind study. Joint Bone Spine 2000;67:113–8.

[40] Chou R, Fu R, Huffman LH, Korthuis PT. Initial highly-active antiretroviral therapy with a protease inhibitor versus a non-nucleoside reverse transcriptase inhibitor: discrepancies between direct and indirect meta-analysis. Lancet 2006;368:1503–15.

[41] Glenny AM, Altman DG, Song F, et al. Indirect comparisons of competing interventions. Health Technol Assess 2005;9:1–148.

[42] Ackerman WE 3rd, Ahmad M. The efficacy of lumbar epidural steroid injections in patients with lumbar disc herniations. Anesth Analg 2007;104:1217–22.

[43] Carette S, Marcoux S, Truchon R, et al. A controlled trial of corticosteroid injections into facet joints for chronic low back pain. N Engl J Med 1991;325:1002–7.

[44] Lilius G, Lassonen AM, Myllynen P, et al. The lumbar facet joint syndrome: significance of inappropriate signs. A randomized, placebo-controlled trial. French J Orthop Surg 1989;3:479–86.

[45] Bogduk N. Diagnosing lumbar zygapophysial joint pain. Pain Med 2005;6:30–3.

[46] Kalichman L, Li L, Kim DH, et al. Facet joint osteoarthritis and low back pain in the community-based population. Spine 2008;33:2560–5.

[47] Gallagher J, Petriccione D, Wedley J, et al. Radiofrequency facet joint denervation in the treatment of low back pain: a prospective controlled double-blind study to assess its efficacy. Pain Clinic 1994;7:193–8.

[48] Leclaire R, Fortin L, Lambert R, et al. Radiofrequency facet joint denervation in the treatment of low back pain: a placebo-controlled clinical trial to assess efficacy. Spine 2001;26:1411–6.

[49] van Kleef M, Barendse G, Kessels A, et al. Randomized trial of radiofrequency lumbar facet denervation for chronic low back pain. Spine 1999;24:1937–42.

[50] van Wijk R, Geurts J, Wynne H, et al. Radiofrequency denervation of lumbar facet joints in the treatment of chronic low back pain: a randomized, double-blind, sham lesion-controlled trial. Clin J Pain 2005;21:335–44.

[51] Tekin I, Mirzai H, Ok G, Erbuyun K, Vatansever D. A comparison of conventional and pulsed radiofrequency denervation in the treatment of chronic facet joint pain. Clin J Pain 2007;23:524–9.

[52] Nath S, Nath C, Pettersson K. Percutaneous lumbar zygapophysial (facet) joint neurotomy using radiofrequency current, in the management of chronic low back pain. Spine 2008;33:1291–7.

[53] Oh WS, Shim JC. A randomized controlled trial of radiofrequency denervation of the ramus communicans nerve for chronic discogenic low back pain. Clin J Pain 2004;20:55–60.

[54] Pauza KJ, Howell S, Dreyfuss P, et al. A randomized, placebo-controlled trial of intradiscal electrothermal therapy for the treatment of discogenic low back pain. Spine J 2004;4:27–35.

[55] Freeman BJ, Fraser RD, Cain CM, et al. A randomized, double-blind, controlled trial: intradiscal electrothermal therapy versus placebo for the treatment of chronic discogenic low back pain. Spine 2005;30:2369–77.

[56] Alderson P. Absence of evidence is not evidence of absence. BMJ 2004;328:476–7.

[57] Khot A, Bowditch M, Powell J, Sharp D. The use of intradiscal steroid therapy for lumbar spinal discogenic pain: a randomized controlled trial. Spine 2004;29:833–6.


[58] Simmons JW, McMillin JN, Emery SF, Kimmich SJ. Intradiscal steroids. A prospective double-blind clinical trial. Spine 1992;17(Suppl 6):S172–5.

[59] Buttermann GR. The effect of spinal steroid injections for degenerative disc disease. Spine J 2004;4:495–505.

[60] Luukkainen R, Wennerstrand P, Kautiainen H, et al. Efficacy of periarticular corticosteroid treatment of the sacroiliac joint in non-spondyloarthropathic patients with chronic low back pain in the region of the sacroiliac joint. Clin Exp Rheumatol 2002;20:52–4.

[61] Hopayian K. The need for caution in interpreting high quality systematic reviews. BMJ 2001;323:681–4.

[62] Andersson GBJ, Mekhail NA, Block JE. Treatment of intractable discogenic low back pain. A systematic review of spinal fusion and intradiscal electrothermal therapy (IDET). Pain Physician 2006;9:237–48.

[63] Appleby D, Andersson G, Totta M. Meta-analysis of the efficacy and safety of intradiscal electrothermal therapy (IDET). Pain Med 2006;7:308–16.

[64] Gibson J, Waddell G. Surgery for degenerative lumbar spondylosis: updated Cochrane Review. Spine 2005;30:2312–20.

[65] Urrutia G, Kovacs F, Nishishinya MB, Olabe J. Percutaneous thermocoagulation intradiscal techniques for discogenic low back pain. Spine 2007;32:1146–54.

[66] Hooten WM, Martin DP, Huntoon MA. Radiofrequency neurotomy for low back pain: Evidence-based procedural guidelines. Pain Med 2005;6:129–38.

[67] Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet 1999;354:1896–900.

[68] van Tulder M, Furlan AD, Bombardier C, et al. Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. Spine 2003;28:1290–9.

[69] Laine C, Horton R, DeAngelis CD, et al. Clinical trial registration: looking back and moving ahead. N Engl J Med 2007;356:2734–6.

