The Spine Journal 9 (2009) 679–689
Same trials, different conclusions: sorting out discrepancies between reviews on interventional procedures of the spine
Roger Chou, MD a,b,*
a Department of Medicine, 3181 SW Sam Jackson Park Road, Mail code BICC, Portland, OR 97239, USA
b Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, OR, USA
Received 5 January 2009; accepted 8 May 2009
COMMENTARY ON: Levin JH. Prospective, double-blind, randomized placebo-controlled trials in interventional spine: what the highest quality literature tells us. Spine J 2009;9:690–703 (in this issue).
DOI of original article: 10.1016/j.spinee.2008.06.447.
FDA drug/device status: not applicable.
Author disclosures: RC (research support from the American Pain Society).
* Corresponding author. Department of Medicine, 3181 SW Sam Jackson Park Road, Mail code BICC, Portland, OR 97239, USA.
E-mail address: [email protected] (R. Chou)
1529-9430/09/$ – see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.spinee.2009.05.003
Should clinicians recommend interventional spine procedures for patients with back pain, and if so, which procedures and in which patients? These are vexing questions that must be faced by anyone who manages patients with back pain. It is logical to assume that clinical trials on the utility of various interventional spine procedures should help answer these questions. It is also logical to assume that systematic reviews (a specific type of review article that uses methods to ensure comprehensiveness, reduce bias, and enhance transparency) should be the best way to put together the evidence from all of the trials and generate valid conclusions about the utility of particular interventions [1]. Yet clinical trials often report discordant results, and different systematic reviews of the same interventional procedure frequently offer contradictory conclusions [2,3]. For many clinicians, the end result of all this evidence on interventional spine procedures is more rather than less confusion.
In this issue of the Spine Journal, a review article by Levin evaluates the evidence for various interventional spine procedures based on results of randomized, double-blind, placebo-controlled trials [4]. I led a contemporaneous review on lumbar spine interventional procedures commissioned by the American Pain Society (APS) that included many of the same trials [5]. Because a number of conclusions differ between the two reviews, they provide an opportunity to examine how researchers who ostensibly address the same questions, using the same evidence base, can reach discordant conclusions. Clinicians need to understand how and why these differences occur to select the most appropriate review to guide their clinical decision making. This commentary focuses on interventions for low (lumbar) back pain, the subject of the APS review, though similar principles can be applied to the cervical interventions also covered by the Levin review.
How do the conclusions of the reviews differ?
In general, the Levin review came to more positive or favorable conclusions regarding beneficial effects of various lumbar interventional therapies compared with the APS review (Table 1). In three cases (denervation for presumed facet joint or discogenic pain, and intradiscal electrothermal therapy for presumed discogenic pain), the Levin review concluded that the interventional procedure is superior to placebo, though these conclusions were sometimes qualified by or limited to specific patients or interventional techniques. The APS review, on the other hand, found insufficient evidence to determine whether these interventional procedures are associated with benefits. For epidural steroid injections, a similar discrepancy was present when conclusions were limited to trials of the transforaminal approach for acute or subacute radicular pain (the focus of the Levin review). In two cases (facet joint corticosteroid injection for presumed facet joint pain and intradiscal steroid injection for presumed discogenic pain), Levin concluded that there is insufficient evidence to determine benefits, but the APS review concluded that each procedure is not beneficial. Both reviews found insufficient evidence to determine benefits of sacroiliac joint injection for non-spondyloarthropathic, presumed sacroiliac pain, though Levin concluded that it is effective for spondyloarthropathic pain (not addressed by the APS review).
Table 1
Main conclusions from two contemporaneous reviews on lumbar interventional therapies
Intervention | Main conclusions, Levin review [4] | Main conclusions, American Pain Society review [5]
Transforaminal epidural corticosteroid injection for acute/subacute radicular pain | Beneficial at short-term and possibly long-term follow-up for pain and for preventing future surgeries | Insufficient evidence to determine benefits
Facet joint injection for presumed facet joint pain | Insufficient evidence to determine benefits | Not beneficial
Radiofrequency denervation for presumed facet joint pain | Beneficial (using specific technique) | Insufficient evidence to determine benefits
Radiofrequency denervation for presumed discogenic back pain | Beneficial | Insufficient evidence to determine benefits
Intradiscal electrothermal therapy for presumed discogenic back pain | Beneficial (in selected patients) | Insufficient evidence to determine benefits
Intradiscal corticosteroid injection for presumed discogenic back pain | Insufficient evidence to determine benefits | Not beneficial
Sacroiliac joint corticosteroid injection for presumed sacroiliac pain | Insufficient evidence to determine benefits for non-spondyloarthropathic back pain; beneficial for spondyloarthropathic back pain | Insufficient evidence to determine benefits for non-spondyloarthropathic back pain (spondyloarthropathic back pain not addressed)
These discrepancies have important clinical implications. Based on the APS review, facet joint and intradiscal steroid injection should probably not be offered, but according to the Levin review, they may or may not be indicated. Transforaminal epidural steroid injection, radiofrequency denervation, and intradiscal electrothermal therapy are proven therapies according to the Levin review, but not according to the APS review. Fortunately, selecting which review to trust need not be an arbitrary decision, as it is usually possible to determine whether the conclusions of a systematic review are sound based on a careful examination of its methods [6].
Table 2
Quality assessment of systematic reviews
Criterion | Levin review [4] | American Pain Society review [5]
Was an "a priori" design provided? The research question and inclusion criteria should be established before the conduct of the review | No | Yes
Was there duplicate study selection and data extraction? There should be at least two independent data extractors and a consensus procedure for disagreements should be in place | No | Yes
Was a comprehensive literature search performed? At least two electronic sources should be searched. The report must include years and databases used. Key words and/or MESH terms must be stated and where feasible the search strategy should be provided. All searches should be supplemented by consulting current contents, reviews, textbooks, specialized registers, or experts in the particular field of study, and by reviewing the references in the studies found | No (one electronic source) | Yes
Were all relevant studies included regardless of publication type or status? The authors should state that they searched for reports regardless of their publication type. The authors should state whether or not they excluded any reports (from the systematic review), based on their publication status, language, etc. | Can't answer | No (limited to English language and published articles)
Was a list of studies (included and excluded) provided? A list of included and excluded studies (and reasons for exclusion) should be provided | No | Yes
Were the characteristics of the included studies provided? In an aggregated form such as a table, data from the original studies should be provided on the participants, interventions, and outcomes. | No | Yes
Was the scientific quality of the included studies assessed and documented? "A priori" methods of assessment should be provided. | No | Yes
Was the scientific quality of the included studies used appropriately in formulating conclusions? The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations. | No | Yes
Were the methods used to combine the findings of studies appropriate? "A priori" methods should be explicitly stated for synthesizing the findings of studies that take into account the type, number, size, and quality of studies; inconsistency between studies; and magnitude of benefits and these methods should be used to formulate conclusions. | No | Yes
Was the likelihood of publication bias assessed? | No | No
Were conflicts of interest reported? | No | Yes
Criteria adapted from the Assessment of Multiple Systematic Reviews (AMSTAR) instrument (Shea et al., 2007 [8]).
Table 3
Method for grading the overall strength of the evidence for an intervention from an American Pain Society [5] review
Grade | Definition
Good | Evidence includes consistent results from well-designed, well-conducted studies in representative populations that directly assess effects on health outcomes (at least two consistent, higher-quality trials).
Fair | Evidence is sufficient to determine effects on health outcomes, but the strength of the evidence is limited by the number, quality, size, or consistency of included studies; generalizability to routine practice; or indirect nature of the evidence on health outcomes (at least one higher-quality trial of sufficient sample size; two or more higher-quality trials with some inconsistency; at least two consistent, lower-quality trials, or multiple consistent observational studies with no significant methodological flaws).
Poor | Evidence is insufficient to assess effects on health outcomes because of limited number or power of studies, large and unexplained inconsistency between higher-quality trials, important flaws in trial design or conduct, gaps in the chain of evidence, or lack of information on important health outcomes.
Why do the conclusions of the reviews differ?
Assuming that two reviews address the same clinical question and evaluate roughly the same evidence, discrepancies can often be explained by differences in methodological quality [6]. The methodological quality of review articles is critical because lower-quality reviews tend to report more positive conclusions regarding benefits of interventions compared with higher-quality reviews [2,7]. A number of methods are available to assess whether the review used methods to enhance comprehensiveness and transparency while minimizing bias, error, and subjectivity in how studies were identified and analyzed [7,8]. Although details of quality rating methods differ, all include criteria related to use of comprehensive search strategies, application of predefined inclusion/exclusion criteria, appropriate assessments of study quality, and appropriate methods for synthesizing evidence and generating conclusions. Newer instruments (such as the Assessment of Multiple Systematic Reviews, or AMSTAR tool) incorporate additional criteria more recently understood to be important, such as use of dual review to select studies and abstract data, assessment of publication bias, and reporting of conflicts of interest (Table 2) [8].
Both the Levin and the APS review may be considered to be systematic in the sense that each performed searches on electronic databases to identify studies, focused on higher-quality evidence (randomized trials), and generated conclusions described as evidence based. However, details of their review methods otherwise differ dramatically. The Levin review failed to adequately meet any of 11 quality criteria included in the AMSTAR instrument (Table 2). The APS review, on the other hand, met nine criteria. Shortcomings of the APS review were that it did not attempt to evaluate for presence of publication bias because too few trials were available to perform formal assessments [9], and it limited inclusion to English language and published literature (no placebo-controlled, non-English language trials were identified).
Table 4
Definitions for estimating magnitude of effects from an American Pain Society review [5]
Magnitude of effect | Pain scales | Back-specific functional status | All outcomes
Small/modest | Mean 5- to 10-point improvement on a 100-point VAS or equivalent | Mean 5- to 10-point improvement on the ODI, 1–2 points on the RDQ, or equivalent | SMD, 0.2–0.5
Moderate | Mean 10- to 20-point improvement on a 100-point VAS or equivalent | Mean 10- to 20-point improvement on the ODI, 2–5 points on the RDQ, or equivalent | SMD, 0.5–0.8
Large/substantial | Mean >20-point improvement on a 100-point VAS or equivalent | Mean >20-point improvement on the ODI, >5 points on the RDQ, or equivalent | SMD, >0.8
ODI, Oswestry Disability Index; RDQ, Roland-Morris Disability Questionnaire; SMD, standardized mean difference; VAS, visual analog scale.
Table 5
Summary of evidence and main conclusions from systematic reviews of lumbar interventional procedures
Intervention | Review | Number of placebo-controlled trials (number rated higher quality) | Placebo-controlled trials with ≥100 patients | Total number of trials (placebo controlled and active controlled) | Net benefit vs. placebo (a) | Inconsistency (b) | Overall quality of evidence | Comments
Transforaminal epidural steroid injection for acute or subacute radiculopathy | APS review [5] | 3 (3) | 2 | 6 | Unable to determine | Yes | Poor | Inconsistent results from higher-quality placebo-controlled trials
Transforaminal epidural steroid injection for acute or subacute radiculopathy | Levin review [4] | 3 (quality not assessed) | 2 | 4 | Short-term and possibly long-term benefits | Explained by potential beneficial effects of placebos used in trials | Not graded | Interpreted all trials as potentially positive despite some trials showing no benefit vs. assessed placebo
Facet joint steroid injection for presumed facet joint pain | APS review | 2 (1) | 2 | 7 | No effect | No | Fair | No benefit in two trials
Facet joint steroid injection for presumed facet joint pain | Levin review | 2 (quality not assessed) | 2 | 2 | Unable to determine | Not assessed | Not graded | Excluded one lower-quality negative trial because it did not use diagnostic blocks to select patients and downplayed results of one higher-quality negative trial
Radiofrequency denervation for presumed facet joint pain | APS review | 6 (4) | 0 | 6 | Unable to determine | Yes | Poor | Inconsistent results; one higher-quality trial used an inadequate technique, another had large baseline differences in pain scores
Radiofrequency denervation for presumed facet joint pain | Levin review | 4 (quality not assessed) | 0 | 5 | Beneficial | Explained by technical factors in trials | Not graded | Conclusions based on one small positive placebo-controlled trial because of technical issues in other trials
Radiofrequency denervation for presumed discogenic pain | APS review | 1 (0) | 0 | 1 | Unable to determine (one trial) | Not applicable | Poor | The single trial was small and rated lower quality
Radiofrequency denervation for presumed discogenic pain | Levin review | 1 (quality not assessed) | 0 | 1 | Beneficial | Not applicable | Not graded | Conclusions based on one small positive trial
Intradiscal electrothermal therapy for presumed discogenic pain | APS review | 2 (2) | 0 | 2 | Unable to determine | Yes | Poor | Inconsistent results between two higher-quality trials
Intradiscal electrothermal therapy for presumed discogenic pain | Levin review | 2 (quality not assessed) | 0 | 2 | Beneficial | Explained by potential methodological issues in one of the trials | Not graded | Conclusions based on one small positive trial
Intradiscal steroid injection for presumed discogenic pain | APS review | 3 (1) | 2 | 3 | No effect | No | Good | No benefit shown in three trials
Intradiscal steroid injection for presumed discogenic pain | Levin review | 2 (quality not assessed) | 1 | 2 | Unable to determine | Not assessed | Not graded | Findings of no benefit in the two trials called into question because of discography method used to select patients
Sacroiliac joint steroid injection for presumed sacroiliac joint pain without spondyloarthropathy | APS review | 1 (1) | 0 | 1 | Substantial (one small trial) | Not applicable | Poor | The only available trial evaluated a periarticular corticosteroid injection
Sacroiliac joint steroid injection for presumed sacroiliac joint pain without spondyloarthropathy | Levin review | 0 | Not applicable | 0 | No trials | Not applicable | No evidence | No trial of patients without spondyloarthropathy identified
(a) In the APS review, determination of net benefit was based on evidence showing the intervention is more effective than placebo or sham therapy for one or more of the following outcomes: pain, functional status, overall improvement, or work status. Versus placebo, small benefit defined as 5–10 points on a 100-point Visual Analog Scale (VAS) for pain (or equivalent), 1–2 points on the Roland-Morris Disability Questionnaire (RDQ), 5–10 points on the Oswestry Disability Index (ODI), or a standardized mean difference (SMD) of 0.2–0.5. Moderate benefit defined as 10–20 points on a VAS for pain, 2–5 points on the RDQ, 10–20 points on the ODI, or a SMD of 0.5–0.8. Large benefit defined as >20 points on a 100-point VAS for pain; >5 points on the RDQ, >20 points on the ODI, or a SMD of >0.8.
(b) In the APS review, inconsistency was defined as <75% of trials reaching consistent conclusions on efficacy (no effect vs. positive effect considered inconsistent).
APS, American Pain Society.
The fact that the higher-quality (APS) review (based on AMSTAR criteria) reached less positive conclusions is consistent with previous research [2,7]. But what was the relative importance of specific methodologic differences? In this case, a critical factor was in how evidence was synthesized. The APS review described predefined methods for grading the evidence for an intervention that incorporated both the quality of evidence (based on number, type, size, and quality of studies; presence of inconsistency; and other factors) (Table 3) and magnitude of clinical benefit (Table 4) [5]. This allows readers to understand how these factors were used to generate conclusions, how confident to be in the results, and how much clinical benefit to expect [10]. The evidence grades are based on the principle that consistent results from a number of higher-quality studies across a broad range of populations increase the certainty that the results of the studies are true (the entire body of evidence would be considered "good quality") [11]. For a "fair-quality" body of evidence, results are sufficient to estimate benefits, but there is uncertainty because results could be the result of the true effects or affected by biases operating across some or all of the studies. There is therefore a greater likelihood that future trials could change or overturn conclusions. For a "poor-quality" body of evidence, there is too much uncertainty to form reliable conclusions. The APS review used a relatively low threshold to define a body of evidence as fair quality: at least one higher-quality trial of sufficient sample size, two or more higher-quality trials with some inconsistency, or at least two lower-quality trials with consistent results.
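Two of the APS rules are mechanical enough to illustrate in code: the magnitude thresholds for a 100-point pain VAS (5–10 points small, 10–20 moderate, >20 large) and the inconsistency rule (fewer than 75% of trials agreeing on efficacy). The following Python sketch is only an illustration of those published thresholds. The function names, the "below small" label, and the handling of a value of exactly 10 points are assumptions made here, not part of either review.

```python
from typing import Sequence


def magnitude_vas(points: float) -> str:
    """Classify a mean improvement versus placebo on a 100-point pain VAS."""
    if points < 5:
        return "below small"  # assumption: label for effects under the 5-point floor
    if points < 10:
        return "small"        # 5-10 points
    if points <= 20:
        return "moderate"     # 10-20 points (assumption: exactly 10 counts as moderate)
    return "large"            # more than 20 points


def is_inconsistent(trial_positive: Sequence[bool]) -> bool:
    """Return True when fewer than 75% of trials reach the same conclusion.

    Each entry is True for a trial showing a positive effect and False for
    a trial showing no effect.
    """
    if len(trial_positive) < 2:
        # A single trial cannot be inconsistent with itself ("not applicable").
        raise ValueError("inconsistency needs at least two trials")
    positives = sum(trial_positive)
    agreeing = max(positives, len(trial_positive) - positives)
    return agreeing / len(trial_positive) < 0.75


# Hypothetical inputs: one positive and one negative trial disagree.
print(magnitude_vas(12), is_inconsistent([True, False]))
```

Under this sketch, two trials that disagree are flagged as inconsistent, whereas three of four trials agreeing sits exactly at the 75% threshold and is not flagged.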
The Levin review, on the other hand, did not formally assess the internal validity (quality) of included trials. The "flaws" discussed in the Levin review frequently refer to issues related to external validity (factors affecting which populations, interventions, and outcomes a trial is likely to apply to [12]), rather than to factors that affect internal validity, or the risk of bias (systematic errors that favor one conclusion over another) [13]. In addition, the Levin review did not describe methods used to synthesize evidence and generate conclusions. This is problematic, particularly because it frequently concluded that interventional procedures are beneficial based on results of one small trial. Such evidence is not reliable for guiding clinical decision making. As stated by Egger and Davey Smith nearly 15 years ago, "Several medium-sized trials of high quality seem necessary to render results trustworthy [14]." Sparse evidence from small trials results in imprecise estimates, is more subject to publication bias, may not be generalizable to other populations and settings, and is often overturned by subsequent studies [15]. Several conclusions in the Levin review were also made despite the presence of inconsistency between trials (ie, some trials reported benefits of an intervention but others did not). This runs counter to an integral principle of scientific inquiry: the independent reproducibility of research findings [15]. If beneficial results of a trial can't be reliably replicated in tightly managed trial settings, there is little reason to expect predictable benefits in the far messier world of clinical practice.
In addition to not describing methods for synthesizing evidence and generating conclusions, the Levin review also approached trials differently depending on whether they reported positive (the intervention was statistically superior to placebo) or negative (no statistically significant difference) results. Specifically, it states that "in the interpretation of medical literature, the design of negative studies deserves closer evaluation than that of positive studies [4]." Given that randomization was successful, results met standard statistical significance thresholds, and an appropriate control intervention was used, Levin goes on to assert that "...positive results are positive. Negative results, however, require greater scrutiny to determine if the treatment is truly ineffective." Following this approach, Levin rejected or downplayed several negative trials when formulating conclusions.
For the sake of this commentary, we will use the term "negative study" as defined by Levin, despite the long-standing suggestion that it be abandoned because it implies that the study has shown that there is no difference, whereas usually all that has been demonstrated is an absence of evidence of a difference [16]. In addition, the word "negative" has pejorative connotations, implying that the study does not have anything positive to contribute. In fact, so-called negative studies can provide very useful scientific evidence about what may not work [17,18]. Studies with inadequate statistical power (which can result in Type II error, or finding of no difference when a difference in fact exists) can result in false negative results [19], but it is inappropriate to categorically dismiss their results. Rather, point estimates and confidence intervals should be examined to judge whether it is likely that enhancing statistical power would result in a positive result. In addition, one of the purposes of systematic reviews is to enhance statistical power by looking at multiple trials, so even small studies can contribute information. Similarly, improper selection of patients, interventional techniques, or controls can provide negative results with questionable generalizability [12] that are nonetheless true (have high internal validity) under the conditions of the trial.
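To illustrate why point estimates and confidence intervals are more informative than a "negative" label, the sketch below computes a 95% confidence interval for a difference in mean improvement between two arms. All trial numbers are hypothetical and chosen only for illustration. The normal approximation used here is an assumption that is crude for arms this small; a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c, level=0.95):
    """Difference in means (treatment minus control) with a normal-approximation CI."""
    diff = mean_t - mean_c
    # Standard error of the difference between two independent means.
    se = sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# Hypothetical small trial: mean VAS improvement of 25 points with treatment
# versus 15 points with placebo, SD 20 in both arms, 15 patients per arm.
diff, low, high = mean_difference_ci(25, 20, 15, 15, 20, 15)
print(f"difference {diff:.1f} points, 95% CI {low:.1f} to {high:.1f}")
```

In this made-up example the interval runs from roughly -4 to 24 points. It crosses zero, so the trial would be called negative, yet it is compatible with anything from no effect to a large benefit. Such a trial is inconclusive rather than evidence of no effect, and it can still contribute to a pooled estimate.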
From a conceptual standpoint, a broader issue is that the application of a low threshold to reject results of negative trials while accepting positive trials largely on face value is an unbalanced approach to critical appraisal that is inconsistent with evidence that falsely positive trials are in fact very common [15]. Even large, highly cited positive randomized trials report results that are stronger than or contradicted by subsequent trials with disturbing frequency [20]. The possibility of a chance finding of a difference (Type I error, reflected by the p value) is just one of a number of factors known to be associated with spuriously positive results or inflated estimates of benefit. Methodological shortcomings such as inadequate randomization or allocation concealment, inadequate blinding, presence of unequal or high attrition, differential use of cointerventions, and failure to perform intention-to-treat analysis all increase the risk of bias [21,22]. In fact, such flaws have stronger effects in trials that assess subjective outcomes such as pain, compared with trials that assess more objective outcomes [23]. Numerous studies have also shown that publication bias, selective outcomes reporting, and other related biases (often related to financial or other conflicts of interest) are common and can seriously distort conclusions of systematic reviews [24–28]. One study of nearly 750 clinical trials submitted to ethics committees found that those reporting positive results were more than three times as likely as negative trials to be published [29]. Another study found that of 74 trials on antidepressants submitted to the FDA, 37 of 38 positive trials were published, but only 3 of 36 negative trials [30]. Positive trials also are just as subject to issues related to generalizability as negative trials, as they frequently evaluate highly selected patients in specialized settings [12,31].
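The antidepressant figures cited above can be turned into publication rates with a few lines of arithmetic. This Python sketch uses only the counts quoted in the text (37 of 38 positive trials and 3 of 36 negative trials published); the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the text for trials submitted to the FDA [30].
positive_published = Fraction(37, 38)
negative_published = Fraction(3, 36)

# How many times more likely a positive trial was to be published.
ratio = positive_published / negative_published

print(f"positive trials published: {float(positive_published):.0%}")
print(f"negative trials published: {float(negative_published):.0%}")
print(f"ratio of publication rates: {float(ratio):.1f}")
```

A review restricted to the published literature would therefore see nearly all of the positive trials but fewer than one in ten of the negative ones.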
Positive trials also warrant at least as much scrutiny as negative trials because their clinical implications are far greater [32]. For a truly effective treatment, the cost of requiring more research is a delay, not permanent abandonment. Spinal interventional procedures are typically offered electively and do not provide more than moderate average benefits in even the most positive trials, and a number of alternatives supported by relatively strong evidence [33] are available for most patients with back pain. The consequences of a negative trial (to not offer an intervention, to consider an alternative therapy, or to wait for additional research) are relatively low risk to patients. On the other hand, if an ineffective treatment is adopted based on flawed or limited evidence, patients are exposed to all of its attendant harms, costs, and burdens. Furthermore, the necessary research may never be conducted, and may even be labeled by proponents as unethical. This is not just a theoretical concern, as there are a number of historical examples of treatments adopted for low back pain based on weak evidence, only to be abandoned later when it became clear that they were not beneficial, or even harmful [34].
Given the large differences in methods used to assess and synthesize the evidence, it is not surprising that the two reviews reached different conclusions. Table 5 summarizes how various factors were used by each review to synthesize evidence for different lumbar interventional therapies.
Transforaminal epidural steroid injection for acute/subacute radicular pain
Both reviews included three trials of a transforaminal epidural steroid versus a transforaminal placebo (saline or local anesthetic) injection [35–37]. All three trials were rated higher quality by the APS review. The Levin review also included a lower-quality trial of a transforaminal versus interlaminar epidural steroid injection [38]. It concluded that transforaminal epidural steroid injection is beneficial for short-term and possibly long-term pain even though both placebo-controlled trials [35,36] that reported long-term pain found no benefit (one trial found the placebo injection to be superior) and only one [35] of the two trials reported short-term benefit [4]. This interpretation is justified in part by the assertion that negative results in some of the placebo-controlled trials may not be truly negative, because transforaminal placebo injections may have had beneficial effects, even though there are no placebo-controlled trials of transforaminal epidural saline or local anesthetic injection versus a nonepidural placebo to support this assumption. The Levin review also used the active-controlled trial to support the conclusion that transforaminal epidural steroid injections are superior to placebo [38]. However, inclusion of this trial is problematic. Although inclusion and exclusion criteria were not clearly stated by the Levin review, its title indicates that inclusion should have been restricted to placebo-controlled trials. The inclusion of this active-controlled trial appears arbitrary, especially because a negative active-controlled trial of transforaminal epidural steroid injections was excluded [39]. Furthermore, inferences regarding the relative efficacy of transforaminal epidural steroid injection compared with placebo from this trial were based on indirect reasoning (ie, transforaminal epidural steroid injection is superior to interlaminar epidural steroid injection and interlaminar epidural steroid injection is superior or equivalent to placebo, therefore transforaminal epidural steroid injection is superior to placebo). Such reasoning seems logical, but can in fact be quite misleading, particularly if the critical assumption regarding similarity of treatment effects across all trials is not met [40,41]. For this reason, indirectness is routinely considered a reason to downgrade evidence [10].
The APS review found insufficient evidence to determine benefits of transforaminal epidural steroid injections for pain relief because of inconsistent short-term results and lack of long-term benefit in two placebo-controlled trials [35,36]. It also found insufficient evidence to reliably determine effects on subsequent surgery rates, as only one small (n=55), placebo-controlled trial assessed this outcome [37]. Active-controlled trials that compared the transforaminal to other approaches were reviewed but did not affect conclusions, as they were small and mostly lower quality, with inconsistent results [38,39,42].
Facet joint steroid injection for presumed facet joint pain
The APS review found fair evidence that facet joint steroid injections are not beneficial for presumed facet joint pain, based on two placebo-controlled trials of facet joint steroid injection (one rated higher quality [43] and one rated lower quality [44]) that did not show short-term benefits. The Levin review found insufficient evidence to determine efficacy. One reason for the discrepancy is that the Levin review excluded the lower-quality trial because diagnostic facet joint blocks were not used to select patients [44]. It also downplayed the negative results of the higher-quality trial [43] because a single rather than controlled diagnostic block was used to select patients, 2 cc rather than 1 cc of lidocaine were used for the injection, and 16% of joints were not successfully injected.
The decision to exclude the lower-quality trial was based on the assumption that clinical methods are inadequate to diagnose facet joint pain, and diagnostic facet blocks are necessary. However, no reliable evidence exists to estimate the sensitivity, specificity, or clinical utility of diagnostic blocks [45]. It is impossible to calculate sensitivity and specificity because not only is the correlation between diagnostic facet joint blocks and imaging findings variable, but there is also no reliable reference standard for identification of "true" facet joint pain [46]. Furthermore, no studies have shown that use of facet joint blocks (controlled or uncontrolled) to guide choice of therapy improves subsequent clinical outcomes, compared with choosing therapy based on other criteria. The assumption that controlled facet joint blocks are more accurate than uncontrolled blocks is also unproven. Although use of controlled diagnostic facet joint blocks results in fewer positive results compared with uncontrolled blocks, it is unknown what proportion is due to fewer true positives (leading to lower sensitivity) versus fewer false positives (leading to higher specificity). This is important because changes in sensitivity and specificity both affect the likelihood that a positive test is truly associated with disease (the positive likelihood ratio, or sensitivity/(1 − specificity)). There is also no evidence that use of a smaller amount of injectate or a marginally increased rate of successful injections is associated with greater, clinically relevant benefits.
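The dependence of the positive likelihood ratio on both sensitivity and specificity can be made concrete with a small Python sketch. Because no reliable estimates of the accuracy of diagnostic blocks exist, every number below is hypothetical and serves only to show the direction of the effect. The function and variable names are likewise illustrative.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    if not (0 <= sensitivity <= 1 and 0 <= specificity < 1):
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1 - specificity)


# Hypothetical baseline for an uncontrolled block.
uncontrolled = positive_likelihood_ratio(0.90, 0.60)

# Scenario 1: the controlled block removes only false positives
# (specificity rises, sensitivity unchanged).
fewer_false_positives = positive_likelihood_ratio(0.90, 0.80)

# Scenario 2: the controlled block removes only true positives
# (sensitivity falls, specificity unchanged).
fewer_true_positives = positive_likelihood_ratio(0.45, 0.60)

print(uncontrolled, fewer_false_positives, fewer_true_positives)
```

Both scenarios produce fewer positive blocks, but the first makes a positive result more informative and the second makes it less informative than the uncontrolled block. Without a reference standard, there is no way to tell which scenario applies.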
Even if the APS review had excluded the lower-quality trial because it did not use diagnostic facet joint blocks to select patients, its conclusions would not change, as there would still exist one higher-quality trial [43] with more than 100 patients showing no benefit, meeting criteria for fair-quality evidence.
Radiofrequency denervation for presumed facet joint pain
Both reviews found inconsistent results from four placebo-controlled trials [47–50] (three rated higher quality [48–50] by the APS review) of radiofrequency denervation for presumed facet joint pain. The Levin review also included an active-controlled trial of continuous versus pulsed radiofrequency denervation (as in the case of transforaminal epidural injections, inferences regarding efficacy vs. placebo were based on indirect reasoning) [51]. None of the trials used controlled diagnostic facet joint blocks to select patients, and interpretation of results is challenging because some of the trials may have used suboptimal techniques. The Levin review concluded that radiofrequency is beneficial, largely based on a single small (n=30) trial that presumably used the best technique [49]. The APS review, on the other hand, acknowledged that some trials may have had poor external validity because they used suboptimal ablation technique, but concluded that there is insufficient evidence to evaluate beneficial effects because of the inconsistency between higher-quality trials. Even if it accepted the trial that used the superior technique (according to Levin) as the only admissible evidence, conclusions of the APS review that evidence is insufficient to estimate benefits would be unchanged, because they would be based on a single, very small trial [49].
The APS review also included another placebo-controlled trial that was published after the Levin review was conducted [52]. Although this was the only trial to use controlled facet joint blocks to select patients and a radiofrequency ablation technique believed to be optimal, it did not change the APS review's conclusions. Although the trial found radiofrequency denervation moderately superior to sham treatment for improvement in generalized, back, and leg pain after 6 months, the difference was not statistically significant for back pain (the main symptom thought to be associated with facet pain). In addition, baseline pain scores in the radiofrequency denervation group averaged 1.6 points higher (p<.05 for differences) than in the sham group, which suggests inadequate randomization and could be associated with differential potential for improvement or regression to the mean. In fact, final pain scores in the two groups were identical.
Radiofrequency denervation for presumed discogenic back pain
The Levin review concluded that radiofrequency denervation is beneficial for presumed discogenic back pain, based on one placebo-controlled trial [53]. Based on the same trial, the APS review concluded that evidence is insufficient because the trial was rated lower quality and enrolled a small sample (n=49).
Intradiscal electrothermal therapy for presumed discogenic back pain
The Levin review concluded that intradiscal electrothermal therapy is beneficial for presumed discogenic back pain, largely based on one small (n=64) positive trial [54]. It downplayed the results of a second, negative trial, citing inadequate statistical power, possible baseline differences, and inadequate discography criteria [55]. As in the case of diagnostic facet joint blocks, however, the accuracy and clinical utility of different discography criteria are not established. In addition, clinically significant baseline differences did not in fact appear to be present in this trial, as baseline Low Back Outcome Scores, Oswestry Disability Index scores, and other outcome measures were almost identical [55]. Statistical power is unlikely to have been an important issue in this trial, as it enrolled almost as many patients (n=57) as the positive trial. Furthermore, interpretation of potentially inadequate statistical power should include an examination of the point estimates and confidence intervals reported by the trial [56]. In this case, there were essentially no differences in point estimates for any outcome (some even slightly favored the placebo group), with small or trivial maximum benefits according to the upper limits of confidence interval boundaries. Type II error is typically a concern when there is a clinically significant but not statistically significant trend in favor of one group. In this case, enhancement of statistical power would only result in the finding of clinically significant benefits if the 18 additional patients recruited into the trial to meet the original sample size goal of 75 patients were to experience substantially better results from intradiscal electrothermal therapy than the 57 patients already evaluated. This could happen, but is quite unlikely.
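The arithmetic behind this last point can be sketched roughly as follows. The sketch assumes that the pooled between-group difference behaves like a sample-size-weighted average of the difference seen in the 57 evaluated patients and the difference in the 18 patients who were never recruited. It uses an observed difference of zero and a target difference of one arbitrary unit. These inputs, the weighting assumption, and the function name are illustrative simplifications, not data or methods from the trial [55].

```python
def required_difference_in_new_patients(
    observed_diff: float,
    n_observed: int,
    n_target: int,
    target_diff: float,
) -> float:
    """Difference the not-yet-recruited patients would need to show so that
    the pooled difference reaches target_diff.

    Simplifying assumption: pooled difference is approximately
    (n_observed * observed_diff + n_new * new_diff) / n_target.
    """
    n_new = n_target - n_observed
    if n_new <= 0:
        raise ValueError("n_target must exceed n_observed")
    return (n_target * target_diff - n_observed * observed_diff) / n_new


# Hypothetical inputs: no difference among the 57 evaluated patients,
# and a clinically important difference defined as 1 arbitrary unit.
needed = required_difference_in_new_patients(
    observed_diff=0.0, n_observed=57, n_target=75, target_diff=1.0
)
print(f"Required difference in the 18 additional patients: {needed:.2f} units")
```

Under these assumptions, the 18 additional patients would need to show a benefit roughly four times the clinically important difference (75/18) for the full trial to reach it, which illustrates why completing recruitment would be unlikely to change the result.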
The APS review found insufficient evidence to reliably evaluate benefits because of inconsistency between the two small, higher-quality trials [54,55].
Intradiscal steroid injection for presumed discogenic pain
The Levin review included two negative trials of intradiscal steroid injection for presumed discogenic pain, but concluded that evidence is insufficient to evaluate benefits [57,58]. As in the case of intradiscal electrothermal therapy, it downplayed negative results largely based on use of inadequate discography criteria to select patients and also cited long-term follow-up as an inappropriate time interval to assess outcomes. The APS review included these two trials and a third trial [59] not included in the Levin review. One trial was rated higher quality [57]. Based on consistent results from the three trials (two enrolled more than 100 subjects), it concluded that there is good evidence of no benefit at any follow-up period.
Sacroiliac steroid injection for presumed sacroiliac pain
The APS review included one small (n=24), higher-quality trial of a periarticular steroid injection for presumed sacroiliac pain not associated with spondyloarthropathy [60]. Based on the small sample size, it concluded that there is insufficient evidence to reliably evaluate benefits. This trial was not included in the Levin review.
Discussion
Conclusions of systematic reviews that address the same interventions and are largely based on the same evidence can differ in ways that have important implications for clinical decisions [3,61]. This is often a source of confusion and frustration, as systematic reviews are supposed to bring more objectivity and scientific rigor to the review process [1]. However, these discrepancies need not be a mystery, as readers can often determine for themselves how and why these differences occurred [6]. In this case, the more positive conclusions of the Levin review can be accounted for by several major factors [4]. First, it did not meet current standards for conducting systematic reviews to increase comprehensiveness, enhance transparency, and reduce bias and error. Second, it accepted very weak evidence (one small trial) as sufficient to establish benefits of interventions. Third, it used an unbalanced approach to negative compared with positive trials. In several cases, this resulted in rejection of negative trials (and therefore of inconsistency between trials) based on unproven assumptions or tenuous chains of logic, and reluctance to conclude that there is no evidence of benefit, even when all available trials consistently failed to demonstrate benefits.
The Levin and APS reviews are far from the only examples of discordant systematic reviews for interventional spine procedures. For intradiscal electrothermal therapy, for example, there are only two placebo-controlled trials, yet there are at least four other systematic reviews [62–65]. Among these four studies, the two systematic reviews with more methodological flaws were also the ones that concluded that intradiscal electrothermal therapy is effective [62,63]. Unlike the Levin review, which focused on randomized trials, a critical shortcoming of both of these systematic reviews is that conclusions were heavily based on pooled results of uncontrolled observational studies, a particularly weak and unreliable form of evidence, and certainly not capable of resolving inconsistencies between well-conducted randomized trials.
So what should a clinician do? The APS review met most methodologic standards and used a more balanced approach to critical appraisal that included predefined and more stringent evidence thresholds. Clinicians who accept the methods of the APS review and the parameters used to grade evidence should not offer facet joint injection and intradiscal steroid injection, as the best currently available evidence failed to demonstrate that they improve patient outcomes, though future research could change these conclusions. For intradiscal electrothermal therapy, radiofrequency denervation, and sacroiliac joint injection, there is insufficient evidence to draw reliable conclusions about benefit. In general, clinicians should prioritize therapies (including noninterventional therapies [33]) supported by higher-quality evidence over those supported by only weak evidence. Not offering therapies supported by weak evidence is consistent with the principle that clinicians should only recommend interventions with proven benefits. Clinicians who do choose to offer these therapies should reserve them for patients with at least moderately severe symptoms despite trials of alternative therapies supported by stronger evidence. In such cases, patients always need to be clearly informed about the substantial uncertainties regarding potential benefits and harms. Decisions regarding transforaminal epidural injections are less straightforward. Based on evidence for epidural steroid injections in general, clinicians may consider them as an option for short-term benefits, but there is inconsistency among higher-quality trials (with some showing no benefits), and there is insufficient evidence to determine whether the transforaminal approach is superior to the interlaminar approach [5].
How can we move beyond a state of uncertainty for most interventional therapies? In short, we need more and better trials. It is time to leave behind the practice of adopting interventional therapies based on sparse or seriously inconsistent evidence. There is no reason to believe that we should accept lower standards of evidence for interventional spine procedures than for other medical interventions, and doing so is not scientifically credible given all that we know about the vagaries of evidence [15]. Levin raises a number of intriguing hypotheses about the accuracy of different diagnostic methods and the relative efficacy of different interventional techniques or methods [4]. Until these hypotheses are tested, however, all inferences about how they might have affected trial results remain speculative. If controlled diagnostic facet joint blocks or provocative discography using specific criteria are thought to be essential to accurately select patients for appropriate procedures, trials that use these methods should be conducted. Similarly, trials that use radiofrequency denervation techniques that are thought to be optimal [66] and are designed to minimize bias are needed. Trials that compare epidural saline or local anesthetic injection versus a dry epidural or soft-tissue needlestick could help determine whether they have therapeutic value, to guide appropriate choices for interventions and placebos in future trials.
Enthusiasm about benefits of interventional procedures based on positive results from single trials is almost always premature, especially when sample sizes are small, effects are moderate, or there are methodological shortcomings. Additional confirmatory trials are almost always necessary to establish the clinical benefits of interventions, by showing that results can be replicated with some degree of certainty. Systematic reviews of the interventional spine literature can be very valuable to clinicians and policymakers when conducted and reported according to published standards [8,67,68]. A current weakness of systematic reviews of interventional spine procedures is that it is very difficult to assess publication bias using statistical and graphical methods because of small numbers of trials and diversity in reported outcomes [9]. However, increased use of trial registries could help in the future detection (and preferably prevention) of publication, outcomes, and other related biases [69].
References
[1] Mulrow CD. Rationale for systematic reviews. BMJ 1994;309:597–9.
[2] Furlan AD, Clarke J, Esmail R, Sinclair S, Irvin E, Bombardier C. A critical review of reviews on the treatment of chronic low back pain. Spine 2001;26:E155–62.
[3] Hopayian K, Mugford M. Conflicting conclusions from two systematic reviews of epidural steroid injections for sciatica: which evidence should general practitioners heed? Br J Gen Pract 1999;49:57–61.
[4] Levin JH. Prospective, double-blind, randomized placebo-controlled trials in interventional spine: what the highest quality literature tells us. Spine J 2009;9:690–703.
[5] Chou R, Atlas S, Stanos S, Rosenquist R. Nonoperative interventional therapies for low back pain: a review of the evidence for an American Pain Society clinical practice guideline. Spine 2009;34:1078–93.
[6] Jadad AR, Cook DJ, Browman GP. A guide to interpreting discordant systematic reviews. CMAJ 1997;156:1411–6.
[7] Oxman AD, Guyatt GH. Validation of an index of the quality of review articles. J Clin Epidemiol 1991;44:1271–8.
[8] Shea BJ, Grimshaw JM, Wells GA, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 2007;7:10.
[9] Sterne JAC, Egger M, Smith GD. Systematic review in health care: investigating and dealing with publication and other biases in meta-analysis. BMJ 2001;323:101–5.
[10] Guyatt GH, Oxman AD, Vist GE, et al. GRADE: What is “quality of evidence” and why is it important to clinicians. BMJ 2008;336:995–8.
[11] Guyatt GH, Gutterman D, Baumann MH, et al. Grading strength of recommendations and quality of evidence in clinical guidelines: report from an American College of Chest Physicians Task Force. Chest 2006;129:174–81.
[12] Rothwell PM. External validity of randomised controlled trials: “To whom do the results of this trial apply?” Lancet 2005;365:82–93.
[13] Gluud LL. Bias in clinical intervention research. Am J Epidemiol 2006;163:493–501.
[14] Egger M, Smith GD. Misleading meta-analysis. BMJ 1995;310:752–4.
[15] Ioannidis JP. Why most published research findings are false. PLoS Med 2005;2:696–701.
[16] Chalmers I. Proposal to outlaw the term “negative trials”. BMJ 1985;290:1002.
[17] Connor JT. Positive reasons for publishing negative findings. Am J Gastroenterol 2008;103:2181–3.
[18] Gluud C. “Negative trials” are positive! J Hepatol 1998;28:731–3.
[19] Freiman JA, Chalmers TC, Smith H, Kuebler RR. The importance of beta, the type II error, and sample size in the design and interpretation of the randomized controlled trial. Survey of 71 “negative” trials. N Engl J Med 1978;299:690–4.
[20] Ioannidis JPA. Contradicted and initially stronger effects in highly cited clinical research. JAMA 2005;294:218–28.
[21] Moher D, Jones A, Cook DJ, et al. Does quality of randomised trials affect estimates of intervention efficacy reported in meta-analyses. Lancet 1998;352:609–13.
[22] Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA 1995;273:408–12.
[23] Wood L, Egger M, Gluud LL, Schulz KF, et al. Empirical evidence of bias in treatment effect estimates in controlled trials with different interventions and outcomes: meta-epidemiological study. BMJ 2008;336:601–6.
[24] Chan A-W, Hrobjartsson A, Haahr MT, et al. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA 2004;291:2457–65.
[25] Kjaergard LL, Als-Nielsen B. Association between competing interests and authors’ conclusions: epidemiological study of randomised clinical trials published in the BMJ. BMJ 2002;325:249.
[26] Lexchin J, Bero LA, Djulbegovic B, Clark O. Pharmaceutical industry sponsorship and research outcome and quality: systematic review. BMJ 2003;326:1167–70.
[27] Melander H, Ahlqvist-Rastad J, Meijer G, Beermann B. Evidence b(i)ased medicine—selective reporting from studies sponsored by pharmaceutical industry: review of studies in new drug applications. BMJ 2003;326:1171–3.
[28] Sterne J, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000;53:1119–29.
[29] Stern JM, Simes RJ. Publication bias: evidence of delayed publication in a cohort study of clinical research projects. BMJ 1997;315:640–5.
[30] Turner EH, Matthews AM, Linardatos E, Tell RA, et al. Selective publication of antidepressant trials and its influence on apparent efficacy. N Engl J Med 2008;358:252–60.
[31] Haynes B. Can it work? Does it work? Is it worth it? BMJ 1999;319:652–3.
[32] Eddy DM. From theory to practice: principles for making difficult decisions in difficult times. JAMA 1994;271:1792–8.
[33] Chou R, Huffman LH. Non-pharmacologic therapies for acute and chronic low back pain: a review of the evidence for an American Pain Society/American College of Physicians Clinical Practice Guideline. Ann Intern Med 2007;147:492–504.
[34] Deyo RA. Fads in the treatment of low back pain. N Engl J Med 1991;325:1039–40.
[35] Karppinen J, Malmivaara A, Kurunlahti M, et al. Periradicular infiltration for sciatica: a randomized controlled trial. Spine 2001;26:1059–67.
[36] Ng L, Chaudhary N, Sell P. The efficacy of corticosteroids in periradicular infiltration for chronic radicular pain. A randomized, double-blind, controlled trial. Spine 2005;30:857–62.
[37] Riew K, Yin Y, Gilula L, et al. The effect of nerve-root injections on the need for operative treatment of lumbar radicular pain. A prospective, randomized, controlled, double-blind study. J Bone Joint Surg 2000;82-A:1589–93.
[38] Thomas E, Cyteval C, Abiad L, Picot MC, et al. Efficacy of transforaminal versus interspinous corticosteroid injections in discal radiculalgia—a prospective, randomized, double-blind study. Clin Rheumatol 2003;22:299–304.
[39] Kolsi I, Delecrin J, Berthelot JM, et al. Efficacy of nerve root versus interspinous injections of glucocorticoids in the treatment of disk-related sciatica. A pilot, prospective, randomized, double-blind study. Joint Bone Spine 2000;67:113–8.
[40] Chou R, Fu R, Huffman LH, Korthuis PT. Initial highly-active antiretroviral therapy with a protease inhibitor versus a non-nucleoside reverse transcriptase inhibitor: discrepancies between direct and indirect meta-analysis. Lancet 2006;368:1503–15.
[41] Glenny AM, Altman DG, Song F, et al. Indirect comparisons of competing interventions. Health Technol Assess 2005;9:1–148.
[42] Ackerman WE 3rd, Ahmad M. The efficacy of lumbar epidural steroid injections in patients with lumbar disc herniations. Anesth Analg 2007;104:1217–22.
[43] Carette S, Marcoux S, Truchon R, et al. A controlled trial of corticosteroid injections into facet joints for chronic low back pain. N Engl J Med 1991;325:1002–7.
[44] Lilius G, Lassonen AM, Myllynen P, et al. The lumbar facet joint syndrome—significance of inappropriate signs. A randomized, placebo-controlled trial. French J Orthop Surg 1989;3:479–86.
[45] Bogduk N. Diagnosing lumbar zygapophysial joint pain. Pain Med 2005;6:30–3.
[46] Kalichman L, Li L, Kim DH, et al. Facet joint osteoarthritis and low back pain in the community-based population. Spine 2008;33:2560–5.
[47] Gallagher J, Petriccione D, Wedley J, et al. Radiofrequency facet joint denervation in the treatment of low back pain: a prospective controlled double-blind study to assess its efficacy. Pain Clinic 1994;7:193–8.
[48] Leclaire R, Fortin L, Lambert R, et al. Radiofrequency facet joint denervation in the treatment of low back pain: a placebo-controlled clinical trial to assess efficacy. Spine 2001;26:1411–6.
[49] van Kleef M, Barendse G, Kessels A, et al. Randomized trial of radiofrequency lumbar facet denervation for chronic low back pain. Spine 1999;24:1937–42.
[50] van Wijk R, Geurts J, Wynne H, et al. Radiofrequency denervation of lumbar facet joints in the treatment of chronic low back pain: a randomized, double-blind, sham lesion-controlled trial. Clin J Pain 2005;21:335–44.
[51] Tekin I, Mirzai H, Ok G, Erbuyun K, Vatansever D. A comparison of conventional and pulsed radiofrequency denervation in the treatment of chronic facet joint pain. Clin J Pain 2007;23:524–9.
[52] Nath S, Nath C, Pettersson K. Percutaneous lumbar zygapophysial (facet) joint neurotomy using radiofrequency current, in the management of chronic low back pain. Spine 2008;33:1291–7.
[53] Oh WS, Shim JC. A randomized controlled trial of radiofrequency denervation of the ramus communicans nerve for chronic discogenic low back pain. Clin J Pain 2004;20:55–60.
[54] Pauza KJ, Howell S, Dreyfuss P, et al. A randomized, placebo-controlled trial of intradiscal electrothermal therapy for the treatment of discogenic low back pain. Spine J 2004;4:27–35.
[55] Freeman BJ, Fraser RD, Cain CM, et al. A randomized, double-blind, controlled trial: intradiscal electrothermal therapy versus placebo for the treatment of chronic discogenic low back pain. Spine 2005;30:2369–77.
[56] Alderson P. Absence of evidence is not evidence of absence. BMJ 2004;328:476–7.
[57] Khot A, Bowditch M, Powell J, Sharp D. The use of intradiscal steroid therapy for lumbar spinal discogenic pain: a randomized controlled trial. Spine 2004;29:833–6.
[58] Simmons JW, McMillin JN, Emery SF, Kimmich SJ. Intradiscal steroids. A prospective double-blind clinical trial. Spine 1992;17(Suppl 6):S172–5.
[59] Buttermann GR. The effect of spinal steroid injections for degenerative disc disease. Spine J 2004;4:495–505.
[60] Luukkainen R, Wennerstrand P, Kautiainen H, et al. Efficacy of periarticular corticosteroid treatment of the sacroiliac joint in non-spondyloarthropathic patients with chronic low back pain in the region of the sacroiliac joint. Clin Exp Rheumatol 2002;20:52–4.
[61] Hopayian K. The need for caution in interpreting high quality systematic reviews. BMJ 2001;323:681–4.
[62] Andersson GBJ, Mekhail NA, Block JE. Treatment of intractable discogenic low back pain. A systematic review of spinal fusion and intradiscal electrothermal therapy (IDET). Pain Physician 2006;9:237–48.
[63] Appleby D, Andersson G, Totta M. Meta-analysis of the efficacy and safety of intradiscal electrothermal therapy (IDET). Pain Med 2006;7:308–16.
[64] Gibson J, Waddell G. Surgery for degenerative lumbar spondylosis: updated Cochrane Review. Spine 2005;30:2312–20.
[65] Urrutia G, Kovacs F, Nishishinya MB, Olabe J. Percutaneous thermocoagulation intradiscal techniques for discogenic low back pain. Spine 2007;32:1146–54.
[66] Hooten WM, Martin DP, Huntoon MA. Radiofrequency neurotomy for low back pain: Evidence-based procedural guidelines. Pain Med 2005;6:129–38.
[67] Moher D, Cook DJ, Eastwood S, et al. Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Lancet 1999;354:1896–900.
[68] van Tulder M, Furlan AD, Bombardier C, et al. Updated method guidelines for systematic reviews in the Cochrane Collaboration Back Review Group. Spine 2003;28:1290–9.
[69] Laine C, Horton R, DeAngelis CD, et al. Clinical trial registration—looking back and moving ahead. N Engl J Med 2007;356:2734–6.