Review article
Scales for the identification of adults with attention deficit hyperactivity disorder (ADHD): A systematic review
Abigail Taylor a, Shoumitro Deb b,*, Gemma Unwin c
a Milton Keynes Hospital NHS Foundation Trust, Standing Way, Eaglestone, Milton Keynes MK6 5LD, UK
b University of Birmingham, The Barberry-National Centre for Mental Health, 25 Vincent Drive, Edgbaston, Birmingham B15 2FG, UK
c University of Birmingham, School of Psychology, Edgbaston, Birmingham B15 2TT, UK
Contents
1. Introduction
1.1. Scale development and items
1.2. Type of scales
1.3. Study quality
1.4. Comparison groups
1.5. Objectives
2. Methods
2.1. Psychometric properties
2.1.1. Reliability
2.1.2. Validity
3. Results
3.1. Study quality
Research in Developmental Disabilities 32 (2011) 924–938
ARTICLE INFO
Article history:
Received 16 November 2010
Received in revised form 20 December 2010
Accepted 27 December 2010
Available online 12 February 2011
Keywords:
Attention deficit hyperactivity disorder
Rating scales
Adults
ABSTRACT
Attention deficit hyperactivity disorder (ADHD) is prevalent in the adult population. The
associated co-morbidities and impairments can be relieved with treatment. Therefore,
several rating scales have been developed to identify adults with ADHD who may benefit
from treatment. No systematic review has yet sought to evaluate these scales in more
detail. The present systematic review was undertaken to describe the properties, including
psychometric statistics, of the currently available adult ADHD rating scales and their
scoring methods, along with the procedure for development. Descriptive synthesis of the
data is presented and study quality has been assessed by an objective quality assessment
tool. The properties of each scale are discussed to make judgements about their validity
and usefulness. The literature search retrieved 35 validation studies of adult ADHD rating
scales and 14 separate scales were identified. The majority of studies were of poor quality
and reported insufficient detail. Of the 14 scales, the Conners’ Adult ADHD Rating Scale and
the Wender Utah Rating Scale (short version) had more robust psychometric statistics and
content validity. More research into these scales, with good quality studies, is needed to
confirm the findings of this review. Future studies of ADHD rating scales should be
reported in more detail so that further reviews have more support for their findings.
© 2010 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +44 0 121 414 7130; fax: +44 0 121 301 2351.
E-mail addresses: [email protected] (A. Taylor), [email protected] (S. Deb), [email protected] (G. Unwin).
0891-4222/$ – see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.ridd.2010.12.036
3.2. Scale development and items
3.3. Type of scales
3.4. Completion methods
3.5. Scoring methods
3.6. Detailed explanation of study methodology
3.7. Representative population
3.8. Comparison groups
3.9. Psychometric properties
3.9.1. Reliability
3.9.2. Validity
4. Discussion
Acknowledgements
References
1. Introduction
Attention deficit hyperactivity disorder (ADHD) is a developmental disorder of childhood onset. Functional impairments associated with ADHD include poor work and school performance (Daley, 2006). Adults with ADHD have a high rate of psychiatric co-morbidity, particularly substance abuse and antisocial personality disorder (Mannuzza, Klein, & Bessler, 1998; Weiss, Hechtman, Milroy, & Perlman, 1985; Zwi & York, 2004).
The Diagnostic and Statistical Manual 4th edition (DSM-IV; American Psychiatric Association; APA, 1994) provides clinical criteria for diagnosis of ADHD. It is estimated that around 3–5% of children and about 1–3% of the adult population have ADHD (Fayyad et al., 2007; Kessler et al., 2006; Polanczyk & Rohde, 2007). A meta-analysis of follow-up studies concluded that in up to 65% of children with ADHD, symptoms and impairments may persist into adulthood (Biederman, Faraone, & Mick, 2006). A systematic review of 37 intervention studies found that psychostimulants and some antidepressants have a beneficial effect on adult ADHD symptoms (Wilens, Spencer, & Biederman, 2001).
Rating scales have been shown to be useful in identifying and screening for ADHD in children (Daley, 2006). Several scales have been developed and validated specifically to identify adults with ADHD, as the disorder has a different symptom profile to childhood ADHD (Biederman et al., 2006; Zwi & York, 2004).
The accuracy of a rating scale can be affected by its content validity, including the types of items included in the scale and the method of completion. A valid rating scale should be well designed and fit for purpose. Below, we describe criteria which could be used to assess the usefulness and validity of rating scales.
1.1. Scale development and items
Clinically valid scales are likely to be those based on standard diagnostic criteria (for example the DSM-IV criteria) in combination with other criteria that are specific to the adult population. The DSM-IV criteria were developed in field trials studying children with ADHD and were validated in a large group of children (American Psychiatric Association, 1994; Mannuzza, 2003). Studies which have used the DSM-IV criteria in adults have found that they can also be useful for assessment of adult ADHD (Mannuzza et al., 1998; Polanczyk & Rohde, 2007; Weiss et al., 1985). However, exclusive use of the DSM-IV criteria could restrict the evaluation of adult ADHD to a limited symptom list. Follow-up studies, such as that by Biederman et al. (2006), show that inattentive symptoms persist more into adulthood than hyperactivity and impulsivity. Compared with children, adults face more social situations that increase the potential for impairment to manifest, such as in the workplace, at home, and in friendships and marriages.
1.2. Type of scales
Assessment of current symptoms is important to demonstrate that the patient currently suffers from impairment. However, as ADHD is a developmental disorder, in order for an adult to receive a DSM-IV diagnosis of ADHD, there needs to be evidence of presence of ADHD symptoms in childhood (American Psychiatric Association, 1994). As the patient is required to recall symptoms and behaviour from as young as 5 years of age, recall bias could affect the reliability of retrospective scales, particularly with ADHD patients (Mannuzza, Klein, Klein, Bessler, & Shrout, 2002). Neuropsychological tests such as the Wechsler Memory Scale, where participants are asked to remember as much as they can about a short story they have been read, show that ADHD adults have impaired short term and long term memory recall (Pollak, Kahana-Vax, & Hoofien, 2008). This will affect the patient’s recall of both childhood and adulthood symptoms.
The relationship between self and informant symptom ratings, and the accuracy of these reports, are unclear. There may be several reasons for the discrepancies between self and informant reports. Informant reports may be unreliable. Parents may be unaware of delinquent behaviour in youth, as shown by Du Paul et al. (2001). Belenduik, Clarke, Chronis, and Raggi (2007) suggested that patients with ADHD may conceal symptoms from friends, family and co-workers to “get on with their lives” and not jeopardise their jobs. They suggested that inattentive symptoms were the easiest to conceal, perhaps explaining why inattentive symptoms have the greatest reporting discrepancy. Informants may be unaware of more internal symptoms such as emotional problems.
Conversely, self reports may be inaccurate as adults with ADHD may be unaware of externally manifested symptoms, such as fidgeting, as these have become a natural part of their behaviour. In the same way, adults may have adapted to their symptoms and therefore do not feel they are problematic. As self and informant ratings of symptoms can differ, good quality scales should have both informant and self rated versions for assessment of both adult and childhood symptoms (Barkley, Fischer, Smallish, & Fletcher, 2002).
1.3. Study quality
The study methods should be reported in sufficient detail to allow a judgement on study quality, including how samples were selected and how the tests were administered. The Quality Assessment for Diagnostic Accuracy Studies (QUADAS; Whiting, Rutjes, Reitsma, Bossuyt, & Kleijnen, 2003) can be used to judge study quality. The QUADAS stipulates that studies should include sufficient detail regarding the study methods, including how samples were selected. Samples should be representative of the population that the scale is intended for and there should be comparison groups. Scales should be compared against the gold standard reference test. Study samples should be sufficiently large to provide statistical power.
1.4. Comparison groups
Ideally, scale validation studies should include a population based sample of adults clinically diagnosed with ADHD and a matched control group of participants taken from the same population of adults who do not have a diagnosis of ADHD. Samples should include those with co-morbidities such as substance abuse and depression. Mehringer et al. (2002) suggested that symptoms of cocaine withdrawal in patients with substance abuse could mimic ADHD. Patients with psychiatric conditions, without ADHD, tend to score highly on ADHD rating scales (McCann, Scheele, Ward, & Roy-Byrne, 2000; Mehringer et al., 2002; Ward, Wender, & Reimherr, 1993). Therefore, it is important to determine how well scales perform in these populations in terms of their discriminant validity. Comparison groups may reduce the effects of confounding variables on scale scores.
Gender can affect ADHD scale scores independent of disease status. For instance, delinquency symptoms are reported more often in males than females (Conners, Erhardt, Epstein, et al., 1999; Erhardt, Epstein, Conners, Parker, & Sitarenios, 1999; Young, 2004). However, some studies describe no differences between male and female symptom report (Belenduik et al., 2007; Heiligenstein, Conyers, Berns, & Smith, 1998; Mancini, Ameringen, Oakman, & Figueiredo, 1999; Ward et al., 1993). Younger people report more symptoms than older people (Conners, Erhardt, & Sparrow, 1999; Heiligenstein et al., 1998; Murphy & Barkley, 1996; Solanto, Etefia, & Marks, 2004; Young, 2004). It is not clear whether this is a cohort effect, in that people of one generation do not self report “hyperactivity” in childhood, for instance, as it was perceived to be normal, or whether it is due to the attrition of childhood memories. ADHD symptoms may also decline with age, in disproportion to impairment (Biederman et al., 2006; Mannuzza, 2003; Weiss et al., 1985).
1.5. Objectives
Although there are a number of literature reviews available on scales for screening for ADHD in adults (Adler, Shaw, Sitt, Maya, & Ippolito, 2009; Faraone & Antshel, 2008; Murphy & Adler, 2004), no systematic reviews have been published. Therefore, we have carried out a systematic review in order to identify and analyse all studies validating rating scales used to identify or screen for adults with ADHD.
2. Methods
The review protocol was developed in accordance with guidelines and advice from the Centre for Reviews and Dissemination (CRD), UK (Khan & Kleijnen, 2001). Suitable papers were identified by searching four online medical journal databases, namely MEDLINE (1950 – June 2010), CINAHL (1981 – June 2010), EMBASE (1980 – June 2010) and PsycINFO (1967 – June 2010). The terms used to search each database are included in Appendix A.
Search results were documented in a Reference Manager 11 database. The software was used to search for and remove duplicated articles. Titles were then scrutinised in order to remove obviously inappropriate articles according to the inclusion and exclusion criteria (see Table 1).
Two researchers (AT & GU) independently applied inclusion and exclusion criteria to the abstracts and then to the full text articles. Pre-piloted inclusion/exclusion forms were used to document and guide this process. The primary reviewer (AT) completed data extraction on the final selection of articles using pre-piloted data extraction forms. The properties of the studied scales, sample demographics and the study findings were extracted. A second reviewer (GU) independently conducted data extraction on five of these articles. The articles for dual data extraction were chosen by a consensus decision with a third reviewer (SD) as these were thought to be the most salient articles. Data synthesis was descriptive. The findings from each included study were tabulated to describe the scales and assess the psychometric properties of each scale. Study quality was assessed using the QUADAS.
2.1. Psychometric properties
Psychometric statistics objectively demonstrate the reliability and validity of scales. Reliable scales are consistent and reproducible. Valid scales truthfully measure the underlying concept that they are designed to measure, in this case, ADHD.
2.1.1. Reliability
One of the main statistics used to assess reliability is internal consistency. Internal consistency demonstrates how well the items in a scale are related, and that all items in a subscale measure the same concept. Cronbach’s alpha is one measure of internal consistency and the minimal acceptable level is 0.7 (Field, 2005). Split-half reliability has also been used to assess internal consistency, as it measures the correlation between two halves of a scale. The concordance between patient and informant scores shows how well informant ratings agree with self ratings. Although this is related to inter-rater reliability, it is not the same, as the patient is completing the scale about themselves. This can be measured by Cohen’s kappa. Cohen’s kappa of 0.6–0.74 denotes good agreement and 0.75 and upwards is excellent (Field, 2005).
Pearson’s correlation is also used to assess patient-informant concordance. Intra-class correlation coefficient (ICC) is also used to assess concordance. Test–retest reliability demonstrates the stability of scale measurements over time and is often assessed in normative samples, as it is assumed that scores would be stable, as there is no pathology present.
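As an illustration of the reliability statistics described above, the following is a minimal Python sketch of Cronbach’s alpha and Cohen’s kappa. It is not taken from any of the reviewed studies; the function names and the data layout (one list of item scores per respondent, one categorical label per case for each rater) are assumptions made for the example.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item responses per respondent."""
    k = len(item_scores[0])                      # number of items in the (sub)scale
    columns = list(zip(*item_scores))            # regroup the scores item by item
    item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var / total_var)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases (e.g. self vs informant)."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies.
    expected = sum(
        (rater_a.count(lab) / n) * (rater_b.count(lab) / n) for lab in labels
    )
    return (observed - expected) / (1 - expected)
```

With hypothetical data, two items that rise and fall together across respondents give an alpha of 1, and two raters who agree on three of four cases with evenly split marginals for one rater give a kappa of 0.5, which falls below the 0.6 threshold for good agreement quoted above.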
2.1.2. Validity
A reliable scale is not necessarily a valid scale. Even if scores on scales are consistent and reproducible, scales are not useful unless they measure the underlying pathology adequately. Factor analysis can be considered part of validation as it demonstrates that items in, for instance, a hyperactivity subscale, measure only hyperactivity and not other concepts. Factor variance, as a percentage, shows to what extent score variation is due to actual differences in pathology rather than chance.
Construct validity demonstrates that a scale measures the underlying construct of ADHD and does not measure unrelated constructs. Scores on ADHD scales should correlate with scores on general psychiatric scales. However, it is important to ensure that ADHD rating scales are not merely assessing general psychopathology. Therefore, the correlation should not be too great.
Concurrent validity demonstrates how well scale ratings agree with a gold standard such as the DSM-IV diagnostic interview. This is arguably one of the more important aspects of scale validation. There is little point in using a rating scale which has no correlation with the gold standard assessment, as this scale would be invalid. Cohen’s kappa, Pearson’s correlation and ICC can be used to measure concurrent validity.
Sensitivity and specificity are related to concurrent validity, but provide more information about the accuracy of the scale. Sensitivity shows how well scales identify true cases and specificity shows how well scales identify true non-cases. Total classification accuracy (TCA) is a measure of the overall diagnostic accuracy of the scale, and shows the percentage of both cases and non-cases correctly diagnosed by the scale.
Positive predictive value (PPV) is the proportion of those who screen positive who actually have the disease, and negative predictive value (NPV) is the proportion that screen negative that are true non-cases.
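The accuracy statistics defined above all follow from the four cells of a 2 × 2 table that compares the scale result with the gold standard diagnosis. The Python sketch below shows one way to compute them; the function name and the counts in the usage note are hypothetical and are not drawn from the reviewed studies.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy statistics from a 2x2 table of scale result vs gold standard.

    tp: screen positive, true case       fp: screen positive, true non-case
    fn: screen negative, true case       tn: screen negative, true non-case
    """
    return {
        "sensitivity": tp / (tp + fn),           # true cases correctly identified
        "specificity": tn / (tn + fp),           # true non-cases correctly identified
        "tca": (tp + tn) / (tp + fp + fn + tn),  # total classification accuracy
        "ppv": tp / (tp + fp),                   # screen positives who are true cases
        "npv": tn / (tn + fn),                   # screen negatives who are true non-cases
    }
```

For example, with 45 true positives, 20 false positives, 5 false negatives and 30 true negatives, sensitivity is 0.9, specificity is 0.6 and TCA is 0.75. Unlike sensitivity and specificity, PPV and NPV shift with the proportion of true cases in the sample.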
Receiver operating characteristics assess the accuracy of scales with continuous variables. A graph of sensitivity against 1 − specificity gives the area under the curve (AUC), which can demonstrate the accuracy of a scale.
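A minimal sketch of how an AUC might be obtained from continuous scale scores is given below. It assumes that higher scores indicate ADHD and that a gold standard diagnosis is available for every participant; it sweeps each observed score as a cut-off and applies the trapezoidal rule. The function and argument names are illustrative only.

```python
def roc_auc(scores, has_adhd):
    """Area under the ROC curve, sweeping every observed score as a cut-off."""
    positives = sum(1 for d in has_adhd if d)
    negatives = len(has_adhd) - positives
    points = [(0.0, 0.0)]                        # (1 - specificity, sensitivity)
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, d in zip(scores, has_adhd) if s >= cutoff and d)
        fp = sum(1 for s, d in zip(scores, has_adhd) if s >= cutoff and not d)
        points.append((fp / negatives, tp / positives))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2        # trapezoid between adjacent points
    return area
```

An AUC of 1 corresponds to scores that separate cases from non-cases perfectly, whereas 0.5 corresponds to chance-level discrimination.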
3. Results
Fig. 1 shows the results of the search at each stage in terms of the numbers of identified articles. Thirty-five validation studies were identified for review. A summary of each of the 14 adult ADHD scales identified is presented in Table 2. The characteristics of these are shown in Table 3.
3.1. Study quality
Only the test procedures and the scale properties were well described in all studies. Recruitment methods, how indeterminate results were dealt with, and withdrawals from the study were not well explained in nearly all of the studies. Only one study stated whether or not scale scores were blinded from interviewers and vice versa (Kooij et al., 2004).
Table 1
Inclusion and exclusion criteria.
Inclusion criteria:
Study design: studies investigating a structured symptom or behaviour based scale, of childhood or current symptoms, for diagnosis, screening or identification of ADHD in adults
Participants: adults (18 years and over) with ADHD
Outcomes: psychometric properties of rating scales (validity, reliability, factor analysis, sensitivity, specificity, internal consistency, etc.)
Exclusion criteria:
Publication: Foreign language studies
Scales: neuropsychological functioning scales/tests or quality of life scales. Scales assessing personality traits
Two studies had unrepresentative samples, namely, healthy college/university students or an all female population (Belenduik et al., 2007; Young, 2004). A further 17 provided insufficient detail to determine sample representativeness.
Many studies used small samples of fewer than 100 participants, except for selected studies of the Wender Utah Rating Scale (WURS), Current Symptoms Scale (CSS), Conners’ Adult ADHD Rating Scale (CAARS), Young Adult Rating Scale (YARS), Attention Deficit Scales for Adults (ADSA), ADHD Rating Scale (ADHD-RS) and the Caterino Scale. Only Rossini and O’Connor (1995) undertook a power calculation before recruitment.
3.2. Scale development and items
Scale items are based on ADHD symptoms, behaviours and difficulties. The Assessment of Hyperactivity and Attention (AHA), CSS, ADHD-RS, Adult Self Report Scale (ASRS-18) and Symptom Inventory (SI) are based entirely on the DSM-IV ‘A’ criteria. These scales contain the 18 DSM-IV ‘A’ criteria which have been reworded so that they can be included in a rating scale. The Adult Rating Scale (ARS) contains 25 items, based on the DSM-III-R criteria. During the development of the ASRS, several items were identified as ADHD symptoms which could not be mapped onto the DSM-IV criteria, and were therefore discarded (Adler et al., 2006).
Other scales have attempted to mitigate the potential restrictions of DSM-IV by combining these items with other criteria or developing an entirely new set of criteria. The ADSA and Adult Problems Questionnaire (APQ) items were developed from interviews of ADHD adults. The CAARS items were developed from childhood rating scales using the DSM-IV criteria and the Utah criteria. The WURS was developed using the Utah criteria. Similarly the Brown Attention Deficit Disorder Scales (BADDS) items are based on DSM-IV and the author’s own published studies of ADHD. The YARS uses 17 of the 18 DSM-IV ‘A’ criteria and includes seven items of the authors’ own, which they considered to tap into educational difficulties. The Young Adult Questionnaire (YAQ) items were chosen from a literature review of adult ADHD symptoms (Young, 2004).
3.3. Type of scales
Ten of the scales assess current symptoms. Four of the scales (WURS, YAQ, ADHD-RS and AHA) either enquire about childhood symptoms separately or use the same scale items to retrospectively assess childhood symptoms, in the same way that the DSM-IV criteria are used to diagnose both adults and children.
All of the scales were designed for adults to report their own symptoms. However, six scales (CAARS, YAQ, ADHD-RS, CSS, AHA, and WURS) derive an informant version from the self report symptoms, to be completed by a spouse or co-worker for current symptoms, or a parent or teacher, if available, for childhood symptoms.
However, studies have shown a lack of concordance between self and informant reports of symptoms (Fossati et al., 2001; Zucker, Morris, Ingram, Morris, & Bakeman, 2002). Informants tended to report fewer inattentive symptoms. Similar results were found by Zucker et al. (2002) and Fossati et al. (2001). Conversely, Murphy and Schachar (2000) found no difference between informant and self report symptoms on a brief ADHD questionnaire. In all of these studies, it is unclear whose report is more accurate, as no objective measures, such as an assessment of attention span, were conducted.
[Flow chart: MEDLINE 657, EMBASE 721, CINAHL 45 and PsycInfo 476 records (1899 across all databases); 582 duplicates removed, leaving 1317; 1240 excluded on title, leaving 77 abstracts obtained; 33 excluded on abstract, leaving 44 full texts obtained; 12 excluded on full text, leaving 32; 1 excluded on full text after discussion, leaving 31 included; 4 cross-references identified, giving 35 studies.]
Fig. 1. Summary of the search process at each stage.
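The counts reported in Fig. 1 can be checked for internal consistency with simple arithmetic. The sketch below is illustrative only; the function and key names are hypothetical, and the numbers in the usage note are those shown in the figure.

```python
def search_flow(database_hits, duplicates, excluded_title, excluded_abstract,
                excluded_full_text, excluded_after_discussion, cross_references):
    """Recompute the number of articles remaining at each stage of the search."""
    total = sum(database_hits.values())
    after_duplicates = total - duplicates
    abstracts = after_duplicates - excluded_title
    full_texts = abstracts - excluded_abstract
    after_full_text = full_texts - excluded_full_text
    included = after_full_text - excluded_after_discussion
    return {
        "all_databases": total,
        "after_duplicates": after_duplicates,
        "abstracts_obtained": abstracts,
        "full_texts_obtained": full_texts,
        "after_full_text_exclusion": after_full_text,
        "included": included,
        "studies": included + cross_references,
    }
```

Called with the database counts 657, 721, 45 and 476 and the exclusions 582, 1240, 33, 12 and 1, plus 4 cross-references, the function reproduces the figure’s stage totals of 1899, 1317, 77, 44, 32, 31 and 35.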
3.4. Completion methods
In all but one of the scales, symptom frequency is rated on a Likert scale, from never/very rarely, to always/very often. The language used for each scale is slightly different; a score of two on one scale does not necessarily equate with a score of two on another. Nine of the scales are based on a four point (0–3) Likert scale, two are five point (0–4), one is five point (1–5) and one is an eight point Likert scale (0–7). The AHA is the only scale which is scored differently. Items are worded as they appear in the DSM-IV, for example “I often feel restless”, and the patient merely answers “yes” or “no”.
3.5. Scoring methods
Several different scoring methods have been employed in these scales. The main scoring method is a symptom count. Where patients have a clinically significant symptom occurrence, for instance they answered very often or often (usually 2 or 3), this is scored as a positive symptom. The patient scores one point for each positive symptom. In the case of the DSM-IV criteria based scales, this gives a maximum total score of 18 (9 per subscale). A continuous scoring method involves summation of the actual item responses; for instance, if a patient has answered two to all 18 questions, they will receive a score of 36.
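The two scoring methods just described can be sketched in a few lines of Python. The function names are hypothetical, and the default threshold of 2 assumes a 0–3 Likert scale on which responses of 2 or 3 count as a positive symptom, as in the example above.

```python
def symptom_count(responses, threshold=2):
    """Score one point for each item rated at or above the threshold."""
    return sum(1 for r in responses if r >= threshold)


def continuous_score(responses):
    """Sum the actual item responses."""
    return sum(responses)


# A patient who answers 2 to all 18 DSM-IV based items.
responses = [2] * 18
count = symptom_count(responses)      # every item counts as a positive symptom
total = continuous_score(responses)   # 18 items x 2 points
```

Here the symptom count is 18, the maximum for an 18-item scale, and the continuous score is 36, matching the worked example in the text.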
Alternatively, a continuous scoring method has been used either exclusively or as a complement to the symptom count method. The patient’s actual item responses are summed (e.g. 2 + 3 + 4). This can be a more useful scoring method as the score is compared to means for that particular population, which gives the score a population specific context. All of these scoring methods take a clinically significant score to be 1.5 standard deviations (SD) above the presented mean. This is a commonly used cut-off point for rating scales, although 2 SD above the mean has also been used. The CSS and ADHD-RS have
Table 2
Studies included in the systematic review.
Scale (date when the scale was first published): Studies
Wender Utah Rating Scale (WURS) (Ward et al., 1993): (1) Ward et al. (1993); (2) Stein et al. (1995); (3) Rossini and O’Connor (1995); (4) Weyandt et al. (1995); (5) Mancini et al. (1999); (6) McCann et al. (2000); (7) Fossati et al. (2001); (8) Wierzbicki (2005); (9) Belenduik et al. (2007)
Adult Rating Scale (ARS) (Weyandt et al., 1995): (4) Weyandt et al. (1995); (10) McCann and Roy-Byrne (2004)
Current Symptoms Scale (CSS) (Barkley & Murphy, 1998): (11) Murphy and Barkley (1996); (12) Heiligenstein et al. (1998); (13) O’Donnell et al. (2001); (14) Zucker et al. (2002); (15) Aycicegi et al. (2003); (33) Katz, Petscher, Welles, and Welles (2009)
Conners’ Adult ADHD Rating Scale (CAARS) (Conners, Erhardt, & Sparrow, 1999): (16) Conners, Erhardt, Epstein, et al. (1999); (17) Erhardt et al. (1999); (18) Solanto et al. (2004); (19) Cleland, Magura, Foote, Rosenblum, and Kosanke (2006); (9) Belenduik et al. (2007); (20) Kooij et al. (2008); (34) Adler et al. (2008)
Adult Problems Questionnaire (APQ) (De Quiros & Kinsbourne, 2001): (21) De Quiros and Kinsbourne (2001)
Young Adult Rating Scale (YARS) (Du Paul et al., 2001): (22) Du Paul et al. (2001)
Assessment of Hyperactivity and Attention (AHA) (Mehringer et al., 2002): (23) Mehringer et al. (2002)
Attention Deficit Scales for Adults (ADSA) (Triolo & Murphy, 1996): (24) West, Mulsow, and Arredondo (2003); (10) McCann and Roy-Byrne (2004); (25) Dowson et al. (2004); (26) West, Mulsow, and Arredondo (2007)
ADHD Rating Scale (ADHD-RS) (Du Paul, Power, Anastopoulos, & Reid, 1998): (27) Kooij et al. (2004); (20) Kooij et al. (2008)
Brown Attention Deficit Disorder Scales (BADDS) (Brown, 1996): (18) Solanto et al. (2004); (20) Kooij et al. (2008)
Symptom Inventory (SI) (McCann & Roy-Byrne, 2004): (10) McCann and Roy-Byrne (2004)
Young Adult Questionnaire (YAQ) (Young, 2004): (28) Young (2004)
Adult Self Report Scale (ASRS) (Adler et al., 2006): (29) Kessler et al. (2005); (30) Adler et al. (2006); (31) Reuter et al. (2006); (32) Kessler et al. (2007)
Caterino Scale (Caterino et al., 2009): (35) Caterino et al. (2009)
The articles highlighted in bold text are those retrieved through cross-references and not in the original database search.
Table 3
Summary of characteristics of adult ADHD scales.

1a. WURS-61 (Long)
Type of scale: Self report of childhood and current symptoms
Items: 61
Completion method: Symptom frequency is rated on a 5 point Likert scale (0–4)
Scoring method: Sum actual answer responses (score 0, 1, 2, 3 or 4) per item
Cut off scores: No cut off scores have been reported owing to the weaker psychometric properties compared with the 25-item scale
Score range: 0–244
Scale development: Items were taken from Wender’s “Minimal Brain Dysfunction in Children”

1b. WURS-25 (short)
Type of scale: Self and informant report of childhood and current symptoms
Items: 25
Completion method: Symptom frequency is rated on a 5 point Likert scale (0–4)
Scoring method: Sum actual item responses (score 0, 1, 2, 3 or 4) per item
Cut off scores: >36 if depression is present; >46 if depression is absent
Score range: 0–100
Scale development: 25 items from long WURS which had the highest mean difference between ADHD and non-ADHD participants. The higher cut off score in depression was based on a study using the scale in this population

2. ARS
Type of scale: Self report of current symptoms
Items: 25
Completion method: Symptom frequency is rated on a 4 point Likert scale (0–3)
Scoring method: Sum actual item responses (score 0, 1, 2 or 3) per item
Cut off scores: 31
Score range: 0–75
Scale development: Items derived from DSM-III-R criteria. Designed to be a similar format to the original children’s ADHD-RS

3. CSS
Type of scale: Self report of current symptoms
Items: 18 (9 inattention + 9 hyperactivity/impulsivity)
Completion method: Symptom frequency is rated on a 4 point Likert scale (0–3)
Scoring method: Score 1 for a positive symptom rating (answered 2 or 3 to an item). Also, continuous scoring where actual item responses are summed (as for the ARS and WURS)
Cut off scores: 6/9 on one or both subscales, or 1.5 SD above the mean total score for age/sex
Score range: 0–18 (symptom rating); 0–54 (summed)
Scale development: Taken directly from the 18 DSM-IV ‘A’ criteria. Has also been used for retrospective childhood symptom report

4a. CAARS (Long)
Type of scale: Self and informant report of current symptoms
Items: 66 (42 items in 4 subscales + 18 DSM-IV items + 12 item ADHD index). Some items tap into more than 1 subscale
Completion method: Symptom frequency is rated on a 4 point Likert scale (0–3)
Scoring method: Actual responses are entered onto a scoring sheet. T scores are then obtained from the scoring sheet as per age and gender. T scores are then compared to the normative T value (T = 50)
Cut off scores: T > 65
Score range: 0–198 (T = 0–100)
Scale development: Developed 93 items from children’s rating scale and Utah criteria in 9 domains. After factor analysis of these 93 items, 42 were chosen. Gender and age specific scores were obtained. An inconsistency index is included to ensure that the scale was completed honestly. T scores between 50 and 65 are considered borderline and require interpretation by a trained clinician

4b. CAARS (Short)
Items: 26 (20 items in 4 subscales + 12 item ADHD index). Some items tap into both subscales
Score range: 0–78 (T = 0–100)
Scale development: 20 items were selected from the 42 subscale items in the long CAARS that discriminated ADHD the best

5. APQ
Type of scale: Self report of current symptoms
Items: 43
Completion method: Symptom frequency is rated on a 4 point Likert scale (0–3)
Scoring method: Sum responses on individual items (0–3) and divide by the number of items
Cut off scores: No scale specific cut offs are presented. However, a score of 2.5/3 on 3 of the items may be used
Score range: 0–3
Scale development: Pool of common symptoms was identified by ADHD adults. Items which tapped into the Utah criteria and DSM-IV were chosen
A.
Ta
ylo
ret
al./R
esearch
inD
evelo
pm
enta
lD
isab
ilities3
2(2
01
1)
92
4–
93
89
30
6. YARS Self report
of current
symptoms
24 (17 out of the
18 DSM-IV criteria
+ 7 items addressing
difficulties
encountered
at college)
Symptom frequency is
rated on a 4 point
Likert scale (0–3)
Score 1 for a positive
symptom rating
(answered 2 or 3 to an
item)
No scale-specific cut off
scores are presented
0–24 The investigators constructed the scale based
on 17 DSM-IV criteria. Included 7 items that
the authors considered to reflect specific
difficulties encountered by ADHD adults in
college or university. A cut-off score of 1.5 SD
above the mean may be used though the
authors do not include scale-specific cut-off
scores
7. AHA Self and
informant
report of
current and
childhood
symptoms
18 (2 subscales of
9 inattention
+ 9 hyperactivity/
impulsivity items)
items from DSM-IV
Symptoms are rated
‘Yes’ or ‘No’ as to
whether or not they
were present in
childhood and
adulthood
Score 1 for a positive
symptom rating
(answered 2 or 3 to an
item)
4/9 adult symptoms + 6/
9 childhood symptoms
(on one or both
subscales)
0–18 The items were taken directly from the DSM-
IV criteria. Both childhood and adulthood
symptoms on the AHA are required for a
diagnosis of ADHD
8. ADSA Self report
of current
symptoms
54 Symptom frequency is
rated on a 5 point
Likert scale (1–5)
Actual answers are
entered onto a scoring
sheet. T scores are then
obtained from the
scoring sheet for each
subscale and the total.
T scores are then
compared with the
normative T values
(T = 50)
If the Total T >60
(total = 161), patient is
likely to have ADHD
If T >70 (total = 181),
highly likely to have
ADHD
54–270
(T = 0–100)
Interviewed adults with attention problems.
The authors used this information to construct
9 subscales based on their clinical experience
(not factor analysis), namely attention,
interpersonal, disorganisation, co-ordination,
academic theme, emotive, long term,
childhood, and negative social. Additionally,
an inconsistency index is included
9. ADHD-RS Self and
informant
report of
current and
childhood
symptom
18 adult items
(2 subscales;
9 inattention
+ 9 hyperactivity)
+ 3 childhood items
Symptom frequency is
rated on a 4 point
Likert scale (0–3)
Score 1 for a positive
symptom rating
(answered 2 or 3 to an
item). Also continuous
scoring where actual
item responses are
summed.
4/9 adult symptoms on
one or both
subscales + 3/3 on
childhood items
1.5 SD above age group
mean (continuous)
0–18 (symptom
counts). 54
(summed)
Adaptation of children’s ADHD-RS which was
taken from the DSM-IV criteria. Both the child
and adult symptoms on the ADHD-RS are
required for a diagnosis of ADHD
10. BADDS Self report of
current
symptoms.
40 items in
5 subscales.
Symptom frequency is
rated on a 4 point
Likert scale (0-3).
Actual answers are
entered onto a scoring
sheet. T scores are then
obtained from the
scoring sheet for each
subscale and the total
T = 50 0–120
(T = 0–100)
Items are based on DSM-IV criteria and the
author’s own observations of ADHD from
several published studies. Scale was piloted
and the data were published in the manual.
Only one cut off as scores did not vary by age
or gender. Five subscales are organisation/
work, attention, energy/effort, mood, and
memory
11. SI Self report of
current
symptoms
18 (2 subscales;
9 inattention
+ 9 hyperactivity/
impulsivity)
Symptom frequency is
rated on a 4 point
Likert scale (0–3)
Score 1 for a positive
symptom rating
(answered 2 or 3 to an
item)
6/9 on one or both
subscales
0–18 The authors used the DSM-IV criteria to
develop a scale for use in their own ADHD
clinic
12. YAQ Self report of
childhood
symptoms
112 (4 subscales) Symptom frequency is
rated on an 8 point
Likert scale (0–7)
Sum responses (score
1–8 per item), divide
by the number of items
in that subscale
None presented. Can use
1.5 SD above presented
mean
1–8 per subscale The author conducted a literature review of
ADHD symptoms which were likely to
diagnose ADHD and co-morbid factors. Four
subscales are ADHD symptoms, emotional,
delinquency, and social
A.
Ta
ylo
ret
al./R
esearch
inD
evelo
pm
enta
lD
isab
ilities3
2(2
01
1)
92
4–
93
89
31
Table 3 (Continued )
Type of scale Items Completion method Scoring method Cut off scores Score range Scale development
13a. ASRS-18
(long)
Self report of
current symptoms
18 (2 subscales;
9 inattention
+ 9 hyperactivity)
Symptom frequency is
rated on a 5 point
Likert scale (0–4)
Score 1 for a positive
symptom rating
(answered 2, 3 or 4
to an item)
Also continuous
scoring where actual
item responses are
summed
9/18 across both
subscales
Or 21/36 on either
subscale
0–18 (symptom
count)
72 (summed)
An item pool of ADHD symptoms was
generated. Mapped onto DSM-IV criteria.
Psychiatrists chose the items which best fitted
the DSM-IV criteria
13b. ASRS-6
(short)
Self report of
current symptoms
6 Symptom frequency is
rated on a 5 point
Likert scale (0–4)
Score 1 for a positive
symptom rating
(answered 2, 3 or 4 to
an item)
Also continuous
scoring where actual
item responses are
summed
4/6
or 14/24
0–6 (symptom
count)
0–24 (summed)
Six items from the 18 item ASRS that had the
same strength of association with clinician
diagnosis as the 18 item ASRS. The six items
with the most stable psychometric properties
were chosen for the short ASRS
14. Caterino
Scale
Self/informant
+ Child/adult
18 (in 4 scales,
self report of
child and
adult symptoms)
Rated on a 3 point scale
(0–2) in 4 situations –
As a child, at work, at
home, in social
settings
Continuous scoring,
summation of actual
responses
None presented 0–144 Based on DSM-IV, psychologists chose
behaviours that best met the DSM-IV criteria.
Factor analysed in a group of children and
adults
Table 4
Psychometric statistics retrieved from the studies sorted by scale.
| Scale | Internal consistency α | Internal consistency SH | Inter-informant κ | Inter-informant r | Inter-informant ICC | Test–retest ICC | Test–retest r | Sensitivity* % | Specificity* % | TCA* % | PPV* % | NPV* % | AUC* | Con κ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| WURS-61 | 0.69–0.91 | – | – | – | – | 0.68 | 0.68–0.90 | – | – | – | – | – | – | – |
| WURS-25 | 0.86–0.92 | 0.35–0.90 | 0.72 | – | 0.88 | 0.74 | 0.62–0.98 | 96 | 96 | – | – | – | – | – |
| WURS-C+A | 0.95 | – | – | – | – | – | – | 73 | 58 | 64.5 | – | – | – | – |
| ARS | 0.89 | 0.86 | – | – | – | – | 0.80 | 92 | 33 | – | – | – | – | – |
| CSS-C | 0.75–0.91 | – | 0.30–0.31 | 0.55–0.57 | – | – | 0.82 | 22–43 | 96–100 | – | 76–100 | 32–71 | – | – |
| CSS-C+A | – | – | 0.32–0.35 | 0.56–0.65 | – | – | – | – | – | – | – | – | – | – |
| CAARS | 0.74–0.92 | – | – | – | – | – | 0.80–0.91 | 82 | 87 | 85 | 87 | 83 | – | 0.67 |
| APQ | – | – | – | – | – | – | – | 83 | 90 | – | – | – | – | – |
| YARS | 0.86 | – | – | – | – | – | – | – | – | – | – | – | – | – |
| AHA | – | – | – | – | – | – | – | 80–84 | 60–67 | 70 | 67 | 75 | 0.79 | 0.40 |
| ADSA | 0.70–0.93 | 0.92 | – | – | – | – | – | 58–81 | 46–94 | 71–83 | 78 | 87 | – | – |
| ADHD-RS | 0.76–0.88 | – | – | – | – | – | – | 71 | 67–77 | – | – | – | 0.72–0.76 | – |
| BADDS | 0.69–0.81 | – | – | – | – | – | – | 84–92 | 33 | 74 | 76 | 67 | – | – |
| SI | 0.91 | – | – | – | – | – | – | 78 | 54 | – | – | – | – | – |
| YAQ | 0.50–0.98 | – | – | 0.20–0.77 | – | – | – | – | – | – | – | – | – | – |
| ASRS-18 | 0.75–0.89 | – | – | – | – | – | – | 56 | 98 | 96 | 25–82 | 98.3 | 0.77 | 0.22–0.60 |
| ASRS-6 | 0.63–0.72 | – | – | – | – | – | 0.47–0.77 | 69–39 | 88–100 | 84–98 | 24–57 | 94–97 | 0.79–0.84 | 0.21–0.52 |
| Caterino Scale | 0.81–0.91 | – | – | 0.519–0.661 | – | – | – | 0.94 | 0.87 | – | 0.87 | 0.93 | – | – |

C: Childhood Symptoms; A: Adult Symptoms; α: Cronbach’s alpha; SH: Split half reliability; κ: Cohen’s kappa; r: Pearson’s correlation coefficient; ICC: intra-class correlation coefficient; TCA: total classification accuracy; PPV: positive predictive value; NPV: negative predictive value; AUC: area under the curve; Con: concurrent validity; *: at given cut-off scores (see Table 2).
age and gender dependent cut-off scores. This allows cut-off scores to be adjusted depending on the demographics of the patient being assessed.
Another scoring method, an extension of the continuous scoring method, is used in the CAARS, ADSA and BADDS, where T values are used to determine the clinical significance of a patient’s score. The score on each subscale and the total score are entered onto a graphical scoring sheet. Different sheets are used depending on the patient’s age and gender. This sheet then shows the corresponding T value for that patient’s score. On the CAARS, a T score greater than 65 is clinically significant. On the BADDS a T score of 50, and on the ADSA a T score of 60–70, may signify ADHD. As with the continuous scoring method, T values allow patients’ scores to be referenced to population means.
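The T value logic can be sketched in a few lines. This is only an illustration, assuming the conventional T score definition (mean 50 and standard deviation 10 in the reference group). The scales discussed here use printed scoring sheets rather than a formula, and the normative mean and SD below are hypothetical values chosen for the example, not figures from any manual.

```python
def t_score(raw_score, norm_mean, norm_sd):
    """Convert a raw scale score to a T score (reference mean 50, SD 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw_score - norm_mean) / norm_sd


def exceeds_cutoff(raw_score, norm_mean, norm_sd, t_cutoff):
    """Return True when the T score lies above the scale's T cut-off."""
    return t_score(raw_score, norm_mean, norm_sd) > t_cutoff


# Hypothetical norms for one age/gender band (not taken from any manual).
HYPOTHETICAL_MEAN = 20.0
HYPOTHETICAL_SD = 8.0

print(t_score(34, HYPOTHETICAL_MEAN, HYPOTHETICAL_SD))  # 67.5
# 65 is the CAARS threshold reported in the text.
print(exceeds_cutoff(34, HYPOTHETICAL_MEAN, HYPOTHETICAL_SD, t_cutoff=65))  # True
```

Because the norms are supplied per call, the same raw score can map to different T values in different demographic bands, which is the point of referencing scores to population means.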
3.6. Detailed explanation of study methodology
In 15 studies, it was not clear how participants were recruited from the population. It is unclear in some cases whether or not participants were excluded on the basis of psychiatric co-morbidity. All of the studies explained the test procedures in detail. The scales themselves were discussed in detail, particularly how the items were developed, what the items are and how they are scored.
3.7. Representative population
Healthy university populations, or similarly unrepresentative populations, were used to validate the scale in 16 of the studies. Whilst these samples could have been representative, the authors did not discuss whether or not this was the scale’s intended population, except in the case of the YARS, which was designed for use in university students (Du Paul et al., 2001). The study by Belenduik et al. (2007) was considered to have the most unrepresentative population as only females were included.
3.8. Comparison groups
Only four studies used matched ADHD and non-ADHD groups, matched by age and gender (Conners, Erhardt, & Sparrow, 1999; De Quiros & Kinsbourne, 2001; Dowson et al., 2004; O’Donnell, McCann, & Pluth, 2001). Nine studies used unmatched control groups (Caterino, Gomez-Benito, Balleurka, & Amador-Campos, 2009; Erhardt et al., 1999; Fossati et al., 2001; Kessler et al., 2007; McCann et al., 2000; Mehringer et al., 2002; Solanto et al., 2004; Ward et al., 1993; Young, 2004). These included controls with psychiatric co-morbidity, which allows evaluation of the effects of co-morbidities on scale scores.
Twenty-two studies did not have control comparison groups, e.g. an ADHD group and a non-ADHD group. Some of these studies did use mixed groups, i.e. some participants had ADHD and some did not, but they did not report separately on these groups and analyses covered the entire sample.
3.9. Psychometric properties
The psychometric properties of the scales (including different versions of a scale, such as the 25-item and 61-item versions of the WURS) are summarised in Table 4. For some scales, different values of psychometric statistics are reported in different studies and these are presented in Table 4. Some psychometric statistics were not available for some scales. Therefore, no data could be entered for these in Table 4.
3.9.1. Reliability
All but the APQ, AHA and CSS-C (childhood version) have internal consistency data. The WURS-C+A (childhood + adult versions) and the SI have the highest Cronbach’s alphas (>0.90). The ADSA has the highest split half reliability (>0.90).
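As a reminder of what the internal consistency figures measure, the following is a minimal sketch of Cronbach’s alpha using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The response matrix is hypothetical and the function name is illustrative; none of the reviewed studies is being reproduced here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: five respondents rating three items on a 0-3 scale.
hypothetical_responses = [
    [0, 1, 1],
    [1, 2, 2],
    [2, 2, 3],
    [3, 3, 3],
    [1, 1, 2],
]
print(round(cronbach_alpha(hypothetical_responses), 2))
```

Alpha approaches 1 when items rise and fall together across respondents, and equals 1 when every item is scored identically.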
Cohen’s kappa for concordance has been calculated for three scales. The WURS-25 has the highest Cohen’s kappa, at 0.72. The CSS-C+A has the highest Pearson’s concordance, at r = 0.56–0.65. Only one study used ICC; the WURS-25 has excellent concordance with an ICC of 0.88 (Ward et al., 1993).
Test–retest reliability was assessed over different time periods varying from one week to two months. Test–retest reliability assessed by ICC was calculated for two scales. The WURS-25 has a test–retest reliability ICC of 0.74. The WURS-61 has an ICC of 0.68. Six scales have test–retest reliability measured in Pearson’s coefficients. The CSS-C, ARS and CAARS have high test–retest reliability, with Pearson’s coefficients of at least 0.80.
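Patient-informant concordance expressed as Cohen’s kappa corrects raw agreement for the agreement expected by chance. A minimal sketch follows, assuming each rater supplies one categorical rating per item; the ratings are hypothetical and the function name is illustrative.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical symptom endorsements (1 = present, 0 = absent) for ten items.
patient = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
informant = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]
print(round(cohens_kappa(patient, informant), 2))  # 0.6
```

Here the raters agree on 8 of 10 items, chance agreement is 0.5, and kappa is (0.8 − 0.5) / (1 − 0.5) = 0.6.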
3.9.2. Validity
Factor analysis was undertaken for 11 scales. The WURS-61 has the highest variance explained by the five factor structure, at 71%. The next highest was the WURS-25, which has a factor structure variance of 60%. Many of the studies assessed construct validity and 10 of the scales have construct validity correlations presented. A wide range of different measures have been used.
Of the four scales that have concurrent validity data, the CAARS performed the best, with a Cohen’s kappa of 0.67. Cohen’s kappa for the ASRS-18 is low at 0.22–0.59, ICC is 0.84 and agreement between scale items and interview items is 43–72%. Cohen’s kappa is 0.40 for the AHA. For the ADSA, Pearson’s correlation between the total ADSA score and the DSM-IV score is 0.22–0.51.
Only 13 of the 17 scale versions have sensitivity scores. The WURS-25, ARS and BADDS have excellent sensitivity (>84%). The ASRS-18, CSS-A, WURS-25 and APQ have excellent specificity (>90%), and the CAARS and ASRS-6 have good specificity (87–88%).
Only seven scales have TCA calculated. The ASRS-18 has the highest TCA at 96%, followed by the CAARS (85%) and the ASRS-6 (84%).
Seven scales have PPV and NPV calculated. The CAARS has the highest PPV at 87%. The ASRS-18 and the ASRS-6 have an NPV greater than 90%. The CAARS and ADSA have an NPV greater than 80%.
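Sensitivity, specificity, TCA, PPV and NPV all derive from the same 2 × 2 comparison of scale result against reference diagnosis. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from any of the reviewed studies.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return accuracy statistics from counts of a 2 x 2 classification table.

    tp: scale positive, diagnosis positive    fp: scale positive, diagnosis negative
    fn: scale negative, diagnosis positive    tn: scale negative, diagnosis negative
    """
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("every row and column of the table must be non-empty")
    return {
        "sensitivity": tp / (tp + fn),        # cases correctly identified
        "specificity": tn / (tn + fp),        # non-cases correctly identified
        "ppv": tp / (tp + fp),                # positives that are true cases
        "npv": tn / (tn + fn),                # negatives that are true non-cases
        "tca": (tp + tn) / (tp + fp + fn + tn),  # total classification accuracy
    }


# Hypothetical sample: 50 adults with ADHD and 50 without.
stats = diagnostic_accuracy(tp=41, fp=6, fn=9, tn=44)
for name, value in stats.items():
    print(f"{name}: {value:.0%}")
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of cases in the sample, so values from clinic samples and population samples are not directly comparable.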
AUC has only been calculated for four scales. Field (2005) states that AUC should be at least 0.89 but none of the studies reached this threshold. The AUC values for the ADHD-RS, ASRS-18, ASRS-6 and AHA are all between 0.72 and 0.84.
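One way to read an AUC value is as the probability that a randomly chosen person with ADHD scores higher on the scale than a randomly chosen person without it. The sketch below computes AUC directly from that definition by comparing every case with every control, counting ties as half; the scores are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the share of case/control pairs in which the case scores higher.

    Tied pairs count as half a win.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical total scale scores.
adhd_group = [30, 25, 18, 40]
comparison_group = [10, 18, 22, 5]
print(auc_from_scores(adhd_group, comparison_group))  # 0.90625
```

An AUC of 0.5 means the scale separates the groups no better than chance, and 1.0 means every case outscores every control.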
Overall, the WURS-25 has the best combination of psychometric properties, followed by the CAARS and the ASRS-18. However, the WURS-25 has only moderate split half reliability. The CAARS and ASRS-18 have moderate Cronbach’s alphas. Additionally, the ASRS-18 has only moderate sensitivity, positive predictive value and concurrent validity. The APQ, ARS and SI have some good psychometric properties but do not perform as well in other areas.
As can be seen from Table 4, the WURS-25 has a high internal consistency (0.86–0.92), the highest patient-informant concordance (Cohen’s kappa 0.72; ICC 0.88), a high test–retest reliability (ICC 0.74), a high factor structure variance (60%, showing that variation in score is explained by variation in pathology rather than due to chance), 85% sensitivity and >90% specificity at its given cut off. The CAARS and ASRS-18 also perform well, although they have only moderate internal consistencies. The APQ, ARS and SI have some good psychometric properties but perform poorly in other areas.
There are insufficient psychometric properties published for some scales. No internal consistency data are published for the CSS, APQ or the AHA. Test–retest reliability is only available for the WURS, CAARS, CSS and ASRS-6. Table 4 highlights the gaps in psychometric data published.
The CAARS and WURS are well designed scales with good content validity. Both scales have had factor analysis conducted; therefore, only relevant items are included. Both use a combination of items, including the Utah criteria and the DSM-IV criteria.
4. Discussion
The main findings of this study are discussed below. Whilst some of the scales are similar, each has its own particular strengths and weaknesses. By nature, the language used in rating scales can be vague (Barnes, Cerrito, & Levi, 2003). Likert scales ask patients to assess how “often” they manifest a symptom. However, there is no standardised reference point for “often.” For instance, one patient may consider often to be once a week, whilst their informant considers it to be once a day. Rating scales often give no advice to people completing them as to what “often” should mean.
No studies were excluded on the basis of study design or because they were of poor quality as only 35 studies were retrieved. It is possible that many of these studies would have been excluded based on the quality analysis. Studies were not assigned a quality score to “weight” their findings as this can be inappropriate for diagnostic accuracy studies, particularly where meta-analysis is not undertaken. Different methods for study weighting can produce very different conclusions (Whiting, Harbord, & Kleijnen, 2005).
Indeterminate results in these studies were likely to have been caused by incomplete or incorrectly completed scales. Some studies do report having incomplete scales and these results were removed from the analysis. However, the incomplete scales could provide useful information. For instance, where questions are left out, it could indicate they were poorly understood by the participants. No details were given about missing data or extra participants.
For the symptom count scoring method, a cut off of six of nine symptoms of either inattention or hyperactivity, as specified in DSM-IV, is used to identify adults with ADHD in many of the DSM-IV criteria based scales. This may be inappropriate for adults, as follow up studies show that whilst the number of ADHD adults with “clinically significant” persistent symptoms can be as low as 40%, as many as 90% of these adults still show significant impairment and may benefit from treatment (Biederman, Mick, & Faraone, 2000; Polanczyk & Rohde, 2007). Heiligenstein et al. (1998) found that a cut off of four DSM-IV symptoms was best at discriminating ADHD in adults. Other studies using the DSM-IV criteria based scales have used similarly lower cut off scores (Kessler et al., 2005; Kooij et al., 2004, 2008; Mehringer et al., 2002).
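The effect of lowering the symptom count cut off can be illustrated with a short sketch. It assumes the scoring convention summarised in Table 3 for the DSM-IV based scales: items rated 0–3, a rating of 2 or 3 counted as a positive symptom, and a positive screen when either nine-item subscale reaches the cut off. The responses are hypothetical.

```python
def symptom_count(item_responses, positive_from=2):
    """Count items rated at or above `positive_from` on a 0-3 frequency scale."""
    return sum(1 for response in item_responses if response >= positive_from)


def screens_positive(inattention, hyperactivity, cutoff):
    """Positive when either nine-item subscale reaches the symptom cut off."""
    return (
        symptom_count(inattention) >= cutoff
        or symptom_count(hyperactivity) >= cutoff
    )


# Hypothetical responses from one adult.
inattention_items = [3, 2, 2, 1, 0, 2, 1, 3, 0]    # 5 positive symptoms
hyperactivity_items = [1, 0, 2, 1, 0, 0, 2, 1, 0]  # 2 positive symptoms

print(screens_positive(inattention_items, hyperactivity_items, cutoff=6))  # False
print(screens_positive(inattention_items, hyperactivity_items, cutoff=4))  # True

# Continuous scoring simply sums the raw responses instead.
print(sum(inattention_items) + sum(hyperactivity_items))  # 21
```

The same respondent is missed at six of nine symptoms but identified at four of nine, which is the trade-off the lower adult cut offs are intended to address.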
Many of the studies used small samples, particularly where sub-group analyses have taken place. Whilst it can be acceptable to validate scales in small populations, large samples are needed to confirm the results of these validity studies. Large samples will be more representative of the population and will reduce the likelihood that any observed group differences are due to chance. Only Rossini and O’Connor (1995) undertook a power calculation for their sample size. The most widely studied scales (WURS, CAARS and CSS), unsurprisingly, had the largest overall sample sizes. The YARS was validated in a large sample of 1209 participants.
Only 18 studies used the gold standard clinical interview for comparison, although it is important to note that the interview itself has not been extensively validated. In six studies, scale scores were used as part of the ADHD diagnosis, which is unacceptable (Aycicegi, Dinn, & Harris, 2003; Kooij et al., 2004; McCann & Roy-Byrne, 2004; McCann et al., 2000; Reuter, Kirsch, & Hennig, 2006; Solanto et al., 2004). Verification bias could have been introduced in a further six studies as only those who scored highly on the rating scale were interviewed (Conners, Erhardt, & Sparrow, 1999; De Quiros & Kinsbourne, 2001; Dowson et al., 2004; Erhardt et al., 1999; McCann et al., 2000; Weyandt, Linterman, & Rice, 1995).
Out of the 14 scales identified, the short version WURS and the CAARS have the best psychometric properties. They are also the most widely studied rating scales. However, these results should be interpreted with caution. Many of the studies were of poor quality; for instance, unrepresentative samples were used and in some cases there was no gold standard reference test for comparison. Unfortunately, a large proportion of these studies were deemed to be of poor quality due to poor reporting.
The CAARS and the WURS are not based exclusively on the DSM-IV criteria, unlike many of the other scales. The current systematic review therefore provides some evidence that the current DSM-IV criteria for adult ADHD are perhaps not specific enough for accurate diagnosis in this group, and that other symptoms should be considered. It supports the suggestion that field trials for DSM-IV criteria in adults with ADHD should be conducted, so that future diagnosis of adult ADHD can be improved (Zwi & York, 2004).
The findings of this study should be considered in the context of the strengths and weaknesses of its design. The protocol was developed in accordance with guidelines from the Centre for Reviews and Dissemination; therefore, it does have several strengths (Khan & Kleijnen, 2001). The most important medical databases were used for the search. Search terms were piloted and the search had a high sensitivity (88%) but a low specificity (2%).
However, no grey literature (e.g. conference reports and unpublished research) was retrieved. Therefore, identification bias could have arisen as only positive research might have been identified. However, cross-references were retrieved as appropriate. Also, a meta-analysis of data was not possible.
To the authors’ knowledge, the present paper is the only published systematic review of adult ADHD rating scales. A number of journalistic reviews of adult ADHD scales have already been published (Adler et al., 2009; Faraone & Antshel, 2008; Murphy & Adler, 2004). Whilst these reviews have significant value, and have reached similar conclusions, their literature searches were not as systematic as in this paper, and therefore may not include all the published literature (Murphy & Adler, 2004). These results provide evidence that further research into the validity of adult ADHD rating scales is required. Further research on the WURS and CAARS should be conducted with large samples in order to confirm previous findings. Rating scales which performed well in some areas, such as the ASRS, SI, ARS, Caterino Scale and APQ, may be useful but cannot be reliably used as they have not been independently validated in good quality studies.
Many different psychometric statistics were retrieved for this study. In future, other systematic reviews could focus on one or more of these statistics in more detail, so that a meta-analysis could be performed. For instance, sensitivity and specificity are good measures of diagnostic accuracy which can be easily compared. A meta-analytic or large scale systematic review may provide further support for these results.
Rating scales only provide a snapshot of a patient’s life, based on a restrictive symptom list. Therefore, rating scales should be used along with other methods of information gathering, such as a direct examination of patients, direct interview with the patients and where possible informants, and careful examination of case notes, before a clinical diagnosis is made. Rating scales alone, particularly the case detection instruments, should not be relied upon for making a diagnosis.
Conflicts of interest
None.
Acknowledgement
Gemma Unwin is currently supported by the Baily Thomas Charitable Fund.
Appendix A. Search terms
1. attention deficit hyperactivity disorder$.mp. or exp Attention Deficit Disorder with Hyperactivity/
2. ADHD.mp.
3. attention deficit disorder$ with hyperactivity.mp.
4. hyperkinetic syndrome$.mp.
5. hyperkinetic disorder$.mp.
6. attention deficit disorder$.mp.
7. minimal brain dysfunction.mp.
8. psychiatric status rating scale$.mp.
9. self report$.mp.
10. self disclosure.mp. or exp Self Disclosure/
11. psychiatric diagnos$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
12. question$.mp.
13. instrument$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
14. screening.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
15. screening tool$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
16. screening scale.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
17. diagnostic tool.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
18. diagnostic scale.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
19. assessment$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
20. diagnos$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
21. exp Diagnosis/
22. valid$.mp. or exp "Reproducibility of Results"/
23. reliab$.mp.
24. psychometric propert$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
25. reproducib$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
26. specific$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
27. sensitiv$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
28. internal$ consisten$.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
29. inter-rater.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
30. test–retest.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
31. inter-informant.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
32. factor analysis.mp. [mp=title, original title, abstract, name of substance word, subject heading word]
33. exp Adult/ or adult$.mp.
34. 1 or 2 or 3 or 4 or 5 or 6 or 7
35. Questionnaires/ or exp Psychiatric Status Rating Scales/ or rating scale$.mp.
36. *Psychometrics/cl, di, sn [Classification, Diagnosis, Statistics & Numerical Data]
37. 8 or 9 or 10 or 11 or 12 or 13 or 14 or 15 or 16 or 17 or 18 or 19 or 20 or 21 or 35
38. 22 or 23 or 24 or 25 or 26 or 27 or 28 or 29 or 30 or 31 or 32 or 36
39. 33 and 34 and 37 and 38
References
Adler, L., Faraone, S., Spencer, T., Frederick, W., Reimherr, F., Glatt, S., et al. (2008). The reliability and validity of self- and investigator ratings of ADHD in adults.Journal of Attention Disorders, 11, 711.
Adler, L., Shaw, D., Sitt, D., Maya, E., & Ippolito, M. M. (2009). Issues in the diagnosis and treatment of adult ADHD by primary care physicians. Primary Psychiatry,16, 57–63.
Adler, L. A., Spencer, T., Faraone, S. V., Kessler, R. C., Howes, M. J., Biederman, J., et al. (2006). Validity of pilot adult ADHD Self-Report Scale (ASRS) to rate adultADHD symptoms. Annals of Clinical Psychiatry, 18, 145–148.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (DSM-VI) (4th ed.). Washington, DC, USA: American PsychiatricAssociation.
Aycicegi, A., Dinn, W. M., & Harris, C. L. (2003). Assessing adult attention-deficit/hyperactivity disorder: A Turkish version of the Current Symptoms Scale.Psychopathology, 36, 160–167.
Barkley, R. A., Fischer, M., Smallish, L., & Fletcher, K. (2002). The persistence of attention-deficit/hyperactivity disorder into young adulthood as a function ofreporting source and definition of disorder. Journal of Abnormal Psychology, 111, 279–289.
Barkley, R. A., & Murphy, K. R. (1998). Attention deficit/hyperactivity disorder: A clinical workbook (2nd ed.). New York: Guildford Press.Barnes, G. R., Cerrito, P. B., & Levi, I. (2003). An Examination of the variability of understanding of language used in ADHD behavior rating scales. Ethical Human
Sciences and Services, 5, 195–208.Belenduik, K. A., Clarke, T. L., Chronis, M. A., & Raggi, V. L. (2007). Assessing the concordance of measures used to diagnose adult ADHD. Journal of Attention
Disorders, 10, 276–287.Biederman, J., Faraone, S. V., & Mick, E. (2006). The age-dependent decline of attention deficit hyperactivity disorder: A meta-analysis of follow-up studies.
Psychological Medicine, 36, 159–165.Biederman, J., Mick, E., & Faraone, S. V. (2000). Age-dependent decline of symptoms of attention deficit hyperactivity disorder: Impact of remission definition and
symptom type. American Journal of Psychiatry, 157, 816–818.Brown, T. S. (1996). Brown attention deficit disorder scales. TX: The Psychological Corporation.Caterino, L., Gomez-Benito, J., Balleurka, N., & Amador-Campos, J. (2009). Development and validation of a scale to assess the symptoms of attention-deficit/
hyperactivity disorder in young adults. Psychological Assessment, 21, 152–161.Cleland, C., Magura, S., Foote, J., Rosenblum, A., & Kosanke, N. (2006). Factor structure of the Conners Adult ADHD Rating Scale (CAARS) for substance users.
Addictive Behaviors, 31, 1277–1282.Conners, C. K., Erhadt, D., Epstein, J. N., Parker, J. D., Sitarenios, G., & Sparrow, E. (1999). Self ratings of ADHD symptoms in adults I: Factor structure and normative
data. Journal of Attention Disorders, 3, 141–151.Conners, C. K., Erhardt, D., & Sparrow, E. (1999). Conners’ adult ADHD rating scales. New York, USA: Technical Manual: Multi-Health Systems.Daley, D. (2006). Attention deficit hyperactivity disorder: A review of the essential facts. Child Care, Health and Development, 32, 193–204.De Quiros, G. B., & Kinsbourne, M. (2001). Analysis of self-ratings on a behavior questionnaire. Annals of the New York Academy of Sciences, 931, 140–147.Dowson, J. H., McLean, A., Bazanis, E., Toone, B., Young, S., Robbins, T. W., et al. (2004). The specificity of clinical characteristics in adults with attention-deficit/
hyperactivity disorder: A comparison with patients with borderline personality disorder. European Psychiatry, 19, 72–78.Du Paul, G. J., Power, T. J., Anastopoulos, A. D., & Reid, R. (1998). ADHD rating scale-IV: Checklists norms and clinical interpretation. New York: Guildford Press.Du Paul, G. J., Schaughency, E. A., Weyandt, L. L., Tripp, G., Kiesner, J., Ota, K., et al. (2001). Self-report of ADHD symptoms in university students: Cross-gender and
cross-national prevalence. Journal of Learning Disabilities, 34, 370–379.Erhardt, D., Epstein, J. N., Conners, C. K., Parker, J. D., & Sitarenios, G. (1999). Self-ratings of ADHD symptoms in adults II: Reliability, validity, and diagnostic
sensitivity. Journal of Attention Disorders, 3, 153–158.Faraone, S. V, & Antshel, K. M. (2008). Diagnosing and treating attention-deficit/hyperactivity disorder in adults. World Psychiatry, 7, 131–136.Fayyad, J., De Graaf, R., Kessler, R., Alonso, J., Angermeyer, M., Demyttenaere, K., et al. (2007). Cross-national prevalence and correlates of adult attention-deficit
hyperactivity disorder. British Journal of Psychiatry, 190, 402–409.
A. Taylor et al. / Research in Developmental Disabilities 32 (2011) 924–938 937
Field, A. P. (Ed.). (2005). Discovering statistics using SPSS. London, UK: SAGE Publications.
Fossati, A., Ceglie, A. D., Acquarini, E., Donati, D., Donini, M., Novella, L., et al. (2001). The retrospective assessment of childhood attention deficit hyperactivity disorder in adults: Reliability and validity of the Italian version of the Wender Utah Rating Scale. Comprehensive Psychiatry, 42, 326–336.
Heiligenstein, E., Conyers, L. M., Berns, A. R., & Smith, M. A. (1998). Preliminary normative data on DSM-IV attention deficit hyperactivity disorder in college students. Journal of American College Health, 46, 185–188.
Katz, N., Petscher, Y., Welles, T., & Welles, T. (2009). Diagnosing attention-deficit hyperactivity disorder in college students: An investigation of the impact of informant ratings on diagnosis and subjective impairment. Journal of Attention Disorders, 13, 277.
Kessler, R. C., Adler, L. A., Ames, M., Demler, O., Faraone, S. V., Hiripi, E., et al. (2005). The World Health Organization adult ADHD self-report scale (ASRS): A short screening scale for use in the general population. Psychological Medicine, 35, 245–256.
Kessler, R. C., Adler, L., Barkley, R., Biederman, J., Conners, C. K., Demler, O., et al. (2006). The prevalence and correlates of adult ADHD in the United States: Results from the national comorbidity survey replication. American Journal of Psychiatry, 163, 716–727.
Kessler, R. C., Adler, L. A., Gruber, M. J., Sarawate, C. A., Spencer, T., & Van Brunt, D. L. (2007). Validity of the World Health Organization Adult ADHD Self-Report Scale (ASRS) Screener in a representative sample of health plan members. International Journal of Methods in Psychiatric Research, 16, 52–65.
Khan, K. S., & Kleijnen, K. (2001). Undertaking systematic reviews of research on effectiveness: CRD’s guidance for those carrying out or commissioning reviews. CRD Report 4 (2nd ed.). York, UK: York Publishing Services.
Kooij, J. J. S., Boonstra, A. M., Swinkels, S. H., Bekker, E. M., de Noord, I., & Buitelaar, J. K. (2008). Reliability, validity, and utility of instruments for self-report and informant report concerning symptoms of ADHD in adult patients. Journal of Attention Disorders, 11, 445–458.
Kooij, J. J. S., Buitelaar, J. K., VanDenOord, E. J., Furer, J. W., Rijnders, C. A., & Hodiamont, P. P. (2004). Internal and external validity of attention-deficit/hyperactivity disorder in a population-based sample of adults. Psychological Medicine, 35, 817–827.
Mancini, C., Ameringen, M. V., Oakman, J. M., & Figueiredo, D. (1999). Childhood attention deficit/hyperactivity disorder in adults with anxiety disorders. Psychological Medicine, 29, 515–525.
Mannuzza, S. (2003). Persistence of attention-deficit/hyperactivity disorder into adulthood: What have we learned from the prospective follow-up studies? Journal of Attention Disorders, 7, 93–100.
Mannuzza, S., Klein, R. G., & Bessler, A. (1998). Adult psychiatric status of hyperactive boys grown up. American Journal of Psychiatry, 155, 493–498.
Mannuzza, S., Klein, R. G., Klein, D. F., Bessler, A., & Shrout, P. (2002). Accuracy of adult recall of childhood attention deficit hyperactivity disorder. American Journal of Psychiatry, 159(11), 1882–1888.
McCann, B. S., & Roy-Byrne, P. (2004). Screening and diagnostic utility of self-report attention deficit hyperactivity disorder scales in adults. Comprehensive Psychiatry, 45, 175–183.
McCann, B. S., Scheele, L., Ward, N., & Roy-Byrne, P. (2000). Discriminant validity of the Wender Utah Rating Scale for attention-deficit/hyperactivity disorder in adults. Journal of Neuropsychiatry and Clinical Neurosciences, 12, 240–245.
Mehringer, A. M., Downey, K. K., Schuh, L. M., Pomerleau, C. S., Snedecor, S. M., & Schubiner, H. (2002). The Assessment of Hyperactivity and Attention (AHA): Development and preliminary validation of a brief self-assessment of adult ADHD. Journal of Attention Disorders, 5, 223–231.
Murphy, K. R., & Adler, L. A. (2004). Assessing attention deficit/hyperactivity disorder in adults: Focus on rating scales. Journal of Clinical Psychiatry, 65, 12–17 (S).
Murphy, K., & Barkley, R. A. (1996). Prevalence of DSM-IV symptoms of ADHD in adult licensed drivers: Implications for clinical diagnosis. Journal of Attention Disorders, 1, 147–161.
Murphy, P., & Schachar, R. (2000). Use of self-ratings in the assessment of symptoms of attention deficit hyperactivity disorder in adults. American Journal of Psychiatry, 157(7), 1156–1159.
O’Donnell, J. P., McCann, K. K., & Pluth, S. (2001). Assessing adult ADHD using a self-report symptom checklist. Psychological Reports, 88, 871–881.
Polanczyk, P., & Rohde, L. A. (2007). Epidemiology of attention deficit/hyperactivity disorder across the lifespan. Current Opinion in Psychiatry, 20, 386–392.
Pollak, Y., Kahana-Vax, G., & Hoofien, D. (2008). Retrieval processes in adults with ADHD: A RAVLT study. Developmental Neuropsychology, 33, 62–73.
Reuter, M., Kirsch, P., & Hennig, J. (2006). Inferring candidate genes for Attention Deficit Hyperactivity Disorder (ADHD) assessed by the World Health Organisation Adult ADHD Self-Report Scale (ASRS). Journal of Neural Transmission, 113, 838–929.
Rossini, E. D., & O’Connor, M. A. (1995). Retrospective self-reported symptoms of attention-deficit hyperactivity disorder: Reliability of the Wender Utah Rating Scale. Psychological Reports, 77, 751–754.
Solanto, M. V., Etefia, K., & Marks, D. J. (2004). The utility of self-report measures and the Continuous Performance Test in the diagnosis of ADHD in adults. CNS Spectrums, 9, 649–659.
Stein, M. A., Sandoval, R., Szumowski, E., Roizen, N., Reinecke, M. A., Blondis, T. A., et al. (1995). Psychometric characteristics of the Wender Utah Rating Scale (WURS): Reliability and factor structure for men and women. Psychopharmacology Bulletin, 31, 425–433.
Triolo, S. J., & Murphy, K. R. (1996). Attention Deficit Scales for Adults (ADSA): Manual for scoring and interpretation. Bristol, UK: Taylor and Francis.
Ward, M. F., Wender, P. H., & Reimherr, F. W. (1993). The Wender Utah Rating Scale: An aid in the retrospective diagnosis of childhood attention deficit hyperactivity disorder. American Journal of Psychiatry, 150, 885–890.
Weiss, G., Hechtman, L., Milroy, T., & Perlman, T. (1985). Psychiatric status of hyperactives as adults: A controlled prospective 15-year follow-up of 63 hyperactive children. Journal of the American Academy of Child Psychiatry, 24, 211–220.
West, S. L., Mulsow, M., & Arredondo, R. (2003). Factor analysis of the Attention Deficit Scales for Adults (ADSA) with a clinical sample of outpatient substance abusers. American Journal on Addictions, 12, 159–165.
West, S. L., Mulsow, M., & Arredondo, R. (2007). An examination of the psychometric properties of the Attention Deficit Scales for Adults with outpatient substance abusers. American Journal of Drug and Alcohol Abuse, 33, 755–764.
Weyandt, L. L., Linterman, I., & Rice, J. A. (1995). Reported prevalence of attentional difficulties in a general sample of college students. Journal of Psychopathology and Behavioral Assessment, 17, 293–304.
Whiting, P., Harbord, R., & Kleijnen, J. (2005). No role for quality scores in systematic reviews of diagnostic accuracy studies. BMC Medical Research Methodology, 5, 19–25.
Whiting, P., Rutjes, A. W., Reitsma, J. B., Bossuyt, P. M., & Kleijnen, J. (2003). The Development of QUADAS: A tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Medical Research Methodology, 2, 1–13.
Wierzbicki, M. (2005). Reliability and validity of the Wender Utah Rating Scale for college students. Psychological Reports, 96, 833–839.
Wilens, T. E., Spencer, T. J., & Biederman, J. (2001). A review of the pharmacotherapy of adults with attention-deficit/hyperactivity disorder. Journal of Attention Disorders, 5, 189–202.
Young, S. (2004). The YAQ-S and YAQ-I: The development of self and informant questionnaires reporting on current adult ADHD symptomatology, comorbid and associated problems. Personality and Individual Differences, 35, 1211–1223.
Zucker, M., Morris, M. K., Ingram, S. M., Morris, R. D., & Bakeman, R. (2002). Concordance of self- and informant ratings of adults’ current and childhood attention-deficit/hyperactivity disorder symptoms. Psychological Assessment, 14, 379–389.
Zwi, M., & York, A. (2004). Attention deficit hyperactivity disorder in adults; validity unknown. Advances in Psychiatric Treatment, 10, 248–269.