ORIGINAL PAPER
The Clinical Utility of the Modified Checklist for Autism in Toddlers with High Risk 18–48 Month Old Children in Singapore
Hwan Cui Koh • Si Huan Lim • Gifford Jiguang Chan •
Marisa Bilin Lin • Hong Huay Lim •
Sylvia Henn Tean Choo • Iliana Magiati
Published online: 28 June 2013
© Springer Science+Business Media New York 2013
Abstract The modified checklist for autism in toddlers
(M-CHAT) is a tool developed for 16–30 month old chil-
dren to screen for autism spectrum disorders (ASD). It is a
well-researched tool, but little is known about its utility
with Singaporean toddlers and with older children referred
for developmental concerns. This study investigated the
M-CHAT’s performance with 18–30 month old (N = 173)
and >30–48 month old (N = 407) developmentally at-risk
Singaporean children, when used with three recommended
scoring methods i.e., the total, critical and Best7 scoring
methods. The results indicate that the critical and Best7
scoring methods detected most true cases of ASD without
inflating the false positive rates in toddlers, and that only
the total scoring method performed acceptably for the older
children.
Keywords Autism spectrum disorders · Screening · Early identification · M-CHAT · Level 2 screening · High risk children
Introduction
A growing number of young children are diagnosed with
Autism Spectrum Disorders (ASD) before the age of
3 years, and early diagnosis is generally valid and
stable over time (see Woolfenden et al. 2012 for review).
Similarly, despite continuing limitations and gaps in
knowledge, the overall importance and effectiveness of
early, intensive comprehensive ASD-focused interven-
tions for improving young children’s outcomes has also
been supported by empirical evidence (see Magiati et al.
2012 for a recent review). Such advances in early
identification and intervention have prompted the devel-
opment and validation of various screening tools, in
order to identify as early and as accurately as possible
children who are at risk for ASD (see Robins and
Dumont-Mathieu 2006; Norris and Lecavalier 2010 for
reviews). Amongst the numerous screening tools devel-
oped, the Modified Checklist for Autism in Toddlers
(M-CHAT; Robins et al. 2001) is one of the most widely
employed and researched.
The Modified Checklist for Autism in Toddlers
(M-CHAT)
The M-CHAT was initially developed in the United States
as a screening tool for ASD in 16–30 month old children
(Robins et al. 2001; Kleinman et al. 2008; it can be
downloaded from http://www2.gsu.edu/~psydlr or
www.m-chat.org). It consists of 23 yes/no questions and it
can be quickly completed by a parent/caregiver who
knows the toddler well, without requiring professional
administration. It was designed for use with the general
population (i.e., as a Level 1 ‘‘primary’’ screener; Robins
2008), but it has also been used with populations at higher
risk of ASD (i.e., as a Level 2 ‘‘secondary’’ screener;
Eaves et al. 2006; Kleinman et al. 2008; Pandey et al.
2008; Snow and Lecavalier 2008).
H. C. Koh (✉) · H. H. Lim · S. H. T. Choo
Department of Child Development, KK Women’s and Children’s
Hospital, Women’s Tower Level 5, 100 Bukit Timah Road,
Singapore 229899, Singapore
e-mail: [email protected]
S. H. Lim · G. J. Chan · M. B. Lin · I. Magiati
Department of Psychology, National University of Singapore,
Singapore, Singapore
J Autism Dev Disord (2014) 44:405–416
DOI 10.1007/s10803-013-1880-1
The Use of the M-CHAT in English-Speaking
Populations
The M-CHAT was initially validated with a sample of
1,293 children aged 18–30 months who had no previous
DSM-IV diagnoses and who were recruited through well-baby
check-ups at paediatric or family medicine practices and
through early intervention providers in the United States
(Robins et al. 2001). Thirty-nine children were eventually
diagnosed with ASD, resulting in an incidence rate of 3.0 %
for ASD in the sample. A child was classified as at risk for
ASD by either the “total” score (failing any 3 or more of the
23 M-CHAT items) or the “critical” score (failing any 2 or
more of 6 selected items: 2, interest in other children;
7, pointing for interest; 9, bringing objects to show;
13, imitation; 14, response to name; 15, following point).
These 6 items were selected on the basis that they best
discriminated between children who were diagnosed with
ASD and those who were not. The M-CHAT authors
recently recommended a “Best 7” score (failing any 2 of 7
selected items: 2; 5, pretend play; 7; 9; 14; 15; and
20, parents concerned that child may be deaf; see
www.m-chat.org) as an alternative to the critical score
(Robins et al. 2010).
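The three scoring rules described above can be written down compactly. The following is a minimal sketch, not the authors' or the M-CHAT developers' implementation: the function name `screen_mchat` and its parameters are hypothetical, and the input is assumed to be the list of item numbers already scored as "failed" (deciding which parent response counts as a fail requires the official scoring key, which is not reproduced here).

```python
# Item sets and default cut-offs as described in the text above.
ALL_ITEMS = frozenset(range(1, 24))                 # M-CHAT items 1-23
CRITICAL_ITEMS = frozenset({2, 7, 9, 13, 14, 15})   # "critical" score items
BEST7_ITEMS = frozenset({2, 5, 7, 9, 14, 15, 20})   # "Best 7" score items


def screen_mchat(failed_items, total_cutoff=3, critical_cutoff=2, best7_cutoff=2):
    """Return a screen-positive flag for each of the three scoring methods.

    failed_items: item numbers (1-23) that the child failed (hypothetical
    input format; the scoring key itself is outside this sketch).
    """
    failed = set(failed_items)
    unknown = failed - ALL_ITEMS
    if unknown:
        raise ValueError(f"Not M-CHAT item numbers: {sorted(unknown)}")
    return {
        "total": len(failed) >= total_cutoff,
        "critical": len(failed & CRITICAL_ITEMS) >= critical_cutoff,
        "best7": len(failed & BEST7_ITEMS) >= best7_cutoff,
    }


# Hypothetical child who failed items 5 and 20 only.
print(screen_mchat([5, 20]))
```

In this hypothetical case only the Best 7 rule flags the child, because items 5 and 20 belong to the Best 7 set but not to the critical set. Keeping the cut-offs as parameters makes it easy to try alternative thresholds.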
High sensitivity and specificity values of 0.80–0.99 were
reported for the M-CHAT by Robins et al. (2001) and by
Kleinman et al. (2008) in a larger follow up study. How-
ever, in both studies, the Positive Predictive Value (PPV)
for the initial screening was low (0.36), although it
increased to 0.68–0.74 for low and high risk samples
combined with a structured follow up telephone interview.
This increase was large for the low risk sample (from an
unacceptably low 0.11 for M-CHAT only, to 0.65 with the
follow-up interview), but only small for the high risk
sample (from 0.60 to 0.76). PPVs of 0.30–0.50 are reportedly
acceptable in the screening literature (Glascoe 2005),
although still below chance level (i.e., ~0.50). These
findings indicated that, when only the M-CHAT was used
with the low risk sample, there was a high rate of false
positives, therefore increasing over-referrals and possibly
incurring unnecessary healthcare costs, not to mention the
emotional impact of this for families. The follow up
interview was thus crucial for reducing the rate of false
positives using the M-CHAT in a low risk population, but
the use of the M-CHAT only was likely to be sufficient for
high risk populations.
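The dependence of PPV on the base rate of ASD in the screened population follows from Bayes' rule. The sketch below illustrates this with a hypothetical screener; the sensitivity, specificity and prevalence values are made up for illustration and are not M-CHAT results.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screener (not M-CHAT data): same accuracy, two prevalences.
low_risk = ppv(sensitivity=0.85, specificity=0.95, prevalence=0.03)
high_risk = ppv(sensitivity=0.85, specificity=0.95, prevalence=0.30)
print(f"low risk PPV = {low_risk:.2f}, high risk PPV = {high_risk:.2f}")
```

With identical sensitivity and specificity, the PPV is much lower when only 3 % of screened children have the condition than when 30 % do, which is consistent with the pattern of low PPV in low risk samples reported above.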
Similarly, acceptable or high sensitivity values (i.e.,
>0.70) have been reported in other M-CHAT studies,
although the findings on specificity and PPV values have
not always been as robust (Eaves et al. 2006; Pandey et al.
2008; Robins 2008; Snow and Lecavalier 2008). The
general agreement is that the M-CHAT is a promising
screening tool for ASD and that it has shown respectable
psychometric properties when used without a follow-up
telephone interview in populations at high risk of ASD.
The Use of the M-CHAT with Non-English-Speaking
Populations
The heterogeneity of culturally defined behavioural
expectations across diverse countries and cultures poses a
challenge for the development and validation of any
screening tool for a behaviourally diagnosed group of
conditions such as ASD (Wallis and Pinto-Martin 2008).
For this reason, it is important to gather data on the clinical
utility and validity of promising tools in diverse societies.
A number of studies have been published on the clinical
utility of translated versions of the M-CHAT in Arab
countries (Seif Eldin et al. 2008), Spain (Canal-Bedia et al.
2011), China (Wong et al. 2004), Taiwan (Lung et al.
2011), Sri Lanka (Perera et al. 2009) and Japan (Inada et al.
2010). In most cases, acceptable or good clinical utility
values have been reported, but poor performance was
documented for a general population sample in Sri Lanka
(Perera et al. 2009).
Furthermore, different cut-off scores and/or scoring
methods have been suggested to yield better sensitivity and
specificity for translated versions of the M-CHAT with
Asian populations (Inada et al. 2010, Wong et al. 2004).
Inada et al. (2010) found acceptable sensitivity and speci-
ficity values in a general population sample in Japan, but
with a lower cut-off score of 2 for the total score (i.e., child
fails any 2 of the 23 items). They also created a shorter 9
item version (items 5, 6—pointing to make request, 7, 9,
13, 15—following point, 17—following eye gaze, 21—
understanding what is said, & 23—social referencing) with
higher internal consistency and found similarly acceptable
sensitivity and specificity values as reported by Robins
et al. (2001) and Kleinman et al. (2008) with a cut-off of 1
(i.e., child fails any 1 of these 9 items). Wong et al. (2004)
evaluated the M-CHAT with a general population sample
and a high risk sample in Hong Kong. They identified
7 M-CHAT items (2, 5, 7, 9, 13, 15, & 23) that best dis-
tinguished children with and without ASD and reported
that failing 2 of those 7 items yielded good sensitivity,
specificity and PPV values.
There have been no validation studies of the use of the
M-CHAT in Singapore, a South-East Asian multi-racial
population consisting of 74.2 % Chinese, 13.3 % Malay,
and 9.2 % Indian (Department of Statistics, Singapore
2012). At the same time, Singapore has many Western
influences, its main language of education and business is
English, and most parents of young children are proficient in
English, thus making the English M-CHAT appropriate for
use with most parents of young children in Singapore. Even
so, there may be cultural differences in parent report of early
social and communication behaviours. It would therefore be
useful to examine whether different cut-off scores and/or
scoring methods result in higher clinical utility values with a
Singaporean sample of high-risk children.
Using the M-CHAT with Children Older Than 30
Months Old
Another question regarding the use of the M-CHAT is
whether it can be employed to accurately screen older
toddlers and pre-school children (i.e., >30–48 months old).
Eaves et al. (2006) reported acceptable sensitivity
values of 0.77/0.92 (critical/total score) with high risk
17–48 month olds, although these were not as good as those
reported by Robins et al. (2001) and Kleinman et al. (2008).
The inclusion of older toddlers and pre-school children in
the Eaves et al. (2006) sample was thought to possibly account
for their findings (Kleinman et al. 2008). A recently published
study with a population sample of children 20–67 months
old provided initial evidence that the M-CHAT could be
used with children 20–48 months of age, but should not be
used with children beyond 48 months old (Yama et al.
2012). Further investigation is required to ascertain if the
M-CHAT can indeed be employed to accurately screen
older children (>30–48 months). The effect on the
M-CHAT’s psychometric properties when used with chil-
dren older than the intended age range of 16–30 months also
needs to be further examined.
This Study: Research Aims/Questions
The present study thus aimed (1) to validate the use of the
M-CHAT with a high risk Singaporean sample; (2) to
compare the use of the three existing proposed M-CHAT
scoring methods (i.e., total, critical, and Best 7); and (3) to
investigate the potential clinical utility of the M-CHAT in a
high risk child population of ‘‘older’’ 30–48 month old pre-
school children. This study also aimed to determine if there
may be different optimal cut off scores than those reported
in the existing literature for the three scoring methods for
the Singaporean high risk sample. This study examined the
use of the M-CHAT with a high risk sample and not a low
risk sample, as the M-CHAT was already being used as
part of the initial screening assessment in a high risk
clinical population in Singapore.
Methods
Setting
The study was carried out in the Department of Child
Development (DCD), KK Women’s and Children’s Hospital
Singapore, a specialist multi-disciplinary clinic that provides
diagnostic and intervention services for children up to
7 years old referred for various developmental concerns.
The DCD is one of two leading public child development
units in the country for children less than 7 years old and sees
approximately 80 % of referred children in Singapore. Since
February 2009, the M-CHAT has been included in an
internally developed intake questionnaire that parents/care-
givers of newly referred 0–48 month old children are rou-
tinely asked to complete before their first appointment with a
consultant paediatrician. This routine incorporation of the
M-CHAT prior to the first appointment provides an oppor-
tunity to evaluate its use with high risk Singaporean children
who are within or older than the tool’s intended age range.
Procedure
This study was approved by the SingHealth Centralized
Institutional Review Board. A waiver of informed consent
for use of patient data was obtained as the study was a
retrospective review of clinical records and involved data
routinely collected as part of clinical care.
Parents/caregivers of newly referred children completed
an intake questionnaire about their child, which included
the M-CHAT, before their first appointment with a paedi-
atrician at DCD. Thus, although the data from the clinical
records were obtained retrospectively, the M-CHAT was
completed by the caregivers prospectively prior to or just
before their first contact with DCD professionals. The
intake questionnaire was available in English and Chinese.
The Chinese translation of the M-CHAT, printed in tradi-
tional Chinese characters, was obtained from the M-CHAT
authors’ website (http://www2.gsu.edu/~psydlr). It was
then amended to include simplified Chinese characters and
grammatical structure, which Singaporeans are more
familiar with. Of the 580 children included in the study,
94.8 % of parents/caregivers (N = 550) completed the
questionnaire in English and 5.2 % (N = 30) in Chinese.
At the first appointment, the paediatrician observed the
child, and obtained developmental and medical history
from parents/caregivers. Provisional diagnoses of ASD,
global developmental delay, speech and language delay,
motor delay, behavioural or other difficulties were made.
The paediatrician then made the necessary referrals to
DCD diagnostic and intervention services (e.g., psycho-
logical assessment and/or early intervention, occupational
or speech and language assessment/therapy). Children with
a provisional diagnosis of ASD or who were observed to
show ASD features were typically referred for a more
comprehensive ASD diagnostic assessment by the team’s
psychologists who were trained to assess and diagnose
ASD in order to clarify and confirm diagnosis, to rule out
other diagnoses and to make further educational or
intervention recommendations. The provisional diagnoses
of ASD by the paediatricians, and the confirmed diagnoses
of ASD by the psychologists, are based on DSM-IV-TR
criteria (APA 2000).
ASD Diagnostic Assessment at DCD: Procedure
and Measures
The ASD diagnostic assessment at DCD typically
includes direct semi-structured observations of the child
using the Autism Diagnostic Observation Schedule-Gen-
eric (ADOS-G; Lord et al. 2000) and a detailed internally
developed clinician-directed semi-structured interview
with the parent(s)/caregiver(s) to obtain ASD-specific
developmental history and information about the child’s
ASD-specific and other behaviours and level of func-
tioning in different contexts (e.g., home, preschool).
Depending on the child’s presentation and the consultant
paediatrician’s recommendation, some children may also
complete a developmental/cognitive/adaptive functioning
assessment as appropriate, using well established and
widely employed measures, including the Vineland
Adaptive Behaviour Scales—Second Edition (Vineland-II;
Sparrow et al. 2005), the Bayley Scales of Infant and
Toddler Development (3rd Edition; Bayley-III; Bayley
2006), the Wechsler Preschool and Primary Scale of
Intelligence—Third Edition (WPPSI—III; Wechsler 2002)
or the Differential Ability Scales—Second Edition
(DAS—II, Elliot 2007; see Fig. 1 for a flowchart of
procedure from referral to diagnosis).
Participants
Data were obtained from the medical records of
18–48 month old children first seen at DCD from February
2009 to July 2010. The extraction of data from the medical
records took place between September 2010 and September
2011. Medical records of 1,056 patients (82.3 % of the
1,283 referrals) were accessed. Not all medical records
were accessed, because some were required for follow-up
appointments and were thus not available at the time when
they were requested for data collection.
Children who had already obtained a clinical diagnosis
from other public or private healthcare institutions before
they sought advice from DCD were excluded (N = 13).
Children whose provisional diagnosis from the paediatri-
cian’s first evaluation was not clearly documented in their
medical records and who did not have an ASD diagnostic
assessment to indicate a definitive diagnosis of ASD were
also excluded (N = 13). Of the remaining 1,030 children
considered for analysis, the M-CHAT return rate was
78.2 % (N = 808) and its completion rate was 56.3 %
(N = 580). Thus, the final validation sample consisted of
580 children (see Fig. 2 for sample selection process).
Only data from children whose caregivers completed the
M-CHAT before their first appointment at DCD were used
for data analyses, to ensure that ratings were independent
of the professionals’ opinion. There were no statistically
significant differences between the children whose parents/
caregivers returned and completed the M-CHAT before
their first appointment at DCD (N = 580) and the children
whose parents/caregivers returned it incomplete/late or did
not complete it at all (N = 450) in any of the child or
family characteristics measured (Table 1), suggesting
that the included participants were broadly
representative of the population typically seen at
DCD.
Of the 580 children, 173 were 18–30 months old (29.8 %)
and the incidence rate of ASD in this younger high-risk
group was 30.6 % (N = 53). There were 407 children aged
>30–48 months (70.2 %), with an incidence rate of ASD of
35.9 % (N = 146). The remaining children (N = 381) were
found to be developing within normal limits (N = 28, 7.4 %),
or were given other non-ASD diagnoses by the paediatrician:
Global Developmental Delay (GDD: N = 73, 19.2 %);
Speech and Language Disorders (N = 224, 58.8 %);
attention and behaviour issues (N = 34, 8.9 %); learning
problems (N = 5, 1.3 %); cerebral palsy/motor delays
(N = 7, 1.8 %); hearing impairment (N = 3, 0.8 %);
environment-related delay (N = 6, 1.6 %); and syndromal
disorders (i.e., congenital rubella syndrome; N = 1, 0.3 %).
Some children (N = 124, 12.0 % of the 1,030 children
considered for data analysis) received an initial diagnosis
of ASD by the DCD paediatrician after their first
appointment, but their ASD psychological diagnostic
assessment was delayed at parents’ request and had not
been completed by the time data collection was terminated.
The percentage of agreement between the provisional
diagnosis made by the paediatrician following the first
appointment (ASD versus no ASD) and the confirmed
diagnosis following the ASD diagnostic assessment, for
the children who received an ASD diagnostic assessment
(N = 269, 26.1 % of the 1,030 children), was very high at
92.6 %. For this reason, data from these children were not
excluded. The children with ASD who had a completed
M-CHAT (N = 199) had either a confirmed diagnosis
from an ASD diagnostic assessment (N = 129, 65 %) or,
if no further ASD-specific diagnostic assessment was
carried out, a diagnosis provided by the paediatrician
after the first appointment (N = 70, 35 %).
Data Analysis
The internal consistency, sensitivity, specificity, PPV and
negative predictive value (NPV) of the M-CHAT using the
three scoring methods were examined for the younger
(18–30 months) and the older age groups (>30–48 months)
separately. Sensitivity values of more than 0.70 and
specificity values closer to 0.80 (Glascoe 2005), with both
being equally weighted, are preferred for high risk samples
(Snow and Lecavalier 2008). There are no recommended values
for PPV in the literature and low PPV values of 0.30–0.50
are reportedly not unusual (Glascoe 2005). Even so, a PPV
of over 0.50 would be preferred, so that the precision of the
M-CHAT in identifying children at risk of ASD is
better than chance.
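The four clinical utility values named above are simple ratios of the cells of a 2 × 2 screening table. A minimal sketch follows; the function name `clinical_utility` is hypothetical, and the example counts are those listed in Table 4 for the critical scoring method in the 18–30 month group.

```python
def clinical_utility(tp, fp, tn, fn):
    """Sensitivity, specificity, PPV and NPV from 2 x 2 screening counts."""
    return {
        "sensitivity": tp / (tp + fn),  # screen-positive among children with ASD
        "specificity": tn / (tn + fp),  # screen-negative among children without ASD
        "ppv": tp / (tp + fp),          # ASD among screen-positive children
        "npv": tn / (tn + fn),          # no ASD among screen-negative children
    }


# Counts from Table 4: critical (2/6) scoring, 18-30 month age group.
younger_critical = clinical_utility(tp=40, fp=26, tn=94, fn=13)
for name, value in younger_critical.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these ratios agree with the corresponding row of Table 3.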
Chi-squared tests were conducted to examine any
significant differences in the distribution of true positives,
false positives, true negatives and false negatives with the
three scoring methods in the younger and older age groups.
Inspection of the distributions, and post hoc procedures
conducted with standard residuals (absolute values >2.0
indicate an observed frequency higher or lower than expected),
indicated where significant differences might exist.
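A sketch of this kind of analysis, using only the Python standard library, is given below. It computes the Pearson chi-squared statistic and residuals of the form (observed − expected)/√expected for an age group × outcome table. That this residual definition matches the "standard residuals" produced by the authors' software is an assumption, and the function name is hypothetical. The counts are those listed in Table 4 for the total scoring method.

```python
from math import sqrt


def chi_square_with_residuals(table):
    """Pearson chi-squared, degrees of freedom and (O - E)/sqrt(E) residuals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            res_row.append((observed - expected) / sqrt(expected))
        residuals.append(res_row)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


# Rows: 18-30 months, >30-48 months. Columns: TP, FP, TN, FN (total scoring, Table 4).
total_counts = [[47, 49, 71, 6], [111, 74, 187, 35]]
chi2, df, residuals = chi_square_with_residuals(total_counts)
print(f"chi2({df}) = {chi2:.1f}")
print([round(r, 2) for r in residuals[0]])
```

For these counts the statistic is about 11.1 on 3 degrees of freedom, and the largest positive residual in the younger group is in the false positive cell.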
The performance of the M-CHAT in terms of the trade-
off between sensitivity and specificity was examined using
Receiver Operating Characteristic (ROC) curves for the
younger and older age groups and for the three M-CHAT
scoring methods (Fawcett 2006). The closer the area under
the curve (AUC) is to 1.0, the better the tool’s
performance. The AUC values were compared
for ROC curves for the scoring methods in the two age
groups to determine if any set of M-CHAT items is better
for a certain age group and if there were age group dif-
ferences in the clinical performance of the different
M-CHAT scoring methods. Inspection of the sensitivity
and specificity values at different cut-off scores for each
scoring method on the ROC curves was carried out to
provide information on optimal cut-off scores in this
sample. All data were analysed with IBM SPSS Statistics
version 19, except for the comparison of the areas under
the independent ROC curves, which was conducted in
MedCalc version 12 (MedCalc Software, Mariakerke,
Belgium) using the AUC values and standard errors
obtained from SPSS.
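For readers who want to see what such comparisons involve, the sketch below uses the standard z tests for an AUC against chance and for two AUCs from independent samples. It is assumed, not confirmed by the text, that these formulas correspond to the calculations performed by the software. The function names are hypothetical and the inputs are the rounded AUC values and standard errors listed in Table 5, so results only approximate the reported statistics.

```python
from math import erfc, sqrt


def z_vs_chance(auc, se):
    """z statistic for testing an AUC against 0.50 (chance)."""
    return (auc - 0.5) / se


def z_between_groups(auc1, se1, auc2, se2):
    """z statistic for the difference between two AUCs from independent samples."""
    return (auc1 - auc2) / sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z):
    """Two-sided p value under the standard normal distribution."""
    return erfc(abs(z) / sqrt(2))


# Critical (6 item) scoring: 18-30 months vs >30-48 months (rounded Table 5 values).
z_chance = z_vs_chance(0.86, 0.029)
z_age = z_between_groups(0.86, 0.029, 0.78, 0.024)
print(f"z vs chance = {z_chance:.1f}")
print(f"z between age groups = {z_age:.2f}, p = {two_sided_p(z_age):.3f}")
```

Because the inputs are rounded to two or three decimals, small differences from the published z statistics are to be expected.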
Results
Internal Consistency
The full M-CHAT showed good internal consistency for
both age groups. The critical 6 items showed good internal
consistency for the younger and acceptable internal
consistency for the older age group. The Best 7 items showed
acceptable internal consistency for both age groups
(Table 2).
Fig. 1 Standard referral and assessment procedures for newly referred patients at DCD KKH
Child referred to DCD
Child first seen by DCD paediatrician (also case manager)
Caregivers posted/emailed intake questionnaire to complete and return by post/email to DCD before first appointment
DCD paediatrician provides initial provisional diagnosis and makes referrals for assessment/intervention as needed
If there is provisional diagnosis or query for ASD, paediatrician makes referral to team’s psychologist for ASD diagnostic assessment
DCD psychologist makes diagnosis of ASD based on direct observations using the ADOS-G, and parent/caregiver report on child’s developmental history and current functioning in different contexts; additional cognitive and/or adaptive behaviour functioning assessment carried out for some children as appropriate
All DCD patients are followed up by the DCD paediatrician half yearly or yearly until they are discharged from DCD services at 7 years old.
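Cronbach's alpha, the internal consistency statistic reported in Table 2, can be computed from item-level responses as sketched below. The function name and the response matrix are hypothetical (rows are children, columns are items, 1 = failed, 0 = passed); the study's item-level data are not available here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-child item score lists."""
    k = len(responses[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*responses)]  # variance of each item
    total_var = variance([sum(row) for row in responses])   # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses from six children on four items (not study data).
hypothetical_responses = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
print(f"alpha = {cronbach_alpha(hypothetical_responses):.2f}")
```

The statistic approaches 1 when items rise and fall together across children, and it is sensitive to the number of items, which is one reason a 23 item scale and a 6 or 7 item subset may differ in alpha.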
Clinical Utility of the M-CHAT in the Two Age Groups
The sensitivity, specificity, PPV and NPV for the younger
and older age groups for each of the three scoring methods
separately¹ are presented in Table 3. For the younger age
group, the critical and the Best 7 scoring methods appear to
have the best balance of sensitivity, specificity, PPV and
NPV, with both sensitivity and specificity values >0.70.
For the older age group, the total scoring method shows the
best balance of sensitivity and specificity, both >0.70, and
acceptable PPV.
Chi Squared Analysis of True/False Positive
and Negative Rates
The distributions of true positives, false positives, true
negatives and false negatives for all three M-CHAT
scoring methods were significantly different between the
younger and the older age groups (total: χ²(df = 3) =
11.1, p = 0.011; critical 6: χ²(df = 3) = 22.3, p < 0.001;
Best 7: χ²(df = 3) = 23.8, p < 0.001). This finding was
largely explained by a higher than expected proportion
of false positives in the younger age group for all three
scoring methods, and lower than expected false negatives
in the younger age group for the critical 6 and Best 7
scores. The three scoring methods were all more likely
to classify younger children (<30 months old) without
ASD as at risk of ASD. The critical and Best 7 scoring
methods were more likely to miss children with ASD
in the older as compared to the younger age group
(Table 4).
Fig. 2 Sample selection process
New referrals of 18–48 month old children to DCD, February 2009 to July 2010 (N = 1,283)
    Medical records NOT accessed during study’s timeframe (N = 227)
Medical records accessed during study’s timeframe, September 2010 to September 2011 (N = 1,056)
    Children with confirmed ASD diagnoses from other agencies prior to DCD appointment (N = 13)
    Children without clearly documented diagnosis in medical records (N = 13)
Data included in analyses (N = 1,030)
    M-CHAT NOT returned prior to first appointment (N = 222)
M-CHAT returned prior to first appointment (N = 808)
    M-CHAT returned but not completed/partially completed (N = 228)
FINAL SAMPLE: Children with completed M-CHAT (N = 580)
1 The psychometric properties of the M-CHAT were reported
separately for each of the 3 scoring methods, instead of combined
as in previous studies (i.e., failing either total or critical 6; and failing
either total or Best 7). Combining the scoring methods was likely to
increase sensitivity, but to decrease specificity, which was not
desirable for screening a population already at risk of developmental
issues. Further analyses only briefly reported here also indicated that
values of sensitivity (Sn), specificity (Sp), PPV and NPV of using the
combined scoring methods were very similar to those obtained from
the total scoring method alone for both age groups, except that there
was a slight increase in sensitivity for the scoring method of either
total or Best 7 in the younger age group. For the younger age group,
either total or critical scoring method yielded Sn: 0.89, Sp: 0.58, PPV:
0.49, NPV: 0.92; either total or Best 7 scoring method yielded Sn:
0.93, Sp: 0.58, PPV: 0.50, NPV: 0.95. For the older age group, either
total or critical 6 scoring method yielded Sn: 0.76, Sp: 0.71, PPV:
0.59, NPV: 0.84; and either total or Best 7 scoring method yielded Sn:
0.77, Sp: 0.71, PPV: 0.59, NPV: 0.85.
ROC Curve Analysis
The AUC values for all three scoring methods were
significantly different from an AUC value of 0.50 (i.e.,
chance), as indicated by the z statistics and p values (see
Table 5; Figs. 3, 4). The AUC values were closer to 1.0
than 0.5, indicating that the three scoring methods could all
be considered good tests for both age groups.
Table 1 Demographic characteristics of participants and non-participants
| Characteristic | Participants (complete M-CHAT; N = 580): Mean (SD) or N (%) | Excluded (incomplete or no M-CHAT; N = 450): Mean (SD) or N (%) | Statistics |
|---|---|---|---|
| Child’s age at first consultation | 2 years 11 months (8 months) | 2 years 11 months (8 months) | t(1,028) = 0.780, p = 0.44 |
| Gender | | | χ²(1) = 0.08, p = 0.77 |
| Male | 435 (75 %) | 341 (75.8 %) | |
| Female | 145 (25 %) | 109 (24.2 %) | |
| Race | | | χ²(df = 3) = 1.03, p = 0.79 |
| Chinese | 444 (76.6 %) | 336 (74.8 %) | |
| Malay | 52 (9.0 %) | 46 (10.2 %) | |
| Indian | 41 (7.1 %) | 29 (6.5 %) | |
| Others^a | 43 (7.4 %) | 38 (8.5 %) | |
| Missing | | 1 | |
| Age at first word | 1 year 6 months (8 months); Missing N = 126 | 1 year 6 months (7 months); Missing N = 151 | t(df = 751) = 1.09, p = 0.28 |
| ASD diagnosis | | | χ²(df = 1) = 0.11, p = 0.74 |
| Yes | 199 (34.3 %) | 158 (35.1 %) | |
| No | 381 (65.9 %) | 292 (64.9 %) | |
| Father’s age at first visit | 37 years 7 months (5 years 8 months); Missing N = 14 | 37 years 9 months (6 years 1 month); Missing N = 16 | t(df = 998) = -0.316, p = 0.75 |
| Mother’s age at first visit | 34 years (4 years 8 months); Missing N = 4 | 33 years 11 months (4 years 7 months); Missing N = 2 | t(df = 1,022) = 0.358, p = 0.72 |
| Combined monthly family income | | | χ²(df = 3) = 5.03, p = 0.17 |
| <$1,200 | 23 (4.0 %) | 21 (5.2 %) | |
| $1,200–3,000 | 123 (21.5 %) | 100 (24.8 %) | |
| $3,000–5,000 | 159 (27.9 %) | 89 (22.0 %) | |
| >$5,000 | 266 (46.6 %) | 194 (48.0 %) | |
| Missing data | 9 | 46 | |
| Highest attained parent education level | | | χ²(df = 3) = 3.80, p = 0.28 |
| <A-Levels | 139 (24.1 %) | 126 (29.0 %) | |
| Diploma | 116 (20.1 %) | 87 (20.0 %) | |
| Degree | 238 (41.2 %) | 158 (36.3 %) | |
| Postgraduate | 84 (14.6 %) | 64 (14.7 %) | |
| Missing data | 3 | 15 | |
a Includes Bangladeshi, Burmese, Caucasian, Eurasian, Filipino, Indonesian, Japanese, Korean, Nepalese, Pakistani, Sikh, Sri Lankan, Thai, Vietnamese, Other races

Table 2 Internal consistency (Cronbach’s alpha^a) for the three M-CHAT scoring methods
| Age group | Total 23 item M-CHAT | 6 critical item M-CHAT | Best 7 item M-CHAT |
|---|---|---|---|
| 18–30 months | 0.86^b | 0.81^b | 0.79^c |
| >30–48 months | 0.82^b | 0.75^c | 0.75^c |
a Cronbach’s alpha values of >0.6 are “questionable”, >0.7 “acceptable” and >0.80 “good” (George and Mallery 2003); b values indicating good internal consistency; c values indicating acceptable internal consistency

The AUC values for the three scoring methods were not
significantly different from one another within each age
group (younger age group: z statistics = 0.60–1.33,
p = n.s.; older age group: z statistics = 0.40–0.86,
p = n.s.). This indicated that no one particular set of items
performed substantially better than the other. When the
AUC values of the three methods were compared between
the two age groups, there was no significant age group
difference for the total scoring method (z statistics = 1.25,
p = n.s.); there was a significant age group difference for
the critical scoring method (z statistics = 2.23, p = 0.03);
and a near significant difference for the Best 7 scoring
method (z statistics = 1.93, p = 0.05). The AUC values
indicated that the critical scoring method performed better
in the younger as compared to the older age group.
Optimal Cut-Offs for the Younger Singaporean Age
Group
Inspection of the data suggests that the originally recom-
mended cut-off score of 2 in the literature for the critical
and Best 7 scoring methods was also optimal in the
younger age group in this sample (Table 6). The originally
recommended cut-off score of 3 for the total scoring
method, however, was not optimal in this sample; instead, a
cut-off score of 5 better differentiated children with ASD
from those without ASD but with other developmental issues
in this study, while maintaining sensitivity at >0.70.
Optimal Cut-Offs for the Singaporean Older
Age-Group
The originally recommended cut-off score of 3 in the total
scoring method was also optimal in the older pre-schoolers
in the present high-risk sample (Table 7). However, the
recommended cut-off score of 2 for the critical and the Best
7 scoring methods was not optimal; instead, a cut-off score
of 1 was optimal in this study for the older age group.
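One way to inspect candidate cut-offs is to tabulate sensitivity and specificity at each threshold and then apply a selection rule. The sketch below assumes a rule of "highest specificity among cut-offs with sensitivity above 0.70", which is one reading of the criteria described in this paper and not a documented procedure. The function names are hypothetical and the scores and diagnoses are made-up illustration data, not study data.

```python
def utility_by_cutoff(scores, has_asd, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each candidate cut-off."""
    rows = []
    for cutoff in cutoffs:
        tp = sum(1 for s, d in zip(scores, has_asd) if d and s >= cutoff)
        fn = sum(1 for s, d in zip(scores, has_asd) if d and s < cutoff)
        fp = sum(1 for s, d in zip(scores, has_asd) if not d and s >= cutoff)
        tn = sum(1 for s, d in zip(scores, has_asd) if not d and s < cutoff)
        rows.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return rows


def pick_cutoff(rows, min_sensitivity=0.70):
    """Assumed rule: best specificity among cut-offs with sensitivity > threshold."""
    eligible = [row for row in rows if row[1] > min_sensitivity]
    return max(eligible, key=lambda row: row[2])[0] if eligible else None


# Hypothetical numbers of failed items and diagnoses for ten children.
scores = [0, 1, 1, 2, 2, 3, 4, 5, 6, 8]
has_asd = [False, False, False, False, True, False, True, True, True, True]
rows = utility_by_cutoff(scores, has_asd, cutoffs=range(1, 7))
best = pick_cutoff(rows)
for cutoff, sens, spec in rows:
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print("selected cut-off:", best)
```

Other selection rules (for example, weighting sensitivity and specificity equally) could select a different cut-off from the same table, so the rule should be stated alongside any "optimal" value.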
Discussion
The M-CHAT performed adequately in screening for ASD
in 18–30 month old Singaporean children in this study,
Table 3 M-CHAT Clinical utility values by scoring method and age group in a high risk Singaporean sample (N = 580)
Children’s age group Criterion used Sensitivity (95 % CI) Specificity (95 % CI) PPV (95 % CI) NPV (95 % CI)
18–30 months Total (3/23) 0.89 (0.76–0.95) 0.59 (0.50–0.68) 0.49 (0.39–0.59) 0.92 (0.83–0.97)
Critical (2/6)a 0.75 (0.61–0.86) 0.78 (0.70–0.85) 0.61 (0.48–0.72) 0.88 (0.80–0.93)
Best (2/7)a 0.81 (0.68–0.90) 0.78 (0.69–0.84) 0.61 (0.49–0.73) 0.90 (0.83–0.95)
[30–48 months Total (3/23)a 0.76 (0.68–0.83) 0.72 (0.66–0.77) 0.60 (0.53–0.67) 0.84 (0.79–0.89)
Critical (2/6) 0.53 (0.45–0.62) 0.92 (0.87–0.95) 0.78 (0.68–0.85) 0.78 (0.73–0.82)
Best (2/7) 0.54 (0.46–0.62) 0.90 (0.86–0.93) 0.75 (0.66–0.83) 0.78 (0.70–0.82)
a Indicates scoring method that yielded a good balance of sensitivity, specificity, PPV and NPV values
Table 4 Distribution of M-CHAT true and false positives and negatives by scoring method and age group in a high risk Singaporean sample (N = 580)
Cell entries: N (% within age group), standard residuals
Age group      Criterion/scoring method  True positives        False positives      True negatives        False negatives
18–30 months   Total (3/23)              47 (27.2 %), <0.10    49 (28.3 %), 2.00a   71 (41.0 %), -0.70    6 (3.5 %), -1.80
18–30 months   Critical (2/6)            40 (23.1 %), 0.80     26 (15.0 %), 3.10a   94 (54.3 %), -0.50    13 (7.5 %), -2.30a
18–30 months   Best (2/7)                43 (24.9 %), 1.10     27 (15.6 %), 2.80a   93 (53.8 %), -0.50    10 (5.8 %), -2.70a
>30–48 months  Total (3/23)              111 (27.3 %), <0.10   74 (18.2 %), -1.30   187 (45.9 %), 0.40    35 (8.6 %), 1.20
>30–48 months  Critical (2/6)            78 (19.2 %), -0.50    22 (5.4 %), -2.00a   239 (58.7 %), 0.30    68 (16.7 %), 1.50
>30–48 months  Best (2/7)                79 (19.4 %), -0.70    26 (6.4 %), -1.80    235 (57.7 %), 0.30    67 (16.5 %), 1.80
a Standard residuals of 2.0 or more indicate that the observed proportion is significantly different from the expected proportion
Table 5 AUC values and standard errors for ROC curves for the three M-CHAT scoring methods by age group
Age group      M-CHAT scoring method  AUC   Standard error  Z statistica
18–30 months   23 item                0.84  0.032           10.4
18–30 months   6 item                 0.86  0.029           12.4
18–30 months   7 item                 0.85  0.033           10.6
>30–48 months  23 item                0.78  0.025           11.3
>30–48 months  6 item                 0.78  0.024           11.7
>30–48 months  7 item                 0.77  0.024           11.2
a All significantly different from an AUC value of 0.50 at the p < 0.0001 level
providing support for its use. The three different scoring
methods recommended in existing literature correctly
identified many true cases of ASD in this sample with
sensitivity values above 0.70 and comparable to earlier
research (Eaves et al. 2006; Kleinman et al. 2008; Robins
et al. 2001; Snow and Lecavalier 2008). The critical and
Best 7 scoring methods were also good at correctly identifying
children who did not have ASD, with specificity
values above 0.70. The total scoring method was less effective
at correctly identifying children who did not have ASD,
with a specificity value of 0.59, below the 0.70 level.
The false positive rate was acceptably low for the crit-
ical and Best 7 scoring methods, as these also yielded
healthy PPV values >0.60. The total scoring method
resulted in a false positive rate that was higher than chance,
with a PPV of slightly below 0.50, although low PPVs of
0.30–0.50 are not unusual (Glascoe 2005). It has been
argued that the cost of under-diagnosis is likely to be
higher than that of over-diagnosis. Furthermore, children
who have been ‘‘falsely’’ identified by the M-CHAT in
high-risk samples are also likely to have needs requiring
early intervention. Thus, overall, the clinical utility values
of the M-CHAT for the three scoring methods with the
18–30 month old high risk Singaporean sample are at least
acceptable, if not good.
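The clinical utility values discussed above follow directly from the counts in Table 4. The sketch below recomputes sensitivity, specificity, PPV and NPV from those counts; the class name `Confusion` and the dictionary `TABLE_4` are hypothetical names introduced for illustration, and the confidence intervals in Table 3 are not reproduced because the interval method is not stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Screening outcome counts against the clinical diagnosis."""
    tp: int  # screened positive, ASD
    fp: int  # screened positive, no ASD
    tn: int  # screened negative, no ASD
    fn: int  # screened negative, ASD

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


# Counts copied from Table 4.
TABLE_4 = {
    ("18-30 months", "total"): Confusion(tp=47, fp=49, tn=71, fn=6),
    ("18-30 months", "critical6"): Confusion(tp=40, fp=26, tn=94, fn=13),
    ("18-30 months", "best7"): Confusion(tp=43, fp=27, tn=93, fn=10),
    (">30-48 months", "total"): Confusion(tp=111, fp=74, tn=187, fn=35),
    (">30-48 months", "critical6"): Confusion(tp=78, fp=22, tn=239, fn=68),
    (">30-48 months", "best7"): Confusion(tp=79, fp=26, tn=235, fn=67),
}

for (age_group, method), c in TABLE_4.items():
    print(f"{age_group:14s} {method:10s} "
          f"sens={c.sensitivity:.2f} spec={c.specificity:.2f} "
          f"PPV={c.ppv:.2f} NPV={c.npv:.2f}")
```

For the total score in the younger group, for instance, 47 of the 53 children with ASD screened positive (sensitivity of about 0.89), but only 47 of the 96 screen-positive children had ASD (PPV of about 0.49), which is the pattern described in the paragraph above.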
There was also opportunity to explore the potential
clinical utility of the M-CHAT for children older than the
initially intended age range of <30 months. Practically, it
would be highly preferable for any clinic to employ one
tool to screen for ASD in children up to 48 months old, if
the tool could demonstrate good clinical utility with the
older age group.
In the present high risk sample, the three M-CHAT
scoring methods performed differently for the younger and
older age groups. All three scoring methods were more
likely to classify younger children <30 months old without
ASD as at risk of ASD. Pandey et al. (2008) found similar
results with 16–23 month olds and 24–30 month olds, but
the false positive rate was higher in the younger than older
children only in a low risk sample. This finding is consis-
tent with general screening literature, as developmental
delays are more likely to indicate pervasive developmental
issues with older than younger children.
The critical and Best 7 scoring methods were more
likely to miss children with ASD in the older as compared
to the younger age groups. Only the total score showed
acceptable clinical utility values for the older age group,
making it more suitable to screen for ASD in children
30–48 months old. Two studies have reported
similar findings (Eaves et al. 2006; Snow and Lecavalier
2008), but their sample sizes were not sufficient to
reach firm conclusions. Eaves et al.’s study (2006) with 84
17–48 month old high risk children reported a higher
sensitivity value of 0.92 with the total score as compared to
0.77 with the critical score. The specificity values were
unacceptable for both scoring methods, but PPVs were
good. Snow and Lecavalier (2008) reported high sensitivity
and low specificity values with 56 18–48 month old high
risk children with both total and critical 6 scoring methods.
Fig. 3 ROC curves for the M-CHAT total, critical 6 and Best 7 scoring methods for the younger age group (<30 months)
Fig. 4 ROC curves for the M-CHAT total, critical 6 and Best 7 scoring methods for the older age group (>30–48 months)
In their study, the total score performed better than the
critical score to screen for ASD in the older group of 39
30–48 month old children. The M-CHAT items tap into
early developmental abilities or behaviours (e.g., imitation,
showing, pointing, functional play) and for this reason its
items are likely to be more relevant and developmentally
appropriate for 18–30 month old children. It is possible
that some of these earlier behaviours may have already
been mastered in older children with ASD which could
explain the low true positive rates in older children. This
could also explain why it may be necessary to use collec-
tive information from all 23 M-CHAT items to capture
atypical development consistent with ASD in the older
children.
The ROC curve analysis provided further evidence of
the M-CHAT as a good screening tool with adequate AUC
values in high risk younger and older children. Although a
power analysis indicated that there was a high risk of a type
II error (i.e., not finding a significant difference when in
fact there was one), when comparing the AUC values, the
critical scoring method performed better for the younger
than the older age group and the Best 7 method also
appeared to be somewhat better for the younger than the
older age group.
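The AUC comparison between age groups can be sketched as follows, using the AUC values and standard errors from Table 5. This is an illustration under stated assumptions rather than the authors' procedure: it uses the standard z-test for two AUCs from independent samples (the two age groups contain different children), and the function names are hypothetical.

```python
import math

# (AUC, standard error) copied from Table 5.
AUC = {
    ("18-30 months", "total"): (0.84, 0.032),
    ("18-30 months", "critical6"): (0.86, 0.029),
    ("18-30 months", "best7"): (0.85, 0.033),
    (">30-48 months", "total"): (0.78, 0.025),
    (">30-48 months", "critical6"): (0.78, 0.024),
    (">30-48 months", "best7"): (0.77, 0.024),
}


def z_vs_chance(auc: float, se: float) -> float:
    """z statistic for the difference between an AUC and chance level (0.50)."""
    return (auc - 0.5) / se


def z_between(auc1: float, se1: float, auc2: float, se2: float) -> float:
    """z statistic for two AUCs estimated on independent samples (assumed test)."""
    return (auc1 - auc2) / math.hypot(se1, se2)


def two_sided_p(z: float) -> float:
    """Two-sided p value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


for method in ("total", "critical6", "best7"):
    young = AUC[("18-30 months", method)]
    old = AUC[(">30-48 months", method)]
    z = z_between(*young, *old)
    print(f"{method:10s} z = {z:.2f}, two-sided p = {two_sided_p(z):.3f}")
```

Under these assumptions, the younger-versus-older difference is largest for the critical scoring method (z of about 2.1), borderline for the Best 7 method (z of about 2.0) and smallest for the total score (z of about 1.5), which is in line with the pattern described above. `z_vs_chance` gives values close to, though not identical to, the Z statistics in Table 5, presumably because the tabled AUCs and standard errors are rounded.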
Different cut-off scores than those initially suggested by
the M-CHAT developers (Robins et al. 2001) for some of
the M-CHAT scoring methods were found to be optimal
for the present Singaporean high risk sample. The original
cut-off scores recommended by the M-CHAT authors were
optimal for the critical and Best 7 scoring methods for the
younger age group, which were the two ‘‘best’’ scoring
methods in terms of clinical utility for this group. However,
if the total score was to be employed for the younger age
group, a higher cut-off score of 5 was considered optimal
compared to the initially recommended cut-off score of 3.
The original cut-off scores recommended by the M-CHAT
authors were found to be optimal for the total scoring
method for the older age-group in our study, but a lower
cut-off score of 1 should be considered if the critical 6 or
the Best 7 methods are to be used with the older age group.
Table 6 Sensitivity and specificity values at different cut-off scores for the three M-CHAT scoring methods for the younger age-group (<30 months)
Total 23 Critical 6 Best 7
Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity
1 1.00 0.18 1 0.94 0.62 1 0.89 0.62
2 0.94 0.43 2a,b 0.76 0.78 2a,b 0.81 0.78
3a 0.89 0.59 3 0.66 0.87 3 0.64 0.87
4 0.83 0.69 4 0.58 0.93 4 0.53 0.96
5b 0.74 0.74 5 0.28 0.98
6 0.70 0.78
7 0.64 0.82
8 0.60 0.89
a Original cut-off score recommended by M-CHAT authors; b Optimal cut-off score in the current sample
Table 7 Sensitivity and specificity values at different cut-off scores for the three M-CHAT scoring methods for the older age-group (>30–48 months)
Total 23 Critical 6 Best 7
Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity
1 0.94 0.21 1b 0.73 0.73 1b 0.71 0.73
2 0.82 0.44 2a 0.53 0.91 2a 0.54 0.90
3a,b 0.76 0.72 3 0.36 0.97 3 0.40 0.98
4 0.64 0.84 4 0.21 0.99 4 0.27 0.98
5 0.55 0.90 5 0.14 1.00
6 0.47 0.92
7 0.40 0.95
8 0.35 0.97
a Original cut-off score recommended by M-CHAT authors for children <30 months; b Optimal cut-off score in the current sample for the older age group
Limitations and Recommendations for Future Research
Although the included sample appeared to be representa-
tive of the population typically seen at DCD (see Table 1),
the low completion rate of 56.3 % indicates that just over
half of caregivers/parents completed the M-CHAT. Child
development units who are considering employing the
M-CHAT as a Level II screening tool may need to inves-
tigate internal processes to increase completion rates in
order to enhance and accelerate the screening and diag-
nostic process for ASD.
Furthermore, the completion of the M-CHAT could not
be repeated to examine test–retest reliability nor were
M-CHATs obtained from more than one caregiver to
explore reliability between different informants. It is rec-
ommended that future research examines test–retest reli-
ability and the potential impact on clinical utility of
respondents’ characteristics. We are currently working on
exploring whether the clinical utility of the M-CHAT could
be affected by parents’ educational or ethnic/language
background, other caregiver/family factors or child char-
acteristics (Koh et al. in preparation).
Finally, this study was a retrospective review of clinical
records and examined the routine use of the M-CHAT as
part of standard clinical procedures. However, it should be
noted that although all patient information and M-CHAT
data was retrospectively obtained, all caregivers prospec-
tively completed the M-CHAT prior to their first appoint-
ment and thus there were no retrospective memory or
professional contact biases influencing parents’ ratings.
The children included in this study are still being devel-
opmentally monitored by DCD paediatricians until they are
7 years old or until they are appropriately placed in a
formal education setting. The data that was captured in the
data extraction period would not have documented any
changes in diagnosis after this time period. There is a small
possibility that there may be children who were given a
diagnosis of ASD by the paediatrician only, without a full
assessment by the psychologists during the course of the
study, but who might have received an ASD or other
diagnosis at a later time. However, only 12 % of the participants
were included in this study based on the paediatrician’s
initial diagnosis alone; thus, given diagnostic stability and
very high agreement between paediatricians’ initial and
final diagnoses, this possibility is small.
Implications of Our Findings for ASD Screening in Clinical Settings and Recommendations
The M-CHAT was found to be a good tool to screen for
ASD in developmentally at-risk children 18–30 months old
in Singapore. It was also found to be acceptable as a
screening tool for older high risk children >30–48 months
old, although the total scoring method is preferred and
different cut-off scores are recommended if the critical or
the Best 7 methods are used.
The critical and Best 7 scoring methods were found to
perform better than the total scoring method for screening
18–30 month old high risk children. Thus, it is recommended
that these two scoring methods be considered
for use with parents/caregivers of 18–30 month old
children. An optimal cut-off of 5 for the total scoring
method was found to better differentiate children with ASD
from children without ASD who may have other devel-
opmental issues in this study. However, given their com-
parable clinical utility values (see Tables 3, 6), a shorter
version of M-CHAT consisting of the critical 6 and/or
Best 7 items could be piloted, as this would be easier and
quicker to complete and could increase completion rates
while maintaining the strongest clinical utility compared to
the more time-consuming 23 item M-CHAT for this
younger age group. However, for the 30–48 month old
high risk children, the total scoring method was best and
thus the whole questionnaire should routinely be adminis-
tered for this older age group.
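The age-specific cut-offs found to be optimal in this sample can be summarised as a simple lookup. This is a minimal sketch for illustration only, not a clinical tool: the names `OPTIMAL_CUTOFFS`, `age_group` and `screen_positive` are hypothetical, the input is assumed to be the number of failed items under the chosen scoring method, and a score at or above the cut-off is assumed to count as a positive screen (consistent with the "3/23" and "2/6" notation in Table 3).

```python
# Cut-offs flagged as optimal in Tables 6 and 7 (sample-specific values).
OPTIMAL_CUTOFFS = {
    "18-30 months": {"total": 5, "critical6": 2, "best7": 2},
    ">30-48 months": {"total": 3, "critical6": 1, "best7": 1},
}


def age_group(age_months: float) -> str:
    """Map a child's age in months to the age groups used in this study."""
    if 18 <= age_months <= 30:
        return "18-30 months"
    if 30 < age_months <= 48:
        return ">30-48 months"
    raise ValueError("age outside the 18-48 month range examined in this study")


def screen_positive(age_months: float, method: str, failed_items: int) -> bool:
    """True if the number of failed items reaches the sample-specific cut-off."""
    return failed_items >= OPTIMAL_CUTOFFS[age_group(age_months)][method]


# A 24 month old failing two critical items screens positive,
# whereas failing four of the 23 items does not reach the total cut-off of 5.
print(screen_positive(24, "critical6", 2))
print(screen_positive(24, "total", 4))
```

Keeping the cut-offs in a table keyed by age group and scoring method makes it straightforward to substitute values derived from a different population.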
Adding to existing literature that has found the
M-CHAT to be a useful screening tool in non-western
populations, this study provides further evidence of its
usefulness in diverse multi-cultural countries. Our findings
also strongly suggest that the M-CHAT is likely to be useful
for screening children who are older than the intended age
range of the tool but <4 years old. It is important that future
research studies and clinical use of the M-CHAT systematically
explore its use in different settings, age groups and using
different scoring methods, to establish best “fit” use for the
specific context and population.
Acknowledgments We acknowledge Charissa Wong who partici-
pated in data collection for the study. We would like to thank Tang
Hui Nee, Yang Phey Hong, and Cheong Wan Mui, who provided
valuable advice during the course of the research. Special mention
also goes to Jennifer Tan, Suraya Amir, and other administrative staff
of the Department of Child Development, KK Women’s and Chil-
dren’s Hospital, who kindly assisted with the data collection process.
References
American Psychiatric Association. (2000). Diagnostic and statistical
manual of mental disorders (4th ed., text rev.). Washington, DC:
Author.
Bayley, N. (2006). Bayley Scales of Infant and Toddler Development.
San Antonio, TX: The Psychological Corporation.
Canal-Bedia, R., García-Primo, P., Martín-Cilleros, M. V., Santos-Borbujo,
J., Guisuraga-Fernández, Z., Herráez-García, L., et al.
(2011). Modified checklist for autism in toddlers: Cross-cultural
adaptation and validation in Spain. Journal of Autism and
Developmental Disorders, 41(10), 1342–1351.
Department of Statistics, Singapore. (2012). Population trends 2012.
Retrieved 29 May 2013, from Statistics Singapore website:
http://www.singstat.gov.sg/.
Eaves, L. C., Wingert, H., & Ho, H. H. (2006). Screening for autism:
Agreement with diagnosis. Autism, 10, 229–242.
Elliott, C. D. (2007). Differential ability scales (2nd ed.). San
Antonio, TX: Psychological Corporation.
Fawcett, T. (2006). An introduction to ROC analysis. Pattern
Recognition Letters, 27(8), 861–874.
George, D., & Mallery, P. (2003). SPSS for windows step by step: A
simple guide and reference. 11.0 update (4th ed.). Boston: Allyn
& Bacon.
Glascoe, F. P. (2005). Screening for developmental and behavioural
problems. Mental Retardation and Developmental Disabilities
Research Reviews, 11(3), 173–179.
Inada, N., Koyama, T., Inokuchi, E., Kuroda, M., & Kamio, Y. (2010/2011).
Reliability and validity of the Japanese version of the
Modified Checklist for Autism in Toddlers (M-CHAT).
Research in Autism Spectrum Disorders, 5(1), 330–336.
Kleinman, J. M., Robins, D. L., Ventola, P. E., Pandey, J., Boorstein,
H. C., Esser, E. L., et al. (2008). The modified checklist for
autism in toddlers: A follow-up study investigating the early
detection of autism spectrum disorders. Journal of Autism and
Developmental Disorders, 38(5), 827–839.
Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Jr, Leventhal, B. L.,
DiLavore, P. C., et al. (2000). The autism diagnostic observation
schedule-generic: A standard measure of social and communi-
cation deficits associated with the spectrum of autism. Journal of
Autism and Developmental Disorders, 30(3), 205–223.
Lung, F.-W., Chiang, T.-L., Lin, S.-J., & Shu, B.-C. (2011). Autism-
risk screening in the first 3 years of life in Taiwan Birth Cohort
Pilot Study. Research in Autism Spectrum Disorders, 5(4),
1385–1389.
Magiati, I., Tay, X. W., & Howlin, P. (2012). Early comprehensive
behaviourally based interventions for children with autism
spectrum disorders: A summary of findings from recent reviews
and meta-analyses. Neuropsychiatry, 2(6), 543–570.
Norris, M., & Lecavalier, L. (2010). Screen accuracy of level 2 autism
spectrum disorder rating scales. A review of selected instru-
ments. Autism, 14, 263–284.
Pandey, J., Verbalis, A., Robins, D., Boorstein, H., Klin, A., Babitz,
T., et al. (2008). Screening for autism in older and younger
toddlers with the modified checklist for autism in toddlers.
Autism, 12(5), 513–535.
Perera, H., Wijewardena, K., & Aluthwelage, R. (2009). Screening of
18–24-month-old children for autism in a semi-urban community
in Sri Lanka. Journal of Tropical Pediatrics, 55(6), 402–405.
Robins, D. L. (2008). Screening for autism spectrum disorders in
primary care settings. Autism, 12(5), 537–556.
Robins, D. L., & Dumont-Mathieu, T. (2006). Early screening for
autism spectrum disorders: Update on the modified checklist for
autism in toddlers and other measures. Journal of Developmental
& Behavioural Pediatrics, 27(Supplement 2), S111–S119.
Robins, D. L., Fein, D., Barton, M. L., & Green, J. A. (2001). The
modified checklist for autism in toddlers: An initial study
investigating the early detection of autism and pervasive
developmental disorders. Journal of Autism and Developmental
Disorders, 31, 131–144.
Robins, D. L., Pandey, J., Chlebowski, C., Carr, K., Zaj, J. L., Arroyo,
M., Barton, M. L., Green, J., & Fein, D. A. (2010, May).
M-CHAT Best7: A new scoring algorithm improves positive
predictive power of the M-CHAT. Paper presented at the
International Meeting for Autism Research (IMFAR) in Phila-
delphia, USA.
Seif Eldin, A., Habib, D., Noufal, A., Farrag, S., Bazaid, K., Al-
Sharbati, M., et al. (2008). Use of M-CHAT for a multinational
screening of young children with autism in the Arab countries.
International Review of Psychiatry, 20(3), 281–289.
Snow, A. V., & Lecavalier, L. (2008). Sensitivity and specificity of the
modified checklist for autism in toddlers and the social commu-
nication questionnaire in preschoolers suspected of having perva-
sive developmental disorders. Autism, 12(6), 627–644.
Sparrow, S. S., Cicchetti, D., & Balla, D. (2005). Vineland adaptive
behavior scales-II. Circle Pines, MN: American Guidance Service.
Wallis, K. E., & Pinto-Martin, J. (2008). The challenge of screening
for autism spectrum disorder in a culturally diverse society. Acta
Paediatrica, 97(5), 539–540.
Wechsler, D. (2002). Wechsler preschool and primary scale of intelligence
(3rd ed.). San Antonio, TX: Psychological Corporation.
Wong, V., Hui, L. H., Lee, W. C., Leung, L. S., Ho, P. K., Lau, W. L.,
et al. (2004). A modified screening tool for autism (checklist for
autism in toddlers [CHAT-23]) for Chinese children. Pediatrics,
114(2), e166–e176.
Woolfenden, S., Sarkozy, V., Riddley, G., & Williams, K. (2012). A
systematic review of the diagnostic stability of autism spectrum
disorder. Research in Autism Spectrum Disorders, 6(1), 345–354.
Yama, B., Freeman, T., Graves, E., Yuan, S., & Campbell, K. (2012).
Examination of the properties of the modified checklist for
autism in toddlers (M-CHAT) in a population sample. Journal of
Autism and Developmental Disorders, 42, 23–34.