
ORIGINAL PAPER

The Clinical Utility of the Modified Checklist for Autism in Toddlers with High Risk 18–48 Month Old Children in Singapore

Hwan Cui Koh • Si Huan Lim • Gifford Jiguang Chan •

Marisa Bilin Lin • Hong Huay Lim •

Sylvia Henn Tean Choo • Iliana Magiati

Published online: 28 June 2013

© Springer Science+Business Media New York 2013

Abstract The modified checklist for autism in toddlers

(M-CHAT) is a tool developed for 16–30 month old chil-

dren to screen for autism spectrum disorders (ASD). It is a

well-researched tool, but little is known about its utility

with Singaporean toddlers and with older children referred

for developmental concerns. This study investigated the

M-CHAT’s performance with 18–30 month old (N = 173)

and >30–48 month old (N = 407) developmentally at-risk

Singaporean children, when used with three recommended

scoring methods, i.e., the total, critical and Best7 scoring

methods. The results indicate that the critical and Best7

scoring methods detected most true cases of ASD without

inflating the false positive rates in toddlers, and that only

the total scoring method performed acceptably for the older

children.

Keywords Autism spectrum disorders · Screening · Early identification · M-CHAT · Level 2 screening · High risk children

Introduction

A growing number of young children are diagnosed with Autism Spectrum Disorders (ASD) before 3 years of age, and early diagnosis is generally valid and

stable over time (see Woolfenden et al. 2012 for review).

Similarly, despite continuing limitations and gaps in

knowledge, the overall importance and effectiveness of

early, intensive comprehensive ASD-focused interven-

tions for improving young children’s outcomes has also

been supported by empirical evidence (see Magiati et al.

2012 for a recent review). Such advances in early

identification and intervention have prompted the devel-

opment and validation of various screening tools, in

order to identify as early and as accurately as possible

children who are at risk for ASD (see Robins and

Dumont-Mathieu 2006; Norris and Lecavalier 2010 for

reviews). Amongst the numerous screening tools devel-

oped, the Modified Checklist for Autism in Toddlers

(M-CHAT; Robins et al. 2001) is one of the most widely

employed and researched.

The Modified Checklist for Autism in Toddlers

(M-CHAT)

The M-CHAT was initially developed in the United States

as a screening tool for ASD in 16–30 month old children

(Robins et al. 2001; Kleinman et al. 2008; it can be downloaded from http://www2.gsu.edu/~psydlr or www.m-chat.org). It consists of 23 yes/no questions and it

can be quickly completed by a parent/caregiver who

knows the toddler well, without requiring professional

administration. It was designed for use with the general

population (i.e., as a Level 1 “primary” screener; Robins 2008), but it has also been used with populations at higher risk of ASD (i.e., as a Level 2 “secondary” screener;

Eaves et al. 2006; Kleinman et al. 2008; Pandey et al.

2008; Snow and Lecavalier 2008).

H. C. Koh (✉) · H. H. Lim · S. H. T. Choo

Department of Child Development, KK Women’s and Children’s

Hospital, Women’s Tower Level 5, 100 Bukit Timah Road,

Singapore 229899, Singapore

e-mail: [email protected]

S. H. Lim · G. J. Chan · M. B. Lin · I. Magiati

Department of Psychology, National University of Singapore,

Singapore, Singapore


J Autism Dev Disord (2014) 44:405–416

DOI 10.1007/s10803-013-1880-1



The Use of the M-CHAT in English-Speaking

Populations

The M-CHAT was initially validated with a sample of

1,293 18–30 month old children who did not have previous

DSM-IV diagnoses through well-baby check-ups at pae-

diatric or family medicine practices and early intervention

providers in the United States (Robins et al. 2001). Thirty

nine children were eventually diagnosed with ASD,

resulting in an incidence rate of 3.0 % for ASD in the

sample. A child was screened to be at risk for ASD either by the “total” score (failing any 3 or more of the 23 M-CHAT items) or the “critical” score (failing any 2 or more of 6 selected items: item 2, interest in other children; item 7, pointing for interest; item 9, bringing objects to show; item 13, imitation; item 14, response to name; item 15, following point). These 6 items were selected on the basis that they best discriminated between children who were diagnosed with ASD and those who were not. The M-CHAT authors recently recommended a “Best 7” score (failing any 2 of 7 selected items: items 2; 5, pretend play; 7; 9; 14; 15; and 20, parents concerned that child may be deaf; see www.m-chat.org) as an alternative to the critical score (Robins et al. 2010).
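The three scoring rules described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `screen` and its parameters are hypothetical, the item sets and cut-offs are those given in the text, and deciding whether a particular yes/no answer counts as a "fail" on a given item is assumed to have been done beforehand.

```python
# Item sets and cut-offs as described in the text (Robins et al. 2001, 2010).
ALL_ITEMS = frozenset(range(1, 24))               # items 1-23
CRITICAL_ITEMS = frozenset({2, 7, 9, 13, 14, 15})
BEST7_ITEMS = frozenset({2, 5, 7, 9, 14, 15, 20})


def screen(failed_items, total_cutoff=3, critical_cutoff=2, best7_cutoff=2):
    """Return screen-positive flags for the total, critical and Best 7 methods.

    failed_items: iterable of item numbers (1-23) that the child failed.
    """
    failed = set(failed_items)
    unknown = failed - ALL_ITEMS
    if unknown:
        raise ValueError(f"unknown item numbers: {sorted(unknown)}")
    return {
        "total": len(failed) >= total_cutoff,
        "critical": len(failed & CRITICAL_ITEMS) >= critical_cutoff,
        "best7": len(failed & BEST7_ITEMS) >= best7_cutoff,
    }


# A child failing only items 2 and 7 is positive on the critical and
# Best 7 scores but not on the total score.
print(screen({2, 7}))
```

Keeping the cut-offs as parameters makes it straightforward to explore alternative thresholds such as those proposed for translated versions of the tool.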

High sensitivity and specificity values of 0.80–0.99 were

reported for the M-CHAT by Robins et al. (2001) and by

Kleinman et al. (2008) in a larger follow up study. How-

ever, in both studies, the Positive Predictive Value (PPV)

for the initial screening was low (0.36), although it

increased to 0.68–0.74 for low and high risk samples

combined with a structured follow up telephone interview.

This increase was large for the low risk sample (from an

unacceptably low 0.11 for M-CHAT only, to 0.65 with the

follow-up interview), but only small for the high risk

sample (from 0.60 to 0.76). PPVs of 0.30–0.50 are reportedly acceptable in the screening literature (Glascoe 2005), although still below chance level (i.e., ~0.50). These

findings indicated that, when only the M-CHAT was used

with the low risk sample, there was a high rate of false

positives, therefore increasing over-referrals and possibly

incurring unnecessary healthcare costs, not to mention the

emotional impact of this for families. The follow up

interview was thus crucial for reducing the rate of false

positives using the M-CHAT in a low risk population, but

the use of the M-CHAT only was likely to be sufficient for

high risk populations.
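The dependence of PPV on the base rate of ASD in the screened population follows directly from Bayes' rule, and can be illustrated with a short Python sketch. The sensitivity, specificity and prevalence values below are hypothetical and chosen only to show the pattern; they are not taken from any of the studies cited.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical screener with fixed sensitivity and specificity,
# applied to a low-prevalence and a high-prevalence population.
for prevalence in (0.03, 0.30):
    value = ppv(sensitivity=0.85, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {value:.2f}")
```

With sensitivity and specificity held constant, the same tool yields a much lower PPV when the condition is rare, which is consistent with the larger benefit of the follow-up interview in the low risk sample.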

Similarly, acceptable or high sensitivity values (i.e., >0.70) have been reported in other M-CHAT studies,

although the findings on specificity and PPV values have

not always been as robust (Eaves et al. 2006; Pandey et al.

2008; Robins 2008; Snow and Lecavalier 2008). The

general agreement is that the M-CHAT is a promising

screening tool for ASD and that it has shown respectable

psychometric properties when used without a follow-up

telephone interview in populations at high risk of ASD.

The Use of the M-CHAT with Non-English-Speaking

Populations

The heterogeneity of culturally defined behavioural

expectations across diverse countries and cultures poses a

challenge for the development and validation of any

screening tool for a behaviourally diagnosed group of

conditions such as ASD (Wallis and Pinto-Martin 2008).

For this reason, it is important to gather data on the clinical

utility and validity of promising tools in diverse societies.

A number of studies have been published on the clinical

utility of translated versions of the M-CHAT in Arab

countries (Seif Eldin et al. 2008), Spain (Canal-Bedia et al.

2011), China (Wong et al. 2004), Taiwan (Lung et al.

2011), Sri Lanka (Perera et al. 2009) and Japan (Inada et al.

2010). In most cases, acceptable or good clinical utility

values have been reported, but poor performance was

documented for a general population sample in Sri Lanka

(Perera et al. 2009).

Furthermore, different cut-off scores and/or scoring

methods have been suggested to yield better sensitivity and

specificity for translated versions of the M-CHAT with

Asian populations (Inada et al. 2010, Wong et al. 2004).

Inada et al. (2010) found acceptable sensitivity and speci-

ficity values in a general population sample in Japan, but

with a lower cut-off score of 2 for the total score (i.e., child

fails any 2 of the 23 items). They also created a shorter 9

item version (items 5, 6—pointing to make request, 7, 9,

13, 15—following point, 17—following eye gaze, 21—

understanding what is said, & 23—social referencing) with

higher internal consistency and found similarly acceptable

sensitivity and specificity values as reported by Robins

et al. (2001) and Kleinman et al. (2008) with a cut-off of 1

(i.e., child fails any 1 of these 9 items). Wong et al. (2004)

evaluated the M-CHAT with a general population sample

and a high risk sample in Hong Kong. They identified

7 M-CHAT items (2, 5, 7, 9, 13, 15, & 23) that best dis-

tinguished children with and without ASD and reported

that failing 2 of those 7 items yielded good sensitivity,

specificity and PPV values.

There have been no validation studies of the use of the

M-CHAT in Singapore, a South-East Asian multi-racial

population consisting of 74.2 % Chinese, 13.3 % Malay,

and 9.2 % Indian (Department of Statistics, Singapore

2012). At the same time, Singapore has many Western

influences, its main language of education and business is

English, and most parents of young children are proficient in

English, thus making the English M-CHAT appropriate for

use with most parents of young children in Singapore. Even

so, there may be cultural differences in parent report of early



social and communication behaviours. It would therefore be

useful to examine whether different cut-off scores and/or

scoring methods result in higher clinical utility values with a

Singaporean sample of high-risk children.

Using the M-CHAT with Children Older Than 30

Months Old

Another question regarding the use of the M-CHAT pertains

to whether it can be employed to accurately screen older

toddlers and pre-school children (i.e., >30–48 months old).

Eaves et al. (2006) reported acceptable sensitivity values of 0.77/0.92 (critical/total score) with high risk 17–48 month olds, although these were not as good as those reported by Robins et al. (2001) and Kleinman et al. (2008). The inclusion of older toddlers and pre-school children in the Eaves et al. (2006) sample was thought to possibly account for their findings (Kleinman et al. 2008). A recently published

study with a population sample of children 20–67 months

old provided initial evidence that the M-CHAT could be

used with children 20–48 months of age, but should not be

used with children beyond 48 months old (Yama et al.

2012). Further investigation is required to ascertain if the

M-CHAT can indeed be employed to accurately screen

older children (>30–48 months). The effect on the

M-CHAT’s psychometric properties when used with chil-

dren older than the intended age range of 16–30 months also

needs to be further examined.

This Study: Research Aims/Questions

The present study thus aimed (1) to validate the use of the

M-CHAT with a high risk Singaporean sample; (2) to

compare the use of the three existing proposed M-CHAT

scoring methods (i.e., total, critical, and Best 7); and (3) to

investigate the potential clinical utility of the M-CHAT in a

high risk child population of “older” 30–48 month old pre-school children. This study also aimed to determine if there

may be different optimal cut off scores than those reported

in the existing literature for the three scoring methods for

the Singaporean high risk sample. This study examined the

use of the M-CHAT with a high risk sample and not a low

risk sample, as the M-CHAT was already being used as

part of the initial screening assessment in a high risk

clinical population in Singapore.

Methods

Setting

The study was carried out in the Department of Child

Development (DCD), KK Women’s and Children’s Hospital

Singapore, a specialist multi-disciplinary clinic that provides

diagnostic and intervention services for children up to

7 years old referred for various developmental concerns.

The DCD is one of two leading public child development

units in the country for children less than 7 years old and sees

approximately 80 % of referred children in Singapore. Since

February 2009, the M-CHAT has been included in an

internally developed intake questionnaire that parents/care-

givers of newly referred 0–48 month old children are rou-

tinely asked to complete before their first appointment with a

consultant paediatrician. This routine incorporation of the

M-CHAT prior to the first appointment provides an oppor-

tunity to evaluate its use with high risk Singaporean children

who are within or older than the tool’s intended age range.

Procedure

This study was approved by the Singhealth Centralized

Institutional Review Board. A waiver of informed consent

for use of patient data was obtained as the study was a

retrospective review of clinical records and involved data

routinely collected as part of clinical care.

Parents/caregivers of newly referred children completed

an intake questionnaire about their child, which included

the M-CHAT, before their first appointment with a paedi-

atrician at DCD. Thus, although the data from the clinical

records were obtained retrospectively, the M-CHAT was

completed by the caregivers prospectively prior to or just

before their first contact with DCD professionals. The

intake questionnaire was available in English and Chinese.

The Chinese translation of the M-CHAT, printed in tradi-

tional Chinese characters, was obtained from the M-CHAT

authors’ website (http://www2.gsu.edu/~psydlr). It was

then amended to include simplified Chinese characters and

grammatical structure, which Singaporeans are more

familiar with. Of the 580 children included in the study,

94.8 % of parents/caregivers (N = 550) completed the

questionnaire in English and 5.2 % (N = 30) in Chinese.

At the first appointment, the paediatrician observed the

child, and obtained developmental and medical history

from parents/caregivers. Provisional diagnoses of ASD,

global developmental delay, speech and language delay,

motor delay, behavioural or other difficulties were made.

The paediatrician then made the necessary referrals to

DCD diagnostic and intervention services (e.g., psycho-

logical assessment and/or early intervention, occupational

or speech and language assessment/therapy). Children with

a provisional diagnosis of ASD or who were observed to

show ASD features were typically referred for a more

comprehensive ASD diagnostic assessment by the team’s

psychologists who were trained to assess and diagnose

ASD in order to clarify and confirm diagnosis, to rule out

other diagnoses and to make further educational or


intervention recommendations. The provisional diagnoses

of ASD by the paediatricians, and the confirmed diagnoses

of ASD by the psychologists, are based on DSM-IV-TR

criteria (APA 2000).

ASD Diagnostic Assessment at DCD: Procedure

and Measures

The ASD diagnostic assessment at DCD typically

includes direct semi-structured observations of the child

using the Autism Diagnostic Observation Schedule-Gen-

eric (ADOS-G; Lord et al. 2000) and a detailed internally

developed clinician-directed semi-structured interview

with the parent(s)/caregiver(s) to obtain ASD-specific

developmental history and information about the child’s

ASD-specific and other behaviours and level of func-

tioning in different contexts (e.g., home, preschool).

Depending on the child’s presentation and the consultant

pediatrician’s recommendation, some children may also

complete a developmental/cognitive/adaptive functioning

assessment as appropriate, using well established and

widely employed measures, including the Vineland

Adaptive Behaviour Scales—Second Edition (Vineland-II;

Sparrow et al. 2005), the Bayley Scales of Infant and

Toddler Development (3rd Edition; Bayley-III; Bayley

2006), the Wechsler Preschool and Primary Scale of

Intelligence—Third Edition (WPPSI—III; Wechsler 2002)

or the Differential Ability Scales—Second Edition

(DAS—II, Elliot 2007; see Fig. 1 for a flowchart of

procedure from referral to diagnosis).

Participants

Data were obtained from the medical records of

18–48 month old children first seen at DCD from February

2009 to July 2010. The extraction of data from the medical

records took place between September 2010 and Septem-

ber 2011. Medical records of 1,056 patients (82.3 % of the

1,283 total number of referrals) were accessed. Not all

medical records for all DCD referrals were accessed,

because some were required for follow-up appointments

and were thus not available at the time when they were

requested for data collection.

Children who had already obtained a clinical diagnosis

from other public or private healthcare institutions before

they sought advice from DCD were excluded (N = 13).

Children whose provisional diagnosis from the paediatri-

cian’s first evaluation was not clearly documented in their

medical records and who did not have an ASD diagnostic

assessment to indicate a definitive diagnosis of ASD were

also excluded (N = 13). Of the remaining 1,030 children

considered for analysis, the M-CHAT return rate was

78.2 % (N = 808) and its completion rate was 56.3 %

(N = 580). Thus, the final validation sample consisted of

580 children (see Fig. 2 for sample selection process).

Only data from children whose caregivers completed the

M-CHAT before their first appointment at DCD were used

for data analyses, to ensure that ratings were independent

of the professionals’ opinion. There were no statistically

significant differences between the children whose parents/

caregivers returned and completed the M-CHAT before

their first appointment at DCD (N = 580) and the children

whose parents/caregivers returned it incomplete/late or did

not complete it at all (N = 450) in any of the child or

family characteristics measured (Table 1), strongly sug-

gesting that the included participants were likely to be

generally representative of the population typically seen at

DCD.

Of the 580 children, 173 were 18–30 months (29.8 %)

and the incidence rate of ASD in this younger high-risk

group was 30.6 % (N = 53). There were 407 30–48 month

old children (70.1 %) with an incidence rate of ASD of

35.9 % (N = 146). The remaining children were found to

be developing within normal limits (N = 28, 7.4 %), or

were given other non-ASD diagnoses by the paediatrician

[Global Developmental Delay (GDD: N = 73, 19.2 %);

Speech and Language Disorders (N = 224, 58.8 %);

attention and behaviour issues (N = 34, 8.9 %); learning

problems (N = 5, 1.3 %); cerebral palsy/motor delays

(N = 7, 1.8 %); hearing impairment (N = 3, 0.8 %);

environment-related delay (N = 6, 1.6 %); and syndromal

disorders (i.e., congenital rubella syndrome; N = 1, 0.3 %)].

Some children (N = 124, 12.0 % of the 1,030 children

considered for data analysis) received an initial diagnosis

of ASD by the DCD paediatrician after their first

appointment, but their ASD psychological diagnostic

assessment was delayed at parents’ request and had not

been completed by the time data collection was terminated.

The percentage of agreement between the earlier provi-

sional diagnosis by the paediatrician following the first

appointment (ASD versus no ASD) and the child’s con-

firmed diagnosis following the ASD diagnostic assessment

for the children who received an ASD diagnostic assessment

(N = 269, 26.1 % of the 1,030 children) was very high at

92.6 %. For this reason, data from these children were not

excluded and the children with ASD who had a completed

M-CHAT (N = 199) either received a confirmed diagnosis

from an ASD diagnostic assessment (N = 129, 65 %) or a

diagnosis provided by the paediatrician (N = 70, 35 %),

after the first appointment if no further ASD specific

diagnostic assessment was carried out.

Data Analysis

The internal consistency, sensitivity, specificity, PPV and

negative predictive value (NPV) of the M-CHAT using the



three scoring methods were examined for the younger

(18–30 months) and the older age groups (>30–48 months)

separately. Sensitivity values of more than 0.70 and spec-

ificity values closer to 0.80 (Glascoe 2005) with both being

equally weighted are preferred for high risk samples (Snow

and Lecavalier 2008). There are no recommended values

for PPV in the literature and low PPV values of 0.30–0.50

are reportedly not unusual (Glascoe 2005). Even so, a PPV

of over 0.50 would be preferred, so that the precision of the

M-CHAT for identifying a child at risk or not of ASD is

better than chance.
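The four clinical utility indices named above are simple ratios of the cells of a 2 × 2 screening table. The following Python sketch shows the calculation; the function name `clinical_utility` is illustrative, and the counts used in the example are those reported in Table 4 for the 18–30 month group with the total scoring method.

```python
def clinical_utility(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from 2 x 2 cell counts."""
    return {
        "sensitivity": tp / (tp + fn),  # ASD cases correctly screened positive
        "specificity": tn / (tn + fp),  # non-ASD cases correctly screened negative
        "ppv": tp / (tp + fp),          # screen positives who have ASD
        "npv": tn / (tn + fn),          # screen negatives who do not have ASD
    }


# 18-30 month group, total scoring method (counts from Table 4).
younger_total = clinical_utility(tp=47, fp=49, tn=71, fn=6)
for name, value in younger_total.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimal places, these counts give 0.89, 0.59, 0.49 and 0.92, in line with the corresponding row of Table 3.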

Chi squared tests were conducted to examine any sig-

nificant differences in the distribution of true positives,

false positives, true negatives and false negatives with the

three scoring methods in the younger and older age groups.

Inspection of the distributions, and post hoc procedures

conducted with standard residuals (absolute values >2.0 indicate

observed frequency higher or lower than expected), indi-

cated where significant differences might exist.
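As an illustration of this procedure, the sketch below computes a Pearson chi squared statistic and the standardised residuals, (observed − expected)/√expected, for a 2 × 4 table of age group by screening outcome. It is a minimal pure-Python sketch rather than the SPSS procedure used in the study; the function names are hypothetical, the p value helper is valid only for exactly 3 degrees of freedom, and the counts are those reported in Table 4 for the total scoring method.

```python
import math


def chi_square_with_residuals(table):
    """Pearson chi squared, degrees of freedom and standardised residuals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res = (observed - expected) / math.sqrt(expected)
            res_row.append(res)
            chi2 += res ** 2
        residuals.append(res_row)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


def chi2_p_value_df3(x):
    """Upper-tail probability of a chi squared variate with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Columns: true positives, false positives, true negatives, false negatives.
total_method = [
    [47, 49, 71, 6],     # 18-30 months
    [111, 74, 187, 35],  # >30-48 months
]
chi2, df, residuals = chi_square_with_residuals(total_method)
print(f"chi2({df}) = {chi2:.1f}, p = {chi2_p_value_df3(chi2):.3f}")
print([round(r, 2) for r in residuals[0]])
```

For these counts the statistic rounds to 11.1 with 3 degrees of freedom, matching the value reported for the total scoring method, and the largest residual in the younger group is the one for false positives.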

The performance of the M-CHAT in terms of the trade-

off between sensitivity and specificity was examined using

Receiver Operating Characteristic (ROC) curves for the

younger and older age groups and for the three M-CHAT

scoring methods (Fawcett 2006). The closer the area under the curve (AUC) is to 1.0, the better the tool’s performance. The AUC values were compared

for ROC curves for the scoring methods in the two age

groups to determine if any set of M-CHAT items is better

for a certain age group and if there were age group dif-

ferences in the clinical performance of the different

M-CHAT scoring methods. Inspection of the sensitivity

and specificity values at different cut-off scores for each

scoring method on the ROC curves was carried out to

provide information on optimal cut-off scores in this

sample. All data were analysed with IBM SPSS Statistics

version 19, except for the comparison of the areas under

the independent ROC curves, which was conducted using

AUC values and standard errors obtained from SPSS with

MedCalc version 12 (MedCalc Software, Mariakerke,

Belgium).
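A comparison of AUC values from independent samples, given only each AUC and its standard error, can be sketched as a z test. The sketch below assumes the usual formula for two independent ROC curves, z = (AUC1 − AUC2)/√(SE1² + SE2²); the text does not spell out the formula applied, so this is an assumption, and the function names are illustrative. The example values are the rounded AUCs and standard errors from Table 5, so the results only approximate the z statistics reported in the Results.

```python
import math


def z_vs_chance(auc, se, chance=0.5):
    """z statistic for an AUC against the chance value of 0.50."""
    return (auc - chance) / se


def z_independent_aucs(auc1, se1, auc2, se2):
    """z statistic for the difference between two independent AUCs."""
    return (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)


def two_sided_p(z):
    """Two-sided p value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Critical (6 item) scoring method: younger versus older age group.
z = z_independent_aucs(0.86, 0.029, 0.78, 0.024)
print(f"z = {z:.2f}, p = {two_sided_p(z):.3f}")
```

This approach applies only to comparisons between the two age groups, which contain different children. Comparing scoring methods within one age group involves correlated ROC curves and would need a method that accounts for that correlation.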

Results

Internal Consistency

The full M-CHAT showed good internal consistency for

both age groups. The critical 6 items showed good internal

consistency for the younger and acceptable internal consistency for the older age group. The Best 7 items showed acceptable internal consistency for both age groups (Table 2).

Fig. 1 Standard referral and assessment procedures for newly referred patients at DCD KKH. Flowchart steps: (1) child referred to DCD; (2) caregivers are posted/emailed the intake questionnaire to complete and return by post/email to DCD before the first appointment; (3) child first seen by DCD paediatrician (also case manager); (4) DCD paediatrician provides initial provisional diagnosis and makes referrals for assessment/intervention as needed; (5) if there is a provisional diagnosis of, or query for, ASD, the paediatrician makes a referral to the team’s psychologist for ASD diagnostic assessment; (6) DCD psychologist makes diagnosis of ASD based on direct observations using the ADOS-G and parent/caregiver report on the child’s developmental history and current functioning in different contexts, with additional cognitive and/or adaptive behaviour functioning assessment carried out for some children as appropriate; (7) all DCD patients are followed up by the DCD paediatrician half yearly or yearly until they are discharged from DCD services at 7 years old.

Clinical Utility of the M-CHAT in the Two Age Groups

The sensitivity, specificity, PPV and NPV for the younger

and older age groups for each of the three scoring methods

separately (see footnote 1) are presented in Table 3. For the younger age group, the critical and the Best 7 scoring methods appear to have the best balance of sensitivity, specificity, PPV and NPV, with both sensitivity and specificity values >0.70. For the older age group, the total scoring method shows the best balance of sensitivity and specificity, both >0.70, and

acceptable PPV.

Chi Squared Analysis of True/False Positive

and Negative Rates

The distributions of true positives, false positives, true negatives and false negatives for all three M-CHAT scoring methods were significantly different between the younger and the older age groups (total: χ²(df = 3) = 11.1, p = 0.011; critical 6: χ²(df = 3) = 22.3, p < 0.001; Best 7: χ²(df = 3) = 23.8, p < 0.001). This finding was

largely explained by a higher than expected proportion

of false positives in the younger age group for all three

scoring methods, and lower than expected false negatives

in the younger age group for the critical 6 and Best 7

scores. The three scoring methods were all more likely

to classify younger children <30 months old without

ASD as at risk of ASD. The critical and Best 7 scoring

methods were more likely to miss children with ASD

in the older as compared to the younger age group

(Table 4).

Fig. 2 Sample selection process. New referrals of 18–48 month old children to DCD, February 2009 to July 2010 (N = 1,283): medical records accessed during the study’s timeframe, September 2010 to September 2011 (N = 1,056); medical records not accessed during the study’s timeframe (N = 227). Excluded: children with confirmed ASD diagnoses from other agencies prior to DCD appointment (N = 13); children without clearly documented diagnosis in medical records (N = 13). Data included in analyses (N = 1,030): M-CHAT returned prior to first appointment (N = 808); M-CHAT not returned prior to first appointment (N = 222). M-CHAT returned but not completed/partially completed (N = 228). Final sample: children with completed M-CHAT (N = 580).

1 The psychometric properties of the M-CHAT were reported

separately for each of the 3 scoring methods, instead of combined

as in previous studies (i.e., failing either total or critical 6; and failing

either total or Best 7). Combining the scoring methods was likely to

increase sensitivity, but to decrease specificity, which was not

desirable for screening a population already at risk of developmental

issues. Further analyses only briefly reported here also indicated that

values of sensitivity (Sn), specificity (Sp), PPV and NPV of using the

combined scoring methods were very similar to those obtained from

the total scoring method alone for both age groups, except that there

was a slight increase in sensitivity for the scoring method of either

total or Best 7 in the younger age group. For the younger age group,

either total or critical scoring method yielded Sn: 0.89, Sp: 0.58, PPV:

0.49, NPV: 0.92; either total or Best 7 scoring method yielded Sn:

0.93, Sp: 0.58, PPV: 0.50, NPV: 0.95. For the older age group, either

total or critical 6 scoring method yielded Sn: 0.76, Sp: 0.71, PPV:

0.59, NPV: 0.84; and either total or Best 7 scoring method yielded Sn:

0.77, Sp: 0.71, PPV: 0.59, NPV: 0.85.



ROC Curve Analysis

The AUC values for all three scoring methods were all

significantly different from an AUC value of 0.50 (i.e.,

chance), as indicated by the z statistics and p values (see

Table 5; Figs. 3, 4). The AUC values were closer to 1.0

than 0.5, indicating that the three scoring methods could all

be considered good tests for both age groups.

The AUC values for the three scoring methods were not

significantly different from one another within each age

Table 1 Demographic characteristics of participants and non-participants. Values are Mean (SD) or N (%).

| Characteristic | Participants (complete M-CHAT; N = 580) | Excluded (incomplete or no M-CHAT; N = 450) | Statistics |
|---|---|---|---|
| Child’s age at first consultation | 2 years 11 months (8 months) | 2 years 11 months (8 months) | t(1,028) = 0.780, p = 0.44 |
| Gender | | | χ²(1) = 0.08, p = 0.77 |
| Male | 435 (75 %) | 341 (75.8 %) | |
| Female | 145 (25 %) | 109 (24.2 %) | |
| Race | | | χ²(df = 3) = 1.03, p = 0.79 |
| Chinese | 444 (76.6 %) | 336 (74.8 %) | |
| Malay | 52 (9.0 %) | 46 (10.2 %) | |
| Indian | 41 (7.1 %) | 29 (6.5 %) | |
| Othersᵃ | 43 (7.4 %) | 38 (8.5 %) | |
| Missing | | 1 | |
| Age at first word | 1 year 6 months (8 months); missing N = 126 | 1 year 6 months (7 months); missing N = 151 | t(df = 751) = 1.09, p = 0.28 |
| ASD diagnosis | | | χ²(df = 1) = 0.11, p = 0.74 |
| Yes | 199 (34.3 %) | 158 (35.1 %) | |
| No | 381 (65.9 %) | 292 (64.9 %) | |
| Father’s age at first visit | 37 years 7 months (5 years 8 months) | 37 years 9 months (6 years 1 month) | t(df = 998) = -0.316, p = 0.75 |
| Mother’s age at first visit | 34 years (4 years 8 months) | 33 years 11 months (4 years 7 months) | t(df = 1,022) = 0.358, p = 0.72 |
| Combined monthly family income | | | χ²(df = 3) = 5.03, p = 0.17 |
| <$1,200 | 23 (4.0 %) | 21 (5.2 %) | |
| $1,200–3,000 | 123 (21.5 %) | 100 (24.8 %) | |
| $3,000–5,000 | 159 (27.9 %) | 89 (22.0 %) | |
| >$5,000 | 266 (46.6 %) | 194 (48.0 %) | |
| Missing data | 9 | 46 | |
| Highest attained parent education level | | | χ²(df = 3) = 3.80, p = 0.28 |
| <A-Levels | 139 (24.1 %) | 126 (29.0 %) | |
| Diploma | 116 (20.1 %) | 87 (20.0 %) | |
| Degree | 238 (41.2 %) | 158 (36.3 %) | |
| Postgraduate | 84 (14.6 %) | 64 (14.7 %) | |
| Missing data | 3 | 15 | |

ᵃ Includes Bangladeshi, Burmese, Caucasian, Eurasian, Filipino, Indonesian, Japanese, Korean, Nepalese, Pakistani, Sikh, Sri Lankan, Thai, Vietnamese, Other races

Table 2 Internal consistency (Cronbach’s alphaᵃ) for the three M-CHAT scoring methods

| Age group | Total 23 item M-CHAT | 6 critical item M-CHAT | Best 7 item M-CHAT |
|---|---|---|---|
| 18–30 months | 0.86ᵇ | 0.81ᵇ | 0.79ᶜ |
| >30–48 months | 0.82ᵇ | 0.75ᶜ | 0.75ᶜ |

ᵃ Cronbach’s alpha values of >0.6 are “questionable”, >0.7 “acceptable” and >0.80 “good” (George and Mallery 2003); ᵇ values indicating good internal consistency; ᶜ values indicating acceptable internal consistency



group (younger age group: z statistics = 0.60–1.33,

p = n.s.; older age group: z statistics = 0.40–0.86,

p = n.s.). This indicated that no one particular set of items

performed substantially better than the other. When the

AUC values of the three methods were compared between

the two age groups, there was no significant age group

difference for the total scoring method (z statistics = 1.25,

p = n.s.); there was a significant age group difference for

the critical scoring method (z statistics = 2.23, p = 0.03);

and a near significant difference for the Best 7 scoring

method (z statistics = 1.93, p = 0.05). The AUC values

indicated that the critical scoring method performed better

in the younger as compared to the older age group.

Optimal Cut-Offs for the Younger Singaporean Age

Group

Inspection of the data suggests that the originally recom-

mended cut-off score of 2 in the literature for the critical

and Best 7 scoring methods was also optimal in the

younger age group in this sample (Table 6). The originally

recommended cut-off score of 3 for the total scoring

method however was not optimal in this sample; instead a

cut off score of 5 better differentiated children with and

those without ASD but with other developmental issues in

this study, while maintaining sensitivity at >0.70.

Optimal Cut-Offs for the Singaporean Older

Age-Group

The originally recommended cut-off score of 3 in the total

scoring method was also optimal in the older pre-schoolers

in the present high-risk sample (Table 7). However, the

recommended cut-off score of 2 for the critical and the Best

7 scoring methods was not optimal; instead, a cut-off score

of 1 was optimal in this study for the older age group.
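The inspection of cut-off scores described in the two sections above can be sketched as a scan over candidate cut-offs, computing sensitivity and specificity at each. The selection rule used here (the highest specificity among cut-offs whose sensitivity exceeds 0.70) is one plausible reading of the approach and is an assumption, as are the function names. The scores and diagnoses are made-up toy data for illustration only, not study data.

```python
def sens_spec_by_cutoff(scores, has_asd, cutoffs):
    """Return (cutoff, sensitivity, specificity) for each candidate cut-off.

    A child screens positive when the number of failed items >= cutoff.
    Assumes at least one child with ASD and one without.
    """
    rows = []
    for cutoff in cutoffs:
        tp = sum(1 for s, d in zip(scores, has_asd) if d and s >= cutoff)
        fn = sum(1 for s, d in zip(scores, has_asd) if d and s < cutoff)
        tn = sum(1 for s, d in zip(scores, has_asd) if not d and s < cutoff)
        fp = sum(1 for s, d in zip(scores, has_asd) if not d and s >= cutoff)
        rows.append((cutoff, tp / (tp + fn), tn / (tn + fp)))
    return rows


def pick_cutoff(rows, min_sensitivity=0.70):
    """Assumed rule: best specificity among cut-offs with sensitivity > minimum."""
    eligible = [row for row in rows if row[1] > min_sensitivity]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row[2])[0]


# Hypothetical toy data: number of failed items and diagnosis per child.
toy_scores = [6, 5, 4, 7, 2, 1, 0, 3, 2, 5]
toy_has_asd = [True, True, True, True, True, False, False, False, False, False]

rows = sens_spec_by_cutoff(toy_scores, toy_has_asd, cutoffs=range(1, 8))
for cutoff, sensitivity, specificity in rows:
    print(f"cut-off {cutoff}: Sn {sensitivity:.2f}, Sp {specificity:.2f}")
print("selected cut-off:", pick_cutoff(rows))
```

Other selection rules, such as maximising the sum of sensitivity and specificity, could give a different cut-off on the same data.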

Discussion

The M-CHAT performed adequately in screening for ASD

in 18–30 month old Singaporean children in this study,

Table 3 M-CHAT clinical utility values by scoring method and age group in a high risk Singaporean sample (N = 580)

| Children’s age group | Criterion used | Sensitivity (95 % CI) | Specificity (95 % CI) | PPV (95 % CI) | NPV (95 % CI) |
|---|---|---|---|---|---|
| 18–30 months | Total (3/23) | 0.89 (0.76–0.95) | 0.59 (0.50–0.68) | 0.49 (0.39–0.59) | 0.92 (0.83–0.97) |
| | Critical (2/6)ᵃ | 0.75 (0.61–0.86) | 0.78 (0.70–0.85) | 0.61 (0.48–0.72) | 0.88 (0.80–0.93) |
| | Best (2/7)ᵃ | 0.81 (0.68–0.90) | 0.78 (0.69–0.84) | 0.61 (0.49–0.73) | 0.90 (0.83–0.95) |
| >30–48 months | Total (3/23)ᵃ | 0.76 (0.68–0.83) | 0.72 (0.66–0.77) | 0.60 (0.53–0.67) | 0.84 (0.79–0.89) |
| | Critical (2/6) | 0.53 (0.45–0.62) | 0.92 (0.87–0.95) | 0.78 (0.68–0.85) | 0.78 (0.73–0.82) |
| | Best (2/7) | 0.54 (0.46–0.62) | 0.90 (0.86–0.93) | 0.75 (0.66–0.83) | 0.78 (0.70–0.82) |

ᵃ Indicates scoring method that yielded a good balance of sensitivity, specificity, PPV and NPV values

Table 4 Distribution of M-CHAT true and false positives and negatives by scoring method and age group in a high risk Singaporean sample (N = 580)

Cell entries are N (% within age group), standard residuals

Age group       Criterion/scoring method   True positives          False positives         True negatives          False negatives
18–30 months    Total (3/23)               47 (27.2 %), <0.10      49 (28.3 %), 2.00a      71 (41.0 %), -0.70      6 (3.5 %), -1.80
18–30 months    Critical (2/6)             40 (23.1 %), 0.80       26 (15.0 %), 3.10a      94 (54.3 %), -0.50      13 (7.5 %), -2.30a
18–30 months    Best (2/7)                 43 (24.9 %), 1.10       27 (15.6 %), 2.80a      93 (53.8 %), -0.50      10 (5.8 %), -2.70a
>30–48 months   Total (3/23)               111 (27.3 %), <0.10     74 (18.2 %), -1.30      187 (45.9 %), 0.40      35 (8.6 %), 1.20
>30–48 months   Critical (2/6)             78 (19.2 %), -0.50      22 (5.4 %), -2.00a      239 (58.7 %), 0.30      68 (16.7 %), 1.50
>30–48 months   Best (2/7)                 79 (19.4 %), -0.70      26 (6.4 %), -1.80       235 (57.7 %), 0.30      67 (16.5 %), 1.80

a Standard residuals of 2.0 or more indicate that the observed proportion is significantly different from the expected proportion
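The clinical utility values in Table 3 follow from the cell counts in Table 4 by the standard definitions of sensitivity, specificity, PPV and NPV. The short sketch below applies those definitions to one row of Table 4; the names `Counts` and `clinical_utility` are illustrative and not taken from the study, and the confidence intervals reported in Table 3 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Cell counts of a 2x2 screening table (screen result vs. final diagnosis)."""
    tp: int  # screen positive, ASD
    fp: int  # screen positive, no ASD
    tn: int  # screen negative, no ASD
    fn: int  # screen negative, ASD


def clinical_utility(c: Counts) -> dict:
    """Return sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # ASD cases correctly flagged
        "specificity": c.tn / (c.tn + c.fp),  # non-ASD cases correctly passed
        "ppv": c.tp / (c.tp + c.fp),          # flagged children who have ASD
        "npv": c.tn / (c.tn + c.fn),          # passed children who do not have ASD
    }


# Counts copied from Table 4: 18-30 months, total scoring method (3/23).
younger_total = Counts(tp=47, fp=49, tn=71, fn=6)
values = clinical_utility(younger_total)

for name, value in values.items():
    print(f"{name}: {value:.2f}")
```

For this row the computed values round to 0.89, 0.59, 0.49 and 0.92, matching the corresponding point estimates in Table 3.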

Table 5 AUC values and standard errors for ROC curves for the three M-CHAT scoring methods by age group

Age group       M-CHAT scoring method   AUC    Standard error   Z statistica
18–30 months    23 item                 0.84   0.032            10.4
18–30 months    6 item                  0.86   0.029            12.4
18–30 months    7 item                  0.85   0.033            10.6
>30–48 months   23 item                 0.78   0.025            11.3
>30–48 months   6 item                  0.78   0.024            11.7
>30–48 months   7 item                  0.77   0.024            11.2

a All significantly different from an AUC value of 0.50 at the p < 0.0001 level
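How the Z statistics in Table 5 were obtained is not stated. A common choice, assumed here purely for illustration, is the ratio of the AUC's distance from the chance value of 0.50 to its standard error. The function name `z_against_chance` is hypothetical, and because the inputs are the rounded AUC and standard error values from Table 5, the results can differ slightly from the published statistics.

```python
def z_against_chance(auc: float, se: float, null_auc: float = 0.5) -> float:
    """Z statistic for H0: AUC == null_auc, assuming z = (AUC - null) / SE."""
    return (auc - null_auc) / se


# (AUC, standard error) pairs copied from Table 5.
table5 = {
    ("18-30 months", "23 item"): (0.84, 0.032),
    ("18-30 months", "6 item"): (0.86, 0.029),
    ("18-30 months", "7 item"): (0.85, 0.033),
    (">30-48 months", "23 item"): (0.78, 0.025),
    (">30-48 months", "6 item"): (0.78, 0.024),
    (">30-48 months", "7 item"): (0.77, 0.024),
}

z_values = {key: z_against_chance(auc, se) for key, (auc, se) in table5.items()}

for (age_group, method), z in z_values.items():
    print(f"{age_group:>13}  {method:<7}  z = {z:.1f}")
```

Under this assumption every recomputed value exceeds 10, consistent with the table's note that all AUCs differ significantly from 0.50.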



providing support for its use. The three different scoring methods recommended in existing literature correctly identified many true cases of ASD in this sample, with sensitivity values above 0.70 that were comparable to earlier research (Eaves et al. 2006; Kleinman et al. 2008; Robins et al. 2001; Snow and Lecavalier 2008). The critical and Best 7 scoring methods were also good at correctly identifying children who did not have ASD, with specificity values above 0.70. The total scoring method was less good at correctly identifying children who did not have ASD, with a specificity value below 0.70 (0.59; Table 3).

The false positive rate was acceptably low for the critical and Best 7 scoring methods, as these also yielded healthy PPV values >0.60. The total scoring method resulted in a false positive rate that was higher than chance, with a PPV of slightly below 0.50, although low PPVs of 0.30–0.50 are not unusual (Glascoe 2005). It has been argued that the cost of under-diagnosis is likely to be higher than that of over-diagnosis. Furthermore, children who have been “falsely” identified by the M-CHAT in high-risk samples are also likely to have needs requiring early intervention. Thus, overall, the clinical utility values of the M-CHAT for the three scoring methods with the 18–30 month old high risk Singaporean sample are at least acceptable, if not good.

There was also an opportunity to explore the potential clinical utility of the M-CHAT for children older than the initially intended age range of <30 months. Practically, it would be highly preferable for any clinic to employ one tool to screen for ASD in children up to 48 months old, if the tool could demonstrate good clinical utility with the older age group.

In the present high risk sample, the three M-CHAT scoring methods performed differently for the younger and older age groups. All three scoring methods were more likely to classify younger children <30 months old without ASD as at risk of ASD. Pandey et al. (2008) found similar results with 16–23 month olds and 24–30 month olds, but the false positive rate was higher in the younger than in the older children only in a low risk sample. This finding is consistent with the general screening literature, as developmental delays are more likely to indicate pervasive developmental issues in older than in younger children.

The critical and Best 7 scoring methods were more likely to miss children with ASD in the older than in the younger age group. Only the total score showed acceptable clinical utility values for the older age group, making it more suitable to screen for ASD in children 30–48 months old. A couple of studies have reported similar findings (Eaves et al. 2006; Snow and Lecavalier 2008); however, they did not have sufficient sample sizes to reach firm conclusions. Eaves et al.’s (2006) study of 84 high risk children aged 17–48 months reported a higher sensitivity value of 0.92 with the total score as compared to 0.77 with the critical score. The specificity values were unacceptable for both scoring methods, but PPVs were good. Snow and Lecavalier (2008) reported high sensitivity and low specificity values in 56 high risk children aged 18–48 months with both the total and critical 6 scoring methods.

Fig. 3 ROC curves for the M-CHAT total, critical 6 and Best 7 scoring methods for the younger age group (<30 months)

Fig. 4 ROC curves for the M-CHAT total, critical 6 and Best 7 scoring methods for the older age-group (>30–48 months)



In their study, the total score performed better than the critical score in screening for ASD in the older group of 39 children aged 30–48 months. The M-CHAT items tap early developmental abilities or behaviours (e.g., imitation, showing, pointing, functional play) and for this reason its items are likely to be more relevant and developmentally appropriate for 18–30 month old children. It is possible that some of these earlier behaviours may have already been mastered by older children with ASD, which could explain the low true positive rates in older children. This could also explain why it may be necessary to use collective information from all 23 M-CHAT items to capture atypical development consistent with ASD in the older children.

The ROC curve analysis provided further evidence of the M-CHAT as a good screening tool, with adequate AUC values in high risk younger and older children. Although a power analysis indicated that there was a high risk of a type II error (i.e., not finding a significant difference when in fact there was one) when comparing the AUC values, the critical scoring method performed better for the younger than for the older age group, and the Best 7 method also appeared to be somewhat better for the younger than for the older age group.

Cut-off scores different from those initially suggested by the M-CHAT developers (Robins et al. 2001) were found to be optimal for some of the M-CHAT scoring methods in the present Singaporean high risk sample. The original cut-off scores recommended by the M-CHAT authors were optimal for the critical and Best 7 scoring methods for the younger age group, which were the two “best” scoring methods in terms of clinical utility for this group. However, if the total score were to be employed for the younger age group, a higher cut-off score of 5 was considered optimal compared to the initially recommended cut-off score of 3. The original cut-off score recommended by the M-CHAT authors was found to be optimal for the total scoring method for the older age-group in our study, but a lower cut-off score of 1 should be considered if the critical 6 or the Best 7 methods are to be used with the older age group.

Table 6 Sensitivity and specificity values at different cut-off scores for the three M-CHAT scoring methods for the younger age-group (<30 months)

Total 23 Critical 6 Best 7

Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity

1 1.00 0.18 1 0.94 0.62 1 0.89 0.62

2 0.94 0.43 2a,b 0.76 0.78 2a,b 0.81 0.78

3a 0.89 0.59 3 0.66 0.87 3 0.64 0.87

4 0.83 0.69 4 0.58 0.93 4 0.53 0.96

5b 0.74 0.74 5 0.28 0.98

6 0.70 0.78

7 0.64 0.82

8 0.60 0.89

a Original cut-off score recommended by M-CHAT authors; b Optimal cut-off score in the current sample

Table 7 Sensitivity and specificity values at different cut-off scores for the three M-CHAT scoring methods for the older age-group (>30–48 months)

Total 23 Critical 6 Best 7

Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity Cut-off score Sensitivity Specificity

1 0.94 0.21 1b 0.73 0.73 1b 0.71 0.73

2 0.82 0.44 2a 0.53 0.91 2a 0.54 0.90

3a,b 0.76 0.72 3 0.36 0.97 3 0.40 0.98

4 0.64 0.84 4 0.21 0.99 4 0.27 0.98

5 0.55 0.90 5 0.14 1.00

6 0.47 0.92

7 0.40 0.95

8 0.35 0.97

a Original cut-off score recommended by M-CHAT authors for children <30 months; b Optimal cut-off score in the current sample for the older age group



Limitations and Recommendations for Future Research

Although the included sample appeared to be representative of the population typically seen at DCD (see Table 1), the low completion rate of 56.3 % indicates that just over half of caregivers/parents completed the M-CHAT. Child development units that are considering employing the M-CHAT as a Level II screening tool may need to investigate internal processes to increase completion rates in order to enhance and accelerate the screening and diagnostic process for ASD.

Furthermore, the completion of the M-CHAT could not be repeated to examine test–retest reliability, nor were M-CHATs obtained from more than one caregiver to explore reliability between different informants. It is recommended that future research examine test–retest reliability and the potential impact of respondents’ characteristics on clinical utility. We are currently working on exploring whether the clinical utility of the M-CHAT could be affected by parents’ educational or ethnic/language background, other caregiver/family factors or child characteristics (Koh et al. in preparation).

Finally, this study was a retrospective review of clinical records and examined the routine use of the M-CHAT as part of standard clinical procedures. However, it should be noted that although all patient information and M-CHAT data were retrospectively obtained, all caregivers prospectively completed the M-CHAT prior to their first appointment and thus there were no retrospective memory or professional contact biases influencing parents’ ratings. The children included in this study are still being developmentally monitored by DCD paediatricians until they are 7 years old or until they are appropriately placed in a formal education setting. The data captured in the data extraction period would not have documented any changes in diagnosis after this time period. There is a small possibility that there may be children who were given a diagnosis of ASD by the paediatrician only, without a full assessment by the psychologists during the course of the study, but who might have received an ASD or other diagnosis at a later time. However, only 12 % of the participants were included in this study based on the paediatrician’s initial diagnosis alone; thus, given diagnostic stability and very high agreement between paediatricians’ initial and final diagnoses, this possibility is small.

Implications of Our Findings for ASD Screening in Clinical Settings and Recommendations

The M-CHAT was found to be a good tool to screen for ASD in developmentally at-risk children 18–30 months old in Singapore. It was also found to be acceptable as a screening tool for older high risk children >30–48 months old, although the total scoring method is preferred and different cut-off scores are recommended if the critical or the Best 7 methods are used.

The critical and Best 7 scoring methods were found to perform better than the total scoring method for screening 18–30 month old high risk children. Thus, it is recommended that these two scoring methods be considered for use with parents/caregivers of 18–30 month old children. An optimal cut-off of 5 for the total scoring method was found to better differentiate children with ASD from children without ASD who may have other developmental issues in this study. However, given their comparable clinical utility values (see Tables 3, 6), a shorter version of the M-CHAT consisting of the critical 6 and/or Best 7 items could be piloted, as this would be easier and quicker to complete and could increase completion rates while maintaining the strongest clinical utility compared to the more time-consuming 23 item M-CHAT for this younger age group. However, for the 30–48 month old high risk children, the total scoring method was best and thus the whole questionnaire should routinely be administered for this older age group.

This study adds to existing literature reporting the usefulness of the M-CHAT as a screening tool in non-western populations by providing further evidence of its usefulness in diverse multi-cultural countries. Our findings also strongly suggest that the M-CHAT is likely to be useful for screening children older than the intended age range of the tool who are <4 years old. It is important that future research studies and clinical use of the M-CHAT systematically explore its use in different settings and age groups and with different scoring methods, to establish best “fit” use for the specific context and population.

Acknowledgments We acknowledge Charissa Wong who participated in data collection for the study. We would like to thank Tang Hui Nee, Yang Phey Hong, and Cheong Wan Mui, who provided valuable advice during the course of the research. Special mention also goes to Jennifer Tan, Suraya Amir, and other administrative staff of the Department of Child Development, KK Women’s and Children’s Hospital, who kindly assisted with the data collection process.

References

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.

Bayley, N. (2006). Bayley Scales of Infant and Toddler Development. San Antonio, TX: The Psychological Corporation.

Canal-Bedia, R., García-Primo, P., Martín-Cilleros, M. V., Santos-Borbujo, J., Guisuraga-Fernandez, Z., Herraez-García, L., et al. (2011). Modified checklist for autism in toddlers: Cross-cultural adaptation and validation in Spain. Journal of Autism and Developmental Disorders, 41(10), 1342–1351.

Department of Statistics, Singapore. (2012). Population trends 2012. Retrieved 29 May 2013, from Statistics Singapore website: http://www.singstat.gov.sg/.

Eaves, L. C., Wingert, H., & Ho, H. H. (2006). Screening for autism: Agreement with diagnosis. Autism, 10, 229–242.

Elliott, C. D. (2007). Differential ability scales (2nd ed.). San Antonio, TX: Psychological Corporation.

Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.

George, D., & Mallery, P. (2003). SPSS for windows step by step: A simple guide and reference. 11.0 update (4th ed.). Boston: Allyn & Bacon.

Glascoe, F. P. (2005). Screening for developmental and behavioural problems. Mental Retardation and Developmental Disabilities Research Reviews, 11(3), 173–179.

Inada, N., Koyama, T., Inokuchi, E., Kuroda, M., & Kamio, Y. (2010/2011). Reliability and validity of the Japanese version of the Modified Checklist for Autism in Toddlers (M-CHAT). Research in Autism Spectrum Disorders, 5(1), 330–336.

Kleinman, J. M., Robins, D. L., Ventola, P. E., Pandey, J., Boorstein, H. C., Esser, E. L., et al. (2008). The modified checklist for autism in toddlers: A follow-up study investigating the early detection of autism spectrum disorders. Journal of Autism and Developmental Disorders, 38(5), 827–839.

Lord, C., Risi, S., Lambrecht, L., Cook, E. H., Jr, Leventhal, B. L., DiLavore, P. C., et al. (2000). The autism diagnostic observation schedule-generic: A standard measure of social and communication deficits associated with the spectrum of autism. Journal of Autism and Developmental Disorders, 30(3), 205–223.

Lung, F.-W., Chiang, T.-L., Lin, S.-J., & Shu, B.-C. (2011). Autism-risk screening in the first 3 years of life in Taiwan Birth Cohort Pilot Study. Research in Autism Spectrum Disorders, 5(4), 1385–1389.

Magiati, I., Tay, X. W., & Howlin, P. (2012). Early comprehensive behaviourally based interventions for children with autism spectrum disorders: A summary of findings from recent reviews and meta-analyses. Neuropsychiatry, 2(6), 543–570.

Norris, M., & Lecavalier, L. (2010). Screen accuracy of level 2 autism spectrum disorder rating scales. A review of selected instruments. Autism, 14, 263–284.

Pandey, J., Verbalis, A., Robins, D., Boorstein, H., Klin, A., Babitz, T., et al. (2008). Screening for autism in older and younger toddlers with the modified checklist for autism in toddlers. Autism, 12(5), 513–535.

Perera, H., Wijewardena, K., & Aluthwelage, R. (2009). Screening of 18–24-month-old children for autism in a semi-urban community in Sri Lanka. Journal of Tropical Pediatrics, 55(6), 402–405.

Robins, D. L. (2008). Screening for autism spectrum disorders in primary care settings. Autism, 12(5), 537–556.

Robins, D. L., & Dumont-Mathieu, T. (2006). Early screening for autism spectrum disorders: Update on the modified checklist for autism in toddlers and other measures. Journal of Developmental & Behavioural Pediatrics, 27(Supplement 2), S111–S119.

Robins, D. L., Fein, D., Barton, M. L., & Green, J. A. (2001). The modified checklist for autism in toddlers: An initial study investigating the early detection of autism and pervasive developmental disorders. Journal of Autism and Developmental Disorders, 31, 131–144.

Robins, D. L., Pandey, J., Chlebowski, C., Carr, K., Zaj, J. L., Arroyo, M., Barton, M. L., Green, J., & Fein, D. A. (2010, May). M-CHAT Best7: A new scoring algorithm improves positive predictive power of the M-CHAT. Paper presented at the International Meeting for Autism Research (IMFAR) in Philadelphia, USA.

Seif Eldin, A., Habib, D., Noufal, A., Farrag, S., Bazaid, K., Al-Sharbati, M., et al. (2008). Use of M-CHAT for a multinational screening of young children with autism in the Arab countries. International Review of Psychiatry, 20(3), 281–289.

Snow, A. V., & Lecavalier, L. (2008). Sensitivity and specificity of the modified checklist for autism in toddlers and the social communication questionnaire in preschoolers suspected of having pervasive developmental disorders. Autism, 12(6), 627–644.

Sparrow, S. S., Cicchetti, D., & Balla, D. (2005). Vineland adaptive behavior scales-II. Circle Pines, MN: American Guidance Service.

Wallis, K. E., & Pinto-Martin, J. (2008). The challenge of screening for autism spectrum disorder in a culturally diverse society. Acta Paediatrica, 97(5), 539–540.

Wechsler, D. (2002). Wechsler preschool and primary scale of intelligence (3rd ed.). San Antonio, TX: Psychological Corporation.

Wong, V., Hui, L. H., Lee, W. C., Leung, L. S., Ho, P. K., Lau, W. L., et al. (2004). A modified screening tool for autism (checklist for autism in toddlers [CHAT-23]) for Chinese children. Pediatrics, 114(2), e166–e176.

Woolfenden, S., Sarkozy, V., Riddley, G., & Williams, K. (2012). A systematic review of the diagnostic stability of autism spectrum disorder. Research in Autism Spectrum Disorders, 6(1), 345–354.

Yama, B., Freeman, T., Graves, E., Yuan, S., & Campbell, K. (2012). Examination of the properties of the modified checklist for autism in toddlers (M-CHAT) in a population sample. Journal of Autism and Developmental Disorders, 42, 23–34.

