Comparison of Automated Brain Volumetry Methods With Stereology in Children Aged 2 to 3 Years
Abstract
Introduction: The accurate and precise measurement of brain volumes in young children is
important for early identification of children with reduced brain volumes and an increased risk
for neurodevelopmental impairment. Brain volumes can be measured from cerebral MRI
(cMRI), but most neuroimaging tools used for cerebral segmentation and volumetry were
developed for use in adults, and have not been validated in infants or young children. Here we
investigate the feasibility and accuracy of three automated software methods (i.e. SPM, FSL
and FreeSurfer) for brain volumetry in young children, and compare the measures with
corresponding volumes obtained using the Cavalieri method of modern design stereology.
Methods: Cerebral MRI data were collected from 21 children with a complex congenital heart
disease (CHD) before Fontan procedure, at a median age of 27 months (range 20.9-42.4
months). Data were segmented with SPM, FSL, and FreeSurfer, and total intracranial volume
(ICV) and total brain volume (TBV) were compared with corresponding measures obtained
using the Cavalieri method.
Results: Agreement between the estimated brain volumes (ICV and TBV) and the gold
standard stereological volumes was strongest for FreeSurfer (ps<0.001) and moderate for
SPM Segment (ICV: p=0.05; TBV: p=0.006). No significant association was evident between
ICV and TBV obtained using SPM NewSegment and FSL FAST and the corresponding
stereological volumes.
Conclusions: FreeSurfer provides an accurate method for measuring brain volumes in young
children, even in the presence of structural brain abnormalities.
Keywords: MRI; brain segmentation; brain volume; children; congenital heart disease
1. Introduction
Children with severe congenital heart disease (CHD) are at risk of developmental delay and
adverse neurodevelopmental outcome due to disease and treatment dependent effects on the
maturing brain (reviewed in [1]). Magnetic resonance imaging (MRI) techniques allow for
detailed assessment of brain volumes as well as visualization of structural anomalies
associated with adverse outcomes [2, 3]. Identification of distinct patterns of brain volume loss
might enable subsequent reliable risk stratification for neurodevelopmental impairment and
early identification of patients with need for intervention [4].
Cerebral volumes can be calculated from MRI manually using the unbiased and highly efficient
methods of modern design-based stereology (i.e. Cavalieri method in combination with
point counting) or using a number of software packages developed for automated brain
segmentation (i.e. Statistical Parametric Mapping (SPM), FMRIB Software Library (FSL), and
FreeSurfer) [5-8]. However, these automated methods for brain segmentation and volumetry
have been developed for quantitative analysis of adult cMRI, and hence their application in
early childhood is limited due to distinct characteristics of the maturing brain. Since the natural
variability in contrast, state of myelination, and volume among different brain regions is greater
in the maturing brain compared to adults [9, 10], automated segmentation in young children
can lead to misclassification, especially in subcortical areas [9, 11]. Additionally, structural
brain anomalies (e.g. widened cerebrospinal fluid (CSF) spaces, white matter (WM) injuries,
periventricular leukomalacia, stroke, hemorrhage, altered cortical folding) and delayed
maturation (e.g. open operculum, delayed myelination) are common in children with complex
CHD, impeding the accuracy of automated segmentation (reviewed in [12]). The aim of the
present study, therefore, was to evaluate the feasibility and accuracy of three widely used
automated methods for brain segmentation and volumetry in children aged between 2 and 3
years with CHD with expected structural abnormalities. To date, exhaustive manual
reconstruction of the brain in serial sections has been the gold standard for pediatric MRI
volumetry. In the present study we use measurements obtained using the time-efficient,
unbiased Cavalieri method of modern design stereology as a gold standard for measuring
intracranial volumes (ICV) and total brain volumes (TBV). As an additional validation, we
investigate the relationship between ICVs measured using each automated method and head
circumference, as head circumference has been reported to correlate with ICV in young
children [13].
2. Materials and Methods
The current study was performed as part of an ongoing prospective multi-center trial
evaluating neurodevelopmental outcome and cerebral MRI (cMRI) scans of patients with
univentricular heart defects before Fontan procedure. We included 21 children, diagnosed with
complex CHD such as hypoplastic left-heart syndrome (HLHS, n=11), hypoplastic left-heart
complex (HLHC, n=5), and other univentricular hypoplasia (UVH, n=5) recruited at the
University Children’s Hospital Zurich, Switzerland (n=13) and University Heart Center of
Giessen, Germany (n=8). In 10 cases, children were treated with the Giessen hybrid approach
and six with the classical Norwood approach. Three were palliated with a modified Blalock-
Taussig shunt, one with isolated pulmonary artery banding, and in another hemodynamically
balanced patient there was no need for a neonatal surgery before the Glenn anastomosis. The
study was approved by the local ethics committee of the Canton of Zurich and the University of
Giessen, respectively. Parents or caregivers provided fully informed written consent. Head
circumference was measured from all children according to a standard protocol.
Cerebral MRI data were acquired before Fontan procedure at a median age of 27.0 months
(20.9 – 42.4 months). Patients were scanned under sedation. MRI scans for Zurich patients
were performed with a 3.0 tesla MR 750 scanner (General Electric Medical Systems,
Milwaukee, WI, USA). MRI scans for Giessen patients were performed with a 3.0 tesla
Magnetom Verio B17 scanner (Siemens Medical Systems, Erlangen, Germany). High
resolution 3D T1-weighted images were acquired with a spoiled gradient echo (SPGR) scan
(TR, 9.94 ms; TI, 600 ms; FOV, 25.6 x 19.2 cm; matrix, 256 x 192; flip angle, 8°; axial plane;
slice thickness, 1 mm; 172 slices) in Zurich and with a magnetization prepared rapid
acquisition gradient echo (MP-RAGE) scan (TR, 1900 ms; TI, 900 ms; FOV, 25.6 x 25.6 cm;
matrix, 256 x 256; flip angle, 9°; sagittal plane; slice thickness, 1 mm; 112 slices) in Giessen.
Both SPGR and MP-RAGE datasets were reconstructed to a voxel resolution of 1 mm³. The
image quality and uniformity of brain maturation (e.g. myelination stage) for data sets of both
centers were rated by an experienced neuroradiologist (IS).
In order to optimise an automated pipeline for intracranial volume (ICV) estimation and
anatomical segmentation of cMRI data into gray matter (GM), white matter (WM), and cerebro-
spinal fluid (CSF), we evaluated a number of previously described techniques, namely
Statistical Parametric Mapping 8.0 (SPM8, Wellcome Trust Center for Neuroimaging) running
under MATLAB 7.0 2013b (The MathWorks, Inc., Natick, Massachusetts, U.S.), FMRIB
Software Library v5.0 (FSL), and FreeSurfer (Martinos Center for Biomedical Imaging,
Massachusetts, U.S.) [5-8, 14]. Data sets were analyzed on a Linux workstation.
2.1 SPM NewSegment and Segment
Segmentation with the SPM toolbox was performed using two approaches according to the
manual for SPM (http://www.fil.ion.ucl.ac.uk/spm/doc/spm8_manual.pdf). In a first approach,
we applied the toolbox SPM NewSegment, which performs bias correction, spatial
normalization and automated voxel-based segmentation into GM, WM and CSF in one single
processing pipeline [15]. This toolbox uses adult probabilistic maps (modified ICBM tissue
probabilistic maps) [16], and normalizes the images to MNI space (Montreal Neurological
Institute, International Consortium for Brain Mapping) [17]. For each subject, segmentation of
the images into GM, WM and CSF was performed with a unified approach, and the segmented
images were written out in native space. Volumes of the three resulting tissue classes in native
space were calculated by an appropriate summation with the toolbox FSL STATS.
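As an illustration of the summation step, the following minimal sketch computes a tissue volume from a native-space probability map. It assumes that the volume is taken as the sum of the voxel-wise tissue probabilities multiplied by the voxel volume; the exact summation used in this study is not specified, and the function name tissue_volume_ml and the nested-list input are hypothetical stand-ins for an image loaded from disk.

```python
def tissue_volume_ml(prob_map, voxel_dims_mm=(1.0, 1.0, 1.0)):
    """Estimate a tissue volume in millilitres from a probability map.

    prob_map: nested lists indexed [slice][row][column], each entry a tissue
    probability in [0, 1] (hypothetical stand-in for a native-space image).
    voxel_dims_mm: voxel edge lengths in mm (1 mm isotropic assumed here).
    """
    voxel_mm3 = voxel_dims_mm[0] * voxel_dims_mm[1] * voxel_dims_mm[2]
    # Assumption: each voxel contributes its probability times its volume.
    total_probability = sum(p for sl in prob_map for row in sl for p in row)
    return total_probability * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3


# Toy 2 x 2 x 2 map in which every voxel is half tissue.
toy_map = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
print(tissue_volume_ml(toy_map))
```

Under this assumption, TBV would correspond to the sum of the GM and WM volumes, and ICV to the sum of the GM, WM and CSF volumes.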
In a second approach, we performed an automatic segmentation into GM, WM and CSF in
native space with the original SPM Segment toolbox, using the UNC-Infant tissue probabilistic
masks for two year olds as a template [18]. In a pre-processing step, the original images were
warped to the template with the SPM toolbox function Estimate and Write. Resulting volumes
were calculated as described above. Both SPM methods were fully automated and required no
user intervention.
2.2 FSL
The FMRIB Automated Segmentation Tool (FAST) in FSL version 5.0 was used for brain
segmentation. Since the FAST tool requires skull-stripped images as input, the FSL brain
extraction tool (BET) was used for initial skull stripping of the data. To improve the quality of
the skull stripping the BET settings for the fractional intensity threshold and the vertical
gradient in fractional intensity threshold were optimised individually for each patient, specifying
the head radius as a starting estimate for the initial surface sphere. The quality of the skull
stripping was assessed visually in each case. The FAST tool was then used to segment the
3D T1 images into GM, WM, and CSF maps, correcting for bias field/spatial intensity variations
using a hidden Markov random field model and an expectation maximisation algorithm [19].
Like SPM, the FSL segmentation procedure is also fully automated and requires no user
intervention, except in cases where non-standard thresholds are chosen for the skull
stripping/brain extraction step.
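The two-step FSL procedure described above can be sketched as a pair of command lines assembled in Python. The options shown (-f for the fractional intensity threshold, -g for its vertical gradient, -r for the head radius in mm, and -t/-n/-o for FAST) are standard FSL options, but the helper names, file names and numerical values are hypothetical placeholders and not the per-patient settings used in this study.

```python
def bet_command(t1_path, out_path, frac=0.5, gradient=0.0, radius_mm=None):
    """Build a BET call for skull stripping (values are illustrative only)."""
    cmd = ["bet", t1_path, out_path, "-f", str(frac), "-g", str(gradient)]
    if radius_mm is not None:
        # Head radius in mm, used as the starting estimate of the surface sphere.
        cmd += ["-r", str(radius_mm)]
    return cmd


def fast_command(brain_path, out_base):
    """Build a FAST call: T1-weighted input, three tissue classes."""
    return ["fast", "-t", "1", "-n", "3", "-o", out_base, brain_path]


# Hypothetical per-patient settings; each list could be passed to subprocess.run.
print(" ".join(bet_command("t1.nii.gz", "t1_brain", frac=0.4, gradient=0.1, radius_mm=70)))
print(" ".join(fast_command("t1_brain.nii.gz", "t1_fast")))
```

Building the commands as lists keeps the per-patient parameters explicit and makes it straightforward to log the exact settings used for each dataset.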
2.3 FreeSurfer
Images were additionally segmented using the freely available FreeSurfer image analysis
software (http://surfer.nmr.mgh.harvard.edu). In an automated workflow of 31 process steps,
the toolbox performs alignment to MNI space, warping, signal intensity normalisation, voxel-
based segmentation and volume calculation. Technical details of those processing steps have
been described previously [5, 17, 20-24]. Since the FreeSurfer toolbox is designed for use in
adults and children over five years of age, the method has known limitations when applied to
images of younger children [25]. However, for data from young children with structural
abnormalities, we found that a better segmentation may be obtained by integrating the
following flags: -wsthresh 35 for MP-RAGE images, -wsthresh 45 for SPGR images, and
-bigventricles. The segmentation quality was inspected visually for accuracy on each slice, and
manual corrections were performed wherever a suboptimal segmentation was observed, most
often in parietal and fronto-temporal regions. These corrections were performed with TKMEDIT,
a tool integrated in FreeSurfer software, following the instructions detailed in the FreeSurfer
tutorials (surfer.nmr.mgh.harvard.edu/fswiki/RecommendedReconstruction). Specifically, the
manual correction consisted of checking the Talairach transformation and the skull stripping,
and placing control points in WM regions not correctly segmented on the first iteration due to a
failure of the intensity normalisation step. The control points were placed approximately 1 mm
apart, well inside the white matter boundary, following the examples detailed in the FreeSurfer
tutorials. After positioning and saving the control points a part of the recon-all process was
rerun (specifically autorecon2-cp). If the addition of control points was not sufficient to fix errors
in the white matter boundary, this boundary was edited manually in TKMEDIT, and the
autorecon2-wm process was rerun. Finally, if errors in the pial surface were observed these
were also corrected manually in TKMEDIT. In order to assess the impact of the manual
corrections on the accuracy of the calculated brain volumes the whole FreeSurfer pipeline was
run both with and without the manual correction steps.
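The FreeSurfer calls described above can be sketched as follows. The watershed thresholds and the -bigventricles and -autorecon2-cp flags are those reported in the text; the -s, -i and -all options are standard recon-all options assumed here, and the helper names, subject identifier and file name are hypothetical.

```python
# Watershed thresholds per sequence, as reported in the text.
WSTHRESH_BY_SEQUENCE = {"MP-RAGE": 35, "SPGR": 45}


def recon_all_command(subject_id, t1_path, sequence):
    """Build the initial recon-all call (-s, -i and -all are assumed options)."""
    return [
        "recon-all", "-s", subject_id, "-i", t1_path, "-all",
        "-wsthresh", str(WSTHRESH_BY_SEQUENCE[sequence]),
        "-bigventricles",
    ]


def rerun_after_control_points(subject_id):
    """Build the partial rerun used after control points have been saved."""
    return ["recon-all", "-s", subject_id, "-autorecon2-cp"]


print(" ".join(recon_all_command("patient01", "t1.nii.gz", "SPGR")))
print(" ".join(rerun_after_control_points("patient01")))
```

Running the first command once without any subsequent edits, and once followed by the manual correction and rerun steps, would reproduce the corrected versus uncorrected comparison in outline.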
2.4 Stereology
Volumetric measurements derived with the three automated analysis approaches described
above were compared with the corresponding volumes obtained using the Cavalieri method in
combination with point counting [26], implemented in the EasyMeasure software package [27,
28]. According to this method, ICV and TBV are estimated from a systematic random
sample of parallel MR image sections covering the whole cranium, or brain, respectively, and
with the first section positioned at random within the sampling interval [29]. The section area of
the transects through the structure on each image is estimated by point counting with a
square grid of test points, overlaid on each image with new uniform random position and new
isotropic uniform random orientation. The volume is computed as the sum of the estimated
section areas (each equal to the point count for that section multiplied by the area associated
with each test point), multiplied by the sectioning interval, following the image sampling
theory for stereology [30]. The theoretical basis and justification of the methodology has been
described in detail elsewhere [26, 30], and the method has been widely applied in MRI for
volumetric assessment of ICV [31, 32], as well as volume of the hippocampus [33], thalamus
[29], and Broca’s area [32]. The sampling intensity (i.e. grid size of test system for point
counting, sectioning interval) was selected to achieve a coefficient of error (CE) of less than
5%, as described previously [26, 29]. The Cavalieri and FreeSurfer methods are illustrated
graphically in Figure 1. ICV was defined as total volume of the cranium, including brain tissue,
cerebral ventricles and sulcal CSF, while total brain volume (TBV) was defined as brain
volume excluding both ventricular and sulcal CSF. For comparison with FreeSurfer and SPM,
TBV was calculated both including and excluding the brainstem (as the FreeSurfer
segmentation does not include the brainstem while the SPM segmentation does include the
brainstem in the measure of TBV). In order to assess the inter-observer reproducibility of the
Cavalieri method the stereological volumes were calculated separately by two observers for a
subset of n=6 patients.
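The Cavalieri estimator described above can be written as V = T x a x (P1 + P2 + ... + Pn), where T is the sectioning interval, a is the area associated with each test point, and Pi is the number of test points hitting the structure on section i. The following minimal sketch implements this formula; the function name, the grid spacing and the point counts are hypothetical and are not taken from the EasyMeasure software or from the study data.

```python
def cavalieri_volume_ml(points_per_section, grid_spacing_mm, section_interval_mm):
    """Cavalieri volume estimate from point counts on parallel sections.

    points_per_section: test points hitting the structure on each section.
    grid_spacing_mm: spacing of the square test grid; the area per point
        is the square of this spacing.
    section_interval_mm: distance between consecutive sampled sections.
    """
    area_per_point_mm2 = grid_spacing_mm ** 2
    # Sum of estimated section areas, multiplied by the sectioning interval.
    volume_mm3 = section_interval_mm * area_per_point_mm2 * sum(points_per_section)
    return volume_mm3 / 1000.0  # 1 ml = 1000 mm^3


# Hypothetical counts on three sections, 5 mm grid, 3 mm sectioning interval.
print(cavalieri_volume_ml([10, 20, 30], grid_spacing_mm=5.0, section_interval_mm=3.0))
```

A denser grid or a smaller sectioning interval increases the total point count and thereby lowers the coefficient of error, at the cost of more counting time.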
2.5 Statistical Analyses
Statistical analyses were performed with SPSS 22.0 (SPSS Inc, Chicago, USA). Descriptive
statistics presented include median, range, mean ± SD for continuous variables, and
frequency with percentage for categorical variables. Student's t-test was applied to test for
differences between groups. Shapiro-Wilk tests were used to test normality. A Bland-Altman
analysis was used to assess the agreement between ICV and TBV derived with the automated
methods (FreeSurfer, FSL, and SPM) and corresponding volumes obtained using the Cavalieri
method. Additional correlations were analyzed with Pearson’s correlation for normally
distributed variables and Spearman’s rho for data which were not normally distributed. A
receiver operating characteristic (ROC) analysis was performed to assess the sensitivity and
specificity of each method relative to the gold-standard stereological volumes, which were
dichotomised into high and low volume groups by a median split.
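For readers unfamiliar with the Bland-Altman approach, the following minimal sketch shows the quantities involved: the bias (mean difference between an automated method and the reference) and the 95% limits of agreement (bias ± 1.96 standard deviations of the differences). The analyses in this study were run in SPSS; the function name and the example volumes below are hypothetical and serve only as an illustration.

```python
from statistics import mean, stdev


def bland_altman(method_values, reference_values):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are taken as method minus reference, so a negative bias
    means the method underestimates relative to the reference.
    """
    diffs = [m - r for m, r in zip(method_values, reference_values)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical volumes in ml for three subjects.
automated = [910, 950, 990]
stereology = [1000, 1050, 1100]
print(bland_altman(automated, stereology))
```

In a full Bland-Altman plot the differences are additionally plotted against the mean of the two measurements for each subject, which shows whether the bias depends on brain size.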
3. Results
Between August 2012 and February 2014, 23 eligible patients were consecutively recruited, of
whom 21 patients were included in the final analysis. Two patients were excluded because the
high-resolution 3D sequence required for volumetry was not acquired during the MRI protocol.
The most frequent heart defect was HLHS (57.1%). Four (19.0%) patients had HLHC, and a
further five had UVH. The age of the children ranged from 20.9 to 42.4 months with a median
age of 27.0 months. Twelve (57.1%) patients were boys. Nine of the 21 patients (6/13 from
Zurich and 3/8 from Giessen) demonstrated structural brain abnormalities including
ventriculomegaly (n=3), infarct (n=5), white matter lesions (n=2), generalized or focal atrophy
(n=5), or suspected hypoxia (n=2). No motion or other artefacts were present in any of the
images. The typical image quality is depicted in Figure 1, together with the stereological overlay
and FreeSurfer segmentation.
The measurements of ICV obtained using the three automated analysis techniques and by the
Cavalieri method are presented in Table 1, and corresponding data for TBV are presented in
Table 2. The CE of the volume measurements obtained using the Cavalieri method was less
than 1% for both ICV and TBV in all participants (mean 0.6%, range 0.18% - 0.96%). The
mean inter-observer reproducibility of the stereological volumes (expressed as the difference
between the measured volumes divided by the mean volume from both observers) was 4%.
Correlation and Bland-Altman plots showing the agreement between ICV measured using
each automated software method (SPM, FSL, and FreeSurfer) and the Cavalieri-estimated
ICVs are illustrated in Figure 2, and the corresponding TBV data are depicted in Figure 3. For
ICV, only FreeSurfer showed a significant correlation with the stereological volumes
(Pearson’s R=0.72, p<0.001), although SPM Segment showed a strong trend towards a
significant correlation (p=0.05). The Bland-Altman analysis demonstrated that all automated
methods significantly underestimated TBV (ps≤0.005, paired t-test) compared to stereology.
Consistent with the results for ICV, FreeSurfer (Pearson’s R=0.96, p<0.001) and SPM Segment
(Spearman’s rho=0.58, p=0.006) showed the closest agreement for TBV relative to the
stereological volumes. No significant association was evident between ICV and TBV obtained
using SPM NewSegment and FSL FAST and the corresponding stereological volumes,
although FSL FAST showed a trend towards a significant association for TBV only (p=0.07).
Estimated ICVs from all methods correlated positively with head circumference, but this
association only reached significance for the FreeSurfer estimated volumes (R=0.68,
p<0.001), and those estimated using the Cavalieri method (R=0.52, p=0.01).
In the ROC analysis, FreeSurfer showed the highest sensitivity and specificity to brain volume
differences among the automated methods (Table 3), with an area under the curve (AUC)
which was significantly different from chance for both the ICV and TBV (p<0.001). SPM
Segment showed an AUC which was significant for the ICV (p=0.014) and present at trend
level for the TBV (p=0.07). The head circumference was also a significant predictor of ICV
(p=0.004), but the brain volumes from SPM NewSegment and FSL FAST did not show
significant AUC estimates (p>0.1).
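The logic of the ROC analysis with a median split can be sketched as follows. The AUC is computed here as the proportion of (high-volume, low-volume) subject pairs in which the predictor ranks the high-volume subject higher, with ties counted as one half, which is equivalent to the area under the empirical ROC curve. The function name, the example values, and the assignment of a subject lying exactly at the median to the low-volume group are assumptions made for illustration; the analysis in this study was run in SPSS.

```python
from statistics import median


def auc_median_split(reference, predictor):
    """AUC of a predictor for the high/low split of the reference volumes."""
    cut = median(reference)
    # Assumption: values equal to the median fall into the low-volume group.
    high = [p for r, p in zip(reference, predictor) if r > cut]
    low = [p for r, p in zip(reference, predictor) if r <= cut]
    # Count pairs ranked correctly by the predictor (ties count one half).
    wins = sum(
        1.0 if h > l else 0.5 if h == l else 0.0
        for h in high
        for l in low
    )
    return wins / (len(high) * len(low))


# Hypothetical stereological volumes and automated estimates for four subjects.
print(auc_median_split([1, 2, 3, 4], [10, 30, 20, 40]))
```

An AUC of 0.5 corresponds to chance-level discrimination between the high and low volume groups, and an AUC of 1.0 to perfect discrimination.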
Both SPM methods appeared to show systematic differences in volume between centers and
MRI protocols (Figure 2). In contrast, no significant differences between centers were
observed for ICV or TBV with FreeSurfer (ps>0.9), FSL (ps>0.56), or with the Cavalieri method
(ps>0.78). In addition there were no significant differences in head circumference measures
between the two imaging centers. However, mean ICV values for the cohorts scanned at each
center (using MPRAGE and SPGR protocols, respectively) differed significantly for both SPM
methods (SPM NewSegment: p<.001, SPM Segment: p=.03). For TBV, SPM NewSegment
showed significant inter-center differences (p=0.001) while SPM Segment showed trend-level
differences (p=0.097).
Values of TBV measured using FreeSurfer incorporating manual correction were
approximately 1% higher than those obtained without manual correction (p=0.04, paired t-test),
whereas values of ICV were the same with and without manual correction. In the case of TBV
both corrected and uncorrected values from FreeSurfer showed near identical agreement to
corresponding values obtained using the Cavalieri method (R=0.96, R=0.97, p<0.001; Table 2).
4. Discussion
MRI provides visualization of structural brain abnormalities with unprecedented detail [29, 34-
38]. Recent advances in image processing software have led to the development of a number
of powerful methods for measurement of regional and global brain volumes [29, 34-38].
However, while these automated methods have demonstrated high accuracy and reliability in
healthy adults as well as in some adult patient groups with structural abnormalities, they have
been less widely applied in infants, and validation studies for automated volumetric methods in
young children are lacking, particularly in those with brain abnormalities. For clinical
applicability, automated and time-efficient methods are needed. In this study we investigated
the performance of three widely-used approaches for automatic measurement of ICV and TBV
and compared these to corresponding values obtained using the Cavalieri method of modern
design stereology, which represents a mathematically unbiased, time-efficient manual
approach for obtaining volume estimates with high precision [35, 37, 39]. As an additional
validation, we also examined the relationship between ICV and head circumference, as head
circumference has been reported to be correlated with ICV in young children [13] and
decreases in head circumference associated with neurodevelopmental difficulties have been
widely reported in infants with CHD [40, 41]. Of the three automated software methods
examined in this study, the FreeSurfer approach demonstrated the highest accuracy and
strongest agreement with the stereological volumes as well as with head circumference
(ps<.001).
While moderate correlations were observed between ICV and TBV derived with the SPM
Segment approach and those from the Cavalieri method, the SPM NewSegment approach
gave less reliable results (Table 2). This may be due to the use of an adult brain MRI template
(included automatically in the SPM NewSegment pipeline) instead of an age-appropriate
pediatric MRI template (used by SPM Segment). The unified segmentation/tissue classification
method employed by SPM (which uses tissue “priors” from the adult brain template, combined
with Bayesian tissue probabilities estimated from voxel intensities to inform the segmentation)
[15] may also be more sensitive to variations in image contrast arising from developmental
changes or scanner and protocol differences, resulting in a more variable segmentation
quality. SPM NewSegment may perform better with an age appropriate template, but both
SPM methods demonstrated higher sensitivity to differences between the two MRI protocols,
possibly due to differences in tissue probabilities estimated from the differing voxel signal
intensity characteristics of each protocol. The Bayesian segmentation approach utilized by
both SPM methods may therefore be more sensitive to technical factors which alter the voxel
signal intensities, resulting in altered tissue probabilities, but further studies would be needed
to clarify the specific factors affecting the SPM segmentation.
The lack of agreement between ICV measured using FSL and ICV estimated by the Cavalieri
method probably arose from inadequate skull stripping, despite attempts to optimize this
process individually for each dataset. A closer agreement was observed between TBV
measured with FSL FAST and TBV estimated using the Cavalieri method, which could
perhaps be further improved by enhancements in the skull stripping process before
segmentation. The accuracy of volumetry with FSL may also potentially be improved by
registration to a pediatric rather than an adult template.
The Bland-Altman plots demonstrated that FreeSurfer showed a smaller mean bias for ICV
and TBV with narrower limits of agreement relative to SPM and FSL, although all three
automated methods underestimated TBV compared to the Cavalieri method. The good
agreement between both the corrected and uncorrected volumes from FreeSurfer and the
corresponding volumes obtained using the Cavalieri method suggests that the opportunity for
manual correction had only a modest effect on the accuracy of the FreeSurfer analyses, even
though the total brain volumes measured with FreeSurfer were significantly higher after
manual correction. However, the improvement in accuracy from the manual correction steps
may become evident with a larger sample.
Our findings are consistent with those from a number of recently published studies [35, 37,
38], which described higher consistency and less sensitivity to noise or variable image quality
with FreeSurfer. This observation suggests that the FreeSurfer algorithm (voxel- and surface
based) may be optimal for multi-center studies with data acquired on different scanners or with
slightly different protocols. In contrast, Eggert et al. reported higher accuracy in tissue
segmentation of a more homogeneous adult cMRI dataset derived by both SPM Segment and
SPM NewSegment compared to FreeSurfer [35]. Similar results have been shown for previous
versions of SPM [37]. Therefore, SPM may perform better in single center studies of healthy
adults or patient groups imaged with a consistent acquisition protocol.
Limitations
Our study is limited by the small sample size and heterogeneity of our patient population,
particularly with regard to age, maturation and intracranial abnormalities. A further limitation is
the lack of a pediatric brain template available for use with most of the automated methods
(notably SPM NewSegment, FSL, and FreeSurfer). However, while the accuracy of
segmentation may be improved by use of an age-appropriate brain template, templates
constructed using data obtained for healthy children should be used with caution when applied
to volumetric MRI studies of clinical populations with intracranial abnormalities.
Surprisingly, while FreeSurfer in particular showed no bias in measuring ICV relative to the
Cavalieri method, all automated methods appear to underestimate TBV by 15-20% relative to
the Cavalieri method (see Figure 3). This may be the result of partial volume artefacts arising
between low signal intensity CSF and higher signal intensity GM in the cerebral sulci, which
could lead to overestimation of brain volume by point counting on T1-weighted images [39] in
comparison to the FreeSurfer measurements, which inherently include correction for partial
volume effects. In future, the
development of an anthropomorphic phantom to be used as a gold standard for methods
comparison studies, or the analysis of images obtained with varying voxel sizes may allow for
a more detailed characterization of the partial volume effects on the estimated TBV values.
Alternatively, this discrepancy may point to a need for additional correction of cortical outlines
in children, but future studies would be needed to clarify the source of the apparent
underestimation of TBV.
For the present study, validation of TBV and ICV was provided by stereologically derived
volumes, but the GM, WM and CSF volumes derived with each automated method were not
validated individually. Such validation may allow for the evaluation of partial volume effects
and the localization of volumetric deficits to gray or white matter. Additionally, we did not
examine regional volumes or the relative size of different brain structures, which may be
relevant for further analysis [9].
Conclusion
This study provides a novel validation of three common automated image
analysis techniques for measuring brain volumes from 3D MR images in young children. Using
the Cavalieri method as a gold standard, FreeSurfer provided the best agreement for both ICV
and TBV among the automated software methods. SPM and FSL provide modest or limited
agreement for the same volumetric measurements, possibly due to difficulties with skull
stripping, use of an adult rather than a pediatric brain template, and sensitivity to differences
in image contrast from different MRI scanners and protocols. While the accuracy of all three
automated methods may be improved by registration to a pediatric template, the present study
confirms the suitability of FreeSurfer for the automated assessment of brain volumes in young
children.
Abbreviations
AUC area under the curve
BET brain extraction tool
CE coefficient of error
CHD congenital heart disease
cMRI cerebral magnetic resonance imaging
CSF cerebrospinal fluid
FAST FMRIB Automated Segmentation Tool
FSL FMRIB Software Library v5.0
GM gray matter
HLHC hypoplastic left-heart complex
HLHS hypoplastic left-heart syndrome
ICV intracranial volume
MNI Montreal Neurological Institute
MP-RAGE magnetization prepared rapid acquisition gradient echo
MRI magnetic resonance imaging
SPGR spoiled gradient echo
SPM8 Statistical Parametric Mapping version 8
TBV total brain volume
UVH univentricular hypoplasia
WM white matter
Conflict of interest: We declare that we have no conflict of interest. The draft of the manuscript was
written by the first author. No honorarium, grant, or other form of payment was given to anyone to
produce the manuscript.
References
1. Khalil A, Suff N, Thilaganathan B, Hurrell A, Cooper D, Carvalho JS (2014) Brain
abnormalities and neurodevelopmental delay in congenital heart disease: systematic
review and meta-analysis. Ultrasound Obstet Gynecol 43:14-24.
2. Owen M, Shevell M, Donofrio M, Majnemer A, McCarter R, Vezina G, Bouyssi-Kobar
M, Evangelou I, Freeman D, Weisenfeld N, Limperopoulos C (2014) Brain volume and
neurobehavior in newborns with complex congenital heart defects. J Pediatr 164:1121-
1127.e1121.
3. Watanabe K, Matsui M, Matsuzawa J, Tanaka C, Noguchi K, Yoshimura N, Hongo K,
Ishiguro M, Watanabe S, Hirono K, Uese K, Ichida F, Origasa H, Nakazawa J, Oshima
Y, Miyawaki T, Matsuzaki T, Yagihara T, Bilker W, Gur RC (2009) Impaired
neuroanatomic development in infants with congenital heart disease. J Thorac
Cardiovasc Surg 137:146-153.
4. von Rhein M, Buchmann A, Hagmann C, Huber R, Klaver P, Knirsch W, Latal B (2014)
Brain volumes predict neurodevelopment in adolescents after surgery for congenital
heart disease. Brain 137:268-276.
5. Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis. I. Segmentation
and surface reconstruction. Neuroimage 9:179-194.
6. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A,
Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale AM (2002)
Whole brain segmentation: automated labeling of neuroanatomical structures in the
human brain. Neuron 33:341-355.
7. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H,
Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J,
Zhang Y, De Stefano N, Brady JM, Matthews PM (2004) Advances in functional and
structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl
1:S208-219.
8. Ashburner J, Friston KJ (1999) Nonlinear spatial normalization using basis functions.
Hum Brain Mapp 7:254-266.
9. Gousias IS, Rueckert D, Heckemann RA, Dyet LE, Boardman JP, Edwards AD,
Hammers A (2008) Automatic segmentation of brain MRIs of 2-year-olds into 83
regions of interest. Neuroimage 40:672-684.
10. Murgasova M, Dyet L, Edwards D, Rutherford M, Hajnal J, Rueckert D (2007)
Segmentation of brain MRI in young children. Acad Radiol 14:1350-1366.
11. Heckemann RA, Hajnal JV, Aljabar P, Rueckert D, Hammers A (2006) Automatic
anatomical brain MRI segmentation combining label propagation and decision fusion.
Neuroimage 33:115-126.
12. McQuillen PS, Miller SP (2010) Congenital heart disease and brain development. Ann
N Y Acad Sci 1184:68-86.
13. Bartholomeusz HH, Courchesne E, Karns CM (2002) Relationship between head
circumference and brain volume in healthy normal toddlers, children, and adults.
Neuropediatrics 33:239-241.
14. Ashburner J, Friston K (1997) Multimodal image coregistration and partitioning--a
unified framework. Neuroimage 6:209-217.
15. Ashburner J, Friston KJ (2005) Unified segmentation. Neuroimage 26:839-851.
16. Mazziotta J, Toga A, Evans A, Fox P, Lancaster J, Zilles K, Woods R, Paus T,
Simpson G, Pike B, Holmes C, Collins L, Thompson P, MacDonald D, Iacoboni M,
Schormann T, Amunts K, Palomero-Gallagher N, Geyer S, Parsons L, Narr K, Kabani
N, Le Goualher G, Boomsma D, Cannon T, Kawashima R, Mazoyer B (2001) A
probabilistic atlas and reference system for the human brain: International Consortium
for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol Sci 356:1293-1322.
17. Evans AC (1993) 3D statistical neuroanatomical models from 305 MRI volumes. In: Collins DL
(ed). Proc. IEEE Nucl. Sci. Symp. Med. Imaging Conf., pp 1813-1817.
18. Shi F, Yap PT, Wu G, Jia H, Gilmore JH, Lin W, Shen D (2011) Infant brain atlases
from neonates to 1- and 2-year-olds. PLoS One 6:e18746.
19. Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden
Markov random field model and the expectation-maximization algorithm. IEEE Trans
Med Imaging 20:45-57.
20. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL,
Dale AM, Maguire RP, Hyman BT, Albert MS, Killiany RJ (2006) An automated labeling
system for subdividing the human cerebral cortex on MRI scans into gyral based
regions of interest. Neuroimage 31:968-980.
21. Fischl B, Sereno MI, Dale AM (1999) Cortical surface-based analysis. II: Inflation,
flattening, and a surface-based coordinate system. Neuroimage 9:195-207.
22. Fischl B, van der Kouwe A, Destrieux C, Halgren E, Ségonne F, Salat DH, Busa E,
Seidman LJ, Goldstein J, Kennedy D, Caviness V, Makris N, Rosen B, Dale AM (2004)
Automatically parcellating the human cerebral cortex. Cereb Cortex 14:11-22.
23. Ségonne F, Pacheco J, Fischl B (2007) Geometrically accurate topology-correction of
cortical surfaces using nonseparating loops. IEEE Trans Med Imaging 26:518-529.
24. Sled JG, Zijdenbos AP, Evans AC (1998) A nonparametric method for automatic
correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging 17:87-97.
25. Lowe JR, Maclean PC, Caprihan A, Ohls RK, Qualls C, Vanmeter J, Phillips JP (2012)
Comparison of cerebral volume in children aged 18-22 and 36-47 months born preterm
and term. J Child Neurol 27:172-177.
26. Roberts N, Puddephat MJ, McNulty V (2000) The benefit of stereology for quantitative
radiology. Br J Radiol 73:679-697.
27. Puddephat MJ (1999) Computer Interface for Convenient Application for Stereological Methods
for Unbiased Estimation of Volume and Surface Area: Studies Using MRI with
Particular Reference to the Human Brain. University of Liverpool, Liverpool.
28. Keller SS, Highley JR, Garcia-Finana M, Sluming V, Rezaie R, Roberts N (2007) Sulcal
variability, stereological measurement and asymmetry of Broca's area on MR images. J
Anat 211:534-555.
29. Keller SS, Gerdes JS, Mohammadi S, Kellinghaus C, Kugel H, Deppe K, Ringelstein
EB, Evers S, Schwindt W, Deppe M (2012) Volume estimation of the thalamus using
freesurfer and stereology: consistency between methods. Neuroinformatics 10:341-
350.
30. Cruz-Orive LM, Gelšvartas J, Roberts N (2014) Sampling theory and automated
simulations for vertical sections, applied to human brain. J Microsc 253:119-150.
31. Mayhew TM, Olsen DR (1991) Magnetic resonance imaging (MRI) and model-free
estimates of brain volume determined using the Cavalieri principle. J Anat 178:133-
144.
32. Keller SS, Roberts N (2009) Measurement of brain volume using MRI: software,
techniques, choices and prerequisites. J Anthropol Sci 87:127-151.
33. Salmenperä T, Könönen M, Roberts N, Vanninen R, Pitkänen A, Kälviäinen R (2005)
Hippocampal damage in newly diagnosed focal epilepsy: a prospective MRI study.
Neurology 64:62-68.
34. Mulder ER, de Jong RA, Knol DL, van Schijndel RA, Cover KS, Visser PJ, Barkhof F,
Vrenken H, Alzheimer's Disease Neuroimaging Initiative (2014) Hippocampal volume change measurement:
quantitative assessment of the reproducibility of expert manual outlining and the
automated methods FreeSurfer and FIRST. Neuroimage 92:169-181.
35. Eggert LD, Sommer J, Jansen A, Kircher T, Konrad C (2012) Accuracy and reliability of
automated gray matter segmentation pathways on real and simulated structural
magnetic resonance images of the human brain. PLoS One 7:e45081.
36. Morey RA, Petty CM, Xu Y, Hayes JP, Wagner HR, Lewis DV, LaBar KS, Styner M,
McCarthy G (2009) A comparison of automated segmentation and manual tracing for
quantifying hippocampal and amygdala volumes. Neuroimage 45:855-866.
37. Klauschen F, Goldman A, Barra V, Meyer-Lindenberg A, Lundervold A (2009)
Evaluation of automated brain MR image segmentation and volumetry methods. Hum
Brain Mapp 30:1310-1327.
38. Dewey J, Hana G, Russell T, Price J, McCaffrey D, Harezlak J, Sem E, Anyanwu JC,
Guttmann CR, Navia B, Cohen R, Tate DF, HIV Neuroimaging Consortium (2010) Reliability and
validity of MRI-based automated volumetry software relative to auto-assisted manual
measurement of subcortical structures in HIV-infected patients from a multisite study.
Neuroimage 51:1334-1344.
39. Furlong C, García-Fiñana M, Puddephat M, Anderson A, Fabricius K, Eriksen N,
Pakkenberg B, Roberts N (2013) Application of stereological methods to estimate post-
mortem brain surface area using 3T MRI. Magn Reson Imaging 31:456-465.
40. Medoff-Cooper B, Irving SY, Hanlon AL, Golfenshtein N, Radcliffe J, Stallings VA,
Marino BS, Ravishankar C (2016) The Association among Feeding Mode, Growth, and
Developmental Outcomes in Infants with Complex Congenital Heart Disease at 6 and
12 Months of Age. J Pediatr 169:154-159.e151.
41. Daymont C, Neal A, Prosnitz A, Cohen MS (2013) Growth in children with congenital
heart disease. Pediatrics 131:e236-242.
Figure legends:
Figure 1: Schematic illustration of the Cavalieri method, shown with the cortical and white
matter outlines from FreeSurfer for comparison. Top panel: Control points included in the total
brain volume are shown in green while those excluded from the volume are shown in red.
Middle panels: zoomed image from the inset region for the Cavalieri method (top) and
FreeSurfer (bottom). Bottom panel: Corresponding axial slices from FreeSurfer depicting the
cortical surface (red) and the pial boundaries.
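The point-counting estimate illustrated in Figure 1 can be sketched as follows. This is a minimal illustration of the standard Cavalieri estimator (volume = section spacing × area per grid point × total number of points counted), not the software used in the study. The function name `cavalieri_volume_ml` and the spacing and count values are hypothetical.

```python
def cavalieri_volume_ml(point_counts, slice_spacing_mm, grid_spacing_mm):
    """Cavalieri point-counting estimate of a volume, returned in mL.

    point_counts: grid points falling inside the structure on each section.
    slice_spacing_mm: distance between consecutive sampled sections.
    grid_spacing_mm: spacing of a square test grid; each point then
        represents an area of grid_spacing_mm ** 2.
    """
    if slice_spacing_mm <= 0 or grid_spacing_mm <= 0:
        raise ValueError("spacings must be positive")
    area_per_point_mm2 = grid_spacing_mm ** 2
    volume_mm3 = slice_spacing_mm * area_per_point_mm2 * sum(point_counts)
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Hypothetical example: three sections 5 mm apart, counted on a 10 mm grid.
example_ml = cavalieri_volume_ml([10, 20, 30],
                                 slice_spacing_mm=5.0,
                                 grid_spacing_mm=10.0)
print(example_ml)  # 60 points x 100 mm^2 x 5 mm = 30000 mm^3 = 30.0 mL
```

In practice the grid spacing and section interval are chosen so that enough points are counted per structure for a stable estimate; the values above are for illustration only.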
Figure 2: Validation of semi-automated methods for the total intracranial volume (ICV). Figure
2a-c (top panel): correlation plots showing the association between ICV derived with SPM
NewSegment, SPM Segment, FSL, and FreeSurfer vs. ICV obtained using the Cavalieri
method for each subject. Figure 2d-f (bottom panel): Bland-Altman plots showing the bias for
each of the derived volumes.
FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated Segmentation Tool;
SPM, statistical parametric mapping; n=21
Figure 3: Validation of semi-automated methods for the total brain volume (TBV). Figure 3a-c
(top panel): correlation plots showing the association between TBV derived with SPM
NewSegment, SPM Segment, FSL, and FreeSurfer vs. TBV obtained using the Cavalieri
method for each subject. Figure 3d-f (bottom panel): Bland-Altman plots showing the bias for
each of the derived volumes.
FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated Segmentation Tool;
SPM, statistical parametric mapping; n=21
*excluding brainstem
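The bias shown in the Bland-Altman plots of Figures 2 and 3 is the mean difference between an automated volume and the Cavalieri volume. A minimal sketch of that calculation is given here, assuming the conventional 95% limits of agreement (bias ± 1.96 × SD of the differences). The function name `bland_altman` and the volumes in the example are hypothetical, not study data.

```python
from statistics import mean, stdev


def bland_altman(method_ml, reference_ml):
    """Return (bias, lower limit, upper limit) for paired volume estimates.

    bias is the mean of (method - reference); the limits of agreement are
    bias -/+ 1.96 times the sample standard deviation of the differences.
    """
    if len(method_ml) != len(reference_ml):
        raise ValueError("inputs must be paired")
    if len(method_ml) < 2:
        raise ValueError("at least two pairs are needed")
    diffs = [m - r for m, r in zip(method_ml, reference_ml)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


# Hypothetical volumes (mL) for three subjects; not taken from the study.
automated = [1010.0, 1052.0, 1094.0]
cavalieri = [1000.0, 1050.0, 1100.0]
bias, lower, upper = bland_altman(automated, cavalieri)
print(bias, lower, upper)
```

A negative bias would indicate that the automated method underestimates the volume relative to the Cavalieri method on average.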
Table 1. Estimated intracranial volumes from the automated segmentation tools, the Cavalieri method and head circumference
Method                     ICV, mL (mean ± SD)   Correlation ICV with Cavalieri, r (p value)   Correlation ICV with HC, r (p value)
Cavalieri (gold standard)  1079 ± 74             -                                             .52 (p=.01)
SPM NewSegment             1156 ± 118 †          -.00 (p=.50)                                  .37 (p=.10)
SPM Segment                1260 ± 26 †           .43 (p=.05)                                   .42 (p=.06)
FAST                       1007 ± 139            .23 (p=0.3)                                   .42 (p=.06)
FreeSurfer                 1035 ± 78             .72 (p<.001)                                  .68 (p<.001)
Table 2. Estimated total brain volumes from the automated segmentation tools and the Cavalieri method
Method                                   TBV, mL (mean ± SD)   Correlation TBV with Cavalieri, r (p value)
Cavalieri (including brainstem)          1011 ± 91             -
Cavalieri (excluding brainstem)          996 ± 91              -
SPM NewSegment                           916 ± 86 †            -.18 (p=.44)
SPM Segment                              864 ± 171             .58 (p=.006)
FAST                                     785 ± 133             .40 (p=.07)
FreeSurfer (with manual corrections)     858 ± 100             .96* (p<.001)
FreeSurfer (without manual corrections)  849 ± 106             .97* (p<.001)
Table 3. ROC analysis of the automated segmentation tools
                 ICV                                          TBV
Method           Sensitivity  Specificity  AUC (p value)     Sensitivity  Specificity  AUC (p value)
SPM NewSegment   90%          40%          0.65 (p=.26)      90%          50%          0.67 (p=.18)
SPM Segment      64%          90%          0.82 (p=.014)     90%          70%          0.74 (p=.07)
FAST             73%          80%          0.74 (p=.13)      73%          80%          0.71 (p=.11)
FreeSurfer       91%          100%         0.99 (p<.001)     91%          100%         0.97 (p<.001)
HC               100%         60%          0.87 (p=.004)     -            -            -
Table legends:
Table 1: FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated
Segmentation Tool; ICV, intracranial volume; HC, head circumference; SPM, statistical
parametric mapping; n=21
† Significant differences between the two protocols (center A and center B). No significant
difference was demonstrated for tissue segmentation between centers with FSL FAST
(p=0.98), FreeSurfer (p=0.90) or with stereology (p=0.78).
ρ Spearman’s rho correlation and corresponding p-value (for non-normally distributed data)
Table 2: FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated
Segmentation Tool; TBV, total brain volume; SPM, statistical parametric mapping; n=21
† Significant difference between the two protocols (center A and center B). SPM Segment
showed trend-level differences (p=0.097). No significant difference was demonstrated for
tissue segmentation between centers with FAST (p=0.83), FreeSurfer (p=0.98) or with
stereology (p=0.92).
ρ Spearman’s rho correlation and corresponding p-value (for non-normally distributed data)
* excluding brainstem, as FreeSurfer does not include the brainstem in total brain volume
Table 3: FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated
Segmentation Tool; SPM, statistical parametric mapping; ICV, intracranial volume; TBV, total
brain volume; AUC, area under the curve; n=21
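For readers unfamiliar with the quantities in Table 3, the sketch below shows how sensitivity, specificity and AUC are generally computed from a continuous measure and a binary reference label. How the reference groups were defined is not stated in this section, so the labels, scores and threshold below are purely hypothetical, as are the function names `roc_auc` and `sensitivity_specificity`.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case scores higher than a
    negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive and compare with labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores and reference labels for four subjects.
scores = [1.0, 3.0, 2.0, 4.0]
labels = [0, 0, 1, 1]
print(roc_auc(scores, labels))
print(sensitivity_specificity(scores, labels, threshold=2.5))
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises performance across all thresholds.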