Date post: 23-Apr-2018

Comparison of Automated Brain Volumetry Methods With Stereology in Children Aged 2 to 3 Years

Abstract

Introduction: The accurate and precise measurement of brain volumes in young children is

important for early identification of children with reduced brain volumes and an increased risk

for neurodevelopmental impairment. Brain volumes can be measured from cerebral MRI

(cMRI), but most neuroimaging tools used for cerebral segmentation and volumetry were

developed for use in adults, and have not been validated in infants or young children. Here we

investigate the feasibility and accuracy of three automated software methods (i.e. SPM, FSL

and FreeSurfer) for brain volumetry in young children, and compare the measures with

corresponding volumes obtained using the Cavalieri method of modern design stereology.

Methods: Cerebral MRI data were collected from 21 children with a complex congenital heart

disease (CHD) before the Fontan procedure, at a median age of 27 months (range 20.9-42.4

months). Data were segmented with SPM, FSL, and FreeSurfer, and total intracranial volume

(ICV) and total brain volume (TBV) were compared with corresponding measures obtained

using the Cavalieri method.

Results: Agreement between the estimated brain volumes (ICV and TBV) relative to the gold

standard stereological volumes was strongest for FreeSurfer (ps<0.001) and moderate for

SPM Segment (ICV: p=0.05; TBV: p=0.006). No significant association was evident between

ICV and TBV obtained using SPM NewSegment and FSL FAST and the corresponding

stereological volumes.

Conclusions: FreeSurfer provides an accurate method for measuring brain volumes in young

children, even in the presence of structural brain abnormalities.

Keywords: MRI; brain segmentation; brain volume; children; congenital heart disease

1. Introduction

Children with severe congenital heart disease (CHD) are at risk of developmental delay and

adverse neurodevelopmental outcome due to disease- and treatment-dependent effects on the

maturing brain (reviewed in [1]). Magnetic resonance imaging (MRI) techniques allow for

detailed assessment of brain volumes as well as visualization of structural anomalies

associated with adverse outcomes [2, 3]. Identification of distinct patterns of brain volume loss

might enable subsequent reliable risk stratification for neurodevelopmental impairment and

early identification of patients with need for intervention [4].

Cerebral volumes can be calculated from MRI manually, using the unbiased and highly efficient

methods of modern design-based stereology (i.e. the Cavalieri method in combination with

point counting), or automatically, using one of several software packages developed for brain

segmentation (e.g. Statistical Parametric Mapping (SPM), FMRIB Software Library (FSL), and

FreeSurfer) [5-8]. However, these automated methods for brain segmentation and volumetry

have been developed for quantitative analysis of adult cMRI, and hence their application in

early childhood is limited due to distinct characteristics of the maturing brain. Since the natural

variability in contrast, state of myelination, and volume among different brain regions is greater

in the maturing brain compared to adults [9, 10], automated segmentation in young children

can lead to misclassification, especially in subcortical areas [9, 11]. Additionally, structural

brain anomalies (e.g. widened cerebrospinal fluid (CSF) spaces, white matter (WM) injuries,

periventricular leukomalacia, stroke, hemorrhage, altered cortical folding) and delayed

maturation (e.g. open operculum, delayed myelination) are common in children with complex

CHD, impeding the accuracy of automated segmentation (reviewed in [12]). The aim of the

present study, therefore, was to evaluate the feasibility and accuracy of three widely used

automated methods for brain segmentation and volumetry in children aged 2 to 3 years with

CHD, in whom structural abnormalities are expected. To date, exhaustive manual

reconstruction of the brain in serial sections has been the gold standard for pediatric MRI

volumetry. In the present study we use measurements obtained using the time-efficient,

unbiased Cavalieri method of modern design stereology as a gold standard for measuring

intracranial volumes (ICV) and total brain volumes (TBV). As an additional validation, we

investigate the relationship between ICVs measured using each automated method and head

circumference, as head circumference has been reported to correlate with ICV in young

children [13].

2. Materials and Methods

The current study was performed as part of an ongoing prospective multi-center trial

evaluating neurodevelopmental outcome and cerebral MRI (cMRI) scans of patients with

univentricular heart defects before Fontan procedure. We included 21 children, diagnosed with

complex CHD such as hypoplastic left-heart syndrome (HLHS, n=11), hypoplastic left-heart

complex (HLHC, n=5), and other univentricular hypoplasia (UVH, n=5) recruited at the

University Children’s Hospital Zurich, Switzerland (n=13) and University Heart Center of

Giessen, Germany (n=8). In 10 cases, children were treated with the Giessen hybrid approach

and six with the classical Norwood approach. Three were palliated with a modified Blalock-

Taussig shunt, one with isolated pulmonary artery banding, and in one further hemodynamically

balanced patient no neonatal surgery was needed before the Glenn anastomosis. The

study was approved by the local ethics committees of the Canton of Zurich and the University of

Giessen, respectively. Parents or caregivers provided fully informed written consent. Head

circumference was measured in all children according to a standard protocol.

Cerebral MRI data were acquired before Fontan procedure at a median age of 27.0 months

(20.9 – 42.4 months). Patients were scanned under sedation. MRI scans for Zurich patients

were performed with a 3.0 tesla MR 750 scanner (General Electric Medical Systems,

Milwaukee, WI, USA). MRI scans for Giessen patients were performed with a 3.0 tesla

Magnetom Verio B17 scanner (Siemens Medical Systems, Erlangen, Germany). High

resolution 3D T1-weighted images were acquired with a spoiled gradient echo (SPGR) scan

(TR, 9.94 ms; TI, 600 ms; FOV, 25.6 x 19.2 cm; matrix, 256 x 192; flip angle, 8°; axial plane;

slice thickness, 1 mm; 172 slices) in Zurich and with a magnetization prepared rapid

acquisition gradient echo (MP-RAGE) scan (TR, 1900 ms; TI, 900 ms; FOV, 25.6 x 25.6 cm;

matrix, 256 x 256; flip angle, 9°; sagittal plane; slice thickness, 1 mm; 112 slices) in Giessen.

Both SPGR and MP-RAGE datasets were reconstructed to a voxel resolution of 1 mm³. The

image quality and uniformity of brain maturation (e.g. myelination stage) for data sets of both

centers were rated by an experienced neuroradiologist (IS).

In order to optimise an automated pipeline for intracranial volume (ICV) estimation and

anatomical segmentation of cMRI data into gray matter (GM), white matter (WM), and cerebro-

spinal fluid (CSF), we evaluated a number of previously described techniques, namely

Statistical Parametric Mapping 8.0 (SPM8, Wellcome Trust Centre for Neuroimaging) running

under MATLAB 7.0 2013b (The MathWorks, Inc., Natick, Massachusetts, U.S.), FMRIB

Software Library v5.0 (FSL), and FreeSurfer (Martinos Center for Biomedical Imaging,

Massachusetts, U.S.) [5-8, 14]. Data sets were analyzed on a Linux workstation.

2.1 SPM NewSegment and Segment

Segmentation with the SPM toolbox was performed using two approaches according to the

manual for SPM (http://www.fil.ion.ucl.ac.uk/spm/doc/spm8_manual.pdf). In a first approach,

we applied the toolbox SPM NewSegment, which performs bias correction, spatial

normalization and automated voxel-based segmentation into GM, WM and CSF in one single

processing pipeline [15]. This toolbox uses adult probabilistic maps (modified ICBM tissue

probabilistic maps) [16], and normalizes the images to MNI space (Montreal Neurological

Institute, International Consortium for Brain Mapping) [17]. For each subject, segmentation of

the images into GM, WM and CSF was performed with a unified approach, and the segmented

images were written out in native space. Volumes of the three resulting tissue classes in native

space were calculated by an appropriate summation with the toolbox FSL STATS.
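
The text does not specify the summation step further. As a minimal sketch, assuming that each tissue volume is the sum of the voxel-wise tissue probabilities multiplied by the voxel volume (1 mm³ after reconstruction), the calculation could look as follows. The function name, the nested-list image representation and the toy values are hypothetical; summing a thresholded binary mask would be an equally plausible reading of the text.

```python
def tissue_volume_ml(prob_map, voxel_volume_mm3=1.0):
    """Probability-weighted tissue volume in millilitres.

    prob_map is a 3D nested list (slices > rows > voxels) of tissue
    probabilities between 0 and 1. This representation is an assumption
    made for illustration, not the study's data format.
    """
    total = sum(p for plane in prob_map for row in plane for p in row)
    return total * voxel_volume_mm3 / 1000.0  # 1 ml = 1000 mm^3


# Toy 2 x 2 x 2 gray matter map (made-up values)
gm = [[[1.0, 0.5], [0.0, 0.5]],
      [[1.0, 1.0], [0.0, 0.0]]]

print(tissue_volume_ml(gm))  # 4.0 voxel-equivalents of 1 mm^3 -> 0.004 ml
```

Under the same assumption, TBV would be the sum of the GM and WM volumes, and ICV the sum of GM, WM and CSF.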

In a second approach, we performed an automatic segmentation into GM, WM and CSF in

native space with the original SPM Segment toolbox, using the UNC-Infant tissue probabilistic

masks for two-year-olds as a template [18]. In a pre-processing step, the original images were

warped to the template with the SPM toolbox function Estimate and Write. Resulting volumes

were calculated as described above. Both SPM methods were fully automated and required no

user intervention.

2.2 FSL

The FMRIB Automated Segmentation Tool (FAST) in FSL version 5.0 was used for brain

segmentation. Since the FAST tool requires skull-stripped images as input, the FSL brain

extraction tool (BET) was used for initial skull stripping of the data. To improve the quality of

the skull stripping, the BET settings for the fractional intensity threshold and the vertical

gradient in fractional intensity threshold were optimised individually for each patient, and

the head radius was specified as a starting estimate for the initial surface sphere. The quality of the skull

stripping was assessed visually in each case. The FAST tool was then used to segment the

3D T1 images into GM, WM, and CSF maps, correcting for bias field/spatial intensity variations

using a hidden Markov random field model and an expectation maximisation algorithm [19].

Like SPM, the FSL segmentation procedure is fully automated and requires no user

intervention, except in cases where non-standard thresholds are chosen for the skull

stripping/brain extraction step.
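
For illustration, the two FSL steps described above can be written as command lines assembled in Python. This is a sketch only: the file names and the per-patient values are hypothetical placeholders, not the settings used in this study. The options shown are the standard FSL ones: -f (fractional intensity threshold), -g (vertical gradient in the threshold) and -r (head radius in mm) for bet, and -t (image type, 1 = T1), -n (number of tissue classes) and -o (output basename) for fast.

```python
def bet_command(t1, out, frac=0.5, gradient=0.0, radius_mm=None):
    """Build a BET call; frac and gradient default to BET's own defaults."""
    cmd = ["bet", t1, out, "-f", str(frac), "-g", str(gradient)]
    if radius_mm is not None:
        # Head radius as a starting estimate for the initial surface sphere
        cmd += ["-r", str(radius_mm)]
    return cmd


def fast_command(brain, out_base):
    """Build a FAST call for a three-class (GM, WM, CSF) T1 segmentation."""
    return ["fast", "-t", "1", "-n", "3", "-o", out_base, brain]


# Hypothetical per-patient settings, chosen only to show the call shape
bet = bet_command("sub01_T1.nii.gz", "sub01_brain", frac=0.4,
                  gradient=0.1, radius_mm=70)
fast = fast_command("sub01_brain.nii.gz", "sub01_seg")

print(" ".join(bet))
print(" ".join(fast))
```

With FSL installed, each list could be passed to subprocess.run; the skull-stripped output would still need to be checked visually, as described above.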

2.3 FreeSurfer

Images were additionally segmented using the freely available FreeSurfer image analysis

software (http://surfer.nmr.mgh.harvard.edu). In an automated workflow of 31 processing steps,

the toolbox performs alignment to MNI space, warping, signal intensity normalisation, voxel-

based segmentation and volume calculation. Technical details of those processing steps have

been described previously [5, 17, 20-24]. Since the FreeSurfer toolbox is designed for use in

adults and children over five years of age, the method has known limitations when applied to

images of younger children [25]. However, for data from young children with structural

abnormalities, we found that a better segmentation may be obtained by integrating the

following flags: -wsthresh 35 for MPRAGE images, -wsthresh 45 for SPGR images, and

-bigventricles. The segmentation quality was inspected visually for accuracy on each slice, and

manual corrections were performed wherever a suboptimal segmentation was observed, most

often in parietal and fronto-temporal regions. These corrections were performed with TKMEDIT,

a tool integrated in FreeSurfer software, following the instructions detailed in the FreeSurfer

tutorials (surfer.nmr.mgh.harvard.edu/fswiki/RecommendedReconstruction). Specifically, the

manual correction consisted of checking the Talairach transformation and the skull stripping,

and placing control points in WM regions not correctly segmented on the first iteration due to a

failure of the intensity normalisation step. The control points were placed approximately 1 mm

apart, well inside the white matter boundary, following the examples detailed in the FreeSurfer

tutorials. After positioning and saving the control points, part of the recon-all process was

rerun (specifically autorecon2-cp). If the addition of control points was not sufficient to fix errors

in the white matter boundary, this boundary was edited manually in TKMEDIT, and the

autorecon2-wm process was rerun. Finally, if errors in the pial surface were observed, these

were also corrected manually in TKMEDIT. In order to assess the impact of the manual

corrections on the accuracy of the calculated brain volumes, the whole FreeSurfer pipeline was

run both with and without the manual correction steps.
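
The FreeSurfer workflow described in this section can be summarised as a short sketch that assembles the corresponding recon-all calls. The subject identifier and file name are hypothetical. The exact command lines used in the study are not given in the text, so combining the flags with -all for the initial run is an assumption.

```python
# Watershed thresholds reported in the text for each acquisition protocol
WSTHRESH = {"MPRAGE": 35, "SPGR": 45}


def recon_all_initial(subject, t1_path, protocol):
    """First pass: full pipeline with the adapted skull-strip threshold."""
    return ["recon-all", "-s", subject, "-i", t1_path, "-all",
            "-wsthresh", str(WSTHRESH[protocol]), "-bigventricles"]


def recon_all_rerun(subject, stage):
    """Partial rerun after manual edits made in TKMEDIT."""
    if stage not in ("autorecon2-cp", "autorecon2-wm"):
        raise ValueError("unsupported stage: " + stage)
    return ["recon-all", "-s", subject, "-" + stage]


first = recon_all_initial("sub01", "sub01_T1.nii.gz", "SPGR")
after_control_points = recon_all_rerun("sub01", "autorecon2-cp")
after_wm_edits = recon_all_rerun("sub01", "autorecon2-wm")

for cmd in (first, after_control_points, after_wm_edits):
    print(" ".join(cmd))
```

Running only the first command corresponds to the uncorrected pipeline; the reruns apply only to subjects in whom control points or white matter edits were added.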

2.4 Stereology

Volumetric measurements derived with the three automated analysis approaches described

above were compared with the corresponding volumes obtained using the Cavalieri method in

combination with point counting [26], implemented in the EasyMeasure software package [27,

28]. According to this method, ICV and TBV volumes are estimated from a systematic random

sample of parallel MR image sections covering the whole cranium, or brain, respectively, and

with the first section positioned at random within the sampling interval [29]. The section areas of

the transects through the structure on each image are estimated by point counting with a

square grid of test points, overlaid on each image with new uniform random position and new

isotropic uniform random orientation. The volume is computed as the sum of the estimated

areas (equivalent to the total point count per section multiplied by the area associated with

each test point) and then multiplied by the sectioning interval, following the image sampling

theory for stereology [30]. The theoretical basis and justification of the methodology has been

described in detail elsewhere [26, 30], and the method has been widely applied in MRI for

volumetric assessment of ICV [31, 32], as well as volume of the hippocampus [33], thalamus

[29], and Broca’s area [32]. The sampling intensity (i.e. grid size of test system for point

counting, sectioning interval) was selected to achieve a coefficient of error (CE) of less than

5%, as described previously [26, 29]. The Cavalieri and FreeSurfer methods are illustrated

graphically in Figure 1. ICV was defined as total volume of the cranium, including brain tissue,

cerebral ventricles and sulcal CSF, while total brain volume (TBV) was defined as brain

volume excluding both ventricular and sulcal CSF. For comparison with FreeSurfer and SPM,

TBV was calculated both including and excluding the brainstem (as the FreeSurfer

segmentation does not include the brainstem, while the SPM segmentation does include the

brainstem in the measure of TBV). In order to assess the inter-observer reproducibility of the

Cavalieri method the stereological volumes were calculated separately by two observers for a

subset of n=6 patients.
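
The Cavalieri estimator described above (point count per section, multiplied by the area per test point, summed over sections and multiplied by the sectioning interval) can be sketched in a few lines. The function name and the numerical values are hypothetical and chosen for easy arithmetic; they are not the sampling parameters of this study, and the calculation of the coefficient of error is not shown.

```python
def cavalieri_volume_ml(point_counts, grid_spacing_mm, section_interval_mm):
    """Cavalieri volume estimate from point counts on parallel sections.

    point_counts: number of grid points hitting the structure on each section
    grid_spacing_mm: spacing of the square test grid (area per point = spacing^2)
    section_interval_mm: distance between consecutive sampled sections
    """
    area_per_point = grid_spacing_mm ** 2                 # mm^2 per test point
    total_area = sum(point_counts) * area_per_point       # summed section areas
    return total_area * section_interval_mm / 1000.0      # mm^3 -> ml


# Toy example: 5 sections, 10 mm grid, 10 mm sectioning interval
counts = [10, 20, 30, 20, 10]
print(cavalieri_volume_ml(counts, 10.0, 10.0))  # 90 points x 100 mm^2 x 10 mm = 90.0 ml
```

A finer grid or a shorter sectioning interval increases the number of points counted and, in general, lowers the coefficient of error at the cost of more counting time.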

2.5 Statistical Analyses

Statistical analyses were performed with SPSS 22.0 (SPSS Inc, Chicago, USA). Descriptive

statistics presented include median, range, and mean ± SD for continuous variables, and

frequency with percentage for categorical variables. Student’s t-test was applied to test for

differences between groups. Shapiro-Wilk tests were used to test normality. A Bland-Altman

analysis was used to assess the agreement between ICV and TBV derived with the automated

methods (FreeSurfer, FSL, and SPM) and corresponding volumes obtained using the Cavalieri

method. Additional correlations were analyzed with Pearson’s correlation for normally

distributed variables and Spearman’s rho for data which were not normally distributed. A

receiver operating characteristic (ROC) analysis was performed to assess the sensitivity and

specificity of each method relative to the gold-standard stereological volumes, which were

dichotomised into high and low volume groups by a median split.
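
As an illustration of the Bland-Altman analysis, the sketch below computes the mean bias and the limits of agreement between an automated method and the reference volumes. The conventional limits of bias ± 1.96 SD are assumed here, since the text does not state which limits were used, and the volumes are made-up toy values rather than study data.

```python
from statistics import mean, stdev


def bland_altman(method, reference):
    """Return (bias, lower limit, upper limit) of agreement.

    Differences are taken as method minus reference, so a negative bias
    means the method underestimates relative to the reference.
    """
    diffs = [m - r for m, r in zip(method, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy volumes in ml (hypothetical)
automated = [900.0, 950.0, 1000.0, 1050.0]
stereology = [1000.0, 1060.0, 1090.0, 1150.0]

bias, lower, upper = bland_altman(automated, stereology)
print(bias, lower, upper)
```

In a Bland-Altman plot these differences are drawn against the mean of the two measurements for each subject, which makes any dependence of the bias on volume visible.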

3. Results

Between August 2012 and February 2014, 23 eligible patients were consecutively recruited, of

whom 21 patients were included in the final analysis. Two patients were excluded because the

high-resolution 3D sequence required for volumetry was not acquired during the MRI protocol.

The most frequent heart defect was HLHS (57.1%). Four (19.0%) patients had HLHC, and a

further five had UVH. The age of the children ranged from 20.9 to 42.4 months with a median

age of 27.0 months. Twelve (57.1%) patients were boys. Nine of the 21 patients (6/13 from

Zurich and 3/8 from Giessen) demonstrated structural brain abnormalities including

ventriculomegaly (n=3), infarct (n=5), white matter lesions (n=2), generalized or focal atrophy

(n=5), or suspected hypoxia (n=2). No motion or other artefacts were present in any of the

images. The typical image quality is depicted in Figure 1, together with the stereological overlay

and FreeSurfer segmentation.

The measurements of ICV obtained using the three automated analysis techniques and by the

Cavalieri method are presented in Table 1, and corresponding data for TBV are presented in

Table 2. The CE of the volume measurements obtained using the Cavalieri method was less

than 1% for both ICV and TBV in all participants (mean 0.6%, range 0.18% - 0.96%). The

mean inter-observer reproducibility of the stereological volumes (expressed as the difference

between the measured volumes divided by the mean volume from both observers) was 4%.
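
The reproducibility measure defined above can be written out explicitly. In this sketch the absolute difference is used, which is an assumption because the text does not say whether signed or absolute differences were averaged. The observer volumes are hypothetical.

```python
def relative_difference_percent(v1, v2):
    """Difference between two observers divided by their mean, in percent."""
    return abs(v1 - v2) / ((v1 + v2) / 2.0) * 100.0


def mean_reproducibility(pairs):
    """Average relative difference over all doubly measured subjects."""
    values = [relative_difference_percent(a, b) for a, b in pairs]
    return sum(values) / len(values)


# Hypothetical volumes (ml) from observer 1 and observer 2
pairs = [(98.0, 102.0), (100.0, 100.0)]
print(mean_reproducibility(pairs))  # (4.0 + 0.0) / 2 = 2.0
```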

Correlation and Bland-Altman plots showing the agreement between ICV measured using

each automated software method (SPM, FSL, and FreeSurfer) and the Cavalieri-estimated

ICVs are illustrated in Figure 2, and the corresponding TBV data are depicted in Figure 3. For

ICV, only FreeSurfer showed a significant correlation with the stereological volumes

(Pearson’s R=0.72, p<0.001), although SPM Segment showed a strong trend towards a

significant correlation (p=0.05). The Bland-Altman analysis demonstrated that all automated

methods significantly underestimated TBV (ps ≤ 0.005, paired t-test) compared to stereology.

Consistent with the results for ICV, FreeSurfer (Pearson’s R=0.96, p<0.001) and SPM Segment

(Spearman’s rho=0.58, p=0.006) showed the closest agreement for TBV relative to the

stereological volumes. No significant association was evident between ICV and TBV obtained

using SPM NewSegment and FSL FAST and the corresponding stereological volumes,

although FSL FAST showed a trend towards a significant association for TBV only (p=0.07).

Estimated ICVs from all methods correlated positively with head circumference, but this

association only reached significance for the FreeSurfer estimated volumes (R=0.68,

p<0.001), and those estimated using the Cavalieri method (R=0.52, p=0.01).

In the ROC analysis, FreeSurfer showed the highest sensitivity and specificity to brain volume

differences among the automated methods (Table 3), with an area under the curve (AUC)

which was significantly different from chance for both the ICV and TBV (p<0.001). SPM

Segment showed an AUC which was significant for the ICV (p=0.014) and present at trend

level for the TBV (p=0.07). The head circumference was also a significant predictor of ICV

(p=0.004), but the brain volumes from SPM NewSegment and FSL FAST did not show

significant AUC estimates (p>0.1).
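
The ROC analysis with a median split can be illustrated as follows. This is a sketch under stated assumptions: the AUC is computed through its rank-based (Mann-Whitney) interpretation rather than with the SPSS procedure used in the study, values equal to the median are assigned to the low group, and all volumes are made-up toy values.

```python
from statistics import median


def median_split(reference):
    """Label each reference volume as high (True) or low (False)."""
    cut = median(reference)
    return [v > cut for v in reference]


def auc(scores, labels):
    """Probability that a high-group subject has a larger score than a
    low-group subject, counting ties as one half."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data in ml (hypothetical): reference volumes and one automated method
stereology = [900.0, 950.0, 1000.0, 1050.0, 1100.0, 1150.0]
automated = [800.0, 870.0, 820.0, 860.0, 900.0, 950.0]

labels = median_split(stereology)
print(auc(automated, labels))  # 8 of 9 high/low pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance, so the p-values reported above test whether each method separates the high and low volume groups better than chance.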

Both SPM methods appeared to show systematic differences in volume between centers and

MRI protocols (Figure 2). In contrast, no significant differences between centers were

observed for ICV or TBV with FreeSurfer (ps>0.9), FSL (ps>0.56), or with the Cavalieri method

(ps>0.78). In addition there were no significant differences in head circumference measures

between the two imaging centers. However, mean ICV values for the cohorts scanned at each

center (using MPRAGE and SPGR protocols, respectively) differed significantly for both SPM

methods (SPM NewSegment: p<0.001, SPM Segment: p=0.03). For TBV, SPM NewSegment

showed significant inter-center differences (p=0.001) while SPM Segment showed trend-level

differences (p=0.097).

Values of TBV measured using FreeSurfer incorporating manual correction were

approximately 1% higher than those obtained without manual correction (p=0.04, paired t-test),

whereas values of ICV were the same with and without manual correction. In the case of TBV

both corrected and uncorrected values from FreeSurfer showed near identical agreement to

corresponding values obtained using the Cavalieri method (R=0.96, R=0.97, p<0.001; Table 2).

4. Discussion

MRI provides visualization of structural brain abnormalities with unprecedented detail [29, 34-

38]. Recent advances in image processing software have led to the development of a number

of powerful methods for measurement of regional and global brain volumes [29, 34-38].

However, while these automated methods have demonstrated high accuracy and reliability in

healthy adults as well as in some adult patient groups with structural abnormalities, they have

been less widely applied in infants, and validation studies for automated volumetric methods in

young children are lacking, particularly in those with brain abnormalities. For clinical

applicability, automated and time-efficient methods are needed. In this study we investigated

the performance of three widely-used approaches for automatic measurement of ICV and TBV

and compared these to corresponding values obtained using the Cavalieri method of modern

design stereology, which represents a mathematically unbiased, time-efficient manual

approach for obtaining volume estimates with high precision [35, 37, 39]. As an additional

validation, we also examined the relationship between ICV and head circumference, as head

circumference has been reported to be correlated with ICV in young children [13] and

decreases in head circumference associated with neurodevelopmental difficulties have been

widely reported in infants with CHD [40, 41]. Of the three automated software methods

examined in this study, the FreeSurfer approach demonstrated the highest accuracy and

strongest agreement with the stereological volumes as well as with head circumference

(ps<0.001).

While moderate correlations were observed between ICV and TBV derived with the SPM

Segment approach and those from the Cavalieri method, the SPM NewSegment approach

gave less reliable results (Table 2). This may be due to the use of an adult brain MRI template

(included automatically in the SPM NewSegment pipeline) instead of an age-appropriate

pediatric MRI template (used by SPM Segment). The unified segmentation/tissue classification

method employed by SPM (which uses tissue “priors” from the adult brain template, combined

with Bayesian tissue probabilities estimated from voxel intensities to inform the segmentation)

[15] may also be more sensitive to variations in image contrast arising from developmental

changes or scanner and protocol differences, resulting in a more variable segmentation

quality. SPM NewSegment may perform better with an age appropriate template, but both

SPM methods demonstrated higher sensitivity to differences between the two MRI protocols,

possibly due to differences in tissue probabilities estimated from the differing voxel signal

intensity characteristics of each protocol. The Bayesian segmentation approach utilized by

both SPM methods may therefore be more sensitive to technical factors which alter the voxel

signal intensities, resulting in altered tissue probabilities, but further studies would be needed

to clarify the specific factors affecting the SPM segmentation.

The lack of agreement between ICV measured using FSL and ICV estimated by the Cavalieri

method probably arose from inadequate skull stripping, despite attempts to optimize this

process individually for each dataset. A closer agreement was observed between TBV

measured with FSL FAST and TBV estimated using the Cavalieri method, which could

perhaps be further improved by enhancements in the skull stripping process before

segmentation. The accuracy of volumetry with FSL may also potentially be improved by

registration to a pediatric rather than an adult template.

The Bland-Altman plots demonstrated that FreeSurfer showed a smaller mean bias for ICV

and TBV with narrower limits of agreement relative to SPM and FSL, although all three

automated methods underestimated TBV compared to the Cavalieri method. The good

agreement between both the corrected and uncorrected volumes from FreeSurfer and the

corresponding volumes obtained using the Cavalieri method suggests that the opportunity for

manual correction had only a modest effect on the accuracy of the FreeSurfer analyses, even

though the total brain volumes measured with FreeSurfer were significantly higher after

manual correction. However, the improvement in accuracy from the manual correction steps

may become evident with a larger sample.

Our findings are consistent with those from a number of recently published studies [35, 37,

38], which described higher consistency and less sensitivity to noise or variable image quality

with FreeSurfer. This observation suggests that the FreeSurfer algorithm (voxel- and surface

based) may be optimal for multi-center studies with data acquired on different scanners or with

slightly different protocols. In contrast, Eggert et al. reported higher accuracy in tissue

segmentation of a more homogeneous adult cMRI dataset derived by both SPM Segment and

SPM NewSegment compared to FreeSurfer [35]. Similar results have been shown for previous

versions of SPM [37]. Therefore, SPM may perform better in single center studies of healthy

adults or patient groups imaged with a consistent acquisition protocol.

Limitations

Our study is limited by the small sample size and heterogeneity of our patient population,

particularly with regard to age, maturation and intracranial abnormalities. A further limitation is

the lack of a pediatric brain template available for use with most of the automated methods

(notably SPM NewSegment, FSL, and FreeSurfer). However, while the accuracy of

segmentation may be improved by use of an age-appropriate brain template, templates

constructed using data obtained for healthy children should be used with caution when applied

to volumetric MRI studies of clinical populations with intracranial abnormalities.

Surprisingly, while FreeSurfer in particular showed no bias in measuring ICV relative to the

Cavalieri method, all automated methods appear to underestimate TBV by 15-20% relative to

the Cavalieri method (see Figure 3). This may be the result of partial volume artefact arising

between low signal intensity CSF and higher signal intensity GM in the cerebral sulci, leading

to overestimation of brain volume by the Cavalieri method on T1-weighted images [39], in comparison to the FreeSurfer

measurements, which inherently include correction for partial volume effects. In future, the

development of an anthropomorphic phantom to be used as a gold standard for methods

comparison studies, or the analysis of images obtained with varying voxel sizes may allow for

a more detailed characterization of the partial volume effects on the estimated TBV values.

Alternatively, this discrepancy may point to a need for additional correction of cortical outlines

in children, but future studies would be needed to clarify the source of the apparent

underestimation of TBV.

For the present study, validation of TBV and ICV was provided by stereologically derived

volumes, but the GM, WM and CSF volumes derived with each automated method were not

validated individually. Such validation may allow for the evaluation of partial volume effects

and the localization of volumetric deficits to gray or white matter. Additionally, we did not

examine regional volumes or the relative size of different brain structures, which may be

relevant for further analysis [9].

Conclusion

This study provides a novel and relatively rare validation of three common, automated image

analysis techniques for measuring brain volumes from 3D MR images in young children. Using

the Cavalieri method as a gold standard, FreeSurfer provided the best agreement for both ICV

and TBV among the automated software methods. SPM and FSL provide modest or limited

agreement for the same volumetric measurements, possibly due to difficulties with skull

stripping, use of an adult rather than a pediatric brain template, and sensitivity to differences

in image contrast from different MRI scanners and protocols. While the accuracy of all three

automated methods may be improved by registration to a pediatric template, the present study

confirms the suitability of FreeSurfer for the automated assessment of brain volumes in young

children.

Abbreviations

AUC area under the curve

BET brain extraction tool

CE coefficient of error

CHD congenital heart disease

cMRI cerebral magnetic resonance imaging

CSF cerebrospinal fluid

ICV intracranial volume

FAST FMRIB Automated Segmentation Tool

FSL FMRIB Software Library v5.0

GM gray matter

HLHS hypoplastic left-heart syndrome

HLHC hypoplastic left-heart complex

MNI Montreal Neurological Institute

MP-RAGE magnetization prepared rapid acquisition gradient echo

MRI magnetic resonance imaging

SPGR spoiled gradient echo

SPM8 Statistical Parametric Mapping version 8

TBV total brain volume

UVH univentricular hypoplasia

WM white matter

Conflict of interest: We declare that we have no conflict of interest. The draft of the manuscript was

written by the first author. No honorarium, grant, or other form of payment was given to anyone to

produce the manuscript.

References

1. Khalil A, Suff N, Thilaganathan B, Hurrell A, Cooper D, Carvalho JS (2014) Brain

abnormalities and neurodevelopmental delay in congenital heart disease: systematic

review and meta-analysis. Ultrasound Obstet Gynecol 43:14-24.

2. Owen M, Shevell M, Donofrio M, Majnemer A, McCarter R, Vezina G, Bouyssi-Kobar

M, Evangelou I, Freeman D, Weisenfeld N, Limperopoulos C (2014) Brain volume and

neurobehavior in newborns with complex congenital heart defects. J Pediatr 164:1121-1127.e1121.

3. Watanabe K, Matsui M, Matsuzawa J, Tanaka C, Noguchi K, Yoshimura N, Hongo K,

Ishiguro M, Wanatabe S, Hirono K, Uese K, Ichida F, Origasa H, Nakazawa J, Oshima

Y, Miyawaki T, Matsuzaki T, Yagihara T, Bilker W, Gur RC (2009) Impaired

neuroanatomic development in infants with congenital heart disease. J Thorac

Cardiovasc Surg 137:146-153.

4. von Rhein M, Buchmann A, Hagmann C, Huber R, Klaver P, Knirsch W, Latal B (2014)

Brain volumes predict neurodevelopment in adolescents after surgery for congenital

heart disease. Brain 137:268-276.

5. Dale AM, Fischl B, Sereno MI (1999) Cortical surface-based analysis. I. Segmentation

and surface reconstruction. Neuroimage 9:179-194.

6. Fischl B, Salat DH, Busa E, Albert M, Dieterich M, Haselgrove C, van der Kouwe A,

Killiany R, Kennedy D, Klaveness S, Montillo A, Makris N, Rosen B, Dale AM (2002)

Whole brain segmentation: automated labeling of neuroanatomical structures in the

human brain. Neuron 33:341-355.

7. Smith SM, Jenkinson M, Woolrich MW, Beckmann CF, Behrens TE, Johansen-Berg H,

Bannister PR, De Luca M, Drobnjak I, Flitney DE, Niazy RK, Saunders J, Vickers J,

Zhang Y, De Stefano N, Brady JM, Matthews PM (2004) Advances in functional and

structural MR image analysis and implementation as FSL. Neuroimage 23 Suppl

1:S208-219.

8. Ashburner J, Friston KJ (1999) Nonlinear spatial normalization using basis functions.

Hum Brain Mapp 7:254-266.

9. Gousias IS, Rueckert D, Heckemann RA, Dyet LE, Boardman JP, Edwards AD,

Hammers A (2008) Automatic segmentation of brain MRIs of 2-year-olds into 83

regions of interest. Neuroimage 40:672-684.

10. Murgasova M, Dyet L, Edwards D, Rutherford M, Hajnal J, Rueckert D (2007)

Segmentation of brain MRI in young children. Acad Radiol 14:1350-1366.

11. Heckemann RA, Hajnal JV, Aljabar P, Rueckert D, Hammers A (2006) Automatic

anatomical brain MRI segmentation combining label propagation and decision fusion.

Neuroimage 33:115-126.

12. McQuillen PS, Miller SP (2010) Congenital heart disease and brain development. Ann

N Y Acad Sci 1184:68-86.

13. Bartholomeusz HH, Courchesne E, Karns CM (2002) Relationship between head

circumference and brain volume in healthy normal toddlers, children, and adults.

Neuropediatrics 33:239-241.

14. Ashburner J, Friston K (1997) Multimodal image coregistration and partitioning--a

unified framework. Neuroimage 6:209-217.

15. Ashburner J, Friston KJ (2005) Unified segmentation. Neuroimage 26:839-851.

16. Mazziotta J, Toga A, Evans A, Fox P, Lancaster J, Zilles K, Woods R, Paus T,

Simpson G, Pike B, Holmes C, Collins L, Thompson P, MacDonald D, Iacoboni M,

Schormann T, Amunts K, Palomero-Gallagher N, Geyer S, Parsons L, Narr K, Kabani

N, Le Goualher G, Boomsma D, Cannon T, Kawashima R, Mazoyer B (2001) A

probabilistic atlas and reference system for the human brain: International Consortium

for Brain Mapping (ICBM). Philos Trans R Soc Lond B Biol Sci 356:1293-1322.

17. Evans AC, Collins DL, Mills SR, Brown ED, Kelly RL, Peters TM (1993) 3D statistical

neuroanatomical models from 305 MRI volumes. Proc. IEEE Nucl. Sci. Symp. Med. Imaging Conf., pp 1813-1817.

18. Shi F, Yap PT, Wu G, Jia H, Gilmore JH, Lin W, Shen D (2011) Infant brain atlases

from neonates to 1- and 2-year-olds. PLoS One 6:e18746.

19. Zhang Y, Brady M, Smith S (2001) Segmentation of brain MR images through a hidden

Markov random field model and the expectation-maximization algorithm. IEEE Trans

Med Imaging 20:45-57.

20. Desikan RS, Ségonne F, Fischl B, Quinn BT, Dickerson BC, Blacker D, Buckner RL,

Dale AM, Maguire RP, Hyman BT, Albert MS, Killiany RJ (2006) An automated labeling

system for subdividing the human cerebral cortex on MRI scans into gyral based

regions of interest. Neuroimage 31:968-980.

21. Fischl B, Sereno MI, Dale AM (1999) Cortical surface-based analysis. II: Inflation,

flattening, and a surface-based coordinate system. Neuroimage 9:195-207.

22. Fischl B, van der Kouwe A, Destrieux C, Halgren E, Ségonne F, Salat DH, Busa E,

Seidman LJ, Goldstein J, Kennedy D, Caviness V, Makris N, Rosen B, Dale AM (2004)

Automatically parcellating the human cerebral cortex. Cereb Cortex 14:11-22.

23. Ségonne F, Pacheco J, Fischl B (2007) Geometrically accurate topology-correction of

cortical surfaces using nonseparating loops. IEEE Trans Med Imaging 26:518-529.

24. Sled JG, Zijdenbos AP, Evans AC (1998) A nonparametric method for automatic

correction of intensity nonuniformity in MRI data. IEEE Trans Med Imaging 17:87-97.

25. Lowe JR, Maclean PC, Caprihan A, Ohls RK, Qualls C, Vanmeter J, Phillips JP (2012)

Comparison of cerebral volume in children aged 18-22 and 36-47 months born preterm

and term. J Child Neurol 27:172-177.

26. Roberts N, Puddephat MJ, McNulty V (2000) The benefit of stereology for quantitative

radiology. Br J Radiol 73:679-697.

27. Puddephat MJ (1999) Computer Interface for Convenient Application for Stereological Methods

for Unbiased Estimation of Volume and Surface Area: Studies Using MRI with

Particular Reference to the Human Brain. University of Liverpool, Liverpool.

28. Keller SS, Highley JR, Garcia-Finana M, Sluming V, Rezaie R, Roberts N (2007) Sulcal

variability, stereological measurement and asymmetry of Broca's area on MR images. J

Anat 211:534-555.

29. Keller SS, Gerdes JS, Mohammadi S, Kellinghaus C, Kugel H, Deppe K, Ringelstein

EB, Evers S, Schwindt W, Deppe M (2012) Volume estimation of the thalamus using

FreeSurfer and stereology: consistency between methods. Neuroinformatics 10:341-350.

30. Cruz-Orive LM, Gelšvartas J, Roberts N (2014) Sampling theory and automated

simulations for vertical sections, applied to human brain. J Microsc 253:119-150.

31. Mayhew TM, Olsen DR (1991) Magnetic resonance imaging (MRI) and model-free

estimates of brain volume determined using the Cavalieri principle. J Anat 178:133-144.

32. Keller SS, Roberts N (2009) Measurement of brain volume using MRI: software,

techniques, choices and prerequisites. J Anthropol Sci 87:127-151.

33. Salmenperä T, Könönen M, Roberts N, Vanninen R, Pitkänen A, Kälviäinen R (2005)

Hippocampal damage in newly diagnosed focal epilepsy: a prospective MRI study.

Neurology 64:62-68.

34. Mulder ER, de Jong RA, Knol DL, van Schijndel RA, Cover KS, Visser PJ, Barkhof F,

Vrenken H, Alzheimer's Disease Neuroimaging Initiative (2014) Hippocampal volume change measurement:

quantitative assessment of the reproducibility of expert manual outlining and the

automated methods FreeSurfer and FIRST. Neuroimage 92:169-181.

35. Eggert LD, Sommer J, Jansen A, Kircher T, Konrad C (2012) Accuracy and reliability of

automated gray matter segmentation pathways on real and simulated structural

magnetic resonance images of the human brain. PLoS One 7:e45081.

36. Morey RA, Petty CM, Xu Y, Hayes JP, Wagner HR, Lewis DV, LaBar KS, Styner M,

McCarthy G (2009) A comparison of automated segmentation and manual tracing for

quantifying hippocampal and amygdala volumes. Neuroimage 45:855-866.

37. Klauschen F, Goldman A, Barra V, Meyer-Lindenberg A, Lundervold A (2009)

Evaluation of automated brain MR image segmentation and volumetry methods. Hum

Brain Mapp 30:1310-1327.

38. Dewey J, Hana G, Russell T, Price J, McCaffrey D, Harezlak J, Sem E, Anyanwu JC,

Guttmann CR, Navia B, Cohen R, Tate DF, HIV Neuroimaging Consortium (2010) Reliability and

validity of MRI-based automated volumetry software relative to auto-assisted manual

measurement of subcortical structures in HIV-infected patients from a multisite study.

Neuroimage 51:1334-1344.

39. Furlong C, García-Fiñana M, Puddephat M, Anderson A, Fabricius K, Eriksen N,

Pakkenberg B, Roberts N (2013) Application of stereological methods to estimate

post-mortem brain surface area using 3T MRI. Magn Reson Imaging 31:456-465.

40. Medoff-Cooper B, Irving SY, Hanlon AL, Golfenshtein N, Radcliffe J, Stallings VA,

Marino BS, Ravishankar C (2016) The Association among Feeding Mode, Growth, and

Developmental Outcomes in Infants with Complex Congenital Heart Disease at 6 and

12 Months of Age. J Pediatr 169:154-159.e151.

41. Daymont C, Neal A, Prosnitz A, Cohen MS (2013) Growth in children with congenital

heart disease. Pediatrics 131:e236-242.

Figure legends:

Figure 1: Schematic illustration of the Cavalieri method, shown with the cortical and white

matter outlines from FreeSurfer for comparison. Top panel: Control points included in the total

brain volume are shown in green while those excluded from the volume are shown in red.

Middle panels: zoomed image from the inset region for the Cavalieri method (top) and

FreeSurfer (bottom). Bottom panel: Corresponding axial slices from FreeSurfer depicting the

cortical surface (red) and the pial boundaries.
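
The point-counting estimate illustrated in Figure 1 can be sketched in a few lines. The standard Cavalieri estimator multiplies the total number of grid points counted inside the structure by the area each point represents and by the distance between sections. The function name, the parameter names and the numbers in the usage line are illustrative assumptions, not the settings used in this study.

```python
def cavalieri_volume_ml(points_per_section, section_spacing_mm, area_per_point_mm2):
    """Cavalieri estimate V = T * (a/p) * sum(P), returned in millilitres.

    points_per_section : grid points counted inside the structure on each section
    section_spacing_mm : distance T between consecutive sections (hypothetical value)
    area_per_point_mm2 : area a/p represented by one grid point (hypothetical value)
    """
    if section_spacing_mm <= 0 or area_per_point_mm2 <= 0:
        raise ValueError("spacing and area per point must be positive")
    if any(p < 0 for p in points_per_section):
        raise ValueError("point counts cannot be negative")
    total_points = sum(points_per_section)
    volume_mm3 = section_spacing_mm * area_per_point_mm2 * total_points
    return volume_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Illustrative usage with made-up counts from three sections
example_volume = cavalieri_volume_ml([10, 20, 30], 5.0, 100.0)
```

Because the estimate depends only on counts and on the sampling geometry, it makes no assumption about the shape of the structure, which is why it can serve as a reference for the automated methods.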

Figure 2: Validation of semi-automated methods for the total intracranial volume (ICV). Figure

2a-c (top panel): correlation plots showing the association between ICV derived with SPM

NewSegment, SPM Segment, FSL, and FreeSurfer vs. ICV obtained using the Cavalieri

method for each subject. Figure 2d-f (bottom panel): Bland-Altman plots showing the bias for

each of the derived volumes.

FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated Segmentation Tool;

SPM, statistical parametric mapping; n=21

Figure 3: Validation of semi-automated methods for the total brain volume (TBV). Figure 3a-c

(top panel): correlation plots showing the association between TBV derived with SPM

NewSegment, SPM Segment, FSL, and FreeSurfer vs. TBV obtained using the Cavalieri

method for each subject. Figure 3d-f (bottom panel): Bland-Altman plots showing the bias for

each of the derived volumes.

FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated Segmentation Tool;

SPM, statistical parametric mapping; n=21

*excluding brainstem
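
The bias shown in the Bland-Altman panels of Figures 2 and 3 can be computed as sketched here. The sketch assumes the conventional definition: bias is the mean of the paired differences (automated minus reference), and the limits of agreement are the bias plus or minus 1.96 standard deviations of those differences. The function name and the example volumes are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(method_ml, reference_ml):
    """Return (bias, lower limit, upper limit) for paired volume estimates."""
    if len(method_ml) != len(reference_ml):
        raise ValueError("both lists must contain one value per subject")
    if len(method_ml) < 2:
        raise ValueError("at least two subjects are needed")
    # Paired differences: automated estimate minus reference estimate
    diffs = [m - r for m, r in zip(method_ml, reference_ml)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Illustrative usage with made-up volumes in mL
bias, lower, upper = bland_altman([1010.0, 1020.0, 1030.0], [1000.0, 1000.0, 1000.0])
```

A bias close to zero with narrow limits indicates that a method can stand in for the reference. A high correlation alone does not show this, since a method can correlate strongly while still under- or overestimating volumes systematically.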

Table 1. Estimated intracranial volumes from the automated segmentation tools, the Cavalieri method and head circumference

Method                      ICV, mL (mean ± SD)   Correlation of ICV with Cavalieri, r (p value)   Correlation of ICV with HC, r (p value)
Cavalieri (gold standard)   1079 ± 74             n/a                                              .52 (p=.01)
SPM NewSegment              1156 ± 118 †          -.00 (p=.50)                                     .37 (p=.10)
SPM Segment                 1260 ± 26 †           .43 (p=.05)                                      .42 (p=.06)
FAST                        1007 ± 139            .23 (p=0.3)                                      .42 (p=.06)
FreeSurfer                  1035 ± 78             .72 (p<.001)                                     .68 (p<.001)

Table 2. Estimated total brain volumes from the automated segmentation tools and the Cavalieri method

Method                                    TBV, mL (mean ± SD)   Correlation of TBV with Cavalieri, r (p value)
Cavalieri (including brainstem)           1011 ± 91
Cavalieri (excluding brainstem)           996 ± 91
SPM NewSegment                            916 ± 86 †            -.18 (p=.44)
SPM Segment                               864 ± 171             .58 (p=.006)
FAST                                      785 ± 133             .40 (p=.07)
FreeSurfer (with manual corrections)      858 ± 100             .96* (p<.001)
FreeSurfer (without manual corrections)   849 ± 106             .97* (p<.001)

Table 3. ROC analysis of the automated segmentation tools

                 ICV                                          TBV
Method           Sensitivity   Specificity   AUC (p value)    Sensitivity   Specificity   AUC (p value)
SPM NewSegment   90%           40%           0.65 (p=.26)     90%           50%           0.67 (p=.18)
SPM Segment      64%           90%           0.82 (p=.014)    90%           70%           0.74 (p=.07)
FAST             73%           80%           0.74 (p=.13)     73%           80%           0.71 (p=.11)
FreeSurfer       91%           100%          0.99 (p<.001)    91%           100%          0.97 (p<.001)
HC               100%          60%           0.87 (p=.004)

Table legends:

Table 1: FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated

Segmentation Tool; ICV, intracranial volume; HC, head circumference; SPM, statistical

parametric mapping; n=21

† Significant differences between the two protocols (center A and center B). No significant

difference was demonstrated for tissue segmentation between centers with FSL FAST

(p=0.98), FreeSurfer (p=0.90) or with stereology (p=0.78).

ρ Spearman’s rho correlation and corresponding p-value (for non-normally distributed data)

Table 2: FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated

Segmentation Tool; TBV, total brain volume; SPM, statistical parametric mapping; n=21

† Significant difference between the two protocols (center A and center B). SPM Segment

showed trend-level differences (p=0.097). No significant difference was demonstrated for

tissue segmentation between centers with FAST (p=0.83), FreeSurfer (p=0.98) or with

stereology (p=0.92).

ρ Spearman’s rho correlation and corresponding p-value (for non-normally distributed data)

* excluding brainstem, as FreeSurfer does not include the brainstem in total brain volume

Table 3: FAST, Oxford Centre for Functional MRI of the Brain (FMRIB) Automated

Segmentation Tool; SPM, statistical parametric mapping; ICV, intracranial volume; TBV, total

brain volume; AUC, area under the curve; n=21
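
The three quantities reported in Table 3 can be illustrated with a short sketch. It assumes a binary reference label per subject and a score for which larger values indicate the positive class. How subjects were dichotomised is not restated in this section, so the grouping, the function names and the example scores below are hypothetical. The AUC is computed through its rank interpretation: the proportion of positive-negative pairs in which the positive case has the higher score, with ties counted as one half.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    if not scores_positive or not scores_negative:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_positive:
        for n in scores_negative:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(scores_positive) * len(scores_negative))


def sensitivity_specificity(scores_positive, scores_negative, threshold):
    """Classify a score >= threshold as positive and return (sensitivity, specificity)."""
    true_pos = sum(1 for s in scores_positive if s >= threshold)
    true_neg = sum(1 for s in scores_negative if s < threshold)
    return true_pos / len(scores_positive), true_neg / len(scores_negative)


# Illustrative usage with made-up scores
auc = roc_auc([3.0, 4.0], [1.0, 2.0])
sens, spec = sensitivity_specificity([3.0, 4.0], [1.0, 2.0], threshold=2.5)
```

When the aim is to flag small volumes, the volume can be negated before it is passed in as a score, so that smaller volumes rank higher.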

