
Classifying Mammographic Breast Density by Residual Learning

Jingxu Xu, Shenzhen University, Shenzhen, China

Cheng Li, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

Yongjin Zhou, Shenzhen University, Shenzhen, China

Lisha Mou, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen, China

Hairong Zheng, Shanshan Wang*, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

September 28, 2018

[email protected]; [email protected]

Abstract

Mammographic breast density, a parameter used to describe the proportion of breast tissue fibrosis, is widely adopted as an evaluation characteristic of the likelihood of breast cancer incidence. Existing methods of breast density classification either require manual operations or achieve only moderate classification accuracies due to limited model capacity. In this study, we present a radiomics approach based on residual learning for the classification of mammographic breast densities. Unlike established approaches, our method possesses several encouraging properties: it is almost fully automatic, has large model capacity, and is highly flexible. As a result, it can obtain outstanding classification results without the need to compensate results using mammographs taken from different views. The proposed method was instantiated with the INbreast dataset, and classification accuracies of 92.6% and 96.8% were obtained for the four-category and two-category BI-RADS (Breast Imaging Reporting and Data System) tasks, respectively. Both values are significantly higher than those of the current state-of-the-art methods, including the eight-layer convolutional neural network and the high-throughput-derived multilayer visual representations. The superior performance achieved along with these encouraging properties indicates that our method has great potential to be applied as a computer-aided diagnosis tool.

1 Introduction

Breast cancer is a major health threat[30, 5]; its incidence has increased while its death rates have declined in all age groups over the past decades[5, 29]. This favorable trend of mortality reduction could be related to improvements in the treatment of breast cancer and the widespread adoption of breast cancer screening techniques, especially mammography[29], for early diagnosis.

Mammography is the most common and efficient method for breast cancer screening. Clinical studies have reported that, alongside mammographic abnormalities (e.g., masses, calcification, architectural distortion, and asymmetries), a change in breast density is another important indicator of early breast cancer development[22, 23, 25]. However, inspection of the large quantities of generated mammographs by radiologists is tedious and subjective, and it also suffers from intra- and inter-radiologist reproducibility problems[11, 2].

The very first research on the importance of breast density began with Wolfe et al., who demonstrated the relationship between mammographic parenchymal patterns and the risk of developing breast cancer[33]. Following this, Boyd et al. showed a similar correlation between mammographic densities and breast cancer risks[4]. Inspired by these discoveries, a number of studies on breast density classification emerged. The American College of Radiology's (ACR) Breast Imaging Reporting and Data System (BI-RADS) groups breasts into four categories according to density, with BI-RADS I referring to the lowest densities and BI-RADS IV to the highest (BI-RADS I: fatty breast (0-25%), BI-RADS II: fat with some fibroglandular tissue (26-50%), BI-RADS III: heterogeneously dense breast (51-75%), and BI-RADS IV: extremely dense breast (76-100%)). Women with extremely dense breasts (BI-RADS IV) have a 2-6 times higher risk of developing breast cancer than women with fatty breasts (BI-RADS I)[11, 14]. Therefore, breast density plays an important role in the early detection of breast cancer, and there is an urgent need for an automatic system that can accurately classify mammographic breast densities.

Initially, many studies measured breast density by quantifying the gray-level histograms of mammographs[15, 19, 36]. Subsequent studies found that it might be insufficient to classify breasts into the corresponding BI-RADS categories based only on histogram information. For example, the study by Oliver et al. illustrated that the four different categories are quite similar with regard to both the mean gray-level values and the shapes of the histograms.

To address this issue, researchers turned to traditional feature engineering methods for the breast density classification task. Bovis et al. achieved 71.4% accuracy using a combined classifier paradigm in which the Fourier and discrete wavelet transforms were investigated on first- and second-order statistical features[3]. Oliver et al. extracted morphological and texture features from breast tissue regions that were segmented using a fuzzy c-means clustering technique, and these features were then treated as inputs for the breast density classifier[22]. Jensen et al. adopted the same breast tissue segmentation method but extracted first- and second-order statistical features as well as morphological features for the Mammographic Image Analysis Society (MIAS) dataset[12]. These two studies achieved 86.0% and 91.4% breast density classification accuracies, respectively. Chen et al. evaluated different local features using texture representation algorithms and then modelled mammographic tissue patterns based on the local tissue appearances in mammographs[7]. The work of Indrajeet et al. was based on ROIs manually extracted from images; multi-resolution texture descriptors were then extracted from 16 sub-band images obtained from a second-level decomposition through the wavelet packet transform[16]. It can be concluded from these studies that the general procedure of breast density classification includes segmenting the breast area, designing and extracting breast density-related features, and inputting these features into different classifiers to predict the density categories. One major drawback of this procedure is that prior expert knowledge of the data and a hand-crafting process are necessary to calculate the quantitative features.

On the other hand, the development of the deep learning field offers a promising solution: using artificial neural networks to automatically extract features for medical image analysis[18, 27, 28, 35]. The Convolutional Neural Network (CNN) is one type of such networks that has shown excellent performance in image classification. CNNs can learn highly nonlinear relationships between inputs and outputs without human intervention. A number of studies have applied deep learning to mammography-related tasks, such as lesion detection, differentiation of benign and malignant masses, microcalcification recognition, and their combinations[13, 31, 32, 34, 6]. With respect to breast density classification, Mohamed et al. designed an eight-layer CNN to group mammographs into two categories (scattered density and heterogeneously dense) as a simplification of the complicated four-category BI-RADS task[20]. Similarly, Ahn et al. designed a CNN architecture to learn image characteristics from mammographs and classify the corresponding breasts into dense and fatty tissues[1]. From these pioneering studies, we can see that few studies directly classified mammographs into the four BI-RADS categories. One possible reason is that the CNN models applied had limited capacity: their shallow network structures prevented them from obtaining enough meaningful and abstract features to accomplish this difficult task.

Radiomics is an emerging method that works by extracting large amounts of advanced quantitative features from medical images and quantifying the predictive or prognostic relationships between images and medical outcomes according to these features[17, 26]. Nevertheless, the advantages of CNNs have not been fully integrated with the radiomics approach to solve the problems encountered when classifying mammographic breast densities into the four BI-RADS categories. Therefore, in this paper, we propose a CNN-based (residual learning)[10] radiomics method for the automatic extraction of high-throughput features from mammographs and the subsequent classification of breast densities. Specifically, our contributions are threefold.

1. Our work demonstrates the first attempt at applying a deep CNN as a radiomics approach to automatically extract high-throughput, high-level, and highly abstract features from mammographs, which serves as the basis of an accurate classification model of mammographic breast densities.

2. Beyond the existing situation, where a two-category classification is studied, our proposed method can accurately classify mammographic breast densities strictly following the four BI-RADS categories. Moreover, our network possesses the capacity to learn deep features for accurate BI-RADS category classification from a single mammographic image. Result compensation from different views (such as the craniocaudal view and the mediolateral oblique view) is not required.

3. Our method could be treated as a baseline for mammographic breast density classification in clinical applications. Due to the large capacity of residual CNNs, our method could be easily adapted to new datasets with new experimental parameters through parameter fine-tuning.

The rest of this paper is organized as follows. Section 2 gives a detailed description of the dataset and an overview of the CNN methods based on residual learning and radiomics. In Section 3, the proposed CNN architecture and the training details, including parameter settings and implementation details, are presented. The experimental results are introduced in Section 4, followed by the discussion and conclusion in Sections 5 and 6.

2 Materials and Methods

2.1 Dataset

In this study, we evaluated our methods on the publicly available INbreast dataset[21], which contains 115 cases (410 images). Among the 115 cases, 90 are from women with both breasts affected (4 images per case) and 25 are from mastectomy patients (2 images per case). Two views of each breast were recorded: a craniocaudal (CC) view, which is a top-to-bottom view, and a mediolateral oblique (MLO) view, which is a side view. The dataset provides a breast density assessment of each mammograph with the corresponding BI-RADS category labels, which makes it suitable for our study. The mammographs were acquired on x-ray films and saved in the standard Digital Imaging and Communications in Medicine (DICOM) format. The image matrix has either 3328×4084 or 2560×3328 pixels. Among the 409 labelled images (one image is missing its label), 136 are classified as BI-RADS I, 146 as BI-RADS II, 99 as BI-RADS III, and 28 as BI-RADS IV (example images of the four categories are shown in Fig. 1).
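For reference, a minimal sketch of reading one such DICOM mammograph with pydicom follows; the file path is hypothetical, since the INbreast archive layout is not described in the paper.

```python
import pydicom
import numpy as np

# Hypothetical path to one INbreast DICOM file
ds = pydicom.dcmread("INbreast/DICOM/example_case.dcm")
image = ds.pixel_array.astype(np.float32)  # 3328x4084 or 2560x3328 pixels
print(image.shape, image.min(), image.max())
```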

2.2 Data Preprocessing

As introduced in the dataset section, we have a total of 409 images. Training CNN models requires a large amount of data, so data augmentation is a critical step. We also observed a data imbalance between the four BI-RADS categories that needed to be dealt with. We first performed a four-fold rotation augmentation for the BI-RADS IV images. After that, we randomly separated all the images into three groups: 349 for training, 77 for validation, and 95 for independent testing. Finally, to augment the training dataset, we further processed the training and validation sets through rotation by eight random angles, horizontal flips, and vertical flips. As a result, we have a training dataset of 11168 (349 × 8 × 2 × 2) images and a validation dataset of 2464 (77 × 8 × 2 × 2) images.

Another problem that needed to be resolved before network training is the large size of each image. The original mammography images have 3328×4084 or 2560×3328 pixels. To reduce the computational load and memory usage, we downsampled the original images, i.e., resized them to 224×224 pixels.
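One way to realize the described augmentation and downsampling is sketched below; the use of SciPy and the interpolation settings are our assumptions, as the paper does not name its implementation. Each of the eight random rotation angles is combined with horizontal and vertical flips, giving the 8 × 2 × 2 variants per image counted above.

```python
import numpy as np
from scipy import ndimage

def augment_and_resize(image, angles, size=224):
    """Rotate by each angle, add horizontal/vertical flips of every
    rotation, and downsample all variants to size x size pixels."""
    variants = []
    for angle in angles:
        rotated = ndimage.rotate(image, angle, reshape=False, mode="nearest")
        for img in (rotated, np.fliplr(rotated), np.flipud(rotated),
                    np.flipud(np.fliplr(rotated))):
            zoom = (size / img.shape[0], size / img.shape[1])
            variants.append(ndimage.zoom(img, zoom))
    return variants  # 4 variants per angle: 8 angles x 2 x 2 = 32 images

rng = np.random.default_rng(0)
angles = rng.uniform(0, 360, size=8)  # eight random rotation angles
# training_variants = augment_and_resize(mammograph, angles)
```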

2.3 CNN-based residual learning for mammograph classification

CNNs are a class of deep learning methods that attempt to learn high-level features and attack computer vision problems such as classification, detection, and segmentation. Gradient vanishing is a big problem for CNNs with many layers. Thanks to the invention of the residual network, CNNs can now go substantially deeper than previously. A detailed description of the residual learning block is presented here. Residual learning was introduced to solve the degradation problem that arises after stacking many convolution layers. We use $H(X)$ to denote the desired nonlinear output feature map of the input feature map $X$ after applying the stacked layers. Now, we let the stacked nonlinear layers fit another mapping:

$F(X) = H(X) - X$ (1)

and $H(X)$ is recast to

$F(X) + X$ (2)

The formulation of $F(X) + X$ can be realized by a feedforward CNN with shortcut (skip) connections (Fig. 2). In this case, no extra parameters or computational burden are added to the training process. Due to the propagation of gradients through the shortcut connections, it is easier to optimize the residual mapping $F(X)$ than the original mapping $H(X)$. Therefore, by adding residual learning blocks, deeper networks can be designed to extract richer information from images for our classification tasks.
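To make the block concrete, the following is a minimal Keras sketch of a residual learning block consistent with Eqs. (1) and (2). The exact ordering of layers inside the paper's blocks (e.g., the placement of batch normalization) is not specified, so this arrangement is an assumption.

```python
from tensorflow.keras import layers

def residual_block(x, filters):
    # Residual mapping F(X): two stacked 3x3 convolutions
    f = layers.Conv2D(filters, 3, padding="same")(x)
    f = layers.BatchNormalization()(f)
    f = layers.Activation("relu")(f)
    f = layers.Conv2D(filters, 3, padding="same")(f)
    f = layers.BatchNormalization()(f)
    # Identity shortcut realizes H(X) = F(X) + X with no extra parameters
    return layers.Activation("relu")(layers.Add()([x, f]))
```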


Figure 1. Examples of mammographs of the different BI-RADS categories: (a) BI-RADS I, (b) BI-RADS II, (c) BI-RADS III, and (d) BI-RADS IV.

Figure 2. Residual learning block.

Next, we describe in detail the CNN method used for image classification. After preprocessing, the training, validation, and test datasets went through the training and test stages, respectively, as shown in Fig. 3. CNNs are trained by feedforward and backpropagation processes. The feedforward process extracts and selects the features and calculates the loss, whereas the backpropagation process optimizes the network parameters by gradient descent of the loss function.

The feedforward process of a CNN can be interpreted by the following steps. First, the images pass through the convolution layers:

$C_l = \sigma_l (W_l * C_{l-1} + b_l)$ (3)

where $l$ denotes the layer number, $\sigma_l$ denotes the nonlinear activation (the rectified linear unit (ReLU) was used in this study), $W_l$ and $b_l$ are the weights and biases, $*$ denotes the convolution operation, and $C_l$ denotes the feature maps, with $C_0$ denoting the input. Some convolution layers are followed by a downsampling procedure (average pooling layers):

$C_{l,\mathrm{pool}} = \mathrm{averagepooling}(C_l)$ (4)

For our classification task, a softmax activation was included after three fully connected layers:

$C_L = \mathrm{softmax}(W_L C_{L-1} + b_L)$ (5)

where $\mathrm{softmax}(x)_i = e^{x_i} / \sum_j e^{x_j}$ and $C_L$ is the output of the last layer. The final prediction from the network can therefore be summarized as

$Y = C(\theta, X)$ (6)

where $\theta$ consists of all the network parameters to be estimated and optimized, $C$ denotes the overall forward-pass network, and $X$ is the input.
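As a concrete illustration of the softmax in Eq. (5), a minimal NumPy version (our own illustration, not the paper's code) is:

```python
import numpy as np

def softmax(x):
    # Subtracting the max is a standard numerical-stability trick;
    # it leaves the result of Eq. (5) unchanged.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # probabilities summing to 1
```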

On the other hand, the CNN backward process is the backward propagation of loss gradients, which optimizes the network parameters by addressing the following cross-entropy loss minimization problem:


Figure 3. Schematic diagram of residual learning for classification.

$\hat{\theta} = \arg\min_{\theta} \left\{ -\sum_{k=1}^{K} \left[ \sum_{i=1}^{I} Y_i' \log(C(\theta, X)) \right] \right\}$ (7)

where $I$ and $K$ are the total numbers of classification categories and training samples, respectively, and $Y_i'$ is the manually labelled ground truth provided by the INbreast dataset.

After the training phase, a classification model is obtained with the trained parameters. For new independent samples, we can generate the probability distribution of each case by calculating

$Y_{test} = C(\hat{\theta}, X_{test})$ (8)

The BI-RADS categories of the mammograph images can then be determined accordingly.
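In code, the test stage of Eq. (8) could be as simple as the following sketch, where the trained Keras classifier and the preprocessed test batch are assumed to come from the surrounding sketches.

```python
import numpy as np

BI_RADS = ["BI-RADS I", "BI-RADS II", "BI-RADS III", "BI-RADS IV"]

def classify(model, x_test):
    """Apply Eq. (8): run the trained model on independent samples and
    map the per-class probabilities to BI-RADS categories."""
    probs = model.predict(x_test)  # Y_test = C(theta_hat, X_test)
    return [BI_RADS[i] for i in np.argmax(probs, axis=1)]
```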

3 Experiments

3.1 CNN architecture

For our classification task, we applied a 70 weight-layer CNN model. As shown in Fig. 4, the model can be divided into 3 stages, and each stage has 7 residual learning blocks (only 3 residual learning blocks are shown in Fig. 4). In total, the model has 70 weight layers (67 convolution layers and 3 fully connected layers), not counting the average pooling and batch normalization layers. All convolution kernel sizes were set to 3×3, and the numbers of convolution kernels for the three residual stages were set to 64, 128, and 256, respectively. Moreover, the ReLU activation function was adopted after each convolution. The average pooling size and stride were set to 2×2. For the last three fully connected layers, the numbers of neurons were set to 1024, 512, and 4, respectively. The first two fully connected layers used the ReLU activation function, while the last one used the softmax instead.
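A sketch of how such a model could be assembled in Keras follows. The paper does not spell out the block-internal composition that yields exactly 67 convolution layers; the arrangement below (one stem convolution, one entry convolution per stage, and seven three-convolution residual blocks per stage, i.e. 1 + 3 + 3×7×3 = 67) is one plausible reading and should be treated as an assumption rather than the authors' exact network.

```python
from tensorflow.keras import layers, models

def residual_block(x, filters):
    # Three 3x3 convolutions form the residual mapping F(X);
    # the identity shortcut adds X back, as in Eq. (2).
    f = x
    for _ in range(3):
        f = layers.Conv2D(filters, 3, padding="same", activation="relu")(f)
    return layers.Add()([x, f])

def build_model(num_classes=4):
    inputs = layers.Input(shape=(224, 224, 1))
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)  # stem
    for filters in (64, 128, 256):      # three residual stages
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        for _ in range(7):              # seven residual blocks per stage
            x = residual_block(x, filters)
        x = layers.AveragePooling2D(pool_size=2, strides=2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(1024, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_model()
```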

Different layers of the network extract different levels of abstract information from the input samples. To test the sensitivity of our classification model to the CNN depth, two other CNN configurations were also evaluated: the 36 and 48 weight-layer CNN models. These two models also have 3 stages but have fewer convolution layers in each stage. The different configurations were compared to demonstrate the importance of high-level features for the final classification performance.


Figure 4. The CNN architecture.

3.2 Parameter settings and implementation details

We used Keras (a deep learning framework with TensorFlow as the backend) to implement our CNN networks for the breast density classification task. TensorBoard was adopted to monitor the entire training process, including the evolution of the accuracy and loss. The network training was implemented on a Dell-7910 workstation equipped with two Intel E5-2640v4 CPUs, an NVIDIA TITAN Xp GPU, and 64 GB of memory. Adam was used for training, with a batch size of 16, a maximum of 3200 iterations, and an initial learning rate of 0.0001. Random values drawn from a uniform distribution were used for the weight initialization and zeros for the bias initialization.
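The corresponding Keras training configuration might look as follows. Only the optimizer, learning rate, batch size, loss, and initializers are taken from the text; the stand-in model and the epoch count are placeholders (in practice the model would be build_model() from the architecture sketch, with the initializers passed to every Conv2D and Dense layer).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Reported settings: Adam, batch size 16, initial learning rate 1e-4,
# uniform weight initialization, zero bias initialization.
init = {"kernel_initializer": "random_uniform", "bias_initializer": "zeros"}

# Tiny stand-in model so the snippet runs on its own
inputs = layers.Input(shape=(224, 224, 1))
outputs = layers.Dense(4, activation="softmax", **init)(layers.Flatten()(inputs))
model = models.Model(inputs, outputs)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",  # the cross-entropy of Eq. (7)
              metrics=["accuracy"])
# model.fit(x_train, y_train, batch_size=16,
#           validation_data=(x_val, y_val), epochs=100)
```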

4 Results

4.1 Training convergence property

Minimizing the cross-entropy loss is the target of the network parameter optimization, and increasing classification accuracy reflects the improved capability of a classification model to differentiate the categories. Therefore, to monitor the convergence of our network during the training stage, we plotted the loss and accuracy curves of both the training dataset and the validation dataset with respect to the iterations (Fig. 5). These curves present the detailed learning procedure of the network. Our loss fluctuated stably around zero after 80 epochs (Fig. 5), which demonstrates that the residual network training converged gradually. The small fluctuations might be caused by differences between the samples.


Figure 5. The plot of accuracy and loss in the training stage.

Table 1. Performance with different network configurations

Models            36L       48L       70L
BI-RADS I         92.00%    88.00%    96.00%
BI-RADS II        88.46%    100.00%   96.15%
BI-RADS III       90.48%    71.20%    95.24%
BI-RADS IV        73.91%    73.91%    82.61%
All (accuracy)    86.32%    85.26%    92.63%

Similar phenomena can be observed from the accuracy curves: both the training and validation curves showed small fluctuations around an accuracy of 1 after 80 epochs (Fig. 5). These results prove that our network training converged gradually. Once the network was trained, it could be used to obtain classification predictions for new independent samples.

4.2 Classification performance of different network configurations

As explained in the Materials and Methods section, in order to test whether our network is sensitive to the depth of the residual network, we compared the classification results of our 70 weight-layer CNN model to those of the 36 and 48 weight-layer CNN models. Table 1 summarizes the classification accuracies of the three different network configurations. The 36 and 48 weight-layer networks have similar overall classification accuracies, while the 70 weight-layer network has a significantly increased accuracy. CNN models with different depths can learn features of different hierarchies.

Table 2. Performance with different classes

Models                   36L       48L       70L
Scattered density        94.12%    98.04%    100.00%
Heterogeneously dense    97.73%    86.36%    95.35%
All (accuracy)           95.79%    92.63%    96.84%

We believe that the 70 weight-layer CNN model learned higher levels of features, which led to its improved performance compared to the 36 and 48 weight-layer models. One phenomenon we need to pay attention to is that all three networks showed much lower classification accuracies for the BI-RADS IV category, which might be caused by the data imbalance, as only 28 of the 410 original images in the INbreast dataset are classified as BI-RADS IV.

4.3 Classification according to two categories vs four categories

Many studies simplified the problem from the original four-category classification to a two-category classification. In clinical applications, it is more challenging for radiologists to classify breasts correctly into the four BI-RADS categories due to the difficulty of discerning the visual features of breast tissue between the four categories. Therefore, some studies treated BI-RADS I and BI-RADS II as one scattered density category and BI-RADS III and BI-RADS IV as one heterogeneously dense category, in compliance with clinical requirements. In this respect, we made small changes to the original residual network to deal with the dichotomous classification problem, as sketched below. The results are shown in Table 2. We can observe that all three of our residual network configurations showed good dichotomous classification performance, especially the 70 weight-layer residual network, which reached a significantly higher overall classification accuracy of 96.84% than the other two networks.
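A sketch of the regrouping, assuming integer labels 0-3 for BI-RADS I-IV (our own encoding, not specified by the paper); on the network side, the change amounts to giving the final fully connected layer 2 output neurons instead of 4.

```python
import numpy as np

def to_two_classes(four_class_labels):
    """Map BI-RADS I/II (0, 1) to scattered density (0) and
    BI-RADS III/IV (2, 3) to heterogeneously dense (1)."""
    labels = np.asarray(four_class_labels)
    return (labels >= 2).astype(int)

print(to_two_classes([0, 1, 2, 3]))  # -> [0 0 1 1]
```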

4.4 Comparison with state-of-the-art methods

To further evaluate the proposed method, we compared it to two reported neural network-based methods: the eight-layer convolutional neural network[20] and the high-throughput-derived multilayer visual representations (V1-like, HT-L2, and HT-L3)[9]. For the first method, the authors explored an eight-layer CNN to classify breasts as scattered density or heterogeneously dense; we applied some non-technical changes to make it comparable with our four-class task. For the second method, the feature extractors (V1-like, HT-L2, and HT-L3) were first described by Cox et al. and Pinto et al. for face recognition[8, 24].


Fonseca et al. applied and evaluated the performance of these feature extractors for classifying mammographs into the four ACR composition categories[9]. We have made a comprehensive comparison of the different methods, considering both the two-category and four-category classification problems.

From Tables 3 and 4, it can be concluded that for both the two-category and the four-category problems, our proposed method always showed higher classification accuracies. One important reason could be that our network was deeper and could extract more abstract and deeper features, which is very important for the accurate classification of the different BI-RADS categories.

5 Discussion

Traditional radiomics methods extract features based on manual observation and operation, including manual design, extraction, and selection. Compared to the traditional feature engineering approach, deep convolutional networks with residual learning can automatically extract high-order, highly abstract, and subtle features from mammographs that are not easily observable to human eyes, which enables accurate discrimination of the four BI-RADS categories. Moreover, by working with the whole original images, the classification model has access to all the image-relevant information, and elevated performance can be expected. With the proposed method, an overall accuracy of 92.63% for the four-category BI-RADS classification task and an accuracy of 96.84% for the two-category BI-RADS classification task were obtained. Both are higher than the values reported in the literature, where only relatively shallow networks were applied.

A breast cancer screening exam by mammography generally comes with CC and MLO views for a single breast. Multi-view models that make a classification decision by considering the different views have been reported. However, to accommodate the information from different views in the final prediction, different model parameter sets need to be trained accordingly, which leads to a significantly increased computational burden and decreased testing speed. On the other hand, our proposed model has already shown excellent performance for the mammographic density classification task without considering the relationships between the different views. Therefore, we can conclude that the large capacity of our model enables the extraction of sufficiently deep features for accurate BI-RADS category classification of breasts, which avoids the need for multi-view compensation.

Different imaging systems or experimental settings generate images of different standards, and a trained CNN can only properly handle domain-specific images. Although including different types of images in the training process can help build a more robust CNN model, it is not realistic to collect a dataset that covers all the different possibilities. Thanks to the large capacity of CNNs, our classification model can be easily extended depending on the application situation. If the dataset to be processed is in a similar domain to the original dataset, the trained CNN model can be used directly. However, if the new dataset is in a very different domain from the original dataset, fine-tuning of the trained CNN is required before it can be successfully applied. Compared with training from scratch, fine-tuning a CNN requires far fewer samples, and the training process is significantly faster.
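A minimal fine-tuning sketch under these assumptions follows; the saved-model filename, the frozen-layer split, and the reduced learning rate are illustrative choices, not values from the paper.

```python
import tensorflow as tf

# Hypothetical filename for the trained baseline model
model = tf.keras.models.load_model("density_resnet.h5")

# Freeze everything except the fully connected head, then retrain
# with a small learning rate on the new-domain data x_new/y_new.
for layer in model.layers[:-3]:
    layer.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_new, y_new, batch_size=16, epochs=10)
```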

Our residual learning-based CNN model could serve as a baseline for mammographic breast density classification. In the future, we expect to collect more data, especially of the BI-RADS IV category, to train a more powerful CNN model. We also plan to test the fine-tuning performance of the baseline model using datasets that come from different systems or different experimental settings. Finally, we have already collected a number of clinical samples, and we will apply our method to these samples to investigate the domain transfer behavior of the established model. We will make our code and trained models publicly available once our manuscript is accepted, to foster research in the field.

6 Conclusion

In this study, we have investigated the use of a radiomics-based method built on residual learning for mammographic breast density classification. To the best of our knowledge, this is the first attempt to apply residual learning as a radiomics approach to extract high-throughput features from mammographs and classify the breasts accordingly. The superior classification accuracies achieved demonstrate its feasibility. Another important advantage of the proposed method is that the classification model is trained end-to-end: sophisticated pre-processing of the mammographic images, such as segmentation of the breast tissues, is not required. This makes the proposed method automatic, with almost no human intervention needed. In addition, our method has the appealing attribute of being readily extendable to different experimental settings and application situations. All of these encouraging properties make it a good candidate algorithm for CAD systems.

References

[1] C. K. Ahn, C. Heo, H. Jin, and J. H. Kim. A novel deep learning-based approach to high-accuracy breast density estimation in digital mammography. In Society of Photo-Optical Instrumentation Engineers, page 101342O, 2017.


Table 3. Classification accuracies of different methods for the two-class problem

Models     Scattered density    Heterogeneously dense    All (accuracy)
V1-like    94.13%               81.82%                   88.42%
HT-L2      96.08%               75.00%                   86.32%
HT-L3      94.13%               65.10%                   81.05%
8-CNN      96.08%               79.55%                   88.42%
Ours       100.00%              95.35%                   96.84%

Table 4. Classification accuracies of different methods for the four-class problem

Models     BI-RADS I    BI-RADS II    BI-RADS III    BI-RADS IV    All
V1-like    72.00%       76.92%        90.48%         56.62%        73.68%
HT-L2      68.00%       76.92%        76.20%         60.87%        70.53%
HT-L3      64.00%       76.92%        95.24%         52.17%        71.58%
8-CNN      76.00%       88.46%        85.71%         65.23%        78.95%
Ours       96.00%       96.15%        95.24%         82.61%        92.63%

[2] W. A. Berg, C. Campassi, P. Langenberg, and M. J. Sexton. Breast imaging reporting and data system: inter- and intraobserver variability in feature analysis and final assessment. AJR American Journal of Roentgenology, 174(6):1769–77, 2000.

[3] K. Bovis and S. Singh. Classification of mammographic breast density using a combined classifier paradigm. International Workshop on Digital Mammography, pages 177–180, 2002.

[4] N. F. Boyd, J. W. Byng, R. A. Jong, E. K. Fishell, L. E. Little, A. B. Miller, G. A. Lockwood, D. L. Tritchler, and M. J. Yaffe. Quantitative classification of mammographic densities and breast cancer risk: results from the Canadian National Breast Screening Study. Journal of the National Cancer Institute, 87(9):670–675, 1995.

[5] F. Bray, P. Mccarron, and D. M. Parkin. The changing global patterns of female breast cancer incidence and mortality. Breast Cancer Research, 6(6):229–39, 2004.

[6] G. Carneiro, J. Nascimento, and A. P. Bradley. Automated analysis of unregistered multi-view mammograms with deep learning. IEEE Transactions on Medical Imaging, PP(99):1–1, 2017.

[7] Z. Chen, E. Denton, and R. Zwiggelaar. Local feature based mammographic tissue pattern modelling and breast density classification. In 2011 4th International Conference on Biomedical Engineering and Informatics, pages 351–355, 2011.

[8] D. Cox and N. Pinto. Beyond simple features: A large-scale feature search approach to unconstrained face recognition. In IEEE International Conference on Automatic Face & Gesture Recognition and Workshops, pages 8–15, 2011.

[9] P. Fonseca, J. Ferrer, J. Pinto, and B. Castaneda. Automatic breast density classification using a convolutional neural network architecture search procedure. In Medical Imaging 2015: Computer-Aided Diagnosis, 2015.

[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

[11] C. W. Huo, G. L. Chew, K. L. Britt, W. V. Ingman, M. A. Henderson, J. L. Hopper, and E. W. Thompson. Mammographic density: a review on the current understanding of its association with breast cancer. Breast Cancer Research & Treatment, 144(3):479–502, 2014.

[12] R. Jensen, Q. Shen, and R. Zwiggelaar. Fuzzy-rough approaches for mammographic risk analysis. Intelligent Data Analysis, 14(2):225–244, 2010.

[13] C. Jin, H. Cai, J. Wang, L. Li, W. Tan, and Y. Xi. Discrimination of breast cancer with microcalcifications on mammography by deep learning. Scientific Reports, 6:27327, 2016.

[14] M. Kallenberg, K. Petersen, M. Nielsen, A. Ng, P. Diao, C. Igel, C. Vachon, K. Holland, N. Karssemeijer, and M. Lillholm. Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring. IEEE Transactions on Medical Imaging, 35(5):1322–1331, 2016.


[15] N. Karssemeijer. Automated classification of parenchymal patterns in mammograms. Physics in Medicine & Biology, 43(2):365, 1998.

[16] I. Kumar, H. S. Bhadauria, and J. Virmani. Wavelet packet texture descriptors based four-class BIRADS breast tissue density classification. Procedia Computer Science, 70:76–84, 2015.

[17] P. Lambin et al. Radiomics: Extracting more information from medical images using advanced feature analysis. European Journal of Cancer, 48(4):441–6, 2012.

[18] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

[19] K. E. Martin, M. A. Helvie, C. Zhou, M. A. Roubidoux, J. E. Bailey, C. Paramagul, C. E. Blane, K. A. Klein, S. S. Sonnad, and H. P. Chan. Mammographic density measured with quantitative computer-aided method: comparison with radiologists' estimates and BI-RADS categories. Radiology, 240(3):656–65, 2006.

[20] A. A. Mohamed, W. A. Berg, H. Peng, Y. Luo, R. C. Jankowitz, and S. Wu. A deep learning method for classifying mammographic breast density categories. Medical Physics, 45(1), 2017.

[21] I. C. Moreira, I. Amaral, I. Domingues, A. Cardoso, M. J. Cardoso, and J. S. Cardoso. INbreast: toward a full-field digital mammographic database. Academic Radiology, 19(2):236–248, 2012.

[22] A. Oliver, J. Freixenet, R. Marti, J. Pont, E. Perez, E. R. E. Denton, and R. Zwiggelaar. A novel breast tissue density classification methodology. IEEE Transactions on Information Technology in Biomedicine, 12(1):55, 2008.

[23] A. Oliver, M. Tortajada, X. Lladó, J. Freixenet, S. Ganau, L. Tortajada, M. Vilagran, M. Sentís, and R. Martí. Breast density analysis using an automatic density segmentation algorithm. Journal of Digital Imaging, 28(5):1–9, 2015.

[24] N. Pinto, D. Doukhan, J. J. DiCarlo, and D. D. Cox. A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLOS Computational Biology, 5(11):1–12, 2009.

[25] A. Rampun, B. Scotney, P. Morrow, H. Wang, and J. Winder. Breast density classification using local quinary patterns with various neighbourhood topologies. 4(1):14, 2018.

[26] R. J. Gillies, P. E. Kinahan, and H. Hricak. Radiomics: Images are more than pictures, they are data. Radiology, 278(2):563, 2016.

[27] D. Shen, G. Wu, and H. I. Suk. Deep learning in medical image analysis. Annual Review of Biomedical Engineering, 19(1):221–248, 2017.

[28] H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5):1285, 2016.

[29] E. A. Sickles. Breast cancer screening outcomes in women ages 40-49: clinical experience with service screening using modern mammography. JNCI Monographs, 22(22):99–104, 1997.

[30] R. Siegel et al. Cancer Facts & Figures 2017. American Cancer Society. Available: https://www.cancer.org/research/cancer-facts-statistics/all-cancer-facts-figures/cancer-facts-figures-2017.html.

[31] S. Suzuki, X. Zhang, N. Homma, K. Ichiji, N. Sugita, Y. Kawasumi, T. Ishibashi, and M. Yoshizawa. Mass detection using deep convolutional neural network for mammographic computer-aided diagnosis. In Society of Instrument and Control Engineers of Japan, pages 1382–1386, 2016.

[32] D. Wang, A. Khosla, R. Gargeya, H. Irshad, and A. H. Beck. Deep learning for identifying metastatic breast cancer. CoRR, abs/1606.05718, 2016.

[33] J. N. Wolfe. Risk for breast cancer development determined by mammographic parenchymal pattern. Cancer, 37(5):2486–2492, 1976.

[34] L. Zhang, S. Jiang, Y. Zhao, J. Feng, B. W. Pogue, and K. D. Paulsen. Direct regularization from co-registered contrast MRI improves image quality of MRI-guided near-infrared spectral tomography of breast lesions. IEEE Transactions on Medical Imaging, PP(99):1–1, 2018.

[35] L. Zhang, L. Lu, R. M. Summers, E. Kebebew, and J. Yao. Convolutional invasion and expansion networks for tumor growth prediction. IEEE Transactions on Medical Imaging, 37(2):638, 2018.


[36] C. Zhou, H. P. Chan, N. Petrick, B. Sahiner, M. A. Helvie, M. A. Roubidoux, L. M. Hadjiiski, and M. M. Goodsitt. Computerized image analysis: estimation of breast density on mammograms. In Medical Imaging 2000: Image Processing, pages 1056–1069, 2000.
