Research Article
Pulmonary Nodule Detection Model Based on SVM and CT Image Feature-Level Fusion with Rough Sets
Tao Zhou,1 Huiling Lu,1 Junjie Zhang,1 and Hongbin Shi2
1School of Science, Ningxia Medical University, Ningxia, Yinchuan 750004, China
2Department of Urology, The General Hospital of Ningxia Medical University, Ningxia, Yinchuan 750004, China
Correspondence should be addressed to Tao Zhou; zhoutaonxmu@126.com
Received 4 May 2016; Revised 23 July 2016; Accepted 2 August 2016
Academic Editor: Zexuan Ji
Copyright © 2016 Tao Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In order to improve the detection accuracy of pulmonary nodules in CT images, and considering two problems of pulmonary nodule detection models, namely, unreasonable feature structure and loosely packed feature representation, a pulmonary nodule detection algorithm is proposed based on SVM and CT image feature-level fusion with rough sets. Firstly, CT images of pulmonary nodules are analyzed, and 42-dimensional feature components are extracted, including six new 3-dimensional features proposed in this paper and other 2-dimensional and 3-dimensional features. Secondly, these features are reduced five times with rough sets based on feature-level fusion. Thirdly, a grid optimization model is used to optimize the kernel function of the support vector machine (SVM), which is used as a classifier to identify pulmonary nodules. Finally, lung CT images of 70 patients with pulmonary nodules are collected as the original samples, which are used to verify the effectiveness and stability of the proposed model in four groups of comparative experiments. The experimental results show that the effectiveness and stability of the proposed model based on rough set feature-level fusion are improved to some degree.
1. Introduction
Lung cancer is the malignant tumor with the highest morbidity and mortality rates in the world, posing a serious threat to human life and health [1, 2]. The ability to estimate the risk of lung cancer is important in two common clinical models [3]: pulmonary nodule management and risk prediction. Identification of early symptoms of lung cancer is very important to improve early survival and reduce emergency presentations. Early detection is the most widely used means of improving the effectiveness of treatment for patients with lung cancer. Since pulmonary nodules are the early form of lung cancer [4], the detection of pulmonary nodules plays a critical role in the early diagnosis and treatment of lung cancer. Recent advances in computed tomography (CT) have progressively increased spatial resolution and decreased acquisition times, making it possible to acquire a high-resolution, multiangle, 3-dimensional isotropic image of the whole lung in less than 10 seconds. This has expanded capabilities for the early detection of small pulmonary nodules [4].
It is believed that early detection of lung cancer will result in earlier treatment at lower stages of the disease, thereby improving the 5-year survival rate, which has remained relatively constant at 15% for the last 30 years. However, with the wide application of CT in lung imaging, CT data overload and subjective interpretation of images result in a high clinical misdiagnosis rate [5].
Computer-Aided Diagnosis (CAD) systems provide beneficial support and enhance diagnostic accuracy. CAD is capable of performing the preliminary screening of vast amounts of CT images and marking suspicious lesions, thereby helping radiologists to carry out a second-stage discrimination, reduce their workload, and improve the accuracy of cancer diagnosis [6, 7].
Hindawi Publishing Corporation, BioMed Research International, Volume 2016, Article ID 8052436, 13 pages, http://dx.doi.org/10.1155/2016/8052436

Pulmonary nodule detection technology has been one of the hot topics in the field of CAD in recent years. For example, ROI segmentation is a key problem. Xia et al. [8] use local variational Gaussian mixture models to segment brain MRI images. Segmentation of breast ultrasound images based on learning local variational Gaussian mixture models is discussed by Xian [9, 10], and Santos et al. [11] segment the lung parenchyma based on a region growing algorithm. Magalhaes Barros Netto et al. [12] use growing neural gas (GNG) to segment the lung parenchyma; the obtained pulmonary nodules are then separated from tissues containing blood vessels and bronchi according to the 3D distance transform, and finally SVM is used to identify pulmonary nodules with shape and texture features. Ye et al. [13] firstly segment and extract the region of interest (ROI) with a fuzzy threshold in combination with the Gaussian matrix, mean curvature, and Hessian matrix, then choose the local shape information and local intensity dispersion as the feature expression of the ROI, and finally use a weighted SVM to recognize pulmonary nodules. Tan et al. [14] segment pulmonary nodules based on the blood vessel and nodule enhancement filter proposed by Li et al. [15], then locate the clustering center of pulmonary nodules based on the divergence calculated with a Gaussian template to achieve ROI extraction, and finally use classifiers based on a genetic model, an artificial neural network (ANN), and SVM for comparative analysis of the detection effectiveness of pulmonary nodules. Cascio et al. [16] firstly use a region growing model and morphological operations to extract the ROI, then reconstruct a B-spline surface based on a 3D spring model in order to extract the related 3D gray features and shape features, and detect the pulmonary nodules using an ANN. Although the above literature explores methods of detecting pulmonary nodules, overall there are still two disadvantages of these methods in feature structure design and feature set expression, as follows:
(1) When extracting and quantifying features for the ROI, the feature structure design is irrational, as reflected by the fact that the combination of global and local features and the combination of two-dimensional and three-dimensional features are not fully considered.
(2) When fusing feature data, the compactness of feature expression is a difficult problem. Therefore, feature redundancy is usually not eliminated. Moreover, feature-level fusion methods that require no prior knowledge are rarely used.
Rough set theory was developed by Zdzislaw Pawlak in the early 1980s and can be regarded as a mathematical tool for feature selection, feature extraction, and decision rule generation without prior knowledge. Rough sets provide the mechanism to find the minimal set of attributes required to classify the training samples. This minimal set of attributes is called a reduct and contains the same knowledge as the original set of attributes in a given information system. Therefore, reducts can be used to obtain different classifiers. Wang et al. [17] present a framework for a systematic study of rough set theory. Various views and interpretations of the theory and different approaches to studying it are discussed. The relationships between rough sets and other theories, such as fuzzy sets, evidence theory, granular computing, formal concept analysis, and knowledge spaces, are examined. The cost of disease prediction and diagnosis can be reduced by applying machine learning and data mining methods. Disease prediction and decision-making play a significant role in medical diagnosis. Udhaya Kumar and Hannah Inbarani [18] put forward a novel neighborhood rough set classification approach to deal with medical datasets. The experimental results of the proposed classification algorithm are compared with those of other existing approaches, such as rough set, $K$th-nearest neighbor, support vector machine, BP NN, and multilayer perceptron, to conclude that the proposed approach is a cheaper way to perform disease prediction and decision-making. Feature Selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Thus, the learning model, built using only the selected prominent features, receives a concise structure without forfeiting predictive accuracy. Therefore, FS is nowadays an essential part of knowledge discovery. Inbarani et al. [19] proposed new supervised feature selection methods based on hybridization of Particle Swarm Optimization (PSO); PSO based Relative Reduct (PSO-RR) and PSO based Quick Reduct (PSO-QR) are presented for disease diagnosis. To investigate the utility of computer-aided diagnosis in differentiating malignant nodules from benign nodules based on single thin-section CT image data, Shah et al. [20] had CT images of solitary pulmonary nodules contoured manually on a single representative slice by a thoracic radiologist. Two separate contours were created for each nodule, one including only the solid portion of the nodule and one including any ground-glass components. For each contour, 75 features were calculated that measured the attenuation, shape, and texture of the nodule. These features were then input into a feature selection step and four different classifiers to determine whether the diagnosis could be predicted from the feature vector. Hassanien [21] discusses a hybrid scheme that combines the advantages of fuzzy sets and rough sets in conjunction with statistical feature extraction techniques. An application to breast cancer imaging was chosen, and the hybridization scheme was applied to test its ability and accuracy in classifying breast cancer images into two outcomes: cancer or noncancer.
For the above reasons, a pulmonary nodule detection model based on rough set (RS) feature-level fusion and SVM is proposed in this paper. To overcome the first aforementioned disadvantage, shape, intensity, and texture features are extracted. For the shape features, three new 3-dimensional features, namely, External Spherical Volume (ESV), Surface-Center Distance Standard Deviation (SCDSTD), and External Rectangle Cross Line Distance (ERCLD), are proposed. For the intensity features, three new 3-dimensional features, namely, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD), are proposed. Regarding feature description, two-dimensional texture features and three-dimensional shape and intensity features are used for quantification. With regard to the second aforementioned disadvantage, rough set feature-level fusion is adopted since it can fully retain the properties of the features without prior knowledge. Finally, a grid optimization model is employed to optimize the kernel function of the support vector machine (SVM), which is used to recognize and detect pulmonary nodules. In order to verify the validity and stability of the model, four groups of comparative experiments are performed in this paper: model validation experiments before and after rough set reduction, model stability experiments before and after rough set reduction, validation experiments on the superiority of the rough set feature-level fusion model, and comparative experiments against other pulmonary nodule detection models. The experimental results show that the method proposed in this paper can improve, to a certain extent, the rationality of the feature structure and the compactness of the feature expression, thereby improving the detection accuracy of pulmonary nodules.
2. Related Theory
The quality of an ROI feature description is determined both by its comprehensiveness (the ROI cannot be "observed" from "multiple perspectives" if there are too few features) and by the accuracy of characterization (quantized values that diverge from the real information lead to low feature discrimination). A large amount of noisy information reduces the accuracy of ROI feature extraction and affects the final detection results. Therefore, for a comprehensive and accurate expression of the morphological structure and local features of the ROI, six new 3-dimensional features are proposed based on the analysis of ROIs in lung CT images. These new 3-dimensional features are used, in combination with other shape, intensity, and texture features, to qualitatively analyze and quantitatively characterize the lesions from 2-dimensional and 3-dimensional perspectives.
2.1. Pulmonary Nodule Features in CT Images
2.1.1. Shape Characteristics. Shape characteristics analyze the spatial distribution of gray values by computing local features at each point in the image. The shape feature is the most intuitive visual feature and can be used to describe the main medical signs of a pulmonary nodule ROI in a CT image, such as the nodule sign, lobulation sign, spinous process sign, vacuole sign, and spicule sign, from the perspectives of geometric shape, edge roughness, and topological structure. In this paper, the extracted components of the shape features mainly include perimeter, area, volume, roundness, rectangularity, elongation, Euler number, Harris, Hu moment, ESV, SCDSTD, and ERCLD. Some of these features are given below [22].
(1) Area

$$S = \sum_{x=1}^{N} \sum_{y=1}^{M} f(x, y), \tag{1}$$

where $f(x, y)$ denotes the pixels of the target and $M$ and $N$ are the length and width, respectively.
(2) Perimeter

$$C = \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j), \tag{2}$$

where $p(i, j)$ denotes the pixels of the target edge and $M$ and $N$ are the length and width, respectively.
(3) Circularity

$$R_0 = \frac{4\pi S}{C^2} \tag{3}$$

Circularity describes how close the shape of an object is to a circle, where $S$ is the area of the target region and $C$ is the circumference of the target region; $0 < R_0 \le 1$, with $R_0 = 1$ for an ideal circle. The value of $R_0$ reflects the complexity of the measured boundary: the more complex the shape, the smaller the value of $R_0$.
(4) Rectangularity

$$R = \frac{S}{H \times W}, \tag{4}$$

where $S$ is the area of the target region and $H$ and $W$ are the length and width, respectively.
(5) Elongation

$$E = \frac{\min(H, W)}{\max(H, W)} \tag{5}$$

Elongation can distinguish different image shapes (such as circle, square, ellipse, thin and long, and short and wide), where $H$ and $W$ are the length and width, respectively.
(6) Euler Number

$$E = C - H, \tag{6}$$

where $C$ is the number of connected parts and $H$ is the number of holes.
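As an illustration of (1), (2), (4), and (5), the following is a minimal sketch that computes area, perimeter, rectangularity, and elongation from a binary ROI mask. It assumes NumPy; the function name `shape_features_2d`, the use of the axis-aligned bounding box for $H$ and $W$, and the 4-neighbour definition of an edge pixel are assumptions made for this example and are not specified in the paper.

```python
import numpy as np

def shape_features_2d(mask):
    """Basic shape descriptors of a binary 2D ROI mask (True/1 = target pixel)."""
    mask = np.asarray(mask, dtype=bool)
    area = int(mask.sum())                       # Eq. (1): number of target pixels

    # Assumed edge definition: a target pixel with at least one
    # 4-neighbour outside the target counts as an edge pixel.
    padded = np.pad(mask, 1, constant_values=False)
    all_neighbours_inside = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                             padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~all_neighbours_inside).sum())   # Eq. (2)

    # Assumed H and W: side lengths of the axis-aligned bounding box.
    rows, cols = np.nonzero(mask)
    h = int(rows.max() - rows.min() + 1)
    w = int(cols.max() - cols.min() + 1)

    return {
        "area": area,
        "perimeter": perimeter,
        "rectangularity": area / (h * w),        # Eq. (4)
        "elongation": min(h, w) / max(h, w),     # Eq. (5)
    }

# Usage: a filled 4 x 6 rectangle inside an 8 x 8 image.
demo_mask = np.zeros((8, 8), dtype=bool)
demo_mask[2:6, 1:7] = True
demo_features = shape_features_2d(demo_mask)
```

For the filled rectangle in the usage lines, the rectangularity is 1 because the region fills its bounding box, and the elongation is the ratio of the shorter to the longer side.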
(7) External Spherical Volume (ESV). ESV is the ratio of the volume of each ROI $A_i$ (whose maximum diameter is $\dim(A_i)$) to the volume of its external sphere $VS(A_i)$, extracted from the three-dimensional CT image. It reflects the similarity between the region and a sphere, as shown in Figure 1(b):

$$\begin{aligned} \mathrm{Volume}(VS(A_i)) &= \frac{4}{3} \times \pi \times \left(\frac{\dim}{2}\right)^3, \\ E_1(A_i) &= \frac{\mathrm{Volume}(A_i)}{\mathrm{Volume}(VS(A_i))}. \end{aligned} \tag{7}$$
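A minimal sketch of (7), assuming NumPy and a 3D binary mask for the ROI, is given below. The function name, the `spacing` parameter, and the choice to take the maximum diameter as the largest distance between any two ROI voxel centres are assumptions for illustration; the mask must contain at least two voxels.

```python
import numpy as np

def external_spherical_volume_ratio(mask, spacing=(1.0, 1.0, 1.0)):
    """E_1 of Eq. (7): ROI volume divided by the volume of its external sphere."""
    mask = np.asarray(mask, dtype=bool)
    spacing = np.asarray(spacing, dtype=float)
    coords = np.argwhere(mask) * spacing         # voxel centres in physical units

    # Assumed dim(A_i): largest pairwise distance between ROI voxels.
    # This uses O(n^2) memory, which is acceptable for small ROIs only.
    diffs = coords[:, None, :] - coords[None, :, :]
    dim = np.sqrt((diffs ** 2).sum(axis=-1)).max()

    roi_volume = mask.sum() * spacing.prod()
    sphere_volume = 4.0 / 3.0 * np.pi * (dim / 2.0) ** 3
    return float(roi_volume / sphere_volume)
```

Under these assumptions a compact, ball-like ROI gives a ratio close to 1, while an elongated ROI such as a vessel segment gives a ratio close to 0.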
(8) Surface-Center Distance Standard Deviation (SCDSTD). SCDSTD is the standard deviation of the coordinate distance between each individual surface element $C(S_i)$ of an ROI and the regional center $C_{\mathrm{cen}}(A_i)$; its value also describes the similarity of the ROI to a sphere. If $E_2(A_i) = 0$, the ROI is a standard sphere. As the value of $E_2(A_i)$ increases, the deviation of the region from a sphere increases, as shown in Figure 1(c):

$$E_2(A_i) = \operatorname{std}\left(\frac{\left\| C(S_i) - C_{\mathrm{cen}}(A_i) \right\|}{\operatorname{mean}\left(\left\| C(S_i) - C_{\mathrm{cen}}(A_i) \right\|\right)}\right). \tag{8}$$
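The computation in (8) can be sketched as follows, assuming NumPy and a 3D binary mask. Treating a voxel as a surface element when at least one of its six face neighbours lies outside the ROI, and taking the regional centre as the mean voxel coordinate, are assumptions of this example.

```python
import numpy as np

def scdstd(mask):
    """E_2 of Eq. (8): std of the normalised surface-to-centre distances."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    # True where all six face neighbours of a voxel belong to the ROI.
    enclosed = (padded[:-2, 1:-1, 1:-1] & padded[2:, 1:-1, 1:-1] &
                padded[1:-1, :-2, 1:-1] & padded[1:-1, 2:, 1:-1] &
                padded[1:-1, 1:-1, :-2] & padded[1:-1, 1:-1, 2:])
    surface = np.argwhere(mask & ~enclosed)      # surface elements C(S_i)
    centre = np.argwhere(mask).mean(axis=0)      # regional centre C_cen(A_i)
    dist = np.linalg.norm(surface - centre, axis=1)
    return float(np.std(dist / dist.mean()))
```

On a voxel grid a digital ball does not reach exactly 0, but its value stays small compared with that of an elongated region.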
(9) External Rectangle Cross Line Distance (ERCLD). ERCLD is the distance from the center voxel $C_{\mathrm{cen}}(A_i)$ of the ROI to the centers of its 12 intersecting lines $\dim(L_i)$ ($i = 1, 2, \ldots, 12$), which may indicate whether the regional voxels are evenly distributed in the rectangular body, as shown in Figure 1(d):

$$E_3(A_i) = \frac{\left\| \operatorname{mean}\left(C_{\mathrm{cen}}(A_i) - C_{\mathrm{cen}}(\dim(L_i))\right) \right\|}{\operatorname{mean}\left(C_{\mathrm{cen}}(\dim(L_i))\right)}. \tag{9}$$

Figure 1: Three-dimensional character sketch. (a) Coordinate system (axes X, Y, Z). (b) ESV. (c) SCDSTD. (d) ERCLD.
2.1.2. Hu Moment Characteristics. Moments and the related invariants have been extensively analyzed to characterize patterns in images. The moment invariants are independent of position, size, and orientation and also of parallel projection. Hu [23] was the first to prove the central moment invariants. The central geometric moment invariants are derived from algebraic invariants and include six absolute orthogonal invariants and one skew orthogonal invariant. The moment invariants have been proved to be adequate measures for tracing image patterns under image translation, scaling, and rotation.

Hu moment invariants define seven values, computed by normalizing central moments through order three, which are invariant to object scale, position, and orientation, and a large number of papers have made significant contributions to the application of Hu moments. The two-dimensional moments of a digitally sampled $M \times N$ image with gray function $f(x, y)$ ($x = 1, 2, \ldots, M$; $y = 1, 2, \ldots, N$) are given as

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^p y^q f(x, y), \quad p, q = 0, 1, 2, 3, \ldots. \tag{10}$$
The central moments of $f(x, y)$, which are invariant to translation, are defined about the centroid $(\bar{x}, \bar{y})$ as

$$\begin{aligned} \bar{x} &= \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}, \\ \mu_{pq} &= \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \bar{x})^p (y - \bar{y})^q f(x, y), \quad p, q = 0, 1, 2, 3, \ldots. \end{aligned} \tag{11}$$
When a scaling normalization is applied, the central moments change as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1. \tag{12}$$
In terms of the normalized central moments, the seven moments are given as

$$\begin{aligned}
C_1 &= \eta_{20} + \eta_{02}, \\
C_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, \\
C_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2, \\
C_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2, \\
C_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right], \\
C_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}), \\
C_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
&\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right].
\end{aligned} \tag{13}$$
The seven Hu moment invariants vary widely in magnitude. To make them comparable, a logarithmic function is used to compress the data, and hence the actual moment invariant features are $C'_K$:

$$C'_K = \left| \log_{10} \left| C_K \right| \right|, \quad K = 1, 2, \ldots, 7. \tag{14}$$
The amended moment invariant features possess translation invariance, rotational invariance, and scale invariance.
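The chain from raw moments to log-compressed invariants in (10)–(14) can be sketched as below, assuming NumPy and a 2D gray or binary image. The function name and the small constant `eps`, added only to avoid taking the logarithm of an exactly zero invariant, are assumptions of this example and not part of the paper.

```python
import numpy as np

def hu_log_invariants(image, eps=1e-30):
    """Return (C_1..C_7, log-compressed features) following Eqs. (10)-(14)."""
    f = np.asarray(image, dtype=float)
    # 1-based pixel coordinates, matching x = 1..M and y = 1..N.
    x, y = np.mgrid[1:f.shape[0] + 1, 1:f.shape[1] + 1]
    m00 = f.sum()
    xb, yb = (x * f).sum() / m00, (y * f).sum() / m00       # centroid, Eq. (11)

    def eta(p, q):
        mu = ((x - xb) ** p * (y - yb) ** q * f).sum()      # central moment
        return mu / m00 ** ((p + q) / 2 + 1)                # Eq. (12); mu_00 = m_00

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c = np.array([                                          # Eq. (13)
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n12 - n30) * b * (3 * a ** 2 - b ** 2),
    ])
    # Eq. (14) with an assumed eps guard against log10(0).
    return c, np.abs(np.log10(np.abs(c) + eps))
```

A quick way to check the invariance claims is to compare the seven values for an image, a shifted copy, and a copy rotated by 90 degrees; they should agree up to floating-point error.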
2.1.3. Texture Characteristics. Tamura texture features, based on psychological research on human visual perception, were proposed by Tamura in 1978. The six components of the Tamura texture features correspond to six properties in psychology; three of them, namely, coarseness, contrast, and directionality, have good application value in texture synthesis, image recognition, and so on.
Texture is a gray-level distribution that appears repeatedly across spatial positions, so there are relationships between two pixels at some distance from each other in image space, called the gray spatial correlation properties of a gray image. The gray-level co-occurrence matrix (GLCM) is a common method that describes texture by studying these spatial correlations of gray levels.
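To make the GLCM idea concrete, the sketch below counts how often pairs of gray levels occur at a fixed pixel offset and derives three of the statistics listed in Table 1. It assumes NumPy and an image already quantized to integer levels in `[0, levels)`; the function names, the symmetric normalization, and the restriction to non-negative offsets are choices made for this example only.

```python
import numpy as np

def glcm(image, dx=1, dy=0, levels=8):
    """Symmetric, normalised co-occurrence matrix for offset (dy, dx), dx, dy >= 0."""
    img = np.asarray(image, dtype=int)
    rows, cols = img.shape
    # Pair every pixel (r, c) with its neighbour at (r + dy, c + dx).
    ref = img[:rows - dy, :cols - dx].ravel()
    nbr = img[dy:, dx:].ravel()
    counts = np.zeros((levels, levels))
    np.add.at(counts, (ref, nbr), 1)     # accumulate repeated index pairs
    counts += counts.T                   # count each pair in both directions
    return counts / counts.sum()

def glcm_stats(p):
    """Three GLCM statistics computed from a normalised matrix p."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    return {
        "angular_second_moment": float((p ** 2).sum()),
        "contrast": float(((i - j) ** 2 * p).sum()),   # also called moment of inertia
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }
```

A uniform region yields an angular second moment of 1 with zero contrast and entropy, whereas a two-level checkerboard at a horizontal offset of one pixel yields 0.5, 1, and 1, respectively.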
2.1.4. Intensity Features. Gray statistical features are a quantitative way to describe the basic features of a two-dimensional image region; from the three-dimensional perspective they are called intensity features [16]. In this paper, the extracted components of the intensity features include the mean intensity, intensity variance, maximum and minimum intensity difference, skewness, kurtosis, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD).
(1) Intensity Gradient (from Inside to Outside). For an ROI $A_i$ whose voxel $S_i$ volume is greater than 0, morphological erosion is performed repeatedly, and the ratio of the mean of the area excluded by each erosion to the mean of the previous operation (the initial value is 0) is calculated until the ratio is zero. Consider the following equation, where $n$ is the number of operations:

$$E_4(A_i) = \frac{K}{n}. \tag{15}$$
(2) Laplace Divergence Mean (LDM). From the results of convolving the Laplacian with the original CT image, it is found that the area surrounding a nodule, where gray value differences are smaller, has a significantly different divergence. Therefore, calculating the Laplace divergence helps to distinguish pulmonary nodules from interfering structures:

$$E_5(A_i) = \operatorname{mean}(A_i \times \mathrm{La}). \tag{16}$$
(3) Laplace Divergence Distance (LDD). The difference between the maximum and minimum values of the Laplace divergence is used to describe the range of regional divergence:

$$E_6(A_i) = \max(A_i \times \mathrm{La}) - \min(A_i \times \mathrm{La}). \tag{17}$$
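The two Laplace divergence features in (16) and (17) can be sketched as follows, assuming NumPy, a 3D intensity volume, and a binary ROI mask of the same shape. The paper does not specify the Laplacian kernel $\mathrm{La}$; the 6-neighbour discrete Laplacian with edge padding used here is an assumption, as is the function name.

```python
import numpy as np

def laplace_divergence_features(volume, mask):
    """Return (LDM, LDD) of Eqs. (16)-(17) for the voxels selected by mask."""
    v = np.pad(np.asarray(volume, dtype=float), 1, mode="edge")
    # Assumed kernel "La": 6-neighbour discrete Laplacian.
    lap = (v[:-2, 1:-1, 1:-1] + v[2:, 1:-1, 1:-1] +
           v[1:-1, :-2, 1:-1] + v[1:-1, 2:, 1:-1] +
           v[1:-1, 1:-1, :-2] + v[1:-1, 1:-1, 2:] -
           6.0 * v[1:-1, 1:-1, 1:-1])
    values = lap[np.asarray(mask, dtype=bool)]
    ldm = float(values.mean())                   # Eq. (16)
    ldd = float(values.max() - values.min())     # Eq. (17)
    return ldm, ldd
```

For a volume whose intensity varies linearly, both values are zero inside the volume; an isolated bright voxel produces a large spread between the minimum and maximum response.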
Table 1 shows the feature set of 42 features based on the above feature description of the ROI. To facilitate subsequent tests, features are numbered in the order shown in Table 1; that is, the shape features are numbered fs1–fs18, the intensity features are numbered fi1–fi8, and the texture features are numbered ft1–ft16.
2.2. Rough Set and Attribute Reduction. Rough set theory (RST), proposed by Pawlak in 1982, is one of the effective mathematical tools for processing fuzzy and uncertain knowledge. Nowadays, RST has been applied to a variety of fields such as artificial intelligence, data mining, pattern recognition, and knowledge discovery. Rough set theory is founded on the assumption that some knowledge is associated with every object of the universe of discourse. Objects characterized by the same information are similar in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of some elementary sets is referred to as a crisp set; otherwise the set is a rough set.
Definition 1. An information system $S$ is a quadruple $S = (U, A, V, f)$, where $U$ is a nonempty and finite set of objects, $A$ is a nonempty and finite set of attributes, $V := \bigcup_{a \in A} V_a$ with $V_a$ being the domain of attribute $a$, and $f$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.

Table 1: ROI feature set.

| Feature type | Feature vectors | Dimensionality |
|---|---|---|
| Shape features (fs) | Perimeter, area, volume, roundness, rectangularity, length, Euler's number, ESV, SCDSTD, ERCLD, Hu moment | 18 |
| Intensity features (fi) | Mean intensity, intensity standard variance, maximum-minimum intensity difference value of variance, skewness, kurtosis, intensity gradient (from inside to outside), LDM, LDD | 8 |
| Texture features (ft) | Tamura texture features (contrast, direction, roughness); GLCM (angular second moment, moment of inertia, torque deficit, sum mean, variance, sum variance, difference variance, entropy, sum entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient) | 16 |
For a subset $P$ of $A$, let us define the corresponding equivalence relation as

$$\mathrm{IND}(P) = \{(x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P\} \tag{18}$$

and denote the equivalence class of $\mathrm{IND}(P)$ which contains the object $x \in U$ by $[x]_P$; that is,

$$[x]_P = \{y \in U \mid (x, y) \in \mathrm{IND}(P)\}. \tag{19}$$

The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$; that is, $U/P = \{[x]_P \mid x \in U\}$.
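The factor set $U/P$ defined by (18) and (19) can be computed by grouping objects that agree on every attribute in $P$. The following is a minimal sketch using only the Python standard library; the dictionary-based representation of the information system, the function name `partition`, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/P: equivalence classes of IND(P) for the attribute subset attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        # Objects with identical values on all attributes in attrs are indiscernible.
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

# Hypothetical toy information system: four objects, three discretised attributes.
toy_table = {
    1: {"a": 0, "b": 0, "c": 0},
    2: {"a": 0, "b": 1, "c": 0},
    3: {"a": 1, "b": 0, "c": 1},
    4: {"a": 1, "b": 1, "c": 1},
}
classes_a = partition(toy_table, ["a"])        # objects 1, 2 and objects 3, 4
classes_ab = partition(toy_table, ["a", "b"])  # every object is its own class
```

Adding attributes to $P$ can only split equivalence classes further, which is why a larger attribute set never reduces the ability to discern objects.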
As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, based on a specialization of the notion of independence due to Marczewski. Up to now, much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems. For example, there are reduction approaches based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Definition 2. Let $S = (U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) = \{x \in U \mid [x]_P \subseteq X\}$ and $\overline{R}_P(X) = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$ are called the $P$-lower and $P$-upper approximations of $X$, respectively.
Figure 2: Optimal hyperplane (separating hyperplane $H$ with margin hyperplanes $H_1$ and $H_2$ at distance $r$).
Definition 3. Let $S = (U, A, V, f)$ be an information system and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the $P$-positive region of $Q$, where $\underline{R}_P(X)$ is the $P$-lower approximation of $X$.
Definition 4. Let $S = (U, C \cup D, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C - \{a\}}(D)$, $a$ is said to be $D$-dispensable in $C$; otherwise, $a$ is said to be $D$-indispensable in $C$. The set of all the $D$-indispensable attributes is called the core of $S$ and is denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each of the attributes of $P$ is $D$-indispensable, then $P$ is called a reduct of $S$.
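Definitions 3 and 4 can be illustrated with a small positive-region computation. The sketch below uses only the Python standard library and finds the core and all reducts of a toy decision table by exhaustive search over attribute subsets, which is feasible only for a handful of attributes; it is not the reduction algorithm used in the paper, and the table, attribute names, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def partition(table, attrs):
    """Equivalence classes of the objects with respect to attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def positive_region(table, cond, dec):
    """Union of the condition classes that fit inside a single decision class."""
    dec_blocks = partition(table, dec)
    pos = set()
    for block in partition(table, cond):
        if any(block <= d for d in dec_blocks):
            pos |= block
    return pos

def core_and_reducts(table, cond, dec):
    full = positive_region(table, cond, dec)
    # Core: attributes whose removal changes the positive region.
    core = [a for a in cond
            if positive_region(table, [c for c in cond if c != a], dec) != full]
    # Reducts: minimal subsets preserving the positive region (brute force).
    reducts = []
    for k in range(1, len(cond) + 1):
        for subset in combinations(cond, k):
            keeps = positive_region(table, subset, dec) == full
            if keeps and not any(set(r) <= set(subset) for r in reducts):
                reducts.append(subset)
    return core, reducts

# Hypothetical decision table: attribute c duplicates a, and d is the decision.
decision_table = {
    1: {"a": 0, "b": 0, "c": 0, "d": 0},
    2: {"a": 0, "b": 1, "c": 0, "d": 1},
    3: {"a": 1, "b": 0, "c": 1, "d": 1},
    4: {"a": 1, "b": 1, "c": 1, "d": 1},
}
toy_core, toy_reducts = core_and_reducts(decision_table, ["a", "b", "c"], ["d"])
```

In this toy table the search returns two different reducts that share the single core attribute, which illustrates why a rough set reduction is generally not unique.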
2.3. SVM and Its Optimization. SVM is a pattern recognition method developed from statistical learning theory and based on the principle of structural risk minimization. While ensuring classification accuracy, SVM can improve the generalization ability of the learning machine by maximizing the classification interval. The biggest advantage of SVM is that it overcomes the problems of overlearning and high dimensionality, both of which lead to computational complexity and local extremum problems. A reliable classification model based on SVM is urgently needed for the study of hospitalization expenses of patients with gastric cancer.
For linearly separable data (Figure 2), assume a data set $S = \{x_1, \ldots, x_n\}$ with data labels $G = \{y_1, \ldots, y_n\}$, where $x_i$ is the input space vector of a data sample and $y_i$ records the category of the sample. The aim of SVM is to find an optimal hyperplane $H$ that separates the two classes of samples with the largest interval. The optimal hyperplane $H$ is expressed as

$$w^T x + b = 0, \tag{20}$$

where $w$ is the weight vector and $b$ is the threshold. This problem is transformed into an optimization problem in $w$ and $b$:

$$\begin{aligned} \min_{w, b} \quad & r(w) = \frac{1}{2} \|w\|^2 \\ \text{s.t.} \quad & y_i \left((w \cdot x_i) + b\right) \ge 1, \quad i = 1, \ldots, n. \end{aligned} \tag{21}$$
In order to simplify the formula, the Lagrange dual is introduced subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \\ \text{s.t.} \quad & \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n. \end{aligned} \tag{22}$$
For linearly inseparable data, the penalty parameter $C$ and the relaxation variable $\xi$ are introduced into the constraint condition; thus the generalization ability of SVM is increased, and the function is transformed into

$$\begin{aligned} \min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \\ \text{s.t.} \quad & \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \end{aligned} \tag{23}$$
where $C$ is a manually set parameter. In practice, the larger $C$ is, the more heavily misclassified training samples are penalized and the narrower the separation interval tends to become, which increases the risk of poor generalization.
The final classification function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^*\right). \tag{24}$$
For nonlinearly separable data, SVM transforms them into linearly separable data in a high-dimensional space via the nonlinear mapping of a kernel function, and the optimal hyperplane is found in the high-dimensional space. A kernel function that satisfies the Mercer condition corresponds to the inner product in a transformed space and is used to realize the nonlinear transformation that precedes linear classification.
The corresponding kernel function is defined as

$$K(x_i, x) = \left(\varphi(x_i) \cdot \varphi(x)\right). \tag{25}$$
At this point, the final classification function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right). \tag{26}$$
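As a worked illustration of (25) and (26), the sketch below evaluates the kernel decision function for given support vectors, multipliers, and threshold. It assumes NumPy and a radial basis function kernel $K(x_i, x) = \exp(-g \|x_i - x\|^2)$; the paper only refers to a kernel parameter $g$, so the specific kernel, the function names, and the toy values are assumptions of this example.

```python
import numpy as np

def rbf_kernel(a, b, g):
    """K(a_i, b_j) = exp(-g * ||a_i - b_j||^2) for all row pairs of a and b."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-g * np.maximum(sq, 0.0))      # clip tiny negative round-off

def decide(x_new, support_x, support_y, alpha, b, g):
    """Eq. (26): sign of the kernel expansion over the support vectors."""
    k = rbf_kernel(np.atleast_2d(x_new), support_x, g)
    return np.sign(k @ (alpha * support_y) + b)

# Hypothetical trained model with two support vectors, one per class.
support_x = np.array([[0.0, 0.0], [2.0, 2.0]])
support_y = np.array([-1.0, 1.0])
alpha = np.array([1.0, 1.0])
label_near_first = decide(np.array([0.1, 0.0]), support_x, support_y, alpha, 0.0, 0.5)
label_near_second = decide(np.array([2.0, 1.9]), support_x, support_y, alpha, 0.0, 0.5)
```

Each query point receives the label of the support vector that dominates the kernel sum, which here is the nearer one.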
The penalty factor $C$ and the parameter $g$ of the kernel function play an extremely important role in the performance of SVM classification. In order to obtain the optimal classification results, a grid optimization model is used for optimization in this paper. In the grid optimization model, the parameters to be searched are expressed as grid points in a certain space, and the optimal parameters are selected by traversing all the grid points. Therefore, the grid optimization model has the advantages of simplicity, convenience, good stability, and ease of obtaining the global optimal solution [24]. In the learning process of SVM, 10-fold cross-validation is used to find the kernel function parameter and penalty coefficient with the optimal classification performance, which are then applied to the SVM classifier for recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the detection results of the relevant experiments.
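The grid traversal with 10-fold cross-validation described above can be sketched as follows. The example assumes NumPy and, to stay self-contained, replaces the LibSVM classifier with a simple least-squares kernel classifier as a stand-in; the power-of-two grids, the synthetic two-class data, and all function names are assumptions of this example rather than the settings used in the paper.

```python
import numpy as np

def rbf_kernel(a, b, g):
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-g * np.maximum(sq, 0.0))

def fit_predict(x_tr, y_tr, x_te, C, g):
    """Stand-in classifier (least-squares kernel machine), not LibSVM's C-SVC."""
    k = rbf_kernel(x_tr, x_tr, g)
    alpha = np.linalg.solve(k + np.eye(len(x_tr)) / C, y_tr.astype(float))
    return np.sign(rbf_kernel(x_te, x_tr, g) @ alpha)

def grid_search(x, y, c_grid, g_grid, folds=10, seed=0):
    """Traverse every (C, g) grid point and keep the best cross-validated accuracy."""
    order = np.random.default_rng(seed).permutation(len(x))
    parts = np.array_split(order, folds)
    best = (-1.0, None, None)
    for C in c_grid:
        for g in g_grid:
            correct = 0
            for i in range(folds):
                test = parts[i]
                train = np.concatenate([parts[j] for j in range(folds) if j != i])
                pred = fit_predict(x[train], y[train], x[test], C, g)
                correct += int((pred == y[test]).sum())
            accuracy = correct / len(x)
            if accuracy > best[0]:
                best = (accuracy, C, g)
    return best

# Synthetic data: 70 samples per class, mimicking the sample counts only.
rng = np.random.default_rng(1)
x_demo = np.vstack([rng.normal(-2, 1, (70, 2)), rng.normal(2, 1, (70, 2))])
y_demo = np.array([-1] * 70 + [1] * 70)
c_grid = [2.0 ** e for e in range(-5, 6, 2)]
g_grid = [2.0 ** e for e in range(-5, 6, 2)]
best_acc, best_c, best_g = grid_search(x_demo, y_demo, c_grid, g_grid)
```

The cost of the traversal grows with the product of the two grid sizes and the number of folds, so coarse grids are a reasonable starting point before refining around the best grid point.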
3. Pulmonary Nodule Detection Model
In this paper, CT images of 70 patients with pulmonary nodules are used. The images are first segmented [7] into three different types of pulmonary nodules (solitary pulmonary nodules, SPN; vascular adhesion pulmonary nodules, VAPN; and pleural adhesion pulmonary nodules, PAPN), which are marked by radiologists, as well as a large number of nonnodular areas including blood vessels, bones, and alveoli. Forty-two feature components characterizing the ROI are extracted from the 2-dimensional and 3-dimensional perspectives, including the six new 3-dimensional features proposed in this paper. They are composed of 18 shape features, 8 intensity features, and 16 texture features. The extracted feature set (identified as FS) is discretized and normalized. Feature-level fusion of the processed feature data is performed five times using the rough set model (since the reduct of a rough set feature subset is not unique, in this paper the extracted feature set is reduced five times, and the results are identified as RS1, RS2, RS3, RS4, and RS5). Feature subset RS1 is used for the comparative experiments. Finally, the SVM parameters are optimized using the grid optimization model, and the improved SVM is used in the following four sets of comparative experiments: comparative analysis of the effectiveness of classification before and after rough set reduction of features; comparative analysis of the stability of classification before and after rough set reduction of features; comparative analysis of the recognition performance before and after feature-level fusion based on rough set or PCA; and comparative analysis of the recognition performance of our proposed method and other methods. Based on the above, a flow chart of the pulmonary nodule detection model is presented in Figure 3.
4 Results and Discussion
4.1. Experimental Environments. In this paper, the hardware and software environments are as follows.
Software Environments. Windows 7 OS, Matlab R2014b, ImageJ 1.48u, and LibSVM.
Hardware Environments. Intel Core i5-4670 at 3.4 GHz, 8.0 GB of memory, and a 500 GB hard disk.
Experimental Data. CT images of 70 patients with pulmonary nodules are collected as experimental samples, which are marked by radiologists, with an image size of 512 × 512 and a slice thickness of 2 mm. They comprise 2232 CT images from 38 patients with solitary pulmonary nodules (SPN), 17 patients with vascular adhesion pulmonary nodules (VAPN), and 15 patients with pleural adhesion pulmonary nodules (PAPN).
8 BioMed Research International
Figure 3: Flow chart of the pulmonary nodule detection model. CT images pass through image segmentation into ROIs (pulmonary nodules: SPN, VAPN, PAPN; other areas: blood vessels, bone, alveoli), followed by feature extraction (shape features, intensity features, texture features), discretization and normalization of the feature set, rough set feature reduction (RS1, RS2, RS3, RS4, RS5), and SVM classification with the grid optimization method, which yields the classification result. Shape features: perimeter, area, volume, roundness, rectangularity, elongation, Euler number, ESV, SCDSTD, Hu moment, ERCLD. Intensity features: mean intensity, intensity standard variance, maximum-minimum intensity difference, variance, skewness, kurtosis, intensity gradient, Laplace divergence. Texture features: Tamura texture features and GLCM features.
Figure 4 shows representative examples of each type of pulmonary nodule and the corresponding segmentation results.
In this paper, 42-dimensional features of 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas are extracted. Table 2 shows the 42-dimensional feature values of the lung nodular and nonnodular areas; shape features are denoted fs, intensity features fi, and texture features ft. To give an intuitive view of the distribution of different feature values and of their discriminative power, the external spherical volume (ESV) ratio and the standard deviation of the surface-center distance (SCDSTD) are calculated and plotted as box plots, as shown in Figure 5.
4.2. Feature-Level Fusion Based on Rough Set. To prevent attributes with a small range of values from being dominated by attributes with a large range of values, and to reduce the complexity of the statistical computation, the extracted feature sets are first preprocessed by normalization, which linearly maps the data to [0, 1]. The preprocessed feature data are then fused five times using the rough set model. The fusion results are shown in Table 3.
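The linear mapping to [0, 1] can be sketched as a per-feature min-max normalization. The function name and the handling of constant columns (mapped to 0) are assumptions for this example, not details given in the paper.

```python
def min_max_normalize(samples):
    """Linearly map every feature column of `samples` to the range [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in samples:
        normalized.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0  # constant column
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Two features on very different scales end up on the same [0, 1] scale.
features = [[2.0, 100.0], [4.0, 300.0], [6.0, 200.0]]
scaled = min_max_normalize(features)
```

After scaling, the second feature no longer dominates distance or kernel computations merely because its raw values are larger.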
Figure 4: Pulmonary nodule segmentation results. (a) Original image of SPN. (b) Segmentation result of SPN. (c) Original image of VAPN. (d) Segmentation result of VAPN. (e) Original image of PAPN. (f) Segmentation result of PAPN.
Table 2 Feature values of pulmonary nodular areas and nonnodular areas
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
Figure 5: Box plots of (a) ESV and (b) SCDSTD for nodular areas and nonnodular areas. "+" refers to the upper and lower bounds of the ESV and SCDSTD values.
Table 3: Feature reduction based on rough sets.

| Feature subset | Reduction results | Dimensionality |
|---|---|---|
| RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21 |
| RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17 |
| RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20 |
4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from the first reduction experiment, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
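The three classification indexes used in this experiment follow directly from the confusion counts. The sketch below uses the standard definitions; the function name and the label convention (1 for nodule, 0 for nonnodule) are assumptions for the example, and both classes are assumed to be present in the test fold.

```python
def detection_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels,
    where 1 marks a nodule and 0 marks a nonnodular area."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # nodules found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonnodules rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # misdiagnoses
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed diagnoses
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, sensitivity, specificity


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 1, 1]
accuracy, sensitivity, specificity = detection_metrics(y_true, y_pred)
```

With these definitions, the missed diagnosis rate is 1 minus the sensitivity and the misdiagnosis rate is 1 minus the specificity.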
The experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion (from a mean of 94.64% to 99.17%), with a lower missed diagnosis rate, reflected by the increased sensitivity, and a lower misdiagnosis rate, reflected by the increased specificity. The processing time is also shorter after reduction. These results indicate that feature-level fusion of the extracted 42-dimensional feature set based on the rough set model is effective: it not only improves the compactness of the feature set (by eliminating redundant feature components and those with a low degree of differentiation) but also corrects the abnormal data of the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
4.3.2. The Model Stability Experiment. The feature data of pulmonary nodules are tested with RS1 (70 × 21) as the data set for classification in five rounds, with ratios of training set to testing set of 50:20, 40:30, 35:35, 30:40, and 20:50. Each round of testing uses randomly selected training and testing sets at the given ratio, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction declines only slightly (from 99.71% to 98.25%), whereas that of the feature set before rough set reduction fluctuates to a certain extent (see Figure 6). These results indicate that the feature-level fusion model based on rough set has higher classification stability and is less susceptible to interference from the sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
4.3.3. The Superiority of Feature-Level Fusion Model Based on Rough Set. Since PCA is a well-developed model characterized by simple calculation and easy programming, it has become the preferred dimension reduction method for most feature-level fusion models. To compare the two types of feature-level fusion, PCA-based feature-level fusion of the extracted feature sets is also performed in this paper, and the tenfold cross-validation results are shown in Table 8. Figure 7 shows the classification performance of the two types of feature-level fusion methods
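Table 4: Statistics of effectiveness before and after rough set reduction.

| Group | Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | Processing time (s) |
|---|---|---|---|---|---|
| Before reduction | 1 | 96.42 | 92.86 | 100 | 1.0610 |
| | 2 | 91.96 | 83.93 | 100 | 0.6170 |
| | 3 | 95.54 | 100 | 91.07 | 0.5490 |
| | 4 | 89.28 | 100 | 78.57 | 0.5630 |
| | 5 | 95.54 | 91.07 | 100 | 0.5470 |
| | 6 | 98.21 | 96.43 | 100 | 0.5460 |
| | 7 | 94.64 | 89.29 | 100 | 0.5460 |
| | 8 | 95.53 | 91.07 | 100 | 0.5460 |
| | 9 | 91.96 | 83.93 | 100 | 0.5460 |
| | 10 | 97.32 | 100 | 96.64 | 0.5300 |
| | Mean | 94.64 | 92.86 | 96.43 | 0.6051 |
| After reduction (RS1) | 1 | 100 | 100 | 100 | 0.9370 |
| | 2 | 100 | 100 | 100 | 0.4360 |
| | 3 | 100 | 100 | 100 | 0.3870 |
| | 4 | 100 | 100 | 100 | 0.4210 |
| | 5 | 100 | 100 | 100 | 0.4210 |
| | 6 | 100 | 100 | 100 | 0.3900 |
| | 7 | 100 | 100 | 100 | 0.4060 |
| | 8 | 91.67 | 100 | 83.33 | 0.4060 |
| | 9 | 100 | 100 | 100 | 0.3740 |
| | 10 | 100 | 100 | 100 | 0.3930 |
| | Mean | 99.17 | 100 | 98.33 | 0.4571 |
| Increase after reduction | | 4.53 | 7.14 | 1.9 | 0.148 |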
Table 5: Effectiveness of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Processing time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.4571 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.4650 |
| RS3 | 99.17 | 100 | 98.33 | 0.4656 |
| RS4 | 100 | 100 | 100 | 0.4731 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.4850 |
| Mean | 98.83 | 99 | 98.66 | 0.4672 |
(feature subset RS1 from Table 3 is used, and the reported running time is 100 times the actual time).
The experimental results show that the performance indicators of the feature-level fusion model based on rough set are better than those of the model based on PCA, indicating that the rough set is more suitable than PCA for eliminating redundant information.
4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the best reported detection accuracy is used for each method). The results are shown in Table 9 (Pr: private database; L: LIDC).

Figure 6: Comparative results of feature subsets before and after rough set reduction, plotted for training set to testing set ratios of 50:20, 40:30, 35:35, 30:40, and 20:50 (series: before reduction, after reduction).
The experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent, indicating that this method not only improves the comprehensiveness and accuracy of the feature description of the ROI by supplementing and improving the feature components but also improves the compactness of the feature set by applying feature-level fusion based on rough set to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.

Table 6: Stability statistics of rough set reduction subsets.

| Group | Training set : testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s) |
|---|---|---|---|---|---|
| Before fusion | 50:20 | 97.35 | 94.71 | 100 | 0.4873 |
| | 40:30 | 96.53 | 93.08 | 98.32 | 0.3846 |
| | 35:35 | 95.83 | 92.39 | 97.79 | 0.4254 |
| | 30:40 | 96.16 | 95.58 | 96.74 | 0.3560 |
| | 20:50 | 94.88 | 94.63 | 95.86 | 0.4236 |
| | Mean | 96.15 | 94.08 | 97.742 | 0.4154 |
| After fusion (RS1) | 50:20 | 99.71 | 99.41 | 100 | 0.2684 |
| | 40:30 | 98.96 | 99.58 | 98.46 | 0.2568 |
| | 35:35 | 98.65 | 99.23 | 98.08 | 0.2382 |
| | 30:40 | 98.37 | 98.60 | 98.14 | 0.2646 |
| | 20:50 | 98.25 | 97.67 | 98.84 | 0.2636 |
| | Mean | 98.79 | 98.84 | 98.70 | 0.2583 |

Table 7: Classification performance of rough set reduction subset.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.2583 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.2870 |
| RS3 | 99.17 | 100 | 98.33 | 0.2560 |
| RS4 | 100 | 100 | 100 | 0.2531 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.2656 |
| Mean | 98.834 | 99 | 98.66 | 0.2620 |

Table 8: Classification performance of feature reduction based on PCA.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s) |
|---|---|---|---|---|
| 1 | 91.67 | 83.33 | 100 | 0.9970 |
| 2 | 96.74 | 93.48 | 100 | 0.4830 |
| 3 | 96.74 | 93.48 | 100 | 0.4880 |
| 4 | 98.91 | 100 | 97.83 | 0.4950 |
| 5 | 93.48 | 86.96 | 100 | 0.4950 |
| 6 | 96.74 | 100 | 93.48 | 0.5140 |
| 7 | 96.74 | 100 | 93.48 | 0.5120 |
| 8 | 94.57 | 89.13 | 100 | 0.4890 |
| 9 | 97.83 | 95.65 | 100 | 0.4990 |
| 10 | 95.65 | 93.48 | 97.83 | 0.5180 |
| Mean | 95.91 | 93.55 | 98.26 | 0.5490 |
Table 9: Comparison of the performance of different lung nodule detection methods.

| Author | Database | Nodule numbers | Accuracy (%) | FPs |
|---|---|---|---|---|
| Santos et al. [11] | L | 260 | 88.4 | 1.17 |
| Magalhaes Barros Netto et al. [12] | L | 48 | 90.65 | 0.138 |
| Ye et al. [13] | Pr | 220 | 90.2 | 8.2 |
| Tan et al. [14] | L | 172 | 87.5 | 4 |
| Cascio et al. [16] | L | 148 | 97 | 6.1 |
| Our method | Pr | 70 | 99.17 | 0.47 |
Figure 7: Comparison of two feature-level fusion models (rough set and PCA) in terms of accuracy, sensitivity, specificity, and running time.
5 Conclusions
In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model based on rough set feature-level fusion is proposed. To address insufficient feature description and inaccurate characterization in the process of feature extraction, six new 3D features are proposed and combined with other 2D and 3D features to extract and quantify the feature information of the ROI. Rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets, since the extracted high-dimensional features contain redundant information. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the resulting SVM is used as the classifier for the detection and recognition of pulmonary nodules. Finally, the pulmonary nodule detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed pulmonary nodule detection method based on rough set feature-level fusion is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules, and is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (Grant No. 61561040), the Natural Science Foundation of Ningxia (Grant No. NZ16067), and the Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Luchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, "Trends in incidence of small cell lung cancer and all lung cancer," Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., "Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012," International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, "Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms," Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, "3-D segmentation algorithm of small lung nodules in spiral CT images," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., "Effect of CAD on radiologists' detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size," Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, "Automated identification of dementia using FDG-PET imaging," BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, "Brain MRI image segmentation based on learning local variational Gaussian mixture models," Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, "Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains," Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, "Neutro-connectedness cut," IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., "Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM," Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhaes Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, "Automatic segmentation of lung nodules with growing neural gas and support vector machine," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, "Shape-based computer-aided detection of lung nodules in thoracic CT images," IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, "A novel computer-aided lung nodule detection system for CT images," Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, "Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier," Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, "Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, "A survey on rough set theory and applications," Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, "A novel neighborhood rough set based classification approach for medical diagnosis," Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, "Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis," Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., "Computer-aided diagnosis of the solitary pulmonary nodule," Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, "Fuzzy rough sets hybrid scheme for breast cancer detection," Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, "A method of effective text extraction for complex video scene," Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, "Review and performance comparison of SVM- and ELM-based classifiers," Neurocomputing, vol. 128, pp. 507–516, 2014.
[9, 10] and Santos et al. [11] segment the lung parenchyma based on a region growing algorithm. Magalhaes Barros Netto et al. [12] use growing neural gas (GNG) to segment the lung parenchyma; the obtained pulmonary nodules are then separated from tissues containing blood vessels and bronchi according to the 3D distance transform, and finally SVM is used to identify pulmonary nodules with shape and texture features. Ye et al. [13] first segment and extract the region of interest (ROI) with a fuzzy threshold in combination with the Gaussian matrix, mean curvature, and Hessian matrix, then choose the local shape information and local intensity dispersion as the feature expression of the ROI, and finally use the weighted SVM for recognition of pulmonary nodules. Tan et al. [14] segment pulmonary nodules based on the blood vessel and nodule enhancement filter proposed by Li et al. [15], then locate the clustering center of pulmonary nodules based on the divergence calculated with a Gaussian template to achieve ROI extraction, and finally use classifiers based on a genetic model, an artificial neural network (ANN), and SVM for comparative analysis of the detection effectiveness of pulmonary nodules. Cascio et al. [16] first use a region growing model and morphological operations to extract the ROI, then reconstruct a B-spline surface based on a 3D spring model in order to extract the related 3D gray features and shape features, and detect the pulmonary nodules using ANN. Although the above literature explores methods of detecting pulmonary nodules, these methods still have two disadvantages in feature structure design and feature set expression, as follows:
(1) When extracting and quantifying features for the ROI, the feature structure design is irrational, as reflected by the fact that the combination of global features and local features and the combination of two-dimensional and three-dimensional features are not fully considered.
(2) When fusing feature data, the compactness of feature expression is a difficult problem, so feature redundancy is usually not eliminated. Moreover, feature-level fusion methods that require no prior knowledge are rarely used.
Rough set theory was developed by Zdzislaw Pawlak in the early 1980s and can be regarded as a mathematical tool for feature selection, feature extraction, and decision rule generation without prior knowledge. Rough sets provide a mechanism to find the minimal set of attributes required to classify the training samples. This minimal set of attributes is called a reduct and contains the same knowledge as the original set of attributes in a given information system. Therefore, reducts can be used to obtain different classifiers. Wang et al. [17] present a framework for a systematic study of rough set theory. Various views and interpretations of the theory and different approaches to studying it are discussed, and the relationships between rough sets and other theories, such as fuzzy sets, evidence theory, granular computing, formal concept analysis, and knowledge spaces, are examined. The cost of disease prediction and diagnosis can be reduced by applying machine learning and data mining methods, and disease prediction and decision-making play a significant role in medical diagnosis. Udhaya Kumar and Hannah Inbarani [18] put forward a novel neighborhood rough set classification approach to deal with medical datasets. The experimental results of the proposed classification algorithm are compared with those of other existing approaches, such as rough set, $K$th-nearest neighbor, support vector machine, BP neural network, and multilayer perceptron, to conclude that the proposed approach is a cheaper way for disease prediction and decision-making. Feature selection (FS) is a solution that involves finding a subset of prominent features to improve predictive accuracy and to remove redundant features. Thus, the learning model receives a concise structure without forfeiting predictive accuracy, since it is built using only the selected prominent features. FS is therefore an essential part of knowledge discovery. Inbarani et al. [19] proposed new supervised feature selection methods based on hybridization of Particle Swarm Optimization (PSO), namely, PSO based Relative Reduct (PSO-RR) and PSO based Quick Reduct (PSO-QR), for disease diagnosis. Shah et al. [20] investigated the utility of computer-aided diagnosis in the task of differentiating malignant nodules from benign nodules based on single thin-section CT image data. In their study, CT images of solitary pulmonary nodules were contoured manually on a single representative slice by a thoracic radiologist. Two separate contours were created for each nodule, one including only the solid portion of the nodule and one including any ground-glass components. For each contour, 75 features were calculated that measured the attenuation, shape, and texture of the nodule. These features were then input into a feature selection step and four different classifiers to determine whether the diagnosis could be predicted from the feature vector. Hassanien [21] discusses a hybrid scheme that combines the advantages of fuzzy sets and rough sets in conjunction with statistical feature extraction techniques. An application to breast cancer imaging was chosen, and the hybridization scheme was applied to assess its ability and accuracy in classifying breast cancer images into two outcomes: cancer or noncancer.
For these reasons, a pulmonary nodule detection model based on rough set (RS) feature-level fusion and SVM is proposed in this paper. To overcome the first disadvantage, shape, intensity, and texture features are extracted. For the shape features, three new 3-dimensional features are proposed, namely, External Spherical Volume (ESV), Surface-Center Distance Standard Deviation (SCDSTD), and External Rectangle Cross Line Distance (ERCLD). For the intensity features, three new 3-dimensional features are proposed, namely, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD). For feature description, two-dimensional texture features and three-dimensional shape and intensity features are used for quantification. To address the second disadvantage, rough set feature-level fusion is adopted, since it can fully retain the properties of the features without prior knowledge. Finally, a grid optimization model is employed to optimize the kernel function of the support vector machine (SVM), which is used for the recognition and detection of pulmonary nodules. To verify the validity and stability of the model, four groups of comparative experiments are performed in this paper: model validation experiments before and after rough set reduction, model stability experiments before and after rough set reduction, validation experiments on the superiority of the rough set feature-level fusion model, and comparative experiments against other pulmonary nodule detection models. The experimental results show that the method proposed in this paper improves, to a certain extent, the rationality of the feature structure and the compactness of the feature expression, thereby improving the detection accuracy of pulmonary nodules.
2 Related Theory
The quality of an ROI feature description is determined by both its comprehensiveness (the ROI cannot be observed from multiple perspectives if there are too few features) and the accuracy of its characterization (quantized values that diverge from the real information lead to low feature discrimination). A large amount of noisy information reduces the accuracy of ROI feature extraction and affects the final detection results. Therefore, for a comprehensive and accurate expression of the morphological structure and local features of the ROI, six new 3-dimensional features are proposed based on the analysis of ROIs in lung CT images. These new 3-dimensional features are used, in combination with other shape, intensity, and texture features, to qualitatively analyze and quantitatively characterize the lesions from 2-dimensional and 3-dimensional perspectives.
2.1. Pulmonary Nodule Features in CT Images
2.1.1. Shape Characteristics. Shape characteristics analyze the spatial distribution of gray values by computing local features at each point in the image. Shape is the most intuitive visual feature and can be used to describe the main medical signs of the pulmonary nodule ROI in CT images, such as the nodule sign, lobulation sign, spinous process sign, vacuole sign, and spicule sign, from the perspectives of geometric shape, edge roughness, and topological structure. In this paper, the extracted shape feature components mainly include perimeter, area, volume, roundness, rectangularity, elongation, Euler number, Harris, Hu moment, ESV, SCDSTD, and ERCLD. Some of these features are defined as follows [22].
(1) Area.

$$S = \sum_{x=1}^{N} \sum_{y=1}^{M} f(x, y), \tag{1}$$

where $f(x, y)$ denotes the pixels of the target and $M$ and $N$ are the length and width, respectively.
(2) Perimeter.

$$C = \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j), \tag{2}$$

where $p(i, j)$ denotes the pixels of the target edge and $M$ and $N$ are the length and width, respectively.
(3) Circularity.

$$R_0 = \frac{C^2}{4\pi S}. \tag{3}$$

Circularity describes how close the object shape is to a circle, where $S$ is the area of the target region and $C$ is the circumference of the target region. $0 < R_0 < 1$, and the value of $R_0$ reflects the complexity of the measured boundary: the more complex the shape, the smaller the value of $R_0$.
(4) Rectangularity.

$$R = \frac{S}{H \times W}, \tag{4}$$

where $S$ is the area of the target region and $H$ and $W$ are the length and width, respectively.
(5) Elongation.

$$E = \frac{\min(H, W)}{\max(H, W)}. \tag{5}$$

Elongation can distinguish different shapes in the images (such as circle, square, ellipse, thin and long, and short and wide), where $H$ and $W$ are the length and width, respectively.
(6) Euler Number.

$$E = C - H, \tag{6}$$

where $C$ is the number of connected parts and $H$ is the number of holes.
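Several of the 2-dimensional shape features above can be computed directly from a binary ROI mask. The following is a minimal sketch under stated assumptions: $f(x, y)$ is taken to be 1 on the target and 0 elsewhere, edge pixels are target pixels with at least one 4-connected background neighbor, and $H$ and $W$ are taken from the axis-aligned bounding box. The function name and these conventions are illustrative and are not specified in the paper.

```python
def shape_features(mask):
    """Area (1), perimeter (2), rectangularity (4), and elongation (5)
    of a binary mask given as a list of rows of 0/1 values."""
    rows, cols = len(mask), len(mask[0])
    pixels = [(i, j) for i in range(rows) for j in range(cols) if mask[i][j]]

    def is_background(i, j):
        # Positions outside the image count as background.
        return not (0 <= i < rows and 0 <= j < cols) or not mask[i][j]

    area = len(pixels)
    # Edge pixels: target pixels with a 4-connected background neighbor.
    perimeter = sum(
        1 for i, j in pixels
        if any(is_background(i + di, j + dj)
               for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    )
    height = max(i for i, _ in pixels) - min(i for i, _ in pixels) + 1
    width = max(j for _, j in pixels) - min(j for _, j in pixels) + 1
    return {
        "area": area,
        "perimeter": perimeter,
        "rectangularity": area / (height * width),
        "elongation": min(height, width) / max(height, width),
    }


# A filled 3 x 4 rectangle inside a 5 x 6 image.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
features = shape_features(mask)
```

A filled rectangle fills its bounding box completely, so its rectangularity is 1, while its elongation is the ratio of the shorter to the longer side.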
(7) External Spherical Volume (ESV). ESV is the ratio of the volume of each ROI $A_i$ (with maximum diameter $\dim(A_i)$) to the volume of its external sphere $VS(A_i)$, extracted from the three-dimensional CT image; it reflects the similarity between the region and a sphere, as shown in Figure 1(b):

$$\begin{aligned} \text{Volume}(VS(A_i)) &= \frac{4}{3} \times \pi \times \left(\frac{\dim(A_i)}{2}\right)^3, \\ E_1(A_i) &= \frac{\text{Volume}(A_i)}{\text{Volume}(VS(A_i))}. \end{aligned} \tag{7}$$
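A minimal sketch of the ESV ratio in (7) is given below. It assumes the ROI is supplied as a list of voxel center coordinates on an isotropic grid with unit voxel volume, and it takes the maximum diameter as the largest distance between two voxel centers; the function name and these conventions are assumptions for illustration.

```python
import itertools
import math


def external_spherical_volume_ratio(voxels):
    """E_1 of (7): region volume divided by the volume of the sphere whose
    diameter is the largest distance between two voxels of the region."""
    diameter = max(math.dist(p, q)
                   for p, q in itertools.combinations(voxels, 2))
    sphere_volume = 4.0 / 3.0 * math.pi * (diameter / 2.0) ** 3
    return len(voxels) / sphere_volume  # unit voxel volume assumed


# A compact 5 x 5 x 5 block versus a thin line of 5 voxels.
cube = [(x, y, z) for x in range(5) for y in range(5) for z in range(5)]
line = [(x, 0, 0) for x in range(5)]
esv_cube = external_spherical_volume_ratio(cube)
esv_line = external_spherical_volume_ratio(line)
```

The compact block fills much more of its external sphere than the thin line does, so its ratio is markedly larger, which is the behavior the feature is meant to capture.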
(8) Surface-Center Distance Standard Deviation (SCDSTD). SCDSTD is the standard deviation of the coordinate distances between each surface element $C(S_i)$ of an ROI and the regional center $C_{\text{cen}}(A_i)$; its value also describes the similarity of the ROI to a sphere. If $E_2(A_i) = 0$, the ROI is a perfect sphere. As the value of $E_2(A_i)$ increases, the region deviates more from a sphere, as shown in Figure 1(c):

$$E_2(A_i) = \operatorname{std}\left(\frac{\lVert C(S_i) - C_{\text{cen}}(A_i) \rVert}{\operatorname{mean}\left(\lVert C(S_i) - C_{\text{cen}}(A_i) \rVert\right)}\right). \tag{8}$$
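Equation (8) can be sketched as follows. The sketch assumes that the surface elements are given as 3D points, that the regional center defaults to the centroid of those points when it is not supplied, and that the population standard deviation is used; none of these choices is stated in the paper, and the function name is hypothetical.

```python
import math
import statistics


def scdstd(surface_points, center=None):
    """E_2 of (8): standard deviation of the surface-to-center distances
    after dividing each distance by the mean distance."""
    if center is None:
        n = len(surface_points)
        center = tuple(sum(p[k] for p in surface_points) / n for k in range(3))
    distances = [math.dist(p, center) for p in surface_points]
    mean_distance = statistics.fmean(distances)
    return statistics.pstdev([d / mean_distance for d in distances])


# Six points on a sphere of radius 2 versus six points on an elongated shape.
sphere_like = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
elongated = [(4, 0, 0), (-4, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
e2_sphere = scdstd(sphere_like)
e2_elongated = scdstd(elongated)
```

Dividing by the mean distance makes the feature independent of the size of the region, so only the deviation from a spherical shape contributes.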
(9) External Rectangle Cross Line Distance (ERCLD). ERCLD is the distance from the center voxel $C_{\text{cen}}(A_i)$ of the ROI to the centers $\dim(L_i)$ ($i = 1, 2, \ldots, 12$) of its 12 intersecting lines, which may indicate whether the voxels of the region are evenly distributed in the rectangular body, as shown in Figure 1(d):

$$E_3(A) = \frac{\lVert \operatorname{mean}\left(C_{\text{cen}}(A_i) - C_{\text{cen}}(\dim(L_i))\right) \rVert}{\operatorname{mean}\left(C_{\text{cen}}(\dim(L_i))\right)}. \tag{9}$$

Figure 1: Three-dimensional character sketch. (a) Coordinate system. (b) ESV. (c) SCDSTD. (d) ERCLD.
2.1.2. Hu Moment Characteristics. Moments and the related invariants have been extensively analyzed to characterize patterns in images. The moment invariants are independent of position, size, and orientation, and also of parallel projection. Hu [23] was the first to prove the central moment invariants. The central geometric moment invariants are derived from algebraic invariants and include six absolute orthogonal invariants and one skew orthogonal invariant. The moment invariants have been shown to be adequate measures for tracing image patterns under image translation, scaling, and rotation.
Hu moment invariants define seven values, computed by normalizing central moments through order three, that are invariant to object scale, position, and orientation; a large number of papers have contributed significantly to the application of Hu moments. The two-dimensional moments of a digitally sampled $M \times N$ image with gray function $f(x, y)$ ($x = 1, 2, \ldots, M$; $y = 1, 2, \ldots, N$) are given as

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^{p} y^{q} f(x, y), \quad p, q = 1, 2, 3. \quad (10)$$

The central moments of $f(x, y)$, which do not change when the image is translated by an amount $(a, b)$, are defined as

$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}, \quad \mu_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y), \quad p, q = 1, 2, 3. \quad (11)$$

When a scaling normalization is applied, the central moments change as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q}{2} + 1. \quad (12)$$

In terms of the normalized central moments, the seven moments are given as

$$\begin{aligned}
C_1 &= \eta_{20} + \eta_{02}, \\
C_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, \\
C_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2, \\
C_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2, \\
C_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
    &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right], \\
C_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}), \\
C_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
    &\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right].
\end{aligned} \quad (13)$$

The seven Hu moment invariants vary widely in magnitude. To make them comparable, a logarithmic function is used to compress the data, and hence the actual moment invariant features are $C'_K$:

$$C'_K = \left|\log_{10}\left|C_K\right|\right|, \quad K = 1, 2, \ldots, 7. \quad (14)$$

The amended moment invariant features possess translation invariance, rotational invariance, and scale invariance.
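The chain from raw moments to the log-compressed invariants can be sketched in a few lines. This is an illustrative pure-Python version, not the authors' Matlab code: the names `hu_invariants` and `log_compress` are hypothetical, the image is assumed to be a list of rows of gray values indexed from 1, and the small `eps` guard against $\log_{10} 0$ is an added assumption.

```python
import math


def hu_invariants(image):
    """Seven Hu invariants C_1..C_7 of a 2-D gray image given as a list of rows."""
    pixels = [(x, y, v)
              for x, row in enumerate(image, start=1)
              for y, v in enumerate(row, start=1) if v]

    def raw(p, q):  # raw moment m_pq
        return sum((x ** p) * (y ** q) * v for x, y, v in pixels)

    m00 = raw(0, 0)
    xbar, ybar = raw(1, 0) / m00, raw(0, 1) / m00

    def eta(p, q):  # normalized central moment; mu_00 equals m_00
        mu = sum(((x - xbar) ** p) * ((y - ybar) ** q) * v for x, y, v in pixels)
        return mu / m00 ** ((p + q) / 2 + 1)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n03 + n21  # sums that recur in C_4..C_7
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n12 - n30) * b * (3 * a ** 2 - b ** 2),
    ]


def log_compress(invariants, eps=1e-30):
    """|log10|C_K|| for each invariant; eps avoids log10(0) (assumed guard)."""
    return [abs(math.log10(max(abs(c), eps))) for c in invariants]
```

Because the central moments are taken about the centroid, shifting the same blob to another position in the image leaves the seven raw invariants unchanged up to rounding, which is a convenient sanity check for an implementation.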
2.1.3. Texture Characteristics. Tamura texture features, which are based on psychological research into human visual perception, were proposed by Tamura in 1978. The six components of the Tamura texture feature correspond to six properties in psychology; three of them (coarseness, contrast, and directionality) have good application value in texture synthesis, image recognition, and so on.
Texture is a gray distribution that appears repeatedly across spatial positions, so there are relationships between two pixels at some distance from each other in image space, called the gray spatial correlation properties of a gray image. The gray-level co-occurrence matrix (GLCM) is a common method for studying these relationships in a gray image.
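To make the idea of gray spatial correlation concrete, the sketch below builds a normalized GLCM for one pixel offset and derives three of the GLCM descriptors listed in Table 1. The function names, the single offset, and the base-2 logarithm in the entropy are illustrative assumptions; the paper does not specify the offsets or quantization it uses.

```python
import math


def glcm(image, levels, d_row=0, d_col=1):
    """Normalized co-occurrence matrix of gray levels for the offset (d_row, d_col).

    image: list of rows of integer gray levels in range(levels).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Angular second moment, moment of inertia (contrast), and entropy of a GLCM."""
    asm = sum(v * v for row in p for v in row)
    inertia = sum((i - j) ** 2 * v
                  for i, row in enumerate(p) for j, v in enumerate(row))
    entropy = -sum(v * math.log2(v) for row in p for v in row if v > 0)
    return {"angular_second_moment": asm,
            "moment_of_inertia": inertia,
            "entropy": entropy}
```

A perfectly uniform region concentrates all probability in one GLCM cell, giving an angular second moment of 1 and zero inertia and entropy; textured regions spread the probability and move all three values away from these extremes.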
2.1.4. Intensity Features. The gray statistical feature is a quantitative description of the basic features of a two-dimensional image region; from the three-dimensional perspective, it is called the intensity feature [16]. In this paper, the extracted intensity feature components include the mean intensity, intensity variance, maximum-minimum intensity difference, skewness, kurtosis, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD).
(1) Intensity Gradient (from Inside to Outside). For an ROI $A_i$ whose voxel $S_i$ volume is greater than 0, morphological erosion is performed repeatedly, and the ratio of the mean of the area excluded by each erosion to the mean of the previous operation (initial value 0) is calculated until the ratio is zero. Consider the following equation, where $n$ is the number of operations:

$$E_4(A_i) = \frac{K}{n}. \quad (15)$$

(2) Laplace Divergence Mean (LDM). Convolving the original CT image with the Laplacian shows that the area surrounding a nodule, where gray value differences are small, has a significantly different divergence. Therefore, calculating the Laplace divergence helps distinguish pulmonary nodules from interfering impurities:

$$E_5(A_i) = \operatorname{mean}(A_i \times \mathrm{La}). \quad (16)$$

(3) Laplace Divergence Distance (LDD). The difference between the maximum and minimum values of the Laplace divergence is used to describe the range of regional divergence:

$$E_6(A_i) = \max(A_i \times \mathrm{La}) - \min(A_i \times \mathrm{La}). \quad (17)$$

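Equations (16) and (17) can be illustrated with a small two-dimensional sketch. It assumes that $A_i \times \mathrm{La}$ denotes the Laplacian response restricted to the ROI, uses the 4-neighbor discrete Laplacian, and sets border pixels to zero; all of these, together with the function names and the boolean ROI mask, are assumptions for illustration, and the paper's own computation may be three-dimensional.

```python
def laplacian(image):
    """4-neighbor discrete Laplacian of a 2-D image; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = (image[r - 1][c] + image[r + 1][c]
                         + image[r][c - 1] + image[r][c + 1]
                         - 4 * image[r][c])
    return out


def laplace_divergence(image, roi_mask):
    """Return (LDM, LDD): mean and max-min range of the Laplacian inside the ROI."""
    lap = laplacian(image)
    values = [lap[r][c]
              for r, row in enumerate(roi_mask)
              for c, inside in enumerate(row) if inside]
    ldm = sum(values) / len(values)       # Eq. (16)
    ldd = max(values) - min(values)       # Eq. (17)
    return ldm, ldd
```

A flat region yields zero for both features, while an isolated bright voxel produces a large range, which is the kind of contrast the two features are meant to capture.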
Table 1 shows the feature set of 42 features based on the above feature description of ROI. To facilitate subsequent tests, features are numbered in the order shown in Table 1; that is, the shape features are numbered fs1–fs18, the intensity features fi1–fi8, and the texture features ft1–ft16.
2.2. Rough Set and Attribute Reduction. Rough set theory (RST), proposed by Pawlak in 1982, is one of the effective mathematical tools for processing fuzzy and uncertain knowledge. Nowadays, RST has been applied to a variety of fields such as artificial intelligence, data mining, pattern recognition, and knowledge discovery. Rough set is founded on the assumption that some knowledge is associated with every object of the universe of discourse. Objects characterized by the same information are similar in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of some elementary sets is referred to as a crisp set; otherwise, the set is a rough set.
Table 1: ROI feature set.

| Feature type | Feature vectors | Dimensionality |
|---|---|---|
| Shape features (fs) | Perimeter, area, volume, roundness, rectangularity, length, Euler's number, ESV, SCDSTD, ERCLD, Hu moment | 18 |
| Intensity features (fi) | Mean intensity, intensity standard variance, maximum-minimum intensity difference value of variance, skewness, kurtosis, intensity gradient (from inside to outside), LDM, LDD | 8 |
| Texture features (ft) | Tamura texture features (contrast, direction, roughness); GLCM (angular second moment, moment of inertia, torque deficit, sum mean, variance, sum variance, difference variance, entropy, sum entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient) | 16 |

Definition 1. An information system $S$ is a quadruple $S = (U, A, V, f)$, where $U$ is a nonempty and finite set of objects, $A$ is a nonempty and finite set of attributes, $V := \bigcup V_a$ with $V_a$ being the domain of attribute $a$, and $f$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.
For a subset $P$ of $A$, let us define the corresponding equivalence relation as

$$\mathrm{IND}(P) = \{(x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P\} \quad (18)$$

and denote the equivalence class of $\mathrm{IND}(P)$ that contains the object $x \in U$ by $[x]_P$, that is,

$$[x]_P = \{y \in U \mid (x, y) \in \mathrm{IND}(P)\}. \quad (19)$$

The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$, that is, $U/P = \{[x]_P \mid x \in U\}$.
As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, based on a specialization of the notion of independence due to Marczewski. Much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems, for example, reduction approaches based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Definition 2. Let $S = (U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) = \{x \in U \mid [x]_P \subseteq X\}$ and $\overline{R}_P(X) = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$ are called the $P$-lower and $P$-upper approximations of $X$, respectively.
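The equivalence classes of Eqs. (18) and (19) and the approximations of Definition 2 translate almost directly into set operations. The sketch below assumes the information system is stored as a dictionary that maps each object to a dictionary of attribute values; this layout and the function names are illustrative choices, not part of the paper.

```python
def partition(universe, table, attrs):
    """U/P: group objects that take equal values on every attribute in attrs."""
    classes = {}
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        classes.setdefault(key, set()).add(x)
    return list(classes.values())


def approximations(universe, table, attrs, target):
    """Return the P-lower and P-upper approximations of the set `target`."""
    lower, upper = set(), set()
    for block in partition(universe, table, attrs):
        if block <= target:      # [x]_P is contained in X
            lower |= block
        if block & target:       # [x]_P intersects X
            upper |= block
    return lower, upper
```

For example, with classes {1, 2}, {3}, {4, 5}, {6} and X = {1, 3, 4}, only {3} lies entirely inside X, so the lower approximation is {3}, while the upper approximation collects every class that touches X, giving {1, 2, 3, 4, 5}.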
Figure 2: Optimal hyperplane (labels in the figure: H1, H2, H, r).
Definition 3. Let $S = (U, A, V, f)$ be an information system, and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the $P$-positive region of $Q$, where $\underline{R}_P(X)$ is the $P$-lower approximation of $X$.
Definition 4. Let $S = (U, C \cup D, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C - \{a\}}(D)$, $a$ is said to be $D$-dispensable in $C$; otherwise, $a$ is said to be $D$-indispensable in $C$. The set of all the $D$-indispensable attributes is called the core of $S$ and denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each of the attributes of $P$ is $D$-indispensable, then $P$ is called a reduct of $S$.
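Definitions 3 and 4 can be exercised on a toy decision table. The sketch below is only an illustration: the paper does not say which reduction algorithm it uses, so the backward elimination in `greedy_reduct` is an assumed stand-in, and all function names and the dictionary-based table layout are hypothetical.

```python
def partition(universe, table, attrs):
    """Equivalence classes of objects that agree on every attribute in attrs."""
    classes = {}
    for x in universe:
        classes.setdefault(tuple(table[x][a] for a in attrs), set()).add(x)
    return list(classes.values())


def positive_region(universe, table, cond, dec):
    """POS_cond(dec): union of cond-classes that fit inside a single dec-class."""
    dec_classes = partition(universe, table, dec)
    pos = set()
    for block in partition(universe, table, cond):
        if any(block <= d for d in dec_classes):
            pos |= block
    return pos


def core(universe, table, cond, dec):
    """Attributes whose removal changes the positive region (D-indispensable)."""
    full = positive_region(universe, table, cond, dec)
    return [a for a in cond
            if positive_region(universe, table,
                               [b for b in cond if b != a], dec) != full]


def greedy_reduct(universe, table, cond, dec):
    """Backward elimination: drop attributes while the positive region is kept."""
    full = positive_region(universe, table, cond, dec)
    reduct = list(cond)
    for a in cond:
        trial = [b for b in reduct if b != a]
        if positive_region(universe, table, trial, dec) == full:
            reduct = trial
    return reduct
```

The result of the elimination depends on the order in which attributes are visited, which mirrors the remark in Section 3 that the reduct of a feature set is not unique; the core, by contrast, is the same regardless of order.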
2.3. SVM and Its Optimization. SVM is a pattern recognition method developed from statistical learning theory and based on the structural risk minimization principle. While ensuring classification accuracy, SVM can improve the generalization ability of the learning machine by maximizing the classification interval. The biggest advantage of SVM is that it overcomes overlearning and high dimensionality, both of which lead to computational complexity and local extremum problems. A reliable classification model based on SVM is urgently needed for the study of hospitalization expenses of patients with gastric cancer.
For linearly separable data (Figure 2), assume a data set $S = \{x_1, \ldots, x_n\}$ with data markers $G = \{y_1, \ldots, y_n\}$, where $x_i$ is the input space vector of a data sample and $y_i$ records the category of the sample. The aim of SVM is to find an optimal hyperplane $H$ that separates the two classes of samples with the largest interval. The optimal hyperplane $H$ is expressed as

$$w^{T} x + b = 0, \quad (20)$$

where $w$ is the weight vector and $b$ is the threshold. This problem is transformed into an optimization problem in $w$ and $b$:

$$\min_{w, b} \; r(w) = \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i\left((w \cdot x_i) + b\right) \ge 1, \quad i = 1, \ldots, n. \quad (21)$$

To simplify the formula, the Lagrange dual is introduced, subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into

$$\min_{\alpha} \; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n. \quad (22)$$

For linearly inseparable data, the penalty parameter $C$ and the relaxation variable $\xi$ are introduced into the constraint condition; thus the generalization ability of SVM is increased, and the function is transformed into

$$\min_{\alpha} \; \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \quad (23)$$

where $C$ is a user-set parameter. In practice, a larger $C$ penalizes misclassified training samples more heavily and separates the training data more strictly; at the same time, it increases the risk of poor generalization.
The final classification function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*}\right). \quad (24)$$

For nonlinearly separable data, SVM transforms them into linearly separable data in a high-dimensional space via the nonlinear mapping of a kernel function, and the optimal hyperplane is found in that high-dimensional space. A kernel function that satisfies the Mercer condition corresponds to the inner product in a transformed space and is used to realize the nonlinear transformation of linear classification.
The corresponding kernel function is defined as

$$K(x_i, x) = \left(\varphi(x_i) \cdot \varphi(x)\right). \quad (25)$$

At this point, the final classification function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right). \quad (26)$$

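The decision rule of Eq. (26) can be written out directly once the multipliers, labels, and threshold are known. The sketch below assumes an RBF kernel of the form $\exp(-g\|u - v\|^2)$, which is the form LibSVM uses for its `g` parameter; the paper does not name its kernel explicitly, so this choice, the function names, and the convention of returning +1 when the sum is exactly zero are assumptions.

```python
import math


def rbf_kernel(u, v, g):
    """K(u, v) = exp(-g * ||u - v||^2); g is the kernel width parameter."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def decide(x, support_vectors, labels, alphas, b, g):
    """Eq. (26): sign of the kernel expansion plus the threshold b."""
    s = sum(a * y * rbf_kernel(sv, x, g)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1
```

With one support vector per class and equal multipliers, a query point is assigned to the class of the nearer support vector, because the kernel value decays with distance.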
The penalty factor $C$ and the kernel function parameter $g$ play an extremely important role in SVM classification performance. To obtain the optimal classification results, a grid optimization model is used in this paper. In the grid optimization model, the parameters to be searched are laid out as a grid over a certain space, and the optimal parameters are selected by traversing all grid points. Therefore, the grid optimization model has the advantages of simplicity, convenience, good stability, and ease of finding the global optimal solution within the searched grid [24]. In the learning process of SVM, 10-fold cross-validation is used to find the kernel function parameters and penalty coefficient with the optimal classification performance, which are then applied to the SVM classifier for the recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the detection results of the relevant experiments.
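The grid optimization procedure can be sketched as an exhaustive loop over candidate $(C, g)$ pairs, each scored by k-fold cross-validation. The code below is a self-contained illustration rather than the LibSVM setup used in the paper: the trainer is a simplified, bias-free kernel SVM fitted by dual coordinate ascent (it keeps the box constraint $0 \le \alpha_i \le C$ of Eq. (23) but drops $b$ and the equality constraint), and the function names, grids, fold assignment, and RBF kernel are all assumptions.

```python
import itertools
import math
import random


def rbf(u, v, g):
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def train(X, y, C, g, sweeps=50):
    """Simplified stand-in for an SVM solver: coordinate ascent on the dual."""
    n = len(X)
    K = [[rbf(X[i], X[j], g) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            margin = y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            # Newton step on alpha_i, clipped to the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + (1.0 - margin) / K[i][i]))
    return alpha


def predict(x, X, y, alpha, g):
    s = sum(a * yi * rbf(xi, x, g) for xi, yi, a in zip(X, y, alpha))
    return 1 if s >= 0 else -1


def cv_accuracy(X, y, C, g, folds=10, seed=0):
    """Mean accuracy over `folds` disjoint test folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for k in range(folds):
        test = idx[k::folds]
        train_idx = [i for i in idx if i not in test]
        Xtr, ytr = [X[i] for i in train_idx], [y[i] for i in train_idx]
        alpha = train(Xtr, ytr, C, g)
        correct += sum(predict(X[i], Xtr, ytr, alpha, g) == y[i] for i in test)
    return correct / len(X)


def grid_search(X, y, C_grid, g_grid, folds=10):
    """Traverse every (C, g) pair and return (best accuracy, C, g)."""
    scored = ((cv_accuracy(X, y, C, g, folds), C, g)
              for C, g in itertools.product(C_grid, g_grid))
    return max(scored, key=lambda t: t[0])
```

A call such as `grid_search(X, y, [2 ** k for k in range(-2, 5)], [2 ** k for k in range(-4, 3)])` traverses an exponentially spaced grid, a common choice in SVM practice. When several grid points tie, `max` returns the first one encountered, so the grid order acts as the tie-break.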
3. Pulmonary Nodule Detection Model
In this paper, CT images of 70 patients with pulmonary nodules are used. The images are first segmented [7] into three different types of pulmonary nodules (solitary pulmonary nodules, or SPN; vascular adhesion pulmonary nodules, or VAPN; and pleural adhesion pulmonary nodules, or PAPN), which are marked by radiologists, as well as a large number of nonnodular areas, including blood vessels, bones, and alveoli. Forty-two feature components characterizing the ROI are extracted from the 2-dimensional and 3-dimensional perspectives, including six new 3-dimensional features proposed in this paper. They are composed of 18 shape features, 8 intensity features, and 16 texture features. The extracted feature set (identified as FS) is discretized and normalized. Feature-level fusion of the preprocessed feature data is performed five times using the rough set model (since the reduct of a rough set feature subset is not unique, the extracted feature sets are reduced five times in this paper, and the results are identified as RS1, RS2, RS3, RS4, and RS5). Feature subset RS1 is used for the comparative experiments. Finally, SVM parameters are optimized using the grid optimization model, and the improved SVM is used in the following four sets of comparative experiments: comparative analysis of the effectiveness and stability of classification before and after rough set reduction of features; comparative analysis of the recognition performance before and after feature-level fusion based on rough set or PCA; and comparative analysis of the recognition performance of our proposed method and other methods. Based on the above, we present a flow chart of the pulmonary nodule detection model, as shown in Figure 3.
4. Results and Discussion
4.1. Experimental Environments. In this paper, the hardware and software environments are as follows.
Software Environments. Windows 7 OS, Matlab R2014b, ImageJ 1.48u, and LibSVM.
Hardware Environments. Intel Core i5-4670 at 3.4 GHz, 8.0 GB of memory, and a 500 GB hard disk.
Experimental Data. CT images of 70 patients with pulmonary nodules are collected as experimental samples, which are marked by radiologists, with a size of 512 × 512 and a thickness of 2 mm. They comprise 2232 CT images from 38 patients with solitary pulmonary nodules (SPN), 17 patients with vascular adhesion pulmonary nodules (VAPN), and 15 patients with pleural adhesion pulmonary nodules (PAPN).
Figure 3: Flow chart of pulmonary nodule detection model (CT image; image segmentation; ROI, split into pulmonary nodules (SPN, VAPN, PAPN) and other areas (blood vessels, bone, alveolar); feature extraction of shape, intensity, and texture features; feature set; normalization and discretization; rough set feature reduction into RS1, RS2, RS3, RS4, and RS5; SVM with grid optimization method; classification result).
Figure 4 shows representatives of each type of pulmonary nodule and the corresponding segmentation results.
In this paper, 42-dimensional features of 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas are extracted. Table 2 shows the 42-dimensional feature values of the lung nodular and nonnodular areas; shape features are identified as fs, intensity features as fi, and texture features as ft. To give an intuitive view of the distribution of different feature values and their discriminative power, the external sphere volume (ESV) ratio and the standard deviation of surface-center distance (SCD) are calculated and plotted as boxplots, as shown in Figure 5.
4.2. Feature-Level Fusion Based on Rough Set. To prevent attributes with a small range of values from being dominated by those with a large range, and to reduce the complexity of the statistical computation, the extracted feature sets are first preprocessed by normalizing data with large differences and linearly mapping the data to [0, 1]. The preprocessed feature data are then fused five times using the rough set model. The fusion results are shown in Table 3.
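The linear mapping to [0, 1] described above is commonly implemented as per-feature min-max scaling. The sketch below assumes that samples are rows and features are columns, and that a constant feature is mapped to 0; the function name and both conventions are illustrative assumptions rather than details given in the paper.

```python
def min_max_normalize(samples):
    """Linearly map every feature column of `samples` to the interval [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    return [[(v - lo) / span if span else 0.0   # constant column -> 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in samples]
```

After this step a feature measured in hundreds (such as area) and a feature bounded by 1 (such as roundness) contribute on the same scale to the distance computations inside the kernel.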
Figure 4: Pulmonary nodule segmentation results. (a) Original image of SPN; (b) segmentation result of SPN; (c) original image of VAPN; (d) segmentation result of VAPN; (e) original image of PAPN; (f) segmentation result of PAPN.
Table 2 Feature values of pulmonary nodular areas and nonnodular areas
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
Figure 5: Boxplots of (a) ESV and (b) SCDSTD for nodular areas and nonnodular areas. "+" refers to the upper and lower bounds of the ESV and SCDSTD values.
Table 3: Feature reduction based on rough sets.

| Feature subset | Reduction results | Dimensionality |
|---|---|---|
| RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21 |
| RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17 |
| RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20 |

4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
Experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion (mean accuracy rises from 94.64% to 99.17% in Table 4), with a decrease in the missed diagnosis rate, reflected by the increased sensitivity, and in the misdiagnosis rate, reflected by the increased specificity. The processing time is also shorter after reduction. These results indicate that the feature-level fusion of the extracted 42-dimensional feature set based on the rough set model is effective: it not only improves the compactness of the feature set (by eliminating redundant and poorly discriminating feature components) but also corrects abnormal data in the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
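The three quality indexes used in these experiments follow from the confusion counts of a binary classifier. The sketch below assumes that nodules are labeled 1 and non-nodules 0; the function name, the label convention, and the use of NaN for an undefined ratio are illustrative assumptions.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),   # 1 - missed diagnosis rate
        "specificity": ratio(tn, tn + fp),   # 1 - misdiagnosis rate
    }
```

For a test fold of six nodules and six non-nodules in which every nodule is found and one non-nodule is flagged by mistake, the function returns an accuracy of 11/12, a sensitivity of 1, and a specificity of 5/6.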
4.3.2. The Model Stability Experiment. The feature data of pulmonary nodules are tested with RS1 (70 × 21) as the dataset for classification in five rounds, with training set to testing set ratios of 50:20, 40:30, 35:35, 30:40, and 20:50. Each round of testing uses a randomly selected split at the given ratio of training set to testing set, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction decreases only slightly, whereas that of the feature set before rough set reduction fluctuates to a certain extent (Figure 6 shows this more intuitively). These results indicate that the feature-level fusion model based on rough set classifies more stably and is less susceptible to interference from sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
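Table 4: Statistics of effectiveness before and after rough set reduction.

| Stage | Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | Processing time (s) |
|---|---|---|---|---|---|
| Before reduction | 1 | 96.42 | 92.86 | 100 | 1.0610 |
| Before reduction | 2 | 91.96 | 83.93 | 100 | 0.6170 |
| Before reduction | 3 | 95.54 | 100 | 91.07 | 0.5490 |
| Before reduction | 4 | 89.28 | 100 | 78.57 | 0.5630 |
| Before reduction | 5 | 95.54 | 91.07 | 100 | 0.5470 |
| Before reduction | 6 | 98.21 | 96.43 | 100 | 0.5460 |
| Before reduction | 7 | 94.64 | 89.29 | 100 | 0.5460 |
| Before reduction | 8 | 95.53 | 91.07 | 100 | 0.5460 |
| Before reduction | 9 | 91.96 | 83.93 | 100 | 0.5460 |
| Before reduction | 10 | 97.32 | 100 | 96.64 | 0.5300 |
| Before reduction | Mean | 94.64 | 92.86 | 96.43 | 0.6051 |
| After reduction (RS1) | 1 | 100 | 100 | 100 | 0.9370 |
| After reduction (RS1) | 2 | 100 | 100 | 100 | 0.4360 |
| After reduction (RS1) | 3 | 100 | 100 | 100 | 0.3870 |
| After reduction (RS1) | 4 | 100 | 100 | 100 | 0.4210 |
| After reduction (RS1) | 5 | 100 | 100 | 100 | 0.4210 |
| After reduction (RS1) | 6 | 100 | 100 | 100 | 0.3900 |
| After reduction (RS1) | 7 | 100 | 100 | 100 | 0.4060 |
| After reduction (RS1) | 8 | 91.67 | 100 | 83.33 | 0.4060 |
| After reduction (RS1) | 9 | 100 | 100 | 100 | 0.3740 |
| After reduction (RS1) | 10 | 100 | 100 | 100 | 0.3930 |
| After reduction (RS1) | Mean | 99.17 | 100 | 98.33 | 0.4571 |
| Increase after reduction | | 4.53 | 7.14 | 1.9 | 0.148 |

Table 5: Effectiveness of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Processing time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.4571 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.4650 |
| RS3 | 99.17 | 100 | 98.33 | 0.4656 |
| RS4 | 100 | 100 | 100 | 0.4731 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.4850 |
| Mean | 98.83 | 99 | 98.66 | 0.4672 |

4.3.3. The Superiority of Feature-Level Fusion Model Based on Rough Set. Since PCA is a well-developed model characterized by simple calculation and easy programming, it has become the preferred dimension reduction method for most feature-level fusion models. To compare the two types of feature-level fusion, PCA-based feature-level fusion of the extracted feature sets is also performed in this paper, and the tenfold cross-validation results are shown in Table 8. Figure 7 shows the classification performance of the two types of feature-level fusion methods (feature subset RS1 from Table 3 is used, and the running time is 100 times actual time).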
Experimental results show that the performance indicators of the feature-level fusion model based on rough set are better than those of the model based on PCA (for example, a mean accuracy of 99.17% versus 95.91% in Tables 4 and 8), indicating that rough set is more suitable than PCA for eliminating redundant information.
Figure 6: Comparative results of feature subsets before and after rough set reduction (horizontal axis: ratio of training set to testing set, 50:20, 40:30, 35:35, 30:40, and 20:50; vertical axis from 92 to 101; series: before reduction and after reduction).

4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and False Positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the optimal detection accuracy is used for all detection methods). The results are shown in Table 9 (Pr: private database; L: LIDC).
Experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent, indicating that this method not only improves the comprehensiveness and accuracy of the feature description of ROI by supplementing and improving the feature components but also improves the compactness of the feature set by applying rough set based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.

Table 6: Stability statistics of rough set reduction subsets.

| Stage | Training set : testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s) |
|---|---|---|---|---|---|
| Before fusion | 50:20 | 97.35 | 94.71 | 100 | 0.4873 |
| Before fusion | 40:30 | 96.53 | 93.08 | 98.32 | 0.3846 |
| Before fusion | 35:35 | 95.83 | 92.39 | 97.79 | 0.4254 |
| Before fusion | 30:40 | 96.16 | 95.58 | 96.74 | 0.3560 |
| Before fusion | 20:50 | 94.88 | 94.63 | 95.86 | 0.4236 |
| Before fusion | Mean | 96.15 | 94.08 | 97.742 | 0.4154 |
| After fusion (RS1) | 50:20 | 99.71 | 99.41 | 100 | 0.2684 |
| After fusion (RS1) | 40:30 | 98.96 | 99.58 | 98.46 | 0.2568 |
| After fusion (RS1) | 35:35 | 98.65 | 99.23 | 98.08 | 0.2382 |
| After fusion (RS1) | 30:40 | 98.37 | 98.60 | 98.14 | 0.2646 |
| After fusion (RS1) | 20:50 | 98.25 | 97.67 | 98.84 | 0.2636 |
| After fusion (RS1) | Mean | 98.79 | 98.84 | 98.70 | 0.2583 |

Table 7: Classification performance of rough set reduction subset.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.2583 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.2870 |
| RS3 | 99.17 | 100 | 98.33 | 0.2560 |
| RS4 | 100 | 100 | 100 | 0.2531 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.2656 |
| Mean | 98.834 | 99 | 98.66 | 0.2620 |

Table 8: Classification performance of feature reduction based on PCA.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s) |
|---|---|---|---|---|
| 1 | 91.67 | 83.33 | 100 | 0.9970 |
| 2 | 96.74 | 93.48 | 100 | 0.4830 |
| 3 | 96.74 | 93.48 | 100 | 0.4880 |
| 4 | 98.91 | 100 | 97.83 | 0.4950 |
| 5 | 93.48 | 86.96 | 100 | 0.4950 |
| 6 | 96.74 | 100 | 93.48 | 0.5140 |
| 7 | 96.74 | 100 | 93.48 | 0.5120 |
| 8 | 94.57 | 89.13 | 100 | 0.4890 |
| 9 | 97.83 | 95.65 | 100 | 0.4990 |
| 10 | 95.65 | 93.48 | 97.83 | 0.5180 |
| Mean | 95.91 | 93.55 | 98.26 | 0.5490 |

Table 9: Comparison of the performance of different lung nodule detection methods.

| Author | Database | Nodule numbers | Accuracy (%) | FPs |
|---|---|---|---|---|
| Santos et al. [11] | L | 260 | 88.4 | 1.17 |
| Magalhaes Barros Netto et al. [12] | L | 48 | 90.65 | 0.138 |
| Ye et al. [13] | Pr | 220 | 90.2 | 8.2 |
| Tan et al. [14] | L | 172 | 87.5 | 4 |
| Cascio et al. [16] | L | 148 | 97 | 6.1 |
| Our method | Pr | 70 | 99.17 | 0.47 |

Figure 7: Comparison of two feature-level fusion models (rough set and PCA) in terms of accuracy, sensitivity, specificity, and running time.
5. Conclusions
In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model is proposed based on rough set based feature-level fusion. To address insufficient feature description and inaccurate characterization in the process of feature extraction, six new 3D features, in combination with other 2D and 3D features, are proposed to extract and quantify the feature information of ROI in this model. Rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets, since there is redundant information in the extracted high-dimensional features. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the SVM is used as the classifier for the detection and recognition of pulmonary nodules. Finally, the pulmonary nodule detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed pulmonary nodule detection method based on rough set based feature-level fusion is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules, and it is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (Grant No. 61561040), the Natural Science Foundation of Ningxia (Grant No. NZ16067), and the Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Luchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, "Trends in incidence of small cell lung cancer and all lung cancer," Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., "Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012," International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, "Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms," Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, "3-D segmentation algorithm of small lung nodules in spiral CT images," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., "Effect of CAD on radiologists' detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size," Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, "Automated identification of dementia using FDG-PET imaging," BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, "Brain MRI image segmentation based on learning local variational Gaussian mixture models," Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, "Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains," Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, "Neutro-connectedness cut," IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., "Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM," Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhaes Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, "Automatic segmentation of lung nodules with growing neural gas and support vector machine," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, "Shape-based computer-aided detection of lung nodules in thoracic CT images," IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, "A novel computer-aided lung nodule detection system for CT images," Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, "Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier," Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, "Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, "A survey on rough set theory and applications," Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, "A novel neighborhood rough set based classification approach for medical diagnosis," Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, "Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis," Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., "Computer-aided diagnosis of the solitary pulmonary nodule," Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, "Fuzzy rough sets hybrid scheme for breast cancer detection," Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, "A method of effective text extraction for complex video scene," Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, "Review and performance comparison of SVM- and ELM-based classifiers," Neurocomputing, vol. 128, pp. 507–516, 2014.
the model, four groups of comparative experiments are performed in this paper: model validation experiments before and after rough set reduction, model stability experiments before and after rough set reduction, validation experiments of the superiority of the rough set feature-level fusion model, and comparative experiments with other pulmonary nodule detection models. The experimental results show that the method proposed in this paper can improve, to a certain extent, the rationality of the feature structure and the compactness of the feature expression, thereby improving the detection accuracy of pulmonary nodules.
2 Related Theory
The quality of an ROI feature description is determined both by its comprehensiveness (features cannot be "observed" from multiple perspectives if there are too few of them) and by the accuracy of characterization (quantized values that diverge from the real information lead to low feature discrimination). A large amount of noise information reduces the accuracy of ROI feature extraction and affects the final detection results. Therefore, for a comprehensive and accurate expression of the morphological structure and local features of the ROI, six new 3-dimensional features are proposed based on the analysis of ROIs in lung CT images. These new 3-dimensional features are used, in combination with other shape, intensity, and texture features, to qualitatively analyze and quantitatively characterize the lesions from 2-dimensional and 3-dimensional perspectives.
2.1 Pulmonary Nodule Features in CT Images
2.1.1 Shape Characteristics. Shape characteristics analyze the spatial distribution of gray values by computing local features at each point in the image. Shape is the most intuitive visual feature and can be used to describe the main medical signs of the pulmonary nodule ROI in CT images, such as the nodule sign, lobulation sign, spinous process sign, vacuole sign, and spicule sign, from the perspectives of geometric shape, edge roughness, and topological structure. In this paper, the extracted shape feature components mainly include perimeter, area, volume, roundness, rectangularity, elongation, Euler number, Harris, Hu moment, ESV, SCDSTD, and ERCLD. Some of these features are given here [22].
(1) Area.
$$S = \sum_{x=1}^{N} \sum_{y=1}^{M} f(x, y) \tag{1}$$
where $f(x, y)$ denotes the pixels of the target and $M$ and $N$ are the length and width, respectively.
(2) Perimeter.
$$C = \sum_{i=1}^{M} \sum_{j=1}^{N} p(i, j) \tag{2}$$
where $p(i, j)$ denotes the pixels of the target edge and $M$ and $N$ are the length and width, respectively.
(3) Circularity.
$$R_0 = \frac{4 \pi S}{C^2} \tag{3}$$
Circularity describes how close the shape of an object is to a circle, where $S$ is the area of the target region and $C$ is its circumference. $0 < R_0 \le 1$, and the value of $R_0$ reflects the complexity of the measured boundary: the more complex the shape, the smaller the value of $R_0$.
(4) Rectangularity.
$$R = \frac{S}{H \times W} \tag{4}$$
where $S$ is the area of the target region and $H$ and $W$ are the length and width, respectively.
(5) Elongation.
$$E = \frac{\min(H, W)}{\max(H, W)} \tag{5}$$
Elongation can distinguish different image shapes (such as circle, square, ellipse, thin and long, and short and wide), where $H$ and $W$ are the length and width, respectively.
(6) Euler Number.
$$E = C - H \tag{6}$$
where $C$ is the number of connected parts and $H$ is the number of holes.
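As a rough illustration of the area, perimeter, rectangularity, and elongation features, the sketch below computes them from a binary mask in plain Python. It is not the authors' implementation. It assumes that the target is given as a list of rows of 0/1 values, that an edge pixel is a target pixel with at least one 4-connected background neighbour, and that $H$ and $W$ are the sides of the axis-aligned bounding rectangle. The function name `shape_features` is hypothetical.

```python
def shape_features(mask):
    """Compute simple 2-D shape features of a binary mask (1 = target pixel)."""
    rows, cols = len(mask), len(mask[0])
    target = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    area = len(target)  # number of target pixels

    def is_edge(r, c):
        # Assumption: an edge pixel has a 4-neighbour that is background
        # or lies outside the image.
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                return True
        return False

    perimeter = sum(1 for r, c in target if is_edge(r, c))

    # Assumption: H and W are the sides of the bounding rectangle.
    height = max(r for r, _ in target) - min(r for r, _ in target) + 1
    width = max(c for _, c in target) - min(c for _, c in target) + 1

    return {
        "area": area,
        "perimeter": perimeter,
        "rectangularity": area / (height * width),
        "elongation": min(height, width) / max(height, width),
    }
```

For a filled 2 × 4 rectangle every pixel touches the background, so area and perimeter are both 8, rectangularity is 1, and elongation is 0.5.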
(7) External Spherical Volume (ESV). ESV is the ratio of the volume of each ROI $A_i$ (with maximum diameter $\dim(A_i)$), extracted from the three-dimensional CT image, to the volume of its external sphere $\mathrm{VS}(A_i)$; it reflects the similarity between the region and a sphere, as shown in Figure 1(b).
$$\mathrm{Volume}(\mathrm{VS}(A_i)) = \frac{4}{3} \times \pi \times \left(\frac{\dim(A_i)}{2}\right)^{3}, \qquad E_1(A_i) = \frac{\mathrm{Volume}(A_i)}{\mathrm{Volume}(\mathrm{VS}(A_i))} \tag{7}$$
(8) Surface-Center Distance Standard Deviation (SCDSTD). SCDSTD is the standard deviation of the coordinate distance between each individual surface element $C(S_i)$ of an ROI and the regional center $C_{\mathrm{cen}}(A_i)$; its value also describes the similarity of the ROI to a sphere. If $E_2(A_i)$ is 0, the ROI is a standard sphere. As $E_2(A_i)$ increases, the deviation of the region from a sphere increases, as shown in Figure 1(c).
$$E_2(A_i) = \operatorname{std}\left(\frac{\left\| C(S_i) - C_{\mathrm{cen}}(A_i) \right\|}{\operatorname{mean}\left(\left\| C(S_i) - C_{\mathrm{cen}}(A_i) \right\|\right)}\right) \tag{8}$$
(9) External Rectangle Cross Line Distance (ERCLD). ERCLD is the distance from the center voxel $C_{\mathrm{cen}}(A_i)$ of the ROI to the centers $\dim(L_i)$ ($i = 1, 2, \ldots, 12$) of its 12 intersecting lines, which may indicate whether the regional voxels are evenly distributed in the rectangular body, as shown in Figure 1(d).
$$E_3(A) = \frac{\left\| \operatorname{mean}\left( C_{\mathrm{cen}}(A_i) - C_{\mathrm{cen}}(\dim(L_i)) \right) \right\|}{\operatorname{mean}\left( C_{\mathrm{cen}}(\dim(L_i)) \right)} \tag{9}$$

Figure 1: Three-dimensional character sketch. (a) Coordinate system; (b) ESV; (c) SCDSTD; (d) ERCLD.
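The two sphere-similarity features, ESV and SCDSTD, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' code: the ROI is a list of integer voxel coordinates, each voxel counts as one unit of volume, the maximum diameter is the largest pairwise voxel distance, the regional center is the mean voxel coordinate, and the population standard deviation is used. The function names `esv` and `scdstd` are hypothetical.

```python
import math
import statistics


def esv(voxels):
    """ROI volume divided by the volume of the sphere spanning its max diameter."""
    # Assumption: max diameter = largest pairwise distance (O(n^2), fine for small ROIs).
    dim = max(math.dist(p, q) for p in voxels for q in voxels)
    sphere_volume = 4.0 / 3.0 * math.pi * (dim / 2.0) ** 3
    # Assumption: each voxel contributes one unit of volume.
    return len(voxels) / sphere_volume


def scdstd(surface_voxels, voxels):
    """Std of surface-to-center distances, each normalised by the mean distance."""
    n = len(voxels)
    center = tuple(sum(p[k] for p in voxels) / n for k in range(3))
    distances = [math.dist(s, center) for s in surface_voxels]
    mean_distance = statistics.fmean(distances)
    # Assumption: population standard deviation.
    return statistics.pstdev([d / mean_distance for d in distances])
```

When all surface voxels are equally far from the center, `scdstd` returns 0, which matches the statement that a value of 0 corresponds to a standard sphere.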
2.1.2 Hu Moment Characteristics. Moments and the related invariants have been extensively analyzed to characterize the patterns in images. The moment invariants are independent of position, size, and orientation, and also of parallel projection. Hu [23] was the first to prove the central moment invariants. The central geometric moment invariants are derived from algebraic invariants and include six absolute orthogonal invariants and one skew orthogonal invariant. The moment invariants have been proved to be adequate measures for tracing image patterns under image translation, scaling, and rotation.
Hu moment invariants define seven values, computed by normalizing central moments through order three, which are invariant to object scale, position, and orientation; a large number of papers have contributed significantly to the application of Hu moments. The two-dimensional moments of a digitally sampled $M \times N$ image with gray function $f(x, y)$ ($x = 1, 2, \ldots, M$; $y = 1, 2, \ldots, N$) are given as
$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^{p} y^{q} f(x, y), \quad p, q = 0, 1, 2, 3 \tag{10}$$
The central moments of $f(x, y)$, which do not change when the image is translated by an amount $(a, b)$, are defined as
$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}, \qquad \mu_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y), \quad p, q = 0, 1, 2, 3 \tag{11}$$
When a scaling normalization is applied, the central moments change as
$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1 \tag{12}$$
In terms of the central moments, the seven moments are given as
$$\begin{aligned}
C_1 &= \eta_{20} + \eta_{02} \\
C_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
C_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
C_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2 \\
C_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
&\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right] \\
C_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
C_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
&\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right]
\end{aligned} \tag{13}$$
The seven Hu moment invariants vary widely in magnitude. To make them comparable, a logarithmic function is used to compress the data, and hence the actual moment invariant features are $C'_K$:
$$C'_K = \left| \log_{10} \left| C_K \right| \right|, \quad K = 1, 2, \ldots, 7 \tag{14}$$
The amended moment invariant features possess translation invariance, rotational invariance, and scale invariance.
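To make the chain from raw moments to normalized central moments concrete, the following sketch computes the first two invariants and the logarithmic compression for a small gray image in plain Python. It is an illustrative sketch only: the image is assumed to be a list of rows of gray values indexed from 1, the function names `hu_first_two` and `log_compress` are hypothetical, and only two of the seven invariants are computed to keep the example short.

```python
import math


def hu_first_two(img):
    """Return the first two Hu invariants of a gray image (list of rows)."""
    # Keep only non-zero pixels; x indexes rows and y indexes columns, from 1.
    pts = [(x, y, v)
           for x, row in enumerate(img, 1)
           for y, v in enumerate(row, 1) if v]
    m00 = sum(v for _, _, v in pts)
    x_bar = sum(x * v for x, _, v in pts) / m00
    y_bar = sum(y * v for _, y, v in pts) / m00

    def eta(p, q):
        # Central moment mu_pq, normalised by mu_00 ** gamma (mu_00 equals m_00).
        mu = sum((x - x_bar) ** p * (y - y_bar) ** q * v for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)

    c1 = eta(2, 0) + eta(0, 2)
    c2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return c1, c2


def log_compress(c):
    """Logarithmic compression of one invariant; undefined when c == 0."""
    return abs(math.log10(abs(c)))
```

Shifting the same blob to another position in the image, or transposing the image, leaves both values unchanged up to floating-point rounding, which is the translation and rotation invariance described above.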
2.1.3 Texture Characteristics. Tamura texture features, which are based on psychological research on human visual perception, were proposed by Tamura in 1978. The six components of the Tamura texture features correspond to six properties in psychology; three of them, coarseness, contrast, and directionality, have good application value in texture synthesis, image recognition, and so on.
Texture is a gray-level distribution that appears repeatedly in spatial positions, so there is a relationship between two pixels at a certain distance from each other in image space, called the gray spatial correlation property of the gray image. The gray-level co-occurrence matrix (GLCM) is a common method for studying this correlation in gray images.
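The idea of counting gray-level pairs at a fixed offset can be sketched as follows. This is a minimal plain-Python illustration, not the procedure used in the paper: it assumes a single offset (the right-hand neighbour by default), an unsymmetrised matrix stored as a dictionary, and a base-2 logarithm for the entropy. The function names `glcm` and `glcm_features` are hypothetical, and only three of the GLCM descriptors are shown.

```python
import math
from collections import Counter


def glcm(img, dr=0, dc=1):
    """Normalised co-occurrence frequencies of gray pairs at offset (dr, dc)."""
    rows, cols = len(img), len(img[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                counts[(img[r][c], img[rr][cc])] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def glcm_features(p):
    """Three common descriptors computed from the co-occurrence frequencies."""
    return {
        "angular_second_moment": sum(v * v for v in p.values()),
        "moment_of_inertia": sum((i - j) ** 2 * v for (i, j), v in p.items()),
        "entropy": -sum(v * math.log2(v) for v in p.values()),
    }
```

For a two-row image whose rows are constant (for example `[[0, 0], [1, 1]]`), only the pairs (0, 0) and (1, 1) occur, so the moment of inertia is 0 and the entropy is 1 bit.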
2.1.4 Intensity Features. Gray statistical features are a quantitative way to describe the basic features of a two-dimensional image region; from the three-dimensional perspective they are called intensity features [16]. In this paper, the extracted intensity feature components include the mean intensity, intensity variance, maximum and minimum intensity difference, skewness, kurtosis, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD).
(1) Intensity Gradient (from Inside to Outside). For an ROI $A_i$ whose voxel $S_i$ volume is greater than 0, morphological erosion is performed repeatedly, and the ratio of the mean of the area excluded by each erosion to the mean of the previous operation (initial value 0) is calculated until the ratio is zero. Consider the following equation, where $n$ is the number of operations:
$$E_4(A_i) = \frac{K}{n} \tag{15}$$
(2) Laplace Divergence Mean (LDM). According to the results of convolving the original CT image with the Laplacian, the area surrounding a nodule, where gray value differences are small, has a significantly different divergence. Therefore, calculating the Laplace divergence helps to distinguish pulmonary nodules from interfering impurities.
$$E_5(A_i) = \operatorname{mean}(A_i \times \mathrm{La}) \tag{16}$$
(3) Laplace Divergence Distance (LDD). The difference between the maximum and minimum values of the Laplace divergence is used to describe the range of regional divergence.
$$E_6(A_i) = \max(A_i \times \mathrm{La}) - \min(A_i \times \mathrm{La}) \tag{17}$$
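The two Laplace divergence features can be illustrated with a small sketch. The text does not specify the Laplacian kernel, so the code below assumes the common 2-D 4-neighbour kernel applied to a single slice, and it assumes that no ROI pixel lies on the image border; the paper's features are computed on 3-D regions and may differ in detail. The function names are hypothetical.

```python
def laplace_response(img, roi):
    """Apply the 4-neighbour Laplacian kernel at each ROI pixel (r, c)."""
    rows, cols = len(img), len(img[0])
    out = []
    for r, c in roi:
        if not (0 < r < rows - 1 and 0 < c < cols - 1):
            raise ValueError("ROI pixels must not lie on the image border")
        out.append(img[r - 1][c] + img[r + 1][c]
                   + img[r][c - 1] + img[r][c + 1] - 4 * img[r][c])
    return out


def ldm(values):
    """Laplace Divergence Mean: mean of the Laplacian responses."""
    return sum(values) / len(values)


def ldd(values):
    """Laplace Divergence Distance: range of the Laplacian responses."""
    return max(values) - min(values)
```

On a linear intensity ramp the Laplacian response is zero everywhere, so both features vanish; an isolated bright pixel produces a strongly negative response at its location.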
Table 1 shows the feature set of 42 features based on the above feature description of the ROI. To facilitate subsequent tests, features are numbered in the order shown in Table 1; that is, the shape features are numbered fs1–fs18, the intensity features fi1–fi8, and the texture features ft1–ft16.
2.2 Rough Set and Attribute Reduction. Rough set theory (RST), proposed by Pawlak in 1982, is one of the effective mathematical tools for processing fuzzy and uncertain knowledge. RST has been applied in a variety of fields such as artificial intelligence, data mining, pattern recognition, and knowledge discovery. Rough set theory is founded on the assumption that some knowledge is associated with every object of the universe of discourse. Objects characterized by the same information are similar in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is referred to as a crisp set; otherwise the set is a rough set.
Definition 1. An information system $S$ is a quadruple $S = (U, A, V, f)$, where $U$ is a nonempty and finite set of objects, $A$ is a nonempty and finite set of attributes, $V := \bigcup V_a$ with $V_a$ being the domain of attribute $a$, and $f$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.

Table 1: ROI feature set.

| Feature type | Feature vectors | Dimensionality |
| --- | --- | --- |
| Shape features (fs) | Perimeter, area, volume, roundness, rectangularity, length, Euler's number, ESV, SCDSTD, ERCLD, Hu moment | 18 |
| Intensity features (fi) | Mean intensity, intensity standard variance, maximum-minimum intensity difference value of variance, skewness, kurtosis, intensity gradient (from inside to outside), LDM, LDD | 8 |
| Texture features (ft) | Tamura texture features (contrast, direction, roughness); GLCM (angular second moment, moment of inertia, torque deficit, sum mean, variance, sum variance, difference variance, entropy, sum entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient) | 16 |
For a subset $P$ of $A$, let us define the corresponding equivalence relation as
$$\mathrm{IND}(P) = \{(x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P\} \tag{18}$$
and denote the equivalence class of $\mathrm{IND}(P)$ which contains the object $x \in U$ by $[x]_P$, that is,
$$[x]_P = \{y \in U \mid (x, y) \in \mathrm{IND}(P)\} \tag{19}$$
The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$, that is, $U/P = \{[x]_P \mid x \in U\}$.
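The factor set $U/P$ is simply a grouping of objects by their values on the attributes in $P$. A minimal sketch, assuming the information system is stored as a dictionary that maps each object to a dictionary of attribute values (the representation and the function name `partition` are illustrative choices, not part of the paper):

```python
from collections import defaultdict


def partition(table, attrs):
    """Return U/P: objects grouped by identical values on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        # Objects with the same value tuple are indiscernible with respect to attrs.
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())
```

Adding attributes to $P$ can only split equivalence classes further, never merge them, which is why a larger attribute set discerns at least as many objects.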
As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, based on a specialization of the notion of independence due to Marczewski. Much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems, for example, reduction approaches based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Definition 2. Let $S = (U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) = \{x \in U \mid [x]_P \subseteq X\}$ and $\overline{R}_P(X) = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$ are called the $P$-lower and $P$-upper approximations of $X$, respectively.
Figure 2: Optimal hyperplane.
Definition 3. Let $S = (U, A, V, f)$ be an information system and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the $P$-positive region of $Q$, where $\underline{R}_P(X)$ is the $P$-lower approximation of $X$.
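The lower and upper approximations and the positive region can be sketched directly from their definitions. The example below is a plain-Python illustration under the assumption that the information system is a dictionary mapping each object to its attribute values; all function names are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """U/P: objects grouped by identical values on the attributes in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_upper(table, attrs, target):
    """P-lower and P-upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for cls in partition(table, attrs):
        if cls <= target:      # class lies entirely inside the target set
            lower |= cls
        if cls & target:       # class overlaps the target set
            upper |= cls
    return lower, upper


def positive_region(table, p_attrs, q_attrs):
    """POS_P(Q): union of the P-lower approximations of the classes of U/Q."""
    pos = set()
    for block in partition(table, q_attrs):
        pos |= lower_upper(table, p_attrs, block)[0]
    return pos
```

Objects outside the positive region are those whose condition values are shared with at least one object from a different decision class, so they cannot be classified with certainty from the attributes in $P$.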
Definition 4. Let $S = (U, A, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C - \{a\}}(D)$, $a$ is said to be $D$-dispensable in $C$; otherwise $a$ is said to be $D$-indispensable in $C$. The set of all the $D$-indispensable attributes is called the core of $S$ and denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each of the attributes of $P$ is $D$-indispensable, then $P$ is called a reduct of $S$.
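One simple way to obtain a reduct from these definitions is backward elimination: try to drop each conditional attribute in turn and keep the removal whenever the positive region is unchanged. The sketch below illustrates this in plain Python. It is one possible positive-region-based procedure, assumed here for illustration, and not necessarily the reduction algorithm used in this paper; the function names and the dictionary representation are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Objects grouped by identical values on the attributes in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def positive_region(table, cond, dec):
    """Objects whose condition class falls entirely inside one decision class."""
    blocks = partition(table, dec)
    pos = set()
    for cls in partition(table, cond):
        if any(cls <= block for block in blocks):
            pos |= cls
    return pos


def core(table, cond, dec):
    """Attributes whose removal from cond changes the positive region."""
    full = positive_region(table, cond, dec)
    return [a for a in cond
            if positive_region(table, [b for b in cond if b != a], dec) != full]


def reduct(table, cond, dec):
    """Greedy backward elimination; the result depends on the order of cond."""
    full = positive_region(table, cond, dec)
    kept = list(cond)
    for a in cond:
        trial = [b for b in kept if b != a]
        if positive_region(table, trial, dec) == full:
            kept = trial
    return kept
```

Because the outcome depends on the order in which attributes are tried, different runs can return different reducts of the same decision system, while the core is contained in every one of them.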
2.3 SVM and Its Optimization. SVM is a pattern recognition method developed from statistical learning theory and based on the principle of structural risk minimization. While ensuring classification accuracy, SVM can improve the generalization ability of the learning machine by maximizing the classification interval. The biggest advantage of SVM is that it overcomes overlearning and high dimensionality, both of which lead to computational complexity and local extremum problems. A reliable classification model based on SVM is urgently needed for the study of hospitalization expenses of patients with gastric cancer.
For linearly separable data (Figure 2), assume that there is a data set $S = \{x_1, \ldots, x_n\}$ with data labels $G = \{y_1, \ldots, y_n\}$, where $x_i$ is the input space vector of a data sample and $y_i$ records the category of the sample. The aim of SVM is to find an optimal hyperplane $H$ that separates the two classes of samples with the largest interval. The optimal hyperplane $H$ is expressed as
$$w^{T} x + b = 0 \tag{20}$$
where $w$ is the weight vector and $b$ is the threshold. This problem is transformed into an optimization problem over $w$ and $b$:
$$\min_{w, b} \; r(w) = \frac{1}{2} \|w\|^{2} \quad \text{s.t.} \quad y_i \left( (w \cdot x_i) + b \right) \ge 1, \quad i = 1, \ldots, n \tag{21}$$
In order to simplify the formula, the Lagrange dual is introduced, subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into
$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n \tag{22}$$
For linearly inseparable data, the penalty parameter $C$ and slack variable $\xi$ are introduced into the constraint condition, so that the generalization ability of SVM is increased, and the problem is transformed into
$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C \tag{23}$$
where $C$ is a manually set parameter. The larger $C$ is, the more heavily misclassified training samples are penalized, which narrows the separation interval and increases the risk of poor generalization.
The final classification function is
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*} \right) \tag{24}$$
For nonlinearly separable data, SVM transforms the data into linearly separable data in a high-dimensional space via the nonlinear mapping of a kernel function, and the optimal hyperplane is found in that high-dimensional space. A kernel function that satisfies the Mercer condition corresponds to the inner product in a transformed space and is used to realize the nonlinear transformation of the linear classifier.
The corresponding kernel function is defined as
$$K(x_i, x) = \left( \varphi(x_i) \cdot \varphi(x) \right) \tag{25}$$
At this point the final classification function is
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \tag{26}$$
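To show how a kernel classification function is evaluated once the multipliers are known, the sketch below applies it with a radial basis function (RBF) kernel. The choice of the RBF kernel $K(u, v) = \exp(-g \|u - v\|^2)$ is an assumption made for illustration (the text names a kernel parameter $g$ but does not state the kernel), the threshold is placed inside the sign function, and the support vectors, multipliers, and function names in the example are made up rather than taken from a trained model.

```python
import math


def rbf_kernel(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2); g is the kernel parameter."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def decide(x, support_vectors, labels, alphas, b, g):
    """Sign of the kernel expansion plus threshold; a zero score maps to +1 here."""
    score = sum(alpha * y * rbf_kernel(sv, x, g)
                for sv, y, alpha in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1
```

With one support vector per class, a test point is assigned to the class of the nearer support vector, because the kernel value decays with squared distance.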
The penalty factor $C$ and the kernel function parameter $g$ play an extremely important role in the performance of SVM classification. In order to obtain the optimal classification results, a grid optimization model is used in this paper. In the grid optimization model, the parameters to be searched are expressed as grids in a certain space, and the optimal parameters are selected by traversing all the grid points. The grid optimization model therefore has the advantages of simplicity, convenience, good stability, and ease of finding the global optimal solution [24]. In the learning process of SVM, 10-fold cross-validation is used to find the kernel function parameters and penalty coefficient with the optimal classification performance, which are then applied to the SVM classifier for recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the detection performance in the relevant experiments.
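The grid traversal with k-fold cross-validation can be sketched independently of any particular SVM library. In the sketch below, `train_and_score` is a hypothetical callback that would train a classifier with the given $(C, g)$ on the training indices and return its accuracy on the test indices; the fold construction, the function names, and the random seed are assumptions for illustration and do not reproduce the authors' Matlab/LibSVM setup.

```python
import itertools
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search(train_and_score, n_samples, c_grid, g_grid, k=10):
    """Return (best C, best g, best mean score) over all grid points."""
    folds = k_fold_indices(n_samples, k)
    best_c, best_g, best_score = None, None, float("-inf")
    for c, g in itertools.product(c_grid, g_grid):
        scores = []
        for i, test_idx in enumerate(folds):
            # All folds except the i-th form the training set.
            train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
            scores.append(train_and_score(c, g, train_idx, test_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_c, best_g, best_score = c, g, mean_score
    return best_c, best_g, best_score
```

A typical call would pass exponentially spaced candidate values, for example `c_grid = [2.0 ** e for e in range(-5, 16, 2)]`, which is a common choice assumed here rather than one stated in the text. The search cost grows with the product of the two grid sizes and the number of folds.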
3 Pulmonary Nodule Detection Model
In this paper, CT images of 70 patients with pulmonary nodules are used. The images are first segmented [7] into three types of pulmonary nodules marked by radiologists (solitary pulmonary nodules, SPN; vascular adhesion pulmonary nodules, VAPN; and pleural adhesion pulmonary nodules, PAPN), as well as a large number of nonnodular areas including blood vessels, bones, and alveoli. Forty-two feature components characterizing the ROI are extracted from the 2-dimensional and 3-dimensional perspectives, including the six new 3-dimensional features proposed in this paper. They are composed of 18 shape features, 8 intensity features, and 16 texture features. The extracted feature set (identified as FS) is discretized and normalized. Feature-level fusion of the preprocessed feature data is then performed five times using the rough set model (since the reduct produced by rough set reduction is not unique, the extracted feature set is reduced five times, and the results are identified as RS1, RS2, RS3, RS4, and RS5). Feature subset RS1 is used for the comparative experiments. Finally, the SVM parameters are optimized using the grid optimization model, and the improved SVM is used in the following four sets of comparative experiments: comparative analysis of the effectiveness and of the stability of classification before and after rough set reduction of features; comparative analysis of the recognition performance before and after feature-level fusion based on rough set or PCA; and comparative analysis of the recognition performance of our proposed method and other methods. Based on the above, the flow chart of the pulmonary nodule detection model is shown in Figure 3.
4 Results and Discussion
4.1 Experimental Environments. In this paper, the hardware and software environments are as follows.
Software Environments. Windows 7 OS, Matlab R2014b, ImageJ 1.48u, and LibSVM.
Hardware Environments. Intel Core i5-4670 at 3.4 GHz, 8.0 GB of memory, and 500 GB hard disk.
Experimental Data. CT images of 70 patients with pulmonary nodules are collected as experimental samples; they are marked by radiologists and have a size of 512 × 512 and a thickness of 2 mm. They are composed of 2232 CT images from 38 patients with solitary pulmonary nodules (SPN), 17 patients with vascular adhesion pulmonary nodules (VAPN), and 15 patients with pleural adhesion pulmonary nodules (PAPN).
Figure 3: Flow chart of pulmonary nodule detection model.
Figure 4 shows the representatives of each type of pulmonarynodules and the corresponding segmentation results
In this paper, 42-dimensional features of 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas are extracted. Table 2 shows the 42-dimensional feature values of the lung nodular and nonnodular areas; shape features are identified as fs, intensity features as fi, and texture features as ft. In order to intuitively understand the distribution of different feature values and compare their discrimination, the external spherical volume (ESV) ratio and the surface-center distance standard deviation (SCDSTD) are calculated and plotted as boxplots, as shown in Figure 5.
4.2 Feature-Level Fusion Based on Rough Set. In order to prevent attributes with a small range of values from being dominated by those with a large range of values, and to reduce the complexity of the statistical computation, the extracted feature sets are first preprocessed by normalizing data with large differences and linearly mapping the data to [0, 1]. The preprocessed feature data are then fused five times using the rough set model. The fusion results are shown in Table 3.
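The linear mapping to [0, 1] is ordinary min-max normalization applied to each feature separately. A minimal sketch, assuming the features are stored as a dictionary from feature name to a list of values (the representation, the function name, and the handling of constant features are illustrative assumptions):

```python
def min_max_normalize(columns):
    """Linearly map every feature column to the interval [0, 1]."""
    out = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # Assumption: a constant feature carries no information and maps to 0.
        out[name] = [(v - lo) / span if span else 0.0 for v in values]
    return out
```

After this step, a feature measured in hundreds of pixels and one measured as a ratio below 1 contribute on the same scale to kernel distances.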
Figure 4: Pulmonary nodule segmentation results. (a) Original image of SPN; (b) segmentation result of SPN; (c) original image of VAPN; (d) segmentation result of VAPN; (e) original image of PAPN; (f) segmentation result of PAPN.
Table 2: Feature values of pulmonary nodular areas and nonnodular areas.
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
Figure 5: Boxplots of nodular areas and nonnodular areas. (a) Boxplot of ESV; (b) boxplot of SCDSTD. "+" refers to the upper and lower bounds of the ESV value and SCDSTD value.
Table 3: Feature reduction based on rough sets.

| Feature subset | Reduction results | Dimensionality |
| --- | --- | --- |
| RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21 |
| RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17 |
| RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20 |
4.3 Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1 The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
Experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion (from a mean of 94.64% to 99.17%), with a decrease in the missed diagnosis rate, reflected by the increased sensitivity, and in the misdiagnosis rate, reflected by the increased specificity. The processing time is also shorter after reduction. These results indicate that the feature-level fusion of the extracted 42-dimensional feature set based on the rough set model is effective: it not only improves the compactness of the feature set (by eliminating redundant and poorly discriminating feature components) but also corrects the abnormal data of the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
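The link between sensitivity and missed diagnoses, and between specificity and misdiagnoses, follows from how the indexes are computed from the confusion counts. A small sketch, assuming labels are encoded as 1 for nodular and 0 for nonnodular areas (the encoding and the function name are assumptions for illustration):

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # nodules detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # nodules missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonnodules rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    # Note: sensitivity/specificity are undefined if a class has no samples.
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

The missed diagnosis rate is 1 minus the sensitivity and the misdiagnosis rate is 1 minus the specificity, so raising either index lowers the corresponding error rate.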
4.3.2 The Model Stability Experiment. The feature data of pulmonary nodules are tested with RS1 (70 × 21) as the dataset for classification in five rounds with different ratios of training set to testing set: 50:20, 40:30, 35:35, 30:40, or 20:50. Each round of test is carried out with a randomly selected ratio of training set to testing set, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
The experimental results show that, as the ratio of training set to testing set decreases, the decrease in the classification accuracy of the feature subset after rough set reduction is not obvious, whereas the accuracy of the feature set before rough set reduction fluctuates to a certain extent (Figure 6 shows this more intuitively). These results indicate that the classification stability of the feature-level fusion model based on rough set is higher and that the model is less susceptible to the interference of sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
433 The Superiority of Feature-Level Fusion Model Basedon Rough Set Since PCA is a well-developed model char-acterized by simple calculation and easy programming ithas become the preferred dimension reduction method formost of the feature-level fusion model in order to analyzecomparatively two types of feature-level fusions In this paperPCA-based feature-level fusion of the extracted feature sets isperformed at the same time and the tenfold cross-validationresults are shown in Table 8 Figure 7 shows the classificationperformance of the two types of feature-level fusion methods
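Table 4: Statistics of effectiveness before and after rough set reduction.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | Processing time (s) |
|---|---|---|---|---|
| Before reduction | | | | |
| 1 | 96.42 | 92.86 | 100 | 1.0610 |
| 2 | 91.96 | 83.93 | 100 | 0.6170 |
| 3 | 95.54 | 100 | 91.07 | 0.5490 |
| 4 | 89.28 | 100 | 78.57 | 0.5630 |
| 5 | 95.54 | 91.07 | 100 | 0.5470 |
| 6 | 98.21 | 96.43 | 100 | 0.5460 |
| 7 | 94.64 | 89.29 | 100 | 0.5460 |
| 8 | 95.53 | 91.07 | 100 | 0.5460 |
| 9 | 91.96 | 83.93 | 100 | 0.5460 |
| 10 | 97.32 | 100 | 96.64 | 0.5300 |
| Mean | 94.64 | 92.86 | 96.43 | 0.6051 |
| After reduction (RS1) | | | | |
| 1 | 100 | 100 | 100 | 0.9370 |
| 2 | 100 | 100 | 100 | 0.4360 |
| 3 | 100 | 100 | 100 | 0.3870 |
| 4 | 100 | 100 | 100 | 0.4210 |
| 5 | 100 | 100 | 100 | 0.4210 |
| 6 | 100 | 100 | 100 | 0.3900 |
| 7 | 100 | 100 | 100 | 0.4060 |
| 8 | 91.67 | 100 | 83.33 | 0.4060 |
| 9 | 100 | 100 | 100 | 0.3740 |
| 10 | 100 | 100 | 100 | 0.3930 |
| Mean | 99.17 | 100 | 98.33 | 0.4571 |
| Increase after reduction | 4.53 | 7.14 | 1.9 | 0.148 |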
Table 5: Effectiveness of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Processing time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.4571 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.4650 |
| RS3 | 99.17 | 100 | 98.33 | 0.4656 |
| RS4 | 100 | 100 | 100 | 0.4731 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.4850 |
| Mean | 98.83 | 99 | 98.66 | 0.4672 |
(feature subset RS1 from Table 3 is used, and the running time is plotted at 100 times the actual time).
Experimental results show that the performance indicators of the feature-level fusion model based on rough set are better than those of the model based on PCA, indicating that the rough set is more suitable than PCA for eliminating redundant information.
4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other
Figure 6: Comparative results of feature subsets before and after rough set reduction (series: before reduction, after reduction; horizontal axis: training set to testing set ratios 50:20, 40:30, 35:35, 30:40, and 20:50; vertical axis ticks from 92 to 101).
detection methods of pulmonary nodules (the optimal detection accuracy is used for all detection methods). The results are shown in Table 9 (Pr: private database; L: LIDC).
Experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent, indicating that this method not only improves the comprehensiveness and accuracy of the feature description of the ROI by supplementing and improving the
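Table 6: Stability statistics of rough set reduction subsets.

| Training set : testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s) |
|---|---|---|---|---|
| Before fusion | | | | |
| 50:20 | 97.35 | 94.71 | 100 | 0.4873 |
| 40:30 | 96.53 | 93.08 | 98.32 | 0.3846 |
| 35:35 | 95.83 | 92.39 | 97.79 | 0.4254 |
| 30:40 | 96.16 | 95.58 | 96.74 | 0.3560 |
| 20:50 | 94.88 | 94.63 | 95.86 | 0.4236 |
| Mean | 96.15 | 94.08 | 97.742 | 0.4154 |
| After fusion (RS1) | | | | |
| 50:20 | 99.71 | 99.41 | 100 | 0.2684 |
| 40:30 | 98.96 | 99.58 | 98.46 | 0.2568 |
| 35:35 | 98.65 | 99.23 | 98.08 | 0.2382 |
| 30:40 | 98.37 | 98.60 | 98.14 | 0.2646 |
| 20:50 | 98.25 | 97.67 | 98.84 | 0.2636 |
| Mean | 98.79 | 98.84 | 98.70 | 0.2583 |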
Table 7: Classification performance of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.2583 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.2870 |
| RS3 | 99.17 | 100 | 98.33 | 0.2560 |
| RS4 | 100 | 100 | 100 | 0.2531 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.2656 |
| Mean | 98.834 | 99 | 98.66 | 0.2620 |
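Table 8: Classification performance of feature reduction based on PCA.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s) |
|---|---|---|---|---|
| 1 | 91.67 | 83.33 | 100 | 0.9970 |
| 2 | 96.74 | 93.48 | 100 | 0.4830 |
| 3 | 96.74 | 93.48 | 100 | 0.4880 |
| 4 | 98.91 | 100 | 97.83 | 0.4950 |
| 5 | 93.48 | 86.96 | 100 | 0.4950 |
| 6 | 96.74 | 100 | 93.48 | 0.5140 |
| 7 | 96.74 | 100 | 93.48 | 0.5120 |
| 8 | 94.57 | 89.13 | 100 | 0.4890 |
| 9 | 97.83 | 95.65 | 100 | 0.4990 |
| 10 | 95.65 | 93.48 | 97.83 | 0.5180 |
| Mean | 95.91 | 93.55 | 98.26 | 0.5490 |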
feature components but also improves the compactness of the feature set by applying feature-level fusion based on rough set to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.
Table 9: Comparison of the performance of different lung nodule detection methods.

| Author | Database | Nodule numbers | Accuracy (%) | FPs |
|---|---|---|---|---|
| Santos et al. [11] | L | 260 | 88.4 | 1.17 |
| Magalhaes Barros Netto et al. [12] | L | 48 | 90.65 | 0.138 |
| Ye et al. [13] | Pr | 220 | 90.2 | 8.2 |
| Tan et al. [14] | L | 172 | 87.5 | 4 |
| Cascio et al. [16] | L | 148 | 97 | 6.1 |
| Our method | Pr | 70 | 99.17 | 0.47 |
Figure 7: Comparison of two feature-level fusion models (series: rough set, PCA; categories: accuracy, sensitivity, specificity, running time; vertical axis ticks from 0 to 120).
5. Conclusions

In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model is proposed based on rough set feature-level fusion. To address insufficient feature description and inaccurate characterization in the process of feature extraction, six new 3D features, in combination with other 2D and 3D features, are proposed to extract and quantify the feature information of the ROI in this model. Rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets, since there is redundant information in the extracted high-dimensional features. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the SVM is used as the classifier for detection and recognition of pulmonary nodules. Finally, the pulmonary nodule detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed pulmonary nodule detection method based on rough set feature-level fusion is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules; it is therefore of great value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work is partially supported by National Natural Science Foundation of China (Grant no. 61561040), Natural Science Foundation of Ningxia (Grant no. NZ16067), and Scientific Research Fund of Ningxia Education Department (Grant no. NGY2016084).
References
[1] S. P. Riaz, M. Luchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, “Trends in incidence of small cell lung cancer and all lung cancer,” Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, “Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms,” Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, “3-D segmentation algorithm of small lung nodules in spiral CT images,” IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., “Effect of CAD on radiologists' detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size,” Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, “Automated identification of dementia using FDG-PET imaging,” BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, “Brain MRI image segmentation based on learning local variational Gaussian mixture models,” Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, “Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains,” Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, “Neutro-connectedness cut,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., “Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM,” Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhaes Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, “Automatic segmentation of lung nodules with growing neural gas and support vector machine,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, “Shape-based computer-aided detection of lung nodules in thoracic CT images,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, “A novel computer-aided lung nodule detection system for CT images,” Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, “Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier,” Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, “A survey on rough set theory and applications,” Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, “A novel neighborhood rough set based classification approach for medical diagnosis,” Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, “Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis,” Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., “Computer-aided diagnosis of the solitary pulmonary nodule,” Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, “Fuzzy rough sets hybrid scheme for breast cancer detection,” Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, “A method of effective text extraction for complex video scene,” Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, “Review and performance comparison of SVM- and ELM-based classifiers,” Neurocomputing, vol. 128, pp. 507–516, 2014.
Figure 1: Three-dimensional character sketch. (a) Coordinate system; (b) ESV; (c) SCDSTD; (d) ERCLD.
$\dim(L_i)$ ($i = 1, 2, \ldots, 12$) of its 12 intersecting lines, which may indicate that the regional voxel is evenly distributed in the rectangular body, as shown in Figure 1(d):

$$E_3(A) = \frac{\left\| \operatorname{mean}\left( C_{\mathrm{cen}}(A_i) - C_{\mathrm{cen}}(\dim(L_i)) \right) \right\|}{\operatorname{mean}\left( C_{\mathrm{cen}}(\dim(L_i)) \right)} \tag{9}$$
2.1.2. Hu Moment Characteristics. Moments and the related invariants have been extensively analyzed to characterize the patterns in images. The moment invariants are independent not only of position, size, and orientation but also of parallel projection. Hu [23] was the first to derive the central moment invariants. The central geometric moment invariants are derived from algebraic invariants and include six absolute orthogonal invariants and one skew orthogonal invariant. The moment invariants have been proved to be adequate measures for tracing image patterns under translation, scaling, and rotation of the image.
Hu moment invariants define seven values, computed by normalizing central moments through order three, which are invariant to object scale, position, and orientation; a large number of papers have made significant contributions to the application of Hu moments. Two-dimensional moments of a digitally sampled $M \times N$ image with gray function $f(x, y)$ ($x = 1, 2, \ldots, M$; $y = 1, 2, \ldots, N$) are given as

$$m_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} x^{p} y^{q} f(x, y), \quad p, q = 0, 1, 2, 3, \ldots \tag{10}$$

The central moments, which do not change when $f(x, y)$ is translated by an amount $(a, b)$, are defined as

$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}}, \quad \mu_{pq} = \sum_{x=1}^{M} \sum_{y=1}^{N} (x - \bar{x})^{p} (y - \bar{y})^{q} f(x, y), \quad p, q = 0, 1, 2, 3, \ldots \tag{11}$$
When a scaling normalization is applied, the central moments change as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \quad \gamma = \frac{p + q}{2} + 1 \tag{12}$$

In terms of the normalized central moments, the seven moments are given as

$$\begin{aligned}
C_1 &= \eta_{20} + \eta_{02}, \\
C_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, \\
C_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2, \\
C_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2, \\
C_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
    &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right], \\
C_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}), \\
C_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
    &\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right].
\end{aligned} \tag{13}$$
The seven Hu moment invariants vary widely in magnitude. To make them comparable, a logarithmic function is used to compress the data, and hence the actual moment invariant features are $C'_K$:

$$C'_K = \left| \log_{10} \left| C_K \right| \right|, \quad K = 1, 2, \ldots, 7 \tag{14}$$

The amended moment invariant features possess translation invariance, rotational invariance, and scale invariance.
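As an illustration of (10)–(14), the following is a minimal pure-Python sketch rather than the authors' implementation. The function names `hu_invariants` and `log_compress`, and the representation of the image as a list of rows, are assumptions made for this example only.

```python
import math

def hu_invariants(img):
    """Return C1..C7 of (13) for a 2-D gray image given as a list of rows."""
    pts = [(x + 1, y + 1, v)                      # 1-based coordinates, as in (10)
           for x, row in enumerate(img) for y, v in enumerate(row)]
    m00 = sum(v for _, _, v in pts)
    xb = sum(x * v for x, _, v in pts) / m00      # centroid, as in (11)
    yb = sum(y * v for _, y, v in pts) / m00

    def eta(p, q):
        mu = sum((x - xb) ** p * (y - yb) ** q * v for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)      # mu_00 equals m_00, see (12)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n03 + n21
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n12 - n30) * b * (3 * a ** 2 - b ** 2),
    ]

def log_compress(invariants):
    """Apply (14); assumes that no invariant is exactly zero."""
    return [abs(math.log10(abs(c))) for c in invariants]
```

On a small asymmetric test image, the values returned by `hu_invariants` should agree, up to floating-point rounding, before and after a 90-degree rotation or a shift of the image inside a larger zero background. Scale invariance holds only approximately on a discrete pixel grid.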
2.1.3. Texture Characteristics. Tamura texture features, which are based on psychological research into human visual perception, were proposed by Tamura in 1978. The six components of the Tamura texture feature correspond to six psychological properties; three of them (coarseness, contrast, and directionality) have good application value in texture synthesis, image recognition, and so on.
Texture is a gray-level distribution that appears repeatedly across spatial positions, so there is a relationship between two pixels at a given distance from each other in image space, called the gray spatial correlation property of a gray image. The gray-level co-occurrence matrix (GLCM) is a common method for studying this relationship.
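The co-occurrence idea can be sketched in a few lines of pure Python. This is an illustrative sketch, not the code used in the paper: the function names, the single pixel offset `(dx, dy)`, and the choice of three statistics (angular second moment, moment of inertia, entropy, all named in Table 1) are assumptions for the example, and the image is assumed to be already quantized to integer gray levels `0..levels-1`.

```python
import math

def glcm(img, dx, dy, levels):
    """Normalized co-occurrence matrix for pixel pairs offset by (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
                total += 1
    # Assumes at least one valid pixel pair exists (total > 0).
    return [[n / total for n in row] for row in counts]

def angular_second_moment(p):
    return sum(v * v for row in p for v in row)

def moment_of_inertia(p):
    """Also known as contrast: weights each entry by (i - j)^2."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))

def entropy(p):
    return -sum(v * math.log2(v) for row in p for v in row if v > 0)
```

In practice, several offsets (for example the four directions 0, 45, 90, and 135 degrees) are often computed and the resulting statistics averaged; whether the paper does so is not stated in this section.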
2.1.4. Intensity Features. The gray statistical feature is a quantitative way to describe the basic features of a two-dimensional image region; from the three-dimensional perspective it is called the intensity feature [16]. In this paper, the extracted components of the intensity features include the mean intensity, intensity variance, maximum-minimum intensity difference, skewness, kurtosis, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD).

(1) Intensity Gradient (from Inside to Outside). For an ROI $A_i$ whose voxel volume $S_i$ is greater than 0, morphological erosion is performed repeatedly, and the ratio of the mean of the area excluded by each erosion to the mean of the previous operation (initial value 0) is calculated until the ratio is zero. Consider the following equation, where $n$ is the number of operations:

$$E_4(A_i) = \frac{K}{n} \tag{15}$$

(2) Laplace Divergence Mean (LDM). From the result of convolving the original CT image with the Laplacian, it is found that the area surrounding a nodule, where gray values differ only slightly, has a significantly different divergence. Therefore, calculating the Laplace divergence helps to distinguish pulmonary nodules from interfering structures:

$$E_5(A_i) = \operatorname{mean}(A_i \times \mathrm{La}) \tag{16}$$

(3) Laplace Divergence Distance (LDD). The difference between the maximum and minimum values of the Laplace divergence is used to describe the range of regional divergence:

$$E_6(A_i) = \max(A_i \times \mathrm{La}) - \min(A_i \times \mathrm{La}) \tag{17}$$
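A possible reading of (16) and (17) is sketched below in pure Python. Several points are assumptions for illustration and are not stated in the text: the operator $\mathrm{La}$ is taken to be the 4-neighbour discrete Laplacian, the sketch works on a single 2-D slice rather than a 3-D volume, border pixels are replicated, and the ROI $A_i$ is passed as a boolean mask. The names `laplacian` and `ldm_ldd` are hypothetical.

```python
def laplacian(img):
    """4-neighbour discrete Laplacian of a 2-D image; borders are replicated."""
    rows, cols = len(img), len(img[0])

    def at(r, c):
        return img[min(max(r, 0), rows - 1)][min(max(c, 0), cols - 1)]

    return [[at(r - 1, c) + at(r + 1, c) + at(r, c - 1) + at(r, c + 1) - 4 * img[r][c]
             for c in range(cols)] for r in range(rows)]

def ldm_ldd(img, mask):
    """Return (LDM, LDD): mean and max-min range of the Laplacian inside the mask."""
    lap = laplacian(img)
    vals = [lap[r][c]
            for r in range(len(img)) for c in range(len(img[0])) if mask[r][c]]
    # Assumes the mask selects at least one pixel.
    return sum(vals) / len(vals), max(vals) - min(vals)
```

For a perfectly uniform region both values are zero, while an isolated bright pixel produces a large LDD, which matches the intent of using the range of divergence as a discriminating feature.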
Table 1 shows the feature set of 42 features based on the above feature description of the ROI. To facilitate subsequent tests, features are numbered in the order shown in Table 1; that is, the shape features are numbered fs1–fs18, the intensity features are numbered fi1–fi8, and the texture features are numbered ft1–ft16.
2.2. Rough Set and Attribute Reduction. Rough set theory (RST), proposed by Pawlak in 1982, is one of the effective mathematical tools for processing fuzzy and uncertain knowledge. Nowadays, RST has been applied to a variety of fields such as artificial intelligence, data mining, pattern recognition, and knowledge discovery. Rough set theory is founded on the assumption that some knowledge is associated with every object of the universe of discourse. Objects characterized by the same information are indiscernible in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of mutually indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is referred to as a crisp set; otherwise, the set is a rough set.
Definition 1. An information system $S$ is a quadruple $S = (U, A, V, f)$, where $U$ is a nonempty finite set of objects, $A$ is a nonempty finite set of attributes, $V = \bigcup_{a \in A} V_a$ with $V_a$ being the domain of attribute $a$, and $f$ is an information
Table 1: ROI feature set.

| Feature type | Feature vectors | Dimensionality |
|---|---|---|
| Shape features (fs) | Perimeter, area, volume, roundness, rectangularity, length, Euler's number, ESV, SCDSTD, ERCLD, Hu moment | 18 |
| Intensity features (fi) | Mean intensity, intensity standard variance, maximum-minimum intensity difference value of variance, skewness, kurtosis, intensity gradient (from inside to outside), LDM, LDD | 8 |
| Texture features (ft) | Tamura texture features (contrast, direction, roughness); GLCM (angular second moment, moment of inertia, torque deficit, sum mean, variance, sum variance, difference variance, entropy, sum entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient) | 16 |
function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.
For a subset $P$ of $A$, let us define the corresponding equivalence relation as

$$\mathrm{IND}(P) = \{ (x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P \} \tag{18}$$

and denote the equivalence class of $\mathrm{IND}(P)$ which contains the object $x \in U$ by $[x]_P$; that is,

$$[x]_P = \{ y \in U \mid (x, y) \in \mathrm{IND}(P) \} \tag{19}$$

The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$; that is, $U/P = \{ [x]_P \mid x \in U \}$.
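The relation (18) and the classes (19) can be computed by grouping objects on their attribute values. The sketch below is illustrative only; the table layout (a dictionary mapping each object to a dictionary of attribute values) and the names `partition` and `eq_class` are assumptions for this example.

```python
from collections import defaultdict

def partition(table, attrs):
    """Return U/P: objects grouped by identical values on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def eq_class(table, attrs, x):
    """Return [x]_P, the block of U/P that contains object x."""
    return next(block for block in partition(table, attrs) if x in block)
```

For example, with three objects where objects 1 and 3 share the same values on attributes `a` and `b`, `partition(table, ["a", "b"])` returns the two blocks `{1, 3}` and `{2}`.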
As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, which is based on a specialization of the notion of independence due to Marczewski. Much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems, for example, reduction approaches based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Definition 2. Let $S = (U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) = \{ x \in U \mid [x]_P \subseteq X \}$ and $\overline{R}_P(X) = \{ x \in U \mid [x]_P \cap X \neq \emptyset \}$ are called the $P$-lower and $P$-upper approximations of $X$, respectively.
Figure 2: Optimal hyperplane (hyperplane $H$ lies between the margin hyperplanes $H_1$ and $H_2$, each at distance $r$).
Definition 3. Let $S = (U, A, V, f)$ be an information system and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the $P$-positive region of $Q$, where $\underline{R}_P(X)$ is the $P$-lower approximation of $X$.

Definition 4. Let $S = (U, A, V, f)$ be a decision system with $A = C \cup D$, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C - \{a\}}(D)$, $a$ is said to be $D$-dispensable in $C$; otherwise, $a$ is said to be $D$-indispensable in $C$. The set of all the $D$-indispensable attributes is called the core of $S$ and is denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each of the attributes of $P$ is $D$-indispensable, then $P$ is called a reduct of $S$.
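Definitions 3 and 4 suggest a simple positive-region-based reduction: drop a conditional attribute whenever the positive region is unchanged without it. The sketch below shows one such greedy scan under stated assumptions; it is not necessarily the reduction algorithm used in this paper, and the names `positive_region` and `reduct` and the dictionary-based table layout are hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    """Return the blocks of objects that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def positive_region(table, cond, dec):
    """POS_cond(dec): union of the cond-blocks that fit inside one decision class."""
    dec_classes = partition(table, dec)
    return {x for block in partition(table, cond)
            if any(block <= d for d in dec_classes) for x in block}

def reduct(table, cond, dec):
    """Greedy scan: remove each attribute whose removal keeps POS unchanged."""
    full = positive_region(table, cond, dec)
    kept = list(cond)
    for a in cond:
        trial = [c for c in kept if c != a]
        if positive_region(table, trial, dec) == full:
            kept = trial          # a is dispensable given the attributes still kept
    return kept
```

Because attributes are tested one at a time, the subset returned can depend on the order in which they are scanned, so a decision system may admit more than one reduct.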
2.3. SVM and Its Optimization. SVM is a pattern recognition method developed from statistical learning theory and based on the principle of structural risk minimization. While ensuring classification accuracy, SVM can improve the generalization ability of the learning machine by maximizing the classification margin. The biggest advantage of SVM is that it overcomes overlearning and the problems of high dimensionality, both of which lead to computational complexity and local extremum problems. A reliable classification model based on SVM is urgently needed for the study of hospitalization expenses of patients with gastric cancer.
SVM first deals with linearly separable data (Figure 2). The assumption is that there are a data set $S = \{x_1, \ldots, x_n\}$ and data markers $G = \{y_1, \ldots, y_n\}$, where $x_i$ is the input space vector of the data sample and $y_i$ records the category of the sample. The aim of SVM is to find an optimal hyperplane $H$ that separates the two classes of samples with the largest margin. The optimal hyperplane $H$ is expressed as

$$w^{T} x + b = 0 \tag{20}$$

where $w$ is the weight vector and $b$ is the threshold.
This problem is transformed into the optimization problem of $w$ and $b$:

$$\min_{w, b} \; r(w) = \frac{1}{2} \|w\|^{2} \quad \text{s.t.} \quad y_i \left( (w \cdot x_i) + b \right) \ge 1, \quad i = 1, \ldots, n \tag{21}$$
In order to simplify the formula, the Lagrange dual is introduced, subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into

$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n \tag{22}$$
For linearly inseparable data, the penalty parameter $C$ and the slack variables $\xi$ are introduced into the constraint condition; this increases the generalization ability of SVM, and the function is transformed into

$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C \tag{23}$$

where $C$ is a manually set parameter. In practice, a larger $C$ penalizes misclassified training samples more heavily, which fits the training data more closely but at the same time increases the risk of poor generalization.
The final classification function is

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*} \right) \tag{24}$$

For data that are not linearly separable, SVM maps them into a high-dimensional space through the nonlinear mapping of a kernel function, where they become linearly separable, and the optimal hyperplane is found in that high-dimensional space. A kernel function that meets the Mercer condition corresponds to the inner product in the transformed space and is used to realize the nonlinear extension of linear classification.
The corresponding kernel function is defined as

$$K(x_i, x) = \left( \varphi(x_i) \cdot \varphi(x) \right) \tag{25}$$

At this point, the final classification function is

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right) \tag{26}$$
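To make (26) concrete, the following sketch evaluates the decision function for given support vectors and multipliers. It assumes a Gaussian (RBF) kernel with width parameter `g`; the kernel type is an assumption for illustration, since the text does not name it in this section. The multipliers, labels, and threshold are taken as given rather than trained, and the function names are hypothetical.

```python
import math

def rbf_kernel(u, v, g):
    """K(u, v) = exp(-g * ||u - v||^2); the RBF form is assumed for illustration."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decide(x, support_vectors, labels, alphas, b, g):
    """Evaluate (26): the sign of sum_i alpha_i * y_i * K(x_i, x) + b."""
    s = sum(a * y * rbf_kernel(sv, x, g)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + b >= 0 else -1   # a score of exactly 0 is mapped to +1 here
```

With one support vector per class, a query point is assigned the label of the nearer support vector, since the kernel value decays with squared distance.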
The penalty factor $C$ and the parameter $g$ of the kernel function play an extremely important role in the performance of SVM classification. In order to obtain the optimal classification results, a grid optimization model is used in this paper. In the grid optimization model, the parameters to be searched are laid out as a grid over a given range, and the optimal parameters are selected by traversing all grid points. The grid optimization model therefore has the advantages of simplicity, convenience, and good stability, and it readily finds the best solution on the grid [24]. In the learning process of SVM, 10-fold cross-validation is used to find the kernel function parameter and penalty coefficient with the best classification performance, which are then applied to the SVM classifier for recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the detection performance in the experiments.
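The grid traversal with k-fold cross-validation described above can be sketched as follows. This is a generic illustration, not the LibSVM tooling used in the experiments: `fit_score` is a hypothetical callback that trains a classifier on the training indices with the given `C` and `g` and returns its accuracy on the test indices, and the grids themselves are chosen by the caller.

```python
import itertools

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds; return a list of (train, test) pairs.

    Samples are assumed to have been shuffled beforehand.
    """
    folds = [list(range(i * n // k, (i + 1) * n // k)) for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]

def grid_search(fit_score, n, c_grid, g_grid, k=10):
    """Return (C, g, mean score) for the grid point with the best k-fold mean."""
    best = (None, None, float("-inf"))
    for C, g in itertools.product(c_grid, g_grid):
        scores = [fit_score(train, test, C, g) for train, test in kfold_indices(n, k)]
        mean = sum(scores) / k
        if mean > best[2]:        # ties keep the first grid point encountered
            best = (C, g, mean)
    return best
```

A common choice, assumed here rather than taken from the paper, is an exponentially spaced grid such as `[2 ** e for e in range(-5, 16)]` for `C`, since the useful range of both parameters typically spans several orders of magnitude.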
3. Pulmonary Nodule Detection Model

In this paper, CT images of 70 patients with pulmonary nodules are used. The images are first segmented [7] into three different types of pulmonary nodules marked by radiologists (solitary pulmonary nodules, SPN; vascular adhesion pulmonary nodules, VAPN; and pleural adhesion pulmonary nodules, PAPN) as well as a large number of nonnodular areas including blood vessels, bones, and alveoli. Forty-two feature components characterizing the ROI are extracted from the 2-dimensional and 3-dimensional perspectives, including six new 3-dimensional features proposed in this paper. They are composed of 18 shape features, 8 intensity features, and 16 texture features. The extracted feature set (identified as FS) is discretized and normalized. Feature-level fusion of the preprocessed feature data is performed five times using the rough set model (since the rough set reduction of a feature set is not unique, the extracted feature set is reduced five times in this paper, and the resulting subsets are identified as RS1, RS2, RS3, RS4, and RS5). Feature subset RS1 is used for the comparative experiments. Finally, the SVM parameters are optimized using the grid optimization model, and the improved SVM is used in the following four sets of comparative experiments: comparative analysis of the effectiveness of classification before and after rough set reduction of features; comparative analysis of the stability of classification before and after rough set reduction of features; comparative analysis of the recognition performance of feature-level fusion based on rough set and on PCA; and comparative analysis of the recognition performance of our proposed method and other methods. Based on the above, the flow chart of the pulmonary nodule detection model is shown in Figure 3.
4. Results and Discussion

4.1. Experimental Environments. In this paper, the hardware and software environments are as follows.

Software Environments. Windows 7 OS, Matlab R2014b, ImageJ 1.48u, and LibSVM.

Hardware Environments. Intel Core i5-4670 at 3.4 GHz, 8.0 GB of memory, and 500 GB hard disk.

Experimental Data. CT images of 70 patients with pulmonary nodules, marked by radiologists, are collected as experimental samples; each image has a size of 512 × 512 and a slice thickness of 2 mm. They comprise 2232 CT images from 38 patients with solitary pulmonary nodules (SPN), 17 patients with vascular adhesion pulmonary nodules (VAPN), and 15 patients with pleural adhesion pulmonary nodules (PAPN).
Figure 3: Flow chart of pulmonary nodule detection model. Blocks shown: CT images (SPN, VAPN, PAPN); image segmentation; ROI (pulmonary nodule; other areas: blood vessels, bone, alveoli); feature extraction (shape features, intensity features, texture features, as listed in Table 1); feature set; discretization and normalization; feature reduction with rough set (RS1, RS2, RS3, RS4, RS5); SVM with the grid optimization method; classification result.
Figure 4 shows representatives of each type of pulmonary nodule and the corresponding segmentation results.
In this paper, 42-dimensional features of 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas are extracted. Table 2 shows the 42-dimensional feature values of one nodular and one nonnodular area; shape features are identified as fs, intensity features as fi, and texture features as ft. In order to show the distribution and discriminating power of individual features intuitively, the external sphere volume (ESV) ratio and the standard deviation of the surface-center distance (SCD) are calculated and plotted as box plots, as shown in Figure 5.
4.2. Feature-Level Fusion Based on Rough Set. In order to prevent attributes with a large range of values from dominating those with a small range and to reduce the complexity of the statistical computation, the extracted feature sets are first preprocessed by normalizing the data, that is, linearly mapping them to [0, 1]. The preprocessed feature data are then fused five times using the rough set model. The fusion results are shown in Table 3.
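The linear mapping to [0, 1] mentioned above is commonly done per feature with min-max scaling. A minimal sketch follows, assuming that each feature is stored as one column of values and that a constant column is mapped to zero; both choices, and the function name, are assumptions for this example.

```python
def min_max_normalize(columns):
    """Linearly map every feature column to [0, 1]; constant columns map to 0."""
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        normalized.append([(v - lo) / span if span else 0.0 for v in col])
    return normalized
```

For instance, a feature with values 2, 4, and 6 becomes 0, 0.5, and 1, so that features measured on very different scales contribute comparably to the later steps.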
Figure 4: Pulmonary nodule segmentation results. (a) Original image of SPN; (b) segmentation result of SPN; (c) original image of VAPN; (d) segmentation result of VAPN; (e) original image of PAPN; (f) segmentation result of PAPN.
Table 2 Feature values of pulmonary nodular areas and nonnodular areas
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
Figure 5: Boxplots for pulmonary nodular areas and nonnodular areas. (a) Boxplot of ESV; (b) boxplot of SCDSTD. “+” refers to the upper and lower bounds of the ESV value and the SCDSTD value.
Table 3: Feature reduction based on rough sets.

| Feature subset | Reduction results | Dimensionality |
|---|---|---|
| RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21 |
| RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17 |
| RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20 |
4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization

4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier before and after reduction is compared. The results are shown in Table 4.

Experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion (mean accuracy rises from 94.64% to 99.17%), with a decrease in the missed diagnosis rate, reflected by the increased sensitivity, and in the misdiagnosis rate, reflected by the increased specificity. The processing time is also shorter after reduction. These results indicate that rough-set-based feature-level fusion of the extracted 42-dimensional feature set is effective: it not only improves the compactness of the feature set (by eliminating redundant and poorly discriminating feature components) but also corrects abnormal data in the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
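The three indexes used in this experiment follow directly from the confusion counts of one test fold; the missed diagnosis rate is 100 minus the sensitivity, and the misdiagnosis rate is 100 minus the specificity. The sketch below is illustrative only: the function names, the interleaved fold assignment, and the counts are assumptions, not the authors' Matlab/LibSVM code.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity as percentages."""
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    sensitivity = 100.0 * tp / (tp + fn)   # share of nodules that are detected
    specificity = 100.0 * tn / (tn + fp)   # share of nonnodules correctly rejected
    return accuracy, sensitivity, specificity


def ten_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k nearly equal, disjoint test folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


# Hypothetical fold: 6 nodules all detected, 1 of 6 nonnodules misclassified.
acc, sens, spec = classification_metrics(tp=6, fn=0, tn=5, fp=1)
folds = ten_fold_indices(140)
```

In a tenfold run, each fold serves once as the test set while the other nine are used for training, and the per-fold metrics are averaged.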
4.3.2. The Model Stability Experiment. With RS1 (70 × 21) as the data set, the feature data of pulmonary nodules are classified in five rounds, each with a different ratio of training set to testing set: 50/20, 40/30, 35/35, 30/40, or 20/50. Each round of testing is carried out with randomly selected training and testing sets at the given ratio, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.

The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction declines only slightly (from 99.71% to 98.25%), whereas that of the feature set before rough set reduction fluctuates to a certain extent (Figure 6 shows this more intuitively). These results indicate that the feature-level fusion model based on rough set has higher classification stability and is less susceptible to interference from the sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
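The repeated random splitting used in the stability experiment can be sketched as follows. The helper names, the fixed seed, and the `evaluate` callback (which would train and test the classifier on one split) are hypothetical; the toy callback below only stands in for a real classifier.

```python
import random
import statistics


def random_split(n_samples, n_train, n_test, rng):
    """Draw disjoint training and testing index lists from range(n_samples)."""
    picked = rng.sample(range(n_samples), n_train + n_test)
    return picked[:n_train], picked[n_train:]


def mean_score(evaluate, n_samples, n_train, n_test, repeats=10, seed=0):
    """Average evaluate(train, test) over `repeats` random splits."""
    rng = random.Random(seed)
    return statistics.mean(
        evaluate(*random_split(n_samples, n_train, n_test, rng))
        for _ in range(repeats)
    )


# Toy stand-in for a classifier score: the share of samples held out for testing.
def toy_evaluate(train, test):
    return len(test) / (len(train) + len(test))


score_50_20 = mean_score(toy_evaluate, n_samples=70, n_train=50, n_test=20)
```

Averaging over ten splits per ratio reduces the influence of any single lucky or unlucky partition on the reported figures.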
4.3.3. The Superiority of Feature-Level Fusion Model Based on Rough Set. PCA is a well-developed model characterized by simple calculation and easy programming, and it has become the preferred dimension reduction method for most feature-level fusion models. To compare the two types of feature-level fusion, PCA-based feature-level fusion of the extracted feature sets is also performed in this paper, and the tenfold cross-validation results are shown in Table 8. Figure 7 shows the classification performance of the two types of feature-level fusion methods.
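Table 4: Statistics of effectiveness before and after rough set reduction.

| Stage | Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | Processing time (s) |
|---|---|---|---|---|---|
| Before reduction | 1 | 96.42 | 92.86 | 100 | 1.0610 |
| | 2 | 91.96 | 83.93 | 100 | 0.6170 |
| | 3 | 95.54 | 100 | 91.07 | 0.5490 |
| | 4 | 89.28 | 100 | 78.57 | 0.5630 |
| | 5 | 95.54 | 91.07 | 100 | 0.5470 |
| | 6 | 98.21 | 96.43 | 100 | 0.5460 |
| | 7 | 94.64 | 89.29 | 100 | 0.5460 |
| | 8 | 95.53 | 91.07 | 100 | 0.5460 |
| | 9 | 91.96 | 83.93 | 100 | 0.5460 |
| | 10 | 97.32 | 100 | 96.64 | 0.5300 |
| | Mean | 94.64 | 92.86 | 96.43 | 0.6051 |
| After reduction (RS1) | 1 | 100 | 100 | 100 | 0.9370 |
| | 2 | 100 | 100 | 100 | 0.4360 |
| | 3 | 100 | 100 | 100 | 0.3870 |
| | 4 | 100 | 100 | 100 | 0.4210 |
| | 5 | 100 | 100 | 100 | 0.4210 |
| | 6 | 100 | 100 | 100 | 0.3900 |
| | 7 | 100 | 100 | 100 | 0.4060 |
| | 8 | 91.67 | 100 | 83.33 | 0.4060 |
| | 9 | 100 | 100 | 100 | 0.3740 |
| | 10 | 100 | 100 | 100 | 0.3930 |
| | Mean | 99.17 | 100 | 98.33 | 0.4571 |
| Improvement after reduction | | 4.53 | 7.14 | 1.9 | 0.148 |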
Table 5: Effectiveness of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Processing time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.4571 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.4650 |
| RS3 | 99.17 | 100 | 98.33 | 0.4656 |
| RS4 | 100 | 100 | 100 | 0.4731 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.4850 |
| Mean | 98.83 | 99 | 98.66 | 0.4672 |
In Figure 7, feature subset RS1 from Table 3 is used, and the running time is shown at 100 times the actual time.
Experimental results show that the feature-level fusion model based on rough set outperforms the one based on PCA on every performance indicator (for example, mean accuracy of 99.17% versus 95.91%), indicating that rough set is more suitable than PCA for eliminating redundant information.
Figure 6: Comparative results of feature subsets before and after rough set reduction (horizontal axis: training set/testing set ratios of 50/20, 40/30, 35/35, 30/40, and 20/50; series: before reduction and after reduction).

4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the optimal detection accuracy is used for all detection methods). The results are shown in Table 9 (Pr: private database; L: LIDC).
Experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent, indicating that this method not only improves the comprehensiveness and accuracy of the feature description of ROI by supplementing and improving the feature components but also improves the compactness of the feature set by using rough-set-based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.

Table 6: Stability statistics of rough set reduction subsets.

| Stage | Training set/testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s) |
|---|---|---|---|---|---|
| Before fusion | 50/20 | 97.35 | 94.71 | 100 | 0.4873 |
| | 40/30 | 96.53 | 93.08 | 98.32 | 0.3846 |
| | 35/35 | 95.83 | 92.39 | 97.79 | 0.4254 |
| | 30/40 | 96.16 | 95.58 | 96.74 | 0.3560 |
| | 20/50 | 94.88 | 94.63 | 95.86 | 0.4236 |
| | Mean | 96.15 | 94.08 | 97.742 | 0.4154 |
| After fusion (RS1) | 50/20 | 99.71 | 99.41 | 100 | 0.2684 |
| | 40/30 | 98.96 | 99.58 | 98.46 | 0.2568 |
| | 35/35 | 98.65 | 99.23 | 98.08 | 0.2382 |
| | 30/40 | 98.37 | 98.60 | 98.14 | 0.2646 |
| | 20/50 | 98.25 | 97.67 | 98.84 | 0.2636 |
| | Mean | 98.79 | 98.84 | 98.70 | 0.2583 |

Table 7: Classification performance of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.2583 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.2870 |
| RS3 | 99.17 | 100 | 98.33 | 0.2560 |
| RS4 | 100 | 100 | 100 | 0.2531 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.2656 |
| Mean | 98.834 | 99 | 98.66 | 0.2620 |

Table 8: Classification performance of feature reduction based on PCA.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s) |
|---|---|---|---|---|
| 1 | 91.67 | 83.33 | 100 | 0.9970 |
| 2 | 96.74 | 93.48 | 100 | 0.4830 |
| 3 | 96.74 | 93.48 | 100 | 0.4880 |
| 4 | 98.91 | 100 | 97.83 | 0.4950 |
| 5 | 93.48 | 86.96 | 100 | 0.4950 |
| 6 | 96.74 | 100 | 93.48 | 0.5140 |
| 7 | 96.74 | 100 | 93.48 | 0.5120 |
| 8 | 94.57 | 89.13 | 100 | 0.4890 |
| 9 | 97.83 | 95.65 | 100 | 0.4990 |
| 10 | 95.65 | 93.48 | 97.83 | 0.5180 |
| Mean | 95.91 | 93.55 | 98.26 | 0.5490 |
Table 9: Comparison of the performance of different lung nodule detection methods.

| Author | Database | Nodule numbers | Accuracy (%) | FPs |
|---|---|---|---|---|
| Santos et al. [11] | L | 260 | 88.4 | 1.17 |
| Magalhaes Barros Netto et al. [12] | L | 48 | 90.65 | 0.138 |
| Ye et al. [13] | Pr | 220 | 90.2 | 8.2 |
| Tan et al. [14] | L | 172 | 87.5 | 4 |
| Cascio et al. [16] | L | 148 | 97 | 6.1 |
| Our method | Pr | 70 | 99.17 | 0.47 |
Figure 7: Comparison of two feature-level fusion models (rough set and PCA) in terms of accuracy, sensitivity, specificity, and running time.
5. Conclusions

In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model is proposed based on rough set feature-level fusion. To address the insufficient feature description and inaccurate characterization in the process of feature extraction, six new 3D features, in combination with other 2D and 3D features, are proposed to extract and quantify the feature information of ROI in this model. Because the extracted high-dimensional features contain redundant information, rough-set-based feature-level fusion is employed to reduce the dimensionality of the feature sets. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the optimized SVM is used as the classifier for detection and recognition of pulmonary nodules. Finally, the pulmonary nodule detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed pulmonary nodule detection method based on rough set feature-level fusion is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules; it is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work is partially supported by the National Natural Science Foundation of China (Grant No. 61561040), the Natural Science Foundation of Ningxia (Grant No. NZ16067), and the Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Lüchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, “Trends in incidence of small cell lung cancer and all lung cancer,” Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, “Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms,” Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, “3-D segmentation algorithm of small lung nodules in spiral CT images,” IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., “Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size,” Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, “Automated identification of dementia using FDG-PET imaging,” BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, “Brain MRI image segmentation based on learning local variational Gaussian mixture models,” Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, “Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains,” Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, “Neutro-connectedness cut,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., “Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM,” Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhaes Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, “Automatic segmentation of lung nodules with growing neural gas and support vector machine,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, “Shape-based computer-aided detection of lung nodules in thoracic CT images,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, “A novel computer-aided lung nodule detection system for CT images,” Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, “Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier,” Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, “A survey on rough set theory and applications,” Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, “A novel neighborhood rough set based classification approach for medical diagnosis,” Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, “Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis,” Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., “Computer-aided diagnosis of the solitary pulmonary nodule,” Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, “Fuzzy rough sets hybrid scheme for breast cancer detection,” Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, “A method of effective text extraction for complex video scene,” Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, “Review and performance comparison of SVM- and ELM-based classifiers,” Neurocomputing, vol. 128, pp. 507–516, 2014.
When a scaling normalization is applied, the central moments change as

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p + q}{2} + 1. \tag{12}$$

In terms of the normalized central moments, the seven invariant moments are given as

$$\begin{aligned}
C_1 &= \eta_{20} + \eta_{02}, \\
C_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2, \\
C_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2, \\
C_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{03} + \eta_{21})^2, \\
C_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
    &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right], \\
C_6 &= (\eta_{20} - \eta_{02})\left[(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}), \\
C_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})\left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{03} + \eta_{21})^2\right] \\
    &\quad + (3\eta_{12} - \eta_{30})(\eta_{21} + \eta_{03})\left[3(\eta_{30} + \eta_{12})^2 - (\eta_{03} + \eta_{21})^2\right].
\end{aligned} \tag{13}$$

The Hu seven-moment invariants vary widely in magnitude. To make them comparable, a logarithmic function is used to compress the data, and hence the actual moment invariant features are $C'_K$:

$$C'_K = \left| \log_{10} \left| C_K \right| \right|, \qquad K = 1, 2, \ldots, 7. \tag{14}$$

The amended moment invariant features possess translation invariance, rotational invariance, and scale invariance.
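A minimal sketch of Eqs. (12)-(14) for a 2-D grey image stored as a list of rows is given below. The function name is hypothetical, and mapping an invariant that is exactly zero to 0.0 (to avoid taking the logarithm of zero) is an assumption that the paper does not discuss.

```python
import math


def hu_features(img):
    """Return the seven log-compressed Hu invariants of a 2-D grey image."""
    pts = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pts)
    xc = sum(x * v for x, _, v in pts) / m00      # centroid
    yc = sum(y * v for _, y, v in pts) / m00

    def eta(p, q):
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pts)
        return mu / m00 ** ((p + q) / 2 + 1)      # Eq. (12)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n03 + n21
    c = [                                          # Eq. (13)
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n12 - n30) * b * (3 * a ** 2 - b ** 2),
    ]
    # Eq. (14); an exactly zero invariant is mapped to 0.0 (assumption).
    return [abs(math.log10(abs(v))) if v != 0 else 0.0 for v in c]


# Hypothetical binary shape and the same shape shifted inside a larger frame.
shape = [[1, 1, 1, 1],
         [1, 0, 0, 0]]
shifted = [[0, 0, 0, 0, 0, 0],
           [0, 1, 1, 1, 1, 0],
           [0, 1, 0, 0, 0, 0]]
```

Because the moments are taken about the centroid, `hu_features(shape)` and `hu_features(shifted)` agree up to floating-point error, which illustrates the translation invariance stated above.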
2.1.3. Texture Characteristics. Tamura texture features, which are based on psychological research into human visual perception, were proposed by Tamura in 1978. The six components of the Tamura texture features correspond to six properties in psychology; three of them, coarseness, contrast, and directionality, have good application value in texture synthesis, image recognition, and so on.

Texture is a gray distribution that appears repeatedly in spatial positions, so there is a relationship between two pixels a certain distance apart in image space, called the gray spatial correlation property of the gray image. The gray-level co-occurrence matrix (GLCM) is a common method for studying this correlation in gray images.
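As an illustration of the GLCM idea, the sketch below counts how often grey level i occurs next to grey level j for one pixel offset and derives two of the statistics listed in Table 1. The function names, the offset, the non-symmetric counting, and the tiny test image are all assumptions; the paper does not state these settings.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(map(sum, counts))
    return [[c / total for c in row] for row in counts]


def angular_second_moment(p):
    """Sum of squared GLCM entries; large for uniform textures."""
    return sum(v * v for row in p for v in row)


def contrast(p):
    """Often identified with the moment of inertia in the GLCM literature."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


# Hypothetical 4 x 4 image quantised to 3 grey levels.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 2, 2]]
p = glcm(image, levels=3)
```

In practice the matrix is usually computed for several offsets and directions and the resulting statistics are averaged.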
2.1.4. Intensity Features. The gray statistical feature is a quantitative method for describing the basic features of a two-dimensional image region; from the three-dimensional perspective it is called the intensity feature [16]. In this paper, the extracted components of the intensity features include the mean intensity, intensity variance, maximum-minimum intensity difference, skewness, kurtosis, intensity gradient (from inside to outside), Laplace Divergence Mean (LDM), and Laplace Divergence Distance (LDD).
(1) Intensity Gradient (from Inside to Outside). For ROI $A_i$ whose voxel volume $S_i$ is greater than 0, morphological erosion is performed repeatedly, and the ratio of the mean of the area excluded by each erosion to the mean of the previous operation (the initial value is 0) is calculated until the ratio is zero. Consider the following equation, where $n$ is the number of operations:

$$E_4(A_i) = \frac{K}{n}. \tag{15}$$

(2) Laplace Divergence Mean (LDM). The result of convolving the original CT image with the Laplacian shows that the area surrounding a nodule, where gray values differ only slightly, has a markedly different divergence. Therefore, calculating the Laplace divergence helps to distinguish pulmonary nodules from interfering impurities:

$$E_5(A_i) = \operatorname{mean}(A_i \times \mathrm{La}). \tag{16}$$

(3) Laplace Divergence Distance (LDD). The difference between the maximum and minimum values of the Laplace divergence is used to describe the range of regional divergence:

$$E_6(A_i) = \max(A_i \times \mathrm{La}) - \min(A_i \times \mathrm{La}). \tag{17}$$
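Eqs. (16) and (17) can be illustrated on a 2-D patch as follows. This sketch reads $A_i \times \mathrm{La}$ as convolution of the region with a Laplacian kernel; the 4-neighbour kernel, the restriction to interior pixels, the 2-D simplification, and the function names are assumptions, since the paper does not specify them.

```python
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]   # 4-neighbour kernel; the paper does not state its kernel


def laplace_response(roi):
    """Convolve the interior of a 2-D region with the Laplacian kernel."""
    height, width = len(roi), len(roi[0])
    values = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            values.append(sum(
                LAPLACIAN[j][i] * roi[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            ))
    return values


def ldm(roi):
    """Laplace Divergence Mean, Eq. (16)."""
    values = laplace_response(roi)
    return sum(values) / len(values)


def ldd(roi):
    """Laplace Divergence Distance, Eq. (17)."""
    values = laplace_response(roi)
    return max(values) - min(values)


# Hypothetical 4 x 4 patch with a brighter, uneven 2 x 2 core.
patch = [[10, 10, 10, 10],
         [10, 50, 30, 10],
         [10, 30, 50, 10],
         [10, 10, 10, 10]]
```

For this patch the four interior responses are -120, 0, 0, and -120, so the LDM is -60 and the LDD is 120.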
Table 1 shows the feature set of 42 features based on the above feature description of ROI. To facilitate subsequent tests, features are numbered in the order shown in Table 1; that is, the shape features are numbered fs1–fs18, the intensity features fi1–fi8, and the texture features ft1–ft16.
2.2. Rough Set and Attribute Reduction. Rough set theory (RST), proposed by Pawlak in 1982, is one of the effective mathematical tools for processing fuzzy and uncertain knowledge. RST has been applied in a variety of fields such as artificial intelligence, data mining, pattern recognition, and knowledge discovery. Rough set theory is founded on the assumption that some knowledge is associated with every object of the universe of discourse. Objects characterized by the same information are indiscernible in view of the available information about them. The indiscernibility relation generated in this way is the mathematical basis of rough set theory. Any set of all mutually indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is referred to as a crisp set; otherwise the set is a rough set.
Table 1: ROI feature set.

| Feature type | Feature vectors | Dimensionality |
|---|---|---|
| Shape features (fs) | Perimeter, area, volume, roundness, rectangularity, length, Euler’s number, ESV, SCDSTD, ERCLD, Hu moment | 18 |
| Intensity features (fi) | Mean intensity, intensity standard variance, maximum-minimum intensity difference value of variance, skewness, kurtosis, intensity gradient (from inside to outside), LDM, LDD | 8 |
| Texture features (ft) | Tamura texture features (contrast, direction, roughness), GLCM (angular second moment, moment of inertia, torque deficit, sum mean, variance, sum variance, difference variance, entropy, sum entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient) | 16 |

Definition 1. An information system $S$ is a quadruple $S = (U, A, V, f)$, where $U$ is a nonempty and finite set of objects, $A$ is a nonempty and finite set of attributes, $V := \bigcup_{a \in A} V_a$ with $V_a$ being the domain of attribute $a$, and $f$ is an information function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.
For a subset $P$ of $A$, let us define the corresponding equivalence relation as

$$\mathrm{IND}(P) = \{(x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P\} \tag{18}$$

and denote the equivalence class of $\mathrm{IND}(P)$ that contains the object $x \in U$ by $[x]_P$; that is,

$$[x]_P = \{y \in U \mid (x, y) \in \mathrm{IND}(P)\}. \tag{19}$$

The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$; that is, $U/P = \{[x]_P \mid x \in U\}$.
As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, based on a specialization of the notion of independence due to Marczewski. Much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems, for example, reduction approaches based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Definition 2. Let $S = (U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) = \{x \in U \mid [x]_P \subseteq X\}$ and $\overline{R}_P(X) = \{x \in U \mid [x]_P \cap X \neq \emptyset\}$ are called the $P$-lower and $P$-upper approximations of $X$, respectively.
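The equivalence classes of Eqs. (18)-(19) and the approximations of Definition 2 can be sketched on a small discretized table. The function names, the attribute values, and the target set are hypothetical and chosen only for illustration.

```python
def partition(universe, table, attrs):
    """Equivalence classes U/P of IND(P), Eqs. (18)-(19)."""
    classes = {}
    for x in universe:
        key = tuple(table[x][a] for a in attrs)   # objects with equal keys are indiscernible
        classes.setdefault(key, set()).add(x)
    return list(classes.values())


def lower_approximation(universe, table, attrs, target):
    """Objects whose whole equivalence class lies inside the target set."""
    return {x for block in partition(universe, table, attrs)
            if block <= target for x in block}


def upper_approximation(universe, table, attrs, target):
    """Objects whose equivalence class intersects the target set."""
    return {x for block in partition(universe, table, attrs)
            if block & target for x in block}


# Hypothetical discretised table: object -> {attribute: value}.
table = {
    1: {"fs4": 0, "fi2": 1},
    2: {"fs4": 0, "fi2": 1},
    3: {"fs4": 1, "fi2": 0},
    4: {"fs4": 1, "fi2": 1},
}
universe = set(table)
nodules = {1, 3}          # hypothetical target set X
lower = lower_approximation(universe, table, ["fs4", "fi2"], nodules)
upper = upper_approximation(universe, table, ["fs4", "fi2"], nodules)
```

Here objects 1 and 2 are indiscernible, so object 1 cannot be placed in X with certainty: the lower approximation is {3}, while the upper approximation is {1, 2, 3}.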
Figure 2: Optimal hyperplane.
Definition 3. Let $S = (U, A, V, f)$ be an information system and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the $P$-positive region of $Q$, where $\underline{R}_P(X)$ is the $P$-lower approximation of $X$.
Definition 4. Let $S = (U, C \cup D, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(D) = \mathrm{POS}_{C - \{a\}}(D)$, $a$ is said to be $D$-dispensable in $C$; otherwise $a$ is said to be $D$-indispensable in $C$. The set of all the $D$-indispensable attributes is called the core of $S$ and denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(D) = \mathrm{POS}_C(D)$ and each of the attributes of $P$ is $D$-indispensable in $P$, then $P$ is called a reduct of $S$.
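Definitions 3 and 4 suggest a simple positive-region procedure: remove one attribute at a time and keep the removal whenever the positive region is unchanged. The sketch below is one such sequential elimination, not necessarily the reduction algorithm used by the authors; the function names and the toy decision table are hypothetical.

```python
def partition(table, attrs):
    """Equivalence classes of the objects in `table` with respect to `attrs`."""
    classes = {}
    for x, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(x)
    return list(classes.values())


def positive_region(table, cond, dec):
    """POS_P(D): union of the P-lower approximations of the classes in U/D."""
    decision_classes = partition(table, dec)
    return {x for block in partition(table, cond)
            if any(block <= d for d in decision_classes) for x in block}


def find_reduct(table, cond, dec):
    """Drop D-dispensable attributes one at a time (the result is order-dependent)."""
    full = positive_region(table, cond, dec)
    reduct = list(cond)
    for a in cond:
        trial = [b for b in reduct if b != a]
        if trial and positive_region(table, trial, dec) == full:
            reduct = trial
    return reduct


# Hypothetical decision table: three conditional attributes, decision "d".
table = {
    1: {"fs4": 0, "fi2": 0, "ft5": 0, "d": 1},
    2: {"fs4": 0, "fi2": 1, "ft5": 0, "d": 1},
    3: {"fs4": 1, "fi2": 0, "ft5": 1, "d": 0},
    4: {"fs4": 1, "fi2": 1, "ft5": 1, "d": 0},
}
reduct_a = find_reduct(table, ["fs4", "fi2", "ft5"], ["d"])
reduct_b = find_reduct(table, ["ft5", "fs4", "fi2"], ["d"])
```

The two calls return different single-attribute reducts (`["ft5"]` and `["fs4"]`), which illustrates why the reduct of a feature set is generally not unique.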
2.3. SVM and Its Optimization. SVM is a pattern recognition method developed from statistical learning theory and based on the principle of structural risk minimization. While ensuring classification accuracy, SVM improves the generalization ability of the learning machine by maximizing the classification margin. The biggest advantage of SVM is that it overcomes overlearning and the problems of high dimensionality, both of which lead to computational complexity and local extremum problems. A reliable classification model based on SVM is urgently needed for the study of hospitalization expenses of patients with gastric cancer.
For linearly separable data (Figure 2), assume a data set $S = \{x_1, \ldots, x_n\}$ with labels $G = \{y_1, \ldots, y_n\}$, where $x_i$ is the input space vector of the $i$th data sample and $y_i$ records the category of the sample. The aim of SVM is to find an optimal hyperplane $H$ that separates the two classes of samples with the largest margin. The optimal hyperplane $H$ is expressed as

$$w^T x + b = 0, \tag{20}$$

where $w$ is the weight vector and $b$ is the threshold. This problem is transformed into the optimization problem over $w$ and $b$:

$$\begin{aligned}
\min_{w, b} \quad & r(w) = \frac{1}{2} \|w\|^2 \\
\text{s.t.} \quad & y_i \left((w \cdot x_i) + b\right) \ge 1, \quad i = 1, \ldots, n.
\end{aligned} \tag{21}$$
To simplify the formula, the Lagrange dual is introduced, subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into

$$\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \\
\text{s.t.} \quad & \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n.
\end{aligned} \tag{22}$$
For linearly inseparable data, the penalty parameter $C$ and the slack variable $\xi$ are introduced in the constraint condition; the generalization ability of SVM is thereby increased, and the function is transformed into

$$\begin{aligned}
\min_{\alpha} \quad & \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \\
\text{s.t.} \quad & \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C,
\end{aligned} \tag{23}$$
where $C$ is a parameter set by the user. In practice, the larger $C$ is, the more heavily misclassified training samples are penalized and the more closely the training data are fitted; at the same time, the risk of poor generalization increases.
The final classification function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*}\right). \tag{24}$$
For data that are not linearly separable, SVM maps them into a high-dimensional space via the nonlinear mapping of a kernel function so that they become linearly separable, and the optimal hyperplane is found in that high-dimensional space. A kernel function that satisfies the Mercer condition corresponds to an inner product in the transformed space and is used to realize the nonlinear transformation of the linear classifier.
The corresponding kernel function is defined as

$$K(x_i, x) = \left(\varphi(x_i) \cdot \varphi(x)\right). \tag{25}$$

At this point, the final classification function is

$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b\right). \tag{26}$$
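Eq. (26) can be evaluated directly once the support vectors, their labels, the multipliers, and the threshold are known. The sketch below assumes a Gaussian (RBF) kernel with parameter $g$, which is a common choice with LibSVM but is not stated explicitly in the paper; the model values are hypothetical rather than the result of any training run.

```python
import math


def rbf_kernel(u, v, g):
    """K(u, v) = exp(-g * ||u - v||^2); g is the kernel parameter."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def decide(x, support_vectors, labels, alphas, b, g):
    """Eq. (26): sign of the kernel expansion over the support vectors."""
    score = sum(alpha * y * rbf_kernel(sv, x, g)
                for sv, y, alpha in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


# Hypothetical trained model with two support vectors.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]

near_positive = decide([0.9, 0.9], support_vectors, labels, alphas, b=0.0, g=1.0)
near_negative = decide([0.1, 0.1], support_vectors, labels, alphas, b=0.0, g=1.0)
```

A sample close to the positive support vector receives label 1, and a sample close to the negative one receives label -1.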
The penalty factor $C$ and the kernel function parameter $g$ play an extremely important role in the performance of SVM classification. To obtain the optimal classification results, a grid optimization model is used in this paper. In the grid optimization model, the parameters to be searched are laid out as a grid over a given range, and the optimal parameters are selected by traversing all the grid nodes. The grid optimization model therefore has the advantages of simplicity, convenience, good stability, and ease of finding the globally optimal solution within the grid [24]. In the learning process of SVM, 10-fold cross-validation is used to find the kernel function parameter and penalty coefficient with the optimal classification performance, which are then applied to the SVM classifier for recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the detection results of the relevant experiments.
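The grid traversal described above can be sketched generically: every pair $(C, g)$ on the grid is scored, and the best pair is kept. The power-of-two grid, the function names, and the scoring callback are assumptions; `toy_cv_accuracy` merely stands in for the 10-fold cross-validation accuracy that a real run would compute by training and testing an SVM.

```python
import itertools


def grid_search(score, c_exponents, g_exponents):
    """Evaluate score(C, g) on every grid node and return the best node."""
    best = None
    for ce, ge in itertools.product(c_exponents, g_exponents):
        c, g = 2.0 ** ce, 2.0 ** ge
        value = score(c, g)
        if best is None or value > best[0]:
            best = (value, c, g)
    return best


def toy_cv_accuracy(c, g):
    # Stand-in for the 10-fold cross-validation accuracy of an SVM trained
    # with (C, g); a real run would train and test the classifier here.
    return 1.0 / (1.0 + abs(c - 8.0) + abs(g - 0.03125))


best_score, best_c, best_g = grid_search(toy_cv_accuracy, range(-5, 6), range(-8, 3))
```

Because every node is visited, the search cost grows with the product of the two grid sizes, which is the price paid for the method's simplicity.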
3. Pulmonary Nodule Detection Model

In this paper, CT images of 70 patients with pulmonary nodules are used. The images are first segmented [7] into three different types of pulmonary nodules (solitary pulmonary nodules, or SPN; vascular adhesion pulmonary nodules, or VAPN; and pleural adhesion pulmonary nodules, or PAPN), which are marked by radiologists, as well as a large number of nonnodular areas including blood vessels, bones, and alveoli. Forty-two feature components characterizing ROI are extracted from the 2-dimensional and 3-dimensional perspectives, including six new 3-dimensional features proposed in this paper. They are composed of 18 shape features, 8 intensity features, and 16 texture features. The extracted feature set (identified as FS) is discretized and normalized. Feature-level fusion of the preprocessed feature data is performed five times using the rough set model (since the rough set reduct of a feature set is not unique, the extracted feature sets are reduced five times, and the results are identified as RS1, RS2, RS3, RS4, and RS5). Feature subset RS1 is used for the comparative experiments. Finally, SVM parameters are optimized using the grid optimization model, and the optimized SVM is used in the following four sets of comparative experiments: comparative analysis of the effectiveness of classification before and after rough set reduction of features; comparative analysis of the stability of classification before and after rough set reduction; comparative analysis of the recognition performance of feature-level fusion based on rough set and on PCA; and comparative analysis of the recognition performance of our proposed method and other methods. Based on the above, the flow chart of the pulmonary nodule detection model is shown in Figure 3.
4. Results and Discussion

4.1. Experimental Environments. In this paper, the hardware and software environments are as follows.

Software Environments. Windows 7 OS, Matlab R2014b, ImageJ 1.48u, and LibSVM.

Hardware Environments. Intel Core i5-4670 at 3.4 GHz, 8.0 GB of memory, and a 500 GB hard disk.

Experimental Data. CT images of 70 patients with pulmonary nodules are collected as experimental samples and marked by radiologists; the images have a size of 512 × 512 and a slice thickness of 2 mm. The data comprise 2232 CT images from 38 patients with solitary pulmonary nodules (SPN), 17 patients with vascular adhesion pulmonary nodules (VAPN), and 15 patients with pleural adhesion pulmonary nodules (PAPN).
Figure 3: Flow chart of pulmonary nodule detection model. CT images are segmented into ROIs (pulmonary nodules: SPN, VAPN, and PAPN; other areas: blood vessels, bone, and alveoli); shape, intensity, and texture features are extracted to form the feature set, which is normalized and discretized; rough set feature reduction yields RS1–RS5; and an SVM tuned by the grid optimization method produces the classification result.
Figure 4 shows a representative of each type of pulmonary nodule and the corresponding segmentation results.
In this paper, 42-dimensional features are extracted from 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas. Table 2 shows the 42-dimensional feature values of the lung nodular and nonnodular areas; shape features are denoted fs, intensity features fi, and texture features ft. To show the distribution of the feature values and how well they discriminate between the two classes, the external sphere volume (ESV) ratio and the standard deviation of the surface-center distance (SCDSTD) are calculated and plotted as boxplots in Figure 5.
4.2. Feature-Level Fusion Based on Rough Set. To prevent attributes with a small range of values from being dominated by attributes with a large range, and to reduce the complexity of the statistical computation, the extracted feature sets are first preprocessed by normalization: the data are linearly mapped to [0, 1]. The preprocessed feature data are then fused five times using the rough set model. The fusion results are shown in Table 3.
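The linear mapping to [0, 1] described above can be sketched as a per-feature min-max normalization. This is a minimal illustration, not the authors' Matlab code; the function name `min_max_normalize` and the sample values are hypothetical.

```python
def min_max_normalize(columns):
    """Linearly map each feature column to the interval [0, 1]."""
    normalized = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        normalized.append([(v - lo) / span if span else 0.0 for v in col])
    return normalized


# Hypothetical feature columns on very different scales.
area = [159.0, 128.0, 200.0]
roundness = [0.65, 0.21, 0.43]
scaled = min_max_normalize([area, roundness])
```

After this step both columns span the same range, so a large-valued feature such as area no longer dominates a small-valued one such as roundness.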
Figure 4: Pulmonary nodule segmentation results. (a) Original image of SPN. (b) Segmentation result of SPN. (c) Original image of VAPN. (d) Segmentation result of VAPN. (e) Original image of PAPN. (f) Segmentation result of PAPN.
Table 2: Feature values of pulmonary nodular areas and nonnodular areas.

Shape features (fs)        Intensity features (fi)    Texture features (ft)
Nodular    Nonnodular      Nodular    Nonnodular      Nodular    Nonnodular
95         78              5906       910987          83104      54016
159        128             1406       44872           12041      12521
6284       178             05956      −039568         04303      00067
06517      0211            27348      18669           07709      07275
06961      21587           551865     143481          07169      09865
03529      07778           05         1               08059      53894
0          1               139598     206044          01942      00487
03186      10295           729905     3546389         07708      07273
00686      10197           -          -               08059      53498
00042      00458           -          -               35042      50971
00021      00295           -          -               06514      08453
00013      00268           -          -               00971      06143
00005      00011           -          -               44033      82186
20         1               -          -               00691      50061
14         9               -          -               −05785     −04245
05356      05571           -          -               2307       32239
03072      0501788         -          -               -          -
01738      0207122         -          -               -          -
Figure 5: Boxplots for pulmonary nodular areas and nonnodular areas. (a) Boxplot of ESV. (b) Boxplot of SCDSTD. “+” refers to the upper and lower bounds of the ESV value and SCDSTD value.
Table 3: Feature reduction based on rough sets.

Feature subset   Reduction results                                                                                                        Dimensionality
RS1              fs4 fs16 fs17 fs18 fi2 fi4 fi6 fi7 fi8 ft2 ft4 ft5 ft6 ft7 ft8 ft9 ft10 ft11 ft13 ft14 ft15 ft16                          21
RS2              fs4 fs9 fs16 fs18 fi1 fi2 fi5 ft2 ft5 ft6 ft8 ft9 ft10 ft11 ft12 ft13 ft15                                                17
RS3              fs9 fs17 fs18 fi1 fi2 fi5 fi7 fi8 ft2 ft6 ft7 ft8 ft9 ft10 ft11 ft12 ft14 ft15 ft16                                       19
RS4              fs9 fs16 fs18 fi1 fi2 fi5 fi7 fi8 ft5 ft6 ft7 ft8 ft9 ft10 ft11 ft12 ft14 ft15 ft16                                       19
RS5              fs9 fs16 fs17 fs18 fi1 fi2 fi4 fi5 fi7 fi8 ft2 ft5 ft6 ft7 ft8 ft9 ft10 ft12 ft15 ft16                                    20
4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
The experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion (mean accuracy rises from 94.64% to 99.17%), with a lower missed diagnosis rate, reflected by the increased sensitivity (92.86% to 100%), and a lower misdiagnosis rate, reflected by the increased specificity (96.43% to 98.33%). The processing time is also shorter after reduction (0.6051 s versus 0.4571 s). These results indicate that rough set based feature-level fusion of the 42-dimensional extracted feature set is effective: it not only improves the compactness of the feature set (by eliminating redundant and weakly discriminative feature components) but also corrects abnormal data in the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
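The three indexes reported in Table 4 follow from the confusion counts of each fold. The sketch below shows one way to compute them, assuming labels 1 for nodule and 0 for nonnodule; the function name and the sample fold are hypothetical and are not taken from the paper's data.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity); 1 = nodule, 0 = nonnodule."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # nodules detected
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonnodules rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # misdiagnoses
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed diagnoses
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


# Hypothetical fold: 6 nodules and 6 nonnodules, one nonnodule misclassified.
y_true = [1] * 6 + [0] * 6
y_pred = [1] * 6 + [1] + [0] * 5
acc, sens, spec = classification_metrics(y_true, y_pred)
```

A higher sensitivity corresponds to fewer missed diagnoses and a higher specificity to fewer misdiagnoses, which is how the text above reads the two columns.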
4.3.2. The Model Stability Experiment. The feature data of pulmonary nodules are classified with RS1 (70 × 21) as the data set in five rounds, using training set to testing set ratios of 50:20, 40:30, 35:35, 30:40, and 20:50. In each round the training and testing sets are selected at random according to the ratio, and the mean of 10 test results is used as the accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
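The repeated random-split protocol can be sketched as follows. This is an illustration under stated assumptions: the ratios are read as sample counts that sum to 70, `evaluate` is a hypothetical callable standing in for training and testing the SVM, and the seed is arbitrary.

```python
import random


def random_split(n_samples, n_train, rng):
    """Shuffle sample indices and split them into training and testing sets."""
    idx = list(range(n_samples))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def mean_over_runs(evaluate, n_samples, n_train, runs=10, seed=0):
    """Average a score over several random training/testing splits."""
    rng = random.Random(seed)
    scores = [evaluate(*random_split(n_samples, n_train, rng)) for _ in range(runs)]
    return sum(scores) / len(scores)


# Toy stand-in for "train the SVM and score it": the fraction of samples tested.
def toy_evaluate(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))


train_idx, test_idx = random_split(70, 50, random.Random(0))
mean_score = mean_over_runs(toy_evaluate, n_samples=70, n_train=50)
```

Averaging over 10 random splits per ratio reduces the influence of any single lucky or unlucky partition on the reported figures.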
The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction declines only slightly, whereas that of the feature set before rough set reduction fluctuates to a certain extent (see Figure 6). These results indicate that the feature-level fusion model based on rough sets has higher classification stability and is less susceptible to interference from the sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
4.3.3. The Superiority of Feature-Level Fusion Model Based on Rough Set. PCA is a well-developed model characterized by simple calculation and easy programming, and it has become the preferred dimension reduction method for most feature-level fusion models. To compare the two types of feature-level fusion, PCA-based feature-level fusion of the extracted feature sets is also performed in this paper, and the tenfold cross-validation results are shown in Table 8. Figure 7 shows the classification performance of the two types of feature-level fusion methods.
Table 4: Statistics of effectiveness before and after rough set reduction.

Serial number              Accuracy (%)   Sensitivity (%)   Specificity (%)   Processing time (s)
Before reduction
1                          96.42          92.86             100               1.0610
2                          91.96          83.93             100               0.6170
3                          95.54          100               91.07             0.5490
4                          89.28          100               78.57             0.5630
5                          95.54          91.07             100               0.5470
6                          98.21          96.43             100               0.5460
7                          94.64          89.29             100               0.5460
8                          95.53          91.07             100               0.5460
9                          91.96          83.93             100               0.5460
10                         97.32          100               96.64             0.5300
Mean                       94.64          92.86             96.43             0.6051
After reduction (RS1)
1                          100            100               100               0.9370
2                          100            100               100               0.4360
3                          100            100               100               0.3870
4                          100            100               100               0.4210
5                          100            100               100               0.4210
6                          100            100               100               0.3900
7                          100            100               100               0.4060
8                          91.67          100               83.33             0.4060
9                          100            100               100               0.3740
10                         100            100               100               0.3930
Mean                       99.17          100               98.33             0.4571
Increase after reduction   4.53           7.14              1.9               0.148
Table 5: Effectiveness of rough set reduction subsets.

Subset   Average accuracy (%)   Average sensitivity (%)   Average specificity (%)   Processing time (s)
RS1      99.17                  100                       98.33                     0.4571
RS2      97.5                   96.67                     98.33                     0.4650
RS3      99.17                  100                       98.33                     0.4656
RS4      100                    100                       100                       0.4731
RS5      98.33                  98.33                     98.33                     0.4850
Mean     98.83                  99                        98.66                     0.4672
In Figure 7, feature subset RS1 from Table 3 is used, and the running time is shown at 100 times the actual time.
The experimental results show that the performance indicators of the feature-level fusion model based on rough sets are better than those of the PCA-based model (for example, a mean accuracy of 99.17% in Table 4 versus 95.91% in Table 8), indicating that the rough set is more suitable than PCA for eliminating redundant information.
4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the optimal detection accuracy is used for all detection methods). The results are shown in Table 9 (Pr: private database; L: LIDC).

Figure 6: Comparative results of feature subsets before and after rough set reduction (horizontal axis: training set to testing set ratios 50:20, 40:30, 35:35, 30:40, and 20:50; series: before reduction and after reduction).
The experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent. This indicates that the method not only improves the comprehensiveness and accuracy of the feature description of the ROI by supplementing and improving the feature components but also improves the compactness of the feature set by using rough set based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.

Table 6: Stability statistics of rough set reduction subsets.

Training set : testing set   Accuracy (%)   Sensitivity (%)   Specificity (%)   Running time (s)
Before fusion
50:20                        97.35          94.71             100               0.4873
40:30                        96.53          93.08             98.32             0.3846
35:35                        95.83          92.39             97.79             0.4254
30:40                        96.16          95.58             96.74             0.3560
20:50                        94.88          94.63             95.86             0.4236
Mean                         96.15          94.08             97.742            0.4154
After fusion (RS1)
50:20                        99.71          99.41             100               0.2684
40:30                        98.96          99.58             98.46             0.2568
35:35                        98.65          99.23             98.08             0.2382
30:40                        98.37          98.60             98.14             0.2646
20:50                        98.25          97.67             98.84             0.2636
Mean                         98.79          98.84             98.70             0.2583

Table 7: Classification performance of rough set reduction subset.

Subset   Average accuracy (%)   Average sensitivity (%)   Average specificity (%)   Running time (s)
RS1      99.17                  100                       98.33                     0.2583
RS2      97.5                   96.67                     98.33                     0.2870
RS3      99.17                  100                       98.33                     0.2560
RS4      100                    100                       100                       0.2531
RS5      98.33                  98.33                     98.33                     0.2656
Mean     98.834                 99                        98.66                     0.2620

Table 8: Classification performance of feature reduction based on PCA.

Serial number   Accuracy (%)   Sensitivity (%)   Specificity (%)   10 times running time (s)
1               91.67          83.33             100               0.9970
2               96.74          93.48             100               0.4830
3               96.74          93.48             100               0.4880
4               98.91          100               97.83             0.4950
5               93.48          86.96             100               0.4950
6               96.74          100               93.48             0.5140
7               96.74          100               93.48             0.5120
8               94.57          89.13             100               0.4890
9               97.83          95.65             100               0.4990
10              95.65          93.48             97.83             0.5180
Mean            95.91          93.55             98.26             0.5490
Table 9: Comparison of the performance of different lung nodule detection methods.

Author                               Database   Nodule numbers   Accuracy (%)   FPs
Santos et al. [11]                   L          260              88.4           1.17
Magalhães Barros Netto et al. [12]   L          48               90.65          0.138
Ye et al. [13]                       Pr         220              90.2           8.2
Tan et al. [14]                      L          172              87.5           4
Cascio et al. [16]                   L          148              97             6.1
Our method                           Pr         70               99.17          0.47
Figure 7: Comparison of two feature-level fusion models (series: rough set and PCA; categories: accuracy, sensitivity, specificity, and running time).
5 Conclusions
In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model based on rough set feature-level fusion is proposed. To address insufficient feature description and inaccurate characterization in feature extraction, six new 3D features are proposed and combined with other 2D and 3D features to extract and quantify the feature information of the ROI. Because the extracted high-dimensional features contain redundant information, rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the optimized SVM is used as the classifier for detection and recognition of pulmonary nodules. Finally, the pulmonary nodule detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed method is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules; it is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (Grant No. 61561040), the Natural Science Foundation of Ningxia (Grant No. NZ16067), and the Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Lüchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, “Trends in incidence of small cell lung cancer and all lung cancer,” Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, “Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms,” Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, “3-D segmentation algorithm of small lung nodules in spiral CT images,” IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., “Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size,” Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, “Automated identification of dementia using FDG-PET imaging,” BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, “Brain MRI image segmentation based on learning local variational Gaussian mixture models,” Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, “Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains,” Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, “Neutro-connectedness cut,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., “Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM,” Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhães Barros Netto, A. Corrêa Silva, R. Acatauassú Nunes, and M. Gattass, “Automatic segmentation of lung nodules with growing neural gas and support vector machine,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, “Shape-based computer-aided detection of lung nodules in thoracic CT images,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, “A novel computer-aided lung nodule detection system for CT images,” Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, “Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier,” Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, “A survey on rough set theory and applications,” Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, “A novel neighborhood rough set based classification approach for medical diagnosis,” Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, “Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis,” Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., “Computer-aided diagnosis of the solitary pulmonary nodule,” Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, “Fuzzy rough sets hybrid scheme for breast cancer detection,” Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, “A method of effective text extraction for complex video scene,” Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, “Review and performance comparison of SVM- and ELM-based classifiers,” Neurocomputing, vol. 128, pp. 507–516, 2014.
Table 1: ROI feature set.

Feature type               Dimensionality   Feature vectors
Shape features (fs)        18               Perimeter, area, volume, roundness, rectangularity, length, Euler’s number, ESV, SCDSTD, ERCLD, Hu moment
Intensity features (fi)    8                Mean intensity, intensity standard variance, maximum-minimum intensity difference value of variance, skewness, kurtosis, intensity gradient (from inside to outside), LDM, LDD
Texture features (ft)      16               Tamura texture features (contrast, direction, roughness); GLCM (angular second moment, moment of inertia, torque deficit, sum mean, variance, sum variance, difference variance, entropy, sum entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient)
function such that $f(x, a) \in V_a$ for every $x \in U$ and every $a \in A$. A decision system is an information system $(U, C \cup D, V, f)$ with $C \cap D = \emptyset$, where $C$ and $D$ are called the conditional and decision attribute sets, respectively.
For a subset $P$ of $A$, let us define the corresponding equivalence relation as
\[ \mathrm{IND}(P) = \{ (x, y) \in U \times U \mid f(x, a) = f(y, a) \text{ for any } a \in P \} \tag{18} \]
and denote the equivalence class of $\mathrm{IND}(P)$ which contains the object $x \in U$ by $[x]_P$, that is,
\[ [x]_P = \{ y \in U \mid (x, y) \in \mathrm{IND}(P) \}. \tag{19} \]
The factor set of all equivalence classes of $\mathrm{IND}(P)$ is denoted by $U/P$, that is, $U/P = \{ [x]_P \mid x \in U \}$.
As is well known, attribute reduction is one of the key issues in RST. It is performed in information systems by means of the notion of a reduct, which is based on a specialization of the notion of independence due to Marczewski. Much attention has been paid to this issue, and many different methods of attribute reduction have been proposed for decision systems. For example, there are reduction approaches based on partition, discernibility matrix, conditional information entropy, positive region, and ant colony optimization.
Definition 2. Let $S = (U, A, V, f)$ be an information system and $P \subseteq A$. For a subset $X$ of $U$, $\underline{R}_P(X) = \{ x \in U \mid [x]_P \subseteq X \}$ and $\overline{R}_P(X) = \{ x \in U \mid [x]_P \cap X \neq \emptyset \}$ are called the $P$-lower and $P$-upper approximations of $X$, respectively.
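Definition 2 can be made concrete on a small discretized table. The sketch below groups objects into equivalence classes of IND(P) and then builds the lower and upper approximations of a target set; the table, attribute names, and function names are hypothetical illustrations, not data from this paper.

```python
from collections import defaultdict


def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects with equal values on attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_upper(table, attrs, target):
    """Return the attrs-lower and attrs-upper approximations of target."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical table: four ROIs described by two discretized attributes.
table = {
    "r1": {"round": 1, "bright": 1},
    "r2": {"round": 1, "bright": 1},
    "r3": {"round": 0, "bright": 1},
    "r4": {"round": 0, "bright": 0},
}
nodules = {"r1", "r3"}
lower, upper = lower_upper(table, ["round", "bright"], nodules)
```

Here r1 and r2 are indiscernible, so r1 cannot be placed in the lower approximation even though it is a nodule; only r3 is certainly a nodule, while r1, r2, and r3 are possibly nodules.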
Figure 2: Optimal hyperplane (labels in the figure: H1, H2, H, r).
Definition 3. Let $S = (U, A, V, f)$ be an information system and let $P$ and $Q$ be two subsets of $A$. Then $\mathrm{POS}_P(Q) = \bigcup_{X \in U/Q} \underline{R}_P(X)$ is called the $P$-positive region of $Q$, where $\underline{R}_P(X)$ is the $P$-lower approximation of $X$.
Definition 4. Let $S = (U, A, V, f)$ be a decision system, $a \in C$, and $P \subseteq C$. If $\mathrm{POS}_C(Q) = \mathrm{POS}_{C - \{a\}}(Q)$, $a$ is said to be $D$-dispensable in $C$; otherwise, $a$ is said to be $D$-indispensable in $C$. The set of all the $D$-indispensable attributes is called the core of $S$ and denoted by $\mathrm{Core}(S)$. Furthermore, if $\mathrm{POS}_P(Q) = \mathrm{POS}_C(Q)$ and each of the attributes of $P$ is $D$-indispensable, then $P$ is called a reduct of $S$.
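Definitions 3 and 4 can be illustrated with a short positive-region computation. The sketch assumes that the set Q in the definitions is the decision attribute set D; the toy decision table and all function names are hypothetical, and this is not the reduction algorithm used by the authors.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Equivalence classes of IND(attrs), as sets of row indices."""
    classes = defaultdict(set)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].add(i)
    return list(classes.values())


def positive_region(rows, cond, dec):
    """POS_cond(dec): union of the cond-lower approximations of the dec-classes."""
    dec_classes = partition(rows, dec)
    pos = set()
    for block in partition(rows, cond):
        if any(block <= d for d in dec_classes):
            pos |= block
    return pos


def is_dispensable(rows, cond, dec, a):
    """True if dropping attribute a leaves the positive region unchanged."""
    rest = [c for c in cond if c != a]
    return positive_region(rows, rest, dec) == positive_region(rows, cond, dec)


# Toy decision table: c duplicates a, and the decision d equals b.
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 1},
]
cond, dec = ["a", "b", "c"], ["d"]
core = [a for a in cond if not is_dispensable(rows, cond, dec, a)]
full_pos = positive_region(rows, cond, dec)
reduct_pos = positive_region(rows, ["b"], dec)
```

In this toy table only b is indispensable, and the single attribute b already reproduces the full positive region, so {b} is a reduct. Real tables can have several reducts, which is consistent with the five different subsets reported for the feature set in this paper.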
2.3. SVM and Its Optimization. SVM is a pattern recognition method developed from statistical learning theory and based on the principle of structural risk minimization. While ensuring classification accuracy, SVM improves the generalization ability of the learning machine by maximizing the classification margin. The biggest advantage of SVM is that it overcomes overlearning and high dimensionality, both of which lead to computational complexity and local extremum problems. A reliable classification model based on SVM is urgently needed for the study of hospitalization expenses of patients with gastric cancer.
SVM first deals with linearly separable data (Figure 2). Assume a data set $S = \{x_1, \ldots, x_n\}$ with labels $G = \{y_1, \ldots, y_n\}$, where $x_i$ is the input space vector of the $i$th data sample and $y_i$ records the category of the sample. The aim of SVM is to find an optimal hyperplane $H$ that separates the two classes of samples with the largest margin. The optimal hyperplane $H$ is expressed as
\[ w^T x + b = 0, \tag{20} \]
where $w$ is the weight vector and $b$ is the threshold. This problem is transformed into an optimization problem over $w$ and $b$:
\[ \min_{w, b} \; r(w) = \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i \left( (w \cdot x_i) + b \right) \ge 1, \quad i = 1, \ldots, n. \tag{21} \]
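The step from "largest margin" to the objective in (21) can be made explicit with a short, standard derivation. It is sketched here under the usual canonical scaling, in which the samples closest to $H$ satisfy $y_i((w \cdot x_i) + b) = 1$; identifying the two supporting hyperplanes with $H_1$ and $H_2$ of Figure 2 is an assumption.

```latex
% Supporting hyperplanes under the canonical scaling (assumed to be H_1 and H_2):
%   H_1 : w \cdot x + b = +1, \qquad H_2 : w \cdot x + b = -1.
% The distance from a point x_0 to H is |w \cdot x_0 + b| / \|w\|, so each
% supporting hyperplane lies at distance 1 / \|w\| from H, and the margin is
\[
  \operatorname{dist}(H_1, H_2) = \frac{2}{\|w\|}.
\]
% Maximizing the margin is therefore equivalent to minimizing the norm of w:
\[
  \max_{w, b} \frac{2}{\|w\|}
  \;\Longleftrightarrow\;
  \min_{w, b} \frac{1}{2} \|w\|^2 ,
  \qquad \text{subject to } y_i \left( (w \cdot x_i) + b \right) \ge 1 .
\]
```

The squared norm is used instead of the norm itself because it gives a differentiable quadratic objective with the same minimizer.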
In order to simplify the problem, the Lagrange dual is introduced, subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into
\[ \min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n. \tag{22} \]
For linearly inseparable data, the penalty parameter $C$ and the slack variable $\xi$ are introduced into the constraint conditions; the generalization ability of SVM is thereby increased, and the problem is transformed into
\[ \min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \tag{23} \]
where $C$ is a manually set parameter. In practice, a larger $C$ penalizes misclassified training samples more heavily, which fits the training data more closely (with a narrower margin) and at the same time increases the risk of poor generalization.
The final classification function is
\[ f(x) = \operatorname{sgn} \left( \sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*} \right). \tag{24} \]
For nonlinearly separable data, SVM maps the data into a high-dimensional space via the nonlinear mapping of a kernel function so that they become linearly separable, and the optimal hyperplane is found in that high-dimensional space. A kernel function that satisfies the Mercer condition corresponds to the inner product in a transformed space and is used to realize the nonlinear transformation of linear classification.
The corresponding kernel function is defined as
\[ K(x_i, x) = \left( \varphi(x_i) \cdot \varphi(x) \right). \tag{25} \]
At this point, the final classification function is
\[ f(x) = \operatorname{sgn} \left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right). \tag{26} \]
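The classification function (26) can be evaluated directly once the support vectors, multipliers, and threshold are known. The sketch below assumes an RBF kernel $K(u, v) = \exp(-g \|u - v\|^2)$, which is one common kernel with a single parameter $g$; the paper does not state the kernel form at this point, and the support vectors, multipliers, and threshold are hand-picked illustrative values rather than a trained model.

```python
import math


def rbf_kernel(u, v, g):
    """RBF kernel exp(-g * ||u - v||^2); the kernel choice is an assumption."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(u, v)))


def decide(x, support_vectors, labels, alphas, b, g):
    """Evaluate sgn(sum_i alpha_i * y_i * K(x_i, x) + b) as in (26)."""
    s = sum(a * y * rbf_kernel(sv, x, g)
            for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if s + b >= 0 else -1


# Hand-picked values for illustration; they satisfy sum_i y_i * alpha_i = 0.
support_vectors = [(0.0, 0.0), (1.0, 1.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
b, g = 0.0, 1.0

near_positive = decide((0.9, 0.8), support_vectors, labels, alphas, b, g)
near_negative = decide((0.1, 0.2), support_vectors, labels, alphas, b, g)
```

Each query point takes the label of the support vector whose kernel value dominates the sum, which is the nearer one in this two-vector example.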
The penalty factor $C$ and the kernel function parameter $g$ play an extremely important role in the performance of SVM classification. To obtain the optimal classification results, the grid optimization model is used in this paper. In the grid optimization model, the candidate parameter values are laid out as a grid over a certain range, and the optimal parameters are selected by traversing all grid points. The grid optimization model therefore has the advantages of simplicity, convenience, good stability, and ease of finding the best solution within the searched grid [24]. In the learning process of SVM, 10-fold cross-validation is used to select the kernel function parameter and penalty coefficient with the best classification performance, which are then applied to the SVM classifier for recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the experiments.
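The grid traversal described above can be sketched as an exhaustive search over pairs $(C, g)$. The power-of-two grid and its range are assumptions, since the paper does not state them, and `toy_cv_score` is a hypothetical stand-in for the 10-fold cross-validation accuracy of an SVM trained with the given parameters.

```python
import itertools
import math


def grid_search(cv_score, c_exponents, g_exponents):
    """Evaluate every (C, g) = (2**i, 2**j) on the grid and keep the best."""
    best = (None, None, float("-inf"))
    for i, j in itertools.product(c_exponents, g_exponents):
        C, g = 2.0 ** i, 2.0 ** j
        score = cv_score(C, g)
        if score > best[2]:
            best = (C, g, score)
    return best


def toy_cv_score(C, g):
    """Stand-in for cross-validated accuracy; peaks at C = 4, g = 0.5."""
    return 1.0 - 0.01 * (math.log2(C) - 2) ** 2 - 0.01 * (math.log2(g) + 1) ** 2


best_C, best_g, best_score = grid_search(toy_cv_score, range(-5, 6), range(-5, 6))
```

Because every grid point is visited, the result is the best pair on the grid; a finer grid around that pair can be searched afterwards if more precision is needed.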
3 Pulmonary Nodule Detection Model
In this paper CT images of 70 cases of patients with pul-monary nodules are used The images are firstly segmented[7] to three different types of pulmonary nodules (solitarypulmonary nodules or SPN vascular adhesion pulmonarynodules or VAPN and pleural adhesion pulmonary nodulesor PAPN) which are marked by radiologists as well as alarge number of nonnodular areas including blood vesselsbones and alveoli Forty-two feature components charac-terizing ROI are extracted from the 2-dimensional and 3-dimensional perspectives including six new 3-dimensionalfeatures proposed in this paper They are composed of 18shape features 8 intensity features and 16 texture featuresThe extracted feature set (identified as the FS) is discretizedand normalized Feature-level fusion of the improved featuredata is performed for five times using rough set model (sincethe reduction of rough set feature subset is not unique in thispaper the extracted feature sets are reduced for five timesand are identified as RS1 RS2 RS3 RS4 and RS5) Featuresubset RS1 is used for comparative experiment Finally SVMparameters are optimized using grid optimization modeland the improved SVM is used in the following four setsof comparative experiments comparative analysis of theeffectiveness and stability of classification before and afterrough set reduction of features comparative analysis of therecognition performance before and after feature-level fusionbased on rough set or PCA comparative analysis of therecognition performance of our proposed method and othermethods Based on the above views we present a flow chartof pulmonary nodule detection model as shown in Figure 3
4 Results and Discussion
41 Experimental Environments In this paper the hardwareand software environments are as follows
Software Environments Windows 7 OS the Matlab R2014bImageJ 148 u and LibSVM
Hardware Environments Intel Core i5 4670-34GHz 80GBof memory and 500GB hard disk
Experimental Data CT images of 70 cases of patients withpulmonary nodules are collected as experimental sampleswhich are marked by radiologists with a size of 512 times 512and a thickness of 2mm They are composed of 2232 CTimages from 38 cases of patients with solitary pulmonarynodules (SPN) 17 cases of patients with vascular adhesionpulmonary nodules (VAPN) and 15 cases of patients withpleural adhesion pulmonary nodules (PAPN) respectively
8 BioMed Research International
ROI
Pulmonarynodule
Otherareas
Blood vesselsbone alveolar
Gridoptimization
methodSVM
Featureextraction
Featurereduction
Classification result
Feature set
RS1 RS2 RS3 RS4 RS5
Rough set
Image segmentation
Shapefeatures
Perimeter area volume roundrectangular length Eulers number ESV
SCDSTD Hu moment ERCLD
Mean intensity intensity standardvariance maximum-minimum intensitydifference value of variance skewness
kurtosis the intensity gradient (by)outside introversion Laplace divergence
Tamura texture features (contrastorientation degree roughness)
GLCM (angular second moment moment ofinertia the deficit sum mean variance sumvariance difference variance entropy andentropy differential entropy information
measure correlation coefficient maximumcorrelation coefficient)
Intensityfeatures
Texturefeatures
CT
SPN VAPN PAPN
Normalized
Discretization
Figure 3 Flow chart of pulmonary nodule detection model
Figure 4 shows the representatives of each type of pulmonarynodules and the corresponding segmentation results
In this paper 42-dimensional features of 70 markedpulmonary nodular areas and 70 randomly selected nonn-odular areas are extracted Table 2 shows the 42-dimensionalfeature values of the lung nodular and nonnodular areasshape features are identified as the fs intensity features areidentified as the fi and texture features are identified as the ftIn order to intuitively understand the distribution of differentfeature values and the discrimination comparison externalsphere volume (ESV) ratio and the standard deviation of
surface-center distance (SCD) are calculated and plotted asbox diagram as shown in Figure 5
42 Feature-Level Fusion Based on Rough Set In order toavoid the attribute value of small range of values dominatedby that of large range of values and reduce the complexityof the statistical computation process the extracted featuresets are firstly preprocessed by normalizing data with biggerdifference and linearly mapping the data to [0 1] Thepreprocessed feature data are then fused for five times usingrough set model The fusion results are shown in Table 3
BioMed Research International 9
(a) Original image of SPN (b) Segment result of SPN (c) Original image of VAPN
(d) Segment result of VAPN (e) Original image of PAPN (f) Segment result of PAPN
Figure 4 Pulmonary nodule segmentation results
Table 2 Feature values of pulmonary nodular areas and nonnodular areas
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
10 BioMed Research International
+0Nodular areas Nonnodular areas
02
04
06
08
1
12
14
(a) Boxplot of ESV
+
+
+
01
015
02
025
03
035
04
045
05
055
06
Nodular areas Nonnodular areas(b) Boxplot of SCDSTD
Figure 5 Pulmonary nodule area and the pulmonary nodules boxplot ldquo+rdquo refers to upper and lower bounders of ESV value and SCDSTDvalue
Table 3: Feature reduction based on rough sets.

| Feature subset | Reduction results | Dimensionality |
|---|---|---|
| RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21 |
| RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17 |
| RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20 |
4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
Experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion, from a mean of 94.64% to 99.17% (Table 4). The missed diagnosis rate decreases, as reflected by the increased sensitivity, and the misdiagnosis rate decreases, as reflected by the increased specificity. The processing time is also shorter after reduction. These results indicate that rough set based feature-level fusion of the extracted 42-dimensional feature set is effective: it improves the compactness of the feature set (by eliminating redundant and poorly discriminating feature components) and corrects abnormal data in the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
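The link drawn above between sensitivity and the missed diagnosis rate, and between specificity and the misdiagnosis rate, follows from the usual confusion-matrix definitions. The sketch below assumes the label 1 marks a nodular area and 0 a nonnodular area; the function name and the example labels are hypothetical and are not taken from the experiments.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity from paired labels."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(y_true, y_pred):
        if truth == positive:
            if predicted == positive:
                tp += 1  # nodule correctly detected
            else:
                fn += 1  # nodule missed (missed diagnosis)
        else:
            if predicted == positive:
                fp += 1  # nonnodular area flagged as nodule (misdiagnosis)
            else:
                tn += 1  # nonnodular area correctly rejected
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical fold: four nodular and four nonnodular areas, one nodule missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these definitions the missed diagnosis rate is 1 minus the sensitivity and the misdiagnosis rate is 1 minus the specificity, which is why an increase in either metric corresponds to a decrease in the matching error rate.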
4.3.2. The Model Stability Experiment. The feature data of pulmonary nodules are tested with RS1 (70 × 21) as the dataset for classification for five rounds, each with a different ratio of training set to testing set: 50:20, 40:30, 35:35, 30:40, or 20:50. In each round the training and testing sets are selected randomly at the given ratio, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction does not decrease noticeably, whereas that of the feature set before rough set reduction fluctuates to a certain extent (see Figure 6). These results indicate that the feature-level fusion model based on rough set classifies more stably and is less susceptible to interference from the sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
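The stability protocol (random splits at a fixed ratio, averaged over 10 runs) can be sketched as follows. Each ratio in the text sums to 70, the row count of RS1, so the sketch reads the ratio as sample counts; that reading is an assumption. The helper names and the placeholder scorer are hypothetical, and a real run would train and test the SVM inside `score_fn`.

```python
import random
from statistics import mean


def split_by_ratio(n_samples, train_parts, test_parts, rng):
    """Shuffle sample indices and split them at the ratio train_parts:test_parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_parts / (train_parts + test_parts))
    return indices[:n_train], indices[n_train:]


def mean_score_over_repeats(score_fn, n_samples, train_parts, test_parts,
                            repeats=10, seed=0):
    """Average score_fn over `repeats` independent random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train_idx, test_idx = split_by_ratio(n_samples, train_parts, test_parts, rng)
        scores.append(score_fn(train_idx, test_idx))
    return mean(scores)


def held_out_fraction(train_idx, test_idx):
    # Placeholder scorer (hypothetical): reports the share of held-out samples.
    # Replace with "fit the classifier on train_idx, evaluate on test_idx".
    return len(test_idx) / (len(train_idx) + len(test_idx))


for train_parts, test_parts in [(50, 20), (40, 30), (35, 35), (30, 40), (20, 50)]:
    value = mean_score_over_repeats(held_out_fraction, 70, train_parts, test_parts)
    print(f"{train_parts}:{test_parts}", value)
```

Fixing the seed makes the ten splits reproducible, which matters when comparing the feature set before and after reduction on the same partitions.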
4.3.3. The Superiority of Feature-Level Fusion Model Based on Rough Set. PCA is a well-developed model characterized by simple calculation and easy programming, and it has become the preferred dimension reduction method for most feature-level fusion models. To compare the two types of feature-level fusion, PCA-based feature-level fusion of the extracted feature sets is also performed in this paper, and the tenfold cross-validation results are shown in Table 8. Figure 7 shows the classification performance of the two types of feature-level fusion methods.
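Table 4: Statistics of effectiveness before and after rough set reduction.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | Processing time (s) |
|---|---|---|---|---|
| Before reduction | | | | |
| 1 | 96.42 | 92.86 | 100 | 1.0610 |
| 2 | 91.96 | 83.93 | 100 | 0.6170 |
| 3 | 95.54 | 100 | 91.07 | 0.5490 |
| 4 | 89.28 | 100 | 78.57 | 0.5630 |
| 5 | 95.54 | 91.07 | 100 | 0.5470 |
| 6 | 98.21 | 96.43 | 100 | 0.5460 |
| 7 | 94.64 | 89.29 | 100 | 0.5460 |
| 8 | 95.53 | 91.07 | 100 | 0.5460 |
| 9 | 91.96 | 83.93 | 100 | 0.5460 |
| 10 | 97.32 | 100 | 96.64 | 0.5300 |
| Mean | 94.64 | 92.86 | 96.43 | 0.6051 |
| After reduction (RS1) | | | | |
| 1 | 100 | 100 | 100 | 0.9370 |
| 2 | 100 | 100 | 100 | 0.4360 |
| 3 | 100 | 100 | 100 | 0.3870 |
| 4 | 100 | 100 | 100 | 0.4210 |
| 5 | 100 | 100 | 100 | 0.4210 |
| 6 | 100 | 100 | 100 | 0.3900 |
| 7 | 100 | 100 | 100 | 0.4060 |
| 8 | 91.67 | 100 | 83.33 | 0.4060 |
| 9 | 100 | 100 | 100 | 0.3740 |
| 10 | 100 | 100 | 100 | 0.3930 |
| Mean | 99.17 | 100 | 98.33 | 0.4571 |
| Increase after reduction | 4.53 | 7.14 | 1.9 | 0.148 |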
Table 5: Effectiveness of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Processing time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.4571 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.4650 |
| RS3 | 99.17 | 100 | 98.33 | 0.4656 |
| RS4 | 100 | 100 | 100 | 0.4731 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.4850 |
| Mean | 98.83 | 99 | 98.66 | 0.4672 |
(Feature subset RS1 from Table 3 is used, and the running time shown is 100 times the actual time.)
Experimental results show that the performance indicators of the feature-level fusion model based on rough set are better than those of the model based on PCA, indicating that the rough set is more suitable than PCA for eliminating redundant information.
4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the optimal detection accuracy is used for all detection methods). The results are shown in Table 9 (Pr: private database; L: LIDC).

Figure 6: Comparative results of feature subsets before and after rough set reduction, plotted for the training/testing ratios 50:20, 40:30, 35:35, 30:40, and 20:50.
Experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent. This indicates that the method not only improves the comprehensiveness and accuracy of the feature description of the ROI by supplementing and improving the feature components but also improves the tightness of the feature set by using rough set based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.

Table 6: Stability statistics of rough set reduction subsets.

| Training set : testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s) |
|---|---|---|---|---|
| Before fusion | | | | |
| 50:20 | 97.35 | 94.71 | 100 | 0.4873 |
| 40:30 | 96.53 | 93.08 | 98.32 | 0.3846 |
| 35:35 | 95.83 | 92.39 | 97.79 | 0.4254 |
| 30:40 | 96.16 | 95.58 | 96.74 | 0.3560 |
| 20:50 | 94.88 | 94.63 | 95.86 | 0.4236 |
| Mean | 96.15 | 94.08 | 97.742 | 0.4154 |
| After fusion (RS1) | | | | |
| 50:20 | 99.71 | 99.41 | 100 | 0.2684 |
| 40:30 | 98.96 | 99.58 | 98.46 | 0.2568 |
| 35:35 | 98.65 | 99.23 | 98.08 | 0.2382 |
| 30:40 | 98.37 | 98.60 | 98.14 | 0.2646 |
| 20:50 | 98.25 | 97.67 | 98.84 | 0.2636 |
| Mean | 98.79 | 98.84 | 98.70 | 0.2583 |

Table 7: Classification performance of rough set reduction subsets.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.2583 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.2870 |
| RS3 | 99.17 | 100 | 98.33 | 0.2560 |
| RS4 | 100 | 100 | 100 | 0.2531 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.2656 |
| Mean | 98.834 | 99 | 98.66 | 0.2620 |

Table 8: Classification performance of feature reduction based on PCA.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s) |
|---|---|---|---|---|
| 1 | 91.67 | 83.33 | 100 | 0.9970 |
| 2 | 96.74 | 93.48 | 100 | 0.4830 |
| 3 | 96.74 | 93.48 | 100 | 0.4880 |
| 4 | 98.91 | 100 | 97.83 | 0.4950 |
| 5 | 93.48 | 86.96 | 100 | 0.4950 |
| 6 | 96.74 | 100 | 93.48 | 0.5140 |
| 7 | 96.74 | 100 | 93.48 | 0.5120 |
| 8 | 94.57 | 89.13 | 100 | 0.4890 |
| 9 | 97.83 | 95.65 | 100 | 0.4990 |
| 10 | 95.65 | 93.48 | 97.83 | 0.5180 |
| Mean | 95.91 | 93.55 | 98.26 | 0.5490 |
Table 9: Comparison of the performance of different lung nodule detection methods.

| Author | Database | Nodule numbers | Accuracy (%) | FPs |
|---|---|---|---|---|
| Santos et al. [11] | L | 260 | 88.4 | 1.17 |
| Magalhães Barros Netto et al. [12] | L | 48 | 90.65 | 0.138 |
| Ye et al. [13] | Pr | 220 | 90.2 | 8.2 |
| Tan et al. [14] | L | 172 | 87.5 | 4 |
| Cascio et al. [16] | L | 148 | 97 | 6.1 |
| Our method | Pr | 70 | 99.17 | 0.47 |
Figure 7: Comparison of two feature-level fusion models (rough set and PCA) in terms of accuracy, sensitivity, specificity, and running time.
5. Conclusions
In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model is proposed based on rough set based feature-level fusion. To address insufficient feature description and inaccurate characterization in feature extraction, six new 3D features are proposed and combined with other 2D and 3D features to extract and quantify the feature information of the ROI. Because the extracted high-dimensional features contain redundant information, rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets. In addition, a grid optimization model is adopted to optimize the kernel function parameters of the SVM, which is used as the classifier for detection and recognition of pulmonary nodules. Finally, the detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed method is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules, and it is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work is partially supported by the National Natural Science Foundation of China (Grant No. 61561040), the Natural Science Foundation of Ningxia (Grant No. NZ16067), and the Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Lüchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, "Trends in incidence of small cell lung cancer and all lung cancer," Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit, et al., "Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012," International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, "Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms," Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, "3-D segmentation algorithm of small lung nodules in spiral CT images," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski, et al., "Effect of CAD on radiologists' detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size," Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, "Automated identification of dementia using FDG-PET imaging," BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, "Brain MRI image segmentation based on learning local variational Gaussian mixture models," Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, "Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains," Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, "Neutro-connectedness cut," IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva, et al., "Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM," Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhães Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, "Automatic segmentation of lung nodules with growing neural gas and support vector machine," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, "Shape-based computer-aided detection of lung nodules in thoracic CT images," IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, "A novel computer-aided lung nodule detection system for CT images," Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, "Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier," Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, "Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, "A survey on rough set theory and applications," Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, "A novel neighborhood rough set based classification approach for medical diagnosis," Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, "Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis," Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers, et al., "Computer-aided diagnosis of the solitary pulmonary nodule," Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, "Fuzzy rough sets hybrid scheme for breast cancer detection," Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, "A method of effective text extraction for complex video scene," Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, "Review and performance comparison of SVM- and ELM-based classifiers," Neurocomputing, vol. 128, pp. 507–516, 2014.
To simplify the formula, the Lagrange dual is introduced, subject to the KKT (Karush-Kuhn-Tucker) conditions. The objective function is transformed into
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.}\quad \sum_{i=1}^{n} y_i \alpha_i = 0,\quad \alpha_i \ge 0,\quad i = 1, 2, \ldots, n. \tag{22}$$
For linearly inseparable data, the penalty parameter $C$ and the slack variable $\xi$ are introduced in the constraint condition, which improves the generalization ability of SVM, and the function is transformed into
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{j=1}^{n} \alpha_j \quad \text{s.t.}\quad \sum_{i=1}^{n} y_i \alpha_i = 0,\quad 0 \le \alpha_i \le C, \tag{23}$$
where $C$ is a user-set parameter. In practice, a larger $C$ penalizes margin violations more heavily, which fits the training data more closely but increases the risk of poor generalization.
The final classification function is
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^{*} y_i (x_i \cdot x) + b^{*}\right). \tag{24}$$
For data that are not linearly separable, SVM maps them into a high-dimensional space via the nonlinear mapping of a kernel function, where they become linearly separable, and the optimal hyperplane is found in that high-dimensional space. A kernel function that satisfies Mercer's condition corresponds to an inner product in the transformed space, so it realizes the nonlinear transformation while the classification remains linear in that space.
The corresponding kernel function is defined as
$$K(x_i, x) = \left(\varphi(x_i) \cdot \varphi(x)\right). \tag{25}$$
At this point the final classification function is
$$f(x) = \operatorname{sgn}\left(\sum_{i=1}^{n} \alpha_i^{*} y_i K(x_i, x) + b^{*}\right). \tag{26}$$
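The classification function (26) can be evaluated directly once the multipliers, labels, support vectors, and bias are known. The sketch below assumes an RBF kernel of the form exp(-g * ||u - v||^2), which is the form LibSVM uses for its RBF option; the text above does not state which kernel the authors chose, and the support vectors, multipliers, and bias here are hypothetical values rather than a trained model.

```python
import math


def rbf_kernel(u, v, g):
    """RBF kernel K(u, v) = exp(-g * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-g * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, b, g):
    """Evaluate sgn(sum_i alpha_i * y_i * K(x_i, x) + b), as in (26)."""
    total = b + sum(alpha * y * rbf_kernel(sv, x, g)
                    for alpha, y, sv in zip(alphas, labels, support_vectors))
    return 1 if total >= 0 else -1  # a total of exactly 0 is assigned to +1


# Hypothetical model: one support vector per class, equal multipliers, zero bias.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0
g = 0.5

near_positive = svm_decision((1.8, 2.1), support_vectors, labels, alphas, b, g)
near_negative = svm_decision((0.1, -0.2), support_vectors, labels, alphas, b, g)
print(near_positive, near_negative)
```

Only the kernel is evaluated; the mapping into the high-dimensional space never has to be computed explicitly.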
The penalty factor $C$ and the kernel function parameter $g$ play an extremely important role in the performance of SVM classification. To obtain the optimal classification results, a grid optimization model is used in this paper. In the grid optimization model, the parameters to be searched are laid out as a grid over a given range, and the optimal parameters are selected by traversing all the grid points. The grid optimization model therefore has the advantages of simplicity, convenience, good stability, and ease of finding the global optimal solution over the searched grid [24]. During SVM learning, 10-fold cross-validation is used to find the kernel function parameters and penalty coefficient that give the best classification performance, which are then applied to the SVM classifier for recognition and detection of pulmonary nodules. Finally, sensitivity, specificity, accuracy, and processing time are used as indexes to evaluate the detection results of the relevant experiments.
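The grid traversal with 10-fold cross-validation can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' LibSVM procedure: the powers-of-two grid and its ranges are assumed, all function names are hypothetical, and `toy_fit_and_score` stands in for training and scoring an SVM on each fold.

```python
import math
from itertools import product


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    start = 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_val_score(fit_and_score, n_samples, C, g, k=10):
    """Average the per-fold score returned by the caller's fit_and_score."""
    scores = [fit_and_score(train, test, C, g)
              for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


def grid_search(fit_and_score, n_samples, c_exponents, g_exponents, k=10):
    """Traverse every (C, g) grid point and keep the best mean CV score."""
    best = None
    for c_exp, g_exp in product(c_exponents, g_exponents):
        C, g = 2.0 ** c_exp, 2.0 ** g_exp
        score = cross_val_score(fit_and_score, n_samples, C, g, k)
        if best is None or score > best[0]:
            best = (score, C, g)
    return best


def toy_fit_and_score(train_idx, test_idx, C, g):
    # Hypothetical stand-in for "train an SVM on train_idx, score on test_idx";
    # it simply peaks at C = 4 and g = 0.25 so the search has a known answer.
    return 1.0 - 0.01 * abs(math.log2(C) - 2) - 0.01 * abs(math.log2(g) + 2)


best_score, best_C, best_g = grid_search(
    toy_fit_and_score, n_samples=70,
    c_exponents=range(-4, 9), g_exponents=range(-8, 3))
print(best_score, best_C, best_g)
```

Because every grid point is visited, the result is the best point on the grid; values between grid points are never tried, so a coarse grid is often followed by a finer one around the best cell.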
3. Pulmonary Nodule Detection Model
In this paper, CT images of 70 patients with pulmonary nodules are used. The images are first segmented [7] into three types of pulmonary nodules, marked by radiologists (solitary pulmonary nodules, SPN; vascular adhesion pulmonary nodules, VAPN; and pleural adhesion pulmonary nodules, PAPN), as well as a large number of nonnodular areas, including blood vessels, bones, and alveoli. Forty-two feature components characterizing the ROI are extracted from the 2-dimensional and 3-dimensional perspectives, including six new 3-dimensional features proposed in this paper. They are composed of 18 shape features, 8 intensity features, and 16 texture features. The extracted feature set (identified as FS) is discretized and normalized. Feature-level fusion of the preprocessed feature data is performed five times using the rough set model (since the rough set reduction of a feature set is not unique, the extracted feature sets are reduced five times, and the results are identified as RS1, RS2, RS3, RS4, and RS5). Feature subset RS1 is used for the comparative experiments. Finally, SVM parameters are optimized using the grid optimization model, and the tuned SVM is used in the following four sets of comparative experiments: comparative analysis of the effectiveness of classification before and after rough set reduction of features; comparative analysis of the stability of classification before and after rough set reduction; comparative analysis of the recognition performance after feature-level fusion based on rough set or PCA; and comparative analysis of the recognition performance of our proposed method and other methods. Based on the above, a flow chart of the pulmonary nodule detection model is shown in Figure 3.
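The remark that a rough set reduction is not unique can be illustrated with a small decision table. The sketch below uses the standard rough set dependency degree (the fraction of samples whose equivalence class under the chosen attributes carries a single decision label) and a greedy forward selection. The greedy strategy, the function names, and the toy table are assumptions for illustration; the section does not say which reduction algorithm the authors used, and a greedy result is not guaranteed to be minimal in general.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Share of samples whose equivalence class under `attrs` has one label."""
    class_labels = defaultdict(set)
    class_sizes = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        class_labels[key].add(label)
        class_sizes[key] += 1
    consistent = sum(class_sizes[key]
                     for key, seen in class_labels.items() if len(seen) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels, attr_order):
    """Add attributes greedily until the full-set dependency is reached."""
    target = dependency(rows, labels, attr_order)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        candidates = [a for a in attr_order if a not in chosen]
        # max() keeps the first best candidate, so ties depend on attr_order.
        best = max(candidates,
                   key=lambda a: dependency(rows, labels, chosen + [a]))
        chosen.append(best)
    return chosen


# Hypothetical discretized table: attributes 0 and 1 are redundant copies,
# attribute 2 is unrelated to the decision label.
rows = [(0, 0, 1), (0, 0, 0), (1, 1, 1), (1, 1, 0)]
labels = [0, 0, 1, 1]

reduct_a = greedy_reduct(rows, labels, [0, 1, 2])
reduct_b = greedy_reduct(rows, labels, [1, 0, 2])
print(reduct_a, reduct_b)
```

Both results preserve the full dependency degree while keeping different attributes, which is the kind of non-uniqueness that motivates computing several reducts (RS1 to RS5) instead of relying on a single one.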
4. Results and Discussion
4.1. Experimental Environments. The hardware and software environments are as follows.
Software Environments. Windows 7 OS, Matlab R2014b, ImageJ 1.48u, and LibSVM.
Hardware Environments. Intel Core i5-4670 at 3.4 GHz, 8.0 GB of memory, and a 500 GB hard disk.
Experimental Data. CT images of 70 patients with pulmonary nodules, marked by radiologists, are collected as experimental samples, with a size of 512 × 512 and a thickness of 2 mm. They comprise 2232 CT images from 38 patients with solitary pulmonary nodules (SPN), 17 patients with vascular adhesion pulmonary nodules (VAPN), and 15 patients with pleural adhesion pulmonary nodules (PAPN).
Figure 3: Flow chart of pulmonary nodule detection model. CT images are segmented into ROIs (pulmonary nodules: SPN, VAPN, PAPN; other areas: blood vessels, bone, alveoli); shape, intensity, and texture features are extracted into a feature set, which is normalized, discretized, and reduced with rough sets into RS1, RS2, RS3, RS4, and RS5; an SVM tuned by the grid optimization method gives the classification result. Shape features: perimeter, area, volume, round, rectangular, length, Euler's number, ESV, SCDSTD, Hu moment, ERCLD. Intensity features: mean intensity, intensity standard variance, maximum-minimum intensity difference, value of variance, skewness, kurtosis, the intensity gradient (by) outside, introversion, Laplace divergence. Texture features: Tamura texture features (contrast, orientation degree, roughness) and GLCM (angular second moment, moment of inertia, the deficit, sum mean, variance, sum variance, difference variance, entropy, and entropy, differential entropy, information measure, correlation coefficient, maximum correlation coefficient).
Figure 4 shows a representative of each type of pulmonary nodule and the corresponding segmentation results.
In this paper, 42-dimensional features of 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas are extracted. Table 2 shows the 42-dimensional feature values of the lung nodular and nonnodular areas; shape features are identified as fs, intensity features as fi, and texture features as ft. To give an intuitive view of how the feature values are distributed and how well they discriminate, the external sphere volume (ESV) ratio and the standard deviation of surface-center distance (SCD) are calculated and plotted as boxplots, as shown in Figure 5.
42 Feature-Level Fusion Based on Rough Set In order toavoid the attribute value of small range of values dominatedby that of large range of values and reduce the complexityof the statistical computation process the extracted featuresets are firstly preprocessed by normalizing data with biggerdifference and linearly mapping the data to [0 1] Thepreprocessed feature data are then fused for five times usingrough set model The fusion results are shown in Table 3
BioMed Research International 9
(a) Original image of SPN (b) Segment result of SPN (c) Original image of VAPN
(d) Segment result of VAPN (e) Original image of PAPN (f) Segment result of PAPN
Figure 4 Pulmonary nodule segmentation results
Table 2 Feature values of pulmonary nodular areas and nonnodular areas
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
10 BioMed Research International
+0Nodular areas Nonnodular areas
02
04
06
08
1
12
14
(a) Boxplot of ESV
+
+
+
01
015
02
025
03
035
04
045
05
055
06
Nodular areas Nonnodular areas(b) Boxplot of SCDSTD
Figure 5 Pulmonary nodule area and the pulmonary nodules boxplot ldquo+rdquo refers to upper and lower bounders of ESV value and SCDSTDvalue
Table 3 Feature reduction based on rough sets
Featuresubset Reduction results Dimensionality
RS1fs4 fs16 fs17 fs18 fi2 fi4 fi6 fi7fi8 ft2 ft4 ft5 ft6 ft7 ft8 ft9ft10 ft11 ft13 ft14 ft15 ft16
21
RS2fs4 fs9 fs16 fs18 fi1 fi2 fi5 ft2ft5 ft6 ft8 ft9 ft10 ft11 ft12 ft13
ft1517
RS3fs9 fs17 fs18 fi1 fi2 fi5 fi7 fi8ft2 ft6 ft7 ft8 ft9 ft10 ft11 ft12
ft14 ft15 ft1619
RS4fs9 fs16 fs18 fi1 fi2 fi5 fi7 fi8ft5 ft6 ft7 ft8 ft9 ft10 ft11 ft12
ft14 ft15 ft1619
RS5fs9 fs16 fs17 fs18 fi1 fi2 fi4 fi5fi7 fi8 ft2 ft5 ft6 ft7 ft8 ft9 ft10
ft12 ft15 ft1620
43 Pulmonary Nodule Detection withSVM Based on Grid Optimization
431 The Model Effectiveness Experiment Tenfold cross-validation is used to calculate the accuracy sensitivity speci-ficity and processing time of classification before and afterrough set reduction (RS1(70 times 21) obtained from experimentone is used as the data set after reduction) and the recogni-tion performance of classifier is compared before and afterreduction The results are shown in Table 4
Experimental results show that pulmonary noduledetection accuracy is increased significantly after feature-level fusion with a decrease in the missed diagnosis ratereflected by the increased sensitivity and the misdiagnosisrate reflected by the increased specificity The processingtime is also shorter after reduction These results indicatethat the feature-level fusion of the extracted feature set with
42 dimensionalities based on rough set model is effectivewhich not only improves the compactness of the feature set(to eliminate redundancy and low degree of differentiationfeatures component) but also corrects the abnormal data ofthe feature set thereby further improving the performance ofpulmonary nodule detection Table 5 shows the effectivenessof the five rough set reduction subsets
432 The Model Stability Experiment The feature data ofpulmonary nodules are tested with RS1(70lowast21) as the datasetfor classification for five rounds with a different ratio oftraining set over testing set of 5020 4030 3535 3535 or2050 Each round of test is carried out with a randomlyselected ratio of training set over testing set and themeanof 10test results is used as the corresponding accuracy sensitivityspecificity and running time of the model The results areshown in Table 6
The experimental results show that with the decreasein the ratio of training set over testing set the decrease inthe classification accuracy of feature subset after rough setreduction is not obvious whereas that of feature set beforerough set reduction is fluctuating to certain extent (Figure 6is more intuitive)These results indicate that the classificationstability of the feature level fusion model based on rough setis higher and is less susceptible to the interference of sampledata Table 7 shows the stability of 5 groups feature subset afterrough set reduction
433 The Superiority of Feature-Level Fusion Model Basedon Rough Set Since PCA is a well-developed model char-acterized by simple calculation and easy programming ithas become the preferred dimension reduction method formost of the feature-level fusion model in order to analyzecomparatively two types of feature-level fusions In this paperPCA-based feature-level fusion of the extracted feature sets isperformed at the same time and the tenfold cross-validationresults are shown in Table 8 Figure 7 shows the classificationperformance of the two types of feature-level fusion methods
BioMed Research International 11
Table 4 Statistics of effectiveness before and after rough set reduction
Serial number Accuracy () Sensibility () Specificity () Processing time (s)
Before reduction
1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300
Mean 9464 9286 9643 06051
After reduction (Rs1)
1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930
Mean 9917 100 9833 04571Increase after reduction 453 714 19 0148
Table 5 Effectiveness of rough set reduction subsets
Subset Averageaccuracy ()
Averagesensitivity ()
Averagespecificity ()
Processingtime (s)
RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672
(feature subset RS1 fromTable 3 is used and the running timeis 100 times actual time)
Experimental results show that various performanceindicators of the feature-level fusion model based on roughset are better than those based on PCA indicating that therough set is more suitable than PCA to eliminate redundantinformation
434 Comparison with Other Pulmonary Nodule DetectionMethods Pulmonary nodule detection accuracy and FalsePositives per scan (FPs) are used as the evaluation indexesof pulmonary nodule detection methods to compare andanalyze the method proposed in this paper and other five
9293949596979899
100101
5020 4030 3535 3040 2050
Before reductionAfter reduction
Figure 6 Comparative results of feature subsets before and afterrough set reduction
detectionmethods of pulmonary nodules (the optimal detec-tion accuracy is used for all detection methods) The resultsare shown in Table 9 (Pr private database L LIDC)
Experimental results show that the proposed method issuperior to the other pulmonary nodule detection methodsto a certain extent indicating that this method not onlyimproves the comprehensiveness and accuracy of the featuredescription of ROI by supplementing and improving the
12 BioMed Research International
Table 6 Stability statistics of rough set reduction subsets
Trainingsettesting set Accuracy () Sensitivity () Specificity () Running time (s)
Before fusion
5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154
After fusion (Rs1)
5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583
Table 7 Classification performance of rough set reduction subset
SubsetAverageaccuracy
()
Averagesensitivity
()
Averagespecificity
()
Runningtime (s)
RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620
Table 8 Classification performance of feature reduction based onPCA
Serialnumber
Accuracy()
Sensitivity()
Specificity()
10 timesrunningtime (s)
1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490
feature components but also improves the firmness of thefeature set by integrating the concept of feature-level fusionbased on rough set to exclude the redundant features and datawith irregular information thereby improving the overallpulmonary nodule detection performance
Figure 3: Flow chart of pulmonary nodule detection model. The CT image is segmented into ROIs (pulmonary nodules of type SPN, VAPN, or PAPN; other areas such as blood vessels, bone, and alveoli). Shape features (perimeter, area, volume, Euler's number, ESV, SCDSTD, Hu moment, and ERCLD, among others), intensity features (mean intensity, intensity standard variance, skewness, and kurtosis, among others), and texture features (Tamura texture features and GLCM features) are extracted, normalized, and discretized. Rough set reduction yields the feature subsets RS1 to RS5, and the resulting feature set is classified by an SVM tuned with the grid optimization method to give the classification result.
Figure 4 shows representatives of each type of pulmonary nodule and the corresponding segmentation results.
In this paper, 42-dimensional features are extracted from 70 marked pulmonary nodular areas and 70 randomly selected nonnodular areas. Table 2 shows the 42-dimensional feature values of the nodular and nonnodular areas; shape features are denoted by fs, intensity features by fi, and texture features by ft. To show the distribution of the feature values and how well they discriminate between the two classes, the external sphere volume (ESV) ratio and the standard deviation of surface-center distance (SCDSTD) are calculated and plotted as boxplots, as shown in Figure 5.
4.2. Feature-Level Fusion Based on Rough Set. To prevent attributes with a small range of values from being dominated by attributes with a large range, and to reduce the complexity of the statistical computation, the extracted feature sets are first preprocessed by normalization, which linearly maps the data to [0, 1]. The preprocessed feature data are then fused five times using the rough set model. The fusion results are shown in Table 3.
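The preprocessing step described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the sample values, and the number of bins are assumptions. The equal-width discretization is included only because rough set reduction operates on discrete attribute values and the flow chart in Figure 3 lists a discretization step; the paper does not state which discretization scheme was used.

```python
from typing import List


def min_max_normalize(column: List[float]) -> List[float]:
    """Linearly map one feature column to [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def normalize_features(rows: List[List[float]]) -> List[List[float]]:
    """Normalize each feature (column) of a samples-by-features table."""
    columns = [min_max_normalize(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def discretize(value: float, bins: int = 4) -> int:
    """Equal-width binning of a value in [0, 1] into levels 0..bins-1 (assumed scheme)."""
    return min(int(value * bins), bins - 1)


# Hypothetical feature rows: two features on very different scales.
rows = [[95.0, 0.2], [59.0, 0.6], [84.0, 0.4]]
normalized = normalize_features(rows)
levels = [[discretize(v) for v in row] for row in normalized]
print(normalized)
print(levels)
```

After normalization both features span the same interval, so neither dominates the other purely because of its original scale.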
Figure 4: Pulmonary nodule segmentation results. (a) Original image of SPN; (b) segmentation result of SPN; (c) original image of VAPN; (d) segmentation result of VAPN; (e) original image of PAPN; (f) segmentation result of PAPN.
Table 2: Feature values of pulmonary nodular areas and nonnodular areas.
Shape features (fs), intensity features (fi), and texture features (ft), each listed for nodular areas and nonnodular areas:
95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 −039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 −05785 −0424505356 05571 2307 3223903072 050178801738 0207122
Figure 5: Boxplots of feature values for nodular areas and nonnodular areas. (a) Boxplot of ESV; (b) boxplot of SCDSTD. “+” refers to the upper and lower bounds of the ESV and SCDSTD values.
Table 3: Feature reduction based on rough sets.
Feature subset | Reduction results | Dimensionality
RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21
RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17
RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19
RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19
RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20
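The paper does not state which reduction algorithm produced the subsets in Table 3. As an illustration of how a rough set reduct can be computed in general, the sketch below uses a QuickReduct-style greedy search driven by the dependency degree (the share of samples in the positive region). The function names and the toy decision table are hypothetical, and attributes are assumed to be already discretized.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def dependency(rows: Sequence[Sequence[int]], labels: Sequence[int],
               attrs: Sequence[int]) -> float:
    """Dependency degree: share of samples in the positive region of `attrs`."""
    labels_seen: Dict[Tuple[int, ...], Set[int]] = defaultdict(set)
    sizes: Dict[Tuple[int, ...], int] = defaultdict(int)
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)  # equivalence class of this sample
        labels_seen[key].add(label)
        sizes[key] += 1
    # A class belongs to the positive region when all its samples share one label.
    positive = sum(sizes[k] for k, seen in labels_seen.items() if len(seen) == 1)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Sequence[int]], labels: Sequence[int]) -> List[int]:
    """Greedily add the attribute that raises the dependency degree most."""
    n_attrs = len(rows[0])
    target = dependency(rows, labels, list(range(n_attrs)))
    reduct: List[int] = []
    while dependency(rows, labels, reduct) < target:
        candidates = [a for a in range(n_attrs) if a not in reduct]
        # Ties are broken in favour of the lowest attribute index.
        best = max(candidates,
                   key=lambda a: (dependency(rows, labels, reduct + [a]), -a))
        reduct.append(best)
    return sorted(reduct)


# Toy decision table: the label equals (attribute 0 OR attribute 2); attribute 1 is redundant.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 0, 1)]
labels = [0, 1, 1, 1, 0, 1]
print(quick_reduct(rows, labels))
```

A greedy search of this kind returns one reduct; different tie-breaking or search orders can yield different subsets of similar size.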
4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
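The evaluation measures used in this experiment can be sketched as follows. The helper names are hypothetical, the interleaved fold assignment is only one of several ways to form ten folds, and the example fold is invented for illustration; the paper does not report the composition of its folds.

```python
from typing import List, Sequence, Tuple


def confusion_metrics(y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity); label 1 = nodule, 0 = non-nodule."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; fold i tests samples i, i+k, i+2k, ..."""
    splits = []
    for i in range(k):
        test = list(range(i, n_samples, k))
        held_out = set(test)
        train = [j for j in range(n_samples) if j not in held_out]
        splits.append((train, test))
    return splits


# Hypothetical fold: 6 nodular and 6 nonnodular samples with one false positive.
y_true = [1] * 6 + [0] * 6
y_pred = [1] * 6 + [1] + [0] * 5
acc, sens, spec = confusion_metrics(y_true, y_pred)
print(f"{100 * acc:.2f} {100 * sens:.2f} {100 * spec:.2f}")  # prints "91.67 100.00 83.33"
```

Sensitivity falls when nodules are missed (false negatives), and specificity falls when nonnodular areas are flagged as nodules (false positives), which is why the two are reported separately alongside accuracy.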
Experimental results show that pulmonary nodule detection accuracy increases after feature-level fusion, from a mean of 94.64% to 99.17% (Table 4). The missed diagnosis rate decreases, as reflected by the increased sensitivity, and the misdiagnosis rate decreases, as reflected by the increased specificity. The processing time is also shorter after reduction. These results indicate that rough set based feature-level fusion of the extracted 42-dimensional feature set is effective: it not only improves the compactness of the feature set (by eliminating redundant and poorly discriminating feature components) but also corrects abnormal data in the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
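The classifier in these experiments is an SVM whose kernel parameters are tuned by grid optimization. The paper does not give the grid or the kernel, so the sketch below makes several assumptions: an RBF-style kernel with a penalty parameter C and a width parameter gamma, a base-2 exponent grid of the kind commonly used for coarse SVM tuning, and a toy scoring function standing in for the cross-validated accuracy that a real search would compute.

```python
import math
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search(score: Callable[[float, float], float],
                c_exponents: Iterable[int],
                gamma_exponents: Iterable[int]) -> Tuple[float, float, float]:
    """Evaluate score(C, gamma) on a base-2 grid and return (best_score, C, gamma)."""
    best: Tuple[float, float, float] = (-math.inf, 0.0, 0.0)
    for c_exp, g_exp in product(c_exponents, gamma_exponents):
        c, gamma = 2.0 ** c_exp, 2.0 ** g_exp
        s = score(c, gamma)
        if s > best[0]:
            best = (s, c, gamma)
    return best


def toy_score(c: float, gamma: float) -> float:
    """Stand-in for a cross-validated accuracy; peaks at C = 2^3, gamma = 2^-5."""
    return -((math.log2(c) - 3) ** 2 + (math.log2(gamma) + 5) ** 2)


best_score, best_c, best_gamma = grid_search(
    toy_score, range(-5, 16, 2), range(-15, 4, 2))
print(best_c, best_gamma)  # prints "8.0 0.03125"
```

In practice `toy_score` would be replaced by a function that trains the SVM with the given parameters and returns its mean cross-validation accuracy, so the cost of the search is the grid size times the cost of one cross-validation run.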
4.3.2. The Model Stability Experiment. The feature data of pulmonary nodules are tested with RS1 (70 × 21) as the dataset for classification for five rounds, with a ratio of training set to testing set of 50/20, 40/30, 35/35, 30/40, or 20/50. In each round, the training and testing sets are selected randomly at the given ratio, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
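The stability protocol can be sketched as a repeated random hold-out. The sketch reads each ratio as sample counts out of the 70 rows of RS1 (each pair sums to 70), which is an interpretation rather than something the paper states explicitly. The names `random_split`, `repeated_holdout`, and `dummy_evaluate` are hypothetical, and `dummy_evaluate` is only a placeholder for training the SVM on the training indices and scoring it on the testing indices.

```python
import random
from statistics import mean
from typing import Callable, List, Tuple


def random_split(n_samples: int, n_train: int,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle the sample indices and cut them into training and testing parts."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def repeated_holdout(n_samples: int, n_train: int,
                     evaluate: Callable[[List[int], List[int]], float],
                     repeats: int = 10, seed: int = 0) -> float:
    """Average the score of `evaluate` over `repeats` random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = random_split(n_samples, n_train, rng)
        scores.append(evaluate(train, test))
    return mean(scores)


def dummy_evaluate(train: List[int], test: List[int]) -> float:
    """Placeholder score: the fraction of samples held out for testing."""
    return len(test) / (len(train) + len(test))


for n_train in (50, 40, 35, 30, 20):
    print(n_train, 70 - n_train, repeated_holdout(70, n_train, dummy_evaluate))
```

Averaging over repeated random splits reduces the dependence of each reported figure on one particular partition, which is what makes the comparison across ratios meaningful.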
The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction decreases only slightly, whereas that of the feature set before rough set reduction fluctuates to a certain extent (see Figure 6). These results indicate that the feature-level fusion model based on rough set has higher classification stability and is less susceptible to interference from the sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
4.3.3. The Superiority of Feature-Level Fusion Model Based on Rough Set. Since PCA is a well-developed model characterized by simple calculation and easy programming, it has become the preferred dimension reduction method for most feature-level fusion models. To compare the two types of feature-level fusion, PCA-based feature-level fusion of the extracted feature sets is also performed in this paper, and the tenfold cross-validation results are shown in Table 8. Figure 7 shows the classification performance of the two types of feature-level fusion methods.
Table 4: Statistics of effectiveness before and after rough set reduction.
Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | Processing time (s)
Before reduction
1 | 96.42 | 92.86 | 100 | 1.0610
2 | 91.96 | 83.93 | 100 | 0.6170
3 | 95.54 | 100 | 91.07 | 0.5490
4 | 89.28 | 100 | 78.57 | 0.5630
5 | 95.54 | 91.07 | 100 | 0.5470
6 | 98.21 | 96.43 | 100 | 0.5460
7 | 94.64 | 89.29 | 100 | 0.5460
8 | 95.53 | 91.07 | 100 | 0.5460
9 | 91.96 | 83.93 | 100 | 0.5460
10 | 97.32 | 100 | 96.64 | 0.5300
Mean | 94.64 | 92.86 | 96.43 | 0.6051
After reduction (RS1)
1 | 100 | 100 | 100 | 0.9370
2 | 100 | 100 | 100 | 0.4360
3 | 100 | 100 | 100 | 0.3870
4 | 100 | 100 | 100 | 0.4210
5 | 100 | 100 | 100 | 0.4210
6 | 100 | 100 | 100 | 0.3900
7 | 100 | 100 | 100 | 0.4060
8 | 91.67 | 100 | 83.33 | 0.4060
9 | 100 | 100 | 100 | 0.3740
10 | 100 | 100 | 100 | 0.3930
Mean | 99.17 | 100 | 98.33 | 0.4571
Improvement after reduction | 4.53 | 7.14 | 1.9 | 0.148
Table 5: Effectiveness of rough set reduction subsets.
Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Processing time (s)
RS1 | 99.17 | 100 | 98.33 | 0.4571
RS2 | 97.5 | 96.67 | 98.33 | 0.4650
RS3 | 99.17 | 100 | 98.33 | 0.4656
RS4 | 100 | 100 | 100 | 0.4731
RS5 | 98.33 | 98.33 | 98.33 | 0.4850
Mean | 98.83 | 99 | 98.66 | 0.4672
In Figure 7, feature subset RS1 from Table 3 is used, and the running time is shown at 100 times the actual time. Experimental results show that the performance indicators of the feature-level fusion model based on rough set are better than those of the model based on PCA (for example, a mean tenfold accuracy of 99.17% for RS1 in Table 4 versus 95.91% for PCA in Table 8), indicating that the rough set is more suitable than PCA for eliminating redundant information.
Figure 6: Comparative results of feature subsets before and after rough set reduction (horizontal axis: training set/testing set ratios of 50/20, 40/30, 35/35, 30/40, and 20/50; series: before reduction and after reduction).
4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the optimal detection accuracy is used for all detection methods). The results are shown in Table 9 (Pr: private database; L: LIDC).
Experimental results show that the proposed method is superior to the other pulmonary nodule detection methods to a certain extent, indicating that this method not only improves the comprehensiveness and accuracy of the feature description of the ROI by supplementing and improving the feature components but also improves the compactness of the feature set by integrating rough set based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.
Table 6: Stability statistics of rough set reduction subsets.
Training set/testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s)
Before fusion
50/20 | 97.35 | 94.71 | 100 | 0.4873
40/30 | 96.53 | 93.08 | 98.32 | 0.3846
35/35 | 95.83 | 92.39 | 97.79 | 0.4254
30/40 | 96.16 | 95.58 | 96.74 | 0.3560
20/50 | 94.88 | 94.63 | 95.86 | 0.4236
Mean | 96.15 | 94.08 | 97.742 | 0.4154
After fusion (RS1)
50/20 | 99.71 | 99.41 | 100 | 0.2684
40/30 | 98.96 | 99.58 | 98.46 | 0.2568
35/35 | 98.65 | 99.23 | 98.08 | 0.2382
30/40 | 98.37 | 98.60 | 98.14 | 0.2646
20/50 | 98.25 | 97.67 | 98.84 | 0.2636
Mean | 98.79 | 98.84 | 98.70 | 0.2583
Table 7: Classification performance of rough set reduction subset.
Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s)
RS1 | 99.17 | 100 | 98.33 | 0.2583
RS2 | 97.5 | 96.67 | 98.33 | 0.2870
RS3 | 99.17 | 100 | 98.33 | 0.2560
RS4 | 100 | 100 | 100 | 0.2531
RS5 | 98.33 | 98.33 | 98.33 | 0.2656
Mean | 98.834 | 99 | 98.66 | 0.2620
Table 8: Classification performance of feature reduction based on PCA.
Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s)
1 | 91.67 | 83.33 | 100 | 0.9970
2 | 96.74 | 93.48 | 100 | 0.4830
3 | 96.74 | 93.48 | 100 | 0.4880
4 | 98.91 | 100 | 97.83 | 0.4950
5 | 93.48 | 86.96 | 100 | 0.4950
6 | 96.74 | 100 | 93.48 | 0.5140
7 | 96.74 | 100 | 93.48 | 0.5120
8 | 94.57 | 89.13 | 100 | 0.4890
9 | 97.83 | 95.65 | 100 | 0.4990
10 | 95.65 | 93.48 | 97.83 | 0.5180
Mean | 95.91 | 93.55 | 98.26 | 0.5490
Table 9: Comparison of the performance of different lung nodule detection methods.
Author | Database | Nodule numbers | Accuracy (%) | FPs
Santos et al. [11] | L | 260 | 88.4 | 1.17
Magalhaes Barros Netto et al. [12] | L | 48 | 90.65 | 0.138
Ye et al. [13] | Pr | 220 | 90.2 | 8.2
Tan et al. [14] | L | 172 | 87.5 | 4
Cascio et al. [16] | L | 148 | 97 | 6.1
Our method | Pr | 70 | 99.17 | 0.47
Figure 7: Comparison of two feature-level fusion models (rough set and PCA) in terms of accuracy, sensitivity, specificity, and running time.
5. Conclusions
In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model based on rough set feature-level fusion is proposed. To address insufficient feature description and inaccurate characterization in feature extraction, six new 3D features are proposed and combined with other 2D and 3D features to extract and quantify the feature information of the ROI. Because the extracted high-dimensional features contain redundant information, rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the SVM is used as the classifier for detection and recognition of pulmonary nodules. Finally, the detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed method is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules; it is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests
The authors declare that they have no competing interests.
Acknowledgments
The work is partially supported by National Natural Science Foundation of China (Grant No. 61561040), Natural Science Foundation of Ningxia (Grant No. NZ16067), and Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Luchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, “Trends in incidence of small cell lung cancer and all lung cancer,” Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, “Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms,” Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, “3-D segmentation algorithm of small lung nodules in spiral CT images,” IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., “Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size,” Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, “Automated identification of dementia using FDG-PET imaging,” BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, “Brain MRI image segmentation based on learning local variational Gaussian mixture models,” Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, “Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains,” Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, “Neutro-connectedness cut,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., “Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM,” Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhaes Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, “Automatic segmentation of lung nodules with growing neural gas and support vector machine,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, “Shape-based computer-aided detection of lung nodules in thoracic CT images,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, “A novel computer-aided lung nodule detection system for CT images,” Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, “Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier,” Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, “A survey on rough set theory and applications,” Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, “A novel neighborhood rough set based classification approach for medical diagnosis,” Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, “Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis,” Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., “Computer-aided diagnosis of the solitary pulmonary nodule,” Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, “Fuzzy rough sets hybrid scheme for breast cancer detection,” Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, “A method of effective text extraction for complex video scene,” Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, “Review and performance comparison of SVM- and ELM-based classifiers,” Neurocomputing, vol. 128, pp. 507–516, 2014.
BioMed Research International 9
(a) Original image of SPN (b) Segment result of SPN (c) Original image of VAPN
(d) Segment result of VAPN (e) Original image of PAPN (f) Segment result of PAPN
Figure 4 Pulmonary nodule segmentation results
Table 2 Feature values of pulmonary nodular areas and nonnodular areas
Shape features (fs) Intensity features (fi) Texture features (ft)Nodular areas Nonnodular areas Nodular areas Nonnodular areas Nodular areas Nonnodular areas95 78 5906 910987 83104 54016159 128 1406 44872 12041 125216284 178 05956 minus039568 04303 0006706517 0211 27348 18669 07709 0727506961 21587 551865 143481 07169 0986503529 07778 05 1 08059 538940 1 139598 206044 01942 0048703186 10295 729905 3546389 07708 0727300686 10197 08059 5349800042 00458 35042 5097100021 00295 06514 0845300013 00268 00971 0614300005 00011 44033 8218620 1 00691 5006114 9 minus05785 minus0424505356 05571 2307 3223903072 050178801738 0207122
10 BioMed Research International
+0Nodular areas Nonnodular areas
02
04
06
08
1
12
14
(a) Boxplot of ESV
+
+
+
01
015
02
025
03
035
04
045
05
055
06
Nodular areas Nonnodular areas(b) Boxplot of SCDSTD
Figure 5 Pulmonary nodule area and the pulmonary nodules boxplot ldquo+rdquo refers to upper and lower bounders of ESV value and SCDSTDvalue
Table 3 Feature reduction based on rough sets
Featuresubset Reduction results Dimensionality
RS1fs4 fs16 fs17 fs18 fi2 fi4 fi6 fi7fi8 ft2 ft4 ft5 ft6 ft7 ft8 ft9ft10 ft11 ft13 ft14 ft15 ft16
21
RS2fs4 fs9 fs16 fs18 fi1 fi2 fi5 ft2ft5 ft6 ft8 ft9 ft10 ft11 ft12 ft13
ft1517
RS3fs9 fs17 fs18 fi1 fi2 fi5 fi7 fi8ft2 ft6 ft7 ft8 ft9 ft10 ft11 ft12
ft14 ft15 ft1619
RS4fs9 fs16 fs18 fi1 fi2 fi5 fi7 fi8ft5 ft6 ft7 ft8 ft9 ft10 ft11 ft12
ft14 ft15 ft1619
RS5fs9 fs16 fs17 fs18 fi1 fi2 fi4 fi5fi7 fi8 ft2 ft5 ft6 ft7 ft8 ft9 ft10
ft12 ft15 ft1620
43 Pulmonary Nodule Detection withSVM Based on Grid Optimization
431 The Model Effectiveness Experiment Tenfold cross-validation is used to calculate the accuracy sensitivity speci-ficity and processing time of classification before and afterrough set reduction (RS1(70 times 21) obtained from experimentone is used as the data set after reduction) and the recogni-tion performance of classifier is compared before and afterreduction The results are shown in Table 4
Experimental results show that pulmonary noduledetection accuracy is increased significantly after feature-level fusion with a decrease in the missed diagnosis ratereflected by the increased sensitivity and the misdiagnosisrate reflected by the increased specificity The processingtime is also shorter after reduction These results indicatethat the feature-level fusion of the extracted feature set with
42 dimensionalities based on rough set model is effectivewhich not only improves the compactness of the feature set(to eliminate redundancy and low degree of differentiationfeatures component) but also corrects the abnormal data ofthe feature set thereby further improving the performance ofpulmonary nodule detection Table 5 shows the effectivenessof the five rough set reduction subsets
432 The Model Stability Experiment The feature data ofpulmonary nodules are tested with RS1(70lowast21) as the datasetfor classification for five rounds with a different ratio oftraining set over testing set of 5020 4030 3535 3535 or2050 Each round of test is carried out with a randomlyselected ratio of training set over testing set and themeanof 10test results is used as the corresponding accuracy sensitivityspecificity and running time of the model The results areshown in Table 6
The experimental results show that with the decreasein the ratio of training set over testing set the decrease inthe classification accuracy of feature subset after rough setreduction is not obvious whereas that of feature set beforerough set reduction is fluctuating to certain extent (Figure 6is more intuitive)These results indicate that the classificationstability of the feature level fusion model based on rough setis higher and is less susceptible to the interference of sampledata Table 7 shows the stability of 5 groups feature subset afterrough set reduction
433 The Superiority of Feature-Level Fusion Model Basedon Rough Set Since PCA is a well-developed model char-acterized by simple calculation and easy programming ithas become the preferred dimension reduction method formost of the feature-level fusion model in order to analyzecomparatively two types of feature-level fusions In this paperPCA-based feature-level fusion of the extracted feature sets isperformed at the same time and the tenfold cross-validationresults are shown in Table 8 Figure 7 shows the classificationperformance of the two types of feature-level fusion methods
BioMed Research International 11
Table 4 Statistics of effectiveness before and after rough set reduction
Serial number Accuracy () Sensibility () Specificity () Processing time (s)
Before reduction
1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300
Mean 9464 9286 9643 06051
After reduction (Rs1)
1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930
Mean 9917 100 9833 04571Increase after reduction 453 714 19 0148
Table 5 Effectiveness of rough set reduction subsets
Subset Averageaccuracy ()
Averagesensitivity ()
Averagespecificity ()
Processingtime (s)
RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672
(feature subset RS1 fromTable 3 is used and the running timeis 100 times actual time)
Experimental results show that various performanceindicators of the feature-level fusion model based on roughset are better than those based on PCA indicating that therough set is more suitable than PCA to eliminate redundantinformation
Figure 6: Comparative results of feature subsets before and after rough set reduction. The horizontal axis gives the training set/testing set ratio (50/20, 40/30, 35/35, 30/40, and 20/50); the two curves are labeled “Before reduction” and “After reduction.”

4.3.4. Comparison with Other Pulmonary Nodule Detection Methods. Pulmonary nodule detection accuracy and false positives per scan (FPs) are used as the evaluation indexes to compare the method proposed in this paper with five other pulmonary nodule detection methods (the optimal detection accuracy is used for every method). The results are shown in Table 9 (Pr: private database; L: LIDC).
The experimental results show that the proposed method achieves the highest accuracy in Table 9 (99.17%) at 0.47 FPs per scan, indicating that this method not only improves the comprehensiveness and accuracy of the feature description of the ROI by supplementing and improving the
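Table 6: Stability statistics of rough set reduction subsets.

| Training set/testing set | Accuracy (%) | Sensitivity (%) | Specificity (%) | Running time (s) |
|---|---|---|---|---|
| Before fusion | | | | |
| 50/20 | 97.35 | 94.71 | 100 | 0.4873 |
| 40/30 | 96.53 | 93.08 | 98.32 | 0.3846 |
| 35/35 | 95.83 | 92.39 | 97.79 | 0.4254 |
| 30/40 | 96.16 | 95.58 | 96.74 | 0.3560 |
| 20/50 | 94.88 | 94.63 | 95.86 | 0.4236 |
| Mean | 96.15 | 94.08 | 97.742 | 0.4154 |
| After fusion (RS1) | | | | |
| 50/20 | 99.71 | 99.41 | 100 | 0.2684 |
| 40/30 | 98.96 | 99.58 | 98.46 | 0.2568 |
| 35/35 | 98.65 | 99.23 | 98.08 | 0.2382 |
| 30/40 | 98.37 | 98.60 | 98.14 | 0.2646 |
| 20/50 | 98.25 | 97.67 | 98.84 | 0.2636 |
| Mean | 98.79 | 98.84 | 98.70 | 0.2583 |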
Table 7: Classification performance of rough set reduction subset.

| Subset | Average accuracy (%) | Average sensitivity (%) | Average specificity (%) | Running time (s) |
|---|---|---|---|---|
| RS1 | 99.17 | 100 | 98.33 | 0.2583 |
| RS2 | 97.5 | 96.67 | 98.33 | 0.2870 |
| RS3 | 99.17 | 100 | 98.33 | 0.2560 |
| RS4 | 100 | 100 | 100 | 0.2531 |
| RS5 | 98.33 | 98.33 | 98.33 | 0.2656 |
| Mean | 98.834 | 99 | 98.66 | 0.2620 |
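Table 8: Classification performance of feature reduction based on PCA.

| Serial number | Accuracy (%) | Sensitivity (%) | Specificity (%) | 10 times running time (s) |
|---|---|---|---|---|
| 1 | 91.67 | 83.33 | 100 | 0.9970 |
| 2 | 96.74 | 93.48 | 100 | 0.4830 |
| 3 | 96.74 | 93.48 | 100 | 0.4880 |
| 4 | 98.91 | 100 | 97.83 | 0.4950 |
| 5 | 93.48 | 86.96 | 100 | 0.4950 |
| 6 | 96.74 | 100 | 93.48 | 0.5140 |
| 7 | 96.74 | 100 | 93.48 | 0.5120 |
| 8 | 94.57 | 89.13 | 100 | 0.4890 |
| 9 | 97.83 | 95.65 | 100 | 0.4990 |
| 10 | 95.65 | 93.48 | 97.83 | 0.5180 |
| Mean | 95.91 | 93.55 | 98.26 | 0.5490 |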
feature components but also improves the compactness of the feature set by using rough set based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.
Table 9: Comparison of the performance of different lung nodule detection methods.

| Author | Database | Nodule numbers | Accuracy (%) | FPs |
|---|---|---|---|---|
| Santos et al. [11] | L | 260 | 88.4 | 1.17 |
| Magalhães Barros Netto et al. [12] | L | 48 | 90.65 | 0.138 |
| Ye et al. [13] | Pr | 220 | 90.2 | 8.2 |
| Tan et al. [14] | L | 172 | 87.5 | 4 |
| Cascio et al. [16] | L | 148 | 97 | 6.1 |
| Our method | Pr | 70 | 99.17 | 0.47 |
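The FPs index in Table 9 is an average over scans. As a minimal sketch of how such a figure is obtained, the Python function below divides the total number of false-positive detections by the number of scans; the function name and the per-scan counts are hypothetical and are not data from any of the compared studies.

```python
def false_positives_per_scan(fp_counts):
    """Average number of false-positive detections over all scans.

    fp_counts holds one integer per CT scan: the number of candidates
    that the detector flagged as nodules but that are not nodules.
    """
    if not fp_counts:
        raise ValueError("at least one scan is required")
    return sum(fp_counts) / len(fp_counts)


# Hypothetical counts for ten scans (illustration only).
demo_counts = [0, 1, 0, 2, 0, 0, 1, 0, 0, 1]
demo_fps = false_positives_per_scan(demo_counts)
print(demo_fps)
```

Because the index is normalized by the number of scans, it can be compared across studies that use different numbers of cases, although differences in database and nodule type still affect it.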
Figure 7: Comparison of two feature-level fusion models (rough set and PCA) in terms of accuracy, sensitivity, specificity, and running time.
5. Conclusions

In this paper, the current state of research on pulmonary nodule detection methods is analyzed, and a pulmonary nodule detection model based on rough set feature-level fusion is proposed. To address insufficient feature description and inaccurate characterization during feature extraction, six new 3D features are proposed and combined with other 2D and 3D features to extract and quantify the feature information of the ROI. Because the extracted high-dimensional features contain redundant information, rough set based feature-level fusion is employed to reduce the dimensionality of the feature sets. In addition, a grid optimization model is adopted to optimize the SVM kernel function, and the SVM is used as the classifier for the detection and recognition of pulmonary nodules. Finally, the detection performance of the proposed method is verified with four groups of comparative experiments. The experimental results show that the proposed method is effective, with a classification accuracy that can basically meet the requirements of medical imaging for the detection of pulmonary nodules; it is therefore of value for the detection of pulmonary nodules and the auxiliary diagnosis of lung cancer.
Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The work is partially supported by National Natural Science Foundation of China (Grant No. 61561040), Natural Science Foundation of Ningxia (Grant No. NZ16067), and Scientific Research Fund of Ningxia Education Department (Grant No. NGY2016084).
References
[1] S. P. Riaz, M. Lüchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, “Trends in incidence of small cell lung cancer and all lung cancer,” Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit, et al., “Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012,” International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, “Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms,” Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, “3-D segmentation algorithm of small lung nodules in spiral CT images,” IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski, et al., “Effect of CAD on radiologists’ detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size,” Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, “Automated identification of dementia using FDG-PET imaging,” BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, “Brain MRI image segmentation based on learning local variational Gaussian mixture models,” Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, “Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains,” Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, “Neutro-connectedness cut,” IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva, et al., “Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM,” Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhães Barros Netto, A. Corrêa Silva, R. Acatauassú Nunes, and M. Gattass, “Automatic segmentation of lung nodules with growing neural gas and support vector machine,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, “Shape-based computer-aided detection of lung nodules in thoracic CT images,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, “A novel computer-aided lung nodule detection system for CT images,” Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, “Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier,” Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, “Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models,” Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, “A survey on rough set theory and applications,” Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, “A novel neighborhood rough set based classification approach for medical diagnosis,” Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, “Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis,” Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers, et al., “Computer-aided diagnosis of the solitary pulmonary nodule,” Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, “Fuzzy rough sets hybrid scheme for breast cancer detection,” Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, “A method of effective text extraction for complex video scene,” Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, “Visual pattern recognition by moment invariants,” IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, “Review and performance comparison of SVM- and ELM-based classifiers,” Neurocomputing, vol. 128, pp. 507–516, 2014.
Figure 5: Pulmonary nodule area and the pulmonary nodules boxplot: (a) boxplot of ESV and (b) boxplot of SCDSTD, each comparing nodular areas with nonnodular areas. “+” refers to the upper and lower bounds of the ESV value and SCDSTD value.
Table 3: Feature reduction based on rough sets.

| Feature subset | Reduction results | Dimensionality |
|---|---|---|
| RS1 | fs4, fs16, fs17, fs18, fi2, fi4, fi6, fi7, fi8, ft2, ft4, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft13, ft14, ft15, ft16 | 21 |
| RS2 | fs4, fs9, fs16, fs18, fi1, fi2, fi5, ft2, ft5, ft6, ft8, ft9, ft10, ft11, ft12, ft13, ft15 | 17 |
| RS3 | fs9, fs17, fs18, fi1, fi2, fi5, fi7, fi8, ft2, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS4 | fs9, fs16, fs18, fi1, fi2, fi5, fi7, fi8, ft5, ft6, ft7, ft8, ft9, ft10, ft11, ft12, ft14, ft15, ft16 | 19 |
| RS5 | fs9, fs16, fs17, fs18, fi1, fi2, fi4, fi5, fi7, fi8, ft2, ft5, ft6, ft7, ft8, ft9, ft10, ft12, ft15, ft16 | 20 |
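Table 3 lists the resulting reducts, but the reduction procedure itself is not restated in this part of the paper. As a rough illustration only, the Python sketch below shows one common heuristic for rough set attribute reduction: greedy forward selection driven by the dependency degree, in the style often called QuickReduct. It is not necessarily the algorithm used by the authors. It assumes the features have already been discretized, and the function names and the toy decision table are hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough set dependency degree: |positive region| / |universe|.

    Objects with identical values on `attrs` form one equivalence class.
    A class lies in the positive region only if all of its objects share
    the same decision label.
    """
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    positive = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Add attributes greedily until the full-set dependency is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        remaining = [a for a in all_attrs if a not in reduct]
        best = max(remaining,
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


# Toy decision table (hypothetical): the label is 1 only when
# attribute 0 and attribute 1 are both 1; attribute 2 is noise.
toy_rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (1, 1, 1), (0, 0, 1)]
toy_labels = [0, 0, 0, 1, 1, 0]

toy_reduct = quick_reduct(toy_rows, toy_labels)
print(toy_reduct)
```

A greedy search of this kind returns one reduct, and different tie-breaking or starting points can give different subsets of similar size, which is consistent with Table 3 reporting several alternative subsets.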
4.3. Pulmonary Nodule Detection with SVM Based on Grid Optimization
4.3.1. The Model Effectiveness Experiment. Tenfold cross-validation is used to calculate the accuracy, sensitivity, specificity, and processing time of classification before and after rough set reduction (RS1 (70 × 21), obtained from experiment one, is used as the data set after reduction), and the recognition performance of the classifier is compared before and after reduction. The results are shown in Table 4.
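The evaluation protocol described above can be sketched in a few lines of Python. The code computes accuracy, sensitivity, and specificity for each of ten folds and averages them. Everything named here is an assumption made for illustration: the threshold classifier is a toy stand-in for the grid-optimized SVM, the data are synthetic, and the folds are interleaved deterministically, whereas a real run would usually shuffle or stratify the samples first.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity in percent.

    Label 1 marks a nodule and label 0 a non-nodule. The test fold is
    assumed to contain at least one sample of each class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = 100.0 * (tp + tn) / len(pairs)
    sensitivity = 100.0 * tp / (tp + fn)  # share of nodules detected
    specificity = 100.0 * tn / (tn + fp)  # share of non-nodules rejected
    return accuracy, sensitivity, specificity


def k_fold_indices(n_samples, k=10):
    """Deal sample indices into k interleaved folds (no shuffling)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def cross_validate(features, labels, fit, predict, k=10):
    """Average the three metrics over k train/test rounds."""
    per_fold = []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        predictions = [predict(model, features[i]) for i in test_idx]
        per_fold.append(
            confusion_metrics([labels[i] for i in test_idx], predictions))
    return tuple(sum(column) / k for column in zip(*per_fold))


def fit_threshold(train_x, train_y):
    """Toy stand-in for the SVM: midpoint between the two class means."""
    pos = [x[0] for x, y in zip(train_x, train_y) if y == 1]
    neg = [x[0] for x, y in zip(train_x, train_y) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, x):
    return 1 if x[0] > threshold else 0


# Synthetic, perfectly separable data with 70 samples (illustration only).
demo_labels = [1] * 35 + [0] * 35
demo_features = [[2.0] if y == 1 else [0.0] for y in demo_labels]
mean_acc, mean_sens, mean_spec = cross_validate(
    demo_features, demo_labels, fit_threshold, predict_threshold)
print(mean_acc, mean_sens, mean_spec)
```

Averaging per-fold percentages, rather than pooling the confusion counts of all folds, matches the way Table 4 reports one row per fold followed by a mean row.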
The experimental results show that pulmonary nodule detection accuracy increases markedly after feature-level fusion (mean accuracy rises from 94.64% to 99.17%). The missed diagnosis rate decreases, as reflected by the higher sensitivity (92.86% to 100%), and the misdiagnosis rate decreases, as reflected by the higher specificity (96.43% to 98.33%). The processing time is also shorter after reduction (0.6051 s versus 0.4571 s). These results indicate that rough set based feature-level fusion of the extracted 42-dimensional feature set is effective: it not only improves the compactness of the feature set by eliminating redundant and weakly discriminative feature components but also corrects abnormal data in the feature set, thereby further improving the performance of pulmonary nodule detection. Table 5 shows the effectiveness of the five rough set reduction subsets.
4.3.2. The Model Stability Experiment. The feature data of pulmonary nodules are classified with RS1 (70 × 21) as the data set for five rounds, each with a different ratio of training set to testing set: 50/20, 40/30, 35/35, 30/40, or 20/50. Each round of testing is carried out with training and testing sets selected randomly at the given ratio, and the mean of 10 test results is used as the corresponding accuracy, sensitivity, specificity, and running time of the model. The results are shown in Table 6.
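The repeated random hold-out protocol described above can be sketched as follows. The Python code is an illustration under stated assumptions: the function names are hypothetical, the `evaluate` callback stands for whatever training and scoring routine is used, and the placeholder callback in the demo only returns the test-set size so that the sketch runs without a classifier.

```python
import random


def random_split(n_samples, n_train, rng):
    """Randomly pick n_train indices for training; the rest are for testing."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


def stability_run(n_samples, ratios, evaluate, repeats=10, seed=0):
    """Mean score per training/testing ratio over repeated random splits.

    evaluate(train_idx, test_idx) is supplied by the caller; it should
    train the classifier on train_idx and return a score on test_idx.
    """
    rng = random.Random(seed)
    results = {}
    for n_train, n_test in ratios:
        if n_train + n_test != n_samples:
            raise ValueError("each ratio must cover all samples")
        scores = []
        for _ in range(repeats):
            train_idx, test_idx = random_split(n_samples, n_train, rng)
            scores.append(evaluate(train_idx, test_idx))
        results[(n_train, n_test)] = sum(scores) / repeats
    return results


# The five training/testing ratios used in the stability experiment.
RATIOS = [(50, 20), (40, 30), (35, 35), (30, 40), (20, 50)]

# Placeholder evaluation (not a classifier): reports the test-set size.
demo_results = stability_run(70, RATIOS, lambda train_idx, test_idx: len(test_idx))
print(demo_results)
```

In an actual experiment the callback would return accuracy, sensitivity, or specificity, and the same loop would be run once per metric or extended to return a tuple.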
The experimental results show that, as the ratio of training set to testing set decreases, the classification accuracy of the feature subset after rough set reduction declines only slightly (from 99.71% to 98.25%), whereas that of the feature set before rough set reduction fluctuates to a certain extent (between 97.35% and 94.88%); Figure 6 shows this more intuitively. These results indicate that the feature-level fusion model based on rough sets has higher classification stability and is less susceptible to interference from the sample data. Table 7 shows the stability of the five feature subsets after rough set reduction.
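The stability claim can be quantified from the accuracy columns of Table 6. The Python sketch below computes the range and the population standard deviation of accuracy across the five ratios, before and after fusion; the choice of these two spread measures is an assumption made here for illustration and is not part of the paper.

```python
from statistics import pstdev

# Accuracy (%) for the ratios 50/20, 40/30, 35/35, 30/40 and 20/50,
# copied from Table 6.
acc_before = [97.35, 96.53, 95.83, 96.16, 94.88]
acc_after = [99.71, 98.96, 98.65, 98.37, 98.25]


def spread(values):
    """Range (max minus min) and population standard deviation."""
    return round(max(values) - min(values), 2), round(pstdev(values), 2)


range_before, std_before = spread(acc_before)
range_after, std_after = spread(acc_after)
print(range_before, std_before)
print(range_after, std_after)
```

Both measures are smaller after fusion (a range of 1.46 against 2.47 percentage points), which supports the reading that the reduced feature set is less sensitive to the size of the training set.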
BioMed Research International 11
Table 4 Statistics of effectiveness before and after rough set reduction
Serial number Accuracy () Sensibility () Specificity () Processing time (s)
Before reduction
1 9642 9286 100 106102 9196 8393 100 061703 9554 100 9107 054904 8928 100 7857 056305 9554 9107 100 054706 9821 9643 100 054607 9464 8929 100 054608 9553 9107 100 054609 9196 8393 100 0546010 9732 100 9664 05300
Mean 9464 9286 9643 06051
After reduction (Rs1)
1 100 100 100 093702 100 100 100 043603 100 100 100 038704 100 100 100 042105 100 100 100 042106 100 100 100 039007 100 100 100 040608 9167 100 8333 040609 100 100 100 0374010 100 100 100 03930
Mean 9917 100 9833 04571Increase after reduction 453 714 19 0148
Table 5 Effectiveness of rough set reduction subsets
Subset Averageaccuracy ()
Averagesensitivity ()
Averagespecificity ()
Processingtime (s)
RS1 9917 100 9833 04571RS2 975 9667 9833 04650RS3 9917 100 9833 04656RS4 100 100 100 04731RS5 9833 9833 9833 04850Mean 9883 99 9866 04672
(feature subset RS1 fromTable 3 is used and the running timeis 100 times actual time)
Experimental results show that various performanceindicators of the feature-level fusion model based on roughset are better than those based on PCA indicating that therough set is more suitable than PCA to eliminate redundantinformation
434 Comparison with Other Pulmonary Nodule DetectionMethods Pulmonary nodule detection accuracy and FalsePositives per scan (FPs) are used as the evaluation indexesof pulmonary nodule detection methods to compare andanalyze the method proposed in this paper and other five
9293949596979899
100101
5020 4030 3535 3040 2050
Before reductionAfter reduction
Figure 6 Comparative results of feature subsets before and afterrough set reduction
detectionmethods of pulmonary nodules (the optimal detec-tion accuracy is used for all detection methods) The resultsare shown in Table 9 (Pr private database L LIDC)
Experimental results show that the proposed method issuperior to the other pulmonary nodule detection methodsto a certain extent indicating that this method not onlyimproves the comprehensiveness and accuracy of the featuredescription of ROI by supplementing and improving the
12 BioMed Research International
Table 6 Stability statistics of rough set reduction subsets
Trainingsettesting set Accuracy () Sensitivity () Specificity () Running time (s)
Before fusion
5020 9735 9471 100 048734030 9653 9308 9832 038463535 9583 9239 9779 042543040 9616 9558 9674 035602050 9488 9463 9586 04236Mean 9615 9408 97742 04154
After fusion (Rs1)
5020 9971 9941 100 026844030 9896 9958 9846 025683535 9865 9923 9808 023823040 9837 9860 9814 026462050 9825 9767 9884 02636Mean 9879 9884 9870 02583
Table 7 Classification performance of rough set reduction subset
SubsetAverageaccuracy
()
Averagesensitivity
()
Averagespecificity
()
Runningtime (s)
RS1 9917 100 9833 02583RS2 975 9667 9833 02870RS3 9917 100 9833 02560RS4 100 100 100 02531RS5 9833 9833 9833 02656Mean 98834 99 9866 02620
Table 8 Classification performance of feature reduction based onPCA
Serialnumber
Accuracy()
Sensitivity()
Specificity()
10 timesrunningtime (s)
1 9167 8333 100 099702 9674 9348 100 048303 9674 9348 100 048804 9891 100 9783 049505 9348 8696 100 049506 9674 100 9348 051407 9674 100 9348 051208 9457 8913 100 048909 9783 9565 100 0499010 9565 9348 9783 05180Mean 9591 9355 9826 05490
feature components but also improves the firmness of thefeature set by integrating the concept of feature-level fusionbased on rough set to exclude the redundant features and datawith irregular information thereby improving the overallpulmonary nodule detection performance
Table 9 Comparison of the performance of different lung noduledetection methods
Author Database Nodulenumbers
Accuracy() FPs
Santos et al[11] L 260 884 117
MagalhaesBarros Nettoet al [12]
L 48 9065 0138
Ye et al [13] Pr 220 902 82Tan et al [14] L 172 875 4Cascio et al[16] L 148 97 61
Our method Pr 70 9917 047
Rough set
Accuracy Sensitivity Specificity Running time
PCA
020406080
100120
Figure 7 Comparison of two feature-level fusion models
5 Conclusions
In this paper the research status quo of pulmonary noduledetection methods is analyzed and a pulmonary noduledetection model is proposed based on rough set basedfeature-level fusion To address the issues that the featuredescription is insufficient and the characterization is inac-curate in the process of feature extraction six new 3Dfeatures in combination with other 2D and 3D features are
BioMed Research International 13
proposed to extract and quantify the feature information ofROI in this model A rough set based feature-level fusionis employed to reduce the dimensionality of the feature setssince there is redundant information in the extracted high-dimensional features In addition a grid optimization modelis adopted to optimize the SVM kernel function whichis used as the classifier for detection and recognition ofpulmonary nodule Finally the pulmonary nodule detectionperformance of the proposed method is verified with fourgroups of comparative experimentsThe experimental resultsshow that the proposed pulmonary nodule detectionmethodbased on rough set based feature-level fusion is effective withthe classification accuracy that can basicallymeet the require-ments of medical imaging for the detection of pulmonarynodules and therefore is of great value for the detection ofpulmonary nodules and auxiliary diagnosis of lung cancer
Competing Interests
The authors declare that they have no competing interests
Acknowledgments
The work is partially supported by National Natural ScienceFoundation of China (Grant No 61561040) Natural ScienceFoundation of Ningxia (Grant No NZ16067) and ScientificResearch Fund of Ningxia Education Department (Grant NoNGY2016084)
References
[1] S. P. Riaz, M. Lüchtenborg, V. H. Coupland, J. Spicer, M. D. Peake, and H. Møller, "Trends in incidence of small cell lung cancer and all lung cancer," Lung Cancer, vol. 75, no. 3, pp. 280–284, 2012.
[2] J. Ferlay, I. Soerjomataram, R. Dikshit et al., "Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012," International Journal of Cancer, vol. 136, no. 5, pp. E359–E386, 2015.
[3] D. R. Baldwin, "Prediction of risk of lung cancer in populations and in pulmonary nodules: significant progress to drive changes in paradigms," Lung Cancer, vol. 89, no. 1, pp. 1–3, 2015.
[4] S. Diciotti, G. Picozzi, M. Falchini, M. Mascalchi, N. Villari, and G. Valli, "3-D segmentation algorithm of small lung nodules in spiral CT images," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 1, pp. 7–19, 2008.
[5] S. Theodoridis and K. Koutroumbas, Pattern Recognition, The Associated Press, 4th edition, 2010.
[6] B. Sahiner, H.-P. Chan, L. M. Hadjiiski et al., "Effect of CAD on radiologists' detection of lung nodules on thoracic CT scans: analysis of an observer performance study by nodule size," Academic Radiology, vol. 16, no. 12, pp. 1518–1530, 2009.
[7] Y. Xia, S. Lu, L. Wen, S. Eberl, M. Fulham, and D. D. Feng, "Automated identification of dementia using FDG-PET imaging," BioMed Research International, vol. 2014, Article ID 421743, 8 pages, 2014.
[8] Y. Xia, Z. Ji, and Y. Zhang, "Brain MRI image segmentation based on learning local variational Gaussian mixture models," Neurocomputing, vol. 204, pp. 189–197, 2016.
[9] M. Xian, Y. Zhang, and H. D. Cheng, "Fully automatic segmentation of breast ultrasound images based on breast characteristics in space and frequency domains," Pattern Recognition, vol. 48, no. 2, pp. 485–497, 2015.
[10] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, and J. Ding, "Neutro-connectedness cut," IEEE Transactions on Image Processing, vol. 25, no. 10, pp. 4691–4703, 2016.
[11] A. M. Santos, A. O. de Carvalho Filho, A. C. Silva et al., "Automatic detection of small lung nodules in 3D CT data using Gaussian mixture models, Tsallis entropy and SVM," Engineering Applications of Artificial Intelligence, vol. 36, pp. 27–39, 2014.
[12] S. Magalhães Barros Netto, A. Corrêa Silva, R. Acatauassu Nunes, and M. Gattass, "Automatic segmentation of lung nodules with growing neural gas and support vector machine," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1110–1121, 2012.
[13] X. Ye, X. Lin, J. Dehmeshki, G. Slabaugh, and G. Beddoe, "Shape-based computer-aided detection of lung nodules in thoracic CT images," IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1810–1820, 2009.
[14] M. Tan, R. Deklerck, B. Jansen, M. Bister, and J. Cornelis, "A novel computer-aided lung nodule detection system for CT images," Medical Physics, vol. 38, no. 10, pp. 5630–5645, 2011.
[15] Q. Li, F. Li, and K. Doi, "Computerized detection of lung nodules in thin-section CT images by use of selective enhancement filters and an automated rule-based classifier," Academic Radiology, vol. 15, no. 2, pp. 165–175, 2008.
[16] D. Cascio, R. Magro, F. Fauci, M. Iacomi, and G. Raso, "Automatic detection of lung nodules in CT datasets based on stable 3D mass-spring models," Computers in Biology and Medicine, vol. 42, no. 11, pp. 1098–1109, 2012.
[17] G. Y. Wang, Y. Y. Yao, and H. Yu, "A survey on rough set theory and applications," Chinese Journal of Computers, vol. 32, no. 7, pp. 1229–1246, 2009.
[18] S. Udhaya Kumar and H. Hannah Inbarani, "A novel neighborhood rough set based classification approach for medical diagnosis," Procedia Computer Science, vol. 47, pp. 351–359, 2015.
[19] H. H. Inbarani, A. T. Azar, and G. Jothi, "Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis," Computer Methods and Programs in Biomedicine, vol. 113, no. 1, pp. 175–185, 2014.
[20] S. K. Shah, M. F. McNitt-Gray, S. R. Rogers et al., "Computer-aided diagnosis of the solitary pulmonary nodule," Academic Radiology, vol. 12, no. 5, pp. 570–575, 2005.
[21] A. Hassanien, "Fuzzy rough sets hybrid scheme for breast cancer detection," Image and Vision Computing, vol. 25, no. 2, pp. 172–183, 2007.
[22] Z. Guo, Y. Li, Y. Wang, S. Liu, T. Lei, and Y. Fan, "A method of effective text extraction for complex video scene," Mathematical Problems in Engineering, vol. 2016, Article ID 2187647, 11 pages, 2016.
[23] M.-K. Hu, "Visual pattern recognition by moment invariants," IRE Transactions on Information Theory, vol. 8, no. 2, pp. 179–187, 1962.
[24] J. Chorowski, J. Wang, and J. M. Zurada, "Review and performance comparison of SVM- and ELM-based classifiers," Neurocomputing, vol. 128, pp. 507–516, 2014.
Table 6: Stability statistics of rough set reduction subsets.

Training set/testing set   Accuracy (%)   Sensitivity (%)   Specificity (%)   Running time (s)
Before fusion
50/20                      97.35          94.71             100               0.4873
40/30                      96.53          93.08             98.32             0.3846
35/35                      95.83          92.39             97.79             0.4254
30/40                      96.16          95.58             96.74             0.3560
20/50                      94.88          94.63             95.86             0.4236
Mean                       96.15          94.08             97.742            0.4154
After fusion (RS1)
50/20                      99.71          99.41             100               0.2684
40/30                      98.96          99.58             98.46             0.2568
35/35                      98.65          99.23             98.08             0.2382
30/40                      98.37          98.60             98.14             0.2646
20/50                      98.25          97.67             98.84             0.2636
Mean                       98.79          98.84             98.70             0.2583
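As a reading aid for the columns above, the following sketch shows how accuracy, sensitivity, and specificity are conventionally derived from confusion-matrix counts, and how a "Mean" row is obtained. The counts passed to `classification_metrics` are hypothetical and are not taken from the experiments; only the accuracy values in the last lines are copied from the "Before fusion" rows of Table 6.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages.

    tp/fn count nodule samples classified correctly/incorrectly;
    tn/fp count non-nodule samples classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


def column_mean(values):
    """Arithmetic mean of one table column."""
    return sum(values) / len(values)


# Hypothetical confusion counts, for illustration only.
acc, sen, spe = classification_metrics(tp=57, fn=3, tn=59, fp=1)
print(round(acc, 2), round(sen, 2), round(spe, 2))

# Accuracy column of the "Before fusion" block in Table 6.
before_fusion_accuracy = [97.35, 96.53, 95.83, 96.16, 94.88]
mean_accuracy = round(column_mean(before_fusion_accuracy), 2)
print(mean_accuracy)
```

The recomputed mean agrees with the 96.15 reported in the table's "Mean" row.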
Table 7: Classification performance of rough set reduction subset.

Subset   Average accuracy (%)   Average sensitivity (%)   Average specificity (%)   Running time (s)
RS1      99.17                  100                       98.33                     0.2583
RS2      97.5                   96.67                     98.33                     0.2870
RS3      99.17                  100                       98.33                     0.2560
RS4      100                    100                       100                       0.2531
RS5      98.33                  98.33                     98.33                     0.2656
Mean     98.834                 99                        98.66                     0.2620
Table 8: Classification performance of feature reduction based on PCA.

Serial number   Accuracy (%)   Sensitivity (%)   Specificity (%)   10 times running time (s)
1               91.67          83.33             100               0.9970
2               96.74          93.48             100               0.4830
3               96.74          93.48             100               0.4880
4               98.91          100               97.83             0.4950
5               93.48          86.96             100               0.4950
6               96.74          100               93.48             0.5140
7               96.74          100               93.48             0.5120
8               94.57          89.13             100               0.4890
9               97.83          95.65             100               0.4990
10              95.65          93.48             97.83             0.5180
Mean            95.91          93.55             98.26             0.5490
feature components but also improves the tightness of the feature set by integrating rough-set-based feature-level fusion to exclude redundant features and data with irregular information, thereby improving the overall pulmonary nodule detection performance.
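To make the idea of excluding redundant features with rough sets concrete, here is a minimal sketch of the rough-set dependency degree and a greedy forward reduct search (often called QuickReduct). It is a common textbook heuristic and is not claimed to be the reduction procedure used in this paper. The feature names (`roundness`, `gray`, `volume`) and the six-row discretized decision table are hypothetical.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Dependency degree: share of rows whose equivalence class under
    `attrs` carries a single decision value (the positive region)."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def quick_reduct(rows, attrs, decision):
    """Greedily add the attribute that raises the dependency degree most,
    until the subset is as discriminating as the full attribute set."""
    target = dependency(rows, attrs, decision)
    reduct = []
    while dependency(rows, reduct, decision) < target:
        best = max(
            (a for a in attrs if a not in reduct),
            key=lambda a: dependency(rows, reduct + [a], decision),
        )
        reduct.append(best)
    return reduct


# Hypothetical discretized decision table (1 = nodule, 0 = non-nodule).
rows = [
    {"roundness": 1, "gray": 0, "volume": 1, "nodule": 1},
    {"roundness": 1, "gray": 1, "volume": 1, "nodule": 1},
    {"roundness": 0, "gray": 0, "volume": 0, "nodule": 0},
    {"roundness": 0, "gray": 1, "volume": 0, "nodule": 0},
    {"roundness": 1, "gray": 0, "volume": 0, "nodule": 1},
    {"roundness": 0, "gray": 1, "volume": 1, "nodule": 0},
]
attrs = ["roundness", "gray", "volume"]
reduct = quick_reduct(rows, attrs, "nodule")
print(reduct)
```

In this toy table a single attribute already separates the two classes, so the other two are dropped as redundant; real feature sets would first need discretization before such a reduction can be applied.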
Table 9: Comparison of the performance of different lung nodule detection methods.

Author                               Database   Nodule numbers   Accuracy (%)   FPs
Santos et al. [11]                   L          260              88.4           1.17
Magalhães Barros Netto et al. [12]   L          48               90.65          0.138
Ye et al. [13]                       Pr         220              90.2           8.2
Tan et al. [14]                      L          172              87.5           4
Cascio et al. [16]                   L          148              97             6.1
Our method                           Pr         70               99.17          0.47