SummerPresentation_FrankWaggoner_2015

Summer Research Project

Frank Waggoner
A Computational Analysis and Classification of Tumors

Introduction
Breast cancer is the most frequently diagnosed cancer in women and the second leading cause of cancer death among women. Mammography is a common method of early detection. While mammograms are powerful tools, interpreting them is not always easy.


Studying Mammograms
Experts such as radiologists are trained to study mammograms and determine whether lesions are benign or malignant, invasive or noninvasive, etc. The human eye, however, cannot always catch subtle details in these images, and if even a few are missed, a patient could be misdiagnosed.

Go beyond malignant vs. benign to also classify lesions as invasive or noninvasive. Also, the lesion is identified first.

Studying Mammograms
Computer-aided image analysis is a powerful tool that could help us in the process of studying mammograms.

ROI with segmented lesion (snake segmentation method)

Features
Seventy-seven characteristics, or features, can be identified and quantified from each mammogram. We believe that a small subset of these features, when examined, can tell us whether a lesion is malignant or benign. We also believe this process can be applied to other binary classification tasks, such as invasive vs. noninvasive for cancerous lesions.

Feature Selection
Feature selection is useful for several reasons:
- Simplifies tests and models
- Shortens training times
- Reduces overfitting

Don't call things tools; describe methods and methodology instead.

Tools and Resources
- A HIPAA/IRB-approved data set of mammograms
- MATLAB statistical calculations (R², stepwisefit, etc.)
- Various segmentation programs and methods
- ROC analysis
- LDA

The Data
- 106 malignant cases
  - 11 noninvasive
  - 66 invasive
  - 29 unknown
- 76 benign cases

Procedure
- Initial tests
- Feature selection
- Feature combination

Initial Tests
Given the initial data set, scatterplots were created, plotting different features against each other. On this particular plot, the average values for the benign and malignant cases have been circled. Notice the separation.

Initial Tests
Here are two more features plotted against each other. Their averages were also plotted. It is not a coincidence that these points are close together: both features were found to be very good at separating malignant and benign cases.

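Initial Tests
Next, an ROC (receiver operating characteristic) curve was plotted for each feature. An ROC curve plots the true-positive rate against the false-positive rate as the decision threshold varies; an area under the curve (AUC) of .5 corresponds to chance and 1.0 to perfect separation. The AUC of each curve was recorded and compared. This curve comes from a feature with an especially high AUC.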

AUC: .7472
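The AUC of a single feature can be computed without drawing the curve: it equals the probability that a randomly chosen malignant case has a higher feature value than a randomly chosen benign case, with ties counted as one half (the Mann-Whitney statistic divided by the number of malignant-benign pairs). The sketch below is a minimal illustration, not the software used in this study; `feature_auc`, `roc_points`, and the toy values are hypothetical. For a feature with a negative malignant trend, the same code would be applied to the negated values.

```python
def feature_auc(malignant, benign):
    """AUC of one feature via the Mann-Whitney statistic.

    Counts, over every (malignant, benign) pair, how often the malignant
    value is larger; ties contribute one half.
    """
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def roc_points(malignant, benign):
    """(false-positive rate, true-positive rate) at each candidate threshold,
    calling a case malignant when its value is >= the threshold."""
    thresholds = sorted(set(malignant) | set(benign), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(m >= t for m in malignant) / len(malignant)
        fpr = sum(b >= t for b in benign) / len(benign)
        points.append((fpr, tpr))
    return points


# Hypothetical feature values, not study data
malignant = [0.9, 0.8, 0.7, 0.4]
benign = [0.6, 0.3, 0.2, 0.1]
print(feature_auc(malignant, benign))  # 0.9375
print(roc_points(malignant, benign))
```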

Initial Tests
The five features with the highest AUCs were:
- Feature 3: Margin Sharpness, AUC of .6717
- Feature 4: FWHM (full width at half maximum) ROI, AUC of .7472
- Feature 8: Texture (std. dev.), AUC of .7112
- Feature 9: FWHM Margin, AUC of .7456
- Feature 15: Diameter, AUC of .673

Explain ROC analysis, and explain in detail what FWHM is.

Results
Feature Number | Feature Name | AUC
3 | Margin Sharpness | .6717
4 | FWHM (full width at half maximum) ROI | .7472
8 | Texture (std. dev.) | .7112
9 | FWHM Margin | .7456
15 | Diameter | .673

Full Width at Half Max

Several phenotypes are defined in terms of FWHM (full width at half maximum): the width of a curve measured between the two points where it falls to half of its peak value.
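A minimal sketch of measuring FWHM on a sampled profile: find the peak, then walk outward on each side until the profile drops below half the peak height, interpolating linearly between samples to locate each crossing. How the FWHM ROI and FWHM Margin features sample their profiles is not described here, so the one-dimensional profile, the zero baseline, the single peak, and the hypothetical `fwhm` helper are all assumptions.

```python
def fwhm(profile, spacing=1.0):
    """Full width at half maximum of a single-peaked 1-D profile.

    Hypothetical helper: assumes a zero baseline and that the profile
    falls below half the peak on both sides of the maximum.
    """
    peak_idx = max(range(len(profile)), key=lambda i: profile[i])
    half = profile[peak_idx] / 2.0

    # Walk left from the peak while samples stay at or above half maximum.
    left = peak_idx
    while left > 0 and profile[left - 1] >= half:
        left -= 1
    if left == 0:
        raise ValueError("profile never drops below half maximum on the left")
    # Interpolate between sample left-1 (below half) and sample left (>= half).
    y0, y1 = profile[left - 1], profile[left]
    x_left = (left - 1) + (half - y0) / (y1 - y0)

    # Same walk on the right side of the peak.
    right = peak_idx
    while right < len(profile) - 1 and profile[right + 1] >= half:
        right += 1
    if right == len(profile) - 1:
        raise ValueError("profile never drops below half maximum on the right")
    y0, y1 = profile[right], profile[right + 1]
    x_right = right + (y0 - half) / (y0 - y1)

    return (x_right - x_left) * spacing


# A triangular profile with peak 4 is 4 samples wide at height 2.
print(fwhm([0, 1, 2, 3, 4, 3, 2, 1, 0]))
```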

Comparing Means
Next, the mean values of each feature for the malignant and benign cases were compared. A t-test was used to determine whether the difference in means was statistically significant. The resulting p-values were:
- Feature 4: 7.613E-9
- Feature 9: 9.508E-9
- Feature 8: 9.454E-6
- Feature 3: 1.130E-5
- Feature 15: 4.480E-5

These are the same five features identified by the AUC data, so one of the two analyses may not be necessary.
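The comparison of means can be sketched with Welch's two-sample t statistic. The presentation does not say which t-test variant was used, so Welch's unequal-variance form is an assumption, as is the `welch_t_test` helper. Because the Python standard library has no t distribution, the p-value here comes from a normal approximation, which is reasonable for groups of 106 and 76 cases but only approximate.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's t statistic, Welch-Satterthwaite degrees of freedom, and a
    two-sided p-value from a normal approximation (large-sample assumption)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p_approx = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical feature values for malignant and benign cases
t, df, p = welch_t_test([5.0, 6.0, 7.0], [1.0, 2.0, 3.0])
print(t, df, p)
```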

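Stepwise Fit
Stepwise regression selects features by performing multiple linear regression, adding or removing one feature at a time based on its statistical significance. MATLAB's stepwisefit function was used for the initial feature selection process.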
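A forward-only sketch of stepwise selection: at each step, fit an ordinary least-squares model with each remaining feature added, keep the feature with the largest partial F statistic, and stop when no candidate exceeds an entry threshold. MATLAB's stepwisefit also removes features and works with p-value thresholds rather than a fixed F cutoff, so the `f_enter=4.0` threshold, the absence of a removal step, regressing the 0/1 truth vector on the features, and all helper names here are assumptions for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def forward_stepwise(features, y, f_enter=4.0):
    """Greedy forward selection (no removal step).

    features: list of columns, one per feature; y: response (e.g. 0/1 truth).
    f_enter: hypothetical partial-F threshold for adding a feature.
    """
    n = len(y)
    selected = []
    current_rss = rss([], y)
    while len(selected) < len(features):
        best = None
        for j in range(len(features)):
            if j in selected:
                continue
            new_rss = rss([features[i] for i in selected + [j]], y)
            dof = n - len(selected) - 2  # residual degrees of freedom with j added
            if new_rss == 0:
                f_stat = float("inf")
            else:
                f_stat = (current_rss - new_rss) / (new_rss / dof)
            if best is None or f_stat > best[0]:
                best = (f_stat, j, new_rss)
        if best[0] < f_enter:
            break  # no remaining feature improves the fit enough
        selected.append(best[1])
        current_rss = best[2]
    return selected
```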

Results
Feature Number | Phenotype
3 | Margin Sharpness
4 | FWHM ROI
15 | Diameter
16 | Circularity
59 | Beta 4

These are the results of the stepwisefit function.

LDA
Linear discriminant analysis (LDA) is used to find a combination of features that results in a linear classifier. A linear classifier separates objects into two distinct classes using a linear combination of their features.

Explain how the function leaves out one case, builds a model from the remaining cases, and then repeats this for every other case.
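The leave-one-out procedure can be sketched with a two-class Fisher LDA: for each case, the discriminant is trained on all other cases and then applied to the held-out case, so every case receives an out-of-sample score that can be passed to ROC analysis. This is a minimal pure-Python sketch, not the MATLAB code used in the study; the helper names and centering each score at the midpoint of the projected class means are assumptions. With 182 cases there are 182 such rounds, which appears to be what the Frequency column in the results counts when the selected features are recorded each round.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fisher_lda(X, y):
    """Two-class Fisher LDA on feature vectors X with 0/1 labels y.

    Returns weights w = S_w^-1 (mean1 - mean0) and the projection of the
    midpoint between the class means, used as the decision threshold.
    """
    d = len(X[0])
    groups = {c: [x for x, label in zip(X, y) if label == c] for c in (0, 1)}
    means = {c: [sum(x[k] for x in g) / len(g) for k in range(d)]
             for c, g in groups.items()}
    s = [[0.0] * d for _ in range(d)]  # pooled within-class scatter matrix
    for c, g in groups.items():
        for x in g:
            dev = [x[k] - means[c][k] for k in range(d)]
            for p in range(d):
                for q in range(d):
                    s[p][q] += dev[p] * dev[q]
    w = solve(s, [means[1][k] - means[0][k] for k in range(d)])
    midpoint = [(means[0][k] + means[1][k]) / 2 for k in range(d)]
    return w, sum(wk * mk for wk, mk in zip(w, midpoint))


def loo_scores(X, y):
    """Leave-one-out scores: positive values lean toward class 1 (malignant)."""
    scores = []
    for i in range(len(X)):
        w, threshold = fisher_lda(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(sum(wk * xk for wk, xk in zip(w, X[i])) - threshold)
    return scores
```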

Results
Feature Number | Phenotype | Frequency | Malignant Trend
3 | Margin Sharpness | 141 | Negative
4 | FWHM ROI | 178 | Positive
15 | Diameter | 182 | Positive
16 | Circularity | 182 | Positive
59 | Beta 4 | 141 | Negative
9 | FWHM Margin | 4 | Positive
50 | Skewness | 1 | Negative

Combined Features

AUC: .76, with 95% confidence interval [.70, .82]
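One common way to attach a 95% confidence interval to an AUC is the Hanley-McNeil standard-error approximation combined with a normal-theory interval, sketched below. The presentation does not say how its interval was computed (ROC software often fits a binormal model instead), so this method and the `hanley_mcneil_ci` helper are assumptions. With AUC = .76, 106 malignant cases, and 76 benign cases it gives roughly [.69, .83], close to but not identical to the reported interval.

```python
import math


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUC (Hanley and McNeil, 1982).

    n_pos: number of positive (malignant) cases; n_neg: negative (benign) cases.
    """
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(var)
    return auc - z * se, auc + z * se


low, high = hanley_mcneil_ci(0.76, n_pos=106, n_neg=76)
print(f"[{low:.2f}, {high:.2f}]")
```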

Contralateral vs. Lesion
Of our 77 features, the first 32 come from the lesion, while the other 45 come from the contralateral breast. Do features from the contralateral breast have a significant influence on the classifier?

Results
Feature Location | Features Selected | Phenotypes
Lesion | 3, 4, 15, 16 | Margin Sharpness, FWHM ROI, Diameter, Circularity
Contralateral Breast | None | N/A

No phenotypes were selected when only features from the contralateral breast were analyzed. In our previous analysis, Beta 4 was the only phenotype from the contralateral breast to be selected. As expected, this feature was not present in the lesion-only selection, but it was also not selected in the contralateral-only analysis. We conclude that features from the contralateral breast are useful, but only when the lesion is also examined.

Invasive vs. Noninvasive
Of the malignant cases, 11 were identified as noninvasive, 66 as invasive, and 29 as unknown. ROC analysis could not be conducted, however, because the distribution of the noninvasive cases was not normal and the sample of these cases was small.

ReliefF Algorithm
- Uses k nearest neighbors.
- Takes in a matrix of features, the truth vector, and a number k.
- Scores features based on each case's proximity to neighboring cases of the same and the opposite class.
- Ranks features in order of importance.
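The core idea of ReliefF can be sketched in a few lines: for each case, find its k nearest neighbors from the same class (hits) and from the other class (misses), then raise a feature's weight when it differs across misses and lower it when it differs across hits. This is a simplified two-class version with equally weighted neighbors and range-normalized Manhattan distance; MATLAB's relieff has additional options and handles more general problems, so the hypothetical `relieff_binary` below illustrates the idea rather than reproducing that function.

```python
def relieff_binary(X, y, k=1):
    """Simplified two-class ReliefF.

    X: list of feature vectors, y: 0/1 labels, k: neighbors per class.
    Returns (ranking, weights); ranking lists feature indices, best first.
    """
    n, d = len(X), len(X[0])
    lo = [min(x[f] for x in X) for f in range(d)]
    # Feature ranges for scaling; a constant feature gets range 1 to avoid dividing by zero.
    span = [max(x[f] for x in X) - lo[f] or 1.0 for f in range(d)]

    def diff(f, a, b):
        # Per-feature difference scaled to [0, 1] by the feature's range
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = [j for j in range(n) if j != i]
        hits = sorted((j for j in others if y[j] == y[i]),
                      key=lambda j: dist(X[i], X[j]))[:k]
        misses = sorted((j for j in others if y[j] != y[i]),
                        key=lambda j: dist(X[i], X[j]))[:k]
        for f in range(d):
            weights[f] -= sum(diff(f, X[i], X[j]) for j in hits) / (n * k)
            weights[f] += sum(diff(f, X[i], X[j]) for j in misses) / (n * k)
    ranking = sorted(range(d), key=lambda f: weights[f], reverse=True)
    return ranking, weights
```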

Results
Ranking | Feature Number | Phenotype
1 | 6 | FWHM Grown
2 | 62 | Beta 7
3 | 9 | FWHM Margin
4 | 30 | Sum Entropy
5 | 58 | Beta 3
6 | 13 | Average Gray Value

The only feature common to these results and our LDA results is feature 4, FWHM ROI.

Features 62 and 58 were found to be highly correlated with each other, as were features 6 and 30. This was expected given which features they represent.
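Highly correlated feature pairs like these can be flagged automatically by computing the Pearson correlation for every pair of features and reporting those whose absolute value exceeds a cutoff. The presentation does not state which correlation measure or threshold was used, so the 0.9 cutoff and the helper names below are assumptions.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def correlated_pairs(columns, cutoff=0.9):
    """Return (i, j, r) for feature columns whose |r| exceeds the cutoff."""
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) > cutoff:
            pairs.append((i, j, r))
    return pairs
```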

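Feature Selection with SVM-RFE
Support vector machine (SVM) algorithms are supervised learning methods used for pattern recognition and regression analysis. They can perform linear and nonlinear classification, the latter using the kernel trick. SVM-RFE (recursive feature elimination) ranks features by repeatedly training an SVM and discarding the least important feature. Feature ranking was performed both with the default kernel setting, which uses a radial basis function (RBF), and with a linear kernel.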

The kernel trick implicitly maps inputs into higher-dimensional spaces.
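The kernel trick can be checked numerically for the degree-2 polynomial kernel k(x, z) = (x · z)², whose value for two-dimensional inputs equals an ordinary dot product after the explicit map φ(x) = (x₁², √2·x₁x₂, x₂²). An SVM only ever evaluates kernel values, never φ itself. For the RBF kernel used in the default setting, the corresponding feature space is infinite-dimensional, so the polynomial kernel is used here only because its map is small enough to write out. The helper names and the `gamma` value are illustrative assumptions.

```python
import math


def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map for poly2_kernel with 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)  # computed in the original 2-D space
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))  # 3-D space
print(implicit, explicit)
```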

Results: SVM-RFE RBF Model
Rank | Feature Number | Phenotype
1 | 4 | FWHM ROI
2 | 15 | Diameter
3 | 9 | FWHM Margin
4 | 3 | Margin Sharpness
5 | 16 | Circularity
6 | 61 | Beta 6

Five of the six features selected were the same as those selected when stepwisefit was performed. The features were combined using LDA, and ROC analysis was then performed, yielding an AUC of .82.

Results: SVM-RFE Linear Model
Rank | Feature Number | Phenotype
1 | 58 | Beta 3
2 | 62 | Beta 7
3 | 56 | Beta 1
4 | 11 | Contrast2
5 | 12 | Contrast1
6 | 45 | MinCDF (5% threshold)

Features 58, 62, and 56 were found to be highly correlated with each other, as were features 11 and 12. For completeness, the features were still combined using LDA and evaluated with ROC analysis, yielding an AUC of .54.

Method of Selection | Stepwise Fit | K-nearest Neighbors | SVM-RFE
Features Selected | Margin Sharpness, FWHM ROI, Diameter, Circularity, Beta 4 | FWHM Grown, Beta 7, FWHM Margin, Sum Entropy, Beta 3, Average Gray Value | FWHM ROI, Diameter, FWHM Margin, Margin Sharpness, Circularity, Beta 6
Overall Performance (AUC) | 0.76 | 0.73 | 0.82
Tumor Features Selected | Margin Sharpness, FWHM ROI, Diameter, Circularity | RadialGrad ROI, FWHM ROI, Sum Variance, FWHM Margin, Sum Average, Margin Sharpness | FWHM ROI, Diameter, FWHM Margin, Margin Sharpness, Circularity, Texture (std. dev.)
Tumor Feature Performance (AUC) | 0.77 | 0.76 | 0.81
Parenchyma Features Selected | None | MaxEdgeGradient, D6, Skewness, Coarseness, D5, MinEdgeGradient | MinCDF (5% threshold), Coarseness, Sum Average, Balance 2, Beta 2, Beta 6
Parenchyma Feature Performance (AUC) | N/A | 0.58 | 0.56

Testing for significance
We want to know whether the difference between the overall AUCs of the SVM-RFE selection and the stepwise selection is significant. 95% confidence intervals were created for each AUC:
- Stepwise: .75 (.68, .82)
- SVM: .82 (.76, .88)
The 95% confidence interval for the difference was found to be (-.0857, -.0333), and the p-value for the difference was < .0001.
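Because both AUCs are measured on the same cases, their difference should be tested with a method that accounts for the correlation between them. DeLong's nonparametric test is one standard choice and is sketched below; the presentation does not name the method it used, so this is an illustrative assumption rather than the analysis behind the numbers above, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for ties
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _components(pos, neg):
    """AUC plus DeLong structural components for one classifier."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(pos1, neg1, pos2, neg2, z=1.96):
    """Compare two correlated AUCs computed on the same cases.

    pos1/neg1: classifier 1 scores for positive/negative cases;
    pos2/neg2: classifier 2 scores for the same cases in the same order.
    Returns (difference, (ci_low, ci_high), two-sided p-value).
    """
    auc1, v10_1, v01_1 = _components(pos1, neg1)
    auc2, v10_2, v01_2 = _components(pos2, neg2)
    var = ((_cov(v10_1, v10_1) + _cov(v10_2, v10_2) - 2 * _cov(v10_1, v10_2)) / len(pos1)
           + (_cov(v01_1, v01_1) + _cov(v01_2, v01_2) - 2 * _cov(v01_1, v01_2)) / len(neg1))
    if var <= 0:
        raise ValueError("zero variance: the AUC difference cannot be tested")
    diff = auc1 - auc2
    se = math.sqrt(var)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(diff) / se))
    return diff, (diff - z * se, diff + z * se), p
```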


Date post:	14-Jan-2017