
A Hybrid Evolutionary Feature Selection Method for ...cs.uno.edu/~tamjid/Papers/2016_LA_O3.pdf ·...

Date post: 05-Aug-2020
A Hybrid Evolutionary Feature Selection Method for Microarray Data
Denson Smith, Sumaiya Iqbal, Md Tamjidul Hoque
{dsmith8, siqbal1, thoque}@uno.edu
University of New Orleans


Abstract

DNA microarray data allow the expression levels of thousands of genes to be analyzed simultaneously. By measuring mRNA expression, this process captures the current state of gene regulation within a cell, avoiding the tedious quantitative and qualitative measurement of protein expression, which would be a more accurate measure of cellular activity. Because mRNA expression reflects this activity only indirectly, robust approaches are needed to infer the true statistics. Such approaches make clinically and/or scientifically useful predictions possible, such as diagnosing diseases, identifying tumor types, and selecting treatments. Many statistical classification methods are available for this type of task. Further, a central difficulty in such statistical classification is that some of the features (variables) in the data may be irrelevant or redundant to the prediction task. Irrelevant and redundant data complicate and confound the classification process; therefore, it is desirable to identify and eliminate variables that are not useful for the classification task. The aim of this research is to propose a robust methodology for classifying DNA microarray data using feature selection, the process of identifying and eliminating irrelevant or redundant features. The proposed method performs effective feature selection to identify a subset of genes that best describes a disease. Two well-known DNA microarray datasets were used to validate the method.

Feature Selection

•  The process of selecting a subset of relevant features (variables) for use in classification model construction is known as feature selection (a.k.a. variable selection, attribute selection, or variable subset selection).

•  Classification models constructed with an optimal subset of features have been shown, both in theory and in practice, to be faster to train and to run, to provide a better understanding of the underlying processes, and to offer improved predictive accuracy, better generalization, and reduced model complexity.
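In wrapper-style methods such as genetic algorithms, a candidate feature subset is commonly encoded as a binary mask with one bit per feature. The minimal sketch below (plain Python; the helper name `apply_mask` and the toy values are illustrative assumptions, not taken from the slides) shows how such a mask reduces a samples-by-genes matrix to the selected columns.

```python
from typing import List, Sequence


def apply_mask(X: Sequence[Sequence[float]], mask: Sequence[int]) -> List[List[float]]:
    """Keep only the columns of X whose mask bit is 1 (a feature subset)."""
    if any(len(row) != len(mask) for row in X):
        raise ValueError("every row must have exactly one value per mask bit")
    keep = [j for j, bit in enumerate(mask) if bit]
    return [[row[j] for j in keep] for row in X]


# Toy expression matrix: 3 samples x 5 genes (hypothetical values).
X = [
    [0.1, 2.3, -0.4, 1.1, 0.0],
    [0.5, 1.9, -0.2, 0.8, 0.3],
    [0.2, 2.8, -0.9, 1.4, 0.1],
]
mask = [0, 1, 0, 1, 0]  # select genes 1 and 3
reduced = apply_mask(X, mask)
print(reduced)  # [[2.3, 1.1], [1.9, 0.8], [2.8, 1.4]]
```

Because the mask is just a list of bits, genetic operators such as crossover and mutation can act on it directly.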

Microarray Data Challenges for Classification

•  Many datasets are high-dimensional, i.e., they have thousands or tens of thousands of features.

•  Many of the features are redundant, irrelevant, or weakly relevant.

•  Datasets often contain missing and/or incorrect values.

•  There may be mislabeled samples.

•  Usually, relatively few samples are available for training and validating the model.
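The slides do not say how missing values were handled. One common option, shown here only as an assumption, is per-gene mean imputation with the means computed from the training samples alone, so that nothing about the test samples leaks into training. The names `column_means` and `impute` are hypothetical, and missing values are represented as `None`.

```python
from typing import List, Optional, Sequence

Row = Sequence[Optional[float]]


def column_means(train: Sequence[Row]) -> List[float]:
    """Mean of each feature over the training samples, ignoring missing (None) values."""
    means = []
    for j in range(len(train[0])):
        observed = [row[j] for row in train if row[j] is not None]
        # Assumption: fall back to 0.0 if a feature is missing in every training sample.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows: Sequence[Row], means: Sequence[float]) -> List[List[float]]:
    """Replace each None with the corresponding training-set mean."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


train = [[1.0, None, 3.0], [3.0, 4.0, None], [None, 2.0, 5.0]]
test = [[None, None, 1.0]]
means = column_means(train)  # [2.0, 3.0, 4.0]
print(impute(test, means))   # [[2.0, 3.0, 1.0]]
```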

Example Dataset: Breast Cancer

•  The goal is to classify test samples as relapse or non-relapse (binary classification).

•  “Well-known” dataset from the Kent Ridge Bio-medical Dataset Repository.

•  24481 gene expression ratios

•  78 training samples

•  19 test samples

•  Missing data

Genetic Forest Feature Selection Algorithm
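The algorithm diagram for this slide did not survive extraction, so the following is not the authors' Genetic Forest Feature Selector. It is a generic genetic algorithm over binary feature masks (elitism, binary tournament selection, uniform crossover, bit-flip mutation), sketched under the assumption that fitness is supplied as a function of the mask. In a setup like the one described here, that function would presumably train a tree ensemble on the selected genes and return one of the fitness metrics listed on the results slide. All parameter names and defaults are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]


def evolve_feature_masks(
    n_features: int,
    fitness: Callable[[Sequence[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    n_elite: int = 2,
    p_mutate: float = 0.02,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Generic GA over binary masks (1 = feature kept); higher fitness is better."""
    rng = random.Random(seed)
    # Random initial population of masks.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        # Binary tournament: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = sorted(((fitness(m), m) for m in pop), key=lambda t: t[0], reverse=True)
        # Elitism: copy the n_elite best masks unchanged into the next generation.
        next_pop = [list(m) for _, m in scored[:n_elite]]
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # Uniform crossover: each bit comes from either parent with probability 0.5.
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]
            # Bit-flip mutation with per-bit probability p_mutate.
            child = [1 - bit if rng.random() < p_mutate else bit for bit in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=fitness)
    return best, fitness(best)


# Hypothetical toy fitness: number of bits agreeing with a hidden "relevant" pattern.
target = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]


def toy_fitness(mask: Sequence[int]) -> float:
    return sum(1 for m, t in zip(mask, target) if m == t)


best, score = evolve_feature_masks(len(target), toy_fitness)
print(best, score)
```

Elitism guarantees that the best fitness never decreases from one generation to the next, at the cost of some population diversity.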

ExtraTreeClassifier
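Extremely randomized trees, available in scikit-learn as `ExtraTreeClassifier` (a single tree) and `ExtraTreesClassifier` (an ensemble), differ from classic decision trees mainly in how a node is split. For a random subset of features, a cut-point is drawn uniformly between each feature's minimum and maximum in the node, and only the best of these random splits is kept. The stdlib sketch below illustrates that split rule with Gini impurity; it is a simplified illustration, not scikit-learn's implementation, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def extra_tree_split(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    k: int,
    rng: random.Random,
) -> Optional[Tuple[int, float, float]]:
    """Best of k random (feature, threshold) splits.

    Returns (feature index, threshold, Gini decrease), or None if no feature varies.
    """
    n = len(y)
    parent = gini(y)
    # Only features that are non-constant in this node can be split on.
    candidates = [j for j in range(len(X[0])) if min(r[j] for r in X) < max(r[j] for r in X)]
    if not candidates:
        return None
    best: Optional[Tuple[int, float, float]] = None
    for j in rng.sample(candidates, min(k, len(candidates))):
        lo = min(r[j] for r in X)
        hi = max(r[j] for r in X)
        t = rng.uniform(lo, hi)  # random cut-point; it is not optimized
        left = [y[i] for i in range(n) if X[i][j] < t]
        right = [y[i] for i in range(n) if X[i][j] >= t]
        decrease = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or decrease > best[2]:
            best = (j, t, decrease)
    return best


rng = random.Random(0)
X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # feature 1 is constant
y = [0, 0, 1, 1]
print(extra_tree_split(X, y, k=2, rng=rng))
```

Drawing thresholds at random makes individual trees weaker but more diverse, which the ensemble average compensates for.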

Feature Importance Estimates

Feature Importance Estimates
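The figures on these two slides were lost in extraction. For tree ensembles, a standard importance estimate (the one behind scikit-learn's `feature_importances_` for forests) is the mean decrease in impurity. Every split credits its feature with the sample-weighted impurity decrease it achieves, the credits are normalized within each tree, and the results are averaged over trees. The sketch below assumes each tree is summarized as a list of split records in a hypothetical tuple format; it is not necessarily how the importance estimates on these slides were computed.

```python
from typing import List, Sequence, Tuple

# Hypothetical split record for one internal node:
# (feature, n_node, gini_node, n_left, gini_left, n_right, gini_right)
Split = Tuple[int, int, float, int, float, int, float]


def tree_importances(splits: Sequence[Split], n_features: int) -> List[float]:
    """Mean decrease in impurity for one tree, normalized to sum to 1."""
    imp = [0.0] * n_features
    for f, n, g, n_l, g_l, n_r, g_r in splits:
        # Sample-weighted impurity decrease credited to the split feature.
        imp[f] += n * g - n_l * g_l - n_r * g_r
    total = sum(imp)
    return [v / total for v in imp] if total > 0 else imp


def forest_importances(trees: Sequence[Sequence[Split]], n_features: int) -> List[float]:
    """Average the per-tree importances over the ensemble."""
    per_tree = [tree_importances(t, n_features) for t in trees]
    return [sum(col) / len(per_tree) for col in zip(*per_tree)]


tree_a = [(0, 4, 0.5, 2, 0.0, 2, 0.0)]                               # one perfect split on gene 0
tree_b = [(1, 4, 0.5, 2, 0.5, 2, 0.0), (0, 2, 0.5, 1, 0.0, 1, 0.0)]  # gene 1, then gene 0
print(forest_importances([tree_a, tree_b], n_features=3))  # [0.75, 0.25, 0.0]
```

Gene 2 is never used in a split and therefore receives zero importance, which is how features that no tree finds useful drop out.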

Workflow

Results

Darker colors indicate features that appear in more candidate feature sets; lighter colors indicate features that appear in fewer. Features that do not appear in any candidate feature set are likely to be irrelevant. Rows with equal or near-equal performance but different features likely contain features that are mutually redundant. Ten candidate feature sets are generated for each fitness metric:

1.  MCC
2.  AUC
3.  accuracy
4.  F1
5.  (MCC+AUC)/2
6.  (F1+AUC)/2
7.  (accuracy+AUC)/2
8.  (precision+recall)/2
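All eight fitness metrics can be computed from the binary confusion matrix plus an AUC. The sketch below is an illustration under stated assumptions: the slides do not say whether AUC was computed from predicted probabilities or hard labels, so it takes real-valued scores and uses the rank (Mann-Whitney) formulation, and it returns 0.0 wherever a ratio would divide by zero. Function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fitness_metrics(y_true, y_pred, scores) -> Dict[str, float]:
    tp, tn, fp, fn = confusion(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    a = auc(y_true, scores)
    return {
        "MCC": mcc,
        "AUC": a,
        "accuracy": accuracy,
        "F1": f1,
        "(MCC+AUC)/2": (mcc + a) / 2,
        "(F1+AUC)/2": (f1 + a) / 2,
        "(accuracy+AUC)/2": (accuracy + a) / 2,
        "(precision+recall)/2": (precision + recall) / 2,
    }


m = fitness_metrics([1, 1, 0, 0], [1, 0, 0, 0], [0.9, 0.4, 0.3, 0.1])
print(m)
```

Combining a threshold-dependent metric with AUC, as in metrics 5 to 7, rewards feature sets that both classify well at the chosen cut-off and rank samples well overall.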

Results

| | Best MCC found | All features |
|---|---|---|
| Fitness metric | accuracy+AUC (elite: 4) | None |
| # features | 32 | 24187 |
| AUC | 0.8571 | 0.8393 |
| Accuracy | 0.9474 | 0.8421 |
| Precision | 1.0000 | 0.8333 |
| Recall | 0.8571 | 0.7143 |
| F1 | 0.9231 | 0.7692 |
| MCC | 0.8895 | 0.6548 |

Matthews Correlation Coefficient

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where
TP = the number of true positives
TN = the number of true negatives
FP = the number of false positives
FN = the number of false negatives
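As a worked check of the formula against the results slide: with 19 test samples, accuracy 0.9474 (18/19), precision 1.0000 and recall 0.8571 (6/7) for the best feature set imply TP = 6, FN = 1, FP = 0 and TN = 12; for all features, accuracy 0.8421 (16/19), precision 0.8333 (5/6) and recall 0.7143 (5/7) imply TP = 5, FN = 2, FP = 1 and TN = 11. These counts are inferred from the reported metrics rather than stated on the slides. Plugging them into the formula reproduces both reported MCC values.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Confusion counts inferred from the reported accuracy, precision and recall.
best = mcc(tp=6, tn=12, fp=0, fn=1)           # 72 / sqrt(6552)
all_features = mcc(tp=5, tn=11, fp=1, fn=2)   # 53 / sqrt(6552)
print(f"{best:.4f} {all_features:.4f}")  # 0.8895 0.6548
```

The same counts also reproduce the reported F1 values (12/13 and 10/13), which supports the inferred confusion matrices.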

Method Comparison

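| Classification technique | Selection technique | # of genes | Accuracy | Reference |
|---|---|---|---|---|
| SVM | PSO | 20 | 1.0000 | [2] |
| SVM | ABC | 5 | 0.9470 | [3] |
| ET | GFFS | 32 | 0.9470 | Proposed method |
| J48 | GA | 41 | 0.9381 | [4] |
| SMV | DRF0-1G | 44 | 0.8421 | [1] |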

•  PSO – particle swarm optimization
•  ABC – artificial bee colony
•  GFFS – genetic forest feature selector
•  GA – genetic algorithm
•  J48 – decision tree
•  LDAGA – linear discriminant analysis genetic algorithm
•  Filter – correlation of individual gene expression with target class
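The `Filter` baseline in the glossary ranks genes by how strongly each one's expression correlates with the target class. A minimal sketch, assuming Pearson correlation against 0/1 labels (the point-biserial correlation) and ranking by absolute value; the function names and the top-k cut-off are hypothetical, and the cited works may define the filter differently.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def correlation_filter(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> List[int]:
    """Indices of the k genes with the largest |correlation| to the class label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


# Toy data: 4 samples x 3 genes (hypothetical values); gene 2 is constant.
X = [[1.0, 0.3, 5.0], [2.0, 0.1, 5.0], [8.0, 0.2, 5.0], [9.0, 0.4, 5.0]]
y = [0, 0, 1, 1]
print(correlation_filter(X, y, k=2))  # [0, 1]
```

Because each gene is scored in isolation, a filter like this cannot detect redundancy between genes, which is one motivation for wrapper methods such as GA-based selection.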

Overfitting?

•  Some candidate feature sets that performed well on the training data performed very poorly on the validation data.

•  This is likely due to spurious relationships between irrelevant features and the target class.

•  If this is the cause, then feature selection may be viewed as a form of overfitting the training data.

•  This illustrates why a validation set kept separate during feature selection is crucial.
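The spurious-relationship explanation is easy to reproduce with synthetic data. In this illustrative simulation (made-up sizes, not the breast cancer data), every gene is pure noise, so any correlation with the labels is spurious. Selecting the gene most correlated with the training labels still yields a sizable training correlation, which largely vanishes when the same gene is measured on an independent validation sample.

```python
import math
import random
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


rng = random.Random(42)
n_train, n_valid, n_genes = 20, 200, 2000

# Balanced labels; every "gene" is independent Gaussian noise, i.e. truly irrelevant.
y_train = [i % 2 for i in range(n_train)]
genes = [[rng.gauss(0.0, 1.0) for _ in range(n_train)] for _ in range(n_genes)]

# "Feature selection" on the training data alone: keep the gene with the largest |r|.
train_r = [abs(pearson(g, y_train)) for g in genes]
best = max(range(n_genes), key=lambda j: train_r[j])

# For an irrelevant gene, new samples are just fresh noise, so draw them independently.
y_valid = [i % 2 for i in range(n_valid)]
g_valid = [rng.gauss(0.0, 1.0) for _ in range(n_valid)]
valid_r = abs(pearson(g_valid, y_valid))

print(f"gene {best}: training |r| = {train_r[best]:.2f}, validation |r| = {valid_r:.2f}")
```

With many more genes than samples, as in the 24481-gene, 78-sample training set, some irrelevant genes are almost certain to look predictive by chance; only held-out data exposes them.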

Conclusions

•  The usual goal of feature selection is to identify and remove all irrelevant and redundant features

•  Redundant features provide an opportunity to mitigate, or at least predict, performance loss due to missing data

•  Selected features may provide insights into genes correlated with the disease

•  Feature selection may be a form of overfitting the training data

•  A validation dataset is crucial to the feature selection process
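The missing-data conclusion suggests a simple fallback rule, sketched here as an assumption since the slides do not formalize it: keep the candidate feature sets together with their validation scores, and for a sample with missing genes use the best-scoring set whose genes are all observed. The score gap between that set and the overall best then gives a rough prediction of the performance loss. The function name and the scores in the example are hypothetical.

```python
from typing import FrozenSet, Optional, Sequence, Set, Tuple

Candidate = Tuple[FrozenSet[int], float]  # (gene indices, validation score)


def pick_feature_set(candidates: Sequence[Candidate], observed: Set[int]) -> Optional[Candidate]:
    """Best-scoring candidate whose genes are all observed for this sample."""
    usable = [c for c in candidates if c[0] <= observed]  # subset test
    return max(usable, key=lambda c: c[1]) if usable else None


candidates = [
    (frozenset({1, 2, 3}), 0.89),
    (frozenset({1, 4}), 0.85),
    (frozenset({2, 5}), 0.80),
]
# Gene 3 is missing for this sample, so the top set cannot be used.
choice = pick_feature_set(candidates, observed={1, 2, 4, 5})
print(choice)  # (frozenset({1, 4}), 0.85)
```

Redundant genes are what make this work: if gene 4 carries much of the same information as gene 3, the fallback set loses little accuracy.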

Future Work

•  Reapply feature selection using only the candidate feature sets to determine whether results improve

•  Attempt to reduce overfitting of the training data during feature selection

•  Formalize the method of choosing an alternative feature set in the case of missing data

•  Complete the process on additional microarray datasets

•  Complete the process on datasets from different problem domains
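The first future-work item, like the color coding on the results slide, rests on how often each feature appears across the candidate feature sets. A minimal sketch, assuming candidate sets are given as collections of feature indices (the example sets are hypothetical): it counts appearances, which correspond to cell darkness in the results figure, and returns the pooled features to which a second round of selection could be restricted.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pool_candidates(candidate_sets: Iterable[Iterable[int]]) -> Tuple[Counter, List[int]]:
    """Count feature appearances across candidate sets and return the pooled features."""
    counts: Counter = Counter()
    for s in candidate_sets:
        counts.update(set(s))  # count each feature at most once per candidate set
    pooled = sorted(counts)
    return counts, pooled


sets = [{3, 7, 9}, {3, 9}, {3, 12}]
counts, pooled = pool_candidates(sets)
print(counts.most_common(2), pooled)  # [(3, 3), (9, 2)] [3, 7, 9, 12]
```

Features that never appear (count 0) are excluded from the pooled set, matching the observation that such features are likely irrelevant.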

References

•  [1] Huerta, E. B., Duval, B. and Hao, J.-K. Gene selection for microarray data by a LDA-based genetic algorithm. Springer, City, 2008.

•  [2] Sahu, B. and Mishra, D. A novel feature selection algorithm using particle swarm optimization for cancer microarray data. Procedia Engineering, 38 (2012), 27-31.

•  [3] Garro, B. A., Rodríguez, K. and Vázquez, R. A. Classification of DNA microarrays using artificial neural networks and ABC algorithm. Applied Soft Computing, 38 (2016), 548-560.

•  [4] Sasikala, S., alias Balamurugan, S. A. and Geetha, S. A Novel Feature Selection Technique for Improved Survivability Diagnosis of Breast Cancer. Procedia Computer Science, 50 (2015), 16-23.

