Classification of multiple cancer types by multicategory support vector machines using gene expression data
Support Vector Machine
A classification method that has been applied successfully to cancer diagnosis problems
Two types:
Binary SVM: an optimal extension to more than two classes is not apparent, which limits its application to multiple tumor types
Multicategory SVM (recently proposed): demonstrated on leukemia data and on small round blue cell tumors of childhood
DNA microarray technology
This method measures the relative amount of mRNA in isolated cells or biopsied tissues
Existing approach: use SVM to solve a series of binary problems (DAG SVM algorithm)
MSVM is applied to two gene expression data sets
Features
Effectiveness
Prediction strength
Effect of data preprocessing
Gene selection
Dimension reduction
Binary SVM
MSVM
Procedure: 3-class problem
Gene expression was monitored to classify two leukemias: ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia)
ALL subtypes: B-cell, T-cell
Procedure (continued)
Number of genes: 7129
Training set: 38 samples
Test set: 34 samples
Preprocessing steps performed:
Thresholding (floor 100, ceiling 16000)
Filtering of genes (max/min <= 5 and max - min <= 500)
Base 10 logarithmic transformation
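The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `preprocess`, its parameter names, and the toy intensities are hypothetical, and it assumes that genes meeting the filtering criterion are the ones excluded, combining the two conditions with "and" exactly as the slide states.

```python
import math

def preprocess(expression, floor=100.0, ceiling=16000.0,
               max_ratio=5.0, max_range=500.0):
    """Threshold, filter, and log-transform a genes-by-samples table.

    expression: list of rows, one row of raw intensities per gene.
    Returns (kept_gene_indices, log10_rows).
    """
    kept, transformed = [], []
    for index, row in enumerate(expression):
        # Thresholding: clip every value into [floor, ceiling].
        clipped = [min(max(value, floor), ceiling) for value in row]
        high, low = max(clipped), min(clipped)
        # Filtering, read literally from the slide: a gene is dropped when
        # max/min <= 5 AND max - min <= 500 (assumed to mean exclusion).
        if high / low <= max_ratio and high - low <= max_range:
            continue
        kept.append(index)
        # Base 10 logarithm of the thresholded values.
        transformed.append([math.log10(value) for value in clipped])
    return kept, transformed

# Toy data (made up for illustration): three genes, three samples.
raw = [
    [50.0, 20000.0, 300.0],   # clipped to [100, 16000, 300], kept
    [100.0, 120.0, 110.0],    # nearly flat across samples, filtered out
    [200.0, 2000.0, 400.0],   # tenfold range, kept
]
kept, logged = preprocess(raw)
print(kept)
```

Because the floor is positive, the ratio max/min is always defined after thresholding.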
Procedure (continued)
Standardization of each variable
Variable selection
Prescreening measure: ratio of between-class sum of squares to within-class sum of squares for each gene (largest ratios taken)
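A small sketch of the prescreening measure and of per-gene standardization may help. The helper names (`bw_ratio`, `select_top_genes`, `standardize`) and the toy values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the slide does not say which one is used.

```python
import math

def bw_ratio(values, labels):
    """Between-class over within-class sum of squares for one gene."""
    overall = sum(values) / len(values)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    between = within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        between += len(members) * (mean - overall) ** 2
        within += sum((value - mean) ** 2 for value in members)
    return between / within if within > 0 else float("inf")

def select_top_genes(matrix, labels, n_genes):
    """Indices of the n_genes rows with the largest ratios."""
    ranked = sorted(range(len(matrix)),
                    key=lambda g: bw_ratio(matrix[g], labels),
                    reverse=True)
    return ranked[:n_genes]

def standardize(values):
    """Center one gene to mean 0 and scale to unit standard deviation."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

# Toy data (made up): three genes measured on four samples, two classes.
labels = ["A", "A", "B", "B"]
matrix = [
    [1.0, 1.2, 3.0, 3.2],   # separates the classes well
    [1.0, 3.0, 1.2, 3.2],   # varies mostly within classes
    [0.0, 1.0, 2.0, 4.0],   # intermediate
]
top = select_top_genes(matrix, labels, 2)
print(top)
```

A gene whose class means differ widely relative to its spread inside each class receives a large ratio, so ranking by this ratio keeps the genes that best separate the classes.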
Heat map of the 40 most important genes in the training set
Small round blue cell tumors data (SRBCTs)
4 types:
Neuroblastoma (NB)
Rhabdomyosarcoma (RMS)
Non-Hodgkin lymphoma (NHL)
Ewing family of tumors (EWS)
Used Artificial Neural Networks (ANN)
Training set: 63 samples
Test set: 20 samples
Nearest neighbor, weighted voting, and linear SVM were applied to the data
MSVM was applied for comparison
Logarithm base 10 of expression levels
Predicted decision vectors
SANN
For multiclass classification
Classification results superior to ANN
ANN uses the back propagation algorithm
Why?
Nonlinear connections
Inclusion of interactions within the independent variables (input)
Independence from conventional processes
Limitations
Learned knowledge is contained in hundreds to thousands of weights (synapses)
Cannot be analyzed in a single regression formula
Combining several ANNs
Through ensembles of networks
An ensemble: a collection of a finite number of different classifiers
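One common way to combine the members of an ensemble is to average their output vectors and pick the class with the largest mean. The slides do not state which combination rule is used, so the averaging below is an assumption, and `ensemble_predict` and the stand-in member functions are hypothetical.

```python
def ensemble_predict(members, sample):
    """Average the class-output vectors of all members, then take the argmax.

    members: list of callables, each mapping a sample to a list of
             per-class outputs of the same length.
    Returns (winning_class_index, averaged_outputs).
    """
    outputs = [member(sample) for member in members]
    n_classes = len(outputs[0])
    averaged = [sum(out[c] for out in outputs) / len(outputs)
                for c in range(n_classes)]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged

# Stand-in "networks" with fixed outputs (made up for illustration).
members = [
    lambda x: [0.6, 0.4],
    lambda x: [0.2, 0.8],
    lambda x: [0.3, 0.7],
]
winner, averaged = ensemble_predict(members, sample=None)
print(winner, averaged)
```

Here one member prefers class 0 but is outvoted by the other two once the outputs are averaged.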
Cascading ANNs
Two-level ANN
Task: chest radiograms
Lung nodules (class A)
Without lung nodules (class B)
Two-level architecture carrying lower-level and higher-level concepts
Task (higher level): differentiate normal cells (class A) from malignant cells (class B)
Lower level: Class B_1, Class B_2, Class B_3, Class B_4
One vs. all
Used with SVM
K binary classifiers: each distinguishes one class from all other classes lumped together
A sample is assigned to the class whose classifier achieves the greatest output activity
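The one-vs-all decision rule can be sketched in a few lines. This is only an illustration under stated assumptions: the linear scoring functions and their weights are made up, and `one_vs_all_predict` and `linear` are hypothetical helper names. In practice each scorer would be a trained binary classifier such as an SVM.

```python
def linear(weights, bias):
    """Build a toy linear scorer f(x) = w . x + b (stand-in for a trained model)."""
    return lambda x: sum(w * v for w, v in zip(weights, x)) + bias

def one_vs_all_predict(scorers, sample):
    """Score the sample with each class-vs-rest classifier; largest output wins.

    scorers: dict mapping class label -> callable returning a real-valued output.
    Returns (winning_label, all_scores).
    """
    scores = {label: scorer(sample) for label, scorer in scorers.items()}
    return max(scores, key=scores.get), scores

# K = 4 classifiers, one per tumor type; weights are invented for the example.
scorers = {
    "NB": linear([2.0, 0.0], -1.0),
    "RMS": linear([0.0, 2.0], -1.0),
    "NHL": linear([-1.0, -1.0], 0.0),
    "EWS": linear([1.0, 1.0], -0.5),
}
label, scores = one_vs_all_predict(scorers, [1.0, 0.0])
print(label, scores)
```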
All pairs approach
Builds K(K-1)/2 binary classifiers
Each class is distinguished from the other classes by K-1 binary classifiers
Output activities are summed up: the class with the greatest activity is the winning class
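The summation of pairwise output activities can be sketched as below. It assumes, for illustration only, that each pairwise classifier returns a value in [0, 1] read as activity for the first class of the pair, with the remainder credited to the second class. The name `all_pairs_predict` and the constant classifiers are hypothetical.

```python
from itertools import combinations

def all_pairs_predict(pairwise, classes, sample):
    """Sum the output activities of all K(K-1)/2 pairwise classifiers.

    pairwise: dict mapping (class_a, class_b) -> callable returning a value
              in [0, 1], interpreted as activity for class_a.
    Returns (winning_class, summed_activity_per_class).
    """
    activity = {c: 0.0 for c in classes}
    for a, b in combinations(classes, 2):
        out = pairwise[(a, b)](sample)
        activity[a] += out          # support for the first class of the pair
        activity[b] += 1.0 - out    # complementary support for the second
    return max(activity, key=activity.get), activity

# K = 3 classes give 3 pairwise classifiers; outputs are made up.
classes = ["A", "B", "C"]
pairwise = {
    ("A", "B"): lambda x: 0.9,
    ("A", "C"): lambda x: 0.6,
    ("B", "C"): lambda x: 0.2,
}
winner, activity = all_pairs_predict(pairwise, classes, sample=None)
print(winner, activity)
```

Each class receives contributions from exactly K-1 classifiers, which matches the count given on the slide.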
SANN
Oriented to human decision making
Exclusion is performed: preferences are narrowed down
The classification made by the first ANN is a preselection for the second, successive ANN
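The preselection idea can be illustrated with a two-stage sketch. The slides do not specify how the first network's output is passed on, so everything here is an assumption: the first stage is taken to keep its `keep` highest-scoring classes, and the second stage chooses only among those. The function `cascade_predict` and both stand-in stages are hypothetical.

```python
def cascade_predict(first_stage, second_stage, sample, keep=2):
    """Two-stage decision: stage one narrows the classes, stage two decides.

    first_stage:  callable, sample -> dict of class -> score (all classes).
    second_stage: callable, (sample, candidates) -> dict of class -> score,
                  covering the preselected candidates.
    Returns (final_class, candidates_passed_to_stage_two).
    """
    first_scores = first_stage(sample)
    # Exclusion step: keep only the best-scoring classes from stage one.
    candidates = sorted(first_scores, key=first_scores.get, reverse=True)[:keep]
    second_scores = second_stage(sample, candidates)
    final = max(candidates, key=lambda c: second_scores[c])
    return final, candidates

# Stand-in stages with fixed, made-up scores.
def first_stage(sample):
    return {"NB": 0.40, "RMS": 0.35, "NHL": 0.15, "EWS": 0.10}

def second_stage(sample, candidates):
    table = {"NB": 0.3, "RMS": 0.7, "NHL": 0.9, "EWS": 0.1}
    return {c: table[c] for c in candidates}

final, candidates = cascade_predict(first_stage, second_stage, sample=None)
print(final, candidates)
```

In this toy run the second stage would have rated NHL highest, but NHL was already excluded by the first stage, so the final choice is made between NB and RMS only.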
References
http://info.cchmc.org/presentations/ylee_13Dec02.pdf