
Classification of multiple cancer types by multicategory support vector machines using gene...

Date post: 27-Dec-2015
Upload: pearl-pierce
Classification of multiple cancer types by multicategory support vector machines using gene expression data
Transcript
Page 1: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Classification of multiple cancer types by multicategory support vector machines using gene expression data

Page 2: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Support Vector Machine

A classification method that has been applied successfully to cancer diagnosis problems

Two types:

Binary SVM: no optimal extension to more than two classes has been established, which limits its application to multiple tumor types

Multicategory SVM (recently proposed): demonstrated on leukemia data and on small round blue cell tumors of childhood

Page 3: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

DNA microarray technology

This method measures the relative amount of mRNA in isolated cells or biopsied tissues

Uses SVM; solves a series of binary problems (DAG SVM algorithm)

MSVM is applied to two gene expression data sets

Page 4: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Features

Effectiveness

Prediction strength

Effect of data preprocessing

Gene selection

Dimension reduction

Page 5: Classification of multiple cancer types by multicategory support vector machines using gene expression data.
Page 6: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Binary SVM
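
The body of this slide did not survive in the transcript. As a reminder of what a binary SVM usually solves, here is the standard soft-margin formulation written as a regularization problem. The notation (n training pairs (x_i, y_i), labels y_i in {-1, +1}, tuning parameter \lambda, kernel space H_K) is an assumption and not taken from the slide.

```latex
\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\bigl(1 - y_i f(x_i)\bigr)_{+}
  \;+\; \lambda\,\lVert h\rVert_{H_K}^{2},
\qquad f(x) = h(x) + b,\quad (t)_{+} = \max(t, 0)

\text{Classification rule: } \hat{y}(x) = \operatorname{sign}\bigl(f(x)\bigr)
```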

Page 7: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

MSVM
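
The body of this slide did not survive in the transcript either. The sketch below shows how the multicategory SVM is commonly formulated for k classes with equal misclassification costs. All symbols (c_i for the class of sample i, the decision vector f = (f_1, ..., f_k), \lambda, H_K) are assumed notation, not the slide's.

```latex
\text{Class coding: } y_i \in \mathbb{R}^{k},\quad
y_{ij} = \begin{cases} 1 & j = c_i \\ -\dfrac{1}{k-1} & j \neq c_i \end{cases}

\min_{f}\; \frac{1}{n}\sum_{i=1}^{n}\sum_{j \neq c_i}
  \Bigl(f_j(x_i) + \frac{1}{k-1}\Bigr)_{+}
  \;+\; \frac{\lambda}{2}\sum_{j=1}^{k}\lVert h_j\rVert_{H_K}^{2}
\qquad \text{subject to } \sum_{j=1}^{k} f_j(x) = 0,\quad f_j = h_j + b_j

\text{Classification rule: } \hat{c}(x) = \arg\max_{j} f_j(x)
```

For k = 2 the coding reduces to y = (1, -1) or (-1, 1), and the problem is the binary SVM again.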

Page 8: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Procedure: 3-class problem

Gene expression was monitored for classification of 2 leukemias: ALL (acute lymphoblastic leukemia) and AML (acute myeloid leukemia)

ALL is further divided into B-cell and T-cell, giving three classes

Page 9: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Procedure (cont.)

Number of genes: 7129

38 samples: training set

34 samples: test set

Preprocessing steps performed:

Thresholding (floor 100, ceiling 16000)

Filtering of genes (excluding genes with max/min <= 5 and max - min <= 500)

Base 10 logarithmic transformation
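
The three preprocessing steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the presenters' code: the function name `preprocess` and its parameter names are hypothetical, the data layout (one list of values per gene) is assumed, and the two filter conditions are combined with "and" exactly as the slide writes them.

```python
import math

def preprocess(expr, floor=100.0, ceiling=16000.0, min_ratio=5.0, min_range=500.0):
    """expr: list of genes, each a list of raw expression values across samples.

    Returns (kept_indices, log10 values of the kept genes).
    Assumes floor > 0 so that the max/min ratio is always defined.
    """
    kept, out = [], []
    for g, row in enumerate(expr):
        # Thresholding: clip every value into [floor, ceiling].
        clipped = [min(max(v, floor), ceiling) for v in row]
        hi, lo = max(clipped), min(clipped)
        # Filtering: drop genes that barely vary across samples.
        # "and" follows the slide's wording (an assumption about the intent).
        if hi / lo <= min_ratio and hi - lo <= min_range:
            continue
        kept.append(g)
        # Base 10 logarithmic transformation of the surviving gene.
        out.append([math.log10(v) for v in clipped])
    return kept, out

# Toy input: 3 genes measured on 3 samples (made-up numbers).
toy = [[50, 120, 90], [100, 20000, 3000], [200, 400, 300]]
kept, logged = preprocess(toy)
```

In the toy input only the second gene varies enough to pass the filter; the other two are removed before the log transform.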

Page 10: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Procedure (cont.)

Standardization of each variable

Variable selection

Prescreening measure: ratio of between-class sum of squares to within-class sum of squares for each gene (genes with the largest ratios are taken)
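
A minimal sketch of the standardization and prescreening steps, assuming the expression data is stored as one list of values per gene and the class labels as a parallel list. The function names (`standardize`, `bw_ratio`, `select_genes`) are hypothetical, and using the sample standard deviation (n - 1) is an assumption.

```python
import math

def standardize(values):
    """Center one gene to mean 0 and scale to unit sample standard deviation.
    Assumes the gene is not constant across samples."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]

def bw_ratio(values, labels):
    """Between-class sum of squares divided by within-class sum of squares."""
    overall = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        mean_c = sum(group) / len(group)
        between += len(group) * (mean_c - overall) ** 2
        within += sum((v - mean_c) ** 2 for v in group)
    return between / within if within > 0 else float("inf")

def select_genes(expr, labels, n_genes):
    """Indices of the n_genes genes with the largest ratios, best first."""
    ratios = [bw_ratio(row, labels) for row in expr]
    order = sorted(range(len(expr)), key=lambda g: ratios[g], reverse=True)
    return order[:n_genes]

# Toy data: 3 genes, 4 samples, 2 classes (made-up numbers).
labels = ["ALL", "ALL", "AML", "AML"]
expr = [[1, 2, 1, 2], [1, 1, 5, 5.5], [0, 1, 2, 3]]
top = select_genes(expr, labels, 2)
```

The ratio is unchanged by centering and scaling a gene, so in this sketch it does not matter whether standardization runs before or after the selection.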

Page 11: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Heat map of the 40 most important genes in the training set

Page 12: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Small round blue cell tumors (SRBCTs) data

4 types:

Neuroblastoma (NB)

Rhabdomyosarcoma (RMS)

Non-Hodgkin lymphoma (NHL)

Ewing family of tumors (EWS)

Page 13: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Used Artificial Neural Networks (ANN)

Training set: 63 samples

Test set: 20 samples

Nearest neighbor, weighted voting, and linear SVM were applied to the data

MSVM was applied for comparison

Logarithm base 10 of expression levels

Page 14: Classification of multiple cancer types by multicategory support vector machines using gene expression data.
Page 15: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Predicted decision vectors

Page 16: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

SANN

For multiclass classification

Classification results superior to ANN

ANN uses the back propagation algorithm

Why?

Nonlinear connections

Inclusion of interactions within independent variables (input)

Independence from conventional processes

Page 17: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Limitations

Learned knowledge is contained in 100s to 1000s of weights (synapses)

Cannot be analyzed in a single regression formula

Page 18: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Combining several ANNs

Through ensembles of networks

An ensemble: a collection of a finite number of different classifiers

Cascading ANNs
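
The ensemble idea can be illustrated with a few lines of Python. The slide does not say how the members' outputs are combined, so majority voting is an assumption here, and the three toy classifiers are hypothetical stand-ins for trained networks.

```python
from collections import Counter

def ensemble_predict(classifiers, x):
    """Return the label predicted by the most ensemble members.
    Ties go to the label that received its first vote earliest."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]

# Hypothetical members: each is a function from a feature vector to a label.
members = [
    lambda x: "A" if x[0] > 0 else "B",
    lambda x: "A" if x[1] > 0 else "B",
    lambda x: "A" if x[0] + x[1] > 1 else "B",
]

label = ensemble_predict(members, [1.0, -0.5])
```

For the sample above the first member votes "A" and the other two vote "B", so the ensemble returns "B".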

Page 19: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Two-level ANN

Task: chest radiograms

Lung nodules (Class A)

Without lung nodules (Class B)

Page 20: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

Two-level architecture carrying lower-level and higher-level concepts

Task (higher level): differentiate normal cells (class A) from malignant cells (class B)

Lower level: Class B_1, Class B_2, Class B_3, Class B_4
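
A two-level decision of this kind can be sketched as follows. The function names and the toy decision rules are hypothetical. The only structure taken from the slide is that the higher level separates class A from class B and the lower level splits class B into B_1 to B_4.

```python
def two_level_predict(top_level, lower_level, x):
    """top_level(x) returns "A" (normal) or "B" (malignant).
    lower_level(x) returns one of "B_1".."B_4" and is consulted only
    when the higher level has decided on class B."""
    label = top_level(x)
    return label if label == "A" else lower_level(x)

# Hypothetical stand-ins for two trained networks.
def toy_top(x):
    return "A" if x[0] < 0.5 else "B"

def toy_lower(x):
    # Split class B into four subclasses by the second feature.
    index = min(int(x[1] * 4), 3) + 1
    return "B_%d" % index

result = two_level_predict(toy_top, toy_lower, [0.9, 0.6])
```

Here the first feature sends the sample to class B, and the second feature then selects the subclass B_3.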

Page 21: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

One vs. all

Used with SVM

K binary classifiers: each distinguishes one class from all the others lumped together

A sample is assigned to the class whose classifier achieves the greatest output activity
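
The one-vs-all decision rule can be sketched like this, assuming each of the K binary classifiers is available as a function that returns a real-valued output. The linear scorers and their weights below are made up for illustration and are not fitted to any data.

```python
def one_vs_all_predict(scorers, x):
    """scorers maps each class label to a function whose output is larger
    when x looks more like that class than like all the others combined.
    The sample goes to the class with the greatest output."""
    return max(scorers, key=lambda label: scorers[label](x))

def linear_scorer(weights, bias):
    """Hypothetical stand-in for one trained binary classifier."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias

# K = 4 classifiers, one per tumor type (weights are invented).
scorers = {
    "NB": linear_scorer([1.0, -1.0], 0.0),
    "RMS": linear_scorer([-1.0, 1.0], 0.0),
    "NHL": linear_scorer([0.5, 0.5], -1.0),
    "EWS": linear_scorer([-0.5, -0.5], -1.0),
}

predicted = one_vs_all_predict(scorers, [2.0, 0.5])
```

For the sample [2.0, 0.5] the four outputs are 1.5, -1.5, 0.25 and -2.25, so the NB classifier wins.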

Page 22: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

All-pairs approach

Builds K(K-1)/2 binary classifiers

Each class is distinguished from the other classes by K-1 binary classifiers

Output activities are summed up; the class with the greatest activity is the winning class
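
A minimal sketch of the all-pairs scheme, assuming each pairwise classifier returns a signed output (positive favours the first class of the pair, negative the second). The distance-based pair scorer and the class centers are hypothetical stand-ins for trained binary classifiers.

```python
from itertools import combinations

def make_pair_scorer(center_a, center_b):
    """Hypothetical binary classifier: positive when x is closer to
    center_a than to center_b (squared Euclidean distance)."""
    def score(x):
        da = sum((xi - ci) ** 2 for xi, ci in zip(x, center_a))
        db = sum((xi - ci) ** 2 for xi, ci in zip(x, center_b))
        return db - da
    return score

def all_pairs_predict(classes, pair_scorers, x):
    """Sum each class's output over the K-1 classifiers it takes part in
    and return the class with the greatest total, plus all totals."""
    activity = {c: 0.0 for c in classes}
    for a, b in combinations(classes, 2):
        out = pair_scorers[(a, b)](x)
        activity[a] += out
        activity[b] -= out
    return max(classes, key=lambda c: activity[c]), activity

classes = ["NB", "RMS", "NHL", "EWS"]
centers = {"NB": [0, 0], "RMS": [4, 0], "NHL": [0, 4], "EWS": [4, 4]}
pair_scorers = {
    (a, b): make_pair_scorer(centers[a], centers[b])
    for a, b in combinations(classes, 2)
}

winner, activity = all_pairs_predict(classes, pair_scorers, [3.5, 0.5])
```

With K = 4 classes the dictionary holds 4 * 3 / 2 = 6 pairwise classifiers, matching the K(K-1)/2 count on the slide.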

Page 23: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

SANN

Oriented to human decision making

Exclusion performed: preferences narrowed down

The classification made by the first ANN is a preselection for the second, successive ANN
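
One way to picture the preselection step is the sketch below. It assumes the first network produces a score per class, that the top `keep` classes survive the exclusion, and that the second network chooses only among those survivors. None of these details are given on the slide, and every name here is hypothetical.

```python
def sequential_predict(first_stage, second_stage, x, keep=2):
    """first_stage(x) returns a dict mapping class label to score.
    second_stage(x, candidates) returns one label from candidates.
    Returns the final label and the preselected candidates."""
    scores = first_stage(x)
    # Exclusion: keep only the `keep` best-scoring classes.
    candidates = sorted(scores, key=scores.get, reverse=True)[:keep]
    return second_stage(x, candidates), candidates

# Hypothetical stand-ins for the two networks.
def toy_first(x):
    return {"NB": 0.1, "RMS": 0.5, "NHL": 0.3, "EWS": 0.1}

def toy_second(x, candidates):
    # Decide between the survivors using a single feature.
    return candidates[0] if x[0] > 0 else candidates[-1]

final, shortlist = sequential_predict(toy_first, toy_second, [-1.0])
```

The first stage narrows four classes down to RMS and NHL, and the second stage then picks NHL for this sample.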

Page 24: Classification of multiple cancer types by multicategory support vector machines using gene expression data.

References

http://info.cchmc.org/presentations/ylee_13Dec02.pdf

