
December 2, 2005 1

Feature Selection of DNA Microarray Data

Presented by: Mohammed Liakat Ali
Course: 60-520, Fall 2005
University of Windsor

December 2, 2005 2

Outline
- Introduction
- Deployment of Feature Selection Methods
- Feature Selection Methods
- Class Separability Measures
- Review of Minimum Redundancy Feature Selection Methods
- Comparison with Our Experimental Results
- Conclusions
- Q & A

December 2, 2005 3

Introduction

- Microarray Data
- Representation of Objects
- Classifiers
- Feature Selection vs. Feature Extraction
- Optimal Feature Set for Classification

December 2, 2005 4

Microarray Data
- Microarray technology is one of the most promising tools available to life science researchers.
- Two technologies are used to produce DNA microarrays: cDNA arrays and Affymetrix technology.
- Also known as the DNA chip.
- The final result of a microarray experiment is a set of numbers representing the expression levels of DNA fragments, i.e., genes.

December 2, 2005 5

Representation of Objects
- Objects are represented by their characteristic features.
- There are three main reasons to keep dimensionality low: measurement cost, classification accuracy, and the need to identify and monitor the target disease or function types.
- It is very important to represent an object with features having high discriminating ability.

December 2, 2005 6

Classifiers
- A classifier uses the features of an object and a discriminant function to assign the object to a category, i.e., a class.
- The domain-independent theory of classification is based on the abstraction provided by the features of the input data.
- Classifiers can be divided into linear and non-linear.

December 2, 2005 7

Feature Selection vs. Feature Extraction

- In feature selection we try to find the best subset of the input feature set.
- In feature extraction we create new features based on transformation or combination of the original feature set.

December 2, 2005 8

Optimal Feature Subset for Classification

- To find the optimal feature subset we have to evaluate the objective function for all $\sum_{m=0}^{d} \binom{d}{m} = 2^d$ subsets of the $d$ available features.
- This is a problem of exponential complexity (a small numeric check follows).
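To make the exponential growth concrete, here is a minimal Python check; the feature count d = 20 is an arbitrary illustrative value, not a number from the slides.

```python
from math import comb

d = 20  # illustrative number of candidate features (an assumption, not from the slides)

# Summing the number of subsets over every size m = 0..d gives 2^d.
total_subsets = sum(comb(d, m) for m in range(d + 1))
print(total_subsets == 2 ** d)   # True
print(total_subsets)             # 1048576 candidate subsets for only 20 features
```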

December 2, 2005 9

Deployment of Feature Selection Methods

Based on their relation to the induction algorithm, feature selection methods can be grouped as:
- Embedded: they are part of the induction algorithm.
- Filter: they are processes separate from the induction algorithm.
- Wrapper: they are also processes separate from the induction algorithm, but they use the induction algorithm as a subroutine.

December 2, 2005 10

Deployment of Feature Selection Methods

[Figure: Deployment of Feature Selection Methods — Embedded, Filter, Wrapper]

December 2, 2005 11

Feature Selection Methods

Based on whether they guarantee the optimal solution of the problem, we can divide feature selection methods into:
- Optimal selection methods
- Suboptimal selection methods

December 2, 2005 12

Feature Selection Methods

[Figure: taxonomy of feature selection methods — Optimal: Exhaustive Search, Branch and Bound Search; Suboptimal: Best Individual Features, Sequential Forward Selection (SFS), Sequential Backward Selection (SBS), Plus l take away r, Sequential Forward Floating Search (SFFS), Sequential Backward Floating Search (SBFS)]

December 2, 2005 13

Optimal Selection Methods

- Exhaustive Search
- Branch and Bound Search

December 2, 2005 14

Exhaustive Search
- Evaluate all $\binom{d}{m}$ possible subsets consisting of m features out of d features in total (a small sketch follows).
- Guaranteed to find the optimal subset.
- An exponential problem.
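A minimal sketch of exhaustive search, assuming a user-supplied criterion that maps a tuple of feature indices to a score; the per-feature weights in the toy usage are made up for illustration.

```python
from itertools import combinations

def exhaustive_search(features, m, criterion):
    """Evaluate every m-feature subset and return the best one.

    `criterion` maps a tuple of feature indices to a score (higher is better).
    The cost is C(d, m) criterion evaluations, exponential in d.
    """
    best_subset, best_score = None, float("-inf")
    for subset in combinations(features, m):
        score = criterion(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy usage: the criterion just sums made-up per-feature weights.
weights = {0: 0.9, 1: 0.1, 2: 0.7, 3: 0.3}
print(exhaustive_search(list(weights), 2, lambda s: sum(weights[i] for i in s)))
# -> ((0, 2), 1.6)
```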

December 2, 2005 15

Branch and Bound Search

- Only a fraction of all possible feature subsets is evaluated (a pruning sketch follows).
- Guaranteed to find the optimal subset.
- The criterion function must satisfy the monotonicity property, i.e., $J(x_1, \ldots, x_i) \le J(x_1, \ldots, x_i, x_{i+1})$.
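A compact sketch of branch and bound under the monotonicity assumption above. It searches by discarding features from the full set and prunes any branch whose current score already fails to beat the best complete subset found; the toy weights are hypothetical.

```python
import math

def branch_and_bound(d, m, J):
    """Find the best m-feature subset of d features for a monotone criterion J.

    Monotonicity (removing a feature can never increase J) justifies the pruning:
    if a partial candidate set already scores no better than the best complete
    m-subset found so far, none of its subsets can win either.
    """
    best = {"score": -math.inf, "subset": None}

    def recurse(candidates, start):
        score = J(candidates)
        if score <= best["score"]:
            return                                   # prune the whole branch
        if len(candidates) == m:
            best["score"], best["subset"] = score, candidates
            return
        # Discard one more feature; `start` avoids generating duplicate subsets.
        for i in range(start, len(candidates)):
            recurse(candidates[:i] + candidates[i + 1:], i)

    recurse(tuple(range(d)), 0)
    return best["subset"], best["score"]

# Toy monotone criterion: sum of per-feature weights of the kept features.
weights = [0.9, 0.1, 0.7, 0.3, 0.5]
print(branch_and_bound(5, 3, lambda s: sum(weights[i] for i in s)))
# -> ((0, 2, 4), 2.1)
```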

December 2, 2005 16

Suboptimal Selection Methods

- Best Individual Features
- Sequential Forward Selection (SFS)
- Sequential Backward Selection (SBS)
- "Plus l take away r" Selection
- Sequential Forward Floating Search (SFFS)
- Sequential Backward Floating Search (SBFS)

December 2, 2005 17

Best Individual Features
- Evaluate all d features individually using a scalar criterion function (a short sketch follows).
- Select the m best features.
- Clearly a suboptimal method.
- Complexity is O(d).
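A minimal sketch of best-individual-feature ranking; the scalar criterion is passed in as a placeholder (any per-feature score, such as the FDR defined later, would do).

```python
import numpy as np

def best_individual_features(X, y, m, criterion):
    """Score each feature independently and keep the m top-scoring ones.

    X: (n_samples, d) data matrix, y: class labels,
    criterion(feature_column, y) -> scalar score (higher is better).
    """
    scores = np.array([criterion(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:m]   # indices of the m best features
```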

December 2, 2005 18

Sequential Forward Selection (SFS)

- At the beginning, select the best single feature using a scalar criterion function.
- Then add one feature at a time: the one which, together with the already selected features, maximizes the criterion function J(.) (sketched below).
- A greedy algorithm; it cannot retract a choice.
- Complexity is O(d).
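A minimal SFS sketch, assuming a subset criterion J that takes a tuple of feature indices and returns a score.

```python
def sequential_forward_selection(features, m, J):
    """Greedy SFS: grow the selected set one feature at a time, never retracting."""
    selected = []
    remaining = list(features)
    while len(selected) < m and remaining:
        # Add the feature that, together with those already chosen, maximizes J.
        best_f = max(remaining, key=lambda f: J(tuple(selected) + (f,)))
        selected.append(best_f)
        remaining.remove(best_f)
    return tuple(selected)
```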

December 2, 2005 19

Sequential Backward Selection (SBS)

- At the beginning, select all d features.
- Delete one feature at a time, keeping the subset which maximizes the criterion function J(.) (sketched below).
- Also a greedy algorithm; it cannot retract a deletion.
- Complexity is O(d).
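The corresponding backward sketch, again assuming a hypothetical subset criterion J.

```python
def sequential_backward_selection(features, m, J):
    """Greedy SBS: start from all features and discard one at a time, never retracting."""
    selected = list(features)
    while len(selected) > m:
        # Drop the feature whose removal leaves the highest-scoring subset.
        worst = max(selected, key=lambda f: J(tuple(x for x in selected if x != f)))
        selected.remove(worst)
    return tuple(selected)
```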

December 2, 2005 20

“Plus l take away r” Selection

- First add l features by forward selection, then discard r features by backward selection.
- The optimal l and r need to be decided.
- Unlike SFS and SBS, it avoids the subset nesting problem.

December 2, 2005 21

Sequential Forward Floating Search (SFFS)

- It is a generalized 'plus l take away r' algorithm (a simplified sketch follows).
- The values of l and r are determined automatically.
- Close to the optimal solution.
- Affordable computational cost.
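A simplified reading of SFFS, not a faithful reproduction of the original pseudocode: after every forward step it floats backwards as long as removing a feature yields a better subset of that size than any seen before.

```python
def sffs(features, m, J):
    """Simplified Sequential Forward Floating Search with subset criterion J."""
    selected = []
    best_of_size = {}                                  # size -> (subset, score)

    def record(subset):
        """Remember `subset` if it beats the best subset of the same size so far."""
        k, score = len(subset), J(tuple(subset))
        if score > best_of_size.get(k, ((), float("-inf")))[1]:
            best_of_size[k] = (tuple(subset), score)
            return True
        return False

    while len(selected) < m:
        # Inclusion: one SFS step.
        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=lambda f: J(tuple(selected) + (f,))))
        record(selected)
        # Conditional exclusion: float back while it strictly improves.
        while len(selected) > 2:
            worst = max(selected, key=lambda f: J(tuple(x for x in selected if x != f)))
            reduced = [x for x in selected if x != worst]
            if record(reduced):
                selected = reduced
            else:
                break

    return best_of_size[m][0]
```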

December 2, 2005 22

Sequential Backward Floating Search (SBFS)

- It is also a generalized 'plus l take away r' algorithm, like SFFS.
- The values of l and r are also determined automatically.
- Close to the optimal solution, as with SFFS.
- More efficient than SFFS when m is closer to d than to 1.

December 2, 2005 23

Class Separability Measures

- Divergence
- Scatter Matrices

December 2, 2005 24

Divergence
- As per Bayes rule, given two classes ω1 and ω2 and a feature vector x, we select ω1 if P(ω1 | x) > P(ω2 | x).
- Hence the ratio $\dfrac{P(\omega_1 \mid x)}{P(\omega_2 \mid x)}$ has discriminating capability.

December 2, 2005 25

Divergence
- For given P(ω1) and P(ω2), the same information resides in $D_{12}(x) = \ln \dfrac{p(x \mid \omega_1)}{p(x \mid \omega_2)}$
- For completely overlapping classes, $D_{12}(x) = 0$.

December 2, 2005 26

Divergence
- Since x takes different values, it is natural to consider the mean value over class ω1:
  $D_{12} = \int p(x \mid \omega_1) \ln \dfrac{p(x \mid \omega_1)}{p(x \mid \omega_2)} \, dx$
- Similarly, for class ω2:
  $D_{21} = \int p(x \mid \omega_2) \ln \dfrac{p(x \mid \omega_2)}{p(x \mid \omega_1)} \, dx$
- The divergence is the sum $d_{12} = D_{12} + D_{21}$.
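A small numeric check of these formulas for two one-dimensional Gaussian class densities; the means and variance are arbitrary illustrative values. For equal-variance Gaussians the divergence reduces to (μ1 − μ2)²/σ².

```python
import numpy as np

mu1, mu2, sigma = 0.0, 2.0, 1.0                        # illustrative values only
x, dx = np.linspace(-10.0, 12.0, 20001, retstep=True)

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

p1, p2 = gauss(x, mu1, sigma), gauss(x, mu2, sigma)
D12 = np.sum(p1 * np.log(p1 / p2)) * dx                # mean log-ratio over class 1
D21 = np.sum(p2 * np.log(p2 / p1)) * dx                # mean log-ratio over class 2
print(round(D12 + D21, 3))                             # d12 = (mu1 - mu2)^2 / sigma^2 = 4.0
```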

December 2, 2005 27

Scatter Matrices
- Computation of the divergence is not easy for non-Gaussian distributions.
- The within-class scatter matrix is defined as $S_w = \sum_{i=1}^{M} P_i S_i$
- where $S_i = E[(x - \mu_i)(x - \mu_i)']$ is the covariance matrix for class $\omega_i$ and $P_i$ its prior probability.

December 2, 2005 28

Scatter Matrices
- The between-class scatter matrix is defined as $S_b = \sum_{i=1}^{M} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)'$
- where $\mu_0 = \sum_{i=1}^{M} P_i \mu_i$ is the global mean vector (an estimation sketch follows).
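A minimal sketch that estimates the within-class and between-class scatter matrices from a data matrix, with the class priors P_i taken as empirical class frequencies.

```python
import numpy as np

def scatter_matrices(X, y):
    """Compute within-class (Sw) and between-class (Sb) scatter matrices.

    X: (n_samples, n_features) data matrix; y: class labels.
    """
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)
    mu0 = X.mean(axis=0)                                 # global mean vector
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c, P in zip(classes, priors):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        Sw += P * np.cov(Xc, rowvar=False, bias=True)    # P_i * S_i
        diff = (mu_c - mu0).reshape(-1, 1)
        Sb += P * diff @ diff.T                          # P_i (mu_i - mu0)(mu_i - mu0)'
    return Sw, Sb
```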

December 2, 2005 29

Scatter Matrices

- The total mixture scatter matrix is defined as $S_m = E[(x - \mu_0)(x - \mu_0)']$
- and it holds that $S_m = S_w + S_b$.

December 2, 2005 30

Scatter Matrices
- The following criterion functions, among others, can be defined:
  $J_1 = \dfrac{\operatorname{trace}(S_m)}{\operatorname{trace}(S_w)}$
  $J_2 = \dfrac{|S_m|}{|S_w|} = |S_w^{-1} S_m|$
  $J_3 = \operatorname{trace}(S_w^{-1} S_m)$
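Once the scatter matrices are available (for example from the scatter_matrices sketch above), the three criteria are direct to evaluate.

```python
import numpy as np

def separability_criteria(Sw, Sb):
    """Evaluate the J1, J2 and J3 class separability criteria defined above."""
    Sm = Sw + Sb                                   # total mixture scatter matrix
    J1 = np.trace(Sm) / np.trace(Sw)
    J2 = np.linalg.det(Sm) / np.linalg.det(Sw)     # equals |Sw^{-1} Sm|
    J3 = np.trace(np.linalg.inv(Sw) @ Sm)
    return J1, J2, J3
```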

December 2, 2005 31

Scatter Matrices
- For an equally probable two-class problem, $|S_w|$ is proportional to $\sigma_1^2 + \sigma_2^2$ and $|S_b|$ is proportional to $(\mu_1 - \mu_2)^2$.
- This gives Fisher's Discriminant Ratio: $\mathrm{FDR} = \dfrac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}$ (a per-gene scorer is sketched below).
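A per-gene FDR scorer for a two-class problem; this is the kind of scalar ranking criterion used in the experiments reported later, though the function and variable names here are mine.

```python
import numpy as np

def fdr_score(feature_values, y):
    """Fisher's Discriminant Ratio (mu1 - mu2)^2 / (sigma1^2 + sigma2^2) for one feature."""
    c1, c2 = np.unique(y)                        # assumes exactly two classes
    x1, x2 = feature_values[y == c1], feature_values[y == c2]
    return (x1.mean() - x2.mean()) ** 2 / (x1.var() + x2.var())

def rank_genes_by_fdr(X, y):
    """Indices of the genes (columns of X) ordered by decreasing FDR."""
    scores = np.array([fdr_score(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1]
```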

December 2, 2005 32

Review of Minimum Redundancy Feature Selection Methods
We will now discuss two minimum redundancy feature selection methods, given in the following two papers:
- Ding and Peng (2003)
- Yu and Liu (2004)

December 2, 2005 33

Review of Minimum Redundancy Feature Selection Methods
In Ding and Peng (2003):
- A filter method is used.
- The search algorithm is SFS.
- The first feature is selected using $\max V_1$, where $V_1 = \dfrac{1}{|S|} \sum_{i \in S} I(h, i)$ is computed over all genes in the set S and $I(h, i)$ denotes the mutual information between gene $i$ and the target class $h$.

December 2, 2005 34

Review of Minimum Redundancy Feature Selection Methods
- Suppose m features have already been selected into the set X.
- Additional features will be selected from the set Y = S − X.
- The following two conditions are optimized simultaneously (a greedy sketch follows):
  1. $\max_{i \in Y} I(h, i)$
  2. $\min_{i \in Y} \dfrac{1}{|X|} \sum_{j \in X} I(i, j)$
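A sketch of the greedy selection these two conditions describe. The two conditions are combined here by a simple difference, which is one common way to trade them off, and the relevance and redundancy terms are assumed to be supplied as mutual-information functions.

```python
def minimum_redundancy_selection(genes, m, relevance, redundancy):
    """Greedy SFS combining max relevance and min redundancy (difference form).

    relevance(i): mutual information I(h, i) between gene i and the class h.
    redundancy(i, j): mutual information I(i, j) between genes i and j.
    """
    # First feature: the most relevant gene.
    selected = [max(genes, key=relevance)]
    while len(selected) < m:
        candidates = [g for g in genes if g not in selected]

        def score(i):
            mean_red = sum(redundancy(i, j) for j in selected) / len(selected)
            return relevance(i) - mean_red   # maximize relevance, minimize redundancy

        selected.append(max(candidates, key=score))
    return selected
```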

December 2, 2005 35

Review of Minimum Redundancy Feature Selection Methods
- The mutual information $I$ of two variables x and y is defined as
  $I(x, y) = \sum_{i, j} p(x_i, y_j) \log \dfrac{p(x_i, y_j)}{p(x_i)\, p(y_j)}$ (an estimator is sketched below).
- The importance of minimum redundancy is highlighted in the paper.
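A minimal estimator of this mutual information for discrete variables, with probabilities taken from joint frequencies; continuous gene expression values would need to be discretized first.

```python
import numpy as np

def mutual_information(x, y):
    """Estimate I(x, y) for two discrete variables from their joint frequencies."""
    x_vals, x_idx = np.unique(x, return_inverse=True)
    y_vals, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(x_vals), len(y_vals)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1
    joint /= joint.sum()                        # p(x_i, y_j)
    px = joint.sum(axis=1, keepdims=True)       # p(x_i)
    py = joint.sum(axis=0, keepdims=True)       # p(y_j)
    nz = joint > 0                              # skip zero-probability cells
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))

# Toy usage: a variable carries I(x, x) = H(x) bits of information about itself.
x = np.array([0, 0, 1, 1, 2, 2])
print(mutual_information(x, x))   # ln(3) ≈ 1.0986
```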

December 2, 2005 36

Review of Minimum Redundancy Feature Selection Methods
In Yu and Liu (2004):
- A filter method is used.
- The algorithm is:
  Relevance analysis:
  1. Order features based on decreasing ISU values.
  Redundancy analysis:
  2. Initialize Fi with the first feature in the list.
  3. Find and remove all features for which Fi forms an approximate redundant cover.
  4. Set Fi to the next remaining feature in the list and repeat step 3 until the end of the list.

December 2, 2005 37

Review of Minimum Redundancy Feature Selection Methods
- The method combines SFS with elimination.
- The entropy of a variable X is defined as $H(X) = -\sum_i P(x_i) \log_2 P(x_i)$
- The entropy of X after observing values of another variable Y is defined as $H(X \mid Y) = -\sum_j P(y_j) \sum_i P(x_i \mid y_j) \log_2 P(x_i \mid y_j)$
- The amount by which the entropy of X decreases reflects the additional information about X provided by Y and is called the information gain: $IG(X \mid Y) = H(X) - H(X \mid Y)$ (helper functions follow).
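A direct translation of these definitions into small helpers, estimating probabilities from observed value frequencies.

```python
import numpy as np

def entropy(x):
    """H(X) = -sum_i P(x_i) log2 P(x_i), estimated from value frequencies."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def conditional_entropy(x, y):
    """H(X|Y) = -sum_j P(y_j) sum_i P(x_i|y_j) log2 P(x_i|y_j)."""
    y_vals, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    return float(sum(P * entropy(x[y == v]) for v, P in zip(y_vals, priors)))

def information_gain(x, y):
    """IG(X|Y) = H(X) - H(X|Y)."""
    return entropy(x) - conditional_entropy(x, y)
```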

December 2, 2005 38

Review of Minimum Redundancy Feature Selection Methods
- Symmetrical uncertainty is defined as $SU(X, Y) = \dfrac{2\, IG(X \mid Y)}{H(X) + H(Y)}$
- Individual C-correlation (ISUi): the correlation between any feature Fi and the class C is called the individual C-correlation, ISUi.
- Combined C-correlation (CSUi_j): the correlation between the pair of features Fi and Fj (i ≠ j) and the class C is called the combined C-correlation, CSUi_j.
- Approximate redundant cover: for two features Fi and Fj, Fi forms an approximate redundant cover for Fj iff ISUi ≥ ISUj and ISUi ≥ CSUi_j (a sketch of the full procedure follows).
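Putting the pieces together, a sketch of the relevance/redundancy procedure from the earlier slide using these definitions. It relies on the entropy helpers above and assumes features have already been discretized to small non-negative integers.

```python
import numpy as np

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X|Y) / (H(X) + H(Y)), using the helpers defined above."""
    return 2.0 * information_gain(x, y) / (entropy(x) + entropy(y))

def redundancy_based_filter(F, c):
    """Keep features ordered by ISU, removing those covered by an earlier feature.

    F: (n_samples, n_features) matrix of discretized feature values; c: class labels.
    """
    n_features = F.shape[1]
    isu = np.array([symmetrical_uncertainty(F[:, j], c) for j in range(n_features)])
    order = list(np.argsort(isu)[::-1])          # decreasing ISU (relevance analysis)
    selected = []
    while order:
        fi = order.pop(0)
        selected.append(fi)
        survivors = []
        for fj in order:
            # Combined C-correlation: encode the value pair (Fi, Fj) as one joint variable.
            pair = F[:, fi] * (F[:, fj].max() + 1) + F[:, fj]
            csu_ij = symmetrical_uncertainty(pair, c)
            covers = isu[fi] >= isu[fj] and isu[fi] >= csu_ij
            if not covers:
                survivors.append(fj)             # Fj survives this round
        order = survivors
    return selected
```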

December 2, 2005 39

Comparison with our Experimental Results

- To investigate the problem of feature selection we implemented a filter method (sketched below).
- We used FDR as the criterion function.
- Initial gene selection was based on gene ranking.
- Fisher and Loog-Duin discriminant techniques were then applied to transform the feature space.
- Linear and quadratic classifiers were then used.
- 10-fold cross-validation was applied.
- We used Leukemia, Lung cancer, and Breast cancer data from the UCI repository.
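A minimal scikit-learn sketch of the FQ variant of this pipeline (FDR ranking, Fisher's discriminant transform, quadratic classifier, 10-fold cross-validation). The number of selected genes k and the use of SelectKBest are illustrative assumptions, and the Loog-Duin transform is omitted because scikit-learn does not provide it.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

def fdr_scores(X, y):
    """Per-gene FDR scores for a two-class problem, in the form SelectKBest expects."""
    c0, c1 = np.unique(y)
    a, b = X[y == c0], X[y == c1]
    return (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0) + 1e-12)

def evaluate(X, y, k=80):
    """10-fold CV accuracy of the FDR-ranking + Fisher + quadratic (FQ) pipeline."""
    pipeline = Pipeline([
        ("rank", SelectKBest(score_func=fdr_scores, k=k)),   # FDR gene ranking
        ("transform", LinearDiscriminantAnalysis()),         # Fisher's discriminant
        ("classify", QuadraticDiscriminantAnalysis()),       # quadratic classifier
    ])
    return cross_val_score(pipeline, X, y, cv=10).mean()
```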

December 2, 2005 40

Comparison with our Experimental Results

Dataset         #G      #S    #SG   RBF     #S    #SG   FQ      LDQ     FL      LDL
Leukemia        7129    72    4     87.50   72    80    98.75   59.23   98.75   95.00
Lung cancer     12533   181   6     98.34   197   367   67.12   49.89   77.32   73.60
Breast cancer   24481   97    67    79.38   97    273   78.63   68.72   78.63   74.70

Table 1. Comparison of gene selection results (#G = number of genes, #S = number of samples, #SG = number of selected genes; the first #S/#SG pair accompanies the RBF results, the second our own experiments). RBF = Redundancy Based Filter; FQ = Fisher's Discriminant + Quadratic classifier; FL = Fisher's Discriminant + Linear classifier; LDQ = Loog-Duin's Discriminant + Quadratic classifier; LDL = Loog-Duin's Discriminant + Linear classifier.

December 2, 2005 41

Comparison with our Experimental Results

- From the table we can observe that RBF selected very compact gene sets in all cases.
- FQ and FL outperform LDQ and LDL on all 3 datasets.
- RBF outperforms all other methods on 1 dataset by a big margin.
- FQ and FL jointly outperform the others on 1 dataset, also by a big margin.
- RBF, FQ, and FL have comparable results on 1 dataset.

December 2, 2005 42

Conclusions

We can conclude that minimum redundancy methods select very compact gene sets. This can help to identify and monitor the target disease or function types.

December 2, 2005 43

Conclusions
- From our experience, on average the performance of LDQ is better than that of FQ, because Fisher discriminant analysis is linear in nature.
- Here we selected genes by FDR ranking; because of this, the performance of FQ and FL may be enhanced.
- From the results we can also conclude that gene selection by ranking alone has some merit.

December 2, 2005 44

References
1. A. Blum and P. Langley. Selection of Relevant Features and Examples in Machine Learning. Artificial Intelligence, 97(1-2):245–271, 1997.
2. T. M. Cover. The Best Two Independent Measurements Are Not the Two Best. IEEE Trans. Systems, Man, and Cybernetics, vol. 4, pp. 116–117, 1974.
3. C. Ding and H. C. Peng. Minimum Redundancy Feature Selection from Microarray Gene Expression Data. Proc. Second IEEE Computational Systems Bioinformatics Conf., pages 523–528, 2003.
4. R. Duda, P. Hart, and D. Stork. Pattern Classification. John Wiley and Sons, Inc., New York, NY, 2nd edition, 2000.
5. K. S. Van Horn and T. Martinez. The Minimum Feature Set Problem. Neural Networks, 7(3):491–494, 1994.

December 2, 2005 45

References
6. A. K. Jain, R. P. W. Duin, and J. Mao. Statistical Pattern Recognition: A Review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 2000.
7. M. Loog and R. P. W. Duin. Linear Dimensionality Reduction via a Heteroscedastic Extension of LDA: The Chernoff Criterion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(6):732–739, 2004.
8. S. Theodoridis and K. Koutroumbas. Pattern Recognition. Elsevier Academic Press, second edition, 2003.
9. L. Yu and H. Liu. Redundancy Based Feature Selection for Microarray Data. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 737–742, 2004.

December 2, 2005 46

Q & A

Thank You

