Searching For a Few Good Features

Page 1: Searching For a Few Good Features

Searching For a Few Good Features

Bülent Yener, Rensselaer Polytechnic Institute, Department of Computer Science

Pathology Informatics 2010

Page 2: Searching For a Few Good Features

The Hard Problem: Bad or Just Ugly?

One of the main challenges is discriminating damaged (diseased but not cancerous) tissue from cancerous tissue; unlike healthy tissue, the two can be hard to tell apart.

We need a few good features!

Page 3: Searching For a Few Good Features

Brain Tissue - Diffuse

Good: healthy

Bad: glioma

Ugly: inflammation

Page 4: Searching For a Few Good Features

Gland based tissue: Prostate

Good (Healthy) Ugly (PIN)

Bad (cancerous)

Page 5: Searching For a Few Good Features

Gland based tissue: Breast

Good

Ugly (in Situ)

Bad (invasive)

Page 6: Searching For a Few Good Features

Bone Tissue Images

Healthy (good)

Osteosarcoma (bad)

Fracture (ugly)

Page 7: Searching For a Few Good Features

Two related problems

• Feature Extraction
– Identify and compute attributes that characterize the information encoded in the histology images
– Need to quantify!

• Feature Selection
– Identify an optimal subset.

Page 8: Searching For a Few Good Features

Feature Selection

• Select a subset of the original features
– reduces the number of features (dimensionality reduction)
– removes irrelevant or redundant data (noise reduction)

• Benefits: speeding up a data mining algorithm, improving prediction accuracy

• It is a hard optimization problem!
• Optimal feature selection requires an exhaustive search of all possible feature subsets of the chosen cardinality.
– Too expensive

• In practice, ad hoc heuristics are used.

Page 9: Searching For a Few Good Features

Greedy Algorithms

• A local optimum is searched:
– evaluate a candidate subset of features
– modify the subset and evaluate it
– if the new subset is an improvement over the old, take it as the current subset
– else:
• if the algorithm is deterministic, reject the modification (e.g., hill climbing)
• else accept it with some probability (e.g., simulated annealing).
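A minimal sketch of this greedy loop. The `evaluate` function is hypothetical (e.g., cross-validated accuracy of a classifier restricted to the candidate subset); setting `temperature` to zero gives deterministic hill climbing, while a positive value gives simulated-annealing-style acceptance of worse moves.

```python
import math
import random

def greedy_select(n_features, evaluate, n_iter=200, temperature=0.0, seed=0):
    """Greedy feature-subset search.

    evaluate(subset) -> score (higher is better), e.g. cross-validated accuracy.
    temperature == 0 -> deterministic hill climbing; > 0 -> accept worse moves
    with probability exp(delta / temperature), as in simulated annealing.
    """
    rng = random.Random(seed)
    current = {rng.randrange(n_features)}           # start from one random feature
    current_score = evaluate(current)
    best, best_score = set(current), current_score
    for _ in range(n_iter):
        candidate = set(current)
        f = rng.randrange(n_features)               # modify the subset: toggle one feature
        candidate.symmetric_difference_update({f})
        if not candidate:
            continue
        score = evaluate(candidate)
        delta = score - current_score
        if delta > 0 or (temperature > 0 and rng.random() < math.exp(delta / temperature)):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = set(candidate), score
    return best, best_score
```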

Page 10: Searching For a Few Good Features

Methods (partial list)
• Exhaustive search: evaluate all $\binom{d}{m}$ possible subsets of size m drawn from the d features.
• Branch and Bound search: enumerates only a fraction of the subsets; can find the optimum, but the worst case is exponential.
• Best features (isolated): evaluate each feature in isolation; no guarantee of optimality.
• Sequential Forward Selection (SFS): start with the best single feature and add one at a time, with no backtracking (a sketch follows below).
• Sequential Backward Selection (SBS): start with all d features and eliminate one at a time; more expensive than SFS, and no backtracking either.
• Variants of SFS and SBS: e.g., start with the k best features and then delete r of them.
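As an illustration of sequential forward selection (no backtracking), here is a minimal sketch assuming a hypothetical `evaluate` function that scores a candidate subset, e.g. by cross-validation:

```python
def sequential_forward_selection(n_features, evaluate, k):
    """Grow a subset of size k greedily: add the single best feature at each step, never backtrack."""
    selected = []
    remaining = set(range(n_features))
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```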

Page 11: Searching For a Few Good Features

Types of Algorithms
• Supervised, unsupervised, and semi-supervised (embedded) feature selection algorithms
– e.g., principal component analysis (PCA) is an unsupervised feature extraction method: it finds a set of mutually orthogonal basis functions that capture the directions of maximum variance in the data (a small illustration follows this list).
• But these features may not be useful for discriminating between data in different classes.

• Wrappers (wrap the selection process around the learning algorithm), Filters (examine intrinsic properties of the data)

• Feature selection algorithms with filter and embedded models may return either a subset of selected features or the weights (measuring feature relevance) of all features.
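A minimal illustration of the PCA point above, using scikit-learn on placeholder data. The class labels play no role in the fit, which is exactly why the leading components need not be discriminative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))           # 200 samples, 20 features (placeholder data)

pca = PCA(n_components=5)                # keep 5 orthogonal directions of maximum variance
Z = pca.fit_transform(X)                 # projected data, shape (200, 5)
print(pca.explained_variance_ratio_)     # variance captured by each component
```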

Page 12: Searching For a Few Good Features

Relevance and redundancy

• A feature is statistically relevant if its removal from a feature set will reduce the prediction power.

• A feature may be redundant due to the existence of other relevant features that provide similar prediction power.

Page 13: Searching For a Few Good Features

Filter Model

• Filtering is independent of the induction algorithm
• It is a preprocessing step
• Example: the Relief method

Pipeline: all d features → subset selection → m < d selected features → induction algorithm

(Induction algorithms are algorithms that induce concept descriptions from examples, i.e., learning algorithms.)

Page 14: Searching For a Few Good Features

Relief Method
• It assigns relevance to features based on their ability to disambiguate similar samples
– Similarity is defined by proximity in feature space.
– Relevant features accumulate high positive weights, while irrelevant features retain near-zero weights.
– For each target sample,
• find the nearest sample of the same category in feature space, the "hit" sample;
• find the nearest sample of the other category, the "miss" sample.
– The relevance of feature f near the target sample is measured from its distances to the hit and the miss (a sketch follows below).

Source: K. Kira and L.A. Rendell
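A minimal sketch of the Relief weight update described above, for two classes with features scaled to [0, 1]; the quadratic hit/miss update is one common form of the Kira and Rendell rule, so treat the details as an assumption.

```python
import numpy as np

def relief(X, y, n_rounds=None, seed=0):
    """Relief feature weights for a two-class problem.

    X: (n_samples, n_features) array with features scaled to [0, 1].
    y: (n_samples,) array of 0/1 class labels.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    rounds = n if n_rounds is None else n_rounds
    for _ in range(rounds):
        i = rng.integers(n)                               # pick a target sample
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                                  # exclude the target itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))     # nearest same-class sample
        miss = np.argmin(np.where(other, dist, np.inf))   # nearest other-class sample
        w -= (X[i] - X[hit]) ** 2 / rounds                # relevant features stay close to the hit
        w += (X[i] - X[miss]) ** 2 / rounds               # and far from the miss
    return w
```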

Page 15: Searching For a Few Good Features

Other Filter Algorithms
• Laplacian Score: focuses on the local structure of the data space; computes a score reflecting each feature's locality-preserving power.
• SPEC: similar, but uses the normalized Laplacian matrix.
• Fisher Score: assigns the highest score to the feature on which the data points of different classes are farthest from each other (see the sketch after this list).
• Chi-square Score: tests whether the class label is independent of a particular feature.
• Minimum-Redundancy-Maximum-Relevance (mRMR): selects features that are mutually far away from each other while still having "high" correlation to the classification variable (an approximation to maximizing the dependency between the joint distribution of the selected features and the classification variable).
• Kruskal-Wallis: a non-parametric method based on ranks for comparing the population medians among groups.
• Information Gain: measures the dependence between the feature and the class label.

Source: Zhao et al http://featureselection.asu.edu
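For example, the Fisher score from the list above can be computed directly from per-class means and variances; a minimal numpy sketch (the epsilon guard is my addition):

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class scatter over within-class scatter."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2   # between-class scatter
        den += len(Xc) * Xc.var(axis=0)                # within-class scatter
    return num / (den + 1e-12)                         # epsilon guards against zero variance
```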

Page 16: Searching For a Few Good Features

Wrapper Model

Source: Zhao et al http://featureselection.asu.edu

Page 17: Searching For a Few Good Features

BLogReg: Gavin C. Cawley and Nicola L. C. Talbot. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization. Bioinformatics, 22(19):2348-2355, 2006.

CFS: Mark A. Hall and Lloyd A. Smith. Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper, 1999.

Chi-Square: H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In J.F. Vassilopoulos, editor, Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, November 5-8, 1995, pages 388-391, Herndon, Virginia, 1995. IEEE Computer Society.

FCBF: H. Liu and L. Yu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the Twentieth International Conference on Machine Learning (ICML-03), pages 856-863, Washington, D.C., 2003.

Fisher Score: R.O. Duda, P.E. Hart, and D.G. Stork. Pattern Classification. John Wiley & Sons, New York, 2nd edition, 2001.

Information Gain: T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.

Kruskal-Wallis: L. J. Wei. Asymptotic conservativeness and efficiency of the Kruskal-Wallis test for k dependent samples. Journal of the American Statistical Association, 76(376):1006-1009, December 1981.

mRMR: H. Peng, F. Long, and C. Ding. Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1226-1238, 2005.

Relief: K. Kira and L.A. Rendell. A practical approach to feature selection. In D. Sleeman and P. Edwards, editors, Proceedings of the Ninth International Conference on Machine Learning (ICML-92), pages 249-256. Morgan Kaufmann, 1992.

SBMLR: Gavin C. Cawley, Nicola L. C. Talbot, and Mark Girolami. Sparse multinomial logistic regression via Bayesian L1 regularisation. In NIPS, pages 209-216, 2006.

Spectrum: Huan Liu and Zheng Zhao. Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th International Conference on Machine Learning, 2007.

Source: Zhao et al http://featureselection.asu.edu

Page 18: Searching For a Few Good Features

Feature Space over Histology Images is Large

• Texture based
• Intensity based
• Graph theoretical
– Voronoi graphs
– Cell-graphs

Page 19: Searching For a Few Good Features

Voronoi Graphs and their Features

• Minimum Spanning tree and its properties

Page 20: Searching For a Few Good Features

Cell-Graphs

Represent the tissue as a graph:
– A node of the graph represents a cell or a cell cluster
– An edge of the graph represents a relation between a pair of nodes (e.g., spatial, ECM)
– A generalization of Voronoi graphs

(a) Healthy (b) Damaged (c) Cancerous
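A minimal sketch of cell-graph construction under one simple convention: nodes are cell centroids (e.g., from nuclei segmentation) and an edge links two cells whose centroids are closer than a distance threshold. The published cell-graph construction may use a different (e.g., probabilistic) edge rule; this is only an illustration.

```python
import numpy as np
import networkx as nx

def build_cell_graph(centroids, radius=30.0):
    """Build a cell-graph: nodes are cell centroids, edges link cells within `radius`.

    centroids: (n_cells, 2) array of (x, y) positions.
    """
    G = nx.Graph()
    G.add_nodes_from(range(len(centroids)))
    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if np.linalg.norm(centroids[i] - centroids[j]) <= radius:
                G.add_edge(i, j)
    return G
```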

Page 21: Searching For a Few Good Features

What do we gain from Cell-graphs?
• A mathematical representation
– We can operate on them using
• (multi)linear algebra
• algorithms
• We can quantify their structural properties with mathematically well-defined graph metrics.
• Subgraph mining
– Descriptor subgraphs
– Subgraph search in a large graph
– Subgraph kernels

Adjacency matrix:

$$A(u,v) = \begin{cases} 1 & \text{if } u \text{ and } v \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$

Normalized Laplacian:

$$L(u,v) = \begin{cases} 1 & \text{if } u = v \text{ and } d_v \neq 0 \\ -\dfrac{1}{\sqrt{d_u d_v}} & \text{if } u \text{ and } v \text{ are adjacent} \\ 0 & \text{otherwise} \end{cases}$$
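A minimal sketch of these two matrices and a few spectral quantities derived from them, following the definitions above and using numpy/networkx; the particular features returned are illustrative, not the exact set used in the study.

```python
import numpy as np
import networkx as nx

def spectral_features(G):
    """Spectra of the adjacency matrix and the normalized Laplacian of a cell-graph."""
    A = nx.to_numpy_array(G)                           # adjacency matrix A(u, v)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.where(deg > 0, deg, 1.0)), 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    L[np.diag_indices_from(L)] = np.where(deg > 0, 1.0, 0.0)   # isolated nodes get 0 on the diagonal

    eig_A = np.sort(np.linalg.eigvalsh(A))             # adjacency spectrum (ascending)
    eig_L = np.sort(np.linalg.eigvalsh(L))             # normalized-Laplacian spectrum
    return {
        "spectral_radius": float(np.max(np.abs(eig_A))),
        "second_largest_adjacency": float(eig_A[-2]),
        "trace_adjacency": float(np.trace(A)),
        "num_zero_laplacian_eigenvalues": int(np.sum(np.isclose(eig_L, 0.0))),
    }
```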

Page 22: Searching For a Few Good Features

Cell-graph Features
• Local: cell-level
– Graph theoretical: e.g., degree, clustering coefficient
– Morphological: e.g., shape
• Global: tissue-level
– Graph theoretical
– Spectral

Page 23: Searching For a Few Good Features

# of Nodes: Number of cells.

# of Edges: Number of links between cells.

Average Degree: Average number of "neighboring" cells, computed over all the nodes in a cell-graph.

Giant Connected Ratio: Number of nodes in the largest connected component divided by the total number of nodes in the graph.

Clustering Coefficient: C_i = 2E_i / (k_i(k_i - 1)), where k_i is the number of neighbors of node i and E_i is the number of existing links between i and its neighbors. Nodes with degree 1 are excluded (Dorogovtsev and Mendes, 2002).

% of Isolated Points (Pnts): Percentage of nodes that have no edges incident to them.

% of End Pnts: Percentage of nodes that have exactly one edge incident to them.

# of Central Pnts: A node i is a central point of a graph if its eccentricity equals the minimum eccentricity (i.e., the graph radius). The set of all central points is called the graph center; the cardinality of this set is the value of this metric.

Eccentricity / Closeness: Given the shortest path lengths between a node i and all of the nodes reachable from it, the eccentricity and the closeness of node i are defined as the maximum and the average of these shortest path lengths, respectively.

Spectral Radius: Maximum absolute value of the eigenvalues in the spectrum of a graph (the set of graph eigenvalues).

2nd Eigenvalue: Second largest eigenvalue in the graph spectrum.

Eigen Exponent: The slope of the sorted eigenvalues as a function of their order, on a log-log scale.

Trace: Sum of the eigenvalues.

Triangles: Cliques of 3 nodes.

Cliques: A (sub)graph in which every pair of nodes is connected by a distinct edge.

Subgraph Density: A bound on the clustering coefficient of a subgraph (e.g., at least 0.9).

Bipartite Cliques: A complete bipartite (sub)graph: all possible edges between the two node sets are present.

Rich Set of Features for Description and Classification
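A minimal sketch of a few of the global (tissue-level) metrics from the table above, computed with networkx; the dictionary keys are illustrative and not the exact feature names used in the study.

```python
import networkx as nx

def global_graph_features(G):
    """A handful of the tissue-level cell-graph metrics listed above."""
    n = G.number_of_nodes()
    degrees = dict(G.degree())
    giant = max(nx.connected_components(G), key=len)
    # clustering coefficient, excluding degree-1 nodes as in the table above
    cc_nodes = [v for v, d in degrees.items() if d >= 2]
    ecc = nx.eccentricity(G.subgraph(giant))   # eccentricity is defined per connected component
    return {
        "num_nodes": n,
        "num_edges": G.number_of_edges(),
        "average_degree": sum(degrees.values()) / n,
        "giant_connected_ratio": len(giant) / n,
        "clustering_coefficient": nx.average_clustering(G, nodes=cc_nodes),
        "pct_isolated_points": 100.0 * sum(d == 0 for d in degrees.values()) / n,
        "pct_end_points": 100.0 * sum(d == 1 for d in degrees.values()) / n,
        "max_eccentricity": max(ecc.values()),
    }
```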

Page 24: Searching For a Few Good Features

Cell-graph Feature Selection

• Pairwise correlation of features

Goal: to find a set of features which are pairwise independent.

• Discriminative power

Goal: to find a smaller subset of features that is as expressive as the full feature set.

Page 25: Searching For a Few Good Features

Pairwise Correlation Graph

• The correlations between the graph features themselves can be represented as a correlation graph.

• The correlation graph is obtained by the procedure below (a sketch follows).
– Calculate the n×n correlation matrix for the n features to obtain the pairwise correlation coefficients (n = 20 in this case).
– Create a node for each feature, laid out in a circle.
– Set a threshold on correlation and establish an edge between two feature nodes if |correlation coefficient| ≥ threshold (threshold = 0.9 in this case).
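A minimal sketch of the procedure above, assuming a feature matrix `F` of shape (n_samples, n_features) and a list of feature names (circular layout is left to plotting):

```python
import numpy as np
import networkx as nx

def correlation_graph(F, feature_names, threshold=0.9):
    """Connect two features whenever |pairwise correlation coefficient| >= threshold."""
    R = np.corrcoef(F, rowvar=False)        # n_features x n_features correlation matrix
    G = nx.Graph()
    G.add_nodes_from(feature_names)
    n = len(feature_names)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(R[i, j]) >= threshold:
                G.add_edge(feature_names[i], feature_names[j], weight=float(R[i, j]))
    return G
```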

Page 26: Searching For a Few Good Features

Correlation Graphs for Healthy Tissue: Breast, Brain, Bone

Page 27: Searching For a Few Good Features

Correlation Graphs for Cancerous Tissue: Breast, Brain, Bone

Page 28: Searching For a Few Good Features

Observations on Correlation Graphs

• The correlation graphs differ greatly depending on tissue type and (dys)functional status.

• The complexity of the correlation graph (number of edges) depends on the tissue type and tissue status.
– Some features can show cluster structures in some cases (e.g., node number, edge number, and average degree in healthy breast tissue),
– but such a cluster structure is not present in all cases (e.g., cancerous brain tissue).

• The features are highly correlated.

Page 29: Searching For a Few Good Features

Interpretation

• Strong correlation means high dependency between the features, which leads to a complex joint probability density function. Any probabilistic/statistical modeling attempt should account for this complexity.

• An uncorrelated feature is not necessarily a distinguishing one; it might not be discriminative for classification.

• The high correlation may indicate that a smaller subset of features might be enough to discriminate the classes – but not always

Page 30: Searching For a Few Good Features

Feature Selection: good, bad, and ugly

Breast – Average Degree Brain – Average Degree

Page 31: Searching For a Few Good Features

Feature Selection - cont

Breast – End Point Percentage Brain – End Point Percentage

Page 32: Searching For a Few Good Features

Feature Selection

Need a few good features!

Two-phase approach:
– Find the best classifier (MLP)
– Determine the features

Page 33: Searching For a Few Good Features

Feature Selection
• The data is not linearly separable. Also, as expected, the features show different distributions in each tissue type.
• 10-fold cross-validation results (accuracy percentages) for breast tissue are obtained with all 20 existing cell-graph features, to see which classifier is most successful in classifying the data, using
– AdaBoost (30 C4.5 trees),
– k-NN (k = 5),
– MLP (1 hidden layer, 12 hidden units, backpropagation).
• These classifiers are used because they are good at separating non-linearly distributed data and they come from different families of classification algorithms (a sketch of the comparison follows below).
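A minimal sketch of this comparison using scikit-learn stand-ins for the three classifiers; scikit-learn has no C4.5, so a default decision-stump-based AdaBoost stands in, and the exact settings in the study may differ.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def compare_classifiers(X, y):
    """10-fold cross-validated accuracy for the three classifier families."""
    models = {
        "AdaBoost (30 trees)": AdaBoostClassifier(n_estimators=30),
        "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
        "MLP (12 hidden units)": MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000),
    }
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=10)
        print(f"{name}: {100 * scores.mean():.2f} +/- {100 * scores.std():.2f}")
```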

Page 34: Searching For a Few Good Features

Feature Selection – next step

• The classification problem is reduced to 2-class problems (healthy vs. cancerous, healthy vs. damaged, damaged vs. cancerous).

• Number of edges and number of nodes are excluded. This exclusion also decreases the runtime of the selection.

Page 35: Searching For a Few Good Features

Details

• An exhaustive search over the 18 features is done using MLP. Since MLP gave the highest accuracy rate with all features, it is intuitively expected to show higher accuracy than the other classifiers during subset selection.

• The procedure is described below (a sketch follows).
– Start with an empty selected feature subset with 0 accuracy percentage (as in the sequential forward selection algorithm).
– Repeat the steps below for every possible feature subset (2^18 of them):
• Train the classifier and validate its accuracy with 10-fold cross-validation.
• If the average 10-fold CV accuracy percentage of the current subset is higher than that of the selected feature subset, make the current subset the selected feature subset.
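A minimal sketch of that exhaustive procedure with scikit-learn; the MLP settings follow the earlier slide, but X, y and the runtime budget are assumptions (evaluating 2^18 subsets with 10-fold CV is expensive).

```python
from itertools import combinations
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def exhaustive_feature_search(X, y, cv=10):
    """Evaluate every non-empty feature subset; keep the one with the best mean CV accuracy."""
    n_features = X.shape[1]
    best_subset, best_score = (), 0.0          # start from the empty subset, 0% accuracy
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            cols = list(subset)
            clf = MLPClassifier(hidden_layer_sizes=(12,), max_iter=2000)
            score = cross_val_score(clf, X[:, cols], y, cv=cv).mean()
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score
```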

Page 36: Searching For a Few Good Features

MLP + Exhaustive Search Results on Breast Cancer

• The results for the breast data are given below (no normalization).

Comparison | Features Selected | Accuracy Percentage

Healthy vs. Cancer | Clustering Coefficients; Max and Min Eccentricity; Perc. of Isolated Points; Perc. of End Points; Perc. of Central Points | 84.71 ± 2.7

Healthy vs. Damaged | Average Degree; Excluding Clustering Coeff.; Max Eccentricity; 90% Effective Hop Diameter; Perc. of Isolated Points | 80.52 ± 4.5

Damaged vs. Cancer | 15 features out of 18 | 70.59 ± 3.64

Page 37: Searching For a Few Good Features

Cell-graph Feature Selection with the Relief Method
1. Average degree
2. Average clustering coefficient
3. Average eccentricity
4. Maximum eccentricity
5. Minimum eccentricity
6. Average effective eccentricity
7. Maximum effective eccentricity
8. Minimum effective eccentricity
9. Average path length (closeness)
10. Giant connected ratio
11. Percentage of isolated points
12. Percentage of end points
13. Number of central points
14. Percentage of central points
15. Number of nodes
16. Number of edges
17. Spectral radius
18. Second largest eigenvalue
19. Trace
20. Energy
21. Number of eigenvalues

Page 38: Searching For a Few Good Features

Relief based Cell-graph Feature Selection Result

Page 39: Searching For a Few Good Features

Selected Features for Different Normalization

Page 40: Searching For a Few Good Features

Modeling Branching Morphogenesis

Page 41: Searching For a Few Good Features

Problem Definition

• Treated with ROCK (Rho-associated coiled-coil kinase), which regulates branching morphogenesis

• Untreated

• Can we quantify the organizing principles and distinguish between different states of the branching process?

Page 42: Searching For a Few Good Features

Even a Richer Set of Features

1 Average_degree 2 C 3 C2 4 D 5 Average_eccentricity 6 Maximum_eccentricity_(diameter) 7 Minimum_eccentricity_(radius) 8 Average_eccentricity_90 9 Maximum_eccentricity_90 10 Minimum_eccentricity_90 11 Average_path_length_(closeness) 12 Giant_connected_ratio 13 Number_of_Connected_Components 14 Percentage_of_isolated_points 15 Percentage_of_end_points 16 Number_of_central_points 17 Percentage_of_central_points 18 Number_of_nodes 19 Number_of_edges

20 elongation_ 21 area 22 orientation 23 eccentricity 24 perimeter 25 circularity_ 26 solidity

27 largest_eigen_adjacency_ 28 second_largest_adjacency 29 trace_adjacency_ 30 energy_adjacency 31 #of_zeros_normalized_laplacian 32 slope_0-1_normalized_laplacian 33 #of_ones_normalized_laplacian 34 slope_1-2_normalized_laplacian 35 #of_twos_normalized_laplacian 36 trace_laplacian 37 energy_laplacian

38 degree_cluster_1 39 degree_cluster_2 40 degree_cluster_3 41 clustering_coefficient_C_cluster_1 42 clustering_coefficient_C_cluster_2 43 clustering_coefficient_C_cluster_3 44 clustering_coefficient_D_cluster_1 45 clustering_coefficient_D_cluster_2 46 clustering_coefficient_D_cluster_3 47 eccentricity_cluster_1 48 eccentricity_cluster_2 49 eccentricity_cluster_3 50 effective_eccentricity_cluster_1_ 51 effective_eccentricity_cluster_2 52 effective_eccentricity_cluster_3 53 closeness_cluster_1 54 closeness_cluster_2 55 closeness_cluster_3

Page 43: Searching For a Few Good Features

Classifier Comparison
• Since MLP has a higher overall accuracy, it is used in the later feature selection studies.

               Adaboost        k-nn            MLP
Overall (%)    67.3 ± 3.27     68.13 ± 1.29    73.24 ± 1.94
Inflamed (%)   57.96 ± 9.53    65.93 ± 2.65    54.07 ± 6.28
Healthy (%)    73.82 ± 5.23    75 ± 1.70       78.38 ± 2.78
Cancerous (%)  67.82 ± 7.5     65.21 ± 1.78    78.99 ± 2.63

Page 44: Searching For a Few Good Features

Epithelial vs Mesenchymal comparison in treated tissue samples

Feature Selection Algorithm Best CV rate

SVM No Feature Selection 100

Fscore selection: select features: 7,10,26,44,45 100

CfsSubsetEval: 7,10,14,15,16,21,25,26,43,44,45 100

ConsistencySubsetEval: 10,14 95.24

ReliefFAttributeEval: 26,7 100

SymmetricalUncertAttributeEval: 14,44 100

SVD Based: 12,20,22,23,26,41,42,44,49,52,54,55 95.238

Page 45: Searching For a Few Good Features

Epithelial vs Mesenchymal comparison in untreated tissue samples

Feature Selection Algorithm Best CV rate

SVM No Feature Selection 97.619

Fscore selection 7,26,35 97.619

CfsSubsetEval: 6,9,14,15,25,26,43 95.2381

ConsistencySubsetEval: 14,21,25 97.619

ReliefFAttributeEval: 7,26,35 97.619

SymmetricalUncertAttributeEval: 6,7,9,15,25,26,44 97.619

SVD Based: 2,4,12,20,22,26,27,28,41,49,52,55 88.0952

Page 46: Searching For a Few Good Features

Treated mesenchymal vs untreated mesenchymal comparison

Feature Selection Algorithm Best CV rate

SVM No Feature Selection 80.95

Fscore selection: select features: 3,4,21,24,26,27,39,45 80.95

CfsSubsetEval: 24 76.1905

ConsistencySubsetEval: 24 76.1905

ReliefFAttributeEval: 21,24,3,39,45,26,2,27,4,35,33,28,42 90.4762

SymmetricalUncertAttributeEval: 24,18,20,19,16,15,17,25,27,26,21 76.1905

SVD Based: 12,20,23,24,26,41,44,45,49,52,55 69.0476

Page 47: Searching For a Few Good Features

Treated epithelial vs untreated epithelial comparison

Feature Selection Algorithm Best CV rate

SVM No Feature Selection 83.33

Fscore selection: 3 88.09

CfsSubsetEval: 3,44,45 88.09

ConsistencySubsetEval: 3,44 85.71

ReliefFAttributeEval: 3,44,45,46,49 85.71

SymmetricalUncertAttributeEval: 3,44,45 88.09

SVD Based: 1,2,12,20,41,45,46,49,52,53,55 76.1905

Page 48: Searching For a Few Good Features

Concluding Remarks

• Feature extraction and selection are strongly coupled for accuracy
– there is always room for new features

• Feature selection performance depends on the induction algorithm (i.e., learning algorithm)

• Quantifiable features are not always interpretable: mapping the features to biology or pathology is the crucial link!

Page 49: Searching For a Few Good Features

Thank you!

