
Stable Feature Selection: Theory and Algorithms

Transcript

Variance Reduction for Stable Feature Selection

Stable Feature Selection: Theory and Algorithms
Presenter: Yue Han
Advisor: Lei Yu

Ph.D. Dissertation
4/26/2012

Hello, everyone! Welcome to my dissertation defense. Let's get started! The topic of my dissertation is stable feature selection: theory and algorithms.

Outline
Introduction and Motivation
Background and Related Work
Major Contributions
Publications
Theoretical Framework for Stable Feature Selection
Empirical Framework: Margin-Based Instance Weighting
Empirical Study
  General Experimental Setup
  Experiments on Synthetic Data
  Experiments on Real-World Data
Conclusion and Future Work

Feature Selection Applications

Gene Selection / Pixel Selection

Word Selection (topic categories such as Sports, Travel, Politics, Tech, Science, Business, Health, Elections)

Feature selection is not only a preprocessing step to prepare data for mining tasks, but also a knowledge discovery tool to extract valuable information from data. Biologists are interested in a subset of genes that explains an observed phenomenon (disease types or symptoms). Researchers in computer graphics are interested in a set of expressive pixels that captures human facial expressions. Natural language processing engineers are interested in a set of representative terms or words that supports better understanding of a document.

Feature Selection from High-dimensional Data

Feature Selection:
  Alleviating the effect of the curse of dimensionality
  Enhancing generalization capability
  Speeding up the learning process
  Improving model interpretability

p: # of features, n: # of samples
High-dimensional data: p >> n

Curse of Dimensionality:
  Effects on distance functions
  In optimization and learning
  In Bayesian statistics

Knowledge Discovery on High-dimensional Data

From the examples, we can observe that the number of features is huge compared to the number of samples. Conventional learning approaches lose their effectiveness when applied directly to high-dimensional data because of the curse of dimensionality. Feature selection is therefore used to reduce the dimensionality of the data before learning.

Stability of Feature Selection

Stability of Feature Selection: the insensitivity of the result of a feature selection algorithm to variations in the training set.

(Diagram: several variations of the training data, each fed to the same learning algorithm, producing a learning model.)

The stability of learning algorithms was first examined by Turney in 1995. The stability of feature selection was relatively neglected before, and has recently attracted interest from researchers in data mining.
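As a concrete illustration of selecting features from data with p >> n, here is a minimal sketch of a univariate filter selector. This is our own toy example, not a method from the dissertation; the data, parameters, and variable names are all assumptions made for illustration.

```python
import numpy as np

# Synthetic high-dimensional data: far more features (p) than samples (n).
rng = np.random.default_rng(0)
n, p = 200, 2000
X = rng.standard_normal((n, p))
# Only features 0 and 1 actually drive the class label.
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n) > 0).astype(float)

# Score every feature by its absolute Pearson correlation with the label,
# then keep the top-k. This is a simple univariate "filter" approach.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

k = 10
selected = np.argsort(scores)[::-1][:k]  # indices of the k best-scoring features
```

With this setup the two truly informative features score far above the noise features, so a downstream learner can work with 10 features instead of 2000.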

(Diagram: several variations of the training data, each fed to the same feature selection method, producing a feature subset. Are the resulting subsets consistent or not?)

Stability Issue of Feature Selection

As we introduced, applying a feature selection method to the training data yields a feature subset. If we have variations of the training data (all drawn from the same data space), are the feature selection results consistent or not? This is the stability of feature selection, as defined above. While the stability of learning algorithms has been studied, the stability of feature selection is unfortunately under-addressed.

Motivation for Stable Feature Selection

Given Unlimited Sample Size: feature selection results from D1 and D2 are the same.
Given Limited Sample Size (n << p):
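The consistency question above can be quantified. A minimal sketch (our own illustration with hypothetical function names, not the dissertation's algorithm): run the same selector on bootstrap resamples of the training data and report the average pairwise Jaccard similarity of the selected subsets. Values near 1 indicate a stable selector; values near 0 indicate that the selected features change drastically with small variations of the data.

```python
import numpy as np
from itertools import combinations

def select_top_k(X, y, k):
    """Filter selector: top-k features by |Pearson correlation| with the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (
        np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
    )
    return set(np.argsort(scores)[::-1][:k])

def jaccard_stability(X, y, k, n_rounds=10, seed=0):
    """Average pairwise Jaccard similarity of subsets picked on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subsets = []
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n, replace=True)  # bootstrap resample
        subsets.append(select_top_k(X[idx], y[idx], k))
    sims = [len(a & b) / len(a | b) for a, b in combinations(subsets, 2)]
    return float(np.mean(sims))
```

For data with one strongly relevant feature, that feature appears in every bootstrap's subset, which bounds the average Jaccard score away from zero; for pure-noise labels the subsets barely overlap and the score collapses.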

