Feature Selection Pattern Recognition

Date post: 19-Oct-2015
Upload: rischan-mafrur
Description:
Feature Selection, Normalization, Pattern Recognition

Transcript
  • Feature Selection, by Rischan Mafrur

  • Techniques for Feature Selection: Outlier Removal, Data Normalization, t-Test, the Receiver Operating Characteristic (ROC) Curve, Fisher's Discriminant Ratio

  • Outlier Removal

  • Learning by Example: Problem in Example 4.2.1 [page: 107]

    We have N = 100 random data points drawn from a 1-dimensional Gaussian distribution with mean 1 and variance 0.16.

    Add five outlier points: [6.2, -6.4, 4.2, 15, 6.8].

    How can we remove the outliers?

    Steps: generate the data set, add the outlier values, scramble the data, then find the outliers and their indices.

  • Cont.: Result

    Now we can identify the values and the positions of the outliers.
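The slides do not state which detection rule is used, so the following is only a minimal sketch of the four steps, assuming a robust rule: a point is flagged when it lies more than `threshold` robust standard deviations (1.4826 times the median absolute deviation) from the median. The function name `find_outliers`, the threshold of 5, and the seed are illustrative assumptions, not part of the original example.

```python
import random
import statistics


def find_outliers(data, threshold=5.0):
    """Return (index, value) pairs lying far from the median.

    Assumed rule (not from the slides): a point is an outlier if its distance
    from the median exceeds `threshold` times the robust sigma, where the
    robust sigma is 1.4826 * MAD (median absolute deviation).
    """
    med = statistics.median(data)
    mad = statistics.median([abs(x - med) for x in data])
    robust_sigma = 1.4826 * mad
    return [(i, x) for i, x in enumerate(data)
            if abs(x - med) > threshold * robust_sigma]


rng = random.Random(0)                              # fixed seed, illustrative
# Step 1: generate 100 points, mean 1, variance 0.16 (std 0.4)
data = [rng.gauss(1.0, 0.4) for _ in range(100)]
# Step 2: add the five outlier values from the example
outliers = [6.2, -6.4, 4.2, 15, 6.8]
data += outliers
# Step 3: scramble the data
rng.shuffle(data)
# Step 4: find the outliers and their indices
found = find_outliers(data)
for index, value in found:
    print(f"index {index}: value {value}")
```

A median-based rule is used here because the outliers themselves inflate the sample mean and standard deviation, which can hide the smaller outliers (such as 4.2) from a plain mean-and-standard-deviation rule.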

  • Data Normalization

  • 3 Normalization Methods

    By Standard Deviation

    By Min-Max Value Range

    By Softmax Normalization
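As a rough illustration of the three methods, here is a minimal sketch. The function names, the sample data, and the default parameters are assumptions made for this example; they are not the MATLAB functions shown on the slides. Softmax scaling is taken here to mean y = (x - mean) / (r * std) followed by 1 / (1 + exp(-y)).

```python
import math
import statistics


def normalize_std(xs):
    """Scale to zero mean and unit (sample) standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    return [(x - mu) / sigma for x in xs]


def normalize_min_max(xs, lo=-1.0, hi=1.0):
    """Map the smallest value to `lo` and the largest value to `hi`."""
    x_min, x_max = min(xs), max(xs)
    return [lo + (hi - lo) * (x - x_min) / (x_max - x_min) for x in xs]


def normalize_softmax(xs, r=0.5):
    """Squash values into (0, 1) with a logistic function.

    `r` controls how wide the near-linear region around the mean is.
    """
    mu = statistics.fmean(xs)
    sigma = statistics.stdev(xs)
    return [1.0 / (1.0 + math.exp(-(x - mu) / (r * sigma))) for x in xs]


# Hypothetical data, only for illustration
sample = [2.0, 4.0, 6.0, 8.0, 10.0]
by_std = normalize_std(sample)
by_min_max = normalize_min_max(sample)
by_softmax = normalize_softmax(sample)
print(by_std)
print(by_min_max)
print(by_softmax)
```

Min-max scaling is sensitive to outliers because a single extreme value fixes one end of the range, whereas softmax scaling compresses extreme values toward 0 or 1.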

  • Example 4.3.1 [page: 109]

    Given the data, the problem is how to normalize it.

  • Matlab code solution

  • NormalizeStd Function

  • NormalizeMnMx Function

  • NormalizeSoftMax Function

  • Result

    Original Data

    by Std

    by Min Max [-1,1]

    by SoftMax [0.5]

  • t-TEST

  • Learning by Example: Problem in Example 4.4.1 [page: 112]

    We assume the data set is normally distributed.

    We have 2 Gaussian classes with means m1 = 8.75 and m2 = 9, and variance 4.

    Generate the vectors x1 and x2, each containing N = 1000 points. Assume that we do not know the means or the variance and only have the vectors x1 and x2. We want to test whether the means of the two data sets are equal, using significance levels of 5% (confidence level 95%) and 0.1% (confidence level 99.9%).

  • Cont.: In the t-test we have two hypotheses:

    H0 : The mean values of the data in the two classes are equal.

    H1 : The mean values of the data in the two classes are not equal.

    In this case, at the 5% significance level the result is h = 1, which implies that the hypothesis of equal means can be rejected. At the 0.1% significance level the result is h = 0, which implies that there is no evidence to reject the hypothesis of equal means.

    We can conclude: with m1 = 8.75 and m2 = 9, the test at the 5% significance level indicates that the means of the two classes are not equal, whereas at the 0.1% significance level the hypothesis of equal means cannot be rejected. Note that failing to reject H0 does not prove that the means are equal.
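The decision rule can be sketched as follows. This is a minimal sketch using only the Python standard library, not the MATLAB code from the slides: it computes the pooled-variance two-sample t statistic and, as a simplifying assumption, compares it with a critical value from the standard normal distribution, which is a close approximation to the t distribution for N = 1000 per class. The function name `two_sample_t_test` and the seed are illustrative.

```python
import math
import random
import statistics


def two_sample_t_test(x1, x2, alpha):
    """Two-sided test of H0: equal means, assuming equal variances.

    Returns (h, t): h = 1 if H0 is rejected at significance level `alpha`,
    otherwise h = 0; t is the pooled-variance t statistic.
    The critical value uses a normal approximation (large samples assumed).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    v1, v2 = statistics.variance(x1), statistics.variance(x2)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    critical = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return (1 if abs(t) > critical else 0), t


rng = random.Random(0)                                # illustrative seed
x1 = [rng.gauss(8.75, 2.0) for _ in range(1000)]      # variance 4 -> std 2
x2 = [rng.gauss(9.0, 2.0) for _ in range(1000)]

h_5_percent, t_value = two_sample_t_test(x1, x2, alpha=0.05)
h_01_percent, _ = two_sample_t_test(x1, x2, alpha=0.001)
print("t =", t_value)
print("h at 5%:", h_5_percent, " h at 0.1%:", h_01_percent)
```

Because the two means are close relative to the spread, the outcome depends on the particular random draw; a stricter significance level can never reject H0 when a looser one does not.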

  • ROC: Receiver Operating Characteristic

    ROC is a measure of the class-discrimination capability of a specific feature.

    It measures the overlap between the pdfs describing the data distribution of the feature in two classes [Theo 09, Section 5.5].

  • Learning by Example: Problem in Example 4.5.1 [page: 113]

    We have two 1-dimensional Gaussian classes with means m1 = 2 and m2 = 0.

    Plot the histograms of the two classes using plotHist, then compute and plot the corresponding AUC values using the function ROC.

    We can also try different mean values: [m1,m2] = [0,0], [2,2], [5,5], [2,0], [5,0].
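To see how the overlap between the two classes affects the result, here is a minimal sketch that estimates the area under the ROC curve directly from samples. It is not the ROC function used on the slides: it uses the standard convention in which AUC is 0.5 for fully overlapping classes and 1.0 for perfectly separated ones, which may differ from the scaling of the slides' function. Unit variance, 200 samples per class, the seed, and the function name `auc` are assumptions.

```python
import random


def auc(class1, class2):
    """Estimate the area under the ROC curve for a single feature.

    Equals the fraction of (class1, class2) pairs in which the class1 value
    is larger, with ties counted as half a pair.
    """
    wins = sum((a > b) + 0.5 * (a == b) for a in class1 for b in class2)
    return wins / (len(class1) * len(class2))


rng = random.Random(0)                     # illustrative seed
n = 200                                    # samples per class (assumed)
results = {}
for m1, m2 in [(0, 0), (2, 2), (5, 5), (2, 0), (5, 0)]:
    x1 = [rng.gauss(m1, 1.0) for _ in range(n)]   # unit variance assumed
    x2 = [rng.gauss(m2, 1.0) for _ in range(n)]
    results[(m1, m2)] = auc(x1, x2)
    print(f"[m1,m2] = [{m1},{m2}]  AUC = {results[(m1, m2)]:.3f}")
```

When the means coincide the classes overlap completely and the AUC stays near 0.5; as the means move apart the AUC approaches 1.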

  • ROC Curve

  • AUC value

  • Plots: histograms (plotHist) and ROC curves for [m1,m2] = [0,0], [2,2], and [5,5]

  • Plots: histograms (plotHist) and ROC curves for [m1,m2] = [2,0] and [5,0]

  • Fisher's Discriminant Ratio

  • FDR: The FDR is commonly used to quantify the discriminatory power of individual features between two classes.

  • Learning by Example

    Problem in Example 4.6.2 [page: 115]: In this case, we have the data given in Table 4.3. There are 2 classes, Cirrhotic Liver and Fatty Liver, each described by 4 features (mean, std, skew, and kurtosis).

    The problem is to choose the most informative feature.

    We can use the FDR to select the feature that is most informative.

  • Result: According to the result, the feature with the highest FDR value is the mean, with FDR = 13.8893, so the most informative feature is the mean.
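The ranking step can be sketched as follows, using the usual two-class definition FDR = (m1 - m2)^2 / (s1^2 + s2^2), where m and s^2 are the per-class mean and variance of one feature. Table 4.3 is not reproduced in this transcript, so the values below are made up purely for illustration, and the names `fdr`, `feature_a`, and `feature_b` are hypothetical.

```python
import statistics


def fdr(class1, class2):
    """Fisher's discriminant ratio of one feature for two classes."""
    m1, m2 = statistics.fmean(class1), statistics.fmean(class2)
    v1, v2 = statistics.variance(class1), statistics.variance(class2)
    return (m1 - m2) ** 2 / (v1 + v2)


# Hypothetical feature values (NOT the data from Table 4.3):
# each entry maps a feature name to (values in class 1, values in class 2).
features = {
    "feature_a": ([1.0, 2.0, 3.0], [7.0, 8.0, 9.0]),   # well separated
    "feature_b": ([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]),   # heavily overlapping
}

scores = {name: fdr(c1, c2) for name, (c1, c2) in features.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: FDR = {score:.4f}")
print("most informative feature:", best)
```

A large FDR means the class means are far apart relative to the within-class spread, which is why the feature with the highest value is chosen.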

  • Thank you :)

