Date post: | 22-Feb-2017 |
Category: |
Business |
Upload: | hansa-khan |
WHAT IS WEKA? Weka stands for Waikato Environment for
Knowledge Analysis. Weka is a collection of machine learning algorithms for data mining tasks.
It contains tools for data pre-processing, classification, regression and clustering.
HOW TO START WEKA
From the Windows desktop: click Start, choose All Programs,
then choose Weka 3-7 to start Weka. The first interface window
then appears.
EXPLORER The Explorer is used for pre-processing,
attribute selection, learning and visualization.
Selecting Explorer opens the Explorer environment.
Next, I click Open file to open a data file from the folder where the data files are stored.
Then I select my dataset, “CONTACT LENSES”.
Every instance consists of a number of attributes.
CHOOSE FILTER First we choose a filter. There are two kinds of filter: supervised and unsupervised. We selected the unsupervised filters, which offer two options: instance and attribute. We selected attribute. There are many attribute filters; the one we chose
is NominalToBinary.
The first classifier is a simple one, ZeroR. It determines the most common class, or the mean in the case of a numeric class. It tests how well the class can be predicted without considering the other attributes.
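The idea behind ZeroR can be sketched in a few lines of Python. This is an illustration only, not Weka's implementation: the function name `zero_r` is hypothetical, and the class counts (5 soft, 4 hard, 15 none) are an assumption taken from the row sums quoted in the worked figures of this example.

```python
from collections import Counter

def zero_r(train_labels):
    """Return the most common class label; all other attributes are ignored."""
    return Counter(train_labels).most_common(1)[0][0]

# Assumed class counts, read off the row sums used in this example:
# 5 soft, 4 hard, 15 none (24 instances in total).
labels = ["soft"] * 5 + ["hard"] * 4 + ["none"] * 15

prediction = zero_r(labels)
# Accuracy obtained by always predicting the majority class.
baseline_accuracy = labels.count(prediction) / len(labels)
print(prediction, round(baseline_accuracy, 3))
```

Any other classifier should do better than this baseline accuracy to be worth using.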
THERE ARE FOUR TEST OPTIONS
Use training set: The classifier is evaluated on how well it predicts the
class of the instances it was trained on.
Supplied test set: The classifier is evaluated on how well it
predicts the class of a set of instances loaded from a file. Clicking the Set... button brings up a dialog allowing you to choose the file to test on.
Percentage split: The classifier is evaluated on how well it
predicts a certain percentage of the data which is held out for testing. The amount of data held out depends on the value entered in the % field.
Cross-validation (CV): The classifier is evaluated by cross-validation,
using the number of folds that are entered in the Folds text field.
Having 10 folds means that 90% of the full data is used for training (and 10% for testing) in each fold.
Cross-validation produces a fair estimate of test performance.
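The 90%/10% arithmetic of 10-fold cross-validation can be checked with a small sketch. The helper `k_fold_indices` and the figure of 100 instances are hypothetical, chosen only to make the percentages easy to read. Weka's own splitting also shuffles the data, which this sketch does not do.

```python
def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists for k folds; instance i goes to fold i % k."""
    for fold in range(k):
        test = [i for i in range(n_instances) if i % k == fold]
        train = [i for i in range(n_instances) if i % k != fold]
        yield train, test

# Hypothetical data size: 100 instances, 10 folds.
n = 100
splits = list(k_fold_indices(n, 10))

for fold_number, (train, test) in enumerate(splits, start=1):
    # Each fold trains on 90 instances and tests on the remaining 10.
    print(fold_number, len(train), len(test))
```

Every instance is used for testing exactly once, which is why the averaged result is a fairer estimate than testing on the training set.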
SUPPLIED TEST SET
When we choose the supplied test set option, it gives the same results as the training set option. This is what one would expect if the supplied file contains the same instances as the training data.
TRUE POSITIVE (TP) The True Positive (TP) rate is the proportion of
examples which were classified as class x, among all examples which truly have class x, i.e. how much of the class was captured. It is equivalent to Recall. In the confusion matrix, this is the diagonal element divided by the sum over the relevant row, i.e. 4/(4+0+1)=0.8 for class soft, 1/(0+1+3)=0.25 for class hard and 12/(1+2+12)=0.8 for class none in our example.
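The row-wise calculation can be reproduced with a short Python sketch. The confusion matrix below is an assumption: it is reconstructed from the row and column sums quoted in this example (rows are actual classes, columns are predicted classes), not copied from Weka's output. The names `CONFUSION` and `tp_rate` are illustrative.

```python
CLASSES = ["soft", "hard", "none"]

# Assumed confusion matrix, rebuilt from the sums quoted in the text.
# Rows: actual class; columns: predicted class (order: soft, hard, none).
CONFUSION = [
    [4, 0, 1],
    [0, 1, 3],
    [1, 2, 12],
]

def tp_rate(matrix, i):
    """Diagonal element divided by the sum of row i."""
    return matrix[i][i] / sum(matrix[i])

tp_rates = {name: tp_rate(CONFUSION, i) for i, name in enumerate(CLASSES)}
for name, value in tp_rates.items():
    print(name, round(value, 3))
```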
FALSE POSITIVE (FP): The False Positive (FP) rate is the proportion of
examples which were classified as class x, but belong to a different class, among all examples which are not of class x. In the matrix, this is the column sum of class x minus the diagonal element, divided by the row sums of all other classes, i.e. (0+1)/(4+15)=0.053 for class soft and (0+2)/(5+15)=0.1 for class hard.
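A minimal sketch of the column-based FP rate calculation follows. It assumes a confusion matrix reconstructed from the row and column sums quoted in this example (rows actual, columns predicted); the matrix and the name `fp_rate` are illustrative rather than Weka output.

```python
CLASSES = ["soft", "hard", "none"]

# Assumed confusion matrix (rows: actual, columns: predicted).
CONFUSION = [
    [4, 0, 1],
    [0, 1, 3],
    [1, 2, 12],
]

def fp_rate(matrix, i):
    """(Column sum of class i minus diagonal) / (row sums of all other classes)."""
    column_sum = sum(row[i] for row in matrix)
    false_positives = column_sum - matrix[i][i]
    negatives = sum(sum(row) for j, row in enumerate(matrix) if j != i)
    return false_positives / negatives

fp_rates = {name: fp_rate(CONFUSION, i) for i, name in enumerate(CLASSES)}
for name, value in fp_rates.items():
    print(name, round(value, 3))
```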
PRECISION Precision is the proportion of the examples
which truly have class x among all those which were classified as class x. In the matrix, this is the diagonal element divided by the sum over the relevant column, i.e. 4/(4+0+1)=0.8 for class soft, 1/(0+1+2)=0.333 for class hard and 12/(12+3+1)=0.75 for class none.
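Precision uses the columns of the matrix instead of the rows, which the following sketch makes explicit. As before, the confusion matrix is an assumption reconstructed from the sums quoted in this example (rows actual, columns predicted), and `precision` is an illustrative helper.

```python
CLASSES = ["soft", "hard", "none"]

# Assumed confusion matrix (rows: actual, columns: predicted).
CONFUSION = [
    [4, 0, 1],
    [0, 1, 3],
    [1, 2, 12],
]

def precision(matrix, i):
    """Diagonal element divided by the sum of column i."""
    column_sum = sum(row[i] for row in matrix)
    return matrix[i][i] / column_sum

precisions = {name: precision(CONFUSION, i) for i, name in enumerate(CLASSES)}
for name, value in precisions.items():
    print(name, round(value, 3))
```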
F-MEASURE The F-measure is a combined measure of Precision and Recall: 2*Precision*Recall / (Precision + Recall). For class soft, (2*0.8*0.8)/(0.8+0.8)=0.8; for class hard, (2*0.333*0.25)/(0.333+0.25)=0.286; for class none, (2*0.75*0.8)/(0.75+0.8)=0.774.
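The F-measure formula can be checked with a few lines of Python. The helper `f_measure` is illustrative; the precision and recall pairs are the per-class values from this example, with the hard-class precision taken as 1/3.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) per class, taken from the worked example.
per_class = {
    "soft": (0.8, 0.8),
    "hard": (1 / 3, 0.25),
    "none": (0.75, 0.8),
}

f_scores = {name: f_measure(p, r) for name, (p, r) in per_class.items()}
for name, value in f_scores.items():
    print(name, round(value, 3))
```

Because it is a harmonic mean, the F-measure stays low whenever either precision or recall is low, as for class hard.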
ROC (RECEIVER OPERATING CHARACTERISTIC) AND RECALL: Accuracy is measured by the area under the
ROC curve. An area of 1 represents a perfect test; an area of 0.5 represents a worthless test. A rough guide for classifying the accuracy of a diagnostic test is the traditional academic point system: 0.90-1 = excellent (A)
Recall: the proportion of all relevant documents that are retrieved by the query. It is equivalent to the TP rate.
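The two reference areas quoted for the ROC curve, 1 for a perfect test and 0.5 for a worthless one, can be verified with a trapezoidal-rule sketch. The function `auc` and the two idealised curves are illustrative assumptions, given as (FP rate, TP rate) points; they are not results from the contact lenses data.

```python
def auc(points):
    """Trapezoidal area under a ROC curve.

    points: (fp_rate, tp_rate) pairs sorted by increasing fp_rate.
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Idealised curves: a perfect test rises straight to TP rate 1,
# a worthless test follows the diagonal.
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chance = [(0.0, 0.0), (1.0, 1.0)]

print(auc(perfect), auc(chance))
```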