Post on 19-Jan-2015
1
Lazy Associative Classification
Adriano Veloso, Wagner Meira Jr, Mohammed J. Zaki
Computer Science Dept, Federal University of Minas Gerais, Brazil
Computer Science Dept, Rensselaer Polytechnic Institute, Troy, US
ICDM’06
Reporter: Chieh-Chang Yang
Date: 2007.03.19
2
Outline
Introduction
Related Work
Eager Associative Classifiers
Lazy Associative Classifiers
Experiments
Conclusions
3
Introduction
Classification is a well-studied problem and several models have been proposed.
Among these models, decision tree classifiers are particularly well suited because they are relatively fast and simple.
Decision trees perform a greedy search for rules by selecting the most promising features. Such a greedy search may prune important rules.
4
Introduction
As an alternative model, associative classifiers first mine association rules from the training data and then use these rules to build a classifier.
Associative classifiers perform a global search for rules satisfying some quality constraints. However, this global search may generate a large number of rules.
5
Introduction
In this paper we propose a novel lazy associative classifier, in which the computation is performed on a demand-driven basis. It focuses on the features that actually occur within the test instance while generating the rules.
We assess the performance of the lazy associative classifier and show experimentally that it outperforms the eager associative classifier and the decision tree classifier.
6
Related Work
Most existing work on associative classification relies on developing new algorithms to improve the overall accuracy.
CBA generates a single rule-set and ranks the rules according to their confidence/support. It selects the best rule to be applied to each test instance.
HARMONY uses an instance-centric rule-generation approach that ensures the inclusion of at least one rule for each training instance in the final rule-set.
CMAR uses multiple rules to perform the classification.
CPAR adopts a greedy technique to generate a smaller rule-set.
CAEP explores the concept of emerging patterns, which usually predict all classes accurately even if their populations are unbalanced.
7
Related Work
Rule induction classifiers include RISE, RIPPER, and SLIPPER.
RISE performs a complete overfitting by considering each instance as a rule, and then generalizes the rules.
RIPPER and SLIPPER extend the “overfit and prune” paradigm; that is, they start with a large rule-set and prune it using several heuristics.
SLIPPER also associates a probability with each rule, weighting the contribution of the rule during classification.
8
Eager Associative Classifier
Decision Trees and Decision Rules
Entropy-based Associative Classifier
9
Decision trees and decision rules
Given any subset of training instances $S$, let $s_i$ denote the number of instances with class $c_i$, so that $|S| = \sum_i s_i$. Then $p_i = s_i / |S|$ denotes the probability of class $c_i$ in $S$.
The entropy of $S$ is $E(S) = -\sum_i p_i \log p_i$.
For any partition of $S$ into $m$ subsets, with $S = \bigcup_i S_i$, the split entropy is $E(\{S_i\}) = \sum_i \frac{|S_i|}{|S|} E(S_i)$.
The information gain for the split is $I(S, \{S_i\}) = E(S) - E(\{S_i\})$.
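The three quantities defined on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the base-2 logarithm is an assumption (the slide does not state the base).

```python
import math
from collections import Counter


def entropy(labels):
    """E(S) = -sum_i p_i * log2(p_i), with p_i = s_i / |S| (log base 2 is assumed)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def split_entropy(partition):
    """E({S_i}) = sum_i (|S_i| / |S|) * E(S_i), where partition is a list of label lists."""
    total = sum(len(part) for part in partition)
    return sum(len(part) / total * entropy(part) for part in partition)


def information_gain(labels, partition):
    """I(S, {S_i}) = E(S) - E({S_i})."""
    return entropy(labels) - split_entropy(partition)
```

A split that separates the classes perfectly recovers the full entropy of the parent set as gain, while a split that leaves the class proportions unchanged in every subset has zero gain.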
10
Decision trees and decision rules
A decision tree is built using a greedy, recursive splitting strategy, where the best split is chosen at each internal node according to the information gain.
The splitting at a node stops when all instances are from a single class or if the size of the node falls below a minimum support threshold, called minsup.
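The greedy recursive strategy and the two stopping conditions can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper: instances are assumed to be (feature dict, class) pairs, minsup is taken as an absolute instance count, leaves return the majority class, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_split(rows, features):
    """Return (gain, feature) for the feature with the highest information gain."""
    labels = [label for _, label in rows]
    best = None
    for f in features:
        groups = defaultdict(list)
        for x, label in rows:
            groups[x[f]].append(label)
        split_e = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        gain = entropy(labels) - split_e
        if best is None or gain > best[0]:
            best = (gain, f)
    return best


def build_tree(rows, features, minsup):
    """Return a class label (leaf) or a (feature, {value: subtree}) pair (internal node)."""
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, smaller than minsup, or has no features left.
    if len(set(labels)) == 1 or len(rows) < minsup or not features:
        return majority
    gain, f = best_split(rows, features)
    if gain <= 0:
        return majority
    children = defaultdict(list)
    for x, label in rows:
        children[x[f]].append((x, label))
    rest = [g for g in features if g != f]
    return (f, {v: build_tree(sub, rest, minsup) for v, sub in children.items()})
```

Each path from the root to a leaf corresponds to one decision rule, which is why a test instance is covered by exactly one rule of the tree.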
11
Decision trees and decision rules
12
Entropy-based Associative Classifier
We denote as class association rules (CARs) those association rules of the form X -> c, where the antecedent (X) is composed of feature variables and the consequent (c) is just a class.
CARs may be generated by a slightly modified association rule mining algorithm. Each itemset must contain a class, and rule generation follows a template in which the consequent is just a class.
CARs are ranked in decreasing order of information gain. During the testing phase, the associative classifier simply checks whether each CAR matches the test instance; the class associated with the first match is chosen.
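The ranking and first-match step might look like the sketch below. The `CAR` record, the `default` fallback for instances that no rule matches, and the numeric gain values used in the usage note are assumptions made for illustration; none of them come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CAR:
    antecedent: tuple  # ((feature, value), ...)
    consequent: str    # the predicted class
    gain: float        # information gain used for ranking


def matches(rule, instance):
    """A CAR matches when every feature-value pair of its antecedent occurs in the instance."""
    return all(instance.get(f) == v for f, v in rule.antecedent)


def classify(rules, instance, default=None):
    """Scan the CARs in decreasing order of gain and return the class of the first match."""
    for rule in sorted(rules, key=lambda r: r.gain, reverse=True):
        if matches(rule, instance):
            return rule.consequent
    return default
```

For instance, given `CAR((("windy", "false"), ("temperature", "cool")), "yes", 0.3)` and `CAR((("outlook", "sunny"), ("humidity", "high")), "no", 0.2)` (gain values invented), an instance containing all four feature values is classified as "yes", because the higher-ranked rule is checked first.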
13
Entropy-based Associative Classifier
14
Entropy-based Associative Classifier
Three CARs match the test instance of our example using EAC:
1. {windy=false and temperature=cool->play=yes}
2. {outlook=sunny and humidity=high->play=no}
3. {outlook=sunny and temperature=cool->play=yes}
The first rule is selected.
In our example, the test case is recognized by only one rule in the decision tree, while the same test case is recognized by three CARs in the associative classifier.
15
Entropy-based Associative Classifier
The authors state two theorems comparing the performance of decision trees and eager associative classifiers, and prove both:
1. The rules derived from a decision tree are a subset of the CARs mined using an eager associative classifier based on information gain.
2. CARs perform no worse than decision tree rules, according to the information gain principle.
16
Entropy-based Associative Classifier
17
Lazy Associative Classifier
18
Lazy Associative Classifier
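By definition, both $C^e_A$ and $C^l_A$ (the rule-sets that the eager and the lazy classifier, respectively, provide for test instance $A$) are composed of CARs {X->c} in which $X \subseteq A$. Because $D_A \subseteq D$, for a given minsup, if a rule {X->c} is frequent in $D$, then it must also be frequent in $D_A$. Since $C^l_A$ is generated from $D_A$ and $C^e_A$ is generated from $D$ (and $D_A \subseteq D$), $C^e_A \subseteq C^l_A$.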
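The containment argument can be checked on a toy example. The sketch below rests on several assumptions that are not in the slides: the projection keeps from each training instance only the feature values it shares with the test instance and drops instances that share none, minsup is relative to the size of the data set being mined, rule mining is a brute-force enumeration, and the four training rows are invented.

```python
import math
from collections import Counter
from itertools import combinations


def project(training, instance):
    """Build D_A: keep only the feature values shared with the test instance A."""
    projected = []
    for features, label in training:
        shared = frozenset((f, v) for f, v in features.items() if instance.get(f) == v)
        if shared:  # rows sharing nothing with A are dropped (assumption)
            projected.append((shared, label))
    return projected


def mine_cars(rows, minsup):
    """Return {(antecedent, class): count} for CARs meeting the relative minsup."""
    threshold = math.ceil(minsup * len(rows))
    counts = Counter()
    for items, label in rows:
        for size in range(1, len(items) + 1):
            for subset in combinations(sorted(items), size):
                counts[(frozenset(subset), label)] += 1
    return {car: n for car, n in counts.items() if n >= threshold}


# Hypothetical toy data, not taken from the paper.
training = [
    ({"outlook": "sunny", "windy": "false"}, "yes"),
    ({"outlook": "sunny", "windy": "false"}, "yes"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
]
A = {"outlook": "overcast", "windy": "true"}

eager = mine_cars([(frozenset(f.items()), c) for f, c in training], minsup=0.5)
eager_A = {car for car in eager if car[0] <= frozenset(A.items())}
lazy_A = set(mine_cars(project(training, A), minsup=0.5))
```

On this data `eager_A` is empty, because every CAR frequent in the full training set involves outlook=sunny or windy=false, whereas `lazy_A` contains rules such as {outlook=overcast->play=yes}; the eager set is a subset of the lazy set, as the argument above states.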
19
Lazy Associative Classifier
20
Lazy Associative Classifier & Eager Associative Classifier
Suppose minsup is set to 40%. Since $|D| = 10$, a CAR must occur at least 4 times in $D$. The set of CARs found by the eager classifier consists of these two:
1. {windy=false and humidity=normal->play=yes}
2. {windy=false and temperature=cool->play=yes}
Neither of the two CARs matches the test instance.
The lazy classifier finds two CARs in $D_A$ (a CAR needs to occur only twice in $D_A$):
1. {outlook=overcast->play=yes}
2. {temperature=hot->play=yes}
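As a quick check of the two thresholds quoted on this slide: the absolute count is the relative minsup times the size of the data set being mined. The value $|D_A| = 5$ is an assumption inferred from the statement that two occurrences suffice in the projected data; the slides do not give it explicitly.

```latex
\[
\lceil 0.4 \times |D| \rceil = \lceil 0.4 \times 10 \rceil = 4,
\qquad
\lceil 0.4 \times |D_A| \rceil = \lceil 0.4 \times 5 \rceil = 2
\quad (\text{assuming } |D_A| = 5).
\]
```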
21
Lazy Associative Classifier & Eager Associative Classifier
Intuitively, lazy classifiers perform better than eager classifiers because of two characteristics:
1. Missing CARs: Eager classifiers search for CARs in a large search space, which is induced by all features of the training data. While this strategy generates a large rule-set, CARs that are important to some specific test instances may be missed.
2. Highly Disjunctive Spaces: Eager classifiers generate CARs before the test instance is even known. For this reason, eager classifiers often combine small disjuncts in order to generate more general predictions. This can reduce classification performance in highly disjunctive spaces.
22
Problems of Lazy Associative Classifier
The preceding discussion suggests an intuitive idea: the more CARs are generated, the better the classifier.
However, the same concept also leads to overfitting, reducing the generalization and affecting the classification accuracy.
23
Problems of Lazy Associative Classifier
In fact, overfitting and high sensitivity to irrelevant features are shortcomings of lazy classification.
A natural solution is to identify and discard the irrelevant features. Thus, feature selection methods may be used.
In the experiments we show that our lazy classifiers were not seriously affected by overfitting because only the best and most general CARs are used.
24
Problems of Lazy Associative Classifier
Another disadvantage is that lazy classifiers typically require more work to classify all test instances.
However, simple caching mechanisms are very effective at decreasing this workload. The basic idea is that different test instances may induce different rule-sets, but different rule-sets may share common CARs.
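One way to picture such a cache is sketched below. It is an illustration only: the slides do not describe the actual caching mechanism, so the class name, the choice to cache per-antecedent class counts, and the hit/miss counters are all assumptions.

```python
from itertools import combinations


class CarCache:
    """Memoizes class counts per antecedent so that CARs shared by several
    test instances are counted against the training data only once."""

    def __init__(self, training):
        self.training = [(frozenset(f.items()), c) for f, c in training]
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def class_counts(self, itemset):
        if itemset in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            counts = {}
            for items, label in self.training:
                if itemset <= items:  # the antecedent occurs in this training row
                    counts[label] = counts.get(label, 0) + 1
            self.cache[itemset] = counts
        return self.cache[itemset]

    def rules_for(self, instance):
        """Class counts for every antecedent built from the instance's feature values."""
        items = sorted(instance.items())
        return {
            frozenset(subset): self.class_counts(frozenset(subset))
            for size in range(1, len(items) + 1)
            for subset in combinations(items, size)
        }
```

When two test instances share a feature value such as windy=true, the second call to `rules_for` reuses the counts already stored for that antecedent instead of scanning the training data again.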
25
Experimental Evaluation
In this section we show the experimental results for the evaluation of the proposed classifiers in terms of classification effectiveness and computational performance.
Our evaluation is based on a comparison against the C4.5 and LazyDT decision tree classifiers. We also compare our numbers to some results from other associative classifiers, such as CPAR, CMAR, and HARMONY, and to some results from rule induction classifiers, such as RISE, RIPPER, and SLIPPER.
26
Experimental Evaluation
We used 26 datasets from the UCI Machine Learning Repository to compare the effectiveness of the classifiers.
In all experiments we used 10-fold cross-validation.
We quantify the classification effectiveness of the classifiers through the conventional error rate.
We used the entropy method to discretize continuous attributes.
In the experiments we set minimum confidence to 50% and minsup to 1%.
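The evaluation protocol (10-fold cross-validation scored by error rate) can be sketched as below. The fold assignment by index stride, the absence of shuffling or stratification, and the `train_and_predict` callback are assumptions for illustration; the slides do not say how the folds were built.

```python
def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k folds by stride (no shuffling; an assumption)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validated_error(instances, labels, train_and_predict, k=10):
    """Error rate = misclassified test instances / all instances, pooled over k folds.

    train_and_predict(train, test) receives a list of (instance, label) pairs and a
    list of held-out instances, and returns one predicted label per held-out instance.
    """
    errors = 0
    for fold in k_fold_indices(len(instances), k):
        held_out = set(fold)
        train = [(instances[i], labels[i]) for i in range(len(instances)) if i not in held_out]
        predictions = train_and_predict(train, [instances[i] for i in fold])
        errors += sum(p != labels[i] for p, i in zip(predictions, fold))
    return errors / len(instances)
```

Any classifier, eager or lazy, can be plugged in through the callback; a lazy classifier would simply do its rule generation inside the prediction step for each held-out instance.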
27
Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers
28
Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers
29
Comparison Between Rule Induction and Associative Classifiers
30
Overfitting and Underfitting
31
Execution Times
32
Conclusions
We present an assessment of associative classification and propose improvements to it by introducing a novel lazy classifier.
An important feature of the proposed lazy classifier is its ability to deal with the small disjuncts problem.
We also compare the proposed classifiers against three other associative classifiers and three rule induction classifiers; the proposed classifiers outperform them in most cases.