Unit 5
Classification: Basic Concepts, Decision
Trees, and Model Evaluation
by
Pang-Ning Tan, Vipin Kumar, Michael Steinbach
Examples of Classification Task
Predicting tumor cells as benign or malignant
Classifying credit card transactions as legitimate or fraudulent
Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
Categorizing news stories as finance, weather, entertainment, sports, etc.
Definition
Classification is the task of learning a target function f that maps each attribute set x to one of the predefined class labels y.
The target function is also known as a classification model.
Descriptive modeling: the model serves as an explanatory tool to distinguish between objects of different classes (mammal, reptile, bird, fish, or amphibian).
Predictive modeling: the model is used to predict the class label of unknown records, e.g., a gila monster (cold-blooded, has scales, does not give birth, class = ?).
Vertebrate data set (20 records):

Name | Give Birth | Lay Eggs | Can Fly | Live in Water | Have Legs | Class
human | yes | no | no | no | yes | mammals
python | no | yes | no | no | no | reptiles
salmon | no | yes | no | yes | no | fishes
whale | yes | no | no | yes | no | mammals
frog | no | yes | no | sometimes | yes | amphibians
komodo | no | yes | no | no | yes | reptiles
bat | yes | no | yes | no | yes | mammals
pigeon | no | yes | yes | no | yes | birds
cat | yes | no | no | no | yes | mammals
leopard shark | yes | no | no | yes | no | fishes
turtle | no | yes | no | sometimes | yes | reptiles
penguin | no | yes | no | sometimes | yes | birds
porcupine | yes | no | no | no | yes | mammals
eel | no | yes | no | yes | no | fishes
salamander | no | yes | no | sometimes | yes | amphibians
gila monster | no | yes | no | no | yes | reptiles
platypus | no | yes | no | no | yes | mammals
owl | no | yes | yes | no | yes | birds
dolphin | yes | no | no | yes | no | mammals
eagle | no | yes | yes | no | yes | birds
General Approach to Solve a Classification Problem
Training Set -> Learning algorithm -> Learn Model (Induction); Model -> Apply Model to Test Set (Deduction).

Training Set:
Tid | Attrib1 | Attrib2 | Attrib3 | Class
1 | Yes | Large | 125K | No
2 | No | Medium | 100K | No
3 | No | Small | 70K | No
4 | Yes | Medium | 120K | No
5 | No | Large | 95K | Yes
6 | No | Medium | 60K | No
7 | Yes | Large | 220K | No
8 | No | Small | 85K | Yes
9 | No | Medium | 75K | No
10 | No | Small | 90K | Yes

Test Set:
Tid | Attrib1 | Attrib2 | Attrib3 | Class
11 | No | Small | 55K | ?
12 | Yes | Medium | 80K | ?
13 | Yes | Large | 110K | ?
14 | No | Small | 95K | ?
15 | No | Large | 67K | ?
Classification
Task of assigning objects to one of several predefined categories.
Given a collection of records (training set), where each record contains a set of attributes and one of the attributes is the class:
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Performance metrics
Accuracy = (number of correct predictions) / (total number of predictions) = (f11 + f00) / (f11 + f10 + f01 + f00)
Error rate = (number of wrong predictions) / (total number of predictions) = (f10 + f01) / (f11 + f10 + f01 + f00)
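As a quick check of these formulas, here is a minimal Python sketch (mine, not the slides') computing both metrics from the four confusion-matrix counts:

```python
# f11/f00: correctly predicted class 1/0; f10/f01: the two error counts.
def accuracy(f11, f10, f01, f00):
    """Fraction of correct predictions."""
    return (f11 + f00) / (f11 + f10 + f01 + f00)

def error_rate(f11, f10, f01, f00):
    """Fraction of wrong predictions; equals 1 - accuracy."""
    return (f10 + f01) / (f11 + f10 + f01 + f00)

# Example: 40 + 45 correct, 5 + 10 wrong -> accuracy 0.85, error rate 0.15.
print(accuracy(40, 5, 10, 45), error_rate(40, 5, 10, 45))
```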
Classification Techniques
Decision Tree Induction Methods
Rule-based Classifier Methods
Nearest-Neighbor classifiers
Bayesian Classifiers
Decision Tree Induction (How It Works)
Root node: no incoming edges and zero or more outgoing edges.
Internal nodes: exactly one incoming edge and two or more outgoing edges.
Leaf or terminal nodes: exactly one incoming edge and no outgoing edges.
Classifying an unlabeled vertebrate
How to build a Decision Tree
Algorithms that employ a greedy strategy can grow a reasonably accurate tree in a reasonable amount of time.
One such algorithm is Hunt's algorithm.
Hunt's Algorithm
Let Dt be the set of training records that reach node t.
1. If all the records in Dt belong to the same class yt, then t is a leaf node labeled as yt.
2. If Dt contains records that belong to more than one class, an attribute test condition is used to split the data into smaller subsets. A child node is created for each outcome of the test condition, and the records in Dt are distributed to the children based on the outcomes.
3. Recursively apply the procedure to each subset.
Conditions & Issues
Initially most of the borrowers repaid their loans, so the tree starts as a single node labeled with the majority class.
We need to consider data from both classes, so we take Home Owner as the root.
The left child is split again, and splitting continues recursively.
Some child nodes can be empty, with no records associated with them.
Records with identical attribute values cannot be split further.
Design issues:
How should the training records be split? An attribute test condition is used to divide them into smaller subsets.
When should splitting stop? One procedure is to expand a node until all the records belong to the same class or all the records have identical attribute values.
Methods for Expressing Attribute Test Conditions
Depends on attribute type:
Binary
Nominal
Ordinal
Continuous
Depends on number of ways to split:
2-way split
Multi-way split
Binary Attributes
Test condition generates two potential outcomes
Nominal Attributes
Multi-way split: use as many partitions as there are distinct values.
Binary split: divides the values into two subsets; need to find the optimal partitioning.
A decision tree algorithm such as CART produces 2^(k-1) - 1 ways of creating a binary partition of k attribute values.
Example: CarType can be split three ways into {Family}, {Sports}, {Luxury}, or two ways, e.g., {Family, Luxury} vs. {Sports}, or {Sports, Luxury} vs. {Family}.
Ordinal Attributes
Can produce multi-way or binary splits.
Values can be grouped as long as the grouping does not violate the order of the attribute values.
In Figure 4.10, (a) and (b) preserve the order, but (c) combines small & large and also medium & extra large, violating it.
Continuous Attributes
Different ways of handling:
Discretization to form an ordinal categorical attribute.
Static: discretize once at the beginning.
Dynamic: ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering.
Binary decision: (A < v) or (A >= v).
Consider all possible splits and find the best cut; this can be more compute intensive.
Measures for Selecting the Best Split
Let p(i | t) denote the fraction of records belonging to class i at a given node t.
Measures for selecting the best split are based on the degree of impurity of the child nodes.
The smaller the degree of impurity, the more skewed the class distribution.
A node with class distribution (0, 1) has impurity = 0; a node with uniform distribution (0.5, 0.5) has the highest impurity.
Which test condition is the best?
How to Determine the Best Split
Greedy approach: nodes with homogeneous class distribution are preferred.
Need a measure of node impurity:
C0: 5, C1: 5 (non-homogeneous, high degree of impurity)
C0: 9, C1: 1 (homogeneous, low degree of impurity)
Measures of Node Impurity
Gini index: GINI(t) = 1 - sum_i [p(i | t)]^2
Entropy: Entropy(t) = - sum_i p(i | t) log2 p(i | t)
Misclassification error: Error(t) = 1 - max_i [p(i | t)]
(NOTE: p(i | t) is the relative frequency of class i at node t; c is the number of classes, and 0 log2 0 = 0 in entropy calculations.)
Measure of Impurity: GINI
Gini index for a given node t: GINI(t) = 1 - sum_j [p(j | t)]^2
(NOTE: p(j | t) is the relative frequency of class j at node t.)
Maximum (1 - 1/nc) when records are equally distributed among all classes, implying the least interesting information.
Minimum (0.0) when all records belong to one class, implying the most interesting information.
Examples:
C1 = 0, C2 = 6: P(C1) = 0/6 = 0, P(C2) = 6/6 = 1; Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0.000
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6; Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6; Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
C1 = 3, C2 = 3: P(C1) = 3/6, P(C2) = 3/6; Gini = 1 - (3/6)^2 - (3/6)^2 = 0.500
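A small sketch (not from the slides) that reproduces the four Gini values above:

```python
# `counts` is the list of class counts at a node, e.g. [C1, C2].
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

print(gini([0, 6]))  # 0.0
print(gini([1, 5]))  # ~0.278
print(gini([2, 4]))  # ~0.444
print(gini([3, 3]))  # 0.5
```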
Alternative Splitting Criteria Based on INFO
Entropy at a given node t: Entropy(t) = - sum_j p(j | t) log2 p(j | t)
(NOTE: p(j | t) is the relative frequency of class j at node t.)
Measures the homogeneity of a node.
Maximum (log2 nc) when records are equally distributed among all classes, implying the least information.
Minimum (0.0) when all records belong to one class, implying the most information.
Entropy-based computations are similar to the GINI index computations.
Examples for Computing Entropy
Entropy(t) = - sum_j p(j | t) log2 p(j | t)
C1 = 0, C2 = 6: P(C1) = 0, P(C2) = 1; Entropy = -0 log2 0 - 1 log2 1 = 0
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6; Entropy = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6; Entropy = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.92
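The same check for entropy, a minimal sketch using the 0 log2 0 = 0 convention:

```python
import math

# Skip zero counts so that 0 log2 0 contributes nothing.
def entropy(counts):
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

print(entropy([0, 6]))  # 0.0
print(entropy([1, 5]))  # ~0.65
print(entropy([2, 4]))  # ~0.92
```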
Splitting Criteria Based on Classification Error
Classification error at a node t: Error(t) = 1 - max_i [p(i | t)]
Measures the misclassification error made by a node.
Maximum (1 - 1/nc) when records are equally distributed among all classes, implying the least interesting information.
Minimum (0.0) when all records belong to one class, implying the most interesting information.
Examples for Computing Error
Error(t) = 1 - max_i [p(i | t)]
C1 = 0, C2 = 6: P(C1) = 0, P(C2) = 1; Error = 1 - max(0, 1) = 1 - 1 = 0
C1 = 1, C2 = 5: P(C1) = 1/6, P(C2) = 5/6; Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6
C1 = 2, C2 = 4: P(C1) = 2/6, P(C2) = 4/6; Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3
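And a corresponding sketch for the classification error:

```python
# 1 minus the relative frequency of the majority class at the node.
def classification_error(counts):
    n = sum(counts)
    return 1.0 - max(c / n for c in counts)

print(classification_error([0, 6]))  # 0.0
print(classification_error([1, 5]))  # ~0.167 (= 1/6)
print(classification_error([2, 4]))  # ~0.333 (= 1/3)
```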
Misclassification Error vs Gini
Split on A? (Yes -> Node N1, No -> Node N2)
Parent: C1 = 7, C2 = 3; Gini = 0.42
Count matrix: N1: C1 = 3, C2 = 0; N2: C1 = 4, C2 = 3
Gini(N1) = 1 - (3/3)^2 - (0/3)^2 = 0
Gini(N2) = 1 - (4/7)^2 - (3/7)^2 = 0.489
Gini(Children) = 3/10 * 0 + 7/10 * 0.489 = 0.342
Gini improves!
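A sketch of the weighted child impurity used in this example (assuming Gini as the impurity measure):

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def split_gini(children):
    """Weighted Gini of a split; `children` is a list of class-count lists."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)

# N1 = [3, 0], N2 = [4, 3]: 3/10 * 0 + 7/10 * 0.489 = 0.342 < 0.42 (parent)
print(split_gini([[3, 0], [4, 3]]))
```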
Information Gain to Find the Best Split
Before splitting, the node has class counts C0 = N00 and C1 = N01, with entropy M0.
Splitting on A? (Yes -> Node N1 with counts N10, N11 and entropy M1; No -> Node N2 with counts N20, N21 and entropy M2) gives weighted child entropy M12.
Splitting on B? (Yes -> Node N3 with counts N30, N31 and entropy M3; No -> Node N4 with counts N40, N41 and entropy M4) gives weighted child entropy M34.
The difference in entropy gives the information gain: compare Gain = M0 - M12 vs. M0 - M34.
Binary Attributes: Computing GINI Index
Splits into two partitions.
Effect of weighting partitions: larger and purer partitions are sought.
Split on B? (Yes -> Node N1, No -> Node N2)
Parent: C0 = 6, C1 = 6; Gini = 0.500
Count matrix: N1: C0 = 1, C1 = 4; N2: C0 = 5, C1 = 2
Gini(N1) = 1 - (1/5)^2 - (4/5)^2 = 0.320
Gini(N2) = 1 - (5/7)^2 - (2/7)^2 = 0.408
Gini(Children) = 5/12 * 0.320 + 7/12 * 0.408 = 0.371
Splitting of Nominal Attributes: Computing Gini Index
For each distinct value, gather counts for each class in the dataset.
Use the count matrix to make decisions.
Multi-way split on CarType:
Gini{Family} = 0.375; Gini{Sports} = 0; Gini{Luxury} = 0.219
Gini(CarType) = (4/20) * 0.375 + (8/20) * 0 + (8/20) * 0.219 = 0.163
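A sketch reproducing this computation; the per-value class counts are inferred from the Gini values quoted above (an assumption, since the slide's count matrix did not survive extraction):

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

# Assumed class counts per CarType value, consistent with the quoted
# Gini values: Family (1, 3), Sports (8, 0), Luxury (1, 7).
matrix = {"family": [1, 3], "sports": [8, 0], "luxury": [1, 7]}

total = sum(sum(c) for c in matrix.values())
weighted = sum(sum(c) / total * gini(c) for c in matrix.values())
print(weighted)  # ~0.163
```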
Splitting of Continuous Attributes: Computing Gini Index
Use binary decisions based on one value, e.g., Taxable Income > 80K? (Yes / No).
Several choices for the splitting value: the number of possible splitting values = the number of distinct values.
Each splitting value v has a count matrix associated with it: class counts in each of the partitions, A < v and A >= v.
Simple method to choose the best v: for each v, scan the database to gather the count matrix and compute its Gini index. Computationally inefficient! Repetition of work.

Tid | Refund | Marital Status | Taxable Income | Cheat
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes
Continuous Attributes: Computing Gini Index...
For efficient computation, for each attribute:
Sort the attribute on its values.
Linearly scan these values, each time updating the count matrix and computing the Gini index.
Choose the split position that has the least Gini index.
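A minimal sketch of this sorted linear scan (mine, with candidate cuts assumed at midpoints between adjacent sorted values); on the Taxable Income column above it finds the cut at 97.5:

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0

def best_split(values, labels, classes=("Yes", "No")):
    data = sorted(zip(values, labels))
    left = {c: 0 for c in classes}                  # counts for A <= cut
    right = {c: labels.count(c) for c in classes}   # counts for A > cut
    n, best = len(data), (float("inf"), None)
    for i in range(n - 1):
        v, y = data[i]
        left[y] += 1                                # update the count matrix
        right[y] -= 1
        cut = (v + data[i + 1][0]) / 2
        w = (i + 1) / n
        g = w * gini(list(left.values())) + (1 - w) * gini(list(right.values()))
        best = min(best, (g, cut))
    return best  # (weighted Gini, split value)

income = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
cheat = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_split(income, cheat))  # (0.3, 97.5)
```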
Splitting Based on INFO...
Information gain:
GAIN_split = Entropy(p) - sum_{i=1..k} (n_i / n) Entropy(i)
Parent node p is split into k partitions; n_i is the number of records in partition i.
Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN).
Used in ID3 and C4.5.
Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
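A sketch of GAIN_split; the example counts are the age split from the buys_computer example on the next slide:

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i)
def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

# Parent [9, 5] split by age into [2, 3], [4, 0], [3, 2].
print(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ~0.246
```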
Example of Information Gain
Class P: buys_computer = yes; Class N: buys_computer = no

age | pi | ni | I(pi, ni)
<=30 | 2 | 3 | 0.971
31...40 | 4 | 0 | 0
>40 | 3 | 2 | 0.971

Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694
Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

Sample rows (age, income, student, credit_rating, buys_computer):
>40 | low | yes | fair | yes
>40 | low | yes | excellent | no
31...40 | low | yes | excellent | yes
Splitting the Samples Using Age

age <=30:
income | student | credit_rating | buys_computer
high | no | fair | no
high | no | excellent | no
medium | no | fair | no
low | yes | fair | yes
medium | yes | excellent | yes

age 31...40 (all labeled yes):
income | student | credit_rating | buys_computer
high | no | fair | yes
low | yes | excellent | yes
medium | no | excellent | yes
high | yes | fair | yes

age >40:
income | student | credit_rating | buys_computer
medium | no | fair | yes
low | yes | fair | yes
low | yes | excellent | no
medium | yes | fair | yes
medium | no | excellent | no
Splitting Based on INFO...
Gain ratio:
GainRATIO_split = GAIN_split / SplitINFO
SplitINFO = - sum_{i=1..k} (n_i / n) log(n_i / n)
Parent node p is split into k partitions; n_i is the number of records in partition i.
Adjusts information gain by the entropy of the partitioning (SplitINFO). Higher-entropy partitioning (a large number of small partitions) is penalized!
Used in C4.5. Designed to overcome the disadvantage of information gain.
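A sketch of the gain ratio on the same age split used above:

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

# GainRATIO = GAIN_split / SplitINFO; SplitINFO penalizes splits that
# produce many small partitions.
def gain_ratio(parent, children):
    n = sum(parent)
    gain = entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)
    split_info = -sum(sum(c) / n * math.log2(sum(c) / n) for c in children)
    return gain / split_info

# 0.246 / 1.577 ~= 0.156 for the age split of the parent [9, 5].
print(gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]]))
```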
Decision Tree Induction
Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
Issues:
Determine how to split the records.
How to specify the attribute test condition?
How to determine the best split?
Determine when to stop splitting.
Stopping Criteria for Tree Induction
Stop expanding a node when all the records belong to the same class.
Stop expanding a node when all the records have similar attribute values.
Early termination.
Algorithm: Decision Tree Algorithm
TreeGrowth(E, F)
1. if stopping_cond(E, F) = true then
2.   leaf = createNode()
3.   leaf.label = Classify(E)
4.   return leaf
5. else
6.   root = createNode()
7.   root.test_cond = find_best_split(E, F)
8.   let V = { v | v is a possible outcome of root.test_cond }
9.   for each v in V do
10.    Ev = { e | root.test_cond(e) = v and e in E }
11.    child = TreeGrowth(Ev, F)
12.    add child as a descendent of root and label the edge (root -> child) as v
13.  end for
14. end if
15. return root
createNode(): creates a new node, which has either a test condition or a class label (node.label).
find_best_split(): determines the attribute to be selected as the test condition for splitting the records, using entropy, Gini, or classification error.
Classify(): determines the class label to be assigned to a leaf node: leaf.label = argmax_i p(i | t).
stopping_cond(): terminates the tree growth by testing whether all records have the same class label or the same attribute values.
Tree pruning and overfitting are covered later.
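A runnable Python sketch of the TreeGrowth skeleton above, specialized to categorical attributes; find_best_split is replaced here by a simple lowest-weighted-Gini choice, and the data is a tiny made-up example:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def tree_growth(records, labels, attrs):
    # stopping_cond: pure node, no attributes left, or identical records;
    # Classify: majority class at the leaf.
    if len(set(labels)) == 1 or not attrs or len(set(records)) == 1:
        return Counter(labels).most_common(1)[0][0]
    def weighted(a):  # weighted Gini of splitting on attribute index a
        total, cost = len(records), 0.0
        for v in set(r[a] for r in records):
            subset = [y for r, y in zip(records, labels) if r[a] == v]
            cost += len(subset) / total * gini(subset)
        return cost
    best = min(attrs, key=weighted)   # find_best_split
    node = {}
    for v in set(r[best] for r in records):
        sub_r = [r for r in records if r[best] == v]
        sub_y = [y for r, y in zip(records, labels) if r[best] == v]
        node[(best, v)] = tree_growth(sub_r, sub_y,
                                      [a for a in attrs if a != best])
    return node

X = [("Yes", "Single"), ("No", "Married"), ("No", "Single"), ("Yes", "Married")]
y = ["No", "No", "Yes", "No"]
print(tree_growth(X, y, [0, 1]))
```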
Decision Tree Based Classification
Advantages:
Inexpensive to construct.
Extremely fast at classifying unknown records.
Easy to interpret for small-sized trees.
Accuracy is comparable to other classification techniques for many simple data sets.
Decision Tree Induction
Many algorithms:
Hunt's algorithm (one of the earliest)
CART
ID3, C4.5
SLIQ, SPRINT
Computing Impurity Measure

Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | ? | Single | 90K | Yes   (missing value)

Count matrix:
 | Class = Yes | Class = No
Refund = Yes | 0 | 3
Refund = No | 2 | 4
Refund = ? | 1 | 0

Before splitting: Entropy(Parent) = -0.3 log(0.3) - 0.7 log(0.7) = 0.8813
Split on Refund:
Entropy(Refund = Yes) = 0
Entropy(Refund = No) = -(2/6) log(2/6) - (4/6) log(4/6) = 0.9183
Entropy(Children) = 0.3 (0) + 0.6 (0.9183) = 0.551
Gain = 0.9 * (0.8813 - 0.551) = 0.297
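A sketch of this computation, scaling the gain by the fraction of records with a known Refund value:

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

parent = [3, 7]              # [Yes, No] over all 10 records
children = [[0, 3], [2, 4]]  # Refund=Yes, Refund=No (9 known records)
n_total = sum(parent)
known = sum(sum(c) for c in children) / n_total  # 9/10 = 0.9

e_parent = entropy(parent)   # 0.8813
e_children = sum(sum(c) / n_total * entropy(c) for c in children)  # 0.551
print(known * (e_parent - e_children))  # 0.9 * (0.8813 - 0.551) ~= 0.297
```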
Rule-Based Classifier
Classify records by using a collection of if...then rules.
Rule: (Condition) → y
where Condition is a conjunction of attribute tests and y is the class label.
LHS: rule antecedent or condition.
RHS: rule consequent.
Examples of classification rules:
(Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
(Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No
Rule-Based Classifier (Example)
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
human | warm | yes | no | no | mammals
python | cold | no | no | no | reptiles
salmon | cold | no | no | yes | fishes
whale | warm | yes | no | yes | mammals
frog | cold | no | no | sometimes | amphibians
komodo | cold | no | no | no | reptiles
bat | warm | yes | yes | no | mammals
pigeon | warm | no | yes | no | birds
cat | warm | yes | no | no | mammals
leopard shark | cold | yes | no | yes | fishes
turtle | cold | no | no | sometimes | reptiles
penguin | warm | no | no | sometimes | birds
porcupine | warm | yes | no | no | mammals
eel | cold | no | no | yes | fishes
salamander | cold | no | no | sometimes | amphibians
gila monster | cold | no | no | no | reptiles
platypus | warm | no | no | no | mammals
owl | warm | no | yes | no | birds
dolphin | warm | yes | no | yes | mammals
eagle | warm | no | yes | no | birds
Application of Rule-Based Classifier
A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule.
Each attribute test (Aj op vj) is an attribute-value pair called a conjunct.
Condition_i = (A1 op v1) ∧ (A2 op v2) ∧ ... ∧ (Ak op vk)
The rule R1 covers the hawk → Birds.
The rule R3 covers the grizzly bear → Mammals.

Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
hawk | warm | no | yes | no | ?
grizzly bear | warm | yes | no | no | ?
Rule Coverage and Accuracy
Coverage of a rule: the fraction of records that satisfy the antecedent of the rule = |A| / |D|.
Accuracy of a rule: the fraction of records satisfying the antecedent that also satisfy the consequent = |A ∩ y| / |A|.
Example: (Give Birth = yes) ∧ (Blood Type = warm-blooded) → Mammals
Coverage = 33%, Accuracy = 6/6 = 100%

Tid | Refund | Marital Status | Taxable Income | Class
1 | Yes | Single | 125K | No
2 | No | Married | 100K | No
3 | No | Single | 70K | No
4 | Yes | Married | 120K | No
5 | No | Divorced | 95K | Yes
6 | No | Married | 60K | No
7 | Yes | Divorced | 220K | No
8 | No | Single | 85K | Yes
9 | No | Married | 75K | No
10 | No | Single | 90K | Yes

Exercise: (Status = Single) → No
Coverage = ---- %, Accuracy = ---- %
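A small sketch of both measures on made-up records (rules are dicts of attribute -> value); the exercise above is left for the reader:

```python
def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def coverage_and_accuracy(rule, consequent, records, labels):
    covered = [y for r, y in zip(records, labels) if covers(rule, r)]
    coverage = len(covered) / len(records)
    accuracy = covered.count(consequent) / len(covered) if covered else 0.0
    return coverage, accuracy

records = [{"Status": "Single"}, {"Status": "Married"}, {"Status": "Single"}]
labels = ["No", "No", "Yes"]
print(coverage_and_accuracy({"Status": "Single"}, "No", records, labels))
# (0.667, 0.5) on this toy data
```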
How Does the Rule-Based Classifier Work?
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians
A lemur triggers rule R3, so it is classified as a mammal.
A turtle triggers both R4 and R5.
A dogfish shark triggers none of the rules.

Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
lemur | warm | yes | no | no | ?
turtle | cold | no | no | sometimes | ?
dogfish shark | cold | yes | no | yes | ?
Characteristics of Rule-Based Classifier
Mutually exclusive rules: the classifier contains mutually exclusive rules if the rules are independent of each other; every record is covered by at most one rule.
Exhaustive rules: the classifier has exhaustive coverage if it accounts for every possible combination of attribute values; each record is covered by at least one rule.
From Decision Trees to Rules
Tree: Refund? (Yes → NO; No → Marital Status? ({Single, Divorced} → Taxable Income? (< 80K → NO; > 80K → YES); {Married} → NO))
Classification rules:
(Refund = Yes) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income < 80K) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income > 80K) ==> Yes
(Refund = No, Marital Status = {Married}) ==> No
The rules are mutually exclusive and exhaustive.
The rule set contains as much information as the tree.
Rules Can Be Simplified
Using the same tree and the Tid training data shown above:
Initial rule: (Refund = No) ∧ (Status = Married) → No
Simplified rule: (Status = Married) → No
Effect of Rule Simplification
Rules are no longer mutually exclusive: a record may trigger more than one rule.
Solution? An ordered rule set, or an unordered rule set with voting schemes.
Rules are no longer exhaustive: a record may not trigger any rules.
Solution? Use a default class.
Ordered Rule Set
Rules are rank-ordered according to their priority. An ordered rule set is known as a decision list.
When a test record is presented to the classifier, it is assigned to the class label of the highest-ranked rule it has triggered. If none of the rules fire, it is assigned to the default class.
R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name | Blood Type | Give Birth | Can Fly | Live in Water | Class
turtle | cold | no | no | sometimes | ?
Rule Ordering Schemes
Rule-based ordering: individual rules are ranked based on their quality.
Class-based ordering: rules that belong to the same class appear together.
Rule-based ordering:
(Refund = Yes) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income < 80K) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income > 80K) ==> Yes
(Refund = No, Marital Status = {Married}) ==> No
Class-based ordering:
(Refund = Yes) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income < 80K) ==> No
(Refund = No, Marital Status = {Married}) ==> No
(Refund = No, Marital Status = {Single, Divorced}, Taxable Income > 80K) ==> Yes
Building Classification Rules
Direct method: extract rules directly from data. Examples: RIPPER, CN2, Holte's 1R.
Indirect method: extract rules from other classification models (e.g., decision trees, neural networks). Example: C4.5rules.
Direct Method: Sequential Covering
1. Start from an empty rule.
2. Grow a rule using the Learn-One-Rule function.
3. Remove training records covered by the rule.
4. Repeat Steps (2) and (3) until the stopping criterion is met (a minimal sketch follows).
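A minimal sketch of this loop; learn_one_rule here greedily adds the attribute-value conjunct with the best accuracy, a crude stand-in for the real Learn-One-Rule function:

```python
def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def learn_one_rule(data, positive):
    rule = {}
    while True:
        covered = [(r, y) for r, y in data if covers(rule, r)]
        if all(y == positive for _, y in covered):
            return rule                       # rule covers only positives
        cands = {(a, v) for r, _ in covered for a, v in r.items() if a not in rule}
        if not cands:
            return rule
        def acc(c):
            sub = [y for r, y in covered if r.get(c[0]) == c[1]]
            return sub.count(positive) / len(sub) if sub else 0.0
        a, v = max(cands, key=acc)            # add the best conjunct
        rule[a] = v

def sequential_covering(records, labels, positive):
    rules, data = [], list(zip(records, labels))
    while any(y == positive for _, y in data):
        rule = learn_one_rule(data, positive)            # Step 2
        rules.append(rule)
        new = [(r, y) for r, y in data if not covers(rule, r)]  # Step 3
        if len(new) == len(data):
            break                             # guard: no progress, stop
        data = new
    return rules

recs = [{"Refund": "No", "Status": "Single"},
        {"Refund": "Yes", "Status": "Single"},
        {"Refund": "No", "Status": "Married"}]
print(sequential_covering(recs, ["Yes", "No", "No"], "Yes"))
```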
Example of Sequential Covering
[Figure: (i) original data; (ii) Step 1; (iii) Step 2: rule R1 is found; (iv) Step 3: rules R1 and R2.]
Aspects of Sequential Covering
Rule Growing
Instance Elimination
Rule Evaluation
Stopping Criterion
Rule Pruning
Rule Growing
Two common strategies:
(a) General-to-specific: start from the empty rule { } → (Class = Yes) (covering Yes: 3, No: 4) and add the best conjunct at each step, e.g., Refund = No (Yes: 3, No: 4), Status = Single (Yes: 2, No: 1), Status = Divorced (Yes: 1, No: 0), Status = Married (Yes: 0, No: 3), Income > 80K (Yes: 3, No: 1), ...
(b) Specific-to-general: start from a positive instance, e.g., (Refund = No, Status = Single, Income = 85K) → (Class = Yes) or (Refund = No, Status = Single, Income = 90K) → (Class = Yes), and generalize to (Refund = No, Status = Single) → (Class = Yes).
Rule Growing (Examples)
CN2 algorithm:
Start from an empty conjunct: { }. Add conjuncts that minimize the entropy measure: {A}, {A, B}, ...
Determine the rule consequent by taking the majority class of the instances covered by the rule.
RIPPER algorithm (sequential covering):
Start from an empty rule: { } => class.
Add conjuncts that maximize FOIL's information gain measure:
R0: { } => class (initial rule)
R1: {A} => class (rule after adding a conjunct)
Gain(R0, R1) = t [ log (p1 / (p1 + n1)) - log (p0 / (p0 + n0)) ]
where t: number of positive instances covered by both R0 and R1
p0: number of positive instances covered by R0
n0: number of negative instances covered by R0
p1: number of positive instances covered by R1
n1: number of negative instances covered by R1
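FOIL's gain as defined above, in a short sketch (t = p1, since the specialized rule covers a subset of R0's positives):

```python
import math

def foil_gain(p0, n0, p1, n1):
    t = p1  # positives covered by both R0 and R1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example: R0 covers 100 positives / 400 negatives; adding a conjunct
# leaves 30 positives / 10 negatives -> a large positive gain.
print(foil_gain(100, 400, 30, 10))
```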
Instance Elimination
Why do we need to eliminate instances? Otherwise, the next rule is identical to the previous rule.
Why do we remove positive instances? To ensure that the next rule is different.
Why do we remove negative instances? To prevent underestimating the accuracy of the next rule. Compare rules R2 and R3 in the figure below.
[Figure: instances of class = + and class = -; rule R1 covers a cluster of positives, and R2 and R3 illustrate how removing the instances covered by R1 changes the apparent accuracy of subsequent rules.]
Stopping Criterion and Rule Pruning
Stopping criterion: compute the gain; if the gain is not significant, discard the new rule.
Rule pruning: similar to post-pruning of decision trees.
Reduced error pruning: remove one of the conjuncts in the rule, compare the error rate on the validation set before and after pruning, and if the error improves, prune the conjunct.
Indirect Methods
Rule set from the decision tree (root P; P = No → test Q; P = Yes → test R; R = Yes → test Q), each rule with class + or -:
r1: (P = No, Q = No) ==> -
r2: (P = No, Q = Yes) ==> +
r3: (P = Yes, R = No) ==> +
r4: (P = Yes, R = Yes, Q = No) ==> -
r5: (P = Yes, R = Yes, Q = Yes) ==> +
r2, r3, and r5 predict the positive class when Q = yes.
The rules can be simplified as:
r2': (Q = yes) → +
r3': (P = yes) ∧ (R = no) → +
Rule Generation: C4.5rules
Extract classification rules from every path of a decision tree.
For each rule r: A → y, consider an alternative rule r': A' → y, where A' is obtained by removing one of the conjuncts in A.
Compare the pessimistic error rate for r against all the r' rules.
Prune if one of the r' rules has a lower pessimistic error rate.
Repeat until we can no longer improve the generalization error.
Rule Ordering: C4.5rules
Uses class-based ordering, where rules that predict the same class are grouped into the same subset.
Compute the description length of each subset.
The classes are arranged in increasing order of their total description length; the class with the smallest description length is given the highest priority.
Description length = L(exception) + g * L(model)
g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5).
Example
Data: the vertebrate data set (20 records) shown earlier.
C4.5 versus C4.5rules versus RIPPER
C4.5rules:
(Give Birth = No, Can Fly = Yes) → Birds
(Give Birth = No, Live in Water = Yes) → Fishes
(Give Birth = Yes) → Mammals
(Give Birth = No, Can Fly = No, Live in Water = No) → Reptiles
( ) → Amphibians

C4.5 tree: Give Birth? (Yes → Mammals; No → Live in Water? (Yes → Fishes; Sometimes → Amphibians; No → Can Fly? (Yes → Birds; No → Reptiles)))

RIPPER:
(Live in Water = Yes) → Fishes
(Have Legs = No) → Reptiles
(Give Birth = No, Can Fly = No, Live in Water = No) → Reptiles
(Can Fly = Yes, Give Birth = No) → Birds
( ) → Mammals
Advantages of Rule-Based Classifiers
As highly expressive as decision trees
Easy to interpret
Easy to generate
Can classify new instances rapidly
Performance comparable to decision trees
Rule Evaluation
Metrics:
Accuracy = nc / n
Laplace = (nc + 1) / (n + k)
M-estimate = (nc + k p) / (n + k)
where n: number of instances covered by the rule; nc: number of instances covered by the rule that belong to the rule's class; k: number of classes; p: prior probability of the class.
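A sketch of the three metrics; note that the m-estimate reduces to the Laplace estimate when p = 1/k:

```python
def accuracy(nc, n):
    return nc / n

def laplace(nc, n, k):
    return (nc + 1) / (n + k)

def m_estimate(nc, n, k, p):
    return (nc + k * p) / (n + k)

# A rule covering 50 records, 45 of its class, with 2 classes:
print(accuracy(45, 50), laplace(45, 50, 2), m_estimate(45, 50, 2, 0.5))
# 0.9, ~0.885, ~0.885 (equal because p = 1/k here)
```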
Summary of Direct Method
Grow a single rule.
Remove instances covered by the rule.
Prune the rule (if necessary).
Add the rule to the current rule set.
Repeat.
Direct Method: RIPPER
For a 2-class problem, choose one of the classes as the positive class and the other as the negative class.
Learn rules for the positive class; the negative class is the default class.
For a multi-class problem:
Order the classes according to increasing class prevalence (the fraction of instances that belong to a particular class).
Learn the rule set for the smallest class first, treating the rest as the negative class.
Repeat with the next smallest class as the positive class.
Direct Method: RIPPER
Growing a rule:
Start from an empty rule.
Add conjuncts as long as they improve FOIL's information gain.
Stop when the rule no longer covers negative examples.
Prune the rule immediately using incremental reduced error pruning.
Measure for pruning: v = (p - n) / (p + n), where p and n are the numbers of positive and negative examples covered by the rule in the validation set.
Pruning method: delete any final sequence of conditions that maximizes v.
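A sketch of the pruning metric and the "delete a final sequence of conjuncts" step, with rules represented as ordered (attribute, value) lists:

```python
def covers(rule, record):
    return all(record.get(a) == v for a, v in rule.items())

def prune_metric(rule, val_records, val_labels, positive):
    cov = [y for r, y in zip(val_records, val_labels) if covers(rule, r)]
    p = cov.count(positive)
    n = len(cov) - p
    return (p - n) / (p + n) if cov else -1.0

def prune(conjuncts, val_records, val_labels, positive):
    # Dropping a final sequence of conjuncts leaves a prefix of the rule;
    # keep the prefix with the best v on the validation set.
    prefixes = [dict(conjuncts[:i]) for i in range(1, len(conjuncts) + 1)]
    return max(prefixes,
               key=lambda r: prune_metric(r, val_records, val_labels, positive))

conjs = [("Refund", "No"), ("Status", "Single")]
val_r = [{"Refund": "No", "Status": "Single"},
         {"Refund": "No", "Status": "Married"}]
val_y = ["Yes", "No"]
print(prune(conjs, val_r, val_y, "Yes"))  # keeps both conjuncts (v = 1)
```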
Direct Method: RIPPER
Building a rule set:
Use the sequential covering algorithm: find the best rule that covers the current set of positive examples, then eliminate both positive and negative examples covered by the rule.
Each time a rule is added to the rule set, compute the new description length; stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far.
Direct Method: RIPPER
Optimize the rule set:
For each rule r in the rule set R, consider two alternative rules:
Replacement rule (r*): grow a new rule from scratch.
Revised rule (r'): add conjuncts to extend the rule r.
Compare the rule set for r against the rule sets for r* and r'.
Choose the rule set that minimizes the MDL principle.
Repeat rule generation and rule optimization for the remaining positive examples.
Instance-Based Classifiers
[Figure: a set of stored cases with attributes Atr1 ... AtrN and class labels (A, B, B, C, A, C, B), and an unseen case with attributes Atr1 ... AtrN.]
Store the training records.
Use the training records to predict the class label of unseen cases.
Examples:
Rote-learner: memorizes the entire training data and performs classification only if the attributes of a record match one of the training examples exactly.
Nearest neighbor: uses the k closest points (nearest neighbors) to perform classification.
Nearest Neighbor Classifiers
Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.
[Figure: compute the distance from the test record to the training records, then choose the k nearest records.]
Nearest-Neighbor Classifiers
Requires three things:
The set of stored records.
A distance metric to compute the distance between records.
The value of k, the number of nearest neighbors to retrieve.
To classify an unknown record:
Compute the distance to the other training records.
Identify the k nearest neighbors.
Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote).
Definition of Nearest Neighbor
[Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor of a record x.]
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.
1-Nearest Neighbor
With k = 1, the decision boundary partitions the space into a Voronoi diagram.
Nearest Neighbor Classification
Compute the distance between two points, e.g., Euclidean distance:
d(p, q) = sqrt( sum_i (p_i - q_i)^2 )
Determine the class from the nearest-neighbor list:
Take the majority vote of the class labels among the k nearest neighbors.
Optionally weigh the vote according to distance, e.g., weight factor w = 1 / d^2.
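A minimal k-NN sketch combining the pieces above (Euclidean distance, majority vote, optional 1/d^2 weighting):

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(train, labels, x, k=3, weighted=False):
    # Take the k records closest to x, then vote.
    neighbors = sorted(zip(train, labels), key=lambda t: euclidean(t[0], x))[:k]
    votes = Counter()
    for point, label in neighbors:
        d = euclidean(point, x)
        votes[label] += 1.0 / (d * d + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]

X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
y = ["A", "A", "B", "B"]
print(knn_classify(X, y, (1.1, 0.9), k=3))  # "A"
```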
Nearest Neighbor Classification
Choosing the value of k:
If k is too small, the classifier is sensitive to noise points.
If k is too large, the neighborhood may include points from other classes.
Nearest Neighbor Classification
Scaling issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.
Example:
The height of a person may vary from 1.5 m to 1.8 m.
The weight of a person may vary from 90 lb to 300 lb.
The income of a person may vary from $10K to $1M.
Nearest Neighbor Classification
Problem with the Euclidean measure:
High-dimensional data: curse of dimensionality.
Can produce counter-intuitive results, e.g.:
1 1 1 1 1 1 1 1 1 1 1 0 vs 0 1 1 1 1 1 1 1 1 1 1 1 -> d = 1.4142
1 0 0 0 0 0 0 0 0 0 0 0 vs 0 0 0 0 0 0 0 0 0 0 0 1 -> d = 1.4142
Solution: normalize the vectors to unit length.
Nearest Neighbor Classification...
k-NN classifiers are lazy learners: they do not build models explicitly, unlike eager learners such as decision tree induction and rule-based systems.
Classifying unknown records is relatively expensive.
Example: PEBLS
PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg).
Works with both continuous and nominal features.
For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM).
Each record is assigned a weight factor.
Number of nearest neighbors: k = 1.
Distance between record X and record Y:
Delta(X, Y) = w_X * w_Y * sum_{i=1..d} d(X_i, Y_i)^2
where w_X = (number of times X is used for prediction) / (number of times X predicts correctly).
w_X is approximately 1 if X makes an accurate prediction most of the time.

Tid | Refund | Marital Status | Taxable Income | Cheat
X | Yes | Single | 125K | No
Y | No | Married | 100K | No
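A sketch of the PEBLS distances. The per-value MVDM formula did not survive extraction, so the sketch assumes the standard definition d(V1, V2) = sum_i | n1i/n1 - n2i/n2 | over classes i; the class counts for Marital Status are taken from the Tid table above:

```python
# counts_v1 / counts_v2: per-class counts for two values of one
# nominal attribute, in the same class order.
def mvdm(counts_v1, counts_v2):
    n1, n2 = sum(counts_v1), sum(counts_v2)
    return sum(abs(a / n1 - b / n2) for a, b in zip(counts_v1, counts_v2))

# Record distance with reliability weights wX, wY, as defined above.
def record_distance(wx, wy, per_attribute_distances):
    return wx * wy * sum(d * d for d in per_attribute_distances)

# Marital Status, class counts [Yes, No]: Single = [2, 2], Married = [0, 4]
single, married = [2, 2], [0, 4]
print(mvdm(single, married))  # |2/4 - 0/4| + |2/4 - 4/4| = 1.0
```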