Advanced Computing Seminar: Data Mining and Its Industrial Applications
— Chapter 4 —
Inductive Learning
Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr
Knowledge and Software Engineering Lab, Advanced Computing Research Centre
School of Computer and Information Science, University of South Australia
Outline
Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary
Basic Concepts
Data: stored on some medium in a certain format
Information: meaning assigned to concrete data
Knowledge: refined from information
Why Data Mining?
Rich data, poor knowledge: data → knowledge → decision making.
Knowledge takes forms such as patterns, trends, concepts, relations, models, associations, rules, and sequences.
Application areas: e-commerce, resource distribution, trade, business intelligence, e-science, finance, economics, government, post, population, life cycle.
Data Mining vs Knowledge Discovery
Data mining: extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
Also known as: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Data Mining: A KDD Process
Data mining—core of knowledge discovery process
Pipeline: databases → data cleaning and data integration → data warehouse → selection of task-relevant data → data mining → pattern evaluation
Data Warehouse Process
Organization readiness assessment → business strategy definition → data warehouse architecture definition → data warehouse infrastructure design → design and build → implementation → data exploitation
• Meta data management
• Data access
• Systems integration
Macro Picture
Data Mining Approach to Data Warehouse Design
Desired star schema
Attribute: width, type, NULL allowed, name, key
Numeric: maximum, minimum, average, standard deviation
Text fields: number of spaces, numerals used, average length
Output: designed star schema and mapping rules
Detailed picture
Diagram: information sources feed an extractor; a similarity calculator and an attribute classifier support an integrator; a translator maps the desired star schema to the designed star schema plus mapping rules.
Knowledge Representation
Production system
Frame
Semantic networks
First-order logic
Ontology
Production System
Rules: IF (conditions) THEN (conclusions)
Example: IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
Production System
MYCIN
<rule> ::= IF <antecedent> THEN <action> (ELSE <action>)
<antecedent> ::= AND <condition>
<condition> ::= OR <condition> | <predicate> <associative-triple>
<associative-triple> ::= <attribute> <object> <value>
<action> ::= <consequent> | <procedure>
<consequent> ::= <associative-triple> <certainty-factor>
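The associative-triple format maps naturally onto a small data structure. A minimal Python sketch, not MYCIN itself: the rule content, the working-memory layout, and the certainty-factor handling are all illustrative assumptions.

# A rule pairs antecedent triples (attribute, object, value) with a
# consequent triple and a certainty factor (CF).
RULES = [
    {"if": [("has", "animal", "wings"), ("can", "animal", "fly")],
     "then": ("is-a", "animal", "bird"), "cf": 0.8},
]

def fire(rules, facts):
    # Forward-chain once: add each consequent whose antecedents are all known.
    derived = dict(facts)
    for rule in rules:
        if all(t in derived for t in rule["if"]):
            derived[rule["then"]] = rule["cf"]
    return derived

facts = {("has", "animal", "wings"): 1.0, ("can", "animal", "fly"): 1.0}
print(fire(RULES, facts))  # adds ('is-a', 'animal', 'bird') with CF 0.8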
Frame Structure
FRAME FRAME-NAME
SLOT-NAME-1: ASPECT-11 ASPECT-VALUE-11
             ASPECT-12 ASPECT-VALUE-12
             ......
             ASPECT-1m ASPECT-VALUE-1m
......
SLOT-NAME-n: ASPECT-n1 ASPECT-VALUE-n1
             ASPECT-n2 ASPECT-VALUE-n2
             ......
             ASPECT-nm ASPECT-VALUE-nm
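A frame maps directly onto nested dictionaries (slot → aspect → value). A minimal sketch; the "bird" frame and its slots are invented for illustration.

# A frame as nested dictionaries: slot -> aspect -> value.
bird_frame = {
    "frame-name": "bird",
    "slots": {
        "is-a":       {"value": "animal"},
        "covering":   {"value": "feathers"},
        "locomotion": {"value": "fly", "default": "fly"},
    },
}

def get_aspect(frame, slot, aspect):
    return frame["slots"][slot].get(aspect)

print(get_aspect(bird_frame, "locomotion", "value"))  # fly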
Semantic Networks
Nodes represent objects; arcs represent relationships.
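A semantic network is just a labeled directed graph. A minimal sketch; the nodes and relations are invented for illustration.

# Arcs as (source, relation, target) triples.
edges = [
    ("canary", "is-a", "bird"),
    ("bird", "is-a", "animal"),
    ("canary", "color", "yellow"),
]

def related(node, relation):
    # Follow arcs with the given label out of a node.
    return [dst for src, rel, dst in edges if src == node and rel == relation]

print(related("canary", "is-a"))  # ['bird']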
First Order Logic
Facts: Student(John), Teacher(Markus), Father(x, y), Father(y, z)
Rule: Grandfather(x, z) :- Father(x, y), Father(y, z)
The earlier production rule in logical form: IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
Ontology
Semantic Web:
Ontology
OWL
Ontology schema
Description logic
Outline
Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary
The Essence of Learning
Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time. [Simon 1983]
Machine learning is the study of how to make machines acquire new knowledge, new skills, and reorganize existing knowledge.
The Essence of Learning
The environment supplies the source information to the learning system. The level and quality of the information will significantly affect the learning strategy.
Diagram: Environment → Learning Element → Knowledge Base → Performance Element, with feedback from the performance element back to the learning element.
The Essence of Learning
The environment = information source:
Databases
Text
Web pages
Images
Video
Spatial data
The Essence of Learning
The learning element uses this information to make improvements in an explicit knowledge base, and the performance element uses the knowledge base to perform its task.
Inductive learning
Analogical learning
Explanation-based learning
Genetic algorithms
Neural networks
Paradigms for Machine Learning
The inductive paradigm: The most widely studied method for symbolic learning is inducing a general concept description from a sequence of instances of the concept and known counterexamples. The task is to build a concept description from which all the previous positive instances can be rederived by universal instantiation, but none of the previous negative instances can be rederived by the same process.
The analogical paradigm: Analogical reasoning is a strategy of inference that allows the transfer of knowledge from a known area into another area with similar properties.
Paradigms for Machine Learning
The analytic paradigm: These methods attempt to formulate a generalization after analyzing a few instances in terms of the system's knowledge. Mainly deductive rather than inductive mechanisms are used for such learning.
The genetic paradigm: Genetic algorithms are inspired by a direct analogy to mutation in biological reproduction and Darwinian natural selection. In principle, genetic algorithms encode a parallel search through concept space, with each process attempting coarse-grain hill climbing.
The connectionist paradigm: Connectionist learning systems, also called "neural networks", readjust weights in a fixed-topology network via specific learning algorithms.
The Essence of Learning
The knowledge base contains predefined concepts, domain constraints, heuristic rules, and so on.
Knowledge representation
Knowledge consistency
Knowledge redundancy
The Essence of Learning
The performance element. The learning element is trying to improve the action of the performance element. The performance element applies knowledge to solve problems and evaluate the learning effects.
On Concept
The term "concept" is a universal notion that reflects general, abstract, and essential features. For example, "triangle", "animal", and "computer" are all concepts. Horse, tiger, bird, and so on are called examples of the concept "animal".
A concept has two aspects, extension and intension.
Intension: the set of attributes which reflect the essential features of a concept.
Extension: the set of examples which satisfy the definition of a concept.
Examples: fruit, student.
Concept Description
In general, a concept can be described by the concept name and a list of attribute-value pairs, that is:
(Concept-name (Attribute-1 Value-1) (Attribute-2 Value-2) ... (Attribute-n Value-n))
In addition, a concept description can be represented in first-order logic: each attribute is a predicate, and the concept name and attribute values can be viewed as arguments; the concept description is then a formula of predicate calculus.
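Concretely, the attribute-value form is a name plus a list of pairs, and an instance satisfies the concept when it carries every listed value. A minimal sketch; the "apple" concept and its attributes are invented for illustration.

concept = ("apple", [("color", "red"), ("shape", "round"), ("taste", "sweet")])

def satisfies(instance, concept):
    # An instance satisfies a concept if it matches every attribute-value pair.
    _, pairs = concept
    return all(instance.get(attr) == value for attr, value in pairs)

print(satisfies({"color": "red", "shape": "round", "taste": "sweet"}, concept))  # True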
Attribute Types
Nominal attribute: takes on a finite, unordered set of mutually exclusive values
Linear attribute
Structured attribute
Attribute Types
Nominal attribute: takes on a finite, unordered set of mutually exclusive values. For example:
• Color: red, green, blue
• Traffic: airline, railway, ship
Attribute Types
Linear attribute. For example:
• Age: 1, 2, ..., 100
• Temperature: 20, 21, ...
• Distance: 1 km, 2 km, ...
Attribute Types
Structured attribute. For example, a tree structure:
computer → Hardware, Software
Hardware → CPU, Memory
CPU → Computing, Control
Inductive Learning
From particular examples to a general conclusion, principle, or rule.
Example: apples can be eaten; tomatoes can be eaten; bananas can be eaten; ... ⇒ fruit can be eaten.
Inductive Learning
Given:
• Premise statements: facts, specific observations, and intermediate generalizations that provide information about some objects, phenomena, processes, and so on.
• Tentative inductive assertion: an a priori hypothesis about the objects in the premise statements.
• Background knowledge: general and domain-specific concepts for interpreting the premises, and inference rules relevant to the task of inference.
Find: an inductive assertion (hypothesis) that strongly or weakly implies the premise statements in the context of the background knowledge and satisfies the preference criterion.
Inductive Learning
• Simplest form: learn a function from examples
• f is the target function
• An example is a pair (x, f(x))
• Problem: find a hypothesis h such that h ≈ f, given a training set of examples
(This is a highly simplified model of real learning: it ignores prior knowledge and assumes examples are given.)
Inductive Learning Method
• Construct/adjust h to agree with f on the training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting
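The curve-fitting figures from these slides are not reproduced here; a minimal numpy sketch of the same idea, with invented data points: each polynomial degree is a different hypothesis h, and higher degrees fit the training set more closely (at the risk of overfitting).

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1])  # roughly linear, with noise

for degree in (1, 3):
    coeffs = np.polyfit(x, y, degree)                  # construct hypothesis h
    max_err = np.abs(np.polyval(coeffs, x) - y).max()  # agreement with f on training set
    print(degree, max_err)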
Best-Hypothesis Search
Positive example → generalize; negative example → specialize.
Drawbacks: must re-check previous examples and backtrack.
Outline
Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary
Hypothesis Space
Concept description
Extension: the set of examples predicted to be satisfied by the hypothesis
Bias: any preference for one hypothesis over another
Training Examples for Enjoy Sport
Sky Temp Humidity Wind Water Forecast EnjoySport
Sunny  Warm  Normal  Strong  Warm  Same    YES
Sunny  Warm  High    Strong  Warm  Same    YES
Rainy  Cold  High    Strong  Warm  Change  NO
Sunny  Warm  High    Strong  Cool  Change  YES
What is the general concept?
The more_general_than_or_equal_to relation
Definition: let hj and hk be boolean-valued functions defined over X. Then hj is more_general_than_or_equal_to hk (hj ≥g hk) iff
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
In our case the most general hypothesis, that every day is a positive example, is represented by ⟨?, ?, ?, ?, ?, ?⟩, and the most specific possible hypothesis, that no day is a positive example, is represented by ⟨∅, ∅, ∅, ∅, ∅, ∅⟩.
Example of the Ordering of Hypotheses
Version Space Search
Version Space Example
Representing Version Space
The general boundary G of version space VS_{H,E} is the set of its maximally general members.
The specific boundary S of version space VS_{H,E} is the set of its maximally specific members.
Every member of the version space lies between these boundaries:
VS_{H,E} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)}
where x ≥g y means x is more general than or equal to y.
Candidate-elimination algorithm
1. Initialize H to be the whole space. Thus the G set contains only the null description, and the S set is initialized from the first observed positive training instance.
2. For each subsequent instance i:
   IF i is a positive instance, THEN
     retain in G only those generalizations which match i;
     update S to generalize its elements as little as possible, so that they match i.
   ELSE IF i is a negative instance, THEN
     retain in S only those generalizations which do not match i;
     update G to specialize its elements as little as possible, so that they do not match i.
3. Repeat step 2 until G = S and this is a singleton set. When this occurs, H has collapsed to include only a single concept.
4. Output H.
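A minimal Python sketch of candidate elimination for this conjunctive attribute-vector language. It simplifies the full algorithm: S is kept as a single hypothesis, and G is specialized using values taken from S, which suffices for this hypothesis language.

def matches(h, x):
    # '?' matches anything; otherwise the values must be equal.
    return all(hv in ("?", xv) for hv, xv in zip(h, x))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = [None] * n                # maximally specific (None = 'empty')
    G = [tuple(["?"] * n)]        # maximally general
    for x, positive in examples:
        if positive:
            G = [g for g in G if matches(g, x)]   # keep generalizations matching x
            S = [xv if sv in (None, xv) else "?"  # minimally generalize S
                 for sv, xv in zip(S, x)]
        else:
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                    continue
                for i in range(n):                # minimal specializations
                    if g[i] == "?" and S[i] not in ("?", None, x[i]):
                        new_G.append(tuple(S[i] if j == i else g[j]
                                           for j in range(n)))
            G = new_G
        G = [g for g in G if matches(g, S)]       # G must stay above S
    return S, G

examples = [  # the EnjoySport training data from the earlier slide
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples)
print(S)  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]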
Converging Boundaries of the G and S sets
Example Trace (1)
Example Trace (2)
Example Trace (3)
Example Trace (4)
How to Classify New Instances?
A new instance i is classified as positive if every hypothesis in the current version space classifies it as positive.
Efficient test: iff the instance satisfies every member of S.
A new instance i is classified as negative if every hypothesis in the current version space classifies it as negative.
Efficient test: iff the instance satisfies none of the members of G.
New Instances to be Classified
A: Sunny, Warm, Normal, Strong, Cool, Change (YES)
B: Rainy, Cold, Normal, Light, Warm, Same (NO)
C: Sunny, Warm, Normal, Light, Warm, Same (Ppos(C) = 3/6)
D: Sunny, Cold, Normal, Strong, Warm, Same (Ppos(D) = 2/6)
Remarks on Version Space and Candidate-Elimination
The algorithm outputs the set of all hypotheses consistent with the training examples, provided that:
there are no errors in the training data
there is some hypothesis in H that correctly describes the target concept
The target concept is exactly learned when the S and G boundary sets converge to a single identical hypothesis.
Applications:
learning regularities in chemical mass spectroscopy
learning control rules for heuristic search
Drawbacks of Version Space
Assumes consistent training data; noise-sensitive.
Comments: though not practical in most real-world learning problems, version spaces provide a good deal of insight into the logical structure of hypothesis space.
Version-Space Merging
Diagram: version spaces VS1 (boundaries S1, G1) and VS2 (boundaries S2, G2) intersect to give the merged version space VS1∩2, with boundaries S1∩2 and G1∩2.
Version-Space Merging
Conceptual: each new piece of information yields a new version space.
Practical: supports parallelism; handles ambiguous and inconsistent data and background domain theories.
Diagram: version spaces VS1, ..., VSn merge into VSM.
IVSM Examples
Attribute hierarchies used in the examples:
any-shape → Polyhedron, Spheroid; Polyhedron → Cube, Pyramid, Octoploid
any-size → Large, Small
IVSM Examples
Worked table (columns: Example, Instance S, Instance G, Resulting S, Resulting G): the positive example [S,C] yields S = {[S,C]} and G = {[?,?]}; negative examples such as [S,Sp] and [L,O] leave S at [S,C] and specialize G (e.g. to [?,Po], then [?,C] and [S,Py]); a later positive example [S,P] generalizes S again.
Bias
Definition:
any basis for choosing one generalization over another
any factor that influences the definition or selection of inductive hypotheses
Representational bias: language, language implementation, primitive terms
Procedural (algorithmic) bias: order of traversal of the states in the space defined by a representational bias
Bias
Diagram: bias, search knowledge, and the training set feed the learning program, which processes the training examples and outputs a hypothesis.
Bias Selection & Evaluation
Real-world domains have potentially hundreds of features and sources of data
Why is bias selection important?
to improve the predictive accuracy of the learner
to improve other performance goals
Selection: static vs. dynamic
Evaluation (the basis for bias selection): online and empirical vs. offline and analytical
Multi-Tiered Bias System
Bias shifting: bias selection occurs again after learning has begun; useful when the knowledge for bias selection is not available prior to learning but can be gathered during learning
Multi-tiered bias:
make embedded biases explicit!
reduce the cost of system and knowledge engineering
flexible system design, conceptual simplicity
Characterize learning as search within multiple tiers!
Multi-Tiered Bias Search Space
Diagram: the hypothesis space H sits within a representational bias space L(H) and a procedural bias space P(L(H)); these in turn sit within representational and procedural meta-bias spaces such as L(L(H)), L(P(L(H))), P(L(L(H))), and P(L(P(L(H)))).
Outline
Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary
Decision Tree Learning
1966 Hunt, Marin, Stone: CLS
1983 Quinlan: ID3
1986 Schlimmer, Fisher: ID4 (incremental learning)
1988 Utgoff: ID5
1993 Quinlan: C4.5, C5
Play tennis: Training examples
Day  Outlook   Temperature  Humidity  Wind    Play Tennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
CLS learning algorithm
Decision tree:
each internal node tests an attribute
each branch corresponds to an attribute value
each leaf node assigns a classification
Decision trees are inherently disjunctive, since each branch leaving a decision node corresponds to a separate disjunctive case; they can therefore be used to represent disjunctive concepts.
CLS learning algorithm
The CLS algorithm starts with an empty decision tree and gradually refines it, by adding decision nodes, until the tree correctly classifies all the training instances. The algorithm operates over a set of training instances, C, as follows:
If all instances in C are positive, create a YES node and halt. If all instances in C are negative, create a NO node and halt. Otherwise, select (using some heuristic criterion) an attribute A with values v1, ..., vn and create a decision node for it.
Partition the training instances in C into subsets C1, ..., Cn according to the values of A.
Apply the algorithm recursively to each of the sets Ci.
ID3 Approach
ID3 algorithm:
builds a decision tree from training objects with known class labels, then classifies testing objects
ranks attributes with an information gain measure
seeks minimal height: the least number of tests needed to classify an object
Decision Tree Representation
Internal node: test on some property (attribute)
Branch: corresponds to an attribute value
Leaf node: assigns a classification
Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Decision Tree Example
Appropriate problems for decision Trees
Instances are represented by attribute-value pairs
Target function has discrete output values Disjunctive hypothesis may be required Possibly noisy training data
data may contain errors data may contain missing attribute
values
Learning of Decision Trees: Top-Down Induction of Decision Trees
Algorithm: the ID3 learning algorithm (Quinlan, 1986)
If all examples from E belong to the same class Cj, then label the leaf with Cj; else:
select the "best" decision attribute A with values v1, v2, ..., vn for the next node
divide the training set S into S1, ..., Sn according to values v1, ..., vn
recursively build subtrees T1, ..., Tn for S1, ..., Sn
generate decision tree T
Entropy
S: a sample of training examples; p+ (p-) is the proportion of positive (negative) examples in S
Entropy(S) = expected number of bits needed to encode the classification of an arbitrary member of S
Information theory: an optimal-length code assigns -log2 p bits to a message having probability p
Expected number of bits to encode "+" or "-" of a random member of S:
Entropy(S) = -p+ log2 p+ - p- log2 p-
Generally, for c different classes:
Entropy(S) = -Σ_{i=1}^{c} pi log2 pi
Entropy
The entropy function relative to a boolean classification, as the proportion of positive examples varies between 0 and 1.
Entropy is a measure of impurity in a collection of examples.
Information Gain Search Heuristic
Gain(S, A): the expected reduction in entropy caused by partitioning the examples of S according to the attribute A; a measure of the effectiveness of an attribute in classifying the training data.
Values(A): the possible values of attribute A; Sv: the subset of S for which attribute A has value v.
The best attribute has maximal Gain(S, A). The aim is to minimise the number of tests needed for classification.
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)
Play Tennis: Information Gain
Values(Wind) = {Weak, Strong}
S = [9+, 5-], E(S) = 0.940
Sweak = [6+, 2-], E(Sweak) = 0.811
Sstrong = [3+, 3-], E(Sstrong) = 1.0
Gain(S, Wind) = E(S) - (8/14) E(Sweak) - (6/14) E(Sstrong)
              = 0.940 - (8/14) 0.811 - (6/14) 1.0 = 0.048
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Temperature) = 0.029
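A minimal Python check of these numbers, with the counts taken straight from the play-tennis table:

import math

def entropy(pos, neg):
    # Entropy of a two-class sample given positive/negative counts.
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            e -= p * math.log2(p)
    return e

e_s = entropy(9, 5)  # S = [9+, 5-]
gain_wind = e_s - (8 / 14) * entropy(6, 2) - (6 / 14) * entropy(3, 3)
print(round(e_s, 3), round(gain_wind, 3))  # 0.94 0.048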
Entropy and Information Gain
S contains si tuples of class Ci for i = {1, …, m}
Information measures info required to classify any arbitrary tuple
Entropy of attribute A with values {a1,a2,…,av}
Information gained by branching on attribute A
I(s1, s2, ..., sm) = -Σ_{i=1}^{m} (si / s) log2(si / s)
E(A) = Σ_{j=1}^{v} ((s1j + ... + smj) / s) I(s1j, ..., smj)
Gain(A) = I(s1, s2, ..., sm) - E(A)
The ID3 Algorithm
function ID3(R: a set of non-categorical attributes,
             C: the categorical attribute,
             S: a training set) returns a decision tree;
begin
  If S is empty, return a single node with value Failure;
  If S consists of records all with the same value for the categorical
    attribute, return a single node with that value;
  If R is empty, return a single node with as value the most frequent of the
    values of the categorical attribute found in records of S
    [note that then there will be errors, that is, records that will be
    improperly classified];
  Let D be the attribute with largest Gain(D, S) among attributes in R;
  Let {dj | j = 1, 2, ..., m} be the values of attribute D;
  Let {Sj | j = 1, 2, ..., m} be the subsets of S consisting respectively of
    records with value dj for attribute D;
  Return a tree with root labeled D and arcs labeled d1, d2, ..., dm going
    respectively to the trees
    ID3(R - {D}, C, S1), ID3(R - {D}, C, S2), ..., ID3(R - {D}, C, Sm);
end ID3;
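The pseudocode translates almost line for line into Python. A minimal runnable sketch, with records as dictionaries and entropy-based Gain as defined earlier; the five-row dataset is a subset of the play-tennis table, used only for illustration.

import math
from collections import Counter

def entropy(rows, cls):
    counts = Counter(r[cls] for r in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def gain(rows, attr, cls):
    rem = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        rem += len(subset) / len(rows) * entropy(subset, cls)
    return entropy(rows, cls) - rem

def id3(rows, attrs, cls):
    if not rows:
        return "Failure"                       # S is empty
    classes = {r[cls] for r in rows}
    if len(classes) == 1:
        return classes.pop()                   # all records share one class
    if not attrs:                              # R is empty: majority class
        return Counter(r[cls] for r in rows).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a, cls))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, cls)
                   for v in {r[best] for r in rows}}}

rows = [
    {"Outlook": "Sunny",    "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Outlook": "Sunny",    "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Overcast", "Humidity": "High",   "Wind": "Strong", "Play": "Yes"},
    {"Outlook": "Rain",     "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},
    {"Outlook": "Rain",     "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
]
print(id3(rows, ["Outlook", "Humidity", "Wind"], "Play"))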
C4.5
c4.5 is a program that creates a decision tree based on a set of labeled input data. This decision tree can then be tested against unseen labeled test data to quantify how well it generalizes.
The software for C4.5 can be obtained with Quinlan's book. A wide variety of training and test data is available, some provided by Quinlan.
J. R. Quinlan now works at the RuleQuest Research company; See5/C5.0 has been designed to operate on large databases and incorporates innovations such as boosting.
C4.5
C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan to address the following issues not dealt with by ID3:
Avoiding overfitting the data:
determining how deeply to grow a decision tree
reduced-error pruning
rule post-pruning
Handling continuous attributes (e.g., temperature)
Choosing an appropriate attribute selection measure
Handling training data with missing attribute values
Handling attributes with differing costs
Improving computational efficiency
Running c4.5
On cunix.columbia.edu:
~amr2104/c4.5/bin/c4.5 -u -f filestem
c4.5 expects to find 3 files:
filestem.names
filestem.data
filestem.test
File Format: .names
The file begins with a comma-separated list of classes ending with a period, followed by a blank line. E.g., >50K, <=50K.
The remaining lines have the following format (note the end-of-line period):
attribute: {ignore, discrete n, continuous, list of discrete values}.
Example: census.names
>50K, <=50K.

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, etc.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, etc.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, etc.
occupation: Tech-support, Craft-repair, Other-service, Sales, etc.
relationship: Wife, Own-child, Husband, Not-in-family, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, etc.
File Format: .data, .test
Each line in these data files is a comma-separated list of attribute values ending with a class label followed by a period.
The attributes must be in the same order as described in the .names file.
Unavailable values can be entered as '?'.
When creating test sets, make sure that you remove those data points from the training data.
Example: adult.test
25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K.
38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States, <=50K.
28, Local-gov, 336951, Assoc-acdm, 12, Married-civ-spouse, Protective-serv, Husband, White, Male, 0, 0, 40, United-States, >50K.
44, Private, 160323, Some-college, 10, Married-civ-spouse, Machine-op-inspct, Husband, Black, Male, 7688, 0, 40, United-States, >50K.
18, ?, 103497, Some-college, 10, Never-married, ?, Own-child, White, Female, 0, 0, 30, United-States, <=50K.
34, Private, 198693, 10th, 6, Never-married, Other-service, Not-in-family, White, Male, 0, 0, 30, United-States, <=50K.
29, ?, 227026, HS-grad, 9, Never-married, ?, Unmarried, Black, Male, 0, 0, 40, United-States, <=50K.
63, Self-emp-not-inc, 104626, Prof-school, 15, Married-civ-spouse, Prof-specialty, Husband, White, Male, 3103, 0, 32, United-States, >50K.
24, Private, 369667, Some-college, 10, Never-married, Other-service, Unmarried, White, Female, 0, 0, 40, United-States, <=50K.
55, Private, 104996, 7th-8th, 4, Married-civ-spouse, Craft-repair, Husband, White, Male, 0, 0, 10, United-States, <=50K.
65, Private, 184454, HS-grad, 9, Married-civ-spouse, Machine-op-inspct, Husband, White, Male, 6418, 0, 40, United-States, >50K.
36, Federal-gov, 212465, Bachelors, 13, Married-civ-spouse, Adm-clerical, Husband, White, Male, 0, 0, 40, United-States, <=50K.
c4.5 Output
The decision tree proper, with (weighted training examples / weighted training errors) at each leaf.
Tables of training error and testing error.
Confusion matrix.
You'll want to pipe the output of c4.5 to a text file for later viewing, e.g.:
c4.5 -u -f filestem > filestem.results
Example output
capital-gain > 6849 : >50K (203.0/6.2)
| capital-gain <= 6849 :
| | capital-gain > 6514 : <=50K (7.0/1.3)
| | capital-gain <= 6514 :
| | | marital-status = Married-civ-spouse: >50K (18.0/1.3)
| | | marital-status = Divorced: <=50K (2.0/1.0)
| | | marital-status = Never-married: >50K (0.0)
| | | marital-status = Separated: >50K (0.0)
| | | marital-status = Widowed: >50K (0.0)
| | | marital-status = Married-spouse-absent: >50K (0.0)
| | | marital-status = Married-AF-spouse: >50K (0.0)
Tree saved
Example output
Evaluation on training data (4660 items):

  Before Pruning          After Pruning
  Size   Errors           Size   Errors       Estimate
  1692   366 ( 7.9%)      92     659 (14.1%)  (16.0%)  <<

Evaluation on test data (2376 items):

  Before Pruning          After Pruning
  Size   Errors           Size   Errors       Estimate
  1692   421 (17.7%)      92     354 (14.9%)  (16.0%)  <<

   (a)    (b)    <- classified as
  ----   ----
   328    251    (a): class >50K
   103   1694    (b): class <=50K
k-fold Cross Validation
Start with one large data set.
Using a script, randomly divide this data set into k sets.
At each iteration, use k-1 sets to train the decision tree and the remaining set to test the model.
Repeat this k times and take the average testing error.
The average error describes how well the learning algorithm can be applied to the data set.
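A minimal Python sketch of the procedure; train_and_test stands in for whatever learner and error measure are being evaluated, an assumed callback rather than anything given in the slides.

import random

def k_fold_error(data, k, train_and_test):
    # train_and_test(train, test) -> test error for one fold.
    data = data[:]                       # copy before shuffling
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        errors.append(train_and_test(train, test))
    return sum(errors) / k               # average testing error

# Usage: avg_error = k_fold_error(dataset, 10, my_train_and_test)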
Outline
Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary
Inductive Learning
Inductive "learning from examples."
Diagram: training examples (data-case 1 : decision i1; data-case 2 : decision i2; ...; data-case n : decision in) feed an inductive learning unit, which outputs decision rules (pattern 1 → decision j1; pattern 2 → decision j2; ...; pattern n → decision jn).
Ripper
Ripper (Repeated Incremental Pruning to Produce Error Reduction)
The Ripper algorithm was proposed by Cohen in 1995.
Ripper consists of two phases: the first determines the initial rule set, and the second performs post-process rule optimization.
Ripper separate-and-conquer rule learning algorithm. First the
training data are divided into a growing set and a pruning set. Then this algorithm generates a rule set in a greedy fashion, a rule at a time. While generating a rule Ripper searches the most valuable rule for the current growing set in rule space which can be defined in the form of BNF. Immediately after a rule is extracted on growing set, it is pruned on pruning set. After pruning, the corresponding examples covered by that rule in the training set (growing and pruning sets) are deleted. The remaining training data are re-partitioned after each rule is learned in order to help stabilize any problems caused by a “bad-split”. This process is repeated until the terminal conditions satisfy.
Ripper
procedure Rule_Generating(Pos, Neg)
begin
  Ruleset := {}
  while Pos ≠ {} do
    /* grow and prune a new rule */
    split (Pos, Neg) into (GrowPos, GrowNeg) and (PrunePos, PruneNeg)
    Rule := GrowRule(GrowPos, GrowNeg)
    Rule := PruneRule(Rule, PrunePos, PruneNeg)
    if the termination conditions are satisfied then
      return Ruleset
    else
      add Rule to Ruleset
      remove examples covered by Rule from (Pos, Neg)
    endif
  endwhile
  return Ruleset
end
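A simplified separate-and-conquer skeleton in the spirit of this pseudocode. It is not Cohen's full RIPPER: the MDL-based stopping test is replaced by a crude check, the pruning step is omitted, and GrowRule is a naive greedy conjunction builder over (attribute, value) conditions.

import random

def covers(rule, row):
    return all(row[a] == v for a, v in rule)

def grow_rule(grow_pos, grow_neg, attrs):
    # Greedily add the condition that best separates positives from negatives.
    rule, pos, neg, attrs = [], list(grow_pos), list(grow_neg), list(attrs)
    while pos and neg and attrs:
        best = max(((a, v) for a in attrs for v in {r[a] for r in pos}),
                   key=lambda c: sum(r[c[0]] == c[1] for r in pos)
                               - sum(r[c[0]] == c[1] for r in neg))
        rule.append(best)
        pos = [r for r in pos if covers([best], r)]
        neg = [r for r in neg if covers([best], r)]
        attrs = [a for a in attrs if a != best[0]]
    return rule

def rule_generating(pos, neg, attrs):
    ruleset = []
    while pos:
        random.shuffle(pos); random.shuffle(neg)
        cut_p, cut_n = 2 * len(pos) // 3, 2 * len(neg) // 3
        rule = grow_rule(pos[:cut_p], neg[:cut_n], attrs)
        # PruneRule on (pos[cut_p:], neg[cut_n:]) is omitted in this sketch.
        if not rule or not any(covers(rule, r) for r in pos):
            break                  # crude stand-in for the termination test
        ruleset.append(rule)
        pos = [r for r in pos if not covers(rule, r)]
        neg = [r for r in neg if not covers(rule, r)]
    return ruleset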
Ripper
After each rule is added to the rule set, the total description length (an integer) of the rule set is computed. The description length gives a measure of the complexity and accuracy of a rule set. The termination conditions are satisfied when there are no positive examples left, or the description length of the current rule set exceeds the user-specified threshold.
Ripper
Post-process rule optimization: Ripper uses post-pruning techniques to optimize the rule set. This optimization is performed on any remaining positive examples. Re-optimizing the resulting rule set is called RIPPER2, and the general case of re-optimizing k times is called RIPPERk.
Outline
Introduction
Machine learning
Version space and bias
Decision tree learning
Ripper algorithm
Summary
Summary
Inductive learning is an important approach for data mining
Version space can be used to explain generalization and specialization
ID3 and C4.5 learn decision trees
The Ripper algorithm generates efficient rules
References
Zhongzhi Shi. Principles of Machine Learning. International Academic Publishers, 1992.
Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2000.
Zhongzhi Shi. Knowledge Discovery. Tsinghua University Press, 2002.
H. Liu and H. Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, 1998.
R. S. Michalski. A theory and methodology of inductive learning. In Michalski et al., editors, Machine Learning: An Artificial Intelligence Approach, Vol. 1, Morgan Kaufmann, 1983.
T. M. Mitchell. Version spaces: A candidate elimination approach to rule learning. IJCAI'77, Cambridge, MA.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
www.intsci.ac.cn/shizz/
Questions?!