Learning Cross-level Certain and Possible Rules by Rough Sets
Tzung-Pei Hong†**, Chun-E Lin ‡, Jiann-Horng Lin ‡, Shyue-Liang Wang*
†Department of Electrical Engineering National University of Kaohsiung Kaohsiung, 811, Taiwan, R.O.C.
[email protected] ‡Institute of Information Management
I-Shou University Kaohsiung, 840, Taiwan, R.O.C.
[email protected], [email protected] *Department of Computer Science New York Institute of Technology
1855 Broadway, New York 10023, U.S.A. [email protected]
Abstract
Machine learning can extract desired knowledge and ease the development
bottleneck in building expert systems. Among the proposed approaches, deriving rules
from training examples is the most common. Given a set of examples, a learning
program tries to induce rules that describe each class. Recently, the rough-set theory
has been widely used in dealing with data classification problems. Most of the
previous studies on rough sets focused on deriving certain rules and possible rules on
the single concept level. Data with hierarchical attribute values are, however,
commonly seen in real-world applications. This paper thus attempts to propose a new
learning algorithm based on rough sets to find cross-level certain and possible rules
from training data with hierarchical attribute values. It is more complex than learning
rules from training examples with single-level values, but may derive more general
knowledge from data. Boundary approximations, instead of upper approximations, are
used to find possible rules, thus reducing some subsumption checking. Some pruning
heuristics are also adopted in the proposed algorithm to avoid unnecessary search.
Keywords: machine learning, rough set, certain rule, possible rule, hierarchical value.
-----------------------------------
**Corresponding author.
1. Introduction
Expert systems have been widely used in domains where mathematical models
cannot easily be built, human experts are not available or the cost of querying an
expert is high. Although a wide variety of expert systems have been built, knowledge
acquisition remains a development bottleneck. Usually, a knowledge engineer is
needed to establish a dialog with a human expert and to encode the knowledge elicited
into a knowledge base to produce an expert system. The process is, however, very
time-consuming [1][2]. Building a large-scale expert system involves creating and
extending a large knowledge base over the course of many months or years. Hence,
shortening the development time is a key factor in the success of an
expert system. Machine-learning techniques have thus been developed to ease the
knowledge-acquisition bottleneck. Among the proposed approaches, deriving rules
from training examples is the most common [5][6][9][10][11][15]. Given a set of
examples, a learning program tries to induce rules that describe each class.
Recently, the rough-set theory has been used in reasoning and knowledge
acquisition for expert systems [3]. It was proposed by Pawlak in 1982 [12] with the
concept of equivalence classes as its basic principle. Several applications and
extensions of the rough-set theory have also been proposed. Examples are
Lambert-Torres et al.’s knowledge-base reduction [7], Zhong et al.'s rule discovery
[18], Lee et al.'s hierarchical granulation structure [8], and Tsumoto's attribute-oriented
generalization [17]. Because of the success of the rough-set theory in knowledge
acquisition, many researchers in the machine-learning fields are very interested in this
research topic since it offers opportunities to discover useful information from
training examples.
Most of the previous studies on rough sets focused on deriving certain rules and
possible rules on the single concept level. Hierarchical attribute values are, however,
usually predefined in real-world applications. Deriving rules on multiple concept
levels may thus lead to the discovery of more general and important knowledge from
data. It is, however, more complex than learning rules from training examples with
single-level values. In this paper, we thus propose a new learning algorithm based on
rough sets to find cross-level certain and possible rules from training data with
hierarchical attribute values. Boundary approximations, instead of upper
approximations, are used to find possible rules, thus reducing some subsumption
checking. Some pruning heuristics are also used in the proposed algorithm to avoid
unnecessary search.
The remainder of this paper is organized as follows. The rough-set theory is
briefly reviewed in Section 2. Management of hierarchical attribute values by rough
sets is described in Section 3. The notation and definitions used in this paper are given
in Section 4. A new learning algorithm based on the rough-set theory to induce
cross-level certain and possible rules is proposed in Section 5. An example to
illustrate the proposed algorithm is given in Section 6. Some discussion is provided in
Section 7. Conclusions and future work are finally given in Section 8.
2. Review of the Rough-Set Theory
The rough-set theory, proposed by Pawlak in 1982 [12][14], can serve as a new
mathematical tool for dealing with data classification problems. It adopts the concept
of equivalence classes to partition training instances according to some criteria. Two
kinds of partitions are formed in the mining process: lower approximations and upper
approximations, from which certain and possible rules can easily be derived.
Formally, let U be a set of training examples (objects), A be a set of attributes
describing the examples, C be a set of classes, and Vj be a value domain of an attribute
Aj. Also let v_j^(i) be the value of attribute Aj for the i-th object Obj(i). When two
objects Obj(i) and Obj(k) have the same value of attribute Aj (that is, v_j^(i) = v_j^(k)), Obj(i)
and Obj(k) are said to have an indiscernibility relation (or an equivalence relation) on
attribute Aj. Also, if Obj(i) and Obj(k) have the same values for each attribute in subset
B of A, Obj(i) and Obj(k) are also said to have an indiscernibility (equivalence) relation
on attribute set B. These equivalence relations thus partition the object set U into
disjoint subsets, denoted by U/B, and the partition including Obj(i) is denoted
B(Obj(i)).
Example 1: Table 1 shows a data set containing ten objects U={Obj(1), Obj(2), …,
Obj(10)}, two attributes A = {Transport, Residence}, and a class set Consumption Style.
The class set has two possible values: {Low (L), High (H)}.
Table 1: The training data used in this example.
Transport Residence Consumption Style
Obj(1) Expensive car Villa High
Obj(2) Cheap car Single house High
Obj(3) Ordinary train Suite Low
Obj(4) Express train Villa Low
Obj(5) Ordinary train Suite Low
Obj(6) Express train Single house Low
Obj(7) Cheap car Single house High
Obj(8) Express train Single house High
Obj(9) Ordinary train Suite Low
Obj(10) Express train Apartment Low
Since Obj(2) and Obj(7) have the same attribute value Cheap Car for attribute
Transport, they share an indiscernibility relation and thus belong to the same
equivalence class for Transport. The equivalence partitions for singleton attributes
can be derived as follows:
U/{Transport} = {{(Obj(1))}, {(Obj(3))(Obj(5))(Obj(9))},
{(Obj(4))(Obj(6))(Obj(8))(Obj(10))}, {(Obj(2))(Obj(7))}}, and
U/{Residence} = {{(Obj(1))(Obj(4))}, {(Obj(2))(Obj(6))(Obj(7))(Obj(8))},
{(Obj(3))(Obj(5))(Obj(9))}, {(Obj(10))}}.
The sets of equivalence classes for subset B are referred to as B-elementary sets.
Also, {Transport}(Obj(2)) = {Transport}(Obj(7)) = {Obj(2), Obj(7)}.
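As a rough illustration (not the authors' implementation), the equivalence partition U/{A} for a single attribute can be computed from Table 1 as follows; the dictionary encoding of the table and the `partition` helper are our own:

```python
from collections import defaultdict

# Training data from Table 1: object id -> (Transport, Residence, class)
data = {
    1: ("Expensive car", "Villa", "High"),
    2: ("Cheap car", "Single house", "High"),
    3: ("Ordinary train", "Suite", "Low"),
    4: ("Express train", "Villa", "Low"),
    5: ("Ordinary train", "Suite", "Low"),
    6: ("Express train", "Single house", "Low"),
    7: ("Cheap car", "Single house", "High"),
    8: ("Express train", "Single house", "High"),
    9: ("Ordinary train", "Suite", "Low"),
    10: ("Express train", "Apartment", "Low"),
}

def partition(objs, attr_index):
    """U/{A}: group objects sharing the same value of one attribute."""
    groups = defaultdict(set)
    for i, row in objs.items():
        groups[row[attr_index]].add(i)
    return list(groups.values())

# U/{Transport}: four equivalence classes, matching Example 1
print(sorted(map(sorted, partition(data, 0))))
# [[1], [2, 7], [3, 5, 9], [4, 6, 8, 10]]
```

The same call with `attr_index=1` yields the four Residence classes listed above.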
The rough-set approach analyzes data according to two basic concepts, namely
the lower and the upper approximations of a set. Let X be an arbitrary subset of the
universe U, and B be an arbitrary subset of attribute set A. The lower and the upper
approximations for B on X, denoted B_*(X) and B^*(X) respectively, are defined as
follows:
B_*(X) = {x | x ∈ U, B(x) ⊆ X}, and
B^*(X) = {x | x ∈ U, B(x) ∩ X ≠ ∅}.
Elements in B_*(X) can be classified as members of set X with full certainty using
attribute set B, so B_*(X) is called the lower approximation of X. Similarly, elements in
B^*(X) can be classified as members of the set X with only partial certainty using
attribute set B, so B^*(X) is called the upper approximation of X.
Example 2: Continuing from Example 1, assume X={Obj(1), Obj(2), Obj(7),
Obj(8)}. The lower and the upper approximations of attribute Transport with respect to
X can be calculated as follows:
Transport_*(X) = {{(Obj(1))}, {(Obj(2))(Obj(7))}}, and
Transport^*(X) = {{(Obj(1))}, {(Obj(2))(Obj(7))}, {(Obj(4))(Obj(6))(Obj(8))(Obj(10))}}.
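The lower and upper approximations of Example 2 can be reproduced with a minimal sketch; the helper names and the list encoding of U/{Transport} are our own:

```python
def lower_approx(partition, X):
    """Lower approximation: equivalence classes wholly contained in X."""
    return [blk for blk in partition if blk <= X]

def upper_approx(partition, X):
    """Upper approximation: equivalence classes that intersect X."""
    return [blk for blk in partition if blk & X]

# U/{Transport} from Example 1 and the target set X from Example 2
U_transport = [{1}, {3, 5, 9}, {4, 6, 8, 10}, {2, 7}]
X = {1, 2, 7, 8}

print([sorted(b) for b in lower_approx(U_transport, X)])  # [[1], [2, 7]]
print([sorted(b) for b in upper_approx(U_transport, X)])  # [[1], [4, 6, 8, 10], [2, 7]]
```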
After the lower and the upper approximations have been found, the rough-set
theory can then be used to derive both certain and uncertain information and induce
certain and possible rules from them [3][5].
Lambert-Torres et al. found unimportant attributes from lower and upper
approximations and deleted them from a database [7]. Zhong et al. proposed a new
incremental learning algorithm based on the generalization distribution table, which
maintained the probabilistic relationships between the possible instances and the
possible concepts [18]. Two sets of generalizations were formed from the table based
on the rough set model. One set consisted of all consistent generalizations and the
other consisted of all contradictory generalizations, which were similar to the S and G
sets in the version space approach. The generalizations were then gradually adjusted
according to new instances. The examples in the database could then be merged since
some attributes were removed. The resulting database was thus a compact database.
Yao formed a stratified granulation structure with respect to different levels of rough
set approximations by incrementally clustering objects with the same characteristics
together [17]. Also, Lee et al. simplified classification rules for data mining using
rough set theory [8]. The proposed classification method generated minimal
classification rules and made the analysis of information systems easy. Tsumoto
presented a knowledge discovery system based on rough sets and attribute-oriented
generalization [16]. It was used not only to acquire several sets of attributes important
for classification, but also to evaluate how precisely the attributes of a database were
able to classify data.
The advantage of the rough set theory lies in its simplicity from a mathematical
point of view since it requires only finite sets, equivalence relations, and cardinalities
[13].
3. Hierarchical Attribute Values
Most of the previous studies on rough sets focused on finding certain rules and
possible rules on the single concept level. However, hierarchical attribute values are
usually predefined in real-world applications and can be represented by hierarchy
trees. Terminal nodes on the trees represent actual attribute values appearing in
training examples; internal nodes represent value clusters formed from their
lower-level nodes. Deriving rules on multiple concept levels may lead to the
discovery of more general and important knowledge from data. A simple example for
attribute Transport is given in Figure 1.
Figure 1: An example of predefined hierarchical values for attribute Transport
In Figure 1, the attribute Transport falls into two general values: Train and Car.
Train can be further classified into two more specific values Express Train and
Ordinary Train. Similarly, assume Car is divided into Expensive Car and Cheap Car.
Only the terminal attribute values (Express Train, Ordinary Train, Expensive Car,
Cheap Car) can appear in training examples.
The concept of equivalence classes in the rough set theory makes it very suitable
for finding cross-level certain and possible rules from training examples with
hierarchical values. The equivalence class of a non-terminal-level attribute value for
attribute Aj can be easily found by the union of its underlying terminal-level
equivalence classes for Aj. Also, the equivalence class of a cross-level attribute value
combination for more than two attributes can be derived from the intersection of the
equivalence classes of its single attribute values.
Example 3: Continuing from Example 2, assume the hierarchical values for
attribute Transport are the same as those in Figure 1. The equivalence class for
Transport = Train is then the union of the equivalence classes for Transport =
Express Train and Transport = Ordinary Train. Similarly, the equivalence class for
Transport = Car is the union of the equivalence classes for Transport = Expensive
Car and Transport = Cheap Car. Thus:
U/{Transport^nt} = {{(Obj(1))(Obj(2))(Obj(7))}, {(Obj(3))(Obj(4))(Obj(5))
(Obj(6))(Obj(8))(Obj(9))(Obj(10))}},
where U/{Transport^nt} represents the non-terminal-level elementary set for attribute
Transport.
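The union construction above can be sketched as follows; the parent-to-children dictionary encoding of Figure 1's taxonomy is an assumed representation, not taken from the paper:

```python
# Taxonomy of Figure 1, encoded as parent -> children (an assumed encoding)
taxonomy = {
    "Transport": ["Train", "Car"],
    "Train": ["Express train", "Ordinary train"],
    "Car": ["Expensive car", "Cheap car"],
}

# Terminal-level equivalence classes for Transport (from Example 1)
terminal_classes = {
    "Expensive car": {1},
    "Cheap car": {2, 7},
    "Ordinary train": {3, 5, 9},
    "Express train": {4, 6, 8, 10},
}

def equivalence_class(value):
    """Union of the terminal-level classes below a (possibly internal) value."""
    if value in terminal_classes:
        return terminal_classes[value]
    result = set()
    for child in taxonomy[value]:
        result |= equivalence_class(child)
    return result

print(sorted(equivalence_class("Train")))  # [3, 4, 5, 6, 8, 9, 10]
print(sorted(equivalence_class("Car")))    # [1, 2, 7]
```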
In this paper, we will thus propose a rough-set-based learning algorithm for
deriving cross-level certain and possible rules from training examples with
hierarchical attribute values.
4. Notation and Definitions
According to the definitions of the lower approximation and the upper
approximation, it is easily seen that the upper approximation includes the lower
approximation. Thus each certain rule derived from the lower approximation will also
be derived from the upper approximation. It thus causes redundant derivation and
wastes computational time. The proposed algorithm thus uses the boundary
approximation, instead of the upper approximation, to derive the pure possible rules.
It can thus reduce the subsumption checking needed. For convenience, the symbol
B^*(X) is used from here on to represent the boundary approximation, instead of the
upper approximation, of attribute subset B on X. The boundary approximation for a
subset B is defined as follows:
B^*(X) = {x | x ∈ U, B(x) ∩ X ≠ ∅, and B(x) ⊄ X}.
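A minimal sketch of the boundary approximation just defined (classes that intersect X but are not contained in it); the helper name and list encoding are our own:

```python
def boundary_approx(partition, X):
    """Boundary approximation: classes intersecting X but not contained in X."""
    return [blk for blk in partition if blk & X and not blk <= X]

# U/{Transport} and the High-consumption class from the running example
U_transport = [{1}, {3, 5, 9}, {4, 6, 8, 10}, {2, 7}]
X_high = {1, 2, 7, 8}
print([sorted(b) for b in boundary_approx(U_transport, X_high)])  # [[4, 6, 8, 10]]
```

Note how the classes {1} and {2, 7}, which the lower approximation already covers, are excluded, which is exactly the subsumption checking the boundary approximation saves.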
The notation used in this paper is shown below.
U: the universe of all objects;
n: the total number of objects in U;
Obj(i) : the i-th object, 1≤ i ≤ n;
C: the set of classes to be determined;
c: the total number of classes in C;
Xl: the l-th class, 1≤ l≤ c;
A: the set of all attributes describing U;
m: the total number of attributes in A;
Aj: the j-th attribute, 1≤ j≤ m;
v_j^(i): the value of Aj for Obj(i);
|A_j^t|: the number of terminal attribute values for Aj;
A_j^{t,i}: the i-th terminal value of Aj, 1 ≤ i ≤ |A_j^t|;
|A_j^{nt,k}|: the number of non-terminal-level attribute values of Aj on the k-th level;
A_j^{nt,k,i}: the i-th non-terminal-level value of Aj on the k-th level, 1 ≤ i ≤ |A_j^{nt,k}|;
A_j^t_*: the terminal-level lower approximation of each single attribute Aj;
A_j^{*t}: the terminal-level boundary approximation of each single attribute Aj;
A_j^nt_*: the non-terminal-level lower approximation of each single attribute Aj;
A_j^{*nt}: the non-terminal-level boundary approximation of each single attribute Aj;
Bj: an arbitrary subset of A;
Bj(Obj(i)): the equivalence class of Bj in which Obj(i) exists.
5. The Algorithm
In this section, a new learning algorithm based on rough sets is proposed to find
cross-level certain and possible rules from training data with hierarchical attribute
values. The algorithm first finds the terminal-level elementary sets of the single
attributes. These equivalence classes can then be used later to find the
non-terminal-level elementary sets for the single attributes and the cross-level
elementary sets for more than one attribute. Lower approximations are used to derive
certain rules. Boundary approximations, instead of upper approximations, are used to
find possible rules, thus reducing some subsumption checking. The algorithm
calculates the lower and the boundary approximations of single attributes from the
terminal level to the root level. After that, the lower and the boundary approximations
of more than one attribute are derived based on the results of single attributes. Some
pruning heuristics are also used to avoid unnecessary search. The rule-derivation
process based on these approximations is then performed to find maximally general
certain rules and all possible rules. The details of the proposed learning algorithm are
described as follows.
A rough-set-based learning algorithm for training examples with hierarchical
attribute values:
Input: A data set U with n objects, each of which has m decision attributes with
hierarchical values and belongs to one of c classes.
Output: A set of multiple-level certain and possible rules.
Step 1: Partition the object set into disjoint subsets according to class labels. Denote
each subset of objects belonging to class Cl as Xl.
Step 2: Find terminal-level elementary sets of single attributes; that is, if an object
Obj(i) has a terminal value v_j^(i) for attribute Aj, put Obj(i) in the equivalence
class for Aj = v_j^(i).
Step 3: Link each terminal node for Aj = A_j^{t,i}, 1 ≤ i ≤ |A_j^t|, in the taxonomy tree for
attribute Aj to the equivalence class for Aj = A_j^{t,i}, where |A_j^t| is the number of
terminal attribute values for Aj.
Step 4: Set l to 1, where l is used to represent the number of the class currently being
processed.
Step 5: Compute the terminal-level lower approximation of each single attribute Aj
for class Xl as:
A_j^t_*(X_l) = {A_j^t(x) | x ∈ U, A_j^t(x) ⊆ X_l},
where A_j^t(x) is the terminal-level equivalence class including object x and
derived from attribute Aj.
Step 6: Compute the terminal-level boundary approximation of each single attribute
Aj for class Xl as:
A_j^{*t}(X_l) = {A_j^t(x) | x ∈ U, A_j^t(x) ∩ X_l ≠ ∅, A_j^t(x) ⊄ X_l}.
Step 7: Compute the non-terminal-level lower and boundary approximations of each
single attribute Aj for class Xl from the terminal level to the root level in the
following substeps:
(a) Derive the equivalence class of the i-th non-terminal-level attribute value
A_j^{nt,k,i} for attribute Aj on level k by the union of its underlying
terminal-level equivalence classes.
(b) Put the equivalence class of a non-terminal-level attribute value A_j^{nt,k,i} in
the k-level lower approximation for attribute Aj if all the equivalence
classes of the underlying attribute values of A_j^{nt,k,i} are in the (k+1)-level
lower approximation for attribute Aj.
(c) Put the equivalence class of a non-terminal-level attribute value A_j^{nt,k,i} in
the k-level boundary approximation for attribute Aj if at least one
equivalence class of the underlying attribute values of A_j^{nt,k,i} is in the
(k+1)-level boundary approximation for attribute Aj.
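Substeps (b) and (c) propagate approximations up the taxonomy: a parent value's class enters the lower approximation only if all its children's classes did, and enters the boundary approximation if at least one child's class did. A minimal sketch, with hypothetical status sets:

```python
def parent_status(children, lower, boundary):
    """Decide where a non-terminal value's class belongs, from its children.

    children: child value names; lower/boundary: sets of value names whose
    equivalence classes are already in the (k+1)-level approximations.
    """
    if all(c in lower for c in children):
        return "lower"        # rule (b): every child class is certain
    if any(c in boundary for c in children):
        return "boundary"     # rule (c): some child class is only possible
    return "neither"

# From the running example, for class X_H on the Transport taxonomy:
lower = {"Expensive car", "Cheap car"}
boundary = {"Express train"}
print(parent_status(["Expensive car", "Cheap car"], lower, boundary))       # lower
print(parent_status(["Express train", "Ordinary train"], lower, boundary))  # boundary
```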
Step 8: Set q = 2, where q is used to count the number of attributes currently being
processed.
Step 9: Compute the lower and the boundary approximations of each attribute set Bj
with q attributes (on any levels) for class Xl from the terminal level to the
root level by the following substeps:
(a) Skip all the combinations of attribute values in Bj whose equivalence
classes of any value subsets are already in the lower approximation for Xl.
(b) Derive the equivalence class of each remaining combination of attribute
values by the intersection of the equivalence classes of its single attribute
values.
(c) Put the equivalence class Bj(x) of each combination in substep (b) into
the lower approximation for class Xl if Bj(x) ⊆ Xl.
(d) Put the equivalence class Bj(x) of each combination in substep (b) into
the boundary approximation for class Xl if Bj(x) ∩ Xl ≠ ∅ and Bj(x) ⊄ Xl.
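Substeps (a)–(d) for two attributes can be sketched as follows; the helper name and dictionary encodings are our own, and the pruning in substep (a) simply skips any combination containing a single value whose class is already certain:

```python
from itertools import product

def two_attr_approximations(classes_a, classes_b, X, certain_values):
    """Lower/boundary approximations for pairs of attribute values.

    classes_a, classes_b: value -> equivalence class for two attributes;
    certain_values: single values whose classes are already in the lower
    approximation for X (substep (a) skips combinations containing them).
    """
    lower, boundary = [], []
    for (va, ca), (vb, cb) in product(classes_a.items(), classes_b.items()):
        if va in certain_values or vb in certain_values:
            continue                      # pruning heuristic, substep (a)
        blk = ca & cb                     # substep (b): intersection
        if not blk:
            continue
        if blk <= X:
            lower.append(blk)             # substep (c)
        elif blk & X:
            boundary.append(blk)          # substep (d)
    return lower, boundary

transport = {"Express train": {4, 6, 8, 10}, "Ordinary train": {3, 5, 9},
             "Expensive car": {1}, "Cheap car": {2, 7}}
residence = {"Villa": {1, 4}, "Single house": {2, 6, 7, 8},
             "Suite": {3, 5, 9}, "Apartment": {10}}
X_high = {1, 2, 7, 8}
low, bnd = two_attr_approximations(transport, residence, X_high,
                                   {"Expensive car", "Cheap car"})
print(low, [sorted(b) for b in bnd])  # [] [[6, 8]]
```

The result matches the worked example in Section 6: no certain two-attribute classes, and {Obj(6), Obj(8)} in the boundary.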
Step 10: Set q = q + 1 and repeat Steps 9 and 10 until q > m.
Step 11: Derive the certain rules from the lower approximations.
Step 12: Remove certain rules with condition parts more specific than those of some
other certain rules.
Step 13: Derive the possible rules from the boundary approximations and calculate
their plausibility values as:
p(B_j(x)) = |B_j(x) ∩ X_l| / |B_j(x)|,
where B_j(x) is the equivalence class including x and derived from
attribute set Bj.
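The plausibility measure is simply the fraction of an equivalence class that falls inside the target class; a one-line sketch (the function name is ours):

```python
def plausibility(block, X):
    """p(B_j(x)) = |B_j(x) ∩ X_l| / |B_j(x)|."""
    return len(block & X) / len(block)

# Express Train's equivalence class against X_H, as computed in Section 6
print(plausibility({4, 6, 8, 10}, {1, 2, 7, 8}))  # 0.25
```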
Step 14: Set l = l + 1 and repeat Steps 5 to 14 until l > c.
Step 15: Output the certain rules and possible rules.
6. An Example
In this section, an example is given to show how the proposed algorithm can be
used to generate certain and possible rules from training data with hierarchical values.
Assume the training data set is shown in Table 2.
Table 2: The training data used in this example
Transport Residence Consumption Style
Obj(1) Expensive car Villa High
Obj(2) Cheap car Single house High
Obj(3) Ordinary train Suite Low
Obj(4) Express train Villa Low
Obj(5) Ordinary train Suite Low
Obj(6) Express train Single house Low
Obj(7) Cheap car Single house High
Obj(8) Express train Single house High
Obj(9) Ordinary train Suite Low
Obj(10) Express train Apartment Low
Table 2 contains ten objects U={Obj(1), Obj(2), …, Obj(10)}, two decision
attributes A = {Transport, Residence}, and a class attribute C = {Consumption Style}. The
possible values of each decision attribute are organized into a taxonomy, as shown in
Figures 2 and 3.
Figure 2: Hierarchy of Transport
In Figures 2 and 3, there are three levels of hierarchical attribute values for
attributes Transport and Residence. The roots representing the generic names of
attributes are located on level 0 (such as “Transport” and “Residence”), the internal
nodes representing categories (such as “Train”) are on level 1, and the terminal nodes
representing actual values (such as “Express Train”) are on level 2. Only values of
terminal nodes can appear in training examples. Assume the class has only two
possible values: {High (H), Low (L)}. The proposed algorithm then processes the data
in Table 2 as follows.
Step 1: Since two classes exist in the data set, two partitions are found as
follows:
XH ={Obj(1), Obj(2), Obj(7), Obj(8)}, and
XL = {Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}.
Step 2: The terminal-level elementary sets are formed for the two attributes as
follows:
Figure 3: Hierarchy of Residence
U/{Transport^t} = {{(Obj(1))}, {(Obj(3))(Obj(5))(Obj(9))},
{(Obj(4))(Obj(6))(Obj(8))(Obj(10))}, {(Obj(2))(Obj(7))}};
U/{Residence^t} = {{(Obj(1))(Obj(4))}, {(Obj(2))(Obj(6))(Obj(7))(Obj(8))},
{(Obj(3))(Obj(5))(Obj(9))}, {(Obj(10))}}.
Step 3: Each terminal node is linked to its equivalence class for later usage.
Results for the two attributes are respectively shown in Figures 4 and 5.
Figure 4: Linking the equivalence classes to the terminal nodes in the Transport taxonomy
Step 4: l is set to 1, where l is used to represent the number of the class currently
being processed. In this example, assume the class XH is processed first.
Step 5: The terminal-level lower approximation of attribute Transport for class
XH is first calculated. Since the two terminal-level equivalence classes {(Obj(1))} and
{(Obj(2))(Obj(7))} for Transport are completely included in XH, which is {Obj(1), Obj(2),
Obj(7), Obj(8)}, the lower approximation of attribute Transport for class XH is thus:
Transport^t_*(XH) = {{(Obj(1))}, {(Obj(2))(Obj(7))}}.
Similarly, the terminal-level lower approximation of attribute Residence for class
XH is calculated as:
Residence^t_*(XH) = ∅.
Figure 5: Linking the equivalence classes to the terminal nodes in the Residence taxonomy
Step 6: The terminal-level boundary approximation of attribute Transport for
class XH is calculated. Since only the equivalence class {(Obj(4))(Obj(6))(Obj(8))(Obj(10))}
intersects XH without being contained in it, the boundary approximation is thus:
Transport^{*t}(XH) = {(Obj(4))(Obj(6))(Obj(8))(Obj(10))}.
Similarly, the terminal-level boundary approximation of attribute Residence for
class XH is calculated as:
Residence^{*t}(XH) = {{(Obj(1))(Obj(4))}, {(Obj(2))(Obj(6))(Obj(7))(Obj(8))}}.
Step 7: The non-terminal-level lower and boundary approximations of single
attributes for class XH are computed by the following substeps.
(a) The equivalence classes for non-terminal-level attribute values are first
calculated from their underlying terminal-level equivalence classes. Take the attribute
value Train in Transport as an example. Its equivalence class is the union of the two
equivalence classes from the terminal nodes Express Train and Ordinary Train.
Similarly, the equivalence class for the attribute value Car in Transport is the union of
the two equivalence classes from the terminal nodes Expensive Car and Cheap Car.
The equivalence classes for attribute Transport on level 1 are then shown as follows:
U/{Transport^nt} = {{(Obj(1))(Obj(2))(Obj(7))}, {(Obj(3))(Obj(4))(Obj(5))
(Obj(6))(Obj(8))(Obj(9))(Obj(10))}}.
Similarly, the non-terminal-level equivalence classes for the attribute Residence
on level 1 are found as:
U/{Residence^nt} = {{(Obj(1))(Obj(2))(Obj(4))(Obj(6))(Obj(7))(Obj(8))},
{(Obj(3))(Obj(5))(Obj(9))(Obj(10))}}.
(b) The equivalence class of a non-terminal-level attribute value is in the lower
approximation if all the equivalence classes of its underlying attribute values on the
lower levels are also in the lower approximation. In this example, only the
equivalence classes of the underlying attribute values of Transport=Car are in the
lower approximation for XH. The lower approximations for XH on level 1 are thus:
Transport^nt_*(XH) = {(Obj(1))(Obj(2))(Obj(7))}, and
Residence^nt_*(XH) = ∅.
(c) The equivalence class of a non-terminal-level attribute value is in the
boundary approximation if at least one equivalence class of its underlying attribute
values on the lower levels is also in the boundary approximation. The boundary
approximations for XH on level 1 are thus found as:
Transport^{*nt}(XH) = {(Obj(3))(Obj(4))(Obj(5))(Obj(6))(Obj(8))(Obj(9))(Obj(10))}, and
Residence^{*nt}(XH) = {(Obj(1))(Obj(2))(Obj(4))(Obj(6))(Obj(7))(Obj(8))}.
Step 8: q is set to 2, where q is used to count the number of attributes currently
being processed.
Step 9: The lower and the boundary approximations of each attribute set with
two attributes for class XH on the terminal level are found in the following substeps. In
this example, only the attribute set {Transport, Residence} contains two attributes.
(a) The attribute set {Transport, Residence} has the following possible
combinations of values on the terminal level:
(Transport=Express Train, Residence=Villa),
(Transport=Express Train, Residence=Single House),
(Transport=Express Train, Residence=Suite),
(Transport=Express Train, Residence=Apartment),
(Transport=Ordinary Train, Residence=Villa),
(Transport=Ordinary Train, Residence=Single House),
(Transport=Ordinary Train, Residence=Suite),
(Transport=Ordinary Train, Residence=Apartment),
(Transport=Expensive Car, Residence=Villa),
(Transport=Expensive Car, Residence=Single House),
(Transport=Expensive Car, Residence=Suite),
(Transport=Expensive Car, Residence=Apartment),
(Transport=Cheap Car, Residence=Villa),
(Transport=Cheap Car, Residence=Single House),
(Transport=Cheap Car, Residence=Suite), and
(Transport=Cheap Car, Residence=Apartment).
Since the equivalence classes for the two single attribute values (Transport =
Expensive Car) and (Transport = Cheap Car) are in the lower approximation for XH,
the combinations including (Transport = Expensive Car) or (Transport =
Cheap Car) won't be considered in the later steps. Thus, only eight combinations
are considered.
(b) The equivalence class of each remaining value combination for {Transport,
Residence} is then derived by the intersection of the equivalence classes of its single
attribute values. Take the combination (Transport = Express Train, Residence = Villa)
as an example. The equivalence class for (Transport = Express Train) is
{(Obj(4))(Obj(6))(Obj(8))(Obj(10))} and for (Residence = Villa) is {(Obj(1))(Obj(4))}. The
equivalence class for (Transport = Express Train, Residence = Villa) is thus the
intersection of {(Obj(4))(Obj(6))(Obj(8))(Obj(10))} and {(Obj(1))(Obj(4))}, which is
{(Obj(4))}.
The equivalence classes for the other value combinations of {Transport,
Residence} can be similarly derived. Thus:
U/{Transport^t, Residence^t} = {{(Obj(4))}, {(Obj(6))(Obj(8))}, {(Obj(10))},
{(Obj(3))(Obj(5))(Obj(9))}}.
Note that {(Obj(1))} and {(Obj(2))(Obj(7))} won't be considered since they are in
the lower approximation for XH from the single attribute values (Transport = Expensive
Car) and (Transport = Cheap Car).
(c) The lower approximation of each 2-attribute subset Bj for class XH on the
terminal level is first derived. Only {Transport, Residence} is considered in this
example. Since no equivalence classes in U/{Transport^t, Residence^t} are contained in XH, the
lower approximation of {Transport, Residence} for XH on the terminal level is thus:
{Transport^t, Residence^t}_*(XH) = ∅.
(d) The boundary approximation of each 2-attribute subset Bj for class XH on the
terminal level is then derived. Take {Transport, Residence} as an example. Since
(Obj(8)) in the equivalence class {(Obj(6))(Obj(8))} of U/{Transport^t, Residence^t} is in
XH and (Obj(6)) is not, {(Obj(6))(Obj(8))} is thus in the boundary approximation of
{Transport, Residence} for XH on the terminal level. The boundary approximation of
{Transport, Residence} for XH on the terminal level is shown below:
{Transport^t, Residence^t}^*(XH) = {(Obj(6))(Obj(8))}.
The same process is then repeated from the terminal level to the root level for
finding the lower and the boundary approximations of {Transport, Residence} on
other levels. Results are shown as follows:
{Transport^nt, Residence^t}_*(XH) = ∅,
{Transport^t, Residence^nt}_*(XH) = ∅,
{Transport^nt, Residence^nt}_*(XH) = ∅,
{Transport^nt, Residence^t}^*(XH) = {(Obj(6))(Obj(8))},
{Transport^t, Residence^nt}^*(XH) = {(Obj(4))(Obj(6))(Obj(8))}, and
{Transport^nt, Residence^nt}^*(XH) = {(Obj(4))(Obj(6))(Obj(8))}.
Step 10: q = 2 + 1 = 3. Since q > m (= 2), the next step is executed.
Step 11: All the certain rules are then derived from the lower approximations.
Results for this example are shown below:
1. If Transport is Expensive Car then Consumption Style is High;
2. If Transport is Cheap Car then Consumption Style is High;
3. If Transport is Car then Consumption Style is High.
Step 12: Since the condition parts of the first and second certain rules are more
specific than the third certain rule, the first two rules are removed from the certain
rule set.
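Step 12's pruning can be sketched as follows: a certain rule is dropped when another certain rule's condition part is strictly more general, i.e. it uses a subset of the attributes and each of its values is an ancestor of (or equal to) the corresponding value in the taxonomy. The child-to-parent encoding of Figure 2 and the helper names are our own:

```python
# Figure 2's taxonomy encoded as child -> parent (an assumed encoding)
parent = {"Express train": "Train", "Ordinary train": "Train",
          "Expensive car": "Car", "Cheap car": "Car"}

def is_ancestor_or_equal(general, specific):
    """Walk up the taxonomy from `specific`, looking for `general`."""
    while specific is not None:
        if specific == general:
            return True
        specific = parent.get(specific)
    return False

def prune_certain_rules(rules):
    """Keep only maximally general rules; a rule is a dict attr -> value."""
    kept = []
    for r in rules:
        subsumed = any(
            other is not r
            and set(other) <= set(r)    # fewer or equal conditions
            and all(is_ancestor_or_equal(other[a], r[a]) for a in other)
            for other in rules)
        if not subsumed:
            kept.append(r)
    return kept

rules = [{"Transport": "Expensive car"},
         {"Transport": "Cheap car"},
         {"Transport": "Car"}]
print(prune_certain_rules(rules))  # [{'Transport': 'Car'}]
```

Applied to the three certain rules of Step 11, only the maximally general rule "If Transport is Car then Consumption Style is High" survives, as in the text.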
Step 13: All the possible rules are derived from the boundary approximations.
The plausibility measure of each rule is also calculated in this step. For example, the
plausibility measure of the equivalence class {(Obj(4))(Obj(6))(Obj(8))(Obj(10))} of
Transport = Express Train in the boundary approximation for class XH is calculated as
follows:
P(If Transport = Express Train then XH)
= |{Obj(4), Obj(6), Obj(8), Obj(10)} ∩ {Obj(1), Obj(2), Obj(7), Obj(8)}| / |{Obj(4), Obj(6), Obj(8), Obj(10)}|
= 1/4 = 0.25.
All the resulting possible rules with their plausibility values are shown below:
1. If Transport is Express Train then Consumption Style is High,
with plausibility = 0.25;
2. If Residence is Single House then Consumption Style is High,
with plausibility = 0.75;
3. If Transport is Train then Consumption Style is High,
with plausibility = 0.14;
4. If Residence is House then Consumption Style is High,
with plausibility = 0.66;
5. If Transport is Express Train and Residence is Single House then
Consumption Style is High, with plausibility = 0.5;
6. If Transport is Train and Residence is Single House then Consumption Style
is High, with plausibility = 0.5;
7. If Transport is Train and Residence is House then Consumption Style is High,
with plausibility = 0.33;
8. If Transport is Express Train and Residence is House then Consumption
Style is High, with plausibility = 0.33.
Step 14: l = l + 1 = 2. Steps 5 to 14 are then repeated for the other class XL.
Step 15: All the certain rules and possible rules are then output.
7. Discussion
In the proposed learning algorithm for handling training examples with
hierarchical values, only the maximally general certain rules, instead of all certain
ones, are kept for classification. Certain rules which are not maximally general are
removed since they provide no other new information. Take the maximally general
rule “If Transport is Car then Consumption Style is High” derived in the above
section as an example. All the descendant rules covered by this maximally general rule
according to the taxonomy relation in Figure 2 are as follows:
1. If Transport is Expensive Car then Consumption Style is High;
2. If Transport is Cheap Car then Consumption Style is High.
It can be easily verified that the above two rules are also certain rules. Moreover,
any rule generated by adding additional constraints to the maximally general rule
or to its descendant rules is also certain. These include the following 18 rules:
1. If Transport is Car and Residence is Villa then Consumption Style is High;
2. If Transport is Car and Residence is Single House then Consumption Style is
High;
3. If Transport is Car and Residence is Suite then Consumption Style is High;
4. If Transport is Car and Residence is Apartment then Consumption Style is
High;
5. If Transport is Car and Residence is House then Consumption Style is High;
6. If Transport is Car and Residence is Building then Consumption Style is
High;
7. If Transport is Expensive Car and Residence is Villa then Consumption Style
is High;
8. If Transport is Expensive Car and Residence is Single House then
Consumption Style is High;
9. If Transport is Expensive Car and Residence is Suite then Consumption Style
is High;
10. If Transport is Expensive Car and Residence is Apartment then
Consumption Style is High;
11. If Transport is Expensive Car and Residence is House then Consumption
Style is High;
12. If Transport is Expensive Car and Residence is Building then Consumption
Style is High;
13. If Transport is Cheap Car and Residence is Villa then Consumption Style is
High;
14. If Transport is Cheap Car and Residence is Single House then Consumption
Style is High;
15. If Transport is Cheap Car and Residence is Suite then Consumption Style is
High;
16. If Transport is Cheap Car and Residence is Apartment then Consumption
Style is High;
17. If Transport is Cheap Car and Residence is House then Consumption Style
is High;
18. If Transport is Cheap Car and Residence is Building then Consumption
Style is High.
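The 18 rules above can be generated mechanically from the taxonomy; a minimal sketch, assuming the Figure 2 value lists for Transport and Residence:

```python
from itertools import product

# Values covered by the maximally general rule's Transport condition.
transport_values = ["Car", "Expensive Car", "Cheap Car"]
# All Residence values (assumed to follow Figure 2).
residence_values = ["Villa", "Single House", "Suite",
                    "Apartment", "House", "Building"]

# Every combination yields a rule subsumed by the maximally general rule.
covered = [
    f"If Transport is {t} and Residence is {r} then Consumption Style is High"
    for t, r in product(transport_values, residence_values)
]
print(len(covered))  # → 18
```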
The pruning procedure is embedded in the proposed algorithm. The above
subsumption relation for certain rules is, however, not valid for possible rules. The
plausibility of a parent possible rule will always lie between the minimum and the
maximum plausibility values of its children rules. Take the possible rule “If Transport
is Train then Consumption Style is High, with plausibility = 0.14” derived in the
above section as an example. Its two descendant rules according to the taxonomy
relation in Figure 2 are as follows:
1. If Transport is Express Train then Consumption Style is High,
with plausibility = 0.25;
2. If Transport is Ordinary Train then Consumption Style is High,
with plausibility = 0.
It can be seen that the plausibility of the parent rule lies between 0 and 0.25. Note
that the second child rule will not actually be kept, since its plausibility is zero; it is
only shown here to demonstrate the relationship of the plausibility values in parent
and child rules. The child rules with plausibility values less than their parent rules will
also be kept by the proposed algorithm since they may provide some useful
information about the classification. When a new event satisfies both a child rule and
its parent rule, it is more accurate to derive the plausibility of the consequence from
the child rule than from the parent rule. However, if a new event has an unknown
terminal attribute value but a known non-terminal value, it can still be classified using
the parent rules. The proposed algorithm thus keeps all the possible rules except those
with plausibility = 0. If child rules with plausibility values lower than their parents’
are not to be kept, the proposed algorithm can easily be modified by adding a
subsumption check after the step of generating the possible rules.
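The bound stated above follows because a parent's equivalence class is the union of its children's classes, so its plausibility is a size-weighted average of theirs. A small sketch (the object set assumed for Ordinary Train is chosen to be consistent with the plausibility values reported above):

```python
# Plausibility: |E ∩ X| / |E| for an equivalence class E and target class X.
def plausibility(eq_class, target):
    return len(set(eq_class) & set(target)) / len(eq_class)

x_high = {1, 2, 7, 8}       # class XH, as in the running example
express = {4, 6, 8, 10}     # Transport = Express Train (from the example)
ordinary = {3, 5, 9}        # Transport = Ordinary Train (assumed objects)
train = express | ordinary  # parent class = union of its children's classes

p_children = [plausibility(express, x_high), plausibility(ordinary, x_high)]
p_parent = plausibility(train, x_high)

print(round(p_parent, 2))                              # → 0.14
print(min(p_children) <= p_parent <= max(p_children))  # → True
```

The parent's value 1/7 ≈ 0.14 indeed lies between the children's values 0 and 0.25.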
Besides, a plausibility threshold can be used in the proposed algorithm to avoid
generating an overwhelming number of possible rules. Rules with plausibility values less than the
threshold will thus be pruned. This checking step can easily be embedded in finding
the boundary approximation to reduce the computational time further.
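Such threshold pruning amounts to a simple filter; a minimal sketch with an illustrative threshold value and rule encoding:

```python
THRESHOLD = 0.2  # assumed cut-off; chosen per application

# (condition, plausibility) pairs, taken from the possible rules above.
rules = [("Express Train", 0.25), ("Train", 0.14), ("Single House", 0.75)]

# Keep only the rules whose plausibility reaches the threshold.
kept = [(cond, p) for cond, p in rules if p >= THRESHOLD]
print(kept)  # → [('Express Train', 0.25), ('Single House', 0.75)]
```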
8. Conclusions and Future Work
In this paper, we have proposed a new learning algorithm based on rough sets to
find cross-level certain and possible rules from training data with hierarchical
attribute values. The proposed method adopts the concept of equivalence classes to
find the terminal-level elementary sets of single attributes. These equivalence classes
are then easily used to find the non-terminal-level elementary sets of single attributes
and the cross-level elementary sets of multiple attributes by the union and the
intersection operations. Lower and boundary approximations are then derived from
the elementary sets from the terminal level to the root level. Boundary approximations,
instead of upper approximations, are used in the proposed algorithm to find possible
rules, thus reducing some subsumption checking. Lower approximations are used to
derive maximally general certain rules. Some pruning heuristics are also used to avoid
unnecessary search. The rules derived can be used to infer a new event with both
terminal and non-terminal attribute values.
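The construction of elementary sets summarized above can be sketched in Python (the equivalence class assumed for Residence = Single House is illustrative; the Transport classes follow the running example):

```python
# Terminal-level equivalence classes of the Transport attribute.
transport = {"Express Train": {4, 6, 8, 10}, "Ordinary Train": {3, 5, 9}}

# A non-terminal-level elementary set is the union of its children's classes.
transport["Train"] = transport["Express Train"] | transport["Ordinary Train"]

# Assumed terminal-level class for Residence = Single House (illustrative).
residence = {"Single House": {1, 4, 8}}

# A cross-level elementary set of multiple attributes is the intersection
# of the single-attribute elementary sets.
both = transport["Express Train"] & residence["Single House"]
print(sorted(both))  # → [4, 8]
```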
A limitation of the proposed algorithm is that it applies only to symbolic data. If
numerical data are fed in, they must first be converted into intervals. Currently, we are
trying to apply fuzzy concepts to the handling of numerical data to enhance the
algorithm’s power.