
Learning Cross-level Certain and Possible Rules by Rough Sets

Tzung-Pei Hong†**, Chun-E Lin‡, Jiann-Horng Lin‡, Shyue-Liang Wang*

†Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung, 811, Taiwan, R.O.C. [email protected]

‡Institute of Information Management, I-Shou University, Kaohsiung, 840, Taiwan, R.O.C. [email protected], [email protected]

*Department of Computer Science, New York Institute of Technology, 1855 Broadway, New York 10023, U.S.A. [email protected]

Abstract

Machine learning can extract desired knowledge and ease the development

bottleneck in building expert systems. Among the proposed approaches, deriving rules

from training examples is the most common. Given a set of examples, a learning

program tries to induce rules that describe each class. Recently, the rough-set theory

has been widely used in dealing with data classification problems. Most of the

previous studies on rough sets focused on deriving certain rules and possible rules on

the single concept level. Data with hierarchical attribute values are, however,

commonly seen in real-world applications. This paper thus attempts to propose a new

learning algorithm based on rough sets to find cross-level certain and possible rules

from training data with hierarchical attribute values. It is more complex than learning

rules from training examples with single-level values, but may derive more general

knowledge from data. Boundary approximations, instead of upper approximations, are

used to find possible rules, thus reducing some subsumption checking. Some pruning

heuristics are also adopted in the proposed algorithm to avoid unnecessary search.

Keywords: machine learning, rough set, certain rule, possible rule, hierarchical value.

-----------------------------------

**Corresponding author.


1. Introduction

Expert systems have been widely used in domains where mathematical models

cannot easily be built, human experts are not available or the cost of querying an

expert is high. Although a wide variety of expert systems have been built, knowledge

acquisition remains a development bottleneck. Usually, a knowledge engineer is

needed to establish a dialog with a human expert and to encode the knowledge elicited

into a knowledge base to produce an expert system. The process is, however, very

time-consuming [1][2]. Building a large-scale expert system involves creating and

extending a large knowledge base over the course of many months or years. Hence,

shortening the development time is the most important factor for the success of an

expert system. Machine-learning techniques have thus been developed to ease the

knowledge-acquisition bottleneck. Among the proposed approaches, deriving rules

from training examples is the most common [5][6][9][10][11][15]. Given a set of

examples, a learning program tries to induce rules that describe each class.

Recently, the rough-set theory has been used in reasoning and knowledge

acquisition for expert systems [3]. It was proposed by Pawlak in 1982 [12] with the

concept of equivalence classes as its basic principle. Several applications and

extensions of the rough-set theory have also been proposed. Examples are

Lambert-Torres et al.’s knowledge-base reduction [7], Zhong et al.'s rule discovery

[18], Lee et al.'s hierarchical classification structure [8] and Tsumoto's attribute-oriented generalization [16]. Because of the success of the rough-set theory in knowledge

acquisition, many researchers in the machine-learning field are very interested in this

research topic since it offers opportunities to discover useful information from

training examples.


Most of the previous studies on rough sets focused on deriving certain rules and

possible rules on the single concept level. Hierarchical attribute values are, however,

usually predefined in real-world applications. Deriving rules on multiple concept

levels may thus lead to the discovery of more general and important knowledge from

data. It is, however, more complex than learning rules from training examples with

single-level values. In this paper, we thus propose a new learning algorithm based on

rough sets to find cross-level certain and possible rules from training data with

hierarchical attribute values. Boundary approximations, instead of upper

approximations, are used to find possible rules, thus reducing some subsumption

checking. Some pruning heuristics are also used in the proposed algorithm to avoid

unnecessary search.

The remainder of this paper is organized as follows. The rough-set theory is

briefly reviewed in Section 2. Management of hierarchical attribute values by rough

sets is described in Section 3. The notation and definitions used in this paper are given

in Section 4. A new learning algorithm based on the rough-set theory to induce

cross-level certain and possible rules is proposed in Section 5. An example to

illustrate the proposed algorithm is given in Section 6. Some discussion is provided in

Section 7. Conclusions and future works are finally given in Section 8.

2. Review of the Rough-Set Theory

The rough-set theory, proposed by Pawlak in 1982 [12][14], can serve as a new

mathematical tool for dealing with data classification problems. It adopts the concept

of equivalence classes to partition training instances according to some criteria. Two

kinds of partitions are formed in the mining process: lower approximations and upper


approximations, from which certain and possible rules can easily be derived.

Formally, let U be a set of training examples (objects), A be a set of attributes

describing the examples, C be a set of classes, and Vj be a value domain of an attribute

Aj. Also let vj(i) be the value of attribute Aj for the i-th object Obj(i). When two objects Obj(i) and Obj(k) have the same value of attribute Aj (that is, vj(i) = vj(k)), Obj(i)

and Obj(k) are said to have an indiscernibility relation (or an equivalence relation) on

attribute Aj. Also, if Obj(i) and Obj(k) have the same values for each attribute in subset

B of A, Obj(i) and Obj(k) are also said to have an indiscernibility (equivalence) relation

on attribute set B. These equivalence relations thus partition the object set U into

disjoint subsets, denoted by U/B, and the partition including Obj(i) is denoted

B(Obj(i)).

Example 1: Table 1 shows a data set containing ten objects U={Obj(1), Obj(2), …,

Obj(10)}, two attributes A = {Transport, Residence}, and a class attribute Consumption Style. The class has two possible values: {Low (L), High (H)}.

Table 1: The training data used in this example.

Transport Residence Consumption Style

Obj(1) Expensive car Villa High

Obj(2) Cheap car Single house High

Obj(3) Ordinary train Suite Low

Obj(4) Express train Villa Low

Obj(5) Ordinary train Suite Low

Obj(6) Express train Single house Low

Obj(7) Cheap car Single house High


Obj(8) Express train Single house High

Obj(9) Ordinary train Suite Low

Obj(10) Express train Apartment Low

Since Obj(2) and Obj(7) have the same attribute value Cheap Car for attribute

Transport, they share an indiscernibility relation and thus belong to the same

equivalence class for Transport. The equivalence partitions for singleton attributes

can be derived as follows:

U/{Transport} = {{(Obj(1))}, {(Obj(3))(Obj(5))(Obj(9))},

{(Obj(4))(Obj(6))(Obj(8))(Obj(10))}, {(Obj(2))(Obj(7))}}, and

U/{Residence} = {{(Obj(1))(Obj(4))}, {(Obj(2))(Obj(6))(Obj(7))(Obj(8))},

{(Obj(3))(Obj(5))(Obj(9))}, {(Obj(10))}}.

The sets of equivalence classes for subset B are referred to as B-elementary sets.

Also, {Transport}(Obj(2)) = {Transport}(Obj(7)) = {Obj(2), Obj(7)}.
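To make the computation of such elementary sets concrete, the following minimal Python sketch (an illustration only; the table layout and helper names such as elementary_sets are assumptions of this illustration, not part of the original formulation) partitions the objects of Table 1 by their values on an attribute subset B:

    from collections import defaultdict

    # Training data of Table 1: object id -> (Transport, Residence, Consumption Style).
    TABLE = {
        1: ("Expensive car", "Villa", "High"),
        2: ("Cheap car", "Single house", "High"),
        3: ("Ordinary train", "Suite", "Low"),
        4: ("Express train", "Villa", "Low"),
        5: ("Ordinary train", "Suite", "Low"),
        6: ("Express train", "Single house", "Low"),
        7: ("Cheap car", "Single house", "High"),
        8: ("Express train", "Single house", "High"),
        9: ("Ordinary train", "Suite", "Low"),
        10: ("Express train", "Apartment", "Low"),
    }
    ATTRS = {"Transport": 0, "Residence": 1}

    def elementary_sets(table, attrs):
        """U/B: partition the objects into equivalence classes of the attribute subset B."""
        partition = defaultdict(set)
        for obj, row in table.items():
            key = tuple(row[ATTRS[a]] for a in attrs)   # indiscernibility key
            partition[key].add(obj)
        return list(partition.values())

    print(elementary_sets(TABLE, ["Transport"]))
    # -> [{1}, {2, 7}, {3, 5, 9}, {4, 6, 8, 10}]   (U/{Transport} as above)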

The rough-set approach analyzes data according to two basic concepts, namely

the lower and the upper approximations of a set. Let X be an arbitrary subset of the

universe U, and B be an arbitrary subset of attribute set A. The lower and the upper

approximations for B on X, denoted B_*(X) and B^*(X) respectively, are defined as

follows:

B_*(X) = {x | x ∈ U, B(x) ⊆ X}, and

B^*(X) = {x | x ∈ U, B(x) ∩ X ≠ ∅}.


Elements in B_*(X) can be classified as members of the set X with full certainty using attribute set B, so B_*(X) is called the lower approximation of X. Similarly, elements in B^*(X) can be classified as members of the set X with only partial certainty using attribute set B, so B^*(X) is called the upper approximation of X.

Example 2: Continuing from Example 1, assume X={Obj(1), Obj(2), Obj(7),

Obj(8)}. The lower and the upper approximations of attribute Transport with respect to

X can be calculated as follows:

Transport_*(X) = {{(Obj(1))}, {(Obj(2))(Obj(7))}}, and

Transport^*(X) = {{(Obj(1))}, {(Obj(2))(Obj(7))}, {(Obj(4))(Obj(6))(Obj(8))(Obj(10))}}.
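Continuing the illustrative sketch given after Example 1 (again, the helper names are assumptions of the illustration, not the paper's notation), the two approximations can be computed directly from the elementary sets:

    def lower_approximation(elem_sets, X):
        """B_*(X): the equivalence classes completely contained in X."""
        return [E for E in elem_sets if E <= X]

    def upper_approximation(elem_sets, X):
        """B^*(X): the equivalence classes that intersect X."""
        return [E for E in elem_sets if E & X]

    X_high = {1, 2, 7, 8}                                  # the set X of Example 2
    transport_classes = elementary_sets(TABLE, ["Transport"])
    print(lower_approximation(transport_classes, X_high))  # [{1}, {2, 7}]
    print(upper_approximation(transport_classes, X_high))  # [{1}, {2, 7}, {4, 6, 8, 10}]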

After the lower and the upper approximations have been found, the rough-set

theory can then be used to derive both certain and uncertain information and induce

certain and possible rules from them [3][5].

Lambert-Torres et al. found unimportant attributes from lower and upper

approximations and deleted them from a database [7]. Zhong et al. proposed a new

incremental learning algorithm based on the generalization distribution table, which

maintained the probabilistic relationships between the possible instances and the

possible concepts [18]. Two sets of generalizations were formed from the table based

on the rough set model. One set consisted of all consistent generalizations and the

other consisted of all contradictory generalizations, which were similar to the S and G

sets in the version space approach. The generalizations were then gradually adjusted


according to new instances. The examples in the database could then be merged since

some attributes were removed. The resulting database was thus a compact database.

Yao formed a stratified granulation structure with respect to different levels of rough

set approximations by incrementally clustering objects with the same characteristics

together [17]. Also, Lee et al. simplified classification rules for data mining using

rough set theory [8]. The proposed classification method generated minimal

classification rules and made the analysis of information systems easy. Tsumoto

presented a knowledge discovery system based on rough sets and attribute-oriented

generalization [16]. It was used not only to acquire several sets of attributes important

for classification, but also to evaluate how precisely the attributes of a database were

able to classify data.

The advantage of the rough set theory lies in its simplicity from a mathematical

point of view since it requires only finite sets, equivalence relations, and cardinalities

[13].

3. Hierarchical Attribute Values

Most of the previous studies on rough sets focused on finding certain rules and

possible rules on the single concept level. However, hierarchical attribute values are

usually predefined in real-world applications and can be represented by hierarchy

trees. Terminal nodes on the trees represent actual attribute values appearing in

training examples; internal nodes represent value clusters formed from their

lower-level nodes. Deriving rules on multiple concept levels may lead to the

discovery of more general and important knowledge from data. A simple example for

attribute Transport is given in Figure 1.


Figure 1: An example of predefined hierarchical values for attribute Transport

In Figure 1, the attribute Transport falls into two general values: Train and Car.

Train can be further classified into two more specific values Express Train and

Ordinary Train. Similarly, assume Car is divided into Expensive Car and Cheap Car.

Only the terminal attribute values (Express Train, Ordinary Train, Expensive Car,

Cheap Car) can appear in training examples.

The concept of equivalence classes in the rough set theory makes it very suitable

for finding cross-level certain and possible rules from training examples with

hierarchical values. The equivalence class of a non-terminal-level attribute value for

attribute Aj can be easily found by the union of its underlying terminal-level

equivalence classes for Aj. Also, the equivalence class of a cross-level attribute value

combination for more than two attributes can be derived from the intersection of the

equivalence classes of its single attribute values.

Example 3: Continuing from Example 2, assume the hierarchical values for

attribute Transport are the same as those in Figure 1. The equivalence class for


Transport = Train is then the union of the equivalence classes for Transport =

Express Train and Transport = Ordinary Train. Similarly, the equivalence class for

Transport = Car is the union of the equivalence classes for Transport = Expensive

Car and Transport = Cheap Car. Thus:

U/{Transport^nt} = {{(Obj(1))(Obj(2))(Obj(7))}, {(Obj(3))(Obj(4))(Obj(5))

(Obj(6))(Obj(8))(Obj(9))(Obj(10))}},

where U/{Transport^nt} represents the non-terminal-level elementary set for attribute

Transport.
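This union operation is easy to express in code. The earlier illustrative sketch can be extended with a small taxonomy structure; the dictionary layout below is an assumed representation of Figure 1, not one prescribed by the paper:

    # Figure 1 taxonomy for Transport: non-terminal value -> terminal values beneath it.
    TRANSPORT_TAXONOMY = {
        "Train": ["Express train", "Ordinary train"],
        "Car": ["Expensive car", "Cheap car"],
    }

    def terminal_class(table, attr, value):
        """Terminal-level equivalence class for attr = value."""
        col = ATTRS[attr]
        return {obj for obj, row in table.items() if row[col] == value}

    def nonterminal_class(table, attr, value, taxonomy):
        """Equivalence class of a non-terminal value: union of its children's classes."""
        out = set()
        for child in taxonomy[value]:
            out |= terminal_class(table, attr, child)
        return out

    print(nonterminal_class(TABLE, "Transport", "Car", TRANSPORT_TAXONOMY))    # {1, 2, 7}
    print(nonterminal_class(TABLE, "Transport", "Train", TRANSPORT_TAXONOMY))  # {3, 4, 5, 6, 8, 9, 10}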

In this paper, we will thus propose a rough-set-based learning algorithm for

deriving cross-level certain and possible rules from training examples with

hierarchical attribute values.

4. Notation and Definitions

According to the definitions of the lower approximation and the upper

approximation, it is easily seen that the upper approximation includes the lower

approximation. Thus each certain rule derived from the lower approximation will also

be derived from the upper approximation. It thus causes redundant derivation and

wastes computational time. The proposed algorithm thus uses the boundary

approximation, instead of the upper approximation, to derive the pure possible rules.

It can thus reduce the subsumption checking needed. For convenience, the symbol

B^*(X) is used from here on to represent the boundary approximation, instead of the

upper approximation, of attribute subset B on X. The boundary approximation for a


subset B is defined as follows:

B^*(X) = {x | x ∈ U, B(x) ∩ X ≠ ∅, B(x) ⊄ X}.
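In the same illustrative Python style as the earlier sketches, the boundary approximation can be read off the elementary sets in one pass (equivalently, it is the upper approximation with the lower approximation removed):

    def boundary_approximation(elem_sets, X):
        """B^*(X) as redefined here: classes intersecting X but not contained in X."""
        return [E for E in elem_sets if (E & X) and not (E <= X)]

    X_high = {1, 2, 7, 8}
    print(boundary_approximation(elementary_sets(TABLE, ["Transport"]), X_high))
    # -> [{4, 6, 8, 10}]   (only the Express Train class straddles X_high)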

The notation used in this paper is shown below.

U: the universe of all objects;
n: the total number of objects in U;
Obj(i): the i-th object, 1 ≤ i ≤ n;
C: the set of classes to be determined;
c: the total number of classes in C;
Xl: the l-th class, 1 ≤ l ≤ c;
A: the set of all attributes describing U;
m: the total number of attributes in A;
Aj: the j-th attribute, 1 ≤ j ≤ m;
vj(i): the value of Aj for Obj(i);
|Aj^t|: the number of terminal attribute values for Aj;
Aj^ti: the i-th terminal value of Aj, 1 ≤ i ≤ |Aj^t|;
|Aj^nt_k|: the number of non-terminal-level attribute values of Aj on the k-th level;
Aj^nt_ki: the i-th non-terminal-level value of Aj on the k-th level, 1 ≤ i ≤ |Aj^nt_k|;
Aj^t_*: the terminal-level lower approximation of each single attribute Aj;
Aj^*t: the terminal-level boundary approximation of each single attribute Aj;
Aj^nt_*: the non-terminal-level lower approximation of each single attribute Aj;
Aj^*nt: the non-terminal-level boundary approximation of each single attribute Aj;
Bj: an arbitrary subset of A;
Bj(Obj(i)): the equivalence class of Bj in which Obj(i) exists.

5. The Algorithm

In the section, a new learning algorithm based on rough sets is proposed to find

cross-level certain and possible rules from training data with hierarchical attribute

values. The algorithm first finds the terminal-level elementary sets of the single

attributes. These equivalence classes can then be used later to find the

non-terminal-level elementary sets for the single attributes and the cross-level

elementary sets for more than one attribute. Lower approximations are used to derive

certain rules. Boundary approximations, instead of upper approximations, are used to

find possible rules, thus reducing some subsumption checking. The algorithm

calculates the lower and the boundary approximations of single attributes from the

terminal level to the root level. After that, the lower and the boundary approximations

of more than one attribute are derived based on the results of single attributes. Some

pruning heuristics are also used to avoid unnecessary search. The rule-derivation

process based on these approximations is then performed to find the maximally general

certain rules and all possible rules. The details of the proposed learning algorithm are


described as follows.

A rough-set-based learning algorithm for training examples with hierarchical

attribute values:

Input: A data set U with n objects, each of which has m decision attributes with

hierarchical values and belongs to one of c classes.

Output: A set of multiple-level certain and possible rules.

Step 1: Partition the object set into disjoint subsets according to class labels. Denote

the subset of objects belonging to the l-th class as Xl.

Step 2: Find terminal-level elementary sets of single attributes; that is, if an object

Obj(i) has a terminal value vj(i) for attribute Aj, put Obj(i) in the equivalence class for Aj = vj(i).

Step 3: Link each terminal node for Aj = Aj^ti, 1 ≤ i ≤ |Aj^t|, in the taxonomy tree for attribute Aj to the equivalence class for Aj = Aj^ti, where |Aj^t| is the number of terminal attribute values for Aj.

Step 4: Set l to 1, where l is used to represent the number of the class currently being

processed.

Step 5: Compute the terminal-level lower approximation of each single attribute Aj

for class Xl as:

Aj^t_*(Xl) = {Aj^t(x) | x ∈ U, Aj^t(x) ⊆ Xl},

where Aj^t(x) is the terminal-level equivalence class that includes object x and is derived from attribute Aj.

Step 6: Compute the terminal-level boundary approximation of each single attribute

Aj for class Xl as:

Aj^*t(Xl) = {Aj^t(x) | x ∈ U, Aj^t(x) ∩ Xl ≠ ∅, Aj^t(x) ⊄ Xl}.

Step 7: Compute the non-terminal-level lower and boundary approximations of each

single attribute Aj for class Xl from the terminal level to the root level in the

following substeps:

(a) Derive the equivalence class of the i-th non-terminal-level attribute value Aj^nt_ki for attribute Aj on level k by the union of its underlying terminal-level equivalence classes.

(b) Put the equivalence class of a non-terminal-level attribute value Aj^nt_ki in the k-level lower approximation for attribute Aj if all the equivalence classes of the underlying attribute values of Aj^nt_ki are in the (k+1)-level lower approximation for attribute Aj.

(c) Put the equivalence class of a non-terminal-level attribute value Aj^nt_ki in the k-level boundary approximation for attribute Aj if at least one equivalence class of the underlying attribute values of Aj^nt_ki is in the (k+1)-level boundary approximation for attribute Aj.
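Substeps (b) and (c) simply propagate lower- and boundary-membership one level up the taxonomy. A minimal Python sketch of this propagation follows; the argument structure and names are assumptions made for illustration, not part of the algorithm's specification:

    def propagate_level_up(children_of, lower_next, boundary_next, class_of):
        """Step 7(b)/(c): decide level-k membership from the (k+1)-level results.
        children_of   : level-k value -> list of its child values on level k+1
        lower_next    : child values whose classes are in the (k+1)-level lower approximation
        boundary_next : child values whose classes are in the (k+1)-level boundary approximation
        class_of      : value -> its equivalence class (already derived in substep (a))
        """
        lower_k, boundary_k = [], []
        for value, children in children_of.items():
            if all(c in lower_next for c in children):      # substep (b)
                lower_k.append(class_of[value])
            if any(c in boundary_next for c in children):   # substep (c)
                boundary_k.append(class_of[value])
        return lower_k, boundary_k

    # With the data of Table 1 and the class set X = {1, 2, 7, 8} (Consumption Style = High):
    lower, boundary = propagate_level_up(
        {"Car": ["Expensive car", "Cheap car"],
         "Train": ["Express train", "Ordinary train"]},
        lower_next={"Expensive car", "Cheap car"},
        boundary_next={"Express train"},
        class_of={"Car": {1, 2, 7}, "Train": {3, 4, 5, 6, 8, 9, 10}},
    )
    # lower == [{1, 2, 7}] and boundary == [{3, 4, 5, 6, 8, 9, 10}]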

Step 8: Set q = 2, where q is used to count the number of attributes currently being

processed.

Step 9: Compute the lower and the boundary approximations of each attribute set Bj

with q attributes (on any levels) for class Xl from the terminal level to the

root level by the following substeps:

(a) Skip all the combinations of attribute values in Bj for which the equivalence class of any subset of their values is already in the lower approximation for Xl.

(b) Derive the equivalence class of each remaining combination of attribute


values by the intersection of the equivalence classes of its single attribute

values.

(c) Put the equivalence class Bj(x) of each combination in substep (b) into the lower approximation for class Xl if Bj(x) ⊆ Xl.

(d) Put the equivalence class Bj(x) of each combination in substep (b) into the boundary approximation for class Xl if Bj(x) ∩ Xl ≠ ∅ and Bj(x) ⊄ Xl.
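The pruning of substep (a) and the intersection of substep (b) can be sketched as follows. This is an illustration under assumed names; as in the worked example of Section 6, only single-value subsets are checked in the pruning test:

    from itertools import product

    def candidate_combinations(values_per_attribute, values_already_certain):
        """Step 9(a): enumerate value combinations for the attribute set, skipping any
        combination containing a value whose equivalence class is already in the
        lower approximation for the current class."""
        for combo in product(*values_per_attribute):
            if not any(v in values_already_certain for v in combo):
                yield combo

    def combined_class(combo, class_of_value):
        """Step 9(b): the equivalence class of a value combination is the intersection
        of the equivalence classes of its single attribute values."""
        result = class_of_value[combo[0]]
        for v in combo[1:]:
            result = result & class_of_value[v]
        return result

    # e.g. combined_class(("Express train", "Villa"),
    #                     {"Express train": {4, 6, 8, 10}, "Villa": {1, 4}})  ->  {4}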

Step 10: Set q = q + 1 and repeat Steps 9 and 10 until q > m.

Step 11: Derive the certain rules from the lower approximations.

Step 12: Remove certain rules with condition parts more specific than those of some

other certain rules.

Step 13: Derive the possible rules from the boundary approximations and calculate

their plausibility values as:

p(Bj(x)) = |Bj(x) ∩ Xl| / |Bj(x)|,

where Bj(x) is the equivalence class including x and derived from attribute set Bj.

Step 14: Set l = l + 1 and repeat Steps 5 to 14 until l > c.

Step 15: Output the certain rules and possible rules.
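As a rough illustration of Steps 11 and 13 (Step 12's removal of non-maximally-general certain rules is omitted), the sketch below assumes each approximation is kept as a list of (condition description, equivalence class) pairs; the representation and names are ours, not the paper's:

    def certain_rules(lower_approx, class_name):
        """Step 11: one certain rule per equivalence class in the lower approximation."""
        return [f"If {condition} then the class is {class_name}"
                for condition, _ in lower_approx]

    def possible_rules(boundary_approx, X, class_name):
        """Step 13: possible rules from the boundary approximation, each with its
        plausibility p(Bj(x)) = |Bj(x) ∩ Xl| / |Bj(x)|."""
        return [(f"If {condition} then the class is {class_name}",
                 len(eq_class & X) / len(eq_class))
                for condition, eq_class in boundary_approx]

    X_high = {1, 2, 7, 8}
    print(possible_rules([("Transport is Express Train", {4, 6, 8, 10})], X_high, "High"))
    # -> [('If Transport is Express Train then the class is High', 0.25)]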

6. An Example

In this section, an example is given to show how the proposed algorithm can be

used to generate certain and possible rules from training data with hierarchical values.


Assume the training data set is shown in Table 2.

Table 2: The training data used in this example

Transport Residence Consumption Style

Obj(1) Expensive car Villa High

Obj(2) Cheap car Single house High

Obj(3) Ordinary train Suite Low

Obj(4) Express train Villa Low

Obj(5) Ordinary train Suite Low

Obj(6) Express train Single house Low

Obj(7) Cheap car Single house High

Obj(8) Express train Single house High

Obj(9) Ordinary train Suite Low

Obj(10) Express train Apartment Low

Table 2 contains ten objects U={Obj(1), Obj(2), …, Obj(10)}, two decision

attributes A = {Transport, Residence}, and a class attribute C = {Consumption Style}. The

possible values of each decision attribute are organized into a taxonomy, as shown in

Figures 2 and 3.


Figure 2: Hierarchy of Transport


In Figures 2 and 3, there are three levels of hierarchical attribute values for

attributes Transport and Residence. The roots representing the generic names of

attributes are located on level 0 (such as “Transport” and “Residence”), the internal

nodes representing categories (such as “Train”) are on level 1, and the terminal nodes

representing actual values (such as “Express Train”) are on level 2. Only values of

terminal nodes can appear in training examples. Assume the class has only two

possible values: {High (H), Low (L)}. The proposed algorithm then processes the data

in Table 2 as follows.

Step 1: Since two classes exist in the data set, two partitions are found as

follows:

XH ={Obj(1), Obj(2), Obj(7), Obj(8)}, and

XL = {Obj(3), Obj(4), Obj(5), Obj(6), Obj(9), Obj(10)}.

Step 2: The terminal-level elementary sets are formed for the two attributes as

follows:


Figure 3: Hierarchy of Residence


U/{Transport^t} = {{(Obj(1))}, {(Obj(3))(Obj(5))(Obj(9))},

{(Obj(4))(Obj(6))(Obj(8))(Obj(10))}, {(Obj(2))(Obj(7))}};

U/{Residence^t} = {{(Obj(1))(Obj(4))}, {(Obj(2))(Obj(6))(Obj(7))(Obj(8))},

{(Obj(3))(Obj(5))(Obj(9))}, {(Obj(10))}}.

Step 3: Each terminal node is linked to its equivalence class for later usage.

Results for the two attributes are respectively shown in Figures 4 and 5.

Figure 4: Linking the equivalence classes to the terminal nodes in the Transport taxonomy


Step 4: l is set at 1, where l is used to represent the number of the class currently

being processed. In this example, assume the class XH is first processed.

Step 5: The terminal-level lower approximation of attribute Transport for class

XH is first calculated. Since the two terminal-level equivalence classes {(Obj(1))} and

{(Obj(2))(Obj(7))} for Transport are completely included in XH, which is {Obj(1), Obj(2),

Obj(7), Obj(8)}, the lower approximation of attribute Transport for class XH is thus:

Transport^t_*(XH) = {{(Obj(1))}, {(Obj(2))(Obj(7))}}.

Similarly, the terminal-level lower approximation of attribute Residence for class

XH is calculated as:

Residence^t_*(XH) = ∅.

Figure 5: Linking the equivalence classes to the terminal nodes in the Residence taxonomy


Step 6: The terminal-level boundary approximation of attribute Transport for

class XH is calculated. Since only the equivalence class {(Obj(4))(Obj(6))(Obj(8))(Obj(10))} has a non-empty intersection with XH but is not contained in XH, the boundary approximation is thus:

Transport^*t(XH) = {(Obj(4))(Obj(6))(Obj(8))(Obj(10))}.

Similarly, the terminal-level boundary approximation of attribute Residence for

class XH is calculated as:

Residence^*t(XH) = {{(Obj(1))(Obj(4))}, {(Obj(2))(Obj(6))(Obj(7))(Obj(8))}}.

Step 7: The non-terminal-level lower and boundary approximations of single

attributes for class XH are computed by the following substeps.

(a) The equivalence classes for non-terminal-level attribute values are first

calculated from their underlying terminal-level equivalence classes. Take the attribute

value Train in Transport as an example. Its equivalence class is the union of the two

equivalence classes from the terminal nodes Express Train and Ordinary Train.

Similarly, the equivalence class for the attribute value Car in Transport is the union of

the two equivalence classes from the terminal nodes Expensive Car and Cheap Car.

The equivalence classes for attribute Transport on level 1 are then shown as follows:

U/{Transport^nt} = {{(Obj(1))(Obj(2))(Obj(7))}, {(Obj(3))(Obj(4))(Obj(5))

(Obj(6))(Obj(8))(Obj(9))(Obj(10))}}.


Similarly, the non-terminal-level equivalence classes for the attribute Residence

on level 1 are found as:

U/{Residence^nt} = {{(Obj(1))(Obj(2))(Obj(4))(Obj(6))(Obj(7))(Obj(8))},

{(Obj(3))(Obj(5))(Obj(9))(Obj(10))}}.

(b) The equivalence class of a non-terminal-level attribute value is in the lower

approximation if all the equivalence classes of its underlying attribute values on the

lower levels are also in the lower approximation. In this example, only the

equivalence classes of the underlying attribute values of Transport=Car are in the

lower approximation for XH. The lower approximations for XH on level 1 are thus:

Transport^nt_*(XH) = {(Obj(1))(Obj(2))(Obj(7))}, and

Residence^nt_*(XH) = ∅.

(c) The equivalence class of a non-terminal-level attribute value is in the

boundary approximation if at least one equivalence class of its underlying attribute

values on the lower levels is also in the boundary approximation. The boundary

approximations for XH on level 1 are thus found as:

Transport^*nt(XH) = {(Obj(3))(Obj(4))(Obj(5))(Obj(6))(Obj(8))(Obj(9))(Obj(10))}, and

Residence^*nt(XH) = {(Obj(1))(Obj(2))(Obj(4))(Obj(6))(Obj(7))(Obj(8))}.

Step 8: q is set at 2, where q is used to count the number of attributes currently

being processed.


Step 9: The lower and the boundary approximations of each attribute set with

two attributes for class XH on the terminal level are found in the following substeps. In

this example, only the attribute set {Transport, Residence} contains two attributes.

(a) The attribute set {Transport, Residence} has the following possible

combinations of values on the terminal level:

(Transport=Express Train, Residence=Villa),

(Transport=Express Train, Residence=Single House),

(Transport=Express Train, Residence=Suite),

(Transport=Express Train, Residence=Apartment),

(Transport=Ordinary Train, Residence=Villa),

(Transport=Ordinary Train, Residence=Single House),

(Transport=Ordinary Train, Residence=Suite),

(Transport=Ordinary Train, Residence=Apartment),

(Transport=Expensive Car, Residence=Villa),

(Transport=Expensive Car, Residence=Single House),

(Transport=Expensive Car, Residence=Suite),

(Transport=Expensive Car, Residence=Apartment),

(Transport=Cheap Car, Residence=Villa),

(Transport=Cheap Car, Residence=Single House),

(Transport=Cheap Car, Residence=Suite), and

(Transport=Cheap Car, Residence=Apartment).

Since the equivalence classes for the two single attribute values (Transport =


Expensive Car) and (Transport = Cheap Car) are in the lower approximation for XH,

the above combinations including (Transport = Expensive Car) and (Transport =

Cheap Car) won’t then be considered in the later steps. Thus, only eight combinations

are considered.

(b) The equivalence class of each remaining value combination for {Transport,

Residence} is then derived by the intersection of the equivalence classes of its single

attribute values. Take the combination (Transport = Express Train, Residence = Villa)

as an example. The equivalence class for (Transport = Express Train) is

{(Obj(4))(Obj(6))(Obj(8))(Obj(10))} and for (Residence = Villa) is {(Obj(1))(Obj(4))}. The

equivalence class for (Transport = Express Train, Residence = Villa) is thus the

intersection of {(Obj(4))(Obj(6))(Obj(8))(Obj(10))} and {(Obj(1))(Obj(4))}, which is

{(Obj(4))}.

The equivalence classes for the other value combinations of {Transport,

Residence} can be similarly derived. Thus:

U/{Transport^t, Residence^t} = {{(Obj(4))}, {(Obj(6))(Obj(8))}, {(Obj(10))},

{(Obj(3))(Obj(5))(Obj(9))}}.

Note that {(Obj(1))} and {(Obj(2))(Obj(7))} will not be considered since they are in the lower approximation for XH from the single attribute values (Transport = Expensive

Car) and (Transport = Cheap Car).

(c) The lower approximation of each 2-attribute subset Bj for class XH on the


terminal level is first derived. Only {Transport, Residence} is considered in this

example. Since no equivalence classes in U/{Transport, Residence} are in XH, the

lower approximation of {Transport, Residence} for XH on the terminal level is thus:

{Transport^t, Residence^t}_*(XH) = ∅.

(d) The boundary approximation of each 2-attribute subset Bj for class XH on the

terminal level is then derived. Take {Transport, Residence} as an example. Since

(Obj(8)) in the equivalence class {(Obj(6))(Obj(8))} of U/{Transport^t, Residence^t} is in

XH and (Obj(6)) is not, {(Obj(6))(Obj(8))} is thus in the boundary approximation of

{Transport, Residence} for XH on the terminal level. The boundary approximation of

{Transport, Residence} for XH on the terminal level is shown below:

{Transport^t, Residence^t}^*(XH) = {(Obj(6))(Obj(8))}.

The same process is then repeated from the terminal level to the root level for

finding the lower and the boundary approximations of {Transport, Residence} on

other levels. Results are shown as follows:

{Transport^nt, Residence^t}_*(XH) = ∅,

{Transport^t, Residence^nt}_*(XH) = ∅,

{Transport^nt, Residence^nt}_*(XH) = ∅,

{Transport^nt, Residence^t}^*(XH) = {(Obj(6))(Obj(8))},

{Transport^t, Residence^nt}^*(XH) = {(Obj(4))(Obj(6))(Obj(8))}, and

{Transport^nt, Residence^nt}^*(XH) = {(Obj(4))(Obj(6))(Obj(8))}.


Step 10: q = 2 + 1 = 3. Since q > m (= 2), the next step is executed.

Step 11: All the certain rules are then derived from the lower approximations.

Results for this example are shown below:

1. If Transport is Expensive Car then Consumption Style is High;

2. If Transport is Cheap Car then Consumption Style is High;

3. If Transport is Car then Consumption Style is High.

Step 12: Since the condition parts of the first and second certain rules are more

specific than that of the third certain rule, the first two rules are removed from the certain

rule set.

Step 13: All the possible rules are derived from the boundary approximations.

The plausibility measure of each rule is also calculated in this step. For example, the

plausibility measure of the equivalence class {(Obj(4))(Obj(6))(Obj(8))(Obj(10))} of

Transport = Express Train in the boundary approximation for class XH is calculated as

follows:

P(If Transport = Express Train then XH)

= |{Obj(4), Obj(6), Obj(8), Obj(10)} ∩ {Obj(1), Obj(2), Obj(7), Obj(8)}| / |{Obj(4), Obj(6), Obj(8), Obj(10)}|

= 1/4 = 0.25.
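This arithmetic, and that of the rules listed below, can be checked mechanically. A small sketch using Python's fractions module, with the equivalence classes written as sets of object numbers:

    from fractions import Fraction

    X_high = {1, 2, 7, 8}

    def plausibility(eq_class, X):
        return Fraction(len(eq_class & X), len(eq_class))

    print(plausibility({4, 6, 8, 10}, X_high))           # 1/4  -> Transport = Express Train
    print(plausibility({2, 6, 7, 8}, X_high))            # 3/4  -> Residence = Single House
    print(plausibility({3, 4, 5, 6, 8, 9, 10}, X_high))  # 1/7  -> Transport = Train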


All the resulting possible rules with their plausibility values are shown below:

1. If Transport is Express Train then Consumption Style is High,

with plausibility = 0.25;

2. If Residence is Single House then Consumption Style is High,

with plausibility = 0.75;

3. If Transport is Train then Consumption Style is High,

with plausibility = 0.14;

4. If Residence is House then Consumption Style is High,

with plausibility = 0.66;

5. If Transport is Express Train and Residence is Single House then

Consumption Style is High, with plausibility = 0.5;

6. If Transport is Train and Residence is Single House then Consumption Style

is High, with plausibility = 0.5;

7. If Transport is Train and Residence is House then Consumption Style is High,

with plausibility = 0.33;

8. If Transport is Express Train and Residence is House then Consumption

Style is High, with plausibility = 0.33.

Step 14: l = l + 1 = 2. Steps 5 to 14 are then repeated for the other class XL.

Step 15: All the certain rules and possible rules are then output.

7. Discussion

In the proposed learning algorithm for handling training examples with


hierarchical values, only the maximally general certain rules, instead of all certain

ones, are kept for classification. Certain rules which are not maximally general are

removed since they provide no other new information. Take the maximally general

rule “If Transport is Car then Consumption Style is High” derived in the above

section as an example. All the descendent rules covered by the maximally general rule

according to the taxonomy relation in Figure 2 are shown as follows:

1. If Transport is Expensive Car then Consumption Style is High;

2. If Transport is Cheap Car then Consumption Style is High.

It can be easily verified that the above two rules are also certain rules. Besides,

any rules generated by adding additional constraints into the maximally general rule

or into its descendent rules are also certain. These include the following 18 rules:

1. If Transport is Car and Residence is Villa then Consumption Style is High;

2. If Transport is Car and Residence is Single House then Consumption Style is

High;

3. If Transport is Car and Residence is Suite then Consumption Style is High;

4. If Transport is Car and Residence is Apartment then Consumption Style is

High;

5. If Transport is Car and Residence is House then Consumption Style is High;

6. If Transport is Car and Residence is Building then Consumption Style is

High;

7. If Transport is Expensive Car and Residence is Villa then Consumption Style

is High;

8. If Transport is Expensive Car and Residence is Single House then


Consumption Style is High;

9. If Transport is Expensive Car and Residence is Suite then Consumption Style

is High;

10. If Transport is Expensive Car and Residence is Apartment then

Consumption Style is High;

11. If Transport is Expensive Car and Residence is House then Consumption

Style is High;

12. If Transport is Expensive Car and Residence is Building then Consumption

Style is High;

13. If Transport is Cheap Car and Residence is Villa then Consumption Style is

High;

14. If Transport is Cheap Car and Residence is Single House then Consumption

Style is High;

15. If Transport is Cheap Car and Residence is Suite then Consumption Style is

High;

16. If Transport is Cheap Car and Residence is Apartment then Consumption

Style is High;

17. If Transport is Cheap Car and Residence is House then Consumption Style

is High;

18. If Transport is Cheap Car and Residence is Building then Consumption

Style is High;

The pruning procedure is embedded in the proposed algorithm. The above

subsumption relation for certain rules is, however, not valid for possible rules. The

plausibility of a parent possible rule will always lie between the minimum and the


maximum plausibility values of its children rules. Take the possible rule “If Transport

is Train then Consumption Style is High, with plausibility = 0.14” derived in the

above section as an example. Both its descendent rules according to the taxonomy

relation in Figure 2 are shown as follows:

1. If Transport is Express Train then Consumption Style is High,

with plausibility = 0.25;

2. If Transport is Ordinary Train then Consumption Style is High,

with plausibility = 0.

It can be seen that the plausibility of the parent rule is between 0.25 and 0. Note

that the second child rule will not be actually kept since its plausibility is zero. It is

only shown here to demonstrate the relationship of the plausibility values in parent

and child rules. The child rules with plausibility values less than those of their parent rules will

also be kept by the proposed algorithm since they may provide some useful

information about the classification. When a new event satisfies both a child rule and

its parent rule, it is more accurate to derive the plausibility of the consequence from

the child rule than from the parent rule. However, if a new event has an unknown

attribute value, but a known non-terminal value, it can still be inferred using the

parent rules. The proposed algorithm thus keeps all the possible rules except for those

with plausibility = 0. If the child rules with plausibility values less than those of their parent rules are not to be kept, the proposed algorithm can easily be modified by adding a subsumption check after the step that generates the possible rules.

Besides, a plausibility threshold can be used in the proposed algorithm to avoid

generating an overwhelming number of possible rules. The rules with plausibility values less than the


threshold will thus be pruned. This checking step can easily be embedded in finding

the boundary approximation to reduce the computational time further.

8. Conclusions and Future Works

In this paper, we have proposed a new learning algorithm based on rough sets to

find cross-level certain and possible rules from training data with hierarchical

attribute values. The proposed method adopts the concept of equivalence classes to

find the terminal-level elementary sets of single attributes. These equivalence classes

are then easily used to find the non-terminal-level elementary sets of single attributes

and the cross-level elementary sets of multiple attributes by the union and the

intersection operations. Lower and boundary approximations are then derived from

the elementary sets from the terminal level to the root level. Boundary approximations,

instead of upper approximations, are used in the proposed algorithm to find possible

rules, thus reducing some subsumption checking. Lower approximations are used to

derive maximally general certain rules. Some pruning heuristics are also used to avoid

unnecessary search. The rules derived can be used to infer a new event with both

terminal and non-terminal attribute values.

A limitation of the proposed algorithm is that it applies only to symbolic data; if numerical data are fed in, they must first be converted into intervals. Currently, we are trying to apply fuzzy concepts to the management of numerical data to enhance the algorithm's power.

References

[1] B. G. Buchanan and E. H. Shortliffe, Rule-Based Expert Systems: The MYCIN


Experiments of the Stanford Heuristic Programming Project, Massachusetts:

Addison-Wesley, 1984.

[2] M. R. Chmielewski, J. W. Grzymala-Busse, N. W. Peterson and S. Than, “The rule induction system LERS – a version for personal computers,” Foundations of Computing and Decision Sciences, Vol. 18, No. 3, 1993, pp. 181-212.

[3] J. W. Grzymala-Busse, “Knowledge acquisition under uncertainty: A rough set

approach,” Journal of Intelligent Robotic Systems, Vol. 1, 1988, pp. 3-16.

[4] S. Hirano, X. Sun, and S. Tsumoto, “Dealing with multiple types of expert

knowledge in medical image segmentation: A rough sets style approach,” The

2002 IEEE International Conference on Fuzzy Systems, Vol. 2, 2002, pp. 884-889.

[5] T. P. Hong, T. T. Wang and S. L. Wang, "Knowledge acquisition from

quantitative data using the rough-set theory," Intelligent Data Analysis, Vol. 4,

2000, pp. 289-304.

[6] T. P. Hong, T. T. Wang and B. C. Chien, "Mining approximate coverage rules,"

International Journal of Fuzzy Systems, Vol. 3, No. 2, 2001, pp. 409-414.

[7] G. Lambert-Torres, A. P. Alves da Silva, V. H. Quintana and L. E. Borges da Silva,

“Knowledge-base reduction based on rough set techniques,” The Canadian

Conference on Electrical and Computer Engineering, 1996, pp. 278-281.

[8] C. H. Lee, S. H. Seo and S. C. Choi, “Rule discovery using hierarchical

classification structure with rough sets,” The 9th IFSA World Congress and the

20th NAFIPS International Conference, Vol. 1, 2001, pp. 447-452.

[9] P. J. Lingras and Y. Y. Yao, “Data mining using extensions of the rough set

model,” Journal of the American Society for Information Science, Vol. 49, No. 5,

1998, pp. 415-422.


[10] R. S. Michalski, J. G. Carbonell and T. M. Mitchell, Machine Learning: An

Artificial Intelligence Approach, Vol. 1, California: Kaufmann Publishers, 1983.

[11] R. S. Michalski, J. G. Carbonell and T. M. Mitchell, Machine Learning: An

Artificial Intelligence Approach, Vol. 2, 1984.

[12] Z. Pawlak, “Rough sets,” International Journal of Computer and Information

Sciences, Vol. 11, No. 5, 1982, pp. 341-356.

[13] Z. Pawlak, Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer

Academic Publishers, 1991.

[14] Z. Pawlak, “Why rough sets?,” The Fifth IEEE International Conference on

Fuzzy Systems, Vol. 2, 1996, pp. 738-743.

[15] S. Tsumoto, “Extraction of experts’ decision rules from clinical databases using

rough set model,” Intelligent Data Analysis, Vol. 2, 1998, pp. 215-227.

[16] S. Tsumoto, “Knowledge discovery in medical databases based on rough sets and

attribute-oriented generalization,” The 1998 IEEE International Conference on

Fuzzy Systems, Vol. 2, 1998, pp. 1296-1301.

[17] Y. Y. Yao, “Stratified rough sets and granular computing,” The 18th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), 1999, pp. 800-804.

[18] N. Zhong, J. Z. Dong, S. Ohsuga and T. Y. Lin, “An incremental, probabilistic

rough set approach to rule discovery,” The IEEE International Conference on

Fuzzy Systems, Vol. 2, 1998, pp. 933-938.

