
Rule Induction with Extension Matrices

Leslie Damon, based on slides by Yuen F. Helbig

Dr. Xindong Wu, 1998

Outline

•Extension matrix approach for rule induction
•The MFL and MCV optimization problems
•The HCV solution
•Noise handling and discretization in HCV
•Comparison of HCV with ID3-like algorithms, including C4.5 and C4.5rules

Attribute-based induction algorithms

Attribute-based induction concentrates on symbolic and heuristic computations
•Doesn't require built-in knowledge

Best known are the ID3-like algorithms
•Low-order polynomial in time and space

Alternatively, the extension matrix approach
•Developed by Hong et al. at Univ. of Illinois in 1985
•Uses the extension matrix as its mathematical basis

Positive and Negative Examples

A positive example is an example that belongs to a known class, say 'Play'

All the other examples can be called negative examples

$e_k^+ = (v_{1k}^+, \ldots, v_{ak}^+)$

(overcast, mild, high, windy) => Play

$e_k^- = (v_{1k}^-, \ldots, v_{ak}^-)$

(rainy, hot, high, windy) => Don't Play

Negative Example Matrix

The negative example matrix is defined as

$NEM = (e_1^-, \ldots, e_n^-)^T = (r_{ij})_{n \times a}$

rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

Extension Matrix

The extension matrix (EM) of a positive example $e_k^+$ against NEM is defined as

$EM_k = (r_{ij}^k)_{n \times a}, \quad k \in \{1, \ldots, p\}$

$r_{ij}^k = \begin{cases} * \ \text{(dead element)} & \text{when } v_{jk}^+ = NEM_{ij} \\ NEM_{ij} & \text{when } v_{jk}^+ \neq NEM_{ij} \end{cases}$

A dead element cannot be used to distinguish a positive example from negative examples
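The definition above can be sketched in a few lines of Python. This is only an illustration under assumed names: `DEAD`, `extension_matrix`, and the tuple-based data layout are not from the slides; the example data is the weather NEM and the positive example (overcast, mild, high, windy).

```python
DEAD = "*"  # marker for a dead element (name chosen for this sketch)

def extension_matrix(positive, nem):
    """Return EM_k: a copy of NEM in which every cell that equals the
    positive example's value for that attribute is replaced by DEAD."""
    return [
        [DEAD if neg_value == pos_value else neg_value
         for neg_value, pos_value in zip(row, positive)]
        for row in nem
    ]

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]

em = extension_matrix(("overcast", "mild", "high", "windy"), NEM)
for row in em:
    print(row)
# first row:  ['rainy', 'hot', '*', '*']
# last row:   ['sunny', '*', '*', '*']
```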

Example Extension Matrix

Negative Example Matrix (NEM)

rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

Positive Example

[overcast  mild  high  windy]

Extension Matrix (EM)

rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *

Paths in Extension Matrices

A set of n non-dead elements, one from each of the n rows of an extension matrix, is called a path

Attributes:        X1  X2  X3

Extension matrix:  1   *   *
                   *   0   1
                   1   0   *

e.g., {X1 = 1, X2 = 0, X1 = 1} and {X1 = 1, X3 = 1, X2 = 0} are paths in the extension matrix above
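As a rough illustration of the path definition, the sketch below enumerates every path of a small matrix by picking one non-dead element per row. The 3x3 matrix is an assumption: it is a layout consistent with the two example paths on this slide, and the function name `paths` and the (column, value) encoding are chosen here for illustration.

```python
from itertools import product

DEAD = "*"

def paths(em):
    """Return every path as a tuple of (column, value) pairs,
    one non-dead element taken from each row."""
    choices = [
        [(j, v) for j, v in enumerate(row) if v != DEAD]
        for row in em
    ]
    # product() yields nothing when some row has no non-dead element,
    # which matches the fact that no path exists in that case.
    return list(product(*choices))

# Assumed matrix over attributes X1, X2, X3 (columns 0, 1, 2).
EM = [
    [1, DEAD, DEAD],
    [DEAD, 0, 1],
    [1, 0, DEAD],
]

all_paths = paths(EM)
for p in all_paths:
    print(", ".join(f"X{j + 1} = {v}" for j, v in p))
```

Enumerating all paths grows exponentially with the number of rows, which is one reason a heuristic such as HFL is used instead of exhaustive search.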

Conjunctive Formulas

A path $\{r_{1j_1}, \ldots, r_{nj_n}\}$ in the $EM_k$ of the positive example k against NEM corresponds to a conjunctive formula or cover

$L = \bigwedge_{i=1}^{n} [X_{j_i} \neq r_{ij_i}]$

Path: {X1 = 1, X2 = 0, X1 = 1}

Formula: X1 ≠ 1 ∧ X2 ≠ 0 ∧ X1 ≠ 1

Path: {X1 = 1, X3 = 1, X2 = 0}

Formula: X1 ≠ 1 ∧ X3 ≠ 1 ∧ X2 ≠ 0
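To make the path-to-formula correspondence concrete, here is a minimal sketch on the weather data. The helper names (`formula_from_path`, `satisfies`, `show`) and the list-of-selectors representation are assumptions for illustration; the path used takes the first-column element of each row of the extension matrix of (overcast, mild, high, windy).

```python
def formula_from_path(path):
    """Turn a path [(column, value), ...] into a conjunction of
    selectors [X_column != value], dropping repeated selectors."""
    return sorted(set(path))

def satisfies(example, formula):
    """True when the example meets every selector of the formula."""
    return all(example[j] != value for j, value in formula)

def show(formula):
    return " AND ".join(f"[X{j + 1} != {v}]" for j, v in formula)

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
positive = ("overcast", "mild", "high", "windy")

# One element per row, all taken from column 0 (the first attribute).
path = [(0, "rainy"), (0, "rainy"), (0, "sunny"), (0, "sunny")]
formula = formula_from_path(path)

print(show(formula))  # [X1 != rainy] AND [X1 != sunny]
print(satisfies(positive, formula))                 # True
print(any(satisfies(neg, formula) for neg in NEM))  # False
```

Because each selector rejects the negative example in whose row it was found, the formula covers the positive example and excludes every row of NEM.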

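Extension Matrix Disjunction

The disjunction matrix of the positive examples $\{e_{i_1}^+, \ldots, e_{i_k}^+\}$ is $EMD = (r_{ij})_{n \times a}$, where

$r_{ij} = \begin{cases} * & \text{when } \exists k_1 \in \{i_1, \ldots, i_k\} : EM_{k_1}(i,j) = * \\ \bigvee_{k_2=1}^{k} EM_{i_{k_2}}(i,j) = NEM(i,j) & \text{otherwise} \end{cases}$

A path $\{r_{1j_1}, \ldots, r_{nj_n}\}$ in the EMD of $\{e_{i_1}^+, \ldots, e_{i_k}^+\}$ against NEM corresponds to a conjunctive formula or cover, $L = \bigwedge_{i=1}^{n} [X_{j_i} \neq r_{ij_i}]$, which covers all of $\{e_{i_1}^+, \ldots, e_{i_k}^+\}$ against NEM, and vice versa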

EMD Example

Negative Example Matrix (NEM)

rainy  hot   high    windy
rainy  cool  normal  windy
sunny  hot   normal  windy
sunny  mild  high    windy

Positive Example

[overcast  mild  high  windy]

Extension Matrix (EM)

rainy  hot   *       *
rainy  cool  normal  *
sunny  hot   normal  *
sunny  *     *       *

Positive Example

[overcast  mild  normal  calm]

Extension Matrix Disjunction (EMD)

rainy  hot   *  *
rainy  cool  *  *
sunny  hot   *  *
sunny  *     *  *

Positive Example

[rainy  hot  high  calm]

Extension Matrix Disjunction (EMD)

*      *     *  *
*      cool  *  *
sunny  *     *  *
sunny  *     *  *
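The EMD example can be reproduced with a short sketch. The function names and list-based layout below are assumptions for illustration; the rule implemented is the one stated in the definition: a cell is dead in the EMD as soon as it is dead in the extension matrix of any positive example in the group.

```python
DEAD = "*"

def extension_matrix(positive, nem):
    """NEM with cells equal to the positive example's value marked dead."""
    return [[DEAD if n == p else n for n, p in zip(row, positive)]
            for row in nem]

def disjunction_matrix(positives, nem):
    """EMD of a group: dead wherever any member's EM is dead."""
    ems = [extension_matrix(p, nem) for p in positives]
    return [
        [DEAD if any(em[i][j] == DEAD for em in ems) else nem[i][j]
         for j in range(len(nem[0]))]
        for i in range(len(nem))
    ]

def has_path(matrix):
    """A path exists exactly when every row keeps a non-dead element."""
    return all(any(v != DEAD for v in row) for row in matrix)

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
P1 = ("overcast", "mild", "high", "windy")
P2 = ("overcast", "mild", "normal", "calm")
P3 = ("rainy", "hot", "high", "calm")

emd12 = disjunction_matrix([P1, P2], NEM)
emd123 = disjunction_matrix([P1, P2, P3], NEM)
print(has_path(emd12))   # True: the first two examples share a formula
print(has_path(emd123))  # False: the first row is entirely dead
```

In the last EMD the first row has only dead elements, so no single conjunctive formula covers all three positive examples; the third example would have to go into a different group.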

MFL and MCV (1)

The minimum formula problem (MFL)
•Generating a conjunctive formula that covers a positive example or an intersecting group of positive examples against NEM and has the minimum number of different conjunctive selectors

The minimum cover problem (MCV)
•Seeking a cover that covers all positive examples in PE against NEM and has the minimum number of conjunctive formulae, with each conjunctive formula being as short as possible

MFL and MCV (2)

NP-hard

Two complete algorithms are designed to solve them when each attribute domain $D_i$, $i \in \{1, \ldots, a\}$, satisfies $|D_i| = 2$
•$O(na2^a)$ for MFL
•$O(n2^a4^a + pa^24^a)$ for MCV

When $|D_i| > 2$, the domain can be decomposed into several, each having base 2

What is HCV?

HCV is an extension matrix based rule induction algorithm which is
•Heuristic
•Attribute based
•Noise tolerant

Divides the positive examples into intersecting groups.

Uses HFL heuristics to find a conjunctive formula which covers each intersecting group.

Low-order polynomial time complexity at induction time

What is HFL?

HFL finds a heuristic conjunctive formula which corresponds to a path in an extension or disjunction matrix

•Consists of 4 strategies, applied in turn
•Time complexity of $O(na^3)$

HFL - Fast Strategy

absent  slight  strip  *     normal
*       *       hole   fast  dry-peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry-peep
low     medium  *      fast  normal

Selector [X5 ∉ {normal, dry-peep}] is a possible selector, which will cover all 5 rows

HFL - Precedence

[Extension matrix figure; cell layout not recoverable]

Selectors [X1 ≠ 1] and [X3 ≠ 1] are two inevitable selectors in the above extension matrix

HFL - Elimination

[Extension matrix figure; cell layout not recoverable]

Attribute X2 can be eliminated by X3

HFL - Least Frequency

[Extension matrix figure; cell layout not recoverable]

Attribute X1 can be eliminated and there still exists a path

HFL Algorithm (1)

Procedure HFL(EM; Hfl)
S0: Hfl ← {}
S1: /* the fast strategy */
    Try the fast strategy on all those rows which haven't been covered;
    If successful, add a corresponding selector to Hfl and return(Hfl)
S2: /* the precedence strategy */
    Apply the precedence strategy to the uncovered rows;
    If some inevitable selectors are found, add them to Hfl, label all the rows they cover, and go to S1

HFL Algorithm (2)

S3: /* the elimination strategy */
    Apply the elimination strategy to those attributes that have neither been selected nor eliminated;
    If an eliminable selector is found, reset all the elements in the corresponding column with *, and go to S2.
S4: /* the least frequency strategy */
    Apply the least frequency strategy to those attributes which have neither been selected nor eliminated, and find a least frequency selector;
    Reset all the elements in the corresponding column with *, and go to S2.
Return(Hfl)
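The procedure above can be sketched in Python as follows. This is a hedged reading of the slides, not the reference implementation, and it makes several assumptions: a selected column excludes all of its non-dead values in the still-uncovered rows; "eliminable" means the column is non-dead only in rows where some other column is also non-dead; "least frequency" means the column that is non-dead in the fewest uncovered rows; ties in the fast strategy are broken toward the rightmost column (an arbitrary choice). After S3 or S4 the loop re-enters at S1 rather than S2, which is harmless because deleting a column cannot make the fast strategy newly succeed. The matrix used is a five-row disjunction matrix taken from the HCV example in these slides.

```python
DEAD = "*"

def hfl(matrix):
    """Sketch of HFL. Returns {column index: set of excluded values}."""
    em = [list(row) for row in matrix]   # work on a copy
    n_cols = len(em[0])
    uncovered = set(range(len(em)))
    selectors = {}

    def live_rows(j):
        """Uncovered rows whose element in column j is non-dead."""
        return [i for i in uncovered if em[i][j] != DEAD]

    def select(j):
        """Add a selector on column j and mark the rows it covers."""
        rows = live_rows(j)
        if rows:
            selectors.setdefault(j, set()).update(em[i][j] for i in rows)
            uncovered.difference_update(rows)

    while uncovered:
        if any(all(v == DEAD for v in em[i]) for i in uncovered):
            raise ValueError("a row has only dead elements: no path exists")

        # S1: fast strategy -- a column non-dead in every uncovered row.
        full = [j for j in range(n_cols)
                if len(live_rows(j)) == len(uncovered)]
        if full:
            select(max(full))            # arbitrary tie-break (assumption)
            return selectors

        # S2: precedence -- a row with one non-dead element forces it.
        forced = {j for i in uncovered for j in range(n_cols)
                  if em[i][j] != DEAD
                  and sum(v != DEAD for v in em[i]) == 1}
        if forced:
            for j in sorted(forced):
                select(j)
            continue                     # back to S1

        live_cols = [j for j in range(n_cols) if live_rows(j)]

        # S3: elimination -- j1 is redundant if j2 is live wherever j1 is.
        victim = next(
            (j1 for j1 in live_cols for j2 in live_cols
             if j1 != j2 and set(live_rows(j1)) <= set(live_rows(j2))),
            None)

        # S4: least frequency -- else drop the column live in fewest rows.
        if victim is None:
            victim = min(live_cols, key=lambda j: len(live_rows(j)))

        for row in em:
            row[victim] = DEAD
    return selectors

EMD = [
    ["absent", DEAD, "strip", DEAD, "normal"],
    [DEAD, DEAD, "hole", "fast", DEAD],
    [DEAD, DEAD, "strip", DEAD, "normal"],
    ["absent", DEAD, DEAD, "fast", DEAD],
    [DEAD, DEAD, DEAD, "fast", "normal"],
]
rule = hfl(EMD)
print(rule)  # {3: {'fast'}, 4: {'normal'}}
```

Columns 3 and 4 are the fourth and fifth attributes (ESR and auscultation in the example), so the result reads [ESR ≠ fast] ∧ [AUSCULTATION ≠ normal].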

HCV Algorithm

HCV:
•partitions the PEs into intersecting groups
•calls HFL to find the Hfl for each group
•builds the covering formula by doing a logical OR of the Hfls
•returns the covering formula Hcv
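The first step, partitioning PE into intersecting groups, might look like the sketch below. The greedy first-fit policy and the names `dead_mask` and `partition` are assumptions; the slides only state that positive examples are divided into intersecting groups. A group is treated as intersecting when its disjunction matrix still has a path, that is, when no row is entirely dead.

```python
def dead_mask(positive, nem):
    """Boolean matrix: True where the positive example's value equals
    the NEM cell, i.e. where the extension matrix has a dead element."""
    return [[neg == pos for neg, pos in zip(row, positive)] for row in nem]

def partition(positives, nem):
    """Greedy first-fit grouping of positive examples (by index).
    An example joins the first group whose merged mask keeps at least
    one non-dead element in every row; otherwise it starts a new group."""
    groups = []  # list of (member indices, dead mask of the group's EMD)
    for k, pos in enumerate(positives):
        mask = dead_mask(pos, nem)
        for members, gmask in groups:
            merged = [[a or b for a, b in zip(r1, r2)]
                      for r1, r2 in zip(gmask, mask)]
            if all(not all(row) for row in merged):
                members.append(k)
                gmask[:] = merged
                break
        else:
            groups.append(([k], mask))
    return [members for members, _ in groups]

NEM = [
    ("rainy", "hot", "high", "windy"),
    ("rainy", "cool", "normal", "windy"),
    ("sunny", "hot", "normal", "windy"),
    ("sunny", "mild", "high", "windy"),
]
PE = [
    ("overcast", "mild", "high", "windy"),
    ("overcast", "mild", "normal", "calm"),
    ("rainy", "hot", "high", "calm"),
]
groups = partition(PE, NEM)
print(groups)  # [[0, 1], [2]]
```

Each resulting group would then be passed to HFL, and the per-group formulas joined with a logical OR to give Hcv.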

Complexity of HCV

Worst case time complexity

$O\left(\sum_{i=1}^{p}\left(na + \sum_{j=i+1}^{p}(2na + na + na + 1) + (na^3) + 1\right)\right) \approx O(pna^3 + p^2na)$

Space requirement: $2na$

HCV Example

NEM for Pneumonia

absent  slight  strip  normal  normal
high    heavy   hole   fast    dry-peep
low     slight  strip  normal  normal
absent  slight  spot   fast    dry-peep
low     medium  flack  fast    normal

Positive Example 1

[high  heavy  flack  normal  bubble-like]

EM1

absent  slight  strip  *     normal
*       *       hole   fast  dry-peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry-peep
low     medium  *      fast  normal

Positive Example 2

[medium  heavy  flack  normal  bubble-like]

EM2

absent  slight  strip  *     normal
high    *       hole   fast  dry-peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry-peep
low     medium  *      fast  normal

Positive Example 3

[low  slight  spot  normal  dry-peep]

EM3

absent  *       strip  *     normal
high    heavy   hole   fast  *
*       *       strip  *     normal
absent  *       *      fast  *
*       medium  flack  fast  normal

Positive Example 4

[high  medium  flack  normal  bubble-like]

EM4

absent  slight  strip  *     normal
*       heavy   hole   fast  dry-peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry-peep
low     *       *      fast  normal

Positive Example 5

[medium  slight  flack  normal  bubble-like]

EM5

absent  *       strip  *     normal
high    heavy   hole   fast  dry-peep
low     *       strip  *     normal
absent  *       spot   fast  dry-peep
low     medium  *      fast  normal

EM1 ∩ EM2

absent  slight  strip  *     normal
*       *       hole   fast  dry-peep
low     slight  strip  *     normal
absent  slight  spot   fast  dry-peep
low     medium  *      fast  normal

EM1 ∩ EM2 ∩ EM3

absent  *       strip  *     normal
*       *       hole   fast  *
*       *       strip  *     normal
absent  *       *      fast  *
*       medium  *      fast  normal

EM1 ∩ EM2 ∩ EM3 ∩ EM4

absent  *  strip  *     normal
*       *  hole   fast  *
*       *  strip  *     normal
absent  *  *      fast  *
*       *  *      fast  normal

EM1 ∩ EM2 ∩ EM3 ∩ EM4 ∩ EM5

absent  *  strip  *     normal
*       *  hole   fast  *
*       *  strip  *     normal
absent  *  *      fast  *
*       *  *      fast  normal

HFL Step 1: Fast Strategy

absent  *  strip  *     normal
*       *  hole   fast  *
*       *  strip  *     normal
absent  *  *      fast  *
*       *  *      fast  normal

HFL Rules = {}

HFL Step 2: Precedence (matrix unchanged)

HFL Rules = {}

HFL Step 3: Elimination (matrix unchanged)

HFL Rules = {}

HFL Step 4: Least-Frequency

*  *  strip  *     normal
*  *  hole   fast  *
*  *  strip  *     normal
*  *  *      fast  *
*  *  *      fast  normal

HFL Rules = {}

HFL Step 2: Precedence

*  *  strip  *     normal
*  *  hole   fast  *
*  *  strip  *     normal
*  *  *      fast  *
*  *  *      fast  normal

HFL Rules = {ESR ≠ fast}

*  *  strip  *  normal
*  *  *      *  *
*  *  strip  *  normal
*  *  *      *  *
*  *  *      *  *

HFL Rules = {ESR ≠ fast}, go to S1

HFL Step 1: Fast Strategy

*  *  strip  *  normal
*  *  *      *  *
*  *  strip  *  normal
*  *  *      *  *
*  *  *      *  *

HFL Rules = {ESR ≠ fast, AUSCULTATION ≠ normal}

*  *  strip  *  *
*  *  *      *  *
*  *  strip  *  *
*  *  *      *  *
*  *  *      *  *

HFL Rules = {ESR ≠ fast, AUSCULTATION ≠ normal}

HCV Example

HCV generated rule

C4.5rules generated rule

Example (8)

HCV Noise Handling

Don’t care values are dead elements

Approximate partitioning
•Partitioning of PE into groups can be approximate rather than strict

Stopping criteria
•Similar to the -c option of C4.5

Real-Valued Attributes

HCV uses the Information Gain Heuristic

Stop splitting criteria
•Stop if the information gain on all cut points is the same.
•Stop if the number of examples to split is less than a certain number.
•Limit the total number of intervals.
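For readers unfamiliar with the heuristic, the sketch below scores candidate cut points of one real-valued attribute by information gain. It is a generic illustration, not HCV's actual discretization code: the binary split, the midpoint cut points, and the names `entropy` and `best_cut` are assumptions.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total)
                for c in Counter(labels).values())

def best_cut(values, labels):
    """Return (cut_point, gain) for the binary split with the highest
    information gain, or None when all values are identical."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if best is None or gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best

cut, gain = best_cut([1, 2, 3, 10, 11, 12],
                     ["a", "a", "a", "b", "b", "b"])
print(cut, gain)  # 6.5 1.0
```

A recursive discretizer would apply `best_cut` to each resulting interval and stop according to the criteria listed above, for example when too few examples remain or the interval limit is reached.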

Comparison (1)

Table 1: Number of rules and conditions using the Monk 1, 2 and 3 datasets as training sets 1, 2 and 3 respectively

Algorithm                 Training Set 1      Training Set 2      Training Set 3
                          rules  conditions   rules  conditions   rules  conditions
ID3                       53     216          105    498          30     98
C4.5                      60     262          113    566          27     89
C4.5 with grouping        9      31           55     353          20     102
C4.5rules                 31     101          97     374          23     65
C4.5rules with grouping   8      19           46     188          11     35
NewID                     21     143          59     401          18     101
HCV                       7      16           39     168          18     62

Comparison (2)

Table 2: Accuracy

Algorithm                 Test Set 1   Test Set 2   Test Set 3
ID3                       83.3%        68.3%        94.4%
C4.5                      82.4%        69.7%        90.3%
C4.5 with grouping        100%         82.4%        93.1%
C4.5rules                 92.4%        75.7%        85.4%
C4.5rules with grouping   100%         81.0%        91.4%
NewID                     93%          78%          89%
HCV                       100%         81.7%        90.3%

Comparison (3)

Conclusions

Rules generated in HCV take the form of variable-valued logic rules, rather than decision trees

HCV generates very compact rules in low-order polynomial time

Noise handling and discretization

Predictive accuracy comparable to the ID3 family of algorithms viz., C4.5, C4.5rules

Extension Matrix Terminology

a                        Number of attributes
$X_a$                    a-th attribute
$e^+$                    Vector of positive examples
$e^-$                    Vector of negative examples
$v_{ak}^+$               Value of a-th attribute in the k-th positive example
n                        Number of negative examples
p                        Number of positive examples
$(r_{ij})_{a \times b}$  ij-th element of an a×b matrix
A(i,j)                   ij-th element of matrix A