Soft Computing For Controle


Evolving Fuzzy Rules with Genetic Programming and Clustering

The transformation of a highly accurate opaque model into a comprehensible model.

Genetic programming
Black box
Arbitrary representation and fitness function

Balances Accuracy and Comprehensibility

G-REX (Previous work)

[Figure: example rule tree with the conditions IF Salary > 5000 and IF Age > 25 and the leaves Accept and Reject]

X1  X2  Y
1   4   1
4   3   1
2   1   1
4   5   0
5   2   0

X1  X2  Z
1   4   1
4   3   1
2   1   0
4   5   0
5   2   0

GP - Crossover

Background

Evolving Fuzzy Decision Trees With Genetic Programming and Clustering

J. Eggermont (2001)
Automatic fuzzification using K-means
Genetic Programming
Fuzzy representation

Membership functions

Three types of membership function
Distances do not need to be equal
Based on medoids/centroids


K-means

Most frequently used clustering method
Fast and easy to implement; deterministic once the initial centroids are fixed
J. B. MacQueen (1967)
K stands for the number of clusters

Each cluster is represented by one membership function
A cluster is represented by a centroid: the mean value of its members
An instance belongs to the closest centroid

1. Assign each instance to the closest centroid (Euclidean distance)
2. The new centroid is the mean of its members
3. Recalculate members; repeat until no change
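The three steps above can be sketched for a single continuous variable as follows. This is a minimal illustration, not the implementation used in the experiments; the function name `kmeans_1d`, the `max_iter` cap and the choice to leave an empty cluster's centroid unchanged are assumptions made for the example.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster 1-D values; `centroids` holds the initial cluster centres."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign every instance to its closest centroid.
        # In one dimension the Euclidean distance is the absolute difference.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Step 3: stop as soon as cluster membership no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = sum(members) / len(members)
    return centroids, assignment


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.5]
    print(kmeans_1d(data, [0.0, 5.0, 10.0]))
```

Because the result depends on the starting centroids, the same data can produce different clusters under a different initialization.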

Kaufman's Initialization

Step 1. Choose the instance closest to the mean value as the first medoid.
Steps 2-3. Choose an instance that is far away from the other medoids and has many instances close by.

K-means is sensitive to the initialization method
Peña, J. M., Lozano, J. A. and Larrañaga, P. (1999). An Empirical Comparison of Four Initialization Methods for the K-Means Algorithm.
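A rough sketch of the seeding procedure for one continuous variable is given below. It follows the usual textbook description of Kaufman's method: each candidate is scored by how much it would reduce the distance from the remaining instances to their nearest seed. The function name `kaufman_init` and the tie-breaking rule (first candidate wins) are assumptions for illustration, not details taken from the slides.

```python
def kaufman_init(values, k):
    """Pick k initial seeds from 1-D `values` (requires k <= len(values))."""
    mean = sum(values) / len(values)
    # Step 1: the first seed is the instance closest to the mean value.
    seeds = [min(range(len(values)), key=lambda i: abs(values[i] - mean))]
    while len(seeds) < k:
        # Distance from every instance to its nearest already chosen seed.
        nearest = [min(abs(v - values[s]) for s in seeds) for v in values]
        best_i, best_gain = None, -1.0
        for i in range(len(values)):
            if i in seeds:
                continue
            # Gain: total reduction in nearest-seed distance if i became a seed.
            # It is large when i is far from the current seeds and has many
            # instances close by.
            gain = sum(
                max(nearest[j] - abs(values[i] - values[j]), 0.0)
                for j in range(len(values))
                if j not in seeds
            )
            if gain > best_gain:
                best_i, best_gain = i, gain
        # Steps 2-3: add the candidate with the largest gain.
        seeds.append(best_i)
    return [values[s] for s in seeds]


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 5.0, 5.1, 4.9, 9.0, 9.1, 8.9]
    print(sorted(kaufman_init(data, 3)))
```

Since no random numbers are involved, the seeds are the same on every run, which keeps the subsequent K-means clustering reproducible.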

Three types of membership function
Distances do not need to be equal
Based on medoids
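One way to read "three types of membership function" is a left shoulder for the lowest cluster centre, triangles for the centres in between and a right shoulder for the highest centre. The slides do not confirm these shapes, so the sketch below is an assumption; the function name `membership_values` is likewise hypothetical. The centres may be unevenly spaced, which matches the remark that distances do not need to be equal.

```python
def membership_values(x, centers):
    """Return one membership degree per centre.

    `centers` must be sorted, distinct and contain at least two values.
    """
    degrees = []
    last = len(centers) - 1
    for i, c in enumerate(centers):
        if x == c:
            degree = 1.0
        elif x < c:
            if i == 0:
                degree = 1.0  # left shoulder: full membership below the lowest centre
            else:
                left = centers[i - 1]
                # Rising edge between the previous centre and this one.
                degree = (x - left) / (c - left) if x > left else 0.0
        else:
            if i == last:
                degree = 1.0  # right shoulder: full membership above the highest centre
            else:
                right = centers[i + 1]
                # Falling edge between this centre and the next one.
                degree = (right - x) / (right - c) if x < right else 0.0
        degrees.append(degree)
    return degrees


if __name__ == "__main__":
    print(membership_values(3.0, [1.0, 5.0, 9.0]))
    print(membership_values(5.5, [1.0, 2.0, 9.0]))
```

With this shape, the degrees of a value lying between two neighbouring centres sum to one.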


GP Representation

All variables with fewer than k unique values are treated as crisp sets.

Representation

Calculating membership values

Fitness function

$$\mathit{fitness}_{Brier} = 1 - \frac{1}{n} \sum_{j=1}^{r} \sum_{i=1}^{n} \left( f_{ij} - E_{ij} \right)^2$$

Not precise enough

Reward is equal to the membership value for the correctly predicted instance

1 - the MSE of each membership function
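The fitness can be sketched as one minus the Brier score, assuming the standard definition in which f_ij is the predicted membership (probability) of instance i for class j, E_ij is 1 when j is the true class of instance i and 0 otherwise, r is the number of classes and n the number of instances. The function name `brier_fitness` and the input layout are assumptions for this example.

```python
def brier_fitness(predicted, actual):
    """Return 1 minus the Brier score.

    predicted[i][j] is the membership value of instance i for class j;
    actual[i] is the index of the true class of instance i.
    """
    n = len(predicted)
    total = 0.0
    for memberships, true_class in zip(predicted, actual):
        for j, f in enumerate(memberships):
            e = 1.0 if j == true_class else 0.0  # E_ij: 1 for the true class
            total += (f - e) ** 2
    return 1.0 - total / n


if __name__ == "__main__":
    print(brier_fitness([[1.0, 0.0], [0.0, 1.0]], [0, 1]))
    print(brier_fitness([[0.5, 0.5], [0.5, 0.5]], [0, 1]))
```

Unlike accuracy, this measure distinguishes a confident correct prediction from a barely correct one, since the reward grows with the membership value assigned to the true class.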

Experiments

5 classification datasets
  Only continuous variables: IRIS, WINE
  Categorical and continuous: COLIC, CLEVELAND, PIMA
10-fold cross validation
  Stratification
Fuzzy GP vs standard GP (IF rules)
Evaluated against
  Accuracy (ACC)
  Area under ROC curve (AUC)
  Brier score (BRI)

Results

Fuzzy

Dataset     ACC Train  ACC Test  AUC Train  AUC Test  BRI Train  BRI Test  Size
IRIS        96.8       96.0      99.5       99.1      7.8        8.7       7.4
CLEVELAND   76.2       75.8      81.3       82.6      37.7       36.7      8.6
WINE        89.4       90.4      97.6       98.1      15.3       13.9      9.0
COLIC       67.2       66.3      66.1       64.6      48.4       50.5      9.0
PIMA        89.4       90.4      97.6       98.1      15.3       13.9      9.0

IF

Dataset     ACC Train  ACC Test  AUC Train  AUC Test  BRI Train  BRI Test  Size
IRIS        96.2       93.3      98.1       96.5      7.0        12.2      11
CLEVELAND   76.8       72.5      76.6       72.4      35.6       40.3      6
WINE        90.8       87.5      95.6       91.2      16.3       22.6      11
COLIC       81.5       81.5      81.9       77.0      28.9       29.2      6
PIMA        90.8       87.5      95.6       91.2      16.3       22.6      11

Iris

Wine

Horse Colic

PIMA Diabetes

Cleveland (Heart disease)

Discussion

Current membership function removes information from the variable
A way to handle outliers

Some extremely simple IF rules are better for some datasets
Categorical variables
Should not be used as the only method

Easy-to-remember rules, but how accurate will they be as decision support?

Gives a comprehensible explanation that could add trust and thereby improve predictions.

Future work

Alternative membership function

Fuzzy regression

?

Date post:	23-Feb-2016