Soft Computing for Control
Evolving Fuzzy Rules with Genetic Programming and Clustering
The transformation of a highly accurate opaque model into a comprehensible model.
Genetic programming
Black box
Arbitrary representation and fitness function
Balances Accuracy and Comprehensibility
G-REX (Previous work)
[Figure: example rule tree with tests IF Salary > 5000 and IF Age > 25 and leaves Accept / Reject]
X1  X2  Y
1   4   1
4   3   1
2   1   1
4   5   0
5   2   0

X1  X2  Z
1   4   1
4   3   1
2   1   0
4   5   0
5   2   0
GP - Crossover
Background
Evolving Fuzzy Decision Trees With Genetic Programming and Clustering
J. Eggermont (2001)
Automatic fuzzification using K-Means
Genetic Programming
Fuzzy Representation
Membership functions
Three types of membership function
Distances do not need to be equal
Based on medoids/centroids
Membership functions
K-means
Most frequently used clustering method
Fast, deterministic (for a fixed initialization) and easy to implement.
J. B. MacQueen (1967)
K stands for the number of clusters
Each cluster is represented by one membership function
A cluster is represented by a centroid: the mean value of its members
An instance belongs to the closest centroid
1. Assign each instance to the closest centroid (Euclidean distance).
2. The new centroid is the mean of its members.
3. Recalculate members. Repeat until no change.
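The loop described on this slide can be sketched in a few lines. This is a minimal illustration for a single continuous variable, not the implementation used in the experiments; the function name `kmeans_1d`, the `max_iter` cap, and the choice to leave an empty cluster's centroid where it is are assumptions made for the example.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Cluster 1-D values; returns (centroids, assignment)."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign each instance to the closest centroid
        # (in one dimension the Euclidean distance is |v - c|).
        new_assignment = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Stop when the memberships no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[c] = sum(members) / len(members)
    return centroids, assignment
```

For example, `kmeans_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], [0.0, 6.0])` separates the three low values from the three high ones. The starting centroids are passed in explicitly because the result depends on them.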
Kaufman's Initialization
Step 1. Choose the instance closest to the mean value.
Step 2-3. Choose an instance far away from the other medoids with many instances close by.
K-Means is sensitive to the initialization method
Peña, J. M., Lozano, J. A. and Larrañaga, P. (1999)
An Empirical Investigation of Four Initialization Methods for the K-Means Algorithm
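A rough sketch of this seeding idea for one variable follows. The first seed is the instance closest to the mean, as on the slide. The "far away with many instances close by" criterion is modelled here with the gain sum usually attributed to Kaufman and Rousseeuw: for each candidate i, sum over the other instances j of max(D_j - d_ji, 0), where D_j is the distance from j to its closest seed so far. That formula, the function name `kaufman_init` and the tie-breaking are assumptions, not details taken from the slides.

```python
def kaufman_init(values, k):
    """Pick k seed instances from 1-D values; returns the seed values, sorted."""
    mean = sum(values) / len(values)
    # Step 1: the instance closest to the mean value.
    first = min(range(len(values)), key=lambda i: abs(values[i] - mean))
    seeds = [first]
    while len(seeds) < k:
        best, best_gain = None, -1.0
        for i in range(len(values)):
            if i in seeds:
                continue
            gain = 0.0
            for j in range(len(values)):
                if j in seeds or j == i:
                    continue
                # D_j: distance from j to its closest seed so far.
                d_j = min(abs(values[j] - values[s]) for s in seeds)
                # j only contributes if candidate i is closer to it than any seed.
                gain += max(d_j - abs(values[j] - values[i]), 0.0)
            if gain > best_gain:  # assumption: the first best candidate wins ties
                best, best_gain = i, gain
        seeds.append(best)
    return sorted(values[s] for s in seeds)
```

With three well separated groups of values and `k=3`, the sketch picks one seed from each group. Because no random numbers are involved, the same data always gives the same seeds.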
Three types of membership function
Distances do not need to be equal
Based on medoids
Membership functions
GP Representation
All variables with less than k unique values are treated as crisp sets.
Representation
Calculating membership values
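One plausible way to calculate membership values from the cluster centers is sketched below. It assumes the three function types are a left shoulder for the lowest center, triangles for the centers in between, and a right shoulder for the highest center, each peaking at its own center and reaching zero at the neighbouring one. The slides do not spell out the shapes, so both this reading and the name `membership_values` are assumptions; the centers must be distinct.

```python
def membership_values(x, centroids):
    """Return one membership value per centroid for the crisp value x."""
    cs = sorted(centroids)
    k = len(cs)
    values = []
    for i, c in enumerate(cs):
        if x == c:
            m = 1.0
        elif x < c:
            # Left of the peak: flat shoulder for the first set, ramp otherwise.
            m = 1.0 if i == 0 else max(0.0, (x - cs[i - 1]) / (c - cs[i - 1]))
        else:
            # Right of the peak: flat shoulder for the last set, ramp otherwise.
            m = 1.0 if i == k - 1 else max(0.0, (cs[i + 1] - x) / (cs[i + 1] - c))
        values.append(m)
    return values
```

With centers at 1, 5 and 7, the value 3 gets memberships `[0.5, 0.5, 0.0]` and the value 6 gets `[0.0, 0.5, 0.5]`. The ramps have different widths because the centers are not equally spaced.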
Fitness function
\[
\text{fitness}_{\text{Brier}} = 1 - \frac{1}{n} \sum_{j=1}^{r} \sum_{i=1}^{n} \left( f_{ij} - E_{ij} \right)^{2}
\]
Not precise enough
Reward is equal to the membership value for the correctly predicted instance
1 - the MSE of each membership function
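A Brier-based fitness of this kind can be computed as sketched below. The sketch assumes the fitness is one minus the squared error between the predicted class memberships and the 0/1 class indicators, summed over classes and averaged over the n instances. The function name `brier_fitness` and the data layout (one list of memberships per instance, plus the index of the true class) are choices made for the example.

```python
def brier_fitness(predicted, actual):
    """predicted[i][j]: membership of instance i in class j.
    actual[i]: index of the true class of instance i."""
    n = len(predicted)
    total = 0.0
    for f_i, true_class in zip(predicted, actual):
        for j, f_ij in enumerate(f_i):
            # E_ij is 1 for the true class and 0 for every other class.
            e_ij = 1.0 if j == true_class else 0.0
            total += (f_ij - e_ij) ** 2
    return 1.0 - total / n
```

A perfect prediction gives a fitness of 1.0, and an undecided two-class prediction such as `[[0.5, 0.5]]` gives 0.5. Unlike plain accuracy, this fitness also rewards a rule for being more confident about a correct prediction.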
Experiments
5 classification datasets
Only continuous variables: IRIS, WINE
Categorical and continuous: COLIC, CLEVELAND, PIMA
10-fold cross validation
Stratification
Fuzzy GP vs standard GP (if rules)
Evaluated against
Accuracy (ACC)
Area under ROC-curve (AUC)
Brier Score (BRI)
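Stratification keeps the class proportions roughly equal in every fold. One way to build such folds is sketched below; the function name `stratified_folds`, the fixed `seed` and the round-robin dealing are assumptions for illustration, not the procedure used in the experiments.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of instance indices with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin, continuing where the previous class stopped.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds
```

Each fold is used once as the test set while the other nine form the training set. With 150 instances split evenly over three classes, every fold holds 15 instances, 5 from each class.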
Results
Fuzzy
             ACC            AUC            BRI
Dataset      Train  Test    Train  Test    Train  Test    Size
IRIS         96.8   96.0    99.5   99.1     7.8    8.7     7.4
CLEVELAND    76.2   75.8    81.3   82.6    37.7   36.7     8.6
WINE         89.4   90.4    97.6   98.1    15.3   13.9     9.0
COLIC        67.2   66.3    66.1   64.6    48.4   50.5     9.0
PIMA         89.4   90.4    97.6   98.1    15.3   13.9     9.0
IF
             ACC            AUC            BRI
Dataset      Train  Test    Train  Test    Train  Test    Size
IRIS         96.2   93.3    98.1   96.5     7.0   12.2    11
CLEVELAND    76.8   72.5    76.6   72.4    35.6   40.3     6
WINE         90.8   87.5    95.6   91.2    16.3   22.6    11
COLIC        81.5   81.5    81.9   77.0    28.9   29.2     6
PIMA         90.8   87.5    95.6   91.2    16.3   22.6    11
Iris
Wine
Horse Colic
PIMA Diabetes
Cleveland (Heart disease)
Discussion
Current membership function removes information from the variable
A way to handle outliers
Some extremely simple if rules are better for some datasets.
Categorical variables
Should not be used as the only method
Easy to remember rules but how accurate will they be as a decision support?
Gives a comprehensible explanation that could add trust and thereby improve predictions.
Future work
Alternative membership function
Fuzzy regression
?