Binary Fish School Search applied to Feature Selection
João André Gonçalves Sargo
Thesis to obtain the Master of Science Degree in
Mechanical Engineering
Examination Committee
Chairperson: Prof. João Rogério Caldas Pinto
Supervisor: Prof. João Miguel da Costa Sousa
Members of the Committee: Prof. Jorge dos Santos Salvador Marques
Prof. Susana Margarida da Silva Vieira
November 2013
To my mother
Abstract
The aim of the present work is to develop efficient feature selection approaches. The
problem of the increasingly large accumulation of data is presented, for which feature
selection emerges as a promising solution. Despite the variety of feature selection methods, few
of them are able to guarantee good performance, especially in high-dimensional databases.
A novel wrapper methodology for feature selection is formulated based on the
Fish School Search (FSS) optimization algorithm, intended to cope with premature convergence.
The FSS was originally designed with a real-valued encoding scheme for searching high-dimensional
spaces, based on the behaviour of fish schools. In order to use this population-based optimization
algorithm in feature selection problems, a binary encoding of the internal mechanisms
of the fish school search is proposed, giving rise to the binary fish school search (BFSS).
The proposed algorithm, as well as other state-of-the-art feature selection methods such
as Sequential Forward Selection (SFS) and Binary Particle Swarm Optimization (BPSO), was
combined with fuzzy modelling in a wrapper approach and tested on two databases: a
benchmark database and an ICU (intensive care unit) database. The purpose of using the latter
was to predict the readmission of ICU patients 24 to 72 hours after discharge. Several
statistical measures were considered to characterise the patient stay, including the Shannon
entropy and the weighted mean.
The comparison of the performance measures and of the number of features selected by the
algorithms used shows promising results for the novel BFSS algorithm.
Keywords: Feature Selection, Binary Encoding, Fish School Search, ICU, Readmissions
Resumo
The present work aims at developing efficient approaches to the feature selection
problem. The issue of the growing amount of accumulated information is discussed, for which
feature selection stands out as a promising solution. Despite the wide variety of available
methods, few are able to guarantee high accuracy, especially in high-dimensional databases.
To this end, a new methodology is formulated here based on the Fish School Search
optimization algorithm, intended to cope with the premature convergence of solutions. This
method, originally developed with a real-valued encoding scheme, searches high-dimensional
spaces based on the behaviour of fish schools. In order to use this optimization algorithm in
feature selection problems, a binary encoding scheme for its internal mechanisms is proposed,
giving rise to the Binary Fish School Search (BFSS).
The algorithm proposed here, as well as other methods, the Sequential Forward Selection
and the Binary Particle Swarm Optimization, were combined with fuzzy modelling in a wrapper
approach and tested on two databases: a benchmark database and an intensive care unit (ICU)
database. The latter was used to predict the readmission of patients after discharge. Several
statistical measures were considered to characterise their stay, including the Shannon entropy
and the weighted mean.
The results obtained, through the comparison of the accuracy and of the number of
features selected by the various algorithms used, show promise for the novel BFSS algorithm.
Keywords: Feature Selection, Binary Encoding, ICU, Readmissions
Acknowledgements
My first words of appreciation go to my supervisors, Professor João Sousa and Dr. André
Fialho, for all the help and support, and for the opportunity to work in this field of research. A
special thanks to Professor Susana Vieira, for all the support and guidance when I needed it
most; her expert knowledge in the field of this work was a great contribution. I could not fail
to thank Professor Carmelo J. A. Bastos Filho and Débora N. de Oliveira Nascimento for their help
and for their readiness in clarifying doubts.
I would like to thank my colleagues at the Health Care IST workgroup for their help, their
support and the inspirational breaks.
To my parents, for all the love, hard work and dedication, and for always being with me. A
special thanks to my mother; her sacrifice made all this possible.
I would also like to thank all my friends, especially António Ramos, Magno Mendes
and Renato Ribeiro, for all the willingness, happiness and moments of true friendship. A
special thanks to Marta Ferreira, for her encouragement and inestimable support throughout this
work and until the very last moment.
Contents
Abstract
Resumo
Acknowledgements
List of Figures
List of Tables
Notation
1 Introduction
  1.1 Knowledge Discovery
    1.1.1 Principles of Feature Selection
    1.1.2 Modeling
  1.2 Prediction of readmissions
  1.3 Contributions
  1.4 Outline
2 Knowledge Data Discovery
  2.1 Data
    2.1.1 Benchmark database – Sonar
    2.1.2 MIMIC II database
  2.2 Modeling
    2.2.1 Fuzzy modeling
    2.2.2 Clustering
    2.2.3 Performance Measures
  2.3 Feature Selection
    2.3.1 Sequential Forward Selection
    2.3.2 Binary Particle Swarm Optimization
3 Fish School Search
  3.1 Original Fish School Search
    3.1.1 Search Problems and Algorithms
    3.1.2 FSS Computational Principles
    3.1.3 Overview of the algorithm
    3.1.4 The Feeding Operator
    3.1.5 The Swimming Operators
    3.1.6 Individual Movement
    3.1.7 Collective-Instinctive Movement
    3.1.8 Collective-Volitive Movement
    3.1.9 FSS Cycle and Stop Conditions
    3.1.10 Illustrative Example
  3.2 Decimal to Binary Fish School Search
    3.2.1 Objective function
4 Binary Fish School Search
  4.1 Encoding
  4.2 Initialization
  4.3 Individual Movement
  4.4 Collective-Instinctive Movement
  4.5 Collective-Volitive Movement
  4.6 Objective function
  4.7 Parameters
  4.8 BFSS Cycle and Stop Condition
5 Results
  5.1 Description of the approach
  5.2 Optimization Parameters
  5.3 Sonar database
    5.3.1 Sequential Forward Selection
    5.3.2 Decimal to Binary Fish School Search
    5.3.3 Binary Fish School Search
    5.3.4 Binary Particle Swarm Optimization
    5.3.5 Comparison of Feature Selection Methods
  5.4 Readmission database
    5.4.1 Sequential Forward Selection
    5.4.2 Binary Fish School Search
    5.4.3 Binary Particle Swarm Optimization
    5.4.4 No feature selection
    5.4.5 Discussion
6 Conclusion
  6.1 Binary Fish School Search
  6.2 Prediction of readmissions
  6.3 Future work
Bibliography
Appendix
  A Extended Results – sonar database
  B Study of parameters – MIMIC II database
List of Figures
1.1 Knowledge discovery process
2.1 Patient selection flowchart
2.2 Number of patients vs. number of samples
2.3 Representation of the different gradients for the weighted mean
2.4 Illustrative diagram of the inputs for the classification models
2.5 Example of ROC curve
2.6 Decoding process for D2BFSS
3.1 FSS – Individual movement
3.2 FSS – Collective-instinctive movement
3.3 FSS – Collective-volitive movement
3.4 FSS – Example
3.5 FSS – Fish school evolution
4.1 Parameters of FSS, D2BFSS and BFSS
5.1 Diagram of the feature selection process
5.2 Evolution of SFS – sonar database
5.3 Evolution of D2BFSS – sonar database
5.4 Representation of the study of parameters – BFSS – sonar database
5.5 Representation of the study of parameters – BFSS – sonar database (continuation)
5.6 Representation of the study of parameters – BFSS – sonar database (continuation)
5.7 Representation of the study of parameters – BFSS – sonar database (continuation)
5.8 Representation of the study of parameters – BFSS – sonar database (continuation)
5.9 Representation of the study of parameters – BFSS – sonar database (continuation)
5.10 Evolution of BFSS – sonar database
5.11 Evolution of BPSO – sonar database
5.12 Representation of the study of parameters – BFSS – readmission database
5.13 Evolution of BFSS – readmission database
5.14 Evolution of BPSO – readmission database
List of Tables
2.1: List of physiological variables considered
2.2: Physiological limits
2.3: Summary of the number of samples and patients
2.4: Number of patients readmitted and not readmitted
3.1: FSS – example
5.1: Model assessment results – SFS – sonar database
5.2: Results of the study of the parameters step_ind and step_vol – D2BFSS
5.3: Results of the study of the parameters w_scale and α – D2BFSS
5.4: Model assessment results – D2BFSS – sonar database
5.5: Configuration of the parameters thres_c and thres_v tested – BFSS
5.6: Results of the study of the parameter step_ind – BFSS
5.7: Model assessment results – BFSS – sonar database
5.8: Model assessment results – BPSO – sonar database
5.9: Comparison between FS algorithms – sonar database
5.10: Model assessment results – SFS – readmission datasets
5.11: Configuration of the parameters thres_c and thres_v – BFSS
5.12: Results of the study of the parameter step_ind – BFSS
5.13: Model assessment results – BFSS – readmission datasets
5.14: Results of the study of the parameter α – BPSO
5.15: Model assessment results – BPSO – readmission datasets
5.16: Model assessment results – no FS – readmission datasets
5.17: Comparison between FS algorithms – readmission dataset
A.1: Results of the study of the parameters thres_c and thres_v – Tests 1-3
A.2: Results of the study of the parameters thres_c and thres_v – Tests 4-7
A.3: Results of the study of the parameters thres_c and thres_v – Tests 8-11
A.4: Results of the study of the parameters thres_c and thres_v – Tests 12-14
A.5: Results of the study of the parameters thres_c and thres_v – Tests 15-17
A.6: Results of the study of the parameters thres_c and thres_v – Test 18
B.1: Results of the study of the parameters thres_c and thres_v – Tests 1-2
B.2: Results of the study of the parameters thres_c and thres_v – Tests 3-5
Notation
Symbols
N_f – Number of features selected
N_t – Total number of features
N_s – Number of data samples
y – System output
x – Data sample
X – Database
ȳ – Model output
Υ – Set of possible labels or classes
υ – Class
l – Number of existing labels
D – Decision region
τ – Threshold
w_ij – Weight factor
σ_j – Activation function
A_i – Fuzzy set of the antecedent
B_i – Fuzzy set of the consequent
R_i – Fuzzy rule
R – Number of rules
μ_A(x), μ_B(y) – Membership functions
μ_R(x, y) – Fuzzy relation
f_i – Mapping function of the i-th rule
β_i – Degree of fulfilment of rule i
v_i – Cluster centre
J – Objective function
U – Fuzzy partition matrix
V – Matrix of cluster prototypes
ρ_i – Constant that controls the volume of cluster i
π_i – Parameter vector of the i-th rule
θ – Matrix of model parameters
ψ – Weighted vector of inputs
Ψ – Matrix of vector inputs
P(x, y) – Probability distribution function
R[f] – Risk or expected loss of f
F – Feature space
φ – Non-linear mapping function
x_0 – Subset of features
(x_0 | x) – Relevance index
p – Parameter value
g_i – Value of bit i
u – Number of bits used
Φ – Disturbance associated with real data
O – Objective
C_i(v) – Constraint function
N_e – Number of misclassifications
N_s – Total number of tested samples
$(x_i) – Cumulative relative fitness
p_tour – Tournament selection probability
p_c – Crossover probability
p_u – Uniform crossover probability
p_mut, r_mut – Mutation probability
S – Particle swarm
V_i – Visibility sphere of particle i
v_i – Particle velocity
x_i – Particle position
a_i – Particle acceleration
Δt – Time step
o_i – Particle cohesion
l_i – Particle alignment
s_i – Particle separation
C_o, C_l, C_s – Acceleration weighting factors
S(v_ij) – Logistic function of the velocity
v_max – Particle velocity threshold
w_up – Upper bound weight
w_low – Lower bound weight
step_ind – Individual step parameter
step_vol – Volitive step parameter
t – Iteration
g_actual – Current iteration
g_total – Total number of iterations
thres_c – Collective threshold parameter
thres_v – Volitive threshold parameter
Acronyms
ACC – Accuracy
AUC – Area Under the Curve
BFSS – Binary Fish School Search
BPSO – Binary Particle Swarm Optimization
D2BFSS – Decimal to Binary Fish School Search
FCM – Fuzzy C-Means
FM – Fuzzy Modelling
FN – False Negatives
FP – False Positives
FS – Feature Selection
FSS – Fish School Search
GA – Genetic Algorithm
ICU – Intensive Care Unit
IOM – Institute of Medicine
IQR – Interquartile Range
KDD – Knowledge Data Discovery
MA – Model Assessment
NAE – National Academy of Engineering
NN – Neural Networks
NP-hard – Non-deterministic Polynomial-time hard
PSO – Particle Swarm Optimization
ROC – Receiver Operating Characteristic
SA – Simulated Annealing
SFS – Sequential Forward Selection
TN – True Negatives
TS – Takagi-Sugeno
TP – True Positives
Chapter 1
Introduction
Over the past 30 years, the continuing development and application of systems engineering
methods has enabled unprecedented growth in the manufacturing, logistics, distribution, and
transportation sectors of our economy [35]. Vast business organizations (e.g. airline companies, chains
of department stores, large manufacturing companies) could not properly operate in the current
business environment without the extensive use of various engineering tools for the design, analysis and
control of complex production and distribution systems [36]. However, even with the emergence and
development of new engineering techniques over the past years, some industries have barely begun
to take advantage of systems engineering tools, which means that a great number of their potential
applications remains unexplored.
A good example is the health care delivery system, one of the most technologically intensive
and data-rich industries [32]. According to a report from the National Academy of Engineering (NAE)
and the Institute of Medicine (IOM), the application of systems engineering tools could play a crucial
role in solving the current crisis in the very complex health care system [47].
With the computerization of many sectors and with the advances in data collection tools, our
capabilities of both generating and collecting data have been increasing rapidly in the last several
decades. This explosive growth in stored data has generated an urgent need for new techniques and
automated tools that can intelligently assist us in transforming the vast amounts of data into useful
information and knowledge [26].
This chapter begins with a brief overview of methods currently used in knowledge discovery in
databases. Then, the case study is introduced: the prediction of readmissions in an intensive
care unit (ICU). At the end of the chapter, the contributions and the outline of this work are presented.
1.1 Knowledge Discovery
The scale of the information age is hard to grasp. An average person nowadays receives more
information in a day than someone who lived 100 years ago would receive in a lifetime. With
information growing at this speed, finding accurate data is becoming more important than the
data itself [42].
The traditional method of turning data into knowledge relies on manual analysis and
interpretation. In the health care industry, this form of manual probing of a data set is slow, expensive
and highly subjective. Out of the urgent need for a new generation of computational techniques and
tools to assist humans in extracting useful information (knowledge) from a fast-growing volume of
data, a methodology was created: Knowledge Data Discovery (KDD), first introduced by Fayyad in 1996
[14].
The KDD process can be formally defined as a non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in large amounts of data.
The KDD process can be decomposed into five main steps, as illustrated in Fig. 1.1:
1. Data acquisition – The process of acquiring and storing data.
2. Data Preprocessing – Consists of applying proper techniques that allow the improvement of the
overall quality of the data. Includes processing of noise/outliers, correction of missing values,
and/or alignment of data sampled at different frequencies.
3. Feature Selection – Consists of finding useful features (variables) to represent the data and
discarding the non-relevant ones and those containing redundant information.
4. Modeling – Refers to the process of combining methods from computational intelligence and/or
statistics to extract patterns from data sets. In this work, classification models were used, which
identify to which of a set of categories (classes) a new observation belongs, on the basis
of a training set of observations whose category membership is known.
5. Interpretation - The process of evaluating the discovered knowledge with respect to its validity,
usefulness, novelty, and simplicity. External expertise may be required in this step.
All of the five steps described are equally crucial, and the process is iterative, i.e. multiple loops
can occur between any steps of the KDD method.
Figure 1.1: Knowledge discovery process.

In real-world systems, the selection of a low number of features that consistently describe the
problem is usually time consuming and, in many cases, impossible to achieve with a greedy approach.
In this work, the focus is turned to the feature selection stage of the KDD method. A novel approach is
proposed for the optimization algorithm Fish School Search (FSS), in order to use this population-based
algorithm in feature selection problems.
1.1.1 Principles of Feature Selection
The addition of more features (variables) is expected to increase the accuracy of the model
(classifier). However, for some classifiers an increase in input dimensionality decreases the reliability of
statistical parameter estimations and may, consequently, result in a decrease in the classification
accuracy [43]. This is known as the Hughes effect [29], the so-called curse of dimensionality, which
postulates that the classification accuracy will decrease after a certain feature-set size is reached unless
the number of training samples is proportionally increased [43]. The Hughes effect is therefore more
likely to be encountered when small training sets are used and the input dimensionality is increased.
The field of feature selection has been the object of extensive research in recent years [41]. This is
explained by the potential benefits of reducing data dimensionality. It can greatly
improve data visualization and understanding, facilitating knowledge discovery. Furthermore, less
information needs to be measured and stored, leading to a reduction in equipment and consequently
cutting unnecessary costs. From the clinical point of view, this process may bring to light new variables
that had not previously been considered relevant for a given medical problem.
Feature selection algorithms can be grouped into four categories: filters, wrappers, hybrids and
embedded [52, 23]. Filter methods rely on general characteristics of the data to evaluate and select
feature subsets without involving any mining algorithm. Some examples include using measurements
of entropy, variance, correlation or mutual information of single and multiple variables [53]. Wrappers
require one predetermined mining algorithm and use its performance as the evaluation criterion. They
search for the features best suited to improving the performance of the mining algorithm, but they also
tend to be more computationally expensive than filters [56]. Some of the most commonly used
wrapper methods include best-first, branch-and-bound, simulated annealing, genetic algorithms,
and forward selection or backward elimination, but the list is considerably longer and continuously
growing. The present thesis introduces a new wrapper method, the Binary Fish School Search (BFSS)
algorithm.
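To make the wrapper scheme concrete, the sketch below implements a minimal wrapper loop in Python. It is only an illustration under stated assumptions: candidate subsets are generated by plain random search (any of the strategies named above would replace this step), the data are NumPy arrays, and induce_and_score is a hypothetical function standing in for training the predetermined mining algorithm and returning its test performance.

    import random

    def wrapper_search(X_train, y_train, X_test, y_test, induce_and_score,
                       n_iter=100, seed=0):
        # Minimal wrapper FS: evaluate random binary masks with the mining
        # algorithm itself and keep the best-scoring subset.
        rng = random.Random(seed)
        n_features = X_train.shape[1]          # assumes NumPy arrays
        best_mask, best_score = None, float("-inf")
        for _ in range(n_iter):
            mask = [rng.random() < 0.5 for _ in range(n_features)]
            if not any(mask):                  # empty subsets are skipped
                continue
            cols = [j for j, keep in enumerate(mask) if keep]
            # The wrapper's evaluation criterion is the performance of the
            # predetermined mining algorithm on the candidate subset.
            score = induce_and_score(X_train[:, cols], y_train,
                                     X_test[:, cols], y_test)
            if score > best_score:
                best_mask, best_score = mask, score
        return best_mask, best_score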
Hybrid models attempt to take advantage of the two previous types of models by exploiting
their advantages in different stages [22]. First, a filter decreases the dimensionality of data by
eliminating features according to the specified criteria. Then, wrappers select relevant features
according to the mining objective.
Finally, embedded methods differ from the previous feature selection methods in the way
feature selection and learning interact. In contrast to filter and wrapper approaches, the learning and
feature selection parts cannot be separated in embedded methods [23]. Examples of embedded
methods for feature selection include decision trees and random multinomial logit.
Many problems related to Feature Selection (FS) have been shown to be NP-hard, and finding
the optimal set of features is usually intractable [6, 30]. Thus, the search for the most predictive feature
subsets can be seen as an optimization problem. Metaheuristics are general upper-level (meta)
algorithmic techniques that can be used as guiding strategies in the design of heuristics to solve
specific optimization problems. These techniques are capable of finding acceptable solutions, within
reasonable time, by using experience-based techniques or through guided search, but do not
guarantee that the optimum will be found. Popular metaheuristics for combinatorial problems
include simulated annealing (SA) by Kirkpatrick [34], genetic algorithms (GA) [27], Scatter Search
[19], Tabu Search [20], and Particle Swarm Optimization (PSO).
The present thesis resorts to: 1) a new FS wrapper method based on the recent Fish School
Search metaheuristic, inspired by fish school behaviour [17]; 2) a PSO algorithm
modified to be used in FS problems [15]; and 3) a wrapper method based on tree-search feature
selection, the Sequential Forward Selection (SFS). The three algorithms were tested on a benchmark
database before being applied to a problem in the health care system.
1.1.2 Modeling
Classification modeling, used in the data mining process, can be defined as the application of
discovery algorithms that produce a particular enumeration of patterns/models over the data.
The purpose of a model is to mimic how a particular object or phenomenon will behave under
particular conditions. It can be used for testing, analysis or training, whenever real-world
systems or concepts can be represented by a model [54].
Machine learning refers to a group of mathematical modeling techniques that are capable of
automatically acquiring and integrating knowledge based on empirical data, such as data from sensors
or databases. This area has been extensively studied with numerous successful applications across a
wide range of fields (a very broad description of application areas and examples can be found in [33]).
The purpose of using machine learning techniques (or learning machines) is to reproduce the
human learning capabilities, namely the ability to recognize complex patterns and make intelligent
decisions based on data.
Learning machines are widely used in classification, regression, recognition and prediction
problems. There are many possible applications for these modeling techniques, ranging from
engineering applications in robotics, fault-tolerant control and pattern recognition (e.g. speech
recognition, handwriting recognition) to medical applications (e.g. diagnosis, prognosis) [33].
Nonetheless, in this work, the main interest is the capability of machine learning to discover and
classify patterns in high-dimensional databases.
Pattern recognition [11, 44, 46] addresses the problem of assigning labels (classes) to objects
(or samples), each sample being composed of a set of features (or attributes). In order to better
understand pattern recognition, this subject has been divided into two major types of problems [37]:
unsupervised and supervised learning.
In the unsupervised category, the problem is to understand whether there are groups in the
data, and what characteristics make the objects similar within a group and different across
groups. In contrast, in supervised learning, each data sample already has a pre-assigned label, and the
task consists of training a classifier to differentiate between labels.
In this work, it was decided to use a non-linear machine learning technique: Fuzzy Modelling
(FM). This method is considered suitable for the demanding problem of pattern recognition, since it
can, theoretically, approximate any multivariate nonlinear function [37]. The main advantages of this
method are the following:

- Efficient tool for embedding human (structured) knowledge into useful algorithms;
- Applicable when a mathematical model is unknown or impossible to obtain;
- Operates successfully under a lack of precise sensor information;
- Useful at the higher levels of hierarchical control systems;
- Appropriate tool for generic decision-making processes;
- Transparent, non-crisp model;
- Interpretation in the form of rules and logical connectives. From the medical point of view,
these rules provide an additional means of validating the fuzzy classifier against clinicians'
knowledge of the system.

The main disadvantages are:

- Experts may have problems structuring their knowledge to fit the structure of the model;
- Experts sway between extreme poles: either overly confident in their field of expertise, or
tending to hide their knowledge;
- Model complexity increases exponentially with the number of features;
- Learning is highly constrained, and typically more complex than in other models, such as
neural networks (NN).
1.2 Prediction of readmissions
Patients readmitted to an intensive care unit during the same hospitalization have an increased
length of stay, higher costs and an increased risk of death. Previous studies have demonstrated overall
readmission rates of 4-14% [3, 48], of which nearly a third can be attributed to premature discharge
from the critical care setting [3, 12]. It is also documented that the length of stay for readmitted
patients is at least twice as long as that of patients discharged from the ICU but not readmitted, and
that hospital death rates are 1.5 to almost 10 times higher among ICU readmissions [49].
Increasing pressure on the management of care and resources in ICUs is one explanation for strategies
seeking to rapidly free ICU beds. Faced with this scenario, a clinician may elect to discharge a patient
currently in the ICU, who has already had the benefits of stabilization and intensive monitoring, to
make room for more acute patients waiting in the emergency department, exposing the
transferred patient to the risk of readmission in the short term. Moreover, given the morbidity and
mortality issues around readmission, the Centers for Medicare & Medicaid Services have
already reduced funding for specified avoidable conditions, and it is quite possible that avoidable
readmissions to an ICU will receive attention in the future as well.
Previous studies [7] have examined different variables assessed at discharge, but
the resulting predictive models performed only slightly better than models based upon the gold-standard
method, APACHE II.
Thus, this work addresses the problem of readmission to an ICU, its goal being to
predict the readmission of ICU patients within 24-72 hours after discharge. A data mining
approach was applied to a real-world database, MIMIC II, combined with fuzzy modelling and three
different feature selection algorithms: the Sequential Forward Selection, the Binary Particle Swarm
Optimization, and the novel Binary Fish School Search formulated here. In this context, 22 physiological
variables acquired during the stay of real patients in an ICU were selected. Statistical measures were
used to describe each patient stay: the mean, the standard deviation, the maximum, the minimum,
the Shannon entropy and the weighted mean, which was tested with different weights.
1.3 Contributions
In this work, the problem of Feature Selection in real-world databases is addressed. The main
contributions of this work are:
Introduction and formulation of the Binary Fish School Search algorithm, a novel algorithm for
feature selection derived from the Fish School Search optimization algorithm. Originally, this
algorithm was presented as a multidimensional, real-encoded algorithm [17]; it is modified here
to solve problems with binary inputs and then applied to feature selection problems;

Use of new types of features (weighted mean and Shannon entropy) to predict the readmission
of ICU patients during the 24 to 72 h period that follows discharge;

Comparison of the FS results on two real databases using the three feature selection
algorithms: sequential forward selection, binary particle swarm optimization and the binary fish
school search.
1.4 Outline
In chapter 2, an overview of the knowledge data discovery stages studied in this work is
presented. It begins with the description of the two databases addressed and the necessary
preprocessing of the data. Then, the fuzzy modelling technique is presented, together with the
performance measures considered in this work. Finally, a broad description of wrapper methods is
given, along with a description of the state-of-the-art feature selection algorithms used in this work.

In chapter 3, the original fish school search algorithm is presented in detail. Its internal
mechanisms are described, and an illustrative example is given to consolidate the description. Finally,
the first approach to transforming the FSS algorithm to solve feature selection problems is presented:
the decimal to binary fish school search.
Chapter 4 introduces the goals and the detailed formulation of the binary fish school search.

Chapter 5 presents the results of the wrapper methods that combine the studied machine
learning techniques with the introduced search algorithms. The chapter begins with the outline
of the approach and the definition of the parameters used to evaluate the behaviour
of the formulated algorithms. Next, the tests to select the parameters of the optimization algorithms
are presented. Finally, the methods are tested and compared over the two databases.

At last, in Chapter 6, the results of this work are summarized and conclusions are drawn.
Furthermore, promising areas for future research are presented.
Chapter 2
Knowledge Data Discovery
2.1 Data
In this work, two databases were used: a benchmark database and a health care database, the
MIMIC II. The benchmark database is employed to ascertain the quality of the developed FS
algorithms, i.e., to verify whether the FS algorithms are capable of selecting a small feature subset
with good informative potential. After validation on the benchmark database, the FS algorithms
were applied to a health care database, the MIMIC II, for a readmission prediction problem.
In this chapter, the selected benchmark database is presented first, followed by the health care
database and the preprocessing it required.
2.1.1 Benchmark database – Sonar
The choice of a proper group of benchmark databases is very important to adequately validate
the implementation of an algorithm. These databases should allow the algorithm designer to test the
algorithms according to the predefined performance measures and to compare the results with those
of state-of-the-art methods.
The sonar database comprises 208 real samples divided into two labels. A
data sample is a set of 60 features with values ranging from 0.0 to 1.0. Each of these features represents
the energy within a particular frequency band, integrated over a certain period of time. The label
associated with each record indicates whether the sonar signal bounced off a metal
cylinder (111 samples) or off a roughly cylindrical rock (97 samples). The task at hand was to
discriminate between these two classes [21].
This database was developed by Gorman and Sejnowski in their study of the classification of
sonar signals using artificial neural networks [21].
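As an aside, a minimal sketch of how this database can be loaded for such experiments follows, assuming a local copy of the UCI file sonar.all-data (60 comma-separated numeric features per row, followed by the label 'M' for metal cylinder or 'R' for rock); the file name and column layout are assumptions about the standard UCI distribution:

    import pandas as pd

    # Assumed local copy of the UCI sonar data: 60 features + 1 label column.
    data = pd.read_csv("sonar.all-data", header=None)
    X = data.iloc[:, :60].to_numpy()          # 208 x 60 feature matrix
    y = (data.iloc[:, 60] == "M").to_numpy()  # True for metal cylinder
    print(X.shape, int(y.sum()))              # expected: (208, 60) and 111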
2.1.2 MIMIC II database
The MIMIC II database [51] is a large database of ICU patients admitted to the Beth Israel
Deaconess Medical Center, collected from 2001 to 2006. The MIMIC II database currently comprises
25,549 patients, of which 19,075 are adults (> 15 years old at the time of admission). For each patient,
several samples of physiological variables were stored throughout their stay.
In this work, a previously developed dataset (first presented in [15]) was used, including only
adult patients (> 15 years) who were ICU inpatients for at least 24 h and were readmitted to any ICU of
the same medical centre between 24 and 72 h after discharge. This interval is often referred to as an
early readmission [43]. The choice of 24 h as the lower bound for the readmission time window is
related to how MIMIC II is structured: patients readmitted to the ICU less than 24 h after their discharge
are considered to belong to the same ICU stay. The choice of 72 h as the upper bound for the
readmission time window was based on previous works [50] and on the suggestions of local clinical
intensivists. All included patients were also required to have at least one measurement of each of the
22 variables shown in Table 2.1.
These variables were selected based on the hypothesis that a good predictive value could be achieved
using a few physiological variables and taking into account the following directives:
i. The variables had to be easily and/or routinely assessed in the 24 h before discharge. A balance
had to exist in the number of selected variables, given that it affects the number of patients that
form the dataset, i.e. the more variables defined, the fewer the patients likely to have all of them
collected at the same time;
ii. Selecting a high number of variables may bias the dataset towards selecting patients having similar
conditions that required their specific measurement/testing;
iii. The variables chosen should be independent, with minimal correlation.
Exclusion criteria included patients who died during the ICU stay.
As with other real-world databases, a few preprocessing steps were necessary to improve the
quality of the raw MIMIC II data. In order to deal with variables collected with different
sampling periods, similarly to [15], a template variable was used. This process aligned all samples to
the same points in time as a designated template variable. Heart rate was chosen as the template
variable since it was one of the most frequently measured variables and thus introduced
fewer artifacts in the data. With regard to missing data, in general, ICU data can be missing either
because they are perceived to be irrelevant for the current clinical problems (and are thus not recorded),
or because exogenous interventions or endogenous activities have rendered the data useless [58].
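As an illustration of this alignment step, the following sketch (an assumption of how it can be done with pandas, not the exact procedure of [15]) aligns a sparsely sampled laboratory variable to the heart-rate timestamps of one patient, carrying the most recent laboratory value forward to each template sample:

    import pandas as pd

    # Hypothetical per-patient series: heart rate is the template variable.
    hr = pd.DataFrame({
        "time": pd.to_datetime(["2005-01-01 00:00", "2005-01-01 01:00",
                                "2005-01-01 02:00"]),
        "heart_rate": [82, 88, 85],
    })
    lab = pd.DataFrame({
        "time": pd.to_datetime(["2005-01-01 00:40"]),
        "sodium": [139.0],
    })

    # For each template timestamp, take the last lab value at or before it.
    aligned = pd.merge_asof(hr, lab, on="time", direction="backward")
    print(aligned)   # sodium is NaN at 00:00, 139.0 at 01:00 and 02:00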
Table 2.1: List of physiological variables considered from MIMIC II (according to [15]).

Type of variables     Variable name (units)
Monitoring signals    Heart rate (beats/min)
                      Respiratory rate (breaths/min)
                      Temperature (ºC)
                      SpO2 (%)
                      Non-invasive arterial blood pressure (systolic) (mmHg)
                      Blood pressure (mean) (mmHg)
Laboratory tests      Red blood cell count (cells x 10^3/µL)
                      White blood cell count (cells x 10^3/µL)
                      Platelets (cells x 10^3/µL)
                      Hematocrit (%)
                      BUN (mg/dL)
                      Sodium (mg/dL)
                      Potassium (mg/dL)
                      Calcium (mg/dL)
                      Chloride (mg/dL)
                      Creatinine (mg/dL)
                      Magnesium (mg/dL)
                      Albumin (g/dL)
                      Arterial pH
                      Arterial base excess (mEq/L)
                      Lactic acid (mg/dL)
Other                 Urine output (mL/h)
Data missing for an intentional reason (e.g. the patient is transported out of the ICU for an
imaging scan) was considered non-recoverable and thus deleted. On the other hand, data missing for
some unintentional reason (e.g. a sensor comes off the patient's chest) was considered recoverable, and
the last available value was used to impute values to these segments.
The Interquartile Range (IQR) method was used to deal with outliers. This method
measures the statistical dispersion of the data by dividing it into quartiles; the IQR is a trimmed
estimator and a robust measure of scale [58]. The patient selection process is summarized in
Fig. 2.1.
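A minimal sketch of IQR-based screening for one variable follows; the 1.5 fence multiplier is the conventional choice and an assumption here, since the exact fences used on the raw MIMIC II data are not stated:

    import numpy as np

    def iqr_outlier_mask(values, k=1.5):
        # Flag values outside [Q1 - k*IQR, Q3 + k*IQR].
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        return (values < q1 - k * iqr) | (values > q3 + k * iqr)

    temps = np.array([36.5, 37.0, 5.0, 36.8, 37.2])   # 5.0 ºC is an artifact
    print(iqr_outlier_mask(temps))   # only the 5.0 ºC value is flagged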
It is important to point out that the number of samples per patient is not constant. A
sample contains measurements of the 22 physiological variables. The number of samples acquired for
each patient during their stay varies between 1 and 26, possibly with different sampling
periods. The total number of samples considered was 13675.
It was detected that some samples of the 1028 selected patients contained outliers, so some
preprocessing was necessary.
In order to obtain a constant dimension for the inputs of the models (a necessary condition),
a transformation of the data was performed: statistical measures were used to capture the
information in the time series of the physiological variables of each patient.
The next section describes the preprocessing applied to this dataset.
Figure 2.1: Patient selection flowchart [15]. From the MIMIC II database (n = 25,549), adult patients
(> 15 yr) with an ICU stay > 24 h were kept (n = 19,075); of these, n = 3,034 had all the variables from
Table 2.1, and n = 1,267 remained after data preprocessing (removal of missing data and outliers). Of
the n = 1,028 patients who survived (n = 239 did not), n = 135 were readmitted and n = 893 were not.
Data preprocessing
Outliers
After analysing all 13675 samples of the 1028 patients, and although the IQR method had been
applied to the raw MIMIC II database, there were still some samples containing values outside the
physiological limits. As an example, Fig. 2.2 shows the plot of the physiological variable temperature
(ºC) for all samples considered. A body temperature of 5 ºC, as seen in Fig. 2.2, is not possible, even
for a patient in a severe condition.

Figure 2.2: Graphical representation of the physiological variable temperature (ºC) for all samples.
Table 2.2: Physiological limits considered for the exclusion of outliers.

No.  Variable name (units)                                   Min   Max
1    Heart rate (beats/min)                                  0     250
2    Respiratory rate (breaths/min)                          0     200
3    Temperature (ºC)                                        25    42
4    SpO2 (%)                                                60    100
5    Non-invasive arterial blood pressure (systolic) (mmHg)  30    300
6    Blood pressure (mean) (mmHg)                            10    187
7    Red blood cell count (cells x 10^3/µL)                  2     8
8    White blood cell count (cells x 10^3/µL)                0.4   50
9    Platelets (cells x 10^3/µL)                             3     1000
10   Hematocrit (%)                                          19    60
11   BUN (mg/dL)                                             4     500
12   Sodium (mg/dL)                                          120   160
13   Potassium (mg/dL)                                       2.2   8
14   Calcium (mg/dL)                                         7.2   12
15   Chloride (mg/dL)                                        80    130
16   Creatinine (mg/dL)                                      0.1   9
17   Magnesium (mg/dL)                                       0     10
18   Albumin (g/dL)                                          0.5   18
19   Arterial pH                                             4.8   7.8
20   Arterial base excess (mEq/L)                            -30   20
21   Lactic acid (mg/dL)                                     0     10
22   Urine output (mL/h)                                     0     1000
The outliers were eliminated using the maximum and minimum limits for the 22 physiological
variables in Table 2.2. These physiological limits were obtained through the Decreased Variable Analysis
(MEDAN).
All samples containing one or more physiological variables with values outside the limits
of Table 2.2 were considered samples with outliers. The total number of measurements considered as
outliers was 517; however, only 473 samples contained one or more variables with values outside the
limits. According to [8], such missing values can be treated in various ways:
1. Ignore the tuple.
2. Fill in the missing value manually.
3. Use a global constant to fill in the missing value.
4. Use the attribute mean to fill in the missing value.
5. Use the attribute mean for all samples belonging to the same class as the given tuple.
6. Use the most probable value to fill in the missing value.
Methods 3 to 6 bias the data.
Since the number of samples, as well as their temporal spacing, is very irregular from patient
to patient, the models created later must be able to handle these irregularities. Thus, approach 1 was
chosen, in which a sample containing one or more measurements outside the limits of Table 2.2 is
removed; this is a process that does not bias the data.
Table 2.3: Summary of the number of samples and patients resulting from the preprocessing.

                                    No. of samples   No. of patients
Before the preprocessing            13675            1028
Removed during the preprocessing    473              18
After the preprocessing             13202            1010
Table 2.4: Analysis of the number of patients readmitted and not readmitted for the subsets of patients
with a minimum number of samples. The percentage of readmitted patients remains within the 4-14%
referred to in the literature.

Minimum number of samples:       1     2     3     4     5     6     7     8     9     10
No. of patients not readmitted:  879   655   637   631   617   604   588   578   567   551
No. of patients readmitted:      131   94    89    89    88    87    86    82    80    78
% readmitted:                    13.0  12.6  12.3  12.4  12.5  12.6  12.8  12.4  12.4  12.4
In this process, some patients had all their samples removed, resulting in a total of 1010
patients. Table 2.3 summarizes the preprocessing of the outliers.
The number of samples of the 22 physiological variables per patient after the treatment of
outliers was analysed; Fig. 2.2 shows the variation of the number of patients per number of samples
after the outlier treatment. It is worth noticing that the number of patients with only one sample is
quite considerable.

Figure 2.2: Graphical representation of the number of patients per number of measurements of the 22
physiological variables considered.
In order to evaluate whether the percentage of readmitted patients remained within the 4-14%
referred to in the literature, the percentage of readmitted patients was analysed while varying the
minimum number of measurements of the 22 physiological variables required per patient. Table 2.4
shows the results of this analysis for minimum numbers of samples from 1 up to 10.
Data transformation
Given the great variability in the number of samples per patient and the very
irregular sampling periods, some descriptive statistical measures were used to describe the stay of each
patient. By doing this, all patients have the same number of features describing the time
series of the physiological variables throughout their ICU hospitalization. These features, of constant
dimension, could then be used as inputs for the classification models.
Previous studies [15] used the arithmetic mean, the maximum, the minimum and the standard
deviation of each physiological variable in order to capture the information in the time series of the
considered physiological variables for each patient. In the present work, in addition to these statistical
measures, the Shannon entropy and the weighted mean were also used, making it possible to extract
more information.
The Shannon entropy is the average unpredictability of a random variable, which is equivalent to
its information content. It provides an absolute limit on the best possible lossless encoding
or compression of any communication, assuming that the communication may be represented as a
sequence of independent and identically distributed random variables. There are already studies that
use entropy as a feature extraction measure [45, 10].
Regarding the weights of the weighted mean, a linear distribution along the stay of the
patient was considered, giving more relevance to the last measurements before discharge. Four
gradients were considered for these weights, as presented in Fig. 2.3.
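A minimal sketch of these two additional descriptors for one physiological time series follows; the histogram binning used to estimate the probabilities for the Shannon entropy, and the exact form of the linear weight gradient, are illustrative assumptions:

    import numpy as np

    def shannon_entropy(series, bins=10):
        # H(X) = -sum_i p_i log2 p_i, with p_i estimated by binning.
        counts, _ = np.histogram(series, bins=bins)
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log2(p))

    def weighted_mean(series, grad=1.0):
        # Linearly increasing weights: the last measurements before
        # discharge weigh the most; grad sets the gradient (four gradients
        # were tested in this work).
        w = 1.0 + grad * np.linspace(0.0, 1.0, len(series))
        return np.sum(w * series) / np.sum(w)

    hr = np.array([82.0, 88.0, 85.0, 91.0, 97.0])   # illustrative series
    print(shannon_entropy(hr), weighted_mean(hr))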
In order to use the descriptive statistical measures introduced above, it was decided to use
only patients with a minimum of 3 measurements available, resulting in 726 patients (637 not
readmitted and 89 readmitted, see Table 2.4). Thus, after the treatment of outliers and the transformation
of the dataset, 4 datasets emerged, one for each gradient of the weighted mean.
The only features that differ across the datasets are the 22 features corresponding to the
weighted mean. Each dataset was formed by 726 patients (12.3% readmitted) and 132 features
(6 statistical measures for each of the 22 variables). The four datasets will be referred to as the
readmission datasets.
Each patient is considered as one sample for the inputs of the classification models; Fig. 2.4
illustrates a sample.

Figure 2.3: Graphical representation of the four different gradients for the weights used in the
weighted mean. The measurements of the 22 variables span the 0 to 24 hours before discharge.

Figure 2.4: Illustrative diagram of the formation of the inputs for the classification models: each patient
represents a sample (a [1x132] array). Associated with each patient there is also a label that indicates
whether or not the patient belongs to the readmitted class.
2.2 Modeling
In the present work, the machine learning technique of fuzzy modelling was used. These
models were used as classification models: briefly, given a sample, the model, built on the
training set of the data, is supposed to correctly assign the sample to one of the labels considered in the
problem. An overview of fuzzy modelling is given in the following topics.
2.2.1 Fuzzy modeling
Fuzzy modeling is a tool that allows approximation of nonlinear systems when there is little or
no previous knowledge of the problem to be modeled [55, 13]. This tool supports the development of
models around human reasoning (also referred to as approximate reasoning), and allows an element
to belong to a set to a degree, indicating the certainty (or uncertainty) of its membership.
Within medical-related classification problems, several fuzzy-based models have shown
comparable performances to other nonlinear modeling techniques [18, 16, 28]. Fuzzy modelling is
particularly appealing as it provides not only a transparent, non-crisp model, but also a linguistic
interpretation in the form of rules and logical connectives. These are used to establish relations
between the defined features in order to derive a model. A fuzzy classifier contains a rule base
consisting of a set of fuzzy if-then rules together with a fuzzy inference mechanism. These systems
ultimately classify each instance of a dataset as pertaining to one of the possible classes defined for
the specific problem being modeled [55].
For both databases used in this work (sonar and readmission), the goal was to classify the samples into one of two labels. In the case of the sonar database: sonar signals bounced off a metal cylinder or off a roughly cylindrical rock; in the readmission problem: the patient would or would not be readmitted to the ICU 24 to 72 hours after discharge.
First order Takagi-Sugeno (TS) fuzzy models [55] were applied, which consist of fuzzy rules where each rule describes a local input-output relation. When first order TS fuzzy systems are used, each discriminant function consists of rules of the type:

R_i: \text{If } x_1 \text{ is } A_{i1} \text{ and } \ldots \text{ and } x_M \text{ is } A_{iM} \text{ then } y_i = a_i x + b_i

where i corresponds to the rule number, x = [x_1, x_2, \ldots, x_M]^T is the input vector, M is the total number of inputs (features), A_{ij} is the fuzzy set for rule i and feature j, and y_i = a_i x + b_i is the consequent function for rule i.
The degree of activation of the i-th rule is given by:

\beta_i = \prod_{j=1}^{M} \mu_{A_{ij}}(x_j) \qquad (2.1)

where \mu_{A_{ij}}(x_j) \in [0, 1].
The overall output is determined through the weighted average of the individual rule outputs:

y = \frac{\sum_{i=1}^{K} \beta_i y_i}{\sum_{i=1}^{K} \beta_i}

The number of rules K and the antecedent fuzzy sets A_{ij} are determined using fuzzy clustering in the product space of the input and output variables [55]. The consequent parameters a_i and b_i for each rule are obtained as a weighted ordinary least-squares estimate.
Given a classification problem, and the consequent being linear, a threshold \lambda is required to turn the continuous output y into the binary output \hat{y}. In this way, \hat{y} = 1 if y \geq \lambda, and \hat{y} = 0 if y < \lambda.
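As a minimal sketch of this inference step (not the thesis's Matlab implementation; the Gaussian membership functions and all parameter names are assumptions made for illustration):

import numpy as np

def ts_classify(x, centers, sigmas, A, b, threshold):
    # first-order TS fuzzy classifier with Gaussian memberships
    # centers, sigmas: (K rules x M features); A: (K x M); b: (K,)
    mu = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)  # memberships per rule/feature
    beta = np.prod(mu, axis=1)                         # rule activations, eq. (2.1)
    y_rule = A @ x + b                                 # linear consequents y_i
    y = np.sum(beta * y_rule) / np.sum(beta)           # weighted-average output
    return int(y >= threshold), y                      # crisp label via threshold

# toy model with K = 2 rules and M = 2 features
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas  = np.ones((2, 2))
A       = np.array([[0.1, 0.1], [0.4, 0.5]])
b       = np.array([0.0, 0.1])
print(ts_classify(np.array([0.9, 0.8]), centers, sigmas, A, b, threshold=0.5))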
For each database, the data was divided into training and test sets: while the model parameters were calculated using the training set, the feature subset quality was assessed using the test samples. Such an approach was necessary due to the risk of overfitting, meaning that the model could describe random error or noise instead of the underlying information. Thus, the test set provided a fair comparison of the generalization capabilities of the evaluated models [47].
2.2.2 Clustering
Clustering is an unsupervised learning method that organizes and categorizes data based on
the similarity of data objects [2]. It is used in various fields, such as pattern recognition, machine
learning and bioinformatics [33]. It is useful for knowledge discovery from empirical data and model
construction.
A cluster can be seen as a group of objects more similar to one another than to other data points, where similarity is usually defined in terms of a distance norm. Furthermore, a cluster can also be seen as the area of influence of a rule. Therefore, a cluster center, also called prototype, coincides with the corresponding rule centre. The closer a data point is to a cluster center, the higher its fulfilment degree will be.
There is a great number of clustering algorithms; however, most of the analytical clustering algorithms are based on the minimization of the fuzzy c-means objective functional [5]. This objective function can be written as (2.2):

J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \, \| z_k - v_i \|^2 \qquad (2.2)

where the positive constant m > 1 determines the fuzziness of the resulting clusters. The vector z_k is one of the N data samples, v_i is the i-th cluster center and \| z_k - v_i \|^2 is a distance norm between data points and cluster centers. The fuzzy partition matrix U = [\mu_{ik}] contains all the normalized membership values, Z is the matrix containing all the data samples, and V is the matrix of the cluster prototypes.
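As a minimal numpy sketch of the quantities in (2.2) (assuming Euclidean distances; the membership update shown is the standard FCM formula, not code from this work):

import numpy as np

def fcm_objective(Z, U, V, m=2.0):
    # fuzzy c-means objective (2.2); Z: (N x d), U: (c x N), V: (c x d)
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared distances (c x N)
    return np.sum((U ** m) * d2)

def fcm_memberships(Z, V, m=2.0):
    # membership update minimizing (2.2) for fixed prototypes V
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)              # columns sum to 1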
In this work, the fuzzy C-means (FCM) clustering algorithm was used, requiring the definition of the number of clusters (which translates into the number of fuzzy rules). The number of clusters to be used was determined based on the minimization of the partition index SC [4]. This index accounts for both properties of the fuzzy memberships and the structure of the data by measuring the compactness and separation of the clusters. This index is defined as:

SC = \sum_{i=1}^{c} \frac{\sum_{k=1}^{N} (\mu_{ik})^m \, \| z_k - v_i \|^2}{N_i \sum_{j=1}^{c} \| v_j - v_i \|^2} \qquad (2.3)

where m corresponds to the weighting exponent of the FCM algorithm, N_i corresponds to the cardinality of fuzzy cluster i, \| z_k - v_i \|^2 corresponds to the distance between a data point z_k and its cluster center v_i, and \sum_{j=1}^{c} \| v_j - v_i \|^2 (named the separation of fuzzy cluster i) corresponds to the sum of the distances from the cluster center to the centers of all the other clusters. The lower the value of SC, the more compact and separated the clusters are.
For the sonar database and the four datasets derived from MIMIC II, the values of SC were calculated by varying the number of clusters from 2 to 5. The final number of clusters corresponded to a local minimum where the difference between the values of the criterion was minimal.
In the present work, the upper limit of this search range was chosen bearing in mind that a smaller number of clusters means a lower number of rules and hence a lower degree of model complexity. For the two databases considered (sonar and readmission datasets) the chosen number of clusters was 2.
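A sketch of how (2.3) can be evaluated when scanning candidate cluster numbers (assuming Euclidean distances and taking the fuzzy cardinality N_i as the membership sum; illustrative only):

import numpy as np

def partition_index(Z, U, V, m=2.0):
    # partition index SC (2.3): compactness over separation, lower is better
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c x N)
    compactness = ((U ** m) * d2).sum(axis=1)                 # per-cluster numerator
    Ni = U.sum(axis=1)                                        # fuzzy cardinality of each cluster
    sep = ((V[None, :, :] - V[:, None, :]) ** 2).sum(axis=2).sum(axis=1)
    return np.sum(compactness / (Ni * sep))

# choose the number of clusters by scanning c = 2..5 and taking the SC minimum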
2.2.3 Performance Measures
Traditionally, accuracy has been used to evaluate classifier performance. This measure is
defined as the total number of correct classifications over the total number of available samples.
Most classification problems have two classes, positive and negative cases [38]. Thus, the classified test points can be divided into four categories:
• true positives (TP) - correctly classified positive cases,
• true negatives (TN) - correctly classified negative cases,
• false positives (FP) - incorrectly classified negative cases,
• false negatives (FN) - incorrectly classified positive cases.
Given these categories, the accuracy can be written as (2.4):

ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2.4)
This criterion is limited, especially in medical applications, for various reasons. If one of the classes is more underrepresented than the others, misclassifications in this class will not have a great impact on the accuracy value. Also, a good classification of one class might be more important than classifying the other classes, and this cannot be assessed with accuracy. To take this matter into account, two performance measures were introduced: the sensitivity (2.5) and the specificity (2.6):

sensitivity = \frac{TP}{TP + FN} \qquad (2.5)

specificity = \frac{TN}{TN + FP} \qquad (2.6)
The sensitivity and the specificity vary between 0 and 1.
The receiver operating characteristic (ROC) curve is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (sensitivity) vs. the fraction of false positives out of the negatives (one minus the specificity) at various threshold settings. An example of ROC curves is shown in Fig. 2.5.
When using normalized units, the area under the curve (AUC) is equal to the probability that a classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative'). The AUC measure ranges from 0.5 (random classifier) to 1 (perfect classifier).
In the present work, the sensitivity and the specificity were used as performance measures for the models using the sonar database and the readmission datasets, introduced in section 2.1. However, for the models using the sonar database, accuracy was used as the main performance measure and, for the readmission datasets, the AUC. This choice was made because the two classes of the sonar database had similar numbers of samples (89 samples, 45%, vs 111 samples, 55%), whereas in the readmission problem the percentage of the readmitted class was only 12.3% against 87.7% of not readmitted. In this case one of the classes is underrepresented and, if the accuracy had been used, the results would not be realistic: if a model classified all patients as not readmitted, the accuracy would be ~87.7%, but the AUC measure would be 0.5, corresponding to a random classifier.
For the computation of the AUC measure, only one threshold was used: the one through which the best performance of the model was achieved with the train set. With the resultant sensitivity and specificity of the test set using that threshold, a point was marked in the ROC plane. The AUC was computed as the area under the two segments that link the points (0,0) and (1,1) to the point marked with the sensitivity and specificity. By doing this, a good approximation of the performance of the model is ensured.

Figure 2.5: Example of three ROC curves.
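This construction reduces to a simple closed form; the sketch below (illustrative, not the thesis code) computes the area under the two segments and shows that it equals (sensitivity + specificity)/2:

def point_auc(sensitivity, specificity):
    # AUC of the ROC polygon (0,0) -> (1-spec, sens) -> (1,1)
    fpr, tpr = 1.0 - specificity, sensitivity
    area = fpr * tpr / 2.0                   # triangle under the first segment
    area += (1.0 - fpr) * (tpr + 1.0) / 2.0  # trapezoid under the second segment
    return area

print(point_auc(0.8, 0.7))   # 0.75 == (0.8 + 0.7) / 2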
2.3- Feature Selection
The main characteristic of wrapper methodologies is the involvement of the predictor as part
of the selection procedure. In this work, a learning machine was used as a “black box” to score the
subsets according to their predictive performance [23]. Wrappers are constituted by three main
components:
1) Search method;
2) Learning machine;
3) Feature evaluation criteria.
Wrapper approaches are aimed at improving the results of the specific predictors they work with. During the search, subsets were evaluated without incorporating knowledge about the specific structure of the classifier [23].
In section 2.2, the fuzzy modeling technique (the learning machine) was introduced. It is considered to have universal function approximation properties, i.e., in theory it could approximate the behaviour of any function. However, as referred in section 1.1.1, in real problems this is rather difficult for a number of reasons, one of them being the high dimensionality of the available data.
Feature selection is generally used to identify which of the available variables are closely
related to the prediction of the outcome and to discard those unrelated to it, reducing the
dimensionality of the dataset [25, 41, 39]. From the clinical point of view, this process may bring to
light new variables that had not been previously considered as relevant to a given outcome.
In the present work, four FS algorithms were applied: the sequential forward selection (SFS), the Binary Particle Swarm Optimization (BPSO) and two newly formulated algorithms, the decimal to binary Fish School Search (D2BFSS) and the Binary Fish School Search (BFSS).
The following sections present an overview of the well-known SFS algorithm and of the BPSO method.
2.3.1- Sequential Forward Selection
A detailed description of the sequential forward selection search algorithm used is reported in
[39]. Briefly, a model is built for each of the features in consideration, and evaluated using a
performance criterion upon the test set. The feature that returns the best value of the performance
criterion is the one selected. Then, other feature candidates are added to the previous best model, one
at a time, and evaluated. Again, the combination of features that maximizes the performance criterion
is selected. When this second stage finishes, the model has two features. This procedure is repeated
until the stop criterion is achieved. In the end, all the relevant features for the considered process
should be obtained.
The main advantages of this method relate to its simplicity, the possibility of graphically representing the performance of each added feature and the transparent interpretation of the results, which, for clinicians, is particularly attractive. The main disadvantage is related to its greedy nature and thus its susceptibility to local optima [39].
In this work, unlike the traditional stopping criterion of iterating until the performance of the models no longer improves, the maximum number of selected features was used as the stopping criterion. After the maximum number of features had been reached, the set of features whose model achieved the best performance was considered as the selected best features.
The overall process of the SFS algorithm can be described as:

repeat
    for each feature in the feature vector X that does not belong to the features of the model do
        build a model using the previous features of the model combined with that feature
        compute the performance measure
    end for
    select the combination of features with the highest value of the performance measure (ACC or AUC) as the new features of the model
until the number of selected features reaches the defined limit
Select the final features.
The accuracy, for SONAR database, and the AUC, for the MIMIC II derived datasets, were used
as performance measures to maximize. For each combination of features selected by the SFS, the
process of generating the performance measure of the model can be described by the following steps:
1. The model is trained with the train set and the selected features.
2. With the simulated output of the training set, a threshold is iteratively evaluated in order to find the one that maximises the performance (ACC or AUC, depending on the database).
3. With the threshold found, the test set is then simulated.
4. The final performance of the model is generated with the test set output.
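A compact sketch of this wrapper loop (Python pseudocode rather than the Matlab implementation; `evaluate` is a hypothetical callback wrapping steps 1-4 above and returning the test-set performance):

def sfs(all_features, evaluate, max_features):
    selected = []
    best_subset, best_score = [], float("-inf")
    while len(selected) < max_features:
        # score every candidate extension of the current subset
        scored = [(evaluate(selected + [f]), f)
                  for f in all_features if f not in selected]
        score, winner = max(scored)            # keep the best extension
        selected.append(winner)
        if score > best_score:                 # remember the best subset seen overall
            best_subset, best_score = list(selected), score
    return best_subset, best_score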
2.3.2- Binary Particle Swarm Optimization
Particle swarm optimization is a stochastic population-based metaheuristic, inspired by the swarming behaviour of some biological species (e.g. bird flocks).
There are various ways of encoding a problem solution, the most common and most generic being real, integer and binary encoding. The use of each of them depends on the problem at hand.
Normally, in feature selection, the search space is organized such that each state represents a feature subset [16]. In a problem with n variables, a state is encoded by a sequence of n bits, each bit indicating whether a feature is present or absent. An example of a possible state is represented by the sequence:

x_i = [x_{i1}, x_{i2}, \ldots, x_{in}], \quad x_{ij} \in \{0, 1\} \qquad (2.7)

The variable x_{ij} corresponds to input F_j, where j = 1, \ldots, n. If feature F_j is to be selected then x_{ij} = 1; if not, x_{ij} = 0. This process is illustrated in Fig. 2.6.
Essentially, in BPSO, each particle is a candidate solution of the optimization problem. A
particle is associated to a position and a velocity in the search space, where the method for
determining the changes in velocity depends on the particle itself and the other particles.
The iterative process in search of the optimum is [16]:
Step 1: Evaluate each particle in the swarm;
Step 2: Find the swarm and particle best values;
Step 3: Update velocities;
Step 4: Update positions of the particles;
Step 5: Go to Step 1 if the stop criteria are not met.
There are two crucial steps in the way the algorithm operates, the update of velocities and
update of particle positions.
Figure 2.6: Decoding process in feature selection, according to [16].
1) Update velocities: velocity directs the movement in the search space, taking into account the performance of the particle itself and of the swarm, and it is updated with the following equation:

v_{ij}(t+1) = v_{ij}(t) + c_1 q \, (pbest_{ij} - x_{ij}(t)) + c_2 r \, (gbest_j - x_{ij}(t)) \qquad (2.8)

The term involving constant c_1 is called the cognitive component and the term involving c_2 is the social component. q and r are uniform random numbers in [0, 1]. Once velocities have been updated, the restriction |v_{ij}| \leq V_{max} is applied; this is a crucial step for the swarm to maintain coherence.
2) Update particle position: the logistic function of the velocity is used as the probability distribution for the position [59]:

S(v_{ij}(t+1)) = \frac{1}{1 + e^{-v_{ij}(t+1)}} \qquad (2.9)

Thus, the particle position is calculated for each variable by:

x_{ij}(t+1) = \begin{cases} 1 & \text{if } r_{ij} < S(v_{ij}(t+1)) \\ 0 & \text{otherwise} \end{cases} \qquad (2.10)

where r_{ij} is a uniform random number in [0, 1].
Objective Function
Recall that the two main objectives in the FS problem are maximizing the model accuracy and minimizing the size of the feature subset. The objective function [16] is defined as a fitness function, the goal being its maximization:

f(x_i) = \alpha P + (1 - \alpha)\left(1 - \frac{N_s}{N_t}\right) \qquad (2.11)

where N_s is the size of the feature subset, N_t the total number of features to be selected and P the performance measure of the model. The term on the left side of the equation accounts for the overall accuracy or AUC and the term on the right for the percentage of used features. Constant \alpha \in [0, 1] is the weight of the related goal: accuracy (or AUC) versus subset size.
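A sketch of one BPSO iteration and of the fitness (2.11), under the usual conventions (the parameter values shown are placeholders, not the tuned values of this work):

import numpy as np
rng = np.random.default_rng(0)

def bpso_step(X, V, pbest, gbest, c1=2.0, c2=2.0, vmax=4.0):
    # velocity update (2.8) with cognitive (c1) and social (c2) components
    q, r = rng.random(X.shape), rng.random(X.shape)
    V = V + c1 * q * (pbest - X) + c2 * r * (gbest - X)
    V = np.clip(V, -vmax, vmax)                # |v_ij| <= Vmax keeps the swarm coherent
    S = 1.0 / (1.0 + np.exp(-V))               # logistic function (2.9)
    X = (rng.random(X.shape) < S).astype(int)  # stochastic position update (2.10)
    return X, V

def fitness(performance, n_selected, n_total, alpha=0.9):
    # objective (2.11): performance vs. fraction of used features
    return alpha * performance + (1.0 - alpha) * (1.0 - n_selected / n_total)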
CHAPTER 3
Fish school search
The novel Binary Fish School Search algorithm, formulated and presented in this work, was created based on the Fish School Search (FSS) optimization algorithm, introduced by C. Bastos Filho and F. Lima Neto in 2007 [17]. In this chapter, the original Fish School Search optimization algorithm is presented, based on [17], as well as the formulation of the decimal to binary fish school search algorithm (D2BFSS).
3.1- Original Fish school search
Several oceanic fish species, as well as other animals, present social behaviour. The main purpose of this phenomenon is to increase mutual survivability, and it may be viewed in two ways: (i) mutual protection and (ii) synergistic achievement of other collective tasks. Here, protection means reducing the chances of being caught by predators, and synergy refers to an active means of achieving collective goals such as finding food.
Apart from debating whether the emergent behaviour of a fish school is due to learning or
genetic reasons, it is important to note that some fish species live their entire lives in schools. This
reduces individual freedom in terms of swimming movements and increases competition in regions
with scarce food. However, fish aggregation is a fact and the benefits largely outweigh the drawbacks.
Along with the development of this technique the authors have taken great care not to depart
from the original inspiration source, but FSS contains a few abstractions and simplifications that have
been introduced to afford efficiency and usability to the algorithm. The main characteristics derived
from real fish schools and incorporated into the core of the approach are sound. They are grouped
into two observable categories of behaviours as follows:
• Feeding: inspired by the natural instinct of individuals (fish) to find food in order to grow strong
and to be able to breed. Notice that food here is a metaphor for the evaluation of candidate
solutions in the search process. An individual fish is considered to be able to lose as well as to
obtain weight, depending on the regions it swims in;
26
• Swimming: the most elaborate observable behaviour utilized in this approach. It aims at mimicking
the coordinated and the only apparent collective movement produced by all the fish in the school.
Swimming is primarily driven by feeding needs and, in the algorithm, it is a metaphor for the search
process itself.
3.1.1-Search Problems and Algorithms
Although there are several approaches for searching, there is, unfortunately, no general
optimal search strategy [40]. Thus, solving search problems is sometimes more of an art form than an
engineering practice. Although custom-made algorithms are valuable options for specific problems, a
more generalized automatic search engine would be a great bonus for tackling problems of high
dimensionality. Search problems can be highly varied. For example, they can be classified into two
groups with regard to the structure of their search-space: structured or unstructured. For the former,
there are many traditional techniques that are, on average, quite efficient. The same observation does
not apply to the latter, that is, there is no overall good approach for search spaces on which there is no
prior information.
The FSS can be a valuable option for searching in high dimensional and unstructured spaces.
3.1.2-FSS Computational Principles
The search process in FSS is carried out by a population of limited-memory individuals: the fish. Each fish represents a possible solution to the problem. Similarly to PSO or GA, search guidance in FSS is driven by the success of some individual members of the population.
The main feature of the FSS paradigm is that all fish contain an innate memory of their
successes – their weights. In comparison to PSO, this information is highly relevant because it can
obviate the need to keep a log of the best positions visited by all individuals, their velocities and other
competitive global variables. Another major feature of FSS is the idea of evolution through a
combination of some collective swimming, i.e. “operators” that select among different modes of
operation during the search process, on the basis of instantaneous results.
As for dealing with the high dimensionality and lack of structure of the search space, the authors of the algorithm [17] believed that FSS should at least incorporate principles such as the following:
(i) Simple computation in all individuals;
(ii) Various means of storing distributed memory of past computation;
(iii) Local computation (preferably within small radiuses);
(iv) Low communication between neighbouring individuals;
27
(v) Minimum centralized control (preferably none); and
(vi) Some diversity among individuals.
A brief rationale for the above-mentioned principles is given, respectively: (i) this reduces the
overall computation cost of the search; (ii) this allows for adaptive learning; (iii), (iv) and (v) these keep
computation costs low as well as allowing some local knowledge to be shared, thereby speeding up
convergence; and finally, (vi) this might also speed up the search due to the
differentiation/specialization of individuals. These principles, incorporated in FSS, led the authors to
believe that FSS could deal with multimodal problems.
3.1.3-Overview of the algorithm
The inspiration mentioned, together with the principles stated above, was incorporated into the approach in the form of two operators that comprise the main routines of the FSS algorithm. To understand the operators, a number of concepts must be defined.
The concept of food was considered as related to the function to be optimized in the process.
For example, in a minimization problem the amount of food in a region would be inversely
proportional to the function evaluation in this region. The “aquarium” is defined by the delimited
region in the search space where the fish can be positioned. The operators were grouped in the same
manner in which they were observed when drawn from the fish school, defined as follows:
Feeding: food is a metaphor for indicating to the fish the regions of the aquarium that are likely
to be good spots for the search process;
Swimming: a collection of operators that are responsible for guiding the search effort globally
towards subspaces of the aquarium that were collectively sensed by all individual fish as more
promising with regard to the search process.
3.1.4-The Feeding Operator
As in real situations, the fish of FSS are attracted to food scattered in the aquarium in various
concentrations. In order to find greater amounts of food, the fish in the school can move
independently (see individual movements in the next section).
As a result, each fish is allowed to grow in weight, depending on its success or failure in
obtaining food. The authors proposed that the fish's weight variation be proportional to the normalized difference between the fitness evaluations at the current and the previous fish positions, i.e. to the food concentration at these spots. The assessment of 'food' concentration considers all problem dimensions, as shown in (3.1):

W_i(t+1) = W_i(t) + \frac{f(x_i(t+1)) - f(x_i(t))}{\max \left| f(x_i(t+1)) - f(x_i(t)) \right|} \qquad (3.1)

where W_i(t) is the weight of the fish i, x_i(t) the position of the fish i, and f(\cdot) evaluates the fitness function (i.e. amount of food) at a position; the denominator is the maximum absolute fitness variation in the school in the current cycle.
A few additional measures were included to ensure rapid convergence toward rich areas of the
aquarium, namely:
o Fish weight variation is evaluated once at every FSS cycle;
o An additional parameter, named weight scale (w_scale), was created to limit the weight of a fish. The fish weight may vary between 1 and w_scale;
o All the fish are born with weight equal to w_scale/2.
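A minimal sketch of the feeding operator (3.1) with the weight limits just described (numpy, illustrative only):

import numpy as np

def feeding(W, delta_f, w_scale):
    # weight update (3.1): normalized fitness improvement; weights stay in [1, w_scale]
    delta_f = np.asarray(delta_f, dtype=float)
    max_df = np.max(np.abs(delta_f))
    if max_df > 0:
        W = W + delta_f / max_df
    return np.clip(W, 1.0, w_scale)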
3.1.5-The Swimming Operators
A basic animal instinct is to react to environmental stimulation (or sometimes, the lack of it). In
this approach, swimming is considered to be an elaborate form of reaction regarding survivability. In
FSS, the swimming patterns of the fish school are the result of a combination of three different causes
(i.e. movements).
For fish, swimming is directly related to all the important individual and collective behaviours
such as feeding, breeding, escaping from predators, moving to more liveable regions of the aquarium
or, simply being gregarious. This panoply of motivations to swim away inspired the authors [17] to
group causes of swimming into three classes: (i) individual, (ii) collective-instinct and (iii) collective
volition. Below further explanations on how computations are performed on each of them are
provided.
3.1.6-Individual Movement
Individual movement occurs for each fish in the aquarium at every cycle of the FSS algorithm.
The swim direction is randomly chosen. Provided the candidate destination point lies within the aquarium boundaries, the fish assesses whether the food density there seems to be better than at its current location. If not, or if the step is not possible (i.e. the destination lies outside the aquarium or is blocked by, say, reefs), the individual movement of the fish does not occur. Soon after each individual movement, feeding occurs, as detailed above.
For this movement, a parameter called individual step (step_ind) was defined to determine the fish displacement in the aquarium. Each fish moves if the new position has more food than the previous position. To include more randomness in the search process, the individual step is multiplied by a random number u generated by a uniform distribution in the interval [-1, 1], so that the candidate position is x_i(t) + u \cdot step_{ind}(t). In this simulation, the individual step was decreased linearly in order to provide exploitation abilities in later iterations:

step_{ind}(t) = step_{ind}^{initial} - t \, \frac{step_{ind}^{initial} - step_{ind}^{final}}{T} \qquad (3.2)

where t is the number of the current iteration and T is the total number of iterations.
Fig. 3.1 shows an illustrative example of this swimming operator. One can note that only the fish that found spots with more food have moved.
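A sketch of the individual operator and of the linear step decay (3.2) (the boundary check and the maximization convention are assumptions of this sketch):

import numpy as np
rng = np.random.default_rng(0)

def individual_movement(x, f, step_ind, bounds):
    # try a random step; keep it only if food improves and the point stays in the aquarium
    cand = x + step_ind * rng.uniform(-1.0, 1.0, size=x.shape)
    if np.all(cand >= bounds[0]) and np.all(cand <= bounds[1]) and f(cand) > f(x):
        return cand
    return x

def linear_step(t, T, s_init, s_final):
    # linear decay of the step along the iterations, as in (3.2)
    return s_init - t * (s_init - s_final) / T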
3.1.7-Collective-Instinctive Movement
After the individual movement, a weighted average of the individual movements, based on the instantaneous success of all fish of the school, is computed. This means that fish that had successful individual movements influence the resulting direction of movement more than the unsuccessful ones. When the overall direction is computed, each fish is repositioned. This movement is based on the fitness evaluation enhancement achieved, as shown in (3.3):

I(t) = \frac{\sum_{i=1}^{n} \Delta x_i \, \Delta f_i}{\sum_{i=1}^{n} \Delta f_i}, \qquad x_i(t+1) = x_i(t) + I(t) \qquad (3.3)

where \Delta x_i is the displacement of the fish i due to the individual movement in the FSS cycle and \Delta f_i the corresponding fitness improvement. Fig. 3.2 shows the influence of the collective-instinctive movement in the example presented in Fig. 3.1. One can note that, in this case, all the fish had their positions adjusted.
Figure 3.1: Individual movement is illustrated here before and after its occurrence; red dots are fish
positions after and black dots are the same fish before individual movement. Fish with unsuccessful
individual movement are overlaid, showing only the position after the usage of this operator.
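A sketch of the collective-instinctive drift (3.3); fish that did not move contribute a zero displacement and a zero fitness variation:

import numpy as np

def collective_instinctive(X, dX, df):
    # X, dX: (n_fish x dims); df: (n_fish,) fitness improvements (0 if the fish stayed)
    total = df.sum()
    if total == 0:
        return X                   # nobody improved: no collective drift
    I = (dX * df[:, None]).sum(axis=0) / total
    return X + I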
3.1.8-Collective- Volitive Movement
After individual and collective-instinctive movements are performed, one additional positional
adjustment is still necessary for all fish in the school: the collective-volitive movement. This movement
is devised as an overall success/failure evaluation based on the incremental weight variation of the
whole fish school. In other words, this last movement will be based on the overall performance of the
fish school in the iteration.
The rationale is as follows: if the fish school is putting on weight (meaning the search has been
successful), the radius of the school should contract; if not, it should dilate. This operator is deemed to
help greatly in enhancing the exploration abilities in FSS. This phenomenon might also occur in real
swarms, but the reasons are as yet unknown.
The fish-school dilation or contraction is applied as a small step drift to every fish position with regard to the school's barycenter. The fish-school's barycenter is obtained by considering all fish positions and their weights, as shown in (3.4):

B(t) = \frac{\sum_{i=1}^{n} x_i(t) \, W_i(t)}{\sum_{i=1}^{n} W_i(t)} \qquad (3.4)

The collective-volitive movement will be inwards or outwards (in relation to the fish school's barycenter), according to whether the previously recorded overall weight of the school has increased or decreased in relation to the new overall weight observed at the end of the current FSS cycle.
For this movement, a parameter called volitive step (step_vol) was defined as well. The new position is evaluated as in (3.5) if the overall weight of the school increases in the FSS cycle; if the overall weight decreases, (3.6) should be used:

x_i(t+1) = x_i(t) - step_{vol} \, u \, \frac{x_i(t) - B(t)}{distance(x_i(t), B(t))} \qquad (3.5)

x_i(t+1) = x_i(t) + step_{vol} \, u \, \frac{x_i(t) - B(t)}{distance(x_i(t), B(t))} \qquad (3.6)

where u is a random number uniformly generated in the interval [0, 1]. The step_vol was also decreased linearly along the iterations.

Figure 3.2: Collective-instinctive movement is illustrated here before and after its occurrence; green dots are fish positions after and red dots are the same fish before collective-instinctive movement.

Fig. 3.3 shows the influence of the collective-volitive movement in the example presented in Fig. 3.1, after the individual and collective-instinctive movements. In this case, as the overall weight of the school had increased, the radius of the school diminished.
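A sketch combining (3.4)-(3.6) (the small constant guarding against a zero distance is an implementation detail added here, not part of the original description):

import numpy as np
rng = np.random.default_rng(0)

def collective_volitive(X, W, school_gained_weight, step_vol):
    # contract towards the barycentre (3.5) when the school gained weight,
    # dilate away from it (3.6) otherwise
    B = (X * W[:, None]).sum(axis=0) / W.sum()            # barycentre (3.4)
    d = np.linalg.norm(X - B, axis=1, keepdims=True) + 1e-12
    u = rng.random((X.shape[0], 1))
    drift = step_vol * u * (X - B) / d
    return X - drift if school_gained_weight else X + drift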
3.1.9-FSS Cycle and Stop Conditions
The FSS algorithm starts by randomly generating a fish school according to parameters that
control fish sizes and their initial positions.
Regarding its dynamics, the central idea of FSS is that all bio-inspired operators perform independently from each other. The FSS search process is enclosed in a loop, where invocations of the previously presented operators occur until at least one stop condition is met. The stop conditions conceived for FSS are as follows: limitation of the number of cycles, time limit, maximum school radius and maximum school weight.
Below, the pseudo-code for the Fish School Search algorithm is presented. In the initialization step, each fish in the swarm has its weight initialized with the value w_scale/2 and its position in each dimension initialized randomly in the search space.
Figure 3.3: Collective-volitive movement is illustrated here before and after its occurrence; pink dots are
fish positions after and green dots are the same fish before collective-volitive movement. The position of
the barycentre is represented by the blue dot.
Algorithm Fish School Search
1. initialize each fish in the swarm: weight W_i = w_scale/2, random position x_i
2. while the maximum number of iterations or another stop criterion is not attained do
3.    for each fish i in the swarm do
         a. apply the individual movement operator:
            generate a candidate position and calculate its fitness
            if the candidate fitness is better than the current one, move the fish
            else keep the current position
         b. apply the feeding operator:
            update the fish weight according to (3.1)
         c. apply the collective-instinctive movement:
            update the fish position according to (3.3)
      end for
      d. apply the collective-volitive movement:
         if the overall weight of the school increased in the cycle
            update the fish positions using (3.5)
         else
            update the fish positions using (3.6)
      decrease the individual and volitive steps linearly
   end while
3.1.10-Illustrative Example
This section presents an illustrative example (taken from [17]) aimed at a better understanding of how FSS can be used and, ultimately, how it works. The selected example considers a small school and a very simple problem: three fish are set to find the global optimum of the sphere function in two dimensions. The sphere function is presented in (3.7) and the simulation parameters are: (i) feasible space [-10,10], (ii) number of iterations equal to 10, (iii) w_scale = 10, (iv) initial step_ind = 1, (v) final step_ind = 0.1, (vi) initial step_vol = 0.5, (vii) final step_vol = 0.05. Table 3.1 includes the initial values associated with the experimental fish school; Fig. 3.4a presents the start-up loci of all fish.

f(x) = \sum_{i=1}^{2} x_i^2 \qquad (3.7)
Table 3.1: Initial conditions for the three fish in the sphere example [17].

Fish    weight    position    fitness
# 1     5         (9,7)       130
# 2     5         (5,6)       61
# 3     5         (8,4)       80
After initialization, all fish are free to check for new candidate positions that are generated by
the individual movement operator. Assuming that these positions were x1 = (9.6,6.2), x2 = (4.6,4.4) and
x3 = (6.2,4.2), and the associated fitnesses f(x1) = 130.6, f(x2) = 40.52 and f(x3) = 56.08, one should notice that fish #2 and fish #3 found better positions, whereas fish #1 did not move. The positions after the individual movement were then x1 = (9,7), x2 = (4.6,4.4) and x3 = (6.2,4.2). Fig. 3.4b illustrates the
individual movement of the three fish in search space for the sphere problem.
According to this model, the next operator to be computed should be feeding. As fish #1
remained in the same position, it would not change its weight. The weight of fish #2 and fish #3 would
change according to (3.1). The weight variation depends on the maximum fitness change. The
maximum fitness variation in this case was achieved by fish #3 and is equal to 23.92. As a result, fish #3
increased its weight by 1 unit and its new weight became 6. The fitness variation of fish #2 was 20.48.
Dividing the fitness variation of fish #2 by the maximum fitness change yields a weight variation of approximately 0.86 for fish #2. The new weight of fish #2 is then 5.86. Following the model, the third
operator to be computed would be the collective instinctive one. This operator evaluates the collective
displacement of the fish school considering the individual fitness variations and the individual
movement according to (3.3). As fish #1 stayed in the same position, it would not influence the overall
calculation. Considering the values obtained in this iteration, the displacement was (-1.2,-0.6). This
vector applies to all the fish (including fish #1), so the new positions, after third operator
computations, were x1 = (8.4,5.6), x2 = (3.4,3.8) and x3 = (5,3.6).
The fitnesses at the new positions were then 101.8, 26 and 37.96 for fish #1, #2 and #3, respectively. The individual displacement of all fish due to the collective-instinctive operator is presented in Fig. 3.4c. The reader may find it instructive to compare Fig. 3.4b and Fig. 3.4c.
The last operator to be considered in this example is the collective-volitive one. For that, one
has to obtain the instantaneous value of the barycenter of the fish school according to (3.4). In this
case, the barycenter was (4.96,4.25). Notice that the weight of the whole school has increased; therefore a contraction, instead of a dilation, was the implicit decision of the school (i.e. collective-volitive). By
means of using (3.5), the new positions were x1 =(5.81,4.89), x2 =(4.02,3.98) and x3 =(4.98,3.92). The
barycentre and the collective-volitive movement for this step are presented in Fig. 3.4d.
At this point, the algorithm tests if valid stop-conditions are met. Obviously it was not the case
yet, thus a new cycle began as explained above. If one compares the initial and final positions
illustrated in Fig. 3.4, after this first iteration, the reader can observe that all fish are closer to the
optimum point (0,0).
Of course, the optimum point is unknown to the algorithm. However, in a very peculiar manner, the FSS model assures fast convergence towards it (i.e. the goal of the search process) because of the above-mentioned natural principles instantiated in the FSS algorithm.
Figure 3.4: Example with three fish in the sphere example: (a) Initial position, (b) individual
movement, (c) instinctive collective movement and (d) collective-volitive movement
In order to illustrate the convergence behaviour of the fish school along the iterations, simulation results for the sphere function are presented. These simulations used 30 fish, a feasible space of [-100,100] in the two dimensions, an initialization range of [0,100] in the two dimensions, w_scale = 500, initial step_ind = 1, final step_ind = 0.1, initial step_vol = 0.5 and final step_vol = 0.05. Fig. 3.5 shows the fish positions after iterations (a) 1, (b) 5, (c) 10, (d) 20, (e) 30, (f) 40, (g) 50, (h) 100 and (i) 200, respectively. One can note that the school was attracted to the optimum point (0,0).
Figure 3.5: Fish school evolution after iteration (a) 1, (b) 5, (c) 10, (d) 20, (e) 30, (f) 40, (g) 50, (h) 100
and (i) 200 for sphere function with 30 fish
36
3.2- Decimal to binary Fish school search
The FSS algorithm, described in section 3.1, is a high-dimensional search optimization process in continuous space. The first intuitive and logical approach was to use the continuous FSS algorithm to search for a one-dimensional integer number that would then be transformed, in the objective function, into a binary input vector with dimension equal to the number of features to be selected. To do so, the decimal to binary representation was chosen.
This binary input vector would then be used in the same way as the encoding of the BPSO algorithm, presented in section 2.3.2.
This relatively simple representation allows features to be selected without major changes to the original algorithm. All the operators were used as described in section 3.1. The only modification needed was to round the position of each fish to its nearest integer. Thus, it was only necessary to use one dimension in the search space to search for the decimal system representation of the solution. This approach was called the D2BFSS algorithm.
3.2.1-Objective function
In order to evaluate the fitness of the decimal system solution (position of the fish), the integer solution was transformed into its binary representation, a vector of 0/1 bits with dimension equal to the maximum number of features to be selected.
Inspired by the objective function used in [16], presented in section 2.3.2, the following fitness function was used to describe the performance of the selected features during the FS process:

f(x_i) = \alpha P + (1 - \alpha)\left(1 - \frac{N_s}{N_t}\right) \qquad (3.8)

where N_s represents the number of features selected and N_t the total number of features, while P accounts for the performance measure on the test set. The value \alpha varies between 0 and 1. If the right-hand term of (3.8) were not used (\alpha = 1), there would be no restriction on the number of features selected by the algorithm.
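A sketch of this decoding step (the wrap-around modulo is a safeguard added here, not part of the original description):

import numpy as np

def decode(position, n_features):
    # round the one-dimensional continuous fish position to an integer and
    # expand its binary representation into a feature mask (1 = selected)
    z = int(round(position)) % (2 ** n_features)
    return np.array(list(np.binary_repr(z, width=n_features)), dtype=int)

print(decode(11.3, 6))   # 11 -> [0 0 1 0 1 1]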
CHAPTER 4
Binary Fish School Search
It is important to note that the D2BFSS approach, section 3.2, does not manipulate the vector of bits (selected features) in its internal mechanisms (the FSS movement operators). Thereby, the decimal to binary approach may have problems with convergence and low performance, or may even degenerate into a random search.
With this concern in mind, it was decided to modify the internal mechanisms of the FSS algorithm to manipulate binary inputs directly. The following sections describe the modifications to the fish school search algorithm, from which the binary fish school search emerges.
4.1- Encoding
There are various ways of encoding a problem solution; the encoding presented here was inspired by [16], similarly to section 2.3.2. An example of a possible state (position of a fish) is represented by the sequence:

x_i = [x_{i1}, x_{i2}, \ldots, x_{iN}], \quad x_{ij} \in \{0, 1\} \qquad (4.1)

where N is the total number of features to be selected. Each bit indicates whether or not a feature is selected. This binary scheme offers a straightforward representation of a feature subset, allowing the algorithm to search through the workspace, adding or removing features simply by flipping bits in the sequence.
While the FSS algorithm was not originally developed in the context of binary encoding, it appeared possible to change the real encoding to a binary one while keeping the following principles:
• to follow the internal mechanisms of the original algorithm, without losing the meaning of each operator;
• to add few additional parameters;
• to ensure the convergence of the algorithm;
• to keep the modifications simple and understandable.
In the next sections, the modifications made to each of the FSS internal mechanisms are presented.
4.2-Initialization
For each fish i, the initial position was initialized randomly by doing:

x_{ij} = \mathrm{round}(u_{ij}), \quad i = 1, \ldots, n, \quad j = 1, \ldots, N \qquad (4.2)

where u_{ij} is a random number uniformly generated in the interval [0, 1], n the number of fishes and N the total number of features to be selected.
By doing this, the algorithm starts with completely random positions, with the number of features selected at the start being around N/2. If the initial number were too small, the algorithm might not converge freely along the iterations.
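A one-line realization of (4.2), as a sketch:

import numpy as np
rng = np.random.default_rng(0)

def init_school(n_fish, n_features):
    # each bit is round(u), u ~ U[0,1]: roughly half of the features start selected
    return np.round(rng.random((n_fish, n_features))).astype(int)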
4.3-Individual Movement
The individual movement occurs once in every cycle of the BFSS. For each fish i and for each bit j, if a random number u (uniform distribution in the interval [0,1]) is smaller than S_ind(t), the bit will flip; otherwise it will not change:

x_{ij}(t+1) = \begin{cases} 1 - x_{ij}(t) & \text{if } u < S_{ind}(t) \\ x_{ij}(t) & \text{otherwise} \end{cases} \qquad (4.3)

Parameter S_ind, in the same way as in the FSS, decreases linearly along the iterations, depending on the initial and final values of step_ind. This allows a soft convergence through the iterations.
A fish will move if the new position has more food than the previous position, i.e. if the fitness
function of new set of features selected (new position) has a better performance than the previous
one. By doing this, the random exploration of each individual fish is preserved.
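A sketch of (4.3) together with this greedy acceptance rule (illustrative, not the thesis code):

import numpy as np
rng = np.random.default_rng(0)

def bfss_individual(x, fitness, s_ind):
    # flip each bit with probability s_ind; keep the candidate only if fitness improves
    flip = rng.random(x.shape) < s_ind
    cand = np.where(flip, 1 - x, x)
    return cand if fitness(cand) > fitness(x) else x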
4.4-Collective-Instinctive Movement
After the individual movement, the weighted average of the individual movements, based on the fishes that moved, is calculated. This process was executed in the same way as in the FSS, equation (3.3).
In order to make all fishes head in the direction of the successful individual movement positions, some changes had to be made to the original FSS algorithm.
When dealing with positions made of bits (values 0 or 1), equation (3.3) loses its meaning: the displacement \Delta x_i of the fish in equation (3.3) can no longer be quantified correctly using the discrete flipping of a bit.
For that reason, equation (4.4) was used to describe the resultant position of the overall success of the individual movement:

I(t) = \frac{\sum_{i=1}^{n} x_i(t) \, \Delta f_i}{\sum_{i=1}^{n} \Delta f_i} \qquad (4.4)

In (4.4), \Delta x_i in (3.3) was replaced with x_i(t). In this approach, the use of the actual positions of the fishes that had success in the individual movement is seen as more descriptive than the flipping of bits.
The resulting vector I has the same dimension as the positions of the fishes, but with values varying between 0 and 1. As an illustrative example, (4.5) represents a possible configuration of I:

I(t) = [0.7, \ 0.3, \ 0.55, \ 0.1] \qquad (4.5)
The goal of the Collective-Instinctive Movement operator is to attract each fish to the resultant direction of the individual movement operator. In the Binary Fish School Search, each fish approaches I. To do so, I must be in bit format; two options were considered to transform I into a bit vector:
a) Using a constant threshold thres_c in all iterations: if a value of I is below the parameter thres_c, the corresponding bit is considered 0, otherwise 1.
For example, if the value thres_c = 0.5 was used in the example (4.5), the resultant vector would be:

I_{bit}(t) = [1, \ 0, \ 1, \ 0] \qquad (4.6)

The problem of using a constant threshold in all iterations is that, depending on the evolution of the FS process, I_{bit} could be formed only of 0s, i.e. all the values of I lower than thres_c. In addition, if in any iteration the algorithm favoured a certain feature, the algorithm could lose its exploration abilities in later iterations. If this occurred, it would introduce trends and convergence to local maxima.
b) Using an adaptive threshold for each iteration: multiplying the parameter thres_c by the max value of I. The resultant value of this multiplication is then used as the threshold in the current iteration for this operator.
For the example (4.5), if the parameter thres_c was 0.4, the threshold used in this iteration would be 0.4 \times 0.7 = 0.28, considering 0.7 the max value of (4.5), resulting in:

I_{bit}(t) = [1, \ 1, \ 1, \ 0] \qquad (4.7)
Therefore, in option b), for each iteration t, the threshold used to compute the binary vector I_{bit} was calculated using the max value of I. This guarantees that at least one feature is selected, making the problem described in option a) less likely. The study of these two options is presented in chapter 5.
After the computation of I in bit format, all fish positions can tend to I_{bit}. To do so, the position of each fish is compared with I_{bit}, and one randomly chosen bit of the fish that does not have the same value as in I_{bit} is flipped. This process moves the position of each fish closer to I_{bit}. In comparison with the original algorithm, I no longer represents the direction of movement but the position resulting from the successful individual movements.
By only flipping one bit per fish, a soft and steady convergence of the algorithm is expected. An illustrative example can be represented as:

I_{bit}(t) = [1, \ 1, \ 1, \ 0]
x_i(t) = [0, \ 1, \ 0, \ 0] \ \rightarrow \ x_i(t+1) = [0, \ 1, \ 1, \ 0] \qquad (4.8)

In (4.8) the fish moved in the direction of I_{bit}: one randomly chosen bit that differed from I_{bit} (here the third one) was flipped. The number of bits of the new position that match I_{bit} is greater than before the collective-instinctive movement, making the new position of the fish closer to I_{bit}.
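The operator can be sketched as follows (adaptive-threshold option b); the names follow the parameters of Fig. 4.1, but the code itself is illustrative, not the thesis implementation:

import numpy as np
rng = np.random.default_rng(0)

def bfss_instinctive(X, df, thres_c):
    # build I from the successful positions (4.4), binarize with the adaptive
    # threshold, then flip one differing bit per fish towards I_bit
    if df.sum() == 0:
        return X
    I = (X * df[:, None]).sum(axis=0) / df.sum()     # (4.4), values in [0, 1]
    I_bit = (I >= thres_c * I.max()).astype(int)     # adaptive threshold
    X = X.copy()
    for i in range(X.shape[0]):
        diff = np.flatnonzero(X[i] != I_bit)
        if diff.size:
            j = rng.choice(diff)                     # one random differing bit
            X[i, j] = 1 - X[i, j]
    return X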
4.5-Collective-volitive Movement
Similarly to the Collective-Instinctive Movement operator, the Collective-volitive operator underwent some changes. The main goal of this operator is, depending on the success of the individual movement, to contract the fish positions towards the barycentre or dilate them away from it.
The barycentre was computed in the same way as in the FSS algorithm, equation (3.4). Analogously to the computation of the vector I, the barycentre obtained with (3.4) is not in bit format. Thereby, two options were also considered to transform the barycentre into bit format:
a) using a constant threshold thres_v through the iterations;
b) using an adaptive threshold for each iteration: multiplying thres_v by the max value of the barycentre.
If the overall individual movement was a success (the overall weight improved in the iteration), each fish approximates to the barycentre. Similarly to the process in the Collective-Instinctive Movement operator, section 4.4, the bits of each fish are compared to those of the binarized barycentre, and one randomly chosen bit that does not have the same value as in the barycentre is flipped. By making only one flip per fish, the algorithm enables a soft drift from the previous position to a new one, closer to the barycentre. An illustrative example for the case of improvement of the overall weights (contraction) is shown:

B_{bit}(t) = [1, \ 0, \ 1, \ 0]
x_i(t) = [1, \ 1, \ 1, \ 1] \ \rightarrow \ x_i(t+1) = [1, \ 0, \ 1, \ 1] \qquad (4.9)

In (4.9), the fish randomly changed one of its bits that differed from the barycentre (B_{bit}). This allowed the fish to approximate to the barycentre.
If the overall weights had not improved, each fish has to move in the opposite direction of the barycentre. To do this, the concept of anti-barycentre is introduced, consisting of a vector with the same dimension as the barycentre but with flipped bits. In this situation, the process is the same as described above for the case of contraction to the barycentre, but using the anti-barycentre. In (4.10), the case of no improvement of the overall weights is presented for the example (4.9):

\bar{B}_{bit}(t) = [0, \ 1, \ 0, \ 1]
x_i(t) = [1, \ 1, \ 1, \ 1] \ \rightarrow \ x_i(t+1) = [0, \ 1, \ 1, \ 1] \qquad (4.10)

In (4.10), the new position of the fish is obtained by comparing each bit with the anti-barycentre \bar{B}_{bit} of the barycentre presented in (4.9). One of the bits with a different value was flipped, making the new position of the fish closer to \bar{B}_{bit} and consequently further from B_{bit} in (4.9).
With the one-bit-flip mechanism, the barycentre can no longer be seen as a possible solution (as in the FSS algorithm) but as a point of reference to guide the fishes in the contraction or dilation process. The best solution per iteration is now given by the fish with the best performance after the collective-volitive movement.
After the collective-volitive movement, a new cycle begins.
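A sketch of the binary volitive operator with both the contraction and the dilation (anti-barycentre) cases; `thres_v` follows Fig. 4.1, and the code is illustrative only:

import numpy as np
rng = np.random.default_rng(0)

def bfss_volitive(X, W, school_gained_weight, thres_v):
    # binarize the barycentre (3.4) with the adaptive threshold; flip one bit per fish
    # towards it on contraction, or towards the anti-barycentre on dilation
    B = (X * W[:, None]).sum(axis=0) / W.sum()
    B_bit = (B >= thres_v * B.max()).astype(int)
    target = B_bit if school_gained_weight else 1 - B_bit   # anti-barycentre if dilating
    X = X.copy()
    for i in range(X.shape[0]):
        diff = np.flatnonzero(X[i] != target)
        if diff.size:
            j = rng.choice(diff)
            X[i, j] = 1 - X[i, j]
    return X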
4.6-Objective function
Although some of the parameters of the BFSS algorithm influence the final number of features
selected (use of thresholds), the process of developing an objective function is critical, since it serves as
guidance in search of the optimum.
The fitness function was defined as in [16], the goal being its maximization. The most suitable representation for the proposed task is shown:

f(x_i) = \alpha P + (1 - \alpha)\left(1 - \frac{N_s}{N_t}\right) \qquad (4.11)

where P is the classifier performance measure (ACC or AUC, depending on the database), N_s the number of features selected and N_t the total number of features to be selected. The term on the left side of the equation accounts for the overall accuracy of the model while the term on the right accounts for the percentage of used features. Note that both terms in the objective function are normalized. Constant \alpha defines the weight of the related goals, performance and subset size. The constant \alpha is a parameter of the algorithm and varies depending on the total number of features to be selected (N_t) and the desired number of selected features.
4.7-Parameters
The choice of the set of parameters is a crucial step in wrapper search methods. If the set is
not the most suitable, the predictor will underperform, which might mislead the search algorithm.
When performing the modifications presented above, some parameters were introduced. Fig.
4.1 summarises the set of parameters used in the FSS, D2BFSS and BFSS algorithms.
It is known that the more parameters an algorithm uses, the more time is taken for parameter estimation and the greater the complexity of the process. The approach taken, as well as the resultant set of parameters, is expected to achieve convergence while, although in a more subjective way, maintaining the meaning of each operator of the original FSS algorithm.
4.8- BFSS cycle and stop condition
In the same way as the FSS algorithm, the BFSS starts by randomly generating a fish school (selected features). In general, the cycle is similar to that of the FSS, the main differences being the modifications to each internal mechanism (operator). In addition, instead of using the position of the barycentre as the best solution in the iteration, the BFSS uses the fish with the maximum fitness function.
Regarding the stopping criterion, the following could be used: time limit, maximum school weight and maximum number of iterations reached (the latter used in all the experiments presented here).
Figure 4.1: Parameters for the original FSS, the decimal to binary FSS and the binary FSS.
FSS: no. of fishes; no. of iterations; Wscale; stepind [initial and final]; stepvol [initial and final].
D2BFSS: no. of fishes; no. of iterations; Wscale; stepind [initial and final]; stepvol [initial and final]; α.
BFSS: no. of fishes; no. of iterations; Wscale; stepind [initial and final]; thres_c; thres_v; α.
Chapter 5
Results
The main objective of this chapter is to evaluate the applicability of the proposed search optimization algorithms. These methods combine the machine learning algorithm introduced in section 2.2 with the state-of-the-art search algorithms presented in section 2.3 and those formulated in chapters 3 and 4, using the approach described in section 5.1. They are compared with each other and with the results obtained without FS, based on the predictive performance and the number of selected features.
For each of the two databases considered in this work, a study was made to tune the parameters of the optimization algorithms for the feature selection problem.
The algorithms were implemented and the results obtained with Matlab® R2010a.
5.1-Description of the approach
The use of a learning machine in wrapper methods, so as to evaluate subset suitability, requires a correct feature subset assessment. The process described in this section was performed for each of the databases considered in this work.
The data was firstly divided into two groups, the feature selection (FS) subset and the model assessment (MA) subset. This division was random but kept the same percentage of each class in each subset, i.e. 50% of the samples belong to the FS subset and the other 50% to the MA subset, and both groups had the same percentage of samples of each class considered.
The FS subset was divided into 70% of the samples for the training set and 30% for the testing set. This division was also performed randomly and with the same percentage of each class in each set. The feature selection was then accomplished and, after the stop criterion was reached, the model with the best performance was selected. The selected features, as well as the corresponding threshold, were then recorded and a 10-fold cross validation was performed on the MA subset.
The k-fold cross validation consists of dividing the data into k subsets, using each time k−1 of them for training and the remaining one for testing. Each subset was built so as to have the same percentage of samples of each class. For k times, models were trained and tested using the recorded features and threshold, until all possible combinations of training and testing sets were covered. The results are the average of the performance measures, introduced in section 2.2.3, over all splits. The mean and the standard deviation of the AUC, sensitivity, specificity and accuracy are then reported. The k-fold cross validation allows the evaluation of the validity and robustness of the discovered model by assessing how the model resulting from feature selection would generalize to an independent data set (the MA subset).
To reduce the variability introduced by the division of the data, not only into the FS/MA subsets but also into the train/test subsets, 10 rounds of the 10-fold cross validation process were performed, always using different partitions. The round with the best performance was selected, and the performance measures of that round describe the model created with the selected set of features. Fig. 5.1 summarizes the process.

Figure 5.1: Diagram of the whole process: 1) selecting features with the FS subset; 2) validation of the models created with the selected features, using the MA subset.
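A sketch of the class-stratified splitting used throughout (illustrative Python; the thesis implementation was in Matlab):

import numpy as np
rng = np.random.default_rng(0)

def stratified_split(y, fraction):
    # return index arrays (a, b): `fraction` of each class goes to part a
    a = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        a.extend(idx[:int(round(fraction * idx.size))])
    a = np.sort(np.array(a, dtype=int))
    b = np.setdiff1d(np.arange(y.size), a)
    return a, b

y = np.array([0] * 8 + [1] * 2)              # toy labels
fs_idx, ma_idx = stratified_split(y, 0.5)    # 50/50 FS vs MA subsets
tr, te = stratified_split(y[fs_idx], 0.7)    # 70/30 train/test inside FS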
5.2-Optimization Parameters
In order to choose an appropriate set of parameters for the optimization algorithms, it was important to study the parameters to be used. Since the D2BFSS and the BFSS had never been tested before, several measures were chosen to select a fair set of parameters and to evaluate the internal dynamics of the proposed algorithms. These measures, which will be called indicators, used the results of the best solution in the FS process and also the MA results:
FS best fitness: encompassing the performance of the best model and the number of features selected in all iterations of the FS process. It ranges from 0 to 1, the higher the better. Calculated with (2.11), (3.8) or (4.11), depending on the FS algorithm.
Number of features selected: the lower the better.
Performance of the best model in the FS: the performance of the best model in the FS process, the ACC in the case of the sonar database and the AUC in the readmission databases. The higher the better.
Performance of the MA: the 10-fold cross validation result using the MA subset with the features
selected using the FS algorithm. The mean ACC for the sonar database and mean AUC for the
readmission databases. The higher the better.
Iteration of the optimization algorithm with the best solution: although it can occur, the optimization algorithm is not supposed to find the best solution in the first iterations. The algorithm should evolve and converge to a better solution. This indicator can vary between 1 and the maximum number of iterations used in the FS process. Low values for this indicator are considered to signal lower quality solutions.
Percentage of contractions in all iterations: the percentage of all iterations in which, in the collective-volitive operator, the algorithm contracts. This allows the analysis of the internal behaviour of the algorithm. If the percentage is 0%, the algorithm only expanded, and with 100% the algorithm contracted in all iterations. Neither 0% nor 100% is favourable to the correct execution of the algorithm, indicating a general lack of convergence (0%) or convergence to local maxima (100%).
Number of repetitions of the same position of the barycentre: this measure allows the verification of the correct functioning of the internal mechanisms of the algorithm. It varies between 0 and the maximum number of iterations. Values at the limits are considered to signal lower quality solutions.
The plot: the result of the visual analysis of the graphical evolution of the best solution per iteration in the FS process. This graph is supposed to show the convergence of the algorithm along its iterations, as well as the oscillation near the local maxima. Classified as – (bad), + (good) and ++ (very good).
The optimization algorithm should search the space for the solution (exploration) and, at the
same time, converge to a good solution (exploitation). The three last measures help the algorithm
developer to understand what is happening in the internal dynamics of the algorithm, and to achieve a
good ratio of exploration and exploitation.
The D2BFSS and BFSS algorithms, formulated here, do not guarantee in advance a
convergent evolution or a good performance of the created model, so it is crucial to consider as
many factors as possible to ensure the correct functioning of the algorithm.
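As a concrete illustration, the sketch below computes three of these indicators from a hypothetical per-iteration log of the optimizer. The field names and the fitness form are illustrative assumptions, not the thesis code; the exact fitness is given by (2.11), (3.7) or (4.11).

import numpy as np

def fitness(perf, n_selected, alpha, n_total):
    # assumed convex combination of performance and feature parsimony
    return alpha * perf + (1 - alpha) * (1 - n_selected / n_total)

def internal_indicators(log):
    best = np.array([it["best_fitness"] for it in log])
    contracted = np.array([it["contracted"] for it in log], dtype=bool)
    bary = [tuple(it["barycentre"]) for it in log]
    return {
        "fs_best_fitness": float(best.max()),
        "iteration_of_best": int(best.argmax()) + 1,   # 1-based
        "pct_contraction": float(contracted.mean()),   # 0 = only expanded
        "same_barycentre": sum(b == a for a, b in zip(bary, bary[1:])),
    }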
5.3-Sonar database
As described in section 2.1, the sonar database consists of 208 samples with 60 features and
two classes. As the number of samples in each class was nearly the same, the main performance
measure to be maximised was the accuracy. Given the number of features to be selected and the
number of samples, it was decided that, after the study of the optimization parameters, 100 rounds of
the whole process (FS+MA) would be simulated with different FS/MA partitions. The mean, the
standard deviation and the best model created over the 100 rounds would then be used to compare the
different feature selection algorithms. The same data partitions were used in the 100 rounds of all the
algorithms, so that the comparison of the results would be fair.
5.3.1-Sequential Forward Selection
After consulting [16,21], it was verified that previous studies applying the SFS to the
sonar database did not select more than 15 features. The SFS stop criterion was therefore set at 15
selected features.
Since the SFS has no parameters, a parameter study was not necessary, and 100
rounds of the process described in section 5.1 were computed. Table 5.1 shows the mean
and standard deviation of the 10-fold cross-validation results over the 100 rounds, as well as the model
with the best performance. The selected threshold and the number of features in each round were also
recorded.
The SFS algorithm allows the visualization of the quality of each feature subset it selects;
Fig. 5.2 summarizes graphically the FS process for the best model of the 100 rounds.
After the addition of the ninth feature (marked in green in Fig. 5.2), the performance of the models
created with each additional feature did not improve.
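A minimal sketch of this greedy loop with the 15-feature stop criterion follows; evaluate() stands in for the wrapper's cross-validated model performance on a candidate subset.

def sfs(n_features, evaluate, max_features=15):
    selected, history = [], []
    while len(selected) < max_features:
        remaining = [f for f in range(n_features) if f not in selected]
        # add the feature whose inclusion gives the best score
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        history.append((list(selected), evaluate(selected)))
    # keep the subset with the best score seen, as in Fig. 5.2
    return max(history, key=lambda h: h[1])[0]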
Table 5.1: Model assessment results - SFS method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.73 | 73.19 | 0.77 | 0.69 | 0.12 | 12.05 | 0.18 | 0.21 | 0.48 | 8
std        | 0.04 | 4.14  | 0.07 | 0.08 | 0.03 | 2.59  | 0.05 | 0.05 | 0.05 | 4
best round | 0.81 | 81.51 | 0.87 | 0.75 | 0.15 | 15.13 | 0.13 | 0.24 | 0.45 | 9
5.3.2 Decimal to Binary Fish School Search
As described in section 4.3.2, the D2BFSS algorithm has the same parameters as the original Fish
School Search algorithm, with the addition of the parameter α (see Fig. 4.1). The selected stop
criterion was the number of iterations: 300 iterations.
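The exact decoding is defined in section 4.3.2; one plausible reading, consistent with step sizes on the order of the 2^60 ≈ 1e18 solution count used below, is to read a fish's scalar position as a 60-bit mask of selected features before the fitness call. The sketch below is only this hypothetical reading, not the thesis's exact scheme.

def decode(position, n_features=60):
    code = int(position) % (2 ** n_features)    # clamp into the solution space
    return [f for f in range(n_features) if (code >> f) & 1]

mask = decode(1.3e18)   # indices of the features this position selects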
To maintain consistency when comparing the results of different FS algorithms, the
same number of fish/particles (30) and the same partition of the data (FS/MA and train/test) were used,
as well as the same stop condition for the search process.
In every parameter study the same initial positions of the fish/particles were used.
This reduced the variability of the results and ensured a coherent analysis of the parameters of the
optimization algorithm.
According to the examples in [17], the most sensitive parameters of the FSS algorithm are
stepind (initial and final) and stepvol (initial and final), so these were the first two parameters to be
selected in this study. The tested values of stepind (initial and final) and stepvol
(initial and final) are presented in Table 5.2. These values were extrapolated from the ones used in the
examples presented in [17]. The initial values of these two parameters were set to the total
number of possible solutions of the feature selection problem (2^60 ≈ 1e18), similarly to [17], and the
final values were varied.
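The [initial final] pairs suggest the usual FSS linear decay of a step over the run; the following is a minimal sketch of that assumption, not a confirmed detail of this implementation.

def step_at(iteration, max_iterations, step_init, step_final):
    # linear interpolation from step_init (iteration 0) to step_final
    frac = iteration / max_iterations
    return step_init + (step_final - step_init) * frac

step_at(150, 300, 1e18, 1e3)   # e.g. halfway through the [1e18 1e3] setting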
Figure 5.2: Evolution of ACC with the step-wise inclusion of each individual feature.
The combination stepind = [1e18 1e3] and stepvol = [1e18 1e5] was chosen mainly because of the
low number of features selected and the better mean cross-validation ACC.
It is important to note that, for all combinations presented in Table 5.2, the percentage of
contraction, the number of equal barycentre positions and the plot indicated a poor internal
functioning of the algorithm.
The parameters Wscale and α were then selected (Table 5.3). The chosen values
(Wscale = 50 and α = 0.1) were selected mainly because of the indicators: percentage of
contraction, number of features selected and mean cross-validation ACC, presented in
Table 5.3. The tests with Wscale = 50 were the only ones in which the contraction towards the
barycentre did not occur in all iterations.
Table 5.2: Results of the study of the parameters stepind and stepvol ([initial final]); selected set: [1e18 1e3] and [1e18 1e5]. (The correspondence between row blocks and the three stepind/stepvol settings follows the top-to-bottom order of the merged labels in the original layout.)

stepind | stepvol | Wscale | α | FS fitness | features selected | ACC FS | mean ACC CV | % contraction | same barycentre | plot
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.1 | 0.78 | 12 | 0.63 | 0.77 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.3 | 0.80 | 14 | 0.88 | 0.76 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.5 | 0.81 | 21 | 0.97 | 0.77 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.7 | 0.85 | 22 | 0.94 | 0.71 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.9 | 0.94 | 22 | 0.97 | 0.73 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.1 | 0.77 | 14 | 0.75 | 0.68 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.3 | 0.80 | 15 | 0.91 | 0.73 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.5 | 0.81 | 13 | 0.84 | 0.72 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.7 | 0.85 | 21 | 0.94 | 0.75 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.9 | 0.93 | 26 | 0.97 | 0.74 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.1 | 0.76 | 14 | 0.72 | 0.73 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.3 | 0.82 | 13 | 0.91 | 0.74 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.5 | 0.80 | 16 | 0.88 | 0.82 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.7 | 0.85 | 16 | 0.91 | 0.74 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.9 | 0.91 | 19 | 0.94 | 0.71 | 1.00 | 0 | -
Table 5.3: Results of the study of the parameters Wscale and α; selected set: Wscale = 50 and α = 0.1. (All rows use stepind = [1e18 1e3] and stepvol = [1e18 1e5].)

Wscale | α | FS fitness | features selected | ACC FS | mean ACC CV | % contraction | same barycentre | plot
50    | 0.1 | 0.76 | 13 | 0.62 | 0.71 | 0.41 | 0 | -
50    | 0.3 | 0.79 | 14 | 0.87 | 0.72 | 0.44 | 0 | -
50    | 0.5 | 0.79 | 17 | 0.87 | 0.82 | 0.40 | 0 | -
50    | 0.7 | 0.83 | 19 | 0.90 | 0.75 | 0.40 | 0 | -
50    | 0.9 | 0.90 | 23 | 0.93 | 0.73 | 0.41 | 0 | -
500   | 0.1 | 0.78 | 12 | 0.68 | 0.79 | 1.00 | 0 | -
500   | 0.3 | 0.79 | 15 | 0.90 | 0.75 | 1.00 | 0 | -
500   | 0.5 | 0.81 | 15 | 0.87 | 0.80 | 1.00 | 0 | -
500   | 0.7 | 0.84 | 22 | 0.93 | 0.73 | 1.00 | 0 | -
500   | 0.9 | 0.92 | 27 | 0.96 | 0.69 | 1.00 | 0 | -
5000  | 0.1 | 0.78 | 12 | 0.62 | 0.77 | 1.00 | 0 | -
5000  | 0.3 | 0.79 | 14 | 0.87 | 0.75 | 1.00 | 0 | -
5000  | 0.5 | 0.80 | 21 | 0.96 | 0.76 | 1.00 | 0 | -
5000  | 0.7 | 0.84 | 22 | 0.93 | 0.71 | 1.00 | 0 | -
5000  | 0.9 | 0.93 | 22 | 0.96 | 0.72 | 1.00 | 0 | -
50000 | 0.1 | 0.75 | 15 | 0.78 | 0.74 | 1.00 | 0 | -
50000 | 0.3 | 0.77 | 15 | 0.84 | 0.76 | 1.00 | 0 | -
50000 | 0.5 | 0.81 | 18 | 0.93 | 0.75 | 1.00 | 0 | -
50000 | 0.7 | 0.84 | 22 | 0.93 | 0.76 | 1.00 | 0 | -
50000 | 0.9 | 0.91 | 18 | 0.93 | 0.74 | 1.00 | 0 | -
With the selected set of parameters, 100 rounds were simulated; the results are presented in
Table 5.4.
All tests using D2BFSS presented low convergence (indicator: plot) in the graphical
evolution of the FS optimization algorithm. This random dynamic can be visualized in Fig. 5.3, which
shows the evolution of the best fish per iteration for the best model of Table 5.4.
The number of selected features stays around 20 throughout the 300 iterations, and no
convergence is evident in the graphical evolution of the fitness, the ACC or the number of
features selected.
5.3.3 Binary Fish School Search
Unlike the D2BFSS algorithm, there were no guidelines for the parameter ranges of the
BFSS algorithm, so a wider range of values was explored. The tables with the results of the parameter
study are presented in appendix A. Analogously to D2BFSS, the first parameters to be selected were
thres_c and thres_v. The two structural options were also tested: the use or
not of the adaptive threshold in the collective and volitive operators, presented in sections 4.4 and 4.5.
Table 5.4: Model assessment results - D2BFSS method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.74 | 74.59 | 0.78 | 0.71 | 0.13 | 12.67 | 0.18 | 0.20 | 0.48 | 13
std        | 0.04 | 3.74  | 0.07 | 0.08 | 0.02 | 2.25  | 0.04 | 0.04 | 0.05 | 1
best round | 0.84 | 83.80 | 0.88 | 0.80 | 0.11 | 11.61 | 0.16 | 0.19 | 0.45 | 15
Figure 5.3: Graphical evolution of the D2BFSS feature selection process. Evolution of the fish with the
best performance per iteration (above) and evolution of the number of features it selected per
iteration (below).
Appendix A presents the detailed results of the initial 18 tests, each corresponding
to a different setting of the parameters thres_c and thres_v. These tests used the same partition of
samples for the FS/MA and train/test sets. The first 9 tests used the non-adaptive approach and the
remaining ones the adaptive threshold; the parameter settings of each test are presented in Table 5.5.
Each test considered the variation of the parameter Wscale (5, 50, 500, 5000 and 20000), as well
as of the parameter α (0.1, 0.3, 0.5, 0.7 and 0.9).
To help visualize the selection process, Figs. 5.4-5.8 outline the results of
the 18 tests, each colour representing a different combination of Wscale and α for the
thres_c and thres_v settings of each test.
The most discriminative indicators (introduced in section 5.2) used to select the best
test were: the FS best fitness (Fig. 5.4), the number of features selected (Fig. 5.5), the percentage of
contraction of the volitive operator (Fig. 5.6), the number of equal barycentre positions (Fig.
5.7), the iteration of the optimization algorithm with the best solution (Fig. 5.8), and the plot, which
can be analysed in the detailed tables in appendix A.
Table 5.5: Configuration of the parameters thres_c and thres_v for each of the 18 tests (tests 1-9: non-adaptive threshold; tests 10-18: adaptive threshold).

Test:    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18
thres_c: 0.9 0.7 0.5 0.3 0.1 0.9 0.7 0.1 0.3 0.9 0.7 0.5 0.3 0.1 0.9 0.7 0.1 0.3
thres_v: 0.9 0.7 0.5 0.3 0.1 0.1 0.3 0.9 0.7 0.9 0.7 0.5 0.3 0.1 0.1 0.3 0.9 0.7
Figure 5.4: Graphical representation of the results of tests 1-18 for the indicator FS best fitness, i.e.
the fitness of the best model in the FS process; the higher the better. The overall better performance
of tests 10-18 is evident.
Figure 5.5: Graphical representation of the results of tests 1-18 for the indicator number of features
selected in the FS process; the lower the better. Tests 10-18 achieved slightly better results for this
indicator.
After analysing the results in Figs. 5.4-5.8, it was decided that tests 12 to 16 had the best
configurations. The overall better performance of the tests with the adaptive threshold
(10-18) over those without it (1-9) was obvious. All the indicators were taken into consideration in
these conclusions; however, the most incriminating one was the number of repetitions of the same
position of the barycentre, for which only tests 12-16 performed well.
Recall that a high number of repeated barycentre positions leads to local maxima and,
therefore, to a weaker overall performance of the algorithm. The detailed tables in appendix
A also show that the plot indicator achieved better performances for tests 12-16.
After selecting tests 12-16, it was decided to vary the parameter Wscale over 5, 50,
500, 5000, 20000, 100000 and 1000000, in order to perform a wider analysis of this parameter.
The same strategy was followed to choose the best test: after examining the detailed data, the
decisive indicator for selecting the best test and the respective set of parameters was the number of
repetitions of the same position of the barycentre (Fig. 5.9).
Figure 5.6: Graphical representation of the results of tests 1-18 for the indicator percentage of
contraction in all iterations, i.e. the fraction of contractions in the collective-volitive operator;
neither 0 nor 1 is favourable. Tests 10-18 achieved better results for this indicator.
Figure 5.7: Graphical representation of the results of tests 1-18 for the indicator number of
repetitions of the same position of the barycentre, the position towards which the fish contract or
from which they expand in the collective-volitive operator; the limits are considered to indicate
lower quality solutions. Tests 12-16 achieved better results for this indicator.
Figure 5.8: Graphical representation of the results of tests 1-18 for the indicator iteration with the
best solution in the FS algorithm; results close to 0 are considered a low-convergence configuration.
Tests 10-18 achieved better results for this indicator.
Tests 13 and 16 presented good results for this indicator; however, as shown in the detailed
data, the plot indicator of test 16 proved to be the best. Next, the parameters Wscale and α were
selected within test 16. This was done in a more precise way, looking into the detailed results,
and led to Wscale = 100000; the parameter Wscale proved not to be as relevant as in D2BFSS. The last
parameter to be selected was stepind (initial and final values); the results are shown in Table 5.6.
The final configuration, stepind = [0.01 0.001], was selected mainly due to the
percentage of contraction and the number of repeated barycentre positions.
The 100 rounds with the selected set of parameters were then simulated; the results are
presented in Table 5.7.
A plot of the graphical evolution of the FS process using BFSS is shown in Fig. 5.10, presenting
the fitness and the accuracy of the fish with the best fitness per iteration, as well as the number of
features it selected.
Figure 5.9: Graphical representation of the results of tests 12-16 for the indicator number of repetitions
of the same position of the barycentre. Tests 13 and 16 achieved better results for this indicator.
Table 5.6: Results of the study of the parameter stepind ([initial final]); selected set: [0.01 0.001]. (All rows use thres_c = 0.7, thres_v = 0.3, Wscale = 100000 and α = 0.5.)

stepind      | FS fitness | features selected | ACC FS | mean ACC CV | % contraction | same barycentre | plot
[0.5 0.2]    | 0.85 | 10 | 87.50 | 75.83 | 0.83 | 59  | ++
[0.5 0.001]  | 0.88 | 9  | 90.63 | 73.85 | 0.88 | 91  | ++
[0.2 0.001]  | 0.89 | 9  | 93.75 | 70.35 | 0.91 | 131 | ++
[0.2 0.01]   | 0.87 | 12 | 93.75 | 74.83 | 0.89 | 109 | ++
[0.01 0.001] | 0.85 | 12 | 90.63 | 72.99 | 0.50 | 26  | ++
Table 5.7: Model assessment results - BFSS method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.72 | 72.85 | 0.78 | 0.67 | 0.12 | 12.00 | 0.18 | 0.22 | 0.46 | 7
std        | 0.04 | 4.27  | 0.08 | 0.10 | 0.02 | 2.14  | 0.05 | 0.05 | 0.05 | 2
best round | 0.85 | 85.62 | 0.95 | 0.76 | 0.09 | 9.29  | 0.09 | 0.16 | 0.43 | 8
The plot shows a more explicit convergence throughout the entire optimization process in
comparison with the plot of the D2BFSS algorithm (Fig. 5.3).
5.3.4 Binary Particle Swarm Optimization
Analogously to the modified FSS algorithms, 300 iterations and 30 particles were used in the
BPSO simulations. The BPSO had already been applied to the sonar database in the feature selection
problem [64], so the parameters suggested in [64] were used:
Vmax = 5
Pmut = no
Rmut= 0.5/Nf
Reset xsb=4
α = 0.7
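For reference, the following is a sketch of the standard binary-PSO position update with the Vmax clamp quoted above. The mutation (Pmut, Rmut) and reset options are omitted, and c1 = c2 = 2 is an assumption, not a value taken from [64].

import numpy as np

def bpso_step(x, v, pbest, gbest, vmax=5.0, c1=2.0, c2=2.0, rng=np.random):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v = np.clip(v, -vmax, vmax)                     # Vmax = 5
    prob = 1.0 / (1.0 + np.exp(-v))                 # sigmoid of the velocity
    x = (rng.random(x.shape) < prob).astype(int)    # resample each bit
    return x, v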
The results of the 100 rounds are presented in Table 5.8.
Table 5.8: Model assessment results - BPSO method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.73 | 73.30 | 0.78 | 0.67 | 0.13 | 13.20 | 0.17 | 0.20 | 0.47 | 8
std        | 0.04 | 4.31  | 0.09 | 0.09 | 0.04 | 3.59  | 0.05 | 0.05 | 0.05 | 2
best round | 0.82 | 82.67 | 0.92 | 0.73 | 0.10 | 9.28  | 0.12 | 0.15 | 0.45 | 7
Figure 5.10: Graphical evolution of the BFSS feature selection process. Evolution of the fish with
the best performance per iteration (above) and evolution of the number of features it selected per
iteration (below).
The plot of the FS process for the best round is shown in Fig. 5.11.
After iteration ~70 the best particle does not change. This behaviour suggests that
early convergence to a local maximum may be occurring.
5.3.5 Comparison of Feature Selection Methods
It is now possible to compare the performances of the several FS algorithms applied to the
sonar database; the results are summarized in Table 5.9. The results of the process described in
section 5.2 without FS (no FS) are also presented; in this case the data were also divided into FS/MA
subsets, but the FS subset was only used to find the threshold for the data.
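The exact threshold criterion is defined earlier in the thesis; as an illustration only, the sketch below picks the threshold on the FS subset by maximizing the product of sensitivity and specificity, which is a stand-in for that criterion.

import numpy as np

def pick_threshold(y_true, y_score, grid=np.linspace(0.05, 0.95, 19)):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    def score(t):
        pred = y_score >= t
        sens = pred[y_true == 1].mean()      # true-positive rate
        spec = (~pred)[y_true == 0].mean()   # true-negative rate
        return sens * spec
    return max(grid, key=score)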
The main performance measures compared here were the mean accuracy (%) of the
cross-validation process and the number of features selected.
Table 5.9: Comparison between the studied FS approaches using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

method | 100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
NO FS  | mean       | 0.74 | 74.66 | 0.77  | 0.71 | 0.11 | 11.51 | 0.17 | 0.19 | 0.48  | 60
NO FS  | std        | 0.03 | 3.59  | 0.054 | 0.05 | 0.02 | 2.53  | 0.04 | 0.05 | 0.041 | 0
NO FS  | best round | 0.83 | 83.57 | 0.87  | 0.79 | 0.10 | 9.61  | 0.08 | 0.15 | 0.50  | 60
SFS    | mean       | 0.73 | 73.19 | 0.77  | 0.69 | 0.12 | 12.05 | 0.18 | 0.21 | 0.48  | 8
SFS    | std        | 0.04 | 4.14  | 0.07  | 0.08 | 0.03 | 2.59  | 0.05 | 0.05 | 0.05  | 4
SFS    | best round | 0.81 | 81.51 | 0.87  | 0.75 | 0.15 | 15.13 | 0.13 | 0.24 | 0.45  | 9
D2BFSS | mean       | 0.74 | 74.59 | 0.78  | 0.71 | 0.13 | 12.67 | 0.18 | 0.20 | 0.48  | 13
D2BFSS | std        | 0.04 | 3.74  | 0.07  | 0.08 | 0.02 | 2.25  | 0.04 | 0.04 | 0.05  | 1
D2BFSS | best round | 0.84 | 83.80 | 0.88  | 0.80 | 0.11 | 11.61 | 0.16 | 0.19 | 0.45  | 15
BFSS   | mean       | 0.72 | 72.85 | 0.78  | 0.67 | 0.12 | 12.00 | 0.18 | 0.22 | 0.46  | 7
BFSS   | std        | 0.04 | 4.27  | 0.08  | 0.10 | 0.02 | 2.14  | 0.05 | 0.05 | 0.05  | 2
BFSS   | best round | 0.85 | 85.62 | 0.95  | 0.76 | 0.09 | 9.29  | 0.09 | 0.16 | 0.43  | 8
BPSO   | mean       | 0.73 | 73.30 | 0.78  | 0.67 | 0.13 | 13.20 | 0.17 | 0.20 | 0.47  | 8
BPSO   | std        | 0.04 | 4.31  | 0.09  | 0.09 | 0.04 | 3.59  | 0.05 | 0.05 | 0.05  | 2
BPSO   | best round | 0.82 | 82.67 | 0.92  | 0.73 | 0.10 | 9.28  | 0.12 | 0.15 | 0.45  | 7
Figure 5.11: Graphical evolution of the BPSO feature selection process. Evolution of the particle
with the best performance per iteration (above) and evolution of the number of features it selected
per iteration (below).
As expected, the FS algorithms using metaheuristic optimization (D2BFSS, BFSS
and BPSO) achieved better results than the SFS and the no-feature-selection approaches.
Regarding the number of features selected, the D2BFSS method performed worse than
the BFSS and the BPSO. This, and the fact that the D2BFSS presented poor convergence
throughout the iterations of the FSS process (section 5.3.2), led to the conclusion that the
decimal-to-binary scheme was not the best approach to reach results similar to the state-of-the-art
algorithms, reinforcing the need to modify the internal mechanisms of the FSS algorithm to deal with
binary inputs, i.e. the BFSS.
It can be concluded that, although the BPSO algorithm selected one feature fewer than
the BFSS, the performance of the BFSS algorithm clearly exceeded that of the BPSO. It is believed
that the collective-volitive operator of the BFSS, which makes the fish contract or
expand at each iteration, is the main mechanism that keeps the algorithm from converging to local
maxima and thereby obtains better results.
5.4-Readmission database
After the processing of the MIMIC II database [15] (section 2.1.2), four different
datasets were available, each including a different gradient of weights for the weighted mean used to
characterise the time series of each of the 22 variables considered per patient.
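The exact weight distribution is defined in section 2.1.2; the sketch below shows a gradient-parameterized weighted mean under the assumption that the weights grow linearly along the time series, with the quoted gradient setting how strongly late samples dominate.

import numpy as np

def weighted_mean(series, gradient):
    w = 1.0 + gradient * np.arange(len(series))   # linearly rising weights
    return float(np.dot(w, series) / w.sum())

weighted_mean([80, 85, 90, 110], 0.9)   # late samples weigh the most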
Each dataset is composed of 726 samples (patients) with 132 features. The goal is to classify
each patient into one of two classes: readmitted or not readmitted within 24 to 72 hours after
discharge. Since the classes are quite imbalanced (12.3% readmitted and 87.7% not readmitted),
the main performance measure to be maximized is the AUC. Due to the high number of features to
be selected and the high number of samples, in comparison to the sonar database, it was decided
to perform 500 rounds of the whole process (FS+MA), always with different partitions of the data
into the FS/MA and train/test sets. The same partitions were, however, used across the different
FS algorithms, making the results comparable.
For each of the four datasets, the SFS, BFSS and BPSO algorithms were applied to the
feature selection. Due to its low overall performance and lack of convergence on the sonar database,
the D2BFSS was discarded.
5.4.1 Sequential Forward Selection
The number of features selected is crucial in health care problems: a lower number of
features corresponds to simpler models, as stated in section 1.1.2. After the analysis of several
articles [15,16] that used fuzzy modelling and feature selection to predict the readmission of ICU
patients, it was decided to set the stop criterion of the SFS algorithm at 10 selected features. After
the stop criterion was met, the set of selected features with the best performance (AUC) was kept.
Table 5.10 shows the results of the 500 rounds for each of the datasets.
The results show that the use of different gradients for the weights of the weighted mean was
not significant: the number of features selected and the corresponding AUC did not reveal a
definitive pattern for the best gradient.
5.4.2 Binary Fish School Search
The stop criterion selected for the metaheuristic algorithms (BFSS and BPSO) was the number
of iterations: 500 iterations with 10 fish/particles. This choice took into consideration
the maximum number of features to be selected.
Once again, a study was performed to select the parameters. The first parameters to be
selected were thres_c and thres_v. The conclusions of the parameter study made for the sonar
database were reused here: only the 5 tests with significantly better overall performance were used to
select the thres_c and thres_v parameters. The configuration of these 5 tests is presented in
Table 5.11.
Table 5.10: Model assessment results - SFS method using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.63 | 56.19 | 0.73 | 0.53 | 0.08 | 17.09 | 0.23 | 0.21 | 0.13 | 8
0.1 | std        | 0.02 | 6.07  | 0.07 | 0.07 | 0.01 | 4.01  | 0.05 | 0.05 | 0.02 | 2
0.1 | best round | 0.68 | 64.57 | 0.73 | 0.63 | 0.12 | 19.88 | 0.13 | 0.22 | 0.15 | 10
0.4 | mean       | 0.63 | 56.79 | 0.72 | 0.54 | 0.08 | 17.28 | 0.23 | 0.21 | 0.13 | 8
0.4 | std        | 0.01 | 6.09  | 0.07 | 0.07 | 0.01 | 4.05  | 0.05 | 0.05 | 0.02 | 2
0.4 | best round | 0.69 | 61.28 | 0.79 | 0.59 | 0.11 | 15.59 | 0.18 | 0.17 | 0.15 | 9
0.6 | mean       | 0.63 | 56.28 | 0.73 | 0.53 | 0.08 | 17.51 | 0.23 | 0.22 | 0.13 | 8
0.6 | std        | 0.01 | 6.03  | 0.07 | 0.07 | 0.02 | 4.01  | 0.05 | 0.05 | 0.02 | 2
0.6 | best round | 0.68 | 51.37 | 0.91 | 0.45 | 0.08 | 9.35  | 0.11 | 0.10 | 0.10 | 10
0.9 | mean       | 0.63 | 56.72 | 0.72 | 0.54 | 0.08 | 17.32 | 0.23 | 0.22 | 0.13 | 8
0.9 | std        | 0.02 | 6.13  | 0.07 | 0.07 | 0.02 | 4.14  | 0.05 | 0.05 | 0.02 | 2
0.9 | best round | 0.68 | 66.43 | 0.71 | 0.65 | 0.10 | 14.59 | 0.14 | 0.16 | 0.15 | 9
Table 5.11: Configuration of the parameters thres_c and thres_v for each of the 5 tests (all with the adaptive threshold).

Test:    1   2   3   4   5
thres_c: 0.5 0.3 0.1 0.9 0.7
thres_v: 0.5 0.3 0.1 0.1 0.3
These parameter combinations correspond to the 5 best configurations of the first
parameter study on the sonar database (tests 12, 13, 14, 15 and 16 in appendix A, with the
adaptive threshold). The results of these 5 configurations, presented in section 5.3.3, were clear
enough that there was no need to test all the configurations again.
As the four datasets differ only in the 22 weighted-mean features, the parameter study was
performed on a single dataset. The detailed results of these 5 tests are given in
appendix B.
The same strategy as in the sonar database was adopted to choose the best test. For each
test, the parameters Wscale (5, 50, 500, 5000, 20000, 100000 and 1000000) and α (0.1,
0.3, 0.5, 0.7 and 0.9) were varied.
From the detailed results it became evident that the indicator that stood out
was the plot; the other indicators proved inconclusive. As an example, Fig. 5.12 illustrates the
graphical representation of three indicators.
As the graphical visualization is a qualitative analysis whose goal is to exclude tests with an
obviously weak performance, it was not possible to conclude from it alone which test performed
best. However, as shown in appendix B, the plot indicator shows that test 3 converges really
well in the graphical evolution when compared with the other tests.
Figure 5.12: Graphical representation of the results of tests 1-5.
The parameters Wscale and α were selected by observing the detailed data of test 3:
Wscale = 500 and α = 0.1. The measures with the greatest influence on this choice were the number of
features selected (low) and the performance of the best model in the FS process (high). The final
parameter to be optimized was stepind; the results are presented in Table 5.12.
The final value of stepind, [0.01 0.001], was selected mainly because of the percentage of
contraction. Recall that a very low percentage of contraction in the volitive operator means no
convergence (low exploitation), while a very high percentage means convergence to local maxima
(low exploration).
With a viable set of parameters selected, the 500 rounds were performed for the 4 datasets
with different gradients for the weights of the weighted mean; the results are presented in Table
5.13.
As with the SFS results, no gradient for the weighted means achieved a significantly better
performance. As an illustrative example, the graphical evolution of the FS process using BFSS on
the dataset with gradient 0.1 is presented in Fig. 5.13.
The graphical evolution in Fig. 5.13 clearly confirms the convergence of the novel
BFSS algorithm formulated in this thesis. In the first ~200 iterations there is a transient dynamic,
in which the BFSS algorithm mainly converges towards a lower number of features. In the remaining
iterations, the algorithm searches for the optimal combination of a low number of features and a
high AUC.
Table 5.12: Results of the study of the parameter stepind ([initial final]); selected set: [0.01 0.001]. (All rows use thres_c = 0.1, thres_v = 0.1, Wscale = 500 and α = 0.1.)

stepind      | FS fitness | features selected | iteration with best model in FS | ACC FS | mean AUC CV | % contraction | same barycentre | plot
[0.5 0.02]   | 0.94 | 4 | 166 | 0.64 | 0.60 | 0.10 | 39 | ++
[0.5 0.001]  | 0.94 | 4 | 166 | 0.64 | 0.60 | 0.11 | 39 | ++
[0.2 0.01]   | 0.92 | 8 | 149 | 0.71 | 0.59 | 0.28 | 42 | ++
[0.2 0.001]  | 0.95 | 1 | 96  | 0.54 | 0.54 | 0.27 | 44 | ++
[0.01 0.001] | 0.93 | 7 | 314 | 0.75 | 0.58 | 0.42 | 80 | ++
Table 5.13: Model assessment results - BFSS method using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.61 | 53.32 | 0.72 | 0.51 | 0.07 | 20.52 | 0.27 | 0.27 | 0.13 | 5
0.1 | std        | 0.03 | 7.99  | 0.10 | 0.10 | 0.02 | 5.79  | 0.07 | 0.08 | 0.03 | 1
0.1 | best round | 0.69 | 67.00 | 0.73 | 0.66 | 0.08 | 8.24  | 0.21 | 0.11 | 0.15 | 6
0.4 | mean       | 0.61 | 53.85 | 0.71 | 0.51 | 0.07 | 20.95 | 0.27 | 0.27 | 0.13 | 5
0.4 | std        | 0.03 | 8.35  | 0.11 | 0.11 | 0.02 | 5.69  | 0.07 | 0.08 | 0.03 | 1
0.4 | best round | 0.68 | 63.16 | 0.76 | 0.61 | 0.13 | 15.21 | 0.20 | 0.17 | 0.15 | 6
0.6 | mean       | 0.61 | 52.85 | 0.72 | 0.50 | 0.07 | 20.98 | 0.27 | 0.27 | 0.13 | 5
0.6 | std        | 0.02 | 8.19  | 0.10 | 0.11 | 0.02 | 5.70  | 0.07 | 0.08 | 0.03 | 1
0.6 | best round | 0.67 | 64.58 | 0.70 | 0.64 | 0.11 | 11.41 | 0.22 | 0.14 | 0.15 | 7
0.9 | mean       | 0.61 | 53.69 | 0.71 | 0.51 | 0.07 | 21.35 | 0.28 | 0.28 | 0.13 | 5
0.9 | std        | 0.03 | 7.73  | 0.10 | 0.10 | 0.02 | 5.74  | 0.08 | 0.08 | 0.03 | 1
0.9 | best round | 0.69 | 55.53 | 0.87 | 0.51 | 0.11 | 14.12 | 0.11 | 0.16 | 0.15 | 6
It is important to recall that the BFSS's internal mechanisms, more specifically the collective-
volitive operator, impose a contraction (if the individual search is successful) or an expansion (if it
is unsuccessful), which results in the oscillation of the graphical evolution. This effect is highly
beneficial for avoiding convergence to local maxima while still maintaining
convergence.
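The thesis's exact binary operator is defined in sections 4.4-4.5; the sketch below only illustrates the contract-on-success / expand-on-failure logic described above, with the weighted barycentre thresholded into a binary point, and should not be read as the algorithm itself.

import numpy as np

def volitive_step(X, w, school_improved, thres_v=0.3, rng=np.random):
    # weight-weighted barycentre of the binary fish positions
    bary = (w @ X) / w.sum() >= thres_v          # thresholded to bits
    move = rng.random(X.shape) < 0.5             # bits allowed to change
    if school_improved:
        X = np.where(move, bary, X)              # contract towards it
    else:
        X = np.where(move, 1 - bary, X)          # expand away from it
    return X.astype(int)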
5.4.3 Binary Particle Swarm Optimization
The BPSO had already been used in feature selection problems with datasets derived from
the MIMIC II database. Even though the Shannon entropy and the weighted mean were introduced
here, the overall characteristics of the database are very similar to those of the datasets used
before [17]. Thus, it was decided to use the parameters referred in [17], with the exception of the
parameter α:
Vmax = 5
Pmut = no
Rmut= 0.5/Nf
Reset xsb=4
Figure 5.13: Graphical evolution of the BFSS feature selection process. Evolution of the fish
with the best performance per iteration (above) and evolution of the number of features it selected
per iteration (below).
With the introduction of the Shannon entropy and the weighted mean, the total number of
features to be selected was higher than in [17]. The parameter α is directly related to the final number
of features selected and is affected by the total number of features, so it was necessary to perform an
analysis to select the value of α; Table 5.14 shows the results of these tests. Once again, this
analysis was made only for the dataset with gradient 0.1.
It was decided to use α = 0.5. The final set of parameters was then used to perform
the 500 rounds for each of the 4 readmission datasets; the results are presented in Table 5.15.
The results of the BPSO algorithm in the feature selection process show that the
gradient of 0.9 achieved a better overall performance than the other gradients, mainly because
of the lower number of features selected. Nonetheless, the differences are not strong,
reinforcing the idea that the use of different gradients is not significant. As an
illustrative example, the graphical evolution of the best model using gradient 0.9 is presented in Fig.
5.14. The evolution shows an explicit convergence of the BPSO algorithm. Unlike the BFSS
algorithm, after the transient dynamic (1st to ~250th iteration), the oscillation of the best particle
per iteration is not sharp. This may be related to a lack of exploration and a consequent lower
performance. In the author's opinion, this is the main disadvantage of the BPSO compared with the BFSS.
Table 5.14: Results of the study of the parameter α; selected value: α = 0.5.

α   | features selected | mean AUC cross-validation
0.1 | 1  | 0.52
0.3 | 6  | 0.61
0.5 | 8  | 0.64
0.7 | 15 | 0.68
0.9 | 21 | 0.64
Table 5.15: Model assessment results - BPSO method using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.62 | 53.60 | 0.73 | 0.51 | 0.07 | 17.53 | 0.24 | 0.23 | 0.14 | 10
0.1 | std        | 0.02 | 7.18  | 0.09 | 0.09 | 0.02 | 5.50  | 0.07 | 0.07 | 0.03 | 3
0.1 | best round | 0.68 | 56.72 | 0.83 | 0.53 | 0.08 | 15.92 | 0.16 | 0.19 | 0.15 | 11
0.4 | mean       | 0.53 | 68.34 | 0.33 | 0.73 | 0.11 | 7.91  | 0.20 | 0.09 | 0.15 | 10
0.4 | std        | 0.04 | 7.05  | 0.14 | 0.10 | 0.03 | 2.29  | 0.06 | 0.02 | 0.02 | 3
0.4 | best round | 0.67 | 68.89 | 0.64 | 0.70 | 0.15 | 8.58  | 0.25 | 0.08 | 0.15 | 11
0.6 | mean       | 0.62 | 53.80 | 0.73 | 0.51 | 0.07 | 17.85 | 0.25 | 0.23 | 0.14 | 9
0.6 | std        | 0.02 | 6.51  | 0.08 | 0.08 | 0.02 | 5.53  | 0.07 | 0.07 | 0.03 | 3
0.6 | best round | 0.69 | 58.44 | 0.84 | 0.55 | 0.12 | 14.69 | 0.17 | 0.16 | 0.15 | 14
0.9 | mean       | 0.59 | 57.66 | 0.62 | 0.57 | 0.08 | 14.60 | 0.23 | 0.18 | 0.14 | 10
0.9 | std        | 0.05 | 9.75  | 0.21 | 0.14 | 0.03 | 6.58  | 0.07 | 0.09 | 0.03 | 3
0.9 | best round | 0.67 | 59.13 | 0.78 | 0.57 | 0.12 | 10.23 | 0.31 | 0.14 | 0.10 | 8
5.4.4 No feature selection
For each of the 4 datasets, the whole process was computed without using an FS algorithm
(i.e. using all available features). The results of the 500 rounds are presented in Table 5.16, and
show that, without feature selection, the performances on the different datasets are very similar.
Table 5.16: Model assessment results - without feature selection, using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.59 | 48.78 | 0.72 | 0.46 | 0.06 | 28.10 | 0.34 | 0.37 | 0.31 | 132
0.1 | std        | 0.02 | 10.24 | 0.12 | 0.13 | 0.02 | 5.02  | 0.07 | 0.07 | 0.04 | 0
0.1 | best round | 0.65 | 64.84 | 0.66 | 0.65 | 0.08 | 19.88 | 0.25 | 0.25 | 0.25 | 132
0.4 | mean       | 0.59 | 49.31 | 0.71 | 0.46 | 0.06 | 0.28  | 0.34 | 0.36 | 0.31 | 132
0.4 | std        | 0.02 | 10.39 | 0.11 | 0.13 | 0.02 | 0.05  | 0.07 | 0.07 | 0.05 | 0
0.4 | best round | 0.66 | 66.89 | 0.66 | 0.67 | 0.11 | 0.25  | 0.27 | 0.32 | 0.25 | 132
0.6 | mean       | 0.59 | 49.84 | 0.70 | 0.47 | 0.06 | 28.49 | 0.34 | 0.37 | 0.31 | 132
0.6 | std        | 0.02 | 9.73  | 0.11 | 0.13 | 0.02 | 4.50  | 0.06 | 0.06 | 0.05 | 0
0.6 | best round | 0.66 | 60.59 | 0.74 | 0.59 | 0.08 | 17.77 | 0.14 | 0.21 | 0.20 | 132
0.9 | mean       | 0.58 | 48.77 | 0.71 | 0.46 | 0.06 | 28.45 | 0.34 | 0.37 | 0.32 | 132
0.9 | std        | 0.02 | 9.80  | 0.11 | 0.13 | 0.02 | 4.82  | 0.06 | 0.06 | 0.05 | 0
0.9 | best round | 0.65 | 67.58 | 0.63 | 0.68 | 0.10 | 11.73 | 0.21 | 0.14 | 0.20 | 132
Figure 5.14: Graphical evolution of the BPSO feature selection process. Evolution of the particle
with the best performance per iteration (above) and evolution of the number of features it selected
per iteration (below).
5.4.5 Discussion
After collecting the results of all the feature selection methods on the readmission datasets,
it can be stated that the use of different gradients for the weights of the weighted mean proved to
be weakly relevant. With the exception of the BPSO results, all the algorithms showed almost no
sensitivity to the different gradients of the linear distribution of weights of the weighted mean
for the 22 physiologic variables.
The performance of the different feature selection algorithms can be compared on the
same dataset. The results of one dataset were collected for this comparison; Table 5.17
summarizes them.
The comparison of the different feature selection algorithms is not as clear-cut as for the
benchmark sonar database, possibly due to the variability introduced by dealing with
real data in high-dimensional spaces. The metaheuristic algorithms nevertheless achieved a
better overall performance than the SFS and the no-feature-selection results.
Once again, the BFSS achieved the best overall results among all the methods used,
with a slightly better mean AUC and fewer features selected.
It is noteworthy that, for all the datasets used, the BFSS (Table 5.13) selected
considerably fewer features than all the other methods, while maintaining the convergence of the
algorithm and slightly better model performance.
It is also relevant that the sensitivity of the best model using the BFSS algorithm is
considerably better than that of the other algorithms; recall that this measure indicates
the true positive rate, i.e. the fraction of patients predicted as readmitted who in fact were.
Table 5.17: Comparison between the studied FS approaches using the readmission dataset with gradient 0.9. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

method | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
NO FS | mean       | 0.58 | 48.77 | 0.71 | 0.46 | 0.06 | 28.45 | 0.34 | 0.37 | 0.32 | 132
NO FS | std        | 0.02 | 9.80  | 0.11 | 0.13 | 0.02 | 4.82  | 0.06 | 0.06 | 0.05 | 0
NO FS | best round | 0.65 | 67.58 | 0.63 | 0.68 | 0.10 | 11.73 | 0.21 | 0.14 | 0.20 | 132
SFS   | mean       | 0.64 | 56.73 | 0.73 | 0.54 | 0.08 | 17.33 | 0.24 | 0.22 | 0.14 | 8
SFS   | std        | 0.02 | 6.14  | 0.08 | 0.08 | 0.02 | 4.15  | 0.06 | 0.05 | 0.02 | 2
SFS   | best round | 0.68 | 66.43 | 0.71 | 0.66 | 0.10 | 14.59 | 0.14 | 0.17 | 0.15 | 9
BFSS  | mean       | 0.61 | 53.69 | 0.71 | 0.51 | 0.07 | 21.35 | 0.28 | 0.28 | 0.13 | 5
BFSS  | std        | 0.03 | 7.73  | 0.10 | 0.10 | 0.02 | 5.74  | 0.08 | 0.08 | 0.03 | 1
BFSS  | best round | 0.69 | 55.53 | 0.87 | 0.51 | 0.11 | 14.12 | 0.11 | 0.16 | 0.15 | 6
BPSO  | mean       | 0.59 | 57.66 | 0.62 | 0.57 | 0.08 | 14.60 | 0.23 | 0.18 | 0.14 | 10
BPSO  | std        | 0.05 | 9.75  | 0.21 | 0.14 | 0.03 | 6.58  | 0.07 | 0.09 | 0.03 | 3
BPSO  | best round | 0.67 | 59.13 | 0.78 | 0.57 | 0.12 | 10.23 | 0.31 | 0.14 | 0.10 | 8
Chapter 6
Conclusion
This work has addressed the problem of feature selection. Firstly, the background of machine
learning was presented, followed by the definition of its application to the classification problem.
Then, the two databases to be used were introduced: the benchmark sonar database and
the readmission datasets derived from the MIMIC II database. Further along, the fuzzy classification
models and the c-means clustering techniques were briefly described.
The modifications that allow the Fish School Search optimization algorithm to be used in
binary input problems were then formulated and applied to the feature selection problem,
with emphasis on the novel Binary Fish School Search. Other state-of-the-art FS algorithms were also
described, namely the SFS and the BPSO approaches.
Finally, the proposed methods were validated on the benchmark sonar database and then
applied to a case study: the prediction of patients' readmission to an ICU within the 24-72h period
following their discharge.
This chapter summarizes the conclusions drawn during the development of this work and
suggests topics for future research.
6.1- Binary Fish School Search
In general, the modification of an existing algorithm is risky: it is very difficult for the
algorithm developer to guarantee in advance that the algorithm will perform better than the state-
of-the-art algorithms, or even to assure its convergence.
There are countless ways to modify algorithms with real encoding schemes so that they can
solve binary input problems. In this thesis, two novel approaches to the FSS algorithm were
presented: the D2BFSS and the BFSS. The decimal-to-binary approach was achieved through a
simple passage from a continuous to a discrete system in the use of the objective function of the
original FSS algorithm. The BFSS was a more complex approach that modified the internal
mechanisms of the FSS algorithm, allowing the procedure itself to manipulate binary inputs.
The results on the benchmark sonar database for the feature selection problem showed that
the simple manipulation of the fitness function, transforming the objective function from the
decimal to the binary system, achieved a low performance and a very low convergence when
compared to the BFSS algorithm. This result was expected, given the simplicity of the D2BFSS, and
reinforced the need to implement internal changes in the FSS algorithm.
The results on the two tested databases showed that the initial development goals proposed
for the BFSS algorithm were achieved: remaining faithful to the original internal mechanisms of the
FSS without losing the meaning of each operator, adding only a small number of parameters,
keeping the modifications simple and understandable, and, mainly, ensuring the convergence of the
algorithm.
However, it is important to note that the number of parameters used in the BFSS process is
high in comparison with other feature selection algorithms, and some limitations of the proposed
algorithm can arise if the parameters are poorly adjusted.
In conclusion, this brand new binary optimization algorithm exceeded expectations, not
only in its convergent evolution but also in its good results in comparison with the other feature
selection algorithms, especially the BPSO. The main contribution to these results is believed to
be the collective-volitive operator, which played a major role in the exploration
process of the algorithm and, consequently, in its capacity to avoid local maxima.
It is also important to note that, although the formulated BFSS algorithm was used here for
the problem of feature selection, it can be applied to any optimization problem with a binary encoding
of the inputs, opening doors to future fields of research.
6.2- Prediction of readmissions
The dimensionality of medical data is typically very high, hence the great importance of
feature selection algorithms in this environment for improving the early detection of medical
problems. The proposed wrapper methods are appropriate for this problem for various reasons,
such as the ability to deal with large databases, the capacity to predict outcomes by means of a
small subset of features, and the possibility of bringing out new sets of variables never considered before.
However, the proposal of using different gradients for the linear distribution of the weights of
the weighted mean, used to describe the time series of the physiologic variables, proved not to be
significant. It was expected that one gradient would stand out, but the results show that the use of
different gradients brought no significant benefit to the performance of the models or to the
number of features selected.
6.3- Future work
In this work, various modifications to the FSS algorithm were proposed; nevertheless, not all
the possible paths were explored. The most promising future research topics on the BFSS algorithm
could be the following:
Use of the BFSS algorithm in other applications: although the presented BFSS algorithm was used
for the feature selection problem, it can be used in any optimization problem with a binary
encoding of the inputs.
Reduction of the number of parameters: fewer parameters mean less time spent on their
selection and, generally, more simplicity. This must be achieved while maintaining the good
performance of the models and the low number of features selected.
Use of different parameter update strategies: instead of keeping the parameters constant in all
iterations of the algorithm, their values could vary over the iterations. Studies [31] have already
shown that using dynamic values for the weights of the fish in FSS algorithms can be beneficial.
Development and refinement of FSS operators: testing other possible paths in the encoding of
the binary inputs.
To prove the full potential of the BFSS algorithm, it is of major importance to:
Validate the results on other benchmark databases: testing databases with different dimensions
and comparing the results with other state-of-the-art algorithms.
Not use the weighted mean in the MIMIC II database: using the same kind of statistical
measures as the latest studies on the prediction of readmission of ICU patients. This would allow
comparing the performance of the BFSS with the state-of-the-art feature selection
algorithms on the readmission problem.
Bibliography
[1] D. C. Angus. Grappling with intensive care unit quality: does the readmission rate tell us
anything?. Critic Care Med, 26(11) pages 1779-1780. 1998.
[2] R. Babuska. Fuzzy Modeling for Control. International Series in Intelligent Technologies. Kluwer
Academic Publishers, Norwell, MA, USA, 1st edition. April 1998.
[3] W. Baigelman, R. Katz, G. Geary. Patient readmission to critical care unit during the same
hospitalization at a community teaching hospital. Intensive Care Med, 9(5), pages 253-256. 1983.
[4] A. M. Bensaid, L. O. Hall, J. C. Bezdek, L. P. Clarke, M. L. Silbiger, J. A. Arrington, R. F. Murtagh.
Validity-guided (re)clustering with applications to image segmentation. In IEEE Transactions on
Fuzzy Systems, pages 112-123. 1996.
[5] J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic
Publishers, Norwell, MA, USA. 1981.
[6] A. L. Blum, R. L. Rivest. Training a 3-node Neural Network is NP-complete. Neural Networks,
pages 117-127, 1992.
[7] A. J. Campbell, J. A. Cook, G. Adey. In Predicting death and readmission after intensive care
discharge. Br J Anaesth, pages 656-662, 2008.
[8] S. Chakrabarti, E. Cox, E. Frank, R. H. G., J. Han, X. Jiang, M. Kamber, S. S. Lightstone, T. P. Nadeau,
R. E. Neapolitan, D. Pyle, M. Refaat, M. Schneider, T. J. Teorey, I. H. Witten. Data Mining: Know It
All. Elsevier. 2009.
[9] G.D. Clifford, W. J. Long, G.B. Moody, P. Szolovits. Robust parameter extraction for decision
support using multimodal intensive care data. Philosophical Transactions, Series A,
Mathematical, Physical, and Engineering Sciences, 367, pages 411-429. 2009.
[10] J. Dayou, N. C. Han, H. C. Mun, A. H. Ahmad. Classification and Identification of Frog Sound
Based on Entropy Approach. IPCBEE, vol. 3, pages 184-187. 2011.
[11] R. O. Duda, P. E. Hart. Pattern Classification and Scene Analysis. Wiley & Sons. 1973.
[12] C. G. Durbin Jr, R. F. Kopel. A case control study of patients readmitted to the intensive care
unit. Critic Care Med, 21, pages 1547-1553. 1993.
[13] A. P. Engelbrecht. Computational Intelligence: An Introduction. New York: John Wiley. 2003.
[14] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth. From data mining to knowledge discovery in
databases. AI Magazine, 17(3), pages 37-54. 1996.
[15] A.S. Fialho, F. Cismondi,S.M. Vieira, S.R. Reti, J.M.C. Sousa, S.N. Finkelstein. Data mining using
clinical physiology at discharge to predict ICU readmissions. Expert Systems with Applications
39. 2012.
[16] A. S. Fialho, F. Cismondi, S. M. Vieira, J. M. C. Sousa, S. R. Reti, M. D. Howell, and S. N. Finkelstein.
Predicting outcomes of septic shock patients using feature selection based on soft computing
techniques. In Proc. of the IPMU 13th International Conference, pages 65-74. 2010.
[17] C. J. A B. Filho. , F. B. L. Neto, A. J. C. C. Lins, A. I. S. Nascimento, M. P. Lima. A Novel Search
Algorithm based on Fish School Behavior. IEEE International Conference on Systems, Man, and
Cybernetics - SMC. 2008.
[18] S. N. Ghazavi,T. W. Liao. Medical data mining by fuzzy modeling with selected features. Artif
Intell Med, 43(3), pages 195-206. 2008.
[19] F. Glover. Heuristics for Integer Programming Using Surrogate Constraints. Decision
Sciences, 8(1), pages 156-166. 1977.
[20] F. Glover. Future Paths for Integer Programming and Links to Artificial
Intelligence. Computers and Operations Research, 13(5), pages 533-549. 1986.
[21] R. P. Gorman, T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify
sonar targets. Neural Networks, 1, pages 75-89. 1988.
[22] I. Guyon, A. Elisseeff. An introduction to variable and feature selection. Journal of Machine
Learning Research, 3, pages 1157-1182. 2003.
[23] I.Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, editors. Feature Extraction: Foundations and
Applications (Studies in Fuzziness and Soft Computing). Springer. August 2006.
[24] I. Guyon, S. Gunn, M. Nikravesh, L. A. Zadeh, editors. Feature Extraction: Foundations and
Applications (Studies in Fuzziness and Soft Computing). Springer. August 2006.
[25] D. C. Hadorn, E. B. Keeler, W. H. Rogers, R. H. Brook. Assessing the Performance of Mortality
Prediction Models. RAND. 1993.
[26] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann. 2000.
[27] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press.
1975.
[28] A. L. Horn, F. Cismondi, A. S. Fialho, S. M. Vieira, J. M. C. Sousa, S. R. Reti, M. D. Howell, S. N.
Finkelstein. Multi-objective performance evaluation using fuzzy criteria: Increasing sensitivity
prediction for outcome of septic shock patients. Proc. of 18th World Congress of the
International Federation of Automatic Control (IFAC). 2011.
[29] G. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on
Information Theory, 14(1), pages 55-63. January 1968.
[30] L. Hyafil, R. L. Rivest. Constructing Optimal Binary Decision Trees is NP-complete. Information
Processing Letters, 5(1), pages 15-17. 1976.
[31] A. G. K. Janecek, Y. Tan. Feeding the fish - weight update strategies for the fish school search
algorithm. ICSI'11 Proceedings of the Second international conference on Advances in swarm
intelligence , Volume Part II. 2011.
[32] R. Kaushal, D. Blumenthal, E. G. Poon, A. K. Jha, C. Franz, B. Middleton, J. Glaser, G. Kuperman, M.
Christino, R. Fernandopulle, J. P. Newhouse, D. W. Bates. The costs of a national health
information network. Annals of Internal Medicine, 143(3), pages 165-173. 2005.
[33] V. Kecman. Learning and Soft Computing Support Vector Machines, Neural Networks, Fuzzy
Logic Systems. MIT Press. 2001.
[34] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi. Optimization by Simulated
Annealing. Science, 220(4598), pages 671-680. 1983.
[35] R. Kopach-Konrad, M. Lawley, M. Criswell, I. Hasan, S. Chakraborty, J. Pekny, B. Doebbeling.
Applying systems engineering principles in improving health care delivery. Journal of General
Internal Medicine, 22, pages 431-437. 2007.
[36] A. Kossiakoff, W. N. Sweet. Systems Engineering Principles and Practice. Wiley-Interscience,
2002.
[37] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley. 2004.
[38] M. A. Mazurowski, P. A. Habas, J. M. Zurada, J.Y. Lo, J. A. Baker, G. D. Tourassi. Training neural
network classifiers for medical decision making: The effects of imbalanced datasets on
classification performance. Neural Networks, 21(2-3):427-436. 2008.
[39] L. F. Mendonça, S. M. Vieira, J. M. C. Sousa. Decision tree search methods in fuzzy modeling and
classification. International Journal of Approximate Reasoning, 44(2), pages106-123. 2007.
[40] Z. Michalewicz, D. Fogel. How to solve it - modern heuristics. 2nd edn, Springer, Heidelberg.
2004.
[41] H. Liu, H. Motoda. Computational Methods of Feature Selection. Chapman and Hall. 2007.
[42] G. E. Moore. Cramming more components onto integrated circuits. Electronics Magazine, 4.
1965.
[43] A. Navot. On the Role of Feature Selection in Machine Learning Thesis. Senate of the Hebrew
University. December 2006.
[44] N. Nilsson. Learning Machines: Foundations of Trainable Pattern-Classifying Systems. McGraw-
Hill. 1965.
[45] A. Partap, B. Singh. Shannon and Non-Shannon Measures of Entropy for Statistical Texture
Feature Extraction in Digitized Mammograms. WCECS, vol. II. 2009.
[46] E. A. Patrick. Fundamentals of Pattern Recognition. Prentice Hall. September 1972.
[47] P. Reid. Building a Better Delivery System: A New Engineering/Health Care Partnership.National
Academies Press. 2005.
[48] A. L. Rosenberg, C. M. Watts. Patients readmitted to intensive care units: A systematic review of
risk factors and outcomes. Chest, pages 492–502, 2000.
[49] A. L. Rosenberg, C. Watts. Patients Readmitted to ICUs: A Systematic Review of Risk Factors and
Outcomes. Critical Care Reviews, 2000.
[50] A. L. Rosenberg, T. P. Hofer, R. A. Hayward, C. Strachan, C. M. Watts. Who bounces back?
Physiologic and other predictors of intensive care unit readmission. Crit Care Med, pages 511–
518. 2001.
[51] M. Saeed, C. Lieu, R. Mark. MIMIC II. a massive temporal icu database to support research in
intelligence patient monitoring. Computers in Cardiology, 29, pages 641-644. 2002.
[52] Y. Saeys, I. Inza, P. Larranaga. A review of feature selection techniques in bioinformatics.
Bioinformatics, 23(19), pages 2507–2517. 2007.
[53] N. Sanchez-Marono, A. Alonso-Betanzos, and M. Tombilla-Sanroman. Filter methods for feature
selection: A comparative study. In Intelligent Data Engineering and Automated Learning: IDEAL
2007, volume 4881 of Lecture Notes in Computer Science, pages 178-187. Springer Berlin /
Heidelberg. 2007.
[54] J. A. Sokolowski. Principles of Modeling and Simulation: A Multidisciplinary Approach. Wiley.
February 2009.
[55] J. M. C. Sousa, U. Kaymak. Fuzzy Decision Making in Modeling and Control. World Scientific
Singapore. 2002.
[56] L. Talavera. An evaluation of filter and wrapper methods for feature selection in categorical
clustering. In Advances in Intelligent Data Analysis VI, volume 3646 of Lecture Notes in
Computer Science, pages 742-742. Springer Berlin / Heidelberg. 2005.
[57] T. Takagi, M. Sugeno. Fuzzy identification of systems and its applications to modelling and
control. IEEE Transactions on Systems, Man and Cybernetics, 15(1) pages 116–132. 1985.
[58] G. Upton, I. Cook. Understanding Statistics. Oxford University Press, page 55. 1996.
[59] L. Yuan, Z.-D. Zhao. A modified binary particle swarm optimization algorithm for permutation
flow shop problem. In Proc. of the International Conference on Machine Learning and
Cybernetics, vol. 2, pages 902-907.
Appendix

A Extended Results - sonar database

In order to choose an appropriate set of parameters for the optimization algorithm, it is
important to perform a coherent study of the parameters to be used, since some limitations can
arise if the parameters are poorly adjusted. With this in mind, the detailed results of the 18 tests
used in the study of the parameters thres_c and thres_v, Wscale and α for the sonar database are
presented below. Tests 1-9 used the non-adaptive threshold and tests 10-18 the adaptive threshold.

Table A.1: Results of the study of the parameters thres_c and thres_v - Tests 1-3.
Columns per row: α | FS fitness | features selected | iteration with the best model in FS | ACC FS | mean AUC cross-validation | % contraction | no. of same barycentre positions | plot
(Test No., thres_c, thres_v, stepind and Wscale are merged cells spanning groups of rows; see the grouping note after the table.)
0.1 0.96 1 196 78.13 63.33 0.47 297 -
0.3 0.92 1 44 78.13 68.08 0.75 299 -
0.5 0.91 5 115 90.63 70.63 0.95 299 -
0.7 0.89 5 56 87.5 67.31 0.98 295 -
0.9 0.93 11 85 93.75 76.12 0.99 299 -
0.1 0.96 1 260 78.13 67.08 0.46 298 -
0.3 0.93 2 229 84.38 76.01 0.74 297 -
0.5 0.91 3 92 87.5 71.70 0.95 297 -
0.7 0.88 6 109 87.5 70.99 0.98 295 -
0.9 0.92 12 65 93.75 70.90 0.98 299 -
0.1 0.96 1 152 78.13 68.10 0.45 297 -
0.3 0.93 4 239 90.63 71.19 0.77 299 -
0.5 0.91 3 59 87.5 68.77 0.96 299 -
0.7 0.88 2 59 84.38 76.34 0.97 295 -
0.9 0.92 16 108 93.75 76.03 0.98 295 -
0.1 0.96 1 175 71.88 61.42 0.46 297 -
0.3 0.93 2 71 84.38 68.49 0.77 297 -
0.5 0.91 5 142 90.63 66.99 0.95 297 -
0.7 0.89 4 95 87.5 67.95 0.98 299 -
0.9 0.95 13 104 96.88 73.01 0.99 299 -
0.1 0.96 1 152 71.88 71.19 0.46 299 -
0.3 0.93 2 175 84.38 75.40 0.78 299 -
0.5 0.91 2 206 84.38 74.94 0.93 299 -
0.7 0.92 3 182 90.63 64.97 0.98 299 -
0.9 0.92 12 20 93.75 72.54 0.99 294 -
0.1 0.96 1 268 71.88 73.94 0.50 277 -
0.3 0.94 3 221 90.63 69.39 0.76 285 -
0.5 0.90 3 112 84.38 75.83 0.93 278 -
0.7 0.90 3 221 87.5 65.95 0.98 284 -
0.9 0.93 10 129 93.75 73.63 0.97 286 -
0.1 0.95 1 200 68.75 69.59 0.48 279 -
0.3 0.92 1 259 78.13 69.81 0.63 284 -
0.5 0.91 3 161 87.5 75.03 0.91 285 -
0.7 0.91 5 43 90.63 65.84 0.97 274 -
0.9 0.95 16 50 96.88 73.01 0.94 276 -
0.1 0.95 1 193 62.5 65.04 0.49 276 -
0.3 0.93 2 276 84.38 77.98 0.68 277 -
0.5 0.91 3 277 87.5 70.90 0.93 286 -
0.7 0.90 7 82 90.63 67.99 0.97 284 -
0.9 0.94 21 9 96.88 70.40 0.96 279 -
0.1 0.96 1 253 78.13 65.48 0.48 280 -
0.3 0.93 2 259 84.38 74.11 0.72 284 -
0.5 0.90 4 56 87.5 67.26 0.94 284 -
0.7 0.89 4 69 87.5 73.94 0.94 284 -
0.9 0.92 15 29 93.75 72.94 0.98 280 -
0.1 0.96 1 213 71.88 62.93 0.46 275 -
0.3 0.92 1 114 78.13 68.57 0.74 285 -
0.5 0.90 3 162 84.38 70.61 0.92 288 -
0.7 0.90 7 111 90.63 69.30 0.98 278 -
0.9 0.90 7 88 90.63 72.30 0.98 282 -
0.1 0.92 3 239 65.63 68.68 0.57 203 -
0.3 0.87 8 103 87.5 73.94 0.67 225 -
0.5 0.84 13 136 90.63 76.74 0.89 250 -
0.7 0.82 22 207 90.63 70.59 0.97 239 -
0.9 0.85 20 8 87.5 74.21 0.97 240 -
0.1 0.89 6 191 75 72.21 0.58 205 -
0.3 0.82 11 228 84.38 67.97 0.76 243 -
0.5 0.81 19 90 93.75 70.30 0.94 255 -
0.7 0.85 22 96 93.75 71.92 0.97 247 -
0.9 0.90 25 202 93.75 75.56 0.92 232 -
0.1 0.89 5 290 65.63 69.10 0.62 207 -
0.3 0.82 10 186 78.13 71.10 0.91 250 -
0.5 0.84 13 51 90.63 75.96 0.90 264 -
0.7 0.85 25 173 96.88 72.40 0.90 242 -
0.9 0.91 21 18 93.75 73.12 0.95 246 -
0.1 0.89 6 256 81.25 71.19 0.54 203 -
0.3 0.85 9 127 84.38 73.74 0.75 260 -
0.5 0.84 14 55 90.63 70.19 0.88 246 -
0.7 0.84 24 41 93.75 73.03 0.92 249 -
0.9 0.97 21 84 100 72.63 0.90 248 -
0.1 0.87 7 120 75 71.99 0.63 228 -
0.3 0.83 11 124 87.5 74.10 0.89 253 -
0.5 0.84 16 103 93.75 73.72 0.87 258 -
0.7 0.88 15 151 93.75 72.52 0.97 248 -
0.9 0.93 27 199 96.88 73.83 0.89 250 -
Study of the parameters: stepind and stepvol [initial and final value], Wscale and α.
Grouping of the rows above (recovered from the merged cells): Test 1 used thres_c = 0.9, thres_v = 0.9; Test 2 used 0.7/0.7; Test 3 used 0.5/0.5; all with stepind = [0.2 0.01]. Within each test, the rows run in blocks of five Wscale values (5, 50, 500, 5000, 20000), each block varying α over 0.1, 0.3, 0.5, 0.7 and 0.9.
(Continuation: detailed results for the remaining tests of the thres_c / thres_v study; same columns as the previous table.)
Columns per row: α | FS fitness | features selected | iteration with the best model in FS | ACC FS | mean AUC cross-validation | % contraction | no. of same barycentre positions | plot
Table A.2: Results of the study of the parameters thres_c and thres_v - Tests 4-7
Same layout as Table A.1. Test 4: (thres_c, thres_v) = (0.3, 0.3); Test 5: (0.1, 0.1); Test 6: (0.9, 0.1); Test 7: (0.7, 0.3); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.3: Results of the study of the parameters thres_c and thres_v - Tests 8-11
Same layout as Table A.1. Test 8: (thres_c, thres_v) = (0.1, 0.9); Test 9: (0.3, 0.7); Test 10: (0.9, 0.9); Test 11: (0.7, 0.7); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.4: Results of the study of the parameters thres_c and thres_v - Tests 12-14
Same layout as Table A.1. Test 12: (thres_c, thres_v) = (0.5, 0.5); Test 13: (0.3, 0.3); Test 14: (0.1, 0.1); stepind and stepvol over [0.2, 0.01], wscale extended to {5, 50, 500, 5000, 20000, 100000, 1000000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.5: Results of the study of the parameters thres_c and thres_v - Tests 15-17
Same layout as Table A.1. Test 15: (thres_c, thres_v) = (0.9, 0.1); Test 16: (0.7, 0.3); Test 17: (0.1, 0.9); stepind and stepvol over [0.2, 0.01], wscale ranging from 5 up to 1000000, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.6: Results of the study of the parameters thres_c and thres_v - Test 18
Same layout as Table A.1. Test 18: (thres_c, thres_v) = (0.3, 0.7); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
B Study of parameters - MIMICII database

We present the detailed results of the 5 tests used in the study of the parameters thres_c and thres_v, wscale and α for the readmission datasets. All 5 tests used the adaptive threshold.
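Since only the adaptive threshold is used here, it is worth recalling the role thres_c and thres_v play in the binary movement operators. The sketch below is an illustrative simplification rather than the exact update rule of the thesis: binary_move flips each bit of a candidate feature subset with probability (1 - thres), and adaptive_threshold is a hypothetical linear schedule that raises the threshold, damping bit flips as the run progresses; the function names and the 500-iteration run length are assumptions.

```python
import random

def adaptive_threshold(thres_start, iteration, n_iter, thres_end=0.9):
    """Hypothetical adaptive schedule: the flip threshold grows linearly
    from its starting value towards thres_end, so bit flips become rarer
    and the search more exploitative late in the run."""
    frac = iteration / max(n_iter - 1, 1)
    return thres_start + frac * (thres_end - thres_start)

def binary_move(position, thres):
    """Illustrative binary movement: flip each bit of a feature mask with
    probability (1 - thres); thres_c and thres_v gate the collective-
    instinctive and collective-volitive movements in the same spirit."""
    return [bit ^ (random.random() > thres) for bit in position]

# Example: one collective movement step on a 10-feature mask, halfway
# through an assumed 500-iteration run with thres_c starting at 0.5.
mask = [random.randint(0, 1) for _ in range(10)]
print(binary_move(mask, adaptive_threshold(0.5, iteration=250, n_iter=500)))
```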
Table B.1: Results of the study of the parameters thres_c and thres_v - Tests 1-2
Study of the parameters stepind and stepvol [initial and final value], wscale and α. Test 1: (thres_c, thres_v) = (0.5, 0.5); Test 2: (0.3, 0.3); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000, 100000, 1000000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Columns as in the sonar tables of Appendix A, with ACC FS and mean AUC reported as fractions.
Table B.2: Results of the study of the parameters thres_c and thres_v - Tests 3-5
Same layout as Table B.1. Test 3: (thres_c, thres_v) = (0.1, 0.1); Test 4: (0.9, 0.1); Test 5: (0.7, 0.3); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000, 100000, 1000000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.