Binary Fish School Search applied to Feature Selection
João André Gonçalves Sargo
Thesis to obtain the Master of Science Degree in
Mechanical Engineering
Examination Committee
Chairperson: Prof. João Rogério Caldas Pinto
Supervisor: Prof. João Miguel da Costa Sousa
Members of the Committee: Prof. Jorge dos Santos Salvador Marques
Prof. Susana Margarida da Silva Vieira
November 2013
To my mother
Abstract
The aim of the present work is to develop efficient feature selection approaches. The
problem of the increasingly large accumulation of data is presented, for which feature
selection emerges as a promising solution. Despite the variety of feature selection methods, few
of them are able to guarantee good performance, especially in high-dimensional databases.
A novel wrapper methodology for feature selection is formulated based on the
Fish School Search (FSS) optimization algorithm, intended to cope with premature convergence.
The FSS was originally designed with a real-valued encoding scheme for searching high-dimensional
spaces, based on the behaviour of fish schools. In order to use this population-based optimization
algorithm in feature selection problems, a binary encoding of the internal mechanisms
of the fish school search is proposed, giving rise to the binary fish school search (BFSS).
The proposed algorithm, as well as other state-of-the-art feature selection methods such
as Sequential Forward Selection (SFS) and Binary Particle Swarm Optimization (BPSO), was
combined with fuzzy modelling in a wrapper approach and tested on two databases: a
benchmark database and an ICU (intensive care unit) database. The purpose of using the latter
was to predict the readmission of ICU patients 24 to 72 hours after discharge. Several
statistical measures were considered to characterise the patient stay, including the Shannon
entropy and the weighted mean.
The comparison of the performance measures and of the number of features selected by the
algorithms used shows promising results for the novel BFSS algorithm.
Keywords: Feature Selection, Binary Encoding, Fish School Search, ICU, Readmissions
Resumo
The present work aims at developing efficient approaches to the feature selection
problem. The issue of the growing amount of accumulated information is discussed, for which
feature selection stands out as a promising solution. Despite the wide variety of available
methods, few are able to guarantee high accuracy, especially in high-dimensional databases.
To this end, a new methodology is formulated here based on the Fish School Search
optimization algorithm, intended to cope with the premature convergence of solutions. This
method, originally developed with a real-valued encoding scheme, searches high-dimensional
spaces based on the behaviour of fish schools. In order to use this optimization algorithm in
feature selection problems, a binary encoding scheme for its internal mechanisms is proposed,
giving rise to the Binary Fish School Search (BFSS).
The algorithm proposed here, as well as other methods, the Sequential Forward Selection
and the Binary Particle Swarm Optimization, were combined with fuzzy modelling in a wrapper
approach and tested on two databases: a benchmark database and an intensive care unit (ICU)
database. The latter was used to predict the readmission of patients after discharge. Several
statistical measures were considered to characterise their stay, including the Shannon entropy
and the weighted mean.
The results obtained, through the comparison of the accuracy and of the number of
features selected by the various algorithms used, show promise for the novel BFSS algorithm.
Keywords: Feature Selection, Binary Encoding, ICU, Readmissions
Acknowledgements
My first words of appreciation go to my supervisors, Professor João Sousa and Dr. André
Fialho, for all the help and support, and for the opportunity to work in this field of research. A
special thanks to Professor Susana Vieira, for all the support and guidance when I needed it
most; her expert knowledge in the field of this work was a great contribution. I could not fail
to thank Professor Carmelo J. A. Bastos Filho and Débora N. de Oliveira Nascimento for their help
and for their readiness in clarifying doubts.
I would like to thank my colleagues at the Health Care IST workgroup for their help, their
support and the inspirational breaks.
To my parents, for all the love, hard work and dedication, and for always being with me. A
special thanks to my mother; her sacrifice made all this possible.
I would also like to thank all my friends, especially António Ramos, Magno Mendes
and Renato Ribeiro, for all the willingness, happiness and moments of true friendship. A
special thanks to Marta Ferreira, for her encouragement and inestimable support throughout this
work and until the very last moment.
Contents
Abstract
Resumo
Acknowledgements
List of Figures
List of Tables
Notation
1 Introduction
  1.1 Knowledge Discovery
    1.1.1 Principles of Feature Selection
    1.1.2 Modeling
  1.2 Prediction of readmissions
  1.3 Contributions
  1.4 Outline
2 Knowledge Data Discovery
  2.1 Data
    2.1.1 Benchmark database – Sonar
    2.1.2 MIMIC II database
  2.2 Modeling
    2.2.1 Fuzzy modeling
    2.2.2 Clustering
    2.2.3 Performance Measures
  2.3 Feature Selection
    2.3.1 Sequential Forward Selection
    2.3.2 Binary Particle Swarm Optimization
3 Fish School Search
  3.1 Original Fish School Search
    3.1.1 Search Problems and Algorithms
    3.1.2 FSS Computational Principles
    3.1.3 Overview of the algorithm
    3.1.4 The Feeding Operator
    3.1.5 The Swimming Operators
    3.1.6 Individual Movement
    3.1.7 Collective-Instinctive Movement
    3.1.8 Collective-Volitive Movement
    3.1.9 FSS Cycle and Stop Conditions
    3.1.10 Illustrative Example
  3.2 Decimal to Binary Fish School Search
    3.2.1 Objective function
4 Binary Fish School Search
  4.1 Encoding
  4.2 Initialization
  4.3 Individual Movement
  4.4 Collective-Instinctive Movement
  4.5 Collective-Volitive Movement
  4.6 Objective function
  4.7 Parameters
  4.8 BFSS Cycle and Stop Condition
5 Results
  5.1 Description of the approach
  5.2 Optimization Parameters
  5.3 Sonar database
    5.3.1 Sequential Forward Selection
    5.3.2 Decimal to Binary Fish School Search
    5.3.3 Binary Fish School Search
    5.3.4 Binary Particle Swarm Optimization
    5.3.5 Comparison of Feature Selection Methods
  5.4 Readmission database
    5.4.1 Sequential Forward Selection
    5.4.2 Binary Fish School Search
    5.4.3 Binary Particle Swarm Optimization
    5.4.4 No feature selection
    5.4.5 Discussion
6 Conclusion
  6.1 Binary Fish School Search
  6.2 Prediction of readmissions
  6.3 Future work
Bibliography
Appendix
  A Extended Results – sonar database
  B Study of parameters – MIMIC II database
List of Figures
1.1 Knowledge discovery process
2.1 Patient selection flowchart
2.2 Number of patients vs. number of samples
2.3 Representation of the different gradients for the weighted mean
2.4 Illustrative diagram of the inputs for the classification models
2.5 Example of ROC curve
2.6 Decoding process for D2BFSS
3.1 FSS – Individual movement
3.2 FSS – Collective-instinctive movement
3.3 FSS – Collective-volitive movement
3.4 FSS – Example
3.5 FSS – Fish school evolution
4.1 Parameters of FSS, D2BFSS and BFSS
5.1 Diagram of the feature selection process
5.2 Evolution of SFS – sonar database
5.3 Evolution of D2BFSS – sonar database
5.4 Representation of the study of parameters – BFSS – sonar database
5.5 Representation of the study of parameters – BFSS – sonar database (continuation)
5.6 Representation of the study of parameters – BFSS – sonar database (continuation)
5.7 Representation of the study of parameters – BFSS – sonar database (continuation)
5.8 Representation of the study of parameters – BFSS – sonar database (continuation)
5.9 Representation of the study of parameters – BFSS – sonar database (continuation)
5.10 Evolution of BFSS – sonar database
5.11 Evolution of BPSO – sonar database
5.12 Representation of the study of parameters – BFSS – readmission database
5.13 Evolution of BFSS – readmission database
5.14 Evolution of BPSO – readmission database
List of Tables
2.1: List of physiological variables considered
2.2: Physiological limits
2.3: Summary of the number of samples and patients
2.4: Number of patients readmitted and not readmitted
3.1: FSS – example
5.1: Model assessment results – SFS – sonar database
5.2: Results of the study of the parameters step_ind and step_vol – D2BFSS
5.3: Results of the study of the parameters w_scale and α – D2BFSS
5.4: Model assessment results – D2BFSS – sonar database
5.5: Configuration of the parameters thres_c and thres_v tested – BFSS
5.6: Results of the study of the parameter step_ind – BFSS
5.7: Model assessment results – BFSS – sonar database
5.8: Model assessment results – BPSO – sonar database
5.9: Comparison between FS algorithms – sonar database
5.10: Model assessment results – SFS – readmission datasets
5.11: Configuration of the parameters thres_c and thres_v – BFSS
5.12: Results of the study of the parameter step_ind – BFSS
5.13: Model assessment results – BFSS – readmission datasets
5.14: Results of the study of the parameter α – BPSO
5.15: Model assessment results – BPSO – readmission datasets
5.16: Model assessment results – no FS – readmission datasets
5.17: Comparison between FS algorithms – readmission dataset
A.1: Results of the study of the parameters thres_c and thres_v – Tests 1-3
A.2: Results of the study of the parameters thres_c and thres_v – Tests 4-7
A.3: Results of the study of the parameters thres_c and thres_v – Tests 8-11
A.4: Results of the study of the parameters thres_c and thres_v – Tests 12-14
A.5: Results of the study of the parameters thres_c and thres_v – Tests 15-17
A.6: Results of the study of the parameters thres_c and thres_v – Test 18
B.1: Results of the study of the parameters thres_c and thres_v – Tests 1-2
B.2: Results of the study of the parameters thres_c and thres_v – Tests 3-5
Notation
Symbols
N_f – Number of features selected
N_t – Total number of features
N_s – Number of data samples
y – System output
x – Data sample
X – Database
ȳ – Model output
Υ – Set of possible labels or classes
υ – Class
l – Number of existing labels
D – Decision region
τ – Threshold
w_ij – Weight factor
σ_j – Activation function
A_i – Fuzzy set of the antecedent
B_i – Fuzzy set of the consequent
R_i – Fuzzy rule
R – Number of rules
μ_A(x), μ_B(y) – Membership functions
μ_R(x, y) – Fuzzy relation
f_i – Mapping function of the i-th rule
β_i – Degree of fulfilment of rule i
v_i – Cluster centre
J – Objective function
U – Fuzzy partition matrix
V – Matrix of cluster prototypes
ρ_i – Constant that controls the volume of cluster i
π_i – Parameter vector of the i-th rule
θ – Matrix of model parameters
ψ – Weighted vector of inputs
Ψ – Matrix of vector inputs
P(x, y) – Probability distribution function
R[f] – Risk or expected loss of f
F – Feature space
φ – Non-linear mapping function
x_0 – Subset of features
(x_0 | x) – Relevance index
p – Parameter value
g_i – Value of bit i
u – Number of bits used
Φ – Disturbance associated with real data
O – Objective
C_i(v) – Constraint function
N_e – Number of misclassifications
N_s – Total number of tested samples
$(x_i) – Cumulative relative fitness
p_tour – Tournament selection probability
p_c – Crossover probability
p_u – Uniform crossover probability
p_mut, r_mut – Mutation probability
S – Particle swarm
V_i – Visibility sphere of particle i
v_i – Particle velocity
x_i – Particle position
a_i – Particle acceleration
Δt – Time step
o_i – Particle cohesion
l_i – Particle alignment
s_i – Particle separation
C_o, C_l, C_s – Acceleration weighting factors
S(v_ij) – Logistic function of the velocity
v_max – Particle velocity threshold
w_up – Upper bound weight
w_low – Lower bound weight
step_ind – Individual step parameter
step_vol – Volitive step parameter
t – Iteration
g_actual – Current iteration
g_total – Total number of iterations
thres_c – Collective threshold parameter
thres_v – Volitive threshold parameter
Acronyms
ACC – Accuracy
AUC – Area Under the Curve
BFSS – Binary Fish School Search
BPSO – Binary Particle Swarm Optimization
D2BFSS – Decimal to Binary Fish School Search
FCM – Fuzzy C-Means
FM – Fuzzy Modelling
FN – False Negatives
FP – False Positives
FS – Feature Selection
FSS – Fish School Search
GA – Genetic Algorithm
ICU – Intensive Care Unit
IOM – Institute of Medicine
IQR – Interquartile Range
KDD – Knowledge Data Discovery
MA – Model Assessment
NAE – National Academy of Engineering
NN – Neural Networks
NP-hard – Non-deterministic Polynomial-time hard
PSO – Particle Swarm Optimization
ROC – Receiver Operating Characteristic
SA – Simulated Annealing
SFS – Sequential Forward Selection
TN – True Negatives
TS – Takagi-Sugeno
TP – True Positives
Chapter 1
Introduction
Over the past 30 years, the continuing development and application of systems engineering
methods has enabled unprecedented growth in the manufacturing, logistics, distribution, and
transportation sectors of our economy [35]. Vast business organizations (e.g. airline companies, chains
of department stores, large manufacturing companies) could not properly operate in the current
business environment without the extensive use of various engineering tools for the design, analysis and
control of complex production and distribution systems [36]. However, even with the emergence and
development of new engineering techniques over the past years, some industries have barely begun
to take advantage of systems engineering tools, which means that a great number of their potential
applications remains unexplored.
A good example is the health care delivery system, one of the most technologically intensive
and data-rich industries [32]. According to a report from the National Academy of Engineering (NAE)
and the Institute of Medicine (IOM), the application of systems engineering tools could play a crucial
role in solving the current crisis in the very complex health care system [47].
With the computerization of many sectors and with the advances in data collection tools, our
capabilities of both generating and collecting data have been increasing rapidly in the last several
decades. This explosive growth in stored data has generated an urgent need for new techniques and
automated tools that can intelligently assist us in transforming the vast amounts of data into useful
information and knowledge [26].
This chapter begins with a brief overview of methods currently used in knowledge discovery in
databases. Then, the case study is introduced: the prediction of readmissions in an intensive
care unit (ICU). At the end of the chapter, the contributions and the outline of this work are presented.
1.1 Knowledge Discovery
The scale of the information age is hard to grasp. An average person nowadays receives more
information in a day than someone who lived 100 years ago would receive in a lifetime. With
information growing at this speed, finding accurate data is becoming more important than the
data itself [42].
The traditional method of turning data into knowledge relies on manual analysis and
interpretation. In the health care industry, this form of manual probing of a data set is slow, expensive
and highly subjective. Out of the urgent need for a new generation of computational techniques and
tools to assist humans in extracting useful information (knowledge) from a fast-growing volume of
data, a methodology was created: Knowledge Data Discovery (KDD), first introduced by Fayyad in 1996
[14].
The KDD process can be formally defined as a non-trivial process of identifying valid, novel,
potentially useful, and ultimately understandable patterns in large amounts of data.
The KDD process can be decomposed into five main steps, as illustrated in Fig. 1.1:
1. Data acquisition – The process of acquiring and storing data.
2. Data Preprocessing – Consists of applying proper techniques that allow the improvement of the
overall quality of the data. Includes processing of noise/outliers, correction of missing values,
and/or alignment of data sampled at different frequencies.
3. Feature Selection – Consists of finding useful features (variables) to represent the data and
discarding the non-relevant ones and those containing redundant information.
4. Modeling – Refers to the process of combining methods from computational intelligence and/or
statistics to extract patterns from data sets. In this work, classification models were used, which
identify to which of a set of categories (classes) a new observation belongs, on the basis
of a training set of observations whose category membership is known.
5. Interpretation - The process of evaluating the discovered knowledge with respect to its validity,
usefulness, novelty, and simplicity. External expertise may be required in this step.
All of the five steps described are equally crucial, and the process is iterative, i.e. multiple loops
can occur between any steps of the KDD method.
Figure 1.1: Knowledge discovery process.

In real-world systems, the selection of a low number of features that consistently describe the
problem is usually time consuming and, in many cases, impossible to achieve with a greedy approach.
In this work, the focus is turned to the feature selection stage of the KDD method. A novel approach is
proposed for the optimization algorithm Fish School Search (FSS), in order to use this population-based
algorithm in feature selection problems.
1.1.1 Principles of Feature Selection
The addition of more features (variables) is expected to increase the accuracy of the model
(classifier). However, for some classifiers an increase in input dimensionality decreases the reliability of
statistical parameter estimations and may, consequently, result in a decrease in the classification
accuracy [43]. This is known as the Hughes effect [29], the so-called curse of dimensionality, which
postulates that the classification accuracy will decrease after a certain feature-set size is reached unless
the number of training samples is proportionally increased [43]. The Hughes effect is therefore more
likely to be encountered when small training sets are used and the input dimensionality is increased.
The field of feature selection has been the object of extensive research in recent years [41]. This is
explained by the potential benefits of reducing data dimensionality. It can greatly
improve data visualization and understanding, facilitating knowledge discovery. Furthermore, less
information needs to be measured and stored, leading to a reduction in equipment and consequently
cutting unnecessary costs. From the clinical point of view, this process may bring to light new variables
that had not previously been considered relevant for a given medical problem.
Feature selection algorithms can be grouped into four categories: filters, wrappers, hybrids and
embedded [52, 23]. Filter methods rely on general characteristics of the data to evaluate and select
feature subsets without involving any mining algorithm. Some examples include using measurements
of entropy, variance, correlation or mutual information of single and multiple variables [53]. Wrappers
require one predetermined mining algorithm and use its performance as the evaluation criterion. They
search for the features best suited to improving the performance of the mining algorithm, but they also
tend to be more computationally expensive than filters [56]. Some of the most commonly used
wrapper methods include best-first, branch-and-bound, simulated annealing, genetic algorithms,
and forward selection or backward elimination, but the list is considerably longer and continuously
growing. The present thesis introduces a new wrapper method, the Binary Fish School Search (BFSS)
algorithm.
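To make the wrapper scheme concrete, the sketch below implements a minimal wrapper loop in Python. It is only an illustration under stated assumptions: candidate subsets are generated by plain random search (any of the strategies named above would replace this step), the data are NumPy arrays, and induce_and_score is a hypothetical function standing in for training the predetermined mining algorithm and returning its test performance.

    import random

    def wrapper_search(X_train, y_train, X_test, y_test, induce_and_score,
                       n_iter=100, seed=0):
        # Minimal wrapper FS: evaluate random binary masks with the mining
        # algorithm itself and keep the best-scoring subset.
        rng = random.Random(seed)
        n_features = X_train.shape[1]          # assumes NumPy arrays
        best_mask, best_score = None, float("-inf")
        for _ in range(n_iter):
            mask = [rng.random() < 0.5 for _ in range(n_features)]
            if not any(mask):                  # empty subsets are skipped
                continue
            cols = [j for j, keep in enumerate(mask) if keep]
            # The wrapper's evaluation criterion is the performance of the
            # predetermined mining algorithm on the candidate subset.
            score = induce_and_score(X_train[:, cols], y_train,
                                     X_test[:, cols], y_test)
            if score > best_score:
                best_mask, best_score = mask, score
        return best_mask, best_score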
Hybrid models attempt to take advantage of the two previous types of models by exploiting
their advantages in different stages [22]. First, a filter decreases the dimensionality of data by
eliminating features according to the specified criteria. Then, wrappers select relevant features
according to the mining objective.
Finally, embedded methods differ from the previous feature selection methods in the way
feature selection and learning interact. In contrast to filter and wrapper approaches, the learning and
feature selection parts cannot be separated in embedded methods [23]. Examples of embedded
methods for feature selection include decision trees and random multinomial logit.
Many problems related to Feature Selection (FS) have been shown to be NP-hard, and finding
the optimal set of features is usually intractable [6, 30]. Thus, the search for the most predictive feature
subsets can be seen as an optimization problem. Metaheuristics are general upper-level (meta)
algorithmic techniques that can be used as guiding strategies in the design of heuristics to solve
specific optimization problems. These techniques are capable of finding acceptable solutions, within
reasonable time, by using experience-based techniques or through guided search, but do not
guarantee that the optimum will be found. Popular metaheuristics for combinatorial problems
include simulated annealing (SA) by Kirkpatrick [34], genetic algorithms (GA) [27], Scatter Search
[19], Tabu Search [20], and Particle Swarm Optimization (PSO).
The present thesis resorts to: 1) a new FS wrapper method based on the recent Fish School
Search metaheuristic, inspired by fish school behaviour [17]; 2) a PSO algorithm
modified to be used in FS problems [15]; and 3) a wrapper method based on tree-search feature
selection, the Sequential Forward Selection (SFS). The three algorithms were tested on a benchmark
database before being applied to a problem in the health care system.
1.1.2 Modeling
Classification modeling, used in the data mining process, can be defined as the application of
discovery algorithms that produce a particular enumeration of patterns/models over the data.
The purpose of a model is to mimic how a particular object or phenomenon will behave under
particular conditions. It can be used for testing, analysis or training, whenever real-world
systems or concepts can be represented by a model [54].
Machine learning refers to a group of mathematical modeling techniques that are capable of
automatically acquiring and integrating knowledge based on empirical data, such as data from sensors
or databases. This area has been extensively studied with numerous successful applications across a
wide range of fields (a very broad description of application areas and examples can be found in [33]).
The purpose of using machine learning techniques (or learning machines) is to reproduce the
human learning capabilities, namely the ability to recognize complex patterns and make intelligent
decisions based on data.
Learning machines are widely used in classification, regression, recognition and prediction
problems. There are many possible applications for these modeling techniques, ranging from
engineering applications in robotics, fault-tolerant control and pattern recognition (e.g. speech
recognition, handwriting recognition) to medical applications (e.g. diagnosis, prognosis) [33].
Nonetheless, in this work, the main interest is the capability of machine learning to discover and
classify patterns in high-dimensional databases.
Pattern recognition [11, 44, 46] addresses the problem of assigning labels (classes) to objects
(or samples), each sample being composed of a set of features (or attributes). In order to better
understand pattern recognition, this subject has been divided into two major types of problems [37]:
unsupervised and supervised learning.
In the unsupervised category, the problem is to understand whether there are groups in the
data, and what characteristics make the objects similar within a group and different across
groups. In contrast, in supervised learning, each data sample already has a pre-assigned label, and the
task consists of training a classifier to differentiate between labels.
In this work, it was decided to use a non-linear machine learning technique: Fuzzy Modelling
(FM). This method is considered suitable for the demanding problem of pattern recognition, since it
can, theoretically, approximate any multivariate nonlinear function [37]. The main advantages of this
method are the following:

- Efficient tool for embedding human (structured) knowledge into useful algorithms;
- Applicable when a mathematical model is unknown or impossible to obtain;
- Operates successfully under a lack of precise sensor information;
- Useful at the higher levels of hierarchical control systems;
- Appropriate tool for generic decision-making processes;
- Transparent, non-crisp model;
- Interpretation in the form of rules and logical connectives. From the medical point of view,
these rules provide an additional means of validating the fuzzy classifier against clinicians'
knowledge of the system.

The main disadvantages are:

- Experts may have problems structuring their knowledge to fit the structure of the model;
- Experts sway between extreme poles: either overly confident in their field of expertise, or
tending to hide their knowledge;
- Model complexity increases exponentially with the number of features;
- Learning is highly constrained, and typically more complex than in other models, such as
neural networks (NN).
1.2 Prediction of readmissions
Patients readmitted to an intensive care unit during the same hospitalization have an increased
length of stay, higher costs and an increased risk of death. Previous studies have demonstrated overall
readmission rates of 4-14% [3, 48], of which nearly a third can be attributed to premature discharge
from the critical care setting [3, 12]. It is also documented that the length of stay for readmitted
patients is at least twice as long as that of patients discharged from the ICU but not readmitted, and
that hospital death rates are 1.5 to almost 10 times higher among ICU readmissions [49].
Increasing pressure on the management of care and resources in ICUs is one explanation for strategies
seeking to rapidly free ICU beds. Faced with this scenario, a clinician may elect to discharge a patient
currently in the ICU, who has already had the benefits of stabilization and intensive monitoring, to
make room for more acute patients waiting in the emergency department, exposing the
transferred patient to the risk of readmission in the short term. Moreover, given the morbidity and
mortality issues around readmission, the Centers for Medicare & Medicaid Services have
already reduced funding for specified avoidable conditions, and it is quite possible that avoidable
readmissions to an ICU will receive attention in the future as well.
Previous studies [7] have examined different variables assessed at discharge, but
the resulting predictive models performed only slightly better than models based upon the gold-standard
method, APACHE II.
Thus, this work addresses the problem of readmission to an ICU, its goal being to
predict the readmission of ICU patients within 24-72 hours after discharge. A data mining
approach was applied to a real-world database, MIMIC II, combined with fuzzy modelling and three
different feature selection algorithms: the Sequential Forward Selection, the Binary Particle Swarm
Optimization, and the novel Binary Fish School Search formulated here. In this context, 22 physiological
variables acquired during the stay of real patients in an ICU were selected. Statistical measures were
used to describe each patient stay: the mean, the standard deviation, the maximum, the minimum,
the Shannon entropy and the weighted mean, which was tested with different weights.
1.3 Contributions
In this work, the problem of Feature Selection in real-world databases is addressed. The main
contributions of this work are:
Introduction and formulation of the Binary Fish School Search algorithm, a novel algorithm for
feature selection derived from the Fish School Search optimization algorithm. Originally, this
algorithm was presented as a multidimensional, real-encoded algorithm [17]; it is modified here
to solve problems with binary inputs and then applied to feature selection problems;

Use of new types of features (weighted mean and Shannon entropy) to predict the readmission
of ICU patients during the 24 to 72 h period that follows discharge;

Comparison of the FS results on two real databases using the three feature selection
algorithms: sequential forward selection, binary particle swarm optimization and the binary fish
school search.
1.4 Outline
In chapter 2, an overview of the knowledge data discovery stages studied in this work is
presented. It begins with the description of the two databases addressed and the necessary
preprocessing of the data. Then, the fuzzy modelling technique is presented, together with the
performance measures considered in this work. Finally, a broad description of wrapper methods is
given, along with a description of the state-of-the-art feature selection algorithms used in this work.

In chapter 3, the original fish school search algorithm is presented in detail. Its internal
mechanisms are described, and an illustrative example is given to consolidate the description. Finally,
the first approach to transforming the FSS algorithm to solve feature selection problems is presented:
the decimal to binary fish school search.
Chapter 4 introduces the goals and the detailed formulation of the binary fish school search.

Chapter 5 presents the results of the wrapper methods that combine the studied machine
learning techniques with the introduced search algorithms. The chapter begins with the outline
of the approach and the definition of the parameters used to evaluate the behaviour
of the formulated algorithms. Next, the tests to select the parameters of the optimization algorithms
are presented. Finally, the methods are tested and compared over the two databases.

At last, in Chapter 6, the results of this work are summarized and conclusions are drawn.
Furthermore, promising areas for future research are presented.
Chapter 2
Knowledge Data Discovery
2.1 Data
In this work, two databases were used: a benchmark database and a health care database, the
MIMIC II. The benchmark database is employed to ascertain the quality of the developed FS
algorithms, i.e., to verify whether the FS algorithms are capable of selecting a small feature subset
with good informative potential. After validation on the benchmark database, the FS algorithms
were applied to a health care database, the MIMIC II, for a readmission prediction problem.
In this chapter, the selected benchmark database is presented first, followed by the health care
database and the preprocessing it required.
2.1.1 Benchmark database – Sonar
The choice of a proper group of benchmark databases is very important to adequately validate
the implementation of an algorithm. These databases should allow the algorithm designer to test the
algorithms according to the predefined performance measures and to compare the results with those
of state-of-the-art methods.
The sonar database comprises 208 real samples divided into two labels. A
data sample is a set of 60 features with values ranging from 0.0 to 1.0. Each of these features represents
the energy within a particular frequency band, integrated over a certain period of time. The label
associated with each record indicates whether the sonar signal bounced off a metal
cylinder (111 samples) or off a roughly cylindrical rock (97 samples). The task at hand was to
discriminate between these two classes [21].
This database was developed by Gorman and Sejnowski in their study of the classification of
sonar signals using artificial neural networks [21].
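As an aside, a minimal sketch of how this database can be loaded for such experiments follows, assuming a local copy of the UCI file sonar.all-data (60 comma-separated numeric features per row, followed by the label 'M' for metal cylinder or 'R' for rock); the file name and column layout are assumptions about the standard UCI distribution:

    import pandas as pd

    # Assumed local copy of the UCI sonar data: 60 features + 1 label column.
    data = pd.read_csv("sonar.all-data", header=None)
    X = data.iloc[:, :60].to_numpy()          # 208 x 60 feature matrix
    y = (data.iloc[:, 60] == "M").to_numpy()  # True for metal cylinder
    print(X.shape, int(y.sum()))              # expected: (208, 60) and 111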
2.1.2 MIMIC II database
The MIMIC II database [51] is a large database of ICU patients admitted to the Beth Israel
Deaconess Medical Center, collected from 2001 to 2006. The MIMIC II database currently comprises
25,549 patients, of which 19,075 are adults (> 15 years old at the time of admission). For each patient,
several samples of physiological variables were stored throughout their stay.
In this work, a previously developed dataset (first presented in [15]) was used, including only
adult patients (> 15 years) who were ICU inpatients for at least 24 h and were readmitted to any ICU of
the same medical centre between 24 and 72 h after discharge. This interval is often referred to as an
early readmission [43]. The choice of 24 h as the lower bound for the readmission time window is
related to how MIMIC II is structured: patients readmitted to the ICU less than 24 h after their discharge
are considered to belong to the same ICU stay. The choice of 72 h as the upper bound for the
readmission time window was based on previous works [50] and on the suggestions of local clinical
intensivists. All included patients were also required to have at least one measurement of each of the
22 variables shown in Table 2.1.
These variables were selected based on the hypothesis that a good predictive value could be achieved
using a few physiological variables and taking into account the following directives:
i. The variables had to be easily and/or routinely assessed in the 24 h before discharge. A balance
had to exist in the number of selected variables, given that it affects the number of patients that
form the dataset, i.e. the more variables defined, the fewer the patients likely to have all of them
collected at the same time;
ii. Selecting a high number of variables may bias the dataset towards selecting patients having similar
conditions that required their specific measurement/testing;
iii. The variables chosen should be independent, with minimal correlation.
Exclusion criteria included patients who died during the ICU stay.
As with other real-world databases, a few preprocessing steps were necessary to improve the
quality of the raw MIMIC II data. In order to deal with variables collected with different
sampling periods, similarly to [15], a template variable was used. This process aligned all samples to
the same points in time as a designated template variable. Heart rate was chosen as the template
variable since it was one of the most frequently measured variables and thus introduced
fewer artifacts in the data. With regard to missing data, in general, ICU data can be missing either
because they are perceived to be irrelevant for the current clinical problems (and are thus not recorded),
or because exogenous interventions or endogenous activities have rendered the data useless [58].
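As an illustration of this alignment step, the following sketch (an assumption of how it can be done with pandas, not the exact procedure of [15]) aligns a sparsely sampled laboratory variable to the heart-rate timestamps of one patient, carrying the most recent laboratory value forward to each template sample:

    import pandas as pd

    # Hypothetical per-patient series: heart rate is the template variable.
    hr = pd.DataFrame({
        "time": pd.to_datetime(["2005-01-01 00:00", "2005-01-01 01:00",
                                "2005-01-01 02:00"]),
        "heart_rate": [82, 88, 85],
    })
    lab = pd.DataFrame({
        "time": pd.to_datetime(["2005-01-01 00:40"]),
        "sodium": [139.0],
    })

    # For each template timestamp, take the last lab value at or before it.
    aligned = pd.merge_asof(hr, lab, on="time", direction="backward")
    print(aligned)   # sodium is NaN at 00:00, 139.0 at 01:00 and 02:00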
Table 2.1: List of physiological variables considered from MIMIC II (according to [15]).

Type of variables     Variable name (units)
Monitoring signals    Heart rate (beats/min)
                      Respiratory rate (breaths/min)
                      Temperature (ºC)
                      SpO2 (%)
                      Non-invasive arterial blood pressure (systolic) (mmHg)
                      Blood pressure (mean) (mmHg)
Laboratory tests      Red blood cell count (cells x 10^3/µL)
                      White blood cell count (cells x 10^3/µL)
                      Platelets (cells x 10^3/µL)
                      Hematocrit (%)
                      BUN (mg/dL)
                      Sodium (mg/dL)
                      Potassium (mg/dL)
                      Calcium (mg/dL)
                      Chloride (mg/dL)
                      Creatinine (mg/dL)
                      Magnesium (mg/dL)
                      Albumin (g/dL)
                      Arterial pH
                      Arterial base excess (mEq/L)
                      Lactic acid (mg/dL)
Other                 Urine output (mL/h)
Data missing for an intentional reason (e.g. the patient is transported out of the ICU for an
imaging scan) was considered non-recoverable and thus deleted. On the other hand, data missing for
some unintentional reason (e.g. a sensor comes off the patient's chest) was considered recoverable, and
the last available value was used to impute values to these segments.
The Interquartile Range (IQR) method was used to deal with outliers. This method
measures the statistical dispersion of the data by dividing it into quartiles; the IQR is a trimmed
estimator and a robust measure of scale [58]. The patient selection process is summarized in
Fig. 2.1.
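A minimal sketch of IQR-based screening for one variable follows; the 1.5 fence multiplier is the conventional choice and an assumption here, since the exact fences used on the raw MIMIC II data are not stated:

    import numpy as np

    def iqr_outlier_mask(values, k=1.5):
        # Flag values outside [Q1 - k*IQR, Q3 + k*IQR].
        q1, q3 = np.percentile(values, [25, 75])
        iqr = q3 - q1
        return (values < q1 - k * iqr) | (values > q3 + k * iqr)

    temps = np.array([36.5, 37.0, 5.0, 36.8, 37.2])   # 5.0 ºC is an artifact
    print(iqr_outlier_mask(temps))   # only the 5.0 ºC value is flagged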
It is important to point out that the number of samples per patient is not constant. A
sample contains measurements of the 22 physiological variables. The number of samples acquired for
each patient during their stay varies between 1 and 26, possibly with different sampling
periods. The total number of samples considered was 13675.
It was detected that some samples of the 1028 selected patients contained outliers, so some
preprocessing was necessary.
In order to obtain a constant dimension for the inputs of the models (a necessary condition),
a transformation of the data was performed: statistical measures were used to capture the
information in the time series of the physiological variables of each patient.
The next section describes the preprocessing applied to this dataset.
Figure 2.1: Patient selection flowchart [15]. From the MIMIC II database (n = 25,549), adult patients
(> 15 yr) with an ICU stay > 24 h were kept (n = 19,075); of these, n = 3,034 had all the variables from
Table 2.1, and n = 1,267 remained after data preprocessing (removal of missing data and outliers). Of
the n = 1,028 patients who survived (n = 239 did not), n = 135 were readmitted and n = 893 were not.
Data preprocessing
Outliers
After analysing all 13675 samples of the 1028 patients, and although the IQR method had been
applied to the raw MIMIC II database, there were still some samples containing values outside the
physiological limits. As an example, Fig. 2.2 shows the plot of the physiological variable temperature
(ºC) for all samples considered. A body temperature of 5 ºC, as seen in Fig. 2.2, is not possible, even
for a patient in a severe condition.

Figure 2.2: Graphical representation of the physiological variable temperature (ºC) for all samples.
Table 2.2: Physiological limits considered for the exclusion of outliers.

No.  Variable name (units)                                   Min   Max
1    Heart rate (beats/min)                                  0     250
2    Respiratory rate (breaths/min)                          0     200
3    Temperature (ºC)                                        25    42
4    SpO2 (%)                                                60    100
5    Non-invasive arterial blood pressure (systolic) (mmHg)  30    300
6    Blood pressure (mean) (mmHg)                            10    187
7    Red blood cell count (cells x 10^3/µL)                  2     8
8    White blood cell count (cells x 10^3/µL)                0.4   50
9    Platelets (cells x 10^3/µL)                             3     1000
10   Hematocrit (%)                                          19    60
11   BUN (mg/dL)                                             4     500
12   Sodium (mg/dL)                                          120   160
13   Potassium (mg/dL)                                       2.2   8
14   Calcium (mg/dL)                                         7.2   12
15   Chloride (mg/dL)                                        80    130
16   Creatinine (mg/dL)                                      0.1   9
17   Magnesium (mg/dL)                                       0     10
18   Albumin (g/dL)                                          0.5   18
19   Arterial pH                                             4.8   7.8
20   Arterial base excess (mEq/L)                            -30   20
21   Lactic acid (mg/dL)                                     0     10
22   Urine output (mL/h)                                     0     1000
The outliers were eliminated using the maximum and minimum limits for the 22 physiological
variables in Table 2.2. These physiological limits were obtained through the Decreased Variable Analysis
(MEDAN).
All samples containing one or more physiological variables with values outside the limits
of Table 2.2 were considered samples with outliers. The total number of measurements considered as
outliers was 517; however, only 473 samples contained one or more variables with values outside the
limits. According to [8], such missing values can be treated in various ways:
1. Ignore the tuple.
2. Fill in the missing value manually.
3. Use a global constant to fill in the missing value.
4. Use the attribute mean to fill in the missing value.
5. Use the attribute mean for all samples belonging to the same class as the given tuple.
6. Use the most probable value to fill in the missing value.
Methods 3 to 6 bias the data.
Since the number of samples, as well as their temporal spacing, is very irregular from patient
to patient, the models created later must be able to handle these irregularities. Thus, approach 1 was
chosen, in which a sample containing one or more measurements outside the limits of Table 2.2 is
removed; this is a process that does not bias the data.
Table 2.3: Summary of the number of samples and patients resulting from the preprocessing.

                                    No. of samples   No. of patients
Before the preprocessing            13675            1028
Removed during the preprocessing    473              18
After the preprocessing             13202            1010
Table 2.4: Analysis of the number of patients readmitted and not readmitted for the subsets of patients
with a minimum number of samples. The percentage of readmitted patients remains within the 4-14%
referred to in the literature.

Minimum number of samples:       1     2     3     4     5     6     7     8     9     10
No. of patients not readmitted:  879   655   637   631   617   604   588   578   567   551
No. of patients readmitted:      131   94    89    89    88    87    86    82    80    78
% readmitted:                    13.0  12.6  12.3  12.4  12.5  12.6  12.8  12.4  12.4  12.4
In this process, some patients had all their samples removed, resulting in a total of 1010
patients. Table 2.3 summarizes the preprocessing of the outliers.
The number of samples of the 22 physiological variables per patient after the treatment of
outliers was analysed; Fig. 2.2 shows the variation of the number of patients per number of samples
after the outlier treatment. It is worth noticing that the number of patients with only one sample is
quite considerable.

Figure 2.2: Graphical representation of the number of patients per number of measurements of the 22
physiological variables considered.
In order to evaluate whether the percentage of readmitted patients remained within the 4-14%
referred to in the literature, the percentage of readmitted patients was analysed while varying the
minimum number of measurements of the 22 physiological variables required per patient. Table 2.4
shows the results of this analysis for minimum numbers of samples from 1 up to 10.
Data transformation
Given the great variability in the number of samples per patient and the very
irregular sampling periods, some descriptive statistical measures were used to describe the stay of each
patient. By doing this, all patients have the same number of features describing the time
series of the physiological variables throughout their ICU hospitalization. These features, of constant
dimension, could then be used as inputs for the classification models.
Previous studies [15] used the arithmetic mean, the maximum, the minimum and the standard
deviation of each physiological variable in order to capture the information in the time series of the
considered physiological variables for each patient. In the present work, in addition to these statistical
measures, the Shannon entropy and the weighted mean were also used, making it possible to extract
more information.
The Shannon entropy is the average unpredictability of a random variable, which is equivalent to
its information content. It provides an absolute limit on the best possible lossless encoding
or compression of any communication, assuming that the communication may be represented as a
sequence of independent and identically distributed random variables. There are already studies that
use entropy as a feature extraction measure [45, 10].
Regarding the weights of the weighted mean, a linear distribution along the stay of the
patient was considered, giving more relevance to the last measurements before discharge. Four
gradients were considered for these weights, as presented in Fig. 2.3.
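A minimal sketch of these two additional descriptors for one physiological time series follows; the histogram binning used to estimate the probabilities for the Shannon entropy, and the exact form of the linear weight gradient, are illustrative assumptions:

    import numpy as np

    def shannon_entropy(series, bins=10):
        # H(X) = -sum_i p_i log2 p_i, with p_i estimated by binning.
        counts, _ = np.histogram(series, bins=bins)
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log2(p))

    def weighted_mean(series, grad=1.0):
        # Linearly increasing weights: the last measurements before
        # discharge weigh the most; grad sets the gradient (four gradients
        # were tested in this work).
        w = 1.0 + grad * np.linspace(0.0, 1.0, len(series))
        return np.sum(w * series) / np.sum(w)

    hr = np.array([82.0, 88.0, 85.0, 91.0, 97.0])   # illustrative series
    print(shannon_entropy(hr), weighted_mean(hr))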
In order to use the descriptive statistical measures introduced above, it was decided to use
only patients with a minimum of 3 measurements available, resulting in 726 patients (637 not
readmitted and 89 readmitted, see Table 2.4). Thus, after the treatment of outliers and the transformation
of the dataset, 4 datasets emerged, one for each gradient of the weighted mean.
The only features that differ across the datasets are the 22 features corresponding to the
weighted mean. Each dataset was formed by 726 patients (12.3% readmitted) and 132 features
(6 statistical measures for each of the 22 variables). The four datasets will be referred to as the
readmission datasets.
Each patient is considered as one sample for the inputs of the classification models; Fig. 2.4
illustrates a sample.

Figure 2.3: Graphical representation of the four different gradients for the weights used in the
weighted mean. The measurements of the 22 variables span the 0 to 24 hours before discharge.

Figure 2.4: Illustrative diagram of the formation of the inputs for the classification models: each patient
represents a sample (a [1x132] array). Associated with each patient there is also a label that indicates
whether or not the patient belongs to the readmitted class.
2.2 Modeling
In the present work, the machine learning technique of fuzzy modelling was used. These
models were used as classification models: briefly, given a sample, the model, built on the
training set of the data, is supposed to correctly assign the sample to one of the labels considered in the
problem. An overview of fuzzy modelling is given in the following topics.
2.2.1 Fuzzy modeling
Fuzzy modeling is a tool that allows approximation of nonlinear systems when there is little or
no previous knowledge of the problem to be modeled [55, 13]. This tool supports the development of
models around human reasoning (also referred to as approximate reasoning), and allows an element
to belong to a set to a degree, indicating the certainty (or uncertainty) of its membership.
Within medical-related classification problems, several fuzzy-based models have shown
comparable performances to other nonlinear modeling techniques [18, 16, 28]. Fuzzy modelling is
particularly appealing as it provides not only a transparent, non-crisp model, but also a linguistic
interpretation in the form of rules and logical connectives. These are used to establish relations
between the defined features in order to derive a model. A fuzzy classifier contains a rule base
consisting of a set of fuzzy if-then rules together with a fuzzy inference mechanism. These systems
ultimately classify each instance of a dataset as pertaining to one of the possible classes defined for
the specific problem being modeled [55].
For both databases used in this work (sonar and readmission), the goal was to classify the samples into one of two labels. In the case of the sonar database: sonar signals bounced off a metal cylinder or off a roughly cylindrical rock; in the readmission problem: the patient would or would not be readmitted to the ICU 24 to 72 hours after discharge.
First order Takagi-Sugeno (TS) fuzzy models [55] were applied, which consist of fuzzy rules where each rule describes a local input-output relation. When first order TS fuzzy systems are used, each discriminant function consists of rules of the type:

R_i: \text{If } x_1 \text{ is } A_{i1} \text{ and } \ldots \text{ and } x_M \text{ is } A_{iM} \text{ then } y_i = a_i x + b_i

where i corresponds to the rule number, x = [x_1, x_2, \ldots, x_M]^T is the input vector, M is the total number of inputs (features), A_{ij} is the fuzzy set for rule i and feature j, and y_i = a_i x + b_i is the consequent function for rule i.
The degree of activation of the i-th rule is given by:

\beta_i = \prod_{j=1}^{M} \mu_{A_{ij}}(x_j) \qquad (2.1)

where \mu_{A_{ij}}(x_j) \in [0, 1].
The overall output is determined through the weighted average of the individual rule outputs:

y = \frac{\sum_{i=1}^{K} \beta_i y_i}{\sum_{i=1}^{K} \beta_i}

The number of rules K and the antecedent fuzzy sets A_{ij} are determined using fuzzy clustering in the product space of the input and output variables [55]. The consequent parameters a_i and b_i for each rule are obtained as a weighted ordinary least-squares estimate.
Given a classification problem, and the consequent being linear, a threshold \lambda is required to turn the continuous output y into the binary output \hat{y}. In this way, \hat{y} = 1 if y \geq \lambda, and \hat{y} = 0 if y < \lambda.
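As a minimal sketch of this inference step (not the thesis's Matlab implementation; the Gaussian membership functions and all parameter names are assumptions made for illustration):

import numpy as np

def ts_classify(x, centers, sigmas, A, b, threshold):
    # first-order TS fuzzy classifier with Gaussian memberships
    # centers, sigmas: (K rules x M features); A: (K x M); b: (K,)
    mu = np.exp(-0.5 * ((x - centers) / sigmas) ** 2)  # memberships per rule/feature
    beta = np.prod(mu, axis=1)                         # rule activations, eq. (2.1)
    y_rule = A @ x + b                                 # linear consequents y_i
    y = np.sum(beta * y_rule) / np.sum(beta)           # weighted-average output
    return int(y >= threshold), y                      # crisp label via threshold

# toy model with K = 2 rules and M = 2 features
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
sigmas  = np.ones((2, 2))
A       = np.array([[0.1, 0.1], [0.4, 0.5]])
b       = np.array([0.0, 0.1])
print(ts_classify(np.array([0.9, 0.8]), centers, sigmas, A, b, threshold=0.5))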
For each database, the data was divided into training and test sets: while the model parameters were calculated using the training set, the feature subset quality was assessed using the test samples. Such an approach was necessary due to the risk of overfitting, meaning that the model could describe random error or noise instead of the underlying information. Thus, the test set provided a fair comparison of the generalization capabilities of the evaluated models [47].
2.2.2 Clustering
Clustering is an unsupervised learning method that organizes and categorizes data based on
the similarity of data objects [2]. It is used in various fields, such as pattern recognition, machine
learning and bioinformatics [33]. It is useful for knowledge discovery from empirical data and model
construction.
A cluster can be seen as a group of objects more similar to one another than to other data points, where similarity is usually defined in terms of a distance norm. Furthermore, a cluster can also be seen as the area of influence of a rule. Therefore, a cluster center, also called prototype, coincides with the corresponding rule centre. The closer a data point is to a cluster center, the higher its fulfilment degree will be.
There is a great number of clustering algorithms; however, most of the analytical clustering algorithms are based on the minimization of the fuzzy c-means objective functional [5]. This objective function can be written as (2.2):

J(Z; U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m \, \| z_k - v_i \|^2 \qquad (2.2)

where the positive constant m > 1 determines the fuzziness of the resulting clusters. The vector z_k is one of the N data samples, v_i is the i-th cluster center and \| z_k - v_i \|^2 is a distance norm between data points and cluster centers. The fuzzy partition matrix U = [\mu_{ik}] contains all the normalized membership values, Z is the matrix containing all the data samples, and V is the matrix of the cluster prototypes.
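As a minimal numpy sketch of the quantities in (2.2) (assuming Euclidean distances; the membership update shown is the standard FCM formula, not code from this work):

import numpy as np

def fcm_objective(Z, U, V, m=2.0):
    # fuzzy c-means objective (2.2); Z: (N x d), U: (c x N), V: (c x d)
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)  # squared distances (c x N)
    return np.sum((U ** m) * d2)

def fcm_memberships(Z, V, m=2.0):
    # membership update minimizing (2.2) for fixed prototypes V
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + 1e-12
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)              # columns sum to 1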
In this work, the fuzzy C-means (FCM) clustering algorithm was used, requiring the definition of the number of clusters (which translates into the number of fuzzy rules). The number of clusters to be used was determined based on the minimization of the partition index SC [4]. This index accounts for both properties of the fuzzy memberships and the structure of the data by measuring the compactness and separation of the clusters. This index is defined as:

SC = \sum_{i=1}^{c} \frac{\sum_{k=1}^{N} (\mu_{ik})^m \, \| z_k - v_i \|^2}{N_i \sum_{j=1}^{c} \| v_j - v_i \|^2} \qquad (2.3)

where m corresponds to the weighting exponent of the FCM algorithm, N_i corresponds to the cardinality of fuzzy cluster i, \| z_k - v_i \|^2 corresponds to the distance between a data point z_k and its cluster center v_i, and \sum_{j=1}^{c} \| v_j - v_i \|^2 (named the separation of fuzzy cluster i) corresponds to the sum of the distances from the cluster center to the centers of all the other clusters. The lower the value of SC, the more compact and separated the clusters are.
For the sonar database and the four datasets derived from MIMIC II, the values of SC were calculated by varying the number of clusters from 2 to 5. The final number of clusters corresponded to a local minimum where the difference between the values of the criterion was minimal.
In the present work, the upper limit of this search range was chosen bearing in mind that a smaller number of clusters means a lower number of rules and hence a lower degree of model complexity. For the two databases considered (sonar and readmission datasets) the chosen number of clusters was 2.
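A sketch of how (2.3) can be evaluated when scanning candidate cluster numbers (assuming Euclidean distances and taking the fuzzy cardinality N_i as the membership sum; illustrative only):

import numpy as np

def partition_index(Z, U, V, m=2.0):
    # partition index SC (2.3): compactness over separation, lower is better
    d2 = ((Z[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (c x N)
    compactness = ((U ** m) * d2).sum(axis=1)                 # per-cluster numerator
    Ni = U.sum(axis=1)                                        # fuzzy cardinality of each cluster
    sep = ((V[None, :, :] - V[:, None, :]) ** 2).sum(axis=2).sum(axis=1)
    return np.sum(compactness / (Ni * sep))

# choose the number of clusters by scanning c = 2..5 and taking the SC minimum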
2.2.3 Performance Measures
Traditionally, accuracy has been used to evaluate classifier performance. This measure is
defined as the total number of correct classifications over the total number of available samples.
Most classification problems have two classes, positive and negative cases [38]. Thus, the classified test points can be divided into four categories:
• true positives (TP) - correctly classified positive cases,
• true negatives (TN) - correctly classified negative cases,
• false positives (FP) - incorrectly classified negative cases,
• false negatives (FN) - incorrectly classified positive cases.
Given these categories, the accuracy can be written as (2.4):

ACC = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2.4)
This criterion is limited, especially in medical applications, for various reasons. If one of the classes is more underrepresented than the others, misclassifications in this class will not have a great impact on the accuracy value. Also, a good classification of one class might be more important than classifying the other classes, and this cannot be assessed with accuracy. To take this matter into account, two performance measures were introduced: the sensitivity (2.5) and the specificity (2.6):

sensitivity = \frac{TP}{TP + FN} \qquad (2.5)

specificity = \frac{TN}{TN + FP} \qquad (2.6)
The sensitivity and the specificity vary between 0 and 1.
The receiver operating characteristic (ROC) curve is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (sensitivity) vs. the fraction of false positives out of the negatives (one minus the specificity) at various threshold settings. An example of ROC curves is shown in Fig. 2.5.
When using normalized units, the area under the curve (AUC) is equal to the probability that a classifier ranks a randomly chosen positive instance higher than a randomly chosen negative one (assuming 'positive' ranks higher than 'negative'). The AUC measure ranges from 0.5 (random classifier) to 1 (perfect classifier).
In the present work, the sensitivity and the specificity were used as performance measures for the models using the sonar database and the readmission datasets, introduced in section 2.1. However, for the models using the sonar database, accuracy was used as the main performance measure and, for the readmission datasets, the AUC. This choice was made because the two classes of the sonar database had similar numbers of samples (89 samples, 45%, vs 111 samples, 55%), whereas in the readmission problem the percentage of the readmitted class was only 12.3% against 87.7% of not readmitted. In this case one of the classes is underrepresented and, if the accuracy had been used, the results would not be realistic: if a model classified all patients as not readmitted, the accuracy would be ~87.7%, but the AUC measure would be 0.5, corresponding to a random classifier.
For the computation of the AUC measure, only one threshold was used: the one through which the best performance of the model was achieved with the train set. With the resultant sensitivity and specificity of the test set using that threshold, a point was marked in the ROC plane. The AUC was computed as the area under the two segments that link the points (0,0) and (1,1) to the point marked with the sensitivity and specificity. By doing this, a good approximation of the performance of the model is ensured.

Figure 2.5: Example of three ROC curves.
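This construction reduces to a simple closed form; the sketch below (illustrative, not the thesis code) computes the area under the two segments and shows that it equals (sensitivity + specificity)/2:

def point_auc(sensitivity, specificity):
    # AUC of the ROC polygon (0,0) -> (1-spec, sens) -> (1,1)
    fpr, tpr = 1.0 - specificity, sensitivity
    area = fpr * tpr / 2.0                   # triangle under the first segment
    area += (1.0 - fpr) * (tpr + 1.0) / 2.0  # trapezoid under the second segment
    return area

print(point_auc(0.8, 0.7))   # 0.75 == (0.8 + 0.7) / 2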
2.3- Feature Selection
The main characteristic of wrapper methodologies is the involvement of the predictor as part
of the selection procedure. In this work, a learning machine was used as a “black box” to score the
subsets according to their predictive performance [23]. Wrappers are constituted by three main
components:
1) Search method;
2) Learning machine;
3) Feature evaluation criteria.
Wrapper approaches are aimed at improving the results of the specific predictors they work with. During the search, subsets were evaluated without incorporating knowledge about the specific structure of the classifier [23].
In section 2.2, the fuzzy modeling technique (the learning machine) was introduced. It is considered to have universal function approximation properties, i.e., in theory it could approximate the behaviour of any function. However, as referred in section 1.1.1, in real problems this is rather difficult for a number of reasons, one of them being the high dimensionality of the available data.
Feature selection is generally used to identify which of the available variables are closely
related to the prediction of the outcome and to discard those unrelated to it, reducing the
dimensionality of the dataset [25, 41, 39]. From the clinical point of view, this process may bring to
light new variables that had not been previously considered as relevant to a given outcome.
In the present work, four FS algorithms were applied: the sequential forward selection (SFS), the Binary Particle Swarm Optimization (BPSO) and two newly formulated algorithms, the decimal to binary Fish School Search (D2BFSS) and the Binary Fish School Search (BFSS).
The following sections present an overview of the well-known SFS algorithm and of the BPSO method.
2.3.1- Sequential Forward Selection
A detailed description of the sequential forward selection search algorithm used is reported in
[39]. Briefly, a model is built for each of the features in consideration, and evaluated using a
performance criterion upon the test set. The feature that returns the best value of the performance
criterion is the one selected. Then, other feature candidates are added to the previous best model, one
at a time, and evaluated. Again, the combination of features that maximizes the performance criterion
is selected. When this second stage finishes, the model has two features. This procedure is repeated
until the stop criterion is achieved. In the end, all the relevant features for the considered process
should be obtained.
The main advantages of this method relate to its simplicity, the possibility of graphically representing the performance of each added feature and the transparent interpretation of the results, which, for clinicians, is particularly attractive. The main disadvantage is related to its greedy nature and thus its susceptibility to local optima [39].
In this work, unlike the traditional stopping criterion of iterating until the performance of the models no longer improves, the maximum number of selected features was used as the stopping criterion. After the maximum number of features had been reached, the set of features whose model achieved the best performance was considered as the selected best features.
The overall process of the SFS algorithm can be described as:

repeat
    for each feature in the feature vector X that does not belong to the features of the model do
        build a model using the previous features of the model combined with that feature
        compute the performance measure
    end for
    select the combination of features with the highest value of the performance measure (ACC or AUC) as the new features of the model
until the number of selected features reaches the defined limit
Select the final features.
The accuracy, for SONAR database, and the AUC, for the MIMIC II derived datasets, were used
as performance measures to maximize. For each combination of features selected by the SFS, the
process of generating the performance measure of the model can be described by the following steps:
1. The model is trained with the train set and the selected features.
2. With the simulated output of the training set, a threshold is iteratively evaluated in order to find the one that maximises the performance (ACC or AUC, depending on the database).
3. With the threshold found, the test set is then simulated.
4. The final performance of the model is generated with the test set output.
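A compact sketch of this wrapper loop (Python pseudocode rather than the Matlab implementation; `evaluate` is a hypothetical callback wrapping steps 1-4 above and returning the test-set performance):

def sfs(all_features, evaluate, max_features):
    selected = []
    best_subset, best_score = [], float("-inf")
    while len(selected) < max_features:
        # score every candidate extension of the current subset
        scored = [(evaluate(selected + [f]), f)
                  for f in all_features if f not in selected]
        score, winner = max(scored)            # keep the best extension
        selected.append(winner)
        if score > best_score:                 # remember the best subset seen overall
            best_subset, best_score = list(selected), score
    return best_subset, best_score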
2.3.2- Binary Particle Swarm Optimization
Particle swarm optimization is a stochastic population-based metaheuristic, inspired by the swarming behaviour of some biological species (e.g. bird flocks).
There are various ways of encoding a problem solution, the most common and most generic being real, integer and binary encoding. The use of each of them depends on the problem at hand.
Normally, in feature selection, the search space is organized such that each state represents a feature subset [16]. In a problem with n variables, a state is encoded by a sequence of n bits, each bit indicating whether a feature is present or absent. An example of a possible state is represented by the sequence:

x_i = [x_{i1}, x_{i2}, \ldots, x_{in}], \quad x_{ij} \in \{0, 1\} \qquad (2.7)

The variable x_{ij} corresponds to input F_j, where j = 1, \ldots, n. If feature F_j is to be selected then x_{ij} = 1; if not, x_{ij} = 0. This process is illustrated in Fig. 2.6.
Essentially, in BPSO, each particle is a candidate solution of the optimization problem. A
particle is associated to a position and a velocity in the search space, where the method for
determining the changes in velocity depends on the particle itself and the other particles.
The iterative process in search of the optimum is [16]:
Step 1: Evaluate each particle in the swarm;
Step 2: Find the swarm and particle best values;
Step 3: Update velocities;
Step 4: Update positions of the particles;
Step 5: Go to Step 1 if the stop criteria are not met.
There are two crucial steps in the way the algorithm operates, the update of velocities and
update of particle positions.
Figure 2.6: Decoding process in feature selection, according to [16].
1) Update velocities: velocity directs the movement in the search space, taking into account the performance of the particle itself and of the swarm, and it is updated with the following equation:

v_{ij}(t+1) = v_{ij}(t) + c_1 q \, (pbest_{ij} - x_{ij}(t)) + c_2 r \, (gbest_j - x_{ij}(t)) \qquad (2.8)

The term involving constant c_1 is called the cognitive component and the term involving c_2 is the social component. q and r are uniform random numbers in [0, 1]. Once velocities have been updated, the restriction |v_{ij}| \leq V_{max} is applied; this is a crucial step for the swarm to maintain coherence.
2) Update particle position: the logistic function of the velocity is used as the probability distribution for the position [59]:

S(v_{ij}(t+1)) = \frac{1}{1 + e^{-v_{ij}(t+1)}} \qquad (2.9)

Thus, the particle position is calculated for each variable by:

x_{ij}(t+1) = \begin{cases} 1 & \text{if } r_{ij} < S(v_{ij}(t+1)) \\ 0 & \text{otherwise} \end{cases} \qquad (2.10)

where r_{ij} is a uniform random number in [0, 1].
Objective Function
Recall that the two main objectives in the FS problem are maximizing the model accuracy and minimizing the size of the feature subset. The objective function [16] is defined as a fitness function, the goal being its maximization:

f(x_i) = \alpha P + (1 - \alpha)\left(1 - \frac{N_s}{N_t}\right) \qquad (2.11)

where N_s is the size of the feature subset, N_t the total number of features to be selected and P the performance measure of the model. The term on the left side of the equation accounts for the overall accuracy or AUC and the term on the right for the percentage of used features. Constant \alpha \in [0, 1] is the weight of the related goal: accuracy (or AUC) versus subset size.
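A sketch of one BPSO iteration and of the fitness (2.11), under the usual conventions (the parameter values shown are placeholders, not the tuned values of this work):

import numpy as np
rng = np.random.default_rng(0)

def bpso_step(X, V, pbest, gbest, c1=2.0, c2=2.0, vmax=4.0):
    # velocity update (2.8) with cognitive (c1) and social (c2) components
    q, r = rng.random(X.shape), rng.random(X.shape)
    V = V + c1 * q * (pbest - X) + c2 * r * (gbest - X)
    V = np.clip(V, -vmax, vmax)                # |v_ij| <= Vmax keeps the swarm coherent
    S = 1.0 / (1.0 + np.exp(-V))               # logistic function (2.9)
    X = (rng.random(X.shape) < S).astype(int)  # stochastic position update (2.10)
    return X, V

def fitness(performance, n_selected, n_total, alpha=0.9):
    # objective (2.11): performance vs. fraction of used features
    return alpha * performance + (1.0 - alpha) * (1.0 - n_selected / n_total)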
CHAPTER 3
Fish school search
The novel Binary Fish School Search algorithm, formulated and presented in this work, was created based on the Fish School Search (FSS) optimization algorithm, introduced by C. Bastos Filho and F. Lima Neto in 2007 [17]. In this chapter, the original Fish School Search optimization algorithm is presented, based on [17], as well as the formulation of the decimal to binary fish school search algorithm (D2BFSS).
3.1- Original Fish school search
Several oceanic fish species, as well as other animals, present social behaviour. The main purpose of this phenomenon is to increase mutual survivability, and it may be viewed in two ways: (i) mutual protection and (ii) synergistic achievement of other collective tasks. Here, protection means reducing the chances of being caught by predators, and synergy refers to an active means of achieving collective goals such as finding food.
Apart from debating whether the emergent behaviour of a fish school is due to learning or
genetic reasons, it is important to note that some fish species live their entire lives in schools. This
reduces individual freedom in terms of swimming movements and increases competition in regions
with scarce food. However, fish aggregation is a fact and the benefits largely outweigh the drawbacks.
Along with the development of this technique the authors have taken great care not to depart
from the original inspiration source, but FSS contains a few abstractions and simplifications that have
been introduced to afford efficiency and usability to the algorithm. The main characteristics derived
from real fish schools and incorporated into the core of the approach are sound. They are grouped
into two observable categories of behaviours as follows:
• Feeding: inspired by the natural instinct of individuals (fish) to find food in order to grow strong
and to be able to breed. Notice that food here is a metaphor for the evaluation of candidate
solutions in the search process. An individual fish is considered to be able to lose as well as to
obtain weight, depending on the regions it swims in;
26
• Swimming: the most elaborate observable behaviour utilized in this approach. It aims at mimicking
the coordinated and the only apparent collective movement produced by all the fish in the school.
Swimming is primarily driven by feeding needs and, in the algorithm, it is a metaphor for the search
process itself.
3.1.1-Search Problems and Algorithms
Although there are several approaches for searching, there is, unfortunately, no general
optimal search strategy [40]. Thus, solving search problems is sometimes more of an art form than an
engineering practice. Although custom-made algorithms are valuable options for specific problems, a
more generalized automatic search engine would be a great bonus for tackling problems of high
dimensionality. Search problems can be highly varied. For example, they can be classified into two
groups with regard to the structure of their search-space: structured or unstructured. For the former,
there are many traditional techniques that are, on average, quite efficient. The same observation does
not apply to the latter, that is, there is no overall good approach for search spaces on which there is no
prior information.
The FSS can be a valuable option for searching in high dimensional and unstructured spaces.
3.1.2-FSS Computational Principles
The search process in FSS is carried out by a population of limited-memory individuals: the fish. Each fish represents a possible solution to the problem. Similarly to PSO or GA, search guidance in FSS is driven by the success of some individual members of the population.
The main feature of the FSS paradigm is that all fish contain an innate memory of their
successes – their weights. In comparison to PSO, this information is highly relevant because it can
obviate the need to keep a log of the best positions visited by all individuals, their velocities and other
competitive global variables. Another major feature of FSS is the idea of evolution through a
combination of some collective swimming, i.e. “operators” that select among different modes of
operation during the search process, on the basis of instantaneous results.
As for dealing with the high dimensionality and lack of structure of the search space, the authors of the algorithm [17] believed that FSS should at least incorporate principles such as the following:
(i) Simple computation in all individuals;
(ii) Various means of storing distributed memory of past computation;
(iii) Local computation (preferably within small radiuses);
(iv) Low communication between neighbouring individuals;
27
(v) Minimum centralized control (preferably none); and
(vi) Some diversity among individuals.
A brief rationale for the above-mentioned principles is given, respectively: (i) this reduces the
overall computation cost of the search; (ii) this allows for adaptive learning; (iii), (iv) and (v) these keep
computation costs low as well as allowing some local knowledge to be shared, thereby speeding up
convergence; and finally, (vi) this might also speed up the search due to the
differentiation/specialization of individuals. These principles, incorporated in FSS, led the authors to
believe that FSS could deal with multimodal problems.
3.1.3-Overview of the algorithm
The inspiration mentioned, together with the principles stated above, was incorporated into the approach in the form of two operators that comprise the main routines of the FSS algorithm. To understand the operators, a number of concepts must be defined.
The concept of food was considered as related to the function to be optimized in the process.
For example, in a minimization problem the amount of food in a region would be inversely
proportional to the function evaluation in this region. The “aquarium” is defined by the delimited
region in the search space where the fish can be positioned. The operators were grouped in the same
manner in which they were observed when drawn from the fish school, defined as follows:
Feeding: food is a metaphor for indicating to the fish the regions of the aquarium that are likely
to be good spots for the search process;
Swimming: a collection of operators that are responsible for guiding the search effort globally
towards subspaces of the aquarium that were collectively sensed by all individual fish as more
promising with regard to the search process.
3.1.4-The Feeding Operator
As in real situations, the fish of FSS are attracted to food scattered in the aquarium in various
concentrations. In order to find greater amounts of food, the fish in the school can move
independently (see individual movements in the next section).
As a result, each fish is allowed to grow in weight, depending on its success or failure in
obtaining food. The authors proposed that the fish's weight variation be proportional to the normalized difference between the fitness evaluations at the current and the previous fish positions, i.e. to the food concentration at these spots. The assessment of 'food' concentration considers all problem dimensions, as shown in (3.1):

W_i(t+1) = W_i(t) + \frac{f(x_i(t+1)) - f(x_i(t))}{\max \left| f(x_i(t+1)) - f(x_i(t)) \right|} \qquad (3.1)

where W_i(t) is the weight of the fish i, x_i(t) the position of the fish i, and f(\cdot) evaluates the fitness function (i.e. amount of food) at a position; the denominator is the maximum absolute fitness variation in the school in the current cycle.
A few additional measures were included to ensure rapid convergence toward rich areas of the
aquarium, namely:
o Fish weight variation is evaluated once at every FSS cycle;
o An additional parameter, named weight scale (w_scale), was created to limit the weight of a fish. The fish weight may vary between 1 and w_scale;
o All the fish are born with weight equal to w_scale/2.
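A minimal sketch of the feeding operator (3.1) with the weight limits just described (numpy, illustrative only):

import numpy as np

def feeding(W, delta_f, w_scale):
    # weight update (3.1): normalized fitness improvement; weights stay in [1, w_scale]
    delta_f = np.asarray(delta_f, dtype=float)
    max_df = np.max(np.abs(delta_f))
    if max_df > 0:
        W = W + delta_f / max_df
    return np.clip(W, 1.0, w_scale)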
3.1.5-The Swimming Operators
A basic animal instinct is to react to environmental stimulation (or sometimes, the lack of it). In
this approach, swimming is considered to be an elaborate form of reaction regarding survivability. In
FSS, the swimming patterns of the fish school are the result of a combination of three different causes
(i.e. movements).
For fish, swimming is directly related to all the important individual and collective behaviours
such as feeding, breeding, escaping from predators, moving to more liveable regions of the aquarium
or, simply being gregarious. This panoply of motivations to swim away inspired the authors [17] to
group causes of swimming into three classes: (i) individual, (ii) collective-instinct and (iii) collective
volition. Below further explanations on how computations are performed on each of them are
provided.
3.1.6-Individual Movement
Individual movement occurs for each fish in the aquarium at every cycle of the FSS algorithm.
The swim direction is randomly chosen. Provided the candidate destination point lies within the aquarium boundaries, the fish assesses whether the food density there seems to be better than at its current location. If not, or if the step is not possible (i.e. the destination lies outside the aquarium or is blocked by, say, reefs), the individual movement of the fish does not occur. Soon after each individual movement, feeding occurs, as detailed above.
For this movement, a parameter called individual step (step_ind) was defined to determine the fish displacement in the aquarium. Each fish moves if the new position has more food than the previous position. To include more randomness in the search process, the individual step is multiplied by a random number u generated by a uniform distribution in the interval [-1, 1], so that the candidate position is x_i(t) + u \cdot step_{ind}(t). In this simulation, the individual step was decreased linearly in order to provide exploitation abilities in later iterations:

step_{ind}(t) = step_{ind}^{initial} - t \, \frac{step_{ind}^{initial} - step_{ind}^{final}}{T} \qquad (3.2)

where t is the number of the current iteration and T is the total number of iterations.
Fig. 3.1 shows an illustrative example of this swimming operator. One can note that only the fish that found spots with more food have moved.
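A sketch of the individual operator and of the linear step decay (3.2) (the boundary check and the maximization convention are assumptions of this sketch):

import numpy as np
rng = np.random.default_rng(0)

def individual_movement(x, f, step_ind, bounds):
    # try a random step; keep it only if food improves and the point stays in the aquarium
    cand = x + step_ind * rng.uniform(-1.0, 1.0, size=x.shape)
    if np.all(cand >= bounds[0]) and np.all(cand <= bounds[1]) and f(cand) > f(x):
        return cand
    return x

def linear_step(t, T, s_init, s_final):
    # linear decay of the step along the iterations, as in (3.2)
    return s_init - t * (s_init - s_final) / T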
3.1.7-Collective-Instinctive Movement
After the individual movement, a weighted average of the individual movements, based on the instantaneous success of all fish of the school, is computed. This means that fish that had successful individual movements influence the resulting direction of movement more than the unsuccessful ones. When the overall direction is computed, each fish is repositioned. This movement is based on the fitness evaluation enhancement achieved, as shown in (3.3):

I(t) = \frac{\sum_{i=1}^{n} \Delta x_i \, \Delta f_i}{\sum_{i=1}^{n} \Delta f_i}, \qquad x_i(t+1) = x_i(t) + I(t) \qquad (3.3)

where \Delta x_i is the displacement of the fish i due to the individual movement in the FSS cycle and \Delta f_i the corresponding fitness improvement. Fig. 3.2 shows the influence of the collective-instinctive movement in the example presented in Fig. 3.1. One can note that, in this case, all the fish had their positions adjusted.
Figure 3.1: Individual movement is illustrated here before and after its occurrence; red dots are fish
positions after and black dots are the same fish before individual movement. Fish with unsuccessful
individual movement are overlaid, showing only the position after the usage of this operator.
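A sketch of the collective-instinctive drift (3.3); fish that did not move contribute a zero displacement and a zero fitness variation:

import numpy as np

def collective_instinctive(X, dX, df):
    # X, dX: (n_fish x dims); df: (n_fish,) fitness improvements (0 if the fish stayed)
    total = df.sum()
    if total == 0:
        return X                   # nobody improved: no collective drift
    I = (dX * df[:, None]).sum(axis=0) / total
    return X + I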
3.1.8-Collective- Volitive Movement
After individual and collective-instinctive movements are performed, one additional positional
adjustment is still necessary for all fish in the school: the collective-volitive movement. This movement
is devised as an overall success/failure evaluation based on the incremental weight variation of the
whole fish school. In other words, this last movement will be based on the overall performance of the
fish school in the iteration.
The rationale is as follows: if the fish school is putting on weight (meaning the search has been
successful), the radius of the school should contract; if not, it should dilate. This operator is deemed to
help greatly in enhancing the exploration abilities in FSS. This phenomenon might also occur in real
swarms, but the reasons are as yet unknown.
The fish-school dilation or contraction is applied as a small step drift to every fish position with regard to the school's barycenter. The fish-school's barycenter is obtained by considering all fish positions and their weights, as shown in (3.4):

B(t) = \frac{\sum_{i=1}^{n} x_i(t) \, W_i(t)}{\sum_{i=1}^{n} W_i(t)} \qquad (3.4)

The collective-volitive movement will be inwards or outwards (in relation to the fish school's barycenter), according to whether the previously recorded overall weight of the school has increased or decreased in relation to the new overall weight observed at the end of the current FSS cycle.
For this movement, a parameter called volitive step (step_vol) was defined as well. The new position is evaluated as in (3.5) if the overall weight of the school increases in the FSS cycle; if the overall weight decreases, (3.6) should be used:

x_i(t+1) = x_i(t) - step_{vol} \, u \, \frac{x_i(t) - B(t)}{distance(x_i(t), B(t))} \qquad (3.5)

x_i(t+1) = x_i(t) + step_{vol} \, u \, \frac{x_i(t) - B(t)}{distance(x_i(t), B(t))} \qquad (3.6)

where u is a random number uniformly generated in the interval [0, 1]. The step_vol was also decreased linearly along the iterations.

Figure 3.2: Collective-instinctive movement is illustrated here before and after its occurrence; green dots are fish positions after and red dots are the same fish before collective-instinctive movement.

Fig. 3.3 shows the influence of the collective-volitive movement in the example presented in Fig. 3.1, after the individual and collective-instinctive movements. In this case, as the overall weight of the school had increased, the radius of the school diminished.
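A sketch combining (3.4)-(3.6) (the small constant guarding against a zero distance is an implementation detail added here, not part of the original description):

import numpy as np
rng = np.random.default_rng(0)

def collective_volitive(X, W, school_gained_weight, step_vol):
    # contract towards the barycentre (3.5) when the school gained weight,
    # dilate away from it (3.6) otherwise
    B = (X * W[:, None]).sum(axis=0) / W.sum()            # barycentre (3.4)
    d = np.linalg.norm(X - B, axis=1, keepdims=True) + 1e-12
    u = rng.random((X.shape[0], 1))
    drift = step_vol * u * (X - B) / d
    return X - drift if school_gained_weight else X + drift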
3.1.9-FSS Cycle and Stop Conditions
The FSS algorithm starts by randomly generating a fish school according to parameters that
control fish sizes and their initial positions.
Regarding its dynamics, the central idea of FSS is that all bio-inspired operators perform independently from each other. The FSS search process is enclosed in a loop, where invocations of the previously presented operators occur until at least one stop condition is met. The stop conditions conceived for FSS are as follows: limitation of the number of cycles, time limit, maximum school radius and maximum school weight.
Below, the pseudo-code for the Fish School Search algorithm is presented. In the initialization step, each fish in the swarm has its weight initialized with the value w_scale/2 and its position in each dimension initialized randomly in the search space.
Figure 3.3: Collective-volitive movement is illustrated here before and after its occurrence; pink dots are
fish positions after and green dots are the same fish before collective-volitive movement. The position of
the barycentre is represented by the blue dot.
Algorithm Fish School Search
1. initialize each fish in the swarm: weight W_i = w_scale/2, random position x_i
2. while the maximum number of iterations or another stop criterion is not attained do
3.    for each fish i in the swarm do
         a. apply the individual movement operator:
            generate a candidate position and calculate its fitness
            if the candidate fitness is better than the current one, move the fish
            else keep the current position
         b. apply the feeding operator:
            update the fish weight according to (3.1)
         c. apply the collective-instinctive movement:
            update the fish position according to (3.3)
      end for
      d. apply the collective-volitive movement:
         if the overall weight of the school increased in the cycle
            update the fish positions using (3.5)
         else
            update the fish positions using (3.6)
      decrease the individual and volitive steps linearly
   end while
3.1.10-Illustrative Example
This section presents an illustrative example (taken from [17]) aimed at a better understanding of how FSS can be used and, ultimately, how it works. The selected example considers a small school and a very simple problem: three fish are set to find the global optimum of the sphere function in two dimensions. The sphere function is presented in (3.7) and the simulation parameters are: (i) feasible space [-10,10], (ii) number of iterations equal to 10, (iii) w_scale = 10, (iv) initial step_ind = 1, (v) final step_ind = 0.1, (vi) initial step_vol = 0.5, (vii) final step_vol = 0.05. Table 3.1 includes the initial values associated with the experimental fish school; Fig. 3.4a presents the start-up loci of all fish.

f(x) = \sum_{i=1}^{2} x_i^2 \qquad (3.7)
Table 3.1: Initial conditions for the three fish in the sphere example [17].

Fish    weight    position    fitness
# 1     5         (9,7)       130
# 2     5         (5,6)       61
# 3     5         (8,4)       80
After initialization, all fish are free to check for new candidate positions that are generated by
the individual movement operator. Assuming that these positions were x1 = (9.6,6.2), x2 = (4.6,4.4) and
x3 = (6.2,4.2), and the associated fitnesses f(x1) = 130.6, f(x2) = 40.52 and f(x3) = 56.08, one should notice that fish #2 and fish #3 found better positions, whereas fish #1 did not move. The positions after the individual movement were then x1 = (9,7), x2 = (4.6,4.4) and x3 = (6.2,4.2). Fig. 3.4b illustrates the
individual movement of the three fish in search space for the sphere problem.
According to this model, the next operator to be computed should be feeding. As fish #1
remained in the same position, it would not change its weight. The weight of fish #2 and fish #3 would
change according to (3.1). The weight variation depends on the maximum fitness change. The
maximum fitness variation in this case was achieved by fish #3 and is equal to 23.92. As a result, fish #3
increased its weight by 1 unit and its new weight became 6. The fitness variation of fish #2 was 20.48.
Dividing the fitness variation of fish #2 by the maximum fitness change yields a weight variation of approximately 0.86 for fish #2. The new weight of fish #2 is then 5.86. Following the model, the third
operator to be computed would be the collective instinctive one. This operator evaluates the collective
displacement of the fish school considering the individual fitness variations and the individual
movement according to (3.3). As fish #1 stayed in the same position, it would not influence the overall
calculation. Considering the values obtained in this iteration, the displacement was (-1.2,-0.6). This
vector applies to all the fish (including fish #1), so the new positions, after third operator
computations, were x1 = (8.4,5.6), x2 = (3.4,3.8) and x3 = (5,3.6).
The fitnesses at the new positions were then 101.8, 26 and 37.96 for fish #1, #2 and #3, respectively. The individual displacement of all fish due to the collective-instinctive operator is presented in Fig. 3.4c. The reader may find it instructive to compare Fig. 3.4b and Fig. 3.4c.
The last operator to be considered in this example is the collective-volitive one. For that, one
has to obtain the instantaneous value of the barycenter of the fish school according to (3.4). In this
case, the barycenter was (4.96,4.25). Notice that the weight of the whole school has increased; therefore a contraction, instead of a dilation, was the implicit decision of the school (i.e. collective-volitive). By
means of using (3.5), the new positions were x1 =(5.81,4.89), x2 =(4.02,3.98) and x3 =(4.98,3.92). The
barycentre and the collective-volitive movement for this step are presented in Fig. 3.4d.
At this point, the algorithm tests if valid stop-conditions are met. Obviously it was not the case
yet, thus a new cycle began as explained above. If one compares the initial and final positions
illustrated in Fig. 3.4, after this first iteration, the reader can observe that all fish are closer to the
optimum point (0,0).
Of course, the optimum point is unknown to the algorithm. However, in a very peculiar manner, the FSS model assures fast convergence towards it (i.e. the goal of the search process) because of the above-mentioned natural principles instantiated in the FSS algorithm.
Figure 3.4: Example with three fish in the sphere example: (a) Initial position, (b) individual
movement, (c) instinctive collective movement and (d) collective-volitive movement
In order to illustrate the convergence behaviour of the fish school along the iterations, simulation results for the sphere function are presented. These simulations used 30 fish, a feasible space of [-100,100] in the two dimensions, an initialization range of [0,100] in the two dimensions, w_scale = 500, initial step_ind = 1, final step_ind = 0.1, initial step_vol = 0.5 and final step_vol = 0.05. Fig. 3.5 shows the fish positions after iterations (a) 1, (b) 5, (c) 10, (d) 20, (e) 30, (f) 40, (g) 50, (h) 100 and (i) 200, respectively. One can note that the school was attracted to the optimum point (0,0).
Figure 3.5: Fish school evolution after iteration (a) 1, (b) 5, (c) 10, (d) 20, (e) 30, (f) 40, (g) 50, (h) 100
and (i) 200 for sphere function with 30 fish
36
3.2- Decimal to binary Fish school search
The FSS algorithm, described in section 3.1, is a high-dimensional search optimization process in continuous space. The first intuitive and logical approach was to use the continuous FSS algorithm to search for a one-dimensional integer number that would then be transformed, in the objective function, into a binary input vector with dimension equal to the number of features to be selected. To do so, the decimal to binary representation was chosen.
This binary input vector would then be used in the same way as the encoding of the BPSO algorithm, presented in section 2.3.2.
This relatively simple representation allows features to be selected without major changes to the original algorithm. All the operators were used as described in section 3.1. The only modification needed was to round the position of each fish to its nearest integer. Thus, it was only necessary to use one dimension in the search space to search for the decimal system representation of the solution. This approach was called the D2BFSS algorithm.
3.2.1-Objective function
In order to evaluate the fitness of the decimal system solution (position of the fish), the integer solution was transformed into its binary representation, a vector of 0/1 bits with dimension equal to the maximum number of features to be selected.
Inspired by the objective function used in [16], presented in section 2.3.2, the following fitness function was used to describe the performance of the selected features during the FS process:

f(x_i) = \alpha P + (1 - \alpha)\left(1 - \frac{N_s}{N_t}\right) \qquad (3.8)

where N_s represents the number of features selected and N_t the total number of features, while P accounts for the performance measure on the test set. The value \alpha varies between 0 and 1. If the right-hand term of (3.8) were not used (\alpha = 1), there would be no restriction on the number of features selected by the algorithm.
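A sketch of this decoding step (the wrap-around modulo is a safeguard added here, not part of the original description):

import numpy as np

def decode(position, n_features):
    # round the one-dimensional continuous fish position to an integer and
    # expand its binary representation into a feature mask (1 = selected)
    z = int(round(position)) % (2 ** n_features)
    return np.array(list(np.binary_repr(z, width=n_features)), dtype=int)

print(decode(11.3, 6))   # 11 -> [0 0 1 0 1 1]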
CHAPTER 4
Binary Fish School Search
It is important to note that the D2BFSS approach, section 3.2, does not manipulate the vector of bits (selected features) in its internal mechanisms (the FSS movement operators). Thereby, the decimal to binary approach may have problems with convergence and low performance, or may even degenerate into a random search.
With this concern in mind, it was decided to modify the internal mechanisms of the FSS algorithm to manipulate binary inputs directly. The following sections describe the modifications to the fish school search algorithm, from which the binary fish school search emerges.
4.1- Encoding
There are various ways of encoding a problem solution; the encoding presented here was inspired by [16], similarly to section 2.3.2. An example of a possible state (position of a fish) is represented by the sequence:

x_i = [x_{i1}, x_{i2}, \ldots, x_{iN}], \quad x_{ij} \in \{0, 1\} \qquad (4.1)

where N is the total number of features to be selected. Each bit indicates whether or not a feature is selected. This binary scheme offers a straightforward representation of a feature subset, allowing the algorithm to search through the workspace, adding or removing features simply by flipping bits in the sequence.
While the FSS algorithm was not originally developed in the context of binary encoding, it appeared possible to change the real encoding to a binary one while keeping the following principles:
• to follow the internal mechanisms of the original algorithm, without losing the meaning of each operator;
• to add few additional parameters;
• to ensure the convergence of the algorithm;
• to keep the modifications simple and understandable.
In the next sections, the modifications made to each of the FSS internal mechanisms are presented.
4.2-Initialization
For each fish i, the initial position was initialized randomly by doing:

x_{ij} = \mathrm{round}(u_{ij}), \quad i = 1, \ldots, n, \quad j = 1, \ldots, N \qquad (4.2)

where u_{ij} is a random number uniformly generated in the interval [0, 1], n the number of fishes and N the total number of features to be selected.
By doing this, the algorithm starts with completely random positions, with the number of features selected at the start being around N/2. If the initial number were too small, the algorithm might not converge freely along the iterations.
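A one-line realization of (4.2), as a sketch:

import numpy as np
rng = np.random.default_rng(0)

def init_school(n_fish, n_features):
    # each bit is round(u), u ~ U[0,1]: roughly half of the features start selected
    return np.round(rng.random((n_fish, n_features))).astype(int)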
4.3-Individual Movement
The individual movement occurs once in every cycle of the BFSS. For each fish i and for each bit j, if a random number u (uniform distribution in the interval [0,1]) is smaller than S_ind(t), the bit will flip; otherwise it will not change:

x_{ij}(t+1) = \begin{cases} 1 - x_{ij}(t) & \text{if } u < S_{ind}(t) \\ x_{ij}(t) & \text{otherwise} \end{cases} \qquad (4.3)

Parameter S_ind, in the same way as in the FSS, decreases linearly along the iterations, depending on the initial and final values of step_ind. This allows a soft convergence through the iterations.
A fish will move if the new position has more food than the previous position, i.e. if the fitness
function of new set of features selected (new position) has a better performance than the previous
one. By doing this, the random exploration of each individual fish is preserved.
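A sketch of (4.3) together with this greedy acceptance rule (illustrative, not the thesis code):

import numpy as np
rng = np.random.default_rng(0)

def bfss_individual(x, fitness, s_ind):
    # flip each bit with probability s_ind; keep the candidate only if fitness improves
    flip = rng.random(x.shape) < s_ind
    cand = np.where(flip, 1 - x, x)
    return cand if fitness(cand) > fitness(x) else x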
4.4-Collective-Instinctive Movement
After the individual movement, the weighted average of the individual movements, based on the fishes that moved, is calculated. This process was executed in the same way as in the FSS, equation (3.3).
In order to make all fishes head in the direction of the successful individual movement positions, some changes had to be made to the original FSS algorithm.
When dealing with positions made of bits (values 0 or 1), equation (3.3) loses its meaning: the displacement \Delta x_i of the fish in equation (3.3) can no longer be quantified correctly using the discrete flipping of a bit.
For that reason, equation (4.4) was used to describe the resultant position of the overall success of the individual movement:

I(t) = \frac{\sum_{i=1}^{n} x_i(t) \, \Delta f_i}{\sum_{i=1}^{n} \Delta f_i} \qquad (4.4)

In (4.4), \Delta x_i in (3.3) was replaced with x_i(t). In this approach, the use of the actual positions of the fishes that had success in the individual movement is seen as more descriptive than the flipping of bits.
The resulting vector I has the same dimension as the positions of the fishes, but with values varying between 0 and 1. As an illustrative example, (4.5) represents a possible configuration of I:

I(t) = [0.7, \ 0.3, \ 0.55, \ 0.1] \qquad (4.5)
The goal of the Collective-Instinctive Movement operator is to attract each fish to the resultant direction of the individual movement operator. In the Binary Fish School Search, each fish approaches I. To do so, I must be in bit format; two options were considered to transform I into a bit vector:
a) Using a constant threshold thres_c in all iterations: if a value of I is below the parameter thres_c, the corresponding bit is considered 0, otherwise 1.
For example, if the value thres_c = 0.5 was used in the example (4.5), the resultant vector would be:

I_{bit}(t) = [1, \ 0, \ 1, \ 0] \qquad (4.6)

The problem of using a constant threshold in all iterations is that, depending on the evolution of the FS process, I_{bit} could be formed only of 0s, i.e. all the values of I lower than thres_c. In addition, if in any iteration the algorithm favoured a certain feature, the algorithm could lose its exploration abilities in later iterations. If this occurred, it would introduce trends and convergence to local maxima.
b) Using an adaptive threshold for each iteration: multiplying the parameter thres_c by the max value of I. The resultant value of this multiplication is then used as the threshold in the current iteration for this operator.
For the example (4.5), if the parameter thres_c was 0.4, the threshold used in this iteration would be 0.4 \times 0.7 = 0.28, considering 0.7 the max value of (4.5), resulting in:

I_{bit}(t) = [1, \ 1, \ 1, \ 0] \qquad (4.7)
Therefore, in option b), for each iteration t, the threshold used to compute the binary vector I_{bit} was calculated using the max value of I. This guarantees that at least one feature is selected, making the problem described in option a) less likely. The study of these two options is presented in chapter 5.
After the computation of I in bit format, all fish positions can tend to I_{bit}. To do so, the position of each fish is compared with I_{bit}, and one randomly chosen bit of the fish that does not have the same value as in I_{bit} is flipped. This process moves the position of each fish closer to I_{bit}. In comparison with the original algorithm, I no longer represents the direction of movement but the position resulting from the successful individual movements.
By only flipping one bit per fish, a soft and steady convergence of the algorithm is expected. An illustrative example can be represented as:

I_{bit}(t) = [1, \ 1, \ 1, \ 0]
x_i(t) = [0, \ 1, \ 0, \ 0] \ \rightarrow \ x_i(t+1) = [0, \ 1, \ 1, \ 0] \qquad (4.8)

In (4.8) the fish moved in the direction of I_{bit}: one randomly chosen bit that differed from I_{bit} (here the third one) was flipped. The number of bits of the new position that match I_{bit} is greater than before the collective-instinctive movement, making the new position of the fish closer to I_{bit}.
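The operator can be sketched as follows (adaptive-threshold option b); the names follow the parameters of Fig. 4.1, but the code itself is illustrative, not the thesis implementation:

import numpy as np
rng = np.random.default_rng(0)

def bfss_instinctive(X, df, thres_c):
    # build I from the successful positions (4.4), binarize with the adaptive
    # threshold, then flip one differing bit per fish towards I_bit
    if df.sum() == 0:
        return X
    I = (X * df[:, None]).sum(axis=0) / df.sum()     # (4.4), values in [0, 1]
    I_bit = (I >= thres_c * I.max()).astype(int)     # adaptive threshold
    X = X.copy()
    for i in range(X.shape[0]):
        diff = np.flatnonzero(X[i] != I_bit)
        if diff.size:
            j = rng.choice(diff)                     # one random differing bit
            X[i, j] = 1 - X[i, j]
    return X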
4.5-Collective-volitive Movement
Similarly to the Collective-Instinctive Movement operator, the Collective-volitive operator underwent some changes. The main goal of this operator is, depending on the success of the individual movement, to contract the fish positions towards the barycentre or dilate them away from it.
The barycentre was computed in the same way as in the FSS algorithm, equation (3.4). Analogously to the computation of the vector I, the barycentre obtained with (3.4) is not in bit format. Thereby, two options were also considered to transform the barycentre into bit format:
a) using a constant threshold thres_v through the iterations;
b) using an adaptive threshold for each iteration: multiplying thres_v by the max value of the barycentre.
If the overall individual movement was a success (the overall weight improved in the iteration), each fish approximates to the barycentre. Similarly to the process in the Collective-Instinctive Movement operator, section 4.4, the bits of each fish are compared to those of the binarized barycentre, and one randomly chosen bit that does not have the same value as in the barycentre is flipped. By making only one flip per fish, the algorithm enables a soft drift from the previous position to a new one, closer to the barycentre. An illustrative example for the case of improvement of the overall weights (contraction) is shown:

B_{bit}(t) = [1, \ 0, \ 1, \ 0]
x_i(t) = [1, \ 1, \ 1, \ 1] \ \rightarrow \ x_i(t+1) = [1, \ 0, \ 1, \ 1] \qquad (4.9)

In (4.9), the fish randomly changed one of its bits that differed from the barycentre (B_{bit}). This allowed the fish to approximate to the barycentre.
If the overall weights had not improved, each fish has to move in the opposite direction of the barycentre. To do this, the concept of anti-barycentre is introduced, consisting of a vector with the same dimension as the barycentre but with flipped bits. In this situation, the process is the same as described above for the case of contraction to the barycentre, but using the anti-barycentre. In (4.10), the case of no improvement of the overall weights is presented for the example (4.9):

\bar{B}_{bit}(t) = [0, \ 1, \ 0, \ 1]
x_i(t) = [1, \ 1, \ 1, \ 1] \ \rightarrow \ x_i(t+1) = [0, \ 1, \ 1, \ 1] \qquad (4.10)

In (4.10), the new position of the fish is obtained by comparing each bit with the anti-barycentre \bar{B}_{bit} of the barycentre presented in (4.9). One of the bits with a different value was flipped, making the new position of the fish closer to \bar{B}_{bit} and consequently further from B_{bit} in (4.9).
With the one-bit-flip mechanism, the barycentre can no longer be seen as a possible solution (as in the FSS algorithm) but as a point of reference to guide the fishes in the contraction or dilation process. The best solution per iteration is now given by the fish with the best performance after the collective-volitive movement.
After the collective-volitive movement, a new cycle begins.
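A sketch of the binary volitive operator with both the contraction and the dilation (anti-barycentre) cases; `thres_v` follows Fig. 4.1, and the code is illustrative only:

import numpy as np
rng = np.random.default_rng(0)

def bfss_volitive(X, W, school_gained_weight, thres_v):
    # binarize the barycentre (3.4) with the adaptive threshold; flip one bit per fish
    # towards it on contraction, or towards the anti-barycentre on dilation
    B = (X * W[:, None]).sum(axis=0) / W.sum()
    B_bit = (B >= thres_v * B.max()).astype(int)
    target = B_bit if school_gained_weight else 1 - B_bit   # anti-barycentre if dilating
    X = X.copy()
    for i in range(X.shape[0]):
        diff = np.flatnonzero(X[i] != target)
        if diff.size:
            j = rng.choice(diff)
            X[i, j] = 1 - X[i, j]
    return X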
4.6-Objective function
Although some of the parameters of the BFSS algorithm influence the final number of features
selected (use of thresholds), the process of developing an objective function is critical, since it serves as
guidance in search of the optimum.
The fitness function was defined as in [16], the goal being its maximization. The most suitable representation for the proposed task is shown:

f(x_i) = \alpha P + (1 - \alpha)\left(1 - \frac{N_s}{N_t}\right) \qquad (4.11)

where P is the classifier performance measure (ACC or AUC, depending on the database), N_s the number of features selected and N_t the total number of features to be selected. The term on the left side of the equation accounts for the overall accuracy of the model while the term on the right accounts for the percentage of used features. Note that both terms in the objective function are normalized. Constant \alpha defines the weight of the related goals, performance and subset size. The constant \alpha is a parameter of the algorithm and varies depending on the total number of features to be selected (N_t) and the desired number of selected features.
4.7-Parameters
The choice of the set of parameters is a crucial step in wrapper search methods. If the set is
not the most suitable, the predictor will underperform, which might mislead the search algorithm.
When performing the modifications presented above, some parameters were introduced. Fig.
4.1 summarises the set of parameters used in the FSS, D2BFSS and BFSS algorithms.
It is known that the more parameters an algorithm uses, the more time is taken for parameter estimation and the greater the complexity of the process. The approach taken, as well as the resultant set of parameters, is expected to achieve convergence while, although in a more subjective way, maintaining the meaning of each operator of the original FSS algorithm.
4.8- BFSS cycle and stop condition
In the same way as the FSS algorithm, the BFSS starts by randomly generating a fish school (selected features). In general, the cycle is similar to that of the FSS, the main differences being the modifications to each internal mechanism (operator). In addition, instead of using the position of the barycentre as the best solution in the iteration, the BFSS uses the fish with the maximum fitness function.
Regarding the stopping criterion, the following could be used: time limit, maximum school weight and maximum number of iterations reached (the latter used in all the experiments presented here).
Figure 4.1: Parameters for the original FSS, the decimal to binary FSS and the binary FSS.
FSS: no. of fishes; no. of iterations; Wscale; stepind [initial and final]; stepvol [initial and final].
D2BFSS: no. of fishes; no. of iterations; Wscale; stepind [initial and final]; stepvol [initial and final]; α.
BFSS: no. of fishes; no. of iterations; Wscale; stepind [initial and final]; thres_c; thres_v; α.
Chapter 5
Results
The main objective of this chapter is to evaluate the applicability of the proposed search optimization algorithms. These methods combine the machine learning algorithm introduced in section 2.2 with the state-of-the-art search algorithms presented in section 2.3 and those formulated in chapters 3 and 4, using the approach described in section 5.1. They are compared with each other and with the results obtained without FS, based on the predictive performance and the number of selected features.
For each of the two databases considered in this work, a study was made to tune the parameters of the optimization algorithms for the feature selection problem.
The algorithms were implemented and the results obtained with Matlab® R2010a.
5.1-Description of the approach
The use of a learning machine in wrapper methods, so as to evaluate subset suitability, requires a correct feature subset assessment. The process described in this section was performed for each of the databases considered in this work.
The data was firstly divided into two groups, the feature selection (FS) subset and the model assessment (MA) subset. This division was random but kept the same percentage of each class in each subset, i.e. 50% of the samples belong to the FS subset and the other 50% to the MA subset, and both groups had the same percentage of samples of each class considered.
The FS subset was divided into 70% of the samples for the training set and 30% for the testing set. This division was also performed randomly and with the same percentage of each class in each set. The feature selection was then accomplished and, after the stop criterion was reached, the model with the best performance was selected. The selected features, as well as the corresponding threshold, were then recorded and a 10-fold cross validation was performed on the MA subset.
The k-fold cross validation consists of dividing the data into k subsets, using each time k−1 of them for training and the remaining one for testing. Each subset was built so as to have the same percentage of samples of each class. For k times, models were trained and tested using the recorded features and threshold, until all possible combinations of training and testing sets were covered. The results are the average of the performance measures, introduced in section 2.2.3, over all splits. The mean and the standard deviation of the AUC, sensitivity, specificity and accuracy are then reported. The k-fold cross validation allows the evaluation of the validity and robustness of the discovered model by assessing how the model resulting from feature selection would generalize to an independent data set (the MA subset).
To reduce the variability introduced by the division of the data, not only into the FS/MA subsets but also into the train/test subsets, 10 rounds of the 10-fold cross validation process were performed, always using different partitions. The round with the best performance was selected, and the performance measures of that round describe the model created with the selected set of features. Fig. 5.1 summarizes the process.

Figure 5.1: Diagram of the whole process: 1) selecting features with the FS subset; 2) validation of the models created with the selected features, using the MA subset.
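A sketch of the class-stratified splitting used throughout (illustrative Python; the thesis implementation was in Matlab):

import numpy as np
rng = np.random.default_rng(0)

def stratified_split(y, fraction):
    # return index arrays (a, b): `fraction` of each class goes to part a
    a = []
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        a.extend(idx[:int(round(fraction * idx.size))])
    a = np.sort(np.array(a, dtype=int))
    b = np.setdiff1d(np.arange(y.size), a)
    return a, b

y = np.array([0] * 8 + [1] * 2)              # toy labels
fs_idx, ma_idx = stratified_split(y, 0.5)    # 50/50 FS vs MA subsets
tr, te = stratified_split(y[fs_idx], 0.7)    # 70/30 train/test inside FS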
5.2-Optimization Parameters
In order to choose an appropriate set of parameters for the optimization algorithms, it was important to study the parameters to be used. Since the D2BFSS and the BFSS had never been tested before, several measures were chosen to select a fair set of parameters and to evaluate the internal dynamics of the proposed algorithms. These measures, which will be called indicators, used the results of the best solution in the FS process and also the MA results:
FS best fitness: encompassing the performance of the best model and the number of features selected in all iterations of the FS process. It ranges from 0 to 1, the higher the better. Calculated with (2.11), (3.8) or (4.11), depending on the FS algorithm.
Number of features selected: the lower the better.
Performance of the best model in the FS: the performance of the best model in the FS process, the ACC in the case of the sonar database and the AUC in the readmission databases. The higher the better.
Performance of the MA: the 10-fold cross validation result using the MA subset with the features
selected using the FS algorithm. The mean ACC for the sonar database and mean AUC for the
readmission databases. The higher the better.
Iteration of the optimization algorithm with the best solution: although it can occur, the optimization algorithm is not supposed to find the best solution in the first iterations. The algorithm should evolve and converge to a better solution. This indicator can vary between 1 and the maximum number of iterations used in the FS process. Low values for this indicator are considered to signal lower quality solutions.
Percentage of contractions in all iterations: the percentage of all iterations in which, in the collective-volitive operator, the algorithm contracts. This allows the analysis of the internal behaviour of the algorithm. If the percentage is 0%, the algorithm only expanded, and with 100% the algorithm contracted in all iterations. Neither 0% nor 100% is favourable to the correct execution of the algorithm, indicating a general lack of convergence (0%) or convergence to local maxima (100%).
Number of repetitions of the same position of the barycentre: this measure allows the verification of the correct functioning of the internal mechanisms of the algorithm. It varies between 0 and the maximum number of iterations. Values at the limits are considered to signal lower quality solutions.
The plot: the result of the visual analysis of the graphical evolution of the best solution per iteration in the FS process. This graph is supposed to show the convergence of the algorithm along its iterations, as well as the oscillation near the local maxima. Classified as – (bad), + (good) and ++ (very good).
The optimization algorithm should search the space for the solution (exploration) and, at the
same time, converge to a good solution (exploitation). The three last measures help the algorithm
developer to understand what is happening in the internal dynamics of the algorithm, and to achieve a
good ratio of exploration and exploitation.
The D2BFSS and BFSS algorithms, formulated here, do not guarantee in advance a
convergent evolution or a good performance of the created model, so it is crucial to consider as
many factors as possible to ensure the correct functioning of the algorithm.
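As a concrete illustration, the sketch below computes three of these indicators from a hypothetical per-iteration log of the optimizer. The field names and the fitness form are illustrative assumptions, not the thesis code; the exact fitness is given by (2.11), (3.7) or (4.11).

import numpy as np

def fitness(perf, n_selected, alpha, n_total):
    # assumed convex combination of performance and feature parsimony
    return alpha * perf + (1 - alpha) * (1 - n_selected / n_total)

def internal_indicators(log):
    best = np.array([it["best_fitness"] for it in log])
    contracted = np.array([it["contracted"] for it in log], dtype=bool)
    bary = [tuple(it["barycentre"]) for it in log]
    return {
        "fs_best_fitness": float(best.max()),
        "iteration_of_best": int(best.argmax()) + 1,   # 1-based
        "pct_contraction": float(contracted.mean()),   # 0 = only expanded
        "same_barycentre": sum(b == a for a, b in zip(bary, bary[1:])),
    }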
5.3-Sonar database
As described in section 2.1, the sonar database consists of 208 samples with 60 features and
two classes. As the number of samples in each class was nearly the same, the main performance
measure to be maximised was the accuracy. Given the number of features to be selected and the
number of samples, it was decided that, after the study of the optimization parameters, 100 rounds of
the whole process (FS+MA) would be simulated with different FS/MA partitions. The mean, the
standard deviation and the best model created over the 100 rounds would then be used to compare the
different feature selection algorithms. The same data partitions were used in the 100 rounds of all the
algorithms, so that the comparison of the results would be fair.
5.3.1-Sequential Forward Selection
After consulting [16,21], it was verified that previous studies applying the SFS to the
sonar database did not select more than 15 features. The SFS stop criterion was therefore set at 15
selected features.
Since the SFS has no parameters, a parameter study was not necessary, and 100
rounds of the process described in section 5.1 were computed. Table 5.1 shows the mean
and standard deviation of the 10-fold cross-validation results over the 100 rounds, as well as the model
with the best performance. The selected threshold and the number of features in each round were also
recorded.
The SFS algorithm allows the visualization of the quality of each feature subset it selects;
Fig. 5.2 summarizes graphically the FS process for the best model of the 100 rounds.
After the addition of the ninth feature (marked in green in Fig. 5.2), the performance of the models
created with each additional feature did not improve.
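A minimal sketch of this greedy loop with the 15-feature stop criterion follows; evaluate() stands in for the wrapper's cross-validated model performance on a candidate subset.

def sfs(n_features, evaluate, max_features=15):
    selected, history = [], []
    while len(selected) < max_features:
        remaining = [f for f in range(n_features) if f not in selected]
        # add the feature whose inclusion gives the best score
        best = max(remaining, key=lambda f: evaluate(selected + [f]))
        selected.append(best)
        history.append((list(selected), evaluate(selected)))
    # keep the subset with the best score seen, as in Fig. 5.2
    return max(history, key=lambda h: h[1])[0]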
Table 5.1: Model assessment results - SFS method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.73 | 73.19 | 0.77 | 0.69 | 0.12 | 12.05 | 0.18 | 0.21 | 0.48 | 8
std        | 0.04 | 4.14  | 0.07 | 0.08 | 0.03 | 2.59  | 0.05 | 0.05 | 0.05 | 4
best round | 0.81 | 81.51 | 0.87 | 0.75 | 0.15 | 15.13 | 0.13 | 0.24 | 0.45 | 9
5.3.2 Decimal to Binary Fish School Search
As described in section 4.3.2, the D2BFSS algorithm has the same parameters as the original Fish
School Search algorithm, with the addition of the parameter α (see Fig. 4.1). The selected stop
criterion was the number of iterations: 300 iterations.
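The exact decoding is defined in section 4.3.2; one plausible reading, consistent with step sizes on the order of the 2^60 ≈ 1e18 solution count used below, is to read a fish's scalar position as a 60-bit mask of selected features before the fitness call. The sketch below is only this hypothetical reading, not the thesis's exact scheme.

def decode(position, n_features=60):
    code = int(position) % (2 ** n_features)    # clamp into the solution space
    return [f for f in range(n_features) if (code >> f) & 1]

mask = decode(1.3e18)   # indices of the features this position selects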
To maintain consistency when comparing the results of different FS algorithms, the
same number of fish/particles (30) and the same partition of the data (FS/MA and train/test) were used,
as well as the same stop condition for the search process.
In every parameter study the same initial positions of the fish/particles were used.
This reduced the variability of the results and ensured a coherent analysis of the parameters of the
optimization algorithm.
According to the examples in [17], the most sensitive parameters of the FSS algorithm are
stepind (initial and final) and stepvol (initial and final), so these were the first two parameters to be
selected in this study. The tested values of stepind (initial and final) and stepvol
(initial and final) are presented in Table 5.2. These values were extrapolated from the ones used in the
examples presented in [17]. The initial values of these two parameters were set to the total
number of possible solutions of the feature selection problem (2^60 ≈ 1e18), similarly to [17], and the
final values were varied.
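The [initial final] pairs suggest the usual FSS linear decay of a step over the run; the following is a minimal sketch of that assumption, not a confirmed detail of this implementation.

def step_at(iteration, max_iterations, step_init, step_final):
    # linear interpolation from step_init (iteration 0) to step_final
    frac = iteration / max_iterations
    return step_init + (step_final - step_init) * frac

step_at(150, 300, 1e18, 1e3)   # e.g. halfway through the [1e18 1e3] setting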
Figure 5.2: Evolution of ACC with the step-wise inclusion of each individual feature.
The combination stepind = [1e18 1e3] and stepvol = [1e18 1e5] was chosen mainly because of the
low number of features selected and the better mean cross-validation ACC.
It is important to note that, for all combinations presented in Table 5.2, the percentage of
contraction, the number of equal barycentre positions and the plot indicated a poor internal
functioning of the algorithm.
The parameters Wscale and α were then selected (Table 5.3). The chosen values
(Wscale = 50 and α = 0.1) were selected mainly because of the indicators: percentage of
contraction, number of features selected and mean cross-validation ACC, presented in
Table 5.3. The tests with Wscale = 50 were the only ones in which the contraction towards the
barycentre did not occur in all iterations.
Table 5.2: Results of the study of the parameters stepind and stepvol ([initial final]); selected set: [1e18 1e3] and [1e18 1e5]. (The correspondence between row blocks and the three stepind/stepvol settings follows the top-to-bottom order of the merged labels in the original layout.)

stepind | stepvol | Wscale | α | FS fitness | features selected | ACC FS | mean ACC CV | % contraction | same barycentre | plot
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.1 | 0.78 | 12 | 0.63 | 0.77 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.3 | 0.80 | 14 | 0.88 | 0.76 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.5 | 0.81 | 21 | 0.97 | 0.77 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.7 | 0.85 | 22 | 0.94 | 0.71 | 1.00 | 0 | -
[1e18 1e3]  | [1e18 1e5]  | 5000 | 0.9 | 0.94 | 22 | 0.97 | 0.73 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.1 | 0.77 | 14 | 0.75 | 0.68 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.3 | 0.80 | 15 | 0.91 | 0.73 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.5 | 0.81 | 13 | 0.84 | 0.72 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.7 | 0.85 | 21 | 0.94 | 0.75 | 1.00 | 0 | -
[1e18 1e8]  | [1e18 1e12] | 5000 | 0.9 | 0.93 | 26 | 0.97 | 0.74 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.1 | 0.76 | 14 | 0.72 | 0.73 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.3 | 0.82 | 13 | 0.91 | 0.74 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.5 | 0.80 | 16 | 0.88 | 0.82 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.7 | 0.85 | 16 | 0.91 | 0.74 | 1.00 | 0 | -
[1e18 1e12] | [1e18 1e16] | 5000 | 0.9 | 0.91 | 19 | 0.94 | 0.71 | 1.00 | 0 | -
Table 5.3: Results of the study of the parameters Wscale and α; selected set: Wscale = 50 and α = 0.1. (All rows use stepind = [1e18 1e3] and stepvol = [1e18 1e5].)

Wscale | α | FS fitness | features selected | ACC FS | mean ACC CV | % contraction | same barycentre | plot
50    | 0.1 | 0.76 | 13 | 0.62 | 0.71 | 0.41 | 0 | -
50    | 0.3 | 0.79 | 14 | 0.87 | 0.72 | 0.44 | 0 | -
50    | 0.5 | 0.79 | 17 | 0.87 | 0.82 | 0.40 | 0 | -
50    | 0.7 | 0.83 | 19 | 0.90 | 0.75 | 0.40 | 0 | -
50    | 0.9 | 0.90 | 23 | 0.93 | 0.73 | 0.41 | 0 | -
500   | 0.1 | 0.78 | 12 | 0.68 | 0.79 | 1.00 | 0 | -
500   | 0.3 | 0.79 | 15 | 0.90 | 0.75 | 1.00 | 0 | -
500   | 0.5 | 0.81 | 15 | 0.87 | 0.80 | 1.00 | 0 | -
500   | 0.7 | 0.84 | 22 | 0.93 | 0.73 | 1.00 | 0 | -
500   | 0.9 | 0.92 | 27 | 0.96 | 0.69 | 1.00 | 0 | -
5000  | 0.1 | 0.78 | 12 | 0.62 | 0.77 | 1.00 | 0 | -
5000  | 0.3 | 0.79 | 14 | 0.87 | 0.75 | 1.00 | 0 | -
5000  | 0.5 | 0.80 | 21 | 0.96 | 0.76 | 1.00 | 0 | -
5000  | 0.7 | 0.84 | 22 | 0.93 | 0.71 | 1.00 | 0 | -
5000  | 0.9 | 0.93 | 22 | 0.96 | 0.72 | 1.00 | 0 | -
50000 | 0.1 | 0.75 | 15 | 0.78 | 0.74 | 1.00 | 0 | -
50000 | 0.3 | 0.77 | 15 | 0.84 | 0.76 | 1.00 | 0 | -
50000 | 0.5 | 0.81 | 18 | 0.93 | 0.75 | 1.00 | 0 | -
50000 | 0.7 | 0.84 | 22 | 0.93 | 0.76 | 1.00 | 0 | -
50000 | 0.9 | 0.91 | 18 | 0.93 | 0.74 | 1.00 | 0 | -
With the selected set of parameters, 100 rounds were simulated; the results are presented in
Table 5.4.
All tests using D2BFSS presented low convergence (indicator: plot) in the graphical
evolution of the FS optimization algorithm. This random dynamic can be visualized in Fig. 5.3, which
shows the evolution of the best fish per iteration for the best model of Table 5.4.
The number of selected features stays around 20 throughout the 300 iterations, and no
convergence is evident in the graphical evolution of the fitness, the ACC or the number of
features selected.
5.3.3 Binary Fish School Search
Unlike the D2BFSS algorithm, there were no guidelines for the parameter ranges of the
BFSS algorithm, so a wider range of values was explored. The tables with the results of the parameter
study are presented in appendix A. Analogously to D2BFSS, the first parameters to be selected were
thres_c and thres_v. The two structural options were also tested: the use or
not of the adaptive threshold in the collective and volitive operators, presented in sections 4.4 and 4.5.
Table 5.4: Model assessment results - D2BFSS method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.74 | 74.59 | 0.78 | 0.71 | 0.13 | 12.67 | 0.18 | 0.20 | 0.48 | 13
std        | 0.04 | 3.74  | 0.07 | 0.08 | 0.02 | 2.25  | 0.04 | 0.04 | 0.05 | 1
best round | 0.84 | 83.80 | 0.88 | 0.80 | 0.11 | 11.61 | 0.16 | 0.19 | 0.45 | 15
Figure 5.3: Graphical evolution of the D2BFSS feature selection process. Evolution of the fish with the
best performance per iteration (above) and evolution of the number of features it selected per
iteration (below).
Appendix A presents the detailed results of the initial 18 tests, each corresponding
to a different setting of the parameters thres_c and thres_v. These tests used the same partition of
samples for the FS/MA and train/test sets. The first 9 tests used the non-adaptive approach and the
remaining ones the adaptive threshold; the parameter settings of each test are presented in Table 5.5.
Each test considered the variation of the parameter Wscale (5, 50, 500, 5000 and 20000), as well
as of the parameter α (0.1, 0.3, 0.5, 0.7 and 0.9).
To help visualize the selection process, Figs. 5.4-5.8 outline the results of
the 18 tests, each colour representing a different combination of Wscale and α for the
thres_c and thres_v settings of each test.
The most discriminative indicators (introduced in section 5.2) used to select the best
test were: the FS best fitness (Fig. 5.4), the number of features selected (Fig. 5.5), the percentage of
contraction of the volitive operator (Fig. 5.6), the number of equal barycentre positions (Fig.
5.7), the iteration of the optimization algorithm with the best solution (Fig. 5.8), and the plot, which
can be analysed in the detailed tables in appendix A.
Table 5.5: Configuration of the parameters thres_c and thres_v for each of the 18 tests (tests 1-9: non-adaptive threshold; tests 10-18: adaptive threshold).

Test:    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18
thres_c: 0.9 0.7 0.5 0.3 0.1 0.9 0.7 0.1 0.3 0.9 0.7 0.5 0.3 0.1 0.9 0.7 0.1 0.3
thres_v: 0.9 0.7 0.5 0.3 0.1 0.1 0.3 0.9 0.7 0.9 0.7 0.5 0.3 0.1 0.1 0.3 0.9 0.7
Figure 5.4: Graphical representation of the results of tests 1-18 for the indicator FS best fitness, i.e.
the fitness of the best model in the FS process; the higher the better. The overall better performance
of tests 10-18 is evident.
Figure 5.5: Graphical representation of the results of tests 1-18 for the indicator number of features
selected in the FS process; the lower the better. Tests 10-18 achieved slightly better results for this
indicator.
After analysing the results in Figs. 5.4-5.8, it was decided that tests 12 to 16 had the best
configurations. The overall better performance of the tests with the adaptive threshold
(10-18) over those without it (1-9) was obvious. All the indicators were taken into consideration in
these conclusions; however, the most incriminating one was the number of repetitions of the same
position of the barycentre, for which only tests 12-16 performed well.
Recall that a high number of repeated barycentre positions leads to local maxima and,
therefore, to a weaker overall performance of the algorithm. The detailed tables in appendix
A also show that the plot indicator achieved better performances for tests 12-16.
After selecting tests 12-16, it was decided to vary the parameter Wscale over 5, 50,
500, 5000, 20000, 100000 and 1000000, in order to perform a wider analysis of this parameter.
The same strategy was followed to choose the best test: after examining the detailed data, the
decisive indicator for selecting the best test and the respective set of parameters was the number of
repetitions of the same position of the barycentre (Fig. 5.9).
Figure 5.6: Graphical representation of the results of tests 1-18 for the indicator percentage of
contraction in all iterations, i.e. the fraction of contractions in the collective-volitive operator;
neither 0 nor 1 is favourable. Tests 10-18 achieved better results for this indicator.
Figure 5.7: Graphical representation of the results of tests 1-18 for the indicator number of
repetitions of the same position of the barycentre, the position towards which the fish contract or
from which they expand in the collective-volitive operator; the limits are considered to indicate
lower quality solutions. Tests 12-16 achieved better results for this indicator.
Figure 5.8: Graphical representation of the results of tests 1-18 for the indicator iteration with the
best solution in the FS algorithm; results close to 0 are considered a low-convergence configuration.
Tests 10-18 achieved better results for this indicator.
Tests 13 and 16 presented good results for this indicator; however, as shown in the detailed
data, the plot indicator of test 16 proved to be the best. Next, the parameters Wscale and α were
selected within test 16. This was done in a more precise way, looking into the detailed results,
and led to Wscale = 100000; the parameter Wscale proved not to be as relevant as in D2BFSS. The last
parameter to be selected was stepind (initial and final values); the results are shown in Table 5.6.
The final configuration, stepind = [0.01 0.001], was selected mainly due to the
percentage of contraction and the number of repeated barycentre positions.
The 100 rounds with the selected set of parameters were then simulated; the results are
presented in Table 5.7.
A plot of the graphical evolution of the FS process using BFSS is shown in Fig. 5.10, presenting
the fitness and the accuracy of the fish with the best fitness per iteration, as well as the number of
features it selected.
Figure 5.9: Graphical representation of the results of tests 12-16 for the indicator number of repetitions
of the same position of the barycentre. Tests 13 and 16 achieved better results for this indicator.
Table 5.6: Results of the study of the parameter stepind ([initial final]); selected set: [0.01 0.001]. (All rows use thres_c = 0.7, thres_v = 0.3, Wscale = 100000 and α = 0.5.)

stepind      | FS fitness | features selected | ACC FS | mean ACC CV | % contraction | same barycentre | plot
[0.5 0.2]    | 0.85 | 10 | 87.50 | 75.83 | 0.83 | 59  | ++
[0.5 0.001]  | 0.88 | 9  | 90.63 | 73.85 | 0.88 | 91  | ++
[0.2 0.001]  | 0.89 | 9  | 93.75 | 70.35 | 0.91 | 131 | ++
[0.2 0.01]   | 0.87 | 12 | 93.75 | 74.83 | 0.89 | 109 | ++
[0.01 0.001] | 0.85 | 12 | 90.63 | 72.99 | 0.50 | 26  | ++
Table 5.7: Model assessment results - BFSS method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.72 | 72.85 | 0.78 | 0.67 | 0.12 | 12.00 | 0.18 | 0.22 | 0.46 | 7
std        | 0.04 | 4.27  | 0.08 | 0.10 | 0.02 | 2.14  | 0.05 | 0.05 | 0.05 | 2
best round | 0.85 | 85.62 | 0.95 | 0.76 | 0.09 | 9.29  | 0.09 | 0.16 | 0.43 | 8
The plot shows a more explicit convergence throughout the entire optimization process in
comparison with the plot of the D2BFSS algorithm (Fig. 5.3).
5.3.4 Binary Particle Swarm Optimization
Analogously to the modified FSS algorithms, 300 iterations and 30 particles were used in the
BPSO simulations. The BPSO had already been applied to the sonar database in the feature selection
problem [64], so the parameters suggested in [64] were used:
Vmax = 5
Pmut = no
Rmut= 0.5/Nf
Reset xsb=4
α = 0.7
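For reference, the following is a sketch of the standard binary-PSO position update with the Vmax clamp quoted above. The mutation (Pmut, Rmut) and reset options are omitted, and c1 = c2 = 2 is an assumption, not a value taken from [64].

import numpy as np

def bpso_step(x, v, pbest, gbest, vmax=5.0, c1=2.0, c2=2.0, rng=np.random):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v = np.clip(v, -vmax, vmax)                     # Vmax = 5
    prob = 1.0 / (1.0 + np.exp(-v))                 # sigmoid of the velocity
    x = (rng.random(x.shape) < prob).astype(int)    # resample each bit
    return x, v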
The results of the 100 rounds are presented in Table 5.8.
Table 5.8: Model assessment results - BPSO method using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
mean       | 0.73 | 73.30 | 0.78 | 0.67 | 0.13 | 13.20 | 0.17 | 0.20 | 0.47 | 8
std        | 0.04 | 4.31  | 0.09 | 0.09 | 0.04 | 3.59  | 0.05 | 0.05 | 0.05 | 2
best round | 0.82 | 82.67 | 0.92 | 0.73 | 0.10 | 9.28  | 0.12 | 0.15 | 0.45 | 7
Figure 5.10: Graphical evolution of the BFSS feature selection process. Evolution of the fish with
the best performance per iteration (above) and evolution of the number of features it selected per
iteration (below).
The plot of the FS process for the best round is shown in Fig. 5.11.
After iteration ~70 the best particle does not change. This behaviour suggests that
early convergence to a local maximum may be occurring.
5.3.5 Comparison of Feature Selection Methods
It is now possible to compare the performances of the several FS algorithms applied to the
sonar database; the results are summarized in Table 5.9. The results of the process described in
section 5.2 without FS (no FS) are also presented; in this case the data were also divided into FS/MA
subsets, but the FS subset was only used to find the threshold for the data.
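The exact threshold criterion is defined earlier in the thesis; as an illustration only, the sketch below picks the threshold on the FS subset by maximizing the product of sensitivity and specificity, which is a stand-in for that criterion.

import numpy as np

def pick_threshold(y_true, y_score, grid=np.linspace(0.05, 0.95, 19)):
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    def score(t):
        pred = y_score >= t
        sens = pred[y_true == 1].mean()      # true-positive rate
        spec = (~pred)[y_true == 0].mean()   # true-negative rate
        return sens * spec
    return max(grid, key=score)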
The main performance measures compared here were the mean accuracy (%) of the
cross-validation process and the number of features selected.
Table 5.9: Comparison between the studied FS approaches using the sonar database. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 100 rounds.)

method | 100 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
NO FS  | mean       | 0.74 | 74.66 | 0.77  | 0.71 | 0.11 | 11.51 | 0.17 | 0.19 | 0.48  | 60
NO FS  | std        | 0.03 | 3.59  | 0.054 | 0.05 | 0.02 | 2.53  | 0.04 | 0.05 | 0.041 | 0
NO FS  | best round | 0.83 | 83.57 | 0.87  | 0.79 | 0.10 | 9.61  | 0.08 | 0.15 | 0.50  | 60
SFS    | mean       | 0.73 | 73.19 | 0.77  | 0.69 | 0.12 | 12.05 | 0.18 | 0.21 | 0.48  | 8
SFS    | std        | 0.04 | 4.14  | 0.07  | 0.08 | 0.03 | 2.59  | 0.05 | 0.05 | 0.05  | 4
SFS    | best round | 0.81 | 81.51 | 0.87  | 0.75 | 0.15 | 15.13 | 0.13 | 0.24 | 0.45  | 9
D2BFSS | mean       | 0.74 | 74.59 | 0.78  | 0.71 | 0.13 | 12.67 | 0.18 | 0.20 | 0.48  | 13
D2BFSS | std        | 0.04 | 3.74  | 0.07  | 0.08 | 0.02 | 2.25  | 0.04 | 0.04 | 0.05  | 1
D2BFSS | best round | 0.84 | 83.80 | 0.88  | 0.80 | 0.11 | 11.61 | 0.16 | 0.19 | 0.45  | 15
BFSS   | mean       | 0.72 | 72.85 | 0.78  | 0.67 | 0.12 | 12.00 | 0.18 | 0.22 | 0.46  | 7
BFSS   | std        | 0.04 | 4.27  | 0.08  | 0.10 | 0.02 | 2.14  | 0.05 | 0.05 | 0.05  | 2
BFSS   | best round | 0.85 | 85.62 | 0.95  | 0.76 | 0.09 | 9.29  | 0.09 | 0.16 | 0.43  | 8
BPSO   | mean       | 0.73 | 73.30 | 0.78  | 0.67 | 0.13 | 13.20 | 0.17 | 0.20 | 0.47  | 8
BPSO   | std        | 0.04 | 4.31  | 0.09  | 0.09 | 0.04 | 3.59  | 0.05 | 0.05 | 0.05  | 2
BPSO   | best round | 0.82 | 82.67 | 0.92  | 0.73 | 0.10 | 9.28  | 0.12 | 0.15 | 0.45  | 7
Figure 5.11: Graphical evolution of the BPSO feature selection process. Evolution of the particle
with the best performance per iteration (above) and evolution of the number of features it selected
per iteration (below).
As expected, the FS algorithms using metaheuristic optimization (D2BFSS, BFSS
and BPSO) achieved better results than the SFS and the no-feature-selection approaches.
Regarding the number of features selected, the D2BFSS method performed worse than
the BFSS and the BPSO. This, and the fact that the D2BFSS presented poor convergence
throughout the iterations of the FSS process (section 5.3.2), led to the conclusion that the
decimal-to-binary scheme was not the best approach to reach results similar to the state-of-the-art
algorithms, reinforcing the need to modify the internal mechanisms of the FSS algorithm to deal with
binary inputs, i.e. the BFSS.
It can be concluded that, although the BPSO algorithm selected one feature fewer than
the BFSS, the performance of the BFSS algorithm clearly exceeded that of the BPSO. It is believed
that the collective-volitive operator of the BFSS, which makes the fish contract or
expand at each iteration, is the main mechanism that keeps the algorithm from converging to local
maxima and thereby obtains better results.
5.4-Readmission database
After the processing of the MIMIC II database [15] (section 2.1.2), four different
datasets were available, each including a different gradient of weights for the weighted mean used to
characterise the time series of each of the 22 variables considered per patient.
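The exact weight distribution is defined in section 2.1.2; the sketch below shows a gradient-parameterized weighted mean under the assumption that the weights grow linearly along the time series, with the quoted gradient setting how strongly late samples dominate.

import numpy as np

def weighted_mean(series, gradient):
    w = 1.0 + gradient * np.arange(len(series))   # linearly rising weights
    return float(np.dot(w, series) / w.sum())

weighted_mean([80, 85, 90, 110], 0.9)   # late samples weigh the most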
Each dataset is composed of 726 samples (patients) with 132 features. The goal is to classify
each patient into one of two classes: readmitted or not readmitted within 24 to 72 hours after
discharge. Since the classes are quite imbalanced (12.3% readmitted and 87.7% not readmitted),
the main performance measure to be maximized is the AUC. Due to the high number of features to
be selected and the high number of samples, in comparison to the sonar database, it was decided
to perform 500 rounds of the whole process (FS+MA), always with different partitions of the data
into the FS/MA and train/test sets. The same partitions were, however, used across the different
FS algorithms, making the results comparable.
For each of the four datasets, the SFS, BFSS and BPSO algorithms were applied to the
feature selection. Due to its low overall performance and lack of convergence on the sonar database,
the D2BFSS was discarded.
5.4.1 Sequential Forward Selection
The number of features selected is crucial in health care problems: a lower number of
features corresponds to simpler models, as stated in section 1.1.2. After the analysis of several
articles [15,16] that used fuzzy modelling and feature selection to predict the readmission of ICU
patients, it was decided to set the stop criterion of the SFS algorithm at 10 selected features. After
the stop criterion was met, the set of selected features with the best performance (AUC) was kept.
Table 5.10 shows the results of the 500 rounds for each of the datasets.
The results show that the use of different gradients for the weights of the weighted mean was
not significant: the number of features selected and the corresponding AUC did not reveal a
definitive pattern for the best gradient.
5.4.2 Binary Fish School Search
The stop criterion selected for the metaheuristic algorithms (BFSS and BPSO) was the number
of iterations: 500 iterations with 10 fish/particles. This choice took into consideration
the maximum number of features to be selected.
Once again, a study was performed to select the parameters. The first parameters to be
selected were thres_c and thres_v. The conclusions of the parameter study made for the sonar
database were reused here: only the 5 tests with significantly better overall performance were used to
select the thres_c and thres_v parameters. The configuration of these 5 tests is presented in
Table 5.11.
Table 5.10: Model assessment results - SFS method using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.63 | 56.19 | 0.73 | 0.53 | 0.08 | 17.09 | 0.23 | 0.21 | 0.13 | 8
0.1 | std        | 0.02 | 6.07  | 0.07 | 0.07 | 0.01 | 4.01  | 0.05 | 0.05 | 0.02 | 2
0.1 | best round | 0.68 | 64.57 | 0.73 | 0.63 | 0.12 | 19.88 | 0.13 | 0.22 | 0.15 | 10
0.4 | mean       | 0.63 | 56.79 | 0.72 | 0.54 | 0.08 | 17.28 | 0.23 | 0.21 | 0.13 | 8
0.4 | std        | 0.01 | 6.09  | 0.07 | 0.07 | 0.01 | 4.05  | 0.05 | 0.05 | 0.02 | 2
0.4 | best round | 0.69 | 61.28 | 0.79 | 0.59 | 0.11 | 15.59 | 0.18 | 0.17 | 0.15 | 9
0.6 | mean       | 0.63 | 56.28 | 0.73 | 0.53 | 0.08 | 17.51 | 0.23 | 0.22 | 0.13 | 8
0.6 | std        | 0.01 | 6.03  | 0.07 | 0.07 | 0.02 | 4.01  | 0.05 | 0.05 | 0.02 | 2
0.6 | best round | 0.68 | 51.37 | 0.91 | 0.45 | 0.08 | 9.35  | 0.11 | 0.10 | 0.10 | 10
0.9 | mean       | 0.63 | 56.72 | 0.72 | 0.54 | 0.08 | 17.32 | 0.23 | 0.22 | 0.13 | 8
0.9 | std        | 0.02 | 6.13  | 0.07 | 0.07 | 0.02 | 4.14  | 0.05 | 0.05 | 0.02 | 2
0.9 | best round | 0.68 | 66.43 | 0.71 | 0.65 | 0.10 | 14.59 | 0.14 | 0.16 | 0.15 | 9
Table 5.11: Configuration of the parameters thres_c and thres_v for each of the 5 tests (all with the adaptive threshold).

Test:    1   2   3   4   5
thres_c: 0.5 0.3 0.1 0.9 0.7
thres_v: 0.5 0.3 0.1 0.1 0.3
These parameter combinations correspond to the 5 best configurations of the first
parameter study on the sonar database (tests 12, 13, 14, 15 and 16 in appendix A, with the
adaptive threshold). The results of these 5 configurations, presented in section 5.3.3, were clear
enough that there was no need to test all the configurations again.
As the four datasets differ only in the 22 weighted-mean features, the parameter study was
performed on a single dataset. The detailed results of these 5 tests are given in
appendix B.
The same strategy as in the sonar database was adopted to choose the best test. For each
test, the parameters Wscale (5, 50, 500, 5000, 20000, 100000 and 1000000) and α (0.1,
0.3, 0.5, 0.7 and 0.9) were varied.
From the detailed results it became evident that the indicator that stood out
was the plot; the other indicators proved inconclusive. As an example, Fig. 5.12 illustrates the
graphical representation of three indicators.
As the graphical visualization is a qualitative analysis whose goal is to exclude tests with an
obviously weak performance, it was not possible to conclude from it alone which test performed
best. However, as shown in appendix B, the plot indicator shows that test 3 converges really
well in the graphical evolution when compared with the other tests.
Figure 5.12: Graphical representation of the results of tests 1-5.
The parameters Wscale and α were selected by observing the detailed data of test 3:
Wscale = 500 and α = 0.1. The measures with the greatest influence on this choice were the number of
features selected (low) and the performance of the best model in the FS process (high). The final
parameter to be optimized was stepind; the results are presented in Table 5.12.
The final value of stepind, [0.01 0.001], was selected mainly because of the percentage of
contraction. Recall that a very low percentage of contraction in the volitive operator means no
convergence (low exploitation), while a very high percentage means convergence to local maxima
(low exploration).
With a viable set of parameters selected, the 500 rounds were performed for the 4 datasets
with different gradients for the weights of the weighted mean; the results are presented in Table
5.13.
As with the SFS results, no gradient for the weighted means achieved a significantly better
performance. As an illustrative example, the graphical evolution of the FS process using BFSS on
the dataset with gradient 0.1 is presented in Fig. 5.13.
The graphical evolution in Fig. 5.13 clearly confirms the convergence of the novel
BFSS algorithm formulated in this thesis. In the first ~200 iterations there is a transient dynamic,
in which the BFSS algorithm mainly converges towards a lower number of features. In the remaining
iterations, the algorithm searches for the optimal combination of a low number of features and a
high AUC.
Table 5.12: Results of the study of the parameter stepind ([initial final]); selected set: [0.01 0.001]. (All rows use thres_c = 0.1, thres_v = 0.1, Wscale = 500 and α = 0.1.)

stepind      | FS fitness | features selected | iteration with best model in FS | ACC FS | mean AUC CV | % contraction | same barycentre | plot
[0.5 0.02]   | 0.94 | 4 | 166 | 0.64 | 0.60 | 0.10 | 39 | ++
[0.5 0.001]  | 0.94 | 4 | 166 | 0.64 | 0.60 | 0.11 | 39 | ++
[0.2 0.01]   | 0.92 | 8 | 149 | 0.71 | 0.59 | 0.28 | 42 | ++
[0.2 0.001]  | 0.95 | 1 | 96  | 0.54 | 0.54 | 0.27 | 44 | ++
[0.01 0.001] | 0.93 | 7 | 314 | 0.75 | 0.58 | 0.42 | 80 | ++
Table 5.13: Model assessment results - BFSS method using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.61 | 53.32 | 0.72 | 0.51 | 0.07 | 20.52 | 0.27 | 0.27 | 0.13 | 5
0.1 | std        | 0.03 | 7.99  | 0.10 | 0.10 | 0.02 | 5.79  | 0.07 | 0.08 | 0.03 | 1
0.1 | best round | 0.69 | 67.00 | 0.73 | 0.66 | 0.08 | 8.24  | 0.21 | 0.11 | 0.15 | 6
0.4 | mean       | 0.61 | 53.85 | 0.71 | 0.51 | 0.07 | 20.95 | 0.27 | 0.27 | 0.13 | 5
0.4 | std        | 0.03 | 8.35  | 0.11 | 0.11 | 0.02 | 5.69  | 0.07 | 0.08 | 0.03 | 1
0.4 | best round | 0.68 | 63.16 | 0.76 | 0.61 | 0.13 | 15.21 | 0.20 | 0.17 | 0.15 | 6
0.6 | mean       | 0.61 | 52.85 | 0.72 | 0.50 | 0.07 | 20.98 | 0.27 | 0.27 | 0.13 | 5
0.6 | std        | 0.02 | 8.19  | 0.10 | 0.11 | 0.02 | 5.70  | 0.07 | 0.08 | 0.03 | 1
0.6 | best round | 0.67 | 64.58 | 0.70 | 0.64 | 0.11 | 11.41 | 0.22 | 0.14 | 0.15 | 7
0.9 | mean       | 0.61 | 53.69 | 0.71 | 0.51 | 0.07 | 21.35 | 0.28 | 0.28 | 0.13 | 5
0.9 | std        | 0.03 | 7.73  | 0.10 | 0.10 | 0.02 | 5.74  | 0.08 | 0.08 | 0.03 | 1
0.9 | best round | 0.69 | 55.53 | 0.87 | 0.51 | 0.11 | 14.12 | 0.11 | 0.16 | 0.15 | 6
It is important to recall that the BFSS's internal mechanisms, more specifically the collective-
volitive operator, impose a contraction (if the individual search is successful) or an expansion (if it
is unsuccessful), which results in the oscillation of the graphical evolution. This effect is highly
beneficial for avoiding convergence to local maxima while still maintaining
convergence.
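The thesis's exact binary operator is defined in sections 4.4-4.5; the sketch below only illustrates the contract-on-success / expand-on-failure logic described above, with the weighted barycentre thresholded into a binary point, and should not be read as the algorithm itself.

import numpy as np

def volitive_step(X, w, school_improved, thres_v=0.3, rng=np.random):
    # weight-weighted barycentre of the binary fish positions
    bary = (w @ X) / w.sum() >= thres_v          # thresholded to bits
    move = rng.random(X.shape) < 0.5             # bits allowed to change
    if school_improved:
        X = np.where(move, bary, X)              # contract towards it
    else:
        X = np.where(move, 1 - bary, X)          # expand away from it
    return X.astype(int)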
5.4.3 Binary Particle Swarm Optimization
The BPSO had already been used in feature selection problems with datasets derived from
the MIMIC II database. Even though the Shannon entropy and the weighted mean were introduced
here, the overall characteristics of the database are very similar to those of the datasets used
before [17]. Thus, it was decided to use the parameters referred in [17], with the exception of the
parameter α:
Vmax = 5
Pmut = no
Rmut= 0.5/Nf
Reset xsb=4
Figure 5.13: Graphical evolution of the BFSS feature selection process. Evolution of the fish
with the best performance per iteration (above) and evolution of the number of features it selected
per iteration (below).
With the introduction of the Shannon entropy and the weighted mean, the total number of
features to be selected was higher than in [17]. The parameter α is directly related to the final number
of features selected and is affected by the total number of features, so it was necessary to perform an
analysis to select the value of α; Table 5.14 shows the results of these tests. Once again, this
analysis was made only for the dataset with gradient 0.1.
It was decided to use α = 0.5. The final set of parameters was then used to perform
the 500 rounds for each of the 4 readmission datasets; the results are presented in Table 5.15.
The results of the BPSO algorithm in the feature selection process show that the
gradient of 0.9 achieved a better overall performance than the other gradients, mainly because
of the lower number of features selected. Nonetheless, the differences are not strong,
reinforcing the idea that the use of different gradients is not significant. As an
illustrative example, the graphical evolution of the best model using gradient 0.9 is presented in Fig.
5.14. The evolution shows an explicit convergence of the BPSO algorithm. Unlike the BFSS
algorithm, after the transient dynamic (1st to ~250th iteration), the oscillation of the best particle
per iteration is not sharp. This may be related to a lack of exploration and a consequent lower
performance. In the author's opinion, this is the main disadvantage of the BPSO compared with the BFSS.
Table 5.14: Results of the study of the parameter α; selected value: α = 0.5.

α   | features selected | mean AUC cross-validation
0.1 | 1  | 0.52
0.3 | 6  | 0.61
0.5 | 8  | 0.64
0.7 | 15 | 0.68
0.9 | 21 | 0.64
Table 5.15: Model assessment results - BPSO method using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.62 | 53.60 | 0.73 | 0.51 | 0.07 | 17.53 | 0.24 | 0.23 | 0.14 | 10
0.1 | std        | 0.02 | 7.18  | 0.09 | 0.09 | 0.02 | 5.50  | 0.07 | 0.07 | 0.03 | 3
0.1 | best round | 0.68 | 56.72 | 0.83 | 0.53 | 0.08 | 15.92 | 0.16 | 0.19 | 0.15 | 11
0.4 | mean       | 0.53 | 68.34 | 0.33 | 0.73 | 0.11 | 7.91  | 0.20 | 0.09 | 0.15 | 10
0.4 | std        | 0.04 | 7.05  | 0.14 | 0.10 | 0.03 | 2.29  | 0.06 | 0.02 | 0.02 | 3
0.4 | best round | 0.67 | 68.89 | 0.64 | 0.70 | 0.15 | 8.58  | 0.25 | 0.08 | 0.15 | 11
0.6 | mean       | 0.62 | 53.80 | 0.73 | 0.51 | 0.07 | 17.85 | 0.25 | 0.23 | 0.14 | 9
0.6 | std        | 0.02 | 6.51  | 0.08 | 0.08 | 0.02 | 5.53  | 0.07 | 0.07 | 0.03 | 3
0.6 | best round | 0.69 | 58.44 | 0.84 | 0.55 | 0.12 | 14.69 | 0.17 | 0.16 | 0.15 | 14
0.9 | mean       | 0.59 | 57.66 | 0.62 | 0.57 | 0.08 | 14.60 | 0.23 | 0.18 | 0.14 | 10
0.9 | std        | 0.05 | 9.75  | 0.21 | 0.14 | 0.03 | 6.58  | 0.07 | 0.09 | 0.03 | 3
0.9 | best round | 0.67 | 59.13 | 0.78 | 0.57 | 0.12 | 10.23 | 0.31 | 0.14 | 0.10 | 8
5.4.4 No feature selection
For each of the 4 datasets, the whole process was computed without using an FS algorithm
(i.e. using all available features). The results of the 500 rounds are presented in Table 5.16, and
show that, without feature selection, the performances on the different datasets are very similar.
Table 5.16: Model assessment results - without feature selection, using the readmission datasets with different gradients. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

gradient | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
0.1 | mean       | 0.59 | 48.78 | 0.72 | 0.46 | 0.06 | 28.10 | 0.34 | 0.37 | 0.31 | 132
0.1 | std        | 0.02 | 10.24 | 0.12 | 0.13 | 0.02 | 5.02  | 0.07 | 0.07 | 0.04 | 0
0.1 | best round | 0.65 | 64.84 | 0.66 | 0.65 | 0.08 | 19.88 | 0.25 | 0.25 | 0.25 | 132
0.4 | mean       | 0.59 | 49.31 | 0.71 | 0.46 | 0.06 | 0.28  | 0.34 | 0.36 | 0.31 | 132
0.4 | std        | 0.02 | 10.39 | 0.11 | 0.13 | 0.02 | 0.05  | 0.07 | 0.07 | 0.05 | 0
0.4 | best round | 0.66 | 66.89 | 0.66 | 0.67 | 0.11 | 0.25  | 0.27 | 0.32 | 0.25 | 132
0.6 | mean       | 0.59 | 49.84 | 0.70 | 0.47 | 0.06 | 28.49 | 0.34 | 0.37 | 0.31 | 132
0.6 | std        | 0.02 | 9.73  | 0.11 | 0.13 | 0.02 | 4.50  | 0.06 | 0.06 | 0.05 | 0
0.6 | best round | 0.66 | 60.59 | 0.74 | 0.59 | 0.08 | 17.77 | 0.14 | 0.21 | 0.20 | 132
0.9 | mean       | 0.58 | 48.77 | 0.71 | 0.46 | 0.06 | 28.45 | 0.34 | 0.37 | 0.32 | 132
0.9 | std        | 0.02 | 9.80  | 0.11 | 0.13 | 0.02 | 4.82  | 0.06 | 0.06 | 0.05 | 0
0.9 | best round | 0.65 | 67.58 | 0.63 | 0.68 | 0.10 | 11.73 | 0.21 | 0.14 | 0.20 | 132
Figure 5.14: Graphical evolution of the BPSO feature selection process. Evolution of the particle
with the best performance per iteration (above) and evolution of the number of features it selected
per iteration (below).
5.4.5 Discussion
After collecting the results of all the feature selection methods on the readmission datasets,
it can be stated that the use of different gradients for the weights of the weighted mean proved to
be weakly relevant. With the exception of the BPSO results, all the algorithms showed almost no
sensitivity to the different gradients of the linear distribution of weights of the weighted mean
for the 22 physiologic variables.
The performance of the different feature selection algorithms can be compared on the
same dataset. The results of one dataset were collected for this comparison; Table 5.17
summarizes them.
The comparison of the different feature selection algorithms is not as clear-cut as for the
benchmark sonar database, possibly due to the variability introduced by dealing with
real data in high-dimensional spaces. The metaheuristic algorithms nevertheless achieved a
better overall performance than the SFS and the no-feature-selection results.
Once again, the BFSS achieved the best overall results among all the methods used,
with a slightly better mean AUC and fewer features selected.
It is noteworthy that, for all the datasets used, the BFSS (Table 5.13) selected
considerably fewer features than all the other methods, while maintaining the convergence of the
algorithm and slightly better model performance.
It is also relevant that the sensitivity of the best model using the BFSS algorithm is
considerably better than that of the other algorithms; recall that this measure indicates
the true positive rate, i.e. the fraction of patients predicted as readmitted who in fact were.
Table 5.17: Comparison between the studied FS approaches using the readmission dataset with gradient 0.9. (Value columns: 10-fold cross-validation mean and standard deviation; rows: mean, standard deviation and best of the 500 rounds.)

method | 500 rounds | AUC | ACC % | sensitivity | specificity | AUC (std) | ACC % (std) | sensitivity (std) | specificity (std) | threshold | features selected
NO FS | mean       | 0.58 | 48.77 | 0.71 | 0.46 | 0.06 | 28.45 | 0.34 | 0.37 | 0.32 | 132
NO FS | std        | 0.02 | 9.80  | 0.11 | 0.13 | 0.02 | 4.82  | 0.06 | 0.06 | 0.05 | 0
NO FS | best round | 0.65 | 67.58 | 0.63 | 0.68 | 0.10 | 11.73 | 0.21 | 0.14 | 0.20 | 132
SFS   | mean       | 0.64 | 56.73 | 0.73 | 0.54 | 0.08 | 17.33 | 0.24 | 0.22 | 0.14 | 8
SFS   | std        | 0.02 | 6.14  | 0.08 | 0.08 | 0.02 | 4.15  | 0.06 | 0.05 | 0.02 | 2
SFS   | best round | 0.68 | 66.43 | 0.71 | 0.66 | 0.10 | 14.59 | 0.14 | 0.17 | 0.15 | 9
BFSS  | mean       | 0.61 | 53.69 | 0.71 | 0.51 | 0.07 | 21.35 | 0.28 | 0.28 | 0.13 | 5
BFSS  | std        | 0.03 | 7.73  | 0.10 | 0.10 | 0.02 | 5.74  | 0.08 | 0.08 | 0.03 | 1
BFSS  | best round | 0.69 | 55.53 | 0.87 | 0.51 | 0.11 | 14.12 | 0.11 | 0.16 | 0.15 | 6
BPSO  | mean       | 0.59 | 57.66 | 0.62 | 0.57 | 0.08 | 14.60 | 0.23 | 0.18 | 0.14 | 10
BPSO  | std        | 0.05 | 9.75  | 0.21 | 0.14 | 0.03 | 6.58  | 0.07 | 0.09 | 0.03 | 3
BPSO  | best round | 0.67 | 59.13 | 0.78 | 0.57 | 0.12 | 10.23 | 0.31 | 0.14 | 0.10 | 8
Chapter 6
Conclusion
This work has addressed the problem of feature selection. Firstly, the background of machine
learning was presented, followed by the definition of its application to the classification problem.
Then, the two databases to be used were introduced: the benchmark sonar database and
the readmission datasets derived from the MIMIC II database. Further along, the fuzzy classification
models and the c-means clustering techniques were briefly described.
The modifications that allow the Fish School Search optimization algorithm to be used in
binary input problems were then formulated and applied to the feature selection problem,
with emphasis on the novel Binary Fish School Search. Other state-of-the-art FS algorithms were also
described, namely the SFS and the BPSO approaches.
Finally, the proposed methods were validated on the benchmark sonar database and then
applied to a case study: the prediction of patients' readmission to an ICU within the 24-72h period
following their discharge.
This chapter summarizes the conclusions drawn during the development of this work and
suggests topics for future research.
6.1- Binary Fish School Search
In general, the modification of an existing algorithm is risky: it is very difficult for the
algorithm developer to guarantee in advance that the algorithm will perform better than the state-
of-the-art algorithms, or even to assure its convergence.
There are countless ways to modify algorithms with real encoding schemes so that they can
solve binary input problems. In this thesis, two novel approaches to the FSS algorithm were
presented: the D2BFSS and the BFSS. The decimal-to-binary approach was achieved through a
simple passage from a continuous to a discrete system in the use of the objective function of the
original FSS algorithm. The BFSS was a more complex approach that modified the internal
mechanisms of the FSS algorithm, allowing the procedure itself to manipulate binary inputs.
The results on the benchmark sonar database for the feature selection problem showed that
the simple manipulation of the fitness function, transforming the objective function from the
decimal to the binary system, achieved a low performance and a very low convergence when
compared to the BFSS algorithm. This result was expected, given the simplicity of the D2BFSS, and
reinforced the need to implement internal changes in the FSS algorithm.
The results on the two tested databases showed that the initial development goals proposed
for the BFSS algorithm were achieved: remaining faithful to the original internal mechanisms of the
FSS without losing the meaning of each operator, adding only a small number of parameters,
keeping the modifications simple and understandable, and, mainly, ensuring the convergence of the
algorithm.
However, it is important to note that the number of parameters used in the BFSS process is
high in comparison with other feature selection algorithms, and some limitations of the proposed
algorithm can arise if the parameters are poorly adjusted.
In conclusion, this brand new binary optimization algorithm exceeded expectations, not
only in its convergent evolution but also in its good results in comparison with the other feature
selection algorithms, especially the BPSO. The main contribution to these results is believed to
be the collective-volitive operator, which played a major role in the exploration
process of the algorithm and, consequently, in its capacity to avoid local maxima.
It is also important to note that, although the formulated BFSS algorithm was used here for
the problem of feature selection, it can be applied to any optimization problem with a binary encoding
of the inputs, opening doors to future fields of research.
6.2- Prediction of readmissions
The dimensionality of medical data is typically very high, hence the great importance of
feature selection algorithms in this environment for improving the early detection of medical
problems. The proposed wrapper methods are appropriate for this problem for various reasons,
such as the ability to deal with large databases, the capacity to predict outcomes by means of a
small subset of features, and the possibility of bringing out new sets of variables never considered before.
However, the proposal of using different gradients for the linear distribution of the weights of
the weighted mean, used to describe the time series of the physiologic variables, proved not to be
significant. It was expected that one gradient would stand out, but the results show that the use of
different gradients brought no significant benefit to the performance of the models or to the
number of features selected.
6.3- Future work
In this work, various modifications to the FSS algorithm were proposed; nevertheless, not all
the possible paths were explored. The most promising future research topics on the BFSS algorithm
could be the following:
Use of the BFSS algorithm in other applications: although the presented BFSS algorithm was used
for the feature selection problem, it can be used in any optimization problem with a binary
encoding of the inputs.
Reduction of the number of parameters: fewer parameters mean less time spent on their
selection and, generally, more simplicity. This must be achieved while maintaining the good
performance of the models and the low number of features selected.
Use of different parameter update strategies: instead of keeping the parameters constant in all
iterations of the algorithm, their values could vary over the iterations. Studies [31] have already
shown that using dynamic values for the weights of the fish in FSS algorithms can be beneficial.
Development and refinement of FSS operators: testing other possible paths in the encoding of
the binary inputs.
To prove the full potential of the BFSS algorithm, it is of major importance to:
Validate the results on other benchmark databases: testing databases with different dimensions
and comparing the results with other state-of-the-art algorithms.
Not use the weighted mean in the MIMIC II database: using the same kind of statistical
measures as the latest studies on the prediction of readmission of ICU patients. This would allow
comparing the performance of the BFSS with the state-of-the-art feature selection
algorithms on the readmission problem.
Bibliography
[1] D. C. Angus. Grappling with intensive care unit quality: does the readmission rate tell us
anything?. Critic Care Med, 26(11) pages 1779-1780. 1998.
[2] R. Babuska. Fuzzy Modeling for Control. International Series in Intelligent Technologies. Kluwer
Academic Publishers, Norwell, MA, USA, 1st edition. April 1998.
[3] W. Baigelman, R. Katz, G. Geary. Patient readmission to critical care unit during the same
hospitalization at a community teaching hospital. Intensive Care Med, 9(5), pages 253-256. 1983.
[4] A. M. Bensaid, L. O. Hall, J. C. Bezdek, L. P. Clarke, M. L. Silbiger, J. A. Arrington, R. F. Murtagh.
Validity-guided (re)clustering with applications to image segmentation. In IEEE Transactions on
Fuzzy Systems, pages 112-123. 1996.
[5] J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Kluwer Academic
Publishers, Norwell, MA, USA. 1981.
[6] A. L. Blum, R. L. Rivest. Training a 3-node Neural Network is NP-complete. Neural Networks,
pages 117-127, 1992.
[7] A. J. Campbell, J. A. Cook, G. Adey. In Predicting death and readmission after intensive care
discharge. Br J Anaesth, pages 656-662, 2008.
[8] S. Chakrabarti, E. Cox, E. Frank, R. H. G., J. Han, X. Jiang, M. Kamber, S. S. Lightstone, T. P. Nadeau,
R. E. Neapolitan, D. Pyle, M. Refaat, M. Schneider, T. J. Teorey, I. H. Witten. Data Mining: Know It
All. Elsevier. 2009.
[9] G.D. Clifford, W. J. Long, G.B. Moody, P. Szolovits. Robust parameter extraction for decision
support using multimodal intensive care data. Philosophical Transactions, Series A,
Mathematical, Physical, and Engineering Sciences, 367, pages 411-429. 2009.
[10] J. Dayou, N. C. Han, H. C. Mun, A. H. Ahmad. Classification and Identification of Frog Sound
Based on Entropy Approach. IPCBEE, vol. 3, pages 184-187. 2011.
[11] R. O. Duda, P. E. Hart. Pattern Classification and Scene Analysis. Wiley & Sons. 1973.
[12] C. G. Durbin Jr, R. F. Kopel. A case control study of patients readmitted to the intensive care
unit. Critic Care Med, 21, pages 1547-1553. 1993.
[13] A. P. Engelbrecht. Computational Intelligence: An Introduction. New York: John Wiley. 2003.
[14] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth. From data mining to knowledge discovery in
databases. AI Magazine, 17(3), pages 37-54. 1996.
[15] A.S. Fialho, F. Cismondi,S.M. Vieira, S.R. Reti, J.M.C. Sousa, S.N. Finkelstein. Data mining using
clinical physiology at discharge to predict ICU readmissions. Expert Systems with Applications
39. 2012.
[16] A. S. Fialho, F. Cismondi, S. M. Vieira, J. M. C. Sousa, S. R. Reti, M. D. Howell, and S. N. Finkelstein.
Predicting outcomes of septic shock patients using feature selection based on soft computing
techniques. In Proc. of the IPMU 13th International Conference, pages 65-74. 2010.
[17] C. J. A B. Filho. , F. B. L. Neto, A. J. C. C. Lins, A. I. S. Nascimento, M. P. Lima. A Novel Search
Algorithm based on Fish School Behavior. IEEE International Conference on Systems, Man, and
Cybernetics - SMC. 2008.
[18] S. N. Ghazavi,T. W. Liao. Medical data mining by fuzzy modeling with selected features. Artif
Intell Med, 43(3), pages 195-206. 2008.
[19] F. Glover. Heuristics for Integer Programming Using Surrogate Constraints. Decision
Sciences, 8(1), pages 156-166. 1977.
[20] F. Glover. Future Paths for Integer Programming and Links to Artificial
Intelligence. Computers and Operations Research, 13(5), pages 533-549. 1986.
[21] R. P. Gorman, T. J. Sejnowski. Analysis of hidden units in a layered network trained to classify
sonar targets. Neural Networks, 1, pages 75-89. 1988.
[22] I. Guyon, A. Elisseeff. An introduction to variable and feature selection. Journal of Machine
Learning Research, 3, pages 1157-1182. 2003.
[23] I.Guyon, S. Gunn, M. Nikravesh, and L. A. Zadeh, editors. Feature Extraction: Foundations and
Applications (Studies in Fuzziness and Soft Computing). Springer. August 2006.
[24] I. Guyon, S. Gunn, M. Nikravesh, L. A. Zadeh, editors. Feature Extraction: Foundations and
Applications (Studies in Fuzziness and Soft Computing). Springer. August 2006.
[25] D. C. Hadorn, E. B. Keeler, W. H. Rogers, R. H. Brook. Assessing the Performance of Mortality
Prediction Models. RAND. 1993.
[26] J. Han, M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann. 2000.
[27] J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press.
1975.
[28] A. L. Horn, F. Cismondi, A. S. Fialho, S. M. Vieira, J. M. C. Sousa, S. R. Reti, M. D. Howell, S. N.
Finkelstein. Multi-objective performance evaluation using fuzzy criteria: Increasing sensitivity
prediction for outcome of septic shock patients. Proc. of 18th World Congress of the
International Federation of Automatic Control (IFAC). 2011.
[29] G. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on
Information Theory, 14(1), pages 55-63. January 1968.
[30] L. Hyafil, R. L. Rivest. Constructing Optimal Binary Decision Trees is NP-complete. Information
Processing Letters, 5(1), pages 15-17. 1976.
[31] A. G. K. Janecek, Y. Tan. Feeding the fish - weight update strategies for the fish school search
algorithm. ICSI'11 Proceedings of the Second international conference on Advances in swarm
intelligence , Volume Part II. 2011.
[32] R. Kaushal, D. Blumenthal, E. G. Poon, A. K. Jha, C. Franz, B. Middleton, J. Glaser, G. Kuperman, M.
Christino, R. Fernandopulle, J. P. Newhouse, D. W. Bates. The costs of a national health
information network. Annals of Internal Medicine, 143(3), pages 165-173. 2005.
[33] V. Kecman. Learning and Soft Computing Support Vector Machines, Neural Networks, Fuzzy
Logic Systems. MIT Press. 2001.
[34] S. Kirkpatrick, C. D. Gelatt Jr., M. P. Vecchi. Optimization by Simulated
Annealing. Science, 220(4598), pages 671-680. 1983.
[35] R. Kopach-Konrad, M. Lawley, M. Criswell, I. Hasan, S. Chakraborty, J. Pekny, B. Doebbeling.
Applying systems engineering principles in improving health care delivery. Journal of General
Internal Medicine, 22, pages 431-437. 2007.
[36] A. Kossiakoff, W. N. Sweet. Systems Engineering Principles and Practice. Wiley-Interscience,
2002.
[37] L. I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley. 2004.
[38] M. A. Mazurowski, P. A. Habas, J. M. Zurada, J.Y. Lo, J. A. Baker, G. D. Tourassi. Training neural
network classifiers for medical decision making: The effects of imbalanced datasets on
classification performance. Neural Networks, 21(2-3):427-436. 2008.
[39] L. F. Mendonça, S. M. Vieira, J. M. C. Sousa. Decision tree search methods in fuzzy modeling and
classification. International Journal of Approximate Reasoning, 44(2), pages106-123. 2007.
[40] Z. Michalewicz, D. Fogel. How to solve it - modern heuristics. 2nd edn, Springer, Heidelberg.
2004.
[41] H. Liu, H. Motoda. Computational Methods of Feature Selection. Chapman and Hall. 2007.
[42] G. E. Moore. Cramming more components onto integrated circuits. Electronics Magazine, 4.
1965.
[43] A. Navot. On the Role of Feature Selection in Machine Learning Thesis. Senate of the Hebrew
University. December 2006.
[44] N. Nilsson. Learning Machines: Foundations of Trainable Pattern-Classifying Systems. McGraw-
Hill. 1965.
[45] A. Partap, B. Singh. Shannon and Non-Shannon Measures of Entropy for Statistical Texture
Feature Extraction in Digitized Mammograms. WCECS, vol. II. 2009.
[46] E. A. Patrick. Fundamentals of Pattern Recognition. Prentice Hall. September 1972.
[47] P. Reid. Building a Better Delivery System: A New Engineering/Health Care Partnership.National
Academies Press. 2005.
[48] A. L. Rosenberg, C. M. Watts. Patients readmitted to intensive care units: A systematic review of
risk factors and outcomes. Chest, pages 492–502, 2000.
[49] A. L. Rosenberg, C. Watts. Patients Readmitted to ICUs: A Systematic Review of Risk Factors and
Outcomes. Critical Care Reviews, 2000.
[50] A. L. Rosenberg, T. P. Hofer, R. A. Hayward, C. Strachan, C. M. Watts. Who bounces back?
Physiologic and other predictors of intensive care unit readmission. Crit Care Med, pages 511–
518. 2001.
[51] M. Saeed, C. Lieu, R. Mark. MIMIC II. a massive temporal icu database to support research in
intelligence patient monitoring. Computers in Cardiology, 29, pages 641-644. 2002.
[52] Y. Saeys, I. Inza, P. Larranaga. A review of feature selection techniques in bioinformatics.
Bioinformatics, 23(19), pages 2507–2517. 2007.
[53] N. Sanchez-Marono, A. Alonso-Betanzos, and M. Tombilla-Sanroman. Filter methods for feature
selection: A comparative study. In Intelligent Data Engineering and Automated Learning: IDEAL
2007, volume 4881 of Lecture Notes in Computer Science, pages 178-187. Springer Berlin /
Heidelberg. 2007.
[54] J. A. Sokolowski. Principles of Modeling and Simulation: A Multidisciplinary Approach. Wiley.
February 2009.
[55] J. M. C. Sousa, U. Kaymak. Fuzzy Decision Making in Modeling and Control. World Scientific
Singapore. 2002.
[56] L. Talavera. An evaluation of filter and wrapper methods for feature selection in categorical
clustering. In Advances in Intelligent Data Analysis VI, volume 3646 of Lecture Notes in
Computer Science, pages 742-742. Springer Berlin / Heidelberg. 2005.
[57] T. Takagi, M. Sugeno. Fuzzy identification of systems and its applications to modelling and
control. IEEE Transactions on Systems, Man and Cybernetics, 15(1) pages 116–132. 1985.
[58] G. Upton, I. Cook. Understanding Statistics. Oxford University Press, page 55. 1996.
[59] L. Yuan, Z.-D. Zhao. A modified binary particle swarm optimization algorithm for permutation
flow shop problem. In Proc. of the International Conference on Machine Learning and
Cybernetics, vol. 2, pages 902-907.
Appendix

A Extended Results - sonar database

In order to choose an appropriate set of parameters for the optimization algorithm, it is
important to perform a coherent study of the parameters to be used, since some limitations can
arise if the parameters are poorly adjusted. With this in mind, the detailed results of the 18 tests
used in the study of the parameters thres_c and thres_v, Wscale and α for the sonar database are
presented below. Tests 1-9 used the non-adaptive threshold and tests 10-18 the adaptive threshold.

Table A.1: Results of the study of the parameters thres_c and thres_v - Tests 1-3.
Columns per row: α | FS fitness | features selected | iteration with the best model in FS | ACC FS | mean AUC cross-validation | % contraction | no. of same barycentre positions | plot
(Test No., thres_c, thres_v, stepind and Wscale are merged cells spanning groups of rows; see the grouping note after the table.)
0.1 0.96 1 196 78.13 63.33 0.47 297 -
0.3 0.92 1 44 78.13 68.08 0.75 299 -
0.5 0.91 5 115 90.63 70.63 0.95 299 -
0.7 0.89 5 56 87.5 67.31 0.98 295 -
0.9 0.93 11 85 93.75 76.12 0.99 299 -
0.1 0.96 1 260 78.13 67.08 0.46 298 -
0.3 0.93 2 229 84.38 76.01 0.74 297 -
0.5 0.91 3 92 87.5 71.70 0.95 297 -
0.7 0.88 6 109 87.5 70.99 0.98 295 -
0.9 0.92 12 65 93.75 70.90 0.98 299 -
0.1 0.96 1 152 78.13 68.10 0.45 297 -
0.3 0.93 4 239 90.63 71.19 0.77 299 -
0.5 0.91 3 59 87.5 68.77 0.96 299 -
0.7 0.88 2 59 84.38 76.34 0.97 295 -
0.9 0.92 16 108 93.75 76.03 0.98 295 -
0.1 0.96 1 175 71.88 61.42 0.46 297 -
0.3 0.93 2 71 84.38 68.49 0.77 297 -
0.5 0.91 5 142 90.63 66.99 0.95 297 -
0.7 0.89 4 95 87.5 67.95 0.98 299 -
0.9 0.95 13 104 96.88 73.01 0.99 299 -
0.1 0.96 1 152 71.88 71.19 0.46 299 -
0.3 0.93 2 175 84.38 75.40 0.78 299 -
0.5 0.91 2 206 84.38 74.94 0.93 299 -
0.7 0.92 3 182 90.63 64.97 0.98 299 -
0.9 0.92 12 20 93.75 72.54 0.99 294 -
0.1 0.96 1 268 71.88 73.94 0.50 277 -
0.3 0.94 3 221 90.63 69.39 0.76 285 -
0.5 0.90 3 112 84.38 75.83 0.93 278 -
0.7 0.90 3 221 87.5 65.95 0.98 284 -
0.9 0.93 10 129 93.75 73.63 0.97 286 -
0.1 0.95 1 200 68.75 69.59 0.48 279 -
0.3 0.92 1 259 78.13 69.81 0.63 284 -
0.5 0.91 3 161 87.5 75.03 0.91 285 -
0.7 0.91 5 43 90.63 65.84 0.97 274 -
0.9 0.95 16 50 96.88 73.01 0.94 276 -
0.1 0.95 1 193 62.5 65.04 0.49 276 -
0.3 0.93 2 276 84.38 77.98 0.68 277 -
0.5 0.91 3 277 87.5 70.90 0.93 286 -
0.7 0.90 7 82 90.63 67.99 0.97 284 -
0.9 0.94 21 9 96.88 70.40 0.96 279 -
0.1 0.96 1 253 78.13 65.48 0.48 280 -
0.3 0.93 2 259 84.38 74.11 0.72 284 -
0.5 0.90 4 56 87.5 67.26 0.94 284 -
0.7 0.89 4 69 87.5 73.94 0.94 284 -
0.9 0.92 15 29 93.75 72.94 0.98 280 -
0.1 0.96 1 213 71.88 62.93 0.46 275 -
0.3 0.92 1 114 78.13 68.57 0.74 285 -
0.5 0.90 3 162 84.38 70.61 0.92 288 -
0.7 0.90 7 111 90.63 69.30 0.98 278 -
0.9 0.90 7 88 90.63 72.30 0.98 282 -
0.1 0.92 3 239 65.63 68.68 0.57 203 -
0.3 0.87 8 103 87.5 73.94 0.67 225 -
0.5 0.84 13 136 90.63 76.74 0.89 250 -
0.7 0.82 22 207 90.63 70.59 0.97 239 -
0.9 0.85 20 8 87.5 74.21 0.97 240 -
0.1 0.89 6 191 75 72.21 0.58 205 -
0.3 0.82 11 228 84.38 67.97 0.76 243 -
0.5 0.81 19 90 93.75 70.30 0.94 255 -
0.7 0.85 22 96 93.75 71.92 0.97 247 -
0.9 0.90 25 202 93.75 75.56 0.92 232 -
0.1 0.89 5 290 65.63 69.10 0.62 207 -
0.3 0.82 10 186 78.13 71.10 0.91 250 -
0.5 0.84 13 51 90.63 75.96 0.90 264 -
0.7 0.85 25 173 96.88 72.40 0.90 242 -
0.9 0.91 21 18 93.75 73.12 0.95 246 -
0.1 0.89 6 256 81.25 71.19 0.54 203 -
0.3 0.85 9 127 84.38 73.74 0.75 260 -
0.5 0.84 14 55 90.63 70.19 0.88 246 -
0.7 0.84 24 41 93.75 73.03 0.92 249 -
0.9 0.97 21 84 100 72.63 0.90 248 -
0.1 0.87 7 120 75 71.99 0.63 228 -
0.3 0.83 11 124 87.5 74.10 0.89 253 -
0.5 0.84 16 103 93.75 73.72 0.87 258 -
0.7 0.88 15 151 93.75 72.52 0.97 248 -
0.9 0.93 27 199 96.88 73.83 0.89 250 -
Study of the parameters: stepind and stepvol [initial and final value], Wscale and α.
Grouping of the rows above (recovered from the merged cells): Test 1 used thres_c = 0.9, thres_v = 0.9; Test 2 used 0.7/0.7; Test 3 used 0.5/0.5; all with stepind = [0.2 0.01]. Within each test, the rows run in blocks of five Wscale values (5, 50, 500, 5000, 20000), each block varying α over 0.1, 0.3, 0.5, 0.7 and 0.9.
(Continuation: detailed results for the remaining tests of the thres_c / thres_v study; same columns as the previous table.)
Columns per row: α | FS fitness | features selected | iteration with the best model in FS | ACC FS | mean AUC cross-validation | % contraction | no. of same barycentre positions | plot
Table A.2: Results of the study of the parameters thres_c and thres_v - Tests 4-7
Same layout as Table A.1. Test 4: (thres_c, thres_v) = (0.3, 0.3); Test 5: (0.1, 0.1); Test 6: (0.9, 0.1); Test 7: (0.7, 0.3); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.3: Results of the study of the parameters thres_c and thres_v - Tests 8-11
Same layout as Table A.1. Test 8: (thres_c, thres_v) = (0.1, 0.9); Test 9: (0.3, 0.7); Test 10: (0.9, 0.9); Test 11: (0.7, 0.7); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.4: Results of the study of the parameters thres_c and thres_v - Tests 12-14
Same layout as Table A.1. Test 12: (thres_c, thres_v) = (0.5, 0.5); Test 13: (0.3, 0.3); Test 14: (0.1, 0.1); stepind and stepvol over [0.2, 0.01], wscale extended to {5, 50, 500, 5000, 20000, 100000, 1000000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.5: Results of the study of the parameters thres_c and thres_v - Tests 15-17
Same layout as Table A.1. Test 15: (thres_c, thres_v) = (0.9, 0.1); Test 16: (0.7, 0.3); Test 17: (0.1, 0.9); stepind and stepvol over [0.2, 0.01], wscale ranging from 5 up to 1000000, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Table A.6: Results of the study of the parameters thres_c and thres_v - Test 18
Same layout as Table A.1. Test 18: (thres_c, thres_v) = (0.3, 0.7); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
B Study of parameters - MIMICII database

We present the detailed results of the 5 tests used in the study of the parameters thres_c and thres_v, wscale and α for the readmission datasets. All 5 tests used the adaptive threshold.
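Since only the adaptive threshold is used here, it is worth recalling the role thres_c and thres_v play in the binary movement operators. The sketch below is an illustrative simplification rather than the exact update rule of the thesis: binary_move flips each bit of a candidate feature subset with probability (1 - thres), and adaptive_threshold is a hypothetical linear schedule that raises the threshold, damping bit flips as the run progresses; the function names and the 500-iteration run length are assumptions.

```python
import random

def adaptive_threshold(thres_start, iteration, n_iter, thres_end=0.9):
    """Hypothetical adaptive schedule: the flip threshold grows linearly
    from its starting value towards thres_end, so bit flips become rarer
    and the search more exploitative late in the run."""
    frac = iteration / max(n_iter - 1, 1)
    return thres_start + frac * (thres_end - thres_start)

def binary_move(position, thres):
    """Illustrative binary movement: flip each bit of a feature mask with
    probability (1 - thres); thres_c and thres_v gate the collective-
    instinctive and collective-volitive movements in the same spirit."""
    return [bit ^ (random.random() > thres) for bit in position]

# Example: one collective movement step on a 10-feature mask, halfway
# through an assumed 500-iteration run with thres_c starting at 0.5.
mask = [random.randint(0, 1) for _ in range(10)]
print(binary_move(mask, adaptive_threshold(0.5, iteration=250, n_iter=500)))
```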
Table B.1: Results of the study of the parameters thres_c and thres_v - Tests 1-2
Study of the parameters stepind and stepvol [initial and final value], wscale and α. Test 1: (thres_c, thres_v) = (0.5, 0.5); Test 2: (0.3, 0.3); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000, 100000, 1000000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Columns as in the sonar tables of Appendix A, with ACC FS and mean AUC reported as fractions.
Table B.2: Results of the study of the parameters thres_c and thres_v - Tests 3-5
Same layout as Table B.1. Test 3: (thres_c, thres_v) = (0.1, 0.1); Test 4: (0.9, 0.1); Test 5: (0.7, 0.3); stepind and stepvol over [0.2, 0.01], wscale ∈ {5, 50, 500, 5000, 20000, 100000, 1000000}, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.