ORIGINAL PAPER
Multi-objective approach based on grammar-guided genetic programming for solving multiple instance problems
Amelia Zafra • Sebastian Ventura
Published online: 8 December 2011
© Springer-Verlag 2011
Abstract Multiple instance learning (MIL) is considered
a generalization of traditional supervised learning which
deals with uncertainty in the information. Together with
the fact that, as in any other learning framework, the
classifier performance evaluation maintains a trade-off
relationship between different conflicting objectives, this
makes the classification task less straightforward. This
paper introduces a multi-objective proposal that works in a
MIL scenario to obtain well-distributed Pareto solutions to
multi-instance problems. The algorithm developed, Multi-
Objective Grammar Guided Genetic Programming for
Multiple Instances (MOG3P-MI), is based on grammar-
guided genetic programming, which is a robust tool for
classification. Thus, this proposal combines the advantages
of the grammar-guided genetic programming with benefits
provided by multi-objective approaches. First, a study of
multi-objective optimization for MIL is carried out. To do
this, three different extensions of MOG3P-MI are designed
and implemented and their performance is compared. This
study allows us, on the one hand, to check the performance
of multi-objective techniques in this learning paradigm and,
on the other hand, to determine the most appropriate evo-
lutionary process for MOG3P-MI. Then, MOG3P-MI is
compared with some of the most significant proposals
developed throughout the years in MIL. Computational
experiments show that MOG3P-MI obtains consistently
better results than the other algorithms, achieving the
most accurate models. Moreover, the classifiers obtained
are very comprehensible.
Keywords Multiple instance learning · Multiple
objective learning · Grammar guided genetic
programming · Evolutionary rule learning
1 Introduction
Multiple instance learning (MIL), introduced by Dietterich
et al. (1997), is considered a special learning framework
which deals with the uncertainty of instance labels. This
learning framework extends traditional supervised learning
to problems with incomplete knowledge about the labels in
the training examples. In supervised learning, every
training instance is assigned a discrete or real-valued label.
In MIL, by contrast, labels are assigned only to bags of
instances, and the labels of individual instances are
unknown. This learning framework has
expanded enormously in the last few years due to the great
number of applications for which it has been found to be
the most appropriate form of representation, achieving an
improvement over results obtained by traditional super-
vised learning. Some examples are: text categorization
(Andrews et al. 2002), content-based image retrieval (Pao
et al. 2008), image annotation (Yang et al. 2005), drug
activity prediction (Maron 1997), web index page recom-
mendation (Zhou et al. 2005; Zafra et al. 2009), semantic
video retrieval (Chen et al. 2009), video concept detection
(Gu et al. 2008; Gao 2008), pedestrian detection (Pang
et al. 2008), and the prediction of student performance
(Zafra et al. 2011). All these applications are based on
classification tasks. The problem of evaluating the quality
of a classifier, whether from the MIL perspective or in
A. Zafra · S. Ventura (✉)
Department of Computer Science and Numerical Analysis,
University of Cordoba, Cordoba, Spain
e-mail: [email protected]
A. Zafra
e-mail: [email protected]
Soft Comput (2012) 16:955–977
DOI 10.1007/s00500-011-0794-0
traditional supervised learning, is naturally posed as a
multi-objective problem: several conflicting objectives
must be optimized simultaneously, and improving one of
them typically degrades the others. In this context, it is
interesting to obtain the different non-dominated solutions
that represent the best simultaneous trade-offs among the
objectives. In general, classification algorithms usually
attempt to optimize validity measures
such as accuracy, comprehensibility, sensitivity, and
specificity.
The great variety of real-world decision problems that
necessitate considering several conflicting objectives to be
optimized simultaneously has made multi-objective opti-
mization very popular and there exists a large body of
literature devoted to multi-objective optimization for
traditional supervised learning, an area that still receives
considerable attention (Wu et al. 2011; Liao et al. 2011;
Yang et al. 2011). However, we are unaware of multi-objective
techniques for classification in the MIL setting. All proposals to solve
this problem from an MIL perspective fail to take into
account the multi-objective problem and they obtain only
one optimal solution combining the different objectives to
obtain a high quality classifier. The main problem of these
techniques is that they obtain a single well-defined optimal
solution but these problems do not have a single solution:
rather, they have a set of non-comparable ones. These
solutions are known as efficient, nondominated or Pareto
optimal solutions, and the set of them is called the Pareto
Optimal Front (POF). None of these solutions can be
considered better than any other in the same set from the
point of view of the simultaneous optimization of multiple
objectives; all of them solve the problem equally well. Therefore,
in these problems, the major challenge
consists of generating the set of efficient solutions to
achieve an improvement of some objectives without sac-
rificing any other objective.
Our aim is to introduce a multiple objective proposal
which simultaneously optimizes different contradictory
objectives and provides the non-dominated solution set for
each problem. To do that, a multi-objective grammar gui-
ded genetic programming algorithm for MIL is developed
and it is called Multi-Objective Grammar-Guided Genetic
Programming for Multiple Instance Learning (MOG3P-
MI). The proposal is based on Grammar Guided Genetic
Programming (G3P) that generates rules in IF–THEN form
and evaluates the quality of each classifier according to two
conflicting quality indexes: sensitivity and specificity.
First, to evaluate the performance of the proposed
approach, we consider three different versions of MOG3P-
MI algorithm modifying the philosophy of its evolutionary
process. To do that, three different approaches based on
classic evolutionary multi-objective techniques frequently
used in literature in traditional supervised learning are
adapted and designed to work in the MIL framework. This
empirical study allows us to analyze the performance of
multi-objective techniques in MIL. Then, MOG3P-MI is
compared with the most representative techniques in MIL
designed over the years (specifically, 16 different proposals
are considered in this study). Computational experiments
show that our proposal is a robust algorithm which
achieves better results than the other previous techniques,
finding a balance between sensitivity and specificity and
yielding the most accurate models.
This paper is organized as follows. In Sect. 2, an over-
view of multi-instance and multi-objective learning
frameworks is given. Section 3 presents a description of the
approaches proposed. In Sect. 4, experiments and com-
parisons are carried out. Finally, conclusions and some
possible lines for future research are discussed in Sect. 5.
2 Preliminaries
This section gives the most important definitions in mul-
tiple objective optimization, developments carried out in
the framework of multiple instance learning, and the most
relevant studies in classification tasks using evolutionary
multi-objective techniques.
2.1 Multiple objective learning
The basic concepts related to multi-objective learning will
be introduced. Consider a multi-objective optimization
problem with p objectives where X denotes the finite set of
feasible solutions. We assume that each criterion has to be
maximized.
maximize {f1(x), f2(x), …, fp(x)}  subject to x ∈ X

We assume
• there are n decision variables x = (x1, x2, …, xn)^T,
• X ⊆ ℝ^n is the set of feasible solutions, i.e., decision
variable vectors satisfying the constraints,
• fi : X → ℝ are the p conflicting objective functions,
• f(x) = (f1(x), f2(x), …, fp(x))^T ∈ ℝ^p is called an objective
vector,
• Z = f(X) is called the criterion space, and
• z ∈ Z are called criterion vectors.
Since, in general, a feasible solution that simultaneously
maximizes all the objective functions does not exist, the
concept of efficient or Pareto optimal solution is used in
this context. A feasible solution x* ∈ X (and the corresponding
objective vector f(x*)) is said to be efficient or
Pareto optimal for the problem if there does not exist any
other feasible solution x ∈ X such that fi(x) ≥ fi(x*) for all
i = 1, …, p with at least one j ∈ {1, …, p} such that
fj(x) > fj(x*). The image of x*, z* = f(x*), is said to be a
non-dominated vector. The set of all efficient solutions of the
problem is called the efficient set, and correspondingly, the
set of all the non-dominated vectors is called the non-dominated
set or Pareto Front. A feasible solution x* for the
problem is said to be weakly efficient or weakly Pareto
optimal if there does not exist any other feasible solution x
such that fi(x) > fi(x*) for all i = 1, …, p. In this case, z* is
said to be a weakly non-dominated vector. Finally, a feasible
solution x* for the problem is said to be properly efficient or
properly Pareto optimal if it is efficient and the trade-offs
between objectives are bounded.
Also, the concepts of ideal and nadir values are relevant
in multi-objective problems. The ideal point z* ∈ ℝ^p has
components z*_i = max{fi(x) | x ∈ X}; it contains the best
value that each objective function can reach in the POF. On
the other hand, the worst values of each objective function
in the POF constitute a vector known as the nadir point
z^nad ∈ ℝ^p, with components z^nad_i = min{fi(x) | x ∈ POF}.
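These definitions translate directly into code. The following minimal Python sketch (our own illustration, not part of the paper's implementation) checks Pareto dominance for the maximization case above, extracts the non-dominated set, and computes the ideal and nadir values over a front:

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (maximization):
    at least as good in every objective, strictly better in at least one."""
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))

def pareto_front(vectors):
    """Return the non-dominated subset of a list of objective vectors."""
    return [v for v in vectors
            if not any(dominates(w, v) for w in vectors if w != v)]

# Hypothetical objective vectors (e.g. sensitivity/specificity pairs)
points = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.6, 0.1)]
front = pareto_front(points)  # (0.5, 0.5) and (0.6, 0.1) are dominated

# Best and worst value of each objective over the front
ideal = tuple(max(v[i] for v in front) for i in range(2))  # (0.9, 0.9)
nadir = tuple(min(v[i] for v in front) for i in range(2))  # (0.2, 0.2)
```
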
2.2 Learning techniques for multiple instance learning
The learning paradigm of multi-instance learning has found
great popularity in the machine learning (ML) community
and many proposals for classification (Dietterich et al.
1997), clustering (Zhang 2009) and regression (Ray 2001)
tasks have been studied. All of them optimize a combina-
tion of different objectives. Therefore, they do not obtain a
Pareto Front with the optimization of the different objec-
tives, rather they obtain a single solution which optimizes
composite objectives.
In MIL, classically the algorithms have been classified
into two categories. First, algorithms designed specifi-
cally to work in an MIL environment. In this category,
we find APR algorithms (Dietterich et al. 1997): these
algorithms attempt to find an APR by expanding or
shrinking a hyper-rectangle in the instance feature space
to maximize the number of instances from different
positive bags enclosed by the rectangle while minimizing
the number of instances from negative bags within the
rectangle. A bag was classified as positive if at least one
of its instances fell within the APR; otherwise, the bag
was classified as negative. Maron and Lozano (1997),
whose proposal came very soon after Dietterich's,
developed a method called Diverse Density that is still
one of the most popular today. The interest this proposal
held can still be seen in how it is used in combination with
other methods, e.g., with the expectation maximization
method, EM-DD (Wang 2000), and even recently there
are studies that propose extensions of it (Pao et al. 2008;
Xu et al. 2008).
The other category considers algorithms adapted from
the traditional supervised learning paradigm to perform
correctly in MIL. Thus, we can find that the most relevant
techniques in traditional supervised learning using single
instances have been adapted to MIL. In the following, we
present a review of the main methods developed. In the
lazy learning context, Wang and Zucker (2000) proposed
two alternatives, Citation-KNN and Bayesian-KNN, which
extend the k nearest-neighbour algorithm (KNN) to the
MIL context. With respect to decision trees and learning
rules, Chevaleyre and Zucker (2001) implemented ID3-MI
and RIPPER-MI, which are multi-instance versions of the
decision tree algorithm ID3 and rule learning algorithm
RIPPER, respectively. At that time, Ruffo (2000) presented
a multi-instance version of the C4.5 decision tree, which
was called RELIC. Later, Zhou et al. (2005) presented the
Fretcit-KNN algorithm, which is a variant of the Citation-
KNN. We can also find contributions which extend stan-
dard neural networks to MIL. Ramon and Raedt (2000)
presented a neural network framework for MIL. Following
their work, others have appeared, improving or extending
it. For instance, Zhou and Tang (2002) proposed BP-MIP, a
neural network-based multi-instance learner derived from
the traditional backpropagation neural network. An exten-
sion of this method adopting two different feature selection
techniques was proposed by Zhang and Zhou (2004). Other
interesting proposals of this paradigm are the works of
Zhang and Zhou (2006) and Chai and Yang (2007). The
support vector machine (SVM) has been another approach
adapted to the MIL framework. There are numerous proposals
for this approach, such as those of Gartner et al. (2002) and
Andrews et al. (2002), which adapted kernel methods to
work with MIL data by modifying the kernel distance
measures to handle sets. On the other hand, works such as
Chen and Wang (2004) and Chen et al. (2006) adapted
SVMs by modifying the form of the data rather than
changing the underlying SVM algorithms. Recently, other
works that use this approach include those of Mangasarian
and Wild (2008), Gu et al. (2008), and Lu et al. (2008).
The use of ensembles has also been considered in this
learning framework; the works of Xu and Frank (2004), Zhang and
Zhou (2005), and Zhou and Zhang (2007) are examples of
this paradigm. Finally, we can find multi-instance evolu-
tionary algorithms which adapt grammar-guided genetic
programming to this scenario (Zafra 2010), but from a
mono-objective perspective.
2.3 Classification system optimization
Multi-objective strategies are especially appropriate for
classification tasks because different measures taken to
evaluate the quality of the solutions are related, so if the
value of any of them is maximized, the value of the others
can be significantly reduced (Veldhuizen 2000). These
measurements involve the search for classification systems
that represent understandable, interesting and simple
descriptions of a specified class, even when that class has
few representative cases in the data. Multi-objective
metaheuristics can be used to produce different trade-offs
between confidence, coverage and complexity in the results
of classifiers.
A great variety of different methods have been devel-
oped for multi-objective optimization during the last few
years. Among them the multi-objective evolutionary
algorithms (MOEAs) have been a widely used approach to
confronting these types of problem (Coello et al. 2007).
The use of evolutionary algorithms (EAs) is mainly moti-
vated by the population-based nature of EAs which allow
the generation of several elements in the Pareto Optimal
Set in a single run. That is why it is easy to find examples
of multi-objective techniques for evolutionary computation
on classification topics where significant advances in
results have been achieved (Shukla 2007; Tsang et al.
2007; Wee et al. 2009; Tan et al. 2009). Specifically, if we
evaluate its use in Genetic Programming (GP), we find that
its transfer to the multi-objective domain provides better
solutions than those obtained using standard GP in the
mono-objective domain, and with a lower computational
cost (Parrott et al. 2005; Mugambi 2003; Dehuri 2008).
3 Multi-objective grammar guided genetic
programming for MIL
G3P (Whigham 1995) is an extension of traditional GP
systems. G3P facilitates the efficient automatic discovery
of empirical laws providing a more systematic way to
handle typing. Moreover, the use of a context-free
grammar establishes a formal definition of the syntactical
restrictions. The motivation to include this paradigm is, on
the one hand, the variable-length representation, which
introduces high flexibility into the solutions, and on the
other hand, the high efficiency
achieved both in obtaining classification rules with low
error rates, and in other tasks related to prediction, such
as feature selection and the generation of discriminant
functions (Chien et al. 2002; Kishore et al. 2000). With
respect to multi-objective strategies, the main motivation
to include them is, as indicated in Sect. 2.3, that they are
especially appropriate for classification tasks where it
is necessary to optimize two or more conflicting objec-
tives subject to certain constraints, and also because
their combination with GP has improved the results obtained
using standard GP.
The most relevant aspects which have been taken into
account in the design of our proposal, such as individual
representation, genetic operators, and fitness function are
explained in detail next. Moreover, we describe the three
different evolutionary approaches that have been considered
in our study, resulting in three extensions called
MOG3P-MI(v1), MOG3P-MI(v2) and MOG3P-MI(v3). These
approaches follow the philosophy of well-known multi-
objective algorithms in traditional supervised learning.
Thus, the first one follows the evolutionary strategy of the
Strength Pareto Evolutionary Algorithm (SPEA2) (Zitzler
et al. 2001), the second one is based on the Non-dominated
Sorting Genetic Algorithm (NSGA2) (Deb et al. 2000) and
the third one on Multi-objective genetic local search
(MOGLS) (Jaszkiewicz 2003).
3.1 Individual representation
In our system, an individual is expressed by information in
the form of IF–THEN classification rules which provide a
natural extension to knowledge representation. The IF part
of the rule (antecedent) contains a logical combination of
conditions about the values of predicting attributes, and the
THEN part (consequent) contains the predicted class for
the concepts that satisfy the antecedent of the rule. This
rule determines if a bag should be considered positive (i.e.,
if it is an instance of the concept we want to represent) or
negative (if it is not). The scheme of the rules is seen in
expression 1.
If (condB(m)) then
    the bag m represents the concept that we want to learn.
Else
    the bag m does not represent the concept that we want to learn.
End-If        (1)

where condB is a condition that is applied to the bag.
Following the Dietterich et al. (1997) hypothesis, condB
can be expressed as Eq. 2, where m is an example or bag, mi
is the ith instance of the bag m, V is the set of feature vectors of m,
and V(mi) is the feature vector of instance mi:

condB(m) = 1, if ∃ V(mi) ∈ m such that condI(V(mi)) = 1;
           0, otherwise        (2)
The individuals’ genotypes that evolve in our system
correspond to the condition applied to instances (i.e.,
condI), while the phenotype represents the entire rule that
is applied to the bags (i.e., expression 1). The conditions of
the rule applied to instances ðcondIÞ are described
according to the language specified by a context-free
grammar ðGÞ which establishes a formal definition of the
syntactical restrictions of the problem to be solved and
allows expressing the rule in a very natural way. The
grammar is defined by means of the following four elements:

G = (Σ_N, Σ_T, S, P)

where Σ_N ∩ Σ_T = ∅, Σ_N is the alphabet of non-terminal
symbols, Σ_T is the alphabet of terminal symbols, S is the
axiom, and P is the set of productions written in Backus–Naur
form (BNF) (Knuth 1964). Figure 1 shows the grammar used
to represent the antecedent of our rules, which corresponds to
the specification of condI, and Fig. 2 shows the representation
of the genotype of an individual in our system.
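To make the lifting from instance-level to bag-level conditions concrete, the following Python sketch evaluates expression (1) and Eq. 2. The particular condI shown is a hypothetical condition of the kind the grammar could derive; all names are ours, not part of MOG3P-MI:

```python
def cond_b(bag, cond_i):
    """Bag-level condition of Eq. 2: true iff at least one instance's
    feature vector satisfies the instance-level condition condI."""
    return any(cond_i(v) for v in bag)

def classify_bag(bag, cond_i):
    """The IF-THEN rule of expression (1) applied to a bag."""
    return "positive" if cond_b(bag, cond_i) else "negative"

# Hypothetical instance-level condition of the kind the grammar could
# derive, e.g. "attribute 0 >= 0.5 AND attribute 1 < 2.0"
cond_i = lambda v: v[0] >= 0.5 and v[1] < 2.0

bag_a = [(0.1, 3.0), (0.7, 1.2)]  # second instance satisfies condI
bag_b = [(0.2, 0.5), (0.9, 2.5)]  # no instance satisfies condI
```
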
3.2 Genetic operators
The process of generating new individuals in a given
generation of the evolutionary algorithm is carried out by
two operators, crossover and mutator, based on the pro-
posal of Whigham (1996). In this section, we briefly
describe their functioning. An interesting discussion of genetic
operators in multi-objective evolutionary optimization can
be consulted in Qian et al. (2011).
3.2.1 Crossover operator
This operator creates new rules by mixing the contents of
two parent rules. To do so, a non-terminal symbol is chosen
with a specific probability from among the available non-
terminal symbols in the grammar and two subtrees (one
from each parent) are selected whose roots coincide with
the same symbol selected or with a compatible symbol. To
reduce bloating, if either of the two offspring is too large, it
will be replaced by one of its parents. Specifically, the
crossover operator, given two individuals (tree1 and tree2),
consists of the steps specified in Listing 1. The main features
of this operator are:
Fig. 1 Grammar used for representing individuals’ genotypes
Fig. 2 Representation of individual’s genotype
1. It avoids excessive growth in the size of the derivation
trees representing individuals (Couchet et al. 2006), a
problem well known as code bloat (Panait 2004).
2. It provides an appropriate balance between the ability of
exploration and exploitation of search space, preserving
the context in which subtrees appear in the parent trees.
3. It uses a list of selectable symbols in which the
selection probability of each symbol can be specified.
Thus, we can increase or decrease the probability of
modifying certain symbols relative to others.
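A simplified version of this subtree crossover can be sketched as follows. This is a Python illustration on derivation trees encoded as nested lists; the size limit, the `<cmp>` symbol, and the attribute names are assumptions of the sketch, not the actual MOG3P-MI implementation:

```python
import random

# A derivation tree is a list [symbol, child, child, ...]; leaves are strings.
def nodes_with_symbol(tree, symbol, path=()):
    """Paths to every subtree whose root is the given non-terminal."""
    found = [path] if tree[0] == symbol else []
    for i, child in enumerate(tree[1:], start=1):
        if isinstance(child, list):
            found += nodes_with_symbol(child, symbol, path + (i,))
    return found

def subtree_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def with_subtree(tree, path, subtree):
    """Copy of tree with the subtree at path replaced."""
    if not path:
        return subtree
    copy = list(tree)
    copy[path[0]] = with_subtree(copy[path[0]], path[1:], subtree)
    return copy

def crossover(tree1, tree2, symbol, max_size=25, rng=random):
    """Swap two subtrees rooted at the chosen non-terminal; an offspring
    that grows beyond max_size (code bloat) is replaced by its parent."""
    p1 = rng.choice(nodes_with_symbol(tree1, symbol))
    p2 = rng.choice(nodes_with_symbol(tree2, symbol))
    child1 = with_subtree(tree1, p1, subtree_at(tree2, p2))
    child2 = with_subtree(tree2, p2, subtree_at(tree1, p1))
    size = lambda t: str(t).count("[")  # rough node count
    return (child1 if size(child1) <= max_size else tree1,
            child2 if size(child2) <= max_size else tree2)

# Hypothetical parents built from <cmp> nodes of the grammar
t1 = ["and", ["cmp", "a0", ">=", "0.5"], ["cmp", "a1", "<", "2.0"]]
t2 = ["and", ["cmp", "a2", ">=", "1.0"], ["cmp", "a3", "<", "4.0"]]
c1, c2 = crossover(t1, t2, "cmp")
```

Whichever `<cmp>` nodes are chosen, each attribute condition ends up in exactly one offspring, so no genetic material is lost or duplicated.
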
3.2.2 Mutation operator
This operator selects with a specific probability a symbol to
carry out the operation. Then, a node is chosen randomly
containing that symbol in the tree where the mutation is to
take place. If the node is a terminal node, it will be replaced
by another compatible terminal symbol. More precisely,
two nodes are compatible if they are derivations of
the same non-terminal. When the selected node is a non-
terminal symbol, the grammar is used to derive a new
subtree which replaces the subtree underneath that node. If
the new offspring is too large, it will be eliminated to avoid
having invalid individuals. Specifically, the mutation
operator, given an individual (tree), performs the steps
specified in Listing 2.
The main features of this operator are:
1. It avoids an excessive growth in the size of the derivation
trees representing individuals (Couchet et al. 2006), a
problem well known as code bloat (Panait 2004).
2. It modifies individuals by changing simple compatible
terminal symbols between them. This allows us to
make small modifications when the population has
already converged.
3. It uses a list of selectable symbols in which the
selection probability of each symbol can be specified.
Thus, we can increase or decrease the probability of
modifying certain symbols relative to others.
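The terminal-symbol case of this mutation can be sketched in the same nested-list tree encoding. This is again our own illustration with hypothetical terminal categories, not the actual implementation:

```python
import random

# Hypothetical terminal categories: two terminals are compatible when
# they are derivations of the same non-terminal of the grammar.
COMPATIBLE = {"op": [">=", "<", "=="], "attr": ["a0", "a1", "a2", "a3"]}

def terminal_paths(tree, category, path=()):
    """Paths to every terminal leaf belonging to the given category."""
    paths = []
    for i, child in enumerate(tree[1:], start=1):
        if isinstance(child, list):
            paths += terminal_paths(child, category, path + (i,))
        elif child in COMPATIBLE[category]:
            paths.append(path + (i,))
    return paths

def set_at(tree, path, value):
    """Copy of tree with the leaf at path replaced by value."""
    copy = list(tree)
    if len(path) == 1:
        copy[path[0]] = value
    else:
        copy[path[0]] = set_at(copy[path[0]], path[1:], value)
    return copy

def mutate(tree, category, rng=random):
    """Replace one randomly chosen terminal of `category` by a different
    compatible terminal: the small perturbation that remains useful
    once the population has converged."""
    path = rng.choice(terminal_paths(tree, category))
    current = tree
    for i in path:
        current = current[i]
    replacement = rng.choice([t for t in COMPATIBLE[category] if t != current])
    return set_at(tree, path, replacement)

rule = ["cmp", "a0", ">=", "0.5"]
mutated = mutate(rule, "op")  # only the comparison operator changes
```
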
3.3 Fitness function
The fitness function is a measure of the effectiveness of the
classifier. There are several measures to evaluate different
components of the classifier and determine the quality of
each rule. We consider two widely accepted parameters for
characterizing models in classification problems: sensitiv-
ity (Se) and specificity (Sp) (Bojarczuk et al. 2000; Tan
et al. 2002). Sensitivity is the proportion of cases correctly
identified as meeting a certain condition and specificity is
the proportion of cases correctly identified as not meeting a
certain condition. Both are specified as follows:
sensitivity = tp / (tp + fn)

where tp is the number of positive bags correctly identified
and fn is the number of positive bags not correctly identified.

specificity = tn / (tn + fp)

where tn is the number of negative bags correctly identified
and fp is the number of negative bags not correctly identified.
We look for rules that maximize both sensitivity and
specificity at the same time. Nevertheless, there exists a
well-known trade-off between these two parameters
because they evaluate different and conflicting
characteristics in the classification process. Sensitivity
alone does not tell us how well the test predicts other
classes (i.e., the negative cases) and specificity alone does
not clarify how well the test recognizes positive cases. We
need to know both the sensitivity of the test for the class
and specificity for the other classes. In binary classification,
the trade-off between two measurements occurs because
when we increase the sensitivity value, the rule improves
the correct classification of positive objects, and it
normally produces a decrease in specificity because more
positive examples are accepted at the expense of also
accepting objects that belong to another class (negative
class). There is a similar scenario when we increase
specificity values at the expense of sensitivity values.
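Both measures are straightforward to compute from bag-level predictions. A minimal sketch (our own, with hypothetical labels; 1 marks a positive bag, 0 a negative one):

```python
def sensitivity_specificity(truth, preds):
    """Bag-level sensitivity = tp/(tp+fn) and specificity = tn/(tn+fp)."""
    pairs = list(zip(truth, preds))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical results: 4 positive bags (3 recognized), 4 negative (2 rejected)
truth = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 1]
se, sp = sensitivity_specificity(truth, preds)  # 0.75 and 0.5
```

A rule that predicts every bag positive would reach se = 1.0 but sp = 0.0, which is why both objectives must be optimized together.
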
3.4 Evolutionary algorithms
Three different proposals are implemented according to the
previous specifications. With respect to the evolutionary
process, the idea is to extend three of the most widely used
and compared multi-objective evolutionary algorithms in
traditional supervised learning to MIL. A brief description
of the evolutionary process is given for each approach
considered. Each algorithm finishes when the specified
number of generations is reached or when an acceptable
solution is found. In our case, a solution is considered
acceptable when it correctly classifies all the examples of
the training set.
3.4.1 MOG3P-MI(v1) Algorithm
The main steps of this algorithm are based on the prin-
ciples of the well-known Strength Pareto Evolutionary
Algorithm 2 (SPEA2) (Zitzler et al. 2001). The main
characteristics are: an external set (archive) is created to
store the non-dominated solutions; each individual is
assigned a fitness value that takes into account how many
individuals it dominates and is dominated by, and this value
is used to decide which dominated solutions enter the
archive when the number of non-dominated solutions is
smaller than the archive size; and a truncation operator
reduces the number of non-dominated solutions when it
exceeds the archive size. This operator removes from the
set the solution with the smallest distance to the other
solutions, to produce greater diversity. The general outline of
this algorithm is shown in Listing 3.
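The dominance-count fitness described above can be illustrated with SPEA2's strength and raw-fitness definitions. This is a minimal sketch assuming maximization of sensitivity and specificity; the population data are hypothetical:

```python
def dominates(fa, fb):
    """Pareto dominance for maximization, as defined in Sect. 2.1."""
    return (all(a >= b for a, b in zip(fa, fb))
            and any(a > b for a, b in zip(fa, fb)))

def spea2_raw_fitness(objs):
    """Strength S(i) = number of solutions i dominates; raw fitness
    R(i) = sum of the strengths of the solutions dominating i.
    R(i) = 0 means i is non-dominated; lower raw fitness is better."""
    n = len(objs)
    strength = [sum(dominates(objs[i], objs[j]) for j in range(n))
                for i in range(n)]
    raw = [sum(strength[j] for j in range(n) if dominates(objs[j], objs[i]))
           for i in range(n)]
    return strength, raw

# Hypothetical (sensitivity, specificity) pairs of five candidate rules
pop = [(0.9, 0.4), (0.6, 0.7), (0.5, 0.5), (0.4, 0.3), (0.9, 0.2)]
strength, raw = spea2_raw_fitness(pop)
# the non-dominated rules get raw fitness 0
```
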
3.4.2 MOG3P-MI(v2) Algorithm
The main steps of this algorithm are based on the well-
known Non-Dominated Sorting Genetic Algorithm
(NSGA2) (Deb et al. 2000). The main difference between
the previous algorithm and this one is the elitism-preservation
operation. MOG3P-MI(v2) uses dominance ranking
to classify the population into a number of layers, so that
the first layer is the best layer in the population. The
archive is created based on the order of ranking layers: the
best rank being selected first. If the number of individuals
in the archive is smaller than the population size, the next
layer will be taken into account, and so on. In this case, a
truncation operator based on the crowding distance is
applied if, on adding a layer, the number of individuals in
the archive exceeds the initial population size. This operator
removes the individual with the smallest crowding
distance. The crowding distance of a solution is defined as
the average of the objective-value differences between the
two solutions adjacent to that solution. To calculate it,
the population is sorted according to each objective; thus
the adjacent solutions are located, and infinite values are
assigned to boundary solutions. The general outline of this
algorithm is shown in Listing 4.
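The crowding-distance computation described above can be sketched as follows. This is a minimal illustration; normalizing each gap by the objective's range over the front is an assumption of the sketch:

```python
def crowding_distance(objs):
    """For each objective: sort the solutions, give the boundary
    solutions infinite distance, and add to each inner solution the
    normalized gap between its two adjacent solutions."""
    n, m = len(objs), len(objs[0])
    dist = [0.0] * n
    for k in range(m):
        order = sorted(range(n), key=lambda i: objs[i][k])
        span = (objs[order[-1]][k] - objs[order[0]][k]) or 1.0
        dist[order[0]] = dist[order[-1]] = float("inf")
        for pos in range(1, n - 1):
            gap = objs[order[pos + 1]][k] - objs[order[pos - 1]][k]
            dist[order[pos]] += gap / span
    return dist

# Hypothetical non-dominated front of four solutions
front = [(0.2, 0.9), (0.5, 0.6), (0.6, 0.5), (0.9, 0.2)]
d = crowding_distance(front)  # boundary solutions get infinity
```

Truncation then discards the individual with the smallest distance, i.e. the one sitting in the most crowded region of the front.
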
3.4.3 MOG3P-MI(v3) Algorithm
The main steps of this algorithm are based on the well-
known multi-objective genetic local search (MOGLS)
(Jaszkiewicz 2003). MOG3P-MIðv3Þ uses genetic local
search (GLS) and implements the idea of the simulta-
neous optimization of all weighted Tchebycheff scalar-
izing functions by random choice of the utility function
optimized in each iteration. That is, in each iteration, it
tries to improve the value of a randomly selected utility
function. A single iteration in this algorithm consists of a
single recombination of a pair of solutions and the off-
spring are then used as a starting point for a local
search. The general outline of this algorithm is shown in
Listing 5.
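The weighted Tchebycheff scalarizing function and the random choice of the utility function can be sketched as follows. This is a minimal illustration for the maximization setting of Sect. 2.1; the names, weights and data are ours:

```python
import random

def tchebycheff(obj, weights, ideal):
    """Weighted Tchebycheff scalarizing function (to be minimized):
    the largest weighted distance to the ideal point."""
    return max(w * (z - f) for w, f, z in zip(weights, obj, ideal))

def random_utility(p, rng=random):
    """Random normalized weight vector defining the utility function
    optimized in the current iteration, as in MOGLS."""
    w = [rng.random() for _ in range(p)]
    total = sum(w)
    return [x / total for x in w]

ideal = (1.0, 1.0)            # best possible sensitivity and specificity
a, b = (0.9, 0.4), (0.6, 0.7)
w = (0.5, 0.5)
# With equal weights, b is preferred: its worst weighted gap to the
# ideal point (about 0.20) is smaller than a's (about 0.30).
```

Drawing a fresh weight vector at every iteration steers the local search toward a different region of the Pareto front each time, which is how MOGLS covers the whole front.
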
4 Experimental section
In this section, two different experiments have been per-
formed to evaluate the effectiveness of the multi-objective
methods designed in the MIL scenario. The first experiment
evaluates the different extensions of the multi-objective
proposal developed, comparing the quality of their solutions
in different multi-instance application domains to discover
the most promising approach; according to this study, we
then select the evolutionary process of our proposal. The
second experiment compares
MOG3P-MI with some of the most significant algorithms
designed through the years in MIL. This experimental
study contemplates eight data sets and 16 other algorithms
applied to MIL. Finally, we evaluate some models
obtained by MOG3P-MI to study their comprehensibility.
This section is structured as follows. First, the applica-
tion domains are presented, followed by a brief description
of the experimental methodology. Second, the study of the
different multi-objective proposals is developed, in which
different relevant measurements to compare Pareto Fronts
are evaluated. Third, the comparative study of MOG3P-MI
with other previous algorithms is discussed. Finally, the
classifiers obtained by MOG3P-MI are shown.
4.1 Application domains
The data sets used in the experiments represent two
applications well known in MIL, drug activity prediction
which consists of determining whether a drug molecule
will bind strongly to a target protein (Dietterich et al. 1997)
and content-based image retrieval which consists of iden-
tifying the intended target object(s) in images (Andrews
et al. 2002; Zhang et al. 2010). Detailed information about
these data sets is summarized in Table 1. All data sets are
partitioned using tenfold stratified cross validation (Ro
1995; Wiens et al. 2008). Folds are constructed on bags, so
that every instance in a given bag appears in the same fold
and this process is repeated randomly five times. Thus, the
reported results are the average over the five different runs of
tenfold stratified cross validation. This procedure is carried out to
guarantee the reliability of results. All these partitions of each
data set are available at http://www.uco.es/grupos/kdis/
momil to facilitate future comparisons.
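The bag-level fold construction can be sketched as follows. This is our own minimal illustration of stratified folds built on whole bags rather than on instances; the data are hypothetical:

```python
import random
from collections import defaultdict

def stratified_bag_folds(bag_labels, k=10, seed=0):
    """Assign whole bags to k folds, stratified by bag label, so that
    every instance of a bag falls in the same fold."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for bag_id, label in bag_labels.items():
        by_label[label].append(bag_id)
    folds = [[] for _ in range(k)]
    for bags in by_label.values():
        rng.shuffle(bags)
        for i, bag_id in enumerate(bags):
            folds[i % k].append(bag_id)  # deal bags round-robin per class
    return folds

# Toy data: 12 positive and 8 negative bags split into 4 folds
labels = {f"p{i}": 1 for i in range(12)}
labels.update({f"n{i}": 0 for i in range(8)})
folds = stratified_bag_folds(labels, k=4)
# every fold receives 3 positive and 2 negative bags
```
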
4.2 Experimental setup
The designed multi-objective algorithms have been
implemented in the JCLEC software (Ventura et al. 2007)
with the main parameters as shown in Table 2. All
experiments are repeated with ten different seeds and the
average results are reported in the result tables.
The algorithms used for comparison are obtained from the
WEKA workbench (Witten 1999) and JCLEC framework
(Ventura et al. 2007) and consider some of the most signif-
icant proposals in this research field. The configuration of
these methods is based on an experimental study that considered
different configurations and then selected the best
configuration in each case. The results of this experiment can be
consulted in http://www.uco.es/grupos/kdis/mil.
4.3 Comparison of multi-objective strategies
In this section, a comparative study of the multi-objective proposals is performed. First, we analyse the Pareto Front obtained by each proposal, and then we evaluate the quality of the final classifier that would be selected in each case.
4.3.1 Analysis of the Pareto Front quality
of multi-objective strategies
The outcome of the multi-objective algorithms used is a non-dominated solution set (an approximation of the Pareto Optimal Front, POF). A sample of the Pareto Front obtained in one execution of each method can be found in Figs. 3, 4 and 5.
An analysis of the quality of these approximation sets is made to compare the different multi-objective techniques. Many performance measures evaluating different characteristics have been proposed; some of the most popular, such as spacing, hypervolume and coverage of sets (Coello et al. 2007), are analysed in this work, and their average results over the different data sets studied are shown in Table 3. The spacing metric (Coello et al. 2007) describes the spread of the non-dominated set. According to the results shown, the non-dominated front of MOG3P-MI(v2) has its solutions more evenly spaced than those of the other algorithms, since its value is the lowest. Another measurement considered is the hypervolume indicator (Coello et al. 2007), defined as the area of coverage of a non-dominated
Table 1 General information about data sets
Data set               ID     Bags (Pos / Neg / Total)   Attributes   Instances   Average bag size
Musk1 Mus1 47 45 92 166 476 5.17
Musk2 Mus2 39 63 102 166 6,598 64.69
Mutagenesis-atoms MutA 125 63 188 10 1,618 8.61
Mutagenesis-bonds MutB 125 63 188 16 3,995 21.25
Mutagenesis-chains MutC 125 63 188 24 5,349 28.45
Elephant ImgE 100 100 200 230 1,391 6.96
Tiger ImgT 100 100 200 230 1,220 6.10
Fox ImgF 100 100 200 230 1,320 6.60
Table 2 Parameters used for multi-objective algorithms
Parameter                  MOG3P-MI(v1)                             MOG3P-MI(v2)                                MOG3P-MI(v3)
Population size            1,000                                    1,000                                       80
External population size   100                                      100                                         50
Temporal population size   –                                        –                                           50
Number of generations      150                                      150                                         5,000
Crossover probability      95%                                      95%                                         100%
Mutation probability       80%                                      80%                                         –
Parent selector            Tournament (size = 2, with repetition)   Tournament (size = 2, without repetition)   Random selector
Maximum tree depth         50                                       50                                          50
set with respect to the objective space. The results show that the non-dominated solutions of MOG3P-MI(v2) obtain the highest values; therefore, its POF covers more area than the POFs of the other techniques. Finally, the coverage of two sets (Coello et al. 2007) is evaluated. This measurement can be described as the relative coverage comparison of two sets. The results show that MOG3P-MI(v2) obtains the highest values when compared with the other techniques, so by definition the outcomes of MOG3P-MI(v2) dominate the outcomes of the other algorithms. Taking into account all the results obtained for the different metrics, it can be said that MOG3P-MI(v2) achieves a better approximation of the POF than the other techniques.
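For the two-objective (sensitivity, specificity) fronts used here, these three measurements can be sketched as follows; the implementations are standard textbook formulations (Schott's spacing, a 2D sweep for the hypervolume, Zitzler's C metric), not the exact code used in the paper:

```python
def spacing(front):
    """Schott's spacing: standard deviation of each point's nearest-
    neighbour Manhattan distance; lower means a more evenly spread front."""
    d = [min(abs(p[0] - q[0]) + abs(p[1] - q[1])
             for j, q in enumerate(front) if j != i)
         for i, p in enumerate(front)]
    mean = sum(d) / len(d)
    return (sum((x - mean) ** 2 for x in d) / len(d)) ** 0.5

def hypervolume_2d(front, ref=(0.0, 0.0)):
    """Area dominated by a 2D maximization front w.r.t. a reference point,
    computed with a simple sweep over points sorted by the first objective."""
    area, prev_y = 0.0, ref[1]
    for x, y in sorted(front, reverse=True):
        if y > prev_y:
            area += (x - ref[0]) * (y - prev_y)
            prev_y = y
    return area

def coverage(a, b):
    """Zitzler's C(A, B): fraction of the points of B that are weakly
    dominated by at least one point of A (maximization)."""
    dom = lambda p, q: p[0] >= q[0] and p[1] >= q[1]
    return sum(any(dom(p, q) for p in a) for q in b) / len(b)
```

With these conventions, a lower spacing, a higher hypervolume and a higher C(A, B) than C(B, A) all favour front A, which is how Table 3 is read.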
4.3.2 Analysis of solution quality of multi-objective
strategies
In this section, we analyse the average values of accuracy, sensitivity and specificity for the different data sets. To do this, in our proposal, the solution (classifier) on the Pareto
Fig. 3 Pareto Front of MOG3P-MI(v1) for the different learning problems (panels a-f: Mutagenesis Atoms, Mutagenesis Bonds, Mutagenesis Chains, Elephant, Tiger, Fox; each panel plots specificity against sensitivity on [0, 1] axes)
Fig. 4 Pareto Front of MOG3P-MI(v2) for the different learning problems (panels a-f: Mutagenesis Atoms, Mutagenesis Bonds, Mutagenesis Chains, Elephant, Tiger, Fox; each panel plots specificity against sensitivity on [0, 1] axes)
Front obtained on the training set with the best balance between sensitivity and specificity is selected and evaluated on the test partition, and its results are used in this comparison. Formally, the procedure carried out is shown in Fig. 6.
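One plausible way to formalize "best balance between sensitivity and specificity" is the following sketch; the exact criterion and its tie-breaking rule are our assumption, not the formal procedure of Fig. 6:

```python
def select_balanced(solutions):
    """Pick, from the Pareto Front obtained on the training set, the
    solution with the best sensitivity/specificity balance: maximize
    min(se, sp), breaking ties by the larger sum se + sp.
    Each entry is a (sensitivity, specificity, classifier) triple."""
    return max(solutions, key=lambda s: (min(s[0], s[1]), s[0] + s[1]))
```

The selected classifier is then the single model reported per run in Table 4.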
The average results of the accuracy (Acc), sensitivity (Se) and specificity (Sp) values are shown for each data set in Table 4. At first glance, MOG3P-MI(v2) seems to obtain better results than the other versions in most data sets and metrics. However, to evaluate the differences between the algorithms more precisely, a statistical test is carried out. Specifically, the Wilcoxon signed-rank test (Demšar 2006) is used. This is a non-parametric test recommended in Demšar (2006) and Garcia et al. (2010) which allows us to address the question of whether there are significant differences between the accuracy, sensitivity and specificity values obtained by the three proposals. The null hypothesis maintains that there are no significant differences between the values of the three metrics obtained by the different techniques, while the alternative hypothesis asserts that there are. The test compares the proposals pairwise, so three comparisons are made for each measurement (accuracy, sensitivity and specificity). Table 5 shows the results obtained in all possible comparisons among the three algorithms for the different metrics. In all cases, we use a 95% confidence level and mark in bold the winning algorithm in each row when the Wilcoxon test asserts that there are significant differences between the results of the two algorithms.
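The R+ and R- rank sums reported in Table 5 can be computed from paired per-data-set scores as in the following stdlib sketch of the Wilcoxon signed-rank statistic (zero differences dropped, tied absolute differences sharing the average rank):

```python
def wilcoxon_ranks(a, b):
    """Wilcoxon signed-rank sums (R+, R-) for paired per-data-set scores
    of two algorithms a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # group ties in |difference| so they share the average rank
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1              # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus
```

The larger rank sum identifies the winning algorithm of the pair; the p value is then read from the Wilcoxon distribution for the smaller sum.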
Regarding the accuracy metric, the best algorithm is MOG3P-MI(v2), which obtains a considerably higher rank sum. This means that this version obtains better accuracy values in a greater number of data sets. The same table shows the p value, which allows us to reject the null hypothesis with a confidence level of 95%; we can therefore determine that MOG3P-MI(v2) outperforms both MOG3P-MI(v1) and MOG3P-MI(v3), making it a better proposal than the others.
Fig. 5 Pareto Front of MOG3P-MI(v3) for the different learning problems (panels a-f: Mutagenesis Atoms, Mutagenesis Bonds, Mutagenesis Chains, Elephant, Tiger, Fox; each panel plots specificity against sensitivity on [0, 1] axes)
Table 3 Analysis of quality of POFs considering average values for all data sets studied
Algorithm      Hypervolume (HV)   Spacing (S)   Two set coverage (CS)
MOG3P-MI(v3)   0.844516           0.016428      CS(MOG3P-MI(v3), MOG3P-MI(v2)) 0.357052
                                                CS(MOG3P-MI(v3), MOG3P-MI(v1)) 0.430090
MOG3P-MI(v2)   0.890730           0.007682      CS(MOG3P-MI(v2), MOG3P-MI(v3)) 0.722344
                                                CS(MOG3P-MI(v2), MOG3P-MI(v1)) 0.776600
MOG3P-MI(v1)   0.872553           0.012290      CS(MOG3P-MI(v1), MOG3P-MI(v3)) 0.508293
                                                CS(MOG3P-MI(v1), MOG3P-MI(v2)) 0.235222
Moreover, MOG3P-MI(v1) also obtains better results than MOG3P-MI(v3) in the pairwise comparison.
A similar analysis can be made for the other metrics, sensitivity and specificity. With respect to sensitivity, MOG3P-MI(v2) obtains better results than MOG3P-MI(v3), but there are no significant differences with respect to MOG3P-MI(v1), although MOG3P-MI(v2) achieves a higher rank sum than MOG3P-MI(v1). Finally, for the specificity values, MOG3P-MI(v2) again outperforms the other two models in the individual comparisons.
In conclusion, MOG3P-MI(v2) achieves the best approximation to the Pareto Optimal Front according to all of the measurements evaluated, as well as the best accuracy, sensitivity and specificity values on the different data sets. Moreover, the Wilcoxon test determines that its results are better than those of both other versions, compared pairwise, for the accuracy and specificity measurements, and better than those of MOG3P-MI(v3) for the sensitivity values. Therefore, our final proposal, MOG3P-MI, will be based on this evolutionary scheme.
4.4 Comparison with other proposals
In this section, we make a general comparison between MOG3P-MI and classic techniques previously developed in MIL. To do so, this study evaluates the most relevant proposals described in the MIL literature. These methods include:
• Methods based on diverse density Diverse density was proposed by Maron and Lozano-Pérez (1997). Given a set of positive and negative bags, this algorithm tries to learn a concept that is close to at least one instance in each positive bag, but far from all instances in all the negative bags. The algorithms considered in this paradigm are MIDD (Maron 1997), MIEMDD (Zhang 2001) and MDD (Witten 2005).
• Methods based on logistic regression These methods are popular machine learning methods in standard single-instance learning. The algorithm considered in this paradigm is MILR (Xu 2003), which adapts standard single-instance (SI) logistic regression to MI learning by assuming a standard single-instance logistic model at the instance level and combining its class probabilities into bag-level class probabilities with the noisy-or model employed by DD.
• Distance-based approaches The k-nearest neighbour (k-NN) approach in an MIL framework was introduced by Wang
Fig. 6 Procedure to select the classifier from Pareto Front set
Table 4 Experimental results. Comparative of multi-objective algorithms
Algorithm MOG3P-MI(v1) MOG3P-MI(v2) MOG3P-MI(v3)
Acc Se Sp Acc Se Sp Acc Se Sp
Elephant 0.8920 0.9036 0.8568 0.9014 0.8968 0.9060 0.8732 0.8896 0.8804
Tiger 0.8702 0.8996 0.8076 0.8848 0.9104 0.8592 0.8456 0.8836 0.8408
Fox 0.7468 0.7992 0.6484 0.7484 0.8128 0.6840 0.7210 0.7936 0.6944
Musk1 0.9025 0.9456 0.8610 0.9302 0.9656 0.8958 0.9166 0.9678 0.8572
Musk2 0.8976 0.8877 0.9191 0.9077 0.8983 0.9119 0.9063 0.8827 0.9011
MutAtoms 0.8295 0.9296 0.5971 0.8276 0.9266 0.6281 0.8141 0.9225 0.6274
MutBonds 0.8549 0.9044 0.6903 0.8566 0.8987 0.7702 0.8311 0.9003 0.7541
MutChains 0.8747 0.9220 0.7253 0.8820 0.9230 0.7996 0.8494 0.9118 0.7810
and Zucker (2000). The main difference between the various nearest-neighbour proposals lies in the definition of the distance metric used to measure the distance between bags. Two schemes extensively employed are the minimal Hausdorff distance and the Kullback-Leibler distance. The algorithms considered in this paradigm are Citation-KNN (Wang 2000) and MIOptimalBall (Auer 2004).
• Methods based on rules These methods employ different classic algorithms from traditional supervised learning adapted to MIL. There are two possible ways to adapt them: the MIWrapper method (Witten 2005), which assigns the class label of a bag to all its instances and then trains a single-instance algorithm on the resulting data, and MISimple (Witten 2005), which computes summary statistics over a bag to form a single instance from it. The algorithms considered in this paradigm are PART (MIWrapper), PART (MISimple), and combinations of the different proposals using rule-based systems: AdaBoost & PART (MISimple), Bagging & PART (MIWrapper) and AdaBoost & PART (MIWrapper).
• Methods based on support vector machines (SVM) There are a large number of ways to adapt this approach to the MIL framework, with results that perform well in different applications. The experimentation here considers two algorithms: the MISMO algorithm, which replaces the single-instance kernel function with a multi-instance kernel (i.e., an appropriate similarity function based on bags of instances), and the SMO algorithm (Keerthi et al. 2001) for SVM learning in conjunction with an MI kernel (Gartner et al. 2002), using the MIWrapper procedure described previously to carry out the adaptation.
• Methods based on decision trees These methods are inspired by AdaBoost, which builds a series of weak classifiers using a single-instance learner on appropriately re-weighted versions of the input data, where each instance receives its bag's label (Xu and Frank 2004). The algorithm considered in this paradigm is MIBoost (Witten 2005).
• Methods based on naive Bayes The method is adapted to MIL using the MIWrapper procedure described previously. The algorithm considered here is naive Bayes (Witten 2005).
• Methods based on evolutionary algorithms These methods employ grammar-guided genetic programming adapted to MIL. The algorithm considered is G3P-MI (Zafra 2010), which can be regarded as a mono-objective version of this proposal.
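Two of the bag-level building blocks mentioned in this list, the noisy-or model used by DD/MILR-style methods and the minimal Hausdorff distance used by Citation-KNN-style methods, can be sketched as follows (illustrative implementations, not the compared systems' code):

```python
def noisy_or(instance_probs):
    """Noisy-OR bag probability used by DD/MILR-style methods: a bag is
    positive if at least one of its instances is positive."""
    prod = 1.0
    for p in instance_probs:
        prod *= 1.0 - p
    return 1.0 - prod

def min_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance between two bags (Citation-KNN style):
    the smallest Euclidean distance between any pair of instances drawn
    one from each bag."""
    dist = lambda p, q: sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    return min(dist(p, q) for p in bag_a for q in bag_b)
```

Both operate on whole bags, which is what distinguishes these MIL methods from their single-instance counterparts.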
Table 6 shows the average results of accuracy, sensitivity and specificity for all algorithms on each data set. To evaluate more precisely whether significant differences exist among the algorithms, the Friedman and Iman-Davenport tests (Garcia et al. 2009, 2010) are used. These are non-parametric tests that compare the average ranks of the algorithms. These ranks help us know which algorithm obtains the best results considering all data sets. The rankings are calculated as follows: the algorithm with the highest accuracy on one data set is given a rank of 1 for that data set, the algorithm with the next highest accuracy receives a rank of 2, and so on.
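This rank calculation (with average ranks for ties, which the description above leaves implicit) can be sketched as:

```python
def average_ranks(scores):
    """Friedman average ranks. scores[d][a] is the score of algorithm a on
    data set d (higher = better). Rank 1 goes to the best algorithm on a
    data set; tied scores share the average rank."""
    n_alg = len(scores[0])
    totals = [0.0] * n_alg
    for row in scores:
        order = sorted(range(n_alg), key=lambda a: -row[a])
        i = 0
        while i < n_alg:
            j = i
            # group tied scores so they share the average position
            while j + 1 < n_alg and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1          # average of positions i+1 .. j+1
            for k in range(i, j + 1):
                totals[order[k]] += avg
            i = j + 1
    return [t / len(scores) for t in totals]
```

The average rank per algorithm is what Table 7 reports; the Friedman statistic is then a function of these averages.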
To obtain more reliable results, the experimental study is divided into algorithm groups. The first group compares MOG3P-MI with methods based on diverse density, the second group carries out the comparison with methods based on SVMs, the third with distance-based approaches, the fourth with methods based on rules and, finally, the remaining types of algorithms are compared in the fifth group. This last group includes a previous mono-objective version of this proposal (Zafra 2010). The average ranks of each algorithm over all data sets are shown in Table 7. These ranks let us know which algorithm has better performance
Table 5 Wilcoxon test results for multi-objective algorithms comparison
Measurement   Algorithm (a)   Algorithm (b)   R+   R-   p value
Accuracy      MOG3P-MI(v1)    MOG3P-MI(v2)    3    33   0.0391
              MOG3P-MI(v1)    MOG3P-MI(v3)    33   3    0.0391
              MOG3P-MI(v2)    MOG3P-MI(v3)    36   0    0.0078
Sensitivity   MOG3P-MI(v1)    MOG3P-MI(v2)    9    27   0.2000
              MOG3P-MI(v1)    MOG3P-MI(v3)    28   8    0.1953
              MOG3P-MI(v2)    MOG3P-MI(v3)    33   3    0.0391
Specificity   MOG3P-MI(v1)    MOG3P-MI(v2)    1    35   0.0156
              MOG3P-MI(v1)    MOG3P-MI(v3)    3    33   0.2000
              MOG3P-MI(v2)    MOG3P-MI(v3)    34   2    0.0234
Table 6 Experimental results for the different algorithms and data sets
Data sets ImgE ImgT ImgF Musk1 Musk2 MutA MutB MutC
MOG3P-MI Acc 0.9014 0.8848 0.7484 0.9302 0.9077 0.8276 0.8566 0.8820
Se 0.8968 0.9104 0.8128 0.9656 0.8983 0.9266 0.8987 0.9230
Sp 0.9060 0.8592 0.6840 0.8958 0.9119 0.6281 0.7702 0.7996
G3P-MI Acc 0.9774 0.9714 0.9642 0.8540 0.7694 0.7538 0.7514 0.7718
Se 0.9736 0.9680 0.9528 0.9550 0.6447 0.9198 0.9244 0.8923
Sp 0.9812 0.9748 0.9756 0.7500 0.8439 0.4267 0.4092 0.5329
CitationKNN Acc 0.5000 0.5000 0.5000 0.8936 0.8311 0.7451 0.7501 0.7607
Se 0.0000 0.0000 0.0000 0.8810 0.7883 0.8651 0.8596 0.8542
Sp 1.0000 1.0000 1.0000 0.9030 0.8609 0.5100 0.5357 0.5771
MIOptimalBall Acc 0.7900 0.6330 0.4820 0.7560 0.7875 0.7412 0.7476 0.7267
Se 0.6940 0.6260 0.4240 0.7180 0.6467 0.7818 0.7994 0.8617
Sp 0.8860 0.6400 0.5400 0.7930 0.8743 0.6643 0.6433 0.4614
MDD Acc 0.7860 0.7070 0.6640 0.8000 0.7233 0.7106 0.7213 0.7575
Se 0.7740 0.6700 0.7120 0.7330 0.4933 0.9554 0.9100 0.8946
Sp 0.7980 0.7440 0.6160 0.8750 0.8648 0.2262 0.3490 0.4843
MIDD Acc 0.8010 0.7460 0.5970 0.8440 0.8053 0.7262 0.7532 0.7819
Se 0.7960 0.7240 0.6160 0.8620 0.6450 0.8553 0.8795 0.8963
Sp 0.8060 0.7680 0.5780 0.8240 0.9067 0.4700 0.5033 0.5557
MIEMDD Acc 0.7520 0.7240 0.6090 0.8420 0.8487 0.7057 0.7254 0.7103
Se 0.8020 0.7920 0.7260 0.8910 0.8950 0.8914 0.8885 0.7937
Sp 0.7020 0.6560 0.4920 0.7910 0.8228 0.3329 0.4029 0.5471
MILR Acc 0.7740 0.7700 0.5460 0.8413 0.7847 0.7113 0.6946 0.7256
Se 0.8600 0.8040 0.6120 0.8850 0.7467 0.9265 0.9263 0.9087
Sp 0.6880 0.7360 0.4800 0.7950 0.8067 0.2872 0.2348 0.3643
MIBoost Acc 0.8130 0.8210 0.6620 0.8129 0.7704 0.6851 0.7647 0.7778
Se 0.8080 0.8440 0.7100 0.8690 0.6717 0.9021 0.8906 0.8817
Sp 0.8180 0.7980 0.6140 0.7580 0.8319 0.2538 0.5162 0.5676
MISMO Acc 0.8180 0.8150 0.5740 0.8633 0.8351 0.6976 0.8051 0.8294
Se 0.8320 0.8740 0.7800 0.9040 0.8600 0.8549 0.8368 0.8554
Sp 0.8040 0.7560 0.3680 0.8200 0.8195 0.3914 0.7448 0.7767
PART (MIWrapper) Acc 0.8220 0.7880 0.6370 0.8473 0.8058 0.7584 0.8305 0.8552
Se 0.8620 0.7960 0.6380 0.8380 0.7317 0.8896 0.8992 0.8989
Sp 0.7820 0.7800 0.6360 0.8630 0.8547 0.5019 0.6948 0.7662
Bagging&PART (MIWrapper) Acc 0.8660 0.8270 0.6640 0.8593 0.8233 0.7833 0.8339 0.8586
Se 0.9060 0.8780 0.6980 0.9320 0.7883 0.9053 0.8896 0.8958
Sp 0.8260 0.7760 0.6300 0.7820 0.8490 0.5457 0.7252 0.7829
AdaBoost&PART (MIWrapper) Acc 0.8540 0.8110 0.6370 0.8569 0.8231 0.7751 0.8242 0.8584
Se 0.8840 0.8320 0.6780 0.8790 0.7983 0.8948 0.8881 0.8958
Sp 0.8240 0.7900 0.5960 0.8360 0.8433 0.5414 0.6995 0.7824
SMO (MIWrapper) Acc 0.8330 0.8020 0.5860 0.8525 0.8216 0.6649 0.6649 0.6723
Se 0.8740 0.8360 0.7540 0.8730 0.7367 1.0000 1.0000 1.0000
Sp 0.7920 0.7680 0.4180 0.8350 0.8767 0.0000 0.0000 0.0214
considering all data sets. Thus, the algorithm with the value closest to 1 is the best algorithm over most data sets. We can see that MOG3P-MI obtains, in all comparisons and measurements, a rank value equal or very close to 1, being in all cases the lowest value. A priori, this indicates that for all or most of the data sets, MOG3P-MI achieves the classifier with the best values for accuracy, sensitivity and specificity.
Next, the results of the Friedman and Iman-Davenport tests are evaluated. According to these tests, if the null hypothesis is accepted, all methods obtain similar results, i.e., there are no significant differences between them; if the null hypothesis is rejected, there are differences between the results obtained by the algorithms. Table 8 shows the results of applying these tests: the Friedman and Iman-Davenport statistics, the p value and the final decision, using a significance level of α = 0.05. The results of both tests are very similar and, given that the null hypothesis is rejected in most cases, there are significant differences among the observed results in the different comparisons. Therefore, a post hoc statistical analysis is necessary for the different comparisons and metrics. We apply two powerful procedures, the Hochberg and Holm methods (Garcia et al. 2009), to compare the control algorithm with the rest of the algorithms in each comparison.
Tables 9, 10, 11, 12 and 13 show all the adjusted p values for the different comparisons involving the control algorithm and the different metrics. Algorithms which are not worse than the control at a significance level of α = 0.05 are marked in bold. Moreover, to make these tables easier to read, the algorithms appear sorted by their ranking value: the algorithm numbered 1 has the highest ranking for that measurement in that comparison, and the algorithm with the highest number is the closest to the control algorithm. Table 9 is analysed first to show how the information is represented. This table displays the application of these tests for the different metrics (accuracy, sensitivity and specificity) in the comparison with methods based on diverse density. In bold, we can see the algorithms which do not present significantly worse results than the control algorithm (our multi-objective proposal, MOG3P-MI, in all cases). Observing these results, we can see that for accuracy the control algorithm obtains the best results, and the other proposals are statistically worse at solving these problems. With respect to the other metrics, for the sensitivity values the MIEMDD algorithm is not considered statistically worse. However, for the specificity values, this proposal not only obtains significantly worse results than the control algorithm, but is also the worst of all the compared algorithms. A similar study can be made for the other comparisons (Tables 10, 11, 12, 13). In general, it is worth pointing out that when one algorithm does not show significant differences in one metric with respect to the control algorithm, it is considered the worst in the other metric, showing significant differences in the statistical test. In classification, this is a very relevant problem: it is well known that there is a trade-off between sensitivity and specificity, so when algorithms increase sensitivity they normally decrease specificity (and vice versa). This fact can be seen in all algorithms in the rest of the comparisons (Tables 10, 11, 12, 13). In all cases, if one of them obtains results similar to the control algorithm in one metric, it is considered the worst proposal in the other one. For example, Table 10 shows the case of the SMO algorithm. Table 11 shows the case of the CitationKNN
Table 7 Average Rankings of the algorithms
Algorithm Ranking
Acc Se Sp
Group 1: Methods based on diverse density
MOG3P-MI 1.000 1.250 1.000
MDD 3.375 3.000 3.125
MIDD 2.375 3.250 2.250
MIEMDD 3.250 2.500 3.625
Group 2: Methods based on SVM
MOG3P-MI 1.000 1.375 1.000
MISMO 2.250 2.500 2.500
SMO (MIWrapper) 2.750 2.125 2.500
Group 3: Methods based on distance
MOG3P-MI 1.000 1.000 1.625
CitationKNN 2.250 2.500 1.875
MIOptimalBall 2.750 2.500 2.500
Group 4: Methods based on rule
MOG3P-MI 1.125 1.375 1.250
PART (MIWrapper) 4.438 3.875 4.000
Bagging&PART (MIWrapper) 2.250 2.438 3.500
AdaBoost&PART (MIWrapper) 3.688 3.313 3.875
PART (MISimple) 3.750 4.000 3.500
AdaBoost&PART (MISimple) 5.750 6.000 4.875
Group 5: Other methods
MOG3P-MI 1.375 1.875 1.500
G3P-MI 2.125 2.375 2.372
MILR 3.750 3.125 4.000
NaiveBayes(MIWrapper) 4.625 3.500 4.000
MIBOOST 3.125 4.125 3.125
Table 8 Results of Friedman test (p = 0.01)
     Friedman   p value          Conclusion               Iman-Davenport   p value          Conclusion
Group 1: Methods based on diverse density
Acc  17.250     6.280 × 10^-4    Reject null hypothesis   17.889           5.359 × 10^-6    Reject null hypothesis
Se   11.400     9.748 × 10^-3    Reject null hypothesis   6.333            3.146 × 10^-3    Reject null hypothesis
Sp   19.050     2.670 × 10^-4    Reject null hypothesis   26.939           2.157 × 10^-7    Reject null hypothesis
Group 2: Methods based on SVM
Acc  13.000     1.503 × 10^-3    Reject null hypothesis   30.333           8.147 × 10^-6    Reject null hypothesis
Se   5.250      7.244 × 10^-2    Accept null hypothesis   3.419            6.180 × 10^-2    Accept null hypothesis
Sp   12.000     2.479 × 10^-3    Reject null hypothesis   21.000           6.104 × 10^-5    Reject null hypothesis
Group 3: Methods based on distance
Acc  13.000     1.503 × 10^-3    Reject null hypothesis   30.333           8.147 × 10^-6    Reject null hypothesis
Se   12.000     2.479 × 10^-3    Reject null hypothesis   21.000           6.104 × 10^-5    Reject null hypothesis
Sp   3.250      1.969 × 10^-1    Accept null hypothesis   1.784            2.041 × 10^-1    Accept null hypothesis
Group 4: Methods based on rules
Acc  30.268     1.300 × 10^-4    Reject null hypothesis   21.771           7.460 × 10^-9    Reject null hypothesis
Se   28.161     3.400 × 10^-4    Reject null hypothesis   16.650           2.085 × 10^-7    Reject null hypothesis
Sp   16.786     4.925 × 10^-2    Reject null hypothesis   5.062            1.356 × 10^-3    Reject null hypothesis
Group 5: Other methods
Acc  21.200     2.890 × 10^-3    Reject null hypothesis   13.741           2.556 × 10^-5    Reject null hypothesis
Se   10.200     3.719 × 10^-2    Reject null hypothesis   3.275            2.533 × 10^-2    Reject null hypothesis
Sp   14.900     4.913 × 10^-3    Reject null hypothesis   6.099            1.164 × 10^-3    Reject null hypothesis
Table 9 Adjusted p values for the comparison of the control algorithm in each measure with algorithms based on diverse density (Hochberg and Holm tests)
Group 1: Methods based on diverse density
i  Algorithm  p value   p Hochberg  p Holm
Accuracy metric (MOG3P-MI is the control algorithm)
1  MDD        0.000234  0.000702    0.000702
2  MIEMDD     0.000491  0.000982    0.000982
3  MIDD       0.03316   0.03316     0.03316
Sensitivity metric (MOG3P-MI is the control algorithm)
1  MIDD       0.001946  0.005837    0.005837
2  MDD        0.006706  0.013413    0.013413
3  MIEMDD     0.052808  0.052808    0.052808
Specificity metric (MOG3P-MI is the control algorithm)
1  MIEMDD     0.000048  0.000143    0.000143
2  MDD        0.000995  0.001989    0.001989
3  MIDD       0.052808  0.052808    0.052808
Table 10 Adjusted p values for the comparison of the control algorithm in each measure with algorithms based on SVM (Hochberg and Holm tests)
Group 2: Methods based on SVM
i  Algorithm  p value   p Hochberg  p Holm
Accuracy metric (MOG3P-MI is the control algorithm)
1  SMOa       0.000465  0.000931    0.000931
2  MISMO      0.012419  0.012419    0.012419
Sensitivity metric (MOG3P-MI is the control algorithm)
1  MISMO      0.024449  0.048898    0.048898
2  SMOa       0.133614  0.133614    0.133614
Specificity metric (MOG3P-MI is the control algorithm)
1  MISMO      0.0027    0.0027      0.0027
2  SMOa       0.0027    0.0027      0.0027
a MIWrapper
and MIOptimalBall algorithms. Table 12 shows the case of the AdaBoost & PART (MIWrapper) and Bagging & PART (MIWrapper) algorithms. Finally, Table 13 shows the case of the MIBoost and MILR algorithms.
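The Holm and Hochberg adjusted p values reported in Tables 9, 10, 11, 12 and 13 follow the standard step-down and step-up corrections, which can be sketched as follows (an illustrative stdlib implementation):

```python
def holm(pvals):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj, running = [0.0] * m, 0.0
    for k, i in enumerate(order):
        running = max(running, (m - k) * pvals[i])   # enforce monotonicity
        adj[i] = min(1.0, running)
    return adj

def hochberg(pvals):
    """Hochberg step-up adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adj, running = [0.0] * m, 1.0
    for k, i in enumerate(order):
        running = min(running, (k + 1) * pvals[i])   # enforce monotonicity
        adj[i] = running
    return adj
```

An algorithm is declared significantly worse than the control when its adjusted p value falls below the chosen significance level.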
The only exceptions are MOG3P-MI, which obtains the best results in all metrics, being in all cases the algorithm with the lowest ranking, and G3P-MI. G3P-MI obtains worse results than MOG3P-MI in all cases, but statistically there are no significant differences between their results. However, if we examine the results by data set, we find that G3P-MI has problems when the data sets are not equally balanced; in these cases, its performance is always significantly worse than that of MOG3P-MI. This fact was one motivation for introducing a multi-objective proposal that finds a Pareto Front to optimize the balance between the different metrics.
We can conclude that the multi-objective technique implemented obtains the best results with respect to the accuracy, sensitivity and specificity values, whereas the rest of the algorithms optimize one measurement only at the great expense of another. This shows that the quality of their rules is not good enough, because one of the classes is not correctly classified. Thus, for problems with a trade-off between different metrics, it is convenient to work with independent objectives, because algorithms that use a single learning criterion (e.g., accuracy, or a scalar combination of objectives with weights predefined a priori) tend to omit solutions which offer a good balance between conflicting objectives. Multi-objective models that generate a Pareto Front are therefore an interesting technique for finding the balance between contradictory metrics in an MIL framework.
4.5 Study of classifiers obtained by MOG3P-MI
Currently, comprehensibility in the knowledge discovery process is as important as obtaining accurate models. As we have commented, MOG3P-MI generates IF-THEN prediction rules as the result of its process. The advantage of these models is that they are a very intuitive form of knowledge representation, owing to the ability of the human mind to comprehend them.
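The bag-level semantics of such IF-THEN rules (a bag is positive when at least one of its instances satisfies the rule, the usual MIL hypothesis adopted by G3P-MI-style learners) can be sketched as follows; the attribute names in the example rule are hypothetical illustrations, not taken from the paper:

```python
def classify_bag(bag, rule):
    """MIL hypothesis used by G3P-style rule learners: a bag (molecule,
    image, ...) is labelled positive when at least one of its instances
    (conformations, image regions, ...) satisfies the IF-THEN rule."""
    return any(rule(instance) for instance in bag)

# Hypothetical rule in the spirit of the Musk example; the attribute
# names f42/oxygen_dist are illustrative placeholders.
musk_like_rule = lambda inst: inst["f42"] > -80 and inst["oxygen_dist"] < 1.5
```

A molecule is thus predicted musky as soon as one of its conformations falls inside the intervals named by the rule, which is what makes each evolved rule directly readable by a domain expert.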
In this section, one example from each application domain has been selected to show that the rules generated by our proposal are generally comprehensible and compact and provide representative information, so that experts can acquire useful knowledge about relevant attributes and their intervals.
The first rule is obtained for the Musk data set. This rule shows representative information about the problem of predicting whether a molecule has the musky property.
The attributes considered in the rule maintain information about the shape of the molecule; to this end, the distances from the origin of the molecule to the molecule surface along 162 rays are measured. In addition to these 162
Table 11 Adjusted p values for the comparison of the control algorithm in each measure with algorithms based on distance (Hochberg and Holm tests)
Group 3: Methods based on distance
i  Algorithm      p value   p Hochberg  p Holm
Accuracy metric (MOG3P-MI is the control algorithm)
1  MIOptimalBall  0.000465  0.000931    0.000931
2  CitationKNN    0.012419  0.012419    0.012419
Sensitivity metric (MOG3P-MI is the control algorithm)
1  CitationKNN    0.0027    0.0027      0.0027
2  MIOptimalBall  0.0027    0.0027      0.0027
Specificity metric (MOG3P-MI is the control algorithm)
1  MIOptimalBall  0.080118  0.160237    0.160237
2  CitationKNN    0.617075  0.617075    0.617075
Table 12 Adjusted p values for the comparison of the control algorithm in each measure with algorithms based on rules (Hochberg and Holm tests)
Group 4: Methods based on rules
i  Algorithm        p value   p Hochberg  p Holm
Accuracy metric (MOG3P-MI is the control algorithm)
1  AdaBoost&PARTa   0.000001  0.000004    0.000004
2  PARTb            0.000398  0.001593    0.001593
3  PARTa            0.005012  0.012309    0.010025
4  AdaBoost&PARTb   0.006155  0.012309    0.012309
5  Bagging&PARTb    0.229102  0.229102    0.229102
Sensitivity metric (MOG3P-MI is the control algorithm)
1  AdaBoost&PARTa   0.000001  0.000004    0.000004
2  PARTa            0.005012  0.020049    0.015053
3  PARTb            0.007526  0.022579    0.022579
4  AdaBoost&PARTb   0.038333  0.076666    0.076666
5  Bagging&PARTb    0.256015  0.256015    0.256015
Specificity metric (MOG3P-MI is the control algorithm)
1  AdaBoost&PARTa   0.000106  0.000532    0.000532
2  PARTb            0.003283  0.013134    0.010025
3  AdaBoost&PARTb   0.005012  0.015037    0.015037
4  Bagging&PARTb    0.016157  0.016157    0.016157
5  PARTa            0.016157  0.016157    0.016157
a MISimple
b MIWrapper
shape features, four domain-specific features that represent the position of a designated atom (an oxygen atom) on the molecular surface are also considered. According to this initial information, the rule describes the geometric structure that the molecule must have in order to satisfy the evaluated property. Thus, experts in the area can use this knowledge to identify or develop molecules with this property.
The following rule clearly represents the most relevant attributes for predicting whether a molecule presents the mutagenicity property. The available information describes a molecule at several levels, according to its atoms, bonds and chains. With this initial information, the rule obtained determines the characteristics concerning the entire molecule, providing a theory that will best predict its mutagenic activity.
Finally, the last application domain is the classification of content-based images. In this context, the next rule represents the most relevant attributes for identifying a tiger inside an image. In this problem, the information about the image consists of color, texture and position features of the image pixels. Concretely, with respect to color, the information maintained concerns the Lab color space; with respect to texture, the contrast, the anisotropy and the polarity are considered; and with respect to position, the (x, y) coordinates of the pixel are considered. According to this initial information, the rule describes which color, texture and position descriptors a region of the image should exhibit in order to contain a tiger. Experts in the area can thereby obtain the main characteristics an image region must have for a tiger to be recognized.
5 Conclusion
This paper presents a multi-objective grammar-guided genetic programming algorithm (MOG3P-MI) for MIL. The proposal is based on the G3P paradigm and evaluates two conflicting measurements very commonly used in classification. To determine the design of the evolutionary process, three different extensions which work with MIL were implemented, and a preliminary comparative study was carried out to check the performance of different representative approaches. The experimental results comparing the different multi-objective techniques helped us to determine the best evolutionary process and to study the performance of the different multi-objective techniques in the MIL setting. In short, the most relevant finding is that the choice of approach does not strongly influence the results achieved: there are no significant differences between the three extensions, although, in
Table 13 Adjusted p values for the comparison of the control algorithm in each measure with other algorithms (Hochberg and Holm tests)
Group 5: Other methods
i  Algorithm      p value   p Hochberg  p Holm
Accuracy metric (MOG3P-MI is the control algorithm)
1  NaiveBayes(a)  0.000039  0.000158  0.000158
2  MILR           0.002663  0.007989  0.007989
3  MIBoost        0.026857  0.053713  0.053713
4  G3P-MI         0.342782  0.342782  0.342782
Sensitivity metric (MOG3P-MI is the control algorithm)
1  MIBoost        0.004427  0.017706  0.017706
2  NaiveBayes(a)  0.039833  0.119498  0.119498
3  MILR           0.113846  0.227693  0.227693
4  G3P-MI         0.527089  0.527089  0.527089
Specificity metric (MOG3P-MI is the control algorithm)
1  NaiveBayes(a)  0.001565  0.004696  0.004696
2  MILR           0.001565  0.004696  0.004696
3  MIBoost        0.039833  0.079665  0.079665
4  G3P-MI         0.268382  0.268382  0.268382
(a) MIWrapper
974 A. Zafra, S. Ventura
123
general, one of them always obtained better results across all evaluation metrics.
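The adjusted p values reported in Table 13 follow the standard Holm (step-down) and Hochberg (step-up) procedures. A minimal sketch of both adjustments, assuming a list of raw p values from k pairwise comparisons against the control algorithm (function names are illustrative, not from the paper):

```python
def holm_adjust(pvals):
    """Step-down Holm adjustment: multiply the i-th smallest raw p value
    by (k - i) and enforce monotonicity from the smallest p upwards."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    adjusted, running = [0.0] * k, 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (k - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

def hochberg_adjust(pvals):
    """Step-up Hochberg adjustment: same multipliers as Holm, but
    monotonicity is enforced from the largest p value downwards."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i], reverse=True)
    adjusted, running = [0.0] * k, 1.0
    for step, i in enumerate(order):
        rank = k - 1 - step  # position in ascending order of p values
        running = min(running, min(1.0, (k - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted

# Raw p values for the accuracy metric in Table 13 (NaiveBayes, MILR,
# MIBoost and G3P-MI compared against the control algorithm MOG3P-MI):
raw = [0.000039, 0.002663, 0.026857, 0.342782]
print(holm_adjust(raw))
print(hochberg_adjust(raw))
```

Small discrepancies with the table (e.g. 0.000156 vs 0.000158) arise because the raw p values shown here are already rounded to six decimals.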
A comparison between MOG3P-MI and some of the most relevant algorithms in MIL, considering 16 different algorithms and eight different data sets, determined that our proposal obtains the best accuracy values among the algorithms in the different domains. The Friedman test determined that for the accuracy, sensitivity and specificity measurements there are significant differences between the algorithms, and the post hoc tests carried out concluded that MOG3P-MI is the most interesting proposal. With respect to accuracy and sensitivity, it obtains the best models; with respect to specificity, this multi-objective technique obtains competitive results, very close to those of the control algorithm, with no significant differences between them. In this way, MOG3P-MI finds a balance between the different measurements: despite obtaining better results for the accuracy and sensitivity measurements, it is capable of obtaining competitive results for the specificity measurement, finally achieving the most accurate models in all data sets. In short, the most relevant result obtained is that the multi-objective technique provides certain benefits for obtaining better solutions to classification problems in MIL.
Although the results obtained are of great interest, we feel that the performance of multi-objective algorithms for solving multi-instance problems could be improved in several ways. On the one hand, it would be interesting to reduce the space dedicated to features, thus facilitating the search for optimal solutions. For this reason, it would be of special interest to study the application of feature selection techniques in this learning framework. Another future line of research would be to study the different measurements needed to select, from the Pareto front, the solution that best guarantees a correct classification.
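That selection step can be illustrated with a small sketch: given candidate classifiers scored on two conflicting objectives to be maximized (e.g. sensitivity and specificity), keep the non-dominated set and pick one solution by a simple balanced criterion. The scores below are hypothetical, not taken from the experiments.

```python
def pareto_front(points):
    """Return the points not dominated by any other point, where
    (a, b) dominates (c, d) if a >= c and b >= d and (a, b) != (c, d)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (sensitivity, specificity) scores for four classifiers:
scores = [(0.90, 0.60), (0.80, 0.80), (0.60, 0.90), (0.70, 0.70)]
front = pareto_front(scores)              # (0.70, 0.70) is dominated
balanced = max(front, key=lambda p: p[0] * p[1])
print(balanced)                           # the product favors (0.8, 0.8)
```

Other scalarizations (e.g. weighted sums, or distance to an ideal point) would pick different compromise solutions from the same front, which is precisely the question the proposed study would address.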
Acknowledgments The authors gratefully acknowledge the financial support provided by the Spanish Department of Research under the TIN2008-06681-C06-03 and P08-TIC-3720 projects and FEDER funds.
References
Andrews S, Tsochantaridis I, Hofmann T (2002) Support vector
machines for multiple-instance learning. In: NIPS’02: proceed-
ings of neural information processing system, Vancouver,
Canada, pp 561–568
Auer P, Ortner R (2004) A boosting approach to multiple instance
learning. In: ECML’04: proceedings of the 5th European
conference on machine learning. Lecture Notes in Computer
Science, vol 3201. Springer, Pisa, pp 63–74
Bojarczuk CC, Lopes HS, Freitas AA (2000) Genetic programming
for knowledge discovery in chest-pain diagnosis. IEEE Eng Med
Biol Mag 19(4):38–44
Chai YM, Yang ZW (2007) A multi-instance learning algorithm
based on normalized radial basis function network. In: ISSN’07:
proceedings of the 4th International Symposium on Neural
Networks. Lecture Notes in Computer Science, vol 4491.
Springer, Nanjing, pp 1162–1172
Chen X, Zhang C, Chen S, Rubin S (2009) A human-centered
multiple instance learning framework for semantic video
retrieval. IEEE Trans Syst Man Cybern Part C Appl Rev
39(2):228–233
Chen Y, Bi J, Wang J (2006) MILES: Multiple-instance learning via
embedded instance selection. IEEE Trans Pattern Anal Mach
Intell 28(12):1931–1947
Chen Y, Wang JZ (2004) Image categorization by learning and
reasoning with regions. J Mach Learn Res 5:913–939
Chevaleyre YZ, Zucker JD (2001) Solving multiple-instance and
multiple-part learning problems with decision trees and decision
rules. Application to the mutagenesis problem. In: AI’01:
proceedings of the 14th of the Canadian society for computa-
tional studies of intelligence. Lecture Notes in Computer
Science, vol 2056. Springer, Ottawa, pp 204–214
Chien BC, Lin JY, Hong TP (2002) Learning discriminant functions
with fuzzy attributes for classification using genetic program-
ming. Expert Syst Appl 23(1):31–37
Coello CA, Lamont GB, Veldhuizen DAV (2007) Evolutionary
algorithms for solving multi-objective problems. Genetic and
evolutionary computation. 2nd edn. Springer, Berlin
Couchet J, Manrique D, Rios J, Rodríguez-Patón A (2006) Crossover operators for grammar-guided genetic programming. Soft Comput A Fusion of Found Methodol Appl 11(10):943–955
Deb K, Agrawal S, Pratap A, Meyarivan T (2000) A fast elitist non-
dominated sorting genetic algorithm for multi-objective optimi-
sation: NSGA-II. In: PPSN VI: proceedings of the 6th interna-
tional conference on parallel problem solving from nature.
Springer, London, pp 849–858
Dehuri S, Cho SB (2008) Multi-objective classification rule mining
using gene expression programming. In: ICCIT ’08: proceedings
of the 3rd international conference on convergence and hybrid
information technology. IEEE Computer Society, Washington,
DC, pp 754–760
Demsar J (2006) Statistical comparisons of classifiers over multiple
data sets. J Mach Learn Res 7:1–30
Dietterich TG, Lathrop RH, Lozano-Perez T (1997) Solving the
multiple instance problem with axis-parallel rectangles. Artif
Intell 89(1–2):31–71
Gao S, Sun Q (2008) Exploiting generalized discriminative multiple
instance learning for multimedia semantic concept detection.
Pattern Recogn 41(10):3214–3223
Garcia S, Fernandez A, Luengo J, Herrera F (2009) A study of
statistical techniques and performance measures for genetics-
based machine learning: accuracy and interpretability. Soft
Comput A Fusion of Found Methodol Appl 13:959–977
Garcia S, Fernandez A, Luengo J, Herrera F (2010) Advanced
nonparametric tests for multiple comparisons in the design of
experiments in computational intelligence and data mining:
experimental analysis of power. Inf Sci 180(10):2044–2064
Gartner T, Flach PA, Kowalczyk A, Smola AJ (2002) Multi-instance
kernels. In: ICML’02: proceedings of the 19th international
conference on machine learning. Morgan Kaufmann, Sydney,
pp 179–186
Gu Z, Mei T, Hua X, Tang J, Wu X (2008) Multi-layer multi-instance
learning for video concept detection. IEEE Trans Multimedia
10(8):1605–1616
Jaszkiewicz A, Kominek P (2003) Genetic local search with distance
preserving recombination operator for a vehicle routing problem.
Eur J Oper Res 151(2):352–364
Keerthi S, Shevade S, Bhattacharyya C, Murthy K (2001) Improvements to Platt’s SMO algorithm for SVM classifier design. Neural Comput 13(3):637–649
Kishore JK, Patnaik LM, Mani V, Agrawal VK (2000) Application of
genetic programming for multicategory pattern classification.
IEEE Trans Evol Comput 4(3):242–258
Knuth DE (1964) Backus normal form vs. Backus Naur form. Commun ACM 7(12):735–736
Liao S, Hsieh C, Lai P (2011) An evolutionary approach for multi-
objective optimization of the integrated location-inventory
distribution network problem in vendor-managed inventory.
Expert Syst Appl 38(6):6768–6776
Lu J, Ma S, Zhang M (2008) Multi-instance clustering approach for
web image using one-class support vector machine. J Comput Inf
Syst 4(3):1231–1240
Mangasarian OL, Wild EW (2008) Multiple instance classification via
successive linear programming. J Optim Theory Appl 137(3):
555–568
Maron O, Lozano-Perez T (1997) A framework for multiple-instance
learning. In: NIPS’97: proceedings of neural information
processing system 10, Denver, CO, USA, pp 570–576
Mugambi EM, Hunter A (2003) Multi-objective genetic programming
optimization of decision trees for classifying medical data. In:
KES’03: knowledge-based intelligent information and engineer-
ing systems, pp 293–299.
Panait L, Luke S (2004) Alternative bloat control methods. In:
GECCO’04: proceedings of the 2004 conference on genetic and
evolutionary computation, Seattle, Washington, USA, pp 630–641
Pang J, Huang Q, Jiang S (2008) Multiple instance boost using graph
embedding based decision stump for pedestrian detection. In:
ECCV’08: proceedings of the 10th European conference on
computer vision. Lectures Note in Computer Science, vol 5305.
Springer, Berlin, pp 541–552
Pao H, Chuang S, Xu Y, Fu H (2008) An EM based multiple instance
learning method for image classification. Expert Syst Appl
35(3):1468–1472
Parrott D, Xiaodong L, Ciesielski V (2005) Multi-objective tech-
niques in genetic programming for evolving classifiers. In: IEEE
congress on evolutionary computation, vol 2, pp 1141–1148
Qian C, Yu Y, Zhou ZH (2011) An analysis on recombination in
multi-objective evolutionary optimization. In: GECCO’11: pro-
ceedings of the 13th ACM conference on genetic and evolu-
tionary computation, Dublin, Ireland, pp 2051–2058
Ramon J, De Raedt L (2000) Multi-instance neural networks. In:
ICML’00: a workshop on attribute-value and relational learning
at the 17th conference on machine learning
Ray S, Page D (2001) Multiple instance regression. In: Proceedings of
the eighteenth international conference on machine learning,
ICML’01, San Francisco, CA, USA, pp 425–432
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy
estimation and model selection. In: IJCAI’95: international
joint conference on artificial intelligence, Montreal, Canada,
pp 1137–1145
Ruffo G (2000) Learning single and multiple instance decision tree
for computer security applications. PhD thesis, Department of
Computer Science. University of Turin, Torino, Italy
Shukla PK, Deb K (2007) On finding multiple pareto-optimal
solutions using classical and evolutionary generating methods.
Eur J Oper Res 181(3):1630–1652
Tan KC, Tay A, Lee TH, Heng CM (2002) Mining multiple
comprehensible classification rules using genetic programming.
In: CEC’02: proceedings of the congress on evolutionary
computation, Honolulu, HI, USA, vol 2, pp 1302–1307
Tan K, Chiam S, Mamun A, Goh C (2009) Balancing exploration and
exploitation with adaptive variation for evolutionary multi-
objective optimization. Eur J Oper Res 197(2):701–713
Tsang CH, Kwong S, Wang H (2007) Genetic-fuzzy rule mining
approach and evaluation of feature selection techniques for
anomaly intrusion detection. Pattern Recogn 40(9):2373–2391
Veldhuizen DV, Lamont G (2000) Multiobjective evolutionary
algorithms: analyzing the state-of-the-art. Evol Comput
8(2):125–147
Ventura S, Romero C, Zafra A, Delgado JA, Hervas C (2007) JCLEC: a Java framework for evolutionary computation. Soft Comput 12(4):381–392
Wang J, Zucker JD (2000) Solving the multiple-instance problem: a
lazy learning approach. In: ICML’00: Proceedings of the 17th
international conference on machine learning, Stanford, CA,
USA, pp 1119–1126
Wee H, Lo C, Hsu P (2009) A multi-objective joint replenishment
inventory model of deteriorated items in a fuzzy environment.
Eur J Oper Res 197(2):620–631
Whigham PA (1995) Grammatically-based genetic programming. In:
Proceedings of the workshop on genetic programming: from
theory to real-world applications, Tahoe City, CA, USA, pp 33–41
Whigham PA (1996) Grammatical bias for evolutionary learning.
PhD thesis, School of Computer Science, University College,
University of New South Wales, Australian Defence Force
Academy, Canberra, Australia
Wiens TS, Dale BC, Boyce MS, Kershaw PG (2008) Three way
k-fold cross-validation of resource selection functions. Ecol
Model 212(3–4):244–255
Witten I, Frank E (2005) Data mining: practical machine learning
tools and techniques. 2nd edn. Morgan Kaufmann, San Francisco
Witten IH, Frank E (1999) Data mining: practical machine learning
tools and techniques with java implementations. Morgan Kauf-
mann, San Francisco
Wu F, Zhou H, Zhao J, Cen K (2011) A comparative study of the
multi-objective optimization algorithms for coal-fired boilers.
Expert Syst Appl 38(6):7179–7185
Xu L, Guo MZ, Zou Q, Liu Y, Li HF (2008) An improved diverse density
algorithm for multiple overlapped instances. In: ICNC’08: Pro-
ceedings of the 4th international conference on natural computa-
tion. IEEE Computer Society, Washington, DC, pp 88–91
Xu X (2003) Statistical learning in multiple instance problems. PhD
thesis, Department of Computer Science. University of Waikato
Xu X, Frank E (2004) Logistic regression and boosting for labeled
bags of instances. In: PAKDD’04: Proceedings of the 8th
Conference of Pacific–Asia. Lecture Notes in Computer Science,
vol 3056. Springer, Sydney, pp 272–281
Yang C, Dong M, Fotouhi F (2005) Region based image annotation
through multiple-instance learning. In: Multimedia’05: proceed-
ings of the 13th annual ACM international conference on
multimedia, New York, USA, pp 435–438
Yang E, Erdogan AT, Arslan T, Barton NH (2011) Multi-objective evolutionary optimizations of a space-based reconfigurable sensor network under hard constraints. Soft Comput 15(1):25–36
Zafra A, Romero C, Ventura S (2011) Multiple instance learning for classifying students in learning management systems. Expert Syst Appl 38(12):15020–15031
Zafra A, Ventura S (2010) G3P-MI: a genetic programming algorithm for multiple instance learning. Inf Sci 180(23):4496–4513
Zafra A, Ventura S, Romero C, Herrera-Viedma E (2009) Multi-instance genetic programming for web index recommendation. Expert Syst Appl 36(9):11470–11479
Zhang D, Wang F, Shi Z, Zhang C (2010) Interactive localized content based image retrieval with multiple-instance active learning. Pattern Recogn 43(2):478–484
Zhang ML, Zhou ZH (2004) Improve multi-instance neural networks through feature selection. Neural Process Lett 19(1):1–10
Zhang ML, Zhou ZH (2005) Ensembles of multi-instance neural networks. In: IIP’04: international conference on intelligent information processing II. IFIP, vol 163, Beijing, China, pp 471–474
Zhang ML, Zhou ZH (2006) Adapting RBF neural networks to multi-instance learning. Neural Process Lett 23(1):1–26
Zhang ML, Zhou ZH (2009) Multi-instance clustering with applications to multi-instance prediction. Appl Intell 31(1):47–68
Zhang Q, Goldman S (2001) EM-DD: an improved multiple-instance learning technique. In: NIPS’01: proceedings of neural information processing system 14, Vancouver, Canada, pp 1073–1080
Zhou ZH, Jiang K, Li M (2005) Multi-instance learning based web mining. Appl Intell 22(2):135–147
Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137(1–2):239–263
Zhou ZH, Zhang ML (2007) Solving multi-instance problems with classifier ensemble based on constructive clustering. Knowl Inf Syst 11(2):155–170
Zitzler E, Laumanns M, Thiele L (2001) SPEA2: improving the strength Pareto evolutionary algorithm. TIK-Report 103, Computer Engineering and Networks Laboratory (TIK), ETH Zurich, Gloriastrasse 35, Zurich