A Multi-agent Evolutionary Algorithm for Software
Module Clustering Problems
Jinhuang Huang, Jing Liu1
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education,
Xidian University, Xi’an 710071, China
Xin Yao
Centre of Excellence for Research in Computational Intelligence and Applications (CERCIA),
School of Computer Science, The University of Birmingham, Birmingham B15 2TT, U.K.
Abstract: The aim of software module clustering problems (SMCPs) is to automatically find a good quality
clustering of software modules based on relationships among modules. In this paper, we propose a
multi-agent evolutionary algorithm to solve this problem, labeled as MAEA-SMCPs. With the intrinsic
properties of SMCPs in mind, three evolutionary operators are designed for agents to realize the purpose of
competition, cooperation and self-learning. In the experiments, practical problems are used to validate the
performance of MAEA-SMCPs. The results show that MAEA-SMCPs can find clusters with high quality
and small deviations. The comparison results also show that MAEA-SMCPs outperforms two existing
multi-objective algorithms, namely MCA and ECA, and two existing single-objective algorithms, namely
GGA and GNE, in terms of MQ.
Keywords: Software module clustering, multi-agent evolutionary algorithm, modularization quality.
1. Introduction
It is generally known that good modularization of software leads to systems that are easier
to design, develop, test, maintain, and evolve [1], [17]. However, achieving a good
modularization becomes difficult for large software systems, especially when they are not
well documented [18], [24]. In general, a software module clustering problem (SMCP) can be
described as the problem of finding a particular set of modules organized into clusters
according to predefined criteria [2]. The problem of finding the best clustering for a given
set of modules is NP-hard [3]; that is, it is
1Corresponding author. For additional information regarding this paper, please contact Jing Liu, e-mail: [email protected],
homepage: http://see.xidian.edu.cn/faculty/liujing/
impractical to exhaustively search for the best clustering; instead, the goodness of a
clustering can only be estimated.
Mancoridis et al., who first suggested the search-based approach to module clustering [4],
formulated the attributes of a good modular decomposition as objectives, whose evaluation as
a "fitness function" guides a search-based optimization algorithm. In that work, they
developed a tool called Bunch for automated software module clustering. Afterwards, guided
by the concept of search-based approaches, many heuristic methods have been proposed to
solve SMCPs, such as hill-climbing techniques, genetic algorithms, simulated annealing,
and so forth [3], [5], [6], [7], [10], [18], [19], [20], [21]. These studies indicate that the
hill-climbing algorithm has performed the best in terms of both solution quality and
execution time [8]. To guide the search process, an objective function is needed to
evaluate candidate clusterings. In the software domain, the Modularization Quality (MQ) [30],
which is based on the trade-off between inter- and intra-connectivity, has been widely used
as the objective function.
Bunch includes a complete automated clustering engine and implements a set of
meta-heuristic clustering algorithms [30]. This approach partitions the system level by level:
starting from the top level, the whole system is partitioned into subsystems, and the
clustered system with the highest quality is produced at the most detailed level. It has
proved effective for large systems [3]. However, the shortcoming of hill-climbing methods is
premature convergence, which leads to local optima [6]. An attempt to overcome this problem
was a multiple hill-climbing approach [6], [9]. Praditwong et al. [10] proposed two
evolutionary algorithms, namely GGA and GNE, to solve SMCPs. They obtained
comparatively good averaged MQ values, but ignored cohesion and coupling, which are
significant measures of a software system. Praditwong et al. [11] also modeled the SMCP
as a multi-objective search problem and used evolutionary algorithms [25] to solve it, namely
MCA and ECA. MCA and ECA consider MQ, intra-edges, and inter-edges among their objectives,
and obtained good results compared with other single-objective algorithms. Besides the MQ
measure, there are many other measures for SMCPs, such as EVM [31]. We choose MQ as
the primary measure in this paper as it is widely used for SMCPs.
In our previous work, a multi-agent genetic algorithm (MAGA) was proposed for
large-scale global numerical optimization [12]. MAGA integrates multi-agent systems with
GAs and can optimize functions with 10,000 dimensions; it was the first GA able to solve
functions of such high dimensionality. It has also been extended to solve constraint
satisfaction problems and combinatorial optimization problems successfully [13], [14]. Since
MAGA showed good performance, we propose a multi-agent evolutionary algorithm to solve
SMCPs, labeled MAEA-SMCPs.
With the intrinsic properties of SMCPs in mind, three evolutionary operators, namely the
neighborhood competition operator, the mutation operator, and the self-learning operator, are
designed. In MAEA-SMCPs, all agents live in a lattice-like environment. During the process
of interacting with the environment and the other agents, each agent increases its energy
(fitness) as much as possible, so that MAEA-SMCPs can find the optima. In the experiments, a
set of problems of varying sizes extracted from real-world software is used to validate the
performance of MAEA-SMCPs. The experimental results show that MAEA-SMCPs performs
well and outperforms other algorithms, such as GGA, GNE, MCA, and ECA, in
terms of MQ.
The rest of this paper is organized as follows. Section 2 describes the SMCPs. Section 3
introduces the MAEA-SMCPs in detail. The experiments are given in Section 4. Finally, the
conclusions are given in Section 5.
2. Software Module Clustering Problems
Clustering characterizes the properties of groups rather than of the individuals within them [10].
Besides, clustering can not only reveal the internal relations and differences among data, but
also provide an important basis for further analysis of the data and the discovery of
knowledge. In recent years, clustering methods have been widely used to assist the analysis
and comprehension of software systems.
In an SMCP, a Module Dependency Graph (MDG) [4], which is a directed graph, is used
as a representation of the problem. In an MDG, a vertex stands for a module (e.g.,
functions, source files) in the software system and a link, or arc, stands for a relation (e.g.,
function calls) between two modules [26], [27], [28], [29]. The MDG file lists the modules
and their relationships in the software system as from-to-weighted information [6].
According to the characterization of edges, MDGs can be categorized into two types: those
with and those without weighted edges. If each edge of the MDG is associated with a
positive number, called a weight, then the MDG is weighted; otherwise, it is unweighted.
Suppose an MDG is labeled as G = (V, E), where V={v1, v2, …, vn} is the set of n modules
and E is the set of links between modules. All modules need to be divided into k
non-overlapping clusters C1, C2, …, Ck; that is, C1∪C2∪...∪Ck=V, Ci≠∅ and Ci∩Cj=∅, i, j=1,
2, …, k, and i≠j. The purpose of SMCPs is to get clusters that are both densely intra-connected
and sparsely inter-connected.
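The partition constraints above (clusters are non-empty, pairwise disjoint, and together cover V) can be checked with a short sketch; the representation of clusters as Python sets and the function name are our own illustrative choices:

```python
def is_valid_partition(modules, clusters):
    """Check that clusters are non-empty, pairwise disjoint, and cover all modules."""
    if any(len(c) == 0 for c in clusters):
        return False                  # C_i must not be empty
    seen = set()
    for c in clusters:
        if seen & c:                  # overlap with an earlier cluster
            return False
        seen |= c
    return seen == set(modules)       # the union must equal V

modules = ["v1", "v2", "v3", "v4"]
print(is_valid_partition(modules, [{"v1", "v2"}, {"v3", "v4"}]))  # True
print(is_valid_partition(modules, [{"v1", "v2"}, {"v2", "v4"}]))  # False: overlap, v3 missing
```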
There are two primitive principles of software design: coupling and cohesion. Coupling is a
measure of the degree to which clusters are related to other clusters, and cohesion denotes the
close relationship between components or modules within the same clusters [15], [21]. The
MDG partitioning, introduced above, tries to produce clusters in which modules are
cohesive within a cluster and loosely connected between clusters [3].
Basic Modularization Quality (Basic MQ) was proposed in Bunch as a fitness function [3].
The Basic MQ is a metric which rewards maximizing intra-edges and minimizing
inter-edges, where an intra-edge is a link between modules in the same cluster and an
inter-edge is a link between a module in one cluster and a module in another cluster. MQ is
the sum over clusters of the ratio of intra-edges to inter-edges, called the Modularization
Factor (MFl) for cluster Cl. MFl can be defined as follows:

    MFl = 0,               if i = 0
    MFl = i / (i + j/2),   if i > 0                                  (1)

where i is the number of intra-edges and j is the number of inter-edges for an unweighted
problem [6]. For a weighted MDG, i and j are the sums of the weights of intra-edges and
inter-edges, respectively.
The MQ can be calculated in terms of MF as

    MQ = ∑_{l=1}^{k} MFl                                             (2)
The goal of MQ is to limit excessive coupling, but not to eliminate coupling altogether. This
implies that we should not simply pursue high cohesion and neglect proper coupling. Thus,
MQ attempts to make a trade-off between coupling (inter-edges) and cohesion (intra-edges)
by combining them into a single measurement. The aim is to reward increased cohesion
with a higher MQ score and to punish increased coupling with a lower MQ score [11].
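As a concrete illustration of Eqs. (1) and (2), the sketch below computes MQ for an unweighted MDG given as a list of directed edges; the function name and the edge-list input format are our own illustrative choices:

```python
def modularization_quality(edges, labels):
    """MQ = sum over clusters of MF_l = i / (i + j/2), where i counts
    the intra-edges of cluster l and j its inter-edges (Eqs. 1-2)."""
    intra, inter = {}, {}
    for u, v in edges:
        cu, cv = labels[u], labels[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            # an inter-edge counts toward both endpoint clusters
            inter[cu] = inter.get(cu, 0) + 1
            inter[cv] = inter.get(cv, 0) + 1
    mq = 0.0
    for c in set(labels.values()):
        i, j = intra.get(c, 0), inter.get(c, 0)
        if i > 0:                       # MF_l = 0 when i = 0
            mq += i / (i + j / 2)
    return mq

edges = [("a", "b"), ("b", "a"), ("c", "d"), ("b", "c")]
labels = {"a": 1, "b": 1, "c": 2, "d": 2}
print(round(modularization_quality(edges, labels), 3))  # 1.467 (= 0.8 + 2/3)
```

Here cluster 1 has i = 2, j = 1 (MF = 0.8) and cluster 2 has i = 1, j = 1 (MF = 2/3), so MQ ≈ 1.467.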
3. Multi-agent Evolutionary Algorithm for SMCPs
3.1 Definition of software module clustering agents
Given an MDG with n modules, an arbitrary partition of this MDG can be represented as a
character string X={x1, x2, ..., xn}, where xi is the cluster identifier of module i, which
can be represented by an integer. For a pair of modules i and j in X, if xi=xj, modules i
and j are in the same cluster; otherwise, they are in different ones. Thus, an agent for
SMCPs can be defined as follows:
Definition 1: An agent is a character string X={x1, x2, ..., xn} representing a
candidate partition for an MDG. Its energy is equal to the value of the following
objective function,

    Energy(X) = MQ(X)                                                (3)

The purpose of each agent is to increase its energy as much as possible.
Definition 2: All agents live in a lattice-like environment, L, which is called an agent lattice.
The size of L is Lsize×Lsize, where Lsize is an integer. Each agent is fixed on a lattice-point and
can only interact with its neighbors. Suppose that the agent located at (i, j) is represented as
Li,j, i, j=1, 2, …, Lsize; then the neighbors of Li,j, Neighborsi,j, are defined as follows:

    Neighborsi,j = { Li′,j , Li,j′ , Li″,j , Li,j″ }                  (4)

where
    i′ = i−1 if i ≠ 1,      i′ = Lsize if i = 1;
    j′ = j−1 if j ≠ 1,      j′ = Lsize if j = 1;
    i″ = i+1 if i ≠ Lsize,  i″ = 1 if i = Lsize;
    j″ = j+1 if j ≠ Lsize,  j″ = 1 if j = Lsize.
For example, the agent lattice can be depicted as in Fig. 1. Each circle stands for an
agent, any two connected agents can interact with each other, and the pair of numbers in each
circle denotes its position in the lattice.
Fig.1. The model of the agent lattice.
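The wrap-around indexing of Eq. (4) can be expressed compactly; in the sketch below, lattice coordinates are 1-based as in the paper, and the helper name is our own:

```python
def neighbors(i, j, L_size):
    """Return the four toroidal neighbors (left, up, right, down) of agent (i, j)
    on an L_size x L_size lattice with 1-based coordinates, following Eq. (4)."""
    i_prev = i - 1 if i != 1 else L_size      # i'
    j_prev = j - 1 if j != 1 else L_size      # j'
    i_next = i + 1 if i != L_size else 1      # i''
    j_next = j + 1 if j != L_size else 1      # j''
    return [(i_prev, j), (i, j_prev), (i_next, j), (i, j_next)]

print(neighbors(1, 1, 8))  # [(8, 1), (1, 8), (2, 1), (1, 2)]
```

Corner agents thus wrap around to the opposite side of the lattice, so every agent has exactly four neighbors.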
3.2 Evolutionary Operators for Agents
According to [12], [13], each agent has certain behaviors. In addition to the aforementioned
behaviors of competition and cooperation, each agent can also increase its energy by using its
knowledge. On the basis of such behaviors, we design three evolutionary operators for agents.
The neighborhood competition crossover operator realizes the behavior of competition, while
the mutation operator and the self-learning operator realize the behavior of making use of
knowledge. Suppose that the three operators are performed on the agent located at (i, j), Li,j =
(l1, l2, …, ln), and let Maxi,j=(m1, m2, …, mn) be the agent with maximum energy among the
neighbors of Li,j; namely, Maxi,j∈Neighborsi,j and ∀a∈Neighborsi,j,
Energy(a)≤Energy(Maxi,j).
Neighborhood competition crossover operator: If Li,j satisfies (5), it is a winner; otherwise,
it is a loser.

    Energy(Li,j) > Energy(Maxi,j)                                    (5)

If Li,j is a winner, it can still live in the agent lattice. If Li,j is a loser, it must die, and its
lattice-point will be occupied by a new agent generated from Maxi,j. The strategy we use to
generate the occupying agent is the one-way crossover operator introduced in [16]. The
operation is conducted on Li,j and Maxi,j: Li,j is chosen as the source chromosome while
Maxi,j is the destination. A node is first selected at random, and the cluster identifier of this
node in Li,j is determined. Then, this identifier is assigned, in Maxi,j, to all nodes belonging
to the same cluster as the selected node in Li,j. In this way, information about the cluster
structure of Li,j is transferred to Maxi,j. An example of the one-way crossover is given in
Table 1, where node 3 is selected. The cluster label of node 3 in Li,j is 2, which is also the
cluster label of node 1. Then, the cluster label of node 1 in Maxi,j is changed to 2.
Table 1
An example of the one-way crossover with node 3 selected.

V            | Li,j (source) | Maxi,j (destination) | Maxi,j (new)
1            | 2             | 1                    | 2 (changed)
2            | 1             | 3                    | 3
3 (selected) | 2             | 2                    | 2
4            | 3             | 5                    | 5
5            | 1             | 3                    | 3

*"(changed)" marks the only modified position.
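The crossover step illustrated in Table 1 can be sketched as follows on the integer encodings; the function name and the optional fixed node choice (for reproducibility) are our own:

```python
import random

def one_way_crossover(source, destination, node=None):
    """One-way crossover [16]: copy the cluster label of a randomly chosen
    node from the source agent into the destination agent, for every node
    sharing that node's cluster in the source."""
    n = len(source)
    if node is None:
        node = random.randrange(n)    # pick a node at random
    label = source[node]
    new_dest = list(destination)
    for k in range(n):
        if source[k] == label:        # same cluster as the selected node in source
            new_dest[k] = label
    return new_dest

# The example of Table 1: node 3 (index 2) is selected.
src = [2, 1, 2, 3, 1]   # L_ij
dst = [1, 3, 2, 5, 3]   # Max_ij
print(one_way_crossover(src, dst, node=2))  # [2, 3, 2, 5, 3]
```

Nodes 1 and 3 share cluster 2 in the source, so only node 1's label changes in the destination, matching Table 1.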
Mutation operator: In order to make use of the information of a node and the neighbors to
which it links, a new mutation operator, namely neighbor-based mutation, is designed. It
works as follows: a node is selected randomly from Li,j, and its cluster label is then changed
to the label of one of the nodes it links to.
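The neighbor-based mutation can be sketched as below; the adjacency-list input format and all names are our own illustrative choices:

```python
import random

def neighbor_mutation(labels, adjacency, rng=random):
    """Neighbor-based mutation: pick a random node and set its cluster label
    to the label of one of the nodes it links to in the MDG."""
    labels = list(labels)                     # do not modify the caller's agent
    node = rng.randrange(len(labels))
    if adjacency[node]:                       # skip isolated nodes
        neighbor = rng.choice(adjacency[node])
        labels[node] = labels[neighbor]
    return labels

adjacency = [[1], [0, 2], [2, 1, 3][:2], [2]]  # path graph 0-1-2-3 (adjacency lists)
print(neighbor_mutation([1, 1, 2, 2], [[1], [0, 2], [1, 3], [2]]))
```

Because the new label always comes from a linked node, the mutation biases the search toward placing connected modules in the same cluster.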
Self-learning operator: According to our experience, integrating local search with EAs
can improve the performance on numerical optimization problems. There are several ways to
realize the local search. Inspired by MAGA [12], we propose the self-learning operator,
which realizes the behavior of using knowledge by running a small-scale MAEA as described
below. To distinguish them from the other parameters in MAEA, all symbols of the
parameters in this operator begin with an 's'.

In the self-learning operator, first, a new agent lattice, sL, with sLsize×sLsize agents, is
generated as follows: the best 10% of agents in the current population are randomly placed
into this lattice, and the other agents are initialized randomly. Second, the neighborhood
competition crossover operator and the mutation operator are iteratively performed on sL.
Finally, the agent with the maximum energy found during this process is returned.
Algorithm 1 summarizes this operator in detail.
Algorithm 1: Self-learning operator
Input: sL^k represents the agent lattice in the kth generation, and sL^(k+1/2) is the mid-lattice between
    sL^k and sL^(k+1). sBest^k is the best agent among sL^0, sL^1, …, sL^k, and sCBest^k is the best
    agent in sL^k. sPc and sPm are the probabilities to perform the neighborhood competition crossover
    operator and the mutation operator, respectively, and sGen is the number of generations.
Output: Li,j.
1.  begin
2.    Randomly put the best 10% of agents in the current population into sL^0, and generate the other
      agents in sL^0 randomly; update sBest^0, and k←0;
3.    Calculate the energy of each agent in sL^0, Energy(X)=MQ(X);
4.    repeat
5.      sL^(k+1/2) ← NeighborhoodCompetition(sL^k, sPc);
6.      sL^(k+1) ← NeighborMutation(sL^(k+1/2), sPm);
7.      if Energy(sCBest^(k+1)) > Energy(sBest^k),
8.        sBest^(k+1) ← sCBest^(k+1);
9.      else
10.       sBest^(k+1) ← sBest^k, sCBest^(k+1) ← sBest^k;
11.     k ← k+1;
12.   until the stop criterion (k ≥ sGen) is reached;
13.   Li,j ← sBest^k;
14. end
3.3 Implementation of MAEA-SMCPs
In MAEA-SMCPs, the neighborhood competition operator is performed on each agent.
Consequently, the agents with low energy are cleaned out from the agent lattice so that there
is more developing space for the agents with high energy [12]. The mutation operator is
performed on each agent with probability Pm. The self-learning operator is performed on the
best 10% agents in each generation. Generally, the three operators employ different ways to
simulate the behaviors of agents and do performance to the results, respectively. Algorithm 2
summarizes MAEA-SMCPs in detail.
Algorithm 2: MAEA-SMCPs
Input: L^r represents the agent lattice in the rth generation, and L^(r+1/2) is the mid-lattice between
    L^r and L^(r+1). Best^r is the best agent among L^0, L^1, …, L^r, and CBest^r is the best agent in
    L^r. Pc and Pm are the probabilities to perform the neighborhood competition crossover operator and
    the mutation operator, respectively.
Output: C={C1, C2, ..., Ck}: the partition with the best objective function value found.
1.  begin
2.    Initialization(L^0), update Best^0, and r←0;
3.    Calculate the energy of each agent in L^0, Energy(X)=MQ(X);
4.    repeat
5.      L^(r+1/2) ← NeighborhoodCompetition(L^r, Pc);
6.      L^(r+1) ← NeighborMutation(L^(r+1/2), Pm);
7.      Find the best 10% of agents in L^(r+1), and perform the self-learning operator on them;
8.      if Energy(CBest^(r+1)) > Energy(Best^r),
9.        Best^(r+1) ← CBest^(r+1);
10.     else
11.       Best^(r+1) ← Best^r, CBest^(r+1) ← Best^r;
12.     r ← r+1;
13.   until the stop criteria are reached;
14.   C ← Best^r;
15. end
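The generational loop of Algorithm 2 can be sketched as a higher-order skeleton; the three operators are passed in as callables, elitism is simplified to tracking the best agent ever seen, and every name here is illustrative rather than the paper's implementation:

```python
def maea_smcp(init_lattice, energy, compete, mutate, self_learn, max_gen):
    """Skeleton of the MAEA-SMCPs main loop: iterate the three operators on
    the agent lattice while keeping the best agent ever seen (elitism)."""
    lattice = init_lattice()
    best = max(lattice, key=energy)
    for _ in range(max_gen):
        lattice = compete(lattice)      # neighborhood competition crossover
        lattice = mutate(lattice)       # neighbor-based mutation
        lattice = self_learn(lattice)   # self-learning (applied to the best 10% inside)
        current_best = max(lattice, key=energy)
        if energy(current_best) > energy(best):
            best = current_best
    return best
```

With identity operators and `sum` as a stand-in energy, the skeleton simply returns the best initial agent, which makes the elitism logic easy to unit-test before plugging in real operators.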
3.4 Computational complexity analysis
Here, we give a brief analysis on the computational complexity of MAEA-SMCPs. n and
m are used to denote the number of modules and the number of edges, respectively. To
calculate MQ, the main computational cost lies in calculating the number of inter-edges and
intra-edges. First, all modules are stored using an adjacency list, and the neighbor modules of
each module are stored in order. Then, assume that we have an array C[k], where k is the
number of clusters. Besides, For two connected modules u and v, where u belongs to cluster-i
and v belongs to cluster-j, C[i] plus one if i equals j; otherwise, C[i] plus one and C[j] plus one.
This process should be iterated for m times due to the number of edges of the system is m. In
order to calculate MQ, the array C should be traversed. Thus, the time complexity of
calculating MQ is O(m).
For MAEA-SMCPs, the main computational cost lies in Steps 3-12, the process of
searching for the best partition. The computational complexity of Step 4 is O(Lsize·Lsize·n), and
that of Step 5 is O(Lsize·Lsize·n²), where Lsize is the size of the agent lattice L. Since the energy
of each agent must be calculated in Step 7, the computational complexity of Step 7 is
O(Lsize·Lsize·m). During the self-learning process, the worst-case time complexity is
O(sLsize·sLsize·n) + O(sLsize·sLsize·n²) + O(sLsize·sLsize·m). Therefore, the worst-case time
complexity of MAEA-SMCPs can be simplified as O(maxgen·Lsize·Lsize·(n² + m)), where
maxgen is the number of generations.
4. Experiments
In this section, 17 real-world problems [3], [6] are employed to validate the performance
of MAEA-SMCPs. Descriptions of these problems are given in Table 2. In the
following experiments, the parameters Pc, Pm, and Lsize are set to 0.5, 0.05, and 8, respectively.
Here, an experiment on a representative software system, "mtunis", is first conducted as a
case study, and then the performance of the five algorithms in optimizing MQ is compared on
the 17 problems.
Table 2
Descriptions of Testing Problems.

Name    | Description                                                  | #Modules | #Links
mtunis  | Turing operating system for educational purposes.            | 20       | 57
ispell  | Spelling and typographical error correction software.        | 24       | 103
rcs     | System used to manage multiple revisions of files.           | 29       | 163
bison   | Parser generator for converting grammar descriptions into C. | 37       | 179
grappa  | Genome rearrangements analyzer.                              | 86       | 295
bunch   | Software clustering tool (essential Java classes only).      | 116      | 365
incl    | Graph drawing tool.                                          | 174      | 360
icecast | Streaming MP3 audio codec media server.                      | 60       | 650
gnupg   | Complete implementation of the OpenPGP standard.             | 88       | 601
inn     | Unix news group software.                                    | 90       | 624
bitchx  | Open source IRC client.                                      | 97       | 1653
xntp    | Time synchronization tool.                                   | 111      | 729
exim    | Message transfer agent for Unix systems.                     | 118      | 1225
mod_ssl | Apache SSL/TLS interface.                                    | 135      | 1095
ncurses | Display and update software for text-only terminals.         | 138      | 682
lynx    | Web browser for UNIX and VMS platforms.                      | 148      | 1745
nmh     | Mail client software.                                        | 198      | 3262
4.1 Case study
Here, we first use the data of "mtunis" (Mini-Tunis) [23], an operating system for
educational purposes, as an example to explain the results in detail. Since "mtunis" is written
in the Turing language [22], each module stands for a Turing module and each edge stands
for an import relationship between two modules. Fig. 2(a) shows the structure of "mtunis" as
depicted in the design documentation, which consists of 20 modules and 57 edges. Fig. 2(b)
shows the partition obtained by optimizing the basic MQ using MAEA-SMCPs.
In Fig.2(a), 6 different module clusters represent 6 major subsystems: red modules
constitute MAIN, sky blue modules constitute COMPUTER, dark blue and green modules
constitute FILESYSTEM (dark blue for FILE and green for INODE), pink modules constitute
DEVICE and yellow modules constitute GLOBALS.
Fig. 2(b) shows the partition for "mtunis" obtained automatically by optimizing the basic MQ
using MAEA-SMCPs. From Fig. 2(b), we can see that the obtained partition is quite similar to
the partition in the design documentation. The main difference between Fig. 2(a) and Fig. 2(b)
lies in the confusion of FILE and INODE. For example, the module "Inode", which belongs
to INODE, is falsely placed in FILE. The cause of this misplacement is that the module
"Inode", as an interface module, is highly connected with FILE. Similarly, the modules
"System" and "Panic", which belong to GLOBALS (the cluster with the yellow modules), are
also falsely assigned to other subsystems. The comparison between Fig. 2(a) and Fig. 2(b)
shows that the partitions obtained by MAEA-SMCPs still have room for improvement,
although they are quite similar.
Fig.2. Two partitions of mtunis. (a) Structure of mtunis as depicted in the design
documentation; (b) obtained partition by optimizing the basic MQ using MAEA-SMCPs.
4.2 Results and discussions
Tables 3-6 report the averaged results of MAEA-SMCPs over 30 independent runs in terms
of MQ, intra-edges and inter-edges. Since GGA, GNE, MCA and ECA obtained good
performance, a comparison between MAEA-SMCPs and these four algorithms are also
conducted in Table 3-6. Among them, Table 4 reports the percentage that the new method
outperforms as per the results obtained.
As can be seen from Tables 3 and 4, in terms of averaged MQ, MAEA-SMCPs outperforms
GGA and GNE on 6 out of the 7 unweighted problems. On the weighted problems,
compared with GGA, except for the results on "inn", "mod_ssl", and "ncurses", the mean
values of MQ obtained by MAEA-SMCPs are better than those of GGA; moreover,
MAEA-SMCPs outperforms GNE on all weighted problems. The standard deviations obtained
by MAEA-SMCPs are also smaller than those of GGA and GNE on all problems, which
indicates that MAEA-SMCPs is more stable than GGA and GNE.

Compared with the multi-objective algorithms, MAEA-SMCPs outperforms MCA and ECA on
most of the 17 problems in terms of MQ. Compared with MCA, the mean values of MQ
obtained by MAEA-SMCPs are better than those of MCA, and the standard deviations
obtained by MAEA-SMCPs are also smaller, which indicates that MAEA-SMCPs is more
stable than MCA. Meanwhile, compared with ECA, except for the results on "rcs", "grappa",
and "inn", the mean values of MQ obtained by MAEA-SMCPs are also better than those of
ECA, and the standard deviations obtained by MAEA-SMCPs are also smaller, which
indicates that MAEA-SMCPs is also more stable than ECA.
Table 3
Comparison among 5 algorithms in terms of the averaged MQ (Mean ± STD).

Name    | MAEA-SMCPs     | GGA            | GNE            | MCA            | ECA
Unweighted:
mtunis  | 2.314 ± 0.000  | 2.231 ± 0.051  | 2.289 ± 0.026  | 2.294 ± 0.013  | 2.314 ± 0.000
ispell  | 2.353 ± 0.000  | 2.338 ± 0.016  | 2.346 ± 0.014  | 2.269 ± 0.043  | 2.339 ± 0.022
rcs     | 2.228 ± 0.001  | 2.232 ± 0.029  | 2.263 ± 0.020  | 2.145 ± 0.034  | 2.239 ± 0.022
bison   | 2.659 ± 0.001  | 2.231 ± 0.051  | 2.289 ± 0.026  | 2.416 ± 0.038  | 2.648 ± 0.029
grappa  | 12.495 ± 0.005 | 12.454 ± 0.144 | 10.828 ± 0.097 | 11.586 ± 0.106 | 12.578 ± 0.053
bunch   | 13.559 ± 0.024 | 12.938 ± 0.140 | 8.991 ± 0.100  | 12.145 ± 0.225 | 13.455 ± 0.088
incl    | 13.569 ± 0.016 | 13.139 ± 0.170 | 8.143 ± 0.089  | 11.811 ± 0.351 | 13.511 ± 0.059
Weighted:
icecast | 2.711 ± 0.000  | 2.665 ± 0.042  | 2.668 ± 0.016  | 2.401 ± 0.057  | 2.654 ± 0.039
gnupg   | 7.004 ± 0.003  | 6.874 ± 0.094  | 6.072 ± 0.046  | 6.259 ± 0.072  | 6.905 ± 0.055
inn     | 7.797 ± 0.002  | 7.911 ± 0.037  | 6.296 ± 0.057  | 7.421 ± 0.077  | 7.876 ± 0.046
bitchx  | 4.307 ± 0.000  | 4.252 ± 0.043  | 3.565 ± 0.032  | 3.572 ± 0.055  | 4.267 ± 0.027
xntp    | 8.175 ± 0.004  | 8.111 ± 0.083  | 6.029 ± 0.065  | 6.482 ± 0.110  | 8.168 ± 0.076
exim    | 6.440 ± 0.001  | 6.251 ± 0.086  | 4.848 ± 0.078  | 5.316 ± 0.132  | 6.361 ± 0.084
mod_ssl | 9.767 ± 0.003  | 9.831 ± 0.072  | 6.860 ± 0.085  | 8.832 ± 0.097  | 9.749 ± 0.071
ncurses | 11.410 ± 0.000 | 11.452 ± 0.141 | 7.439 ± 0.103  | 10.211 ± 0.145 | 11.297 ± 0.133
lynx    | 4.921 ± 0.003  | 4.521 ± 0.073  | 3.037 ± 0.031  | 3.447 ± 0.086  | 4.694 ± 0.060
nmh     | 8.973 ± 0.030  | 8.770 ± 0.094  | 4.576 ± 0.056  | 6.671 ± 0.177  | 8.592 ± 0.148

*Bold values indicate the best values among the five algorithms.
As can be seen from Tables 3 and 4, the results of MAEA-SMCPs are similar to those of
ECA, so we further analyze MAEA-SMCPs and ECA in terms of MQ using a t-test in Table 5.
The results show that MAEA-SMCPs significantly outperforms ECA on 12 out of the 17
problems.
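The two-sample t statistic can be computed directly from the reported means and standard deviations over the 30 runs; the sketch below assumes the unpooled (Welch-type) form, since the paper does not state which variant it uses:

```python
import math

def t_statistic(mean1, std1, mean2, std2, n=30):
    """Unpooled two-sample t statistic for two groups of equal size n."""
    se = math.sqrt(std1 ** 2 / n + std2 ** 2 / n)   # standard error of the difference
    return (mean1 - mean2) / se if se > 0 else 0.0

# mtunis: identical means and zero deviations give t = 0, as in Table 5.
print(t_statistic(2.314, 0.000, 2.314, 0.000))  # 0.0
```

For the other rows, the exact t values depend on the variant and rounding used by the authors, so this sketch is only meant to illustrate the form of the test.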
As is well known, a better software system tends to have higher cohesion and lower coupling.
Cohesion is measured by the number of intra-edges in the modularization (the edges that lie
inside a cluster), while coupling is measured by the number of inter-edges (the edges that
connect two clusters) [11]. We therefore further compare intra-edges and inter-edges between
MAEA-SMCPs and the two multi-objective algorithms (MCA and ECA) in Table 6. As can
be seen, the results provide strong evidence that MAEA-SMCPs outperforms MCA on both
unweighted and weighted graphs; that is, MAEA-SMCPs outperforms MCA on all but 5 of
the 17 problems studied. The comparison of MAEA-SMCPs and ECA is somewhat
inconclusive for unweighted graphs, with each approach able to outperform the other in some
cases. However, for weighted MDGs, the ECA approach outperforms MAEA-SMCPs on all
but one of the 10 problems.
Table 4
Comparison between MAEA-SMCPs and GNE, and between MAEA-SMCPs and ECA, in terms of the averaged MQ (Mean ± STD).

Name    | MAEA-SMCPs     | GNE            | ECA            | Percentage (MAEA/GNE) | Percentage (MAEA/ECA)
Unweighted:
mtunis  | 2.314 ± 0.000  | 2.289 ± 0.026  | 2.314 ± 0.000  | 1.118%  | 0%
ispell  | 2.353 ± 0.000  | 2.346 ± 0.014  | 2.339 ± 0.022  | 0.286%  | 0.589%
rcs     | 2.228 ± 0.001  | 2.263 ± 0.020  | 2.239 ± 0.022  | -1.385% | -0.437%
bison   | 2.659 ± 0.001  | 2.289 ± 0.026  | 2.648 ± 0.029  | 1.385%  | 0.418%
grappa  | 12.495 ± 0.005 | 10.828 ± 0.097 | 12.578 ± 0.053 | 13.336% | -0.626%
bunch   | 13.559 ± 0.024 | 8.991 ± 0.100  | 13.455 ± 0.088 | 32.327% | 0.701%
incl    | 13.569 ± 0.016 | 8.143 ± 0.089  | 13.511 ± 0.059 | 38.142% | 0.491%
Weighted:
icecast | 2.711 ± 0.000  | 2.668 ± 0.016  | 2.654 ± 0.039  | 1.530%  | 2.254%
gnupg   | 7.004 ± 0.003  | 6.072 ± 0.046  | 6.905 ± 0.055  | 13.618% | 0.103%
inn     | 7.797 ± 0.002  | 6.296 ± 0.057  | 7.876 ± 0.046  | 17.433% | -1.132%
bitchx  | 4.307 ± 0.000  | 3.565 ± 0.032  | 4.267 ± 0.027  | 16.717% | 0.920%
xntp    | 8.175 ± 0.004  | 6.029 ± 0.065  | 8.168 ± 0.076  | 34.756% | 0.008%
exim    | 6.440 ± 0.001  | 4.848 ± 0.078  | 6.361 ± 0.084  | 24.904% | 1.267%
mod_ssl | 9.767 ± 0.003  | 6.860 ± 0.085  | 9.749 ± 0.071  | 27.706% | 0.239%
ncurses | 11.410 ± 0.000 | 7.439 ± 0.103  | 11.297 ± 0.133 | 35.547% | 1.035%
lynx    | 4.921 ± 0.003  | 3.037 ± 0.031  | 4.694 ± 0.060  | 36.496% | 4.283%
nmh     | 8.973 ± 0.030  | 4.576 ± 0.056  | 8.592 ± 0.148  | 48.256% | 4.468%
According to [11], MCA and ECA are two multi-objective approaches, each with 5
objectives. MCA chooses the sum of intra-edges of all clusters, the sum of inter-edges of
all clusters, the number of clusters, MQ, and the number of isolated clusters as the 5
objectives to be optimized. ECA instead uses the difference between the maximum and
minimum number of modules in a cluster in place of the fifth objective of MCA, the
number of isolated clusters. Compared to MCA, ECA prefers solutions with a smaller
difference between the maximum and minimum number of modules in a cluster, which
accounts for the relatively homogeneous structures of its clusterings. In MAEA-SMCPs,
however, MQ is the only energy function considered, which tends to pursue the
solution with the highest MQ value while ignoring the intra-edges and inter-edges.
Taking the problem "ispell" as an example, we choose one typical result each of
MAEA-SMCPs and ECA from the 30 runs. The clustering results are shown in
Figs. 3 and 4.
Table 5
t-test over MAEA-SMCPs and ECA in terms of the averaged MQ (Mean ± STD).

Name    | MAEA-SMCPs     | ECA            | t-test
Unweighted:
mtunis  | 2.314 ± 0.000  | 2.314 ± 0.000  | 0
ispell  | 2.353 ± 0.000  | 2.339 ± 0.022  | 5.787
rcs     | 2.228 ± 0.001  | 2.239 ± 0.022  | -0.750
bison   | 2.659 ± 0.001  | 2.648 ± 0.029  | 5.244
grappa  | 12.495 ± 0.005 | 12.578 ± 0.053 | -5.802
bunch   | 13.559 ± 0.024 | 13.455 ± 0.088 | 10.213
incl    | 13.569 ± 0.016 | 13.511 ± 0.059 | 12.876
Weighted:
icecast | 2.711 ± 0.000  | 2.654 ± 0.039  | 9.711
gnupg   | 7.004 ± 0.003  | 6.905 ± 0.055  | 10.109
inn     | 7.797 ± 0.002  | 7.876 ± 0.046  | -13.540
bitchx  | 4.307 ± 0.000  | 4.267 ± 0.027  | 26.376
xntp    | 8.175 ± 0.004  | 8.168 ± 0.076  | 0.596
exim    | 6.440 ± 0.001  | 6.361 ± 0.084  | 8.508
mod_ssl | 9.767 ± 0.003  | 9.749 ± 0.071  | 3.613
ncurses | 11.410 ± 0.000 | 11.297 ± 0.133 | 9.749
lynx    | 4.921 ± 0.003  | 4.694 ± 0.060  | 24.646
nmh     | 8.973 ± 0.030  | 8.592 ± 0.148  | 4.205

*Numbers in bold are significant at the 95 percent level.
As can be seen, each dashed box denotes a cluster of the network. A blue box denotes a
cluster containing the same nodes in both the MAEA-SMCPs and ECA results, while a red
box denotes a cluster containing different nodes in the two algorithms. From the figures, it
is easy to see that the differences between the two clustering structures lie in the locations of
nodes "6" and "18". In Fig. 3, node "6" is in the left red box and node "18" is in the right red
box, while in Fig. 4, node "6" is in the right red box and node "18" is in the middle red box.
The characteristics of the clustering structure found by MAEA-SMCPs are: MQ=2.353,
intra-edges=32, inter-edges=142, and 7 clusters; the characteristics of the clustering
structure found by ECA are: MQ=2.329, intra-edges=34, inter-edges=138, and 7 clusters.
Obviously, MAEA-SMCPs outperforms ECA in terms of the MQ value, while ECA
beats MAEA-SMCPs in terms of intra-edges and inter-edges. This suggests that the ECA
approach tries to obtain a good MQ value together with high intra-edges and low inter-edges,
while MAEA-SMCPs considers only the MQ value, which may yield lower intra-edges and
higher inter-edges. From the results in Table 6 we can see that, although MAEA-SMCPs is a
single-objective approach, it outperforms MCA on most of the 17 problems, which indicates
that MAEA-SMCPs handles isolated clusters well and has a better optimizing ability than
MCA. Although MAEA-SMCPs is outperformed by ECA in terms of intra-edges and
inter-edges on some problems, its better MQ values indicate that MAEA-SMCPs still has the
potential to improve the optimization results through the optimization of the energy function.
Table 6
Comparison between MAEA-SMCPs and MCA, ECA in terms of Intra-edges and Inter-edges (Mean ± STD).

Intra-edges:
Name    | MAEA-SMCPs        | MCA                | ECA
mtunis  | 27.000 ± 0.000    | 24.633 ± 2.092     | 27.000 ± 0.000
ispell  | 32.000 ± 0.000    | 23.100 ± 3.220     | 30.033 ± 2.798
rcs     | 41.233 ± 0.142    | 45.133 ± 15.335    | 47.567 ± 7.859
bison   | 45.200 ± 0.223    | 40.367 ± 8.231     | 52.800 ± 6.217
grappa  | 99.400 ± 0.300    | 84.767 ± 11.190    | 101.167 ± 8.301
bunch   | 101.433 ± 0.333   | 73.567 ± 8.324     | 111.700 ± 5.305
incl    | 141.667 ± 1.649   | 91.767 ± 14.024    | 140.200 ± 3.836
icecast | 1380.000 ± 0.000  | 1609.900 ± 294.921 | 1643.167 ± 208.189
gnupg   | 1242.133 ± 1.133  | 1104.733 ± 167.834 | 1494.167 ± 103.830
inn     | 1005.600 ± 16.300 | 771.633 ± 162.630  | 1336.900 ± 190.263
bitchx  | 7469.000 ± 0.000  | 7644.633 ± 2703.349 | 7840.600 ± 633.068
xntp    | 1123.433 ± 0.667  | 733.800 ± 109.722  | 1117.967 ± 54.502
exim    | 2847.167 ± 0.969  | 3279.300 ± 563.781 | 3146.567 ± 525.155
mod_ssl | 2425.833 ± 1.833  | 2911.733 ± 310.981 | 3476.800 ± 244.174
ncurses | 713.000 ± 0.000   | 574.433 ± 94.392   | 806.367 ± 57.515
lynx    | 2885.500 ± 0.025  | 2428.567 ± 863.007 | 3730.633 ± 478.016
nmh     | 2327.300 ± 20.500 | 2032.267 ± 438.220 | 2704.600 ± 236.782

Inter-edges:
Name    | MAEA-SMCPs         | MCA                 | ECA
mtunis  | 60.000 ± 0.000     | 64.733 ± 4.185      | 60.000 ± 0.000
ispell  | 142.000 ± 0.000    | 159.800 ± 6.440     | 145.933 ± 5.595
rcs     | 243.533 ± 0.285    | 235.733 ± 30.669    | 230.867 ± 15.719
bison   | 267.600 ± 0.446    | 277.267 ± 16.463    | 252.400 ± 12.434
grappa  | 391.200 ± 0.600    | 420.467 ± 22.380    | 387.667 ± 16.601
bunch   | 525.133 ± 0.667    | 580.867 ± 16.648    | 504.600 ± 10.611
incl    | 436.667 ± 3.298    | 536.467 ± 28.048    | 439.600 ± 7.673
icecast | 8096.000 ± 0.000   | 7636.200 ± 589.843  | 7569.670 ± 416.378
gnupg   | 4916.667 ± 2.267   | 5192.530 ± 335.669  | 4413.670 ± 207.660
inn     | 5708.800 ± 32.600  | 6176.730 ± 325.260  | 5046.200 ± 380.526
bitchx  | 36290.000 ± 0.000  | 35938.700 ± 5406.697 | 35546.800 ± 1266.136
xntp    | 3681.133 ± 1.333   | 4460.400 ± 219.445  | 3692.070 ± 109.004
exim    | 14759.667 ± 1.938  | 12347.400 ± 1127.563 | 12612.900 ± 1050.310
mod_ssl | 14110.333 ± 3.667  | 12138.500 ± 621.962 | 11008.400 ± 488.348
ncurses | 2794.000 ± 0.000   | 3071.130 ± 188.785  | 2607.270 ± 115.030
lynx    | 22237.000 ± 0.050  | 23150.900 ± 1726.014 | 20546.700 ± 956.032
nmh     | 19331.400 ± 41.000 | 19921.500 ± 876.440 | 18576.800 ± 473.564

*Bold values indicate the best values among the three algorithms.
Tables 5 and 6 show that MAEA-SMCPs outperforms ECA on most problems in terms of the
averaged MQ, while ECA outperforms MAEA-SMCPs on most problems in terms of
intra-edges and inter-edges. Thus, we examine the locations of the solutions produced by
MAEA-SMCPs and ECA in Figs. 5 and 6, where intra-edges and MQ are chosen to visualize the
difference between them.
Fig. 3. Clustering structure by MAEA-SMCPs.
Fig. 4. Clustering structure by ECA.
In Figs. 5 and 6, both intra-edges and MQ are to be maximized, so the best solutions lie in
the upper-right area of the intra-edges and MQ space, while the worst lie in the lower-left
area. The figures show that each algorithm concentrates in a different region of the
two-objective search space: on most problems, MAEA-SMCPs occupies the upper-left area,
while ECA occupies the lower-right area. This suggests that it is hard to say which algorithm
is better. Although the previous results indicate that MAEA-SMCPs outperforms ECA in
terms of MQ and ECA outperforms MAEA-SMCPs in terms of intra-edges and inter-edges,
the visualization of the solutions in the two-dimensional objective space shows that neither
algorithm produces strictly better results than the other; however, the solutions produced by
MAEA-SMCPs are more stable than those of ECA.
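The observation that neither algorithm is absolutely better can be stated precisely via Pareto dominance: one solution dominates another only if it is at least as good in every objective and strictly better in at least one. A minimal sketch (our own illustration, not part of either algorithm) for maximization objectives such as (intra-edges, MQ):

```python
def dominates(p, q):
    """True if solution p Pareto-dominates q when every objective is maximized.

    p, q: tuples of objective values, e.g. (intra_edges, MQ).
    """
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))

def non_dominated(solutions):
    """Return the solutions not dominated by any other (the Pareto front)."""
    return [p for p in solutions
            if not any(dominates(q, p) for q in solutions if q != p)]
```

When one algorithm's solutions sit in the upper-left region and the other's in the lower-right, neither set dominates the other, which is exactly the situation observed in Figs. 5 and 6.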
Next, we consider the locations of the solutions of MAEA-SMCPs and ECA in the intra-edges
and inter-edges space in Fig. 7. Finally, Table 7 reports the number of evaluations of the three
algorithms (MAEA-SMCPs, MCA, and ECA), which shows that their computational costs are
similar.
Fig. 5. Solutions in the Intra-edges and MQ space for unweighted MDGs: (a) mtunis, (b) ispell, (c) rcs, (d) bison, (e) grappa, (f) bunch, (g) incl.
Fig. 6. Solutions in the Intra-edges and MQ space for weighted MDGs: (a) icecast, (b) gnupg, (c) inn, (d) bitchx, (e) xntp, (f) exim, (g) mod_ssl, (h) ncurses, (i) lynx, (j) nmh.
Fig. 7. Solutions in the Intra-edges and Inter-edges space for unweighted MDGs: (a) ispell, (b) rcs, (c) bison, (d) grappa.
Table 7
Number of Evaluations of MAEA-SMCPs, ECA and MCA.

Name      MAEA-SMCPs   ECA/MCA
mtunis    300000       800000
ispell    420000       1152000
rcs       550000       1682000
bison     1180000      2738000
grappa    7850000      14792000
bunch     18320000     26912000
incl      45580000     60552000
icecast   6830000      7200000
gnupg     14780000     15488000
inn       13550000     16200000
bitchx    17560000     18818000
xntp      22440000     24642000
exim      25450000     27848000
mod_ssl   28540000     36450000
ncurses   32560000     38088000
lynx      39890000     43808000
nmh       65430000     78408000
5. Conclusions
In this paper, we propose a multi-agent evolutionary algorithm to solve SMCPs. The
experiments on practical problems illustrate the good performance of MAEA-SMCPs, and the
comparison also shows that MAEA-SMCPs outperforms two existing single-objective
algorithms and two existing multi-objective algorithms in terms of MQ. Although its
intra-edges and inter-edges results leave considerable room for improvement compared with
one of the multi-objective algorithms, ECA, MAEA-SMCPs could be improved further by
refining the energy function. Meanwhile, the deviations obtained by MAEA-SMCPs are
smaller than those of the existing algorithms, which implies that its performance is more
stable. In addition, the number of evaluations of MAEA-SMCPs is smaller than that of the
other algorithms, which further validates its effectiveness. In this paper, we use only MQ as
the objective to solve SMCPs; future research will focus on designing more effective
multi-objective algorithms for SMCPs.
Acknowledgments
This work is partially supported by the EU FP7-PEOPLE-2009-IRSES project under
Nature Inspired Computation and its Applications (NICaiA) (247619), the Outstanding
Young Scholar Program of National Natural Science Foundation of China (NSFC) under
Grant 61522311, the General Program of NSFC under Grant 61271301, the Overseas, Hong
Kong & Macao Scholars Collaborated Research Program of NSFC under Grant 61528205,
the Research Fund for the Doctoral Program of Higher Education of China under Grant
20130203110010, and the Fundamental Research Funds for the Central Universities under
Grant K5051202052.
References
[1] L. L. Constantine and E. Yourdon, Structured Design. Prentice Hall, 1979.
[2] K. Mahdavi, “A clustering genetic algorithm for software modularization with a multiple hill climbing
approach,” Ph.D. dissertation, Brunel University, U.K., 2005.
[3] B. S. Mitchell, “A heuristic search approach to solving the software clustering problem,” Ph.D.
dissertation, Drexel University, USA, 2002.
[4] S. Mancoridis, B. S. Mitchell, C. Rorres, Y. F. Chen, and E. R. Gansner, “Using automatic clustering
to produce high-level system organizations of source code,” Proc. Int’l Workshop Program
Comprehension, pp. 45-53, 1998.
[5] M. Harman, R. Hierons, and M. Proctor, “A new representation and crossover operator for
search-based optimization of software modularization,” Proc. Genetic and Evolutionary Computation
Conf., pp. 1351-1358, 2002.
[6] K. Mahdavi, M. Harman, and R. M. Hierons, “A multiple hill climbing approach to software module
clustering,” Proceedings of the International Conference on Software Maintenance, pp. 315-324,
2003.
[7] B. S. Mitchell and S. Mancoridis, “Using heuristic search techniques to extract design abstractions
from source code,” Proc. Genetic and Evolutionary Computation Conf., pp. 1375-1382, 2002.
[8] M. Harman, S. Swift, and K. Mahdavi, “An empirical study of the robustness of two module
clustering fitness functions,” Proceedings of the 2005 Conference on Genetic and Evolutionary
Computation, pp. 1029-1036, 2005.
[9] K. Mahdavi, M. Harman, and R. M. Hierons, “Finding building blocks for software clustering,” Proc.
of Genetic and Evolutionary Computation Conference, pp. 2513-2514, 2003.
[10] K. Praditwong, “Solving software module clustering problem by evolutionary algorithm,” Proc. of the
8th International Joint Conference Computer Science and Software Engineering, pp. 154-159, 2011.
[11] K. Praditwong, M. Harman, and X. Yao, “Software module clustering as a multi-objective search
problem,” IEEE Trans. Software Engineering, 37(2), pp. 264-282, 2011.
[12] W. Zhong, J. Liu, M. Xue, and L. Jiao, “A multiagent genetic algorithm for global numerical
optimization”, IEEE Trans. on Systems, Man, and Cybernetics, Part B, 34(2): 1128-1141, 2004.
[13] J. Liu, W. Zhong, and L. Jiao, “A multiagent evolutionary algorithm for constraint satisfaction
problems,” IEEE Trans. on Systems, Man, and Cybernetics, Part B, 36(1), 54-73, 2006.
[14] J. Liu, W. Zhong, and L. Jiao, “A multiagent evolutionary algorithm for combinatorial optimization
problems,” IEEE Trans. on Systems, Man, and Cybernetics Part B, 40(1), 229-240, 2010.
[15] R. S. Pressman, Software Engineering: A Practitioner's Approach, 6th ed., McGraw-Hill Higher
Education, 2005.
[16] M. Tasgin, A. Herdagdelen, and H. Bingol, “Community detection in complex networks using genetic
algorithms,” arXiv: 0711.0491, 2007.
[17] V. R. Basili and A. J. Turner, “Iterative enhancement: A practical technique for software
development,” IEEE Trans. Software Engineering, vol. SE-1, no. 4, pp. 390-396, 1975.
[18] D. Doval, S. Mancoridis, and B. S. Mitchell, “Automatic clustering of software systems using a
genetic algorithm.” Proceedings of IEEE conference on Software Technology and Engineering
Practice (STEP'99), pp. 73-81, 1999.
[19] A. C. Kumari and K. Srinivas, “Software module clustering using a fast multi-objective
hyper-heuristic evolutionary algorithm,” International Journal of Applied Information Systems, vol. 5,
no. 6, pp. 12-18, 2013.
[20] A. C. Kumari, K. Srinivas, and M. P. Gupta, “Software module clustering using a hyper-heuristic
based multi-objective genetic algorithm,” Advance Computing Conference (IACC), 2013 IEEE 3rd
International, pp. 813-818, 2013.
[21] A. S. Mamaghani, and M. R. Meybodi, “Clustering of software systems using new hybrid algorithms.”
Proceedings of the Ninth IEEE International Conference on Computer and Information Technology
(CIT'09), vol. 1, 2009.
[22] R. C. Holt, and J. R. Cordy. The Turing Programming Language. Communications of the ACM, vol. 31,
no. 12, pp. 1410-1423, 1988.
[23] R. C. Holt. Concurrent Euclid, The UNIX System and Tunis. Addison Wesley, Reading, Massachusetts,
1983.
[24] S. D. Hester, D. L. Parnas, and D. F. Utter, “Using documentation as a software design medium.” Bell
System Technical Journal, vol. 60, no. 8, pp. 1941-1977, 1981.
[25] K. Praditwong and X. Yao, “A new multi-objective evolutionary optimisation algorithm: the
two-archive algorithm,” Proc. Int’l Conf. Computational Intelligence and Security, Y.-M. Cheung, Y.
Wang, and H. Liu, eds., vol. 1, pp. 286-291, 2006.
[26] S. C. Choi and W. Scacchi, “Extracting and restructuring the design of large systems,” IEEE Software,
vol. 7, no. 1, pp. 66-71, Jan. 1990.
[27] R. Lutz, “Recovering high-level structure of software systems using a minimum description length
principle,” Proc. 13th Irish Conf. Artificial Intelligence and Cognitive Science, Sept. 2002.
[28] D. H. Hutchens and V. R. Basili, “System structure analysis: clustering with data bindings,” IEEE
Trans. Software Engineering, vol. 11, no. 8, pp. 749-757, Aug. 1985.
[29] R. Koschke, “Atomic architectural component recovery for program understanding and evolution,”
PhD thesis, Inst. For Computer Science, Univ. of Stuttgart, 2000.
[30] S. Mancoridis, B. S. Mitchell, Y. Chen, and E. R. Gansner, “Bunch: a clustering tool for the recovery
and maintenance of software system structures,” in Proc. of Int. Conf. on Software Maintenance, pp.
50-59, 1999.
[31] M. Harman, S. Swift, and K. Mahdavi, “An empirical study of the robustness of two module
clustering fitness functions,” in Proc. of the 7th Annual Conf. on Genetic and Evolutionary
Computation, pp. 1029-1036, 2005.