
A Modified Genetic Algorithm based Load Distribution Approach towards Web Hotspot rescue

Date post: 11-May-2015

Debashree Devi¹, Y. Jayanta Singh²

¹,² Department of Computer Science & Engineering and Information Technology, DonBosco College of Engineering and Technology, Assam DonBosco University, Guwahati, India

Email: [email protected]
Email: [email protected]

Abstract: A web hotspot is a serious problem often experienced by popular websites. It is a dramatic load spike that occurs when a huge number of users access the same website at once. A prominent solution to this problem is server load balancing. Dynamic load balancing allocates requests to servers or processors dynamically, as they arrive. For effective load balancing, a near-optimal schedule of incoming requests or processes must be determined "on the fly", so that the requests complete in the shortest possible time. We therefore propose a Genetic Algorithm based load balancing scheme that relies on a process scheduling policy. A Genetic Algorithm searches for the optimal solution within a space of candidate solutions. It follows the survival-of-the-fittest principle and approaches the optimal solution over a number of generations. The proposed algorithm is evaluated for various population sizes and numbers of generations, with the aim of maximizing the processor utilization of the nodes/processors in the system.

Index Terms: Dynamic Load Balancing, Genetic Algorithm, Server load balancing, Web hotspot.

DOI: 03.LSCS.2013.6.574 © Association of Computer Electronics and Electrical Engineers, 2013
Proc. of Int. Conf. on Computational Intelligence and Information Technology

I. INTRODUCTION

With the rapid increase in the number of internet users, it is common for a website to receive millions of hits per day. For popular websites, e.g. social networking sites and online audio/video sites, this rapid increase in load may cause serious problems. In addition, the rapid development of internet applications diversifies the services offered by popular websites. These services are real-time and dynamic. Hence, handling all these requests on a single server leads to overloading. Technically, such a situation is termed a "web hotspot". Web hotspots generally last for a very short period of time [1], but they can seriously degrade the performance of a website. Using a high-performance system as a solution would be very costly. A flexible web server system that scales with the changing load on the website could be used instead, but it is also expensive because it requires more hardware.

The concept of load balancing is not very old. Until the mid-1990s the Internet was used mainly for academic purposes. As soon as it was introduced into the business world, people started to use it for a wide variety of tasks. With an increasing number of people accessing the Internet, a number of issues have to be taken care of in order to provide good service to customers. This is where load balancing comes in. Load balancing can be defined as a form of system performance evaluation, analysis and optimization, which distributes the load that would otherwise fall on a single server across a network of processing elements or servers, so as to equalize the load among the servers at any point in time.

A very commonly used load balancing technique is DNS Round Robin, a DNS-based load balancing process. This technique associates more than one IP address with a single hostname, as shown in Fig. 1 [16]. For example, the hostname www.vegan.net is associated with multiple IP addresses so that traffic is distributed evenly among them. However, DNS Round Robin has limitations such as caching issues and uneven traffic distribution. Nowadays, server load balancing (SLB) is quite effective at addressing redundancy, scalability and server management. SLB generally comes with components such as a VIP (virtual IP address), servers, user access levels, redundancy, persistence, service checking and load balancing algorithms. Load balancing algorithms are programmed into the SLB device and assigned to individual VIPs. There are a number of load balancing algorithms, which can be categorized as global or local, static or dynamic, centralized or distributed, etc.

In our proposed system, the concept of SLB is implemented with an optimization algorithm, namely the Genetic Algorithm. A genetic algorithm combines the exploitation of previous results with the exploration of new regions of the search space. It follows the survival-of-the-fittest principle [2]. A genetic algorithm maintains a population of candidate solutions that evolves over time and ultimately converges towards the optimal solution. In a population, each individual is represented by a chromosome, which is encoded as a string of bits. To evolve the best solution and to implement natural selection, an objective function is defined, which measures a candidate solution's relative fitness.

The domain of our key problem is the distributed system. Generally, a distributed system comprises a number of computers acting as clients, which access services from another set of computers acting as servers. The most common example of a distributed system is the World Wide Web (WWW). Every day the WWW carries a large volume of traffic, which is directed to web server systems. The purpose of a web server is to store information and serve client requests. A web server system consists of multiple web server hosts running a number of web applications simultaneously. Dynamic load balancing arises from the need to allocate servers/resources to client requests at the moment they arrive. It is "mission-critical" because the incoming load cannot be predicted. It involves key issues such as task migration and load sharing. According to Ref. [3], load sharing manages the tasks in the system in such a way that no processor is idle. Generally, a process is migrated to another processor to improve processor utilization if the migration cost or overhead is less than some predetermined metric. Process migration generally requires more hardware, which in turn increases the cost of execution. A load balancing strategy tries to ensure that the processors or servers in the system are equally loaded and that every processor or server handles the same number of requests. Once requests have been received, a good scheduling policy should assign them to appropriate servers so that execution time is as short as possible. In this paper, we treat load balancing as a process scheduling problem in which every incoming request is taken as one process and assigned to a processor or server for processing.

The rest of the paper is organized as follows. Section 2 gives a brief description of related work. Section 3 describes the theoretical background of the Genetic Algorithm. Section 4 introduces the system and process model. Section 5 presents the proposed Genetic Algorithm based load balancing approach. The implementation and results are discussed in Section 6. Section 7 concludes the paper.

Figure 1: DNS Round Robin mechanism
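The rotation behind DNS Round Robin can be illustrated with a small sketch. It is only a toy model, not a real DNS server: the class name `RoundRobinDNS` is hypothetical, and the IP addresses are placeholder documentation addresses rather than the real records of www.vegan.net.

```python
from collections import deque


class RoundRobinDNS:
    """Toy resolver that rotates a hostname's address list on every query."""

    def __init__(self, records):
        # records: hostname -> list of IP addresses (illustrative values only)
        self._records = {host: deque(ips) for host, ips in records.items()}

    def resolve(self, hostname):
        ips = self._records[hostname]
        answer = list(ips)   # full list, current head first
        ips.rotate(-1)       # the next query starts with the following address
        return answer


dns = RoundRobinDNS({"www.vegan.net": ["192.0.2.1", "192.0.2.2", "192.0.2.3"]})
first_choices = [dns.resolve("www.vegan.net")[0] for _ in range(4)]
print(first_choices)  # ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.1']
```

Clients that take the first address of each answer are spread over the servers in turn. The caching limitation mentioned in the text arises because a cached answer bypasses this rotation.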


II. RELATED WORKS

A web hotspot is a serious problem because it degrades the quality of a website. Manual control of the whole process would certainly affect website quality. Ref. [1] describes the DotSlash autonomic rescue system, proposed by Weibin Zhao, as a solution to this problem. DotSlash enables a web site to create a distributed web server system on the fly, adapting to the changing environment. In the DotSlash design, a cost-effective mechanism is applied to handle the increased request load: different web sites form a mutual-aid community of web servers, so that during a critical period a site can use the spare capacity of other web sites in the community. The working of the DotSlash rescue system can be described by the following steps:

Dynamic Virtual Hosting
Request Redirection
Workload Monitoring
Rescue Control
Service Recovery

A. Dynamic load balancing Approaches

In the client-based approach, requests for documents can be routed to any replicated web server even when the nodes are loosely coordinated or not coordinated at all. Requests are routed to the web clusters either by web clients or by client-side proxy servers [4]. The DNS-based approach overcomes the limitations of the client-based approach by using a request routing mechanism on the cluster side. The cluster DNS, i.e. the authoritative DNS server for the nodes of the distributed web server system, translates the URL to the IP address of one server and thereby provides architecture transparency at the URL level [4] [5]. Depending on the scheduling algorithm that the cluster DNS uses to balance the load on the web server nodes, the DNS-based approach can be categorized into constant TTL algorithms and adaptive (dynamic) TTL algorithms.

In the cluster-based approach for peer-to-peer (P2P) systems, B. Mortazavi and G. Kesidis [6] used a reputation framework, on which they designed a game in which players play to receive the maximum number of files from the system. Brighten Godfrey et al. [7] proposed a load balancing algorithm for heterogeneous and dynamic P2P systems. Kalman Graffi et al. [8] used a DHT-based information gathering and system analysis technique. Ananth Rao et al. [9] addressed the load balancing problem in P2P systems with an algorithm built on the idea of virtual servers. Song Fu et al. [10] characterized the behaviour of randomized search algorithms in the general P2P environment. In the dispatcher-based approach, Harikesh Singh et al. [5] presented an advanced DNS dispatching technique that distributes HTTP requests from clients using Round Robin and proximity-based scheduling algorithms.

Many load balancing approaches also involve optimization techniques such as fuzzy logic and Genetic Algorithms. The load balancing problem is known to be NP-hard with respect to the number of requests versus the number of machines/servers, which motivates the search for a near-optimal solution. Yu-Kwong Kwok et al. [14] defined a new dynamic fuzzy-decision-based load balancing system incorporated in a distributed object computing environment. Drawing on conventional control theory, a sudden increase in load is treated as an external force on the system, and a feedback mechanism is maintained to minimize the effect of that force. A Genetic Algorithm based approach was introduced by Bibhudatta Sahoo et al. [15] for dynamic load distribution in heterogeneous distributed systems. It formulates load balancing as a job scheduling mechanism and compares the proposed system with two scheduling policies, LERT-MW and LERT-MWM. Priyanka Gonade et al. [3] defined a modified Genetic Algorithm approach with an objective function for the minimum load deviation of a node.

III. GENETIC ALGORITHM: THEORETICAL CONCEPT

The Genetic Algorithm (GA) is a search method based on the principles of natural selection and genetics. It seeks the optimal solution in a search space represented by a population of potential solutions. The algorithm follows the principle of survival of the fittest, where each individual represents a point in the search space of the problem. An individual, which represents a candidate solution, can be expressed as a string of bits, referred to as a chromosome. Each chromosome is composed of variables called genes, and the values associated with the genes are termed alleles. To evolve the best solution and to implement natural selection, an objective function is defined, which measures a candidate solution's relative fitness. The objective function is important because the GA subsequently uses it to guide the evolution of the best solutions. Once the problem is encoded in a chromosomal manner and an objective function has been chosen, a solution to the search problem can be evolved by the following steps [11]:

INITIALIZATION: The initial population of candidate solutions is usually generated randomly across the search space.

EVALUATION: After initialization of the population, the fitness values of all the candidate solutions are evaluated by using the objective function.

SELECTION: Selection passes solutions with higher fitness values on to the next generation and thus imposes the survival-of-the-fittest mechanism on the candidate solutions. The main idea of selection is to prefer better solutions to worse ones, and many selection procedures have been proposed to accomplish this. Some of the selection techniques are roulette-wheel selection, stochastic universal selection, ranking selection and tournament selection.

RECOMBINATION: Recombination combines parts of two or more parental candidate solutions to create new, possibly better solutions, termed offspring. The offspring produced by recombination will not be identical to any particular parent and will instead combine parental traits in a novel manner [13].

MUTATION: The task of mutation is to locally but randomly modify a solution. It generally involves changing one or more traits of an individual. We can say that the mutation performs a random walk in the space of the candidate solutions.

REPLACEMENT: The offspring population created by selection, recombination, and mutation replaces the original parental population. Many replacement techniques such as elitist replacement, generation-wise replacement and steady-state replacement methods are used in GAs.

TERMINATION CONDITIONS OR STOPPING CONDITIONS: Termination conditions are generally problem dependent. Common stopping conditions include obtaining the optimal solution and observing the same fitness value over several consecutive generations.
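The seven steps above can be put together in a minimal sketch. It is an illustration on a toy problem (maximizing the number of 1-bits in a chromosome), not the load balancing algorithm of this paper. The function names, the parameter defaults, and the choice of binary tournament selection, one-point crossover and elitist replacement are all assumptions made for the example.

```python
import random


def one_max(chromosome):
    # Toy objective function: number of 1-bits (higher is fitter).
    return sum(chromosome)


def run_ga(n_bits=20, pop_size=30, generations=60, pc=0.9, pm=0.02, seed=1):
    rng = random.Random(seed)
    # INITIALIZATION: random population spread over the search space.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def pick():
        # SELECTION: binary tournament, the fitter of two random individuals.
        a, b = rng.sample(population, 2)
        return a if one_max(a) >= one_max(b) else b

    # EVALUATION: the objective function ranks the candidates.
    best = max(population, key=one_max)
    for _ in range(generations):
        # TERMINATION: stop early once the optimum is found.
        if one_max(best) == n_bits:
            break
        offspring = [best[:]]  # elitist replacement keeps the best individual
        while len(offspring) < pop_size:
            p1, p2 = pick(), pick()
            # RECOMBINATION: one-point crossover with probability pc.
            if rng.random() <= pc:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # MUTATION: flip each bit with probability pm.
            child = [1 - g if rng.random() < pm else g for g in child]
            offspring.append(child)
        # REPLACEMENT: the offspring replace the parental population.
        population = offspring
        best = max(population, key=one_max)
    return best


best = run_ga()
print(one_max(best), best)
```

Because the best individual is always copied into the next generation, the best fitness never decreases from one generation to the next.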

A. Basic Genetic Algorithm Operators

SELECTION OPERATOR: The basic selection techniques fall into two categories:

FITNESS PROPORTIONATE SELECTION: This includes methods such as roulette-wheel selection and stochastic universal selection [11]. In roulette-wheel selection, each individual in the population is assigned a roulette wheel slot sized in proportion to its fitness value. Thus a fitter solution has a larger slot than a less fit one.
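Roulette-wheel selection as described here can be sketched in a few lines. The function name and signature are illustrative, and fitness values are assumed to be non-negative with at least one positive entry.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def roulette_wheel_select(fitnesses, rng=random):
    """Return the index of one individual, chosen with probability
    proportional to its (non-negative) fitness value."""
    cumulative = list(accumulate(fitnesses))   # right edges of the wheel slots
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    spin = rng.random() * total                # where the wheel stops
    # Clamp guards against a floating-point spin landing exactly on total.
    return min(bisect_right(cumulative, spin), len(fitnesses) - 1)


rng = random.Random(0)
picks = [roulette_wheel_select([1.0, 3.0], rng) for _ in range(4000)]
print(picks.count(0), picks.count(1))  # the second individual wins about 3x as often
```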

ORDINAL SELECTION: This includes methods such as tournament selection and truncation selection [11]. In tournament selection, s chromosomes are selected at random and entered into a tournament against each other. The fittest group of k individuals is selected as the parents.

RECOMBINATION OPERATOR: After selection, individuals from the mating pool are recombined (or crossed over) to create new, hopefully better, offspring. In the recombination process, two individuals are selected at random and recombined with a predefined probability, pc, termed the crossover probability. A uniform random number, r, is generated and compared with pc. If r <= pc, the individuals are recombined; if r > pc, the offspring are simply copies of their parents. Pseudo code for this mechanism is given below:

Pseudo Code:
[1] Start
[2] Define r, a uniform random number
[3] Define pc, the crossover probability
[4] If r <= pc
[5] then perform recombination
[6] else
[7] copy the parents to the next generation.
[8] End

MUTATION OPERATOR: The mutation operator adds diversity to the population and helps to ensure exploration of the entire search space. In GAs, mutation is usually a secondary variation/search operator, performed with low probability. Bit-flip mutation is the most common mutation technique. A mutation probability, pm, is defined, according to which each bit in a binary string is flipped (0 is converted to 1, and vice versa).

REPLACEMENT OPERATOR: Replacement techniques are used to introduce the newly generated offspring into the parental population. Some of the replacement techniques are:

Delete-All: Deletes all the individuals in the current population and replaces them with the same number of newly created offspring.

Steady-State: Deletes n old members and replaces them with n new offspring. The number to delete and replace at any one time, n, is a parameter of this technique.

Steady-state-no-duplicates: While replacing n parents with n offspring, this technique ensures that no duplicate chromosomes are added to the population.
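The recombination, mutation and replacement operators of this subsection can be sketched as follows. The text does not fix a crossover type or say which old members are deleted, so one-point crossover and removal of the least fit members are assumptions made for this example. All function names are illustrative.

```python
import random


def crossover(parent_a, parent_b, pc, rng):
    """Recombine two parents with probability pc, otherwise copy them."""
    r = rng.random()                               # uniform random number r
    if r <= pc and len(parent_a) > 1:
        cut = rng.randint(1, len(parent_a) - 1)    # one-point crossover (assumed)
        return (parent_a[:cut] + parent_b[cut:],
                parent_b[:cut] + parent_a[cut:])
    return parent_a[:], parent_b[:]                # r > pc: copies of the parents


def bit_flip(chromosome, pm, rng):
    """Flip each bit independently with mutation probability pm."""
    return [1 - bit if rng.random() < pm else bit for bit in chromosome]


def steady_state_no_duplicates(population, offspring, fitness, n):
    """Replace up to n of the least fit members with offspring that are
    not already present in the population."""
    survivors = sorted(population, key=fitness, reverse=True)
    fresh = []
    for child in offspring:
        if child not in survivors and child not in fresh:
            fresh.append(child)
    fresh = fresh[:n]
    return survivors[:len(survivors) - len(fresh)] + fresh


rng = random.Random(0)
child_a, child_b = crossover([0, 0, 0, 0], [1, 1, 1, 1], pc=1.0, rng=rng)
print(child_a, child_b, bit_flip(child_a, pm=0.25, rng=rng))
```

The Delete-All technique corresponds to simply replacing `population` with `offspring` when both have the same size.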

IV. SYSTEM AND PROCESS MODEL

In this paper, load balancing is realized through a process scheduling policy. Every incoming client request is taken as one process. We have to find the optimum schedule according to which the processes/requests are allocated to different servers, according to their demand. The process scheduling mechanism is implemented in two phases:

PROCESS DISTRIBUTION: Distributes the load equally across the processors.

PROCESS EXECUTION ORDERING: The Genetic Algorithm is used in this stage. GAs are random search methods that mimic the principles of evolution and natural selection; the GA searches the entire solution space for the optimal solution.

Every request that arrives at the distributed server system is considered as one process. A request queue is defined, and every received request is put into it. Requests are taken out of the queue for processing in FCFS order. Let P = (p1, p2, p3, ..., pn) denote the set of processors or servers in the distributed system, with the constraint that one processor can execute only one request at a time. Let J = (j1, j2, j3, ..., jm) denote the set of processes to be executed. An n×m assignment matrix is defined, where the value pik, 1 ≤ i ≤ n, 1 ≤ k ≤ m, denotes the number of times process jk has been allocated to server pi. The matrix is updated with every schedule. The process scheduling mechanism is depicted in Fig. 2 [3] [12], which shows the request queue and the processors or servers 1, 2, ..., m of the distributed system. In this paper, the underlying system architecture is assumed to have the following components, as shown in Figure 3:

Clients
Forwarding Machine
MASTER
Servers


Figure 2: process scheduling mechanism
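The request queue and the assignment matrix of the process model can be sketched as below. The function `dispatch_fcfs` and the least-loaded placement policy passed to it are hypothetical placeholders. In the proposed system the placement decision is made by the GA, which is not modelled here.

```python
from collections import deque


def dispatch_fcfs(requests, n_servers, choose_server):
    """Drain the request queue in FCFS order and record every allocation.

    matrix[i][k] counts how often process k was allocated to server i.
    """
    queue = deque(requests)                        # request queue, oldest first
    processes = sorted(set(requests))
    column = {name: k for k, name in enumerate(processes)}
    matrix = [[0] * len(processes) for _ in range(n_servers)]
    order = []
    while queue:
        request = queue.popleft()                  # first come, first served
        server = choose_server(request, matrix)
        matrix[server][column[request]] += 1       # update the assignment matrix
        order.append((request, server))
    return matrix, order


def least_loaded(request, matrix):
    # Placeholder policy: pick the server with the fewest allocations so far.
    return min(range(len(matrix)), key=lambda i: sum(matrix[i]))


matrix, order = dispatch_fcfs(["j1", "j2", "j1", "j3"], 2, least_loaded)
print(matrix)  # [[2, 0, 0], [0, 1, 1]]
```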

Clients are connected through a network to the distributed server system, which consists of a number of interconnected servers. When clients send requests, they are received by the forwarding machines (FMs), which are responsible for forwarding the requests to the servers. Behind every FM there is a server, or sometimes a cluster of servers, to process different requests according to their demand. The MASTER performs the role of the server load balancer. Server load balancing can be defined as the process of distributing the traffic arriving at a web site among a number of servers using a network-based device. It is generally transparent to the user. In our underlying system model, the MASTER is responsible for taking decisions about process/request assignment. To avoid deadlock-like situations, the process scheduling mechanism must fulfil the following two constraints:

Time Constraint (CT): Processes/requests with the same demand cannot be allocated to a server simultaneously. There should be a specific time interval between allocations of processors/servers to different requests/processes.

Activity Constraint (CA): No server should be active or idle forever. After a specific time period, each server should be put into the sleep (idle) state while other servers process the requests.

To state the load balancing problem simply, suppose we have a set of n requests or tasks, which we have to assign to m machines/servers. We are given an array of non-negative elements, T[1, 2, ..., n], where T[i] is the running time of task i. The assignment is given by the assignment matrix.

A. Performance Metrics

To evaluate the performance of the proposed model, we consider the following metrics.

PROCESSOR LOAD: The number of processes allocated to a specific processor/server. It is denoted by load(pi) and gives the total number of processes on a processor, i.e. the sum of the number of processes already allocated to that processor and the number of processes newly assigned to it. Mathematically,

$$\mathrm{load}(p_i) = \sum (\text{no. of already allocated processes on processor } i) + \sum (\text{no. of newly allocated processes on processor } i) \tag{1}$$

MAKESPAN: The maximum finishing time, or total execution time, required to complete the maximum load on any processor, pi, at any time, t. Mathematically,

$$\mathrm{makespan} = \max_{i}\, \mathrm{load}(p_i) \tag{2}$$

PROCESSOR UTILIZATION: The processor utilization of any processor, pi, is obtained by dividing the processor load, load(pi), by the makespan. Mathematically,

$$\mathrm{utilization}(p_i) = \frac{\mathrm{load}(p_i)}{\mathrm{makespan}} \tag{3}$$


Figure 3: Underlying system architecture

Average processor utilization is given by:

$$\text{Average utilization} = \frac{\sum_{i} \mathrm{utilization}(p_i)}{\text{no. of processors}} \tag{4}$$
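The four metrics can be computed directly from per-processor process counts. The sketch below follows the definitions in the text: load is the sum of already allocated and newly allocated processes, makespan is the maximum load, and utilization is load divided by makespan. The function names and the sample numbers are illustrative only.

```python
def processor_load(already_allocated, newly_allocated):
    # Eq. (1): load per processor = already allocated + newly allocated.
    return [old + new for old, new in zip(already_allocated, newly_allocated)]


def makespan(loads):
    # Eq. (2): the maximum load over all processors.
    return max(loads)


def utilization(loads):
    # Eq. (3): load of each processor divided by the makespan.
    span = makespan(loads)
    if span == 0:
        return [0.0 for _ in loads]
    return [load / span for load in loads]


def average_utilization(loads):
    # Eq. (4): mean utilization over all processors.
    values = utilization(loads)
    return sum(values) / len(values)


loads = processor_load([2, 1, 0], [2, 1, 2])   # three processors
print(loads, makespan(loads), utilization(loads), average_utilization(loads))
```

For these sample counts the loads are 4, 2 and 2, so the makespan is 4 and the utilizations are 1.0, 0.5 and 0.5. A perfectly balanced schedule would give an average utilization of 1.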

V. PROPOSED GENETIC ALGORITHM BASED APPROACH

Generally the Genetic Algorithm provides an efficient way to search for an optimal solution. The algorithm starts by randomly generating an initial population of possible solutions. In this paper, the proposed GA-based load balancing distributes processes among the processors or servers based on processor load. When a process is assigned to a processor, the processor load is updated with the newly assigned process, as recorded in the (n×m) assignment matrix. In the context of our problem, the initial population is created by randomly taking incoming requests/processes. A request queue holds all unprocessed requests/processes. After a specific time interval, the requests/processes are taken out of the queue in FCFS order and randomly allocated to the processors (servers). Each schedule is then evaluated according to a fitness function. The two best schedules are selected to produce the next generation. Crossover and mutation are performed on the selected schedules to produce schedules with higher fitness, while the population size is maintained. In every generation, individuals are evaluated with the fitness function and less fit solutions are rejected. The algorithm is implemented in the following phases:

INITIALIZATION PHASE: The Genetic Algorithm searches over a large population of individuals. The initial population is created by taking the processes/requests from the request queue in FCFS order and assigning them randomly to processors (servers). The order in which the processes are assigned is taken as a condition. The initial population is obtained by swapping the order of assignment of processes a fixed number of times.

EVALUATION PHASE: The evaluation phase provides a quality measure that determines how fit an individual is within the population. In the context of our key problem, the web hotspot, where the load on the server suddenly becomes very high, we define the fitness of a schedule in terms of its number of unprocessed requests. This is because our foremost aim is to find an optimum way of distributing load among the processors (servers), so that every incoming process/request receives a response. The fittest schedule has zero unprocessed requests. The fitness function can be defined as:

$$f(s) = \{\, u_i = 0,\ \forall i = 1, 2, \ldots, n \,\} \tag{5}$$

So a schedule, s, is said to be fittest if it leaves no request unprocessed. This fitness function is applied to find the fitness of the individuals in the initial population.
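One possible reading of this fitness measure is sketched below. The text does not say how a request becomes unprocessed, so the sketch assumes a simple capacity model: each server can process a fixed amount of work (in MI) per scheduling interval, and a request that no longer fits on its assigned server counts as unprocessed. The function name, the schedule encoding and the capacity model are all assumptions.

```python
def unprocessed_requests(schedule, lengths, capacity):
    """Count the requests a schedule leaves unserved (lower is fitter).

    schedule[k] is the server index assigned to request k,
    lengths[k] is the size of request k in MI, and
    capacity[i] is the MI that server i can process in one interval
    (an assumed model, not taken from the paper).
    """
    remaining = list(capacity)
    unprocessed = 0
    for request, server in enumerate(schedule):    # requests in FCFS order
        if lengths[request] <= remaining[server]:
            remaining[server] -= lengths[request]  # the request is served
        else:
            unprocessed += 1                       # no capacity left for it
    return unprocessed


lengths = [4, 6, 3, 5]
capacity = [10, 8]
print(unprocessed_requests([0, 0, 0, 1], lengths, capacity))  # 1
print(unprocessed_requests([0, 0, 1, 1], lengths, capacity))  # 0, the fittest
```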

SELECTION PHASE: The selection phase produces more copies of the solutions with higher fitness, and so imposes the survival-of-the-fittest mechanism on the candidate solutions. This improves the total fitness of the population. For the selection of individuals, a quality measure, q, is defined, where q = u. Schedules with fewer unprocessed requests/processes are therefore selected for the next generation.

RECOMBINATION PHASE: The recombination phase creates new and better offspring by combining two or more parental solutions chosen in the selection phase. The following recombination operators are defined:

Crossover: It involves exchanging parts of information between two selected individuals. Two of the best individuals are randomly selected, and from one string a process, ji, is selected at random and put into the second string. For two processes ji and jk, with ji, jk ∈ J, the processes are exchanged between the two individuals. If one of the parents is the best individual, we simply mutate the second string.

Mutation: It involves changing gene values in a chromosome, replacing a gene value with a new value selected from a definite domain for that gene. For mutating two processes, two sets, r and c, are defined with the conditions:

i) r ≠ c
ii) Set r is not empty.

One process is selected at random from set r and placed in c.

TERMINATION PHASE: Stopping conditions are situations in which the programme terminates when it encounters them. Some of the stopping conditions are:

Reaching the maximum number of generations.
Obtaining equal fitness for a number of generations.
Obtaining the desired solution.

All these steps are combined to give the GA based load balancing algorithm:

Algorithm: GA based Load Balancing
{
[1] Initialization
[2] Load Checking
[3] Repeat through step [9] until request_queue is empty
    Until stopping conditions are TRUE
    {
[4] Randomly create the initial population
[5] Apply fitness function
[6] Choose two best individuals from the population
[7] Crossover the selected individuals
[8] Mutate the child
[9] Replace the worse individuals in the population with the best ones
    }
}
[10] End

The whole mechanism is shown in figure 4.
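A compact sketch of steps [4] to [9] is given below. It is one possible interpretation, not the authors' implementation. A schedule is encoded as a list holding the server index of each request, and fitness is the number of unprocessed requests under an assumed per-server capacity. Crossover copies the placement of one randomly chosen process from the second parent, and mutation moves one process from its server r to a different server c. All names, parameters and defaults are hypothetical.

```python
import random


def fitness(schedule, lengths, capacity):
    # Number of unprocessed requests under an assumed capacity model.
    remaining, unprocessed = list(capacity), 0
    for request, server in enumerate(schedule):
        if lengths[request] <= remaining[server]:
            remaining[server] -= lengths[request]
        else:
            unprocessed += 1
    return unprocessed


def crossover(a, b, rng):
    # Take parent a and copy the placement of one random process from b.
    k = rng.randrange(len(a))
    child = a[:]
    child[k] = b[k]
    return child


def mutate(schedule, n_servers, rng):
    # Move one process from its server r to a different server c (r != c).
    child = schedule[:]
    k = rng.randrange(len(child))
    r = child[k]                     # r is non-empty: it holds process k
    child[k] = rng.choice([s for s in range(n_servers) if s != r])
    return child


def ga_load_balance(lengths, capacity, pop_size=20, max_generations=200,
                    patience=50, seed=0):
    """Requires at least two servers and one request."""
    rng = random.Random(seed)
    n, m = len(lengths), len(capacity)

    def score(s):
        return fitness(s, lengths, capacity)

    # [4] Randomly create the initial population.
    population = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    best, stale = min(population, key=score), 0
    for _ in range(max_generations):
        # Stop on the desired solution or on equal fitness for many generations.
        if score(best) == 0 or stale >= patience:
            break
        population.sort(key=score)                  # [5] apply fitness function
        parents = population[0], population[1]      # [6] two best individuals
        child = mutate(crossover(*parents, rng), m, rng)   # [7] and [8]
        if score(child) < score(population[-1]):    # [9] replace the worst
            population[-1] = child
        candidate = min(population, key=score)
        if score(candidate) < score(best):
            best, stale = candidate, 0
        else:
            stale += 1
    return best, score(best)


schedule, left_over = ga_load_balance([4, 6, 3, 5, 2, 7], [14, 14])
print(schedule, left_over)
```

The three stopping conditions of the termination phase appear as `max_generations`, `patience` and the test for zero unprocessed requests.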

VI. IMPLEMENTATION AND RESULTS

We have implemented our proposed algorithm on Pentium Core-i3-540 with 3.06 GHz processor and with 500GB HDD and 4GB RAM. We have used JDK 1.6 as the coding language and Netbeans IDE 7.0.1 as the front-end tool. For application of our proposed Genetic Algorithm to solve the load balancing problem, we have set the parameters as follows:

Population size: It defines the number of processes/requests taken at random in every execution. The population size varies from 20 to 100.

Number of generations: It defines the number of cycles for which the algorithm is run in order to converge towards the optimal solution.


Figure 4: Flowchart for GA based load balancing mechanism

MI (million instructions): It defines the process length, i.e. the number of instructions each process contains as its processing requirement. It varies from 1 to 10 MI per request.

To simulate the proposed algorithm, we implemented it with the GridSim 5.4 toolkit. GridSim provides multiple entities such as users, brokers, resources, GIS (Grid Information Service) and input-output. In our implementation of GA based load balancing, each process is given as input with a varying processing time and input file size. The “Gridlet” package contains all the information related to a process and its execution.

During simulation, GridSim schedules processes/jobs under one of two policies, time-shared or space-shared. For ease of implementation, we considered space-shared scheduling. Under this policy, every incoming process/request is allocated a machine/server immediately if one is available; otherwise the process is queued. During Gridlet assignment, the processing time of each request is determined and an event is scheduled. When a scheduled Gridlet finishes execution, the resource simulator frees the machine/server and checks the queue for waiting requests, which are then assigned to the available machines/servers. Table I gives a scheduling scenario under space-shared scheduling for four Gridlet processes with processing requirements of 6.5, 4.6, 10 and 8 MI respectively.

We simulated the proposed algorithm for different population sizes and numbers of generations, and then evaluated the GA convergence in terms of maximizing processor utilization. The results are shown in Fig. 5 and Fig. 6 respectively.

TABLE I: A SCHEDULING STATISTICS SCENARIO FOR SPACE-SHARED RESOURCES IN GRIDSIM

| Gridlet Number | Request Length (MI) | Arrival Time (a) | Start Time (s) | Finish Time (f) | Elapsed Time (f-a) |
|---|---|---|---|---|---|
| G1 | 6.5 | 0 | 0 | 6.5 | 6.5 |
| G2 | 4.6 | 4 | 4 | 8.6 | 4.6 |
| G3 | 10 | 6 | 6.5 | 16.5 | 10.5 |
| G4 | 8 | 8 | 8.6 | 16.6 | 8.6 |
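The figures in Table I can be reproduced with a small first-come-first-served simulation of space-shared scheduling. The sketch below does not use the GridSim API; it is a standalone illustration with hypothetical names. Two of its parameters are inferred from the table rather than stated in the text: two machines, each executing 1 MI per time unit. The diamond operator requires Java 7 or later.

```java
import java.util.PriorityQueue;

final class SpaceSharedSketch {
    /**
     * Space-shared, first-come-first-served scheduling.
     * Requests must be listed in arrival order. Returns one row per request:
     * {start time, finish time, elapsed time (finish - arrival)}.
     */
    static double[][] schedule(double[] lengthMi, double[] arrival, int machines, double mips) {
        PriorityQueue<Double> freeAt = new PriorityQueue<>();   // time at which each machine becomes free
        for (int m = 0; m < machines; m++) freeAt.add(0.0);
        double[][] rows = new double[lengthMi.length][];
        for (int i = 0; i < lengthMi.length; i++) {
            double machineFree = freeAt.poll();                  // earliest available machine
            double start = Math.max(arrival[i], machineFree);    // the request waits if all machines are busy
            double finish = start + lengthMi[i] / mips;
            freeAt.add(finish);                                  // the machine is held until the request finishes
            rows[i] = new double[] {start, finish, finish - arrival[i]};
        }
        return rows;
    }

    public static void main(String[] args) {
        double[] lengthMi = {6.5, 4.6, 10, 8};   // G1..G4 from Table I
        double[] arrival = {0, 4, 6, 8};
        double[][] rows = schedule(lengthMi, arrival, 2, 1.0);
        for (int i = 0; i < rows.length; i++) {
            System.out.println(String.format("G%d start=%.1f finish=%.1f elapsed=%.1f",
                    i + 1, rows[i][0], rows[i][1], rows[i][2]));
        }
    }
}
```

G3 and G4 arrive while both machines are busy, so they start only when G1 and G2 finish (at 6.5 and 8.6), which is why their elapsed times exceed their request lengths.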


Figure 5: GA convergence for various populations with respect to processor utilization

VII. CONCLUSION

With the rapid increase in the number of internet users, load balancing is becoming “mission-critical”, as its solution has to cover issues such as redundancy, scalability, flexibility and QoS. In this paper, we considered the web hotspot as our key problem, with distributed systems as the problem domain, and proposed a GA approach for load distribution. Our aim was to assign requests among the servers in such a way that every request gets processed, even in situations such as a web hotspot, where the load on the site suddenly becomes very high. To apply the GA, we formulated the load balancing problem as a process scheduling policy. A modified genetic algorithm is introduced with an objective function equal to the number of unprocessed requests/processes in a random population. We simulated the proposed model using the GridSim distributed system simulator, with space-shared scheduling of resources. The results show that the GA converges towards maximum processor utilization for the different population sizes tested. Even with an increasing number of processes, the proposed algorithm converges towards the optimal solution, and for varying numbers of generations it gives a near-optimal result. We did not compare our work with any previous work, and more parameters remain to be evaluated for various situations. In future work, we will try to simulate the proposed algorithm with more parameters and in different problem domains.

VIII. ACKNOWLEDGEMENT

In concluding the paper, we would like to thank those who gave us their constant guidance and encouragement during the work. We express our gratitude to the faculty members of the Department of Computer Science & Engineering and Information Technology for their kind help and encouragement. Lastly, we thank our parents and all the friends who helped us in one way or another.

REFERENCES

[1] Weibin Zhao, “Towards Autonomic Computing: Service Discovery and Web Hotspot Rescue”, Columbia University, 2006.

[2] Albert Y. Zomaya, Yee-Hwei Teh, “Observations on Using Genetic Algorithms for Dynamic Load-Balancing”, IEEE Transactions on Parallel and Distributed Systems, Vol. 12, No. 9, September 2001.

[3] Priyanka Gonnade, Sonali Bodkhe, “An Efficient load balancing using Genetic algorithm in Hierarchical structured distributed system”, International Journal of Advanced Computer Research, Volume 2, Number 4, Issue 6, December 2012.

[4] Valeria Cardellini, Michele Colajanni, Philip S. Yu, “Dynamic Load Balancing on Web-Server Systems”, IEEE Internet Computing, Vol. 3, No. 3, pp. 28-39, May-June 1999.

[Figure 5 plot: processor utilization (%) against number of processors, for population sizes 40, 80 and 100.]


Figure 6: Performance comparison of processor utilization with respect to number of generations

[5] Harikesh Singh, Shishir Kumar, “Dispatcher Based Dynamic Load Balancing on Web Server System”, International Journal of Grid and Distributed Computing, Vol. 4, No. 3, September 2011.

[6] B. Mortazavi and G. Kesidis, “Cumulative Reputation Systems for Peer-to-Peer Content Distribution”, in Proceedings of the IEEE Annual Conference on Information Sciences and Systems, pp. 1546-1552, 22-24 March 2006.

[7] Brighten Godfrey, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, Ion Stoica, “Load Balancing in Dynamic Structured P2P Systems”, IEEE INFOCOM 2004.

[8] Kalman Graffi, Sebastian Kaune, Konstantin Pussep, Aleksandra Kovacevic, Ralf Steinmetz, “Load Balancing for Multimedia Streaming in Heterogeneous Peer-to-Peer Systems”, NOSSDAV, Braunschweig, Germany, 2008.

[9] Ananth Rao, Karthik Lakshminarayanan, Sonesh Surana, Richard Karp, Ion Stoica, “Load Balancing in Structured P2P Systems”, Elsevier Science Publishers B. V., Amsterdam, The Netherlands, Volume 63, Issue 3, March 2006.

[10] Song Fu, Cheng-Zhong Xu, Haiying Shen, “Random Choices for Churn Resilient Load Balancing in Peer-to-Peer Networks”, in Proceedings of the 22nd ACM/IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2008.

[11] Kumara Sastry, David Goldberg, Graham Kendall, “Genetic Algorithms”, Search Methodologies, Springer, pp. 97-125, 2005.

[12] Vinay Harsora, Apurva Shah, “A Modified Genetic Algorithm for Process Scheduling in Distributed System”, IJCA Special Issue on “Artificial Intelligence Techniques - Novel Approaches & Practical Applications”, AIT, 2011.

[13] D. E. Goldberg, “The Design of Innovation: Lessons from and for Competent Genetic Algorithms”, Kluwer, Boston, MA, 2002.

[14] Yu-Kwong Kwok and Lap-Sun Cheung, “A new fuzzy-decision based load balancing system for distributed object computing”, Elsevier Journal of Parallel and Distributed Computing, 2003.

[15] Bibhudatta Sahoo, Sudipta Mohapatra, and Sanjay Kumar Jena, “A Genetic Algorithm Based Dynamic Load Balancing Scheme for Heterogeneous Distributed Systems”, Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Vol-2, Issue-July, 2008.

Books:

[16] Tony Bourke, “Server Load Balancing”, O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, August 2001.

[Figure 6 plot: processor utilization against number of generations (10 to 30).]

