Scaling and Scheduling to Maximize Application Performance within Budget Constraints in Cloud Workflows

Ming Mao Department of Computer Science

University of Virginia Charlottesville, VA 22904 USA

[email protected]

Marty Humphrey Department of Computer Science

University of Virginia Charlottesville, VA 22904 USA

[email protected]

Abstract—It remains a challenge to provision resources in the cloud such that performance is maximized and financial cost is minimized. A fixed budget can be used to rent a wide variety of resource configurations for varying durations. The two steps, resource acquisition and scheduling/allocation, are dependent on each other and are particularly difficult when considering complex resource usage such as workflows, where task precedence needs to be preserved and the budget constraint is assigned to the whole cloud application instead of every single job. The ability to acquire resources dynamically and trivially in the cloud, while incredibly powerful and useful, exacerbates this particular resource acquisition and scheduling problem. In this paper, we design, implement and evaluate two auto-scaling solutions to minimize job turnaround time within budget constraints for cloud workflows. The scheduling-first algorithm distributes the application-wide budget to each individual job, determines the fastest execution plan and then acquires the cloud resources, while the scaling-first algorithm determines the size and the type of the cloud resources first and then schedules the workflow jobs on the acquired instances. The scaling-first algorithm shows better performance when the budget is low, while the scheduling-first algorithm performs better when the budget is high. The two algorithms can reduce the job turnaround time by 9.6% - 45.2% compared to choosing a fixed general machine type. Moreover, they show good tolerance (between -10.2% and 16.7%) to inaccurate parameters (±20% estimation error).

Keywords-cloud; autoscaling; scaling/provisioning; scheduling/allocation; workflows

I. INTRODUCTION

Maximizing the return on the cloud investment is a major goal for cloud users. The key to successful cloud adoption is to acquire and allocate cloud resources in a cost-efficient way. The cloud offers on-demand computing power and storage capacity and charges users based on the pay-as-you-go model. However, such dynamic scalability does not by itself remove the risk of ill-sized resources. The fact that cloud users need to control the underlying resources based on a continuously changing workload and differing performance requirements, and at the same time determine job placement in an optimized way, makes this a very challenging problem. Either resource over/under-provisioning or inefficient job placement could hurt application performance or cost users more than necessary. This problem essentially sets up a great barrier to cloud adoption.

Figure 1. The three-layer cloud application model. [Cloud providers sell resources to service providers, who serve service customers. Service providers with an unlimited budget aim to meet application performance requirements (e.g., finish within job deadlines) at minimum cost; service providers with a limited budget aim to maximize utility (e.g., minimize job turnaround time) within the budget constraints.]

In the three-layer cloud application model (Figure 1), there are three roles: Infrastructure-as-a-Service (IaaS) cloud providers who offer different types of virtual machines (VMs), such as Amazon AWS [1], Windows Azure [2] and Rackspace [3]; service providers who build their value-added services using the cloud resources; and service customers (end users) who process their jobs through the services built on the cloud resources. The service providers purchase the cloud resources (VMs) from the cloud providers and serve the client requests submitted by the service customers. In this scenario, the cloud providers charge the service providers for the consumed resources, and the service providers may or may not charge the service customers, depending on their business goals. Some service providers charge the service customers for the services they provide. As long as the revenue of serving each client request is higher than the cloud cost, the service provider is making a profit; in effect, these service providers have an unlimited budget, because for every penny they spend in the cloud, they charge more from the customers. Their goal in using cloud resources is to minimize the cost (and hence maximize the profits) while meeting application performance requirements. Such service providers include Netflix, Foursquare, etc. Other service providers mainly use the cloud as an internal computing service instead of a source of profit, such as the IT department of a scientific research organization, and they normally have a budget cap when purchasing the cloud resources that is approved by the finance department every fiscal year or constrained by the available project funding. For such service providers, the goal is to get the fastest performance within the budget constraints so as to maximize the return on the cloud investment. These budget-limited service providers are the targets of this paper.


We assume they have a budget cap in the form of dollars per time quantum, and the goal is to help them maximize their application performance through cost-efficient resource scaling and scheduling decisions.

In the cloud, an auto-scaling solution is a mechanism that can acquire/release cloud resources in an autonomic way. In particular, an auto-scaling mechanism in an IaaS cloud should meet the following requirements. (1) It should determine not only the number of VM instances but also the types of the instances; in other words, both horizontal scaling and vertical scaling. (2) It should consider the workload distribution (i.e., job placement) on the acquired instances. (3) It should help the users achieve their optimization objectives, such as maximizing resource utilization, meeting job deadlines, minimizing cost or optimizing other service level agreement goals. (4) It should be dynamic and automatic, accommodating unpredictable workload changes and VM startup delays. Even though several research projects and industrial solutions (see Section II) try to solve this resource provisioning problem, there is a lack of research that targets a continuously running cloud application that processes unpredictable workflow jobs. From the solution perspective, the relationship and differences between resource provisioning and allocation have not been sufficiently discussed. In this paper, we design and implement two cloud auto-scaling algorithms to help service providers dynamically provision and allocate their cloud resources in a cost-efficient way. We believe the two solutions are helpful to budget-limited/sensitive service providers such as non-profit organizations and early-stage startup companies. We highlight the challenges and contributions of this work as follows.

• One unique aspect of this work is that it targets an application built in the IaaS cloud, not some specific workload (e.g., a job or a set of jobs). The budget is in the form of dollars/hour instead of dollars/job, because the service runs continuously and the workload is not known in advance. This is the first effort (to the best of our knowledge) that addresses budget constraints in the form of dollars per time quantum (hour). We also propose several budget allocation schemes to divide a budget over a longer horizon into shorter periods.

• We explicitly consider resource provisioning and allocation as two separate processes and develop two auto-scaling solutions, scaling-first and scheduling-first, to solve this circular reference problem. The two algorithms make resource provisioning and job placement decisions in different orders, and they show different performance under different budget constraints, while most related research falls into the scheduling-first category.

• We design an instance consolidation process to handle the time quantum fragmentation problem, which has not been sufficiently discussed in the previous literature. In addition to quantifying the benefits of the instance consolidation process, we also analyze the mechanism's capability of handling inaccurate parameters and the mechanism's overhead.

The rest of the paper is organized as follows. Section II discusses the related work. Section III describes the application model and formalizes the problem. Section IV proposes two cloud auto-scaling solutions, the scheduling-first and scaling-first algorithms. Section V evaluates the solutions, compares their advantages in different budget ranges and analyzes the mechanism overhead. Finally, Section VI concludes the paper.

II. RELATED WORK

In this section, we compare our work to previous research along several criteria.

Fixed-size resource pool vs. fixed budget. Compared to a fixed-size resource pool [4][5][6], the cloud offers more flexibility. With the same amount of budget, service providers can choose the most cost-efficient resource combination and therefore maximize their return on the cloud investment. However, this flexibility also significantly increases the complexity of the problem. Because the resource acquisition and job scheduling steps are not independent of each other, an auto-scaling mechanism needs to determine the size of the underlying resource pool while making the scheduling decisions. In this paper, we propose two solutions to this circular reference problem.

Job based vs. application based. In both the grid and cloud environments, there is workflow scheduling with multiple optimization criteria, such as deadline, cost, security and other SLAs [7][8][9][10][11], in which the job scheduling decision is made for each single job individually and independently. In this paper, however, we assume the workload is a stream of workflow jobs submitted to an application running in the cloud. In other words, the resource scaling and job scheduling decisions are based on the whole application workload instead of a single job (or a set of jobs). One notable difference is that the budget is in the form of $/hour instead of $/job.

Formal methods vs. heuristics. From the solution perspective, there are generally two approaches to this resource management problem in the cloud. One is using formal methods. One of the most popular solutions is to convert the resource allocation optimization problem into a linear programming problem with constraints (such as job deadlines and cost) [12][13][14][15]. The output of this approach is the number of instances for each VM type. The key lies in the conversion from the resource management problem to the mathematical model. However, the linear programming approach may not scale well when the number of parameters grows large, and it is not trivial to map a workflow to the linear programming model. The other approach is heuristics. Research works either developed their own ideas or extended algorithms from previous literature to solve the problem [7][9][10][11][16]. The core idea of these heuristics is to maximize some utility or performance metric in each decision-making step, so the final result approaches the optimum through a greedy process. Our solutions in this paper fall into the heuristic category. We consider the weighted average of the job turnaround time as the application performance metric.
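As an illustration of the formal approach (our sketch, not a formulation taken from [12][13][14][15]), such a program might take the following shape, where nv is the number of instances of VM type v, cv its hourly price and capv its processing capacity; with integral instance counts it is strictly an integer program:

    minimize    ∑v cv × nv                       (total hourly cost)
    subject to  ∑v capv × nv ≥ workload demand   (enough capacity for the tasks)
                estimated finish time ≤ deadline
                nv ∈ {0, 1, 2, …}                (whole instances only)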


Minimize job response time vs. maximize utility. There are different optimization objectives for the resource management problem in the previous literature. For example, [16] proposed an auto-scaling mechanism to minimize cost and meet application deadlines for cloud workflows. [17] studied the relationship between service profits and customer satisfaction based on utility theory. [12] developed a pricing model to maximize the service provider profits based on queuing theory. [18] proposed a selection model to meet multiple service provider SLA parameters in the cloud. In this paper, we assume the service provider is a budget-limited/sensitive organization instead of a profit-making service provider. The goal is to minimize the job turnaround time within budget constraints.

Research vs. industrial practice. Cloud providers and third-party cloud services [1][19][20] have developed schedule-based (e.g., time-of-day) and rule-based (e.g., CPU utilization thresholds) auto-scaling mechanisms to help cloud service providers dynamically acquire/release resources. These mechanisms are simple and convenient. However, it is not always straightforward for users to select the "right" scaling indicators and thresholds, especially when the application models are complex and the resource utilization indicators are limited and very low-level. In other words, these trigger mechanisms do not really solve the performance-resource mapping problem, and sometimes the resource utilization indicators are not expressive enough to address user performance requirements directly. Moreover, these trigger mechanisms do not carefully consider the availability of multiple different VM types, which should be chosen wisely based on the workload. More importantly, these trigger mechanisms do not consider the user budget constraints; they tend to minimize the cost instead of maximizing the performance. In this paper, we directly target the application performance metric, the job turnaround time, and design two solutions to acquire cloud VMs within budget constraints rather than to minimize cost.

III. PROBLEM DEFINITION

In this section, we define the auto-scaling problem of minimizing job turnaround time within budget constraints for cloud workflows. We first describe a motivating example and the assumptions, and then formalize the problem.

Motivating Example. The department of environmental science plans to conduct watershed modeling research in the cloud. The domain scientists first collect and process a large amount of data from multiple field observation sites. Then they perform model calibrations to determine the appropriate equation parameters to describe the watershed conditions. They also perform Monte Carlo simulations to predict future watershed movements. As a budget-sensitive research organization, they decide to move the computing infrastructure into the cloud and process the research workload using cloud resources. The IT team therefore builds the computing service in the cloud and runs the application with a budget constraint (e.g., $10/hour). The scientific application is built on an IaaS cloud, such as Amazon EC2. The VMs allow users to take full control of the software stack, install customized binaries and make persistent configuration changes. These customized OS images can be saved and reused, and ideally serve as the templates for the scalable services or components. This application will process workflow jobs submitted by the department students, domain scientists and researchers, as well as other collaborating organizations. One important goal is to get the fastest performance for all the jobs within the budget cap. In other words, they need an auto-scaling mechanism to minimize the job turnaround time while making sure the running cost does not exceed the budget constraint.

In such a cloud application, we assume the workload is a stream of workflow jobs that are continuously submitted by service customers. Every workflow job is associated with a priority and composed of connected sub-tasks (or simply tasks) in the form of a directed acyclic graph (DAG). These tasks need to be processed in order, and a task may have different execution times on different types of VMs. For example, a compute-intensive task runs faster on a high-CPU machine than on a high-I/O machine. These machines have different prices. Therefore, a task may prefer one machine type over another considering different metrics, such as execution time, cost or cost-efficiency. For example, as shown in Figure 2, job1 has four tasks. Considering the shortest execution time possible, tasks are scheduled (dashed line) on their preferred (fastest) machine types. While making the scheduling decisions (mapping a task to a VM), the service providers at the same time need to dynamically acquire VMs from the cloud providers based on the workload and within the budget constraints. The role of a budget-aware auto-scaling mechanism is to help the service providers determine the number and the type of the acquired resources and allocate those resources to the workflow jobs in a cost-efficient way so as to reduce the weighted average job turnaround time. Based on the description above, we formally define the problem as follows.

Figure 2. A motivating example. [Service customers (domain scientists) submit workflow jobs, e.g., a Monte Carlo simulation with tasks T1-T4, to the service providers (IT team), who acquire VM types 1-3 from the cloud providers (e.g., EC2) at $x/hour each.]

Jobs. There are multiple job classes Jc. Every job class Jc is defined as a tuple with two elements: the workflow (wc), which describes the processing order of the tasks, and the priority (pc), which indicates the job class's importance. A larger pc implies a higher priority, and a job with high priority should be finished earlier than a job with low priority (assuming the two jobs have the same workflow and are submitted at the same time).

Jc = (wc, pc)

Service customers can submit several instances of the same job class. For example, two students submit the same Monte Carlo simulation jobs; there are then two jobs j1 and j2 submitted in the cloud application, and they belong to the same job class Jmc. For each job, we assume that we can record the job submission time (jsubmit) and the finish time (jfinish). We define the job turnaround time for job j as

jturnaround = jfinish − jsubmit

Task. The workflow w is composed of tasks. It is defined as a directed acyclic graph, w = DAG{task}, and the tasks need to be processed in order within the workflow. Every task inherits the same priority from its job. Tasks have different execution times on different VMs, and the execution time is not necessarily proportional to the VM price (i.e., an expensive machine does not imply a faster machine). We assume we know the task execution time on different VM types, which can be achieved by using existing performance estimation techniques [21][22][23]. As in [10][16], we also assume that the input/output data for each job is stored in a shared cloud storage system and that the intermediate data transfer times are included in task runtimes. Therefore, we can always rank the machines for a task from the fastest to the slowest based on the execution time. The execution time of one task running on a VM type is defined below. Moreover, when a task is scheduled on a VM, it cannot be preempted by other tasks, even those with higher priorities.

exe(task, VMtype)

IaaS VM. The cloud provider offers different types of VMs. Every VM type includes two pieces of information: the price ctype, in terms of $x/hour, and the VM startup delay lagtype. Note that in the cloud an instance can be acquired at any time, but it may take some time for the instance to be ready to use [24][25]. We assume a task can run on any type of VM because the execution binary is saved in the OS image. Starting a VM implies adding an execution instance for any service component.

VMtype = (ctype, lagtype)

Budget. The cloud application has a budget constraint B, in the form of $x/hour. In other words, the cost of all the running instances belonging to the cloud application at any time t cannot exceed the budget constraint B.

∑ cost(VMinstance)t ≤ B

Workload. We assume the application owner does not know the incoming requests in advance. Therefore, the workload is defined as all the jobs that have been submitted into the application; they are either waiting to be processed or partially processed. At some time t, we use all the tasks that are waiting to be processed to represent the workload Wt.

Wt = ∑Jc ∑j∈Jc ∑ taskwaiting

Goal. Because we are dealing with an application in which the workload is a stream of workflow jobs, the weighted average job turnaround time is defined as the performance metric to represent the overall application execution speed in the observation period. A smaller average job turnaround time means faster execution (better performance) of the application. The goal of the auto-scaling mechanism is therefore to minimize the weighted average of the job turnaround time.

Min(weighted(jturnaround)) = Min(∑j (pj × jturnaround) / ∑j pj)

Output. The output of the auto-scaling mechanism is two plans: the scaling plan and the scheduling plan. The scaling plan determines the number of instances for each VM type, while the scheduling plan determines the running instance for each task.

scaling plan: scalingt = {VMtype → number}
scheduling plan: schedulingt = {task → VMinstance}

IV. SOLUTION

Scheduling the processing nodes of a workflow graph onto a set of available machines is a well-known NP-hard problem. In this section, we propose two heuristic solutions for the auto-scaling problem. These two solutions make scaling and scheduling decisions in different orders and are named the scheduling-first algorithm and the scaling-first algorithm respectively.

A. Scheduling-First Algorithm
The idea of the scheduling-first algorithm is to first allocate the service provider budget to each individual job based on the job priority and then schedule as many tasks as possible on their fastest execution VMs within the job budget constraint. When the VM type for each task is determined, the auto-scaling mechanism acquires the VM instances based on the scheduling plan.

Step 1 – Distribute budget. Budget is allocated to individual jobs based on priority. High priority jobs get bigger budgets, so more of their tasks are scheduled on faster machines; low priority jobs get smaller budgets, so more of their tasks run on slower machines. A simple budget distribution scheme is to allocate the service provider budget to individual jobs proportionally to their priorities. For example, if there are three jobs with priorities 1, 2 and 3 and the service provider budget is $6/hour, then job1 gets $1/hour, job2 gets $2/hour and job3 gets $3/hour. This means that the cost of the running machines allocated to job1 cannot exceed $1/hour. Note that in this paper the budget is in the form of $/hour instead of $/job; the auto-scaling mechanism targets the cloud application as a whole instead of every single job. Therefore, we define the individual budget Bj for job j at any time t as follows.

Bj = B × pj / ∑k pk
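This proportional scheme is simple arithmetic; a small Python sketch (ours) reproduces the $6/hour example above:

    def distribute_budget(budget, priorities):
        # Bj = B * pj / sum of all job priorities
        total = sum(priorities.values())
        return {job: budget * p / total for job, p in priorities.items()}

    print(distribute_budget(6.0, {"job1": 1, "job2": 2, "job3": 3}))
    # -> {'job1': 1.0, 'job2': 2.0, 'job3': 3.0}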

Step 2 – Schedule tasks. For each job, the auto-scaling mechanism further allocates the budget Bj to the tasks that are running (taskr) and the tasks that are ready and waiting to run (taskw). The machine types for the running tasks are known, and their cost can be calculated easily; it is just the sum of the cost of all the running machines for that job. We denote this cost as cost(taskr). The idea is to allocate the remaining budget (Bj - cost(taskr)) to the waiting tasks. If the remaining budget is less than or equal to 0, or there are no waiting tasks, the task scheduling process stops. Otherwise, the auto-scaling mechanism allocates the remaining budget to the waiting tasks and schedules them on their fastest machines until there are no waiting tasks or the budget is insufficient (see Algorithm 1, lines 4-8).


Step 3 – Consolidate budget. After step 2, there may be remaining budget available for some individual jobs. Some budget is left because there are no waiting tasks; some budget is left because it is not big enough to acquire a new VM. The remaining budget of each job is returned to the system (Algorithm 1, line 10). The consolidated budget can then be used to purchase new machines for tasks that are ready but were not scheduled in step 2. This is like another round of task scheduling; however, this time we do not distribute the budget to individual jobs, but use it to directly purchase machines for high priority tasks among all the jobs. This means a job may actually receive a bigger budget than the one calculated in step 1, and only high priority jobs have this advantage.

Step 4 – Acquire instances. After the budget is distributed to every job and the scheduled VM type for each task is determined, the auto-scaling mechanism acquires the VMs based on the scheduling plan. If there are idle instances of the scheduled VM type, the tasks run on the idle instances first. When there are no idle instances of the VM type, the auto-scaling mechanism acquires more instances and schedules the tasks on the newly acquired instances. Because the scheduling plan is determined within the budget constraint, the auto-scaling mechanism ensures that the running cost will not exceed the budget constraint.

Figure 3. The scheduling-first algorithm. [Two jobs, with tasks T1-T4 and T5-T8 and a $0.5/hour budget each, are scheduled on small (S), medium (M) and large (L) machines; dashed lines mark the consolidate-budget step and the instance consolidation step.]

To better understand the idea of the scheduling-first algorithm, we use the example in Figure 3 to illustrate the four steps. Without loss of generality, we assume there are two jobs with the same priority submitted to the cloud application. There are three VM types available: large (L) priced at $.5/hour, medium (M) at $.3/hour and small (S) at $.1/hour. (Note that tasks have different execution times on different VM types; larger machines do not imply faster execution.) The system budget is $1/hour, so each job gets $.5/hour. Therefore, T1 and T5 are both scheduled on large machines. When T1 and T5 are finished, the auto-scaling mechanism further allocates the job budgets to the remaining tasks. In the example, T2 is scheduled on a small machine and T3 on a medium machine. At the same time, T6 is scheduled on a large machine, while T7 does not have sufficient budget. After the budget consolidation process, job1 returns $.1/hour to the system, and the returned budget can be used to schedule T7 on a small machine (the "consolidate budget" dashed line). At last, T4 is scheduled on a small instance and T8 on a large instance. Though T4 originally should run on a small machine, it is actually scheduled on the medium machine that finished T3. This process, called instance consolidation, tries to save partial instance hours; it is detailed later.

Algorithm 1 – scheduling-first
waitTasks: tasks that are ready and waiting to run
remainTasks: tasks that wait for budget consolidation
1: Allocate budget to individual jobs based on priority
2: for (each job)
3:   waitTasks ← GetReadyTasks(job)
4:   while (Bj > cost(VMcheapest) && waitTasks.size > 0)
5:     tempTask ← waitTasks.pop
6:     schedule tempTask on VMfastest_possible(tempTask)
7:     Bj ← Bj − cost(VMfastest_possible(tempTask))
8:   end while
9:   remainTasks.add(waitTasks)
10:  B ← B + Bj  //return Bj to the system B
11: end for
12: while (B > cost(VMcheapest) && remainTasks.size > 0)
13:   tempTask ← remainTasks.popByPriority
14:   schedule tempTask on VMfastest_possible(tempTask)
15:   B ← B − cost(VMfastest_possible(tempTask))
16: end while
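The per-job loop of Algorithm 1 (lines 4-8) can be sketched in Python as follows (ours; fastest_affordable is a hypothetical helper standing in for the VMfastest_possible lookup, returning the fastest VM type whose hourly price fits the given budget, or None):

    def schedule_job(job_budget, ready_tasks, vm_prices, fastest_affordable):
        # Map each ready task to its fastest affordable VM type within Bj.
        cheapest = min(vm_prices.values())
        plan, remain = {}, []
        for task in ready_tasks:
            vm = fastest_affordable(task, job_budget) if job_budget > cheapest else None
            if vm is None:
                remain.append(task)          # left for the budget consolidation round
            else:
                plan[task] = vm
                job_budget -= vm_prices[vm]  # Bj <- Bj - cost(VM)
        return plan, job_budget, remain      # leftover Bj is returned to the system B

The consolidation round (lines 12-16) then replays the same loop over the leftover tasks of all jobs, ordered by priority, against the returned system budget.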

B. Scaling-First Algorithm
The idea of the scaling-first algorithm is to first determine the type and the number of the cloud VMs within the budget constraint and then schedule the submitted jobs on the acquired resources based on job priority to minimize the weighted average job turnaround time. In other words, the scaling-first algorithm first makes resource scaling decisions and then makes job scheduling decisions.

Step 1 – Determine the number and the type of the instances. To determine the number and the type of the cloud instances, the scaling-first algorithm assumes all the tasks are scheduled on their fastest machines and calculates the cost Cfast of the cloud resources needed within the next hour. Because the system only has budget B, the ratio B/Cfast can be used to acquire the cloud resources proportionally for each VM type. For example, in Figure 4 there are three jobs submitted to the cloud application. We assume all the tasks are scheduled on their fastest machines within the next hour. T1 will be scheduled on a large machine running for 0.5 hour; T2 will be scheduled on a large machine as well, but for 1 hour; however, within the next hour it can only run on a large machine for 0.5 hour, because it starts only after T1 finishes. Similarly, T4 will consume 0 hours on a small machine because it cannot start within the next hour. The auto-scaling mechanism applies the same instance consumption calculation to all three jobs. Adding together the instance consumption of all three jobs, the total cost Cfast is $2.2/hour (3 large instances, 2 medium instances and 1 small instance; note that instance numbers are rounded up). Assuming the system budget is $1.4/hour, proportionally (B/Cfast) we can acquire 1 large instance, 1 medium instance and 1 small instance (instance numbers are rounded down considering the budget constraint).

Figure 4. Determine the VM type and number. [The per-type instance consumption of three jobs for the next hour is summed, giving Cfast = $2.2/hour (3 L, 2 M, 1 S); the ratio B/Cfast = 0.63 scales the counts down to 1 L, 1 M, 1 S.]

Step 2 – Consolidate budget. Instances are acquired proportionally to the ratio of budget over fastest cost (B/Cfast). However, there can be remaining budget, since the number of instances can only be an integer (rounding-down effects). In some cases, the remaining budget is large enough to purchase more machines. As shown in Figure 5, based on the ratio B/Cfast the auto-scaling mechanism acquires one instance for each VM type, at an actual cost of $0.9/hour (0.5 + 0.3 + 0.1). The remaining budget of $0.5/hour can therefore be used to purchase one more large machine. In other words, the auto-scaling mechanism finally acquires 2 large machines, 1 medium machine and 1 small machine.

Figure 5. Consolidate budget. [The proportional plan of 1 L, 1 M, 1 S costs $0.9/hour; with a $1.4/hour budget, the remaining $0.5/hour buys one more large instance.]
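Steps 1 and 2 together can be sketched in Python (ours; it works in integer cents to avoid floating-point drift, and its rounding path differs slightly from Figure 4, which keeps one small instance before the top-up, but the final allocation of 2 L, 1 M, 1 S matches Figure 5):

    def scale(budget, fast_counts, vm_prices):
        b = round(budget * 100)                            # budget in cents
        cents = {v: round(p * 100) for v, p in vm_prices.items()}
        c_fast = sum(cents[v] * n for v, n in fast_counts.items())
        # Step 1: scale the fastest-plan counts by B/Cfast and round down.
        plan = {v: n * b // c_fast for v, n in fast_counts.items()}
        spent = sum(cents[v] * n for v, n in plan.items())
        # Step 2: spend the leftover budget on more instances, priciest type first.
        for v in sorted(cents, key=cents.get, reverse=True):
            while spent + cents[v] <= b:
                plan[v] += 1
                spent += cents[v]
        return plan

    prices = {"L": 0.5, "M": 0.3, "S": 0.1}
    print(scale(1.4, {"L": 3, "M": 2, "S": 1}, prices))  # -> {'L': 2, 'M': 1, 'S': 1}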

Step 3 – Schedule jobs. Once the resources have been acquired, jobs are scheduled on their fastest machines based on priority. For each VM type there is a priority queue for the waiting tasks. A task that is ready to run enters the priority queue of its preferred (fastest) VM type and competes for the next available instance based on its priority (inherited from its job). For example, in Figure 6 only 2/3 of the machines can be acquired because of the budget constraint. All the tasks enter the priority queue of their preferred VM type. Because job2 has the highest priority, all its tasks are in the front; job3 has the lowest priority, so all its tasks are in the back; job1's tasks are in the middle.

Figure 6. The scaling-first algorithm. [Budget($2)/Fastest($3) = 2/3 of the fastest-plan machines are acquired; tasks from jobs 1-3 line up in per-type priority queues, with job2's tasks (highest priority) at the front.]

The scaling-first algorithm assumes that all the tasks will run on their fastest machines and calculates the cost of acquiring all these fastest machines at the same time. Because of the budget constraint, the auto-scaling mechanism can only acquire a fraction of them, based on the ratio B/Cfast. When the acquired instances are ready, the individual tasks compete for their fastest VMs based on job priority.

Algorithm 2 – scaling-first
fastConsumption: the vector of the number of instances for each VM type, initialized as [0,…,0]
capableConsumption: the number of instances that can be acquired within the budget constraint
1: fastConsumption ← [0,…,0]
2: for (each job)
3:   calculate the instance consumption (jobConsumption) for the next hour
4:   fastConsumption ← fastConsumption + jobConsumption
5: end for
6: costfast ← calculateCost(fastConsumption)
7: capableConsumption ← fastConsumption × (B/costfast)
8: while (true)
9:   if (runningCost + cost(VMcheapest) > B)
10:    goto line 18  //insufficient budget, go to scheduling
11:  for (each VM type VMv)
12:    if (size(VMv) < capableConsumption[VMv] && runningCost + cost(VMv) ≤ B)
13:      acquire one instance of VMv
14:      B ← B − cost(VMv)
15:    end if
16:  end for
17: end while
18: for (each VMv)
19:   schedule the tasks on the idle instances based on priority
20: end for
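The per-type priority queues of step 3 (Algorithm 2, lines 18-20) can be expressed with Python's heapq (a sketch of ours; heapq is a min-heap, so priorities are negated to pop the highest first, and the priority values below are illustrative):

    import heapq

    queues = {"L": [], "M": [], "S": []}   # one waiting queue per VM type

    def enqueue(task, priority, fastest_type):
        heapq.heappush(queues[fastest_type], (-priority, task))

    def next_task(vm_type):
        # Called when an instance of vm_type becomes idle.
        return heapq.heappop(queues[vm_type])[1] if queues[vm_type] else None

    # As in Figure 6: job2's T5 outranks job1's T1 and job3's T13.
    enqueue("T5", 3, "L"); enqueue("T1", 2, "L"); enqueue("T13", 1, "L")
    print(next_task("L"))  # -> T5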

C. Instance Consolidation
In the cloud, VM instances are billed by the hour, and partial instance hours are charged as full hours. The idea of mapping tasks to their fastest machines is to minimize job execution time as much as possible. However, this strategy does not consider the unutilized partial instance hours. In such cases, one effective strategy is to schedule tasks onto these unutilized instance hours to improve resource utilization and reduce job turnaround time.

The instance consolidation processes for the scheduling-first and scaling-first algorithms are similar. There are basically two cases. If some instance is idle and some ready tasks are waiting for their scheduled VM types, the auto-scaling mechanism can try to place a ready task on the idle machine. If the idle instance is faster than the originally scheduled machine type and no task will become ready before this task finishes, the auto-scaling mechanism can go ahead and schedule the task on the faster idling machine. If the idle instance is slower than the originally scheduled VM type, or tasks could become ready before this task finishes, we need to make sure of two things: first, scheduling the task on the idling machine finishes it earlier, i.e., the longer execution time still saves more time than waiting for the originally scheduled faster VM type; second, scheduling the waiting task on the slower machine will not affect the execution of future tasks. Figure 7 illustrates these two cases.

Figure 7. Instance consolidation. [Left: the simple case, where T9 waits for a small instance while medium instances sit idle. Right: the tricky case, where T9 is scheduled on a large machine but may move to an idle medium machine if it finishes earlier there and does not delay T10 or T11.]

As shown in Figure 7, in the simple case T9 is originally waiting for a small instance, but two idle medium instances are available. In this case, the auto-scaling mechanism schedules T9 on one medium machine to speed up the job execution (assuming T9 runs faster on the medium machine) and improve instance utilization. In the tricky case, T9 is originally scheduled on a large machine. Before the auto-scaling mechanism moves its execution to an idle medium machine, it needs to check two conditions. The first is that moving T9 to a medium machine will finish the task earlier than letting it wait for a large machine. The second is that, during the execution of T9, it will not delay the tasks scheduled on the medium instances. For example, only one task, T10 (which starts after T5 is finished), will become ready while T9 is executing on one medium machine, and T10 can be scheduled on the other medium machine. Because T11 can only start after T1 is finished, and T9 will have finished its execution by that time, there are no conflicts in this case.

Algorithm 3 – instance consolidation
1: for (each task in waitTasks)
2:   for (VM type v with exe(task,VMv) < exe(task,VMschedule))
3:     if (no conflicts with incoming tasks)
4:       schedule the task on one idle instance of VMv
5:       goto line 16
6:     end if
7:   end for  //simple case
8:   for (VM type v with exe(task,VMv) > exe(task,VMschedule))
9:     if (exe(task,VMv) < waitingTime + exe(task,VMschedule)
10:        && no conflicts with incoming tasks)
11:      schedule the task on one idle instance of VMv
12:      goto line 16
13:    end if
14:  end for  //tricky case
15: end for
16: //break the loop
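The "tricky case" test (Algorithm 3, lines 9-10) reduces to a single comparison plus a conflict check, sketched here (ours; conflict detection is abstracted into a boolean argument):

    def should_consolidate(exe_on_idle, exe_on_scheduled, wait_for_scheduled, conflicts):
        # Move a task to a slower idle instance only if it still finishes earlier
        # and does not delay tasks that become ready in the meantime.
        return exe_on_idle < wait_for_scheduled + exe_on_scheduled and not conflicts

    # e.g., 40 min on an idle medium vs. a 25 min wait plus 20 min on a large:
    print(should_consolidate(40, 20, 25, conflicts=False))  # -> True (40 < 45)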

D. VM Startup and Shutdown
In the cloud, service providers can request VM instances at any time. However, it may take several minutes or even longer for an acquired instance to be ready to use. The VM startup time varies with the cloud provider, image size, data center location, instance type, the number of instances requested at the same time, and other factors [24][25]. In the scheduling-first algorithm, the real execution time is therefore refined to lagv + exe(t,VMv) if the task needs to be scheduled on a newly acquired machine. For the scaling-first algorithm, because the tasks are placed in priority queues, the waiting time is less predictable, so we do not refine the execution time in that algorithm. In the evaluation section, we show that the proposed solutions have a good tolerance to inaccurate VM startup delay parameters.

The instance shutdown decision is simpler to make than the acquisition decision. Because instances are billed by full hours, in our solution the auto-scaling mechanism checks all idle instances that are approaching a full hour of operation. If no tasks are scheduled on them (scheduling-first) or the number of instances is more than needed (scaling-first), we can shut the VM down.
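The shutdown rule can be sketched as a simple predicate (ours; the 5-minute safety margin is an assumed threshold, not a value from the paper):

    def should_shutdown(is_idle, still_needed, uptime_seconds, margin=300):
        # Release an idle, unneeded instance just before it enters a new billed hour.
        seconds_into_hour = uptime_seconds % 3600
        return is_idle and not still_needed and seconds_into_hour >= 3600 - margin

    print(should_shutdown(True, False, 3500))  # -> True (100 s before the next hour)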

E. Budget Allocation Schemes
In this paper, we assume the budget constraint is in the form of dollars per hour instead of dollars per job. This is because the budget is assigned to a continuously running application, for which the start time and end time cannot be clearly identified. This is a big difference from a per-job budget constraint (in the form of dollars). We believe it is a more reasonable assumption from the service provider perspective, because service providers target the overall application performance instead of a single job or a set of jobs known in advance [9][10][11]. Furthermore, the dollars-per-hour form is consistent with the currently prevailing hourly pricing scheme in the cloud [1][2][3]. We propose three budget allocation schemes to convert a budget over a long time horizon into an hourly budget, illustrated in the sketch below. (1) The running application can distribute the total budget evenly over its service life cycle: if a yearly budget x is approved by the finance department, the daily budget and hourly budget will be x/365 and x/8760. (2) If the application experiences seasonal behavior, the budget can be allocated based on the workload. For example, holiday seasons and weekends normally see less workload than regular business hours, so a smaller hourly budget can be allocated to them. (3) If workload prediction techniques can be used, more fine-grained and accurate budget allocation strategies can be developed; this essentially converts the budget from $/hour to $/job.
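Schemes (1) and (2) amount to simple arithmetic; a minimal Python sketch (ours, with illustrative seasonal weights):

    def hourly_budget_even(yearly_budget):
        # Scheme (1): spread the yearly budget evenly (365 days x 24 hours = 8760).
        return yearly_budget / 8760

    def hourly_budget_weighted(yearly_budget, hour_weight, total_weight):
        # Scheme (2): give each hour a share proportional to its expected workload.
        return yearly_budget * hour_weight / total_weight

    print(hourly_budget_even(87600.0))  # -> 10.0 ($/hour)
    # If 4380 business hours carry weight 2 and 4380 off hours carry weight 1:
    total = 4380 * 2 + 4380 * 1
    print(hourly_budget_weighted(87600.0, 2, total))  # -> ~13.33 ($/hour, busy hours)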

F. The Overall Algorithm
As a component of a continuously running application, the auto-scaling mechanism works like a monitor-control loop. It runs periodically to collect updated workload and VM information and makes dynamic scaling and scheduling decisions in response to changes. The pseudo code of the overall mechanism is given in Algorithm 4.

Algorithm 4 – auto-scaling
1: while (true)
2:   collect the runtime information (VM + workload)
3:   run the scheduling-first or scaling-first algorithm
4:   run instance consolidation
5:   shut down idle instances approaching full-hour operation
6:   wait for the next monitoring interval
7: end while

Essentially, the scheduling-first algorithm makes local job scheduling decisions first, by distributing the system budget to each job, while the scaling-first algorithm makes resource acquisition decisions first, by looking at the overall workload within the next hour. Job priority affects the amount of budget allocated to each job in the scheduling-first algorithm, so high priority jobs have larger budgets and run faster; it affects the task waiting time for preferred resources in the scaling-first algorithm, so high priority jobs have shorter waiting times and run faster. Since jobs are associated with priorities, which heavily affect the budget and resource allocation decisions, job starvation is a potential problem. Starvation prevention mechanisms, such as aging, can be applied to ensure that old, low priority jobs are handled in time. However, this is not the focus of this paper, so we do not discuss it further.

V. EVALUATION

In this section, we evaluate the two proposed algorithms using four representative workload patterns and three application workflows [7][16]. Using synthetic workloads helps us control the input parameters and isolate the key factors in the auto-scaling mechanisms. It also covers a larger problem space than a specific application through different combinations and helps analyze the relationship between the mechanism performance and the budget constraints. Moreover, it speeds up the evaluation process and saves evaluation cost.

A. Workload, Application and VM
The three application workflows are pipeline, parallel and hybrid. The four workload patterns are stable, cycle, on-and-off and growing. Each of these workloads represents a typical application or scenario. For example, the growing workload pattern may represent a scenario in which a news item or video suddenly becomes popular and attracts more and more users, so the workload keeps increasing very fast. The cycle/bursting workload may represent a workload pattern with seasonality: daytime has more workload than night, and business hours have more workload than off hours.

Figure 8. The workflow applications and workload patterns. [Workflows: pipeline (T0-T4 in sequence) plus parallel and hybrid DAGs over tasks T0-T18; workload patterns: stable, growing, on-and-off and cycle/bursting.]

TABLE I. THE VMS

VM Type        Price
Micro          $0.02/hour
Standard       $0.080/hour
High-CPU       $0.66/hour
High-Memory    $0.45/hour
Extra-Large    $1.3/hour

Based on CloudSim [26], we developed a cloud application simulator. We simulated the five VM types listed in Table I; the configurations and prices are borrowed from EC2 [1]. A workload generator continuously submits workflow jobs according to the workload patterns. The average submission rate is 120 jobs per hour with a standard deviation of ±20 jobs. In particular, the growing pattern adds 20 jobs every hour, and the cycle pattern has a large peak of around 200 jobs per hour and a small peak of around 80 jobs per hour. Each job's priority is randomly chosen between 1 and 6. The auto-scaling component monitors the runtime workload information, acquires/releases VMs and schedules the tasks based on the two algorithms. The individual task execution time ranges from 3 minutes to 4 hours on different VMs, with a distribution discussed in [27]. All the workload statistics are generated based on previous studies [27][28] to represent real-world workload characteristics. In total, there are 12 combinations (3 workflow applications × 4 workload patterns) with 5 VM types. We simulate the application execution for 48 hours, and every combination is tested 3000 times with randomly generated task execution times. We discuss the results in the following sections.

B. Job Turnaround Time
We first report the performance (the weighted average job turnaround time over the observed 48 hours) of the two auto-scaling mechanisms and compare them with the strategy of choosing a fixed machine type (Standard). Their performance is shown in Figure 9, Figure 10 and Figure 11.

For all the workflow applications and workload patterns, both the scheduling-first and scaling-first algorithms achieve shorter average job turnaround times than choosing a fixed VM type. They reduce the job turnaround time by 9.6% - 45.2%, depending on the amount of available budget. When the budget is small (e.g., $5/hour), there is not much budget available to choose faster VM types, and therefore the performance improvement of the scheduling-first and scaling-first algorithms is small. When the budget becomes larger, the advantages of the two proposed algorithms are clearer. This is because a larger budget allows more tasks to run on faster machines, so the overall job turnaround time can be reduced more than with a small budget. This result confirms the importance of choosing appropriate instance types based on the workload and scheduling tasks on their preferred machines.

In our experiment settings, when the budget is small, the scaling-first algorithm works better (shorter job turnaround time) than the scheduling-first algorithm; when the budget is large, the scheduling-first algorithm works better. The reason behind this result is the trade-off between the waiting time for resources and the task performance degradation on slower machines. The scheduling-first algorithm starts as many tasks as possible, which essentially reduces the task waiting time. In contrast, the scaling-first algorithm always tries to schedule tasks on their fastest machines, which reduces the task execution time. In our experiment settings, budgets of $15/hour and $20/hour are close to the threshold. When the budget is below this threshold, the waiting time for faster machines is shorter than the performance degradation on the slower machines, so the scaling-first algorithm wins. When the budget is above the threshold, the performance degradation on slower machines becomes smaller than the waiting time for faster machines, so the scheduling-first algorithm wins. Moreover, in the growing workload pattern the scaling-first algorithm always beats the scheduling-first algorithm, because the budget always stays below the threshold given the continuously increasing workload. The relationship between the budget and the VM prices largely determines the performance of the two algorithms, because it essentially determines whether task waiting time or task execution time dominates.

The performance difference between the two algorithms is also related to the arrival pattern of the workload. If many jobs arrive at the cloud application around the same time, the scaling-first algorithm works better than the scheduling-first algorithm. The reason is that the scaling-first algorithm has a global view and can acquire instances based on all the tasks running within the next hour, whereas the scheduling-first algorithm acquires instances based on the ready tasks only; the scheduled VM types may not be cost-efficient picks for the subsequent tasks in the workflow. On the other hand, when jobs do not arrive in a batch pattern, the scheduling-first algorithm works better than the scaling-first algorithm. This is because the scheduling-first algorithm makes job-wise decisions, so a few jobs do not distort its global decisions much, whereas the scaling-first algorithm may make aggressive decisions based on the few jobs submitted in the system, which can result in bad VM choices for the future workload.

Figure 9. The job turnaround time for pipeline applications.

Figure 10. The job turnaround time for parallel applications.

Figure 11. The job turnaround time for hybrid applications.

[Each figure has four panels (Stable, Cycle, OnOff, Growing); the y-axis is the weighted average job turnaround time in seconds, and the x-axis is the budget from $5/hour to $30/hour, comparing Fixed VM Type, Scheduling-First and Scaling-First.]

C. Instance Consolidation
Cloud instances are billed by the hour, and partial instance hours are charged as full hours even when the instances are idling. The idea of instance consolidation is to take advantage of such unutilized instance hours by scheduling tasks on them. In this section, we show the reduction in job turnaround time and the improvement in resource utilization contributed by the instance consolidation process. Since the four workload patterns show similar performance, we only show the figures for the hybrid applications with the cycle workload pattern. From Figure 12 and Figure 13, we can see that instance consolidation helps improve resource utilization and, more importantly, reduce the job turnaround time. For both the scheduling-first and scaling-first algorithms, when the budget is small, the improvement attributable to instance consolidation is low, because the workload is large enough to feed the acquired resources and there is not much room for instance consolidation. When the budget is large, the improvement is also low, because most of the tasks are processed on their fastest machines and there is little need to consolidate instances. When the budget is in between, the benefit of instance consolidation is clearer. For example, in the budget range between $15/hour and $25/hour, the utilization rate improvement ranges from 2.2% to 19.9%, while the job turnaround time improvement ranges from 9.0% to 35.1%. Generally speaking, the scheduling-first algorithm benefits more from the instance consolidation process than the scaling-first algorithm. This is because the scheduling-first algorithm is a job-wise resource allocation scheme and has less global runtime information; instance consolidation is therefore a strong complementary strategy to fill those unutilized instance hours.

Figure 12. Instance consolidation for scheduling-first (hybrid + cycle).

Figure 13. Instance consolidation for scaling-first (hybrid + cycle).

D. Sensitivity to Inaccurate Parameters

In this paper, we assume the auto-scaling mechanism knows the task execution time and the VM startup time. However, such information may not always be available or accurate in practice, and cloud VM performance may not be stable due to multi-tenancy. In this section, we test our auto-scaling mechanism's sensitivity to inaccurate parameters. For both algorithms, we first allow the estimated task execution time to be evenly distributed around the real task execution time with ±20% error, and then we allow the estimated VM startup delay to be evenly distributed around the real VM startup delay, also with ±20% error.


We show the test results in Figure 14 and Figure 15. In both cases, we do not see significant performance degradation: the performance difference ranges from -10.2% to 16.7% in our experiment settings. This implies that the proposed auto-scaling mechanisms handle inaccurate parameters well. Moreover, we increase the estimation error of the task execution time to find the threshold at which the mechanism's performance degrades significantly. We find that when the estimation error reaches approximately ±60%, the results become significantly worse, with the job turnaround time increasing by more than 48.3%. The main reason is that the algorithms can no longer correctly rank the machines of different VM types by speed, so tasks treat slower machines as their preferred machine type. Therefore, as long as the auto-scaling mechanism can correctly rank the machine types by speed, inaccurate task execution times will not significantly hurt performance. In practice, the cloud VM startup time is much more stable than the task execution time [25], so we do not conduct the stress test on the estimation of the VM startup time.
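The perturbation used in this test can be reproduced in a few lines. The sketch below assumes uniform error centered on the true value; `perturb` and `ranking_preserved` are illustrative names, and the VM types and runtimes are made-up example data.

```python
import random

def perturb(true_value, max_error=0.20, rng=random):
    """Return an estimate drawn uniformly from true_value * [1 - e, 1 + e],
    i.e. evenly centered around the true value with +/-max_error."""
    return true_value * (1.0 + rng.uniform(-max_error, max_error))

def ranking_preserved(true_speed, estimated_speed):
    """Check whether the VM types still sort into the same speed order
    under the perturbed estimates -- the condition under which the
    mechanisms degrade gracefully."""
    rank = lambda d: sorted(d, key=d.get)
    return rank(true_speed) == rank(estimated_speed)

# Example: perturb a task's per-VM-type execution times with +/-20% error.
true_runtimes = {"small": 820.0, "medium": 410.0, "large": 230.0}
estimates = {vm: perturb(t) for vm, t in true_runtimes.items()}
print(ranking_preserved(true_runtimes, estimates))
```

Note that at ±20% error the example ranking above can never flip (the runtime gaps exceed the error band), which matches the observation that degradation only appears once the error is large enough to reorder the machine types.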

Figure 14. Sensitivity to inaccurate params (±20%) for scheduling-first (hybrid + cycle).

Figure 15. Sensitivity to inaccurate params (±20%) for scaling-first (hybrid + cycle).

E. Mechanism Overhead

In this section, we evaluate the overhead of the two algorithms. In particular, we report two overhead components: the overhead of the core algorithm (before instance consolidation) and the overhead of the instance consolidation (ic) process. The test runs on a desktop with an Intel 2.4 GHz quad-core CPU, 4 GB of memory, and a 512 GB (7200 RPM) hard disk. We record the running time of the two algorithms with different numbers of jobs (up to 100,000 jobs and 500 job classes).

From Figure 16, we can see that the scheduling-first algorithm has higher overhead than the scaling-first algorithm, and that the majority of the overhead comes from the instance consolidation process. In our implementation, the task scheduling plan (in the scheduling-first algorithm) and the instance-hour consumption (in the scaling-first algorithm) need to be calculated only once per job class; the results are stored in a cache for future lookup. Repeating the same calculation for incoming workflow jobs is therefore avoided, which greatly reduces the mechanism overhead. As a result, the overhead of the core algorithms is low and scales linearly with the number of jobs. The instance consolidation process, however, depends heavily on runtime information (i.e., the submitted jobs and running machines), which must be updated and recalculated every time, so it dominates the overall mechanism overhead. The scheduling-first algorithm has higher overhead because instance consolidation plays a bigger role for it than for the scaling-first algorithm.
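The per-job-class caching amounts to simple memoization, as in the minimal sketch below; `plan_for` and the `compute_plan` callback are hypothetical names standing in for the core algorithm's plan computation.

```python
# Cache keyed by (job class, budget share): jobs of the same class share
# DAG structure and task runtimes, so their plan is computed only once.
_plan_cache = {}

def plan_for(job_class, budget_share, compute_plan):
    """Return the scheduling plan (or instance-hour consumption) for a job,
    computing it on the first request per job class and caching the result."""
    key = (job_class, budget_share)
    if key not in _plan_cache:
        _plan_cache[key] = compute_plan(job_class, budget_share)
    return _plan_cache[key]
```

Instance consolidation, by contrast, depends on mutable runtime state (the current jobs and machines) and cannot be memoized this way, which is consistent with its dominant share of the measured overhead.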

Figure 16. The overhead of scheduling-first and scaling-first.

VI. CONCLUSIONS

In this paper, we propose two auto-scaling mechanisms that acquire and allocate cloud resources in a cost-efficient way. We assume that the workload is a stream of unpredictable workflow jobs, and that the service provider's goal is to minimize the weighted average job turnaround time while keeping the running cost below the budget constraint. The scheduling-first algorithm distributes the budget to individual jobs to determine the task scheduling plan first and then acquires the VMs, while the scaling-first algorithm calculates the size of the resource pool first and then schedules the jobs based on job priority. We show that the two algorithms have different advantages within different budget ranges: the scaling-first algorithm performs better when the budget is low, while the scheduling-first algorithm performs better when the budget is high. They can reduce the job turnaround time by 9.6% - 45.2% compared to choosing a fixed general machine type. The two algorithms show good tolerance to inaccurate parameters unless the estimated task execution times rank the machines incorrectly by speed. The instance consolidation process helps to improve the resource utilization rate and reduce the job turnaround time, but it also dominates the mechanism overhead.

In the future, we plan to extend this research in the following directions. (1) Auto-scaling mechanisms for data-intensive applications. In this paper, the performance and cost of data transfer and storage are treated as part of the task execution time. One of our ongoing research projects is to model intermediate data transfer performance explicitly and to design data placement strategies alongside resource provisioning/allocation decisions. Cloud storage such as S3 can be used for more efficient data broadcasting and higher data availability. (2) Auto-scaling mechanisms that maximize service utility. In this paper, we assume jobs are associated with priorities and the system goal is to minimize job turnaround time. We plan to extend this research with a different goal, service provider utility, in which jobs are associated with time-utility functions and the goal is to maximize the system utility. In such cases, not all jobs will be admitted by the cloud application, and the admitted jobs need to finish in time to avoid negative job utility; an admission control mechanism will be developed. (3) Auto-scaling mechanisms with workload prediction techniques. If workload information is known in advance, better budget allocation, resource acquisition, and job scheduling decisions can be made, improving the performance of both algorithms.

REFERENCES

[1] Amazon Web Services Auto Scaling. http://aws.amazon.com/autoscaling
[2] Windows Azure. http://www.windowsazure.com
[3] Rackspace. http://www.rackspace.com
[4] B. Chun and D. Culler, "User-centric performance analysis of market-based cluster batch schedulers," in Proc. 2nd IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May 2002.
[5] B. Chun and D. Culler, "Market-based proportional resource sharing for clusters," University of California, Berkeley, CA, USA, Technical Report CSD-1092, Jan. 2000.
[6] J. Sherwani, N. Ali, N. Lotia, Z. Hayat, and R. Buyya, "Libra: A computational economy based job scheduling system for clusters," Software: Practice and Experience, vol. 34, pp. 573-590, May 2004.
[7] J. Yu, R. Buyya, and C. Tham, "A cost-based scheduling of scientific workflow applications on utility grids," in Proc. 1st IEEE International Conference on e-Science and Grid Computing, Melbourne, Australia, Dec. 2005, pp. 140-147.
[8] R. Sakellariou and H. Zhao, "Scheduling workflows with budget constraints," in Integrated Research in Grid Computing, CoreGRID series, Springer-Verlag, 2005.
[9] M. Xu, L. Cui, H. Wang, and Y. Bi, "A multiple QoS constrained scheduling strategy of multiple workflows for cloud computing," in Proc. IEEE International Symposium on Parallel and Distributed Processing with Applications, pp. 629-634, Aug. 2009.
[10] M. Malawski, G. Juve, E. Deelman, and J. Nabrzyski, "Cost- and deadline-constrained provisioning for scientific workflow ensembles in IaaS clouds," in Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2012), Nov. 2012.
[11] H. Kllapi, E. Sitaridi, M. Tsangaris, and Y. Ioannidis, "Schedule optimization for data processing flows on the cloud," in Proc. 2011 ACM International Conference on Management of Data, pp. 289-300, 2011.
[12] Y. Lee, C. Wang, A. Zomaya, and B. Zhou, "Profit-driven service request scheduling in clouds," in Proc. 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2010), Melbourne, Australia, May 2010.
[13] M. Mao, J. Li, and M. Humphrey, "Cloud auto-scaling with deadline and budget constraints," in Proc. 11th ACM/IEEE International Conference on Grid Computing (Grid 2010), Brussels, Belgium, Oct. 2010.
[14] A. Oprescu and T. Kielmann, "Bag-of-tasks scheduling under budget constraints," in Proc. 2nd IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2010), Indianapolis, IN, USA, Nov.-Dec. 2010.
[15] R. Bossche, K. Vanmechelen, and J. Broeckhove, "Cost-optimal scheduling in hybrid IaaS clouds for deadline constrained workloads," in Proc. 3rd IEEE International Conference on Cloud Computing (CLOUD 2010), Miami, FL, USA, Jul. 2010.
[16] M. Mao and M. Humphrey, "Auto-scaling to minimize cost and meet application deadlines in cloud workflows," in Proc. International Conference for High Performance Computing, Networking, Storage and Analysis (SC 2011), pp. 1-12, Nov. 2011.
[17] J. Chen, C. Wang, B. Zhou, L. Sun, Y. Lee, and A. Zomaya, "Tradeoffs between profit and customer satisfaction for service provisioning in the cloud," in Proc. 20th International Symposium on High Performance Distributed Computing (HPDC 2011), San Jose, CA, USA, June 2011.
[18] L. Wu, S. Garg, and R. Buyya, "SLA-based resource allocation for software as a service provider (SaaS) in cloud computing environments," in Proc. 11th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing (CCGrid 2011), Newport Beach, CA, USA, May 2011.
[19] RightScale. http://www.rightscale.com
[20] Scalr. http://www.scalr.net/
[21] G. Nudd, D. Kerbyson, E. Papaefstathiou, S. Perry, J. Harper, and D. Wilcox, "PACE: A toolset for the performance prediction of parallel and distributed systems," International Journal of High Performance Computing Applications, vol. 14, no. 3, Aug. 2000.
[22] K. Cooper et al., "New grid scheduling and rescheduling methods in the GrADS project," in Proc. NSF Next Generation Software Workshop, International Parallel and Distributed Processing Symposium, Santa Fe, NM, USA, Apr. 2004.
[23] S. Jang et al., "Using performance prediction to allocate grid resources," GriPhyN Project, USA, Technical Report 2004-25, 2004.
[24] M. Mao and M. Humphrey, "A performance study on the VM startup time in the cloud," in Proc. 5th IEEE International Conference on Cloud Computing (CLOUD 2012), Honolulu, HI, USA, June 2012.
[25] Z. Hill, J. Li, M. Mao, A. Ruiz-Alvarez, and M. Humphrey, "Early observations on the performance of Windows Azure," in Proc. 19th ACM International Symposium on High Performance Distributed Computing (HPDC 2010), Chicago, IL, USA, June 2010.
[26] R. Buyya, R. Ranjan, and R. N. Calheiros, "Modeling and simulation of scalable cloud computing environments and the CloudSim toolkit: Challenges and opportunities," in Proc. 7th High Performance Computing and Simulation Conference (HPCS 2009), Leipzig, Germany, June 2009.
[27] S. Ostermann, A. Iosup, R. Prodan, T. Fahringer, and D. H. J. Epema, "On the characteristics of grid workflows," in Proc. Workshop on Integrated Research in Grid Computing (CGIW), pp. 431-442, 2008.
[28] A. Iosup and D. Epema, "Grid computing workloads," IEEE Internet Computing, vol. 15, no. 2, pp. 19-26, Mar.-Apr. 2011.
