On-demand, Spot, or Both: Dynamic Resource Allocation for Executing Batch Jobs in the Cloud

Ishai Menache, Microsoft Research

Ohad Shamir, Weizmann Institute

Navendu Jain, Microsoft Research

Abstract

Cloud computing provides an attractive computing paradigm in which computational resources are rented on-demand to users with zero capital and maintenance costs. Cloud providers offer different pricing options to meet the computing requirements of a wide variety of applications. An attractive option for batch computing is spot instances, which allow users to place bids for spare computing instances and rent them at an (often substantially) lower price compared to the fixed on-demand price. However, this raises three main challenges for users: how many instances to rent at any time? what type (on-demand, spot, or both)? and what bid value to use for spot instances? In particular, renting on-demand risks high costs, while renting spot instances risks job interruption and delayed completion when the spot market price exceeds the bid. This paper introduces an online learning algorithm for resource allocation to address this fundamental tradeoff between computation cost and performance. Our algorithm dynamically adapts resource allocation by learning from its performance on prior job executions while incorporating the history of spot prices and workload characteristics. We provide theoretical bounds on its performance and prove that the average regret of our approach (compared to the best policy in hindsight) vanishes to zero with time. Evaluation on traces from a large datacenter cluster shows that our algorithm outperforms greedy allocation heuristics and quickly converges to a small set of best performing policies.

1 Introduction

This paper presents an online learning approach that allocates resources for executing batch jobs on cloud platforms by adaptively managing the tradeoff between the cost of renting compute instances and the user-centric utility of finishing jobs by their specified due dates. Cloud computing is revolutionizing computing as a service due to its cost-efficiency and flexibility. By allowing multiplexing of large resource pools among users, the cloud enables agility: the ability to dynamically scale out and scale in application instances across hosting servers. Major cloud computing providers include Amazon EC2, Microsoft's Windows Azure, Google AppEngine, and IBM's Smart Business cloud offerings.

Figure 1: The variation in Amazon EC2 spot market prices for 'large' computing instances in the US East-coast region: Linux (left) and Windows (right). The fixed on-demand price for Linux and Windows instances is 0.34 and 0.48, respectively.

The common cloud pricing schemes are (i) reserved, (ii) on-demand, and (iii) spot. Reserved instances let users make a one-time payment to reserve instances for 1-3 years and then receive discounted hourly pricing on usage. On-demand instances allow users to pay for instances by the hour without any long-term commitment. Spot instances, offered by Amazon EC2, allow users to bid for spare instances and to run them as long as their bid price is above the spot market price. For batch applications with flexibility on when they can run (e.g., Monte Carlo simulations, software testing, image processing, web crawling), renting spot instances can significantly reduce execution costs. Indeed, several enterprises claim to save 50%-66% in computing costs by using spot instances over on-demand instances, or their combination [4].

Reserved instances are most beneficial for hosting long running services (e.g., web applications), and may also be used for batch jobs, especially if future load can be predicted [19]. The focus of this work, however, is on managing the choice between on-demand and spot instances, which are suitable for batch jobs that perform computation for a bounded period. Customers face a fundamental challenge of how to combine on-demand and spot instances to execute their jobs. On one hand, always renting on-demand incurs high costs. On the other hand, spot instances with a low bid price risk a high delay before the job gets started (until the bid is accepted), or frequent interruption during its execution (when the spot market price exceeds the bid). Figure 1 shows the variation in Amazon EC2 spot prices for the US east coast region for Linux and Windows instances of type 'large'. We observe that spot market prices exhibit significant fluctuation, and at times exceed even the on-demand price. For batch jobs requiring strict completion deadlines, this fluctuation can directly impact the result quality. For example, web search requires frequent crawling and updating of the search index, as the freshness of this data affects the end-user experience, product purchases, and advertisement revenues [2].

Unfortunately, most customers resort to simple heuristics to address these issues while renting computing instances; we exemplify this observation by analyzing several case studies reported on the Amazon EC2 website [4]. Litmus [16] offers testing tools to marketing professionals for their web site designs and email campaigns. Its heuristic for resource allocation is to first launch spot instances, and then on-demand instances if spot instances do not get allocated within 20 minutes. Their bid price is set to be above the on-demand price to improve the probability of their bid getting accepted. Similarly, BrowserMob [8], a startup that provides website load testing and monitoring services, attempts to launch spot instances first at a low bid price. If instances do not launch within 7 minutes, it switches to on-demand. Other companies manually assign delay-sensitive jobs to on-demand instances, and delay-tolerant ones to spot instances. In general, these schemes provide no payoff guarantees and no indication of how far they operate from the optimal cost vs. performance point. Further, as expected, these approaches are limited in terms of explored policies, which account for only a small portion of the state space. Note that a strawman of simply waiting for spot instances at the lowest price and purchasing in bulk risks delayed job completion, insufficient resources (due to the limit on spot instances and job parallelism constraints), or both. Therefore, given fluctuating and unpredictable spot prices (Fig. 1), users do not have an effective way of reinforcing the better performing policies.

In this paper, we propose an online learning approach for automated resource allocation for batch applications, which balances the fundamental tradeoff between cloud computing costs and job due dates. Intuitively, given a set of jobs and resource allocation policies, our algorithm continuously adjusts per-policy weights based on their performance on job executions, in order to reinforce the best performing policies. In addition, the learning method takes into account the prior history of spot prices and the characteristics of input jobs to adapt policy weights. Finally, to prevent overfitting to only a small set of policies, our approach allows defining a broad range of parameterized policy combinations (based on discussion with users and cloud operators) such as (a) rent on-demand, spot instances, or both; (b) vary spot bid prices in a predefined range; and (c) choose the bid value based on past spot market prices. Note that these policy combinations are illustrative, not comprehensive, in the sense that additional parameterized families of policies can be defined and integrated into our framework. Likewise, our learning approach can incorporate other resource allocation parameters provided by cloud platforms, e.g., Virtual Machine (VM) instance type and datacenter/region.

Our proposed algorithm is based on machine learning approaches (e.g., [9]), which aim to learn good performing policies given a set of candidate policies. While these schemes provide performance guarantees with respect to the optimal policy in hindsight, they are not applicable as-is to our problem. In particular, they require a payoff value per execution step to measure how well a policy is performing and to tune the learning process. However, in batch computing, the performance of a policy can only be calculated after the job has completed. Thus, these schemes do not explicitly address the issue of delay in getting feedback on how well a particular policy performed in executing jobs. Our online learning algorithm handles bounded delay and provides formal guarantees on its performance, which scale with the amount of delay and the total number of jobs to be processed.

We evaluate our algorithms via simulations on a job trace from a datacenter cluster and Amazon EC2 spot market prices. We show that our approach outperforms greedy resource allocation heuristics in terms of total payoff; in particular, the average regret of our approach (compared to the best policy in hindsight) vanishes to zero with time. Further, it provides fast convergence while only using a small amount of training data. Finally, our algorithm enables interpreting the allocation strategy of the output policies, allowing users to apply them directly in practice.

2 Background and System Model

In this section we first provide background on the online learning framework and then describe the problem setup and the parameterized set of policies for resource allocation.

Regret-minimizing online learning. Our online learning framework is based on the substantial body of work on learning algorithms that make repeated decisions while aiming to minimize regret. The regret of an algorithm is defined as the difference between the cumulative performance of the sequence of its decisions and the cumulative performance of the best fixed decision in hindsight. We present only a brief overview of these algorithms due to space constraints.

In general, an online decision problem can be formulated as a repeated game between a learner (or decision maker) and the environment. The game proceeds in rounds. In each round $j$, the environment (possibly controlled by an adversary) assigns a reward $f_j(a)$ to each possible action $a$, which is not revealed beforehand to the learner. The learner then chooses one of the actions $a_j$, possibly in a randomized manner. The average payoff of an action $a$ is the average of rewards $\frac{1}{J}\sum_{j=1}^{J} f_j(a)$ over the time horizon $J$, and the learner's average payoff is the average received reward $\frac{1}{J}\sum_{j=1}^{J} f_j(a_j)$ over the time horizon. The average regret of the learner is defined as $\max_a \frac{1}{J}\sum_{j=1}^{J} f_j(a) - \frac{1}{J}\sum_{j=1}^{J} f_j(a_j)$, namely the difference between the average payoff of the best action and the learner's sequence of actions. The goal of the learner is to minimize the average regret and approach the average gain of the best action. Several learning algorithms have been proposed that approach zero average regret as the time horizon $J$ approaches infinity, even against a fully adaptive adversary [9].
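As a concrete illustration of this definition, the following minimal sketch computes the average regret of a decision sequence from a payoff matrix; the array layout and function name are ours, purely for illustration.

```python
import numpy as np

def average_regret(payoffs, chosen):
    """Average regret of a sequence of chosen actions.

    payoffs: (J, n) array, payoffs[j, a] = f_j(a) for round j and action a.
    chosen:  length-J sequence of the actions a_j picked by the learner.
    """
    J = payoffs.shape[0]
    best_fixed = payoffs.sum(axis=0).max() / J        # best single action in hindsight
    learner = payoffs[np.arange(J), chosen].mean()    # learner's average payoff
    return best_fixed - learner
```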

Our problem of allocating between on-demand and spot instances can be cast as a problem of repeated decision making, in which the resource allocation algorithm must decide in a repeated fashion which policies to use for meeting job due dates while minimizing job execution costs. However, our problem also differs from standard online learning, in that the payoff of each policy is not revealed immediately after it is chosen, but only after some delay (due to the time it takes to process a job). This requires us to develop a modified online algorithm and analysis.

Problem Setup. Our problem setup focuses on a single enterprise whose batch jobs arrive over time. Jobs may arrive at any point in time; however, job arrivals are monitored every fixed time interval of $L$ minutes, e.g., $L = 5$. For simplicity, we assume that each hour is evenly divided into a fixed number of such time intervals (namely, $60/L$). We refer to this fixed time interval as a time slot (or slot); the time slots are indexed by $t = 1, 2, \ldots$.

Jobs. Each job $j$ is characterized by five parameters: (i) Arrival slot $A_j$: if job $j$ arrives at time $\in [L(t'-1), Lt')$, then $A_j = t'$. (ii) Due date $d_j \in \mathbb{N}$ (measured in hours): if the job is not completed after $d_j$ time units since its arrival $A_j$, it becomes invalid and further execution yields zero value. (iii) Job size $z_j$ (measured in CPU instance-hours to be executed): note that for many batch jobs such as parameter sweep applications and software testing, $z_j$ is known in advance. Otherwise, a small bounded over-estimate of $z_j$ suffices. (iv) Parallelism constraint $c_j$: the maximal degree of parallelism, i.e., the upper bound on the number of instances that can be simultaneously assigned to the job. (v) Value function $V_j : \mathbb{N} \to \mathbb{R}_+$, which is a monotonically non-increasing function with $V_j(\tau) = 0$ for all $\tau > d_j$.

Thus, job $j$ is described by the tuple $\{A_j, d_j, z_j, c_j, V_j\}$. The job $j$ is said to be active at time slot $\tau$ if less than $d_j$ hours have passed since its arrival $A_j$, and the total instance-hours assigned so far are less than $z_j$.
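For concreteness, a minimal Python sketch of this job model follows; the field and function names are ours and purely illustrative.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Job:
    """One batch job, following the five-parameter model above."""
    arrival_slot: int              # A_j: index of the L-minute slot in which the job arrives
    due_date: int                  # d_j: lifetime in hours after arrival
    size: float                    # z_j: total work, in CPU instance-hours
    parallelism: int               # c_j: max number of simultaneously assigned instances
    value: Callable[[int], float]  # V_j(tau): non-increasing, V_j(tau) = 0 for tau > d_j

def is_active(job: Job, slot: int, hours_assigned: float, slots_per_hour: int) -> bool:
    """Active = less than d_j hours since arrival and assigned work still below z_j."""
    hours_elapsed = (slot - job.arrival_slot) / slots_per_hour
    return hours_elapsed < job.due_date and hours_assigned < job.size
```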

Allocation updates. Each job $j$ is allocated computing instances during its execution. Given the existing cloud pricing model of charging based on hourly boundaries, the instance allocation of each active job is updated every hour. The $i$-th allocation update for job $j$ is formally defined as a triplet of the form $(o^i_j, s^i_j, b^i_j)$: $o^i_j$ denotes the number of assigned on-demand instances, $s^i_j$ denotes the number of assigned spot instances, and $b^i_j$ denotes their bid value. The parallelism constraint translates to $o^i_j + s^i_j \le c_j$. Note that a NOP decision, i.e., allocating zero resources to a job, is handled by setting $o^i_j$ and $s^i_j$ to zero.

Spot instances. The spot instances assigned to a job operate until the spot market price exceeds the bid price. However, as Figure 1 shows, the spot prices may change unpredictably, implying that spot instances can get terminated at any time. Formally, consider some job $j$, and let us normalize the hour interval to the closed interval $[0,1]$. Let $y^i_j \in [0,1]$ be the point in time at which the spot price exceeded the $i$-th bid for job $j$; formally, $y^i_j = \inf_{y\in[0,1]}\{p_s(y) > b^i_j\}$, where $p_s(\cdot)$ is the spot price, and $y^i_j \equiv 1$ if the spot price does not exceed the bid. Then the cost of utilizing spot instances for job $j$ for its $i$-th allocation is given by $s^i_j \cdot p^i_j$, where $p^i_j = \int_0^{y^i_j} p_s(y)\,dy$, and the total amount of work carried out for this job by spot instances is $s^i_j \cdot y^i_j$ (with the exception of the time slot in which the job is completed, for which the total amount of work is smaller). Note that under spot pricing, the instance is charged for the full hour even if the job finishes earlier. However, if the instance is terminated due to the market price exceeding the bid, the user is not charged for the last partial hour of execution. Further, we assume that the cloud platform provides advance notification of the instance revocation in this scenario.[1] Finally, as in Amazon EC2, our model allows spot instances to be persistent, in the sense that the user's bid will keep being submitted after each instance termination, until the job gets completed or the user cancels it.

[1] [23] studies dynamic checkpointing strategies for scenarios where customers might incur substantial overheads due to out-of-bid situations. For simplicity, we do not model such scenarios in this paper. However, we note that the techniques developed in [23] are complementary, and can be applied in conjunction with our online learning framework.
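A minimal sketch of this spot accounting, assuming the within-hour spot price $p_s(y)$ is available as samples on an even grid over $[0,1]$; the function and its arguments are ours, for illustration only, and the cost follows the integral definition above.

```python
import numpy as np

def spot_hour_outcome(spot_prices, bid, s_ij):
    """Cost and work of s_ij spot instances over one normalized hour.

    spot_prices: samples of p_s(y) on an even grid over [0, 1].
    Returns (cost, work): cost = s_ij * integral of p_s(y) up to the out-of-bid
    time y_ij, work = s_ij * y_ij instance-hours (y_ij = 1 if the bid is never exceeded).
    """
    spot_prices = np.asarray(spot_prices, dtype=float)
    dy = 1.0 / len(spot_prices)
    exceeded = np.nonzero(spot_prices > bid)[0]
    y_ij = 1.0 if exceeded.size == 0 else exceeded[0] * dy   # first time the price exceeds the bid
    k = int(round(y_ij / dy))
    cost = s_ij * spot_prices[:k].sum() * dy                 # p_ij = integral_0^{y_ij} p_s(y) dy
    return cost, s_ij * y_ij
```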

On-demand instances. The price for an on-demand instance is fixed and is denoted by $p$ (per unit per time interval). As above, the instance-hour is paid for entirely, even if the job finishes before the end of the hourly interval.

Utility. The utility for a user is defined as the difference between the overall value obtained from executing all its jobs and the total costs paid for their execution. Formally, let $T_j$ be the number of hours for which job $j$ is executed (the actual duration is rounded up to the next hour). Note that if the job did not complete by its lifetime $d_j$, we set $T_j = d_j + 1$ and allocation $a^{T_j}_j = (0,0,0)$. The utility for job $j$ is given by:

$$U_j(a^1_j, \ldots, a^{T_j}_j) = V_j(T_j) - \sum_{i=1}^{T_j} \left\{ p^i_j s^i_j + p \cdot o^i_j \right\} \qquad (1)$$

The overall user utility is then simply the sum of job utilities: $U(a) = \sum_j U_j(a^1_j, \ldots, a^{T_j}_j)$. The objective of our online learning algorithm is to maximize the total user utility.

For simplicity, we restrict attention to deadline value functions, which are value functions of the form $V_j(i) = v_j$ for all $i \in [1, \ldots, d_j]$ and $V_j(i) = 0$ otherwise, i.e., completing job $j$ by its due date has a fixed positive value. Note that our learning approach can be easily extended to handle general value functions.
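To make Eq. (1) concrete, here is a small helper (with hypothetical argument names) that evaluates a job's utility from its per-hour allocations, together with a deadline value function as just defined.

```python
def deadline_value(v_j, d_j):
    """Deadline value function: worth v_j if the job finishes within d_j hours, zero afterwards."""
    return lambda T: v_j if T <= d_j else 0.0

def job_utility(value_fn, allocations, p):
    """Utility of one job per Eq. (1).

    allocations: one triple per executed hour (o_ij, s_ij, p_ij), where p_ij is the
    per-instance spot cost of that hour (the integral defined above); p is the on-demand price.
    """
    T_j = len(allocations)
    cost = sum(p_ij * s_ij + p * o_ij for (o_ij, s_ij, p_ij) in allocations)
    return value_fn(T_j) - cost
```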

Remark. We make an implicit assumption that a user immediately gets the amount of instances it requests if the "price is right" (i.e., if it pays the required price for on-demand instances, or if its bid is higher than the market price for spot instances). In practice, however, a user might experience delays in getting all the required instances, especially if it requires a large number of simultaneous instances. While we could seamlessly incorporate such delays into our model and solution framework, we ignore this aspect here in order to keep the exposition simple.

Resource Allocation Policies. Our algorithmic framework allows defining a broad range of policies for allocating resources to jobs, and the objective of our online learning algorithm is to approach the performance of the best policy in hindsight. We describe the parameterized set of policies in this section, and present the learning algorithm that adapts these policies in detail in Section 3.

For each active job, a policy takes as input the job specification and (possibly) the history of spot prices, and outputs an allocation. Formally, a policy $\pi$ is a mapping of the form $\pi : \mathcal{J} \times \mathbb{R}_+ \times \mathbb{R}_+ \times \mathbb{R}^n_+ \to \mathcal{A}$, which for every active job $j$ at time $\tau$ takes as input:

(i) the job specification of $j$: $\{A_j, d_j, z_j, c_j, V_j\}$;

(ii) the remaining work for the job, $z^\tau_j$;

(iii) the total execution cost incurred for $j$ up to time $\tau$, namely $C^\tau_j \triangleq \sum_{t'=A_j}^{\tau-1} s^{t'}_j p^{t'}_j + p \cdot o^{t'}_j$; and

(iv) a history sequence $p_s(\cdot)$ of past spot prices.

In return, the policy outputs an allocation. As expected, the set of possible policies defines an explosively large state space. In particular, we must carefully handle all possible instance types (spot, on-demand, both, or NOP), different spot bid prices, and their exponential number of combinations in all possible job execution states. Of course, no approach can do an exhaustive search of the policy state space in an efficient manner. Therefore, our framework follows a best-effort approach to tackle this problem by exploring as many policies as possible in the practical operating range; e.g., a spot bid price close to zero has a very low probability of being accepted; similarly, bidding is futile when the spot market price is above the on-demand price. We address this issue in detail in Section 3.

An elegant way to generate this practical set of policies is to describe them by a small number of control parameters, so that any particular choice of parameters defines a single policy. We consider two basic families of parameterized policies, which represent different ways to incorporate the tradeoff between on-demand instances and spot instances: (1) Deadline-Centric. This family of policies is parameterized by a deadline threshold $M$. If the job's deadline is more than $M$ time units away, the policy attempts to allocate only spot instances. Otherwise (i.e., the deadline is getting closer), it uses only on-demand instances. Further, it rejects jobs if they become non-profitable (i.e., the cost incurred exceeds the utility value) or if it cannot finish them on time (since the deadline value function $V_j$ will become zero). (2) Rate-Centric. This family of policies is parameterized by a fixed rate $\sigma$ of allocating on-demand instances per round. In each round, the policy attempts to assign $c_j$ instances to job $j$ as follows: it requests $\sigma \cdot c_j$ on-demand instances (for simplicity, we ignore rounding issues) at price $p$. It also requests $(1-\sigma) \cdot c_j$ spot instances, using a bid price strategy which will be described shortly. The policy monitors the amount of the job processed so far, and if there is a risk of not completing the job by its due date, it switches to on-demand only. As above, it rejects jobs if they become non-profitable or cannot finish on time. Pseudo-code implementing this intuition is presented in Algorithm 1. The pseudo-code for the deadline-centric family is similar and thus omitted for brevity.

We next describe two different methods to set the bids for the spot instances. Each of the policies above can use either of the methods described below: (i) Fixed bid. A fixed bid value $b$ is used throughout. (ii) Variable bid. The bid price is chosen adaptively based on past spot market prices (which makes sense as long as the prices do not fluctuate too unpredictably). The variable bid method is parameterized by a weight $\gamma$ and a safety parameter $\varepsilon$ to handle small price variations. At each round, the bid price for spot instances is set as the weighted average of past spot prices (where the effective horizon is determined by the weight $\gamma$) plus $\varepsilon$. For brevity, we shall often use the terms fixed-bid policies or variable-bid policies to indicate that a policy (either deadline-centric or rate-centric) uses the fixed-bid method or the variable-bid method, respectively. Observe that variable-bid policies represent one simple alternative for exploiting the knowledge about past spot prices. The design of more "sophisticated" policies that utilize price history, such as policies that incorporate potential seasonality variation, is left as an interesting direction for future work.

ALGORITHM 1: Rate-centric Policy
Parameters (with Fixed-Bid method): on-demand rate $\sigma \in [0,1]$; bid $b \in \mathbb{R}_+$
Parameters (with Variable-Bid method): on-demand rate $\sigma \in [0,1]$; weight $\gamma \in [0,1]$; safety parameter $\varepsilon \in \mathbb{R}_+$
Input: job parameters $\{d_j, z_j, c_j, v_j\}$
If $c_j \cdot d_j < z_j$ or $p \cdot \sigma \cdot z_j > v_j$, drop the job  // job too large or too expensive to handle profitably
for each time slot $t$ in which the job is active do
    If the job is done, return
    Let $m$ be the number of remaining time slots till the job deadline (including the current one)
    Let $r$ be the remaining job size
    Let $q$ be the cost incurred so far in treating the job
    // Check if more on-demand instances are needed to ensure timely job completion
    if $(\sigma + m - 1)\min\{r, c_j\} < r$ then
        // Check if running the job with on-demand only is still worthwhile
        if $p \cdot r + q < v_j$ then
            Request $\min\{r, c_j\}$ on-demand instances
        else
            Drop the job
        end if
    else
        Request $\sigma \cdot \min\{r, c_j\}$ on-demand instances
        Request $(1-\sigma) \cdot \min\{r, c_j\}$ spot instances at price:
        • Fixed-Bid method: bid price $b$
        • Variable-Bid method: $\frac{1}{Z}\int_y p_s(y)\,\gamma^{\tau - y}\,dy + \varepsilon$, where $Z = \int_y \gamma^{\tau - y}\,dy$ is a normalization constant
    end if
end for
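To make the control flow above concrete, here is a hedged Python rendering of one allocation update of the rate-centric rule, with a discrete version of the variable-bid average (our reading of the integral above); the deadline-centric family differs only in switching from spot-only to on-demand-only once the deadline is at most $M$ slots away. All names are illustrative.

```python
def variable_bid(past_prices, gamma, eps):
    """Variable-bid method: gamma-weighted average of past spot prices plus a safety margin eps.

    past_prices are ordered oldest to newest; gamma = 0 reduces to using the last observed price.
    """
    weights = [gamma ** age for age in range(len(past_prices))]          # age 0 = most recent
    z = sum(weights) or 1.0                                              # normalization constant
    return sum(w * p for w, p in zip(weights, reversed(past_prices))) / z + eps

def rate_centric_step(m, r, q, c_j, v_j, p, sigma, bid):
    """One allocation update of the rate-centric policy (cf. Algorithm 1).

    m: remaining slots until the deadline (including this one); r: remaining job size;
    q: cost incurred so far; bid: output of the fixed- or variable-bid method.
    Returns (on_demand, spot, bid), or None to drop the job. Rounding is ignored, as in the text.
    """
    cap = min(r, c_j)
    if (sigma + m - 1) * cap < r:        # at risk of missing the deadline
        if p * r + q < v_j:              # on-demand-only completion is still profitable
            return (cap, 0, 0.0)
        return None                      # drop the job
    return (sigma * cap, (1 - sigma) * cap, bid)
```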

Note that these policy sets include, as special cases, some simple heuristics that are used in practice [3]; for example, heuristics that place a fixed bid or choose a bid at random according to some distribution (both with the option of switching to on-demand instances at some point). These heuristics (and similar others) can be implemented by fixing the weights given to the different policies (e.g., to implement a policy which selects the bid uniformly at random, set equal weights for policies that use the fixed-bid method and zero weights for the policies that use the variable-bid method). The learning approach which we describe below is naturally more flexible and powerful, as it adapts the weights of the different policies based on performance. More generally, we emphasize that our framework can certainly include additional families of parameterized policies; our focus on the above two families is for simplicity and proof of concept. In addition, our learning approach can incorporate other parameters for resource allocation that are provided by cloud platforms, e.g., VM instance type and datacenter/region. At the same time, some of these parameters may be set a priori based on user constraints; e.g., an 'extra-large' instance may be fixed to accommodate the large working sets of an application in memory, and a datacenter may be fixed due to application data stored in that location.

3 The Online Learning Algorithm

In this section we first give an overview of the algorithm, and then describe how the algorithm is derived and provide theoretical guarantees on its performance.

Algorithm Overview. The learning algorithm pseudo-code is presented as Algorithm 2. The algorithm works by maintaining a distribution over the set of allocation policies (described in Section 2). When a job arrives, it picks a policy at random according to that distribution, and uses that policy to handle the job. After the job finishes execution, the performance of each policy on that job is evaluated, and its probability weight is modified in accordance with its performance. The update is such that high-performing policies (as measured by $f_j(\pi)$) are assigned a relatively higher weight than low-performing policies. The multiplicative form of the update ensures strong theoretical guarantees (as shown later) and practical performance. The rate of modification is controlled by a step-size parameter $\eta_j$, which slowly decays throughout the algorithm's run. Our algorithm also uses a parameter $d$, defined as an upper bound on the number of jobs that arrive during any single job's execution. Intuitively, $d$ is a measure of the delay incurred between choosing which policy treats a given job and being able to evaluate its performance on that job. Thus, $d$ is closely related to the job lifetimes $d_j$ defined in Section 2. Note that while $d_j$ is measured in time units (e.g., hours), $d$ measures the number of new jobs arriving during a given job's execution. We again emphasize that this delay is what sets our setting apart from standard online learning, where the feedback on each policy's performance is immediate, and it necessitates a modified algorithm and analysis. The running time of the algorithm scales linearly with the number of policies, and thus our framework can deal with (polynomially) large sets of policies. It should be mentioned that there exist online learning techniques which can efficiently handle exponentially large policy sets by taking the set structure into account (e.g., [9], Chapter 5). Incorporating these techniques here remains an interesting direction for future work.

We assume, without loss of generality, that the payoff for each job is bounded in the range $[0,1]$. If this does not hold, then one can simply feed the algorithm with normalized values of the payoffs $f_j(\pi)$. In practice, it is enough for the payoffs to be on the order of $\pm 1$ on average for the algorithm to work well, as shown in our experiments in Section 4.

ALGORITHM 2: Online Learning Algorithm
Input: set of $n$ policies $\pi$ parameterized by $\{1, \ldots, n\}$; upper bound $d$ on jobs' lifetime
Initialize $w_1 = (1/n, 1/n, \ldots, 1/n)$
for $j = 1, \ldots, J$ do
    Receive job $j$
    Pick policy $\pi$ with probability $w_{j,\pi}$, and apply it to job $j$
    if $j \le d$ then
        $w_{j+1} := w_j$
    else
        $\eta_j := \sqrt{2\log(n)/(d(j-d))}$
        for $\pi = 1, \ldots, n$ do
            Compute $f_j(\pi)$, the utility for job $j - d$, assuming we had used policy $\pi$
            $w_{j+1,\pi} := w_{j,\pi}\exp\left(\eta_j f_j(\pi)\right)$
        end for
        for $\pi = 1, \ldots, n$ do
            $w_{j+1,\pi} := w_{j+1,\pi} / \sum_{r=1}^{n} w_{j+1,r}$
        end for
    end if
end for
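A compact Python sketch of this learning loop follows. It assumes a callable evaluate_policy(policy, job) that computes, in hindsight, the normalized utility a policy would have obtained on a job whose execution has completed; running the sampled policy on the arriving job is elided. All names are ours.

```python
import math
import random

def online_learning(policies, jobs, d, evaluate_policy):
    """Multiplicative-weights learning with delayed feedback (cf. Algorithm 2)."""
    n = len(policies)
    w = [1.0 / n] * n
    chosen = []
    for j, job in enumerate(jobs, start=1):
        pi = random.choices(range(n), weights=w)[0]    # sample a policy from w_j
        chosen.append(pi)
        # ... apply policies[pi] to `job` here ...
        if j <= d:
            continue                                    # feedback for job j - d not yet available
        eta = math.sqrt(2.0 * math.log(n) / (d * (j - d)))
        finished = jobs[j - d - 1]                      # job j - d, whose execution has completed
        w = [w_pi * math.exp(eta * evaluate_policy(policies[i], finished))
             for i, w_pi in enumerate(w)]
        total = sum(w)
        w = [w_pi / total for w_pi in w]                # renormalize to a distribution
    return w, chosen
```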

Derivation of the Algorithm. Next we provide a formal derivation of the algorithm as well as theoretical guarantees. The setting of our learning framework can be abstracted as follows: we divide time into rounds, such that round $j$ starts when job $j$ arrives. At each such round, we make some choice on how to deal with the arriving job. The choice is made by picking a policy $\pi_j$ from a fixed set of $n$ policies, which will be parameterized by $\{1, \ldots, n\}$. However, initially we do not know the utility of our policy choice, as future spot prices are unknown. We can eventually compute this utility in retrospect, but only after $\le d$ rounds have elapsed and the relevant spot prices are revealed.

Let $f_j(\pi_{j-d})$ denote the utility function of the policy choice $\pi_{j-d}$ made in round $j-d$. Note that according to our model, this function can be evaluated given the spot prices till round $j$. Thus, $\sum_{j=1+d}^{J+d} f_j(\pi_{j-d})$ is our total payoff from all the jobs we handled. We measure the algorithm's performance in terms of average regret with respect to any fixed choice in hindsight, i.e.,

$$\max_{\pi}\; \frac{1}{J}\sum_{j=1+d}^{J+d} f_j(\pi) - \frac{1}{J}\sum_{j=1+d}^{J+d} f_j(\pi_{j-d}).$$

Generally speaking, online learning algorithms attempt to minimize this regret, and ensure that as $J$ increases the average regret converges to 0, hence the algorithm's performance converges to that of the single best policy in hindsight. A crucial advantage of online learning is that this can be attained without any statistical assumptions on the job characteristics or the price fluctuations.

When $d = 0$, this problem reduces to the standard setting of online learning, where we immediately obtain feedback on the chosen policy's performance. However, as discussed in Section 1, this setting does not apply here because the function $f_j$ does not depend on the learner's current policy choice $\pi_j$, but rather on its choice at an earlier round, $\pi_{j-d}$. Hence, there is a delay between the algorithm's decision and the feedback on the decision's outcome.

Our algorithm is based on the following randomized approach. The learner first picks an $n$-dimensional distribution vector $w_1 = (1/n, \ldots, 1/n)$, whose entries are indexed by the policies $\pi$. At every round $j$, the learner chooses a policy $\pi_j \in \{1, \ldots, n\}$ with probability $w_{j,\pi_j}$. If $j \le d$, the learner lets $w_{j+1} = w_j$. Otherwise it updates the distribution according to

$$w_{j+1,\pi} = \frac{w_{j,\pi}\exp(\eta_j f_j(\pi))}{\sum_{i=1}^{n} w_{j,i}\exp(\eta_j f_j(i))},$$

where $\eta_j$ is a step-size parameter. Again, this form of update puts more weight on higher-performing policies, as measured by $f_j(\pi)$.

Theoretical Guarantees. The following result quantifies the regret of the algorithm, as well as the (theoretically optimal) choice of the step-size parameter $\eta_j$. The theorem shows that the average regret of the algorithm scales with the jobs' lifetime bound $d$, and decays to zero with the number of jobs $J$. Specifically, as $J$ increases, the performance of our algorithm converges to that of the best-performing policy in hindsight. This behavior is to be expected from a learning algorithm, and crucially, it occurs without any statistical assumptions on the job characteristics or the price fluctuations. The performance also depends, though only very weakly, on the size $n$ of our set of policies. From a machine learning perspective, the result shows that the multiplicative-update mechanism that we build upon can indeed be adapted to a delayed-feedback setting, by adapting the step size to the delay bound, thus retaining its simplicity and scalability.

Theorem 1. Suppose (without loss of generality) that $f_j$ for all $j = 1, \ldots, J$ is bounded in $[0,1]$. For the algorithm described above, suppose we pick $\eta_j = \sqrt{\log(n)/(2d(j-d))}$. Then for any $\delta \in (0,1)$, it holds with probability at least $1-\delta$ over the algorithm's randomness that

$$\max_{\pi}\; \frac{1}{J}\sum_{j=1}^{J} f_j(\pi) - \frac{1}{J}\sum_{j=1}^{J} f_j(\pi_{j-d}) \;\le\; 9\sqrt{\frac{2d\log(n/\delta)}{J}}.$$

To prove Theorem 1, we will use the following two lemmas.

Lemma 1. Consider the sequence of distribution vectors $w_{1+d}, \ldots, w_{J+d}$ defined by $w_{1+d} = (1/n, \ldots, 1/n)$ and

$$\forall\, \pi \in \{1, \ldots, n\}, \quad w_{j+1,\pi} = \frac{w_{j,\pi}\exp(\eta_j f_j(\pi))}{\sum_{\pi'=1}^{n} w_{j,\pi'}\exp(\eta_j f_j(\pi'))},$$

where $\eta_j = \sqrt{a\log(n)/(j-d)}$ for some $a \in (0,2]$. Then it holds that

$$\max_{\pi\in\{1,\ldots,n\}} \sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d}\sum_{\pi=1}^{n} w_{j,\pi} f_j(\pi) \;\le\; 4\sqrt{\frac{J\log(n)}{a}}.$$

Proof. For any $j = 1, \ldots, J$, let $g_j(\pi) = 1 - f_j(\pi)$. Then the update step specified in the lemma can be equivalently written as

$$\forall\, \pi \in \{1, \ldots, n\}, \quad w_{j+1,\pi} = \frac{w_{j,\pi}\exp(-\eta_j g_j(\pi))}{\sum_{\pi'=1}^{n} w_{j,\pi'}\exp(-\eta_j g_j(\pi'))}.$$

The initialization of $w_{1+d}$ and the update step specified above are identical to the exponentially-weighted average forecaster of [9], also known as the Hedge algorithm [10]. Using the proof of Theorem 2.3 from [9] (see pg. 20), we have that for any parameter $a$, if we pick $\eta_j = \sqrt{a\log(n)/j}$, then

$$\sum_{j=1+d}^{J+d}\sum_{\pi=1}^{n} w_{j,\pi}\, g_j(\pi) - \min_{\pi\in\{1,\ldots,n\}}\sum_{j=1+d}^{J+d} g_j(\pi) \;\le\; \sqrt{\frac{aJ\log(n)}{4}} + 2\sqrt{\frac{(J+1)\log(n)}{a}} - \sqrt{\frac{\log(n)}{a}}.$$

Since $\sqrt{(J+1)\log(n)/a} \le \sqrt{J\log(n)/a} + \sqrt{\log(n)/a}$, the expression above can be upper bounded by

$$\sqrt{\frac{aJ\log(n)}{4}} + 2\sqrt{\frac{J\log(n)}{a}} + \sqrt{\frac{\log(n)}{a}} \;\le\; \sqrt{\frac{aJ\log(n)}{4}} + 3\sqrt{\frac{J\log(n)}{a}}.$$

Since $a \in (0,2]$, this is at most $4\sqrt{J\log(n)/a}$, so we get

$$\sum_{j=1+d}^{J+d}\sum_{\pi=1}^{n} w_{j,\pi}\, g_j(\pi) - \min_{\pi\in\{1,\ldots,n\}}\sum_{j=1+d}^{J+d} g_j(\pi) \;\le\; 4\sqrt{J\log(n)/a}.$$

The result stated in the lemma follows by re-substituting $g_j(\pi) = 1 - f_j(\pi)$, and using the fact that $\sum_\pi w_{j,\pi} = 1$.

Lemma 2. Let $a_1, \ldots, a_n \in [0,1]$ and $\eta > 0$ be fixed. For any distribution vector $w$ in the $n$-simplex, define $w'$ to be the new distribution vector

$$\forall\, \pi \in \{1, \ldots, n\}, \quad w'_\pi = \frac{w_\pi\exp(-\eta a_\pi)}{\sum_{r=1}^{n} w_r\exp(-\eta a_r)}.$$

Then $\|w - w'\|_1 \le 4\min\{1, \eta\}$.

Proof. If $\eta > 1/2$, the bound is trivial, since for any two distribution vectors $w, w'$ it holds that $\|w - w'\|_1 \le 2$. Thus, let us assume that $\eta \le 1/2$.

We have

$$\|w - w'\|_1 = \sum_{\pi=1}^{n} |w_\pi - w'_\pi| = \sum_{\pi=1}^{n} w_\pi\left|1 - \frac{\exp(-\eta a_\pi)}{\sum_{r=1}^{n} w_r\exp(-\eta a_r)}\right|.$$

Since $\|w\|_1 = 1$, we can apply Holder's inequality and upper bound the above by

$$\max_{\pi}\left|1 - \frac{\exp(-\eta a_\pi)}{\sum_{r=1}^{n} w_r\exp(-\eta a_r)}\right|. \qquad (2)$$

Using the inequality $1 - x \le \exp(-x) \le 1$ for all $x \ge 0$, we have that

$$1 - \eta a_\pi \;\le\; \frac{\exp(-\eta a_\pi)}{\sum_{r=1}^{n} w_r\exp(-\eta a_r)} \;\le\; \frac{1}{1 - \eta},$$

so we can upper bound Eq. (2) by

$$\max_{\pi}\,\max\left\{\eta a_\pi,\; \frac{1}{1-\eta} - 1\right\} \;=\; \max\left\{\eta\max_\pi a_\pi,\; \frac{\eta}{1-\eta}\right\} \;\le\; 2\eta,$$

using our assumption that $a_\pi \le 1$ for all $\pi$ and that $\eta \le 1/2$. In both cases the claimed bound $4\min\{1,\eta\}$ follows.

Using these lemmas, we are now ready to prove Theorem 1.

Proof. Our goal is to upper bound

$$\sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d} f_j(\pi_{j-d}) \qquad (3)$$

for any fixed $\pi$. We note that $\pi_1, \ldots, \pi_J$ are independent random variables, since their randomness stems only from the independent sampling of each $\pi_j$ from each $w_j$. Thus, the regret can be seen as a function over $J$ independent random variables. Moreover, for any choice of $\pi_1, \ldots, \pi_J$, if we replace $\pi_j$ by any other $\pi'_j$, the regret expression will change by at most 1. Invoking McDiarmid's inequality [18], which captures how close to their expectation such "stable" random functions are, it follows that Eq. (3) is at most

$$\mathbb{E}\left[\sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d} f_j(\pi_{j-d})\right] + \sqrt{\log(1/\delta)\,J} \qquad (4)$$

with probability at least $1 - \delta$. We now turn to bound the expectation in Eq. (4). We have

$$\mathbb{E}\left[\sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d} f_j(\pi_{j-d})\right] = \sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d} \mathbb{E}_{\pi_{j-d}\sim w_{j-d}}\left[f_j(\pi_{j-d})\right]. \qquad (5)$$

On the other hand, by Lemma 1, we have that for any fixed $\pi$,

$$\sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d} \mathbb{E}_{\pi\sim w_j}\left[f_j(\pi)\right] \;\le\; 4\sqrt{\frac{J\log(n)}{a}}, \qquad (6)$$

where we assume $\eta_j = \sqrt{a\log(n)/j}$. Thus, the main component of the proof is to upper bound

$$\sum_{j=1+d}^{J+d}\left(\mathbb{E}_{\pi\sim w_j}\left[f_j(\pi)\right] - \mathbb{E}_{\pi_{j-d}\sim w_{j-d}}\left[f_j(\pi_{j-d})\right]\right) = \sum_{j=1+d}^{J+d}\sum_{\pi=1}^{n}\left(w_{j-d,\pi} - w_{j,\pi}\right) f_j(\pi). \qquad (7)$$

By Holder's inequality and the triangle inequality, this is at most

$$\sum_{j=1+d}^{J+d}\|w_{j-d} - w_j\|_1 \;\le\; \sum_{j=1+d}^{J+d}\sum_{i=1}^{d}\|w_{j-i+1} - w_{j-i}\|_1,$$

which, by Lemma 2, is at most $\sum_{j=1+d}^{J+d} 4\sum_{i=1}^{d}\min\{1, \eta_{j-i}\}$, where we take $\eta_{j-i} = 0$ if $j - i < 1 + d$ (this refers to the distribution vectors $w_1, \ldots, w_d$, which do not change, and hence the norm of their difference is zero). This in turn can be bounded by

$$4d\sum_{j=1+d}^{J+d}\min\left\{1, \sqrt{\frac{a\log(n)}{j-d}}\right\}.$$

Combining this upper bound on Eq. (7) with Eq. (6), we can upper bound Eq. (5) by

$$4\sqrt{\frac{J\log(n)}{a}} + 4d\sum_{j=1+d}^{J+d}\min\left\{1, \sqrt{\frac{a\log(n)}{j-d}}\right\} \;\le\; 4\sqrt{\frac{J\log(n)}{a}} + 4d\sum_{j=1+d}^{J+d}\sqrt{\frac{a\log(n)}{j-d}} \;\le\; 4\sqrt{\frac{J\log(n)}{a}} + 8d\sqrt{aJ\log(n)}.$$

Picking $a = 1/2d$, we get that Eq. (5) is at most $8\sqrt{2d\log(n)J}$. Combining this with Eq. (4), and noting it is a probabilistic upper bound on Eq. (3), we get that

$$\sum_{j=1+d}^{J+d} f_j(\pi) - \sum_{j=1+d}^{J+d} f_j(\pi_{j-d}) \;\le\; 8\sqrt{2d\log(n)J} + \sqrt{\log(1/\delta)J} \;\le\; 8\sqrt{2d\log(n/\delta)J} + \sqrt{\log(n/\delta)J} \;\le\; \left(8\sqrt{2d} + 1\right)\sqrt{\log(n/\delta)J} \;\le\; 9\sqrt{2d\log(n/\delta)J}.$$

Dividing by $J$, and noting that the inequality holds simultaneously with respect to any fixed $\pi$, the theorem follows.

4 Evaluation

In this section we evaluate the performance of our learning algorithm via simulations on synthetic job data as well as a real dataset from a large batch computing cluster. The benefit of using synthetic datasets is that they allow the flexibility to evaluate our approach under a wide range of workloads. Before continuing, we would like to emphasize that the contribution of our paper goes beyond the design of particular sets of policies; there are many other policies which can potentially be designed for our task. What we provide is a meta-algorithm which can work on any possible policy set, and in our experiments we intend to exemplify this on plausible policy sets which can be easily understood and interpreted.

Throughout this section, the parameters of the different policies are set such that the entire range of plausible policies is covered (up to discretization). For example, the spot-price time series in Section 4.2 ranges between 0.12 and 0.68 (see Fig. 6(a)). Accordingly, we allow the fixed bids $b$ to range between 0.15 and 0.7 with 5-cent resolution. Bids higher than 0.7 perform exactly as the 0.7 bid, hence can be excluded; bids of 0.1 or lower will always be rejected, hence can be excluded as well.

4.1 Simulations on Synthetic Data

Setup: For all the experiments on synthetic data, we use the following setup. Job arrivals are generated according to a Poisson process with a mean inter-arrival time of 10 minutes; the job size $z_j$ (in instance-hours) is chosen uniformly and independently at random up to a maximum size of 100, and the parallelism constraint $c_j$ was fixed at 20. Job values scale with the job size and the instance prices. More precisely, we generate the value as $x \cdot p \cdot z_j$, where $x$ is a uniform random variable in $[0.5, 2]$ and $p$ is the on-demand price. Similarly, job deadlines also scale with size and are chosen to be $x \cdot z_j / c_j$, where $x$ is uniformly random on $[1, 2]$. As discussed in Section 3, the on-demand and spot prices are normalized (divided by 10) to ensure that the average payoff per job is on the order of $\pm 1$. The on-demand price is 0.25 per hour, while spot prices are updated every 5 minutes (the way we generate spot prices varies across experiments).

Resource allocation policies. We generate a parameterized set of policies. Specifically, we use 204 deadline-centric policies and the same number of rate-centric policies. These policy sets use six values for $M$ ($M \in \{0, \ldots, 5\}$) and $\sigma$ ($\sigma \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$), respectively.

For either policy set, we have policies that use the fixed-bid method ($b \in \{0.1, 0.15, 0.2, 0.25\}$) and policies that use the variable-bid method (weight $\gamma \in \{0, 0.2, 0.4, 0.6, 0.8\}$ and safety parameter $\varepsilon \in \{0, 0.02, 0.04, 0.06, 0.08, 0.1\}$), yielding 34 bid configurations and hence 204 policies per family.
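As a sanity check on these counts, the following illustrative sketch enumerates one family's parameter grid and recovers the 204 policies per family (408 in total); the tuple encoding is ours.

```python
from itertools import product

M_values     = [0, 1, 2, 3, 4, 5]              # deadline-centric thresholds M
sigma_values = [0, 0.2, 0.4, 0.6, 0.8, 1]      # rate-centric on-demand rates sigma
fixed_bids   = [0.1, 0.15, 0.2, 0.25]
gammas       = [0, 0.2, 0.4, 0.6, 0.8]
epsilons     = [0, 0.02, 0.04, 0.06, 0.08, 0.1]

bid_configs = [("fixed", b) for b in fixed_bids] + \
              [("variable", g, e) for g, e in product(gammas, epsilons)]  # 4 + 30 = 34
rate_centric     = list(product(sigma_values, bid_configs))               # 6 * 34 = 204
deadline_centric = list(product(M_values, bid_configs))                   # 6 * 34 = 204
print(len(rate_centric) + len(deadline_centric))                          # 408
```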

Simulation results: Experiment 1. In the first experiment, we compare the total payoff of all 408 policies to that of our algorithm across 10k jobs. Spot prices are chosen independently and randomly as $0.15 + 0.05x$, where $x$ is a standard Gaussian random variable (negative values were clipped to 0). The results presented below pertain to a single run of the algorithm, as they were virtually identical across independent runs. Figure 2 shows the total payoff for the 408 policies on this dataset. The first 204 policies are rate-centric policies, while the remaining 204 are deadline-centric policies. The performance of our algorithm is marked using a dashed line. As can be seen, our algorithm performs close to the best policies in hindsight. Further, it is interesting to note that both deadline-centric and rate-centric policies appear among the best policies, indicating that one needs to consider both sets as candidate policies.

Figure 2: Total payoff for processing 20k jobs across each of the 408 resource allocation policies (the algorithm's payoff is shown as a dashed black line). The first 204 policies are rate-centric, and the last 204 policies are deadline-centric.

We perform three additional experiments with a similar setup to the above, in order to obtain insights into the properties and inner workings of the algorithm. To be able to dive deeper into the analysis, we use only the 204 rate-centric policies. The only element that we modify across experiments is the statistical properties of the spot-price sequence.

Experiment 2. Spot prices are generated as above, except that we use 0.1 as their mean (as opposed to 0.2 above). After executing 1000 jobs, our algorithm performs close to the best policy, as it assigns probability close to 1 to that policy, while outperforming 199 out of the 204 policies. Further, its average regret is only 1.3, as opposed to 7.5 on average across all policies. Note that the upper bound on the delay in this experiment is $d = 66$, i.e., up to 66 jobs are being processed while a single job finishes execution. This shows that our approach can handle significant delay in getting feedback while still performing close to the best policy.

Figure 3: Evaluation under stationary spot-price distribution (mean spot price of 0.1): probability assigned per policy after executing 500, 1000, 2500 and 5000 jobs.

In this experiment, the best policy in hindsight uses a fixed bid of 0.25. This can be explained by considering the parameters of our simulation: since the on-demand price is 0.25 and the spot price is always relatively lower, a bid of 0.25 always yields allocation of spot instances for the entire hour. This result also highlights the easy interpretation of the resource allocation strategy of the best policy. Figure 3 shows the probability assignment for each policy over time by our algorithm after executing 500, 1000, 2500 and 5000 jobs. We observe that as the number of processed jobs increases, our algorithm provides performance close to the best policy in hindsight.

Experiment 3. In the next experiment, the spot prices are set as above for the first 10% of the jobs, and then the mean is increased to 0.2 (rather than 0.1) during the execution of the last 90% of the jobs. This setup corresponds to a non-stationary distribution: a learning algorithm which simply attempts to find the best policy at the beginning and stick to it will be severely penalized when the dynamics of spot prices change. Figure 4 shows the evaluation results. We observe that our online algorithm is able to adapt to changing dynamics and converges to a probability weight distribution different from the previous setting; overall, our algorithm attains an average regret of only 0.5, as opposed to 4.8 on average across the 204 baseline policies.

Figure 4: Evaluation under non-stationary distribution (mean spot price of 0.2): (a) total payoff for executing 10k jobs across each of the 204 policies (the algorithm's payoff is shown as a dashed black line) and (b) the final probability assigned per policy by our learning algorithm.

Note that in this setting, the best policies are those which rely purely on on-demand instances instead of spot instances. This is expected because the spot prices tend to be only slightly lower than the on-demand price, and their dynamic volatility makes them unattractive in comparison. This result demonstrates that there are indeed scenarios where the dilemma between choosing on-demand vs. spot instances is important and can significantly impact performance, and that no single instance type is always suitable.

Experiment 4. This time we set the spot price to alternate between 0.3 for one hour and zero for the next. This variation is favorable for variable-bid policies with small $\gamma$, which use a short history of spot prices to determine their next bid. Such policies quickly adapt when the spot price drops. In contrast, fixed-bid policies and variable-bid policies with large $\gamma$ suffer, as their bid price is not sufficiently adaptive. Figure 5 shows the results. We find that the group of highest-payoff policies are those for which $\gamma = 0$, i.e., they use the last spot price to choose a bid for the current round, and thus quickly adapt to changing spot prices. Further, our algorithm quickly detects and adapts to the best policies in this setting. The average regret obtained by our algorithm is 0.8, compared to 4.5 on average for our baseline policies. Moreover, the algorithm's overall performance is better than that of 192 out of 204 policies.

Figure 5: Evaluation under highly dynamic distribution (hourly spot prices alternate between 0.3 and zero): (a) total payoff for processing 10k jobs across each of the 204 policies (the algorithm's payoff is shown as a dashed black line), and (b) the final probability assigned per policy by our learning algorithm.

Figure 6: Evaluation on real dataset: (a) Amazon EC2 spot pricing data (a subset of the data from Figure 1) for Linux instances of type 'large'; the fixed on-demand price is 0.34. (b) Total payoff for processing 20k jobs across each of the 504 resource allocation policies (the algorithm's payoff is shown as a dashed black line).

4.2 Evaluation on Real Datasets

Setup: Workload data. We use job traces from a large batch computing cluster for two days, consisting of about 600 MapReduce jobs. Each MapReduce job comprises multiple phases of execution, where the next phase can start only after all tasks in the previous phase have completed. The trace includes the runtime of the job in server CPU hours (totCPUHours), the total number of servers allocated to it (totServers), and the maximum number of servers allocated to the job per phase (maxServersPerPhase). Since our job model differs from the MapReduce model in terms of phase dependency, we construct the parallelism constraint from the trace as follows: since the average running time of a server is totCPUHours/totServers, we set the parallelism bound $c_j$ for each job to be $c_j = \text{maxServersPerPhase} \cdot \text{totCPUHours}/\text{totServers}$. Note that this bound is in terms of CPU hours, as required. Since deadline values per job are not specified, we use the job completion time as its deadline. For assigning values per job, we generate them using the same approach as for the synthetic datasets. Specifically, we assign a random value to each job $j$ equal to its total size (in CPU hours) times the on-demand price times $B = (\alpha + N_j)$, where $\alpha = 5$ and $N_j \in [0,1]$ is drawn uniformly at random. The job trace is replicated to generate 20k jobs.
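A small sketch of this preprocessing step, using the trace field names above; the function name and the completion-time argument are ours, and the random draw follows the value model just described.

```python
import random

def job_from_trace(totCPUHours, totServers, maxServersPerPhase,
                   completion_time_hours, on_demand_price, alpha=5.0):
    """Derive the parallelism bound, deadline, and value for one trace record."""
    c_j = maxServersPerPhase * totCPUHours / totServers     # parallelism bound c_j
    d_j = completion_time_hours                             # deadline = observed completion time
    value = totCPUHours * on_demand_price * (alpha + random.random())   # B = alpha + N_j
    return c_j, d_j, value
```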

Spot Prices. We use a subset of the historical spot prices from Amazon EC2, as shown in Figure 1, for 'large' Linux instances. Figure 6(a) shows the selected sample of spot price history, which exhibits significant price variation over time. Intuitively, we expect that overall, policies that use a large ratio of spot instances will perform better since, on average, the spot price is about half of the on-demand price.

Resource Allocation Policies. We generated a total of 504 policies, half rate-centric and half deadline-centric. In each half, the first 72 are fixed-bid policies (i.e., policies that use the fixed-bid method) in increasing order of (on-demand rate, bid price). The remaining 180 variable-bid policies are in increasing order of (on-demand rate, weight, safety parameter). The possible values for the different parameters are as described for the synthetic data experiments, with the exception that we allow more options for the fixed bid price, $b \in \{0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7\}$.

Evaluating our online algorithm on the real trace poses several new challenges compared to the synthetic datasets in Section 4.1. First, job sizes, and hence their values, are highly variable, to the extent that the difference in size between small and large jobs can be six orders of magnitude. Second, spot prices can exhibit high variability, or alternatively be almost stable towards the end, as exemplified in Figure 6(a).

Simulation results: Figure 6(b) shows the results for a typical run of this experiment. Notably, our algorithm outperforms most of the individual policies, and obtains performance comparable to the best individual policies (which are a subset of the rate-centric policies). We repeated the experiment 20 times, and obtained the following results: the average regret per job for our learning algorithm is 2071±1143, while the average regret across policies is 70654±12473. Note that the average regret of our algorithm is around 34 times better (on average) than the average regret across policies.

Figure 7 shows the evolution of policy weights over time for a typical run, until convergence to the final policy weights (after handling the entire 20000 jobs). We observe that our algorithm evolves from preferring a relatively large subset of both deadline-centric and rate-centric policies (at around 150 jobs) to preferring only rate-centric policies, both fixed-bid and variable-bid (at around 2000 jobs). Eventually, the algorithm converges to a single rate-centric policy with a fixed bid. This behavior can be explained based on the spot pricing data in Figure 6(a): due to the initially high variability in spot prices, our algorithm "alternates" between fixed-bid policies and variable-bid policies, which try to learn from past prices. However, since the prices show little variability for the remaining two thirds of the data, the algorithm progressively adapts its weight toward the fixed-bid policy, which is commensurate with the almost stable pricing curve.

Figure 7: Evaluation on real dataset: the probability assigned per policy by our learning algorithm after processing 150, 1000, 3000 and 5000 jobs. The algorithm converges to a single policy (a fixed-bid rate-centric policy), marked by an arrow.

5 Related Literature

While there exist other potential approaches to our problem, we adopted an online learning approach due to its lack of any stochastic assumptions, its online (rather than offline) nature, its capability to work on arbitrary policy sets, and its ability to adapt to delayed feedback. The idea of applying online learning algorithms to sequential decision-making tasks is well known ([10]), and there are quite a few papers which study various engineering applications (e.g., [11, 6, 12, 15]). However, these efforts do not deal with the problem of delayed feedback, as it violates the standard framework of online learning. The issue of delay has been considered previously (see [14] and references therein), but these works are either not in the context of the online techniques we are using, or propose less practical solutions such as running multiple copies of the algorithm in parallel. In any case, we are not aware of any prior study of delay-tolerant online learning procedures for our application domain.

The launch of commercial cloud computing offerings has motivated the systems research community to investigate how to exploit this market for efficient resource allocation and cost reduction. Some solution concepts are borrowed from earlier works on executing jobs in multiple grids (e.g., [20] and references therein). However, new techniques are required in the cloud computing context, which directly incorporate cost considerations and a variety of instance renting options. There have been numerous works in this context dealing with different provider and customer scenarios.



One branch of papers considers the auto-scaling problem, where an application owner has to decide on the right number and type of VMs to purchase, and to dynamically adapt resources as a function of changing workload conditions (see, e.g., [17, 7] and references therein).

We focus the remainder of our literature survey on cloud resource management papers that include spot instances as one of the allocation options. Some papers focus on building statistical models for spot prices, which can then be used to decide when to purchase EC2 spot instances (see, e.g., [13, 1]). Similarly, [24] examines the statistical properties of customer workload with the objective of helping the cloud provider determine how many resources to allocate for spot instances.

In the context of large-scale batch applications, [5] proposes a probabilistic model for bidding in the spot market while taking into account job termination probabilities. However, [5] focuses on pre-computation of a fixed (non-adaptive) bid, which is determined greedily based on existing market conditions; moreover, the suggested framework does not support automatic selection between on-demand and spot instances. [22] uses a genetic algorithm to quickly approximate the Pareto set of makespan and cost for a bag of tasks; each underlying resource configuration consists of a different mix of on-demand and spot instances. The setting in [22] is fundamentally different from ours, since [22] optimizes a global makespan objective, while we assume that jobs have individual deadlines. Finally, [21] proposes near-optimal bidding strategies for cloud service brokers that utilize the spot instance market to reduce computational cost while maximizing profit. Our work differs from [21] in two main aspects. First, unlike [21], our online learning framework does not require any distributional assumptions on the spot price evolution (or on the job model). Second, our model may associate a different value and deadline with each job, whereas in [21] the value is only a function of job size, and deadlines are not explicitly treated.

6 Conclusion

In this paper we design and evaluate an online learning algorithm for automated and adaptive resource allocation for executing batch jobs on cloud computing platforms. Our basic model can be extended to solve other resource allocation problems in cloud domains, such as renting small vs. medium vs. large instances, choosing computing regions, and choosing among different bundling options in terms of CPU, memory, network and storage. We expect that the learning framework developed here will be useful in addressing these extensions. An interesting direction for future research is incorporating reserved instances for long-term handling of multiple jobs. This makes the algorithm stateful, in the sense that its actions affect the payoffs of policies chosen in the future. This does not accord with our current theoretical framework, but may be handled using different tools from competitive analysis.

Acknowledgements. We thank our shepherd Alexandru Iosup and the ICAC reviewers for the useful feedback.

References

[1] AGMON BEN-YEHUDA, O., BEN-YEHUDA, M., SCHUSTER, A., AND TSAFRIR, D. Deconstructing Amazon EC2 spot instance pricing. ACM Transactions on Economics and Computation 1, 3 (2013), 16.

[2] ALIZADEH, M., GREENBERG, A., MALTZ, D., PADHYE, J., PATEL, P., PRABHAKAR, B., SENGUPTA, S., AND SRIDHARAN, M. Data center TCP (DCTCP). In ACM SIGCOMM Computer Communication Review (2010), vol. 40, ACM, pp. 63–74.

[3] http://aws.amazon.com/solutions/case-studies/.

[4] https://aws.amazon.com/ec2/purchasing-options/spot-instances.

[5] ANDRZEJAK, A., KONDO, D., AND YI, S. Decision model for cloud computing under SLA constraints. In MASCOTS (2010).

[6] ARI, I., AMER, A., GRAMACY, R., MILLER, E., BRANDT, S., AND LONG, D. ACME: Adaptive caching using multiple experts. In WDAS (2002).

[7] AZAR, Y., BEN-AROYA, N., DEVANUR, N. R., AND JAIN, N. Cloud scheduling with setup cost. In Proceedings of the 25th ACM Symposium on Parallelism in Algorithms and Architectures (2013), ACM, pp. 298–304.

[8] https://aws.amazon.com/solutions/case-studies/browsermob.

[9] CESA-BIANCHI, N., AND LUGOSI, G. Prediction, Learning, and Games. Cambridge University Press, 2006.

[10] FREUND, Y., AND SCHAPIRE, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (1997), 119–139.

[11] GRAMACY, R., WARMUTH, M., BRANDT, S., AND ARI, I. Adaptive caching by refetching. In NIPS (2002).



[12] HELMBOLD, D., LONG, D., SCONYERS, T., AND SHERROD, B. Adaptive disk spin-down for mobile computers. MONET 5, 4 (2000), 285–297.

[13] JAVADI, B., THULASIRAM, R., AND BUYYA, R. Statistical modeling of spot instance prices in public cloud environments. In Fourth IEEE International Conference on Utility and Cloud Computing (2011).

[14] JOULANI, P., GYORGY, A., AND SZEPESVARI, C. Online learning under delayed feedback. In ICML (2013).

[15] KVETON, B., YU, J. Y., THEOCHAROUS, G., AND MANNOR, S. Online learning with expert advice and finite-horizon constraints. In AAAI (2008).

[16] https://aws.amazon.com/solutions/case-studies/litmus.

[17] MAO, M., AND HUMPHREY, M. Auto-scaling to minimize cost and meet application deadlines in cloud workflows. In Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (2011), ACM, p. 49.

[18] MCDIARMID, C. On the method of bounded differences. In Surveys in Combinatorics, J. Siemons, Ed., vol. 141 of London Mathematical Society Lecture Note Series. Cambridge University Press, 1989, pp. 148–188.

[19] SHEN, S., DENG, K., IOSUP, A., AND EPEMA, D. Scheduling jobs in the cloud using on-demand and reserved instances. In Euro-Par 2013 Parallel Processing. Springer, 2013, pp. 242–254.

[20] SILBERSTEIN, M., SHAROV, A., GEIGER, D., AND SCHUSTER, A. GridBot: execution of bags of tasks in multiple grids. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (2009), ACM, p. 11.

[21] SONG, Y., ZAFER, M., AND LEE, K.-W. Optimal bidding in spot instance market. In INFOCOM (2012), pp. 190–198.

[22] VINTILA, A., OPRESCU, A.-M., AND KIELMANN, T. Fast (re-)configuration of mixed on-demand and spot instance pools for high-throughput computing. In Proceedings of the First ACM Workshop on Optimization Techniques for Resources Management in Clouds (2013), ACM, pp. 25–32.

[23] YI, S., ANDRZEJAK, A., AND KONDO, D. Monetary cost-aware checkpointing and migration on Amazon cloud spot instances. Services Computing, IEEE Transactions on 5, 4 (2012), 512–524.

[24] ZHANG, Q., GURSES, E., BOUTABA, R., AND XIAO, J. Dynamic resource allocation for spot markets in clouds. In Hot-ICE (2011).


