
Improving Performance of Heterogeneous MapReduce Clusters with Adaptive Task Tuning

Dazhao Cheng, Jia Rao, Yanfei Guo, Changjun Jiang and Xiaobo Zhou

Abstract—Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to continuous server replacement. Meanwhile, datacenters are commonly shared by many users for quite different uses, and often exhibit significant performance heterogeneity due to multi-tenant interference. The deployment of MapReduce on such heterogeneous clusters presents significant challenges in achieving good application performance compared to in-house dedicated clusters. As most MapReduce implementations are originally designed for homogeneous environments, heterogeneity can cause significant performance deterioration in job execution despite existing optimizations on task scheduling and load balancing. In this paper, we observe that the homogeneous configuration of tasks on heterogeneous nodes can be an important source of load imbalance and thus cause poor performance. Tasks should be customized with different configurations to match the capabilities of heterogeneous nodes. To this end, we propose a self-adaptive task tuning approach, Ant, that automatically searches for the optimal configurations for individual tasks running on different nodes. In a heterogeneous cluster, Ant first divides nodes into a number of homogeneous subclusters based on their hardware configurations. It then treats each subcluster as a homogeneous cluster and independently applies the self-tuning algorithm to each. Ant then configures tasks with randomly selected configurations and gradually improves task configurations by reproducing the configurations of the best-performing tasks and discarding poorly performing configurations. To accelerate task tuning and avoid being trapped in a local optimum, Ant uses a genetic algorithm during adaptive task configuration. Experimental results on a heterogeneous physical cluster with varying hardware capabilities show that Ant improves the average job completion time by 31%, 20%, and 14% compared to stock Hadoop (Stock), customized Hadoop with industry recommendations (Heuristic), and a profiling-based configuration approach (Starfish), respectively. Furthermore, we extend Ant to virtual MapReduce clusters in a multi-tenant private cloud. Specifically, Ant characterizes a virtual node based on two measured performance statistics: I/O rate and CPU steal time. It uses the k-means clustering algorithm to classify virtual nodes into configuration groups based on the measured dynamic interference. Experimental results on virtual clusters with varying interferences show that Ant improves the average job completion time by 20%, 15%, and 11% compared to Stock, Heuristic, and Starfish, respectively.

Index Terms—MapReduce Performance Improvement, Self-Adaptive Task Tuning, Heterogeneous Clusters, Genetic Algorithm


1 INTRODUCTION

In the past few years, MapReduce has proven to be an effective platform for processing large sets of unstructured data, in tasks as diverse as sifting through system logs, running extract-transform-load operations, and computing web indexes. Since big data analytics requires distributed computing at scale, usually involving hundreds to thousands of machines, access to such facilities becomes a significant barrier to practicing big data processing in small businesses. Deploying MapReduce in datacenters or cloud platforms offers a more cost-effective model for implementing big data analytics. However, the heterogeneity in datacenters and clouds presents significant challenges in achieving good MapReduce performance [1], [2].

• D. Cheng is with the Department of Computer Science, University of North Carolina at Charlotte, NC, 28223. E-mail: [email protected].

• J. Rao and X. Zhou are with the Department of Computer Science, University of Colorado, Colorado Springs, CO, 80918. E-mail: {jrao,xzhou}@uccs.edu.

• C. Jiang is with the Department of Computer Science & Technology, Tongji University, 4800 Caoan Road, Shanghai, China. E-mail: [email protected].

• Y. Guo is currently a postdoctoral fellow at Argonne National Laboratory. E-mail: [email protected].

• X. Zhou is the corresponding author.

Hardware heterogeneity occurs because servers are gradually upgraded and replaced in datacenters. Interference from multiple tenants sharing the same cloud platform can also cause heterogeneous performance even on homogeneous hardware. The difference in processing capabilities of MapReduce nodes breaks the assumption of homogeneous clusters in MapReduce design and can result in load imbalance, which may cause poor performance and low cluster utilization. To improve MapReduce performance in heterogeneous environments, work has been done to make task scheduling [2], [3] and load balancing [1], [4] heterogeneity-aware. Despite these optimizations, most MapReduce implementations, such as Hadoop, still perform poorly in heterogeneous environments. For ease of management, MapReduce implementations use the same configuration for all tasks. Existing research [5], [6] has shown that MapReduce configurations should be set according to cluster size and hardware configurations. Thus, running tasks with homogeneous configurations on heterogeneous nodes inevitably leads to sub-optimal performance.

In this work, we propose a task tuning approach that allows tasks to have different configurations, each optimized for the actual hardware capabilities, on heterogeneous nodes. We address the following challenges in automatic MapReduce task tuning. First, determining the optimal task configuration is a tedious and error-prone process. A large number of performance-critical parameters can have complex interplays on task execution. Previous studies [7], [6], [8], [9] have shown that it is difficult to construct models that connect parameter settings with MapReduce performance. Second, there is no one-size-fits-all model for different MapReduce jobs, and different configurations are even needed for different execution phases or input sizes. In a cloud environment, task configurations should also be changed in response to changes in multi-tenancy interference. Finally, most MapReduce implementations use fixed task configurations that are set during job initialization [10]. Adaptive task tuning requires new mechanisms for on-the-fly task reconfiguration.

We present Ant, a self-adaptive tuning approach for task configuration in heterogeneous environments. Ant monitors task execution in large MapReduce jobs, which comprise multiple waves of tasks, and optimizes task configurations as job execution progresses. It clusters worker nodes (either physical or virtual nodes) into groups according to their hardware configurations or their estimated capability based on interference statistics. For each node group, Ant launches tasks with different configurations and considers the ones that complete sooner as good settings. To accelerate tuning and avoid being trapped in a local optimum, Ant uses the genetic functions crossover and mutation to generate task configurations for the next wave from the two best-performing tasks in a group. We implement Ant in Hadoop, the popular open-source implementation of MapReduce, and perform comprehensive evaluations with representative MapReduce benchmark applications. Experimental results on a physical cluster with three types of machines show that Ant improves the average job completion time by 31%, 20%, and 14% compared to stock Hadoop (Stock), customized Hadoop with industry recommendations (Heuristic), and a profiling-based configuration approach (Starfish), respectively. Our results also show that although Ant is not very effective for small jobs with only a few waves of tasks, it can significantly improve the performance of large jobs. Experiments with Microsoft's MapReduce workload, which consists of 10% large jobs, demonstrate that Ant is able to reduce the overall workload completion time by 12.5% and 8% compared to the heuristic- and profiling-based approaches.

A preliminary version of the paper appeared in [11]. In this manuscript, we have extended Ant to virtual MapReduce clusters in multi-tenant private cloud environments. Specifically, Ant characterizes a virtual node based on two measured performance statistics: I/O rate and CPU steal time. We consider two representative interference scenarios in the cloud: stable interference and dynamic interference. Experimental results on virtual clusters with varying interferences show that Ant improves the average job completion time by 20%, 15%, and 11% compared to Stock, Heuristic, and Starfish, respectively.

Fig. 1. The Hadoop framework.

The rest of this paper is organized as follows. Section 2 motivates the improvement of the MapReduce configuration framework. Section 3 describes the design of Ant. Section 4 presents the details of the proposed self-tuning algorithm. Section 5 describes moving Ant into the cloud. Section 6 gives Ant implementation details. Section 7 presents the experimental results and analysis on a physical cluster. Section 8 presents the experimental results and analysis on a virtual cluster. Section 9 reviews related work. Section 10 concludes the paper.

2 MOTIVATIONS

2.1 Background

MapReduce is a distributed parallel programming model originally designed for processing a large volume of data in a homogeneous environment. In the default Hadoop framework, a large number of parameters need to be set before a job can run in the cluster. These parameters control the behavior of jobs during execution, including their memory allocation, level of concurrency, I/O optimization, and usage of network bandwidth. As shown in Figure 1, slave nodes load configurations from the master node, where the parameters are configured manually. By design, tasks belonging to the same job share the same configuration.

In Hadoop, there are more than 190 configuration parameters, which determine the settings of the Hadoop cluster, describe a MapReduce job to the Hadoop framework, and optimize task execution [10]. Cluster-level parameters specify the organization of a Hadoop cluster and some long-term static settings. Changes to such parameters require rebooting the cluster to take effect. Job-level parameters determine the overall execution settings, such as input format, number of map/reduce tasks, and failure handling. These parameters are relatively easy to tune and have a uniform effect on all tasks even in a heterogeneous environment. Task-level parameters control fine-grained task execution on individual nodes and can possibly be changed independently and on-the-fly at runtime. Parameter tuning at the task level opens up opportunities for improving performance in heterogeneous environments and is our focus in this work.


TABLE 1
Multiple machine types in the cluster.

Machine model     CPU        Memory  Disk
Supermicro Atom   4*2.0GHz   8 GB    1 TB
PowerEdge T110    8*3.2GHz   16 GB   1 TB
PowerEdge T420    24*1.9GHz  32 GB   1 TB

Fig. 2. The optimal task configuration changes with workloads and platforms. (a) Heterogeneous workloads: average task completion time (s) vs. io.sort.record.percent for Wordcount, Terasort, and Grep. (b) Heterogeneous platforms: average task completion time (s) vs. io.sort.mb (MB) on Atom, T110, and T420.


Hadoop installations pre-set the configuration parameters to default values, assuming a reasonably sized cluster and typical MapReduce jobs. These parameters should be specifically tuned for a target cluster and individual jobs to achieve the best performance. However, there is very limited information on how the optimal settings can be determined. There exist rule-of-thumb recommendations from industry leaders (e.g., Cloudera [12] and MapR [13]) as well as academic studies [6], [8]. These approaches cannot be universally applied to a wide range of applications or heterogeneous environments. In this work, we develop an online self-tuning approach for task-level configuration. Next, we provide motivating examples to show the necessity of configuration tuning for heterogeneous workloads and hardware platforms.

2.2 Motivating Examples

We created a heterogeneous Hadoop cluster composed of three types of machines listed in Table 1. Three MapReduce applications from the PUMA benchmark [14], i.e., WordCount, Terasort, and Grep, each with 300 GB of input data, were run on the cluster. We configured each slave node with four map slots and two reduce slots, and the HDFS block size was set to 256 MB. The heap size mapred.child.java.opts was set to 1 GB and the other parameters were set to their default values. We measured the map task completion time in two different scenarios: heterogeneous workload on homogeneous hardware, and homogeneous workload on heterogeneous hardware. We show that heterogeneity in either the workload or the hardware makes determining the optimal task configuration difficult.

Figure 2(a) shows the average map completion times of the three heterogeneous benchmarks on a homogeneous cluster consisting only of the T110 machines. The completion times changed as we altered the values of the parameter io.sort.record.percent. The figure shows that wordcount, terasort, and grep achieved their minimum completion times when the parameter was set to 0.4, 0.2, and 0.6, respectively. Figure 2(b) shows the performance of wordcount on machines with different hardware configurations. Map completion times varied as we changed the value of the parameter io.sort.mb. The figure suggests that uniform task configurations do not lead to optimal performance in a heterogeneous environment. For example, map tasks achieved the best performance on the Atom machine when the parameter was set to 125, while the optimal completion time on the T420 machine was achieved with the parameter set to 275.

Fig. 3. The architecture of Ant.

[Summary] We have shown that the performance of Hadoop applications can be substantially improved by tuning task-level parameters for heterogeneous workloads and platforms. However, parameter optimization is an error-prone process involving complex interplays among the job, the Hadoop framework, and the architecture of the cluster. Furthermore, manual tuning remains difficult due to the large parameter search space. As many MapReduce jobs are recurring or have multiple waves of task execution, it is possible to learn the best task configurations based on the feedback of previous runs. These observations motivated us to develop a task self-tuning approach that automatically configures parameters for various Hadoop jobs and platforms in an online manner.

3 ANT DESIGN AND ASSUMPTIONS

3.1 Architecture

Ant is a self-tuning approach for multi-wave MapReduce applications, in which job executions consist of several rounds of map and reduce tasks. Unlike traditional MapReduce implementations, Ant centers on two key designs: (1) tasks belonging to the same job run with different configurations matching the capabilities of the hosting machines; (2) the configurations of individual tasks dynamically change to search for the optimal settings. Ant first spawns tasks with random configurations and executes them in parallel. Upon task completion, Ant collects task runtimes and adaptively adjusts task settings according to the best-performing tasks. After several rounds of tuning, task configurations on different nodes converge to the optimal settings. Since task tuning starts with random settings and improves with job execution, Ant does not require any a priori knowledge of MapReduce jobs and is model independent. Figure 3 shows the architecture of Ant.

• Self-tuning optimizer uses a genetic algorithm (GA)-based approach to generate task configurations based on the feedback reported by the task analyzer. Settings that are top-ranked by the task analyzer are used to reproduce the optimized configurations.

• Task analyzer uses a fitness (utility) function to evaluate the performance of individual tasks under different configurations. The fitness function takes into account task completion time as well as other performance-critical execution statistics.

Ant operates as follows. When a job is submitted to the JobTracker, the configuration optimizer generates a set of parameters randomly within a reasonable range to initialize the task-level configuration. Then the JobTracker sends the randomly initialized tasks to their respective TaskTrackers. The steps of task tuning correspond to the multiple waves of task execution. Upon completing a wave, the task analyzer residing in the JobTracker recommends good configurations to the configuration optimizer for the next wave of execution. This process is repeated until the job completes.

3.2 Assumptions

Our finding that uniform task configurations lead to sub-optimal performance in a heterogeneous environment motivated the design of Ant, a self-tuning approach that allows differentiated task settings within the same job. The effectiveness of Ant relies on two assumptions: substantial performance improvement can be achieved via task configuration, and the MapReduce jobs are long-running ones (e.g., with multiple waves), which allows for task reconfiguration and performance optimization. There are two levels of heterogeneity that can affect task performance, i.e., task-level data skewness and machine-level varying capabilities. Although some tasks inherently take longer to finish due to data skew, Ant assumes that the majority of tasks have uniform completion times under identical configurations. Ant focuses on improving performance for average tasks by matching task configurations to the actual hardware capabilities. To address hardware heterogeneity, Ant groups nodes with similar hardware configurations or capabilities together and compares tasks executing in parallel to determine the optimal configurations for the node group. However, task skew and varying hardware capabilities due to interference in multi-tenant clouds can possibly impede finding good task configurations.

Fig. 4. Task self-tuning process in Ant.

4 SELF-ADAPTIVE TUNING

Ant identifies good configurations by comparing the performance of tasks executing in parallel on nodes with similar processing capabilities, in a self-tuning manner. Due to the multi-wave task execution in many MapReduce jobs, Ant is able to continuously improve performance by adapting task configurations. In this section, we first describe how Ant forms self-tuning task groups in which different configurations can be compared. We then discuss the set of parameters Ant optimizes and the utility function Ant uses to evaluate the goodness of parameters. Finally, we present the design of a genetic algorithm-based self-tuning approach and a strategy to accelerate tuning.

4.1 Forming Self-tuning Groups

We begin by describing Ant's workflow in a homogeneous cluster and then discuss how to form homogeneous subclusters in a heterogeneous cluster.

Homogeneous cluster. In a homogeneous cluster, all nodes have the same processing capability. Thus, Ant considers the whole Hadoop cluster as one self-tuning group. Each node in the Hadoop cluster is configured with a predefined number of map and reduce slots. If the number of tasks (e.g., mappers) exceeds the available slots in the cluster (e.g., map slots), execution proceeds in multiple waves. Figure 4 shows the multi-wave task self-tuning process in a homogeneous cluster. Ant starts multiple tasks with different configurations concurrently in the self-tuning group and reproduces new configurations based on the feedback of completed tasks. We frame the tuning process as an evolution of configurations, in which each wave of execution corresponds to one configuration generation. The reproduction of generations is directed by a genetic algorithm, which ensures that the good configurations of prior generations are preserved in new generations.

Heterogeneous cluster. In a heterogeneous cluster, as shown in Figure 5, Ant divides nodes into a number of homogeneous subclusters based on their hardware configurations. Hardware information can be collected by the JobTracker on the master node using the heartbeat connection. Ant treats each subcluster as a homogeneous cluster and independently applies the self-tuning algorithm to each. The outcomes of the self-tuning process are significantly improved task-level configurations, one for each subcluster. Since each subcluster has a different processing capability, the optimized task configurations can be quite different across subclusters.

Fig. 5. Ant on a heterogeneous cluster.

4.2 Task-level Parameters

TABLE 2
Task-level parameters and search space.

Task-level parameter      Search space  Symbol
io.sort.factor            {1, 300}      g1
io.sort.mb                {100, 500}    g2
io.sort.record.percent    {0.05, 0.8}   g3
io.sort.spill.percent     {0.1, 0.9}    g4
io.file.buffer.size       {4K, 64K}     g5
mapred.child.java.opts    {200, 500}    g6

Task-level parameters control the behavior of task execution, which is critical to Hadoop performance. Previous studies have shown that only a small set of parameters has significant performance impact. Thus, as shown in Table 2, we choose such task-level parameters as the candidates for tuning. We further shrink the initial search space of these parameters to a reasonable range in order to accelerate the search. This simple approach allows us to cut the search time down from a few hours to a few minutes.
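
To make the tuning candidates concrete, the following Python sketch (our illustration, not Ant's Java implementation) encodes the Table 2 search space and draws a random configuration candidate, as Ant does for the first wave; the names SEARCH_SPACE and random_candidate are ours.

```python
import random

# Task-level parameter search space from Table 2.
# Each gene g1..g6 maps a Hadoop parameter to its (low, high) range.
SEARCH_SPACE = {
    "io.sort.factor":         (1, 300),       # g1
    "io.sort.mb":             (100, 500),     # g2, in MB
    "io.sort.record.percent": (0.05, 0.8),    # g3
    "io.sort.spill.percent":  (0.1, 0.9),     # g4
    "io.file.buffer.size":    (4096, 65536),  # g5, 4K-64K bytes
    "mapred.child.java.opts": (200, 500),     # g6, heap size in MB
}

def random_candidate():
    """Draw one configuration candidate C = [g1, ..., gn] uniformly at random."""
    conf = {}
    for param, (low, high) in SEARCH_SPACE.items():
        if isinstance(low, int):
            conf[param] = random.randint(low, high)
        else:
            conf[param] = round(random.uniform(low, high), 3)
    return conf
```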

4.3 Evaluating Task Configurations

To compare the performance of different task configurations, Ant requires a quantitative metric to rank configurations. As the goal of task tuning is to minimize job execution time, task completion time (TCT) is an intuitive metric for evaluating performance. However, TCT by itself is not a reliable metric for evaluating task configurations. A longer task completion time does not necessarily indicate a worse configuration, as some tasks inherently take longer to complete. For example, due to data skew, tasks that have expensive records in their input files can take more than five times longer to complete. Thus, we combine TCT with another performance metric to construct a utility function (or a fitness function in genetic algorithms).

Algorithm 1 Ant task self-tuning algorithm.
1: /* Evaluate the fitness of each completed task */
2: f(C_1), ..., f(C_i), ..., f(C_M)
3: repeat
4:   if any slot is available then
5:     Select two configuration candidates as parents;
6:     Do crossover and mutation operations;
7:     Use the obtained new generation C_new to assign the task to the available slot;
8:   end if
9: until the running job is completed.

We found that most task misconfigurations are related to task memory allocation and incur excessive data spill operations. If either the kvbuffer or the metadata buffer fills up, a map task spills intermediate data to local disks. The spills can lead to three times more I/O operations [12], [15]. Thus, Ant is designed to simultaneously minimize task completion time and the number of spills.

We define the fitness function of a configuration candidate $C_i$ as

$$f(C_i) = \frac{1}{TCT^2(C_i) \times \#spills},$$

where $TCT$ is the task completion time and $\#spills$ is the number of spill operations. Since the majority of tasks have little or no data skew, we give more weight to TCT in the formulation of the fitness function. Task configurations with high fitness values are favored in the tuning process. Note that the fitness function does not address the issue of data skew due to non-uniform record distributions in task inputs. We believe that configurations optimized for a task with inherently more data can even be harmful to normal tasks, as allocating more memory to normal tasks wastes resources.
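
A direct transcription of the fitness function might look as follows; the clamp on the spill count is our own guard, since the formula is undefined when a task never spills.

```python
def fitness(tct_seconds: float, num_spills: int) -> float:
    """f(C_i) = 1 / (TCT^2(C_i) * #spills); higher is better.

    Squaring TCT gives completion time more weight than spills,
    matching the weighting described above. Clamping #spills to
    at least 1 is our assumption for spill-free tasks.
    """
    return 1.0 / (tct_seconds ** 2 * max(1, num_spills))

# A faster task with fewer spills ranks higher:
assert fitness(60.0, 2) > fitness(90.0, 2) > fitness(90.0, 5)
```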

4.4 Task Self-tuning

Ant deploys an online self-tuning approach based on a genetic algorithm to search for the optimal task-level configurations. We consider MapReduce jobs composed of multiple waves of map tasks. The performance of an individual task T is determined by its parameter set C. A candidate set $C_i$, consisting of a number of selected parameters (referred to as genes in the GA) and denoted as $C_i = [g_1, g_2, \cdots, g_n]$, represents a task configuration, where n is the number of parameters. Each element g represents a task-level parameter as shown in Table 2.

Reproduction process. Ant begins with an initial configuration of randomly generated candidates for the task assignment. After that, it evolves individual task configurations to breed good solutions during each interval by using the genetic reproduction operations. As shown in Algorithm 1, Ant first evaluates the fitness of all completed tasks in the last control interval. Note that M represents the total number of completed tasks in the last control interval. When there is any available slot in the cluster, it selects two configuration candidates as the evolving parents. Ant generates the new-generation configuration candidates by using the proposed genetic reproduction operations. Finally, it assigns the task with the newly generated configuration set to the available slot.

Fig. 6. Reproduction operations.

Many variations of the reproduction algorithm can be obtained by altering the selection, crossover, and mutation operators shown in Figure 6. Selection determines which two parents (task configuration sets) in the last generation will have offspring in the next generation. The crossover operator determines how genes are exchanged between the parents to create those offspring. Mutation allows for random alteration of genes. While the selection and crossover operators tend to increase the quality of task execution in the new generation and force convergence, mutation tends to introduce divergence.

Parent selection. A popular selection approach is the Roulette Wheel (RW) mechanism. In this method, if $f(C_i)$ is the fitness of a completed task in the candidate population, its probability of being selected is

$$P_i = \frac{f(C_i)}{\sum_{i=1}^{M} f(C_i)},$$

where M is the number of tasks completed in the previous interval. This allows candidates with good fitness values to have a higher probability of being selected as parents. The selection module ensures the reproduction of more highly fit candidates compared to less fit candidates.
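
The RW mechanism reduces to a few lines of Python; here candidates and fitnesses are parallel lists holding the completed tasks' configurations and their f(C_i) values from the last interval.

```python
import random

def roulette_wheel_select(candidates, fitnesses):
    """Pick one candidate with probability P_i = f(C_i) / sum_j f(C_j)."""
    total = sum(fitnesses)
    r = random.uniform(0.0, total)
    acc = 0.0
    for cand, fit in zip(candidates, fitnesses):
        acc += fit
        if acc >= r:
            return cand
    return candidates[-1]  # guard against floating-point round-off
```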

Crossover. A crossover function cuts the sequence of elements from two chosen candidates (parents) and swaps them to produce two new candidates (children). As the crossover operation is crucial to the success of Ant and is also problem dependent, an exclusive crossover operation is employed for each individual. We implement relative fitness crossover [16] instead of an absolute fitness crossover operation, because it moderates the selection pressure and controls the rate of convergence. The crossover operation is exercised on configuration candidates with a probability known as the crossover probability ($P_c$).
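
Section 6 describes the implemented crossover as a single random cut point with the tails swapped, applied with probability P_c = 0.7. Below is a sketch of that variant over the ordered gene list; the relative-fitness weighting of [16] is omitted for brevity.

```python
import random

PC = 0.7  # crossover probability (Section 6)

def crossover(parent_a, parent_b):
    """Single-point crossover: swap all genes beyond a random cut point."""
    keys = list(parent_a)  # assumes both parents share the same gene order
    if random.random() > PC or len(keys) < 2:
        return dict(parent_a), dict(parent_b)  # no crossover: clone parents
    cut = random.randint(1, len(keys) - 1)
    child_a, child_b = dict(parent_a), dict(parent_b)
    for k in keys[cut:]:
        child_a[k], child_b[k] = parent_b[k], parent_a[k]
    return child_a, child_b
```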

Mutation. The mutation function aims to avoid becoming trapped in a local optimum by randomly mutating an element with a given probability. Instead of performing gene-by-gene mutation at each generation, a random number r is generated for each individual. If r is smaller than the mutation probability ($P_m$), the individual undergoes the mutation process: a randomly chosen parameter is replaced with a new value generated randomly from its search space. This process prevents premature convergence of the population and helps Ant sample the entire solution space.
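
Under the same assumptions, and reusing SEARCH_SPACE from the earlier sketch, whole-individual mutation with the P_m = 0.2 of Section 6 might look like:

```python
import random

PM = 0.2  # mutation probability (Section 6)

def mutate(candidate, search_space):
    """With probability PM, replace one randomly chosen parameter
    with a fresh random value drawn from its search space."""
    child = dict(candidate)
    if random.random() < PM:
        param = random.choice(list(search_space))
        low, high = search_space[param]
        if isinstance(low, int):
            child[param] = random.randint(low, high)
        else:
            child[param] = round(random.uniform(low, high), 3)
    return child
```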

Algorithm 2 Aggressive selection algorithm.
1: /* Select the best configuration candidate C_best */
2: best = argmax_i [f(C_i)], i ∈ {1, ..., M}
3: /* Select another configuration set, by RW, from the candidates with fitness scores that exceed the mean */
4: Avg_f = (1/M) Σ_{i=1}^{M} f(C_i)
5: if f(C_i) > Avg_f then
6:   Select C_WR = C_i with probability P_i;
7: end if
8: Use C_best and C_WR as parents.


4.5 Aggressive Selection

The popular Roulette Wheel selection mechanism has a higher probability of selecting good candidates as parents than bad ones. However, this approach still results in too many task evaluations, which in turn reduces the speed of convergence. Therefore, our selection procedure is more aggressive and deterministically selects good candidates to be parents. We use the following two strategies to accelerate task-level parameter tuning.

Elitist strategy: We found that good candidates are more likely to produce good offspring. In this work, an elitist strategy is developed similar to the GAs proposed in [16]. Elitism provides a means of reducing genetic drift by ensuring that the best candidate is allowed to copy its attributes to the next generation. Since elitism can increase the selection pressure by preventing the loss of low-salience genes due to deficient selection pressure, it improves performance with regard to optimality and convergence. However, the elitism rate should be adjusted suitably and accurately, because high selection pressure may lead to premature convergence. The best candidate, with the highest fitness value in the previous generation, is preserved as one of the parents in the next generation.

Inferior strategy: We also found that it is unlikely for two low-fitness candidates to produce an offspring with high fitness. This is because bad performance is often caused by a few key parameters, and these bad settings continue to be inherited in real clusters. For instance, for an application that is both CPU- and shuffle-intensive in a cluster with ample I/O bandwidth and limited CPU resources, enabling compression of map outputs would stress the CPU and degrade application performance regardless of the other settings. The selection method should eliminate such a configuration quickly. In order to quickly eliminate poor candidates, we calculate the mean fitness of the completed tasks for each generation and only select parents with fitness scores that exceed the mean.

Aggressive selection algorithm. Based on the above two aggressive selection strategies, parent selection for the self-tuning reproduction is performed by an integrated selection algorithm. As shown in Algorithm 2, Ant first selects the best configuration candidate with the highest fitness in the last interval as one of the reproduction parents. It then selects another configuration set, from the candidates with fitness scores that exceed the mean, by applying the Roulette Wheel approach. Finally, Ant generates the new-generation configuration candidates by using the two selected candidates (i.e., C_best and C_WR) as the reproduction parents.
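
Algorithm 2 can be expressed compactly in Python as below; the fallback when no candidate exceeds the mean is our own addition for the degenerate case where all fitness values are equal.

```python
import random

def aggressive_select(candidates, fitnesses):
    """Elitist pick of C_best plus a roulette-wheel pick among the
    candidates whose fitness exceeds the wave mean (Algorithm 2)."""
    best_idx = max(range(len(candidates)), key=lambda i: fitnesses[i])
    c_best = candidates[best_idx]
    avg_f = sum(fitnesses) / len(fitnesses)
    above = [(c, f) for c, f in zip(candidates, fitnesses) if f > avg_f]
    if not above:
        return c_best, c_best  # degenerate wave: all fitness values equal
    pool, weights = zip(*above)
    c_wr = random.choices(pool, weights=weights, k=1)[0]
    return c_best, c_wr
```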

Furthermore, the aggressive selection strategies also reduce the impact of task skew during the tuning process. Long tasks due to data skew may add noise to the proposed GA-based task tuning. Taking advantage of aggressive selection, only the best configurations can be used to generate new configurations, and it is unlikely that skewed tasks would be selected as reproduction candidates. Thus, Ant finds the best configurations for the majority of tasks.

5 MOVING ANT INTO THE CLOUD

Cloud computing offers users the ability to access large pools of computational and storage resources on demand. The cloud computing paradigm enables developers to perform parallel data processing in a distributed and scalable environment at affordable cost and in reasonable time. In particular, MapReduce deployment in the cloud allows enterprises to cost-effectively analyze large amounts of data without creating large infrastructures of their own [17]. Using virtual machines (VMs) and storage hosted by the cloud, enterprises can simply create virtual MapReduce clusters to analyze their data. However, tuning the jobs hosted in virtual MapReduce clusters is a significant challenge for users, because even virtual nodes configured with the same virtual hardware can have varying capacity due to interference from co-located users [18], [19]. When running Hadoop in a virtual cluster, finding nodes with similar hardware capabilities is more challenging and complex than in a physical cluster.

When moving Ant into virtual MapReduce clusters, it is clearly inaccurate to divide nodes into homogeneous subclusters based on their static hardware configurations. The cloud offers the elasticity to slice large, underutilized physical servers into smaller, parallel virtual machines, enabling diverse applications to run in isolated environments on a shared hardware platform. As a result, such a cloud environment leads to various interferences among the different applications. In this work, we focus on two representative interference scenarios in the cloud: stable interference and dynamic interference. We extend Ant to virtual MapReduce clusters in such cloud environments.

• In the first scenario, stable interference, Ant only classifies VMs at the beginning of the adaptation process, since interferences are relatively stable during job execution.

• In the second scenario, dynamic interference, the background interference can change and the performance of virtual nodes varies over time. Thus, a re-clustering of the virtual nodes is needed to form new tuning groups. Ant adapts to the variations in VM performance by periodically re-grouping if significant performance changes in VMs are detected.

In the following, we describe how to classify a virtual MapReduce cluster into a number of homogeneous subclusters in the above two scenarios.

5.1 Stable Interference Scenario

Ant estimates the actual capabilities of virtual nodes based on low-level utilization of virtual resources. A previous study found that MapReduce jobs are mostly bottlenecked by the slow processing of large amounts of data [20]. An excessive I/O rate and a lack of CPU allocation are signs of slow processing. Ant characterizes a virtual node based on two measured performance statistics: I/O rate and CPU steal time. Both statistics can be measured at the TaskTracker of individual nodes.

Ant monitors the number of data bytes written to disk during the execution of a task. Since there is little data reuse in MapReduce jobs, the volume of writes is a good indicator of I/O access rate and memory demand. When the in-memory buffer, controlled by the parameter mapred.job.shuffle.merge.percent, runs out or reaches the threshold number of map outputs mapred.inmem.merge.threshold, it is merged and spilled to disk. The spilled records written to disk include both map and reduce spills. The TaskTracker automatically records the written data size and spilling duration of each task. We calculate the I/O access rate of individual nodes based on the data spilling operations.

The CPU steal time is the amount of time that the virtual node is ready to run but fails to obtain CPU cycles because the hypervisor is serving other users. It reflects the actual CPU time allocated to the virtual node and can effectively calibrate the configuration of virtual hardware according to the experienced interference. We record the real-time CPU steal time by running the Linux top command on each virtual machine. The TaskTracker reports the information to the master node in the cluster via the heartbeat connection.
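
The measurement above uses top; on Linux an equivalent reading is available from /proc/stat, where the eighth value on the aggregate cpu line counts steal ticks. A minimal monitor sketch (our illustration, not Ant's TaskTracker code):

```python
import time

def cpu_ticks():
    """Return (steal_ticks, total_ticks) from the 'cpu' line of /proc/stat."""
    with open("/proc/stat") as f:
        fields = [int(v) for v in f.readline().split()[1:]]
    steal = fields[7] if len(fields) > 7 else 0  # 8th field is steal
    return steal, sum(fields)

def steal_percent(interval_s=1.0):
    """Percentage of CPU time stolen by the hypervisor over the interval."""
    s0, t0 = cpu_ticks()
    time.sleep(interval_s)
    s1, t1 = cpu_ticks()
    return 100.0 * (s1 - s0) / max(1, t1 - t0)
```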

Ant uses the k-means clustering algorithm to classify virtual nodes into configuration groups. Formally, given a set of VMs $(M_1, M_2, \ldots, M_m)$, where each observation is a 2-dimensional real vector (i.e., I/O rate and CPU steal time), k-means clustering aims to partition the m observations into $k\,(\leq m)$ sets $S = \{S_1, S_2, \ldots, S_k\}$ so as to minimize the within-group sum of squares,

$$\arg\min_{S} \sum_{i=1}^{k} \sum_{M \in S_i} \| M - \mu_i \|^2. \quad (1)$$

Here $\mu_i$ is the mean of the points in $S_i$, and $\|M - \mu_i\|^2$ is the chosen (intra-group) distance measure between a data point $M$ and the group center $\mu_i$; it indicates the distance of a VM from its respective group center.
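
Assuming each TaskTracker reports one (I/O rate, CPU steal time) pair per VM, the grouping of Eq. 1 can be computed with an off-the-shelf k-means implementation such as scikit-learn (our choice for illustration; the values below are invented):

```python
import numpy as np
from sklearn.cluster import KMeans

# One row per VM: (I/O rate in MB/s, CPU steal time in %).
# In practice the two features should be rescaled to comparable ranges.
vm_stats = np.array([
    [42.0,  2.1], [40.5,  1.8],   # lightly interfered VMs
    [18.3, 19.7], [17.1, 22.4],   # heavily interfered VMs
])

kmeans = KMeans(n_clusters=2, n_init=10).fit(vm_stats)
print(kmeans.labels_)  # configuration group of each VM, e.g. [0 0 1 1]
```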

5.2 Dynamic Interference Scenario

Determining the value of k in the proposed k-means classification approach is more difficult and complex when Ant is applied in a highly dynamic cloud environment. The real capacity of virtual nodes in the cluster changes over time as the background interference changes. Furthermore, virtual nodes can be migrated among different physical host machines in the cloud, which also has a significant impact on the performance of virtual nodes. These observations motivate us to periodically re-group the virtual nodes to form new tuning groups. Accordingly, we have to decide the number of homogeneous subclusters and the frequency of re-grouping the subclusters in a dynamic interference environment.

5.2.1 K Value Selection

As designed, the k-means algorithm is not capable of determining the appropriate number of subclusters; it depends on the user to specify this in advance. However, it is very difficult for Ant to set the number of groups in the dynamic interference scenario. To achieve the desired performance of Ant, it is necessary to find the number of groups for dynamic interference among VMs at runtime. Fixing the number of groups in a virtual cloud cluster may lead to poor grouping quality and performance degradation. Thus, we propose a modified k-means method that finds the number of groups based on the quality of the grouping output on the fly. It relies on the following two performance metrics to qualify the grouping of similar VMs in a virtual MapReduce cluster.

• inter measures the separation of homogeneous VM groups. It is the sum of the distances between the group centroids, defined as $inter = \sum \|\mu_i - \mu_x\|^2$, $i, x \in \{1, \ldots, k\}$.

• intra measures the compactness of individual homogeneous VM groups. We use the standard deviation as intra to evaluate the compactness of the data points in each group of VMs, defined as $intra = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (M_i - \mu_i)^2}$.

As shown in Algorithm 3, users have the flexibility either to fix the number of groups or to enter a minimum number of groups. In the former case, the algorithm works in the same way as traditional k-means; the value of k is empirically determined based on historical workload records and MapReduce multi-wave behavior analysis. In the latter case, it lets the algorithm compute a new number of groups by incrementing the group counter by one in each iteration until the grouping quality threshold is satisfied (line 14 in Algorithm 3).

Algorithm 3 Dynamic grouping of VMs.
1: Inputs: k: number of groups (initialize k = 2); M: number of VMs in a virtual cluster.
2: Randomly choose k VMs in the cluster as the k initial groups;
3: repeat
4:   if any other VM is not in a group then
5:     assign the VM to a group according to Eq. 1;
6:     update the mean of the group;
7:   end if
8: until the minimum mean of each group is achieved;
9: if the number of groups is fixed then
10:   goto Output
11: end if
12: compute: inter = Σ ||µ_i − µ_x||², i, x ∈ {1, ..., k}
13: compute: intra = √( (1/n) Σ_{i=1}^{n} (M_i − µ_i)² )
14: if intra_k < intra_{k−1} and inter_k > inter_{k−1} then
15:   k ← k + 1 and goto line 3;
16: end if
17: Output: a set of k groups.

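
The essence of Algorithm 3's k selection, growing k while intra shrinks and inter grows, might be sketched as follows; it builds on the same scikit-learn k-means as above, and the metric helpers follow the definitions in Section 5.2.1.

```python
import numpy as np
from sklearn.cluster import KMeans

def inter_intra(points, labels, centers):
    """Grouping-quality metrics from Section 5.2.1."""
    k = len(centers)
    inter = sum(np.sum((centers[i] - centers[x]) ** 2)
                for i in range(k) for x in range(k) if i != x)
    intra = np.sqrt(np.mean(np.sum((points - centers[labels]) ** 2, axis=1)))
    return inter, intra

def choose_k(points, k_min=2):
    """Increment k while intra shrinks and inter grows (Algorithm 3, line 14)."""
    k = k_min
    km = KMeans(n_clusters=k, n_init=10).fit(points)
    inter_prev, intra_prev = inter_intra(points, km.labels_, km.cluster_centers_)
    while k + 1 <= len(points):
        km = KMeans(n_clusters=k + 1, n_init=10).fit(points)
        inter_k, intra_k = inter_intra(points, km.labels_, km.cluster_centers_)
        if not (intra_k < intra_prev and inter_k > inter_prev):
            break
        k, inter_prev, intra_prev = k + 1, inter_k, intra_k
    return k
```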

5.2.2 Dynamic Re-grouping

Since multi-tenancy interference in the cloud is usually time-varying, the k-means grouping of homogeneous virtual nodes should be executed periodically. Intuitively, the frequency of re-grouping is determined by two factors: the dynamics of multi-tenancy interference in the cloud and the size of the virtual MapReduce cluster.

• If the multi-tenancy interference changes frequently, re-grouping should correspondingly be more frequent so as to capture the dynamic capacity changes of the virtual nodes.

• If the size of the virtual cluster is small, re-grouping should be less frequent. This is because it takes a certain amount of time to find favorable task configurations after each re-grouping adjustment, which may outweigh the benefit of re-grouping. On the contrary, re-grouping needs to be more frequent when the size of the virtual cluster is large.

Based on the above analysis, we formally formulate the dynamic capacity change of the virtual nodes based on the two performance metrics (i.e., inter and intra) as

$$Dynamic(t) = \frac{intra(t)}{intra(t-1)} \times \frac{inter(t-1)}{inter(t)}. \quad (2)$$

Either an increase in intra or a decrease in inter means the dynamic capacity changes of VMs are becoming significant. Growth in intra means the compactness of individual homogeneous VM groups is becoming relaxed; a reduction in inter means the separation of homogeneous VM groups is becoming unstable and re-grouping becomes necessary. We periodically monitor the two performance metrics to reflect the interference in the cloud. Here, t is the interval at which we evaluate the dynamic capacity changes. We empirically set the value of t to one minute in the experiments; it is a tradeoff between the average task completion time and the fluctuation of interference in the cloud.

Algorithm 4 Dynamic re-grouping interval.
1: Inputs: T_n: the interval of the n-th re-grouping (initialize T_1 = 10 minutes);
2: repeat
3:   Collect statistics of inter and intra periodically;
4:   Compute at the end of T_n: T_{n+1} = T_n / Avg_{t∈T_n} Dynamic(t), according to Eq. 2;
5: until the job is completed;
6: Output: new re-grouping interval T_{n+1}.


We formally present the dynamic re-grouping interval selection process in Algorithm 4. The re-grouping interval (T_n) is initialized to 10 minutes. It is then dynamically adjusted at runtime based on the dynamic capacity changes of the virtual nodes. As the dynamics increase, the re-grouping interval decreases (line 4 in Algorithm 4). The term Avg_{t∈T_n} Dynamic(t) represents the average dynamics during the previous re-grouping interval T_n. It aims to mitigate the impact of time-varying dynamic interference in the cloud.
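
Eq. 2 and the interval update of Algorithm 4 reduce to a few lines; the worked example shows that when interference grows (average Dynamic(t) above 1), the re-grouping interval shrinks.

```python
def dynamic(intra_t, intra_prev, inter_t, inter_prev):
    """Eq. 2: capacity-change indicator; values above 1 mean the
    grouping quality is degrading (intra grew or inter shrank)."""
    return (intra_t / intra_prev) * (inter_prev / inter_t)

def next_interval(t_n_minutes, dynamic_samples):
    """Algorithm 4, line 4: T_{n+1} = T_n / Avg_{t in T_n} Dynamic(t)."""
    avg_dyn = sum(dynamic_samples) / len(dynamic_samples)
    return t_n_minutes / avg_dyn

# Interference grew (avg Dynamic = 1.25): the 10-minute interval
# shrinks to 8 minutes, so re-grouping happens sooner.
print(next_interval(10.0, [1.2, 1.3]))  # -> 8.0
```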

6 IMPLEMENTATION

Hadoop modification: We implemented Ant by modifying the classes JobTracker, TaskTracker, and LaunchTaskAction, based on Hadoop version 1.0.3. We added a new interface, taskConf, which is used to specify the configuration file of individual tasks while assigning them to slave nodes. Each task-level parameter set is tagged with its corresponding AttemptTaskID. Additionally, we added another new interface, Optimizer, to implement the GA optimization. During job execution, we created a method taskAnalyzer to collect the status of each completed task by using TaskCounter and TaskReport.

Ant execution process: At slave nodes, once a TaskTracker gets task execution commands from the TaskScheduler by calling LaunchTaskAction, it requires task executors to accept a launchTask() action from the local TaskTracker. Ant uses the launchTask() RPC connection to pass on the task-level configuration file description (i.e., taskConf), which is originally supported by Hadoop. Ant creates a directory in the local file system to store the per-task configuration data for map/reduce tasks at the TaskTracker. The directory is under the task working directory and is tagged with the AttemptTaskID obtained from the JobTracker. Therefore, tasks can load their specified configuration items by accessing their task-local file systems while initializing individual tasks via Localizetask(). After task localization, the TaskTracker kicks off a process to start a MapTask or ReduceTask thread to execute the user-defined map and reduce functions.

Algorithm implementation: We implemented the self-tuning algorithm to generate the configuration sets for the new generation of tasks in each control interval (i.e., 5 minutes). The choice of control interval is a tradeoff between the parameter search speed and the average task execution time. If the interval is too long, it takes more time to find good configurations; if the interval is too short, tasks with new configurations may not complete and no performance feedback can be collected. Thus, we choose a control interval of 5 minutes, which is approximately twice the average task execution time. The mutated value of a parameter is randomly chosen from its search space. Since our aggressive selection algorithm prunes poor regions, we can use an atypically high mutation rate (e.g., $p_m = 0.2$) without impacting convergence. The value of $p_m$ is empirically determined. A cut point is randomly chosen in each parent candidate configuration and all parameters beyond that point are swapped between the two parents to produce two children. We empirically set the crossover probability $p_c$ to 0.7.

7 EVALUATION ON A PHYSICAL CLUSTER

7.1 Experiment Setup

We evaluate Ant on a physical cluster [11] using three MapReduce applications from the PUMA benchmark [14] with different input sizes, as shown in Table 3; these applications have been widely used to evaluate MapReduce performance in previous work [21]. We compare the performance of Ant with two main competitors in practical use: Starfish [6], a job-profiling-based configuration approach from Duke University, and Rules-of-Thumb (Heuristic), a representative heuristic configuration approach from industry leader Cloudera [12] (see footnote 1). For reference, we normalize the Job Completion Time (JCT) achieved by the various approaches to the JCT achieved with the stock Hadoop parameter settings. Unless otherwise specified, we use the stock Hadoop configuration for the items not listed in the preliminary study [11]. Note that both Heuristic and Starfish always maintain identical configuration files for job executions, as described in Section 2. For fairness, the cluster-level and job-level parameters for all approaches (including the baseline stock configuration) are set to the values suggested by the Cloudera rules of thumb [12]. For example, we roughly set the value of mapred.reduce.tasks (the number of reduce tasks of the job) to 0.9 times the total number of reduce slots in the cluster.

1. Cloudera recommends a set of configuration items based on its industry experience; e.g., io.sort.record.percent is recommended to be set to 16 / (16 + average record size), based on the average size of map output records. More rules are available in [12].


TABLE 3
The characteristics of benchmark applications used in our experiments.

Category    Type           Label     Input size (GB)  Input data  # Maps         # Reduces
Wordcount   CPU intensive  J1/J2/J3  100/300/900      Wikipedia   400/1200/3600  14/14/14
Grep        I/O intensive  J4/J5/J6  100/300/900      Wikipedia   400/1200/3600  14/14/14
Terasort    I/O intensive  J7/J8/J9  100/300/900      TeraGen     400/1200/3600  14/14/14

Fig. 7. Job completion time on the physical cluster: JCT normalized to Stock for jobs J1-J9, comparing Stock, Heuristic, Starfish, and Ant.

7.2 Effectiveness of Ant

Reducing job completion time. Figure 7 compares the job completion times achieved by Heuristic, Starfish, and Ant. The results demonstrate that all of these configuration approaches improve job completion times to varying degrees compared with the performance achieved by the Hadoop stock parameter setting (Stock). Figure 7 shows that Ant improves the average job completion time by 31%, 20%, and 14% compared with Stock, Heuristic, and Starfish on the physical cluster, respectively. This is because Stock, Heuristic, and Starfish all rely on a unified and static task-level parameter setting. Such a unified configuration is clearly inefficient in a heterogeneous cluster. The results also reveal that Starfish is more effective than Heuristic on the physical cluster, since Starfish benefits from its job profiling capability. The learning process of Starfish is more accurate than the experience-based tuning of Heuristic at capturing the characteristics of individual jobs.

Impact of job size. Figure 7 shows that Ant only slightly reduces the completion time of small jobs compared with stock Hadoop, e.g., J1, J4, and J7. In contrast, Ant is more effective for large jobs, e.g., J3, J6, and J9. This is because small jobs usually have short execution times and the self-tuning process cannot immediately find the optimal configurations. Thus, such small jobs are not favored by Ant.

Impact of workload type. Figure 7 reveals that Ant reduces the job completion time of the I/O-intensive workloads, i.e., Grep and Terasort, by 10% more than that of the CPU-intensive workload, i.e., Wordcount. This is because Ant focuses on tuning task-level I/O parameters and accordingly has a larger effect on I/O-intensive workloads.

Overall, Ant achieves consistently better performance than Heuristic and Starfish on the physical Hadoop cluster, owing to its capability of adaptively tuning task-level parameters while considering various workload preferences and heterogeneous platforms.

Fig. 8. Task-level parameter search process: the io.sort.record.percent values explored over time (minutes) on (a) Atom and (b) T110.


7.3 Ant Searching Process

To take a closer look at Ant's searching process, Figure 8 depicts the search path of the parameter io.sort.record.percent on two representative machines (i.e., Atom and T110) in the physical cluster. Figure 8(a) shows that Atom spends around 20 minutes finding a stable parameter range. In contrast, Figure 8(b) shows that T110 takes only 7 minutes, because there are three T110 machines in the cluster, providing three times as many concurrently running tasks as the single Atom machine. More parallel task executions mean more opportunities to learn the characteristics of the running job on the cluster in the same time period.

8 EVALUATION ON VIRTUAL CLUSTERS

8.1 Experiment Setup

We built a virtual cluster in our university cloud. VMware vSphere 5.1 was used for server virtualization; the vSphere module controls the CPU usage limits in MHz allocated to VMs. We created a pool of VMs with different hardware configurations from the virtualized blade server cluster and ran them as Hadoop slave nodes. All VMs ran Ubuntu Server 12.04 with Linux kernel 3.2. The cluster-level configurations for Hadoop are the same as those in the physical cluster (Section 7.1). The number of reduce tasks is set to 42, which is 0.9 times the number of available reduce slots in the cluster.
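As a concrete illustration of this sizing rule (the slot count below is assumed for illustration, not reported cluster data):

```python
# One-wave sizing rule for reduces: set the reduce count slightly below
# the cluster's total reduce slots so all reduce tasks run in one wave.
reduce_slots = 47                        # hypothetical cluster total
num_reduces = int(0.9 * reduce_slots)    # -> 42, as in our setup
```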

Stable interference scenario: Please refer to the preliminary study [11] for the effectiveness of Ant under stable interference.


Fig. 9. Dynamic interference in the cloud: cluster-level CPU utilization (%) over a 160-minute window.

Dynamic interference scenario: The virtual cluster contains 24 VMs, each with 2 vCPUs, 4 GB RAM and 80 GB of hard disk space, hosted on blade servers in our cloud. Unlike the stable interference scenario, we do not run any specific applications in the cloud as a dynamic interference producer. Instead, the various applications (e.g., scientific computations and multi-tier Web applications) hosted in our university cloud over time play that role. As shown in Figure 9, we record the cluster-level CPU utilization to illustrate the interference in the cloud. We set the initial value of k to 2 in Algorithm 3 and then dynamically divide the virtual cluster into subclusters using the modified k-means clustering approach. The re-grouping interval is adjusted dynamically based on Algorithm 4 so that the grouping of virtual nodes is updated to capture interference changes in the cloud.
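The grouping step can be sketched as a standard k-means pass over per-node statistics; the feature choice, normalization, and function names below are illustrative assumptions, and the modified algorithm's adaptation of k itself (Algorithm 3) is omitted here.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(nodes, k, iters=20):
    """Group virtual nodes into k subclusters. nodes is a list of
    per-node statistics tuples, normalized to comparable scales;
    returns one subcluster label per node."""
    centroids = random.sample(nodes, k)
    labels = [0] * len(nodes)
    for _ in range(iters):
        # Assignment step: attach each node to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist2(n, centroids[c]))
                  for n in nodes]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [n for n, lbl in zip(nodes, labels) if lbl == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return labels
```

Each resulting subcluster is then treated as a homogeneous group for task-tuning purposes.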

We first demonstrate the performance improvement achieved by Ant on the virtual cluster with dynamic interference. We then evaluate the effectiveness of the proposed re-grouping approach in the dynamic interference environment. Finally, we analyze the effectiveness of the dynamic re-grouping interval selection.

8.2 Effectiveness of Ant under Dynamic Interference

Figure 10 compares the job completion times achieved by Heuristic, Starfish and Ant in the cloud with dynamic interference. It demonstrates that all of these configuration approaches improve the job completion time compared with that achieved by the stock Hadoop parameter setting (Stock). The result shows that Ant improves the average job completion time by 20%, 15% and 11% compared with Stock, Heuristic and Starfish on the virtual cluster, respectively. As in the physical cluster scenario, Stock, Heuristic and Starfish all rely on a unified and static task-level parameter setting. Such unified configurations are inefficient in a virtual cluster with varying interference; in the cloud environment, task configurations should change in response to changes in interference. The results also reveal that Heuristic is more effective than Starfish on the virtual cluster.

Fig. 10. Job completion time on the virtual cluster with dynamic performance interference (JCT normalized to the default Stock setting; jobs J1-J9 under Stock, Heuristic, Starfish, and Ant).

Although the learning-based Starfish is more accurate than the experience-based Heuristic in the physical cluster, it fails to capture the characteristics of the various dynamic interferences in the cloud environment.

Figure 11 shows the detailed impact on task-level performance when applying Ant in the cloud environment. Figure 11(a) depicts the breakdown of the job completion time of the three jobs (i.e., J3, J6 and J9) used in the experiment. It reveals that Wordcount is map-intensive while Terasort and Grep are shuffle- and reduce-intensive. Figure 11(b) shows the average completion times of map tasks and reduce tasks in the experiment. The result demonstrates that map tasks are relatively small compared to reduce tasks. This is because the number of reduce tasks is configured to be 0.9 times the total number of reduce slots in the cluster, which aims to complete the execution of reduce tasks in one wave and avoid the overhead caused by data shuffling.

Figure 12 shows the job completion time improvement achieved by Genetic, Starfish, and Heuristic, each with and without clustering, in a single heterogeneous cluster. The results demonstrate that the approaches with clustering achieve better performance improvement than the approaches without clustering in heterogeneous environments. We also find that Genetic with clustering achieves a slight performance improvement over the other two approaches. This reflects the fact that the main contribution of Ant lies in combining dynamic clustering with task-level adaptive configuration in heterogeneous environments.

8.3 Effectiveness of Subcluster Re-grouping

Figure 13(a) compares the job completion time improvement achieved by Ant with dynamic subcluster re-grouping against Ant with static grouping. It shows that Ant with dynamic re-grouping outperforms Ant with static grouping by almost 100% in terms of job completion time improvement. This is due to the benefit of re-grouping virtual nodes into a number of homogeneous subclusters so as to mitigate the impact of interference on task tuning. Without re-grouping, Ant uses a static number of groups, as in the stable interference scenario, and only groups the virtual nodes at the beginning of the task tuning process in the dynamic interference environment.


Fig. 11. Impact on task-level performance: (a) breakdown of JCT (hours) into map, shuffle, and reduce phases for Wordcount, Grep, and Terasort; (b) average task completion times of map tasks (seconds) and reduce tasks (minutes) for each workload type.

Thus, Ant without the subcluster re-grouping capability cannot capture time-varying interference in the cloud. Figure 13(b) shows the impact of the number of groups on the job completion time under the static grouping approach. We empirically select the static number of groups (i.e., k = 2) for comparison with the dynamic subcluster re-grouping capability.

Figure 13(c) shows that both the number of groups and the re-grouping interval are dynamically adjusted based on the time-varying interference in the cloud. Ant periodically collects the dynamic capacity changes of the virtual nodes and then re-groups them according to Algorithms 3 and 4. Figure 13(c) depicts the dynamic tuning process of Ant in a 160-minute window, corresponding to the time-varying interference scenario shown in Figure 9. The number of groups is initialized to two at the beginning and is then tuned dynamically based on capacity changes. The result shows that the number of groups stays stable at two when the interference is relatively stable, such as in the period between the 120th and 160th minutes. However, the number of groups fluctuates when the interference changes frequently, such as around the 100th minute.

At the same time, the re-grouping interval adapts to interference changes at runtime. When the cloud interference fluctuates significantly (e.g., from the 90th to the 110th minute), the re-grouping interval becomes small so that Ant can capture the capacity changes of the virtual nodes in the cluster. Conversely, the re-grouping interval grows when the interference changes less frequently in the cloud. This reflects the sensitivity of system performance to the interference dynamics in our university cloud.
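This adaptation can be viewed as a simple feedback rule: shrink the interval when recent capacity measurements are volatile and grow it when they are steady. The sketch below is illustrative; the thresholds, bounds, and function name are assumptions rather than the actual logic of Algorithm 4.

```python
def next_regroup_interval(interval, capacity_history,
                          lo=2.0, hi=40.0, threshold=0.1):
    """Adapt the re-grouping interval (in minutes) to interference
    volatility. capacity_history holds recent mean node-capacity
    samples; threshold is the relative change treated as significant."""
    if len(capacity_history) < 2:
        return interval
    prev, curr = capacity_history[-2], capacity_history[-1]
    change = abs(curr - prev) / max(abs(prev), 1e-9)
    if change > threshold:
        return max(lo, interval / 2)    # volatile: re-group sooner
    return min(hi, interval * 1.5)      # steady: re-group less often
```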

8.4 Sensitivity of GA Parameter Selection

We vary the crossover probability and the mutation rate to study their impact on job completion time. Figure 14(a) shows that the job completion time initially decreases as the crossover probability increases; however, increasing the crossover probability further leads to performance degradation. This indicates that a very large crossover probability deteriorates the job completion time. Thus, we empirically set the crossover probability to 0.7 in the experiment.

Fig. 12. Comparison of different tuning approaches in a heterogeneous cluster: JCT improvement (%) of Starfish, Heuristic, and Genetic (a) without clustering and (b) with clustering, for Wordcount, Grep, and Terasort.

This choice is a tradeoff between search speed and job completion time improvement. Figure 14(b) shows that tuning the mutation rate exhibits a similar phenomenon. A large mutation rate (e.g., 0.5) leads to significant performance deterioration due to the instability of task-level configurations. Thus, we empirically set the mutation rate to 0.2, which does not impair convergence in the experiment.

9 RELATED WORK

Heterogeneous environment. As heterogeneous hardware is applied to Hadoop clusters, improving MapReduce performance in heterogeneous environments has attracted much attention [1], [2], [22], [23]. Ahmad et al. [1] identified key reasons for MapReduce's poor performance on heterogeneous clusters. Accordingly, they proposed an optimization-based approach, Tarazu, to improve MapReduce performance through communication-aware load balancing. Zaharia et al. [2] designed a robust MapReduce scheduling algorithm, LATE, to improve the completion time of MapReduce jobs in a heterogeneous environment. These efforts paid little attention to optimizing Hadoop configurations, which have a significant impact on the performance of MapReduce jobs, especially in a heterogeneous Hadoop cluster.

Parameter configuration. Recently, a few studies have started to explore how to optimize Hadoop configurations to improve job performance. Herodotou et al. [7] proposed several automatic optimization-based approaches for MapReduce parameter configuration. Kambatla et al. [24] presented a Hadoop job provisioning approach that analyzes and compares the resource consumption of applications, aiming to maximize job performance while minimizing the incurred cost. Lama and Zhou designed AROMA [8], an approach that automates resource allocation and configuration of Hadoop parameters to achieve performance goals while minimizing the incurred cost. Herodotou et al. proposed Starfish [6], an optimization framework that hierarchically optimizes from jobs to workflows by searching for good parameter configurations. These approaches mostly rely on the default Hadoop framework and configure the parameters with static settings. They are often not effective when the cluster platform becomes heterogeneous.


Fig. 13. Performance comparison between the static grouping and dynamic re-grouping approaches: (a) JCT improvement (%) per workload; (b) normalized JCT under static grouping for k = 1 to 4; (c) dynamic adjustment of the re-grouping interval (minutes) and the number of groups over the 160-minute window.

Fig. 14. Sensitivity of the crossover probability and mutation rate: normalized JCT versus (a) crossover probability pc from 0.5 to 0.9 and (b) mutation rate pm from 0.1 to 0.5.


MapReduce in the cloud. Currently, there are several options for using MapReduce in cloud environments, such as using MapReduce as a service, setting up one's own MapReduce cluster on cloud instances, or using specialized cloud MapReduce runtimes that take advantage of cloud infrastructure services. Guo et al. [25] designed FlexSlot, an effective yet simple extension to the slot-based Hadoop task scheduling framework that adaptively changes the number of slots on each virtual node to promote efficient use of the resource pool in cloud environments. Chiang et al. [20] presented TRACON, a novel task and resource allocation control framework that mitigates interference effects from concurrent data-intensive applications. Ant differentiates itself from these efforts through its capability of adaptive task-level tuning for performance optimization.

10 CONCLUSIONS AND FUTURE WORK

Although a unified design framework such as MapReduce is convenient and easy to use for large-scale parallel and distributed programming, it ignores the differentiated needs of various platforms and workloads. In this paper, we tackle a practical yet challenging problem: automatic configuration of large-scale MapReduce workloads in heterogeneous environments. We have proposed and developed a self-adaptive task-level tuning approach, Ant, that automatically finds the optimal settings for individual jobs running on heterogeneous nodes. In Ant, tasks are customized with different settings to match the capabilities of heterogeneous nodes.

Ant works best for large jobs with multiple rounds of map task execution. Our experimental results demonstrate that Ant improves the average job completion time on a physical cluster by 31%, 20%, and 14% compared to stock Hadoop, customized Hadoop with industry recommendations, and a profiling-based configuration approach, respectively. Experimental results on two virtual cloud clusters with varying multi-tenancy interferences show that Ant improves the average job completion time by 20%, 15%, and 11% compared to Stock, Heuristic and Starfish, respectively. Ant can be deployed to different types of clusters, and is thus flexible and adaptive.

Our method Ant can be extended to other frameworks such as Spark, though some additional effort is needed. Unlike Hadoop, which executes individual tasks in separate JVMs, Spark uses executors to host multiple tasks on worker nodes. To extend Ant to Spark, we would need to dynamically change executor sizes without restarting a launched job. Since running Spark on a generic cluster management middleware such as YARN is becoming increasingly popular, it is possible to enable malleable executors using resource containers. Ant could then monitor the completion times of individual tasks and use this information as feedback to determine the optimal size of Spark executors. In the future, we also plan to extend Ant to multi-tenant public cloud environments such as Azure and EC2.

ACKNOWLEDGEMENT

This research was supported in part by U.S. NSF research grants CNS-1422119, CNS-1320122, CNS-1217979, and NSF of China research grant 61328203. A preliminary version of this paper appeared in [11]. The authors are grateful to the editor and anonymous reviewers for their valuable suggestions for revising the manuscript.

REFERENCES

[1] F. Ahmad, S. Chakradhar, A. Raghunathan, and T. N. Vijaykumar. Tarazu: Optimizing MapReduce on heterogeneous clusters. In Proc. of Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012.


[2] M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proc. of USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2008.
[3] D. Cheng, C. Jiang, and X. Zhou. Resource and deadline-aware job scheduling in dynamic Hadoop clusters. In Proc. of IEEE Int'l Symposium on Parallel and Distributed Processing (IPDPS), 2015.
[4] D. Cheng, P. Lama, C. Jiang, and X. Zhou. Towards energy efficiency in heterogeneous Hadoop clusters by adaptive task assignment. In Proc. of IEEE Int'l Conference on Distributed Computing Systems (ICDCS), 2015.
[5] H. Herodotou, F. Dong, and S. Babu. No one (cluster) size fits all: Automatic cluster sizing for data-intensive analytics. In Proc. of ACM Symposium on Cloud Computing (SoCC), 2011.
[6] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu. Starfish: A self-tuning system for big data analytics. In Proc. of Conference on Innovative Data Systems Research (CIDR), 2011.
[7] H. Herodotou and S. Babu. Profiling, what-if analysis, and cost-based optimization of MapReduce programs. In Proc. of Int'l Conf. on Very Large Data Bases (VLDB), 2011.
[8] P. Lama and X. Zhou. AROMA: Automated resource allocation and configuration of MapReduce environment in the cloud. In Proc. of Int'l Conf. on Autonomic Computing (ICAC), 2012.
[9] M. Li, L. Zeng, S. Meng, J. Tan, L. Zhang, N. Fuller, and A. R. Butt. MROnline: MapReduce online performance tuning. In Proc. of ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2014.
[10] T. White. Hadoop: The Definitive Guide, 3rd ed. O'Reilly Media / Yahoo Press, 2012.
[11] D. Cheng, J. Rao, Y. Guo, and X. Zhou. Improving MapReduce performance in heterogeneous environments with adaptive task tuning. In Proc. of ACM/IFIP/USENIX Int'l Middleware Conference (Middleware), 2014.
[12] Cloudera. Configuration parameters. http://blog.cloudera.com/blog/author/aaron/, 2012.
[13] MapR. The executive's guide to big data. http://www.mapr.com/resources/white-papers, 2013.
[14] PUMA. Purdue MapReduce benchmark suite. 2012.
[15] Y. Guo, J. Rao, and X. Zhou. iShuffle: Improving Hadoop performance with shuffle-on-write. In Proc. of Int'l Conf. on Autonomic Computing (ICAC), 2013.
[16] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. on Evolutionary Computation, vol. 6, pp. 182-197, 2002.
[17] B. Igou. User survey analysis: Cloud-computing budgets are growing and shifting; traditional IT services providers must prepare or perish. Gartner Report, 2010.
[18] B. Palanisamy, A. Singh, and L. Liu. Cost-effective resource provisioning for MapReduce in a cloud. IEEE Trans. on Parallel and Distributed Systems (TPDS), 2014.
[19] B. Sharma, T. Wood, and C. R. Das. HybridMR: A hierarchical MapReduce scheduler for hybrid data centers. In Proc. of IEEE Int'l Conference on Distributed Computing Systems (ICDCS), 2013.
[20] R. C. Chiang and H. H. Huang. Interference-aware scheduling for data-intensive applications in virtualized environments. In Proc. of Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011.
[21] B. Cho, M. Rahman, T. Chajed, I. Gupta, C. Abad, N. Roberts, and P. Lin. Natjam: Eviction policies for supporting priorities and deadlines in MapReduce clusters. In Proc. of ACM Symposium on Cloud Computing (SoCC), 2013.
[22] D. Cheng, J. Rao, C. Jiang, and X. Zhou. Elastic power-aware resource provisioning of heterogeneous workloads in self-sustainable datacenters. IEEE Trans. on Computers (TC), 2016.
[23] D. Cheng, C. Jiang, and X. Zhou. Heterogeneity-aware workload placement and migration in distributed sustainable datacenters. In Proc. of IEEE Int'l Symposium on Parallel and Distributed Processing (IPDPS), 2014.
[24] K. Kambatla, A. Pathak, and H. Pucha. Towards optimizing Hadoop provisioning in the cloud. In Proc. of USENIX HotCloud Workshop, 2009.
[25] Y. Guo, J. Rao, C. Jiang, and X. Zhou. Moving MapReduce into the cloud with flexible slot management. In Proc. of Int'l Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2014.

Dazhao Cheng received his B.S. degree in Electronic Engineering from Hefei University of Technology in 2006 and his M.S. degree from the University of Science and Technology of China in 2009. He received his Ph.D. degree from the University of Colorado, Colorado Springs in 2016. He is currently an Assistant Professor in the Department of Computer Science at the University of North Carolina, Charlotte. His research interests include cloud computing and big data processing. He is a member of the IEEE.

Jia Rao received his B.S. and M.S. degrees in Computer Science from Wuhan University in 2004 and 2006, respectively, and his Ph.D. degree from Wayne State University in 2011. He is currently an Assistant Professor in the Department of Computer Science at the University of Colorado, Colorado Springs. His research interests include distributed systems, resource auto-configuration, machine learning and CPU scheduling on emerging multi-core systems. He is a member of the IEEE.

Yanfei Guo received his B.S. degree in Computer Science and Technology from Huazhong University of Science and Technology, China, in 2010, and his Ph.D. degree in Computer Science from the University of Colorado, Colorado Springs in 2015. He is currently a Postdoctoral Fellow at Argonne National Laboratory. His research interests include cloud computing, big data processing and MapReduce, and HPC. He is a member of the IEEE.

Changjun Jiang received the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1995. Currently he is a Professor with the Department of Computer Science, Tongji University, Shanghai. He is also the Director of the Professional Committee of Petri Nets of the China Computer Federation and the Vice Director of the Professional Committee of Management Systems of the China Automation Federation. His current research areas are concurrency theory, Petri nets, and intelligent transportation systems. He is a member of the IEEE.

Xiaobo Zhou obtained the B.S., M.S., and Ph.D. degrees in Computer Science from Nanjing University in 1994, 1997, and 2000, respectively. Currently he is a Professor and the Chair of the Department of Computer Science, University of Colorado, Colorado Springs. His research lies broadly in computer network systems, specifically cloud computing and datacenters, Big Data parallel and distributed processing, autonomic and sustainable computing, and scalable Internet services and architectures. He was a recipient of the NSF CAREER Award in 2009. He is a senior member of the IEEE.

