
2017 IEEE Symposium on Computers and Communications (ISCC)

EAalo: Enhanced Coflow Scheduling Without Prior Knowledge in a Datacenter Network

Zhaogeng Li*, Jun Bi*, Yiran Zhang*, Chuang Wang† *Institute for Network Sciences and Cyberspace, Tsinghua University, Beijing, China

*Department of Computer Science and Technology, Tsinghua University, Beijing, China *Tsinghua National Laboratory for Information Science and Technology (TNList), Beijing, China

†Huawei Corporation, Beijing, China [email protected], [email protected], [email protected], [email protected]

Abstract-Coflow scheduling without prior knowledge has been proposed recently. However, the previous solution, Aalo, has two problems: it does not explicitly control the flow rate, and it assumes the network is ideally non-blocking (with perfect traffic balancing and no oversubscription). In this paper, we show the performance loss caused by these two problems. We propose EAalo to cope with them. It adds three enhancements to Aalo: per-flow bandwidth enforcement, centralized traffic balancing and oversubscription adaptation. We evaluate EAalo with simulations. The simulation results show that EAalo performs much better than Aalo. It can speed up coflow completion by up to 11%, 17% and 19% in non-blocking networks, networks without oversubscription and networks with 2:1 oversubscription, respectively.

Index Terms-Datacenter Network, Coflow Scheduling, Traffic Balancing, Oversubscription

I. INTRODUCTION

Many data-intensive applications run in current datacenters. The communication of these applications takes place between groups of servers in successive computation stages (e.g. MapReduce). Often a computation stage cannot finish until all the flows in this stage have completed. All flows in the same stage are abstracted as a coflow [1]. Decreasing coflow completion time (CCT) makes data-intensive jobs complete faster. This requirement has led to the emergence of coflow scheduling.

Recently, many coflow scheduling solutions have been proposed. Some of them use FIFO scheduling [2], [3]. Others use smallest-bottleneck-first and smallest-total-size-first scheduling [4]. However, these solutions require some prior knowledge, such as flow size. Aalo [5] proposes coflow scheduling without prior knowledge, which is better suited to scenarios where the above information is not known in advance [6].

Although Aalo performs well, two problems remain. First, Aalo only controls the ON/OFF state of flows and does not use bandwidth enforcement. This is mainly because, without prior knowledge of flow size, accurate rate control does not translate into accurate flow completion times. Second, Aalo assumes the network fabric is non-blocking. Due to imperfect traffic balancing and the existence of oversubscription, this assumption does not hold.


In this paper, we focus on the impacts of the above problems and propose a coflow scheduling solution extended with three enhancements, called EAalo (Enhanced Aalo). First, we uncover the performance degradation that occurs when there is no explicit flow rate control. We argue that per-flow bandwidth enforcement should be employed. Second, we show the performance loss when only ECMP is used for traffic balancing. We suggest using a centralized flow scheduler to calculate flow forwarding paths with Largest Best Fit (LBF). Third, we show the mismatch between Aalo coordination and network oversubscription. We propose a modified coordination based on an abstract topology to handle oversubscription.

We evaluated EAalo with simulations. The simulation results suggest EAalo performs much better than Aalo. It can speed up coflow completion by up to 11%, 17% and 19% in non-blocking networks, networks without oversubscription and networks with 2:1 oversubscription, respectively. In particular, LBF-based flow scheduling achieves performance close to that of optimal traffic balancing. The simulations also show that a coordination interval of tens of milliseconds is suitable for hundreds of servers and hundreds of coflows.

This paper has two main contributions. First, we highlight the performance degradation of Aalo caused by two problems: the absence of explicit flow rate control and the invalid assumption of a non-blocking fabric. Second, we propose three enhancements to cope with these problems. Note that the three enhancements do not depend on each other. One can adopt any subset of them, according to the actual situation, and still obtain better performance than Aalo alone.

The rest of this paper is organized as follows. Section II introduces the background knowledge about coflow scheduling, especially Aalo. We show the performance degradation of Aalo and describe the enhancements of EAalo in Section III. Section IV presents the evaluation of EAalo. Related work is discussed in Section V. We conclude this paper in Section VI.

II. BACKGROUND

A coflow is a set of flows belonging to the same job (e.g. flows in the shuffle stage of a MapReduce job). The performance of this job depends on the longest completion time of these flows.


The aim of coflow scheduling is mainly to reduce the total coflow completion time (CCT)¹. A coflow scheduling system often consists of two parts: coflow daemons at the servers and a centralized coflow coordinator. Coflow daemons control the flows directly (e.g. set their ON/OFF state). The coflow coordinator calculates the global scheduling plan periodically (e.g. every O(10ms)) and distributes coordination results to the coflow daemons.
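As a minimal illustration of this objective (all names are hypothetical, not part of any existing system's code), a coflow completes when its slowest flow completes, and the scheduler tries to minimize the sum of these completion times over all coflows:

# Minimal sketch (hypothetical names): a coflow completes when its slowest
# flow completes; the scheduling objective is the sum of CCTs over coflows.
def coflow_completion_time(flow_finish_times):
    # flow_finish_times: finish times of all flows in one coflow.
    return max(flow_finish_times)

def total_cct(coflows):
    # coflows: list of coflows, each a list of flow finish times.
    return sum(coflow_completion_time(c) for c in coflows)

# Example: the second coflow is bottlenecked by a flow finishing at t = 7.
print(total_cct([[1.0, 2.0], [5.0, 7.0]]))  # 2.0 + 7.0 = 9.0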

Besides FIFO, coflow scheduling can also use a shortest-job-first policy. Take Varys [4] as an example; there are mainly two principles: 1) bandwidth allocated to the shortest coflow should be guaranteed first (smallest-total-size-first); 2) bandwidth allocated to the longest flow in one coflow should be guaranteed first (smallest-bottleneck-first). However, this kind of scheduling requires prior knowledge to find the shortest coflow and the longest flow. If there is no such prior knowledge, we have to use other methods to accelerate coflows.

Aalo, an efficient coflow scheduling solution without prior knowledge, uses D-CLAS (Discretized Coflow-Aware Least-Attained Service) to differentiate long coflows from short ones during centralized coordination. D-CLAS is a set of coflow queues, each associated with a non-overlapping coflow size range. Every coflow is put into one of the queues according to its current coflow size. Coflows in queues associated with smaller coflow sizes have higher priority. For coflows in the same queue, a lower coflow ID (arriving earlier) means higher priority (FIFO). With the help of D-CLAS, all coflows can be sorted by their priority.
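The following sketch illustrates the D-CLAS idea; the queue thresholds and data layout are hypothetical, not Aalo's actual constants. Coflows are binned by attained size into exponentially spaced queues and then ordered by (queue, arrival order).

import bisect

# Hypothetical, exponentially spaced queue thresholds (bytes). Aalo uses a
# similar discretization; the exact constants here are only illustrative.
QUEUE_THRESHOLDS = [10 * 2**20 * 10**i for i in range(8)]  # 10MB, 100MB, 1GB, ...

def dclas_queue(attained_bytes):
    # Queue index for a coflow that has sent attained_bytes so far.
    return bisect.bisect_right(QUEUE_THRESHOLDS, attained_bytes)

def sort_by_priority(coflows):
    # coflows: list of (coflow_id, attained_bytes). A lower queue index and a
    # lower coflow ID (earlier arrival) mean higher priority.
    return sorted(coflows, key=lambda c: (dclas_queue(c[1]), c[0]))

# Coflows 2 and 3 (small, earlier ID first) outrank the larger coflow 1.
print(sort_by_priority([(3, 5 * 2**20), (1, 300 * 2**20), (2, 5 * 2**20)]))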

In Aalo, the centralized coordinator schedules coflows one by one in descending order of D-CLAS priority. For one coflow, Aalo uses max-min fair sharing to allocate the remaining bandwidth to all of its flows. After the coordination, the coordinator sends the information about the flows that get non-zero expected rates to the coflow daemons. The coflow daemons at the servers set these flows to the ON state (other flows are set to OFF). If a flow finishes before the next coordination is done, a coflow daemon can set the next flow in D-CLAS priority order that shares the same destination to the ON state.
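One plausible reading of this allocation step, as a minimal sketch under the non-blocking assumption (all structures are hypothetical): each flow is constrained only by its source and destination access links, and rates are raised equally by progressive filling until a link saturates.

def max_min_share(flows, capacity):
    # flows: list of (src_port, dst_port); capacity: dict port -> capacity.
    # Progressive filling: raise all unfrozen rates equally until some port
    # saturates, then freeze the flows crossing that port.
    rates = [0.0] * len(flows)
    remaining = dict(capacity)
    active = set(range(len(flows)))
    while active:
        load = {}                                   # active flows per port
        for i in active:
            for p in flows[i]:
                load[p] = load.get(p, 0) + 1
        inc = min(remaining[p] / n for p, n in load.items())
        saturated = {p for p, n in load.items() if remaining[p] / n == inc}
        for i in list(active):
            rates[i] += inc
            for p in flows[i]:
                remaining[p] -= inc
            if any(p in saturated for p in flows[i]):
                active.discard(i)
    return rates

# Two flows share source port s1; every access link has unit capacity.
print(max_min_share([("s1", "d1"), ("s1", "d2")], {"s1": 1.0, "d1": 1.0, "d2": 1.0}))
# -> [0.5, 0.5]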

Aalo does not require prior knowledge (except the information about which coflow a flow belongs to). It has good statistical performance. This is because: 1) a coflow with higher priority has a higher probability of being smaller (smallest-total-size-first); 2) max-min fair sharing among the flows in one coflow reduces the longest flow completion time in the worst case (smallest-bottleneck-first). These two properties help Aalo approximate the scheduling of Varys.

However, Aalo still has two problems. First, the coflow daemons do not use explicit rate control, which means the actual flow rate may differ from the expected value. Second, it assumes the network fabric is non-blocking, which requires perfect traffic balancing and no oversubscription. In the following, we show the impacts of these two problems.

¹Sometimes the aim of coflow scheduling is to make as many coflows as possible meet their deadlines. In this paper, we only consider the aim of minimizing total CCT.

Fig. 1. EAalo architecture. It consists of three parts: coflow daemons, a flow scheduler and a coflow coordinator. The main enhancements to Aalo include: I) bandwidth enforcement in the coflow daemons; II) the flow scheduler on the OpenFlow controller; III) coordination based on an abstract topology in the coflow coordinator.

In order to cope with them, we also propose the corresponding enhancements. We use EAalo (Enhanced Aalo) to denote coflow scheduling with these enhancements.

Besides minimizing average CCT, coflow scheduling has to avoid coflow starvation. In Aalo, this is guaranteed by weighted sharing among the different coflow queues in D-CLAS. It only changes the total bandwidth available to each queue in D-CLAS. The enhancements introduced in this paper do not affect the weighted sharing. For simplicity, we ignore weighted sharing in the rest of this paper.
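For reference only, a toy sketch of such weight-proportional sharing across queues (the weights are hypothetical and not Aalo's defaults):

def queue_bandwidth(total, weights):
    # Split the total bandwidth across D-CLAS queues in proportion to weights.
    s = sum(weights)
    return [total * w / s for w in weights]

print(queue_bandwidth(10.0, [4, 2, 1]))  # higher-priority queues get larger shares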

III. ENHANCEMENTS IN EAALO

As shown in Fig. 1, EAalo consists of three parts: coflow daemons, a flow scheduler and a coflow coordinator. Correspondingly, EAalo has three enhancements over Aalo. First, the coflow daemons use per-flow bandwidth enforcement to explicitly control flow rates. Second, the flow scheduler running on the controller balances traffic with Largest Best Fit (LBF) and distributes flow entries to the OpenFlow switches to explicitly specify flow forwarding paths. Third, the coflow coordinator uses an abstract topology to handle oversubscription.

A. Enhancement I: Bandwidth Enforcement

Due to the lack of prior knowledge, Aalo is not designed to explicitly control flow rates, even though the flow rates are already calculated during coordination. This is reasonable insofar as accurate rate control does not yield accurate flow completion times when flow sizes are unpredictable. However, the absence of explicit flow rate control may introduce interference between coflows.

In order to understand the interference, we take Fig. 2 as an example. Assume Coflow1 has higher priority than Coflow2 after D-CLAS sorting. After coordination, all four flows (F1, F2, F3 and F4) are permitted to transmit (their expected flow rates are larger than 0). Since F1 and F2 share one link, the expected flow rates of these two flows are both 1/2 (we ignore the unit here, which is the capacity of the access links). However, F3 and F4, which are also admitted with a lower expected flow rate (1/4), will interfere with F1.


Fig. 2. Coflow interference introduced by the absence of bandwidth enforcement. S1, S2, S3, S4 are all servers. Coflow1 (F1 and F2) and Coflow2 (F3 and F4) are permitted to transmit. Coflow1 has higher priority than Coflow2. Each flow is annotated as Fi: ri (ri'), where ri is the expected flow rate of Fi and ri' is the actual flow rate of Fi after fair sharing.

The impact of the interference is uncertain (it depends on flow size). Here we use Si to denote the size of Fi. If S1 = S2 = 1 and S3 = S4 = 3, the expected total CCT is 2 + 7 = 9, while the actual value is 3 + 7 = 10. That is, the interference increases the total CCT. However, if S1 = 1, S2 = 2 and S3 = S4 = 3, the total CCT is unchanged (4 + 7 = 11). If S2 = 3 and S1 = S3 = S4 = 1, the total CCT is even decreased (from 6 + 4 = 10 to 6 + 3 = 9). Since Aalo does not know the accurate flow size in advance, it does not explicitly control flow rate.

Although the result of coflow interference is uncertain in the above case, it should still be avoided as far as possible. The reason is that coflows with higher priority have a high probability of being small, which is the essential principle of Aalo. As a result, coflow interference will statistically (though not always) increase the total coflow completion time rather than decrease it or leave it unchanged. In one of our simulations, with 128 servers and 100 coflows starting at the same time, the total CCT increased from the expected 791s to 886s due to the absence of explicit flow rate control. Therefore, we argue that per-flow bandwidth enforcement is significant in coflow scheduling, even when there is no prior knowledge of flow size. Bandwidth enforcement can be implemented with Netfilter hooks and Linux TC [7], or simply by limiting the window size [8].
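As one hedged illustration of host-side enforcement (the interface name, rate and five-tuple below are hypothetical, and this is not EAalo's actual daemon code), an HTB class plus a u32 filter can pin a single flow to its expected rate:

import subprocess

def run(cmd):
    # Execute one tc command; a real daemon would handle errors and clean up.
    subprocess.run(cmd.split(), check=True)

# Hypothetical setup: cap the flow towards 10.0.0.2:5001 at 500 Mbit/s on eth0.
run("tc qdisc add dev eth0 root handle 1: htb default 999")
run("tc class add dev eth0 parent 1: classid 1:10 htb rate 500mbit ceil 500mbit")
run("tc filter add dev eth0 protocol ip parent 1: prio 1 u32 "
    "match ip dst 10.0.0.2/32 match ip dport 5001 0xffff flowid 1:10")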

B. Enhancement II: Centralized Traffic Balancing

Aalo supposes the datacenter network has a non-blocking fabric. That means bandwidth bottlenecks can only occur at the access links. However, congestion in the fabric is unavoidable due to imperfect traffic balancing, even in a datacenter without oversubscription. If the fabric uses ECMP (based on 5-tuple hashing, widely used in current datacenters [9], [10]), the actual rate of a flow may be smaller than the expected value because of congestion in the fabric [11]. This congestion results in performance loss. In one of our simulations, with 128 servers in an 8-ary fat-tree topology [12] and 100 coflows starting at the same time, the total CCT increased from the expected 791s to 887s when using ECMP for traffic balancing.
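For reference, ECMP picks a path from a hash of the flow's five-tuple, so two large flows can collide on the same fabric link regardless of load. A minimal sketch (the hash choice is illustrative; real switches use vendor-specific hash functions):

import hashlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, proto, num_paths):
    # Pick one of num_paths equal-cost paths from a hash of the five-tuple.
    key = f"{src_ip},{dst_ip},{src_port},{dst_port},{proto}".encode()
    return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % num_paths

# Two flows from the same host may hash onto the same uplink and collide,
# even while other equal-cost uplinks stay idle.
print(ecmp_path("10.0.0.1", "10.0.1.2", 40001, 5001, 6, 4))
print(ecmp_path("10.0.0.1", "10.0.2.3", 40002, 5001, 6, 4))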

In order to cope with this problem, we suggest using a centralized flow scheduling system like Hedera [11] (see Fig. 1). A centralized flow scheduler running on the OpenFlow controller takes the coflow coordination results as input and calculates the flow forwarding paths. After that, the scheduler distributes the corresponding flow table entries to the OpenFlow switches in the fabric. Unlike the scheduler in Hedera, our flow scheduler does not need to estimate the bandwidth requirement of each flow, because the coflow coordinator has already calculated the expected flow rates.

In EAalo, the flow scheduler employs a heuristic algorithm named LBF (Largest Best Fit) to decide which path a flow should use. Algorithm 1 shows the procedure. p.capacity denotes the capacity of path p; p.used denotes the bandwidth currently allocated on path p; p.expect denotes the expected bandwidth allocation of path p (more details are in Section III-C). First, LBF sorts the pending flows by expected rate in descending order (line 1). Second, for each flow, LBF searches paths in order of path ID (line 4). A path ID is the concatenation of the link IDs of the fabric links in the path, so links that are already in use (with smaller remaining bandwidth) are searched first. If there is a path with sufficient unused bandwidth, LBF assigns that path to the flow (lines 5-7). Otherwise, LBF assigns the path with the largest unused bandwidth (lines 9-11). With LBF, the bandwidth consumption on different paths stays close to balanced, which means other flows not belonging to any coflow can avoid congestion as far as possible.

Algorithm 1 Largest Best Fit (LBF)

1:  sort flows by expected rate in descending order
2:  for Flow f in flows do
3:      max ← 0
4:      for Path p in P do            ▷ P is in the order of path ID
5:          if p.expect - p.used > f.rate then
6:              f.path ← p
7:              break
8:          else
9:              if max < p.capacity - p.used then
10:                 max ← p.capacity - p.used
11:                 p* ← p
12:     if f.path is empty and max > 0 then
13:         f.path ← p*
14:         if f.rate + p*.used > p*.capacity then
15:             f.rate ← p*.capacity - p*.used
16:     f.path.used ← f.path.used + f.rate
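A runnable Python sketch of the same procedure follows; the Flow and Path containers are hypothetical stand-ins for EAalo's internal structures, not its actual code.

from dataclasses import dataclass

@dataclass
class Path:
    pid: str           # path ID: concatenation of the fabric link IDs
    capacity: float
    expect: float      # expected allocation computed by the coflow coordinator
    used: float = 0.0

@dataclass
class Flow:
    rate: float        # expected rate from the coflow coordinator
    path: Path = None

def lbf(flows, paths):
    # Largest Best Fit: take flows in descending order of expected rate, assign
    # each to the first path (in path-ID order) with enough unused expected
    # bandwidth, otherwise to the path with the largest unused capacity.
    paths = sorted(paths, key=lambda p: p.pid)
    for f in sorted(flows, key=lambda x: x.rate, reverse=True):
        best, best_free = None, 0.0
        for p in paths:
            if p.expect - p.used > f.rate:          # best fit found
                f.path = p
                break
            if p.capacity - p.used > best_free:     # remember the largest leftover
                best, best_free = p, p.capacity - p.used
        if f.path is None and best is not None:
            f.path = best
            if f.rate + best.used > best.capacity:  # clamp the rate to what fits
                f.rate = best.capacity - best.used
        if f.path is not None:
            f.path.used += f.rate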

C. Enhancement III: Oversubscription Support

A non-blocking fabric also implies no oversubscription. However, oversubscription exists in some current datacenters. With oversubscription, the coordination results in Aalo are incorrect because the bandwidth bottleneck may be in the fabric. In one of our simulations, for the network shown in Fig. 3 (2:1 oversubscription), if 100 coflows start at the same time, the total CCT is 1005s even when enhancements I and II are used, while the expected total CCT given by Aalo is 791s.


Fig. 3. An example of abstract topology. In the original topology, there are 128 servers in total and all links are 10Gbps. Oversubscription (2:1) happens at the edge tier.

To cope with this problem, the internal topology must be taken into account during the flow rate calculation in the coflow coordination. The complexity of the flow rate calculation would increase dramatically because more links and uncertain forwarding are introduced. To simplify the problem, we propose an algorithm based on an abstract topology, which removes the uncertain forwarding and reduces the number of links.

For most current datacenter networks, the topologies are multi-tier and group-based (e.g. [13]). Group-based means that the switches in one tier can be classified into different groups: for any two switches in one group, the switches/servers in the lower tier connecting to one of them must also connect to the other. This kind of topology is also called a Folded Clos Network (FCN) [14]. For an FCN, we obtain the abstract topology (a tree) in the following steps:

1) Merge equivalent switches in the same tier into one abstract switch. Here we define equivalent switches as switches that any flow is equally likely to traverse. The links connecting to the original switches are inherited by the merged abstract switch.

2) Merge all links that have the same ends into one abstract link. The capacity of this abstract link is the sum of the capacities of the original links. (A minimal construction sketch is given after this list.)
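The sketch below shows this two-step merge, assuming switches are pre-labeled with their equivalence group; all structures are hypothetical.

from collections import defaultdict

def abstract_topology(links, group_of):
    # links: list of (node_a, node_b, capacity).
    # group_of: dict mapping each switch to its equivalence group
    # (servers and ungrouped nodes map to themselves).
    # Step 1: merge equivalent switches; step 2: merge parallel links,
    # summing their capacities.
    merged = defaultdict(float)
    for a, b, cap in links:
        ends = tuple(sorted((group_of.get(a, a), group_of.get(b, b))))
        merged[ends] += cap
    return [(a, b, cap) for (a, b), cap in merged.items()]

# Toy example: two parallel 10G uplinks from edge switch e1 to two
# aggregate switches in the same group become one 20G abstract link.
links = [("e1", "a1", 10.0), ("e1", "a2", 10.0)]
print(abstract_topology(links, {"a1": "agg1", "a2": "agg1"}))  # [('agg1', 'e1', 20.0)]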

For example, for the topology shown in Fig. 3, the abstract topology has only one abstract core switch and four abstract aggregate switches. All edge switches are kept in the abstract topology. The uplinks of the abstract edge switches have a capacity of 40Gbps because four links are merged into one. Similarly, the uplinks of the abstract aggregate switches have a capacity of 160Gbps. The number of (directed) links in the abstract topology is reduced from 512 (in the original topology) to 296.

Algorithm 2 Flow rate calculation

1:  reset l.remaining for each link l in AbstractTopology
2:  for coflow c in the queues do      ▷ same order as in Aalo
3:      for flow f in c do
4:          find path f.path for f
5:          for link l in f.path do
6:              l.flows.append(f)
7:      while not all flows are set do
8:          find link l with the lowest l.remaining / l.flows.size()
9:          for flow f in l.flows do
10:             f.rate ← l.remaining / l.flows.size()
11:             for link k in f.path do
12:                 k.remaining ← k.remaining - f.rate
13:                 k.flows.remove(f)

TABLE I
Notations used in Section IV

Notation    Meaning
A(I)        Aalo with enhancement I
A(I+II)     Aalo with enhancements I and II
EA(OTB)     EAalo with optimal traffic balancing

After obtaining the abstract topology, we can easily extend the flow rate calculation, as shown in Algorithm 2. Since there is only one path for any flow in the abstract topology, the path calculation is very simple. Note that bandwidth should be fairly shared by all flows in one coflow. We emphasize this because we found a bug in the published Aalo source code²: bandwidth is fairly shared only among the flows of a coflow that share the same ends. This bug causes incomplete bandwidth allocation. It must be fixed in EAalo because bandwidth enforcement is used. After the above calculation, the expected bandwidth allocation of each link is obtained, and it is used as the value of p.expect in Algorithm 1.
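A small Python rendering of this per-coflow water-filling on the abstract topology (the data structures are hypothetical, and each flow's single abstract path is assumed to be precomputed):

def allocate_rates(coflows, remaining):
    # coflows: priority-ordered list; each coflow is a list of flows, and each
    # flow is the list of abstract-link IDs on its (unique) path.
    # remaining: dict link ID -> remaining capacity (mutated in place).
    all_rates = []
    for coflow in coflows:                      # same order as Aalo's queues
        rates = [0.0] * len(coflow)
        link_flows = {}
        for i, path in enumerate(coflow):
            for l in path:
                link_flows.setdefault(l, []).append(i)
        while any(link_flows.values()):
            # Link with the smallest remaining share per pending flow
            l = min((x for x in link_flows if link_flows[x]),
                    key=lambda x: remaining[x] / len(link_flows[x]))
            share = remaining[l] / len(link_flows[l])
            for i in list(link_flows[l]):
                rates[i] = share
                for k in coflow[i]:
                    remaining[k] -= share
                    link_flows[k].remove(i)
        all_rates.append(rates)
    return all_rates

# Toy case: one coflow, two flows sharing abstract link "e1-a1" (40G) with
# 10G access links; the access links are the bottleneck.
caps = {"s1": 10.0, "s2": 10.0, "e1-a1": 40.0, "d1": 10.0, "d2": 10.0}
print(allocate_rates([[["s1", "e1-a1", "d1"], ["s2", "e1-a1", "d2"]]], caps))  # [[10.0, 10.0]]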

IV. EVALUATION

A. Methodology

We evaluate EAalo with a flow-level simulator (extended from CoflowSim³). The coflow traces are generated with the workload patterns uncovered in [5]. We let all coflows start at the same time and compare the total CCT under different coflow numbers. The coordination interval is 20ms.

²https://github.com/coflow/aalo
³https://github.com/coflow/coflowsim


Fig. 4. Total coflow completion time in different kinds of networks: (a) non-blocking network; (b) network without oversubscription; (c) network with 2:1 oversubscription.

To eliminate the impact of randomness, we use several coflow sets for each coflow number. The total CCT shown in the following is the average value, with error bars showing the 95% confidence interval. In addition to Aalo and EAalo, we also compare some intermediate solutions, shown in Table I. Note that if the network fabric is non-blocking, A(I) = EAalo; and if the network has no oversubscription, A(I+II) = EAalo. Besides, we apply EAalo coordination in a non-blocking network to get the ideal result of EAalo with optimal traffic balancing (denoted by EA(OTB)), which can be viewed as the objective of centralized traffic balancing.

We also evaluated the computation time of coordination and flow scheduling in EAalo, which determines the shortest feasible coordination interval and reflects the scalability of EAalo. We perform simulations at different scales (the topologies are similar to that in Fig. 3). The CPU we use is an Intel i7-4600 (2.10GHz).

B. Non-blocking Fabric

First, we evaluate Aalo and EAalo in a totally non-blocking fabric. Fig. 4(a) shows the simulation results. The performance gap between the two solutions demonstrates the significance of explicit flow rate control. Taking 300 coflows as an example, Aalo needs 4683s to complete all coflows, while EAalo needs only 4229s (1.11× faster).

C. Datacenter Networks Without Oversubscription

Then we evaluate EAalo in a datacenter network without oversubscription. Here we use an 8-ary fat-tree topology (128 servers; all links are 10Gbps). Fig. 4(b) shows the simulation results. EAalo performs much better than Aalo. The performance gap between A(I) and EAalo shows that centralized traffic balancing with the LBF algorithm is important. Taking 300 coflows as an example, Aalo needs 4974s to complete all coflows and A(I) needs 4753s (1.05× faster), while EAalo needs 4262s (1.17× faster). In addition, the result of EAalo in this scenario is very close to that of EA(OTB) (e.g. 4229s for 300 coflows), which demonstrates the good performance of LBF.

D. Datacenter Networks With Oversubscription

We also evaluate EAalo in a datacenter network with oversubscription. We use the topology shown in Fig. 3 (2:1 oversubscription rate).

Fig. 5. Computation time of Aalo coordination, EAalo coordination and LBF flow scheduling: (a) 128 servers; (b) 1024 servers.

Fig. 4(c) shows the simulation results. EAalo performs much better than Aalo in this scenario. The performance gap between A(I+II) and EAalo demonstrates the significance of the abstract topology. Taking 300 coflows as an example, Aalo needs 6818s to complete all coflows, A(I) needs 6797s (1.01× faster) and A(I+II) needs 6284s (1.09× faster), while EAalo needs 5734s (1.19× faster), which is very close to EA(OTB) (5617s).

E. Computation Time

Fig. 5 shows the computation time of Aalo coordination, EAalo coordination and flow scheduling with LBF (in a fat-tree topology [12]). All computation times shown here are measured in the first cycle (which has the most coflows). EAalo coordination is always slightly slower than Aalo coordination, because there are more links in the abstract topology. The computation time of either coordination increases notably with the number of coflows. However, the computation time of flow scheduling with LBF remains almost unchanged. The reason is that the number of flows allowed to send concurrently remains almost unchanged (small coflows may result in more concurrent active flows).


For hundreds of servers and hundreds of coflows at the same time, the computation time of EAalo coordination is shorter than 100ms, and the flow scheduling time is even shorter. That means the coordination interval of EAalo can be smaller than 100ms (EAalo coordination and flow scheduling can be pipelined). However, with more servers and more coflows, the coordination interval needs to be hundreds of milliseconds or even longer. Therefore, in large-scale cases, we should use distributed EAalo coordinators, each of which serves a subset of the servers. We will explore distributed EAalo in the future.

V. RELATED WORKS

Coflow scheduling has received much attention in recent years. Orchestra [2] and Baraat [3] propose FIFO-based scheduling. Varys [4] proposes smallest-bottleneck-first and smallest-total-size-first scheduling. Besides, RAPIER [7] combines coflow scheduling and flow routing. All these solutions require prior knowledge. Aalo [5] is a coflow scheduling system without prior knowledge; the only information it uses is which coflow a flow belongs to. CODA [15] does not even need this information; it uses machine learning to classify flows into coflows.

Traffic balancing is a challenging problem that aims to fully utilize the equivalent paths in datacenter networks to avoid congestion. ECMP is widely used in current datacenters [9], [10], [13]. Many solutions have been proposed to cope with the problems of ECMP. A centralized controller [11], [16] can be used for traffic balancing, which is suitable for large flows like those generated by data-intensive applications. Other solutions require specialized switches [17], [18], which limits them by deployment cost. Besides, spraying-based solutions [19], [20], [21], [22] can also be used; however, they are not friendly to asymmetric topologies and rely heavily on reordering at the end servers. If the network cannot be modified, TCP modifications like MPTCP [23] and FlowBender [24] can be used, at the cost of introducing much more CPU overhead and other side effects.

VI. CONCLUSION

In this paper, we show the performance loss of Aalo introduced by the absence of explicit flow rate control, by ECMP traffic balancing and by oversubscription. Correspondingly, three enhancements are proposed to improve Aalo: per-flow bandwidth enforcement, centralized traffic balancing with LBF and coordination based on an abstract topology. We evaluate these enhancements with simulations. The simulation results suggest the three enhancements can speed up coflow completion by up to 11%, 17% and 19% in non-blocking networks, networks without oversubscription and networks with 2:1 oversubscription, respectively.

ACKNOWLEDGMENT

This research is supported by the National Science Foundation of China (No. 61472213) and sponsored by Huawei. Jun Bi is the corresponding author.

REFERENCES

[1] M. Chowdhury and I. Stoica, "Coflow: A Networking Abstraction for Cluster Applications," in HotNets, 2012.

[2] M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica, "Managing Data Transfers in Computer Clusters with Orchestra," in SIGCOMM, 2011.

[3] F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron, "Decentralized Task-aware Scheduling for Data Center Networks," in SIGCOMM, 2014.

[4] M. Chowdhury, Y. Zhong, and I. Stoica, "Efficient Coflow Scheduling with Varys," in SIGCOMM, 2014.

[5] M. Chowdhury and I. Stoica, "Efficient Coflow Scheduling Without Prior Knowledge," in SIGCOMM, 2015.

[6] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica, "Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing," in NSDI, 2012.

[7] Y. Zhao, K. Chen, W. Bai, M. Yu, C. Tian, Y. Geng, Y. Zhang, D. Li, and S. Wang, "RAPIER: Integrating Routing and Scheduling for Coflow-aware Data Center Networks," in INFOCOM, 2015.

[8] K. He, E. Rozner, K. Agarwal, Y. Gu, W. Felter, J. Carter, and A. Akella, "AC/DC TCP: Virtual Congestion Control Enforcement for Datacenter Networks," in SIGCOMM, 2016.

[9] A. Singh, J. Ong, A. Agarwal, G. Anderson, A. Armistead, R. Bannon, S. Boving, G. Desai, B. Felderman, P. Germano, A. Kanagala, J. Provost, J. Simmons, E. Tanda, J. Wanderer, U. Hölzle, S. Stuart, and A. Vahdat, "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network," in SIGCOMM, 2015.

[10] C. Guo, L. Yuan, D. Xiang, Y. Dang, R. Huang, D. Maltz, Z. Liu, V. Wang, B. Pang, H. Chen, Z. Lin, and V. Kurien, "Pingmesh: A Large-Scale System for Data Center Network Latency Measurement and Analysis," in SIGCOMM, 2015.

[11] M. Al-Fares, S. Radhakrishnan, B. Raghavan, N. Huang, and A. Vahdat, "Hedera: Dynamic Flow Scheduling for Data Center Networks," in NSDI, 2010.

[12] M. Al-Fares, A. Loukissas, and A. Vahdat, "A Scalable, Commodity Data Center Network Architecture," in SIGCOMM, 2008.

[13] "Introducing: Data Center Fabric, the Next-generation Facebook Data Center Network," https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/, 2014.

[14] M. Chiesa, G. Kindler, and M. Schapira, "Traffic Engineering with Equal-Cost-Multipath: An Algorithmic Perspective," in INFOCOM, 2014.

[15] H. Zhang, L. Chen, B. Yi, K. Chen, M. Chowdhury, and Y. Geng, "CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark," in SIGCOMM, 2016.

[16] T. Benson, A. Anand, A. Akella, and M. Zhang, "MicroTE: Fine Grained Traffic Engineering for Data Centers," in CoNEXT, 2011.

[17] M. Alizadeh, T. Edsall, S. Dharmapurikar, R. Vaidyanathan, K. Chu, A. Fingerhut, V. Lam, F. Matus, R. Pang, N. Yadav, and G. Varghese, "CONGA: Distributed Congestion-aware Load Balancing for Datacenters," in SIGCOMM, 2014.

[18] N. Katta, M. Hira, C. Kim, A. Sivaraman, and J. Rexford, "HULA: Scalable Load Balancing Using Programmable Data Planes," in SOSR, 2016.

[19] D. Zats, T. Das, P. Mohan, D. Borthakur, and R. Katz, "DeTail: Reducing the Flow Completion Time Tail in Datacenter Networks," in SIGCOMM, 2012.

[20] J. Cao, R. Xia, P. Yang, C. Guo, G. Lu, L. Yuan, Y. Zheng, H. Wu, Y. Xiong, and D. Maltz, "Per-packet Load-balanced, Low-latency Routing for Clos-based Data Center Networks," in CoNEXT, 2013.

[21] S. Ghorbani, B. Godfrey, Y. Ganjali, and A. Firoozshahian, "Micro Load Balancing in Data Centers with DRILL," in HotNets, 2015.

[22] K. He, E. Rozner, K. Agarwal, W. Felter, J. Carter, and A. Akella, "Presto: Edge-based Load Balancing for Fast Datacenter Networks," in SIGCOMM, 2015.

[23] C. Raiciu, S. Barre, C. Pluntke, A. Greenhalgh, D. Wischik, and M. Handley, "Improving Datacenter Performance and Robustness with Multipath TCP," in SIGCOMM, 2011.

[24] A. Kabbani, B. Vamanan, J. Hasan, and F. Duchene, "FlowBender: Flow-level Adaptive Routing for Improved Latency and Throughput in Datacenter Networks," in CoNEXT, 2014.

