Date posted: 26-Dec-2015
Load Balancing Tasks with Overlapping Requirements
Milan Vojnovic, Microsoft Research
Joint work with Dan Alistarh, Christos Gkantsidis, Jennifer Iglesias, Bo Zong
Motivating Application Scenario: Stream Processing Platforms
Tasks and Requirements
Problem #1: Bi-Criteria Load Balancing
Query Assignment Problem:
• Find an assignment of tasks to machines that
Criterion 1: minimizes the total number of distinct requirements that need to be supplied to the machines
Criterion 2: balances the number of tasks assigned across machines
Problem #2: Min-Max Load Balancing
Query Assignment Problem:
• Find an assignment of tasks to machines that minimizes the maximum number of distinct requirements needed by a machine
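As a hedged illustration of the min-max objective, the sketch below evaluates a given assignment: for each machine it takes the union of the requirement sets of its tasks and returns the largest union size. The names `min_max_cost`, `requirements` and `assignment` are hypothetical, and all requirements are treated as unit-weight.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


def min_max_cost(requirements: Dict[Hashable, Set[Hashable]],
                 assignment: Dict[Hashable, int]) -> int:
    """Largest number of distinct requirements needed by any single machine."""
    needed: Dict[int, Set[Hashable]] = defaultdict(set)
    for task, machine in assignment.items():
        needed[machine] |= requirements[task]  # union of requirement sets per machine
    return max((len(r) for r in needed.values()), default=0)


reqs = {"a": {1, 2}, "b": {1, 2, 3}, "c": {4, 5}}
print(min_max_cost(reqs, {"a": 0, "b": 0, "c": 1}))  # 3: a and b share requirements 1, 2
print(min_max_cost(reqs, {"a": 0, "c": 0, "b": 1}))  # 4
```

Co-locating tasks with overlapping requirements lowers the maximum, which is the tension the min-max variant captures.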
Other Motivating Application Scenarios
• Scheduling tasks in distributed clusters of machines with data locality
• …
• Beyond resource allocation in data centres:
• Clustering of information objects (documents, images, videos)
• Summarizing topics for collections of documents
• …
Related Work
Standard load balancing
• Identical machines: Graham 1966
• Related machines: Aspnes et al. 1993; Cho and Sahni 1988
• Restricted machines: Azar et al. 1992
• Unrelated machines: Aspnes et al. 1993
• Routing: Aspnes et al. 1993
Min-max multiway cut
• Bansal et al. 2014; Svitkina and Tardos 2004
Problem #1: Bi-Criteria Load Balancing
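Minimize $\sum_{i} f(Q_i)$
subject to a balance constraint on the number of tasks $|Q_i|$ assigned to each machine $i$
where $S$ is the set of requirements, $Q$ is the set of tasks, $S_q \subseteq S$ for every $q \in Q$, $Q_i$ is the set of tasks assigned to machine $i$, and
$$f(Q') = \sum_{s \in S} w(s)\, \mathbf{1}(s \text{ required by some } q \in Q')$$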
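The objective can be evaluated directly from the definition of $f$. The minimal sketch below assumes tasks are keyed by name, `requirements[q]` holds $S_q$, and `weight[s]` holds $w(s)$ (all hypothetical names); it returns the summed cost $\sum_i f(Q_i)$ together with the per-machine task counts that the balance criterion looks at.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def f(tasks: Iterable[Hashable],
      requirements: Dict[Hashable, Set[Hashable]],
      weight: Dict[Hashable, float]) -> float:
    """f(Q'): total weight of requirements needed by at least one task in Q'."""
    covered: Set[Hashable] = set()
    for q in tasks:
        covered |= requirements[q]  # union of S_q over q in Q'
    return sum(weight[s] for s in covered)


def bi_criteria(assignment: Dict[Hashable, int],
                requirements: Dict[Hashable, Set[Hashable]],
                weight: Dict[Hashable, float]) -> Tuple[float, Dict[int, int]]:
    """Return (sum over machines of f(Q_i), number of tasks on each machine)."""
    groups: Dict[int, List[Hashable]] = defaultdict(list)
    for q, i in assignment.items():
        groups[i].append(q)  # Q_i: tasks assigned to machine i
    total = sum(f(qs, requirements, weight) for qs in groups.values())
    loads = {i: len(qs) for i, qs in groups.items()}
    return total, loads


reqs = {"a": {"x", "y"}, "b": {"x"}, "c": {"z"}, "d": {"y", "z"}}
w = {"x": 1.0, "y": 2.0, "z": 0.5}
print(bi_criteria({"a": 0, "b": 0, "c": 1, "d": 1}, reqs, w))  # (5.5, {0: 2, 1: 2})
```

Requirement `y` is counted on both machines because tasks needing it are split, which is exactly what the first criterion penalizes.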
NP Hardness
• Query Assignment Problem is NP-complete
Proof: by reduction from the well-known bin packing problem
Random Query Assignment
• Maximum number of tasks per machine:
with probability
[Raab and Steger, 1998]
• The expected number of requirements needed by the machines:
= number of tasks needing requirement
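The slide's formula did not survive extraction. Assuming each task is placed on one of $k$ machines uniformly at random (the symbol $k$ is ours), a requirement $s$ needed by $n_s$ tasks is missing from a given machine with probability $(1-1/k)^{n_s}$, so the expected number of machines that need it is $k\,(1-(1-1/k)^{n_s})$; summing over $s$ gives the expected total. This is the standard balls-into-bins calculation and may differ in form from the original slide. The sketch computes the closed form and checks it by simulation; both function names are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Set


def expected_needed(task_counts: Dict[Hashable, int], k: int) -> float:
    """Closed form: sum over requirements s of k * (1 - (1 - 1/k) ** n_s)."""
    return sum(k * (1.0 - (1.0 - 1.0 / k) ** n_s) for n_s in task_counts.values())


def simulate_needed(tasks: List[Set[Hashable]], k: int,
                    trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same expectation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        per_machine: List[Set[Hashable]] = [set() for _ in range(k)]
        for reqs in tasks:
            per_machine[rng.randrange(k)] |= reqs  # uniform random machine
        total += sum(len(s) for s in per_machine)
    return total / trials


tasks = [{"x"}, {"x"}, {"x", "y"}, {"y"}]
counts: Dict[Hashable, int] = {}
for reqs in tasks:
    for s in reqs:
        counts[s] = counts.get(s, 0) + 1  # n_s: number of tasks needing s
print(expected_needed(counts, k=2))  # 3.25
print(simulate_needed(tasks, k=2))   # close to 3.25
```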
Deficiency of Random Query Assignment
(Figure: $l$ blocks, each with $n/l$ tasks and $m/l$ requirements)
• Expected number of needed requirements:
as
• Optimal:
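A hedged numeric illustration of the deficiency, assuming the pictured instance consists of $l$ disjoint blocks in which each of the $n/l$ tasks of a block needs all $m/l$ requirements of that block, and that there are $k = l$ machines (both assumptions are ours). Each requirement is then needed by $n/l$ tasks, so under uniform random assignment it is expected on $k\,(1-(1-1/k)^{n/l})$ machines. Assigning block $j$ to machine $j$ instead supplies every requirement exactly once while keeping $n/l$ tasks per machine, which matches the lower bound $m$. The function name is hypothetical.

```python
from typing import Tuple


def random_vs_optimal(n: int, m: int, l: int) -> Tuple[float, int]:
    """Hypothetical block instance with k = l machines.

    Returns (expected total needed requirements under uniform random
    assignment, total under the block-aligned assignment, which equals m)."""
    k = l
    tasks_per_requirement = n // l  # every task in a block needs all its requirements
    expected_random = m * k * (1.0 - (1.0 - 1.0 / k) ** tasks_per_requirement)
    return expected_random, m


for n in (10, 100, 1000):
    rand, opt = random_vs_optimal(n, m=20, l=5)
    print(n, round(rand, 2), opt)
```

With $l$ fixed, the random total approaches $m\,l$ (100 in this example) as $n$ grows, whereas the block-aligned assignment needs only $m = 20$.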
Special Case: Tasks with Singleton Requirements
• There exists a polynomial-time algorithm that guarantees a 2-approximation for tasks with singleton requirements and arbitrary requirement weights
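The slide's algorithm was not recoverable, so the following is not it. It is a hypothetical heuristic for the singleton case, shown only to make the structure concrete: tasks are listed grouped by their single requirement and poured into machines in order, each holding at most $\lceil n/k \rceil$ tasks ($k$ machines, our notation). Loads stay balanced, and because each machine boundary can split at most one requirement group, the unit-weight cost is at most the number of distinct requirements plus $k-1$. No guarantee for arbitrary weights is claimed for this sketch.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List


def sequential_fill(task_req: Dict[Hashable, Hashable], k: int) -> Dict[Hashable, int]:
    """Hypothetical heuristic: group tasks by their single requirement, then fill
    machines 0..k-1 in order with at most ceil(n/k) tasks each."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for task, req in task_req.items():
        groups[req].append(task)
    capacity = math.ceil(len(task_req) / k)
    assignment: Dict[Hashable, int] = {}
    position = 0
    for tasks in groups.values():  # tasks sharing a requirement stay contiguous
        for task in tasks:
            assignment[task] = position // capacity
            position += 1
    return assignment


def cost(task_req: Dict[Hashable, Hashable], assignment: Dict[Hashable, int]) -> int:
    """Unit-weight cost: number of distinct (machine, requirement) pairs."""
    return len({(assignment[t], r) for t, r in task_req.items()})


task_req = {"t1": "x", "t2": "x", "t3": "x", "t4": "y", "t5": "z", "t6": "z"}
assignment = sequential_fill(task_req, k=2)
print(cost(task_req, assignment))  # 3: no requirement is split across machines
```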
Algorithm
Tasks with Arbitrary Sets of Requirements
• For unit-weight requirements, there exists a polynomial-time algorithm
with approximation ratio
where is the maximum number of requirements of a task
• For arbitrary-weight requirements, the same approximation ratio holds up to an extra factor equal to the ratio of the maximum to the minimum requirement weight
Gadget: Minimum Task Type Packing
• Given a set of requirements, a set of tasks, and a real number
• Find a subset of query types that minimizes
subject to
Algorithm
1. Pick an empty machine
2. Find a subset of query types that approximately solves the MQP problem with parameter …
3. Take the subset of unassigned queries whose type is in the chosen subset
4. If … then apply a pruning procedure
5. If there are unassigned queries, go to 1
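A minimal sketch of the loop in steps 1 to 5, under loudly stated assumptions: a query type is the set of queries with identical requirement sets, each machine holds at most $\lceil n/k \rceil$ queries, step 2 uses a simple greedy stand-in (add the type with the fewest new requirements per unassigned query) in place of the approximate MQP solver, and step 4's pruning simply truncates the batch to capacity. The function name `assign_by_types` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, List, Set


def assign_by_types(requirements: Dict[Hashable, Set[Hashable]], k: int) -> Dict[Hashable, int]:
    """Hypothetical rendering of steps 1-5 with greedy stand-ins for MQP and pruning."""
    capacity = math.ceil(len(requirements) / k)
    by_type: Dict[FrozenSet[Hashable], List[Hashable]] = defaultdict(list)
    for q, reqs in requirements.items():
        by_type[frozenset(reqs)].append(q)  # query type = identical requirement set
    assignment: Dict[Hashable, int] = {}
    machine = 0
    while any(by_type.values()):  # step 5: repeat while unassigned queries remain
        # Step 1: `machine` is the next empty machine.
        # Step 2 (greedy stand-in for the MQP solver): add the type with the fewest
        # new requirements per unassigned query until capacity is reached.
        chosen: List[FrozenSet[Hashable]] = []
        covered: Set[Hashable] = set()
        count = 0
        candidates = [t for t, qs in by_type.items() if qs]
        while candidates and count < capacity:
            best = min(candidates, key=lambda t: len(t - covered) / len(by_type[t]))
            candidates.remove(best)
            chosen.append(best)
            covered |= best
            count += len(by_type[best])
        # Step 3: unassigned queries whose type was chosen.
        batch = [q for t in chosen for q in by_type[t]]
        # Step 4 (stand-in pruning): keep at most `capacity` queries.
        for q in batch[:capacity]:
            assignment[q] = machine
        for t in chosen:
            by_type[t] = [q for q in by_type[t] if q not in assignment]
        machine += 1
    return assignment


reqs = {"q1": {"a", "b"}, "q2": {"a", "b"}, "q3": {"c"}, "q4": {"c"}}
print(assign_by_types(reqs, k=2))  # {'q3': 0, 'q4': 0, 'q1': 1, 'q2': 1}
```

Every pass assigns `min(capacity, remaining)` queries, so at most $k$ machines are used.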
Experimental Evaluation
• Random bipartite graph for subscriptions of tasks to requirements
• Number of tasks per requirement according to a Zipf distribution
• Number of requirements per task fixed to a constant
• Metric: replication factor = (total number of needed requirements) / m
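A hedged sketch of the experimental setup and the metric, assuming $m$ denotes the number of requirements. The hypothetical generator `zipf_instance` draws, for every task, a fixed number $d$ of distinct requirements, picking the requirement of rank $r = 1, 2, \ldots$ with probability proportional to $1/r^{\alpha}$, so the number of tasks per requirement roughly follows a Zipf law; the exponent used in the experiments is not given in the source. The hypothetical `replication_factor` sums the distinct requirements needed by each machine and divides by $m$.

```python
import random
from typing import Dict, List, Set


def zipf_instance(n_tasks: int, m: int, d: int, alpha: float, seed: int = 0) -> List[Set[int]]:
    """Each task draws d distinct requirements (needs d <= m); requirement index r
    is picked with probability proportional to 1 / (r + 1) ** alpha."""
    rng = random.Random(seed)
    weights = [1.0 / (r + 1) ** alpha for r in range(m)]
    tasks = []
    for _ in range(n_tasks):
        chosen: Set[int] = set()
        while len(chosen) < d:  # redraw until d distinct requirements are picked
            chosen.add(rng.choices(range(m), weights=weights)[0])
        tasks.append(chosen)
    return tasks


def replication_factor(tasks: List[Set[int]], assignment: List[int], m: int) -> float:
    """(total number of needed requirements, summed over machines) / m."""
    needed: Dict[int, Set[int]] = {}
    for reqs, machine in zip(tasks, assignment):
        needed.setdefault(machine, set()).update(reqs)
    return sum(len(s) for s in needed.values()) / m


tasks = zipf_instance(200, m=50, d=3, alpha=1.0)
rng = random.Random(1)
random_assignment = [rng.randrange(4) for _ in tasks]  # 4 machines, uniform at random
print(replication_factor(tasks, random_assignment, m=50))
```

With a single machine the factor cannot exceed 1, and with $k$ machines it cannot exceed $k$.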
Offline Algorithms
• MQP = defined in an earlier slide
• OffRand = uniform random assignment of a query type to a machine
• IC = Incremental cost
• MMS = Min-max traffic cost per machine
Performance of Offline Algorithms
(Figure: performance versus number of requirements per task)
Online Task Assignment
• LeastCost
• LeastSource
• LeastQT
Performance of Online Algorithms
(Figure: performance versus number of requirements per task)
Problem #2: Min-Max Load Balancing
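Minimize $\max_{i} \left|\bigcup_{q \in Q_i} S_q\right|$
subject to every task being assigned to exactly one machine, where $Q_i$ is the set of tasks assigned to machine $i$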
Online Task Assignment
• At each arrival of task
• Compute for every
• Assign task to machine in
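The quantity computed on the slide was lost. A natural choice, and the assumption made here, is the number of requirements of the arriving task that a machine does not yet have; the sketch assigns each task to a machine with spare capacity that minimizes that count, breaking ties by current load and then by index. The function `online_greedy` and the `capacity` cap are hypothetical, and the cap must satisfy `k * capacity >= len(stream)`.

```python
from typing import List, Set


def online_greedy(stream: List[Set[int]], k: int, capacity: int) -> List[int]:
    """Hypothetical greedy: each arriving task goes to a machine with spare capacity
    that needs the fewest new requirements; ties by load, then by index."""
    supplied: List[Set[int]] = [set() for _ in range(k)]  # requirements already on each machine
    load = [0] * k
    assignment: List[int] = []
    for reqs in stream:
        open_machines = [i for i in range(k) if load[i] < capacity]
        best = min(open_machines, key=lambda i: (len(reqs - supplied[i]), load[i], i))
        supplied[best] |= reqs
        load[best] += 1
        assignment.append(best)
    return assignment


stream = [{1, 2}, {3, 4}, {1, 2}, {3}, {2}]
print(online_greedy(stream, k=2, capacity=3))  # [0, 1, 0, 1, 0]
```

Tasks sharing requirements gravitate to the same machine, which is the behaviour the recovery result below relies on.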
Hidden Co-Clustering Input
Recovery Theorem
• Suppose and
There exists an online assignment of tasks that guarantees asymptotic recovery of hidden clusters
Proof: coupling to a Pólya urn process
Asymptotic recovery: the fraction of tasks from the same hidden cluster that are assigned to the same bin goes to 1 as the number of tasks grows large
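The recovery notion can be measured on a finished assignment. In the hypothetical helper below, the portion of a hidden cluster assigned to the same bin is read as the largest fraction of that cluster's tasks landing in a single bin (our interpretation of the definition); asymptotic recovery then means these fractions tend to 1 as the number of tasks grows.

```python
from collections import Counter
from typing import Dict, Hashable, List


def recovery_fraction(hidden: List[Hashable], assignment: List[int]) -> Dict[Hashable, float]:
    """For each hidden cluster, the largest fraction of its tasks that share one bin."""
    per_cluster: Dict[Hashable, Counter] = {}
    for cluster, bin_ in zip(hidden, assignment):
        per_cluster.setdefault(cluster, Counter())[bin_] += 1
    return {c: max(cnt.values()) / sum(cnt.values()) for c, cnt in per_cluster.items()}


hidden = ["A", "A", "A", "A", "B", "B"]
bins = [0, 0, 0, 1, 1, 1]
print(recovery_fraction(hidden, bins))  # {'A': 0.75, 'B': 1.0}
```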
Experimental Evaluation
• Dataset
• Greedy
• Random = random task arrival order
• Decreasing = decreasing order of the number of requirements
• Balance big = large tasks to the least loaded machine, small tasks assigned greedily
• Prefer big = large tasks to the least loaded machine, with delayed assignment of up to a fixed number of small tasks
Retail dataset
Conclusion
• Studied two variants of non-standard load balancing problems: bi-criteria and min-max
• Approximation ratios for the offline problems
• Hidden cluster recovery conditions for a simple greedy online task assignment strategy
• Open questions:
  • Tighter approximation ratios for the offline versions of both problems?
  • Similar hidden cluster recovery questions (allowing for more memory)?