RICE UNIVERSITY
MODULE ASSIGNMENT IN DISTRIBUTED SYSTEMS
by
MI LU
A THESIS SUBMITTED
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
APPROVED, THESIS COMMITTEE:
James B. Sinclair,
Assistant Professor of Computer Science in
the Department of Electrical Engineering, Chairman
Peter Varman,
Assistant Professor of Computer Science in the Department of Electrical Engineering
J. Robert Jump, Professor of Computer Science in the Department of Electrical Engineering
HOUSTON, TEXAS
APRIL 1984
ABSTRACT
The problem of finding an optimal assignment of a modular program
to n processors in a distributed system is studied. We characterize
distributed programs by Stone's graph model, and attempt to
find an assignment of modules to processors which minimizes the sum of
module execution costs and intermodule communication costs. The problem
is NP-complete for more than three processors. We first show how to
identify all modules which must be assigned to a particular processor
under any optimal assignment. This usually results in a significant
reduction in the complexity of the optimal assignment problem. We also
present a heuristic algorithm for finding assignments and experimentally
verify that it almost always finds an optimal assignment.
ACKNOWLEDGEMENT
I sincerely thank Dr. J. B. Sinclair, my advisor, for the chance he
gave me to study under his guidance, and for his continual encouragement,
support, enlightenment and advice in my research work and all aspects of
graduate study.
I am grateful to Dr. J. R. Jump and Dr. P. Varman for serving on my
thesis committee.
I would like to extend a special thanks to Rick Covington for his
friendship and help.
I want to express my appreciation to all the professors and faculty
of the Electrical Engineering Department for making my study and stay
at Rice University a very valuable and pleasurable experience.
This research was supported principally by the National Science
Foundation under Grant MCS80-04107.
TABLE OF CONTENTS
I. INTRODUCTION 1
II. DISTRIBUTED SYSTEMS MODEL 5
III. A SURVEY OF RECENT RESEARCH 9
IV. ASSIGNMENT GRAPHS 21
4.1 Assignment Graphs for Two-Processor Systems 21
4.2 Assignment Graphs for Systems with
More than Two Processors 28
V. SEARCH SPACE REDUCTION 33
5.1 The Reduction Theorem 33
5.2 Assignment Problem Simplification by
P-reductions 44
VI. HEURISTIC ALGORITHM FOR MULTIPROCESSOR ASSIGNMENT 53
6.1 G-H Tree 53
6.2 Heuristic Algorithm to Find Near-Optimal
n-Processor Assignment 61
6.3 Experimental Results 67
VII. SUMMARY AND CONCLUSIONS 70
7.1 Summary of Results 70
7.2 Suggestions for Future Research 72
REFERENCES 77
APPENDIX 80
CHAPTER I
INTRODUCTION
As the field of computer networking matures and becomes more
sophisticated, distributed processing has received considerable attention.
By "distributed" we mean that a computation can simultaneously
execute on different processors in the system. The modularity, flexibility,
and reliability of distributed processing make it attractive to
many types of users, and several distributed processing systems have
been designed and implemented in recent years. Distributed processing
applications range from large data base installations, where processing
load is distributed for organizational efficiency, to high-speed signal
processing systems, where extremely fast processing must be performed in
a real-time environment.
A distributed system has several processors and a communication
subnetwork. Processors exchange data and control information through
the communication subnetwork. A program may be partitioned into several
parts called modules. Modules can be executed on the same processor or
may be assigned to different processors. A problem of increasing
interest is to determine how to assign the modules of a program to the
various processors in order to optimize the performance of the system,
according to some appropriate performance measure.
Executing a module of a program has associated with it a cost of
execution for each processor in the system. It may be advantageous to
execute a module on a particular processor because of the availability
of a faster arithmetic component, a particular data base, higher speed
memory or other resources.
Modules may exchange information during program execution. If two
modules which communicate with one another are assigned to different
processors, the information that they exchange will be transmitted over
the communication subnetwork, and the transmission will incur a communication
cost. An optimal assignment minimizes the sum of the module execution
costs and intermodule communication costs.
When the system consists of only two processors, we can efficiently
find an optimal assignment by constructing an assignment graph
and applying a max-flow min-cut algorithm[1]. Extension of
this approach to three or more processors does not appear to be feasible.
Previous research indicates that no efficient algorithm exists for finding
an absolutely optimal assignment of program modules to n (n>3)
processors[2]. Since many real systems of interest have more than
two processors, and the number of possible assignments grows exponentially
with the number of modules, it is important to be able to efficiently
select an assignment that has a cost that is nearly minimal, at
least most of the time. In this thesis, we examine the problem of
reducing the search space for optimal assignments and of efficiently
finding near-optimal (near minimum cost) assignments. We present a
method by which the complexity of the assignment problem is greatly
reduced, and a heuristic algorithm which finds near-optimal
solutions. Experimental results show that the error of the
heuristic solution compared to the true optimal solution is quite small.
The remainder of the thesis consists of six parts. Chapter II
explains the model of a distributed system that we will use, including
the relevant aspects of the processors, communication subnetwork, and
programs. We also give a precise definition of the optimal assignment
problem.
Chapter III presents a brief survey of previous research on the
general problem of optimal assignments in distributed systems. We
describe a method for solving the assignment problem for two processors
which provides a basis for the approach we use for the n>2 processor
case, and we outline previous work dealing with systems containing more
than two processors. This work can be categorized into a few
approaches: one of them is based on assignment graph models and
another on shortest path algorithms.
Chapter IV describes many of the assumptions and definitions used
in the remainder of this thesis. It also describes the construction of
assignment graphs for both two-processor systems and for systems
with more than two processors.
In Chapter V we present a reduction theorem which permits simplification
of the optimal assignment problem in many cases. We give examples
of the application of this result. These examples illustrate the
effectiveness of this approach in reducing the complexity of finding
solutions to the optimal assignment problem for n>2-processor systems.
Even with the aid of the reduction theorem, we cannot guarantee
that the number of possible assignments can be made small enough to
allow an exhaustive search for the optimal assignment. In Chapter VI we
present a heuristic algorithm for finding an assignment when the number
of processors is more than 2. Based on the idea of a Gomory-Hu
tree[3], the algorithm efficiently finds assignments which are experimentally
shown to be almost always optimal, and nearly optimal in the
remaining cases.
Chapter VII presents a summary of our main results. We also discuss
some potentially rewarding extensions to the methods we developed.
CHAPTER II
DISTRIBUTED SYSTEMS MODEL
In this chapter we describe a model for a distributed system. We
give more precise definitions of the program modules, the assignment
problem, and the cost function in a distributed system.
In our model a distributed system contains several programmable
processors, and a single program that can be distributed over two or
more processors for execution. The site of program activity shifts
dynamically from one processor to another as the program is executed.
A single program consists of a set of modules. The modules should
be viewed as program segments which either contain executable instructions
plus local data or contain global data accessible to other segments.
The modules are free to be assigned to any one of the processors,
taking advantage of specific efficiencies of some processors in
executing some program modules. The modules executed on different processors
may communicate with one another during an execution of the program.
Some modules have a fixed assignment because these modules depend
on capabilities or resources that are available at only one processor.
These facilities might be a high-speed arithmetic unit, access to a particular
data base, the need for a large high-speed memory, access to a
specific peripheral device, or any other facility that is associated
with only one processor.
Each processor is an independent computer with full memory, control,
and arithmetic capability, and may be multiprogrammed.
The processors are connected through a communication subnetwork
to create a distributed processing system.
Each assignment of modules to processors has an associated cost.
In our model, the total assignment cost has two components. The first
component is the cost associated with module executions. It is assumed,
in general, to depend on the processor to which a module is assigned
and the amount of computation performed by that module. The second component
is the cost of intermodule communication between pairs of modules
that are resident on different processors and transmit information
through the communication subnetwork. We assume that the cost of an
intermodule transfer is zero when the transfer is made between two
modules assigned to the same computer.
The cost of an assignment of modules to processors is the sum of
the module execution costs for all modules plus the sum of intermodule
communication costs for all pairs of modules that are not coresident.
An optimal assignment for the program is an assignment with minimum
cost. The assignment problem is to find an assignment of modules to
processors that minimizes total cost. Modules tend to be assigned to
processors on which their execution costs are lower, while at the same
time pairs of modules which communicate heavily tend to be assigned to
the same processor.
The cost function can be either the program's elapsed run time or a
dollar cost. If the cost function is elapsed run time, we will insist
that program execution be serial; that is, even though there are several
computers in the system, two processors may not concurrently execute
modules of the same program. Program activity can shift back and forth
between different computers, but at any given time only one module of a
given program is in execution.
We can define the cost function in precise terms. Let an assignment
A: M -> P be a function that maps the set of program modules M into
the set of processors P on which the modules may be executed. If (i,h)
is in A, then module i will be executed on processor h. Let e(i,h) be
the cost of executing module i on processor h. If module i communicates
with module j and A(i) ≠ A(j), they incur a communication cost t(i,j).
Suppose there are m modules in the program. The total execution cost E
is given by

    E = sum for i = 1 to m of e(i, A(i))

The total communication cost T is given by

    T = sum for i = 1 to m-1, sum for j = i+1 to m with A(i) ≠ A(j), of t(i,j)

Thus the total assignment cost C is defined by

    C = E + T

The assignment problem is to find an assignment A_opt from all k possible
assignments such that

    C(A_opt) = min( C(A_1), C(A_2), ..., C(A_k) )
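The cost function above can be transcribed directly into code. The following Python sketch is ours, not the thesis's; the cost tables and the table layout (lists indexed by module and processor) are illustrative assumptions.

```python
def assignment_cost(assign, exec_cost, comm_cost):
    """Total cost C = E + T of an assignment (a sketch; table layout is ours).

    assign[i]        -- processor h = A(i) to which module i is assigned
    exec_cost[i][h]  -- e(i,h), cost of executing module i on processor h
    comm_cost[i][j]  -- t(i,j), communication cost between modules i and j
    """
    m = len(assign)
    # Total execution cost E: each module pays for its assigned processor
    E = sum(exec_cost[i][assign[i]] for i in range(m))
    # Total communication cost T: only pairs on different processors pay
    T = sum(comm_cost[i][j]
            for i in range(m) for j in range(i + 1, m)
            if assign[i] != assign[j])
    return E + T
```

For example, with two processors, three modules, and a single heavy communication link between modules 0 and 1, coassigning that pair avoids the t(0,1) term entirely.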
In the next chapter, we describe previous research on solving the
assignment problem. These results deal with finding an optimal assignment
in a two-processor system as well as in an n>2-processor system.
CHAPTER III
A SURVEY OF RECENT RESEARCH
In this chapter we present a survey of previous research on the
problem of finding optimal assignments in distributed systems. We first
describe the method of solving the assignment problem for two processors,
as well as certain extensions based on this method. Then we discuss
the published results dealing with finding optimal or near-optimal
assignments in systems with n>2 processors.
The first results in this area dealt with two-processor systems and
were based on a graph model due to Stone[1]. This model was later
extended in several respects. Stone[4], Sinclair[5], and Gusfield[6]
studied the problem of varying system parameters that influence assignment
costs. These parameters include the load on one or both processors
and the channel traffic in a broadcast communication subnetwork. Rao,
Stone and Hu[7], Gonsalves[8] and Sinclair[9] used the same graph model
in attempting to solve the optimal assignment problem under the constraints
of limited memory in one processor of a two-processor distributed system or
limited bandwidth in a broadcast network. Sinclair[10] and Stone and
Bokhari[2] solved the dynamic assignment problem for two processors
under two somewhat different cost criteria. They also considered the
case of more than two processors, and used shortest path algorithms to
find optimal assignments.
Stone made use of a max-flow min-cut algorithm in finding
optimal program module-to-processor assignments to minimize the cost of
executing programs in a distributed fashion[1]. He showed that the
modules of a program may be assigned to the processors of a distributed
computer system so as to minimize the overall cost, including module execution
cost and intermodule communication cost. The former is the cost
of running the individual modules on their assigned processors and the
latter is the cost of communicating between modules assigned to different
processors. He constructed a graph model for the two-processor
problem which represents processors and modules as nodes and the module
execution costs and intermodule communication costs as weights on the
undirected edges connecting the nodes of the graph. He showed that a
minimal weight cutset of edges which disconnects the graph into two subgraphs
corresponds to an optimal assignment of program modules, where the
weight of the cutset is the sum of the weights carried by all the edges
in the cutset. Treating the two-processor assignment problem as a commodity
flow problem with suitable modifications, he found an optimal
assignment by using a max-flow min-cut algorithm. He also described the
construction of a processor flow graph for the n-processor problem but
was unable to efficiently solve for an n-processor optimal assignment
when n>2. Stone's algorithm is presented in detail in the next chapter.
Stone also examined the sequence of static optimal assignments
found as the load on one processor is held fixed and the load on the
other is varied, and proved the existence of a critical load factor for
each program module[4]. He assumed that in a two-processor system the
modules of a distributed program are free to move from processor to processor
during the course of a computation. A module can be specified
prior to the start of execution as being located on one or the other of
the processors, and can be reassigned without any penalty after program
execution has begun. He found that for every program module M there
exists a critical load factor f_M such that when the load on the processor
with variable load is below f_M, M is assigned to that processor by
an optimal assignment, and when the load on that processor is above f_M,
the optimal assignment places M on the other processor. Thus, for systems
in which a single processor load factor is the only parameter that
varies, one can dynamically assign modules in an optimal fashion by calculating
all critical load factors ahead of time and comparing the computed
load factors against the actual load factors experienced at run
time. This opens the possibility of doing optimal assignments in real
time.
Sinclair analyzed the problem of processors connected to a broadcast
communication channel[5]. Communication costs are assumed to be a
function of the amount of information to be transmitted and the loading
on the network, which causes access delays. These delays are caused by
the total communication traffic in the system. Sinclair was able to
find a minimal sequence of optimal assignments for a given program as
the average access delay increases from 0. The values of average delay
at which optimal assignments must change are called critical delays.
The algorithm requires O(q) optimal assignment computations, where q is
the number of critical delays. There are no restrictions on the number
of processors, but the difficulty of finding an optimal assignment for
more than two processors limits the algorithm's utility, except in those
cases referred to below for which efficient solutions to the optimal
assignment problem for more than two processors are known.
Gusfield recently applied a general method called parametric combinatorial
computing to the problem of finding optimal assignments in
two-processor systems when the loads on the processors are independent and
both may vary[6]. This method is applied to efficiently find optimal
assignments for all possible combinations of processor loads,
represented by two time-varying parameters X1 and X2. The solutions are
represented by the faces of a convex polygon over the bounded (X1, X2)
plane, and the polygon can be constructed in time polynomial in n, the
number of modules. He also showed that for the two-processor problem
with the load on one processor fixed and the load on the other varying
as a function of a parameter X, the same method can be used to find a
minimal set of optimal assignments covering the entire bounded X line,
and the costs of these assignments form a piecewise linear concave function
of X, with each linear segment corresponding to an optimal assignment.
Rao, Stone and Hu considered the problem of minimum cost assignment
of program modules to two processors when one processor has limited
memory[7]. They first constructed a Gomory-Hu tree based on network
flow theory. A Gomory-Hu tree generated from an n-node network has n
nodes and n-1 edges. The value on each edge is found by solving a flow
problem in a network equal to or smaller in size than the original, and
the Gomory-Hu tree has the property that it is flow-equivalent to the
original graph. (Two networks with the same node set N but different
edges and/or edge weights are said to be flow equivalent if for any pair
of nodes x and y in N, the maximum flow between x and y is the same in
both networks.) The maximum flow between two nodes x and y in an
undirected tree is just the weight of the minimum weight edge in the
(unique) path between x and y. Furthermore, the Gomory-Hu tree has the
property that if P_x and P_y are the sets of nodes reachable from x and y,
respectively, in the tree when the edge with minimum weight in the path
from x to y is removed, then P_x and P_y are the nodes reachable from x and y
in the original graph if we remove all of the edges in some minimal
weight cutset separating x and y. This means that we can find an
optimal two-processor assignment in the Gomory-Hu tree. The Gomory-Hu
tree can be used to obtain some information about modules which should
be assigned as a group in a memory-constrained system, as well as restrictions
on the assignment options.
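The flow-equivalence property is easy to exercise on a toy example. The sketch below is ours (the tree, its node names, and its weights are made up): it computes the maximum flow between two nodes of a weighted tree as the minimum edge weight on the unique path between them, exactly as the text describes.

```python
def tree_max_flow(tree_edges, x, y):
    """Max flow between x and y in a weighted tree: the minimum edge
    weight on the (unique) x-y path. A sketch with an illustrative tree.

    tree_edges -- dict mapping each node to a list of (neighbor, weight)
    """
    # Depth-first search for the unique x-y path, tracking the bottleneck.
    def dfs(u, target, parent, bottleneck):
        if u == target:
            return bottleneck
        for v, w in tree_edges[u]:
            if v != parent:
                found = dfs(v, target, u, min(bottleneck, w))
                if found is not None:
                    return found
        return None                     # target not in this subtree

    return dfs(x, y, None, float('inf'))
```

For instance, in a tree with edges (a,b) of weight 7, (b,c) of weight 3, and (b,d) of weight 9, the flow between a and c is limited to 3 by the (b,c) edge, while the flow between a and d is 7.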
Rao, et al., also constructed an inclusive cuts graph which specifies
a partial order for searching for a minimum feasible cutset. A
feasible cutset is one which corresponds to an assignment satisfying the
memory constraint, and a minimum feasible cutset is a feasible cutset
with minimum weight. Use of the Gomory-Hu tree or the inclusive cuts
graph, which identifies a minimum cost assignment satisfying the
memory constraint, can lead to substantial reductions in the size of the
problem.
Gonsalves studied the same problem in his thesis using heuristic
approaches[8]. He presented two polynomial-time algorithms for two-
processor scheduling with limited memory, and demonstrated experimen¬
tally that they can be successfully applied to this problem. He also
showed the reduction in complexity possible using the inclusive cut
graph for constant degree graphs.
Sinclair considered the problem of finding an optimal (minimum
cost) assignment for a distributed task in a computer network with a
broadcast communication subnetwork[9]. For broadcast networks in which
the channel is a critical resource, the optimization of a distributed
task assignment should consider not only the total task cost but the
cost in terms of channel utilization as well. He described a method for
determining an optimal assignment to two processors with minimum channel
utilization. The desired assignment with minimum communication cost can
be found efficiently for the two-processor case by performing a reduction
of the processor flow graph, relabeling the edges of the reduced
graph, and applying a max-flow min-cut algorithm to the reduced processor
flow graph. He also considered the case when more than two processors
are involved and the module intercommunication graph is a tree. He
described an algorithm for constructing an assignment graph and reducing
it to a graph in which each spanning tree corresponds to a minimum cost
assignment. After reassigning weights to the nodes of this graph, one
can apply a dynamic programming algorithm to find a minimum weight spanning
tree that corresponds to a minimum cost assignment with minimum
transmission cost.
Sinclair also considered the possibility of dynamic reassignment of
modules during program execution in distributed processing systems, both
for two-processor systems and for systems with more than two processors[10].
Modules may migrate from one processor to another during program execution
but incur a reassignment cost in doing so. An optimal dynamic
assignment minimizes the sum of module execution, interprocessor communication,
and module reassignment costs for the program execution.
Using an extension of Stone's original work, he was able to efficiently
solve the two-processor problem. He presented an algorithm to find an
optimal solution for the general problem with time and space complexities
exponential in the number of program modules, and pointed out that
pruning the dynamic assignment tree as it is being created results in an
average space complexity much less than worst case. He also described a
modification of this algorithm which emphasizes its equivalence to a
shortest path algorithm and which can greatly reduce the number of computations
by taking advantage of "localities" in the sequence of module
executions. Bokhari also considered reassignment costs as well as
residence costs in a somewhat different model and was able to solve this
problem for the case of two processors.
Stone and Bokhari summarized a number of theoretical results that
point the way toward the control of distributed computer systems[2].
They presented an extension of Stone's two-processor solution to the
three-processor case. They described the construction of a processor
flow graph for the three-processor problem, and discussed an algorithm
for finding the minimum tricutset in a three-processor assignment graph,
which corresponds to a minimum cost three-processor assignment.
Another approach to solving the module assignment problem, which does
not involve the use of processor assignment graphs, is based on a shortest
path algorithm. Bokhari demonstrated that the n-processor assignment
problem has an efficient solution when the calls graph of the program
is tree-structured[11]. He constructed an assignment graph such
that each assignment of modules to processors corresponds to a subset of
nodes and edges of the graph constituting a tree (called an assignment
tree). The weight of each assignment tree equals the cost of the
corresponding assignment. The shortest tree algorithm finds a minimum
cost assignment by finding the minimum weight assignment tree in the
assignment graph, where the weight of an assignment tree is the sum of
the node weights in the tree. This allows us to minimize the sum of
execution and communication costs for distributed systems with arbitrary
numbers of processors. For m modules and n processors, the time
complexity of the algorithm is O(mn^2).
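The idea underlying the efficient solution for tree-structured calls graphs can be sketched as a bottom-up dynamic program. The Python below is our simplified illustration, not Bokhari's assignment-graph construction; it assumes each module communicates only with its parent in the calls tree, and the function and parameter names are ours.

```python
def tree_assignment_cost(children, exec_cost, comm_to_parent, root=0):
    """Minimum assignment cost when the calls graph is a tree (a sketch).

    children[i]       -- list of children of module i in the calls tree
    exec_cost[i][h]   -- cost of executing module i on processor h
    comm_to_parent[i] -- communication cost between i and its parent,
                         paid only if they sit on different processors
    Each module's subtree is solved once per processor, giving the
    O(m n^2) behavior cited in the text.
    """
    n = len(exec_cost[0])              # number of processors

    def best(i):
        # best(i)[h] = cheapest cost of the subtree rooted at module i,
        # given that module i itself is placed on processor h
        costs = list(exec_cost[i])
        for c in children[i]:
            child = best(c)
            for h in range(n):
                # child may stay with the parent (no comm cost) or move
                costs[h] += min(child[g] + (comm_to_parent[c] if g != h else 0)
                                for g in range(n))
        return costs

    return min(best(root))
```

Enumerating all n^m assignments on a small instance gives the same minimum, which is a convenient sanity check.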
Sinclair used Bokhari's method in finding the critical delays in
programs with tree-structured calls graphs[5] and also in finding
optimal assignments with minimum communication cost for broadcast
channels[9].
Bryant and Agre used queueing theory to compute the cost of an
assignment when congestion delays are included in the total cost[12].
Other researchers used different formulations and methods for solving
the module assignment problem. Chu, et al.[13], described a model with
additional real-time constraints and several means of solving these constrained
optimization problems. They used a heuristic assignment algorithm
to find a "good" assignment. Chou and Abraham used Markov decision
theory to solve the assignment problem under a different set of
constraints.
Bryant and Agre discussed in their paper an alternative approach to
modeling the execution of a set of distributed programs, using a closed,
multiclass queueing network[12]. This model allows representation of
congestion factors such as mean queue size or average response time in
the cost criterion function. The cost function used in their model is
the sum of the response ratios (response time divided by requested processor
time) across all distributed programs on the system. Solving the
model allows one to obtain estimates of program execution time that
include queueing and communication delays. This approach has the advantage
that the execution cost can be expressed in terms of performance
measures of the system such as response time. Additionally, they introduced
an interchange heuristic search as a method of finding a good
module allocation. The complexity of the resulting algorithm is
O(MK(K + N)C), where M is the number of modules, K is the number of sites in
the network, N is the number of communications processors, and C is the
number of distributed program types.
Chu, et al.[13], discussed Stone's model and its deficiency of
performing no load balancing. They described a 0-1 integer programming
approach which can be used with real-time and/or memory constraints.
They pointed out that this approach also fails to accurately
account for real-time constraints when module precedence relations are
considered. Consequently, they developed a heuristic algorithm based on
work by Gylys[14] which can be used for module allocation subject to the
real-time and memory constraints, and they proposed replacing the real-time
constraint with a load-balancing constraint. The approach involves
"fusing" together modules to get an initial assignment and then checking
to see if the assignment satisfies the load-balancing requirements and
memory constraints. If not, a heuristic phase moves some modules from
one processor to another to improve load balancing while meeting memory
constraints. They also discussed the estimation of intermodule communication
costs. They treated the modules as tasks waiting in the
queue to be allocated to the processors. Maximum system performance
is measured by throughput. Maximum throughput is achieved by load
balancing, which tries to distribute modules as widely as possible, but
overhead due to interprocessor communication drives the allocation strategy
to cluster modules on as few processors as possible.
Chou and Abraham presented an algorithm that is more general and
applicable to n-processor systems, for making an optimal module-to-processor
assignment for a given performance criterion[15]. Their model
includes a set of tasks and an operational precedence relationship among
the tasks. It allows the description of probabilistic branching and
concurrent execution in a distributed program. The algorithm is based on
the analysis of a semi-Markov process with rewards.
We are interested in finding optimal or near-optimal assignments in
systems with more than two processors. Previous work in this area, as
described above, either considered only programs with specific properties
(i.e., Bokhari's results for programs with tree-structured calls
graphs) or relied on methods such as interchange heuristics to obtain
"good" if not optimal assignments. This thesis is concerned with using
the extension of Stone's graph model to n>2 processors as a basis for
developing methods for dealing more effectively and efficiently with the
general n-processor assignment problem. In Chapter V we show that the
complexity of the n-processor assignment problem can often be significantly
reduced, and in Chapter VI we offer a heuristic algorithm for
finding near-optimal assignments based on a structure called an affinity
tree, which is related to a Gomory-Hu tree. To provide the necessary
background for both of these results, we present in Chapter IV Stone's
graph model and its solution in detail.
CHAPTER IV
ASSIGNMENT GRAPHS
In this chapter, we introduce assignment graphs and describe their
use in finding optimal assignments. We describe assignment graphs for
the two-processor model, and then we extend them to more than two processors.
We also present several definitions and assumptions that are
relevant to the remainder of this thesis.
4.1. Assignment Graphs for Two-Processor Systems
The assignment graph is a graphical representation of the program
model originally developed by Stone[1]. An assignment graph is a connected,
undirected graph consisting of a set of nodes and a set of
weighted, undirected edges. Every program module is represented by a
node, as is each processor. The edges are assigned edge weights, indicated
by the numeric labels on the edges in the graph.
The edges between module nodes represent intermodule communication
patterns, and the weight of an edge between two module nodes represents
the cost of communicating between the associated modules when the modules
are not coresident on the same computer. Recall that intermodule
communication costs between a specified pair of modules are assumed to
be zero when the pair of modules are coresident. Each module node is
labeled with the name of its associated program module.
Let the two processors be called P1 and P2, and let the nodes in
the assignment graph associated with the processors also be labeled P1
and P2. The assignment graph has edges from each module node M to both
nodes P1 and P2. The weight of the edge between module node M and node
P1 is the cost of executing module M on processor P2. Similarly, the
weight of the edge between node M and node P2 is the cost of executing M
on P1.
Fig. 1 is an example of an assignment graph for two processors. The
nodes P1 and P2 represent the two processors. Nodes A through F
represent modules. We assume that the costs for executing each module
on either processor are known. For each pair of modules, the cost of
communication between them, should they not be coresident, is also assumed
to be known. The communication cost between coresident modules is
ignored. The communication costs are normally given in units of time or
dollars, as are the execution costs. We see from the graph that some
modules run faster on processor 1 (for example, A), and some run faster
on processor 2 (for example, D). The symbol ∞ indicates an infinite
cost. An edge (M, P_i) labeled with ∞ indicates that the module M must
be executed on processor P_i.
Figure 1. An assignment graph, and a cutset that determines a module assignment.
A cutset in a graph is defined to be a collection of edges such
that 1) when the edges are removed from the graph, node P_1 is discon-
nected from node P_2, and 2) no proper subset of a cutset is also a
cutset. The edges crossed by the bold line in Fig. 1 form a cutset.
A cutset partitions the nodes of the graph into two disjoint sub-
sets such that the nodes in one subset are connected to P_1 and the nodes
in the other subset are connected to P_2. Stone showed that each cutset
in the assignment graph corresponds to a module assignment, and every
assignment is represented by a cutset[1].
The weight of a cutset is the sum of the weights of the edges in
the cutset. It is equal to the cost of the corresponding module assign-
ment, since the weight of a cutset is the sum of the module execution and
intermodule communication costs for that assignment. Note in Fig. 1
that if a module M is assigned to P_1 then the edge (M, P_2) is cut, and
the weight of this edge is the cost of executing M on P_1. Clearly, for
each module M, either (M, P_1) or (M, P_2) must be in a cutset, since oth-
erwise P_1 would not be disconnected from P_2 by the cutset's removal. It
is easy to show that both (M, P_1) and (M, P_2) cannot be in the cutset,
and so each cutset uniquely defines an assignment, with the weight
of the cutset including the sum of the module execution costs for that
assignment. If two module nodes M and N are assigned to different pro-
cessors by the cutset, then any edge between M and N must be in the
cutset and its weight included in the weight of the cutset. But this
weight is just the cost of communicating between M and N when they
are assigned to different processors. If M and N are assigned to the
same processor, (M, N) cannot be in the cutset, in agreement with the
assumption that coresident modules have zero communication cost.
An optimal assignment corresponds to a minimum weight cutset of the
assignment graph. It follows that an optimal assignment may be obtained
by finding a minimum weight cutset in the graph. This may be done using
network flow algorithms.
A flow network consists of a set of nodes and a set of edges con-
necting these nodes. We assume that the number of nodes and edges is
finite. For convenience we rule out the possibility of an edge
forming a self-loop. We use A_ij to denote the edge leading from node
i to node j. A network is connected if for every partitioning of
the nodes of the network into subsets X and X̄ there is either an edge
A_ij or A_ji with i ∈ X and j ∈ X̄. Every edge A_ij has an associated
positive integer b_ij called the capacity of the edge.
A flow network has two special nodes. One is called the source,
denoted by s, and one is called the sink, denoted by t. An appropriate
analogy is a water pipeline system. The edges represent pipelines, the
source is the inlet for the water, the sink is the outlet for the water,
and all other nodes are junctions between pipelines. The capacity of
each edge is the maximum volume per unit time of the corresponding
pipeline. With such a pipeline system, we are interested in the maximum
flow that can be put through it from the source to the sink.
For a given flow network, a set of nonnegative integers x_ij is
called a flow in the network if they satisfy the following constraints:

    Σ_k x_kj - Σ_k x_jk  =  -v  if j = s,
                             0  if j ≠ s, t,        (4.1)
                             v  if j = t;

    0 ≤ x_ij ≤ b_ij   (for all edges A_ij).         (4.2)
The v which appears in (4.1) is a nonnegative number called the value of
the flow. Note that (4.1) expresses the fact that flow is conserved at
every node except the source and the sink. Constraint (4.2) means that
the edge flow x_ij is always bounded by the capacity b_ij of the edge.
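Constraints (4.1) and (4.2) can be checked mechanically. The following Python sketch validates a candidate flow against both constraints; the network, node names, and flow values here are hypothetical, chosen only for illustration.

```python
def is_valid_flow(capacity, flow, s, t):
    """Check constraints (4.1) and (4.2): capacity bounds on each edge
    and flow conservation at every node except the source s and sink t."""
    nodes = set()
    for (i, j) in capacity:
        nodes.update((i, j))
    # (4.2): 0 <= x_ij <= b_ij on every edge
    for edge, b in capacity.items():
        if not (0 <= flow.get(edge, 0) <= b):
            return False
    # (4.1): inflow minus outflow is -v at s, v at t, 0 elsewhere
    def net_in(j):
        into = sum(x for (a, b), x in flow.items() if b == j)
        out = sum(x for (a, b), x in flow.items() if a == j)
        return into - out
    v = net_in(t)
    if net_in(s) != -v:
        return False
    return all(net_in(j) == 0 for j in nodes - {s, t})

# Hypothetical network: s -> a -> t plus a direct edge s -> t.
cap = {('s', 'a'): 3, ('a', 't'): 2, ('s', 't'): 4}
flw = {('s', 'a'): 2, ('a', 't'): 2, ('s', 't'): 4}
print(is_valid_flow(cap, flw, 's', 't'))   # a valid flow of value 6
```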
If the network is a simple path from s to t, then the maximum
amount of flow that can be put through the network is obviously limited
by the edge with the minimum capacity of all the edges in the path. An
edge with minimum capacity is a bottleneck of the network. We shall now
define the general notion of a bottleneck in an arbitrary network. A
cutset is denoted by (X, X̄), where X is a subset of the nodes of the
network and X̄ is its complement. The capacity or value of a cutset
(X, X̄), denoted by c(X, X̄), is Σ b_ij taken over all i ∈ X and j ∈ X̄.
Clearly, due to constraints (4.1) and (4.2), the maximum flow
value v is less than or equal to the capacity of any cutset separating s
and t. Less obvious is that the maximum flow value is always equal to
the minimum capacity over all cutsets separating s and t. A cutset
separating s and t with minimum capacity is called a minimum cutset.
This result is called the Max-Flow Min-Cut Theorem, due to Ford and
Fulkerson[16].
Max-Flow Min-Cut Theorem
For any flow network the maximal flow value from the source to the
sink is equal to the capacity of a minimum cutset separating the source
and the sink.
The proof of the Max-Flow Min-Cut Theorem involves showing that if a
minimum cutset were not saturated (capacity = flow), then there must
exist a path from the source to the sink through which the flow between
s and t could be augmented. Ford and Fulkerson used this flow augmenta-
tion approach to develop the first max-flow min-cut algorithm. Edmonds
and Karp improved the algorithm by always searching for a flow augment-
ing path with a breadth-first search and were able to show a time com-
plexity of O(n^5), where n is the number of nodes. Later algorithms by
Dinic, Karzanov, Even, Malhotra et al., and others were successful
in reducing the time complexity to O(n^3).
We can view the assignment graph as a flow network with source P_1
and sink P_2. The edge weights are interpreted as flow capacities, and
we apply a max-flow min-cut algorithm to compute a maximum flow in the
network. The value of this flow is the cost of a minimum cost assign-
ment of the program. Having established a maximum flow in the network, we
can also find a minimum cutset of the assignment graph by identifying
all nodes reachable from P_1 by paths consisting only of unsaturated
edges. Since every minimum cutset of the assignment graph defines an
optimal assignment, we can solve the optimal assignment problem for the
two-processor case in O(m^3) time, where m is the number of program
modules.
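As a concrete sketch of this two-processor procedure, the following Python code runs a plain Edmonds-Karp max-flow (breadth-first search for augmenting paths) on a small hypothetical assignment graph; the module names and costs are invented. Following Stone's convention, the weight of edge (M, P1) is M's execution cost on P2 and vice versa, so the module nodes reachable from P1 through unsaturated edges after the flow is established are exactly those an optimal assignment places on P1.

```python
from collections import defaultdict, deque

def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp: repeatedly augment along a BFS path in the residual
    graph.  Returns (flow value, set of nodes on the source side of a
    minimum cutset)."""
    res = defaultdict(lambda: defaultdict(int))
    for (u, v), c in cap.items():          # undirected edge -> both directions
        res[u][v] += c
        res[v][u] += c
    value = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:       # BFS for an augmenting path
            u = q.popleft()
            for v, r in res[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t                    # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= aug
            res[v][u] += aug
        value += aug
    return value, set(parent)              # nodes reached via unsaturated edges

# Hypothetical two-processor assignment graph: weight of (M, P1) is M's
# execution cost on P2 and vice versa; (A, B) is a communication cost.
cap = {('A', 'P1'): 10, ('A', 'P2'): 4,    # A: costs 4 on P1, 10 on P2
       ('B', 'P1'): 2,  ('B', 'P2'): 9,    # B: costs 9 on P1, 2 on P2
       ('A', 'B'): 3}
cost, p1_side = max_flow_min_cut(cap, 'P1', 'P2')
print(cost)                        # prints 9 (A on P1, B on P2: 4 + 2 + 3)
print(sorted(p1_side - {'P1'}))    # prints ['A']
```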
4.2 Assignment Graphs for Systems with More than Two Processors
This section describes how to construct a graph model for the n > 2
processor problem. We present suitable generalizations of the notion
of an assignment graph and of the procedure for constructing such a
graph. This generalization is described in [1].
An n-processor assignment graph for a program of m modules consists
of a set of n+m nodes and a set of undirected, weighted edges. Each
module M has an associated module node labeled M, and each processor P_i
has an associated processor node labeled P_i. As before, the edge
between two module nodes M and N represents information transfer between
modules M and N, and the weight of the edge is the cost of communication
between M and N when they are not coresident on the same processor.
As in the two-processor case, for the graph model first
described by Stone and Bokhari[2], the edges between two module nodes
represent intermodule communication patterns, and the weights on these
edges represent the cost of intermodule communication between modules
which are not coresident on the same computer. The intermodule commu-
nication cost between two coresident modules is assumed to be zero.
Similarly, every module node M is connected to each processor node P_i,
1 ≤ i ≤ n, by an edge (M, P_i) with a weight derived from M's
execution costs. Suppose that the cost of executing module M on pro-
cessor P_i, i = 1, 2, ..., n, is T_i. The edge between node M and node P_1
has weight (T_2 + T_3 + ... + T_n - (n - 2)T_1) / (n - 1), and likewise the
edge to node P_2 carries the weight (T_1 + T_3 + ... + T_n - (n - 2)T_2)
/ (n - 1). In general, (M, P_i) has weight

    ((Σ_{j≠i} T_j) - (n - 2)T_i) / (n - 1).
An n-cutset of the assignment graph is a set of edges such that
1) removing the edges separates the assignment graph into n subgraphs, each
containing exactly one processor node, and 2) no proper subset is also an
n-cutset. As in the two-processor case, each n-cutset corresponds to an
assignment of the modules of the program. A module M is assigned to a
processor P_i by an n-cutset if module node M is in the subgraph contain-
ing processor node P_i. The weight of an n-cutset is the sum of the
weights of the edges in the n-cutset. The edge weights are defined such
that the weight of an n-cutset is the cost of the associated assignment.
The problem of finding an optimal assignment is equivalent to that of
finding a minimum weight n-cutset of the assignment graph.
Fig. 2 shows the assignment graph for a three-processor system, with
P_1, P_2, and P_3 representing processors and A, B, C, D, and E representing
program modules. If D is assigned to processor P_1, the edges to P_2 and
P_3 are cut, and their weights total T_1, the cost of executing D on
P_1. In general, the execution cost of a module M on a processor P_j con-
tributes to the weights of all edges (M, P_i), 1 ≤ i ≤ n. For each pro-
cessor P_i, i ≠ j, the contribution is T_j / (n - 1). If M is assigned to
P_j, then the total contribution due to M's execution cost on P_j is
(n - 1)T_j / (n - 1) = T_j, as it should be. The total contribution due to
M's execution cost on P_i, i ≠ j, is 0, since n - 2 of the cut edges include
T_i / (n - 1), and the edge (M, P_i) contributes -(n - 2)T_i / (n - 1). An edge
(M, P_i) with an infinite weight indicates that M must be assigned to P_i,
since the cost of running M on any other processor would contribute an
infinite amount to the program's total cost.
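This bookkeeping is easy to check numerically. The short Python sketch below applies Stone's edge-weight formula to hypothetical execution costs for a single module and verifies that, for every choice of assignment, the cut edges contribute exactly the cost of executing the module where it is assigned.

```python
def edge_weight(T, i):
    """Weight of edge (M, P_i) given M's execution costs T[0..n-1]:
    ((sum of T_j for j != i) - (n - 2) * T_i) / (n - 1)."""
    n = len(T)
    return (sum(T) - T[i] - (n - 2) * T[i]) / (n - 1)

# Hypothetical module with execution costs on three processors.
T = [6.0, 3.0, 9.0]
n = len(T)
for j in range(n):                      # assign the module to P_j
    # the n-cutset then contains every edge (M, P_i) with i != j
    cut = sum(edge_weight(T, i) for i in range(n) if i != j)
    assert abs(cut - T[j]) < 1e-9       # cut weight equals T_j, as claimed
print("cut weight matches execution cost for every assignment")
```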
A minimum n-cutset in an assignment graph for n processors can be
found by exhaustive enumeration, but the computational complexity of such
an approach makes it impractical for all but small problems. An m-
module program in an n-processor system has n^m possible distinct
assignments. When n > 2, finding an optimal assignment is difficult. For
n > 3, the problem of finding an optimal assignment is known to be NP-
complete; i.e., it is in a class of "hard" problems for which efficient
solutions almost certainly do not exist.

Figure 2. An assignment graph for a three-processor system.
A two-processor flow can give information about the minimum n-
cutset in an n-processor graph. Previous research[1] shows that a
module node associated with a processor node by a two-processor flow is
also associated with that node by a minimum n-cutset. Unfortunately, it
is easy to construct examples in which a node that belongs with a par-
ticular distinguished node by a minimum n-cutset fails to be associated
with that node by a two-processor flow. In the next chapter we will
discuss the implications of these results and describe a new result
which can substantially decrease the complexity of finding a minimum n-
cutset.
CHAPTER V
SEARCH SPACE REDUCTION
In using an exhaustive search procedure to find an optimal assign-
ment, we assume that each module might be assigned to any of the n pos-
sible processors by an optimal assignment. However, in many cases this
is not true. We show that a simple modification of the n-processor
assignment graph can be used to identify modules which must be assigned
to a particular processor under any optimal assignment of the program.
This can greatly reduce the size of the search space for optimal solu-
tions. We present a number of experimental results indicating the mag-
nitude of the reductions that we might expect in applying this result.
5.1 The Reduction Theorem
A max-flow min-cut algorithm requires that the flow network have
exactly one source node and one sink node. When we use a max-flow min-
cut algorithm to find an optimal assignment in a two-processor flow
graph, we choose one processor node to be the source node and the other
to be the sink node. Extending this approach to an n-processor system
is difficult because we can no longer identify a unique source and sink.
However, we can modify the n-processor assignment graph so that we
can apply a max-flow min-cut algorithm to obtain some information about
module assignments for the n-processor case. In the n-processor assign-
ment graph, we choose one of the processor nodes, say P_i, as a source
node. We add a new node P' to the graph, along with edges connecting it to
each processor node except P_i. The weights of all the new edges are
infinite. We call this graph the P_i-reduction graph. Then we let P_i be
the source node and P' be the sink node. By running a max-flow min-cut
algorithm on the P_i-reduction graph, we find a minimum cutset which
separates the nodes of the graph into two partitions, one containing P_i
and the other containing P'. Sometimes more than one cutset has
minimum weight. In a two-processor graph or P_i-reduction graph, we
refer to the minimum weight cutset which associates the fewest
nodes with P_i as the P_i-reduction cutset. We say that a module A is
assigned to P_i by a P_i-reduction cutset if module node A belongs to the
same partition as P_i.
The following theorem establishes a relationship between the P_1-
reduction cutset and the assignments of particular modules in any optimal
n-processor assignment. Let P_1 denote an arbitrary processor.
Reduction Theorem

If a module A is assigned to a processor P_1 by a P_1-reduction
cutset, then A must be assigned to P_1 under any optimal n-processor
assignment.
Proof:

First note that if A is assigned to P_1 by every minimum capacity
cutset (X_j, X̄_j) of the P_1-reduction graph, then A is assigned to P_1 by
the cutset (X, X̄), where P_1 ∈ X_j for all j and X = ∩_j X_j.
We prove the theorem by contradiction. Let A be assigned to P_1 by
a P_1-reduction cutset (X, X̄), which is represented by cutset I in Fig. 3.
Assume A is assigned to a different processor P_2 under an n-processor
minimum cost partition II. Without loss of generality, assume I and II
cross each other in the original graph, as shown in Fig. 3. Denote the
four regions defined by these cutsets as P_1, P_2, A, and X according to
the nodes appearing in these regions. Region X may be empty. The
cutset weight c(U, V) is the sum of the weights of all edges between
two regions U and V.

Since I is a P_1-reduction cutset, the weight of I is less than the
weight of I-A, where I-A represents the cutset that partitions the graph
into two regions P_1 and P_2 ∪ X ∪ A; that is, I-A does not include A
with P_1. Thus
c(P_1, X) + c(P_1, P_2) + c(A, X) + c(A, P_2) < c(P_1, X) + c(P_1, A) + c(P_1, P_2).
Figure 3. Two cutsets in a graph.
Simplifying,

c(A, X) + c(A, P_2) < c(P_1, A).    (5.1)
Since II is a minimum cost partition, the weight of II is less than or
equal to the weight of II-A, where II-A is the cutset that partitions the
graph into two regions P_1 ∪ X ∪ A and P_2. We have

c(A, X) + c(A, P_1) ≤ c(P_2, A).    (5.2)

Combining (5.1) and (5.2),

c(A, X) < -c(A, X).

Since all costs are nonnegative, this is a contradiction. Hence A can-
not be assigned to any processor P_2 ≠ P_1 by an optimal n-processor
assignment.

This theorem is similar to but stronger than the one described by
Stone[1]. Stone showed that A is associated with P_1 in some minimum
cost partition of the n-processor graph if A is associated with P_1 by a
two-processor flow algorithm. We have established the condition under
which A is associated with P_1 in any minimum cost partition of the n-
processor graph.
The above theorem has a direct application in reducing the size of
the search space when attempting to find an optimal assignment. In the
algorithm that follows, let c(a, b) be the weight of the edge between
nodes a and b.
Procedure PREDUCE

1.  n <- number of processor nodes;
2.  for i = 1 step 1 until n do
    begin  /* construct the P_i-reduction graph */
3.      source node <- P_i;
4.      add a new node P_{n+1} to the assignment graph as the sink node;
5.      for j = 1 step 1 until n do
6.          if j ≠ i then
            begin
7.              add edge (P_j, P_{n+1}) to the assignment graph;
8.              c(P_j, P_{n+1}) <- ∞;
            end
9.      establish a maximum flow between P_i and P_{n+1} using
        a max-flow min-cut algorithm;
10.     X <- {P_i} ∪ {all module nodes reachable from P_i by a path of
        unsaturated edges};
11.     X̄ <- {all nodes not in X};
        /* (X, X̄) is the P_i-reduction cutset */
12.     condense the nodes in X into one node P_i';
        for all v ∈ X̄ do
13.         c(P_i', v) <- Σ_{u ∈ X} c(u, v);
14.     remove P_{n+1} and all edges incident on it;
    end
In this algorithm we first choose a processor node P_i as the source
node. Then we add a new node P_{n+1} as the sink node and connect
the other processor nodes P_j, j ≠ i, to P_{n+1} by edges of infin-
ite weight. After running the max-flow algorithm on this graph we find
a P_i-reduction cutset. Then we condense the nodes in the partition X
containing P_i into one node P_i'. If any node u in X has an edge to a node v
in the other partition of the graph, there is an edge from P_i' to v in
the modified graph. The weight of (P_i', v) is equal to the sum of the
weights of the edges from all nodes in X to node v. After removing P_{n+1}
and its incident edges, we choose the next processor node as the source
node and repeat the procedure. We call this process of combining module
nodes with a processor node P_i a P_i-reduction.
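The graph-construction steps of PREDUCE (steps 3-8) are straightforward to express in code. The Python sketch below builds the P_i-reduction graph for a small hypothetical three-processor fragment; the max-flow step itself (step 9) is assumed to be supplied by any standard algorithm and is omitted here, and the node names and weights are invented.

```python
import math

def build_reduction_graph(cap, processors, i):
    """Steps 3-8 of PREDUCE: add a sink node connected to every
    processor node except P_i by an edge of infinite weight.
    Returns (graph, source, sink)."""
    g = dict(cap)                       # copy the assignment graph
    sink = 'P_sink'                     # plays the role of P_{n+1}
    for p in processors:
        if p != processors[i]:
            g[(p, sink)] = math.inf     # infinite-weight edge (P_j, P_{n+1})
    return g, processors[i], sink

# Hypothetical three-processor assignment-graph fragment.
procs = ['P1', 'P2', 'P3']
cap = {('A', 'P1'): 5, ('A', 'P2'): 7, ('A', 'P3'): 6, ('A', 'B'): 2,
       ('B', 'P1'): 4, ('B', 'P2'): 3, ('B', 'P3'): 8}
g, s, t = build_reduction_graph(cap, procs, 0)
# edges into the sink exist for P2 and P3 only, each with infinite weight
print(s, t, sorted(e for e in g if t in e))
```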
By the Reduction Theorem, a module that is associated with a processor
P_i by a P_i-reduction cutset must be assigned to P_i by any optimal
n-processor assignment. By finding the P_i-reduction cutsets for all the
processors P_i we hope to fix the assignments of a number of modules in
the optimal n-processor partition. Then, when we attempt to find the
optimal n-processor assignment, we only need to consider the remaining
modules.
In this way we reduce the work of searching for an optimal assignment. In
the next section we show through a number of examples that this algo-
rithm results in a significant reduction in the size of the search space
for most assignment problems, at least for programs which conform to the
class of assignment graphs that we generated as test cases.
The success in applying the Reduction Theorem to the assignment
problem leads us to ask whether we can obtain any additional information
about module assignments from the set of P_i-reduction graphs. In particular,
what if a module A is associated with a processor node P_i by a minimum
capacity cutset of the P_i-reduction graph but not by the P_i-reduction
cutset, and A is not associated with any P_j, j ≠ i, by any minimum capa-
city cutset of the P_j-reduction graph? We might hope that A could then
be shown to be assigned to P_i by any optimal n-processor assignment.
Unfortunately, the following lemma shows that this is not true.
Lemma

If A is assigned to P_i by a minimum cutset of the P_i-reduction
graph but not by a P_i-reduction cutset, and A is not assigned to any
processor P_j (j ≠ i) by any minimum cutset of the P_j-reduction graph, then A
is not necessarily assigned to P_i by an optimal n-processor partition.
We prove this by counterexample. Fig. 4(a) is the assignment
graph of a three-processor system. P_1, P_2, and P_3 represent processors;
A, B, and X are module nodes. Edge weights c(A, P_2) and c(A, P_3) are zero.
Figure 4(a). An assignment graph of a three-processor
system and its minimum cost partition.
Figure 4(b). A P_1-reduction graph and the minimum cutsets.
Figure 4(c). A P_2-reduction graph and the minimum cutsets.
Figure 4(d). A P_3-reduction graph and the minimum cutset.
    Processor Assignments        Cost of
    A     B     X                Assignment

    1     1     1                 6
    1     1     2                 5  *
    1     1     3                 6
    1     2     1                 6
    1     2     2                 5  *
    1     2     3                 6
    1     3     1                 7
    1     3     2                 6
    1     3     3                 7
    2     1     1                12
    2     1     2                 7
    2     1     3                10
    2     2     1                10
    2     2     2                 5  *
    2     2     3                 8
    2     3     1                12
    2     3     2                 7
    2     3     3                10
    3     1     1                12
    3     1     2                 9
    3     1     3                 8
    3     2     1                11
    3     2     2                 8
    3     2     3                 7
    3     3     1                11
    3     3     2                 8
    3     3     3                 7

Table 1. All the possible assignments and the weights of their cutsets for the graph in Fig. 4.
* optimal assignment
From the definitions of the (processor node, module node) edge weights,

(T_2 + T_3 - T_1) / 2 = 3,
(T_1 + T_3 - T_2) / 2 = 0,
(T_1 + T_2 - T_3) / 2 = 0.

This implies that T_1 = 0 and T_2 = T_3 = 3. Figures 4(b), 4(c), and 4(d)
are the P_1-, P_2-, and P_3-reduction graphs. The bold lines show the
minimum cutsets in each reduction graph. Notice that A is assigned to P_1
by a minimum cutset other than the P_1-reduction cutset in the P_1-
reduction graph, and A is not assigned to P_2 or P_3 by a minimum cutset
of the P_2- or P_3-reduction graph. Table 1 lists all the possible three-
processor assignments and the weight of the cutset for each. The three
assignments marked with * are optimal. In the third assignment A is
assigned to P_2 instead of P_1. Hence A is not necessarily assigned to P_1
by an optimal three-processor partition even though A is assigned to P_1 by a
minimum cutset in the P_1-reduction graph and not assigned to P_2 or P_3
by any minimum cutset in the P_2- or P_3-reduction graph.
5.2 Assignment Problem Simplification by P-reductions
The Reduction Theorem can be used to simplify the n-processor
optimal assignment problem. In this section we will show the effectiveness
of this approach by giving some experimental results. These results are
for a particular class of assignment problems determined by the method
used to generate the assignment graphs, and we describe this method
first.
The graphs we used in the experiments were generated randomly.
First we generated the set of edges in the graph, using a graph model
in which the probability of an edge existing between any pair of nodes
is a constant, 1/3.

We inserted edges between all pairs of nodes with probability 1/3
for each pair. The average degree of a node (number of incident edges)
is then (n - 1) / 3 for an n-node graph. After all edges were gen-
erated, we verified that the resulting graph was connected; that is, we
required that there exist at least one path between every pair of nodes.
If we found a set of nodes that was not connected to the remaining nodes
by any path, we added an edge linking a node in this set to one of
the other nodes of the graph. In this way we generated graphs with a
relatively dense interconnection structure, which in turn tends to make
the assignment problem more difficult.
We then assigned weights to the edges. We assumed that communica-
tion costs are uniformly distributed in the range 1 to 20 in our exam-
ples. For each edge between two module nodes, we generated a random
number R between 0 and 1, calculated 20R, and used the ceiling of this
value as the weight of the edge. In a similar way we determined the
weight of each edge between a processor node and a module node. Since
there do not exist any detailed analyses of programs written for distri-
buted processors, the choice of parameters is arbitrary.
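The generation method can be sketched in a few lines of Python. The connectivity repair below is a simplification of the step described above: every node unreached from node 0 is attached directly to node 0, rather than one edge per disconnected component.

```python
import random

def random_assignment_graph(num_nodes, p=1/3, max_weight=20, seed=1):
    """Generate edges as in the experiments: each possible edge is present
    with probability p, with weight uniform on 1..max_weight (the ceiling
    of 20R for R uniform on (0, 1)), then patch up connectivity."""
    rng = random.Random(seed)
    edges = {(i, j): rng.randint(1, max_weight)
             for i in range(num_nodes)
             for j in range(i + 1, num_nodes)
             if rng.random() < p}
    # Connectivity repair (simplified): find everything reachable from
    # node 0, then attach each unreached node to node 0 by a new edge.
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for a, b in list(edges):
            if a == u and b not in seen:
                seen.add(b); stack.append(b)
            elif b == u and a not in seen:
                seen.add(a); stack.append(a)
    for v in range(1, num_nodes):
        if v not in seen:
            edges[(0, v)] = rng.randint(1, max_weight)
    return edges

edges = random_assignment_graph(12)
print(len(edges), 2 * len(edges) / 12)   # edge count and average degree
```

For large graphs the average degree approaches (n - 1) / 3, as noted above.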
Regarding the cost of executing module A on the n processors, we
have the following equations from Stone's formula:

c(P_1, A) = (T_2 + T_3 + ... + T_n - (n - 2)T_1) / (n - 1),
c(P_2, A) = (T_1 + T_3 + ... + T_n - (n - 2)T_2) / (n - 1),
...
c(P_n, A) = (T_1 + T_2 + ... + T_{n-1} - (n - 2)T_n) / (n - 1).

In generating the model, we selected the edge capacities c(P_i, A)
rather than the execution costs for A. Using these equations, we can
calculate the values T_i, 1 ≤ i ≤ n, from the chosen edge capacities. If
we find that an execution cost turns out to be negative, this is not really
a problem. We can bias all of the T_i's by replacing each T_i with T_i + σ
for some positive σ. This does not influence the solution of the n-
processor optimal assignment, since it causes the assignment cost to
increase by the same constant for every possible assignment, and the optimal
assignment(s) remain unchanged. Consequently, had we started with
values T_i + σ ≥ 0 for all i, 1 ≤ i ≤ n, we could have subtracted σ from
each term again without changing the identity of the optimal assign-
ments, even though some of the values of T_i might then be negative.
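Inverting these equations has a convenient closed form: summing the n capacities gives Σ_j c(P_j, A) = (Σ_j T_j) / (n - 1), so T_i = (Σ_j c(P_j, A)) - c(P_i, A). The sketch below checks this round trip on the capacities (3, 0, 0) appearing in the lemma's example.

```python
def execution_costs(c):
    """Invert Stone's edge-weight formula: given the capacities
    c_i = c(P_i, A), the execution costs are T_i = (sum of all c_j) - c_i."""
    total = sum(c)
    return [total - ci for ci in c]

def capacities(T):
    """Stone's formula: c(P_i, A) = ((sum T_j, j != i) - (n - 2) T_i) / (n - 1)."""
    n, S = len(T), sum(T)
    return [(S - T[i] - (n - 2) * T[i]) / (n - 1) for i in range(n)]

c = [3.0, 0.0, 0.0]            # capacities from the example of Fig. 4
T = execution_costs(c)
print(T)                       # prints [0.0, 3.0, 3.0]
assert capacities(T) == c      # round trip recovers the chosen capacities
```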
Fig. 5 is a histogram of the graph size reduction for problems with
3 processor nodes and 6 module nodes. Fig. 6 gives the results for 3 processor
nodes and 9 module nodes, and Fig. 7 describes the results for 6
processor nodes and 9 module nodes. In each of the cases described
above we generated 100 examples.

Figure 5. Histogram of graph size reduction for 3 processor nodes and 6 module nodes.
Figure 6. Histogram of graph size reduction for 3 processor nodes and 9 module nodes.
Figure 7. Histogram of graph size reduction for 6 processor nodes and 9 module nodes.
Figure 8. Average percentage reduction vs. the number of modules.
The axis marked "graph size reduction" gives the percentage of
module nodes that are assigned to one processor in any optimal assign-
ment. In other words, this is the fraction of nodes whose assignment in
the n-processor case is fixed by a P-reduction. The histograms show the
relative frequency of occurrence of each percentage reduction over the
100 test cases.
Fig. 8 shows the average percentage reduction as a function of the
number of module nodes, for several different numbers of processor
nodes. We see that for cases with relatively few nodes, the
reduction algorithm is able to fix the assignments of a large fraction
of the modules. Also, for a fixed number of modules, the effectiveness of
the reduction process decreases as the number of processors increases.
Other histograms of the graph size reduction for various numbers of pro-
cessor nodes and module nodes are found in the Appendix.
A P-reduction is based on the application of a max-flow min-cut
algorithm, the complexity of which is O(r^3) for an assignment graph with
r nodes. For an n-processor system we have to run the max-flow algo-
rithm on n P-reduction graphs, each with roughly m + 2 nodes (the n - 1
processor nodes attached to the sink by infinite-weight edges can be
regarded as merged with it). The time complexity is therefore O(nm^3).
Suppose that for n processor nodes and m module nodes we can expect an
average total assignment graph reduction of x, as a frac-
tion of the total number of module nodes. The time required to find an
optimal assignment of the remaining m(1 - x) modules by exhaustive enumeration
is O(n^(m(1-x))). Thus, if we first do the P-reductions and then find the
optimal n-processor assignment for the reduced problem, the time complexity is
O(n^(m(1-x)) + nm^3). If x is large enough, this represents a significant
improvement over the time required for exhaustive enumeration over all m
module nodes, which is O(n^m).
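To give a feel for the magnitudes, the sketch below compares the two search-space sizes for one hypothetical case (n = 3 processors, m = 9 modules, and an assumed reduction fraction x = 0.6; the fraction is illustrative, not a measured value, and the O(m^3) max-flow cost is counted with a unit constant).

```python
def search_costs(n, m, x):
    """Exhaustive enumeration examines n**m assignments; with a
    P-reduction fixing a fraction x of the modules first, roughly
    n**(m*(1-x)) assignments remain, plus n max-flow runs of O(m**3)."""
    full = n ** m
    remaining = round(m * (1 - x))          # unfixed modules
    reduced = n ** remaining + n * m ** 3   # enumeration + n max-flow runs
    return full, reduced

full, reduced = search_costs(n=3, m=9, x=0.6)
print(full, reduced)   # prints 19683 2268
```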
We noted above that as the number of module nodes and processor
nodes increases, the fraction of modules whose assignments can be fixed
decreases. Consequently, we need some means other than exhaustive
enumeration for finding optimal or near-optimal assignments even when we
use P-reductions to reduce the size of the search space. In the next
chapter we introduce an efficient heuristic assignment algorithm that
allows us to find "good" assignments even when the number of processors
or modules is large, and we show that for a large number of test cases
it in fact produces an optimal assignment.
CHAPTER VI
HEURISTIC ALGORITHM TO FIND NEAR-OPTIMAL MULTIPROCESSOR ASSIGNMENTS
In this chapter we introduce a tree structure called a G-H tree
which is flow-equivalent to a network flow graph. Then we describe a
related structure called the affinity tree, which can be used as the basis
for a heuristic algorithm to find optimal and near-optimal n-processor
assignments. Finally, we present some experimental results which show
the effectiveness of this algorithm.
6.1 G-H Tree
Gomory and Hu showed that for a flow network with n nodes, the maximum
flows between all n(n - 1) / 2 pairs of nodes can be obtained by only
n - 1 applications of a max-flow min-cut algorithm[3]. They described
a procedure for constructing a tree, called a Gomory-Hu cutset tree
(abbreviated as G-H tree), from a flow network, with values on the edges
of the tree and with the following properties:

1) The maximum flow value between any pair of nodes N_a and N_b in
the flow network is equal to

    min(v_1, v_2, ..., v_k),

where v_1, ..., v_k are the values associated with the edges on the path
between N_a and N_b in the G-H tree.

2) Any edge (N_i, N_j) of the G-H tree corresponds to a minimum cutset
(X, X̄) separating N_i and N_j in the original network. The set X is the
set of nodes connected to N_i in the G-H tree when the edge is
removed.
Fig. 9(b) is the G-H tree for the network in Fig. 9(a). For exam-
ple, the maximum flow between nodes 1 and 5 is 13, while that between 3
and 4 is 14. The G-H tree is said to be flow-equivalent to the original
graph because of property 1 listed above.
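Property 1 is easy to exercise in code: a maximum flow value can be read off a G-H tree by taking the minimum edge value on the tree path between two nodes. The tree used below is a small hypothetical one (a simple path), not the tree of Fig. 9(b), whose structure appears only in the figure.

```python
def tree_max_flow(tree, u, v):
    """Property 1 of a Gomory-Hu tree: the max flow between u and v in
    the original network equals the minimum edge value on the unique
    tree path from u to v.  `tree` maps node -> list of (neighbor, value)."""
    def dfs(a, target, seen, best):
        if a == target:
            return best
        seen.add(a)
        for b, w in tree[a]:
            if b not in seen:
                r = dfs(b, target, seen, min(best, w))
                if r is not None:
                    return r
        return None
    return dfs(u, v, set(), float('inf'))

# Hypothetical G-H tree: the path 1 -18- 2 -13- 3 -15- 4.
tree = {1: [(2, 18)],
        2: [(1, 18), (3, 13)],
        3: [(2, 13), (4, 15)],
        4: [(3, 15)]}
print(tree_max_flow(tree, 1, 4))   # min(18, 13, 15) = 13
```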
The procedure for constructing a G-H tree is briefly as follows. We
choose two nodes as source and sink, and do a maximum flow computation
to find a minimum cutset. The nodes in the network are separated by the
minimum cutset into two parts A and Ā. We represent this by two nodes
connected by an edge bearing the cutset value v_1 (Fig. 10). In one node
are listed the nodes of A, in the other those of Ā. Next we choose two
nodes in A (or two in Ā), and solve the flow problem in the condensed
network in which the nodes listed in Ā (or A) are combined into a single
node. The resulting cutset has value v_2 and is represented by an edge
of weight v_2 connecting the two parts into which A (or Ā) is divided by
the cutset, say A_1 and A_2. The node Ā (or A) is attached to A_1 if it is
Figure 9(a)
Figure 9(b)
Figure 10
in the same partition as A_1, and to A_2 if it is in the same partition as
A_2 (Fig. 11).
We continue to select pairs of nodes, both of which are contained
in a single node of the tree being constructed, and compute the maximum
flow between them. Each time we do this, we create a new node and edge
which are added in such a way that the resulting structure is still a
tree. The process terminates when each node of the tree is labeled
with exactly one node of the network. Each time, we solve a flow problem
in a network equal to or smaller in size than the original. When the
algorithm terminates, the resulting tree is a G-H tree for the original
flow network. For a more detailed description of this algorithm, see [17].
We illustrate this process with the example of Fig. 9. We arbi-
trarily choose nodes 2 and 6 and compute a maximum flow between them.
We find the minimum cutset to be ((1, 2), (3, 4, 5, 6)) with value 17, as
indicated in Fig. 12(a). The first step in constructing the G-H tree is
shown in Fig. 12(b).
Next we choose nodes 1 and 2, since they belong to the same node in
Fig. 12(b). In obtaining the maximum flow between nodes 1 and 2, we
combine 3, 4, 5, and 6 into a single node. Fig. 13(a) shows the result-
ing graph and the minimum cutset separating 1 and 2, which is ((1), (2,
3, 4, 5, 6)) with value 18. The tree shown in Fig. 13(b) includes an
edge between nodes 1 and 2 with weight 18, with node 2 connected to the
Figure 12(a)
Figure 12(b)
Figure 13(a)
Figure 14(a)
node labeled (3, 4, 5, 6).
Now we choose nodes 3 and 6; nodes 1 and 2 are condensed into a single node
(Fig. 14(a)), while nodes 3, 4, 5, and 6 are no longer combined. The
minimum cutset separating 3 and 6 is ((1, 2, 6), (3, 4, 5)), with value
13. This is reflected in the tree in Fig. 14(b).
To find the maximum flow between nodes 4 and 5, we can combine 1, 2,
and 6 into a single node (Fig. 15(a)). The minimum cutset in this graph
is ((4), (1, 2, 3, 5, 6)) with value 14. Since 4 is alone in its parti-
tion, we add a node labeled 4 to the tree in such a way that 4 is
separated from all other nodes in the tree by an edge of weight 14 (Fig.
15(b)).
Finally we consider the maximum flow between nodes 3 and 5. taking
1. 2 and 6 as one node. The minimum cutset separating 3 and 5 is ((3).
(1. 2. 4. 5. 6)) with capacity 15. and the completed G-H tree is shown
in Fig. 9(b).
Since the G-H tree of r nodes requires r-1 maximum flow
computations, the time complexity for constructing the G-H tree is
bounded by O(r^4). However, since each time we compute a new maximum
flow, we usually combine sets of nodes in the original flow network into
single nodes in the flow network for which we compute the maximum flow,
the expected time complexity should be much less than this bound.
Figure 14(b)

Figure 15(a)

Figure 15(b)
The G-H tree is flow-equivalent to the original flow network.
However, there has been some loss of information in making this
transformation. We cannot be sure of finding an optimal n-processor
assignment by searching for a minimum value partition of a G-H tree
derived from the n-processor assignment graph. Nevertheless, we will
make use of a similar but simpler transformation procedure as a basis
for a heuristic algorithm for finding optimal or near-optimal
assignments. This transformation and the heuristic algorithm that
employs it are the subjects of the next section.
6.2 Heuristic Procedure for Determining Multiprocessor Assignment
Since in an n>3-processor system the problem of finding an optimal
assignment is NP-complete, it is not likely that we will be able to
develop an efficient algorithm for finding an optimal assignment.
Instead, we are led to the question of whether there is any efficient
approximation algorithm for finding near-optimal assignments --
assignments with costs that are very close to that of an optimal
solution -- at least most of the time. Such an algorithm would be very
attractive in terms of the feasibility of its application to practical
problems.
The G-H tree for a system consisting of n processors and m modules
has n+m nodes and is constructed in time bounded by O((n+m)^4). In this
section we first define a structure called an affinity tree which is
constructed in a similar fashion but which has fewer nodes. We then use
it as a basis for a heuristic algorithm for determining n-processor
assignments. We also show some results of the application of this
heuristic to a number of artificially generated test problems.
An affinity tree is a tree consisting of n nodes generated from an
n-processor assignment graph. It is related to a G-H tree, but each
node is labeled with exactly one processor node and some (perhaps none)
module nodes of the original module assignment graph. A node in an
affinity tree may represent more than one node in the original graph.
For an n-processor system we can get an affinity tree with n nodes.
The following algorithm generates an affinity tree.
(1) Create a node N, and list all of the nodes of the assignment graph
    in N.
    Let V = {N} /* V is the set of nodes in the (partially constructed)
    affinity tree */
    Let E = 0 /* E is the set of edges in the affinity tree */

(2) Construct an assignment graph G in which each node n_i listed in N
    is a separate node in G. For any other node N' in V, all of the
    nodes listed in N' are condensed into a single node labeled N' in
    G. (Edge (N', n_i) exists in G if an edge (n_j, n_i) exists for
    some n_j in N' in the original assignment graph. The weight
    c(N', n_i) is the sum of edge capacities c(n_j, n_i) for all n_j
    such that n_j is a node listed in N' and (n_j, n_i) is an edge in
    the original assignment graph.)

(3) Arbitrarily choose two processor nodes P_x and P_y listed in N, and
    find the maximum flow f and minimum cutset (V_x, V_y) between P_x
    and P_y such that P_x is in V_x and P_y is in V_y.

(4) Create two new nodes N_x and N_y. In N_x, list all the nodes in
    V_x that were listed in N, and in N_y list all the nodes in V_y
    that were listed in N. Remove N from V.

(5) Add an edge (N_x, N_y) to E, with weight f.

(6) For every N' in V such that (N', N) is in E with weight f', if N'
    is in V_x, replace (N', N) by (N', N_x) with weight f', and if N'
    is in V_y, replace (N', N) by (N', N_y) with weight f'.

(7) Replace N in V by N_x and N_y.

(8) Choose a node N' in V with two or more processor nodes listed in
    it. If no such node exists, stop; else let N = N', and go to (2).
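Steps (1) through (8) can be sketched as follows. This Python rendering makes two illustrative assumptions: the assignment graph is given as a dict from unordered node pairs to edge weights, and, to keep the sketch short, the minimum cutset in step (3) is found by brute-force enumeration rather than by the max-flow algorithm the text prescribes (both yield a minimum cut, so the resulting tree is the same in value).

```python
from itertools import combinations

def min_st_cut(nodes, cap, s, t):
    """Brute-force minimum s-t cutset; cap maps frozenset({u, v}) -> weight.
    Fine for tiny graphs; the thesis uses max-flow min-cut here instead."""
    others = [v for v in nodes if v != s and v != t]
    best_w, best_side = None, None
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            side = {s, *extra}
            # an edge is cut iff exactly one endpoint lies in `side`
            w = sum(wt for e, wt in cap.items() if len(e & side) == 1)
            if best_w is None or w < best_w:
                best_w, best_side = w, side
    return best_w, best_side

def affinity_tree(processors, modules, edges):
    """Sketch of steps (1)-(8): build an n-node affinity tree.
    Tree nodes are frozensets of assignment-graph nodes."""
    procs = set(processors)
    V = {frozenset(procs | set(modules))}   # step (1): one node lists all
    E = {}                                  # tree edges: frozenset({Nx, Ny}) -> f
    while True:
        # step (8): find a tree node listing two or more processors
        N = next((x for x in V if len(x & procs) >= 2), None)
        if N is None:
            return V, E
        # step (2): condense every other tree node into a super-node
        others = [x for x in V if x != N]
        def rep(u):
            return next((x for x in others if u in x), u)
        gcap = {}
        for e, w in edges.items():
            u, v = tuple(e)
            a, b = rep(u), rep(v)
            if a != b:
                gcap[frozenset((a, b))] = gcap.get(frozenset((a, b)), 0) + w
        gnodes = list(N) + others
        # step (3): minimum cutset between two processors listed in N
        px, py = sorted(N & procs)[:2]
        f, side = min_st_cut(gnodes, gcap, px, py)
        # steps (4)-(7): split N along the cut and rewire tree edges
        Nx = frozenset(n for n in N if n in side)
        Ny = N - Nx
        newE = {}
        for e, w in E.items():
            if N in e:
                (Np,) = e - {N}
                target = Nx if Np in side else Ny
                newE[frozenset((Np, target))] = w
            else:
                newE[e] = w
        newE[frozenset((Nx, Ny))] = f
        V = (V - {N}) | {Nx, Ny}
        E = newE
```

On a two-processor chain P1 - A - B - P2 with edge weights 10, 2, 8, the single split cuts the cheapest edge (A, B), grouping A with P1 and B with P2.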
Fig. 16(b) is an affinity tree of the module assignment graph shown
in Fig. 16(a). Nodes labeled P are processor nodes and nodes labeled m
are module nodes. Some assignment graphs have two or more distinct
affinity trees.

Figure 16(a)

Figure 16(b)
Our heuristic algorithm to assign modules in an n-processor
distributed system uses the affinity tree as a basis. Basically our
reasoning is as follows. Each edge in the affinity tree represents a
minimum value cutset between two processor nodes in the n-processor
assignment graph. The set of cutsets associated with the edges of an
affinity tree partitions the nodes of the n-processor assignment graph
into n subsets, each containing exactly one processor node. By making
the interpretation that a module node A belonging to the same subset as
processor node P_i means that module A is assigned to processor P_i, we
see that an affinity tree uniquely defines an assignment. The partition
has been constructed by combining only minimum capacity cutsets between
processor nodes into a single n-cutset of the assignment graph.

If each of the minimum cutsets defined by the affinity tree were
disjoint (contained no edges belonging to any other cutset), the
resulting n-cutset would define an optimal assignment. Unfortunately,
these cutsets will be disjoint only in a few degenerate cases, and
commonly will share many edges with other cutsets defined in the
affinity tree. However, we will approximate a minimum value n-cutset by
the set of minimum cutsets from the affinity tree, under the supposition
that it is usually better to separate two processor nodes in the
n-processor assignment graph by a minimum cutset than by a cutset which
does not have minimum value. In some sense, the modules that belong to
the same node of the affinity tree as a processor P have an "affinity"
for that processor; hence the term affinity tree.
The algorithm for determining an assignment can be stated as two
simple steps.

Step 1
Generate an affinity tree from an n-processor assignment graph.

Step 2
Assign each module to the processor with which it is coresident in the
same node of the affinity tree.
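Step 2 is mechanical once the tree is built. A minimal sketch, assuming the affinity tree is given simply as a collection of node sets, each containing exactly one processor (the function name is ours):

```python
def assignment_from_tree(tree_nodes, processors):
    """Step 2 of the heuristic: assign each module to the processor it
    shares an affinity-tree node with.  tree_nodes: iterable of sets,
    each containing exactly one processor."""
    procs = set(processors)
    assign = {}
    for node in tree_nodes:
        (p,) = node & procs          # the single processor in this node
        for m in node - procs:       # every module coresident with it
            assign[m] = p
    return assign
```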
This algorithm greatly reduces the computation required to determine
an assignment in a distributed system. When it is used to determine
module assignments, the results may be sufficiently close to the optimal
results to be useful in a pragmatic sense. This algorithm uses a
maximum flow algorithm as a subroutine, so the complexity of the module
assignment is dependent upon the implementation of the maximum flow
algorithm used. Fortunately, the maximum flow algorithm is among the
class of algorithms with relatively low computational complexity. There
are various modifications of the algorithm available that take advantage
of special characteristics of the network to obtain increased
efficiency [18, 19, 20, 21]. The least complexity of the algorithm is
O(r * e), where r is the number of nodes and e is the number of edges in
a graph. Since there are r(r - 1) / 2 edges in a fully connected graph,
the value of e is at most r(r - 1) / 2 and the algorithm has complexity
O(r^3). If there are n processors and m modules in a system, only n-1
two-processor flows are run. Thus the complexity of our algorithm is
O(n(m+n)^3).
6.3 Experimental Results
To determine the effectiveness of the heuristic algorithm described
in the previous section, we tested it on 100 randomly generated
examples. In these examples we allowed both the number of processor
nodes and the number of module nodes to vary. For each example we
generated all possible module assignments in an exhaustive search for a
minimum cost assignment. As we examined each possible assignment we
calculated its cost, and we recorded the maximum and minimum cost as
well as the mean cost over all possible assignments for a given test
case. Table 3 in the Appendix lists these values for each example,
comparing the cost of the assignment generated by our heuristic
algorithm with that of the optimal assignment and the mean cost over all
possible assignments. In examples 11, 63, 65 and 77, we found several
different affinity trees for each. They indicate different heuristic
assignments. In three examples, 11, 63 and 77, the assignments
represented by the different affinity trees were all optimal. In
example 65, we have four affinity trees, but only one of them shows the
assignment with minimum cost, and we use it to calculate the deviations
listed in the table involving the "near optimal" assignments. The
percent deviations in Tables 2 and 3 are computed by taking the
difference between the two costs, dividing by the smaller of the two
costs, and then multiplying by 100.
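A direct transcription of this deviation measure (the function name is ours):

```python
def percent_deviation(cost_a, cost_b):
    """Percent deviation as defined in the text: the difference of the
    two costs, divided by the smaller of the two, times 100."""
    lo, hi = sorted((cost_a, cost_b))
    return (hi - lo) * 100.0 / lo
```

For example, a heuristic cost of 114 against an optimal cost of 100 gives a deviation of 14%.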
68
<A s 8» H CA es 08
<M g U EP EP a* IP fiP »? tP »? «M «M en o QO cS vo O 00 en
d P, c8 ON o O en 00 en en o o a • • • • • • • •rt •M en en 1 en 1 o CS en 00 rH «M P *M o ON NO r** o 00 CS CS «8 08 P rH rH rH
•H 4> O > P P <D S O
■o P B
a> O CA |x
CA CA
ïP * SP »? EP cP »? Èp EP EP
«8 CM 08 rH 'Cf CS NO 00 NO o 00 en U M N ON m O 00 00 en CS CS o a> c O • • • m • • • • • > 08 O en en en CS en Os o o r** 08 8> rH en 00 rH r* 00 en CS en es
B rx «S rH rH CS rH rH rH 08
CA 4> CA 08
«M U tP iP EP EP EP »? EP »? O rX
08 rX en O CS en o "Cf -f r* P G c8 n O I en I rH en en 00 rH O •** a • • • • • • • • • •H -M •M rH en 1 en 1 CS CS en 00 Tf "M P< H-» rH ON 00 rH 00 en CS «8 O P rH rH
•M O > B P o o o
T3 P
CM P
<u bo 0
CA CA
ïP * # dp EP EP EP Ip EP P
08 08 P t** CS rH ON 00 en rf rH «X <D «H rH en en rH 00 en en en NO ON O a O • • • • • • • • • • > rH 1^ en NO CS es o 00 rH c8 M en 00 rH r- 00 HT CS en en
rX «S rH rH CS rH rH rH P
M P 08 O 6 P
•M P CM +* O O P, îP EP iP Èp IP »? »? EP EP
O «M a P en rH O r** o NO O O »n ^f o a a vo rH O en o NO 00 ■S* o 00
•M O •H • • • • • • • • • • -M IX -M 00 o NO o Tf ON en o8 CM P
•H O P
0> rH o •O o8 P
a o ix P 60 H-» P EP * Êp P EP eP »? EP EP »? «8 P. P U o M l> rH o rH o o Hf xf ON 1> O H1 00 o Os o xf -Hf ON No O > n • • • • • ■ • • • • 08 08 r-X O O o rH o rH rH O *n TT
O i-X
P P Z
U <D 4> rX 43 H 6 O P O P G
ON NO
CA W
H o <u CA ,© CM CA S O o PS o a o
*4 P<
NO NO
es
Table
69
We find from Table 3 that for 73% of the examples the heuristic
algorithm gives an optimal module assignment. For the remaining
examples the deviation of the cost of the "near optimal" assignment (the
heuristic solution) from the cost of an optimal assignment is around
10%, with the maximum deviation being less than 14%. Even in the worst
case the cost of the heuristically determined assignment is much less
than the mean cost over all possible assignments.

In Table 2 we display the deviations for different classes based on
the number of processors and number of modules. We consider two cases:
one in which we include all the examples in the class and one in which
we include only those examples for which the heuristic solution was not
optimal. The average deviation of the approximate solution cost from
the optimal solution cost for all 100 examples is 1.67%. Although the
average deviation tends to increase as the number of module nodes and
the number of processor nodes in the assignment graph increases, it is
clear that, considering the increased number of possible assignments,
the heuristic algorithm is still an efficient means for finding "good"
assignments, i.e., assignments which have costs only slightly higher
than an optimal assignment.
CHAPTER VII
SUMMARY AND CONCLUSIONS
7.1 Summary of Results
We studied the program module assignment problem in distributed
systems. The graph model developed by Stone is useful in the two-
processor case because the optimal assignment can be efficiently found
by the application of a max-flow min-cut algorithm. Although the graph
model can easily be extended to the n>2 processor case, the n>3-
processor assignment problem is known to be NP-complete, and no
efficient algorithm for n>3 has been discovered.

We can often reduce the complexity of the n-processor assignment
problem by identifying modules that must be assigned to specific
processors under any optimal assignment. The Reduction Theorem which
proves this shows how to determine which modules, if any, will be
assigned to a particular processor P_i under any optimal assignment. We
construct a P_i-reduction graph and apply a max-flow min-cut algorithm
to it to find a P_i-reduction cutset which partitions the nodes of the
reduction graph into two subsets. Any module belonging to the same
subset as P_i is assigned to P_i by any optimal n-processor assignment.
This often allows
substantial reductions in the number of modules whose assignments in an
optimal n-processor assignment cannot be fixed, although the
effectiveness of this procedure decreases as the number of modules
and/or processors increases. This was demonstrated experimentally using
a number of randomly generated assignment graphs for several
combinations of numbers of modules and processors.

Even after applying these P-reductions, we are still left with the
problem of finding an optimal assignment based on the reduced assignment
graph. The heuristic algorithm that we used to find "good" solutions is
based on a structure called the affinity tree that is related to the G-H
tree, a flow-equivalent transformation of a flow network. We construct
an affinity tree and use it to identify a partition of the n-processor
assignment graph that represents (hopefully) a near-optimal module
assignment. The heuristic algorithm yielded very good results in a
variety of test cases based on randomly generated graphs. In a
pragmatic sense the resulting assignments are almost always optimal, and
when the resulting assignment is not optimal, its cost is usually very
close to that of the optimal assignment, especially when compared to the
average cost of all possible assignments. The time complexity of this
algorithm is bounded by O(n(m+n)^3), where n is the number of processors
and m is the number of modules in the system or in the reduced system.
7.2 Suggestions for Future Research
Since the heuristic algorithm is not guaranteed to yield an optimal
result, we can use the solution it produces as a starting point for an
iterative search method such as the one described by Bryant and
Agre [12]. Suppose we have n processors and m modules in a system. In
the search algorithm they employed, they pick an order in which the
module nodes are to be considered. They select an initial assignment,
compute its cost, and then perturb it to see if this cost can be
reduced. They consider assigning the first module (call it M_1) to all
processors other than the one on which it is currently located, while
holding all other module assignments fixed. If any of these alternative
assignments of M_1 leads to a lower cost assignment, we move M_1 to the
processor which minimizes the assignment cost. Then we repeat the
process using the second module M_2. After finishing with the mth
module, we continue with M_1. When we reach a point such that moving
the module under consideration to a processor other than the one to
which it is assigned at the start of the iteration does not lead to a
lower total cost, we use the current assignment as our best guess for an
optimal assignment. Note that for each module we must examine n-1
different assignments, and the time required for one pass through all m
modules is therefore O(mn). By making an informed choice of initial
assignment using our heuristic assignment algorithm, we might expect
fewer iterations to be required, and we might also hope that by starting
"closer" to a globally optimum solution, there is less chance the
algorithm will stop on finding a locally optimum solution (relative to
the assignment of the module under consideration when the algorithm
terminated).
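The iterative search just described can be sketched as follows. The module ordering, tie-breaking, and termination details are assumptions where the text leaves them open; `cost` stands for whatever assignment cost function is built from the execution and communication costs.

```python
def improve(assign, n_procs, cost):
    """Iterative improvement after Bryant and Agre [12] (sketch).
    assign: list where assign[i] is the processor of module i.
    cost: callable scoring a complete assignment.  Repeatedly tries
    moving each module to every other processor, keeping any move that
    strictly lowers the cost, until a full pass yields no improvement."""
    assign = list(assign)
    best = cost(assign)
    improved = True
    while improved:
        improved = False
        for i in range(len(assign)):        # consider module M_i
            here = assign[i]
            for p in range(n_procs):        # every other processor
                if p == here:
                    continue
                assign[i] = p
                c = cost(assign)
                if c < best:
                    best, here = c, p
                    improved = True
            assign[i] = here                # keep the best position found
    return assign, best
```

With a cost function whose per-module terms are independent, the search reaches the global optimum from any starting point.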
This thesis has considered the problem of determining an optimal
assignment for a program in which each module is constrained to be
assigned to a single processor for the entire program execution;
reassignment of a module after program execution begins is not
permitted. The cost functions used to construct an assignment graph are
only the aggregate module execution costs and intermodule communication
costs. If we allow modules to be reassigned during program execution,
the cost of reassignment will influence the cost of the optimal dynamic
assignment. Dynamic module assignment in a distributed system
(especially an n>2-processor distributed system) is an open problem for
future research. The model we used for the static case can be extended
to the dynamic case. The cost function would then include the module
reassignment cost and the module residence cost. The latter is incurred
even when the module is inactive; it is a function of the processor on
which the previous instance of that module was executed or the next
instance is to be executed. The former is incurred when a module,
possibly an inactive one, is reassigned during program execution.
If we have a G-H tree generated from a module assignment graph, some
nodes in the G-H tree represent processors and some represent modules.
Because of the locality property of the tree, we can derive some
heuristic knowledge about assigning module nodes to processor nodes.

First, if a module node is attached only to a processor node, then
the module should be assigned to that processor. For example, in Fig.
17(a), we can assign module nodes C and D to processor node P_j.
In addition, if one module node is attached only to another module
node, we can condense the two nodes into one node. For example, we can
condense node F into node E and assign F to whatever E is assigned to.
In this way the assignment problem can be simplified. The pruned tree
is shown in Fig. 17(b).

For module nodes attached to several processor nodes, there are
several choices for the assignment. Each is formed by breaking a group
of edges in the G-H tree. We use the "+" operation from Boolean algebra
to express choosing this pattern "or" choosing that pattern. The
combination of corresponding edges broken to form patterns is based on
this principle: if a group of module nodes is attached to n processor
nodes (in other words, there are n parallel branches in the subgraph),
then n-1 branches have to be broken in one pattern. We use the "*"
operation from Boolean algebra to express breaking this edge "and"
breaking that edge. There may be several serial edges in one branch; if
any one of them is broken, the branch will be broken.
Figure 17(a)

Figure 17(b)
Thus for the subtree in Fig. 17(b), to assign module nodes A and B
we can choose links to break as follows:

a*(c + d) + b*(c + d) + a*b

which expands to

a*c + a*d + b*c + b*d + a*b

where a*c corresponds to assigning modules A and B to two different
processors, and a*d corresponds to assigning both module A and module B
to the same processor.
In this way we might expect to greatly reduce the amount of
searching required to find the desired assignment. This reduction is
due to the fact that the principle used to form the patterns has
eliminated many of the original n^m patterns from consideration. We
have a limited number of patterns to choose from instead of n^m.
Recall that for a system with n processors and m modules there may exist
n^m assignments.
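The "+" and "*" manipulations above can be mechanized by representing each break pattern as a set of broken edges: "*" combines one pattern from each factor, and "+" collects alternative patterns. A small sketch, using the edge names a, b, c, d from the expression above:

```python
from itertools import product

def times(*alternatives):
    """The "*" operation: break the edges of one pattern from each
    factor simultaneously.  Each argument is a set of candidate
    patterns, a pattern being a frozenset of broken edges."""
    return {frozenset().union(*combo) for combo in product(*alternatives)}

def plus(*pattern_sets):
    """The "+" operation: choose one pattern or another."""
    return set().union(*pattern_sets)

# the expression a*(c + d) + b*(c + d) + a*b for the subtree of Fig. 17(b)
a, b, c, d = ({frozenset('a')}, {frozenset('b')},
              {frozenset('c')}, {frozenset('d')})
patterns = plus(times(a, plus(c, d)), times(b, plus(c, d)), times(a, b))
```

Expanding the expression this way yields exactly the five break patterns listed above, far fewer than the n^m possible assignments.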
As the use of computer networks matures, these and similar problems
dealing with the distribution of a program's execution to take advantage
of the various resources available in the network will become
increasingly important. The results contained in this thesis will help
provide a basis for further research in this area as interest in it
continues to grow.
REFERENCES
[1] 3.S. Stone» "Multiprocessor Scheduling with the Aid of Network Flow
Algorithms." IEEE Trans» Software Engineering SE-3(1), pp.85-93
(January 1977).
[2] H.S. Stone and S.H. Bokhari. "Control of Distributed Processes."
Computer 11(7). pp.97-106 (July 1978).
[3J R.E. Gomory and T.C. Hu, "Multi-Terminal Network Flows," Journal of
the Society of Indnstrial Applied Mathematics 9(4), pp.399-404
(1961).
[4] H.S. Stone, "Critical Load Factors in Two-Processor Distributed
Systems," IEEE Trans. Software Engineering SB-4(3), pp.254-258 (May
1978).
[5] J.B. Sinclair, "Critical Delays for Optimal Assignments in Broad¬
cast Networks," TR No. 8210, Department of Electrical Engineering,
Rice University, Houston, TX (August 1982).
[6] D. Gusfield, "Parametric Combinatorial Computing and a Problem of
Program Module Distribution," Journal of the ACM 30(3), pp.551-563
(July 1983).
77
78
[7] G.S. Rao, H.S. Stone, and T.C. Hn, "Assignment of Tasks in a Dis¬
tributed Processor System with Limited Memory." IEEE Trans. Comput¬
ers C-2S(4), pp.291-299 (April 1979).
[8] T.A. Gonsalves, "Heuristic Algorithms for Distributed Processor
Scheduling with Limited Memory," Master's Thesis, Department of
Electrical Engineering, Rice University, Houston, Texas (June
1978).
[9] J.B. Sinclair, "Optimal Assignments in Broadcast Networks with
Transmission Delays," TR No. 8205, Department of Electrical
Engineering, Rice University, Houston, TX (July 1982).
[10] J.B. Sinclair, "Dynamic Assignment in Distributed Processing Sys¬
tems," PhD Thesis, Department of Electrical Engineering, Rice
University, Houston, Texas (August 1978).
[11] S.H. Bokhari, "A Shortest Tree Algorithm for Optimal Assignments
across Space and Time in a Distributed Processor System," IEEE
Trans. Software Engineering SE-7(6), pp.583-589 (November 1981).
[12] R.M. Bryant and J.R. Agre, "A Queueing Network Approach to the
Module Allocation Problem in Distributed Systems," Performance
Evaluation Review 10(3), pp.181-204 (Fall 1981).
[13] W.W. Chu, L.J. Holloway, M.T. Lan, and K. Efe, "Task Allocation in
Distributed Data Processing," Computer 13(11), pp.57-69 (November
1980).
79
[14] V.B. Gy lys and J.A. Edwards, "Optimal Partitioning of Workload for
Distributed Systems," Direst of Papers. COMPCON Fall 76. pp.353-357
(September 1976).
[15] T.C.K. Chon and J.A. Abraham, "Load Balancing in Distributed Sys¬
tems," IEEE Transactions on Software Engineering SB-8(4), pp.401-
412 (July 1982).
[16] L.R. Ford, Jr. and D.R. Fulkerson, "Maximal Flow through a Net¬
work," Canadian Journal of Mathematics 8(3), pp.399-404 (1956).
[17] T. C. Hu, Integer Programming and Network Flows. AddisonrWesley,
Reading, Mass. (1970).
[18] L.R. Ford, Jr. and D.R. Fulkerson, Flows in Networks. Princeton .
University Press (1962).
[19] J. Edmonds and R.M. Karp, "Theoretical Improvements in Algorithm
Efficiency for Network Flow Problems," JACM 19(2), pp.248-264
(1972).
[20] S. Even, "The Max Flow Algorithm of Dinic and Karzanov: An Exposi¬
tion," Report No. MIT/LCS/TM-80, Laboratory for Computer Science,
Massachusetts Institute of Technology, Cambridge, Mass. (December
1976).
[21] V.M. Malhotra, M.P. Kumar, and S.N. Maheshwari, "An O(lvl^) Algo¬
rithm for Finding Maximum Flows in Networks," Information Process¬
ing Letters 7(6), pp.277-278 (October 1978).
APPENDIX
f re
qu
eu
cy
31
I
Si n n VOt^COCOOOOOiHOi-l CO ri
m CO
SÊ 8? o en h o <n h o co vo o co vo O CO NO O CO VP o co VO O co vo O CO VO O CO VO O CO VO D CO VO
&&&&&&& O CO h O CO Is o O CO vo O CO VO o O CO vo O CO vo O O ro vo o co vo O O CO VO o CO VO o O to VO O CO vo o
44 p« os u co a>
44 <4-» • « n o o *o
«M O d
d o a> •^4 *H ■M d o *o d o •d 6 o u cs
r-l <D N *d
d (0 03 44 w CU <0 o3 *0 u o «o d 4) U
44 O <M <0 0) <H 4> O o
O
. H
isto
gra
m
wit
h
3 pr
a> d N O
00 rH
•H 4> «0 4-» U
o d 44 P 00
Pi T3 fltf 4) P P oo
o oo vo *n co H N co
o oo vo m co H O m in vo h oo o\ o
000000%: >>>>>>>>>>
> rrrn-rn-rrr)
>>'>))
:r/7
■/ ■/
) LIA
82
o S 4>
OcnOOOOOOOOOOOOOtHO H Cï
o d N O
fOvoo>(S^oOH^,voa\N»nooH^,r^O o •nOinHvoHhNhMoowoo't^ *3* © *d d WhO^hH^OOH^Wciin^N'OO P« *0 cj^f,o\H^'o«Htnm«ocnohO d d ÛOVO^NHOShVi^MOÛOt'^WHO U U OOh\Oin^«HOO\OOh»0^ffJMHO 00
(0 •d Q< «3 P 00
<D ,d 4-» •
CO n 0 0 *o <w O
d d 0 0 •T4 4-» d O ’O d 0 •d a <D U r-
T*i 0> N *0 •*4 d « d
•d CA a d d "O *4 c 00 d
4> M A 0 +■» CO
CA (M 4> O O
O a U d p< (4 00 rr> O •M .d (A •M •H •*4 en *
• o> rH
O M P 00 •H fci
OmHhmOymHhNM^OVONOO^O HHcs«w^t«n»n\ohhOOCoo\o
000000%
: 7
5
* > n
imi
) i r
rrrr
r / n
11m
) -J
21
rn
rrn
r/v
v v i
545455%
: 9
2Z
ZZ
3
83
Q 3 Q>
A
s s
OOOOOOOOOOOOOOOOOOOr-l«n
V tf N O
o\^oO(r>r*NvoH^ov)OMfooror^M'€H<no o OVOHhC<OOcOO>^OinO'OHhC<OOtoO\^0 ,3 3 ONcnoodr-rHvoo*no^to\<nooc^r*^HVoo«no 0**0 O^HhMCO(nO\^D«ftO'OHh«eOcnONtO It 0> 0\ffJ»c<>H'00^0^0\WOOMhHNOOlOO H H O'OHhM^ïO^^OlOOVOrthMOOWONVO U>
O 0\ W w (S hHVOO^O^^rOOONhHVCOViO
Fig
ure
20.
His
tog
ram
of
the
gra
ph
size
red
ucti
on fo
r th
e
gra
phs
wit
h
3 pro
cess
or
nodes
and
22
module
nodes.
00
00
00
%:
g g
rrrr
rrr n
ni
/ trm
n ) u
n / n
i) n
// u
n 7:
2:11
84
>> O S Q>
D \± ^ C40000000000000000000000000
r-4 O N
'^CQ<S»n0\fnt^rH»n00<SV0O^t*00cS«O0>C0r^rHV^00cS'OO «nO'oHvo«shwwwoN^O‘no'OH'ô^htnoo(o^,i,o r=< Ht0,tVCh0\OM««nV0C0OHW^VOh0\O<S(0V»'O»O P« sOCSOO^OVOfOONtnHt^CO D\ÛMM^O'Orr>0\>nHMnO «* ^O^ffi»(nhM'OH^O«nO^O\MOOcOhNVOH\OOIOO V* OOVOiOfO(SOO\h^^fOHOûOVÛinW(SOO\h'Û^MHO «0
Otohrl^^WVOO^OO^OOWhHlftChcnVOO^OONVûO HHHMM(nco«n^^io«n«ovo\ovorHr%oooocoo\^o
red
ucti
on
Fig
ure
21
. H
isto
gra
m
of
the
grap
h
size
red
uct
ion
for
the
gra
ph
s w
ith
3 p
roce
ssor
no
des
and
26
mod
ule
no
des
.
ZZ
ZZ
Z
85
o a
«n vo vo o\ v© oo H H H H (S
O P C4 O
# ï£ ^ ^ # M«M O O O O O O O OOOOOO *P P O O O O O O Ck *o oooooo id a> OOOOOO WiM OOOOOO «o
10 •P o« «s 60
a> WP
t4 to O 4>
<w *P O
a q O
a> 44 »-H O p P «o
•O O P a 14
tn <u N *C
•H p to «
TCJ P P< <o P *o H O 60 p
O H «P O +4 V)
to <H O O o
O B U P & Wi 60 ^ O 'M ,p (O 44
•H «H w *
cs CS
O w P 00
•H E*
OOOOOO cl S’ vo 00 o
fre
qu
en
cy
86
M HH TH
© d N O
* * *H *rt © -M
OOOOOOOOO O ooooooooo «dd ooooooooo pu *a ooooooooo © © OOOOOOOOO U U omomoooioo *o
rd CU © H 60 © •d -M
• ©
o © o
d d o
•H © 4-» »—» © d d
o © 6 m
60 © N •Ü
•rt d W ©
*d M cu © © *a u o 60 d © u
wa © -M ©
© <W © O o
o 8 © Q* oo o
td W 4-»
•f* •*4 a *
• cn
© d 60 to
ON^hO^inhO H<sco«n\ot^ooo
frequency
87
•ft rt H Q> A N O
• • •• •• •• •• •• •• •• •• •• •• •« *H *H w -v*
oo\»h\o»ft«ft^(nNHO o OOHN«^«ft'0h»0\O »d P O0\00hVôio^wNHOO 04*0 OOrlNW^Mft'O [> OO 0> O «S 4> O0\»hV3«ft^«NHOO U U OOHN(ft^<ft'0h°°0\ O OO
OftWhVOlft'tWP^HOO HCSfft^'ft'Of-OOONO
(0 ,P P< « u 00
o
4-» • 0)
m <o o *o <W o
P P o © •H «M d O P O •o a <D Vi iH
rH <D N *p •H d (A oa
•P (A & O o *o Vi o 00 d
© VI o
«M (A (A ©
O O o
E Vi P OI U 00 <** o *M *P «0 4-> •H •ft u *
cs
o
p 00 •rt
IX
freq
uen
cy
88
B XL VOÛOmOOOOOOOOOOOOOfH
£ B* Eê
O O O O O O O O O O O O O O O O O O O O O «n O «n O OM«OhO
* # 8? ^ O O O O O O O O O O O O O O O O «n © m © N irt t' O
ï£ B* & & O O O O O O O O O O O O D O O O «n © tn © M »n ©
* B* o © © O © © o © © © © © © © © © «n © m © n «o ©
U U 00
V) Pk « U
©
W O
£ O
O Pt o n © N
rfl eu © M 00 ©
rP
6 od M
a
» <n
© £ O
•rt *rt © <0 -M M
o P
fCt P* aO •rt
eu *o « 4)
(*
O'ONOOVJHt^tnO'OdOOlOHh-fOO ^Hr-f<SfOfo^«n*nvo'or^oooooN©
wit
h
4 p
roce
sso
r n
od
es
and
16
mod
ule
nod
es.
000000%
:
)>?'
) T ri
im
nann
rrm
i
89
>1 o rt 4>
H - - mOOOOOOOOOOOOOOOOOOO *H
<o C N O
»AO^,3S,t^tn00w00(ShNhH'O.H\0O«nO o OHHHfqMWW't^^inVOVOhhOOOO^ONO P ^OOhVOV)^(ONrtOOyW^VO^Tf<nc<HOO 04*0 HW«OhO\H(n'nhO'ON^'O00ON^\O00O «s o vor^wtohm^^HOO^ovoNOsmHhwo Vf MftC<0««nwO»\0(nHO\>O^HO\h^MO 60
O^0\^0Nf0WWC0«hMhH'OH'OO‘OO«no rlHf<NWW,t^»n|ft'OVO^hOOOOO>OsO
Fig
ure
26.
His
togra
m
of
the
gra
ph
size
reducti
on fo
r th
e
gra
phs
wit
h
4 p
rocess
or
no
des
and
21
mo
du
le
nodes.
000000%
: <
n\U
T'r
TT
‘> ) )}
rrrrrr
) i )
) i}
} ) )
) ) ) )
) ) 11}
) i n
) ) )
i
90
a o (3 N O
OOOOOOOOOOOOOOOOOOOOOOOOO O OOOOOOOOOOOOOOOOOOOOOOOOO ,3 P OOOOOOOOOOOOOOOOOOOOOOOOO 0**0 OOOOOOOOOOOOOOOOOOOOOOOOO c«4> OOOOOOOOOOOOOOOOOOOOOOOOO * H OOOOOOOOOOOOOOOOOOOOOOOOO t*
O^CO^'OOl'OO^VOO'tWlSveO^OOMVOO'tM^VOO HrHcsc^c^cncn^^f^rin^nvovovpr^r^oooooooNO^o
Fig
ure
27
, H
isto
gra
m
of
the
grap
h
size
red
uct
ion
for
the
gra
ph
s w
ith
4 p
roce
ssor
nod
es
and
25
mod
ule
nod
es.
freq
uen
cy
91
HMW HH <D d N O
• • •• •• •• •• *H *v4
# ^ « <M O O O O O O o o o o o 44 d O O O O O P< *o O O O O O d 4> O O O O O H O O O O O t*
CO
Ql (0 u 00
o
n M o <D
«H *o o
d d o
4> w ^Hl o d d •o
*o o « 6 u 4> N *o
•H d (A «8
44 CO p< o d *d H o 00 d
o * 44 o •4-* CO
CO «M o O o
o 6 u d p4 * oo •n o 'M 44 <0
•H •H •H *H » *
•
00 cs *> m d 00
•H C*
o «n o m o CJ «n f* O
frequency
92
H (H H H o (3 N O
<« +J O^^ffjhHVoO U OHd^VihOOo «40 Oh^rtMinno ou «o omn^csoo^o «j « owhm^(SrtO i4M ON«OOOrt tbO 00
O 00 (SJ h H VJ O
r( «Tf «OhOO O
«0 ed Pi a u OÛ
O
«w •
w. CO O A> «H
O ci d O
d> •M *-* O d
•a *T3 o <P 6 t*i
O N *d •(H d «0 ad
*P CO Pi o ad *d U O 00 G
4> U O
«M CO CA
<W 4) O O
O 6 U cd Pi t* oo «n o 4-» •d O» 4-» •<H •H W *
ON cs
<0 U O 00 •H cx<
frequency
93
CO
44 o« a u eo
o 44 •M •
CO
U 0) O *o
o G
o <D •rH 4-» P O •o P o
*o 6 <D W o
rH Q> N *o
*ri P CO P
44 CO
P< o OS n O 00 a <u u
43 o •M <0
<0 «
O o O
£ U P Q* tH oo m o
+*» 44 CO 4-»
•*■4 •*4 *
ooooooooooo ooooooooooo ooooooooooo ooooooooooo ooooooooooo ooooooo oooo
fl> P N O
o cn
•H *H Q> CO (H
o P 44 P 00 «f4 cu*a CSc
OS o »-i m 00
ooooooooooo HNto^'nvohWoo
I}}
n JI
} ) /1
) //1
u / )}
t}>
j u}
I n n
}}
) i nr/'j.l
ZL
:%O
OO
OO
O
94
>% o a 4>
CA
A ÇU 04 U 60
4> A 44 »
M U 4> O TJ
«H O G
G O 4>
+4 0 O »Q G O
TJ 6 o * «n
4> N *o
G <0 OS
Qt <U «4 TJ M O oo a
n
4) U A O 44 (fl
(A <H 4> O o
o 6 it a eu u to m o 44 (A 44
w *
iH'AOfHOOOOOOO d
h«Oh«OhwOh<n VOCOONOCOONOCOOVOCO
©COOV©CO©V©CO©VOCO VOCOOVOPOOVOCO D \o w ©coovocoovocoovoco vo co o vo ro © v© co © v© co
O O O rH
4> 0 iH
N O CO
•rt "H 4> eÊ e* ^ s* CA 44 U O t*'* CO o © P
O WD CO O ^ P 60
O V© en O eu *© bu O CO o 04 <D © V© co O *4 fc
O © CO o 60
ovofoovocoovocoovocoovoeoo HNMW'tt^VOVOhOOOO^O
\ ))>))>
)) rrr/ rrrm
11 ) n
? n'> i > > n
) n i n
TT
7\ T
6 :%
00
000
0
95
o P 4)
OOlHOOOOOOOOOOOOOOOOOO
oooooooooooo oooooooooooo oooooooooooo OOOOOOOOOO DO OOOOOOOOOOOO oooooooooooo
q> P H O
• • •• •• •• •• «H «*«1 «A+»
oooooooo o oooooooo Æ 0 oooooooo o* *a OOOOOOOO ai 4> OOOOOOOO u u OOOOOOOO &Û
0«n©*oo*o0tn©ino«o©*o©tn©«no«no HH«cE<n«^r^viin\oyohh00oo^ao
Fig
ura
32
. H
isto
gra
m
of
the
grap
h
size
red
uct
ion fo
r
the
gra
ph
s w
ith
5 p
roce
sso
r n
od
es
and
20
mod
ule
no
des
.
000000%
: 1
nn
/ 7
77
77
)>
>))
)) n
I )
)})')
i i
i }
11
i )
1} }
Un
iLU
.ll
96
o a a>
oooooooooooooooooooooooo a> £ M O
•• •• •• •• •• •• «H
r*coo^(nor*coo^foor^(not^«ots'foo^coo u vocoovofoONocnovooiovocnOVoroovomovocno & 3 vocoovûeoovofoovocnovocnovocoovocoovocoo p« *a ^pf00vocv)ovo<no>0700\ofnovotoovofoovo(no «s « vocoovocnovocoo^ocnovofoovoroovocnovomo u u Hn'n^°OoH(nin'ooooH(n«n\oWoH«n<n'oWo w>
o^«c<voo«o\whHV)0^ooM^o movcor^iHioo rHrHnc<C4fOCO^,^«OV>«OVO'Ol^t^r^OOOOONO\0
Fig
ure
33.
His
tog
ram
of
the
gra
ph
size
reducti
on fo
r th
e
gra
ph
s w
ith
5 p
rocess
or
nodes
and
24
mo
du
le
nodes.
m 11 n
rnjj.n
.in n
ut m
"nrn
, i I.LU
i
97
>% 0
p © P
01
Vi
s
\
\ s s
CO 44 a* «1 n 00
o 44 4-»
• V* <0 o o
*o o
P P o •H © •M H Q P P ’O
o © a Vi
NO o N *o
•M P CO at
44 CO 04 © aJ •O $4 O 00 G
O V« 44 O 44 CO
(A CM V o o
o s Vi C3 OI Vi 00 N© O 44 44 CO 44
•H •M « »
o\H^t o d N Cï H
© p CO
N O © •M >M Vi
# eR ï£ # CO 44 P 00
*9*4 OMnOhW o o o NO co o NO co o A p Ci« o vo co o vo co o p* *o o vo CO o vo CO o at © O VO CO O NO CO o Vi Vi O VO CO O VO CO o 00
O NO €0 O VO CO O H <n »0 VO OO o
f re
qu
en
cy
[Figure: histogram of the graph size reduction (caption not recoverable from scan); vertical axis: frequency.]
[Figure: histogram of the graph size reduction (caption not recoverable from scan); vertical axis: frequency.]
[Figure: histogram of the graph size reduction (caption not recoverable from scan).]
Table 3(a). [Table data not recoverable from scan; surviving column headers indicate the table lists, per test case (No.), the number of processors, the number of modules, and assignment-cost results for the heuristic.]
Table 3(b). [Table data not recoverable from scan; surviving column headers include No., number of processors, number of modules, and "near optimal" assignment results.]
Table 3(c). [Table data not recoverable from scan; same column layout as Tables 3(a) and 3(b).]
Table 3(d). [Table data not recoverable from scan; same column layout as Tables 3(a) and 3(b).]