J Heuristics (2008) 14: 69–93
DOI 10.1007/s10732-007-9027-1

The prize-collecting generalized minimum spanning tree problem

Bruce Golden · S. Raghavan · Daliborka Stanojević

Received: 31 December 2005 / Revised: 12 October 2006 / Accepted: 16 October 2006 / Published online: 15 May 2007
© Springer Science+Business Media, LLC 2007

Abstract We introduce the prize-collecting generalized minimum spanning tree problem. In this problem a network of node clusters needs to be connected via a tree architecture using exactly one node per cluster. Nodes in each cluster compete by offering a payment for selection. This problem is NP-hard, and we describe several heuristic strategies, including local search and a genetic algorithm. Further, we present a simple and computationally efficient branch-and-cut algorithm. Our computational study indicates that our branch-and-cut algorithm finds optimal solutions for networks with up to 200 nodes within two hours of CPU time, while the heuristic search procedures rapidly find near-optimal solutions for all of the test instances.

Keywords Networks · Heuristics · Local search · Genetic algorithms · Branch-and-cut

In the prize-collecting generalized minimum spanning tree (PCGMST) problem, which arises in the design of regional telecommunications networks, a set of regions needs to be connected by a minimum cost tree structure and, for that purpose, one gateway site needs to be selected out of a set of candidate sites from each region. The competing sites in each region offer a monetary compensation, or a “prize,” if selected as the gateway node for their region. The objective is to minimize the total cost of links used to connect the regions offset by the total sum of prizes collected from gateway sites selected for the design.

Examples of providing a monetary compensation for selection into a telecommunication network arise in many real-world contexts. For example, in the design of undersea cable networks connecting different continents, not all countries or cities en route can be directly connected to the undersea cable network. This is due to the very significant cost of connecting a location to a deep sea cable network. Consequently, planners of these undersea cable networks usually designate that one location will be selected from each of a specified set of regions that the network traverses. Given the significant economic benefits of being directly connected to a transcontinental fiber-optic network, it is not uncommon for cities or countries to vie against each other, offering monetary incentives, for selection as a location on this network. These monetary incentives are usually in the form of tax credits and rebates to the builder or operator of the telecommunications network.

B. Golden · S. Raghavan (✉) · D. Stanojević
The Robert H. Smith School of Business, University of Maryland, College Park, MD 20742-1815, USA
e-mail: [email protected]

In mathematical terms we are given an undirected graph $G = (V, E)$, with node set $V$ and edge set $E$, with a cost vector $c \in \mathbb{R}^{|E|}_{+}$ defined on the set of edges $E$, and a prize vector $p \in \mathbb{R}^{|V|}_{+}$ defined on the set of nodes $V$. We are also given a partition of the node set $V_1, \ldots, V_K$ (i.e., $V_i \cap V_j = \emptyset$ if $i \neq j$, and $\bigcup_{k=1}^{K} V_k = V$). We need to find a minimum cost tree spanning exactly one node from each set in the partition, where the cost of the tree is defined as the total cost of the edges used for the tree minus the sum of the prizes corresponding to the nodes selected for the tree.
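To make the objective concrete, the following minimal Python sketch evaluates the cost of a candidate tree as the sum of its edge costs minus the prizes of the nodes it spans, after checking that it spans exactly one node per cluster. The function name, the argument names, and the data layout (edge costs keyed by `frozenset` pairs, clusters numbered 0 to K − 1, K ≥ 2) are illustrative assumptions, not notation from the paper.

```python
def pcgmst_cost(tree_edges, edge_cost, prize, cluster_of, num_clusters):
    """Objective value of a candidate generalized spanning tree.

    tree_edges  : list of (u, v) pairs (hypothetical layout)
    edge_cost   : dict mapping frozenset({u, v}) -> nonnegative cost
    prize       : list, prize[v] is the prize of node v
    cluster_of  : list, cluster_of[v] is the cluster index of node v
    Assumes num_clusters >= 2.
    """
    nodes = {v for edge in tree_edges for v in edge}
    if len(tree_edges) != num_clusters - 1:
        raise ValueError("a tree on K clusters needs exactly K - 1 edges")
    if sorted(cluster_of[v] for v in nodes) != list(range(num_clusters)):
        raise ValueError("exactly one node per cluster must be spanned")

    # With K nodes and K - 1 edges, the edges form a tree iff they are acyclic.
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            raise ValueError("edges contain a cycle")
        parent[ru] = rv

    edge_total = sum(edge_cost[frozenset((u, v))] for u, v in tree_edges)
    prize_total = sum(prize[v] for v in nodes)
    return edge_total - prize_total
```

For example, on a hypothetical six-node instance with three clusters of two nodes each, a tree using edges (1, 2) and (2, 4) with costs 6 and 2, and prize 4 at node 1, evaluates to 6 + 2 − 4 = 4.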

When all the prizes are equal to zero (or equivalently are exactly equal to each other within a node set in the partition), this problem corresponds to the generalized minimum spanning tree (GMST) problem, which has been studied recently by several groups of researchers (see Feremans et al. 2004; Golden et al. 2005; Pop 2004). Since the GMST problem is NP-hard, it follows (by restriction) that the PCGMST problem is also NP-hard.

In this paper we present several polynomial-time heuristics for the PCGMST problem, and discuss two improvement strategies that significantly enhance the performance of these algorithms. We also adapt two heuristic search procedures (local search and a genetic algorithm) that we designed for the GMST problem (see Golden et al. 2005) to the PCGMST problem. These heuristic search procedures can be used to obtain high-quality solutions in large networks.

We also present a simple and computationally efficient branch-and-cut solution procedure for this problem. This procedure is very easy to implement and utilizes simple depth-first search at the integer incumbent nodes of the branch-and-bound tree to identify violated cuts. We compare its performance with two different variations of an exact procedure proposed by Pop (2004) for the GMST problem, and show that our algorithm provides better bounds for the problem. We show on a large set of instances that this procedure can be used to find optimal solutions in networks with random edge costs with up to 200 nodes and up to 40 clusters (for the rest of this paper we will refer to a node set in the partition as a cluster) within two hours of CPU time. On networks with Euclidean edge costs this procedure can be used to find optimal solutions in networks with up to 125 nodes within two hours of CPU time.

Our computational testing indicates that this branch-and-cut algorithm is sensitive to the relative values of edge costs and node prizes. When the node prizes have a smaller contribution to the objective function (compared to the contribution of edge costs), our branch-and-cut procedure finds optimal solutions in 130 out of 169 test instances within a two hour CPU time limit. On the other hand, when the node prizes have a higher contribution to the objective function, our branch-and-cut algorithm finds the optimal solutions in 166 out of 169 test instances within a two hour time limit. The performance of the heuristic search procedures is quite remarkable. Specifically, both local search and the genetic algorithm find the optimal solution in all of the 296 test instances where the optimal solution is known (from the branch-and-cut procedure)!

The rest of this paper is organized as follows. In Sect. 1 we discuss related work on the GMST problem. In Sect. 2, we discuss heuristic strategies for the PCGMST problem. We propose a lower bounding procedure and several polynomial-time heuristics for the PCGMST problem. We show that tailored repetitions of these heuristics provide a significant improvement in the quality of their solutions. In Sect. 2, we also adapt two heuristic procedures that we previously developed (local search and a genetic algorithm) to the PCGMST problem. In Sect. 3, we review the mathematical formulation proposed by Pop (2004) for the GMST problem and discuss some important properties of this formulation that are relevant to our branch-and-cut algorithm. Section 4 discusses two exact solution procedures based on the mathematical formulation presented in Sect. 3. First, Sect. 4.1 explains the rooting procedure proposed in Pop (2004), and then compares the performance of two different variations of the rooting procedure. Next, in Sect. 4.2, we propose a new branch-and-cut algorithm for the PCGMST problem. We discuss specific choices that need to be made in this algorithm and computationally compare the performance of two different versions of this procedure. In Sect. 5, we compare our two heuristic search procedures with the branch-and-cut algorithm. Finally, in Sect. 6, we provide concluding remarks.

1 Literature review

To our knowledge, we are the first researchers to consider the prize collecting variant of the GMST problem. We will discuss some of the past work done on the GMST problem.

Several variants of the GMST problem have been studied in the literature. The version studied in this paper was introduced by Myung et al. (1995), who have shown that the GMST problem is NP-hard. Myung et al. (1995) have also developed a dual-ascent based branch-and-bound procedure that was used to solve problems in networks with up to 100 nodes and 4,500 edges.

Feremans (2001) and Feremans et al. (2002, 2004) presented several different formulations for the GMST problem and proposed a specialized branch-and-cut algorithm. In the computational study performed in Feremans (2001), this algorithm was used to find optimal solutions for the GMST problem in networks with random edge costs with up to 200 nodes. The same procedure provided optimal solutions for the GMST problem in the networks with edge costs satisfying the triangle inequality and with up to 160 nodes. Pop (2004) proposed a new mathematical formulation for the GMST problem and used it to develop a new exact procedure that can be viewed as a special form of delayed row and column generation. We will discuss this formulation and exact solution procedure in Sects. 3 and 4. Golden et al. (2005) proposed two fast, high-quality metaheuristic procedures for the GMST problem. Their computational study shows that these procedures provide optimal solutions for most of the problems in the large set of test instances.

Other versions of the GMST problem found in the literature are closely related to the more extensively studied Group Steiner Problem (GSP). In this sense, the first version of the GMST problem was introduced by Cockayne and Melzak (1968). The GSP studied by Cockayne and Melzak (1968) requires the design of a tree structure spanning at least one node from each cluster of nodes. Additionally, there may exist a set of Steiner nodes that does not belong to any of the clusters, but can be used for the tree design.

Recently, Duin and Voß (2004) have pointed out that when a specific GMST problem fits the framework of the GSP, one can use the transformation of the GSP to the well-studied undirected Steiner problem in graphs (SPG) to model the GMST problem. In their computational study, Duin and Voß (2004) found that two specialized SPG heuristics, Pilot-Rush and Pilot-Drop (originally defined in Duin and Voß (1999)), provide good results for this special case of the GMST problem. On a set of problems with Euclidean, random, and rectilinear distances in networks with 100 to 400 nodes, the Pilot-Drop procedure provided solutions that were on average less than 0.3% from optimality and with a maximum gap of 3.4% from optimality.

The “at least” version of the GMST problem (which is similar to the GSP except there are no Steiner nodes) is identical to the GMST problem, except that the tree must span at least one (instead of exactly one) node from each cluster. For these problems Duin and Voß showed that an exact SPG solver similar to the one developed by Duin (1993) significantly outperformed a genetic algorithm (GA) developed by Dror et al. (2000), both in terms of the solution quality and computational effort. On a set of 20 problems, the GA developed by Dror et al. (2000) provided solutions that were on average 6.53% from optimality, while the exact SPG procedure provided optimal solutions for all test instances within very short CPU times. Shyu et al. (2003) developed an ant colony approach that provides comparable results to the GA developed by Dror et al. It also takes less CPU time than the GA developed by Dror et al. Recently, Haouari et al. (2005) proposed an exact branch-and-bound algorithm for the “at least” version of the GMST problem. The proposed algorithm was combined with a specialized preprocessing algorithm, and was used to find optimal solutions for problems in networks with up to 250 nodes, 1000 edges, and 25 clusters within three hours of CPU time.

2 Heuristic procedures for the PCGMST problem

Given the fact that the PCGMST problem is a generalization of the polynomially solvable minimum spanning tree (MST) problem, it is natural to raise a question regarding the application of existing MST algorithms, as heuristics, to the PCGMST problem. In our previous work (Golden et al. 2005) we addressed this question for the GMST problem, and presented a spanning tree lower bound and three heuristic procedures based on three well-known MST algorithms: Kruskal’s, Prim’s, and Sollin’s (see the text by Ahuja et al. 1993, for a nice description of the three algorithms, their implementations, and complexity).

The procedures for the GMST may be applied to the PCGMST, by ignoring the node prizes, and subtracting them ex-post from the solutions (i.e., by subtracting the node prizes for the nodes in the tree solution). However, if the prizes are considered ex-post, the three heuristic procedures provide solutions of extremely poor quality. In particular, on a set of TSPLIB instances (these instances are described in Sect. 5) for which the optimal solution is known, we found that the heuristic solutions were on average 40% from optimality. On the other hand when the node prizes are zero, these three heuristic procedures provided solutions that were on average 15% from optimality. This suggests that for the PCGMST problem, the node prizes should be accounted for within the heuristic procedures instead of ex-post.

In this section, we show how to account for node prizes, and describe a spanning tree lower bound procedure. We also describe how to adapt Kruskal’s, Prim’s, and Sollin’s algorithm to the PCGMST problem while accounting for the node prizes. We then discuss a repetition strategy that significantly improves the performance of the proposed heuristics while maintaining the polynomial running time. Finally, we describe two fast and efficient metaheuristic procedures (local search and a genetic algorithm) for the PCGMST problem.

2.1 Spanning tree lower bound

A very simple lower bound for a given PCGMST problem can be obtained through the straightforward application of Kruskal’s algorithm for the MST. This procedure solves the MST problem on a modified graph where each cluster is contracted into a single node (the prize of this node is set to the highest prize offered by nodes from the same cluster) and multiple edges between pairs of clusters are replaced by the one of minimum cost. To calculate this lower bound, it is not necessary to contract the graph. Instead, it is fairly easy to modify Kruskal’s algorithm to accomplish the same. The running time of the lower bound algorithm is identical to Kruskal’s and is $O(|E| + |V| \log |V|)$. (Observe that if we ignore the node prizes, or if the node prizes are zero, we obtain the spanning tree lower bound for the GMST problem (Golden et al. 2005).)
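The lower bound can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the function name and data layout (edges as `(cost, u, v)` tuples, clusters numbered 0 to K − 1) are hypothetical. Instead of contracting the graph, it runs Kruskal's algorithm with a union-find structure over clusters, so intra-cluster edges and all but the cheapest edge between two clusters are skipped automatically, and then subtracts the highest prize in each cluster.

```python
def spanning_tree_lower_bound(edges, prize, cluster_of, num_clusters):
    """Lower bound: MST over clusters minus the best prize in each cluster.

    edges      : list of (cost, u, v) tuples (hypothetical layout)
    prize      : list of nonnegative node prizes
    cluster_of : list, cluster_of[v] is the cluster index of node v
    """
    # Highest prize offered within each cluster (prizes are nonnegative).
    best_prize = [0] * num_clusters
    for v, c in enumerate(cluster_of):
        best_prize[c] = max(best_prize[c], prize[v])

    # Union-find over clusters plays the role of the contracted graph.
    parent = list(range(num_clusters))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, added = 0, 0
    for cost, u, v in sorted(edges):
        ru, rv = find(cluster_of[u]), find(cluster_of[v])
        if ru != rv:  # skips intra-cluster edges and redundant parallel edges
            parent[ru] = rv
            total += cost
            added += 1
    if added != num_clusters - 1:
        raise ValueError("the clusters cannot be connected")
    return total - sum(best_prize)
```

The bound is valid because any feasible tree pays at least the MST cost over clusters and collects at most the best prize in each cluster. On the small hypothetical instance used in the test below it returns −2, while the best feasible tree costs 4, which illustrates how weak the bound can be.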

2.2 Polynomial-time heuristics

We now describe how to adapt Kruskal’s, Prim’s, and Sollin’s algorithm for the MST problem as heuristics to the PCGMST problem while taking into account the node prizes within the algorithm. We then consider a heuristic polynomial-time repetition framework called the pilot method that significantly improves the performance of these adaptations.

The adaptation of Kruskal’s algorithm builds the PCGMST in a similar fashion to Kruskal’s algorithm for the MST, with two exceptions. First, we need to make sure that exactly one node in each cluster is selected. And, second, we need to take into account prizes offered by the selected nodes.

To take into account the prizes we apply the following strategy in all three adaptations. Initially, we modify the cost of all edges by subtracting the prizes of the end points of an edge from its actual cost (note that the edge cost defined this way more precisely reflects the impact of addition of a particular edge on the objective function). In the subsequent steps of our algorithms, we update the edge costs whenever a new node is added to the tree that is being built. Once a node is included in the tree, we add the weight (i.e., the prize) of this node back to the cost of all edges that have this node as one of their end points. This modification of edge costs ensures that weights of the nodes already in the tree are not considered in the edge selection process (note that once the node is selected for the generalized spanning tree (GST), it is only the actual edge cost that makes a difference in the objective function).

In other words, each iteration of Kruskal’s adaptation for the PCGMST problem (we call this adaptation PCKH) finds the minimum cost edge (observe that the edge costs can change in each iteration) such that its addition to the tree does not create a cycle among the clusters, and that at most one node from each cluster is selected for the final network design. This is also the bottleneck step in Kruskal’s adaptation. This step takes $O(|E|)$ time, and, since we add $K - 1$ edges, the running time of PCKH is $O(|E|K)$.
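A minimal Python sketch of this Kruskal-style adaptation follows. It is an illustration, not the authors' implementation: the function name `pckh`, the argument names, and the `(cost, u, v)` edge layout are assumptions. Rather than storing updated edge costs, it recomputes the modified cost on the fly, subtracting an end point's prize only while that end point's cluster has no selected node, which has the same effect as the update rule described above. It assumes K ≥ 2 and raises an error when no feasible edge remains (which can happen on incomplete graphs).

```python
def pckh(edges, prize, cluster_of, num_clusters):
    """Kruskal-style heuristic sketch; returns (tree_edges, objective)."""
    selected = [None] * num_clusters   # node chosen for each cluster, if any
    parent = list(range(num_clusters))  # union-find over clusters

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, edge_total = [], 0
    for _ in range(num_clusters - 1):
        best = None
        for cost, u, v in edges:          # O(|E|) scan per iteration
            cu, cv = cluster_of[u], cluster_of[v]
            if find(cu) == find(cv):      # would create a cycle among clusters
                continue
            if selected[cu] not in (None, u) or selected[cv] not in (None, v):
                continue                  # would use a second node of a cluster
            mod = cost                    # modified cost: subtract unclaimed prizes
            if selected[cu] is None:
                mod -= prize[u]
            if selected[cv] is None:
                mod -= prize[v]
            if best is None or mod < best[0]:
                best = (mod, cost, u, v)
        if best is None:
            raise ValueError("no feasible edge: heuristic failed")
        _, cost, u, v = best
        selected[cluster_of[u]], selected[cluster_of[v]] = u, v
        parent[find(cluster_of[u])] = find(cluster_of[v])
        tree.append((u, v))
        edge_total += cost
    return tree, edge_total - sum(prize[s] for s in selected)
```

The scan over all edges in each of the K − 1 iterations matches the $O(|E|K)$ running time stated above. Ties are broken by the order of `edges`, which is a choice made for this sketch.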

The adaptation of Prim’s algorithm for the PCGMST problem (referred to as PCPH) requires the selection of a starting node from which the tree is grown. Once the starting node is selected, we proceed in a straightforward manner identical to Prim’s algorithm for the MST problem while taking care that exactly one node is selected from each cluster. That is, in each step we add the minimum cost edge from the nodes in the tree being grown to the nodes not in the tree, taking care that exactly one node is selected from each cluster. Additionally, we need to make sure that the node prizes are taken into account (as explained above). Observe that the solution provided by Prim’s adaptation may depend on the node selected as the starting node (we refer to this node as the root node). In our implementation of this algorithm we select the root node randomly.

The running time of Prim’s adaptation is identical to Prim’s algorithm for the MST plus the time needed to make updates of edge costs. Since we will update an edge cost only once, this takes $O(|E|)$ time. So, the running time of Prim’s adaptation remains $O(|E| + |V| \log |V|)$.
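The Prim-style adaptation can be sketched as follows. This is a simplified illustration with hypothetical names; for brevity it rescans the edge list at every step, so it runs in $O(|E|K)$ time and does not achieve the heap-based bound quoted above. The root is passed in explicitly, whereas the paper selects it at random.

```python
def pcph(edges, prize, cluster_of, num_clusters, root):
    """Prim-style heuristic sketch grown from `root`; returns (tree, objective)."""
    in_tree = {root}
    covered = {cluster_of[root]}      # clusters that already have a selected node
    tree, total = [], -prize[root]    # the root's prize is collected up front
    while len(covered) < num_clusters:
        best = None
        for cost, u, v in edges:
            for a, b in ((u, v), (v, u)):
                # Edge must leave the tree and enter an uncovered cluster.
                if a in in_tree and cluster_of[b] not in covered:
                    mod = cost - prize[b]   # only the new node's prize counts
                    if best is None or mod < best[0]:
                        best = (mod, a, b)
        if best is None:
            raise ValueError("no feasible edge: heuristic failed")
        mod, a, b = best
        in_tree.add(b)
        covered.add(cluster_of[b])
        tree.append((a, b))
        total += mod
    return tree, total
```

Because the result depends on the root, calling the function once per candidate root and keeping the best tree is a natural, if more expensive, variation.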

Sollin’s algorithm for the MST starts with each node representing a tree. In each iteration of Sollin’s algorithm, it identifies, for each tree in the partial solution, the minimum cost edge emanating from the tree. It then adds these edges to the partial solution (thus merging trees to build larger trees, and reducing the number of trees in the partial solution). The iterations of the algorithm are repeated until a spanning tree is obtained. We adapt Sollin’s algorithm to the PCGMST as follows (we refer to this adaptation as PCSH). In each iteration of the algorithm, for each tree (or cluster, if no edge in the forest constructed so far is incident to any node in the cluster) select as a candidate edge the minimum cost edge out of the tree (or cluster) whose addition is feasible (i.e., adding the edge will not result in multiple nodes from a cluster in the partial solution). Ties between edges, for selection as candidate edge, are broken by choosing the edge that appears first in the sorted order. Consider the selected edges in sorted order and add an edge to the partial solution if its addition is feasible. (Note that, although the edges were feasible when selected, once we start adding edges to the partial solution, a selected edge may no longer be feasible for addition to the partial solution.) We repeat these iterations, updating edge costs as described earlier, until a feasible generalized spanning tree is obtained.

Each iteration of Sollin’s adaptation takes $O(|E| + K^2 \log K)$ time as we go through the list of $|E|$ edges to select the minimum cost feasible edge out of each tree (or cluster), sort them, and then consider at most $K - 1$ edges to add in an iteration. The time for making edge updates at the end of the iteration is $O(|E|)$. In each iteration at least one edge is added, since it is always feasible to add the first selected edge. Thus there are at most $K - 1$ iterations. Consequently, the overall running time of Sollin’s adaptation is $O(|E|K + K^2 \log K)$.

Note that the proposed heuristics are guaranteed to return a feasible solution only when the underlying graph is complete (in terms of edges between all pairs of nodes). Ideally, we would like to add an edge in our adaptations only if the addition of the edge does not cause the heuristic to fail (i.e., make the problem infeasible). In order to do this, we need to answer the following question. Given a graph, clusters, and edges, does it contain a feasible GST? (When we add an edge, we have selected a node in a cluster. Thus, we can delete all other nodes in the cluster and edges emanating from them, and ask the question: does the modified graph contain a feasible GST?) Unfortunately, in general, from the transformation given by Myung et al. (1995), even the recognition of whether a graph contains a generalized spanning tree is NP-complete. Consequently, we handle infeasibility as follows.

In the case of Kruskal’s adaptation, if the algorithm results in an infeasible solution, we propose that the algorithm remove the most expensive edge added to the tree from the problem and run the heuristic again. We repeat this procedure until either a feasible GST is obtained or one cluster has no edges out of it. In the case of Prim’s and Sollin’s adaptations, if the algorithms result in an infeasible solution, we delete the edge that was most recently added to the tree, and run the algorithm again.

When feasibility is an issue, a crude running-time bound for Kruskal’s adaptation is $O(|E|^2 K)$ (since we run Kruskal’s at most $|E|$ times). However, this bound is somewhat misleading, because when the graph is dense (i.e., $|E|$ is large), infeasibility is very unlikely. Furthermore, observe that in Kruskal’s algorithm the most expensive edge in the tree constructed is the last edge that was added to it. Thus, instead of building a tree from scratch, we may simply delete the last edge added and continue to build the tree from there. Similarly, arguing as for Kruskal’s adaptation, a crude running-time bound for Prim’s adaptation, when feasibility is an issue, is $O(|E|^2 + |E||V| \log |V|)$. Again, for the very same reason as in the case of Kruskal’s adaptation, this is somewhat misleading. Also, as in the case for Kruskal’s adaptation, we can delete the most recently added edge and continue to rebuild a tree from there. Arguing similarly, when feasibility is an issue, a crude running-time bound for Sollin’s adaptation is $O(|E|^2 K + |E|K^2 \log K)$.

The quality of the solutions provided by the three adaptations PCKH, PCPH, and PCSH is shown in Table 1. Table 1(a) shows results for the PCGMST problem with node weights set to zero (so, these problems correspond to the GMST problem), and Table 1(b) shows results for the PCGMST problem with the integer node weights randomly selected from the interval [0,10]. Note, the number of instances in the two tables differs because they only include results for problems where the optimal solution is known. The results in Table 1(b) indicate that accounting for the prizes within the adaptations (instead of ex-post) significantly improves the quality of results (recall that at the start of this section we found that accounting for the prizes ex-post results in average gaps on the order of 40%). On the other hand, the average gaps of PCKH, PCPH, and PCSH are still quite high, and these heuristics are probably too crude to be used for practical purposes. The same holds for our simple spanning tree lower bound for which we also report results in Table 1. This lower bound is very weak, deteriorates with non-zero node weights, and is probably not a good choice for obtaining lower bounds for the PCGMST problem.

Table 1 Comparison of Kruskal’s, Prim’s, and Sollin’s adaptation and the spanning tree lower bound for the PCGMST problem: (a) Zero node prizes; (b) Integer node prizes randomly selected in the range [0,10]

(a)

| Clustering type | Number of instances | Spanning tree lower bound, avg error | PCKH (upper bound), avg error | PCPH (upper bound), avg error | PCSH (upper bound), avg error |
|---|---|---|---|---|---|
| Center | 37 | 37.13% | 8.27% | 13.10% | 7.99% |
| Grid, μ = 3 | 28 | 22.22% | 5.05% | 7.94% | 5.22% |
| Grid, μ = 5 | 28 | 32.95% | 10.65% | 12.41% | 10.70% |
| Grid, μ = 7 | 28 | 42.34% | 15.87% | 17.18% | 15.64% |
| Grid, μ = 10 | 29 | 44.39% | 15.90% | 18.56% | 15.67% |
| Overall | 150 | 35.94% | 11.00% | 13.83% | 10.89% |

(b)

| Clustering type | Number of instances | Spanning tree lower bound, avg error | PCKH (upper bound), avg error | PCPH (upper bound), avg error | PCSH (upper bound), avg error |
|---|---|---|---|---|---|
| Center | 33 | 74.18% | 12.06% | 25.42% | 17.64% |
| Grid, μ = 3 | 22 | 47.84% | 7.95% | 13.71% | 10.89% |
| Grid, μ = 5 | 22 | 68.68% | 12.59% | 22.16% | 14.21% |
| Grid, μ = 7 | 26 | 101.10% | 21.33% | 32.12% | 23.28% |
| Grid, μ = 10 | 27 | 74.39% | 18.98% | 23.76% | 19.93% |
| Overall | 130 | 74.22% | 14.75% | 23.88% | 17.52% |

The poor performance of the proposed upper bound heuristics is not surprising, given the fact that edge selection is used to direct the search. This kind of search can be inefficient for the PCGMST problem since when we select an edge, we automatically select (“fix”) nodes that will be used for the two clusters incident to the edge selected. This motivates the question of whether a different type of search that would take into account node selection, would perform better.

We now describe an extensive search strategy that significantly improves our upper bound heuristics by progressively fixing all the nodes in the GST. First, let $T$ be a set of nodes that are fixed for the initial GST design, such that $1 \le |T| \le K$. Next, let $S_T^t$ represent a solution obtained by fixing all the nodes from the set $T$ ($t = |T|$) for the GST design and then applying one of the three upper bound heuristics. (The steps for finding a solution $S_T^t$ for a given set $T$ are specific to each of our upper bound heuristics, and will be explained later in this section.) Also, let $T^*$ indicate a set of nodes selected for the final GST design, and $V^*$ indicate a set of all nodes that belong to the same clusters as the nodes in $T^*$. The search is started with empty sets $T$ and $T^*$, and one of our upper bound heuristics is run $|V|$ times, each time starting with a different node selected for the design. The node $i^*$ ($i^* \in V$) that provides the minimum cost solution is then fixed for the final GST (i.e., it is added to the set $T^*$). In order to add another node to the set $T^*$, we first set $T = T^*$ and then run the same upper bound heuristic $|V \setminus V^*|$ times, each time with a different node $i$ ($i \in V \setminus V^*$) added to the set $T$ (i.e., each time we set $T = T^* \cup \{i\}$, fix the nodes in the set $T$ for the GST design, and apply the same upper bound heuristic). We continue this procedure until $K$ nodes are added to the set $T^*$. This search paradigm is called the pilot method (see Duin and Voß 1999).

Fig. 1 Pilot method for the upper bound heuristics for the PCGMST problem

An example of the steps of our implementation of the pilot method for the proposed upper bound heuristics is illustrated in Fig. 1. In this example, we are given an 8-node network with 4 clusters, each containing 2 nodes. In the first iteration of the pilot method we run one of our upper bound heuristics $|V|$ times. The node that provides the best solution (node 3 in this example) is then added to set $T^*$, i.e., it is fixed for the final GST design. In the next iteration of the pilot method the same upper bound heuristic is run $|V \setminus V^*|$ times. As in the previous iteration, we select the node that provides the best solution (in this case node 6) and add it to the set $T^*$. The pilot method is continued until $K$ nodes are fixed for the GST design.

The procedure used to obtain solutions $S_T^t$ at any given iteration of the pilot method is as follows. In the case of Kruskal’s and Sollin’s adaptation, we simply fix the nodes from the set $T$ for the GST design and apply either PCKH or PCSH respectively. In other words, we eliminate all nodes belonging to the same clusters as nodes that are fixed for the design and run our upper bound heuristics without any modifications. However, since each step of Prim’s algorithm adds an edge to a partially completed tree, we need to implement a slightly different strategy. Specifically, we want to make sure that the new node added to the set $T^*$ at the end of a given iteration of the pilot method is connected to the tree that is being built (in other words we want to make sure it is feasible to construct a tree on the set $T^*$ in each step of the pilot method). To ensure this, we first connect the candidate node $i \in V \setminus V^*$ that is being considered to the tree on $T^*$ using the least-cost edge available before proceeding with Prim’s adaptation.
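The pilot method wrapped around the Kruskal-style adaptation can be sketched as below. This is an illustration under assumptions, not the authors' code: `pckh` here is a compact hypothetical variant that accepts a list of pre-fixed nodes (equivalent to eliminating the other nodes of their clusters), and `pilot_method` returns the solution found in the last iteration, when all K nodes are fixed. Candidates for which the heuristic fails are simply skipped, and K ≥ 2 is assumed.

```python
def pckh(edges, prize, cluster_of, num_clusters, fixed=()):
    """Kruskal-style sketch with the nodes in `fixed` pre-selected."""
    selected = [None] * num_clusters
    for v in fixed:
        selected[cluster_of[v]] = v
    parent = list(range(num_clusters))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, edge_total = [], 0
    for _ in range(num_clusters - 1):
        best = None
        for cost, u, v in edges:
            cu, cv = cluster_of[u], cluster_of[v]
            if find(cu) == find(cv):
                continue  # cycle among clusters
            if selected[cu] not in (None, u) or selected[cv] not in (None, v):
                continue  # cluster already represented by another node
            mod = cost
            if selected[cu] is None:
                mod -= prize[u]
            if selected[cv] is None:
                mod -= prize[v]
            if best is None or mod < best[0]:
                best = (mod, cost, u, v)
        if best is None:
            raise ValueError("no feasible edge")
        _, cost, u, v = best
        selected[cluster_of[u]], selected[cluster_of[v]] = u, v
        parent[find(cluster_of[u])] = find(cluster_of[v])
        tree.append((u, v))
        edge_total += cost
    return tree, edge_total - sum(prize[s] for s in selected)


def pilot_method(num_nodes, cluster_of, num_clusters, heuristic):
    """Fix one node per iteration; `heuristic(fixed)` returns (tree, cost)."""
    fixed = []                                   # the set T* in the text
    best = None
    while len(fixed) < num_clusters:
        fixed_clusters = {cluster_of[v] for v in fixed}
        best = None
        for i in range(num_nodes):               # candidates in V \ V*
            if cluster_of[i] in fixed_clusters:
                continue
            try:
                tree, cost = heuristic(fixed + [i])   # T = T* with i added
            except ValueError:
                continue                         # heuristic failed for this T
            if best is None or cost < best[0]:
                best = (cost, i, tree)
        if best is None:
            raise ValueError("no candidate node yields a feasible tree")
        fixed.append(best[1])
    return best[2], best[0], fixed
```

A call such as `pilot_method(6, cluster_of, 3, lambda T: pckh(E, prize, cluster_of, 3, T))` runs the wrapped heuristic at most $|V|$ times in each of the K iterations, in line with the crude $|V|K$ factor on the running time.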

The results of the pilot method applied to each of our upper bound heuristics are shown in Table 2. We can see that solutions obtained using the pilot method provide


Table 2 Comparison of the pilot method for PCKH, PCPH, and PCSH. (a) Zero node prizes. (b) Integer node prizes randomly selected in the range [0,10]

Clustering type   Number of    PM-PCKH         PM-PCPH         PM-PCSH
                  instances    Average error   Average error   Average error

(a)
Center            37           0.80%           1.53%           0.83%
Grid, μ = 3       28           0.42%           0.83%           0.48%
Grid, μ = 5       28           0.71%           1.30%           0.65%
Grid, μ = 7       28           1.56%           1.42%           1.74%
Grid, μ = 10      29           1.49%           1.42%           1.71%
Overall           150          0.99%           1.31%           1.07%

(b)
Center            33           1.75%           1.18%           3.97%
Grid, μ = 3       22           1.21%           0.62%           2.30%
Grid, μ = 5       22           1.43%           1.85%           1.59%
Grid, μ = 7       26           0.91%           1.63%           1.27%
Grid, μ = 10      27           1.25%           1.99%           1.70%
Overall           130          1.33%           1.46%           2.27%

upper bounds that are less than 2% from optimality in the case of zero node weights, and less than 3% from optimality in the case of non-zero node weights. This is a significant improvement when compared to the quality of the upper bounds obtained by the initial adaptations (PCKH, PCPH, and PCSH) of the MST algorithms. Also, an interesting observation is that the pilot method for our upper bound heuristics runs in polynomial time (a very crude running time of the pilot method for each of our upper bound heuristics is |V|K times their running times). The actual average CPU times of the PM-PCKH, PM-PCPH, and PM-PCSH procedures were 0.84, 0.34, and 66.29 seconds respectively in the case of zero node weights, and 101.22, 0.23, and 49.30 seconds respectively in the case of non-zero weights. (We note that the longer CPU times of the PM-PCKH and PM-PCSH procedures compared to the PM-PCPH procedure are due to our code. We believe that the running times for these procedures can be improved with more efficient codes.)

2.3 Metaheuristic procedures

We now describe two fast and efficient metaheuristic procedures for the PCGMST problem: local search and a genetic algorithm.

The local search (LS) procedure we developed in Golden et al. (2005) for the GMST problem may be directly applied to the PCGMST problem. It differs only in that an additional objective function term for node prizes needs to be taken into account here. Briefly, our local search procedure works as follows. It is an iterative 1-opt procedure. It visits clusters in a wraparound fashion following a randomly defined order. In each cluster visit, the neighborhood of a feasible generalized spanning tree is explored by examining all feasible trees obtained by replacing the node (in the tree) from the current cluster. In other words, a GST of least cost (the cost of the tree is the sum of the edge costs minus the rewards on the nodes) is found by trying to use every node from that cluster, while fixing nodes in other clusters. The local search procedure continues visiting clusters until no further improvement is possible. The procedure is applied to a pre-specified number of starting solutions (denoted by t).

Fig. 2 Steps of our Genetic Algorithm for the PCGMST problem
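The 1-opt local search described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a complete graph with a symmetric cost matrix (so every replacement is feasible), and the function names, the random starting solutions, and the small instance are invented for the example.

```python
import math
import random

def mst_cost(nodes, cost):
    """Cost of a minimum spanning tree on `nodes` (Prim, complete graph)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    link = {v: cost[nodes[0]][v] for v in nodes[1:]}
    total = 0.0
    while link:
        v = min(link, key=link.get)
        total += link.pop(v)
        for u in link:
            link[u] = min(link[u], cost[v][u])
    return total

def objective(selection, cost, prize):
    """Edge costs of the tree minus the prizes of the selected nodes."""
    return mst_cost(selection, cost) - sum(prize[v] for v in selection)

def local_search(selection, clusters, cost, prize, rng):
    selection = list(selection)  # selection[k] is the node used in cluster k
    best = objective(selection, cost, prize)
    order = list(range(len(clusters)))
    rng.shuffle(order)  # randomly defined visiting order
    improved = True
    while improved:  # wrap around until a full pass brings no gain
        improved = False
        for k in order:
            for v in clusters[k]:  # try every node of the current cluster
                trial = selection[:k] + [v] + selection[k + 1:]
                value = objective(trial, cost, prize)
                if value < best - 1e-9:
                    selection, best, improved = trial, value, True
    return best, selection

def multi_start(clusters, cost, prize, t, seed=0):
    """Apply local search to t random starting solutions, keep the best."""
    rng = random.Random(seed)
    runs = (
        local_search([rng.choice(m) for m in clusters], clusters, cost, prize, rng)
        for _ in range(t)
    )
    return min(runs, key=lambda run: run[0])

points = [(0, 0), (4, 0), (1, 3), (5, 5), (2, 1), (6, 2)]
cost = [[math.dist(p, q) for q in points] for p in points]
clusters = [[0, 1], [2, 3], [4, 5]]
prize = [0, 3, 1, 0, 2, 5]
best_value, best_selection = multi_start(clusters, cost, prize, t=5)
```

Recomputing the spanning tree from scratch for every trial keeps the sketch short; an implementation aimed at speed would presumably update the tree incrementally.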

We now present a genetic algorithm (GA) for the PCGMST problem that is similar to the one we developed in Golden et al. (2005), with a few differences in the initial population and genetic operators applied. Figure 2 shows an outline of our genetic algorithm. The initial population is created by randomly generating a pre-specified number of feasible GSTs. Before adding a new chromosome to the population P(0), we apply local search and add the resulting chromosome as a new population member. Within each generation t, new chromosomes are created from population P(t − 1) using two genetic operators: local search enhanced crossover and random mutation. The total number of offspring created using these operators is equal to the number of chromosomes in the population P(t − 1), with αP(t − 1) offspring created using crossover, and βP(t − 1) offspring created using mutation (fractions α and β are experimentally determined). Once the pre-specified number of offspring is generated, a subset of chromosomes is selected to be carried over to the next generation. The algorithm terminates when the termination condition, a pre-specified number of generations that we denote NUMGENS, is met.

We now provide some more details of the genetic algorithm. A chromosome is represented by an array of size K, so that the gene values correspond to the nodes selected for the generalized spanning tree. The initial population is generated by random selection of nodes for each cluster. If possible, a minimum spanning tree is built over the selected nodes. Otherwise, the chromosome is discarded, since it represents an infeasible solution. Each feasible minimum spanning tree built in this way is then used as input for the local search procedure. The resulting solution is then added to the initial population as a new chromosome. We apply a standard one-point crossover operation (see Fig. 3 for an example). As in the initial population, only the feasible solutions are accepted. Each child chromosome created using this operator is used as input to the local search procedure, and the resulting chromosome is added to the population. A random mutation operator randomly selects a cluster to be modified and replaces its current node by another, randomly selected, node from the same cluster. The new chromosome is accepted if it results in a feasible GST. In order to maintain diversity of the population, we do not apply local search to new chromosomes created by random mutation. At the end of each generation, a fraction, θ, of the current population is selected to be carried to the next generation, while the remaining chromosomes are discarded. This selection is a combination of elitism and rank-based selection, where the top 10% of the current population is selected using elitism and the remaining 90% is selected using rank-based selection (see Michalewicz 1996).

Fig. 3 An example of one-point crossover operator
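The genetic operators described above can be illustrated with a short Python sketch. Several details are assumptions made for the example rather than facts from the paper: chromosomes are lists whose k-th gene is a node of cluster k, feasibility checks and the local search step are omitted, the 10%/90% split is applied to the survivors, rank-based selection uses linear rank weights, and `fitness=sum` is a purely hypothetical cost.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two chromosomes at a random cut (needs K >= 2)."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:], parent_b[:cut] + parent_a[cut:]

def mutate(chromosome, clusters, rng):
    """Replace the node of one random cluster by another node of that cluster."""
    k = rng.randrange(len(clusters))
    options = [v for v in clusters[k] if v != chromosome[k]]
    child = list(chromosome)
    if options:  # a single-node cluster leaves the chromosome unchanged
        child[k] = rng.choice(options)
    return child

def select_survivors(population, fitness, theta, rng, elite_share=0.10):
    """Keep a fraction theta: a few elites, the rest by rank-based sampling."""
    ranked = sorted(population, key=fitness)  # lowest cost first
    n_keep = max(1, round(theta * len(ranked)))
    n_elite = max(1, round(elite_share * n_keep))
    survivors = ranked[:n_elite]
    pool = ranked[n_elite:]
    weights = [len(pool) - rank for rank in range(len(pool))]  # linear ranks
    while len(survivors) < n_keep and pool:
        idx = rng.choices(range(len(pool)), weights=weights)[0]
        survivors.append(pool.pop(idx))  # sample without replacement
        weights.pop(idx)
    return survivors

rng = random.Random(1)
clusters = [[0, 1], [2, 3, 4], [5, 6], [7, 8]]
parent_a, parent_b = [0, 2, 5, 7], [1, 4, 6, 8]
child_a, child_b = one_point_crossover(parent_a, parent_b, rng)
mutant = mutate(parent_a, clusters, rng)
population = [[rng.choice(m) for m in clusters] for _ in range(10)]
survivors = select_survivors(population, fitness=sum, theta=0.5, rng=rng)
```

Because gene k always holds a node of cluster k, one-point crossover preserves the one-node-per-cluster property; only the existence of a spanning tree on the selected nodes would still need to be checked on a sparse graph.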

Our two metaheuristic procedures performed very well. On a set of 296 test problems where the optimal solution is known, both LS and GA found the optimal solution. We will elaborate on these results in Sect. 5 when we compare them with an exact solution procedure for the PCGMST problem.

3 A compact formulation

The exact procedures that we develop in this paper are based on a compact (i.e., polynomial size) mathematical formulation for the GMST problem originally proposed by Pop (2004). We will first introduce and motivate this formulation. We will then describe some important properties associated with this formulation that we later use to develop a simple and efficient branch-and-cut algorithm. While the formulation was introduced in the context of the GMST problem, we adapt and present it in the context of the PCGMST problem. Since the formulation has node variables to indicate whether a particular node is selected, this is easily done by modifying the objective function in the formulation.

To motivate the formulation for the PCGMST problem, it is best to first look at the well-known subtour elimination formulation for the minimum spanning tree (MST) problem from which this formulation is derived. This (submst) formulation can be stated as follows:


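\begin{align}
\text{Minimize} \quad & \sum_{(i,j)\in E} c_{ij} u_{ij} \tag{1}\\
\text{subject to:} \quad & \sum_{\forall (i,j)\in E} u_{ij} = n - 1, \tag{2}\\
& \sum_{\forall (i,j)\in S} u_{ij} \le |S| - 1, \quad S \subset V,\ |S| > 1, \tag{3}\\
& u_{ij} \in B^1 \quad \forall (i,j) \in E. \tag{4}
\end{align}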

The linear relaxation of this formulation describes the convex hull of the integer feasible region (see Magnanti and Wolsey 1995), but has an exponential number of subtour elimination constraints (3). It is well known that the separation problem for the identification of violated subtour elimination constraints (SECs) is a polynomially solvable min-cut problem that can be solved by either using one of the well-known combinatorial algorithms for the min-cut problem or by solving a linear program.

Martin (1991) shows that when the separation problem can be formulated as a polynomial-size linear program, we can use the dual of the separation problem to obtain an equivalent (in terms of the linear relaxation) compact (i.e., polynomial sized) formulation. In the case of the submst formulation, use of this approach leads to a new formulation (ref-submst) that replaces constraint (3) by the following three constraints:

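\begin{align}
& w^k_{ij} + w^k_{ji} = u_{ij} \quad \forall (i,j) \in E,\ \forall k \in V, \tag{5}\\
& \sum_{j} w^k_{ij} \le 1 \quad \forall k \in V,\ \forall i \in V \setminus \{k\}, \tag{6}\\
& \sum_{j} w^k_{kj} = 0 \quad \forall k \in V. \tag{7}
\end{align}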

Martin (1991) formally establishes the equivalence of the submst and ref-submst formulations. Note, it is easy to see that the ref-submst formulation is valid. In terms of the MST problem, each set of $w^k_{ij}$ variables for a given k represents a set of variables defining a directed tree rooted at node k, where an individual variable $w^k_{ij}$ is equal to 1 if node j is predecessor of node i in the directed tree rooted at node k. Constraints (6) and (7), together with (2) and (5), guarantee that each node in the directed tree rooted at node k has at most one predecessor node, while the node k, by constraint (7), has no predecessors at all. (Note, constraint (6) may also be written with an = sign, instead of a ≤ sign.) Constraints (5) through (7) guarantee that there are no cycles, since if there was a cycle it would mean that for some root node k, and the corresponding set of variables $w^k_{ij}$, there is a node with either two predecessors, or a root node has a predecessor, which is not valid by the definition of these constraints.
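To make the role of the predecessor variables concrete, the following Python sketch builds them for a given spanning tree and checks constraints (5) through (7) for every choice of root. It is an illustration only: the function names and the five-node tree are invented, and the check covers only the tree edges (for edges outside the tree both sides of (5) are zero).

```python
from collections import deque

def predecessor_vars(nodes, tree_edges, root):
    """Return w with w[(i, j)] = 1 when j is the predecessor of i in the
    tree rooted at `root`; all other entries are implicitly 0."""
    adj = {v: [] for v in nodes}
    for i, j in tree_edges:
        adj[i].append(j)
        adj[j].append(i)
    w, seen, queue = {}, {root}, deque([root])
    while queue:
        j = queue.popleft()
        for i in adj[j]:
            if i not in seen:
                seen.add(i)
                w[(i, j)] = 1  # j precedes i on the path from the root
                queue.append(i)
    return w

def satisfies_5_to_7(nodes, tree_edges, root):
    w = predecessor_vars(nodes, tree_edges, root)
    # (5): each selected edge is oriented in exactly one direction
    c5 = all(w.get((i, j), 0) + w.get((j, i), 0) == 1 for i, j in tree_edges)
    # (6): every non-root node has at most one predecessor
    c6 = all(sum(w.get((i, j), 0) for j in nodes) <= 1 for i in nodes if i != root)
    # (7): the root has no predecessor
    c7 = sum(w.get((root, j), 0) for j in nodes) == 0
    return c5 and c6 and c7

nodes = [1, 2, 3, 4, 5]
tree = [(1, 2), (2, 3), (2, 4), (4, 5)]
all_roots_ok = all(satisfies_5_to_7(nodes, tree, k) for k in nodes)
```

For a spanning tree the orientation away from any root satisfies all three constraint groups, which is the validity argument given in the text.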

We now discuss the properties of a special type of relaxation of the ref-submst formulation where only a subset of nodes W (W ⊂ V) is selected to be used as the root nodes of directed trees. In other words, instead of having variables $w^k_{ij}$ defined for each k ∈ V, only the $w^k_{ij}$ variables for k ∈ W are included in the formulation. We will refer to this type of relaxation of the ref-submst formulation as a p-root relaxation, where p indicates the number of nodes used as roots in a given relaxation.

Lemma 1 The linear relaxation of the ref-submst formulation with $w^k_{ij}$ variables defined only for k ∈ W ⊂ V (1 ≤ |W| ≤ |V| − 1), provides an optimal solution for a given MST problem if the solution is integer feasible to the ref-submst formulation.

Proof It is easy to see that Lemma 1 is true, as any relaxation of the ref-submst not including a set of constraints (5–7) is actually equivalent to the submst formulation without a set of subtour elimination constraints. And, for the latter relaxation, an integer solution without cycles always represents the optimal solution. □

Lemma 1 implies that an integer solution of the p-root relaxation of the ref-submst formulation always represents either a tree, or a structure with at least one cycle (note that if the solution is integer but does not define a tree structure, then, by constraints (3), the solution must contain at least one cycle). So, one way to solve the original problem is to start with a p-root relaxation, and then add violated constraints (5–7) (which means adding root nodes to the relaxation). Another approach to solve the original problem is to start with a p-root relaxation, and then use violated subtour elimination constraints (3) to eliminate any cycles that may be present in the solution of the p-root relaxation. While the violated SECs can be identified in polynomial time using a min-cut algorithm, identification of the violated constraints (5–7) may not be immediately obvious. However, we show that when the solution of the p-root relaxation is integral the separation problem for constraints (5–7) can be solved in a straightforward manner.

Proposition 2 If there is a cycle in the integer solution of the p-root relaxation of the ref-submst formulation, then nodes k* ∈ W (where W is the set of root nodes) can never be a part of the cycle(s) present.

Proof Observe that if a node k* ∈ W was a part of a cycle, it would imply that either one of the nodes in the cycle (other than node k*) has 2 predecessors in the directed tree rooted at node k*, or that node k* has a predecessor in this tree. However, neither is possible by definition of constraints (5–7) for the set of variables $w^{k^*}_{ij}$. □

Proposition 2 suggests that violated constraints (5–7) can always be identified as those corresponding to the directed trees rooted at nodes forming a cycle in the current integer solution of the p-root relaxation. We now show how these ideas can be applied to the PCGMST problem.

The subel formulation uses three types of variables to define the connections between clusters in the graph. The first group of variables are local edge variables $u_{ij}$ that define the use of specific edges between nodes in the graph. The other two groups of variables, $y_{lr}$ and $w^k_{lr}$, on the other hand, define connections between clusters in the graph. The $y_{lr}$ variables are global edge variables that indicate whether the solution includes an edge directly connecting clusters l and r. Observe that a GST forms a spanning tree on the graph obtained by shrinking each cluster to a single node. Thus, we would like the global edge variables to form a spanning tree on the clusters. Consequently, the $w^k_{lr}$ variables play the same role as in the ref-submst formulation, but in relation to the global edge variables (i.e., they ensure the global edge variables do not form a cycle on the clusters). More precisely, each variable $w^k_{lr}$ indicates whether cluster r is a predecessor of cluster l in the directed tree rooted at cluster k (there are K directed trees, one for each of the clusters in the graph). Finally, the formulation contains node variables $z_i$ to indicate whether node i is in the solution.

Subtour elimination formulation (subel) for the PCGMST problem:

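\begin{align}
\text{Minimize} \quad & \sum_{\forall (i,j)\in E} c_{ij} u_{ij} - \sum_{\forall i\in V} p_i z_i \tag{8}\\
\text{subject to:} \quad & \sum_{i\in V_k} z_i = 1 \quad \forall k = 1,\dots,K, \tag{9}\\
& \sum_{i\in V_l,\, j\in V_r} u_{ij} = y_{lr} \quad \forall l, r = 1,\dots,K,\ l \ne r, \tag{10}\\
& \sum_{j\in V_r} u_{ij} \le z_i \quad \forall r = 1,\dots,K,\ \forall i \in V \setminus V_r, \tag{11}\\
& \sum_{(i,j)\in E} u_{ij} = K - 1, \tag{12}\\
& y_{ij} = w^k_{ij} + w^k_{ji} \quad \forall k, i, j = 1,\dots,K,\ i \ne j, \tag{13}\\
& \sum_{j} w^k_{ij} = 1 \quad \forall k, i = 1,\dots,K,\ i \ne k, \tag{14}\\
& w^k_{kj} = 0 \quad \forall k, j = 1,\dots,K, \tag{15}\\
& w^k_{ij} \ge 0 \quad \forall k, i, j = 1,\dots,K, \tag{16}\\
& z_i \ge 0 \quad \forall i \in V, \tag{17}\\
& u_{ij} \ge 0 \quad \forall (i,j) \in E, \tag{18}\\
& y_{lr} \in B^1 \quad \forall l, r = 1,\dots,K,\ l \ne r. \tag{19}
\end{align}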

Formulation subel represents a straightforward extension of the ref-submst formulation, achieved through the introduction of local (i.e., u and z) and global (i.e., y and w) variables. Pop (2004) shows that the only variables that need to be defined as integer in the subel formulation are the global edge variables $y_{lr}$. Once the $y_{lr}$ variables are integer, and form a spanning tree on the clusters, the other variables are automatically integer. Together, the variables define the GST in the following way. Constraint (9) ensures that exactly one node is selected from each cluster. Constraint (10) ensures that an edge connecting two nodes can be used only if the clusters corresponding to these nodes are connected by a global edge. Constraint (11) guarantees that node i belonging to cluster $V_l$ can be connected to at most one node in any given cluster $V_r$ (l ≠ r). Constraint (12) specifies the number of edges that are supposed to be selected for the GST. Constraint (13) ensures that clusters i and j can be adjacent in the directed tree rooted at cluster k only if clusters i and j are directly connected. Constraints (14) and (15) ensure that, with the exception of a root cluster k (k ∈ K), every cluster in the directed tree rooted in the cluster k has exactly one predecessor cluster.

It is possible to define a p-root relaxation of the subel formulation using a similar relaxation strategy defined for the ref-submst formulation. In other words, the p-root relaxation of the subel formulation includes only a subset of the $w^k_{ij}$ variables that define directed trees for the pre-specified set of the root clusters W (1 ≤ |W| ≤ K − 1). (Note that the directed trees in the p-root relaxation for the PCGMST problem are defined over the clusters of the graph G, while the directed trees in the p-root relaxation for the MST problem are defined over the nodes of the graph G.) It is easy to establish that this relaxation has similar properties as the p-root relaxation for the MST problem.

Lemma 3 The p-root relaxation of the subel formulation with the $w^k_{ij}$ variables defined only for k ∈ W ⊂ K (1 ≤ |W| ≤ K − 1), provides an optimal solution to the PCGMST problem if the solution is integer feasible to the subel formulation.

Proof First, observe that we can write the subel formulation using the subtour elimination constraints instead of the constraints (13–16). (We will refer to this new formulation as the subel′ formulation for the PCGMST problem.) This directly implies that any relaxation of the subel formulation that does not include a set of constraints (13–16) is equivalent to the subel′ formulation without a set of subtour elimination constraints. The rest of the proof is similar to the proof of Lemma 1. □

In an identical fashion to the MST problem, we can argue that if an integer solution of the p-root relaxation is infeasible to the subel formulation, then there must exist a cycle defined by the $y_{lr}$ variables (we call these global cycles since $y_{lr}$ represents global edge variables). Further, by definition of constraints (13–16), clusters that are used as roots of directed trees in the p-root relaxation of the subel formulation cannot belong to global cycles.

4 Exact procedures

In this section we discuss two exact procedures. First, we discuss a simple exact procedure called the rooting procedure (originally proposed by Pop (2004), for the GMST problem). We examine and test two variations of the rooting procedure to determine the best strategy while using the rooting procedure. Then, we develop a new (and simple) branch-and-cut algorithm for the PCGMST problem.


Fig. 4 Steps of our implementation of the Rooting Procedure (RPcycle)

4.1 Rooting procedure

This procedure is motivated by the fact that the subel formulation that includes all the $w^k_{ij}$ variables and the corresponding constraints may be enormous and difficult to solve. The idea behind the rooting procedure is, therefore, to try to find the optimal solution by using only a small subset of clusters as root clusters.

The rooting procedure starts with the p-root relaxation of the subel formulation with a single randomly selected root cluster $k_r$. In the next step, the p-root relaxation that includes only variables and constraints defining a directed tree from root node $k_r$ is solved. If the solution of this relaxation turns out to be a global tree, then, according to Lemma 3, the problem is solved; otherwise, variables and constraints for another root cluster are added (or used to replace existing root clusters) and the relaxation is resolved. This procedure is repeated until the optimal solution is found.

Pop (2004) does not elaborate on the strategy used to add or substitute root clusters in the rooting procedure. We suggest two simple ideas that come to mind when applying the rooting procedure. One is to randomly select clusters and add one root cluster at a time to the p-root relaxation. We refer to this variant of the rooting procedure as RPrand. A more deliberate strategy for the addition of root clusters to the p-root relaxation that has a potential to provide better computational results is to examine the solution to a given p-root relaxation for the existence of global cycles (i.e., cycles defined on the global edge variables) and to only add a cluster (or clusters) that is a part of these global cycles. We refer to this procedure as RPcycle. Our implementation of RPcycle is outlined in Fig. 4.

We computationally tested these two variants of the rooting procedure on a set of geographical TSPLIB instances. The results confirmed our expectations and showed a significant difference in the performance of the two versions of the rooting procedure. With a two hour CPU time limit, RPrand did not find the optimal solution in 11 out of 41 instances and, on average, required 2805.16 seconds of CPU time. Also, on average, RPrand required 7.34 root clusters to be added to the p-root relaxation in order to find the optimal solution. RPcycle, on the other hand, did not find the optimal solution in 9 instances, and required, on average, 2126.34 seconds of CPU time. The average number of root clusters needed to be added to the p-root relaxation in this case was 4.09. Additionally, in instances solved to optimality, RPcycle never needed more than 8 root clusters to find the optimal solution, while RPrand required 25 root clusters in the worst case. RPcycle also provided better lower bounds than RPrand in several instances. In particular, when compared over 11 instances that RPrand did not solve to optimality, the lower bound provided by RPcycle was on average 3.93% better than the one provided by RPrand.

Pop et al. (2006) independently describe the variant that we call RPcycle. Their computational results appear to be identical to Pop (2004) where no elaboration on the rooting procedure is provided (most of the results in Pop et al. (2006) appear to also be in Pop (2004)). In contrast, our experiments explicitly compare the performance of two different variants of the rooting procedure in the context of the PCGMST problem in order to get a better sense of the impact that the choice of the root clusters has on the efficiency of the rooting procedure for the PCGMST problem.

4.2 A simple branch-and-cut algorithm

In Sect. 3 we pointed out that an alternative approach for solving the p-root relaxation of the subel formulation is to progressively add violated subtour elimination constraints (SECs). In this section, we elaborate on this idea and present a new branch-and-cut algorithm for the PCGMST problem.

First, note that in a given solution of the p-root relaxation violated SECs can be defined over local edge variables or over global edge variables. Although it is not apparent which choice is better, it is clear that the number of SECs defined over global edge variables can be significantly lower than the number of SECs over local edge variables. Consequently, we use SECs defined over global edge variables (we call these global SECs) in our branch-and-cut algorithm. Another important decision that one needs to make is whether to add SECs at all nodes of the branch-and-bound tree or only at certain, designated, nodes. This issue is very important from an implementation perspective since the complexity of the separation problem may not be the same at different nodes of the branch-and-bound tree. In the p-root relaxation, the separation problem for the identification of violated SECs at any node of the branch-and-bound tree is a min-cut problem. At incumbent (integer) nodes, on the other hand, we can examine the set of edges selected in the solution and identify cycles using a simple depth-first search. These cycles immediately point to violated SECs. Consequently, once a cycle is identified, we can add a subtour elimination constraint defined over the clusters (i.e., a SEC defined over global edge variables) that are part of this cycle.
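The separation step at integer nodes can be illustrated with a short Python sketch. It is a sketch under assumptions, not the authors' implementation: clusters are numbered 0 to K − 1, the integer solution is given as a list of selected global edges without duplicates, and the function names are invented. A stack-based graph search is used in place of a recursive depth-first search; the cycle closed by the first non-tree edge is recovered from the parent pointers.

```python
def path_to_root(v, parent):
    """Clusters on the search-tree path from v up to the root of its component."""
    path = [v]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

def find_global_cycle(num_clusters, global_edges):
    """Return the clusters of one cycle formed by the selected global
    edges, or None if the edges form a forest."""
    adj = {k: [] for k in range(num_clusters)}
    for l, r in global_edges:
        adj[l].append(r)
        adj[r].append(l)
    parent = {}
    for start in range(num_clusters):
        if start in parent:
            continue
        parent[start] = None
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt == parent[node]:
                    continue  # the tree edge we arrived by
                if nxt not in parent:
                    parent[nxt] = node
                    stack.append(nxt)
                    continue
                # a non-tree edge closes a cycle through the common ancestor
                a, b = path_to_root(node, parent), path_to_root(nxt, parent)
                while len(a) > 1 and len(b) > 1 and a[-2] == b[-2]:
                    a.pop()
                    b.pop()
                return a + b[:-1][::-1]
    return None

def global_sec(cycle):
    """SEC over the cycle's clusters: the y variables of the returned
    cluster pairs must sum to at most the returned right-hand side."""
    members = sorted(set(cycle))
    pairs = [(l, r) for idx, l in enumerate(members) for r in members[idx + 1:]]
    return pairs, len(members) - 1

cycle = find_global_cycle(5, [(0, 1), (1, 2), (2, 0), (3, 4)])
pairs, rhs = global_sec(cycle)
```

In this example the four selected edges contain the cycle on clusters 0, 1, and 2, so the cut requires the three corresponding global edge variables to sum to at most 2.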

Besides the advantage of solving a simpler separation problem, the addition of cuts at incumbent (integer) nodes only may be more effective because the separation problem needs to be solved at fewer nodes. We tested both ideas by defining two branch-and-cut algorithms. The first algorithm, BCAinc, adds SECs only at incumbent (integer) nodes. The steps of this procedure are outlined in Fig. 5.

The second algorithm, BCAall, solves the min-cut problem at every node of the branch-and-bound tree, and whenever violated global SECs are found, they are added to the entire branch-and-bound tree. In our implementation of the BCAall we have used the min-cut algorithm of Stoer and Wagner (1997) implemented in LEDA 4.5 to solve the separation problem. The comparison of these two procedures is provided in Table 3. We can see that the proposed algorithm has comparable performance


Fig. 5 Steps of our branch-and-cut algorithm (BCAinc)

Table 3 Comparison of branch-and-cut algorithms BCAinc and BCAall for the PCGMST problem. (Note: The TSPLIB instances used for these tests were modified by adding randomly generated integer node weights in the range [0,10])

Clustering type   Number of    BCAinc            BCAall            Avg relative difference
                  instances    Avg. time (sec)   Avg. time (sec)   BCAinc vs BCAall
                                                                   UB        LB

Center            41           1606.23           1588.04           0.00%     −0.03%
Grid, μ = 3       32           2362.58           2177.03           −0.07%    −0.31%
Grid, μ = 5       32           2611.72           2293.01           −0.03%    −0.09%
Grid, μ = 7       32           1534.83           1450.86           0.01%     −0.08%
Grid, μ = 10      32           1189.16           1177.85           0.01%     −0.12%
Overall           169          1847.34           1729.41           −0.02%    −0.12%

with BCAinc providing slightly better upper bounds, but also somewhat worse lower bounds. However, due to the extreme simplicity of the implementation of BCAinc, we suggest the use of this algorithm.

The key advantage of BCAinc is the fact that it does not require sophisticated routines and can be easily coded using commercial optimization software such as ILOG Concert Technology, and can even be easily implemented in a modeling language like AMPL or GAMS. Additionally, the improvement in the time needed to solve the PCGMST problem using these procedures compared to the time needed to solve some alternative mathematical formulations (without using branch-and-cut) is quite significant. For example, the mathematical formulation that we used for the GMST problem in Golden et al. (2005)¹ needed several days to solve several random instances (these instances are described in Sect. 5). Our BCAinc procedure, on the other hand, solved most of these instances within two hours of CPU time.

¹ This formulation was solved using CPLEX 7.1 on a Sun Microsystems Enterprise 250 with 2 × 400 MHz processors and 2 GB RAM.


5 Additional computational results

All the procedures presented in this paper were coded using Microsoft Visual C++. The exact procedures were coded using ILOG Concert Technology 2.0 in CPLEX 9.0. All computations were performed on a workstation with a 2.66 GHz Xeon processor and 2 GB RAM.

The test instances used in this paper include two sets of problems identical to those that we used in Golden et al. (2005), except that here we have used randomly generated integer node weights in the range [0,10]. The first set represents the TSPLIB instances where the edge costs satisfy the triangle inequality. These instances were generated by Fischetti et al. (1997). This set contains five groups of problems that differ in the type of clustering in the network (the parameter μ in these instances can be interpreted as the average number of nodes in a cluster). The second set of test instances are random instances generated by Golden et al. (2005).

The two metaheuristic procedures, local search (LS) and genetic algorithm (GA), were applied with the following parameters that we determined using computational tests on a separate set of random instances. For the GA, we use a population size of 100 chromosomes, 15 generations, and we create 50% of offspring using crossover and mutation operators each. The fraction of the population discarded at the end of each generation is set to 0.5. The LS was performed using 500 independent randomly generated starting solutions. For the exact procedures, BCAinc and RPcycle, we have used a two hour time limit.

We first compared BCAinc and RPcycle over a set of geographical TSPLIB instances and obtained the following results. Out of 41 instances, BCAinc did not find a solution in 8 instances, while RPcycle did not find a solution in 9 instances. Overall, the average CPU time for the BCAinc was 1606.23 seconds, while RPcycle required 2126.34 seconds on average. In 9 instances where RPcycle did not find the optimum, BCAinc provided a better lower bound in 8 instances with an average improvement of 1.72%.

Due to the superiority of BCAinc over RPcycle, in our further computational tests, we have compared BCAinc with the two metaheuristics, LS and GA. Table 4 summarizes results for 130 TSPLIB instances where BCAinc found the optimum within two hours of CPU time, and Table 5 provides results for the remaining 39 unsolved

Table 4 Summary of computational results for BCAinc, LS, and GA on TSPLIB instances where BCAinc found the optimal solution within 2 hours of CPU time. Both LS and GA found the optimal solutions in all instances. Node weights are in the range [0,10]

Clustering type   Number of    BCAinc                        LS                GA
                  instances    Opt. sol.   Avg. time (sec)   Avg. time (sec)   Avg. time (sec)

Center            41           33          250.16            8.70              4.55
μ = 3             32           22          163.72            38.84             10.74
μ = 5             32           22          526.12            3.53              3.83
μ = 7             32           26          227.47            2.51              3.34
μ = 10            32           27          76.03             1.66              2.00
Overall           169          130         241.53            10.23             4.70


Table 5 Computational results for BCAinc, LS, and GA on TSPLIB instances where the optimal solution is unknown. (The time limit for BCAinc was 2 hours)

Problem      |E|      BCAinc               LS                     GA
name                  LB         UB        Soln     Time (sec)    Soln     Time (sec)

TSPLIB instances, center clustering
28pr136      8879     30406.9    34003     34003    5.59          34003    5.70
30kroa150    10809    9330.25    9655      9655     7.33          9655     11.28
30krob150    10807    9568.43    9896      9896     6.78          9896     11.53
39rat195     18478    406.117    541       532      36.49         532      18.41
40kroa200    19409    10597.5    11442     11442    17.81         11442    18.28
40krob200    19430    10123.5    11080     11047    17.59         11047    18.72
40d198       18841    6662.5     6844      6844     17.19         6844     42.23
45ts225      24650    54554.8    62036     62080    196.02        62016    30.56

TSPLIB instances, grid clustering, μ = 3
50bier127    7729     70052.6    70965     70965    135.44        70965    14.86
60pr136      9064     49394      52569     52575    20.47         52562    22.13
57kroa150    11005    13318.8    13796     13796    206.31        13796    32.17
58u159       12321    23212.3    23916     23916    229.64        23916    23.47
81rat195     18745    635.438    646       649      67.33         646      61.05
72kroa200    19636    13613.6    14574     14533    246.41        14529    59.02
76krob200    19661    14028.1    14930     14930    328.14        14930    74.25
67d198       19101    7588.51    7956      7956     58.47         7956     44.02
75ts225      24900    73343.2    78639     78609    68.74         78609    136.94
84pr226      25118    58815.4    62156     62052    54.91         62052    74.66

TSPLIB instances, grid clustering, μ = 5
36kroa150    10868    9549.6     9929      9929     10.70         9929     31.05
36krob150    10870    9226.83    9580      9580     10.00         9580     29.55
33pr152      11083    37505.3    37985     37985    9.03          37985    16.19
32u159       12046    16498.4    16871     16871    8.03          16871    9.52
49rat195     18600    450.332    514       515      263.28        513      44.69
47kroa200    19464    11020.6    11383     11383    25.67         11383    23.22
48krob200    19511    10261.1    10866     10866    26.22         10866    24.27
40d198       18772    6527.09    6892      6892     15.88         6892     39.69
45ts225      24726    46446.4    60900     60496    210.20        60431    27.77
50pr226      24711    55338.1    56444     56444    265.52        56444    25.84

TSPLIB instances, grid clustering, μ = 7
25krob150    10725    6841.77    7174      7174     5.47          7174     5.00
36rat195     18454    349.141    409       408      18.27         408      15.25
35kroa200    19335    8016.5     9507      9476     14.61         9474     15.02
36krob200    19362    8802.5     9581      9580     14.97         9580     15.38
32d198       18372    5836.9     6329      6329     136.39        6329     17.83
35ts225      24544    42156.9    50753     50636    20.53         50635    15.73

TSPLIB instances, grid clustering, μ = 10
25rat195     18225    257.55     313       313      7.69          313      6.41
25kroa200    19100    6008.25    6754      6754     7.75          6754     22.11
25krob200    19082    6094.43    6801      6801     7.67          6801     21.06
25d198       18149    5600.93    6067      6053     6.66          6053     7.59
25ts225      24300    34771.1    40199     40187    8.64          40187    7.81


instances. Both metaheuristic procedures found optimal solutions in all 130 instances. The average CPU time required by LS in these instances was 10.23 seconds, while the average CPU time required by the GA was 4.7 seconds. Over all 169 TSPLIB instances, GA always provided solutions at least as good as the solutions provided by LS, and in eight instances GA was better than LS. In these instances, the upper bound provided by GA was, on average, 0.14% better than the upper bound provided by LS. The average time BCAinc needed to solve 130 TSPLIB instances to optimality was 241.52 seconds. Of the remaining 39 instances where BCAinc did not find the optimal solution, the upper bound provided by BCAinc equals the best upper bound found by our metaheuristics in 24 instances. In the other 15 instances, the best upper bound found by our metaheuristics was, on average, 0.31% better than the one provided by BCAinc. Over the 39 instances that BCAinc did not solve to optimality, the gap between the best upper bound and the lower bound provided by BCAinc averaged 8.94%.
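The percentages quoted above can be checked against the values in Table 5. The following Python sketch is not part of the original study. It assumes that the gap is measured relative to the lower bound, (UB − LB)/LB, using the best upper bound among BCAinc, LS, and GA, and that the improvement of GA over LS is measured relative to the LS value. The function and variable names are illustrative.

```python
# Center-clustering rows of Table 5: name -> (LB, BCAinc UB, LS solution, GA solution)
center_rows = {
    "28pr136": (30406.9, 34003, 34003, 34003),
    "30kroa150": (9330.25, 9655, 9655, 9655),
    "30krob150": (9568.43, 9896, 9896, 9896),
    "39rat195": (406.117, 541, 532, 532),
    "40kroa200": (10597.5, 11442, 11442, 11442),
    "40krob200": (10123.5, 11080, 11047, 11047),
    "40d198": (6662.5, 6844, 6844, 6844),
    "45ts225": (54554.8, 62036, 62080, 62016),
}

# The eight (LS, GA) solution pairs in Table 5 where GA is strictly better than LS
ga_better = [
    (62080, 62016), (52575, 52562), (649, 646), (14533, 14529),
    (515, 513), (60496, 60431), (9476, 9474), (50636, 50635),
]


def gap_percent(lb, ub):
    """Gap in percent, relative to the lower bound (an assumed convention)."""
    return 100.0 * (ub - lb) / lb


def improvement_percent(ls, ga):
    """Improvement of GA over LS in percent, relative to the LS value (assumed)."""
    return 100.0 * (ls - ga) / ls


# Gap between the lower bound and the best upper bound found by any procedure
gaps = {
    name: gap_percent(lb, min(bca_ub, ls, ga))
    for name, (lb, bca_ub, ls, ga) in center_rows.items()
}
avg_improvement = sum(improvement_percent(ls, ga) for ls, ga in ga_better) / len(ga_better)

print(f"average GA improvement over LS: {avg_improvement:.2f}%")
for name, gap in gaps.items():
    print(f"{name}: gap {gap:.2f}%")
```

Under these assumptions the eight pairs give an average improvement of about 0.14%, which agrees with the figure reported above. Applying `gap_percent` to all 39 rows of Table 5 in the same way should give a value close to the reported 8.94%, which suggests, but does not confirm, that the gap is taken relative to the lower bound.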

In the case of random instances, our branch-and-cut algorithm found the optimal solution for 41 out of 42 instances within two hours of CPU time (see Table 6). In instances where the optimal solution is known, BCAinc needed 32.93 seconds on average to find the optimal solution. Over the same set of instances, the average CPU time was 6.2 seconds for LS and 5.98 seconds for GA. LS and GA each failed to find an optimal solution for one of the problems where the optimal solution is known.
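As a consistency check, the BCAinc figure can be recomputed from the Time column of Table 6. This sketch is not from the paper. It assumes that the instance stopped at the two-hour limit (7,200 seconds) is the one excluded from the average, and the variable names are illustrative.

```python
TIME_LIMIT = 7200.0  # two hours of CPU time, in seconds

# BCAinc CPU times (sec) from Table 6, listed in table order and grouped by K
bcainc_times = [
    # K = 15
    7.63, 2.77, 3.16, 0.33, 0.33, 19.24, 8.30, 6.60, 11.36,
    # K = 20
    2.95, 8.91, 85.30, 3.80, 60.56, 13.08, 43.36, 42.48, 1.80,
    # K = 25
    88.73, 19.47, 3.81, 12.00, 60.33, 12.39,
    # K = 30
    5.34, 2.86, 1.58, 30.67, 8.83, 18.38, 11.73, 15.50, 189.02,
    # K = 40
    2.58, 22.00, 12.83, 9.25, 20.14, 6.97, 150.20, 7200.00, 323.56,
]

# Keep only the runs that finished before the time limit
solved = [t for t in bcainc_times if t < TIME_LIMIT]
mean_time = sum(solved) / len(solved)

print(f"{len(solved)} of {len(bcainc_times)} instances solved, "
      f"mean time {mean_time:.2f} sec")
```

With the 42 times listed in Table 6, this leaves 41 solved instances and a mean of 32.93 seconds, matching the value reported above.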

In order to test the sensitivity of our procedures with respect to assigned node prizes, we repeated our experiments over all 169 TSPLIB instances using a different node-prize function with 5 possible prize values (0, 1, 10, 100, 1000). The results are summarized in Table 7. In these experiments, BCAinc found the optimal solution in 166 out of 169 instances within two hours of CPU time, with an average CPU time of 113.95 seconds (see footnote 2). Both LS and GA found the optimal solution in all 166 instances, with average CPU times of 14.37 and 9.56 seconds, respectively. In the three instances where BCAinc did not find the optimal solution, LS and GA provided the same solutions, which were 12.65% from the best known lower bound and 0.03% better than the upper bound provided by BCAinc. These results indicate that higher node prizes (relative to edge costs) tend to make the problem easier to solve for BCAinc. At the same time, the higher importance of the node prizes seems to make problems slightly more difficult to solve for the metaheuristics (LS and GA) in terms of computation time.

6 Conclusions

In this paper, we introduced the PCGMST problem, the prize-collecting version of the GMST problem, which has potential applications in telecommunications network design.

We presented three polynomial-time upper bound heuristics for the PCGMST problem, and also adapted two heuristic search procedures for the GMST problem, local search and a genetic algorithm, to the PCGMST problem. We discussed a formulation and solution strategy called the rooting procedure, introduced by Pop (2004) for the GMST problem. We experimented with two variants of the rooting

Footnote 2: In this case, we had to adjust the MIP gap in the CPLEX settings due to the higher magnitude of the cost functions.


Table 6 Computational results for BCAinc, LS, and GA on random instances

Problem characteristics   BCAinc                LS                    GA
K     |V|    |E|          Soln     Time (sec)   Soln     Time (sec)   Soln     Time (sec)

15    120    2,000        −54      7.63         −54      1.66         −54      1.66
15    120    3,000        −69      2.77         −69      1.70         −69      2.00
15    120    6,000        −83      3.16         −83      1.66         −83      1.59
15    150    3,000        −82      0.33         −82      2.14         −82      1.45
15    150    5,000        −80      0.33         −80      2.08         −80      1.91
15    150    9,000        −98      19.24        −98      2.22         −98      1.94
15    180    4,000        −81      8.30         −81      2.53         −81      2.30
15    180    7,000        −96      6.60         −96      2.52         −96      2.39
15    180    14,000       −94      11.36        −94      2.39         −94      2.31

20    120    1,500        −76      2.95         −76      2.70         −76      2.75
20    120    3,000        −100     8.91         −100     2.86         −100     2.92
20    120    6,000        −112     85.30        −112     2.66         −112     2.73
20    160    3,000        −96      3.80         −96      4.09         −96      3.13
20    160    5,000        −94      60.56        −94      3.78         −94      3.72
20    160    10,000       −131     13.08        −131     3.52         −131     3.44
20    200    5,000        −101     43.36        −101     4.94         −101     4.72
20    200    10,000       −108     42.48        −108     4.58         −108     3.64
20    200    15,000       −128     1.80         −128     4.50         −128     4.45

25    150    3,000        −89      88.73        −85      5.44         −89      5.67
25    150    6,000        −111     19.47        −111     5.31         −111     4.14
25    150    9,000        −130     3.81         −130     5.20         −130     4.78
25    200    5,000        −111     12.00        −111     7.11         −111     6.99
25    200    10,000       −134     60.33        −134     7.13         −134     6.59
25    200    15,000       −148     12.39        −148     7.14         −147     6.56

30    120    2,000        −89      5.34         −89      5.78         −89      5.45
30    120    3,000        −144     2.86         −144     6.17         −144     7.72
30    120    6,000        −159     1.58         −159     6.30         −159     5.91
30    150    3,000        −118     30.67        −118     7.22         −118     7.08
30    150    6,000        −164     8.83         −164     7.14         −164     6.89
30    150    9,000        −139     18.38        −139     7.09         −139     6.84
30    180    4,000        −146     11.73        −146     9.30         −146     7.58
30    180    7,000        −151     15.50        −151     9.17         −151     8.48
30    180    14,000       −153     189.02       −153     8.38         −153     7.16

40    120    1,500        −112     2.58         −112     10.05        −112     9.33
40    120    3,000        −157     22.00        −157     9.75         −157     9.45
40    120    6,000        −180     12.83        −180     9.05         −180     9.58
40    160    3,000        −142     9.25         −142     13.86        −142     12.77
40    160    5,000        −170     20.14        −170     13.36        −170     12.70
40    160    10,000       −205     6.97         −205     12.49        −205     11.36
40    200    5,000        −177     150.20       −177     18.19        −177     17.00
40    200    10,000       −195*    7,200.00     −194     16.94        −194     17.36
40    200    15,000       −223     323.56       −223     15.33        −223     15.91

*BCAinc did not complete the search for this problem within 2 hours of CPU time. The upper bound at the time the search was terminated was −194


Table 7 Summary of computational results for BCAinc, LS, and GA on TSPLIB instances with node weights [0,1,10,100,1000]. Both LS and GA found the optimal solutions in the 166 instances that BCAinc solved to optimality within 2 hours of CPU time

Clustering   Number of   BCAinc      BCAinc            LS                GA
type         instances   Opt. sol.   Avg. time (sec)   Avg. time (sec)   Avg. time (sec)
Center       41          41          55.89             8.54              7.17
μ = 3        32          31          348.13            52.45             23.14
μ = 5        32          31          56.82             7.46              10.18
μ = 7        32          31          34.66             3.46              4.80
μ = 10       32          32          93.63             2.24              3.50
Overall      169         166         113.95            14.37             9.56

procedure for the PCGMST problem. We then developed a very simple and effective exact solution procedure.

Our computational experiments show that our upper bound heuristics, which are straightforward adaptations of the MST heuristics, do not provide good bounds for the PCGMST problem. However, when applied within the framework of a tailored heuristic-repetition paradigm called the pilot method, these procedures provide substantially better results (finding solutions that are, on average, less than 3% from optimality).

The exact procedure that we proposed in this paper is a novel branch-and-cut algorithm that works with incumbent (integer) nodes of the branch-and-bound tree. Results of our computational study indicate that this algorithm provides better bounds for the PCGMST problem than the two variants of the rooting procedure that we tested in this paper. Our computational experiments further suggest that our branch-and-cut algorithm is a good choice (i.e., can provide optimal solutions within relatively short CPU times) for networks with up to 200 nodes. For larger networks, the heuristic search procedures are a better alternative. These procedures (LS and GA) have demonstrated excellent performance for two different types of functions used for the node prizes. In particular, out of 296 TSPLIB instances for which the optimum is known, both LS and GA rapidly found the optimal solution in all 296 instances. For the random instances, both algorithms found optimal solutions for 40 out of 41 instances for which the optimal solution is known.

References

Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows: Theory, Algorithms and Applications. Prentice-Hall, New Jersey (1993)

Cockayne, E.J., Melzak, Z.A.: Steiner’s problem for set terminals. Q. Appl. Math. 26(2), 213–218 (1968)

Dror, M., Haouari, M., Chaouachi, J.: Generalized spanning trees. Eur. J. Oper. Res. 120(3), 583–592 (2000)

Duin, C.: Steiner problem in graphs: approximation, reduction, variation. Ph.D. thesis, University of Amsterdam (1993)

Duin, C., Voß, S.: The pilot method: a strategy for heuristic repetition with application to the Steiner problem in graphs. Networks 34(3), 181–191 (1999)

Duin, C., Voß, S.: Solving group Steiner problems as Steiner problems. Eur. J. Oper. Res. 154(1), 323–329 (2004)


Feremans, C.: Generalized spanning trees and extensions. Ph.D. thesis, Université Libre de Bruxelles (2001)

Feremans, C., Labbé, M., Laporte, G.: A comparative analysis of several formulations for the generalized minimum spanning tree problem. Networks 39(1), 29–34 (2002)

Feremans, C., Labbé, M., Laporte, G.: The generalized minimum spanning tree problem: polyhedral analysis and branch-and-cut algorithm. Networks 43(2), 71–86 (2004)

Fischetti, M., Salazar-Gonzalez, J.J., Toth, P.: Symmetric generalized traveling salesman problem. Oper. Res. 45(3), 378–394 (1997)

Golden, B., Raghavan, S., Stanojevic, D.: Heuristic search for the generalized minimum spanning tree problem. INFORMS J. Comput. 17(3), 290–304 (2005)

Haouari, M., Chaouachi, J., Dror, M.: Solving the generalized minimum spanning tree problem by a branch-and-bound algorithm. J. Oper. Res. Soc. 56(4), 382–389 (2005)

Magnanti, T.L., Wolsey, L.A.: Optimal trees. In: Ball, M., Magnanti, T.L., Monma, C.L., Nemhauser, G.L. (eds.) Network Models. Handbooks in Operations Research and Management Science, vol. 7, pp. 503–615. North-Holland, Amsterdam (1995)

Martin, R.: Using separation algorithms to generate mixed integer model reformulations. Oper. Res. Lett. 10(3), 119–128 (1991)

Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs. Springer, Heidelberg (1996)

Myung, Y.S., Lee, C.H., Tcha, D.W.: On the generalized minimum spanning tree problem. Networks 26(4), 231–241 (1995)

Pop, P.C.: New models of the generalized minimum spanning tree problem. J. Math. Model. Algorithms 3(2), 153–166 (2004)

Pop, P.C., Kern, W., Still, G.: A new relaxation method for the generalized minimum spanning tree problem. Eur. J. Oper. Res. 170(3), 900–908 (2006)

Shyu, S.J., Yin, P.Y., Lin, B.M.T., Haouari, M.: Ant-tree: an ant colony optimization approach to the generalized minimum spanning tree problem. J. Exp. Theor. Artif. Intell. 15(1), 103–112 (2003)

Stoer, M., Wagner, F.: A simple min-cut algorithm. J. ACM 44(4), 585–591 (1997)

