A PARALLEL INTEGER PROGRAMMING
APPROACH TO GLOBAL ROUTING
by
Tai-Hsuan Wu
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy (Electrical and Computer Engineering)
at the
UNIVERSITY OF WISCONSIN-MADISON
2011
Fair use allowance: You are allowed to reproduce this dissertation with appropriate credit or citation. If financial profit is involved, e.g., for use in a textbook or sales from distribution of copies, contact me first. All cited materials are copyright of their respective authors.
©Copyright by Tai-Hsuan Wu 2011
All Rights Reserved
The dissertation is approved by the following members of the Final Oral Committee:
Azadeh Davoodi, Assistant Professor, Electrical and Computer Engineering, Chair
Jeffrey T. Linderoth, Associate Professor, Industrial and Systems Engineering
Kewal K. Saluja, Professor, Electrical and Computer Engineering
Mikko H. Lipasti, Professor, Electrical and Computer Engineering
Parameswaran Ramanathan, Professor, Electrical and Computer Engineering
Date of final oral examination: May 23, 2011
Month and year degree to be awarded: August 2011
University of Wisconsin – Madison
Abstract
A Parallel Integer Programming Approach to Global Routing
Tai-Hsuan Wu
Chair of the Supervisory Committee: Professor Azadeh Davoodi
Electrical and Computer Engineering
This work introduces a parallel algorithm for an important Electronic Design Automation problem known
as Global Routing. Global routing is a stage in the VLSI design cycle during which millions of
interconnects are planned on the chip. The existing global routing procedures are highly sequential in
nature, and thus developing a parallel algorithm is a challenging task.
In this dissertation, first, a global routing procedure known as GRIP is proposed which heavily relies
on Integer Programming (IP) techniques. GRIP decomposes the global routing problem into smaller
subproblems corresponding to rectangular subregions on the chip together with their net assignments.
GRIP solves the individual subproblems and the connection problem between them. It is the first
successful realization of a heavily IP-centric procedure for Global Routing on large industrial instances.
Due to the effective use of Integer Programming techniques and decomposition, GRIP demonstrates
tremendous improvement in wirelength, with the same or lower usage of the routing resources, compared
to the competing global routers in the open literature.
Although GRIP exploits limited parallelism when solving the subproblems and when connecting
them, it takes significant time (from many hours to days) to solve the challenging ISPD 2007 and
2008 benchmark instances.
To improve the computational runtime in GRIP without much degradation in the solution quality, next,
this work presents PGRIP, a procedure which allows solving the subproblems with a very high degree of
parallelism. Concurrent processing of the routing subproblems is desirable for effective parallelization.
However, achieving global routing solutions with no (or low) over-usage of routing resources is
challenging without strong, coordinated algorithmic control. PGRIP addresses this challenge via a
patching phase which offers a one-time synchronization between the subproblems to minimize the
likelihood of over-usage of routing resources when attempting to connect them after their concurrent
processing. Patching also relies on Integer Programming techniques and provides a one-time feedback to
each subproblem to enhance its connectivity to its adjacent subproblems.
Similar to GRIP, PGRIP maintains the large gap in the solution quality compared to the competing
academic global routers. Unlike GRIP, it is able to achieve a significantly high degree of parallelism. The
runtimes of different steps in PGRIP can be budgeted by the user. In the computational experiments,
PGRIP achieves the same (wall) runtime (of about 75 minutes) regardless of the size and difficulty of the
problem instance, while running on a grid of a few hundred CPUs, each with only 2GB of memory. In return, a more
difficult instance has a higher number of subproblems and is solved using a higher number of parallel
CPUs. Obtaining similar runtimes is a unique feature of PGRIP, compared to the competing global
routing methods which are shown to take significantly longer for more challenging problem instances.
Moreover, the memory usage in both GRIP and PGRIP is significantly lower than in the competing
procedures because each processor is assigned to solve a small-sized subproblem.
The third component of this dissertation is utilizing the parallel procedure of PGRIP to solve a
variation of global routing to minimize interconnect power. The work introduces Power-GRIP which
offers an IP formulation for minimizing interconnect power in multi-supply voltage (MSV) domains. The
IP integrates a proposed mathematical modeling for the interconnect power. Power-GRIP adapts and
extends the procedures used in PGRIP, while it is the first work to study interconnect power minimization
in MSV domains. Simulation results demonstrate significant saving in the interconnect power metric for
global routing without any degradation in wirelength and routing resource usage, compared to an
initially-provided and wirelength-optimized routing solution.
In summary, with the aid of large-scale parallelism provided by computational grids, this work
demonstrates that the use of integer programming, which was perhaps viewed as too time-consuming
and hence impractical for global routing, allows generating significantly higher quality solutions while
meeting practical runtime requirements.
TABLE OF CONTENTS
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Motivation
1.2 Contributions of This Dissertation
Chapter 2: Global Routing: Preliminaries and Literature Review
2.1 Problem Definition
2.2 Fundamental Techniques
2.3 Shortcomings of the Existing Techniques
2.4 Multi-objective Global Routing
Chapter 3: GRIP: Global Routing via Integer Programming
3.1 An Integer Program for Global Routing
3.2 Solution Procedure via Price-and-Branch
3.3 Decomposition for Scalability
3.4 Handling Overflow
3.5 Comparison to Optimization-Based Methods
3.6 Simulation Results
Chapter 4: PGRIP: A Parallel Integer Programming Approach to Global Routing
4.1 Challenges of Parallelizing GRIP
4.2 An Integer Programming Formulation of PGRIP
4.3 The Parallel Global Routing Procedure
4.4 Simulation Results
Chapter 5: Power-GRIP: Power-Driven Global Routing for MSV Domains
5.1 Interconnect Power Modeling in MSV Domains
5.2 Placement of Level Converters
5.3 Power-Driven MSV-Based Global Routing
5.4 Simulation Results
Chapter 6: Conclusions and Future Works
6.1 Conclusions
6.2 Future Work 1: Layer Directive Global Routing
6.3 Future Work 2: Enhancing the Correlation Between the Placement and Routing Stages
Bibliography
LIST OF FIGURES
1.1 Physical design flow
2.1 Published articles of the Global Routing problem since 1983. The statistical data is obtained from Microsoft Academic Search with the keyword “Global Routing”
2.2 The construction of grid graph for the global routing problem
2.3 A 2-D view of the design after placement (left); A 3-D global routing grid-graph (right)
2.4 Maze routing with bounding box may lead to a sub-optimal solution in the presence of routing obstacles (designated in red in the figure)
2.5 Pattern Routing and Monotonic Routing can be used to speed up the shortest path searching process. The search space, however, becomes limited as a result of few pattern possibilities
2.6 Two approaches of tree construction for multi-terminal nets
2.7 Hierarchical Global Routing
2.8 (a) A six-terminal net which can be decomposed into five two-terminal sub-nets (b) Routing these five two-terminal subnets independently yields a sub-optimal solution (c) A solution with a smaller wirelength can be found while sharing the resources
3.1 Overview of the price-and-branch procedure for GRIP
3.2 Improving routes via the shortest path algorithm on a weighted grid-graph
3.3 Procedure to identify new candidate routes with reduced cost via rerouting segments of an existing route
3.4 Modifying grid-graph of a subproblem to handle floating terminals
3.5 (a) Defining subproblems using initial Flute-based net planning. (b) Improving net assignment to the subproblems via detouring
3.6 GRIP’s procedure to define and solve the subproblems
3.7 Connecting route-segments in adjacent subproblems
3.8 Defining few subproblems around the edges with overflow
3.9 Comparison of GRIP-based partitioning versus uniform-based partitioning in the benchmark adaptec1 [3]
4.1 GRIP solves a subproblem with some flexibility in routing the “inter-region” nets, which results in exploring limited parallelism
4.2 Example of planning inter-region nets when processing two adjacent subproblems independently
4.3 Overview of parallel GRIP
4.4 Example illustrating the PGRIP patching procedure
4.5 Uniform allocation of routing resources for parallel-connecting subproblems
4.6 The projected congestion map of adaptec1 benchmark instance at different phases of PGRIP
5.1 Overview of Global Routing with Multi-Supply Voltage
5.2 MSV-based Global Routing model with level converters
5.3 Decomposition of net with multi supply voltage levels
5.4 Modeling route capacitance on a Global Routing edge
5.5 Dependence of three types of capacitance on edge utilization in metal layer 1
5.6 Valid on-route level converter locations for one net
5.7 Comparison between (b) wirelength-optimized Global Routing and (c) power-optimized Global Routing
5.8 Convex expression of edge capacitance in metal1 with respect to the edge utilization
5.9 Power-aware route generation
5.10 Penalizing the edge capacitance if the rerouting of a net causes a larger edge utilization compared to phase 1
5.11 Decomposition into smaller-sized subproblems similar to GRIP and PGRIP
6.1 Different metal layers have different wire widths. The even numbered metal layers run horizontally across the picture, while the odd numbered layers run perpendicular to the picture
6.2 Overview of layer directive Global Routing
6.3 Inserting buffers in the post-placement stage may create congested regions and cause routability issue
LIST OF TABLES
3.1 The ISPD’07 and ISPD’08 benchmarks
3.2 Runtime information of GRIP (without the overflow step)
3.3 Results of GRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5
4.1 Results of PGRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5
4.2 Estimated overflow of the initial subproblems
4.3 Runtime comparison of PGRIP and GRIP
5.1 Results of the level converter placement for the ISPD’08 benchmarks
5.2 Results of Power-GRIP for the ISPD’08 benchmarks. The wirelength is scaled to 10^5. Power and capacitance are scaled to 10^3
Chapter 1
INTRODUCTION
With the rapid advances in nanometer VLSI process technology, modern circuit design with billions
of transistors is becoming increasingly complex, in turn placing even higher computing demands on
the Electronic Design Automation (EDA) tools. Aggressive technology scaling not only requires
simultaneous optimization of conflicting objectives such as power, timing, area, and noise, but also
requires considering the impact of manufacturing inaccuracies such as process variations and
sub-wavelength lithography. These challenges complicate achieving design closure and prolong the
design cycle.
Physical design, which precedes the fabrication of a circuit, is one of the most critical stages in
the design flow because it is the step where many of the objectives can be accurately modeled and
thus effectively optimized. Physical design is composed of several steps such as placement, clock
network synthesis, routing, and design for manufacturability. As shown in Figure 1.1, these steps
are applied in an iterative manner. During these iterations, various design objectives are optimized
while checking that a set of design rules is satisfied. If the design rules are not satisfied at the end
of one iteration in the physical design flow, then a new iteration starts and the process continues.
To achieve design closure faster, it is always desirable to speed up the physical design stage
since it is among the most time-consuming steps in the design flow. This can be done with the
recent advances in parallel architectures, by accelerating the individual components in the design
flow (e.g., parallel placement or clock network synthesis). Alternatively, faster design closure is
possible by reducing the number of iterations in the design flow. This may be achieved by improving
the algorithms driving the steps in the design flow to generate a higher quality solution. With the
recent advances in parallel architectures and cloud computing, one way to improve the quality of the
existing algorithms is to revisit the alternative procedures which were considered impractical due to
their high computational demands.
Figure 1.1: Physical design flow. The flow proceeds from Logic Design through Physical Design
(Placement & CTS, Routing, DFM) to Post Layout Simulation and Tape out. Routing comprises Global
Routing, Track Assignment, and Detail Routing. Global Routing may also be incorporated in the
placement stage for better congestion estimation.
1.1 Motivation
In this dissertation, we investigate an alternative optimization technique, Integer Programming, for
an important EDA problem known as Global Routing. Integer Programming was considered impractical
for Global Routing due to its large single-thread runtime requirement on large-sized industrial
design instances. Nevertheless, we demonstrate that by utilizing parallel computing, Integer
Programming is applicable and allows obtaining significant improvement in the solution quality while
meeting the practical runtime requirements.
Global routing fits in a modern physical design flow as shown in Figure 1.1. The routing stage
is further decomposed into three sub-stages of global routing, track assignment, and detail routing.
Global routing plans the approximate routing path for each net for a given placed netlist. The
approximate routing path of each net is then assigned to the available routing tracks, which represent
the physical routing resources in a design. During the detail routing stage, various sophisticated
design rules such as wire spacing constraints and via stacking rules are checked and repaired to
ensure design and manufacturability closure. Among all these three stages, global routing is perhaps
the most critical one. This is because the interconnect planning performed at this stage directly
impacts various design objectives such as the chip area, circuit timing, power consumption, the
complexity of detail routing stage and the number of iterations required to complete the design cycle.
Moreover, the solution obtained from global routing is not only utilized during track assignment and
detail routing, but as shown in Figure 1.1, it may also be passed over to the placement stage in order
to provide a more accurate wirelength and congestion estimation and to enhance the routability of
the design.
Because even the simplest version of the routing problem (i.e., rectilinear routing of a single
multi-terminal net with minimum wirelength) is NP-complete [40], a considerable body
of work in the past three decades has focused on solving the global routing problem and on its
implementation for various design styles such as gate arrays, sea of gates, standard cell-based and
custom designs [44] [49] [66] [67].
Global Routing is an inherently sequential EDA problem. On one hand, this is because most
of the state-of-the-art academic and commercial global routers rely on a rip-up and reroute based
procedure, which is iterative by nature and hard to parallelize. The basic step in rip-up and reroute
is to remove one or more routes passing through the congested regions and replace them with new
routes going through the less congested ones. Although various techniques have been proposed to
improve the efficiency of the rip-up and reroute procedure, routing multiple nets simultaneously is
still difficult because of the competition for routing resources [47]. Specifically, if two nets in the
same region are rerouted concurrently, they may use the same routing resource to complete their
new routes and result in unexpected over-utilization of routing resources. On the other hand, one
can try to partition a large-sized design into smaller subregions, and then independently route these
subregions in parallel. Nevertheless, many nets may belong to multiple subregions, and guaranteeing
the connectivity between adjacent subregions without over-utilizing the routing resources is not
trivial, and in fact is the main challenge of this approach.
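The competition for routing resources described above can be illustrated with a small sketch. The instance is hypothetical: the edge names, capacities, and candidate routes are assumptions made for illustration, and the greedy `best_route` rule is a stand-in for a real rerouting step, not the procedure of any particular router. The sketch contrasts rerouting two nets against the same stale usage snapshot with rerouting them one after the other.

```python
from collections import Counter

# Hypothetical toy instance: every routing edge has capacity 1.
capacity = {"e1": 1, "e2": 1, "e3": 1, "e4": 1, "e5": 1}

# Candidate routes per net, each route given as a tuple of edges.
# Both nets prefer the short route through the shared edge "e1".
candidates = {
    "netA": [("e1",), ("e2", "e3")],
    "netB": [("e1",), ("e4", "e5")],
}

def best_route(routes, usage, capacity):
    """Pick the shortest route that still fits in the capacity seen in `usage`."""
    feasible = [r for r in routes if all(usage[e] < capacity[e] for e in r)]
    # Fall back to all routes if none fits (overflow is then unavoidable).
    return min(feasible or routes, key=len)

def total_overflow(usage, capacity):
    """Sum of usage beyond capacity over all edges."""
    return sum(max(0, usage[e] - capacity[e]) for e in capacity)

def reroute_concurrently(candidates, capacity):
    """Every net decides against the same (empty) snapshot of edge usage."""
    snapshot = Counter()
    chosen = {net: best_route(routes, snapshot, capacity)
              for net, routes in candidates.items()}
    usage = Counter(e for route in chosen.values() for e in route)
    return total_overflow(usage, capacity)

def reroute_sequentially(candidates, capacity):
    """Each net sees the usage left behind by the nets routed before it."""
    usage = Counter()
    for routes in candidates.values():
        usage.update(best_route(routes, usage, capacity))
    return total_overflow(usage, capacity)

print(reroute_concurrently(candidates, capacity))  # both nets take e1
print(reroute_sequentially(candidates, capacity))  # netB detours via e4, e5
```

In this toy case the concurrent variant over-uses the shared edge, while the sequential variant avoids it at the price of a longer route for the second net and of losing the parallelism.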
1.2 Contributions of This Dissertation
In this dissertation, we present three related topics on parallelizing Global Routing via Integer
Programming. We first propose a method to decompose a design into rectangular subregions together
with their net assignments to form smaller-sized subproblems. These subproblems are solved via an
Integer Programming (IP) based procedure in a systematic order, and then a subregion connection
phase is applied to generate a complete solution. This procedure significantly improves the solution
quality but has a prohibitively large runtime due to the limited parallelism. Thus, to reduce the
runtime, the procedure is extended to solve all the subproblems with a much larger degree of
parallelism, using only a one-time synchronization which significantly enhances the quality of
the subregion connectivity. Finally, we propose an extended IP formulation of Global Routing
to minimize the contemporary objective of interconnect power, which is increasingly gaining
importance in modern VLSI design. We show that a similar parallel procedure is applicable to solve this
extended IP formulation.
1.2.1 Contribution 1: Global Routing via Integer Programming [70] [72]
We first introduce GRIP, a Global Routing technique which heavily relies on integer programming
techniques. As the first step towards achieving parallelism, GRIP decomposes the large-sized
problem into smaller-sized subproblems. The smaller-sized subproblems are solved individually via an
Integer Program (IP), which aims to select one route for each net from a set of promising candidate
routes. Later, the route fragments of the same net in adjacent subproblems are connected to form the
complete Global Routing solution. To further reduce the overflow, an IP-based overflow reduction
phase is applied at the end. The first contribution of this dissertation can be summarized as follows:
• An integer program for the global routing problem which minimizes the wirelength and via
costs of the routed nets as its objective. The procedure is directly applied to a 3-D graph
model of the problem, thus avoiding a commonly-used layer assignment phase;
• Generation of a set of promising candidate routes for each net using a linear-programming
based pricing procedure. The pricing is an iterative procedure that effectively and
systematically considers the impact of currently-generated routes when generating new ones;
• A decomposition procedure to make integer programming applicable to large-sized instances.
The routing problem is divided into a set of balanced subproblems in terms of the complexity
required to solve them. Consequently the runtime of our procedure depends on the number of
subproblems, and some of the non-adjacent ones can be processed in parallel;
• A novel method called “floating terminals” for retaining connection flexibility when solving
the decomposed subproblems;
• A final “clean-up” integer programming-based procedure for routing a set of designated nets
to minimize the overflow.
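The core idea of selecting one route per net from a set of candidate routes can be sketched on a toy instance. The code below is only an illustration under stated assumptions: the instance, the function name `select_routes`, and the hard capacity constraint are hypothetical, and exhaustive enumeration stands in for an integer programming solver. It is not the formulation or the price-and-branch procedure of GRIP.

```python
from itertools import product

def select_routes(candidates, capacity):
    """Choose one candidate route per net, minimizing total cost.

    candidates: net -> list of (edges, cost), where cost stands for the
                wirelength plus via cost of that route (assumed given).
    capacity:   edge -> maximum number of routes allowed on that edge.
    Returns (total_cost, {net: (edges, cost)}) or None if no selection
    respects all capacities.
    """
    nets = sorted(candidates)
    best = None
    # Enumerate every way of picking exactly one route for each net.
    for combo in product(*(candidates[n] for n in nets)):
        usage = {}
        for edges, _cost in combo:
            for e in edges:
                usage[e] = usage.get(e, 0) + 1
        # Discard selections that over-use any edge.
        if any(u > capacity[e] for e, u in usage.items()):
            continue
        total = sum(cost for _edges, cost in combo)
        if best is None or total < best[0]:
            best = (total, dict(zip(nets, combo)))
    return best

# Hypothetical instance: both nets would like to use edge "a".
capacity = {"a": 1, "b": 1, "c": 1}
candidates = {
    "net1": [(("a",), 1), (("b", "c"), 2)],
    "net2": [(("a",), 1), (("b",), 3)],
}
best = select_routes(candidates, capacity)
print(best)
```

Because all nets are considered at once, the selection gives edge "a" to the net whose alternative is more expensive, which a net-by-net greedy choice would not necessarily do. Enumeration grows exponentially with the number of nets, so it is usable only on tiny instances like this one.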
In the simulation results, GRIP achieves an average 9.23% and 5.24% improvement in the summation
of wirelength and via cost for the ISPD’07 and ISPD’08 benchmarks respectively. These
results are compared to the best solutions reported for each case from four state-of-the-art academic
global routers. The remarkable improvement is due to a combination of the concurrent nature of IP,
effective pricing for candidate route generation, directly working with the 3-D model of the problem,
effective decomposition into subproblems, and effective recombination of the solution fragments.
1.2.2 Contribution 2: A Parallel Integer Programming Approach to Global Routing [71]
Although GRIP can produce high quality solutions, it has a prohibitively long runtime to complete
global routing. To tackle this issue, a parallel global routing procedure called PGRIP is presented,
which is able to significantly reduce the (wall) runtime of GRIP by utilizing many more processors.
PGRIP removes a major bottleneck in GRIP by routing all the subproblems independently and
ensuring (through a one-time synchronization) that the routing results of adjacent subproblems can
be effectively patched together. Moreover, the runtimes of different steps in PGRIP can be budgeted
by the user. The memory usage in both GRIP and PGRIP is significantly lower than in the competing
procedures because each processor is assigned to solve a small-sized subproblem. In our
computational experiments, PGRIP achieves the same (wall) runtime (of about 75 minutes) regardless of
the size and difficulty of the problem instance while running on a grid of a few hundred CPUs, each
with only 2GB of memory.
There are several challenges in obtaining a high-quality solution (of small wirelength without
over-utilization of routing resources) when processing the subproblems concurrently to realize a parallel
global router. The first challenge is to effectively decompose the routing problem into subproblems
so that the difficulty of the subproblems is balanced. This step can significantly impact the final
solution quality. The second challenge is to generate the subproblem solutions in a manner that can
facilitate their connectivity later and avoid overflow. PGRIP addresses both of these challenges. The
following summarizes the second contribution of this dissertation:
• To form the subproblems, we extend GRIP to include a formal procedure for the initial
estimation of the distribution of the nets. This is a crucial step to obtain a high quality routing
solution and to achieve balanced subproblems.
• In order to effectively achieve concurrent processing of individual subproblems, we employ a
one-time synchronization approach so that significant portions of the computation can occur
completely without centralized control. This synchronization is via our novel use of an integer
programming patching procedure.
• Our procedure can finish problems of varying sizes and difficulties within the same time
budget, and the number of used processors varies depending on the problem size.
1.2.3 Contribution 3: Power-Driven Global Routing via Integer Programming in MSV domains [73]
To demonstrate the flexibility of the proposed parallel framework for Global Routing, this work
next considers minimizing an alternative objective. Instead of the traditional minimization
of wirelength and inter-layer via costs during global routing, it minimizes the interconnect power,
which is becoming an increasingly important design objective.
Specifically, this work presents an IP model for interconnect power minimization during global
routing for designs in Multi-Supply Voltage (MSV) domains. The mathematical model captures the
dependency on wire size, spacing, and wiring congestion at different metal layers, as well as the
supply voltage utilized at each domain on the chip. We show that a procedure similar to those of GRIP
and PGRIP can be extended to handle this variation of the global routing formulation. This work makes
the following contributions:
• Extend the wirelength-driven pricing procedure in GRIP, to a power-driven one to generate
power-efficient candidate routes.
• Employ a two-phase procedure to handle the nonlinearity of the IP formulation heuristically.
The first phase is applied to minimize the interconnect capacitances (area, fringe, and
congestion-dependent coupling capacitances). The second phase is then used to minimize
power by accounting for net activities and voltage levels, while adhering to the interconnect
capacitances obtained from the first phase.
The remainder of this dissertation is organized as follows. The definition of Global
Routing and its formulation are described in Chapter 2. The basic techniques that are widely adopted
in the modern global routers are also introduced in this chapter. In Chapter 3, GRIP is presented
based on an Integer Programming model of global routing. The price-and-branch procedure, and
the decomposition and connection of subregions are discussed in detail in this chapter. PGRIP and
its procedure to solve all subregions in parallel are presented in Chapter 4. Power-GRIP is presented in
Chapter 5 to minimize the interconnect power in MSV domains. Finally, Chapter 6 concludes this
dissertation and offers a summary and future directions.
Chapter 2
GLOBAL ROUTING: PRELIMINARIES AND LITERATURE REVIEW
The Global Routing problem was first defined by Burstein et al. [10] in 1983 in order to simplify
the “complicated” wire routing problem in VLSI design. Since then, a great deal of research effort
has been dedicated to the global routing problem, and more than four hundred articles have been
published (as shown in Figure 2.1). In particular, research in global routing has gained momentum
since new challenging benchmarks were released during the ISPD 2007 contest [3]. In this
chapter, the global routing problem is defined and some fundamental techniques that are widely
used by the global routing procedures are introduced. An overview of modern state-of-the-art
academic global routers is then presented, followed by a discussion on the global routing challenges for
modern VLSI designs.
Figure 2.1: Published articles on the Global Routing problem since 1983. The statistical data is obtained from Microsoft Academic Search with the keyword “Global Routing”.
Figure 2.2: The construction of grid graph for the global routing problem (figure labels: cells, global bins, global edges, cap. = C).
2.1 Problem Definition
The Global Routing problem can be conceptualized on a grid-graph G = (V,E) as depicted in Fig-
ure 2.2. After placement, a chip is partitioned into rectangular regions called global bins. Each
global bin is a vertex v ∈ V in the grid-graph. The boundary between two adjacent global bins is
modeled as an edge e ∈ E. Each edge e is associated with a cost c_e. Also given as input is a set
of (multi-terminal) nets N. Each net T_i is defined by a set of vertices (terminals) in V (T_i ⊂ V).
At the level of Global Routing, the terminals of the nets are assumed to be located at the center of
each global bin. Routing a multi-terminal net is finding a Steiner tree that connects its terminals to
each other. The cost of the tree is the summation of the costs of its edges. For example, when the
cost of an edge is 1 unit, the cost of the tree reflects the wirelength of the corresponding route. The
Global Routing problem finds a set of Steiner trees connecting the terminals of each net T_i, ∀i ∈ N.
Furthermore, each edge e ∈ E is associated with a capacity u_e, reflecting the maximum available routing
resources between its corresponding adjacent bins. If the usage of an edge exceeds its capacity,
then the overflow on the edge is the number of units of wire usage in excess of that capacity.
In modern VLSI design, the routing resources are available in many metal layers. For example,
the ISPD’07 benchmarks have six metal layers—three horizontal layers and three vertical layers [3].
Adjacent layers are connected by (inter-layer) vias. In the grid graph, vias are also modeled as edges
with unlimited capacity. Each vertex is connected to the corresponding vertices in the layers directly
above and below it (if they exist). For various reasons such as reliability, manufacturability,
area, and signal delay, it is desirable to minimize or control the number of vias during routing. The
grid-graph of global routing can be extended to a 3-D graph as shown in Figure 2.3. The cost of a
via is considered 3 units (to associate a higher penalty with via usage) in the ISPD’07 benchmarks
[3] and 1 unit in the ISPD’08 benchmarks [4].
Figure 2.3: A 2-D view of the design after placement (left); a 3-D global routing grid-graph (right), with horizontal edges, vertical edges, and vias.
When evaluating a routing solution, typically two metrics, total wirelength and total overflow, are
minimized. The total wirelength is the same as the total cost of the routed nets when the cost
of each via is 1 unit (e.g., ISPD’08 benchmarks [4]). The total overflow is computed as the units
of overflow added over all the edges [50]. Typically, overflow should also be minimized (zero is
desirable) since it directly corresponds to the routability of the design. The wirelength and overflow
are conflicting objectives; to minimize the overflow, the nets passing through the congested regions
need to be detoured, which increases the total wirelength. In modern VLSI design,
a small amount of overflow is allowed during Global Routing. This is because additional routing
resources are preserved for the detail routing stage when the design rule violations are repaired,
and these resources can also be used to eliminate the overflow. Additionally, runtime is also an
important metric. This is especially true in the cases in which global routing is repeatedly used
to guide a placement algorithm as a congestion estimation tool to improve the routability of the
design [47] [60]. Overall, minimizing wirelength and overflow has traditionally been the objective, since
these metrics correlate well with routability, timing, power, design rule violations, and other
interconnect-related issues that are of concern in the later stages of the design.
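As a minimal sketch of the two quality metrics described above, the following Python fragment computes the total cost and the total overflow of a routing solution. The data layout (a route as a list of edges, a vertex as an (x, y, layer) triple, a `capacity` dictionary for the non-via edges) and all function names are assumptions made for illustration; they are not taken from the benchmark format or from any particular router.

```python
from collections import Counter

def edge(u, v):
    """Canonical (sorted) form of an undirected edge between vertices u and v."""
    return (u, v) if u <= v else (v, u)

def is_via(e):
    """An edge is a via when its endpoints differ only in the layer index."""
    (x1, y1, l1), (x2, y2, l2) = e
    return (x1, y1) == (x2, y2) and l1 != l2

def total_cost(routes, via_cost=1):
    """Sum of edge costs over all nets: 1 unit per wire edge, via_cost per via."""
    return sum(via_cost if is_via(e) else 1
               for route in routes.values() for e in set(route))

def total_overflow(routes, capacity):
    """Units of usage in excess of capacity, added over all non-via edges."""
    usage = Counter()
    for route in routes.values():
        usage.update(set(route))  # each net uses an edge at most once
    return sum(max(0, used - capacity[e])
               for e, used in usage.items() if not is_via(e))

# Two nets share one horizontal edge of capacity 1; the first net also uses a via.
routes = {
    "n1": [edge((0, 0, 0), (1, 0, 0)), edge((1, 0, 0), (1, 0, 1))],
    "n2": [edge((0, 0, 0), (1, 0, 0))],
}
capacity = {edge((0, 0, 0), (1, 0, 0)): 1}
print(total_cost(routes, via_cost=1), total_cost(routes, via_cost=3))
print(total_overflow(routes, capacity))
```

With `via_cost=1` the total cost coincides with the total wirelength as defined for the ISPD’08 benchmarks, while `via_cost=3` corresponds to the ISPD’07 setting.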
2.2 Fundamental Techniques
Although the global routing procedures have recently achieved remarkable progress, some of the
fundamental techniques have remained the same. In this section, we review these fundamental
techniques and divide them into three categories: 1) routing techniques for a two-terminal net, 2)
decomposition techniques for routing a multi-terminal net, and 3) frameworks for routing all the nets.
2.2.1 Routing Techniques for A Two-Terminal Net
The fundamental problem in Global Routing is to find the shortest path for connecting two vertices.
For a routing grid graph as shown in Figure 2.2(b), each edge e in the graph is associated with a
weight w_e. The weight reflects, for example, the current utilization of the edge. (Edge weights will
be discussed later when the common routing frameworks are presented.) The single two-terminal
net routing problem identifies the path with the minimum total edge weight that connects
the two vertices.
• Maze Routing
Maze routing was introduced in [54] to solve the problem of connecting two vertices in a graph
with the shortest path. Despite various improvements which have been proposed to enhance
the maze routing algorithm in the past few decades, the core procedure of this algorithm
remains intact.
The maze routing algorithm is composed of two stages: propagation and backtracking. During
the propagation stage, a wave is propagated in a breadth-first traversal of the routing graph
from one of the vertices, designated as the source node, until reaching the other vertex, designated
as the sink node. Next, during the backtracking stage, a shortest path is identified in the
reverse direction to connect the source and sink nodes. Several traditional algorithms such as
Lee’s Algorithm [43] and Hadlock’s Algorithm [30] have been proposed to realize the maze
routing for the VLSI routing problem.
Nevertheless, the runtimes of these algorithms are the major bottlenecks for today’s large-scale
designs with millions of grid edges. A common improvement is to create a bounding
box around the source and sink nodes to restrict the search region. If no path can be found
to connect these two nodes in this region, the bounding box is enlarged and the path
searching algorithm is applied again. In the presence of routing obstacles (which are, for
example, introduced by the edges that have been utilized to full capacity by the previously
routed nets), this restricted bounding box implementation may lead to a sub-optimal solution
as shown in Figure 2.4.
Figure 2.4: Maze routing (a) with bounding box and (b) without bounding box. Maze routing with bounding box may lead to a sub-optimal solution in the presence of routing obstacles (designated in red in the figure).
Dijkstra’s algorithm [21] is the most popular implementation in the latest global routers.
The A* search algorithm [32], which is an extension of Dijkstra’s algorithm,
reorders the searching nodes in the graph to speed up the shortest path searching process.
Specifically, each node in the graph is associated with a cost which is the summation of edge
weights from the source node to the current node and the estimated weight from the current
node to the sink node. The A* search algorithm then visits these nodes in increasing order
of their costs. Recently, a new procedure was introduced in [47] to further improve the runtime
of maze routing. In this procedure, an upper bound in terms of wirelength is estimated when
finding a path to connect two nodes. During the breadth-first traversal in the maze routing
algorithm, any path that exceeds this upper bound is disregarded so that the search space
becomes restricted.
Figure 2.5: (a) Pattern Routing (L-shape and Z-shape routes between source S and sink T) and (b) Monotonic Routing can be used to speed up the shortest path searching process. The search space, however, becomes limited as a result of few pattern possibilities.
• Pattern and Monotonic Routing
As mentioned before, the maze routing algorithm can effectively identify the shortest path
between two terminals. The runtime overhead, on the other hand, is the main shortcoming of
maze routing. In fact, the runtime of the latest academic global routers is dominated by
maze routing. To obtain further speedups, Pattern Routing [42] is considered as an alternative. It
is a technique which highly restricts the search space by utilizing specific routing patterns to
connect two terminals. For example, L-shape routing utilizes at most one bend to route
a two-terminal net, while Z-shape routing contains two bends as shown in Figure 2.5(a).
In general, Pattern Routing can identify a shortest path very quickly, but its solution quality
falls well short of that of maze routing. To enhance the solution quality, Monotonic Routing
was then proposed [59]. The main idea is to expand the search space compared to pattern
routing to improve the solution quality. This algorithm starts from the source node and then
extends the routing path monotonically until reaching the sink node, as shown in Figure
2.5(b). Although it is faster than the maze routing technique, the search area is still restricted
to the bounding box of the source and sink nodes.
Because both Pattern and Monotonic Routing have a limited search space, the
latest academic global routers usually adopt a two-step approach: first, Pattern and (or) Monotonic
Routing are applied to route all the nets. Then, Maze Routing is applied to identify the
shortest paths for those nets which were routed with a very high weight in the previous step.
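The two-step approach can be illustrated with a small Python sketch on a 2-D grid. It is only a sketch under stated assumptions: the function names, the `weight` dictionary (edge weights default to 1 and are assumed to be at least 1, so that the Manhattan distance never overestimates the remaining weight), and the `threshold` parameter that decides when a pattern route is "too heavy" are all hypothetical and do not reproduce any specific router.

```python
import heapq

def edge(u, v):
    """Canonical (sorted) form of an undirected grid edge."""
    return (u, v) if u <= v else (v, u)

def path_weight(path, weight):
    """Total weight of a path given as a list of vertices (default weight 1)."""
    return sum(weight.get(edge(a, b), 1.0) for a, b in zip(path, path[1:]))

def straight(a, b):
    """Vertices on the axis-parallel segment from a to b (inclusive)."""
    (x1, y1), (x2, y2) = a, b
    if x1 == x2:
        step = 1 if y2 >= y1 else -1
        return [(x1, y) for y in range(y1, y2 + step, step)]
    step = 1 if x2 >= x1 else -1
    return [(x, y1) for x in range(x1, x2 + step, step)]

def l_shape_route(src, dst, weight):
    """Pattern routing: evaluate both one-bend routes and keep the lighter one."""
    candidates = []
    for corner in ((src[0], dst[1]), (dst[0], src[1])):
        candidates.append(straight(src, corner) + straight(corner, dst)[1:])
    return min(candidates, key=lambda p: path_weight(p, weight))

def astar_route(src, dst, weight, width, height):
    """Maze routing with A*: weight so far plus Manhattan estimate to the sink."""
    def estimate(v):
        return abs(v[0] - dst[0]) + abs(v[1] - dst[1])
    best, parent = {src: 0.0}, {src: None}
    frontier = [(estimate(src), src)]
    while frontier:
        _, u = heapq.heappop(frontier)
        if u == dst:
            break
        x, y = u
        for v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= v[0] < width and 0 <= v[1] < height):
                continue
            g = best[u] + weight.get(edge(u, v), 1.0)
            if g < best.get(v, float("inf")):
                best[v], parent[v] = g, u
                heapq.heappush(frontier, (g + estimate(v), v))
    path, node = [], dst
    while node is not None:  # backtrack from the sink to the source
        path.append(node)
        node = parent[node]
    return path[::-1]

def route_two_terminal(src, dst, weight, width, height, threshold):
    """Two-step approach: keep the pattern route unless its weight is too high."""
    path = l_shape_route(src, dst, weight)
    if path_weight(path, weight) > threshold:
        path = astar_route(src, dst, weight, width, height)
    return path

# Both L-shaped routes cross a heavily weighted (congested) edge on a 3x3 grid.
weight = {edge((0, 1), (0, 2)): 10.0, edge((1, 0), (2, 0)): 10.0}
route = route_two_terminal((0, 0), (2, 2), weight, 3, 3, threshold=5.0)
print(route, path_weight(route, weight))
```

In this example both one-bend routes are forced through a congested edge, so the sketch falls back to the A* search, which finds a route through the center of the grid.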
Figure 2.6: Two approaches of tree construction for multi-terminal nets: (a) Rectilinear Minimum Spanning Tree; (b) Rectilinear Steiner Minimal Tree.
2.2.2 Decomposition Techniques for Routing A Multi-terminal Net
In modern VLSI designs, more than 50% of the nets have more than two terminals. The all-pairs
shortest path algorithms such as the Floyd-Warshall algorithm [21] and Johnson’s algorithm [39] are
applicable to route multi-terminal nets. The maze routing algorithm can also be extended to handle this
case. However, these algorithms have a high runtime penalty due to the larger problem size for
multi-terminal nets. Instead, the global routing procedures typically decompose a multi-terminal net into a
set of two-terminal nets, and then route these two-terminal nets individually. The decomposition of
multi-terminal nets itself, however, is an NP-complete problem [28], and there exist many different
algorithms to tackle the decomposition problem.
Two decomposition approaches are common in Global Routing. The first approach is based on
the construction of rectilinear minimum spanning tree (RMST). As shown in Figure 2.6(a), four
terminal points are first sorted by their coordinates, and a spanning tree is constructed by connecting
these terminals in the sorted order. The spanning tree construction may not yield the best routing
topology, but the main advantage of this approach is its fast runtime. In the past (i.e., before the
ISPD’07 global routing contest [3]), the majority of the global routing algorithms relied on this
approach for multi-terminal net decomposition.
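A minimal Python sketch of an RMST-based decomposition is given below. Note that it uses Prim's algorithm with the Manhattan (rectilinear) distance rather than the sorted-order construction described above, and that the function names and the terminal representation as (x, y) pairs are assumptions made for illustration.

```python
def manhattan(p, q):
    """Rectilinear distance between two terminals."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def rmst_decompose(terminals):
    """Decompose a multi-terminal net into two-terminal sub-nets along an RMST."""
    terminals = list(dict.fromkeys(terminals))  # drop duplicates, keep order
    if len(terminals) < 2:
        return []
    root = terminals[0]
    # closest[t] = (distance from t to the tree, nearest terminal already in the tree)
    closest = {t: (manhattan(t, root), root) for t in terminals[1:]}
    subnets = []
    while closest:
        t = min(closest, key=lambda v: closest[v][0])
        _, anchor = closest.pop(t)
        subnets.append((anchor, t))  # one two-terminal sub-net per tree edge
        for v in closest:
            d = manhattan(v, t)
            if d < closest[v][0]:
                closest[v] = (d, t)
    return subnets

net = [(0, 0), (4, 0), (0, 3), (4, 3)]
subnets = rmst_decompose(net)
print(subnets, sum(manhattan(a, b) for a, b in subnets))
```

A net with n distinct terminals yields n − 1 two-terminal sub-nets, each of which can then be routed by a point-to-point routing technique.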
The second approach is to construct the rectilinear Steiner minimal tree (RSMT) as shown in
Figure 2.6(b). Additional Steiner points may be inserted as (additional) net terminals in the decom-
posed (sub)nets during the tree construction to obtain a better tree topology. The RSMT construction
process was considered impractical for Global Routing because of its intractable runtime complexity,
until a fast RSMT algorithm called Flute [18] was proposed recently. For the nets with nine or fewer
terminals, Flute uses look-up tables to identify the optimal Steiner tree. For the nets with more than
nine terminals, a divide-and-conquer method is applied. FastSteiner [41] is another RSMT algorithm
which can generate better topologies for the nets with more than nine terminals, compared to
Flute.
Due to the fast runtime and high performance of Flute, it has been adopted by some of the lat-
est academic global routers [12], [76] to decompose the multi-terminal nets. Although the RSMT
approach generates better initial tree topologies, routers need to have the capability to effectively
restructure the tree topologies to avoid congested regions. On the other hand, the RMST approach
has worse initial tree topologies, but the simplicity of its data structure is the main advantage
(it does not need to record the Steiner point locations). However, routers need to spend more effort to
recover from the bad initial solutions, and it is also necessary to have mechanisms to allow resource
sharing among the wire segments of a tree [13], [47], [61].
2.2.3 Frameworks for Routing All the Nets
• Rip-up and Reroute and History-based Routing
A common framework for routing all the nets is an iterative “repair” framework known as
rip-up and reroute. Rip-up and reroute starts with an initial routing solution which contains
overflow. Within one iteration, the nets are decomposed into two-terminal nets and ordered
(typically according to the overflow in their routes). The nets are visited in a specific order.
When visiting each net, its route is ripped-up and the corresponding edge utilizations are
decreased. Next, the net is rerouted, typically using Maze Routing to find a shortest path of
a smaller weight. The edge weights are updated again to reflect the newly routed net. The
process continues for many iterations, typically until a time-limit is reached or all the nets are
routed without overflow.
When visiting the nets, the ordering procedure may significantly affect the solution quality.
As a result, many existing approaches [12], [16], [27], [31], [50], [58], [59], [76] have focused
on improving the net ordering procedure.
During the iterative rip-up and reroute process, the edge weights can be updated according to
a history-based procedure. This procedure, also known as negotiated-congestion routing [50],
was first introduced in the 1990s for routing in FPGAs. Here, the edge weights are gradually
updated such that the routing edges, which consistently remain in the congested regions over
multiple iterations, are assigned a higher weight. This strategy encourages the maze routing
procedure to utilize the edges outside the congested regions to route the nets. In general, the
cost of each edge e can be formulated as [50]
$$c_e = (b_e + h_e) \cdot p_e \tag{2.1}$$
where b_e is an intrinsic cost, h_e is the cost reflecting congestion history, and p_e is the current
congestion penalty. Different routers may have different penalty functions p_e. Typically, giving
a higher penalty to the overflowed edges can reduce the runtime needed to obtain a zero-overflow
solution. However, the wirelength may in turn increase significantly. For example, let u_e and r_e
represent the capacity and current usage of edge e. The penalty function p_e in [61] is defined
as
$$
p_e =
\begin{cases}
\exp\left(k\,(r_e/u_e - 1)\right) & \text{if } r_e/u_e > 1 \\
r_e/u_e & \text{otherwise}
\end{cases}
\tag{2.2}
$$
At each rip-up and reroute iteration, a route t for a ripped net is found such that its total cost
(∑ c_e, ∀e ∈ t) is minimized. After an alternative route is identified, the congestion history h_e
is updated using the following equation
$$
h_e^{k+1} =
\begin{cases}
h_e^{k} + h_{inc} & \text{if } e \text{ has overflow} \\
h_e^{k} & \text{otherwise}
\end{cases}
\tag{2.3}
$$
The parameter h_inc is a constant value in most implementations. The authors in [50] suggest
that only the nets passing through the congested regions need to be rerouted. This technique
is widely used by many academic routers [12], [61], [76].
Figure 2.7: Hierarchical Global Routing: (a) Bottom-up Routing; (b) Top-down Routing.
• Hierarchical Routing
An alternative framework for routing all the nets is a hierarchical one. The intuition behind
this framework is to partition the large-sized and complicated global routing instance into a
set of smaller but simpler ones. It can also be categorized as a systematic divide-and-conquer
approach. In general, there are two different approaches for hierarchical routing which are
bottom-up routing and top-down routing [14] [15] [19] [77].
The bottom-up routing approach, as shown in Figure 2.7(a), first partitions a design into a set
of fine-grained bins to form the bottom level. The local nets that completely fall inside the
bins are routed and fixed. In the next level, four adjacent bins are merged into a larger bin,
and again the nets that completely fall inside this bin are routed. This process is continued
until all the small bins are merged together. In this approach, short nets are routed first in the
earlier levels and occupy the routing resources. Later, this approach may fail to find feasible
routes for the long nets because of the shortage in the routing resources. Alternatively, the
top-down routing approach starts with one largest bin. The long nets are routed first, as shown
in Figure 2.7(b). Then this bin is gradually partitioned into smaller ones and the short nets
are handled afterwards. Similarly, finding feasible routes for short nets is a major challenge
in this approach.
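Returning to the history-based framework, the edge cost of Equation (2.1), the penalty function of Equation (2.2), and the history update of Equation (2.3) can be sketched in Python as follows. This is a minimal sketch and not the implementation of any cited router: the function names, the dictionary-based bookkeeping, and the default values of the parameters `k` and `h_inc` are assumptions, and every edge capacity is assumed to be positive.

```python
import math

def congestion_penalty(usage, capacity, k=1.0):
    """Penalty p_e of Equation (2.2); k is an assumed steepness parameter."""
    ratio = usage / capacity
    return math.exp(k * (ratio - 1.0)) if ratio > 1.0 else ratio

def edge_cost(base, history, usage, capacity, k=1.0):
    """Edge cost c_e = (b_e + h_e) * p_e of Equation (2.1)."""
    return (base + history) * congestion_penalty(usage, capacity, k)

def update_history(history, usage, capacity, h_inc=1.0):
    """Equation (2.3): add h_inc to the history cost of every overflowed edge."""
    return {e: h + (h_inc if usage.get(e, 0) > capacity[e] else 0.0)
            for e, h in history.items()}

# Edge "a" is overflowed (usage 3 > capacity 2); edge "b" is exactly full.
history = {"a": 0.0, "b": 2.0}
usage = {"a": 3, "b": 1}
capacity = {"a": 2, "b": 1}
history = update_history(history, usage, capacity)
print(history)
print(edge_cost(1.0, history["a"], usage["a"], capacity["a"]))
```

Because the history term only grows on edges that remain overflowed, edges that stay congested over several iterations become progressively more expensive, which is the behavior described above.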
2.2.4 Literature Review
Existing academic global routers can be categorized into sequential routers [12], [13], [35], [47],
[52], [57], [61], [76] and concurrent routers [7], [9], [10], [16], [34], [65], [77]. Recently, much
attention has been given to the sequential approach due to its faster runtime, but it heavily relies on
the rip-up and reroute technique and suffers from the dependency on net ordering. The concurrent
approach, on the other hand, has the potential to generate a higher quality solution. Nevertheless, the
prohibitively long runtime of the concurrent approach has become a bottleneck for today’s large-sized
Global Routing instances. In the following, we briefly introduce the unique features of these modern
academic global routers.
• Labyrinth [42] applies a quick variation of pattern routing. While it does not achieve competitive
solutions, its source code is available, which has facilitated the development of subsequent
routing procedures.
• DpRouter [11] uses an efficient dynamic-programming based pattern routing technique to
achieve better routing solutions for two-pin nets. It also uses a segment-movement RMST
technique to reconstruct the tree structures for avoiding the congested regions.
• ARCHER [57] is an iterative rip-up and reroute approach. It combines several point-to-point
routing techniques to explore the tradeoff between the solution quality and runtime. For the
nets outside the congestion regions, relatively fast routing procedures such as pattern routing
are employed. Expensive routing procedures such as maze routing are then utilized to im-
prove the solution quality. This strategy has been widely used in the subsequent academic
global routers. Furthermore, a Lagrangian relaxation based algorithm is proposed to dynamically
modify the Steiner trees to optimize the routing congestion. While ARCHER had a competitive
runtime upon its publication, its solution quality and runtime fall significantly behind
those of the subsequent global routers.
• MaizeRouter [52] is primarily based on two complementary edge-based operations including
extreme edge shifting and edge retraction. Extreme edge shifting is a generalization of edge
shifting that has been enhanced to restructure the Steiner tree topologies particularly to obtain
topologies that reduce routing congestion. Edge retraction is a procedure to allow resource
sharing among the wire segments of a tree to reduce the wirelength. MaizeRouter won the
3-D category of the ISPD’07 Global Routing Contest.
• FastRoute [58], [59], [76] is one of the most competitive routers with respect to the runtime.
The Hanan grid structure is used in FastRoute to construct the Steiner trees. Later, the Steiner
tree construction is enhanced to consider the vias simultaneously. The monotonic routing
and multi-source multi-destination maze routing are the key point-to-point routing techniques
utilized in FastRoute. Despite its fast runtime, FastRoute failed to generate a zero-overflow
routing solution for many benchmarks in its earlier versions. Recently, this router has been
improved to generate competitive solutions in terms of both wirelength and overflow.
• BoxRouter [16] [17] is one of the concurrent global routing procedures since it iteratively solves
an integer programming (IP) formulation. However, it can also be considered as a sequential
approach. The main idea of BoxRouter is progressive IP. The procedure starts with a small
box around the congested region of the chip and applies IP to route the nets inside the box.
Next, the box is progressively expanded to include more unrouted nets. Although the IP
considers only L-shaped patterns for each two-pin decomposition, one round of maze routing
is applied afterwards to route the nets which could not be successfully routed after solving
the IP. The subsequent BoxRouter 2.0 adds an additional post-routing stage which utilizes
the negotiation-based rip-up and reroute algorithm to further improve the solution quality. In
BoxRouter 2.0, the IP formulation for layer assignment is also extended to take into account
the routing block regions.
• FGR [35] [61] is an abbreviation of “Fairly Good Router”. It extends the PathFinder router
of [50] to handle today’s large-scale global routing instances with multiple routing layers.
It offers several technical novelties, such as a particular function to compute the congestion
penalty, a resource sharing procedure to construct tree structures, and a fast layer assignment
followed by a 3-D clean-up phase. FGR won the 2-D category of the ISPD’07 Global Routing
Contest. It also generated the best solution for many benchmarks at that time. BFGR [35] is a
subsequent version of FGR. BFGR improves the memory usage by clustering the neighboring
overflow edges during the rip-up and reroute procedure, and a new history cost function is also
proposed to enhance the negotiation-based procedure.
• NTHU-Route [12] [27] is a traditional iterative router which also uses a rip-up and reroute
framework. It combines and enhances many different techniques proposed by the previous
academic global routers and is able to generate high-quality solutions. The history-based cost
function is perhaps its core strength to distribute the overflow. NTHU-Route also employs a
congested region identification method to specify the net ordering during rip-up and reroute.
The monotonic routing is the basic point-to-point routing algorithm used in NTHU-Route.
The wirelength reduction in NTHU-Route is achieved by using an adaptive multi-source
multi-sink maze routing method. NTHU-Route won the ISPD’08 Global Routing Contest.
• SideWinder [34] is the latest concurrent global router (before this work). It combines pattern
routing and maze routing in an incremental IP formulation. Similar to BoxRouter, it considers
routing all the two-pin nets with L-shapes first. However, instead of iteratively expanding a
small bounding box, the entire routing grid is considered during each pass. In addition to L-
shapes, it also considers all the Z-shapes and selected C-shapes (slightly detoured routes) in
IP to reduce the overflow. Although more patterns are considered in the IP, the routing results
are not competitive in terms of overflow and wirelength.
• NCTU-GR [47] utilizes the traditional rip-up and reroute based algorithm as its core. Its
main contribution is to “parallelize” the sequential rip-up and reroute algorithm. In the task-
based parallel multi-threaded algorithm of NCTU-GR, multiple nets can be ripped up and
rerouted by different threads concurrently. The concurrent routing process also tries to address
the challenge of unexpected resource over-usage. Moreover, this router utilizes RMST to
construct the tree structure for multi-terminal nets, but it takes advantage of the initially
generated RSMT trees to reduce the wirelength.
2.3 Shortcomings of the Existing Techniques
2.3.1 Sequential Routing Techniques
Perhaps the most straightforward strategy for routing multiple nets is to select a specific net ordering
and then route the nets sequentially in that order. The major advantage of this approach is that the
congestion information from the previously routed nets can be taken into consideration while routing
a current net. Due to its simplicity and practicality, most of the state-of-the-art global routers utilize
this sequential approach. Even the procedure of [17], which uses IP as its core, relies heavily on the
pre-routing and post-routing stages which are inherently sequential.
The drawback of a sequential approach is that the solution quality depends greatly on the order
in which the nets are processed, and it is hard to find a good net ordering. For a given net ordering, it
is often more difficult to route the nets that are considered later since they are subject to more block-
ages. Moreover, the sequential approach is inherently hard to parallelize. One possible approach
is to give different net orderings to different threads (cores), and consider the best solution among
those. This strategy can potentially improve the solution quality, but it still cannot reduce the runtime.
Alternatively, the parallelization can be achieved by routing multiple nets by different threads
concurrently as proposed in [47]. However, resource competition (i.e., multiple nets using the
same routing resources to complete their routes) is the main challenge. The simulation results in
[47] also show a slight degradation in terms of wirelength using this parallelization strategy.
2.3.2 Net Decomposition and Resource Sharing
Many of the state-of-the-art global routers first use Flute [18] to generate a Steiner tree for each net,
and then decompose this tree into two-terminal sub-nets. Later, point-to-point routing algorithms
such as pattern routing and maze routing are used to route these two-terminal sub-nets individually.
This procedure can significantly reduce the code development time; it mainly requires the implementation
and optimization of the net ordering and the edge weights. Nevertheless, the routes of the
two-terminal sub-nets corresponding to the same net may end up overlapping with each other
after decomposition; further post-processing is necessary in order to compute and consider the actual
wirelength of the route for the un-decomposed net. Figure 2.8 gives an example of this net decomposition
problem. A multi-terminal net with six terminal points can be decomposed
into five two-terminal sub-nets as shown in Figure 2.8(a). In Figure 2.8(b), a solution corresponding
to the independent routing of the sub-nets is shown to be sub-optimal. A better solution considering
the overlap and shared resource usage is shown in Figure 2.8(c).
Figure 2.8: (a) A six-terminal net which can be decomposed into five two-terminal sub-nets. (b) Routing these five two-terminal sub-nets independently yields a sub-optimal solution. (c) A solution with a smaller wirelength can be found while sharing the resources.
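The effect of overlapping sub-net routes on the wirelength computation can be illustrated with a short Python sketch. The path representation (a list of grid vertices per two-terminal sub-net), the unit edge cost, and the function names are assumptions made for illustration only.

```python
def edge(u, v):
    """Canonical (sorted) form of an undirected grid edge."""
    return (u, v) if u <= v else (v, u)

def path_edges(path):
    """Edges traversed by a path given as a list of vertices."""
    return [edge(a, b) for a, b in zip(path, path[1:])]

def independent_wirelength(subnet_paths):
    """Sum of the sub-net lengths; a shared edge is counted once per sub-net."""
    return sum(len(path_edges(p)) for p in subnet_paths)

def shared_wirelength(subnet_paths):
    """Wirelength of the un-decomposed net; each grid edge is counted once."""
    return len({e for p in subnet_paths for e in path_edges(p)})

# Two sub-nets of the same net overlap on the edge between (0, 0) and (1, 0).
paths = [[(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0), (1, 1)]]
print(independent_wirelength(paths), shared_wirelength(paths))
```

The difference between the two values is the amount of wire that a post-processing step can recover by letting the sub-nets of one net share routing resources.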
2.3.3 Multi-layer Global Routing
After the release of the ISPD’07 benchmarks, due to the large size of the 3-D global routing grid-graph,
the subsequent global routing procedures follow a two-step approach to decompose
and work with smaller-sized subproblems. First, the 3-D global routing grid-graph is projected to a
2-D grid-graph. The capacity of each edge in the grid-graph is the summation of the capacities of its
corresponding edges (that have the same projection) in the 3-D graph. After solving the 2-D routing
problem on the projected graph, next, a procedure called layer assignment [23] is employed, during
which each segment of a route in the 2-D graph is projected back to the 3-D graph, and inter-layer
vias are then used to connect these segments of the same net on different metal layers to each other.
Neglecting the via cost when solving the 2-D routing problem, however, may lead to significant
degradation in solution quality in terms of the total wirelength (assuming the via cost is included
in the wirelength computation). One possible solution is to penalize every wire bend when solving
the routing problem in the 2-D projected graph. Nevertheless, this method still cannot reflect the true
via cost, as one wire may directly go from the bottom metal layer to the top metal layer (stacked
vias), which yields a higher via cost. In this dissertation, a routing framework is proposed which
directly operates on the 3-D global routing grid-graph and avoids the layer assignment phase.
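For reference, the projection step of the two-step approach described above can be sketched in Python as follows. The representation of a 3-D edge as a pair of (x, y, layer) vertices and the function name are assumptions made for illustration; layer assignment itself is not shown.

```python
from collections import defaultdict

def project_capacities(capacity_3d):
    """Sum the capacities of all 3-D edges that share the same 2-D projection."""
    capacity_2d = defaultdict(int)
    for ((x1, y1, l1), (x2, y2, l2)), cap in capacity_3d.items():
        if (x1, y1) == (x2, y2):
            continue  # a via connects two layers of one bin and has no 2-D projection
        capacity_2d[((x1, y1), (x2, y2))] += cap
    return dict(capacity_2d)

# Two horizontal layers (0 and 2) contribute to the same projected edge.
capacity_3d = {
    ((0, 0, 0), (1, 0, 0)): 10,
    ((0, 0, 2), (1, 0, 2)): 8,
    ((0, 0, 1), (0, 1, 1)): 6,
}
print(project_capacities(capacity_3d))
```

The projected graph carries no layer information, which is why the via cost cannot be evaluated exactly until layer assignment is performed.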
2.4 Multi-objective Global Routing
Due to the deep submicron process technology, the routability and performance of a circuit need
to be considered simultaneously during Global Routing. This is because the interconnect delay has
become a dominant factor that determines the system performance [51]. Moreover, crosstalk noise
from the coupling capacitance between adjacent wires is also an important factor to determine the
circuit performance and should also be considered during the global routing stage [79]. Previous
work of [33] developed a timing driven global router called TIGER, which is an Integer Programming
based approach. In TIGER, the circuit performance is considered by adding a set of path-based
timing constraints to the formulation. In [20], a heuristic-based timing driven global routing method
for standard cell design was proposed. In this approach, the Steiner trees are first constructed and
fixed for the timing critical nets. The non-critical nets are then routed and detoured from the con-
gested regions to improve the routability. The crosstalk avoidance was considered during the global
routing stage in [79]. In this approach, the nets with a crosstalk violation are ripped-up and rerouted
to satisfy the crosstalk constraint.
Furthermore, dynamic power can also be reduced during the global routing stage by rerout-
ing the nets to minimize the power consumed by the routes [62]. Specifically, interconnect power
(signal power) can take a significant portion of the dynamic power spectrum. For example, the
contribution of the interconnect power is reported to be around 30% of dynamic power for a 45nm
high performance microprocessor synthesized using a Structured Data Paths design style and about
18% of the overall power spectrum [62]. Earlier work [48] also reports high interconnect power in
Intel microprocessors. To minimize interconnect power, a route can be detoured from the congested
regions to minimize the coupling capacitance between adjacent routes. It can also be rerouted to
the lower metal layers since the lower metal layers have thinner wire width and therefore have less
capacitance. Recently, [63] proposed a method to minimize interconnect power during the global
routing stage.
Chapter 3
GRIP: GLOBAL ROUTING VIA INTEGER PROGRAMMING
In this chapter, we propose GRIP, a Global Routing procedure that heavily relies on integer
programming techniques. Not only is GRIP able to generate solutions for large-sized global routing
instances, but the solutions found by GRIP also demonstrate a considerable improvement in quality
compared to the best solutions in the open literature. In addition, GRIP has minimal dependency on
the nature of the benchmark instances and robustly generates the best solution in each case.
To effectively use integer programming, GRIP decomposes the large-sized routing problem into
smaller-sized subproblems. Each subproblem corresponds to a rectangular subregion on the chip
together with its net assignments. The smaller-sized subproblems are solved individually, and later
the route fragments of the same net in adjacent subproblems are connected. A final phase is ap-
plied to reduce overflow in the congested regions. The above steps are based on solving an integer
program (IP) that aims to select one route for each net from a set of promising candidate routes.
In the simulation results, GRIP achieves on average 9.23% and 5.24% improvements in the
total cost (i.e., wirelength and via cost) for the ISPD’07 and ISPD’08 benchmarks, respectively.
These results are compared to the best solution reported for each case from four state-of-the-art
academic global routers. The significant improvement is possible due to a combination of the con-
current nature of IP, effective pricing for candidate route generation, directly working with the 3-D
model of the problem, effective decomposition into subproblems, and effective recombination of the
route fragments of adjacent subproblems.
The remainder of this chapter is divided into seven sections. In Section 3.1, an integer program
(IP) for the global routing problem is discussed which minimizes the wirelength of the routed nets
as its primary objective. To effectively solve the IP, in Section 3.2, a linear-programming pricing
procedure is proposed to effectively generate a set of candidate routes for each net. Section 3.3
discusses the decomposition of the original problem into smaller-sized subproblems corresponding
to subregions on the chip. It also discusses the procedure to solve each subproblem using a newly
introduced concept called a “floating terminal”, as well as how the solutions of
different subproblems are integrated and connected to obtain a complete solution. Consequently, the
runtime of GRIP’s procedure depends on the number of decomposed subproblems, some of which can be processed in
parallel. In Section 3.4, the IP in GRIP is extended to minimize overflow as a secondary phase and
an algorithmic procedure is introduced to apply this IP in selected congested regions on the chip.
Computational results are reported in Section 3.6.
3.1 An Integer Program for Global Routing
In a mathematical description of the Global Routing problem, we are given a grid-graph $G = (V,E)$
describing the network topology, a set of (multi-terminal) nets $\mathcal{N} = \{T_1, T_2, \dots, T_N\}$
(with $T_i \subset V$), and edge capacities $u_e$ and edge costs $c_e$ for all $e \in E$. Denote by
$\mathcal{T}(T_i)$ the collection of all Steiner trees (routes) connecting the terminals in $T_i$, and let
the parameter $a_{te} = 1$ if Steiner tree $t$ contains edge $e \in E$, and $a_{te} = 0$ otherwise. Define
the binary decision variable $x_{it}$ that is equal to 1 if and only if net $T_i$ is routed with route
$t \in \mathcal{T}(T_i)$. An integer program for the Global Routing problem can be written as
\begin{align}
\min_{x,\,s}\quad & \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} c_{it} x_{it} + \sum_{i=1}^{N} M s_i \tag{ILP-GR}\\
\text{s.t.}\quad & \sum_{t \in \mathcal{T}(T_i)} x_{it} + s_i = 1 && \forall i = 1, \dots, N, \tag{3.1}\\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} a_{te} x_{it} \le u_e && \forall e \in E, \tag{3.2}\\
& x_{it} \in \{0,1\} && \forall i = 1, \dots, N,\ \forall t \in \mathcal{T}(T_i), \notag\\
& s_i \ge 0 && \forall i = 1, \dots, N. \notag
\end{align}
The parameter $c_{it}$ is the cost of route $t$ for net $T_i$, computed as the total length of the
3-D route, $c_{it} = \sum_{e \ni t} c_e$, where the notation $e \ni t$ denotes that edge $e \in E$ is
contained in route $t \in \mathcal{T}(T_i)$. The constraints (3.1) in the (ILP-GR) formulation enforce the
routing of each net. The decision variable $s_i$ will be positive if net $T_i$ cannot be routed. The
objective function trades off the total routing length with the number of nets that are routed.
Typically $M$ is chosen sufficiently large to ensure that all the nets are routed. The constraints (3.2)
in the (ILP-GR) formulation ensure that the given edge capacities are not exceeded.
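As a minimal sketch of what (ILP-GR) selects, the following Python snippet solves a toy instance by brute-force enumeration. It is not GRIP's solver: the function name `solve_ilp_gr`, the data layout, and the toy instance are all illustrative assumptions, unit edge costs ($c_e = 1$) are assumed, and enumeration stands in for branch-and-bound only because the instance is tiny.

```python
from itertools import product

def solve_ilp_gr(routes, capacity, M):
    """Brute-force a tiny (ILP-GR) instance (illustrative only).

    routes:   {net: [route, ...]}, each route a tuple of edge names
    capacity: {edge: u_e}
    M:        penalty paid when a net is left unrouted (s_i = 1)
    """
    nets = sorted(routes)
    # For each net, choose a candidate route index, or None (unrouted).
    options = [[None] + list(range(len(routes[n]))) for n in nets]
    best_cost, best_choice = float("inf"), None
    for choice in product(*options):
        usage, cost = {}, 0
        for net, k in zip(nets, choice):
            if k is None:
                cost += M                      # term M * s_i
                continue
            for e in routes[net][k]:
                usage[e] = usage.get(e, 0) + 1
            cost += len(routes[net][k])        # c_it with unit edge costs
        feasible = all(usage[e] <= capacity.get(e, 0) for e in usage)  # (3.2)
        if feasible and cost < best_cost:
            best_cost, best_choice = cost, dict(zip(nets, choice))
    return best_cost, best_choice

# Hypothetical toy instance: nets A and B compete for edge e1 (capacity 1).
toy_routes = {"A": [("e1",), ("e2", "e3")], "B": [("e1",)]}
toy_capacity = {"e1": 1, "e2": 1, "e3": 1}
result = solve_ilp_gr(toy_routes, toy_capacity, M=100)  # -> (3, {'A': 1, 'B': 0})
```

In this instance net A is pushed onto its longer route so that both nets can be routed, because leaving either net unrouted would cost $M$.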
The formulation (ILP-GR) has a number of appealing properties.
1. The exact properties of the route, such as topology and metal layer, can be incorporated into
the cost $c_{it}$ of a route, similar to [61]. The formulation can thus directly handle the 3-D Global
Routing problem, avoiding the traditional layer-assignment phase which can be a source of
sub-optimality.
2. The cost of a route can correspond to any other metric such as the area-capacitance of the
route over multiple metal layers.
3. The formulation does not require that the nets be a priori broken into two-terminal sub-nets.
Breaking nets before doing routing can be a significant source of sub-optimality in the re-
sulting final routing [61]. We note that the final version of GRIP has some “net-breaking” to
define subproblems for scalability.
4. The slack variables si and the corresponding objective penalty factor M push the optimiza-
tion to generate a zero overflow routing solution. The model is quite flexible, as with minor
modifications, the integer program can be set to minimize the total overflow.
A significant disadvantage of the formulation (ILP-GR) is its size. First, for a given net Ti, the
number of all the decision variables for this net is equal to |T (Ti)|—the number of possible Steiner
trees connecting the terminals in Ti in the 3-D global routing grid-graph. Second, the number of nets
N and edges E may also be very large. Nevertheless, we use (ILP-GR) as the basis of GRIP. In the
subsequent discussion, we outline the manner in which we deal with the issues posed by the large
formulation size.
3.2 Solution Procedure via Price-and-Branch
GRIP’s procedure to obtain an approximate solution to the above large-scale integer program (IP)
consists of two phases, as shown in Figure 3.1. First, a pricing procedure is used to generate a
set of candidate routes for each net. Second, branch-and-bound is applied to solve the (ILP-GR)
formulation using only the set of generated candidate routes. This two-phase heuristic procedure is
commonly known as price-and-branch [8], [38].
3.2.1 Overview of Candidate Route Generation
To generate a set of candidate routes for each net, GRIP solves a linear-programming (LP) relaxation
of (ILP-GR), a relaxation obtained by replacing the binary requirements on the variables xit ∈ {0,1}
with the weaker constraints 0 ≤ xit ≤ 1. The linear program is solved by a column-generation (CG)
procedure [24] during which a subset of all possible routes (GRIP’s candidate routes) are identified.
Solving the LP relaxation via column generation guarantees obtaining the optimal solution to the
linear program, as if all the routes were explicitly considered.
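To describe the CG procedure, it is helpful to consider the dual (LPD-GR) of the linear programming
relaxation of (ILP-GR):
\begin{align}
\max_{\lambda \le M,\ \pi \le 0}\quad & \sum_{i=1}^{N} \lambda_i + \sum_{e \in E} \pi_e u_e \tag{LPD-GR}\\
\text{s.t.}\quad & \lambda_i + \sum_{e \ni t} \pi_e \le c_{it} \quad \forall i = 1, \dots, N,\ \forall t \in \mathcal{T}(T_i). \tag{3.3}
\end{align}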
In a column generation procedure, only a small subset of all possible routes is explicitly included
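in the LP relaxation of (ILP-GR). Let $\mathcal{S}(T_i) \subset \mathcal{T}(T_i)$ be the set of routes
considered for net $T_i$. The restricted master problem for (ILP-GR) is
\begin{align}
\min_{x \ge 0,\ s \ge 0}\quad & \sum_{i=1}^{N} \sum_{t \in \mathcal{S}(T_i)} c_{it} x_{it} + \sum_{i=1}^{N} M s_i \tag{RMLP-GR}\\
\text{s.t.}\quad & \sum_{t \in \mathcal{S}(T_i)} x_{it} + s_i = 1 && \forall i = 1, \dots, N, \notag\\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{S}(T_i)} a_{te} x_{it} \le u_e && \forall e \in E. \notag
\end{align}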
Solving (RMLP-GR) yields a (primal) solution (x, s) as well as values λ ≤ M and π ≤ 0 for the
dual variables in (LPD-GR). By linear programming duality, if the values (λ , π) satisfy all the dual
constraints in (3.3), then (x, s) is an optimal solution to the LP relaxation of (ILP-GR). If not, then
the violated dual constraint suggests that adding the associated column (as a new route variable) to
(RMLP-GR) may reduce its objective value, and the process repeats.
To determine if the dual values $(\lambda, \pi)$ (generated from (RMLP-GR) by considering only
$\mathcal{S}(T_i)$) are feasible in (LPD-GR) (which includes $\mathcal{T}(T_i)$), we must determine if
there exists at least one route $t \in \mathcal{T}(T_i)$ with $\lambda_i + \sum_{e \ni t} \pi_e > c_{it}$.
This is itself an optimization problem, known as the pricing problem. The pricing problem can
be decomposed into independent problems for each individual net i = 1, . . . ,N. Specifically, the
pricing problem for net $T_i$ is
\begin{equation}
\min_{t} \Big\{ c_{it} - \sum_{e \ni t} \pi_e \ \Big|\ t \in \mathcal{T}(T_i) \Big\}. \tag{PP($T_i$)}
\end{equation}
If the optimal solution value of the formulation (PP($T_i$)) is sufficiently small ($< \lambda_i$), then
the values $(\lambda, \pi)$ are not dual feasible. Specifically, let $t^*$ be an optimal solution to the
formulation (PP($T_i$)). If
\begin{equation}
c_{it^*} - \sum_{e \ni t^*} \pi_e < \lambda_i, \tag{3.4}
\end{equation}
then $t^*$ identifies a violated constraint (3.3) in the formulation (LPD-GR). The current solution to
(RMLP-GR) can thus be improved by updating $\mathcal{S}(T_i)$ to include $t^*$ as a new column (route).
The CG procedure is summarized as follows:
0. For each i = 1, . . . ,N, initialize S (Ti) with at least one route. (GRIP uses the route generated
for net Ti by the package Flute [18]).
1. Solve the formulation (RMLP-GR), yielding a primal solution (x, s) and dual values (λ , π).
2. For each $i = 1, \dots, N$, solve the formulation (PP($T_i$)), yielding a route $t^*$. If
$c_{it^*} - \sum_{e \ni t^*} \pi_e < \lambda_i$, then $\mathcal{S}(T_i) = \mathcal{S}(T_i) \cup \{t^*\}$.
3. If an improving route for some net $T_i$ was found in step 2, return to step 1. Otherwise, stop:
the solution $(x, s)$ is an optimal solution to the LP relaxation of (ILP-GR).
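The steps above can be sketched as a short driver loop. This is a skeleton under stated assumptions rather than GRIP's implementation: `solve_master` and `price` are hypothetical callables standing in for an LP solver on (RMLP-GR) and for the pricing heuristic of Section 3.2.2, and the duplicate-route check is an added safeguard for heuristic pricing.

```python
def column_generation(initial, solve_master, price, max_iters=1000):
    """Skeleton of the column-generation loop (steps 0 to 3).

    initial:      {net: [route, ...]}, at least one route per net (step 0)
    solve_master: S -> (lam, pi), the dual values of the restricted master LP
    price:        (net, S_net, pi) -> (route, value), where value is
                  c_it - sum of pi_e over the route's edges
    Both callables are hypothetical placeholders supplied by the caller.
    """
    S = {net: list(rs) for net, rs in initial.items()}      # step 0
    for _ in range(max_iters):
        lam, pi = solve_master(S)                            # step 1
        improved = False
        for net in S:                                        # step 2
            route, value = price(net, S[net], pi)
            # Condition (3.4): the route violates a dual constraint.
            # The membership test guards against re-adding a known route.
            if route is not None and value < lam[net] and route not in S[net]:
                S[net].append(route)
                improved = True
        if not improved:                                     # step 3
            break
    return S
```

Passing the solver and the pricing routine as arguments keeps the loop independent of any particular LP package, which is a choice made for this sketch only.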
Figure 3.1 illustrates these steps. First, an initial route for each net $T_i$ is generated by Flute
in step 0. These routes are very close to the minimum Steiner trees for the nets. The total of their
wirelength is likely to give a lower bound on the total wirelength in an optimal solution to the Global
Routing problem. On the other hand, these routes are initially 2-D routes that only use the lowest
horizontal and vertical layers and would result in significant overflow if all used in combination.

Figure 3.1: Overview of the price-and-branch procedure for GRIP. Price: Step 0, generate an initial
candidate route for each net using the package Flute; Step 1, solve (RMLP-GR) using the candidate
routes to get dual values for (LPD-GR); Step 2, solve the pricing problem for each net; Step 3, if a new
route was found, add it to the candidate routes and return to Step 1. Branch: approximately solve
(ILP-GR) using a branch-and-bound solver for the generated candidate routes.
After step 1, in a primal solution (x, s) with dual values (λ , π), nets Ti ∈ N that are not able
to be routed completely with the existing Steiner trees in the set S (Ti) will have si > 0 and (by
the complementary slackness condition of linear programming) λi = M. Also by linear program-
ming duality theory, the dual variable πe is the rate of change of the optimal objective value of
(RMLP-GR) per unit change in ue, the capacity of edge e. By observing the condition (3.4), the CG
procedure will naturally seek to find routes for nets Ti with large λi, routing them with edges that
have πe as close to zero as possible. Ideally the routes would use edges with πe = 0, which implies
(again by complementary slackness) that the edge is not being used to capacity. In this way, one can
imagine that the CG procedure helps to iteratively disperse the initial nets from the lower layers to
upper layers and from the congested areas to less congested ones. As a result, it is unnecessary to
utilize layer assignment to manipulate the initial 2-D routes.
To summarize, the strengths of the pricing procedure are:
• When generating new candidate routes at each iteration of the CG procedure, the impact of the
candidate routes of previous iterations is effectively taken into account. This is done by re-solving
(RMLP-GR), which incorporates the impact of all the existing routes, to get a new fractional solution.
• Within each iteration, solving the pricing problem effectively identifies new candidate routes
since the objective of (RMLP-GR) is always improved. Moreover, a measure of current
congestion is also incorporated in selecting the nets to price (see Section 3.2.3).
The computational experience with the CG procedure in GRIP was that the objective value of
(RMLP-GR) improved quickly in the first iterations, but the rate of improvement decreased
significantly in the later iterations. This “tailing off” phenomenon is very common in column
generation [25]. The improving routes at later iterations of the algorithm almost always come from nets
outside the highly congested areas. Further, the wirelengths of the improving routes are almost identical
to those of the trees currently available for routing. In these cases, adding the routes to (RMLP-GR)
makes little or no improvement to the objective value. A significant portion of the runtime of the CG
procedure can be spent on iterations that improve the objective value of (RMLP-GR) only marginally.
Thus, in order to reduce solution time, GRIP typically stops the procedure once the solution value
has tailed off. Specifically, if the objective value of (RMLP-GR) has made little or no improvement
(less than 10 wirelength units) in the last 20 iterations, the CG procedure is terminated. Next, we
discuss details of steps 2 and 3 of the pricing procedure.
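The stopping rule can be expressed as a small predicate on the history of (RMLP-GR) objective values. The sketch below assumes one reading of the rule, namely that the total improvement over the last 20 iterations is below 10 wirelength units; the function name and argument names are illustrative.

```python
def has_tailed_off(objective_history, window=20, min_improvement=10.0):
    """Return True if the (minimized) master objective improved by less than
    `min_improvement` over the last `window` iterations.

    objective_history holds one objective value per CG iteration, oldest first.
    """
    if len(objective_history) <= window:
        return False  # not enough iterations yet to judge
    improvement = objective_history[-window - 1] - objective_history[-1]
    return improvement < min_improvement
```

With this reading, a run whose objective drops by 100 per iteration never triggers the rule, while a run that gains only a fraction of a unit per iteration stops after 20 such iterations.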
3.2.2 Solving the Pricing Problem for One Net
In the pricing phase (step 2) of the CG procedure, GRIP solves (PP($T_i$)) for each net. We rewrite the
objective expression of (PP($T_i$)) as
\begin{equation}
c_{it} - \sum_{e \ni t} \pi_e = \sum_{e \ni t} (c_e - \pi_e), \tag{3.5}
\end{equation}
where $c_e$ is the cost associated with edge $e$ (e.g., $c_e = 1$ when considering wirelength and via count).
To minimize the above objective for net $T_i$, GRIP considers a weighted graph with edge weights
$w_e = c_e - \pi_e$. Minimizing the objective of (PP($T_i$)) requires finding the smallest-weight Steiner tree
on this weighted graph. Finding a minimum-weight Steiner tree is in general NP-hard [28], so GRIP
adopts a heuristic approach, based on local search, for finding columns that reduce the objective value
of (RMLP-GR).

Figure 3.2: Improving routes via the shortest path algorithm on a weighted grid-graph.
Within the pricing problem, condition (3.4) should be evaluated in step 3 of the CG procedure.
Given a dual solution $(\lambda, \pi)$, the reduced cost of route $t$ of net $T_i$ is
\begin{equation}
\bar{c}_{it} = c_{it} - \sum_{e \ni t} \pi_e - \lambda_i. \tag{3.6}
\end{equation}
Consequently the pricing problem can be viewed as a procedure for identifying a Steiner tree
$t$ for net $T_i$ whose reduced cost $\bar{c}_{it} < 0$. By the complementary slackness condition of linear
programming, for any optimal solution $(x, s)$ to (RMLP-GR) and corresponding dual values $(\lambda, \pi)$,
the reduced cost $\bar{c}_{it} = 0$ if $x_{it} > 0$.
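The reduced cost in (3.6) is simple to evaluate once the duals are known. The helper below is an illustrative sketch; its name, its arguments, and the numbers in the example are assumptions and are not taken from GRIP.

```python
def reduced_cost(route_edges, edge_cost, pi, lam_i):
    """Reduced cost (3.6): sum of (c_e - pi_e) over the route, minus lambda_i.

    pi maps edges to their dual values (pi_e <= 0); edges that are missing
    from pi are treated as uncongested (pi_e = 0).
    """
    return sum(edge_cost[e] - pi.get(e, 0) for e in route_edges) - lam_i

# Hypothetical numbers: unit edge costs, one congested edge, lambda_i = 100.
unit_cost = {e: 1 for e in ("a", "b", "c", "d", "f", "g", "h")}
duals = {"b": -99}
short_route = reduced_cost(("a", "b", "c"), unit_cost, duals, 100)       # 2
detour = reduced_cost(("a", "d", "f", "g", "h"), unit_cost, duals, 100)  # -95
```

The longer detour has a negative reduced cost because it avoids the edge with a large negative dual, so it would be a candidate column, whereas the shorter route through the congested edge would not.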
GRIP’s local improvement procedure for solving (PP($T_i$)) uses this fact as well as the following
simple observation. Given a route $t \in \mathcal{S}(T_i)$, let $V(t)$ be the set of vertices of the
terminals and Steiner points in $t$. If the variable $x_{it} > 0$, and if there exists a path $P'$ which
connects two vertices $u, v \in V(t)$ such that the weight of $P'$ (with respect to weights $w$) is less than
the weight of the path $P$ from $u$ to $v$ using edges in $t$, then the reduced cost of tree
$t' = (t \cup P') \setminus P$ is negative. Thus, adding the variable corresponding to route $t'$ to
(RMLP-GR) may reduce its objective value. Figure 3.2 shows inserting such a $u$-$v$ path into a base
Steiner tree.
To approximately solve (PP($T_i$)) for a net $T_i$, GRIP starts with the tree $t \in \mathcal{S}(T_i)$ with
the largest value of $x_{it}$. Using edge weights $w_e = c_e - \pi_e$, a single-source shortest path problem
is solved for some $u$-$v$ paths, where $u, v \in V(t)$. If the new $u$-$v$ path has smaller weight than the
existing path, a route with negative reduced cost has been identified. To identify sources and sinks for
the shortest path problems, the selected route $t$ is decomposed into a set of two-terminal segments
$r_{jt}$ by breaking it at the Steiner points of $t$. The segments are considered in descending order of their
weight $\sum_{e \ni r_{jt}} w_e$. When considering segment $r_{jt}$, the remaining segments of $t$ are considered
as a “base Steiner tree”, and an alternative route of the segment $r_{jt}$ must be found to connect to this
base. Zeroing the weights ($w_e = 0$ for all $e \ni t \setminus r_{jt}$) and running Dijkstra’s single-source
shortest path algorithm will connect the segment $r_{jt}$ to the base tree in a minimum cost fashion [61], [76].
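A minimal sketch of the rerouting step on a 2-D grid follows. It makes two simplifying assumptions that are not GRIP's implementation: the graph is a plain `width` by `height` grid, and instead of explicitly zeroing the base-tree weights the search stops at the first base-tree node it settles, which yields the same path weight because travel along the (connected) base tree would be free. All function names are illustrative.

```python
import heapq

def edge(a, b):
    """Canonical key for the undirected grid edge between nodes a and b."""
    return (a, b) if a <= b else (b, a)

def neighbors(node, width, height):
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)

def reroute_segment(source, base_nodes, weight, width, height):
    """Dijkstra from `source` to the nearest node of the base Steiner tree.

    weight maps edge keys to w_e = c_e - pi_e; edges not listed default to 1.0
    (unit cost, zero dual). Returns (path_weight, path_nodes).
    """
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        if u in base_nodes:               # reached the base tree
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v in neighbors(u, width, height):
            nd = d + weight.get(edge(u, v), 1.0)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []
```

Because $\pi_e \le 0$, every weight $w_e = c_e - \pi_e$ is non-negative, which is the condition Dijkstra's algorithm needs. For example, on a 4 by 3 grid with a heavily weighted edge between (1,0) and (2,0), a terminal at (0,0) reconnects to a base tree in column 3 through row 1 rather than along row 0.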
Dijkstra’s single-source shortest path algorithm [26] generates an entire tree of shortest path
weights, thus possibly identifying many routes that would reduce the objective value of (RMLP-GR).
At each iteration of GRIP, a subset of these routes are added as new columns by uniformly sampling
from all the identified routes. At most 40 routes will be added for the nets inside the congested area
(defined in Section 3.2.3), and at most 16 routes are added for nets outside the congested area.
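The column-sampling rule can be sketched as follows. The limits 40 and 16 come from the text; the function name and the optional `rng` argument (used only to make runs reproducible) are assumptions for this illustration.

```python
import random

def sample_new_columns(improving_routes, in_congested_area, rng=random):
    """Uniformly sample the routes to add as new columns for one net:
    at most 40 for a net inside the congested area, at most 16 otherwise."""
    limit = 40 if in_congested_area else 16
    routes = list(improving_routes)
    if len(routes) <= limit:
        return routes                     # few enough routes: keep them all
    return rng.sample(routes, limit)      # uniform sample without replacement
```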
Figure 3.3 demonstrates how new routes are constructed by rerouting a two-terminal $u$-$v$ segment.
In the figure, the cost and capacity of each edge is 1, and there are two initial routes $t_a$
and $t_b$ for nets $T_a$ and $T_b$, respectively. After solving (RMLP-GR), GRIP sets the edge weights to
$w_e = c_e - \pi_e$. These edge values are shown in Figure 3.3(a). Note that the two edges with overflow
have large negative dual values ($\pi_e \ll 0$), resulting in large positive edge weights. (The penalty for
not completely routing a tree is $M = 100$ in this example.) Based on these edge weights, the cost of
routes $t_a$ and $t_b$ is 205.
Assume that net Ta is selected for pricing first. As shown in Figure 3.3(a), tree ta can be de-
composed into three segments, each including a terminal. The total edge weight is maximum in the
segment that includes the two edges with large weights, so GRIP starts by rerouting this segment,
using the remaining ones for the base Steiner tree [61], [76]. To reroute the segment, it is removed
from ta, and the edge weights of the remaining edges on the base Steiner tree are set to zero, as
shown in Figure 3.3(b). Thus, GRIP considers the base tree as a backbone when reconnecting $u_a$
and $v_a$ using Dijkstra’s algorithm.

Figure 3.3: Procedure to identify new candidate routes with reduced cost via rerouting segments of
an existing route (panels (a) through (d)).
After reconnecting, an improved route $t'_a$, avoiding the highly-weighted edges, is identified with
a cost of 10 units. In a similar fashion, GRIP considers net $T_b$ and reroutes the segment $u_b$-$v_b$ as
shown in Figure 3.3(c). The new route $t'_b$ for net $T_b$ has a cost of 9 units. If the reduced costs of
$t'_a$ and $t'_b$ are negative, then these routes are added to (RMLP-GR) as new candidate routes. An
interesting feature of this pricing algorithm is that the new routes can use different Steiner points
than the original ones.
3.2.3 Selecting Nets to Price
For large instances of (ILP-GR), the CG procedure can be significantly accelerated by only solving
the pricing problem (PP(Ti)) for a subset of all the nets. To select the nets Ti ∈ N for which
(PP(Ti)) is solved, GRIP takes advantage of information provided by the solution of (RMLP-GR).
Specifically, if si > 0, then net Ti is not completely routed using the existing routes in Si, so net Ti
will be priced. GRIP first prices all the nets in descending order of si (> 0).
To decide whether or not to price the remaining nets with si = 0, GRIP considers measures of
congestion in the current LP solution to (RMLP-GR). In the first measure, congested edges are
those edges e that have the most negative value of πe. The intuition is that πe provides the rate of
change in the objective function of (RMLP-GR) per unit additional capacity on edge $e$. The second
measure identifies a congested edge by letting $r_i \in \arg\max_{t \in \mathcal{S}(T_i)} x_{it}$ be a route
for net $T_i$ with the largest solution value in (RMLP-GR). The value $\eta_e = \sum_{i=1}^{N} a_{r_i e}$ is the
number of units of capacity on edge $e$ that would be used if the routes $r_i$ were used for each net
$T_i \in \mathcal{N}$. If $(\eta_e - u_e)$ is large, then $e$ is highly congested.
GRIP defines a bounding box (of 3x3 units of grid edges) around each identified congested edge
$e$. All nets $T_i$ that contain a terminal inside the bounding box are repriced. GRIP first reprices nets
that are identified by the first congestion measure, followed by nets found by the second measure.
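The second congestion measure and the bounding-box selection can be sketched as below. The exact geometry of the 3x3 box is not specified in the text, so the `half_width` parameter (one grid unit on each side of the edge) is an assumption, as are the function names and the 2-D edge representation.

```python
def congestion_by_usage(best_route, capacity):
    """eta_e - u_e for every used edge, where eta_e counts the nets whose
    highest-valued route r_i uses edge e.

    best_route: {net: iterable of edges in r_i};  capacity: {edge: u_e}
    """
    eta = {}
    for route in best_route.values():
        for e in route:
            eta[e] = eta.get(e, 0) + 1
    return {e: eta[e] - capacity[e] for e in eta}

def nets_near_edge(edge, terminals, half_width=1):
    """Nets with at least one terminal inside a box around `edge`.

    edge:      ((x1, y1), (x2, y2)) on a 2-D grid
    terminals: {net: [(x, y), ...]}
    The box extends half_width grid units beyond the edge (an assumption).
    """
    (x1, y1), (x2, y2) = edge
    lo_x, hi_x = min(x1, x2) - half_width, max(x1, x2) + half_width
    lo_y, hi_y = min(y1, y2) - half_width, max(y1, y2) + half_width
    return [net for net, pins in terminals.items()
            if any(lo_x <= x <= hi_x and lo_y <= y <= hi_y for x, y in pins)]
```

Edges with the largest positive value returned by `congestion_by_usage` would be treated as congested, and the nets returned by `nets_near_edge` for those edges would be repriced.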
3.2.4 Branch and Bound
Once the CG procedure for the solution of the LP relaxation of (ILP-GR) is complete, either because
no improving routes were found in the pricing phase, or because tailing off was detected, a promising
candidate subset of routes S (Ti)⊂T (Ti) has been identified for each net Ti. Using only these route
variables, the integer program (ILP-GR) is formulated and solved by a black-box commercial integer
programming package. The solution returned by the solver is a feasible solution to the problem.
The proposed approach, based on the direct solution of (ILP-GR), has significant promise to
improve the solution quality of existing academic global routers. For example, using this approach,
we solved the small 2-D IBM01 circuit of the ISPD’98 suite [1] and were able to improve the wire-
length by approximately 5% compared to the best solution found by FGR [61], without any over-
flows. However, the runtime to achieve this high-quality solution for such a relatively-small instance
was prohibitively long—a few hours. Thus, in the following section, we discuss mechanisms for
decomposing the full global routing (ILP-GR) problem into smaller instances and procedures for
combining the solutions in order to generate high-quality solutions to large-scale Global Routing
instances. The decomposition procedure considerably accelerates the overall runtime.
3.3 Decomposition for Scalability
Many existing global routing algorithms define reasonably-sized subproblems and create a full so-
lution from integrating the solutions to these subproblems. For example, to achieve a good runtime,
BoxRouter [16] starts by solving an IP over a small rectangular box on the chip and progressively
increases the size of the box to generate new IPs, fixing the solution to the previous IP. However,
this solution fixing when increasing the box size may lead to a degradation in solution quality. The
work [77] proposes a hierarchical IP approach that first solves a small IP to plan the routing of the
longest nets. However, the impact of the shorter nets is neglected.
As demonstrated in Section 3.2, the price-and-branch procedure has the potential to find high-quality
solutions, but needs to be accelerated. In this section, we discuss ideas for decomposing
the integer program (ILP-GR) into smaller ones that correspond to non-overlapping rectangular ar-
eas on the chip, together with their net assignments. For example, if all the terminals of a net fall
within a rectangle, then the net is assigned to that subproblem and is bound to be routed inside the
rectangle. We first discuss how GRIP’s IP-based procedure is applicable to solve one subproblem.
We then discuss the procedures to define subproblems and integrate their solutions.
3.3.1 Solving the Subproblems
A subproblem is characterized by a rectangle on the chip referred to as a subregion, together with
a set of nets that must be routed within that area. For some nets, all terminals will lie within the
rectangle, but for longer nets, additional (or all) of their terminals might be outside the rectangle.
Nets whose terminals do not all fall within the rectangle are referred to as inter-region nets. Inter-
region nets are partially routed by each subproblem, and subsequently their segments in different
subproblems are connected.
To be applicable in a decomposition-based procedure, GRIP must handle the case when a
subproblem includes both within-region and inter-region nets. GRIP’s procedure works as follows.
Each subproblem defines a new grid-graph $G'(V', E')$ and set of nets $\mathcal{N}' \subset \mathcal{N}$. The
set $\mathcal{N}'$ is composed of two types of nets: the within-region nets that have all terminals inside the
subproblem ($T_i \subseteq V'$), and the inter-region nets that have at least one terminal outside the
subproblem ($T_i \not\subseteq V'$).
Figure 3.4(a) shows the latter type of these nets. The net in the figure belongs to three different
subproblems. The neighboring boundaries of these subproblems are shown in bold. The routing
problem for the bottom-left subproblem views this net to have one fixed and two “floating” termi-
nals. Each floating terminal represents a portion of a subproblem boundary through which the net
will connect to another subproblem.
To route inter-region nets in a subproblem, GRIP represents each floating terminal using an aux-
iliary node that is added to the set of nodes V ′ in the grid-graph. Edges connecting the nodes that
are on the subproblem boundary to their corresponding auxiliary node are added to the set E ′. The
added edges have infinite capacity and zero cost in the definition of the integer program (ILP-GR).
Figure 3.4(b) illustrates the addition of auxiliary nodes and edges. After applying this simple
construction, the integer program (ILP-GR) is well-defined, and can be solved by the procedure outlined
in Section 3.2.
The example of Figure 3.4(b) is for 2-D routing, but in the general 3-D case, each boundary of
a subproblem is a plane and the graph G′ extends to the third dimension, as shown in Figure 3.4(c).
The nodes on this vertical boundary plane are connected to their corresponding auxiliary node.
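The auxiliary-node construction can be sketched on a dictionary-based graph. The edge representation, the function name, and the auxiliary node label are assumptions for illustration; only the zero cost and infinite capacity of the added edges come from the text.

```python
def add_floating_terminal(edges, boundary_nodes, aux_name):
    """Return a copy of `edges` with one auxiliary node attached.

    edges:          {(u, v): {"cost": c_e, "capacity": u_e}}
    boundary_nodes: nodes on the portion of the subproblem boundary through
                    which the net may leave the subregion
    aux_name:       label of the new auxiliary node (hypothetical)
    Each added edge has zero cost and infinite capacity, so reaching the
    auxiliary node is equivalent to reaching any of the boundary nodes.
    """
    extended = dict(edges)
    for node in boundary_nodes:
        extended[(node, aux_name)] = {"cost": 0, "capacity": float("inf")}
    return extended
```

The net is then routed as if the auxiliary node were an ordinary terminal, and the boundary node through which the selected route reaches it gives the location of the floating terminal.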
3.3.2 Decomposition into subproblems
The challenge of decomposing the problem into subproblems is best understood by means of our
initial computational experience. Our first decomposition approach was to define a uniform grid of
subproblems of equal area. Nets were assigned to the subproblems based on their
routings by Flute.
This natural but naive decomposition approach resulted in the IPs corresponding to the congested
subproblems taking significantly longer to be solved by our procedure (e.g., hours for congested
subproblems and minutes for the less congested ones). Thus, an important objective of the subproblem
definition is to achieve balance, resulting in “equally-difficult” problems that take approximately
the same time to solve.

Figure 3.4: Modifying grid-graph of a subproblem to handle floating terminals. Panels (a), (b), and
(c); the legend marks the floating terminals and the auxiliary node.
GRIP’s procedure for defining subproblems begins by routing all the nets using the 2-D Steiner
route generated by Flute [18]. For each grid edge e of the 2-D problem, a utilization factor is
defined as the ratio of the number of (Flute) routes that cross edge e to its (projected) capacity ue.
The utilization factor plays an important role in defining the subproblem boundaries.
Next, GRIP applies a recursive bi-partitioning strategy, trying to balance an average edge uti-
lization factor (AEU) for each region. At each step, one rectangular partition is divided into two
new rectangles where the AEU is balanced between the two. The AEU for a partition is defined as
the average of the utilization factors of the grid edges in the corresponding rectangle. Moreover, to
decide between a vertical or horizontal partitioning, GRIP chooses the one that results in the smaller
aspect ratio of the generated rectangles. The recursive bi-partitioning stops when any of the sides of
the current partition is less than or equal to 32 units of the routing grid, a size empirically set to
generate an IP that can typically be solved by the procedure outlined in Section 3.2 in an acceptable runtime.
This partition will then be taken as a subproblem. Figure 3.5(a) shows an example after the first two
steps.
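The recursive bi-partitioning can be sketched as follows. Several simplifications are assumed and are not taken from GRIP: utilization is stored per grid cell rather than per grid edge, the cut position is the one that minimizes the AEU difference between the two halves (ties broken toward the middle), the direction is chosen by the larger aspect ratio of the two halves, and `min_side` is at least 1. All names are illustrative.

```python
def aeu(util, rect):
    """Average utilization over a rectangle (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = rect
    cells = [util[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(cells) / len(cells)

def aspect(rect):
    x0, y0, x1, y1 = rect
    w, h = x1 - x0, y1 - y0
    return max(w, h) / min(w, h)

def best_cut(util, rect, vertical):
    """Split `rect` where the AEUs of the two halves are most balanced."""
    x0, y0, x1, y1 = rect
    lo, hi = (x0, x1) if vertical else (y0, y1)

    def halves(c):
        if vertical:
            return (x0, y0, c, y1), (c, y0, x1, y1)
        return (x0, y0, x1, c), (x0, c, x1, y1)

    def imbalance(c):
        a, b = halves(c)
        # Primary key: AEU gap. Secondary key: distance from the middle.
        return (abs(aeu(util, a) - aeu(util, b)), abs(c - (lo + hi) / 2))

    return halves(min(range(lo + 1, hi), key=imbalance))

def partition(util, rect, min_side=32):
    """Recursively bi-partition until some side is <= min_side (>= 1)."""
    x0, y0, x1, y1 = rect
    if min(x1 - x0, y1 - y0) <= min_side:
        return [rect]
    options = [best_cut(util, rect, v) for v in (True, False)]
    # Prefer the direction whose halves have the smaller aspect ratio.
    a, b = min(options, key=lambda ab: max(aspect(ab[0]), aspect(ab[1])))
    return partition(util, a, min_side) + partition(util, b, min_side)
```

On a uniform 8 by 8 utilization map with `min_side=2`, this sketch produces eight 2 by 4 rectangles; with a non-uniform map the cuts shift so that the average utilization of sibling rectangles is as close as possible.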
Once the subproblems are created, the net assignments suggested by Flute are further improved
by considering the congestion of the subproblem.
Figure 3.5: (a) Defining subproblems using initial Flute-based net planning (before detouring).
(b) Improving net assignment to the subproblems via detouring (after detouring).
Figure 3.5(a) illustrates this point. The two nets T1 and T2 are routed using their Steiner routes,
both of which pass through subproblem A. Net T1 does not have any terminals inside subproblem
A. If A is congested, it is better to detour T1 from A, as shown in Figure 3.5(b), reserving the routing
resources for nets that must be routed into the subproblem.
To improve the net assignments to the subproblems, GRIP relies on the fact that subproblems are
solved in a congestion-based ordering, as described in Section 3.3.3. Before solving a subproblem,
GRIP detours as many nets as possible that “pass” through the corresponding subproblem (i.e., do
not have a terminal in it). The remaining (undetoured) nets inside the subproblem are the ones as-
signed to it and the corresponding subproblem is then solved. The procedure repeats before solving
the subsequent subproblem.
To detour routes out of a subproblem, a shortest path algorithm is used. For a net that does not
have any terminals in the current subproblem, we identify the segment (using its Flute route) that
passes the subproblem, and consequently the two terminals that are connected using this segment.
The two terminals are reconnected via a new segment back to the tree backbone using the same
maze routing procedure explained in Section 3.2.2 (see Figure 3.5). The weights on the grid
graph for the shortest path problem are set as follows. Since the net should be detoured outside
the subproblem, weights of all the grid-edges inside the subproblem are set to infinity. For the
remaining edges, if an edge is used to capacity by the existing (Flute and detoured Flute) routes,
the weight is set to a large positive number (=100). The remaining edges have a weight of 1. The
detouring procedure in GRIP has the benefit that it is dynamic, continually updating edge weights
for rerouting every time a new subproblem is processed. Figure 3.6 shows an overview of GRIP’s
flow to define and solve the subproblems.

Figure 3.6: GRIP’s procedure to define and solve the subproblems. Flow: temporarily fix the nets
using the Steiner routes generated by Flute; extract subregions by recursive bi-partitioning; then,
starting from i = 0, detour inter-region nets from subregion i, solve the IP for subregion i, and
increment i.
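The weighting rule used for detouring can be written as a three-case function. The values (infinity, 100, and 1) come from the text; the function name and the way usage and capacity are stored are assumptions for illustration.

```python
import math

def detour_weight(edge, inside_subproblem, usage, capacity, full_penalty=100):
    """Edge weight for the shortest-path detour around the current subproblem.

    inside_subproblem: set of grid edges inside the subproblem being processed
    usage:             {edge: number of (Flute and detoured) routes on it}
    capacity:          {edge: u_e}
    """
    if edge in inside_subproblem:
        return math.inf                   # the detour must stay outside
    if usage.get(edge, 0) >= capacity[edge]:
        return full_penalty               # edge already used to capacity
    return 1                              # ordinary edge
```

Recomputing these weights before each subproblem is processed is what makes the detouring dynamic, since `usage` reflects the detours made so far.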
3.3.3 Integration of the Solutions of the Subproblems
Thus far, we have explained how the subproblems are formed and how the net assignments are made
to define the subproblems. GRIP solves the subproblems in a largely sequential order with limited
parallelism. After all the subproblems are solved, a final phase connects the route segments that
cross between neighboring subproblems. Both of these phases are explained in this section.
For each subproblem, GRIP first computes the total edge overflow (TEO). The TEO is the total
amount of overflow that would occur in the subproblem if the assigned Steiner (Flute and detoured
Flute) routes were used. Subproblems are processed in the decreasing order of their TEOs. Every
time a subproblem is processed, the floating terminals for a net Ti are fixed at a boundary of the
subproblem, as shown in Figure 3.7. Thus, the net Ti is partially routed, and subsequent subprob-
lems must respect this partial routing by assuming the imposed boundary-terminal is fixed. If two
consecutive subproblems (in terms of TEO) are not physically adjacent, GRIP processes them in
parallel.
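The ordering rule can be sketched as follows. The text only states that two consecutive, non-adjacent subproblems are processed in parallel; grouping longer runs of mutually non-adjacent subproblems into one batch is a generalization assumed here, and all names are illustrative.

```python
def total_edge_overflow(edges, usage, capacity):
    """TEO of a subproblem: summed overflow of its edges under the assigned
    (Flute and detoured Flute) routes."""
    return sum(max(0, usage.get(e, 0) - capacity[e]) for e in edges)

def schedule(teo, adjacent):
    """Order subproblems by decreasing TEO and group consecutive, mutually
    non-adjacent subproblems into batches that may be solved in parallel.

    teo:      {subproblem: total edge overflow}
    adjacent: {subproblem: set of physically adjacent subproblems}
    """
    order = sorted(teo, key=lambda s: -teo[s])
    batches = []
    for s in order:
        if batches and all(s not in adjacent[o] for o in batches[-1]):
            batches[-1].append(s)         # safe to run alongside this batch
        else:
            batches.append([s])           # adjacent to the batch: start anew
    return batches
```

For four subproblems with TEOs A: 9, C: 7, B: 5, D: 1, where only A-B and C-D are adjacent, the sketch yields two batches, [A, C] and then [B, D].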
Figure 3.7: Connecting route-segments in adjacent subproblems (Subregion 1 and Subregion 2; the
legend marks the first Steiner point and the fixed terminal).
Solving all the subproblems fixes the locations of the floating terminals on the subproblem
boundaries. However, the subproblems are not connected, since the grid-edges between the bound-
aries of subproblems are not considered when solving each subproblem. GRIP uses the same IP-
based procedure to connect the route segments in adjacent subproblems. Specifically, after all the
subproblems are processed, GRIP fixes all the nets that completely fall within a subproblem. For the
inter-region nets, GRIP fixes a “backbone” inside each of its subproblems. To create the backbone,
GRIP removes the branch of the net that connects the boundary terminal to the first Steiner point of
the route in the subproblem. (See Figure 3.7.)
Once these connecting segments are removed, routing resources are freed. GRIP connects these
segments using the formulation (ILP-GR), first fixing all routes of within-region nets and backbones
of the inter-region nets. In the IP, the nets to be routed are two terminal nets crossing the inter-region
boundary, each terminal being a Steiner point of the backbone in the region. By setting the edge
weight we = 0 for all edges in the backbone, the IP effectively connects the two sub-nets at any
location on the backbones. When connecting two neighboring subproblems, remaining (unfixed)
capacity is allocated to the subproblem in a manner that ensures that neighboring subproblems in
all quadrants will be able to be effectively connected.
3.4 Handling Overflow
After connecting the subproblem solutions, GRIP evaluates if any net is left unrouted. In case all
the nets were routed, which we found to be the case in the majority of our tested benchmarks, GRIP
terminates. If nets were left unrouted, then routing those nets using any of the generated candidate
routes will introduce overflow (i.e., the corresponding slack variable in Equation (3.1) is 1). In this
section, we discuss an IP and the specifics of the price-and-branch procedure to reduce overflow. We
then discuss how GRIP applies this procedure to selected areas on the chip.
3.4.1 Integer Program for Overflow Reduction
GRIP uses the following IP to minimize overflow:
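\begin{align}
\min_{o_e} \quad & \sum_{e \in E} Q_e o_e \tag{ILP-OV} \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T}(T_i)} x_{it} = 1 && \forall i = 1, \ldots, N, \tag{3.7} \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} a_{te} x_{it} \le u_e + o_e && \forall e \in E, \tag{3.8} \\
& x_{it} \in \{0,1\} && \forall i = 1, \ldots, N,\ \forall t \in \mathcal{T}(T_i), \notag \\
& o_e \ge 0 && \forall e \in E. \notag
\end{align}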
Compared to (ILP-GR), the slack variable si is removed from the net constraint (3.1), but a
new slack variable oe is added to the edge capacity constraints (3.8). The slack variable oe will be
positive if edge e contains overflow, and the objective is to minimize the overflow over all edges.
In (ILP-OV), Qe is a constant weight that can be set to a different value for each edge. In the case
Qe = 1 ∀e, the objective is to minimize the total overflow. GRIP sets Qe by considering the overflow
produced by routing the nets unrouted by the original IP-based procedure. To route an unrouted net,
GRIP selects from the candidate routes for that net, the route that would lead to minimum additional
overflow. If edge e contains overflow in this complete solution, Qe is set equal to the amount of
overflow. If it does not contain overflow, Qe is set equal to 1.
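The weight-setting rule can be sketched in Python as follows. This is a minimal illustration and not GRIP's code: the function names and the assumption that each route consumes one unit of capacity on every edge it uses are ours.

```python
def added_overflow(route, usage, capacity):
    """Extra overflow caused by adding one unit of usage on each edge of route."""
    return sum(
        max(0, usage.get(e, 0) + 1 - capacity[e]) - max(0, usage.get(e, 0) - capacity[e])
        for e in route
    )


def complete_solution(unrouted, usage, capacity):
    """Route each unrouted net on its candidate with minimum additional overflow."""
    usage = dict(usage)  # work on a copy of the partial solution's edge usage
    for net, candidates in unrouted.items():
        best = min(candidates, key=lambda r: added_overflow(r, usage, capacity))
        for e in best:
            usage[e] = usage.get(e, 0) + 1
    return usage


def overflow_weights(usage, capacity):
    """Q_e = overflow on edge e in the complete solution, or 1 if there is none."""
    weights = {}
    for e, cap in capacity.items():
        over = usage.get(e, 0) - cap
        weights[e] = over if over > 0 else 1
    return weights


# Hypothetical data: three edges, two nets left unrouted by the first IP.
capacity = {"e1": 1, "e2": 1, "e3": 2}
usage = {"e1": 1, "e2": 1, "e3": 1}
unrouted = {"n1": [["e1", "e2"], ["e1", "e3"]], "n2": [["e1"]]}

full_usage = complete_solution(unrouted, usage, capacity)
print(overflow_weights(full_usage, capacity))
```

Here edge e1 ends with an overflow of 2 and therefore receives weight 2, while the other edges keep the default weight 1.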
3.4.2 Solution Procedure via Price-and-Branch
Similar to the IP-based procedure of Section 3.2, GRIP utilizes column generation to solve the LP
relaxation of (ILP-OV). The dual of the LP relaxation of (ILP-OV) is
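\begin{align}
\max \quad & \sum_{i=1}^{N} \lambda_i + \sum_{e \in E} \pi_e u_e \tag{LPD-OV} \\
\text{s.t.} \quad & \lambda_i + \sum_{e \in t} \pi_e \le 0 && \forall i = 1, \ldots, N,\ \forall t \in \mathcal{T}(T_i), \notag \\
& -Q_e \le \pi_e \le 0 && \forall e \in E, \notag \\
& \lambda_i : \text{free}. \notag
\end{align}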
GRIP starts with a small subset of routes and solves the restricted master problem for (ILP-OV):
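\begin{align}
\min_{x \ge 0,\ o \ge 0} \quad & \sum_{e \in E} Q_e o_e \tag{RMLP-OV} \\
\text{s.t.} \quad & \sum_{t \in \mathcal{S}(T_i)} x_{it} = 1 && \forall i = 1, \ldots, N, \notag \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{S}(T_i)} a_{te} x_{it} \le u_e + o_e && \forall e \in E. \notag
\end{align}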
At the first iteration, $\mathcal{S}(T_i)$ contains only one route per net, namely the route used to
obtain the complete solution. GRIP solves (RMLP-OV) to obtain the dual values $\lambda$ and $\pi$.
The pricing problem is solved to identify a new route $t^*$ that violates the first constraint of
(LPD-OV) (i.e., $\lambda_i + \sum_{e \in E} \pi_e t^*_e > 0$), indicating that the objective of
(RMLP-OV) may be improved if $t^*$ is added to $\mathcal{S}(T_i)$.
When solving the pricing problem to identify a negative reduced cost route for each net, the
edge weight we = −πe is used. Note that πe ≤ 0, so we ≥ 0, and Dijkstra’s single-source shortest
path algorithm can be used to identify the promising routes, exactly as in the procedure described
in Section 3.2.2. Just as in the GRIP procedure for solving (ILP-GR), the linear program solution
process is terminated when tailing off is detected, and the resulting routes are given to a commercial
branch and bound solver to find an integer solution to (ILP-OV).
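The pricing step can be sketched for the simplest case, a two-terminal net, where the cheapest route under the weights we = −πe is a shortest path. The sketch below is an assumption-laden illustration, not GRIP's pricing code: the adjacency-list format, the function name, and the restriction to two terminals are ours (multi-terminal nets require Steiner trees).

```python
import heapq


def price_two_terminal_net(graph, pi, lam, source, target, tol=1e-9):
    """Return an improving route (list of edge ids) or None.

    graph: node -> list of (neighbor, edge_id)
    pi:    edge_id -> dual value pi_e (<= 0); missing edges are treated as 0
    lam:   dual value lambda_i of the net's assignment constraint
    A route t improves the restricted master if lam + sum_{e in t} pi_e > 0.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        if u == target:
            break
        for v, e in graph[u]:
            nd = d - pi.get(e, 0.0)       # edge weight w_e = -pi_e >= 0
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, e)
                heapq.heappush(heap, (nd, v))
    if target not in dist or lam - dist[target] <= tol:
        return None                       # no route violates the dual constraint
    route, node = [], target
    while node != source:
        node, e = prev[node]
        route.append(e)
    return route[::-1]


# Hypothetical triangle graph with edges e1 (a-b), e2 (b-c), e3 (a-c).
graph = {
    "a": [("b", "e1"), ("c", "e3")],
    "b": [("a", "e1"), ("c", "e2")],
    "c": [("b", "e2"), ("a", "e3")],
}
print(price_two_terminal_net(graph, {"e1": -1.0, "e2": -1.0, "e3": -3.0}, 5.0, "a", "c"))
```

Because all weights are nonnegative, Dijkstra's algorithm applies directly; the reduced-cost test is the violated dual constraint of (LPD-OV).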
[Figure 3.8 legend: edge with max overflow; fixed terminal. Labels: A, B, T1.]
Figure 3.8: Defining a few subproblems around the edges with overflow.
When selecting nets to price, GRIP first focuses on the nets that pass through the edges with
overflow. Specifically, at one iteration of column generation, for each edge e with oe > 0, all the
routes t with xit > 0 containing edge e ∈ E(t) are selected and repriced. Since the number of edges
with oe > 0 is typically small, all the corresponding routes are selected, repriced, and their new
route variables are added at the same iteration of column generation. The process repeats until no
improving routes are found or until it tails off, that is, until the objective value of (RMLP-OV)
has not improved by more than 1 unit in the last 20 iterations. Once this procedure terminates, the
resulting set of candidate routes is used to solve (ILP-OV) to obtain one route for each net.
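The tailing-off test described above amounts to a simple check on the history of restricted-master objective values. The following Python sketch shows one way to express it; the function name and the list-based history are hypothetical, and only the thresholds (1 unit, 20 iterations) come from the text.

```python
def tailed_off(objective_history, window=20, min_gain=1.0):
    """True if the (minimization) objective improved by at most min_gain
    over the last `window` column-generation iterations."""
    if len(objective_history) <= window:
        return False                      # not enough iterations to judge
    return objective_history[-window - 1] - objective_history[-1] <= min_gain


# Hypothetical history: fast initial progress, then a long flat tail.
history = [100.0, 90.0, 80.0] + [80.0 - 0.01 * k for k in range(1, 21)]
print(tailed_off(history))
```

With this rule, column generation stops on the flat tail even though tiny improvements are still being found, and the routes collected so far are handed to the branch-and-bound solver.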
3.4.3 Defining Subproblems for Overflow Reduction
After integrating the subproblem solutions, overflow was not observed in the majority of the tested
benchmarks. Overflow was observed for only three benchmarks in the ISPD’08 suite. Further,
overflow was typically confined to a very few “hot spots”. GRIP exploits this observation to define
small-sized subproblems with their net assignments on which to apply the IP-based procedure for
overflow reduction, described in Section 3.4.1. The routes on the other portions of the chip remain
intact.
GRIP defines the subproblems on which to reduce overflow in a sequential order but solves the
corresponding subproblems in parallel. To define the subproblems, GRIP traverses the grid edges in
descending order of their overflow values in a complete solution.
GRIP defines a rectangular subproblem of 40x40 grid edges centered at each overflow edge. If
an overflow edge is already included in a previously-defined subproblem, a new subproblem will
not be defined. Moreover, if defining the subproblem for an overflow edge results in overlap with
a previously defined subproblem, the new subproblem will be shifted until the overlap is removed.
All the routes inside a subproblem will be rerouted using the IP-based procedure for overflow
reduction. If a net has terminals outside the subproblem, fixed-terminal location(s) will be defined on
the subproblem boundary, based on its route generated from the previous steps. The fixed terminal
locations are honored when solving the subproblem. Figure 3.8 illustrates this point. This process
also ensures that the segments of the rerouted nets in congested subproblems will remain connected.
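The traversal and skip rule used to define the overflow subproblems can be sketched as below. This is a simplified illustration under stated assumptions: each overflow edge is identified by a hypothetical (x, y) grid position, windows are clamped to the chip boundary, and the shifting step that removes overlap between windows is deliberately omitted.

```python
def define_overflow_windows(overflow, grid_w, grid_h, size=40):
    """Return windows (x0, y0, x1, y1), half-open, around overflow edges.

    overflow: (x, y) -> overflow amount of the grid edge at that position.
    Edges are visited in descending order of overflow; an edge already
    covered by an earlier window does not get its own window.
    NOTE: the overlap-removing shift described in the text is not modeled.
    """
    half = size // 2
    windows = []
    for (x, y), ov in sorted(overflow.items(), key=lambda kv: (-kv[1], kv[0])):
        if ov <= 0:
            continue
        if any(x0 <= x < x1 and y0 <= y < y1 for x0, y0, x1, y1 in windows):
            continue                      # already inside a defined subproblem
        x0 = min(max(x - half, 0), max(grid_w - size, 0))
        y0 = min(max(y - half, 0), max(grid_h - size, 0))
        windows.append((x0, y0, min(x0 + size, grid_w), min(y0 + size, grid_h)))
    return windows


# Hypothetical overflow edges on a 324x324 grid.
overflow = {(100, 100): 5, (105, 102): 3, (300, 10): 2}
print(define_overflow_windows(overflow, 324, 324))
```

The second edge falls inside the window of the first and is skipped, so only two subproblems are defined for the three overflow edges.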
3.5 Comparison to Optimization-Based Methods
A number of other authors have proposed optimization-based methods for global routing, and the
purpose of this section is to attempt to place our work in context of these previous contributions.
An early description of applying a pricing procedure to solve the global routing problem is given
by [36]. This work is perhaps the most similar to the GRIP algorithm, in that it relies on column
generation, on defining subregions, and on pasting partial solutions together. In their work, there is
no mention of solving the IP, only its LP relaxation, and computational results are not reported.
The works [7] and [65] both focus on developing fast algorithms for approximately solving the
(full) LP relaxation of the global routing problem. In these approaches, the actual, primal, integer-
valued, routing solution is done by a randomized rounding procedure. This is quite different from
GRIP. GRIP is based on a price-and-branch approach for approximately solving the integer pro-
gram. So both procedures of solving the LP relaxation (pricing), and obtaining an integer solution
(branching) are different.
The paper [56] takes a similar approach to that of [7] and [65], but designs an algorithm that
specifically accounts for the effects of wire spacing during yield optimization. The work [38] is a full
branch-and-price procedure that mathematically shares many commonalities with GRIP. However,
the work [38] is designed specifically for the switchbox routing problem, and the instance sizes are
small enough so that region-based decomposition, done in GRIP, is not needed.
Similar to (ILP-GR), the paper [77] also suggests IP formulations for the global routing problem.
To solve the formulations in [77], column generation is not employed, but rather a set of possible
routes for each net is iteratively constructed during a congestion estimation phase. Generating routes
during the LP solution process, as done in GRIP, has the significant advantage of exploiting the dual
information to suggest good routes. The work [77] additionally considers a number of different
objectives besides wirelength or overflow. The paper [78] builds on the work of [77], by describing
different hierarchical approaches, where the routing problems are solved either top-down or
bottom-up. Computational results are given for chips with up to around 25,000 cells and nets.
BoxRouter [17] uses IP formulations as a fundamental component of their algorithm. These IP
formulations are not Steiner-tree packing formulations like (ILP-GR). A fundamental idea behind
BoxRouter is that of progressive IP, where the IP formulation is solved first for a subregion, and
portions of this solution are fixed before proceeding to other regions. This is quite a different
approach from GRIP’s area-based decomposition and patching.
3.6 Simulation Results
We first evaluate the quality of the subproblems defined by GRIP’s procedure in terms of
balance in computation. Next, we present results on the solution quality and
runtime of GRIP.
3.6.1 Evaluation of Subproblem Definition
As discussed in Section 3.3.2, it is a challenge to decompose a large-sized global routing instance
into a set of “balanced” subproblems. A bad decomposition approach may lead to long runtime or
even an unroutable solution for the congested subproblems. To demonstrate the degree of balance of
GRIP’s subproblem definition, we considered the adaptec1 benchmark instance from the ISPD’07
benchmark suite [3].
We followed the same decomposition procedure as described in Section 3.3.2 for defining the
subproblems but applied a more aggressive detouring for the long inter-region nets and solved the
subproblems in parallel. Note that all the floating terminals of inter-region nets were modeled as
the auxiliary nodes, and the connection of subproblems was not considered in this experiment. To
measure the computation effort in the pricing phase, we report the number of iterations in the pricing
phase as reported by MOSEK 5.0 [55], which we used to solve each subproblem. Next, to measure
the effort in the branch-and-bound phase, we report the number of nodes in the branch-and-bound
tree. We collect these two measurements for each subproblem.
[Figure 3.9 panel titles: GRIP Partition, Uniform Partition; axis label: Subproblem ID.]
Figure 3.9: Comparison of GRIP-based partitioning versus uniform-based partitioning in the benchmark adaptec1 [3].
We then applied an alternative uniform partitioning for comparison. We defined uniform-sized
subproblems of 32x32 routing grid units and utilized the same routing information as generated
in the previous approach. Similarly, all the subproblems are processed in parallel for which the
MOSEK iterations and the number of explored nodes in the branch-and-bound tree were recorded.
We then compare these two subproblem generation approaches and present the comparison plots in
Figure 3.9.
From this experiment, even though uniform partitioning uses the same initial routes of GRIP’s
partitioning to define the partitions, we can see that GRIP’s partitioning achieves better balance
in terms of number of MOSEK iterations. The uniform partitioning is also highly unbalanced with
respect to the number of branch-and-bound nodes. The plot for uniform partition shows two clusters,
one with 1000 nodes (which we used as a threshold to stop the time-consuming branch-and-bound
procedure) and one with fewer than 200 nodes. Although GRIP’s partitioning is more balanced,
we still have a small number of unbalanced subproblems.
Table 3.1: The ISPD’07 and ISPD’08 benchmarks
Benchmark       # Nets    Grid       # Layers  V.cap   H.cap
adaptec1 (07)   176715    324x324    6         70      70
adaptec2 (07)   207972    424x424    6         80      80
adaptec3 (07)   368494    774x779    6         62      62
adaptec4 (07)   401060    774x779    6         62      62
adaptec5 (07)   548073    465x468    6         110     110
newblue1 (07)   270713    399x399    6         62      62
newblue2 (07)   373790    557x463    6         110     110
newblue3 (07)   551667    973x1256   6         80      80
newblue4 (08)   636195    455x458    6         84      84
newblue5 (08)   1257555   637x640    6         88      88
newblue6 (08)   1286452   463x464    6         132     132
newblue7 (08)   2635625   488x490    8         212     212
bigblue1 (08)   282974    227x227    6         110     110
bigblue2 (08)   576816    468x471    6         52      52
bigblue3 (08)   1122340   555x557    8         148     148
bigblue4 (08)   2228903   403x405    8         202     202
3.6.2 Comparison of the Solution Quality and Runtime
GRIP was implemented using C++. For solving individual LPs and IPs, MOSEK 5.0 [55] and
CPLEX 6.5 [22] were used, respectively. We report results on the ISPD’07 [3] and ISPD’08 [4]
benchmarks. In the ISPD’07 benchmarks, each via is considered to have a cost of 3 units while in
the ISPD’08 benchmarks, it has a cost of 1 unit. Table 3.1 reports the total number of routed nets
and the grid size for each benchmark. Column 4 shows the number of metal layers. The last two
columns report the projected edge capacities for vertical and horizontal layers.
Table 3.3 reports the solution quality of GRIP for these benchmark instances (benchmark solutions
can be downloaded at http://wiscad.ece.wisc.edu/gr/). For each benchmark, the total overflow is
given in the column TOV, and the total wirelength (WL) is broken down into both edge and via cost.
GRIP is compared to four recent academic global routers: FGR 1.1 [61], FastRoute 4.0 [76],
NTHU-Route 2.0 [12], and BoxRouter 2.0 [16]. For each router, we report the percentage improvement
in wirelength found by GRIP, as well as the total overflow.
Considering wirelength, GRIP consistently generates the best result for each benchmark. The
improvement numbers are quite significant. The improvement is larger (9.23%) in the ISPD’07
benchmarks since the via cost is 3 for these benchmarks, and therefore the benefits of avoiding layer
assignment and directly generating 3-D routes are more significant. At the same time, if the same
GRIP solutions for the ISPD’07 benchmarks (for via cost of 3) are evaluated assuming via cost is 1,
still an improvement of on average 5.25% in wirelength is obtained.
GRIP generates solutions with no overflow for the majority of the benchmarks, so the overflow
reduction procedure of Section 3.4 need not be applied. For three ISPD’08 benchmarks, newblue4,
newblue7, and bigblue4, the overflow found by GRIP (before applying the overflow step) is reported
in column 10 of Table 3.3. The corresponding overflow and wirelength numbers after applying the
overflow reduction procedure are reported in the GRIP (with OV step) columns of Table 3.3. GRIP generates
the best known overflow for the first two benchmarks and quite comparable overflow for bigblue4,
while maintaining the wirelength improvement. For these three benchmarks the average degradation
in wirelength compared to GRIP without overflow step is about 0.11%.
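The quoted degradation can be checked from the wirelength values that Table 3.3 lists for GRIP without and with the overflow step. The short Python sketch below only reproduces that arithmetic; the variable names are ours.

```python
# (WL without OV step, WL with OV step), both in units of 10^5, from Table 3.3.
wirelength = {
    "newblue4": (124.2, 124.4),
    "newblue7": (335.5, 335.8),
    "bigblue4": (220.5, 220.7),
}

# Percentage increase in wirelength caused by the overflow-reduction step.
degradation = {
    name: 100.0 * (after - before) / before
    for name, (before, after) in wirelength.items()
}
average = sum(degradation.values()) / len(degradation)

for name, value in degradation.items():
    print(f"{name}: {value:.2f}%")
print(f"average: {average:.2f}%")
```

The three individual increases are below 0.2% each, and their mean rounds to the 0.11% stated in the text.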
GRIP was run on a heterogeneous grid of CPUs with 2GB of memory, shared by many users, and
controlled by the Condor grid computing toolkit [46]. Table 3.2 reports the run time information, not
including the overflow step. The number of subproblems created by the decomposition procedure
for each benchmark is given in the second column.
As discussed in Section 3.3.2, GRIP uses a congestion-based ordering to process and solve the
subproblems. Based on the ordering, at each step, GRIP solves multiple independent subproblems in
parallel. Column 5 of Table 3.2 gives the number of such processing steps for each benchmark.
The average and maximum numbers of parallel-processed subproblems at each step are also reported
in columns 6 and 7. The wall clock times are given in column 3. The runtime unit is minutes. These
wall clock times are computed for the case in which the grid is not shared with other users. The
actual wall clock time for solving the instances was larger, as jobs submitted to the Condor-controlled
grid often waited in the job queue while higher-priority jobs were run.
Table 3.2: Runtime information of GRIP (without the overflow step).
                        Runtime (min)              #Parallel Subp.
Benchmark       #Subp.  Wall    Total CPU  #Steps  Ave.   Max.
adaptec1 (07)   100     388     2247       12      8.3    18
adaptec2 (07)   169     455     2677       16      10.6   23
adaptec3 (07)   576     478     5168       32      18.0   38
adaptec4 (07)   570     509     5258       30      19.0   51
adaptec5 (07)   225     584     7133       16      14.1   30
newblue1 (07)   144     483     3076       18      8.0    15
newblue2 (07)   238     467     5228       23      10.4   18
newblue3 (07)   1170    1430    6768       61      19.2   39
newblue4 (08)   174     529     3974       20      8.5    19
newblue5 (08)   311     821     6598       31      9.5    21
newblue6 (08)   140     448     5096       15      8.9    16
newblue7 (08)   325     985     5377       36      9.0    18
bigblue1 (08)   49      339     2770       12      3.9    7
bigblue2 (08)   172     690     3793       21      8.0    20
bigblue3 (08)   208     731     3448       28      7.3    16
bigblue4 (08)   215     726     4400       27      7.6    21
We also report the total CPU runtimes as the summation of the runtimes spent by the CPUs
to solve the subproblems in column 4 of Table 3.2. Here, we can see that GRIP’s significant
improvement in solution quality comes at a considerable CPU time expense. However, comparing
the total runtime to the wall clock time, it can be seen that even the small level of parallelism in
GRIP can yield significant improvement, and reduce computational time to nearly acceptable
levels. The next chapter describes further exploiting parallelism to obtain similar high-quality
solutions in even shorter wall clock times.
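The comparison of total CPU time to wall clock time can be made concrete by taking their ratio for a few rows of Table 3.2. The Python sketch below does only that; the function name is hypothetical, and the ratio is a rough indicator of how many CPUs were busy on average rather than a measured speedup.

```python
def parallel_speedup(wall_minutes, total_cpu_minutes):
    """Ratio of summed CPU time to wall clock time (both in minutes)."""
    return total_cpu_minutes / wall_minutes


# (wall, total CPU) in minutes, copied from Table 3.2.
table_3_2 = {
    "adaptec1 (07)": (388, 2247),
    "newblue3 (07)": (1430, 6768),
    "bigblue1 (08)": (339, 2770),
}

for name, (wall, cpu) in table_3_2.items():
    print(f"{name}: {parallel_speedup(wall, cpu):.1f}x")
```

For these three rows the ratios are roughly 5.8, 4.7, and 8.2, which is consistent with the limited parallelism reported in columns 6 and 7.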
For the benchmarks with overflow, an additional 30 minutes of walltime was used for solving
the problem (RMLP-OV) to generate the candidate routes, and a 5 hour limit was used for solving
the IP using branch-and-bound. In the overflow case, the IP solver took significantly longer than
when solving similar IPs whose primary objective was wirelength. For each benchmark, very few
subproblems were defined around the congested areas for overflow reduction. These subproblems
were processed in parallel.
Table 3.3: Results of GRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5.
                FGR 1.1        FastRoute 4.0  NTHU-Route 2.0 BoxRouter 2.0  GRIP (without OV step)         GRIP (with OV step)
Benchmark       TOV    WL(%)   TOV    WL(%)   TOV    WL(%)   TOV    WL(%)   TOV    WL      Edge    Via     TOV    WL      #sp
adaptec1 (07)   0      8.42    0      10.99   0      8.79    0      11.99   0      81.0    36.5    44.5    –      –       –
adaptec2 (07)   0      8.33    0      10.01   0      9.33    0      12.60   0      82.4    33.7    48.7    –      –       –
adaptec3 (07)   0      7.14    0      9.39    0      7.69    0      10.61   0      185.4   97.5    87.9    –      –       –
adaptec4 (07)   0      3.94    0      7.84    0      7.36    0      7.57    0      172.3   91.5    80.8    –      –       –
adaptec5 (07)   0      8.11    0      11.73   0      8.18    0      11.65   0      238.9   104.8   134.1   –      –       –
newblue1 (07)   526    10.99   0      8.51    0      7.76    400    9.73    0      83.9    24.9    59.0    –      –       –
newblue2 (07)   0      6.18    0      10.49   0      9.83    0      9.83    0      121.4   48.0    73.4    –      –       –
newblue3 (07)   39908  10.14   31634  14.28   31454  6.50    38958  9.48    52518  156.1   76.2    79.9    45960  157.6   38
Avg. WL Impr.          7.91%          10.40%         8.18%          10.43%
newblue4 (08)   262    4.00    144    7.12    138    4.65    200    3.92    196    124.2   83.2    41.0    136    124.4   2
newblue5 (08)   0      4.36    0      5.88    0      3.79    0      4.35    0      222.8   147.6   75.2    –      –       –
newblue6 (08)   0      5.44    0      6.64    0      3.61    0      5.15    0      170.5   102.4   68.1    –      –       –
newblue7 (08)   1458   4.12    62     5.91    68     4.97    208    6.36    110    335.5   189.5   146.0   54     335.8   1
bigblue1 (08)   0      6.33    0      7.24    0      4.02    0      5.76    0      53.7    37.2    16.5    –      –       –
bigblue2 (08)   0      5.94    0      10.03   0      5.07    0      4.89    0      86.0    48.3    37.7    –      –       –
bigblue3 (08)   0      4.40    0      3.44    0      3.43    0      3.91    0      126.2   78.7    47.5    –      –       –
bigblue4 (08)   414    4.72    152    8.67    162    4.48    472    4.69    232    220.5   121.9   98.6    180    220.7   1
Avg. WL Impr.          4.91%          6.87%          4.25%          4.88%
Chapter 4
PGRIP: A PARALLEL INTEGER PROGRAMMING APPROACH TO GLOBAL ROUTING
In this chapter, we present PGRIP [71], a parallel integer programming procedure for the global
routing problem. The goal of PGRIP is to eliminate the bottlenecks that result in limited
parallelism in GRIP. An obvious way to parallelize GRIP would be to parallelize the branch-and-bound
search when solving the Integer Program (IP) of each subproblem. However, achieving high effi-
ciency from a general purpose parallel IP solver running on hundreds of concurrent processors is
a difficult task and an area of active research [45] [75]. Similar to GRIP, the approach taken here
works by decomposing the chip into subproblems but one in which subproblems may be routed inde-
pendently, ensuring (through a one-time synchronization) that resulting routings of the subproblems
can be effectively patched together. The patching itself is also accomplished by solving an IP. The
end result of the work is a parallel global router that is based on the extended IP procedure of GRIP,
but allows for concurrent processing of the subproblems and significant parallelism.
There are several challenges to obtain high-quality solutions from a parallel global router that
relies on concurrent processing of subproblems. The first challenge is effective decomposition of the
routing problem into subproblems—this step can significantly impact the final solution quality. The
second challenge is to generate the subproblem solutions in a manner that later facilitates their con-
nectivity and avoids overflow. Our work addresses both of these challenges. Specific contributions
of our work include the following items.
1. To form the rectangular subregions and the corresponding subproblems, we extend GRIP to
include a formal procedure for the initial estimation of the distribution of the nets. This step
is crucial to obtain a high quality routing solution and to achieve subproblems with balanced
computation runtimes.
2. In order to effectively achieve concurrent processing of individual subproblems, we employ a
one-time synchronization approach so that significant portions of the computation can occur
completely without centralized control. This synchronization is via our novel use of an integer
programming “patching” procedure.
3. Our procedure can accept as input a target runtime and produce a high-quality solution within
this limit. The runtime can alternatively be expressed as limits on the number of iterations of
each computational step.
We use various instances of the IP formulation of GRIP for overflow reduction as a core compo-
nent at different phases of our massively parallel procedure. We also introduce a parallel procedure
to independently connect neighboring subproblems.
Similar to GRIP, PGRIP has low memory requirements as it loads individual subproblems within
the local memory of each CPU or core. Specifically in our experiments, cores with a maximum of
2GB of memory were required. The resulting algorithm is highly scalable, concurrently using up
to 725 cores while solving the ISPD’07 and ISPD’08 [3] [4] benchmarks. In contrast, in GRIP,
parallelism was limited to roughly 20 concurrent processes. Our routing procedure also achieves
high quality solutions with a runtime limit of 75 minutes, both in terms of wirelength and overflow.
The remainder of the chapter is organized into four sections. Section 4.1 discusses the challenges
of parallelizing GRIP. An Integer Programming formulation for PGRIP is presented in Section 4.2.
Section 4.3 explains the details of our parallel procedure. Specifically, the problem decomposi-
tion method in PGRIP is first introduced followed by the explanation of the pricing, patching, and
repricing phases. The computational results are reported in Section 4.4.
4.1 Challenges of Parallelizing GRIP
Recall that in GRIP, a large-sized global routing problem instance is first decomposed into smaller
subproblems. Each subproblem is a rectangular subregion on the chip together with its net as-
signment. The subproblems in GRIP are processed in the descending order of “difficulty” and the
ordering was crucial to obtain a high quality solution. As a result, only some of the non-neighboring
subproblems could be solved concurrently, resulting in limited parallelism. Since the majority of
the runtime in GRIP is consumed by the sequential processing of the subproblems, our objective in
PGRIP is to concurrently route all the subproblems on different processors so that the (wall) runtime
can be improved. However, there are significant challenges to making the GRIP procedure operate
effectively without the centralized algorithmic control.
[Figure 4.1 panels: (a), (b).]
Figure 4.1: GRIP solves a subproblem with some flexibility in routing the “inter-region” nets, which results in limited parallelism.
The first challenge is the defining of subproblems as it can significantly affect both the routing
solution quality and the runtime. Defining the subproblems results in categorizing the nets into inter-
region and intra-region nets. While solving the IP formulation can likely generate good solution
quality for intra-region nets, the assignment of inter-region nets to the subproblems can highly
impact both the wirelength and overflow.
In addition, the subproblem definition can highly impact the runtime. Since our main objective
for a parallel implementation is to improve the runtime, we need to ensure the defined subproblems
will take similar computational effort. As we demonstrated in Section 3.6.1, a bad partitioning
strategy may lead to “unbalanced” subproblems, and the subproblems within the congested regions
were dominating the parallel runtime. It took several hours to solve the congested subproblems,
while the rest of the subproblems only needed a few minutes.
Therefore, one challenge in PGRIP is to define the subproblems in a proper manner so that the
planning of the inter-region nets can be effectively obtained to improve the solution quality, and the
difficulty of each subproblem can be balanced to improve the (wall) runtime.
The second challenge in parallelizing GRIP is to ensure that the connectivity between subproblems
can be accomplished so that the inter-region nets can be routed with no (or low) overflow. As
shown in Figure 4.1(a), we have limited routing resources (one edge in each metal layer between the
boundaries of adjacent subproblems) to connect the segments of the inter-region nets. When solving
each subproblem, the “floating-terminal” concept provides the flexibility to route the inter-region
nets to connect anywhere to the subproblem boundary, as shown in Figure 4.1(b). During sequential
solving of the subproblems in the order of their difficulty, the connections of the inter-region nets
in the subproblem boundaries were gradually getting fixed. These boundary “floating-terminal”
locations were then honored by the subsequently-solved subproblems. However, in a parallel
implementation with concurrent processing of the subproblems, connecting different segments of multiple
inter-region nets in the same adjacent subproblems may easily result in overflow in the subproblem
boundaries.
[Figure 4.2 labels: nets Ta, Tb; Subproblem 1, Subproblem 2; sub-nets Ta1, Ta2, Tb1, Tb2; panels (a), (b), (c).]
Figure 4.2: Example of planning inter-region nets when processing two adjacent subproblems independently.
Figure 4.2 shows an example to demonstrate this issue for connecting multiple inter-region nets
in the adjacent subproblems. There are two inter-region nets crossing two subproblems, as shown in
Figure 4.2(a). A congested area is designated (in red) along the boundaries of the subproblems, and
these two subproblems are processed concurrently. If each subproblem is routed independently, then
the connection of each inter-region net to the boundary of one subproblem is made without knowledge
of the boundary connection of its other sub-net in the neighboring subproblem. As shown in
Figure 4.2(b), congestion may happen between the subproblem boundaries due to the long connection
routes, which may cause overflow. This issue happens because the floating-terminal locations of
each inter-region net are not known. Alternatively, as shown in Figure 4.2(c), if the floating-terminal
locations of inter-region nets from the adjacent subproblems are known, it is more likely that
the inter-region nets can be connected between the subproblem boundaries with less overflow.
Overall, connecting the inter-region nets to achieve an overflow-free solution without a strongly
coordinated algorithmic control between the subproblems is one of the major challenges to paral-
lelize Global Routing using this partition-based strategy. This is the reason why GRIP requires an
ordering and almost-sequential processing of the subproblems to gradually fix the floating-terminal
locations on the subproblem boundaries to ensure the connectivity of inter-region nets.
4.2 An Integer Programming Formulation of PGRIP
A mathematical description of PGRIP, which is a slight variation of the one given in formulations
(ILP-GR) and (ILP-OV) for GRIP, goes as follows. We are given a grid-graph G = (V,E) describing
the network topology, a set of (multi-terminal) nets given by N = {T1,T2, . . . ,TN}, (with Ti ⊂ V ),
and edge capacities ue and weights ce ∀e ∈ E, as discussed in Chapter 3 for GRIP. Denote by T (Ti)
the collection of all Steiner trees (routes) connecting the terminals in Ti, and let the parameter ate = 1
if Steiner tree t contains edge e ∈ E, ate = 0 otherwise. Define the binary decision variable xit that
is equal to 1 if and only if net Ti is routed with route t ∈ T (Ti). An integer program for the global
routing problem can be written as
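\begin{align}
\min_{x,\,o} \quad & \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} c_{it} x_{it} + \sum_{e \in E} Q_e o_e \tag{ILP-PGR} \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T}(T_i)} x_{it} = 1 && \forall i = 1, \ldots, N, \notag \\
& \sum_{i=1}^{N} \sum_{t \in \mathcal{T}(T_i)} a_{te} x_{it} \le u_e + o_e && \forall e \in E, \notag \\
& x_{it} \in \{0,1\} && \forall i = 1, \ldots, N,\ \forall t \in \mathcal{T}(T_i), \notag \\
& o_e \ge 0 && \forall e \in E. \notag
\end{align}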
The parameter cit is the cost of route t for net Ti , reflecting its wirelength (including vias), as
described in formulation (ILP-GR) in Chapter 3.
The first set of equations enforces the routing of each net; for each net Ti exactly one route
will be selected. The second set of equations enforces the edge capacity constraint. The decision
variable oe will be positive if routing of the nets on edge e results in overflow, and the objective
function trades off the total wirelength with the total overflow. Typically Qe is chosen sufficiently
large to avoid overflow as much as possible.
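The trade-off between wirelength and overflow in (ILP-PGR) can be seen on a toy instance solved by brute-force enumeration. The Python sketch below is purely illustrative: the instance data, the function name, and the enumeration itself are assumptions, whereas PGRIP solves the formulation by price-and-branch.

```python
from itertools import product


def solve_ilp_pgr(routes, cost, capacity, penalty):
    """Enumerate one route per net and minimize wirelength + sum_e Q_e * o_e.

    routes:   net -> list of candidate routes (each a list of edge ids)
    cost:     net -> list of route costs c_it, aligned with routes[net]
    capacity: edge id -> u_e
    penalty:  edge id -> Q_e
    Returns (objective value, chosen route index per net, overflow per used edge).
    """
    nets = sorted(routes)
    best = None
    for choice in product(*(range(len(routes[n])) for n in nets)):
        usage = {}
        for n, t in zip(nets, choice):
            for e in routes[n][t]:
                usage[e] = usage.get(e, 0) + 1
        overflow = {e: max(0, u - capacity[e]) for e, u in usage.items()}
        value = (sum(cost[n][t] for n, t in zip(nets, choice))
                 + sum(penalty[e] * o for e, o in overflow.items()))
        if best is None or value < best[0]:
            best = (value, dict(zip(nets, choice)), overflow)
    return best


# Hypothetical instance: net n1 has a short route over e1 and a detour over e2, e3.
routes = {"n1": [["e1"], ["e2", "e3"]], "n2": [["e1"]]}
cost = {"n1": [1, 2], "n2": [1]}
capacity = {"e1": 1, "e2": 1, "e3": 1}

print(solve_ilp_pgr(routes, cost, capacity, {"e1": 10, "e2": 10, "e3": 10}))
print(solve_ilp_pgr(routes, cost, capacity, {"e1": 0.5, "e2": 0.5, "e3": 0.5}))
```

With a large penalty the detour is selected and no edge overflows; with a small penalty the shorter route wins and edge e1 carries one unit of overflow, which is why Qe is typically chosen large.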
Compared to the formulations (ILP-GR) and (ILP-OV) presented in Chapter 3 for GRIP, the
formulation (ILP-PGR) has an objective which can be considered as a combination of the objectives
in (ILP-GR) and (ILP-OV) to minimize both wirelength and overflow simultaneously. Furthermore,
(ILP-PGR) has the same constraints as (ILP-OV) and compared to (ILP-GR) it does not have a
slack variable for each net. Moreover, unlike (ILP-OV) which had the same penalty Qe in the GRIP
implementation, in PGRIP different penalties Qe can be assigned for each edge, which we will later
show is particularly helpful in avoiding overflow in our parallel implementation.
In PGRIP, the formulation (ILP-PGR) is used to solve each subproblem, and we approximately
solve (ILP-PGR) by the two-phase price-and-branch procedure similar to GRIP. Note that although
the bounded ranges of the dual variables λi and πe of the formulation (ILP-PGR) are different from
those of (ILP-GR), this does not affect the candidate route generation procedure as described in Section 3.2.1.
4.3 The Parallel Global Routing Procedure
In this section, we discuss the details of our parallel global router that removes the requirement
of sequential processing of subproblems. Similar to GRIP, we first generate a routing solution
for each subproblem, and then attempt to connect these partial routing solutions. Unlike GRIP,
these computations can be done almost completely independently. Our parallel global router is also
fundamentally different from GRIP in the way it uses the IP formulation (ILP-PGR) at different
stages of the algorithm, and in the manner in which candidate routes to populate the IP (ILP-PGR)
are generated.
Figure 4.3 gives an overview of our approach. When solving individual subproblems, we modify
the pricing procedure for candidate route generation so that each subproblem receives a one-time
feedback encoding information about the candidate routes of its neighboring subproblems. Given
this information, the subproblem solver “reprices” the nets in order to generate candidate route
fragments that are more likely to connect without overflow.
Figure 4.3: Overview of parallel GRIP. (Diagram labels: Subproblem 1, Subproblem 2, ..., Subproblem n; partial routing solution; IP-based patching; feedback to enhance connectivity.)
More specifically, the subproblems first undergo a quick, initial phase to generate a small set of
candidate routes. Next, each subproblem sends information on the utilization of its boundaries by
inter-region candidate routes to one or more “master” CPU(s). The master CPU(s) then consider
pairs of neighboring subproblems. For each pair, a “patching” integer program is solved which
locates a desired window on the subproblem boundary for the pseudo-terminal for each inter-region
net. The subproblems then incorporate this feedback in a (longer) reprice procedure to generate
candidate routes that obey these restrictions on the locations of the pseudo-terminals and are more
likely to connect neighboring subproblems without overflow. After the adjusted pricing, a routing
solution is generated for each subproblem using a branch-and-bound based IP solver. In a final
phase, a parallel and distributed IP-based procedure is applied to connect the route fragments from
the neighboring subproblems.
Another important aspect of our parallel global router is the generation of individual subprob-
lems. Defining the initial subproblems can highly impact the final solution quality (as we show in
our simulations).
In this section, we provide more details about each step of our procedure—defining the subprob-
lems (Section 4.3.1), initial pricing at the subproblems (Section 4.3.2), distributed IP-based patching
(Section 4.3.3), adjusted pricing at the subproblems (Section 4.3.4), and parallel connecting of the
subproblems (Section 4.3.5).
4.3.1 Defining the subproblems
Defining the subproblems is a crucial step in our procedure which significantly affects the solution
quality and runtime. Poorly defined subproblems contain highly congested areas with many nets.
Congested subproblems are usually difficult and may result in overflow. In addition, such subproblems
typically take much longer to solve, resulting in idle time in our parallel procedure, which relies
on finishing all subproblems before connecting them.
Two tasks are accomplished by subproblem definition: subproblem boundaries are specified
and nets are assigned to the subproblems. There are different ways to accomplish these tasks. One
way is to first define the boundaries, for example via recursive bi-partitioning of the chip area. The
assignment of each net is defined next, for example based on its 2D-projected route given by the
package Flute [18]. However, defining the assignments solely based on the “Flute estimate” can
result in highly-congested subproblems.
To mitigate the congestion, one could attempt to detour the routes generated by Flute into less
congested subregions, for a better assignment. However, detouring requires knowledge of the sub-
problem boundaries. On the other hand, defining the boundaries without considering an estimate
of the routes and congestion hot-spots during bi-partitioning might significantly limit the amount of
detouring. Therefore the two tasks of subproblem definition are inter-dependent.
A primary contribution of this work is to extend GRIP to obtain a more effective and formal
procedure for subproblem definition. The procedure works as follows:
1) The first step is to generate a routing of all the nets to guide the bi-partitioning. GRIP relies solely
on Flute to generate this routing, while we combine Flute with the IP formulation (ILP-PGR) in the
following manner. First, Flute is used to generate projected 2-D routes for each net. The short nets
are fixed in place, and the linear programming relaxation of (ILP-PGR) is solved. In (ILP-PGR), the
parameter Qe corresponding to the edge overflow variable oe is set to 1 for all e ∈ E. In practice, we provide as
input a target runtime limit (controlling the number of iterations of column generation) after which
we stop the procedure to get a fractional solution to (ILP-PGR). We then associate a weight with
each route proportional to its fractional value in the solution to the relaxed problem. The weights
are used to select one route for each net via a random procedure where the probability of selecting
a route is proportional to its weight. This is the well-known “randomized rounding” procedure, which
we verified in our implementation to be better than selecting the route with the highest fractional
value for each net.
2) Using the estimated routing generated in Step (1), recursive bi-partitioning is applied to define
the subproblem boundaries. When partitioning, the total number of nets is used as the metric to
balance at each step of the bi-partitioning. Our computational experience indicated that this metric
(as opposed to the AEU used by GRIP) was more correlated with the final solution quality and did
a better job of balancing the computational effort (pricing and branch-and-bound) for solving each
subproblem. The bi-partitioning is terminated when the number of nets in a subproblem is smaller
than 4000, a value empirically determined based on observing the runtime of many subproblems.
In contrast, GRIP stopped when one of the boundaries of a subproblem became less than 32 grid
units. Due to the large global routing grid-size, the rectangular subregions formed in GRIP are
smaller and more similar, whereas stopping the bi-partitioning based on the number of nets may
result in rectangular subregions that vary more in size.
3) After fixing the boundaries, we traverse the subproblems sequentially and apply the detouring
procedure of GRIP [72]. The subproblems are processed in the order of their estimated TEO from
the solution obtained in Step (1). Note that we are not solving the subproblem, but merely perturbing
some of the assignments made in Step 1 to obtain more balanced subproblems. This step has a
negligible contribution to the runtime of our routing procedure.
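The randomized rounding used in Step (1) can be sketched as follows. This is an illustration under stated assumptions, not the PGRIP code: the dictionary layout, the route names, and the fractional values are hypothetical, and the deterministic alternative is included only for contrast.

```python
import random

def randomized_rounding(fractional, rng):
    """Pick one route per net with probability proportional to its LP value.

    fractional: dict net -> dict route name -> fractional value in the
                solution of the relaxed problem (hypothetical layout)
    """
    chosen = {}
    for net, routes in fractional.items():
        names = list(routes)
        weights = [routes[name] for name in names]
        chosen[net] = rng.choices(names, weights=weights, k=1)[0]
    return chosen

def highest_value(fractional):
    """Deterministic alternative: always take the largest fractional value."""
    return {net: max(routes, key=routes.get) for net, routes in fractional.items()}

# Hypothetical fractional solution for three nets.
fractional = {
    "n1": {"r1": 0.7, "r2": 0.3},
    "n2": {"r1": 0.5, "r2": 0.25, "r3": 0.25},
    "n3": {"r1": 1.0},
}
rng = random.Random(0)  # fixed seed so the sketch is repeatable
picked = randomized_rounding(fractional, rng)
print(picked, highest_value(fractional))
```

A route with fractional value 1.0 is always selected, while nets with split values can receive different routes on different runs, which spreads the estimated routing over several candidates.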
4.3.2 Initial Pricing at the Subproblems
After defining the subproblems, we apply an initial procedure to estimate the utilization of the
boundaries and the locations of the pseudo-terminals to connect the inter-region nets for each sub-
problem. This initial procedure is done independently for each subproblem, which implies that we
allow the generation of candidate routes for inter-region nets that may connect anywhere on the
boundary of the subproblem. After the initial pricing is completed, information from adjacent sub-
problems is sent to a “patching” process (see Section 4.3.3) that determines a window (restricted
region) on the boundary for the location of each pseudo-terminal.
The initial pricing is done by solving the linear programming relaxation of the formulation
(ILP-PGR) as described in Section 4.2. A time-bound (or iteration limit) is imposed on the initial
pricing phase. In our experiments, a limit of five minutes was used for this step.
The (ILP-PGR) formulation requires the definition of parameters Qe for each edge overflow
variable oe. In the initial pricing phase, we set Qe to be equal to the Manhattan distance of edge e
from the center of the subproblem. Thus, grid edges that are closer to the boundaries have a larger
overflow penalty. As we have previously noted, a major goal (and challenge) of the concurrent
processing of subproblems is to avoid overflow along boundaries when connecting the subproblems.
The weighted overflow penalization is an important factor towards achieving this goal.
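A small sketch of the distance-based penalty follows. The text only states that Qe equals the Manhattan distance of edge e from the center of the subproblem; representing an edge by the midpoint of its two bins, and the region tuple layout, are assumptions made here for illustration.

```python
def edge_midpoint(edge):
    """Midpoint of a grid edge given as ((x1, y1), (x2, y2))."""
    (x1, y1), (x2, y2) = edge
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def overflow_penalty(edge, region):
    """Q_e as the Manhattan distance from the edge to the subproblem center.

    region: (xmin, ymin, xmax, ymax) of the subproblem (hypothetical layout)
    """
    xmin, ymin, xmax, ymax = region
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    mx, my = edge_midpoint(edge)
    return abs(mx - cx) + abs(my - cy)

region = (0, 0, 8, 8)                  # hypothetical 8 x 8 subproblem
center_edge = ((4, 4), (5, 4))         # next to the center
boundary_edge = ((7, 4), (8, 4))       # next to the right boundary
print(overflow_penalty(center_edge, region), overflow_penalty(boundary_edge, region))
```

Edges near the boundary receive the larger penalty, so overflow there is discouraged more strongly than in the interior.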
In the initial pricing phase, the inter-region nets are allowed to have a pseudo-terminal anywhere
on the corresponding subproblem boundary (see Figure 4.1(a)). In order to assess the utilization of
boundaries by pseudo-terminals in a subproblem, it is important to generate candidate routes for all
the nets in the subproblem, not only the inter-region ones.
4.3.3 Distributed IP-based Patching
Patching is an IP-based procedure that receives as an input two neighboring subproblem boundaries
and the locations of pseudo-terminals on the boundaries from the initial pricing phase. A separate
patching procedure is applied for each pair of neighboring boundaries. The purpose of the patching
phase is to generate feedback to the corresponding two subproblems to enhance their connectivity
through the subsequent repricing phase.
Consider the example in Fig. 4.4(a). There are two adjacent subproblems M, L and two inter-
region nets na, nb. The initial pricing phase is first applied independently on these two subproblems
to generate candidate routes for both nets. After this phase, net na has 3 candidate routes in sub-
problem M and 2 in subproblem L as shown in Figure 4.4(d). Based on the candidate routes, in
Figure 4.4(e), the corresponding pseudo-terminals Ma1 to Ma3 for net na in subproblem M and La1,
La2 in subproblem L are then identified. Similarly, the candidate routes and their corresponding
pseudo-terminals for net nb are shown in Figure 4.4(g) and (h) respectively.
Patching simultaneously considers the connection combinations for all the nets crossing the two
boundaries (e.g., both net na and nb have six combinations in Figure 4.4(e) and (h)). Each of these
combinations is encoded as a spanning window on the boundaries as shown in Figure 4.4(f) and
(i). The output of patching is a “restricting” window for each net, describing the permissible range
of locations of its two pseudo-terminals on the two boundaries. This restricting window is selected
from the set of existing windows. For each net, one window is generated for the two boundaries, but
different nets can have different windows. These windows are then passed as the feedback to the
two subproblems during the repricing phase.
Figure 4.4: Example illustrating the PGRIP patching procedure. (Panels: (a) subregions M and L with inter-region nets na and nb; (b) the grid-graph G′(V′, E′) with a vertex v and an edge e; (d), (e), (f) candidate routes, pseudo-terminals Ma1 to Ma3 and La1, La2, and spanning windows ca1 to ca6 for net na; (g), (h), (i) candidate routes, pseudo-terminals, and spanning windows cb1 to cb6 for net nb.)
The patching problem can be posed as an integer program. Assume a routing grid-graph G′ =
(V ′,E ′), where v ∈V ′ is a vertex which represents possible locations of the pseudo-terminal, and e ∈
E ′ is an edge on the boundary of one of the subproblems. An example graph is given in Fig. 4.4(b).
Edges e ∈ E ′ are given a modified capacity that is the sum of the capacities of the boundary edge in
one subproblem and its “mirror” edge in its neighboring one.
For each net i, the IP considers |Li|×|Mi| possible combinations for connecting its two portions.
For net i, each possible combination is denoted by a “virtual route” t spanning its virtual terminals
in V ′. Define the parameter ate = 1 if virtual route t contains edge e ∈ E ′, ate = 0 otherwise. For
each virtual route t for net i, define the binary decision variable xit that will equal to 1 if route t
is selected for net i, and 0 otherwise. Define the parameter cit which is the length of the virtual
route t of net i, in terms of number of the edges in G′. As an example, in Figure 4.4(f), for net na,
we have 6 combinations (virtual routes) with their corresponding span {ca1,ca2,ca3,ca4,ca5,ca6} =
{2,3,1,0,2,1} . Similarly, Figure 4.4(i) demonstrates the combinations of virtual routes and their
corresponding spans for net nb.
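The enumeration of virtual routes and their spans can be sketched as below. The pseudo-terminal coordinates are hypothetical: they are positions along the shared boundary chosen only so that the six spans match the values {2,3,1,0,2,1} quoted for net na, since the actual positions in Figure 4.4 are not recoverable here.

```python
from itertools import product

def virtual_routes(m_terminals, l_terminals):
    """Enumerate the |L_i| x |M_i| combinations with their spans c_it.

    Each pseudo-terminal is a position (vertex index) along the shared
    boundary; the span is the number of boundary edges between the two.
    """
    return [((m, l), abs(m - l)) for m, l in product(m_terminals, l_terminals)]

# Hypothetical positions for Ma1..Ma3 and La1, La2 of net na.
m_positions = [0, 3, 4]
l_positions = [2, 3]
combos = virtual_routes(m_positions, l_positions)
spans = [c for _, c in combos]
print(spans)
```

A span of 0 corresponds to two pseudo-terminals that face each other across the boundary, so the virtual route uses no edge of G′.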
For N nets to connect, the patching problem for two neighboring boundaries is mathematically
described as the following integer program:
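\begin{align*}
\min_{x}\quad & \sum_{i=1}^{N} \sum_{t=1}^{|L_i| \times |M_i|} c_{it} x_{it} + \sum_{i=1}^{N} Q s_i && \text{(ILP-PATCH)} \\
\text{s.t.}\quad & \sum_{t=1}^{|L_i| \times |M_i|} x_{it} + s_i = 1 && \forall i = 1, \ldots, N \\
& \sum_{i=1}^{N} \sum_{t=1}^{|L_i| \times |M_i|} a_{te} x_{it} \le u_e && \forall e \in E' \\
& x_{it} \in \{0,1\} && \forall i = 1, \ldots, N,\ \forall t = 1, \ldots, |L_i||M_i|.
\end{align*}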
The first set of equations enforces selection of one virtual route for each net. The parameter Q is
set to be large enough to force all “slack variables” si to take value zero, if possible. If si = 0 for net i, then
exactly one xit variable will be 1 for net i. The corresponding virtual route t, characterized by its two
vertices in V ′, specifies the window on the boundaries of the two subproblems for net i. All subsequent
routes generated by the next pricing phase must obey this constraint. If si > 0 for net i in the
solution to (ILP-PATCH), this indicates that inter-region net i is very difficult to route effectively, so
its window is set to the entire boundary of the subproblem.
The second set of equations ensures that the given virtual edge capacities are not exceeded.
Recall the capacity of each virtual edge is the summation of the capacities of its corresponding
edges on the subproblem boundaries, which indicate the available routing resources. Using the
edge capacity constraints, we model the routing resource utilization when trying to choose a set of
windows for all the inter-region nets from the set of identified route-combinations connecting the
nets in the two subproblems.
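For intuition, the following sketch solves a tiny instance of (ILP-PATCH) by exhaustive enumeration instead of branch-and-bound. It is only meant to show how the selection, slack, and capacity constraints interact; the data layout, edge names, and instances are hypothetical, and enumeration is not how the implementation described here solves the IP.

```python
from itertools import product

def solve_patch(nets, capacity, Q):
    """Brute-force (ILP-PATCH) on a tiny instance.

    nets:     list over nets; each entry is a list of virtual routes given
              as (c_it, tuple of boundary edges used)
    capacity: dict edge -> u_e (combined capacity of an edge and its mirror)
    Q:        penalty for leaving a net unassigned (s_i = 1)
    Returns (best_cost, choice), where choice[i] is a route index or None.
    """
    best_cost, best_choice = None, None
    options = [list(range(len(cands))) + [None] for cands in nets]
    for choice in product(*options):
        usage, cost = {}, 0
        for cands, t in zip(nets, choice):
            if t is None:          # slack variable s_i takes value 1
                cost += Q
                continue
            c, edges = cands[t]
            cost += c
            for e in edges:
                usage[e] = usage.get(e, 0) + 1
        if any(usage[e] > capacity[e] for e in usage):
            continue               # violates an edge capacity constraint
        if best_cost is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice

# Two nets compete for edge e0; net a must take its longer window.
capacity = {"e0": 1, "e1": 1, "e2": 1}
net_a = [(1, ("e0",)), (2, ("e1", "e2"))]
net_b = [(1, ("e0",)), (3, ("e0", "e1", "e2"))]
cost, choice = solve_patch([net_a, net_b], capacity, Q=1000)

# If both nets have only the same single-edge window, one slack is forced.
tight_cost, tight_choice = solve_patch([[(1, ("e0",))], [(1, ("e0",))]], {"e0": 1}, Q=1000)
print(cost, choice, tight_cost, tight_choice)
```

In the second instance one net ends with si = 1, which in the procedure above means its window is relaxed to the entire boundary.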
The parallel routing algorithm solves one patching problem for each pair of subproblem bound-
aries that share at least one inter-region route. These instances of the patching IP are independent
from one another and can be solved in a distributed manner by many CPUs or, since the CPU time
required to solve the patching IPs is minimal, by a single designated processor, as in our implemen-
tation.
In summary, by solving the patching formulation, we consider the impact of the possible
combinations for connecting a net to the boundaries of its two adjacent subproblems. The
formulation also simultaneously considers the impact of all the inter-region nets that cross the
boundaries of adjacent subproblems. Since the runtime of the branch-and-bound procedure
correlates with the number of combinations, we limit the size of the patching problem: we found
that it is not necessary to consider all the possible combinations between the candidate routes of
a net in the two adjacent subproblems during the patching phase. The most promising
combinations are identified by selecting the candidate routes which have the largest fractional
values after the quick initial pricing. In our implementation, we considered at most 50
combinations for each net by selecting 10 candidate routes from the more congested subproblem
and 5 from the less congested subproblem.
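The pruning of combinations can be sketched as follows. The selection by largest fractional value and the 10-by-5 limit come from the text; the scalar congestion measure used to decide which subproblem is "more congested", and all names, are assumptions for illustration.

```python
import heapq
from itertools import product

def promising_combinations(routes_a, routes_b, congestion_a, congestion_b,
                           k_high=10, k_low=5):
    """Keep the top candidate routes of a net on each side of a boundary.

    routes_a, routes_b: dict route id -> fractional value after initial pricing
    congestion_a/b:     hypothetical scalar congestion of each subproblem
    Returns the list of (route_a, route_b) pairs passed to patching.
    """
    ka, kb = (k_high, k_low) if congestion_a >= congestion_b else (k_low, k_high)
    top_a = heapq.nlargest(ka, routes_a, key=routes_a.get)
    top_b = heapq.nlargest(kb, routes_b, key=routes_b.get)
    return list(product(top_a, top_b))

# Hypothetical net with 12 candidates on side A and 8 on side B.
routes_a = {f"a{j}": 1.0 / (j + 1) for j in range(12)}
routes_b = {f"b{j}": 1.0 / (j + 1) for j in range(8)}
pairs = promising_combinations(routes_a, routes_b, congestion_a=0.9, congestion_b=0.4)
print(len(pairs))
```

When a net has fewer candidates than the limits, all of its combinations are kept, so the bound of 50 is an upper limit rather than a fixed count.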
Figure 4.5: Uniform allocation of routing resources for parallel-connecting subproblems. (Each allocation in the figure is labeled 0.5.)
4.3.4 Adjusted Pricing at the Subproblems
After each patching IP is solved, the solution, in the form of restricted windows for each inter-
region net is sent back to the processors responsible for the subproblems. At this point, previously
generated candidate routes that do not connect within the specified window range are filtered from
further consideration. Next, a new pricing phase begins wherein candidate routes are generated for
each net while imposing the constraint that the nets can only connect to the boundaries within their
specified windows. This pricing uses the same (ILP-PGR) formulation as explained in Section 4.3.2. In
earlier experimental work, we tried connecting subproblems using a heuristic method, but found the
IP to generate solutions of much lower overflow.
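The filtering step at the start of the adjusted pricing can be sketched as below. For simplicity the window is modeled as an interval of positions along the boundary, and a net without a window (si > 0 in patching) is allowed the entire boundary; these representations and names are assumptions, not the data structures of the implementation.

```python
def filter_candidates(candidates, windows, boundary_length):
    """Drop candidate routes whose pseudo-terminal lies outside the window.

    candidates:      dict net -> list of (route id, pseudo-terminal position)
    windows:         dict net -> (lo, hi) restricted window from patching
    boundary_length: last valid position on the boundary; nets missing from
                     `windows` may connect anywhere in [0, boundary_length]
    """
    kept = {}
    for net, routes in candidates.items():
        lo, hi = windows.get(net, (0, boundary_length))
        kept[net] = [r for r in routes if lo <= r[1] <= hi]
    return kept

# Hypothetical candidates: net na received window [2, 3], net nb none.
candidates = {
    "na": [("r1", 0), ("r2", 3), ("r3", 4)],
    "nb": [("r1", 1), ("r2", 6)],
}
kept = filter_candidates(candidates, {"na": (2, 3)}, boundary_length=7)
print(kept)
```

New candidate routes generated during repricing would be subject to the same window test before being added to the formulation.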
A time limit is imposed on the adjusted pricing phase (e.g., a limit of 20min in our experiments).
Once the adjusted pricing phase is over, a commercial branch-and-bound IP solver is called to
generate a solution for each subproblem that obeys the patching window constraints.
4.3.5 Parallel Connecting of the Subproblems
The parallel global routing procedure concludes with a final connection and polishing phase. Specif-
ically, after concurrently solving all the subproblems, for each inter-region net, the final segment that
connects its “backbone” to the subproblem boundary is removed, as shown in Figure 3.7.
We then fix all the nets that fall completely inside the subproblems. We also fix the backbones
of the inter-region nets, and implement an IP-based price and branch procedure similar to GRIP to
connect the backbones of the inter-region nets. Here we show how this connection phase can be
done in a distributed manner and independently for each pair of neighboring boundaries.
As shown in Fig. 4.5, we divide each subproblem into quadrants. Each quadrant is adjacent to
two neighboring subproblems (e.g., top-right quadrant is adjacent to the top and the right neigh-
boring subproblems). For each routing edge, we divide its remaining capacity (not utilized by the
fixed routes) into equal portions to be allocated for solving two “connection” problems of its two
corresponding subproblems.
For each pair of neighboring boundaries, we solve an IP-based connection problem. Each
boundary is adjacent to two quadrants of its subproblem, so overall we consider the edges of four quadrants in the
connection problem. For example, for the two boundaries shown in Fig. 4.5, we use the top-left
and bottom-left quadrants of the right subproblem with the top-right and bottom-right quadrants of
the left one. For each edge, we use half of its remaining capacity. We then apply the price-and-branch
procedure to (ILP-PGR) to connect the backbones of the inter-region nets.
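The uniform capacity allocation can be illustrated with a short sketch. The halving of the remaining capacity follows the text and Figure 4.5; whether fractional shares are rounded is not stated, so this sketch keeps them as they are, and the edge names and numbers are hypothetical.

```python
def connection_capacity(capacity, fixed_usage):
    """Capacity given to each of the two connection problems sharing an edge.

    capacity:    dict edge -> total capacity
    fixed_usage: dict edge -> tracks already used by fixed nets and backbones
    """
    shares = {}
    for e, cap in capacity.items():
        remaining = max(0, cap - fixed_usage.get(e, 0))
        shares[e] = remaining / 2   # equal portions for the two problems
    return shares

# Hypothetical quadrant edges after fixing intra-region nets and backbones.
capacity = {"e1": 10, "e2": 6, "e3": 8}
fixed_usage = {"e1": 4, "e2": 6}
shares = connection_capacity(capacity, fixed_usage)
print(shares)
```

Because each connection problem sees only its own share, the problems for different boundary pairs can be solved independently without jointly exceeding an edge capacity.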
4.4 Simulation Results
The parallel global routing procedure was implemented in C++. For solving individual linear
programs (for pricing) and integer programs (for branch-and-bound), the software packages MOSEK
5.0 and CPLEX 6.5, respectively, were used. Parallel processing of subproblems was performed by
submitting jobs to a grid of hundreds of heterogeneous CPUs with 2GB of memory, managed by the
Condor resource management system. The algorithm was evaluated on the ISPD’07 [3] and ISPD’08
[4] benchmarks, specifically to allow full comparison with the GRIP solutions.
A 10min runtime limit was imposed on solving the relaxed (ILP-PGR) to define subproblems
(Section 4.3.1). For the initial pricing (Section 4.3.2), repricing (Section 4.3.4), and pricing to
connect the subproblems (Section 4.3.5), we set runtime limits of 5min, 20min, and 20min, respectively.
For solving the IP using branch-and-bound after candidate route generation we used a runtime limit
of 10min. We did not limit the patching procedure since this step was very fast (in general, the
number of nets crossing between two subproblems is fairly small). As a result, we report slight variation
in the runtimes of our algorithm on different benchmark instances.
In Table 4.1, we compare the solution quality obtained by the parallel IP-based global routing
procedure with existing approaches. For each benchmark, the total overflow (indicated by “TOF”),
the total cost of wirelength and via (indicated by “WL”), and the breakdown between wirelength and
via (indicated by “Edge” and “Via”, respectively) are reported for PGRIP. For other approaches, we
report the overflow and the percentage improvement of PGRIP in total cost of wirelength and via
(indicated by “WL(%)”). Our solutions were evaluated using the ISPD’08 script and were made
available (see footnote 1).
Table 4.1: Results of PGRIP for the ISPD’07 and ISPD’08 benchmarks. The wirelength (WL) is scaled to 10^5.
Benchmark      | PGRIP                      | FGR          | FastRoute    | NTHU-Route   | GRIP
               | TOF   WL     Edge   Via    | TOF   WL(%)  | TOF   WL(%)  | TOF   WL(%)  | TOF   WL(%)
adaptec1 (07)  | 0     82.3   36.5   45.8   | 0     7.00   | 0     9.60   | 0     7.38   | 0     -1.56
adaptec2 (07)  | 0     83.4   33.8   49.6   | 0     7.20   | 0     8.90   | 0     8.21   | 0     -1.24
adaptec3 (07)  | 0     186.5  97.5   88.9   | 0     6.61   | 0     8.87   | 0     7.15   | 0     -0.58
adaptec4 (07)  | 0     173.2  91.5   81.7   | 0     3.44   | 0     7.36   | 0     6.88   | 0     -0.52
adaptec5 (07)  | 0     241.5  104.8  136.6  | 0     7.13   | 0     10.79  | 0     7.20   | 0     -1.07
newblue1 (07)  | 0     84.9   25.0   59.9   | 526   9.97   | 0     7.46   | 0     6.71   | 0     -1.14
newblue2 (07)  | 0     123.3  48.2   75.1   | 0     4.73   | 0     9.11   | 0     8.43   | 0     -1.55
newblue3 (07)  | 41K   156.3  76.0   80.3   | 30K   10.02  | 32K   14.17  | 31K   6.38   | 53K   -1.03
Avg. Impr.     |                            |       6.58%  |       8.87%  |       7.42%  |       -1.09%
newblue4 (08)  | 132   124.9  83.4   41.4   | 262   3.65   | 144   6.78   | 138   4.29   | 152   -0.44
newblue5 (08)  | 0     223.9  147.7  76.0   | 0     3.95   | 0     5.47   | 0     3.38   | 0     -0.44
newblue6 (08)  | 0     172.0  102.5  69.5   | 0     4.61   | 0     5.83   | 0     2.78   | 0     -0.88
newblue7 (08)  | 54    338.4  189.8  148.6  | 1458  3.37   | 62    5.17   | 68    4.22   | 74    -0.83
bigblue1 (08)  | 0     54.0   37.3   16.7   | 0     5.81   | 0     6.72   | 0     3.49   | 0     -0.54
bigblue2 (08)  | 0     86.5   48.4   38.1   | 0     5.38   | 0     9.50   | 0     4.50   | 0     -0.64
bigblue3 (08)  | 0     126.5  78.7   47.8   | 0     4.20   | 0     3.24   | 0     3.22   | 0     -0.24
bigblue4 (08)  | 176   221.1  122.0  99.1   | 414   4.54   | 152   8.50   | 162   4.30   | 186   -0.22
Avg. Impr.     |                            |       4.44%  |       6.40%  |       3.77%  |       -0.53%
Excluding GRIP, the solutions obtained by the parallel global routing algorithm improve
significantly in total cost for each instance (ranging from 3.37% to 10.79%). Compared to GRIP, which
has the best reported solution but impractical runtimes, on average we only have 1.1% and 0.5%
degradation in total cost (WL) of ISPD’07 and ISPD’08, respectively.
Furthermore, the solutions obtain zero overflow for any benchmark that already had a
zero-overflow solution from other tools. For benchmarks newblue4 and newblue7, the solutions from the
parallel global router have the smallest overflows reported so far (even better than GRIP). This is
likely due to a better definition of the initial subproblems and measuring overflow directly in the IP
formulation.
1 Benchmark solutions can be downloaded at http://wiscad.ece.wisc.edu/gr/
Table 4.2: Estimated overflow of the initial subproblems.
Benchmark      | step1+step2+step3 | step1+step2    | Flute+step2
               | Avg.    Max.      | Avg.    Max.   | Avg.    Max.
adaptec1 (07)  | 1.29    59        | 1.41    70     | 3.56    144
adaptec2 (07)  | 1.13    85        | 1.19    94     | 2.50    183
adaptec3 (07)  | 0.72    49        | 0.77    52     | 2.71    158
adaptec4 (07)  | 0.27    54        | 0.31    59     | 0.97    111
adaptec5 (07)  | 2.17    100       | 2.34    118    | 4.77    306
newblue1 (07)  | 0.43    40        | 0.49    43     | 0.94    73
newblue2 (07)  | 0.41    79        | 0.45    83     | 0.96    131
newblue3 (07)  | 0.84    660       | 0.92    748    | 2.16    1119
newblue4 (08)  | 1.12    88        | 1.13    101    | 2.14    147
newblue5 (08)  | 2.15    75        | 2.80    89     | 4.94    155
newblue6 (08)  | 1.03    93        | 1.19    113    | 2.68    192
newblue7 (08)  | 7.64    252       | 8.33    302    | 13.17   574
bigblue1 (08)  | 19.31   87        | 22.78   99     | 28.77   167
bigblue2 (08)  | 5.83    59        | 6.28    62     | 8.61    93
bigblue3 (08)  | 9.43    126       | 10.27   147    | 17.26   373
bigblue4 (08)  | 7.02    71        | 7.95    82     | 11.33   221
Next we analyze the quality of defining subproblems before starting their concurrent processing.
We measure it based on an estimate of overflow after defining the subproblems. Recall in defining
subproblems we apply a 3-step approach (see Section 4.3.1): 1) initial routing using a relaxed ILP, 2)
defining boundaries, and 3) detouring routes of step 1 to distribute the net assignments. We estimate
the edge overflow based on the routes generated in step 3, and calculate the average and maximum
edge overflows in each subproblem. We report the average of each of the two quantities over all
the subproblems in columns 2 and 3 of Table 4.2. We also report these two quantities, assuming
detouring is not applied (so the edge overflow is estimated only using the routes of relaxed ILP).
These are reported in columns 4 and 5. As can be seen, the average and maximum overflows of a
subproblem based on the estimate of routes in step 1 and step 3 are not very different. This means
detouring does not result in much improvement after applying the relaxed ILP.
Figure 4.6: The projected congestion map of the adaptec1 benchmark instance at different phases of PGRIP, in panels (a) to (d).
To better show that the relaxed ILP is helpful in defining subproblems, we also report the average
and maximum overflow of the subproblems if 2-D routes are taken from Flute, and then subproblems
are generated using step 2. These are reported in columns 6 and 7 of Table 4.2. Compared with
columns 2 to 5, they show that our procedure significantly reduces overflow in the initial
subproblems before starting their concurrent processing.
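The two quantities reported per benchmark can be computed as in the sketch below. It assumes that the average edge overflow of a subproblem is taken over all of its edges, including those without overflow; the text does not spell this out, so treat that choice, the data layout, and the numbers as assumptions.

```python
def overflow_stats(subproblems):
    """Average and maximum edge overflow, averaged over all subproblems.

    subproblems: list of subproblems, each a list of (usage, capacity)
                 pairs for its edges under the estimated routing
    Returns (mean of per-subproblem averages, mean of per-subproblem maxima).
    """
    averages, maxima = [], []
    for edges in subproblems:
        overflows = [max(0, usage - cap) for usage, cap in edges]
        averages.append(sum(overflows) / len(overflows))
        maxima.append(max(overflows))
    n = len(subproblems)
    return sum(averages) / n, sum(maxima) / n

# Hypothetical estimate with two subproblems.
sub1 = [(5, 4), (3, 4), (8, 4), (4, 4)]   # overflows 1, 0, 4, 0
sub2 = [(2, 4), (6, 4)]                   # overflows 0, 2
avg_of, max_of = overflow_stats([sub1, sub2])
print(avg_of, max_of)
```

Running the same computation on the routes of step 1, of step 3, and of Flute would give the three column pairs being compared.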
Figure 4.6 demonstrates the projected congestion map of adaptec1 at different phases of our
algorithm. In Figure 4.6(a), an initial net planning is generated by Flute [18]. As can be seen, this
congestion map is highly unbalanced and has many hot spots (i.e., edges with overflow). Recall
that, in defining the subregions, the linear programming relaxation of (ILP-PGR) is applied and,
once the boundaries are fixed, as many inter-region nets as possible are detoured, as described in Section 4.3.1.
After detouring the inter-region nets (which is right before solving the individual subregions), the
number of edges with overflow and the maximum overflow (hotspots) are both decreased, as shown
in Figure 4.6(b). This implies that we now have more balanced subregions. We then apply the
price-and-branch and the patching procedures to solve these subregions concurrently. Figure 4.6(c) shows
the congestion map based on the solutions of the subregions before connecting them. Finally, a legal
solution is generated after connecting the subregions, which is shown in Figure 4.6(d).
Table 4.3: Runtime comparison of PGRIP and GRIP.
Benchmark      | PGRIP                     | GRIP
               | #CPU   WCPU(m)  TCPU(m)   | E[#CPU]  WCPU(m)  TCPU(m)
adaptec1 (07)  | 90     76       2101      | 8.3      388      2247
adaptec2 (07)  | 110    76       2704      | 10.6     455      2677
adaptec3 (07)  | 211    77       6319      | 18.0     478      5168
adaptec4 (07)  | 221    79       5221      | 19.0     509      5258
adaptec5 (07)  | 280    77       3175      | 14.1     584      7133
newblue1 (07)  | 122    76       2306      | 8.0      483      3076
newblue2 (07)  | 215    77       4192      | 10.4     467      5228
newblue3 (07)  | 258    82       14590     | 19.2     1430     6768
newblue4 (08)  | 255    77       2944      | 8.5      529      3974
newblue5 (08)  | 504    80       4953      | 9.5      821      6598
newblue6 (08)  | 459    78       2219      | 8.9      448      5096
newblue7 (08)  | 725    86       4788      | 9.0      985      5377
bigblue1 (08)  | 124    76       956       | 3.9      339      2770
bigblue2 (08)  | 243    77       3411      | 8.0      690      3793
bigblue3 (08)  | 326    78       2690      | 7.3      731      3448
bigblue4 (08)  | 453    82       3096      | 7.6      726      4400
Avg.           | 287    78       4104      | 11.0     629      4563
Table 4.3 reports the runtime comparison of PGRIP and GRIP. The number of parallel processing
jobs for each approach is reported in columns 2 and 5, respectively. The wall runtimes of PGRIP
and GRIP are given in columns 3 and 6, indicated by WCPU. In columns 4 and 7, we report the
total runtime (indicated by TCPU). Our target wall runtime in PGRIP was 75 minutes, and the
reported runtimes had slight variations, ranging from 76 to 86 minutes. The variations came from
the unbounded runtime of the patching procedure (see Section 4.3.3) and steps 2 and 3 in defining
subproblems. Note that in PGRIP, all the subproblems were concurrently processed, so the number of
CPUs in our experiments was equal to the number of subproblems, indicating we were able to take
advantage of a significant amount of parallelism. A nice feature of our approach is that the
runtimes in PGRIP are quite similar over different benchmarks, indicating a high level of scalability.
Although GRIP had slightly better solution quality compared to PGRIP, it also had
much larger wall runtimes (on average, more than 10 hours to complete a benchmark). This is
because in GRIP, only some of the non-adjacent subproblems can be processed in parallel. Both
PGRIP and GRIP had similar total runtime, although PGRIP used a more complicated procedure to
handle the inter-region nets. We do not report the runtimes of competing methods. For reference,
FastRoute [76], which has the fastest runtime among the other academic global routers, takes many
hours to complete unroutable benchmarks, while PGRIP consistently finishes all benchmarks in
about 75 minutes.
Another feature of PGRIP is that by changing its runtime limit, we can explore the tradeoff
between runtime and solution quality; for example, by allowing more runtime on the pricing phase,
we can generate more candidate routes, which in turn can improve the solution quality. Take adaptec1
as an example. Changing the runtime limit of the pricing phases to 30min (instead of 20min in our
first experiment) results in an additional 1.13% improvement in WL, with a wall runtime of 96min.
Reducing the runtime limit of pricing to 10min reduces the wall runtime to 57min, but degrades
the WL by 4.41%.
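The runtime figures above are consistent with simply adding up the per-phase limits. The sketch below is one reading of the 75-minute target, assuming that the 10min branch-and-bound limit applies once when solving the subproblems and once in the connection phase; that assumption and the function name are not stated in the text.

```python
def wall_time_budget(pricing_limit, define=10, initial=5, branch_and_bound=10):
    """Sum of the per-phase runtime limits in minutes (assumed sequence).

    define:            relaxed (ILP-PGR) for subproblem definition
    initial:           initial pricing
    pricing_limit:     used for both repricing and connection pricing
    branch_and_bound:  assumed to apply to subproblem solve and connection
    """
    return (define + initial
            + pricing_limit + branch_and_bound      # reprice and solve
            + pricing_limit + branch_and_bound)     # connect subproblems

for limit in (10, 20, 30):
    print(limit, wall_time_budget(limit))
```

Under this reading the budgets are 55, 75, and 95 minutes, close to the 57, 76, and 96 minutes reported for adaptec1, with the small difference attributable to the unbounded steps.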
Chapter 5
POWER-GRIP: POWER-DRIVEN GLOBAL ROUTING FOR MSV DOMAINS
In this chapter, we present Power-GRIP [73] - a Power-Driven global routing procedure which
supports designs with multiple supply voltages. Power consumption is a primary design objective in
many application domains. Dynamic power still remains the dominant portion of the overall power
spectrum. Design with Multi-Supply Voltage (MSV) allows significant reduction in dynamic power
by taking advantage of its quadratic dependence on the supply voltage.
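The quadratic dependence can be illustrated with the standard switching-power expression P = α C V² f. The activity factor, capacitance, frequency, and the two supply voltages below are hypothetical values chosen only to show the size of the effect.

```python
def dynamic_power(activity, capacitance, vdd, frequency):
    """Standard switching-power model: alpha * C * Vdd^2 * f (in watts)."""
    return activity * capacitance * vdd ** 2 * frequency

# Hypothetical net: 1 pF switched capacitance, 1 GHz clock, activity 0.2.
high = dynamic_power(0.2, 1e-12, 1.2, 1e9)   # routed in a 1.2 V domain
low = dynamic_power(0.2, 1e-12, 1.0, 1e9)    # same net in a 1.0 V domain
saving = 1 - low / high
print(f"{saving:.1%}")
```

Lowering the supply from 1.2 V to 1.0 V reduces the switching power of the same net by about 30%, since the ratio is (1.0/1.2)².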
Dynamic power is dissipated in combinational and sequential logic cells, clock network, and
the (remaining) local and global interconnects. We refer to the latter as interconnect power. The
interconnects are complex structures in nanometer technologies that span over many metal layers.
The power of a route segment depends on its width, metal layer, and spacing relative to its adjacent
parallel-running routes. These factors determine the area, fringe, and coupling capacitances which
impact power. Furthermore, in MSV designs, the power of a routed net depends on its corresponding
supply voltage. For example, a route will have lower power if all its terminal-cells have the (same)
lower supply voltage. If a net connects a driver cell of lower voltage to a sink cell of higher voltage,
its route includes a level converter (LC) and is decomposed into two segments of low and high
supply voltages, corresponding to before and after the LC.
We propose a global routing method that optimizes the interconnect power in MSV designs.
Figure 5.1 shows a generic design flow for MSV-based Global Routing. After placement and
voltage assignment, the location and supply voltage of each cell are known. The supply voltage
can be determined for example through voltage island generation [29] [69], or through a row-based
assignment in a standard cell methodology. Furthermore, LC(s) are added to any net that connects
a driver cell to a set of sink cells of higher supply voltage. Next, Global Routing is applied to
minimize the overall wirelength, where the LCs are also included as terminals of a net.
Figure 5.1: Overview of Global Routing with Multi-Supply Voltage. (Diagram labels: Placement & Voltage Assignment; Tolerable wirelength degradation factor; Wirelength-Optimized Global Routing; Power-Optimized Global Routing; Net extension with level converters; Level converter based net decomposition.)
For a given wirelength-optimized Global Routing solution, we propose to further detour the nets
in order to optimize the interconnect power. The interconnect power can be approximated during
Global Routing since at this stage the metal layers of each route segment are known. Furthermore,
the spacing of parallel routes can be estimated from the routing congestion. Given a wirelength-
optimized solution, the nets can be rerouted to trade off wirelength with power. For example, nets
on higher metal layers can be rerouted to lower ones for smaller wire widths and lower area capacitance.
Nets can also be rerouted to spread the congestion, thereby increasing their spacing for lower
coupling capacitance. Activity factor and supply voltage can be incorporated as a power-weight for
each route segment.
We present a mathematical formulation for MSV-based Global Routing to minimize power, and
present integer programming-based techniques to solve the formulation. As part of power saving,
our methods spread the routing congestion and ensure no additional overflow (of routing resources)
and a bounded degradation in wirelength compared to the initial solution.
To the best of our knowledge, this is the first work on power-driven global routing in MSV
designs. The previous works [68] [78] consider maze-routing in conjunction with buffer insertion to
minimize interconnect and buffer power. In [74], co-design of power-grid and interconnect routing
networks is considered. Recently, the work [63] discusses power-driven Global Routing; however,
it does not consider the MSV case. Also, it relies on the availability of power-efficient candidate
routes for each net but generates such candidate routes quite heuristically.
As part of the contributions of this work, we show a formal procedure to generate power-efficient
candidate routes from the initial WL-optimized solution while taking into account the overall WL
degradation. This is based on a price-and-branch procedure for the proposed power-driven IP. A
parallel solution procedure similar to PGRIP is adopted for decomposing the problem and solving the
subproblems concurrently.
Figure 5.2: MSV-based Global Routing model with level converters. (Labels in the figure: cell, global bins, level converter, edge capacity re, and the VL and VH regions.)
The remainder of this chapter is organized as follows. Section 5.1 describes our MSV-based
interconnect model. The level converter placement strategy is introduced in Section 5.2. Section 5.3
discusses our formulation and solution procedure for power minimization. Simulation results are
presented in Section 5.4.
5.1 Interconnect Power Modeling in MSV Domains
In this section, we discuss an MSV-based Global Routing model. We assume the level converters
have been placed for some of the nets and the supply voltage of each cell is known.
5.1.1 Interconnect Modeling in MSV Designs
We are given a grid-graph G= (V,E) model of the Global Routing problem, where each vertex v∈V
corresponds to a global bin containing a number of cells. Each edge e ∈ E represents the boundary
of two adjacent bins. A capacity re is associated with each edge e, reflecting the maximum number
of routes that can pass between two adjacent bins. A net i ∈ {1, . . . ,N} is identified by its terminal
cells, which are a subset of the vertices V . In MSV-based Global Routing, the level converters are
also considered as net terminals. During Global Routing, a Steiner tree ti in G is found for each net
i to connect its terminals. The length of ti is taken to be its wirelength.
Figure 5.2 demonstrates an example. The chip is divided into regions. Each region has either a
low (VL) or high (VH) supply voltage. A routed net is specified in the figure. The net has one driver
terminal with VL voltage and three sink terminals of VH voltage. There are two level converters in
this route and both of them are also considered as additional terminals of the net.
Figure 5.3: Decomposition of net with multi supply voltage levels. (Panel (a): initial route with source s (vL), sinks t1, t2, t3 (vH), and level converters; panel (b): sub-nets n1 (vL), n2 (vH), n3 (vH); panel (c): rerouted sub-nets, annotated "can merge" and "cannot merge".)
For power-driven MSV-based Global Routing, we first decompose a net which contains level
converters into a set of sub-nets. We reroute each sub-net as an individual net during power
optimization. Consequently, we have Nd > N nets after decomposition. For example, in
Figure 5.3(a), the initial global route is shown with its level converters. The net is decomposed into
three sub-nets, each of which will be rerouted independently. As shown in Figure 5.3(b), the first
sub-net connects the driver terminal in VL to the two level converters. The second one connects one
level converter to one VH terminal. The third one connects the other level converter to the other two
VH terminals.
The decomposition of each net is done using its initial route and the location(s) of its level
converter(s), assuming they are determined before this stage. For a net containing level converters,
starting from its driver terminal, a sub-net corresponding to a low supply voltage is formed that
connects the driver terminal to a set of level converters and/or a set of sink terminals of the same
supply voltage. Next, one or more sub-nets are formed that connect the level converters to the sink
terminals of the same (and higher) voltage level. The BFS algorithm is utilized to traverse the initial
route in our implementation. For example, in Figure 5.3(b), we start traversing from the source node
until reaching the two level converters. All the touched edges form the first sub-net n1 which has
a low supply voltage. Next, we continue traversing from each of the level converters individually
until reaching all the sink nodes, using which the sub-nets n2 and n3 with high supply voltage are
then identified.
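The traversal described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function name `decompose_net`, the edge-list representation of the initial route, and the vertex labels in the example are all assumptions made for the sketch. The route is assumed to be a tree, and the BFS stops at every level converter, which then becomes the root of a new sub-net.

```python
from collections import defaultdict, deque


def decompose_net(edges, source, converters):
    """Split a routed tree into sub-nets at its level converters.

    edges      -- list of (u, v) pairs forming the initial route (a tree)
    source     -- driver terminal
    converters -- set of vertices that hold a level converter
    Returns a list of (root, sub_net_edges) pairs (hypothetical format).
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    subnets = []
    visited = {source}
    roots = deque([source])  # every root starts one sub-net
    while roots:
        root = roots.popleft()
        subnet_edges = []
        frontier = deque([root])
        while frontier:
            u = frontier.popleft()
            for v in adj[u]:
                if v in visited:
                    continue
                visited.add(v)
                subnet_edges.append((u, v))
                if v in converters:
                    roots.append(v)  # stop here; v roots a new sub-net
                else:
                    frontier.append(v)
        if subnet_edges:
            subnets.append((root, subnet_edges))
    return subnets


# Example shaped like Figure 5.3: one VL driver, two converters, three VH sinks.
route = [("s", "a"), ("a", "c1"), ("a", "c2"),
         ("c1", "t1"), ("c2", "b"), ("b", "t2"), ("b", "t3")]
for root, sub in decompose_net(route, "s", {"c1", "c2"}):
    print(root, sub)
```

In this example the first sub-net is rooted at the driver and ends at the two converters, and each converter roots one high-voltage sub-net.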
Our net decomposition procedure is able to find a minimum number of sub-nets for each net that
contains a level converter such that each sub-net has only one corresponding supply voltage. Note
that after rerouting the sub-nets, it is possible that these sub-nets may pass through the same edge(s)
as shown in Figure 5.3(c). If the sub-nets which pass through the same edges have the same voltage
level, (e.g., the sub-nets n2 and n3 in Figure 5.3(c)), then we can merge these sub-nets to release the
over-utilized routing resources. The above procedure is given for the case when two supply voltages
VL and VH exist, which is also the case considered in this dissertation. For a higher number of voltage
domains, the procedure can be extended in a similar way.
5.1.2 Power Modeling
Each decomposed net i ∈ {1, ...,Nd} has a corresponding supply voltage Vi and switching activity
αi. The required interconnect power for a Global Routing solution is estimated as
\[ P = f_{clk} \times \left( \sum_{i=1}^{N_d} \alpha_i V_i^2 \left( C_i^{sink} + C_i^{route} \right) \right), \qquad (5.1) \]
where $f_{clk}$ is the clock frequency. As seen in Equation 5.1, the capacitance of routed net i is the sum
of the capacitances of its sink cells (denoted by $C_i^{sink}$) and of its route (denoted by $C_i^{route}$). Here
$C_i^{sink}$ is a constant that does not depend on the re-routing, so it is excluded from the optimization.
Note that the power of the level converters is considered fixed and thus is also not considered as part
of the interconnect power optimization. The capacitance $C_i^{route}$ for a routed net i is the sum of the
capacitances of its unit-length edges that are contained in route ti (given by notation e ∋ ti):
\[ C_i^{route} = \sum_{e \ni t_i} C_e^u. \qquad (5.2) \]
The parameter $C_e^u$ is the capacitance of one routed edge e ∈ E. This capacitance is a function of the
metal layer $l_e$, wire width $w_e$ and wire spacing $s_e$ of the edge e. Specifically,
\[ C_e^u = C_a(l_e, w_e) + 2 C_f(l_e, w_e, s_e) + 2 C_c(l_e, w_e, s_e), \qquad (5.3) \]
where $C_a$ and $C_f$ are the area and fringe capacitances with respect to substrate, and $C_c$ is the
coupling capacitance. As indicated, these capacitances are functions of the metal layer, wire width, and
wire spacing, and are provided by the technology library through a lookup table.
Figure 5.4: Modeling route capacitance on a Global Routing edge.
In this work, we assume that only one (and a different) wire width is associated with each metal
layer, so we exclude the parameter we, and for each edge e ∈ E, its metal layer le is known. The
spacing for edge e is estimated from the edge utilization ue in a Global Routing solution. Given
the utilization ue and the length of edge e, (computed from the chip dimension and the routing
grid granularity), the spacing se is calculated to allow maximum spacing between its corresponding
routes. Figure 5.4 shows an example for ue = 3. This simple averaging strategy may be adjusted
if more information is available at the Global Routing stage; (e.g., the adjustment may be due to
the fixed short nets which fall inside a single global routing bin). With this approximation, we
can express the capacitance of a unit-length route-edge in terms of the edge's metal layer and its
utilization. The total capacitance of edge e is given by the product of the per-unit capacitance $C_e^u$
and the utilization $u_e$: $C_e = C_e^u \times u_e$.
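Equations 5.1 to 5.3 can be evaluated directly once the per-unit edge capacitances are known. The following is a minimal sketch under stated assumptions: the function names, the dictionary-based net description (`alpha`, `v`, `c_sink`, `route`), and all numeric values are hypothetical and are not taken from the 45nm library used in this work.

```python
def unit_capacitance(c_area, c_fringe, c_coupling):
    """Equation (5.3): per-unit edge capacitance from its three components."""
    return c_area + 2.0 * c_fringe + 2.0 * c_coupling


def route_capacitance(route_edges, unit_cap):
    """Equation (5.2): sum of per-unit capacitances over the edges of a route."""
    return sum(unit_cap[e] for e in route_edges)


def interconnect_power(f_clk, nets, unit_cap):
    """Equation (5.1): f_clk * sum_i alpha_i * V_i^2 * (C_sink_i + C_route_i)."""
    total = 0.0
    for net in nets:
        c_route = route_capacitance(net["route"], unit_cap)
        total += net["alpha"] * net["v"] ** 2 * (net["c_sink"] + c_route)
    return f_clk * total


# Hypothetical values, for illustration only.
unit_cap = {"e1": unit_capacitance(0.02, 0.01, 0.03), "e2": 0.2}
nets = [{"alpha": 0.5, "v": 1.0, "c_sink": 0.1, "route": ["e1", "e2"]}]
print(interconnect_power(2.0, nets, unit_cap))
```

Because the sink capacitance is constant under rerouting, only the route term changes when a net is detoured.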
Figure 5.5 (left) shows the curves representing area, fringe, and coupling capacitances for metal
layer 1 with respect to edge utilization for a 45nm library [6], assuming each Global Routing edge
is 2 µm long. The summation of the 3 capacitances ($C_e^u$) is shown on the right.
Figure 5.5: Dependence of three types of capacitance on edge utilization in metal layer 1.
5.2 Placement of Level Converters
In this section, we discuss our on-route level converter placement strategy. This strategy is used
to generate the simulation results in Section 5.4. Given the placement and initial global routing
information, this strategy searches the available placement space near the initial global routes for
placing level converter(s). It has minimum overhead to the design flow since re-legalizing cells is
not necessary in this strategy. Note that this strategy is not designed for general placement, and also
not necessary for the designs with predefined supply voltage regions and inserted level converters.
To guarantee the connectivity, the level converters are placed on the wirelength-optimized route,
initially provided for each net. This also ensures the addition of level converters won’t cause extra
congestion; it allows connecting each level converter to the initial route conveniently just by adding
vias from the level converter to the initial route. Randomly placing the level converters may harm
the Global Routing congestion and degrade total wirelength or overflow.
We list a set of requirements to identify valid level converter insertion cases for a net i with given
route ti. We assume the net has a single source and may have multiple sink terminals.
1. The level converter is located at a vertex v in ti (v ∋ ti).
2. This vertex v should fall inside a VH voltage island.
3. The global bin corresponding to v should have enough space to add the level converter. We
denote the available space of v by Av and compute it after placement. (See Figure 5.6).
4. For k vertices v1, ...,vk satisfying the above 3 conditions, if all have the same distance to the
source terminal (in terms of the number of edges on ti), we require k level converters to be
added on these vertices simultaneously.
Figure 5.6: Valid on-route level converter locations for one net.
Figure 5.6 shows the set of potential level converter locations of net i with initial route ti. The
source is the terminal in the VL island. Note that one vertex in ti cannot be used because it is inside the
VL island. We have four cases for valid level converter insertion locations indicated by i1, i2, i3 and
i4. In the last case (i4), two level converters should be placed on the net after the diverging point on
the route to ensure VH is delivered to both sink terminals. For a single-source net i, we identify all
the cases for valid level converter insertion locations using a breadth first traversal on ti and denote
this set by Li. In this example |Li|= 4. For each case l ∈ Li, we further compute a corresponding
power pil using Equation 5.1, where the edge utilization required to compute coupling capacitance
is obtained from the initial wirelength-optimized solution. The power includes the interconnect
portions on ti and the level converter(s).
To select one level converter insertion case for each net, we define binary variable xil to be
equal to 1 if and only if case l ∈ Li is selected for net i. The level converter placement problem is
expressed as the following Integer Program (IP) which can efficiently be solved using a solver, as
we elaborate in our experiments.
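\[
\begin{aligned}
\min_{x,s} \quad & \sum_{i=1}^{N} \sum_{l \in L_i} p_{il} x_{il} + \sum_{i=1}^{N} M s_i && \text{(IP-LC)} \\
\text{s.t.} \quad & \sum_{l \in L_i} x_{il} + s_i = 1 && \forall i = 1, \ldots, N \\
& \sum_{i=1}^{N} \sum_{l \in L_i} a_{vl} x_{il} \le A_v && \forall v \in V \\
& s_i \ge 0 && \forall i = 1, \ldots, N \\
& x_{il} \in \{0, 1\} && \forall i = 1, \ldots, N, \ \forall l \in L_i,
\end{aligned}
\]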
where the parameter avl is equal to 1 if in case l, a level converter is placed at vertex v, and 0
otherwise. The first set of constraints ensures at most one level converter insertion case is selected
for each net. The slack variable si will be positive if there is no available space for placing level
converters for net i and is heavily penalized by a large positive constant M to maximize the number of
nets with placed level converters. The second set of constraints ensures level converters are placed
only in the free placement space.
In addition, it may not be possible to place level converters on a vertex v of the Global Routing
grid because its corresponding global bin is highly congested. We therefore associate with each
vertex v a constant parameter Av, indicating its available placement space. In our experiments, the
available space is calculated for each global bin according to the placement density.
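To make the roles of the objective and the two constraint sets concrete, the sketch below solves a tiny instance of (IP-LC) by exhaustive enumeration. This is only an illustration: in this work the formulation is handed to an IP solver, and enumeration is exponential in the number of nets. The function name `solve_ip_lc`, the input layout, and the numbers are assumptions. Choosing `None` for a net corresponds to si = 1.

```python
from itertools import product


def solve_ip_lc(cases, space, big_m):
    """Brute-force a small (IP-LC) instance.

    cases[i] -- list of (power p_il, vertices receiving a converter) for net i
    space[v] -- available converter slots A_v at vertex v
    big_m    -- penalty M for leaving a net without converters (s_i = 1)
    Returns (best objective value, tuple of chosen case index or None per net).
    """
    best_cost, best_choice = float("inf"), None
    options = [[None] + list(range(len(c))) for c in cases]
    for choice in product(*options):
        used = {}
        cost = 0.0
        for i, l in enumerate(choice):
            if l is None:
                cost += big_m  # slack s_i = 1
                continue
            power, vertices = cases[i][l]
            cost += power
            for v in vertices:
                used[v] = used.get(v, 0) + 1
        # Second constraint set: converters per vertex must fit in A_v.
        feasible = all(n <= space.get(v, 0) for v, n in used.items())
        if feasible and cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_cost, best_choice


# Two nets compete for the single slot at vertex "a" (hypothetical data).
cases = [[(1.0, ["a"]), (1.5, ["b"])], [(2.0, ["a"])]]
space = {"a": 1, "b": 1}
print(solve_ip_lc(cases, space, big_m=100.0))
```

Here the first net is pushed to its second, slightly more expensive case so that both nets receive a converter, which is the behavior the large penalty M is meant to encourage.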
After solving IP-LC to obtain the level converter location(s) for each net, the nets passing
through the congested region may not be able to find a valid level converter insertion location. In
this case, it is necessary to detour these nets. We first insert all the level converters identified by
IP-LC and decompose the corresponding nets. The remaining available space for each vertex v
is then calculated. We create a bounding box around each failing net, and search the vertices inside
the bounding box for available resources. If more than one vertex has available space, we select
the one closest to the source node. If none of the vertices has space available, the bounding box
is expanded to explore more nearby vertices. Once the insertion location is identified, Maze
Routing is utilized to connect the level converter. Note that additional overflow may be introduced
during the detouring. Fortunately, the detouring case is rare, and the introduced overflow may be
recovered in the later phase.
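The bounding-box search for a failing net can be sketched as below. The function name, the grid coordinates, the one-bin expansion step, and the use of Manhattan distance to measure closeness to the source are assumptions of this sketch; the text above does not fix these details. The voltage-island check and the subsequent Maze Routing step are omitted.

```python
def find_converter_site(bbox, source, space, grid_w, grid_h):
    """Search an expanding bounding box for a bin with free converter space.

    bbox   -- (x0, y0, x1, y1), inclusive bin coordinates around the net
    source -- (x, y) bin of the driver terminal
    space  -- dict mapping (x, y) to the remaining available space
    Returns the chosen bin, or None if the whole grid has no space left.
    """
    x0, y0, x1, y1 = bbox
    while True:
        candidates = [(x, y)
                      for x in range(x0, x1 + 1)
                      for y in range(y0, y1 + 1)
                      if space.get((x, y), 0) > 0]
        if candidates:
            # Closest to the source; ties broken by coordinate order.
            return min(candidates,
                       key=lambda v: (abs(v[0] - source[0])
                                      + abs(v[1] - source[1]), v))
        if x0 == 0 and y0 == 0 and x1 == grid_w - 1 and y1 == grid_h - 1:
            return None  # box already covers the whole grid
        # Expand the box by one bin in every direction, clamped to the grid.
        x0, y0 = max(0, x0 - 1), max(0, y0 - 1)
        x1, y1 = min(grid_w - 1, x1 + 1), min(grid_h - 1, y1 + 1)


print(find_converter_site((1, 1, 2, 2), (1, 1), {(5, 5): 1}, 8, 8))
```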
Figure 5.7: Comparison between (b) wirelength-optimized Global Routing and (c) power-optimized Global Routing. Panel (a) shows the three nets: n1 (VL, α = 0.3), n2 (VH, α = 0.7), and n3 (VL, α = 0.4).
5.3 Power-Driven MSV-Based Global Routing
In this section, we first utilize an example to show the motivation behind the power-driven global
routing in MSV-domains. Next we present a mathematical formulation of the power-driven MSV-
based Global Routing. We then discuss integer programming-based techniques to obtain high-
quality solutions to the formulation.
5.3.1 Motivational Example
Figure 5.7 demonstrates the advantage of the power-optimized Global Routing over traditional
wirelength-optimized Global Routing. Three nets with different activities and voltage levels are
presented in Figure 5.7(a). Nets n1 and n3 have low activities and low voltage level, and relatively
long initial routes. Net n2, on the contrary, is a short net but has a high activity and high voltage
level. The initial routes of these three nets share two common edges, as shown in Figure 5.7(a). To
reduce the interconnect power, these nets must be detoured to the neighboring edges. In the tradi-
tional wirelength-optimized Global Routing approach, shown in Figure 5.7(b), net n1 is chosen to
be detoured since it is a long net with multi-terminals. In fact, for most of the latest academic global
routers, net n2 is considered to be fixed due to its short wirelength. Fixing the short net n2, however,
forfeits the opportunity for power saving. In contrast, when both the activity and the voltage level are
considered simultaneously during power optimization, net n2 is detoured instead, achieving more power
saving, as depicted in Figure 5.7(c).
Figure 5.8: Convex expression of edge capacitance in metal1 with respect to the edge utilization. (Axes: Edge Utilization (ue) versus Edge Capacitance (fF).)
5.3.2 Mathematical Formulation
As described in Section 5.1.2, the per-unit capacitance of an edge e (Cue ) is a function of its metal
layer and the edge utilization. Typically, this function is a convex increasing function, as depicted in
Figure 5.8. We represent the function $C_e^u$ by a set of line segments denoted by $Q_e^u$. For example, the
set $Q_e^u$ is composed of 7 line segments in the library used in this work [6]. Each line segment
$q \in Q_e^u$ is of the form $m_q^u u_e + r_q^u$, for a given range of $u_e$, where the slope $m_q^u$ and the
intercept $r_q^u$ are derived from the library for that range. For each of the 8 metal layers in our
library, the curve $C_e^u$ is represented as 7 piecewise-linear segments.
Since the per-unit capacitance is convex, its value may be expressed in our mathematical
optimization problem for Global Routing with the following set of linear inequalities:
\[ m_q^u u_e + r_q^u \le C_e^u, \quad \forall q \in Q_e^u. \qquad (5.4) \]
For a given edge utilization $u_e$, the corresponding $C_e^u$ is obtained from the line equation that
gives the largest value of $m_q^u u_e + r_q^u$ for $q \in Q_e^u$.
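The statement above amounts to taking the maximum over the line segments. A minimal sketch follows; the function name and the three (slope, intercept) pairs are hypothetical and do not come from the library used in this work.

```python
def unit_capacitance_pwl(u_e, segments):
    """Evaluate a convex piecewise-linear model as the max over its lines.

    segments -- list of (slope m_q, intercept r_q) pairs, one per line q
    """
    return max(m * u_e + r for m, r in segments)


# Hypothetical segments: flat, then increasingly steep (convex, increasing).
segments = [(0.0, 0.08), (0.004, 0.0), (0.01, -0.15)]
for u in (15, 22, 30):
    print(u, unit_capacitance_pwl(u, segments))
```

Inside a minimization problem, the inequalities of Equation 5.4 only bound the capacitance variable from below; it settles on this maximum because the objective pushes it down.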
To model Global Routing, we are given a routing grid graph G = (V,E), a set of decomposed
multi-terminal nets denoted by Nd , and edge capacities re. Let Ti be a collection of all Steiner trees
that can route net i. We later discuss how to approximate Ti by generating a set of power-efficient
candidate trees with consideration of wirelength degradation. Each tree t ∈ Ti is associated with
a binary decision variable xit which is equal to 1 if and only if it is selected to route net i. Let the
parameter ate be equal to 1 if tree t contains edge e (if e ∋ t). The Global Routing problem for power
minimization is given by:
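\[
\begin{aligned}
\min_{x,s,C^u} \quad & \sum_{i=1}^{N_d} \sum_{t \in T_i} \alpha_i V_i^2 \Big( \sum_{e \ni t} C_e^u \Big) x_{it} + \sum_{i=1}^{N_d} M s_i && \text{(IP-POW)} \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i = 1, \ldots, N_d \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le r_e && \forall e \in E \\
& m_q^u \Big( \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \Big) + r_q^u \le C_e^u && \forall e \in E, \ \forall q \in Q_e^u \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le W_0 (1 + \beta) \\
& s_i \ge 0 && \forall i = 1, \ldots, N_d \\
& x_{it} \in \{0, 1\} && \forall i = 1, \ldots, N_d, \ \forall t \in T_i.
\end{aligned}
\]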
The first term in the expression of the objective function is the interconnect power as explained
in Section 5.1.2. It includes activity αi and voltage Vi of net i. The capacitance of a route t of net i is
obtained by adding the unit edge capacitances $C_e^u$ for all the edges e ∋ t. Here the route t ∈ Ti will
be selected for net i only if xit = 1.
The first set of constraints selects at most one route for each net. The slack variable si is equal
to 1 if net i cannot be routed, and the variable is penalized in the objective function by a large
parameter M to maximize the number of routed nets. The term $\sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it}$ represents the
edge utilization $u_e$. The second set of constraints ensures that the edge utilizations are within the
given edge capacities. The third set of constraints determines the per-unit edge capacitance $C_e^u$ for
each edge e from its utilization, using the discussed piecewise-linear model. The fourth constraint
ensures the new wirelength is within a factor β of the initially-provided wirelength W0. Here wit
denotes the wirelength of route t of net i.
While the constraints of the presented (IP-POW) formulation are all linear, the objective expression
is nonlinear since it includes multiplication of variables xit and $C_e^u$. We approximately solve the
formulation using the following two-phase heuristic approach:
1. We minimize the capacitance of all the edges ($\sum_{e \in E} C_e$) by rerouting the nets passing through
the congested region and ignore the net activities αi and voltage levels Vi.
2. We minimize an estimate of total power obtained by including αi and Vi for each net while
assuming the capacitance is fixed and obtained from phase 1. However, to reduce the introduced
error, we heavily penalize the mismatch between the capacitance obtained from phase 1 and
the actual capacitance found at phase 2 for each edge during the optimization.
In the next two subsections we discuss these two phases in detail. For each phase, we first give
its IP formulation and then discuss the details of the procedure to efficiently solve the formulation
including our method for generation of the power-efficient candidate routes.
5.3.3 Phase 1: Minimizing Total Capacitance
Using the piecewise linear approximation for the per-unit capacitance $C_e^u$ given by Equation (5.4),
we may also approximate the total capacitance as
\[ C_e = C_e^u \times u_e \ge m_q^u u_e^2 + r_q^u u_e \quad \forall q \in Q_e^u. \]
This (convex) nonlinear expression may be re-linearized, resulting in another piecewise linear
expression for the total edge capacitance that may be used in our linear integer program for minimizing
the total capacitance:
\[ C_e \ge m_q u_e + r_q \quad \forall q \in Q_e. \qquad (5.5) \]
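One way to obtain the segments of Equation 5.5 is to sample the total capacitance at a set of utilization breakpoints and connect consecutive samples with secant lines. This particular construction, the function names, and the numbers below are assumptions made for illustration; the text does not specify how the re-linearization is carried out. For a convex curve the secants agree with it at the breakpoints and lie on or above it in between.

```python
def relinearize_total_cap(unit_segments, breakpoints):
    """Build (slope, intercept) lines for C_e from the per-unit model.

    unit_segments -- (m, r) lines of the per-unit model of Equation (5.4)
    breakpoints   -- increasing utilization values at which to sample
    """
    def unit_cap(u):
        return max(m * u + r for m, r in unit_segments)

    points = [(u, u * unit_cap(u)) for u in breakpoints]
    lines = []
    for (u0, c0), (u1, c1) in zip(points, points[1:]):
        slope = (c1 - c0) / (u1 - u0)
        lines.append((slope, c0 - slope * u0))
    return lines


def total_cap(u_e, lines):
    """Equation (5.5): C_e is the max over the re-linearized lines."""
    return max(m * u_e + r for m, r in lines)


# Hypothetical per-unit model whose slope changes at u_e = 20.
unit_segments = [(0.0, 0.08), (0.004, 0.0)]
lines = relinearize_total_cap(unit_segments, [0, 10, 20, 30])
print(total_cap(20, lines), total_cap(30, lines))
```

Since edge utilizations are integers, placing the breakpoints on the integer values of interest would make this approximation exact at every attainable utilization.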
1) Formulation
The formulation of phase 1 is given by the following IP:
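\[
\begin{aligned}
\min_{x,s,C} \quad & \sum_{e \in E} C_e + \sum_{i=1}^{N_d} M s_i && \text{(POW-P1)} \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i = 1, \ldots, N_d \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le r_e && \forall e \in E \\
& m_q \Big( \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \Big) + r_q \le C_e && \forall e \in E, \ \forall q \in Q_e \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le W_0 (1 + \beta) \\
& x_{it} \in \{0, 1\} && \forall i = 1, \ldots, N_d, \ \forall t \in T_i \\
& s_i \ge 0 && \forall i = 1, \ldots, N_d.
\end{aligned}
\]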
The objective expression is similar to the formulation (IP-POW) but the first term is replaced by
$\sum_{e \in E} C_e$, which represents an estimate of the total interconnect capacitance. The third set of
constraints is also updated; the variable $C_e$ replaces $C_e^u$ in the previous formulation, and the
coefficients in the piecewise linear model are updated by using Equation 5.5.
2) A Price-and-Branch Solution Procedure
We approximately solve (POW-P1) using a two-step heuristic. First, a pricing procedure
is used to generate a set of candidate routes for each net that are power-efficient while considering
the wirelength degradation. The pricing step approximates Ti in the formulation to contain a small
set of power-efficient candidate routes, instead of all the potential routes of net i. Second, branch-
and-bound is applied to solve (POW-P1), selecting one route for each net from the set of generated
candidate routes. The standard branch and bound algorithm can be carried out using a commercial
solver. This two-step procedure of generating candidate routes and then running branch and bound
is commonly known as price-and-branch [8], [38]. We apply the same price-and-branch procedure
as demonstrated in Section 3.1 for power improvement. The major technical difference in this
procedure is in the pricing step to find power-efficient candidate routes, which we next discuss in
detail.
3) Overview of Pricing for Route Generation
We solve a linear-programming relaxation of (POW-P1) by replacing the binary requirements on the
variables xit with constraints 0≤ xit ≤ 1∀i,∀t. The linear program is solved by an iterative procedure
known as column-generation [24]. In column generation, we start by replacing Ti (set of all possible
routes of net i) in formulation (POW-P1) by subset Si ⊂Ti, initially containing one candidate route
per net. We then gradually expand Si, adding new routes that may decrease the objective function.
New candidate routes are added based on a power-aware pricing condition for each net.
Before explaining the procedure in more detail, we first give the following notations:
1. We refer to the LP relaxation of (POW-P1) in which Ti is replaced by Si and 0 ≤ xit ≤ 1 as the
"restricted master problem", denoted by (RMLP-P1); the solution of (RMLP-P1) for a given
Si is denoted by (x, s, C);
2. We refer to the dual of the restricted master problem as (D-RMLP-P1). The solution of
(D-RMLP-P1) consists of (λ ≤ M, π ≤ 0, µ ≥ 0, θ ≤ 0), corresponding to the dual variables for
the first, second, third, and fourth sets of constraints in the relaxed (POW-P1), respectively.
Figure 5.9: Power-aware route generation.
The iterative column generation procedure including the pricing condition is enumerated below:
1. For each net i ∈ {1, . . . ,Nd}, initialize Si with one route. (In this work we start with the
solution of [12].)
2. Solve (RMLP-P1), yielding a primal solution (x, s, C) and dual values (λ, π, µ, θ) in
(D-RMLP-P1).
3. For each net i ∈ {1, . . . ,Nd}, generate a new route t∗. Using the solution of step 2, evaluate the
pricing condition: if $\lambda_i > \sum_{e \ni t^*} \sum_{q \in Q_e} m_q \mu_{eq} - \sum_{e \ni t^*} (\pi_e + \theta)$, then Si = Si ∪ {t∗}.
4. If an improving route for some net i was found in step 3, return to step 2. Otherwise, stop: the
solution (x, s, C) is an optimal solution to (RMLP-P1).
Step 3 gives the pricing condition in terms of the solution of the dual problem (D-RMLP-P1)
obtained at the current iteration. This step can determine for a given new route t∗, if it should be
added to the set Si to reduce the objective of (RMLP-P1). However, it does not specify how a
new route should be found such that the pricing condition is satisfied. We next discuss a convenient
graph-based procedure to generate a new route t∗ which satisfies the pricing condition.
4) Route Generation for One Net
To find the improving routes for net i, we associate a weight we with each edge e in the Global Routing
grid:
\[ w_e = \max_{q \in Q_e} (m_q \mu_{eq}) - \pi_e - \theta. \qquad (5.6) \]
By the theory of linear programming, for each edge e, at most one dual variable µeq,q ∈ Qe will
be positive in an optimal solution to (D-RMLP-P1). Thus, considering route t∗, we can compute the
pricing condition as λi > ∑∀e∋t∗ we. We take advantage of this interpretation to identify promising
route t∗ which satisfies the pricing condition. Given a route t ∈Si obtained from previous iterations,
we obtain t∗ by rerouting branches of t with the updated edge weights so that the overall weights of
rerouted branches are reduced.
We explain the procedure with the example of Figure 5.9. Considering two nets a and b, suppose
we are initially given the routes ta and tb for these two nets. After step 2 at the first iteration of
column generation, we obtain edge weights which are given in the figure on the left. To obtain a
new route t∗a for net a, we reroute different branches of ta. For each terminal, we identify a branch
as the segment connecting it to the first Steiner point on ta. We then reroute this branch by solving
Dijkstra’s single-source shortest path algorithm [26] on the weighted graph with the weights of the
first iteration, similar to [59], [61]. The route t∗a is shown in the right figure. After adding t∗a to Sa
we proceed to the second iteration and obtain new edge weights which are shown in the right figure.
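The edge weights of Equation 5.6 and the rerouting of a single branch can be sketched as follows. This is an illustration under assumptions: the function names, the adjacency-list graph, and the numeric dual values are hypothetical, and only one two-pin branch is rerouted, whereas the procedure above reroutes every branch of the tree. Dijkstra's algorithm is applicable when the weights are nonnegative, which holds for nonnegative slopes given µ ≥ 0, π ≤ 0 and θ ≤ 0.

```python
import heapq


def edge_weight(slopes, mu, pi_e, theta):
    """Equation (5.6): w_e = max_q(m_q * mu_eq) - pi_e - theta."""
    return max(m * d for m, d in zip(slopes, mu)) - pi_e - theta


def shortest_path(adj, src, dst):
    """Dijkstra on adj[u] = [(v, weight), ...]; returns (cost, vertex path)."""
    inf = float("inf")
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, inf):
            continue  # stale heap entry
        if u == dst:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return inf, []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]


def satisfies_pricing(lambda_i, route_weight):
    """Pricing condition: add the route only if lambda_i exceeds its weight."""
    return lambda_i > route_weight


# Hypothetical weights: the direct edge a-b is expensive, the detour is cheap.
adj = {"a": [("b", 0.85), ("c", 0.07)],
       "b": [("a", 0.85), ("c", 0.07)],
       "c": [("a", 0.07), ("b", 0.07)]}
cost, path = shortest_path(adj, "a", "b")
print(path, satisfies_pricing(0.5, cost))
```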
The discussed pricing procedure is similar to the procedure introduced in Chapter 3. However, it
differs in the pricing condition and the way edge weights are set up. For solving (RMLP-P1) and
its dual at each iteration we use the solver CPLEX 12.0. After obtaining the final set Si, again we
use CPLEX 12.0 for the branch and bound step to get the final solution. We further accelerate the
process by applying a simple problem decomposition that we will discuss in Section 5.3.5.
5.3.4 Phase 2: Considering Activity and Voltage
At phase 2, we approximate the per-unit edge capacitances using the solution from phase 1, and
re-route the nets to minimize an approximation of the total power. Since the utilization (and hence
capacitance) corresponding to the routing solution of phase 2 may be different from phase 1, we
heavily penalize any mismatch in our optimization.
1) Formulation
We compute the following quantities after phase 1:
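1. We define a new "effective" capacity for each edge e as $\bar{r}_e = \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it}$, where xit is the
value of the routing solution from phase 1.
2. We define the new per-unit capacitance as $\bar{C}_e^u = C_e / \bar{r}_e$, where $C_e$ is the value of the edge
capacitance from the solution found in phase 1.
With these definitions, the formulation of phase 2 is the following integer linear program:
\[
\begin{aligned}
\min_{x,s,\varepsilon} \quad & \sum_{i=1}^{N_d} \sum_{t \in T_i} \alpha_i V_i^2 \Big( \sum_{e \ni t} \bar{C}_e^u \Big) x_{it} + \sum_{i=1}^{N_d} M_1 s_i + \sum_{e \in E} M_2 \varepsilon_e && \text{(POW-P2)} \\
\text{s.t.} \quad & \sum_{t \in T_i} x_{it} + s_i = 1 && \forall i = 1, \ldots, N_d \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} a_{te} x_{it} \le \bar{r}_e + \varepsilon_e && \forall e \in E \\
& \sum_{i=1}^{N_d} \sum_{t \in T_i} w_{it} x_{it} \le W_0 (1 + \beta) \\
& 0 \le \varepsilon_e \le r_e - \bar{r}_e && \forall e \in E \\
& x_{it} \in \{0, 1\} && \forall i = 1, \ldots, N_d, \ \forall t \in T_i \\
& s_i \ge 0 && \forall i = 1, \ldots, N_d.
\end{aligned}
\]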
The first term in the objective expression is an estimate of the total power of the nets,
where $\sum_{e \ni t} \bar{C}_e^u$ is the sum of the fixed approximate per-unit capacitances of the edges contained in
route t, obtained using the solution of phase 1 as discussed before. The first set of constraints
ensures at most one route is selected per net; a heavy penalty of M1 is incurred if si ≠ 0, and this is
reflected in the second term of the objective function. The second set of constraints bounds the new
utilization of each edge by $\bar{r}_e + \varepsilon_e$, where $\bar{r}_e$ is the effective capacity (the phase 1 utilization)
and εe is a new variable which is heavily penalized by a large factor M2 in the objective function if
εe ≠ 0. In other words, we highly penalize the case where the rerouting of a net causes a larger edge
utilization compared to phase 1. This in effect forces the routing process to keep the mismatch in the
edge utilizations as small as possible, so that the capacitance (which is a function of utilization)
remains close to that of phase 1. We also enforce $\varepsilon_e + \bar{r}_e \le r_e$ in the fourth set of constraints to
ensure the edge utilization is not beyond its actual capacity re. Finally, the third set of constraints
ensures the increase in wirelength is bounded by factor β.
Figure 5.10: Penalizing the edge capacitance if the rerouting of a net causes a larger edge utilization compared to phase 1. (Axes: Edge Utilization (ue) versus Edge Capacitance (fF); the annotated example has ue = 26, capacitance 0.123, and penalty line 0.123 + 0.01 × εe.)
Note that in the objective expression of this formulation, a large constant parameter M2 can be
chosen to penalize the over-utilization of edge resources. Alternatively, to obtain a more accurate
estimation of the edge capacitances, we can utilize a linear function with respect to the utilization
for each edge to penalize the over-usage, as in our implementation. Figure 5.10 shows an example.
The edge utilization ue for this particular edge after phase 1 is 26 units, and the corresponding edge
capacitance $C_e^{26}$ is 0.123 according to the look-up table. The linear function 0.123 + 0.01 × εe is then
calculated to estimate the edge capacitance when over-utilizing the edge resources. In this case $M_{2e}$
is chosen to be 0.01 which is the slope of this function.
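The quantities carried from phase 1 into phase 2, and the linear penalty of the example above, can be computed as in the following sketch. The function names and the dictionary inputs are assumptions. Edges with zero utilization in phase 1 are skipped here because the per-unit capacitance would require a division by zero; how such edges are treated is not stated above, so this is a choice of the sketch only.

```python
def phase2_parameters(utilization, total_cap):
    """Effective capacity and fixed per-unit capacitance for each edge.

    utilization[e] -- number of routes on edge e in the phase 1 solution
    total_cap[e]   -- total capacitance C_e of edge e in the phase 1 solution
    Returns {e: (effective capacity, per-unit capacitance)}.
    """
    params = {}
    for e, u in utilization.items():
        if u > 0:  # unused edges are skipped (assumption of this sketch)
            params[e] = (u, total_cap[e] / u)
    return params


def penalized_capacitance(cap_phase1, slope, eps):
    """Linear estimate of the capacitance when an edge is used eps units
    beyond its phase 1 utilization."""
    return cap_phase1 + slope * eps


# Numbers from the example above: capacitance 0.123 at u_e = 26, slope 0.01.
print(penalized_capacitance(0.123, 0.01, 2))
print(phase2_parameters({"e1": 4, "e2": 0}, {"e1": 0.4, "e2": 0.0}))
```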
Figure 5.11: Decomposition into smaller-sized subproblems similar to GRIP and PGRIP. (Labels in the figure: fixed terminal, VL, VH.)
2) Solving using Price-and-Branch
The solution procedure is quite similar to the one explained in the previous Section 5.3.3 for phase 1.
Here, we just note the differences. We denote the restricted master problem by (RMLP-P2) and its
solution by (x, s, ε). The dual of the restricted master is denoted by (D-RMLP-P2) and its solution
is (λ , π, θ ), corresponding to the first, second and third set of inequalities in relaxed (POW-P2),
respectively.
The initial set Si is set to all the candidate routes generated from phase 1. This helps to quickly
generate a high quality solution for phase 2. It also ensures that the solution of phase 1 is included
as a feasible solution in phase 2.
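The pricing condition is given by the inequality
$\lambda_i > \alpha_i V_i^2 \big( \sum_{e \ni t} \bar{C}_e^u \big) - \sum_{e \ni t} (\pi_e + \theta)$, where $\bar{C}_e^u$ is the fixed per-unit capacitance
obtained from phase 1, and is used to define the edge weights $w_e = \alpha_i V_i^2 \bar{C}_e^u - \pi_e - \theta$, ∀e ∈ E.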
5.3.5 Decomposition
To further accelerate solving of our two-phase formulation, we adopt the following problem de-
composition similar to GRIP and PGRIP. We divide each voltage island into a set of rectangular
subregions by recursive bipartition of the island while balancing the total number of nets which fall
inside each subregion. For a given subregion, in order to decide which nets fall in it, we use the
initial wirelength-optimized solution of [12]. We stop when the number of decomposed nets at each
subregion is at most 3000 which we empirically determined for our experimented benchmarks from
the ISPD’08 suite. Figure 5.11 shows an example. Note that in PGRIP, each subregion can have up
to 4000 nets. This is because the number of constraints in the IP formulation (ILP-PGR) of PGRIP
is smaller than in the formulations (POW-P1) and (POW-P2), and therefore the IP solver is able to
handle more nets in one subproblem.
Next, each subproblem is defined as one rectangular subregion with the set of nets assigned to it.
If a net passes through multiple subregions, we fix the terminal location on the subproblem boundary
according to the wirelength-optimized solution (see Figure 5.11). This allows each subproblem to be
solved independently, without the hassle of later connecting the segments of a route in adjacent
subproblems. The subproblems are then solved in parallel without any synchronization.
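The recursive bipartition can be sketched as follows. Several details are assumptions of this sketch and are not specified above: each net is represented by a single point (for instance the center of its initial route), the cut is made across the longer side of the region, and the cut position is the median of the net points along that axis so that the two halves hold a balanced number of nets.

```python
def bipartition(points, region, max_nets=3000):
    """Recursively split a region until each part holds at most max_nets nets.

    points -- list of (x, y) representative points, one per net
    region -- (x0, y0, x1, y1) rectangle containing the points
    Returns a list of (subregion, points in that subregion).
    """
    if max_nets < 1:
        raise ValueError("max_nets must be at least 1")
    if len(points) <= max_nets:
        return [(region, points)]
    x0, y0, x1, y1 = region
    axis = 0 if (x1 - x0) >= (y1 - y0) else 1  # cut across the longer side
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    cut = ordered[mid][axis]  # median coordinate balances the two halves
    if axis == 0:
        first, second = (x0, y0, cut, y1), (cut, y0, x1, y1)
    else:
        first, second = (x0, y0, x1, cut), (x0, cut, x1, y1)
    return (bipartition(ordered[:mid], first, max_nets)
            + bipartition(ordered[mid:], second, max_nets))


# Ten nets on a line, at most three nets per subproblem (toy sizes).
parts = bipartition([(i, 0) for i in range(10)], (0, 0, 10, 1), max_nets=3)
print([len(p) for _, p in parts])
```

Each returned pair could then be handed to an independent worker, in line with the synchronization-free parallel solving described above.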
Even though in our decomposition each subproblem in effect is assigned a low or high voltage
level, it is possible that the nets assigned to it have different supply levels. For example a high
voltage net may just pass through a subregion in a low voltage island. Or a net with level converter
(which will have portions of high and low voltage levels after net decomposition) may fall in a high
voltage island.
Overall this decomposition is extended from PGRIP, but we make use of our initially-provided
Global Routing solution for more effective decomposition to determine the fixed terminal locations
on the boundaries for independent and parallel processing of the subproblems.
5.4 Simulation Results
5.4.1 Benchmark Instances
In order to test our solution procedure and determine whether or not significant power savings were
possible without increasing wirelength, we modified known benchmarks to include multi-supply
voltages. Modifying the benchmarks required us to generate timing data, power data, and place
level converters. We implemented the procedure of [69] to generate voltage islands for two voltage
levels of VL = 0.9V and VH = 1.1V . The procedure required a sequential netlist with gate-level delay
and power models.
Timing Modeling:
We assumed the locations of the sequential elements in the ISPD’08 benchmarks [4] using the
following procedure. First, we obtained a Directed Acyclic Graph (DAG) representation of the
benchmarks from the variation provided by the corresponding ISPD’06 placement benchmarks [2].
Using the placement benchmarks, we obtained a DAG by starting from the designated Primary
Inputs and traversing in forward direction until reaching the Primary Outputs. We also assumed the
nets with more than 50 terminals to be clock trees to identify sequential elements.
We then assumed the delay of each cell (or node in the DAG) to be proportional to its size (for unit
load), where the unit delay was taken to be that of the inverter in the 45nm library [6] used in this
work. We modeled the loading in our cell delay model as proportional to the cell size, which
was also given in the placement benchmarks.
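The following sketch illustrates how arrival times could be propagated through such a DAG when the cell delay is proportional to the cell size. It is a simplified illustration only: it omits the loading term, the function name arrival_times and the unit_delay parameter are hypothetical, and it relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Iterable


def arrival_times(
    preds: Dict[Hashable, Iterable[Hashable]],
    size: Dict[Hashable, float],
    unit_delay: float = 1.0,
) -> Dict[Hashable, float]:
    """Longest-path arrival time at the output of every node of a DAG.

    `preds` maps a node to its fan-in nodes; nodes without an entry are treated
    as primary inputs. The delay of a node is assumed to be unit_delay * size.
    """
    arrival: Dict[Hashable, float] = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(preds).static_order():
        start = max((arrival[u] for u in preds.get(v, ())), default=0.0)
        arrival[v] = start + unit_delay * size[v]
    return arrival
```

The traversal visits the nodes in forward direction from the primary inputs towards the primary outputs, as in the DAG construction described above.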
Power Modeling:
We generated the activity factor of each net uniformly at random between 0.1 and 0.9.
The 45nm library used in this work contained information about the total capacitance (area, fringe,
coupling) for each of the 8 metal layers. We used the method described in Section 5.3 to extract
a piecewise-linear model for Ce and Cue for each of the 8 metal layers. For each metal layer, we
considered the minimum wire size given in the library. To map edge utilization to spacing, we
assumed the length of each edge of the Global Routing grid to be 2µ; for a given utilization, we
assumed the maximum spacing between the routes mapped to the same Global Routing edge.
5.4.2 Level Converter Placement
In our first experiment, we report the result from our level converter placement algorithm for the
nets that contained a level converter (had a source terminal in VL island with fanout terminals in VH
islands). We consider the following case in our experiment: We routed all the nets using the initial
wirelength-optimized solution of NTHU-Route2.0 [12]. We solve our formulation (IP-LC) to obtain
the level converter locations subject to the area density constraints. We consider the obtained results
as the base case for power comparison in our second experiment.
Recall that the placement of level converters can impact the power of each route by decomposing
it into multiple segments, where each segment has a high or low supply level. Using Equation 5.1,
we compute the total power of the nets which need level conversion. This includes the power of
the level converters and of the different route segments of the decomposed nets after inserting
the level converters.
Table 5.1: Results of the level converter placement for the ISPD’08 benchmarks.
Bench     #Net    #NetLC  #LC     Power     WCPU(min)
adaptec1  177K    9K      20K     432242    5
adaptec2  208K    8K      17K     336881    7
adaptec3  368K    17K     43K     1056778   8
adaptec4  401K    16K     36K     751120    13
adaptec5  548K    32K     85K     1199591   11
newblue1  271K    75K     16K     318922    10
newblue2  374K    22K     47K     453234    17
newblue4  531K    38K     79K     927712    9
newblue5  892K    26K     84K     1469859   14
newblue6  835K    31K     91K     1367000   17
newblue7  1647K   28K     72K     2201835   21
bigblue1  197K    9K      26K     619321    6
bigblue2  429K    15K     43K     560723    13
bigblue3  666K    23K     60K     814957    12
bigblue4  1134K   17K     51K     1254323   15
Table 5.1 reports our power comparison results. For each benchmark, columns 2 and 3 report the
total number of nets and the number of nets which require level conversion, respectively.
The total number of level converters in our case is given in column 4. The number of level converters
is larger than the number in column 3, indicating that for some nets it may be better to add extra
LCs but place them closer to the sink terminals, which reduces the route portion that is driven by
high voltage and saves power. In column 5, we report the power of ([12]+LC) for the nets including
the ones with level conversion. We use these power numbers as the base case for our next experiment.
Finally, the wall clock time of the level converter placement (indicated by WCPU) is given in
column 6. As can be seen, this step completes quickly, taking between 5 and 21 minutes.
Table 5.2: Results of Power-GRIP for the ISPD’08 benchmarks. The wirelength is scaled to 10^5. Power and capacitance are scaled to 10^3.
                             initial solution ([12]+LC)  phase 1                 phase1+phase2
Bench     #Net   #Netd  #SP  W0       C        P         -WL(%)  -C(%)   -P(%)   -WL(%)  -C(%)   -P(%)
adaptec1  177K   197K   130  54.2     953.3    432.2     0.05    11.70   8.57    0.07    15.48   16.17
adaptec2  208K   224K   195  53.0     750.0    336.9     0.12    10.34   6.93    0.14    14.57   15.13
adaptec3  368K   411K   359  132.7    2187.0   1056.8    0.01    11.51   8.67    0.34    13.55   13.94
adaptec4  401K   437K   296  123.0    1613.8   751.1     0.02    12.16   8.46    0.04    16.92   17.20
adaptec5  548K   632K   454  158.7    2543.0   1199.6    0.38    8.60    6.08    0.43    10.23   10.88
newblue1  271K   287K   195  47.0     612.2    318.9     0.11    13.39   9.87    0.22    17.45   18.40
newblue2  374K   421K   312  77.6     894.9    453.2     0.04    14.19   7.87    0.09    19.20   19.34
newblue4  531K   610K   462  133.7    1955.4   927.7     0.02    13.39   9.61    0.54    17.45   17.61
newblue5  892K   975K   658  234.7    3405.3   1469.9    0.89    11.55   6.75    0.86    14.00   13.47
newblue6  835K   926K   532  180.2    2834.9   1367.0    0.62    12.56   9.35    0.57    16.35   17.80
newblue7  1647K  1719K  670  360.2    5004.4   2201.8    0.01    15.12   11.20   0.17    19.63   20.93
bigblue1  197K   222K   152  57.0     1110.4   619.3     0.23    10.68   7.20    0.16    12.17   12.56
bigblue2  429K   472K   275  92.4     1283.8   560.7     0.14    11.64   7.86    0.10    14.76   14.33
bigblue3  666K   725K   453  133.0    1664.6   815.0     0.91    15.22   10.99   0.93    20.25   20.31
bigblue4  1134K  1184K  509  233.0    3006.6   1254.3    0.18    16.03   12.12   0.28    22.31   22.46
Avg.                                                     0.25    12.54   8.77    0.34    16.29   16.70
5.4.3 Power Saving during Global Routing
Using the initial WL-optimized solution of [12], and after fixing the locations of the level converters,
we applied net decomposition (as described in Section 5.1.1). Table 5.2 reports the number of nets
and decomposed nets in columns 2 and 3, respectively. We then applied our power-driven Global
Routing procedure using a wirelength degradation factor of β = 0, so no wirelength degradation was
allowed. We used CPLEX 12.0 [37] to solve our two-phase formulation, and processed the
subproblems in parallel by submitting the jobs to a grid of CPUs, each with 2GB of memory. The
number of subregions (same as the number of processors) is given in column 4 (#SP) of Table 5.2.
We then compared three routing solutions.
• The initial WL-optimized solution of [12];
• The solution after applying phase 1, obtained by solving the formulation (POW-P1);
• The solution by further applying phase 2, obtained by solving (POW-P1) followed by (POW-P2).
For each case, we report the wirelength (WL), the total capacitance $C = \sum_{i=1}^{N_d} C^{route}_i$
(where $C^{route}_i$ is defined in (5.2)), given in units of fF, and the Global Routing power metric P
from (5.1), excluding the constant portions of the expression.
The results are reported in Table 5.2 in columns 5 to 13. For the initial solution, we report
the wirelength (W0) of the NTHU-R2.0 routes that have been augmented with the extra via-only
segment(s) to connect the level converters to the original routes. (As a result, there is a slight
increase in wirelength compared to the numbers reported in [12].) For the solutions of phase 1 and
phase 2, we report only the percentage improvement in total wirelength, C, and P, all with respect to
the initial solution.
As can be seen, applying phase 1 of the power-reduction heuristic results in an average saving
of 8.77% in P. Recall that these savings are solely due to capacitance reduction (as can be seen from
the higher improvement rate in C compared to P). By further applying phase 2, we see additional
improvement in P (16.70% on average). The improvement in C is also larger than in phase 1
(16.29% versus 12.54% on average), even though phase 1 solely focuses on optimizing C. This is
because we start phase 2 by including all the candidate routes generated in phase 1. Notice that in
both phase 1 and phase 2 there is an improvement (reduction) in wirelength compared to W0. It is
important to note that no extra overflow was introduced in the power-optimized solutions.
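As a quick consistency check, the average power savings quoted above can be recomputed from the per-benchmark -P(%) columns of Table 5.2. The short script below is only a worked arithmetic example; the variable names are ours and the values are copied from the table.

```python
from statistics import mean

# -P(%) after phase 1, in the row order of Table 5.2 (adaptec1 ... bigblue4).
phase1_power_saving = [8.57, 6.93, 8.67, 8.46, 6.08, 9.87, 7.87, 9.61,
                       6.75, 9.35, 11.20, 7.20, 7.86, 10.99, 12.12]
# -P(%) after phase 1 followed by phase 2, same row order.
phase12_power_saving = [16.17, 15.13, 13.94, 17.20, 10.88, 18.40, 19.34, 17.61,
                        13.47, 17.80, 20.93, 12.56, 14.33, 20.31, 22.46]

avg_phase1 = round(mean(phase1_power_saving), 2)
avg_phase12 = round(mean(phase12_power_saving), 2)
print(f"phase 1: {avg_phase1:.2f}%  phase 1 + phase 2: {avg_phase12:.2f}%")
```

The two rounded averages agree with the 8.77% and 16.70% reported in the last row of the table.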
In our simulations, we explicitly bounded the runtime of phase 1 and phase 2. The wall clock
runtime limits for all benchmarks were set to 30min for phase 1 and 40min for phase 2. The
number of processors (same as the number of subregions) is given in column 4 of Table 5.2.
Another feature of our approach is that we can explore the tradeoff between wirelength and
power by controlling the wirelength degradation factor β. Take adaptec1 for instance: allowing 2%
degradation in wirelength results in an extra 4.5% power saving.
Chapter 6
CONCLUSIONS AND FUTURE WORKS
6.1 Conclusions
In this dissertation, we presented three related topics on parallelizing Global Routing via integer
programming. In Chapter 3, we presented GRIP which is a procedure for global routing via integer
programming. GRIP is based on solving an IP formulation by column generation and branch and
bound to select candidate routes for each net. The method uses dual information to create a dynamic
congestion metric and directly solves the 3-D model of the routing problem. GRIP uses techniques
to decompose the large-scale problem instances into subproblems of manageable size and recon-
nects the subproblem solutions again using IP. In the case of overflow, GRIP can apply an overflow
reduction procedure again using integer programming.
To further improve the runtime, in Chapter 4, we proposed PGRIP, a parallel global routing
procedure that could independently process subproblems with minimal synchronization among them.
The parallel implementation relied heavily on the IP formulation and the procedure to solve it. Our
goal was to show that integer programming (which was considered too computationally intensive for
global routing of industrial-sized designs) can be used with the aid of massive parallelism to
significantly improve the solution quality while meeting practical runtime requirements.
In Chapter 5, we proposed Power-GRIP which minimized an interconnect power metric for
designs with multi-supply voltage in the global routing stage. We presented an IP formulation
which considered power saving opportunities by reducing the area, fringe and congestion-dependent
coupling capacitances at each metal layer, while accounting for the activity and supply voltage of
each route segment. We showed significant savings in the power metric for global routing without
any degradation in wirelength or overflow. We also demonstrated that a similar parallel procedure
of PGRIP is applicable to solve this extended IP formulation.
In a broader sense, this work aims to show that parallelism can be used to invest in alternative
computational techniques which may have been considered as too time-consuming in the past, in
order to improve the solution quality.
6.2 Future Work 1: Layer-Directive Based Global Routing
As device performance continues to scale following Moore’s law, the scaled interconnect
performance has remained essentially constant. In the past, the wire resistance was so low that the
interconnect delay was negligible, and the circuit delay was dominated by the device delay. In
today’s deep sub-micron process technology, however, interconnect delay dominates the circuit
delay due to the increase in wire resistance and side-wall capacitance [51].
In 2007 and 2008, the release of large-sized Global Routing benchmarks [3], [4] resulted in
monumental progress in academic global routers. The evaluation metrics suggested by these
benchmarks, namely the total wirelength, via count, and overflow, were improved remarkably. These
metrics, however, are no longer sufficient to keep up with the demands imposed by modern process
technology. Specifically, the main concern is that these metrics fail to capture the interconnect
delay in Global Routing.
In modern VLSI designs, different metal layers have different wire widths. As shown in
Figure 6.1, the wires in the lower metal layers have smaller widths so that the routing resources
can be increased. The higher metal layers, on the other hand, tend to have wider wires due to
manufacturability and reliability requirements. When wirelength is the main objective in the Global
Routing problem, nets are pushed down to the lower metal layers as much as possible so that the
wirelength can be minimized. In the past, pushing the nets to the lower metal layers did not hurt the
interconnect delay, since the wire resistance was so small that the per-unit wire RC delay was almost
identical over all metal layers. In today’s deep sub-micron process technology, however,
pushing down the nets increases the interconnect delay significantly because of the high RC in the
lower metal layers.
One solution to this problem is to identify a set of timing-critical nets and promote these nets to
the upper metal layers to decrease their net delay. To achieve this, it is necessary to incorporate
“per net” layer information in Global Routing. Also, the timing-critical nets should be correctly
identified and assigned to proper layers. Recently, a new set of global routing benchmarks [53] has
been released which contains layer information for each net. For most existing global routers, it is
not difficult to incorporate layer information; the main challenges are identifying the critical nets
and assigning them to proper layers.
Figure 6.1: Different metal layers (M1 to M6) have different wire widths. The even-numbered metal layers run horizontally across the picture, while the odd-numbered layers run perpendicular to the picture.
6.2.1 Our Proposed Method
Our objective is to promote as many timing-critical nets to the upper layers as possible so that the
interconnect delay can be minimized. Ideally, we would identify all the nets with negative timing
slacks and promote these nets to the upper metal layers. However, there are several problems with
this simple strategy. First, the routing resources of the upper metal layers are limited due to the
wider wire widths. For timing-critical designs, it is difficult to route all the nets with negative
slacks using upper metal layers. Furthermore, the timing information is only an estimate before
Global Routing, since there is no physical routing path yet. The nets with positive slacks before
Global Routing may become timing-critical nets afterwards, and vice versa.
To utilize the limited upper layer routing resources effectively, we propose an iterative layer
directive based global routing strategy which is built on top of the commercial router Zroute [64].
As shown in Figure 6.2, we start by using a set of tight selection criteria to identify the most
critical nets, and promote these nets to the upper metal layers. After each iteration, the selection
criteria are relaxed to identify more critical nets, and the design is then rerouted. Note that
after each iteration, we recalculate the net delay of the critical nets and remove those nets that have
short wirelength and large positive slacks. Nevertheless, we keep the short nets with slightly positive
slacks in the critical net set to prevent the bouncing effect. Also, we never remove the long nets
from the critical net set, since pushing down the long nets may significantly increase the net delay.
In the following, we explain the two main parts of this strategy in detail.
Figure 6.2: Overview of layer directive Global Routing. [Flowchart boxes: Identify Critical Nets (Wirelength, Slack, Slew, Bottleneck); Divide Critical Nets into M Bins; Set i = M; Route Bin i; Rip Up Scenic Routes and Push Down for Rerouting; Route Remaining Nets; Rip Up All Nets; Finish Layer Directive Global Routing. Edge labels: "i > 0, i--"; "i == 0"; "If the last critical net bin does not have scenic route".]
6.2.2 Identification of Critical Nets
We select the critical nets based on four criteria: wirelength, slack, input slew, and bottleneck.
Clearly, nets with long wirelength and/or large negative slacks should be considered critical.
Although input slew does not directly affect net delay, it is still worthwhile to consider nets which
have a high input slew from their driver cells as critical nets, because improving the delay of these
nets can help to reduce the “total” path delay. The bottleneck nets are those nets having large
fan-in or fan-out cones. Improving the delay of this type of net can significantly improve the
performance of the entire design, since such nets are shared by many paths. For this reason, the
bottleneck nets are worth promoting to the upper metal layers even if they only have slightly
negative slacks.
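The four criteria can be combined into a simple selection predicate, sketched below. This is only an illustration of the idea and not the rule set used on top of Zroute: the record fields, the threshold names in Criteria, and the way the criteria are combined are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Net:
    name: str
    wirelength: float   # estimated wirelength
    slack: float        # timing slack (negative means a violation)
    input_slew: float   # slew at the driver output
    cone_size: int      # assumed measure of the fan-in/fan-out cone


@dataclass
class Criteria:
    long_wirelength: float  # nets at least this long are critical
    negative_slack: float   # nets with slack at or below this value are critical
    high_slew: float        # nets with at least this input slew are critical
    large_cone: int         # cone size from which a net counts as a bottleneck


def is_critical(net: Net, c: Criteria) -> bool:
    """Return True if the net meets any of the four selection criteria."""
    if net.wirelength >= c.long_wirelength or net.slack <= c.negative_slack:
        return True
    if net.input_slew >= c.high_slew:
        return True
    # Bottleneck nets qualify even with only slightly negative slack.
    return net.cone_size >= c.large_cone and net.slack < 0.0
```

Relaxing the selection between iterations then amounts to calling the same predicate with looser thresholds, for example a smaller long_wirelength or a negative_slack closer to zero.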
6.2.3 Bin Routing of Critical Nets
After identifying the critical nets, we divide them into several bins based on their wirelength.
The longer the wirelength of a net, the higher the metal layers it should be promoted to. The number
of bins is based on the available routing layers for critical nets. For example, if we use metal4 to
metal6 for routing critical nets, then the critical nets are divided into two bins: the first bin uses
metal6 and metal5 for routing, and the second bin uses metal6 to metal4. Our bin routing procedure
starts from the bin using the highest metal layers. After routing this bin, we detect its scenic
routes, i.e., the routes with a high ratio between the global routing wirelength and the initial
estimated wirelength. These detected scenic routes are ripped up and pushed down to the next bin for
rerouting. In particular, the number of scenic routes is a good indication of congestion. After
routing all the critical net bins, all the other nets are routed with no layer constraints.
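The bin routing loop can be sketched as follows. The sketch makes several assumptions that are not part of the procedure above: the bins are formed with equal sizes after sorting by estimated wirelength, a route counts as scenic when its routed-to-estimated wirelength ratio exceeds a hypothetical threshold scenic_ratio, and the actual router is abstracted as a callable route(net, bin_index) that returns the routed wirelength.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Net:
    name: str
    est_wl: float  # initial estimated wirelength (assumed positive)


def split_into_bins(nets: List[Net], num_bins: int) -> List[List[Net]]:
    """Longest nets go to bin 0, which uses the highest metal layers."""
    ordered = sorted(nets, key=lambda n: n.est_wl, reverse=True)
    size = -(-len(ordered) // num_bins)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(num_bins)]


def bin_route(
    nets: List[Net],
    num_bins: int,
    route: Callable[[Net, int], float],
    scenic_ratio: float = 1.5,
) -> Tuple[Dict[str, int], List[Net]]:
    """Route bins from the highest layers down, pushing scenic routes to the next bin."""
    bins = split_into_bins(nets, num_bins)
    assigned: Dict[str, int] = {}
    leftover: List[Net] = []  # scenic routes of the last bin, routed later without layer constraints
    for i, current in enumerate(bins):
        for net in current:
            routed_wl = route(net, i)
            if routed_wl / net.est_wl > scenic_ratio:
                # Scenic route: rip up and push down for rerouting.
                (bins[i + 1] if i + 1 < num_bins else leftover).append(net)
            else:
                assigned[net.name] = i
    return assigned, leftover
```

The number of nets pushed down from a bin can also be logged as a rough congestion indicator for the layers of that bin.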
6.2.4 Discussion
Our proposed layer directive global routing approach is robust in the sense that it can handle
designs with different geometry aspect ratios and routing blockages. One may argue that the iterative
approach is too time-consuming. Our initial experiments, however, indicate that the proposed
approach can achieve significant circuit timing improvement, and therefore the post-routing
optimization effort can be reduced significantly. This is an ongoing project, and more features are
still under development.
6.3 Future Work 2: Enhancing the Correlation Between the Placement and Routing Stages
As we discussed in the first chapter, the physical design is decomposed into several stages, and then
an iterative approach is utilized to simultaneously consider the design objectives against a list of
design constraints. The decomposition helps to simplify the design flow so that each stage can focus
on solving a particular optimization problem. However, it also creates a correlation issue among
the stages, especially between the placement and routing stages. For example, during the placement
stage, the wirelength of a net is estimated, since there is no physical routing information. When the
total wirelength is used as the main objective in the placement stage, cells are usually squeezed
together so that the “estimated” wirelength can be minimized. The squeezed placement, however, may
lead to congested regions and cause routability issues during the routing stage.
One solution to enhance the correlation between the placement and routing stages is to incorporate
Global Routing in the placement stage to guide the placement engine. As shown in Figure 1.1,
Global Routing can be incorporated in both the placement and post-placement optimization stages
to improve the routability and all the other main objectives such as timing, power, and cell density.
Almost all the existing commercial EDA tools have the capability to incorporate Global Routing in
the placement stage to enhance the design routability. Recently, academic researchers have started
to focus on this issue because of the ISPD’11 placement contest [5]. The correlation between
the post-placement optimization and routing stages, nevertheless, is still an open research area, and
we focus on this particular issue.
6.3.1 Incorporation of Global Routing in the Post-Placement Optimization Stage
After placing the cells in a design, several techniques such as cell sizing and buffer insertion are
utilized in the post-placement optimization stage to improve the design performance. For example,
if a net has long wirelength and negative timing slack, then buffers can be added to this net to
improve its timing. However, since there is no physical routing information at this stage, the
wirelength of a net has to be estimated, which may lead to over-optimization or under-optimization of
a design. When a design is over-optimized by inserting too many buffers or sizing up cells too much,
the power consumption can increase significantly. Moreover, inserting too many buffers may
increase the cell density and result in routability issues. For example, as shown in Figure 6.3, there
is an 8-bit bus with long wirelength located at a high metal layer. Buffers are inserted in the middle
of the bus to improve its timing, and these buffers could create a congested routing region, because
many routing resources are needed to connect the high-metal-layer bus to the inserted buffers
(located at metal1). If the design is under-optimized, on the other hand, more pressure is put on the
routing stage (strict timing requirements for critical nets). Therefore, incorporating Global Routing
to obtain more accurate routing information is the key step to enhance the solution quality of the
post-placement optimization stage.
Figure 6.3: Inserting buffers in the post-placement stage may create congested regions and cause routability issues. [Figure shows metal layers M1 to M6.]
Our proposed Global Routing algorithm for the post-placement optimization stage is similar to
what we discussed in the previous section, with several enhancements. First, this global router
needs to support incremental routing. For example, if a buffer is added to a long critical net which
was assigned to the upper metal layers, then this net should be pushed down to release the upper
layer resources. Also, other critical nets can then be promoted to decrease their net delay. This step
must be done incrementally to reduce the runtime. Second, the router must have the capability to
guide the optimizer. For example, if promoting a long critical net to the upper metal layers can
satisfy the timing requirement, then this net should not be touched by the optimizer, and vice versa.
This requirement can be met by assigning different RC scalings to different nets. To conclude, we
hope that by incorporating Global Routing in the post-placement optimization stage, we can bridge
the gap between these two stages and accelerate the path to design closure.
BIBLIOGRAPHY
[1] ISPD 1998 global routing benchmark suite, [online] http://www.ece.ucsb.edu/ kastner/labyrinth.
[2] ISPD 2006 placement contest and benchmark suite, [online] http://archive.sigda.org/ispd2006/contest.html.
[3] ISPD 2007 global routing contest and benchmark suite, [online] http://www.sigda.org/ispd2007/rcontest/.
[4] ISPD 2008 global routing contest and benchmark suite, [online] http://www.sigda.org/ispd2008/contests/ispd08rc.html.
[5] ISPD 2011 routability-driven placement contest and benchmark suite, [online] http://www.ispd.cc/contests/11/ispd2011 contest.html.
[6] Nangate 45 nm open cell library, [online] http://www.nangate.com. 2008.
[7] Christoph Albrecht. Global routing by new approximation algorithms for multicommodity flow. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20:622–632, 2001.
[8] Cynthia Barnhart, Ellis L. Johnson, George L. Nemhauser, Martin W. P. Savelsbergh, and Pamela H. Vance. Branch-and-price: Column generation for solving huge integer programs. Operations Research, 46:316–329, 1998.
[9] Laleh Behjat and Andy Chiang. Fast integer linear programming based models for VLSI global routing. In IEEE/ACM International Symposium on Circuits and Systems, pages 6238–6243, 2005.
[10] Michael Burstein and Richard N. Pelavin. Hierarchical wire routing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2:223–234, 1983.
[11] Zhen Cao, Tong Jing, Jinjun Xiong, Yu Hu, Lei He, and Xianlong Hong. Dprouter: A fast and accurate dynamic-pattern-based global routing algorithm. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 256–261, 2007.
[12] Yen-Jung Chang, Yu-Ting Lee, and Ting-Chi Wang. NTHU-Route 2.0: a fast and stable global router. In IEEE/ACM International Conference on Computer Aided Design, pages 338–343, 2008.
[13] Huang-Yu Chen, Chin-Hsiung Hsu, and Yao-Wen Chang. High-performance global routing with fast overflow reduction. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 582–587, 2009.
[14] Tai-Chen Chen and Yao-Wen Chang. Multilevel full-chip gridless routing considering optical proximity correction. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 1160–1163, 2005.
[15] Tai-Chen Chen, Yao-Wen Chang, and Shyh-Chang Lin. A novel framework for multilevel full-chip gridless routing. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 636–641, 2006.
[16] Minsik Cho, Katrina Lu, Kun Yuan, and David Z. Pan. BoxRouter 2.0: A hybrid and robust global router with layer assignment for routability. ACM Transactions on Design Automation of Electronic Systems, 14:1–21, 2009.
[17] Minsik Cho and David Z. Pan. BoxRouter: A new global router based on box expansion and progressive ILP. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26:2130–2143, 2007.
[18] Chris C. N. Chu and Yiu-Chung Wong. Flute: Fast lookup table based rectilinear steiner minimal tree algorithm for VLSI design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27:70–83, 2008.
[19] Jason Cong, Jie Fang, Min Xie, and Yan Zhang. MARS - a multilevel full-chip gridless routing system. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24:382–394, 2004.
[20] Jason Cong and Patrick H. Madden. Performance driven global routing for standard cell design. In IEEE/ACM International Symposium on Physical Design, pages 73–80, 1997.
[21] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to algorithms, second edition. 2001.
[22] CPLEX Optimization, Inc., Incline Village, NV. Using the CPLEX Callable Library, Version 9, 2005.
[23] Ke-Ren Dai, Wen-Hao Liu, and Yih-Lang Li. Efficient simulated evolution based rerouting and congestion-relaxed layer assignment on 3-D global routing. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 570–575, 2009.
[24] George B. Dantzig and Philip Wolfe. Decomposition principle for linear programs. Operations Research, 8:101–111, 1960.
[25] Jacques Desrosiers and Marco E. Lubbecke. A primer in column generation. In G. Desaulniers, J. Desrosiers, and M. M. Solomon, editors, Column Generation, chapter 1. Springer, 2005.
[26] Edsger W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
[27] Jhih-Rong Gao, Pei-Ci Wu, and Ting-Chi Wang. A new global router for modern designs. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 232–237, 2008.
[28] Michael R. Garey and David S. Johnson. The rectilinear Steiner tree problem is NP-complete. SIAM Journal of Applied Math, 32:826–834, 1977.
[29] Liangpeng Guo, Yici Cai, Qiang Zhou, and Xianlong Hong. Logic and layout aware voltage island generation for low power design. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 666–671, 2007.
[30] F. O. Hadlock. A shortest path algorithm for grid graphs. Networks, 7:323–334, 1977.
[31] Raia T. Hadsell and Patrick H. Madden. Improved global routing through congestion estimation. In IEEE/ACM Design Automation Conference, pages 28–31, 2003.
[32] Peter Hart, Nils Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4:100–107, 1968.
[33] Xianlong Hong, Tianxiong Xue, Jin Huang, Chung-Kuan Cheng, and Ernest S. Kuh. TIGER: an efficient timing-driven global router for gate array and standard cell layout design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 16:1323–1331.
[34] Jin Hu, Jarrod A. Roy, and Igor L. Markov. Sidewinder: a scalable ILP-based router. In ACM International Workshop on System-Level Interconnect Prediction, pages 73–80, 2008.
[35] Jin Hu, Jarrod A. Roy, and Igor L. Markov. Completing high-quality global routes. In IEEE/ACM International Symposium on Physical Design, pages 35–41, 2010.
[36] T. C. Hu and Man-Tak Shing. A decomposition algorithm for circuit routing. Mathematical Programming Essays in Honor of George B. Dantzig Part I, 24:87–103, 1985.
[37] International Business Machines Corp., Armonk, NY. Using the CPLEX Callable Library, Version 12, 2009.
[38] David Grove Jogensen and Morten Meyling. A branch-and-price algorithm for switch-box routing. Networks, 40:13–26, 2002.
[39] Donald B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of The ACM, 24:1–13, 1977.
[40] Andrew B. Kahng and Gabriel Robins. On optimal interconnections for VLSI. Kluwer Academic Publishers, Boston, MA, 1995.
[41] Andrew B. Kahng and Alexander Z. Zelikovsky. Highly scalable algorithms for rectilinear and octilinear steiner trees. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 827–833, 2003.
[42] Ryan Kastner, Elaheh Bozorgzadeh, and Majid Sarrafzadeh. Pattern routing: Use and theory for increasing predictability and avoiding coupling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 21(7):777–790, 2002.
[43] Chin Yang Lee. An algorithm for path connections and its applications. IEEE Transactions on Electronic Computers, 10:346–365, 1961.
[44] Kai-Win Lee and Carl Sechen. A global router for sea-of-gates circuits. In IEEE/ACM European Design and Test Conference, pages 242–247, 1991.
[45] Jeffrey T. Linderoth. Topics in Parallel Integer Optimization. PhD thesis, Georgia Institute of Technology, 1998.
[46] Michael J. Litzkow, Miron Livny, and Matt W. Mutka. Condor - A hunter of idle workstations. In International Conference on Distributed Computing Systems, pages 104–111, 1988.
[47] Wen-Hao Liu, Wei-Chun Kao, Yih-Lang Li, and Kai-Yuan Chao. Multi-threaded collision-aware global routing with bounded-length maze routing. In IEEE/ACM Design Automation Conference, pages 200–205, 2010.
[48] Nir Magen, Avinoam Kolodny, Uri Weiser, and Nachum Shamir. Interconnect-power dissipation in a microprocessor. In System-Level Interconnect Prediction, pages 7–13, 2004.
[49] Malgorzata Marek-Sadowska. Route planner for custom chip design. In IEEE/ACM International Conference on Computer Aided Design, pages 246–249, 1986.
[50] Larry McMurchie and Carl Ebeling. PathFinder: a negotiation-based performance-driven router for FPGAs. In Symposium on Field Programmable Gate Arrays, pages 111–117, 1995.
[51] Joe W. McPherson. Reliability challenges for 45nm and beyond. In IEEE/ACM Design Automation Conference, pages 176–181, 2006.
[52] Michael D. Moffitt. MaizeRouter: Engineering an effective global router. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27:2017–2026, 2008.
[53] Michael D. Moffitt. Global routing revisited. In IEEE/ACM International Conference on Computer Aided Design, pages 805–808, 2009.
[54] Edward F. Moore. Shortest path through a maze. In Annals of Computation Laboratory, pages 285–292, 1959.
[55] MOSEK ApS, Copenhagen, Denmark. The MOSEK C API manual, Version 5.0, 2008.
[56] Dirk Muller. Optimizing yield in global routing. In IEEE/ACM International Conference on Computer Aided Design, pages 480–486, 2006.
[57] Muhammet Mustafa Ozdal and Martin D. F. Wong. Archer: a history-driven global routing algorithm. In IEEE/ACM International Conference on Computer Aided Design, pages 488–495, 2007.
[58] Min Pan and Chris C. N. Chu. Fastroute: a step to integrate global routing into placement. In IEEE/ACM International Conference on Computer Aided Design, pages 464–471, 2006.
[59] Min Pan and Chris C. N. Chu. Fastroute 2.0: A high-quality and efficient global router. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 250–255, 2007.
[60] Min Pan and Chris C. N. Chu. IPR: An integrated placement and routing algorithm. In IEEE/ACM Design Automation Conference, pages 59–62, 2007.
[61] Jarrod A. Roy and Igor L. Markov. High-performance routing at the nanometer scale. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27:1066–1077, 2008.
[62] Rupesh S. Shelar and Marek Patyra. Impact of local interconnects on timing and power in a high performance microprocessor. In IEEE/ACM International Symposium on Physical Design, pages 145–152, 2010.
[63] Hamid Shojaei, Tai-Hsuan Wu, Azadeh Davoodi, and Twan Basten. A pareto-algebraic framework for signal power optimization in global routing. In IEEE/ACM International Symposium on Low Power Electronics and Design, pages 407–412, 2010.
[64] Synopsys, Inc., Mountain View, CA. IC Compiler User Guide: Zroute, 2010.
[65] Tamas Terlaky, Anthony Vannelli, and Hu Zhang. On routing in VLSI design and communication networks. Discrete Applied Mathematics, 156(11):2178–2194, 2008.
[66] Richard W. Thaik, Ngee Lek, and Sung-Mo Kang. A new global router using zero-one integer linear programming techniques for sea-of-gates and custom logic arrays. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11:1479–1494, 1992.
[67] Benjamin S. Ting and Bou Nin Tien. Routing techniques for gate array. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2:301–312, 1983.
[68] Di Wu, Jiang Hu, and Rabi Mahapatra. Coupling aware timing optimization and antenna avoidance in layer assignment. In IEEE/ACM International Symposium on Physical Design, pages 20–27, 2005.
[69] Huaizhi Wu, I-Min Liu, Martin D. F. Wong, and Yusu Wang. Post-placement voltage island generation under performance requirement. In IEEE/ACM International Conference on Computer Aided Design, pages 309–316, 2005.
[70] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. GRIP: scalable 3D global routing using integer programming. In IEEE/ACM Design Automation Conference, pages 320–325, 2009.
[71] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. A parallel integer programming approach to global routing. In IEEE/ACM Design Automation Conference, pages 194–199, 2010.
[72] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. GRIP: Global routing via integer programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(1):72–84, 2011.
[73] Tai-Hsuan Wu, Azadeh Davoodi, and Jeffrey T. Linderoth. Power-driven global routing for multi-supply voltage domains. In IEEE/ACM Design, Automation and Test in Europe, pages 443–448, 2011.
[74] Jinjun Xiong and Lei He. Full-chip multilevel routing for power and signal integrity. Integration, 40(3):226–234, 2007.
[75] Yong Xu, Ted K. Ralphs, Laszlo Ladanyi, and Matthew J. Saltzman. Computational experience with a software framework for parallel integer programming. INFORMS Journal on Computing, 21:383–397, 2009.
[76] Yue Xu, Yanheng Zhang, and Chris Chu. Fastroute 4.0: global router with efficient via minimization. In IEEE/ACM Asia and South Pacific Design Automation Conference, pages 576–581, 2009.
[77] Zhen Yang, Anthony Vannelli, and Shawki Areibi. An ILP based hierarchical global routing approach for VLSI ASIC design. Optimization Letters, 1:281–297, 2007.
[78] Ahmed Youssef, Zhen Yang, Mohab Anis, Shawki Areibi, Anthony Vannelli, and Mohamed Elmasry. A power-efficient multipin ILP-based routing technique. IEEE Transactions on Circuits and Systems I: Regular Papers, 57:225–235, 2010.
[79] Hai Zhou and Martin D. F. Wong. Global routing with crosstalk constraints. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18:1683–1688, 1999.