Networks-on-Chips: Theory and Practice

Editors: Fayez Gebali and Haytham Elmiligi

Contents

1 Three-Dimensional Network-on-Chip Architectures

1.1 Introduction

1.2 Related Work

1.3 Alternative Vertical Interconnection Topologies

1.4 Overview of the Exploration Methodology

1.5 Evaluation – Experimental Results

1.5.1 Experimental Setup

1.5.2 Routing Procedure

1.5.3 Impact of Traffic Load

1.5.4 3D NoC Performance under Uniform Traffic

1.5.5 3D NoC Performance under Hotspot Traffic

1.5.6 3D NoC Performance under Transpose Traffic

1.5.7 Energy Dissipation Breakdown

1.5.8 Summary

1.6 Conclusions

Chapter 1

Three-Dimensional Network-on-Chip Architectures

1.1 Introduction

Future integrated systems will contain billions of transistors [51], composing tens to hundreds of IP cores. These IP cores, implementing emerging complex multimedia and network applications, should be able to deliver rich multimedia and networking services. Efficient cooperation among these IP cores (e.g., efficient data transfers) can be achieved through utilization of the available resources.

The design of such complex systems poses several challenges. One challenge is to design an on-chip interconnection network that efficiently connects the IP cores. Another is to derive an application mapping that makes efficient use of the available hardware resources [39, 21]. An architecture that is able to accommodate such a high number of cores, satisfying the need for communication and data transfers, is the Network-on-Chip (NoC) architecture [5, 25]. For these reasons Networks-on-Chip have become a popular choice for designing the on-chip interconnect for Multiprocessor Systems-on-Chip (MPSoCs), and are supported by industry (such as the Æthereal NoC [18] from Philips, the STNoC [55] from STMicroelectronics and an 80-core NoC from Intel [57]). As presented in [43], the key design challenges of emerging NoC designs are a) the communication infrastructure, b) the communication paradigm selection and c) the application mapping optimization.

The type of the IP cores (their characteristics and capabilities) as well as the topology and interconnection scheme play an important role in how efficiently an NoC will perform for a certain application or set of applications. Furthermore, the application features (e.g., data transfers, communication and computation needs) play an equally important role in the overall performance of the NoC system. For this reason, in order to take full advantage of the hardware resources, the NoC architecture should accommodate the applications' needs efficiently, providing an application-specific (or application-domain-specific) architecture. An overview of the cost considerations in the design of NoCs is given in [9].

Up to now NoC designs have been limited to two dimensions. The emerging 3D integration technology, however, exhibits, among others, two major advantages: higher performance and lower energy consumption [7]. A survey of existing 3D fabrication technologies is presented in [8], showing the available interconnection architectures among the layers of 3D integrated circuits and illustrating the main research issues in current and future 3D technologies. Thanks to these advancements in process and integration technology, it is feasible to design and manufacture NoCs that expand to the third dimension (3D NoCs). 3D integration is thus a way to satisfy the demands of emerging systems for scaling, performance and functionality. For example, a considerable reduction in the number and length of global interconnects can be achieved using three-dimensional integration [27].

In this chapter we present an architectural exploration methodology for designing alternative 3D NoC architectures. We define as 3D NoCs those architectures composed of many layers, where each layer is a two-dimensional NoC grid, and where the grids are the same for all the layers and composed of elements of the same types. The main objective of the methodology is to derive heterogeneous 3D NoC topologies with a mix of 2D and 3D routers and vertical link interconnection patterns that perform best for the incoming traffic. The cost factors we consider are (i) energy consumption, (ii) average packet latency and (iii) total switch block area, and we perform comparisons against an NoC in which all the routers are 3D ones. We have employed and extended the Worm Sim NoC simulator [36] so that it can model and simulate these heterogeneous architectures, gathering information on how they perform. The NoC heterogeneity is achieved by using a mix of two- and three-dimensional routers in each layer of the NoC, which implies a “reduced” presence of vertical interconnection links. The design methodology evaluates such heterogeneous topologies, targeting mesh and torus ones, for various inputs and shows which ones handle the corresponding types of traffic best.

The rest of the chapter is organized as follows. In Section 1.2 the related work is described.

In Section 1.3 we present the 3D NoC topologies under consideration, whereas in Section 1.4

the proposed methodology is introduced. In Section 1.5 the simulation process and the

achieved results are presented. Finally, in Section 1.6 the conclusions are drawn and the

future work is outlined.

1.2 Related Work

On-chip interconnection is a widely studied research field; good overviews are given in [16, 14], illustrating the various interconnection schemes available for present ICs and emerging Multiprocessor Systems-on-Chip (MPSoC) architectures. An NoC-based interconnection provides an efficient and scalable infrastructure that can handle the increased communication needs. Lee et al. [31] presented a quantitative evaluation of 2D point-to-point, bus and NoC interconnection approaches. In that work an MPEG-2 implementation exploiting the aforementioned interconnection solutions is studied, and it is shown that the NoC-based solution scales very well in terms of area, performance and power consumption.

In order to support NoC design, a number of simulators have been developed, such as Nostrum [35], Polaris [53], XPipes [13] and Worm Sim [36], using C++ and/or SystemC [44]. To provide adequate input (stimuli) to an NoC design, synthetic traffic is usually used. Several synthetic traffic generators have been proposed [47, 54, 20, 50] that provide adequate inputs to NoC simulators for evaluating and exploring communication infrastructure designs.

In [41] a methodology that is able to synthesize NoC architectures where long-range links

are inserted on top of a mesh network is proposed. In this way the NoC is transformed to

an application specific one, serving best the traffic, but it is limited to two dimensions. Li et

al. [33] presented a mesh-based 3D network-in-memory architecture, using a hybrid NoC/bus

interconnection fabric to accommodate efficiently processors and L2 cache memories in 3D

NoCs. It is demonstrated that by using a 3D L2 memory architecture, better results are achieved compared with the two-dimensional designs.

In the work of Koyanagi et al. [30] a 3D integration technique of vertical stacking and

gluing together of several wafers is presented. Utilizing this technology the authors are able

to increase the wiring connectivity while reducing the number of long interconnections. A

fabricated three-dimensional shared memory is presented in [32]. The memory module has

three layers and can perform wafer stacking using the following technologies: i) formation of

buried interconnection; ii) micro-bumps; iii) wafer thinning; iv) wafer alignment and v) wafer

bonding. Another 3D integration scheme is proposed in [24], where wireless interconnections

are being employed in order to offer connectivity.

In [37] an overview of the available interconnect solutions for Systems-on-Chip (SoC)

is presented. This study includes interconnects for 3D ICs and shows that 3D integration

reduces the length of the longest global interconnects [28] and reduces the total required

wire length and thus the dissipated energy [26].

In the work of Benkart et al. [6] an overview of the 3D chip stacking technology using

through-chip interconnects is presented. In this work the trade-off between the high number

of vertical interconnects versus the circuit density is highlighted, since the through-silicon

vias occupy active chip area. Furthermore, Davis et al. in the work presented in [15] show

an implementation of an FFT in a 3D IC achieving 33% reduction in maximum wire length,

proving that the move to 3D ICs is beneficial. However, they highlight as limiting factors

the heat dissipation and yield.

The placement and routing in 3D integrated circuits is being studied in [2] and a system on

package solution for three-dimensional systems is presented in [34]. A big challenge remains

the heat dissipation of 3D circuits [22]. In order to tackle this several analysis techniques

have been proposed [23, 10, 48]. Another way to approach this issue is to perform thermal-

aware placement and mapping for 3D NoCs, such as the work presented in [3]. Furthermore,

the insertion of thermal vias can lower the chip temperature as illustrated in [19, 12].

In [42] a generalized NoC router model is presented, and then based on that the authors

perform NoC performance analysis. Using the aforementioned router model it is feasible

to perform NoC evaluation which is significantly faster than performing simulation. Ad-

ditionally, Pande et al. [45] present an evaluation methodology in order to compare the

performance and other metrics of a variety of NoC architectures. But, this comparison is

made only amongst two dimensional NoC architectures. The work of Feero and Pande, pre-

sented in [17], extends the aforementioned work considering 3D NoCs and illustrates that

the 3D NoCs are advantageous when compared to 2D ones (with both having the same

number of components in total). It is demonstrated that besides reducing the footprint in a

fabricated design, three-dimensional network structures provide better performance compared to traditional, two-dimensional architectures. This work shows that, despite a small area penalty, 3D NoCs achieve significant gains in terms of energy, latency and throughput.

In [46] Pavlidis and Friedman presented and evaluated various 3D NoC topologies, proposing an analytic model for 3D NoCs. Mesh topologies are considered and their zero-load latency is modeled. The authors assumed 100% vertical interconnection vias and focused on the physical-level study of these silicon vias. Kim et al. [29] presented an exploration of communication architectures for 3D NoCs. A dimensionally-decomposed router is presented and compared with a hop-by-hop router connection and a hybrid NoC-bus architecture. The aforementioned works, which address the physical level and additional communication architectures such as a full 3D crossbar and bus-based communication, are complementary to the one presented here and can be used to extend the methodology.

The main differentiator from the related work is that we do not assume full vertical interconnection (as shown in Figure 1.1), but a heterogeneous interconnection fabric composed of a mix of 3D and 2D routers. Additional motivation for this heterogeneous design is not only the reduced total length of the interconnection network but also the smaller size of the 2D routers compared to the 3D ones [17]. By reducing the number of vertical interconnection links in this way, the fabrication of the design becomes easier and more active chip area is available to the logic and memory blocks. Two-dimensional routers are routers that have connections only with the neighboring routers of the same grid. A 3D router, in contrast, has direct, hop-by-hop connections not only with the neighboring routers belonging to the same grid but also with those belonging to the adjacent layers. This difference between two- and three-dimensional routers for a 3D mesh NoC is illustrated in Figure 1.1, which shows a grid that belongs to a 3D NoC and contains both 2D and 3D routers.

1.3 Alternative Vertical Interconnection Topologies

We present four different groups of interconnection patterns, as well as the ten vertical interconnection patterns used in the context of this work. Consider a 3D NoC where each layer has dimensions x × y and only K% of the routers have connections in the vertical direction as well (these are called 3D routers). The available scenarios for placing these 3D routers on a layer (grid) are:

1. Uniform: distribution of the 3D routers over the different layers. Using this scheme we “spread” the 3D routers along every layer of the 3D NoC. To find the place of each router we proceed as follows:

• first, place the first 3D router at the (0, 0) position of each layer (z = 0,..., number of grids),

• then the four neighboring 2D routers are placed in the positions (x+r+1, y, z), (x-r-1, y, z), (x, y+r+1, z) and (x, y-r-1, z). The r parameter is defined as:

$$r = \left\lfloor \frac{1}{K\%} - 1 \right\rfloor \qquad (1.1)$$

and it represents the number of 2D routers between consecutive 3D ones. This scheme is illustrated in Figure 1.1(b), which shows one layer of a 3D NoC with K = 25%, meaning that r = 3.

2. Center: All the “3D routers” are positioned at the center of each layer, as it is shown

in Figure 1.1(c). Since vertical interconnection links exist only in the center of the

layer, in the outer region of the NoC grid the routers are 2D ones, connecting only to

the neighboring routers of the same grid.

3. Periphery: The 3D routers are positioned at the periphery of each layer (as shown in Figure 1.1(d)), in a sense the opposite of the Center scheme presented above. In this case, the NoC is focused on best serving the communication needs of the outer cores.

4. Full Custom: The position of the 3D routers is fully customized matching perfectly

the needs of the application with the NoC architecture. This solution fits best the

needs of the application, while it minimizes the occupied area by the switching blocks,

by “reducing” the number of vertical links and thus the number of 3D routers. However,

derivation of a full custom solution requires high design-time cost, since this exploration

is going to be performed for every application. Furthermore, this will create a non-

regular design that will not adjust well in a potential change of the functionality, the

number of applications that are going to be executed, etc.
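As a quick check of Equation (1.1) from the Uniform scheme, the following minimal Python sketch evaluates r for a given share of 3D routers. The function name `spacing_r` and the use of exact fractions are our own choices for illustration; they are not part of the chapter's tooling.

```python
from fractions import Fraction
from math import floor


def spacing_r(k_percent):
    """Return r = floor(1/K - 1), with K given as a percentage in (0, 100]."""
    if not 0 < k_percent <= 100:
        raise ValueError("K must be in (0, 100]")
    # Exact rational arithmetic avoids floating-point rounding near integers.
    k = Fraction(k_percent) / 100
    return floor(1 / k - 1)


if __name__ == "__main__":
    # K = 25% is the example used for Figure 1.1(b).
    print(spacing_r(25))
```

With K = 25% the sketch reproduces the value r = 3 quoted for Figure 1.1(b).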

The aforementioned patterns are based on the work on 3D FPGAs in [52]. In order to perform exploration towards full custom interconnection schemes, real applications and/or application traces are needed. In this chapter we have adopted various types of synthetic traffic, so the exploration of full custom interconnection schemes is out of scope. More specifically, we perform exploration towards pattern-based vertical interconnection topologies (categories 1-3). We have considered ten different vertical link interconnection topologies. For each of these topologies the number of 3D routers is given and, in parentheses, the corresponding K percentage, considering a 4 × 4 × 4 NoC architecture.

• Full: where all the routers of the NoC are 3D ones (number of 3D routers: 64 (100%)).


Figure 1.1: Positioning of the vertical interconnection links for each layer of the 3D NoC (each layer is a 6 × 6 grid): (a) full vertical interconnection (100%) for a 3D NoC; (b) uniform distribution of vertical links; (c) positioning of vertical links at the center of the NoC; (d) positioning of the vertical links at the periphery of the NoC; (e) legend: 3D router, 2D router, processing node, interconnection link.

• Uniform based: pattern-based topologies with an r value equal to three (by three pattern, as shown in Figure 1.1(b)), four (by four) and five (by five). The corresponding numbers of 3D routers are 44 (68.75%), 48 (75%) and 52 (81.25%).

• Odd: In this pattern all the routers belonging to the same row are of the same type.

Two adjacent rows never have the same type of router (number of 3D routers: 32 (50%)).

• Edges: Where the center (dimensions x×y) of the 3D NoC has only 2D routers (number

of 3D routers: 48 (75%)).

• Center: Where only the center (dimensions x×y) of the 3D NoC has 3D routers (number

of 3D routers: 16 (25%)).

• Side based: Where a side (e.g., outer row) of each layer has 2D routers. Patterns

evaluated had one (one side), two (two side), or three (three side) sides as “2D only”.

The number of 3D routers for each pattern is 48 (75%), 36 (56.25%) and 24 (37.5%), respectively.
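The router counts quoted in the list above can be cross-checked with a small Python sketch that builds one 4 × 4 layer per pattern and replicates it over four layers. The exact placements are assumptions made for illustration (which rows are "odd", which sides are 2D-only, and that the two 2D-only sides are adjacent); the chapter gives only the totals. The uniform-based patterns are left out because their exact layout is not spelled out here.

```python
def layer_mask(pattern, n=4):
    """Return an n x n grid of booleans; True marks a 3D router."""
    def is_3d(x, y):
        inner = 0 < x < n - 1 and 0 < y < n - 1
        if pattern == "full":
            return True
        if pattern == "center":
            return inner
        if pattern == "edges":
            return not inner
        if pattern == "odd":
            return y % 2 == 1            # assumption: odd rows hold the 3D routers
        if pattern == "one_side":
            return y != 0                # assumption: row 0 is the 2D-only side
        if pattern == "two_side":
            return y != 0 and x != 0     # assumption: two adjacent 2D-only sides
        if pattern == "three_side":
            return y != 0 and x != 0 and x != n - 1
        raise ValueError(f"unknown pattern: {pattern}")
    return [[is_3d(x, y) for x in range(n)] for y in range(n)]


def count_3d(pattern, n=4, layers=4):
    """Number of 3D routers when the same layer mask is used on every layer."""
    per_layer = sum(cell for row in layer_mask(pattern, n) for cell in row)
    return per_layer * layers


for name in ("full", "odd", "edges", "center", "one_side", "two_side", "three_side"):
    total = count_3d(name)
    share = 100 * total / (4 * 4 * 4)
    print(f"{name:10s} {total:2d} 3D routers ({share:.2f}%)")
```

Under these placement assumptions the totals agree with the numbers listed for the Full, Odd, Edges, Center and side-based patterns.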

Each of the aforementioned vertical interconnection schemes has advantages and disad-

vantages and how these schemes perform is based on the behavior of the applications that

are implemented on the NoC. As it is explained in the experimental results (Section 1.5) a

wrong choice may diminish the gains of using a 3D architecture.

1.4 Overview of the Exploration Methodology

An overview of the proposed methodology is shown in Figure 1.2. In order to perform the

exploration for alternative topologies of 3D NoC architectures, we have used as a basis the

Worm Sim NoC Simulator [36] that utilizes wormhole switching [40] (this is the center block

in Figure 1.2).

In order to support 3D architectures / topologies, we have extended this simulator, adapt-

ing the provided routing schemes, and assuming compatibility with the Trident traffic for-

mat [54]. As it is shown in Figure 1.2 now the simulator supports 3D NoC architectures (3D

Mesh and 3D Torus – as shown in Figure 1.3) and vertical link interconnection patterns.


Figure 1.2: An overview of the exploration methodology of alternative topologies for 3D Networks-on-Chip. Blocks shown (marked in the figure as existing tools, extensions, or output of new tools): NoC simulator; 2D NoC architectures (mesh, torus, fat tree, ...); 3D NoC architectures; routing schemes (xy, odd-even, ...); 3D routing (adaptation of existing algorithms); stimuli (synthetic traffic: uniform, transpose, hotspot; real application traffic); vertical link interconnection patterns; metrics (latency; energy: link energy, crossbar energy, router energy, ..., total energy consumption).

Figure 1.3: 3D NoC architectures: (a) Mesh and (b) Torus.

Each of these 3D architectures is composed of many grids, and each grid in turn is composed of tiles that are connected to each other using mesh or torus interconnection networks. Each tile is composed of a processing core and a router. Since we are considering 3D architectures, the router is connected to the six neighboring tiles and to its local processing core via channels, each consisting of two one-directional point-to-point links.
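The difference between mesh and torus connectivity described above can be illustrated with a short Python sketch. It assumes zero-based (x, y, z) coordinates and that every router is a 3D one (full vertical connectivity); the function name `neighbors` is hypothetical and does not come from the simulator.

```python
def neighbors(node, dims, torus=False):
    """List the routers adjacent to `node` in an X x Y x Z mesh or torus."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            coord[axis] += step
            if torus:
                coord[axis] %= dims[axis]        # wrap-around link of the torus
            elif not 0 <= coord[axis] < dims[axis]:
                continue                          # mesh: no link past the border
            result.append(tuple(coord))
    return result


if __name__ == "__main__":
    print(neighbors((0, 0, 0), (4, 4, 4)))              # corner router of a mesh
    print(neighbors((0, 0, 0), (4, 4, 4), torus=True))  # same router in a torus
```

In a mesh only interior routers reach all six neighbors, whereas in a torus every router does, which is the connectivity difference the latency comparison between the two topologies relies on.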

The NoC simulator can be configured using these parameters (as it is shown in Figure 1.2):

1. The NoC architecture (two- or three-dimensional mesh and torus architectures), as well as the specific grid size (x and y parameters) and number of layers (z parameter).

2. The type of input traffic (uniform, transpose or hotspot) as well as how heavy the traffic

load will be.

3. The routing scheme.

4. The vertical link configuration file, which defines where vertical links are present or not.

5. The router model as well as the models used in order to calculate the energy and delay

figures.

The output of the simulation is a log file containing the relevant cost factors we evaluate,

such as overall latency, average latency per packet and the energy breakdown of the NoC,

providing numbers for link energy consumption, crossbar and router energy consumption


etc. From these energy figures we calculate the total energy consumption of the 3D NoCs.

The 3D architectures to be explored may have a mix of two- and three-dimensional routers,

ranging from very few 3D routers to only 3D routers (100% vertical interconnection link

presence). To steer the exploration we rely on the different patterns presented in Section 1.3. The proposed 3D NoCs can be constructed by placing a number of

identical two-dimensional NoCs on individual layers, providing communication by inter-layer

vias among vertically adjacent routers. This means that the position of silicon vias is exactly

the same for each layer. Hence, the router configuration is extended to the third dimension,

while the structure of the individual logic blocks (IP cores) remains unchanged.

1.5 Evaluation – Experimental Results

The main objective of the methodology and the exploration process is to find alternative irregular 3D Network-on-Chip topologies with a mix of two- and three-dimensional routers, exhibiting vertical link interconnection patterns that perform best for the incoming traffic. Our primary cost function is the energy consumption, with the other cost factors being the average packet latency and total switch block area. We compare these patterns against the fully vertically interconnected 3D NoC as well as the 2D one (with all having the same number of nodes).

1.5.1 Experimental Setup

The three-dimensional router uses a 7 × 7 crossbar switch as its switching fabric, whereas the two-dimensional one uses a 5 × 5 crossbar switch. Additionally, each router has a routing table which, based on the source/destination address, decides to which output link the packet should be delivered. The routing table is built using the algorithm described in Figure 1.4.
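To see why replacing 3D routers with 2D ones shrinks the switching fabric, the sketch below counts crossbar crosspoints (49 for a 7 × 7 switch, 25 for a 5 × 5 switch). This is only a rough proxy chosen for illustration; it is not the gate-equivalent area model of [17], and the function name is hypothetical. The router counts used in the example are those of the Full and Center patterns for a 64-node NoC.

```python
def crossbar_crosspoints(num_3d, num_2d):
    """Total crosspoints: 7x7 per 3D router plus 5x5 per 2D router."""
    return num_3d * 7 * 7 + num_2d * 5 * 5


full = crossbar_crosspoints(64, 0)      # Full pattern: all 64 routers are 3D
center = crossbar_crosspoints(16, 48)   # Center pattern: 16 3D and 48 2D routers
print(full, center, f"{100 * (1 - center / full):.1f}% fewer crosspoints")
```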

As an energy model the NoC simulator uses the Ebit model proposed in [58]. We make the assumption (based on the work presented in [49]) that the vertical communication links between the layers are electrically equivalent to horizontal routing tracks of the same length. In this way we consider that the energy consumption of a vertical link between two routers is the same as that of a link between two neighboring routers of the same layer (if they have the same length).

More specifically, since the 3D integration technology, which provides communication among layers using through-silicon vias (TSVs), has not been explored sufficiently yet, careful design of systems that employ such interconnection is required. Due to the large variation of the 3D TSV parameters, such as diameter, length, dielectric thickness, and fill material among alternative process technologies, a wide range of measured resistances, capacitances, and inductances have been reported in the literature. A typical value for the size (diameter) of TSVs is about 4 × 4 µm, with a minimum pitch around 8-10 µm, while their total length starting from plane T1 and terminating on plane T3 is 17.94 µm, implying wafer thinning of planes T2 and T3 to approximately 10-15 µm [38, 56, 1].

The different TSV fabrication processes lead to a high variation in the corresponding electrical characteristics. More specifically, among the state-of-the-art solutions found in the relevant literature, the resistance of a single 3D via varies from 20 mΩ to as high as 600 mΩ [56, 1], with a feasible value (in terms of fabrication) around 30 mΩ. The capacitance of these vias varies in the literature from 40 fF to over 1 pF [4], with a feasible value for fabrication around 180 fF. In the context of this work we assume a resistance of 350 mΩ and a capacitance of 2.5 fF.
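To give a feel for the magnitude of the assumed via parameters, a lumped first-order estimate multiplies the assumed resistance and capacitance. This is our own simplification for illustration: the chapter does not state a delay model for the vias, and the estimate ignores the driver and the load.

```latex
\tau = R\,C
     = 0.35\,\Omega \times 2.5\times10^{-15}\,\mathrm{F}
     = 8.75\times10^{-16}\,\mathrm{s}
     \approx 0.9\,\mathrm{fs}
```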

Using our extended version of the NoC simulator, we have performed simulations involv-

ing a 64-node and a 144-node architecture with 3D mesh and torus topologies with synthetic

traffic patterns. The configuration files describing the corresponding link patterns are sup-

plied to the simulator as an input. The sizes of the 3D NoCs we simulated were 4 × 4 × 4

and 6 × 6 × 4, whereas the equivalent 2D ones were the 8 × 8 and 12 × 12. We have used

three types of input (synthetic traffic) and three traffic loads (heavy, normal and low):

• Uniform: Where we have uniform distribution of the traffic across the NoC with the

nodes receiving approximately the same number of packets.


• Transpose: In this traffic scheme packets originating from node (a, b, c) have as destination the node (X - a, Y - b, Z - c), where X, Y, Z are the dimensions of the NoC.

• Hotspot: Where a minority of the nodes receive more packets (in our case at least 100% more) than the majority of the nodes, which receive packets in a uniform manner. The hotspot nodes in the 2D grids are positioned in the middle of every quadrant, where the size of the quadrant is specified by the dimensions of each layer in the 3D NoC architecture under simulation. In the 3D NoC, by contrast, a hotspot is located in the middle of each layer.
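The transpose mapping in the list above can be sketched in Python as follows. The chapter writes the destination as (X - a, Y - b, Z - c) without stating how nodes are indexed; the sketch assumes zero-based coordinates and therefore uses X - 1 - a (and likewise for the other axes) so that the destination stays inside the grid. That offset and the function name are our assumptions.

```python
def transpose_destination(src, dims):
    """Mirror a zero-based source coordinate through the centre of the grid."""
    return tuple(size - 1 - coord for coord, size in zip(src, dims))


if __name__ == "__main__":
    dims = (4, 4, 4)
    for src in [(0, 0, 0), (1, 2, 3), (3, 3, 3)]:
        print(src, "->", transpose_destination(src, dims))
```

Applying the mapping twice returns the original node, so every node is paired with exactly one communication partner.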

We have used three routing schemes present in Worm Sim [36], and extended them in

order to function in a 3D NoC:

• XYZ-OLD: An extended version of XY routing.

• XYZ: It is based on XY routing, but this variation checks which direction has the lower delay and takes that one.

• ODD-EVEN: This is the odd-even routing scheme presented in [11]. In this scheme the packets take some turns in order to avoid deadlock situations.
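As an illustration of the first scheme in the list above, the sketch below shows plain dimension-ordered routing (x first, then y, then z) on a fully connected 3D mesh. This is how we read "extended version of XY routing"; the actual Worm Sim implementation may differ, and the delay check of the XYZ variant and the odd-even turn rules are not modeled. Function names are hypothetical.

```python
def xyz_next_hop(cur, dst):
    """Return the next router on a dimension-ordered (x, then y, then z) route."""
    for axis in range(3):
        if cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = list(cur)
            nxt[axis] += step
            return tuple(nxt)
    return cur  # already at the destination


def xyz_route(src, dst):
    """Full hop-by-hop path from src to dst, both included."""
    path = [src]
    while path[-1] != dst:
        path.append(xyz_next_hop(path[-1], dst))
    return path


if __name__ == "__main__":
    print(xyz_route((0, 0, 0), (2, 1, 1)))
```

Because each dimension is fully resolved before the next one is touched, the path length always equals the Manhattan distance between source and destination.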

From the simulations performed we have extracted figures regarding the energy consumption (in J) and the average packet latency (in cycles). Additionally, for each vertical interconnection pattern, as well as for the 2D NoC, we have the occupied area of the switching block, based on the gate equivalent of the switching fabric presented in [17]. A good design is one that exhibits lower values in the aforementioned metrics when compared to the 2D NoC as well as to the 3D NoC which has full vertical connectivity (all the routers are 3D ones). Furthermore, all the simulation measurements were taken for the same number of cycles the network was operational (200,000 cycles).

1.5.2 Routing Procedure

Furthermore, we have modified the routing procedure, as shown in Figure 1.4 (valid for all routing schemes), in order to be able to route packets over the 3D topologies. This modification allows the customization of the routing scheme in order to cope efficiently with the heterogeneous topologies, based on vertical link connectivity patterns.

 1: function RoutingXYZ
 2:   src : type Node; // this is the source node
 3:   dst : type Node; // this is the destination node (final)
 4:
 5:   findCoordinates(); // returns src.x, src.y, src.z, dst.x, dst.y and dst.z
 6:
 7:   for all layer ∈ NoC do
 8:     if packet passes from layer then
 9:       findTmpDestination(); // find a temporary destination of the packet for each layer of the NoC that the packet passes from
10:     end if
11:   end for
12:   while tmpDestination NOT dst do // if we have not reached the final destination...
13:     packet.header = tmpDestination;
14:   end while
15: end function

16: function findTmpDestination // for each layer that the packet is going to traverse
17:   tmpDestination.x = dst.x
18:   tmpDestination.y = dst.y
19:   tmpDestination.z = src.z // for xyz routing
20:
21:   for all validNodes ∈ layer do
22:     if link NOT valid then // if vertical link does not exist. This information is obtained through the vertical interconnection patterns input file.
23:       newLink = computeManhattanDistance(); // returns the position of a vertical link with the smallest Manhattan distance
24:       tmpDestination = newLink;
25:     else
26:       tmpDestination = link;
27:     end if
28:   end for
29: end function

Figure 1.4: Routing algorithm modifications. (// denotes a comment in the algorithm.)

The steps of the routing algorithm are:

1. For each packet we know the source and destination nodes (lines 2 and 3) and we can find the positions of these nodes in the topology. The on-chip “coordinates” of the destination node are dst.x, dst.y, dst.z and those of the source node are src.x, src.y, src.z (line 5).

2. By doing so we can formulate the temporary destinations, that is, one temporary destination per layer. More specifically, for the number of layers a packet has to traverse in order to arrive at its final destination, the algorithm initially sets the route to a temporary destination located at position dst.x, dst.y, src.z (lines 17-19). The algorithm takes into consideration the “direction” the packet is going to follow across the layers (i.e., whether it is going to an upper or a lower layer relative to its “source” layer) and finds the nearest valid link at each layer. As an outcome of this process, the z coefficient of the temporary destination’s position is properly updated (lines ). A valid link is any vertical interconnection link available in the layer that the packet traverses. This information is obtained through the vertical interconnection patterns file. A link is uniquely identified by the node it is connected to and by its direction. So, for each of the specified valid links located on the same layer as the header flit of the packet, the algorithm checks whether it matches the up/down link desired for the route to the destination.

3. If there is no match between them, compute the Manhattan distance (line 23) (in the

case of 3D torus topology we have modified it in order to produce the correct Manhattan

distance between the two nodes).

4. Finally, the valid link with the smallest Manhattan distance is chosen and its corre-

sponding node is chosen to be the temporary destination at each layer the packet is

going to traverse (lines 24-26).

5. After finding a set of temporary destinations (each one located at a different layer), they are stored into the header flit of the packet (line 13). These temporary destinations may or may not be used as the packet is being routed during the simulation, so they are “candidate” temporary destinations. Whether a temporary destination is just a candidate or the actual destination per layer is decided either according to a set of vertical links that exhibited relatively high utilization during a previous simulation with the same network parameters (given a desired minimum link communication volume), or according to a given vertical link pattern as presented in Section 1.3.

Since the modification of the algorithm is composed of a check if a vertical link exists in

the temporary destination of the packet, and if not find the closest router with such a link,

we manage to keep the routing complexity low.
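The per-layer check described above (use the vertical link at the temporary destination if it exists, otherwise fall back to the closest router that has one) can be sketched in Python as follows. The set `vertical_links` is a hypothetical stand-in for the vertical interconnection patterns file, the lexicographic tie-break is our own choice, and the torus distance is one plausible reading of the modified Manhattan distance mentioned in step 3.

```python
def manhattan(a, b, dims=None):
    """Manhattan distance within a layer; with dims given, allow torus wrap-around."""
    total = 0
    for i, (p, q) in enumerate(zip(a, b)):
        d = abs(p - q)
        if dims is not None:
            d = min(d, dims[i] - d)   # take the shorter way around the ring
        total += d
    return total


def find_tmp_destination(dst_xy, vertical_links, dims=None):
    """Pick the (x, y) position in the current layer where the packet changes layer.

    vertical_links: set of (x, y) positions owning a vertical link in the
    required direction (assumed input, standing in for the pattern file).
    """
    if dst_xy in vertical_links:
        return dst_xy                 # a vertical link exists right at (dst.x, dst.y)
    if not vertical_links:
        raise ValueError("layer has no usable vertical link")
    # sorted() makes the tie-break deterministic (lexicographically smallest wins).
    return min(sorted(vertical_links), key=lambda pos: manhattan(pos, dst_xy, dims))


if __name__ == "__main__":
    links = {(0, 0), (2, 1)}
    print(find_tmp_destination((3, 3), links))               # mesh distance
    print(find_tmp_destination((3, 3), links, dims=(4, 4)))  # torus distance
```

The two calls differ because the wrap-around links of the torus bring position (0, 0) closer to (3, 3) than it is in the mesh.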


Figure 1.5: Impact of traffic load on 2D and 3D NoCs (for all different types of traffic used). Plot: latency behavior for 64-node NoCs (torus topology, xyz routing); vertical axis: normalized latency (90% to 160%); series: hotspot, transpose and uniform traffic, each under heavy, normal and low load.

1.5.3 Impact of Traffic Load

On top of the traffic schemes, three different traffic loads were used (heavy, medium/normal and low), so that the performance of the NoC can be tested by altering the packet generation rate. Compared to the medium load, the heavy load has 50% more traffic, whereas the low load has 90% less. The behavior of the NoCs in terms of average packet latency is shown in Figure 1.5. In this figure the latency is normalized, for each traffic scheme, to the average packet latency of the full connectivity 3D NoC under medium load. The figure shows the impact of the traffic load (latency increases as the load increases), that the NoCs can cope with the increased traffic, and the differences between the traffic schemes.
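The load scaling and the normalization described above amount to the following arithmetic. This is a small sketch: the function names and the example numbers in the usage note are illustrative assumptions, and only the scaling factors (+50%, -90%) come from the text.

```python
# Packet generation rate relative to the medium load, as stated in the text:
# heavy = 50% more traffic, low = 90% less traffic.
LOAD_FACTORS = {"heavy": 1.5, "medium": 1.0, "low": 0.1}


def packet_generation_rate(medium_rate: float, load: str) -> float:
    """Scale the medium-load packet generation rate to the requested load."""
    return medium_rate * LOAD_FACTORS[load]


def normalized_latency(latency: float, baseline_latency: float) -> float:
    """Latency in percent of the baseline (full connectivity 3D NoC, medium load)."""
    return 100.0 * latency / baseline_latency
```

For example, with a hypothetical medium rate of 0.02 packets per cycle per node, the heavy load corresponds to 0.03 and the low load to 0.002. A configuration measuring 45 cycles against a 30-cycle baseline would be plotted at 150%.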

[Figure: Latency behavior for 64-node mesh and torus NoCs (uniform traffic, xyz routing). Y-axis: normalized latency, 50% to 400%. Series: mesh and torus, each under heavy, medium and low uniform load.]

Figure 1.6: Impact of traffic load on 2D and 3D mesh and torus NoCs (for uniform traffic).

Mesh topologies exhibit similar behavior, though the latency figures are higher due to the decreased connectivity when compared to torus topologies. This is shown in Figure 1.6, where the latencies of 64-node mesh and torus NoCs are compared (the basis for the latency normalization is the average packet latency of the full connectivity 3D torus). The comparison shows that the mesh topologies have a 34% higher packet latency than the torus ones (for the same traffic scheme, load and routing algorithm).
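The connectivity argument can be made concrete by comparing average hop counts. The sketch below is only a rough proxy: it assumes minimal routing, counts hops rather than cycles, ignores contention, and assumes wrap-around links in all three dimensions for the torus. The name `average_hops` is hypothetical.

```python
from itertools import product
from typing import Tuple


def average_hops(dims: Tuple[int, ...], wraparound: bool) -> float:
    """Average minimal hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(*(range(n) for n in dims)))
    total_hops = 0
    pairs = 0
    for src in nodes:
        for dst in nodes:
            if src == dst:
                continue
            for s, d, n in zip(src, dst, dims):
                step = abs(s - d)
                # A wrap-around link lets the packet take the shorter way round.
                total_hops += min(step, n - step) if wraparound else step
            pairs += 1
    return total_hops / pairs


mesh_hops = average_hops((4, 4, 4), wraparound=False)
torus_hops = average_hops((4, 4, 4), wraparound=True)
print(f"4x4x4 mesh: {mesh_hops:.2f} hops, torus: {torus_hops:.2f} hops")
```

The mesh needs more hops on average than the torus of the same size, which points in the same direction as the measured latency gap. The 34% figure itself comes from the simulations and also reflects buffering and contention, which this proxy does not model.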

1.5.4 3D NoC Performance under Uniform Traffic

Figure 1.7 presents the results of employing non-full vertical link connectivity in 3D mesh networks under uniform traffic, medium load and xyz-old routing. We compare the total energy consumption, the average packet latency, the total area of the switching blocks (routers) and the percentage of 2D routers (having 5 I/O ports instead of 7) for the 4 × 4 × 4 (Figure 1.7(a)) and 6 × 6 × 4 (Figure 1.7(b)) mesh architectures. The x-axis lists all the interconnection patterns. The y-axis shows, normalized to the figures of the fully vertically interconnected 3D NoC, the cost factors: total energy consumption, average packet latency, total switching block area and percentage of vertical links.

The advantages of 3D NoCs over 2D ones are shown in Figure 1.7(a). In this case the 8 × 8 mesh dissipates 39% more energy and has 29% higher packet delivery latency. However, its switching area is 71% of the area of the fully interconnected 3D NoC, since all its routers are 2D ones. Employing the by five link pattern results in a 3% reduction in energy and a 5% increase in latency. In this pattern only 81% of the routers are 3D ones, so the area of the switching logic is reduced by 5%. Moving to bigger dimensions, as can be seen from Figure 1.7(b), more patterns exhibit better results. It is worth noticing that the overall performance of the two-dimensional NoC decreases significantly, exhibiting around a 50% increase in energy and latency.
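The area figures quoted above are consistent with a simple port-count proxy, sketched below. The 5 and 7 I/O ports per router come from the text. Treating switching area as proportional to the total number of ports is an assumption made here for illustration, not necessarily the area model used by the authors, and the function names are hypothetical.

```python
PORTS_3D = 7  # I/O ports of a router with up and down links
PORTS_2D = 5  # I/O ports of a router without vertical links


def total_ports(n_routers: int, n_3d_routers: int) -> int:
    """Total number of router ports for a given mix of 2D and 3D routers."""
    if not 0 <= n_3d_routers <= n_routers:
        raise ValueError("n_3d_routers must be between 0 and n_routers")
    return n_3d_routers * PORTS_3D + (n_routers - n_3d_routers) * PORTS_2D


def relative_area(n_routers: int, n_3d_routers: int) -> float:
    """Port count relative to a NoC of the same size built only from 3D routers."""
    return total_ports(n_routers, n_3d_routers) / total_ports(n_routers, n_routers)


print(round(relative_area(64, 0), 2))   # 64 routers, all of them 2D
print(round(relative_area(64, 52), 2))  # 52 of 64 routers (about 81%) are 3D
```

Under this assumption a 64-node NoC with only 2D routers comes out at about 71% of the fully interconnected 3D one, and a pattern with 52 of 64 routers being 3D at about 95%, in line with the 71% and 5% figures above.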

When we increase the traffic load by raising the packet generation rate by 50%, all patterns behave worse than the full connectivity 3D NoC. The reason is that a pattern-based 3D NoC has fewer 3D routers and therefore fewer vertical links, which reduces the connectivity within the NoC. As expected, this reduced connectivity has a negative impact when traffic is increased.

When the traffic load in the NoC is low, the patterns can become beneficial, since the demand for communication resources is not that high. This effect is illustrated in Figure 1.8, which presents the experimental results for 64- and 144-node 2D and 3D NoCs under low uniform traffic and xyz routing. The exception is the edges pattern in the 64-node 3D NoC (Figure 1.8(a)), where all the 3D routers reside at the edges of each layer of the 3D NoC. This results in a 7% increase in packet latency. Again, it is worth noticing that as the NoC dimensions increase, the performance of the 2D NoC decreases. This can be clearly seen in Figure 1.8(b), where the 2D NoC has 38% higher energy dissipation and 37% higher packet latency.

[Figure: 64-node 2D and 3D NoCs (uniform traffic, medium load, xyz-old routing). Y-axis: normalized values, 0% to 140%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

(a) Experimental results for a 4 × 4 × 4 3D Mesh.

[Figure: 144-node 2D and 3D NoCs (uniform traffic, medium load, xyz-old routing). Y-axis: normalized values, 0% to 160%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

(b) Experimental results for a 6 × 6 × 4 3D Mesh.

Figure 1.7: Uniform traffic on a 3D NoC for alternative interconnection topologies.

[Figure: 64-node 2D and 3D NoCs (uniform traffic, low load, xyz routing). Y-axis: normalized values, 0% to 140%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

[Figure: 144-node 2D and 3D NoCs (uniform traffic, low load, xyz routing). Y-axis: normalized values, 0% to 140%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

Figure 1.8: Uniform traffic on a 3D NoC for alternative interconnection topologies.

We have also compared the performance of the proposed approach against that achievable with a torus network, which provides wrap-around links added in a systematic manner. Note that the vertical links connecting the bottom with the upper layers are not removed, as this is the additional feature of the torus topology when compared to the mesh. Our simulations show that under the transpose traffic scheme the vertical link patterns exhibit notable results, and that these improve as the dimensions of the NoC grow. The explanation is that the flow of packets between a source and a destination follows a diagonal course among the nodes at each layer, which also holds for source-destination pairs in 3D topologies. This is where the wrap-around links of the torus topology play a significant role in preserving performance even when some vertical links are removed. The results also show that the bigger the dimensions of the NoC, the bigger the energy savings when the link patterns are applied. This is not true, however, for the mesh topology. In particular, in the 6 × 6 × 4 3D torus architecture, the by five, by four, by three, one side and two side patterns show better results as far as energy consumption is concerned. For instance, the two side pattern exhibits 7.5% energy savings with a latency of 32.84 cycles, compared to the 30 cycles of the fully vertically connected 3D torus topology.
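For readers unfamiliar with transpose traffic, one common textbook definition swaps the two halves of the node address bits. The sketch below implements that definition. Whether the simulator used in this chapter follows exactly this mapping is not stated, so it should be read as an assumption, as should the node numbering `index = x + 4y + 16z` and both function names. The bit-based form only applies when the node count is an even power of two (64 nodes here, not 144).

```python
from typing import Tuple


def transpose_destination(src: int, n_nodes: int) -> int:
    """Swap the upper and lower halves of the address bits of `src`."""
    bits = n_nodes.bit_length() - 1
    if 1 << bits != n_nodes or bits % 2:
        raise ValueError("n_nodes must be an even power of two, e.g. 16 or 64")
    half = bits // 2
    low = src & ((1 << half) - 1)  # lower half of the address
    high = src >> half             # upper half of the address
    return (low << half) | high


def to_coords(index: int, dims: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Map a node index to (x, y, z), assuming index = x + X*y + X*Y*z."""
    x_size, y_size, _ = dims
    return (index % x_size, (index // x_size) % y_size, index // (x_size * y_size))


for src in (1, 2, 21):
    dst = transpose_destination(src, 64)
    print(to_coords(src, (4, 4, 4)), "->", to_coords(dst, (4, 4, 4)))
```

Under this numbering, part of the traffic crosses layers (for example, node 2 in the bottom layer sends to a node one layer up), which is why the placement of the vertical links matters for this traffic scheme.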

1.5.5 3D NoC Performance under Hotspot Traffic

In the case of hotspot traffic (Figure 1.9), for the 4 × 4 × 4 3D mesh architecture, seven out of nine link patterns perform better than the fully vertically connected topology. For instance, the two side pattern exhibits a 2% decrease in network energy consumption, whereas the increase in latency is 2.5 cycles; note that only 56.25% of the vertical links are present. Hotspot traffic in 3D mesh topologies favors cube topologies (for example 6 × 6 × 6). Even so, in the 6 × 6 × 4 mesh architecture the center and two side patterns exhibit performance similar to that of the fully vertically connected architecture in terms of average cycles per packet (this was expected due to the location where the hotspot nodes were positioned).
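A hotspot traffic generator can be sketched as below. The hotspot positions and the share of traffic directed to them are not given in this section, so `hotspots` and `hotspot_fraction` are illustrative parameters, and `hotspot_destination` is a hypothetical name rather than a function of the simulator.

```python
import random
from typing import Sequence


def hotspot_destination(src: int, n_nodes: int, hotspots: Sequence[int],
                        hotspot_fraction: float, rng: random.Random) -> int:
    """Pick a destination: a hotspot with the given probability, else any node."""
    candidates = []
    if rng.random() < hotspot_fraction:
        candidates = [h for h in hotspots if h != src]
    if not candidates:
        # Uniform fallback; a packet is never sent to its own source.
        candidates = [n for n in range(n_nodes) if n != src]
    return rng.choice(candidates)


rng = random.Random(0)  # fixed seed so the run is repeatable
center_nodes = [21, 22, 41, 42]  # hypothetical hotspot positions in a 64-node NoC
sample = [hotspot_destination(0, 64, center_nodes, 0.5, rng) for _ in range(1000)]
share = sum(d in center_nodes for d in sample) / len(sample)
print(f"share of packets sent to hotspots: {share:.2f}")
```

When the hotspot nodes sit close to the routers that keep their vertical links, as with the center pattern mentioned above, removing the other vertical links costs little latency.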

Figure 1.10 presents the simulation results for the two 3D NoC architectures under hotspot-type traffic. Figure 1.10(a) shows the results for the mesh architecture and Figure 1.10(b) those for the torus architecture; both show gains in energy consumption and area, with a negligible penalty in latency. Again, the architectures where congestion is experienced are highlighted.


[Figure: 64-node 2D and 3D NoCs (hotspot traffic, low load, xyz routing). Y-axis: normalized values, 0% to 140%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

[Figure: 144-node 2D and 3D NoCs (hotspot traffic, low load, xyz routing). Y-axis: normalized values, 0% to 160%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

Figure 1.9: Hotspot traffic on a 3D NoC for alternative interconnection topologies.

[Figure: 64-node 2D and 3D NoCs (hotspot traffic, medium load, odd-even routing). Y-axis: normalized values, 0% to 160%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

[Figure: 64-node 2D and 3D NoCs (hotspot traffic, medium load, xyz-old routing). Y-axis: normalized values, 0% to 140%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

(b) Experimental results for a 4 × 4 × 4 3D Torus.

Figure 1.10: Hotspot traffic on a 3D NoC for alternative interconnection topologies.


These results are also compared to their equivalent 2D architectures. The 8 × 8 2D NoC (same number of cores as the 4 × 4 × 4 architecture) shows 25% higher latency and 40% higher energy compared to the one side link pattern, whereas the 12 × 12 mesh (same number of cores as the 6 × 6 × 4 architecture) shows a 46% increase in latency and a 49% increase in energy consumption compared to the same pattern under uniform traffic. In addition, the by four pattern on the 64-node architecture under transpose traffic shows 31% lower latency and 18% lower total network energy consumption, respectively. In the case of hotspot traffic with the two side link pattern, these numbers change to 24% lower latency and 56% lower energy consumption.

1.5.6 3D NoC Performance under Transpose Traffic

Under the transpose traffic scheme, adopting the by four link pattern yields a 6.5% decrease in total network energy consumption at the expense of three cycles of additional latency. Figure 1.11 presents the simulation results for the 3D 4 × 4 × 4 mesh and 6 × 6 × 4 torus NoCs under transpose traffic. From Figure 1.11(a) we can see a 4% gain in the energy consumption of the 3D NoCs with a 5% increase in packet latency. Additionally, we gain 6% in the area occupied by the switching blocks of the NoC. Compared to the 2D NoC with the same number of nodes, these patterns achieve on average a 14% decrease in energy consumption and a 33% decrease in total packet latency, but the area cost of the 3D NoC is 23% higher. From Figure 1.11(b) we can see that the 2D NoC experiences traffic contention and is not able to cope with that amount of traffic (the actual value of the latency is close to 5000 cycles per packet). Additionally, gains of 47% are achieved in energy consumption. When this torus architecture is compared to the “full” 3D one, it shows 5% gains in energy consumption with an 8% increase in latency and a 9% reduction in switching block area.


[Figure: 64-node 2D and 3D NoCs (transpose traffic, medium load, xyz-old routing). Y-axis: normalized values, 0% to 300%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

[Figure: 144-node 2D and 3D NoCs (transpose traffic, medium load, xyz-old routing). Y-axis: normalized values, 0% to 200%. Series: Norm. Energy, Norm. Latency, Norm. Area, #Links.]

(b) Experimental results for a 6 × 6 × 4 3D Torus.

Figure 1.11: Transpose traffic on a 3D NoC for alternative interconnection topologies.

[Figure: Energy Dissipation Breakdown. Y-axis: 0% to 180%. X-axis components: Link, Crossbar, Router, Arbiter, BufferRead, BufferWrite. Series: 8x8 / mesh, by_five, by_four, by_three, center, edges, odd, one_side, three_side, two_side, full_connectivity.]

Figure 1.12: An overview of the energy breakdown in a 3D NoC (4 × 4 × 4 3D mesh, uniform traffic, xyz-old routing).

1.5.7 Energy Dissipation Breakdown

The analytical results derived from the Ebit energy model [58] show that the energy consumed by the links, the crossbar, the arbiter and the buffer reads decreases, in exchange for an increase in the energy consumed when writing to the buffer and by the router’s routing engine. On average, the link energy accounts for 8%, the crossbar energy for 6%, the buffer read energy for 23% and the buffer write energy for 62% of the total energy. The normalized energy consumption results for uniform traffic on a 4 × 4 × 4 NoC are presented in Figure 1.12.
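The average shares quoted above can be turned into a per-component estimate as in the sketch below. The four shares come from the text. Assigning the remaining 1% to the arbiter and the routing engine together (labelled `other`) is an inference from the fact that the four shares sum to 99%, and `energy_breakdown` is a hypothetical helper name.

```python
# Average shares of the total network energy, as reported in the text.
AVERAGE_SHARES = {
    "link": 0.08,
    "crossbar": 0.06,
    "buffer_read": 0.23,
    "buffer_write": 0.62,
}


def energy_breakdown(total_energy: float) -> dict:
    """Split a total energy figure into components using the average shares."""
    parts = {name: share * total_energy for name, share in AVERAGE_SHARES.items()}
    # Whatever is left over is attributed to the remaining router components.
    parts["other"] = total_energy - sum(parts.values())
    return parts


for name, value in energy_breakdown(200.0).items():
    print(f"{name:>12}: {value:6.1f}")
```

The split makes it visible that buffer accesses dominate: reads and writes together account for 85% of the total, so changes that affect buffering weigh far more than changes to the link energy alone.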

1.5.8 Summary

A summary of the experimental results is presented in Table 1.1, where the energy and latency values obtained are compared to those of the fully vertically interconnected 3D mesh NoC. The three types of traffic are shown in the first column. The next two columns present the range (min. to max. values, in %) of the energy dissipation. The fourth and fifth columns show the min-max values of the average packet latency.

Table 1.1: Experimental results: min-max impact on costs (energy and latency) with medium traffic load.

Traffic pattern     Normalized energy      Normalized latency
                    min       max          min       max
Uniform             92%       108%         98%       113%
Transpose           88%       116%         100%      354%
Hotspot             71%       116%         100%      134%

As can be seen, an energy reduction of up to 29% can be achieved. However, gains in energy dissipation cannot be reached without paying a penalty in average packet latency. It is the responsibility of the designer, utilizing this exploration methodology, to choose the 3D NoC topology and vertical interconnection pattern that best meet the requirements of the system.
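The designer's choice described above can be framed as a small constrained selection: among the patterns whose latency stays within a budget, take the one with the lowest energy. The sketch below is illustrative only. The function name is hypothetical and the per-pattern numbers are made-up placeholders chosen inside the uniform-traffic ranges of Table 1.1, not measured results.

```python
from typing import Dict, Optional, Tuple

# (normalized energy %, normalized latency %) per candidate; placeholder values.
Results = Dict[str, Tuple[float, float]]


def pick_pattern(results: Results, max_latency: float) -> Optional[str]:
    """Lowest-energy pattern whose normalized latency is within the budget."""
    feasible = {name: cost for name, cost in results.items() if cost[1] <= max_latency}
    if not feasible:
        return None
    # Tuples compare by energy first, then by latency as a tie-breaker.
    return min(feasible, key=lambda name: feasible[name])


example: Results = {
    "full_connectivity": (100.0, 100.0),
    "pattern_a": (92.0, 113.0),  # hypothetical: best energy, worst latency
    "pattern_b": (97.0, 102.0),  # hypothetical: small saving, small penalty
}
print(pick_pattern(example, max_latency=105.0))
print(pick_pattern(example, max_latency=120.0))
```

A tight latency budget selects the modest pattern, while a relaxed one allows the more aggressive saving. In the same normalized terms, the best case in Table 1.1 (71% energy under hotspot traffic) corresponds to the 29% reduction quoted above.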

1.6 Conclusions

Networks-on-Chip are becoming more and more popular as a solution able to accommodate large numbers of IP cores, offering an efficient and scalable interconnection network. Three-dimensional NoCs take advantage of the progress in integration and packaging technologies, offering advantages when compared to 2D ones. Existing 3D NoCs assume that every router of a grid can communicate directly with the neighboring routers of the same grid and with those of the adjacent layers. This communication can be achieved by employing wire bonding, microbumps or through-silicon vias [15].

All of these technologies have their advantages and disadvantages. Reducing the number of vertical connections makes the design and final fabrication of 3D systems easier. The goal of the proposed methodology is to find heterogeneous 3D NoC topologies, with a mix of 2D and 3D routers and vertical link interconnection patterns, that perform best for the incoming traffic. In this way the exploration process evaluates the incoming traffic and the interconnection network, proposing an alternative 3D NoC specific to the incoming traffic. To that end, we have presented a methodology showing that by employing an alternative 3D NoC vertical link interconnection network, in essence a NoC with fewer vertical links, we can achieve gains in energy consumption (up to 29%), in average packet latency (up to 2%) and in the area occupied by the routers of the NoC (up to 18%).

Extensions of this work could include not only more heterogeneous 3D architectures but also different router architectures, better adaptive routing algorithms and further customizations targeting heterogeneous NoC architectures. In this way it would be possible to create even more heterogeneous 3D NoCs. For providing stimuli to the NoCs, a move towards real applications would be useful, in addition to using more types of synthetic traffic. By doing so, it would become feasible to propose application-domain-specific 3D NoC architectures.

Acknowledgments

The authors would like to thank Dr. Antonis Papanikolaou (IMEC vzw., Belgium) for his helpful comments and suggestions. This research is supported by the 03ED593 research project, implemented within the framework of the “Reinforcement Program of Human Research Manpower” (PENED) and co-financed by National and Community Funds (75% from E.U.-European Social Fund and 25% from the Greek Ministry of Development - General Secretariat of Research and Technology).

References

[1] A. Topol, D. La Tulipe, L. Shi, D. Frank, K. Bernstein, S. Steen, A. Kumar, G. Singco, A. Young, K. Guarini, and M. Ieong. Techniques for producing 3D ICs with high-density interconnect. In VLSI Multi-Level Interconnection Conference, 2004.

[2] Cristinel Ababei, Yan Feng, Brent Goplen, Hushrav Mogal, Tianpei Zhang, Kia Bazargan, and Sachin Sapatnekar. Placement and routing in 3D integrated circuits. IEEE Des. Test, 22(6):520–531, 2005.

[3] C. Addo-Quaye. Thermal-aware mapping and placement for 3-D NoC designs. In Proc. of IEEE SOC, pages 25–28, 2005.

[4] Syed M. Alam, Robert E. Jones, Shahid Rauf, and Ritwik Chatterjee. Inter-strata connection characteristics and signal transmission in three-dimensional (3D) integration technology. In ISQED ’07: Proceedings of the 8th International Symposium on Quality Electronic Design, pages 580–585, Washington, DC, USA, 2007. IEEE Computer Society.

[5] L. Benini and G. de Micheli. Networks on chips: a new SoC paradigm. Computer, 35(1):70–78, 2002.

[6] Peter Benkart, Alexander Kaiser, Andreas Munding, Markus Bschorr, Hans-Joerg Pfleiderer, Erhard Kohn, Arne Heittmann, Holger Huebner, and Ulrich Ramacher. 3D chip stack technology using through-chip interconnects. IEEE Des. Test, 22(6):512–518, 2005.

[7] E. Beyne. 3D system integration technologies. In International Symposium on VLSI Technology, Systems, and Applications, pages 1–9, April 2006.

[8] E. Beyne. The rise of the 3rd dimension for system integration. In Proc. of International Interconnect Technology Conference, pages 1–5, June 5-7, 2006.

[9] Evgeny Bolotin, Israel Cidon, Ran Ginosar, and Avinoam Kolodny. Cost considerations in network on chip. Integr. VLSI J., 38(1):19–42, 2004.

[10] Ting-Yen Chiang, S.J. Souri, Chi On Chui, and K.C. Saraswat. Thermal analysis of heterogeneous 3D ICs with various integration scenarios. In Proc. of International Electron Devices Meeting, 2001.

[11] Ge-Ming Chiu. The odd-even turn model for adaptive routing. IEEE Trans. Parallel Distrib. Syst., 11(7):729–738, 2000.

[12] J. Cong and Yan Zhang. Thermal via planning for 3-D ICs. In Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design, pages 745–752, Washington, DC, USA, 2005. IEEE Computer Society.

[13] Matteo Dall’Osso, Gianluca Biccari, Luca Giovannini, Davide Bertozzi, and Luca Benini. xPipes: a latency insensitive parameterized network-on-chip architecture for multi-processor SoCs. In Proc. of ICCD. IEEE Computer Society, 2003.

[14] William Dally and Brian Towles. Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers Inc., 2003.

[15] W. Rhett Davis, John Wilson, Stephen Mick, Jian Xu, Hao Hua, Christopher Mineo, Ambarish M. Sule, Michael Steer, and Paul D. Franzon. Demystifying 3D ICs: The pros and cons of going vertical. IEEE Des. Test, 22(6):498–510, 2005.

[16] Jose Duato, Sudhakar Yalamanchili, and Lionel Ni. Interconnection Networks: An Engineering Approach. Morgan Kaufmann Publishers Inc., 2002.

[17] Brett Feero and Partha Pratim Pande. Performance evaluation for three-dimensional networks-on-chip. In Proc. of ISVLSI, pages 305–310, 2007.

[18] Kees Goossens, John Dielissen, and Andrei Radulescu. Æthereal network on chip: Concepts, architectures, and implementations. IEEE Des. Test, 22(5):414–421, 2005.

[19] Brent Goplen and Sachin Sapatnekar. Thermal via placement in 3D ICs. In Proceedings of the 2005 international symposium on Physical design, pages 167–174. ACM, 2005.

[20] Wim Heirman, Joni Dambre, and Jan Van Campenhout. Synthetic traffic generation as a tool for dynamic interconnect evaluation. In Proc. of SLIP, pages 65–72. ACM Press, 2007.

[21] Jingcao Hu and R. Marculescu. Energy- and performance-aware mapping for regular NoC architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(4):551–562, 2005.

[22] Hao Hua, Chris Mineo, Kory Schoenfliess, Ambarish Sule, Samson Melamed, Ravi Jenkal, and W. Rhett Davis. Exploring compromises among timing, power and temperature in three-dimensional integrated circuits. In Proceedings of the 43rd annual conference on Design automation, pages 997–1002. ACM, 2006.

[23] Sungjun Im and K. Banerjee. Full chip thermal analysis of planar (2-D) and vertically integrated (3-D) high performance ICs. In International Electron Devices Meeting, IEDM Technical Digest., pages 727–730, 2000.

[24] A. Iwata, M. Sasaki, T. Kikkawa, S. Kameda, H. Ando, K. Kimoto, D. Arizono, and H. Sunami. A 3D integration scheme utilizing wireless interconnections for implementing hyper brains. 2005.


[25] Axel Jantsch and Hannu Tenhunen, editors. Networks on chip. Kluwer Academic Publishers, 2003.

[26] JW Joyner and JD Meindl. Opportunities for reduced power dissipation using three-dimensional integration. In Proceedings of the IEEE 2002 International Interconnect Technology Conference, pages 148–150. IEEE, 2002.

[27] J.W. Joyner, R. Venkatesan, P. Zarkesh-Ha, J.A. Davis, and J.D. Meindl. Impact of three-dimensional architectures on interconnects in gigascale integration. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 9(6):922–928, Dec. 2001.

[28] JW Joyner, P. Zarkesh-Ha, JA Davis, and JD Meindl. A three-dimensional stochastic wire-length distribution for variable separation of strata. In Proceedings of the IEEE 2000 International Interconnect Technology Conference, pages 126–128. IEEE, 2000.

[29] Jongman Kim, Chrysostomos Nicopoulos, Dongkook Park, Reetuparna Das, Yuan Xie, Vijaykrishnan Narayanan, Mazin S. Yousif, and Chita R. Das. A novel dimensionally-decomposed router for on-chip communication in 3D architectures. In Proc. of ISCA, pages 138–149. ACM Press, 2007.

[30] Mitsumasa Koyanagi, Hiroyuki Kurino, Kang Wook Lee, Katsuyuki Sakuma, Nobuaki Miyakawa, and Hikotaro Itani. Future system-on-silicon LSI chips. IEEE Micro, 18(4):17–22, 1998.

[31] Hyung Gyu Lee, Naehyuck Chang, Umit Y. Ogras, and Radu Marculescu. On-chip communication architecture exploration: A quantitative evaluation of point-to-point, bus, and network-on-chip approaches. ACM Trans. Des. Autom. Electron. Syst., 12(3):23, 2007.

[32] KW Lee, T. Nakamura, T. Ono, Y. Yamada, T. Mizukusa, H. Hashimoto, KT Park, H. Kurino, and M. Koyanagi. Three-dimensional shared memory fabricated using wafer stacking technology. Electron Devices Meeting, 2000. IEDM Technical Digest. International, pages 165–168, 2000.

[33] Feihui Li, Chrysostomos Nicopoulos, Thomas Richardson, Yuan Xie, Vijaykrishnan Narayanan, and Mahmut Kandemir. Design and management of 3D chip multiprocessors using network-in-memory. In Proc. of ISCA, pages 130–141. IEEE Computer Society, 2006.

[34] Sung Kyu Lim. Physical design for 3D system on package. IEEE Des. Test, 22(6):532–539, 2005.

[35] Zhonghai Lu, Rikard Thid, Mikael Millberg, Erland Nilsson, and Axel Jantsch. NNSE: Nostrum network-on-chip simulation environment. In Proc. of SSoCC, April 2005.

[36] Radu Marculescu, Umit Y. Ogras, and Nicholas H. Zamora. Computation and communication refinement for multiprocessor SoC design: A system-level perspective. In Proc. of DAC, pages 564–592. ACM Press, 2004.

[37] J.D. Meindl. Interconnect opportunities for gigascale integration. IEEE Micro, 23(3):28–35, 2003.

[38] MIT Lincoln Labs. MITLL low-power FDSOI CMOS process design guide. September 2006.

[39] Srinivasan Murali and Giovanni De Micheli. Bandwidth-constrained mapping of cores onto NoC architectures. In Proc. of DATE, page 20896. IEEE Computer Society, 2004.

[40] Lionel M. Ni and Philip K. McKinley. A survey of wormhole routing techniques in direct networks. Computer, 26(2):62–76, 1993.

[41] Umit Y. Ogras, Jingcao Hu, and Radu Marculescu. Key research problems in NoC design: a holistic perspective. In Proc. of CODES+ISSS, pages 69–74, 2005.

[42] Umit Y. Ogras and Radu Marculescu. Analytical router modeling for networks-on-chip performance analysis. In Proceedings of the conference on Design, automation and test in Europe, pages 1096–1101. EDA Consortium, 2007.

[43] U.Y. Ogras and R. Marculescu. Application-specific network-on-chip architecture customization via long-range link insertion. In Proc. of ICCAD, pages 246–253, 6-10 Nov. 2005.

[44] Open SystemC Initiative. IEEE Std 1666-2005: IEEE Standard SystemC Language Reference Manual. IEEE Computer Society, March 2006.

[45] Partha Pratim Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh. Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Transactions on Computers, 54(8):1025–1040, Aug. 2005.

[46] V. F. Pavlidis and E. G. Friedman. 3-D topologies for networks-on-chip. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15(10):1081–1090, 2007.

[47] V. Puente, J.A. Gregorio, and R. Beivide. SICOSYS: an integrated framework for studying interconnection network performance in multiprocessor systems. In Proceedings of 10th Euromicro Workshop on Parallel, Distributed and Network-based Processing, pages 15–22, 2002.

[48] Kiran Puttaswamy and Gabriel H. Loh. Thermal analysis of a 3D die-stacked high-performance microprocessor. In Proceedings of the 16th ACM Great Lakes symposium on VLSI, pages 19–24. ACM, 2006.

[49] R. Reif, A. Fan, Kuan-Neng Chen, and S. Das. Fabrication technologies for three-dimensional integrated circuits. In Proceedings of International Symposium on Quality Electronic Design, pages 33–37, 18-21 March 2002.

[50] F.J. Ridruejo and Jose Miguel-Alonso. INSEE: An interconnection network simulation and evaluation environment. In Proc. of Euro-Par Parallel Processing, volume 3648/2005, pages 1014–1023. Springer Berlin / Heidelberg, 2005.

[51] Semiconductor Industry Association. International technology roadmap for semiconductors, 2006.

[52] K. Siozios, K. Sotiriadis, V. F. Pavlidis, and D. Soudris. Exploring alternative 3D FPGA architectures: Design methodology and CAD tool support. In Proc. of FPL, 2007.

[53] Vassos Soteriou, Noel Eisley, Hangsheng Wang, Bin Li, and Li-Shiuan Peh. Polaris: A system-level roadmap for on-chip interconnection networks. In Proc. of ICCD, October 2006.

[54] Vassos Soteriou, Hangsheng Wang, and Li-Shiuan Peh. A statistical traffic model for on-chip interconnection networks. In Proc. of MASCOTS, pages 104–116. IEEE Computer Society, 2006.

[55] STMicroelectronics. STNoC: Building a new system-on-chip paradigm. White Paper, 2005.

[56] A. W. Topol, D. C. La Tulipe Jr., L. Shi, D. J. Frank, K. Bernstein, S. E. Steen, A. Kumar, G. U. Singco, A. M. Young, K. W. Guarini, and M. Ieong. Three-dimensional integrated circuits. IBM J. Res. Dev., 50(4/5):491–506, 2006.

[57] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, P. Iyer, A. Singh, T. Jacob, et al. An 80-Tile 1.28 TFLOPS Network-on-Chip in 65nm CMOS. In Proceedings of International Solid-State Circuits Conference (ISSCC), pages 98–589. IEEE, 2007.

[58] T.T. Ye, L. Benini, and G. De Micheli. Analysis of power consumption on switch fabrics in network routers. In Proc. of DAC, pages 524–529, 10-14 June 2002.
