
Journal of Systems Architecture 57 (2011) 4–23

Contents lists available at ScienceDirect

Journal of Systems Architecture

journal homepage: www.elsevier.com/locate/sysarc

Hierarchical opto-electrical on-chip network for future multiprocessor architectures

Somayyeh Koohi*, Shaahin Hessabi
Computer Engineering Department, Sharif University of Technology, Tehran, Iran

Article info

Article history:
Received 14 March 2010
Received in revised form 24 June 2010
Accepted 7 July 2010
Available online 16 July 2010

Keywords:
Optical NoC
Hierarchy
Scalability
Contention-free
Power consumption

1383-7621/$ - see front matter © 2010 Elsevier B.V. All rights reserved.
doi:10.1016/j.sysarc.2010.07.003

* Corresponding author.
E-mail address: [email protected] (S. Koohi).

Abstract

The importance of power dissipation in NoCs, together with the power-reduction capability of on-chip optical interconnects, makes the optical network-on-chip a promising technology for on-chip interconnects. In this paper, we extract analytical models for the data transmission delay, power consumption, and energy dissipation of optical and traditional NoCs. Using these models, we compare the optical NoC with its electrical counterpart and calculate a lower bound on the optical link length below which the optical on-chip network loses its efficiency. Based on this constraint, we propose a novel hierarchical on-chip network architecture, named H2NoC, which benefits from optical transmissions in large-scale SoCs and overcomes the scalability problem resulting from the lower bound on the optical link length. Through a series of simulation-based experiments, we study the efficiency of H2NoC along with its power and energy consumption and data transmission delay. Furthermore, this paper addresses the impact of network size, traffic pattern, and packet size distribution on the advantage of the proposed architecture over traditional NoC and non-hierarchical ONoC. Our experimental results verify that the proposed hierarchical architecture outperforms non-hierarchical ONoC for moderate- and large-scale MPSoCs, while its advantage degrades for small numbers of processing cores.

© 2010 Elsevier B.V. All rights reserved.

1. Introduction

Various limitations of electrical interconnects, including qualitative and quantitative problems, have been predicted for about two decades [2]. While NoC, as a new architectural trend, can improve the bandwidth of electrical interconnections, it is unclear how electronic NoCs will continue to satisfy future bandwidth and latency requirements within the package power budget [24]. Optics is a very different physical approach that can address most of the problems associated with electrical interconnects, such as bandwidth, latency, and crosstalk [19]. Additionally, the bit-rate transparency [9] of optical switching elements and the low propagation loss of optical waveguides [30] lead to the low power dissipation of silicon photonics.

The importance of power dissipation in NoCs, along with the power-reduction capability of on-chip optical interconnects, makes the optical network-on-chip (ONoC) a novel technology solution that can provide an on-chip interconnection architecture with high transmission capacity, low power consumption, and low latency. While electrical NoCs impose unaffordable power dissipation in high-performance MPSoCs, the unique advantages of ONoC offer considerable power efficiency as well as performance-per-watt scaling, the most critical design metric.


Several on-chip interconnect architectures have been proposed that leverage CMOS-compatible photonics for future multicore microprocessors. However, most of the proposed optical architectures are bus-based. For example, the Cornell hybrid electrical/optical interconnect architecture [14] comprises an optical ring that assigns unique wavelengths per node in order to implement a multi-bus. Firefly [22], a hybrid electrical/optical network architecture, implements reservation-assisted single-write-multi-read buses. Moreover, the HP Corona crossbar architecture [26] is in fact a set of multiple-writer, single-reader buses routed in a snake pattern among the nodes.

To analyze on-chip optical networks at the system level, Briere et al. [5] have developed a contention-free ONoC. In this ONoC, the address of the target is not contained in the data packet but is encoded in the wavelength of the optical signal. Routing optical signals according to their wavelengths is called the wavelength routing method. This contention-free structure is obtained at the cost of large arrays of fixed-wavelength light sources and fast switches for wavelength selection, which limit scalability and severely increase power consumption and area.

The Columbia optical network [24] is one of the few that propose on-chip optical switches. Shacham et al. [24] have introduced a hybrid architecture for ONoC that combines a high-speed photonic circuit-switched network with an electronic packet-switched control network. Unlike the optical network proposed by Briere et al. [5], this hybrid network cannot route optical signals according to their wavelengths, and it suffers from contention problems.

Fig. 1. 16-Node Spidergon: (a) Schematic and (b) 2D layout.

In [15], we have proposed CONoC (Contention-free Optical NoC) to overcome the limitations of the ONoCs previously proposed by Briere et al. [5] and Shacham et al. [24]. In addition to its capability for optically resolving packet congestion, CONoC differs from the network developed by Shacham et al. [24] in several aspects. It is based on different, simpler, and smaller photonic routing architectures. Moreover, using the wavelength routing method along with path reconfiguration capability leads to a contention-free architecture. These architectural advantages lead to much simpler electrical transactions, reduced setup latency, and higher transmission capacity. The main limitation of CONoC is the receiver's inability to accept multiple data streams from different transmitters simultaneously.

The increasing number of processing cores forces us to build networks interconnecting hundreds of cores in future MPSoCs. Hence, scalable interconnection architectures are essential to keep the communication cost affordable. The number of processing cores in an MPSoC is inversely related to the global link length of the on-chip network: for a fixed die size, more cores means shorter links between adjacent nodes. On the other hand, in an ONoC, opto-electrical conversions impact the optimum link length of the optical waveguides. Comparing various design criteria at the physical level, Chen et al. [6] have predicted that the critical dimension beyond which optical interconnect becomes advantageous over electrical interconnect is approximately one tenth of the chip edge length at the 22 nm technology node. Hence, increasing the number of processors in an optically interconnected SoC may lead to performance degradation for data transmissions between neighboring nodes. However, the conclusion of Chen et al. [6] is drawn at the physical level, without considering the system architecture.

Considering system-level design metrics, in this paper we analytically calculate the minimum optical link length below which the optical on-chip network loses its efficiency. For this purpose, we extract analytical models for the data transmission delay, power consumption, and energy dissipation of ONoCs and electrical NoCs (ENoCs). To calculate the lower bound on the optical link length, we compare these design metrics for varying values of link length and degree of multiplexing.

Based on the optimum link lengths, we propose a novel hierarchical electrical/optical on-chip network architecture which benefits from optical transmissions in large-scale SoCs. The proposed architecture, built upon CONoC, overcomes the scalability problem resulting from the lower bound on the optical link length. Several works have studied hierarchical on-chip networks, but only a few of them are built upon hybrid optical/electrical architectures. Corona [26] and Firefly [22] are the most recent examples of these hybrid networks, in which an optical bus is responsible for data transmission between local networks. Despite their simplicity, global optical buses suffer from scalability limitations and degraded performance due to their global arbitration scheme.

In this paper, we propose a hierarchical electrical/optical on-chip network architecture that uses electrical clusters and a global optical network to transmit data packets between processing cores. Taking advantage of CONoC, the proposed hierarchical architecture introduces a fully contention-free structure. Moreover, the global ONoC in the proposed architecture enhances CONoC such that simultaneous packets can be sent from each optical router, and multiple packets can be received from different sources at the same optical destination node. Investigating the efficiency of the proposed architecture, we conclude that the proposed hierarchical hybrid NoC improves power efficiency compared to the non-hierarchical one in large-scale MPSoCs.

The rest of the paper is organized as follows. Section 2 briefly discusses NCFOR [15], a nonblocking optical router, and reviews the CONoC architecture, an optical on-chip infrastructure built upon it. Section 3 discusses the scalability limitations of previously proposed ONoCs and calculates the lower bound on the optical link length. In Section 4, we introduce our novel hierarchical network and its data transmission scenarios. To investigate the efficiency of the proposed hierarchical architecture, we developed a simulation environment, which is described in Section 5. Using this behavioral simulator, we carry out several simulation-based experiments to evaluate the impact of traffic pattern, packet size distribution, and the number of processing cores on the system-level metrics of the proposed NoC. Section 6 presents the experimental results and compares them with those obtained from ENoC. Moreover, in this section, we investigate the advantage of the proposed hierarchical architecture over the non-hierarchical optical on-chip network. Section 7 investigates the efficiency of the proposed hierarchical NoC over ENoC and non-hierarchical optical NoC in large-scale MPSoCs. For this purpose, system-level metrics of the proposed architecture are evaluated for various numbers of processing cores. Finally, Section 8 concludes the paper.

2. Non-hierarchical ONoC architecture

In this section, we briefly review our previously proposed on-chip optical infrastructure, referred to as CONoC [15].

2.1. Topology

When designing an optical network-on-chip, we must account for the physical properties of light transmission. One of these considerations is waveguide intersection crosstalk. Moreover, while implementing a high-degree electronic crossbar is simple, it is quite difficult to construct optical crossbars larger than 4 × 4 using existing 2 × 2 photonic switching elements.

Spidergon [8] is a suitable interconnection topology that can exploit the advantages of photonic switching elements without running into their limitations. The Spidergon architecture with N (even) nodes is similar to a ring enriched by across links between opposite nodes (Fig. 1a). Results show that, compared to Ring and Mesh topologies, the Spidergon topology offers a good tradeoff between performance, scalability, energy, and area requirements for SoCs [4]. Due to the constant degree of the Spidergon topology (equal to 3), the 4 × 4 photonic crossbar of a node can simply interconnect the neighbor nodes and the local IP with each other. Moreover, right-angle waveguide intersections (as shown in Fig. 1b) introduce negligible crosstalk. These advantages, along with the efficiency of the Spidergon topology, motivated us to implement CONoC on top of this topology.

CONoC adopts Across-first routing [4] as its deterministic routing algorithm. According to this routing scheme, if the target node for a packet is at distance D > N/4 (N is the number of nodes) on the external ring, the across link is traversed first to reach the opposite node. Then, the clockwise or counterclockwise direction is taken and maintained, depending on the target's position. In the case that D ≤ N/4, packets only traverse the external ring.

Fig. 2. 16-Node optical Spidergon built upon BPSB.
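The Across-first rule can be sketched in a few lines of Python. This is a minimal illustration, assuming nodes are numbered 0 to N−1 clockwise around the external ring and that node i's across link leads to node (i + N/2) mod N; the function names spidergon_neighbors and across_first_route are hypothetical, and the sketch models only the hop sequence, not the optical router.

```python
def spidergon_neighbors(node: int, n: int) -> tuple[int, int, int]:
    """Clockwise, counterclockwise, and across neighbors of `node` (degree 3)."""
    return (node + 1) % n, (node - 1) % n, (node + n // 2) % n


def across_first_route(src: int, dst: int, n: int) -> list[int]:
    """Node sequence from src to dst under Across-first routing on an N-node Spidergon."""
    if n < 4 or n % 2:
        raise ValueError("Spidergon needs an even number of nodes (n >= 4)")
    path = [src]
    cur = src
    delta = (dst - cur) % n              # clockwise offset from src to dst
    ring_distance = min(delta, n - delta)
    # D > N/4, written as 4*D > N to avoid a fractional N/4
    if 4 * ring_distance > n:
        cur = spidergon_neighbors(cur, n)[2]   # take the across link first
        path.append(cur)
        delta = (dst - cur) % n
    # Choose the shorter ring direction once, then keep it until dst is reached
    step = 1 if delta <= n - delta else -1
    while cur != dst:
        cur = (cur + step) % n
        path.append(cur)
    return path


if __name__ == "__main__":
    for dst in (3, 5, 8, 12):
        print(dst, across_first_route(0, dst, 16))
```

In a 16-node network, a packet from node 0 to node 5 (D = 5 > 4) first crosses to node 8 and then moves counterclockwise through 7 and 6, while a packet to node 12 (D = 4) stays on the ring and goes counterclockwise through 15, 14, and 13.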

2.2. Optical router architecture

CONoC is built on a Nonblocking Contention-Free Optical Router (NCFOR), which benefits from both wavelength routing and electrical techniques for path reservation. According to the Spidergon topology, two-dimensional 4 × 4 crossbars are needed to interconnect local IPs and neighbor nodes with each other. NCFOR [15] builds on the 4 × 4 photonic crossbar proposed by Xu and Poon [28], which comprises an array of four identical silicon microring resonator switches. The enlarged view in Fig. 2 schematically shows this 4 × 4 switch node. Although the photonic switch proposed in [28] (referred to as BPSB in [15]) does not suffer from internal blocking in most data transmission scenarios, it cannot guarantee contention-free operation of the optical network. Hence, we have modified BPSB and built NCFOR upon it to guarantee contention-free operation of the optical network.

Fig. 2 illustrates an enlarged view of BPSB and an example of a 16-node Spidergon topology built upon it. In this figure, the microring resonator switches are labeled S_i, 1 ≤ i ≤ 4, and N_i stands for the ith node of the network. When BPSB is used in an optical network, a blocking scenario occurs when two input messages need the same microring resonator to be switched on, or when they impose inconsistent switching states on it. All these scenarios are listed in Table 1.

Table 1
Blocking scenarios in BPSB.

0 ≤ i < N/2                         | N/2 ≤ i < N
Cont. Sw  Cont. Path1  Cont. Path2  | Cont. Sw  Cont. Path1  Cont. Path2
S1        T4 → R3      T1 → R3      | S1        T4 → R3      T1 → R3
S1        T1 → R2      T1 → R3      | S2        T2 → R3      T1 → R3
S1        T1 → R2      T4 → R3      | S2        T1 → R4      T1 → R3
S2        T2 → R3      T1 → R3      | S2        T1 → R4      T2 → R3
S3        T2 → R1      T3 → R1      | S3        T2 → R1      T3 → R1
S4        T4 → R1      T3 → R1      | S3        T3 → R4      T2 → R1
S4        T3 → R2      T4 → R1      | S3        T3 → R4      T3 → R1
S4        T3 → R2      T3 → R1      | S4        T4 → R1      T3 → R1
S3, S4    T3 → R2      T2 → R1      | S1, S2    T4 → R3      T1 → R4
S1, S4    T3 → R2      T4 → R3      | S2, S3    T2 → R1      T1 → R4
S2, S4    T2 → R1      T4 → R3      | S2, S4    T2 → R1      T4 → R3

For each scenario, there exist two different optical data transmission paths (referred to as cont. paths) which contend for electrically controlling the same silicon microring resonator switch(es) (named cont. Sw). Each optical path is determined by a pair of input and output ports of the photonic router, written as input-port → output-port, where T1 (R1), T2 (R2), T3 (R3), and T4 (R4) represent the West, North, East, and South input (output) ports, respectively.
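To make the notation concrete, the left half of Table 1 (routers with 0 ≤ i < N/2) can be encoded as a lookup from an unordered pair of paths to the switch(es) they contend for. This is only a sketch of the table's content as data; the names BLOCKING_LOW_HALF, contended_switches, and describe are hypothetical, and the right half of the table would be a second dictionary of the same shape.

```python
# Port naming from the text: Tk is an input port, Rk an output port.
PORT_SIDE = {"1": "West", "2": "North", "3": "East", "4": "South"}

# Left half of Table 1: unordered pair of contending paths -> contended switch(es).
BLOCKING_LOW_HALF = {
    frozenset({("T4", "R3"), ("T1", "R3")}): ("S1",),
    frozenset({("T1", "R2"), ("T1", "R3")}): ("S1",),
    frozenset({("T1", "R2"), ("T4", "R3")}): ("S1",),
    frozenset({("T2", "R3"), ("T1", "R3")}): ("S2",),
    frozenset({("T2", "R1"), ("T3", "R1")}): ("S3",),
    frozenset({("T4", "R1"), ("T3", "R1")}): ("S4",),
    frozenset({("T3", "R2"), ("T4", "R1")}): ("S4",),
    frozenset({("T3", "R2"), ("T3", "R1")}): ("S4",),
    frozenset({("T3", "R2"), ("T2", "R1")}): ("S3", "S4"),
    frozenset({("T3", "R2"), ("T4", "R3")}): ("S1", "S4"),
    frozenset({("T2", "R1"), ("T4", "R3")}): ("S2", "S4"),
}


def describe(path: tuple[str, str]) -> str:
    """Readable form of an (input, output) port pair, e.g. 'West -> East'."""
    src, dst = path
    return f"{PORT_SIDE[src[1]]} -> {PORT_SIDE[dst[1]]}"


def contended_switches(a: tuple[str, str], b: tuple[str, str]):
    """Switches two paths contend for at a low-half router, or None if they do not block."""
    return BLOCKING_LOW_HALF.get(frozenset({a, b}))


if __name__ == "__main__":
    a, b = ("T3", "R2"), ("T2", "R1")
    print(describe(a), "|", describe(b), "->", contended_switches(a, b))
```

Keying on a frozenset makes the lookup independent of which path is listed as Path1 or Path2, which matches how the table is meant to be read.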

The wavelength routing method allows wavelength-selective filtering, which makes fully contention-free structures possible. As a means of wavelength-selective filtering, SOI-based microring resonator structures have been explored as passive Optical Add/Drop filters (OADs) [16]. These elements insert (add) or extract (drop) optical channels (wavelengths) to or from the optical transmission stream without any electronic processing. To design a contention-free ONoC infrastructure, NCFOR uses the wavelength routing method along with optical add/drop elements to overcome the path-blocking scenarios listed in Table 1.

In CONoC, contention-free operation of the network is accomplished by dedicating one wavelength (or a set of wavelengths) to each node. For data injection into the network, optical data streams targeted to a specific node are modulated on its dedicated wavelength(s), and they are ejected from the network to the destination node according to their wavelength(s). Using OAD elements, NCFOR augments BPSB with Ejecting Microring Resonators (EMRs) that extract the optical data streams targeted to the local IP block from those passing through the router. EMRs overcome the contention problem when ejection of an optical data stream meets a blocking scenario. In addition to EMRs, Injecting Microring Resonators (IMRs) are added to BPSB to multiplex the optical data streams transmitted from the local IP block with those passing through the router. IMRs avoid the contention problem when optical data injection meets a blocking scenario. Besides the scenarios listed in Table 1, a blocking scenario may occur while transmitting a packet on the across link to reach the East/West port of the opposite node. Across Microring Resonators (AMRs) are added on the across links to overcome contention problems in these cases. The proposed photonic router, composed of BPSB and the extra microring resonators (EMRs, IMRs, and AMRs), is shown in Fig. 3 for N_i with 0 ≤ i < N/2 and N_i with N/2 ≤ i < N.
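The wavelength-based drop and add behavior of EMRs and IMRs can be modeled at a purely logical level. This is a hypothetical sketch: it assumes each node owns a set of integer wavelength indices, and it ignores the physical placement of EMRs, IMRs, and AMRs, the switching states of the BPSB rings, and any wavelength reuse across nodes. The class Stream and the function router_step are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    """An optical data stream, identified by the wavelength it is modulated on."""
    wavelength: int
    payload: str


def router_step(local_wavelengths: set[int],
                passing: list[Stream],
                injected: list[Stream]) -> tuple[list[Stream], list[Stream]]:
    """One router traversal: EMR-style drop, then IMR-style add.

    Returns (ejected, continuing). Streams on a wavelength dedicated to this
    node are dropped to the local IP; all others pass through, and locally
    injected streams are multiplexed onto the outgoing waveguide.
    """
    ejected = [s for s in passing if s.wavelength in local_wavelengths]
    through = [s for s in passing if s.wavelength not in local_wavelengths]
    busy = {s.wavelength for s in through}
    for s in injected:
        # Two streams cannot share one wavelength on the same waveguide
        if s.wavelength in busy:
            raise ValueError(f"wavelength {s.wavelength} already occupied on this waveguide")
        busy.add(s.wavelength)
    return ejected, through + list(injected)


if __name__ == "__main__":
    out = router_step({3}, [Stream(3, "to me"), Stream(5, "transit")], [Stream(7, "mine")])
    print(out)
```

Because the drop decision depends only on the wavelength, no electronic inspection of the packet is needed at the router, which is the property the text attributes to OAD-based ejection.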

We have shown in [15] that NCFOR overcomes all blocking scenarios listed in Table 1 and guarantees contention-free operation of the network. The proposed optical router allows multiple data streams to be multiplexed on the same optical waveguide and then demultiplexed according to their wavelengths. This additional flexibility, besides the ultra-wide bandwidth of the optical interconnection medium, leads to a high-performance ONoC infrastructure.

Fig. 3. NCFOR architecture: (a) 0 ≤ i < N/2 and (b) N/2 ≤ i < N.

Table 2
Delay contributions in CONoC.

            Parameter           Value   Unit
Optical     Modulator driver    9.5     ps
            Modulator           14.3    ps
            Photo-detector      0.2     ps
            TIA                 4       ps
            Waveguide delay     15.4    ps/mm
Electrical  Router processing   200     ps
            Wire delay          131     ps/mm

3. Scalability

In this section, we discuss the scalability limitations of ONoCs resulting from the increasing number of processing cores in future multiprocessor SoC designs.

Simultaneous transmission and reception of multiple packets by the same optical router is desirable in large-scale optical on-chip interconnects. Although the contention-free ONoC proposed by Briere et al. [5] achieves this goal, it requires one wavelength to be associated with each distinct physical path. Since the number of distinct paths grows quadratically with the node count, this requirement limits the feasibility of the network for future SoC designs. The CONoC infrastructure, however, limits the number of distinct wavelengths to one fourth of the node count in the optical network while retaining the contention-free functionality of ONoC. Unlike CONoC, the ONoC proposed by Shacham et al. [24] suffers from contention problems. Neither of the architectures proposed in [15,24] can satisfy the requirement of multiple packet reception and transmission by the same optical router.
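The difference in growth rates can be made concrete with a little arithmetic. In this sketch, N(N−1), one wavelength per ordered source-destination pair, is an assumption standing in for "quadratic in the node count"; the exact path count of the network of Briere et al. may differ. The N/4 figure for CONoC is taken from the text, and the function names are hypothetical.

```python
def wavelengths_one_per_path(n: int) -> int:
    """Assumed count when each ordered source-destination pair needs its own wavelength."""
    return n * (n - 1)


def wavelengths_conoc(n: int) -> int:
    """CONoC's count as stated in the text: one fourth of the node count."""
    return n // 4


if __name__ == "__main__":
    for n in (16, 64, 256):
        print(f"N={n:4d}  one-per-path={wavelengths_one_per_path(n):6d}  "
              f"CONoC={wavelengths_conoc(n):3d}")
```

For 16 nodes this gives 240 versus 4 wavelengths, and at 256 nodes 65,280 versus 64, which illustrates why a per-path wavelength assignment becomes impractical as the core count grows.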

Unlike electrical devices, optical devices do not readily scale with the technology node due to the light wavelength constraint [1]. Therefore, compact photonic switching elements are essential for building an optical on-chip network in future MPSoC designs. Although small device footprints have been reported in [15,25], it is unclear how hundreds of these photonic routers will be integrated on a single chip without considerable area overhead.

Considering electrical/optical and optical/electrical conversions, on-chip optical interconnect is attractive only for global connections. Increasing the number of processing cores scales down the link length between adjacent optical routers in an ONoC, which may degrade the advantages of optical interconnects over electrical ones. Hence, it is valuable to calculate the minimum global link length above which the optical on-chip network retains its efficiency. In [15], we have compared CONoC with traditional NoCs at the future 22 nm technology node. Although that comparison is comprehensive, it is made at a constant optical and electrical link length and a constant degree of multiplexing. Both of these parameters directly affect the total power consumption and data transmission delay of CONoC compared to ENoC. To analyze the efficiency of the optical on-chip interconnect, we analytically estimate and compare the power and energy consumption and data transmission delay of CONoC with those of ENoC for varying values of link length and degree of multiplexing. Based on this comparison, we extract the lower bound on the optical link length.

3.1. Delay estimation

In the case of optical interconnects, there are four contributions to the data transmission latency, arising from the modulator, propagation delay in the waveguide, the photo-detector, and the Transimpedance Amplifier (TIA). In addition to the optical data transmission latency, some additional latency arises from control packet processing at the routers and propagation delay in the electrical wires. Table 2 lists the main contributions to the data transmission latency in CONoC. For calculating the optimum length of the global optical connections, we assume a propagation delay of 15.4 ps/mm in a silicon waveguide for the optical signals [12] and 131 ps/mm in an optimally repeated wire at 22 nm for the electronic signals traveling along electrical wires [11]. Delay parameters of the receiver and transmitter have been predicted by Chen et al. [6] for a 1 cm optical path at the future 22 nm CMOS process technology. Assuming a 5 GHz clock [24], electrical processing in each router (for path reservation and teardown) is considered to be one clock cycle, i.e., 200 ps. Moreover, 8-bit electrical links are used for transmitting electrical signals through the network in CONoC [15].

Table 3
Power contributions in CONoC.

Optical                                 | Electrical
Parameter           Value   Unit        | Parameter    Value   Unit
P_W                 0.51    dB/cm       | P_Link       1       mW/mm
P_CR                0.6     dB          | P_Buffer     0.6     mW/bit
P_MR,ON             10      mW          | P_Crossbar   1.8     mW/bit
P_MR,drop-port IL   1.4     dB          | P_Static     1.75    mW/bit
P_Transmitter       5       mW          |
P_Receiver          0.3     mW          |

The total data transmission latency in CONoC consists of the path-setup latency, the photonic data transmission delay, and the path-teardown latency. Based on the delay parameters from Table 2, we estimate the photonic data transmission latency as follows:

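$$T_{\mathrm{Optical}} = T_{\mathrm{Modulator}} + T_{\mathrm{Modulator\ driver}} + T_{\mathrm{Detector}} + T_{\mathrm{TIA}} + T_{\mathrm{Waveguide}} \times l + \frac{\mathrm{Packet\ Size}}{\mathrm{WDM} \times \mathrm{Modulation\ Rate}} \tag{1}$$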

where l is the length of the optical path from the source to the destination node, and WDM is the degree of multiplexing for each optical data stream. Without loss of generality, we assume that the Packet Size parameter is constant, e.g., 2 KB [15]. Each optical data stream is modulated on WDM distinct wavelengths at a rate of 40 Gbps [15], i.e., Modulation Rate equals 40 Gbps. Therefore, T_Optical is a function of two parameters, l and WDM. The electrical delay for transmitting path-reservation and teardown packets is estimated as follows:
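Eq. (1) can be evaluated directly from Table 2. The sketch below works in picoseconds with l in millimetres; reading 2 KB as 2048 bytes is an assumption, as are the function and constant names.

```python
# Delay parameters from Table 2 (ps); waveguide delay in ps/mm.
T_MODULATOR_DRIVER = 9.5
T_MODULATOR = 14.3
T_DETECTOR = 0.2
T_TIA = 4.0
T_WAVEGUIDE_PER_MM = 15.4

BIT_TIME_PS = 25.0            # one bit at 40 Gbps on a single wavelength
PACKET_BITS = 2 * 1024 * 8    # assumption: 2 KB read as 2048 bytes


def t_optical_ps(length_mm: float, wdm: int, packet_bits: int = PACKET_BITS) -> float:
    """Eq. (1): photonic transmission delay in ps for path length l (mm) and WDM degree."""
    fixed = T_MODULATOR + T_MODULATOR_DRIVER + T_DETECTOR + T_TIA
    propagation = T_WAVEGUIDE_PER_MM * length_mm
    serialization = packet_bits * BIT_TIME_PS / wdm   # Packet Size / (WDM x 40 Gbps)
    return fixed + propagation + serialization


if __name__ == "__main__":
    for wdm in (4, 8, 16, 32):
        print(wdm, round(t_optical_ps(10.0, wdm), 1))
```

For l = 10 mm and WDM = 8, the fixed terms contribute 28 ps and propagation 154 ps, while serializing the 2 KB packet takes 51,200 ps. At millimetre-scale path lengths the serialization term dominates, which is why the degree of multiplexing matters as much as l in this model.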

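$$T_{\mathrm{Electrical}} = 2 \times \left\{ \mathrm{Router\ Processing} \times \mathrm{HopCount} + T_{\mathrm{Wire}} \times l + \frac{\mathrm{CtlPack\ Size}}{\mathrm{BusWidth} \times \mathrm{ClkFreq}} \right\} \tag{2}$$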

where BusWidth equals 8 according to the 8-bit electrical links in CONoC, ClkFreq is 5 GHz according to the assumed clock frequency, and HopCount is the number of hops traversed by each of the path-reservation and teardown packets. CtlPack Size is a constant parameter that specifies the length of the path-reservation and path-teardown packets. Since little control information is required for path setup and teardown, we assume 4-byte path-reservation and teardown packets. Based on the above discussion, T_Electrical becomes a function of two parameters, l and HopCount. For a rough estimate of the electrical delay, we assume that the waiting interval [15] is negligible and that path-reservation and path-teardown packets pass through the electrical links without blocking. These simplifying assumptions lead to an optimistic estimate of the total data transmission delay through CONoC. As mentioned before, we will extract the lower bound on the optical link length from delay, power, and energy comparisons between optical and electrical NoCs. Hence, optimistic estimates for CONoC reinforce our final conclusion that the optical on-chip network loses its efficiency below the lower bound on the optical link length.

Taking into account both the optical and electrical delays, the total latency for transmitting a data message through CONoC is computed as follows:

$$T_{\mathrm{CONoC}} = T_{\mathrm{Optical}} + T_{\mathrm{Electrical}} \tag{3}$$
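Eqs. (2) and (3) can be sketched the same way, using the 200 ps router processing step, 131 ps/mm wires, 8-bit control links, a 5 GHz clock, and 4-byte control packets quoted above. In this sketch, l and HopCount are supplied independently by the caller, and Eq. (1) is repeated in compact form so the block runs on its own; all function names and the 2048-byte reading of 2 KB are assumptions.

```python
ROUTER_PROCESSING_PS = 200.0   # one 5 GHz cycle per router
T_WIRE_PER_MM = 131.0          # optimally repeated wire at 22 nm, ps/mm
CLK_GHZ = 5.0
BUS_WIDTH_BITS = 8             # 8-bit electrical control links
CTL_PACKET_BITS = 4 * 8        # 4-byte path-reservation/teardown packet


def t_electrical_ps(length_mm: float, hop_count: int) -> float:
    """Eq. (2): path-reservation plus path-teardown delay in ps."""
    cycle_ps = 1000.0 / CLK_GHZ                                  # 200 ps
    serialization = CTL_PACKET_BITS / BUS_WIDTH_BITS * cycle_ps  # 4 cycles = 800 ps
    one_way = ROUTER_PROCESSING_PS * hop_count + T_WIRE_PER_MM * length_mm + serialization
    return 2 * one_way


def t_optical_ps(length_mm: float, wdm: int, packet_bits: int = 2 * 1024 * 8) -> float:
    """Eq. (1) in compact form: fixed terms + waveguide propagation + serialization."""
    return (9.5 + 14.3 + 0.2 + 4.0) + 15.4 * length_mm + packet_bits * 25.0 / wdm


def t_conoc_ps(length_mm: float, hop_count: int, wdm: int) -> float:
    """Eq. (3): total CONoC latency in ps."""
    return t_optical_ps(length_mm, wdm) + t_electrical_ps(length_mm, hop_count)


if __name__ == "__main__":
    print(t_electrical_ps(10.0, 4), t_conoc_ps(10.0, 4, 8))
```

With l = 10 mm and HopCount = 4, each control packet needs 800 + 1310 + 800 = 2910 ps, so path setup and teardown together add about 5.8 ns to the optical transfer.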

Based on the predictions made by Shacham et al. [24], future ENoCs will route packets on 168-bit parallel links between adjacent routers at a 5 GHz clock frequency. The router processing delay is assumed to be 600 ps, or three cycles of a 5 GHz clock [24]. Assuming wormhole switching, the data transmission latency through ENoC is estimated as follows:

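$$T_{\mathrm{ENoC}} = \mathrm{Router\ Processing} \times \mathrm{HopCount} + T_{\mathrm{Wire}} \times l + \frac{\mathrm{Packet\ Size}}{\mathrm{BusWidth} \times \mathrm{ClkFreq}} \tag{4}$$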

where Router Processing and BusWidth equal 600 ps and 168, respectively. The remaining parameters equal the corresponding ones for CONoC. For a rough estimate of the total delay through ENoC, we assume that the network does not operate in the saturation region. Hence, similar to T_Electrical, T_ENoC is a function of two parameters, l and HopCount.
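For comparison, Eq. (4) can be sketched with the ENoC parameters above. As in the equation, the packet is divided by the bus width without rounding up to whole flits (16384/168 ≈ 97.5 flits); the 2048-byte reading of 2 KB and the names are assumptions.

```python
ENOC_ROUTER_PS = 600.0       # three 5 GHz cycles per router
T_WIRE_PER_MM = 131.0        # optimally repeated wire at 22 nm, ps/mm
ENOC_BUS_WIDTH_BITS = 168
CLK_GHZ = 5.0
PACKET_BITS = 2 * 1024 * 8   # assumption: 2 KB = 2048 bytes


def t_enoc_ps(length_mm: float, hop_count: int, packet_bits: int = PACKET_BITS) -> float:
    """Eq. (4): wormhole-switched ENoC delay in ps, below saturation."""
    cycle_ps = 1000.0 / CLK_GHZ
    # Packet Size / (BusWidth x ClkFreq), kept fractional as in Eq. (4)
    serialization = packet_bits / ENOC_BUS_WIDTH_BITS * cycle_ps
    return ENOC_ROUTER_PS * hop_count + T_WIRE_PER_MM * length_mm + serialization


if __name__ == "__main__":
    print(round(t_enoc_ps(10.0, 4), 1))
```

With these numbers the ENoC serialization term is about 19.5 ns for a 2 KB packet, whereas the corresponding term of Eq. (1) is 409.6/WDM ns, so in this sketch the optical serialization term is smaller only when WDM exceeds 21. This is the kind of trade-off that the comparison over varying l and WDM explores.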

3.2. Power estimation

The performance of an optical communication system depends on the minimum optical power required by the receiver and on the efficiency of the passive optical devices used in the system. The total loss in any optical link is the sum of the losses (in dB) of all optical components [20]:

$$P_{\mathrm{Link}} = P_{\mathrm{CV}} + P_{\mathrm{W}} + P_{\mathrm{B}} + P_{\mathrm{Y}} + P_{\mathrm{CR}} \tag{5}$$

where P_CV is the coupling loss between the photonic source and the optical waveguide, P_W is the waveguide propagation loss per unit distance, P_B is the bending loss, P_Y is the Y-coupler loss (not present in CONoC), and P_CR is the coupling loss from the waveguide to the optical receiver. Since we have assumed an off-chip light source in CONoC [15], P_CV is zero. The propagation loss of the optical waveguides is set to 0.51 dB/cm [30]. The authors of [18] have demonstrated a 90° bend with submicrometer dimensions and negligible bending loss. Finally, in this case study, the coupling loss is assumed to be 0.6 dB [23].
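Since every term of Eq. (5) is in dB, the link budget is a plain sum. A minimal sketch with the values quoted above, in which the per-unit-distance loss P_W is multiplied by the path length as Eq. (6) does; the function name is hypothetical.

```python
def link_loss_db(length_cm: float,
                 p_cv_db: float = 0.0,          # off-chip source: no source coupling loss
                 p_w_db_per_cm: float = 0.51,   # waveguide propagation loss
                 p_b_db: float = 0.0,           # bends taken as lossless, per [18]
                 p_y_db: float = 0.0,           # no Y-couplers in CONoC
                 p_cr_db: float = 0.6) -> float:  # waveguide-to-receiver coupling
    """Eq. (5) with the propagation term scaled by path length; result in dB."""
    return p_cv_db + p_w_db_per_cm * length_cm + p_b_db + p_y_db + p_cr_db


if __name__ == "__main__":
    for cm in (0.5, 1.0, 2.0):
        print(cm, round(link_loss_db(cm), 3))
```

A 1 cm path therefore loses about 1.11 dB, of which the fixed 0.6 dB receiver coupling is more than half; propagation loss only dominates beyond roughly 1.2 cm.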

In addition to optical power losses in the waveguides, the switching elements of CONoC significantly impact the total power dissipated in the network. The power consumed in a microring resonator switch in the ON state, when the multi-wavelength message is forced to turn [3], is approximately 10 mW, while there is no dissipation in the OFF state. Moreover, an activated microring resonator leads to an average drop-port insertion loss (IL) of approximately 1.4 dB [17], while its through-port IL is negligible. A switching power of about 32 nW has been reported for the microring resonator arrays in BPSB [28], which can be neglected. Besides optical losses in waveguides and microring switches, electrical power is consumed in the electro-optical converters. Chen et al. [6] have predicted that the power consumed by the transmitter and receiver circuits for each wavelength channel over a 1 cm optical path at the future 22 nm technology node is 5 mW and 0.3 mW, respectively.

In addition to power dissipation in the optical devices, routing and processing the path-reservation and path-teardown packets in the network impose additional electrical power losses. The estimated power consumption per unit length for delay-optimized electrical interconnects with optimal repeaters is on the order of 1 mW/mm [10]. Shacham et al. [24] have reported the energy spent in flit-processing operations (neglecting arbiter energy). Based on the above parameters and a 5-GHz clock frequency, the main contributions to the total power consumption for data transmission through CONoC are listed in Table 3. Utilizing these parameter values, we calculate the power dissipated for transmitting an optical message through CONoC as follows:

$$P_{Optical} = \left(P_{MR,ON} + P_{MR,drop\text{-}port\,IL}\right) \times N_{ON} + P_{W} \times l + P_{CR} + WDM \times \left(P_{Transmitter} + P_{Receiver}\right) \qquad (6)$$

where NON is the number of microring resonators in the ON state traversed by the optical message, and l and WDM are defined as the corresponding parameters in Eq. (1). In the N-node CONoC architecture, when the target node of a packet is at

S. Koohi, S. Hessabi / Journal of Systems Architecture 57 (2011) 4–23 9

distance N/2 from the source node, the photonic data path is conducted through the across link, and all microring switches on the path remain switched off (NON equals zero). In all other cases, the optical path is conducted through two activated microring resonators. Hence, it is straightforward to show that the expected number of activated microring resonators per optical data stream equals to in an N-node CONoC, which leads to NON ≈ 2 for N ≥ 15. Based on this assumption, POptical becomes a function of two parameters, l and WDM. For a BER of 10^−15, the smallest power required by the receiver is −22.3 dBm [20]. Hence, the minimal power required for optical data transmission equals to. While computing optical power, appropriate power unit conversions should be considered. The electrical power consumed by the path-reservation and teardown packets is computed as follows:
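The NON ≈ 2 approximation can be checked numerically. The sketch below is our own derivation, not a formula reproduced from the paper: it assumes every other router is an equally likely destination, that only the antipodal router (distance N/2) is reached with zero ON rings, and that every other destination needs exactly two, as stated above. The function name `expected_on_rings` is hypothetical.

```python
from fractions import Fraction


def expected_on_rings(n_nodes: int) -> Fraction:
    """Expected number of ON microrings per stream in an N-node CONoC.

    Assumption: each of the other n_nodes - 1 routers is an equally likely
    destination; the antipodal router uses the across link with 0 ON rings,
    every other destination uses 2 ON rings.
    """
    if n_nodes < 4 or n_nodes % 2:
        raise ValueError("a Spidergon needs an even number of nodes, at least 4")
    destinations = n_nodes - 1          # every other router, equally likely
    with_two_rings = destinations - 1   # all destinations except the antipodal one
    return Fraction(2 * with_two_rings, destinations)


if __name__ == "__main__":
    for n in (8, 16, 32, 64, 128):
        print(n, float(expected_on_rings(n)))
```

Under this assumption a 16-node CONoC gives 28/15 ≈ 1.87, and the value approaches 2 from below as N grows, which is consistent with treating NON as 2 for moderate and large networks.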

$$P_{Electrical} = 2 \times BusWidth \times \left\{P_{Link} \times l + HopCount \times \left(P_{Buffer} + P_{Crossbar} + P_{Static}\right)\right\} \qquad (7)$$

For simplicity, we have assumed that the power consumed by the path-reservation packet equals that of the path-teardown packet. Assuming the power parameters from Table 3, PElectrical becomes a function of two parameters, l and HopCount. Taking into account both the optical data transmission losses and the electrical power consumed by the path-reservation and teardown packets, the total power consumption for transmitting a data message through CONoC is computed as follows:

$$P_{CONoC} = P_{Packet} + P_{Electrical} \qquad (8)$$

Power consumption for data transmission through the Spidergon ENoC is estimated as follows:

$$P_{ENoC} = BusWidth \times \left\{P_{Link} \times l + HopCount \times \left(P_{Buffer} + P_{Crossbar} + P_{Static}\right)\right\} \qquad (9)$$
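Eqs. (7) to (9) share one per-length, per-hop structure; the sketch below writes them out so that the factor of two in Eq. (7) and the P_Packet term of Eq. (8) are explicit. Table 3 is not reproduced here, so every power term is a caller-supplied value, P_Packet is passed in as a plain number, and the class, field, and function names are hypothetical. The control network and the ENoC may use different bus widths, so the width is a parameter rather than a constant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElectricalPowerParams:
    """Per-wire power terms in the spirit of Table 3 (values supplied by the caller)."""
    bus_width: int          # wires per link (168 is the value quoted for the ENoC)
    p_link_per_mm: float    # link power per wire per mm
    p_buffer: float         # per-hop buffer power per wire
    p_crossbar: float       # per-hop crossbar power per wire
    p_static: float         # per-hop static power per wire


def p_enoc(p: ElectricalPowerParams, length_mm: float, hop_count: int) -> float:
    """Eq. (9): power of one packet crossing the electrical Spidergon."""
    per_hop = p.p_buffer + p.p_crossbar + p.p_static
    return p.bus_width * (p.p_link_per_mm * length_mm + hop_count * per_hop)


def p_electrical(p: ElectricalPowerParams, length_mm: float, hop_count: int) -> float:
    """Eq. (7): path-reservation plus path-teardown packet, assumed to cost the same."""
    return 2 * p_enoc(p, length_mm, hop_count)


def p_conoc(p_packet: float, p: ElectricalPowerParams, length_mm: float, hop_count: int) -> float:
    """Eq. (8): optical packet power plus electrical control-packet power."""
    return p_packet + p_electrical(p, length_mm, hop_count)
```

With unit-valued placeholder parameters, a 2 mm, single-hop path gives 5 units for Eq. (9) and 10 units for Eq. (7); real comparisons need the Table 3 values.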

Fig. 4. Comparing CONoC with ENoC for HopCount = 1. (a) Power consumption, (b) data transmission delay and (c) energy spent.

Similar to PElectrical, PENoC is a function of two parameters, l and HopCount.

3.3. Energy estimation

For transmitting optical messages through CONoC, electrical energy is consumed by the path-reservation and teardown packets, while optical energy is dissipated by routing the optical data through the network. Hence, the total energy spent for transmitting a data message is computed as follows:

$$E_{CONoC} = E_{Optical} + E_{Electrical} = T_{Optical} \times P_{Optical} + T_{Electrical} \times P_{Electrical} \qquad (10)$$

where the delay and power values are estimated from Eqs. (1), (2), (6), and (7). Similarly, we compute the electrical energy spent by each packet in ENoC as follows:

$$E_{ENoC} = T_{ENoC} \times P_{ENoC} \qquad (11)$$

where the delay and power values are estimated from Eqs. (4) and (9).

3.4. Lower bound limit on optical link length

Decreasing the link length between adjacent routers and increasing the degree of multiplexing in an optically interconnected SoC may degrade performance for data transmissions between neighboring nodes. To evaluate the efficiency of CONoC, we compare TCONoC, PCONoC, and ECONoC with TENoC, PENoC, and EENoC, respectively, for varying values of link length and degree of multiplexing. When estimating the power, delay, and energy values of the optical and electrical networks in the previous section, we assumed non-saturated operation of the electrical network and a negligible waiting interval for the optical network. These simplifying, but reasonable, assumptions lead to acceptable estimates, from which we will extract the lower bound limit on the optical link length.
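Since every energy term in Eqs. (10) and (11) is a delay-power product, the comparison can be sketched directly from the delay and power estimates. The function names below are hypothetical, and the inputs are placeholders for the values of Eqs. (1), (2), (4), (6), (7), and (9).

```python
def e_conoc(t_optical: float, p_optical: float,
            t_electrical: float, p_electrical: float) -> float:
    """Eq. (10): optical data energy plus electrical control-packet energy."""
    return t_optical * p_optical + t_electrical * p_electrical


def e_enoc(t_enoc: float, p_enoc: float) -> float:
    """Eq. (11): energy of one packet in the electrical Spidergon."""
    return t_enoc * p_enoc


def energy_ratio(t_optical: float, p_optical: float, t_electrical: float,
                 p_electrical: float, t_enoc: float, p_enoc: float) -> float:
    """E_CONoC / E_ENoC; values below 1 mean the optical network spends less energy."""
    return e_conoc(t_optical, p_optical, t_electrical, p_electrical) / e_enoc(t_enoc, p_enoc)
```

Because of the product form, a configuration with lower power can still lose on energy when its delay is longer, which is exactly the small-WDM effect discussed for Fig. 4c.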

As given by Eqs. (3), (4), (8), and (9), TENoC and PENoC depend on two parameters, l and HopCount, while TCONoC and PCONoC depend on three parameters, WDM, l, and HopCount. Hence, to evaluate the impact of path length (l) and degree of multiplexing (WDM) on the efficiency of CONoC, we compare the total delay, power, and energy values of the optical and electrical networks for different constant values of the HopCount parameter. The results of this comparison for HopCount = 1, computed in MATLAB, are shown in Fig. 4. As depicted in Fig. 4a, the total power consumption of CONoC exceeds that of ENoC for large values of WDM and short paths (small l). The power increase of CONoC for large values of WDM results from the high power overhead of the opto-electrical conversions in the transmitter and receiver nodes. On the other hand, while PCONoC depends only weakly on the length of the optical path, PENoC grows rapidly for large values of l: the high power consumption of electrical wires, compared to the low propagation loss of optical waveguides, makes ENoC consume considerably more power than CONoC for moderate and large values of l. Therefore, we can conclude from Fig. 4a that the optical on-chip network loses its efficiency for large values of WDM and short optical paths.

Fig. 4b compares the total data transmission delay through CONoC with that of ENoC. As shown in this figure, the degree of multiplexing strongly affects TCONoC. While optical data transmission results in a smaller total latency for large values of WDM, ENoC substantially outperforms CONoC for WDM < 4. Hence, to reduce TCONoC we should modulate the optical data on several wavelength channels, which increases the power of the optical network, as illustrated in Fig. 4a. Moreover, both TCONoC and TENoC depend only weakly on the path length (l) between the source and destination routers, which equals the hop length in the case of HopCount = 1. This weak dependence results from the propagation delay being small compared to the transmission delay in both optical and electrical wires.

To emphasize the tradeoff between delay and power consumption in CONoC, Fig. 4c compares ECONoC with EENoC. As shown in this figure, for small values of WDM we have ECONoC > EENoC, which results from the larger data transmission delay of CONoC compared to ENoC for small values of WDM (as depicted in Fig. 4b). On the other hand, for short optical paths, even for large values of WDM, we have ECONoC > EENoC, which arises from the larger power consumption of CONoC compared to ENoC for small values of l. Therefore, we can conclude that the optical on-chip network outperforms traditional NoCs only when the length of the optical paths and the degree of multiplexing satisfy certain lower bound limits.

Fig. 5. ECONoC/EENoC ratio for different values of the HopCount parameter. (a) HopCount = 1, (b) HopCount = 2 and (c) HopCount = 3.

Fig. 5 shows the ECONoC/EENoC ratio for different values of the HopCount parameter. CONoC outperforms ENoC for a pair (l, WDM) if ECONoC(l, WDM)|HopCount < EENoC(l, WDM)|HopCount. For example, from Fig. 5a we can deduce that if HopCount and WDM equal one and four, respectively, CONoC outperforms ENoC for l ≥ 3 mm. As depicted in Fig. 5b, ECONoC < EENoC for all values of WDM when l ≥ 7 mm. Since HopCount equals two in this figure, this path length is spread over two hops, which leads to the same lower bound limit as obtained for HopCount = 1, i.e., about 3 mm between adjacent optical routers. Finally, as shown in Fig. 5c, in the case of HopCount = 3, the advantage of CONoC over ENoC is clear for all values of l and WDM. The same result is obtained for HopCount > 3, but the corresponding results are not shown for brevity.
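The way a lower bound is read off Fig. 5 can be sketched as a grid search: for fixed WDM and HopCount, find the smallest path length from which CONoC stays cheaper than ENoC for every longer length on the grid. The energy models passed in are placeholders; the toy functions below are illustrative only and are NOT the paper's Eqs. (3) to (11). All names are hypothetical.

```python
from typing import Callable, Iterable, Optional

# (path length l in mm, WDM, HopCount) -> energy per message
EnergyModel = Callable[[float, int, int], float]


def lower_bound_length(e_conoc: EnergyModel, e_enoc: EnergyModel,
                       lengths_mm: Iterable[float], wdm: int,
                       hop_count: int) -> Optional[float]:
    """Smallest grid value of l from which E_CONoC < E_ENoC holds for every larger l.

    Returns None if CONoC does not win even at the longest length on the grid.
    """
    bound: Optional[float] = None
    for length in sorted(lengths_mm, reverse=True):
        if e_conoc(length, wdm, hop_count) < e_enoc(length, wdm, hop_count):
            bound = length
        else:
            break
    return bound


# Toy models for illustration only (hypothetical coefficients).
def toy_conoc(length_mm: float, wdm: int, hop_count: int) -> float:
    return 0.5 * length_mm + 4.0 + 0.1 * wdm


def toy_enoc(length_mm: float, wdm: int, hop_count: int) -> float:
    return 2.0 * length_mm + 1.0 * hop_count


if __name__ == "__main__":
    grid = [0.5 * k for k in range(1, 21)]  # 0.5 mm ... 10 mm
    for wdm in (4, 8, 16):
        print(wdm, lower_bound_length(toy_conoc, toy_enoc, grid, wdm, hop_count=1))
```

Scanning from the longest length downward and stopping at the first loss makes the result a true threshold rather than an isolated crossing point.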

4. Proposed hierarchical architecture

In this section, we propose a novel hierarchical NoC architecture, referred to as H2NoC (Hierarchical Hybrid NoC). The proposed hierarchical architecture overcomes the scalability problem discussed in the previous section while utilizing global optical links. H2NoC, built upon CONoC, introduces a fully contention-free structure and eliminates the main limitations of CONoC.

4.1. Topology

From the previous section we know that the optical on-chip network loses its efficiency when the global optical connections between adjacent routers in CONoC are shorter than 3 mm. This lower bound limit on the optical link length restricts the maximum number of processing cores in an optically interconnected SoC that can retain the advantages of optical NoCs over traditional NoCs. Moreover, the preference of the optical infrastructure over traditional networks is evident when each packet passes more than two hops. In other words, if a packet passes two hops or fewer, the link length between adjacent routers must satisfy the lower bound limit to preserve the advantages of optical over electrical data transmission. Based on these observations, we propose a hierarchical architecture that uses a local electrical network when HopCount ≤ 2 and a global optical network when HopCount > 2 to transmit data packets between processing cores. H2NoC increases the optical link length between adjacent optical routers and reduces the number of optical routers in the network. Consequently, it retains the efficiency of the network.

H2NoC is composed of local networks of processing cores hierarchically interconnected by a global optical network. In H2NoC, data transmissions between neighboring nodes in a local network are accomplished through electrical links, while optical data streams are responsible for data exchange between local networks through global optical links. Due to the inefficiency of the electrical network for HopCount > 2, the diameter of the local network should be at most two. The topologies of the optical and electrical networks can be chosen independently. Due to the efficiency of the optical Spidergon topology, our proposed hierarchical architecture is built upon CONoC [15] as its global optical network. On the other hand, the Torus topology is a good tradeoff between network diameter, connectivity, and area requirement for on-chip network architectures with a few nodes. This advantage motivates us to implement the local networks of H2NoC on top of this topology. Due to the upper bound limit on the network diameter, a 3 × 3 Torus, whose network diameter equals two, is preferred.

Fig. 6. H2NoC architecture.

Fig. 6 illustrates an example of H2NoC built upon a 16-node CONoC. Based on the NCFOR architecture, optical routers are implemented as described in Section 2, but their ejection/injection channels are connected to electrical networks instead of local IPs. Therefore, each local network (schematically shown as an enlarged view in the figure) is associated with an optical router. Local electrical networks are implemented as 3 × 3 Torus topologies. Eight of the nine electrical routers in each local network are connected to local IPs. The remaining one, schematically located at the center of the Torus topology, is connected to the optical network through an optical link (not shown in the figure). In addition to electrical routing capability, the central node must be able to receive (transmit) optical (electrical) data streams targeted to (originated from) a local IP in the corresponding electrical network and perform optical/electrical (electrical/optical) conversions. We will refer to this node as OTER (Optical Transceiver Electrical Router) in the remaining sections of the paper.

Since each local network contains eight processing cores and the global network uses the Spidergon topology, which requires an even number of nodes, it is easy to show that the total number of processing cores in the hierarchical network should be a multiple of 16. If this condition is not satisfied, dummy electrical and/or optical routers should be included in the local and/or global network, respectively.

4.1.1. Routing algorithm

Similar to CONoC, H2NoC adopts the Across-first algorithm for routing optical packets through the global optical network. However, in some special cases, discussed in more detail later, H2NoC may route optical packets adaptively. On the other hand, the Torus topology uses XY routing as its deterministic routing algorithm. If the source and destination IPs belong to the same electrical network, data packets are routed through the local network without utilizing global optical links. Otherwise, the local network routes electrical data packets from the source IP to the corresponding OTER. The Optical Transceiver converts these electrical packets to optical data stream(s) and sends them to the associated optical router, where they are injected into the optical network. After being routed through CONoC, the optical data streams are ejected at the destination optical router. They are sent to the OTER of the corresponding local network, where the optical streams are converted back to electrical data packets. Finally, routed through the local network, the data packets are received by the destination IP.
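The two-level routing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the core numbering (eight consecutive ids per local network, placed row-major around the central OTER) is hypothetical, and `across_first` follows the Across-first rule as it is usually described for Spidergon (stay on the ring within a quarter of the ring, otherwise take the across link first); the paper's exact tie-breaking may differ.

```python
from typing import Dict, List, Tuple

CORES_PER_LN = 8
TORUS_DIM = 3
OTER_POS = (1, 1)  # centre router of the 3 x 3 torus hosts the OTER
# Hypothetical numbering: local IP k sits at the k-th non-centre position, row-major.
IP_POSITIONS = [(x, y) for y in range(TORUS_DIM) for x in range(TORUS_DIM)
                if (x, y) != OTER_POS]


def locate(core: int) -> Tuple[int, Tuple[int, int]]:
    """Global core id -> (local-network index, (x, y) position in its torus)."""
    ln, k = divmod(core, CORES_PER_LN)
    return ln, IP_POSITIONS[k]


def torus_xy_hops(src: Tuple[int, int], dst: Tuple[int, int]) -> int:
    """Hops taken by XY routing on the torus, using the shorter way around each ring."""
    hops = 0
    for a, b in zip(src, dst):
        d = abs(a - b)
        hops += min(d, TORUS_DIM - d)
    return hops


def across_first(src: int, dst: int, n: int) -> List[str]:
    """Across-first route between routers of an n-node Spidergon (n even)."""
    d = (dst - src) % n
    if d <= n // 4:
        return ["cw"] * d
    if d >= n - n // 4:
        return ["ccw"] * (n - d)
    rest = (d - n // 2) % n  # clockwise distance left after the across link
    ring = ["cw"] * rest if rest <= n // 4 else ["ccw"] * (n - rest)
    return ["across"] + ring


def route(src_core: int, dst_core: int, n_cores: int) -> Dict[str, object]:
    """Local torus hops plus the global optical hop sequence (empty if local)."""
    if n_cores % 16:
        raise ValueError("pad with dummy routers to a multiple of 16 cores")
    s_ln, s_pos = locate(src_core)
    d_ln, d_pos = locate(dst_core)
    if s_ln == d_ln:
        return {"local_hops": torus_xy_hops(s_pos, d_pos), "optical": []}
    local = torus_xy_hops(s_pos, OTER_POS) + torus_xy_hops(OTER_POS, d_pos)
    return {"local_hops": local,
            "optical": across_first(s_ln, d_ln, n_cores // CORES_PER_LN)}
```

In a 3 × 3 torus no two routers are more than two hops apart, so every purely local transfer stays within the HopCount ≤ 2 regime where electrical links are preferred.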

4.2. H2NoC enhancement over CONoC

CONoC is not capable of simultaneously transmitting (receiving) multiple packets at the same optical router. Since each optical router in CONoC is connected to one local IP, these limitations have a negligible impact on the total performance of the network, as reported by the authors in [15]. However, associating one optical router with each local network in H2NoC forces the OTER to transmit data streams between the global optical network and different IPs in the local network. Because of the concurrent operation of eight different IPs in a local network, multiple packets must be simultaneously transmitted (received) by the corresponding OTER. Hence, the above limitations of CONoC would significantly degrade the total performance of the hierarchical network. In this paper, the global optical network overcomes these limitations of CONoC, such that simultaneous packets can be injected into (ejected from) the network by the same optical router, and simultaneous packets can pass through the across link.

Contention-free operation of CONoC is accomplished by having a dedicated wavelength set for each optical router. In H2NoC, distinct wavelength sets are assigned to different IPs in a local network. Hence, multiple data streams targeted to different processing cores in a local network can be multiplexed on the same optical waveguides and ejected from the global optical network to the corresponding OTER at the same time. Concurrent operation of the local IPs requires eight different sets of wavelengths to be assigned to each optical router. Moreover, the EMRs of each router switch on when the wavelengths of the optical data streams match one of the wavelengths dedicated to the corresponding local network. Therefore, the modulation wavelengths of an optical data stream targeted to a specific processing core depend both on its index and on the local network in which it is located.

As discussed in [15], the number of distinct wavelengths required in the network depends on the maximum number of multiplexed optical flows on a waveguide (referred to as the maximum degree of multiplexing). We have also shown that the network diameter specifies the maximum degree of multiplexing, which equals ⌊N/4⌋ in an N-node Spidergon topology. In H2NoC, each optical router of the Spidergon network requires eight different sets of wavelengths. Hence, the number of distinct wavelength sets required in an N-node global optical network equals 8 × ⌊N/4⌋. It can be shown that contention-free operation of H2NoC is guaranteed if the same wavelength set is assigned to IPi located in LNm and IPj located in LNn when m ≡ n (mod ⌊N/4⌋) and i ≡ j (mod 8). Below, we briefly compare H2NoC with CONoC in terms of the required number of wavelength sets. To interconnect N processing cores, H2NoC utilizes a ⌊N/8⌋-node optical Spidergon network and places eight processing cores in the local network associated with each optical router. Based on the above discussion, the number of wavelength sets required by H2NoC equals 8 × ⌊⌊N/8⌋/4⌋. On the other hand, to interconnect N processing cores, CONoC requires ⌊N/4⌋ wavelength sets. Since the number of O/E and E/O converters specifies the total complexity of optical on-chip networks, the above discussion results in analogous levels of complexity for the CONoC and H2NoC architectures. Therefore, the proposed hierarchical on-chip architecture does not increase design complexity, while it overcomes the lower bound limit on the optical link length.

The concurrent operation of eight processing cores in a local network forces the corresponding optical router to simultaneously inject multiple optical data streams into the optical network. To satisfy this requirement, we propose a proper injection scenario, discussed in more detail later, which overcomes the previously described injection constraint of CONoC.

Fig. 7. Optical Transceiver unit of OTER.

Table 4
Switch reservations to overcome blocking scenarios (paths are written as input port → output port; Sw1/Sw2 is the microring switched on for Path1/Path2, an alternative choice is given in parentheses, and – means that no switch is needed).

0 ≤ i < N/2                             | N/2 ≤ i < N
Path1     Sw1       Path2     Sw2       | Path1     Sw1       Path2     Sw2
T4 → R3   S7        T1 → R3   –         | T4 → R3   S7        T1 → R3   –
T1 → R2   S9        T1 → R3   –         | T2 → R3   S6        T1 → R3   –
T1 → R2   S9(S1)    T4 → R3   S1(S7)    | T1 → R4   S9        T1 → R3   –
T2 → R3   S6        T1 → R3   –         | T1 → R4   S1(S9)    T2 → R3   S6
T2 → R1   S5        T3 → R1   –         | T2 → R1   S5        T3 → R1   –
T4 → R1   S8        T3 → R1   –         | T3 → R4   S3(S10)   T2 → R1   S5(S3)
T3 → R2   S4(S10)   T4 → R1   S8        | T3 → R4   S10       T3 → R1   –
T3 → R2   S10       T3 → R1   –         | T4 → R1   S8        T3 → R1   –
T3 → R2   S4(S10)   T2 → R1   S5(S3)    | T4 → R3   S7(S1)    T1 → R4   S2(S9)
T3 → R2   S4(S10)   T4 → R3   S7(S1)    | T2 → R1   S5(S3)    T1 → R4   S2(S9)
T2 → R1   S5(S3)    T4 → R3   S1(S7)    | T2 → R1   S5(S3)    T4 → R3   S1(S7)

4.3. Optical Transceiver/Electrical Router

OTER, a key component of H2NoC, is responsible for routing electrical data packets within the local network and for transmitting data streams between the global optical network and the different IPs in the local network. While the structure of the Electrical Router is conventional, Fig. 7 shows a simple block diagram of the Optical Transceiver (OT) unit in the OTER, which consists of optical receiver and optical transmitter units. To deliver optical data streams from the global optical network to the corresponding local IPs, an array of wavelength-selective filters demultiplexes the optical data streams targeted to different IPs in the local network. Next, the optical data streams are demodulated and converted to electrical signals. Since wavelength routing is used, the address of the destination IP is not contained in the optical data stream but is instead encoded in the wavelength of the optical signal. Hence, to route the resulting electrical packets, the index of the destination IP within the local network must be determined from its associated wavelength set and concatenated to the electrical packet. The reverse operations are performed in the opposite direction to transmit data packets from the local network to the global optical network (as shown by the optical transmitter unit in the figure).
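The wavelength-set rules of Sections 4.2 and 4.3 can be sketched with one concrete numbering. This is a minimal sketch, assuming the hypothetical numbering set index = (m mod ⌊N/4⌋) × 8 + i for IP i in local network m, which satisfies the sharing rule and yields 8 × ⌊N/4⌋ distinct sets; the OTER then recovers the destination IP index from the set alone. The flat-CONoC count of ⌊N/4⌋ sets is likewise an assumption based on the degree-of-multiplexing argument. All names are hypothetical.

```python
IPS_PER_LN = 8


def wavelength_set(ln: int, ip: int, n_routers: int) -> int:
    """Hypothetical set index for IP `ip` in local network `ln` of an N-router Spidergon.

    Two IPs share a set exactly when their LN indices agree modulo floor(N/4)
    and their IP indices agree modulo 8, giving 8 * floor(N/4) distinct sets.
    """
    groups = n_routers // 4  # maximum degree of multiplexing of the global network
    return (ln % groups) * IPS_PER_LN + ip % IPS_PER_LN


def decode_ip(set_index: int) -> int:
    """OTER receiver side: destination IP index recovered from the wavelength set."""
    return set_index % IPS_PER_LN


def sets_needed(n_cores: int) -> dict:
    """Wavelength-set counts for H2NoC and (assumed) for a flat CONoC of n_cores."""
    h2noc = 8 * ((n_cores // IPS_PER_LN) // 4)
    conoc = n_cores // 4
    return {"H2NoC": h2noc, "CONoC": conoc}


if __name__ == "__main__":
    print(sets_needed(128))
    print(wavelength_set(6, 5, 16), decode_ip(wavelength_set(6, 5, 16)))
```

For the 128-core configuration both counts come out at 32 wavelength sets, which illustrates the claim that the hierarchy does not increase the number of sets.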

4.4. Implementation cost

OTER, as a key component of the proposed hierarchical architecture (Fig. 7), consists of electro-optical (E/O and O/E) converters and optical modulators, demodulators, band-pass filters (demultiplexers), and multiplexers.

Regarding the transmitter unit of the OTER architecture, E/O converters and optical modulators are used to encode a high-speed electronic signal on constant-wave laser light, thus converting it to a stream of light pulses. As discussed previously, SOI technology provides very high light confinement, allowing small bending radii (a few micrometers) and ultra-dense integration [19]. SOI-based microring devices are able to implement electro-optic modulators, passive filters, and all-optical switches [19]. For example, silicon microring resonator-based modulators fabricated by a group of researchers at Cornell University exhibit good optical properties accompanied by unprecedentedly low power consumption, a small footprint, and modulation rates of up to 12.5 Gbps [27]. Hence, we propose to implement the E/O converters and optical modulators using active microring resonator devices. Moreover, the optical multiplexers of the transmitter unit are simply implemented as Y-couplers of optical waveguides. On the other hand, in the receiver unit of the OTER, the optical demodulators preceded by O/E converters can be simply implemented as wavelength-tuned photodetectors, while the band-pass filters (demultiplexers) are built from passive microring switches. Therefore, by taking advantage of low-power, compact microring resonators, the proposed architecture for hierarchically transmitting data packets through H2NoC leads to a tolerable area overhead and power dissipation.

Regarding network scalability, the number of optical waveguides in bus-based optical architectures [14,22,26] increases linearly with the network size, which limits the scalability of those networks. In contrast, H2NoC takes advantage of the WDM technique and multiplexes several optical flows, targeted to different destination nodes, on a single waveguide, which reduces the area overhead of the global optical network and hence of the hierarchical network.

4.5. Data transmission through H2NoC

While electronic NoCs can easily provide header processing and data buffering functions, neither optical buffers nor all-optical processing units can be implemented in a chip-scale area [25]. Because of these limitations, CONoC has been proposed to transmit data streams through the optical NoC while control packets are routed and processed electrically. Since H2NoC is built upon CONoC, optical data transmission through its global network is preceded by a path-reservation phase. However, resource reservation is not required in the local electrical networks. The data transmission phases of H2NoC are described in more detail below.

4.5.1. Optical path reservation

For optical data transmission through the global network, a photonic path must be set up between the optical routers associated with the source and destination local networks.

H2NoC prevents blocking scenarios at intermediate routers. However, although H2NoC allows each optical router to receive optical data simultaneously from different transmitters, the number of wavelength sets dedicated to each processing core limits the number of concurrent optical data receptions by each local IP. Consequently, the path-reservation phase is preceded by checking whether the corresponding destination IP has any unoccupied wavelength set. For this purpose, a destination-checking packet is routed from the source to the destination optical router without reserving optical resources at the intermediate routers. Specifically, control packets are transmitted through an electrical Spidergon topology that interconnects the optical routers associated with the various local networks. Like optical data streams, destination-checking packets are routed according to the Across-first routing algorithm and are ejected by the optical router associated with the destination local network. If all ejection wavelength channels dedicated to the destination IP are occupied by other optical messages, the destination-checking packet is sent back to the transmitter IP and informs it to attempt the request transmission again [15]. Otherwise, a path-reservation packet is transmitted from the destination processing core back to the source and reserves the proper optical resources along its optical path through the global network.

Since switching OAD elements increases optical loss, the extra microring resonators of NCFOR are switched on only if path contention occurs; otherwise, they let light pass through unaffected. This means that, while reserving optical resources, optical routing through the primary microring resonators of the BPSB is preferred. Based on this approach, the microring resonator(s) to be switched on in each of the blocking scenarios (from Table 1) are listed in Table 4. In each scenario, Sw1 and Sw2 are switched on to remove the path blocking and to route optical data streams through Path1 and Path2, respectively. While Table 4 lists microring reservations in the case of contention, path reservation is straightforward in the absence of path contention. In some situations, there is more than one solution for simultaneously routing contending paths. For example, concurrent routing of T4 → R3 and T1 → R4 for N/2 ≤ i < N can be accomplished by switching on either the S7–S2 or the S1–S9 pair of microring resonators. It is worth noting that, owing to the multiple data reception and transmission capability of optical routers and the multiplexing capability of intermediate routers, each of the contending paths shown as input-port → output-port can be composed of multiple optical data streams.
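Table 4 can be used as a lookup during path reservation. The sketch below transcribes its rows and returns the feasible switch pairs for two contending paths. The pairing rule (primary entries together, parenthesised alternatives together) is our reading of the table, consistent with the S7–S2 / S1–S9 example in the text; the function names and the ASCII path notation are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Table 4 as (Path1, Sw1, Path2, Sw2); "-" = no switch, "Sa(Sb)" = Sa or alternative Sb.
LOWER_HALF = [  # 0 <= i < N/2
    ("T4->R3", "S7", "T1->R3", "-"), ("T1->R2", "S9", "T1->R3", "-"),
    ("T1->R2", "S9(S1)", "T4->R3", "S1(S7)"), ("T2->R3", "S6", "T1->R3", "-"),
    ("T2->R1", "S5", "T3->R1", "-"), ("T4->R1", "S8", "T3->R1", "-"),
    ("T3->R2", "S4(S10)", "T4->R1", "S8"), ("T3->R2", "S10", "T3->R1", "-"),
    ("T3->R2", "S4(S10)", "T2->R1", "S5(S3)"), ("T3->R2", "S4(S10)", "T4->R3", "S7(S1)"),
    ("T2->R1", "S5(S3)", "T4->R3", "S1(S7)"),
]
UPPER_HALF = [  # N/2 <= i < N
    ("T4->R3", "S7", "T1->R3", "-"), ("T2->R3", "S6", "T1->R3", "-"),
    ("T1->R4", "S9", "T1->R3", "-"), ("T1->R4", "S1(S9)", "T2->R3", "S6"),
    ("T2->R1", "S5", "T3->R1", "-"), ("T3->R4", "S3(S10)", "T2->R1", "S5(S3)"),
    ("T3->R4", "S10", "T3->R1", "-"), ("T4->R1", "S8", "T3->R1", "-"),
    ("T4->R3", "S7(S1)", "T1->R4", "S2(S9)"), ("T2->R1", "S5(S3)", "T1->R4", "S2(S9)"),
    ("T2->R1", "S5(S3)", "T4->R3", "S1(S7)"),
]


def options(cell: str) -> List[Optional[str]]:
    """'S9(S1)' -> ['S9', 'S1'], 'S7' -> ['S7'], '-' -> [None]."""
    if cell == "-":
        return [None]
    m = re.fullmatch(r"(S\d+)(?:\((S\d+)\))?", cell)
    if m is None:
        raise ValueError(f"unexpected cell {cell!r}")
    return [g for g in m.groups() if g is not None]


def resolve(path_a: str, path_b: str,
            upper_half: bool) -> List[Tuple[Optional[str], Optional[str]]]:
    """Switch choices (for path_a, path_b) that let both contending paths proceed."""
    for p1, s1, p2, s2 in (UPPER_HALF if upper_half else LOWER_HALF):
        if {p1, p2} != {path_a, path_b}:
            continue
        o1, o2 = options(s1), options(s2)
        pairs = [(o1[min(k, len(o1) - 1)], o2[min(k, len(o2) - 1)])
                 for k in range(max(len(o1), len(o2)))]
        return pairs if p1 == path_a else [(b, a) for a, b in pairs]
    raise KeyError((path_a, path_b))
```

For example, `resolve("T4->R3", "T1->R4", upper_half=True)` yields the two alternatives (S7, S2) and (S1, S9) quoted in the text.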

Because routing through the primary resonators is preferred, an optical path may be reconfigured during data transmission to retain the contention-free functionality of the network. This reconfiguration is inevitable to prevent path blocking when another optical transmission request tries to reserve a path. As an example, consider the case in which an optical path from South to East is established in a photonic router, so S1 is switched on. Now assume that a new path-reservation packet intends to reserve an optical path from West to East in the same router, which requires S1 to be switched off. The proposed reservation strategy allows the new reservation request to reconfigure the existing path(s) such that all optical data streams can be routed through the network without any contention. In this case, S1 is switched off, and the optical data streams on the South port are routed through S7 to the East port of the photonic router.

4.5.1.1. Injection scenario. Unlike in CONoC, optical routers in H2NoC must be able to simultaneously inject multiple optical data streams with different destination IPs into the global optical network. However, considering the NCFOR architecture depicted in Fig. 3, it is easy to show that injecting toward different output ports of an optical router causes path blocking among the injections. A simple solution to the contention problem in these cases is to retry the transmission. Since path-setup latency dominates the total data transmission latency, this obvious solution degrades network performance. On the other hand, because of the low propagation delay through optical waveguides, the total delay is only slightly affected by an increase in the number of hops passed by the optical data streams. Consequently, taking a non-minimal path for a given source–destination pair, unlike postponing the data transmission, does not significantly increase the data transmission latency.

According to the Across-first routing algorithm, it is easy to show that five different paths exist for injecting optical data streams into the network. Consider the case in which an optical router is injecting one or more optical data streams. Now assume that the router intends to inject a new data stream originating from a different local IP associated with this optical router. If the Across-first algorithm tries to inject the new optical stream through a path different from that of the streams currently being injected, contention arises between the newly requested injection path and the existing ones in the optical router. In this case, H2NoC chooses a new path to inject all optical data streams (the existing ones and the new one) into the network. The new path is chosen among all possible paths such that the total number of hops passed by all data streams is minimized. Depending on the chosen path, two cases exist: the new injection path (a) is the same as the existing one(s), or (b) differs from the existing one(s). In the former case, the new optical data stream is injected through the existing injection path(s), while in the latter case, H2NoC's reservation strategy allows the new reservation request to reconfigure the existing injection path(s) such that all optical data streams are injected into the network through the chosen path. Although a non-minimal optical path may be taken through the waveguide, the proposed strategy avoids electrical request retransmission in both cases.
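The path-selection rule (one shared injection path, chosen to minimize the total hop count of all streams) can be sketched as follows. The text counts five injection paths, but their exact enumeration is not given here, so this sketch uses four directional options as a hypothetical stand-in; `hops_via` and `choose_injection_path` are illustrative names, not the paper's implementation.

```python
from typing import Iterable

# Hypothetical candidate set; the paper's five injection paths may differ.
CANDIDATE_PATHS = ("cw", "ccw", "across_cw", "across_ccw")


def hops_via(path: str, src: int, dst: int, n: int) -> int:
    """Hops from router src to dst on an n-node Spidergon if the stream must leave via `path`."""
    d = (dst - src) % n
    after_across = (d - n // 2) % n  # clockwise distance left after taking the across link
    if path == "cw":
        return d
    if path == "ccw":
        return (n - d) % n
    if path == "across_cw":
        return 1 + after_across
    if path == "across_ccw":
        return 1 + (n - after_across) % n
    raise ValueError(f"unknown path {path!r}")


def choose_injection_path(src: int, destinations: Iterable[int], n: int) -> str:
    """Shared injection path minimising the total hop count of all streams (ties: first listed)."""
    dests = list(destinations)
    return min(CANDIDATE_PATHS, key=lambda p: sum(hops_via(p, src, d, n) for d in dests))
```

For instance, on a 16-node ring, streams from router 0 to routers 3 and 9 share the clockwise path (12 hops in total), even though the stream to router 9 alone would prefer across-then-clockwise (2 hops); this is the non-minimal routing the text accepts in exchange for avoiding retransmission.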

When all wavelength channels of a local IP are occupied, H2NoC uses the request retransmission scheme proposed in [15] to postpone optical data transmission. In this situation, the transmitter waits for a waiting interval and then reattempts to reserve the optical path. The duration of the waiting interval significantly influences the path-setup latency and also the total performance of H2NoC. In the global ONoC architecture, the waiting interval is estimated in each router using the number of unsuccessful path-reservation requests for the data already received from the IP block, as well as the average and maximum numbers of unsuccessful request transmissions for the different optical data streams in this router so far.

4.5.2. Data transmission through local and global networks

Data transmissions between neighboring nodes in a local network are accomplished through electrical links. Since resource reservation is not required in this case, a destination IP can receive simultaneous packets regardless of the wavelength sets dedicated to each processing core. Moreover, receiving multiple packets sent from neighboring nodes does not prevent data reception from the OTER through the local network.

Now assume that the source and destination IPs belong to different local networks. In this case, once a path-reservation packet completes its journey from the destination to the source optical router, a chain of silicon microring resonator switches is ready to route the optical data stream from the source to the destination local network. Hence, the optical message can be transmitted through the optical waveguides and switches without buffering.

4.5.3. Optical path-teardown

Complete data reception at the destination router can be checked by examining the message size or an end tag encapsulated in the transmitted optical data. After the optical data transmission completes, a path-teardown packet is sent by the receiver OTER to the transmitter optical router. This packet frees up the path resources to be used by other optical messages.

5. Simulation environment

For functional validation and design exploration of H2NoC, we have developed a behavioral event-driven network simulator for the proposed hierarchical architecture based on the OMNeT++ simulation framework [21]. While Section 7 investigates the effect of network size on the efficiency of H2NoC, the experimental results reported in Section 6 are obtained for a 128-node topology consisting of two levels of hierarchy, i.e., a 16-node Spidergon as the global optical network and 8-node Tori as the local electrical networks. Our analysis is based on a future 22 nm CMOS process technology. The chip size is assumed to be 20 mm along its edge [13], so it is easy to show that each core is about 1.8 mm × 1.8 mm in size. Considering the 16-node global network, optical and electrical links between adjacent optical routers have an approximate length of 5 mm. On the other hand, based on the 8-node Torus topology, adjacent electronic routers in a local network are spaced 1.8 mm apart. According to the delay parameters from Table 2, the optical and electrical delays between adjacent routers in the global network are 77 ps and 655 ps, respectively. Moreover, the electrical delay between adjacent electronic routers in the local networks is approximately 236 ps. Since the local link to the IP block in each electronic router is shorter than the others, we assume a length of 0.5 mm for local electrical links, which leads to a delay of 65.5 ps for electrical data transmission between each electronic router and its associated IP block. For error-free transmission through an on-chip network, a bit error rate (BER) of 10^−15 is assumed on both optical and electrical links [20].
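The geometric and delay figures above are mutually consistent, which a few lines of arithmetic confirm. This sketch assumes a 20 mm chip edge (what 128 cores of about 1.8 mm × 1.8 mm imply) and that wire delay scales linearly with length, so per-mm delays are derived from the quoted 5 mm global-link values; the function names are hypothetical.

```python
import math

# Per-mm delays implied by the quoted 5 mm global-link delays (assumes linear scaling).
OPTICAL_PS_PER_MM = 77.0 / 5.0       # 15.4 ps/mm
ELECTRICAL_PS_PER_MM = 655.0 / 5.0   # 131 ps/mm


def core_edge_mm(chip_edge_mm: float, n_cores: int) -> float:
    """Edge length of a square core when n_cores tile a square chip."""
    return math.sqrt(chip_edge_mm ** 2 / n_cores)


def electrical_delay_ps(length_mm: float) -> float:
    """Repeated electrical wire delay for a link of length_mm."""
    return ELECTRICAL_PS_PER_MM * length_mm


def optical_delay_ps(length_mm: float) -> float:
    """Optical waveguide propagation delay for a link of length_mm."""
    return OPTICAL_PS_PER_MM * length_mm


if __name__ == "__main__":
    print(f"core edge: {core_edge_mm(20.0, 128):.2f} mm")            # about 1.8 mm
    print(f"local link (1.8 mm): {electrical_delay_ps(1.8):.1f} ps")  # about 236 ps
    print(f"IP link (0.5 mm): {electrical_delay_ps(0.5):.1f} ps")     # 65.5 ps
```

The derived 131 ps/mm reproduces both the 236 ps local-link delay and the 65.5 ps IP-link delay, while the optical link is roughly 8.5 times faster per millimetre.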

To evaluate the efficiency of the proposed hierarchical architecture, it can be simulated under either synthetic traffic models or real application traffic patterns. While the latter approach yields more accurate results, it confines the conclusions to specific sets of traffic flows and prevents us from drawing general conclusions. Moreover, determining the message flows in an on-chip network requires implementing, or at least simulating, the whole application on the MPSoC. On the other hand, after the partitioning, mapping, and process allocation stages of MPSoC design, the traffic load between different nodes can be specified fairly accurately and summarized in terms of popular synthetic traffic models. For example, if one memory node acts as a global memory for all other nodes, we can predict that a certain percentage of the traffic follows the hotspot model. Moreover, behavioral simulation of on-chip infrastructures under synthetic traffic models allows the simulation results to be generalized to a wide range of real applications with similar traffic behavior. Given these advantages of on-chip traffic modeling, we specify the simulated traffic in terms of predefined synthetic traffic models in this case study; real application traffic patterns will be addressed in future work.

S. Koohi, S. Hessabi / Journal of Systems Architecture 57 (2011) 4–23 15

The traffic pattern in an on-chip network is modeled by the packet size distribution, the packet injection process, and the distribution of packet destinations. In our H2NoC simulator, packet sizes are drawn uniformly from a predefined range. To further explore the efficiency of the proposed architecture, we investigate the effect of the packet size distribution on H2NoC in the next section. In all other analyses, unless stated otherwise, packet sizes are assumed to be uniformly distributed in the range [1 KB, 4 KB]. The packet injection process strongly affects the load offered to the network. We model the inter-message gap as a Poisson random variable with parameter $\lambda$; large values of $\lambda$ impose light traffic on the network. Finally, the traffic pattern depends strongly on the distribution of packet destinations. We analyze H2NoC under two traffic patterns: uniform and local. In the first case, each processing core sends its messages to every other core in the hierarchical network with equal probability, while in the latter case, the source and destination IPs of a packet belong to the same electrical network with a predefined probability.
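A synthetic traffic generator along these lines might look as follows. This is a sketch under explicit assumptions, not the authors' simulator: we read "a Poisson random variable with parameter $\lambda$" as a Poisson arrival process, i.e. exponentially distributed gaps with mean $\lambda$ (consistent with larger $\lambda$ giving lighter traffic); each core injects independently; and remote destinations are chosen uniformly among cores in other local networks. All function and field names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

KB = 1024


@dataclass
class Packet:
    src: int
    dst: int
    size_bytes: int
    inject_time_us: float


def pick_destination(rng: random.Random, src: int, n: int, cores_per_net: int,
                     local_prob: Optional[float]) -> int:
    net = src // cores_per_net
    if local_prob is None:                       # uniform: any other core
        candidates = [c for c in range(n) if c != src]
    elif rng.random() < local_prob:              # local: same local network
        first = net * cores_per_net
        candidates = [c for c in range(first, first + cores_per_net) if c != src]
    else:                                        # remote: another local network
        candidates = [c for c in range(n) if c // cores_per_net != net]
    return rng.choice(candidates)


def generate_traffic(num_nets: int, cores_per_net: int, mean_gap_us: float,
                     sim_time_us: float, local_prob: Optional[float] = None,
                     size_range: Tuple[int, int] = (1 * KB, 4 * KB),
                     seed: int = 0) -> List[Packet]:
    """Per-core injection with exponential gaps of mean `mean_gap_us`
    (larger mean means lighter load) and uniform packet sizes in `size_range`."""
    rng = random.Random(seed)
    n = num_nets * cores_per_net
    packets = []
    for src in range(n):
        t = rng.expovariate(1.0 / mean_gap_us)
        while t < sim_time_us:
            dst = pick_destination(rng, src, n, cores_per_net, local_prob)
            packets.append(Packet(src, dst, rng.randint(*size_range), t))
            t += rng.expovariate(1.0 / mean_gap_us)
    packets.sort(key=lambda p: p.inject_time_us)
    return packets


if __name__ == "__main__":
    # 16 local networks of 8 cores = 128 nodes, mean gap 1 us, 40% local traffic.
    pkts = generate_traffic(16, 8, mean_gap_us=1.0, sim_time_us=20.0, local_prob=0.4)
    print(len(pkts), "packets")
```

With `local_prob=None` (uniform destinations) in the 128-node configuration, a sender's destination falls in its own 8-core local network with probability 7/127, so only about 5.5% of packets stay local; the local-traffic pattern raises that share explicitly.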

Next, we briefly discuss the wavelength selection approach in H2NoC. The number of wavelength sets required by H2NoC is derived in Section 4. Given that number, and assuming a peak bandwidth of 4 Tbps per optical waveguide in the 128-node H2NoC, each optical data stream should be modulated onto four distinct wavelengths at 40 Gbps each. For multi-wavelength modulation, it is worth noting that optical inter-channel crosstalk is negligible with a channel spacing of 1.3 nm [29].
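The arithmetic behind these numbers can be made explicit with a back-of-envelope WDM budget. The sketch assumes, for illustration only, that the whole 4 Tbps peak bandwidth is carried by equal-rate 40 Gbps channels on a uniform 1.3 nm grid; under that assumption a waveguide holds 100 wavelengths, one four-wavelength stream carries 160 Gbps, 25 such streams fit on a waveguide, and the channels occupy about 130 nm of spectrum. The function name is hypothetical.

```python
def wavelength_budget(peak_gbps: float = 4000.0, rate_per_lambda_gbps: float = 40.0,
                      lambdas_per_stream: int = 4, spacing_nm: float = 1.3) -> dict:
    """Back-of-envelope WDM budget for one waveguide.

    Assumes the whole peak bandwidth is carried by equal-rate channels on a
    uniform wavelength grid (an illustrative assumption)."""
    channels = int(peak_gbps // rate_per_lambda_gbps)        # wavelengths per waveguide
    stream_gbps = lambdas_per_stream * rate_per_lambda_gbps  # bandwidth of one data stream
    streams = channels // lambdas_per_stream                 # concurrent streams per waveguide
    span_nm = channels * spacing_nm                          # spectrum occupied by the channels
    return {"channels": channels, "stream_gbps": stream_gbps,
            "streams": streams, "span_nm": span_nm}


if __name__ == "__main__":
    print(wavelength_budget())
```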

6. Experimental results and analysis

In this section, we report a series of simulation-based experimental results and investigate the efficiency of the proposed architecture. We also study the impact of packet size on the system-level metrics of H2NoC. Unless stated otherwise, these experimental results assume a uniform traffic pattern.

In the preceding sections, we explored the advantages of the proposed architecture by comparing H2NoC qualitatively with previously proposed optical on-chip infrastructures. Owing to their incomplete design details, a quantitative comparison with these architectures cannot be carried out properly. Hence, as in previous studies such as Columbia's architecture [24], Cornell's architecture [14], Corona [26], and Phastlane [7], we include only the traditional NoC and our previously proposed ONoC [15] in the quantitative comparisons. Specifically, we compare H2NoC with CONoC and ENoC for the same number of processing cores at the future 22 nm technology node.

Considering a 128-node Spidergon network, adjacent electronic routers in ENoC are spaced 1.8 mm apart, which leads to an electrical delay of 236 ps between adjacent routers. Similarly, optical and electrical links between adjacent optical routers in CONoC are approximately 1.8 mm long. Consequently, the optical and electrical delays between adjacent routers in the non-hierarchical optical network are 28 ps and 236 ps, respectively. Other parameters for CONoC and ENoC are reported in Section 3. The physical parameters of the global optical and local electrical networks of H2NoC are analogous to those of CONoC and ENoC, respectively.

6.1. Architecture efficiency

As discussed before, the global optical network in H2NoC overcomes a limitation of CONoC: an optical router can receive multiple data streams and forward them to the corresponding OTER at the same time. To emphasize this advantage, Fig. 8 depicts the total number of concurrent optical data streams received by an OTER for varying values of $\lambda$. The figure reports maximum values over the simulation time for all topology nodes in the optical network.

In H2NoC, optical waveguides and switching elements are used only when the source and destination processing cores of a packet belong to different local networks, whereas in CONoC all packets are transmitted through the optical network. Hence, less traffic passes through photonic components in H2NoC than in CONoC. Fig. 9a compares the total number of multiplexed optical flows passing through all resonator switches of NCFOR in H2NoC with that of CONoC for varying values of $\lambda$. Although each optical router in H2NoC routes optical packets to (and from) eight processing cores, this figure confirms that the total optical traffic passing through the optical routers of H2NoC is lower than in CONoC. As in Fig. 8, the figure reports maximum values over the simulation time for all topology nodes in the optical network.

Activating a microring resonator switch imposes drop-port insertion loss (IL). Hence, the number of activated microring resonators should be kept small while the maximum degree of multiplexing is increased. Fig. 9b compares the average number of microring switches in the ON state over all optical routers of H2NoC with that of CONoC for varying values of $\lambda$. As depicted in this figure, the proposed hierarchical architecture reduces the number of activated switches compared to CONoC.

Fig. 8. Maximum number of concurrent flows ejected to an OTER (horizontal axis: Poisson parameter (μs)).

Fig. 9. (a) Maximum number of flows passing through MRs and (b) average number of activated MRs (horizontal axis: Poisson parameter (μs); curves: CONoC, H2NoC).

6.2. Delay analysis

When the source and destination processing cores belong to the same local network in H2NoC, the total latency equals the electrical data transmission delay through that local network ($T_{Local}$). Otherwise, in addition to the electrical delays through the source and destination local networks, the path-setup latency, the photonic data transmission delay through the global optical network, and the path-teardown latency must also be included. The average latency for transmitting data packets through the hierarchical network is then the weighted average of these two cases, where the weights depend on the traffic pattern: the number of packets that each IP sends to neighbor IPs (i.e., IPs in the same local network), as a fraction of the total number of transmitted packets, determines the relative weight of each case.
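The weighted average described above can be written as a small function. This is a minimal sketch: the field names are hypothetical labels for the latency terms listed in the text, any waiting intervals are assumed to be folded into the path-setup term, and the numbers a caller passes in are their own (none are taken from Table 2).

```python
from dataclasses import dataclass


@dataclass
class RemoteLatency:
    """Latency components (ps) of a packet whose source and destination lie
    in different local networks. Field names are hypothetical labels."""
    src_local: float       # electrical delay through the source local network
    path_setup: float      # optical path-setup latency (incl. waiting intervals)
    optical: float         # photonic transmission through the global network
    path_teardown: float   # path-teardown latency
    dst_local: float       # electrical delay through the destination local network

    def total(self) -> float:
        return (self.src_local + self.path_setup + self.optical
                + self.path_teardown + self.dst_local)


def average_latency(t_local: float, remote: RemoteLatency, local_fraction: float) -> float:
    """Weighted average of the two cases; local_fraction is the share of
    packets whose source and destination share a local network."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must lie in [0, 1]")
    return local_fraction * t_local + (1.0 - local_fraction) * remote.total()
```

For uniform traffic in the 128-node configuration (16 local networks of 8 cores), a sender's destination lies in its own local network with probability 7/127, so `local_fraction` is about 0.055; local-traffic patterns correspond to larger values.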

Based on the delay parameters in Table 2, Fig. 10 depicts experimental results for data streams transmitted between $IP_i$ located in $LN_m$ and $IP_j$ located in $LN_n$ ($m \neq n$), including the average optical and electrical latencies through the global and local networks, respectively, and the total transmission latency for varying values of $\lambda$. The figure also shows the total latency when data packets are transmitted only through the corresponding local network. From these experimental results, the weighted average representing the total latency of the hierarchical network is calculated and also depicted in Fig. 10. As shown in this figure, under low traffic the total latency is dominated by data transmission through the optical waveguides. Under high traffic, however, the setup latency dominates the total data transmission latency and degrades the overall performance of H2NoC. Moreover, the electrical latency through the local networks is of the same order as (or even less than) the optical latency through the global network, whether or not the destination IP belongs to the same local network. This figure therefore confirms the efficiency of the electrical network for local transmission in H2NoC.

The duration of the waiting interval significantly affects the path-setup latency and the overall performance of H2NoC. Fig. 11a compares H2NoC with CONoC in terms of the average waiting interval, as a percentage of the total latency, for varying values of $\lambda$. Fig. 11b compares the two architectures in terms of the average number of postponed optical data transmissions as a fraction of the total number of transmitted messages; hence, Fig. 11b depicts the request retransmission probability. As these figures show, under low traffic the request retransmission scheme causes insignificant performance degradation in both architectures, and the impact remains tolerable under heavy traffic. Moreover, both the waiting interval and the retransmission probability are lower in H2NoC than in CONoC. These advantages stem from the fact that less optical traffic passes through the photonic components in H2NoC than in CONoC (Fig. 9a).

Fig. 10. Delay values in H2NoC (ps) (horizontal axis: Poisson parameter (μs); vertical axis: delay (ps), log scale; curves: global network optical delay, global network local delay, global network total delay, local network total delay, weighted average delay).

Fig. 11. (a) Waiting time % and (b) request retransmission probability (horizontal axis: Poisson parameter (μs); curves: CONoC, H2NoC).

Finally, Fig. 12a and b compare the average data transmission delay through H2NoC with that of the Spidergon ENoC and CONoC, respectively, for varying values of $\lambda$. As depicted in Fig. 12a, the traditional NoC has about four times the average latency of H2NoC under low and moderate traffic and approximately the same latency under high traffic, which emphasizes the inefficiency of electrical interconnects for future on-chip communication. Fig. 12b depicts the total data transmission latency of H2NoC as a fraction of that of CONoC. As shown in this figure, CONoC slightly outperforms H2NoC, because the latter replaces optical waveguides with electrical links in the local networks. However, owing to the small network diameter of the 3 × 3 torus topology and the short electrical links, the performance degradation of H2NoC relative to CONoC is negligible.

6.3. Power analysis

When the source and destination processing cores belong to the same local network in H2NoC, the total power consumed for transmitting electrical data packets through that local network is computed from Eq. (9) using the physical parameters of the local electrical networks. Otherwise, in addition to the local power consumption ($P_{Local}$), the power consumed for transmitting a data message through the 16-node CONoC ($P_{Global}$) must also be included, computed from Eq. (8) using the physical parameters of the global optical network. As with the total delay, the average power consumption for transmitting data packets through the hierarchical network is the weighted average of these two cases.

Based on the power parameters in Table 3, the H2NoC simulator calculates $P_{Local}$, $P_{Optical}$, $P_{Electrical}$, and $P_{Global}$ from Eqs. (9), (6), (7), and (8), respectively, for each data stream transmitted between $IP_i$ located in $LN_m$ and $IP_j$ located in $LN_n$ when $m \neq n$. When $m = n$, only $P_{Local}$ needs to be considered. Fig. 13 depicts the average values of $P_{Local}$, $P_{Global}$, and $P_{Non\_Neighbor} = P_{Global} + P_{Local}$ for $m \neq n$, and of $P_{Local}$ for $m = n$, for varying values of $\lambda$. The average power consumption for data transmission through the hierarchical architecture is calculated as the weighted average of $P_{Non\_Neighbor}$ for $m \neq n$ and $P_{Local}$ for $m = n$, which is also depicted in Fig. 13. As shown in this figure, when $m \neq n$, data transmission through the source and destination local networks dissipates nearly half of the total power consumed in H2NoC. On the other hand, for $m = n$, the power consumption is considerably lower than $P_{Non\_Neighbor}$, which emphasizes the efficiency of the proposed hierarchical architecture. As depicted in Fig. 13, the weighted average of the two cases ($m = n$ and $m \neq n$) is considerably affected by locally transmitted packets and is hence lower than $P_{Non\_Neighbor}$.

Fig. 12. (a) Average delay and (b) relative average delay ($T_{H2NoC}/T_{CONoC}$) (horizontal axis: Poisson parameter (μs); vertical axis of (a): delay (ps), log scale, curves ENoC and H2NoC; vertical axis of (b) spans 1.000 to 1.010).

Fig. 13. Power values in H2NoC (horizontal axis: Poisson parameter (μs); vertical axis: power (mW); curves: global network local power, global network global power, global network total power, local network total power, weighted average power).

Fig. 14 compares the average power consumption for data transmission through H2NoC with that of CONoC and the Spidergon ENoC for varying values of $\lambda$. As depicted in this figure, the traditional NoC consumes approximately eight times the average power of H2NoC. Considering the aggregate power consumption of all packets in the network, the Spidergon ENoC cannot meet the package power constraints predicted by the ITRS [13]. Moreover, the experimental results in Fig. 14 confirm that H2NoC reduces the average power consumed for transmitting data packets between processing cores compared to CONoC. This advantage stems from the fact that H2NoC eliminates the power overhead of E/O and O/E conversions for data transmissions between neighbor IPs. The amount of power reduction depends strongly on the traffic pattern and on the fraction of packets transmitted locally, which is discussed in more detail in Section 6.5.

Fig. 14. Power consumption in H2NoC, CONoC, and ENoC (horizontal axis: Poisson parameter (μs); vertical axis: power (mW), log scale).

6.4. Energy analysis

Considering electrical data transmission through the local networks in H2NoC, we have:

$$
E_{Local} = T_{Local} \times P_{Local} \tag{12}
$$

For optical messages transmitted through the global network in H2NoC, as in CONoC, electrical energy is consumed by the path-reservation and path-teardown packets, while optical energy is dissipated by routing the optical data through the network. The sum of these energies for each packet transmitted through the global network is calculated at the destination node as [15]:

$$
\begin{aligned}
E_{Global} &= E_{Optical} + E_{Electrical} \\
&= T_{Optical} \times P_{Optical} + \left(T_{Electrical} - \mathit{WaitingTime}\right) \times P_{Electrical}
\end{aligned} \tag{13}
$$

where $\mathit{WaitingTime}$ is the sum of the waiting intervals computed by the request retransmission scheme for each optical packet. The total energy spent when the source and destination local networks differ is $E_{Non\_Neighbor} = E_{Global} + E_{Local}$. The average energy spent for data transmission through the hierarchical architecture is calculated as the weighted average of $E_{Non\_Neighbor}$ for $m \neq n$ and $E_{Local}$ for $m = n$. Fig. 15a and b compare the average energy spent for data transmission through H2NoC with that of the Spidergon ENoC and CONoC, respectively, for varying values of $\lambda$. As shown in Fig. 15a, the average energy spent on data transmission through the traditional NoC is approximately 32 times that of H2NoC. Hence, we conclude that the energy consumed by traditional NoCs cannot be afforded in future high-performance CMPs. On the other hand, despite a slight delay increase, Fig. 15b shows that the energy spent on data transmission through H2NoC is lower than in CONoC. This reduction arises from the power savings of the proposed hierarchical network.
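Eqs. (12) and (13) and the weighted average can be sketched per packet as follows. This is a minimal illustration, not the simulator's code: the names are hypothetical, units are left to the caller (ps and mW give fJ), and averaging uniformly over all packets is used as the weighted average, since it weights each case by its share of packets.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class GlobalTransfer:
    """Per-packet quantities for a transfer between different local networks."""
    t_optical: float
    p_optical: float
    t_electrical: float
    waiting_time: float   # sum of waiting intervals from request retransmission
    p_electrical: float


def e_local(t_local: float, p_local: float) -> float:
    return t_local * p_local                                      # Eq. (12)


def e_global(x: GlobalTransfer) -> float:
    e_optical = x.t_optical * x.p_optical
    e_electrical = (x.t_electrical - x.waiting_time) * x.p_electrical  # waiting excluded
    return e_optical + e_electrical                               # Eq. (13)


def average_energy(local_pkts: Iterable[Tuple[float, float]],
                   remote_pkts: Iterable[Tuple[float, float, GlobalTransfer]]) -> float:
    """local_pkts: (t_local, p_local) for m = n.
    remote_pkts: (t_local, p_local, GlobalTransfer) for m != n,
    where E_Non_Neighbor = E_Global + E_Local."""
    energies = [e_local(t, p) for t, p in local_pkts]
    energies += [e_global(g) + e_local(t, p) for t, p, g in remote_pkts]
    if not energies:
        raise ValueError("no packets")
    return sum(energies) / len(energies)
```

If energy is taken as delay times power, as in Eq. (12), the roughly fourfold delay ratio (Fig. 12a, low and moderate traffic) and eightfold power ratio (Fig. 14) between ENoC and H2NoC are broadly in line with the roughly 32-fold energy ratio of Fig. 15a; this is only a rough consistency check, because an average of products need not equal the product of averages.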

6.5. Effect of traffic pattern on system-level metrics

In this section, we study the effect of the on-chip traffic pattern on the efficiency of H2NoC. As reported in Sections 6.3 and 6.4, local packet transmission between neighbor nodes reduces the power consumption and energy dissipation of H2NoC compared to the non-hierarchical optical on-chip network (CONoC). Fig. 16 shows the average power consumption, energy dissipation, and data transmission delay of H2NoC as percentages of the corresponding values of CONoC for varying values of the local traffic percentage and a moderate traffic load ($\lambda = 1$ μs). As shown in this figure, increasing the local traffic percentage ($l$) improves the power and energy metrics of H2NoC, while the delay increases slightly. However, the advantage of H2NoC over CONoC diminishes for large values of $l$. This drawback arises from the inability of the local electrical networks to handle large amounts of traffic: for large values of $l$, the local networks may operate in the saturation region, which increases power and energy. As depicted in Fig. 16, there exists an optimum value $l_{opt}$ for which $E_{H2NoC}/E_{CONoC}$ is minimum. Moreover, there exists a threshold value $l_{thr}$ such that $E_{H2NoC}/E_{CONoC} < 1$ for all values of $l$ smaller than $l_{thr}$. From Fig. 16, $l_{opt}$ and $l_{thr}$ equal 0.4 and 0.8, respectively.

Fig. 15. (a) Average energy in H2NoC and ENoC and (b) average energy in H2NoC relative to CONoC (horizontal axis: Poisson parameter (μs); vertical axis: energy (μJ), log scale in (a)).

Fig. 16. Comparing H2NoC with CONoC in terms of $l$ (horizontal axis: local traffic percentage, 0 to 100; vertical axis: percentage of the CONoC value, log scale; curves: delay, power, energy).
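Given sampled values of the energy ratio $E_{H2NoC}/E_{CONoC}$ against $l$, such as the points behind Fig. 16, $l_{opt}$ and $l_{thr}$ can be read off mechanically. The sketch below is a hypothetical helper: it takes $l_{opt}$ as the sample with the smallest ratio and $l_{thr}$ as the first sampled $l$ at which the ratio reaches 1, so the ratio is below 1 for every sampled $l$ below it. Its resolution is limited to the sampling grid; no interpolation is attempted.

```python
from typing import Iterable, Optional, Tuple


def opt_and_thr(samples: Iterable[Tuple[float, float]]) -> Tuple[float, Optional[float]]:
    """samples: (l, energy_ratio) pairs with energy_ratio = E_H2NoC / E_CONoC.

    Returns (l_opt, l_thr): l_opt minimises the ratio over the samples;
    l_thr is the first sampled l whose ratio is >= 1 (None if none is)."""
    pts = sorted(samples)                      # order by local traffic fraction l
    if not pts:
        raise ValueError("need at least one sample")
    l_opt = min(pts, key=lambda p: p[1])[0]
    l_thr = next((l for l, ratio in pts if ratio >= 1.0), None)
    return l_opt, l_thr
```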

In summary, we conclude that the H2NoC architecture outperforms CONoC when the on-chip traffic pattern ranges from uniform to fairly heavy local traffic. However, H2NoC loses its efficiency under intensive local traffic because of the limitations of the local electrical networks.

6.6. Effect of packet size distribution on system-level metrics

As discussed in previous studies [24,7], for small block sizes in a photonic NoC the overall latency is dominated by the path-setup overhead, which exceeds the serialization latency, while for large blocks the increased serialization and contention latencies overshadow the gain in bandwidth. CONoC reduces the path-setup latency compared to the architectures proposed in [24,7]. Hence, small packets are expected to be routed optically without considerable efficiency degradation. To evaluate this statement, this section addresses the effects of the packet size distribution on the efficiency of CONoC and of the hierarchical network built upon it, H2NoC. For this purpose, we analyze the power consumption, data transmission delay, and energy dissipation of CONoC and H2NoC and compare them with those of ENoC as functions of packet size. We also investigate the efficiency of the request retransmission scheme for varying packet sizes.

To evaluate the impact of packet length on the relative advantages of the proposed architecture over the traditional NoC, we assume a wide range of variation for the packet size and divide it into several disjoint ranges. For each predefined range, simulation-based experiments are performed with packet sizes uniformly distributed in that range. As a case study, we assume the wide range [64 B, 256 KB] for the packet size and divide it into six smaller disjoint ranges $[2^{2i}\,\mathrm{B}, 2^{2i+2}\,\mathrm{B}]$, $i \in [3, 8]$.

Fig. 17a shows the average duration of waiting intervals as a percentage of the total data transmission latency in CONoC and H2NoC for varying packet size ranges and a moderate offered load ($\lambda = 1$ μs). Fig. 17b depicts the request retransmission probability in these architectures as a function of packet size. Without loss of generality, each range $[2^{2i}\,\mathrm{B}, 2^{2i+2}\,\mathrm{B}]$ is represented by its upper bound on the horizontal axis. As shown in these figures, for small packet sizes the request retransmission probability in CONoC and H2NoC is negligible owing to the small data transmission latency. Although increasing the packet size degrades performance, this impact remains tolerable even for large optical packets. Moreover, Fig. 17a and b confirm the advantage of the proposed hierarchical architecture over the non-hierarchical one, which results from local data transmission between neighbor nodes.

Considering the delay and power parameters reported in the previous section, Figs. 18–20 compare the average delay, power consumption, and energy dissipation, respectively, for data transmission through the proposed architecture with those of CONoC and ENoC for varying packet size ranges and a moderate offered load ($\lambda = 1$ μs). As shown in these figures, CONoC and H2NoC retain their advantages over the traditional NoC for small packet sizes, and this advantage increases substantially for large packets.

Fig. 17. (a) Waiting time % and (b) request retransmission probability (horizontal axis: packet size, 250 B to 250,000 B on a log scale in (a) and 0 to 250 KB in (b); curves: CONoC, H2NoC).

Fig. 18. Data transmission delay (horizontal axis: packet size (KB); vertical axis: delay (ps), log scale; curves: ENoC, CONoC total delay, H2NoC).

Fig. 19. Power consumption for data transmission (horizontal axis: packet size (KB); vertical axis: power (dBm); curves: ENoC, CONoC, H2NoC).
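The six packet-size ranges $[2^{2i}\,\mathrm{B}, 2^{2i+2}\,\mathrm{B}]$, $i \in [3, 8]$, used in this section can be enumerated and sampled as below. The function names are hypothetical; the sketch only reproduces the range construction and the uniform sampling within a range, and labels each range by its upper bound as on the horizontal axes of Figs. 17–21.

```python
import random
from typing import List, Tuple


def packet_size_ranges(i_min: int = 3, i_max: int = 8) -> List[Tuple[int, int]]:
    """Ranges [2**(2i), 2**(2i+2)] bytes for i = i_min..i_max (inclusive)."""
    return [(2 ** (2 * i), 2 ** (2 * i + 2)) for i in range(i_min, i_max + 1)]


def sample_packet_size(size_range: Tuple[int, int], rng: random.Random) -> int:
    lo, hi = size_range
    return rng.randint(lo, hi)   # uniform within the range, both ends included


if __name__ == "__main__":
    rng = random.Random(0)
    for lo, hi in packet_size_ranges():
        # Each range is labelled by its upper bound on the figures' horizontal axes.
        print(f"[{lo} B, {hi} B] -> label {hi} B, sample {sample_packet_size((lo, hi), rng)} B")
```

The first range is [64 B, 256 B] and the last is [64 KB, 256 KB], which together span the [64 B, 256 KB] interval studied here.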

As depicted in Fig. 18, the proposed hierarchical architecture slightly increases the data transmission delay compared to CONoC for large data packets. However, this performance degradation is negligible compared with the speedup achieved over the traditional NoC.

Fig. 19 compares the power consumption for data transmission through H2NoC with that of CONoC and ENoC for varying packet sizes. As shown in this figure, while increasing the packet length has no considerable impact on the power consumption of CONoC and H2NoC, the power dissipated in the traditional NoC increases with packet size because of the power consumed for data buffering under the high traffic load offered by large data packets. Moreover, as depicted in Fig. 19, the proposed hierarchical architecture reduces the total power consumption compared to CONoC, especially for small or moderate packet sizes.

Finally, Fig. 20 confirms that the energy spent in optical devices for data transmission through CONoC and H2NoC dominates the total energy dissipated in the network for all packet lengths considered. Moreover, as depicted in this figure, the energy dissipation of ENoC relative to CONoC and H2NoC increases significantly with packet size.

To emphasize the advantages of H2NoC and CONoC over the traditional NoC as the packet length varies, Fig. 21a–c depict the ratios of the average data transmission delay, power consumption, and energy dissipation of ENoC, respectively, to the corresponding values of CONoC and H2NoC for varying packet length ranges and a moderate offered load ($\lambda = 1$ μs). As shown in these figures, increasing the packet size intensifies the advantage of the optical NoCs over the traditional NoC. Specifically, while ENoC dissipates approximately 50 times more energy when routing small packets at $\lambda = 1$ μs, for packet lengths in the range [64 KB, 256 KB] the energy spent on data transmission through CONoC and H2NoC is about two orders of magnitude smaller than that of the traditional NoC. As shown in Fig. 21a, the slope of the delay increase in ENoC grows considerably for large packets. This drawback, which results from electrical packet buffering under the heavy traffic imposed by large packets, leads to a considerable energy increase for data transmission through ENoC.

Moreover, Fig. 21a–c compare the proposed hierarchical architecture with CONoC for various packet lengths. As depicted in Fig. 21a, the data transmission delay through H2NoC increases slightly for large data packets due to the limited data transmission capacity of the local electrical networks. On the other hand, the proposed hierarchical network considerably reduces power consumption compared to CONoC, especially for small or moderate data packets. Consequently, despite its delay increase, the H2NoC architecture reduces the total energy dissipated for on-chip data communication.

Fig. 20. Energy spent in data transmission (horizontal axis: packet size (KB); vertical axis: energy (μJ), log scale; curves: ENoC, CONoC optical energy, CONoC total energy, H2NoC optical energy, H2NoC total energy).

Fig. 21. Comparison between ENoC and optical architectures. (a) Delay ratio, (b) power ratio and (c) energy ratio (horizontal axis: packet size (B), log scale; curves: CONoC, H2NoC).

In summary, by resolving packet congestion optically, CONoC and H2NoC retain their advantages over ENoC for small packets. Moreover, while the traditional NoC loses its efficiency for large packets, the power consumption of the optical NoC and of the hierarchical one changes only slightly with packet length. In addition, the slopes of the delay increase with packet length in CONoC and H2NoC are much smaller than that of ENoC. Finally, as depicted in these figures, the advantage of the proposed hierarchical architecture over CONoC in terms of power consumption and energy dissipation increases substantially for small packets.

7. CONoC and H2NoC in large-scale MPSoCs

In this section, we address the scalability of the proposed architecture to investigate its feasibility for on-chip data transmission in future high-performance MPSoCs. As discussed before, the H2NoC architecture has been proposed to overcome the limitations of CONoC in large-scale MPSoCs. Hence, this section evaluates the effect of network size on the efficiency of CONoC and H2NoC. By analyzing the simulation-based experimental results, we extract a lower bound on the network size above which H2NoC outperforms CONoC. For this purpose, we analyze the power consumption, data transmission delay, and energy dissipation of these architectures compared to those of ENoC for network sizes in the range [32, 192] and packet lengths uniformly distributed in the range [1 KB, 4 KB]. We also investigate the efficiency of the request retransmission scheme for varying numbers of processing cores.

7.1. Request retransmission efficiency

Fig. 22a and b show the average duration of waiting intervals (as a percentage of the total data transmission latency) and the request retransmission probability, respectively, for various network sizes and a moderate offered load ($\lambda = 1$ μs) in CONoC and H2NoC. It can easily be shown that the probability of simultaneous optical data transmission from two (or more) different senders to a specific destination node is inversely proportional to the number of processing cores. Fig. 22a and b confirm this statement. Moreover, as shown in these figures, the proposed hierarchical architecture outperforms CONoC in terms of request retransmission probability and waiting interval for all network sizes considered.
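The "easily shown" inverse proportionality can be illustrated with a simplified model, which is an assumption and not the paper's analysis: under uniform traffic, two given senders each pick a destination uniformly among the other $N - 1$ cores, ignoring timing and locality. Only the $N - 2$ cores other than the two senders can be picked by both, each with probability $1/(N-1)^2$, so the chance that they target the same node is $(N-2)/(N-1)^2 \approx 1/N$. The sketch computes this exactly and checks it by Monte Carlo; both function names are hypothetical.

```python
import random
from fractions import Fraction


def same_destination_prob(n: int) -> Fraction:
    """Exact probability that two given senders, each choosing a destination
    uniformly among the other n - 1 cores, choose the same core."""
    if n < 3:
        raise ValueError("need at least three cores")
    return Fraction(n - 2, (n - 1) ** 2)


def same_destination_mc(n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    a, b = 0, 1                                   # two fixed senders
    dests_a = [c for c in range(n) if c != a]
    dests_b = [c for c in range(n) if c != b]
    hits = sum(rng.choice(dests_a) == rng.choice(dests_b) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (32, 64, 128, 192):
        p = same_destination_prob(n)
        print(n, float(p), float(n * p))          # n * p stays just below 1
```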

Fig. 22. (a) Waiting time % and (b) request retransmission probability (horizontal axis: network size, 32 to 192; curves: CONoC, H2NoC).

7.2. Delay analysis

As shown in the previous section, under low and moderate traffic the total latency is dominated by the photonic data transmission delay through the optical waveguides, which depends on the packet size and the waveguide length. Increasing the number of processing cores in an optically interconnected SoC changes the average number of hops traversed by each optical packet, while only slightly altering the average optical distance between the source and destination optical routers (assuming a constant chip size). Hence, we can conclude that the photonic data transmission delay is essentially independent of the network size. Moreover, owing to the insignificant role of the path-setup delay under moderate traffic and the reduced waiting time (Fig. 22a), the total delay decreases slightly as the network size increases in CONoC and H2NoC. The experimental results in Fig. 23 confirm this behavior; the figure also compares the average data transmission delay of CONoC and H2NoC with that of the Spidergon ENoC for varying numbers of processing cores. Unlike the optical architectures, the average latency of the traditional NoC increases considerably with network size because of the electrical delays of router processing and data buffering. Consequently, the advantage of CONoC and H2NoC over ENoC in terms of data transmission delay increases substantially for large-scale networks.

7.3. Power analysis

Fig. 24 compares the average power consumption for data transmission through CONoC and H2NoC with that of ENoC for varying network sizes and a moderate offered load ($\lambda = 1$ μs). By an argument similar to the one made for the optical delay, the optical power consumed for data transmission through CONoC and H2NoC is independent of the network size. On the other hand, the electrical power increases with the number of processing cores because of the power consumed for processing and transmitting path-setup and path-teardown packets. Hence, the total power consumption for data transmission through CONoC and H2NoC increases with network size due to the growth of electrical power.

Fig. 24 shows that the number of electrical routers in a traditional NoC directly affects the total power consumption for data transmission. Although the average length of electrical wire traversed by each electrical packet changes only slightly with the number of processing cores, the power consumed in flit processing operations increases proportionally with the number of processing cores, which leads to a significant relative power increase of the traditional NoC compared to the CONoC and H2NoC architectures.

Moreover, as illustrated in Fig. 24, although CONoC consumes less power than H2NoC for small numbers of processing cores, the power consumption of the hierarchical architecture falls below that of CONoC for moderate and large numbers of processing cores. This observation stems from the fact that for small network sizes, most packets are transmitted locally through the electrical networks of the proposed hierarchical architecture, whereas in large-scale MPSoCs, the proposed hierarchical architecture eliminates the power overhead of opto-electrical conversions for short-distance on-chip data transmission.

Fig. 23. Data transmission delay (horizontal axis: network size, 32 to 192; vertical axis: delay (ps), log scale; curves: H2NoC, ENoC, CONoC).

Fig. 24. Power consumption for data transmission (horizontal axis: network size, 32 to 192; vertical axis: power (W), log scale; curves: CONoC, H2NoC, ENoC).

Fig. 25. Energy spent in data transmission (horizontal axis: network size, 32 to 192; vertical axis: energy (μJ), log scale; curves: CONoC, H2NoC, ENoC).
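The geometric argument of Section 7.2, that more cores add hops but barely change the physical source-to-destination distance on a fixed-size chip, can be illustrated with a simple model. The assumptions are ours: routers sit on a $k \times k$ grid spread evenly over a square chip with a 20 mm edge (the value implied by 128 cores of about 1.8 mm), and distances are Manhattan. The paper's global network is a Spidergon, so this is only a geometric illustration, not a model of its routing. For $X, Y$ uniform on $\{0, \dots, k-1\}$, $E|X - Y| = (k^2 - 1)/(3k)$.

```python
def grid_averages(k: int, chip_edge_mm: float = 20.0):
    """Average Manhattan hop count and physical distance (mm) between two
    routers chosen independently and uniformly on a k x k grid that spans a
    fixed square chip. Uses E|X - Y| = (k**2 - 1) / (3k) per dimension."""
    mean_hops_1d = (k * k - 1) / (3 * k)
    hops = 2 * mean_hops_1d            # two dimensions
    pitch_mm = chip_edge_mm / k        # router spacing shrinks as k grows
    return hops, hops * pitch_mm


if __name__ == "__main__":
    for k in (4, 8, 16):
        hops, dist = grid_averages(k)
        print(f"{k * k:4d} routers: {hops:6.3f} hops, {dist:7.3f} mm")
```

Going from 16 to 256 routers quadruples the average hop count (2.5 to 10.625) while the average distance only moves from 12.5 mm to about 13.3 mm, approaching two thirds of the sum of the chip dimensions; with propagation delay proportional to distance, the photonic transmission delay stays nearly constant.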

Fig. 26. Comparison between ENoC and the optical architectures (ratios versus network size from 32 to 192 cores, for CONoC and H2NoC): (a) delay ratio, (b) power ratio, and (c) energy ratio.

7.4. Energy analysis

Fig. 25 compares the average energy dissipated for data transmission through CONoC and H2NoC with that of ENoC for various network sizes and a moderate offered load (l = 1 µs). As this figure shows, the total energy spent on data transmission through CONoC and H2NoC increases with network size, owing to the electrical power dissipated in large-scale networks (Fig. 24). Fig. 25 also shows that, for moderate and large networks, the proposed hierarchical architecture dissipates less energy for data transmission than CONoC. Conversely, although H2NoC dissipates slightly more energy than the non-hierarchical architecture for small numbers of processing cores, its energy changes only slowly with the number of processing cores.
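The link between the power, delay, and energy plots can be checked with a unit conversion. This sketch assumes, consistent with the reasoning in this section but not stated as the paper's exact formula, that energy equals average power multiplied by transmission delay; the function name is hypothetical, and the units follow the axes of Figs. 23–25 (W, ps, µJ).

```python
def transmission_energy_uj(power_w: float, delay_ps: float) -> float:
    """Energy (uJ) of a transfer lasting delay_ps at average power power_w.

    Assumes energy = average power x transmission delay.
    1 W x 1 ps = 1e-12 J = 1e-6 uJ.
    """
    if power_w < 0 or delay_ps < 0:
        raise ValueError("power and delay must be non-negative")
    return power_w * delay_ps * 1e-6


if __name__ == "__main__":
    # Hypothetical operating points, not values read from the figures.
    print(transmission_energy_uj(6.4, 1e5))  # about 0.64 uJ
    print(transmission_energy_uj(0.8, 1e5))  # about 0.08 uJ
```

Under this assumption, an architecture whose power and delay both grow with network size sees the two growth factors multiply in its energy, whereas an architecture with nearly flat power and delay keeps nearly flat energy.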

Fig. 25 further compares the average energy dissipated for data transmission through CONoC and H2NoC with that of ENoC for varying network sizes. Given the significant impact of network size on the total data transmission delay in ENoC (Fig. 23), the results in Fig. 25 confirm that energy dissipation in ENoC grows considerably with the number of processing cores. Moreover, as shown in Fig. 25, the energy of H2NoC grows with network size at a much smaller rate than that of the traditional NoC, which stems from the nearly size-independent power and delay of the proposed hierarchical architecture.
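One way to quantify "slope of energy increase" on log-scaled plots such as Fig. 25 is to fit a growth exponent b in E ≈ a · N^b by least squares on log-log axes. The helper below is a hypothetical illustration of that technique, and the example series are synthetic, not measured data from this study.

```python
import math
from typing import Sequence


def growth_exponent(sizes: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares b in value ~ a * size**b, fitted on log-log axes."""
    if len(sizes) != len(values) or len(sizes) < 2:
        raise ValueError("need two or more (size, value) pairs of equal length")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    sizes = [32, 64, 128, 192]
    # Synthetic series for illustration only (not measured data).
    steep = [0.01 * n ** 1.5 for n in sizes]
    flat = [0.2 * n ** 0.1 for n in sizes]
    print(growth_exponent(sizes, steep), growth_exponent(sizes, flat))
```

An exponent near zero corresponds to the nearly size-independent behavior attributed to H2NoC, while a clearly positive exponent corresponds to the steep growth attributed to ENoC.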

7.5. H2NoC prominence over CONoC

To further explore the advantages of H2NoC over CONoC and the traditional NoC as a function of network size, Fig. 26a–c depict the ratios of the average data transmission delay, power consumption, and energy dissipation of ENoC, respectively, to the corresponding values of CONoC and H2NoC, for varying numbers of processing cores and a moderate offered load (l = 1 µs).
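The ratios plotted in Fig. 26 can be computed per network size as follows. This is a minimal sketch; the function name and the dictionary layout (network size mapped to metric value) are assumptions for illustration.

```python
def enoc_ratio(enoc: dict[int, float], optical: dict[int, float]) -> dict[int, float]:
    """ENoC metric divided by an optical architecture's metric, per network size.

    For delay, power, or energy (lower is better), a ratio above 1 means the
    optical architecture beats ENoC at that size.
    """
    common = sorted(enoc.keys() & optical.keys())  # sizes present in both series
    return {n: enoc[n] / optical[n] for n in common}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(enoc_ratio({32: 4.0, 64: 8.0, 96: 1.0}, {32: 2.0, 64: 1.0}))
```

Only network sizes present in both series are compared, so a missing simulation point does not produce a misleading ratio.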

As shown in Fig. 26a and c, increasing the network size intensifies the advantage of both CONoC and H2NoC over the traditional NoC in terms of data transmission delay and energy dissipation. However, the power overhead of opto-electrical conversions in the non-hierarchical architecture affects the total power consumed for photonic data transmission. Hence, short optical interconnects between neighboring nodes in large-scale MPSoCs erode the benefits of optical data transmission over electrical transmission. The results in Fig. 26b confirm this: although the advantage of CONoC over ENoC in power consumption degrades with increasing network size, H2NoC still achieves approximately six times lower power consumption than ENoC in a 192-node MPSoC.

As discussed before, the proposed hierarchical architecture avoids the poor efficiency of CONoC in large-scale MPSoCs and hence improves the power and energy metrics compared to the non-hierarchical architecture. As shown in Fig. 26b, despite the power-efficiency degradation of CONoC for large network sizes, increasing the number of processing cores intensifies the advantage of H2NoC over the traditional NoC in terms of average power consumption. This power advantage of H2NoC over CONoC improves the energy savings of the proposed hierarchical architecture, as shown in Fig. 26c. For small network sizes, however, Fig. 26a–c show that the non-hierarchical optical architecture outperforms the hierarchical hybrid one. As discussed before, this property emphasizes that H2NoC is most efficient for large-scale MPSoCs.

In summary, the proposed hierarchical hybrid NoC outperforms CONoC for large-scale MPSoCs, while its advantage degrades for small numbers of processing cores. The lower bound on the network size, deduced from Fig. 26c, is approximately 80 cores in our case study.
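Reading such a lower bound off sampled ratio curves can be automated as sketched below: return the smallest simulated size from which H2NoC's ENoC ratio matches or exceeds CONoC's at every larger size. The function name and example values are hypothetical, and because it returns a sampled size, a bound that lies between two simulated points (such as the value of about 80 quoted above) would additionally need interpolation.

```python
from typing import Optional, Sequence


def crossover_size(
    sizes: Sequence[int],
    conoc_ratio: Sequence[float],
    h2noc_ratio: Sequence[float],
) -> Optional[int]:
    """Smallest sampled size from which H2NoC's ratio is >= CONoC's at all larger sizes.

    Returns None if CONoC still wins at the largest sampled size.
    """
    if not (len(sizes) == len(conoc_ratio) == len(h2noc_ratio)):
        raise ValueError("all sequences must have the same length")
    result = None
    # Scan from the largest size down; stop at the first size where CONoC wins.
    for size, c, h in sorted(zip(sizes, conoc_ratio, h2noc_ratio), reverse=True):
        if h < c:
            break
        result = size
    return result


if __name__ == "__main__":
    # Hypothetical energy ratios, for illustration only.
    print(crossover_size([32, 64, 96, 128], [10, 12, 13, 14], [8, 11, 14, 16]))
```

Scanning from the largest size downward ensures that an isolated early crossing, followed by CONoC winning again, is not mistaken for the lower bound.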

8. Conclusions and future work

In this paper, we extracted analytical models for the data transmission delay, power consumption, and energy dissipation of optical and traditional NoCs. Because of the architectural benefits of CONoC [15] over previously proposed ONoCs, the analysis is based on this architecture. Comparing CONoC with ENoC for varying link lengths and degrees of multiplexing, we calculated the lower bound on the optical link length above which an optical NoC retains its advantages over ENoCs.
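The idea of a lower bound on optical link length can be illustrated with a deliberately simple linear energy model: an electrical link pays energy proportional to length, while an optical link pays a fixed opto-electrical conversion cost plus a smaller per-length cost. This is a hypothetical sketch with made-up parameter names, not the analytical model extracted in this paper, and it ignores the degree of multiplexing.

```python
import math


def min_optical_link_length_mm(
    conversion_pj: float, electrical_pj_per_mm: float, optical_pj_per_mm: float = 0.0
) -> float:
    """Break-even length (mm) above which the optical link needs less energy per bit.

    Hypothetical linear model:
      electrical(L) = electrical_pj_per_mm * L
      optical(L)    = conversion_pj + optical_pj_per_mm * L
    Setting them equal gives L_min = conversion_pj / (electrical - optical slope).
    """
    gap = electrical_pj_per_mm - optical_pj_per_mm
    if gap <= 0:
        return math.inf  # the optical link never breaks even under this model
    return conversion_pj / gap


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(min_optical_link_length_mm(1.0, 0.5))       # 2.0 mm
    print(min_optical_link_length_mm(1.0, 0.2, 0.2))  # inf
```

Links shorter than the break-even length, such as those between neighboring nodes, are where the fixed conversion cost dominates, which motivates keeping short-distance traffic electrical.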

Based on the optimum link lengths, we proposed a novel hierarchical on-chip network architecture, named H2NoC, which overcomes the scalability problem resulting from the lower bound on the optical link length. It uses local electrical networks and a global optical network to transmit data packets between processing cores. Taking advantage of CONoC, the proposed hierarchical architecture introduces a fully contention-free structure. Moreover, the global ONoC in the H2NoC architecture eliminates the main limitations of CONoC.
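The two-level split between local electrical and global optical transmission can be sketched as a routing decision. The sketch assumes, hypothetically, that cores are assigned to clusters by contiguous IDs of a fixed `cluster_size`; the paper's actual cluster partitioning, gateway placement, and path-setup protocol are not modeled, and `Route` and `choose_route` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    network: str      # "electrical" (intra-cluster) or "optical" (inter-cluster)
    src_cluster: int
    dst_cluster: int


def choose_route(src: int, dst: int, cluster_size: int) -> Route:
    """Pick the local electrical or global optical network for a packet.

    Hypothetical mapping: core i belongs to cluster i // cluster_size.
    """
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    src_cluster, dst_cluster = src // cluster_size, dst // cluster_size
    # Short, intra-cluster transfers stay electrical to avoid conversion overhead;
    # long, inter-cluster transfers use the global optical network.
    network = "electrical" if src_cluster == dst_cluster else "optical"
    return Route(network, src_cluster, dst_cluster)


if __name__ == "__main__":
    print(choose_route(0, 3, cluster_size=4))
    print(choose_route(3, 4, cluster_size=4))
```

The design choice reflected here is that only inter-cluster traffic, which travels long enough distances to justify opto-electrical conversion, enters the optical network.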

We developed an event-driven behavioral simulator for H2NoC. Through a series of simulation-based experiments, we studied the efficiency of H2NoC along with its power and energy consumption and data transmission delay. The experimental results show that, compared to a traditional NoC, the proposed hierarchical architecture achieves about four times lower average latency and about eight times lower power consumption. Moreover, we concluded that by resolving packet contention optically, CONoC and H2NoC retain their advantages over ENoC for small packets, and their advantage over the traditional NoC increases substantially for large packets.
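The core of an event-driven simulator is a time-ordered event queue. The sketch below shows that structure with Python's `heapq`: packets are injected with exponentially distributed inter-arrival times and delivered after a fixed latency. It is a hypothetical skeleton for illustration only; it does not model path setup, routers, or optical components, and it is not the authors' simulator.

```python
import heapq
import itertools
import random


def simulate(num_packets: int, mean_interarrival_us: float,
             latency_us: float, seed: int = 0) -> float:
    """Tiny event-driven loop; returns mean end-to-end latency in microseconds.

    Hypothetical model: Poisson injections and a fixed delivery latency.
    """
    if num_packets <= 0:
        raise ValueError("num_packets must be positive")
    rng = random.Random(seed)
    seq = itertools.count()  # tie-breaker so equal times never compare payloads
    events = []              # heap of (time_us, seq, kind, packet_id)
    t = 0.0
    for pid in range(num_packets):
        t += rng.expovariate(1.0 / mean_interarrival_us)
        heapq.heappush(events, (t, next(seq), "inject", pid))
    injected_at = {}
    latencies = []
    while events:
        now, _, kind, pid = heapq.heappop(events)  # always the earliest event
        if kind == "inject":
            injected_at[pid] = now
            heapq.heappush(events, (now + latency_us, next(seq), "deliver", pid))
        else:
            latencies.append(now - injected_at[pid])
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    print(simulate(1000, mean_interarrival_us=1.0, latency_us=0.25))
```

A full simulator would replace the fixed latency with handlers that schedule further events, for example path-setup, transmission, and teardown steps, each pushed onto the same queue.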

Analyzing the simulation results, we showed that despite a slight delay increase in H2NoC compared to CONoC, the power and energy spent on data transmission through the proposed hierarchical architecture are lower than in the non-hierarchical one. Evaluating the effect of the on-chip traffic pattern on system-level metrics, we concluded that H2NoC outperforms CONoC when the traffic pattern ranges from uniform to nearly heavy local traffic. However, H2NoC loses its efficiency under intensive local traffic because of the drawbacks of the local electrical networks.
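A traffic pattern spanning "uniform" to "heavy local" can be parameterized by the probability that a packet stays within its source cluster. The generator below is a hypothetical sketch with contiguous clusters and a parameter `p_local`, not the traffic model used in the experiments; with full clusters, uniform random traffic corresponds to `p_local = (cluster_size - 1) / (num_cores - 1)`.

```python
import random


def pick_destination(src: int, num_cores: int, cluster_size: int,
                     p_local: float, rng: random.Random) -> int:
    """Draw a destination for src under a hypothetical locality-controlled pattern."""
    if not 0.0 <= p_local <= 1.0:
        raise ValueError("p_local must lie in [0, 1]")
    base = (src // cluster_size) * cluster_size
    cluster = range(base, min(base + cluster_size, num_cores))
    local = [d for d in cluster if d != src]                   # same cluster, not self
    remote = [d for d in range(num_cores) if d not in cluster]  # other clusters
    if not local and not remote:
        raise ValueError("need at least two cores")
    if local and (not remote or rng.random() < p_local):
        return rng.choice(local)
    return rng.choice(remote)


if __name__ == "__main__":
    rng = random.Random(42)
    dests = [pick_destination(5, 64, 8, 0.9, rng) for _ in range(1000)]
    print(sum(d // 8 == 0 for d in dests) / len(dests))  # share of local packets
```

Sweeping `p_local` from the uniform value toward 1 reproduces the qualitative range discussed above, from uniform to intensive local traffic.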

Finally, we investigated the impact of network size on the efficiency of CONoC and H2NoC. For this purpose, we compared the system-level metrics of these architectures with those of a traditional NoC for various network sizes. Our experimental results verify that the proposed hierarchical hybrid NoC outperforms the non-hierarchical one for moderate and large-scale MPSoCs, while its advantage degrades for small numbers of processing cores. Based on the experimental results, we extracted the lower bound on the network size above which H2NoC outperforms the non-hierarchical architecture.

References

[1] G.P. Agrawal, Fiber-Optic Communication Systems, Wiley, New York, 1997.

[2] H.B. Bakoglu, J.D. Meindl, Optimal interconnection circuits for VLSI, IEEE Trans. Electron Dev. ED-32 (1985) 903–909.

[3] A. Biberman, B.G. Lee, K. Bergman, P. Dong, M. Lipson, Demonstration of all-optical multi-wavelength message routing for silicon photonic networks, in: Proceedings of the Optical Fiber Communications Conference (OFC), 2008, pp. 1–3.

[4] L. Bononi, N. Concer, Simulation and analysis of network on chip architectures: ring, Spidergon and 2D mesh, in: Proceedings of the Design, Automation and Test in Europe (DATE), 2006, pp. 154–159.

[5] M. Briere et al., System level assessment of an optical NoC in an MPSoC platform, in: Proceedings of the 2007 Design, Automation and Test in Europe (DATE), 2007, pp. 1084–1089.

[6] G. Chen et al., Predictions of CMOS compatible on-chip optical interconnect, VLSI J. Integr. 40 (4) (2007) 434–446.

[7] M.J. Cianchetti, J.C. Kerekes, D.H. Albonesi, Phastlane: a rapid transit optical routing network, in: Proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA), 2009, pp. 441–450.

[8] M. Coppola, R. Locatelli, G. Maruccia, L. Pieralisi, M.D. Grammatikakis, Spidergon: a NoC modeling paradigm, Book Chapter in Model Driven Engineering for Distributed Real-time Embedded Systems, 2005, ISBN: 1905209320.

[9] C. Guillemot, Transparent optical packet switching: the European ACTS KEOPS project approach, IEEE/OSA J. Lightwave Technol. 16 (12) (1998) 2117–2134.

[10] M. Haurylau et al., On-chip optical interconnect roadmap: challenges and critical directions, IEEE J. Sel. Top. Quant. Electron. 12 (6) (2006) 1699–1705.

[11] R. Ho, Wire Scaling and Trends, A Presentation at MTO DARPA Meeting, Sun Microsystems Laboratories, Jackson Hole, WY, 2006.

[12] I.-W. Hsieh et al., Ultrafast-pulse self-phase modulation and third-order dispersion in Si photonic wire-waveguides, Opt. Express 14 (25) (2006) 12380–12387.

[13] ITRS, The international technology roadmap for semiconductors – 2007 edition, 2007, Available at <http://public.itrs.net>.

[14] N. Kirman et al., Leveraging optical technology in future bus-based chip multiprocessors, in: Proceedings of the IEEE/ACM Annual International Symposium on Micro-architecture, 2006, pp. 492–503.

[15] S. Koohi, S. Hessabi, Contention-free on-chip routing of optical packets, in: Proceedings of the International Symposium on Networks-on-Chip (NOCS), 2009, pp. 134–143.

[16] P. Koonath, T. Indukuri, B. Jalali, Add-drop filters utilizing vertically coupled microdisk resonators in silicon, J. Appl. Phys. Lett. 86 (2005) 091102-1–091102-3.

[17] B.G. Lee, A. Biberman, P. Dong, M. Lipson, K. Bergman, All-optical comb switch for multiwavelength message routing in silicon photonic networks, IEEE Photon. Technol. Lett. 20 (10) (2008) 767–769.

[18] C. Manolatou, S.G. Johnson, S. Fan, P.R. Villeneuve, H.A. Haus, J.D. Joannopoulos, High-density integrated optics, J. Lightwave Technol. 17 (9) (1999) 1682–1692.

[19] D. Miller, Rationale and challenges for optical interconnects to electronic chips, Proc. IEEE 88 (6) (2000) 728–749.

[20] I. O'Connor, F. Gaffiot, On-chip optical interconnect for low-power, in: E. Macii (Ed.), Ultra-Low Power Electronics and Design, Kluwer, Dordrecht, 2004.

[21] "OMNeT++ discrete event simulation system", Available online at <http://www.omnetpp.org/>.

[22] Y. Pan, P. Kumar, J. Kim, G. Memik, Y. Zhang, A. Choudhary, Firefly: illuminating future network-on-chip with nanophotonics, in: Proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA), 2009, pp. 429–440.

[23] S. Schultz, E. Glytsis, T. Gaylord, Design, fabrication, and performance of preferential-order volume grating waveguide coupler, Appl. Opt.-IP 39 (8) (2000) 1223–1232.

[24] A. Shacham, K. Bergman, L.P. Carloni, Photonic networks-on-chip for future generations of chip multi-processors, IEEE Trans. Comput. 57 (2008) 1–15.

[25] A. Shacham, B.G. Lee, A. Biberman, K. Bergman, L.P. Carloni, Photonic NoC for DMA communications in chip multiprocessors, in: Proceedings of the 15th Annual IEEE Symposium High-Performance Interconnects (HOTI), 2007, pp. 29–38.

[26] D. Vantrease et al., Corona: system implications of emerging nanophotonic technology, in: Proceedings of the IEEE/ACM International Symposium on Computer Architecture (ISCA), 2008, pp. 153–164.

[27] Q. Xu et al., 12.5 Gbit/s carrier-injection-based silicon micro-ring silicon modulators, Opt. Express 15 (2) (2007) 430–436.

[28] F. Xu, A.W. Poon, Multimode-interference waveguide crossing coupled microring-resonator-based switch nodes for photonic networks-on-chip, in: Proceedings of the Lasers and Electro-Optics conf. and Quantum Electronics and Laser Science (CLEO/QELS) Conference, 2008, pp. 1–2.

[29] Q. Xu, B. Schmidt, J. Shakya, M. Lipson, Cascaded silicon micro-ring modulators for WDM optical interconnection, Opt. Express 14 (20) (2006) 9431–9435.

[30] J. Zhou, W.H. Wong, E.Y.B. Pun, Y.Q. Shen, Y.X. Zhao, Fabrication of low loss optical waveguides with a novel thermo-optical polymer material, J. Opt. Appl. 36 (2/3) (2006) 429–435.

Somayyeh Koohi received a B.Sc. double degree in Electrical Engineering and Computer Engineering from Sharif University of Technology, Tehran, Iran, in 2005. She then received her M.Sc. degree in Computer Engineering from Sharif University of Technology in 2007. Since 2007, she has been working toward the Ph.D. degree in Computer Engineering at Sharif University of Technology. Her research interests include the design and analysis of on-chip optical interconnects, low-power design, the design of networks-on-chip for future high-performance multiprocessor systems, and optical networks-on-chip as a novel solution for future systems-on-chip.

Shaahin Hessabi was born in Tehran, Iran, on February 14, 1961. He received the B.Sc. and M.Sc. degrees in Electrical Engineering from Sharif University of Technology, Tehran, Iran, in 1986 and 1990, respectively. He received the Ph.D. degree in Electrical and Computer Engineering from the University of Waterloo, Waterloo, Ontario, Canada, in 1995. He joined Sharif University of Technology in 1996 and is currently an associate professor in the Department of Computer Engineering. His current research interests include testing and design for testability, VLSI design, SoC, and reconfigurable systems.

