Low Power Networks-on-Chip

Cristina Silvano · Marcello Lajolo · Gianluca Palermo
Editors


Editors

Cristina Silvano
Politecnico di Milano
Dip. Elettronica e Informazione (DEI)
Via Ponzio 34/5
20133 [email protected]

Marcello Lajolo
NEC Laboratories America, Inc.
Independence Way 4
08540 Princeton New [email protected]

Gianluca Palermo
Politecnico di Milano
Dip. Elettronica e Informazione (DEI)
Via Ponzio 34/5
20133 [email protected]

ISBN 978-1-4419-6910-1
e-ISBN 978-1-4419-6911-8
DOI 10.1007/978-1-4419-6911-8
Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2010935810

© Springer Science+Business Media, LLC 2011
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)


“Ma sopra tutte le invenzioni stupende, qual eminenza fu quella di colui che s’immaginò di trovar modo di comunicare i suoi più reconditi pensieri a qualsivoglia altra persona, benché distante per lunghissimo intervallo di luogo e di tempo? Parlare con quelli che son nell’Indie, parlare a quelli che non sono ancora nati né saranno se non di qua a mille e diecimila anni? E con qual facilità? Con i vari accozzamenti di venti caratteruzzi sopra una carta.”

Galileo Galilei, Dialogo sopra i due massimi sistemi del mondo, Tolemaico e Copernicano, Firenze, 1632

“But surpassing all stupendous inventions, what sublimity of mind was his who dreamed of finding means to communicate his deepest thoughts to any other person, though distant by mighty intervals of place and time! Of talking with those who are in India; of speaking to those who are not yet born and will not be born for a thousand or ten thousand years; and with what facility, by the different arrangements of twenty characters upon a page.”

Galileo Galilei, Dialogue concerning the two chief world systems, Ptolemaic & Copernican, Florence, 1632


Preface

Given the increasing complexity of multiprocessor system-on-chip (MPSoC) designs, current trends in on-chip communication architectures are converging towards the network-on-chip (NoC). The NoC-based design approach offers a high-bandwidth, low-energy solution, along with several other advantages, such as scalability, reliability, IP reusability and the separation of IP design from on-chip communication design and interfacing. NoC design represents a new paradigm for designing MPSoCs, shifting design methodologies from computation-based to communication-based.

Given these premises, the last decade has seen an increasing research effort on NoC architectures and related design methodologies. Many key design challenges of NoC have been investigated in the past years. These challenges have recently been classified by Marculescu et al. into three main categories: the design of the communication infrastructure, the selection of the communication paradigm and the application mapping optimization. First, designing the communication infrastructure comprises the following problems: network topology synthesis, selection of the channel width, buffer sizing and NoC floorplanning. Second, the selection of the communication paradigm includes the routing problem and the choice of switching technique (store-and-forward, cut-through, wormhole, etc.). Third, application mapping optimization comprises the IP mapping and task scheduling of an application onto the NoC platform. All these optimization techniques should take into consideration several metrics of interest to be traded off, mainly performance, energy, quality of service, reliability and security.

In this scenario, several semiconductor companies have started to propose NoC-based designs. Among them, we can cite the Aethereal NoC from NXP-Philips, the STNoC from STMicroelectronics and the 80-core NoC from Intel. Several industrial design flows supporting NoC design have also been proposed, such as the CHAINworks tool suite by Silistix, the NoCexplorer and NoCcompiler frameworks by Arteris and the tools from iNoCs. The interest shown by several companies and EDA providers has helped confirm NoC as a feasible and energy-efficient approach to interconnecting a scalable number of IP cores on a single die.


Although many scientific books and journal papers have recently been published, many challenging topics related to NoC research are still open. The story behind this book began more than a year ago, when we started thinking with Charles Glaser from Springer about a book focusing on low-power NoC, as power and energy issues still represent one of the limiting factors in integrating multi- and many-cores on a single chip. Power-aware design techniques at several abstraction levels are the key enablers of energy-efficient on-chip interconnection network design. Starting from this idea, the book tries to meet the need for a single textbook on low-power NoC, covering power- and energy-aware design techniques from several perspectives and abstraction levels. To this end, the book brings together several outstanding contributions in several areas of low-power NoC design.

The book chapters are organized in three parts. In Part I, several power-aware design techniques for NoC are discussed from the low-level perspective. These low-level NoC design techniques address the following topics: hybrid circuit/packet switched networks, run-time power-gating techniques, adaptive voltage control techniques for NoC links and asynchronous communication. In Part II, several system-level power-aware design techniques are presented, dealing with application-specific routing algorithms, adaptive data compression and design techniques for latency-constrained and power-optimized NoCs. In Part III, some emerging technologies related to low-power NoC, namely 3D stacking, CMOS nanophotonics and RF-interconnect, are discussed to assess their applicability in meeting the requirements imposed by future NoC architectures.

Entering Part I on low-level design techniques, Chap. 1 introduces some issues and challenges for future NoCs with demands for high bandwidth and low energy. Starting from the analysis of some state-of-the-art approaches to design NoC architectures, the chapter presents details of how coupling packet-switched arbitration with circuit-switched data transfer can achieve energy savings and improve network efficiency by reducing arbitration overhead and increasing overall utilisation of network resources. In this hybrid network, packet-switched arbitration is used to reserve future circuit-switched channels for the data transfer, thus eliminating the performance bottlenecks associated with pure circuit-switched networks, while maintaining their power advantage. Furthermore, the chapter discusses how proximity-based data streaming can increase network throughput and improve energy efficiency. Finally, some NoC measurements and design trade-offs are analysed on 45 nm CMOS technology from an industrial research perspective.

Chapter 2 surveys several power gating techniques to reduce the leakage power of on-chip routers. Leakage power accounts for a considerable portion of the active power in recent process technologies. The chapter then introduces a run-time fine-grained power gating router, in which the power supply to each router component (e.g. virtual-channel buffer, crossbar’s multiplexer, and output latch) can be individually controlled in response to the applied workload. To mitigate the impact of the wake-up latency of each power domain on application performance, the chapter introduces and discusses three wake-up control methods. Finally, the fine-grained power gating router with 35 micro power domains and the early wake-up methods are designed with a commercial 65 nm process and evaluated in terms of area overhead, application performance and leakage power reduction.

Chapter 3 surveys the state of the art in energy-efficient communication link design for NoCs. After reviewing techniques at the datalink and physical abstraction layers, the chapter introduces a lookahead-based transition-aware adaptive voltage control method for achieving improved energy efficiency at moderate cost in performance and reliability. Then, performance and limitations of the proposed method are evaluated and future prospects in energy-efficient link design are projected.

Chapter 4 provides an overview of the various asynchronous techniques that are used at the link layer in NoCs, including signalling schemes, data encoding and synchronization solutions. These asynchronous techniques are compared in terms of area, power and performance. The fundamental issues of the formation of data tokens based on the principles of data validity, acknowledgement, delay-insensitivity, timing assumptions and soft-error tolerance are considered. The chapter also covers some of the aspects related to combining asynchronous communication links to form parts of the entire network architecture, which involves asynchronous logic for arbitration and routing hardware. To this end, the chapter also presents basic techniques for building small-scale controllers using the formal models of Petri nets and signal transition graphs.

Entering Part II on system-level design techniques, Chap. 5 describes how the routing algorithm can be optimized in NoC platforms. The routing algorithm has a major effect on the performance (packet latency and throughput) as well as the power consumption of a NoC. The chapter presents a methodology to develop efficient and deadlock-free routing algorithms that are specialized for an application or a set of concurrent applications. The methodology, called application specific routing algorithms (APSRA), exploits application-specific information about which pairs of cores communicate and which pairs never communicate in the NoC platform. This information is used to maximize the adaptivity of the routing algorithm without compromising the important property of deadlock freedom. The chapter also presents an extensive comparison between the routing algorithms generated using the APSRA methodology and general-purpose deadlock-free routing algorithms. The simulation-based evaluations are performed using both synthetic traffic and traffic from real applications. The comparison covers several performance indices such as degree of adaptiveness, average delay, throughput, power dissipation and energy consumption. In spite of an adverse impact on router architecture, the chapter shows that the higher adaptivity of APSRA leads to significant improvements in both routing performance and energy consumption.

Chapter 6 presents a method to exploit a table-based data compression technique, relying on value patterns in cache traffic. Compressing a large packet into a small one saves power by reducing the operations required in network components and decreases contention by increasing the effective bandwidth of shared resources. The main challenges are providing a scalable implementation of tables and minimizing the latency overhead of compression. The chapter proposes a shared table scheme that needs one encoding and one decoding table for each processing element, and a management protocol that does not require in-order delivery. This scheme eliminates the dependence of table size on network size, which provides scalability and reduces the table overhead of compression. The chapter also presents some simulation results obtained by using the proposed compression method for 8-core and 16-core tiled designs. The experimental results are discussed in terms of packet latency and network power consumption.

Chapter 7 describes the design process of a Network on-Chip for a high-end commercial system on-chip (SoC) application. Several design choices are discussed in the chapter focusing on the power optimization of the NoC while achieving the required performance. The chapter describes the NoC design steps including module mapping and allocation of customized capacities to links. Unlike previous studies, in which point-to-point, per-flow timing constraints were used, the chapter demonstrates the importance of using the application end-to-end traversal latency requirements during the optimization process. To compare several design alternatives, the chapter reports the synthesis results of an NoC design that meets the actual throughput and timing requirements of a commercial 4G SoC.

Entering Part III on future and emerging technologies, Chap. 8 addresses the problem of 2D and 3D SoC designs where the cores are grouped into voltage islands. To reduce the leakage power consumption, an island containing cores that are not used in an application can be shut down, while the other islands remain operational. When one or more of the islands are shut down, the interconnect should still allow communication between the islands that are operational. For this, the NoC has to be designed efficiently to allow shutdown of voltage islands, thereby reducing the leakage power consumption. The chapter presents methods to design NoC topologies that provide such support for both 2D and 3D technologies. The chapter outlines how the concept of voltage islands needs to be considered during the topology synthesis phase itself. The chapter also analyses the benefits of migrating to 3D stacked chips for realistic applications that have multiple voltage islands.

Chapter 9 introduces the emerging CMOS nanophotonic technologies, which represent a compelling alternative to traditional all-electronic NoCs because nanophotonic NoCs can provide both higher throughput and lower power consumption. The chapter introduces CMOS nanophotonic technology and considers its use in photonic chip-wide networks enabling many-core microprocessors with greatly enhanced performance and flexibility while consuming less power than their electrical counterparts. The chapter also provides, as a case study, a design that takes advantage of CMOS nanophotonics to achieve ten-teraop performance in a 256-core 3D chip stack, using optically connected main memory, very high memory bandwidth, cache coherence across all cores, no bisection bandwidth limits on communication and cross-chip communication at very low latency with cache-line granularity.

Chapter 10 explores the use of multi-band RF-interconnect for future Network-on-Chip. RF-interconnect can communicate simultaneously through multiple frequency bands with low-power signal transmission and reconfigurable bandwidth. At the same time, the chapter investigates the CMOS mixed-signal circuit implementation challenges for improving the RF-I signalling integrity and efficiency. Furthermore, the chapter proposes a micro-architectural framework that can be used to facilitate the exploration of scalable low-power NoC architectures based on physical planning and prototyping.

Due to the large number of topics discussed in the book and their heterogeneity, the background on low-power NoC is discussed chapter by chapter with a separate reference set for each chapter. This choice also helps make each chapter self-contained.

Overall, we believe that the chapters cover a set of important and timely issues impacting present and future research on low-power NoC. We sincerely hope that the book will become a solid reference in the coming years. In our view, the authors have put great effort into clearly presenting their technical contributions, outlining the potential impact and some case studies. We would like to take this opportunity to thank all the authors who contributed to the book. Special thanks go to Charles Glaser from Springer for encouraging us from the beginning of this book project and to Amanda Davis from Springer for her continuous support in reviewing the materials.

Milano, Italy Cristina Silvano
Princeton, NJ, USA Marcello Lajolo
Milano, Italy Gianluca Palermo
April 2010 The Editors


About the Editors

Cristina Silvano received the M.S. degree in electronic engineering from Politecnico di Milano, Milano, Italy, in 1987 and the Ph.D. degree in computer engineering from the University of Brescia, Brescia, Italy, in 1999. From 1987 to 1996, she was a Senior Design Engineer with the R&D Labs, Groupe Bull, Pregnana, Italy. From 2000 to 2002, she was an Assistant Professor with the Department of Computer Science, University of Milan, Milano. She is currently an Associate Professor (with tenure) in computer engineering with the Dipartimento di Elettronica e Informazione, Politecnico di Milano. She has published one scientific international book and more than 70 papers in international journals and conference proceedings, and she is the holder of several international patents. Her primary research interests are in the area of computer architectures and computer-aided design of digital systems, with particular emphasis on design space exploration and low-power design techniques for multiprocessor systems-on-chip. She has participated in several national and international research projects, some of them in collaboration with STMicroelectronics. She is currently the European Coordinator of the project FP7-2PARMA-248716 on “PARallel PAradigms and Run-time MAnagement techniques for Many-core Architectures” (Jan 2010 to Dec 2012). She is also the European Coordinator of the ongoing project FP7-MULTICUBE-216693 on “Multi-objective design space exploration of multi-processor SoC architectures for embedded multimedia applications” (Jan 2008 to June 2010).

Marcello Lajolo received his Master and Ph.D. degrees in electrical engineering, both from Politecnico di Torino (Italy), in 1995 and 1999, respectively. He then joined the Computer & Communication Research Laboratories (CCRL; now NEC Laboratories America) in Princeton, NJ, where he led various projects in the areas of on-chip communication design and advanced embedded architectures. He also collaborates with the Advanced Learning and Research Institute (ALaRI) in Lugano, Switzerland, where he has been teaching a course on networks on chip since 2002. He has served or is serving as a program committee member for major conferences in electronic design automation and embedded system design like DAC, DATE, ASP-DAC, and ISCAS. He has given full-day tutorials at conferences like ICCAD, ASP-DAC, ICCD and others in the area of embedded system design. He is a Senior Member of the IEEE. His primary research topics are related to Networks on Chip, Hardware/Software Codesign, Low Power Design, Computer Architectures, High Level Synthesis of Digital Integrated Circuits and System-on-Chip Testing.

Gianluca Palermo received the M.S. degree in electronic engineering and the Ph.D. degree in computer engineering from Politecnico di Milano, Milano, Italy, in 2002 and 2006, respectively. Previously, he was a Consultant Engineer with the Low Power Design Group, Advanced System Technology, STMicroelectronics, where he worked on network-on-chip, and also a Research Assistant with the Advanced Learning and Research Institute, University of Lugano, Lugano, Switzerland. He is currently an Assistant Professor in the Dipartimento di Elettronica e Informazione, Politecnico di Milano. His research interests include design methodologies and architectures for embedded systems, focusing on low-power design, on-chip multiprocessors, and network-on-chip. He has participated in several national and international research projects.


Contents

Part I Low-Level Design Techniques

1 Hybrid Circuit/Packet Switched Network for Energy Efficient on-Chip Interconnections
Mark A. Anders, Himanshu Kaul, Ram K. Krishnamurthy, and Shekhar Y. Borkar

2 Run-Time Power-Gating Techniques for Low-Power On-Chip Networks
Hiroki Matsutani, Michihiro Koibuchi, Hiroshi Nakamura, and Hideharu Amano

3 Adaptive Voltage Control for Energy-Efficient NoC Links
Paul Ampadu, Bo Fu, David Wolpert, and Qiaoyan Yu

4 Asynchronous Communications for NoCs
Stanislavs Golubcovs and Alex Yakovlev

Part II System-Level Design Techniques

5 Application-Specific Routing Algorithms for Low Power Network on Chip Design
Maurizio Palesi, Rickard Holsmark, Shashi Kumar, and Vincenzo Catania

6 Adaptive Data Compression for Low-Power On-Chip Networks
Yuho Jin, Ki Hwan Yum, and Eun Jung Kim

7 Latency-Constrained, Power-Optimized NoC Design for a 4G SoC: A Case Study
Rudy Beraha, Isask’har Walter, Israel Cidon, and Avinoam Kolodny


Part III Future and Emerging Technologies

8 Design and Analysis of NoCs for Low-Power 2D and 3D SoCs
Ciprian Seiculescu, Srinivasan Murali, Luca Benini, and Giovanni De Micheli

9 CMOS Nanophotonics: Technology, System Implications, and a CMP Case Study
Jung Ho Ahn, Raymond G. Beausoleil, Nathan Binkert, Al Davis, Marco Fiorentino, Norman P. Jouppi, Moray McLaren, Matteo Monchiero, Naveen Muralimanohar, Robert Schreiber, and Dana Vantrease

10 RF-Interconnect for Future Network-On-Chip
Sai-Wang Tam, Eran Socher, Mau-Chung Frank Chang, Jason Cong, and Glenn D. Reinman

Index


Contributors

Jung Ho Ahn Seoul National University, Sumon, Gyeonggi-do, Korea, [email protected]

Hideharu Amano Keio University, 3-14-1, Hiyoshi, Kohoku-ku, Yokohama, Japan 223-8522, [email protected]

Paul Ampadu University of Rochester, Rochester, NY 14627, USA, [email protected]

Mark A. Anders Intel Corporation, Hillsboro, OR, USA, [email protected]

Raymond G. Beausoleil Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Luca Benini DEIS, University of Bologna, Bologna, Italy, [email protected]

Rudy Beraha Qualcomm Corp. Research and Development, San Diego, California 92121, USA, [email protected]

Nathan Binkert Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Shekhar Y. Borkar Intel Corporation, Hillsboro, OR, USA, [email protected]

Vincenzo Catania Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, University of Catania, Italy, [email protected]

Mau-Chung Frank Chang Electrical Engineering Department, University of California, Los Angeles, Engineering IV Building, CA 90095, Los Angeles, USA, [email protected]

Israel Cidon Electrical Engineering Department, Technion - Israel Institute of Technology, Haifa 32000, Israel, [email protected]

Jason Cong Computer Science Department, University of California, Los Angeles, 4731J, Boelter Hall, Los Angeles, CA 90095, [email protected]

Giovanni De Micheli LSI, EPFL, Lausanne, Switzerland, [email protected]


Al Davis Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Marco Fiorentino Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Bo Fu University of Rochester, Rochester, NY 14627, USA, [email protected]

Stanislavs Golubcovs Asynchronous Systems Laboratory, School of EECE, Newcastle University, Newcastle upon Tyne, United Kingdom, [email protected]

Rickard Holsmark Department of Electronics and Computer Engineering, Jönköping University, Jönköping, Sweden, [email protected]

Yuho Jin Department of Electrical Engineering, University of Southern California, 3740 McClintock Ave., Los Angeles, CA 90089, USA, [email protected]

Norman P. Jouppi Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Himanshu Kaul Intel Corporation, Hillsboro, OR, USA, [email protected]

Eun Jung Kim Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843-3112, USA, [email protected]

Michihiro Koibuchi National Institute of Informatics, 2-1-2, Hitotsubashi, Chiyoda-ku, Tokyo, Japan 101-8430, [email protected]

Avinoam Kolodny Electrical Engineering Department, Technion - Israel Institute of Technology, Haifa 32000, Israel, [email protected]

Ram K. Krishnamurthy Intel Corporation, Hillsboro, OR, USA, [email protected]

Shashi Kumar Department of Electronics and Computer Engineering, Jönköping University, Jönköping, Sweden, [email protected]

Hiroki Matsutani The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan 113-8656, [email protected]

Moray McLaren Hewlett-Packard Labs, Palo Alto, Bristol, UK, [email protected]

Matteo Monchiero Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Srinivasan Murali LSI, EPFL and iNoCs, Lausanne, Switzerland, [email protected]

Naveen Muralimanohar Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]


Hiroshi Nakamura The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan 113-8656, [email protected]

Maurizio Palesi Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, University of Catania, Italy, [email protected]

Glenn D. Reinman Computer Science Department, University of California, Los Angeles, 4731-D Boelter Hall, Los Angeles, CA 90095, [email protected]

Robert Schreiber Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Ciprian Seiculescu LSI, EPFL, Lausanne, Switzerland, [email protected]

Eran Socher School of Electrical Engineering - Physical Electronics, Tel Aviv University, 234 Wolfson EE Lab Bldg, Tel Aviv University, Ramat Aviv, Tel Aviv 69978, Israel, [email protected]

Sai-Wang Tam Electrical Engineering Department, University of California, Los Angeles, Engineering IV Building, Los Angeles, CA 90095, USA, [email protected]

Dana Vantrease Hewlett-Packard Labs, Palo Alto, CA, USA, [email protected]

Isask’har Walter Electrical Engineering Department, Technion - Israel Institute of Technology, Haifa 32000, Israel, [email protected]

David Wolpert University of Rochester, Rochester, NY 14627, USA, [email protected]

Alex Yakovlev Asynchronous Systems Laboratory, School of EECE, Newcastle University, Newcastle upon Tyne, United Kingdom, [email protected]

Qiaoyan Yu University of Rochester, Rochester, NY 14627, USA, [email protected]

Ki Hwan Yum Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843-3112, USA, [email protected]


Part I Low-Level Design Techniques


Chapter 1 Hybrid Circuit/Packet Switched Network for Energy Efficient on-Chip Interconnections

Mark A. Anders, Himanshu Kaul, Ram K. Krishnamurthy, and Shekhar Y. Borkar

Abstract Network on-Chip (NoC) is an interconnect fabric that connects sub-system blocks on a chip. The NoC should provide high bandwidth and low latency, consume little energy, and be compact. However, these requirements are at odds with one another and require tradeoffs at all levels. In this chapter, we discuss issues and challenges for future NoCs with demands for high bandwidth and low energy. Next, we present details of how coupling packet-switched arbitration with circuit-switched data transfer can achieve these goals. In this hybrid network, packet-switched arbitration is used to reserve future circuit-switched channels for the data transfer, eliminating the performance bottlenecks associated with pure circuit-switched networks while maintaining their power advantage. Furthermore, proximity-based data streaming increases network throughput and improves energy efficiency. Measurements of this NoC in 45 nm CMOS are described to analyze design tradeoffs.

1.1 Network on-Chip: Past, Present, and the Future

Network on-Chip (NoC) has evolved from the good old supercomputer days, when computers in a cabinet, as well as multiple cabinets, were connected together to form a complete parallel computer system. These networks were primitive indeed, at times as simple as Ethernet, but nevertheless sufficient to provide the necessary bandwidth with acceptable latencies. That was then; now, with technology scaling over several generations, you can afford to have several computers on a single die, connected together by a network to form a homogeneous many-core parallel computer system. To take it even further, the integration capacity is now so vast that it is possible to integrate diverse functional blocks on a chip, connected by a communication network, to form a heterogeneous system, which we call a system on-chip or SoC. The network that connects these functional blocks together is the backbone of such a system. In this chapter, we discuss the state of the art in this field, the issues and challenges that we will face in the future, and some of the prominent work addressing these issues. We also propose a hybrid packet/circuit-switched network that combines the higher resource utilization of packet-switched networks with the low power consumption of circuit-switched networks to improve energy efficiency. The energy-efficiency advantage and design tradeoffs will be quantified with silicon measurements of an 8 × 8 mesh NoC in 45 nm CMOS.

M.A. Anders (✉) Intel Corporation, Hillsboro, OR, USA, e-mail: [email protected]

C. Silvano et al. (eds.), Low Power Networks-on-Chip, DOI 10.1007/978-1-4419-6911-8_1, © Springer Science+Business Media, LLC 2011

1.1.1 State of the Art in NoCs

The NoC has evolved over the last three decades, from the early days of single-chip microcontrollers incorporating several simple functional blocks to today’s sophisticated SoCs integrating diverse functional blocks on a chip. The early NoCs were good enough for the purpose, and as the bandwidth demand increased, they morphed into even more sophisticated networks, with higher-order topologies implemented with complex switches. Let us examine this evolution, comparing and contrasting the benefits of each stage.

1.1.1.1 Buses

A bus is the simplest NoC, used in the early days of microcontrollers to connect a tiny processor core to peripherals such as memory, timers, counters, and serial controllers. The bus was typically narrow, of the order of 8 to 16 bits wide, spanning the entire chip and connecting almost all the agents together. Such a long bus might seem very slow because of its large RC delay, but the chip frequency was limited by transistor performance, not by the bus. The most prominent feature of the bus is its simplicity, needing only a small transistor budget. On one hand, bus utilization is limited because the bus is shared, with all the agents arbitrating for it to transfer data. On the other hand, such a shared bus also provides the benefit of broadcast and multicast.

1.1.1.2 Rings

When transistors became faster and the bus RC delay started to dominate the operating frequency, the obvious solution was to use repeaters in the bus to improve the delay, ultimately leading to pipelined buses, where every repeater stage of the bus is clocked. The result is a pipelined bus made of repeated bus segments, with the clocked repeater stage at the agent itself. If the two far ends are connected together, the result is a ring [1]. The advantage of a ring is that it offers a higher frequency of operation, but with potentially increased latency of a number of clocks in each hop, and an average node-to-node latency of half the number of hops. A ring is good enough for a small number of agents; however, as the number of agents grows, the latency increases linearly.


1.1.1.3 Meshes

The latency limitation of rings led to higher-dimensional networks such as a mesh or a torus [2]. A mesh, too, is a segmented bus, but in two dimensions, with a switch at each agent and added complexity to route data across dimensions, such as from X to Y. The advantage of a mesh network is that the average latency grows slowly (as the square root) with the number of agents. However, it adds more complexity to network protocols and implementation logic and, if not designed carefully, can create hazardous conditions such as deadlocks. Such a network can be virtualized too, with virtual channels over physical links to further improve utilization [3].
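To make the linear versus square-root scaling concrete, the following Python sketch counts average hop distances by brute force. It is only an illustration: the function names are hypothetical, every hop is assumed to cost the same, and the ring is modeled both as one-way and two-way because the text does not say which is meant.

```python
from itertools import product


def avg_hops_unidirectional_ring(n):
    """Average hops between distinct nodes when data travels one way only."""
    total = sum((j - i) % n for i in range(n) for j in range(n) if i != j)
    return total / (n * (n - 1))


def avg_hops_bidirectional_ring(n):
    """Average hops when each transfer takes the shorter way around the ring."""
    total = sum(
        min((j - i) % n, (i - j) % n)
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return total / (n * (n - 1))


def avg_hops_mesh(k):
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    n = len(nodes)
    total = sum(
        abs(ax - bx) + abs(ay - by)
        for (ax, ay) in nodes
        for (bx, by) in nodes
    )
    return total / (n * (n - 1))


if __name__ == "__main__":
    for agents, side in ((16, 4), (64, 8), (256, 16)):
        print(
            agents,
            avg_hops_unidirectional_ring(agents),
            round(avg_hops_bidirectional_ring(agents), 2),
            round(avg_hops_mesh(side), 2),
        )
```

For 64 agents, this model gives an average of 32 hops on a one-way ring (half the number of agents), about 16.3 hops on a two-way ring, and about 5.3 hops on an 8 × 8 mesh.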

1.1.2 Issues and Challenges for the Future

As technology continues to scale, providing an abundance of transistors for the integration of diverse functional blocks, how will NoCs keep up? We now look into the challenges facing NoCs in the future.

1.1.2.1 Power and Energy

Consider an SoC on 45 nm technology, with eight agents connected on the die as shown in Fig. 1.1. Each successive technology generation will double the integration capacity, following Moore’s Law, so we can expect billions of transistors and almost 64 agents by the 15 nm technology node, providing terascale (Tera-ops) level performance. If the agents are connected by an 8 × 8 mesh network, then the wire segments in the mesh will be of the order of 1 mm in length.
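The doubling argument can be checked with a few lines of Python. This is a sketch under stated assumptions: the intermediate 32 nm and 22 nm generations are assumed here and are not named in the text, and the function names are illustrative.

```python
import math


def agents_by_generation(start_agents=8, nodes_nm=(45, 32, 22, 15)):
    """Double the agent count at each successive technology generation.

    The 32 nm and 22 nm entries are assumed intermediate generations.
    """
    return {node: start_agents * 2 ** i for i, node in enumerate(nodes_nm)}


def mesh_side(agents):
    """Side length of a square mesh holding `agents` nodes."""
    side = math.isqrt(agents)
    if side * side != agents:
        raise ValueError("agent count is not a perfect square")
    return side


if __name__ == "__main__":
    counts = agents_by_generation()
    print(counts)
    print(mesh_side(counts[15]))
```

Three doublings from eight agents at 45 nm give 64 agents at 15 nm, which fits an 8 × 8 mesh.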

Fig. 1.1 Future integration capacity for SoCs

Fig. 1.2 On-die interconnect delay and energy with respect to (a) length and (b) technology

Fig. 1.3 Hierarchical, heterogeneous NoC

Note that as technology scales, the number of agents on the die doubles, and if each agent carries a switch for a mesh network, then the energy dissipated in the switches increases proportionally. The number of wires doubles too, but the length of each wire reduces. Figure 1.2a shows the estimated delay and energy of a bus, and Fig. 1.2b shows the energy expended in the switch. Using these estimates, and assuming that the terascale SoC accesses one Tera-operand (32 bits each) per second, traversing 10 hops on average, the power consumption of the network alone would be too high.
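The shape of this estimate can be written as a one-line product. The sketch below is illustrative only: the energy per bit per hop must be read from Fig. 1.2, which is not reproduced here, so the value used in the example is a labeled placeholder and not a figure from the chapter.

```python
def network_power_watts(operands_per_s, bits_per_operand, avg_hops,
                        energy_per_bit_hop_j):
    """Network power = operand rate x operand width x hops x energy per bit-hop."""
    return operands_per_s * bits_per_operand * avg_hops * energy_per_bit_hop_j


if __name__ == "__main__":
    # PLACEHOLDER energy: 1 pJ per bit per hop (wire plus switch).
    # This number is hypothetical and is NOT taken from Fig. 1.2.
    placeholder_energy_j = 1e-12
    power = network_power_watts(
        operands_per_s=1e12,   # one Tera-operand per second
        bits_per_operand=32,   # 32-bit operands
        avg_hops=10,           # 10 hops on average
        energy_per_bit_hop_j=placeholder_energy_j,
    )
    print(f"{power:.0f} W")
```

Because the traffic term alone is 3.2 × 10^14 bit-hops per second, even a placeholder of 1 pJ per bit per hop would put the network at roughly 320 W, which illustrates why the per-hop energy has to be driven down.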

1.1.2.2 Heterogeneity

Clearly, a mesh network as a homogeneous NoC is not optimal. For short distances, such as between adjacent agents, a bus is a much better solution because both energy and delay can be low. Moreover, buses can be designed for low voltage swings to reduce the energy further. As the wire length increases and the wire delay approaches the latency of a switch, it becomes more appropriate to incorporate a traditional packet-switched mesh.

New approaches for a NoC are needed, as shown in Fig. 1.3. Agents in close proximity could be interconnected into clusters with traditional buses, which are energy efficient for data movement over short distances. The clusters could be connected together with wide (high bandwidth), low-swing (low energy) buses, or they could be connected with packet- or circuit-switched networks, depending on the distance. Hence, the NoC could be hierarchical and heterogeneous, a radical departure from the traditional approach to NoCs [4].

1.2 Proposed Hybrid Packet/Circuit Switched NoC

As integration densities continue to increase in a power-limited environment, multi-core processors provide increased performance vs. power efficiency through parallel processing at reduced voltages and frequencies. Innovations in interconnect networks for on-die communication between cores are key to enabling scalable performance as the number of cores increases [5–8]. By combining network topology and architecture advantages with efficient circuit implementations, more efficient communication can be achieved. For multi-core networks, packet-switched 2D meshes provide efficient interconnect utilization, low latencies, and high throughputs, but suffer from low energy efficiency due to data storage during routing [9, 10]. Circuit-switched data transfer achieves both high bandwidth and energy efficiency by eliminating intra-route data storage [11–13]. It offers a dedicated channel during data transmission without the need for intermediate buffering or arbitration. However, because buffering and arbitration are avoided, the dedicated channel resources must be reserved prior to data transmission, possibly preventing other, more optimal data transmissions from occurring. Unlike prescheduled source-directed routing schemes [2, 14], distributed routing schemes are not limited to predefined traffic patterns or applications, but determine packet routes and priorities for the reservation of resources based on incomplete real-time information. Therefore, in order to overcome the challenges of resource allocation and distributed control, efficient circuits are needed that can approach the throughput of packet-switched networks while maintaining the energy savings of a circuit-switched network.

1.2.1 Circuit-Switched Data with Packet-Switched Arbitration NoC

A circuit-switched 2D mesh network with packet-switched arbitration is composed of a packet-switched request address network alongside a circuit-switched acknowledge and data network (Fig. 1.4). This heterogeneous network allows channel allocation to be delayed for improved resource utilization, since small packets reserve channels before data transfer. However, since the data transfer uses circuit-switched paths without intermediate data storage, energy savings are also maintained. Furthermore, efficient circuits improve the overall network efficiency by reducing arbitration overhead and increasing overall utilization of network resources.

Fig. 1.4 Circuit-switched 2D mesh organization

Fig. 1.5 Circuit-switched pipeline and clocking

A data transmission using this hybrid network is composed of three separate phases (Fig. 1.5). During the setup phase for a circuit-switched data transmission, request packets containing the destination address are routed using the packet-switched network. As the request packet passes each router and interconnect segment, the corresponding circuit-switched data channel for that segment is allocated for the future circuit-switched data transmission. When the request packet reaches its destination, a complete channel or circuit has been allocated. This channel has a latching or storage element only at the destination, with only multiplexers and repeaters along the way. Acknowledge signals indicate that the channel is ready for the data transfer, thus completing the setup phase. When the channel is ready, the source router drives the data onto its output, where it propagates to the destination without interruption by state elements. Following the reception of data at the destination, the channel is deallocated at the end of the cycle.
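The request, acknowledge, and data phases described above can be traced with a small behavioral model. This is a minimal sketch of a single uncontended transfer, assuming the path is already known; the function name, event labels, and data structures are hypothetical and do not represent the hardware implementation.

```python
def hybrid_transfer(path, payload):
    """Trace one uncontended transfer along `path` (router ids, source first).

    Returns the payload as received at the destination and an event log.
    """
    if len(path) < 2:
        raise ValueError("path needs a source and a destination")

    segments = list(zip(path, path[1:]))
    allocated = set()
    events = []

    # Phase 1 (request): the request packet moves hop by hop over the
    # packet-switched network, allocating the data channel of each segment.
    for seg in segments:
        allocated.add(seg)
        events.append(("request", seg))

    # Phase 2 (acknowledge): the channel is ready once every segment
    # between source and destination has been allocated.
    ready = allocated == set(segments)
    events.append(("acknowledge", ready))

    # Phase 3 (data): the source drives the data through multiplexers and
    # repeaters only; the single storage element is at the destination.
    events.append(("data", path[-1]))

    # The channel is released after the data has been received.
    allocated.clear()
    events.append(("deallocate", len(allocated)))
    return payload, events


if __name__ == "__main__":
    data, log = hybrid_transfer([0, 1, 2, 3], "operand")
    for event in log:
        print(event)
```

The model ignores contention, hold signals, and queue slots; it only shows the order in which resources are allocated, confirmed, used, and released.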

Compared to purely packet-based networks, energy is reduced by not storing data between the source and destination. Also, since only a single header packet is transmitted to allocate each channel, without multiple subsequent data flits, the traffic on the packet-switched network is also reduced. During this allocation phase, packets only hold resources at their current router while storing their routing direction for future data transfer. In contrast, a purely circuit-switched network would hold resources between the source router and its current arbitration location even when blocked by other traffic. Because circuit-switched resource allocation is a distributed optimization without global control, efficient circuit techniques such as pipelining and queue slots can further improve overall energy efficiency by increasing utilization of the wide circuit-switched data buses. Pipelining of the three routing phases improves the data throughput. Different clocks are used to synchronize the packet-switched request and circuit-switched data portions of the network (Fig. 1.5). Since each request packet travels only between neighboring cores in each cycle, it can operate with a higher-frequency clock (PClk) than the circuit-switched portion (CClk), where data may travel across the whole network in each cycle. During circuit-switched data transmissions, acknowledges for future transmissions are sent (Fig. 1.6). Also, request packets are simultaneously creating new channels by storing the routing direction for future data transmissions. This pipelining removes the request and acknowledge phases from the critical path, improving circuit-switched throughput by 3×.
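The 3× figure follows from overlapping the three phases. The sketch below is a simplified cycle count that assumes each of the three phases occupies one equal-length time slot; that equal-slot assumption and the function names are illustrative, not taken from the measured design.

```python
def cycles_unpipelined(transfers, phases=3):
    """Each transfer runs request, acknowledge, and data back to back."""
    if transfers < 1:
        raise ValueError("need at least one transfer")
    return phases * transfers


def cycles_pipelined(transfers, phases=3):
    """Phases of successive transfers overlap; one transfer completes per slot
    once the pipeline is full."""
    if transfers < 1:
        raise ValueError("need at least one transfer")
    return phases + (transfers - 1)


def pipelining_speedup(transfers, phases=3):
    return cycles_unpipelined(transfers, phases) / cycles_pipelined(transfers, phases)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, round(pipelining_speedup(n), 3))
```

A single transfer gains nothing, but the speedup approaches 3 as the number of back-to-back transfers grows, since only the data phase remains on the critical path.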

In order to further improve resource utilization with distributed control, queue slots added to each router port store multiple request paths. This provides several potential paths for the circuit-switched network to choose from during the acknowledge phase. With this increase in available data transfer paths, more optimal non-interfering simultaneous data transfers occur, improving total throughput and resource utilization.

Fig. 1.6 Network timing diagram


1.2.2 Circuit Innovations for Circuit/Packet Switched Network Arbitration

Each router within the network is divided into five separate ports: north, south, east, west, and core (Fig. 1.7). Each of these ports is further divided into IN and OUT ports, for receiving and transmitting data, respectively. All ports within a router are fully connected as a crossbar. In order to avoid deadlocks, the 2D mesh uses x-first, y-second routing, and unused paths are removed from within the router. Request packets for initial arbitration are sent between neighboring routers, with packet hold signals providing flow control. Bidirectional acknowledge signals, from source (SrcAck) and destination (DestAck), indicate that a circuit-switched path is ready for data transfer during the next CClk cycle, completing arbitration. Finally, circuit-switched data is routed from source to destination.
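The x-first, y-second rule can be expressed as a short routing function. The sketch below assumes a coordinate convention that is not stated in the text (x grows toward east, y grows toward north), and its function names are hypothetical.

```python
# Assumed convention: x grows toward east, y grows toward north.
STEP = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}


def xy_route(current, dest):
    """Return the OUT port for a request at `current` heading to `dest`.

    The x offset is resolved completely before any y movement.
    """
    cx, cy = current
    dx, dy = dest
    if dx > cx:
        return "east"
    if dx < cx:
        return "west"
    if dy > cy:
        return "north"
    if dy < cy:
        return "south"
    return "core"  # arrived: deliver to the local core port


def xy_path(src, dst):
    """List the routers visited from `src` to `dst` under x-first routing."""
    path = [src]
    current = src
    while (port := xy_route(current, dst)) != "core":
        sx, sy = STEP[port]
        current = (current[0] + sx, current[1] + sy)
        path.append(current)
    return path


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

Because a packet never turns from the y dimension back into the x dimension, the turns that could close a cycle of waiting packets never occur, which is also why the corresponding crossbar paths can be removed from the router.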

Fig. 1.7 Circuit-switched network router organization

Fig. 1.8 (a) Slot selection and (b) path selection and data transmission

Fig. 1.9 Arbitration and slot generation circuits

As each request packet propagates from one router to the next, its routing direction is stored in a queue slot. During a CClk cycle, each router port independently selects one of its queue slots, based on a rotating priority (Fig. 1.8a). The direction previously stored in that queue slot is used to route source and destination acknowledge signals. Arrival of both acknowledges at any router along the path indicates that the complete path is ready for data transmission in the next cycle (Fig. 1.8b). Paths that are not ready must wait for a future CClk cycle, while ready paths free their resources following data transmission. The request packet circuits route packets containing the destination address and queue slot (Fig. 1.9). The IN port compares the router and destination addresses to determine the routing direction and corresponding OUT port. Round-robin priority circuits select one of the valid packets at each OUT port and send Hold signals to the unselected IN ports. As each request packet is transmitted, the routing direction is written to the queue slot entry within a 2b register file, creating request paths from router to router. The Hold signal is also asserted when the requested queue slot is full or a Hold signal arrives from the following router, preventing that request packet’s IN port from continuing.
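The round-robin selection and Hold behavior at one OUT port can be modeled in software as follows. This is a behavioral sketch only: the class name, method signature, and the choice to advance the priority pointer past the last granted port are assumptions, and the model does not describe the actual circuits of Fig. 1.9.

```python
class RoundRobinArbiter:
    """Behavioral model of request arbitration at a single OUT port."""

    def __init__(self, in_ports):
        self.in_ports = list(in_ports)
        self.next_idx = 0  # IN port with highest priority this cycle

    def arbitrate(self, requesting, slot_full=False, downstream_hold=False):
        """Return (granted_port, held_ports) for one arbitration cycle.

        Every requester is held if the requested queue slot is full or the
        following router asserts Hold; otherwise one requester is granted and
        the rest receive Hold.
        """
        requesting = set(requesting)
        if slot_full or downstream_hold or not requesting:
            return None, requesting

        n = len(self.in_ports)
        for offset in range(n):
            idx = (self.next_idx + offset) % n
            port = self.in_ports[idx]
            if port in requesting:
                # Rotate priority so the granted port is served last next time.
                self.next_idx = (idx + 1) % n
                return port, requesting - {port}
        return None, requesting


if __name__ == "__main__":
    arb = RoundRobinArbiter(["north", "south", "east", "west", "core"])
    print(arb.arbitrate({"south", "west"}))
    print(arb.arbitrate({"south", "west"}))
    print(arb.arbitrate({"south"}, slot_full=True))
```

With the same two IN ports requesting in consecutive cycles, the grant alternates between them, so no port is starved; a full queue slot or a downstream Hold stalls all requesters without changing the priority pointer.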

