Handling Global Traffic in Future CMP NoCs
Ran Manevich, Israel Cidon, and Avinoam Kolodny.
QNoC Research Group
Electrical Engineering Department, Technion – Israel Institute of Technology
Haifa, Israel
SLIP 2012
Bandwidth Version of Rent’s Rule
B = k·G^R
B – Cluster external bandwidth.
k – Average bandwidth per module.
G – Number of modules in a cluster.
R – Rent’s exponent, 0 < R < 1.
[Figure: example cluster with G = 16 modules]
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
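The bandwidth version of Rent’s rule can be evaluated directly. The following is a minimal sketch; the function name `cluster_bandwidth` and the sample values of k, G and R are illustrative assumptions, not taken from the slides.

```python
def cluster_bandwidth(k: float, G: int, R: float) -> float:
    """External bandwidth B = k * G**R of a cluster of G modules.

    k: average bandwidth per module, G: modules in the cluster,
    R: Rent's exponent (0 < R < 1). Hypothetical helper for illustration.
    """
    if G < 1:
        raise ValueError("a cluster needs at least one module")
    if not 0 < R < 1:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    return k * G ** R


# A lower R means more local traffic: the external bandwidth of a
# cluster grows more slowly with the cluster size G.
for R in (0.5, 0.7, 0.9):
    print(R, [round(cluster_bandwidth(1.0, G, R), 2) for G in (1, 4, 16, 64)])
```

Because R < 1, B grows sub-linearly in G, which is the sense in which the exponent reflects traffic locality.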
Rent’s Exponent Reflects Traffic Locality
CMP NoC Traffic Follows Rent’s Rule
2D Mesh NoC
~Average of CMP parallel programs*
* Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008
2D Mesh – Packets Classification by Distance
For illustration purposes, packets are classified according to the distance between source and destination.
Nearest Neighbor (NN) – Dist = 1
Local – 1 < Dist < 2 + K/8
Global – Dist ≥ 2 + K/8
Fraction of global packets decreases in large systems.
[Figure: packet classes for K = 8 and K = 16; Rent’s exponent (R) = 0.7]
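The three packet classes can be sketched as a small function. This assumes that Dist is the hop (Manhattan) distance between source and destination routers in the K×K mesh; the name `classify_packet` and the coordinate convention are illustrative assumptions.

```python
from typing import Tuple


def classify_packet(src: Tuple[int, int], dst: Tuple[int, int], K: int) -> str:
    """Classify a packet as "NN", "Local" or "Global" in a K x K mesh.

    Assumption: Dist is the Manhattan distance between src and dst.
    """
    dist = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    if dist < 1:
        raise ValueError("source and destination must differ")
    global_threshold = 2 + K / 8
    if dist == 1:
        return "NN"        # Dist = 1
    if dist < global_threshold:
        return "Local"     # 1 < Dist < 2 + K/8
    return "Global"        # Dist >= 2 + K/8


print(classify_packet((0, 0), (0, 1), K=8))
print(classify_packet((0, 0), (1, 1), K=8))
print(classify_packet((0, 0), (2, 1), K=8))
```

With K = 8 the global threshold is 3 hops, and with K = 16 it is 4 hops.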
Dominance of Global Packets in BW/Router and Light Load Latency
Nearest Neighbor traffic is dominant in small systems.*
* Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010
In large systems:
1. Global packets are a minority.
2. Global packets dominate BW/router and average latency.
Problem!!!
In large systems, global packets (a minority):
Consume most of the network’s BW.
Significantly increase average light-load latency.
Solution - PyraMesh
Hierarchical 2D mesh.
Global packets are routed through higher hierarchy levels.
Overall hop count is reduced.
Average latency is reduced.
Average BW per router is reduced.
[Figure: route from Source to Dest. takes 8 hops instead of 14!]
PyraMesh - Architecture
K – The size of the base mesh.
NL – Number of levels.
NP – Number of pyramids on top of the base mesh.
αi – Ratio between the sizes of levels i and i+1.
Ci – Number of routers in level i that are connected to a router in level i+1 along a single dimension.
Example configurations:
K = 8, NL = 2, NP = 1, αi = 4, Ci = 2
K = 8, NL = 3, NP = 1, αi = 2, Ci = 1
K = 8, NL = 2, NP = 4, αi = 4, Ci = 1
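The level sizes implied by K and αi can be sketched as follows. This assumes that "size" means the side length of a square mesh (as K does for the base mesh) and covers a single pyramid (NP = 1) only; the helper name `level_sizes` is hypothetical.

```python
from typing import List


def level_sizes(K: int, alphas: List[int]) -> List[int]:
    """Side length of every level of one pyramid, base mesh first.

    alphas[i] is the ratio between the sizes of consecutive levels,
    so a pyramid with NL levels takes NL - 1 ratios.
    Assumption: each ratio divides the size of the level below it.
    """
    sizes = [K]
    for alpha in alphas:
        if alpha < 1 or sizes[-1] % alpha != 0:
            raise ValueError("each ratio must divide the level below it")
        sizes.append(sizes[-1] // alpha)
    return sizes


print(level_sizes(8, [4]))     # K = 8, NL = 2, alpha = 4
print(level_sizes(8, [2, 2]))  # K = 8, NL = 3, alpha = 2
```

Under these assumptions the first call describes a 2×2 mesh on top of the 8×8 base mesh, and the second a 4×4 mesh and a 2×2 mesh stacked above it.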
PyraMesh – Addressing and Routing
Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter.
[Equation for Address_{i,(X,Y)} not recoverable from the source.]
Routing – XY.
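XY routing within a single mesh level can be sketched as below: the packet first travels along the X dimension until its column matches the destination, then along Y. This is the generic dimension-ordered scheme, not the authors' implementation; the function name `xy_route` and the (x, y) tuple convention are assumptions.

```python
from typing import List, Tuple


def xy_route(src: Tuple[int, int], dst: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Routers visited by XY routing from src to dst (both included)."""
    x, y = src
    path = [(x, y)]
    # First resolve the X offset ...
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # ... then the Y offset.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


route = xy_route((0, 0), (7, 7))
print(len(route) - 1, "hops")
```

The hop count is always |ΔX| + |ΔY|, which is why long-distance packets are expensive in a flat mesh.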
PyraMesh – Packets Classification
Packets are distributed among levels i according to their travel distance (D) in the base mesh.
DThi – Distance threshold of level i. If D > DThi, the packet is directed to level i+1.
Example: DThi = 6, 12, 20

Highest Level    Travel Distance
4                D > 20
3                12 < D ≤ 20
2                6 < D ≤ 12
1 (Base Mesh)    D ≤ 6
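The threshold rule can be sketched with a sorted-list lookup. The function name `highest_level` is hypothetical; the thresholds 6, 12, 20 are the ones from the example above.

```python
from bisect import bisect_left
from typing import Sequence


def highest_level(D: int, thresholds: Sequence[int]) -> int:
    """Highest hierarchy level used by a packet with travel distance D.

    thresholds[i - 1] is DTh of level i, sorted in ascending order.
    A packet moves up from level i to level i + 1 whenever D > DTh_i.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # bisect_left counts the thresholds strictly below D,
    # i.e. how many times the packet is directed one level up.
    return bisect_left(thresholds, D) + 1


DTH = [6, 12, 20]
for D in (3, 6, 7, 12, 13, 20, 21):
    print(D, highest_level(D, DTH))
```

Using `bisect_left` keeps the boundaries inclusive on the lower level, so D = 6 stays in the base mesh and D = 7 is the first distance routed through level 2.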
PyraMesh – Optimization
Constraints: area overhead, wiring overhead.
Optimization objectives: maximum bandwidth per router*, average light-load latency*.
Each of these = F(K, NL, NP, αi, Ci, DThi*, R*)
Optimization Results
Example of a 16x16 system, R = 0.7
Throughput optimized PyraMesh:
Light load latency optimized PyraMesh:
Packets distance thresholds, as listed in the figure:
D≤5, 5<D≤8, D>8
D≤6, 6<D≤18, D>18
Light Load Latency Performance
BMesh – The baseline mesh.
Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.
HNoC –
Throughput Results, R = 0.7
Our Contributions
The observation that global packets limit scalability of large systems.
PyraMesh – A novel framework for hierarchical NoC design.
Characterization of Rentian traffic in large NoCs.
Conclusions
Global packets limit performance in large (future) CMP systems.
PyraMesh – A novel class of hierarchical 2D mesh topologies.
PyraMesh handles global traffic in future CMP NoCs.
Thank You!
Related Work
CMesh – J. D. Balfour and W. J. Dally, “Design tradeoffs for tiled CMP on-chip networks”, International Conference on Supercomputing, 2006.
GigaNoC – C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert, “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”, Euromicro DSD 2007.
Long Range Links – U. Y. Ogras and R. Marculescu, “ ‘It’s a small world after all’: NoC performance optimization via long-range link insertion”, IEEE Trans. on Very Large Scale Integr. (VLSI) Syst., 2006.
Hierarchical Rings on a Mesh – S. Bourduas and Z. Zilic, “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”, ASAP 2007.
Hierarchical 2-Levels 2D Mesh – M. Winter, S. Prusseit, and G. P. Fettweis, “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”, ISOCC 2010.