Date post: | 14-Dec-2015 |
Category: |
Documents |
Upload: | nikolas-garfield |
View: | 216 times |
Download: | 2 times |
Presentation of
Designing Efficient Irregular Networks forHeterogeneous Systems-on-Chip
byChristian Neeb and Norbert Wehn
and
Workload Driven Synthesis of Irregular Application Specific NoC's
byBen Meakin
Introduction
Irregular vs Regular NetworksN1 N2 N3
N4 N5 N6
N7 N8 N9
N1 N2 N3
N4 N5 N6
N7 N8 N9
Advantages: High performance/low power
for specific application Lower hardware cost of
network Disadvantages:
Difficult routing Difficult floor planning/P&R
Advantages: Efficient for general purpose
applications Easy routing Easy floor planning/P&R
Disadvantages: Expensive Under utilized channels Avg. latency not as scalable
Background and Motivation
NoC power a function of network hops: Pperflit = (buff + xbar + arb)*Hops + wire*D
NoC latency a function of network hops: L = Rpipeline * Hops (clock cycles)
NoC hardware cost a function of router radix: C = sum (k1*radix_n + k2) n = 1....nodes
Minimize hops and router complexity!
Design of Efficient Irregular Networks for Heterogeneous Systems-on-Chip
Christian Neeb and Norbert WehnUniversity of Kaiserslautern
9th EUROMICRO Conference on Digital System Design 2006
Application Communication Model
Resource Communication Graph (RCG) Vertex: hardware component Edge: abstract communication channel Edge Weight: average communication rate (bandwidth)
Normalized to physical bandwidth of one channel link
Efficient mapping of task communication to RCG is assumed and is not discussed in this paper
INOC Architecture
Nodes are tiles consisting of a chip resource and a router
Routers: Wormhole with virtual channels
Round-Robin switch arbitration
Routing: Deterministic: one output channel per destination
address Adaptive: > one output channel per destination address
Routing in Irregular Topologies
Channel Dependency Graph (CDG) Vertex: channel Edge: pair of channels with a dependency due to
routing function Deadlock free if there are no cycles
Node Numbering Arbitrary in this paper
Channels Increasing / Channels Decreasing Acyclic CDGs Determined by source/destination node numbers
Deadlock Free Routing Route packets on increasing channels, then decreasing
General form of dimension order routing
Routing in Irregular Topologies
Unsafe paths allowed provided there is a safe path (acyclic) that a packet can escape to Routing table filled via Dijkstra's algorithm
Virtual channels are used to expand routing function using similar rules as dimension order schemes
Network Generation
Start with a bidirectional connection of all nodes in ascending order Ensures a correct routing path exists
Add shortcuts based on network traffic estimation and RCG Provides an optimal routing path that will be used in
the common case
Results
Network traffic patterns randomly generated with constraints to model heterogeneous SoC traffic Simulated with 100 different patterns
Variable flit injections rates
Limitations
Naive implementation of “safe path” increases cost/complexity of network RCG is the most efficient topology More efficient deadlock free solutions may exist
Efficient mapping of TCG to RCG is assumed Can be a difficult problem Network generation could be linked to a specific
programming model or API for simple TCG to RCG mapping
Project Objectives
Synthesis tool for INoC's Generation of optimal network topology MCAPI workload specification Synthesis of deadlock free routing
Comparison of irregular to regular NoC performance, power, and cost
Motivation
CAD for ASICS always needs improvement Reduce design cost, time-to-market, etc. Synthesis of INoCs could be useful in
implementing parallel algorithms in programmable logic
Average router complexity needs to be minimized Asynch NoCs can realize improvement in average case performance
GALS INoCs could be future direction in heterogeneous SoC
Minimize hops to reduce power and latency
Specification of Workload
Communication channels:
ScalarChan N1 N2 256e9 10 Comm Type Sender Receiver BW (bits/sec) Priority
Optimization Effort:
E = 100 * ((BW / Norm)/2 + (PR / Norm)/2)
Network Synthesis Algorithm
Sort channels by priority/effort Add all nodes to network For each channel:
if sender has links: while link has not been added
if receiver has links, add link; else increase search depth else recursive call from next sender neighbor node
while not done if network is not correct, add link; else done
Network Synthesis Example
N1 N2
Max Radix is 3 N1 – N2 added
Highest priority link
Network Synthesis Example
N1 N2
N3
N4
N1 – N4 added N1 – N3 added N3 – N4 added
Network Synthesis Example
N1 N2
N3
N4
N5
N6
N7
N1 – N5 added Max radix of N1 reached, so
N2 used as via
N6 – N7 added N5 – N8 added Are we done?
N8
Network Synthesis Example
N1 N2
N3
N4
N5
N6
N7
N2 – N6 added for correctness Every node must have a path to every other node
N8
Synthesis of Routing Function
Rename nodes with naming algorithm Depth first, breadth first, root node selection
Most efficient synthesis for application For each node pair in workload
check if shortest path is correct if not, add a virtual channel to make it correct
Most correct synthesis for general purpose For ALL node pairs in network
check if shortest path is correct if not, add a virtual channel to make it correct
Synthesize ROM for each node
Uses of this Tool
Incorporate with previous MCAPI hardware implementation Allow user to synthesize a working HDL model based
on MCAPI workloads and a set of IP blocks Modern FPGA's could make this a feasible and cost
effective alternative to fabricating ASIC's
Aid for studying optimal topology for known network traffic
Questions?