King Fahd University of Petroleum and Minerals
CCSE – COE SOC 2006 – Tampere, 14-16 Nov. 2006 Abdelhafid Bouhraoua
A High Throughput Network-on-Chip Architecture for System-on-Chip Interconnect
Abdelhafid Bouhraoua and M.E.S. El-Rabaa
Computer Engineering Department (COE), College of Computer Science and Engineering (CCSE)
King Fahd University of Petroleum and Minerals (KFUPM), Dhahran, Eastern Province, Saudi Arabia
Outline
• Networks-on-Chips
• State of the Art
• Fat Tree Network Properties
• Modified Fat Tree
• Router Architecture
• Performance Evaluation
• Conclusion
Semiconductor Industry Future
Technology Evolution is Faster than Design Evolution
[Figure: technology node (nm) and transistors per chip (hundreds of millions) versus year, 2000 to 2012. Data labels: 130, 90, 65, 45, 32 and 22 nm; 1, 1.9, 3.9, 7.8, 15 and 31 hundred million transistors per chip]
Chip Design Methodology
Chip complexity: 100M to 3B transistors → ASIC methodology not suitable
• RTL → Synthesis → Back End flow
• Impossible to handle a full RTL design for the whole project
IP-reuse based methodology
• Opens a wide range of possibilities
• IP blocks together on a chip: "System-on-Chip"
From the ASIC era to the System-on-Chip era
SoC Constraints
Very Short Time-to-Market
• Compressed schedule
Very Short Lifecycle
• Low development cost
• Small team
High Complexity
• Available silicon resources to produce cost-effective, highly integrated SoCs
Broad Range of IP Blocks
• Impossible to know them all
SoC Methodology
Main task: integration of the IP blocks
System-Level Integration
• Data formatting and conversion
• Protocol interfacing
• Control interfacing
Interconnection-Level Integration
• Signal interfacing
• Data transfer interfacing
• Wire interconnect and back-end integration
Interconnecting The IPs
Brute-force method: design an interface block for every pair of IPs in the SoC → point-to-point communication between IPs
Problems: design effort similar to that of a new IP block
• With 20 different blocks, around 400 new designs!
Point-to-point communication → wiring mess
• 50 blocks with 8-bit bidirectional links → more than 20,000 globally routed wires
Should look for a better way
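Both figures on this slide can be reproduced with a back-of-the-envelope count. The sketch below is illustrative only. For design count, it assumes one interface design per ordered pair of distinct blocks. For wiring, it assumes a dedicated 8-bit link in each direction between every pair of blocks. Control and handshake lines are ignored, so the data-wire count alone lands just under 20,000; even one control wire per direction pushes the total past it.

```python
def interface_designs(num_blocks: int) -> int:
    """Hypothetical count: one interface design per ordered pair of distinct IP blocks."""
    return num_blocks * (num_blocks - 1)


def p2p_data_wires(num_blocks: int, width_bits: int) -> int:
    """Global data wires if every pair of blocks gets a dedicated link that is
    `width_bits` wide in each direction (control and handshake lines not counted)."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs * 2 * width_bits


print(interface_designs(20))  # 380, i.e. "around 400"
print(p2p_data_wires(50, 8))  # 19600 data wires alone
```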
Networks-on-Chips: "Route Packets, NOT Wires" (William J. Dally)
Idea: build a complete on-chip network
Unified communication model (similar to the OSI stack)
• No ad-hoc effort
• Standardized interfacing (may be provided by IP vendors)
Unified network elements (routers, link interfaces)
• No design required by the SoC teams
Flexible interconnect and reduced global wiring
NoC Requirements
Performance
• How fast are packets moved across the network?
• How much traffic is carried at the same time, and for how long?
Overhead
• How large is it (in gates)?
Adaptivity
• Does it adapt easily to new designs?
Complexity
• How easy is it to interface to?
Previous Work
Majority directly derived from other research (interconnection networks for parallel architectures)
• Reproduces what has been learned in the area of inter-chip networks
• Focuses on the router architecture alone to achieve certain latency goals
• Asynchronous design of NoCs, mainly GALS (Globally Asynchronous, Locally Synchronous)
• Circuit-switching techniques introduced to provide latency guarantees
Did not fully take advantage of the fact that the network is on-chip, where the main gain is the absence of pin limitations
• Router architectures directly derived from inter-chip architectures, where each router was implemented on a single chip → substantial overhead
• The added complexity needed to guarantee latency is overkill in the on-chip context
Which Network? Most Straightforward: Crossbar
• Good throughput (maxes at 66%)
• Non-scalable: quadratic implementation complexity for a higher number of I/Os
2-D Mesh
Very popular topology in NoCs
• Very suitable for the 2D nature of chip floorplanning (tiling)
Very high constraints
• Inefficient routing algorithms (deadlock-free by construction)
• Efficient routing algorithms (complex implementation)
• Poor performance: saturation reached at 30%
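The slide does not name specific algorithms. As one illustration of a routing algorithm that is deadlock-free by construction, the sketch below implements classic dimension-order (XY) routing; that choice is an assumption. A packet first travels along X, then along Y, and never turns from Y back to X, so no cyclic channel dependency can form. The price is a path fixed in advance, regardless of congestion. The EAST/NORTH naming convention is also an assumption.

```python
from typing import List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a router in the mesh


def xy_route(src: Coord, dst: Coord) -> List[str]:
    """Dimension-order (XY) routing: correct the X offset fully, then the Y offset.

    A packet never turns from the Y dimension back into X, so the channel
    dependency graph has no cycles and the scheme cannot deadlock.
    Convention assumed here: EAST = +x, NORTH = +y."""
    (x, y), (dx, dy) = src, dst
    hops: List[str] = []
    while x != dx:
        step = 1 if dx > x else -1
        hops.append("EAST" if step > 0 else "WEST")
        x += step
    while y != dy:
        step = 1 if dy > y else -1
        hops.append("NORTH" if step > 0 else "SOUTH")
        y += step
    return hops


print(xy_route((0, 0), (2, 1)))  # ['EAST', 'EAST', 'NORTH']
```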
Analysis
Low throughput means that latency cannot be guaranteed above the maximum throughput level.
Low throughput is caused by contention among several incoming packets for the output ports of routers.
Contention cannot be prevented from happening, and it makes router architectures more complex because they must integrate buffering and prioritization logic.
Routers that implement both packet and circuit switching make the architecture even more complex.
Methodology
Take advantage of the on-chip context:
• Design frozen before tape-out
• No internal I/O limitations
Aim for a high-throughput architecture
• Circuitry used at 30% of its maximum is NOT an optimal solution (clock frequency, power)
Reduced router size
• Integrate a large number of routers
Wormhole routing vs. store-and-forward
• Reduce required buffers in routers
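The buffering argument for wormhole routing over store-and-forward can be made concrete with the standard zero-load latency models. In the sketch below, store-and-forward pays the packet serialization time at every hop and needs a full-packet buffer per router. Wormhole routing pipelines the flits and needs only a few flits of buffering. The packet size, link width, hop count, and one-cycle router delay in the example are hypothetical values, not figures from the slides.

```python
import math


def serialization_cycles(packet_bits: int, link_bits: int) -> int:
    """Cycles needed to push a whole packet across one link of `link_bits` per cycle."""
    return math.ceil(packet_bits / link_bits)


def store_and_forward_latency(hops: int, packet_bits: int, link_bits: int, router_delay: int) -> int:
    # Every router waits for the complete packet before forwarding it, so
    # serialization is paid at every hop and each router buffers a full packet.
    return hops * (router_delay + serialization_cycles(packet_bits, link_bits))


def wormhole_latency(hops: int, packet_bits: int, link_bits: int, router_delay: int) -> int:
    # The header flit is forwarded as soon as it is routed and the body follows
    # in a pipeline, so serialization is paid once and routers hold only a few flits.
    return hops * router_delay + serialization_cycles(packet_bits, link_bits)


# Hypothetical numbers: 64-byte packet, 32-bit links, 5 routers, 1-cycle routing delay.
print(store_and_forward_latency(5, 512, 32, 1))  # 85
print(wormhole_latency(5, 512, 32, 1))           # 21
```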
Fat Tree
[Figure: fat tree with three rows of four routers (R) above eight clients (C)]
Bidirectional multistage, or folded multistage, networks
Bidirectional multistage networks come in two forms:
• The Fat Tree (FT)
• The butterfly
Fat Tree better than butterfly (previous work)
What topology resembles a crossbar? Banyans, or Multistage Interconnection Networks.
n+1 stages (or rows); size is
• Routers = $(n+1) \times 2^n$
• Clients = $2^{n+1}$
Diameter = $2\log_2 k + 1$, with $n = \log_2 k$
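The size formulas can be checked against the drawing, which shows three rows of four routers above eight clients (n = 2). The function below is a sketch under stated assumptions:

• Each of the n + 1 rows holds $2^n$ routers.
• Each bottom-row router serves two clients.
• The diameter counts routers traversed on the longest up-and-down path.
• k is read as the number of routers per row ($k = 2^n$).

```python
def fat_tree_size(n: int) -> dict:
    """Size of the binary fat tree drawn on the slide, as a function of n.

    Assumptions: n + 1 router rows, 2**n routers per row, two clients per
    bottom-row router, diameter counted as routers traversed on the longest path."""
    rows = n + 1
    routers_per_row = 2 ** n
    clients = 2 * routers_per_row   # = 2**(n + 1)
    # Longest path: up from a bottom router to the top row, then back down.
    diameter = 2 * rows - 1         # = 2n + 1 = 2*log2(k) + 1 with k = 2**n
    return {"rows": rows, "routers": rows * routers_per_row,
            "clients": clients, "diameter": diameter}


print(fat_tree_size(2))  # {'rows': 3, 'routers': 12, 'clients': 8, 'diameter': 5}
```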
Routing in Fat Tree
Routing reduced to routing in a binary tree
Three routing directions: UP, RIGHT, LEFT
[Figure: binary tree, and a router with an UP port on top and LEFT and RIGHT ports below]
Routing in Fat Tree
Router (r,c) port ranges:
• UP: $[0, l[ \,\cup\, [u, 2^{n+1} - 1]$
• LEFT: $[l, l + 2^{r-1}[$
• RIGHT: $[l + 2^{r-1}, u[$
Lower bound l: smallest address reached from router (r,c), obtained by clearing the lowest r bits of the column c
• $l = \lfloor c / 2^r \rfloor \times 2^r$
Upper bound u: one past the largest address reached from router (r,c), obtained by adding $2^r$ to the lower bound l
• $u = l + 2^r$
Router matrix: n rows × $2^{n-1}$ columns; router (r,c)
• r: row index (rows are indexed from 0 to n-1)
• c: column index (columns are indexed from 0 to $2^{n+1} - 1$)
The size of the clients' address space reachable through the downside ports is $2^r$.
It is always a contiguous interval of addresses of the form [l, u[.
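The bound computations translate directly into bit operations. This sketch makes three assumptions:

• c is expressed in client-address units, as the formula for l implies.
• r ≥ 1, so that $2^{r-1}$ is an integer.
• `router_bounds` and `port_intervals` are hypothetical helper names.

Intervals are half-open, matching the slide's [l, u[ notation. The number of clients ($2^{n+1}$ on the slide) is passed in explicitly.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [lo, hi), as in the slide's [l, u[ notation


def router_bounds(r: int, c: int) -> Tuple[int, int, int]:
    """Return (l, L, u) for router (r, c) using the slide's formulas (assumes r >= 1)."""
    if r < 1:
        raise ValueError("sketch assumes r >= 1 so that 2**(r-1) is an integer")
    l = (c >> r) << r          # clear the lowest r bits of c: floor(c / 2**r) * 2**r
    u = l + (1 << r)           # exclusive upper bound
    L = l + (1 << (r - 1))     # splits [l, u[ into the LEFT and RIGHT halves
    return l, L, u


def port_intervals(r: int, c: int, num_clients: int) -> Dict[str, List[Interval]]:
    """Address intervals served by each output port of router (r, c)."""
    l, L, u = router_bounds(r, c)
    up = [iv for iv in ((0, l), (u, num_clients)) if iv[0] < iv[1]]  # drop empty ranges
    return {"LEFT": [(l, L)], "RIGHT": [(L, u)], "UP": up}


print(port_intervals(2, 5, 16))
# {'LEFT': [(4, 6)], 'RIGHT': [(6, 8)], 'UP': [(0, 4), (8, 16)]}
```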
Routing in Fat Tree
[Figure: three-row fat tree (routers R, clients C) with the top-row "summit" routers and alternate paths marked]
Routing UP: adaptive
Routing DOWN: deterministic
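The adaptive-up, deterministic-down policy can be sketched as a list of per-router decisions. This is a behavioural model under assumptions, not the authors' router:

• Each level-j router (j = 1 at the bottom) covers a block of $2^j$ contiguous client addresses.
• Each router has two hypothetical, equivalent up-links named UP0 and UP1.
• The up-link is picked at random, standing in for whatever adaptive choice (for example, any free link) the hardware makes.

The descent uses the same midpoint comparison as the l, L, u bounds.

```python
import random
from typing import List, Optional


def route(src: int, dst: int, levels: int, rng: Optional[random.Random] = None) -> List[str]:
    """Per-router output decisions for a packet travelling from client src to client dst.

    Assumed model: a level-j router (j = 1 at the bottom) covers a block of 2**j
    contiguous client addresses, and each router has two equivalent, hypothetical
    up-links "UP0"/"UP1". Going up, either link works (adaptive); going down,
    the port depends only on dst (deterministic)."""
    if not (0 <= src < 2 ** levels and 0 <= dst < 2 ** levels):
        raise ValueError("client address outside the tree")
    rng = rng or random.Random()
    moves: List[str] = []
    j = 1
    # Climb until the current router's address block also contains dst.
    while (src >> j) != (dst >> j):
        moves.append(f"UP{rng.randrange(2)}")  # free (adaptive) choice of up-link
        j += 1
    # Descend: each router compares dst against the midpoint L of its block.
    while j >= 1:
        l = (dst >> j) << j
        L = l + (1 << (j - 1))
        moves.append("LEFT" if dst < L else "RIGHT")
        j -= 1
    return moves


print(route(0, 5, 3)[2:])  # ['RIGHT', 'LEFT', 'RIGHT']; the first two moves are UP0/UP1
```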
Contention in Fat Tree
Packets coming from the UP links are never routed up; only packets coming from the bottom links are routed up.
Since the number of UP links equals the number of bottom links, there cannot be any contention when routing up.
Contention occurs only when going down.
• Bottom links are split into LEFT and RIGHT links, so deterministic downward routing can send several packets to the same link, which leads to contention.
[Figure: router with UP, LEFT and RIGHT ports: many choices for going up, contention on the way down]
Modified Fat Tree
Doubling of downward links eliminates contention
[Figure: modified fat tree with three rows of four routers (R) above eight clients (C) and doubled downward links]
Router Architecture
No crossbar
No buffers (pushed to the clients)
Every downstream input is simultaneously connected to two outputs.
• Contention is eliminated between the inputs going downstream.
• The number of outputs is 2k+2 for k inputs (case where the router is a summit).
[Figure: router with k left inputs and k right inputs, and two groups of 2k + 2 outputs]
Router models differ from each other in only two items:
• Number of input and output ports on the down link
• Routing function constants (r,c)
Routing Circuitry
All network parameters are constant and frozen at design time.
• All lower-bound and upper-bound values used to generate the routing functions are constants for each router.
• These constants are fed as inputs into the routing function.
Routing function implemented using comparators.
Constants needed by the routing function:
• $l$
• $L = l + 2^{r-1}$
• $u$
[Figure: comparator circuit; the packet address is compared (≥, <) against l, L and u to select the LEFT, RIGHT or UP output]
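Because l, L and u are frozen per router, the routing function reduces to three comparisons against constants. The Python sketch below is a behavioural model of that comparator circuit, not a hardware description. `RouterConstants` and `route_port` are hypothetical names, and the example constants (l = 4, L = 6, u = 8) are chosen only for illustration.

```python
from typing import NamedTuple


class RouterConstants(NamedTuple):
    """Per-router constants frozen at design time (symbols follow the slide)."""
    l: int  # lower bound of the addresses reachable through the down ports
    L: int  # l + 2**(r-1): boundary between the LEFT and RIGHT halves
    u: int  # exclusive upper bound, l + 2**r


def route_port(address: int, consts: RouterConstants) -> str:
    """Pick the output port from three comparator results, like the circuit on the slide."""
    ge_l = address >= consts.l   # comparator against l
    lt_L = address < consts.L    # comparator against L
    lt_u = address < consts.u    # comparator against u
    if ge_l and lt_L:
        return "LEFT"
    if not lt_L and lt_u:
        return "RIGHT"
    return "UP"                  # address lies outside [l, u[


consts = RouterConstants(l=4, L=6, u=8)
print([route_port(a, consts) for a in range(10)])
# ['UP', 'UP', 'UP', 'UP', 'LEFT', 'LEFT', 'RIGHT', 'RIGHT', 'UP', 'UP']
```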
Client Interface
Buffers pushed to the client interfaces
Each incoming link is terminated with a FIFO memory.
The FIFO memories are connected to the client through a single shared bus.
• The bus can be wider than the links, so data is transferred to the client faster than it is received in the FIFOs.
[Figure: client/IP block connected over a shared bus to the FIFOs terminating the down links from the router, plus the up link]
The size of the FIFOs can be customized by the design team according to the specifications.
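A behavioural sketch of the client interface follows: one FIFO per incoming down link, all drained through a single shared bus that may move several words per cycle. The class name, the round-robin arbitration between FIFOs, and the word-level granularity are assumptions made for illustration. The slides do not specify how the bus is arbitrated.

```python
from collections import deque
from typing import Deque, List


class ClientInterface:
    """One FIFO per incoming down link, drained over a single shared bus.

    bus_words_per_cycle > 1 models a bus wider than a link, so the client can
    empty the FIFOs faster than the links fill them. Round-robin arbitration
    is an assumption made for illustration."""

    def __init__(self, num_links: int, fifo_depth: int, bus_words_per_cycle: int) -> None:
        self.fifos: List[Deque[int]] = [deque() for _ in range(num_links)]
        self.depth = fifo_depth
        self.bus_width = bus_words_per_cycle
        self.next_fifo = 0  # round-robin pointer

    def receive(self, link: int, word: int) -> bool:
        """Accept one word from a down link; return False if that FIFO is full."""
        if len(self.fifos[link]) >= self.depth:
            return False
        self.fifos[link].append(word)
        return True

    def drain(self) -> List[int]:
        """Move up to bus_width words from the next non-empty FIFO to the client."""
        n = len(self.fifos)
        for i in range(n):
            idx = (self.next_fifo + i) % n
            fifo = self.fifos[idx]
            if fifo:
                out = [fifo.popleft() for _ in range(min(self.bus_width, len(fifo)))]
                self.next_fifo = (idx + 1) % n
                return out
        return []
```

For example, with four links and a two-word bus, three words queued on link 0 and one on link 2 are delivered as [w0, w1], then the link-2 word, then the remaining link-0 word.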
Simulation Conditions
Uniform traffic generation
• Uniform distribution of destinations
• Traffic rate: constant fraction of the maximum link bandwidth
• Variable packet size (within a predetermined range, e.g. 64 bytes ± 10%)
Simulation platform: cycle-based, written in C, developed for this purpose
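The traffic conditions above can be reproduced with a simple generator. This Python sketch stands in for the C platform used for the paper and makes these assumptions:

• Injection is Bernoulli per cycle, calibrated so that each client offers `load` times the link bandwidth on average.
• Destinations are drawn uniformly from the other clients.
• Packet sizes are uniform in 64 bytes ± 10%.
• The function and parameter names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Packet:
    cycle: int
    src: int
    dst: int
    size_bytes: int


def uniform_traffic(num_clients: int, load: float, cycles: int, link_bytes_per_cycle: int,
                    mean_size: int = 64, spread: float = 0.10, seed: int = 0) -> List[Packet]:
    """Each client injects, on average, `load` x link bandwidth (Bernoulli injection assumed)."""
    rng = random.Random(seed)
    lo, hi = round(mean_size * (1 - spread)), round(mean_size * (1 + spread))
    cycles_per_packet = mean_size / link_bytes_per_cycle  # link time of an average packet
    p_inject = load / cycles_per_packet                   # per-client, per-cycle probability
    if not 0.0 <= p_inject <= 1.0:
        raise ValueError("load too high for this packet size and link width")
    packets: List[Packet] = []
    for t in range(cycles):
        for src in range(num_clients):
            if rng.random() < p_inject:
                dst = rng.randrange(num_clients - 1)
                if dst >= src:
                    dst += 1  # uniform over all clients except the source
                packets.append(Packet(t, src, dst, rng.randint(lo, hi)))
    return packets
```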
Throughput
[Figure: throughput (%) versus input load (%), 10% to 90%, for FT2 and FT]
More than 90% throughput achieved; compare with the regular fat tree (FT)
Latency
[Figure: latency (cycles) versus input load (%), 10% to 90%; series FT2-32C-64B, FT2-32C-128B, FT2-64C-64B, FT2-64C-128B, FT-32C-64B, FT-32C-128B, FT-64C-64B, FT-64C-128B]
Area and Speed
Buffer-less architecture less costly
Client Buffer Utilization
[Figure: maximum number of active FIFOs versus input load (%), 10% to 90%, for ML = 64, 128 and 256 bytes]
Buffers pushed to the client interfaces
• A considerable number of buffer lanes is necessary for every client interface.
Simulations show a linear progression of the maximum number of lanes used during operation.
• The obtained figures are an order of magnitude lower than the number imposed by the architecture.
• The number of buffer lanes in the client interface can be tailored to suit the class of applications at hand while reducing buffering area.
Conclusion
A contention-free modified FT architecture is proposed.
The proposed architecture achieves the maximum theoretical throughput and has a smaller latency than conventional FTs.
Latency increases linearly with input load.
The achieved performance is actual performance, obtained with a contention-free network.
The area of the network is kept small because of the absence of buffers in the router architecture.
The number of buffer lanes in the client interfaces can be tailored for a specific platform to suit the class of applications at hand while reducing buffering area.