1 H Friedman Fully Asynchronous framework for GALS network on chip 2010
Fully Asynchronous framework for GALS network on chip
Friedman Harel
Seminar in VLSI Architectures (048879), Electronic Engineering
Technion
EE Department Technion, Haifa, Israel
Mentor: Prof Ran Ginosar
Asynchronous Network On Chip (A-NOC)
The demand for scalable, low-latency, and power-efficient System-on-Chip interconnect drives the development of the network on chip (NOC). This presentation reviews architectural and practical aspects of an asynchronous network-on-chip solution.
Agenda
• Why NOC in SOC
• The evolution of networks (brief)
• The advantages of A-NOC vs. NOC
• The GALS solution
• The A-NOC major blocks
• A NOC protocol
• The routing algorithm
• The LETI Faust SOC approach (dynamic reconfiguration)
• The LETI Alpin SOC approach (DVFS)
• The MAGALI SOC approach (heterogeneous architecture)
• Summary
System On Chip (SOC)
• With many tens of millions of transistors available on a single chip, the System-on-Chip (SOC) has become a reality.
• Design with IP reuse is mandatory. Integrated processor cores, DSPs, on-chip memories, IP blocks, etc. are commonly in use.
SOC implementation using A-NOC in a telecommunication chip: ISSCC 2010 / SESSION 15 / LOW-POWER PROCESSORS & COMMUNICATION / 15.3
From buses to networks
Globally shared buses cannot meet the increasing demands of System-on-Chip interconnects. The following handicaps have become serious obstacles:
1. Long wires have high load and resistance, resulting in slow signal propagation.
2. Difficulties in timing validation.
3. Connecting blocks running at different speeds.
4. Connecting blocks using different voltage levels.
5. Power efficiency drops.
Why NOC In SOC
The Synchronous network - brief
Generic On-Chip Router
Frequency Distribution
• Clock skew may force the system to be partitioned into multiple clock domains
• With a single clock source, only the phase of each router’s clock differs, which makes simple, error-free clock-domain crossing possible
The need for an asynchronous handshake
Synchronous Routers with Asynchronous Links
• Synchronization:
Time safe: e.g. traditional 2-FF synchronizers
Value safe: clock pausing / data-driven clocks
1. Clock management of a NOC-based chip remains an issue when multi-clock synchronization is required.
2. Asynchronous and self-timed circuits are known to be feasible. Delay-insensitive (DI) asynchronous communication provides robust chip-level communication, ensuring functionality over a wide voltage and process range.
3. With DI encoding, delay variations due to physical effects such as crosstalk are no longer an issue. Wire pipelining is easy to achieve at chip level by adding asynchronous latches, which re-power the signals while cutting the wire cycle time.
The advantages of A-NOC vs. NOC
Because most IP cores and logic blocks are synchronous, an SOC design that uses A-NOC is based on a Globally Asynchronous Locally Synchronous (GALS) topology.
The GALS solution
The GALS topology is based on an asynchronous network of synchronous blocks.
Each block on the network has its own clock domain.
The A-NOC major blocks
Links
Router (Node)
The network topology consists of point-to-point connections between routers (nodes), arranged as a 2D matrix and functioning as a mesh network.
IP Core
The A-NOC Router
Every router (except the routers on the edges of the network) contains five sets of inputs and outputs.
Four input/output sets are directed to the four possible directions (North, South, East and West).
The fifth input/output set is directed to the specific core in the node (locally synchronous).
Every input is connected to all five outputs and vice versa.
The A-NOC Link
• The link uses a QDI (quasi-delay-insensitive) 4-phase asynchronous handshake protocol.
• On long traces, asynchronous pipelining is added to the NoC links.
• A typical 4-rail QDI interconnect and the associated pipelining are presented in figure (b).
One-of-four data encoding
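As a rough illustration of one-of-four encoding on a 4-phase link, the sketch below maps a 2-bit value onto four rails, with exactly one rail high for valid data and all rails low as the spacer. The rail ordering and the function names (`encode_1of4`, `decode_1of4`, `four_phase_trace`) are assumptions made for this example and are not taken from the slides.

```python
SPACER = (0, 0, 0, 0)  # all rails low: no data present on the link


def encode_1of4(two_bits: int) -> tuple:
    """Raise exactly one of four rails to represent a 2-bit value (0..3)."""
    if not 0 <= two_bits <= 3:
        raise ValueError("a one-of-four digit carries exactly 2 bits")
    # Assumed rail order: rail i is high when the value equals i.
    return tuple(1 if rail == two_bits else 0 for rail in range(4))


def decode_1of4(rails: tuple):
    """Return the 2-bit value, or None while the rails still hold the spacer."""
    if rails == SPACER:
        return None
    if sum(rails) != 1:
        raise ValueError("invalid codeword: exactly one rail must be high")
    return rails.index(1)


def four_phase_trace(two_bits: int) -> list:
    """(rails, ack) pairs for one return-to-zero transfer of a single digit."""
    digit = encode_1of4(two_bits)
    return [
        (digit, 0),   # 1. sender drives a valid codeword
        (digit, 1),   # 2. receiver detects validity and raises ack
        (SPACER, 1),  # 3. sender returns all rails to zero
        (SPACER, 0),  # 4. receiver lowers ack; the link is ready again
    ]
```

Because validity is carried by the data rails themselves, the receiver needs no separate request wire or timing assumption to know when a digit has arrived.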
The GALS adapter
The synchronous IP core is connected to the A-NOC via the GALS adapter. The adapter has two objectives:
1. Resynchronize the asynchronous NoC protocol with the synchronous domain.
2. Generate a local clock with configurable frequency.
The synchronization is based on two FIFOs. For each FIFO, ordinary synchronizers adapt the read and write signals of the FIFO to the synchronous and asynchronous domains.
Johnson encoding is used to obtain a small and efficient FIFO. The clock is generated locally, using a pausable clocking scheme with a programmable IP clock.
GALS synchronization Pausable clock
Metastability filter
• Simple GALS interface (receiver)
• Note: Req/Ack uses a 2-phase handshaking protocol
Data-Driven Clock Waveform
• Imagine data from two packets arriving at a single router node at different rates
• An aperiodic clock may be generated to minimise latency and power
• Minimum clock period is set by the delay line
• Value-safe synchronization (data is never lost)
The FIFO approach
Synchronization issue: the read and write pointers cross timing domains, so each must be synchronized to the opposite clock. This needs ad-hoc encoding to ensure proper detection of the full and empty states.
The Johnson counter
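A Johnson (twisted-ring) counter is a shift register whose inverted last bit is fed back to its input. The minimal sketch below, with the hypothetical helper names `johnson_sequence` and `hamming`, lists the states of an n-bit counter and lets one check that consecutive states differ in exactly one bit, which is the property that makes the code attractive for pointers sampled across clock domains.

```python
def johnson_sequence(n_bits: int) -> list:
    """Return the 2 * n_bits states of an n-bit Johnson counter, from all zeros."""
    state = [0] * n_bits
    states = []
    for _ in range(2 * n_bits):
        states.append(tuple(state))
        # Shift by one position and feed back the inverted last bit.
        state = [1 - state[-1]] + state[:-1]
    return states


def hamming(a: tuple, b: tuple) -> int:
    """Number of bit positions in which two states differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for s in johnson_sequence(3):
        print("".join(map(str, s)))
```

For a 3-bit counter the cycle is 000, 100, 110, 111, 011, 001 and then back to 000, so every transition, including the wrap-around, changes a single bit.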
Johnson Encoding for FIFO design
Johnson Encoding FIFO architecture
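One way full and empty detection can work with Johnson-coded pointers is sketched below; it is an illustrative software model under stated assumptions, not the architecture from the figure. It assumes a FIFO of depth n with n-bit Johnson pointers (2n states each), so the FIFO is empty when the pointers are equal and full when one is the bitwise complement of the other. The class name `JohnsonPointerFifo` and helper `johnson_code` are hypothetical.

```python
def johnson_code(count: int, n_bits: int) -> tuple:
    """Johnson code word for position `count` in the 2 * n_bits state cycle."""
    k = count % (2 * n_bits)
    if k <= n_bits:
        return (1,) * k + (0,) * (n_bits - k)
    return (0,) * (k - n_bits) + (1,) * (2 * n_bits - k)


class JohnsonPointerFifo:
    """Software model of a FIFO whose flags are derived from Johnson pointers."""

    def __init__(self, depth: int):
        self.depth = depth
        self.slots = [None] * depth
        self.writes = 0  # total number of accepted writes
        self.reads = 0   # total number of completed reads

    def _pointers(self):
        return (johnson_code(self.writes, self.depth),
                johnson_code(self.reads, self.depth))

    def is_empty(self) -> bool:
        wptr, rptr = self._pointers()
        return wptr == rptr  # pointers coincide: nothing stored

    def is_full(self) -> bool:
        wptr, rptr = self._pointers()
        # Pointers are `depth` steps apart exactly when every bit differs.
        return all(w != r for w, r in zip(wptr, rptr))

    def push(self, item) -> bool:
        if self.is_full():
            return False
        self.slots[self.writes % self.depth] = item
        self.writes += 1
        return True

    def pop(self):
        if self.is_empty():
            return None
        item = self.slots[self.reads % self.depth]
        self.reads += 1
        return item
```

In hardware each pointer would be sampled in the opposite domain through synchronizers; since only one bit changes per increment, a sample taken during a transition resolves to either the old or the new pointer value.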
A to S interface
Local clock generation
A NOC protocol
The NOC communication protocol is defined at the following levels:
1. The physical layer corresponds to the signal level of data exchange. It is implemented by a 4-phase handshake protocol.
2. The flit level corresponds to an atomic 32-bit data transfer. This level describes the signaling mechanism used to exchange flits for a given priority. The flit level removes any dependency on a clock cycle within the full network.
3. The packet level corresponds to packet transmission through the network. Packets are coherent messages built of successive data flits. This level defines all the information required for proper message routing within the network, and it is where network arbitration is performed. A virtual channel mechanism is used to improve efficiency and guarantee low latency for priority packets.
4. The last level is the message level. It does not concern the network itself, only the source and destination units that communicate with each other.
The Physical layer
A pipeline stage every millimeter is expected to be sufficient in 65 nm CMOS.
• Quasi-delay-insensitive circuit design, for instance a 4-phase handshake protocol for asynchronous channels.
• The full data path (32 bits + BoP + EoP) is designed entirely with 4-rail encoded data, which requires 17 4-rail vectors.
The “flit” level
The synchronization mechanism of the whole network is based on a basic handshake between nodes to exchange a data flit. OP: output port. IP: input port.
Each flit is composed of 32 data bits and 2 control bits: the 34th bit encodes begin-of-packet (BoP) and the 33rd bit encodes end-of-packet (EoP).
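The flit layout can be sketched as a 34-bit word. The example below assumes 1-based bit numbering, so the 33rd bit (EoP) sits at index 32 and the 34th bit (BoP) at index 33; the helper names are hypothetical. It also splits the word into 2-bit digits, one per 4-rail vector, which gives 34 / 2 = 17 vectors.

```python
DATA_BITS = 32
DATA_MASK = (1 << DATA_BITS) - 1
EOP_SHIFT = 32  # 33rd bit, assuming bits are counted from 1
BOP_SHIFT = 33  # 34th bit, assuming bits are counted from 1
FLIT_BITS = 34


def pack_flit(data: int, bop: bool, eop: bool) -> int:
    """Assemble a 34-bit flit from 32 data bits and the two control bits."""
    if not 0 <= data <= DATA_MASK:
        raise ValueError("data must fit in 32 bits")
    return (int(bop) << BOP_SHIFT) | (int(eop) << EOP_SHIFT) | data


def unpack_flit(flit: int) -> tuple:
    """Return (data, bop, eop) from a 34-bit flit."""
    data = flit & DATA_MASK
    bop = bool((flit >> BOP_SHIFT) & 1)
    eop = bool((flit >> EOP_SHIFT) & 1)
    return data, bop, eop


def to_digits(flit: int) -> list:
    """Split a flit into 2-bit digits, least significant first (one per 4-rail vector)."""
    return [(flit >> (2 * i)) & 0b11 for i in range(FLIT_BITS // 2)]
```

A single-flit packet would set both BoP and EoP; a header flit of a longer packet sets only BoP, and the last payload flit sets only EoP.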
The packet level
• Data is transmitted in packets made of several flits.
• Every packet contains a header flit and several payload flits.
The header flit comprises the following fields:
• Path-to-target field: 18 bits, with each hop encoded in 2 bits (00 for north, 01 for east, 10 for south, 11 for west), which allows a packet to cross at most 9 nodes in the network topology. If more nodes must be crossed, a specific programmable resource can be used to extend the path value.
• Message control field: encodes the message-level type of the packet: whether it is a read packet, a write packet, an interrupt packet, etc.
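The path-to-target field can be illustrated with a short sketch that uses the direction codes given above. The bit order is an assumption made for this example: the hop for the current node is taken from the two least significant bits and the field is shifted right by two bits for the next node. The function names are hypothetical, and delivery to the local resource at the end of the path is not modeled.

```python
DIRECTIONS = ("north", "east", "south", "west")  # codes 00, 01, 10, 11
CODES = {name: code for code, name in enumerate(DIRECTIONS)}
MAX_HOPS = 9  # 18 bits / 2 bits per hop


def encode_path(hops: list) -> int:
    """Pack a list of direction names into an 18-bit path-to-target value."""
    if len(hops) > MAX_HOPS:
        raise ValueError("an 18-bit path field holds at most 9 hops")
    path = 0
    for i, hop in enumerate(hops):
        path |= CODES[hop] << (2 * i)  # first hop in the lowest 2 bits (assumed)
    return path


def route_step(path: int) -> tuple:
    """Return (direction taken at this node, shifted path for the next node)."""
    return DIRECTIONS[path & 0b11], path >> 2


if __name__ == "__main__":
    path = encode_path(["east", "east", "south"])
    for _ in range(3):
        direction, path = route_step(path)
        print(direction)
```

Each router only inspects two bits and performs a fixed shift, so no routing table or address comparison is needed inside the network.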
Basic Routing
The router structure
• To improve efficiency and guarantee low latency, two virtual channels are implemented in each node.
• The first is dedicated to real-time, low-latency packets and the other to best-effort traffic. The first channel, VC0, has the highest priority and can suspend the path of the second channel, VC1.
• A packet cannot be suspended by a packet of the same or lower priority; it may only be suspended when a packet with a higher priority requests the same network link (which is actually a given node output). In that case, the suspended packet is stalled and stored in the previous nodes.
Link signals shown in the figure: send, data, acc 0, acc 1
The routing algorithm
• The static paths between initiator and target resources are programmed and stored in the initiator resources.
• Although routing is normally deterministic, when a path is blocked the route between resources is determined using a dynamic routing algorithm.
• One efficient algorithm that is proven deadlock-free is the “odd-even turn model”.
• In a two-dimensional mesh of size m × n, every node is identified by a two-element vector (x, y), 0 ≤ x ≤ m-1 and 0 ≤ y ≤ n-1, where x and y are the coordinates in the two dimensions.
Rule 1: East-north turns are not allowed at nodes located in an even column, and north-west turns are not allowed at nodes located in an odd column.
Rule 2: East-south turns are not allowed at nodes located in an even column, and south-west turns are not allowed at nodes located in an odd column.
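The two rules of the odd-even turn model can be written as a small predicate. In this sketch a turn is described by the direction of travel before and after the node, so an east-north turn is a packet travelling east that turns north; the function name `turn_allowed` and the string labels are assumptions made for illustration.

```python
# Turns forbidden in even columns (Rule 1 and Rule 2, first halves).
FORBIDDEN_EVEN = {("east", "north"), ("east", "south")}
# Turns forbidden in odd columns (Rule 1 and Rule 2, second halves).
FORBIDDEN_ODD = {("north", "west"), ("south", "west")}


def turn_allowed(column: int, incoming: str, outgoing: str) -> bool:
    """Check a turn at a node in the given column against the odd-even rules.

    `incoming` is the direction of travel before the turn and `outgoing`
    the direction of travel after it.
    """
    turn = (incoming, outgoing)
    if column % 2 == 0:
        return turn not in FORBIDDEN_EVEN
    return turn not in FORBIDDEN_ODD


if __name__ == "__main__":
    print(turn_allowed(2, "east", "north"))  # even column, Rule 1
    print(turn_allowed(3, "east", "north"))  # odd column, not restricted
```

An adaptive router would call such a check for each candidate output before choosing a detour around a blocked link.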
An example of faulty path recovery – block relief
Blocking Issue
QNoC-based SoC design flow
An example of faulty path recovery – block relief
The router structure – input ports
1. The DEMUX stage routes flits to their corresponding VC queues.
2. The Shifter stage modifies the routing information as needed (the path-to-target field in the flit header), and the flit is stored in a buffer stage while waiting for the appropriate output port to become available.
The router structure – output ports
1. The output port first performs arbitration between directions within each VC, and only then between VCs.
2. The Direction Arbiter performs fair arbitration of the possible new packet requests from the input ports.
3. It generates a single command token for the Direction Switch, so that the selected direction is held until the end of the packet.
4. Finally, the VC Arbiter arbitrates at flit level between the two VCs and commands the VC Switch.
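The flit-level arbitration between the two virtual channels can be illustrated with a simplified model. The sketch below uses discrete steps purely as a modeling convenience (the real network is asynchronous and has no cycles) and always prefers VC0 over VC1 on a shared output link. The function name `schedule_flits` and the flit labels are hypothetical.

```python
from collections import deque


def schedule_flits(arrivals: dict, steps: int) -> list:
    """Send at most one flit per step on one output link, preferring VC0.

    `arrivals` maps a step number to a list of (vc, flit) pairs that become
    ready at that step. Returns the (step, vc, flit) triples in send order.
    """
    queues = {0: deque(), 1: deque()}
    sent = []
    for step in range(steps):
        for vc, flit in arrivals.get(step, []):
            queues[vc].append(flit)
        for vc in (0, 1):  # VC0 is checked first: highest priority
            if queues[vc]:
                sent.append((step, vc, queues[vc].popleft()))
                break
    return sent


if __name__ == "__main__":
    # A best-effort packet (B0..B2) on VC1 is interrupted by a real-time
    # packet (R0, R1) on VC0 that arrives one step later.
    arrivals = {0: [(1, "B0"), (1, "B1"), (1, "B2")], 1: [(0, "R0"), (0, "R1")]}
    for entry in schedule_flits(arrivals, 5):
        print(entry)
```

In this run the VC1 packet is suspended after its first flit, the two VC0 flits pass, and the remaining VC1 flits resume afterwards, which mirrors the suspension behavior described for the two priority levels.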
GALS adapter unit – input port
• The input port (IP) decodes the packet routing bits and shifts the path-to-target bits for the following nodes.
• The IP transfers data, priority, BoP and EoP information to the selected output controller.
• A first process (get_priority_bit) decodes the incoming flit priority level from the IP_send signal.
- If the flit is a begin-of-packet: the path to target is shifted and the flit is stored in the corresponding priority-level channel; the token for the appropriate channel and the path information are maintained using the loop processes.
- If the flit is not a begin-of-packet: the incoming flit is stored in the corresponding priority channel.
• The process get_new_flit is responsible for shifting the path-to-target bits and for transmitting the received 32-bit data and EoP bit toward the proper register, according to the virtual channel number.
GALS adapter unit – output ports
• Arbitration between virtual channels and arbitration within a virtual channel.
A “first arrived, first served” (FAFS) policy - priority virtual channels.
• VC0 is made simpler, with only static arbitration (N/E/S/W).
• VC1 arbitration follows a priority list using the FAFS mechanism.
• 34-bit data switch.
The Low-power processor & communication chip design
A local processing core connected to the A-NOC
The Low-power processor & communication chip design
Receiver Block diagram on the NOC
The Low-power processor & communication chip design
The cores structure on the NOC
The Low-power processor & communication chip design LETI - FAUST approach
FAUST: Flexible Architecture of Unified System for Telecom
The Low-power processor & communication chip design FAUST approach
Flexible reconfiguration per application
The Low-power processor & communication chip design
FAUST – IP GALS implementation
The Low-power processor & communication chip design LETI - Alpin approach
Local clock gating (LCG)
Sync/Async/Sync synchronizer (SAS)
The Low-power processor & communication chip design Alpin approach - DVFS
Dynamic Voltage and Frequency Scaling (DVFS)
Heterogeneous design System On Chip Multi processors LETI - MAGALI Approach
An adaptive and flexible structure provides a better fit to the needs of various applications.
Heterogeneous design System On Chip (SOC) MAGALI Approach
Multi Processor System on Chip – MPSoC
A strong control module is used to configure the data flow and the connection of the processing modules, in accordance with specific application needs.
Heterogeneous design System On Chip (SOC) MAGALI Approach
A flexible HW structure is designed to allow the connection of two optional cores per application, as needed:
1. The DSP unit is a VLIW 32-bit low-power DSP, optimized to handle complex numbers (16-bit I + 16-bit Q), including complex MAC and CORDIC operators.
2. The MMC unit is a Microprogrammable Memory Controller for intensive data manipulation involving synchronization, buffering, duplication and reordering.
A local Configuration and Communication Controller (CCC) is used for dynamic reconfiguration.
GALS performance comparison of the three devices
ANOC GALS Adapter performance
Summary
NoCs at SOC Service!
• NoCs are useful in large SOCs.
• An asynchronous NOC offers advantages over a synchronous NOC.
• Power and frequency scalability and quality of service provide significant improvement (next session)
SOCs