8/8/2019 Serhey Rudko 049036 SNoC ANoC Id 309501864
Asynchronous vs. Synchronous Network-on-Chip
Prepared by Sergey Rudko
Advanced Topics in VLSI 1 (NoC) 049036
- Introduction
  - Problem Definition
  - NoC Implementation Alternatives
    - Fully asynchronous
    - Multi-synchronous (GALS)
    - Synchronous
- Proposed Solution
  - Systematic Comparison between Different Strategies
    - Silicon Area
    - Network Saturation Threshold
    - Communication Throughput
    - Packet Latency
    - Power Consumption
    - Implementation Flexibility and Tools
- Related Approaches
  - I. Miro-Panades, F. Clermidy, P. Vivet, A. Greiner, "Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture", NoCs 2008
Synchronous Router
- Router pipeline may include many stages
  - Increases communication latency
- Router pipeline may be optimized to a single-cycle router
  - Possible by use of speculation
  - Clock period same as the pipelined router
- Presence of a clock simplifies design
  - Standard libraries and tools
[Figure: router with VCA and SA stages and a data path between input and output LINKs, driven by speculative control signals]
A. Kumar, P. Kundu, A. Singh, L. Peh and N. Jha, "A 4.6Tbits/s 3.6GHz Single-cycle NoC Router with a Novel Switch Allocator", International Conference on Computer Design (ICCD), October 2007.
Limitations of Fully-Synchronous Networks
- Difficult to distribute clock
  - Network spread over die and may have irregular layout
  - Minimising skew costs complexity and power
  - Solution: Alternatives/extensions to PLL and H-tree
- Single Network Clock Frequency
  - Communicating synchronous IP blocks with different frequencies
  - What is the most appropriate network clock frequency?
Problem: Clock Distribution and Frequency Selection
Solution: Beyond a Single Global Clock
Synchronous Routers with Asynchronous Links (GALS)
- Synchronization is simple
  - Traditional 2-FF synchronizers
- Can support asynchronous interconnects
  - No longer exploiting periodic nature of router clocks
  - Correct operation is independent of the delay of the link
- GALS interfaces with pausible clocks
  - If necessary the clock is stretched, so data is always transferred reliably
  - Need to construct local delay line
[Figure: Router, Asynchronous FIFO, Router, with interface signals labeled s* and r*]
Connect Frequency Independent Routers
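The 2-FF synchronizer mentioned above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `TwoFlopSynchronizer` and its method are hypothetical, and metastability itself is not modeled. The model shows only the delay that the two flip-flops add to a signal arriving from another clock domain.

```python
class TwoFlopSynchronizer:
    """Behavioral model of a 2-FF synchronizer in the receiving clock domain.

    Assumption: metastability is not modeled; only the added delay is shown.
    """

    def __init__(self):
        self.ff1 = 0  # first flop: samples the asynchronous input
        self.ff2 = 0  # second flop: output seen by the receiving logic

    def clock_edge(self, async_in):
        """Advance one receiving-clock edge and return the synchronized output."""
        # Both flops update on the same edge, so ff2 takes the OLD value of ff1.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
# The sender raises its signal before the first receiving-clock edge.
outputs = [sync.clock_edge(1) for _ in range(3)]
print(outputs)
```

In this model the receiving logic sees the new value only on the second clock edge after it arrives, which is the latency cost paid at every clock-domain crossing.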
Asynchronous NoCs
- Simple/elegant solution when networked IP blocks run at different clock frequencies
  - Data driven, no superfluous switching activity
  - No synchronization/clock alignment issues at interfaces
  - Solves synchronization, clock domain crossings, timing, long connects
  - No clock distribution issues
- Security and EMI advantages
  - Clock focuses EM emissions
  - The presence of a clock can also aid fault-induction and side-channel analysis attacks
- Reduced design time
  - Easy to use interfaces, modularity
  - Robust and simple implementation
- Reduced power
- But network latency significantly increased
Asynchronous NoCs Approaches
- "An Asynchronous Router for Multiple Service Levels Networks on Chip", R. Dobkin et al., ASYNC05 (QNoC Group)
- MANGO Clockless Network-on-Chip
  - "A Scheduling Discipline for Latency and Bandwidth Guarantees in Asynchronous Network-on-Chip", T. Bjerregaard and J. Sparsø, ASYNC05
  - "A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip", T. Bjerregaard and J. Sparsø, DATE05
- R. Dobkin provides a synchronous versus asynchronous router study
Synchronous or Asynchronous NoCs?
"Physical Implementation of the DSPIN Network-on-Chip in the FAUST Architecture", I. Miro-Panades, F. Clermidy, P. Vivet and A. Greiner, NoCs 2008
Motivation
- Physically implement the DSPIN NoC into the FAUST application platform
- Compare the performance of ANOC and DSPIN on a real application and traffic
  - Silicon Area
  - Throughput
  - Packet Latency
  - Power Consumption
FAUST Architecture with ANOC
- Asynchronous NoC (ANOC)
  - QDI 4-phase/4-rail asynchronous logic
  - 20 routers
  - 5-port router
  - Source routing
  - Wormhole packet switching
  - 32-bit payload
- GALS Conception
  - 24 independent clocks
  - FIFO-based interface
- Hard-macro approach for ANOC reuse
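The "4-phase/4-rail" entry can be made concrete with a small sketch. A 4-rail (1-of-4) code carries two bits on four wires with exactly one wire high, and in a 4-phase protocol all wires return to zero (the spacer) between data items. The function names, the wire-index mapping, and the low-pair-first ordering below are illustrative assumptions; the slides do not give ANOC's actual wire assignment.

```python
def encode_1of4(word, width=32):
    """Encode `word` as 1-of-4 groups, least significant bit pair first.

    Each group is a 4-tuple of wires with exactly one wire high. The wire
    index equals the value of the 2-bit pair (an assumed mapping).
    """
    if width % 2 or not 0 <= word < (1 << width):
        raise ValueError("word must fit in an even number of bits")
    groups = []
    for i in range(0, width, 2):
        pair = (word >> i) & 0b11
        groups.append(tuple(1 if wire == pair else 0 for wire in range(4)))
    return groups


def decode_1of4(groups):
    """Recover the word; return None if any group is not a valid codeword."""
    word = 0
    for i, group in enumerate(groups):
        if sum(group) != 1:  # all-zero spacer or an invalid pattern
            return None
        word |= group.index(1) << (2 * i)
    return word


# Return-to-zero phase between two flits: every group is all zeros.
SPACER = [(0, 0, 0, 0)] * 16

flit = encode_1of4(0xDEADBEEF)
print(len(flit), hex(decode_1of4(flit)), decode_1of4(SPACER))
```

Under this encoding a 32-bit payload needs 16 groups, that is 64 data wires. The receiver can detect a complete data word without a clock because every group then has exactly one wire high.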
DSPIN Architecture
- Packet Based
- Distributed Router Architecture
- Suited for GALS Approach
  - Mesochronous links between routers
  - Metastability resolved by bi-synchronous FIFO
- Synthesizable with Standard Cells
DSPIN Clock Tree
Mesochronous Link between Neighbor Routers
NoC Architecture Comparison
Both implementations use GALS principles
Network Comparison
DSPIN clock tree consumes as much power as the router itself

Parameter                            ANOC            DSPIN
Implementation                       Hard-Macro      Soft-Macro
Area                                 0.281 mm²       0.187 mm²
Throughput (worst case conditions)   ~160 Mflit/s    289 Mflit/s
Throughput (nominal conditions)      ~220 Mflit/s    408 Mflit/s
Power Consumption (F=150MHz)         3.69 mW         5.89 mW
Power Consumption (F=250MHz)         3.69 mW         10.39 mW

- DSPIN throughput is deterministic with respect to the clock frequency
- DSPIN Power Issues
  - Power consumption mainly dominated by FIFO data registers
  - The DSPIN clock gating reduced the power consumption by 67%
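The trade-off in the comparison table is easier to see as ratios. The short sketch below only divides the DSPIN figures by the ANOC figures copied from the table; the dictionary keys and the `ratio` helper are hypothetical names, and the ANOC throughput is the approximate "~220" value, so the throughput ratio is approximate as well.

```python
# Figures copied from the comparison table.
ANOC = {"area": 0.281, "mflit_nominal": 220, "mw_150": 3.69, "mw_250": 3.69}
DSPIN = {"area": 0.187, "mflit_nominal": 408, "mw_150": 5.89, "mw_250": 10.39}


def ratio(key):
    """DSPIN value divided by ANOC value for one table row."""
    return DSPIN[key] / ANOC[key]


area_saving = 1 - ratio("area")           # fraction of the ANOC area saved
throughput_gain = ratio("mflit_nominal")  # nominal-conditions throughput
power_150 = ratio("mw_150")               # DSPIN power relative to ANOC
power_250 = ratio("mw_250")

print(f"DSPIN area saving:     {area_saving:.0%}")
print(f"DSPIN throughput gain: {throughput_gain:.2f}x")
print(f"DSPIN power at 150MHz: {power_150:.2f}x ANOC")
print(f"DSPIN power at 250MHz: {power_250:.2f}x ANOC")
```

With these numbers DSPIN is about a third smaller and offers a little under twice the nominal throughput, while drawing roughly 1.6 times the ANOC power at 150 MHz and 2.8 times at 250 MHz.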
Network Comparison - Latency
- DSPIN router is IP data locality aware
- DSPIN routers resynchronize the data packets
- DSPIN should be clocked to 367 MHz

Flit Path                     ANOC         DSPIN        ANOC         DSPIN
                              F=150MHz     F=150MHz     F=250MHz     F=250MHz
Intermediate Router Latency   6.80 ns      16.66 ns     6.80 ns      10.00 ns
First + Last Router Latency   60.00 ns     56.66 ns     47.00 ns     34.00 ns
Latency for 5 hops path       80.00 ns     106.66 ns    68.00 ns     64.00 ns
Latency for 9 hops path       106.66 ns    173.30 ns    96.00 ns     104.00 ns
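The path latencies in the table appear to follow a simple additive model: the first-plus-last router latency, plus one intermediate-router latency for every other router on the path. This model is inferred from the numbers, not stated on the slide. It reproduces the DSPIN rows and matches the ANOC rows only approximately (for example 60.00 + 3 × 6.80 = 80.40 ns against the tabulated 80.00 ns). The function names below are hypothetical.

```python
def path_latency_ns(first_last_ns, intermediate_ns, hops):
    """Latency of a path crossing `hops` routers, both end routers included."""
    if hops < 2:
        raise ValueError("model assumes distinct first and last routers")
    return first_last_ns + (hops - 2) * intermediate_ns


def latency_in_cycles(latency_ns, clock_mhz):
    """Express a latency as clock cycles at the given frequency."""
    return latency_ns * clock_mhz / 1000.0


# DSPIN column of the table at F = 250 MHz.
lat5 = path_latency_ns(34.00, 10.00, hops=5)
lat9 = path_latency_ns(34.00, 10.00, hops=9)

# Cycles spent in one intermediate DSPIN router, then the clock frequency
# at which that many cycles would last 6.80 ns (the ANOC figure).
cycles = latency_in_cycles(10.00, 250)
match_mhz = cycles / 6.80 * 1000.0

print(lat5, lat9, cycles, f"{match_mhz:.1f} MHz")
```

Under this reading an intermediate DSPIN router costs 2.5 clock cycles, and matching the 6.80 ns ANOC router latency needs a clock of about 367.6 MHz. That is consistent with the 367 MHz quoted above, though the slide does not say how that figure was obtained.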
Conclusion
- Little published work on asynchronous routers and networks
- Comparing synchronous and asynchronous designs is difficult
  - System timing style
  - Technology
  - Circuit style and architecture
- Difficult to reproduce and simulate asynchronous designs from published work
  - No notion of cycle-accurate model
  - Hide detailed control and datapath delays
- Asynchronous Performance Guarantees
  - Performance guarantees are required
  - Less predictable, non-deterministic
  - Predicting performance is more complex
- Asynchronous EDA Tool Requirements
- Synchronous Routers
  - Predictability and determinism can be exploited
  - Fast single-cycle routers possible
- ANoC for Low Power & SNoC for Small Area