Technion NOC Tutorial Part Four - University of California...

Page 1: Technion NOC Tutorial Part Four - University of California ...circuit.ucsd.edu/~nocs2009/talks/Technion NOC Tutorial Part Four.pdf · The Asynchronous NOC Ran Ginosar NOCS Tutorial

The Asynchronous NOC

Ran Ginosar

NOCS Tutorial

San Diego, 10 May 2009

Ran Ginosar Async Noc Tutorial NOCS 2009 2

SoC

• Each color is a separate clock domain

[Figure: SoC modules interconnected by routers (R)]

SoC

• What clock for the interconnect?
  – Fastest?
  – Opportunistic?
  – None?

[Figure: the same SoC with its routers (R)]

Conceptual Summary

• NOCs are for large SOCs

• Large SOCs = multiple clock domains

→ NOCs should be asynchronous

• Two complementary research areas:
  – Asynchronous routers
    • simplify design, low power
  – Asynchronous interconnect
    • high bandwidth, low power

• Problem: need special CAD, special methodology
  – Solutions:
    • deliver and use as “configurable hard IP core”
    • use only at physical design phase
    • deliver as predesigned infrastructure (FPGA, SOPC)

Contextual Summary

• Async routers
  – QNoC (Technion, Israel)
  – Other research (borrowed slides acknowledged and referenced)
    • Faust (LETI, France)
    • Alpin (LETI, France)
    • Mango (DTU, Denmark)

• Async Interconnect
  – FOX (Technion, Israel)

The QNoC Async Router

NoC router 101

[Figure: router block diagram with SL1 input ports, a switch, and SL1 output ports]

Single-Service-Level Router

[Figure: single-service-level router built from input ports and output ports]

Single-Service-Level Input-Port

Single-Service-Level Output-Port

Adding multiple service levels

Multi-Service-Level Input-Port

[Figure annotation: reuse of the Single-Service-Level Input-Port]

Multi-Service-Level Output-Port

[Figure: multi-service-level output port. Four single-service-level output ports (SSL-OP, one each for SL0 to SL3) feed a 4-way SPA block and an output gate. Per-level signal labels include A_SLx, RH_SLx, RBT_SLx, D_SLx, Ro_x, Ao_x, Do_x and G_SLx; datapath widths are marked 4 and 4*Flit. Annotations: reuse of the Single-Service-Level Output-Port; Inter-Service-Level Arbitration.]

Buffering and Credits

Monitoring Activity

Performance

                                           ASYNC router    SYNC router
Max Clock Frequency [MHz]                  -               270.2 **
Max Data Rate [Mflits/s]                   75.2            67.6
Data Cycle [ns (CLK)]                      13.3            14.8 (4)
Min Latency, Input to Output [ns (CLK)]    13.0 / 9.2 *    3.7 (1)
Number of FFs + Latches                    620             880
Number of transistors                      34,200          70,000
Equivalent Gates (2-in NAND)               8,500           17,500
Cell Area [µm²]                            470,000         960,000

* Latency for the async router is specified for header / body flits.
** The synchronous router has a critical path of ~20 FO4 gate delays, matching or outperforming other published results.
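The timing rows of the performance table can be cross-checked with a little arithmetic. The sketch below is only a consistency check using numbers copied from the slide; the constant and function names are illustrative and not part of the original material. It assumes the "(4)" and "(1)" annotations mean clock cycles of the synchronous router.

```python
# Values copied from the performance table on the slide.
SYNC_CLOCK_MHZ = 270.2       # synchronous router clock frequency
SYNC_CLOCKS_PER_FLIT = 4     # assumed meaning of "(4)" next to the sync data cycle
ASYNC_DATA_CYCLE_NS = 13.3   # asynchronous router data cycle


def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def rate_mflits(cycle_ns: float) -> float:
    """Flit rate in Mflits/s for a data cycle given in nanoseconds."""
    return 1e3 / cycle_ns


# One synchronous data cycle takes four clock periods.
sync_cycle_ns = SYNC_CLOCKS_PER_FLIT * period_ns(SYNC_CLOCK_MHZ)

if __name__ == "__main__":
    print(f"sync clock period : {period_ns(SYNC_CLOCK_MHZ):.2f} ns")
    print(f"sync data cycle   : {sync_cycle_ns:.2f} ns")
    print(f"sync data rate    : {rate_mflits(sync_cycle_ns):.2f} Mflits/s")
    print(f"async data rate   : {rate_mflits(ASYNC_DATA_CYCLE_NS):.2f} Mflits/s")
```

Under these assumptions the computed values land close to the tabulated 3.7 ns, 14.8 ns, 67.6 Mflits/s and 75.2 Mflits/s.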

The critical path spans multiple routers

[Figure: two adjacent routers, each with an input port (IP) and an output port (OP), marking the Multi-Service-Level Router Critical Path and the Single-Service-Level Router Critical Path]

Adding virtual channels

Asynchronous Router 2D Structure

[Figure: router structure with blocks labelled SL classification, VC classification, Virtual Channel Admission Control, Sending Request to Output Port, Intra-Service Level Arbitration, and Inter-Service Level Arbitration]

Conclusions

• Async routers
  – less area than sync routers
  – no need for global or local clocks
  – no need for synchronization

• Highly configurable
  – Ports
  – Service Levels
  – Virtual Channels (inside each service level)

• Dynamic virtual channel allocation

• Fast and fair asynchronous arbitration

• Simulated: 200 Mflits/s @ 0.18 µm
  – 5 ports, 4 x SL, 2 x VC

FAUST

CEA-LETI

Grenoble, France

FAUST

• Actual chip demonstrating async NOC
  – CEA LETI (France)

• Circa 2005, 0.13 µm
  – STMicroelectronics (France)

The FAUST environment

[Figure: mixed FAUST / FPGA platform (board: Sept 2005). Legend: standard component; integrated unit (FAUST – FPGA). Labelled blocks include Ethernet IF unit, CPU unit, RAM unit, HW unit, Ext RAM Ctrl units, RF1 and RF2 IF units with their RF modules (DAC, ADC, analog TX/RX), Eth TRX, Flash, RAM, Host Interface, Converter Module, JTAG link, and LVDS Test1 / Test2 links. Side captions: data, computing and storage aspects; test links.]

The FAUST architecture

[Figure: FAUST chip architecture. 23 NOC units (from 100 kgates up to 1 Mgates) attached to ANOC async. nodes through async/sync interfaces. Unit labels: OFDM MOD., ALAM. MOD., CDMA MOD., MAPP., BIT INTER., TURBO CODER, CONV. CODER, CONV. DEC., ROTOR, EQUAL., CHAN. EST., FRAME SYNC., OFDM DEM., CDMA DEM., DE-MAPP., DE-INTER., ETHERNET, RAM, EXT. RAM CTRL, CPU on AHB, DART, RAC, NoC Perf., EXP, Clk & Test CTRL. Pad counts: RAM IF 58, NOC1 IF 84, NOC2 IF 83.
Annotations: IPs from ext. partners; NoC performance analysis (latency and throughput); clk control for power management (24 clocks); direct NoC access; AHB to NoC bridge; reconf. data paths; DMA complex blocks.]

FAUST chip description & Floor-plan

[Figure: chip floor-plan. Labelled regions: AHB, DART, ARM946, TX units, RX units, ETH unit, RAM units, NOC nodes and links]

• Tech.: STM/HCMOS9GPLL (0.13 µm)
• 23 NoC units (166 MHz)
• RAM blocks: 81
• CPU: ARM946
• 4.5 Mgates
• 275 I/Os
• Core area = 70 mm²
• Chip area = 80 mm²
• Package: TBGA 420
• Core power supply: 1.2 V
• I/O power supply: 3.3 V
• Power consumption: 3 Watts max

FAUST Node Architecture

• QDI 4 phases + multi-rail protocol

– 90 wires per NOC link

[Figure: Asynchronous Node 1 and Asynchronous Node 2. Each node has NORTH, SOUTH, EAST and WEST input ports (IP) and output ports (OP) carrying Data + Send in one direction and Accept1 in the other, plus an async/sync interface to a synchronous resource.]

FAUST Node Input Controller

• 2 Virtual Channels

[Figure: input controller. Signal labels: IP_data[33:0], IP_send, IP_accept1, Send_accept1, Get_new_flit, Get_priority_bit, CTRL_shift, R0, R1, Split_R0 … Split_Rk-1, Loop_R0, Loop_R1, CUR_R0, CUR_R1, NXT_R0, NXT_R1, R0_to_OP0 … R0_to_OPn-1, R1_to_OP0 … R1_to_OPn-1, Valid_R0_to_OP0 … Valid_R1_to_OPn-1, BopR1_toOP0 … BopR1_toOPn-1, Accept1_fromOP0 … Accept1_fromOPn-1; buses marked ×n.]

FAUST Node Output Controller

• Arbitration for VC0: static (N/E/S/W)

• Arbitration for VC1 uses a priority list (first-arrived first-served)

• 34-bit data switch
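The two arbitration policies named above can be illustrated with a small behavioural model. This is a sketch only: the real FAUST arbiter is asynchronous hardware, and the names `STATIC_ORDER`, `arbitrate_static` and `FafsArbiter` are invented here. It also assumes that "static (N/E/S/W)" means a fixed priority in that order, which the slide does not state explicitly.

```python
from collections import deque

# Assumed fixed priority order for VC0, read from "static (N/E/S/W)".
STATIC_ORDER = ("N", "E", "S", "W")


def arbitrate_static(requests):
    """Grant the first requesting input port in the fixed N/E/S/W order."""
    for port in STATIC_ORDER:
        if port in requests:
            return port
    return None  # nobody is requesting


class FafsArbiter:
    """First-arrived first-served arbiter backed by an ordered list (VC1)."""

    def __init__(self):
        self._pending = deque()

    def request(self, port):
        """Record a request; a port already waiting keeps its position."""
        if port not in self._pending:
            self._pending.append(port)

    def grant(self):
        """Serve the oldest pending request, or None if the list is empty."""
        return self._pending.popleft() if self._pending else None


if __name__ == "__main__":
    print(arbitrate_static({"W", "E"}))
    fafs = FafsArbiter()
    for port in ("S", "N", "S"):
        fafs.request(port)
    print(fafs.grant(), fafs.grant(), fafs.grant())
```

The static scheme needs no state but can starve low-priority inputs; the list-based scheme keeps arrival order at the cost of storing it.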

[Figure: output controller with Arbiter, Next_state logic, FAFS List1 and Switch. Signal labels: R0_fromIP0 … R0_fromIPn-1, R1_fromIP0 … R1_fromIPn-1, valid_R0_fromIP0 … valid_R1_fromIPn-1, bopR1_fromIP0 … bopR1_fromIPn-1, R0_eop_fromIP0 … R1_eop_fromIPn-1, Accept1_toIP0 … Accept1_toIPn-1, OP_data[33:0], OP_send, OP_accept1, CTRL_SWITCH, CUR_STATE, NXT_STATE, CUR_SUSPENDED, NXT_SUSPENDED, TO_SUSPEND, CUR_FROM, NXT_FROM, CUR_PRIO, NXT_PRIO, var_state, var_suspended; buses marked ×n and ×k.]

FAUST GALS interface

• The GALS interface: A synchronizer

• Uses dual-clock FIFO (2 stages)
  – Allows sync & async transfers every clock cycle

[Figure: GALS interface between the ANOC unit connection (North, West, South, …) and a synchronous unit with its clock.
Asynchronous to Synchronous Interface: Edata[67:0], Edata_ack[16:0], Esend[1:0], Esend_ack, Eaccept1, Eaccept1_ack on the ANOC side; n_data[33:0], n_send[1:0], u_accept[1:0] on the synchronous side.
Synchronous to Asynchronous Interface: Sdata[67:0], Sdata_ack[16:0], Ssend[1:0], Ssend_ack, Saccept1, Saccept1_ack on the ANOC side; u_data[33:0], u_send[1:0], n_accept[1:0] on the synchronous side.]
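The bus widths labelled on the asynchronous side of this interface (Edata[67:0], Edata_ack[16:0], Esend[1:0], Esend_ack, Eaccept1, Eaccept1_ack) can be added up and compared with the "90 wires per NOC link" quoted on the node architecture slide. The sketch below does that, and also shows one way a 34-bit flit could map onto 68 wires, assuming the "multi-rail" data uses 1-of-4 encoding (two bits per group of four wires, one acknowledge per group). That encoding choice and all Python names are assumptions for illustration, not details confirmed by the slides.

```python
# Bus widths as labelled on the asynchronous side of the GALS interface.
ASYNC_LINK_SIGNALS = {
    "data": 68,         # Edata[67:0]
    "data_ack": 17,     # Edata_ack[16:0]
    "send": 2,          # Esend[1:0]
    "send_ack": 1,      # Esend_ack
    "accept1": 1,       # Eaccept1
    "accept1_ack": 1,   # Eaccept1_ack
}


def link_wires(signals):
    """Total number of wires in one link direction."""
    return sum(signals.values())


def one_of_four(value, n_bits=34):
    """Encode `value` as 1-of-4 groups: each 2-bit digit drives one of 4 wires.

    Assumed encoding; returns a flat list of 0/1 wire levels, least
    significant digit first.
    """
    wires = []
    for shift in range(0, n_bits, 2):
        digit = (value >> shift) & 0b11
        wires.extend(1 if k == digit else 0 for k in range(4))
    return wires


if __name__ == "__main__":
    print("wires per link:", link_wires(ASYNC_LINK_SIGNALS))
    encoded = one_of_four(0b1101)
    print("data wires:", len(encoded), "asserted:", sum(encoded))
```

With these widths the total comes to 90, and a 34-bit flit occupies 17 groups of four wires, which is consistent with the 68-bit data bus and the 17-bit acknowledge bus.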

FAUST GALS interface

• Design is based on Gray-code dual-clock FIFOs
  – Main objective is to provide a full std-cell design approach
  – Modification required for the async side

• NOC interface
  – A-to-S + S-to-A FIFOs
  – One FIFO per Virtual Channel
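Gray-code pointers are the usual reason a dual-clock FIFO can be built from standard cells: consecutive pointer values differ in exactly one bit, so a pointer sampled in the other clock domain is either the old or the new value, never a mix. The sketch below shows the standard binary/Gray conversions and checks that property; the function names are illustrative, and nothing here reproduces the FAUST FIFO itself.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary counter value to its Gray-code equivalent."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert a Gray-code value back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    depth = 16  # example pointer range; must be a power of two for the wrap-around
    for i in range(depth):
        nxt = (i + 1) % depth
        print(f"{i:2d} -> gray {bin_to_gray(i):04b}, "
              f"bits changed to next: {hamming(bin_to_gray(i), bin_to_gray(nxt))}")
```

A plain binary pointer does not have this property: going from 7 to 8 flips four bits at once, which is what makes it unsafe to sample asynchronously.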

[Figure: Gray-code FIFO (A-S) with ports write_clock, write_data[33:0], write_enable, full, read_clock, read_data[33:0], read_enable, empty. The A-to-S interface contains FIFOs A-to-S VC0 and A-to-S VC1; the S-to-A interface contains S-to-A VC0 and S-to-A VC1. Asynchronous side: Data / Send / Accept, NOC link QDI 4-phase. Synchronous side: Data / Send / Accept, NOC link synchronous version.]

FAUST external interfaces

• NOC access in dual-mode: sync / async
  – Good for debug / explore asynchronous external connections

• For async mode, convert 4-rail/4-phase to bundled-data/2-phase

[Figure: off-chip NOC interface. A-to-S / QDI-to-BD and S-to-A / BD-to-QDI converters sit between the ANOC (Data / Send / Accept, NOC link QDI 4-phase) and the chip pads (Data / Send / Accept NOC link, either synchronous version + clk or bundled-data async version), selected by a sync / async interface mode. The chip map marks the on-chip NOC GALS IF, the ANOC asynchronous nodes, the NOC1 and NOC2 ports (off-chip NOC IF), the RAM IF (58 pads) and the ETHERNET IF (17 pads).]

FAUST Performance Results

• ANOC performances (worst case, 5/5 nodes)
  – Per async node: 150 Mflit/s, 5 ns latency
  – Per GALS interface: 120 Mflit/s, 10 ns latency

• Area results (HCMOS9)
  – Network Interface (w/o config reg.): 10 kgates
  – ANoC node (requires specific async cells)
    • NoC node: 20 kgates
    • GALS interface: 15 kgates
  – 45 kgates in total per NoC block unit to provide:
    • communication, Quality-of-Service, configurability,
    • robustness & multi-clock domains
  – OK for units with average complexity of about 300 kgates (~15%)

• NoC communication overhead
  – NI credit mechanism + packet headers: ~10% of total NoC throughput
  – Virtual channel (low latency packets): 50% of the area of NoC node + GALS IF
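The area figures above can be reproduced with a few lines of arithmetic. The sketch below sums the three per-unit components quoted on the slide and expresses them as a fraction of the unit size; the names are illustrative, and the 300 kgates default is simply the average unit complexity the slide mentions.

```python
# Per-unit NoC area components quoted on the slide, in kgates.
NI_KGATES = 10      # network interface, without configuration registers
NODE_KGATES = 20    # asynchronous NoC node
GALS_KGATES = 15    # GALS interface

NOC_TOTAL_KGATES = NI_KGATES + NODE_KGATES + GALS_KGATES


def noc_overhead(unit_kgates: float = 300.0) -> float:
    """NoC area as a fraction of the attached unit's own area."""
    return NOC_TOTAL_KGATES / unit_kgates


if __name__ == "__main__":
    print(f"NoC area per unit: {NOC_TOTAL_KGATES} kgates")
    print(f"overhead for a 300 kgates unit: {noc_overhead():.0%}")
    print(f"overhead for a 100 kgates unit: {noc_overhead(100):.0%}")
```

The same function makes the trade-off visible for smaller units, where the fixed 45 kgates weighs more heavily.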

[Figure: one NoC block unit: NODE with North, West, South and East links, GALS interface, NI, and the unit with its clock]

ALPIN

CEA-LETI

Grenoble, France

ALPIN

• Claim 1: Async NOC (= GALS SOC) easily enables dynamic voltage and frequency scaling (DVFS)
  – Lower voltage to some modules when slow
  – Power off to some modules to save leakage
  – Sync modules use “pausable clock”
    • When voltage and frequency change, local module clock is paused momentarily

• Claim 2: Routers used lightly
  – Shut off when idle
  – Easy when async

• Based on FAUST

• Another actual chip ☺
  – ST Micro 65nm

ALPIN

SAS = Sync/Async/Sync synchronizer; LCG = Local Clock Gating

ALPIN Unit: Sync core, Async DVFS Wrapper

• Power Modes:
  – High
  – Low
  – Changing
  – Retention
  – Off

ALPIN Claim 2: Routers used lightly

Total idle ≈ 90%. If shut off, save 90% of leakage in routers

ALPIN auto-off router architecture

ALPIN auto-off router layout

• Area:
  – Typical unit 0.2 mm²
  – Core: 85%
  – Level shifters: 13%
  – Activity detection: 2%
  – Power switch: 0%

MANGO

Technical University of Denmark

MANGO

• Guaranteed service
  – In addition to “best effort” service
  – Not the same as 2 service levels
  – Not the same as 2 virtual channels with different priorities

• Connection-oriented service guarantees
  – Allocate resources (reserve VCs) source-to-destination
    • Service levels are applied hop-by-hop
  – Similar to circuit switching
    • Vs. packet switching

• Async / GALS
  – Cf. Sync GS/BE in Philips/NXP Aethereal

MANGO Router

MANGO GS router

[Figure labels: non-blocking; previous router]

MANGO simulation

• 5x5 MANGO router: 8 VCs, 32 bits
  – Connection-oriented GS (fair-share)
  – Connection-less BE

• Clockless circuits, 130 nm std cells

• Results:
  – 795 MHz (typical) / 515 MHz (worst case)
  – Area: 0.188 mm² (pre-layout)

Async Router Conclusions

• Eliminate clocks in the NoC
  – Useful for heterogeneous Multi-Clock-Domain SoCs
  – Useful for multi-voltage domain SoCs
  – Facilitates modularity
  – Helps timing closure of large SoCs
  – Facilitates DVFS
  – Can prioritize or guarantee service

• Handshake may slow traffic
  – Need careful design

• More room for improvement

FOX: Fast async serial On-chip Interconnect

Outline

• Why Serial Link

• Architecture
  – Transmitter
  – Receiver
  – Channel

Why serial?

• Less interconnect area

• Less routing congestion

• Less coupling

• Lower power

• Relative improvement scales

• Must be FAST!

• For example:
  – FOX serial link
    • 1 gate delay bit cycle
  – Fully-shielded parallel link
    • 8 gate delay clock cycle
  – Equal bit-rate

[Figure: two plots with axes Technology Node [nm] (180, 130, 90, 65, 30, 15) and Link Length [mm]. One separates the region where the parallel link dissipates less power from the region where the serial link dissipates less power; the other separates the region where the parallel link requires less area from the region where the serial link requires less area.]

Fast serial highways vs. slow parallel local roads


FOX Serial Link

• Transition signaling (instead of sampling):
  – 2-phase Level Encoded Dual Rail (LEDR) async protocol (aka data-strobe, DS)

• Differential encoding (DS-DE, IEEE1355-95)

• Wave-pipelining over channel

• Acknowledge per word (instead of per bit)

• Low-latency synchronizers

Encoding: 2-phase LEDR

[Figure: Strobe (S) and Data (D) waveforms for the bit sequence 0 0 1 1 0 0 0 0 1 0]
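As a rough illustration of the data-strobe form of 2-phase LEDR: the data wire carries the bit value, and the strobe wire toggles whenever the data wire does not, so exactly one of the two wires changes per bit and the receiver needs no clock. The sketch below is a behavioural model under that textbook description; the function names and the initial wire state (0, 0) are assumptions, and the bit pattern from the slide is reused purely as sample input.

```python
def ledr_encode(bits, d0=0, s0=0):
    """Encode bits as a list of (data, strobe) wire states, one state per bit."""
    d, s = d0, s0
    states = []
    for b in bits:
        if b != d:
            d = b       # value changes: only the data wire toggles
        else:
            s ^= 1      # value repeats: only the strobe wire toggles
        states.append((d, s))
    return states


def ledr_decode(states, d0=0, s0=0):
    """Recover bits; each new state must differ from the last in one wire."""
    bits = []
    prev = (d0, s0)
    for d, s in states:
        toggled = (d != prev[0]) + (s != prev[1])
        if toggled != 1:
            raise ValueError("expected exactly one wire transition per bit")
        bits.append(d)
        prev = (d, s)
    return bits


if __name__ == "__main__":
    sample = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]
    wires = ledr_encode(sample)
    print(wires)
    # D xor S flips on every bit, which is how a receiver can detect arrivals.
    print([d ^ s for d, s in wires])
    print(ledr_decode(wires) == sample)
```

Because D xor S alternates with every bit, a simple XOR at the receiver yields a per-bit event signal without any sampling clock.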

FOX Transmitter

• Data cycle: One gate delay between bits
  – 15 ps @ 65 nm (30 ps @ low-power 65 nm)
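The bit cycle translates directly into a raw bit rate, and the same arithmetic shows why a serial link with a one-gate-delay bit cycle can match a parallel link with an eight-gate-delay clock cycle. The sketch below assumes the parallel link is 8 bits wide, which the "Why serial?" slide implies through "equal bit-rate" but does not state; the function name is illustrative.

```python
def bit_rate_gbps(cycle_ps: float, wires: int = 1) -> float:
    """Raw bit rate in Gb/s for `wires` data wires, each moving one bit per cycle."""
    return wires * 1e3 / cycle_ps


GATE_DELAY_PS = 15.0  # one gate delay at 65 nm, from the slide

serial = bit_rate_gbps(GATE_DELAY_PS)                 # 1 wire, 1 gate delay per bit
parallel = bit_rate_gbps(8 * GATE_DELAY_PS, wires=8)  # assumed 8 wires, 8 gate delays

if __name__ == "__main__":
    print(f"serial, 15 ps bit cycle   : {serial:.1f} Gb/s")
    print(f"parallel, 8 x 120 ps      : {parallel:.1f} Gb/s")
    print(f"serial, 30 ps (low-power) : {bit_rate_gbps(30.0):.1f} Gb/s")
```

The 15 ps figure gives a raw rate in the same range as the 65 Gbps reported from simulation on the FOX status slide.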


Transmitter: Transition Generator

Adapted from M.J.E. Lee, "An Efficient I/O and Clock Recovery for TERABIT Integrated Circuits Design," PhD Thesis, Stanford Univ., 2001.

Transmitter: Fast Async Shift Register

Transmitter: Encoder + Combiner

Example: SR-Element layout

FOX Receiver

Receiver: Decoder + Splitter

Toggle Circuit

• New circuit

• Single gate delay operation


Channel

• Channel driver and receiver

– Differential voltage mode

– Differential current mode

– Single-ended (just repeaters)

• Channel layout

• Interconnect modeling


Differential Voltage Mode

• Current mode differential low-swing transmit

• Differential voltage receive

• Voltage swing → high power, low speed

[Figure labels: P / S, P / S]

Differential Current Mode: Goals

• Full Current Mode:
  – Send current from TX
  – Measure current at RX

• Minimal voltage swing over channel
  – Low power, high speed

• Current to voltage conversion at RX

• Long range channel without repeaters

Differential Current Mode: Transmitter and Receiver

[Figure: transmitter and receiver circuits; labels include Output Stage and FS]

Channel wires

• Four wires

• Thick metal

• Single layer: non-disruptive to routing

[Figure: the four channel wires, labelled D, D, S, S]

Channel wire model

[Equation: channel wire impedance Z(s), written in terms of s, L, R, ΔL, ΔR, ω and R_DC; the expression did not survive text extraction]

FOX Status

• Four years of study and design

• Simulations show:
  – 65 Gbps over 7 mm
  – All corners and ±5σ in-die variations
    • Thanks to async operation

• Presently going for fab on IBM 65nm (MOSIS)

FOX Summary

• Fastest possible digital on-chip serial interconnect
  – Data rate of one bit per gate delay

• Asynchronous does it

• To be used as “hard IP core”

Summary

• NOCs are for large SOCs

• Large SOCs = multiple clock domains

→ NOCs should be asynchronous

• We reviewed two complementary areas:
  – Async routers
  – High speed async serial interconnect

References

• QNoC Async Router
  – R. Dobkin, V. Vishnyakov, E. Friedman, R. Ginosar, An asynchronous router for multiple service levels networks on chip, ASYNC 2005.
  – R. Dobkin, R. Ginosar and I. Cidon, QNoC Asynchronous Router with Dynamic Virtual Channel Allocation, NOCS 2007.
  – R. Dobkin, R. Ginosar and A. Kolodny, QNoC Asynchronous Router, Integration, the VLSI Journal, 42(2):103-115, 2009.

• FAUST
  – E. Beigné, F. Clermidy, P. Vivet, A. Clouard, M. Renaudin, An Asynchronous NOC Architecture Providing Low Latency Service and its Multi-level Design Framework, ASYNC 2005.

• ALPIN
  – E. Beigné, F. Clermidy, S. Miermont, P. Vivet, Dynamic Voltage and Frequency Scaling Architecture for Units Integration within a GALS NoC, ASYNC 2008.
  – Y. Thonnart, E. Beigné, A. Valentian, P. Vivet, Automatic Power Regulation based on an Asynchronous Activity Detection and its Application to ANOC Node Leakage Reduction, NOCS 2008.

• MANGO
  – T. Bjerregaard, J. Sparsø, A scheduling discipline for latency and bandwidth guarantees in asynchronous network-on-chip, ASYNC 2005.
  – T. Bjerregaard, J. Sparsø, A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip, DATE 2005.

• FOX
  – R. Dobkin, R. Ginosar and A. Kolodny, Fast Asynchronous Shift Register for Bit-Serial Communication, ASYNC 2006.
  – R. Dobkin, Y. Perelman, T. Liran, R. Ginosar, and A. Kolodny, High rate wave-pipelined asynchronous on-chip bit-serial data link, ASYNC 2007.
  – R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, Parallel vs. Serial On-Chip Communication, SLIP 2008.
  – R. Dobkin, M. Moyal, A. Kolodny and R. Ginosar, Asynchronous Current Mode Serial Communication, IEEE Trans. on VLSI, 2009.

• Others
  – T. Felicijan, S.B. Furber, An asynchronous on-chip network router with Quality-of-Service (QoS) support, Int. SOC Conf. (2004) 274–277.

More Literature

– S. Moore, G. Taylor, R. Mullins, P. Robinson, Point to point GALS interconnect, ASYNC (2002) 69–75.

– S. Oetiker, F.K. Gürkaynak, T. Villiger, H. Kaeslin, N. Felber, W. Fichtner, Design flow for a 3-million transistor GALS test chip, ACiD Workshop (2003).

– T. Villiger, H. Kaeslin, F.K. Gurkaynak, S. Oetiker, Wolfgang Fichtner, Self-timed ring for globally-asynchronous locally-synchronous systems, ASYNC (2003) 141–150.

– J. Muttersbach, T. Villiger, W. Fichtner, Practical design of globally asynchronous locally-synchronous systems, ASYNC (2000) 52–61.

– K.Y. Yun, R.P. Donohue, Pausible clocking-based heterogeneous systems, TVLSI 7 (4) (1999) 482–488.

– A.E. Sjogren, C.J. Myers, Interfacing synchronous and asynchronous modules within a high-speed pipeline, TVLSI 8 (5) (2000) 573–583.

– R. Dobkin, R. Ginosar, C.P. Sotiriou, High rate data synchronization in GALS SoCs, TVLSI 14 (10) (2006) 1063–1074.

– Y. Semiat, R. Ginosar, Timing measurements of synchronization circuits, ASYNC (2003) 68–77.

– R. Kol, R. Ginosar, Adaptive synchronization, ICCD (1998) 188–189.

– D.J. Kinniment, Synchronization and Arbitration in Digital Systems, Wiley, New York, 2008.

– R. Ginosar, Fourteen ways to fool your synchronizer, ASYNC (2003) 89–96.

– D.J. Kinniment, A. Yakovlev, Low latency synchronization through speculation, PATMOS (2004) 278–288.

– A. Chakraborty, M.R. Greenstreet, Efficient self-timed interfaces for crossing clock domains, ASYNC (2003) 78–88.

– A. Chakraborty, M.R. Greenstreet, A minimal source–synchronous interface, ASIC/SOC (2002) 443–447.

– S. Chakraborty, J. Mekie, D.K. Sharma, Reasoning about synchronization techniques in GALS systems: a unified approach, FMGALS (2003).

– J. Mekie, S. Chakraborty, D.K. Sharma, Evaluation of pausible clocking for interfacing high speed IP cores in GALS framework, VLSI Des. (2004) 559–564.

– R. Mullins, S. Moore, Demystifying data-driven and pausible clocking schemes, ASYNC (2007) 175–185.

– L. Carloni, A. Sangiovanni-Vincentelli, Coping with latency in SoC design, IEEE Micro (special issue on SoC) 22 (5) (2002) 24–35.

– R. Dobkin and R. Ginosar, Two Phase Synchronization with Sub-cycle Latency, Integration, the VLSI Journal, 2008.

– K. Goossens, J. Dielissen, and A. Radulescu, AEthereal network on chip: Concepts, Architectures, and Implementations, IEEE Design and Test of Computers, Vol 22(5):414--421, Sept-Oct 2005.

– T. Bjerregaard, S. Mahadevan, A Survey of Research and Practices of Network-on-Chip, ACM Computing Surveys, Vol. 38, March 2006.

