ASYNC07 High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link R. Dobkin, T. Liran, Y....


ASYNC07

High Rate Wave-pipelined Asynchronous On-chip Bit-serial Data Link

R. Dobkin, T. Liran, Y. Perelman, A. Kolodny, R. Ginosar

Technion – Israel Institute of Technology

Electrical Engineering Department – VLSI Lab

March 12, 2007


Presentation Outline

• Why Serial Link?

• Fast Asynchronous Serial Link
  • Transmitter, Fast LEDR Encoder
  • Receiver, Fast Toggle Circuit
  • Channel, Current Mode Async Signaling

• Performance

• Summary


Serial Link Employment Benefits

• Why Serial Link?
  • Less interconnect area
  • Less routing congestion
  • Less coupling
  • Less power (depends on range)

• The relative improvement grows with technology scaling. The plotted example refers to:
  • Single gate delay serial link
  • Fully-shielded parallel link with 8 gate delay clock cycle
  • Equal bit-rate
  • Word width N=8

[Figure: two plots of Link Length [mm] versus Technology Node [nm] (180, 130, 90, 65, 30, 15). In one plot the curve separates the region "Parallel Link dissipates less power" from the region "Serial Link dissipates less power". In the other the curve separates the region "Parallel Link requires less area" from the region "Serial Link requires less area".]
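The equal bit-rate condition of this example can be checked with a little arithmetic. The sketch below works purely in units of gate delays, so it assumes nothing about absolute timing; the function names are illustrative and not taken from the slides.

```python
def serial_bits_per_gate_delay(bit_period_gd):
    """Throughput of a serial link that sends one bit every bit_period_gd gate delays."""
    return 1 / bit_period_gd


def parallel_bits_per_gate_delay(word_width, clock_cycle_gd):
    """Throughput of a parallel link that sends word_width bits every clock cycle."""
    return word_width / clock_cycle_gd


# Values from the slide: single gate delay serial link,
# 8-bit parallel link with an 8 gate delay clock cycle.
serial = serial_bits_per_gate_delay(1)
parallel = parallel_bits_per_gate_delay(word_width=8, clock_cycle_gd=8)
print(serial, parallel)
```

Both links move one bit per gate delay, so the comparison in the plots is about area and power at equal throughput.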


Serial Link Applications

• P2P long-range interconnect

• Long range NoC links

• Pin-limited on-chip module interfaces
  • Presently chips are pin-limited, and that limitation will migrate inside the chip

• Cross-bar
  • Simpler routing and less congestion

• Communications inside many-core CMPs


Serial Link – Top Structure

• Transition signaling instead of sampling: two-phase NRZ Level Encoded Dual Rail (LEDR) asynchronous protocol, a.k.a. data-strobe (DS)

• Acknowledge per word instead of per bit

• Wave-pipelining over channel

• Differential encoding (DS-DE, IEEE 1355-1995)

• Low-latency synchronizers

[Figure: top-level block diagram with the blocks Sender, Synchronizer, Serializer & LEDR Encoder, Bit-Serial Channel (P and S lines), De-Serializer & LEDR Decoder, Synchronizer, and Receiver, plus a Word Ack signal.]


Encoding – Two Phase NRZ LEDR

• Two Phase Non-Return-to-Zero Level Encoded Dual Rail
  • "delta" encoding (one transition per bit)

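[Figure: waveforms of the uncoded data (B), the state bit (S), and the phase bit (P) for the example bit sequence 0 0 1 1 0 0 0 0 1 0.]

$$P(i) = \begin{cases} \overline{B(i)}, & i \text{ odd} \\ B(i), & i \text{ even} \end{cases}$$

$$S(i) = B(i), \quad \forall i$$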
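A minimal sketch of the encoding may help show why LEDR gives exactly one transition per bit. It assumes that bit indices start at 0 and that the phase bit is complemented on odd indices; both conventions are assumptions made for illustration, and the function names are hypothetical.

```python
def ledr_encode(bits):
    """Encode uncoded bits B(i) into (S, P) pairs.

    S(i) = B(i). P(i) = B(i) on even i and NOT B(i) on odd i
    (assumed convention, with i starting at 0).
    """
    pairs = []
    for i, b in enumerate(bits):
        s = b
        p = b ^ (i & 1)  # complement B on odd indices
        pairs.append((s, p))
    return pairs


def ledr_decode(pairs):
    """The data is carried by the state bit S; P only marks the phase."""
    return [s for s, _ in pairs]


def transitions_per_bit(pairs):
    """Count how many of the two rails change between successive bits."""
    return [
        (s0 != s1) + (p0 != p1)
        for (s0, p0), (s1, p1) in zip(pairs, pairs[1:])
    ]


bits = [0, 0, 1, 1, 0, 0, 0, 0, 1, 0]  # example sequence from the slide
pairs = ledr_encode(bits)
print(pairs)
print(transitions_per_bit(pairs))
```

When the data bit repeats, only P toggles; when the data bit changes, only S toggles. The receiver can therefore detect each new bit from a single transition on either rail, without a separate clock.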


Transmitter – Fast SR Approach

[Figure: transmitter block diagram. Blocks: Transition Generator, Shift-Register SR(B) with a Parallel Load Interface (Load Enable, Uncoded Data), and LEDR Encoder. Signal labels: T0, T90, OT0, OT90, Bodd, Beven, S, P.]

• Targeted Speed: One gate delay between bits

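Fast Asynchronous Shift Register

[Figure: shift register schematic with stages XL[0], XL[1], ..., XL[W-2], a Parallel Load Interface for the data inputs (Data 0, Data 1, ..., Data W-2, Data W-1), the control input T, the serial output OUT, and one terminal marked Not Connected.]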

Wave-pipelined Control Characteristics

• The highest speed (the single gate-delay cycle) corresponds to the pole of the Bode diagram

• This operating point results in signal degradation along the inverter chain

[Figure: two Bode plots of Gain [dB] versus Frequency [Hz] (1.00E+09 to 1.00E+11), with the single gate delay rate marked. BIAS panel: curves for DVb = 0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3. SWING panel: curves for 0.5, 0.6, 0.7, 0.8, and 0.9 of full swing and for full swing.]

[Figure: Cut-off Frequency [GHz] versus Inverter Chain Length, N (0 to 30).]
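As rough intuition for why the cut-off frequency falls with chain length, one can treat each inverter as an identical, linear, first-order low-pass stage. This is a strong simplification and an assumption of this sketch: real inverters are nonlinear, and the plotted curve need not follow this formula. For N such stages in cascade, the overall -3 dB frequency is f_pole * sqrt(2^(1/N) - 1). The 60 GHz pole value below is hypothetical.

```python
import math


def cascade_cutoff(f_pole_hz, n_stages):
    """-3 dB frequency of n identical first-order low-pass stages in cascade.

    Each stage has |H(f)|^2 = 1 / (1 + (f / f_pole)^2); the cascade reaches
    half power when (1 + (f / f_pole)^2)^n = 2.
    """
    return f_pole_hz * math.sqrt(2 ** (1 / n_stages) - 1)


f_pole = 60e9  # hypothetical single-stage pole, not a value from the slides
for n in (1, 2, 5, 10, 30):
    print(n, round(cascade_cutoff(f_pole, n) / 1e9, 1), "GHz")
```

Under this model, a signal that sits right at the single-stage pole is attenuated a little more by every added stage, which is the degradation along the chain that the slide describes.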


Splitter Architecture

• The shift-register is partitioned into M shift-registers ⇒ M× slower operation in each shift-register

• Signal is no longer degraded
  • Single gate-delay operation is localized to output (input) stage only

[Figure: splitter architecture. Transmitter: a Shift-Register for Odd Bits and a Shift-Register for Even Bits, both with PARALLEL LOAD, feed a Merge stage. Receiver: a Split stage feeds a Shift-Register for Odd Bits and a Shift-Register for Even Bits, both with PARALLEL READ.]
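The data movement behind the splitter can be sketched as an interleave and de-interleave of the word. The code below models only the bit ordering, not the timing or the circuits; the function names and the generalization to arbitrary M are illustrative assumptions.

```python
def split(bits, m=2):
    """Distribute bits round-robin over m shift registers (stream k gets bits k, k+m, ...)."""
    return [bits[k::m] for k in range(m)]


def merge(streams):
    """Interleave the per-register streams back into a single serial stream."""
    m = len(streams)
    total = sum(len(s) for s in streams)
    out = [None] * total
    for k, stream in enumerate(streams):
        out[k::m] = stream
    return out


word = [1, 0, 0, 1, 0, 1, 1, 0]
even_bits, odd_bits = split(word, m=2)
print(even_bits, odd_bits)
print(merge([even_bits, odd_bits]))
```

Each of the M registers shifts once for every M bits on the link, so only the merge stage at the transmitter output and the split stage at the receiver input have to run at the full bit rate.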

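Transmitter Splitter Architecture

[Figure: transmitter splitter schematic. Labeled elements: shift registers SR_EVEN and SR_ODD with final stages XL_EVEN[N/2] and XL_ODD[N/2], control signals C and C90, data signals B_EVEN and B_ODD, merge stages Merge_EVEN and Merge_ODD, an XOR gate, and the LEDR ENCODER producing the P and S outputs.]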


Transmitter – SPICE Simulation (65nm node)

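[Figure: SPICE waveforms of C0, C90, BEVEN, BODD, C, P, and S, with timing annotations of 30 ps and 60 ps and the label START BIT=1. Annotated bit sequences: "1 0 0 0 1 0 1 0", "0 0 1 0 1 0 1 Dummy", "1 0 0 1 0 1 0 1", and "1 0 0 0 1 0 1 0".]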

Simulations done at


Receiver

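Receiver Splitter Architecture

[Figure: receiver splitter schematic. Labeled elements: SPLIT stage, TOGGLE circuit (input T, outputs A and B), signals C, S, S_EVEN, and S_ODD, and shift registers SR_EVEN and SR_ODD with first stages XL_EVEN[1] and XL_ODD[1].]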


Toggle Circuit

• Straightforward implementation (fundamental asynchronous state machine) is too slow (supports only ~1.5 gate delay cycle)

• Novel toggle:
  • Single gate delay operation support
  • Internal and output latches

[Figure: toggle circuit schematic with input T and outputs A and B.]
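The slide does not spell out the toggle's behavior. The sketch below assumes the conventional toggle element of transition signaling, in which successive transitions on the input T are steered alternately to the outputs A and B. It is a purely behavioral model under that assumption and says nothing about the novel circuit itself; the class name is hypothetical.

```python
class Toggle:
    """Behavioral model of a two-phase toggle element (assumed behavior).

    Every transition on T produces a transition on exactly one output,
    alternating between A and B, starting with A.
    """

    def __init__(self):
        self.a = 0
        self.b = 0
        self._next_is_a = True

    def transition(self):
        """Apply one transition on input T and return the levels (A, B)."""
        if self._next_is_a:
            self.a ^= 1
        else:
            self.b ^= 1
        self._next_is_a = not self._next_is_a
        return self.a, self.b


toggle = Toggle()
outputs = [toggle.transition() for _ in range(4)]
print(outputs)
```

Under this assumption each output changes at half the input transition rate, which is consistent with a receiver that clocks separate even and odd shift registers.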


Channel

• Four transmission lines (DS-DE)

• High metal layers utilization
  • Metals 5-8 of 65nm process

• RLC modeled

• Careful layout
  • Small crosstalk
  • Small relative variations

LEDR Interconnect Layout

[Figure: layout of the interconnect lines carrying the S and P signals.]


Differential Channel Driver and Receiver

• Current mode differential low-swing signaling

• Currents in opposite directions

• Controllable current return path

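[Figure: schematic of the differential channel for the P / S lines, showing the Driver, the line pair, and the Receiver with a sense amplifier (SA) and resistors R.]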


Channel Characteristic Impedance

$$Z_0 = \sqrt{\frac{R + j\omega L}{j\omega C}}$$

where R is expressed through the DC resistance R_DC.

[Figure: Z versus F. Based on data from BPTM. Drawn for constant R, L, C.]

• Z depends on F

• Voltage changes with F

• Fast changes ⇒ voltage drifts

• The drifts bound the operating speed
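The frequency dependence of Z can be illustrated with the standard expression for the characteristic impedance of a line with series R and L and shunt C (shunt conductance neglected). The per-metre R, L, and C values below are hypothetical placeholders chosen only to make the trend visible; they are not the BPTM data used for the slide.

```python
import cmath
import math


def characteristic_impedance(freq_hz, r_per_m, l_per_m, c_per_m):
    """Z0 = sqrt((R + jwL) / (jwC)) for a line with negligible shunt conductance."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r_per_m + 1j * w * l_per_m) / (1j * w * c_per_m))


# Hypothetical per-metre parameters (assumptions, not from the slides).
R, L, C = 20e3, 0.4e-6, 0.15e-9

for f in (1e8, 1e9, 1e10, 1e11):
    z = characteristic_impedance(f, R, L, C)
    print(f"{f:.0e} Hz  |Z0| = {abs(z):7.1f} ohm")
```

At low frequencies the resistive term dominates and |Z0| is large; at high frequencies |Z0| approaches sqrt(L/C). A current-mode driver that sees this changing impedance produces a frequency-dependent voltage, which is the drift the bullets refer to.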


Channel Driver with Adaptive Control

• Compensates for Z changes

• Turned on for low frequencies

[Figure: channel driver schematic from IN to OUT, with an Adaptive Control block and an Inertial Delay element.]


Adaptive Control – Simulation Example

• SPICE simulation setup:
  • 65nm technology, 4mm range, 67Gbps data rate
  • RLC modeled channel (using Raphael-like three-dimensional field solver)

• Adaptive control is turned on only for low frequencies

[Figure: simulated waveforms of the Data, the Adaptive Control signal, and the Currents. Annotation: Low Frequency Turns Adaptive Control On.]


Channel Receiver Amplifier

[Figure: receiver amplifier schematic with the labels IN, OUT, B, R, and RB.]

Performance

• SPICE simulations show correct operation at the target data cycle of 15 ps (65 nm technology node)

• Power for a 67 Gbps, 4 mm, 16-bit word link under 100% utilization:
  • Total power: 150 mW
  • Channel differential pair: 18 mW
  • Leakage power: 4 mW (due to the use of low-VT transistors)

• Power reduction
  • Deeper split (M× power reduction)
  • Circuit optimizations
  • Circuit shut down during idle states

[Figure labels: TX-SR, RX-SR, Channel Diff Pair.]
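The headline figures are linked by simple arithmetic, worked through below. The energy-per-bit and word-time values are derived here from the slide's numbers and are not stated in the slides; the variable names are illustrative.

```python
cycle_s = 15e-12          # target data cycle from the slide (15 ps)
total_power_w = 150e-3    # total link power from the slide (150 mW)
channel_power_w = 18e-3   # channel differential pair power from the slide (18 mW)
word_bits = 16            # word width from the slide

data_rate_bps = 1 / cycle_s                           # one bit per data cycle
energy_per_bit_j = total_power_w / data_rate_bps      # derived, at 100% utilization
channel_energy_per_bit_j = channel_power_w / data_rate_bps
word_time_s = word_bits * cycle_s                     # time to serialize one word

print(f"data rate:       {data_rate_bps / 1e9:.1f} Gbps")
print(f"energy per bit:  {energy_per_bit_j * 1e12:.2f} pJ")
print(f"channel share:   {channel_energy_per_bit_j * 1e12:.2f} pJ per bit")
print(f"16-bit word:     {word_time_s * 1e12:.0f} ps")
```

A 15 ps cycle gives about 66.7 Gbps, which the slides round to 67 Gbps.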


In-Die Variations

• Splitter architecture
  • High-speed operation localized to input and output stages

• High-speed components design and verification
  • Monte-Carlo simulations (>5)
  • 26 PVT Corners
  • Iterative design with legging and sizing for sensitive transistors

• Asynchronous structure
  • Supports any slow down
  • Minimal time separation between successive bits must be provided!


Summary

• High speed Serial Link requires special circuits:
  • Fast serializers and de-serializers
    • Wave-pipelined control
    • Splitter architecture:
      • Long word transmission
      • Power reduction
  • On-the-fly LEDR encoding
  • Adaptive control for fast asynchronous signals handling
  • Low crosstalk interconnect layout
  • Single FO4 inverter delay data cycle support (15 ps on 65 nm process, 67 Gbps)

• The Serial Link is preferred over the Parallel Link thanks to:
  • Reduced interconnect and active area
  • Easier routing, less coupling
  • Reduced power for long on-chip interconnects


The End

• Thank you