
Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Page 1: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

High Performance Asynchronous ASIC Back-End Design Flow Using Single-Track Full-Buffer Standard Cells

Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel
Department of Electrical Engineering Systems
University of Southern California

Page 2: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

USC Asynchronous CAD/VLSI Group 2

Key to High-Speed Async Design

Completion detection demands 2-D pipelining

[Figure: a bundled-data pipeline (control logic, latches, datapath) contrasted with a 2-D pipeline built from pipeline stages connected by async. channels]

Page 3: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

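Asynchronous Channels

[Figure: three channel types between a sender and a receiver]

1-of-N channel: 1-of-N data with a separate acknowledge wire; the timing diagram numbers four transitions per handshake (1, 2, 3, 4)

1-of-N single-track channel: 1-of-N data and acknowledge on the same wires; the timing diagram numbers two transitions per handshake (1, 2)

GasP bundled-data channel: single-rail data held in latches plus a control channel carrying Req and Ack; data is stable while the request is pending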
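The single-track handshake above can be sketched as a small behavioral model in Python. This is a toy illustration, not a circuit-level description: the class name `SingleTrackChannel` and its methods are hypothetical, and the model captures only that the sender drives one of N wires high and the receiver drives the same wire low, so each token costs two transitions instead of the four of a four-phase channel.

```python
class SingleTrackChannel:
    """Toy behavioral model of a 1-of-N single-track channel (hypothetical names)."""

    def __init__(self, n):
        self.wires = [0] * n      # all wires low means the channel is empty
        self.transitions = 0      # total wire transitions seen so far

    def send(self, value):
        # The sender may only drive a wire high while the channel is empty.
        if any(self.wires):
            raise RuntimeError("channel still holds a token")
        self.wires[value] = 1
        self.transitions += 1

    def receive(self):
        # The receiver consumes the token by driving the same wire low,
        # which doubles as the acknowledge.
        if not any(self.wires):
            raise RuntimeError("channel is empty")
        value = self.wires.index(1)
        self.wires[value] = 0
        self.transitions += 1
        return value


def four_phase_transitions(tokens):
    # Four-phase 1-of-N channel: data up, ack up, data down, ack down.
    return 4 * tokens


channel = SingleTrackChannel(n=2)          # dual-rail (1-of-2) channel
received = []
for bit in (1, 0, 1):
    channel.send(bit)
    received.append(channel.receive())

print(received, channel.transitions, four_phase_transitions(3))
```

Sending three tokens toggles the single-track wires six times, against twelve transitions for the four-phase channel carrying the same tokens.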

Page 4: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

GasP (Sutherland et al. ’01)

Bundled-data pipeline using single-track control

[Figure: GasP control stage between state wires L and R, built from a self-resetting NAND (nodes A and B) with staticizers; the stage sends a pulse to the data latches of the datapath]

fw = 4, cycle time = 6

Includes latch setup time and delay

Page 5: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

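Precharge Half-Buffer (Lines ’98)

2-D pipeline using 1-of-N delay-insensitive channels and QDI cells

[Figure: Precharge Half-Buffer template with left and right completion detectors (LCD, RCD), a C-element, and enables Le and Re; schematic for each output rail showing the NMOS transistor stack between the precharge (Pc) and evaluate (Eval) transistors, with input L, internal node Sx and output Rx]

fw = 2, cycle time = 14+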

Page 6: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Single-Track Asynchronous Pulsed Logic (Nyström ’01)

STAPL uses pulse generators to control driver activation timing

[Figure: STAPL template with two pulse generators, reset, and a right completion detector (RCD); schematic for a dual-rail output with an NMOS transistor stack on inputs L01…L0n and L11…L1n, internal nodes S0 and S1, and outputs R0 and R1]

fw = 2, cycle time = 10

Page 7: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Single-Track Full-Buffer (Ferretti ’02)

[Figure: STFB block diagram with completion detectors SCD and RCD producing A and B, plus reset; schematic for a dual-rail output with an NMOS transistor stack on inputs L01…L0n and L11…L1n, internal nodes S0 and S1, and outputs R0 and R1; timing diagram for L, S, A, B and R]

fw = 2, cycle time = 6

Small and fast

Page 8: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB: Tradeoff Speed for Robustness

Features of STFB
• 3x faster than QDI and about half the size
• Smaller and faster than STAPL
• Smaller forward latency and fewer timing assumptions than GasP

[Figure: performance versus robustness chart placing QDI (Lines - Caltech), STAPL (Nyström - Caltech), STFB (Ferretti - USC) and GasP (Sutherland - Sun)]

Page 9: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Motivation and Goals

• Develop a methodology to design STFB-based asynchronous circuits using conventional CAD tools
  Create an STFB standard-cell library
  Make the library publicly available

• Design and fabricate a demonstration test chip

• Evaluate the results

Ultimate Goal: Full-custom Performance with ASIC Design Times

Page 10: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems


Outline

STFB standard-cell design

Backend design flow

Demonstration test chip

Conclusions

Page 11: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: Transistor sizing

STFB channels are point to point (no forked wires)

One size per cell in the library is adequate

Page 12: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: Transistor sizing

2x min. size N-stack strength, 1:4-5 drive ratio

[Figure: STFB stage schematic (L, Sx, Rx, A, B, SCD, RCD, NMOS transistor stack of width Wn) annotated with transistor widths, shown driving a wire up to 1 mm long]

TSMC 0.25 µm; widths in µm and all lengths 0.24 µm

Page 13: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: Balanced response

SCD/RCD: data-independent timing assumptions

[Figure: SCD balanced NAND (2x) with inputs S0 and S1 and output A; RCD balanced NOR (1x) with inputs R0 and R1 and output B; transistor widths annotated]

TSMC 0.25 µm; widths in µm and all lengths 0.24 µm

Page 14: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: STFB_POUT sub-cell

Yields less load on B and faster S reset

[Figure: STFB_POUT sub-cell schematic (signals S, R, B, NR) with transistor widths annotated, and the sub-cell layout. Callouts: staticizer; fights charge sharing; fast S reset; fights leakage current]

TSMC 0.25 µm; widths in µm and all lengths 0.24 µm

Page 15: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: Reset transistors

2-input NAND → less load on S

[Figure: reset circuit options. Initial idea: 3-input NAND. 1-of-2 cell: 2-input NAND + inverter. 1-of-3 cell: two 2-input NANDs. Layout of the reset transistors, reset inverter and NAND (from the STFB_XOR2 cell)]

TSMC 0.25 µm; widths in µm and all lengths 0.24 µm

Page 16: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: Direct-path current analysis

Average direct-path current is similar to that of an inverter

[Figure: direct-path current Idp through M1 and M2 for an inverter (one peak, Ipeak, as Vin moves between Vtn and VDD - Vtp) and for the STFB driver controlled by A and Sx (two peaks, Ipeak1 and Ipeak2)]

[Plot: peak current (µA, 0 to 800) versus Vdiff = VA - VSx (V, -2.5 to 2.5); curves Ipeak1, Ipeak2 and average (Ipeak1 + Ipeak2)/2]

Page 17: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems


Outline

STFB standard cell design

Backend design flow

Demonstration test chip

Conclusions

Page 18: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Standard-Cell Library Development (Ozdag ’04)

Same tools and flow as synchronous

[Flow chart: template specifications, standard cell specifications and cell specifications; symbol, schematic and functional views (Virtuoso, Emacs); simulation (Verilog, Hspice); layout (Virtuoso); LVS/DRC (Dracula/Diva); cell abstract (Envisia). The resulting asynchronous cell library holds the symbol, schematic, functional, layout and abstract views.]

Page 19: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Asynchronous ASIC Design Flow (Ozdag ’04)

Same tools and flow as synchronous

[Flow chart: design specifications; schematic (Virtuoso); simulation (Verilog, Nanosim); place & route (Silicon Ensemble); layout chip assembly (Virtuoso); LVS/DRC (Dracula/Diva); chip fabrication. The asynchronous cell library supplies the symbol, schematic, functional and abstract views.]

Page 20: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Cell Layout Example: STFB2_XOR2

Each cell comprises an entire STFB pipeline stage

[Figure: STFB2_XOR2 schematic and layout with inputs A0, A1, B0, B1 and reset, NMOS stacks driving S0 and S1, completion detectors SCD and RCD, and two STFB_POUT sub-cells driving R0 and R1]

Page 21: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems


Outline

STFB standard cell design

Backend design flow

Demonstration test chip

Conclusions

Page 22: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Prefix Adder

[Figure: 8-bit prefix adder with inputs a0…a7, b0…b7 and carry-in c-1, producing sums s0…s7 and carry-out c7; the cell array is annotated with the dimensions 3 + log2 n and 2*n + 1]

STFB2_FORK (fork stage)

STFB2_BUFFER (buffer stage)

STFB2_XOR2 (2-input xor stage)

STFB3_AB_KPG and STFB3_AB_KPG2

STFB3_KPG2_KPG and STFB3_KPG2_KPG2

STFB3_KPGC_C and STFB3_KPGC_C2

(Goldovsky’99)
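As an illustration of what the KPG cells compute, here is a minimal Python sketch of a parallel-prefix adder over kill/propagate/generate codes. It uses a generic Kogge-Stone-style scan, which is an assumption made for brevity and not necessarily the topology of the adder on the slide; the function names `kpg`, `combine` and `prefix_add` are hypothetical.

```python
K, P, G = "K", "P", "G"   # kill, propagate, generate: a 1-of-3 code per bit


def kpg(a_bit, b_bit):
    """Classify one bit position from its two operand bits."""
    if a_bit and b_bit:
        return G
    if a_bit or b_bit:
        return P
    return K


def combine(hi, lo):
    """Prefix operator: the more significant code wins unless it propagates."""
    return lo if hi == P else hi


def prefix_add(a, b, cin=0, n=8):
    """Add two n-bit integers with a log-depth prefix scan over KPG codes."""
    # Position 0 holds the carry-in (G or K); position i + 1 holds bit i.
    codes = [G if cin else K] + [kpg((a >> i) & 1, (b >> i) & 1) for i in range(n)]
    dist = 1
    while dist < len(codes):
        # One scan level: every position absorbs the group `dist` places below it.
        codes = [combine(codes[i], codes[i - dist]) if i >= dist else codes[i]
                 for i in range(len(codes))]
        dist *= 2
    # codes[i] is now the carry into bit i; codes[n] is the carry out.
    carries = [1 if c == G else 0 for c in codes]
    total = 0
    for i in range(n):
        half_sum = ((a >> i) ^ (b >> i)) & 1
        total |= (half_sum ^ carries[i]) << i
    return total, carries[n]


print(prefix_add(200, 100, cin=1))   # (45, 1): 301 = 256 + 45
```

Because the carry-in slot can only be kill or generate, every prefix resolves to a definite carry, and the final sum bit is the XOR of the two operand bits with the incoming carry, which is the role a 2-input XOR stage plays in the cell list above.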

Page 23: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

64-bit Adder Block: Silicon Ensemble P&R

Schematic (Virtuoso)

Place & Route (Silicon Ensemble)

• Floor plan: 129 rows, 70% area utilization

• Plan power: M4 and M5 power grid

• Pins and cell placement: input pins on the left (A64, B64 and C), output pins on the right (S64 and C)

• Filler cell

• Routing

Page 24: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Input Generator Block

Flexible and fast input generation

[Figure: address a0…a3 and data d0…d7 enter through a single-rail to single-track converter (12x STFB2_SRST) and 4 levels of STFB2_SPLIT cells, which load 64x 9-stage rings for A, 64x 9-stage rings for B, and one 9-stage ring for carry in (Cin)]

Page 25: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Output Sampler Block

Flexible and fast output sampler

[Figure: the 65-bit result (64 bit sum + Cout) passes through three cascaded sampling stages; each stage has 65x STFB2_SPLIT, 65x STFB2_BUCKET (BB) and a 30-stage ring, giving sampling ratios of 1:10, 1:100 and 1:1000]

Ring patterns 1000000000, 1000000000, 1000000000 = 1,10,… = 1,100,… = 1,1000,…

Ring patterns 0010000000, 0000100000, 0000000100 = 3,13,… = 43,143,… = 843,1843,…

Page 26: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Simulation Results: Loading (Nanosim)

[Figure: Nanosim waveforms for the loading phase: 3x A64, 3x B64, carry in, then Go!]

Sampler: 10x4x4 = 160

Page 27: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Simulation Results: Running (Nanosim)

[Figure: Nanosim waveforms for the running phase: Go!, Sum, Carry out; measured window of 112.9 ns]

112.9 ns / 160 = 0.706 ns; 1 / 0.706 ns = 1.4 GHz
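The throughput arithmetic on this slide can be reproduced with a few lines of Python. The function name `throughput_ghz` and its parameter names are hypothetical; the two inputs are the 112.9 ns window and the count of 160 from the slides.

```python
def throughput_ghz(window_ns, results):
    """Return (average cycle time in ns, throughput in GHz) over a window."""
    cycle_ns = window_ns / results
    return cycle_ns, 1.0 / cycle_ns      # 1 / ns is GHz


cycle_ns, ghz = throughput_ghz(window_ns=112.9, results=160)
print(f"cycle time = {cycle_ns:.3f} ns, throughput = {ghz:.1f} GHz")
# cycle time = 0.706 ns, throughput = 1.4 GHz
```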

Page 28: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Simulation Results

Conditions                  Iav     Latency   Throughput
TT, 25 °C, 2.5 V, 3.3 V     2.9 A   2.1 ns    1.4 GHz
SS, 120 °C, 2.2 V, 3.0 V    1.6 A   3.3 ns    890 MHz
FF, 0 °C, 2.7 V, 3.6 V      4.2 A   1.6 ns    1.9 GHz
SF, 25 °C, 2.5 V, 3.3 V     2.9 A   2.2 ns    1.4 GHz
FS, 25 °C, 2.5 V, 3.3 V     2.9 A   2.2 ns    1.4 GHz

Page 29: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Demonstration chip: Top layout

Block             Width     Area       Transistors   Current @ 1.4 GHz
INPUTGEN129BY9    801 µm    1.36 mm²   105k          1.3 A
ADDER64           663 µm    1.13 mm²   89k           1.3 A
SAMPLER65BY1000   499 µm    0.85 mm²   62k           0.3 A
Total             1963 µm   3.3 mm²    257k          2.9 A

Block height: 1700 µm

TSMC 0.25 µm, MOSIS Mar/22/04

[Figure: full die containing the QDI Sequential Decoder (Session VI, 10:30am, Thu, Apr/22) and the STFB 64-bit Adder; 3733 µm x 5483 µm, 20.5 mm², 132 pins]

~6 man-months for the library, ~6 man-months for the design
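The dimensions on this slide are mutually consistent, which a short Python check makes explicit. The pairing of block names with widths follows the order in which they appear on the slide and is an assumption; the helper name `area_mm2` is hypothetical.

```python
HEIGHT_UM = 1700                       # shared block height from the slide

# Block name -> (width in um, area quoted on the slide in mm^2).
# The name-to-width pairing follows slide order and is an assumption.
blocks = {
    "INPUTGEN129BY9": (801, 1.36),
    "ADDER64": (663, 1.13),
    "SAMPLER65BY1000": (499, 0.85),
}


def area_mm2(width_um, height_um):
    """Rectangle area in mm^2 from dimensions in micrometres."""
    return width_um * height_um / 1e6


for name, (width_um, quoted) in blocks.items():
    print(f"{name}: {area_mm2(width_um, HEIGHT_UM):.2f} mm^2 (slide: {quoted})")

total_width_um = sum(width for width, _ in blocks.values())
print(f"total: {total_width_um} um wide, {area_mm2(total_width_um, HEIGHT_UM):.1f} mm^2")
print(f"die: {area_mm2(3733, 5483):.1f} mm^2")
```

The three widths add up to the 1963 µm total, and each width multiplied by the 1700 µm height reproduces the quoted area to two decimal places.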

Page 30: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Summary and Conclusions

• Performance: STFB 2-D pipelining yields ultra-high performance

• Design time: Back-end flow achieves ASIC design time

• Availability: Cell library has been made freely available

• Future work: Characterize and extend the library; static timing analysis and sign-off

Page 31: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Efharisto! (Thank you!)

Page 32: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

STFB Standard-Cell Design: Dynamic worst-case direct-path current analysis (STFB buffer pipeline at 2 GHz)

Non-overlap drive = less direct-path current than an inverter

[Figure: four STFB buffer stages (L, Sx, A, RCD, R) connected by 1 mm wires]

TSMC 0.25 µm; widths in µm and all lengths 0.24 µm

Page 33: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Input Generator Block: 9-stage ring

[Figure: 9-stage ring with signals in, go and out, loaded with tokens and producing the sequence 1,0,0,1,0,0…]

BG: STFB2_BITGEN (bit generator)

STFB2_MERGENC (non-conditional merge stage)

STFB2_FORK (fork stage)

STFB2_BUFFER (buffer stage)

STFB2_XOR2 (2-input xor stage)

Page 34: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Eτ² Comparison: STFB vs. WCHB

Parameter                 WCHB    STFB
Number of stages          10      9
Number of tokens          2       3
Library                   TT      TT
Vdd (V)                   2.5     2.5
Temp. (°C)                25      25
Throughput (GHz)          1.00    2.00
Stage cycle time (ns)     1.00    0.50
Average current (mA)      2.7     13.0
Current per token (mA)    1.35    4.32
Eτ² metric                3.4     1.3

STFB buffer is ~3x more efficient than WCHB buffer
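The metric column can be approximately reproduced from the other columns. The sketch below assumes that the metric is energy per token times cycle time squared (commonly written Eτ²), with energy taken as current per token × Vdd × cycle time; that formula is an assumption, not stated on the slide, although it lands within rounding of the tabulated values. The function name `e_tau2` is hypothetical.

```python
def e_tau2(avg_current_ma, tokens, vdd_v, throughput_ghz):
    """Return (current per token in mA, E*tau^2 in pJ*ns^2) under the assumed formula."""
    tau_ns = 1.0 / throughput_ghz                         # stage cycle time
    current_per_token_ma = avg_current_ma / tokens
    energy_pj = current_per_token_ma * vdd_v * tau_ns     # mA * V * ns = pJ
    return current_per_token_ma, energy_pj * tau_ns ** 2


wchb = e_tau2(avg_current_ma=2.7, tokens=2, vdd_v=2.5, throughput_ghz=1.00)
stfb = e_tau2(avg_current_ma=13.0, tokens=3, vdd_v=2.5, throughput_ghz=2.00)

print(f"WCHB: {wchb[0]:.2f} mA per token, metric {wchb[1]:.2f}")
print(f"STFB: {stfb[0]:.2f} mA per token, metric {stfb[1]:.2f}")
print(f"ratio WCHB/STFB: {wchb[1] / stfb[1]:.1f}")
```

With the rounded inputs from the table this gives a ratio of about 2.5, in line with the "~3x" claim; the small differences from the tabulated 4.32 mA and 1.3 presumably come from rounding of the average current.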

Page 35: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Demonstration chip: Top layout (pins)

[Figure: same top layout and block data as on Page 29: INPUTGEN129BY9, ADDER64 and SAMPLER65BY1000; 1700 µm x 1963 µm, 3.3 mm², 257k transistors, 2.9 A @ 1.4 GHz]

TSMC 0.25 µm, MOSIS Mar/22/04

7 Vdd and 7 Gnd pins

12 In/Out, 8 Input and 3 pad supply pins

7 Vdd and 7 Gnd pins

Total: 51 pins

Page 36: Marcos Ferretti, Recep O. Ozdag, Peter A. Beerel Department of Electrical Engineering Systems

Test chip design: Top chip layout

TSMC 0.25 µm, MOSIS Mar/22/04

[Figure: full die containing the QDI Sequential Decoder (Session VI, 10:30am, Thu) and the STFB 64-bit Adder; 3733 µm x 5483 µm, 20.5 mm², 132 pins]

