Date posted: 21-Dec-2015
Bridging the gap between asynchronous design and designers
Thanks to Jordi Cortadella, Luciano Lavagno, Mike Kishinevsky and many others

Outline
1. Basic concepts on asynchronous circuit design
2. Logic synthesis from concurrent specifications
3. Design automation for asynchronous circuits

Basic concepts on asynchronous circuit design

Outline
What is an asynchronous circuit?
Asynchronous communication
Asynchronous design styles (Micropipelines)
Asynchronous logic building blocks
Control specification and implementation
Delay models and classes of async circuits
Why asynchronous circuits?

Synchronous circuit
[Figure: four registers (R) separated by three combinational logic (CL) blocks, all driven by a common CLK]
Implicit (global) synchronization between blocks
Clock period > Max Delay (CL + R)
Time is an independent physical variable (quantity)
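As a rough worked example of the clock-period condition on this slide, the sketch below takes the slowest combinational block plus the register overhead as the lower bound on the period. The function name, the single shared register overhead and the delay values are illustrative assumptions, not figures from the slides.

```python
def min_clock_period(cl_delays_ns, register_delay_ns):
    """Lower bound on the clock period: slowest CL block plus register overhead.

    Assumption: every stage has the same register overhead, so only the
    slowest CL block decides the period.
    """
    return max(cl_delays_ns) + register_delay_ns


# Hypothetical three-stage pipeline, delays in ns.
stage_delays = [2.0, 3.5, 1.0]
period = min_clock_period(stage_delays, register_delay_ns=0.5)
print(period)  # 4.0
```

Every stage is clocked at this worst-case period, including the stage that needs only 1.0 ns.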

Asynchronous circuit
[Figure: four registers (R) separated by three combinational logic (CL) blocks, linked by Req and Ack signals]
Explicit (local) synchronization: Req / Ack handshakes
Time = events + quantity
Time does not exist if nothing happens (Aristotle)

Motivation for asynchronous
Asynchronous design is often unavoidable:
  Asynchronous interfaces, arbiters etc.
Modern clocking is multi-phase and distributed, and virtually ‘asynchronous’ (cf. GALS, next slide):
  Mesochronous (clock travels together with data)
  Local (possibly stretchable) clock generation
Robust asynchronous design flow is coming (e.g. VLSI programming from Philips, NCL from Theseus Logic, fine-grain pipelining from Fulcrum)

Globally Async Locally Sync (GALS)
[Figure: a Clocked Domain (registers R, combinational logic CL, Local CLK) inside an Async-to-sync Wrapper, connected to the Asynchronous World through the handshake pairs Req1/Ack1, Req2/Ack2, Req3/Ack3 and Req4/Ack4]

Key Design Differences
Synchronous logic design:
  Proceeds without taking timing correctness (hazards, signal ack-ing etc.) into account
  Combinational logic and memory latches (registers) are built separately
  Static timing analysis of CL is sufficient to determine the Max Delay (clock period)
  Fixed set-up and hold conditions for latches

Key Design Differences
Asynchronous logic design:
  Must ensure hazard-freedom, signal ack-ing, local timing constraints
  Combinational logic and memory latches (registers) are often mixed in “complex gates”
  Dynamic timing analysis of logic is needed to determine relative delays between paths
To avoid complex issues, circuits may be built as Delay-insensitive and/or Speed-independent (Muller’s theory vs Huffman asynchronous automata)

Verification and Testing Differences
Synchronous logic verification and testing:
  Only functional correctness aspect is verified and tested
  Testing can be done with standard ATE and at low speed
Asynchronous logic verification and testing:
  In addition to functional correctness, temporal aspect is crucial: e.g. causality and order, deadlock-freedom
  Testing must cover faults in complex gates (logic+memory) and must proceed at normal operation rate
  Delay fault testing may be needed

Synchronous communication
Clock edges determine the time instants where data must be sampled
Data wires may glitch between clock edges (set-up/hold times must be satisfied)
Data are transmitted at a fixed rate (clock frequency)
[Figure: clocked data waveform carrying the bit sequence 1 1 0 0 1 0]

Dual rail
Two wires with L (low) and H (high) per bit
  “LL” = “spacer”, “LH” = “0”, “HL” = “1”
n-bit data communication requires 2n wires
Each bit is self-timed
Other delay-insensitive codes exist (e.g. k-of-n) and event-based signalling (choice criteria: pin and power efficiency)
[Figure: dual-rail waveforms of a bit sequence; the 1s appear on one rail and the 0s on the other]
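The dual-rail code on this slide can be sketched as a small encoder and decoder. The codeword table is taken from the slide; the function names, the string representation of the wire pair and the treatment of "HH" as an illegal state are assumptions made for illustration.

```python
SPACER = "LL"
CODE = {0: "LH", 1: "HL"}                 # from the slide: "LH" = 0, "HL" = 1
DECODE = {pair: bit for bit, pair in CODE.items()}


def encode(bits):
    """Return the successive states of one wire pair for a bit stream.

    Every codeword is followed by a spacer, so two equal consecutive bits
    remain distinguishable without a clock.
    """
    states = [SPACER]
    for bit in bits:
        states += [CODE[bit], SPACER]
    return states


def decode(states):
    """Recover the bits: each non-spacer state carries one self-timed bit."""
    bits = []
    for state in states:
        if state == SPACER:
            continue
        if state not in DECODE:           # "HH" is treated as illegal (assumption)
            raise ValueError(f"illegal dual-rail state {state!r}")
        bits.append(DECODE[state])
    return bits


def wires_needed(n_bits):
    """Two wires per bit."""
    return 2 * n_bits


stream = encode([1, 1, 0, 0, 1, 0])
print(stream[:5])        # ['LL', 'HL', 'LL', 'HL', 'LL']
print(decode(stream))    # [1, 1, 0, 0, 1, 0]
print(wires_needed(32))  # 64
```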

Bundled data
Validity signal
  Similar to an aperiodic local clock
n-bit data communication requires n+1 wires
Data wires may glitch when not valid
Signaling protocols
  level sensitive (latch)
  transition sensitive (register): 2-phase / 4-phase
[Figure: bundled-data waveform carrying the bit sequence 1 1 0 0 1 0]

Example: memory read cycle
Transition signaling, 4-phase
[Figure: timing diagram of the Address and Data buses with their validity signals A and D; marked intervals: Valid address, Valid data]

Example: memory read cycle
Transition signaling, 2-phase
[Figure: timing diagram of the Address and Data buses with their validity signals A and D; marked intervals: Valid address, Valid data]
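The difference between the two memory-read protocols can be sketched by listing the signal events each one produces. This is a minimal model that assumes A acts as the request (address valid) and D as the acknowledge (data valid); the function names and the "+"/"-" event notation for rising and falling edges are illustrative.

```python
def four_phase(n_reads):
    """4-phase (return-to-zero): each read is A+ D+ A- D-."""
    events = []
    for _ in range(n_reads):
        events += ["A+", "D+", "A-", "D-"]
    return events


def two_phase(n_reads):
    """2-phase: every transition, rising or falling, signals one event."""
    events = []
    a = d = 0
    for _ in range(n_reads):
        a ^= 1                              # toggle the request
        events.append("A+" if a else "A-")
        d ^= 1                              # toggle the acknowledge
        events.append("D+" if d else "D-")
    return events


print(four_phase(1))  # ['A+', 'D+', 'A-', 'D-']
print(two_phase(2))   # ['A+', 'D+', 'A-', 'D-']
```

The same four edges carry one read in the 4-phase protocol and two reads in the 2-phase protocol, because the 2-phase version has no return-to-zero half.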

Asynchronous modules
Signaling protocol:
  reqin+ start+ [computation] done+ reqout+ ackout+ ackin+
  reqin- start- [reset] done- reqout- ackout- ackin-
(more concurrency is also possible)
[Figure: module with a DATAPATH (Data IN, Data OUT) and a CONTROL block (req in, ack in, req out, ack out), linked by the start and done signals]

Asynchronous latches: C element
[Figure: C element symbol with inputs A, B and output Z]
A B Z+
0 0 0
0 1 Z
1 0 Z
1 1 1
[Figure: Static Logic Implementation, a transistor-level schematic between Vdd and Gnd with inputs A, B and output Z [van Berkel 91]]
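The truth table above can be sketched as a next-state function: the output follows the inputs when they agree and holds its value otherwise. The function names and the input sequence are illustrative; the sum-of-products form is the equation z = ab + zb + za that appears later in the slides.

```python
from itertools import product


def c_element(a, b, z):
    """Next output Z+ of a C element whose current output is z."""
    return a if a == b else z


def c_element_sop(a, b, z):
    """The same function written as a sum of products: ab + zb + za."""
    return (a & b) | (z & b) | (z & a)


# Both forms agree on all eight input/state combinations.
for a, b, z in product((0, 1), repeat=3):
    assert c_element(a, b, z) == c_element_sop(a, b, z)

# Walk through an input sequence starting from Z = 0.
z = 0
history = []
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    z = c_element(a, b, z)
    history.append(z)
print(history)  # [0, 1, 1, 0]
```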

C-element: Other implementations
[Figure: transistor-level schematics (inputs A, B, output Z, between Vdd and Gnd) of the Quasi-Static and Dynamic implementations; one of them uses a weak inverter]

Dual-rail logic
[Figure: Dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
Valid behavior for monotonic environment
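A dual-rail AND gate can be sketched with one AND for the true rail and one OR for the false rail. This is a common minimal form and an assumption here, since the slide's gate-level drawing did not survive extraction; the (t, f) tuple representation and the names are illustrative.

```python
SPACER = (0, 0)                  # both rails low
TRUE, FALSE = (1, 0), (0, 1)     # (t rail, f rail)


def dual_rail_and(a, b):
    """Return (C.t, C.f) for inputs a = (A.t, A.f) and b = (B.t, B.f)."""
    a_t, a_f = a
    b_t, b_f = b
    c_t = a_t & b_t              # C is 1 only when both inputs are 1
    c_f = a_f | b_f              # C is 0 as soon as either input is 0
    return (c_t, c_f)


print(dual_rail_and(SPACER, SPACER))  # (0, 0)
print(dual_rail_and(TRUE, TRUE))      # (1, 0)
print(dual_rail_and(TRUE, FALSE))     # (0, 1)
print(dual_rail_and(FALSE, SPACER))   # (0, 1)
```

Because AND and OR are monotonic, inputs that only move from spacer to valid (or only from valid to spacer) cannot make an output rail glitch. The last call shows that this form can output 0 before the second input has arrived.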

Completion detection
[Figure: Dual-rail logic block whose outputs feed a Completion detection tree ending in a C element that produces done]
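Completion detection can be sketched as follows: a bit is valid when one of its two rails is high, and done changes only when all bits are valid or all bits are back at the spacer. The sketch collapses the detection tree into a single n-input C element, which is a simplifying assumption; the class and function names are illustrative.

```python
class CElement:
    """n-input C element: output goes to 1 when all inputs are 1,
    to 0 when all inputs are 0, and otherwise keeps its value."""

    def __init__(self):
        self.z = 0

    def update(self, inputs):
        if all(inputs):
            self.z = 1
        elif not any(inputs):
            self.z = 0
        return self.z


def done(word, c):
    """word is a list of (t, f) rail pairs, one pair per bit."""
    valid = [t | f for t, f in word]   # per-bit validity: one rail is high
    return c.update(valid)


c = CElement()
trace = [done(w, c) for w in [
    [(0, 0), (0, 0)],   # all spacers
    [(1, 0), (0, 0)],   # first bit has arrived
    [(1, 0), (0, 1)],   # both bits valid
    [(0, 0), (0, 1)],   # first bit back to spacer
    [(0, 0), (0, 0)],   # all spacers again
]]
print(trace)  # [0, 0, 1, 1, 0]
```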

Differential cascode voltage switch logic
[Figure: 3-input AND/NAND gate built as an N-type transistor network, with inputs A.t, B.t, C.t and A.f, B.f, C.f, outputs Z.t and Z.f, and the signals start and done]

Examples of dual-rail design
Asynchronous dual-rail ripple-carry adder (A. Martin, 1991)
  Critical delay is proportional to log N (N = number of bits)
  32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous
  Async cell transistor count = 34 versus synchronous = 28
More recent success stories (modularity and automatic synthesis) of dual-rail logic come from Null Convention Logic (NCL) from Theseus Logic

Bundled-data logic blocks
[Figure: Single-rail logic block with a delay element between start and done]
Conventional logic + matched delay

Micropipelines (Sutherland 89)
[Figure: Micropipeline (2-phase) control blocks: Join (C element), Merge, Toggle, Select, Call, and Request-Grant-Done (RGD) Arbiter]

Micropipelines (Sutherland 89)
[Figure: pipeline of latches (L) separated by logic blocks; the control is a chain of C elements and delay elements between the Rin / Aout and Rout / Ain handshake signals]
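The C-element control chain of a micropipeline can be sketched as below. The model assumes that each C element takes the request from the stage on its left and the inverted acknowledge from the stage on its right, with 2-phase signaling; the latches, logic blocks and matched delays are left out, and all names are illustrative.

```python
def c_element(a, b, z):
    """Follow the inputs when they agree, otherwise hold z."""
    return a if a == b else z


def settle(stages, rin, ain):
    """Re-evaluate the C elements until no output changes.

    stages[i] is the output of the i-th C element. Its inputs are the
    request from the left (rin for the first stage) and the inverted
    acknowledge from the right (ain for the last stage).
    """
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = rin if i == 0 else stages[i - 1]
            right = ain if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


s = settle([0, 0, 0], rin=1, ain=0)   # first event ripples to the output
print(s)                              # [1, 1, 1]
s = settle(s, rin=0, ain=0)           # second event stops behind the first
print(s)                              # [0, 0, 1]
s = settle(s, rin=1, ain=0)           # third event fills the pipeline
print(s)                              # [1, 0, 1]
print(settle(s, rin=0, ain=0))        # [1, 0, 1]  (full: request not accepted)
print(settle(s, rin=0, ain=1))        # [0, 1, 0]  (output acknowledged, events shift)
```

In this model the first stage output doubles as Aout and the last one as Rout, so the chain behaves as a three-place FIFO of events.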

Data-path / Control
[Figure: pipeline of latches (L) separated by logic blocks, with a single CONTROL block between Rin / Aout and Rout / Ain]
Synthesis of control is a major challenge

Control specification
[Figure: waveforms of A (input) and B (output) with the transitions A+, B+, A-, B-]
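A control specification of this kind can be sketched as a token game on a graph of signal transitions. The sketch assumes that the four transitions form the cycle A+, B+, A-, B- and that the initial token sits on the arc from B- to A+; both are a reading of the slide rather than something it states, and the names are illustrative.

```python
# Arcs of the specification: (source transition, target transition).
ARCS = [("A+", "B+"), ("B+", "A-"), ("A-", "B-"), ("B-", "A+")]


def enabled(marking):
    """A transition is enabled when every incoming arc holds a token."""
    transitions = {t for arc in ARCS for t in arc}
    return sorted(
        t for t in transitions
        if all(marking[arc] > 0 for arc in ARCS if arc[1] == t)
    )


def fire(marking, t):
    """Consume a token on each incoming arc, produce one on each outgoing arc."""
    new = dict(marking)
    for arc in ARCS:
        if arc[1] == t:
            new[arc] -= 1
        if arc[0] == t:
            new[arc] += 1
    return new


def trace(marking, steps):
    """Fire the first enabled transition, `steps` times."""
    fired = []
    for _ in range(steps):
        t = enabled(marking)[0]
        fired.append(t)
        marking = fire(marking, t)
    return fired


initial = {arc: 0 for arc in ARCS}
initial[("B-", "A+")] = 1          # assumed initial marking
print(trace(initial, 5))           # ['A+', 'B+', 'A-', 'B-', 'A+']
```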

Control specification
[Figure: circuit with signals A and B, with the transitions A+, B-, A-, B+]

Control specification
[Figure: specification over the signals A, B and C, with the transitions A+, A-, B+, B-, C+, C-]

Control specification
[Figure: specification over the signals A, B and C, with the transitions A+, A-, B+, B-, C+, C-]

Control specification
[Figure: FIFO controller (FIFO cntrl) with ports Ri, Ao, Ro, Ai, a C-element based circuit, and its specification over the transitions Ri+, Ao+, Ri-, Ao-, Ro+, Ai+, Ro-, Ai-]

Gate vs wire delay models
Gate delay model: delays in gates, no delays in wires
Wire delay model: delays in gates and wires

Delay models for async. circuits
Bounded delays (BD): realistic for gates and wires.
  Technology mapping is easy, verification is difficult
Speed independent (SI): Unbounded (pessimistic) delays for gates and “negligible” (optimistic) delays for wires.
  Technology mapping is more difficult, verification is easy
Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires.
  DI class (built out of basic gates) is almost empty
Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks).
  In practice it is the same as speed independent
[Figure: diagram relating the circuit classes BD, SI / QDI and DI]

Environment models
Slow enough environment = Fundamental mode
  (Inputs change AFTER system has settled)
Reactive environment = I/O mode
  (Inputs may change once the first output changes)

Correctness of a circuit wrt delay assumptions
C-element: z = ab + zb + za
[Figure: two drawings of the circuit with inputs a, b and output z]

Motivation (designer’s view)
Modularity for system-on-chip design
  Plug-and-play interconnectivity
Average-case performance
  No worst-case delay synchronization
Many interfaces are asynchronous
  Buses, networks, ...

Motivation (technology aspects)
Low power
  Automatic clock gating
Electromagnetic compatibility
  No peak currents around clock edges
Security
  No ‘electro-magnetic difference’ between logical ‘0’ and ‘1’ in dual rail code
Robustness
  High immunity to technology and environment variations (temperature, power supply, ...)

Resistance
Concurrent models for specification
  CSP, Petri nets, ...: no more FSMs
Difficult to design
  Hazards, synchronization
Complex timing analysis
  Difficult to estimate performance
Difficult to test
  No way to stop the clock

But ... some successful stories
Philips
AMULET microprocessors
Sharp
Intel (RAPPID)
Start-up companies:
  Theseus Logic, Fulcrum, Self-Timed Solutions
Recent blurb: It's Time for Clockless Chips, by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: http://www.technologyreview.com/magazine/oct01/tristram.asp)