Observability Conditions and Automatic Operand-Isolation in High-Throughput Asynchronous Pipelines
Arash Saifhashemi
Peter A. Beerel
University of Southern California
USC Asynchronous CAD/VLSI Group (async.usc.edu)
(Thanks to a grant from Intel and NSF)
Patmos 2012, Sep 2012, Newcastle upon Tyne
Asynchronous Circuit Design - Today's Applications
• 3D networks-on-chip (STMicroelectronics)
• Ethernet switches (Intel SRD)
• Ultra-high-speed FPGAs (Achronix)
• Process variation
• Low-power chip design (Encryption – Tiempo, …)
Basic challenges: Automation
Proteus design flow (USC)
• Uses commercial synchronous CAD tools
• Starts from a high-level specification written in SVC (SystemVerilogCSP)
Fulcrum Microsystems Ethernet switch chip (up to 72 10G ports, 40G)
- 1.2 B transistors, 90% asynchronous, 13% Proteus
Tiempo TAM16 - Clockless 16-bit microcontroller
STMicroelectronics WIOMING 3D-IC (July 2012)
Achronix FPGA. 1.7 M LUTs. 2.1 Gbps IO
The Proteus Flow
[Figure: the Proteus flow. Recoverable labels: Design Goals, SystemVerilog, SVC2RTL, Synth. RTL, Synthesis, Sync Library, Constraints, Image Netlist, ClockFree, Clock Gating, Async Netlist, Proteus/Sync Library, Clock Tree Synthesis, Physical Design, Final Layout.]
Key Features
• Re-uses synchronous EDA tools
• Seamless integration into existing flows
• Up to 2X higher performance
Tool Status
• Started at USC Async CAD/VLSI
• Commercialized by TimeLess (2008)
• Acquired by Fulcrum (2010)
• Intel acquired Fulcrum (2011)
• Used in Intel Ethernet Alta FM6000 chip
The Problem
• Limited and manual power optimization
Conditional Communication in Proteus
[Figure: conditional communication cells with enable values 0 and 1. Labels: "Not received", "Dummy value", "Not sent".]
Example: ALU
SVC Description
No conditionality in high-level description
Reconverging fanouts
Unnecessary calculation
Adding Isolation Cells
• All inputs/outputs are unconditional
• Operand isolation
• AND-based isolation cells
• Generated by the synchronous RTL synthesizer
• Does not prevent switching in asynchronous circuits
Isolation cells are not effective in asynchronous circuits
Three-valued logic
• Formal justification of conditioning
• Three-valued logic image model
• Each iteration is modeled by a clock cycle
• Each variable can be 0, 1, or N (no token)
Status of each channel
One iteration
3VL Unconditional Functions
Unconditional functions
• Can be represented using only the three-valued AND, OR, and NOT operators
• Example: functions represented by combinational gates in a typical cell library: NAND, NOR, AOI, XOR, …
Lemma 1: the output is N iff at least one of the inputs is N.
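A minimal sketch of how Lemma 1 can be read, assuming an unconditional gate is an ordinary Boolean gate lifted to {0, 1, N}. The names `N`, `lift`, `nand3`, and `xor3` are hypothetical and only illustrate the model.

```python
from itertools import product

N = "N"  # hypothetical marker for "no token" in one iteration


def lift(fn):
    """Lift a Boolean function to an unconditional three-valued function.

    Assumption of this sketch: the lifted gate yields N whenever any
    input is N, and otherwise evaluates the Boolean function.
    """
    def lifted(*inputs):
        if any(v == N for v in inputs):
            return N
        return fn(*inputs)
    return lifted


nand3 = lift(lambda a, b: 1 - (a & b))
xor3 = lift(lambda a, b: a ^ b)


def satisfies_lemma1(gate, arity):
    """Check: output is N iff at least one input is N."""
    return all(
        (gate(*xs) == N) == any(x == N for x in xs)
        for xs in product((0, 1, N), repeat=arity)
    )
```

Any gate built this way satisfies the lemma by construction; the exhaustive check is just a sanity test over all 3^n input combinations.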
SEND/RECEIVE Operators
• Conditional communication
• RECEIVE and SEND are modeled as the Ⓡ and Ⓢ operators
• Behave like buffers when e=1
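One possible reading of the two operators in the three-valued model, as a sketch. The slides state only that both behave like buffers when the enable is 1, that a disabled SEND does not send, and that a disabled RECEIVE does not receive and produces a dummy value. The choice of 0 as the dummy value and the handling of an N enable are assumptions of this sketch.

```python
N = "N"    # hypothetical marker for "no token"
DUMMY = 0  # assumed dummy value produced by a disabled RECEIVE


def send(x, e):
    """x SEND e: forward x when e == 1, send nothing (N) when e == 0."""
    if e == N:
        return N  # assumption: no enable token means no output token
    return x if e == 1 else N


def receive(x, e):
    """x RECEIVE e: forward x when e == 1, emit a dummy value when e == 0."""
    if e == N:
        return N  # assumption: no enable token means no output token
    return x if e == 1 else DUMMY
```

With these definitions, `receive(send(x, e), e)` equals `x` when `e == 1` and the dummy value when `e == 0`, which is the pairing used when a SEND is followed by a RECEIVE.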
SEND Reconditioning
Assuming y=f(x) is unconditional and e ∉ TFO(y), the transitive fanout of y
Lemma 2:
Application: SEND cells can be moved through logic
• Similar to retiming in synchronous circuits
Less switching when e=0
Fewer SENDs
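A small exhaustive check of what moving a SEND through logic could mean. The sketch assumes the identity f(x1, …, xn) Ⓢ e = f(x1 Ⓢ e, …, xn Ⓢ e) for an unconditional f. This form, the SEND semantics, and the helper names are assumptions of the sketch, not a quotation of Lemma 2.

```python
from itertools import product

N = "N"  # hypothetical marker for "no token"


def send(x, e):
    """Assumed SEND semantics: buffer when e == 1, no token otherwise."""
    return x if e == 1 else N


def and3(a, b):
    """Unconditional AND: N if any input is N (Lemma 1)."""
    return N if N in (a, b) else a & b


def xor3(a, b):
    """Unconditional XOR: N if any input is N (Lemma 1)."""
    return N if N in (a, b) else a ^ b


def send_can_move(f, arity):
    """True if one SEND at the output equals SENDs on every input."""
    return all(
        send(f(*xs), e) == f(*(send(x, e) for x in xs))
        for e in (0, 1, N)
        for xs in product((0, 1, N), repeat=arity)
    )
```

Under these assumptions the two placements are equivalent: SENDs on the inputs keep the gate from seeing tokens when e=0, while a single SEND on the output uses fewer cells.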
Observability in 3V Networks
Local Observability Partial Care (LOPC)
• OPC(f,C,xj) of input xj of a node representing a function f is the condition under which f's output is not affected as xj changes in C ⊆ {0,1,N}
Global Observability Partial Care (GOPC)
• GOPC(C,x) of a variable x is the condition under which the value of no primary output is affected as the value of x changes in C ⊆ {0,1,N}
• Example: $OPC(\mathit{Mux}, \{0,1\}, i_1) = s^{\{1\}}\, i_2^{\{0,1\}}$
• Changes of i1 in {0,1} are not observable when s = 1 and i2 = 0 or i2 = 1
• $OPC(f, C, x) \rightarrow GOPC(C, x)$
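The mux example can be reproduced by brute force. The sketch below assumes a 2-input mux in which s=0 selects i1 and s=1 selects i2 (consistent with i1 being unobservable when s=1), and it enumerates only valid-token values of s and i2. The function names are hypothetical.

```python
from itertools import product


def mux(s, i1, i2):
    """Assumed mux polarity: s == 0 selects i1, s == 1 selects i2."""
    return i1 if s == 0 else i2


def opc_mux_i1():
    """List (s, i2) pairs under which changing i1 in {0, 1} is not observable."""
    return [
        (s, i2)
        for s, i2 in product((0, 1), repeat=2)
        if mux(s, 0, i2) == mux(s, 1, i2)
    ]
```

The pairs returned are exactly those with s = 1 and i2 in {0, 1}, matching the condition stated in the example.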
GOPC Conditioning
When xj is not observable…
• Add a SEND followed by a RECEIVE
• Move the SENDs using SEND reconditioning
Lemma 3: If $e^{\{0\}} \rightarrow GOPC(\{0,1\}, x_1)$, then $f(\mathbf{x}) = (f(\mathbf{x}) \mathbin{Ⓢ} e) \mathbin{Ⓡ} e$
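[Figure: the example circuit after Conditioning and SEND Reconditioning, annotated with token values 0, 1, and N. Labels: "Conditioning", "SEND Reconditioning", "No Activity".]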
Inserting Isolating Nodes and Recognizing Enable Domains
Synchronous synthesis tools can insert isolating nodes
• Constrained to insert isolating nodes only on non-critical paths
Node u is in e's Enable Domain OIED(e) if
• All paths starting from a primary input and ending at u include an isolating node controlled by e
• Detected using a depth-first search (DFS)
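The enable-domain definition above can be sketched as a DFS that starts at the primary inputs and refuses to pass through isolating nodes controlled by e: anything it still reaches has an unguarded path, and everything else is in OIED(e). The netlist encoding (`fanout`, `isolating`) and the node names are hypothetical, and this is one way to realize the search rather than the tool's actual implementation.

```python
def enable_domain(fanout, primary_inputs, isolating, e):
    """Return the nodes whose every path from a primary input crosses
    an isolating node controlled by e.

    fanout:    dict mapping node -> list of successor nodes
    isolating: dict mapping isolating node -> its enable signal
    """
    unguarded = set()  # nodes reachable without crossing an e-isolator
    stack = list(primary_inputs)
    while stack:
        u = stack.pop()
        if u in unguarded or isolating.get(u) == e:
            continue  # already visited, or blocked by an e-isolator
        unguarded.add(u)
        stack.extend(fanout.get(u, []))
    all_nodes = set(fanout) | {v for vs in fanout.values() for v in vs}
    return all_nodes - unguarded


# Hypothetical ALU fragment: MUL is isolated by e, ADD is not.
fanout = {
    "a": ["iso_a", "add"], "b": ["iso_b", "add"],
    "e": ["iso_a", "iso_b", "mux"],
    "iso_a": ["mul"], "iso_b": ["mul"],
    "mul": ["mux"], "add": ["mux"],
}
isolating = {"iso_a": "e", "iso_b": "e"}
domain = enable_domain(fanout, ["a", "b", "e"], isolating, "e")
```

Here `domain` contains the two isolating nodes and `mul`. The `mux` is excluded because it is also reachable through the unguarded `add` path.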
Pre-layout Analysis
• Wu: power of receiving data on all inputs and sending the output (unconditional nodes)
• K: power of conditional nodes
• rf: activity factor
Total power
Power of each domain
Domain power after isolation (n inputs)
Benefit of isolating each domain
Post-layout Experimental Results
• Case study: 32-bit ALU placed and routed
• Back-annotated switching activity using a VCD file
• Results:
• Isolating ADD and SUB is detrimental for rADD and rSUB > 0.2
• 53% power reduction when only isolating MUL (rf=0.25)
• Area cost of isolating MUL is about 4%, with no performance penalty
Conclusions and Future Work
Conditional communication in async. circuits is not free
• Creates area and performance overheads
• Requires manual or automatic optimization
Asynchronous circuits can/should leverage sync. tools
• This paper is the first to use 3-valued logic and observability don’t cares for power optimization of asynchronous circuits
Our future work
• Evaluate the proposed method on bigger designs
• Adopt other sync. power optimization techniques such as clock gating
• Optimize the location of SEND/RECEIVE nodes (reconditioning)