6/8/2018
ECE4740: Digital VLSI Design
Lecture 22: Tree adders
Carry-skip and carry-select adders
Recap
Carry-skip adder principle
• If BP = P0·P1·P2·P3 = 1, then Co,3 = Ci,0; otherwise the block itself generates (or kills) the carry internally
[Figure: 4-bit carry-skip block. Four full adders (FA) with inputs A0 B0 to A3 B3 and sums S0 to S3; carry-in Ci,0, carry-out Co,3. BP = P0·P1·P2·P3 is the "block propagate" signal.]
Why is this the critical path?
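The skip principle can be checked with a small bit-level model. The sketch below is only an illustration: the function names (`carry_skip_block`, `carry_skip_add`) and the 16-bit width with 4-bit blocks are assumptions, not taken from the slides. Note that the bypass does not change the logical result (when BP = 1 the ripple chain also delivers Ci,0); it only shortens the path the carry has to travel.

```python
def to_bits(value, width):
    """Integer -> list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block: a ripple chain plus a bypass on the carry."""
    carry = cin
    block_propagate = 1
    sums = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                    # bit propagate
        g = a & b                    # bit generate
        sums.append(p ^ carry)       # sum bit
        carry = g | (p & carry)      # ripple carry
        block_propagate &= p         # BP = P0 & P1 & P2 & P3
    # Bypass: if BP = 1, the carry-out is simply the carry-in.
    cout = cin if block_propagate else carry
    return sums, cout, block_propagate


def carry_skip_add(a, b, width=16, block=4, cin=0):
    """Chain carry-skip blocks (hypothetical helper, equal block sizes)."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    sums, carry = [], cin
    for lo in range(0, width, block):
        s, carry, _ = carry_skip_block(a_bits[lo:lo + block],
                                       b_bits[lo:lo + block], carry)
        sums.extend(s)
    return from_bits(sums), carry


print(carry_skip_add(0x00FF, 0x0001))  # (256, 0)
```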
N-bit carry skip adder
• Set the block size to B = sqrt(N/2)
• Delay then grows only with sqrt(N)
[Figure: 16-bit carry-skip adder built from four blocks (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15), each with setup, carry propagation, and sum stages; carry-in Ci,0. Annotation: no direct path to carry out.]
Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
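The choice B = sqrt(N/2) can be made plausible numerically. The sketch below assumes the usual first-order textbook delay model for a carry-skip adder with equal blocks (setup, ripple through the first block, one bypass per remaining block, ripple through the last block, sum) and hypothetical unit delays; neither the model nor the numbers come from the slides.

```python
import math


def carry_skip_delay(n_bits, block, t_setup=1, t_carry=1, t_skip=1, t_sum=1):
    """First-order worst-case delay (assumed model, unit delays by default)."""
    n_blocks = n_bits // block
    return (t_setup
            + block * t_carry           # ripple through the first block
            + (n_blocks - 1) * t_skip   # bypass the following blocks
            + (block - 1) * t_carry     # ripple inside the last block
            + t_sum)


def best_block(n_bits):
    """Block size (among divisors of n_bits) with the smallest modeled delay."""
    candidates = [b for b in range(1, n_bits + 1) if n_bits % b == 0]
    return min(candidates, key=lambda b: carry_skip_delay(n_bits, b))


n = 32
for b in (1, 2, 4, 8, 16, 32):
    print(b, carry_skip_delay(n, b))
print("best block:", best_block(n), "sqrt(N/2) =", math.sqrt(n / 2))
```

With unit delays the model reduces to 2B + N/B, which is minimized at B = sqrt(N/2); different gate delays shift the optimum.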
Carry select adder (CSA)
• Pre-compute carry out for each block for Cin=0 and Cin=1
• Select correct outputs as soon as Cin is ready
• Only a MUX in the critical path!
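[Figure: 4-bit carry-select block. A 4-b setup stage turns A[3:0] and B[3:0] into P[3:0] and G[3:0]; a 0-carry propagation chain and a 1-carry propagation chain run in parallel; a MUX controlled by Cin selects C[3:0] and Cout; sum generation produces S[3:0].]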
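A minimal behavioral sketch of the carry-select idea follows. The names (`ripple`, `carry_select_block`, `carry_select_add`) and the 16-bit width with equal 4-bit blocks are illustrative assumptions. In hardware both ripple chains run in parallel; here they are simply evaluated one after the other.

```python
def to_bits(value, width):
    """Integer -> list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def ripple(a_bits, b_bits, cin):
    """Ripple-carry chain over one block; returns (sum bits, carry-out)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b          # setup: propagate and generate
        sums.append(p ^ carry)
        carry = g | (p & carry)
    return sums, carry


def carry_select_block(a_bits, b_bits, cin):
    """Pre-compute both outcomes, then select with the real carry-in."""
    result_if_0 = ripple(a_bits, b_bits, 0)   # "0 carry propagation"
    result_if_1 = ripple(a_bits, b_bits, 1)   # "1 carry propagation"
    return result_if_1 if cin else result_if_0  # the MUX


def carry_select_add(a, b, width=16, block=4, cin=0):
    """Chain equal-size carry-select blocks (hypothetical helper)."""
    a_bits, b_bits = to_bits(a, width), to_bits(b, width)
    sums, carry = [], cin
    for lo in range(0, width, block):
        s, carry = carry_select_block(a_bits[lo:lo + block],
                                      b_bits[lo:lo + block], carry)
        sums.extend(s)
    return sum(bit << i for i, bit in enumerate(sums)), carry
```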
Square root carry select adder
• Linearly increasing group sizes: T grows with sqrt(N)
[Figure: 20-bit square-root carry-select adder with five groups of increasing width: bits [1:0], [4:2], [8:5], [13:9], and [19:14]. Each group has a setup stage (P's, G's), a 0-carry and a 1-carry chain, a MUX (C's), and sum generation; Cin drives the first MUX and Cout leaves the last one.]
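The group boundaries in the figure follow from group widths 2, 3, 4, 5, 6. A short sketch that generates them is given below; `sqrt_select_groups` and its `first_size` parameter are hypothetical names, and truncating the last group when the bits run out is an assumption for widths that do not fit exactly.

```python
def sqrt_select_groups(n_bits, first_size=2):
    """Return (msb, lsb) pairs for groups whose width grows by one bit each."""
    groups = []
    lsb, size = 0, first_size
    while lsb < n_bits:
        msb = min(lsb + size, n_bits) - 1   # truncate the last group if needed
        groups.append((msb, lsb))
        lsb = msb + 1
        size += 1
    return groups


print(sqrt_select_groups(20))
# [(1, 0), (4, 2), (8, 5), (13, 9), (19, 14)]
```

Because the widths grow linearly, the number of groups (and therefore the number of MUX stages the carry passes through) grows roughly with the square root of the word length.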
Tree adders
Recap and more topologies
The carry recurrence
• Remember: C_{i+1} = G_i + P_i·C_i
• C_1 = G_0 + P_0·C_0
• C_2 = (G_1 + P_1·G_0) + (P_1·P_0)·C_0
• Can be modeled as an operation on a tuple: (G_i, P_i) ∘ (G_{i-1}, P_{i-1}) = (G_i + P_i·G_{i-1}, P_i·P_{i-1})
The resulting tuple holds the new group generate and group propagate signals.
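The tuple operation can be tried out directly. The sketch below (function names are illustrative, not from the slides) compares the carries obtained from the plain recurrence with those obtained by folding (G, P) tuples with the dot operator and then applying C0.

```python
from functools import reduce


def dot(hi, lo):
    """(G, P) of the more significant group 'hi' combined with 'lo'."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def carries_by_recurrence(g, p, c0):
    """C_{i+1} = G_i + P_i*C_i, returned as [C_0, C_1, ..., C_n]."""
    carries = [c0]
    for gi, pi in zip(g, p):
        carries.append(gi | (pi & carries[-1]))
    return carries


def carry_from_group(g, p, c0, i):
    """C_{i+1} computed from the group tuple (G_{i:0}, P_{i:0})."""
    pairs = list(zip(g, p))[: i + 1]
    # Fold from the LSB upward: acc is the lower group, each new bit is 'hi'.
    group_g, group_p = reduce(lambda acc, hi: dot(hi, acc), pairs[1:], pairs[0])
    return group_g | (group_p & c0)
```

The operator is associative, which is what allows tree adders to group the tuples in any order.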
PG diagram notation
• Black cell (dot operator): inputs (Gi:k, Pi:k) and (Gk-1:j, Pk-1:j); outputs Gi:j and Pi:j
• Gray cell (generate only): inputs Gi:k, Pi:k, and Gk-1:j; output Gi:j
• Buffer: passes Gi:j and Pi:j (generate and propagate) through; can also just be a wire in some cases
Image Adapted From: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
Brent-Kung adder (1982)
[Figure: PG diagram of a 16-bit Brent-Kung adder (bit columns 0 to 15). Prefix levels: 1:0, 3:2, 5:4, 7:6, 9:8, 11:10, 13:12, 15:14; then 3:0, 7:4, 11:8, 15:12; then 7:0, 15:8; then 11:0; then 5:0, 9:0, 13:0. Outputs: 15:0, 14:0, ..., 1:0, 0:0.]
Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
Brent-Kung adder (cont’d)
[Figure: the same 16-bit Brent-Kung PG diagram, with the annotation: collect all carries from 13:0 for sum output 14.]
Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
Summary: Brent-Kung adder
[Figure: the 16-bit Brent-Kung PG diagram, repeated.]
• A = 2N - log(N) - 2
• T = 2·log(N) - 2
• FOmax = log(N)
• Pros:
– regular structure (…really?)
– limited fan-in for all gates
• Cons:
– FO is an issue: grows with log(N)
– Power? Uneven path lengths cause glitches
Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
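The area and delay figures can be reproduced with a small model of the prefix network. The sketch below is one common way to write a Brent-Kung network (an up-sweep followed by a down-sweep); the function names are illustrative, log is taken as base 2, and each dot-operator cell is counted as one unit of area and one unit of delay, which is an assumption about how the slide counts.

```python
def dot(hi, lo):
    """Dot operator on (G, P) tuples; 'hi' is the more significant group."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def brent_kung_prefix(pairs):
    """pairs[i] = (G_i, P_i), LSB first; the length must be a power of two.

    Returns (prefix, cells, depth) with prefix[i] = (G_{i:0}, P_{i:0}).
    """
    n = len(pairs)
    x = list(pairs)
    level = [0] * n          # logic depth at which each node becomes valid
    cells = 0

    def combine(i, j):
        nonlocal cells
        x[i] = dot(x[i], x[j])
        level[i] = max(level[i], level[j]) + 1
        cells += 1

    d = 1
    while d < n:             # up-sweep: 1:0, 3:0, 7:0, 15:0, ...
        for i in range(2 * d - 1, n, 2 * d):
            combine(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:            # down-sweep: 11:0, then 5:0, 9:0, 13:0, ...
        for i in range(3 * d - 1, n, 2 * d):
            combine(i, i - d)
        d //= 2
    return x, cells, max(level)


def bit_pairs(a, b, width):
    """(G_i, P_i) tuples for two integers, LSB first."""
    return [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1)
            for i in range(width)]


_, cells, depth = brent_kung_prefix(bit_pairs(0xBEEF, 0x1234, 16))
print(cells, depth)  # 26 6
```

For N = 16 the model gives 26 cells and a depth of 6, matching A = 2N - log(N) - 2 and T = 2·log(N) - 2.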
Kogge-Stone adder (1973)
[Figure: PG diagram of a 16-bit Kogge-Stone adder (bit columns 0 to 15). Prefix levels: 1:0, 2:1, 3:2, ..., 15:14; then 2:0, 3:0, 4:1, 5:2, ..., 15:12; then 4:0, 5:0, 6:0, 7:0, 8:1, 9:2, ..., 15:8. Outputs: 15:0, 14:0, ..., 1:0, 0:0.]
• A = N·log(N) - N + 1
• T = log(N)
• FOmax = 2
• High wiring overhead
Image taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris
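A compact model of the Kogge-Stone network, assuming log base 2 and counting one dot-operator cell per combined node, reproduces the area and delay figures above. The helper names and the 16-bit adder wrapper with Cin = 0 are illustrative assumptions.

```python
def dot(hi, lo):
    """Dot operator on (G, P) tuples; 'hi' is the more significant group."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def kogge_stone_prefix(pairs):
    """pairs[i] = (G_i, P_i), LSB first. Returns (prefix, cells, depth)."""
    n = len(pairs)
    x = list(pairs)
    cells = depth = 0
    d = 1
    while d < n:
        prev = x[:]                  # every cell reads the previous level
        for i in range(d, n):
            x[i] = dot(prev[i], prev[i - d])
            cells += 1
        depth += 1
        d *= 2
    return x, cells, depth


def kogge_stone_add(a, b, width=16):
    """Adder built on the prefix network (carry-in assumed to be 0)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    prefix, _, _ = kogge_stone_prefix(list(zip(g, p)))
    # Carry into bit i is G_{i-1:0}; the carry into bit 0 is Cin = 0.
    carries = [0] + [group_g for group_g, _ in prefix]
    total = sum((p[i] ^ carries[i]) << i for i in range(width))
    return total, carries[width]


_, cells, depth = kogge_stone_prefix([(0, 1)] * 16)
print(cells, depth)  # 49 4
```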
Sklansky adder (1960)
[Figure: PG diagram of a 16-bit Sklansky adder (bit columns 0 to 15). Prefix levels: 1:0, 3:2, 5:4, 7:6, 9:8, 11:10, 13:12, 15:14; then 2:0, 3:0, 6:4, 7:4, 10:8, 11:8, 14:12, 15:12; then 12:8, 13:8, 14:8, 15:8. Outputs: 15:0, 14:0, ..., 1:0, 0:0.]
• A = 0.5·N·log(N)
• T = log(N)
• FOmax = N/2
Jack Sklansky, UC Irvine
Images taken from: CMOS VLSI Design: A Circuits and Systems Perspective by Weste, Harris; https://www.the-scientist.com/?articles.view/articleNo/18705/title/New-Technology-Weighs-In-On-Mammography-Debate/
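The Sklansky (divide-and-conquer) network can be modeled in the same style. In the sketch below, the names are illustrative, log is base 2, and "fan-out" is counted as the number of dot-operator cells that one node drives within a level, which is an assumed counting convention.

```python
def dot(hi, lo):
    """Dot operator on (G, P) tuples; 'hi' is the more significant group."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])


def sklansky_prefix(pairs):
    """pairs[i] = (G_i, P_i), LSB first; the length must be a power of two.

    Returns (prefix, cells, depth, max_fanout).
    """
    n = len(pairs)
    x = list(pairs)
    cells = depth = max_fanout = 0
    span = 1
    while span < n:
        prev = x[:]
        loads = {}
        for i in range(n):
            if i & span:                         # upper half of a 2*span group
                src = (i // span) * span - 1     # top bit of the lower half
                x[i] = dot(prev[i], prev[src])
                cells += 1
                loads[src] = loads.get(src, 0) + 1
        max_fanout = max(max_fanout, max(loads.values()))
        depth += 1
        span *= 2
    return x, cells, depth, max_fanout


_, cells, depth, fanout = sklansky_prefix([(0, 1)] * 16)
print(cells, depth, fanout)  # 32 4 8
```

For N = 16 this gives 32 cells, a depth of 4, and a worst-case fan-out of 8 (node 7:0 drives bits 8 to 15 in the last level).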
Which tree adder should I pick?
Name             Area [A]        Time [T]      Max. FO     Wiring
Ripple carry     N-1             N-1           2
Sklansky         N/2*log(N)      log(N)        N/2
Brent-Kung       2N-log(N)-2     2*log(N)-2    log(N)
Kogge-Stone      N*log(N)-N+1    log(N)        2
Carry increment  2N-sqrt(2N)     sqrt(2N)      sqrt(2N)
• Trade-off between area, propagation delay, etc.
• For adders with a small number of bits, do not forget carry select and carry skip adders!
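To get a feeling for the trade-off, the formulas from the table can be evaluated for a concrete word length. The sketch below assumes log means log base 2 and simply plugs in N; `adder_metrics` is a hypothetical helper, and the numbers are unit counts (cells, levels, loads), not real area or delay.

```python
import math


def adder_metrics(n):
    """Evaluate the table's (area, time, max fan-out) formulas for n bits."""
    lg = math.log2(n)
    root = math.sqrt(2 * n)
    return {
        "Ripple carry":    (n - 1, n - 1, 2),
        "Sklansky":        (n / 2 * lg, lg, n / 2),
        "Brent-Kung":      (2 * n - lg - 2, 2 * lg - 2, lg),
        "Kogge-Stone":     (n * lg - n + 1, lg, 2),
        "Carry increment": (2 * n - root, root, root),
    }


for name, (area, time, fanout) in adder_metrics(32).items():
    print(f"{name:16s} A={area:6.1f}  T={time:5.1f}  FO={fanout:5.1f}")
```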
Adder design with Synopsys DC
What are CAD tools doing?
Synopsys Design Compiler (DC)
• One of the leading CAD tools for digital integrated circuits and FPGAs
• How do you use it?
– You write your design in a hardware description language (HDL)
– You provide constraints and then compile it
– The tool generates a gate-level netlist
• Automatic logic optimization, sizing, etc.
Simple example
• Faraday 90nm CMOS technology
• 16-bit adder design
– Two 16-bit inputs (Data1 and Data2)
– One 17-bit output (Output)
[Figure: Data1 and Data2 each pass through a register (D flip-flops clocked by CLK), are added (+), and the result is registered as Output.]
Synthesis script example
[...]
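# Read and analyze the VHDL source into library WORK
analyze -library WORK -format vhdl {chris_adder.vhd}
# Elaborate the design using architecture arch1
elaborate chris_adder -architecture arch1 -library WORK
# Constraint: set the clock period to 5 ns (rising edge at 0 ns, falling edge at 2.5 ns)
create_clock -name "ClkxCI" -period 5 -waveform {0 2.5} {ClkxCI}
# Synthesize and optimize to a gate-level netlist
compile_ultra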
Area results for T=5ns
• 49 sequential cells (D-flip-flops)
• 18 combinational cells
• Combinational area = 463 µm²
• Noncombinational area = 882 µm²
WHY?
[Figure: Synopsys area report.]
This is how Synopsys reports the circuit area. The value usually comes without units; the unit depends on the library.
Timing results for 5ns
• Critical path: tpdmax=4.76ns
– Startpoint: LSB of Data1
– Endpoint: carry output (bit 16)
• Most likely a simple ripple carry adder
• Carry out (= bit 16) is critical!
Part 1
AN2RLX1 = 2-input AND gate w/ drive strength 1
Part 2
4.76 ns - 3.84 ns = 0.92 ns
The reason (VHDL ftw!)
Can you see it now?
• Asynchronous reset
• Do not write data-path stuff in combinational blocks (bad practice)
What happens if we set T=1ns?
• Very aggressive delay constraint!
• Combinational area = 1323 µm²
• Noncombinational area = 987 µm²
• (Remember: 463 µm² and 882 µm² for the ripple-carry adder)
Part 1
Part 2
0.76 ns - 1.00 ns = -0.24 ns (negative slack: the 1 ns constraint is not met)
Fastest design
• tpd = 1.24ns (including flip-flop timing)
• tpd = 720ps (without flip-flop timing)
• Old lab 4 constraints:
– 500ps critical path
– Area smaller than 800 µm²: 5 pts
• Hand-design can be faster & smaller than tool-based design (but it’s much more work)!
Note that we are comparing different processes here and in lab 4.
Flip-flop timing: propagation delay = 280 ps, setup time = 240 ps
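The numbers on this slide fit the usual register-to-register setup check. The sketch below assumes the standard relation (slack = clock period - setup time - clock-to-Q delay - logic delay) and ignores clock skew and uncertainty; the function names are illustrative, and all values are in picoseconds to keep the arithmetic exact.

```python
def setup_slack_ps(period_ps, t_clk_to_q_ps, t_logic_ps, t_setup_ps):
    """Setup slack for a register-to-register path (no skew, assumed model)."""
    required = period_ps - t_setup_ps        # latest allowed data arrival
    arrival = t_clk_to_q_ps + t_logic_ps     # actual data arrival
    return required - arrival


def min_period_ps(t_clk_to_q_ps, t_logic_ps, t_setup_ps):
    """Smallest clock period with non-negative setup slack."""
    return t_clk_to_q_ps + t_logic_ps + t_setup_ps


# Fastest design: 720 ps of logic, 280 ps flip-flop delay, 240 ps setup time.
print(min_period_ps(280, 720, 240))          # 1240 ps = 1.24 ns
print(setup_slack_ps(1000, 280, 720, 240))   # -240 ps at T = 1 ns
```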
CAD tools are smart!
R. Zimmermann, “Non-Heuristic Optimization and Synthesis of Parallel-Prefix Adders,” IWLAS 1996