Sub-System Design
Topics
• Architectural issues
• Switch logic
• Gate logic
• Examples of structured design
  - Combinational logic
  - Design of an ALU subsystem
• Consideration of adders
• Multipliers
• Sequential circuits
• Semiconductor memories
Introduction
• Large systems are composed of sub-systems known as leaf cells.
• The most basic leaf cell is the common logic gate (inverter, NAND, etc.).
• Structured design
  - High regularity: leaf cells are replicated many times and interconnected to form the system.
• A logical and systematic approach to VLSI design is essential.
Dealing with Complexity
• Divide and conquer: limit the number of components you deal with at any one time.
• Group several components into larger components:
  - transistors form gates
  - gates form functional units
  - functional units form processing elements
[Figure: A System-on-a-Chip (courtesy Philips)]
Major Levels of Design
• Specification: description of requirements
• Systems level: placing and interconnecting major functional units
• Function level: specification and design of major functional units
• Logic/circuit level: gate-level design and gate interconnection design
• Layout level: what will actually be patterned onto the chip, and how the chip will be processed
• Physics level: the physics of gate and switch operation
Sub-System Design Guidelines
• Define the requirements.
• Partition the overall architecture into subsystems.
• Consider interconnection paths between the subsystems.
• Draw up the system floor plan on the silicon chip.
• Use regular structures for replication.
• Draw a stick diagram for each leaf cell (module) in the system.
• Convert the stick diagram of each leaf cell into layout and run a design rule check.
• Simulate the performance of each cell.
Design Validation
• Check at every step that errors have not been introduced: the longer an error remains, the more expensive it becomes to remove.
Chip Architecture - Floor plan
• After high-level design is complete, it is necessary to decide how the design is to be implemented in silicon.
• The implementation plan is known as the floor plan.
• The first step in laying out a floor plan is the routing of supply and clock rails.
• In doing this, sufficient space must be left between power rails to allow for data buses and combinational logic cells.
• Decide on the relative positions of the major functional blocks.
• Use a routing algorithm (software).
• The routing algorithm will minimize the total routing area.
Alpha 21364 Microprocessor Floor plan
Switch Logic
How do we build switches from MOS transistors?
1) Pass transistors
2) Transmission gates
Pass Transistors
• So far we have assumed that the source is grounded.
• What if the source voltage is > 0, e.g., a pass transistor passing VDD?
NMOS Pass Transistors
• Require one transistor and one gate signal.
• Transmit a 0 well, but when Vdd is applied to the drain, the voltage at the source is only Vdd - Vtn.
• When switch logic drives gate logic, n-type switches can cause electrical problems:
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input is 1.
  - The pull-down current is weaker because a lower gate voltage is applied.
  - The complementary gate's pull-down does not drain current from the output capacitance as fast as it should.
PMOS Pass Transistors
• When Vin = Vdd, Vout = Vdd.
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
• Logic 0 is therefore somewhat degraded through a p-device.
Voltage degradation of Pass Transistors
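Pass Transistor Circuits
[Figure: pass-transistor circuits with their steady-state node voltages. An nMOS device passing VDD gives Vs = VDD - Vtn; a pMOS device passing VSS gives Vs = |Vtp|; nMOS pass transistors in series with all gates at VDD still give VDD - Vtn; a pass transistor whose gate is driven by a degraded VDD - Vtn level gives VDD - 2Vtn.]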
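The threshold drops described for pass transistors can be checked numerically. The sketch below is a first-order steady-state model that ignores body effect; the function names and the values VDD = 2.5 V and Vtn = |Vtp| = 0.5 V are assumptions chosen for illustration, not figures from these slides.

```python
def nmos_pass(v_in, v_gate, v_tn):
    """Steady-state output of an nMOS pass transistor (body effect ignored).

    The device stops conducting once the output rises to v_gate - v_tn.
    """
    return min(v_in, v_gate - v_tn)


def pmos_pass(v_in, v_gate, v_tp_abs):
    """Steady-state output of a pMOS pass transistor (body effect ignored).

    The device stops conducting once the output falls to v_gate + |Vtp|.
    """
    return max(v_in, v_gate + v_tp_abs)


# Assumed illustrative values.
VDD, VTN, VTP_ABS = 2.5, 0.5, 0.5

strong_zero = nmos_pass(0.0, VDD, VTN)        # nMOS passes 0 undegraded
weak_one = nmos_pass(VDD, VDD, VTN)           # VDD - Vtn
weak_zero = pmos_pass(0.0, 0.0, VTP_ABS)      # |Vtp|
# Gate of a second nMOS driven by the already degraded level:
double_drop = nmos_pass(VDD, weak_one, VTN)   # VDD - 2*Vtn

print(strong_zero, weak_one, weak_zero, double_drop)
```

The last case is the reason a degraded pass-transistor output should not drive the gate of another pass transistor.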
Complementary Pass Transistor Logic
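[Figure: complementary pass-transistor logic. (a) A pass-transistor network and its inverse network, both driven by A, A', B, B', produce F and F'. (b) Example gates: AND/NAND (F = AB and its complement), OR/NOR (F = A+B and its complement), EXOR/NEXOR (F = A ⊕ B and its complement).]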
Advantages and Disadvantages
Advantages
• Fewer transistors
• No static power consumption
Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Advantages and Disadvantages
Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates
Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large
Gate Logic
Sizing of NMOS Inverter
The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay
Sizing of NMOS NAND Gate
• The ratio of the pull-up to the sum of all pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to produce a correct output voltage level.
• The nMOS NAND gate geometry reveals two factors:
  - The area of a NAND gate is greater than that of an inverter, because of the extra pull-down transistors and the corresponding increase in the length of the pull-up transistor.
  - Delay also increases in direct proportion to the number of inputs.
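The ratio rule for the nMOS NAND gate can be written as a small check. This is a minimal sketch assuming Z denotes the length-to-width ratio of a transistor channel and that all n pull-down devices are identical; the function names are hypothetical.

```python
def nand_ratio_ok(z_pu, z_pd, n_inputs, min_ratio=4.0):
    """True when Zpu : (n * Zpd) meets the minimum ratio (4:1 by default)."""
    return z_pu / (n_inputs * z_pd) >= min_ratio


def min_pullup_z(z_pd, n_inputs, min_ratio=4.0):
    """Smallest pull-up Z that still satisfies the ratio rule."""
    return min_ratio * n_inputs * z_pd


# With unit-size pull-downs (Zpd = 1), each extra input lengthens the pull-up.
for n in (1, 2, 3):
    print(n, "inputs -> Zpu >=", min_pullup_z(1.0, n))
```

The growth of the required Zpu with n is what drives the area and delay penalty noted above.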
NMOS NOR Gate
• The pull-down transistors are in parallel, so the pull-up to pull-down ratio is the same for every input.
• The gate therefore has the same characteristics as the inverter.
• The area occupied is reasonable, since the length of the pull-up transistor does not increase.
• A NOR gate is therefore preferable to a NAND gate.
CMOS Logic
Properties of CMOS Gates
• High noise margins: VOH and VOL are at VDD and GND, respectively.
• No static power consumption: there is never a direct path between VDD and VSS (GND) in steady state.
• Comparable rise and fall times (under appropriate sizing conditions).
CMOS NAND and NOR Characteristics
• A CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but its geometry must be kept symmetric by
  - allowing for the extended fall time of the series nMOS transistors (due to series resistance)
  - keeping the transfer characteristic centred on Vdd/2.
• A CMOS NOR gate has series p-transistors, which increase resistance and delay.
  - This affects the transfer characteristic and reduces noise immunity.
  - The geometries of the nMOS and pMOS transistors should therefore be adjusted.
CMOS Logic
• Static CMOS logic
• Pseudo-NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic
Static CMOS Switch Delay Model
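[Figure: switch-level RC delay models for INV, NAND2, and NOR2, showing pull-up resistances Rp, pull-down resistances Rn, the equivalent resistance Req, internal node capacitance Cint, and load capacitance CL.]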
Input Pattern Effects on Delay
• Delay depends on the pattern of inputs.
• Low-to-high transition:
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition:
  - both inputs go high: delay is 0.69 (2Rn) CL
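The three delay expressions above can be evaluated side by side. The sketch below assumes a 2-input NAND modelled with ideal switch resistances and ignores the internal capacitance Cint; the resistance values are illustrative assumptions, and only CL = 100 fF is borrowed from the simulation conditions quoted later in these slides.

```python
def nand2_delays(r_p, r_n, c_load):
    """First-order NAND2 propagation delays in seconds (Cint ignored)."""
    return {
        "rise, both inputs go low": 0.69 * (r_p / 2) * c_load,   # two pMOS in parallel
        "rise, one input goes low": 0.69 * r_p * c_load,          # single pMOS
        "fall, both inputs go high": 0.69 * (2 * r_n) * c_load,   # two nMOS in series
    }


# Assumed illustrative resistances, in ohms.
R_P = 10e3
R_N = 5e3
C_L = 100e-15   # farads

delays = nand2_delays(R_P, R_N, C_L)
for pattern, seconds in delays.items():
    print(f"{pattern}: {seconds * 1e12:.0f} ps")
```

In this model the rising delay doubles when only one pull-up device conducts, which is the input-pattern dependence the slide describes.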
[Figure: NAND2 switch-level model with Rp, Rn, Cint, and CL.]
Delay Dependence on Input Patterns
[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
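• Pseudo-NMOS logic is static CMOS logic with the pull-up network reduced from N input-driven transistors to a single transistor.
[Figure: static CMOS gate. Inputs In1, In2, ..., InN drive both the pull-up network (PUN) and the pull-down network (PDN), connected between VDD and ground, with output F.]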
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate inputs are at Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.
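Because both networks conduct when the output is low, the low level is set by a ratio. The sketch below treats the pull-up and pull-down as linear resistors, which is only a rough approximation of real transistor behaviour; the function names and the resistance values are assumptions for illustration.

```python
def pseudo_nmos_vol(vdd, r_pullup, r_pulldown):
    """Output-low voltage from a resistive divider (linear approximation)."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def static_power_when_low(vdd, r_pullup, r_pulldown):
    """Power drawn through the VDD-to-VSS path while the output is low."""
    return vdd ** 2 / (r_pullup + r_pulldown)


VDD = 2.5          # volts, assumed
R_PULLDOWN = 1e3   # ohms, assumed

# A weaker (more resistive) pull-up gives a lower VOL and less static power,
# at the cost of a slower rising output.
for r_pullup in (2e3, 4e3, 9e3):
    vol = pseudo_nmos_vol(VDD, r_pullup, R_PULLDOWN)
    power = static_power_when_low(VDD, r_pullup, R_PULLDOWN)
    print(f"Rpu = {r_pullup:.0f} ohm: VOL = {vol:.3f} V, P = {power * 1e3:.3f} mW")
```

This is why pseudo-nMOS is a ratioed logic family: device sizing affects the logic levels, not just the speed.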
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5, and 0.25.]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.
Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower.
• Static power is dissipated through the conduction path between VDD and VSS whenever the output is low.
Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation by adding a clock input that defines precharge and conditional evaluation phases.
[Figure: dynamic gate. Precharge transistor Mp (clocked by Clk) connects the output node Out, with load CL, to VDD; the PDN with inputs In1, In2, In3 sits above the evaluate transistor Me (clocked by Clk).]
• Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1).
Dynamic CMOS Logic Operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - A fan-in of n requires 2n devices (n N-type + n P-type).
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
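The device counts quoted above can be tabulated for a few fan-in values. This is a minimal sketch; the function name is hypothetical, and the pseudo-nMOS entry (n pull-down devices plus the single p-type load) is derived from the earlier description rather than stated as a formula in the slides.

```python
def transistor_count(fan_in, style):
    """Number of transistors for a gate with the given fan-in."""
    counts = {
        "static": 2 * fan_in,        # n N-type + n P-type
        "pseudo-nmos": fan_in + 1,   # n N-type + 1 P-type load (derived)
        "dynamic": fan_in + 2,       # n + 1 N-type + 1 P-type
    }
    return counts[style]


for n in (2, 4, 8):
    row = {style: transistor_count(n, style)
           for style in ("static", "pseudo-nmos", "dynamic")}
    print(n, row)
```

The saving of the dynamic style over static CMOS grows with fan-in, which is where the area advantage comes from.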
Dynamic CMOS Logic Operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pull-down transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
• When CLK goes high, precharging stops, i.e., the p-type pull-up turns off.
• The evaluation phase begins, i.e., the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• Otherwise (the input is not 1), the gate's output is left charged at logic 1.
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
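[Figure: dynamic gate example with inputs A, B, C and Out = ((A·B) + C)'. During precharge (Clk = 0) Mp is on and Me is off, so Out = 1; during evaluate (Clk = 1) Mp is off and Me is on.]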
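The precharge/evaluate behaviour and the "cannot be charged again" condition can be sketched as a small behavioural model. This is an illustration only: it ignores leakage and charge sharing, the class and attribute names are hypothetical, and the pull-down function is the (A·B) + C network from the example gate.

```python
class DynamicGate:
    """Behavioural model of a precharge/evaluate gate (no leakage, no charge sharing)."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts   # returns True when the PDN forms a path
        self.out = 1                       # storage node on CL

    def precharge(self):
        # CLK = 0: Mp on, Me off, the output is charged to logic 1.
        self.out = 1

    def evaluate(self, *inputs):
        # CLK = 1: Mp off, Me on. A conducting PDN discharges the node, and
        # nothing can recharge it before the next precharge.
        if self.pdn_conducts(*inputs):
            self.out = 0
        return self.out


gate = DynamicGate(lambda a, b, c: bool((a and b) or c))

gate.precharge()
held_high = gate.evaluate(0, 1, 0)       # PDN off, charge stays on CL

gate.precharge()
discharged = gate.evaluate(1, 1, 0)      # PDN on, output falls
after_glitch = gate.evaluate(0, 0, 0)    # inputs dropped again, output stays low

print(held_high, discharged, after_glitch)
```

The third call shows why inputs may make at most one transition during evaluation: a momentary 1 on the inputs is enough to lose the stored value.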
Properties of Dynamic Gates
• Logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS:
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH, and VIL are all equal to VTn:
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
• Requires the introduction of a clock signal.
• They are also more sensitive to noise.
• Their clocking signals also consume power and are difficult to turn off to save power.
Cascading Dynamic Gates
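[Figure: two cascaded dynamic inverters (In → Out1 → Out2), each with a precharge transistor Mp and an evaluate transistor Me clocked by Clk, together with waveforms of Clk, In, Out1, and Out2 versus time t. Out2 loses a voltage ΔV while Out1 is still above VTn.]
Only 0 → 1 transitions are allowed at inputs.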
Cascading Dynamic Gates - Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.
Domino CMOS Logic
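[Figure: domino gate. A dynamic stage (Mp, a PDN with inputs In1, In2, In3, and Me, clocked by Clk) is followed by a static inverter producing Out1. The dynamic node makes 1 → 1 or 1 → 0 transitions; Out1 makes 0 → 0 or 0 → 1 transitions.]
• Domino logic is the combination of a dynamic logic gate and a static inverter.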
Domino CMOS Logic Operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e., the p-type pull-up turns off.
• The evaluation phase begins, i.e., the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
• Otherwise (the input is not 1), the storage node is left charged at logic 1 and the gate's output is 0.
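The effect of the added inverter can be sketched with a behavioural model of two cascaded domino stages. This is an illustration under simplifying assumptions (no leakage, charge sharing, or delay); the class name and the choice of an AND stage feeding an OR stage are hypothetical.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural model)."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts   # returns True when the PDN forms a path
        self.node = 1                      # precharged dynamic node

    @property
    def out(self):
        # Static inverter on the dynamic node.
        return 1 - self.node

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pdn_conducts(*inputs):
            self.node = 0                  # node can only fall during evaluation
        return self.out


and_stage = DominoGate(lambda a, b: bool(a and b))
or_stage = DominoGate(lambda x, c: bool(x or c))

for stage in (and_stage, or_stage):
    stage.precharge()
# Every domino output is 0 after precharge, so the next stage sees only 0 inputs.
outputs_after_precharge = (and_stage.out, or_stage.out)

x = and_stage.evaluate(1, 1)   # first stage output rises 0 -> 1
z = or_stage.evaluate(x, 0)    # the rise ripples on to the second stage
print(outputs_after_precharge, x, z)
```

Because the outputs start at 0 and can only rise, each stage's inputs satisfy the single 0 → 1 transition requirement, and the gate can only realise non-inverting functions.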
Properties of Domino Logic
• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed:
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
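[Figure: two cascaded domino gates. The first stage (inputs In1, In2, In3) drives Out1 through its static inverter; Out1 and inputs In4, In5 feed the PDN of the second stage, which drives Out2. Dynamic nodes make 1 → 1 or 1 → 0 transitions; Out1 and Out2 make 0 → 0 or 0 → 1 transitions.]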
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.
Clocked CMOS Logic
• It is static CMOS logic combined with
  - a Clk input applied to one additional NMOS transistor
  - a Clk' input applied to one additional PMOS transistor
• A gate with a fan-in of n therefore needs 2n + 2 transistors.
• When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS transistor is on.
• When Clk goes low, the logic is not evaluated, because the clocked NMOS transistor is off.
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic), as shown below.
• Alternate stages of N logic with stages of P logic.
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
  - P logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example logic (stage outputs X, G, Z):
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'
BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figure: BiCMOS inverter and NAND gate]
Structured Design
Designing the digital block or standard cell proceeds through the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter
Adders
• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard Cell Dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) nMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  - G = A · B
• Propagate: Cout = C
  - P = A ⊕ B
• Kill: Cout = 0, independent of C
  - K = ~A · ~B
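The three carry cases can be checked against ordinary addition. This is a minimal sketch operating on single bits (0 or 1); the function names are hypothetical.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                # both inputs 1: carry out is 1 regardless of carry in
    p = a ^ b                # exactly one input 1: carry out follows carry in
    k = (1 - a) & (1 - b)    # both inputs 0: carry out is 0 regardless of carry in
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b))
```

Exactly one of G, P, K is 1 for any input pair, which is what lets a Manchester chain implement the carry with simple switches.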
Manchester Carry Chain
• Built from switch logic using propagate, generate & kill signals.
Manchester Chain Adders
• The carry input is precharged by the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ^ B) through a single n-type pass transistor.
• Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
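Carry-Ripple Adder
• Simplest design: cascade full adders.
• The critical path goes from Cin to Cout.
• Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1..C3, carry input Cin, and carry output Cout.]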
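A carry-ripple adder can be sketched by chaining a full-adder function, so that the carry visits every stage in turn. This is a bit-level illustration, not a circuit description; bit lists are assumed to be least-significant bit first, and all names are hypothetical.

```python
def full_adder(a, b, c_in):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each stage waits for the previous carry
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), c_out)
```

The loop makes the critical path visible: the final carry depends on every earlier stage, so delay grows linearly with the word length.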
Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
• The output carry is expressed in terms of generate and propagate signals.
• The expressions for a 4-bit CLA are:
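The expressions themselves did not survive in this text. The sketch below uses the standard textbook expansion, which agrees with the C1 = G0 + P0·C0 term quoted for the 1-bit CLA; treat it as a reconstruction under that assumption, with hypothetical function names, and with G_i = A_i·B_i and P_i = A_i ⊕ B_i.

```python
def cla_carries(g, p, c0):
    """Carries C1..C4 of a 4-bit CLA, each written directly in terms of G, P and C0."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla_add4(a, b, c0=0):
    """Add two 4-bit integers using look-ahead carries; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    carries = [c0] + cla_carries(g, p, c0)
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, carries[4]


print(cla_add4(9, 7))
```

No carry expression depends on another carry, so all four can be computed in parallel; the cost is the growing width of the product terms.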
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
• It is also referred to as a conditional sum adder.
• The adder is divided into two blocks: one computed with Cin = 0, the other with Cin = 1.
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.
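The select principle can be sketched at word level: each block is added twice, once per assumed carry-in, and the real carry picks the result. This is an illustration with hypothetical names; the block adder uses integer addition as a stand-in for the hardware adder cells, and the width is assumed to be a multiple of the block size.

```python
def add_block(a, b, c_in, width):
    """Stand-in for a small adder block: returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, c_in=0, width=8, block=4):
    """Carry-select addition; width must be a multiple of block."""
    mask = (1 << block) - 1
    result = 0
    carry = c_in
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are computed without waiting for the incoming carry.
        s0, c0 = add_block(a_blk, b_blk, 0, block)
        s1, c1 = add_block(a_blk, b_blk, 1, block)
        # 2:1 multiplexer controlled by the actual carry-in of this block.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry


print(carry_select_add(200, 100))
```

Only the multiplexer sits on the carry path between blocks, which is the source of the speed-up over a ripple design.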
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by
T = P·K1 + (M - 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through a multiplexer
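The delay formula is easy to evaluate for different partitions of the same word length. The sketch below uses normalized delays K1 = 1.0 and K2 = 0.5, which are assumed values for illustration only.

```python
def carry_select_delay(blocks, cells_per_block, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return cells_per_block * k1 + (blocks - 1) * k2


K1 = 1.0   # delay through one adder cell (assumed, normalized)
K2 = 0.5   # delay through one multiplexer (assumed, normalized)

# Different ways to split a 16-bit adder into equal blocks.
for blocks, cells in ((2, 8), (4, 4), (8, 2)):
    t = carry_select_delay(blocks, cells, K1, K2)
    print(f"M = {blocks}, P = {cells}: T = {t}")
```

With these values the 4 x 4 split gives the shortest delay, showing the trade-off between ripple time inside a block and the multiplexer chain between blocks.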
Carry Skip Adder (CSA)
• It is also called a carry bypass adder.
• It looks for cases in which the carry out of a set of bits is identical to the carry in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
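[Figure: two full-adder (FA) cells with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the bypass path for the carry.]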
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1, then C3 = C0;
else C3 = output carry of the stage 4 adder.
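The bypass rule can be sketched for a single block: when every bit propagates, the carry-in is forwarded directly. This is a functional illustration with hypothetical names; bit lists are assumed to be least-significant bit first, and the block size follows the 4-bit example.

```python
def carry_skip_block(a_bits, b_bits, c_in):
    """One carry-skip block; returns (sum_bits, carry_out, skipped)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = c_in
    sum_bits = []
    for a, b, p_i in zip(a_bits, b_bits, p):
        sum_bits.append(p_i ^ carry)
        carry = (a & b) | (p_i & carry)      # rippled carry inside the block
    skip = all(p)                            # P0 and P1 and P2 and P3
    # Bypass multiplexer: forward c_in when the whole block propagates.
    c_out = c_in if skip else carry
    return sum_bits, c_out, skip


# 0101 + 1010 with carry-in 1: every bit propagates, so the carry skips the block.
sums, c_out, skipped = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)
print(sums, c_out, skipped)
```

The bypassed value always equals the rippled carry; the multiplexer changes only how quickly the carry becomes available, not its value.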
Delay of CSA
The worst-case delay T is given by
T = 2(P - 1)·K1 + (M - 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
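The worst-case expression can be compared with a plain ripple adder of the same width. In the sketch below, M and P are taken to be the number of blocks and the number of cells per block, as in the carry-select formula; the ripple delay of n·K1 and the normalized values K1 = 1.0 and K2 = 0.5 are assumptions for illustration.

```python
def carry_skip_delay(blocks, cells_per_block, k1, k2):
    """T = 2(P - 1)*K1 + (M - 2)*K2 for M blocks of P cells each."""
    return 2 * (cells_per_block - 1) * k1 + (blocks - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-adder delay: the carry passes through every cell."""
    return n_bits * k1


K1, K2 = 1.0, 0.5   # assumed, normalized

for n_bits in (16, 32, 64):
    blocks = n_bits // 4            # fixed 4-bit blocks
    skip = carry_skip_delay(blocks, 4, K1, K2)
    ripple = ripple_delay(n_bits, K1)
    print(f"{n_bits}-bit: carry-skip T = {skip}, ripple T = {ripple}")
```

The carry ripples only through the first and last blocks and skips the ones in between, so the gap to the ripple adder widens as the word length grows.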
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• Optimized column adder tree.
• Combines all partial products into 2 vectors (carry and sum).
• Carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
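The reduction to two vectors can be sketched at word level, using Python integers as bit vectors and a carry-save (3:2) step that turns three rows into a sum row and a carry row. This is an illustration of the principle rather than a gate-level Wallace tree; the function names are hypothetical and operands are assumed to be unsigned.

```python
def carry_save(x, y, z):
    """3:2 compression: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, width=4):
    """Multiply unsigned integers; returns (product, number of reduction levels)."""
    # One partial-product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that do not fill a group of three pass through unchanged.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # The remaining sum and carry vectors go to a conventional adder.
    return sum(rows), levels


print(wallace_multiply(13, 11))
print(wallace_multiply(200, 123, width=8))
```

Each level shrinks the row count by roughly a third, so the number of levels grows logarithmically with the operand width, and a carry-propagating add is needed only once at the end.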
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed top-down with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Topics Architectural issues Switch logic Gate Logic Examples of Structured Design - Combinational logic - Design of an ALU subsystem Consideration of Adders Multipliers Sequential Circuits Semiconductor Memories
Introduction Large systems are composed of sub-systems
known as Leaf-Cells The most basic leaf cell is the common logic
gate (inverter nand etc) Structured Design
High regularity Leaf cells replicated many times and interconnected
to form the system Logical and systematic approach to VLSI
design is essential
Dealing with Complexity
Divide and conquer - limit the number of components you deal with at any one time
Group several components into larger components transistors form gates gates form functional units functional units form processing
elements
A System-on-a-Chip
Courtesy Philips
Major Levels of Design
Specification Description of requirements
Systems Level placing and interconnecting major functional units
Function Level specification and design of major functional units
LogicCircuit Level Gate level design gate interconnection design
Layout Level what will actually be patterned onto the chip how the chip
will be processed Physics Level
the physics of gate and switch operation
Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the
subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the
system Convert the stick diagram of each leaf-cell into
layout and go for design rule check Simulate the performance of each cell
Design Validation
Must check at every step that errors have not been introduced the longer the error remains the more
expensive it becomes to remove it
Chip Architecture ndash Floor plan After high level design is complete it is necessary to
decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of
supply and clock rails In doing this sufficient space must be left between
power rails to allow for data-buses and combinational logic cells
Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area
Alpha 21364 Microprocessor Floor plan
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors
1) Pass Transistors 2) Transmission gates
Pass Transistors
We have assumed source is grounded
What if source gt 0 eg pass transistor passing VDD
VDDVDD
NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the
drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type
switches can cause electrical problems 1048708When n-type switch driving a complementary gate
cause the gate to run slower when the switch input = 1
1048708Since pull down current is weaker when a lower gate voltage is applied
1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be
PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged
through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
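[Figure: two cascaded domino gates; the first (inputs In1, In2, In3) drives Out1 through its static inverter, and Out1 feeds the PDN of the second gate (inputs In4, In5), which drives Out2. Each gate has precharge transistor Mp and evaluate transistor Me clocked by Clk.]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 → 1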
Clocked CMOS Logic
It is static CMOS with two additional clocked transistors: Clk is applied to one additional NMOS and Clk' to one additional PMOS
A gate with fan-in n therefore needs 2n+2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
When Clk goes low, the logic is not evaluated because the clocked NMOS is off
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices
Disadvantages
More area
More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic (figure with stage outputs X, G and Z):
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
Inverter, NAND gate
Structured Design
Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout
Examples: Combinational Logic
5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter
Adders
1-bit full adder
Manchester carry chain
Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers - L=11λ W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
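The three definitions can be checked with a few lines of Python. This is a minimal sketch; the function names `pgk` and `carry_out` are hypothetical, and bits are represented as the integers 0 and 1.

```python
def pgk(a, b):
    """Return (propagate, generate, kill) for one bit position."""
    g = a & b              # generate: Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 regardless of C
    return p, g, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through P and G."""
    p, g, _ = pgk(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        p, g, k = pgk(a, b)
        print(f"A={a} B={b}: P={p} G={g} K={k}")
```

For every input pair exactly one of P, G and K is 1, which is what lets a carry chain be built from three mutually exclusive switches per bit.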
Manchester Carry Chain
Built from switch logic using propagate, generate & kill
Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A · B)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder; inputs A1..A4, B1..B4 and Cin, sums S1..S4, internal carries C1..C3, output Cout]
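A bit-level sketch of the cascade makes the critical path visible: each stage has to wait for the carry from the stage before it. The helper names (`to_bits`, `full_adder`, `ripple_carry_add`) are hypothetical, and bit lists are assumed to be least-significant bit first.

```python
def to_bits(value, width):
    """Split an integer into a list of bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from Cin to Cout."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(sums, cout)  # 11 + 6 = 17: sum bits [1, 0, 0, 0] (LSB first), Cout = 1
```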
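Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
The output carry is represented in terms of generate and propagate signals
The expressions for the 4-bit CLA are
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0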
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
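Expanding the recurrence C(i+1) = Gi + Pi·Ci, of which C1 = G0 + P0C0 is the first case, gives every carry directly from the G, P and C0 signals. The sketch below writes those expansions out for 4 bits; the function name `cla4` is hypothetical and bit lists are assumed to be least-significant bit first.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; returns (sum_bits, carry_out)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate per bit
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate per bit

    # Each carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


print(cla4([1, 1, 0, 1], [0, 1, 1, 0]))  # 11 + 6 with bits LSB first
```

The growing width of the product terms is the price of removing the ripple, which is why practical designs usually limit look-ahead to small groups of bits.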
Carry Select Adder
It is also referred to as the conditional sum adder
The adder is divided into two blocks: one block computes with Cin = logic 0, the other with Cin = logic 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
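The selection scheme can be sketched behaviourally as follows. The function names and the default block size of 4 are assumptions for illustration; in hardware the two candidate results of every block are computed at the same time, which a sequential Python loop can only imitate.

```python
def ripple_block(a_bits, b_bits, cin):
    """Plain ripple-carry addition of one block; bits are LSB first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry-select addition: each block is computed for Cin = 0 and Cin = 1."""
    sums = []
    carry = cin
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        s0, c0 = ripple_block(a_blk, b_blk, 0)  # result assuming Cin = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)  # result assuming Cin = 1
        # 2:1 multiplexer controlled by the actual incoming carry.
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s_sel)
    return sums, carry
```

Only the multiplexer delay, not a full block delay, is added per block once the first block's carry is known.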
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
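As a quick numerical illustration of the formula, the sketch below evaluates T for a 16-bit adder split into four 4-bit blocks. The delay values K1 = 1.0 and K2 = 0.5 (arbitrary time units) are assumptions chosen only to make the arithmetic easy to follow.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for P cells per block and M blocks."""
    return p * k1 + (m - 1) * k2


# 16-bit adder as M = 4 blocks of P = 4 cells, with assumed unit delays.
t = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
print(t)  # 4*1.0 + 3*0.5 = 5.5
```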
Carry Skip Adder (CSA)
It is also called the carry bypass adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
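[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, propagate signals pi, pi+1 and a skip signal selecting the carry output]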
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage 4 adder
Delay of CSA
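Worst case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
P and M are the number of cells per block and the number of blocks, as for the carry select adder
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block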
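The worst-case formula can be evaluated the same way. Again the numbers (four blocks of four cells, K1 = 1.0, K2 = 0.5 in arbitrary units) are assumptions for illustration, not measured values.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2*(P - 1)*K1 + (M - 2)*K2.

    The carry ripples through the first and last blocks (2*(P - 1) cells)
    and is skipped over the M - 2 blocks in between.
    """
    return 2 * (p - 1) * k1 + (m - 2) * k2


# 16-bit adder as M = 4 blocks of P = 4 cells, with assumed unit delays.
t = carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5)
print(t)  # 2*3*1.0 + 2*0.5 = 7.0
```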
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
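The reduction described above can be sketched at word level with 3:2 carry-save compressors. This is a simplified illustration for unsigned operands, not a gate-accurate Wallace tree: each partial product is held as a Python integer, and the names `carry_save` and `wallace_multiply` are hypothetical.

```python
def carry_save(a, b, c):
    """3:2 compressor on whole words: a + b + c == sum_vec + carry_vec."""
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


def wallace_multiply(x, y, n=8):
    """Multiply unsigned x by an n-bit unsigned y; returns (product, levels)."""
    # One partial product per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form complete triples
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])       # leftover rows pass to the next level
        rows = reduced
        levels += 1
    # The final two vectors (carry and sum) go to a conventional adder.
    return sum(rows), levels


product, levels = wallace_multiply(13, 11)
print(product, levels)
```

Each level turns every three rows into two, so the number of levels grows only logarithmically with the number of partial products.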
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Dealing with Complexity
Divide and conquer - limit the number of components you deal with at any one time
Group several components into larger components transistors form gates gates form functional units functional units form processing
elements
A System-on-a-Chip
Courtesy Philips
Major Levels of Design
Specification Description of requirements
Systems Level placing and interconnecting major functional units
Function Level specification and design of major functional units
LogicCircuit Level Gate level design gate interconnection design
Layout Level what will actually be patterned onto the chip how the chip
will be processed Physics Level
the physics of gate and switch operation
Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the
subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the
system Convert the stick diagram of each leaf-cell into
layout and go for design rule check Simulate the performance of each cell
Design Validation
Must check at every step that errors have not been introduced the longer the error remains the more
expensive it becomes to remove it
Chip Architecture ndash Floor plan After high level design is complete it is necessary to
decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of
supply and clock rails In doing this sufficient space must be left between
power rails to allow for data-buses and combinational logic cells
Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area
Alpha 21364 Microprocessor Floor plan
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors
1) Pass Transistors 2) Transmission gates
Pass Transistors
We have assumed source is grounded
What if source gt 0 eg pass transistor passing VDD
VDDVDD
NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the
drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type
switches can cause electrical problems 1048708When n-type switch driving a complementary gate
cause the gate to run slower when the switch input = 1
1048708Since pull down current is weaker when a lower gate voltage is applied
1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be
PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged
through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
- Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
- High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with Rp, Rn, CL and Cint]
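The three NAND2 delay expressions above can be evaluated directly. The sketch below is only an illustration of the first-order RC model; the resistance values in the usage lines are hypothetical and are not taken from the slides.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 propagation delays (seconds) from the switch model.

    rp, rn: on-resistance of one pMOS / one nMOS transistor (ohms)
    cl:     load capacitance (farads); Cint is ignored in this sketch
    """
    return {
        # Both pMOS devices conduct in parallel: Rp/2
        "rise_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # Only one pMOS conducts: Rp
        "rise_one_input_falls": 0.69 * rp * cl,
        # Both nMOS devices conduct in series: 2*Rn
        "fall_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }


if __name__ == "__main__":
    # Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF
    for name, t in nand2_delays(10e3, 5e3, 100e-15).items():
        print(f"{name}: {t * 1e12:.0f} ps")
```

The low-to-high delay halves when both inputs fall together, which is why the worst-case rise is the single-input case.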
Delay Dependence on Input Patterns
[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1]
Input data pattern    Delay (ps)
A = B = 0→1           67
A = 1, B = 0→1        64
A = 0→1, B = 1        61
A = B = 1→0           45
A = 1, B = 1→0        80
A = 1→0, B = 1        81
NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
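Pseudo-NMOS logic is obtained from static CMOS logic by reducing the number of pull-up transistors from N to 1
[Figure: static CMOS gate. A pull-up network (PUN) between VDD and the output F and a pull-down network (PDN) below it, both driven by the inputs In1, In2, ..., InN]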
Pseudo NMOS Operation
- The pull-down network of the gate is the same as for a fully complementary gate
- The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
- The p-type transistor is used as a resistor
- When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
- When the gate input is Vdd, both the p-type and n-type transistors are on and both determine the gate's output voltage
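Because the always-on p-type transistor acts as a resistor, the output-low level can be approximated with a resistive divider. This is a deliberately crude sketch: it treats both devices as linear resistors, and the resistance values and function names are hypothetical.

```python
def pseudo_nmos_vol(vdd, r_pullup, r_pulldown):
    """Approximate output-low voltage: divider between pull-up and pull-down."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def pseudo_nmos_static_power(vdd, r_pullup, r_pulldown):
    """Static power drawn while the output is low (VDD-to-VSS path conducting)."""
    return vdd ** 2 / (r_pullup + r_pulldown)


if __name__ == "__main__":
    # Hypothetical 4:1 pull-up to pull-down resistance ratio at VDD = 2.5 V
    print(pseudo_nmos_vol(2.5, 40e3, 10e3))
    print(pseudo_nmos_static_power(2.5, 40e3, 10e3))
```

A weaker (more resistive) pull-up lowers VOL and static power but slows the rising edge, which is the trade-off behind the ratioed design.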
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
The higher pull-up resistance increases delay, so circuits are slower
Static power is dissipated because a conduction path exists between VDD and VSS whenever the output is low
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic
It avoids static power dissipation and adds a clock input that separates a precharge phase from a conditional evaluation phase
[Figure: dynamic gate. A clocked p-type precharge transistor Mp and a clocked n-type evaluation transistor Me surround the PDN (inputs In1, In2, In3); the output Out is stored on CL]
Two-phase operation:
- Precharge (CLK = 0)
- Evaluate (CLK = 1)
Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n (n N-type + n P-type) devices
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
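The device counts can be tabulated for a given fan-in. The static (2n), dynamic (n + 2) and clocked CMOS (2n + 2) figures are the ones quoted in these slides; the pseudo-NMOS figure (n + 1) is an inference from its single pull-up transistor, and the function name is hypothetical.

```python
def transistor_count(style, n):
    """Number of transistors for a gate with fan-in n in a given logic style."""
    counts = {
        "static": 2 * n,        # n N-type + n P-type
        "pseudo_nmos": n + 1,   # n N-type + 1 always-on P-type (inferred)
        "dynamic": n + 2,       # n + 1 N-type + 1 P-type
        "clocked": 2 * n + 2,   # static gate + 2 clocked devices
    }
    return counts[style]


if __name__ == "__main__":
    for style in ("static", "pseudo_nmos", "dynamic", "clocked"):
        print(style, [transistor_count(style, n) for n in (2, 3, 4, 8)])
```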
Dynamic CMOS Logic operation
Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance
- The pull-down transistor controlled by the clock keeps the precharge node from being drained
- The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
- When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
- The evaluation phase begins, i.e. the clocked n-type pull-down turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
- If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
- Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
- Inputs to the gate can make at most one transition during evaluation
- The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
[Figure: dynamic gate computing Out = ((A·B)+C)'. During precharge (Clk = 0) Mp is on, Me is off and Out = 1; during evaluation (Clk = 1) Mp is off and Me is on]
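The precharge/evaluate behaviour and the "cannot be charged again" condition can be captured in a small behavioural model. This is a logic-level sketch only (no charge sharing or leakage); the class name `DynamicGate` and its methods are hypothetical.

```python
class DynamicGate:
    """Logic-level model of a precharged dynamic gate."""

    def __init__(self, pdn):
        # pdn: function of the inputs, truthy when the pull-down network conducts
        self.pdn = pdn
        self.out = 1

    def precharge(self):
        """CLK = 0: Mp charges the output node to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: the node discharges if the PDN conducts; it can never recharge."""
        if self.pdn(*inputs):
            self.out = 0
        return self.out


if __name__ == "__main__":
    # Example gate from the figure: Out = ((A.B) + C)'
    gate = DynamicGate(lambda a, b, c: (a and b) or c)
    gate.precharge()
    print(gate.evaluate(0, 0, 1))  # C = 1 discharges the node
    print(gate.evaluate(0, 0, 0))  # C falling back to 0 does not restore it
```

The second call shows why inputs may make at most one transition during evaluation: the lost charge is only restored by the next precharge.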
Properties of Dynamic Gates
- Logic function is implemented by the PDN only
- Full-swing outputs (VOL = GND and VOH = VDD)
- Non-ratioed: sizing of the devices does not affect the logic levels
- Faster switching speeds due to reduced load capacitance
- Overall power dissipation is usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
- The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
- Low area
- Higher speed than static complementary gates
Disadvantages
- Precharged gates introduce functional complexity because they must be operated in two distinct phases
- They require the introduction of a clock signal
- They are also more sensitive to noise
- Their clocking signals also consume power and are difficult to turn off to save power
Cascading Dynamic Gates
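[Figure: two cascaded dynamic inverters (In, Out1, Out2), each with a precharge transistor Mp and an evaluation transistor Me, with waveforms of Clk, In, Out1 and Out2 versus t and the VTn level marked]
Only 0 → 1 transitions allowed at inputs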
Cascading Dynamic Gates- Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off only when Out1 falls to VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge fixes the problem
Domino CMOS Logic
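[Figure: domino gate. A dynamic stage (Mp, Me and a PDN with inputs In1, In2, In3) followed by a static inverter driving Out1]
Domino logic is the combination of a dynamic logic stage and a static inverter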
Domino CMOS Logic operation
Precharge Phase (same as dynamic logic)
Evaluate Phase (modification to dynamic logic)
- When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
- The evaluation phase begins, i.e. the clocked n-type pull-down turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
- If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
- Only non-inverting logic can be implemented
- Smaller area compared to static CMOS
- Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
- Very high speed
  - the static inverter can be skewed, since only its L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
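[Figure: two cascaded domino gates. The first (PDN inputs In1, In2, In3) drives Out1 and the second (PDN inputs In4, In5) drives Out2, each through its static inverter]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 → 1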
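A logic-level sketch of cascaded domino gates shows why the output inverter makes cascading safe: every gate output is 0 after precharge and can only rise during evaluation. The class name `DominoGate` and the two-gate example function are hypothetical.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (logic-level model)."""

    def __init__(self, pdn):
        self.pdn = pdn      # truthy when the pull-down network conducts
        self.node = 1       # precharged dynamic node

    @property
    def out(self):
        """Inverter output: complement of the dynamic node."""
        return 1 - self.node

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0   # node can only fall, so out can only rise
        return self.out


def cascade(a, b, c):
    """Hypothetical two-gate chain: out1 = a AND b, out2 = out1 OR c."""
    g1 = DominoGate(lambda x, y: x and y)
    g2 = DominoGate(lambda x, y: x or y)
    g1.precharge()
    g2.precharge()
    assert g1.out == 0 and g2.out == 0   # all domino outputs are 0 after precharge
    out1 = g1.evaluate(a, b)
    return g2.evaluate(out1, c)


if __name__ == "__main__":
    print(cascade(1, 1, 0), cascade(1, 0, 0), cascade(0, 0, 1))
```

Because the gates only realise AND/OR style functions of their inputs, the model also reflects the non-inverting restriction of domino logic.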
Clocked CMOS Logic
It is a static CMOS gate with two additional clocked transistors:
- Clk input given to one additional NMOS
- Clk' input given to one additional PMOS
A gate with fan-in n therefore needs 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
When Clk goes low, the logic is not evaluated because the clocked NMOS is off
Advantages and Disadvantages
Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
- More area
- More complexity
N-P CMOS Logic
- An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
- Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
- During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
- During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd
- Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic:
- Stage 1 is X = (A · B)'
- Stage 2 is G = X' + Y'
- Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
CMOS properties
- Low power
- Slower
BJT properties
- Faster
- High power
Implementation: CMOS & BJT
- Core logic: CMOS
- Interface: BJT
Basic circuits
- Inverter
- Nand gate
- Nor gate
BiCMOS Circuits
Inverter, Nand Gate
Structured Design
Designing the digital block or standard cell at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout
Examples Combinational Logic
- 5-way selector
- Multiplexers & Demultiplexers
- Encoders & Decoders
- Parity Generator
- Bus Arbitration Logic
- Gray Code to Binary Code Converter
Adders
- 1-bit full adder
- Manchester Carry Chain
- Adder Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
- 32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers - L=11λ, W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ, W=10λ
Buried contact - L=30λ, W=10λ
ii) CMOS Inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the Adder standard cell - L=190λ, W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
- Generate: Cout = 1, independent of C
  G = A · B
- Propagate: Cout = C
  P = A ⊕ B
- Kill: Cout = 0, independent of C
  K = ~A · ~B
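The three signals can be checked exhaustively against the full-adder carry. The sketch below uses the definitions above; the function names are hypothetical.

```python
from itertools import product


def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_out(a, b, c):
    """Carry out expressed through generate and propagate: Cout = G + P.C"""
    g, p, _ = gpk(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        g, p, k = gpk(a, b)
        # Exactly one of G, P, K is active, and Cout matches the arithmetic carry
        assert g + p + k == 1
        assert carry_out(a, b, c) == (a + b + c) // 2
    print("G/P/K definitions agree with the full-adder carry")
```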
Manchester Carry Chain
Built from switch logic using propagate, generate & kill
Manchester Chain adders
- The carry input is precharged with the clock signal instead of passing through the logic
- The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
- Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
- Simplest design: cascade full adders
- Critical path goes from Cin to Cout
- Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4 and carry chain Cin, C1, C2, C3, Cout]
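A carry-ripple adder is just a chain of full adders with each carry-out feeding the next carry-in. The sketch below is a bit-level illustration; the helper names and the LSB-first bit ordering are choices made for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two LSB-first bit lists; the carry ripples from stage to stage."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    s, c = ripple_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), c)  # 4-bit sum and carry-out of 11 + 6
```

The loop makes the critical path visible: stage i cannot finish until the carry from stage i-1 is known.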
Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
- The output carry is represented in terms of generate and propagate signals
The expressions for a 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
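Expanding C1 = G0 + P0·C0 recursively gives every carry as a two-level function of the generate and propagate signals and C0. The sketch below writes out the standard expanded expressions for four bits and checks them against ordinary addition; the function name and LSB-first ordering are choices made for this example.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead add on LSB-first bit lists; returns (sum_bits, c4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    # Each carry depends only on g, p and c0, so all can be formed in parallel
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


def to_bits(x, n=4):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    s, c4 = cla4(to_bits(9), to_bits(7))
    print(from_bits(s), c4)
```

The price of the parallel carries is visible in the code: the product terms grow with bit position, which is why look-ahead is usually limited to small groups.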
Carry Select Adder
It is also referred to as the conditional sum adder
Each adder block is duplicated:
- one copy computes with Cin = logical 0
- the other copy computes with Cin = logical 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M − 1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
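The select mechanism and the delay formula can both be sketched in a few lines. The block size, helper names and the delay values used in the usage lines are assumptions made for illustration.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry add of two LSB-first bit lists; returns (sum_bits, carry_out)."""
    sums, c = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return sums, c


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Each block is computed for Cin = 0 and Cin = 1; the real carry selects one."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a, b = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple_block(a, b, 0)
        s1, c1 = ripple_block(a, b, 1)
        # 2:1 multiplexer controlled by the carry arriving from the previous block
        chosen_sum, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(chosen_sum)
    return sums, carry


def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M - 1)*K2, as given above."""
    return p_cells * k1 + (m_blocks - 1) * k2


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    s, c = carry_select_add(to_bits(200, 8), to_bits(100, 8))
    print(from_bits(s) + (c << 8))
    print(carry_select_delay(m_blocks=2, p_cells=4, k1=1.0, k2=0.5))
```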
Carry Skip Adder (CSA)
It is also called the Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
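[Figure: two cascaded full adders (FA) for bits i and i+1, with inputs ai, bi and ai+1, bi+1, sums si and si+1, and propagate signals pi and pi+1 combined into the skip signal]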
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
Delay of CSA
Worst-case delay T is given by T = 2(P − 1)·K1 + (M − 2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
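The worst-case formula can be compared with a plain ripple adder of the same width. Here M and P are taken to be the number of blocks and the number of cells per block, as in the carry-select formula; the ripple delay of N·K1 and the numeric values are assumptions made for this sketch.

```python
def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """Worst-case carry-skip delay: T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-carry delay: the carry crosses every cell once."""
    return n_bits * k1


if __name__ == "__main__":
    k1, k2 = 1.0, 0.5           # hypothetical cell and skip delays
    for m, p in ((4, 4), (8, 4), (16, 4)):
        n = m * p
        print(n, ripple_delay(n, k1), carry_skip_delay(m, p, k1, k2))
```

With a fixed block size, the skip delay grows by K2 per extra block instead of P·K1, so the advantage over ripple widens with operand width.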
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
- Serial-parallel Multiplier
- Braun Array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined Multiplier Array
- Modified Booth's Algorithm
- Wallace Tree Multipliers
- Recursive decomposition of multiplication
- Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
- Completion time is proportional to log2 n
- Optimized column adder tree
- Combines all partial products into 2 vectors (carry and sum)
- Carry and sum outputs are combined using a conventional adder
- Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
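The reduction idea can be sketched with word-level carry-save adders: three partial products are compressed into a sum vector and a carry vector, layer by layer, until two vectors remain for a conventional adder. This is an arithmetic illustration for unsigned operands, not a gate-level Wallace tree; the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compression: returns (sum_vector, carry_vector) with s + c == x + y + z."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n_bits=8):
    """Multiply unsigned integers by reducing partial products with carry-save adders."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    while len(partials) > 2:
        reduced = []
        full = len(partials) - len(partials) % 3
        for i in range(0, full, 3):          # compress each group of three
            reduced.extend(carry_save_add(*partials[i:i + 3]))
        reduced.extend(partials[full:])      # leftovers pass to the next layer
        partials = reduced
    return sum(partials)                     # final conventional addition


if __name__ == "__main__":
    print(wallace_multiply(13, 11))
    print(wallace_multiply(255, 255))
```

Each layer cuts the number of vectors by roughly a third, which is where the logarithmic number of stages comes from.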
Sequential Logic
- D flip-flop
- Two Phase Clocking
- Dynamic Shift Register
- RAM
- ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in a top-down approach with the help of CAD tools
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
- Generate and verify each section of the design
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Switch Logic
How do we build switches from MOS transistors?
1) Pass Transistors
2) Transmission gates
Pass Transistors
We have assumed the source is grounded
What if source > 0, e.g. a pass transistor passing VDD?
NMOS Pass Transistors
- Require one transistor and one gate signal
- Transmit 0 well, but when Vdd is applied to the drain the voltage at the source is Vdd − Vtn
- When switch logic drives gate logic, n-type switches can cause electrical problems
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  - The pull-down current is weaker when a lower gate voltage is applied
  - The complementary gate's pull-down will not draw current off the output capacitance as fast as it should
PMOS Pass Transistors
- When Vin = Vdd, Vout = Vdd
- When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
- Logic 0 is somewhat degraded through the p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
[Figure: pass-transistor circuits annotated with the degraded node voltages Vs = VDD − Vtn for an nMOS device passing VDD, Vs = |Vtp| for a pMOS device passing VSS, and VDD − 2Vtn where a degraded level drives the gate of a further nMOS pass transistor]
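The voltage levels annotated in the figures follow from one rule: an nMOS pass transistor cannot raise its source above Vgate − Vtn. The sketch below applies that rule to two arrangements; it ignores the body effect, assumes the gate voltage exceeds Vtn, and uses hypothetical function names and voltages.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Voltage passed by an nMOS switch: limited to v_gate - vtn."""
    return min(v_in, v_gate - vtn)


def series_chain(vdd, vtn, stages):
    """Pass transistors in series, every gate tied to VDD: only one Vtn is lost."""
    v = vdd
    for _ in range(stages):
        v = nmos_pass(v, vdd, vtn)
    return v


def gate_driven_chain(vdd, vtn, stages):
    """Each stage passes VDD but its gate is driven by the previous output:
    one further Vtn is lost per stage."""
    v_gate = vdd
    for _ in range(stages):
        v_gate = nmos_pass(vdd, v_gate, vtn)
    return v_gate


if __name__ == "__main__":
    vdd, vtn = 5.0, 1.0   # hypothetical supply and threshold
    print(series_chain(vdd, vtn, 3))       # VDD - Vtn
    print(gate_driven_chain(vdd, vtn, 2))  # VDD - 2Vtn
```

This is why a degraded pass-transistor output should not be used to drive the gate of another pass transistor.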
Complementary Pass Transistor Logic
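[Figure: (a) complementary pass-transistor logic structure: the inputs A, A', B, B' drive a pass-transistor network producing F and an inverse pass-transistor network producing F'; (b) example gates: AND/NAND (F = AB), OR/NOR (F = A+B) and EXOR/NEXOR (F = A⊕B), each with its complementary output]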
Advantages and Disadvantages
Advantages
- Fewer transistors
- No static power consumption
Disadvantages
- Output voltage degrades
- Not an ideal switch, due to series resistance
- Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages
- Fewer transistors compared to CMOS
- No static power consumption
- Efficient building of complex gates
Disadvantages
- Not an ideal switch, due to series resistance
- Delay of series transmission gates is large
Switch Logic
How do we build switches from MOS transistors?
1) Pass transistors
2) Transmission gates
Pass Transistors
So far we have assumed that the source is grounded.
What if the source is above 0 V, e.g. a pass transistor passing VDD?
NMOS Pass Transistors
• Require one transistor and one gate signal
• Transmit logic 0 well, but when VDD is applied to the drain the voltage at the source is only VDD - Vtn
• When switch logic drives gate logic, n-type switches can cause electrical problems
   - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
   - The pull-down current is weaker when a lower gate voltage is applied
   - The complementary gate's pull-down will not drain current from the output capacitance as fast as it should
PMOS Pass Transistors
• When Vin = VDD, then Vout = VDD
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
• Logic 0 is therefore somewhat degraded through the p-device
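The threshold drops described above can be checked with a small steady-state model. This is only a sketch: it ignores body effect and timing, and the function names and the supply and threshold values (2.5 V and 0.5 V) are assumptions chosen for illustration.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Steady-state output of an nMOS pass transistor (body effect ignored).

    The device stops conducting once its output rises to v_gate - vtn.
    """
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Steady-state output of a pMOS pass transistor (body effect ignored).

    The device stops conducting once its output falls to v_gate + |Vtp|.
    """
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP_ABS = 2.5, 0.5, 0.5  # assumed example values, in volts

strong_0 = nmos_pass(0.0, VDD, VTN)       # nMOS passes logic 0 without loss
weak_1 = nmos_pass(VDD, VDD, VTN)         # nMOS passes VDD as VDD - Vtn
weak_0 = pmos_pass(0.0, 0.0, VTP_ABS)     # pMOS passes 0 V as |Vtp|
strong_1 = pmos_pass(VDD, 0.0, VTP_ABS)   # pMOS passes VDD without loss

# Two nMOS devices in series, both gates at VDD: only one threshold is lost.
series = nmos_pass(weak_1, VDD, VTN)

# A degraded level driving the GATE of the next nMOS loses a second threshold.
gate_driven = nmos_pass(VDD, weak_1, VTN)
```

With these values the series connection still delivers VDD - Vtn, while the gate-driven cascade falls to VDD - 2Vtn, which is why a degraded level should not drive another pass-transistor gate.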
Voltage degradation of Pass Transistors
Pass Transistor Ckts
[Figure: pass-transistor circuits annotated with node voltages: an nMOS device passing VDD gives Vs = VDD - Vtn; a pMOS device passing VSS gives Vs = |Vtp|; further nMOS examples show outputs of VDD - Vtn and VDD - 2Vtn]
Complementary Pass Transistor Logic
[Figure: (a) complementary inputs A, A', B, B' drive a pass-transistor network producing F and an inverse pass-transistor network producing F'; (b) example gates: AND/NAND (F = AB and its complement), OR/NOR (F = A+B and its complement), EXOR/NEXOR (F = A⊕B and its complement)]
Advantages and Disadvantages
Advantages
• Fewer transistors
• No static power consumption
Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates
Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
• Zpu : Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay
Sizing of NMOS Nand Gate
The ratio of the pull-up impedance to the impedance of all n series pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to produce a correct output voltage level.
The nMOS NAND gate geometry reveals two factors:
• The area of a NAND gate is greater than that of an inverter, because of the additional pull-down transistors and the corresponding increase in the length of the pull-up transistor
• Delay also increases in direct proportion to the number of inputs added
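The ratio rule can be turned into a quick sizing check. The sketch below assumes every pull-down transistor has the same impedance and expresses impedances in arbitrary units; the function name is hypothetical.

```python
def min_pullup_impedance(n_inputs, z_pd, ratio=4.0):
    """Smallest pull-up impedance satisfying Zpu : n*Zpd >= ratio : 1.

    Assumes n_inputs identical pull-down transistors in series,
    each with impedance z_pd (arbitrary units).
    """
    return ratio * n_inputs * z_pd


# Required pull-up impedance for 1-, 2- and 3-input gates with unit pull-downs.
required = {n: min_pullup_impedance(n, 1.0) for n in (1, 2, 3)}
```

The required pull-up impedance grows linearly with the number of inputs, which is the increase in pull-up length (and hence area and delay) noted above.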
NMOS Nor Gate
• The pull-down transistors are in parallel in a NOR gate, so the pull-up to pull-down ratio is the same for every pull-down transistor
• It therefore has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the NOR gate is preferable to the NAND gate
CMOS Logic
Properties of CMOS Gates
• High noise margins: VOH and VOL are at VDD and GND, respectively
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
• The CMOS NAND gate has no ratio restriction like the nMOS NAND gate, but the geometry must be chosen to keep the gate symmetric by
   - allowing for the extended fall times caused by the series resistance of the series nMOS transistors
   - keeping the transfer characteristic centered on VDD/2
• The CMOS NOR gate has series p-transistors, which increase the resistance and delay
• This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted
CMOS Logic
• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic
Static CMOS Switch Delay Model
[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2 gates, with pull-up resistances Rp, pull-down resistances Rn, internal node capacitance Cint and load capacitance CL]
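Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
• Low-to-high transition
   - both inputs go low: delay is 0.69 (Rp/2) CL
   - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
   - both inputs go high: delay is 0.69 (2 Rn) CL
[Figure: NAND2 switch model with inputs A and B, pull-up resistances Rp, pull-down resistances Rn, internal capacitance Cint and load capacitance CL]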
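The three first-order delay expressions for the two-input NAND can be evaluated side by side. This is a minimal sketch: the resistance and capacitance values are assumptions picked for illustration, and the internal capacitance Cint is ignored.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays (seconds) from the 0.69*R*C switch model.

    rp, rn: on-resistance of one pMOS / one nMOS device (ohms)
    cl:     load capacitance (farads); Cint is neglected
    """
    return {
        "LH, both inputs go low": 0.69 * (rp / 2) * cl,  # two pMOS in parallel
        "LH, one input goes low": 0.69 * rp * cl,        # a single pMOS pulls up
        "HL, both inputs go high": 0.69 * (2 * rn) * cl, # two nMOS in series
    }


# Assumed example values: Rp = 20 kOhm, Rn = 10 kOhm, CL = 100 fF.
delays = nand2_delays(rp=20e3, rn=10e3, cl=100e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

The model predicts that the low-to-high delay doubles when only one input switches, because a single pull-up device has to charge the load.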
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1]

Input data pattern     Delay (ps)
A = B = 0→1            67
A = 1, B = 0→1         64
A = 0→1, B = 1         61
A = B = 1→0            45
A = 1, B = 1→0         80
A = 1→0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
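Pseudo-nMOS logic replaces the N-input pull-up network (PUN) of static CMOS logic with a single p-type load transistor
[Figure: static CMOS gate: inputs In1, In2, ..., InN drive both the pull-up network (PUN), connected to VDD, and the pull-down network (PDN); F is the output]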
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate input is VDD, both the p-type and n-type transistors are on and both contribute to determining the gate's output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], plotted for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity
Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower
• Static power is dissipated through the conduction path between VDD and VSS
Dynamic CMOS Logic
• The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic
• It avoids static power dissipation by adding a clock input that sequences precharge and conditional evaluation phases
[Figure: dynamic gate: inputs In1, In2, In3 drive the PDN; precharge transistor Mp and evaluate transistor Me are both clocked by Clk; output Out with load capacitance CL]
Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)
Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
   - a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
   - a fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
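The device counts quoted above can be tabulated for a few fan-ins. The sketch below also includes pseudo-nMOS, taking n + 1 devices from its description as a pull-down network plus a single p-type load; the function names are hypothetical.

```python
def static_cmos_transistors(fan_in):
    """Complementary static CMOS: n N-type + n P-type devices."""
    return 2 * fan_in


def pseudo_nmos_transistors(fan_in):
    """Pseudo-nMOS: n N-type pull-downs + 1 always-on P-type load."""
    return fan_in + 1


def dynamic_transistors(fan_in):
    """Dynamic CMOS: n N-type PDN devices + clocked nMOS + clocked pMOS."""
    return fan_in + 2


# (static, pseudo-nMOS, dynamic) transistor counts per fan-in.
counts = {
    n: (static_cmos_transistors(n), pseudo_nmos_transistors(n), dynamic_transistors(n))
    for n in (2, 4, 8)
}
```

The saving of dynamic logic over static CMOS grows with fan-in: for two inputs both styles need four devices, while for eight inputs the count drops from sixteen to ten.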
Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pull-down transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1
Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
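[Figure: example dynamic gate with inputs A, B and C, precharge transistor Mp and evaluate transistor Me, both clocked by Clk. Precharge: Mp on, Me off, Out = 1. Evaluate: Mp off, Me on, Out = ((A·B)+C)']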
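The precharge and evaluate behaviour, including the rule that a discharged output cannot recover until the next precharge, can be mimicked with a small behavioural model. This is a sketch only: the class name is hypothetical, and timing, leakage and charge sharing are not modelled.

```python
class DynamicGate:
    """Behavioural model of a precharge/evaluate gate (no timing or leakage)."""

    def __init__(self, pdn):
        self.pdn = pdn    # function: inputs -> True if the pull-down network conducts
        self.out = None   # unknown until the first precharge

    def precharge(self):
        """CLK = 0: Mp on, Me off, output node charged to 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: Mp off, Me on; the node can be discharged but never recharged."""
        if self.out is None:
            raise RuntimeError("precharge() must run before evaluate()")
        if self.pdn(*inputs):
            self.out = 0
        return self.out


# PDN conducts when (A AND B) OR C, so the gate computes Out = ((A.B)+C)'.
gate = DynamicGate(lambda a, b, c: bool((a and b) or c))

gate.precharge()
first = gate.evaluate(1, 1, 0)    # PDN conducts, node discharges
second = gate.evaluate(0, 0, 0)   # same evaluate phase: node stays discharged
gate.precharge()
third = gate.evaluate(0, 1, 0)    # no conducting path, node keeps its precharge
```

The second call shows why inputs may make at most one transition during evaluation: once the node has been discharged, removing the conducting path does not restore the logic 1.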
Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full-swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation is usually higher than static CMOS
   - no glitching
   - higher transition probabilities
   - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
   - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases
• Require the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power
Cascading Dynamic Gates
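[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me clocked by Clk; In drives the first stage, Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 (V versus t), with the VTn level marked]
Only 0 → 1 transitions allowed at inputs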
Cascading Dynamic Gates- Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge fixes the problem
Domino CMOS Logic
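[Figure: domino gate: a dynamic stage (inputs In1, In2, In3; PDN; precharge transistor Mp; evaluate transistor Me; clocked by Clk) followed by a static inverter that produces Out1. The dynamic node makes only 1 → 1 or 1 → 0 transitions]
Domino logic is the combination of a dynamic logic stage and a static inverter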
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since the dynamic node makes only logic 1 to logic 0 transitions
• Very high speed
   - the static inverter can be skewed, since only the L-H transition matters
   - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
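[Figure: two cascaded domino gates clocked by Clk. The first (inputs In1, In2, In3; PDN; Mp; Me) produces Out1 through its inverter; the second (inputs In4, In5 and Out1; PDN; Mp; Me) produces Out2. Dynamic nodes make 1 → 1 or 1 → 0 transitions; inverter outputs make 0 → 0 or 0 → 1 transitions]
• The inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 → 1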
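A behavioural sketch of two cascaded domino gates shows why the inverter makes cascading safe: after precharge every gate output is 0, so a downstream pull-down network can only see 0 → 1 transitions. The class name and the AND functions chosen for each stage are assumptions for illustration; timing and charge sharing are not modelled.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural, no timing)."""

    def __init__(self, pdn):
        self.pdn = pdn   # function: inputs -> True if the pull-down network conducts
        self.node = 1    # dynamic storage node, modelled as already precharged

    def precharge(self):
        """CLK = 0: storage node charged to 1, so the inverter output is 0."""
        self.node = 1

    @property
    def out(self):
        """Static inverter output."""
        return 1 - self.node

    def evaluate(self, *inputs):
        """CLK = 1: the node may only fall, so the output may only rise."""
        if self.pdn(*inputs):
            self.node = 0
        return self.out


stage1 = DominoGate(lambda a, b: bool(a and b))   # assumed function: A AND B
stage2 = DominoGate(lambda x, c: bool(x and c))   # assumed function: Out1 AND C

for g in (stage1, stage2):
    g.precharge()
outputs_after_precharge = (stage1.out, stage2.out)

# Evaluate in domino order: stage 2 sees stage 1's output, which starts at 0.
out1 = stage1.evaluate(1, 1)
out2 = stage2.evaluate(out1, 1)

for g in (stage1, stage2):
    g.precharge()
out1_blocked = stage1.evaluate(1, 0)
out2_blocked = stage2.evaluate(out1_blocked, 1)
```

Because each stage computes a non-inverting function, the model also illustrates why only non-inverting logic can be built from domino gates.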
Clocked CMOS Logic
• It is static CMOS logic combined with two additional clocked transistors
   - the Clk input is given to one additional nMOS transistor
   - the Clk' input is given to one additional pMOS transistor
• A gate with fan-in n therefore uses 2n + 2 transistors
• When Clk goes high, the logic of the circuit is evaluated because the clocked nMOS is on
• When Clk goes low, the logic is not evaluated because the clocked nMOS is off
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below
• Stages of N logic alternate with stages of P logic
• N-logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
• P-logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to VDD
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: an N-logic tree logically evaluates to ground while a P-logic tree logically evaluates to VDD
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example logic below
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
   - Core logic: CMOS
   - Interface: BJT
• Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figure: BiCMOS inverter and NAND gate]
Structured Design
Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout
Examples Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter
Adders
• 1-bit full adder
• Manchester Carry Chain Adder
• Enhancement techniques
   - Carry Select Adders
   - Carry Skip Adders
   - Carry Look-ahead Adders
• 32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
   - butting contact: L = 22λ, W = 10λ
   - buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries
• Generate: Cout = 1, independent of C
     G = A • B
• Propagate: Cout = C
     P = A ⊕ B
• Kill: Cout = 0, independent of C
     K = ~A • ~B
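The three signals partition the input combinations of a full adder, which a short exhaustive check makes concrete. This is a minimal sketch on single bits; the function names are hypothetical.

```python
def pgk(a, b):
    """Generate, propagate and kill signals for one bit position (a, b in {0, 1})."""
    g = a & b              # generate: Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed with generate and propagate: Cout = G + P.C"""
    g, p, _ = pgk(a, b)
    return g | (p & c)


def sum_bit(a, b, c):
    """Sum expressed with the propagate signal: S = P xor C"""
    _, p, _ = pgk(a, b)
    return p ^ c


# Exhaustive check against ordinary integer addition.
all_correct = all(
    a + b + c == 2 * carry_out(a, b, c) + sum_bit(a, b, c)
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
# Exactly one of G, P, K is active for every (A, B) pair.
one_hot = all(sum(pgk(a, b)) == 1 for a in (0, 1) for b in (0, 1))
```

Because exactly one of G, P and K is active for any input pair, a carry chain only needs to decide, per bit, whether to force a 1, force a 0, or pass the incoming carry.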
Manchester Carry Chain
Built from switch logic using propagate, generate & kill signals
Manchester Chain adders
• The carry input is precharged by the clock signal instead of passing through the logic
• The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
• Generate signal (G = A · B)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
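• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1, C2, C3 and output Cout]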
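A bit-level sketch of the cascade makes the critical path visible: each stage has to wait for the carry of the stage before it. The helper names and the LSB-first bit ordering are assumptions for illustration.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Carry-ripple adder on equal-length bit lists, least significant bit first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples into the next stage
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    """Integer to an n-bit list, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """LSB-first bit list back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
result = from_bits(sum_bits) + (cout << 4)   # 11 + 6 = 17
```

The loop body runs once per bit and each iteration depends on the previous carry, which is the linear growth in delay that the faster adder styles try to avoid.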
Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
• The output carry is represented in terms of generate and propagate signals
The expressions for a 4-bit CLA are
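The expressions themselves did not survive in this text. The standard expansion, assuming the usual definitions G_i = A_i B_i and P_i = A_i ⊕ B_i and repeated substitution of C_{i+1} = G_i + P_i C_i, is sketched below; the indexing starting from C_0 is an assumption.

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the operand bits and C_0, so all four can be computed in parallel at the cost of wider gates for the higher-order carries.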
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred to as a conditional sum adder
• The adder is divided into two blocks
   - one block with Cin = logical 0
   - another block with Cin = logical 1
• S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by
     T = P·K1 + (M - 1)·K2
where
     M = number of blocks in the adder
     P = number of cells in each block
     K1 = delay through one adder cell
     K2 = delay through the multiplexer
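The selection scheme and the delay formula can be sketched together. The model below works on integers rather than gates, assumes the word length is a multiple of the block size, and uses hypothetical function names and example delay values.

```python
def block_add(a, b, cin, width):
    """Add two width-bit values plus cin; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n_bits, block_bits, cin=0):
    """Carry-select addition: each block is computed for Cin = 0 and Cin = 1,
    then the real incoming carry picks one result (the 2:1 multiplexer).

    Assumes n_bits is a multiple of block_bits.
    """
    mask = (1 << block_bits) - 1
    result, carry = 0, cin
    for shift in range(0, n_bits, block_bits):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block_bits)  # speculative: Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block_bits)  # speculative: Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)       # multiplexer select
        result |= s << shift
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


total, cout = carry_select_add(0xB7, 0x6C, n_bits=8, block_bits=4)
t = carry_select_delay(p=4, m=2, k1=1.0, k2=0.5)   # assumed unit delays
```

In hardware the two speculative additions run in parallel, so only the first block's ripple time and one multiplexer delay per remaining block appear in the formula.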
Carry Skip Adder (CSA)
It is also called a Carry Bypass Adder
• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
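[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into a skip signal that selects the outgoing carry]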
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder
Delay of CSA
The worst-case delay T is given by
     T = 2(P - 1)·K1 + (M - 2)·K2
where
     K1 = delay through one adder cell
     K2 = delay for skipping the carry over a block
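One carry-skip stage and the worst-case delay expression can be sketched as follows. The bit lists are LSB first, the function names are hypothetical, and P and M are read as cells per block and number of blocks, as in the carry-select delay formula.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One m-bit carry-skip stage, bits LSB first.

    Returns (sum_bits, carry_out, skipped). When every bit propagates
    (Ai != Bi for all i) the bypass path forwards cin directly.
    """
    carry = cin
    sums = []
    all_propagate = 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)   # normal ripple path
        all_propagate &= p
    cout = cin if all_propagate else carry  # bypass multiplexer
    return sums, cout, bool(all_propagate)


def carry_skip_delay(p, m, k1, k2):
    """Worst case: T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# A = 1010, B = 0101 (LSB first below): every bit propagates, so the carry is skipped.
_, cout_skip, skipped = carry_skip_block([0, 1, 0, 1], [1, 0, 1, 0], cin=1)

# A = 0011, B = 0001: bit 0 generates, so the ripple path decides the carry.
_, cout_ripple, not_skipped = carry_skip_block([1, 1, 0, 0], [1, 0, 0, 0], cin=0)

t = carry_skip_delay(p=4, m=4, k1=1.0, k2=1.0)   # assumed unit delays
```

The bypass never changes the logical result, since an all-propagate block would ripple the same carry through anyway; it only shortens the path the carry has to travel.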
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2 n
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
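The reduction to two vectors can be sketched at word level with carry-save (3:2) adders. This is a behavioural model for unsigned operands, not a gate-level design: whole partial-product rows are compressed three at a time, the grouping order is an arbitrary choice, and the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_vec + carry_vec."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a, b, n_bits):
    """Multiply unsigned n_bits-wide a and b with a tree of carry-save adders.

    Returns (product, levels), where levels counts the carry-save stages
    needed to reduce the partial products to two vectors.
    """
    # One partial-product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])   # rows left over pass through unchanged
        rows = reduced
        levels += 1
    # Final carry and sum vectors are combined by a conventional adder.
    return sum(rows), levels


product, levels = wallace_multiply(200, 100, n_bits=8)
```

Each level turns every three rows into two, so the number of levels grows logarithmically with the number of partial products; eight rows need four levels in this grouping.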
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed using a top-down approach with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the
subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the
system Convert the stick diagram of each leaf-cell into
layout and go for design rule check Simulate the performance of each cell
Design Validation
Must check at every step that errors have not been introduced the longer the error remains the more
expensive it becomes to remove it
Chip Architecture ndash Floor plan After high level design is complete it is necessary to
decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of
supply and clock rails In doing this sufficient space must be left between
power rails to allow for data-buses and combinational logic cells
Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area
Alpha 21364 Microprocessor Floor plan
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors
1) Pass Transistors 2) Transmission gates
Pass Transistors
We have assumed source is grounded
What if source gt 0 eg pass transistor passing VDD
VDDVDD
NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the
drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type
switches can cause electrical problems 1048708When n-type switch driving a complementary gate
cause the gate to run slower when the switch input = 1
1048708Since pull down current is weaker when a lower gate voltage is applied
1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be
PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged
through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the digital block or standard cell at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout
Examples: Combinational Logic
- 5-way selector
- Multiplexers & Demultiplexers
- Encoders & Decoders
- Parity Generator
- Bus Arbitration Logic
- Gray Code to Binary Code Converter
Adders
- 1-bit full adder
- Manchester Carry Chain
- Adder enhancement techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
- 32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
- Multiplexers - L=11λ W=7λ
- i) NMOS Inverter (8:1 and 4:1 ratio)
  - Butting contact - L=22λ W=10λ
  - Buried contact - L=30λ W=10λ
- ii) CMOS Inverter - L=35λ W=18λ
- Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
- Generate: Cout = 1 independent of C
  G = A · B
- Propagate: Cout = C
  P = A ⊕ B
- Kill: Cout = 0 independent of C
  K = ~A · ~B
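The generate, propagate and kill definitions can be checked with a few lines of Python. This is only an illustrative sketch: the function names `carry_signals` and `carry_out` are hypothetical and do not come from the slides.

```python
from itertools import product

def carry_signals(a: int, b: int) -> tuple:
    """Return (G, P, K) for one bit position; a and b are 0 or 1."""
    g = a & b              # generate: Cout = 1 whatever C is
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 whatever C is
    return g, p, k

def carry_out(a: int, b: int, c: int) -> int:
    """Carry-out of a full adder written in terms of G and P."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)

# Print the (G, P, K) triple for every input combination.
for a, b in product((0, 1), repeat=2):
    print(a, b, carry_signals(a, b))
```

For every input pair exactly one of G, P and K is 1, which is why the three cases cover all carry behaviour of a bit position.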
Manchester Carry Chain
- Built from switch logic using propagate, generate & kill
- In Manchester chain adders the carry input is precharged with the clock signal instead of passing through the logic
- The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor
- Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
- Simplest design: cascade full adders
- Critical path goes from Cin to Cout
- Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sum outputs S1..S4, internal carries C1..C3, Cin and Cout]
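A behavioural sketch of the cascade of full adders is given here. The names `full_adder` and `ripple_carry_add` and the integer interface are assumptions made for illustration; the loop makes the Cin-to-Cout critical path explicit, since each stage has to wait for the carry of the previous one.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

def ripple_carry_add(a: int, b: int, cin: int = 0, n_bits: int = 4) -> tuple:
    """Add two n_bits-wide integers by cascading full adders, LSB first."""
    carry = cin
    total = 0
    for i in range(n_bits):
        # The carry from stage i feeds stage i+1: this is the critical path.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0b1011, 0b0110))
```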
Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
- The output carry is expressed in terms of the generate and propagate signals
- The expressions for a 4-bit CLA are:
  C1 = G0 + P0·C0
  C2 = G1 + P1·G0 + P1·P0·C0
  C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
  C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
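Expanding the recurrence C(i+1) = G(i) + P(i)·C(i) lets every carry be written directly in terms of the G and P signals and C0. The sketch below does this for four bits; the function names `cla_carries` and `cla_add4` are hypothetical, and it is a logic-level model rather than a description of the CMOS circuit.

```python
def cla_carries(a: int, b: int, c0: int = 0) -> list:
    """Return [C0, C1, C2, C3, C4] for 4-bit operands a and b."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # propagate
    # Each carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]

def cla_add4(a: int, b: int, c0: int = 0) -> tuple:
    """4-bit addition using the look-ahead carries: returns (sum, C4)."""
    c = cla_carries(a, b, c0)
    total = 0
    for i in range(4):
        p_i = ((a >> i) & 1) ^ ((b >> i) & 1)
        total |= (p_i ^ c[i]) << i  # Si = Pi xor Ci
    return total, c[4]

print(cla_add4(9, 7))
```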
Carry Select Adder
- It is also referred to as the conditional sum adder
- The adder is divided into two blocks: one block with Cin = logical 0, another block with Cin = logical 1
- S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
- M = number of blocks in the adder
- P = number of cells in each block
- K1 = delay through one adder cell
- K2 = delay through the multiplexer
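The select idea and the delay formula can be sketched together as follows. The block width, the function names and the use of Python integer addition inside each block are illustrative assumptions; only the formula T = P·K1 + (M-1)·K2 comes from the text.

```python
def add_block(a: int, b: int, cin: int, width: int) -> tuple:
    """Add one block of the given width: returns (sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a: int, b: int, cin: int = 0,
                     n_bits: int = 8, block_bits: int = 4) -> tuple:
    """Each block is computed for Cin=0 and Cin=1; the real carry selects one."""
    mask = (1 << block_bits) - 1
    result, carry = 0, cin
    for lo in range(0, n_bits, block_bits):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block_bits)  # copy with Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block_bits)  # copy with Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)       # 2:1 multiplexer
        result |= s << lo
    return result, carry

def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M-1)*K2 with P cells per block and M blocks."""
    return p * k1 + (m - 1) * k2

print(carry_select_add(200, 100))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```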
Carry Skip Adder (CSA)
- It is also called the carry bypass adder
- Looks for cases in which the carry-out of a set of bits is identical to the carry-in
- Typically organized into m-bit stages
- If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
[Figure: two full adders (bits i and i+1) with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, propagate signals pi, pi+1, and a skip signal selecting the carry-out]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of stage-4 adder
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
- K1 = delay through one adder cell
- K2 = delay for skipping the carry over a block
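A single skip block can be modelled as a ripple block plus a bypass condition. In this sketch the names `skip_block` and `carry_skip_delay` are hypothetical, and P and M in the delay function are assumed to mean cells per block and number of blocks, as in the carry select formula.

```python
def skip_block(a: int, b: int, cin: int, width: int = 4) -> tuple:
    """One carry-skip block: returns (sum, carry-out, skip-taken flag)."""
    carry, total, all_propagate = cin, 0, True
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                      # propagate: Ai != Bi
        all_propagate = all_propagate and p == 1
        total |= (p ^ carry) << i
        carry = (ai & bi) | (p & carry)  # ripple carry inside the block
    # If every bit propagates, the bypass gate forwards Cin directly.
    cout = cin if all_propagate else carry
    return total, cout, all_propagate

def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """Worst-case delay T = 2(P-1)*K1 + (M-2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2

print(skip_block(0b1010, 0b0101, 1))
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
```

The bypass does not change the logical result, since a fully propagating block ripples Cin to its output anyway; it only shortens the path the carry has to take.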
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
- Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
- Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
- Serial-parallel Multiplier
- Braun Array
- 2's Complement multiplication using the Baugh-Wooley method
- Pipelined Multiplier Array
- Modified Booth's Algorithm
- Wallace Tree Multipliers
- Recursive decomposition of Multiplication
- Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
- Completion time is proportional to log2 n
- Optimized column adder tree
- Combines all partial products into 2 vectors (carry and sum)
- Carry and sum outputs are combined using a conventional adder
- Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
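The reduction of partial products to a carry vector and a sum vector can be sketched at word level with 3:2 carry-save compressors. This is a simplified model under stated assumptions: a real Wallace tree groups bits column by column, and the names `carry_save` and `wallace_multiply` are hypothetical.

```python
def carry_save(x: int, y: int, z: int) -> tuple:
    """3:2 compressor on whole words: x + y + z == s + c, with no carry ripple."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c

def wallace_multiply(a: int, b: int, n_bits: int = 4) -> tuple:
    """Multiply unsigned a by the low n_bits of b; returns (product, levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows), 3):
            group = rows[i:i + 3]
            if len(group) == 3:
                reduced.extend(carry_save(*group))  # three rows become two
            else:
                reduced.extend(group)               # leftovers pass through
        rows = reduced
        levels += 1
    # The remaining sum and carry vectors go to a conventional adder.
    return sum(rows), levels

print(wallace_multiply(13, 11, 4))
```

The number of compressor levels grows slowly with the number of rows, which reflects the logarithmic completion time mentioned in the list.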
Sequential Logic
- D flip-flop
- Two Phase Clocking
- Dynamic Shift Register
- RAM
- ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
- Complex systems are designed with a top-down approach with the help of CAD tools
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
- Generate and verify each section of the design
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Chip Architecture - Floor plan
- After the high-level design is complete, it is necessary to decide how the design is to be implemented in silicon
- The implementation plan is known as the floor plan
- The first step in laying out a floor plan is the routing of supply and clock rails
- In doing this, sufficient space must be left between power rails to allow for data-buses and combinational logic cells
- Decide on the relative positions of the major functional blocks
- Use a routing algorithm (software)
- The routing algorithm will minimize total routing area
Alpha 21364 Microprocessor Floor plan
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors?
1) Pass Transistors
2) Transmission gates
Pass Transistors
- We have assumed the source is grounded
- What if source > 0, e.g. a pass transistor passing VDD?
[Figure: NMOS pass transistor with VDD on both gate and drain]
NMOS Pass Transistors
- Require one transistor and one gate signal
- Transmit 0 well, but when Vdd is applied to the drain the voltage at the source is Vdd-Vtn
- When switch logic drives gate logic, n-type switches can cause electrical problems
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  - The pull-down current is weaker when a lower gate voltage is applied
  - The complementary gate's pull-down will not draw current off the output capacitance as fast as it should
PMOS Pass Transistors
- When Vin = Vdd, Vout = Vdd
- When Vin = 0, CL is discharged through the P-transistor until Vout = |Vtp|, where the P-device stops conducting
- Logic 0 is somewhat degraded through the p-device
Voltage degradation of Pass Transistors
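A first-order model of the degraded levels is sketched below. It assumes ideal threshold behaviour with no body effect, and the function names `nmos_pass` and `pmos_pass` as well as the example values VDD = 5 V and Vtn = |Vtp| = 1 V are illustrative assumptions.

```python
def nmos_pass(v_in: float, v_gate: float, vtn: float) -> float:
    """Output of an NMOS pass transistor: it conducts only while the
    source stays at least Vtn below the gate."""
    return min(v_in, max(v_gate - vtn, 0.0))

def pmos_pass(v_in: float, v_gate: float, vtp_abs: float) -> float:
    """Output of a PMOS pass transistor: it conducts only while the
    source stays at least |Vtp| above the gate."""
    return max(v_in, v_gate + vtp_abs)

VDD, VTN, VTP = 5.0, 1.0, 1.0  # assumed example values

one = nmos_pass(VDD, VDD, VTN)   # passing VDD gives VDD - Vtn
series = nmos_pass(one, VDD, VTN)  # series devices, gates at VDD: no further loss
gated = nmos_pass(VDD, one, VTN)   # gate driven by a degraded level: VDD - 2Vtn
zero = pmos_pass(0.0, 0.0, VTP)    # passing 0 through PMOS gives |Vtp|
print(one, series, gated, zero)
```

The third case shows why the output of one pass transistor should not drive the gate of another: the threshold losses accumulate.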
Pass Transistor Ckts
[Figure: pass-transistor circuits between VDD and VSS; annotated node voltages include Vs = VDD - Vtn, Vs = |Vtp|, VDD - Vtn and VDD - 2Vtn]
Complementary Pass Transistor Logic
[Figure: (a) a pass-transistor network driven by A, A', B, B' produces F, and an inverse pass-transistor network produces F'; (b) example gates: AND/NAND (F = AB), OR/NOR (F = A+B), EXOR/NEXOR (F = A ⊕ B)]
Advantages and Disadvantages
Advantages
- Fewer transistors
- No static power consumption
Disadvantages
- Output voltage degrades
- Not an ideal switch due to series resistance
- Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages
- Fewer transistors compared to CMOS
- No static power consumption
- Efficient building of complex gates
Disadvantages
- Not an ideal switch due to series resistance
- Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
- Zpu/Zpd ratio
- Pull-up and pull-down resistance
- Power dissipation
- Standard unit of capacitance
- Gate delay
Sizing of NMOS Nand Gate
- The ratio of the pull-up to all n series pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to produce the correct output voltage level
- nMOS Nand gate geometry reveals two factors:
  - The area of a Nand gate is greater than that of an inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
  - Delay also increases in direct proportion to the number of inputs added
NMOS Nor Gate
- The pull-down transistors in a Nor gate are in parallel, so the pull-up to pull-down ratio is the same for every pull-down transistor
- So it has the same characteristics as the inverter
- The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
- So the Nor gate is preferable to the Nand gate
CMOS Logic
Properties of CMOS Gates
- High noise margins: VOH and VOL are at VDD and GND respectively
- No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
- Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
- The CMOS Nand gate has none of the ratio restrictions of the NMOS Nand gate, but the geometry must be kept symmetrical by
  - allowing for extended fall times of the series nMOS transistors (due to their series resistance)
  - keeping the transfer characteristic centred on Vdd/2
- The CMOS Nor gate has series p-transistors, which increase the resistance and delay
- This affects the transfer characteristics and reduces noise immunity
- So the geometries of the nMOS and pMOS transistors should be changed
CMOS Logic
- Static CMOS Logic
- Pseudo NMOS Logic
- Dynamic CMOS Logic
- Domino CMOS Logic
- Clocked CMOS Logic
- n-p CMOS Logic
Static CMOS Switch Delay Model
[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, built from Rp, Rn (equivalent resistance Req), the load capacitance CL and the internal node capacitance Cint]
Input Pattern Effects on Delay
- Delay is dependent on the pattern of inputs
- Low to high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
- High to low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with Rp, Rn, CL and Cint]
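The three NAND2 delay expressions can be evaluated numerically with a small helper. This sketch ignores the internal capacitance Cint, and the function name `nand2_delays` and the example values of Rp, Rn and CL are assumptions chosen only for illustration.

```python
def nand2_delays(rp: float, rn: float, cl: float) -> dict:
    """First-order NAND2 delays (seconds) from the switch RC model."""
    return {
        # Both PMOS devices on in parallel: resistance Rp/2.
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,
        # Only one PMOS device on: resistance Rp.
        "low_to_high_one_input_low": 0.69 * rp * cl,
        # Two NMOS devices in series: resistance 2Rn.
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl,
    }

# Assumed example values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
for name, delay in nand2_delays(10e3, 5e3, 100e-15).items():
    print(f"{name}: {delay * 1e12:.1f} ps")
```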
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1]

Input Data Pattern    Delay (psec)
A = B = 0→1           67
A = 1, B = 0→1        64
A = 0→1, B = 1        61
A = B = 1→0           45
A = 1, B = 1→0        80
A = 1→0, B = 1        81

NMOS = 0.5 μm / 0.25 μm, PMOS = 0.75 μm / 0.25 μm, CL = 100 fF
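Pseudo NMOS Logic
Pseudo-NMOS logic reduces the pull-up network of static CMOS logic from N transistors to 1.
[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN) and the pull-down network (PDN) between VDD and the output F]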
Pseudo NMOS Operation
- The pulldown network of the gate is the same as for a fully complementary gate
- The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
- The p-type transistor is used as a resistor
- When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
- When the gate input is Vdd, both the p-type and n-type transistors are on, and both contribute to determining the gate's output voltage
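Treating the always-on p-type load and the conducting pulldown path as plain resistors gives a rough feel for the ratioed output level and the static power. This is a deliberately crude sketch: real devices are not linear resistors, and the function name `pseudo_nmos_low_output` and the example values are assumptions.

```python
def pseudo_nmos_low_output(vdd: float, r_pullup: float, r_pulldown: float) -> tuple:
    """Resistive-divider estimate of the output-low voltage and static
    power when the pulldown network conducts."""
    vol = vdd * r_pulldown / (r_pullup + r_pulldown)  # output is a divider tap
    static_current = vdd / (r_pullup + r_pulldown)    # direct VDD-to-VSS path
    return vol, vdd * static_current

# Assumed example: VDD = 2.5 V, pull-up 40 kOhm, pull-down 10 kOhm.
vol, power = pseudo_nmos_low_output(2.5, 40e3, 10e3)
print(f"VOL = {vol:.2f} V, static power = {power * 1e6:.0f} uW")
```

A weaker (more resistive) pull-up lowers both the output-low voltage and the static power, but it also slows the rising output.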
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: Vout [V] versus Vin [V] for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
- The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity
Disadvantages
- The higher pull-up resistance increases delay, so circuits are slower
- More static power dissipation, due to the conduction path between VDD and VSS
Dynamic CMOS Logic
- The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
- It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases
[Figure: dynamic gate with a clocked PMOS precharge transistor Mp, a pull-down network (PDN) with inputs In1, In2, In3, a clocked NMOS evaluate transistor Me, and output Out loaded by CL]
Two phase operation
- Precharge (CLK = 0)
- Evaluate (CLK = 1)
Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - fan-in of n requires 2n (n N-type + n P-type) devices
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - requires only n + 2 (n+1 N-type + 1 P-type) transistors
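The device counts quoted for the different logic styles can be tabulated for a given fan-in. The helper name `transistor_counts` is hypothetical; the pseudo-NMOS entry follows from its n-device pulldown network plus the single p-type load, and the clocked CMOS entry uses the 2n+2 figure given for that style.

```python
def transistor_counts(n: int) -> dict:
    """Transistors needed for a gate with fan-in n in each logic style."""
    return {
        "static CMOS": 2 * n,       # n N-type + n P-type
        "pseudo-NMOS": n + 1,       # n N-type + 1 P-type load
        "dynamic CMOS": n + 2,      # n+1 N-type + 1 P-type
        "clocked CMOS": 2 * n + 2,  # static gate + 2 clocked devices
    }

for style, count in transistor_counts(4).items():
    print(f"{style}: {count}")
```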
Dynamic CMOS Logic operation
Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance
- The pulldown transistor controlled by the clock keeps that precharge node from being drained
- The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
- When CLK goes high, precharging stops, i.e. the p-type pullup turns off
- The evaluation phase begins, i.e. the clocked n-type pulldown turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
- If the inputs do not create such a path, the gate's output is left charged at logic 1
Conditions on Output
- Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
- Inputs to the gate can make at most one transition during evaluation
- Output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
[Figure: dynamic gate example computing Out = ((A·B)+C)', annotated with the on/off states of Mp and Me in the precharge and evaluate phases]
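The precharge and evaluate behaviour, including the rule that a discharged output cannot recover until the next precharge, can be captured in a small behavioural model. The class name `DynamicGate` and its interface are hypothetical, and charge sharing and leakage are ignored.

```python
class DynamicGate:
    """Behavioural model of a dynamic gate; pdn(*inputs) returns 1 when
    the pulldown network conducts."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.out = 1

    def precharge(self):
        """CLK = 0: Mp charges the output node to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: a conducting PDN discharges the node; nothing can
        recharge it until the next precharge."""
        if self.pdn(*inputs):
            self.out = 0
        return self.out

# Example from the figure: the PDN implements (A AND B) OR C.
gate = DynamicGate(lambda a, b, c: (a & b) | c)

gate.precharge()
print(gate.evaluate(0, 1, 0))  # no path: output stays at 1

gate.precharge()
print(gate.evaluate(1, 1, 0))  # path exists: output discharged to 0
print(gate.evaluate(0, 1, 0))  # input fell back to 0: output stays at 0
```

The last call shows why inputs may only make a single 0 to 1 transition during evaluation: a glitch on an input leaves the output permanently discharged for that cycle.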
Properties of Dynamic Gates
- Logic function is implemented by the PDN only
- Full swing outputs (VOL = GND and VOH = VDD)
- Non-ratioed: sizing of the devices does not affect the logic levels
- Faster switching speeds due to reduced load capacitance
- Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
- PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
- Lower area and higher speed than static complementary gates
Disadvantages
- Precharged gates introduce functional complexity because they must be operated in two distinct phases
- Requires introduction of a clock signal
- They are also more sensitive to noise
- Their clocking signals also consume power and are difficult to turn off to save power
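Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk, with input In and outputs Out1 and Out2; voltage-versus-time waveforms of Clk, In, Out1 and Out2, with the VTn level marked]
Only 0 → 1 transitions allowed at inputs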
Cascading Dynamic Gates - Problem
- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
- The PDN of the second dynamic inverter turns off when Out1 reaches VTn
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
- Setting all inputs of the second gate to 0 during precharge will fix it
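Domino CMOS Logic
[Figure: domino gate: a dynamic stage (Mp, PDN with inputs In1, In2, In3, Me, clocked by Clk) followed by a static inverter driving Out1; the dynamic node makes a 1 → 1 or 1 → 0 transition during evaluation]
The combination of dynamic logic and a static inverter is domino logic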
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
- When CLK goes high, precharging stops, i.e. the p-type pullup turns off
- The evaluation phase begins, i.e. the clocked n-type pulldown turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
- If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
- Only non-inverting logic can be implemented
- Smaller area compared to static CMOS
- Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
- Very high speed
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced: smaller logical effort
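Cascading Domino Logic Gates
[Figure: two cascaded domino gates; the first (inputs In1, In2, In3) drives Out1 through its inverter, and Out1 feeds the second (inputs In4, In5), which drives Out2; nodes are annotated with their 1 → 1, 1 → 0, 0 → 0 and 0 → 1 transitions]
- The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
- Hence the only possible transition during evaluation is 0 -> 1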
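A behavioural sketch of two cascaded domino gates shows why the inverter makes cascading safe: every gate output is 0 after precharge, so the next stage can only see 0 to 1 transitions. The class `DominoGate`, the helper `domino_and_or` and the chosen logic functions are illustrative assumptions, and timing effects are not modelled.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.node = 1  # precharged dynamic node

    @property
    def out(self):
        """Output of the static inverter."""
        return 1 - self.node

    def precharge(self):
        """CLK = 0: dynamic node goes to 1, so the gate output goes to 0."""
        self.node = 1

    def evaluate(self, *inputs):
        """CLK = 1: a conducting PDN discharges the node, output rises to 1."""
        if self.pdn(*inputs):
            self.node = 0
        return self.out

def domino_and_or(a: int, b: int, c: int) -> int:
    """Two cascaded domino gates computing (a AND b) OR c."""
    g1 = DominoGate(lambda x, y: x & y)
    g2 = DominoGate(lambda x, y: x | y)
    g1.precharge()
    g2.precharge()
    # After precharge both outputs are 0, so g2 only ever sees 0 -> 1.
    assert g1.out == 0 and g2.out == 0
    g1.evaluate(a, b)
    return g2.evaluate(g1.out, c)

print(domino_and_or(1, 1, 0), domino_and_or(0, 1, 0))
```

Both stages are non-inverting, which matches the property that domino logic can only implement non-inverting functions.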
Chip Architecture ndash Floor plan After high level design is complete it is necessary to
decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of
supply and clock rails In doing this sufficient space must be left between
power rails to allow for data-buses and combinational logic cells
Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area
Alpha 21364 Microprocessor Floor plan
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors
1) Pass Transistors 2) Transmission gates
Pass Transistors
We have assumed source is grounded
What if source gt 0 eg pass transistor passing VDD
VDDVDD
NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the
drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type
switches can cause electrical problems 1048708When n-type switch driving a complementary gate
cause the gate to run slower when the switch input = 1
1048708Since pull down current is weaker when a lower gate voltage is applied
1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be
PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged
through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.
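The failure can be illustrated with a logic-level sketch. It is a deliberately pessimistic abstraction: in a real circuit Out2 loses only part of its charge before Out1 falls below VTn, whereas here any conduction is treated as a full discharge. The helper name `evaluate_step` is hypothetical.

```python
def evaluate_step(node, pdn_input):
    """One evaluation step of a dynamic inverter: the node can only be pulled low."""
    return 0 if pdn_input else node


# Precharge (Clk = 0): both dynamic nodes are charged high.
out1, out2 = 1, 1
inp = 1  # the primary input is high during evaluation

# Evaluation begins. Stage 2 reacts to the still-precharged Out1 = 1
# before stage 1 has had time to discharge it.
out2 = evaluate_step(out2, out1)   # erroneously pulled low
out1 = evaluate_step(out1, inp)    # Out1 now falls to 0

# Out1 = 0 switches the second PDN off, but Out2 cannot be recharged.
out2 = evaluate_step(out2, out1)

expected_out2 = 1  # ideal result: Out2 = NOT(Out1) = NOT(0)
print(out1, out2, expected_out2)
```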
Domino CMOS Logic
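[Figure: domino gate. A dynamic stage (precharge transistor Mp, PDN with inputs In1, In2, In3, evaluate transistor Me, both clocked by Clk) drives a static inverter whose output is Out1.]
Domino logic is the combination of a dynamic logic stage and a static inverter.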
Domino CMOS Logic operation
Precharge phase: same as dynamic logic.
Evaluate phase: modified from dynamic logic.
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
  The evaluation phase begins, i.e. the n-type pulldown turns on.
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
  If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed:
  the static inverter can be skewed, since only the L-H transition matters;
  input capacitance is reduced, giving a smaller logical effort.
Cascading Domino Logic Gates
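[Figure: two cascaded domino gates sharing Clk. The first gate (inputs In1, In2, In3) produces Out1 through its static inverter; Out1 feeds the PDN of the second gate (other inputs In4, In5), which produces Out2.]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.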
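A logic-level sketch of a two-stage domino chain shows why the inverter removes the cascading problem. For simplicity each PDN is assumed to be a single transistor (so each stage is a buffer); that choice, and the helper names, are illustrative assumptions rather than part of the slides.

```python
def evaluate_step(node, pdn_conducts):
    """Dynamic node during evaluation: it can only be pulled low."""
    return 0 if pdn_conducts else node


def inverter(x):
    return 1 - x


# Precharge: both dynamic nodes are high, so both domino outputs are 0.
node1, node2 = 1, 1
out1, out2 = inverter(node1), inverter(node2)
history = [(out1, out2)]

inp = 1  # primary input of stage 1 during evaluation

# Evaluation, updating stage 2 first (the ordering that broke plain dynamic gates).
for _ in range(2):
    node2 = evaluate_step(node2, out1)   # stage 2 sees Out1, which starts at 0
    node1 = evaluate_step(node1, inp)
    out1, out2 = inverter(node1), inverter(node2)
    history.append((out1, out2))

print(history)   # [(0, 0), (1, 0), (1, 1)]
```

The outputs only ever rise, one stage after another, which is the falling-dominoes behaviour that gives the logic family its name.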
Clocked CMOS Logic
It is a combination of:
  static CMOS logic;
  a Clk input applied to one additional NMOS transistor;
  a Clk' input applied to one additional PMOS transistor.
A fan-in of n therefore requires 2n+2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is ON.
When Clk goes low, the logic is not evaluated because the clocked NMOS is OFF.
Advantages and Disadvantages
Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
  More area
  More complexity
N-P CMOS Logic
An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is to use NP Domino Logic (also called NORA logic).
Stages of N logic alternate with stages of P logic.
N-logic stages use the true clock, with normal precharge and evaluation phases and the N-logic tree in the pull-down leg.
P-logic stages use the complement clock, with the P-logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic output precharges to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded stages with outputs X, G and Z.]
BiCMOS Technology
CMOS properties
  Low power
  Slower
BJT properties
  Faster
  High power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits
  Inverter
  Nand gate
  Nor gate
BiCMOS Circuits
[Figure: BiCMOS inverter and Nand gate.]
Structured Design
A digital block or standard cell is designed at the following levels:
  Logic level
  Circuit level
  Stick diagram
  Layout
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder
Manchester Carry Chain Adder
Enhancement techniques
  Carry Select Adders
  Carry Skip Adders
  Carry Look-ahead Adders
32-bit Adders
1-bit full adder
CMOS Full Adder
[Figure: sum output and carry output circuits.]
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of standard cells of:
Multiplexers - L=11λ, W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
  Butting contact - L=22λ, W=10λ
  Buried contact - L=30λ, W=10λ
ii) CMOS Inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ, W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A • B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A • ~B
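The three definitions can be checked exhaustively against the full-adder carry. The sketch below uses hypothetical helper names (`gpk`, `carry_out`) and assumes the usual carry recurrence Cout = G + P·C.

```python
from itertools import product


def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    return a & b, a ^ b, (1 - a) & (1 - b)


def carry_out(a, b, c):
    g, p, _ = gpk(a, b)
    return g | (p & c)   # a carry is generated here, or an incoming carry is propagated


rows = []
for a, b, c in product((0, 1), repeat=3):
    g, p, k = gpk(a, b)
    cout = carry_out(a, b, c)
    rows.append((a, b, c, g, p, k, cout))
    print(f"A={a} B={b} C={c}  G={g} P={p} K={k}  Cout={cout}")
```

Exactly one of G, P and K is 1 for any input pair, which is what allows a carry chain to be built from three mutually exclusive switches per bit.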
Manchester Carry Chain
Built from switch logic using propagate, generate & kill.
Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
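Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, carry-in Cin, internal carries C1..C3, sum outputs S1..S4 and carry-out Cout.]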
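A carry-ripple adder is short to express in code, which makes the serial carry dependency easy to see. The sketch below is a behavioural model with hypothetical helper names; bit lists are little-endian (index 0 is the least significant bit).

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two little-endian bit lists; return (sum_bits, carry_out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each stage waits for the previous carry
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), cout)   # 11 + 6 = 17: 4-bit sum 1, carry-out 1
```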
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is expressed in terms of the generate and propagate signals.
The expressions for a 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
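Only C1 = G0 + P0·C0 survives in the text. The sketch below writes out C2 to C4 as the standard expansion of the recurrence Ci+1 = Gi + Pi·Ci; treat those three lines as an assumption about what the 4-bit expressions are, not as a transcription of the slide. The function name is hypothetical and bit lists are little-endian.

```python
def cla_4bit(a_bits, b_bits, c0):
    """4-bit carry look-ahead adder: every carry is a two-level function of G, P and C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    # Each sum bit needs only its own propagate signal and its incoming carry.
    sums = [p[i] ^ c for i, c in enumerate((c0, c1, c2, c3))]
    return sums, c4


print(cla_4bit([1, 1, 0, 1], [0, 1, 1, 0], 0))   # 11 + 6
```

None of the carries depends on another carry, so all four can be computed at the same time, at the cost of gates with growing fan-in.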
Carry Select Adder
It is also referred to as the conditional sum adder.
The adder is divided into two blocks:
  one block with Cin = logical 0;
  another block with Cin = logical 1.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
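The selection scheme can be sketched behaviourally as follows. The function names are hypothetical, the internal blocks are modelled as simple ripple adders, and bit lists are little-endian.

```python
def ripple_block(a_bits, b_bits, cin):
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_block(a_bits, b_bits, cin):
    sums0, cout0 = ripple_block(a_bits, b_bits, 0)   # speculative result for Cin = 0
    sums1, cout1 = ripple_block(a_bits, b_bits, 1)   # speculative result for Cin = 1
    # 2:1 multiplexers pick the precomputed result once the real Cin is known.
    return (sums1, cout1) if cin else (sums0, cout0)


result = carry_select_block([1, 1, 0, 1], [0, 1, 1, 0], 1)   # 11 + 6 + 1
print(result)
```

Both candidate results are computed before Cin arrives, so the block's contribution to the critical path is only the multiplexer delay; the price is roughly twice the adder hardware.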
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
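A quick numeric check of the formula, using made-up unit delays (K1 = 1.0 and K2 = 0.5 in arbitrary units are assumptions chosen only for illustration). The ripple figure assumes a plain carry-ripple adder takes n·K1.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


K1, K2 = 1.0, 0.5            # assumed unit delays
P, M = 4, 4                  # a 16-bit adder split into 4 blocks of 4 cells

t_select = carry_select_delay(P, M, K1, K2)
t_ripple = P * M * K1        # assumption: ripple delay grows as n * K1
print(t_select, t_ripple)    # 5.5 16.0
```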
Carry Skip Adder (CSA)
It is also called the carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1 and sum outputs si, si+1. Their propagate signals pi and pi+1 form the skip signal that selects between the incoming carry ci and the rippled carry.]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage 4 adder
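The skip rule can be sketched for one 4-bit block. The function name is hypothetical and bit lists are little-endian; the block ripples internally and the bypass multiplexer only changes which path delivers the carry, never its value.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """Return (sum_bits, carry_out, skip) for one carry-skip block."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    skip = all(p)                      # P0 and P1 and P2 and P3
    cout = cin if skip else carry      # bypass multiplexer
    return sums, cout, skip


# 5 + 10 + 1: every bit position propagates, so the carry-in skips the block.
result = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)
print(result)
```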
Delay of CSA
Worst case delay T is given by T = 2(P-1)K1 + (M-2)K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
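The formula makes the block-size trade-off easy to explore. The sketch below evaluates it for a 16-bit adder with assumed unit delays (K1 = 1.0, K2 = 0.5, arbitrary units); P and M are the cells per block and number of blocks, as in the carry select delay formula.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2(P-1)K1 + (M-2)K2 for M blocks of P cells each."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


K1, K2 = 1.0, 0.5     # assumed unit delays
N = 16                # adder width in bits

delays = {p: carry_skip_delay(p, N // p, K1, K2) for p in (2, 4, 8)}
print(delays)         # {2: 5.0, 4: 7.0, 8: 14.0}
```

Large blocks make the rippling at both ends of the adder dominate, while small blocks add more skip stages, so the best block size depends on the ratio K1/K2.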
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2(n).
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
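The reduction idea can be sketched with word-level 3:2 compressors (carry-save adders). This is a simplified model under stated assumptions: it groups whole partial-product rows three at a time rather than optimising each bit column as a true Wallace tree does, and the function names are hypothetical.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: returns (sum_vector, carry_vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a, b, width=4):
    """Multiply unsigned integers; return (product, number_of_reduction_levels)."""
    # Partial products: one shifted copy of a for every set bit of b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])   # leftover rows pass through
        rows = reduced
        levels += 1
    # Final step: a conventional adder combines the sum and carry vectors.
    return sum(rows), levels


print(wallace_multiply(13, 11))                 # (143, 2)
print(wallace_multiply(200, 123, width=8))      # (24600, 4)
```

Each level turns every three rows into two without propagating carries, so the number of levels grows roughly logarithmically with the number of partial products.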
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach, with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Alpha 21364 Microprocessor Floor plan
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors?
1) Pass Transistors
2) Transmission gates
Pass Transistors
We have assumed the source is grounded.
What if the source > 0, e.g. a pass transistor passing VDD?
[Figure: pass transistor with VDD applied.]
NMOS Pass Transistors
Require one transistor and one gate signal.
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd - Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems:
  An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1,
  since the pull-down current is weaker when a lower gate voltage is applied.
  The complementary gate's pull-down will not draw current off the output capacitance as fast as it should.
PMOS Pass Transistors
When Vin = Vdd, Vout = Vdd.
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
Logic 0 is somewhat degraded through the p-device.
Voltage degradation of Pass Transistors
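Pass Transistor Ckts
[Figure: pass-transistor circuits and their node voltages. An NMOS device with VDD on its gate passing VDD gives Vs = VDD - Vtn; a PMOS device passing VSS gives Vs = |Vtp|. Pass transistors in series, all gated from VDD, still deliver VDD - Vtn, while a pass transistor whose gate is driven by a degraded VDD - Vtn level delivers only VDD - 2Vtn.]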
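The voltage levels in these circuits follow from one rule: an NMOS pass transistor stops conducting once its output rises to within one threshold of its gate voltage. The sketch below applies that rule with assumed example values (VDD = 2.5 V, Vtn = |Vtp| = 0.5 V) and ignores the body effect; the function names are hypothetical.

```python
def nmos_pass_high(v_gate, v_in, vtn):
    """Highest voltage an NMOS pass transistor can deliver at its output."""
    return min(v_in, v_gate - vtn)


def pmos_pass_low(v_in, vtp_abs):
    """Lowest voltage a PMOS pass transistor (gate at 0 V) can deliver."""
    return max(v_in, vtp_abs)


VDD, VTN, VTP_ABS = 2.5, 0.5, 0.5   # assumed example values

one_stage = nmos_pass_high(VDD, VDD, VTN)         # VDD - Vtn
series = nmos_pass_high(VDD, one_stage, VTN)      # series devices lose only one Vtn
gate_fed = nmos_pass_high(one_stage, VDD, VTN)    # degraded gate drive: VDD - 2Vtn
pmos_low = pmos_pass_low(0.0, VTP_ABS)            # degraded logic 0

print(one_stage, series, gate_fed, pmos_low)      # 2.0 2.0 1.5 0.5
```

This is why the output of one pass transistor should not be used to drive the gate of another: each such stage costs a further threshold drop.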
Complementary Pass Transistor Logic
[Figure: (a) general structure: two pass-transistor networks, one the inverse of the other, take the inputs A, A', B, B' and produce F and F'. (b) Example gates: AND/NAND (F = AB, F' = (AB)'), OR/NOR (F = A+B, F' = (A+B)') and EXOR/NEXOR (F = A ⊕ B, F' = (A ⊕ B)').]
Advantages and Disadvantages
Advantages
  Fewer transistors
  No static power consumption
Disadvantages
  Output voltage degrades
  Not an ideal switch due to series resistance
  Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages
  Fewer transistors compared to CMOS
  No static power consumption
  Efficient building of complex gates
Disadvantages
  Not an ideal switch due to series resistance
  Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
  Zpu/Zpd ratio
  Pull-up and pull-down resistance
  Power dissipation
  Standard unit of capacitance
  Gate delay
Sizing of NMOS Nand Gate
The ratio of the pull-up to the sum of all the pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce a correct output voltage level.
The nMOS Nand gate geometry reveals two factors:
  The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
  The delay also increases in direct proportion to the number of inputs.
NMOS Nor Gate
Since the pull-down transistors are in parallel in a Nor gate, the pull-up to pull-down ratio is the same for every pull-down transistor.
So it has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the Nor gate is preferable to the Nand gate.
CMOS Logic
Properties of CMOS Gates
• High noise margins
  VOH and VOL are at VDD and GND, respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
The CMOS Nand gate has none of the ratio restrictions of the nMOS Nand gate, but its geometry must be kept symmetrical by:
  allowing for the extended fall times of the series nMOS transistors (due to series resistance);
  keeping the transfer characteristic centred on Vdd/2.
The CMOS Nor gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristics and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should be changed.
CMOS Logic
Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic
Static CMOS Switch Delay Model
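[Figure: switch-level RC models in which each transistor is replaced by a switch with an equivalent resistance Req. INV: Rp and Rn driving CL. NAND2: two parallel Rp and two series Rn, with internal node capacitance Cint and load CL. NOR2: two series Rp and two parallel Rn, with Cint and CL.]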
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs.
Low-to-high transition:
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL
High-to-low transition:
  both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with two parallel Rp, two series Rn, internal capacitance Cint and load CL.]
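The three expressions can be evaluated directly. In the sketch below the resistance values (Rp = 10 kΩ, Rn = 5 kΩ) are assumptions chosen for illustration, CL = 100 fF matches the load quoted elsewhere in the deck, and the internal capacitance Cint is ignored, as it is in the formulas.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays (seconds) from the switch RC model."""
    return {
        "low-to-high, both inputs fall": 0.69 * (rp / 2) * cl,   # two pull-ups in parallel
        "low-to-high, one input falls": 0.69 * rp * cl,          # a single pull-up
        "high-to-low, both inputs rise": 0.69 * (2 * rn) * cl,   # two pull-downs in series
    }


delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)   # assumed Rp, Rn
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

With these numbers the low-to-high delay doubles when only one input switches, which is the pattern dependence the slide describes.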
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] (0 to 400 ps), with curves for A = B = 1->0; A = 1, B = 1->0; and A = 1->0, B = 1.]
Input data pattern      Delay (psec)
A = B = 0->1            67
A = 1, B = 0->1         64
A = 0->1, B = 1         61
A = B = 1->0            45
A = 1, B = 1->0         80
A = 1->0, B = 1         81
NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
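Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input transistors to a single transistor.
[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the PUN (connected to VDD) and the PDN, with output F.]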
Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate input is Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] versus Vin [V] (0 to 2.5 V), for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]
Advantages and Disadvantages
Advantages
  The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.
Disadvantages
  The higher pull-up resistance increases delay and so reduces circuit speed.
  More static power is dissipated, because of the conduction path between VDD and VSS.
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.
[Figure: dynamic gate. Precharge transistor Mp (clocked by Clk) connects the output node Out, loaded by CL, to VDD; the PDN with inputs In1, In2, In3 sits in series with the evaluate transistor Me (clocked by Clk).]
Two-phase operation:
  Precharge (CLK = 0)
  Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n (n N-type + n P-type) devices.
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors.
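The device counts quoted above can be tabulated to show how the saving grows with fan-in. The function name is hypothetical; only the two formulas from the text are used.

```python
def transistor_count(style, fan_in):
    """Devices needed for a gate with the given fan-in."""
    counts = {
        "static": 2 * fan_in,    # n N-type + n P-type
        "dynamic": fan_in + 2,   # n+1 N-type + 1 P-type
    }
    return counts[style]


table = {n: (transistor_count("static", n), transistor_count("dynamic", n))
         for n in (2, 4, 8)}
for n, (static, dynamic) in table.items():
    print(f"fan-in {n}: static {static}, dynamic {dynamic}")
```

At a fan-in of 2 the two styles cost the same; the dynamic gate only saves devices for larger fan-in.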
Dynamic CMOS Logic operation
Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance.
  The pulldown transistor controlled by the clock keeps that precharge node from being drained.
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
  The evaluation phase begins, i.e. the n-type pulldown turns on.
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
  If the inputs do not create a conducting path, the gate's output is left charged at logic 1.
Major Levels of Design
Switch Logic
How do we build switches from MOS transistors
1) Pass Transistors 2) Transmission gates
Pass Transistors
We have assumed source is grounded
What if source gt 0 eg pass transistor passing VDD
VDDVDD
NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the
drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type
switches can cause electrical problems 1048708When n-type switch driving a complementary gate
cause the gate to run slower when the switch input = 1
1048708Since pull down current is weaker when a lower gate voltage is applied
1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be
PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged
through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pulldown transistor controlled by the clock is off, which keeps the precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create such a path, the gate's output is left charged at logic 1.
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: dynamic gate with inputs A, B, C. Precharge (Clk = 0): Mp on, Me off, Out = 1. Evaluate (Clk = 1): Mp off, Me on, Out = ((A·B)+C)']
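The precharge and evaluate rules can be sketched as a small behavioural model. This is only an illustration under stated assumptions: the class name DynamicGate is hypothetical, charge leakage and charge sharing are ignored, and repeated calls to evaluate stand for input changes within one evaluation phase.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate (hypothetical helper)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function(inputs) -> truthy if the PDN conducts
        self.out = 1    # logic value stored on CL

    def precharge(self):
        # CLK = 0: Mp on, Me off, output node charged to logic 1
        self.out = 1

    def evaluate(self, *inputs):
        # CLK = 1: Mp off, Me on; a conducting PDN discharges CL.
        # Once discharged, the node stays at 0 until the next precharge.
        if self.pdn(*inputs):
            self.out = 0
        return self.out


# PDN conducts when (A AND B) OR C, so Out = NOT((A AND B) OR C)
gate = DynamicGate(lambda a, b, c: (a and b) or c)
gate.precharge()
first = gate.evaluate(0, 1, 0)   # no path to ground: output stays at 1
glitch = gate.evaluate(0, 0, 1)  # C pulses high: output is discharged
after = gate.evaluate(0, 0, 0)   # C returns low: output cannot recover
print(first, glitch, after)
```

The last call shows why inputs may make at most one transition during evaluation: a 0 to 1 to 0 pulse on C leaves the output stuck at 0.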
Properties of Dynamic Gates
Logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn.
This gives a low noise margin (NML).
Advantages and Disadvantages
Advantages
Low area and higher speed than static complementary gates.
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.
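Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with Mp and Me driven by Clk; In drives the first stage and Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 versus t show Out2 partly discharging until Out1 falls to VTn]
Only 0 → 1 transitions are allowed at inputs.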
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.
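Domino CMOS Logic
[Figure: dynamic gate (Mp, Me, Clk, PDN with inputs In1, In2, In3) followed by a static inverter that drives Out1; the dynamic node makes a 1 → 1 or a 1 → 0 transition]
Domino logic is the combination of a dynamic logic gate and a static inverter.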
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, since only its L-H transition matters.
Input capacitance is reduced, giving a smaller logical effort.
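Cascading Domino Logic Gates
[Figure: two cascaded domino stages. Stage 1: PDN with inputs In1, In2, In3, devices Mp and Me clocked by Clk, output Out1 through a static inverter. Stage 2: PDN with inputs Out1, In4, In5, output Out2. Dynamic nodes make 1 → 1 or 1 → 0 transitions; inverter outputs make 0 → 0 or 0 → 1 transitions]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 → 1.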
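A two-stage domino cascade can be sketched behaviourally as follows. This is an illustrative model only: the PDN functions chosen here (a 3-input AND for stage 1, and Out1 AND (In4 OR In5) for stage 2) are assumptions, since the slides do not specify them.

```python
def domino_stage(pdn_conducts):
    """One domino stage: dynamic node followed by a static inverter."""
    dynamic_node = 0 if pdn_conducts else 1  # precharged to 1, may discharge
    return 1 - dynamic_node                  # inverter output: 0 or rising to 1


def two_stage_domino(in1, in2, in3, in4, in5):
    # Both outputs are 0 after precharge, so stage 2 only ever sees
    # a 0 -> 1 transition on Out1 during evaluation.
    out1 = domino_stage(in1 and in2 and in3)
    out2 = domino_stage(out1 and (in4 or in5))
    return out1, out2


print(two_stage_domino(1, 1, 1, 0, 1))  # stage 1 fires, then stage 2 fires
print(two_stage_domino(1, 0, 1, 1, 1))  # stage 1 stays low, so does stage 2
```

Each stage output equals its PDN function without inversion, which is why only non-inverting logic can be built from domino gates.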
Clocked CMOS Logic
It is static CMOS combined with two clocked transistors: the Clk input is given to one additional NMOS and the Clk' input to one additional PMOS.
A gate with fan-in n therefore uses 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated because the clocked NMOS is off.
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
More area.
More complexity.
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
Stages of N logic alternate with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT (core logic in CMOS, interface in BJT)
Basic circuits: Inverter, Nand Gate, Nor Gate
BiCMOS Circuits
[Figures: BiCMOS inverter and BiCMOS Nand gate]
Structured Design
Design the digital block or standard cell at the following levels: logic level, circuit level, stick diagram, layout.
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers - L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
  Butting contact - L = 22λ, W = 10λ
  Buried contact - L = 30λ, W = 10λ
ii) CMOS inverter - L = 35λ, W = 18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
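The three carry cases can be checked with a few lines of code. This is a minimal sketch using single-bit integers (0 or 1); the function names are hypothetical.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # Cout = 1 regardless of the carry in
    p = a ^ b              # Cout follows the carry in
    k = (1 - a) & (1 - b)  # Cout = 0 regardless of the carry in
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b))
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a switch-logic carry chain use them as mutually exclusive controls.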
Manchester Carry Chain
Built from switch logic using propagate, generate & kill.
Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, internal carries C1-C3, sum outputs S1-S4, and Cout]
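The cascade of full adders can be sketched as below. This is an illustrative bit-level model, assuming bit lists are given least significant bit first; the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next cell
        sums.append(s)
    return sums, carry


# 11 + 5 = 16 on 4 bits: all sum bits are 0 and Cout is 1
sums, cout = ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0])
print(sums, cout)
```

In this example the carry has to pass through every cell before the last sum bit is valid, which is the Cin to Cout critical path.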
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries are generated in parallel.
The output carry is expressed in terms of the generate and propagate signals.
The expressions for the 4-bit CLA are:
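The slide with the expressions did not survive extraction. The standard expansion, written here on the assumption that G_i = A_i · B_i and P_i = A_i ⊕ B_i as defined earlier, is obtained by repeatedly substituting C_{i+1} = G_i + P_i C_i:

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the G and P signals and on C_0, so all four can be computed at the same time instead of waiting for a ripple.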
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred to as a conditional sum adder.
The adder is divided into two blocks: one block computed with Cin = 0 and another with Cin = 1.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
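The selection scheme and the delay formula can be sketched together. This is a word-level illustration, not a gate-level design: add_block stands in for a ripple block, and the delay values in the example are hypothetical units.

```python
def add_block(a, b, cin, width):
    """Add two width-bit integers; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_block(a, b, cin, width):
    """Compute both candidate results, then pick one with the real Cin."""
    sum0, cout0 = add_block(a, b, 0, width)  # block assuming Cin = 0
    sum1, cout1 = add_block(a, b, 1, width)  # block assuming Cin = 1
    return (sum1, cout1) if cin else (sum0, cout0)  # 2:1 multiplexer


def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M-1)*K2, as given above."""
    return p_cells * k1 + (m_blocks - 1) * k2


print(carry_select_block(9, 7, 1, width=4))
print(carry_select_delay(m_blocks=4, p_cells=4, k1=1.0, k2=0.5))
```

Only the first block's ripple time P·K1 appears in the delay, because all blocks add in parallel and the later blocks contribute one multiplexer delay each.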
Carry Skip Adder (CSA)
It is also called a carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into a skip signal that selects the carry out]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
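The skip condition and the worst-case delay can be sketched as follows. This is a behavioural illustration with hypothetical function names; bit lists are least significant bit first, and P and M are assumed to mean cells per block and number of blocks, as in the carry-select formula.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage: returns (sum bits, carry out, skip flag)."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)  # ripple inside the block
    # Skip when every bit position propagates (Ai != Bi for all bits)
    skip = all(a ^ b for a, b in zip(a_bits, b_bits))
    cout = cin if skip else carry            # bypass multiplexer
    return sums, cout, skip


def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2, as given above."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
print(carry_skip_delay(m_blocks=4, p_cells=4, k1=1.0, k2=0.5))
```

The bypass never changes the logical result, since a fully propagating block would ripple Cin through unchanged anyway; it only shortens the path the carry takes.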
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
It is an optimized column adder tree.
It combines all partial products into 2 vectors (carry and sum).
The carry and sum outputs are combined using a conventional adder.
It compresses the number of stages of partial products.
Wallace Tree Multiplier Structure
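The reduction of partial products to a carry vector and a sum vector can be sketched at word level. This is a simplified illustration, assuming unsigned operands and applying 3:2 carry-save compression to whole rows rather than to individual columns as a real Wallace tree layout would; the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on integers: x + y + z == partial_sum + carry."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (y & z) | (x & z)) << 1
    return partial_sum, carry


def wallace_multiply(a, b, width=8):
    """Multiply unsigned a by unsigned b (b must fit in width bits)."""
    # One shifted copy of a for every bit of b (zero when the bit is 0)
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3  # rows that form groups of three
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])       # leftover rows pass through
        rows = reduced
        levels += 1
    return sum(rows), levels              # final conventional addition


product, levels = wallace_multiply(13, 11, width=4)
print(product, levels)
```

Each level shrinks the row count by roughly a factor of 3/2, which is where the logarithmic completion time comes from.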
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach, with the help of CAD tools.
Partition the system sensibly.
Aim for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of the adder and shifter is essential for the ALU.
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
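A 4x4 barrel shifter can be modelled as a crossbar in which one shift-select line turns on one diagonal of pass transistors. The sketch below is an assumption-laden illustration: it treats the shifter as a rotator, uses a one-hot select list, and picks a rotation direction arbitrarily, none of which is specified in the slides.

```python
def barrel_shift(in_bits, shift_select):
    """Rotate in_bits by the amount whose one-hot select line is active.

    in_bits: list of bits; shift_select: one-hot list of the same length,
    where shift_select[s] = 1 enables the pass transistors for shift s.
    """
    n = len(in_bits)
    out = [0] * n
    for shift, enabled in enumerate(shift_select):
        if enabled:
            for i in range(n):
                # Pass transistor connecting input (i + shift) to output i
                out[i] = in_bits[(i + shift) % n]
    return out


print(barrel_shift([1, 0, 0, 0], [0, 1, 0, 0]))  # shift by 1
print(barrel_shift([1, 1, 0, 0], [0, 0, 1, 0]))  # shift by 2
```

Because every input reaches every output through a single pass transistor, any shift amount takes the same time.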
Routing Power rails for ALU
Switch Logic
How do we build switches from MOS transistors?
1) Pass Transistors
2) Transmission gates
Pass Transistors
We have assumed the source is grounded.
What if source > 0, e.g. a pass transistor passing VDD?
NMOS Pass Transistors
Require one transistor and one gate signal.
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd - Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems:
An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1,
since the pull-down current is weaker when a lower gate voltage is applied.
The complementary gate's pull-down will not draw current off the output capacitance as fast as it should.
PMOS Pass Transistors
When Vin = Vdd, then Vout = Vdd.
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
Logic 0 is somewhat degraded through a p-device.
Voltage degradation of Pass Transistors
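Pass Transistor Ckts
[Figure: pass transistor circuits with gates at VDD. An n-type transistor passing VDD gives Vs = VDD - Vtn; a p-type transistor passing VSS gives Vs = |Vtp|. Series n-type pass transistors with all gates at VDD still give VDD - Vtn at the output. When a degraded level VDD - Vtn drives the gate of another pass transistor, its output is VDD - 2Vtn]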
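The degraded levels can be reproduced with a first-order model. This is a minimal sketch, assuming fixed threshold voltages (body effect ignored) and steady-state levels only; the function names and the 5 V / 1 V values are hypothetical.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Steady-state output of an n-type pass transistor."""
    # The source cannot rise above Vgate - Vtn before the device cuts off
    return min(v_in, max(v_gate - vtn, 0.0))


def pmos_pass(v_in, v_gate, vtp_abs):
    """Steady-state output of a p-type pass transistor."""
    # The source cannot fall below Vgate + |Vtp| before the device cuts off
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP = 5.0, 1.0, 1.0

single = nmos_pass(VDD, VDD, VTN)     # VDD - Vtn
series = nmos_pass(single, VDD, VTN)  # still VDD - Vtn: no extra loss
gated = nmos_pass(VDD, single, VTN)   # degraded gate drive: VDD - 2Vtn
low = pmos_pass(0.0, 0.0, VTP)        # p-type passing 0 V: |Vtp|
print(single, series, gated, low)
```

The threshold drop accumulates only when a degraded output is used as a gate signal, which is why pass transistor outputs should not drive other pass transistor gates.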
Complementary Pass Transistor Logic
[Figure: (a) general structure: a pass-transistor network and an inverse pass-transistor network, both driven by A, A', B, B', producing F and F'. (b) example gates: AND/NAND (F = AB and its complement), OR/NOR (F = A+B and its complement), EXOR/NEXOR (F = A⊕B and its complement)]
Advantages and Disadvantages
Advantages
Fewer transistors.
No static power consumption.
Disadvantages
Output voltage degrades.
Not an ideal switch, due to series resistance.
Delay of series pass transistors is large.
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages
Fewer transistors compared to CMOS.
No static power consumption.
Efficient building of complex gates.
Disadvantages
Not an ideal switch, due to series resistance.
Delay of series transmission gates is large.
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
Zpu/Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay
Sizing of NMOS Nand Gate
The ratio of the pull-up to the sum of all pull-down transistors (Zpu : nZpd) must be at least 4:1 to give the correct output voltage level.
The nMOS Nand gate geometry reveals two factors:
The area of a Nand gate is greater than that of an inverter, because of the larger number of pull-down transistors and the corresponding increase in length of the pull-up transistor.
Delay also increases in direct proportion to the number of inputs added.
NMOS Nor Gate
The pull-down transistors are in parallel in a Nor gate, i.e. the pull-up to pull-down ratio is the same for every transistor.
So it has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in length of the pull-up transistor.
So the Nor gate is preferable to the Nand gate.
CMOS Logic
Properties of CMOS Gates
• High noise margins: VOH and VOL are at VDD and GND respectively
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
The CMOS Nand gate has no ratio restriction like the NMOS Nand gate, but we have to keep the geometry symmetric by
allowing for extended fall times of the series nMOS transistors (due to series resistance) and
keeping the transfer characteristic centred on Vdd/2.
The CMOS Nor gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristic and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should be changed.
CMOS Logic
Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic
Static CMOS Switch Delay Model
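[Figure: switch-level RC models. INV: Rp pull-up and Rn pull-down driving CL. NAND2: two parallel Rp pull-ups (inputs A, B), two series Rn pull-downs with internal node capacitance Cint, driving CL. NOR2: two series Rp pull-ups with internal node capacitance Cint, two parallel Rn pull-downs, driving CL]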
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs.
Low-to-high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL
High-to-low transition:
both inputs go high: delay is 0.69 (2Rn) CL
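[Figure: NAND2 switch model with parallel Rp pull-ups (inputs A, B), series Rn pull-downs (inputs A, B), internal capacitance Cint and load CL]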
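The three delay expressions for the two-input NAND can be evaluated directly. This is a small sketch under the same first-order RC model; the function name and the resistance and capacitance values are hypothetical, and the internal capacitance Cint is ignored.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays (seconds) for the three input patterns."""
    return {
        # Both pMOS devices conduct in parallel: effective resistance Rp/2
        "low_to_high_both_inputs": 0.69 * (rp / 2) * cl,
        # Only one pMOS device conducts
        "low_to_high_one_input": 0.69 * rp * cl,
        # Both nMOS devices conduct in series: effective resistance 2Rn
        "high_to_low": 0.69 * (2 * rn) * cl,
    }


# Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
for pattern, seconds in delays.items():
    print(pattern, seconds)
```

With these values the rising output is twice as fast when both inputs fall together as when only one falls, which is the input-pattern dependence described above.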
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] (0 to 400 ps) for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1]

Input Data Pattern    Delay (psec)
A = B = 0→1           67
A = 1, B = 0→1        64
A = 0→1, B = 1        61
A = B = 1→0           45
A = 1, B = 1→0        80
A = 1→0, B = 1        81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
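Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single transistor.
[Figure: static CMOS gate with inputs In1, In2, ... InN driving both the PUN (connected to VDD) and the PDN, with output F]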
Pseudo NMOS Operation
The pull-down network of the gate is the same as for a fully complementary gate.
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate inputs are at Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.
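Because both networks conduct when the output is low, the low level is set by a ratio. The sketch below treats the on transistors as linear resistors, which is a deliberate simplification; the function names and the resistance values are assumptions made for illustration only.

```python
# Crude resistive-divider view of a pseudo-NMOS gate with its output low.
# Real devices are non-linear; the resistor values are assumed, not measured.

def pseudo_nmos_vol(vdd, r_pullup, r_pulldown):
    """Output low voltage when the pMOS load and the PDN conduct together."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def static_power_when_low(vdd, r_pullup, r_pulldown):
    """Power drawn from VDD through the VDD-to-VSS conduction path."""
    return vdd ** 2 / (r_pullup + r_pulldown)


if __name__ == "__main__":
    vdd = 2.5
    for r_pu in (20e3, 40e3, 80e3):  # weaker pull-up gives lower VOL
        vol = pseudo_nmos_vol(vdd, r_pu, 10e3)
        p = static_power_when_low(vdd, r_pu, 10e3)
        print(f"Rpu = {r_pu / 1e3:4.0f} k  VOL = {vol:.2f} V  P = {p * 1e6:.0f} uW")
```

A weaker pull-up lowers VOL and the static power, at the cost of a slower rising output.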
Pseudo NMOS Characteristics
[Figure: pseudo-NMOS VTC, Vout (V) versus Vin (V), for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.
Disadvantages
The higher pull-up resistance increases the delay, so circuits are slower.
Static power is dissipated through the conduction path between VDD and VSS whenever the output is low.
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation and adds a clock input that sequences the precharge and conditional evaluation phases.
[Figure: dynamic gate with precharge transistor Mp and evaluate transistor Me, both clocked by Clk, a PDN with inputs In1, In2 and In3, and output Out loaded by CL]
Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n devices (n N-type + n P-type).
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pull-down transistor controlled by the clock keeps the precharge node from being drained.
The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs create no conducting path, the gate's output is left charged at logic 1.
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: dynamic gate with a PDN of (A in series with B) in parallel with C. Precharge (Clk = 0): Mp on, Me off, Out = 1. Evaluate (Clk = 1): Mp off, Me on, Out = ((A·B)+C)']
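The precharge and evaluate rules can be captured in a small behavioural model. This is a sketch, not a circuit simulation: the class name DynamicGate and its methods are hypothetical, charge leakage and charge sharing are ignored, and the example PDN is the (A·B)+C network from the figure.

```python
# Behavioural model of an n-type dynamic gate: Out = NOT pdn(inputs).
# DynamicGate is a hypothetical name used only for this illustration.

class DynamicGate:
    def __init__(self, pdn):
        self.pdn = pdn  # function returning True when the PDN conducts
        self.out = 1    # logic value stored on CL

    def precharge(self):
        """CLK = 0: Mp on, Me off, output node charged to 1."""
        self.out = 1
        return self.out

    def evaluate(self, **inputs):
        """CLK = 1: Mp off, Me on, output discharges if the PDN conducts."""
        if self.pdn(**inputs):
            self.out = 0  # stays 0 until the next precharge
        return self.out


if __name__ == "__main__":
    gate = DynamicGate(lambda a, b, c: bool((a and b) or c))
    gate.precharge()
    print(gate.evaluate(a=1, b=0, c=0))  # PDN off, output keeps its charge
    print(gate.evaluate(a=1, b=0, c=1))  # C rises, output discharges
    print(gate.evaluate(a=1, b=0, c=0))  # C falls again, output cannot recover
```

The last call shows why inputs may make at most one transition during evaluation: once the node has discharged, removing the input does not restore it.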
Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full-swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation usually higher than static CMOS
  no glitching, but higher transition probabilities and extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  low noise margin (NML)
Advantages and Disadvantages
Advantages
Low area and higher speed than static complementary gates
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
They require the introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power
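Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters (In, Out1, Out2), each with Mp and Me clocked by Clk, and waveforms of Clk, In, Out1 and Out2 versus time t, marking VTn on Out1 and a voltage drop ΔV on Out2]
Only 0 -> 1 transitions allowed at inputs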
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 falls to VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can make only a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.
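Domino CMOS Logic
[Figure: domino gate with a dynamic stage (Mp, PDN with inputs In1, In2 and In3, Me, clocked by Clk) followed by a static inverter driving Out1; the dynamic node goes 1 -> 1 or 1 -> 0]
Domino logic is the combination of a dynamic logic stage and a static inverter.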
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
If the inputs create no conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a logic 1 to logic 0 transition
Very high speed
  the static inverter can be skewed, since only its L-H transition matters
  input capacitance is reduced, giving a smaller logical effort
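Cascading Domino Logic Gates
[Figure: two cascaded domino gates. The first (inputs In1, In2, In3) drives Out1 through its static inverter into the PDN of the second (inputs In4, In5), which drives Out2. Dynamic nodes go 1 -> 1 or 1 -> 0; inverter outputs go 0 -> 0 or 0 -> 1]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.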
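The 0 -> 1 property of a domino chain can be illustrated with a behavioural sketch. The function names are hypothetical, each stage is assumed to be a two-input AND (previous stage output AND a primary input), and the first stage is assumed to see a constant 1 on its chained input.

```python
# Behavioural sketch of cascaded domino AND2 stages during one clock cycle.
# Function names and the AND2 chain topology are assumptions for illustration.

def precharge_outputs(n_stages):
    """After precharge every dynamic node is 1, so every inverter output is 0."""
    return [0] * n_stages


def domino_stage(pdn_conducts):
    """One domino gate in evaluation: dynamic node followed by static inverter."""
    dynamic_node = 0 if pdn_conducts else 1  # precharged to 1, may discharge
    return 1 - dynamic_node                  # output stays 0 or rises to 1


def evaluate_chain(inputs):
    """Stage k computes (output of stage k-1) AND inputs[k]."""
    outputs = []
    prev = 1  # assumed constant 1 on the chained input of the first stage
    for x in inputs:
        prev = domino_stage(prev == 1 and x == 1)
        outputs.append(prev)
    return outputs


if __name__ == "__main__":
    before = precharge_outputs(4)
    after = evaluate_chain([1, 1, 0, 1])
    print(before, "->", after)
    # No output ever falls during evaluation, so downstream PDNs never
    # see a 1 -> 0 input transition.
    assert all(b <= a for b, a in zip(before, after))
```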
Clocked CMOS Logic
It is static CMOS combined with two clocked transistors:
  the Clk input is given to one additional NMOS
  the Clk' input is given to one additional PMOS
For a fan-in of n, this logic needs 2n+2 transistors.
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is ON.
When Clk goes low, the logic is not evaluated, because the clocked NMOS is OFF.
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
More area
More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below.
Stages of N logic alternate with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): P-logic outputs precharge to ground while N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example logic below:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: Inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figures: BiCMOS inverter and NAND gate]
Structured Design
A digital block or standard cell is designed at the following levels:
Logic level
Circuit level
Stick diagram
Layout
Examples: Combinational Logic
5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter
Adders
1-bit full adder
Manchester Carry Chain Adder
Enhancement techniques
  Carry Select Adders
  Carry Skip Adders
  Carry Look-ahead Adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A · ~B
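The three carry cases can be checked exhaustively with a short script. This is a sketch using single-bit integers; the function names are chosen for illustration, and the sum and carry expressions S = P ⊕ C and Cout = G + P·C are the standard full-adder identities.

```python
# Propagate / generate / kill signals of a 1-bit full adder.
# Function names are illustrative; a, b, c are single bits (0 or 1).

def pgk(a, b):
    """Return (P, G, K) for one bit position."""
    p = a ^ b              # propagate: Cout = C
    g = a & b              # generate:  Cout = 1
    k = (1 - a) & (1 - b)  # kill:      Cout = 0
    return p, g, k


def full_adder(a, b, c):
    """Return (sum, carry_out) using S = P xor C and Cout = G + P*C."""
    p, g, _ = pgk(a, b)
    return p ^ c, g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            p, g, k = pgk(a, b)
            assert p + g + k == 1  # exactly one case applies per input pair
            for c in (0, 1):
                s, cout = full_adder(a, b, c)
                assert s + 2 * cout == a + b + c
            print(f"A={a} B={b}: P={p} G={g} K={k}")
```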
Manchester Carry Chain
Built from switch logic using propagate, generate & kill.
Manchester Chain adders
The carry node is precharged with the clock signal instead of being driven through logic.
The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor.
Generate signal (G = A · B)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and Cout]
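A functional sketch of the cascade makes the Cin-to-Cout critical path visible: each stage has to wait for the carry of the stage before it. The function names and the LSB-first bit-list convention are assumptions made for this example.

```python
# Functional model of a carry-ripple adder built from cascaded full adders.
# Bit lists are LSB first; names are illustrative.

def full_adder(a, b, cin):
    """Return (sum, carry_out) for single-bit inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists; the carry ripples stage by stage."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # stage i needs the carry of stage i-1
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    # 11 + 5 = 16: every stage produces a carry, the worst case for delay.
    print(ripple_add([1, 1, 0, 1], [1, 0, 1, 0]))
```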
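Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries are generated in parallel.
Each output carry is expressed in terms of the generate and propagate signals.
The expressions for the 4-bit CLA are:
C1 = G0 + P0 C0
C2 = G1 + P1 G0 + P1 P0 C0
C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0
C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0
4-Stage Carry look-ahead adder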
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
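A 4-bit look-ahead unit can be sketched by expanding Ci+1 = Gi + Pi·Ci until every carry depends only on the G and P signals and on C0. The function name and the LSB-first bit lists are assumptions of this example.

```python
# 4-bit carry look-ahead: every carry is computed directly from G, P and C0.
# Bit lists are LSB first; the function name is illustrative.

def cla4(a_bits, b_bits, c0):
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    sums = [p[0] ^ c0, p[1] ^ c1, p[2] ^ c2, p[3] ^ c3]
    return sums, [c1, c2, c3, c4]


if __name__ == "__main__":
    sums, carries = cla4([1, 1, 0, 1], [1, 0, 1, 0], 0)  # 11 + 5
    print(sums, carries)
```

None of the carry expressions refers to another carry, so in hardware all four can be evaluated at the same time, at the cost of wider gates for the higher bits.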
Carry Select Adder
It is also referred to as the conditional sum adder.
The adder is divided into two blocks:
  one block computed with a logical 0 Cin
  another block computed with a logical 1 Cin
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P K1 + (M-1) K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
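The selection scheme and the delay expression can be sketched together. The function names, the 8-bit width and the 4-bit block size are assumptions; each block is modelled with ordinary integer addition rather than at gate level.

```python
# Carry-select adder sketch: each block is computed for Cin = 0 and Cin = 1,
# and the real carry picks one result. Names and sizes are assumptions.

def add_block(a, b, cin, width):
    """Reference adder for one block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    result, carry = 0, cin
    mask = (1 << block) - 1
    for shift in range(0, width, block):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)  # speculative, Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)  # speculative, Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)  # 2:1 multiplexer
        result |= s << shift
    return result, carry


def carry_select_delay(m, p, k1, k2):
    """T = P*K1 + (M-1)*K2, as given in the text."""
    return p * k1 + (m - 1) * k2


if __name__ == "__main__":
    print(carry_select_add(0b10111011, 0b01100101))
    print(carry_select_delay(m=2, p=4, k1=1.0, k2=0.5))
```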
Carry Skip Adder (CSA)
It is also called the Carry Bypass Adder.
It looks for cases in which the carry-out of a group of bits is identical to the carry-in.
It is typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder
Delay of CSA
The worst-case delay T is given by T = 2(P-1) K1 + (M-2) K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
Delay for Ripple and Carry Skip Adder
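The bypass condition and the two delay expressions can be compared numerically. In this sketch M is taken to be the number of blocks and P the number of cells per block, as for the carry select adder; the ripple delay of one K1 per bit and the example delay values are assumptions.

```python
# Carry-skip block logic plus delay comparison against a ripple adder.
# Names, the ripple model (one K1 per bit) and the delay values are assumed.

def skip_block_carry(a_bits, b_bits, cin):
    """Carry out of one block; bypasses the chain when every bit propagates."""
    if all(a != b for a, b in zip(a_bits, b_bits)):  # P0 and P1 and ... = 1
        return cin                                   # skip path
    carry = cin
    for a, b in zip(a_bits, b_bits):                 # normal ripple path
        carry = (a & b) | (carry & (a ^ b))
    return carry


def ripple_delay(n_bits, k1):
    return n_bits * k1


def carry_skip_delay(m, p, k1, k2):
    """T = 2(P-1)K1 + (M-2)K2, as given in the text."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


if __name__ == "__main__":
    k1, k2 = 1.0, 0.5
    for m, p in ((2, 4), (4, 4), (8, 4)):
        print(f"{m * p:2d} bits: ripple {ripple_delay(m * p, k1):5.1f}, "
              f"skip {carry_skip_delay(m, p, k1, k2):5.1f}")
```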
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
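The shift-and-add structure can be sketched for unsigned operands as follows. The function name and the default 4-bit width are assumptions; each loop iteration corresponds to one partial product.

```python
# Unsigned shift-and-add multiplication, one multiplier bit per step.
# The function name and default width are illustrative.

def shift_add_multiply(multiplicand, multiplier, width=4):
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:          # examine multiplier bit i
            product += multiplicand << i   # add the shifted multiplicand
    return product


if __name__ == "__main__":
    print(shift_add_multiply(0b1011, 0b0110))  # 11 * 6
```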
Multipliers
Serial-parallel Multiplier
Braun Array
2's complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
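The reduction to two vectors can be sketched with word-level 3:2 carry-save adders. This is a simplified model: it compresses whole rows three at a time rather than optimizing individual columns, and the function names are assumptions.

```python
# Wallace-style reduction: 3:2 compressors shrink the partial-product rows
# to two vectors, which a conventional adder then combines.
# Word-level simplification; names are illustrative.

def csa(x, y, z):
    """3:2 carry-save adder on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, width=4):
    """Return (product, number of carry-save levels used)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):         # full groups of three rows
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = nxt
        levels += 1
    return sum(rows), levels                          # final conventional adder


if __name__ == "__main__":
    print(wallace_multiply(13, 11))
```

Each level reduces the row count by roughly a third, which is where the logarithmic completion time comes from.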
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing complex systems - ALU
Complex systems are designed with a top-down approach, with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the layout dimensions of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
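Functionally, a 4x4 crossbar of pass transistors connects each output line to one input line chosen by the shift amount. The sketch below assumes end-around (rotate) behaviour and an LSB-first bit list; both are assumptions for illustration, since the circuit itself is not reproduced here.

```python
# Functional view of a 4x4 barrel shifter, assuming rotate behaviour.
# bits[0] is the LSB; the function name is illustrative.

def barrel_rotate_right(bits, shift):
    """Output line i is connected to input line (i + shift) mod n."""
    n = len(bits)
    return [bits[(i + shift) % n] for i in range(n)]


if __name__ == "__main__":
    word = [1, 0, 0, 0]  # value 1, LSB first
    for s in range(4):
        print(s, barrel_rotate_right(word, s))
```

Any shift amount takes a single pass through the array, which is what distinguishes a barrel shifter from a shift register that needs one clock per position.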
Routing Power rails for ALU
NMOS Pass Transistors
Require one transistor and one gate signal
Transmit logic 0 well, but when Vdd is applied to the drain, the voltage at the source is only Vdd - Vtn
When switch logic drives gate logic, n-type switches can cause electrical problems:
  An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1,
  since the pull-down current is weaker when a lower gate voltage is applied
  The complementary gate's pull-down will not draw current off the output capacitance as fast as it should
PMOS Pass Transistors
When Vin = Vdd, Vout = Vdd
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
Logic 0 is therefore somewhat degraded through a p-device
Voltage degradation of Pass Transistors
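The threshold drops can be tabulated with a simple model. This sketch ignores body effect, assumes the pMOS gate is tied to VSS = 0 V, and uses example values of Vdd and the thresholds that are not from the slides.

```python
# Steady-state output of pass transistors, ignoring body effect.
# Vdd and threshold values below are assumed examples.

def nmos_pass(v_in, v_gate, vtn):
    """An nMOS passes its input up to at most (gate voltage - Vtn)."""
    return min(v_in, max(v_gate - vtn, 0.0))


def pmos_pass(v_in, vtp_abs):
    """A pMOS with its gate at 0 V passes its input down to at least |Vtp|."""
    return max(v_in, vtp_abs)


if __name__ == "__main__":
    vdd, vtn, vtp_abs = 5.0, 1.0, 1.0
    one_drop = nmos_pass(vdd, vdd, vtn)
    print("nMOS passing Vdd:", one_drop)                          # Vdd - Vtn
    print("series nMOS, gates at Vdd:", nmos_pass(one_drop, vdd, vtn))
    print("gate driven by degraded level:", nmos_pass(vdd, one_drop, vtn))
    print("pMOS passing 0 V:", pmos_pass(0.0, vtp_abs))           # |Vtp|
```

Putting pass transistors in series with their gates at Vdd costs only one threshold drop, whereas driving a gate from an already degraded level costs a second one.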
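Pass Transistor Ckts
[Figures: an nMOS pass transistor with gate and drain at VDD gives Vs = VDD - Vtn; a pMOS pass transistor passing VSS gives Vs = |Vtp|; series nMOS pass transistors with gates at VDD pass VDD - Vtn; an nMOS pass transistor whose gate is driven at VDD - Vtn passes only VDD - 2Vtn]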
Complementary Pass Transistor Logic
[Figure: (a) CPL structure: a pass-transistor network and its inverse both take the inputs A, A', B, B' and produce F and F'. (b) Example gates: AND/NAND (F = AB, F' = (AB)'), OR/NOR (F = A+B, F' = (A+B)') and EXOR/NEXOR (F = A ⊕ B, F' = (A ⊕ B)')]
Advantages and Disadvantages
Advantages
Fewer transistors
No static power consumption
Disadvantages
Output voltage degrades
Not an ideal switch, due to series resistance
Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages
Fewer transistors compared to CMOS
No static power consumption
Efficient building of complex gates
Disadvantages
Not an ideal switch, due to series resistance
Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
Zpu : Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay
Sizing of NMOS Nand Gate
The ratio of the pull-up impedance to the total impedance of all n pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to give a correct output voltage level.
The nMOS NAND gate geometry reveals two factors:
The area of the NAND gate is greater than that of the inverter, because of the additional pull-down transistors and the corresponding increase in the length of the pull-up transistor.
The delay also increases in direct proportion to the number of inputs.
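The ratio rule shows directly why the pull-up grows with the number of inputs. In the sketch below, impedances are expressed in units of a minimum-size device (an assumed normalisation) and the function names are illustrative.

```python
# Pull-up sizing for an n-input nMOS NAND gate under the 4:1 ratio rule.
# Impedances are in units of a minimum-size device (assumed normalisation).

def min_pullup_impedance(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu that keeps Zpu : (n * Zpd) at ratio : 1."""
    return ratio * n_inputs * z_pd


def ratio_ok(z_pu, n_inputs, z_pd=1.0, ratio=4.0):
    """True when the pull-up to total pull-down ratio meets the rule."""
    return z_pu / (n_inputs * z_pd) >= ratio


if __name__ == "__main__":
    for n in (1, 2, 3, 4):  # n = 1 is the inverter
        print(f"{n}-input: Zpu >= {min_pullup_impedance(n):.0f}")
```

The required pull-up impedance, and with it the pull-up length, scales linearly with the number of series pull-down devices.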
NMOS Nor Gate
Since the pull-down transistors in a NOR gate are in parallel, the pull-up to pull-down ratio is the same for every input, so the gate has the same characteristics as an inverter.
The area occupied is reasonable, since the pull-up transistor does not need to be lengthened.
So the NOR gate is preferable to the NAND gate.
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred as Conditional sum adder
The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin
S and Cout are selected by actual Cin using 21 Multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The Delay of n-bit adder is given by T=PK1+(M-1)K2
where M=No of Blocks in the adder
P=No of Cells of each block K1=Delay through one adder
cell
K2 = Delay through Multiplexer
Carry Skip Adder (CSA)
It is also called as Carry Bypass Adder
Looks for cases in carry-out of a set of bits is identical to carry in
Typically organized into m-bit stages If Ai neBi for every bit in stage then
bypass gate sends stagersquos carry input directly to carry output
Simplified Carry-Skip Adder Logic
FAFA
aibi
sisi+1
ai+1bi+1
cici+1
pipi+1
skip signal
ci+1
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1)then C3 = C0
else C3=output carry of stage4 adder
Delay of CSA
Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder
cell k2=delay for skipping the
carry over a block
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands a multi-level carry-skip adder
was 14 faster and its power dissipation was 58
of that of the carry-lookahead adder
Using 64-bit operands a one-level carry-skip adder
was 38 slower and its power consumption is 68
of the the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-
Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier A Wallace tree is an implementation of an adder
tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors
(carry and sum) Carry and sum outputs combined using a
conventional adder Compresses the no of stages of partial products
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in Top-Down approach with the help of CAD tools
Partition the system sensibly Aiming for simple interconnection and high
regularity between sub-systems Generate and verify each section of the
design Calculate the dimensions of the layout of sub-
systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged
through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device
Voltage degradation of Pass Transistors
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
• A CMOS Nand gate has no ratio restriction like that of the nMOS Nand gate, but its geometry must be chosen to keep the gate symmetric by
  - allowing for the extended fall time of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on VDD/2
• A CMOS Nor gate has series p-transistors, which increase the resistance and delay
• This affects the transfer characteristic and reduces noise immunity
• So the geometries of the nMOS and pMOS transistors should be adjusted
CMOS Logic
• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic
Static CMOS Switch Delay Model
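[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates. Each transistor is replaced by a switch with on-resistance Rn or Rp (Req for the inverter), with load capacitance CL at the output and internal node capacitance Cint.]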
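Input Pattern Effects on Delay
• Delay is dependent on the pattern of inputs
• Low to high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High to low transition
  - both inputs go high: delay is 0.69 (2 Rn) CL
[Figure: NAND2 switch model with parallel pull-up resistances Rp (A, B), series pull-down resistances Rn (A, B), load CL and internal capacitance Cint]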
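The three first-order delay expressions for the two-input Nand gate (0.69 (Rp/2) CL, 0.69 Rp CL and 0.69 (2 Rn) CL) can be evaluated numerically. This is a minimal sketch: the resistance values are assumed for illustration only, and the internal capacitance Cint is ignored.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 propagation delays in seconds (Cint ignored)."""
    return {
        "lh_both_fall": 0.69 * (rp / 2) * cl,  # two pMOS in parallel
        "lh_one_falls": 0.69 * rp * cl,        # single pMOS conducts
        "hl_both_rise": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }


# Assumed example values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
for case, delay in nand2_delays(rp=10e3, rn=5e3, cl=100e-15).items():
    print(f"{case}: {delay * 1e12:.0f} ps")
```

With equal CL, the low-to-high delay halves when both inputs fall together, because the two pull-up transistors then conduct in parallel.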
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] for the input patterns A=B=1->0; A=1, B=1->0; A=1->0, B=1]

Input data pattern      Delay (ps)
A=B=0->1                67
A=1, B=0->1             64
A=0->1, B=1             61
A=B=1->0                45
A=1, B=1->0             80
A=1->0, B=1             81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
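Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to a single transistor
[Figure: static CMOS gate in which inputs In1, In2, ..., InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), with output F]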
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate inputs are at VDD, both the p-type and n-type transistors are on, and together they determine the gate's output voltage
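Because the pull-up and pull-down networks conduct at the same time when the output is low, the output low level of a pseudo-NMOS gate depends on their relative strengths. A rough sketch, assuming both networks can be treated as linear resistors (a simplification, since real transistors are nonlinear) and using an assumed supply of 2.5 V:

```python
def pseudo_nmos_vol(vdd, r_pullup, r_pulldown):
    """Output low level of a ratioed gate modelled as a resistive divider."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


VDD = 2.5  # assumed supply voltage in volts

# A weaker (more resistive) pull-up gives a lower, cleaner logic 0.
for ratio in (1, 2, 4, 8):
    vol = pseudo_nmos_vol(VDD, r_pullup=ratio, r_pulldown=1.0)
    print(f"Rpu/Rpd = {ratio}: VOL ~ {vol:.2f} V")
```

The same ratio also sets the pull-up speed, which is why the pull-up cannot simply be made arbitrarily weak.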
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity
Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower
• Higher static power dissipation, because of the conduction path between VDD and VSS
Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases
[Figure: dynamic gate with precharge pMOS Mp and evaluate nMOS Me, both driven by Clk, a PDN with inputs In1, In2, In3, and output Out loaded by CL]
• Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)
Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - requires only n + 2 (n + 1 N-type + 1 P-type) transistors
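The device counts quoted above (2n for static CMOS, n + 2 for a dynamic gate) can be tabulated to show how the saving grows with fan-in. The function names are illustrative.

```python
def static_cmos_transistors(fan_in):
    """n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """n N-type logic devices, one clocked N-type, one precharge P-type."""
    return fan_in + 2


print("fan-in  static  dynamic")
for n in range(2, 9):
    print(f"{n:6d}  {static_cmos_transistors(n):6d}  {dynamic_cmos_transistors(n):7d}")
```

For a fan-in of 2 the two styles need the same number of devices; the dynamic gate only saves transistors for larger fan-in.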
Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pull-down transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
[Figure: dynamic gate with inputs A, B, C. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off, Me is on and Out = ((A·B)+C)']
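The precharge and evaluate behaviour, including the rule that a discharged output cannot recover until the next precharge, can be captured in a small behavioural model. This is a logic-level sketch only (no timing, leakage or charge sharing); the class and method names are illustrative.

```python
class DynamicGate:
    """Logic-level model of a precharge/evaluate dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn  # function returning a truthy value when the PDN conducts
        self.out = 1    # output node, held on the load capacitance

    def precharge(self):
        """CLK = 0: the p-type transistor charges the output to logic 1."""
        self.out = 1

    def evaluate(self, a, b, c):
        """CLK = 1: the output discharges if the PDN conducts, else it holds."""
        if self.pdn(a, b, c):
            self.out = 0
        return self.out


# PDN implementing (A.B)+C, so the gate output is ((A.B)+C)'.
gate = DynamicGate(lambda a, b, c: (a and b) or c)

gate.precharge()
print(gate.evaluate(1, 1, 0))  # PDN conducts, output discharged
print(gate.evaluate(0, 0, 0))  # inputs fell again, but the charge is gone
gate.precharge()
print(gate.evaluate(0, 1, 0))  # PDN off, output stays precharged
```

The second call shows why inputs may make at most one transition during evaluation: the model has no way to restore the output before the next precharge.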
Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases
• Require the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power
Cascading Dynamic Gates
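[Figure: two cascaded dynamic inverters clocked by Clk (In -> Out1 -> Out2), each with precharge transistor Mp and evaluate transistor Me, and waveforms of Clk, In, Out1 and Out2 versus time; Out2 loses charge (ΔV) until Out1 falls to VTn]
Only 0 -> 1 transitions allowed at inputs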
Cascading Dynamic Gates - Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge fixes the problem
Domino CMOS Logic
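[Figure: domino gate: a dynamic stage (precharge Mp, evaluate Me, PDN with inputs In1, In2, In3, clocked by Clk) followed by a static inverter driving Out1; during evaluation the dynamic node makes a 1 -> 1 or 1 -> 0 transition]
• Domino logic is the combination of a dynamic logic stage and a static inverter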
Domino CMOS Logic operation
Precharge Phase (same as dynamic logic)
Evaluate Phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, because each dynamic node can only make a logic 1 to logic 0 transition during evaluation
• Very high speed
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
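[Figure: two cascaded domino gates clocked by Clk. The first has PDN inputs In1, In2, In3 and output Out1; the second has PDN inputs including In4 and In5 and output Out2. Dynamic nodes make 1 -> 1 or 1 -> 0 transitions; inverter outputs make 0 -> 0 or 0 -> 1 transitions]
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 -> 1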
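The cascading rule can be illustrated with a logic-level model of two domino stages. This is a sketch with hypothetical names and an arbitrary pair of example functions; it models logic values only, not timing or charge effects.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (logic-level model)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function returning a truthy value when the PDN conducts
        self.node = 1   # precharged dynamic node

    @property
    def out(self):
        return 1 - self.node  # static inverter

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0
        return self.out


stage1 = DominoGate(lambda a, b: a and b)  # Out1 = A AND B
stage2 = DominoGate(lambda x, c: x or c)   # Out2 = Out1 OR C


def clock_cycle(a, b, c):
    stage1.precharge()
    stage2.precharge()
    # After precharge every domino output is 0, so the next stage's PDN is off.
    assert stage1.out == 0 and stage2.out == 0
    out1 = stage1.evaluate(a, b)      # Out1 can only go 0 -> 1
    return stage2.evaluate(out1, c)


for a, b, c in ((1, 1, 0), (0, 1, 0), (0, 0, 1)):
    print(a, b, c, "->", clock_cycle(a, b, c))
```

Both example functions are non-inverting, which matches the restriction that domino logic cannot implement inverting gates directly.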
Clocked CMOS Logic
• It is a static CMOS gate combined with two clocked transistors:
  - the Clk input drives one additional nMOS
  - the Clk' input drives one additional pMOS
• A fan-in of n requires 2n + 2 transistors
• When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS is on
• When Clk goes low, the logic is not evaluated, because the clocked nMOS is off
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
• Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic, with stage outputs X, G and Z:
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'
BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits: Inverter, Nand gate, Nor gate
BiCMOS Circuits
[Figures: BiCMOS inverter and Nand gate]
Structured Design
• Designing the digital block or standard cell at the following levels:
  - Logic level
  - Circuit level
  - Stick diagram
  - Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter
Adders
• 1-bit full adder
• Manchester Carry Chain Adder
• Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
• 32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) nMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  - G = A · B
• Propagate: Cout = C
  - P = A ⊕ B
• Kill: Cout = 0, independent of C
  - K = ~A · ~B
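The generate, propagate and kill definitions can be checked exhaustively against ordinary addition. The sketch below uses the standard full-adder identities Sum = P ⊕ C and Cout = G + P·C, which follow from the definitions above; the function names are illustrative.

```python
from itertools import product


def pgk(a, b):
    """Return (propagate, generate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return p, g, k


def full_adder(a, b, c):
    """Sum and carry-out expressed through propagate and generate."""
    p, g, _ = pgk(a, b)
    return p ^ c, g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert 2 * cout + s == a + b + c  # matches integer addition
    p, g, k = pgk(a, b)
    assert p + g + k == 1             # exactly one of P, G, K is active
print("P/G/K full adder verified for all 8 input combinations")
```

The second assertion shows that the three signals are mutually exclusive, which is what allows them to be used as switch controls in a carry chain.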
Manchester Carry Chain
• Built from switch logic using propagate, generate & kill
Manchester Chain adders
• The carry input is precharged by the clock signal instead of passing through the logic
• The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor
• Generate signal (G = A · B)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
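• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder: four cascaded full adders with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1..C3, carry-in Cin and carry-out Cout]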
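A carry-ripple adder is simply a chain of full adders in which each carry-out feeds the next carry-in. A minimal bit-level sketch follows (bit lists are least-significant bit first; the helper names are illustrative):

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, cout)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next cell
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), cout)
```

The loop makes the critical path visible: the last sum bit cannot be computed until every earlier carry has been resolved.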
Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
• The output carry is expressed in terms of the generate and propagate signals
• The expressions for a 4-bit CLA are
  C1 = G0 + P0·C0
  C2 = G1 + P1·G0 + P1·P0·C0
  C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
  C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
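Expanding the recurrence C(i+1) = G(i) + P(i)·C(i), of which C1 = G0 + P0·C0 is the first step, gives every carry directly in terms of the inputs and C0. The sketch below writes out the four expanded carries and checks them against integer addition; the function names are illustrative.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Carries C1..C4 of a 4-bit adder, each computed directly from C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """4-bit addition using look-ahead carries; returns (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    carries = [c0] + cla4_carries(a_bits, b_bits, c0)
    total = 0
    for i in range(4):
        total |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i
    return total, carries[4]


print(cla4_add(9, 7))
```

No carry expression depends on another carry, so in hardware all four can be evaluated in parallel, at the cost of wider gates for the higher bits.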
Carry Select Adder
• Also referred to as the conditional sum adder
• The adder is divided into two blocks: one computed with a logical 0 Cin, the other with a logical 1 Cin
• S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
• The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2, where
  - M = number of blocks in the adder
  - P = number of cells in each block
  - K1 = delay through one adder cell
  - K2 = delay through the multiplexer
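The select mechanism and the delay formula T = P·K1 + (M - 1)·K2 can both be sketched in a few lines. In this illustration each block sum is computed with Python integer addition as a stand-in for a small ripple adder; the block size, function names and example delays are assumptions.

```python
def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition; width must be a multiple of block."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        sum_if_0 = x + y      # block result precomputed for Cin = 0
        sum_if_1 = x + y + 1  # block result precomputed for Cin = 1
        chosen = sum_if_1 if carry else sum_if_0  # 2:1 multiplexer
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


print(carry_select_add(200, 100))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```

Both candidate sums of every block are available after P cell delays, so only the multiplexer delay accumulates from block to block.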
Carry Skip Adder (CSA)
• Also called the carry bypass adder
• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
[Figure: two full adders (bits i and i+1) with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, propagate signals pi, pi+1 and a skip signal]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
Delay of CSA
• The worst-case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2, where
  - K1 = delay through one adder cell
  - K2 = delay for skipping the carry over a block
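The worst-case expression T = 2(P - 1)·K1 + (M - 2)·K2 can be compared with a plain ripple adder. This sketch assumes that P is the number of cells per block and M the number of blocks (the slide does not define them here), that a ripple adder takes one cell delay per bit, and that K1 = K2 = 1 unit; all of these are illustrative assumptions.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case: ripple in the first block, skip the middle, ripple in the last."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-adder delay: one cell delay per bit."""
    return n_bits * k1


N_BITS = 32
for p in (2, 4, 8, 16):
    m = N_BITS // p
    print(f"P={p:2d} M={m:2d}: skip={carry_skip_delay(p, m, 1.0, 1.0):5.1f} "
          f"ripple={ripple_delay(N_BITS, 1.0):5.1f}")
```

Sweeping the block size shows the trade-off: small blocks mean many skips, large blocks mean long ripples inside the first and last block.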
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
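Shift-and-add multiplication forms one partial product per multiplier bit and accumulates it at the matching bit position. A minimal sketch for unsigned operands, with an assumed 4-bit multiplier width and an illustrative function name:

```python
def shift_and_add_multiply(multiplicand, multiplier, width=4):
    """Unsigned multiply: add the shifted multiplicand for each set multiplier bit."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:         # examine multiplier bit i
            product += multiplicand << i  # add the partial product, shifted i places
    return product


print(shift_and_add_multiply(13, 11))
```

A hardware implementation performs the same loop with a shift register and one adder, taking one cycle per multiplier bit.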
Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using the Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2(n)
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
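The reduction of partial products to a sum vector and a carry vector can be sketched with carry-save (3:2) adders operating on whole integers. This is a word-level illustration of the idea for unsigned operands, not a gate-level Wallace tree; the function names are illustrative.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: three operands in, sum and carry vectors out."""
    sum_vec = a ^ b ^ c
    carry_vec = ((a & b) | (a & c) | (b & c)) << 1
    return sum_vec, carry_vec


def wallace_multiply(x, y):
    """Unsigned multiply by reducing partial products three at a time."""
    partials = [x << i for i in range(y.bit_length()) if (y >> i) & 1]
    while len(partials) > 2:
        reduced = []
        for i in range(0, len(partials) - 2, 3):
            reduced.extend(carry_save_add(*partials[i:i + 3]))
        leftover = len(partials) % 3
        reduced.extend(partials[len(partials) - leftover:])  # pass through unchanged
        partials = reduced
    return sum(partials)  # final conventional (carry-propagate) addition


print(wallace_multiply(13, 11))
```

Each pass turns every group of three vectors into two, so the number of passes grows logarithmically with the number of partial products, and only the final addition propagates carries.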
Sequential Logic
• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed with a top-down approach, with the help of CAD tools
• Partition the system sensibly
  - aim for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
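A 4x4 barrel shifter can be viewed as a crossbar of 16 pass transistors in which a one-hot shift control selects which input reaches each output. The sketch below models that crossbar at logic level; the rotation direction and the names are assumptions, since the circuit figure is not reproduced here.

```python
def barrel_shift(in_bits, shift):
    """Rotate a list of bits through an n x n crossbar of pass transistors."""
    n = len(in_bits)
    if not 0 <= shift < n:
        raise ValueError("shift must be in the range 0..n-1")
    select = [int(s == shift) for s in range(n)]  # one-hot shift control lines
    out_bits = []
    for i in range(n):
        bit = 0
        for s in range(n):
            # Pass transistor linking input (i + s) mod n to output i,
            # conducting only when shift line s is active.
            bit |= select[s] & in_bits[(i + s) % n]
        out_bits.append(bit)
    return out_bits


for shift in range(4):
    print(shift, barrel_shift([1, 0, 0, 0], shift))
```

Every output is reached through exactly one conducting transistor regardless of the shift amount, so the shift delay does not depend on the distance shifted.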
Routing Power rails for ALU
Voltage degradation of Pass Transistors
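Pass Transistor Ckts
[Figure: pass-transistor circuits connected between VDD and VSS]
Pass Transistor Ckts
[Figure: voltage levels passed by pass transistors: Vs = VDD - Vtn for an nMOS passing VDD, Vs = |Vtp| for a pMOS passing VSS; nMOS pass transistors in series with gates at VDD still deliver VDD - Vtn, while a degraded VDD - Vtn level driving the gate of another pass transistor gives VDD - 2Vtn]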
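The degraded levels VDD - Vtn, |Vtp| and VDD - 2Vtn follow from one rule: an nMOS pass transistor cannot pull its output above its gate voltage minus Vtn, and a pMOS cannot pull it below its gate voltage plus |Vtp|. A first-order sketch, ignoring body effect and using assumed example voltages:

```python
def nmos_pass(v_in, v_gate, vtn):
    """Level delivered by an nMOS pass transistor (body effect ignored)."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Level delivered by a pMOS pass transistor (body effect ignored)."""
    return max(v_in, v_gate + vtp_abs)


VDD, VSS, VTN, VTP_ABS = 5.0, 0.0, 1.0, 1.0  # assumed example values in volts

single = nmos_pass(VDD, VDD, VTN)          # one nMOS passing a high level
series = nmos_pass(single, VDD, VTN)       # second nMOS in series, gate at VDD
gate_driven = nmos_pass(VDD, single, VTN)  # gate driven by the degraded level
pmos_low = pmos_pass(VSS, VSS, VTP_ABS)    # pMOS passing a low level

print(single, series, gate_driven, pmos_low)
```

Series connection does not accumulate threshold drops, but driving a gate from a degraded signal does, which is why a pass-transistor output should not control another pass transistor's gate.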
Complementary Pass Transistor Logic
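[Figure: complementary pass-transistor logic gates built from inputs A, A', B, B', each producing a function and its complement: F = A·B and F' = (A·B)'; F = A+B and F' = (A+B)'; F = A⊕B and F' = (A⊕B)']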
Pass Transistor Ckts
VDDVDD
VSS
VDD
VDD
VDD VDD VDD
VDD
Pass Transistor Ckts
VDD
VDD V
s = V
DD-V
tn
VSS
Vs = |V
tp|
VDD
VDD
-Vtn V
DD-V
tn
VDD
-Vtn
VDD
VDD
VDD
VDD
VDD
VDD
-Vtn
VDD
-2Vtn
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
• High noise margins
  VOH and VOL are at VDD and GND respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times
  (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
• A CMOS NAND gate has no ratio restriction like the nMOS NAND gate, but its geometry must be chosen to:
  - allow for the extended fall time caused by the series resistance of the series nMOS transistors
  - keep the transfer characteristic centred about VDD/2
• A CMOS NOR gate has series p-transistors, which increase the resistance and the delay
• This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted
CMOS Logic
• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic
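Static CMOS Switch Delay Model
[Figure: switch-level RC models of INV, NAND2 and NOR2. Each transistor is replaced by a switch with an equivalent resistance Req (Rp for pMOS, Rn for nMOS); CL is the load capacitance and Cint the internal node capacitance.]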
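Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs (NAND2 model):
• Low to high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High to low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch-level model with pull-up resistances Rp, pull-down resistances Rn, internal capacitance Cint and load capacitance CL]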
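The three NAND2 delay expressions above can be evaluated directly. The sketch below assumes the first-order RC model only (the internal capacitance Cint is ignored), and the resistance and capacitance values are illustrative, not taken from the slides.

```python
def nand2_delays(rp: float, rn: float, cl: float) -> dict:
    """First-order NAND2 propagation delays from the switch-level RC model."""
    return {
        # Both pMOS devices conduct in parallel: Rp/2.
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,
        # Only one pMOS conducts: Rp.
        "low_to_high_one_input_low": 0.69 * rp * cl,
        # Two nMOS devices conduct in series: 2Rn.
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl,
    }


# Illustrative values: Rp = 12 kΩ, Rn = 6 kΩ, CL = 100 fF.
delays = nand2_delays(rp=12e3, rn=6e3, cl=100e-15)
for case, seconds in delays.items():
    print(f"{case}: {seconds * 1e12:.0f} ps")
```

With one input low the pull-up is twice as slow as with both inputs low, which is the input-pattern dependence the slide describes.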
Delay Dependence on Input Patterns
[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; A = 1→0, B = 1]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
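Pseudo NMOS Logic
Pseudo-nMOS logic is obtained from static CMOS logic by reducing the number of pull-up transistors from N to 1.
[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); output F]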
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate inputs are at VDD, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage
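Because the pull-up and pull-down both conduct when the output is low, the low level depends on their relative strength. The sketch below approximates both networks as linear resistors, which is an assumption made for illustration (real devices are nonlinear); the function names and values are hypothetical.

```python
def pseudo_nmos_vol(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Output-low voltage of the resistive divider formed by pull-up and pull-down."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def static_power_output_low(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Power drawn through the VDD-to-VSS path while the output is low."""
    return vdd ** 2 / (r_pullup + r_pulldown)


# Illustrative values: a weak pull-up (40 kΩ) against a strong pull-down (10 kΩ).
vol = pseudo_nmos_vol(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
power = static_power_output_low(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
print(f"VOL = {vol:.2f} V, static power = {power * 1e6:.0f} uW")
```

A weaker pull-up lowers VOL but slows the rising edge, which is the ratioed-logic trade-off behind the family of VTC curves for different pull-up sizes.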
Pseudo NMOS Characteristics
[Figure: pseudo-nMOS VTC, Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
• Advantages
  - The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity
• Disadvantages
  - The higher pull-up resistance increases the delay, so circuits are slower
  - Static power is dissipated, because a conduction path exists between VDD and VSS whenever the output is low
Dynamic CMOS Logic
• The static power dissipation of pseudo-nMOS logic motivates an alternative, dynamic CMOS logic
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases
[Figure: dynamic gate. A pMOS precharge transistor Mp and an nMOS evaluate transistor Me are both driven by Clk; the PDN (inputs In1, In2, In3) sits between Out and Me; CL is the load capacitance on Out.]
• Two-phase operation
  - Precharge (CLK = 0)
  - Evaluate (CLK = 1)
Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
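The device counts quoted above are easy to tabulate. This is a trivial sketch with hypothetical function names, included only to make the saving concrete for a few fan-in values.

```python
def static_cmos_devices(fan_in: int) -> int:
    """n N-type plus n P-type transistors."""
    return 2 * fan_in


def dynamic_cmos_devices(fan_in: int) -> int:
    """n N-type in the PDN, plus one clocked N-type and one clocked P-type."""
    return fan_in + 2


for n in (2, 4, 8):
    print(f"fan-in {n}: static {static_cmos_devices(n)}, dynamic {dynamic_cmos_devices(n)}")
```

The two counts are equal at a fan-in of 2, and the dynamic gate uses fewer devices for any larger fan-in.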
Dynamic CMOS Logic operation
• Precharge
  - When CLK goes low, the p-type transistor starts charging the precharge capacitance
  - The clocked pull-down (evaluate) transistor is off, which keeps the precharged node from being drained
  - The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
• Evaluate
  - When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
  - The evaluation phase begins, i.e. the clocked n-type pull-down turns on
  - The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  - If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
  - If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
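[Figure: example dynamic gate with pull-down network A·B + C. During precharge (Clk = 0) Mp is on, Me is off and Out = 1; during evaluation (Clk = 1) Mp is off, Me is on and Out = ((A·B) + C)'.]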
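The precharge and evaluate behaviour, including the rule that a discharged output cannot recover until the next precharge, can be captured in a behavioural sketch. It is a logic-level model only (no charge sharing or leakage), and the class name is hypothetical.

```python
class DynamicGate:
    """Logic-level model of a precharged gate; pdn(...) is True when the PDN conducts."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.out = 1

    def precharge(self):
        # Clk = 0: Mp on, Me off, output node charged to 1.
        self.out = 1

    def evaluate(self, **inputs):
        # Clk = 1: Mp off, Me on. A conducting PDN discharges the node;
        # nothing in this phase can charge it back up.
        if self.pdn(**inputs):
            self.out = 0
        return self.out


# Example PDN from the figure: A.B + C, so Out = ((A.B) + C)'.
gate = DynamicGate(lambda A, B, C: bool((A and B) or C))

gate.precharge()
first = gate.evaluate(A=1, B=1, C=0)    # PDN conducts, node discharged
second = gate.evaluate(A=0, B=0, C=0)   # inputs fall back, node stays at 0
gate.precharge()
third = gate.evaluate(A=1, B=0, C=0)    # no path, node keeps its charge
print(first, second, third)
```

The second evaluation shows why inputs may make at most one transition per evaluation phase: the lost charge is not restored when the inputs return to 0.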
Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full-swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation is usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)
Advantages and Disadvantages
• Advantages
  - Low area
  - Higher speed than static complementary gates
• Disadvantages
  - Precharged gates introduce functional complexity, because they must be operated in two distinct phases
  - They require the introduction of a clock signal
  - They are also more sensitive to noise
  - Their clocking signals also consume power and are difficult to turn off to save power
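Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters (In → Out1 → Out2), each with a precharge transistor Mp and an evaluate transistor Me driven by Clk, and waveforms of Clk, In, Out1 and Out2 versus time; Out2 loses charge until Out1 falls to VTn.]
Only 0 → 1 transitions are allowed at inputs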
Cascading Dynamic Gates - Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge will fix it
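Domino CMOS Logic
[Figure: domino gate. A dynamic gate (precharge transistor Mp, PDN with inputs In1, In2, In3, evaluate transistor Me, clocked by Clk) followed by a static inverter that drives Out1.]
The combination of dynamic logic and a static inverter is domino logic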
Domino CMOS Logic operation
• Precharge phase (same as dynamic logic)
• Evaluate phase (modification to dynamic logic)
  - When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
  - The evaluation phase begins, i.e. the clocked n-type pull-down turns on
  - The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  - If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1
  - If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort
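Cascading Domino Logic Gates
[Figure: two cascaded domino gates. Out1 of the first gate (inputs In1, In2, In3) feeds the PDN of the second gate (inputs In4, In5), which drives Out2. During evaluation each dynamic node makes a 1 → 1 or 1 → 0 transition, so each inverter output makes a 0 → 0 or 0 → 1 transition.]
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 → 1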
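The way the static inverter makes cascading safe can be shown with a behavioural sketch of two domino gates in series. It is a logic-level model with hypothetical names; the two-stage function (an AND feeding an OR) is chosen only for illustration.

```python
class DominoGate:
    """Dynamic node plus static inverter; pdn(...) is truthy when the PDN conducts."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.node = 1

    @property
    def output(self):
        # Static inverter on the dynamic node.
        return 1 - self.node

    def precharge(self):
        self.node = 1            # output becomes 0

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0        # output makes its single 0 -> 1 transition
        return self.output


stage1 = DominoGate(lambda a, b: a and b)   # Out1 = A.B
stage2 = DominoGate(lambda x, c: x or c)    # Out2 = Out1 + C


def run(a, b, c):
    stage1.precharge()
    stage2.precharge()
    # After precharge every domino output is 0, so stage 2 sees a 0 input
    # until stage 1 decides to rise.
    assert stage1.output == 0 and stage2.output == 0
    out1 = stage1.evaluate(a, b)
    return stage2.evaluate(out1, c)


print(run(1, 1, 0), run(1, 0, 0), run(0, 0, 1))
```

Because each stage output starts at 0 and can only rise, a later stage never sees the spurious high input that discharges a plain cascaded dynamic gate.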
Clocked CMOS Logic
• It is a combination of:
  - static CMOS
  - a Clk input applied to one additional nMOS transistor
  - a Clk' input applied to one additional pMOS transistor
• A fan-in of n requires 2n + 2 devices
• When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS is on
• When Clk goes low, the logic is not evaluated, because the clocked nMOS is off
Advantages and Disadvantages
• Advantages
  - Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in n-FET devices
• Disadvantages
  - More area
  - More complexity
N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)
• Stages of N logic alternate with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
• CMOS properties
  - Low power
  - Slower
• BJT properties
  - Faster
  - High power
• Implementation: CMOS & BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits
  - Inverter
  - NAND gate
  - NOR gate
BiCMOS Circuits
Inverter, NAND Gate
Structured Design
Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter
Adders
• 1-bit full adder
• Manchester Carry Chain
• Adder Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
• 32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of standard cells of:
• Multiplexers: L = 11λ, W = 7λ
• Inverters
  i) nMOS inverter (8:1 and 4:1 ratio)
     Butting contact: L = 22λ, W = 10λ
     Buried contact: L = 30λ, W = 10λ
  ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1 independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0 independent of C
  K = ~A · ~B
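The three carry conditions can be checked exhaustively. The sketch below uses single-bit integers (0 or 1) and hypothetical function names; it confirms that exactly one of G, P and K is active for any input pair and that Cout = G + P·C matches the arithmetic carry.

```python
def generate_propagate_kill(a: int, b: int) -> tuple:
    """Return (G, P, K) for one bit position; a and b are 0 or 1."""
    g = a & b                # carry out is 1 regardless of carry in
    p = a ^ b                # carry out equals carry in
    k = (1 - a) & (1 - b)    # carry out is 0 regardless of carry in
    return g, p, k


def carry_out(a: int, b: int, c: int) -> int:
    g, p, _ = generate_propagate_kill(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        g, p, k = generate_propagate_kill(a, b)
        print(f"A={a} B={b}: G={g} P={p} K={k}")
```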
Manchester Carry Chain
• Built from switch logic using propagate, generate & kill
Manchester Chain Adders
• The carry node is precharged by the clock signal instead of being driven through the logic
• The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor
• Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
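Carry-Ripple Adder
• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder. Four full adders with inputs A1 B1 to A4 B4 and sums S1 to S4; the carry ripples from Cin through C1, C2, C3 to Cout.]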
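The cascade of full adders can be sketched at the bit level. This is a minimal model with hypothetical function names; operands are lists of bits with the least significant bit first.

```python
def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples stage by stage."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


# 11 + 5 = 16: all four sum bits are 0 and the carry out is 1.
print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0]))  # ([0, 0, 0, 0], 1)
```

Each stage has to wait for the carry of the previous one, which is why the critical path runs from Cin to Cout.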
Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
• The output carry is expressed in terms of the generate and propagate signals
• The expressions for a 4-bit CLA are
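As an illustration, the sketch below writes out the standard textbook look-ahead carries for four bits, assuming Gi = Ai·Bi and Pi = Ai ⊕ Bi as defined for the propagate and generate signals. The function name is hypothetical, bit lists are LSB first, and this is not necessarily the exact form shown on the original slide.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; every carry depends only on G, P and C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


print(cla4([1, 1, 0, 1], [1, 0, 1, 0]))  # 11 + 5 = 16 -> ([0, 0, 0, 0], 1)
```

No carry expression refers to another carry, so in hardware all four can be computed in parallel, at the cost of wider gates for the higher bits.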
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
Carry Select Adder
• It is also referred to as a conditional sum adder
• The adder is divided into two blocks
  - one block computed with a logical 0 Cin
  - another block computed with a logical 1 Cin
• S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by
T = P·K1 + (M - 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
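The delay expression can be evaluated for a few block organisations. The sketch below assumes equal-sized blocks and uses illustrative unit delays; the function name is hypothetical.

```python
def carry_select_delay(p_cells: int, m_blocks: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M - 1)*K2 for M equal blocks of P adder cells."""
    return p_cells * k1 + (m_blocks - 1) * k2


# 16-bit adder with K1 = 1.0 and K2 = 0.5 (arbitrary delay units).
for p, m in ((16, 1), (8, 2), (4, 4), (2, 8)):
    print(f"{m} blocks of {p} cells: T = {carry_select_delay(p, m, 1.0, 0.5)}")
```

Shorter blocks reduce the ripple term P·K1 but add multiplexer stages, so the best block size depends on the ratio of K1 to K2.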
Carry Skip Adder (CSA)
• It is also called a carry bypass adder
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
[Figure: two full adders (inputs ai, bi and ai+1, bi+1; sums si and si+1; carry ci rippling to ci+1) whose propagate signals pi and pi+1 form the skip signal that selects the outgoing carry.]
2-Stage Carry Skip Adder
4-stage carry skip adder
If P0 · P1 · P2 · P3 = 1 then C3 = C0,
else C3 = output carry of the stage 4 adder
Delay of CSA
The worst-case delay T is given by
T = 2(P - 1)·K1 + (M - 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
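The worst-case expression can be compared numerically with a plain ripple adder. In the sketch below P is taken as the number of cells per block and M as the number of blocks, matching the carry-select notation; the ripple delay of n·K1 is an assumption added for the comparison, and the delay values are illustrative.

```python
def carry_skip_delay(p_cells: int, m_blocks: int, k1: float, k2: float) -> float:
    """Worst case T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


def ripple_delay(n_bits: int, k1: float) -> float:
    """Assumed ripple-adder delay: the carry crosses every cell once."""
    return n_bits * k1


# 16-bit adder as 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5 (arbitrary units).
print(carry_skip_delay(4, 4, 1.0, 0.5), ripple_delay(16, 1.0))  # 7.0 16.0
```

The worst case ripples through the first and last blocks and skips the M - 2 blocks in between.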
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands a multi-level carry-skip adder
was 14 faster and its power dissipation was 58
of that of the carry-lookahead adder
Using 64-bit operands a one-level carry-skip adder
was 38 slower and its power consumption is 68
of the the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using the Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2 n
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
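The reduction of partial products to a sum vector and a carry vector can be sketched with carry-save (3:2) compression on whole rows. This is a simplified word-level model of the idea for unsigned operands, with hypothetical function names; a real Wallace tree compresses column by column in hardware.

```python
def carry_save(x: int, y: int, z: int) -> tuple:
    """3:2 compressor on whole words: x + y + z == sum_vec + carry_vec."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a: int, b: int, n_bits: int = 4) -> tuple:
    """Multiply unsigned a and b; returns (product, number of reduction levels)."""
    # One shifted partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        # Compress every complete group of three rows into two rows.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save(*rows[i:i + 3]))
        # Rows left over (one or two) pass through to the next level.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # Final conventional addition of the sum and carry vectors.
    return sum(rows), levels


print(wallace_multiply(13, 11))  # (143, 2)
```

Each level turns every three rows into two, so the number of levels grows logarithmically with the number of partial products.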
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed with a top-down approach, with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Complementary Pass Transistor Logic
A
B
A
B
B B B B
A
B
A
B
F=AB
F=AB
F=A+B
F=A+B
B B
A
A
A
A
F=AYacute
F=AYacute
ORNOR EXORNEXORANDNAND
F
F
Pass-Transistor
Network
Pass-TransistorNetwork
AABB
AABB
Inverse
(a)
(b)
Advantages and Disadvantages
Advantages Less no of Transistors No Static Power Consumption
Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
• High noise margins: VOH and VOL are at VDD and GND respectively
• No static power consumption: there is never a direct path between VDD and VSS (GND) in steady state
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS NAND and NOR Characteristics
• A CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but the geometry should be kept symmetrical by:
  - allowing for the extended fall times caused by the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on VDD/2
• A CMOS NOR gate has series p-transistors, which increase resistance and delay.
• This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted.
CMOS Logic
• Static CMOS logic
• Pseudo-nMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic
Static CMOS Switch Delay Model
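[Figure: switch-level RC models of the NAND2, INV and NOR2 gates. Each pMOS is modelled as a switch with resistance Rp and each nMOS as a switch with resistance Rn; CL is the output load capacitance and Cint the internal node capacitance.]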
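Input Pattern Effects on Delay
Delay depends on the pattern of inputs. For the NAND2 switch model:
• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with pull-up resistances Rp, series pull-down resistances Rn, load capacitance CL and internal node capacitance Cint.]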
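The three delay expressions above can be evaluated directly. The following is a minimal sketch, assuming illustrative values for Rp, Rn and CL that are not taken from the slides; the function name is hypothetical, and the internal capacitance Cint is ignored.

```python
# First-order switch-model delays for a 2-input static CMOS NAND gate.
# rp, rn and cl are hypothetical example values, not taken from the slides.

def nand2_delays(rp: float, rn: float, cl: float) -> dict[str, float]:
    """Return the propagation delay in seconds for each input transition."""
    k = 0.69  # approximately ln(2): time for an RC node to reach 50%
    return {
        "rise_both_inputs_fall": k * (rp / 2) * cl,  # two pMOS in parallel
        "rise_one_input_falls": k * rp * cl,         # one pMOS conducting
        "fall_both_inputs_rise": k * (2 * rn) * cl,  # two nMOS in series
    }


delays = nand2_delays(rp=10e3, rn=5e3, cl=10e-15)
for name, t in delays.items():
    print(f"{name}: {t * 1e12:.1f} ps")
```

With these assumed values the low-to-high delay halves when both inputs fall together, which is the pattern dependence the slide describes.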
Delay Dependence on Input Patterns
[Figure: output voltage [V] against time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern    Delay (ps)
A = B = 0→1           67
A = 1, B = 0→1        64
A = 0→1, B = 1        61
A = B = 1→0           45
A = 1, B = 1→0        80
A = 1→0, B = 1        81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF
Pseudo NMOS Logic
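Pseudo-nMOS logic reduces the pull-up network (PUN) of a static CMOS gate from N input-driven p-transistors to a single p-transistor.
[Figure: static CMOS gate with inputs In1 ... InN driving both the PUN (connected to VDD) and the PDN, with output F.]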
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on. The p-type transistor is used as a resistor.
• When the gate input is 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate input is VDD, both the p-type and n-type transistors are on, and both act together to determine the gate's output voltage.
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] against Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.
Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower.
• Static power is dissipated through the conduction path between VDD and VSS.
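The ratioed output level and the static power can be illustrated with a crude resistive-divider sketch. It assumes that both the always-on p-transistor and the conducting pull-down network behave as fixed linear resistors, which real transistors do not; the function name and the resistance values are hypothetical.

```python
def pseudo_nmos_low_state(vdd: float, r_pullup: float, r_pulldown: float):
    """Return (VOL, static power) while the pull-down network conducts.

    Both devices are approximated as linear resistors (an assumption).
    """
    current = vdd / (r_pullup + r_pulldown)  # DC current from VDD to VSS
    vol = current * r_pulldown               # output low level of the divider
    power = vdd * current                    # static power drawn from VDD
    return vol, power


vol, power = pseudo_nmos_low_state(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
print(f"VOL = {vol:.2f} V, static power = {power * 1e6:.0f} uW")
```

A weaker pull-up (larger r_pullup) lowers both VOL and the static power, at the cost of a slower rising edge, which is the trade-off listed above.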
Dynamic CMOS Logic
• The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.
[Figure: dynamic gate with clocked p-type precharge transistor Mp, pull-down network (PDN) with inputs In1, In2 and In3, clocked n-type evaluation transistor Me, and output Out with load capacitance CL.]
Two-phase operation:
• Precharge (CLK = 0)
• Evaluate (CLK = 1)
Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - A fan-in of n requires 2n devices (n N-type + n P-type).
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
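The device counts can be tabulated for a given fan-in. This is a small sketch; the function name and style labels are hypothetical, the pseudo-nMOS count of n + 1 follows from its single p-type pull-up, and the clocked CMOS count of 2n + 2 is the figure quoted for that style elsewhere in these slides.

```python
def transistor_count(style: str, fan_in: int) -> int:
    """Number of transistors needed by one gate of the given logic style."""
    counts = {
        "static_cmos": 2 * fan_in,       # n N-type + n P-type
        "pseudo_nmos": fan_in + 1,       # n N-type + 1 always-on P-type
        "dynamic_cmos": fan_in + 2,      # n + 1 N-type + 1 P-type
        "clocked_cmos": 2 * fan_in + 2,  # static CMOS + 2 clocked devices
    }
    return counts[style]


for style in ("static_cmos", "pseudo_nmos", "dynamic_cmos", "clocked_cmos"):
    print(style, transistor_count(style, fan_in=4))
```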
Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pull-down transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.
Evaluate
• When CLK goes high, precharging stops: the p-type pull-up turns off.
• The evaluation phase begins: the clocked n-type pull-down turns on.
• The input signals must rise monotonically. If an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs create no such path, the gate's output is left charged at logic 1.
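The two phases can be captured in a purely behavioural model. The sketch below ignores timing, leakage and charge sharing; the class name is hypothetical, and the pull-down function A·B + C is just an example choice.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate (logic values only)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function: inputs -> truthy if the PDN conducts
        self.out = 1    # logic value stored on the output capacitance

    def precharge(self):
        """CLK = 0: Mp is on, Me is off, the output is charged to 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: Mp is off, Me is on; a conducting PDN discharges the node."""
        if self.pdn(*inputs):
            self.out = 0  # stays 0 until the next precharge
        return self.out


gate = DynamicGate(lambda a, b, c: (a and b) or c)  # Out = ((A.B) + C)'
gate.precharge()
print(gate.evaluate(1, 1, 0))  # PDN conducts, the node discharges
print(gate.evaluate(0, 0, 0))  # inputs fell back, but the charge is gone
```

The second call shows why inputs must rise monotonically: once the node has been discharged, removing the conducting path does not restore the logic 1.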
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: dynamic gate computing Out = ((A·B) + C)'. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on.]
Properties of Dynamic Gates
• The logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS:
  - no glitching, but
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn:
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
• They require a clock signal.
• They are more sensitive to noise.
• Their clocking signals consume power and are difficult to turn off to save power.
Cascading Dynamic Gates
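[Figure: two cascaded dynamic inverters clocked by Clk (input In, outputs Out1 and Out2), with waveforms of V against t for Clk, In, Out1 and Out2 and the threshold VTn marked.]
Only 0 → 1 transitions are allowed at inputs.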
Cascading Dynamic Gates - Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.
Domino CMOS Logic
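[Figure: dynamic gate (Mp, PDN with inputs In1, In2 and In3, Me, clocked by Clk) followed by a static inverter that drives Out1.]
Domino logic is the combination of a dynamic logic gate and a static inverter.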
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops: the p-type pull-up turns off.
• The evaluation phase begins: the clocked n-type pull-down turns on.
• The input signals must rise monotonically. If an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
• If the inputs create no such path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed:
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
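[Figure: two cascaded domino gates clocked by Clk through Mp and Me. The first gate (PDN with inputs In1, In2 and In3) drives Out1 through a static inverter; Out1 feeds the second gate (PDN with inputs In4 and In5), which drives Out2.]
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.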
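The effect of the static inverter can be seen in a behavioural sketch of two cascaded domino stages. Timing is ignored, and the class name, the helper run_cycle and the two pull-down functions are hypothetical choices for illustration.

```python
class DominoStage:
    """Dynamic node followed by a static inverter (logic values only)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function: inputs -> truthy if the PDN conducts
        self.node = 1   # precharged dynamic node

    def precharge(self):
        self.node = 1

    @property
    def out(self):
        return 1 - self.node  # static inverter on the dynamic node

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0  # node falls, so the output rises 0 -> 1
        return self.out


stage1 = DominoStage(lambda a, b: a and b)  # Out1 = A.B
stage2 = DominoStage(lambda x, c: x or c)   # Out2 = Out1 + C


def run_cycle(a, b, c):
    """One precharge/evaluate cycle of the two-stage chain."""
    stage1.precharge()
    stage2.precharge()
    # After precharge every stage output is 0, so stage 2 sees a 0 input.
    assert stage1.out == 0 and stage2.out == 0
    out1 = stage1.evaluate(a, b)
    out2 = stage2.evaluate(out1, c)
    return out1, out2


print(run_cycle(1, 1, 0))
print(run_cycle(1, 0, 0))
```

Because each stage output starts at 0 and can only rise, the next stage never sees a falling input during evaluation.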
Clocked CMOS Logic
• It is static CMOS logic with two additional clocked transistors:
  - Clk is applied to an additional NMOS transistor
  - Clk' is applied to an additional PMOS transistor
• A gate with fan-in n therefore needs 2n + 2 transistors.
• When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on.
• When Clk goes low, the logic is not evaluated, because the clocked NMOS is off.
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is NP domino logic (also called NORA logic).
• Stages of N logic alternate with stages of P logic.
  - N-logic stages use the true clock, with normal precharge and evaluation phases and the N-logic tree in the pull-down leg.
  - P-logic stages use the complement clock, with the P-logic tree tied above the output node.
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to VDD.
• During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to VDD.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example logic:
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z.]
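The three stage equations of the example can be checked as plain Boolean functions. This sketch models only the logic, not the clocking; the function name is hypothetical, and Y, F and H are treated as independent primary inputs, which is an assumption since the figure is not available.

```python
def np_example(a: bool, b: bool, y: bool, f: bool, h: bool):
    """Evaluate the three cascaded stages and return (X, G, Z)."""
    x = not (a and b)          # stage 1: X = (A . B)'
    g = (not x) or (not y)     # stage 2: G = X' + Y'
    z = not ((f and g) or h)   # stage 3: Z = (F . G + H)'
    return x, g, z


print(np_example(True, True, True, True, False))
print(np_example(False, False, True, True, False))
```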
BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS and BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figure: BiCMOS inverter and NAND gate.]
Structured Design
A digital block or standard cell is designed at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers and demultiplexers
• Encoders and decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter
Adders
• 1-bit full adder
• Manchester carry chain adder
• Enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• Inverters
  i) NMOS inverter (8:1 and 4:1 ratio)
     - Butting contact: L = 22λ, W = 10λ
     - Buried contact: L = 30λ, W = 10λ
  ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So the adder standard cell is L = 190λ, W = 150λ.
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  - G = A · B
• Propagate: Cout = C
  - P = A ⊕ B
• Kill: Cout = 0, independent of C
  - K = ~A · ~B
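The three signals are mutually exclusive and fully determine the carry out. The following sketch, with hypothetical function names, computes them for one bit position and builds a full adder from them.

```python
def pgk(a: int, b: int) -> tuple[int, int, int]:
    """Return (propagate, generate, kill) for one bit position."""
    p = a ^ b              # propagate: Cout = C
    g = a & b              # generate: Cout = 1
    k = (1 - a) & (1 - b)  # kill: Cout = 0
    return p, g, k


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """1-bit full adder expressed with propagate and generate."""
    p, g, _ = pgk(a, b)
    s = p ^ c           # sum bit
    cout = g | (p & c)  # carry is generated, or propagated from C
    return s, cout


for a in (0, 1):
    for b in (0, 1):
        print(a, b, pgk(a, b))
```

Exactly one of P, G and K is 1 for every input pair, which is what lets a carry chain be built from switches controlled by these signals.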
Manchester Carry Chain
Built from switch logic using propagate, generate and kill.
Manchester chain adders
• The carry input is precharged with the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
• Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sums S1..S4, internal carries C1..C3 and Cout.]
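A bit-level sketch of the cascade makes the critical path visible: each stage must wait for the carry of the previous one. The function names and the LSB-first list representation are choices made for this example.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first)."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sums.append(s)
    return sums, carry


# 11 + 6 with 4-bit operands, LSB first
print(ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]))
```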
Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries are generated in parallel.
• The output carry is expressed in terms of generate and propagate signals.
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
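The 4-bit carry expressions did not survive extraction. The sketch below uses the standard expansion of Ci+1 = Gi + Pi·Ci, which agrees with the C1 expression above; it is offered as an illustration rather than as the slide's own equations, and the function name is hypothetical.

```python
def cla4(a_bits, b_bits, c0):
    """4-bit carry look-ahead addition; bit lists are LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    # Every carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    sums = [p[0] ^ c0, p[1] ^ c1, p[2] ^ c2, p[3] ^ c3]
    return sums, c4


print(cla4([1, 1, 0, 1], [0, 1, 1, 0], 0))  # 11 + 6, LSB first
```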
Carry Select Adder
• It is also referred to as a conditional sum adder.
• The adder is divided into two blocks:
  - one block with Cin = logical 0
  - another block with Cin = logical 1
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by
T = P·K1 + (M − 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
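The delay formula can be evaluated for a concrete partition. This is a minimal sketch with a hypothetical function name; the unit delays K1 = 1.0 and K2 = 0.5 are made-up values, and the ripple-carry figure used for comparison assumes a delay of K1 per cell.

```python
def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M - 1)*K2 for M blocks of P adder cells each."""
    return p * k1 + (m - 1) * k2


# 16-bit adder split into 4 blocks of 4 cells (illustrative unit delays)
t_select = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
t_ripple = 16 * 1.0  # assumed ripple-carry delay: one K1 per cell
print(t_select, t_ripple)
```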
Carry Skip Adder (CSA)
• It is also called a carry bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
[Figure: two full adders (bits i and i+1) with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1 and propagate signals pi, pi+1, which form the skip signal that selects the carry output.]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
Delay of CSA
The worst-case delay T is given by
T = 2(P − 1)·K1 + (M − 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
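Both the skip condition and the worst-case delay are easy to sketch. The function names are hypothetical, P and M are assumed to mean cells per block and number of blocks, and the unit delays are made-up values.

```python
def skip_block_carry(p_bits, c_in: int, ripple_c_out: int) -> int:
    """Carry out of one block: bypass when every bit propagates."""
    return c_in if all(p_bits) else ripple_c_out


def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """Worst case T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


print(skip_block_carry([1, 1, 1, 1], c_in=1, ripple_c_out=0))
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
```

The worst case has the carry ripple through the first block, skip the middle blocks and ripple through the last one, which is where the 2(P − 1) and (M − 2) terms come from.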
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• Optimized column adder tree.
• Combines all partial products into two vectors (carry and sum).
• Carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
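The reduction to two vectors can be sketched with word-level carry-save (3:2) compression. This simplification operates on whole partial-product rows instead of sizing each column separately, as a real Wallace tree would, and it handles unsigned operands only; the function names are hypothetical.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor on whole words: a + b + c == sum + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def wallace_multiply(x: int, y: int, width: int = 8) -> int:
    """Multiply unsigned x by a width-bit unsigned y."""
    # One partial product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    while len(rows) > 2:
        reduced = []
        # Compress each group of three rows into a sum row and a carry row.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that did not fit into a group of three pass through unchanged.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
    return sum(rows)  # final conventional (carry-propagate) addition


print(wallace_multiply(13, 11))
```

Each pass cuts the number of rows by roughly a third without propagating carries, so the number of levels grows logarithmically with the number of partial products.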
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed with a top-down approach, with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Transmission gates
Transmission gates (contd)
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
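[Figure: switch-level delay models of the inverter (INV), NAND2 and NOR2 gates. Each transistor driven by input A or B is replaced by a switch with on-resistance Rn or Rp (Req for a single switch); the output drives CL, and the internal node of a series stack carries Cint.]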
Input Pattern Effects on Delay
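Delay depends on the pattern of inputs. For the two-input NAND gate:
• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with pull-up resistances Rp on inputs A and B, series pull-down resistances Rn, internal capacitance Cint and load CL.]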
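The three first-order expressions above can be compared numerically. The sketch below ignores Cint, and the resistance and capacitance values in the example call are placeholders chosen for illustration, not data from the slides.

```python
def nand2_delays(r_p, r_n, c_load):
    """First-order 0.69*R*C delays of a two-input NAND gate (Cint ignored)."""
    return {
        # Both PMOS devices conduct in parallel: Rp/2.
        "rise_both_inputs_fall": 0.69 * (r_p / 2) * c_load,
        # Only one PMOS device conducts: Rp.
        "rise_one_input_falls": 0.69 * r_p * c_load,
        # Both NMOS devices conduct in series: 2*Rn.
        "fall_both_inputs_rise": 0.69 * (2 * r_n) * c_load,
    }


# Assumed example values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
for name, t in nand2_delays(r_p=10e3, r_n=5e3, c_load=100e-15).items():
    print(f"{name}: {t * 1e12:.0f} ps")
```

The model predicts that the rise time halves when both inputs fall together, which is the trend the input-pattern measurements illustrate.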
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern     Delay (ps)
A = B = 0→1            67
A = 1, B = 0→1         64
A = 0→1, B = 1         61
A = B = 1→0            45
A = 1, B = 1→0         80
A = 1→0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
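Pseudo-nMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one.
[Figure: static CMOS gate. Inputs In1 ... InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output F is taken between the two networks.]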
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on. The p-type transistor is used as a resistor.
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate input is Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.
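Because the pull-up and pull-down devices conduct at the same time, the low output level is set by a ratio. A rough way to see this is to treat both devices as linear resistors, which real transistors are not; the function name and the example values below are assumptions for illustration only.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Output voltage and static power while the pull-down network conducts.

    Linear-resistor approximation of the always-on PMOS load and the PDN.
    """
    current = vdd / (r_pullup + r_pulldown)   # steady current from VDD to VSS
    v_out = current * r_pulldown              # voltage divider at the output
    power = vdd * current                     # static dissipation
    return v_out, power


# Assumed values: VDD = 2.5 V, 40 kOhm load, 10 kOhm pull-down path.
v_ol, p_static = pseudo_nmos_low_output(2.5, 40e3, 10e3)
print(f"VOL = {v_ol:.2f} V, static power = {p_static * 1e6:.0f} uW")
```

A weaker (more resistive) pull-up lowers VOL and the static power, at the cost of a slower rising edge.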
Pseudo NMOS Characteristics
[Figure: pseudo-nMOS voltage transfer characteristic (VTC), Vout [V] versus Vin [V], plotted for W/Lp = 4, 2, 1, 0.5 and 0.25.]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.
Disadvantages
• The higher pull-up resistance increases the delay, so circuits are slower.
• Static power is dissipated through the conduction path between VDD and VSS.
Dynamic CMOS Logic
• The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input that defines precharge and conditional evaluation phases.
[Figure: dynamic gate. Inputs In1, In2 and In3 drive the PDN; the precharge transistor Mp and the evaluation transistor Me are both driven by Clk; the output Out is loaded by CL.]
Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)
Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD through a low-resistance path.
  - a fan-in of n requires 2n devices (n N-type + n P-type)
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - a fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)
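The two device counts above are easy to tabulate. A minimal sketch, with function names invented for this example:

```python
def static_cmos_transistors(fan_in):
    """Fully complementary gate: n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """Dynamic gate: n PDN devices, one clocked NMOS and one precharge PMOS."""
    return fan_in + 2


for n in (2, 3, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

The dynamic gate only wins for a fan-in above 2; at a fan-in of 2 both styles need four transistors.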
Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pull-down transistor controlled by the clock is off, which keeps the precharge node from being drained.
• The length of the CLK = 0 phase is chosen so that the storage node is charged to a solid logic 1.
Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: dynamic gate with inputs A, B and C implementing Out = ((A·B)+C)'. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on.]
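The precharge and evaluate behaviour of the example gate Out = ((A·B)+C)' can be mimicked with a small behavioural model. This is a logic-level sketch only (no charge sharing, leakage or timing), and the class name is made up for the example.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate with Out = ((A.B)+C)'."""

    def __init__(self):
        self.out = 1  # value stored on the load capacitance CL

    def step(self, clk, a, b, c):
        if clk == 0:
            self.out = 1      # precharge: Mp on, Me off
        elif (a and b) or c:
            self.out = 0      # evaluate: PDN conducts and CL is discharged
        # Otherwise the output floats and CL keeps its previous value.
        return self.out


gate = DynamicGate()
print(gate.step(0, 1, 1, 0))  # precharge
print(gate.step(1, 1, 1, 0))  # evaluate with A = B = 1: output discharges
print(gate.step(1, 0, 0, 0))  # inputs drop again: output cannot recover
print(gate.step(0, 0, 0, 0))  # only the next precharge restores logic 1
```

The third call shows why inputs may make at most one transition during evaluation: once discharged, the node stays low until the next precharge.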
Properties of Dynamic Gates
• The logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS:
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn.
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
• A clock signal must be introduced.
• They are more sensitive to noise.
• Their clocking signals consume power and are difficult to turn off to save power.
Cascading Dynamic Gates
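[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluation transistor Me clocked by Clk. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The waveforms of Clk, In, Out1 and Out2 are plotted against time t; Out2 loses a voltage ΔV while Out1 is still above VTn.]
Only 0 → 1 transitions are allowed at the inputs.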
Cascading Dynamic Gates: Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.
Domino CMOS Logic
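[Figure: domino gate. A dynamic stage (PDN with inputs In1, In2 and In3, precharge transistor Mp and evaluation transistor Me clocked by Clk) drives a static inverter whose output is Out1. During evaluation the dynamic node either stays at 1 (1 → 1) or falls (1 → 0).]
Domino logic is the combination of a dynamic logic stage and a static inverter.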
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modified from dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
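The domino behaviour described above can be sketched at the logic level by adding an inverter to a dynamic-node model. The class, the lambda pull-down functions and the two-gate cascade are illustrative assumptions, not circuits taken from the slides.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (logic-level model only)."""

    def __init__(self, pdn):
        self.pdn = pdn   # function of the inputs, true when the PDN conducts
        self.node = 1    # precharged storage node

    def step(self, clk, *inputs):
        if clk == 0:
            self.node = 1            # precharge the storage node
        elif self.pdn(*inputs):
            self.node = 0            # evaluate: node discharges
        return 1 - self.node         # static inverter drives the output


and_gate = DominoGate(lambda a, b: a and b)
or_gate = DominoGate(lambda x, c: x or c)

# Precharge: every domino output is 0, so the next stage sees only 0 inputs.
x = and_gate.step(0, 1, 1)
y = or_gate.step(0, x, 0)
print(x, y)

# Evaluate: outputs can only rise, rippling from stage to stage.
x = and_gate.step(1, 1, 1)
y = or_gate.step(1, x, 0)
print(x, y)
```

Because each output is 0 after precharge and can only rise during evaluation, a domino output is a safe input for the following dynamic stage.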
Properties of Domino Logic
• Only non-inverting logic can be implemented.
• Smaller area than static CMOS.
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed:
  - the static inverter can be skewed, because only the low-to-high output transition matters
  - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
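[Figure: two cascaded domino gates. The first stage (inputs In1, In2, In3) produces Out1 and the second stage (inputs In4, In5) produces Out2, each through its static inverter. Each dynamic node makes a 1 → 1 or 1 → 0 transition and each inverter output a 0 → 0 or 0 → 1 transition.]
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.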
Clocked CMOS Logic
• It is static CMOS logic combined with two additional clocked transistors:
  - the Clk input drives one additional NMOS transistor
  - the Clk' input drives one additional PMOS transistor
• A fan-in of n requires 2n + 2 transistors.
• When Clk goes high, the clocked NMOS transistor is on and the logic of the circuit is evaluated.
• When Clk goes low, the clocked NMOS transistor is off and the logic is not evaluated.
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is NP domino logic (also called NORA logic).
• Stages of N logic alternate with stages of P logic.
  - N logic stages use the true clock and the normal precharge and evaluation phases, with the N logic tree in the pull-down leg.
  - P logic stages use the complement clock, with the P logic tree placed above the output node.
• During precharge, Clk is low (Clk' is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluation, Clk is high (Clk' is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
• Inverter outputs can be used to feed N-blocks from N-blocks, or P-blocks from P-blocks.
Cascading N-P Logic
Example logic:
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z.]
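The three stage equations of the example can be checked with a few lines of Boolean arithmetic. This sketch models only the evaluated logic values, not the precharge phases, and the function name is invented for the example.

```python
def np_logic_example(a, b, y, f, h):
    """Evaluate X = (A.B)', G = X' + Y' and Z = (F.G + H)' for 0/1 inputs."""
    x = 1 - (a & b)             # stage 1
    g = (1 - x) | (1 - y)       # stage 2
    z = 1 - ((f & g) | h)       # stage 3
    return x, g, z


print(np_logic_example(a=1, b=1, y=1, f=1, h=0))
print(np_logic_example(a=0, b=0, y=1, f=1, h=0))
```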
BiCMOS Technology
CMOS properties
• Low power
• Slower
BJT properties
• Faster
• High power
Implementation (CMOS and BJT)
• Core logic: CMOS
• Interface: BJT
Basic circuits
• Inverter
• NAND gate
• NOR gate
[Figures: BiCMOS inverter and BiCMOS NAND gate circuits.]
Structured Design
A digital block or standard cell is designed at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers and demultiplexers
• Encoders and decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter
Adders
• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders
1-bit full adder
[Figures: CMOS full adder, with its sum output and carry output circuits; complementary static CMOS adder.]
Standard cell dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
  i) nMOS inverter (8:1 and 4:1 ratio)
     - with butting contact: L = 22λ, W = 10λ
     - with buried contact: L = 30λ, W = 10λ
  ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
[Figures: two layouts of the full adder.]
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0, independent of C
  K = ~A · ~B
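A short check of these definitions, assuming single-bit 0/1 inputs: exactly one of G, P and K is active for any input pair, and the carry-out follows Cout = G + P·C. The function names are chosen for this sketch.

```python
def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    generate = a & b
    propagate = a ^ b
    kill = (1 - a) & (1 - b)
    return generate, propagate, kill


def carry_out(a, b, c):
    """Carry-out of a full adder expressed through generate and propagate."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        g, p, k = pgk(a, b)
        print(f"A={a} B={b}: G={g} P={p} K={k}")
```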
Manchester Carry Chain
• Built from switch logic using propagate, generate and kill.
Manchester Chain adders
• The carry input is precharged by the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
• Generate signal (G = A'B')
[Figure: 4-stage dynamic Manchester chain.]
Carry-Ripple Adder
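• Simplest design: cascade full adders
• The critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder. Four full adders with inputs A1 to A4 and B1 to B4 and sum outputs S1 to S4; the carry enters as Cin, ripples through C1, C2 and C3, and leaves as Cout.]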
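The cascade of full adders can be written as a loop in which each stage waits for the carry of the previous one. A minimal sketch, assuming bit lists ordered least significant bit first; the helper names are not from the slides.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # each stage needs the previous carry
        sums.append(s)
    return sums, carry


# 7 + 1 with 4-bit operands, LSB first.
print(ripple_carry_add([1, 1, 1, 0], [1, 0, 0, 0]))
```

The loop makes the critical path visible: the last sum bit cannot be computed until the carry has passed through every earlier stage.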
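Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
• The output carry is expressed in terms of the generate and propagate signals.
• The expressions for a 4-bit CLA are:
  C1 = G0 + P0·C0
  C2 = G1 + P1·G0 + P1·P0·C0
  C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
  C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
[Figures: 4-stage carry look-ahead adder; 1-bit CMOS CLA implementing C1 = G0 + P0·C0; 4-bit CMOS CLA.]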
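The look-ahead idea is that every carry is written directly in terms of the generate and propagate signals and C0, so no carry waits for another. The sketch below expands the standard 4-bit carry equations starting from C1 = G0 + P0·C0; the function name and the LSB-first bit ordering are assumptions of this example.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition on bit lists ordered LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate per bit
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate per bit

    # Each carry depends only on g, p and c0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    sums = [p[i] ^ c for i, c in enumerate((c0, c1, c2, c3))]
    return sums, c4


# 9 + 7 with 4-bit operands, LSB first.
print(cla4([1, 0, 0, 1], [1, 1, 1, 0]))
```

The product terms grow with the bit position, which is why practical look-ahead blocks are usually kept to a few bits.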
Carry Select Adder
• It is also referred to as a conditional sum adder.
• The adder is divided into two blocks:
  - one block computes with Cin = logical 0
  - the other block computes with Cin = logical 1
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.
[Figures: carry select adder structure; 8-bit carry select adder.]
Delay of Carry Select Adder
The delay of an n-bit adder is given by
  T = P·K1 + (M − 1)·K2
where
  M  = number of blocks in the adder
  P  = number of cells in each block
  K1 = delay through one adder cell
  K2 = delay through the multiplexer
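Both the selection scheme and the delay formula can be sketched in a few lines. The code below is an illustration under assumptions: bit lists are LSB first, every block is the same size, and the delay units in the example call are arbitrary.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block; returns (sum_bits, carry_out)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, block_size, cin=0):
    """Each block is computed for Cin = 0 and Cin = 1, then selected."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block_size):
        a_blk = a_bits[i:i + block_size]
        b_blk = b_bits[i:i + block_size]
        s0, c0 = ripple_block(a_blk, b_blk, 0)   # speculative result, Cin = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)   # speculative result, Cin = 1
        s_sel, carry = (s1, c1) if carry else (s0, c0)   # 2:1 multiplexer
        sums.extend(s_sel)
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2."""
    return p * k1 + (m - 1) * k2


# 8-bit example built from two 4-bit blocks: 200 + 100.
a = [(200 >> i) & 1 for i in range(8)]
b = [(100 >> i) & 1 for i in range(8)]
print(carry_select_add(a, b, block_size=4))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```

In hardware the two speculative results of every block are computed at the same time, so only the multiplexers lie on the carry path after the first block.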
Carry Skip Adder (CSA)
• It is also called a carry bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• It is typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
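[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carry signals ci and ci+1, and propagate signals pi and pi+1 that are combined into the skip signal.]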
[Figure: 2-stage carry skip adder.]
4-stage carry skip adder
  If (P0 and P1 and P2 and P3) = 1 then C3 = C0
  else C3 = output carry of the stage 4 adder
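The skip rule can be sketched for one block as follows. The function name, the LSB-first bit lists and the returned skip flag are assumptions made for this illustration.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block; returns (sum_bits, carry_out, skipped)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    carry, sums = cin, []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)            # rippled carry
    skip = all(p)                                 # every bit propagates
    cout = cin if skip else carry                 # bypass gate
    return sums, cout, skip


print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], cin=1))
print(carry_skip_block([1, 1, 0, 0], [1, 0, 0, 0], cin=0))
```

When every bit propagates, the rippled carry equals the carry-in anyway; the bypass changes the delay, not the result.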
Delay of CSA
The worst-case delay T is given by
  T = 2(P − 1)·K1 + (M − 2)·K2
where
  K1 = delay through one adder cell
  K2 = delay for skipping the carry over a block
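The worst-case formula can be evaluated directly. The sketch assumes that P and M keep the meanings used for the carry select adder (cells per block and number of blocks); the delay units in the example are arbitrary.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Assumed example: 16 bits as four 4-bit blocks, K1 = 1.0 and K2 = 0.5.
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
```

The first term covers rippling through the first and last blocks; the second covers skipping the blocks in between.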
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2(n).
• Optimized column adder tree.
• Combines all partial products into 2 vectors (carry and sum).
• The carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
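The reduction to two vectors can be illustrated with word-level carry-save addition. This is a simplified sketch: it applies 3:2 compression to whole partial-product rows rather than column by column as a hardware Wallace tree does, it assumes unsigned operands that fit in n_bits, and its names are invented for the example.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n_bits):
    """Multiply unsigned a and b; returns (product, reduction_levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = nxt
        levels += 1
    # The final sum and carry vectors go through a conventional adder.
    return sum(rows), levels


print(wallace_multiply(13, 11, 4))
print(wallace_multiply(200, 123, 8))
```

Each level turns every three rows into two, so the number of levels grows roughly logarithmically with the number of partial products.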
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed using a top-down approach with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Logic with Transmission gates
Logic with Transmission gates
Advantages and Disadvantages
Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates
Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation
• The pulldown network of the gate is the same as for a fully complementary gate
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate input is VDD, both the p-type and n-type transistors are on, and together they determine the gate's output voltage
Pseudo NMOS Characteristics
[Figure: pseudo-nMOS VTC, Vout [V] versus Vin [V], plotted for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity
Disadvantages
• The higher pull-up resistance increases the delay, so circuits are slower
• More static power is dissipated, because a conduction path exists between VDD and VSS whenever the pulldown network is on
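The trade-off between output low level and static power can be illustrated with a simple resistive divider. This is only a sketch: it assumes both conducting transistors behave as fixed linear resistors (real devices do not), and all component values are hypothetical.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Resistive-divider estimate of the low output level and static power."""
    total = r_pullup + r_pulldown
    vol = vdd * r_pulldown / total    # output voltage with the PDN conducting
    static_power = vdd ** 2 / total   # power drawn through the VDD-VSS path
    return vol, static_power


VDD = 2.5          # volts (hypothetical supply)
R_PULLDOWN = 10e3  # ohms (hypothetical PDN on-resistance)

for r_pullup in (20e3, 40e3, 80e3):
    vol, power = pseudo_nmos_low_output(VDD, r_pullup, R_PULLDOWN)
    print(f"Rp = {r_pullup / 1e3:.0f} kOhm: VOL = {vol:.2f} V, "
          f"static power = {power * 1e6:.0f} uW")
```

In this model a weaker pull-up (larger Rp) lowers both VOL and the static power, at the cost of the slower rising edge noted above.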
Dynamic CMOS Logic
• The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases
[Figure: dynamic gate with precharge transistor Mp and evaluation transistor Me, both driven by Clk, a PDN with inputs In1, In2, In3, and output Out loaded by CL]
• Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)
Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n devices (n N-type + n P-type)
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)
Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pulldown transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create such a path, the gate's output is left charged at logic 1
Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
[Figure: example dynamic gate with inputs A, B and C. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off, Me is on and the PDN implements ((A·B)+C)]
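The precharge/evaluate behaviour and the "cannot be recharged until the next precharge" condition can be captured in a small behavioural model. This is a logic-level sketch only (no charge sharing, leakage or timing); the class name DynamicGate and its methods are made up for this illustration.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate (not a circuit simulation)."""

    def __init__(self, pdn):
        self.pdn = pdn   # function returning a truthy value when the PDN conducts
        self.out = None  # unknown until the first precharge

    def precharge(self):
        # CLK = 0: Mp on, Me off, output node charged to logic 1.
        self.out = 1

    def evaluate(self, *inputs):
        # CLK = 1: Mp off, Me on. A conducting PDN discharges the node;
        # nothing can recharge it before the next precharge.
        if self.pdn(*inputs):
            self.out = 0
        return self.out


# PDN of the example gate: conducts when (A AND B) OR C.
gate = DynamicGate(lambda a, b, c: (a and b) or c)

gate.precharge()
trace = [
    gate.evaluate(0, 0, 0),  # PDN off, node holds its charge
    gate.evaluate(0, 0, 1),  # C rises, node discharges
    gate.evaluate(0, 0, 0),  # C falls again, node stays discharged
]
print(trace)

gate.precharge()
print(gate.out)
```

Calling evaluate several times within one phase models inputs changing during evaluation, which is why a 0 to 1 to 0 glitch on C leaves the output wrongly discharged.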
Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases
• They require the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power
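Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The accompanying waveforms show Clk, In, Out1 and Out2 against time, with the levels VTn and ΔV marked]
• Only 0 -> 1 transitions allowed at inputs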
Cascading Dynamic Gates - Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge fixes the problem
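Domino CMOS Logic
[Figure: domino gate: a dynamic stage (Mp, Me and a PDN with inputs In1, In2, In3, clocked by Clk) followed by a static inverter that drives Out1. The dynamic node makes 1 -> 1 or 1 -> 0 transitions]
• Domino logic is the combination of a dynamic logic gate and a static inverter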
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, because the dynamic node can only make a logic 1 to logic 0 transition
• Very high speed
  - the static inverter can be skewed, since only the L-H transition matters
• Input capacitance reduced: smaller logical effort
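Cascading Domino Logic Gates
[Figure: two cascaded domino gates. The first (PDN inputs In1, In2, In3) drives Out1 through its static inverter; the second (PDN inputs In4, In5) drives Out2. Dynamic nodes make 1 -> 1 or 1 -> 0 transitions; inverter outputs make 0 -> 0 or 0 -> 1 transitions]
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 -> 1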
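A logic-level sketch of a two-gate domino chain shows why cascading is safe: every gate output is 0 after precharge, so a downstream gate that evaluates early sees only 0 -> 1 changes. The class DominoGate and the two example functions are hypothetical, chosen only for illustration.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural model)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function returning a truthy value when the PDN conducts
        self.node = 1   # dynamic (precharged) node

    @property
    def out(self):
        return 1 - self.node  # static inverter

    def precharge(self):
        self.node = 1  # output of the inverter becomes 0

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0  # can only fall during evaluation
        return self.out


g1 = DominoGate(lambda a, b: a and b)  # Out1 = A AND B
g2 = DominoGate(lambda x, c: x or c)   # Out2 = Out1 OR C


def run_cycle(a, b, c):
    g1.precharge()
    g2.precharge()
    # The second gate may start evaluating before the first has settled:
    # it then sees Out1 = 0, which cannot discharge it by mistake.
    g2.evaluate(g1.out, c)
    out1 = g1.evaluate(a, b)
    out2 = g2.evaluate(out1, c)
    return out1, out2


for inputs in [(1, 1, 0), (0, 1, 0), (0, 0, 1)]:
    print(inputs, run_cycle(*inputs))
```

Both gates here are non-inverting (AND, OR), in line with the restriction that domino logic implements only non-inverting functions.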
Clocked CMOS Logic
• It is a static CMOS gate with two additional clocked transistors
  - Clk is applied to one additional NMOS transistor
  - Clk' is applied to one additional PMOS transistor
• A gate with fan-in n therefore uses 2n + 2 transistors
• When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
• When Clk goes low, the logic is not evaluated because the clocked NMOS is off
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
• Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits: Inverter, Nand gate, Nor gate
BiCMOS Circuits
[Figure: BiCMOS inverter and Nand gate]
Structured Design
• Design the digital block or standard cell at the following levels
  - Logic level
  - Circuit level
  - Stick diagram
  - Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter
Adders
• 1-bit full adder
• Manchester Carry Chain
• Adder Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
• 32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers - L=11λ W=7λ
• i) NMOS Inverter (8:1 and 4:1 ratio)
  - Butting contact - L=22λ W=10λ
  - Buried contact - L=30λ W=10λ
• ii) CMOS Inverter - L=35λ W=18λ
• Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  - G = A • B
• Propagate: Cout = C
  - P = A ⊕ B
• Kill: Cout = 0, independent of C
  - K = ~A • ~B
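The three carry signals can be written out bit by bit. A minimal sketch, assuming single-bit integer inputs (0 or 1); the function names are illustrative only.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # Cout = 1 regardless of C
    p = a ^ b              # Cout = C
    k = (1 - a) & (1 - b)  # Cout = 0 regardless of C
    return g, p, k


def full_adder(a, b, c):
    """1-bit full adder expressed through generate and propagate."""
    g, p, _ = carry_signals(a, b)
    s = p ^ c
    cout = g | (p & c)
    return s, cout


for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", full_adder(a, b, c))
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a carry chain be built from three mutually exclusive switches per bit.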
Manchester Carry Chain
• Built from switch logic using propagate, generate & kill
Manchester Chain adders
• The carry input is precharged with the clock signal instead of passing through the logic
• The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor
• Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
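Carry-Ripple Adder
• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder: four full adders with inputs A1..A4 and B1..B4, sums S1..S4, carry-in Cin, internal carries C1..C3 and carry-out Cout]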
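A bit-level sketch of the cascade makes the critical path visible: each stage must wait for the carry of the previous one. Bit lists are least-significant-bit first; the helper names to_bits and from_bits are illustrative only.

```python
def full_adder(a, b, cin):
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # The carry from the previous stage is needed before this stage settles.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), cout)
```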
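Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
• The output carry is expressed in terms of the generate and propagate signals
• The expressions for the 4-bit CLA are
  - C1 = G0 + P0·C0
  - C2 = G1 + P1·G0 + P1·P0·C0
  - C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
  - C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA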
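Expanding the recurrence Ci+1 = Gi + Pi·Ci (of which C1 = G0 + P0·C0 is the first step) gives each carry directly from the inputs and C0. The sketch below writes the four expanded carries explicitly for a 4-bit adder; the function names are illustrative only.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Compute all carries of a 4-bit CLA from G, P and C0 (bits LSB first)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # Each carry depends only on G, P and C0, not on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return p, [c0, c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    p, c = cla4_carries(a_bits, b_bits, c0)
    total = sum((p[i] ^ c[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c[4]


print(cla4_add(9, 7))
```

The price of the parallel carries is visible in the code: the expression for each further carry gains one more product term and one more factor per term.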
Carry Select Adder
• It is also referred to as the conditional sum adder
• The adder is divided into two blocks: one block with Cin at logical 0, another block with Cin at logical 1
• S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
• The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
  - M = number of blocks in the adder
  - P = number of cells in each block
  - K1 = delay through one adder cell
  - K2 = delay through the multiplexer
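The selection scheme and the delay formula can be sketched together. The model below is behavioural: each block is added twice with ordinary integer arithmetic (standing in for two hardware adders), and the block width, block count and delay values are hypothetical.

```python
def block_add(a, b, cin, width):
    """Add two width-bit blocks; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin, block_width, num_blocks):
    mask = (1 << block_width) - 1
    result, carry = 0, cin
    for i in range(num_blocks):
        shift = i * block_width
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate results are computed without waiting for the carry.
        sum0, cout0 = block_add(a_blk, b_blk, 0, block_width)
        sum1, cout1 = block_add(a_blk, b_blk, 1, block_width)
        # 2:1 multiplexer controlled by the actual carry-in of the block.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << shift
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 from the slide."""
    return p * k1 + (m - 1) * k2


print(carry_select_add(200, 100, 0, block_width=4, num_blocks=2))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```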
Carry Skip Adder (CSA)
• It is also called the carry bypass adder
• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal that selects the stage carry-out]
2-Stage Carry Skip Adder
4-stage carry skip adder
• If (P0 and P1 and P2 and P3) = 1 then C3 = C0
• else C3 = output carry of the stage-4 adder
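Delay of CSA
• The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
  - P and M are the number of cells per block and the number of blocks, as for the carry select adder
  - K1 = delay through one adder cell
  - K2 = delay for skipping the carry over a block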
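The bypass rule and the worst-case delay expression can be sketched as follows. This is a behavioural model: the skip path returns the same carry the ripple path would eventually produce, only sooner in hardware. Block sizes and delay values are hypothetical.

```python
def carry_skip_add(a, b, cin, block_width, num_blocks):
    result, carry = 0, cin
    for i in range(num_blocks):
        block_cin = carry
        all_propagate = 1
        for j in range(block_width):
            bit = i * block_width + j
            x = (a >> bit) & 1
            y = (b >> bit) & 1
            p = x ^ y
            result |= (p ^ carry) << bit
            carry = (x & y) | (p & carry)  # ripple inside the block
            all_propagate &= p
        if all_propagate:
            # Every bit propagates (Ai != Bi): the block carry-in is
            # forwarded directly to the block carry-out.
            carry = block_cin
    return result, carry


def carry_skip_delay(p, m, k1, k2):
    """Worst case: T = 2(P-1)*K1 + (M-2)*K2 from the slide."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


print(carry_skip_add(0b01010101, 0b10101010, 1, block_width=4, num_blocks=2))
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=1.0))
```

The first call uses operands in which every bit position propagates, so the carry-in skips both blocks and appears unchanged at the carry-out.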
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using the Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2 n
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
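The reduction to two vectors can be sketched at word level with carry-save (3:2) adders. This is a simplified model for unsigned operands: a real Wallace tree places the compressors column by column, whereas here each 3:2 step is applied to whole partial-product words, and Python's integer addition stands in for the final conventional adder.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: three operands in, (sum, carry) out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, width):
    """Multiply unsigned a by the width-bit unsigned b; return (product, levels)."""
    # One partial product per multiplier bit.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(partials) > 2:
        n_full = len(partials) // 3 * 3
        reduced = []
        for i in range(0, n_full, 3):
            # Every group of three rows is compressed in parallel to two rows.
            reduced.extend(carry_save_add(*partials[i:i + 3]))
        reduced.extend(partials[n_full:])  # leftover rows pass through
        partials = reduced
        levels += 1
    # Final sum and carry vectors go to a conventional adder.
    return sum(partials), levels


print(wallace_multiply(13, 11, 4))
```

Each level removes roughly a third of the rows, which is where the logarithmic completion time comes from.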
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed using a top-down approach with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Logic with Transmission gates
Advantages and Disadvantages
Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates
Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large
Gate Logic
Sizing of NMOS inverter
• The following parameters are calculated for different sizes of nMOS inverter
  - Zpu : Zpd ratio
  - Pull-up and pull-down resistance
  - Power dissipation
  - Standard unit of capacitance
  - Gate delay
Sizing of NMOS Nand Gate
• The ratio of the pull-up to all the pull-down transistors (Zpu : nZpd) must be at least 4:1 to give the correct output voltage level
• nMOS Nand gate geometry reveals two factors
  - The area of the Nand gate is greater than that of the inverter, because there are more pull-down transistors and the pull-up transistor is correspondingly longer
  - The delay also increases in direct proportion to the number of inputs added
NMOS Nor Gate
• The pull-down transistors are in parallel in the Nor gate, so the pull-down ratio is the same for all transistors
• So it has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the Nor gate is preferable to the Nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n devices (n N-type + n P-type).
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
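The two device counts can be compared directly. This is a small illustrative helper (the function name and the style labels are assumptions, not part of the original material):

```python
def transistor_count(fan_in, style):
    """Number of transistors for a gate with the given fan-in.

    'static'  : n N-type + n P-type            -> 2n
    'dynamic' : (n + 1) N-type + 1 P-type      -> n + 2
    """
    counts = {
        "static": 2 * fan_in,
        "dynamic": fan_in + 2,
    }
    return counts[style]


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, transistor_count(n, "static"), transistor_count(n, "dynamic"))
```

The saving grows with fan-in: the two styles are equal at n = 2 and the dynamic gate is smaller for every larger n.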
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pull-down transistor controlled by the clock keeps the precharge node from being drained.
The length of the CLK = 0 phase is chosen so that the storage node is charged to a solid logic 1.
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create a conducting path, the gate's output is left charged at logic 1.
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
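[Figure: example dynamic gate with inputs A, B and C and output Out = ((A · B) + C)'. During precharge (Clk = 0) Mp is on and Me is off, so Out = 1; during evaluation (Clk = 1) Mp is off and Me is on]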
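The precharge and evaluate behaviour, including the rule that a discharged output stays low until the next precharge, can be captured in a small behavioural model. This is a logic-level sketch only (no charge sharing or leakage); the class name and the `pdn` callback are illustrative assumptions.

```python
class DynamicGate:
    """Logic-level model of a precharged dynamic gate.

    `pdn` is a function of the gate inputs that returns a truthy value
    when the pull-down network conducts.
    """

    def __init__(self, pdn):
        self.pdn = pdn
        self.out = 1  # assume the gate starts precharged

    def step(self, clk, **inputs):
        if clk == 0:
            # Precharge: Mp on, Me off, output node charged to 1.
            self.out = 1
        elif self.pdn(**inputs):
            # Evaluate with a conducting PDN: the node is discharged.
            self.out = 0
        # Evaluate with PDN off: node floats and keeps its stored value.
        return self.out


if __name__ == "__main__":
    gate = DynamicGate(lambda a, b, c: (a and b) or c)  # Out = ((A.B)+C)'
    print(gate.step(0, a=0, b=0, c=0))  # precharge
    print(gate.step(1, a=1, b=1, c=0))  # evaluate, PDN conducts
    print(gate.step(1, a=0, b=0, c=0))  # inputs fall back, output stays low
```

The third call illustrates why inputs may make at most one transition during evaluation: once the stored charge is lost, removing the conducting path does not restore it.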
Properties of Dynamic Gates
The logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS:
  no glitching, but higher transition probabilities and an extra load on Clk.
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn:
  low noise margin (NML).
Advantages and Disadvantages
Advantages
  Low area
  Higher speed than static complementary gates
Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases.
  They require the introduction of a clock signal.
  They are also more sensitive to noise.
  Their clocking signals also consume power and are difficult to turn off to save power.
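Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me driven by Clk; In drives the first stage and Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 versus time show Out2 dropping by ΔV while Out1 is still above VTn]
Only 0 → 1 transitions are allowed at inputs.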
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.
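The size of the erroneous drop on Out2 can be estimated with a crude RC model. The sketch assumes that Out1 decays exponentially from VDD with time constant tau1, and that the second PDN behaves as a fixed resistance (time constant tau2) for as long as Out1 is above VTn. Both assumptions, and all names and values, are illustrative rather than taken from the original.

```python
import math


def out2_after_evaluation(vdd, vtn, tau1, tau2):
    """Estimate the level left on Out2 once the second PDN has shut off.

    Assumptions: Out1(t) = VDD * exp(-t / tau1); the second PDN conducts
    with a constant resistance while Out1 > VTn and is off afterwards.
    """
    t_on = tau1 * math.log(vdd / vtn)       # time for Out1 to fall to VTn
    return vdd * math.exp(-t_on / tau2)     # Out2 decays only during t_on


if __name__ == "__main__":
    vdd, vtn = 2.5, 0.5
    for ratio in (1.0, 4.0, 16.0):
        level = out2_after_evaluation(vdd, vtn, tau1=1.0, tau2=ratio)
        print(f"tau2/tau1 = {ratio:>4}: Out2 settles near {level:.2f} V")
```

In this model the loss shrinks as the first stage gets faster relative to the second, but it never vanishes, which is why the inputs of the second gate need to start from 0.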
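Domino CMOS Logic
[Figure: dynamic gate (Mp, Me and a PDN with inputs In1, In2, In3, clocked by Clk) followed by a static inverter producing Out1; the dynamic node makes a 1 → 1 or 1 → 0 transition during evaluation]
Domino logic is the combination of a dynamic logic gate and a static inverter.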
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed:
  the static inverter can be skewed, since only the L-H transition matters
  input capacitance is reduced, giving a smaller logical effort
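Cascading Domino Logic Gates
[Figure: two cascaded domino stages, each a dynamic gate (Mp, Me, PDN) clocked by Clk and followed by a static inverter; In1, In2, In3 feed the first PDN, whose stage output is Out1; In4 and In5 feed the second PDN, whose stage output is Out2. Dynamic nodes make 1 → 1 or 1 → 0 transitions; inverter outputs make 0 → 0 or 0 → 1 transitions]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 → 1.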
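A logic-level sketch of two cascaded domino stages shows why the inverter makes cascading safe. It assumes, purely for illustration, that each PDN is a series (AND) network and that Out1 is one of the inputs of the second PDN; the function names are hypothetical.

```python
def domino_gate(clk, pdn_conducts):
    """One domino stage: dynamic node followed by a static inverter."""
    # The dynamic node is 1 after precharge and falls only if the PDN
    # conducts during evaluation (clk == 1).
    dynamic_node = 0 if (clk == 1 and pdn_conducts) else 1
    return 1 - dynamic_node


def two_stage_domino(clk, in1, in2, in3, in4, in5):
    """Two cascaded stages; assumed series PDNs, Out1 feeding stage 2."""
    out1 = domino_gate(clk, bool(in1 and in2 and in3))
    out2 = domino_gate(clk, bool(out1 and in4 and in5))
    return out1, out2


if __name__ == "__main__":
    print(two_stage_domino(0, 1, 1, 1, 1, 1))  # precharge: both outputs 0
    print(two_stage_domino(1, 1, 1, 1, 1, 1))  # evaluate: outputs rise in turn
    print(two_stage_domino(1, 1, 1, 0, 1, 1))  # first PDN off: nothing falls
```

During precharge every stage output is 0 whatever the inputs are, so the second PDN starts evaluation switched off and can only be turned on by a 0 → 1 transition of Out1.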
Clocked CMOS Logic
It is static CMOS logic combined with two additional clocked transistors:
  Clk is applied to one additional NMOS transistor
  Clk' is applied to one additional PMOS transistor
A gate with a fan-in of n therefore needs 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS transistor is on.
When Clk goes low, the logic is not evaluated because the clocked NMOS transistor is off.
Advantages and Disadvantages
Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
  More area
  More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).
Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.
Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
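The three stage equations can be checked with a few lines of code. The sketch evaluates only the settled logic values at the end of the evaluation phase; it does not model precharge or clocking, and the function name is illustrative.

```python
def np_cmos_example(a, b, y, f, h):
    """Settled outputs of the three-stage example (inputs are 0 or 1)."""
    x = 1 - (a & b)            # Stage 1 (N logic): X = (A . B)'
    g = (1 - x) | (1 - y)      # Stage 2 (P logic): G = X' + Y'
    z = 1 - ((f & g) | h)      # Stage 3 (N logic): Z = (F . G + H)'
    return x, g, z


if __name__ == "__main__":
    for inputs in ((1, 1, 1, 1, 0), (0, 0, 1, 1, 0)):
        print(inputs, "->", np_cmos_example(*inputs))
```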
BiCMOS Technology
CMOS properties
  Low power
  Slower
BJT properties
  Faster
  High power
Implementation
  CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits
  Inverter
  NAND gate
  NOR gate
BiCMOS Circuits
Inverter, NAND Gate
Structured Design
Design the digital block or standard cell at the following levels:
  Logic level
  Circuit level
  Stick diagram
  Layout
Examples: Combinational Logic
  5-way selector
  Multiplexers & demultiplexers
  Encoders & decoders
  Parity generator
  Bus arbitration logic
  Gray code to binary code converter
Adders
  1-bit full adder
  Manchester carry chain
  Adder enhancement techniques
    Carry select adders
    Carry skip adders
    Carry look-ahead adders
  32-bit adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers - L=11λ W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
  Butting contact - L=22λ W=10λ
  Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A · ~B
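The three signals are mutually exclusive and together determine the carry. A minimal sketch (function names are illustrative) that computes them for single bits and derives the carry-out:

```python
def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # G = A . B
    p = a ^ b              # P = A xor B
    k = (1 - a) & (1 - b)  # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry-out from the definitions: 1 if generated, C if propagated."""
    g, p, _ = gpk(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, gpk(a, b), [carry_out(a, b, c) for c in (0, 1)])
```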
Manchester Carry Chain
Built from switch logic using propagate, generate & kill.
Manchester Chain adders
The carry input is precharged by the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.
Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
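Carry-Ripple Adder
Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, internal carries C1, C2, C3, sum outputs S1..S4 and carry output Cout]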
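The cascade can be sketched at bit level as follows. Bit lists are least-significant bit first; the helper names are illustrative.

```python
def to_bits(value, width):
    """Integer to a list of bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def full_adder(a, b, c):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade full adders; each cell waits for the previous carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(sums, cout)  # bits of 11 + 6, LSB first, plus the carry-out
```

The loop makes the critical path visible: the carry variable is the only value passed from one cell to the next, so the delay grows linearly with the word length.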
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.
The expressions for the 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0 · C0
4-bit CMOS CLA
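The pattern C1 = G0 + P0 · C0 generalises in the standard way: each carry is a sum of products of generate and propagate terms and C0 only, for example C2 = G1 + P1·G0 + P1·P0·C0. The sketch below computes every carry from that expanded form, without using any other carry, which is what allows the hardware to produce them in parallel. Names are illustrative and bit lists are least-significant bit first.

```python
def to_bits(value, width):
    """Integer to a list of bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def cla_carries(a_bits, b_bits, c0=0):
    """Return [C0, C1, ..., Cn] using only G, P and C0 for each carry."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carries = [c0]
    for i in range(len(g)):
        c_next = 0
        # Terms G[j] . P[j+1] ... P[i] for j = i down to 0.
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c_next |= term
        # Final term P[0] ... P[i] . C0.
        term = c0
        for k in range(i + 1):
            term &= p[k]
        c_next |= term
        carries.append(c_next)
    return carries


def cla_add(a_bits, b_bits, c0=0):
    """Sum bits are P[i] xor C[i]; the last carry is the carry-out."""
    carries = cla_carries(a_bits, b_bits, c0)
    sums = [(a ^ b) ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[-1]


if __name__ == "__main__":
    print(cla_carries(to_bits(11, 4), to_bits(6, 4)))
```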
Carry Select Adder
It is also referred to as a conditional sum adder.
The adder is divided into two blocks: one block with Cin = logical 0, another block with Cin = logical 1.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
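A bit-level sketch of the select scheme and of the delay formula follows. Each block is added twice, once for each possible carry-in, and the real carry picks one result. Function names and the example delay values are illustrative assumptions.

```python
def add_block(a_bits, b_bits, cin):
    """Ripple-add one block (LSB first); returns (sum_bits, carry_out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, block_size, cin=0):
    """Precompute each block for Cin = 0 and Cin = 1, then select."""
    sums = []
    carry = cin
    for i in range(0, len(a_bits), block_size):
        a_blk = a_bits[i:i + block_size]
        b_blk = b_bits[i:i + block_size]
        s0, c0 = add_block(a_blk, b_blk, 0)   # copy assuming Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1)   # copy assuming Cin = 1
        sums += s1 if carry else s0           # 2:1 multiplexer on the sums
        carry = c1 if carry else c0           # 2:1 multiplexer on the carry
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 from the text."""
    return p * k1 + (m - 1) * k2


if __name__ == "__main__":
    a = [1, 1, 1, 1, 0, 0, 0, 0]  # 15, LSB first
    b = [1, 0, 0, 0, 0, 0, 0, 0]  # 1, LSB first
    print(carry_select_add(a, b, block_size=4))
    print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```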
Carry Skip Adder (CSA)
It is also called a carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of stage-4 adder
Delay of CSA
Worst-case delay T is given by T = 2(P-1)K1 + (M-2)K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
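The skip condition and the delay formula can be sketched as below. The block ripples its carry as usual but, when every bit propagates, takes the carry-out straight from the carry-in. Function names and the example delay values are illustrative.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One m-bit stage (LSB first); returns (sum_bits, carry_out, skipped)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = cin
    sums = []
    for a, b, p in zip(a_bits, b_bits, propagate):
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)
    skip = all(propagate)             # Ai != Bi for every bit in the stage
    cout = cin if skip else carry     # bypass gate selects the carry source
    return sums, cout, skip


def carry_skip_delay(p, m, k1, k2):
    """Worst-case T = 2(P - 1)K1 + (M - 2)K2 from the text."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


if __name__ == "__main__":
    print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], cin=1))  # skip taken
    print(carry_skip_block([1, 1, 0, 0], [1, 0, 0, 0], cin=0))  # no skip
    print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
```

The bypass never changes the logical result, since a fully propagating stage would ripple the same carry anyway; it only shortens the path the carry has to travel.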
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
  Serial-parallel multiplier
  Braun array
  2's complement multiplication using the Baugh-Wooley method
  Pipelined multiplier array
  Modified Booth's algorithm
  Wallace tree multipliers
  Recursive decomposition of multiplication
  Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
Wallace Tree Multiplier Structure
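The reduction to two vectors can be sketched with word-level carry-save adders. This is a simplified model of the idea (whole rows are compressed three at a time rather than individual columns); the function names are illustrative and the operands are assumed to be unsigned.

```python
def carry_save_add(x, y, z):
    """3:2 compression: three numbers in, a sum vector and a carry vector out."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a, b, width):
    """Multiply unsigned a and b; returns (product, reduction_levels)."""
    # Partial products: a shifted by i wherever bit i of b is set.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete triples
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])           # leftover rows pass through
        rows = reduced
        levels += 1
    # The two remaining vectors go to a conventional adder.
    return sum(rows), levels


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4))
    print(wallace_multiply(200, 150, 8))
```

The level count grows roughly logarithmically with the number of partial products, because each level removes about a third of the rows.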
Sequential Logic
  D flip-flop
  Two-phase clocking
  Dynamic shift register
  RAM
  ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach, with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of the adder and shifter is essential for the ALU.
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Gate Logic
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
  Zpu : Zpd ratio
  Pull-up and pull-down resistance
  Power dissipation
  Standard unit of capacitance
  Gate delay
Sizing of NMOS Nand Gate
The ratio of the pull-up to the sum of all pull-down transistors (Zpu : nZpd) must be at least 4:1 to give the correct output voltage level.
The nMOS NAND gate geometry reveals two factors:
The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
The delay also increases in direct proportion to the number of inputs added.
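The ratio rule translates into a minimum pull-up impedance that grows with the number of inputs. A minimal sketch, assuming all n pull-down transistors share the same impedance Zpd (the function name is illustrative):

```python
def min_pullup_impedance(n_inputs, z_pulldown, ratio=4.0):
    """Smallest Zpu satisfying Zpu : (n * Zpd) >= ratio : 1."""
    return ratio * n_inputs * z_pulldown


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, min_pullup_impedance(n, z_pulldown=1.0))
```

With Zpd held fixed, each added input raises the required Zpu by the same step, which is why the pull-up transistor gets longer as inputs are added.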
NMOS Nor Gate
Since the pull-down transistors are in parallel in the NOR gate, the pull-up to pull-down ratio is the same for every input.
So it has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the NOR gate is preferable to the NAND gate.
CMOS Logic
Properties of CMOS Gates
• High noise margins
  VOH and VOL are at VDD and GND respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
The CMOS NAND gate does not have the ratio restrictions of the nMOS NAND gate, but the geometry must be kept symmetric by:
  allowing for extended fall times through the series nMOS transistors (because of their series resistance)
  keeping the transfer characteristic centred on Vdd/2
The CMOS NOR gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristics and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should be changed.
CMOS Logic
  Static CMOS logic
  Pseudo NMOS logic
  Dynamic CMOS logic
  Domino CMOS logic
  Clocked CMOS logic
  n-p CMOS logic
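Static CMOS Switch Delay Model
[Figure: switch-level RC models with each transistor replaced by its equivalent resistance Req: inverter (INV) with Rp, Rn and CL; two-input NAND (NAND2) with Rp for each of A and B, Rn for each of A and B, Cint and CL; two-input NOR (NOR2) with Rp for each of A and B, Rn for each of A and B, Cint and CL]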
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs.
Low-to-high transition
  both inputs go low: delay is 0.69 × (Rp/2) × CL
  one input goes low: delay is 0.69 × Rp × CL
High-to-low transition
  both inputs go high: delay is 0.69 × 2Rn × CL
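The three cases can be evaluated side by side. The sketch applies the first-order expressions above to a two-input NAND; the resistance and capacitance values are placeholders chosen for illustration, not figures from the original.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays for the three input patterns, in seconds."""
    return {
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,  # two Rp in parallel
        "low_to_high_one_input_low": 0.69 * rp * cl,          # a single Rp
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl, # two Rn in series
    }


if __name__ == "__main__":
    delays = nand2_delays(rp=20e3, rn=10e3, cl=100e-15)
    for pattern, delay in delays.items():
        print(f"{pattern}: {delay * 1e12:.0f} ps")
```

With the same pull-up device, the rising edge is twice as fast when both inputs switch, because the two p-type transistors then charge CL in parallel.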
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred as Conditional sum adder
The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin
S and Cout are selected by actual Cin using 21 Multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The Delay of n-bit adder is given by T=PK1+(M-1)K2
where M=No of Blocks in the adder
P=No of Cells of each block K1=Delay through one adder
cell
K2 = Delay through Multiplexer
Carry Skip Adder (CSA)
It is also called as Carry Bypass Adder
Looks for cases in carry-out of a set of bits is identical to carry in
Typically organized into m-bit stages If Ai neBi for every bit in stage then
bypass gate sends stagersquos carry input directly to carry output
Simplified Carry-Skip Adder Logic
FAFA
aibi
sisi+1
ai+1bi+1
cici+1
pipi+1
skip signal
ci+1
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1)then C3 = C0
else C3=output carry of stage4 adder
Delay of CSA
Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder
cell k2=delay for skipping the
carry over a block
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands a multi-level carry-skip adder
was 14 faster and its power dissipation was 58
of that of the carry-lookahead adder
Using 64-bit operands a one-level carry-skip adder
was 38 slower and its power consumption is 68
of the the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-
Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier A Wallace tree is an implementation of an adder
tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors
(carry and sum) Carry and sum outputs combined using a
conventional adder Compresses the no of stages of partial products
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in Top-Down approach with the help of CAD tools
Partition the system sensibly Aiming for simple interconnection and high
regularity between sub-systems Generate and verify each section of the
design Calculate the dimensions of the layout of sub-
systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Sizing of NMOS inverter
The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay
Sizing of NMOS Nand Gate
The ratio between pu to all pd transistors
(Zpu nZpd) must be minimum 41 for making correct
level of output voltage
nMOS Nand gate geometry reveals two factors
Area of nand gate is greater than area of inverter
because more no of pull down transistors and
corresponding increase in length of pull up
transistor
Delay is also increased due to direct proportion to
the number of inputs added
NMOS Nor Gate Since the pull down transistors are parallel
in Nor gate ie the pull down ratio for all transistors is same
So it has same characteristics as inverter The area occupied is reasonable since
there is no increase in length of pull-up transistor
So Nor gate is preferable than nand gate
CMOS Logic
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates
• Logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH, and VIL are all equal to VTn.
  - low noise margin (NML)
Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases.
• They require the introduction of a clock signal.
• They are also more sensitive to noise.
• Their clocking signals also consume power and are difficult to turn off to save power.
Cascading Dynamic Gates
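[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluate transistor Me clocked by Clk. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The waveforms of Clk, In, Out1, and Out2 (voltage V versus time t) show Out2 losing charge until Out1 has fallen to VTn.]
Only 0 -> 1 transitions are allowed at the inputs.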
Cascading Dynamic Gates - Problem
• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.
Domino CMOS Logic
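[Figure: domino gate. A dynamic stage (inputs In1 to In3 into the PDN, precharge transistor Mp and evaluate transistor Me clocked by Clk) followed by a static inverter that drives Out1.]
Domino logic is the combination of a dynamic logic gate and a static inverter.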
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed:
  - the static inverter can be skewed, since only its L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
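[Figure: two cascaded domino gates. The first stage (inputs In1 to In3) drives Out1 through its static inverter; Out1 feeds the second stage (inputs In4 and In5), which drives Out2.]
• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 -> 1.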
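The difference between cascading plain dynamic gates and cascading domino gates can be sketched with a worst-case behavioural model. The function names are hypothetical, and the model assumes the second stage reacts fully to its precharged input before the first stage has evaluated (a real circuit only loses part of its charge).

```python
def cascaded_dynamic_inverters(inp):
    """Two dynamic inverters wired directly together (worst-case timing)."""
    out1, out2 = 1, 1      # both nodes precharged to 1
    if out1:               # stage 2 still sees the precharged Out1 = 1
        out2 = 0           # and discharges; this cannot be undone
    if inp:                # stage 1 finishes evaluating afterwards
        out1 = 0
    return out1, out2


def cascaded_domino_buffers(inp):
    """Two domino stages: each dynamic node is followed by a static inverter."""
    node1 = 0 if inp else 1     # stage 1 dynamic node after evaluation
    out1 = 1 - node1            # static inverter: Out1 starts at 0, may rise once
    node2 = 0 if out1 else 1    # stage 2 discharges only after Out1 has risen
    out2 = 1 - node2
    return out1, out2


for inp in (0, 1):
    print(inp, cascaded_dynamic_inverters(inp), cascaded_domino_buffers(inp))
```

With inp = 1 the directly cascaded pair ends with Out2 = 0, although two inversions should give 1. The domino pair produces the intended non-inverting result for both input values.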
Clocked CMOS Logic
• It is static CMOS combined with two additional clocked transistors:
  - the Clk input is given to one additional NMOS
  - the Clk' input is given to one additional PMOS
• A gate with fan-in n requires 2n + 2 transistors.
• When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
• When Clk goes low, the logic is not evaluated because the clocked NMOS is off.
Advantages and Disadvantages
Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
• More area
• More complexity
N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
• Alternate stages of N logic with stages of P logic:
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.
Cascading N-P Logic
Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G, and Z]
BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
  - core logic: CMOS
  - interface: BJT
• Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figures: BiCMOS inverter and BiCMOS NAND gate]
Structured Design
Design the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout
Examples: Combinational Logic
• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter
Adders
• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques:
  - carry-select adders
  - carry-skip adders
  - carry look-ahead adders
• 32-bit adders
1-bit full adder
CMOS Full Adder
[Figure: CMOS full adder, showing the sum output and carry output circuits]
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
  - butting contact: L = 22λ, W = 10λ
  - buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1 independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0 independent of C
  K = ~A · ~B
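A quick way to check these definitions is to enumerate the full-adder truth table. The sketch below uses hypothetical helper names (pgk, carry_out) and the standard carry relation Cout = G + P·C, which follows from the three cases above.

```python
from itertools import product


def pgk(a, b):
    """Generate, propagate, and kill signals for one bit position."""
    g = a & b                # generate: Cout = 1 independent of C
    p = a ^ b                # propagate: Cout = C
    k = (1 - a) & (1 - b)    # kill: Cout = 0 independent of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed through generate and propagate."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    g, p, k = pgk(a, b)
    # Exactly one of G, P, K is active for any input pair.
    assert g + p + k == 1
    # The carry matches the arithmetic carry of a + b + c.
    assert carry_out(a, b, c) == (a + b + c) >> 1
```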
Manchester Carry Chain
• Built from switch logic using the propagate, generate & kill signals.
• Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.
• Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
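• Simplest design: cascade full adders.
• The critical path goes from Cin to Cout.
• Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder. Four full adders with inputs A1 to A4 and B1 to B4, sum outputs S1 to S4, intermediate carries C1 to C3, carry-in Cin, and carry-out Cout.]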
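The cascade described above can be sketched bit by bit. This is a minimal behavioural sketch; the helper names (full_adder, ripple_carry_add, to_bits, from_bits) and the LSB-first bit lists are choices made for the example.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by chaining full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        # Each cell must wait for the carry from the previous cell,
        # which is why the critical path runs from Cin to Cout.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


s_bits, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), cout)  # 11 + 6 = 17: 4-bit sum 1, carry-out 1
```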
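Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
• The output carry is represented in terms of generate and propagate signals.
• The expressions for a 4-bit CLA are:
  C1 = G0 + P0·C0
  C2 = G1 + P1·G0 + P1·P0·C0
  C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
  C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0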
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
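Expanding C1 = G0 + P0·C0 for every bit gives carries that depend only on the G, P, and C0 signals, so they can all be computed in parallel. The sketch below is a behavioural illustration of a 4-bit CLA; the function name cla4 and the integer interface are assumptions of the example.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # propagate

    # Every carry is written directly in terms of G, P, and C0,
    # so no carry has to wait for the previous one.
    c1 = g[0] | p[0] & c0
    c2 = g[1] | p[1] & g[0] | p[1] & p[0] & c0
    c3 = g[2] | p[2] & g[1] | p[2] & p[1] & g[0] | p[2] & p[1] & p[0] & c0
    c4 = (g[3] | p[3] & g[2] | p[3] & p[2] & g[1]
          | p[3] & p[2] & p[1] & g[0]
          | p[3] & p[2] & p[1] & p[0] & c0)

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c4


print(cla4(9, 7))
```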
Carry Select Adder
• It is also referred to as a conditional-sum adder.
• The adder is divided into two blocks: one block with a logical 0 Cin and another block with a logical 1 Cin.
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
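The selection scheme can be sketched behaviourally as follows. The names add_block and carry_select_add are hypothetical, the block adders are modelled with ordinary integer addition, and the width is assumed to be a multiple of the block size.

```python
def add_block(a, b, cin, width):
    """Behavioural block adder: returns (sum, carry_out) for `width` bits."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition; `width` is assumed to be a multiple of `block`."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        xa = (a >> shift) & mask
        xb = (b >> shift) & mask
        # Both candidate results are computed without waiting for the carry.
        s0, c0 = add_block(xa, xb, 0, block)
        s1, c1 = add_block(xa, xb, 1, block)
        # 2:1 multiplexer: the actual carry-in picks one candidate.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry


print(carry_select_add(200, 100))
```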
Delay of Carry Select Adder
The delay of an n-bit adder is given by
T = P·K1 + (M - 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
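As a worked example of the formula, the sketch below evaluates it for a hypothetical 16-bit adder. The delay values (K1 = 2, K2 = 1, in arbitrary units) and the ripple-carry reference delay of n·K1 are assumptions chosen only for illustration.

```python
def carry_select_delay(p_cells, m_blocks, k1, k2):
    """T = P*K1 + (M - 1)*K2."""
    return p_cells * k1 + (m_blocks - 1) * k2


def ripple_delay(n_bits, k1):
    """Assumed reference: the carry ripples through all n adder cells."""
    return n_bits * k1


# Hypothetical 16-bit adder: M = 4 blocks of P = 4 cells each.
t_select = carry_select_delay(p_cells=4, m_blocks=4, k1=2, k2=1)
t_ripple = ripple_delay(n_bits=16, k1=2)
print(t_select, t_ripple)
```

Only the first block's P cells are on the critical path; every later block contributes one multiplexer delay.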
Carry Skip Adder (CSA)
• It is also called a carry-bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in the stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
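[Figure: simplified carry-skip logic. Two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 that form the skip signal.]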
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage 4 adder
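The skip condition can be sketched for one 4-bit block as follows. The function name carry_skip_block and the LSB-first bit lists are choices made for this example; the bypass is modelled as a simple selection between the block's carry-in and its rippled carry.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block (bits LSB first); returns (sums, cout, skipped)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # per-bit propagate
    carry = cin
    sums = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)           # ordinary ripple carry
    skipped = all(p)                             # P0 and P1 and P2 and P3
    # Bypass multiplexer: when every bit propagates, the carry-in is
    # forwarded directly instead of waiting for the ripple path.
    cout = cin if skipped else carry
    return sums, cout, skipped


print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

When every bit propagates, the rippled carry would equal the carry-in anyway; the bypass only delivers it sooner, so the logical result is unchanged.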
Delay of CSA
The worst-case delay T is given by
T = 2(P - 1)·K1 + (M - 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
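The formula can be evaluated for a few block sizes to see the trade-off. This sketch assumes that P and M have the same meaning as in the carry-select formula (cells per block and number of blocks), and the delay values K1 = 2 and K2 = 1 are hypothetical units.

```python
def carry_skip_delay(p_cells, m_blocks, k1, k2):
    """Worst case: T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


# Hypothetical 16-bit adder split into equal blocks of P cells (M = 16 / P).
n_bits, k1, k2 = 16, 2, 1
delays = {p: carry_skip_delay(p, n_bits // p, k1, k2) for p in (2, 4, 8)}
print(delays)
```

The first term grows with the block size (rippling inside the first and last blocks), while the second grows with the number of blocks skipped, so the block size is a design trade-off.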
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• Optimized column adder tree.
• Combines all partial products into 2 vectors (carry and sum).
• Carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
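The reduction idea can be sketched at word level with carry-save (3:2) compression. This is a simplified illustration, not a column-by-column Wallace layout: the names csa and wallace_multiply are hypothetical, unsigned operands are assumed, and n must be at least 2.

```python
def csa(x, y, z):
    """Carry-save adder on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, n=4):
    """Multiply two unsigned n-bit numbers (n >= 2); returns (product, levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        # Compress rows three at a time into a sum vector and a carry vector.
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that do not fill a group of three pass through unchanged.
        nxt.extend(rows[len(rows) - len(rows) % 3:])
        rows = nxt
        levels += 1
    # The last two vectors are combined by a conventional adder.
    return rows[0] + rows[1], levels


print(wallace_multiply(13, 11))
```

Each level reduces the number of rows by roughly a third, which is where the logarithmic completion time comes from.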
Wallace Tree Multiplier Structure
Sequential Logic
• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
• Complex systems are designed with a top-down approach, with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Sizing of NMOS Nand Gate
• The ratio of the pull-up to all n pull-down transistors, Zpu : nZpd, must be at least 4:1 to give the correct output voltage level.
• The nMOS NAND gate geometry reveals two factors:
  - The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
  - The delay also increases in direct proportion to the number of inputs.
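As a rough illustration of the ratio rule, the sketch below computes the smallest pull-up impedance that satisfies it for a given number of inputs. The function name and the unit pull-down impedance are assumptions made for the example.

```python
def min_pullup_impedance(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu that satisfies Zpu : (n * Zpd) >= ratio : 1."""
    return ratio * n_inputs * z_pd


# An inverter (n = 1) needs Zpu >= 4 * Zpd; every additional series
# pull-down adds another 4 * Zpd to the required pull-up impedance,
# which is why the pull-up transistor gets longer as inputs are added.
required = {n: min_pullup_impedance(n) for n in (1, 2, 3, 4)}
print(required)
```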
NMOS Nor Gate
• The pull-down transistors in the NOR gate are in parallel, so the pull-up to pull-down ratio is the same for every pull-down transistor.
• It therefore has the same characteristics as the inverter.
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
• So the NOR gate is preferable to the NAND gate.
CMOS Logic
Properties of CMOS Gates
• High noise margins
  - VOH and VOL are at VDD and GND respectively
• No static power consumption
  - There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
• The CMOS NAND gate has none of the restrictions of the nMOS NAND gate, but the geometry must be kept symmetric by:
  - allowing for the extended fall times of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on Vdd/2
• The CMOS NOR gate has series p-transistors, which increase the resistance and delay.
  - This affects the transfer characteristic and reduces noise immunity.
  - So the geometries of the nMOS and pMOS transistors should be changed.
CMOS Logic
• Static CMOS logic
• Pseudo NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic
Static CMOS Switch Delay Model
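[Figure: switch-level RC delay models for INV, NAND2, and NOR2. Each transistor is replaced by a switch with on-resistance Rn or Rp (Req in the generic model), driving the load capacitance CL; the NAND2 and NOR2 models also include an internal node capacitance Cint.]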
Input Pattern Effects on Delay
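• Delay is dependent on the pattern of inputs.
• Low-to-high transition:
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition:
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with two parallel pull-up resistances Rp (inputs A and B), two series pull-down resistances Rn (inputs A and B), internal capacitance Cint, and load capacitance CL]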
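These first-order expressions can be evaluated numerically. The sketch below ignores the internal capacitance Cint, and the resistance and capacitance values (Rp = 10 kΩ, Rn = 5 kΩ, CL = 100 fF) are hypothetical numbers picked only to show the arithmetic.

```python
def rc_delay(r_eq, c_load):
    """First-order propagation delay: 0.69 * R * C."""
    return 0.69 * r_eq * c_load


def nand2_delays(rp, rn, cl):
    """NAND2 delays for the three input patterns listed above."""
    return {
        "low_to_high_both_inputs_fall": rc_delay(rp / 2, cl),  # two pMOS in parallel
        "low_to_high_one_input_falls": rc_delay(rp, cl),       # one pMOS conducts
        "high_to_low_both_inputs_rise": rc_delay(2 * rn, cl),  # two nMOS in series
    }


delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```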
Delay Dependence on Input Patterns
[Figure: simulated output voltage [V] versus time [ps] (0 to 400 ps) for the input patterns A = B = 1->0; A = 1, B = 1->0; and A = 1->0, B = 1]

Input data pattern     Delay (psec)
A = B = 0->1           67
A = 1, B = 0->1        64
A = 0->1, B = 1        61
A = B = 1->0           45
A = 1, B = 1->0        80
A = 1->0, B = 1        81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
Pseudo NMOS Logic
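Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single transistor.
[Figure: static CMOS gate. Inputs In1, In2, ..., InN drive both the pull-up network (PUN), connected to VDD, and the pull-down network (PDN); the output is F.]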
Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate input is Vdd, both the p-type and n-type transistors are on, and both operate to determine the gate's output voltage.
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V] (0 to 2.5 V), for W/Lp = 4, 2, 1, 0.5, and 0.25]
Advantages and Disadvantages
Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.
Disadvantages
• Because of the higher pull-up resistance, the delay is larger and hence the circuit is slower.
• More static power dissipation, due to the conduction path between VDD and VSS.
Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic
It is a combination of
Static CMOS
Clk input given to one additional NMOS transistor
Clk' input given to one additional PMOS transistor
A gate with fan-in n uses 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS transistor is ON
When Clk goes low, the logic is not evaluated because the clocked NMOS transistor is OFF
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
More area
More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
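The three example stages can be checked with a short Boolean sketch. The function name `np_cascade` is illustrative, and Y, F and H are assumed to be independent external inputs, which the slides do not state.

```python
from itertools import product


def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages on 0/1 inputs."""
    x = 1 - (a & b)        # Stage 1: X = (A·B)'
    g = (1 - x) | (1 - y)  # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)  # Stage 3: Z = (F·G + H)'
    return x, g, z


# Print the full truth table for inspection.
for a, b, y, f, h in product((0, 1), repeat=5):
    print(a, b, y, f, h, "->", np_cascade(a, b, y, f, h))
```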
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic circuits: Inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figure: BiCMOS inverter and NAND gate]
Structured Design
Designing the digital block or standard cell at the following levels
Logic level
Circuit level
Stick diagram
Layout
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder
Manchester Carry Chain Adder
Enhancement Techniques
Carry Select Adders
Carry Skip Adders
Carry Look-ahead Adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ, W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ, W=10λ
Buried contact - L=30λ, W=10λ
ii) CMOS Inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ, W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries
Generate: Cout = 1 independent of C
G = A · B
Propagate: Cout = C
P = A ⊕ B
Kill: Cout = 0 independent of C
K = ~A · ~B
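The three cases can be checked exhaustively. The sketch below uses the definitions above plus the standard identity Cout = G + P·C; the helper names `gpk` and `carry_out` are illustrative.

```python
from itertools import product


def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # G = A · B
    p = a ^ b              # P = A xor B
    k = (1 - a) & (1 - b)  # K = ~A · ~B
    return g, p, k


def carry_out(a, b, c):
    g, p, _ = gpk(a, b)
    return g | (p & c)     # Cout = G + P·C


for a, b, c in product((0, 1), repeat=3):
    g, p, k = gpk(a, b)
    assert g + p + k == 1                          # exactly one case applies
    assert carry_out(a, b, c) == (a + b + c) // 2  # matches arithmetic carry
```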
Manchester Carry Chain
Build from switch logic using propagate, generate & kill
Manchester Chain adders
The carry node is precharged with the clock signal instead of being driven through the logic
Carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal: G = A · B (A' · B' is the kill signal K)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design full adder to have fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3 and carry output Cout]
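A carry-ripple adder can be sketched as a loop over full adders in which each carry feeds the next stage. The function names and the LSB-first bit ordering are choices made for this illustration.

```python
def full_adder(a, b, cin):
    """1-bit full adder on 0/1 values: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, cout)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next cell
        sums.append(s)
    return sums, carry


# 5 + 3 = 8, written LSB first
print(ripple_add([1, 0, 1, 0], [1, 1, 0, 0]))
```

The loop makes the critical path visible: the carry of the last cell depends on every earlier cell, so delay grows linearly with word length.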
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
Output carry is represented in terms of generate and propagate signals
The expressions for a 4-bit CLA are
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
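A 4-bit look-ahead adder can be sketched by expanding the recurrence C(i+1) = Gi + Pi·Ci so that every carry depends only on G, P and C0. The function name `cla4` and the LSB-first bit lists are assumptions made for this illustration.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead add on LSB-first bit lists."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    # Each carry is a two-level expression in g, p and c0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


print(cla4([1, 0, 1, 0], [1, 1, 0, 0]))  # 5 + 3, LSB first
```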
Carry Select Adder
It is also referred to as a conditional sum adder
The adder is divided into two blocks
One block with logical 0 Cin
Another block with logical 1 Cin
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
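The selection scheme can be sketched as follows: each block is added twice, once assuming Cin = 0 and once assuming Cin = 1, and the actual carry picks one result. The block size of 4, the function names and the LSB-first bit lists are assumptions made for this illustration.

```python
def ripple(a_bits, b_bits, cin):
    """Ripple-carry add of one block (LSB first)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple(a_blk, b_blk, 0)  # block computed for Cin = 0
        s1, c1 = ripple(a_blk, b_blk, 1)  # block computed for Cin = 1
        # 2:1 multiplexers driven by the actual carry into the block
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s_sel)
    return sums, carry


def to_bits(value, n=8):
    return [(value >> i) & 1 for i in range(n)]


print(carry_select_add(to_bits(200), to_bits(100)))  # 300 = 256 + 44
```

In hardware both versions of a block are computed in parallel, so only the multiplexer delay is added per block once the first block has finished.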
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
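As a worked example of the formula, the sketch below evaluates T for a 16-bit adder split into M = 4 blocks of P = 4 cells. The delay values K1 = 1.0 and K2 = 0.5 are arbitrary illustrative units, not figures from the slides.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 for a carry select adder."""
    return p * k1 + (m - 1) * k2


# 16-bit adder: 4 blocks of 4 cells, assumed K1 = 1.0 and K2 = 0.5
print(carry_select_delay(p=4, m=4, k1=1.0, k2=0.5))  # 4*1.0 + 3*0.5 = 5.5
```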
Carry Skip Adder (CSA)
It is also called a carry bypass adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
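Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) for bits i and i+1 with inputs ai, bi, ai+1, bi+1, carries ci and ci+1, sums si and si+1; the propagate signals pi and pi+1 form the skip signal that selects the block carry output]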
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
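The bypass rule can be sketched behaviourally: a block forwards its carry input when every bit position propagates, otherwise it uses the rippled carry. The block size of 4, the function name and the LSB-first bit lists are assumptions made for this illustration.

```python
def carry_skip_add(a_bits, b_bits, cin=0, block=4):
    """Carry-skip add on LSB-first bit lists; returns (sum_bits, cout)."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        block_cin = carry
        # Skip signal: P0 and P1 and ... are all 1 (Ai != Bi for every bit)
        skip = all(a ^ b for a, b in zip(a_blk, b_blk))
        ripple_c = block_cin
        for a, b in zip(a_blk, b_blk):
            sums.append(a ^ b ^ ripple_c)
            ripple_c = (a & b) | (ripple_c & (a ^ b))
        # Bypass multiplexer: forward the block carry-in when skipping
        carry = block_cin if skip else ripple_c
    return sums, carry


# 0101 + 1010 with Cin = 1: every bit propagates, so the carry is bypassed
print(carry_skip_add([1, 0, 1, 0], [0, 1, 0, 1], cin=1))
```

The result is identical to a ripple adder; the benefit in hardware is only that the bypass path is faster than rippling through the block.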
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
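A worked evaluation of the worst-case formula is sketched below. It assumes that P is the number of cells per block and M the number of blocks, which this slide does not restate, and it uses arbitrary illustrative delays K1 = 1.0 and K2 = 0.5.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2*(P-1)*K1 + (M-2)*K2 for a carry skip adder."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# 16-bit adder: 4 blocks of 4 cells, assumed K1 = 1.0 and K2 = 0.5
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))  # 2*3*1.0 + 2*0.5 = 7.0
```

The two terms correspond to rippling inside the first and last blocks and skipping over the blocks in between.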
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
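The reduction idea can be sketched with row-wise carry-save (3:2) compression on Python integers. This is a simplification: a real Wallace tree works column by column on individual bits, and the function names and the default operand width `n=8` are assumptions made for this illustration.

```python
def carry_save(a, b, c):
    """3:2 compressor applied bitwise: three rows in, sum and carry rows out."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def wallace_multiply(x, y, n=8):
    """Multiply unsigned x by an n-bit unsigned y (n >= 2)."""
    # Partial products: one shifted copy of x for every set bit of y.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):  # every group of three rows becomes two
            nxt.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])      # leftover rows pass to the next level
        rows = nxt
    # The final two vectors (sum and carry) go to a conventional adder.
    return rows[0] + rows[1]


print(wallace_multiply(13, 11))
```

Each pass removes about one third of the rows, which is where the logarithmic number of levels comes from; only the final addition propagates carries.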
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed using a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of sub-systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
CMOS Logic
Properties of CMOS Gates
• High noise margins
VOH and VOL are at VDD and GND, respectively
• No static power consumption
There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
The CMOS NAND gate does not have the restrictions of the NMOS NAND gate, but the geometry must be chosen to preserve symmetry by
Allowing for the extended fall times of the series nMOS transistors (series resistance)
Keeping the transfer characteristic centred on Vdd/2
The CMOS NOR gate has series p-transistors, which increase the resistance and delay
This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be changed
CMOS Logic
Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic
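Static CMOS Switch Delay Model
[Figure: switch-level RC models of INV, NAND2 and NOR2. Each transistor controlled by input A or B is replaced by a switch with on-resistance Rn or Rp (Req in general), with load capacitance CL and internal node capacitance Cint]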
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low-to-high transition
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL
High-to-low transition
both inputs go high: delay is 0.69 (2Rn) CL
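The three NAND2 cases can be evaluated numerically. In the sketch below the resistances Rn = Rp = 10 kΩ and the load CL = 100 fF are assumed example values, the internal capacitance Cint is ignored, and the function name is illustrative.

```python
def nand2_delays(rn, rp, cl):
    """First-order NAND2 delays from the switch model, in seconds."""
    return {
        "low-to-high, both inputs fall": 0.69 * (rp / 2) * cl,  # Rp || Rp
        "low-to-high, one input falls": 0.69 * rp * cl,         # single Rp
        "high-to-low, both inputs rise": 0.69 * (2 * rn) * cl,  # Rn in series
    }


for case, t in nand2_delays(rn=10e3, rp=10e3, cl=100e-15).items():
    print(f"{case}: {t * 1e12:.0f} ps")
```

With equal resistances the pull-up through two parallel pMOS devices is four times faster than the pull-down through two series nMOS devices, which is why the series devices are usually widened.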
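[Figure: NAND2 switch model with parallel pull-up resistances Rp (inputs A and B), series pull-down resistances Rn (inputs A and B), internal capacitance Cint and load capacitance CL]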
Delay Dependence on Input Patterns
[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; A = 1→0, B = 1]
Input data pattern    Delay (ps)
A = B = 0→1           67
A = 1, B = 0→1        64
A = 0→1, B = 1        61
A = B = 1→0           45
A = 1, B = 1→0        80
A = 1→0, B = 1        81
NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
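Pseudo NMOS Logic
Pseudo-NMOS logic reduces the number of pull-up (p-type) transistors of static CMOS logic from N to 1
[Figure: static CMOS gate. Inputs In1, In2, ... InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output is F]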
Pseudo NMOS Operation
The pull-down network of the gate is the same as for a fully complementary gate
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is Vdd, both the p-type and n-type transistors are on, and both together determine the gate's output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity
Disadvantages
Because of the higher pull-up resistance, delay is larger and circuit speed is lower
More static power dissipation, due to the conduction path between VDD and VSS
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative, which is dynamic CMOS logic
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases
[Figure: dynamic gate with precharge transistor Mp and evaluate transistor Me, both clocked by Clk, a pull-down network PDN with inputs In1, In2, In3, output Out and load capacitance CL]
Two-phase operation
Precharge (CLK = 0)
Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
requires only n + 2 (n+1 N-type + 1 P-type) transistors
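The device counts can be tabulated for a given fan-in. The static and dynamic figures are the ones stated above; the pseudo-NMOS count of n + 1 is an inference from replacing the pull-up network with a single p-type transistor, and the function name is illustrative.

```python
def transistor_count(n):
    """Transistors needed for an n-input gate in each logic style."""
    return {
        "static CMOS": 2 * n,    # n N-type + n P-type
        "pseudo-NMOS": n + 1,    # inferred: n N-type + 1 P-type load
        "dynamic CMOS": n + 2,   # n + 1 N-type (incl. Me) + 1 P-type (Mp)
    }


for fan_in in (2, 4, 8):
    print(fan_in, transistor_count(fan_in))
```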
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pull-down transistor controlled by the clock keeps the precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
The evaluation phase begins, i.e. the n-type pull-down turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
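[Figure: example dynamic gate with inputs A, B and C implementing Out = ((A · B) + C)'. During precharge Mp is on and Me is off, so Out = 1; during evaluation Mp is off and Me is on]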
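The output conditions can be sketched with a behavioural model of the example gate, assuming its pull-down network implements A·B + C so that Out = ((A·B) + C)'. The class name is illustrative, and leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural sketch of a dynamic gate with PDN = A·B + C."""

    def __init__(self):
        self.out = 1  # value stored on CL

    def precharge(self):
        self.out = 1  # CLK = 0: Mp on, Me off
        return self.out

    def evaluate(self, a, b, c):
        # CLK = 1: Mp off, Me on. The PDN can only discharge the output.
        if (a & b) | c:
            self.out = 0
        return self.out


gate = DynamicGate()
gate.precharge()
print(gate.evaluate(1, 1, 0))  # PDN conducts, output discharges
print(gate.evaluate(0, 0, 0))  # PDN off, but the output stays discharged
```

The second call shows the first output condition: once discharged, the node stays low until the next precharge.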
Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full-swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation is usually higher than static CMOS
no glitching
higher transition probabilities
extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
low noise margin (NML)
Advantages and Disadvantages
Advantages
low area
higher speed than static complementary gates
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
Requires introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power
Cascading Dynamic Gates
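[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk; In drives the first gate (output Out1), which drives the second gate (output Out2). Waveforms of Clk, In, Out1 and Out2 (voltage V versus time t) show Out2 losing charge ΔV until Out1 falls to VTn]
Only 0 → 1 transitions are allowed at inputs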
Properties of CMOS Gates
bull High noise margins
VOH and VOL are at VDD and GND respectively
bull No static power consumption
There never exists a direct path between VDD and
VSS (GND) in steady-state mode
bull Comparable rise and fall times
(under appropriate sizing conditions)
CMOS Nand and Nor Characteristics
CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by
Allowing extended fall times for series nmos transistors (for series resistance)
Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which
increase the resistance and delay This effect the transfer characteristics and reduce
noise immunity So Geometries of nmos and pmos transistors should
change
CMOS Logic
Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
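A behavioural sketch of a 4-bit look-ahead adder is given here for illustration. Only C1 = G0 + P0C0 appears in the text; the expansions for C2 to C4 below are the usual textbook ones obtained by substituting each carry into the next, and the function name cla4 is an assumption.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead add of integers a and b (0..15).

    Returns (sum, carry_out). Every carry depends only on the
    generate/propagate signals and c0, not on another carry.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # Pi = Ai xor Bi
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c4

assert cla4(9, 8) == (1, 1)  # 9 + 8 = 17
```

The product terms grow with the bit position, which is one reason look-ahead is normally applied to small groups of bits.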
Carry Select Adder
It is also referred to as a conditional sum adder.
The adder is divided into two blocks: one block with logical 0 Cin, another block with logical 1 Cin.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
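The selection idea can be sketched as follows: each block is added twice, once for Cin = 0 and once for Cin = 1, and the real carry picks one result. This is a behavioural illustration; the 4-bit block size, the function names and the use of integer arithmetic inside a block are assumptions, not details from the slides.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition; width must be a multiple of block."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)  # precomputed for Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)  # precomputed for Cin = 1
        # 2:1 multiplexer controlled by the actual carry into the block.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry

# 200 + 100 = 300 = 256 + 44, so the 8-bit sum is 44 with carry-out 1.
assert carry_select_add(200, 100) == (44, 1)
```

In hardware both candidate sums of a block are computed at the same time, so only the multiplexer lies on the carry path between blocks.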
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
Carry Skip Adder (CSA)
It is also called a Carry Bypass Adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output.
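Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that bypasses the carry]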
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
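The skip condition can be illustrated with a small behavioural model of one block. This is a sketch under stated assumptions: the block ripples internally, the propagate signal of each bit is taken as the XOR of its inputs, and the function name is hypothetical.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block over little-endian bit lists.

    Returns (sum_bits, cout, skip), where skip is 1 when every bit
    position propagates and the bypass path is used.
    """
    skip = 1
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                       # propagate for this bit
        skip &= p                       # block propagate = AND of all Pi
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)   # ripple carry inside the block
    # Bypass multiplexer: forward cin directly when the whole block propagates.
    cout = cin if skip else carry
    return sum_bits, cout, skip

# A = 1010 and B = 0101 differ in every bit, so the carry-in is bypassed.
_, cout, skip = carry_skip_block([0, 1, 0, 1], [1, 0, 1, 0], 1)
assert (cout, skip) == (1, 1)
```

The bypass does not change the logical result, since a fully propagating block would ripple the same carry through; it only shortens the path the carry has to take.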
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
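The two delay expressions, T = P·K1 + (M-1)·K2 for the carry select adder and T = 2(P-1)·K1 + (M-2)·K2 for the carry skip adder, can be evaluated side by side. The numbers used below are hypothetical unit delays chosen only to illustrate the arithmetic; note that K2 is a multiplexer delay in the first formula and a block-skip delay in the second.

```python
def carry_select_delay(p, m, k1, k2_mux):
    """T = P*K1 + (M-1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2_mux

def carry_skip_delay(p, m, k1, k2_skip):
    """Worst case T = 2*(P-1)*K1 + (M-2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2_skip

# Hypothetical 16-bit adder: M = 4 blocks of P = 4 cells,
# K1 = 1.0 and K2 = 0.5 in arbitrary delay units.
select_t = carry_select_delay(4, 4, 1.0, 0.5)  # 4*1.0 + 3*0.5
skip_t = carry_skip_delay(4, 4, 1.0, 0.5)      # 2*3*1.0 + 2*0.5
assert select_t == 5.5
assert skip_t == 7.0
```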
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
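The reduction of partial products to two vectors can be sketched with carry-save (3:2) compression. This is a word-level illustration rather than a gate-level Wallace tree: whole partial-product rows are compressed three at a time, the names are assumptions, and Python integers stand in for bit vectors.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied bitwise: x + y + z == sum_vec + carry_vec."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vec, carry_vec

def wallace_multiply(a, b, width=8):
    """Multiply unsigned a by the low `width` bits of b.

    Returns (product, levels), where levels is the number of
    carry-save reduction levels needed to reach two vectors.
    """
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that do not fill a group of three pass to the next level.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # Final carry and sum vectors go through a conventional adder.
    return sum(rows), levels

assert wallace_multiply(13, 11) == (143, 4)
```

For eight partial products the row count falls 8, 6, 4, 3, 2, so four levels are needed, which grows roughly with the logarithm of the operand width rather than linearly.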
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop, two-phase clocking, dynamic shift register, RAM, ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of sub-systems and check the proportion in the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
CMOS NAND and NOR Characteristics
A CMOS NAND gate does not have the restrictions of an nMOS NAND gate, but the geometry has to be kept symmetrical by:
allowing for the extended fall times of the series nMOS transistors (because of their series resistance);
keeping the transfer characteristic about Vdd/2.
A CMOS NOR gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristics and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should change.
CMOS Logic
Static CMOS logic, pseudo-NMOS logic, dynamic CMOS logic, domino CMOS logic, clocked CMOS logic, n-p CMOS logic
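Static CMOS Switch Delay Model
[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates, with inputs A and B, equivalent resistance Req, pull-up resistances Rp, pull-down resistances Rn, internal node capacitance Cint and load capacitance CL]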
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs.
Low-to-high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL
High-to-low transition:
both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with inputs A and B, resistances Rp and Rn, internal capacitance Cint and load capacitance CL]
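The three first-order delay expressions for the two-input NAND can be evaluated numerically as a quick check. The resistance values below are hypothetical and chosen only for illustration; CL = 100 fF matches the load quoted for the simulated gate, and the internal capacitance Cint is ignored in this sketch.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays (seconds) for the three input patterns."""
    return {
        # Both pull-ups conduct in parallel: effective resistance Rp/2.
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # Only one pull-up conducts: effective resistance Rp.
        "low_to_high_one_input_falls": 0.69 * rp * cl,
        # Two pull-downs conduct in series: effective resistance 2*Rn.
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }

# Hypothetical device resistances: Rp = 10 kOhm, Rn = 5 kOhm; CL = 100 fF.
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
for pattern, seconds in delays.items():
    print(f"{pattern}: {seconds * 1e12:.0f} ps")
```

With these assumed values the single pull-up case is twice as slow as the case where both inputs fall, which is the pattern dependence the slide describes.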
Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1]
Input data pattern    Delay (psec)
A=B=0->1              67
A=1, B=0->1           64
A=0->1, B=1           61
A=B=1->0              45
A=1, B=1->0           80
A=1->0, B=1           81
NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF
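Pseudo NMOS Logic
Pseudo-NMOS logic is derived from static CMOS logic by reducing the pull-up network from N input-driven devices to 1.
[Figure: static CMOS gate with inputs In1, In2, ... InN driving both the pull-up network (PUN) and the pull-down network (PDN), output F, supply VDD]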
Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate input is VDD, both the p-type and n-type transistors are on and both are operating to determine the gate's output voltage.
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity.
Disadvantages
The higher pull-up resistance increases the delay, so the circuits are slower.
More static power dissipation, due to the conduction path between VDD and VSS.
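The ratioed output level and the static power can be estimated with a very crude model in which the always-on pull-up and the conducting pull-down are treated as fixed resistors. This linear model is an assumption made for illustration only (real transistors are nonlinear), and the supply and resistance values are hypothetical.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Resistive-divider estimate for a pseudo-NMOS gate with its output low.

    Returns (vol, static_power): the low output voltage in volts and the
    power in watts drawn through the VDD-to-VSS conduction path.
    """
    vol = vdd * r_pulldown / (r_pullup + r_pulldown)
    static_power = vdd ** 2 / (r_pullup + r_pulldown)
    return vol, static_power

# Hypothetical values: VDD = 2.5 V, pull-up 40 kOhm, pull-down 10 kOhm.
vol, power = pseudo_nmos_low_output(2.5, 40e3, 10e3)
print(f"VOL = {vol:.2f} V, static power = {power * 1e6:.0f} uW")
```

The sketch shows the trade-off: a weaker (more resistive) pull-up lowers VOL and the static power, but it also slows the rising transition.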
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.
[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluate transistor Me both driven by Clk, output Out and load capacitance CL]
Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
A fan-in of n requires 2n (n N-type + n P-type) devices.
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors.
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pulldown transistors controlled by the clock keep that precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create such a path, the gate's output is left charged at logic 1.
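The precharge and evaluate behaviour can be captured in a small behavioural model. This is a sketch, not a circuit simulation: the class name is hypothetical, the stored charge is a single bit, and leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn   # callable: truthy when the pull-down network conducts
        self.out = 0     # value stored on the output capacitance

    def step(self, clk, *inputs):
        if clk == 0:
            self.out = 1             # precharge: p-type device charges the node
        elif self.pdn(*inputs):
            self.out = 0             # evaluate: conducting PDN discharges it
        return self.out              # otherwise the stored value is kept

# Example pull-down network conducting when (A and B) or C.
gate = DynamicGate(lambda a, b, c: (a and b) or c)
assert gate.step(0, 0, 0, 0) == 1    # precharge
assert gate.step(1, 1, 1, 0) == 0    # evaluate: path to ground, output falls
assert gate.step(1, 0, 0, 0) == 0    # still evaluating: cannot recharge
assert gate.step(0, 0, 0, 0) == 1    # next precharge restores logic 1
assert gate.step(1, 0, 1, 0) == 1    # no conducting path: output stays high
```

The third step shows why an input that rises and then falls again during evaluation is harmful: once the node is discharged, only the next precharge can restore it.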
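Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: dynamic gate example with inputs A, B and C, precharge transistor Mp and evaluate transistor Me; Mp is on and Me off during precharge (Out = 1), Mp is off and Me on during evaluation, when Out is the complement of ((A·B)+C)]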
Properties of Dynamic Gates
Logic function is implemented by the PDN only.
Full swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk.
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn: low noise margin (NML).
Advantages and Disadvantages
Advantages
Low area, higher speed than static complementary gates.
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
Requires introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.
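Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me driven by Clk, with input In and outputs Out1 and Out2; voltage-versus-time waveforms of Clk, In, Out1 and Out2, with the level VTn marked]
Only 0 -> 1 transitions allowed at inputs.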
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The second dynamic inverter turns off (PDN) when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge will fix it.
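Domino CMOS Logic
[Figure: domino gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluate transistor Me driven by Clk, followed by a static inverter producing Out1]
Domino logic is the combination of dynamic logic and a static inverter.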
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.
Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, since only the L-H transition matters.
Input capacitance reduced - smaller logical effort.
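Cascading Domino Logic Gates
[Figure: two cascaded domino gates; the first has inputs In1, In2, In3 and output Out1, the second has inputs In4, In5 and output Out2; each has a PDN, precharge transistor Mp, evaluate transistor Me driven by Clk, and a static inverter]
The static inverter ensures that all inputs to the domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.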
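A behavioural sketch of two cascaded domino gates illustrates the point. The logic functions of the two stages (a 3-input AND feeding a 2-input OR) are hypothetical choices, and all names are illustrative; the model only tracks logic values, not timing.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural model)."""

    def __init__(self, pdn):
        self.pdn = pdn     # callable: truthy when the pull-down network conducts
        self.node = 1      # dynamic (precharged) node

    @property
    def out(self):
        return 1 - self.node           # static inverter on the dynamic node

    def precharge(self):
        self.node = 1                  # output therefore returns to 0

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0              # node discharges, output rises 0 -> 1
        return self.out

def domino_chain(in1, in2, in3, in5):
    """Two-stage chain: Out1 = In1.In2.In3, Out2 = Out1 + In5 (assumed logic)."""
    g1 = DominoGate(lambda a, b, c: a and b and c)
    g2 = DominoGate(lambda x, y: x or y)
    g1.precharge()
    g2.precharge()
    after_precharge = (g1.out, g2.out)  # both outputs are 0 here
    out1 = g1.evaluate(in1, in2, in3)
    out2 = g2.evaluate(out1, in5)       # the second gate only ever sees 0 -> 1
    return after_precharge, out1, out2

assert domino_chain(1, 1, 1, 0) == ((0, 0), 1, 1)
assert domino_chain(1, 0, 1, 0) == ((0, 0), 0, 0)
```

Because every stage output starts evaluation at 0, a later stage cannot be discharged by a stale high value from the stage before it.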
Clocked CMOS Logic
It is a combination of static CMOS with two additional transistors: the Clk input is given to one additional NMOS, and the Clk' input is given to another additional PMOS.
A fan-in of n requires 2n + 2 devices.
When Clk goes high, the logic of the circuit is evaluated, because the NMOS is on.
When Clk goes low, the logic is not evaluated, because the NMOS is off.
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices.
Disadvantages
More area, more complexity.
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below.
Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases, and with the N logic tree in the pull-down leg.
P logic stages use a complement clock, with the P logic stage tied above the output node.
During precharge, clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.
Cascading N-P Logic
Example logic below:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
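The three stage equations can be evaluated directly to see how values flow through the cascade. This is only a truth-value sketch of the Boolean expressions; Y, F and H are treated as free inputs because their sources are not given in the text, and the function name is hypothetical.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages; returns (X, G, Z)."""
    x = 1 - (a & b)             # Stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)       # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)       # Stage 3: Z = (F . G + H)'
    return x, g, z

# A = B = 1 pulls X low, which raises G and, with F = 1, pulls Z low.
assert np_cascade(1, 1, 1, 1, 0) == (0, 1, 0)
# A = 0 leaves X high; with Y = 1 the second stage stays at 0 and Z stays high.
assert np_cascade(0, 1, 1, 1, 0) == (1, 0, 1)
```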
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic Circuits: Inverter, NAND Gate, NOR Gate
BiCMOS Circuits
Inverter, NAND Gate
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred as Conditional sum adder
The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin
S and Cout are selected by actual Cin using 21 Multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The Delay of n-bit adder is given by T=PK1+(M-1)K2
where M=No of Blocks in the adder
P=No of Cells of each block K1=Delay through one adder
cell
K2 = Delay through Multiplexer
Carry Skip Adder (CSA)
It is also called as Carry Bypass Adder
Looks for cases in carry-out of a set of bits is identical to carry in
Typically organized into m-bit stages If Ai neBi for every bit in stage then
bypass gate sends stagersquos carry input directly to carry output
Simplified Carry-Skip Adder Logic
FAFA
aibi
sisi+1
ai+1bi+1
cici+1
pipi+1
skip signal
ci+1
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1)then C3 = C0
else C3=output carry of stage4 adder
Delay of CSA
Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder
cell k2=delay for skipping the
carry over a block
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands a multi-level carry-skip adder
was 14 faster and its power dissipation was 58
of that of the carry-lookahead adder
Using 64-bit operands a one-level carry-skip adder
was 38 slower and its power consumption is 68
of the the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-
Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier A Wallace tree is an implementation of an adder
tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors
(carry and sum) Carry and sum outputs combined using a
conventional adder Compresses the no of stages of partial products
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in Top-Down approach with the help of CAD tools
Partition the system sensibly Aiming for simple interconnection and high
regularity between sub-systems Generate and verify each section of the
design Calculate the dimensions of the layout of sub-
systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Static CMOS Switch Delay Model
A
Req
A
Rp
A
Rp
A
Rn CL
A
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
B
Rp
A
Rp
A
Rn
B
Rn CL
Cint
NAND2
INV
NOR2
Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs
Low to high transition both inputs go low
delay is 069 Rp2 CL
one input goes low delay is 069 Rp CL
High to low transition both inputs go high
delay is 069 2Rn CL
CL
B
Rn
A
Rp
B
Rp
A
Rn Cint
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: pseudo-NMOS voltage transfer characteristic, Vout (V) versus Vin (V), for PMOS load sizes W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
The main advantage of the pseudo-NMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity
Disadvantages
Because of the higher pull-up resistance, delay is larger and hence circuit speed is lower
Higher static power dissipation, due to the conduction path between VDD and VSS
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic
It avoids static power dissipation and adds a clock input that defines precharge and conditional evaluation phases
[Figure: dynamic gate with clocked precharge transistor Mp, pull-down network (PDN) driven by In1, In2 and In3, clocked evaluation transistor Me, output Out and load capacitance CL]
Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
A fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
A fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
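The device counts above can be tabulated for a few fan-in values. This is a small illustrative sketch; the function names are made up for the example.

```python
def static_cmos_transistors(n):
    """Static CMOS: n N-type plus n P-type devices for a fan-in of n."""
    return 2 * n


def dynamic_cmos_transistors(n):
    """Dynamic CMOS: n PDN devices plus the clocked Me and Mp transistors."""
    return n + 2


if __name__ == "__main__":
    for fan_in in (2, 4, 8):
        print(fan_in, static_cmos_transistors(fan_in), dynamic_cmos_transistors(fan_in))
```

For a fan-in of 2 the two styles need the same number of devices; the saving of dynamic logic grows with fan-in.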
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pulldown transistor controlled by the clock is off, which keeps the precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the clocked n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1
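The precharge and evaluate phases can be mimicked with a small behavioural model. This is a sketch under simplifying assumptions (ideal switches, no leakage or charge sharing); the class name and the AND-OR pulldown used in the demonstration are illustrative choices.

```python
class DynamicGate:
    """Behavioural model of one dynamic gate.

    `pdn` is a function of the inputs that is truthy when the pulldown
    network conducts.
    """

    def __init__(self, pdn):
        self.pdn = pdn
        self.out = 1  # storage node, assumed precharged initially

    def precharge(self):
        # CLK = 0: Mp on, Me off, so the node is charged to 1 regardless of inputs.
        self.out = 1

    def evaluate(self, *inputs):
        # CLK = 1: Mp off, Me on. A conducting PDN discharges the node;
        # otherwise the node floats and keeps its stored value.
        if self.pdn(*inputs):
            self.out = 0
        return self.out


if __name__ == "__main__":
    gate = DynamicGate(lambda a, b, c: (a and b) or c)
    gate.precharge()
    print(gate.evaluate(0, 0, 0))  # PDN off: node stays charged
    print(gate.evaluate(1, 1, 0))  # PDN conducts: node is discharged
    print(gate.evaluate(0, 0, 0))  # inputs fell back, but the charge is gone
```

The last call illustrates why inputs must rise monotonically: once an input pulse has discharged the node, removing the input cannot restore the stored 1 until the next precharge.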
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
[Figure: example dynamic gate with pulldown network (A·B)+C, giving Out = ((A·B)+C)'; during precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on]
Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation is usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, giving a low noise margin (NML)
Advantages and Disadvantages
Advantages
Low area
Higher speed than static complementary gates
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
They require the introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power
Cascading Dynamic Gates
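[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk, with input In and outputs Out1 and Out2; waveforms of Clk, In, Out1 and Out2 (voltage versus time) with the threshold VTn marked]
Only 0 -> 1 transitions are allowed at the inputs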
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation
However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off only when Out1 has fallen to VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge fixes the problem
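The charge loss on Out2 can be illustrated with a very crude time-step model. Everything here is an assumption made for illustration: Out1 is taken to fall exponentially with time constant tau1, the second pulldown is treated as a linear resistor with time constant tau2 that conducts while Out1 is above VTn, and the default values are arbitrary.

```python
import math


def cascade_out2(vdd=2.5, vtn=0.5, tau1=50e-12, tau2=50e-12, dt=1e-12, steps=500):
    """Return the final Out2 voltage of two directly cascaded dynamic inverters."""
    out2 = vdd  # Out2 starts precharged
    for k in range(steps):
        out1 = vdd * math.exp(-k * dt / tau1)  # Out1 discharging through its PDN
        if out1 > vtn:                          # second PDN is still conducting
            out2 -= out2 / tau2 * dt            # Out2 loses charge (forward Euler step)
    return out2


if __name__ == "__main__":
    print(f"slow first stage: Out2 ends at {cascade_out2():.2f} V")
    print(f"fast first stage: Out2 ends at {cascade_out2(tau1=5e-12):.2f} V")
```

In this model the lost charge is never restored during evaluation, and the loss shrinks as the first stage gets faster, which matches the description of the problem above.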
Domino CMOS Logic
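[Figure: domino gate consisting of a dynamic stage (Mp, PDN with inputs In1, In2 and In3, Me, clocked by Clk) followed by a static inverter driving Out1; the dynamic node makes only 1 -> 1 or 1 -> 0 transitions during evaluation]
Domino logic is the combination of a dynamic logic gate and a static inverter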
Domino CMOS Logic operation
Precharge Phase (same as dynamic logic)
Evaluate Phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the clocked n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed: the static inverter can be skewed, because only its low-to-high output transition matters
Input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
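[Figure: two cascaded domino gates; the first has a PDN with inputs In1, In2 and In3 and drives Out1 through its inverter, the second has a PDN with inputs In4 and In5 and drives Out2; each dynamic node makes only 1 -> 1 or 1 -> 0 transitions and each inverter output only 0 -> 0 or 0 -> 1 transitions]
The static inverters ensure that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1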
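The cascading rule can be checked with a behavioural sketch of two domino gates in series. The class name and the two pulldown functions are illustrative assumptions, and the model ignores timing, leakage and charge sharing.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural model)."""

    def __init__(self, pdn):
        self.pdn = pdn   # truthy when the pulldown network conducts
        self.node = 1    # precharged dynamic node

    @property
    def out(self):
        return 1 - self.node  # static inverter on the dynamic node

    def precharge(self):
        self.node = 1         # CLK = 0: node charged, so out becomes 0

    def evaluate(self, *inputs):
        if self.pdn(*inputs):  # CLK = 1: conditional discharge
            self.node = 0
        return self.out


if __name__ == "__main__":
    g1 = DominoGate(lambda a, b: a and b)  # first stage: AND
    g2 = DominoGate(lambda x, c: x or c)   # second stage: OR
    g1.precharge()
    g2.precharge()
    print(g1.out, g2.out)                  # both outputs are 0 after precharge
    out1 = g1.evaluate(1, 1)
    print(out1, g2.evaluate(out1, 0))      # outputs rise one after the other
```

Because every gate output is 0 after precharge, the second stage can only see 0 -> 1 transitions, so it cannot lose charge prematurely the way directly cascaded dynamic gates can.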
Clocked CMOS Logic
It combines static CMOS with two additional clocked transistors: the Clk input drives one additional NMOS and the Clk' input drives one additional PMOS
A fan-in of n requires 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
When Clk goes low, the logic is not evaluated because the clocked NMOS is off
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
More area
More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P logic stages with outputs X, G and Z]
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS and BJT
Core logic: CMOS
Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
Inverter, NAND Gate
Structured Design
Designing the digital block or standard cell at the following levels: logic level, circuit level, stick diagram, layout
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder
Manchester Carry Chain Adder
Enhancement Techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
With butting contact: L = 22λ, W = 10λ
With buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
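The three definitions can be verified exhaustively against the carry of a full adder. The function names below are illustrative; the carry expression Cout = G + P·C follows from the definitions above.

```python
from itertools import product


def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b               # both inputs 1: a carry is generated
    p = a ^ b               # exactly one input 1: the incoming carry passes
    k = (1 - a) & (1 - b)   # both inputs 0: any incoming carry is killed
    return g, p, k


def carry_out(a, b, c):
    """Carry output written in terms of generate and propagate."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


# Exhaustive check against the arithmetic definition of the carry.
for a, b, c in product((0, 1), repeat=3):
    assert carry_out(a, b, c) == (a + b + c) // 2
    assert sum(pgk(a, b)) == 1  # exactly one of G, P, K is active
```

The second assertion shows that G, P and K are mutually exclusive, which is what allows them to be used as switch controls in a carry chain.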
Manchester Carry Chain
Built from switch logic using the propagate, generate and kill signals
Manchester Chain adders
The carry path is precharged by the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
The generate signal (G = A · B) discharges the precharged carry node
4-stage Dynamic Manchester Chain
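A behavioural sketch of a precharged Manchester chain is given below. It assumes the common arrangement in which each chain node stores the complemented carry, is precharged to 1, and is discharged either by its generate pulldown or through a conducting propagate pass transistor; the function name and the bit ordering (LSB first) are choices made for the example.

```python
def manchester_carries(a_bits, b_bits, c0):
    """Carries c1..cn of a precharged Manchester chain; bits are LSB first."""
    n = len(a_bits)
    # Precharge phase: every complemented-carry node is charged to 1.
    node_bar = [1] * n
    prev_bar = 1 - c0  # complement of the carry input
    # Evaluate phase: a node is discharged by its generate pulldown, or by a
    # conducting pass transistor whose neighbour is already discharged.
    for i in range(n):
        g = a_bits[i] & b_bits[i]
        p = a_bits[i] ^ b_bits[i]
        if g or (p and prev_bar == 0):
            node_bar[i] = 0
        prev_bar = node_bar[i]
    return [1 - nb for nb in node_bar]  # true carries


if __name__ == "__main__":
    # 5 + 3 with no carry in, LSB first.
    print(manchester_carries([1, 0, 1, 0], [1, 1, 0, 0], 0))
```

No kill device is needed in this arrangement: a node that is neither generated nor reached through a propagate switch simply keeps its precharged value.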
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3 and output Cout]
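The cascade of full adders can be sketched in a few lines. The helper names (`to_bits`, `from_bits`) are illustrative, and bits are stored LSB first.

```python
def full_adder(a, b, c):
    """One-bit full adder returning (sum, carry out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # each stage waits for the previous carry
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), cout)
```

The loop makes the critical path visible: the carry variable is threaded through every stage, so delay grows linearly with the word length.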
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
The output carry is represented in terms of generate and propagate signals
The expressions for a 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
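Starting from C1 = G0 + P0·C0 and applying the same recurrence C(i+1) = Gi + Pi·Ci repeatedly gives every carry as a function of the inputs and C0 only. The sketch below writes out that standard expansion for four bits; the function name and the LSB-first bit lists are assumptions made for the example.

```python
def cla4_carries(a, b, c0):
    """Carries C1..C4 of a 4-bit carry look-ahead unit; a and b are LSB first."""
    g = [x & y for x, y in zip(a, b)]  # generate signals G0..G3
    p = [x ^ y for x, y in zip(a, b)]  # propagate signals P0..P3
    # Each carry depends only on G, P and C0, so all four can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


if __name__ == "__main__":
    # 5 + 3 with no carry in, LSB first.
    print(cla4_carries([1, 0, 1, 0], [1, 1, 0, 0], 0))
```

The number of product terms grows with the bit position, which is why look-ahead is usually applied to small groups of bits rather than to the whole word.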
Carry Select Adder
It is also referred to as a conditional sum adder
Each stage of the adder is built from two blocks: one block adds with a logical 0 Cin, the other with a logical 1 Cin
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
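The selection scheme can be sketched as follows. The block size of 4, the function names and the use of ripple blocks inside each stage are assumptions made for the example; bits are LSB first.

```python
def add_block(a_bits, b_bits, cin):
    """Ripple-add one block and return (sum bits, carry out)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry select addition of two equal-length bit lists (LSB first)."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = add_block(a_blk, b_blk, 0)  # speculative result for Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1)  # speculative result for Cin = 1
        # 2:1 multiplexers pick the result that matches the actual carry in.
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s_sel)
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    s, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
    print(from_bits(s), cout)
```

In hardware both speculative blocks work at the same time, so only the multiplexer delay is added per block once the first block has finished.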
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
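The delay expression can be evaluated for a concrete partition. The delay values used below are arbitrary units chosen for illustration, not figures from the slides.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


if __name__ == "__main__":
    # 8-bit adder split into M = 2 blocks of P = 4 cells (hypothetical delays).
    print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```

The first term is the ripple through one block; the remaining blocks contribute only a multiplexer delay each.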
Carry Skip Adder (CSA)
It is also called a Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then a bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1 combined into the skip signal that selects the output carry]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the fourth adder stage
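The skip rule can be sketched as a functional model. The block size, the function names and the LSB-first bit lists are illustrative assumptions; the model only captures the logic, not the timing advantage of the bypass path.

```python
def carry_skip_add(a_bits, b_bits, cin=0, block=4):
    """Carry skip addition of two equal-length bit lists (LSB first)."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        block_cin = carry
        ripple = block_cin
        skip = 1
        for a, b in zip(a_bits[i:i + block], b_bits[i:i + block]):
            p = a ^ b
            skip &= p                        # skip stays 1 only if every bit propagates
            sums.append(p ^ ripple)
            ripple = (a & b) | (p & ripple)  # ordinary ripple carry inside the block
        # Bypass multiplexer: forward the block's carry in when all bits propagate.
        carry = block_cin if skip else ripple
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


if __name__ == "__main__":
    s, cout = carry_skip_add(to_bits(0b10101111, 8), to_bits(0b01010001, 8))
    print(from_bits(s), cout)
```

When every bit of a block propagates, the rippled carry equals the block's carry in, so the bypass never changes the result; it only makes the carry available sooner.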
Delay of CSA
Worst case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
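The worst-case expression can be compared with a plain ripple adder. The ripple delay of n·K1 is an assumption added for the comparison, and the delay values are arbitrary units chosen for illustration.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2 for M blocks of P cells each."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed ripple-carry delay: the carry crosses all n cells."""
    return n * k1


if __name__ == "__main__":
    # 32-bit adder as M = 8 blocks of P = 4 cells (hypothetical delays).
    print(carry_skip_delay(p=4, m=8, k1=1.0, k2=0.5), ripple_delay(32, 1.0))
```

The worst case ripples through the first and last blocks and skips the blocks in between, which is why only M - 2 skip delays appear.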
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
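The reduction to two vectors can be sketched with carry-save (3:2) compression. This is a simplified word-level illustration rather than a column-by-column Wallace tree: whole partial-product rows are compressed three at a time, and the function names and the 8-bit default width are assumptions made for the example.

```python
def carry_save(x, y, z):
    """3:2 compressor applied bitwise: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, n=8):
    """Multiply two unsigned n-bit numbers by carry-save reduction."""
    # Partial products: a shifted copy of `a` for every set bit of `b`.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    # Compress three rows into two until only two vectors remain.
    while len(rows) > 2:
        groups = len(rows) // 3
        nxt = []
        for g in range(groups):
            nxt.extend(carry_save(rows[3 * g], rows[3 * g + 1], rows[3 * g + 2]))
        nxt.extend(rows[3 * groups:])  # rows left over pass to the next level
        rows = nxt
    return sum(rows)  # final conventional (carry-propagate) addition


if __name__ == "__main__":
    print(wallace_multiply(13, 11))
```

Each pass through the loop removes roughly a third of the rows without propagating any carry, which is where the logarithmic depth comes from; only the final addition needs a carry-propagate adder.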
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Delay Dependence on Input Patterns
-05
0
05
1
15
2
25
3
0 100 200 300 400
A=B=10
A=1 B=10
A=1 0 B=1
time [ps]
Vo
ltage
[V]
Input DataPattern
Delay(psec)
A=B=01 67
A=1 B=01
64
A= 01 B=1
61
A=B=10 45
A=1 B=10
80
A= 10 B=1
81
NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF
Pseudo NMOS Logic
Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic
VDD
F
In1In2
InN
In1In2
InN
PUN
PDN
helliphellip
Static CMOS
Pseudo NMOS Operation The pulldown network of the gate is the same as for a
fully complementary gate The pullup network is replaced by a single p-type
transistor whose gate is connected to VSS leaving the transistor permanently on
The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are
off and the p-type transistor pulls the gatersquos output up to VDD
When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks
Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
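The three-stage example can be checked with a few lines of Python. This is a purely logical sketch under the assumption that stages 1 and 3 are N-logic stages and stage 2 is a P-logic stage; the function name np_example and the PRECHARGE table are introduced here for illustration.

```python
def np_example(a: int, b: int, y: int, f: int, h: int) -> dict:
    """Evaluate the three-stage example; all signals are 0 or 1."""
    x = 1 - (a & b)            # stage 1 (N logic): X = (A . B)'
    g = (1 - x) | (1 - y)      # stage 2 (P logic): G = X' + Y'
    z = 1 - ((f & g) | h)      # stage 3 (N logic): Z = (F . G + H)'
    return {"X": x, "G": g, "Z": z}


# Precharge values: N-logic outputs start at 1, P-logic outputs at 0
PRECHARGE = {"X": 1, "G": 0, "Z": 1}

fired = np_example(a=1, b=1, y=1, f=1, h=0)    # every stage switches
idle = np_example(a=0, b=0, y=1, f=1, h=0)     # no stage switches
```

In the second call every output keeps its precharged value, which is the situation in which no stage makes a transition during evaluation.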
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT, with core logic in CMOS and the interface in BJT
Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
[Figures: BiCMOS inverter and BiCMOS NAND gate]
Structured Design
Designing the digital block or standard cell at the following levels: logic level, circuit level, stick diagram, layout
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the Adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
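The three definitions can be checked exhaustively against the usual majority form of the full-adder carry. A minimal sketch; the function names pgk and carry_out are chosen here for illustration.

```python
from itertools import product


def pgk(a: int, b: int) -> tuple:
    """Return (G, P, K) for one bit position."""
    g = a & b                  # generate: G = A . B
    p = a ^ b                  # propagate: P = A xor B
    k = (1 - a) & (1 - b)      # kill: K = ~A . ~B
    return g, p, k


def carry_out(a: int, b: int, c: int) -> int:
    """Carry of a full adder written with generate and propagate."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


# Exhaustive check over all eight input combinations
for a, b, c in product((0, 1), repeat=3):
    assert carry_out(a, b, c) == (a & b) | (b & c) | (a & c)
    assert sum(pgk(a, b)) == 1   # exactly one of G, P, K holds
```

Because exactly one of G, P and K is active for any pair of input bits, each bit position has a single, well-defined effect on the carry.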
Manchester Carry Chain
Built from switch logic using propagate, generate & kill
Manchester chain adders: the carry node is precharged by the clock signal instead of being driven through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A · B)
4-stage Dynamic Manchester Chain
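The switch-level behaviour of the chain can be approximated in software. The sketch below is an assumption-laden illustration: it uses active-high carries for readability (a real dynamic chain typically precharges complemented carry nodes), and the function name manchester_chain is introduced here.

```python
def manchester_chain(a_bits, b_bits, c_in=0):
    """Return the carries [C1, ..., Cn] for LSB-first operand bits.

    Behavioural sketch only: each loop iteration stands for one chain node.
    """
    carries = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        if a & b:        # generate: the node is forced to carry = 1
            carry = 1
        elif a ^ b:      # propagate: pass transistor on, carry passes through
            pass
        else:            # neither: the node stays in its no-carry state
            carry = 0
        carries.append(carry)
    return carries


# 7 + 1 with LSB-first bits: the carry ripples through three nodes
carries = manchester_chain([1, 1, 1, 0], [1, 0, 0, 0])
```

The propagate branch does no work at all, which mirrors the hardware: the carry simply flows through a closed pass transistor.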
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3 and output Cout]
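A cascade of full adders is short enough to write out in full. A minimal sketch, assuming LSB-first bit lists; the helper names full_adder, ripple_add, to_bits and from_bits are introduced here.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    c_out = (a & b) | (b & c) | (a & c)
    return s, c_out


def ripple_add(a_bits, b_bits, c_in=0):
    """Cascade full adders; the carry ripples from Cin to Cout."""
    sums = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value: int, width: int) -> list:
    """LSB-first list of `width` bits."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


sums, c_out = ripple_add(to_bits(9, 4), to_bits(8, 4))
total = from_bits(sums) + (c_out << 4)
```

The loop carries a single value from one iteration to the next, which is the software counterpart of the Cin-to-Cout critical path.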
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
The output carry is expressed in terms of generate and propagate signals
The expressions for the 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
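Applying C(i+1) = Gi + Pi·Ci repeatedly, starting from C1 = G0 + P0·C0, gives every carry directly in terms of the inputs and C0. The expanded expressions in the sketch below are the standard ones, written out here because the slide's own expressions are not in the text; the function name cla4 is an assumption.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead adder over LSB-first bit lists.

    Returns (sum_bits, c4). Every carry depends only on G, P and C0.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


# 11 + 6 = 17, LSB-first bits
sums, c4 = cla4([1, 1, 0, 1], [0, 1, 1, 0])
```

None of the carry expressions refers to another carry, so all four can be computed in parallel at the cost of wider gates.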
Carry Select Adder
It is also referred to as a conditional sum adder
The adder is divided into two blocks: one block with Cin at logical 0, another block with Cin at logical 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
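Both the selection step and the delay formula can be sketched in a few lines. The function names and the numeric delays below are hypothetical values chosen only to exercise the formula.

```python
def carry_select_delay(p_cells: int, m_blocks: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M-1)*K2, as given above."""
    return p_cells * k1 + (m_blocks - 1) * k2


def ripple_block(a_bits, b_bits, c_in):
    """Ripple-carry block over LSB-first bit lists; returns (sum_bits, c_out)."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_block(a_bits, b_bits, c_in):
    """Compute both candidate results, then select with the actual Cin."""
    result0 = ripple_block(a_bits, b_bits, 0)   # block assuming Cin = 0
    result1 = ripple_block(a_bits, b_bits, 1)   # block assuming Cin = 1
    return result1 if c_in else result0         # 2:1 multiplexer


# Hypothetical numbers: 8-bit adder built from M = 2 blocks of P = 4 cells
t = carry_select_delay(p_cells=4, m_blocks=2, k1=1.0, k2=0.5)
```

Only the first block contributes its full ripple delay P·K1; every further block adds just one multiplexer delay, because its two candidate results are already waiting.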
Carry Skip Adder (CSA)
It is also called a Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output
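Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1 forming the skip signal]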
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
Delay of CSA
Worst case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
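The skip condition and the worst-case delay can be sketched together. The function names are introduced here, P and M are taken to mean cells per block and number of blocks as in the carry-select formula, and the delay values are hypothetical.

```python
def carry_skip_delay(p_cells: int, m_blocks: int, k1: float, k2: float) -> float:
    """Worst case T = 2(P-1)*K1 + (M-2)*K2, as given above."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


def carry_skip_block(a_bits, b_bits, c_in):
    """One skip block over LSB-first bits; returns (sum_bits, c_out, skipped)."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    # Skip signal: every bit position propagates (Ai != Bi for all i)
    skipped = all(a ^ b for a, b in zip(a_bits, b_bits))
    c_out = c_in if skipped else carry   # bypass gate or rippled carry
    return sums, c_out, skipped


# Hypothetical numbers: 16-bit adder built from M = 4 blocks of P = 4 cells
t = carry_skip_delay(p_cells=4, m_blocks=4, k1=1.0, k2=0.5)
sums, c_out, skipped = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)
```

The bypass never changes the logical result, since a fully propagating block would ripple the same carry anyway; it only shortens the path the carry has to take.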
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
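The column-compression idea can be sketched in software. The code below is a simplified illustration in the spirit of a Wallace tree, not a gate-accurate model: it uses only 3:2 compressors (full adders), and the function name wallace_multiply is an assumption.

```python
def wallace_multiply(x: int, y: int, width: int = 4) -> int:
    """Multiply two unsigned `width`-bit numbers by column compression."""
    # columns[k] holds the partial-product bits of weight 2**k
    columns = [[] for _ in range(2 * width)]
    for i in range(width):
        for j in range(width):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Each pass compresses all columns in parallel until at most two bits
    # remain per column (the carry vector and the sum vector)
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in columns]
        for k, col in enumerate(columns):
            bits = list(col)
            while len(bits) >= 3:           # full adder: 3 bits -> sum + carry
                a, b, c = bits.pop(), bits.pop(), bits.pop()
                if k + 1 == len(nxt):       # grow if a carry leaves the top column
                    nxt.append([])
                nxt[k].append(a ^ b ^ c)
                nxt[k + 1].append((a & b) | (b & c) | (a & c))
            nxt[k].extend(bits)             # leftover bits pass through unchanged
        columns = nxt

    # The two remaining vectors are combined by a conventional adder
    row0 = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return row0 + row1


product = wallace_multiply(13, 11)
```

Each full adder replaces three bits of one weight with a sum bit of the same weight and a carry bit of the next weight, so the total value is preserved while the number of rows shrinks.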
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
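Pseudo NMOS Logic
Pseudo-NMOS logic is static CMOS logic with the pull-up network reduced from N input-driven p-type transistors to a single p-type transistor
[Figure: static CMOS gate with inputs In1 ... InN driving both the PUN and the PDN between VDD and ground, output F]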
Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate inputs are at VDD, both the p-type and n-type transistors are on, and together they determine the gate's output voltage
Pseudo NMOS Characteristics
Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]
Advantages and Disadvantages
Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity
Disadvantages
The higher pull-up resistance increases delay, so circuits are slower
More static power dissipation, due to the conduction path between VDD and VSS
Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases
[Figure: dynamic gate with precharge transistor Mp and evaluation transistor Me clocked by Clk, PDN with inputs In1-In3, and output Out with load capacitance CL]
Two phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path
A fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors
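The device counts can be tabulated for a given fan-in. A small sketch; the function names are introduced here, and the pseudo-NMOS count of n + 1 is inferred from its single p-type pullup rather than quoted from the slides.

```python
def static_cmos_transistors(n: int) -> int:
    """Static CMOS: n N-type + n P-type devices."""
    return 2 * n


def pseudo_nmos_transistors(n: int) -> int:
    """Pseudo-NMOS: n N-type devices + one permanently-on P-type pullup."""
    return n + 1


def dynamic_cmos_transistors(n: int) -> int:
    """Dynamic CMOS: n + 1 N-type (logic + evaluate) + 1 P-type (precharge)."""
    return n + 2


# Device counts for a 4-input gate in each style
counts = {
    "static": static_cmos_transistors(4),
    "pseudo-NMOS": pseudo_nmos_transistors(4),
    "dynamic": dynamic_cmos_transistors(4),
}
```

The saving over static CMOS grows with fan-in, since only the pulldown network scales with the number of inputs.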
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pulldown transistor controlled by the clock keeps the precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the clocked n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it can inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high impedance state during and after evaluation (PDN off); the state is stored on CL
[Figure: dynamic gate example with PDN inputs A, B and C; during precharge Mp is on, Me is off and Out = 1; during evaluate Mp is off, Me is on and Out = ((A·B)+C)']
Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
low noise margin (NML)
Advantages and Disadvantages
Advantages
low area
higher speed than static complementary gates
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
Requires introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power
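Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk, input In and outputs Out1 and Out2; waveforms of Clk, In, Out1 and Out2 versus time t, showing Out2 dropping while Out1 is still above VTn]
Only 0 -> 1 transitions allowed at inputs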
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
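The erroneous discharge can be reproduced with a deliberately crude time-stepped model. Everything numeric below is an assumption made for illustration (supply voltage, threshold, a fixed linear discharge per step); only the mechanism is taken from the text.

```python
def cascade_dynamic_inverters(vdd=2.5, vtn=0.5, step=0.25, n_steps=20):
    """Evaluate two cascaded dynamic inverters with In = 1.

    Returns a list of (out1, out2) voltages, one pair per time step.
    Linear discharge is a simplification, not a device model.
    """
    out1 = out2 = vdd                     # both nodes were precharged
    trace = []
    for _ in range(n_steps):
        # The second PDN conducts for as long as Out1 is above VTn
        second_pdn_on = out1 > vtn
        out1 = max(0.0, out1 - step)      # In = 1 discharges Out1
        if second_pdn_on:
            out2 = max(0.0, out2 - step)  # unintended loss of charge
        trace.append((out1, out2))
    return trace


trace = cascade_dynamic_inverters()
final_out1, final_out2 = trace[-1]
```

Out1 ends at 0 as intended, but Out2 has lost charge that nothing can restore before the next precharge. If the second gate's input were held at 0 until the first stage had settled, its PDN would never conduct during this interval.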
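Domino CMOS Logic
[Figure: domino gate; a dynamic stage with PDN inputs In1-In3, precharge transistor Mp and evaluation transistor Me clocked by Clk, followed by a static inverter driving Out1]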
Pseudo NMOS Characteristics
Pseudo NMOS VTC
00 05 10 15 20 2500
05
10
15
20
25
30
Vin [V]
Vou
t [V
]
WLp = 4
WLp = 2
WLp = 1
WLp = 025
WLp = 05
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
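[Figure: dynamic CMOS gate. Inputs In1, In2 and In3 drive the pull-down network (PDN); Clk drives the precharge transistor Mp and the evaluate transistor Me; the output Out is loaded by the capacitance CL]
Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)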
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires only n + 2 (n+1 N-type + 1 P-type) transistors
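The device counts above can be tabulated with a short Python sketch; the function names are illustrative only, and the counts are simply the 2n and n + 2 expressions from the text.

```python
def static_cmos_transistors(fan_in):
    """Static complementary CMOS: n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """Dynamic CMOS: n N-type PDN devices, one N-type evaluate device
    and one P-type precharge device."""
    return fan_in + 2


# Print fan-in, static count and dynamic count side by side.
for n in (2, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

For fan-in above 2 the dynamic gate needs fewer devices, and the saving grows with fan-in.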
Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off); the state is stored on CL
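[Figure: example dynamic gate with inputs A, B and C. During precharge (Clk = 0) Mp is on, Me is off and Out = 1; during evaluate (Clk = 1) Mp is off, Me is on and Out = ((AB)+C)']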
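The precharge and evaluate behaviour of this example gate can be illustrated with a small behavioural model in Python. It is a sketch only: the class name `DynamicGate` and its `step` method are hypothetical, and charge sharing, leakage and timing are ignored. The pull-down network is assumed to conduct when (A AND B) OR C is true.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate (illustrative, not a circuit simulator)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function that returns a truthy value when the PDN conducts
        self.out = 1    # value stored on the output capacitance CL

    def step(self, clk, *inputs):
        if clk == 0:
            # Precharge: Mp on, Me off, output node charged to 1.
            self.out = 1
        elif self.pdn(*inputs):
            # Evaluate: a conducting PDN discharges the node; it stays at 0
            # until the next precharge, whatever the inputs do afterwards.
            self.out = 0
        return self.out


gate = DynamicGate(lambda a, b, c: (a and b) or c)
gate.step(0, 0, 0, 0)          # precharge
print(gate.step(1, 1, 1, 0))   # evaluate with A = B = 1: node discharges
print(gate.step(1, 0, 0, 0))   # inputs fall during evaluate: node cannot recharge
```

The last call shows why inputs may make at most one transition during evaluation: once the node has been discharged, returning the inputs to 0 does not restore the output.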
Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation usually higher than static CMOS
no glitching
higher transition probabilities
extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
low noise margin (NML)
Advantages and Disadvantages
Advantages
low area
higher speed than static complementary gates
Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
Requires introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power
Cascading Dynamic Gates
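[Figure: two cascaded dynamic inverters (In, Out1, Out2), each with a precharge transistor Mp and an evaluate transistor Me clocked by Clk, together with voltage-versus-time waveforms of Clk, In, Out1 and Out2 and the threshold VTn marked]
Only 0 -> 1 transitions allowed at inputs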
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
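[Figure: domino gate. Inputs In1, In2 and In3 drive the PDN; Clk drives Mp and Me; the dynamic node feeds a static inverter that produces Out1]
Domino logic is the combination of a dynamic logic gate and a static inverter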
Domino CMOS Logic operation
Precharge Phase (same as Dynamic Logic)
Evaluate Phase (modification to Dynamic Logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node makes a transition from logic 1 to logic 0 only
Very high speed
static inverter can be skewed, since only the L-H transition is critical
Input capacitance reduced, giving smaller logical effort
Cascading Domino Logic Gates
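[Figure: two cascaded domino gates. The first PDN has inputs In1, In2 and In3 and produces Out1; the second PDN has inputs In4 and In5 and produces Out2; both gates are clocked by Clk through Mp and Me]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1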
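A short Python sketch can illustrate this property. It is a behavioural model under simplifying assumptions (no delays, no charge sharing); `DominoGate` and `evaluate_chain` are hypothetical names, and each gate in the chain is assumed to be a single-input buffer whose output feeds the next gate.

```python
class DominoGate:
    """Dynamic node followed by a static inverter (behavioural model only)."""

    def __init__(self, pdn):
        self.pdn = pdn  # function that returns a truthy value when the PDN conducts
        self.node = 1   # dynamic storage node, precharged high

    @property
    def out(self):
        return 1 - self.node  # static inverter on the dynamic node

    def step(self, clk, *inputs):
        if clk == 0:
            self.node = 1     # precharge: output falls to 0
        elif self.pdn(*inputs):
            self.node = 0     # evaluate: output rises to 1
        return self.out


def evaluate_chain(gates, first_input):
    """Precharge every gate, then evaluate them in order like falling dominoes."""
    for gate in gates:
        gate.step(0, 0)
    after_precharge = [gate.out for gate in gates]
    signal, outputs = first_input, []
    for gate in gates:
        signal = gate.step(1, signal)
        outputs.append(signal)
    return after_precharge, outputs


chain = [DominoGate(lambda a: a) for _ in range(3)]
print(evaluate_chain(chain, 1))
```

After precharge every output is 0, so each gate sees only 0 -> 1 transitions at its inputs during evaluation, which avoids the erroneous discharge seen with directly cascaded dynamic gates.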
Clocked CMOS Logic
It is a combination of
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS
A gate with fan-in n requires 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is ON
When Clk goes low, the logic is not evaluated, because the clocked NMOS is OFF
Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices
Disadvantages
More area
More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic stage tied above the output node
During precharge, clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
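The three stage equations of this example can be checked with a few lines of Python. This is only a Boolean sketch of the stage functions as written; the function name `np_example` is illustrative, and the clocking of the N and P stages is not modelled.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three example stages on 0/1 inputs; returns (X, G, Z)."""
    x = 1 - (a & b)          # Stage 1: X = (A · B)'
    g = (1 - x) | (1 - y)    # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)    # Stage 3: Z = (F · G + H)'
    return x, g, z


print(np_example(1, 1, 1, 1, 0))
```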
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic Circuits: Inverter, Nand Gate, Nor Gate
BiCMOS Circuits
Inverter, Nand Gate
Structured Design
Designing the digital block or standard cell at the following levels
Logic level
Circuit level
Stick Diagram
Layout
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder
Manchester Carry Chain Adder
Enhancement Techniques
Carry Select Adders
Carry Skip Adders
Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
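These definitions translate directly into Python. The sketch below uses 0/1 integers for single bits; the names `pgk` and `carry_out` are illustrative. Exactly one of P, G and K is 1 for any input pair, and the carry out follows as G + P·C.

```python
def pgk(a, b):
    """Return the (propagate, generate, kill) signals for one bit position."""
    p = a ^ b
    g = a & b
    k = (1 - a) & (1 - b)
    return p, g, k


def carry_out(a, b, c):
    """Carry out of a full adder, expressed with generate and propagate."""
    p, g, _ = pgk(a, b)
    return g | (p & c)


# List the signals and carry out for every input combination.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, pgk(a, b), [carry_out(a, b, c) for c in (0, 1)])
```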
Manchester Carry Chain
Built from switch logic using propagate, generate & kill
Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic
Carry path is gated by the Propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
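Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design full adder to have fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1..C3, carry input Cin and carry output Cout]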
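A carry-ripple adder can be modelled in Python by chaining a 1-bit full adder. This is a minimal sketch with bit lists ordered least significant bit first; the helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are illustrative.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples stage by stage."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = ripple_carry_add(to_bits(9, 4), to_bits(7, 4))
print(from_bits(sum_bits), cout)
```

Each stage must wait for the carry of the previous stage, so the loop body runs once per bit: this is the critical path from Cin to Cout.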
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output carry is represented in terms of generate and propagate signals
The expressions for a 4-bit CLA are
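For reference, the standard 4-bit look-ahead expansion is written out below. It assumes $G_i = A_i B_i$ and $P_i = A_i \oplus B_i$ as defined for the full adder, with $C_0$ as the carry input; each carry is obtained by substituting the previous one into $C_{i+1} = G_i + P_i C_i$.

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the operand bits and $C_0$, so all four can be computed in parallel.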
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred to as a conditional sum adder
The adder is divided into two blocks
One block with logical 0 Cin
Another block with logical 1 Cin
S and Cout are selected by the actual Cin using a 2:1 multiplexer
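The selection scheme can be sketched in Python as follows. The sketch assumes an 8-bit adder split into 4-bit blocks and uses integer arithmetic inside each block for brevity; `add_block` and `carry_select_add` are illustrative names, and the total width is assumed to be a multiple of the block width.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values with a carry in; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=8, block=4, cin=0):
    """Carry-select addition of two n-bit values; returns (sum, carry_out)."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, n, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidate results are computed without waiting for the carry.
        s0, c0 = add_block(a_blk, b_blk, 0, block)   # block assuming Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)   # block assuming Cin = 1
        # 2:1 multiplexer: the actual carry in selects one candidate.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry


print(carry_select_add(0x5A, 0xC7))
```

The two candidate sums for each block do not depend on the incoming carry, which is why only the multiplexer delay is added per block once the first block has settled.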
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
Carry Skip Adder (CSA)
It is also called a Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
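[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the carry output]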
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage 4 adder
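The skip condition above can be sketched for one 4-bit block in Python. Bit lists are least significant bit first, and `carry_skip_block` and `to_bits` are illustrative names. The ripple carry is still computed inside the block; the bypass only changes which signal drives the block's carry output.

```python
def to_bits(value, width):
    """Integer to a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block; returns (sum_bits, carry_out, skipped)."""
    # Block propagate: every bit position has Ai != Bi.
    skip = all(a ^ b for a, b in zip(a_bits, b_bits))
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    # Bypass multiplexer: when all bits propagate, the carry in goes
    # straight to the carry out without waiting for the ripple chain.
    cout = cin if skip else carry
    return sums, cout, skip


print(carry_skip_block(to_bits(0b1010, 4), to_bits(0b0101, 4), 1))
```

In this functional model the bypassed value always equals the rippled carry, so the benefit is purely in delay, not in the logical result.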
Delay of CSA
Worst case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
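The two delay expressions from the text, T = P·K1 + (M-1)·K2 for the carry select adder and T = 2(P-1)·K1 + (M-2)·K2 for the carry skip adder, can be evaluated with a small Python sketch. The function names and the unit delays used in the example call are assumptions chosen only for illustration; P is the number of cells per block and M the number of blocks.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P-1)*K1 + (M-2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 16-bit adder: 4 blocks of 4 cells, unit delays (assumed values).
print(carry_select_delay(4, 4, 1, 1), carry_skip_delay(4, 4, 1, 1))
```

The comparison depends entirely on the actual K1 and K2 values of a given technology, so the numbers here say nothing about which adder is faster in practice.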
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Advantages and Disadvantages
Advantages
Main advantage of the pseudo-nMOS gate is the small
size of the pullup network both in terms of number of
devices and wiring complexity
Disadvantages
Due to more pull-up resistance delay is more and
hence speed of circuits is less
More Static power dissipation due to conduction path
between VDD and VSS
Dynamic CMOS Logic The disadvantage of
Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic
It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases
In1
In2 PDN
In3
Me
Mp
Clk
Clk
Out
CL
Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)
Dynamic CMOS Logic operation In static circuits at every point in time (except
when switching) the output is connected to either GND or VDD via a low resistance path
fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
requires on n + 2 (n+1 N-type + 1 P-type) transistors
Dynamic CMOS Logic operation Precharge
When CLK goes low the p-type transistor starts charging the precharge capacitance
The pulldown transistors controlled by the clock keep that precharge node from being drained
The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge
capacitance If the inputs create a conducting path through the pulldown network the
precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1
Conditions on Output Once the output of a
dynamic gate is discharged it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL
Out
Clk
Clk
A
BC
Mp
Me
on
off
1
off
on
((AB)+C)
Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic
levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS
no glitching higher transition probabilities extra load on Clk
PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn
low noise margin (NML)
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
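The slides give only C1 = G0 + P0·C0 in text form. The sketch below assumes the standard textbook unrolling of Ci+1 = Gi + Pi·Ci for the remaining carries; it is not a transcription of the lost slide. The function name cla4 is hypothetical. Each carry is written directly in terms of G, P and C0, so none of them depends on a previously computed carry.

```python
def cla4(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit carry look-ahead addition of integers a and b (0..15).

    Returns (4-bit sum, carry-out C4).
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]  # propagate

    # Every carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c4
```

The product terms grow with every bit position, which is one reason look-ahead is usually applied in blocks of about four bits rather than across a whole word.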
Carry Select Adder
It is also referred to as a conditional sum adder
The adder is divided into two blocks
  One block with a logical 0 Cin
  Another block with a logical 1 Cin
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
  P = number of cells in each block
  K1 = delay through one adder cell
  K2 = delay through the multiplexer
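A behavioural sketch of the select idea and the delay formula is shown below. It assumes that the word width is a multiple of the block width and, as a simplification, that every block (including the first) computes both candidate results. The names block_add, carry_select_add and carry_select_delay are hypothetical.

```python
def block_add(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Add two width-bit values; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, n_bits: int = 8, block: int = 4, cin: int = 0) -> tuple[int, int]:
    """Carry-select addition: each block is evaluated for Cin = 0 and Cin = 1."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, n_bits, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block)  # block assuming Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block)  # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)  # 2:1 multiplexer on the actual carry
        result |= s << shift
    return result, carry


def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M-1)*K2 from the slide."""
    return p * k1 + (m - 1) * k2
```

For example, with P = 4 cells per block, M = 2 blocks, and illustrative delays K1 = 1.0 and K2 = 0.5 (arbitrary units), carry_select_delay(4, 2, 1.0, 0.5) gives 4.5.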
Carry Skip Adder (CSA)
It is also called a carry bypass adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
[Figure: simplified carry-skip logic with two full adders (FA), inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, propagate signals pi, pi+1 and the skip signal]
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
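The skip rule for one 4-bit block can be modelled as follows. This is a sketch with hypothetical names (carry_skip_block, to_bits, from_bits). It returns the block carry together with the skip flag, so the test can confirm that bypassing never changes the arithmetic result, only the path the carry takes.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Split an integer into n bits, least-significant first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def carry_skip_block(a_bits: list[int], b_bits: list[int], c0: int) -> tuple[list[int], int, bool]:
    """One carry-skip block; returns (sum bits, block carry-out, skip flag)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # per-bit propagate
    sums, carry = [], c0
    for a, b in zip(a_bits, b_bits):             # internal ripple chain
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    skip = all(p)                                # P0 and P1 and P2 and P3
    block_carry = c0 if skip else carry          # bypass multiplexer
    return sums, block_carry, skip
```

When every bit propagates, the rippled carry equals C0 anyway; the bypass simply delivers it without waiting for the chain.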
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
  K2 = delay for skipping the carry over a block
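The formula can be evaluated numerically and set against a plain ripple adder. Two assumptions are made here that the slides do not state: P and M carry the same meaning as in the carry select formula (cells per block and number of blocks), and the ripple adder delay is modelled as n·K1 for n = M·P bits. Function names and delay values are illustrative only.

```python
def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """Worst-case delay T = 2(P-1)*K1 + (M-2)*K2 from the slide."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits: int, k1: float) -> float:
    """Assumed ripple-carry model: the carry passes through every cell."""
    return n_bits * k1


# 32-bit example: 8 blocks of 4 cells, hypothetical delays in arbitrary units.
P, M, K1, K2 = 4, 8, 1.0, 0.5
skip_t = carry_skip_delay(P, M, K1, K2)
ripple_t = ripple_delay(P * M, K1)
```

With these illustrative numbers the carry-skip delay is 9.0 units against 32.0 units for the ripple model. The first term covers rippling out of the first block and into the last, the second covers skipping the blocks in between.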
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
  Serial-parallel Multiplier
  Braun Array
  2's complement multiplication using the Baugh-Wooley method
  Pipelined Multiplier Array
  Modified Booth's Algorithm
  Wallace Tree Multipliers
  Recursive decomposition of multiplication
  Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
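The reduction can be illustrated with a word-level Python sketch. This is a simplified model, not a gate-level Wallace tree: whole partial-product words are compressed three at a time with a carry-save (3:2) step until two vectors remain, and a conventional addition then produces the product. The function names are hypothetical.

```python
def carry_save(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression of three words into a sum vector and a carry vector."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def wallace_reduce(operands: list[int]) -> tuple[list[int], int]:
    """Compress operands until two remain; returns (operands, levels used)."""
    levels = 0
    while len(operands) > 2:
        groups = len(operands) // 3
        nxt = []
        for g in range(groups):
            nxt.extend(carry_save(*operands[3 * g:3 * g + 3]))
        nxt.extend(operands[3 * groups:])  # leftovers pass to the next level
        operands = nxt
        levels += 1
    return operands, levels


def wallace_multiply(x: int, y: int, n: int) -> tuple[int, int]:
    """Multiply two n-bit unsigned values; returns (product, reduction levels)."""
    partial = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    vectors, levels = wallace_reduce(partial)
    return sum(vectors), levels  # final conventional adder
```

Each level removes roughly one third of the operands, so the number of levels grows logarithmically with the number of partial products: 2 levels for 4 partial products and 4 levels for 8 in this model.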
Wallace Tree Multiplier Structure
Sequential Logic
  D flip-flop
  Two Phase Clocking
  Dynamic Shift Register
  RAM
  ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Dynamic CMOS Logic
The disadvantage of static power dissipation in pseudo-NMOS logic leads to an alternative, dynamic CMOS logic
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases
[Figure: dynamic gate with precharge transistor Mp and evaluate transistor Me, both driven by Clk, a pull-down network (PDN) with inputs In1, In2, In3, output Out and load capacitance CL]
Two-phase operation
  Precharge (CLK = 0)
  Evaluate (CLK = 1)
Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path
  A fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  Requires only n + 2 (n+1 N-type + 1 P-type) transistors
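A short Python sketch tabulates the two device counts for a few fan-in values. The function names are hypothetical; the formulas are the ones stated above.

```python
def static_cmos_transistors(fan_in: int) -> int:
    """Static complementary CMOS: n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in: int) -> int:
    """Dynamic CMOS: n PDN devices plus one evaluate NMOS and one precharge PMOS."""
    return fan_in + 2


for n in (2, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

The two counts are equal at a fan-in of 2, and the dynamic gate uses fewer devices for any larger fan-in.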
Dynamic CMOS Logic operation
Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance
  The pulldown transistor controlled by the clock keeps the precharge node from being drained
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off
  The evaluation phase begins, i.e. the clocked n-type pulldown turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
  If the inputs do not create a conducting path, the gate's output is left charged at logic 1
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
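[Figure: dynamic gate example with inputs A, B, C, precharge transistor Mp and evaluate transistor Me driven by Clk. During precharge Mp is on and Me is off, so Out = 1; during evaluation Mp is off and Me is on, so Out = ((A·B)+C)']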
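The precharge and evaluate behaviour can be captured in a small behavioural model. This is a sketch under stated assumptions: the class DynamicGate is hypothetical, the PDN is represented by a Python function that returns a true value when the network conducts, and leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural model of a precharge/evaluate dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn  # callable: true when the pull-down network conducts
        self.out = 1    # value stored on the output capacitance CL

    def precharge(self) -> None:
        """CLK = 0: Mp on, Me off, output charged to logic 1."""
        self.out = 1

    def evaluate(self, *inputs) -> int:
        """CLK = 1: Mp off, Me on; a conducting PDN discharges the output."""
        if self.pdn(*inputs):
            self.out = 0
        return self.out  # once 0, it stays 0 until the next precharge


# Example gate with PDN = (A and B) or C, i.e. Out = ((A.B)+C)'.
gate = DynamicGate(lambda a, b, c: (a and b) or c)
gate.precharge()
first = gate.evaluate(1, 1, 0)   # PDN conducts: output discharges to 0
second = gate.evaluate(0, 0, 0)  # inputs fall again: output cannot recover
```

Calling evaluate twice within one phase models an input that rises and then falls: the second call still returns 0, which is why inputs must make at most one transition during evaluation.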
Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation is usually higher than static CMOS
  no glitching
  higher transition probabilities
  extra load on Clk
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  low noise margin (NML)
Advantages and Disadvantages
Advantages
  Low area
  Higher speed than static complementary gates
Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases
  Requires the introduction of a clock signal
  They are also more sensitive to noise
  Their clocking signals also consume power and are difficult to turn off to save power
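Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me driven by Clk; In drives the first stage, Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 (V versus t), with the threshold VTn marked]
Only 0 -> 1 transitions allowed at inputs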
Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
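Domino CMOS Logic
[Figure: domino gate: a dynamic stage (precharge transistor Mp, PDN with inputs In1, In2, In3, evaluate transistor Me, clocked by Clk) followed by a static inverter driving Out1; the dynamic node makes a 1 -> 1 or 1 -> 0 transition during evaluation]
A combination of dynamic logic and a static inverter is domino logic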
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off
  The evaluation phase begins, i.e. the clocked n-type pulldown turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
  If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed
  static inverter can be skewed, only L-H transition
  input capacitance reduced - smaller logical effort
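Cascading Domino Logic Gates
[Figure: two cascaded domino gates, each with precharge transistor Mp, a PDN, evaluate transistor Me and a static inverter, clocked by Clk; the first PDN has inputs In1, In2, In3 and drives Out1, the second has inputs In4, In5 and drives Out2. Dynamic nodes make 1 -> 1 or 1 -> 0 transitions; inverter outputs make 0 -> 0 or 0 -> 1 transitions]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1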
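A behavioural sketch of two cascaded domino gates follows. The class DominoGate, the function run_cycle and the two PDN functions are hypothetical choices made for illustration; the slides do not specify which logic the PDNs implement. The model shows that every gate output is 0 after precharge and can only rise during evaluation.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural model)."""

    def __init__(self, pdn):
        self.pdn = pdn  # callable: true when the pull-down network conducts
        self.node = 1   # precharged dynamic node

    @property
    def out(self) -> int:
        """Output of the static inverter."""
        return 1 - self.node

    def precharge(self) -> None:
        self.node = 1   # output falls to 0

    def evaluate(self, *inputs) -> int:
        if self.pdn(*inputs):
            self.node = 0  # node discharges, output rises 0 -> 1
        return self.out


def run_cycle(in1, in2, in3, in4, in5):
    """One precharge/evaluate cycle through two cascaded gates."""
    g1 = DominoGate(lambda a, b, c: a and b and c)    # assumed PDN: series chain
    g2 = DominoGate(lambda x, d, e: x and (d or e))   # assumed PDN: Out1 in series
    g1.precharge()
    g2.precharge()
    after_precharge = (g1.out, g2.out)
    out1 = g1.evaluate(in1, in2, in3)
    out2 = g2.evaluate(out1, in4, in5)  # sees Out1 only after it has settled
    return after_precharge, out1, out2
```

Because Out1 starts each evaluation at 0, the second gate cannot discharge by mistake while the first gate is still deciding, which is the failure seen when plain dynamic gates are cascaded directly.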
Clocked CMOS Logic
It is a combination of
  Static CMOS
  Clk input given to one additional NMOS
  Clk' input given to another additional PMOS
A fan-in of n requires 2n+2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
When Clk goes low, the logic is not evaluated because the clocked NMOS is off
Advantages and Disadvantages
Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
  More area
  More complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
Alternate stages of N logic with stages of P logic
  N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high) and the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example logic below
  Stage 1 is X = (A · B)'
  Stage 2 is G = X' + Y'
  Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
BiCMOS Technology
CMOS properties
  Low power
  Slower
BJT properties
  Faster
  High power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
Adder is made up of standard cells of:
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the Adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
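The three carry conditions can be written out and checked against the ordinary full-adder carry. This is a minimal sketch; the function names pgk, carry_out and full_adder_carry are hypothetical.

```python
def pgk(a, b):
    """Return (G, P, K) for one bit position; a and b are 0 or 1."""
    g = a & b                  # generate: Cout = 1 whatever C is
    p = a ^ b                  # propagate: Cout = C
    k = (1 - a) & (1 - b)      # kill: Cout = 0 whatever C is
    return g, p, k

def carry_out(a, b, c):
    """Carry-out of a full adder expressed through G and P."""
    g, p, _ = pgk(a, b)
    return g | (p & c)

def full_adder_carry(a, b, c):
    """Reference carry: majority function of a, b and c."""
    return (a & b) | (a & c) | (b & c)
```

For every input pair exactly one of G, P and K is 1, which is why a carry chain can be built from three mutually exclusive switches per bit.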
Manchester Carry Chain
Built from switch logic using propagate, generate & kill.
Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.
Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
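Carry-Ripple Adder
Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and output Cout]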
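The cascade of full adders can be sketched behaviourally as follows. The names full_adder and ripple_add are hypothetical, and bit lists are assumed to be given least significant bit first.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists, least significant bit first.

    The carry ripples from stage to stage, so stage i cannot
    finish before stage i-1 has produced its carry."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry
```

The loop-carried variable carry is the software image of the critical path from Cin to Cout.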
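Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.
The expressions for a 4-bit CLA are:
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0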
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
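The look-ahead idea can be sketched by writing every carry directly in terms of G, P and C0, obtained by repeatedly substituting C(i+1) = Gi + Pi·Ci into itself. The function names cla_carries and cla_add are hypothetical, and bit lists are assumed to be least significant bit first.

```python
def cla_carries(a_bits, b_bits, c0=0):
    """Carries C1..C4 of a 4-bit carry look-ahead adder.

    a_bits and b_bits are lists of four 0/1 values, least
    significant bit first. Every carry is written directly in
    terms of G, P and C0, so none depends on another carry."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def cla_add(a_bits, b_bits, c0=0):
    """4-bit addition using the look-ahead carries; returns (sums, C4)."""
    carries = [c0] + cla_carries(a_bits, b_bits, c0)
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[4]
```

The price of removing the ripple is visible in the code: the expression for each successive carry has one more product term and one more literal per term, which is why look-ahead blocks are usually kept to about four bits.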
Carry Select Adder
It is also referred to as a conditional sum adder.
The adder is divided into two blocks: one block with Cin at logical 0, another block with Cin at logical 1.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
Carry Select Adder Structure
8-bit Carry Select Adder
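A behavioural sketch of the carry select principle follows. Each block is added twice, once for each possible carry-in, and the real carry only performs the final selection. The names ripple_block and carry_select_add and the default block size of 4 are assumptions made for illustration.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry add of one block, least significant bit first."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry

def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry select addition over blocks of `block` bits.

    Each block is computed twice, once assuming Cin = 0 and once
    assuming Cin = 1; the real carry then only drives a 2:1 selection."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple_block(a_blk, b_blk, 0)   # speculative, Cin = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)   # speculative, Cin = 1
        s_sel, carry = (s1, c1) if carry else (s0, c0)  # 2:1 multiplexer
        sums.extend(s_sel)
    return sums, carry
```

In hardware the two speculative additions of all blocks run at the same time, so only the chain of multiplexers lies on the carry path; the sequential loop here models the result, not the timing.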
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
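The delay expression can be evaluated numerically. In this sketch the function names are hypothetical, the delay values used in the example are made-up numbers in arbitrary units, and the ripple-carry model T = n·K1 is a simplifying assumption added only for comparison.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2

def ripple_delay(n, k1):
    """Assumed simple model: the carry passes through all n cells."""
    return n * k1

if __name__ == "__main__":
    # 32-bit adder as 8 blocks of 4 cells; K1 and K2 are illustrative only.
    print(carry_select_delay(p=4, m=8, k1=1.0, k2=0.5))
    print(ripple_delay(n=32, k1=1.0))
```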
Carry Skip Adder (CSA)
It is also called a carry bypass adder.
Looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.
Simplified Carry-Skip Adder Logic
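[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sum outputs si, si+1, carries ci, ci+1, propagate signals pi, pi+1 and the skip signal]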
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage 4 adder
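The skip condition can be sketched for one block as follows. The function name carry_skip_block is hypothetical and bit lists are assumed to be least significant bit first.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block, least significant bit first.

    Returns (sums, cout, skipped). When every bit propagates
    (Ai != Bi), the block carry-out equals the carry-in, so a
    bypass multiplexer can forward cin without waiting for the ripple."""
    sums, carry = [], cin
    block_propagate = 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        block_propagate &= p
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)
    cout = cin if block_propagate else carry   # bypass multiplexer
    return sums, cout, bool(block_propagate)
```

The bypass never changes the logical value of the carry-out; it only provides a shorter path to the same value, which is why the sum bits are still produced by the ordinary ripple inside the block.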
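Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where P = number of cells in each block and M = number of blocks, as for the carry select adder
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block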
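The worst-case expression shows a trade-off: larger blocks lengthen the ripple in the first and last blocks, while smaller blocks add more skips. The sketch below searches for the block size that minimizes T for a given word length. The function names are hypothetical, the search assumes equal-sized blocks, and the K1 and K2 values in the example are illustrative, not measured.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2(P-1)*K1 + (M-2)*K2 for M >= 2 blocks of P cells."""
    if m < 2:
        raise ValueError("the formula assumes at least two blocks")
    return 2 * (p - 1) * k1 + (m - 2) * k2

def best_block_size(n, k1, k2):
    """Try every block size P that divides n and leaves M = n/P >= 2 blocks."""
    candidates = [(carry_skip_delay(p, n // p, k1, k2), p)
                  for p in range(1, n // 2 + 1) if n % p == 0]
    delay, p = min(candidates)
    return p, delay

if __name__ == "__main__":
    # 32-bit adder with equal cell and skip delays (illustrative values).
    print(best_block_size(32, k1=1, k2=1))
```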
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
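The reduction to two vectors can be sketched with carry-save (3:2) compression. This is a simplified word-level model, not a column-by-column Wallace layout: whole partial-product rows are compressed three at a time, and the names csa and wallace_multiply are hypothetical.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def wallace_multiply(a, b, n=8):
    """Multiply two n-bit unsigned integers.

    Partial products are reduced three at a time with carry-save
    adders until two vectors remain; one conventional addition
    then produces the product. Returns (product, reduction_levels)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])   # leftovers pass through
        rows = nxt
        levels += 1
    return sum(rows), levels
```

Each level turns every three rows into two, so the row count falls roughly by a factor of 2/3 per level; this is the source of the logarithmic completion time, since no carry propagates until the final addition.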
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach, with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
Output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: dynamic gate with precharge transistor Mp and evaluation transistor Me, both driven by Clk, inputs A, B and C, output Out, on/off annotations for the two clock phases, and the function label ((AB)+C)]
Properties of Dynamic Gates
Logic function is implemented by the PDN only.
Full swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk.
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn: low noise margin (NML).
Advantages and Disadvantages
Advantages: low area, higher speed than static complementary gates.
Disadvantages:
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
Requires introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.
Cascading Dynamic Gates
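[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk; input In, outputs Out1 and Out2; waveforms (V against t) of Clk, In, Out1 and Out2, with VTn marked]
Only 0 -> 1 transitions allowed at inputs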
Cascading Dynamic Gates - Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge will fix it.
Domino CMOS Logic
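[Figure: Domino gate: PDN with inputs In1, In2 and In3, precharge transistor Mp and evaluation transistor Me clocked by Clk, followed by a static inverter driving Out1; annotated with 1 -> 1 and 1 -> 0 transitions]
A combination of dynamic logic and a static inverter is Domino Logic.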
Domino CMOS Logic operation
Precharge phase (same as dynamic logic).
Evaluate phase (modification to dynamic logic):
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
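The precharge and evaluate behaviour described above can be captured in a small behavioural model. It ignores timing, leakage and charge sharing; the class name DominoGate and its methods are hypothetical.

```python
class DominoGate:
    """Behavioural model of one domino stage (no timing, no leakage).

    `pdn` is a function of the input tuple that returns a true value
    when the pull-down network conducts."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.node = 1            # dynamic (storage) node

    def precharge(self):
        """Clk low: the p-type pull-up charges the node to 1."""
        self.node = 1

    def evaluate(self, inputs):
        """Clk high: the node can only be discharged, never recharged."""
        if self.pdn(inputs):
            self.node = 0
        return self.output()

    def output(self):
        """Static inverter after the dynamic node."""
        return 1 - self.node


if __name__ == "__main__":
    and_gate = DominoGate(lambda x: x[0] and x[1])   # series NMOS pair
    and_gate.precharge()
    print(and_gate.evaluate((1, 1)))   # node discharged, output 1
    print(and_gate.evaluate((0, 0)))   # inputs fell back, output stays 1
```

The second call shows why inputs must rise monotonically: once the node has been discharged, removing the inputs cannot restore it before the next precharge.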
Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed (only the L-H transition).
Input capacitance reduced - smaller logical effort.
Cascading Domino Logic Gates
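[Figure: two cascaded Domino gates: the first with PDN inputs In1, In2 and In3 and output Out1, the second with PDN inputs In4 and In5 and output Out2, each with Mp and Me clocked by Clk; annotated with 1 -> 1, 1 -> 0, 0 -> 0 and 0 -> 1 transitions]
The static inverter ensures that all inputs to the Domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.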
Clocked CMOS Logic
It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS
Advantages and Disadvantages Advantages
low area higher speed than static complementary gates
Disadvantages Precharged gates introduce functional complexity
because they must be operated in two distinct phases
Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are
difficult to turn off to save power
Cascading Dynamic Gates
Clk
Clk
Out1
In
Mp
Me
Mp
Me
Clk
Clk
Out2
V
t
Clk
In
Out1
Out2V
VTn
Only 0 1 transitions allowed at inputs
Cascading Dynamic Gates- Problem
Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge
The second dynamic inverter turns off (PDN) when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
In1
In2 PDN
In3
Me
Mp
Clk
Clk1 11 0
Out1
Combination of Dynamic Logic and Static inverter is Domino Logic
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred as Conditional sum adder
The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin
S and Cout are selected by actual Cin using 21 Multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The Delay of n-bit adder is given by T=PK1+(M-1)K2
where M=No of Blocks in the adder
P=No of Cells of each block K1=Delay through one adder
cell
K2 = Delay through Multiplexer
Carry Skip Adder (CSA)
It is also called as Carry Bypass Adder
Looks for cases in carry-out of a set of bits is identical to carry in
Typically organized into m-bit stages If Ai neBi for every bit in stage then
bypass gate sends stagersquos carry input directly to carry output
Simplified Carry-Skip Adder Logic
FAFA
aibi
sisi+1
ai+1bi+1
cici+1
pipi+1
skip signal
ci+1
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1)then C3 = C0
else C3=output carry of stage4 adder
Delay of CSA
Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder
cell k2=delay for skipping the
carry over a block
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands a multi-level carry-skip adder
was 14 faster and its power dissipation was 58
of that of the carry-lookahead adder
Using 64-bit operands a one-level carry-skip adder
was 38 slower and its power consumption is 68
of the the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-
Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
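The reduction described above can be sketched with carry-save (3:2) compression applied to whole partial-product rows. This is a simplified word-level model for unsigned operands, not a bit-accurate Wallace layout; the names `csa` and `wallace_multiply` are illustrative assumptions.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def wallace_multiply(a, b, n=8):
    """Multiply unsigned n-bit a and b; returns (product, reduction levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        # Compress every full group of three rows into a sum and a carry row.
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that do not fill a group of three pass to the next level.
        nxt.extend(rows[len(rows) - len(rows) % 3:])
        rows = nxt
        levels += 1
    # The remaining vectors are combined by a conventional adder.
    return sum(rows), levels
```

For eight partial products the row count goes 8, 6, 4, 3, 2, so four compression levels are needed before the final addition.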
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
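Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluation transistor Me clocked by Clk; In drives the first stage, Out1 drives the second stage, which produces Out2. Waveforms of V against t for Clk, In, Out1 and Out2, with the VTn level marked]
Only 0 -> 1 transitions are allowed at the inputs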
Cascading Dynamic Gates- Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The pull-down network (PDN) of the second dynamic inverter turns off when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge will fix it
Domino CMOS Logic
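[Figure: domino gate. Inputs In1, In2 and In3 drive the pull-down network (PDN) between precharge transistor Mp and evaluation transistor Me, both clocked by Clk; a static inverter follows the dynamic node. Out1 is annotated with the transitions 1 -> 1 and 1 -> 0]
Domino logic is the combination of a dynamic logic gate and a static inverter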
Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
The evaluation phase begins, i.e. the n-type pull-down turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
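The two phases can be mimicked with a small behavioural model. It ignores timing, leakage and charge sharing; the class name `DominoGate` and its methods are illustrative assumptions. The pull-down network is passed in as a function that returns True when it conducts.

```python
class DominoGate:
    """Behavioural model of a domino gate: dynamic node plus static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn   # returns True when the pull-down network conducts
        self.node = 1    # dynamic (storage) node

    def precharge(self):
        """Clk low: the p-type pull-up charges the node to logic 1."""
        self.node = 1

    def evaluate(self, *inputs):
        """Clk high: the node can only be discharged, never recharged."""
        if self.pdn(*inputs):
            self.node = 0

    @property
    def out(self):
        """Output of the static inverter."""
        return 1 - self.node


# Two series n-type devices in the pull-down network give a domino AND gate.
and_gate = DominoGate(lambda a, b: bool(a and b))
and_gate.precharge()
and_gate.evaluate(1, 1)
print(and_gate.out)  # 1
and_gate.evaluate(0, 1)  # an input falls back to 0 in the same phase
print(and_gate.out)  # still 1: the node stays discharged until the next precharge
```

The second call shows why inputs must rise monotonically: once the node has been discharged by a temporary 1, the wrong output is held for the rest of the evaluation phase.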
Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed: the static inverter can be skewed, since only the L-H transition matters
Input capacitance is reduced, giving a smaller logical effort
Cascading Domino Logic Gates
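[Figure: two cascaded domino gates. In1, In2 and In3 drive the first PDN and Out1 is taken from its static inverter; Out1, In4 and In5 drive the second PDN, which produces Out2. Each stage has a precharge transistor Mp and an evaluation transistor Me clocked by Clk. Annotated transitions: 1 -> 1 and 1 -> 0 on the dynamic nodes, 0 -> 0 and 0 -> 1 on the inverter outputs]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1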
Clocked CMOS Logic
It is a combination of
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS
Fan-in for this logic is 2n+2
When Clk goes high, the logic of the circuit is evaluated because the NMOS is ON
When Clk goes low, the logic is not evaluated because the NMOS is OFF
Advantages and Disadvantages
Advantages: clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages: more area, more complexity
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example logic below
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded stages with outputs X, G and Z]
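The three stage equations can be checked at the logic level with a short sketch. It models only the Boolean functions, not the clocking; Y, F and H are treated here as external inputs, which is an assumption, and the function name `np_cascade` is illustrative.

```python
from itertools import product

def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages; all signals are 0 or 1."""
    x = 1 - (a & b)            # stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)      # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # stage 3: Z = (F . G + H)'
    return x, g, z

# Print the full truth table for the five inputs.
for a, b, y, f, h in product((0, 1), repeat=5):
    print(a, b, y, f, h, np_cascade(a, b, y, f, h))
```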
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate
BiCMOS Circuits
Inverter, NAND Gate
Structured Design
Designing the digital block or standard cell at the following levels
Logic level
Circuit level
Stick diagram
Layout
Examples Combinational Logic
5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter
Adders
1-bit full adder
Manchester carry chain
Adder enhancement techniques
Carry select adders
Carry skip adders
Carry look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
The adder is made up of the following standard cells
Multiplexers - L=11λ W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries
Generate: Cout = 1 independent of C
G = A · B
Propagate: Cout = C
P = A ⊕ B
Kill: Cout = 0 independent of C
K = ~A · ~B
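The three signals can be checked exhaustively: for any input pair exactly one of G, P and K is active, and the carry out follows from G and P alone. A minimal sketch; the function names `pgk` and `carry_out` are illustrative.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                # carry out is 1 regardless of carry in
    p = a ^ b                # carry out equals carry in
    k = (1 - a) & (1 - b)    # carry out is 0 regardless of carry in
    return g, p, k

def carry_out(a, b, c):
    """Carry out of a full adder expressed through G and P."""
    g, p, _ = pgk(a, b)
    return g | (p & c)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, pgk(a, b))
```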
Manchester Carry Chain
Built from switch logic using propagate, generate & kill
Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A · B)
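A switch-level sketch of a dynamic Manchester chain, under the assumption that each chain node holds the complemented carry and is precharged high. During evaluation a generate transistor discharges its node, and a propagate pass transistor lets a discharged node pull down the next one; a kill case simply leaves the node precharged. The function name `manchester_chain` is illustrative.

```python
def manchester_chain(a_bits, b_bits, c_in):
    """Return the carry out of each bit position, LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    # Precharge phase: every complemented-carry node is pulled high.
    node_bar = [1] * len(a_bits)

    # Evaluate phase: nodes can only be discharged.
    prev_bar = 1 - c_in
    for i in range(len(a_bits)):
        if g[i]:
            node_bar[i] = 0          # generate transistor discharges the node
        elif p[i] and prev_bar == 0:
            node_bar[i] = 0          # pass transistor forwards the low level
        prev_bar = node_bar[i]

    return [1 - n for n in node_bar]  # true carries
```

For a 4-stage chain, `manchester_chain([1, 1, 1, 1], [0, 0, 0, 0], 1)` models the worst case in which the carry input has to pass through all four pass transistors.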
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
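Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1..C3, carry input Cin and carry output Cout]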
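The cascade can be sketched directly: each full adder consumes the carry produced by the previous one, so the carry path is as long as the word. The function names `full_adder` and `ripple_carry_add` are illustrative.

```python
def full_adder(a, b, c):
    """One-bit full adder; returns (sum, carry out)."""
    s = a ^ b ^ c
    c_out = (a & b) | (c & (a ^ b))
    return s, c_out

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two bit vectors given LSB first; returns (sum bits, carry out)."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next cell
        sums.append(s)
    return sums, carry
```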
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)
When CLK goes high precharging stops ie the p-type pullup turns off
The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from
0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1
If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0
Properties of Domino Logic Only non-inverting logic can be
implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed
static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical
effort
Cascading Domino Logic Gates
In1
In2 PDN
In3
Me
Mp
Clk
ClkOut1
In4 PDN
In5
Me
Mp
Clk
ClkOut21 1
1 00 00 1
Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -gt 1
Clocked CMOS Logic It is a combination of
Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS
Fan-in for this logic is 2n+2 When Clk goes high then logic of the
circuit is evaluated due to NMOS in ON condition
When Clk goes low then Logic is not evaluated due to NMOS in OFF condition
Advantages and Disadvantages
Advantages Clocked CMOS logic has been used for
very low power CMOS andor for minimizing hot electron effect problems in N-FET devices
Disadvantages More Area More Complexity
N-P CMOS Logic
N-P CMOS Logic An elegant solution to the dynamic CMOS logic
ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation
phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node
During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo
X
G
Z
BiCMOS Technology CMOS properties
Low Power Slower
BJT properties Faster High Power
Implementation CMOS amp BJT Core Logic CMOS Interface BJT
Basic Circuits Inverter Nand Gate Nor Gate
BiCMOS Circuits
InverterNand Gate
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cells dimensions for Adder
Adder is made up of standard cells of
Multiplexers - L=11λ W=7λ
i) NMOS Inverter (81 and 41 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS Inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal
contacts
So for Adder Standard cell - L=190λ W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder define what happens to carries Generate Cout = 1 independent of C
G = A bull B Propagate Cout = C
P = A B Kill Cout = 0 independent of C
K = ~A bull ~B
Manchester Carry Chain
Build from switch logic using propagate generate amp kill
Manchester Chain adders The carry input is
precharged with clock signal instead of passing through the Logic
Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G= ArsquoBrsquo)
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry
delay
CinCout
B1A1B2A2B3A3B4A4
S1S2S3S4
C1C2C3
Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we
use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
Carry Select Adder
It is also referred as Conditional sum adder
The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin
S and Cout are selected by actual Cin using 21 Multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
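A quick numeric check of the delay expression may help. The sketch below simply evaluates T = P·K1 + (M-1)·K2; the cell and multiplexer delays used in the example are made-up values in arbitrary units, not figures from the slides.

```python
def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M - 1)*K2 for a carry-select adder.

    p: cells per block, m: number of blocks,
    k1: delay of one adder cell, k2: delay of one multiplexer.
    """
    return p * k1 + (m - 1) * k2


# Hypothetical 8-bit adder split into M = 2 blocks of P = 4 cells,
# with K1 = 1.0 and K2 = 0.5 (arbitrary units).
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))  # 4.5
```

The first term is the ripple through one block and the second is one multiplexer per remaining block, which is why adding blocks costs only K2 each.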
Carry Skip Adder (CSA)
It is also called a Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
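[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the carry output]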
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
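The bypass rule above can be modelled for a single 4-bit block as shown below. This is a behavioural sketch, assuming bits are listed least significant first; `carry_skip_block` and `to_bits` are illustrative names.

```python
def to_bits(x: int, n: int) -> list[int]:
    """Integer to n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def carry_skip_block(a_bits: list[int], b_bits: list[int], cin: int):
    """One carry-skip block: returns (sum bits, carry out, skip flag)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # Pi = Ai xor Bi
    carry = cin
    sums = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)           # ripple inside the block
    skip = all(p)                                # P0 and P1 and P2 and P3
    cout = cin if skip else carry                # bypass multiplexer
    return sums, cout, skip


sums, cout, skip = carry_skip_block(to_bits(0b1010, 4), to_bits(0b0101, 4), 1)
print(sums, cout, skip)  # every Pi = 1, so the carry in is passed through
```

When the skip condition holds, the rippled carry would equal the carry input anyway; the bypass only removes the wait for the ripple, so it changes timing and not the logical result.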
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
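For a feel of the numbers, the worst-case expression can be evaluated directly. The delays below are hypothetical values in arbitrary units, chosen only for illustration.

```python
def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """Worst-case T = 2*(P - 1)*K1 + (M - 2)*K2 for a carry-skip adder.

    p: cells per block, m: number of blocks,
    k1: delay of one adder cell, k2: delay of skipping one block.
    """
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 16-bit adder: M = 4 blocks of P = 4 cells,
# with K1 = 1.0 and K2 = 0.5 (arbitrary units).
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))  # 7.0
```

The worst case ripples through the first and last blocks and skips the M-2 blocks in between, which is where the two terms come from.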
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
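The reduction of partial products to a carry vector and a sum vector can be sketched with word-level carry-save adders. This is a simplified model that groups whole rows in threes, not a gate-accurate column tree; the names `csa` and `wallace_multiply` are assumptions for this example.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on whole words: x + y + z == sum + carry."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Multiply two unsigned n-bit numbers.

    Returns (product, number of carry-save levels used).
    """
    # One shifted partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    while len(rows) < 2:
        rows.append(0)

    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # rows left over
        rows = nxt
        levels += 1

    # The final two vectors go through a conventional adder.
    return rows[0] + rows[1], levels


print(wallace_multiply(13, 11))  # (143, 2)
```

Each level turns every group of three rows into two without propagating carries, so the number of levels grows logarithmically with the number of partial products, and only the last addition needs a carry-propagating adder.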
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed with a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of sub-systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches due to transition from logic 1 to logic 0 only
Very high speed: the static inverter can be skewed, since only the L-H transition matters
Input capacitance reduced, giving smaller logical effort
Cascading Domino Logic Gates
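[Figure: two cascaded domino gates. The first gate has inputs In1, In2, In3 driving a pull-down network (PDN), a precharge transistor Mp and an evaluate transistor Me clocked by Clk, and output Out1. The second gate has inputs In4, In5 driving a PDN, with Mp and Me clocked by Clk, and output Out2. Signal transitions are annotated on the figure.]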
The static inverter at each gate output ensures that all inputs to the next Domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1
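The precharge and evaluate behaviour can be illustrated with a small behavioural model. This is a sketch under stated assumptions: each pull-down network is taken to be a series (AND) stack, and the second gate is assumed to be driven by Out1 together with In4 and In5; neither detail is given in the text.

```python
def domino_gate(clk: int, pdn_conducts: int, node_prev: int) -> tuple[int, int]:
    """One domino stage: returns (dynamic node, inverter output)."""
    if clk == 0:
        node = 1            # precharge: Mp pulls the dynamic node high
    elif pdn_conducts:
        node = 0            # evaluate: PDN and Me discharge the node
    else:
        node = node_prev    # evaluate with PDN off: node keeps its charge
    return node, 1 - node   # the static inverter drives the next stage


def cascade(clk, in1, in2, in3, in4, in5, state=(1, 1)):
    """Two cascaded domino gates; PDNs are assumed to be AND stacks."""
    n1, out1 = domino_gate(clk, in1 & in2 & in3, state[0])
    n2, out2 = domino_gate(clk, out1 & in4 & in5, state[1])
    return (n1, n2), (out1, out2)


state, outs = cascade(0, 1, 1, 1, 1, 1)          # precharge phase
print(outs)                                      # (0, 0)
state, outs = cascade(1, 1, 1, 1, 1, 1, state)   # evaluate phase
print(outs)                                      # (1, 1)
```

After precharge both inverter outputs are 0, so the second gate cannot discharge early; during evaluation each output either stays at 0 or rises once to 1.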
Clocked CMOS Logic
It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS
Fan-in for this logic is 2n+2
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is ON
When Clk goes low, the logic is not evaluated because the clocked NMOS is OFF
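The clock gating described above can be modelled behaviourally for a two-input NAND. This is only a sketch: the choice of a NAND gate is arbitrary, and the assumption that the output node keeps its previous value while the clocked transistors are off is not stated in the text.

```python
def c2mos_nand(clk: int, a: int, b: int, out_prev: int) -> int:
    """Behavioural model of a clocked CMOS 2-input NAND."""
    if clk:
        # Clocked NMOS and PMOS are on: the static NAND is evaluated.
        return 1 - (a & b)
    # Clocked transistors are off: assume the output node holds its value.
    return out_prev


out = c2mos_nand(1, 1, 1, out_prev=1)    # Clk high: evaluates to 0
out = c2mos_nand(0, 0, 0, out_prev=out)  # Clk low: inputs are ignored
print(out)  # 0
```

The second call shows the effect of the clocked devices: the inputs change while Clk is low, but the output does not follow until the next high phase.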
Advantages and Disadvantages
Advantages: Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices
Disadvantages: more area, more complexity
N-P CMOS Logic
N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use a complement clock, with the P logic stage tied above the output node
During precharge, clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks
Cascading N-P Logic
Example Logic below
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G, and Z]
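The three stage equations can be evaluated directly to see how a value moves through the cascade. The sketch below models only the logic functions, not the precharge and evaluate timing; the function name `np_cascade` is illustrative.

```python
def np_cascade(a: int, b: int, y: int, f: int, h: int) -> tuple[int, int, int]:
    """Evaluate the three-stage N-P example; returns (X, G, Z)."""
    x = 1 - (a & b)          # Stage 1 (N logic): X = (A . B)'
    g = (1 - x) | (1 - y)    # Stage 2 (P logic): G = X' + Y'
    z = 1 - ((f & g) | h)    # Stage 3 (N logic): Z = (F . G + H)'
    return x, g, z


print(np_cascade(a=1, b=1, y=1, f=1, h=0))  # (0, 1, 0)
print(np_cascade(a=0, b=0, y=1, f=1, h=0))  # (1, 0, 1)
```

In the first call stage 1 discharges (X = 0), which lets the P stage pull G up to 1, which in turn lets stage 3 discharge Z; in the second call no stage leaves its precharged value.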
BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT, with core logic in CMOS and interface in BJT
Basic circuits: Inverter, NAND gate, NOR gate
BiCMOS Circuits
Inverter, NAND gate
Structured Design
Designing the digital block or standard cell in the following levels:
Logic level
Circuit level
Stick Diagram
Layout
Examples: Combinational Logic
5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for Adder
Adder is made up of the following standard cells:
Multiplexers - L=11λ, W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ, W=10λ
Buried contact - L=30λ, W=10λ
ii) CMOS Inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the Adder standard cell - L=190λ, W=150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
G = A • B
Propagate: Cout = C
P = A ⊕ B
Kill: Cout = 0 independent of C
K = ~A • ~B
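The generate, propagate and kill definitions can be sketched directly in code. This is a minimal illustration, assuming single-bit inputs given as 0 or 1; the helper names `pgk` and `carry_out` are hypothetical. It also checks two properties that follow from the definitions: exactly one of G, P, K is active for any input pair, and Cout = G + P·C agrees with the majority function of a full adder.

```python
from itertools import product


def pgk(a, b):
    """Return (G, P, K) for one bit position."""
    g = a & b                  # generate: Cout = 1 regardless of C
    p = a ^ b                  # propagate: Cout = C
    k = (1 - a) & (1 - b)      # kill: Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed with generate and propagate: Cout = G + P.C"""
    g, p, _ = pgk(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        g, p, k = pgk(a, b)
        assert g + p + k == 1                      # mutually exclusive
        assert carry_out(a, b, c) == (a + b + c) // 2
        print(f"A={a} B={b} C={c}  G={g} P={p} K={k}  Cout={carry_out(a, b, c)}")
```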
Manchester Carry Chain
Build from switch logic using propagate, generate & kill
Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic
Carry path is gated by the Propagate signal (P=A^B) with a single n-type pass transistor
Generate signal (G=A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, internal carries C1-C3, and outputs S1-S4 and Cout]
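A bit-level sketch of the cascade makes the critical path visible: each stage has to wait for the carry from the stage before it. The function names and the LSB-first bit-list representation are assumptions made for this illustration.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next cell
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    sums, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(sums), cout)   # 11 + 6 = 17 -> sum bits 1, carry-out 1
```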
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
Output Carry is represented in terms of generate and propagate signals
The Expressions for 4-bit CLA are
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1=G0+P0C0
4-bit CMOS CLA
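Starting from C1 = G0 + P0·C0 and applying the same recurrence C(i+1) = Gi + Pi·Ci to each later bit, every carry can be written in terms of the G and P signals and C0 alone. The sketch below spells out that standard expansion for four bits; the function name `cla4` and the LSB-first bit lists are assumptions made for this example, and it describes the logic equations only, not any particular CMOS implementation.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead add on LSB-first bit lists.

    Every carry depends only on G, P and c0, so all carries could be
    produced in parallel in hardware.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]  # Si = Pi xor Ci
    return sums, c4


if __name__ == "__main__":
    a, b = 13, 7
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    sums, c4 = cla4(a_bits, b_bits)
    print(sum(s << i for i, s in enumerate(sums)) + (c4 << 4))
```

The widening product terms show why look-ahead is usually limited to small groups: the gate fan-in grows with every additional bit.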
Carry Select Adder
It is also referred to as the Conditional Sum adder
The adder is divided into two blocks: one block computed with Cin = logical 0, another block computed with Cin = logical 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
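The selection idea can be sketched behaviourally: each block is evaluated twice, once assuming a carry-in of 0 and once assuming 1, and the real carry picks one result. In this illustration every block, including the first, is duplicated, and the inner block adder is modelled with integer arithmetic. The function names, the 8-bit width and the 4-bit block size are assumptions.

```python
def block_add(a, b, cin, width):
    """Behavioural model of one adder block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition; `width` must be a multiple of `block`."""
    if width % block:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block)   # copy assuming Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block)   # copy assuming Cin = 1
        # 2:1 multiplexer controlled by the actual incoming carry
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(200, 100))   # 300 = 256 + 44 -> (44, 1)
```

In hardware the two copies of a block run in parallel, so only the multiplexer delay is added per block once the first block has settled.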
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
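The formula can be evaluated for a few block sizes to see the trade-off between block length and multiplexer count. The delay values K1 = 1.0 and K2 = 0.5 used below are arbitrary placeholder units, not figures from any process, and the function name is hypothetical.

```python
def carry_select_delay(n_bits, cells_per_block, k1, k2):
    """T = P*K1 + (M-1)*K2 with M = n_bits / P blocks of P cells each."""
    if n_bits % cells_per_block:
        raise ValueError("n_bits must be a multiple of cells_per_block")
    m = n_bits // cells_per_block
    return cells_per_block * k1 + (m - 1) * k2


if __name__ == "__main__":
    K1, K2 = 1.0, 0.5   # assumed unit delays, for illustration only
    for p in (2, 4, 8, 16):
        print(f"P={p:2d}  M={32 // p:2d}  T={carry_select_delay(32, p, K1, K2)}")
```

With these placeholder values the delay is smallest at an intermediate block size: short blocks pay for many multiplexers, long blocks pay for a long ripple inside the block.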
Carry Skip Adder (CSA)
It is also called the Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
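[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, carries ci and ci+1, propagate signals pi and pi+1, sum outputs si and si+1, and a skip signal]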
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
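The skip condition can be sketched for one 4-bit stage. The stage still ripples internally, but when every bit propagates, the carry-out is taken straight from the carry-in. The names `carry_skip_block`, `cin` and `cout` are assumptions of this sketch and do not follow the C0/C3 labelling used on the slide.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage on LSB-first bit lists.

    Returns (sum_bits, carry_out, skip), where skip is True when all
    propagate signals are 1 and the bypass path is used.
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # per-bit propagate
    carry = cin
    sums = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)            # normal ripple path
    skip = all(p)                                  # P0 and P1 and P2 and P3
    cout = cin if skip else carry                  # bypass multiplexer
    return sums, cout, skip


if __name__ == "__main__":
    # 0101 + 1010: every bit propagates, so the carry-in is bypassed.
    a_bits = [1, 0, 1, 0]   # 5, LSB first
    b_bits = [0, 1, 0, 1]   # 10, LSB first
    print(carry_skip_block(a_bits, b_bits, 1))
```

The bypass never changes the result, since the rippled carry equals the carry-in whenever all bits propagate. It only shortens the path the carry has to travel.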
Delay of CSA
Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
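A short calculation shows how the worst-case figure compares with a plain ripple adder. Two assumptions are made here that the slide does not state: P and M are taken to mean cells per block and number of blocks, as in the carry select formula, and the ripple adder delay is modelled as n·K1. The delay values are placeholder units.

```python
def carry_skip_delay(n_bits, cells_per_block, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2, assuming M = n_bits / P blocks."""
    if n_bits % cells_per_block:
        raise ValueError("n_bits must be a multiple of cells_per_block")
    m = n_bits // cells_per_block
    return 2 * (cells_per_block - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple model: the carry passes through every cell."""
    return n_bits * k1


if __name__ == "__main__":
    K1, K2 = 1.0, 0.5   # assumed unit delays, for illustration only
    for n in (8, 16, 32, 64):
        print(n, ripple_delay(n, K1), carry_skip_delay(n, 4, K1, K2))
```

Under these assumptions the ripple delay grows by K1 per bit, while the carry-skip delay grows by only K2 per additional block.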
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
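The reduction described for the Wallace tree can be sketched with carry-save (3:2) compression: partial-product rows are reduced three at a time until only a sum vector and a carry vector remain, and those two are then added conventionally. This is a word-level illustration for unsigned operands, not a gate-accurate column-compression tree, and the function names are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compressor applied bitwise: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n_bits=4):
    """Multiply unsigned n_bits-wide integers; returns (product, levels)."""
    # One partial-product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):              # compress each group of three
            nxt.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])                  # leftover rows pass through
        rows = nxt
        levels += 1
    return sum(rows), levels                     # final conventional addition


if __name__ == "__main__":
    print(wallace_multiply(13, 11))              # 4-bit operands
    print(wallace_multiply(200, 123, n_bits=8))  # 8-bit operands
```

Each level removes roughly a third of the rows, which is where the logarithmic growth of the completion time comes from.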
Sequential Logic
D flip-flop, Two Phase Clocking, Dynamic Shift Register, RAM, ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed using a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of sub-systems and check their proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Structured Design
Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout
Examples Combinational Logic
5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code
Converter
Adders
1-bit full adder Manchester Carry Chain Adder Enhancement Techniques
Carry Select Adders Carry Skip Adders Carry Look-ahead adders
32-bit Adders
1-bit full adder
CMOS Full Adder
Sum Output Carry output
Complementary Static CMOS Adder
Standard cell dimensions for the adder
The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) nMOS inverter (8:1 and 4:1 ratios)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ
Layout of full adder
Layout2 of full adder
Propagate and Generate Signals
For a full adder, define what happens to the carry:
Generate: Cout = 1, independent of C
  G = A • B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0, independent of C
  K = ~A • ~B
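The three carry cases can be checked with a short sketch. It assumes 0/1 integer inputs; the names `pgk` and `carry_out` are illustrative and do not come from the slides.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # both inputs 1: carry-out forced to 1
    p = a ^ b              # inputs differ: carry-out follows carry-in
    k = (1 - a) & (1 - b)  # both inputs 0: carry-out forced to 0
    return g, p, k


def carry_out(a, b, c):
    """Carry-out written in terms of generate and propagate."""
    g, p, _ = pgk(a, b)
    return g | (p & c)
```

Exactly one of G, P and K is 1 for any input pair, which is why a switch-level carry chain can use them as mutually exclusive controls.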
Manchester Carry Chain
Built from switch logic using the propagate, generate & kill signals
In Manchester chain adders, the carry line is precharged by the clock signal instead of being passed through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor
Generate signal: G = A • B
4-stage Dynamic Manchester Chain
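A behavioral sketch of a dynamic Manchester chain is given below. It rests on several assumptions that the slides do not state: the chain nodes hold the complemented carry, every node is precharged high, the generate signal G = A AND B pulls a node low, and the propagate pass transistor copies a low level from the previous node. The function name `manchester_carries` is hypothetical, and the model ignores all electrical effects such as charge sharing and pass-transistor delay.

```python
def manchester_carries(a_bits, b_bits, c_in):
    """Return carries c1..cn of an n-stage chain (bit lists are LSB first)."""
    n = len(a_bits)
    # Precharge phase: every chain node is pulled high (complemented carry = 1).
    nodes = [1] * n
    # Evaluate phase: a node discharges through G, or through P from a low neighbour.
    prev = 1 - c_in  # complemented carry-in drives the start of the chain
    for i in range(n):
        g = a_bits[i] & b_bits[i]
        p = a_bits[i] ^ b_bits[i]
        if g or (p and prev == 0):
            nodes[i] = 0
        prev = nodes[i]
    # Invert the node values to recover the true carries.
    return [1 - v for v in nodes]
```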
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3, and Cout]
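The cascade of full adders can be sketched in a few lines. The bit lists are assumed to be least-significant-bit first, and the function names are illustrative only.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry feeds the next stage
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    # 11 + 6 = 17: sum bits [1, 0, 0, 0] (LSB first) with carry-out 1
    print(ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]))
```

Each stage must wait for the carry of the stage before it, which is the Cin-to-Cout critical path named on the slide.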
Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries are generated in parallel
Each output carry is expressed in terms of the generate and propagate signals
The expressions for a 4-bit CLA are
C1 = G0 + P0C0
C2 = G1 + P1G0 + P1P0C0
C3 = G2 + P2G1 + P2P1G0 + P2P1P0C0
C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0
4-Stage Carry Look-Ahead Adder
1-bit CMOS CLA
C1 = G0 + P0C0
4-bit CMOS CLA
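The look-ahead idea can be illustrated with a sketch in which every carry is computed directly from the generate and propagate signals and C0, never from a neighbouring carry. It starts from C1 = G0 + P0C0 and expands the same recurrence for the higher carries; the function names and the integer operand interface are assumptions made for this example.

```python
def cla_carries(a, b, c0, width=4):
    """Return ([C0, C1, ..., Cwidth], propagate bits) for integer operands."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # Gi = Ai AND Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # Pi = Ai XOR Bi
    carries = [c0]
    for i in range(width):
        # C(i+1) = Gi + Pi*G(i-1) + ... + Pi*...*P1*G0 + Pi*...*P0*C0
        terms = []
        for j in range(i, -1, -1):
            t = g[j]
            for k in range(j + 1, i + 1):
                t &= p[k]
            terms.append(t)
        t = c0
        for k in range(i + 1):
            t &= p[k]
        terms.append(t)
        carries.append(int(any(terms)))
    return carries, p


def cla_add(a, b, c0=0, width=4):
    """Sum bit i is Pi XOR Ci; returns (sum, carry_out)."""
    carries, p = cla_carries(a, b, c0, width)
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i
    return s, carries[width]
```

In hardware each of these product terms is a separate gate input, so the carry logic grows quickly with width even though its depth stays constant.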
Carry Select Adder
It is also referred to as a conditional sum adder
The adder is divided into two blocks: one block computes with Cin fixed at logic 0, the other with Cin fixed at logic 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
Carry Select Adder Structure
8-bit Carry Select Adder
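A sketch of the selection scheme for an 8-bit adder follows. The 4-bit block size, the helper `ripple`, and the integer operand interface are assumptions made for illustration; the slides do not specify them.

```python
def ripple(a, b, c, width):
    """Ripple-carry add of two width-bit integers; returns (sum, carry_out)."""
    s = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        s |= (x ^ y ^ c) << i
        c = (x & y) | ((x ^ y) & c)
    return s, c


def carry_select_add(a, b, c_in=0, width=8, block=4):
    """Carry-select addition; width is assumed to be a multiple of block."""
    result, carry = 0, c_in
    mask = (1 << block) - 1
    for lo in range(0, width, block):
        x, y = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple(x, y, 0, block)  # block evaluated with Cin = 0
        s1, c1 = ripple(x, y, 1, block)  # block evaluated with Cin = 1
        # 2:1 multiplexer: the actual carry-in picks one precomputed result
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```

In hardware the two versions of each block are evaluated at the same time, so only the multiplexer delay is added per block once the true carry arrives.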
Delay of Carry Select Adder
The delay of an n-bit adder is given by
T = P*K1 + (M - 1)*K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through a multiplexer
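The formula can be evaluated with a small helper. The numeric values in the usage note are made-up unit delays chosen only to show the arithmetic, and the sketch assumes n is an exact multiple of P so that M = n / P.

```python
def carry_select_delay(n_bits, cells_per_block, k1, k2):
    """T = P*K1 + (M - 1)*K2 with M = n_bits / P (assumed to divide exactly)."""
    if n_bits % cells_per_block:
        raise ValueError("n_bits must be a multiple of cells_per_block")
    m = n_bits // cells_per_block
    return cells_per_block * k1 + (m - 1) * k2
```

For example, with n = 32, P = 4, K1 = 1.0 and K2 = 0.5 (arbitrary units), M = 8 and `carry_select_delay(32, 4, 1.0, 0.5)` returns 4*1.0 + 7*0.5 = 7.5.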
Carry Skip Adder (CSA)
It is also called a carry bypass adder
It looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output
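Simplified Carry-Skip Adder Logic
[Figure: simplified carry-skip logic with two full adders (FA); inputs ai, bi, ai+1, bi+1; sums si, si+1; carries ci, ci+1; propagate signals pi, pi+1; skip signal]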
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
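The skip condition can be sketched for a single 4-bit block as follows. The function name `skip_block` and the LSB-first bit lists are illustrative assumptions; the block carry-out is the quantity the slide calls C3.

```python
def skip_block(a_bits, b_bits, c_in):
    """One carry-skip block; returns (sum_bits, carry_out, skip)."""
    p = [x ^ y for x, y in zip(a_bits, b_bits)]  # per-bit propagate signals
    sums, c = [], c_in
    for x, y in zip(a_bits, b_bits):
        sums.append(x ^ y ^ c)
        c = (x & y) | ((x ^ y) & c)              # ripple carry inside the block
    skip = int(all(p))                           # P0 and P1 and P2 and P3
    # Bypass multiplexer: forward the carry-in when every bit propagates
    c_out = c_in if skip else c
    return sums, c_out, skip
```

The bypass never changes the logical result, since the rippled carry equals the carry-in whenever all propagate signals are 1. It only shortens the time needed for the carry to cross the block.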
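Delay of CSA
Worst-case delay T is given by
T = 2(P - 1)*K1 + (M - 2)*K2
where P and M are the number of cells per block and the number of blocks, as for the carry select adder, and
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block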
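The worst-case formula can be compared numerically with a plain ripple adder. The ripple delay of n*K1 is a common first-order estimate and is an assumption here, as are the unit delays used in the usage note.

```python
def carry_skip_delay(n_bits, cells_per_block, k1, k2):
    """T = 2(P - 1)*K1 + (M - 2)*K2 with M = n_bits / P (assumed exact)."""
    if n_bits % cells_per_block:
        raise ValueError("n_bits must be a multiple of cells_per_block")
    m = n_bits // cells_per_block
    return 2 * (cells_per_block - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """First-order estimate (assumption): the carry crosses every cell."""
    return n_bits * k1
```

With n = 32, P = 4, K1 = 1.0 and K2 = 0.5 (arbitrary units), the carry-skip estimate is 2*3*1.0 + 6*0.5 = 9.0 against 32.0 for the ripple estimate.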
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
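As a minimal illustration of shift-and-add multiplication for unsigned operands, the sketch below adds a shifted copy of the multiplicand for every 1 bit of the multiplier. The function name and the 4-bit default width are assumptions chosen to match the 4-bit multiplier slide.

```python
def shift_add_multiply(a, b, width=4):
    """Unsigned multiply: accumulate (a << i) for each set bit i of b."""
    acc = 0
    for i in range(width):
        if (b >> i) & 1:   # multiplier bit i selects this partial product
            acc += a << i  # multiplicand shifted to the weight of bit i
    return acc
```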
Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
Wallace Tree Multiplier Structure
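The reduction of partial products to a sum vector and a carry vector can be sketched with word-level 3:2 carry-save adders. This is a simplification: a real Wallace tree compresses bit by bit within each column, whereas this sketch groups whole partial-product rows in threes. All names are illustrative.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: returns (sum vector, carry vector)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, width=4):
    """Unsigned multiply; returns (product, number of reduction levels)."""
    # One partial product row per multiplier bit
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])  # rows left over pass to the next level
        rows = nxt
        levels += 1
    # The final two vectors are combined by a conventional adder
    return sum(rows), levels
```

For 4-bit operands the four rows are reduced in two levels (4 to 3 to 2) before the final addition.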
Sequential Logic
D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM
Design of Subsystem - ALU
Data path of a Processor
Designing complex systems - ALU
Complex systems are designed using a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of each sub-system layout and check its proportion of the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter - Operation
4x4 Barrel Shifter Pass Transistor Circuit
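A 4x4 barrel shifter can be modelled as a crossbar of pass transistors in which one active shift line connects each output to exactly one input. The sketch below assumes a one-hot shift control and end-around (rotate) behaviour toward higher bit positions; both are assumptions made for illustration, since the circuit itself is only shown as a figure.

```python
def barrel_shift_4x4(in_bits, shift_onehot):
    """Model of a 4x4 pass-transistor crossbar with one-hot shift lines."""
    n = 4
    if len(in_bits) != n or len(shift_onehot) != n or sum(shift_onehot) != 1:
        raise ValueError("expected 4 input bits and a one-hot 4-bit shift control")
    out = [0] * n
    for row in range(n):          # output line
        for col in range(n):      # input line
            k = (row - col) % n   # shift line gating the switch at (row, col)
            if shift_onehot[k]:   # switch conducts: input col drives output row
                out[row] = in_bits[col]
    return out
```

With shift line 0 active the inputs pass straight through; with shift line 1 active, `barrel_shift_4x4([1, 0, 0, 0], [0, 1, 0, 0])` gives `[0, 1, 0, 0]`.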
Routing Power rails for ALU
Layout of full adder
Layout2 of full adder
Propagate and Generate Signal
For a full adder, define what happens to carries
Generate: Cout = 1, independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0, independent of C; K = ~A · ~B
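The three carry conditions can be checked with a small Python sketch. The helper names `carry_status` and `carry_out` are assumptions introduced here for illustration; the inputs are single bits (0 or 1).

```python
def carry_status(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                # both inputs 1: carry out is 1 regardless of C
    p = a ^ b                # inputs differ: carry out follows C
    k = (1 - a) & (1 - b)    # both inputs 0: carry out is 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through generate and propagate."""
    g, p, _ = carry_status(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_status(a, b))
```

Exactly one of the three signals is 1 for any input pair, which is what lets a carry chain be built from mutually exclusive switches.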
Manchester Carry Chain
Built from switch logic using propagate, generate & kill
Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A'B')
4-stage Dynamic Manchester Chain
Carry-Ripple Adder
Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
Figure: 4-bit carry-ripple adder (inputs A1-A4, B1-B4 and Cin; internal carries C1-C3; outputs S1-S4 and Cout)
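A behavioural sketch of the carry-ripple adder in Python may make the critical path concrete. The names `full_adder` and `ripple_add` are assumptions introduced here, and bit lists are given least-significant bit first.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from Cin to Cout."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each cell waits for the previous carry
        sums.append(s)
    return sums, carry


# 11 + 5 = 16: all four sum bits are 0 and the carry out is 1.
print(ripple_add([1, 1, 0, 1], [1, 0, 1, 0]))  # ([0, 0, 0, 0], 1)
```

The loop-carried dependency on `carry` is the software analogue of the Cin-to-Cout critical path, so the delay grows linearly with the word length.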
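Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel
The output carry is expressed in terms of the generate and propagate signals
The expressions for the 4-bit CLA are
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0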
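The look-ahead carries follow from expanding Ci+1 = Gi + Pi·Ci until only C0 remains. The Python sketch below writes the four expanded carries out explicitly; the function name `cla4` and the LSB-first list convention are assumptions introduced here.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; a and b are bit lists, LSB first."""
    g = [x & y for x, y in zip(a, b)]   # generate signals G0..G3
    p = [x ^ y for x, y in zip(a, b)]   # propagate signals P0..P3
    # Every carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


print(cla4([1, 1, 0, 1], [1, 0, 1, 0]))  # ([0, 0, 0, 0], 1)
```

The price of the parallel carries is the growing fan-in of the product terms, which is why look-ahead is usually limited to small groups such as 4 bits.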
4-Stage Carry look-ahead adder
1-bit CMOS CLA
C1 = G0 + P0·C0
4-bit CMOS CLA
Carry Select Adder
It is also referred to as a conditional-sum adder
Each section of the adder is built as two blocks: one block with Cin = logical 0 and another block with Cin = logical 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
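The selection scheme can be sketched behaviourally in Python. Each block is evaluated for both possible carry inputs, and the real carry then picks one result, playing the role of the 2:1 multiplexer. The names `ripple_block` and `carry_select_add`, the default block size of 4, and the LSB-first bit lists are assumptions introduced here.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block; returns (sum bits, carry out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry-select addition over blocks of `block` bits, LSB first."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple_block(a_blk, b_blk, 0)   # block precomputed for Cin = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)   # block precomputed for Cin = 1
        sums += s1 if carry else s0              # 2:1 multiplexer on the sum bits
        carry = c1 if carry else c0              # 2:1 multiplexer on the carry out
    return sums, carry


a = [(200 >> i) & 1 for i in range(8)]
b = [(100 >> i) & 1 for i in range(8)]
print(carry_select_add(a, b))
```

In hardware the two copies of each block work concurrently, so only the multiplexer delay is added per block once the first block has settled.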
Carry Select Adder Structure
8-bit Carry Select Adder
Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
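As a worked example of the formula, the sketch below evaluates T for a 16-bit adder under several block sizes. The unit delays K1 = K2 = 1 are placeholder assumptions, not measured values, and the function name `carry_select_delay` is introduced here for illustration.

```python
def carry_select_delay(p, m, k1=1.0, k2=1.0):
    """T = P*K1 + (M-1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


n = 16
for p in (2, 4, 8):
    m = n // p   # number of blocks for an n-bit adder
    print(f"P={p}, M={m}: T={carry_select_delay(p, m)}")
```

With these placeholder delays, P = 4 and M = 4 gives T = 4 + 3 = 7 units, and the result shows the trade-off between longer blocks and more multiplexer stages.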
Carry Skip Adder (CSA)
It is also called a carry-bypass adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output
Simplified Carry-Skip Adder Logic
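Figure: two full-adder (FA) cells with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and the skip signal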
2-Stage Carry Skip Adder
4-stage carry skip adder
If (P0 and P1 and P2 and P3) = 1, then C3 = C0
else C3 = output carry of the stage-4 adder
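The skip condition above can be modelled in Python as a hedged behavioural sketch. When every bit of a block propagates, the block's carry input is forwarded directly; otherwise the rippled carry is used. The names `ripple_block` and `carry_skip_add`, the block size of 4, and the LSB-first bit lists are assumptions introduced here.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block; returns (sum bits, carry out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_skip_add(a_bits, b_bits, cin=0, block=4):
    """Carry-skip addition; also reports which blocks took the bypass."""
    sums, carry, skipped = [], cin, []
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        # Block propagate: P0 and P1 and ... is 1 when Ai != Bi for every bit.
        block_propagate = all(a ^ b for a, b in zip(a_blk, b_blk))
        s, rippled = ripple_block(a_blk, b_blk, carry)
        carry = carry if block_propagate else rippled   # bypass multiplexer
        sums += s
        skipped.append(block_propagate)
    return sums, carry, skipped


a = [(0b10100101 >> i) & 1 for i in range(8)]
b = [(0b01011010 >> i) & 1 for i in range(8)]
print(carry_skip_add(a, b, cin=1))
```

The bypass never changes the logical result, since a fully propagating block would ripple the same carry; it only shortens the path that carry has to travel.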
Delay of CSA
Worst-case delay T is given by T = 2(P-1)K1 + (M-2)K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
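A short numeric sketch of this formula, set against a plain ripple adder, is given below. The unit delays K1 = K2 = 1 and the ripple estimate of n·K1 are assumptions made only for illustration, as are the function names.

```python
def carry_skip_delay(p, m, k1=1.0, k2=1.0):
    """Worst case T = 2(P-1)*K1 + (M-2)*K2 for M blocks of P cells."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1=1.0):
    """Assumed ripple estimate: the carry passes through all n cells."""
    return n * k1


for n in (16, 32, 64):
    p = 4
    m = n // p
    print(f"n={n}: ripple={ripple_delay(n)}, carry-skip={carry_skip_delay(p, m)}")
```

The first term covers rippling out of the first block and into the last one, while the middle M-2 blocks contribute only their skip delay, so the gap to the ripple adder widens as n grows.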
Delay for Ripple and Carry Skip Adder
Comparison of CLA and CSK
Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.
Comparison of Adders
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Booth's Multiplier
Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2(n).
It is an optimized column adder tree.
It combines all partial products into two vectors (carry and sum).
The carry and sum outputs are combined using a conventional adder.
It compresses the number of stages of partial products.
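The reduction described above can be sketched at word level in Python. This is a simplified model, not a gate-level Wallace tree: whole partial-product rows are compressed three at a time with carry-save adders until two rows (sum and carry) remain, and those are then added conventionally. The operands are assumed to be unsigned, and all function names are hypothetical.

```python
def partial_products(a, b, n):
    """One shifted copy of a for every set bit of the n-bit multiplier b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def csa(x, y, z):
    """3:2 compressor (carry-save adder): three rows in, sum and carry rows out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n=8):
    """Returns (product, number of reduction levels)."""
    rows = partial_products(a, b, n)
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])        # leftover rows pass to the next level
        rows = nxt
        levels += 1
    return sum(rows), levels           # final conventional addition
```

Each level turns every three rows into two, so the number of levels grows roughly with the logarithm of the number of partial products, in line with the completion-time claim above.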
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop, two-phase clocking, dynamic shift register, RAM, ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in Top-Down approach with the help of CAD tools
Partition the system sensibly Aiming for simple interconnection and high
regularity between sub-systems Generate and verify each section of the
design Calculate the dimensions of the layout of sub-
systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
MULTIPLIERS
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-
Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier A Wallace tree is an implementation of an adder
tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors
(carry and sum) Carry and sum outputs combined using a
conventional adder Compresses the no of stages of partial products
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in Top-Down approach with the help of CAD tools
Partition the system sensibly Aiming for simple interconnection and high
regularity between sub-systems Generate and verify each section of the
design Calculate the dimensions of the layout of sub-
systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU
Multipliers Basics
4-bit Multiplier
Multiplier structure using Shift and Add
Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-
Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method
Serial-Parallel Multiplier
Braun Array
Modified Booth Algorithm
Booth Multiplier Procedure
Structure of Boothrsquos Multiplier
Wallace Tree Multiplier A Wallace tree is an implementation of an adder
tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors
(carry and sum) Carry and sum outputs combined using a
conventional adder Compresses the no of stages of partial products
Wallace Tree Multiplier Structure
Sequential Logic
D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM
Design of Subsystem - ALU
Data path of a Processor
Designing the complex systems - ALU
Complex systems are designed in Top-Down approach with the help of CAD tools
Partition the system sensibly Aiming for simple interconnection and high
regularity between sub-systems Generate and verify each section of the
design Calculate the dimensions of the layout of sub-
systems and check the proportion in the total chip area
Bit Slice design of ALU
Design of Adder and Shifter is essential for ALU
Barrel Shifter in 4-bit ALU
Barrel Shifter- Operation
4x4 Barrel Shifter Pass Transistor Circuit
Routing Power rails for ALU