Chapter 3: Sub-System Design

Post on 28-Oct-2014


Sub-System Design

Topics
- Architectural issues
- Switch logic
- Gate logic
- Examples of structured design: combinational logic, design of an ALU subsystem
- Consideration of adders
- Multipliers
- Sequential circuits
- Semiconductor memories

Introduction
- Large systems are composed of sub-systems known as leaf cells.
- The most basic leaf cell is the common logic gate (inverter, NAND, etc.).

Structured Design
- High regularity: leaf cells are replicated many times and interconnected to form the system.
- A logical and systematic approach to VLSI design is essential.

Dealing with Complexity
- Divide and conquer: limit the number of components you deal with at any one time.
- Group several components into larger components: transistors form gates, gates form functional units, and functional units form processing elements.

A System-on-a-Chip

Courtesy Philips

Major Levels of Design
- Specification: description of requirements.
- Systems level: placing and interconnecting major functional units.
- Function level: specification and design of major functional units.
- Logic/circuit level: gate-level design and gate interconnection design.
- Layout level: what will actually be patterned onto the chip and how the chip will be processed.
- Physics level: the physics of gate and switch operation.

Sub-System Design Guidelines
- Define the requirements.
- Partition the overall architecture into subsystems.
- Consider the interconnection paths between the subsystems.
- Draw up the system floor plan on the silicon chip.
- Use regular structures for replication.
- Draw a stick diagram for each leaf cell (module) in the system.
- Convert the stick diagram of each leaf cell into layout and run a design rule check.
- Simulate the performance of each cell.

Design Validation

Check at every step that errors have not been introduced: the longer an error remains, the more expensive it becomes to remove.

Chip Architecture: Floor Plan
- After the high-level design is complete, it is necessary to decide how the design is to be implemented in silicon. The implementation plan is known as the floor plan.
- The first step in laying out a floor plan is routing the supply and clock rails. Sufficient space must be left between the power rails for data buses and combinational logic cells.
- Decide on the relative positions of the major functional blocks.
- Use a routing algorithm (software); the routing algorithm minimizes the total routing area.

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?

1) Pass transistors
2) Transmission gates

Pass Transistors

So far we have assumed that the source is grounded.

What if the source is above 0 V, e.g. a pass transistor passing VDD?

NMOS Pass Transistors
- Require one transistor and one gate signal.
- Transmit a 0 well, but when VDD is applied to the drain, the voltage at the source reaches only VDD - Vtn.
- When switch logic drives gate logic, n-type switches can cause electrical problems:
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input is 1.
  - This is because the pull-down current is weaker when a lower gate voltage is applied.
  - The complementary gate's pull-down will not drain current from the output capacitance as fast as it should.

PMOS Pass Transistors
- When Vin = VDD, Vout = VDD.
- When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
- Logic 0 is therefore somewhat degraded through a p-device.

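Voltage degradation of Pass Transistors

[Figure: pass transistor circuits. Labelled node voltages: Vs = VDD - Vtn for an nMOS device passing VDD; Vs = |Vtp| for a pMOS device passing VSS; VDD - Vtn at each node of a series chain of nMOS devices whose gates are at VDD; VDD - 2Vtn where a degraded output drives the gate of a following pass transistor.]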
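The threshold drops labelled in the pass-transistor circuits can be reproduced with a small first-order model. The sketch below assumes an ideal nMOS switch whose source can rise no higher than its gate voltage minus Vtn, and it ignores body effect; the function names and the 5 V / 1 V example values are illustrative, not taken from the slides.

```python
def series_chain_high(vdd, vtn, stages):
    """High level passed by `stages` nMOS pass transistors in series.

    Every gate is tied to VDD, so the chain loses a single threshold
    drop no matter how many devices are in series.
    """
    if stages < 1:
        raise ValueError("need at least one pass transistor")
    return vdd - vtn


def gate_driven_high(vdd, vtn, stages):
    """High level when each stage's output drives the next stage's gate.

    Each stage passes VDD at its drain but is limited to (gate - Vtn),
    so the drops accumulate: VDD - Vtn, VDD - 2Vtn, ...
    """
    gate = vdd
    out = vdd
    for _ in range(stages):
        out = max(0.0, min(vdd, gate - vtn))
        gate = out  # the degraded output becomes the next gate voltage
    return out


if __name__ == "__main__":
    print(series_chain_high(5.0, 1.0, 3))  # one drop only
    print(gate_driven_high(5.0, 1.0, 2))   # two accumulated drops
```

This is why pass-transistor outputs should not be used to drive the gates of further pass transistors without restoring the level first.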

Complementary Pass Transistor Logic

[Figure: (a) a pass-transistor network with inputs A, A', B, B' producing F, and an inverse pass-transistor network producing F'. (b) Example gates: AND/NAND (F = AB), OR/NOR (F = A + B), EXOR/NEXOR (F = A xor B).]

Advantages and Disadvantages

Advantages
- Fewer transistors
- No static power consumption

Disadvantages
- Output voltage degrades
- Not an ideal switch, because of series resistance
- Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
- Fewer transistors than fully complementary CMOS
- No static power consumption
- Efficient building of complex gates

Disadvantages
- Not an ideal switch, because of series resistance
- Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
- Zpu : Zpd ratio
- Pull-up and pull-down resistance
- Power dissipation
- Standard unit of capacitance
- Gate delay

Sizing of NMOS NAND Gate

The ratio of the pull-up impedance to the total impedance of the n series pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to produce a correct output low level.

The nMOS NAND gate geometry reveals two factors:
- The area of a NAND gate is greater than that of an inverter, because there are more pull-down transistors and the pull-up transistor must be correspondingly longer.
- The delay also increases in direct proportion to the number of inputs.

NMOS NOR Gate
- The pull-down transistors are in parallel, so the pull-up to pull-down ratio seen by each transistor is the same, and the gate has the same characteristics as an inverter.
- The area occupied is reasonable, since the length of the pull-up transistor does not increase.
- The NOR gate is therefore preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates
- High noise margins: VOH and VOL are at VDD and GND, respectively.
- No static power consumption: there is never a direct path between VDD and VSS (GND) in steady state.
- Comparable rise and fall times (under appropriate sizing conditions).

CMOS NAND and NOR Characteristics
- The CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but the geometry should be kept symmetrical by:
  - allowing for the extended fall time caused by the resistance of the series nMOS transistors;
  - keeping the transfer characteristic centred on VDD/2.
- The CMOS NOR gate has series p-transistors, which increase resistance and delay. This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted.

CMOS Logic
- Static CMOS logic
- Pseudo-nMOS logic
- Dynamic CMOS logic
- Domino CMOS logic
- Clocked CMOS logic
- n-p CMOS logic

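Static CMOS Switch Delay Model

[Figure: switch-level RC models of an inverter (INV), a two-input NAND (NAND2) and a two-input NOR (NOR2). Each transistor is drawn as a switch controlled by A or B in series with its on-resistance (Rp for pMOS, Rn for nMOS); CL is the load capacitance and Cint the internal node capacitance.]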

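Input Pattern Effects on Delay

Delay depends on the pattern of inputs. For the two-input NAND gate:
- Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
- High-to-low transition
  - both inputs go high: delay is 0.69 (2 Rn) CL

[Figure: RC model of the two-input NAND gate, with Rp for each pMOS device, Rn for each nMOS device, load capacitance CL and internal node capacitance Cint.]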
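The three first-order delay expressions for the two-input NAND gate can be evaluated directly. The sketch below is a minimal illustration that ignores the internal node capacitance Cint; the function name and the resistance and capacitance values in the example are assumptions chosen only to show the calculation.

```python
def nand2_delays(rp, rn, cl):
    """First-order RC delays of a two-input static CMOS NAND gate.

    rp, rn: on-resistance of one pMOS / one nMOS device (ohms)
    cl:     load capacitance (farads)
    """
    return {
        # both pMOS devices conduct in parallel: Rp/2
        "rise_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # only one pMOS device conducts: Rp
        "rise_one_input_falls": 0.69 * rp * cl,
        # two nMOS devices conduct in series: 2*Rn
        "fall_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }


if __name__ == "__main__":
    # Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
    for name, seconds in nand2_delays(10e3, 5e3, 100e-15).items():
        print(f"{name}: {seconds * 1e12:.1f} ps")
```

With these example values the single-input rising case and the falling case come out equal, because 2·Rn happens to equal Rp; sizing the nMOS devices changes that balance.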

Delay Dependence on Input Patterns

[Figure: output voltage (V) versus time (ps) for the input patterns A = B = 1->0; A = 1, B = 1->0; and A = 1->0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0->1            67
A = 1, B = 0->1         64
A = 0->1, B = 1         61
A = B = 1->0            45
A = 1, B = 1->0         80
A = 1->0, B = 1         81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

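Pseudo NMOS Logic

Pseudo-nMOS logic is static CMOS logic with the pull-up network reduced from N transistors to one.

[Figure: static CMOS gate for comparison. Inputs In1, In2, ..., InN drive both a pull-up network (PUN) between VDD and the output F and a pull-down network (PDN) below F.]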

Pseudo NMOS Operation
- The pull-down network of the gate is the same as for a fully complementary gate.
- The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on. The p-type transistor is used as a resistor.
- When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
- When the gate inputs are at VDD, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.
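Because the p-type load stays on while the pull-down network conducts, the low output level is set by a ratio. A crude way to see this is to treat both networks as fixed resistors forming a voltage divider. This is a simplifying assumption, not the device-level analysis, and the function name and example values below are hypothetical.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Resistive-divider estimate of a pseudo-nMOS gate driving a 0.

    Returns (output low voltage in volts, static power in watts)
    while the pull-down network is conducting.
    """
    total = r_pullup + r_pulldown
    vol = vdd * r_pulldown / total  # output never reaches 0 V
    static_power = vdd ** 2 / total  # current flows from VDD to VSS
    return vol, static_power


if __name__ == "__main__":
    # Hypothetical 4:1 ratio of pull-up to pull-down resistance.
    vol, power = pseudo_nmos_low_output(2.5, 40e3, 10e3)
    print(f"VOL = {vol:.2f} V, static power = {power * 1e6:.0f} uW")
```

A weaker (more resistive) pull-up lowers the output low level and the static power, at the cost of a slower rising edge.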

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout (V) versus Vin (V), for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
- The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.

Disadvantages
- The higher pull-up resistance increases delay, so circuits are slower.
- Static power dissipation is higher, because of the conduction path between VDD and VSS.

Dynamic CMOS Logic
- The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic.
- It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.

[Figure: dynamic gate. A clocked pMOS precharge transistor Mp sits between VDD and the output node Out (load CL); a pull-down network (PDN) with inputs In1, In2, In3 sits between Out and a clocked nMOS evaluation transistor Me.]

Two-phase operation
- Precharge (CLK = 0)
- Evaluate (CLK = 1)

Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path. A fan-in of n requires 2n devices (n N-type + n P-type).
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes. A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
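The device counts quoted for static and dynamic gates are easy to tabulate against fan-in. The helper below is only a sketch; the function name and the style labels are illustrative.

```python
def transistor_count(style, fan_in):
    """Number of transistors in a single gate with the given fan-in."""
    if fan_in < 1:
        raise ValueError("fan-in must be at least 1")
    if style == "static":
        return 2 * fan_in   # n N-type + n P-type
    if style == "dynamic":
        return fan_in + 2   # n-input PDN + clocked nMOS + clocked pMOS
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, transistor_count("static", n), transistor_count("dynamic", n))
```

The saving grows with fan-in: at a fan-in of 8 the dynamic gate needs 10 devices where the static gate needs 16.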

Dynamic CMOS Logic operation

Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance.
- The pull-down transistor controlled by the clock keeps the precharge node from being drained.
- The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.

Evaluate
- When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
- The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
- If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output
- Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
- Inputs to the gate can make at most one transition during evaluation.
- The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: dynamic gate with inputs A, B and C implementing Out = ((A·B) + C)'. During precharge (Clk = 0) Mp is on, Me is off and Out = 1; during evaluation (Clk = 1) Mp is off and Me is on.]
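The output conditions can be illustrated with a small behavioural model of the example gate Out = ((A·B) + C)'. This is a logic-level sketch that ignores leakage and charge sharing; the class and method names are hypothetical.

```python
class DynamicGate:
    """Behavioural model of a dynamic gate computing NOT((A AND B) OR C)."""

    def __init__(self):
        self.out = 1  # assume the gate starts precharged

    def precharge(self):
        # Clk = 0: Mp on, Me off, the output node is charged to 1.
        self.out = 1

    def evaluate(self, a, b, c):
        # Clk = 1: Mp off, Me on. The node can only be discharged;
        # nothing in this phase is able to charge it back up.
        if (a and b) or c:
            self.out = 0
        return self.out


if __name__ == "__main__":
    gate = DynamicGate()
    gate.precharge()
    print(gate.evaluate(0, 0, 1))  # C glitches high: node discharges
    print(gate.evaluate(0, 0, 0))  # C returns low: the 0 is stuck
    gate.precharge()
    print(gate.evaluate(0, 0, 0))  # clean inputs: node holds its 1
```

The second call shows why inputs may make at most one transition during evaluation: an input that rises and then falls leaves a wrong 0 on the output until the next precharge.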

Properties of Dynamic Gates
- The logic function is implemented by the PDN only.
- Full-swing outputs (VOL = GND and VOH = VDD).
- Non-ratioed: sizing of the devices does not affect the logic levels.
- Faster switching speeds, due to reduced load capacitance.
- Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
- The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, which gives a low noise margin (NML).

Advantages and Disadvantages

Advantages
- Low area
- Higher speed than static complementary gates

Disadvantages
- Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
- They require a clock signal.
- They are more sensitive to noise.
- Their clock signals consume power and are difficult to turn off to save power.

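Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluation transistor Me. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 are plotted against time, with the threshold VTn marked.]

Only 0 -> 1 transitions are allowed at the inputs.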

Cascading Dynamic Gates: Problem
- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
- The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can make only a single 0 -> 1 transition during the evaluation period.
- Setting all inputs of the second gate to 0 during precharge fixes the problem.

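Domino CMOS Logic

[Figure: domino gate. A dynamic gate (precharge transistor Mp, PDN with inputs In1, In2, In3, evaluation transistor Me) is followed by a static inverter that produces Out1.]

Domino logic is the combination of a dynamic logic gate and a static inverter.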

Domino CMOS Logic operation

Precharge phase: same as dynamic logic.

Evaluate phase (modified from dynamic logic):
- When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
- The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
- If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
- Only non-inverting logic can be implemented.
- Smaller area than static CMOS.
- Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
- Very high speed: the static inverter can be skewed, since only the low-to-high output transition matters.
- Input capacitance is reduced, giving a smaller logical effort.

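Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first has a PDN with inputs In1, In2, In3 and produces Out1 through its inverter; the second has a PDN with inputs In4, In5 and produces Out2. Each stage has its own Mp and Me clocked by Clk.]

- The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
- Hence the only possible transition during evaluation is 0 -> 1.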
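The reason cascaded domino gates evaluate safely can be shown with a logic-level model of two stages. In the sketch below each stage's pull-down network is assumed, purely for illustration, to be an AND of its inputs; the class and function names are hypothetical and timing is reduced to the order of the calls.

```python
class DominoStage:
    """One domino stage: a dynamic node followed by a static inverter."""

    def __init__(self):
        self.node = 1  # dynamic storage node
        self.out = 0   # inverter output

    def precharge(self):
        # Clk = 0: node charged to 1, so the inverter drives out to 0.
        self.node = 1
        self.out = 0

    def evaluate(self, pdn_conducts):
        # Clk = 1: the node can only fall, so out can only rise.
        if pdn_conducts:
            self.node = 0
        self.out = 1 - self.node
        return self.out


def evaluate_chain(in1, in2, in3, in4, in5):
    """Two cascaded stages; the AND functions are assumed examples."""
    s1, s2 = DominoStage(), DominoStage()
    s1.precharge()
    s2.precharge()
    # Stage 2 looks at Out1 before stage 1 has evaluated: Out1 is
    # still 0 from precharge, so stage 2 cannot discharge early.
    early_out2 = s2.evaluate(s1.out and in4 and in5)
    out1 = s1.evaluate(in1 and in2 and in3)
    out2 = s2.evaluate(out1 and in4 and in5)
    return early_out2, out1, out2


if __name__ == "__main__":
    print(evaluate_chain(1, 1, 1, 1, 1))
    print(evaluate_chain(1, 0, 1, 1, 1))
```

Compared with directly cascaded dynamic gates, where a precharged 1 on the first output can discharge the second stage before the first has settled, the inverter presents a 0 to the next stage until the first stage really evaluates.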

Clocked CMOS Logic
- It is static CMOS with two additional clocked transistors: Clk drives one additional NMOS and Clk' drives one additional PMOS.
- A gate with fan-in n therefore uses 2n + 2 transistors.
- When Clk goes high, the clocked NMOS is on and the logic of the circuit is evaluated.
- When Clk goes low, the clocked NMOS is off and the logic is not evaluated.

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effects in N-FET devices.

Disadvantages
- More area
- More complexity

N-P CMOS Logic
- An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is NP domino logic (also called NORA logic).
- Stages of N logic alternate with stages of P logic.
- N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
- P logic stages use the complement clock, with the P logic tree tied above the output node.
- During precharge, clk is low (-clk is high): P-logic outputs precharge to ground while N-logic outputs precharge to VDD.
- During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to VDD.
- Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
- Stage 1: X = (A · B)'
- Stage 2: G = X' + Y'
- Stage 3: Z = (F · G + H)'

[Figure: three cascaded stages with outputs X, G and Z.]

BiCMOS Technology
- CMOS properties: low power, slower.
- BJT properties: faster, high power.
- Implementation: CMOS and BJT, with CMOS for the core logic and BJT for the interface.
- Basic circuits: inverter, NAND gate, NOR gate.

BiCMOS Circuits

[Figure: BiCMOS inverter and NAND gate.]

Structured Design

A digital block or standard cell is designed at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic
- 5-way selector
- Multiplexers and demultiplexers
- Encoders and decoders
- Parity generator
- Bus arbitration logic
- Gray code to binary code converter

Adders
- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques: carry-select adders, carry-skip adders, carry look-ahead adders
- 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) nMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts

So the adder standard cell measures L = 190λ, W = 150λ.

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to the carry:
- Generate: Cout = 1, independent of C. G = A · B
- Propagate: Cout = C. P = A ^ B
- Kill: Cout = 0, independent of C. K = ~A · ~B
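The three carry cases can be checked exhaustively for a one-bit slice. The sketch below uses single-bit integers (0 or 1); the function names are illustrative.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # both inputs 1: carry out is 1 regardless of C
    p = a ^ b              # inputs differ: carry out follows C
    k = (1 - a) & (1 - b)  # both inputs 0: carry out is 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder, expressed through G and P."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, carry_signals(a, b), [carry_out(a, b, c) for c in (0, 1)])
```

Exactly one of G, P and K is 1 for any input pair, which is what lets switch-logic carry chains connect the carry node to VDD, to the incoming carry, or to ground without contention.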

Manchester Carry Chain

Built from switch logic using the propagate, generate and kill signals.

Manchester Chain adders
- The carry input is precharged with the clock signal instead of passing through the logic.
- The carry path is gated by the propagate signal (P = A ^ B) through a single n-type pass transistor.
- Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

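Carry-Ripple Adder
- Simplest design: cascade full adders.
- The critical path goes from Cin to Cout.
- Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, carry-in Cin, internal carries C1..C3 and carry-out Cout.]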
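A carry-ripple adder is just a loop over full adders in which each carry-out feeds the next carry-in. The sketch below is a behavioural model, with bits stored least-significant first; the function names are illustrative.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least-significant bit first)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        # The carry ripples: each stage waits for the previous one.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(sums), cout)  # 11 + 6 = 17 -> sum bits 1, carry 1
```

The sequential dependence on `carry` inside the loop is the software picture of the Cin-to-Cout critical path.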

Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
- The output carry is represented in terms of the generate and propagate signals.

The expressions for a 4-bit CLA are:
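Assuming the usual per-bit definitions G_i = A_i B_i and P_i = A_i ^ B_i, and starting from the one-bit relation C1 = G0 + P0 C0, the carries of a 4-bit look-ahead unit expand as follows. This is the standard textbook form, given here as a sketch rather than as a copy of the original slide.

```latex
\begin{align}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{align}
```

Each line is obtained by substituting the previous carry into C_{i+1} = G_i + P_i C_i, so every carry depends only on the G and P signals and on C_0, not on an intermediate carry.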

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0

4-bit CMOS CLA

Carry Select Adder
- It is also referred to as a conditional-sum adder.
- The adder is divided into two blocks: one computed with Cin = 0 and the other with Cin = 1.
- S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
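The carry-select delay formula can be evaluated for different block sizes. The sketch below uses unit delays for K1 and K2 as an assumption, and compares the result with n·K1 for a plain ripple adder of the same width (the linear carry delay); the function names are illustrative.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


def ripple_delay(n, k1):
    """Linear carry delay of an n-bit ripple adder, for comparison."""
    return n * k1


if __name__ == "__main__":
    n = 32
    for p in (2, 4, 8, 16):
        m = n // p  # number of blocks for a 32-bit adder
        print(p, m, carry_select_delay(p, m, 1, 1), ripple_delay(n, 1))
```

With unit delays, a 32-bit adder split into 8 blocks of 4 cells gives T = 4 + 7 = 11, against 32 for the ripple adder; very small or very large blocks are both worse than a balanced split.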

Carry Skip Adder (CSA)
- It is also called a carry-bypass adder.
- It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
- It is typically organized into m-bit stages.
- If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

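Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, propagate signals pi, pi+1, and a skip signal that selects the carry output.]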

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;
else C3 = output carry of the stage-4 adder.

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
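The worst-case carry-skip delay can be tabulated in the same way. The sketch below assumes that P is the number of cells per block and M the number of blocks, and it uses unit delays for K1 and K2; these readings and the function name are assumptions for illustration.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2.

    The carry ripples through the first block, skips the M - 2 middle
    blocks, and ripples through the last block.
    """
    return 2 * (p - 1) * k1 + (m - 2) * k2


if __name__ == "__main__":
    n = 32
    for p in (2, 4, 8, 16):
        m = n // p
        print(p, m, carry_skip_delay(p, m, 1, 1))
```

For a 32-bit adder with unit delays, blocks of 4 cells give T = 6 + 6 = 12, while blocks of 16 cells give T = 30, so the block size is a real design parameter.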

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK
- Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
- Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
- Completion time is proportional to log2 n.
- It is an optimized column adder tree.
- It combines all partial products into two vectors (carry and sum).
- The carry and sum outputs are combined using a conventional adder.
- It compresses the number of stages of partial products.
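The logarithmic growth of the tree depth can be seen by counting reduction levels. The sketch below assumes the tree is built from 3:2 compressors (full adders used as carry-save adders), each turning three rows into two, and stops when only the sum and carry vectors remain; the function name is illustrative.

```python
def wallace_stages(rows):
    """Number of 3:2 reduction levels needed to reach two rows."""
    stages = 0
    while rows > 2:
        # Every full group of three rows becomes two rows;
        # leftover rows pass through to the next level unchanged.
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(n, wallace_stages(n))
```

Doubling the number of partial products from 16 to 32 adds only two levels, whereas a linear array of adders would add sixteen rows of delay.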

Wallace Tree Multiplier Structure

Sequential Logic
- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU
- Complex systems are designed top-down with the help of CAD tools.
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
- Generate and verify each section of the design.
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Topics Architectural issues Switch logic Gate Logic Examples of Structured Design - Combinational logic - Design of an ALU subsystem Consideration of Adders Multipliers Sequential Circuits Semiconductor Memories

Introduction Large systems are composed of sub-systems

known as Leaf-Cells The most basic leaf cell is the common logic

gate (inverter nand etc) Structured Design

High regularity Leaf cells replicated many times and interconnected

to form the system Logical and systematic approach to VLSI

design is essential

Dealing with Complexity

Divide and conquer - limit the number of components you deal with at any one time

Group several components into larger components transistors form gates gates form functional units functional units form processing

elements

A System-on-a-Chip

Courtesy Philips

Major Levels of Design

Specification Description of requirements

Systems Level placing and interconnecting major functional units

Function Level specification and design of major functional units

LogicCircuit Level Gate level design gate interconnection design

Layout Level what will actually be patterned onto the chip how the chip

will be processed Physics Level

the physics of gate and switch operation

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation

Precharge phase: same as dynamic logic.

Evaluate phase (modification to dynamic logic):
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed: the static inverter can be skewed, because only the L-H output transition matters during evaluation
• Input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

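[Figure: two cascaded domino gates. The first stage (inputs In1, In2, In3) drives Out1 through its static inverter; the second stage (inputs In4, In5) drives Out2. Each stage has its own Mp and Me clocked by Clk.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.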
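The way a domino chain resets to 0 and then ripples forward can be sketched behaviourally. This is a minimal illustration under stated assumptions: the figure does not give the PDN functions, so both stages are assumed to be series (AND) networks, with Out1 assumed to feed the second PDN; `domino_stage` and `run_cycle` are hypothetical names.

```python
def domino_stage(clk, pdn_conducts, node):
    """One domino stage: dynamic node plus static inverter. Returns (node, output)."""
    if clk == 0:
        node = 1              # precharge: dynamic node pulled high
    elif pdn_conducts:
        node = 0              # evaluate: conducting PDN discharges the node
    return node, 1 - node     # static inverter drives the stage output

def run_cycle(in1, in2, in3, in4, in5):
    """Run one precharge phase and one evaluate phase over two cascaded stages."""
    n1 = n2 = 1
    # Precharge (Clk = 0): both dynamic nodes go high, so both outputs are 0.
    n1, out1 = domino_stage(0, False, n1)
    n2, out2 = domino_stage(0, False, n2)
    precharge_outputs = (out1, out2)
    # Evaluate (Clk = 1): stage 1 resolves first, then stage 2 sees Out1.
    n1, out1 = domino_stage(1, bool(in1 and in2 and in3), n1)
    n2, out2 = domino_stage(1, bool(out1 and in4 and in5), n2)
    return precharge_outputs, (out1, out2)
```

In this model every stage output is 0 after precharge, so the second stage can only ever see a 0 → 1 change on Out1 during evaluation.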

Clocked CMOS Logic

It is a combination of:
• Static CMOS
• Clk input given to one additional NMOS
• Clk' input given to one additional PMOS

A gate with fan-in n therefore uses 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is ON.

When Clk goes low, the logic is not evaluated, because the clocked NMOS is OFF.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
• More area
• More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).

• Alternate stages of N logic with stages of P logic.
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
• P logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (clk' is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (clk' is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]
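The three stage equations of the example can be checked with a few lines of Boolean code. This sketch only evaluates the logic functions X, G and Z as written on the slide; it does not model clocking or precharge, and the function name `np_example` is hypothetical. Y, F and H are treated as free inputs because the slide does not say where they come from.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three cascaded stage functions for 0/1 inputs."""
    x = 1 - (a & b)            # stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)      # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # stage 3: Z = (F . G + H)'
    return x, g, z
```

For instance, `np_example(1, 1, 1, 1, 0)` gives X = 0, G = 1, Z = 0.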

BiCMOS Technology

• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation (CMOS & BJT): core logic in CMOS, interface in BJT
• Basic circuits: inverter, Nand gate, Nor gate

BiCMOS Circuits

[Figure: BiCMOS inverter and Nand gate.]

Structured Design

Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders

• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
    with butting contact: L = 22λ, W = 10λ
    with buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C; G = A · B
• Propagate: Cout = C; P = A ⊕ B
• Kill: Cout = 0, independent of C; K = ~A · ~B
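The generate, propagate and kill definitions can be checked against ordinary addition with a short sketch. It takes P as A XOR B (the form used on the Manchester chain slide); the helper names `pgk`, `carry_out` and `sum_bit` are hypothetical.

```python
def pgk(a, b):
    """Return (G, P, K) for one bit position with 0/1 inputs."""
    return a & b, a ^ b, (1 - a) & (1 - b)

def carry_out(a, b, c):
    """Cout = G + P.C: generated locally, or propagated from the carry-in."""
    g, p, _ = pgk(a, b)
    return g | (p & c)

def sum_bit(a, b, c):
    """Sum = P xor C."""
    return (a ^ b) ^ c
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a carry chain be built from three mutually exclusive switches per bit.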

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
• The carry input is precharged with the clock signal instead of passing through the logic
• Carry path is gated by the Propagate signal (P = A ⊕ B) with a single n-type pass transistor
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

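Carry-Ripple Adder

• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with operand bits A1 to A4 and B1 to B4, sum bits S1 to S4, internal carries C1 to C3, carry input Cin and carry output Cout.]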
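A cascade of full adders can be sketched in a few lines. This is a behavioural illustration only, with bit lists ordered least-significant bit first; the names `full_adder`, `ripple_add` and `to_bits` are hypothetical.

```python
def to_bits(x, n):
    """Integer to a list of n bits, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))

def ripple_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from Cin to Cout."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry
```

The loop makes the critical path visible: each stage needs the carry of the previous one, so the delay grows linearly with the word length.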

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is represented in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
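Only C1 = G0 + P0·C0 survives in the text of these slides. The sketch below expands the same recurrence, Ci+1 = Gi + Pi·Ci, into the usual flattened form for four bits so that every carry depends only on the G, P and C0 inputs; this expansion is the standard one and is assumed here rather than taken from the slides. The names `cla4` and `to_bits` are hypothetical, and bit lists are least-significant bit first.

```python
def to_bits(x, n):
    """Integer to a list of n bits, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; returns (sum_bits, c4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # Each carry is written directly in terms of G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    return [p[i] ^ carries[i] for i in range(4)], c4
```

The widening product terms show why look-ahead is usually limited to small groups: the gate fan-in grows with every extra bit.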

Carry Select Adder

It is also referred to as the conditional sum adder.

The adder is divided into two blocks:
• One block with logical 0 as Cin
• Another block with logical 1 as Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
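The select-by-multiplexer idea and the delay formula can be illustrated together. The sketch below is behavioural and makes some assumptions of its own: the word is split into equal blocks, `n_bits` is a multiple of `block`, and each block is computed twice (for Cin = 0 and Cin = 1) before the real carry picks one. All function names are hypothetical.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, n_bits=8, block=4, cin=0):
    """Carry-select addition over n_bits // block equal blocks."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n_bits, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)   # block assuming Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)   # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # 2:1 multiplexer
        result |= s << lo
    return result, carry

def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 from the delay slide."""
    return p * k1 + (m - 1) * k2
```

With P = 4 cells per block, M = 2 blocks, K1 = 1.0 and K2 = 0.5 (illustrative units), `carry_select_delay` gives 4.5, against 8.0 for an 8-cell ripple chain with the same cell delay.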

Carry Skip Adder (CSA)

It is also called the carry bypass adder.

• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal.]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0, else C3 = output carry of the stage-4 adder.

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
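The skip condition and the worst-case delay formula can be sketched as follows. This is a behavioural illustration: one stage is modelled as a ripple chain plus a bypass that forwards the carry-in when every bit propagates (Ai ≠ Bi). Bit lists are least-significant bit first, and all names are hypothetical.

```python
def to_bits(x, n):
    """Integer to a list of n bits, least-significant bit first."""
    return [(x >> i) & 1 for i in range(n)]

def skip_stage(a_bits, b_bits, cin):
    """One carry-skip stage; returns (sum_bits, cout, skipped)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    skip = all(a ^ b for a, b in zip(a_bits, b_bits))  # every bit propagates
    cout = cin if skip else carry                      # bypass multiplexer
    return sums, cout, skip

def carry_skip_delay(p, m, k1, k2):
    """T = 2(P - 1)*K1 + (M - 2)*K2 from the delay slide."""
    return 2 * (p - 1) * k1 + (m - 2) * k2
```

The bypass never changes the logical result, since a fully propagating stage would ripple the same carry anyway; it only shortens the path the carry has to take.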

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2(n)
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products

Wallace Tree Multiplier Structure
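The reduction of partial products to a sum vector and a carry vector can be sketched with word-level carry-save adders. This is an illustrative simplification, not the structure on the slide: rows are grouped three at a time per level (a real Wallace tree works column by column), operands are unsigned, and the names `carry_save` and `wallace_multiply` are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compression of three integers into (sum_vector, carry_vector)."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c

def wallace_multiply(a, b, n_bits=8):
    """Multiply unsigned integers; returns (product, reduction_levels)."""
    # One partial-product row per multiplier bit (zero rows are kept).
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # rows left ungrouped
        rows = nxt
        levels += 1
    # The final sum and carry vectors go through a conventional adder.
    return sum(rows), levels
```

For eight rows the count goes 8, 6, 4, 3, 2, so four carry-save levels precede the final adder, compared with seven sequential additions in a simple array.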

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools:
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Chip Architecture - Floor plan

• After high-level design is complete, it is necessary to decide how the design is to be implemented in silicon
• The implementation plan is known as the floor plan
• The first step in laying out a floor plan is the routing of supply and clock rails
• In doing this, sufficient space must be left between power rails to allow for data buses and combinational logic cells
• Decide on the relative positions of the major functional blocks
• Use a routing algorithm (software); the routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?

1) Pass transistors
2) Transmission gates

Pass Transistors

We have assumed the source is grounded.

What if the source > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors

• Require one transistor and one gate signal
• Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd - Vtn
• When switch logic drives gate logic, n-type switches can cause electrical problems:
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  - This is because the pull-down current is weaker when a lower gate voltage is applied
  - The complementary gate's pull-down will not draw current off the output capacitance as fast as it should

PMOS Pass Transistors

• When Vin = Vdd, Vout = Vdd
• When Vin = 0, CL is discharged through the P-transistor until Vout = |Vtp|, at which point the P-device stops conducting
• Logic 0 is therefore somewhat degraded through the p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figure: pass-transistor circuits. An nMOS transistor with its gate at VDD passing VDD gives Vs = VDD - Vtn; a pMOS transistor passing VSS gives Vs = |Vtp|. In the multi-transistor nMOS circuits the labelled node voltages are VDD - Vtn and VDD - 2Vtn.]
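The node voltages labelled in these circuits follow from one rule per device type, which can be sketched as below. This is a simplified illustration that ignores the body effect and treats the thresholds as constants; the function names and the example values (VDD = 5 V, thresholds of 1 V) are assumptions chosen for the arithmetic.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Highest voltage an nMOS pass transistor can deliver to its output."""
    return min(v_in, v_gate - vtn)

def pmos_pass(v_in, v_gate, vtp_abs):
    """Lowest voltage a pMOS pass transistor can deliver to its output."""
    return max(v_in, v_gate + vtp_abs)

VDD, VT = 5.0, 1.0
single = nmos_pass(VDD, VDD, VT)                          # VDD - Vtn
series = nmos_pass(nmos_pass(VDD, VDD, VT), VDD, VT)      # gates at VDD: still VDD - Vtn
cascade = nmos_pass(VDD, nmos_pass(VDD, VDD, VT), VT)     # degraded node drives a gate: VDD - 2Vtn
```

The contrast between `series` and `cascade` is the practical point: chaining pass transistors source to drain costs one threshold drop in total, while using a degraded output to drive the gate of another pass transistor costs one more each time.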

Complementary Pass Transistor Logic

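[Figure: complementary pass-transistor logic. (a) A pass-transistor network and its inverse network, both driven by A, A', B and B', producing F and F'. (b) Example gates: AND/NAND (F = AB and its complement), OR/NOR (F = A + B and its complement) and EXOR/NEXOR (F = A ⊕ B and its complement).]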
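Each complementary pass-transistor gate can be viewed as a pair of two-way selections controlled by B and B'. The sketch below models that view for the three example gates; the particular choice of which signal is passed on each branch is the commonly used arrangement and is an assumption here, since the transistor-level figure did not survive. Function names are hypothetical.

```python
def cpl_and(a, b):
    """AND/NAND: returns (F, F')."""
    f = a if b else b                   # pass A when B = 1, pass B (= 0) when B = 0
    f_bar = (1 - a) if b else (1 - b)   # inverse network with complemented inputs
    return f, f_bar

def cpl_or(a, b):
    """OR/NOR: returns (F, F')."""
    f = b if b else a                   # pass B (= 1) when B = 1, pass A when B = 0
    f_bar = (1 - b) if b else (1 - a)
    return f, f_bar

def cpl_xor(a, b):
    """EXOR/NEXOR: returns (F, F')."""
    f = (1 - a) if b else a             # pass A' when B = 1, pass A when B = 0
    f_bar = a if b else (1 - a)
    return f, f_bar
```

All three gates share the same two-branch topology and differ only in which input or complement is wired to each branch.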

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up impedance to the total pull-down impedance, Zpu : n·Zpd, must be at least 4:1 to give the correct output voltage level.

nMOS Nand gate geometry reveals two factors:
• The area of a Nand gate is greater than the area of an inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
• The delay also increases in direct proportion to the number of inputs added

NMOS Nor Gate

• The pull-down transistors are in parallel in a Nor gate, so the pull-up to pull-down ratio is the same for each pull-down transistor, and the gate has the same characteristics as an inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS Nand gate does not have the restrictions of the NMOS Nand gate, but the geometry must be kept symmetrical by:
  - allowing for the extended fall times caused by the series nMOS transistors (series resistance)
  - keeping the transfer characteristic symmetrical about Vdd/2
• The CMOS Nor gate has series p-transistors, which increase the resistance and the delay
• This affects the transfer characteristics and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic

• Static CMOS logic
• Pseudo NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

Static CMOS Switch Delay Model

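[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2 gates. Each transistor is replaced by a switch with an equivalent on-resistance (Rp for pMOS, Rn for nMOS, Req in general); CL is the load capacitance and Cint the internal node capacitance.]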

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with Rp, Rn, CL and Cint.]
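The three NAND2 delay expressions can be evaluated directly. The sketch below simply codes the first-order formulas from this slide (0.69 is approximately ln 2) and ignores Cint; the function name and the resistance values in the usage note are illustrative assumptions.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds for resistances in ohms and CL in farads."""
    return {
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,   # two pMOS in parallel
        "low_to_high_one_input_low": 0.69 * rp * cl,           # one pMOS conducts
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }
```

For example, with Rp = 10 kΩ, Rn = 5 kΩ and CL = 100 fF, the three delays are 345 ps, 690 ps and 690 ps, so the rising delay differs by a factor of two depending only on how many inputs switch.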

Delay Dependence on Input Patterns

[Figure: output voltage [V] (-0.5 to 3) versus time [ps] (0 to 400) for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

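Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single transistor.

[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), with output F.]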

Pseudo NMOS Operation

• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristics, Vout [V] versus Vin [V] (0 to 2.5 V), for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity

Disadvantages
• Because of the higher pull-up resistance, the delay is greater and hence the circuit speed is lower
• More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.

It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate with inputs In1, In2 and In3 driving the PDN, precharge transistor Mp and evaluation transistor Me clocked by Clk, output Out and load capacitance CL.]

Two-phase operation: precharge (CLK = 0) and evaluate (CLK = 1).

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
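The device counts quoted for the different logic families can be tabulated with a small helper. The static and dynamic counts come from this slide; the pseudo-NMOS count (n pull-downs plus the single p-type load) and the clocked CMOS count (2n + 2) are taken from the corresponding slides elsewhere in this deck. The function name and style keys are hypothetical.

```python
def transistor_count(n, style):
    """Number of transistors for a gate with fan-in n in the given logic style."""
    counts = {
        "static": 2 * n,         # n N-type + n P-type
        "pseudo_nmos": n + 1,    # n N-type + one always-on P-type load
        "dynamic": n + 2,        # n + 1 N-type (PDN and Me) + 1 P-type (Mp)
        "clocked": 2 * n + 2,    # static CMOS plus one clocked NMOS and one clocked PMOS
    }
    return counts[style]
```

For a 4-input gate this gives 8, 5, 6 and 10 transistors respectively, which is the area argument behind the ratioed and dynamic styles.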

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pull-down transistor controlled by the clock keeps that precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Dealing with Complexity

Divide and conquer - limit the number of components you deal with at any one time

Group several components into larger components transistors form gates gates form functional units functional units form processing

elements

A System-on-a-Chip

Courtesy Philips

Major Levels of Design

Specification Description of requirements

Systems Level placing and interconnecting major functional units

Function Level specification and design of major functional units

LogicCircuit Level Gate level design gate interconnection design

Layout Level what will actually be patterned onto the chip how the chip

will be processed Physics Level

the physics of gate and switch operation

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate
The ratio of the pull-up to the sum of all pull-down transistors, Zpu / (n Zpd), must be at least 4:1 to produce a correct output voltage level.
nMOS Nand gate geometry reveals two factors:
  The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in length of the pull-up transistor.
  Delay also increases in direct proportion to the number of inputs added.
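As a quick illustration of the ratio rule, the minimum pull-up impedance can be tabulated against the number of inputs. This is a sketch only; the function name and the unit pull-down impedance are assumptions made for the example.

```python
def min_pullup_impedance(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu satisfying Zpu / (n * Zpd) >= ratio for an n-input nMOS NAND."""
    return ratio * n_inputs * z_pd


# With unit pull-down impedance, the pull-up must grow linearly with fan-in.
for n in (1, 2, 3, 4):
    print(f"{n}-input gate: Zpu >= {min_pullup_impedance(n):.0f} x Zpd")
```

The linear growth of Zpu is what lengthens the pull-up transistor and slows the gate as inputs are added.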

NMOS Nor Gate
The pull-down transistors are in parallel in a Nor gate, so the pull-up to pull-down ratio is the same for every input.
The gate therefore has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the Nor gate is preferable to the Nand gate.

CMOS Logic

Properties of CMOS Gates
High noise margins: VOH and VOL are at VDD and GND respectively.
No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics
The CMOS Nand gate does not have the ratio restriction of the nMOS Nand gate, but its geometry should still be kept symmetrical by:
  allowing for the extended fall time caused by the series resistance of the nMOS transistors;
  keeping the switching point of the transfer characteristic at Vdd/2.
The CMOS Nor gate has series p-transistors, which increase the resistance and the delay.
This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted.

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

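Static CMOS Switch Delay Model
[Figure: switch-level RC models of the inverter (INV), NAND2 and NOR2. Each transistor is drawn as a switch controlled by A or B with on-resistance Rn (nMOS) or Rp (pMOS); the gate drives load capacitance CL, series stacks have an internal node capacitance Cint, and the conducting path is summarised by an equivalent resistance Req.]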

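Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs.
Low to high transition:
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL
High to low transition:
  both inputs go high: delay is 0.69 (2 Rn) CL
[Figure: NAND2 switch model with pull-up resistances Rp (inputs A and B) in parallel, pull-down resistances Rn (inputs A and B) in series, load CL and internal node capacitance Cint]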
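The three delay expressions can be evaluated directly. The sketch below ignores the internal node capacitance Cint, and the resistance and capacitance values are assumed purely for illustration.

```python
def nand2_delays(rn, rp, cl):
    """First-order NAND2 delays (seconds) from t = 0.69 * R * C, using the
    equivalent resistance of the conducting path in the switch model."""
    return {
        "LH, both inputs fall": 0.69 * (rp / 2) * cl,   # two pMOS in parallel
        "LH, one input falls": 0.69 * rp * cl,          # a single pMOS conducts
        "HL, both inputs rise": 0.69 * (2 * rn) * cl,   # two nMOS in series
    }


# Assumed illustrative values, not taken from the slides.
RN, RP, CL = 10e3, 20e3, 10e-15   # ohms, ohms, farads
for pattern, t in nand2_delays(RN, RP, CL).items():
    print(f"{pattern}: {t * 1e12:.1f} ps")
```

Measured delays generally deviate from these first-order values because of Cint and the exact switching order of the inputs.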

Delay Dependence on Input Patterns
[Figure: simulated output Voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1]

Input Data Pattern     Delay (psec)
A=B=0->1               67
A=1, B=0->1            64
A=0->1, B=1            61
A=B=1->0               45
A=1, B=1->0            80
A=1->0, B=1            81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

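Pseudo NMOS Logic
Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to 1.
[Figure: static CMOS gate: inputs In1, In2, ..., InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output F is taken between the two networks]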

Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.
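Because both devices conduct when the output is low, the output level is set by a ratio. A crude way to see this is to treat the always-on pMOS and the conducting pull-down path as plain resistors. This is a sketch under that assumption; real devices are non-linear, and all values below are illustrative.

```python
def pseudo_nmos_low(vdd, r_pullup, r_pulldown):
    """Output-low level, static current and static power when the pull-down
    network conducts, using a simple resistive-divider approximation."""
    v_ol = vdd * r_pulldown / (r_pullup + r_pulldown)
    i_static = vdd / (r_pullup + r_pulldown)
    return v_ol, i_static, vdd * i_static


VDD = 2.5     # assumed supply (volts)
R_PD = 5e3    # assumed pull-down path resistance (ohms)
for r_pu in (10e3, 20e3, 40e3):   # assumed pMOS load resistances (ohms)
    v_ol, i, p = pseudo_nmos_low(VDD, r_pu, R_PD)
    print(f"Rpu = {r_pu:.0f} ohm: VOL = {v_ol:.3f} V, "
          f"I = {i * 1e6:.1f} uA, P = {p * 1e6:.1f} uW")
```

A weaker (more resistive) pull-up gives a lower VOL and less static power, at the cost of a slower rising output.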

Pseudo NMOS Characteristics

Pseudo NMOS VTC
[Figure: voltage transfer characteristic, Vout [V] versus Vin [V] (0 to 2.5 V), for pMOS load sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages
Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.
Disadvantages
The higher pull-up resistance increases delay, so circuits are slower.
Static power is dissipated through the conduction path between VDD and VSS whenever the output is low.

Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation by adding a clock input that creates a precharge phase and a conditional evaluation phase.
[Figure: dynamic gate: a clocked pMOS precharge transistor Mp between VDD and Out, the PDN driven by In1, In2, In3, a clocked nMOS evaluation transistor Me between the PDN and ground, and load capacitance CL on Out]
Two phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)

Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path.
  A fan-in of n requires 2n (n N-type + n P-type) devices.
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes.
  A fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors.
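The device counts quoted for the two styles can be compared for a few fan-in values. A minimal sketch; the function names are chosen for this example only.

```python
def static_cmos_devices(n):
    """n nMOS in the pull-down network plus n pMOS in the pull-up network."""
    return 2 * n


def dynamic_devices(n):
    """n nMOS in the PDN, one clocked nMOS (Me) and one clocked pMOS (Mp)."""
    return n + 2


for n in (2, 4, 8):
    print(f"fan-in {n}: static = {static_cmos_devices(n)}, "
          f"dynamic = {dynamic_devices(n)}")
```

The saving only appears for fan-in above 2; for a 2-input gate both styles need four transistors.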

Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pulldown transistor controlled by the clock keeps the precharge node from being drained.
The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.
Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create a conducting path, the gate's output is left charged at logic 1.
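The precharge and evaluate behaviour, including the effect of an input glitch, can be mimicked with a small behavioural model. This is an illustrative sketch only: the class and its two-input pull-down function are assumptions, and charge leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn   # function: inputs -> True when the PDN conducts
        self.out = 1     # value held on the output capacitance

    def step(self, clk, *inputs):
        if clk == 0:
            self.out = 1              # precharge through the pMOS
        elif self.pdn(*inputs):
            self.out = 0              # evaluate: conducting path discharges the node
        # otherwise the node floats and keeps its stored value
        return self.out


gate = DynamicGate(lambda a, b: bool(a and b))   # series nMOS pair in the PDN

# Clean evaluation: no conducting path, so the output stays at 1.
gate.step(0, 0, 0)
clean = gate.step(1, 1, 0)

# Input b glitches 0 -> 1 -> 0 while a = 1: the charge is lost for this cycle.
gate.step(0, 0, 0)
gate.step(1, 1, 1)
glitched = gate.step(1, 1, 0)

print(clean, glitched)
```

Both runs end with the same input values, yet the outputs differ, which is why inputs are restricted to a single rising transition during evaluation.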

Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high impedance state during and after evaluation (PDN off); the state is stored on CL.
[Figure: example dynamic gate with inputs A, B and C. Precharge: Mp on, Me off, Out = 1. Evaluate: Mp off, Me on, Out = ((AB)+C)'.]

Properties of Dynamic Gates
Logic function is implemented by the PDN only.
Full swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS:
  no glitching, but higher transition probabilities and extra load on Clk.
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn:
  low noise margin (NML).

Advantages and Disadvantages
Advantages
  Low area.
  Higher speed than static complementary gates.
Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases.
  Requires introduction of a clock signal.
  They are also more sensitive to noise.
  Their clocking signals also consume power and are difficult to turn off to save power.

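Cascading Dynamic Gates
[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk. In drives the first gate, and Out1 drives the second gate, which produces Out2. Waveforms of Clk, In, Out1 and Out2 versus time t show Out2 losing charge (ΔV) while Out1 is still above VTn.]
Only 0 -> 1 transitions allowed at inputs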

Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.

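Domino CMOS Logic
[Figure: domino gate: a dynamic stage (Mp, PDN with inputs In1, In2, In3, and Me, clocked by Clk) followed by a static inverter that produces Out1. The dynamic node is annotated with the transitions 1 -> 1 and 1 -> 0.]
A domino logic gate is the combination of a dynamic logic stage and a static inverter.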

Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed:
  the static inverter can be skewed, since only the L-H transition matters;
  input capacitance is reduced, giving a smaller logical effort.

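Cascading Domino Logic Gates
[Figure: two cascaded domino gates. The first (PDN inputs In1, In2, In3) drives Out1 through its static inverter; the second (PDN inputs In4, In5) drives Out2. Dynamic nodes are annotated with the transitions 1 -> 1 and 1 -> 0, inverter outputs with 0 -> 0 and 0 -> 1.]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.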
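The way a rising edge ripples through cascaded domino stages can be sketched behaviourally. The chain below is a hypothetical example: each stage's pull-down network is assumed to be a single nMOS driven by the previous stage's output, and timing is reduced to a simple stage-by-stage update.

```python
def domino_stage(clk, pdn_conducts, node):
    """Return (dynamic_node, output) for one domino stage."""
    if clk == 0:
        node = 1                # precharge the dynamic node
    elif pdn_conducts:
        node = 0                # evaluate: discharge through the PDN
    return node, 1 - node       # the static inverter drives the output


def run_chain(first_input, n_stages=3):
    """Precharge, then evaluate, a chain of single-input domino stages."""
    nodes = [1] * n_stages
    outs = [0] * n_stages
    for i in range(n_stages):                   # precharge phase (CLK = 0)
        nodes[i], outs[i] = domino_stage(0, False, nodes[i])
    after_precharge = list(outs)
    drive = first_input
    for i in range(n_stages):                   # evaluate phase (CLK = 1)
        nodes[i], outs[i] = domino_stage(1, bool(drive), nodes[i])
        drive = outs[i]
    return after_precharge, outs


print(run_chain(1))
print(run_chain(0))
```

Every output is 0 after precharge, so a downstream stage can only ever see a 0 -> 1 transition, falling like a row of dominoes.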

Clocked CMOS Logic
It is a combination of:
  static CMOS logic;
  a Clk input applied to one additional nMOS transistor;
  a Clk' input applied to one additional pMOS transistor.
A fan-in of n requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS is on.
When Clk goes low, the logic is not evaluated, because the clocked nMOS is off.

Advantages and Disadvantages
Advantages: clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices.
Disadvantages: more area; more complexity.

N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree logically evaluates to ground, while the P-logic tree logically evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic
Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded stages with outputs X, G and Z]
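The three stage equations can be checked at the logic level with a few lines of code. This is only a Boolean sketch: it models the evaluated values and says nothing about precharge timing. The function name is chosen for this example.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three cascaded stages; all signals are 0 or 1."""
    x = 1 - (a & b)              # stage 1: X = (A.B)'
    g = (1 - x) | (1 - y)        # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)        # stage 3: Z = (F.G + H)'
    return x, g, z


# A = B = 1 discharges X, which turns on the P stage and raises G.
print(np_example(1, 1, 1, 1, 0))
# A = B = 0 leaves X at its precharged 1, so G stays at its precharged 0.
print(np_example(0, 0, 1, 1, 0))
```

The second case shows why the stages can be cascaded directly: the precharged value of X (1) keeps the P stage off, and the precharged value of G (0) keeps the following N stage off.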

BiCMOS Technology
CMOS properties: low power, slower.
BJT properties: faster, high power.
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: Inverter, Nand Gate, Nor Gate

BiCMOS Circuits

Inverter, Nand Gate

Structured Design

The digital block or standard cell is designed at the following levels:
  Logic level
  Circuit level
  Stick diagram
  Layout

Examples: Combinational Logic

5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain Adder
Enhancement techniques:
  Carry Select Adders
  Carry Skip Adders
  Carry Look-ahead Adders
32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder
The adder is made up of the following standard cells:
Multiplexers - L=11λ, W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact - L=22λ, W=10λ
   Buried contact - L=30λ, W=10λ
ii) CMOS inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal
For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A · ~B
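The three definitions can be checked against ordinary binary addition. A minimal sketch; the function names are chosen for this example.

```python
from itertools import product


def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def full_adder(a, b, c):
    """Sum and carry-out expressed through generate and propagate."""
    g, p, _ = pgk(a, b)
    return p ^ c, g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    g, p, k = pgk(a, b)
    s, cout = full_adder(a, b, c)
    print(a, b, c, "G P K =", g, p, k, " S Cout =", s, cout)
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a carry chain be built from three mutually exclusive switches per bit.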

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
The carry input is precharged by the clock signal instead of passing through the logic.
The carry path is gated by the Propagate signal (P = A ^ B) with a single n-type pass transistor.
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

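Carry-Ripple Adder

Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder: four full adders with inputs A1..A4 and B1..B4, sums S1..S4, carry input Cin, internal carries C1, C2, C3 and carry output Cout]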
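The cascade of full adders can be written out as a short behavioural model. This is a sketch for illustration, with bit lists stored least significant bit first; the helper names are assumptions of the example.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each stage waits for the previous carry
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), cout)
```

The loop makes the critical path explicit: the carry passes through every stage in turn, so delay grows linearly with word length.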

Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries are generated in parallel.
The output carry is expressed in terms of generate and propagate signals.
The expressions for a 4-bit CLA are:
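The standard expansion of the recurrence Ci+1 = Gi + Pi·Ci for four bits can be written out and checked against integer addition. This is a sketch of the textbook form, not necessarily the exact layout of the original slide; the function name is an assumption.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; a and b are integers in 0..15."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # Gi = Ai.Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # Pi = Ai xor Bi
    # Every carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))   # Si = Pi xor Ci
    return total, c4


print(cla4(11, 6))
```

The product terms grow with bit position, which is why look-ahead is usually applied in groups of about four bits rather than across a whole word.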

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks: one block computes with Cin = 0, the other with Cin = 1.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
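The selection idea can be sketched behaviourally: every block is added twice, once for each possible carry-in, and the real carry picks one result. The block width, word width and function names below are assumptions for illustration, and the word width is assumed to be a multiple of the block width.

```python
def add_block(a, b, cin, width):
    """Plain addition of one block: returns (sum bits, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition of two unsigned integers of the given width."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)     # precomputed for Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)     # precomputed for Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)     # 2:1 multiplexers
        result |= s << lo
    return result, carry


print(carry_select_add(200, 100))
```

In hardware both block results are ready before the carry arrives, so each block adds only a multiplexer delay to the carry path.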

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2
where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
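The formula can be evaluated for different block sizes of a fixed word length. A sketch only; the 32-bit word and the K1 and K2 values are assumed, in arbitrary delay units.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


N_BITS = 32          # assumed word length
K1, K2 = 1.0, 0.5    # assumed cell and multiplexer delays (arbitrary units)

for p in (2, 4, 8, 16):
    m = N_BITS // p
    print(f"P = {p:2d}, M = {m:2d}: T = {carry_select_delay(p, m, K1, K2):.1f}")
```

Small blocks pay for many multiplexers and large blocks pay for long ripple chains, so the minimum lies at an intermediate block size.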

Carry Skip Adder (CSA)

It is also called a Carry Bypass Adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then a bypass gate sends the stage's carry input directly to the carry output.

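Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into a skip signal]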

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
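The bypass condition can be sketched for one 4-bit block. This is a behavioural illustration with bit lists stored least significant bit first; the function and helper names are assumptions of the example.

```python
def skip_block(a_bits, b_bits, c_in):
    """One carry-skip block: returns (sum bits, carry out, skip flag)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]    # per-bit propagate
    carry = c_in
    sums = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)             # ripple inside the block
    skip = all(p)                                  # P0 and P1 and P2 and P3
    c_out = c_in if skip else carry                # bypass multiplexer
    return sums, c_out, skip


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


print(skip_block(to_bits(0b1010, 4), to_bits(0b0101, 4), 1))
print(skip_block(to_bits(0b1100, 4), to_bits(0b0100, 4), 0))
```

When every bit propagates, the rippled carry would equal the carry-in anyway; the bypass simply delivers it without waiting for the ripple.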

Delay of CSA

Worst case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2
where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
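The worst-case formula can be compared with a plain ripple adder of the same width. This is a sketch under stated assumptions: P and M are taken to mean cells per block and number of blocks, the ripple delay is taken as n·K1, and the delay values are arbitrary.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """Carry ripples through every cell of the word."""
    return n_bits * k1


K1, K2 = 1.0, 0.5   # assumed cell and skip delays (arbitrary units)
P = 4               # assumed cells per block
for n in (16, 32, 64):
    m = n // P
    print(f"{n} bits: ripple = {ripple_delay(n, K1):.1f}, "
          f"skip = {carry_skip_delay(P, m, K1, K2):.1f}")
```

The ripple term in the skip adder covers only the first and last blocks, so the advantage widens as the word gets longer.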

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel Multiplier
Braun Array
2's complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of multiplication
Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
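The reduction of partial products to two vectors can be sketched with carry-save (3:2) compression applied to whole rows. This is a word-level illustration, not a gate-level Wallace tree; the function names and the 8-bit default width are assumptions of the example.

```python
def carry_save(x, y, z):
    """3:2 compressor applied bitwise: x + y + z == sum vector + carry vector."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n=8):
    """Unsigned multiply of two n-bit values; returns (product, reduction levels)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]   # partial products
    while len(rows) < 2:
        rows.append(0)
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):                     # compress rows in threes
            nxt.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[len(rows) - len(rows) % 3:])             # leftover rows pass through
        rows = nxt
        levels += 1
    return rows[0] + rows[1], levels                             # final conventional adder


print(wallace_multiply(13, 11))
```

No carry propagates inside the reduction levels; the only carry-propagating addition is the final one on the two remaining vectors.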

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

A System-on-a-Chip

Courtesy Philips

Major Levels of Design

Specification Description of requirements

Systems Level placing and interconnecting major functional units

Function Level specification and design of major functional units

LogicCircuit Level Gate level design gate interconnection design

Layout Level what will actually be patterned onto the chip how the chip

will be processed Physics Level

the physics of gate and switch operation

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

• It is also called a Carry Bypass Adder
• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

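(figure: two full adders FA with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 that form the skip signal)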

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0

else C3 = output carry of the stage-4 adder

Delay of CSA

Worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
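A short Python sketch of the bypass decision and of the delay formula above. It assumes that P is the number of cells per block and M the number of blocks; the function names and the numeric values in the usage note are hypothetical.

```python
def ripple_delay(n_bits: int, k1: float) -> float:
    """Carry-ripple delay: the carry passes through every cell."""
    return n_bits * k1


def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """Worst case T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def block_carry_out(a_bits: list, b_bits: list, c_in: int) -> int:
    """Carry out of one block, using the bypass when every bit propagates."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    if all(propagate):
        return c_in  # skip path: carry-in goes straight to carry-out
    carry = c_in
    for a, b in zip(a_bits, b_bits):  # otherwise ripple through the block
        carry = (a & b) | ((a ^ b) & carry)
    return carry
```

For a 16-bit adder built from M = 4 blocks of P = 4 cells with K1 = 1.0 and K2 = 0.5, carry_skip_delay gives 7.0 units against 16.0 for ripple_delay.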

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
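The shift-and-add structure can be sketched in a few lines of Python for unsigned operands. The function name and the 4-bit default width are assumptions for illustration.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n_bits: int = 4) -> int:
    """Unsigned multiplication by adding shifted copies of the multiplicand."""
    product = 0
    for i in range(n_bits):
        if (multiplier >> i) & 1:          # examine multiplier bit i
            product += multiplicand << i   # add the partial product, shifted i places
    return product
```

Each 1 in the multiplier contributes one shifted copy of the multiplicand, so an n-bit multiplier needs at most n additions.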

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier
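Since the procedure on these slides survives only as a figure title, the following Python sketch assumes the common radix-4 form of the modified Booth algorithm. Overlapping groups of three multiplier bits are recoded into digits from {-2, -1, 0, +1, +2}, which halves the number of partial products. The function names are hypothetical and n_bits is assumed to be even.

```python
def booth_radix4_digits(multiplier: int, n_bits: int) -> list:
    """Recode an n_bits two's-complement multiplier, least significant digit first."""
    y = multiplier & ((1 << n_bits) - 1)            # two's-complement bit pattern
    bits = [0] + [(y >> i) & 1 for i in range(n_bits)]  # bits[0] is the implied y(-1) = 0
    digits = []
    for i in range(0, n_bits, 2):
        y_prev, y_low, y_high = bits[i], bits[i + 1], bits[i + 2]
        digits.append(y_prev + y_low - 2 * y_high)  # digit value in -2..+2
    return digits


def booth_multiply(multiplicand: int, multiplier: int, n_bits: int = 8) -> int:
    """Signed product formed from n_bits/2 recoded partial products."""
    digits = booth_radix4_digits(multiplier, n_bits)
    return sum(d * multiplicand * 4 ** k for k, d in enumerate(digits))
```

For example, the 4-bit multiplier 6 (0110) is recoded as the digits [-2, +2], i.e. -2 + 2·4 = 6.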

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2(n)
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products

Wallace Tree Multiplier Structure
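The reduction to two vectors can be sketched in Python using carry-save (3:2) addition on whole partial-product rows. This is a simplified row-level model, not the per-column wiring of a real Wallace tree; the function names are assumptions.

```python
def carry_save(a: int, b: int, c: int) -> tuple:
    """3:2 compressor applied bitwise: three rows in, (sum, carry) rows out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_multiply(x: int, y: int, n_bits: int = 8) -> tuple:
    """Unsigned product via carry-save reduction; returns (product, levels)."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):      # compress rows three at a time
            reduced.extend(carry_save(*rows[i:i + 3]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = reduced
        levels += 1
    return sum(rows), levels  # final conventional adder combines the two vectors
```

Eight partial products shrink as 8, 6, 4, 3, 2 rows, i.e. four compressor levels before the final adder, which illustrates the roughly logarithmic depth.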

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM, ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

• Complex systems are designed using a top-down approach with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
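A behavioural Python sketch of a 4x4 barrel shifter modelled as a crossbar of pass-transistor switches with one-hot shift control lines. The rotation direction, the wrap-around behaviour and the names (barrel_shift, shift_lines) are assumptions, since the circuit itself is only available as a figure.

```python
def barrel_shift(in_bits: list, shift_lines: list) -> list:
    """Rotate in_bits by the amount selected on the one-hot shift_lines."""
    if sum(shift_lines) != 1:
        raise ValueError("exactly one shift line must be active")
    n = len(in_bits)
    out = [0] * n
    for s, active in enumerate(shift_lines):
        if active:
            for j in range(n):
                # the closed switch connects input (j + s) mod n to output j
                out[j] = in_bits[(j + s) % n]
    return out
```

With shift line 0 active the data passes straight through; with shift line 1 active, barrel_shift([1, 0, 0, 0], [0, 1, 0, 0]) returns [0, 0, 0, 1].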

Routing Power rails for ALU


Chip Architecture - Floor plan

• After high-level design is complete, it is necessary to decide how the design is to be implemented in silicon
• The implementation plan is known as the floor plan
• The first step in laying out a floor plan is the routing of supply and clock rails
• In doing this, sufficient space must be left between power rails to allow for data buses and combinational logic cells
• Decide on the relative positions of major functional blocks
• Use a routing algorithm (software)
• The routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan


Switch Logic

How do we build switches from MOS transistors?

1) Pass transistors 2) Transmission gates

Pass Transistors

We have assumed the source is grounded

What if the source is > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors

• Require one transistor and one gate signal
• Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd - Vtn
• When switch logic drives gate logic, n-type switches can cause electrical problems:
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  - This is because the pull-down current is weaker when a lower gate voltage is applied
  - The complementary gate's pull-down will not draw current off the output capacitance as fast as it should

PMOS Pass Transistors

• When Vin = Vdd, then Vout = Vdd
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
• Logic 0 is somewhat degraded through the p-device
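The threshold drops described above can be sketched as a first-order Python model. It ignores body effect and assumes the transistor is switched on; the function names and the 5 V / 1 V values in the usage note are hypothetical.

```python
def nmos_pass(v_in: float, v_gate: float, vtn: float) -> float:
    """Output of an nMOS pass transistor: it cannot pull above v_gate - vtn."""
    return max(0.0, min(v_in, v_gate - vtn))


def pmos_pass(v_in: float, v_gate: float, vtp_abs: float) -> float:
    """Output of an on pMOS pass transistor: it cannot pull below v_gate + |Vtp|."""
    return max(v_in, v_gate + vtp_abs)
```

With VDD = 5 V and Vtn = 1 V, nmos_pass(5.0, 5.0, 1.0) gives 4 V. Feeding that 4 V through a second series transistor with its gate at VDD still gives 4 V, but using it as the gate voltage of another pass transistor gives nmos_pass(5.0, 4.0, 1.0) = 3 V, i.e. VDD - 2Vtn.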

Voltage degradation of Pass Transistors

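Pass Transistor Ckts

(figures: an nMOS pass transistor with its gate at VDD passing VDD gives Vs = VDD - Vtn; a pMOS pass transistor passing VSS gives Vs = |Vtp|; nMOS pass transistors in series with all gates at VDD give VDD - Vtn at each node; an nMOS pass transistor whose gate is driven by a degraded VDD - Vtn level gives VDD - 2Vtn)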

Complementary Pass Transistor Logic

(figure (a): a pass-transistor network driven by A, A', B, B' produces F, and an inverse pass-transistor network driven by the same inputs produces F')

(figure (b): AND/NAND, OR/NOR and EXOR/NEXOR gates giving F = AB, F = A + B and F = A ⊕ B together with their complements)

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to give the correct output voltage level

The nMOS NAND gate geometry reveals two factors:

• The area of a NAND gate is greater than that of an inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
• Delay also increases in direct proportion to the number of inputs added

NMOS Nor Gate

• The pull-down transistors are in parallel in a NOR gate, i.e. the pull-down ratio is the same for all transistors
• So it has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the NOR gate is preferable to the NAND gate

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times
  (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS NAND gate has no ratio restrictions like the nMOS NAND gate, but the geometry should be kept symmetric by:
  - allowing for the extended fall times of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on Vdd/2
• The CMOS NOR gate has series p-transistors, which increase the resistance and delay
• This affects the transfer characteristic and reduces noise immunity
• So the geometries of the nMOS and pMOS transistors should be adjusted

CMOS Logic

• Static CMOS logic
• Pseudo NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

Static CMOS Switch Delay Model

(figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with transistor on-resistances Rn and Rp (Req for the inverter), load capacitance CL and internal node capacitance Cint)

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL

(figure: NAND2 switch model with Rp, Rn, CL and Cint)
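The three cases above can be evaluated with a small Python sketch of the first-order RC model for a two-input NAND gate. The function name, dictionary keys and the resistance and capacitance values in the usage note are hypothetical, and the internal capacitance Cint is ignored.

```python
def nand2_delays(rn: float, rp: float, cl: float) -> dict:
    """First-order 0.69*R*C delays of a NAND2 for different input patterns."""
    return {
        # two parallel pMOS devices conduct: effective resistance Rp/2
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # only one pMOS conducts: effective resistance Rp
        "low_to_high_one_input_falls": 0.69 * rp * cl,
        # two series nMOS devices conduct: effective resistance 2*Rn
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }
```

With Rn = Rp = 10 kΩ and CL = 100 fF this gives about 345 ps, 690 ps and 1380 ps respectively, so the slowest low-to-high case is twice the fastest one.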

Delay Dependence on Input Patterns

(figure: output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; A = 1→0, B = 1)

Input data pattern     Delay (psec)
A = B = 0→1            67
A = 1, B = 0→1         64
A = 0→1, B = 1         61
A = B = 1→0            45
A = 1, B = 1→0         80
A = 1→0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

Pseudo-NMOS logic reduces the pull-up network of static CMOS logic from N transistors to 1

(figure: static CMOS gate with inputs In1..InN driving both the pull-up network (PUN) and the pull-down network (PDN), output F)

Pseudo NMOS Operation

• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and both act to determine the gate's output voltage
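A rough Python sketch of the ratioed behaviour, treating the always-on p-type pull-up and the conducting pull-down network as linear resistors. Real devices are non-linear, so this is an assumption made only to show the trend; the function name and the values in the usage note are hypothetical.

```python
def pseudo_nmos_low_output(vdd: float, r_pullup: float, r_pulldown: float) -> tuple:
    """Return (V_OL, static power) when the pull-down network conducts."""
    # Both networks conduct at once, so the output is a resistive divider.
    v_ol = vdd * r_pulldown / (r_pullup + r_pulldown)
    # Current flows from VDD to VSS for as long as the output is low.
    p_static = vdd ** 2 / (r_pullup + r_pulldown)
    return v_ol, p_static
```

With VDD = 2.5 V, a 40 kΩ pull-up and a 10 kΩ pull-down, the low output sits at 0.5 V instead of 0 V and the gate draws 125 µW of static power. A weaker pull-up lowers V_OL but slows the rising edge.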

Pseudo NMOS Characteristics

Pseudo NMOS VTC

(figure: Vout [V] versus Vin [V] for W/Lp = 4, 2, 1, 0.5 and 0.25)

Advantages and Disadvantages

Advantages

• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity

Disadvantages

• The higher pull-up resistance increases delay, so circuits are slower
• More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
• It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases

(figure: dynamic gate with a clocked p-type precharge transistor Mp, a pull-down network PDN driven by In1..In3, a clocked n-type evaluation transistor Me, and output Out loaded by CL)

Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - requires only n + 2 (n + 1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pull-down transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create such a path, the gate's output is left charged at logic 1
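The two phases can be mimicked with a small behavioural Python model. It is a logic-level sketch only (no charge sharing or leakage), and the class name DynamicGate and its method names are assumptions.

```python
class DynamicGate:
    """Logic-level model of a precharged gate with an n-type pull-down network."""

    def __init__(self, pdn):
        self.pdn = pdn  # function of the inputs: truthy when the PDN conducts
        self.out = 1

    def precharge(self) -> None:
        """CLK = 0: the p-type transistor charges the output node to 1."""
        self.out = 1

    def evaluate(self, *inputs) -> int:
        """CLK = 1: a conducting PDN discharges the node; it cannot recharge."""
        if self.pdn(*inputs):
            self.out = 0
        return self.out
```

For a gate with PDN (A·B) + C, an input pattern that conducts even briefly leaves the output at 0 until the next precharge, which is why inputs must not glitch from 0 to 1 and back during evaluation:

```python
gate = DynamicGate(lambda a, b, c: (a and b) or c)
gate.precharge()
gate.evaluate(0, 0, 1)   # PDN conducts, output discharged to 0
gate.evaluate(0, 0, 0)   # input returned to 0, output stays 0
```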

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

(figure: example dynamic gate computing Out = ((A·B)+C)'; during precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on)

Properties of Dynamic Gates

• Logic function is implemented by the PDN only
• Full-swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds, due to reduced load capacitance
• Overall power dissipation is usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases
• Require the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

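(figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk; In drives the first stage and Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 versus time show Out2 losing voltage while Out1 is still above VTn)

Only 0 → 1 transitions are allowed at inputs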

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation
• However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

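(figure: dynamic gate with Mp, a PDN driven by In1..In3 and Me, followed by a static inverter that produces Out1; the dynamic node either stays at 1 or goes 1 → 0)

Domino logic is the combination of a dynamic logic gate and a static inverter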

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic

• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  - the static inverter can be skewed, since only the L-H transition is critical
  - input capacitance is reduced: smaller logical effort

Cascading Domino Logic Gates

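(figure: two cascaded domino gates; the first has a PDN driven by In1..In3 and produces Out1 through its inverter, which drives the second gate together with In4 and In5 to produce Out2. Dynamic nodes go 1 → 1 or 1 → 0; inverter outputs go 0 → 0 or 0 → 1)

• The inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 → 1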
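A behavioural Python sketch of two cascaded domino gates, showing why the inverter makes cascading safe: after precharge every gate output is 0, so downstream inputs can only rise during evaluation. The class and function names and the AND/OR example functions are assumptions for illustration.

```python
class DominoGate:
    """Dynamic node followed by a static inverter (logic-level model)."""

    def __init__(self, pdn):
        self.pdn = pdn   # truthy when the pull-down network conducts
        self.node = 1    # precharged dynamic node

    @property
    def out(self) -> int:
        return 1 - self.node  # static inverter

    def precharge(self) -> None:
        self.node = 1         # output falls to 0

    def evaluate(self, *inputs) -> int:
        if self.pdn(*inputs):
            self.node = 0     # node can only go 1 -> 0, output only 0 -> 1
        return self.out


def run_cycle(in1: int, in2: int, in3: int) -> tuple:
    """One precharge/evaluate cycle of Out1 = In1·In2, Out2 = Out1 + In3."""
    g1 = DominoGate(lambda a, b: a and b)
    g2 = DominoGate(lambda x, c: x or c)
    for gate in (g1, g2):
        gate.precharge()
    out1 = g1.evaluate(in1, in2)
    out2 = g2.evaluate(out1, in3)  # sees Out1 only after it has settled
    return out1, out2
```

Because each stage is non-inverting, the chain evaluates like falling dominoes: a 0 → 1 transition at one output can only trigger further 0 → 1 transitions downstream.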

Clocked CMOS Logic

• It is a static CMOS gate combined with two clocked transistors
  - Clk input is given to one additional NMOS
  - Clk' input is given to another additional PMOS
• A gate with fan-in n therefore needs 2n + 2 transistors
• When Clk goes high, the logic of the circuit is evaluated, because the NMOS is on
• When Clk goes low, the logic is not evaluated, because the NMOS is off

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
• More area
• More complexity

N-P CMOS Logic

• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
• Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:

• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

(figure: three cascaded N-P stages with outputs X, G and Z)

BiCMOS Technology

• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
  - core logic: CMOS
  - interface: BJT
• Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

(figures: BiCMOS inverter and NAND gate)

Structured Design

Design the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders

• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates - Problem
Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

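[Figure: domino gate. A dynamic stage (inputs In1, In2, In3 into the PDN, precharge transistor Mp, evaluate transistor Me, clocked by Clk) is followed by a static inverter that drives Out1.]
A domino gate is the combination of a dynamic logic stage and a static inverter.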

Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, because only its low-to-high output transition matters.
Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

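[Figure: two cascaded domino gates. The first (In1, In2, In3, PDN, Mp, Me, Clk) drives Out1 through its inverter; Out1 together with In4 and In5 feeds the PDN of the second gate, which drives Out2. Node transitions are annotated as 1 → 1 or 1 → 0 on the dynamic nodes and 0 → 0 or 0 → 1 on the outputs.]
The inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 → 1.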
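To make the cascading rule concrete, here is a minimal behavioural sketch of two domino stages in series. It assumes ideal switches and that the second stage sees the settled output of the first; the function names and the chosen logic (out1 = a AND b, out2 = out1 AND c) are illustrative assumptions, not taken from the slides.

```python
def domino_stage(clk, node, pdn_conducts):
    """Return (dynamic node value, output after the static inverter)."""
    if clk == 0:
        node = 1                 # precharge the dynamic node
    elif pdn_conducts:
        node = 0                 # evaluate: the PDN discharges the node
    return node, 1 - node        # the inverter makes the output non-inverting


def domino_chain(a, b, c):
    """Two cascaded AND-type stages: out1 = a AND b, out2 = out1 AND c."""
    # Precharge phase: both dynamic nodes go to 1, so both outputs are 0.
    n1, out1 = domino_stage(0, 1, False)
    n2, out2 = domino_stage(0, 1, False)
    precharged = (out1, out2)
    # Evaluate phase: outputs can only rise, so stage 2 never sees a 1 -> 0 input.
    n1, out1 = domino_stage(1, n1, a and b)
    n2, out2 = domino_stage(1, n2, out1 and c)
    return precharged, (out1, out2)


print(domino_chain(1, 1, 1))
print(domino_chain(1, 0, 1))
```

Because every stage output starts the evaluate phase at 0, the erroneous discharge seen with directly cascaded dynamic gates cannot occur in this model.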

Clocked CMOS Logic
It is static CMOS with two additional clocked transistors: Clk drives one additional NMOS and Clk' drives one additional PMOS.
A gate with fan-in n therefore needs 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
More area
More complexity

N-P CMOS Logic

N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).
Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z.]

BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT, with core logic in CMOS and the interface in BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

Inverter, NAND gate

Structured Design

Designing the digital block or standard cell at the following levels: logic level, circuit level, stick diagram, layout

Examples: Combinational Logic
5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers - L=11λ W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
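The generate, propagate and kill definitions above translate directly into bit operations. The following is a small sketch; the function names `carry_status` and `carry_out` are chosen here for illustration.

```python
def carry_status(a, b):
    """Return (G, P, K) for one bit position, each 0 or 1."""
    g = a & b              # generate: Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with G and P."""
    g, p, _ = carry_status(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_status(a, b))
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a switch-based carry chain select among pulling the carry high, pulling it low, or passing it through.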

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester Chain adders
The carry path is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

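Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1, C2, C3, and Cout.]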
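A carry-ripple adder is simply a chain of full adders in which each carry output feeds the next carry input. The sketch below is a bit-level model with least-significant-bit-first lists; the function names are illustrative.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, cout)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next stage
        sums.append(s)
    return sums, carry


# 6 + 7 = 13: LSB-first 0110 + 1110 -> 1011 with no carry out
print(ripple_add([0, 1, 1, 0], [1, 1, 1, 0]))
```

The loop makes the critical path visible: the carry for bit i cannot be computed until the carry for bit i-1 is known, so the delay grows linearly with word length.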

Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.
The expressions for a 4-bit CLA are
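The equations from the original slide did not survive extraction. The standard textbook expansion, assuming G_i = A_i B_i and P_i = A_i ⊕ B_i as defined earlier and obtained by repeatedly substituting C_{i+1} = G_i + P_i C_i, is:

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Each carry depends only on the G and P signals and on C_0, so all four can be computed at the same time rather than rippling.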

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as the conditional sum adder.
The adder is divided into two blocks: one block computed with Cin = 0 and another block computed with Cin = 1.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M − 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
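As a quick worked example of the delay expression above, the sketch below evaluates T = P·K1 + (M − 1)·K2. The delay values used (K1 = 1.0 and K2 = 0.5, in arbitrary units) are hypothetical and serve only to show how the terms combine.

```python
def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M - 1)*K2, as quoted in the slide."""
    return p_cells * k1 + (m_blocks - 1) * k2


# 8-bit adder split into M = 2 blocks of P = 4 cells (hypothetical delays)
t_select = carry_select_delay(m_blocks=2, p_cells=4, k1=1.0, k2=0.5)
print(t_select)
```

With these assumed numbers the first term (rippling through one block) dominates, and each further block adds only one multiplexer delay.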

Carry Skip Adder (CSA)

It is also called the carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

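[Figure: two full adders (FA) for bits i and i+1 with inputs ai, bi, ai+1, bi+1, carries ci, ci+1, sums si, si+1 and propagate signals pi, pi+1 combined into the skip signal.]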

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder
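The skip rule can be modelled functionally as below. This is a sketch of one m-bit block with a bypass multiplexer, assuming LSB-first bit lists; the name `carry_skip_block` is illustrative. Functionally the bypass never changes the result; it only shortens the path the carry has to take.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block: returns (sum_bits, cout, skip)."""
    carry = cin
    sums = []
    skip = 1                       # AND of all propagate signals in the block
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)
        skip &= p
    # Bypass multiplexer: when every bit propagates, forward cin directly.
    cout = cin if skip else carry
    return sums, cout, skip


# 0101 + 1010: every bit position propagates, so the carry-in is bypassed
print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```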

Delay of CSA

The worst-case delay T is given by T = 2(P − 1)·K1 + (M − 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
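The worst-case expression above can be evaluated in the same way. P and M are taken to mean cells per block and number of blocks, as in the carry-select formula; the numeric delays and the comparison against a plain ripple delay of n·K1 are assumptions made for illustration only.

```python
def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2, as quoted in the slide."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed baseline: the carry ripples through all n cells."""
    return n_bits * k1


# 16-bit adder as M = 4 blocks of P = 4 cells (hypothetical K1 = 1.0, K2 = 0.5)
t_skip = carry_skip_delay(m_blocks=4, p_cells=4, k1=1.0, k2=0.5)
t_ripple = ripple_delay(16, 1.0)
print(t_skip, t_ripple)
```

The carry ripples only through the first and last blocks and skips the M − 2 blocks in between, which is where the two terms of the formula come from.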

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2(n).
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
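The column-compression idea can be sketched as follows. This simplified model uses only full adders (3:2 compressors) to reduce each column of partial-product bits until at most two bits remain per column, then adds the two resulting vectors with an ordinary addition. It is an illustration of the principle rather than an exact Wallace or Dadda schedule, and the function name is hypothetical.

```python
def wallace_multiply(x, y, n=4):
    """Multiply two unsigned n-bit integers by column compression."""
    width = 2 * n
    # cols[w] holds all partial-product bits of weight 2**w
    cols = [[] for _ in range(width)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((x >> i) & (y >> j) & 1)

    # Compress until every column holds at most two bits.
    while max(len(c) for c in cols) > 2:
        nxt = [[] for _ in range(width + 1)]
        for w, bits in enumerate(cols):
            k = 0
            while len(bits) - k >= 3:            # full adder: 3 bits -> sum + carry
                a, b, c = bits[k:k + 3]
                nxt[w].append(a ^ b ^ c)
                nxt[w + 1].append((a & b) | (b & c) | (a & c))
                k += 3
            nxt[w].extend(bits[k:])              # leftover bits pass through
        cols = nxt[:width]                       # the product fits in 2n bits

    # Final step: add the two remaining vectors with a conventional adder.
    row0 = sum(c[0] << w for w, c in enumerate(cols) if len(c) > 0)
    row1 = sum(c[1] << w for w, c in enumerate(cols) if len(c) > 1)
    return row0 + row1


print(wallace_multiply(13, 11))
```

All full adders within one pass of the `while` loop operate in parallel in hardware, so the number of passes, not the number of adders, sets the delay.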

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop, two-phase clocking, dynamic shift register, RAM, ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed top-down with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
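A 4x4 barrel shifter can be viewed as a crossbar of pass transistors in which exactly one switch per output line is turned on. The sketch below models that view, assuming end-around (rotate) behaviour and a shift direction chosen here for illustration; neither detail is confirmed by the surviving slide text.

```python
def barrel_shift(bits, shift):
    """Rotate a list of bits: output line i is connected to input line (i + shift) mod n."""
    n = len(bits)
    out = []
    for i in range(n):
        # switch[i][j] is the pass transistor joining input j to output i
        switch_row = [1 if j == (i + shift) % n else 0 for j in range(n)]
        out.append(sum(s & b for s, b in zip(switch_row, bits)))
    return out


print(barrel_shift([1, 0, 0, 0], 1))
```

Every output passes through exactly one switch regardless of the shift amount, which is why the shifter delay is independent of the number of positions shifted.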

Routing Power rails for ALU

Design Validation

Must check at every step that errors have not been introduced: the longer an error remains, the more expensive it becomes to remove it.

Chip Architecture - Floor plan
After high-level design is complete, it is necessary to decide how the design is to be implemented in silicon.
The implementation plan is known as the floor plan.
The first step in laying out a floor plan is the routing of supply and clock rails.
In doing this, sufficient space must be left between power rails to allow for data buses and combinational logic cells.
Decide on the relative positions of major functional blocks.
Use a routing algorithm (software); the routing algorithm will minimize total routing area.

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?
1) Pass transistors
2) Transmission gates

Pass Transistors
We have assumed the source is grounded.
What if the source > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors
Require one transistor and one gate signal.
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd − Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems:
An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1,
since the pull-down current is weaker when a lower gate voltage is applied.
The complementary gate's pull-down will not draw current off the output capacitance as fast as it should.

PMOS Pass Transistors
When Vin = Vdd, Vout = Vdd.
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
Logic 0 is therefore somewhat degraded through a p-device.

Voltage degradation of Pass Transistors

Pass Transistor Ckts
[Figure: pass-transistor circuits driven from VDD and VSS, with annotated node voltages Vs = VDD − Vtn, Vs = |Vtp|, VDD − Vtn and VDD − 2Vtn.]
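The node voltages annotated in these circuits follow from one rule: an nMOS switch cannot pull its output above its gate voltage minus Vtn, and a pMOS switch cannot pull its output below its gate voltage plus |Vtp|. A minimal sketch of that rule is given below, ignoring body effect; the supply and threshold values are hypothetical.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Voltage delivered by an ideal nMOS pass transistor (body effect ignored)."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Voltage delivered by an ideal pMOS pass transistor (body effect ignored)."""
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP = 5.0, 1.0, 1.0   # hypothetical supply and threshold magnitudes

one_drop = nmos_pass(VDD, VDD, VTN)            # VDD - Vtn
in_series = nmos_pass(one_drop, VDD, VTN)      # still VDD - Vtn: drops do not add in series
gate_driven = nmos_pass(VDD, one_drop, VTN)    # VDD - 2*Vtn: a degraded level drives a gate
weak_zero = pmos_pass(0.0, 0.0, VTP)           # |Vtp|: a pMOS passes a poor 0
print(one_drop, in_series, gate_driven, weak_zero)
```

The contrast between `in_series` and `gate_driven` is the practical point: threshold drops accumulate only when a degraded output is used to drive the gate of another pass transistor.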

Complementary Pass Transistor Logic

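[Figure: (a) complementary pass-transistor logic structure: inputs A, A', B, B' feed a pass-transistor network producing F and an inverse pass-transistor network producing F'. (b) Example gates: AND/NAND (F = AB, F' = (AB)'), OR/NOR (F = A + B, F' = (A + B)'), EXOR/NEXOR (F = A ⊕ B, F' = (A ⊕ B)').]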

Advantages and Disadvantages

Advantages
Fewer transistors
No static power consumption
Disadvantages
Output voltage degrades
Not an ideal switch, due to series resistance
Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
Fewer transistors compared to CMOS
No static power consumption
Efficient building of complex gates
Disadvantages
Not an ideal switch, due to series resistance
Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
Zpu/Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to all the pull-down transistors (Zpu : nZpd) must be at least 4:1 to give the correct output voltage level.
The nMOS NAND gate geometry reveals two factors:
The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
The delay also increases in direct proportion to the number of inputs added.

NMOS Nor Gate
Since the pull-down transistors are in parallel in a NOR gate, the pull-up to pull-down ratio is the same for every transistor, so the gate has the same characteristics as an inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So a NOR gate is preferable to a NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND respectively
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

A CMOS NAND gate has none of the ratio restrictions of an nMOS NAND gate, but the geometry should keep its characteristics symmetrical by:
allowing for the extended fall times of the series nMOS transistors (series resistance)
keeping the transfer characteristic centred on Vdd/2
A CMOS NOR gate has series p-transistors, which increase the resistance and the delay.
This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted.

CMOS Logic

Static CMOS logic
Pseudo NMOS logic
Dynamic CMOS logic
Domino CMOS logic
Clocked CMOS logic
n-p CMOS logic

Static CMOS Switch Delay Model

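[Figure: switch-level RC delay models of an inverter (INV), a two-input NAND (NAND2) and a two-input NOR (NOR2). Each transistor is a switch controlled by A or B in series with its on-resistance (Rp or Rn, Req in general), with load capacitance CL and internal node capacitance Cint.]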

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.
Low-to-high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL
High-to-low transition:
both inputs go high: delay is 0.69 (2Rn) CL
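The three first-order delay expressions for the two-input NAND can be evaluated directly. The resistance and capacitance values below are hypothetical and serve only to show the relative sizes; internal node capacitance (Cint) is ignored in this sketch.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds, for rp and rn in ohms and cl in farads."""
    return {
        "rise_both_inputs_low": 0.69 * (rp / 2) * cl,   # two pMOS in parallel
        "rise_one_input_low": 0.69 * rp * cl,           # a single pMOS pulls up
        "fall_both_inputs_high": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }


# Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 10 fF
delays = nand2_delays(rp=10e3, rn=5e3, cl=10e-15)
for name, t in delays.items():
    print(name, round(t * 1e12, 2), "ps")
```

With one input switching, the pull-up is twice as slow as with both switching, because only one of the two parallel pMOS devices conducts.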

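[Figure: NAND2 switch model with pull-up resistances Rp (inputs A, B), series pull-down resistances Rn (inputs A, B), load capacitance CL and internal node capacitance Cint.]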

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

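Pseudo-NMOS logic is static CMOS logic in which the N-input pull-up network is replaced by a single load transistor, reducing the number of p-type devices from N to 1.
[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the PUN and the PDN between VDD and ground, output F.]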

Pseudo NMOS Operation
The pull-down network of the gate is the same as for a fully complementary gate.
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate input is Vdd, both the p-type and n-type transistors are on, and both act together to determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.
Disadvantages
Because of the higher pull-up resistance, the delay is larger and hence the circuit is slower.
More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.
[Figure: dynamic gate with inputs In1, In2, In3 into the PDN, precharge transistor Mp and evaluate transistor Me clocked by Clk, and output Out loaded by CL.]
Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1)

Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
A fan-in of n requires 2n devices (n N-type + n P-type).
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).

Dynamic CMOS Logic operation
Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pull-down transistor controlled by the clock keeps the precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

[Figure: simulated NAND2 output voltage [V] versus time [ps] for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1]

Input data pattern     Delay (ps)
A = B = 0→1            67
A = 1, B = 0→1         64
A = 0→1, B = 1         61
A = B = 1→0            45
A = 1, B = 1→0         80
A = 1→0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

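Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N p-type transistors (one per input) to a single p-type load transistor.

[Figure: static CMOS gate with inputs In1 ... InN driving both the pull-up network (PUN) and the pull-down network (PDN) between VDD and the output F]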

Pseudo NMOS Operation

The pull-down network of the gate is the same as for a fully complementary gate.
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate's inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When a gate input is at VDD, both the p-type transistor and an n-type transistor are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

[Figure: pseudo-NMOS voltage transfer characteristic (VTC), Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
  The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.

Disadvantages
  The higher pull-up resistance increases delay, so circuits are slower.
  Static power is dissipated through the conduction path between VDD and VSS whenever the output is low.
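The ratioed output level and the static power can be estimated with a simple resistive divider. This is only a rough sketch: it assumes both the p-type load and the conducting pull-down path behave as linear resistors, and the function name and the numeric values are hypothetical.

```python
def pseudo_nmos_low_state(vdd, r_pullup, r_pulldown):
    """Return (VOL in volts, static power in watts) while the pull-down conducts.

    Assumption: both devices are modelled as fixed resistors.
    """
    total = r_pullup + r_pulldown
    v_ol = vdd * r_pulldown / total   # voltage divider between load and pull-down
    p_static = vdd ** 2 / total       # current flows from VDD to VSS while output is low
    return v_ol, p_static


if __name__ == "__main__":
    # Illustrative values only: a pull-up four times more resistive than the pull-down.
    v_ol, p_static = pseudo_nmos_low_state(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
    print(f"VOL = {v_ol:.2f} V, static power = {p_static * 1e6:.0f} uW")
```

The sketch shows the trade-off named above: a more resistive pull-up lowers VOL and static power, but it also slows the rising transition.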

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation by adding a clock input that divides operation into a precharge phase and a conditional evaluation phase.

[Figure: dynamic gate with clocked p-type precharge transistor Mp, pull-down network (PDN) with inputs In1, In2, In3, clocked n-type evaluation transistor Me, output Out and load capacitance CL]

Two-phase operation
  Precharge (CLK = 0)
  Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n devices (n N-type + n P-type).

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
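The device counts quoted for the different logic styles can be tabulated with a small helper. This is a sketch: the function and style names are hypothetical, the pseudo-NMOS count (n + 1) follows from replacing the pull-up network with a single p-type load, and the clocked CMOS count (2n + 2) is the figure quoted in these slides.

```python
def transistor_count(fan_in, style):
    """Number of transistors needed for a gate with the given fan-in."""
    counts = {
        "static": 2 * fan_in,        # n N-type + n P-type
        "pseudo_nmos": fan_in + 1,   # n N-type + 1 P-type load
        "dynamic": fan_in + 2,       # n + 1 N-type + 1 P-type
        "clocked": 2 * fan_in + 2,   # static CMOS + 2 clocked devices
    }
    return counts[style]


if __name__ == "__main__":
    for style in ("static", "pseudo_nmos", "dynamic", "clocked"):
        print(style, [transistor_count(n, style) for n in (2, 4, 8)])
```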

Dynamic CMOS Logic operation

Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance.
  The clocked pull-down transistor is off, which keeps the precharge node from being drained.
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
  The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
  If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
  If the inputs do not create such a path, the gate's output is left charged at logic 1.

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: dynamic gate example with inputs A, B, C and Out = ((A·B)+C)'. Precharge: Mp on, Me off, Out = 1. Evaluate: Mp off, Me on]
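The precharge/evaluate behaviour and the output conditions above can be sketched as a small behavioural model. It is a logic-level illustration only: the class name is hypothetical, and charge leakage, charge sharing and timing are ignored.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate: Out = NOT pdn(inputs)."""

    def __init__(self, pdn):
        self.pdn = pdn    # function returning a truthy value when the PDN conducts
        self.out = None   # unknown until the first precharge

    def precharge(self):
        # CLK = 0: Mp on, Me off, output node charged to logic 1
        self.out = 1

    def evaluate(self, **inputs):
        # CLK = 1: Mp off, Me on; the node discharges if the PDN conducts
        if self.out is None:
            raise RuntimeError("gate must be precharged before evaluation")
        if self.pdn(**inputs):
            self.out = 0  # once discharged, only a new precharge restores 1
        return self.out


if __name__ == "__main__":
    # PDN of the example gate: conducts when (A AND B) OR C is true.
    gate = DynamicGate(lambda a, b, c: (a and b) or c)
    gate.precharge()
    print(gate.evaluate(a=1, b=1, c=0))  # PDN conducts, output discharges
    print(gate.evaluate(a=0, b=0, c=0))  # output stays discharged
    gate.precharge()
    print(gate.evaluate(a=0, b=1, c=0))  # PDN off, output holds its precharge
```

The second call illustrates why inputs may make at most one transition during evaluation: a discharged node cannot recover until the next precharge.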

Properties of Dynamic Gates

Logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS:
  no glitching
  higher transition probabilities
  extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn:
  low noise margin (NML)

Advantages and Disadvantages

Advantages
  Low area.
  Higher speed than static complementary gates.

Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases.
  A clock signal must be introduced.
  They are more sensitive to noise.
  Their clock signals consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

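[Figure: two cascaded dynamic inverters (In, Out1, Out2), each with clocked transistors Mp and Me, and waveforms of Clk, In, Out1 and Out2 versus time t; Out2 loses charge (ΔV) while Out1 is still above VTn]

Only 0 → 1 transitions are allowed at inputs.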

Cascading Dynamic Gates: Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off only when Out1 falls to VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

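[Figure: domino gate: a dynamic stage (Mp, PDN with inputs In1, In2, In3, Me, clocked by Clk) followed by a static inverter driving Out1; the dynamic node makes a 1 → 1 or 1 → 0 transition during evaluation]

Domino logic is the combination of a dynamic logic stage and a static inverter.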

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modified from dynamic logic)
  When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
  The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
  If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
  If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed:
  the static inverter can be skewed, since only the L-H transition matters
  input capacitance is reduced (smaller logical effort)

Cascading Domino Logic Gates

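[Figure: two cascaded domino gates, the first with inputs In1, In2, In3 and output Out1, the second with inputs In4, In5 and output Out2; each dynamic node makes a 1 → 1 or 1 → 0 transition and each inverter output a 0 → 0 or 0 → 1 transition]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.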
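The cascading rule can be illustrated with a logic-level sketch of two domino gates in series. The class and function names are hypothetical, and the model ignores timing, leakage and charge sharing; it only shows that every domino output starts at 0 after precharge and can only rise during evaluation.

```python
class DominoGate:
    """Behavioural model: dynamic n-type stage followed by a static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn   # returns a truthy value when the pull-down network conducts
        self.node = 1    # internal dynamic node, assumed precharged at start

    def precharge(self):
        self.node = 1    # CLK = 0: dynamic node charged, so the output is 0

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0  # CLK = 1: the node can only fall, so the output can only rise
        return self.out

    @property
    def out(self):
        return 1 - self.node  # static inverter


def run_cycle(a, b, c):
    """One precharge/evaluate cycle of (a AND b) OR c built from two domino gates."""
    and_gate = DominoGate(lambda x, y: x and y)
    or_gate = DominoGate(lambda x, y: x or y)
    and_gate.precharge()
    or_gate.precharge()
    # After precharge every domino output is 0, so the second gate sees a 0 input.
    assert and_gate.out == 0 and or_gate.out == 0
    first = and_gate.evaluate(a, b)
    return or_gate.evaluate(first, c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, run_cycle(a, b, c))
```

Both stages implement non-inverting functions (AND, OR), which matches the restriction that domino logic cannot realise inverting gates directly.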

Clocked CMOS Logic

It is static CMOS combined with two additional clocked transistors:
  Clk input given to one additional NMOS
  Clk' input given to one additional PMOS

A gate with fan-in n requires 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.

When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
  More area.
  More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
  Stage 1 is X = (A · B)'
  Stage 2 is G = X' + Y'
  Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P logic stages with outputs X, G and Z]

BiCMOS Technology

CMOS properties
  Low power
  Slower

BJT properties
  Faster
  High power

Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT

Basic circuits
  Inverter
  NAND gate
  NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate]

Structured Design

A digital block or standard cell is designed at the following levels:
  Logic level
  Circuit level
  Stick diagram
  Layout

Examples: Combinational Logic
  5-way selector
  Multiplexers & demultiplexers
  Encoders & decoders
  Parity generator
  Bus arbitration logic
  Gray code to binary code converter

Adders
  1-bit full adder
  Manchester carry chain
  Adder enhancement techniques
    Carry select adders
    Carry skip adders
    Carry look-ahead adders
  32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers - L=11λ, W=7λ

i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact - L=22λ, W=10λ
   Buried contact - L=30λ, W=10λ

ii) CMOS inverter - L=35λ, W=18λ

Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1 independent of C
  G = A • B

Propagate: Cout = C
  P = A ⊕ B

Kill: Cout = 0 independent of C
  K = ~A • ~B
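The three carry conditions can be checked exhaustively in a few lines. This is a minimal sketch with hypothetical function names; bits are represented as the integers 0 and 1.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # Cout = 1 regardless of the incoming carry
    p = a ^ b              # Cout follows the incoming carry
    k = (1 - a) & (1 - b)  # Cout = 0 regardless of the incoming carry
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, carry_signals(a, b))
```

For every input pair exactly one of G, P and K is 1, which is what allows a carry chain to be built from mutually exclusive switches.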

Manchester Carry Chain

Built from switch logic using propagate, generate & kill signals.

Manchester chain adders
  The carry node is precharged by the clock signal instead of being driven through the logic.
  The carry path is gated by the propagate signal (P = A^B) through a single n-type pass transistor.
  Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, internal carries C1, C2, C3, sum outputs S1..S4 and Cout]
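A bit-level sketch of the carry-ripple structure is given below. The function names are hypothetical, and bit lists are ordered least significant bit first; the loop makes the critical path visible, since each stage must wait for the carry of the previous one.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples into the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    """Convert an integer to a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert a list of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(s), cout)
```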

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries are generated in parallel.

The output carry is expressed in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
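The look-ahead carries can be written out by repeatedly expanding C(i+1) = Gi + Pi·Ci, starting from C1 = G0 + P0C0. The sketch below uses that standard expansion (the slide's own expressions did not survive in this transcript); the function names are hypothetical and bit lists are ordered least significant bit first.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Return [C1, C2, C3, C4] for 4-bit operands, each computed directly from G, P and C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate signals
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a_bits, b_bits, c0=0):
    """4-bit addition using look-ahead carries; returns (sum_bits, C4)."""
    carries = [c0] + cla4_carries(a_bits, b_bits, c0)
    # zip stops after the four operand bits, so carries[4] is not used for a sum bit
    sum_bits = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sum_bits, carries[4]


if __name__ == "__main__":
    print(cla4_add([1, 1, 0, 1], [0, 1, 1, 0]))  # 11 + 6, bits LSB first
```

No carry expression depends on another carry, only on the G, P and C0 inputs, which is why all carries can be produced in parallel at the cost of wider gates.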

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks:
  one block computed with Cin = logical 0
  another block computed with Cin = logical 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

  T = P·K1 + (M − 1)·K2

where
  M = number of blocks in the adder
  P = number of cells in each block
  K1 = delay through one adder cell
  K2 = delay through the multiplexer
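The delay expression can be evaluated for a few block arrangements. This is a sketch only: the function name is hypothetical and the K1 and K2 values are arbitrary illustrative units, not measured delays.

```python
def carry_select_delay(num_blocks, cells_per_block, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P adder cells each."""
    return cells_per_block * k1 + (num_blocks - 1) * k2


if __name__ == "__main__":
    # Assumed unit delays: K1 = 1.0 per adder cell, K2 = 0.5 per multiplexer.
    for m, p in ((2, 4), (4, 4), (8, 4)):
        print(f"{m * p}-bit, {m} blocks of {p}: T = {carry_select_delay(m, p, 1.0, 0.5)}")
```

Under these assumptions, adding blocks increases the delay only by one multiplexer delay per block, rather than by P cell delays as in a ripple adder.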

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

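[Figure: two full adders (FA) for bits i and i+1 with inputs ai, bi, ai+1, bi+1, carries ci, ci+1, sums si, si+1 and propagate signals pi, pi+1 combined into a skip signal]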

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

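Worst-case delay T is given by

  T = 2(P − 1)·K1 + (M − 2)·K2

where
  M = number of blocks, P = number of cells in each block (as for the carry select adder)
  K1 = delay through one adder cell
  K2 = delay for skipping the carry over a block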
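The worst-case expression can be compared with a plain ripple adder. In the sketch below the function names and the K1, K2 values are hypothetical, and the ripple delay of n·K1 is an assumption made only for the comparison.

```python
def carry_skip_delay(num_blocks, cells_per_block, k1, k2):
    """T = 2(P - 1)*K1 + (M - 2)*K2 for M blocks of P adder cells each."""
    return 2 * (cells_per_block - 1) * k1 + (num_blocks - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-carry delay: the carry passes through every adder cell."""
    return n_bits * k1


if __name__ == "__main__":
    # Illustrative unit delays: K1 = 1.0 per adder cell, K2 = 0.5 per skipped block.
    m, p = 8, 4
    print("carry skip:", carry_skip_delay(m, p, 1.0, 0.5))
    print("ripple:    ", ripple_delay(m * p, 1.0))
```

The first term covers rippling through the first and last blocks; only the middle M − 2 blocks are skipped, which is where the saving comes from.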

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
  Serial-parallel multiplier
  Braun array
  2's complement multiplication using the Baugh-Wooley method
  Pipelined multiplier array
  Modified Booth's algorithm
  Wallace tree multipliers
  Recursive decomposition of multiplication
  Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
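The logarithmic growth of the tree depth can be illustrated by counting reduction levels. The sketch assumes each level is built from 3:2 carry-save adders that turn every group of three rows into two, with leftover rows passed through unchanged; the function name is hypothetical.

```python
def wallace_stages(num_partial_products):
    """Count 3:2 reduction levels needed to reach two rows (carry and sum)."""
    rows = num_partial_products
    stages = 0
    while rows > 2:
        # Each full group of three rows becomes two; the remainder passes through.
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages


if __name__ == "__main__":
    for n in (4, 8, 16, 32):
        print(f"{n} partial products: {wallace_stages(n)} reduction stages")
```

Doubling the operand width adds only about two levels under this assumption, compared with a doubling of the number of rows in an array multiplier.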

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed that the source is grounded.

What if the source > 0, e.g. a pass transistor passing VDD?

[Figure: NMOS pass transistor with VDD on both gate and input]

NMOS Pass Transistors

Require one transistor and one gate signal.
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is only Vdd-Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems:
  An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1,
  since the pull-down current is weaker when a lower gate voltage is applied.
  The complementary gate's pull-down will not draw current off the output capacitance as fast as it should.

PMOS Pass Transistors

When Vin = Vdd, Vout = Vdd.
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
Logic 0 is therefore somewhat degraded through a p-device.

Voltage degradation of Pass Transistors

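Pass Transistor Ckts

[Figure: pass transistor circuits with all gates at VDD or VSS. An NMOS transistor passing VDD gives Vs = VDD-Vtn; a PMOS transistor passing VSS gives Vs = |Vtp|; NMOS pass transistors in series each give VDD-Vtn; when the degraded output of one NMOS pass transistor drives the gate of the next, the output falls to VDD-2Vtn]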
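The threshold drops described for pass transistors can be reproduced with a simple static model. This is a sketch under stated assumptions: the transistor is switched on, the body effect is ignored, and the function names and the values VDD = 2.5 V, Vtn = |Vtp| = 0.5 V are hypothetical.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Final output of a conducting NMOS pass transistor: it cuts off at v_gate - vtn."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Final output of a conducting PMOS pass transistor: it cuts off at v_gate + |vtp|."""
    return max(v_in, v_gate + vtp_abs)


if __name__ == "__main__":
    VDD, VTN, VTP_ABS = 2.5, 0.5, 0.5  # illustrative values only

    single = nmos_pass(VDD, VDD, VTN)          # NMOS passing VDD
    series = nmos_pass(single, VDD, VTN)       # second NMOS in series, gate at VDD
    gate_driven = nmos_pass(VDD, single, VTN)  # degraded level drives the next gate
    low = pmos_pass(0.0, 0.0, VTP_ABS)         # PMOS passing VSS

    print(single, series, gate_driven, low)
```

Series connection does not accumulate the loss, whereas using a degraded level as a gate signal costs a further threshold drop, which is why pass-transistor outputs should not drive other pass-transistor gates.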

Complementary Pass Transistor Logic

[Figure: complementary pass-transistor logic. (a) A pass-transistor network and its inverse network, both driven by A, A', B and B', produce F and F'. (b) Example gates: AND/NAND (F = AB, F' = (AB)'), OR/NOR (F = A+B, F' = (A+B)') and EXOR/NEXOR (F = A⊕B, F' = (A⊕B)')]

Advantages and Disadvantages

Advantages
  Fewer transistors.
  No static power consumption.

Disadvantages
  Output voltage degrades.
  Not an ideal switch, due to series resistance.
  Delay of series pass transistors is large.

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
  Fewer transistors compared to CMOS.
  No static power consumption.
  Efficient building of complex gates.

Disadvantages
  Not an ideal switch, due to series resistance.
  Delay of series transmission gates is large.

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
  Zpu/Zpd ratio
  Pull-up and pull-down resistance
  Power dissipation
  Standard unit of capacitance
  Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up impedance to that of all n series pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to give a correct output voltage level.


Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figures: pass transistor circuits with annotated node voltages. An nMOS pass transistor passing VDD gives Vs = VDD-Vtn; a pMOS pass transistor passing VSS gives Vs = |Vtp|. Other labelled nodes: VDD-Vtn and VDD-2Vtn.]

Complementary Pass Transistor Logic

[Figure: complementary pass-transistor logic. (a) A pass-transistor network and its inverse network, both driven by A, A', B and B', produce F and F'. (b) Example gates: AND/NAND (F = AB), OR/NOR (F = A+B) and EXOR/NEXOR (F = A⊕B).]

Advantages and Disadvantages

Advantages
  Fewer transistors
  No static power consumption
Disadvantages
  Output voltage degrades
  Not an ideal switch, due to series resistance
  Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
  Fewer transistors compared to CMOS
  No static power consumption
  Efficient building of complex gates
Disadvantages
  Not an ideal switch, due to series resistance
  Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
  Zpu/Zpd ratio
  Pull-up and pull-down resistance
  Power dissipation
  Standard unit of capacitance
  Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all n pull-down transistors, Zpu/(n Zpd), must be at least 4:1 to give the correct output voltage level
The nMOS NAND gate geometry reveals two factors:
  The area of a NAND gate is greater than that of an inverter, because there are more pull-down transistors and the pull-up transistor length increases correspondingly
  The delay also increases in direct proportion to the number of inputs added
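The 4:1 ratio rule can be turned into a quick sizing check. This is a minimal sketch, assuming every pull-down transistor has the same Z (length/width ratio); the function name and default values are illustrative.

```python
def min_pullup_z(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest pull-up Z (= L/W) that keeps Zpu / (n * Zpd) >= ratio."""
    return ratio * n_inputs * z_pd


for n in (1, 2, 3, 4):
    # n = 1 is the plain inverter case
    print(f"{n}-input gate: Zpu >= {min_pullup_z(n):g} x Zpd")
```

The required pull-up Z grows linearly with the number of series pull-downs, which is why both area and delay of the nMOS NAND gate rise with fan-in.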

NMOS Nor Gate
The pull-down transistors are in parallel in a NOR gate, i.e. the pull-up to pull-down ratio is the same for every transistor
So it has the same characteristics as an inverter
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
So the NOR gate is preferable to the NAND gate

CMOS Logic

Properties of CMOS Gates

High noise margins
  VOH and VOL are at VDD and GND respectively
No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
Comparable rise and fall times
  (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS NAND gate has no ratio restrictions like the nMOS NAND gate, but the geometry should be kept symmetrical by:
  Allowing extended fall times for the series nMOS transistors (because of their series resistance)
  Keeping the transfer characteristic centred on Vdd/2
The CMOS NOR gate has series p-transistors, which increase the resistance and delay
This affects the transfer characteristic and reduces noise immunity
So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

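[Figure: switch-level RC delay models for an inverter (INV), a 2-input NAND (NAND2) and a 2-input NOR (NOR2). Each transistor is shown as a switch with on-resistance Rn or Rp; CL is the load capacitance, Cint the internal node capacitance and Req the equivalent resistance.]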

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs
Low to high transition
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL
High to low transition
  both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with pull-up resistances Rp, series pull-down resistances Rn, internal capacitance Cint and load CL.]
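The three delay expressions above can be evaluated with a small first-order calculation. This is only a sketch: it ignores the internal capacitance Cint, and the resistance and capacitance values are hypothetical, chosen for easy arithmetic.

```python
def nand2_delays(r_n, r_p, c_load):
    """First-order 0.69*R*C delays of a 2-input NAND (Cint ignored)."""
    return {
        # two pMOS devices conduct in parallel
        "low_to_high_both_inputs_fall": 0.69 * (r_p / 2) * c_load,
        # a single pMOS device conducts
        "low_to_high_one_input_falls": 0.69 * r_p * c_load,
        # two nMOS devices conduct in series
        "high_to_low_both_inputs_rise": 0.69 * (2 * r_n) * c_load,
    }


# Hypothetical values: Rn = 10 kOhm, Rp = 20 kOhm, CL = 100 fF
delays = nand2_delays(r_n=10e3, r_p=20e3, c_load=100e-15)
for name, seconds in delays.items():
    print(f"{name}: {seconds * 1e12:.0f} ps")
```

With these assumed values the rising output is twice as fast when both pull-up devices switch together as when only one does.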

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1.]

Input data pattern    Delay (ps)
A=B=0→1               67
A=1, B=0→1            64
A=0→1, B=1            61
A=B=1→0               45
A=1, B=1→0            80
A=1→0, B=1            81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

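Pseudo-nMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one

[Figure: static CMOS gate. Inputs In1, In2, ..., InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output is F.]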

Pseudo NMOS Operation
The pull-down network of the gate is the same as for a fully complementary gate
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is Vdd, both the p-type and n-type transistors are on, and both act together to determine the gate's output voltage
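Because the output-low level is set by the pull-up and pull-down devices fighting each other, a resistive divider gives a rough feel for it. This sketch assumes both devices behave as linear resistors, which real transistors do not; the function name and all values are hypothetical.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Output-low level and static power while the pull-down network conducts.

    Both devices are modelled as linear resistors (a rough approximation).
    """
    v_ol = vdd * r_pulldown / (r_pullup + r_pulldown)
    p_static = vdd ** 2 / (r_pullup + r_pulldown)
    return v_ol, p_static


# Assumed values: VDD = 2.5 V, pull-up 40 kOhm, pull-down 10 kOhm
v_ol, p_static = pseudo_nmos_low_output(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
print(f"VOL = {v_ol:.2f} V, static power = {p_static * 1e6:.0f} uW")
```

A weaker (higher-resistance) pull-up lowers VOL and static power but slows the rising transition, which is the ratioed-logic trade-off.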

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-nMOS voltage transfer characteristic, Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
  The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity
Disadvantages
  The higher pull-up resistance increases delay, so circuits are slower
  Static power is dissipated through the conduction path between VDD and VSS

Dynamic CMOS Logic
The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases

[Figure: dynamic gate. A pMOS precharge transistor Mp and an nMOS evaluate transistor Me, both driven by Clk, surround the PDN with inputs In1, In2, In3; the output Out is stored on CL.]

Two phase operation
  Precharge (CLK = 0)
  Evaluate (CLK = 1)

Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path
  A fan-in of n requires 2n devices (n N-type + n P-type)
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
  A fan-in of n requires only n + 2 transistors (n+1 N-type + 1 P-type)
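The device counts quoted above are easy to tabulate. This is a small sketch with illustrative function names.

```python
def static_cmos_transistors(fan_in):
    """n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """n N-type logic devices, one N-type evaluate device, one P-type precharge device."""
    return fan_in + 2


for n in (2, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

The saving grows with fan-in: at a fan-in of 2 the two styles need the same number of devices, while at a fan-in of 8 the dynamic gate needs 10 against 16.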

Dynamic CMOS Logic operation
Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance
  The clock-controlled pull-down transistor keeps the precharge node from being drained
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
  The evaluation phase begins, i.e. the n-type pull-down turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
  If the inputs do not create such a path, the gate's output is left charged at logic 1

Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high impedance state during and after evaluation (PDN off); the state is stored on CL

[Figure: example dynamic gate with inputs A, B and C, computing Out = ((A·B)+C)'. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on.]
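The precharge/evaluate behaviour and the "cannot be charged again" condition can be illustrated with a small behavioural model. This is a logic-level sketch only (no timing, leakage or charge sharing); the class name and interface are assumptions made for illustration, and the pull-down function matches the example gate with inputs A, B and C.

```python
class DynamicGate:
    """Behavioural model of a precharge/evaluate dynamic gate."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts  # function of the inputs -> truthy if PDN conducts
        self.out = 1

    def precharge(self):
        # CLK = 0: Mp is on, Me is off, the output node is charged to 1.
        self.out = 1

    def evaluate(self, **inputs):
        # CLK = 1: Mp is off, Me is on. The node can only be discharged;
        # nothing can recharge it before the next precharge.
        if self.pdn_conducts(**inputs):
            self.out = 0
        return self.out


# Pull-down network conducts when (A AND B) OR C
gate = DynamicGate(lambda a, b, c: (a and b) or c)

gate.precharge()
first = gate.evaluate(a=1, b=1, c=0)   # path through A and B: output falls to 0
second = gate.evaluate(a=0, b=0, c=0)  # inputs fall again: output stays at 0
gate.precharge()
third = gate.evaluate(a=1, b=0, c=0)   # no path: output keeps its precharged 1
```

The second call shows why inputs may make at most one transition during evaluation: once the node is discharged, removing the input does not restore the output.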

Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds, due to reduced load capacitance
Overall power dissipation is usually higher than static CMOS
  no glitching, but higher transition probabilities and extra load on Clk
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  low noise margin (NML)

Advantages and Disadvantages
Advantages
  Low area
  Higher speed than static complementary gates
Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases
  Requires introduction of a clock signal
  They are also more sensitive to noise
  Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

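[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me driven by Clk. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The waveforms show Clk, In, Out1 and Out2, with Out2 partially discharging until Out1 falls to VTn.]

Only 0 → 1 transitions are allowed at the inputs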

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation
However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

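[Figure: domino gate. A dynamic stage (Mp, Me and a PDN with inputs In1, In2, In3, clocked by Clk) is followed by a static inverter that drives Out1.]

Domino logic is the combination of a dynamic logic stage and a static inverter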

Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
  When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
  The evaluation phase begins, i.e. the n-type pull-down turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
  If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed
  the static inverter can be skewed, since only the L-H transition matters
  input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

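[Figure: two cascaded domino gates. The first stage (inputs In1, In2, In3) drives Out1 through its static inverter; Out1 feeds the second stage (inputs In4, In5), which drives Out2. Dynamic nodes make 1 → 0 transitions and inverter outputs make 0 → 1 transitions.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1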
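A logic-level sketch of two cascaded domino stages shows why the inverter makes cascading safe: every stage output is 0 after precharge and can only rise during evaluation. The class, the two example logic functions and the run_cycle helper are hypothetical, chosen only to illustrate the idea.

```python
class DominoGate:
    """Dynamic node followed by a static inverter (logic-level model only)."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts
        self.node = 1  # precharged dynamic node

    @property
    def out(self):
        return 1 - self.node  # static inverter

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pdn_conducts(*inputs):
            self.node = 0  # the node may only fall, so the output may only rise
        return self.out


stage1 = DominoGate(lambda a, b: a and b)  # Out1 = A AND B
stage2 = DominoGate(lambda x, c: x or c)   # Out2 = Out1 OR C


def run_cycle(a, b, c):
    stage1.precharge()
    stage2.precharge()
    during_precharge = (stage1.out, stage2.out)
    out1 = stage1.evaluate(a, b)
    out2 = stage2.evaluate(out1, c)
    return during_precharge, (out1, out2)


print(run_cycle(1, 1, 0))
print(run_cycle(1, 0, 0))
```

Because both stages present 0 at their outputs during precharge, the second stage never sees a falling input during evaluation, so it cannot be discharged by mistake.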

Clocked CMOS Logic
It is static CMOS logic combined with two additional clocked transistors
  The Clk input is given to one additional nMOS transistor
  The Clk' input is given to one additional pMOS transistor
A fan-in of n requires 2n+2 transistors
When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS is on
When Clk goes low, the logic is not evaluated, because the clocked nMOS is off

Advantages and Disadvantages

Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices
Disadvantages
  More area
  More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below
Alternate stages of N logic with stages of P logic
  N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
  Stage 1 is X = (A · B)'
  Stage 2 is G = X' + Y'
  Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P logic stages with outputs X, G and Z.]
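The three stage equations of the example can be checked with plain Boolean code. This sketch evaluates only the steady-state logic and says nothing about clocking; the function name is an assumption, and Y, F and H are treated as extra primary inputs.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages in order and return (X, G, Z)."""
    x = not (a and b)         # Stage 1: X = (A . B)'
    g = (not x) or (not y)    # Stage 2: G = X' + Y'
    z = not ((f and g) or h)  # Stage 3: Z = (F . G + H)'
    return int(x), int(g), int(z)


print(np_cascade(1, 1, 1, 1, 0))
print(np_cascade(0, 1, 1, 1, 0))
```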

BiCMOS Technology
CMOS properties
  Low power
  Slower
BJT properties
  Faster
  High power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits
  Inverter
  Nand Gate
  Nor Gate

BiCMOS Circuits

Inverter, Nand Gate

Structured Design

A digital block or standard cell is designed at the following levels:
  Logic level
  Circuit level
  Stick diagram
  Layout

Examples Combinational Logic

5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques
  Carry Select Adders
  Carry Skip Adders
  Carry Look-ahead Adders
32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers - L=11λ W=7λ
i) nMOS inverter (8:1 and 4:1 ratio)
  Butting contact - L=22λ W=10λ
  Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A · ~B
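The generate, propagate and kill definitions can be exercised in a few lines. This is a minimal sketch; the function names are illustrative, and bits are represented as the integers 0 and 1.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # Cout = 1 regardless of the incoming carry
    p = a ^ b              # Cout follows the incoming carry
    k = (1 - a) & (1 - b)  # Cout = 0 regardless of the incoming carry
    return g, p, k


def full_adder(a, b, c):
    """1-bit full adder built from the generate and propagate signals."""
    g, p, _ = carry_signals(a, b)
    s = p ^ c
    cout = g | (p & c)
    return s, cout


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b))
```

For every input pair exactly one of the three signals is 1, which is what makes them convenient as switch controls in a carry chain.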

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal: G = A · B

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, internal carries C1, C2, C3, and outputs S1..S4 and Cout.]
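A bit-level sketch of the carry-ripple structure makes the Cin-to-Cout critical path visible: each stage has to wait for the carry from the stage before it. The helper names and the LSB-first list representation are assumptions for illustration.

```python
def to_bits(value, width):
    """Integer -> list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade of full adders; the carry ripples from stage to stage."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)  # this stage's carry-out
    return sum_bits, carry


s_bits, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), cout)
```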

Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
The output carry is expressed in terms of the generate and propagate signals
The carry expressions for the 4-bit CLA follow the pattern C1 = G0 + P0 C0

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
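Starting from C1 = G0 + P0C0 and substituting each carry into the next gives the usual expanded look-ahead equations, in which every carry depends only on G, P and C0. The sketch below writes them out for 4 bits; the function names are illustrative, and the expansion is the standard textbook one rather than something taken from the slides.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Return (propagate bits, [C0..C4]) with every carry computed directly from G, P and C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return p, [c0, c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """4-bit addition using the look-ahead carries; returns (sum, carry out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    p, c = cla4_carries(a_bits, b_bits, c0)
    total = sum((p[i] ^ c[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c[4]


print(cla4_add(9, 7))
```

No carry expression refers to another carry, so in hardware all four can be evaluated at the same time, at the cost of wider gates for the higher bits.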

Carry Select Adder

It is also referred to as the conditional sum adder
The adder is divided into two blocks
  One block with a logical 0 Cin
  Another block with a logical 1 Cin
S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P K1 + (M-1) K2
where
  M = number of blocks in the adder
  P = number of cells in each block
  K1 = delay through one adder cell
  K2 = delay through the multiplexer
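A functional sketch of carry selection, together with the delay formula, is given below. Each block is added twice (once per possible carry-in) and the real carry picks the result; the function names, the integer-based block adder and the delay values are assumptions for illustration.

```python
def add_block(a, b, cin, width):
    """Add one block; returns (sum within the block, carry out of the block)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin, block_width, num_blocks):
    mask = (1 << block_width) - 1
    result, carry = 0, cin
    for m in range(num_blocks):
        a_blk = (a >> (m * block_width)) & mask
        b_blk = (b >> (m * block_width)) & mask
        # Both candidates are computed without waiting for the real carry.
        sum0, cout0 = add_block(a_blk, b_blk, 0, block_width)
        sum1, cout1 = add_block(a_blk, b_blk, 1, block_width)
        # 2:1 multiplexer controlled by the actual carry into the block.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << (m * block_width)
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2."""
    return p * k1 + (m - 1) * k2


print(carry_select_add(183, 105, 0, block_width=4, num_blocks=2))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```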

Carry Skip Adder (CSA)

It is also called the Carry Bypass Adder
Looks for cases in which the carry-out of a set of bits is identical to the carry-in
Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

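[Figure: simplified carry-skip logic. Two full adders (FA) with inputs ai, bi and ai+1, bi+1 produce sums si and si+1 and propagate signals pi and pi+1; the propagate signals form the skip signal that controls the carry passed on as ci+1.]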

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

The worst case delay T is given by T = 2(P-1) K1 + (M-2) K2
where
  K1 = delay through one adder cell
  K2 = delay for skipping the carry over a block
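The skip condition and the worst-case delay formula can be sketched together. It is assumed here that P and M mean the same as for the carry select adder (cells per block and number of blocks) and that a plain ripple adder takes N times K1; both are assumptions, as are the function names and example values.

```python
def block_carry_out(a_bits, b_bits, cin):
    """Carry out of one block, bypassing the ripple chain when every bit propagates."""
    if all(a ^ b for a, b in zip(a_bits, b_bits)):
        return cin  # skip path: Ai != Bi for every bit, so Cout = Cin
    carry = cin
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a ^ b) & carry)  # normal ripple path
    return carry


def carry_skip_delay(p, m, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-adder delay: one cell delay per bit."""
    return n_bits * k1


# 16-bit example: 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5 (arbitrary units)
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5), ripple_delay(16, 1.0))
```

The bypass never changes the result, only how quickly the carry leaves the block: when all bits propagate, the rippled carry would equal the carry-in anyway.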

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
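The reduction to two vectors can be sketched at word level with 3:2 carry-save adders. This is a simplification: a real Wallace tree is wired column by column, whereas the code below compresses whole partial-product rows three at a time. The function names are assumptions, and the final `sum` stands in for the conventional adder.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: three inputs in, (sum, carry) vectors out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, width):
    """Multiply two unsigned width-bit numbers; returns (product, reduction levels)."""
    # One partial-product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # rows left ungrouped
        rows = reduced
        levels += 1
    # Two vectors remain (carry and sum); a conventional adder combines them.
    return sum(rows), levels


print(wallace_multiply(13, 11, 4))
```

Each level turns every three rows into two, so the number of levels grows logarithmically with the number of partial products: 2 levels for 4 rows and 4 levels for 8 rows in this sketch.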

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0

else C3 = output carry of the stage-4 adder
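The skip condition can be sketched for one 4-bit stage as below. Bit lists are least-significant bit first, and `carry_skip_block` is a hypothetical name used only here.

```python
def carry_skip_block(a_bits, b_bits, cin: int):
    """One carry-skip stage: returns (sum bits, carry-out, skip flag)."""
    p_bits = [a ^ b for a, b in zip(a_bits, b_bits)]  # per-bit propagate
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):                  # ordinary ripple inside the stage
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    skip = all(p_bits)                                # P0 and P1 and P2 and P3
    cout = cin if skip else carry                     # bypass gate
    return sums, cout, skip


print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

The bypass never changes the logical result; it only shortens the path the carry takes when every bit of the stage propagates.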

Delay of CSA

The worst-case delay T is given by

T = 2(P-1)·K1 + (M-2)·K2

where
  K1 = delay through one adder cell
  K2 = delay for skipping the carry over a block
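As a worked example of the expression, the sketch below compares it with a plain ripple estimate of n·K1. The ripple estimate, the reading of P as cells per block and M as number of blocks, and the numeric values are all assumptions made for illustration.

```python
def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = 2(P-1)*K1 + (M-2)*K2 for M blocks of P cells each."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits: int, k1: float) -> float:
    """Simple ripple estimate: the carry crosses every cell once."""
    return n_bits * k1


# Hypothetical 32-bit adder split into M = 8 blocks of P = 4 cells.
print(carry_skip_delay(p=4, m=8, k1=1.0, k2=1.0))
print(ripple_delay(32, k1=1.0))
```

With these numbers the carry ripples through only the first and last blocks and skips the six blocks in between.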

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Reduces the number of stages of partial products
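The reduction idea can be sketched with word-level carry-save (3:2) compression. This is a simplified behavioural model working on whole rows rather than a column-by-column Wallace layout, and the names `csa` and `wallace_multiply` are assumptions for this example.

```python
def csa(x: int, y: int, z: int):
    """3:2 carry-save step: three rows in, a sum row and a carry row out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a: int, b: int, n: int = 8):
    """Multiply two unsigned n-bit numbers; returns (product, reduction levels)."""
    # One partial-product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):      # compress rows three at a time
            nxt.extend(csa(*rows[i:i + 3]))
        nxt.extend(rows[full:])          # leftover rows pass through unchanged
        rows = nxt
        levels += 1
    return sum(rows), levels             # final conventional addition of the two vectors


print(wallace_multiply(13, 11))
```

For eight rows the row count shrinks 8, 6, 4, 3, 2, so only the final addition involves carry propagation.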

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
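A 4x4 barrel shifter can be modelled as a crossbar in which one shift line at a time is active. The sketch below assumes an end-around (rotate) connection and a particular shift direction. Both are assumptions, since the circuit figure is not part of this transcript, and `barrel_shift_4x4` is an illustrative name.

```python
def barrel_shift_4x4(in_bits, shift: int):
    """Crossbar model: out[i] connects to in[(i + shift) % 4] when line 'shift' is on."""
    n = 4
    select = [1 if s == shift else 0 for s in range(n)]  # one-hot shift lines
    out = [0] * n
    for i in range(n):            # output column
        for s in range(n):        # shift control line
            if select[s]:         # pass transistor at this crosspoint conducts
                out[i] = in_bits[(i + s) % n]
    return out


print(barrel_shift_4x4([1, 0, 0, 0], 1))
```

Any shift amount costs a single pass-transistor delay, which is the property that distinguishes a barrel shifter from a shift register.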

Routing Power rails for ALU

Pass Transistors

We have assumed that the source is grounded

What if the source > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors
Require one transistor and one gate signal
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd-Vtn
When switch logic drives gate logic, n-type switches can cause electrical problems
  An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  The pull-down current is weaker when a lower gate voltage is applied
  The complementary gate's pull-down will not draw current off the output capacitance as fast as it should

PMOS Pass Transistors
When Vin = Vdd, Vout = Vdd
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|; the p-device then stops conducting
Logic 0 is somewhat degraded through the p-device
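The degraded levels can be reproduced with a first-order model that ignores body effect and leakage. The function names and the example values VDD = 5 V, Vtn = |Vtp| = 1 V are assumptions chosen for illustration.

```python
def nmos_pass(v_in: float, v_gate: float, v_tn: float) -> float:
    """Level delivered by an n-type pass transistor: limited to Vgate - Vtn."""
    return max(0.0, min(v_in, v_gate - v_tn))


def pmos_pass(v_in: float, v_gate: float, v_tp_abs: float) -> float:
    """Level delivered by a p-type pass transistor: cannot fall below Vgate + |Vtp|."""
    return max(v_in, v_gate + v_tp_abs)


VDD, VTN, VTP = 5.0, 1.0, 1.0  # hypothetical values

v1 = nmos_pass(VDD, VDD, VTN)       # passing VDD: VDD - Vtn
v2 = nmos_pass(v1, VDD, VTN)        # second series switch, gate at VDD: no further loss
v3 = nmos_pass(VDD, v1, VTN)        # gate driven by the degraded level: VDD - 2Vtn
v4 = pmos_pass(0.0, 0.0, VTP)       # p-type passing 0: stops at |Vtp|
print(v1, v2, v3, v4)
```

The model shows why the output of one pass transistor should not drive the gate of another: each such stage loses a further threshold voltage.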

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figure: pass-transistor circuits with the node voltages Vs = VDD-Vtn (n-type passing VDD), Vs = |Vtp| (p-type passing VSS), VDD-Vtn and VDD-2Vtn]

Complementary Pass Transistor Logic

[Figure: (a) a pass-transistor network and an inverse pass-transistor network, both driven by A, A', B, B', producing F and F'; (b) example gates AND/NAND (F = AB), OR/NOR (F = A+B) and EXOR/NEXOR (F = A⊕B)]

Advantages and Disadvantages

Advantages
  Fewer transistors
  No static power consumption

Disadvantages
  Output voltage degrades
  Not an ideal switch, due to series resistance
  Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
  Fewer transistors compared to CMOS
  No static power consumption
  Efficient building of complex gates

Disadvantages
  Not an ideal switch, due to series resistance
  Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter
  Zpu/Zpd ratio
  Pull-up and pull-down resistance
  Power dissipation
  Standard unit of capacitance
  Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors, Zpu / (n·Zpd), must be at least 4:1 to give a correct output voltage level

The nMOS Nand gate geometry reveals two factors
  The area of a Nand gate is greater than that of an inverter, because of the additional pull-down transistors and the corresponding increase in the length of the pull-up transistor
  The delay also increases in direct proportion to the number of inputs added
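The ratio rule can be turned into a small calculation. The sketch assumes every pull-down has the same Zpd and uses exact fractions. The function name and the 1:1 pull-down are illustrative choices.

```python
from fractions import Fraction


def required_zpu(n_inputs: int, zpd: Fraction = Fraction(1, 1), ratio: int = 4) -> Fraction:
    """Smallest pull-up Z (length/width) satisfying Zpu / (n * Zpd) >= ratio."""
    return ratio * n_inputs * zpd


for n in (1, 2, 3, 4):
    print(n, "inputs -> Zpu =", required_zpu(n), ": 1")
```

With a 1:1 pull-down, one input (the inverter) needs a 4:1 pull-up and each added Nand input lengthens the pull-up by a further 4:1, which is the area and delay penalty described above.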

NMOS Nor Gate
The pull-down transistors in a Nor gate are in parallel, so the pull-up to pull-down ratio is the same for every pull-down transistor
So it has the same characteristics as the inverter
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates

High noise margins
  VOH and VOL are at VDD and GND respectively

No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode

Comparable rise and fall times
  (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS Nand gate has none of the ratio restrictions of the NMOS Nand gate, but the geometry must be kept symmetrical by
  allowing for extended fall times due to the series nmos transistors (series resistance)
  keeping the transfer characteristic centred about Vdd/2

The CMOS Nor gate has series p-transistors, which increase the resistance and delay
This affects the transfer characteristic and reduces noise immunity
So the geometries of the nmos and pmos transistors should be changed

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

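[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with pull-up resistances Rp, pull-down resistances Rn, equivalent resistance Req, internal node capacitance Cint and load capacitance CL]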

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low-to-high transition
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL

High-to-low transition
  both inputs go high: delay is 0.69 (2Rn) CL

[Figure: switch-level model of the NAND2 gate with pull-ups Rp (A, B), pull-downs Rn (A, B), Cint and CL]
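The three delay expressions can be evaluated numerically. The resistances below are hypothetical, CL = 100 fF matches the value quoted with the simulation data, and the internal capacitance Cint is ignored in this first-order sketch.

```python
def nand2_delays(rp: float, rn: float, cl: float) -> dict:
    """First-order 0.69*R*C delays of a two-input NAND for each input pattern."""
    return {
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,  # two pull-ups in parallel
        "low_to_high_one_input_falls": 0.69 * rp * cl,         # a single pull-up conducts
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,  # two pull-downs in series
    }


# Hypothetical device resistances (ohms) and a 100 fF load.
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

The model shows only the trend: the pull-up delay halves when both p-devices switch together, while the pull-down always sees the full series resistance.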

Delay Dependence on Input Patterns

[Figure: output voltage [V] against time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; A = 1→0, B = 1]

Input Data Pattern      Delay (psec)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

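Reducing the number of pull-up transistors of static CMOS logic from N to 1 gives pseudo-NMOS logic

[Figure: static CMOS gate with a pull-up network (PUN) and a pull-down network (PDN), each driven by In1, In2, …, InN, between VDD and ground, with output F]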

Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is Vdd, both the p-type and n-type transistors are on, and both act to determine the gate's output voltage
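Treating the always-on p-type device and the conducting pull-down as two fixed resistors gives a rough picture of the ratioed output. This linear-resistor model, the function names and the numeric values are all assumptions. Real devices are nonlinear, so the numbers are indicative only.

```python
def pseudo_nmos_vol(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Output low level as a resistive divider between pull-up and pull-down."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def pseudo_nmos_static_power(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Power drawn while the output is low (direct VDD-to-VSS path)."""
    return vdd ** 2 / (r_pullup + r_pulldown)


VDD = 2.5  # hypothetical supply voltage
print(pseudo_nmos_vol(VDD, r_pullup=40e3, r_pulldown=10e3))
print(pseudo_nmos_static_power(VDD, r_pullup=40e3, r_pulldown=10e3))
```

A weaker (more resistive) pull-up lowers VOL and the static power but slows the rising edge, which is the trade-off behind the sizing of the p-type device.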

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] against Vin [V] from 0.0 to 2.5 V, for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages

The higher pull-up resistance increases the delay, so circuits are slower

Higher static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic

It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases

[Figure: dynamic gate with a p-type precharge transistor Mp and an n-type evaluation transistor Me, both clocked by Clk, around a PDN with inputs In1, In2, In3; output Out with load CL]

Two-phase operation
  Precharge (CLK = 0)
  Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  a fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  a fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation

Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance
  The pulldown transistor controlled by the clock keeps that precharge node from being drained
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off
  The evaluation phase begins, i.e. the n-type pulldown turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
  If the inputs do not create such a path, the gate's output is left charged at logic 1
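The precharge and evaluate behaviour can be captured in a small state model. The class name `DynamicGate` and the example pull-down function are assumptions for illustration. Charge sharing and leakage are ignored.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate with a stored output node."""

    def __init__(self, pdn):
        self.pdn = pdn    # function returning True when the PDN conducts
        self.out = None   # undefined until the first precharge

    def precharge(self):
        """CLK = 0: the p-type device charges the output node to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: a conducting PDN discharges the node; it cannot recharge."""
        if self.pdn(*inputs):
            self.out = 0
        return self.out


gate = DynamicGate(lambda a, b, c: bool((a and b) or c))
gate.precharge()
print(gate.evaluate(0, 0, 0))  # node stays charged
print(gate.evaluate(0, 0, 1))  # input C rises: node discharges
print(gate.evaluate(0, 0, 0))  # C falls again: the node stays discharged
```

The last call shows why inputs must rise monotonically during evaluation: once the node is discharged, only the next precharge can restore it.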

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

[Figure: example dynamic gate with PDN inputs A, B and C implementing Out = ((AB)+C)'; during precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on]

Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds, due to reduced load capacitance
Overall power dissipation usually higher than static CMOS
  no glitching
  higher transition probabilities
  extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
  low noise margin (NML)

Advantages and Disadvantages

Advantages
  Low area
  Higher speed than static complementary gates

Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases
  Requires introduction of a clock signal
  They are also more sensitive to noise
  Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

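[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk; In drives the first gate, Out1 drives the second gate, which produces Out2. Waveforms of Clk, In, Out1 and Out2 (V against t), with VTn marked on Out1 and a voltage drop on Out2]

Only 0 → 1 transitions allowed at inputs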

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge

The PDN of the second dynamic inverter turns off when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

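[Figure: domino gate with a PDN (inputs In1, In2, In3), precharge transistor Mp and evaluation transistor Me clocked by Clk, followed by a static inverter driving Out1; the dynamic node makes 1 → 1 or 1 → 0 transitions]

Domino logic is the combination of a dynamic logic gate and a static inverter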

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off
  The evaluation phase begins, i.e. the n-type pulldown turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
  If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0
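A domino stage can be sketched as a dynamic node followed by an inverter, and two stages can then be chained. The class name `DominoGate` and the example AND/OR functions are assumptions for illustration.

```python
class DominoGate:
    """Dynamic node plus static inverter; the output is low after precharge."""

    def __init__(self, pdn):
        self.pdn = pdn   # function returning True when the PDN conducts
        self.node = 1    # internal precharged node

    @property
    def out(self) -> int:
        return 1 - self.node   # static inverter

    def precharge(self):
        self.node = 1          # CLK = 0: node charged, output forced to 0

    def evaluate(self, *inputs) -> int:
        if self.pdn(*inputs):  # CLK = 1: a conducting PDN discharges the node
            self.node = 0
        return self.out


and_gate = DominoGate(lambda a, b: bool(a and b))
or_gate = DominoGate(lambda x, c: bool(x or c))

for gate in (and_gate, or_gate):
    gate.precharge()
print(and_gate.out, or_gate.out)      # both outputs are 0 after precharge

x = and_gate.evaluate(1, 1)           # first stage output rises 0 -> 1
print(x, or_gate.evaluate(x, 0))      # the second stage follows, like falling dominoes
```

Because every stage output starts at 0 and can only rise, the next stage never sees a 1 -> 0 input during evaluation. The price is that only non-inverting functions can be built.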

Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed
  the static inverter can be skewed, since only the L-H transition matters
  input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

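[Figure: two cascaded domino gates, each a PDN (inputs In1, In2, In3 and In4, In5) with precharge transistor Mp and evaluation transistor Me clocked by Clk, followed by a static inverter; the dynamic nodes make 1 → 1 or 1 → 0 transitions and the outputs Out1 and Out2 make 0 → 0 or 0 → 1 transitions]

The inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -> 1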

Clocked CMOS Logic

It is a combination of
  static CMOS
  Clk input given to one additional NMOS
  Clk' input given to another additional PMOS

A gate with fan-in n requires 2n+2 transistors
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on
When Clk goes low, the logic is not evaluated, because the clocked NMOS is off

Advantages and Disadvantages

Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
  More area
  More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)

Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic
  Stage 1 is X = (A · B)'
  Stage 2 is G = X' + Y'
  Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P logic stages with outputs X, G and Z]
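The three example stages can be evaluated as plain Boolean functions. Assigning stage 1 and stage 3 to N logic and stage 2 to P logic is an assumption based on the alternating scheme, and the function name `np_example` is illustrative.

```python
def np_example(a: int, b: int, y: int, f: int, h: int):
    """Evaluate the three cascaded stages; returns (X, G, Z)."""
    x = 1 - (a & b)          # stage 1 (assumed N logic): X = (A . B)'
    g = (1 - x) | (1 - y)    # stage 2 (assumed P logic): G = X' + Y'
    z = 1 - ((f & g) | h)    # stage 3 (assumed N logic): Z = (F . G + H)'
    return x, g, z


print(np_example(a=1, b=1, y=1, f=1, h=0))
print(np_example(a=0, b=0, y=1, f=1, h=0))
```

X precharges high and can only fall, which is the safe input direction for the P stage that follows. G precharges low and can only rise, which suits the next N stage.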

BiCMOS Technology

CMOS properties
  Low power
  Slower

BJT properties
  Faster
  High power

Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT

Basic circuits
  Inverter
  Nand gate
  Nor gate

BiCMOS Circuits

Inverter, Nand Gate

Structured Design

Designing the digital block or standard cell at the following levels
  Logic level
  Circuit level
  Stick diagram
  Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques
  Carry select adders
  Carry skip adders
  Carry look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells

Multiplexers - L=11λ, W=7λ

i) NMOS inverter (8:1 and 4:1 ratio)
  Butting contact - L=22λ, W=10λ
  Buried contact - L=30λ, W=10λ

ii) CMOS inverter - L=35λ, W=18λ

Communication paths - 3λ spacing between metal contacts

So, for the adder standard cell - L=190λ, W=150λ


NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology

CMOS properties
• Low power
• Slower

BJT properties
• Faster
• High power

Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT

Basic circuits
• Inverter
• NAND gate
• NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders

• 1-bit full adder
• Manchester carry chain adder
• Enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ

i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ

ii) CMOS inverter: L = 35λ, W = 18λ

Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

• Generate: Cout = 1 independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0 independent of C
  K = ~A · ~B
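As a quick check of these definitions, the sketch below compares the carry rebuilt from generate and propagate with the majority-function carry of a full adder over all eight input combinations. The function names are illustrative only.

```python
from itertools import product


def carry_out(a, b, c):
    """Full-adder carry as a majority function of the three inputs."""
    return (a & b) | (a & c) | (b & c)


def pgk(a, b):
    """Return the (generate, propagate, kill) signals for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_from_pgk(a, b, c):
    """Carry rebuilt from the signals: Cout = G + P.C (kill needs no term)."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


def check_all():
    """True when both carry definitions agree and exactly one of G, P, K is set."""
    return all(
        carry_from_pgk(a, b, c) == carry_out(a, b, c) and sum(pgk(a, b)) == 1
        for a, b, c in product((0, 1), repeat=3)
    )
```

Exactly one of G, P and K is 1 for any input pair, which is why the three cases fully describe the carry behaviour of a bit position.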

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

In Manchester chain adders the carry node is precharged by the clock signal instead of being driven through the logic.

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.

Kill signal (K = A' · B') pulls the precharged carry node low

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3 and output Cout]
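A bit-level sketch of the ripple structure is given below. The helper names (`to_bits`, `from_bits`) and the LSB-first bit order are assumptions made for the example.

```python
def full_adder(a, b, cin):
    """One-bit full adder returning (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each stage waits for the previous carry
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    """Integer to n-bit list, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

The loop makes the critical path visible: the carry of stage i cannot be computed before the carry of stage i-1, so delay grows linearly with the word length.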

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries are generated in parallel.

The output carry is expressed in terms of the generate and propagate signals.

The expressions for the 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0
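Repeatedly substituting the recurrence C(i+1) = Gi + Pi·Ci gives every carry directly in terms of G, P and C0. The sketch below writes out that standard expansion for four bits; it is not copied from the slide, and the function name `cla4` is an assumption.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition of integers a and b; returns (sum, c4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate  Gi = Ai.Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # propagate Pi = Ai xor Bi

    # Every carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c4
```

The price of the parallel carries is visible in the last expression: the number of product terms and their fan-in grow with the bit position, which is why look-ahead is usually limited to small groups.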

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks:
• One block computed with Cin = 0
• Another block computed with Cin = 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
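The selection scheme can be sketched behaviourally as follows. The block width of 4 and total width of 8 are assumed defaults, and each block is modelled with integer addition rather than gate-level adders.

```python
def add_block(a, b, cin, width):
    """Behavioural block adder: returns (sum, carry out) for a width-bit slice."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition: each block is precomputed for Cin = 0 and Cin = 1."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)    # block assuming Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)    # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)    # 2:1 multiplexer on actual carry
        result |= s << lo
    return result, carry
```

In hardware both candidate results of every block are ready at the same time, so only the multiplexer selection ripples from block to block.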

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit carry select adder is given by

T = P·K1 + (M-1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through one multiplexer

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages.

If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

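[Figure: two full adders (FA) for bits i and i+1 with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, and carries ci and ci+1; the propagate signals pi and pi+1 are combined into a skip signal that selects the carry output]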

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0

else C3 = output carry of the stage 4 adder
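A behavioural sketch of one skip block is shown below, assuming a 4-bit block and integer arguments. The name `skip_block` is illustrative.

```python
def skip_block(a, b, cin, width=4):
    """One carry-skip block: returns (sum, carry out, block propagate flag)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    block_propagate = ((a ^ b) & mask) == mask    # P0.P1.P2.P3 = 1
    total = a + b + cin
    ripple_cout = total >> width                  # carry produced by the ripple path
    # Bypass multiplexer: forward Cin directly when every bit propagates.
    cout = cin if block_propagate else ripple_cout
    return total & mask, cout, block_propagate
```

When every bit propagates, the rippled carry would equal the carry-in anyway, so the bypass changes only the delay, never the result.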

Delay of CSA

Worst case delay T is given by

T = 2(P-1)·K1 + (M-2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
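The two delay formulas from the slides can be compared numerically. In the sketch below P is the number of cells per block and M the number of blocks, as defined for the carry select adder; the figures used in the example (P = 4, M = 4, K1 = 1.0, K2 = 0.5, in arbitrary units) are made-up values, not data from the slides.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2, with K2 the delay through one multiplexer."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P-1)*K1 + (M-2)*K2, with K2 the delay of skipping one block."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 16-bit adder split into M = 4 blocks of P = 4 cells.
select_t = carry_select_delay(4, 4, 1.0, 0.5)   # 4*1.0 + 3*0.5 = 5.5
skip_t = carry_skip_delay(4, 4, 1.0, 0.5)       # 2*3*1.0 + 2*0.5 = 7.0
```

For comparison, a plain ripple adder of the same width would take roughly 16 cell delays, if one assumes a delay of K1 per cell.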

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2(n)
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
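The reduction idea can be sketched at word level with carry-save (3:2) adders. This is a simplification under stated assumptions: a real Wallace tree compresses bits column by column, whereas this example compresses whole partial-product rows, and the names `csa` and `wallace_multiply` are illustrative.

```python
def csa(x, y, z):
    """3:2 carry-save adder on whole words: returns (sum vector, carry vector)."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def wallace_multiply(a, b, n):
    """Multiply unsigned n-bit a and b; returns (product, reduction levels)."""
    # One shifted partial-product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))   # 3 rows -> 2 rows
        nxt.extend(rows[len(rows) - len(rows) % 3:])             # leftover rows pass through
        rows = nxt
        levels += 1
    # The final two vectors (sum and carry) go to a conventional adder.
    return sum(rows), levels
```

Each level removes about one third of the rows without propagating any carry, so the number of levels grows logarithmically with the number of partial products; only the final addition needs a carry-propagate adder.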

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools:

• Partition the system sensibly
• Aim for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

PMOS Pass Transistors

• When Vin = Vdd, Vout = Vdd
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
• Logic 0 is therefore somewhat degraded through a p-device

Voltage degradation of Pass Transistors

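Pass Transistor Ckts

[Figures: nMOS and pMOS pass-transistor circuits. An nMOS device with its gate at VDD passes a high level only up to Vs = VDD - Vtn; a pMOS device with its gate at VSS passes a low level only down to Vs = |Vtp|. nMOS pass transistors in series, all gated from VDD, still deliver VDD - Vtn; when the degraded output of one pass transistor drives the gate of the next, the level drops to VDD - 2Vtn.]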
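The threshold drops in these pass-transistor circuits can be reproduced with a first-order model that ignores body effect. The supply and threshold values below (VDD = 2.5 V, Vtn = |Vtp| = 0.5 V) are assumed example numbers, and the function names are illustrative.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Level delivered by an nMOS pass transistor (first order, no body effect)."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Level delivered by a pMOS pass transistor (first order, no body effect)."""
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP_ABS = 2.5, 0.5, 0.5          # assumed example values

strong_zero = nmos_pass(0.0, VDD, VTN)     # nMOS passes a good 0
weak_one = nmos_pass(VDD, VDD, VTN)        # nMOS passes a degraded 1: VDD - Vtn
weak_zero = pmos_pass(0.0, 0.0, VTP_ABS)   # pMOS passes a degraded 0: |Vtp|

# Series chain, every gate tied to VDD: only one threshold drop in total.
series = nmos_pass(nmos_pass(VDD, VDD, VTN), VDD, VTN)

# Output of one pass transistor driving the gate of the next: drops accumulate.
gate_driven = nmos_pass(VDD, weak_one, VTN)
```

The last two cases show why pass-transistor outputs should not drive the gates of further pass transistors without level restoration.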

Complementary Pass Transistor Logic

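[Figure: (a) general structure of complementary pass-transistor logic: a pass-transistor network driven by A, A', B, B' produces F, and an inverse pass-transistor network driven by the same signals produces F'. (b) Example gates: AND/NAND (F = AB, F' = (AB)'), OR/NOR (F = A+B, F' = (A+B)') and EXOR/NEXOR (F = A ⊕ B, F' = (A ⊕ B)').]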

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu : Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to give a correct output voltage level.

The nMOS NAND gate geometry reveals two factors:
• The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
• Delay also increases in direct proportion to the number of inputs added

NMOS Nor Gate

• The pull-down transistors are in parallel in the NOR gate, i.e. the pull-down ratio is the same for all transistors, so the gate has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the NOR gate is preferable to the NAND gate

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND respectively

• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode

• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but its geometry must be chosen to keep it symmetrical:
• Allow for the extended fall times of the series nMOS transistors (due to series resistance)
• Keep the transfer characteristic centred on Vdd/2

The CMOS NOR gate has series p-transistors, which increase resistance and delay.

This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

• Static CMOS logic
• Pseudo NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

Static CMOS Switch Delay Model

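[Figure: switch-level RC delay models of an inverter (INV), a 2-input NAND (NAND2) and a 2-input NOR (NOR2). Each transistor is replaced by a switch with on-resistance Rn or Rp (Req in general) driving the load capacitance CL, with an internal capacitance Cint at the node between series devices.]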

Input Pattern Effects on Delay

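Delay is dependent on the pattern of inputs.

Low to high transition
• both inputs go low: delay is 0.69 (Rp/2) CL
• one input goes low: delay is 0.69 Rp CL

High to low transition
• both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with pull-up resistances Rp for A and B in parallel, pull-down resistances Rn for A and B in series, internal capacitance Cint and load CL]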
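The three first-order expressions can be evaluated together for a 2-input NAND. In the sketch below the resistance and capacitance values used in the example call are made-up numbers, and the dictionary keys are illustrative labels.

```python
def nand2_delays(rn, rp, cl):
    """First-order NAND2 propagation delays (seconds) for each input pattern."""
    return {
        "low-to-high, both inputs fall": 0.69 * (rp / 2) * cl,   # two pMOS in parallel
        "low-to-high, one input falls": 0.69 * rp * cl,          # a single pMOS pulls up
        "high-to-low, both inputs rise": 0.69 * (2 * rn) * cl,   # two nMOS in series
    }


# Hypothetical device values: Rn = 13 kOhm, Rp = 31 kOhm, CL = 100 fF.
example = nand2_delays(13e3, 31e3, 100e-15)
```

The model ignores the internal node capacitance Cint, which is one reason simulated delays differ from these estimates.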

Delay Dependence on Input Patterns

[Figure: simulated NAND2 output voltage [V] versus time [ps] over 0 to 400 ps for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1]

Input data pattern      Delay (psec)
A = B = 0->1            67
A = 1, B = 0->1         64
A = 0->1, B = 1         61
A = B = 1->0            45
A = 1, B = 1->0         80
A = 1->0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

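Pseudo-NMOS logic is static CMOS logic with the pull-up network reduced from N transistors to 1.

[Figure: static CMOS gate in which inputs In1 ... InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), with output F]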

Pseudo NMOS Operation

• The pulldown network of the gate is the same as for a fully complementary gate
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate input is 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate input is Vdd, both the p-type and n-type transistors are on and both determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V] from 0 to 2.5 V, for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages
• Because of the higher pull-up resistance, delay is larger and hence circuit speed is lower
• More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.

It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.

[Figure: dynamic gate with inputs In1-In3 driving the PDN, clocked precharge transistor Mp, clocked evaluate transistor Me, output Out and load capacitance CL]

Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
• A fan-in of n requires 2n devices (n N-type + n P-type)

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
• A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pulldown transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the clocked n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

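[Figure: dynamic gate with inputs A, B and C implementing Out = ((A·B)+C)'. During precharge Mp is on and Me is off, so Out = 1; during evaluation Mp is off and Me is on, and Out is discharged if the PDN conducts.]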
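The conditions on the output can be illustrated with a small stateful model of a dynamic gate. This is a sketch only: the class name `DynamicGate` is hypothetical, the PDN is assumed to implement (A·B)+C, and leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Idealised dynamic gate: the output node stores its value on CL."""

    def __init__(self, pdn):
        self.pdn = pdn      # function returning True when the PDN conducts
        self.out = 1

    def step(self, clk, *inputs):
        if clk == 0:
            self.out = 1    # precharge: Mp on, Me off
        elif self.pdn(*inputs):
            self.out = 0    # evaluate: Me on and the PDN conducts, node discharges
        # otherwise the node is high impedance and keeps its stored value
        return self.out


# Assumed example function: PDN conducts for (A AND B) OR C.
gate = DynamicGate(lambda a, b, c: bool((a and b) or c))
```

Once an evaluate step has discharged the node, later evaluate steps with a non-conducting PDN leave it at 0; only a new precharge step restores the 1.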

Properties of Dynamic Gates

• Logic function is implemented by the PDN only
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, giving a low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases
• Requires introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

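[Figure: two cascaded dynamic inverters, each with clocked transistors Mp and Me; In drives the first stage, whose output Out1 drives the second stage with output Out2. The accompanying waveforms of Clk, In, Out1 and Out2 against time show Out2 dropping by ΔV before Out1 falls below VTn.]

Only 0 -> 1 transitions are allowed at the inputs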

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

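[Figure: domino gate with inputs In1-In3 driving the PDN, clocked transistors Mp and Me, and a static inverter on the dynamic node. The dynamic node either stays at 1 or falls 1 -> 0, and the output Out1 either stays at 0 or rises 0 -> 1.]

Domino logic is the combination of a dynamic logic gate and a static inverter.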

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the clocked n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic

• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since each dynamic node can only make a transition from logic 1 to logic 0
• Very high speed: the static inverter can be skewed, since only its L-H transition matters
• Input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic
It is a combination of:
Static CMOS logic
Clk input given to one additional NMOS transistor
Clk' input given to one additional PMOS transistor

A gate with fan-in n therefore needs 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS transistor is on.

When Clk goes low, the logic is not evaluated because the clocked NMOS transistor is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
More area
More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).

Stages of N logic alternate with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (clk' is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (clk' is low) and both types of stage evaluate: an N-logic tree conditionally pulls its output to ground, while a P-logic tree conditionally pulls its output to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:

Stage 1: X = (A · B)'
Stage 2: G = X' + Y'
Stage 3: Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]
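The three stage equations of the example can be checked with a few lines of code. This is a purely logical sketch: the function name `np_example` is hypothetical, and Y, F and H are treated as additional primary inputs, which is an assumption since the original figure is not available.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three cascaded stages of the N-P logic example.

    All arguments are 0 or 1. Returns the tuple (X, G, Z).
    """
    x = 1 - (a & b)              # Stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)        # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)        # Stage 3: Z = (F . G + H)'
    return x, g, z
```

For instance, with A = B = 1 the first stage gives X = 0, so G = 1 regardless of Y.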

BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation (CMOS & BJT): core logic in CMOS, interface in BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate.]

Structured Design

Designing the digital block or standard cell proceeds through the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic
5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders
1-bit full adder
Manchester carry chain
Adder enhancement techniques:
- Carry select adders
- Carry skip adders
- Carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for the adder

The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts

So the adder standard cell is L = 190λ, W = 150λ.

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to the carry:
Generate: Cout = 1, independent of C
    G = A · B
Propagate: Cout = C
    P = A ⊕ B
Kill: Cout = 0, independent of C
    K = ~A · ~B
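The generate, propagate and kill definitions can be checked exhaustively against the full-adder carry. The sketch below is illustrative only; the function names `gpk`, `carry_out` and `check_all` are hypothetical.

```python
from itertools import product


def gpk(a, b):
    """Return the (generate, propagate, kill) signals for one bit position."""
    g = a & b                    # G = A . B
    p = a ^ b                    # P = A xor B
    k = (1 - a) & (1 - b)        # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed with generate and propagate: Cout = G + P.C"""
    g, p, _ = gpk(a, b)
    return g | (p & c)


def check_all():
    """Verify the definitions for all eight input combinations."""
    for a, b, c in product((0, 1), repeat=3):
        g, p, k = gpk(a, b)
        assert g + p + k == 1                            # exactly one case applies
        assert carry_out(a, b, c) == (a + b + c) // 2    # full-adder carry
    return True
```

Exactly one of G, P and K is 1 for any input pair, which is what allows the carry to be built from switches in the Manchester chain.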

Manchester Carry Chain

Built from switch logic using the propagate, generate & kill signals.

In Manchester chain adders, the carry node is precharged by the clock signal instead of being driven through the logic.

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.

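The generate signal is G = A · B, consistent with the definition given for the full adder (the expression A' · B' is the kill signal K).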

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

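Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with operand inputs A1-A4 and B1-B4, sum outputs S1-S4, carry input Cin, intermediate carries C1-C3 and carry output Cout.]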
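A carry-ripple adder is simply a loop over full adders in which each carry-out feeds the next carry-in. The sketch below is a bit-level illustration; the function names and the LSB-first bit-list convention are choices made for this example.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    The carry ripples from stage to stage, so the critical path
    runs from cin through every full adder to the final carry-out.
    """
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    """Integer -> list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << i for i, b in enumerate(bits))
```

Usage: `ripple_carry_add(to_bits(9, 4), to_bits(8, 4))` adds 9 and 8 in four bits; the sum bits hold 1 and the carry-out is 1.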

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries are generated in parallel.

Each output carry is expressed in terms of the generate and propagate signals.

The expressions for the 4-bit CLA are:

C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0

4-bit CMOS CLA
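The look-ahead idea can be illustrated by expanding the recurrence C(i+1) = Gi + Pi·Ci, starting from C1 = G0 + P0·C0, so that every carry depends only on the G and P signals and on C0. The sketch below is a logic-level illustration; the function name `cla4` and the LSB-first bit lists are assumptions for the example.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead adder on LSB-first bit lists.

    Every carry is computed directly from the generate (g) and
    propagate (p) signals and c0, without waiting for a ripple.
    Returns (sum_bits, carry_out).
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]   # Si = Pi xor Ci
    return sums, c4
```

The product terms grow with the bit position, which is why practical look-ahead blocks are usually limited to a few bits.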

Carry Select Adder

It is also referred to as the conditional sum adder.

Each block of the adder is duplicated: one copy computes with a logical 0 carry-in (Cin) and the other with a logical 1 carry-in.

S and Cout are then selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit carry select adder is given by

T = P·K1 + (M − 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
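The select mechanism and the delay formula can both be sketched in a few lines. This is a word-level illustration, assuming the operand width is a multiple of the block width; the function names and the delay units are hypothetical.

```python
def block_add(a, b, cin, width):
    """Add two width-bit block values with a carry-in: (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n_bits, block_bits, cin=0):
    """Carry select addition of two n_bits-wide integers.

    Each block is computed twice (carry-in 0 and carry-in 1); the
    actual incoming carry then selects one result, like a 2:1 mux.
    Assumes n_bits is a multiple of block_bits.
    """
    result, carry = 0, cin
    mask = (1 << block_bits) - 1
    for lo in range(0, n_bits, block_bits):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block_bits)   # assume Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block_bits)   # assume Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)         # multiplexer
        result |= s << lo
    return result, carry


def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M - 1)*K2, in whatever unit k1 and k2 use."""
    return p_cells * k1 + (m_blocks - 1) * k2
```

For example, with M = 2 blocks of P = 4 cells, K1 = 1 and K2 = 1 the formula gives T = 5, against 8 cell delays for an 8-bit ripple chain with the same K1.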

Carry Skip Adder (CSA)

It is also called the carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

It is typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

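Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sum outputs si, si+1, carries ci, ci+1, and propagate signals pi, pi+1 combined into a skip signal that selects the carry output.]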

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder.

Delay of CSA

Worst-case delay T is given by

T = 2(P − 1)·K1 + (M − 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
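The skip decision and the worst-case delay formula can be written out directly. In the sketch below the function names are hypothetical, P and M are taken to mean cells per block and number of blocks as in the carry select formula, and the ripple delay of n·K1 is an assumption added only for comparison.

```python
def block_carry_out(p_bits, ripple_cout, cin):
    """Carry out of one skip block.

    If every propagate bit in the block is 1, the bypass path sends
    cin straight to the output; otherwise the rippled carry is used.
    """
    return cin if all(p_bits) else ripple_cout


def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """Worst case: T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-carry delay: one cell delay per bit."""
    return n_bits * k1
```

With 32 bits split into M = 8 blocks of P = 4 cells and K1 = K2 = 1, the formula gives T = 12, compared with 32 for the assumed ripple chain.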

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
It is an optimized column adder tree.
It combines all partial products into 2 vectors (carry and sum).
The carry and sum outputs are combined using a conventional adder.
It compresses the number of stages of partial products.
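The reduction of partial products to a sum vector and a carry vector can be sketched with word-level carry-save (3:2) adders. This is a simplified illustration of the tree idea rather than a gate-accurate Wallace tree; the function names are hypothetical and Python integers stand in for bit vectors.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c, no carry ripple."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n_bits):
    """Multiply two n_bits-wide unsigned integers.

    Partial products are reduced three at a time until only two
    vectors (sum and carry) remain, then added conventionally.
    Returns (product, number_of_reduction_levels).
    """
    # One partial product per multiplier bit (zero if the bit is 0).
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            s, c = carry_save_add(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s, c]
        nxt += rows[full:]          # leftover rows pass to the next level
        rows = nxt
        levels += 1
    return sum(rows), levels        # final conventional addition
```

In this sketch four partial products need two reduction levels and eight need four, so the depth grows roughly logarithmically with the operand width.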

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

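Pass Transistor Ckts

[Figures: voltage levels in pass-transistor circuits. Labels recovered from the figures: an nMOS pass transistor passing VDD gives Vs = VDD − Vtn; a pMOS pass transistor passing VSS gives Vs = |Vtp|; cascaded nMOS pass-transistor configurations give outputs of VDD − Vtn and VDD − 2Vtn.]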

Complementary Pass Transistor Logic

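[Figure: complementary pass-transistor logic (CPL). (a) General structure: two pass-transistor networks driven by the inputs A, A', B, B', one producing F and the other, the inverse network, producing F'. (b) Example gates: AND/NAND (F = AB), OR/NOR (F = A + B) and EXOR/NEXOR (F = A ⊕ B), each with complementary outputs.]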

Advantages and Disadvantages

Advantages
Fewer transistors
No static power consumption

Disadvantages
Output voltage degrades
Not an ideal switch, due to series resistance
Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
Fewer transistors compared to CMOS
No static power consumption
Efficient building of complex gates

Disadvantages
Not an ideal switch, due to series resistance
Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
Zpu : Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up impedance to the total impedance of the n series pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to produce the correct output voltage level.

The nMOS NAND gate geometry reveals two factors:

The area of the NAND gate is greater than the area of the inverter, because of the additional pull-down transistors and the corresponding increase in the length of the pull-up transistor.

The delay also increases in direct proportion to the number of inputs added.

NMOS NOR Gate
The pull-down transistors are in parallel in the NOR gate, so the pull-up to pull-down ratio is the same for every pull-down transistor.
It therefore has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND, respectively

• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode

• Comparable rise and fall times
  (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS NAND gate does not have the ratio restrictions of the nMOS NAND gate, but the geometry must be kept symmetric by:
allowing extended fall times for the series nMOS transistors (because of their series resistance)
keeping the transfer characteristic centred on Vdd/2

The CMOS NOR gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristics and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should be changed to compensate.

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

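[Figure: switch-level RC delay models for the inverter (INV), NAND2 and NOR2. Each transistor is replaced by a switch with on-resistance Rn or Rp (Req for a generic switch controlled by A), with load capacitance CL at the output and internal node capacitance Cint between series transistors.]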

Input Pattern Effects on Delay

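Delay depends on the pattern of inputs.

Low-to-high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL

High-to-low transition:
both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with pull-up resistances Rp (inputs A, B), series pull-down resistances Rn (inputs A, B), load capacitance CL and internal capacitance Cint.]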
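The three NAND2 cases can be evaluated numerically from the switch model. The sketch below only applies the 0.69·R·C expressions; the function name and dictionary keys are hypothetical, and any resistance or capacitance values passed in are illustrative rather than taken from the slides.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 propagation delays from the switch model.

    rp, rn -- on-resistance of one pMOS / one nMOS transistor (ohms)
    cl     -- load capacitance (farads)
    Internal node capacitance Cint is ignored in this sketch.
    """
    return {
        # Both pMOS devices conduct in parallel: R = Rp / 2
        "lh_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # Only one pMOS device conducts: R = Rp
        "lh_one_input_falls": 0.69 * rp * cl,
        # Two nMOS devices in series: R = 2 * Rn
        "hl_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }
```

Usage: `nand2_delays(10e3, 5e3, 100e-15)` returns the three delays in seconds for assumed values Rp = 10 kΩ, Rn = 5 kΩ and CL = 100 fF; the single-input rising case is twice as slow as the case where both inputs fall.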

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

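Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven PMOS transistors to a single PMOS transistor.

[Figure: static CMOS gate with pull-up network (PUN) and pull-down network (PDN), each driven by inputs In1, In2, ..., InN, between VDD and the output F.]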

Pseudo NMOS Operation
The pull-down network of the gate is the same as for a fully complementary gate.
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.

When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.

Disadvantages

The higher pull-up resistance increases delay, so circuits are slower.

Static power is dissipated through the conduction path between VDD and VSS whenever the output is low.

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic.

It avoids static power dissipation by adding a clock input that defines precharge and conditional evaluation phases.

[Figure: dynamic gate. Precharge transistor Mp and evaluate transistor Me, both driven by Clk, with the PDN (inputs In1, In2, In3) between them and load capacitance CL on Out.]

Two-phase operation:
Precharge (CLK = 0)
Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
- a fan-in of n requires 2n devices (n N-type + n P-type)

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
- a fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)

Dynamic CMOS Logic operation

Precharge

When CLK goes low, the p-type transistor starts charging the precharge capacitance.

The pull-down transistor controlled by the clock is off, which keeps the precharge node from being drained.

The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.

Evaluate

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.

Inputs to the gate can make at most one transition during evaluation.

The output can be in the high-impedance state during and after evaluation (PDN off); the state is then stored on CL.

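[Figure: example dynamic gate with PDN inputs A, B and C. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off, Me is on and Out = ((A·B) + C)'.]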
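The one-way nature of the evaluation phase can be shown with a small behavioural model. This is a logic-level sketch only (no charge sharing or leakage); the class name `DynamicGate` and its methods are hypothetical.

```python
class DynamicGate:
    """Behavioural model of a dynamic CMOS gate.

    pdn -- function returning a truthy value when the pull-down
           network conducts for the given inputs
    """

    def __init__(self, pdn):
        self.pdn = pdn
        self.out = 1

    def precharge(self):
        """CLK = 0: Mp charges the output node to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: the node discharges if the PDN conducts.

        May be called repeatedly within one evaluation phase to model
        inputs changing; once discharged, the node stays at 0.
        """
        if self.pdn(*inputs):
            self.out = 0
        return self.out


# Example PDN for Out = ((A . B) + C)'
gate = DynamicGate(lambda a, b, c: (a and b) or c)
```

Calling `gate.precharge()`, then `gate.evaluate(0, 0, 1)` followed by `gate.evaluate(0, 0, 0)` leaves the output at 0: the glitch on C has discharged the node, and only the next precharge restores it.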

Properties of Dynamic Gates
The logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds, due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS:
- no glitching
- higher transition probabilities
- extra load on Clk
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn:
- low noise margin (NML)

Advantages and Disadvantages

Advantages
Low area
Higher speed than static complementary gates

Disadvantages
Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

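[Figure: two cascaded dynamic inverters (In, Out1, Out2), each with precharge transistor Mp and evaluate transistor Me on Clk, with waveforms of Clk, In, Out1 and Out2 versus time; the waveforms mark VTn on Out1 and a voltage drop ΔV on Out2.]

Only 0 → 1 transitions are allowed at the inputs.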

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
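The precharge and evaluate behaviour of a domino gate can be sketched as a small logic-level model. The class name `DominoGate` and the way the pulldown network is passed in as a Python function are assumptions made for illustration; no electrical effects are modelled.

```python
class DominoGate:
    """Dynamic stage (precharged storage node) followed by a static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn      # function of the inputs: truthy if the PDN conducts
        self.node = 1       # dynamic storage node

    def output(self):
        return 1 - self.node            # static inverter

    def precharge(self):                # CLK = 0
        self.node = 1
        return self.output()            # output is forced to 0

    def evaluate(self, *inputs):        # CLK = 1
        if self.pdn(*inputs):
            self.node = 0               # stays 0 until the next precharge
        return self.output()

# Two series NMOS devices in the PDN give a non-inverting AND gate.
and_gate = DominoGate(lambda a, b: a and b)
and_gate.precharge()
print(and_gate.evaluate(1, 1))   # PDN conducts, output rises
print(and_gate.evaluate(0, 1))   # input fell back: output cannot fall again
```

The second `evaluate` call shows why inputs must rise monotonically: once the node has been discharged, a later input change cannot restore it within the same evaluation phase.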

Properties of Domino Logic

Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed: the static inverter can be skewed, since only the L-H output transition matters
Input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

[Figure: two cascaded domino gates sharing Clk; the first (inputs In1, In2, In3) drives Out1 through its static inverter, and the second (inputs In4, In5) drives Out2]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -> 1

Clocked CMOS Logic

It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS

A gate with fan-in n requires 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
When Clk goes low, the logic is not evaluated because the clocked NMOS is off

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices

Disadvantages
More area
More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)

Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT (core logic: CMOS; interface: BJT)
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester Carry Chain
Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
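The three carry signals can be checked exhaustively with a short sketch. The function name `carry_signals` is an illustrative choice; the check confirms that exactly one of G, P, K is active for every input pair and that Cout = G + P·C matches the arithmetic carry.

```python
from itertools import product

def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                  # G = A . B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = ~A . ~B
    return g, p, k

for a, b, c in product((0, 1), repeat=3):
    g, p, k = carry_signals(a, b)
    cout = g | (p & c)                    # carry-out from G, P and carry-in
    assert g + p + k == 1                 # exactly one signal is active
    assert cout == (a + b + c) >> 1       # matches the arithmetic carry
    print(a, b, c, "G P K =", g, p, k, "Cout =", cout)
```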

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester chain adders
The carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, internal carries C1, C2, C3, sum outputs S1..S4 and Cout]
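A carry-ripple adder can be sketched at the bit level as a chain of full adders. The helper names (`full_adder`, `ripple_add`, `to_bits`, `from_bits`) and the LSB-first bit lists are illustrative assumptions, not notation from the slides.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry passes to the next cell
        s_bits.append(s)
    return s_bits, carry

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

s_bits, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), cout)   # 4-bit sum and carry-out of 11 + 6
```

Because each cell waits for the carry of the previous one, the loop mirrors the Cin-to-Cout critical path that grows linearly with word length.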

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel

The output carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
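Unrolling C1 = G0 + P0·C0 for higher bits gives each carry as a function of the G and P signals and C0 only, for example C2 = G1 + P1·G0 + P1·P0·C0. The following sketch, with illustrative names `cla_carries` and `and_all`, computes every carry from that flattened form and never from the previous carry; it models the logic only, not gate delays.

```python
def and_all(bits):
    """AND of a list of bits (1 for an empty list)."""
    result = 1
    for b in bits:
        result &= b
    return result

def cla_carries(a_bits, b_bits, c0=0):
    """Return [C0, C1, ..., Cn] for LSB-first bit lists."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    carries = [c0]
    for i in range(len(g)):
        # C(i+1) = Gi + Pi.G(i-1) + ... + Pi...P0.C0
        c = and_all(p[:i + 1]) & c0
        for j in range(i + 1):
            c |= g[j] & and_all(p[j + 1:i + 1])
        carries.append(c)
    return carries

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

print(cla_carries(to_bits(11, 4), to_bits(6, 4)))
```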

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks:
One block with logical 0 Cin
Another block with logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
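A minimal sketch of the carry-select idea and its delay formula follows. It works on Python integers rather than gates, assumes the word length is a multiple of the block size, and the function names and the example values of K1 and K2 are illustrative assumptions.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, n_bits=8, block=4, cin=0):
    """Each block is added twice (Cin = 0 and Cin = 1); a mux picks one."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, n_bits, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)   # assumes Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)   # assumes Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # 2:1 multiplexer
        result |= s << shift
    return result, carry

def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 as given above."""
    return p * k1 + (m - 1) * k2

print(carry_select_add(200, 100))                       # 8-bit sum and carry-out
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))     # arbitrary unit delays
```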

Carry Skip Adder (CSA)

It is also called a carry bypass adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

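Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into a skip signal that selects the carry output]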

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0

else C3 = output carry of the stage 4 adder

Delay of CSA

The worst-case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2, where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
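The carry-skip delay formula can be compared with a plain ripple adder in a few lines. The ripple delay of n·K1 is an assumption (one cell delay per bit), and the unit delays and block sizes below are arbitrary illustrative values.

```python
def ripple_delay(n_bits, k1):
    """Assumed ripple-carry delay: one adder-cell delay per bit."""
    return n_bits * k1

def carry_skip_delay(p, m, k1, k2):
    """T = 2(P - 1)*K1 + (M - 2)*K2, with P cells per block and M blocks."""
    return 2 * (p - 1) * k1 + (m - 2) * k2

K1, K2 = 1.0, 0.5   # hypothetical unit delays
for n_bits, p in ((16, 4), (32, 4), (64, 8)):
    m = n_bits // p
    print(n_bits, "bits:",
          "ripple =", ripple_delay(n_bits, K1),
          "carry-skip =", carry_skip_delay(p, m, K1, K2))
```

With these assumed numbers the skip adder's delay grows with the number of blocks rather than the number of bits, which is the point of bypassing blocks whose bits all propagate.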

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
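The reduction of partial products to a carry vector and a sum vector can be sketched with carry-save (3:2) addition on Python integers. This is a word-level illustration for unsigned operands, not a gate-level Wallace tree; the function names and the grouping of rows in threes are assumptions of the sketch.

```python
def carry_save(x, y, z):
    """3:2 compression: three vectors in, (sum, carry) vectors out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c

def wallace_multiply(a, b, n_bits=8):
    """Multiply unsigned n-bit values; returns (product, reduction_levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(carry_save(*rows[i:i + 3]))    # compress groups of three
        nxt.extend(rows[len(rows) - len(rows) % 3:])  # pass leftovers through
        rows = nxt
        levels += 1
    return sum(rows), levels    # final conventional adder

product, levels = wallace_multiply(13, 11, n_bits=4)
print(product, levels)
```

Each level shrinks the row count by roughly a third, so the number of levels grows logarithmically with the number of partial products.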

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Complementary Pass Transistor Logic

[Figure: (a) a pass-transistor network and its inverse network, both driven by A, A', B and B', producing F and F'; (b) CPL circuits for AND/NAND (F = AB), OR/NOR (F = A + B) and EXOR/NEXOR (F = A ⊕ B), each with its complemented output]

Advantages and Disadvantages

Advantages
Fewer transistors
No static power consumption

Disadvantages
Output voltage degrades
Not an ideal switch due to series resistance
Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
Fewer transistors compared to CMOS
No static power consumption
Efficient building of complex gates

Disadvantages
Not an ideal switch due to series resistance
Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
Zpu/Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay

Sizing of NMOS Nand Gate

The ratio between the pull-up and all n pull-down transistors, Zpu : (n·Zpd), must be at least 4:1 to produce the correct output voltage level

nMOS NAND gate geometry reveals two factors:
The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
The delay also increases in direct proportion to the number of inputs added
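The ratio rule can be turned into a one-line calculation. The sketch assumes Z denotes the length-to-width ratio (L/W) of a transistor, as is conventional for nMOS ratio rules, and the function name and default values are illustrative.

```python
def min_pullup_z(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu satisfying Zpu / (n * Zpd) >= ratio."""
    return ratio * n_inputs * z_pd

for n in (1, 2, 3, 4):
    # n = 1 is the plain inverter; each extra series input raises Zpu.
    print(n, "input(s): Zpu >=", min_pullup_z(n))
```

The required pull-up Z grows linearly with the number of series pull-down transistors, which is why both area and delay increase with each added NAND input.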

NMOS Nor Gate

Since the pull-down transistors are in parallel in a NOR gate, the pull-up to pull-down ratio is the same for every pull-down transistor, so the gate has the same characteristics as the inverter
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
So the NOR gate is preferable to the NAND gate

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND respectively
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but the geometry should be kept symmetric by:
Allowing for extended fall times of the series nMOS transistors (because of their series resistance)
Keeping the transfer characteristic centred on Vdd/2

The CMOS NOR gate has series p-transistors, which increase the resistance and delay
This affects the transfer characteristics and reduces noise immunity
So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2 gates, built from equivalent resistances Rp and Rn (Req), load capacitance CL and internal node capacitance Cint]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL

High to low transition
both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with parallel pull-up resistances Rp, series pull-down resistances Rn, internal capacitance Cint and load CL]

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1]

Input data pattern    Delay (psec)
A=B=0->1              67
A=1, B=0->1           64
A=0->1, B=1           61
A=B=1->0              45
A=1, B=1->0           80
A=1->0, B=1           81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

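Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to 1

[Figure: static CMOS gate with inputs In1 ... InN driving both the pull-up network (PUN) connected to VDD and the pull-down network (PDN), with output F]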

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is VDD, both the p-type and n-type transistors are on, and both determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages
Because of the higher pull-up resistance, delay is larger and hence circuit speed is lower
Higher static power dissipation due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases

[Figure: dynamic gate with PDN inputs In1, In2, In3, precharge transistor Mp and evaluation transistor Me both driven by Clk, and output Out loaded by CL]

Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path
A fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
A fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
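The device counts quoted for the different logic families can be tabulated in a short sketch. The pseudo-NMOS count of n + 1 follows from replacing the pull-up network with a single p-type transistor, the clocked CMOS count of 2n + 2 from adding two clocked devices to a static gate, and the domino count of n + 4 assumes a two-transistor static inverter on a dynamic gate; the function and dictionary keys are illustrative names.

```python
def transistor_count(style, n):
    """Transistors needed for a gate with fan-in n in a given logic style."""
    counts = {
        "static": 2 * n,          # n N-type + n P-type
        "pseudo_nmos": n + 1,     # n N-type + 1 always-on P-type
        "dynamic": n + 2,         # n + 1 N-type + 1 P-type
        "domino": n + 4,          # dynamic gate + 2-transistor inverter
        "clocked": 2 * n + 2,     # static gate + 2 clocked devices
    }
    return counts[style]

for n in (2, 4, 8):
    print(n, {s: transistor_count(s, n)
              for s in ("static", "pseudo_nmos", "dynamic", "domino", "clocked")})
```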

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pulldown transistor controlled by the clock keeps the precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

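[Figure: example dynamic gate with inputs A, B and C computing Out = ((A·B) + C)'; during precharge Mp is on and Me is off (Out = 1), during evaluation Mp is off and Me is on]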
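The example gate can be modelled at the logic level to show both the inverting function and the rule that a discharged output stays low until the next precharge. The class name `DynamicGate` and the use of a Python function for the pulldown network are assumptions of this sketch.

```python
from itertools import product

class DynamicGate:
    """Precharged output node with a conditional discharge path (PDN)."""

    def __init__(self, pdn):
        self.pdn = pdn      # function of the inputs: truthy if the PDN conducts
        self.out = 1

    def precharge(self):            # CLK = 0: Mp on, Me off
        self.out = 1
        return self.out

    def evaluate(self, *inputs):    # CLK = 1: Mp off, Me on
        if self.pdn(*inputs):
            self.out = 0            # cannot recover before the next precharge
        return self.out

# PDN: A and B in series, in parallel with C.
gate = DynamicGate(lambda a, b, c: (a and b) or c)

for a, b, c in product((0, 1), repeat=3):
    gate.precharge()
    print(a, b, c, "->", gate.evaluate(a, b, c))
```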

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

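[Figure: domino gate. A dynamic stage (PDN with inputs In1, In2 and In3, precharge transistor Mp, evaluate transistor Me, clocked by Clk) drives a static inverter whose output is Out1. During evaluation the dynamic node makes a 1 → 1 or 1 → 0 transition.]

Domino logic is the combination of a dynamic logic stage and a static inverter.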

Domino CMOS Logic operation

Precharge phase: same as dynamic logic.

Evaluate phase: modified from dynamic logic.

When CLK goes high, precharging stops, i.e. the p-type pullup turns off.

The evaluation phase begins, i.e. the n-type pulldown turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.

If no conducting path is created, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.

Smaller area compared to static CMOS.

Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.

Very high speed: the static inverter can be skewed, since only its L-H output transition matters.

Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

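[Figure: two cascaded domino gates. The first (inputs In1, In2, In3, output Out1) and the second (inputs In4, In5, output Out2) each have a PDN, precharge transistor Mp and evaluate transistor Me clocked by Clk, followed by a static inverter. Dynamic nodes make 1 → 1 or 1 → 0 transitions; outputs make 0 → 0 or 0 → 1 transitions.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.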
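The effect of the static inverter on cascading can be sketched as follows. This is a hedged behavioural model only: `domino_stage` and `domino_and_chain` are hypothetical names, each stage is assumed to be a 2-input domino AND gate, and all timing is abstracted away.

```python
def domino_stage(clk, node, pdn_conducts):
    """Return (dynamic_node, output) of one domino stage after one step."""
    if clk == 0:
        node = 1                  # precharge the dynamic node
    elif pdn_conducts:
        node = 0                  # evaluate: the PDN discharges the node
    return node, 1 - node         # the static inverter drives the output


def domino_and_chain(clk, nodes, first_input, enables):
    """Chain of domino AND gates: stage i computes previous_output AND enables[i]."""
    new_nodes, outputs = [], []
    signal = first_input
    for node, enable in zip(nodes, enables):
        node, signal = domino_stage(clk, node, bool(signal and enable))
        new_nodes.append(node)
        outputs.append(signal)
    return new_nodes, outputs


# Precharge: every stage output is forced to 0, whatever the inputs are.
nodes, outs_pre = domino_and_chain(0, [0, 0, 0], first_input=1, enables=[1, 1, 1])
# Evaluate: outputs can only rise, one stage after another, like falling dominoes.
nodes, outs_eval = domino_and_chain(1, nodes, first_input=1, enables=[1, 1, 1])
```

Because every stage output is 0 after precharge, no downstream PDN can conduct before its own inputs have genuinely risen.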

Clocked CMOS Logic

It is static CMOS with two additional clocked transistors: Clk drives one additional NMOS and Clk' drives one additional PMOS.

A gate with fan-in n therefore needs 2n+2 transistors.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.

When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages

Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages

More area and more complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below.

Alternate stages of N logic with stages of P logic.

N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.

P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:

Stage 1: X = (A · B)'

Stage 2: G = X' + Y'

Stage 3: Z = (F · G + H)'

[Figure: three-stage N-P cascade with stage outputs X, G and Z]
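The three example stages can be checked with a few lines of Python. This is only a Boolean sketch of the stage equations; the function name `np_cascade` is an assumption, and the precharge and evaluate phases of the real circuit are not modelled.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages; all inputs are 0 or 1."""
    x = 1 - (a & b)              # Stage 1: X = (A·B)'
    g = (1 - x) | (1 - y)        # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)        # Stage 3: Z = (F·G + H)'
    return x, g, z


# A = B = 1 pulls X low, which raises G, which (with F = 1) pulls Z low.
example = np_cascade(a=1, b=1, y=1, f=1, h=0)
```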

BiCMOS Technology

CMOS properties: low power, slower.

BJT properties: faster, high power.

Implementation: CMOS and BJT together, with CMOS for the core logic and BJT for the interface.

Basic circuits: inverter, NAND gate, NOR gate.

BiCMOS Circuits

[Figure: BiCMOS inverter and NAND gate]

Structured Design

A digital block or standard cell is designed at the following levels: logic level, circuit level, stick diagram, layout.

Examples: Combinational Logic

5-way selector

Multiplexers and demultiplexers

Encoders and decoders

Parity generator

Bus arbitration logic

Gray code to binary code converter

Adders

1-bit full adder

Manchester carry chain

Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders

32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for the adder

The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ

i) NMOS inverter (8:1 and 4:1 ratio)

   Butting contact: L = 22λ, W = 10λ

   Buried contact: L = 30λ, W = 10λ

ii) CMOS inverter: L = 35λ, W = 18λ

Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1 independent of C; G = A · B

Propagate: Cout = C; P = A ⊕ B

Kill: Cout = 0 independent of C; K = ~A · ~B
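The three signals can be checked exhaustively against ordinary addition. The sketch below is illustrative only; the helper names `gpk` and `carry_out` are assumptions, and the carry equation Cout = G + P·C is the standard consequence of the definitions above.

```python
from itertools import product


def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through G and P."""
    g, p, _ = gpk(a, b)
    return g | (p & c)


# Exhaustive check over all eight input combinations of a full adder.
all_match = all(carry_out(a, b, c) == (a + b + c) // 2
                for a, b, c in product((0, 1), repeat=3))
```

For any input pair exactly one of G, P and K is 1, which is what lets a switch-logic carry chain use them as mutually exclusive controls.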

Manchester Carry Chain

Built from switch logic using propagate, generate and kill.

Manchester chain adders

The carry input is precharged with the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

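Simplest design: cascade full adders.

The critical path goes from Cin to Cout.

Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, intermediate carries C1..C3, carry input Cin and carry output Cout]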
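A cascade of full adders can be sketched as below. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) and the least-significant-bit-first list convention are assumptions chosen for this illustration.

```python
def full_adder(a, b, cin):
    """Return (sum, carry_out) for one bit position."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists given least significant bit first."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # the carry ripples to the next cell
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
```

Each cell has to wait for the carry of the cell before it, which is why the Cin to Cout path sets the delay.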

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is expressed in terms of generate and propagate signals.

The expressions for a 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
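The parallel carry computation can be sketched in Python. The slide's own list of expressions did not survive extraction, so the expanded forms below are the standard textbook ones, written to agree with C1 = G0 + P0C0; the function names `cla4_carries` and `cla4_add` are assumptions.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Return [C1, C2, C3, C4] for 4-bit operands given least significant bit first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # Every carry depends only on G, P and C0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """Add two 4-bit integers using the look-ahead carries."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    carries = [c0] + cla4_carries(a_bits, b_bits, c0)
    total = 0
    for i in range(4):
        total |= (a_bits[i] ^ b_bits[i] ^ carries[i]) << i   # S_i = P_i xor C_i
    return total | (carries[4] << 4)
```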

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks: one block with Cin = logical 0 and another block with Cin = logical 1.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
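The selection scheme can be sketched at word level as follows. This is a functional illustration under stated assumptions: `add_block` and `carry_select_add` are hypothetical names, the 8-bit width and 4-bit block size are example values, and the width must be a multiple of the block size.

```python
def add_block(a, b, cin, width):
    """Add two `width`-bit integers; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry select addition; `width` must be a multiple of `block`."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        sum0, cout0 = add_block(a_blk, b_blk, 0, block)   # block assuming Cin = 0
        sum1, cout1 = add_block(a_blk, b_blk, 1, block)   # block assuming Cin = 1
        # 2:1 multiplexers pick the precomputed result once the real carry is known.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << shift
    return result, carry
```

In hardware both candidate results are computed at the same time, so only the multiplexer delay is added per block once the incoming carry arrives.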

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where M = number of blocks in the adder, P = number of cells in each block, K1 = delay through one adder cell, and K2 = delay through the multiplexer.

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages.

If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

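[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 that form the skip signal selecting the carry output]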

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,

else C3 = output carry of the stage 4 adder.
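The bypass rule above can be sketched for one 4-bit block. The names `bits4` and `carry_skip_block` are assumptions for illustration, and the model is purely functional: it shows which carry the bypass multiplexer selects, not how much time is saved.

```python
def bits4(value):
    """Split a 4-bit integer into a list of bits, least significant first."""
    return [(value >> i) & 1 for i in range(4)]


def carry_skip_block(a_bits, b_bits, c0):
    """One 4-bit block. Return (sum_bits, carry_out, skipped)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = c0
    sum_bits = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sum_bits.append(pi ^ carry)
        carry = (a & b) | (pi & carry)       # ripple carry inside the block
    skip = all(p)                             # P0 and P1 and P2 and P3
    carry_out = c0 if skip else carry         # bypass multiplexer
    return sum_bits, carry_out, skip


# 1010 + 0101: every bit propagates, so the carry input is passed straight through.
_, cout, skipped = carry_skip_block(bits4(0b1010), bits4(0b0101), 1)
```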

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where K1 = delay through one adder cell and K2 = delay for skipping the carry over a block.
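The delay expressions for the carry select and carry skip adders can be compared numerically. In the sketch below the function names, the 16-bit example (M = 4 blocks of P = 4 cells) and the delay values K1 = 1.0 and K2 = 0.5 are illustrative assumptions, and the ripple figure assumes the carry passes through all n cells.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumption: the carry ripples through all n adder cells."""
    return n * k1


# Hypothetical 16-bit adder: 4 blocks of 4 cells, arbitrary delay units.
select_t = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
skip_t = carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5)
ripple_t = ripple_delay(n=16, k1=1.0)
```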

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier

Braun array

2's complement multiplication using the Baugh-Wooley method

Pipelined multiplier array

Modified Booth's algorithm

Wallace tree multipliers

Recursive decomposition of multiplication

Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.

Completion time is proportional to log2 n.

It is an optimized column adder tree.

It combines all partial products into 2 vectors (carry and sum).

The carry and sum outputs are combined using a conventional adder.

It compresses the number of stages of partial products.
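The reduction to two vectors can be sketched with word-level carry-save addition. This is a simplified illustration rather than a gate-accurate Wallace tree: `carry_save_add` and `wallace_multiply` are hypothetical names, rows are grouped three at a time per level, and Python integers stand in for the bit columns.

```python
def carry_save_add(x, y, z):
    """3:2 compression of three words: return (sum_vector, carry_vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a, b, width=4):
    """Multiply two unsigned `width`-bit integers; return (product, reduction_levels)."""
    # Partial products: a shifted by i for every set bit i of b.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):                  # full groups of three rows
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        next_rows.extend(rows[len(rows) - len(rows) % 3:])    # leftover rows pass through
        rows = next_rows
        levels += 1
    return sum(rows), levels      # final conventional (carry-propagate) addition


product, levels = wallace_multiply(13, 11)
```

Each level turns every three rows into two without propagating carries, so the number of levels grows roughly with the logarithm of the number of partial products.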

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop

Two-phase clocking

Dynamic shift register

RAM

ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.

Generate and verify each section of the design.

Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages

Fewer transistors compared to CMOS.

No static power consumption.

Efficient building of complex gates.

Disadvantages

Not an ideal switch, due to series resistance.

The delay of series transmission gates is large.

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:

Zpu : Zpd ratio

Pull-up and pull-down resistance

Power dissipation

Standard unit of capacitance

Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to produce the correct output voltage level.

The nMOS NAND gate geometry reveals two factors:

The area of a NAND gate is greater than the area of an inverter, because there are more pull-down transistors and a corresponding increase in the length of the pull-up transistor.

Delay also increases in direct proportion to the number of inputs added.

NMOS Nor Gate

Since the pull-down transistors are in parallel in a NOR gate, the pull-up to pull-down ratio is the same for every transistor, so the gate has the same characteristics as an inverter.

The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.

So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively.

• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.

• Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

A CMOS NAND gate has no ratio restrictions like those of the nMOS NAND gate, but its geometry should be kept symmetric by:

Allowing for the extended fall times of the series nMOS transistors (due to series resistance).

Keeping the transfer characteristic centred on Vdd/2.

A CMOS NOR gate has series p-transistors, which increase the resistance and delay.

This affects the transfer characteristics and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

Static CMOS Logic

Pseudo NMOS Logic

Dynamic CMOS Logic

Domino CMOS Logic

Clocked CMOS Logic

n-p CMOS Logic

Static CMOS Switch Delay Model

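[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates, showing the equivalent on-resistances Rp and Rn (Req) controlled by inputs A and B, the internal node capacitance Cint and the load capacitance CL]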

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

Low-to-high transition:

  both inputs go low: delay is 0.69 (Rp/2) CL

  one input goes low: delay is 0.69 Rp CL

High-to-low transition:

  both inputs go high: delay is 0.69 (2Rn) CL
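These first-order expressions are easy to evaluate numerically. In the sketch below the function name `nand2_delays` and the resistance values are hypothetical, chosen only to show the arithmetic, and the internal capacitance Cint is ignored.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds from the switch model (Cint ignored)."""
    return {
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,   # two PMOS in parallel
        "low_to_high_one_input_falls": 0.69 * rp * cl,          # a single PMOS conducts
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,   # two NMOS in series
    }


# Hypothetical values: Rp = Rn = 10 kOhm, CL = 100 fF.
delays = nand2_delays(rp=10e3, rn=10e3, cl=100e-15)
```

With equal Rp and Rn, the model predicts the pull-down through two series NMOS to be four times slower than the pull-up through two parallel PMOS.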

[Figure: NAND2 switch model with inputs A and B, on-resistances Rp and Rn, internal capacitance Cint and load capacitance CL]

Delay Dependence on Input Patterns

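[Figure: simulated NAND2 output voltage (V, from -0.5 to 3) against time (ps, from 0 to 400) for the input patterns A = B = 1 → 0; A = 1, B = 1 → 0; and A = 1 → 0, B = 1]

Input data pattern      Delay (ps)
A = B = 0 → 1           67
A = 1, B = 0 → 1        64
A = 0 → 1, B = 1        61
A = B = 1 → 0           45
A = 1, B = 1 → 0        80
A = 1 → 0, B = 1        81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF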

Pseudo NMOS Logic

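Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one.

[Figure: static CMOS gate with output F between VDD and ground, where inputs In1, In2, ..., InN drive both the PUN and the PDN]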

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate.

The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.

The p-type transistor is used as a resistor.

When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.

When the gate input is VDD, both the p-type and n-type transistors are on and together determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout (V) against Vin (V) from 0.0 to 2.5 V, with curves for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.

Disadvantages

The higher pull-up resistance increases delay, so circuits are slower.

There is more static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic.

It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.

[Figure: dynamic gate with PDN inputs In1, In2 and In3, precharge transistor Mp and evaluate transistor Me both driven by Clk, output Out and load capacitance CL]

Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1).

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.

  A fan-in of n requires 2n devices (n N-type + n P-type).

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.

  A fan-in of n requires only n + 2 transistors (n+1 N-type + 1 P-type).
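The device counts can be tabulated for a given fan-in. In this small sketch the function name `transistor_count` is an assumption, and the domino entry is not stated in the text: it assumes a dynamic stage plus a two-transistor static inverter.

```python
def transistor_count(style, n):
    """Number of transistors needed for a gate with fan-in n."""
    counts = {
        "static_cmos": 2 * n,     # n N-type + n P-type
        "dynamic_cmos": n + 2,    # n+1 N-type + 1 P-type
        "domino_cmos": n + 4,     # assumption: dynamic stage plus a 2-transistor inverter
    }
    return counts[style]


# For a 4-input gate: 8 devices in static CMOS against 6 in dynamic CMOS.
static_4 = transistor_count("static_cmos", 4)
dynamic_4 = transistor_count("dynamic_cmos", 4)
```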

Dynamic CMOS Logic operation

Precharge

When CLK goes low, the p-type transistor starts charging the precharge capacitance.

The pulldown transistor controlled by the clock keeps that precharge node from being drained.

The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate

When CLK goes high, precharging stops, i.e. the p-type pullup turns off.

The evaluation phase begins, i.e. the n-type pulldown turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0. If no conducting path is created, the gate's output is left charged at logic 1.

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

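[Figure: domino gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me driven by Clk, and a static inverter producing Out1. The dynamic node is annotated 1 -> 1 or 1 -> 0.]

Domino logic is a dynamic logic stage followed by a static inverter.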

Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
• The evaluation phase begins, i.e. the n-type pulldown turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed: the static inverter can be skewed, since only its L-H transition matters.
• Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

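[Figure: two cascaded domino gates. The first has inputs In1, In2, In3 and output Out1; the second has inputs In4, In5 and output Out2. Each has a PDN, precharge transistor Mp and evaluation transistor Me driven by Clk, followed by a static inverter. Annotated transitions: 1 -> 1 or 1 -> 0 on the dynamic nodes, 0 -> 0 or 0 -> 1 on the outputs.]

• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 -> 1.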
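The cascading rule can be sketched behaviourally as below. This is an illustration under assumed names (domino_stage, and2, or2, domino_chain), with each dynamic node reduced to one bit.

```python
def domino_stage(pdn, inputs, clk):
    """Dynamic node followed by a static inverter (behavioural sketch)."""
    node = 1                          # precharged while clk == 0
    if clk == 1 and pdn(*inputs):     # evaluate: PDN may discharge the node
        node = 0
    return 1 - node                   # static inverter


def and2(a, b):
    return a & b


def or2(a, b):
    return a | b


def domino_chain(a, b, c, clk):
    """Out1 = A AND B feeds the second stage, Out2 = Out1 OR C."""
    out1 = domino_stage(and2, (a, b), clk)
    out2 = domino_stage(or2, (out1, c), clk)
    return out1, out2


print(domino_chain(1, 1, 0, clk=0))  # precharge: every stage output is 0
print(domino_chain(1, 1, 0, clk=1))  # evaluate: outputs can only rise
```

Because every stage output is 0 during precharge, the next stage never sees a spurious 1 at the start of evaluation, and only non-inverting functions (AND, OR) appear in the chain.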

Clocked CMOS Logic
It is a combination of:
• Static CMOS logic
• Clk input applied to one additional NMOS transistor
• Clk' input applied to one additional PMOS transistor
A gate with fan-in n requires 2n+2 transistors.
• When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
• When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
• More area
• More complexity

N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
• Alternate stages of N logic with stages of P logic.
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
• P logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
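The three stage equations can be checked with a short truth-value sketch. Only the Boolean functions are modelled, not the clocking; the function name np_example is hypothetical and Y, F and H are treated as independent inputs.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three example stages for 0/1 inputs."""
    x = 1 - (a & b)            # Stage 1: X = (A · B)'
    g = (1 - x) | (1 - y)      # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # Stage 3: Z = (F · G + H)'
    return x, g, z


print(np_example(1, 1, 1, 1, 0))
print(np_example(0, 0, 1, 1, 0))
```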

BiCMOS Technology
CMOS properties
• Low power
• Slower
BJT properties
• Faster
• High power
Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT
Basic circuits
• Inverter
• NAND gate
• NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter

Adders
• 1-bit full adder
• Manchester Carry Chain Adder
• Enhancement techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
• 32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder
Adder is made up of standard cells of:
• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
    Butting contact: L = 22λ, W = 10λ
    Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal
For a full adder, define what happens to carries:
• Generate: Cout = 1 independent of C
    G = A · B
• Propagate: Cout = C
    P = A ⊕ B
• Kill: Cout = 0 independent of C
    K = ~A · ~B
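The three definitions can be checked exhaustively against ordinary addition. This is a small sketch with hypothetical function names (pgk, carry_out); bits are plain 0/1 integers.

```python
from itertools import product


def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b                  # G = A · B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = ~A · ~B
    return g, p, k


def carry_out(a, b, c):
    """Cout is 1 on generate, C on propagate, 0 on kill."""
    g, p, _k = pgk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    # Exactly one of G, P, K is active, and Cout matches the arithmetic carry.
    assert sum(pgk(a, b)) == 1
    assert carry_out(a, b, c) == (a + b + c) >> 1
```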

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
• The carry input is precharged with the clock signal instead of passing through the logic.
• The carry path is gated by the Propagate signal (P = A^B) with a single n-type pass transistor.
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

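• Simplest design: cascade full adders.
• The critical path goes from Cin to Cout.
• Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and output Cout]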
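A minimal sketch of the cascade, assuming bit lists ordered least-significant bit first. The helper names (full_adder, ripple_add, to_bits, from_bits) are illustrative, not part of the original slides.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from Cin to Cout."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):   # LSB first
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), cout)   # 11 + 6 = 17 -> low four bits 1, carry out 1
```

The loop makes the critical path visible: each stage has to wait for the carry of the stage before it.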

Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
• The output carry is represented in terms of generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
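Starting from C1 = G0 + P0C0 and applying C(i+1) = Gi + PiCi repeatedly gives the usual flattened carry expressions. The sketch below writes them out for four bits, assuming G = A·B and P = A xor B as defined earlier; the function name cla4 is hypothetical and bits are LSB first.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead adder; all carries come straight from G, P, C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]   # Si = Pi xor Ci
    return sums, c4


print(cla4([1, 1, 0, 1], [0, 1, 1, 0]))   # 11 + 6, bits LSB first
```

No carry expression depends on another carry, which is what allows them to be generated in parallel.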

Carry Select Adder

• It is also referred to as a conditional-sum adder.
• The adder is divided into two blocks: one block computed with Cin = logical 0, and another block computed with Cin = logical 1.
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.
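The selection step can be sketched at word level as follows. The block width, the function names (add_block, carry_select_add) and the use of Python integer addition inside each block are assumptions made for brevity.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values; returns (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition of two width-bit numbers, block bits at a time."""
    result, carry = 0, cin
    mask = (1 << block) - 1
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)   # block assuming Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)   # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # 2:1 multiplexer
        result |= s << shift
    return result, carry


print(carry_select_add(200, 100))   # 300 -> (44, 1) in 8 bits
```

In hardware both candidate results of a block are ready before the real carry arrives, so only the multiplexer delay is added per block.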

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M-1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through a multiplexer

Carry Skip Adder (CSA)

• It is also called a Carry Bypass Adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

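[Figure: two full-adder (FA) cells with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1 combined into a skip signal, and the bypassed carry output ci+1]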

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder

Delay of CSA

Worst-case delay T is given by

T = 2(P-1)·K1 + (M-2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
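The carry-select and carry-skip delay expressions can be compared numerically. The values below (16-bit adder, four blocks of four cells, unit delays) are illustrative assumptions, not figures from the slides, and the function names are hypothetical.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Assumed example: P = 4 cells per block, M = 4 blocks, K1 = K2 = 1 unit.
print(carry_select_delay(4, 4, 1, 1))   # 4 + 3 = 7 units
print(carry_skip_delay(4, 4, 1, 1))     # 6 + 2 = 8 units
```

K2 means a different thing in each formula, so the comparison only holds if the multiplexer delay and the skip delay are taken to be similar.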

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• Optimized column adder tree.
• Combines all partial products into 2 vectors (carry and sum).
• Carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
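The reduction to two vectors can be sketched with word-level carry-save (3:2) compression. This simplification groups whole partial-product rows in threes rather than counting bits per column as a real Wallace tree does; the function names are hypothetical and n must be at least 2.

```python
def compress_3_2(x, y, z):
    """Carry-save step: three words in, a sum word and a carry word out."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n=8):
    """Multiply two n-bit unsigned numbers; returns (product, tree levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            reduced.extend(compress_3_2(*rows[i:i + 3]))
        reduced.extend(rows[full:])        # leftover rows pass through
        rows = reduced
        levels += 1
    # Final carry and sum vectors go to a conventional adder.
    return rows[0] + rows[1], levels


print(wallace_multiply(13, 11))   # (143, 4)
```

For eight partial products the row count shrinks 8, 6, 4, 3, 2, so four compression levels are needed before the final addition.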

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

• Complex systems are designed using a top-down approach with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates
Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

• The ratio between the pull-up and all n pull-down transistors in series, Zpu : n·Zpd, must be at least 4:1 to produce a correct output voltage level.
• The nMOS NAND gate geometry reveals two factors:
    - The area of the NAND gate is greater than the area of an inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
    - The delay also increases in direct proportion to the number of inputs added.
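The ratio rule can be turned into a one-line sizing check. The sketch assumes identical pull-down transistors, each of impedance z_pd in arbitrary units; the function name and default values are hypothetical.

```python
def min_pullup_impedance(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu satisfying Zpu : n*Zpd >= ratio for an nMOS NAND gate."""
    return ratio * n_inputs * z_pd


for n in (1, 2, 3, 4):
    print(n, min_pullup_impedance(n))   # Zpu grows linearly with fan-in
```

The linear growth of Zpu is what lengthens the pull-up transistor and slows the gate as inputs are added.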

NMOS Nor Gate
• Since the pull-down transistors are in parallel in a NOR gate, the pull-up to pull-down ratio is the same for every input, so the gate has the same characteristics as an inverter.
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
• So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins
    VOH and VOL are at VDD and GND respectively
• No static power consumption
    There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but its geometry must be chosen to:
    - allow for the extended fall times caused by the series nMOS transistors (series resistance)
    - keep the transfer characteristic centred about Vdd/2
• The CMOS NOR gate has series p-transistors, which increase the resistance and the delay.
• This affects the transfer characteristic and reduces noise immunity.
• So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

Static CMOS Switch Delay Model

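[Figure: switch-level delay models of the inverter (INV), NAND2 and NOR2. Each transistor controlled by input A or B is replaced by a switch with on-resistance Rn or Rp (equivalent resistance Req); CL is the load capacitance and Cint the internal node capacitance.]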

Input Pattern Effects on Delay

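Delay is dependent on the pattern of inputs.
• Low-to-high transition
    - both inputs go low: delay is 0.69 (Rp/2) CL
    - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
    - both inputs go high: delay is 0.69 (2Rn) CL

[Figure: switch model of the 2-input NAND gate with parallel pull-ups Rp (inputs A, B), series pull-downs Rn (inputs A, B), internal capacitance Cint and load CL]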
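The three first-order expressions can be evaluated directly. The resistance and capacitance values in the usage lines are assumed for illustration only, and the function name nand2_delays is hypothetical; the internal capacitance Cint is ignored, as in the expressions above.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds for on-resistances rp, rn and load cl."""
    return {
        "low-to-high, both inputs go low": 0.69 * (rp / 2) * cl,
        "low-to-high, one input goes low": 0.69 * rp * cl,
        "high-to-low, both inputs go high": 0.69 * (2 * rn) * cl,
    }


# Assumed values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
for pattern, delay in nand2_delays(10e3, 5e3, 100e-15).items():
    print(f"{pattern}: {delay * 1e12:.0f} ps")
```

The low-to-high delay halves when both pull-ups conduct in parallel, which is the input-pattern dependence the slide describes.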

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] (0 to 400 ps) for three input patterns: A = B = 1->0; A = 1, B = 1->0; A = 1->0, B = 1]

Input data pattern      Delay (ps)
A = B = 0->1            67
A = 1, B = 0->1         64
A = 0->1, B = 1         61
A = B = 1->0            45
A = 1, B = 1->0         80
A = 1->0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

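Pseudo-NMOS logic is obtained from static CMOS logic by replacing the N-input pull-up network (PUN) with a single pull-up transistor.

[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the PUN (connected to VDD) and the PDN, with output F]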

Pseudo NMOS Operation
• The pulldown network of the gate is the same as for a fully complementary gate.
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate input is Vdd, both the p-type and n-type transistors are on, and both operate to determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V] (0 to 2.5 V), plotted for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity.
Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower.
• More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.

[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me both driven by Clk, and output Out loaded by CL]

Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)

Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
    - A fan-in of n requires 2n (n N-type + n P-type) devices.
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
    - A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors.
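The device counts quoted for the different logic styles can be tabulated with a small helper. The pseudo-NMOS count of n + 1 is inferred from its single pull-up transistor rather than stated as a formula in the slides; the function name is hypothetical.

```python
def transistor_count(fan_in, style):
    """Transistors needed for one gate of the given fan-in and logic style."""
    counts = {
        "static": 2 * fan_in,        # n N-type + n P-type
        "pseudo-nmos": fan_in + 1,   # n N-type + 1 P-type load (inferred)
        "dynamic": fan_in + 2,       # n+1 N-type + 1 P-type
        "clocked": 2 * fan_in + 2,   # static CMOS plus two clocked devices
    }
    return counts[style]


for style in ("static", "pseudo-nmos", "dynamic", "clocked"):
    print(style, transistor_count(4, style))
```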

Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pulldown transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
• The evaluation phase begins, i.e. the n-type pulldown turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation
• Precharge phase (same as dynamic logic)
• Evaluate phase (modification to dynamic logic)
  - When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
  - The evaluation phase begins, i.e. the n-type pull-down turns on
  - The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  - If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
  - If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  - the static inverter can be skewed, since only its L-H output transition is critical
  - input capacitance is reduced, giving a smaller logical effort

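Cascading Domino Logic Gates

[Figure: two cascaded domino gates. Stage 1: PDN with inputs In1, In2, In3 between Mp and Me, followed by a static inverter driving Out1. Stage 2: PDN with inputs In4, In5, followed by a static inverter driving Out2. Dynamic nodes make 1 → 1 or 1 → 0 transitions; inverter outputs make 0 → 0 or 0 → 1 transitions.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 → 1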
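A minimal behavioural sketch of two cascaded domino gates follows. The class name `DominoGate` and the two example functions (AND, OR) are hypothetical choices for illustration; timing, charge sharing and leakage are ignored.

```python
class DominoGate:
    """Behavioural model of a domino gate (hypothetical helper class)."""

    def __init__(self, pdn):
        self.pdn = pdn        # pdn(*inputs) is truthy when the PDN conducts
        self.node = 1         # dynamic storage node

    @property
    def out(self):
        return 1 - self.node  # static inverter on the dynamic node

    def precharge(self):
        self.node = 1         # Clk = 0: node charged, so Out = 0
        return self.out

    def evaluate(self, *inputs):
        if self.pdn(*inputs):  # Clk = 1: the node can only be discharged
            self.node = 0
        return self.out


stage1 = DominoGate(lambda a, b: a & b)   # Out1 = A AND B
stage2 = DominoGate(lambda x, c: x | c)   # Out2 = Out1 OR C

# After precharge every domino output is 0, so stage 2 cannot discharge early.
precharged = (stage1.precharge(), stage2.precharge())

out1 = stage1.evaluate(1, 1)              # Out1 makes a 0 -> 1 transition
out2 = stage2.evaluate(out1, 0)           # which then ripples into stage 2
print(precharged, out1, out2)
```

Because each stage output starts at 0 and can only rise, the evaluation ripples from stage to stage like falling dominoes, which is where the logic family gets its name.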

Clocked CMOS Logic
• It is a combination of
  - static CMOS logic
  - the Clk input given to one additional NMOS transistor
  - the Clk' input given to one additional PMOS transistor
• A fan-in of n requires 2n+2 devices
• When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on
• When Clk goes low, the logic is not evaluated, because the clocked NMOS is off

Advantages and Disadvantages
• Advantages
  - Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
• Disadvantages
  - More area
  - More complexity

N-P CMOS Logic
• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
• Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): P-logic outputs precharge to ground while N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks

Cascading N-P Logic

Example logic
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
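The three stage equations of this example can be checked with a purely logical sketch. It models only the Boolean functions, not the clocking or precharge behaviour, and the function name `np_cascade` is a hypothetical one.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages on 0/1 inputs."""
    x = 1 - (a & b)              # stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)        # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)        # stage 3: Z = (F . G + H)'
    return x, g, z


print(np_cascade(a=1, b=1, y=1, f=1, h=0))
print(np_cascade(a=0, b=1, y=1, f=1, h=0))
```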

BiCMOS Technology
• CMOS properties
  - Low power
  - Slower
• BJT properties
  - Faster
  - High power
• Implementation: CMOS & BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits
  - Inverter
  - Nand gate
  - Nor gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS Nand gate]

Structured Design

Designing the digital block or standard cell at the following levels
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter

Adders
• 1-bit full adder
• Manchester Carry Chain
• Adder Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
• 32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells
• Multiplexers - L=11λ, W=7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact - L=22λ, W=10λ
  - Buried contact - L=30λ, W=10λ
• ii) CMOS inverter - L=35λ, W=18λ
• Communication paths - 3λ spacing between metal contacts
• So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries
• Generate: Cout = 1 independent of C
  - G = A · B
• Propagate: Cout = C
  - P = A ⊕ B
• Kill: Cout = 0 independent of C
  - K = ~A · ~B
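The generate, propagate and kill definitions can be checked exhaustively. The sketch below uses hypothetical helper names (`pgk`, `carry_out`) and verifies that exactly one of the three signals is active for every input pair, and that Cout = G + P·C matches the arithmetic carry.

```python
from itertools import product


def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                    # carry out is 1 regardless of carry in
    p = a ^ b                    # carry out equals carry in
    k = (1 - a) & (1 - b)        # carry out is 0 regardless of carry in
    return g, p, k


def carry_out(a, b, c):
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    assert sum(pgk(a, b)) == 1                   # signals are mutually exclusive
    assert carry_out(a, b, c) == (a + b + c) // 2
    print(a, b, c, pgk(a, b), carry_out(a, b, c))
```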

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester Chain adders
• The carry input is precharged with the clock signal instead of passing through the logic
• The carry path is gated by the Propagate signal (P = A ⊕ B) with a single n-type pass transistor
• Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

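Carry-Ripple Adder
• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, carry input Cin, internal carries C1..C3, sum outputs S1..S4 and carry output Cout]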
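A carry-ripple adder can be sketched as a chain of full adders in which each carry output feeds the next cell. The function names and the LSB-first bit-list convention below are assumptions made for this illustration.

```python
def full_adder(a, b, c):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ c
    c_out = (a & b) | (c & (a ^ b))
    return s, c_out


def ripple_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists given LSB first."""
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next cell
        sums.append(s)
    return sums, carry


# 11 + 6 = 17: 1011 + 0110 = 1 0001 (bits written LSB first below)
print(ripple_add([1, 1, 0, 1], [0, 1, 1, 0]))
```

The loop makes the critical path visible: the last sum bit cannot be computed until the carry has passed through every earlier cell.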

Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
• Each output carry is expressed in terms of generate and propagate signals

The expressions for the 4-bit CLA are
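The slide's own expressions did not survive extraction. The standard expansion is obtained by repeatedly substituting the recurrence C(i+1) = Gi + Pi·Ci into itself, with Gi = Ai·Bi and Pi = Ai ⊕ Bi. The indexing from bit 0 is assumed here, to match the C1 = G0 + P0C0 label used for the 1-bit CLA.

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the G and P signals and on C0, so all four can be computed in parallel once G and P are available.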

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder
• It is also referred to as the conditional sum adder
• The adder is divided into two blocks
  - one block computed with a logical 0 Cin
  - another block computed with a logical 1 Cin
• S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
• M = number of blocks in the adder
• P = number of cells in each block
• K1 = delay through one adder cell
• K2 = delay through the multiplexer
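A short worked example of the delay formula follows. The numeric values of K1 and K2 are illustrative only, and the ripple-carry reference delay n·K1 is an assumption added for comparison, not a figure from the slides.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2, as given for the carry select adder."""
    return p * k1 + (m - 1) * k2


def ripple_delay(n, k1):
    """Assumed reference: n adder cells in series."""
    return n * k1


# 16-bit adder split into M = 4 blocks of P = 4 cells (illustrative values)
print(carry_select_delay(p=4, m=4, k1=1.0, k2=0.5))
print(ripple_delay(n=16, k1=1.0))
```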

Carry Skip Adder (CSA)
• It is also called the Carry Bypass Adder
• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, carries ci and ci+1, sums si and si+1. The propagate signals pi and pi+1 form the skip signal that selects the carry output.]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0, else C3 = output carry of the stage-4 adder
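The skip condition can be sketched for one 4-bit block. The helper names and the LSB-first bit lists are assumptions for illustration. The block ripples its carry internally, but when every propagate signal is 1 the carry output is taken directly from the carry input.

```python
def to_bits(value, width):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first bit list to integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def carry_skip_block(a_bits, b_bits, c_in):
    """One carry-skip block: returns (sum_bits, carry_out)."""
    props = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry, sums = c_in, []
    for a, b, p in zip(a_bits, b_bits, props):
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)          # ripple path inside the block
    c_out = c_in if all(props) else carry      # bypass when all Pi = 1
    return sums, c_out


sums, c_out = carry_skip_block(to_bits(10, 4), to_bits(5, 4), 1)
print(from_bits(sums), c_out)   # 10 + 5 + 1: every bit propagates, carry is skipped
```

The bypass does not change the result, since the rippled carry equals the carry input whenever all bits propagate. It only shortens the path the carry has to travel.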

Delay of CSA

The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
• K1 = delay through one adder cell
• K2 = delay for skipping the carry over a block
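The formula shows a trade-off: larger blocks lengthen the ripple term, while smaller blocks add more skips. The sketch below sweeps a few block sizes for a 32-bit adder, taking P as the number of cells per block and M = n/P as the number of blocks. Unit delays K1 = K2 = 1 are an arbitrary assumption chosen only to make the trade-off visible.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2, as given for the carry skip adder."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


n = 32                                       # operand width in bits
candidates = [(p, carry_skip_delay(p, n // p, k1=1, k2=1))
              for p in (2, 4, 8, 16)]        # block sizes with at least 2 blocks
best = min(candidates, key=lambda item: item[1])
print(candidates)
print(best)
```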

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using the Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2(n)
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
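The reduction idea can be sketched at word level. This is a simplified illustration that applies 3:2 carry-save compressors to whole partial-product words rather than scheduling adders column by column as a real Wallace tree does. The function names are hypothetical.

```python
def compress_3_to_2(a, b, c):
    """Carry-save adder: three words in, a sum word and a carry word out."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def wallace_multiply(x, y, width=4):
    """Multiply two unsigned width-bit integers; returns (product, levels)."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]  # partial products
    levels = 0
    while len(rows) > 2:                       # reduce until 2 vectors remain
        grouped = len(rows) // 3 * 3
        reduced = []
        for i in range(0, grouped, 3):
            reduced.extend(compress_3_to_2(*rows[i:i + 3]))
        rows = reduced + rows[grouped:]        # leftover rows pass through
        levels += 1
    return sum(rows), levels                   # final conventional addition


print(wallace_multiply(13, 11))
```

Each level removes about a third of the rows without propagating any carry, which is why the number of levels grows only logarithmically with the number of partial products.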

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU
• Complex systems are designed with a top-down approach with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Advantages and Disadvantages
• Advantages
  - Fewer transistors compared to CMOS
  - No static power consumption
  - Efficient building of complex gates
• Disadvantages
  - Not an ideal switch, due to series resistance
  - Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter
• Zpu : Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate
• The ratio of the pull-up to all n pull-down transistors, Zpu : (n·Zpd), must be at least 4:1 to produce the correct output voltage level
• The nMOS Nand gate geometry reveals two factors
  - The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
  - The delay also increases in direct proportion to the number of inputs added
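The ratio rule implies that the pull-up impedance must grow with the number of inputs. A small sketch follows, with a hypothetical function name and Zpd normalised to 1 as an illustrative assumption.

```python
def min_pullup_impedance(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu that satisfies Zpu / (n * Zpd) >= ratio."""
    return ratio * n_inputs * z_pd


for n in (1, 2, 3, 4):
    print(n, min_pullup_impedance(n))   # n = 1 is the plain inverter case
```

With Zpd fixed, each added input raises the required Zpu by another 4·Zpd, which is the increase in pull-up length the slide refers to.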

NMOS Nor Gate
• The pull-down transistors are in parallel in the Nor gate, i.e. the pull-up to pull-down ratio is the same for every pull-down transistor
• So it has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates
• High noise margins
  - VOH and VOL are at VDD and GND, respectively
• No static power consumption
  - There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times
  - (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics
• The CMOS Nand gate has none of the ratio restrictions of the nMOS Nand gate, but the geometry must be kept symmetric by
  - allowing extended fall times for the series nMOS transistors (because of their series resistance)
  - keeping the transfer characteristic centred about Vdd/2
• The CMOS Nor gate has series p-transistors, which increase the resistance and the delay
• This affects the transfer characteristic and reduces noise immunity
• So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic
• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

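Static CMOS Switch Delay Model

[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates. Each transistor is replaced by a switch with equivalent resistance Req (Rp for pMOS, Rn for nMOS) controlled by input A or B and driving the load capacitance CL. The series stacks also have an internal node capacitance Cint.]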

Input Pattern Effects on Delay
• Delay is dependent on the pattern of inputs
• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2 Rn) CL
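The three first-order delay expressions can be evaluated numerically. The resistance values below are hypothetical and chosen only for illustration; CL = 100 fF matches the load quoted for the simulated gate, and Cint is neglected.

```python
def nand2_delays(r_p, r_n, c_l):
    """First-order NAND2 delays in seconds for the three input patterns."""
    return {
        "low_to_high_both_inputs_fall": 0.69 * (r_p / 2) * c_l,  # two pMOS in parallel
        "low_to_high_one_input_falls": 0.69 * r_p * c_l,         # a single pMOS
        "high_to_low_both_inputs_rise": 0.69 * (2 * r_n) * c_l,  # two nMOS in series
    }


delays = nand2_delays(r_p=10e3, r_n=5e3, c_l=100e-15)   # assumed Rp, Rn in ohms
for name, seconds in delays.items():
    print(f"{name}: {seconds * 1e12:.0f} ps")
```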

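[Figure: NAND2 switch model with the pull-up resistances Rp (inputs A, B) in parallel, the pull-down resistances Rn (inputs A, B) in series, internal capacitance Cint and load capacitance CL]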

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1]

Input data pattern      Delay (psec)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 μm/0.25 μm, PMOS = 0.75 μm/0.25 μm, CL = 100 fF

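Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the number of pull-up transistors from N to 1

[Figure: static CMOS gate with pull-up network (PUN) and pull-down network (PDN) between VDD and ground, both driven by inputs In1, In2, ..., InN, with output F]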

Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage

Pseudo NMOS Characteristics

[Figure: pseudo-NMOS voltage transfer characteristic (VTC), Vout [V] versus Vin [V] over 0 to 2.5 V, plotted for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages
• Advantages
  - The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity
• Disadvantages
  - The higher pull-up resistance increases the delay, so circuits are slower
  - More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic motivates an alternative, which is dynamic CMOS logic
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases

[Figure: dynamic gate with precharge transistor Mp and evaluation transistor Me clocked by Clk, PDN with inputs In1, In2, In3, and output Out with load capacitance CL]

Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors
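The device counts can be tabulated for a given fan-in. The static and dynamic figures are the ones stated here. The pseudo-NMOS and domino counts are inferred from how those styles are described (a single always-on p-type load, and a dynamic gate plus a two-transistor static inverter), so treat them as assumptions.

```python
def transistor_counts(n):
    """Devices needed for an n-input gate in each logic style."""
    return {
        "static CMOS": 2 * n,     # n N-type + n P-type
        "pseudo-NMOS": n + 1,     # PDN plus one always-on P-type load (inferred)
        "dynamic CMOS": n + 2,    # PDN, evaluation NMOS Me and precharge PMOS Mp
        "domino CMOS": n + 4,     # dynamic gate plus a static inverter (inferred)
    }


for style, count in transistor_counts(4).items():
    print(f"{style}: {count}")
```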

Dynamic CMOS Logic operation
• Precharge
  - When CLK goes low, the p-type transistor starts charging the precharge capacitance
  - The pull-down transistor controlled by the clock keeps the precharge node from being drained
  - The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
• Evaluate
  - When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
  - The evaluation phase begins, i.e. the n-type pull-down turns on
  - The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  - If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
  - If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
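The precharge and evaluate behaviour, including the rule that a discharged output cannot recover, can be captured in a small behavioural sketch. The class name `DynamicGate` and the example pull-down function (A·B)+C are illustrative assumptions; leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural model of a precharge/evaluate gate (hypothetical helper)."""

    def __init__(self, pdn):
        self.pdn = pdn          # pdn(*inputs) is truthy when the PDN conducts
        self.out = 1

    def precharge(self):
        self.out = 1            # CLK = 0: Mp charges the output node
        return self.out

    def evaluate(self, *inputs):
        if self.pdn(*inputs):   # CLK = 1: a conducting PDN discharges the node
            self.out = 0
        return self.out         # otherwise the stored value is kept


gate = DynamicGate(lambda a, b, c: (a & b) | c)   # Out = NOT((A.B)+C)

gate.precharge()
first = gate.evaluate(1, 1, 0)     # PDN conducts, output discharged
glitch = gate.evaluate(0, 0, 0)    # inputs fall back, output stays discharged
gate.precharge()
held = gate.evaluate(0, 1, 0)      # PDN off, output holds the precharged 1
print(first, glitch, held)
```

The second call shows why inputs must not change back during evaluation: once the node has been discharged, only the next precharge phase can restore it.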

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic
• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, because the storage node can only make a transition from logic 1 to logic 0.
• Very high speed: the static inverter can be skewed, since only its low-to-high output transition matters.
• Input capacitance is reduced, giving a smaller logical effort.

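Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first gate (inputs In1, In2, In3) produces Out1; the second gate (pull-down network with In4 and In5) produces Out2. Each gate has precharge transistor Mp and evaluate transistor Me clocked by Clk. Annotated transitions: storage nodes 1 → 1 or 1 → 0, inverter outputs 0 → 0 or 0 → 1.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.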
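
The cascading behaviour described above can be sketched with a small behavioural model. This is only an illustration under stated assumptions: the class name `DominoGate`, the helper `cycle` and the two example functions (AND for the first gate, OR for the second) are hypothetical and not taken from the slides, and timing, charge sharing and leakage are ignored.

```python
class DominoGate:
    """Behavioural model of one domino gate: dynamic storage node + static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn   # function of the input bits; True if the PDN conducts
        self.node = 1    # storage node (precharge capacitance)

    def precharge(self):
        self.node = 1    # Clk = 0: Mp charges the storage node
        return self.output()

    def evaluate(self, *inputs):
        # Clk = 1: the node can only be discharged, never recharged.
        if self.pdn(*inputs):
            self.node = 0
        return self.output()

    def output(self):
        return 1 - self.node  # static inverter


# Hypothetical cascade: gate1 computes In1 AND In2, gate2 computes Out1 OR In4.
gate1 = DominoGate(lambda a, b: bool(a and b))
gate2 = DominoGate(lambda a, b: bool(a or b))


def cycle(in1, in2, in4):
    # Precharge forces both outputs to 0.
    assert gate1.precharge() == 0 and gate2.precharge() == 0
    # gate2 initially sees Out1 at its precharged value 0 ...
    early = gate2.evaluate(gate1.output(), in4)
    # ... then gate1 evaluates and Out1 may rise 0 -> 1 (never fall).
    out1 = gate1.evaluate(in1, in2)
    out2 = gate2.evaluate(out1, in4)
    return early, out1, out2
```

Because Out1 starts each evaluation at 0, the second gate is never discharged by a stale input; it discharges only if Out1 actually rises or In4 is 1.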

Clocked CMOS Logic

It is static CMOS logic with two additional clocked transistors:
• the Clk input is applied to one additional NMOS transistor
• the Clk' input is applied to one additional PMOS transistor

A gate with n inputs therefore needs 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS transistor is on.

When Clk goes low, the logic is not evaluated because the clocked NMOS transistor is off.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
• More area
• More complexity

N-P CMOS Logic

An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is to use NP domino logic (also called NORA logic).
• Stages of N logic alternate with stages of P logic.
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
• P logic stages use the complemented clock, with the P logic tree tied above the output node.
• During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluation, clk is high (-clk is low) and both types of stage evaluate; the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three cascaded stages with outputs X, G and Z]
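
As a quick check of the three stage equations, the sketch below evaluates them in order. The function name `np_cascade` is hypothetical, and the assignment of stages 1 and 3 to N logic and stage 2 to P logic is an assumption based on the alternating-stage rule rather than something the slide states.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the example cascade; all signals are 0 or 1."""
    x = 1 - (a & b)            # stage 1 (assumed N logic): X = (A . B)'
    g = (1 - x) | (1 - y)      # stage 2 (assumed P logic): G = X' + Y'
    z = 1 - ((f & g) | h)      # stage 3 (assumed N logic): Z = (F . G + H)'
    return x, g, z
```

For example, `np_cascade(1, 1, 1, 1, 0)` gives X = 0, G = 1, Z = 0.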

BiCMOS Technology
• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation (CMOS and BJT): core logic in CMOS, interface in BJT
• Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate]

Structured Design

Design the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers and demultiplexers
• Encoders and decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders
• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
  Butting contact: L = 22λ, W = 10λ
  Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So the adder standard cell is L = 190λ, W = 150λ.

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1 independent of C; G = A · B
• Propagate: Cout = C; P = A ⊕ B
• Kill: Cout = 0 independent of C; K = ~A · ~B
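
The three definitions can be checked exhaustively with a short sketch. The function names `gpk` and `carry_out` are illustrative only; the carry expression Cout = G + P·C follows directly from the definitions above.

```python
def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position (inputs are 0 or 1)."""
    g = a & b                 # G = A . B
    p = a ^ b                 # P = A xor B
    k = (1 - a) & (1 - b)     # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry-out of a full adder written in terms of generate and propagate."""
    g, p, _ = gpk(a, b)
    return g | (p & c)


# Exactly one of G, P, K is 1 for every input pair, and G + P.C matches
# the arithmetic carry of a + b + c.
for a in (0, 1):
    for b in (0, 1):
        assert sum(gpk(a, b)) == 1
        for c in (0, 1):
            assert carry_out(a, b, c) == (a + b + c) // 2
```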

Manchester Carry Chain

Built from switch logic using the propagate, generate and kill signals.

Manchester chain adders
• The carry input is precharged by the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ^ B) through a single n-type pass transistor.
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

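Carry-Ripple Adder
• Simplest design: cascade full adders.
• The critical path goes from Cin to Cout.
• Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and Cout]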
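
A minimal behavioural sketch of the cascade follows. The helper names (`full_adder`, `ripple_add`, `to_bits`, `from_bits`) and the least-significant-bit-first list convention are assumptions made for the example; the point is that each stage must wait for the carry of the stage before it.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))
```

For instance, adding 11 and 6 with 4-bit operands leaves a sum of 1 with carry-out 1, i.e. 17.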

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.

Each output carry is expressed in terms of the generate and propagate signals; for the first stage, C1 = G0 + P0·C0.

[Figures: 4-stage carry look-ahead adder; 1-bit CMOS CLA; 4-bit CMOS CLA]
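
Expanding C1 = G0 + P0·C0 recursively gives every carry of a 4-bit CLA directly in terms of G, P and C0. The sketch below writes out those standard expansions; the function names `cla4_carries` and `cla4_add` are illustrative, and the gate-level structure of the slide's CMOS circuits is not modelled.

```python
def cla4_carries(a_bits, b_bits, c0):
    """Carries of a 4-bit CLA; bit lists are least significant bit first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4], p


def cla4_add(a, b, c0=0):
    """Add two 4-bit integers; returns (4-bit sum, carry-out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    c, p = cla4_carries(a_bits, b_bits, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c[4]
```

No carry expression depends on another carry, which is what allows all of them to be produced in parallel.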

Carry Select Adder

It is also referred to as the conditional sum adder.
• The adder is divided into two blocks: one computed with Cin = 0, the other with Cin = 1.
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
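
The selection scheme and the delay formula can be sketched together. The names `add_block`, `carry_select_add` and `carry_select_delay`, the default of two 4-bit blocks, and the use of integer arithmetic inside each block are assumptions made for illustration, not the slide's circuit.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values; returns (sum modulo 2**width, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, blocks=2, width=4):
    """Each block is computed for Cin = 0 and Cin = 1; the real carry selects one."""
    mask = (1 << width) - 1
    result, carry = 0, cin
    for i in range(blocks):
        a_i = (a >> (i * width)) & mask
        b_i = (b >> (i * width)) & mask
        s0, c0 = add_block(a_i, b_i, 0, width)      # block assuming Cin = 0
        s1, c1 = add_block(a_i, b_i, 1, width)      # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)  # 2:1 multiplexer
        result |= s << (i * width)
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2, with the symbols defined above."""
    return p * k1 + (m - 1) * k2
```

With two blocks of four cells, K1 = 1.0 and K2 = 0.5 (arbitrary units), the formula gives T = 4.5.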

Carry Skip Adder (CSA)

It is also called the carry bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• It is typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, a bypass gate sends the stage's carry input directly to the carry output.

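Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into a skip signal that selects the output carry]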

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0, else C3 = output carry of the stage 4 adder.

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
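
A behavioural sketch of one skip block and of the delay formula is given below. The function names are hypothetical, bit lists are least significant bit first, and P and M are assumed to mean cells per block and number of blocks, as in the carry select formula.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One block of a carry-skip adder; returns (sum_bits, cout, skip)."""
    skip = 1
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                        # propagate: Ai != Bi
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)    # ripple carry inside the block
        skip &= p                        # skip = P0 . P1 . ... for the block
    # Bypass: if every bit propagates, the carry-in goes straight to the output.
    cout = cin if skip else carry
    return sums, cout, skip


def carry_skip_delay(p, m, k1, k2):
    """T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2
```

The bypass does not change the logical result (the rippled carry equals Cin whenever all bits propagate); it only shortens the path the carry has to travel.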

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK
• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• It is an optimized column adder tree.
• It combines all partial products into 2 vectors (carry and sum).
• The carry and sum outputs are combined using a conventional adder.
• It compresses the number of stages of partial products.
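
The reduction to two vectors can be sketched with word-level 3:2 compressors (carry-save adders). This is a simplification: a real Wallace tree places counters per bit column, whereas the hypothetical `wallace_multiply` below compresses whole partial-product rows three at a time and counts the resulting levels.

```python
def carry_save(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, n=4):
    """Multiply unsigned n-bit a and b; returns (product, reduction levels)."""
    # Partial products: a shifted left by i wherever bit i of b is set.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save(*rows[i:i + 3]))  # 3 rows become 2
        nxt.extend(rows[full:])                     # leftovers pass through
        rows = nxt
        levels += 1
    # The final carry and sum vectors go to a conventional adder.
    return sum(rows), levels
```

For 4-bit operands the four partial products are reduced in 2 levels; for 8-bit operands the eight partial products take 4 levels.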

Wallace Tree Multiplier Structure

Sequential Logic
• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems: ALU

Complex systems are designed top-down with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the layout dimensions of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu : Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio between the pull-up and all the pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce the correct output voltage level.

The nMOS NAND gate geometry reveals two factors:
• The area of the NAND gate is greater than that of the inverter, because there are more pull-down transistors and the pull-up transistor is correspondingly longer.
• The delay also increases in direct proportion to the number of inputs.
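
The ratio rule translates into a one-line calculation. The sketch assumes that all n pull-down transistors have the same impedance Zpd and that the 4:1 minimum is used as the default; the function name is hypothetical.

```python
def min_pullup_impedance(n_inputs, z_pd, ratio=4.0):
    """Smallest Zpu that satisfies Zpu : (n * Zpd) >= ratio : 1."""
    return ratio * n_inputs * z_pd
```

For example, a 3-input NAND with Zpd = 1 per pull-down needs Zpu of at least 12, three times the pull-up of an inverter with the same pull-down, which is why the pull-up transistor grows longer as inputs are added.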

NMOS Nor Gate
• The pull-down transistors are in parallel in the NOR gate, so the pull-up to pull-down ratio is the same for every pull-down transistor, and the gate has the same characteristics as the inverter.
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
• So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates
• High noise margins: VOH and VOL are at VDD and GND, respectively.
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
• Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

The CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but its geometry has to be kept symmetrical by
• allowing for the extended fall times of the series nMOS transistors (series resistance)
• keeping the transfer characteristic centred on Vdd/2

The CMOS NOR gate has series p-transistors, which increase the resistance and delay. This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic
• Static CMOS logic
• Pseudo-NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

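Static CMOS Switch Delay Model

[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with inputs A and B, on-resistances Rp and Rn (equivalent resistance Req), load capacitance CL and internal node capacitance Cint]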

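Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.
• Low-to-high transition, both inputs go low: delay is 0.69 (Rp/2) CL
• Low-to-high transition, one input goes low: delay is 0.69 Rp CL
• High-to-low transition, both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with pull-up resistances Rp and pull-down resistances Rn for inputs A and B, load capacitance CL and internal capacitance Cint]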
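
The three first-order expressions can be evaluated numerically as below. The function name, the dictionary keys and the example resistances are assumptions for illustration; the model ignores the internal node capacitance Cint.

```python
def nand2_delays(rp, rn, cl):
    """First-order 0.69*R*C delays of a 2-input NAND (Cint ignored).

    rp, rn: on-resistance of one PMOS / NMOS transistor in ohms
    cl:     load capacitance in farads
    """
    return {
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,  # two PMOS in parallel
        "low_to_high_one_input_falls": 0.69 * rp * cl,         # one PMOS conducts
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,  # two NMOS in series
    }
```

With hypothetical values Rp = 10 kΩ, Rn = 5 kΩ and CL = 100 fF, the parallel pull-up case is half as slow as the single pull-up case, which shows why the delay depends on the input pattern.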

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1 → 0; A = 1, B = 1 → 0; and A = 1 → 0, B = 1]

Input data pattern      Delay (ps)
A = B = 0 → 1           67
A = 1, B = 0 → 1        64
A = 0 → 1, B = 1        61
A = B = 1 → 0           45
A = 1, B = 1 → 0        80
A = 1 → 0, B = 1        81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

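Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single transistor.

[Figure: static CMOS gate between VDD and ground, with pull-up network (PUN) and pull-down network (PDN) both driven by inputs In1, In2, ..., InN, and output F]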

Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on. The p-type transistor is used as a resistor.
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate inputs are at Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.

Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower.
• More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic. It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate with precharge transistor Mp and evaluate transistor Me, both clocked by Clk, pull-down network PDN with inputs In1, In2, In3, and output Out loaded by CL]

Two-phase operation:
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path. A fan-in of n requires 2n devices (n N-type + n P-type).
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes. A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pull-down transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

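[Figure: dynamic gate example with inputs A, B and C. Precharge (Clk = 0): Mp on, Me off, Out = 1. Evaluate (Clk = 1): Mp off, Me on, Out = ((A·B)+C)'.]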
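
The two-phase behaviour of this example gate can be sketched as a small state machine. The class name `DynamicGate` and its `step` method are hypothetical; leakage and charge redistribution are ignored, so the stored value on CL is held indefinitely.

```python
class DynamicGate:
    """Behavioural model of the dynamic gate Out = ((A.B) + C)'."""

    def __init__(self):
        self.out = 1  # value stored on the load capacitance CL

    def step(self, clk, a, b, c):
        if clk == 0:
            self.out = 1      # precharge: Mp on, Me off
        elif (a and b) or c:
            self.out = 0      # evaluate: PDN conducts and CL is discharged
        # Otherwise Out is in the high-impedance state and CL keeps its value.
        return self.out
```

Once `step` has returned 0 during evaluation, later input changes cannot bring the output back to 1 until a precharge step (`clk == 0`) occurs, which matches the output condition stated above.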

Properties of Dynamic Gates
• The logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, giving a low noise margin (NML).

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases.
• They require the introduction of a clock signal.
• They are more sensitive to noise.
• Their clock signals consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic), as shown below.
• Alternate stages of N logic with stages of P logic.
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
• P logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
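As a rough check of the three-stage example, the sketch below models each stage by its precharge value and its conditional evaluation, assuming ideal behaviour and stable inputs. The function name `np_chain` is hypothetical.

```python
from itertools import product


def np_chain(a, b, y, f, h):
    """Evaluate the N-P-N example chain for one clock cycle."""
    # Stage 1 (N block): precharged to 1, pulled to ground when A.B conducts.
    x = 1
    if a and b:
        x = 0
    # Stage 2 (P block): precharged to 0, pulled to Vdd when X or Y is low.
    g = 0
    if (not x) or (not y):
        g = 1
    # Stage 3 (N block): precharged to 1, pulled to ground when F.G + H conducts.
    z = 1
    if (f and g) or h:
        z = 0
    return x, g, z


# Compare against the Boolean expressions for every input combination.
for a, b, y, f, h in product((0, 1), repeat=5):
    x, g, z = np_chain(a, b, y, f, h)
    assert x == int(not (a and b))
    assert g == int((not x) or (not y))
    assert z == int(not ((f and g) or h))
```

Note that X precharges to 1, which keeps the P stage's pull-up tree off until stage 1 has evaluated; this is the property that lets N and P stages be cascaded directly.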

BiCMOS Technology
CMOS properties
• Low power
• Slower
BJT properties
• Faster
• High power
Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT
Basic circuits
• Inverter
• NAND gate
• NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders
• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of standard cells of:
• Multiplexers - L=11λ W=7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact - L=22λ W=10λ
  - Buried contact - L=30λ W=10λ
• ii) CMOS inverter - L=35λ W=18λ
• Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1 independent of C
  G = A • B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0 independent of C
  K = ~A • ~B
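The three definitions can be checked exhaustively. The sketch below is illustrative only (the function name `carry_out` is hypothetical): it confirms that exactly one of G, P and K holds for any A, B, and that Cout = G + P·C matches the full-adder carry.

```python
from itertools import product


def carry_out(a, b, c):
    """Carry-out of a full adder expressed with generate/propagate/kill."""
    g = a & b              # generate: Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 regardless of C
    assert g + p + k == 1  # the three cases are mutually exclusive
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    # Reference: the carry is the majority of the three input bits.
    assert carry_out(a, b, c) == (a + b + c) // 2
```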

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester chain adders
• The carry input is precharged with the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

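• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, internal carries C1-C3, sum outputs S1-S4 and carry output Cout]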
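A minimal behavioural sketch of the cascade, assuming bit lists stored LSB first. The helper names `full_adder` and `ripple_add` are illustrative. The loop makes the critical path visible: each stage needs the carry of the stage before it.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length LSB-first bit lists by rippling the carry."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry feeds the next stage
        sums.append(s)
    return sums, carry


# 11 + 5 = 16: all four sum bits are 0 and the carry-out is 1.
result = ripple_add([1, 1, 0, 1], [1, 0, 1, 0])
print(result)
```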

Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
• The output carry is represented in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are
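The slide's own expressions were a figure and are not reproduced in this transcript. The sketch below writes out the usual textbook form of the four look-ahead carries, using G_i = A_i·B_i and P_i = A_i ⊕ B_i as defined earlier, and checks them against a rippled carry. The function names are hypothetical.

```python
from itertools import product


def cla4_carries(a, b, c0):
    """Look-ahead carries [C1, C2, C3, C4] for LSB-first 4-bit inputs."""
    g = [x & y for x, y in zip(a, b)]
    p = [x ^ y for x, y in zip(a, b)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def ripple_carries(a, b, c0):
    """Reference: the same carries computed one stage at a time."""
    carries, c = [], c0
    for x, y in zip(a, b):
        c = (x & y) | ((x ^ y) & c)
        carries.append(c)
    return carries


# Exhaustive comparison over all 4-bit operands and both carry-in values.
for bits in product((0, 1), repeat=9):
    a, b, c0 = list(bits[:4]), list(bits[4:8]), bits[8]
    assert cla4_carries(a, b, c0) == ripple_carries(a, b, c0)
```

Every carry depends only on the G, P and C0 inputs, so all four can be evaluated in parallel, at the cost of wider gates for the higher carries.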

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as the conditional-sum adder.

The adder is divided into two blocks:
• One block with a logical 0 Cin
• Another block with a logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M-1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
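The structure and the delay formula can be sketched together. This is an illustrative model, not the slide's circuit: bit lists are LSB first, all blocks have the same size, and the names `to_bits`, `ripple_block`, `carry_select_add` and `carry_select_delay` are hypothetical. Delay units are arbitrary.

```python
def to_bits(value, width):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


def ripple_block(a, b, cin):
    """Ripple-carry block used inside each carry-select stage."""
    sums, c = [], cin
    for x, y in zip(a, b):
        sums.append(x ^ y ^ c)
        c = (x & y) | ((x ^ y) & c)
    return sums, c


def carry_select_add(a, b, cin, p_cells):
    """Each block is computed for Cin=0 and Cin=1; the real carry selects one."""
    sums, c = [], cin
    for i in range(0, len(a), p_cells):
        blk_a, blk_b = a[i:i + p_cells], b[i:i + p_cells]
        s0, c0 = ripple_block(blk_a, blk_b, 0)
        s1, c1 = ripple_block(blk_a, blk_b, 1)
        s, c = (s1, c1) if c else (s0, c0)  # 2:1 multiplexer
        sums.extend(s)
    return sums, c


def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M-1)*K2 from the text."""
    return p_cells * k1 + (m_blocks - 1) * k2


total = carry_select_add(to_bits(200, 8), to_bits(100, 8), 0, 4)
delay = carry_select_delay(m_blocks=2, p_cells=4, k1=1.0, k2=0.5)
print(total, delay)
```

The blocks ripple in parallel, so only one block's ripple time P·K1 appears in the delay; the remaining term is the chain of multiplexers.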

Carry Skip Adder (CSA)

• It is also called the carry-bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined to form the skip signal]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder.

Delay of CSA

The worst-case delay T is given by

T = 2(P-1)·K1 + (M-2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
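A small numeric sketch of the formula, taking M as the number of blocks and P as the number of cells per block (the same meaning as in the carry select delay above). The ripple delay of N·K1 used for comparison is an assumption of this sketch, as are the function names and the delay values.

```python
def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """Worst case T = 2(P-1)*K1 + (M-2)*K2 from the text."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed reference: the carry ripples through every cell."""
    return n_bits * k1


# Illustrative values only: 16 bits as 4 blocks of 4 cells, arbitrary units.
skip = carry_skip_delay(m_blocks=4, p_cells=4, k1=1.0, k2=0.5)
ripple = ripple_delay(n_bits=16, k1=1.0)
print(skip, ripple)
```

In the worst case the carry ripples through most of the first block, skips the middle blocks and ripples through most of the last block, which is where the 2(P-1) and (M-2) factors come from.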

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• Optimized column adder tree.
• Combines all partial products into 2 vectors (carry and sum).
• Carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
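The reduction can be sketched with whole rows held as Python integers, so that one bitwise operation stands for a row of full adders (3:2 compressors). This is a simplified word-level illustration, not a gate-level Wallace tree; the names `csa` and `wallace_multiply` are hypothetical.

```python
def csa(a, b, c):
    """Carry-save adder: three rows in, sum row and carry row out."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def wallace_multiply(x, y, n):
    """Multiply two unsigned n-bit numbers by carry-save reduction.

    Returns (product, number of reduction levels).
    """
    # One partial product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        # Compress rows three at a time; all groups work in parallel.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        # Rows left over (0, 1 or 2) pass through to the next level.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # Final carry-propagate addition of the sum and carry vectors.
    return sum(rows), levels


small = wallace_multiply(13, 11, 4)
large = wallace_multiply(200, 123, 8)
print(small, large)
```

Each level cuts the row count by roughly a third, which is why the number of levels grows logarithmically with the number of partial products.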

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools:
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

NMOS NOR Gate
• The pull-down transistors are in parallel in a NOR gate, i.e. the pull-down ratio is the same for all transistors.
• So it has the same characteristics as the inverter.
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
• So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times
  (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS NAND gate has none of the restrictions of the NMOS NAND gate, but the geometry must be kept symmetrical by:
  - allowing for the extended fall times of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on Vdd/2
• The CMOS NOR gate has series p-transistors, which increase the resistance and delay.
• This affects the transfer characteristic and reduces noise immunity.
• So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

• Static CMOS logic
• Pseudo NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

Static CMOS Switch Delay Model

[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with inputs A and B, on-resistances Rp and Rn (Req for a single switch), internal node capacitance Cint and load capacitance CL]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.
• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
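The three cases can be put into numbers with a short sketch. The resistance and capacitance values below are made-up round figures chosen for illustration, not the values behind the slides, and `nand2_delays` is a hypothetical helper.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays (seconds) for the three input patterns."""
    return {
        "low-to-high, both inputs fall": 0.69 * (rp / 2) * cl,  # two pMOS in parallel
        "low-to-high, one input falls": 0.69 * rp * cl,         # one pMOS conducts
        "high-to-low, both inputs rise": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }


# Assumed example values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 10 fF.
delays = nand2_delays(rp=10e3, rn=5e3, cl=10e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.1f} ps")
```

With these values the pull-up is twice as fast when both inputs fall as when only one does, which is the input-pattern dependence the slide describes.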

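[Figure: NAND2 switch-level model with inputs A and B, resistances Rp and Rn, internal capacitance Cint and load capacitance CL]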

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage (V) versus time (ps) for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1]

Input data pattern    Delay (psec)
A=B=0->1              67
A=1, B=0->1           64
A=0->1, B=1           61
A=B=1->0              45
A=1, B=1->0           80
A=1->0, B=1           81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

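Pseudo-NMOS logic reduces the pull-up network of static CMOS logic from N transistors to 1.

[Figure: static CMOS gate. Inputs In1, In2, ..., InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output is F.]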

Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate inputs are at VDD, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout (V) versus Vin (V), for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.

Disadvantages
• Because of the higher pull-up resistance, the delay is larger and hence the circuit is slower.
• More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic leads to an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate. Precharge transistor Mp and evaluate transistor Me are both clocked by Clk, with the PDN (inputs In1-In3) between them; the output Out drives the load CL.]

Two-phase operation:
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - requires only n + 2 (n+1 N-type + 1 P-type) transistors
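The device counts quoted for the different logic styles can be tabulated for comparison. The static, dynamic, pseudo-NMOS and clocked CMOS figures follow the text of these slides; the domino count of n + 4 is an assumption of this sketch (a dynamic gate plus a two-transistor static inverter). The function name `transistor_count` is hypothetical.

```python
def transistor_count(style, n):
    """Transistors needed for one gate with a fan-in of n."""
    counts = {
        "static": 2 * n,       # n N-type + n P-type
        "pseudo-nmos": n + 1,  # n N-type + one always-on P-type load
        "dynamic": n + 2,      # n+1 N-type + 1 P-type
        "domino": n + 4,       # assumption: dynamic gate + static inverter
        "clocked": 2 * n + 2,  # static gate + two clocked devices
    }
    return counts[style]


for style in ("static", "pseudo-nmos", "dynamic", "domino", "clocked"):
    print(style, transistor_count(style, 4))
```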

Dynamic CMOS Logic operation
Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pull-down transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: dynamic gate with inputs A, B and C computing Out = ((A·B)+C)'. In precharge Mp is on, Me is off and Out = 1; in evaluation Mp is off and Me is on.]

Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages
Advantages
• Low area
• Higher speed than static complementary gates
Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
• Requires introduction of a clock signal.
• They are also more sensitive to noise.
• Their clocking signals also consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

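[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me clocked by Clk; input In, outputs Out1 and Out2. Waveforms of Clk, In, Out1 and Out2 versus time, with the threshold VTn marked.]

Only 0 -> 1 transitions allowed at inputs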

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge will fix it.

Domino CMOS Logic

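[Figure: domino gate. A dynamic stage (PDN with inputs In1-In3, precharge transistor Mp and evaluate transistor Me clocked by Clk) followed by a static inverter driving Out1.]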



Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Built from switch logic using the propagate, generate and kill signals

In Manchester chain adders, the carry input is precharged by the clock signal instead of passing through the logic

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

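- Simplest design: cascade full adders
- Critical path goes from Cin to Cout
- Design the full adder to have a fast carry delay

(Figure: 4-bit carry-ripple adder with operand inputs A1..A4 and B1..B4, carry input Cin, internal carries C1..C3, sum outputs S1..S4 and carry output Cout)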
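A behavioural sketch of the cascade, assuming LSB-first bit lists; the helper names (`full_adder`, `ripple_add`, `to_bits`, `from_bits`) are illustrative and not taken from the slides.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a_bits, b_bits, cin=0):
    """Add two LSB-first bit lists by chaining full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next cell
        sums.append(s)
    return sums, carry

def to_bits(x, n):
    """Integer to LSB-first list of n bits."""
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    """LSB-first bit list back to an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

Each cell has to wait for the carry of the cell before it, which is why the Cin to Cout path sets the delay.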

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel

The output carry is expressed in terms of the generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0 · C0

4-bit CMOS CLA
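One way to write out the four carries is to expand the recurrence C(i+1) = Gi + Pi · Ci, of which C1 = G0 + P0 · C0 is the first case. The sketch below assumes Gi = Ai · Bi and Pi = Ai ⊕ Bi as defined for the full adder; the function names `cla4_carries` and `cla4_add` are illustrative only.

```python
def cla4_carries(a, b, c0=0):
    """Return [C1, C2, C3, C4] for 4-bit operands a and b."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # Pi = Ai xor Bi
    # Every carry depends only on G, P and C0, so all four can be
    # produced in parallel rather than rippling from cell to cell.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def cla4_add(a, b, c0=0):
    """4-bit addition using the look-ahead carries: returns (sum, C4)."""
    carries = [c0] + cla4_carries(a, b, c0)
    total = 0
    for i in range(4):
        p = ((a >> i) ^ (b >> i)) & 1
        total |= (p ^ carries[i]) << i       # Si = Pi xor Ci
    return total, carries[4]
```

The product terms grow with each bit position, which is why look-ahead is usually applied in small groups such as 4 bits.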

Carry Select Adder

It is also referred to as a conditional-sum adder

The adder uses two blocks: one block computes with Cin = 0, the other with Cin = 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M − 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
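A small sketch of both the selection idea and the delay formula. The block width, the function names and the numeric values of K1 and K2 are hypothetical and chosen only for illustration.

```python
def carry_select_block(a, b, cin, width=4):
    """One carry-select block: both results are precomputed, Cin selects one."""
    mask = (1 << width) - 1
    r0 = (a & mask) + (b & mask)         # block evaluated with Cin = 0
    r1 = (a & mask) + (b & mask) + 1     # block evaluated with Cin = 1
    r = r1 if cin else r0                # 2:1 multiplexer driven by the actual Cin
    return r & mask, r >> width          # (S, Cout)

def carry_select_delay(p_cells, m_blocks, k1, k2):
    """T = P*K1 + (M - 1)*K2."""
    return p_cells * k1 + (m_blocks - 1) * k2

# Hypothetical example: a 16-bit adder built as M = 4 blocks of P = 4 cells,
# with K1 = 1.0 and K2 = 0.5 in arbitrary time units.
t = carry_select_delay(p_cells=4, m_blocks=4, k1=1.0, k2=0.5)
```

With these assumed numbers the formula gives 4·1.0 + 3·0.5 = 5.5 time units.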

Carry Skip Adder (CSA)

It is also called a carry-bypass adder

It looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages: if Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

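Simplified Carry-Skip Adder Logic

(Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1 and bi+1, sums si and si+1, carries ci and ci+1; the propagate signals pi and pi+1 form the skip signal)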

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

Worst-case delay T is given by

T = 2(P − 1)·K1 + (M − 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
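A sketch of the bypass decision and of the worst-case delay. It assumes P and M have the same meaning as in the carry-select delay formula (cells per block, number of blocks) and models the ripple adder as one cell delay per bit; both are assumptions, as are the function names and the numeric values.

```python
def block_carry_out(a, b, cin, width=4):
    """Carry out of one skip block; bypass when every bit propagates."""
    mask = (1 << width) - 1
    if ((a ^ b) & mask) == mask:          # Ai != Bi for every bit in the block
        return cin                        # skip path: carry-out equals carry-in
    return ((a & mask) + (b & mask) + cin) >> width    # rippled carry

def carry_skip_delay(p_cells, m_blocks, k1, k2):
    """Worst case: T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2

def ripple_delay(n_bits, k1):
    """Assumed ripple model: one adder-cell delay per bit."""
    return n_bits * k1

# Hypothetical 16-bit example: M = 4 blocks of P = 4 cells, K1 = 1.0, K2 = 0.5.
t_skip = carry_skip_delay(4, 4, 1.0, 0.5)
t_ripple = ripple_delay(16, 1.0)
```

Under these assumed values the carry-skip adder needs 7.0 time units against 16.0 for the ripple model.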

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
- Completion time is proportional to log2(n)
- Optimized column adder tree
- Combines all partial products into 2 vectors (carry and sum)
- Carry and sum outputs are combined using a conventional adder
- Compresses the number of stages of partial products
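The reduction can be sketched at word level with carry-save (3:2) compressors. This is a simplification: a real Wallace tree compresses bit columns in hardware, and the names `csa` and `wallace_multiply` are illustrative assumptions.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def wallace_multiply(a, b, n=8):
    """Multiply two unsigned n-bit integers (n >= 2).

    Returns (product, number of compressor stages used).
    """
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    stages = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])          # leftover rows pass through unchanged
        rows = nxt
        stages += 1
    # Two vectors (sum and carry) remain: combine with a conventional adder.
    return rows[0] + rows[1], stages
```

For n = 8 the row count shrinks 8, 6, 4, 3, 2, so four compressor stages precede the final adder, in line with the logarithmic growth noted above.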

Wallace Tree Multiplier Structure

Sequential Logic

- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools

- Partition the system sensibly
- Aim for simple interconnection and high regularity between sub-systems
- Generate and verify each section of the design
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Properties of CMOS Gates

- High noise margins
  VOH and VOL are at VDD and GND respectively

- No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode

- Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

- Unlike the nMOS NAND gate, the CMOS NAND gate has no ratio restrictions, but the geometry should be chosen to keep the characteristic symmetrical by
  - allowing for the extended fall times of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on Vdd/2
- The CMOS NOR gate has series p-transistors, which increase the resistance and the delay
- This affects the transfer characteristic and reduces noise immunity
- So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic

- Static CMOS logic
- Pseudo-NMOS logic
- Dynamic CMOS logic
- Domino CMOS logic
- Clocked CMOS logic
- n-p CMOS logic

Static CMOS Switch Delay Model

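(Figure: switch-level RC models of an inverter (INV), a two-input NAND (NAND2) and a two-input NOR (NOR2), built from the on-resistances Rn and Rp (equivalent resistance Req), the internal node capacitance Cint and the load capacitance CL)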

Input Pattern Effects on Delay

Delay depends on the pattern of inputs (shown here for the two-input NAND)

Low-to-high transition
- both inputs go low: delay is 0.69 (Rp/2) CL
- one input goes low: delay is 0.69 Rp CL

High-to-low transition
- both inputs go high: delay is 0.69 (2Rn) CL
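The three cases can be evaluated numerically. The sketch below ignores the internal capacitance Cint, and the resistance and capacitance values are hypothetical; the function name `nand2_delays` is an assumption.

```python
def nand2_delays(rn, rp, cl):
    """First-order delays of a two-input NAND, ignoring Cint.

    rn, rp: on-resistance of one nMOS / pMOS transistor in ohms
    cl:     load capacitance in farads
    """
    return {
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,  # two pMOS in parallel
        "low_to_high_one_input_falls": 0.69 * rp * cl,         # a single pMOS conducts
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }

# Hypothetical values: Rn = 500 ohm, Rp = 1 kohm, CL = 100 fF.
d = nand2_delays(rn=500.0, rp=1000.0, cl=100e-15)
```

With these assumed values the parallel pull-up case is twice as fast as the single pull-up case, which is the pattern dependence described above.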

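(Figure: two-input NAND switch model with parallel pull-up resistances Rp for inputs A and B, series pull-down resistances Rn, internal capacitance Cint and load capacitance CL)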

Delay Dependence on Input Patterns

(Figure: output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1)

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

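Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to a single one

(Figure: static CMOS gate in which the inputs In1, In2, ... InN drive both the pull-up network (PUN) connected to VDD and the pull-down network (PDN), with output F)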

Pseudo NMOS Operation

- The pulldown network of the gate is the same as for a fully complementary gate
- The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
- The p-type transistor is used as a resistor
- When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
- When the gate input is at Vdd, both the p-type and n-type transistors are on, and both act together to determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

(Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], plotted for W/Lp = 4, 2, 1, 0.5 and 0.25)

Advantages and Disadvantages

Advantages

- The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity

Disadvantages

- The higher pull-up resistance increases delay, so circuits are slower
- More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, which is dynamic CMOS logic

It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases

(Figure: dynamic gate with inputs In1, In2 and In3 driving the PDN, precharge transistor Mp and evaluate transistor Me both clocked by Clk, and output Out loaded by CL)

Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1)

Dynamic CMOS Logic operation

- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  a fan-in of n requires 2n devices (n N-type + n P-type)
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  a fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)

Dynamic CMOS Logic operation

Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance
- The pulldown transistor controlled by the clock keeps the precharge node from being drained
- The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
- When CLK goes high, precharging stops, i.e. the p-type pullup turns off
- The evaluation phase begins, i.e. the clocked n-type pulldown turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
- If the inputs do not create such a path, the gate's output is left charged at logic 1

Conditions on Output

- Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
- Inputs to the gate can make at most one transition during evaluation
- The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

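(Figure: dynamic gate with inputs A, B and C. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off, Me is on and Out = ((A · B) + C)')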
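The precharge and evaluate behaviour can be mimicked with a small behavioural model. It ignores leakage and charge sharing, and the class name `DynamicGate` and its methods are illustrative assumptions rather than anything defined in the slides.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn      # function of the inputs: truthy if the PDN conducts
        self.out = 1        # value held on the output capacitance CL

    def precharge(self):
        """CLK = 0: Mp on, Me off, output node charged to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: Mp off, Me on; a conducting PDN discharges the node."""
        if self.pdn(*inputs):
            self.out = 0    # stays at 0 until the next precharge
        return self.out

# PDN for the example gate: conducts when (A AND B) OR C is true,
# so the output is the complement of that function.
gate = DynamicGate(lambda a, b, c: (a and b) or c)
```

Calling `evaluate` a second time with inputs that no longer conduct still returns 0, which models the rule that a discharged output cannot recover before the next precharge.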

Properties of Dynamic Gates

- Logic function is implemented by the PDN only
- Full-swing outputs (VOL = GND and VOH = VDD)
- Non-ratioed: sizing of the devices does not affect the logic levels
- Faster switching speeds due to reduced load capacitance
- Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
- PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
- Low area
- Higher speed than static complementary gates

Disadvantages
- Precharged gates introduce functional complexity because they must be operated in two distinct phases
- Requires introduction of a clock signal
- They are also more sensitive to noise
- Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

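(Figure: two cascaded dynamic inverters clocked by Clk, each with precharge transistor Mp and evaluate transistor Me; In drives the first stage and Out1 drives the second stage, which produces Out2. The waveforms show Clk, In, Out1 and Out2 as voltage V against time t, with the threshold VTn marked)

Only 0 → 1 transitions are allowed at inputs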

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input needs a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge

The PDN of the second dynamic inverter turns off when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

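(Figure: dynamic gate with inputs In1, In2 and In3 driving the PDN, clocked transistors Mp and Me, followed by a static inverter that drives Out1)

The combination of dynamic logic and a static inverter is domino logic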

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
- When CLK goes high, precharging stops, i.e. the p-type pullup turns off
- The evaluation phase begins, i.e. the n-type pulldown turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
- If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0
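A behavioural sketch of a domino stage and of two stages in cascade. The class name `DominoGate`, the two example functions and the input names are hypothetical and serve only to illustrate the precharge and evaluate rules above.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn          # truthy when the pull-down network conducts
        self.node = 1           # precharged storage node

    def output(self):
        return 1 - self.node    # static inverter on the storage node

    def precharge(self):
        self.node = 1           # storage node at 1, so the output is 0
        return self.output()

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0       # node discharges, output rises 0 -> 1
        return self.output()

# Hypothetical cascade: Out1 = In1 AND In2, Out2 = Out1 OR In4.
g1 = DominoGate(lambda a, b: a and b)
g2 = DominoGate(lambda x, y: x or y)

def cycle(in1, in2, in4):
    """One clock cycle: precharge both stages, then evaluate in order."""
    g1.precharge()
    g2.precharge()              # every stage output is 0 at this point
    out1 = g1.evaluate(in1, in2)
    out2 = g2.evaluate(out1, in4)
    return out1, out2
```

Because every stage output is 0 after precharge and can only rise during evaluation, a downstream stage never sees a falling input, and only non-inverting functions can be built this way.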

Properties of Domino Logic

- Only non-inverting logic can be implemented
- Smaller area compared to static CMOS
- Free of glitches, since each dynamic node can only make a transition from logic 1 to logic 0
- Very high speed
  - the static inverter can be skewed, since only its low-to-high output transition matters
- Input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

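(Figure: two cascaded domino gates, each with a PDN, clocked transistors Mp and Me and a static inverter. The first gate has inputs In1, In2 and In3 and output Out1; the second has inputs In4 and In5 and output Out2. The annotations show the dynamic nodes going 1 → 0 and the outputs going 0 → 1)

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 → 1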

Clocked CMOS Logic

It is a combination of
- static CMOS
- a Clk input given to one additional NMOS transistor
- a Clk' input given to one additional PMOS transistor

A gate with fan-in n requires 2n + 2 transistors

When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on

When Clk goes low, the logic is not evaluated, because the clocked NMOS is off

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
- More area
- More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)

Alternate stages of N logic with stages of P logic

N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg

P logic stages use the complement clock, with the P logic tree tied above the output node

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd

During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic (figure: three cascaded stages with outputs X, G and Z)

Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

CMOS Logic

Static CMOS logic
Pseudo NMOS logic
Dynamic CMOS logic
Domino CMOS logic
Clocked CMOS logic
n-p CMOS logic

Static CMOS Switch Delay Model

[Figure: switch-level RC models of the inverter (INV), NAND2 and NOR2 gates, with pull-up resistances Rp, pull-down resistances Rn, internal capacitance Cint and load capacitance CL]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs
Low-to-high transition
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL
High-to-low transition
  both inputs go high: delay is 0.69 (2Rn) CL
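The three expressions above can be evaluated directly. The sketch below assumes the two-input NAND switch model of this slide; the resistance and capacitance values in the usage lines are made-up numbers for illustration only.

```python
def nand2_delays(rp, rn, cl):
    """First-order RC delays of a 2-input NAND for different input patterns.

    rp, rn: equivalent PMOS / NMOS on-resistances in ohms
    cl:     load capacitance in farads
    """
    return {
        # both PMOS devices conduct in parallel
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,
        # only one PMOS device conducts
        "low_to_high_one_input_low": 0.69 * rp * cl,
        # two NMOS devices conduct in series
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl,
    }


# Illustrative (assumed) values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
```

The model ignores the internal node capacitance Cint, so it only captures the first-order dependence on the input pattern.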

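[Figure: NAND2 switch model with pull-up resistances Rp on inputs A and B, series pull-down resistances Rn, internal capacitance Cint and load capacitance CL]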

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1]

Input data pattern    Delay (psec)
A = B = 0→1           67
A = 1, B = 0→1        64
A = 0→1, B = 1        61
A = B = 1→0           45
A = 1, B = 1→0        80
A = 1→0, B = 1        81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

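Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N p-type transistors to a single p-type load

[Figure: static CMOS gate with inputs In1, In2, ... InN driving both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), with output F]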

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is VDD, both the p-type and n-type transistors are on and both operate to determine the gate's output voltage
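Because the pull-up and pull-down both conduct when the output is low, the output level depends on their ratio. A minimal sketch of this, assuming both networks behave as simple linear resistors (a strong simplification of real transistor behaviour) and using made-up resistance values:

```python
def pseudo_nmos_output(vdd, rp, rn, pdn_on):
    """Resistor-divider model of a pseudo-NMOS gate output.

    rp: assumed equivalent resistance of the always-on p-type load
    rn: assumed equivalent resistance of the conducting pull-down network
    Returns (output_voltage, static_power).
    """
    if not pdn_on:
        # No path to VSS: the load pulls the output to VDD, no static current
        return vdd, 0.0
    vout = vdd * rn / (rp + rn)          # voltage divider between VDD and VSS
    p_static = vdd ** 2 / (rp + rn)      # power burned in the conduction path
    return vout, p_static


# Illustrative values: VDD = 2.5 V, Rp = 40 kOhm, Rn = 10 kOhm
v_low, p_low = pseudo_nmos_output(2.5, 40e3, 10e3, pdn_on=True)
```

In this model a weaker (more resistive) load lowers the output-low voltage and the static power, at the cost of a slower pull-up.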

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: Vout [V] versus Vin [V] for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages

Because the pull-up resistance is higher, the delay is larger and the circuit is slower
More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases

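[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluate transistor Me both driven by Clk, output Out and load capacitance CL]

Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)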

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  a fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  requires only n + 2 (n + 1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation

Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance
  The pulldown transistor controlled by the clock keeps the precharge node from being drained
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1
Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off
  The evaluation phase begins, i.e. the clocked n-type pulldown turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
  Otherwise the gate's output is left charged at logic 1

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

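[Figure: example dynamic gate with inputs A, B and C, precharge transistor Mp and evaluate transistor Me clocked by Clk, and pull-down network (A·B)+C. Precharge: Mp on, Me off, Out = 1. Evaluate: Mp off, Me on]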
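The precharge and evaluate behaviour described above can be mimicked with a small behavioural model. This is a sketch only: the class name and interface are hypothetical, and charge sharing, leakage and timing are ignored. The pull-down function used in the example is the (A·B)+C network of the figure.

```python
class DynamicGate:
    """Behavioural model of a precharged dynamic gate (no timing, no leakage)."""

    def __init__(self, pdn_conducts):
        # pdn_conducts: function of the inputs, truthy when the PDN conducts
        self.pdn_conducts = pdn_conducts
        self.out = 1

    def precharge(self):
        """CLK = 0: Mp is on, Me is off, the output node is charged to 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: Mp is off, Me is on; the node can only be discharged."""
        if self.pdn_conducts(*inputs):
            self.out = 0
        return self.out


gate = DynamicGate(lambda a, b, c: (a and b) or c)
gate.precharge()
first = gate.evaluate(1, 1, 0)    # PDN conducts, output discharges to 0
second = gate.evaluate(0, 0, 0)   # inputs fall again, output stays at 0
```

Calling `evaluate` twice within one phase models an input that changes during evaluation: once the node is discharged, only the next `precharge` can restore it.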

Properties of Dynamic Gates

Logic function is implemented by the PDN only
Full-swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation is usually higher than static CMOS
  no glitching
  higher transition probabilities
  extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  low noise margin (NML)

Advantages and Disadvantages

Advantages
  Low area
  Higher speed than static complementary gates
Disadvantages
  Precharged gates introduce functional complexity because they must be operated in two distinct phases
  Requires the introduction of a clock signal
  They are also more sensitive to noise
  Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

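[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk, input In and outputs Out1 and Out2; waveforms (V versus t) of Clk, In, Out1 and Out2, with the threshold VTn marked]

Only 0 → 1 transitions allowed at inputs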

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge
The PDN of the second dynamic inverter turns off when Out1 reaches VTn
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

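[Figure: domino gate: dynamic stage with inputs In1, In2, In3 driving the PDN, Mp and Me clocked by Clk, followed by a static inverter producing Out1]

Domino logic is the combination of a dynamic logic gate and a static inverter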

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off
  The evaluation phase begins, i.e. the n-type pulldown turns on
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
  Otherwise the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic

Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
Very high speed
  the static inverter can be skewed, since only the L-H transition matters
  input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

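[Figure: two cascaded domino gates, each with Mp and Me clocked by Clk and a static inverter; the first has inputs In1, In2, In3 and output Out1, the second has inputs In4, In5 and output Out2]

Ensures all inputs to the domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 → 1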
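The cascading property above can be illustrated with a behavioural sketch of a domino chain. The function name, the stage interface and the two example stages are assumptions made for the illustration; they are not the gates drawn in the figure.

```python
def domino_evaluate(stages, inputs):
    """Evaluate a chain of domino gates after a precharge phase.

    stages: list of functions f(prev_out, inputs) that are truthy when the
            stage's pull-down network conducts (hypothetical interface)
    inputs: dict of primary input values
    """
    outs = [0] * len(stages)      # after precharge every domino output is 0
    prev = 0
    for i, pdn_conducts in enumerate(stages):
        node = 0 if pdn_conducts(prev, inputs) else 1   # dynamic node
        outs[i] = 1 - node                              # static inverter
        prev = outs[i]
    return outs


# Two illustrative stages: Out1 = In1 AND In2, Out2 = Out1 AND In4
stages = [
    lambda prev, x: x["In1"] and x["In2"],
    lambda prev, x: prev and x["In4"],
]
result = domino_evaluate(stages, {"In1": 1, "In2": 1, "In4": 1})
```

Every output starts the evaluation phase at 0 and can only rise, which is why each stage sees at most one 0 → 1 transition on the input coming from the previous stage.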

Clocked CMOS Logic

It is static CMOS logic with two additional clocked transistors: Clk drives one additional NMOS and Clk' drives one additional PMOS
A gate with a fan-in of n requires 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on
When Clk goes low, the logic is not evaluated because the clocked NMOS is off

Advantages and Disadvantages

Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices
Disadvantages
  More area
  More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)
Alternate stages of N logic with stages of P logic
  N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to VDD
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to VDD
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic

Stage 1: X = (A · B)'
Stage 2: G = X' + Y'
Stage 3: Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
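The three stage equations of the example can be checked with a few lines of code. This sketch only evaluates the Boolean functions; it does not model the clocking, and the function name is hypothetical. Y, F and H are treated as additional primary inputs, which is an assumption since the figure is not available.

```python
def np_example(a, b, y, f, h):
    """Evaluate X = (A.B)', G = X' + Y', Z = (F.G + H)' for 0/1 inputs."""
    x = 1 - (a & b)              # stage 1
    g = (1 - x) | (1 - y)        # stage 2
    z = 1 - ((f & g) | h)        # stage 3
    return x, g, z


x, g, z = np_example(a=1, b=1, y=1, f=1, h=0)
```

With A = B = 1 the first stage output X falls to 0, which raises G to 1 and in turn pulls Z down to 0.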

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

Inverter, NAND gate

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain adder
Enhancement techniques
  Carry-select adders
  Carry-skip adders
  Carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
  Butting contact: L = 22λ, W = 10λ
  Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So the adder standard cell is L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1, independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0, independent of C
  K = ~A · ~B
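The three signals defined above can be checked against the full-adder carry with a short sketch. The function names are hypothetical; the expression Cout = G + P·C is the usual way of combining the signals.

```python
def gpk(a, b):
    """Generate, propagate and kill signals for one bit position (0/1 inputs)."""
    g = a & b                  # G = A . B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed through generate and propagate: Cout = G + P.C"""
    g, p, _ = gpk(a, b)
    return g | (p & c)
```

For every input combination exactly one of G, P and K is 1, so the three cases are mutually exclusive and cover all possibilities.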

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

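Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay

[Figure: 4-bit ripple chain with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1, C2, C3, carry-in Cin and carry-out Cout]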

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel
The output carry is expressed in terms of the generate and propagate signals
The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0
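The expression for C1 above extends to the remaining carries by repeated substitution. The sketch below writes out the standard expanded form for a 4-bit block and computes the sum bits from it; the function name and the bit-list representation are assumptions for the example.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (4-bit sum, carry out C4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # Pi = Ai xor Bi

    # Every carry depends only on G, P and C0, so all can be formed in parallel
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c4
```

No carry expression refers to another carry, which is what removes the ripple dependency at the cost of wider gates.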

4-bit CMOS CLA

Carry Select Adder

It is also referred to as the conditional-sum adder
The adder is divided into two blocks
  one block with Cin at logical 0
  another block with Cin at logical 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
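The selection scheme and the delay formula above can be put together in a short behavioural sketch. The block adders are modelled with ordinary integer addition rather than gate by gate, and the function names, widths and delay values are assumptions for the example. The width is assumed to be a multiple of the block size.

```python
def block_add(a, b, cin, width):
    """Behavioural stand-in for one ripple block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=16, block=4):
    """Carry-select addition: each block is computed for Cin = 0 and Cin = 1."""
    mask = (1 << block) - 1
    carry = cin
    result = 0
    for base in range(0, width, block):
        a_blk = (a >> base) & mask
        b_blk = (b >> base) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block)    # block assuming Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block)    # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)    # 2:1 multiplexer
        result |= s << base
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P.K1 + (M-1).K2"""
    return p * k1 + (m - 1) * k2
```

With the assumed values P = 4, M = 4, K1 = 1.0 and K2 = 0.5, `carry_select_delay` gives 5.5 cell delays: all blocks add in parallel and only the multiplexer chain is traversed serially.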


Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates

Logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, which gives a low noise margin (NML).

Advantages and Disadvantages

Advantages
Low area.
Higher speed than static complementary gates.

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

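Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with its own Mp and Me driven by Clk; In drives the first stage, Out1 drives the second stage, which produces Out2. The waveform plots Clk, In, Out1 and Out2 as voltage V against time t, with VTn marked and a voltage drop on Out2]

Only 0 → 1 transitions are allowed at the inputs.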

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can make only a single 0 → 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.
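The failure can be illustrated with a coarse unit-delay model of two cascaded dynamic inverters. This is only a sketch under strong assumptions: nodes are treated as ideal logic values, each gate reacts to the values from the previous time step, and a discharged node never recovers within the evaluation phase. In a real circuit Out2 loses only part of its charge; the model exaggerates this to a full logic error.

```python
def simulate_cascade(inp: int, steps: int = 3) -> list[tuple[int, int]]:
    """Return the (Out1, Out2) history during one evaluation phase."""
    out1, out2 = 1, 1  # both nodes precharged to logic 1
    history = [(out1, out2)]
    for _ in range(steps):
        # Stage 1 discharges when In = 1; stage 2 still sees the old Out1.
        new_out1 = 0 if inp == 1 else out1
        new_out2 = 0 if out1 == 1 else out2
        out1, out2 = new_out1, new_out2
        history.append((out1, out2))
    return history


if __name__ == "__main__":
    # With In = 1 the ideal result is Out1 = 0, Out2 = 1.
    print(simulate_cascade(1))
    # With In = 0 the ideal result is Out1 = 1, Out2 = 0.
    print(simulate_cascade(0))
```

With In = 1 the second stage discharges while Out1 is still high and ends at 0 instead of the intended 1, which is the erroneous evaluation described above.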

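Domino CMOS Logic

[Figure: domino gate. A dynamic stage (Mp, Me and a PDN with inputs In1, In2 and In3, clocked by Clk) drives a static inverter that produces Out1; the dynamic node either stays at 1 or falls 1 → 0 during evaluation]

Domino logic is the combination of a dynamic logic stage and a static inverter.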

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
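A logic-level sketch of a domino gate follows, assuming ideal nodes and no timing effects. The class name `DominoGate` and the two example functions (a 2-input AND feeding a 2-input OR) are illustrative choices, not taken from the slides.

```python
from typing import Callable


class DominoGate:
    """Dynamic stage followed by a static inverter."""

    def __init__(self, pdn: Callable[..., int]):
        self.pdn = pdn
        self.node = 1  # dynamic storage node, assumed precharged

    @property
    def out(self) -> int:
        """Static inverter output."""
        return 1 - self.node

    def precharge(self) -> None:
        """CLK = 0: node goes to 1, so the gate output goes to 0."""
        self.node = 1

    def evaluate(self, *inputs: int) -> int:
        """CLK = 1: discharge the node if the PDN conducts."""
        if self.pdn(*inputs):
            self.node = 0
        return self.out


if __name__ == "__main__":
    stage1 = DominoGate(lambda a, b: a and b)  # non-inverting AND
    stage2 = DominoGate(lambda x, c: x or c)   # non-inverting OR
    stage1.precharge()
    stage2.precharge()
    print(stage1.out, stage2.out)              # both outputs are 0 after precharge
    x = stage1.evaluate(1, 1)
    print(x, stage2.evaluate(x, 0))
```

Because every gate output is 0 after precharge, a following stage can only ever see 0 → 1 transitions, which is why domino stages can be cascaded directly.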

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because each dynamic node can make only a logic 1 to logic 0 transition.
Very high speed:
  the static inverter can be skewed, since only the L-H transition matters
  input capacitance is reduced, giving a smaller logical effort

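Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first (PDN inputs In1, In2, In3; Mp, Me, Clk) produces Out1 through its inverter; Out1 feeds the second gate (PDN inputs In4, In5; Mp, Me, Clk), which produces Out2. Annotations show dynamic nodes making 1 → 1 or 1 → 0 transitions and inverter outputs making 0 → 0 or 0 → 1 transitions]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 → 1.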

Clocked CMOS Logic

It combines a static CMOS gate with two additional clocked transistors: the Clk input drives one additional NMOS and the Clk' input drives one additional PMOS.
A gate with fan-in n requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
More area.
More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic), as shown below.

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
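The three stage equations can be checked with a small Boolean sketch. It evaluates only the final logic values and does not model clocking; treating Y, F and H as independent primary inputs is an assumption, since the slides do not say where they come from.

```python
def np_cmos_example(a: int, b: int, y: int, f: int, h: int) -> tuple[int, int, int]:
    """Evaluate X, G and Z for the three-stage example (all values 0 or 1)."""
    x = 1 - (a & b)          # stage 1 (N logic): X = (A . B)'
    g = (1 - x) | (1 - y)    # stage 2 (P logic): G = X' + Y'
    z = 1 - ((f & g) | h)    # stage 3 (N logic): Z = (F . G + H)'
    return x, g, z


if __name__ == "__main__":
    print(np_cmos_example(1, 1, 1, 1, 0))
    print(np_cmos_example(0, 0, 1, 1, 0))
```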

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter; BiCMOS NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques:
  Carry select adders
  Carry skip adders
  Carry look-ahead adders
32-bit adders

1-bit full adder

[Figures: CMOS full adder, with sum output and carry output circuits; complementary static CMOS adder]

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers - L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact - L = 22λ, W = 10λ
   Buried contact - L = 30λ, W = 10λ
ii) CMOS inverter - L = 35λ, W = 18λ
Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A · ~B
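A small sketch shows that exactly one of generate, propagate and kill is active for any input pair, and that the carry out follows from G and P alone. The function names are illustrative.

```python
def pgk(a: int, b: int) -> tuple[int, int, int]:
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_out(a: int, b: int, c: int) -> int:
    """Cout = G + P.C; the kill case gives 0 automatically."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, pgk(a, b), [carry_out(a, b, c) for c in (0, 1)])
```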

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
The carry input is precharged by the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
Generate signal (G = A · B)

[Figure: 4-stage dynamic Manchester chain]

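Carry-Ripple Adder

Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sum outputs S1..S4, internal carries C1..C3, carry input Cin and carry output Cout]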
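The cascade of full adders can be sketched bit by bit as follows. Operands are passed as integers and the carry ripples from the least significant bit upward; the function names and the default width of 4 are illustrative choices.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder returning (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, width: int = 4, cin: int = 0) -> tuple[int, int]:
    """Add two width-bit numbers; returns (sum bits as an integer, final carry)."""
    carry, total = cin, 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(0b1011, 0b0110))  # 11 + 6 = 17, i.e. sum 0001 with carry 1
```

The loop makes the critical path visible: each iteration needs the carry produced by the previous one.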

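Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.

The expressions for the 4-bit CLA are:
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0

[Figure: 4-stage carry look-ahead adder]

[Figure: 1-bit CMOS CLA implementing C1 = G0 + P0·C0]

[Figure: 4-bit CMOS CLA]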
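A sketch of a 4-bit look-ahead adder is given below. It starts from C1 = G0 + P0·C0 and expands the higher carries in the standard way, so that every carry depends only on the G and P signals and C0. The function name and the use of integers for the operands are assumptions made for illustration.

```python
def cla_4bit(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit carry look-ahead add; returns (4-bit sum, carry out C4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate bits

    # Every carry is a two-level function of G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c4


if __name__ == "__main__":
    print(cla_4bit(0b1011, 0b0110))
```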

Carry Select Adder

It is also referred to as a conditional-sum adder.
The adder is divided into two blocks: one block with a logical 0 Cin and another block with a logical 1 Cin.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.

[Figure: carry select adder structure]

[Figure: 8-bit carry select adder]

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M − 1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
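The selection scheme and the delay expression can be sketched together. This is a behavioural sketch: each block's two speculative sums are computed with ordinary integer addition rather than gate by gate, the width is assumed to be a multiple of the block size, and the delay values in the example are arbitrary units chosen for illustration.

```python
def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M - 1)*K2, as quoted above."""
    return p * k1 + (m - 1) * k2


def carry_select_add(a: int, b: int, width: int = 8, block: int = 4,
                     cin: int = 0) -> tuple[int, int]:
    """Add two width-bit numbers block by block; returns (sum, carry out)."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        sum0 = a_blk + b_blk        # block result assuming Cin = 0
        sum1 = a_blk + b_blk + 1    # block result assuming Cin = 1
        chosen = sum1 if carry else sum0  # 2:1 multiplexer driven by the real carry
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(200, 100))
    print(carry_select_delay(p=4, m=4, k1=1.0, k2=0.5))
```

Both candidate sums of a block are ready before the real carry arrives, so each extra block adds only one multiplexer delay.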

Carry Skip Adder (CSA)

It is also called a carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
It is typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1 combined into a skip signal that selects ci+1]

[Figure: 2-stage carry skip adder]

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage 4 adder

Delay of CSA

The worst-case delay T is given by T = 2(P − 1)·K1 + (M − 2)·K2, where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
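The worst-case expression can be compared against a plain ripple adder with a short sketch. Two assumptions are made here that the slides do not state explicitly: P and M mean cells per block and number of blocks, as in the carry select formula, and the ripple delay is taken as one cell delay K1 per bit. The numeric values are arbitrary units.

```python
def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = 2(P - 1)*K1 + (M - 2)*K2, as quoted above."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(p: int, m: int, k1: float) -> float:
    """Assumed reference: the carry ripples through all P*M cells."""
    return p * m * k1


if __name__ == "__main__":
    # 16-bit adder built from four 4-bit blocks.
    print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
    print(ripple_delay(p=4, m=4, k1=1.0))
```

The first term covers rippling inside the first and last blocks, while the second counts one skip per intermediate block.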

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
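The reduction of partial products to a sum vector and a carry vector can be sketched with word-level carry-save adders. This is a simplified illustration: it compresses whole rows three at a time rather than wiring individual column adders as a real Wallace tree does, and the function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: x + y + z == sum_vector + carry_vector."""
    sum_vector = x ^ y ^ z
    carry_vector = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vector, carry_vector


def wallace_multiply(a: int, b: int, width: int = 4) -> tuple[int, int]:
    """Multiply two unsigned width-bit numbers; returns (product, reduction levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*rows[i:i + 3]))
        reduced.extend(rows[full:])  # leftover rows pass to the next level
        rows = reduced
        levels += 1
    # The remaining sum and carry vectors go through a conventional adder.
    return sum(rows), levels


if __name__ == "__main__":
    print(wallace_multiply(13, 11))
```

No carry propagates inside the reduction levels; only the final two-operand addition needs a carry-propagating adder.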

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of the adder and shifter is essential for the ALU.

Barrel Shifter in 4-bit ALU

Barrel Shifter - Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] (0 to 400 ps) for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1]

Input data pattern     Delay (ps)
A = B = 0→1            67
A = 1, B = 0→1         64
A = 0→1, B = 1         61
A = B = 1→0            45
A = 1, B = 1→0         80
A = 1→0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

Pseudo-NMOS logic reduces the pull-up network of static CMOS logic from N transistors to 1.

[Figure: static CMOS gate with output F between a pull-up network (PUN) tied to VDD and a pull-down network (PDN), both driven by inputs In1, In2, ..., InN]

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate input is 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate input is VDD, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

[Figure: pseudo-NMOS VTC, Vout [V] versus Vin [V], with curves for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-NMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.

Disadvantages
The higher pull-up resistance increases delay, so circuits are slower.
More static power dissipation due to the conduction path between VDD and VSS.


Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
- Generate: Cout = 1, independent of C
  G = A · B
- Propagate: Cout = C
  P = A ⊕ B
- Kill: Cout = 0, independent of C
  K = ~A · ~B
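The three definitions above can be checked exhaustively with a few lines of Python. This is only a sketch: the function names `carry_status` and `carry_out` are illustrative, and bits are modeled as the integers 0 and 1.

```python
def carry_status(a, b):
    """Return the generate, propagate and kill signals for one bit position."""
    g = a & b                # generate: Cout = 1 whatever C is
    p = a ^ b                # propagate: Cout = C
    k = (1 - a) & (1 - b)    # kill: Cout = 0 whatever C is
    return {"G": g, "P": p, "K": k}

def carry_out(a, b, c):
    """Carry out of a full adder expressed with G and P only."""
    s = carry_status(a, b)
    return s["G"] | (s["P"] & c)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                # Compare against the arithmetic definition of the carry
                assert carry_out(a, b, c) == (a + b + c) >> 1
    print("G/P/K definitions agree with full-adder arithmetic")
```

Exactly one of G, P and K is 1 for any input pair, which is why a switch-logic chain can use them as mutually exclusive controls.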

Manchester Carry Chain

Built from switch logic using the propagate, generate and kill signals

In Manchester chain adders the carry input is precharged with the clock signal instead of passing through the logic

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor

Generate signal: G = A · B

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

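- Simplest design: cascade full adders
- Critical path goes from Cin to Cout
- Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and output Cout]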
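The cascade of full adders can be sketched behaviorally as follows. The names `full_adder` and `ripple_carry_add` are illustrative, and operands are given as bit lists with the least significant bit first (an assumption made for this sketch).

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples cell to cell."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each cell waits for the previous carry
        sums.append(s)
    return sums, carry

if __name__ == "__main__":
    # 11 + 5 = 16: all four sum bits are 0 and the carry out is 1
    print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0]))
```

The loop makes the critical path visible: the carry produced by cell i is needed before cell i+1 can settle.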

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel

The output carry is represented in terms of the generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
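Starting from C1 = G0 + P0·C0 and applying the same recurrence to each higher bit gives every carry as a function of the G, P and C0 signals only. The sketch below writes these standard expansions out for 4 bits; the names `to_bits` and `cla_add4` are illustrative and operands are plain integers in the range 0..15.

```python
def to_bits(value, n=4):
    """Integer -> list of n bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]

def cla_add4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (sum, carry_out) as integers."""
    a_bits, b_bits = to_bits(a), to_bits(b)
    g = [x & y for x, y in zip(a_bits, b_bits)]   # generate signals G0..G3
    p = [x ^ y for x, y in zip(a_bits, b_bits)]   # propagate signals P0..P3
    # Each carry depends only on g, p and c0, so all can be formed in parallel
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))   # Si = Pi xor Ci
    return total, c4

if __name__ == "__main__":
    print(cla_add4(11, 5))
```

The growing number of product terms per carry is the price of removing the ripple, which is why look-ahead is usually applied to small groups of bits.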

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks:
- one block with Cin = logical 0
- another block with Cin = logical 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
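A behavioral sketch of the selection scheme and of the delay expression above is given below. All names (`add_block`, `carry_select_add`, `carry_select_delay`, and the helpers) are illustrative, and each block is modeled as a small ripple adder, which is an assumption of this sketch.

```python
def to_bits(value, n):
    """Integer -> list of n bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(b << i for i, b in enumerate(bits))

def add_block(a_bits, b_bits, cin):
    """Ripple addition inside one block; returns (sum_bits, carry_out)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry

def carry_select_add(a_bits, b_bits, block_size, cin=0):
    """Each block is computed for Cin = 0 and Cin = 1; the real carry selects one."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block_size):
        a_blk = a_bits[i:i + block_size]
        b_blk = b_bits[i:i + block_size]
        s0, c0 = add_block(a_blk, b_blk, 0)   # speculative result for Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1)   # speculative result for Cin = 1
        s_sel, carry = (s1, c1) if carry else (s0, c0)   # the 2:1 multiplexer
        sums.extend(s_sel)
    return sums, carry

def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2, with the symbols defined in the text."""
    return p * k1 + (m - 1) * k2

if __name__ == "__main__":
    s, c = carry_select_add(to_bits(200, 8), to_bits(100, 8), block_size=4)
    print(from_bits(s) + 256 * c)                  # 300
    print(carry_select_delay(4, 2, 1.0, 0.5))      # 4.5
```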

Carry Skip Adder (CSA)

It is also called a carry bypass adder

It looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages

If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

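[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal]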

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0

else C3 = output carry of the stage 4 adder

Delay of CSA

Worst case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
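The skip condition and the worst-case delay expression can be sketched as follows. The function names are illustrative; P and M are taken to mean cells per block and number of blocks, as in the carry select section, and the ripple delay of n·K1 used for comparison is an assumption of this sketch rather than a figure from the slides.

```python
def block_skip(a_bits, b_bits):
    """Skip signal: 1 when every bit position in the block propagates (Ai != Bi)."""
    return int(all(a ^ b for a, b in zip(a_bits, b_bits)))

def block_carry_out(a_bits, b_bits, cin):
    """Carry out of one block, bypassing the ripple path when the skip signal is 1."""
    if block_skip(a_bits, b_bits):
        return cin                       # bypass: carry-in goes straight to carry-out
    carry = cin
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | (carry & (a ^ b))
    return carry

def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2

def ripple_delay(n, k1):
    """Assumed ripple-carry delay for comparison: one cell delay per bit."""
    return n * k1

if __name__ == "__main__":
    # 16 bits as 4 blocks of 4 cells, unit delays
    print(carry_skip_delay(4, 4, 1, 1), ripple_delay(16, 1))
```

With these unit delays the 16-bit skip adder comes out at 8 units against 16 for the assumed ripple model; real ratios depend on the actual K1 and K2.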

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
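The shift-and-add structure can be sketched in a few lines: each multiplier bit decides whether a shifted copy of the multiplicand is added to the running product. The function name `shift_add_multiply` and the unsigned 4-bit default width are assumptions made for illustration.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits=4):
    """Unsigned multiplication by repeated shift and conditional add."""
    product = 0
    for i in range(n_bits):
        if (multiplier >> i) & 1:             # examine multiplier bit i
            product += multiplicand << i      # add the partial product of weight 2**i
    return product

if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
```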

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
- Completion time is proportional to log2(n)
- Optimized column adder tree
- Combines all partial products into 2 vectors (carry and sum)
- Carry and sum outputs are combined using a conventional adder
- Compresses the number of stages of partial products

Wallace Tree Multiplier Structure
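The column-compression idea can be illustrated with a simplified sketch: partial-product bits are grouped by weight, full adders reduce every group of three bits to a sum bit and a carry bit, and the loop repeats until two rows remain for a conventional adder. This version uses full adders only and passes leftover bits through unchanged, so it is a Wallace-style reduction under those assumptions rather than an exact reproduction of any particular published tree; `wallace_multiply` is an illustrative name.

```python
def wallace_multiply(x, y, n_bits=4):
    """Unsigned multiply by column compression; returns (product, reduction_stages)."""
    # columns[w] holds the partial-product bits of weight 2**w
    columns = [[] for _ in range(2 * n_bits)]
    for i in range(n_bits):
        for j in range(n_bits):
            columns[i + j].append((x >> i) & (y >> j) & 1)

    stages = 0
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in columns]
        for w, col in enumerate(columns):
            full = len(col) // 3
            for k in range(0, 3 * full, 3):
                a, b, c = col[k:k + 3]
                nxt[w].append(a ^ b ^ c)                          # sum stays in column w
                if w + 1 == len(nxt):
                    nxt.append([])
                nxt[w + 1].append((a & b) | (b & c) | (a & c))    # carry moves up one column
            nxt[w].extend(col[3 * full:])                         # leftover bits pass through
        columns = nxt
        stages += 1

    # Two vectors remain; combine them with a conventional adder
    row_a = sum(col[0] << w for w, col in enumerate(columns) if len(col) >= 1)
    row_b = sum(col[1] << w for w, col in enumerate(columns) if len(col) == 2)
    return row_a + row_b, stages

if __name__ == "__main__":
    print(wallace_multiply(13, 11))
```

All full adders within one stage work in parallel, so the delay grows with the number of stages rather than with the number of partial products.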

Sequential Logic

- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach, with the help of CAD tools:
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
- Generate and verify each section of the design
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
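A 4x4 barrel shifter can be viewed as a crossbar in which each output line is connected to exactly one input line by a closed pass-transistor switch. The sketch below models that view; the function name `barrel_shift`, the end-around (rotate) behavior and the shift direction are assumptions made for illustration.

```python
def barrel_shift(bits, shift):
    """Rotate a bit list through a modeled n x n crossbar of switches."""
    n = len(bits)
    # switch[i][j] is True when the switch joining input j to output i is closed
    switch = [[(i - j) % n == shift % n for j in range(n)] for i in range(n)]
    # Every output row has exactly one closed switch, so exactly one input drives it
    return [next(bits[j] for j in range(n) if switch[i][j]) for i in range(n)]

if __name__ == "__main__":
    for s in range(4):
        print(s, barrel_shift([1, 0, 0, 0], s))
```

Any shift amount is reached in a single pass through the array, because only the pattern of closed switches changes.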

Routing Power rails for ALU

Pseudo NMOS Operation
- The pulldown network of the gate is the same as for a fully complementary gate
- The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
- The p-type transistor is used as a resistor
- When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
- When the gate input is VDD, both the p-type and n-type transistors are on, and both operate to determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: Vout [V] (0.0 to 3.0) versus Vin [V] (0.0 to 2.5), with curves for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
- The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity

Disadvantages
- The higher pull-up resistance increases delay, so circuits are slower
- More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic
- The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
- It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases
- Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1)

[Figure: dynamic gate with precharge transistor Mp and evaluation transistor Me, both driven by Clk, a pull-down network (PDN) with inputs In1, In2 and In3, and output Out loaded by CL]

Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n devices (n N-type + n P-type)
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)

Dynamic CMOS Logic operation

Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance
- The pulldown transistor controlled by the clock keeps the precharge node from being drained
- The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
- When CLK goes high, precharging stops, i.e. the p-type pullup turns off
- The evaluation phase begins, i.e. the n-type pulldown turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
- If the inputs do not create such a path, the gate's output is left charged at logic 1

Conditions on Output
- Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
- Inputs to the gate can make at most one transition during evaluation
- The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL
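The precharge and evaluate rules above can be captured in a small behavioral model. It is only a sketch: the class name `DynamicGate` is illustrative, the output is an ideal 0 or 1, and leakage and charge sharing are ignored. The pull-down function used in the example, (A · B) + C, is chosen for illustration.

```python
class DynamicGate:
    """Behavioral model of a dynamic gate with an n-type pull-down network (PDN)."""

    def __init__(self, pdn):
        self.pdn = pdn   # function of the inputs; truthy when the PDN conducts
        self.out = 1     # value stored on the output capacitance CL

    def precharge(self):
        """CLK = 0: Mp on, Me off; the output node is charged to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: Mp off, Me on; the node discharges only if the PDN conducts."""
        if self.pdn(*inputs):
            self.out = 0     # once discharged, only precharge can restore the 1
        return self.out

if __name__ == "__main__":
    gate = DynamicGate(lambda a, b, c: (a & b) | c)
    gate.precharge()
    print(gate.evaluate(1, 1, 0))   # PDN conducts, output discharges
    print(gate.evaluate(0, 0, 0))   # inputs dropped, output stays discharged
    gate.precharge()
    print(gate.evaluate(0, 1, 0))   # PDN off, output keeps its precharged value
```

The second call shows why inputs may make at most one transition during evaluation: a discharged node has no path back to VDD until the next precharge.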

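[Figure: dynamic gate example with inputs A, B and C, implementing Out = ((A · B) + C)'. During precharge Mp is on and Me is off, so Out = 1; during evaluation Mp is off and Me is on]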

Properties of Dynamic Gates
- Logic function is implemented by the PDN only
- Full swing outputs (VOL = GND and VOH = VDD)
- Non-ratioed: sizing of the devices does not affect the logic levels
- Faster switching speeds due to reduced load capacitance
- Overall power dissipation is usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
- PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
- Low area
- Higher speed than static complementary gates

Disadvantages
- Precharged gates introduce functional complexity because they must be operated in two distinct phases
- Requires introduction of a clock signal
- They are also more sensitive to noise
- Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

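[Figure: two cascaded dynamic inverters (In → Out1 → Out2), each with Mp and Me driven by Clk, and voltage-versus-time waveforms for Clk, In, Out1 and Out2 with the threshold VTn marked]

Only 0 → 1 transitions are allowed at inputs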

Cascading Dynamic Gates- Problem

- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
- The PDN of the second dynamic inverter turns off when Out1 reaches VTn
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
- Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

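[Figure: domino gate: a dynamic stage (Mp, Me and a PDN with inputs In1, In2 and In3, clocked by Clk) followed by a static inverter driving Out1]

Domino logic is the combination of a dynamic logic stage and a static inverter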

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
- When CLK goes high, precharging stops, i.e. the p-type pullup turns off
- The evaluation phase begins, i.e. the n-type pulldown turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1
- If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic
- Only non-inverting logic can be implemented
- Smaller area compared to static CMOS
- Free of glitches, because the storage node can only make a transition from logic 1 to logic 0
- Very high speed
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced (smaller logical effort)

Cascading Domino Logic Gates

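[Figure: two cascaded domino gates; the first has inputs In1, In2 and In3 and output Out1, which together with In4 and In5 drives the second gate to produce Out2]

- The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
- Hence the only possible transition during evaluation is 0 → 1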
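The effect of the static inverter on cascading can be sketched with a behavioral model of two domino gates in series. The names `DominoGate` and `evaluate_chain` are illustrative, both gates are modeled as 2-input ANDs purely for the example, and timing, leakage and charge sharing are not modeled.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioral model)."""

    def __init__(self, pdn):
        self.pdn = pdn    # function of the inputs; truthy when the PDN conducts
        self.node = 1     # precharged internal storage node

    def precharge(self):
        self.node = 1     # storage node high, so the inverter output is 0

    @property
    def out(self):
        return 1 - self.node   # static inverter

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0      # node can only fall, so out can only rise
        return self.out

def evaluate_chain(gate1, gate2, in1, in2, in3):
    """Precharge both gates, then let the evaluation ripple through the chain."""
    gate1.precharge()
    gate2.precharge()
    assert gate1.out == 0 and gate2.out == 0   # every domino output starts at 0
    out1 = gate1.evaluate(in1, in2)
    out2 = gate2.evaluate(out1, in3)           # sees at most a 0 -> 1 change on out1
    return out1, out2

if __name__ == "__main__":
    g1 = DominoGate(lambda a, b: a & b)
    g2 = DominoGate(lambda a, b: a & b)
    print(evaluate_chain(g1, g2, 1, 1, 1))
    print(evaluate_chain(g1, g2, 1, 0, 1))
```

Because every gate output is 0 after precharge, the second gate can never be discharged by a stale high input, which is the failure seen with plain cascaded dynamic gates.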

Clocked CMOS Logic
- It is a combination of static CMOS with two additional clocked transistors:
  - the Clk input is given to one additional NMOS
  - the Clk' input is given to another additional PMOS
- A gate with fan-in n requires 2n + 2 transistors
- When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on
- When Clk goes low, the logic is not evaluated, because the clocked NMOS is off

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
- More area
- More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)




Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates

Logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: there is no glitching, but there are higher transition probabilities and an extra load on Clk.
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn; this gives a low noise margin (NML).

Advantages and Disadvantages

Advantages
Low area.
Higher speed than static complementary gates.

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

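Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me driven by Clk; In drives the first stage, Out1 drives the second stage, which produces Out2. Waveforms (V versus t) of Clk, In, Out1 and Out2, with VTn marked.]

Only 0 -> 1 transitions are allowed at the inputs.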

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge will fix it.
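The cascading problem can be illustrated with a deliberately crude discrete-time model. Everything numeric here is an assumption made for the sketch (a 1000 mV supply, a 200 mV threshold, and a linear discharge of 100 mV per step at the same rate for both stages); real nodes discharge along RC curves.

```python
VDD_MV = 1000   # assumed supply voltage, in millivolts
VTN_MV = 200    # assumed NMOS threshold voltage, in millivolts
STEP_MV = 100   # assumed discharge per time step while a PDN conducts


def cascade_dynamic_inverters(steps=20):
    """Evaluate phase of two directly cascaded dynamic inverters with In = 1."""
    out1 = out2 = VDD_MV                  # both nodes were precharged to VDD
    for _ in range(steps):
        pdn2_conducts = out1 > VTN_MV     # stage 2 still sees Out1 as "high"
        out1 = max(0, out1 - STEP_MV)     # stage 1 discharges Out1 towards GND
        if pdn2_conducts:
            out2 = max(0, out2 - STEP_MV) # Out2 loses charge in the meantime
    return out1, out2


out1, out2 = cascade_dynamic_inverters()
print(out1, out2)   # Out1 reaches 0, but Out2 ends well below VDD
```

Out2 should have stayed at VDD, yet it loses charge for as long as Out1 is above VTn, and a dynamic node cannot recover that charge before the next precharge.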

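Domino CMOS Logic

[Figure: domino gate. A dynamic stage (precharge transistor Mp, evaluation transistor Me, PDN with inputs In1, In2 and In3, clocked by Clk) followed by a static inverter that drives Out1.]

Domino logic is the combination of a dynamic logic gate and a static inverter.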

Domino CMOS Logic operation

Precharge phase: same as dynamic logic.

Evaluate phase: a modification of dynamic logic.
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, because only its L-H output transition matters.
Input capacitance is reduced, giving a smaller logical effort.

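Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first stage (PDN inputs In1, In2, In3) drives Out1 through its static inverter; Out1 feeds the PDN of the second stage together with In4 and In5, which drives Out2 through its own inverter. Both stages are clocked by Clk.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.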

Clocked CMOS Logic

It is a static CMOS gate combined with two additional clocked transistors: Clk is applied to one additional NMOS and Clk' to one additional PMOS.
A fan-in of n therefore requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is ON.
When Clk goes low, the logic is not evaluated because the clocked NMOS is OFF.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
More area.
More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below.

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use a complement clock, with the P logic stage tied above the output node.

During precharge, clk is low (-clk is high); the P-logic output precharges to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic below:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]

BiCMOS Technology

CMOS properties: low power, slower.
BJT properties: faster, high power.
Implementation: CMOS & BJT. Core logic: CMOS. Interface: BJT.
Basic circuits: inverter, NAND gate, NOR gate.

BiCMOS Circuits

[Figures: BiCMOS inverter; BiCMOS NAND gate.]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

[Figures: CMOS full adder (sum output, carry output); complementary static CMOS adder.]

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers - L=11λ W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ W=150λ

[Figures: layout of full adder; layout 2 of full adder.]

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
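The three carry conditions can be checked directly against ordinary addition. The sketch below uses single-bit Python integers; the function names `carry_signals` and `carry_out` are illustrative, not from the slides.

```python
def carry_signals(a, b):
    """Return (G, P, K) for one bit position with input bits a and b."""
    g = a & b              # generate:  Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill:      Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b))
```

Exactly one of G, P and K is 1 for any input pair, which is what allows a carry chain to be built from switches controlled by these three signals.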

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
Generate signal (G = A'B').

[Figure: 4-stage dynamic Manchester chain.]

Carry-Ripple Adder

Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sums S1..S4, internal carries C1..C3 and output Cout.]
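A carry-ripple adder is just a loop over full adders in which each stage waits for the carry of the previous one. The following is a minimal bit-level sketch; the helper names and the LSB-first bit lists are choices made for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


s, cout = ripple_carry_add(to_bits(9, 4), to_bits(7, 4))
print(from_bits(s), cout)   # 9 + 7 = 16, i.e. sum bits 0 with carry out 1
```

The loop makes the critical path visible: the last sum bit cannot be produced until the carry has passed through every earlier stage.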

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is represented in terms of generate and propagate signals.

The expressions for a 4-bit CLA are:

[Figure: 4-stage carry look-ahead adder.]

[Figure: 1-bit CMOS CLA, C1 = G0 + P0·C0.]

[Figure: 4-bit CMOS CLA.]
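With G_i = A_i · B_i and P_i = A_i ⊕ B_i as defined for the full adder, repeatedly substituting C_{i+1} = G_i + P_i C_i gives the usual textbook carry expressions for a 4-bit CLA. They are written out here as a reconstruction consistent with C1 = G0 + P0·C0, not as a copy of the original slide:

```latex
\begin{align*}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{align*}
```

Every carry depends only on the G and P signals and on C0, so all four can be computed in parallel instead of waiting for the previous stage.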

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks: one block with a logical 0 Cin and another block with a logical 1 Cin.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

[Figures: carry select adder structure; 8-bit carry select adder.]

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
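Both the selection mechanism and the delay formula above can be sketched in a few lines. The model below works on integers rather than gates and assumes the word width is a multiple of the block size; the function names are illustrative.

```python
def add_block(a, b, cin, width):
    """Plain adder for one block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin, width, block):
    """Add two width-bit numbers block by block (width % block == 0 assumed)."""
    result, carry = 0, cin
    mask = (1 << block) - 1
    for shift in range(0, width, block):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)   # precomputed for Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)   # precomputed for Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # 2:1 multiplexer
        result |= s << shift
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2, with the symbols defined in the text."""
    return p * k1 + (m - 1) * k2


print(carry_select_add(200, 100, 0, width=8, block=4))
print(carry_select_delay(p=4, m=2, k1=10, k2=3))
```

In hardware the two speculative sums of every block are computed at the same time, so only the multiplexer selections are serial; that is where the (M-1)·K2 term comes from.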

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

It is typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

[Figure: simplified carry-skip adder logic. Two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the output carry.]

[Figures: 2-stage carry skip adder; 4-stage carry skip adder.]

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

The worst-case delay T is given by T = 2(P-1)K1 + (M-2)K2, where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
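A short sketch of the bypass decision and of the delay formula follows. It assumes that P and M have the same meaning as for the carry select adder (cells per block and number of blocks); the function names and the LSB-first bit lists are choices made for this example.

```python
def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def skip_block_carry(a_bits, b_bits, cin):
    """Carry out of one block of a carry-skip adder (bits LSB first)."""
    if all(a ^ b for a, b in zip(a_bits, b_bits)):
        return cin                       # every bit propagates: bypass the block
    carry = cin
    for a, b in zip(a_bits, b_bits):     # otherwise the carry ripples as usual
        carry = (a & b) | (carry & (a ^ b))
    return carry


def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P-1)K1 + (M-2)K2 from the text."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


print(skip_block_carry(to_bits(0b1010, 4), to_bits(0b0101, 4), 1))  # bypass taken
print(carry_skip_delay(p=4, m=8, k1=1, k2=1))
```

The bypass never changes the result; it only shortens the path in the one case where the carry would otherwise have to ripple through the whole block.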

[Figure: delay for ripple and carry skip adders.]

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

[Table: comparison of adders.]

MULTIPLIERS

Multipliers Basics

[Figures: 4-bit multiplier; multiplier structure using shift and add.]

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

[Figures: serial-parallel multiplier; Braun array; modified Booth algorithm; Booth multiplier procedure; structure of Booth's multiplier.]

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
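The reduction of partial products to a carry vector and a sum vector can be sketched with word-level carry-save addition. This is a simplified row-wise model for unsigned operands, not a column-by-column gate netlist; the function names and the `levels` counter are illustrative.

```python
def carry_save_add(x, y, z):
    """Bitwise 3:2 compression: returns (sum vector, carry vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a, b, width):
    """Multiply two unsigned width-bit numbers; returns (product, levels)."""
    # one partial product row per multiplier bit
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        # compress rows three at a time into a sum row and a carry row
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        # rows left over after grouping pass through to the next level
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    return sum(rows), levels   # final conventional (carry-propagate) addition


print(wallace_multiply(13, 11, 4))
```

No carry propagates inside the reduction levels; the only carry-propagate addition is the final one, and the number of levels grows roughly logarithmically with the number of partial products.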

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

[Figure: data path of a processor.]

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

The design of the adder and the shifter is essential for the ALU.

[Figures: barrel shifter in a 4-bit ALU; barrel shifter operation; 4x4 barrel shifter pass transistor circuit; routing power rails for the ALU.]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.

Disadvantages
Because of the higher pull-up resistance, the delay is larger and hence the speed of the circuit is lower.
More static power dissipation due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

The static power dissipation of pseudo-nMOS logic motivates an alternative logic style, dynamic CMOS logic.
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.


Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
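The reduction to two vectors can be illustrated with a word-level carry-save sketch. This is a simplification made for this example: a real Wallace tree applies full adders column by column, whereas the hypothetical `carry_save` helper below compresses three whole rows at a time. It does show the structure described above: repeated 3-to-2 compression, then one conventional addition.

```python
def carry_save(x, y, z):
    """3:2 compression of three words into sum and carry vectors (x+y+z == s+c)."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def wallace_multiply(a, b, n=4):
    """Multiply two n-bit unsigned integers; returns (product, reduction_levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3       # rows that form complete groups of 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])                # leftover rows pass through unchanged
        rows = nxt
        levels += 1
    return sum(rows), levels                   # final conventional adder
```

With this grouping, 4 partial products need 2 reduction levels and 8 need 4, which is why the completion time grows roughly logarithmically rather than linearly in n.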

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
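As a rough behavioural sketch of a 4x4 barrel shifter, the function below treats the array as a crossbar with a one-hot shift control, where each crossing acts like a pass transistor. The rotation direction and the name `barrel_shift4` are assumptions made for this example, not taken from the circuit on the slide.

```python
def barrel_shift4(bits, shift):
    """bits: [in0, in1, in2, in3]; shift: 0..3. Returns the rotated output bits."""
    select = [1 if s == shift else 0 for s in range(4)]  # one-hot shift lines
    out = [0] * 4
    for i in range(4):            # output line
        for s in range(4):        # shift line
            # The crossing (i, s) passes input (i + s) mod 4 when shift line s is on.
            out[i] |= select[s] & bits[(i + s) % 4]
    return out
```

Because exactly one shift line is active, each output is connected to exactly one input, and any shift amount takes the same single pass-transistor delay.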

Routing Power rails for ALU

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
A fan-in of n requires 2n (n N-type + n P-type) devices.

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors.

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pull-down transistor controlled by the clock keeps the precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create such a path, the gate's output is left charged at logic 1.

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
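The precharge/evaluate behaviour and the output conditions above can be captured in a small logic-level model. The class name `DynamicGate` and its methods are hypothetical; the model ignores leakage and charge sharing and treats the output node as a stored bit. The usage note takes the pull-down function (A·B) + C from the example gate on the slide.

```python
class DynamicGate:
    """Logic-level model of a precharged gate; pdn(*inputs) is True when the
    pull-down network conducts."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.out = 1            # value stored on the output capacitance

    def precharge(self):
        """CLK = 0: the p-type device charges the output node to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: the node discharges if the PDN conducts. Call again for
        every input change during the same evaluate phase."""
        if self.pdn(*inputs):
            self.out = 0        # cannot be recharged until the next precharge
        return self.out
```

For example, with `gate = DynamicGate(lambda a, b, c: (a and b) or c)`, calling `gate.precharge()`, then `gate.evaluate(0, 0, 1)` and `gate.evaluate(0, 0, 0)` leaves the output at 0: an input that rises and falls again has already discharged the node.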

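Figure: dynamic gate with precharge transistor Mp and evaluate transistor Me, both driven by Clk; inputs A, B, C; output Out. Mp is on and Me is off during precharge (Out = 1); Mp is off and Me is on during evaluation. PDN function: ((AB)+C).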

Properties of Dynamic Gates

Logic function is implemented by the PDN only.
Full swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk.
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn: low noise margin (NML).

Advantages and Disadvantages

Advantages
Low area.
Higher speed than static complementary gates.

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

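Cascading Dynamic Gates

Figure: two cascaded dynamic inverters (In drives the first stage, Out1 drives the second stage, which produces Out2), each with precharge transistor Mp and evaluate transistor Me clocked by Clk, together with a V-t plot of Clk, In, Out1 and Out2 with VTn marked.

Only 0 -> 1 transitions are allowed at the inputs.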

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge will fix it.
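The loss of charge on Out2 can be illustrated with a deliberately crude discrete-time sketch. All numbers here (VDD = 2.5 V, VTn = 0.5 V and the per-step voltage drops) are made-up values chosen only for illustration, and the linear discharge is an assumption, not a circuit simulation.

```python
def cascaded_inverters(vdd=2.5, vtn=0.5, dv1=0.5, dv2=0.2):
    """Two cascaded dynamic inverters right after In makes a 0 -> 1 transition.
    Returns the final (Out1, Out2) voltages."""
    out1 = out2 = vdd                 # both nodes were precharged to VDD
    while out1 > 0:
        if out1 > vtn:                # second PDN still conducts
            out2 = max(0.0, out2 - dv2)
        out1 = max(0.0, out1 - dv1)   # first stage keeps discharging
    return out1, out2
```

With these assumed values Out1 ends at 0 V as intended, but Out2 ends well below VDD even though it should have stayed high; the charge it lost cannot be restored until the next precharge.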

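Domino CMOS Logic

Figure: dynamic gate (inputs In1, In2, In3 to the PDN, precharge transistor Mp, evaluate transistor Me, clock Clk) followed by a static inverter that drives Out1.

Domino logic is the combination of a dynamic logic gate and a static inverter.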

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, because only the L-H transition matters.
Input capacitance is reduced, giving a smaller logical effort.

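Cascading Domino Logic Gates

Figure: two cascaded domino gates. The first has inputs In1, In2, In3 to its PDN and drives Out1 through a static inverter; the second has inputs In4 and In5 to its PDN, is fed by Out1, and drives Out2. Each stage has a precharge transistor Mp and an evaluate transistor Me clocked by Clk.

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.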
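A logic-level sketch of two cascaded domino gates follows. The class and function names are hypothetical, and the pull-down functions (a 3-input AND for the first stage, Out1 AND (In4 OR In5) for the second) are assumptions chosen for this example, since the slides do not specify them.

```python
class DominoGate:
    """Dynamic node followed by a static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.node = 1               # precharged dynamic node

    @property
    def out(self):
        return 1 - self.node        # static inverter

    def precharge(self):
        self.node = 1               # so out = 0 after precharge

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0           # node makes a single 1 -> 0 transition
        return self.out

def domino_chain(in1, in2, in3, in4, in5):
    """Precharge both stages, then let the evaluation ripple through."""
    g1 = DominoGate(lambda a, b, c: a and b and c)   # assumed PDN function
    g2 = DominoGate(lambda x, d, e: x and (d or e))  # assumed PDN function
    g1.precharge()
    g2.precharge()
    out1 = g1.evaluate(in1, in2, in3)
    out2 = g2.evaluate(out1, in4, in5)
    return out1, out2
```

Because Out1 is 0 after precharge and can only rise during evaluation, the second stage never sees a falling input, which is the condition the plain cascaded dynamic gates violated.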

Clocked CMOS Logic

It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to one additional PMOS

A fan-in of n requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is ON.
When Clk goes low, the logic is not evaluated, because the clocked NMOS is OFF.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices.

Disadvantages
More area.
More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic), as shown below.

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use a complement clock, with the P logic stage tied above the output node.

During precharge, clk is low (-clk is high) and the P-logic output precharges to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic below:

Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
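The three-stage example can be traced with a small logic-level sketch. It assumes that stages 1 and 3 are N-logic stages (precharged to 1) and stage 2 is a P-logic stage (precharged to 0), and it treats Y, F and H as plain inputs; the function name `np_cascade` is hypothetical.

```python
def np_cascade(a, b, y, f, h):
    """One clock cycle of the N-P-N example; returns (X, G, Z) as 0/1 values."""
    # Precharge (clk low): N-stage outputs go to Vdd, P-stage output to ground.
    x, g, z = 1, 0, 1

    # Evaluate (clk high): each stage can only move away from its precharge value.
    if a and b:                      # N tree conducts, so X = (A . B)'
        x = 0
    if (not x) or (not y):           # P tree conducts on a low input, so G = X' + Y'
        g = 1
    if (f and g) or h:               # N tree conducts, so Z = (F . G + H)'
        z = 0
    return x, g, z
```

The N-stage output X can only fall during evaluation, which is the safe direction for the P-stage inputs, and the P-stage output G can only rise, which is the safe direction for the following N stage.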

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

Inverter, NAND gate

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain adder
Enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers - L=11λ W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ
ii) CMOS inverter - L=35λ W=18λ
Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1, independent of C
G = A · B

Propagate: Cout = C
P = A ⊕ B

Kill: Cout = 0, independent of C
K = ~A · ~B
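These three signals can be checked with a short sketch. The function names are hypothetical; the sum expression S = P XOR C is the standard full-adder result and is not stated on the slide.

```python
def gpk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b                # generate:  Cout = 1 regardless of C
    p = a ^ b                # propagate: Cout = C
    k = (1 - a) & (1 - b)    # kill:      Cout = 0 regardless of C
    return g, p, k

def full_adder(a, b, c):
    """Full adder built from G and P; returns (sum, carry_out)."""
    g, p, _ = gpk(a, b)
    return p ^ c, g | (p & c)
```

For every input pair exactly one of G, P and K is 1, which is what lets a carry chain be built from switches that either pull the carry node to a fixed value or pass the incoming carry along.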

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.

Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

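Carry-Ripple Adder

Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

Figure: 4-bit ripple chain of full adders with inputs A1..A4 and B1..B4, carry input Cin, sums S1..S4, internal carries C1..C3 and carry output Cout.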
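The cascade can be sketched as a loop over full adders, with each stage waiting for the carry of the previous one. The function names and the LSB-first bit-list convention are choices made for this example.

```python
def full_adder(a, b, cin):
    """Single-bit full adder; returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # this stage needs the previous carry
        sum_bits.append(s)
    return sum_bits, carry
```

The loop makes the critical path visible: the carry passes through every stage in turn, so the delay grows linearly with the number of bits.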

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
- K1 = delay through one adder cell
- K2 = delay for skipping the carry over a block
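The two delay formulas from the slides can be compared numerically. The ripple-adder estimate (n cells in series, each with delay K1) and the example values of M, P, K1 and K2 are assumptions chosen only for illustration.

```python
def carry_select_delay(m: int, p: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M-1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(m: int, p: int, k1: float, k2: float) -> float:
    """Worst case T = 2*(P-1)*K1 + (M-2)*K2, with K2 the block-skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n: int, k1: float) -> float:
    """Assumed estimate for a ripple adder: n adder cells in series."""
    return n * k1


if __name__ == "__main__":
    m, p = 4, 4          # hypothetical 16-bit adder: 4 blocks of 4 cells
    k1, k2 = 1.0, 1.0    # hypothetical unit delays
    print("ripple      :", ripple_delay(m * p, k1))
    print("carry select:", carry_select_delay(m, p, k1, k2))
    print("carry skip  :", carry_skip_delay(m, p, k1, k2))
```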

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

- Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
- Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
- Completion time is proportional to log2(n)
- Optimized column adder tree
- Combines all partial products into 2 vectors (carry and sum)
- Carry and sum outputs are combined using a conventional adder
- Compresses the number of stages of partial products

Wallace Tree Multiplier Structure
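The reduction to two vectors can be sketched at word level with 3:2 carry-save adders. This is a simplification: a real Wallace tree is organised column by column. The function names and the use of Python integers as bit vectors are assumptions of this sketch.

```python
def carry_save_add(x: int, y: int, z: int):
    """3:2 compressor on whole words: x + y + z == sum_vec + carry_vec."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a: int, b: int, width: int):
    """Multiply two unsigned width-bit numbers; returns (product, reduction levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            # Each group of three rows is compressed into two (sum and carry).
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])  # leftover rows pass to the next level
        rows = reduced
        levels += 1
    # The remaining sum and carry vectors go through a conventional adder.
    return sum(rows), levels


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4))
```

No carry propagates inside a level, so the number of levels, not the word length, sets the reduction delay.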

Sequential Logic

- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing Complex Systems - ALU

- Complex systems are designed with a top-down approach with the help of CAD tools
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
- Generate and verify each section of the design
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Properties of Dynamic Gates

- Logic function is implemented by the PDN only
- Full swing outputs (VOL = GND and VOH = VDD)
- Non-ratioed: sizing of the devices does not affect the logic levels
- Faster switching speeds due to reduced load capacitance
- Overall power dissipation is usually higher than static CMOS: there is no glitching, but there are higher transition probabilities and extra load on Clk
- The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn, giving a low noise margin (NML)

Advantages and Disadvantages

Advantages
- Low area
- Higher speed than static complementary gates

Disadvantages
- Precharged gates introduce functional complexity because they must be operated in two distinct phases
- Requires the introduction of a clock signal
- They are also more sensitive to noise
- Their clocking signals also consume power and are difficult to turn off to save power

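Cascading Dynamic Gates

[Figure: two cascaded dynamic gates with precharge transistors Mp and evaluation transistors Me, clocked by Clk; In drives Out1, which drives Out2. Waveforms of Clk, In, Out1 and Out2 (voltage V versus time t), with the threshold VTn marked]

Only 0 -> 1 transitions allowed at inputs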

Cascading Dynamic Gates- Problem

- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
- The PDN of the second dynamic inverter turns off only when Out1 falls to VTn
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
- Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

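[Figure: domino gate with inputs In1, In2 and In3 driving the PDN, precharge transistor Mp and evaluation transistor Me clocked by Clk, and output Out1]

Domino logic is the combination of a dynamic logic gate and a static inverter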

Domino CMOS Logic Operation

- Precharge phase: same as dynamic logic
- Evaluate phase: a modification of dynamic logic
- When Clk goes high, precharging stops, i.e. the p-type pull-up turns off
- The evaluation phase begins, i.e. the n-type pull-down turns on
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
- If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
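The two-phase behaviour can be illustrated with a small behavioural model. It ignores timing, charge sharing and leakage; the class `DominoGate` and its methods are hypothetical names introduced for this sketch.

```python
class DominoGate:
    """Dynamic node plus static inverter, modelled at the logic level only."""

    def __init__(self, pdn):
        self.pdn = pdn   # function returning True when the pull-down network conducts
        self.node = 1    # storage node, starts precharged

    def precharge(self):
        """Clk low: the p-type pull-up charges the storage node to 1."""
        self.node = 1

    def evaluate(self, *inputs):
        """Clk high: a conducting path discharges the node; nothing can recharge it."""
        if self.pdn(*inputs):
            self.node = 0
        return self.output()

    def output(self):
        """Static inverter on the storage node."""
        return 1 - self.node


if __name__ == "__main__":
    and_gate = DominoGate(lambda a, b: bool(a and b))
    and_gate.precharge()
    print(and_gate.output())        # 0 after precharge
    print(and_gate.evaluate(1, 1))  # 1: node discharged
    print(and_gate.evaluate(0, 1))  # still 1: an input falling back cannot restore the node
```

The last call shows why the inputs must rise monotonically: once the node has been discharged, the result stays until the next precharge.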

Properties of Domino Logic

- Only non-inverting logic can be implemented
- Smaller area compared to static CMOS
- Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
- Very high speed: the static inverter can be skewed, since only the L-H transition matters at its output
- Input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

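[Figure: two cascaded domino gates, each with a PDN (inputs In1-In3 and In4-In5), precharge transistor Mp and evaluation transistor Me clocked by Clk; outputs Out1 and Out2]

- The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
- Hence the only possible transition during evaluation is 0 -> 1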

Clocked CMOS Logic

- It is a combination of static CMOS with two additional clocked transistors: the Clk input is given to one additional NMOS and the Clk' input to one additional PMOS
- A gate with fan-in n requires 2n+2 transistors
- When Clk goes high, the logic of the circuit is evaluated because the NMOS is ON
- When Clk goes low, the logic is not evaluated because the NMOS is OFF

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
- More area
- More complexity

N-P CMOS Logic

- An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)
- Alternate stages of N logic with stages of P logic
- N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
- P logic stages use the complement clock, with the P logic tree tied above the output node
- During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
- During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
- Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
- Stage 1 is X = (A · B)'
- Stage 2 is G = X' + Y'
- Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
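The Boolean function of the three example stages can be checked with a short sketch. It models only the logic values after evaluation, not the clocking or precharge; the function name `np_stages` is hypothetical, and Y, F and H are treated as primary inputs.

```python
def np_stages(a: int, b: int, y: int, f: int, h: int):
    """Return (X, G, Z) for the three cascaded stages; all signals are 0 or 1."""
    x = 1 - (a & b)          # stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)    # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)    # stage 3: Z = (F . G + H)'
    return x, g, z


if __name__ == "__main__":
    print(np_stages(1, 1, 1, 1, 0))
    print(np_stages(0, 0, 1, 1, 0))
```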

BiCMOS Technology

CMOS properties
- Low power
- Slower

BJT properties
- Faster
- High power

Implementation: CMOS and BJT
- Core logic: CMOS
- Interface: BJT

Basic circuits
- Inverter
- NAND gate
- NOR gate

BiCMOS Circuits

- Inverter
- NAND gate

Structured Design

Designing the digital block or standard cell at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic

- 5-way selector
- Multiplexers and demultiplexers
- Encoders and decoders
- Parity generator
- Bus arbitration logic
- Gray code to binary code converter

Adders

- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
- 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard Cell Dimensions for Adder

The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

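[Figure: two cascaded domino gates. In1, In2 and In3 drive the first PDN, which produces Out1; In4 and In5 drive the second PDN, which produces Out2. Each stage has its own Mp and Me transistors clocked by Clk.]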

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.
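A minimal sketch of this cascading property, assuming a chain of single-input domino buffers (the function name `domino_chain` is hypothetical and the model is purely logical, with no timing):

```python
def domino_chain(first_input, n_stages):
    """Evaluate a chain of domino buffers and return the trace of outputs."""
    nodes = [1] * n_stages           # precharge: every dynamic node at 1
    outs = [1 - n for n in nodes]    # static inverters: every output at 0
    trace = [list(outs)]
    for stage in range(n_stages):    # evaluation ripples stage by stage
        drive = first_input if stage == 0 else outs[stage - 1]
        if drive:                    # PDN conducts, node discharges
            nodes[stage] = 0
        outs[stage] = 1 - nodes[stage]
        trace.append(list(outs))
    return trace


trace = domino_chain(1, 3)
# trace == [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
```

Every output starts at 0 and can only rise, which is the behaviour that gives domino logic its name.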

Clocked CMOS Logic

Clocked CMOS logic is a static CMOS gate with two additional transistors:
- the Clk input is given to one additional NMOS transistor
- the Clk' input is given to one additional PMOS transistor

Fan-in for this logic is 2n+2.

When Clk goes high, the clocked NMOS is on and the logic of the circuit is evaluated.

When Clk goes low, the clocked NMOS is off and the logic is not evaluated.

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
- More area
- More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).

- Stages of N logic alternate with stages of P logic.
- N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
- P logic stages use the complement clock, with the P logic tree tied above the output node.
- During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
- During evaluate, clk is high (-clk is low) and both stage types evaluate: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
- Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
- Stage 1: X = (A · B)'
- Stage 2: G = X' + Y'
- Stage 3: Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]
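The three stage equations can be checked with a short logical model. This is a sketch only: the function name `np_example` is hypothetical, and the assignment of stages 1 and 3 to N logic and stage 2 to P logic is an assumption based on the alternating arrangement, not something the slide states.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three example stages on 0/1 inputs."""
    x = 1 - (a & b)          # stage 1 (assumed N logic): X = (A.B)'
    g = (1 - x) | (1 - y)    # stage 2 (assumed P logic): G = X' + Y'
    z = 1 - ((f & g) | h)    # stage 3 (assumed N logic): Z = (F.G + H)'
    return x, g, z


result_a = np_example(1, 1, 1, 1, 0)   # (0, 1, 0)
result_b = np_example(0, 1, 1, 1, 0)   # (1, 0, 1)
```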

BiCMOS Technology

CMOS properties
- Low power
- Slower

BJT properties
- Faster
- High power

Implementation (CMOS & BJT)
- Core logic: CMOS
- Interface: BJT

Basic circuits
- Inverter
- NAND gate
- NOR gate

BiCMOS Circuits

[Figure: BiCMOS inverter and NAND gate.]

Structured Design

Designing the digital block or standard cell at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic
- 5-way selector
- Multiplexers & demultiplexers
- Encoders & decoders
- Parity generator
- Bus arbitration logic
- Gray code to binary code converter

Adders
- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
- 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for the adder

The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to the carry:
- Generate: Cout = 1 independent of C; G = A · B
- Propagate: Cout = C; P = A ⊕ B
- Kill: Cout = 0 independent of C; K = ~A · ~B
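As a quick check of these definitions, the sketch below computes G, P and K for single bits and confirms that exactly one of them is active for every input pair, and that Cout = G + P·C matches ordinary addition. The helper names `gpk` and `carry_out` are illustrative only.

```python
from itertools import product


def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_out(a, b, c):
    """Carry out expressed through generate and propagate."""
    g, p, _ = gpk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    g, p, k = gpk(a, b)
    assert g + p + k == 1                          # exactly one case applies
    assert carry_out(a, b, c) == (a + b + c) // 2  # matches arithmetic carry
```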

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester chain adders
- The carry input is precharged with the clock signal instead of passing through the logic.
- The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
- Generate signal: G = A'B'

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

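- Simplest design: cascade full adders.
- The critical path goes from Cin to Cout.
- Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sum outputs S1..S4, internal carries C1..C3, carry input Cin and carry output Cout.]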
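The cascade of full adders can be sketched as follows. Bits are held LSB first, and the function names are illustrative rather than taken from the slides.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length LSB-first bit lists by rippling the carry."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # each stage waits for the previous carry
        sums.append(s)
    return sums, carry


sums, cout = ripple_add([1, 1, 0, 1], [1, 0, 1, 0])  # 11 + 5
# sums == [0, 0, 0, 0], cout == 1, i.e. 16
```

The loop makes the critical path visible: the carry passes through every stage in turn, so the delay grows linearly with the word length.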

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.

The output carry is expressed in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
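A sketch of a 4-bit look-ahead adder follows. The expanded carry expressions are the standard ones obtained by repeatedly substituting C(i+1) = Gi + Pi·Ci, starting from C1 = G0 + P0·C0; they are written out here as an assumption, since the slide's own equations did not survive. The function name `cla4` is hypothetical.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition on LSB-first bit lists."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    # Every carry depends only on g, p and c0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


sums, c4 = cla4([1, 1, 0, 1], [1, 0, 1, 0])  # 11 + 5
# sums == [0, 0, 0, 0], c4 == 1
```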

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks:
- one block with a logical 0 Cin
- another block with a logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
- M = number of blocks in the adder
- P = number of cells in each block
- K1 = delay through one adder cell
- K2 = delay through the multiplexer
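The selection scheme and the delay formula can be sketched together. The function names, the block size of 4 and the delay values K1 = 1.0 and K2 = 0.5 are arbitrary assumptions chosen for illustration.

```python
def add_block(a_bits, b_bits, cin):
    """Ripple-add one block of LSB-first bits with a fixed carry in."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Compute each block for Cin = 0 and Cin = 1, then select the real one."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a, b = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = add_block(a, b, 0)   # speculative result for Cin = 0
        s1, c1 = add_block(a, b, 1)   # speculative result for Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # the 2:1 multiplexer
        sums.extend(s)
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2, as given in the text."""
    return p * k1 + (m - 1) * k2


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


sums, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))  # 300 = 256 + 44
delay = carry_select_delay(p=4, m=2, k1=1.0, k2=0.5)             # 4.5
```

In hardware both speculative additions run at the same time, which is why only one block delay plus the multiplexer chain appears in T.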

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

- It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
- It is typically organized into m-bit stages.
- If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

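[Figure: simplified carry-skip adder logic. Two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the carry output.]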

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;

else C3 = output carry of the stage 4 adder.
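A minimal sketch of one carry-skip block, assuming LSB-first bit lists (the function name `carry_skip_block` is hypothetical). The skip path does not change the logical result; it only lets the incoming carry bypass the ripple chain when every bit propagates.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """Return (sums, carry out, skip flag) for one block of a carry-skip adder."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals
    carry, sums = cin, []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)            # rippled carry
    skip = all(p)                                 # P0 and P1 and ... = 1
    cout = cin if skip else carry                 # bypass multiplexer
    return sums, cout, skip


sums, cout, skip = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)  # 5 + 10 + 1
# sums == [0, 0, 0, 0], cout == 1, skip is True
```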

Delay of CSA

The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
- K1 = delay through one adder cell
- K2 = delay for skipping the carry over a block
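A worked evaluation of the formula, as a sketch under stated assumptions: P is taken as the number of cells per block and M as the number of blocks (as in the carry-select formula), the delay values are arbitrary, and the ripple delay is assumed to be simply n·K1 for comparison.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P-1)*K1 + (M-2)*K2, as given in the text."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-carry delay: one cell delay per bit."""
    return n_bits * k1


# 16-bit adder split into M = 4 blocks of P = 4 cells, K1 = 1.0, K2 = 0.5
skip_t = carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5)   # 2*3*1.0 + 2*0.5 = 7.0
ripple_t = ripple_delay(16, 1.0)                      # 16.0
```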

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
- Completion time is proportional to log2 n.
- It is an optimized column adder tree.
- It combines all partial products into 2 vectors (carry and sum).
- The carry and sum outputs are combined using a conventional adder.
- It compresses the number of stages of partial products.
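The reduction to two vectors can be sketched with word-level 3:2 carry-save compressors. This is a simplified illustration rather than a gate-level Wallace tree; the function names and the default word length `n=4` are assumptions.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n=4):
    """Multiply two n-bit unsigned values by carry-save reduction."""
    # One partial product row per multiplier bit.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(partials) > 2:
        nxt = []
        for i in range(0, len(partials) - 2, 3):
            nxt.extend(csa(*partials[i:i + 3]))   # three rows become two
        # Rows not covered by a full group of three pass straight through.
        nxt.extend(partials[len(partials) - len(partials) % 3:])
        partials = nxt
        levels += 1
    # A conventional adder combines the remaining sum and carry vectors.
    return sum(partials), levels


product, levels = wallace_multiply(13, 11)   # 143, reached in 2 reduction levels
```

Each level removes roughly one third of the rows, so the number of levels grows logarithmically with the number of partial products.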

Wallace Tree Multiplier Structure

Sequential Logic
- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

- Complex systems are designed with a top-down approach, with the help of CAD tools.
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
- Generate and verify each section of the design.
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:

Stage 1: X = (A · B)'
Stage 2: G = X' + Y'
Stage 3: Z = (F · G + H)'
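
The three-stage example can be checked at the Boolean level with a short Python sketch. It models only the logic functions of the stages, not the precharge and evaluate timing, and the function name np_cascade is an assumption made for illustration.

```python
def np_cascade(a, b, y, f, h):
    """Boolean value of each stage output for 0/1 inputs."""
    x = 1 - (a & b)            # Stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)      # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # Stage 3: Z = (F . G + H)'
    return x, g, z


print(np_cascade(1, 1, 1, 1, 0))
print(np_cascade(0, 0, 1, 1, 0))
```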

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power

Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT

Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

Inverter, NAND gate

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain adder
Enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact: L = 22λ, W = 10λ
Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1 independent of C
G = A · B

Propagate: Cout = C
P = A ⊕ B

Kill: Cout = 0 independent of C
K = ~A · ~B
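
The three definitions above can be checked exhaustively with a small Python sketch. The function names pgk and carry_out are assumptions made for illustration; the relation Cout = G + P·C follows directly from the definitions.

```python
def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b                  # G = A . B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with G and P only."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        # Exactly one of G, P, K is active for any input pair.
        print(a, b, pgk(a, b), carry_out(a, b, 0), carry_out(a, b, 1))
```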

Manchester Carry Chain

Built from switch logic using propagate, generate and kill signals.

In Manchester chain adders, the carry input is precharged with the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

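Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1 to A4, B1 to B4 and Cin, sum outputs S1 to S4, internal carries C1 to C3 and output Cout]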
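
A carry-ripple adder can be sketched in Python by chaining a one-bit full adder, which makes the Cin-to-Cout critical path visible as the loop-carried variable. The names full_adder and ripple_add and the integer bit packing are assumptions made for illustration.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a, b, cin=0, n=4):
    """n-bit ripple adder; each stage waits for the previous carry."""
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_add(0b0111, 0b0001))   # the carry ripples through three stages
```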

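Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is represented in terms of generate and propagate signals.

The expressions for a 4-bit CLA are:
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0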

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
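
As a sketch of how the carries are produced in parallel, the following Python function expands every carry of a 4-bit CLA directly in terms of Gi, Pi and C0, starting from C1 = G0 + P0·C0. The function name cla4 and the integer bit packing are assumptions made for illustration.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (4-bit sum, C4)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # Pi = Ai xor Bi
    # Each carry depends only on G, P and C0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c4


print(cla4(0b1011, 0b0110))
```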

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks:
One block with logical 0 Cin
Another block with logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M-1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
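
The selection scheme and the delay expression can be sketched together in Python. This is a behavioural sketch under stated assumptions: the names carry_select_add and carry_select_delay are hypothetical, and each block is modelled by integer addition rather than by gate-level cells.

```python
def block_add(a, b, cin, width):
    """Add two width-bit blocks; returns (sum bits, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, n=8, p=4):
    """n-bit carry select adder built from blocks of p cells."""
    mask = (1 << p) - 1
    result, carry = 0, cin
    for lo in range(0, n, p):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, p)   # block computed with Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, p)   # block computed with Cin = 1
        # 2:1 multiplexer: the actual carry picks one precomputed result.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 from the delay expression above."""
    return p * k1 + (m - 1) * k2


print(carry_select_add(200, 100))          # 300 as (8-bit sum, carry out)
print(carry_select_delay(4, 2, 1.0, 0.5))  # 8-bit adder, two 4-cell blocks
```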

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

Looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

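[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, propagate signals pi, pi+1 and the skip signal]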

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0

else C3 = output carry of the stage 4 adder

Delay of CSA

Worst-case delay T is given by

T = 2(P-1)·K1 + (M-2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
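
The bypass rule and the worst-case delay expression can be sketched in Python as follows. The names carry_skip_add and carry_skip_delay are assumptions made for illustration, and the model is behavioural: the bypass does not change the result, it only shortens the path the carry takes in hardware.

```python
def carry_skip_add(a, b, cin=0, n=16, m=4):
    """n-bit carry skip adder organized into m-bit stages."""
    result, carry = 0, cin
    for lo in range(0, n, m):
        stage_cin = carry
        all_propagate = True
        for i in range(lo, lo + m):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi                      # Pi = 1 exactly when Ai != Bi
            all_propagate = all_propagate and p == 1
            result |= (p ^ carry) << i
            carry = (ai & bi) | (p & carry)  # ripple inside the stage
        if all_propagate:
            # Bypass: the stage carry-in goes straight to the carry-out.
            carry = stage_cin
    return result, carry


def carry_skip_delay(p, m, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2 from the delay expression above."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


print(carry_skip_add(0x0FF0, 0x0011))
print(carry_skip_delay(4, 4, 1.0, 0.5))
```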

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
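
The reduction of partial products to a carry vector and a sum vector can be sketched in Python with word-level 3:2 compressors (carry-save adders). This is a simplified sketch that groups whole rows in threes rather than compressing individual columns, and the names carry_save_add and wallace_multiply are assumptions made for illustration.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: a + b + c == sum_word + carry_word."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def wallace_multiply(x, y, n):
    """Multiply two n-bit unsigned numbers; returns (product, levels)."""
    # One partial product row per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(*rows[i:i + 3]))
        reduced.extend(rows[full:])      # leftover rows pass through
        rows = reduced
        levels += 1
    # The last two vectors go to a conventional adder.
    return sum(rows), levels


print(wallace_multiply(13, 11, 4))
```

With four partial product rows the reduction takes two compressor levels, and with eight rows it takes four, so the number of levels grows much more slowly than the number of rows.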

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing Complex Systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter - Operation

4x4 Barrel Shifter Pass Transistor Circuit
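A 4x4 barrel shifter can be modelled as a crossbar of switches in which one shift line is active at a time and closes one diagonal of pass transistors. The sketch below is a behavioural illustration only: the function name `barrel_shift`, the one-hot encoding of the shift lines, and the rotation direction are assumptions, not taken from the circuit on the slide.

```python
def barrel_shift(in_bits, shift):
    """Rotate in_bits through a modelled n x n switch matrix."""
    n = len(in_bits)
    select = [1 if k == shift % n else 0 for k in range(n)]   # one-hot shift lines
    out = [0] * n
    for i in range(n):            # output line i
        for j in range(n):        # input line j
            # The switch at crosspoint (i, j) conducts when its shift line is high
            if select[(j - i) % n]:
                out[i] |= in_bits[j]
    return out

print(barrel_shift([1, 0, 0, 0], 1))
```

With this choice of direction, output line i is connected to input line (i + shift) mod n.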

Routing Power rails for ALU

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: cascaded stages with outputs X, G and Z]
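The three stage equations can be checked by direct evaluation. The sketch below only models the logic functions, not the dynamic N-P circuit behaviour; the function name `np_cascade` is hypothetical, and Y, F and H are treated as free inputs because the transcript does not say where they come from.

```python
def np_cascade(A, B, Y, F, H):
    """Evaluate the three stages for 0/1 inputs: returns (X, G, Z)."""
    X = 1 - (A & B)             # Stage 1: X = (A . B)'
    G = (1 - X) | (1 - Y)       # Stage 2: G = X' + Y'
    Z = 1 - ((F & G) | H)       # Stage 3: Z = (F . G + H)'
    return X, G, Z

print(np_cascade(1, 1, 1, 1, 0))  # (0, 1, 0)
print(np_cascade(0, 0, 1, 1, 0))  # (1, 0, 1)
```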

BiCMOS Technology

CMOS properties
  Low power
  Slower
BJT properties
  Faster
  High power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits
  Inverter
  NAND gate
  NOR gate

BiCMOS Circuits

Inverter
NAND gate

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques
  Carry select adders
  Carry skip adders
  Carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard Cell Dimensions for Adder

The adder is made up of standard cells of:

Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratios)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0 independent of C
  K = ~A · ~B
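The three signals can be tabulated with a short sketch. The function names `gpk` and `full_adder_gp` are hypothetical; the carry expression Cout = G + P·C follows directly from the definitions, since the carry is either generated or propagated from C.

```python
def gpk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k

def full_adder_gp(a, b, c):
    """Full adder written in terms of G and P: returns (sum, cout)."""
    g, p, _ = gpk(a, b)
    return p ^ c, g | (p & c)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, gpk(a, b))   # exactly one of G, P, K is 1 in each row
```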

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic

The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3, and Cout]
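Cascading full adders so that each carry output feeds the next carry input can be sketched as follows. The names `full_adder` and `ripple_carry_add` are hypothetical, and the LSB-first bit order is an assumed convention.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Bit lists are LSB first; returns (sum_bits, cout)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples into the next cell
        sums.append(s)
    return sums, carry

# 1111 + 0001: the carry ripples through all four cells
print(ripple_carry_add([1, 1, 1, 1], [1, 0, 0, 0]))  # ([0, 0, 0, 0], 1)
```

The loop makes the critical path visible: each iteration needs the carry from the previous one, so the delay from Cin to Cout grows with the number of cells.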

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel

The output carry is expressed in terms of the generate and propagate signals

The expressions for the 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0
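Substituting C1 = G0 + P0C0 repeatedly into Ci+1 = Gi + PiCi gives every carry as a function of the G and P signals and C0 only, so all carries can be formed in parallel. The sketch below writes out the standard expansions for four bits; the function names `cla_carries` and `cla_add` are hypothetical, and index 0 is assumed to be the LSB.

```python
def cla_carries(g, p, c0):
    """Look-ahead carries C1..C4 from generate/propagate bits and C0."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def cla_add(a, b, c0=0):
    """4-bit addition using the look-ahead carries: returns (sum, c4)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]
    c = [c0] + cla_carries(g, p, c0)
    s = sum((p[i] ^ c[i]) << i for i in range(4))   # Si = Pi xor Ci
    return s, c[4]

print(cla_add(9, 7))  # (0, 1), since 9 + 7 = 16
```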

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks
  One block with logical 0 Cin
  Another block with logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer
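The selection scheme can be sketched for an 8-bit adder split into two 4-bit halves: the upper half is computed twice, once for each possible carry in, and the carry out of the lower half picks the result. This is an illustration under assumptions: the names `block_add` and `carry_select_add8` are hypothetical, and each 4-bit block is modelled arithmetically rather than gate by gate.

```python
def block_add(a, b, cin, n_bits=4):
    """n-bit adder block modelled arithmetically: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << n_bits) - 1), total >> n_bits

def carry_select_add8(a, b, cin=0):
    """8-bit carry-select addition: returns (sum, cout)."""
    lo_sum, lo_cout = block_add(a & 0xF, b & 0xF, cin)
    hi_if_0 = block_add(a >> 4, b >> 4, 0)     # block with logical 0 Cin
    hi_if_1 = block_add(a >> 4, b >> 4, 1)     # block with logical 1 Cin
    hi_sum, cout = hi_if_1 if lo_cout else hi_if_0   # 2:1 multiplexer
    return (hi_sum << 4) | lo_sum, cout

print(carry_select_add8(0x9C, 0x75))  # (17, 1), since 156 + 117 = 273
```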

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
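To see how the block size affects this delay, the formula can be evaluated for a few partitions of the same adder. The values below (16 bits, K1 = 2, K2 = 1 in arbitrary units) are hypothetical and chosen only for illustration; the function name `carry_select_delay` is likewise an assumption.

```python
def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M - 1)*K2, as given on the slide."""
    return p_cells * k1 + (m_blocks - 1) * k2

n_bits, k1, k2 = 16, 2, 1          # hypothetical values
for p_cells in (2, 4, 8):
    m_blocks = n_bits // p_cells   # equal-sized blocks assumed
    print(p_cells, m_blocks, carry_select_delay(m_blocks, p_cells, k1, k2))
```

With these numbers the delays are 11, 11 and 17 units for block sizes of 2, 4 and 8 cells, showing the trade-off between ripple delay inside a block and the multiplexer chain between blocks.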

Carry Skip Adder (CSA)

It is also called a carry bypass adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

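[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, a skip signal, and carry output ci+1]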

Adders

- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
- 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for the adder

The adder is made up of standard cells of:

Multiplexers - L=11λ, W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact - L=22λ, W=10λ
   Buried contact - L=30λ, W=10λ
ii) CMOS inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

- Generate: Cout = 1 independent of C
    G = A • B
- Propagate: Cout = C
    P = A ⊕ B
- Kill: Cout = 0 independent of C
    K = ~A • ~B
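A minimal Python sketch of these three signals for a single bit position follows. The function names `carry_signals` and `carry_out` are hypothetical, chosen only for this illustration; the point is that exactly one of G, P and K is active for any input pair, and that the carry-out follows from G and P alone.

```python
def carry_signals(a, b):
    """Return (G, P, K) for one bit position; a and b are 0 or 1."""
    g = a & b                # generate: Cout = 1 independent of C
    p = a ^ b                # propagate: Cout = C
    k = (1 - a) & (1 - b)    # kill: Cout = 0 independent of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out of a full adder expressed with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, carry_signals(a, b))
```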

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester Chain adders

- The carry input is precharged with the clock signal instead of passing through the logic
- Carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor
- Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

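- Simplest design: cascade full adders
- Critical path goes from Cin to Cout
- Design the full adder to have fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, carry-in Cin, internal carries C1..C3, sum outputs S1..S4 and carry-out Cout]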
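The cascade of full adders can be modelled in a few lines of Python. This is a behavioural sketch only; the helpers `to_bits`, `full_adder` and `ripple_carry_add` are illustrative names, not part of the slides, and bits are stored least-significant first.

```python
def to_bits(value, n):
    """Unsigned integer -> list of n bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """List of bits (LSB first) -> unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def full_adder(a, b, cin):
    """One full adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Chain full adders; each carry-out feeds the next cell's carry-in."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry
```

Because each cell waits for the carry of the previous one, the loop body is the critical path: the delay grows linearly with the number of bits.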

Carry Look-Ahead Adder (CLA)

- To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel
- The output carry is represented in terms of generate and propagate signals

The expressions for the 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
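As a sketch of how the carries can be produced in parallel, the Python model below expands the recurrence C(i+1) = G(i) + P(i)C(i), starting from the C1 = G0 + P0C0 expression given above. The expanded forms for C2 to C4 are the standard textbook expansion and are an assumption here, since the slide's own expressions did not survive; `cla4` is a hypothetical name.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; bit lists are LSB first.

    Every carry is written directly in terms of G, P and C0, so no
    carry depends on another carry having been computed first.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate per bit
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate per bit

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [pi ^ ci for pi, ci in zip(p, carries)]   # S(i) = P(i) xor C(i)
    return sums, c4
```

The price of the parallel carries is visible in the code: each successive carry expression has one more product term and a wider AND, which is why look-ahead is usually limited to small groups such as 4 bits.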

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks:

- One block with logical 0 as Cin
- Another block with logical 1 as Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer
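The selection scheme can be sketched behaviourally in Python. Both candidate results are computed for every block, and the real carry-in only chooses between them. The function names, the 8-bit width and the 4-bit block size are assumptions made for this illustration.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block; bit lists are LSB first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_block(a_bits, b_bits, cin):
    """Compute the block for Cin = 0 and Cin = 1, then select."""
    result_if_0 = ripple_block(a_bits, b_bits, 0)
    result_if_1 = ripple_block(a_bits, b_bits, 1)
    # Models the 2:1 multiplexer driven by the actual carry-in.
    return result_if_1 if cin else result_if_0


def carry_select_add(a, b, n=8, block=4):
    """Add two unsigned n-bit integers; returns (sum mod 2**n, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    carry = 0
    out = []
    for lo in range(0, n, block):
        sums, carry = carry_select_block(a_bits[lo:lo + block],
                                         b_bits[lo:lo + block], carry)
        out.extend(sums)
    return sum(bit << i for i, bit in enumerate(out)), carry
```

In hardware the two ripple blocks run at the same time, so the cost is duplicated adder area in exchange for a carry path that only passes through one multiplexer per block.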

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where

M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
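A short worked example of this delay expression in Python follows. The delay values K1 = 1.0 and K2 = 0.5 are arbitrary illustrative units, not figures from the slides, and `carry_select_delay` is a hypothetical helper name.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


if __name__ == "__main__":
    n = 16                 # assumed adder width
    k1, k2 = 1.0, 0.5      # assumed cell and multiplexer delays
    for p in (2, 4, 8):
        m = n // p
        print(f"P={p} M={m} T={carry_select_delay(p, m, k1, k2)}")
```

With these assumed values, four blocks of four cells give T = 4(1.0) + 3(0.5) = 5.5, while two blocks of eight cells give T = 8(1.0) + 1(0.5) = 8.5, which shows how long blocks let the ripple term dominate.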

Carry Skip Adder (CSA)

It is also called a carry bypass adder

- Looks for cases in which the carry-out of a set of bits is identical to the carry-in
- Typically organized into m-bit stages
- If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

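[Figure: two cascaded full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal]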

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0

else C3 = output carry of the stage 4 adder
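The skip condition can be checked with a small Python model of one 4-bit stage. The name `carry_skip_block` is hypothetical, and the model returns the skip flag alongside the result so that the bypass decision is visible; it is a behavioural sketch, not the gate-level circuit.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage; bit lists are LSB first.

    Returns (sum_bits, carry_out, skipped).
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate per bit

    # Normal ripple path through the stage.
    carry = cin
    sums = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)

    # Bypass path: if every bit propagates, the carry-in is the carry-out.
    skipped = all(p)
    cout = cin if skipped else carry
    return sums, cout, skipped
```

Both paths always give the same logical carry-out. The benefit of the bypass is timing only: when every bit propagates, the carry reaches the stage output through one gate instead of rippling through all the cells.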

Delay of CSA

Worst-case delay T is given by T = 2(P-1)K1 + (M-2)K2, where

K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
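To see how the block size trades ripple delay against skip delay, the expression can be swept over the possible block sizes in Python. Here P and M are taken to mean cells per block and number of blocks, as in the carry select delay, and the values K1 = K2 = 1 are assumed units chosen only for illustration; `best_block_size` is a hypothetical helper.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P-1)K1 + (M-2)K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def best_block_size(n, k1, k2):
    """Block size P (dividing n, with at least 2 blocks) that minimises T."""
    candidates = [p for p in range(1, n + 1) if n % p == 0 and n // p >= 2]
    return min(candidates, key=lambda p: carry_skip_delay(p, n // p, k1, k2))


if __name__ == "__main__":
    n, k1, k2 = 32, 1, 1   # assumed width and unit delays
    for p in (1, 2, 4, 8, 16):
        print(p, carry_skip_delay(p, n // p, k1, k2))
    print("best P:", best_block_size(n, k1, k2))
```

For n = 32 with these unit delays the sweep gives T = 30, 16, 12, 16 and 30 for P = 1, 2, 4, 8 and 16, so the minimum falls at P = 4: small blocks spend the time skipping, large blocks spend it rippling.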

Delay for Ripple and Carry Skip Adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Built from switch logic using propagate, generate and kill

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor

Generate signal (G = A · B)
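A behavioural sketch of a dynamic Manchester chain is given below. It rests on an assumption that the slides do not confirm: the chain nodes hold the complemented carry and are precharged high, so that a generate switch discharges a node and a kill leaves it charged. The function name `manchester_chain` and the least-significant-bit-first list format are also illustrative choices.

```python
def manchester_chain(a_bits, b_bits, c_in=0):
    """Behavioural model of a precharged carry chain (bits are LSB first).

    Assumption: each node stores the complemented carry and is
    precharged to 1 before evaluation.
    """
    n = len(a_bits)
    # Precharge phase: every complemented-carry node is charged high.
    nodes = [1] * (n + 1)
    # Evaluate phase: the first node is driven by the carry input.
    nodes[0] = 1 - c_in
    for i in range(n):
        g = a_bits[i] & b_bits[i]   # generate switch
        p = a_bits[i] ^ b_bits[i]   # propagate pass transistor
        if g:
            nodes[i + 1] = 0         # node discharged: carry is 1
        elif p:
            nodes[i + 1] = nodes[i]  # node follows the previous stage
        # Otherwise (kill) the node keeps its precharged value: carry is 0.
    carries = [1 - v for v in nodes]
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[n]
```

The loop visits the stages in order only because software is sequential; in the circuit the pass transistors form one continuous path, and the delay comes from the series resistance and capacitance of that path.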

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay

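[Figure: 4-bit carry-ripple adder with inputs A1-A4 and B1-B4, carry input Cin, internal carries C1-C3, sum outputs S1-S4 and carry output Cout]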
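The cascade can be sketched in Python as follows. The names `full_adder` and `ripple_add` are hypothetical, and bits are assumed to be listed least significant first; the loop makes the serial dependence of each stage on the previous carry explicit.

```python
def full_adder(a, b, c):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, c_in=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        # Each stage must wait for the carry from the stage before it,
        # which is why the critical path runs from Cin to Cout.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry
```

Adding 3 and 5 as 4-bit values, `ripple_add([1, 1, 0, 0], [1, 0, 1, 0])`, makes the carry ripple through the first three stages before the top sum bit settles.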

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel

The output carry is represented in terms of generate and propagate signals

The expressions for a 4-bit CLA are
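The expressions themselves did not survive in this transcript. The sketch below uses the standard expansion of C(i+1) = G(i) + P(i)·C(i), which agrees with the C1 = G0 + P0·C0 form that appears on a later slide; treat the expanded terms as a reconstruction under that assumption rather than the original slide content. The function name `cla4` is hypothetical and bits are listed least significant first.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; bit lists are LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate signals

    # Every carry depends only on g, p and c0, so all four can be
    # evaluated in parallel instead of waiting for a ripple.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4
```

The number of product terms grows with each bit position, which is the usual reason look-ahead is applied to small groups such as four bits rather than to the full word width.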

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional-sum adder

The adder is divided into two blocks
One block with Cin = logical 0
Another block with Cin = logical 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer
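A minimal Python sketch of the selection idea follows. It assumes each group of bits is computed twice with a small ripple block, once for Cin = 0 and once for Cin = 1, and that the real carry then picks one result; the names `ripple_block` and `carry_select_add` and the default block size of 4 are illustrative choices, not taken from the slides.

```python
def ripple_block(a_bits, b_bits, c_in):
    """Ripple-add one block (LSB first); returns (sum_bits, carry_out)."""
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_add(a_bits, b_bits, c_in=0, block=4):
    """Carry-select addition over fixed-size blocks (bit lists LSB first)."""
    sums = []
    carry = c_in
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        # Both candidate results are computed without knowing the carry.
        s0, c0 = ripple_block(a_blk, b_blk, 0)
        s1, c1 = ripple_block(a_blk, b_blk, 1)
        # 2:1 multiplexers controlled by the actual incoming carry.
        sums.extend(s1 if carry else s0)
        carry = c1 if carry else c0
    return sums, carry
```

In hardware the two candidate results of every block are ready at about the same time, so only the multiplexer delay accumulates from block to block.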

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2

where M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through a multiplexer
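The formula can be evaluated directly. In the sketch below the function name `carry_select_delay` is hypothetical, and the delay values in the usage note are arbitrary units chosen only to show the arithmetic, not measured data.

```python
def carry_select_delay(p, m, k1, k2):
    """Delay T = P*K1 + (M-1)*K2 of a carry select adder.

    p  : number of cells in each block
    m  : number of blocks in the adder
    k1 : delay through one adder cell
    k2 : delay through one multiplexer
    """
    return p * k1 + (m - 1) * k2
```

For a 16-bit adder built from M = 4 blocks of P = 4 cells, with assumed delays K1 = 1.0 and K2 = 0.5, `carry_select_delay(4, 4, 1.0, 0.5)` gives 4·1.0 + 3·0.5 = 5.5 units.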

Carry Skip Adder (CSA)

It is also called a carry-bypass adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1; sums si, si+1; carries ci, ci+1; propagate signals pi, pi+1; and a skip signal]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
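The bypass rule above can be modelled in Python as shown here. The function names `carry_skip_block` and `carry_skip_add`, the block size of 4, and the least-significant-bit-first lists are assumptions made for this sketch. The skip never changes the result, because a block whose bits all propagate would ripple the same carry through anyway; the bypass only shortens the path.

```python
def carry_skip_block(a_bits, b_bits, c_in):
    """One carry-skip stage; returns (sum_bits, carry_out, skipped)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    # Skip condition: every bit propagates, i.e. P0 and P1 and ... = 1.
    skipped = all(p)
    # Bypass multiplexer: take the carry input directly when skipping,
    # otherwise take the carry from the last adder of the stage.
    c_out = c_in if skipped else carry
    return sums, c_out, skipped


def carry_skip_add(a_bits, b_bits, c_in=0, block=4):
    """Chain carry-skip stages over bit lists given LSB first."""
    sums = []
    carry = c_in
    for start in range(0, len(a_bits), block):
        s, carry, _ = carry_skip_block(a_bits[start:start + block],
                                       b_bits[start:start + block], carry)
        sums.extend(s)
    return sums, carry
```

Adding 0101 and 1010 in a single stage sets every propagate signal, so `carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)` reports the skip and forwards the incoming carry unchanged.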

Delay of CSA

Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2

where K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
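To compare this with a plain ripple adder, the sketch below evaluates the formula, taking P and M to mean cells per block and number of blocks as in the carry select delay slide. The ripple estimate of n·K1 is an assumption added for comparison, the function names are hypothetical, and the delay values in the usage note are arbitrary units.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P-1)*K1 + (M-2)*K2 of a carry skip adder.

    p  : number of cells in each block
    m  : number of blocks
    k1 : delay through one adder cell
    k2 : delay for skipping the carry over one block
    """
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed ripple-adder delay: the carry passes through all n cells."""
    return n * k1
```

With 16 bits split into M = 4 blocks of P = 4 cells and assumed delays K1 = 1.0 and K2 = 0.5, `carry_skip_delay(4, 4, 1.0, 0.5)` gives 2·3·1.0 + 2·0.5 = 7.0 units against `ripple_delay(16, 1.0)` = 16.0 units.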

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Carry Select Adder

It is also referred to as the conditional-sum adder.

Each adder stage is built from two blocks: one computes its result assuming Cin = 0, the other assuming Cin = 1.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
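The selection scheme described above can be sketched in Python. This is a behavioural model only, not a gate-level design: the function names, the 8-bit width and the 4-bit block size are illustrative assumptions, and each block is modelled as a simple ripple-carry adder.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry add two little-endian bit lists; return (sum_bits, cout)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a, b, width=8, block=4, cin=0):
    """Add two unsigned integers block by block in carry-select fashion."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    result, carry = [], cin
    for lo in range(0, width, block):
        xa, xb = a_bits[lo:lo + block], b_bits[lo:lo + block]
        s0, c0 = ripple_add(xa, xb, 0)  # copy that assumes Cin = 0
        s1, c1 = ripple_add(xa, xb, 1)  # copy that assumes Cin = 1
        # 2:1 multiplexer: the actual carry-in picks one precomputed result.
        sums, carry = (s1, c1) if carry else (s0, c0)
        result.extend(sums)
    total = sum(bit << i for i, bit in enumerate(result))
    return total, carry


print(carry_select_add(182, 107))  # (33, 1): 182 + 107 = 289 = 256 + 33
```

In hardware both copies of a block work in parallel, so only the multiplexer sits on the carry path between blocks; the sequential loop here reproduces the result, not the timing.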

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M − 1)·K2

where

M = number of blocks in the adder

P = number of cells in each block

K1 = delay through one adder cell

K2 = delay through the multiplexer
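As a quick numerical check of T = P·K1 + (M − 1)·K2, the short Python sketch below evaluates the formula for a 16-bit adder split into equal blocks. The unit delays (K1 = 1.0, K2 = 0.5) and the 16-bit width are hypothetical values chosen only for illustration.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


# Hypothetical unit delays: K1 = 1.0 per adder cell, K2 = 0.5 per multiplexer.
n = 16
for p in (2, 4, 8):
    m = n // p  # number of equal-sized blocks
    print(f"P={p} M={m} T={carry_select_delay(p, m, 1.0, 0.5)}")
```

With these assumed values the 2-cell and 4-cell splits both give T = 5.5, while 8-cell blocks give T = 8.5, which shows how the block size trades adder-cell delay against multiplexer delay.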

Carry Skip Adder (CSA)

It is also called the carry-bypass adder.

Looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

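(Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1; sums si, si+1; carries ci, ci+1; propagate signals pi, pi+1; a skip signal; output carry ci+1)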

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;

else C3 = output carry of the stage-4 adder.

Delay of CSA

Worst-case delay T is given by

T = 2(P − 1)·K1 + (M − 2)·K2

where

K1 = delay through one adder cell

K2 = delay for skipping the carry over a block
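The worst-case formula T = 2(P − 1)·K1 + (M − 2)·K2 can be explored numerically. The Python sketch below assumes that P and M keep their earlier meaning (cells per block and number of blocks), that the adder is split into at least two equal blocks, and that K1 = 1.0 and K2 = 0.5 are hypothetical unit delays; the helper names are illustrative.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def best_block_size(n, k1, k2):
    """Return the equal block size P (with M = n / P >= 2) that minimises T."""
    candidates = [p for p in range(1, n + 1) if n % p == 0 and n // p >= 2]
    return min(candidates, key=lambda p: carry_skip_delay(p, n // p, k1, k2))


n = 16
for p in (1, 2, 4, 8):
    print(f"P={p} M={n // p} T={carry_skip_delay(p, n // p, 1.0, 0.5)}")
print("best P:", best_block_size(n, 1.0, 0.5))
```

For these assumed delays the minimum falls at P = 2 (T = 5.0); a different K1/K2 ratio moves the optimum, which is why the block size is normally chosen for a given technology.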

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
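A minimal Python sketch of the shift-and-add scheme for unsigned operands follows. The function name and the default 4-bit width are assumptions made for illustration; the slide's own structure is not reproduced here.

```python
def shift_and_add_multiply(multiplicand, multiplier, width=4):
    """Multiply two unsigned integers by conditional shifted additions."""
    product = 0
    for i in range(width):
        # Examine multiplier bit i; if set, add the multiplicand shifted by i.
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product


print(shift_and_add_multiply(0b1101, 0b1011))  # 143 = 13 * 11
```

Each loop iteration corresponds to one partial product, so an n-bit multiplier needs n conditional additions when done serially.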

Multipliers

Serial-parallel multiplier

Braun array

2's complement multiplication using the Baugh-Wooley method

Pipelined multiplier array

Modified Booth's algorithm

Wallace tree multipliers

Recursive decomposition of multiplication

Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay

Completion time is proportional to log2(n)

Optimized column adder tree

Combines all partial products into 2 vectors (carry and sum)

Carry and sum outputs are combined using a conventional adder

Compresses the number of stages of partial products
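The reduction described above can be sketched in Python with word-level carry-save (3:2) compression. This is a simplified behavioural model: it groups whole partial-product rows in threes instead of wiring individual columns, and the function names and 8-bit width are illustrative assumptions.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: returns (sum_vector, carry_vector)."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a, b, width=8):
    """Multiply unsigned integers; return (product, number of reduction levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        # Compress rows three at a time; leftover rows pass through unchanged.
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # The final two vectors (sum and carry) go to a conventional adder.
    return sum(rows), levels


print(wallace_multiply(200, 123))  # (24600, 4)
```

For eight partial products the row count shrinks 8, 6, 4, 3, 2, which is four compression levels; because each level removes roughly a third of the rows, the depth grows logarithmically with the operand width.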

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop

Two-phase clocking

Dynamic shift register

RAM

ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.

Generate and verify each section of the design.

Calculate the layout dimensions of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Routing Power rails for ALU