
Chp 3- Sub-system Design

Date post: 28-Oct-2014
Upload: kkece41
Sub-System Design

Topics
- Architectural issues
- Switch logic
- Gate logic
- Examples of structured design
  - Combinational logic
  - Design of an ALU subsystem
- Consideration of adders
- Multipliers
- Sequential circuits
- Semiconductor memories

Introduction
- Large systems are composed of sub-systems known as leaf cells.
- The most basic leaf cell is the common logic gate (inverter, NAND, etc.).
- Structured design:
  - High regularity
  - Leaf cells are replicated many times and interconnected to form the system.
- A logical and systematic approach to VLSI design is essential.

Dealing with Complexity
- Divide and conquer: limit the number of components you deal with at any one time.
- Group several components into larger components:
  - transistors form gates
  - gates form functional units
  - functional units form processing elements

[Figure: A system-on-a-chip (courtesy Philips)]

Major Levels of Design
- Specification: description of requirements
- Systems level: placing and interconnecting major functional units
- Function level: specification and design of major functional units
- Logic/circuit level: gate-level design and gate interconnection design
- Layout level: what will actually be patterned onto the chip, and how the chip will be processed
- Physics level: the physics of gate and switch operation

Sub-System Design Guidelines
- Define the requirements.
- Partition the overall architecture into subsystems.
- Consider the interconnection paths between the subsystems.
- Draw the system floor plan on the silicon chip.
- Use regular structures for replication.
- Draw a stick diagram for each leaf cell (module) in the system.
- Convert the stick diagram of each leaf cell into layout and run a design rule check.
- Simulate the performance of each cell.

Design Validation
- Check at every step that errors have not been introduced.
- The longer an error remains, the more expensive it becomes to remove.

Chip Architecture - Floor plan
- After the high-level design is complete, it is necessary to decide how the design is to be implemented in silicon.
- The implementation plan is known as the floor plan.
- The first step in laying out a floor plan is the routing of the supply and clock rails.
- In doing this, sufficient space must be left between the power rails to allow for data buses and combinational logic cells.
- Decide on the relative positions of the major functional blocks.
- Use a routing algorithm (software); the routing algorithm will minimize the total routing area.

[Figure: Alpha 21364 microprocessor floor plan]

[Figure: Major levels of design]

Switch Logic

How do we build switches from MOS transistors?

1) Pass transistors
2) Transmission gates

Pass Transistors

So far we have assumed that the source is grounded.

What if the source is above 0 V, e.g. a pass transistor passing VDD?

NMOS Pass Transistors
- Require one transistor and one gate signal.
- Transmit a 0 well, but when Vdd is applied to the drain the voltage at the source is only Vdd - Vtn.
- When switch logic drives gate logic, n-type switches can cause electrical problems:
  - An n-type switch driving a complementary gate makes the gate run slower when the switch input is 1.
  - The pull-down current is weaker when a lower gate voltage is applied.
  - The complementary gate's pull-down therefore does not drain charge from the output capacitance as fast as it should.

PMOS Pass Transistors
- When Vin = Vdd, Vout = Vdd.
- When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting.
- Logic 0 is therefore somewhat degraded through a p-device.
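The degraded levels described above can be checked with a small first-order model. This is a minimal sketch that ignores body effect; the function names and the supply and threshold values are illustrative assumptions, not figures from the slides.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Voltage delivered by an nMOS pass transistor (first-order, no body effect)."""
    # The nMOS device conducts only while its gate is at least vtn above the source.
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Voltage delivered by a pMOS pass transistor (first-order, no body effect)."""
    # The pMOS device conducts only while its source is at least |vtp| above the gate.
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP_ABS = 5.0, 1.0, 1.0  # illustrative values only

strong_zero = nmos_pass(0.0, VDD, VTN)        # nMOS passes 0 without loss
weak_one = nmos_pass(VDD, VDD, VTN)           # nMOS passes VDD - Vtn
weak_zero = pmos_pass(0.0, 0.0, VTP_ABS)      # pMOS passes |Vtp| instead of 0
gate_driven = nmos_pass(VDD, weak_one, VTN)   # degraded level driving the next gate: VDD - 2Vtn
```

The last line shows why a degraded output should not drive the gate of another pass transistor: each such stage loses a further threshold voltage.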

[Figure: Voltage degradation of pass transistors]

[Figure: Pass transistor circuits, with node voltages labelled Vs = VDD - Vtn (nMOS passing VDD), Vs = |Vtp| (pMOS passing VSS), VDD - Vtn and VDD - 2Vtn]

Complementary Pass Transistor Logic

[Figure: (a) A pass-transistor network and an inverse pass-transistor network, both driven by A, A', B, B', producing F and F'. (b) Example gates: AND/NAND (F = AB), OR/NOR (F = A+B), EXOR/NEXOR (F = A ⊕ B)]

Advantages and Disadvantages

Advantages
- Fewer transistors
- No static power consumption

Disadvantages
- Output voltage degrades
- Not an ideal switch, due to series resistance
- Delay of series pass transistors is large

Transmission gates

[Figures: Transmission gate circuit and operation; logic with transmission gates]

Advantages and Disadvantages

Advantages
- Fewer transistors compared to CMOS
- No static power consumption
- Efficient building of complex gates

Disadvantages
- Not an ideal switch, due to series resistance
- Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
- Zpu/Zpd ratio
- Pull-up and pull-down resistance
- Power dissipation
- Standard unit of capacitance
- Gate delay

Sizing of NMOS Nand Gate

The ratio between the pull-up and the sum of all n pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to give a correct output voltage level.

The nMOS NAND gate geometry reveals two factors:
- The area of the NAND gate is greater than that of the inverter, because of the additional pull-down transistors and the corresponding increase in the length of the pull-up transistor.
- The delay also increases in direct proportion to the number of inputs added.

NMOS Nor Gate
- The pull-down transistors are in parallel in a NOR gate, so the pull-up to pull-down ratio is the same for every input.
- It therefore has the same characteristics as the inverter.
- The area occupied is reasonable, since the length of the pull-up transistor does not increase.
- The NOR gate is therefore preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates
- High noise margins: VOH and VOL are at VDD and GND respectively.
- No static power consumption: there is never a direct path between VDD and VSS (GND) in steady-state mode.
- Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics
- The CMOS NAND gate has no ratio restriction like the nMOS NAND gate, but its geometry may need adjusting to:
  - allow for the extended fall times caused by the resistance of the series nMOS transistors
  - keep the transfer characteristic symmetrical about Vdd/2
- The CMOS NOR gate has series p-transistors, which increase the resistance and the delay.
- This affects the transfer characteristic and reduces noise immunity.
- The geometries of the nMOS and pMOS transistors should therefore be adjusted.

CMOS Logic
- Static CMOS logic
- Pseudo NMOS logic
- Dynamic CMOS logic
- Domino CMOS logic
- Clocked CMOS logic
- n-p CMOS logic

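Static CMOS Switch Delay Model

[Figure: Switch delay models for INV, NAND2 and NOR2. Each transistor is replaced by a switch with an equivalent resistance (Rn or Rp); CL is the load capacitance and Cint the internal node capacitance]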

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

Low-to-high transition:
- both inputs go low: delay is 0.69 (Rp/2) CL
- one input goes low: delay is 0.69 Rp CL

High-to-low transition:
- both inputs go high: delay is 0.69 (2Rn) CL
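The three delay expressions above can be evaluated directly. The sketch below assumes a two-input NAND, ignores the internal node capacitance Cint, and uses made-up resistance and capacitance values purely for illustration.

```python
def nand2_delays(rn, rp, cl):
    """First-order switch-model delays of a 2-input NAND (Cint ignored)."""
    return {
        # Both pMOS devices conduct in parallel: Rp/2 charges CL.
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # Only one pMOS device conducts: Rp charges CL.
        "low_to_high_one_input_falls": 0.69 * rp * cl,
        # Two nMOS devices in series: 2Rn discharges CL.
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }


# Hypothetical values: 10 kOhm devices and a 100 fF load.
delays = nand2_delays(rn=10e3, rp=10e3, cl=100e-15)
for name, seconds in delays.items():
    print(f"{name}: {seconds * 1e12:.0f} ps")
```

With equal Rn and Rp, the pull-down through two series devices is four times slower than the pull-up through two parallel devices, which is why the series transistors are usually widened.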


Delay Dependence on Input Patterns

[Figure: Output voltage [V] versus time [ps] for the input patterns A = B = 1 -> 0; A = 1, B = 1 -> 0; A = 1 -> 0, B = 1]

Input data pattern        Delay (ps)
A = B = 0 -> 1            67
A = 1, B = 0 -> 1         64
A = 0 -> 1, B = 1         61
A = B = 1 -> 0            45
A = 1, B = 1 -> 0         80
A = 1 -> 0, B = 1         81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

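Pseudo NMOS Logic

Pseudo-nMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven devices to a single device.

[Figure: Static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN) and the pull-down network (PDN) between VDD and the output F]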

Pseudo NMOS Operation
- The pull-down network of the gate is the same as for a fully complementary gate.
- The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
- The p-type transistor is used as a resistor.
- When the gate input is 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
- When the gate input is Vdd, both the p-type and n-type transistors are on, and both contribute to determining the gate's output voltage.
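Treating both conducting devices as plain resistors, as the description above suggests, gives a rough feel for the ratioed low output level and the static power. This is only a resistive-divider sketch; real transistors are nonlinear, and the resistance and supply values are assumptions chosen for illustration.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Resistive-divider estimate of the low output level and static power."""
    # With both devices on, the output sits on a divider between VDD and VSS.
    v_ol = vdd * r_pulldown / (r_pullup + r_pulldown)
    # Current flows from VDD to VSS for as long as the output is low.
    p_static = vdd ** 2 / (r_pullup + r_pulldown)
    return v_ol, p_static


# Hypothetical 4:1 pull-up to pull-down resistance ratio at a 2.5 V supply.
v_ol, p_static = pseudo_nmos_low_output(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
print(f"VOL = {v_ol:.2f} V, static power = {p_static * 1e6:.0f} uW")
```

A larger pull-up resistance lowers VOL and the static power, but it also slows the rising output, which is the trade-off behind the choice of pull-up size.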

Pseudo NMOS Characteristics

[Figure: Pseudo-nMOS voltage transfer characteristic, Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
- The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.

Disadvantages
- The higher pull-up resistance increases the delay, so circuits are slower.
- More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
- The static power dissipation of pseudo-nMOS logic motivates an alternative: dynamic CMOS logic.
- It avoids static power dissipation by adding a clock input that creates a precharge phase and a conditional evaluation phase.
- Two-phase operation: precharge (CLK = 0) and evaluate (CLK = 1).

[Figure: Dynamic gate with inputs In1, In2, In3 driving the PDN, clocked pMOS precharge transistor Mp, clocked nMOS evaluation transistor Me, output Out and load capacitance CL]

Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - A fan-in of n requires 2n devices (n N-type + n P-type).
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
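The two device counts quoted above can be tabulated for a few fan-in values. The function names are illustrative; the formulas are the 2n and n + 2 expressions from the text.

```python
def static_cmos_transistors(fan_in):
    """n N-type plus n P-type devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """n N-type logic devices, one clocked N-type and one clocked P-type device."""
    return fan_in + 2


counts = {n: (static_cmos_transistors(n), dynamic_cmos_transistors(n)) for n in (2, 4, 8)}
for n, (static, dynamic) in counts.items():
    print(f"fan-in {n}: static = {static}, dynamic = {dynamic}")
```

For a fan-in of 2 the counts are equal; the saving of the dynamic gate grows with every additional input.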

Dynamic CMOS Logic operation

Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance.
- The clock-controlled pull-down transistor is off, which keeps the precharge node from being drained.
- The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.

Evaluate
- When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
- The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
- If the inputs do not create such a path, the gate's output is left charged at logic 1.

Conditions on Output
- Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
- Inputs to the gate can make at most one transition during evaluation.
- The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: Dynamic gate example with inputs A, B and C. Precharge: Mp on, Me off, Out = 1. Evaluate: Mp off, Me on, Out = ((A·B) + C)']
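The precharge and evaluate behaviour, including the rule that a discharged output cannot recover until the next precharge, can be modelled behaviourally. This is a sketch only: the class name is hypothetical, timing and charge sharing are ignored, and the pull-down function assumes the example gate computes the complement of (A·B) + C.

```python
class DynamicGate:
    """Behavioural model of a precharged gate; the output node stores its state."""

    def __init__(self, pdn):
        self.pdn = pdn   # function returning a truthy value when the PDN conducts
        self.out = None  # unknown until the first precharge

    def precharge(self):
        # CLK = 0: Mp is on and Me is off, so the output node is charged to 1.
        self.out = 1

    def evaluate(self, *inputs):
        # CLK = 1: Mp is off and Me is on. A conducting PDN discharges the node,
        # and nothing can recharge it before the next precharge.
        if self.pdn(*inputs):
            self.out = 0
        return self.out


gate = DynamicGate(lambda a, b, c: (a and b) or c)

gate.precharge()
after_precharge = gate.out             # 1
discharged = gate.evaluate(1, 1, 0)    # PDN conducts, so the output falls to 0
stays_low = gate.evaluate(0, 0, 0)     # inputs fall again, output cannot recover

gate.precharge()
held_high = gate.evaluate(0, 1, 0)     # PDN never conducts, the output stays at 1
```

The `stays_low` case is the reason inputs must rise monotonically: a brief 1 on the inputs is enough to lose the stored value for the rest of the cycle.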

Properties of Dynamic Gates
- The logic function is implemented by the PDN only.
- Full-swing outputs (VOL = GND and VOH = VDD).
- Non-ratioed: sizing of the devices does not affect the logic levels.
- Faster switching speeds, due to reduced load capacitance.
- Overall power dissipation is usually higher than static CMOS:
  - no glitching
  - higher transition probabilities
  - extra load on Clk
- The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn.
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
- Low area
- Higher speed than static complementary gates

Disadvantages
- Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
- They require the introduction of a clock signal.
- They are more sensitive to noise.
- Their clocking signals consume power and are difficult to turn off to save power.

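Cascading Dynamic Gates

[Figure: Two cascaded dynamic inverters (In, Out1, Out2), each with clocked transistors Mp and Me, together with waveforms of Clk, In, Out1 and Out2 versus time, with VTn marked]

Only 0 -> 1 transitions are allowed at inputs.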

Cascading Dynamic Gates - Problem
- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
- The PDN of the second dynamic inverter turns off only when Out1 falls to VTn.
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can make only a single 0 -> 1 transition during the evaluation period.
- Setting all inputs of the second gate to 0 during precharge fixes the problem.

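Domino CMOS Logic

[Figure: Domino gate: a dynamic stage (PDN with inputs In1, In2, In3 and clocked transistors Mp and Me) followed by a static inverter driving Out1]

Domino logic is the combination of a dynamic logic stage and a static inverter.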

Domino CMOS Logic operation

Precharge phase: same as dynamic logic.

Evaluate phase: a modification of dynamic logic.
- When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
- The evaluation phase begins, i.e. the clocked n-type pull-down turns on.
- The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
- If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
- Only non-inverting logic can be implemented.
- Smaller area compared to static CMOS.
- Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
- Very high speed:
  - the static inverter can be skewed, because only the L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort

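Cascading Domino Logic Gates

[Figure: Two cascaded domino gates. The first has inputs In1, In2, In3 and output Out1; the second has inputs In4, In5 and output Out2. Each has clocked transistors Mp and Me and a static output inverter]

- The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
- Hence the only possible transition during evaluation is 0 -> 1.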
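A behavioural sketch of two cascaded domino gates shows why the inverter makes cascading safe. The class name, the particular pull-down functions and the wiring of the second stage are assumptions made for illustration; timing is not modelled.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (behavioural model only)."""

    def __init__(self, pdn):
        self.pdn = pdn   # function returning a truthy value when the PDN conducts
        self.node = 1    # precharged dynamic node

    def precharge(self):
        self.node = 1    # Clk = 0: node charged, so the inverted output is 0

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0  # Clk = 1: a conducting PDN discharges the node
        return self.out

    @property
    def out(self):
        return 1 - self.node  # static inverter


gate1 = DominoGate(lambda in1, in2, in3: in1 and in2 and in3)
gate2 = DominoGate(lambda out1, in4: out1 and in4)  # hypothetical second stage

gate1.precharge()
gate2.precharge()
outputs_after_precharge = (gate1.out, gate2.out)

# The second stage may start evaluating before the first has finished:
# its input is still 0, so its node is not discharged by mistake.
premature = gate2.evaluate(gate1.out, 1)

out1 = gate1.evaluate(1, 1, 1)  # first stage output rises 0 -> 1
out2 = gate2.evaluate(out1, 1)  # second stage follows, like a falling domino
```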

Clocked CMOS Logic
- It combines a static CMOS gate with two clocked transistors:
  - the Clk input drives one additional nMOS transistor
  - the Clk' input drives one additional pMOS transistor
- A gate with fan-in n therefore requires 2n + 2 transistors.
- When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS is on.
- When Clk goes low, the logic is not evaluated, because the clocked nMOS is off.

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
- More area
- More complexity

N-P CMOS Logic
- An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).
- Stages of N logic alternate with stages of P logic.
- N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
- P logic stages use the complemented clock, with the P logic tree tied above the output node.
- During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
- During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
- Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
- Stage 1: X = (A · B)'
- Stage 2: G = X' + Y'
- Stage 3: Z = (F · G + H)'

[Figure: Three cascaded N-P stages with outputs X, G and Z]

BiCMOS Technology
- CMOS properties: low power, slower
- BJT properties: faster, high power
- Implementation: CMOS and BJT
  - Core logic: CMOS
  - Interface: BJT
- Basic circuits: inverter, NAND gate, NOR gate

[Figure: BiCMOS circuits: inverter and NAND gate]

Structured Design

A digital block or standard cell is designed at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic
- 5-way selector
- Multiplexers and demultiplexers
- Encoders and decoders
- Parity generator
- Bus arbitration logic
- Gray code to binary code converter

Adders
- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
- 32-bit adders

1-bit full adder

[Figure: CMOS full adder, showing the sum output and carry output circuits]

[Figure: Complementary static CMOS adder]

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) nMOS inverter (8:1 and 4:1 ratio)
  - with butting contact: L = 22λ, W = 10λ
  - with buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

[Figures: Two layouts of the full adder]

Propagate and Generate Signal

For a full adder, define what happens to carries:
- Generate: Cout = 1, independent of C
  - G = A · B
- Propagate: Cout = C
  - P = A ⊕ B
- Kill: Cout = 0, independent of C
  - K = ~A · ~B
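The three signals defined above fully determine the carry out. The sketch below checks this against the usual majority definition of a full-adder carry; the function names are illustrative.

```python
from itertools import product


def pgk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b               # Cout = 1 regardless of the carry in
    p = a ^ b               # Cout = carry in
    k = (1 - a) & (1 - b)   # Cout = 0 regardless of the carry in
    return g, p, k


def carry_out(a, b, c):
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    g, p, k = pgk(a, b)
    assert g + p + k == 1                             # exactly one case applies
    assert carry_out(a, b, c) == int(a + b + c >= 2)  # matches the majority function
```

Because exactly one of G, P and K is active for any input pair, a carry chain only needs one switch per signal at each bit position.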

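Manchester Carry Chain

Built from switch logic using the propagate, generate and kill signals.

Manchester Chain adders
- The carry input is precharged with the clock signal instead of passing through the logic.
- The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
- Generate signal (printed on the slide as G = A'B'; note that this expression is the kill signal K = ~A · ~B defined earlier, whereas generate was defined as G = A · B).

[Figure: 4-stage dynamic Manchester chain]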

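Carry-Ripple Adder
- Simplest design: cascade full adders.
- The critical path goes from Cin to Cout.
- Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sums S1..S4, internal carries C1..C3 and output Cout]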
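A bit-level sketch of the cascade described above: each full adder passes its carry to the next, so the carry ripples from Cin to Cout. The helper names and the LSB-first bit ordering are choices made for this example.

```python
def full_adder(a, b, c):
    """One full-adder cell: returns (sum, carry out)."""
    return a ^ b ^ c, (a & b) | (c & (a ^ b))


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next cell
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
result = from_bits(sums) + (cout << 4)  # 11 + 6 = 17
```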

Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
- The output carry is expressed in terms of the generate and propagate signals.
- The expressions for a 4-bit CLA start from C1 = G0 + P0·C0.

[Figures: 4-stage carry look-ahead adder; 1-bit CMOS CLA; 4-bit CMOS CLA]
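Expanding C1 = G0 + P0·C0 recursively gives every carry as a function of the G and P signals and C0 alone, which is what lets them be computed in parallel. The expansion below is the standard textbook form, not copied from the slide, and the function names are illustrative.

```python
def cla4_carries(a, b, c0):
    """Carries C1..C4 of a 4-bit CLA from bit lists a, b (LSB first)."""
    g = [x & y for x, y in zip(a, b)]
    p = [x ^ y for x, y in zip(a, b)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """4-bit addition using look-ahead carries; returns (sum bits, carry out)."""
    carries = [c0] + cla4_carries(a, b, c0)
    sums = [a[i] ^ b[i] ^ carries[i] for i in range(4)]
    return sums, carries[4]


def to_bits(value, width=4):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = cla4_add(to_bits(9), to_bits(7))
result = from_bits(sums) + (cout << 4)  # 9 + 7 = 16
```

Each carry expression is a two-level AND-OR function, so its depth does not grow with bit position, at the cost of wider gates for the higher carries.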

Carry Select Adder
- It is also referred to as a conditional sum adder.
- The adder is divided into two blocks:
  - one block with Cin at logical 0
  - another block with Cin at logical 1
- S and Cout are selected by the actual Cin using a 2:1 multiplexer.

[Figures: Carry select adder structure; 8-bit carry select adder]
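The selection scheme can be sketched at bit level: every block is added twice, once for each possible carry in, and the real carry then picks one result. The block size of 4 and the helper names are assumptions for this example; in hardware the two additions per block run in parallel.

```python
def ripple_add(a_bits, b_bits, cin):
    """Plain ripple addition of two bit lists (LSB first)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple_add(a_blk, b_blk, 0)  # block computed assuming Cin = 0
        s1, c1 = ripple_add(a_blk, b_blk, 1)  # block computed assuming Cin = 1
        # 2:1 multiplexer: the actual carry selects the matching result.
        s, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
result = from_bits(sums) + (cout << 8)  # 200 + 100 = 300
```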

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
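A worked instance of the delay formula, compared with a plain ripple adder of the same width. The cell and multiplexer delays are made-up unit values, and the n·K1 ripple delay is an assumption added for comparison.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


def ripple_delay(n, k1):
    """Assumed ripple-carry delay: the carry passes through all n cells."""
    return n * k1


# Hypothetical 16-bit adder: 4 blocks of 4 cells, K1 = 1.0 and K2 = 0.5 time units.
t_select = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
t_ripple = ripple_delay(n=16, k1=1.0)
print(t_select, t_ripple)
```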

Carry Skip Adder (CSA)
- It is also called a carry bypass adder.
- It looks for cases in which the carry out of a set of bits is identical to the carry in.
- It is typically organized into m-bit stages.
- If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

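Simplified Carry-Skip Adder Logic

[Figure: Two full-adder (FA) cells with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal selecting the carry out]

[Figures: 2-stage carry skip adder; 4-stage carry skip adder]

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder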
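The skip rule above can be sketched for one 4-bit stage. When every propagate signal is 1, the bypass path returns the stage's carry input; otherwise the rippled carry is used. The function names are illustrative, and the assertion confirms that the bypass never changes the arithmetic result.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage (LSB first): returns (sum bits, carry out, skip flag)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    skip = all(propagate)           # P0 and P1 and P2 and P3
    cout = cin if skip else carry   # bypass multiplexer
    return sums, cout, skip


def to_bits(value, width=4):
    return [(value >> i) & 1 for i in range(width)]


# 0101 + 1010: every bit pair differs, so the carry in is bypassed to the output.
_, cout_skipped, skipped = carry_skip_block(to_bits(5), to_bits(10), 1)

# 0011 + 0001: bit 0 generates a carry, so the rippled carry is used.
_, cout_rippled, not_skipped = carry_skip_block(to_bits(3), to_bits(1), 0)
```

The bypass only shortens the delay: when all bits propagate, the rippled carry would equal the carry in anyway, just later.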

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
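A worked instance of the worst-case formula, taking P as the number of cells per block and M as the number of blocks, as in the carry select case. The unit delays are made-up values for illustration.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2 for M blocks of P cells each."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 16-bit adder: 4 blocks of 4 cells, K1 = 1.0 and K2 = 0.5 time units.
t_skip = carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5)
print(t_skip)
```

The carry ripples through the first and last blocks, which gives the 2(P - 1)·K1 term, and skips over the M - 2 blocks in between.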

[Figure: Delay for ripple and carry skip adders]

Comparison of CLA and CSK
- Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
- Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

[Table: Comparison of adders]

MULTIPLIERS

[Figures: Multiplier basics; 4-bit multiplier; multiplier structure using shift and add]
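The shift-and-add structure named above can be sketched for unsigned operands: each set bit of the multiplier adds a correspondingly shifted copy of the multiplicand to the running product. The function name and the 4-bit default are choices made for this example.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits=4):
    """Unsigned shift-and-add multiplication, one multiplier bit per step."""
    product = 0
    for i in range(n_bits):
        if (multiplier >> i) & 1:
            # Partial product: the multiplicand shifted to this bit's weight.
            product += multiplicand << i
    return product


result = shift_add_multiply(13, 11)  # 13 * 11 = 143
```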

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

[Figures: Serial-parallel multiplier; Braun array; modified Booth algorithm; Booth multiplier procedure; structure of Booth's multiplier]

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
- Completion time is proportional to log2(n).
- It is an optimized column adder tree.
- It combines all partial products into 2 vectors (carry and sum).
- The carry and sum outputs are combined using a conventional adder.
- It compresses the number of stages of partial products.

[Figure: Wallace tree multiplier structure]
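The reduction described above can be sketched with word-level 3:2 carry-save compression: groups of three partial-product rows are repeatedly reduced to a sum row and a carry row until two rows remain, and a conventional addition finishes the job. This is a simplified model with assumed function names; a real Wallace tree applies the compressors column by column.

```python
def carry_save(x, y, z):
    """3:2 compressor on whole rows: x + y + z == sum_row + carry_row."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def wallace_multiply(a, b, n_bits=8):
    """Unsigned multiply; returns (product, number of compression levels)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    rows += [0] * max(0, 2 - len(rows))  # make sure two rows exist for the final add
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save(*rows[i:i + 3]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = reduced
        levels += 1
    return rows[0] + rows[1], levels  # final conventional adder


product, levels = wallace_multiply(201, 117)
```

For 8 partial products the row count shrinks 8, 6, 4, 3, 2, so only four compression levels are needed before the final adder.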

Sequential Logic
- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

[Figure: Data path of a processor]

Designing complex systems - ALU
- Complex systems are designed top-down with the help of CAD tools.
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
- Generate and verify each section of the design.
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

The design of the adder and the shifter is essential for the ALU.

[Figures: Barrel shifter in a 4-bit ALU; barrel shifter operation; 4x4 barrel shifter pass-transistor circuit]
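A 4x4 barrel shifter can be modelled as a crossbar of pass transistors in which each shift line closes one diagonal of switches. The sketch below assumes one-hot shift control lines and an end-around (rotate) connection; the circuit on the slide may be wired differently, and the function name is hypothetical.

```python
def barrel_shift(in_bits, shift_lines):
    """Crossbar model: out[i] is connected to in[(i + s) mod n] by shift line s."""
    n = len(in_bits)
    if len(shift_lines) != n or sum(shift_lines) != 1:
        raise ValueError("exactly one shift line must be active")
    out = []
    for i in range(n):
        # A pass transistor at (i, j) conducts when its shift line is high;
        # only one transistor per output row conducts for one-hot control.
        out.append(max(in_bits[j] & shift_lines[(j - i) % n] for j in range(n)))
    return out


data = [1, 0, 0, 0]
unshifted = barrel_shift(data, [1, 0, 0, 0])    # shift by 0
shifted_one = barrel_shift(data, [0, 1, 0, 0])  # shift by 1
shifted_two = barrel_shift(data, [0, 0, 1, 0])  # shift by 2
```

Any shift amount takes a single pass through the array, which is the point of a barrel shifter compared with shifting one place per clock cycle.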

Routing Power rails for ALU

Page 2: Chp 3- Sub-system Design

Topics Architectural issues Switch logic Gate Logic Examples of Structured Design - Combinational logic - Design of an ALU subsystem Consideration of Adders Multipliers Sequential Circuits Semiconductor Memories

Introduction Large systems are composed of sub-systems

known as Leaf-Cells The most basic leaf cell is the common logic

gate (inverter nand etc) Structured Design

High regularity Leaf cells replicated many times and interconnected

to form the system Logical and systematic approach to VLSI

design is essential

Dealing with Complexity

Divide and conquer - limit the number of components you deal with at any one time

Group several components into larger components transistors form gates gates form functional units functional units form processing

elements

A System-on-a-Chip

Courtesy Philips

Major Levels of Design

Specification Description of requirements

Systems Level placing and interconnecting major functional units

Function Level specification and design of major functional units

LogicCircuit Level Gate level design gate interconnection design

Layout Level what will actually be patterned onto the chip how the chip

will be processed Physics Level

the physics of gate and switch operation

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

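[Figure: domino gate. A dynamic stage (Mp, PDN with inputs In1, In2, In3, and Me, clocked by Clk) is followed by a static inverter that drives Out1. During evaluation the dynamic node can make a 1 -> 0 transition and the inverter output a 0 -> 1 transition.]
Domino logic is the combination of a dynamic logic gate and a static inverter.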

Domino CMOS Logic operation
Precharge phase: same as dynamic logic.
Evaluate phase (modification to dynamic logic):
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.
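The evaluate phase above can be sketched as a logic-level model. The function names and the two-stage example (an AND stage feeding an OR stage) are assumptions made for illustration; timing, leakage and charge sharing are not modelled.

```python
def domino_stage(pdn_conducts):
    """Return (dynamic_node, output) after evaluation of one domino stage."""
    node = 1                 # precharged storage node
    if pdn_conducts:
        node = 0             # conditional discharge during evaluate
    return node, 1 - node    # static inverter drives the gate output


def domino_chain(a, b, c):
    """Two cascaded stages: out1 = a AND b, out2 = out1 OR c."""
    # After precharge every stage output is 0, so out1 can only rise
    # during evaluation, which is what the next stage requires.
    _, out1 = domino_stage(a and b)
    _, out2 = domino_stage(out1 or c)
    return out1, out2


results = {(a, b, c): domino_chain(a, b, c)
           for a in (0, 1) for b in (0, 1) for c in (0, 1)}
print(results[(1, 1, 0)], results[(0, 1, 0)], results[(0, 0, 1)])
# (1, 1) (0, 0) (0, 1)
```

Each stage output is a non-inverted function of its inputs, which is why only non-inverting logic can be built directly from domino gates.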

Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed:
  the static inverter can be skewed, since only the L-H transition matters;
  input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

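[Figure: two cascaded domino gates. The first PDN (In1, In2, In3) drives Out1 through a static inverter; Out1 feeds the second PDN (with In4, In5), which drives Out2 through its own inverter. Dynamic nodes make 1 -> 0 transitions and inverter outputs make 0 -> 1 transitions.]
The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.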

Clocked CMOS Logic
It is a combination of:
  static CMOS logic;
  a Clk input given to one additional NMOS transistor;
  a Clk' input given to one additional PMOS transistor.
The device count for a fan-in of n is 2n+2.
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated, because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
  Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.
Disadvantages
  More area.
  More complexity.

N-P CMOS Logic
An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z.]

BiCMOS Technology
CMOS properties: low power, slower.
BJT properties: faster, high power.
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: Inverter, Nand Gate, Nor Gate

BiCMOS Circuits
[Figures: BiCMOS Inverter and Nand Gate.]

Structured Design

A digital block or standard cell is designed at the following levels:
  Logic level
  Circuit level
  Stick diagram
  Layout

Examples: Combinational Logic
  5-way selector
  Multiplexers & Demultiplexers
  Encoders & Decoders
  Parity Generator
  Bus Arbitration Logic
  Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain Adder
Enhancement Techniques:
  Carry Select Adders
  Carry Skip Adders
  Carry Look-ahead Adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers - L=11λ, W=7λ
i) NMOS Inverter (8:1 and 4:1 ratio)
   Butting contact - L=22λ, W=10λ
   Buried contact - L=30λ, W=10λ
ii) CMOS Inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the Adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1, independent of C
  G = A · B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0, independent of C
  K = ~A · ~B
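As a quick check of these definitions, the sketch below tabulates G, P and K for one bit position and derives the carry-out from them. The helper names `pgk` and `carry_out` are illustrative only.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # Cout = 1 regardless of C
    p = a ^ b              # Cout = C
    k = (1 - a) & (1 - b)  # Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed with generate and propagate only."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


table = {(a, b): pgk(a, b) for a in (0, 1) for b in (0, 1)}
print(table)
```

Exactly one of G, P and K is 1 for any input pair, so the three cases cover every combination of A and B without overlap.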

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

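Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A4 B4 ... A1 B1 and Cin, sum outputs S4 ... S1, internal carries C3, C2, C1 and output Cout.]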
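A minimal bit-level sketch of the carry-ripple structure follows. The little-endian bit lists and the helper names are assumptions chosen for the example; the loop makes the serial Cin-to-Cout dependency explicit.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two little-endian bit lists; returns (sum_bits, cout)."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage must wait for the carry of the previous stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


s_bits, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s_bits), cout)  # 1 1  (11 + 6 = 17 = 16 + 1)
```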

Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.

The Expressions for 4-bit CLA are
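For reference, the standard look-ahead carry expressions are written out below. The indexing is an assumption (C0 as the carry-in, G_i = A_i B_i and P_i = A_i ⊕ B_i as in the propagate and generate definitions), chosen to agree with the form C1 = G0 + P0C0 used in these slides.

```latex
\begin{aligned}
G_i &= A_i B_i, \qquad P_i = A_i \oplus B_i \\
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Each carry depends only on the operand bits and C0, so all four can be computed at the same time instead of rippling.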

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as the conditional sum adder.
The adder is divided into two blocks:
  one block with a logical 0 Cin;
  another block with a logical 1 Cin.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.
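The selection scheme can be sketched at the integer level as follows. The 8-bit width, the 4-bit block size and the function names are assumptions for illustration; in hardware the two candidate additions run in parallel, which this sequential model does not capture.

```python
def add_block(a, b, cin, width):
    """Plain adder for one block; returns (sum, cout)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition of two `width`-bit unsigned integers."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are computed without knowing the real carry.
        s0, c0 = add_block(a_blk, b_blk, 0, block)
        s1, c1 = add_block(a_blk, b_blk, 1, block)
        # 2:1 multiplexer driven by the actual carry into this block.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << shift
    return result, carry


print(carry_select_add(200, 100))  # (44, 1): 300 = 256 + 44
```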

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where
  M = number of blocks in the adder
  P = number of cells in each block
  K1 = delay through one adder cell
  K2 = delay through the multiplexer

Carry Skip Adder (CSA)

It is also called the carry bypass adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
It is typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

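[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal.]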

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
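The skip condition above can be sketched for one block as follows. The function name and the little-endian bit lists are assumptions; the model is purely functional, so it shows what the bypass multiplexer selects but not the delay it saves.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage; returns (sum_bits, cout) for little-endian bits."""
    carry = cin
    skip = 1
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        skip &= p                      # skip = P0 and P1 and P2 and ...
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)  # ripple carry inside the block
    # Bypass multiplexer: if every bit propagates, forward cin directly.
    cout = cin if skip else carry
    return sum_bits, cout


# 0101 + 1010 with cin = 1: every bit propagates, so cout = cin.
print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))  # ([0, 0, 0, 0], 1)
```

When the skip signal is 1 the rippled carry equals the carry-in anyway, so the multiplexer changes only the timing of the carry-out, not its value.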

Delay of CSA

The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where
  K1 = delay through one adder cell
  K2 = delay for skipping the carry over a block
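To get a feel for the formulas, the sketch below evaluates the carry-skip delay next to the carry-select delay T = P·K1 + (M-1)·K2 quoted earlier in these slides. The numeric values (16 bits, 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5 in arbitrary units) are made up for illustration, and the ripple figure assumes one cell delay per bit, which is an assumption rather than a formula from the slides.

```python
def carry_select_delay(p, m, k1, k_mux):
    """T = P*K1 + (M-1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k_mux


def carry_skip_delay(p, m, k1, k_skip):
    """T = 2(P-1)*K1 + (M-2)*K2, with K2 the block-skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k_skip


def ripple_delay(n, k1):
    """Assumed model: the carry passes through all n cells in series."""
    return n * k1


P, M, K1, K2 = 4, 4, 1.0, 0.5  # illustrative values only
delays = {
    "ripple": ripple_delay(P * M, K1),
    "carry-select": carry_select_delay(P, M, K1, K2),
    "carry-skip": carry_skip_delay(P, M, K1, K2),
}
print(delays)  # {'ripple': 16.0, 'carry-select': 5.5, 'carry-skip': 7.0}
```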

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
  Serial-parallel Multiplier
  Braun Array
  2's Complement multiplication using the Baugh-Wooley method
  Pipelined Multiplier Array
  Modified Booth's Algorithm
  Wallace Tree Multipliers
  Recursive decomposition of Multiplication
  Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2(n).
It is an optimized column adder tree.
It combines all partial products into 2 vectors (carry and sum).
The carry and sum outputs are combined using a conventional adder.
It compresses the number of stages of partial products.
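The reduction to two vectors can be sketched with word-wide 3:2 compressors. This is a simplified model under stated assumptions: unsigned operands, whole partial-product rows grouped three at a time (a real Wallace tree compresses column by column), and a hypothetical function name.

```python
def wallace_multiply(a, b, width=8):
    """Multiply unsigned `width`-bit integers by carry-save reduction.

    Returns (product, number_of_reduction_levels).
    """
    # Partial products: one shifted copy of `a` per set bit of `b`.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        # Compress rows three at a time with bitwise full adders (3:2).
        for i in range(0, len(rows) - 2, 3):
            x, y, z = rows[i:i + 3]
            reduced.append(x ^ y ^ z)                           # sum vector
            reduced.append(((x & y) | (y & z) | (x & z)) << 1)  # carry vector
        # Rows that did not fill a group of three pass through unchanged.
        reduced.extend(rows[len(rows) - len(rows) % 3:])
        rows = reduced
        levels += 1
    # Final conventional addition of the remaining vectors.
    return sum(rows), levels


print(wallace_multiply(13, 11))  # (143, 4)
```

For 8 partial products the row count goes 8, 6, 4, 3, 2, i.e. four compressor levels before the final adder, which illustrates the logarithmic growth of the reduction depth.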

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Chip Architecture - Floor plan
After the high-level design is complete, it is necessary to decide how the design is to be implemented in silicon.
The implementation plan is known as the floor plan.
The first step in laying out a floor plan is the routing of supply and clock rails.
In doing this, sufficient space must be left between power rails to allow for data buses and combinational logic cells.
Decide on the relative positions of the major functional blocks.
Use a routing algorithm (software); the routing algorithm will minimize the total routing area.

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?
1) Pass Transistors
2) Transmission gates

Pass Transistors
We have assumed the source is grounded.
What if the source > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors
Require one transistor and one gate signal.
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd-Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems:
  An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1.
  This is because the pull-down current is weaker when a lower gate voltage is applied.
  The complementary gate's pull-down will not draw current off the output capacitance as fast as it should.

PMOS Pass Transistors
When Vin = Vdd, then Vout = Vdd.
When Vin = 0, CL is discharged through the P-transistor until Vout = |Vtp|, at which point the P-device stops conducting.
Logic 0 is therefore somewhat degraded through the p-device.

Voltage degradation of Pass Transistors

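Pass Transistor Ckts
[Figure: pass-transistor circuits annotated with the resulting node voltages: Vs = VDD-Vtn for an nMOS transistor passing VDD, Vs = |Vtp| for a pMOS transistor passing VSS, VDD-Vtn at the nodes of a series chain whose gates are all at VDD, and VDD-2Vtn where a degraded level drives the gate of a further pass transistor.]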
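The degraded levels in the figure follow from a first-order rule: an nMOS switch cannot pull its output above its gate voltage minus Vtn, and a pMOS switch cannot pull its output below its gate voltage plus |Vtp|. The sketch below applies that rule; the function names and the example values (VDD = 5 V, thresholds of 1 V) are assumptions, and body effect is ignored.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Output of an nMOS pass transistor (first-order model)."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Output of a pMOS pass transistor (first-order model)."""
    return max(v_in, v_gate + vtp_abs)


VDD, VSS, VTN, VTP_ABS = 5.0, 0.0, 1.0, 1.0  # illustrative values

single = nmos_pass(VDD, VDD, VTN)            # VDD - Vtn
series = nmos_pass(single, VDD, VTN)         # still VDD - Vtn
gate_driven = nmos_pass(VDD, single, VTN)    # VDD - 2*Vtn
pmos_low = pmos_pass(VSS, VSS, VTP_ABS)      # |Vtp|
print(single, series, gate_driven, pmos_low)  # 4.0 4.0 3.0 1.0
```

Threshold drops accumulate only when a degraded output drives the gate of another pass transistor, not when the transistors are simply in series.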

Complementary Pass Transistor Logic

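[Figure: (a) complementary pass-transistor logic structure: two pass-transistor networks driven by A, A', B, B', one producing F and the inverse network producing F'; (b) example gates: AND/NAND (F = AB and its complement), OR/NOR (F = A+B and its complement), EXOR/NEXOR (F = A ⊕ B and its complement).]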

Advantages and Disadvantages

Advantages
  Fewer transistors.
  No static power consumption.
Disadvantages
  Output voltage degrades.
  Not an ideal switch, due to series resistance.
  Delay of series pass transistors is large.

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
  Fewer transistors compared to CMOS.
  No static power consumption.
  Efficient building of complex gates.
Disadvantages
  Not an ideal switch, due to series resistance.
  Delay of series transmission gates is large.

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
  Zpu/Zpd ratio
  Pull-up and pull-down resistance
  Power dissipation
  Standard unit of capacitance
  Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all the pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce the correct output voltage level.
The nMOS Nand gate geometry reveals two factors:
  The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
  The delay also increases, in direct proportion to the number of inputs added.

NMOS Nor Gate
The pull-down transistors are in parallel in the Nor gate, so the pull-up to pull-down ratio is the same for every transistor.
It therefore has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the Nor gate is preferable to the Nand gate.

CMOS Logic

Properties of CMOS Gates

High noise margins: VOH and VOL are at VDD and GND, respectively.
No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

The CMOS Nand gate has no ratio restrictions like those of the nMOS Nand gate, but its geometry should be kept symmetric by:
  allowing for the extended fall times caused by the series nMOS transistors (series resistance);
  keeping the transfer characteristic centred on Vdd/2.
The CMOS Nor gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristic and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

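[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2. Each transistor is replaced by a switch in series with its equivalent resistance (Rn or Rp, Req for the generic switch), driving the load capacitance CL, with internal node capacitance Cint.]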

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.
Low-to-high transition:
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL
High-to-low transition:
  both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with Rp, Rn, Cint and CL.]
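The three cases can be evaluated numerically with the 0.69·R·C expressions above. The resistance values in this sketch (Rp = 20 kΩ, Rn = 10 kΩ) are invented for illustration, CL = 100 fF is taken from the simulation setup quoted in these slides, and the internal capacitance Cint is ignored.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 propagation delays in seconds (Cint ignored)."""
    return {
        "LH, both inputs fall": 0.69 * (rp / 2) * cl,  # two pMOS in parallel
        "LH, one input falls": 0.69 * rp * cl,         # one pMOS conducts
        "HL, both inputs rise": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }


delays_ps = {name: t * 1e12
             for name, t in nand2_delays(20e3, 10e3, 100e-15).items()}
for name, t in delays_ps.items():
    print(f"{name}: {t:.0f} ps")
```

With these example values the low-to-high delay halves when both inputs switch, because both pull-up transistors then charge CL in parallel.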

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1.]

Input data pattern     Delay (psec)
A = B = 0->1           67
A = 1, B = 0->1        64
A = 0->1, B = 1        61
A = B = 1->0           45
A = 1, B = 1->0        80
A = 1->0, B = 1        81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

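Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single transistor.
[Figure: static CMOS gate for comparison: inputs In1, In2, ..., InN drive both the PUN (connected to VDD) and the PDN, with output F between them.]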

Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate input is Vdd, both the p-type and n-type transistors are on, and both operate to determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
  The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity.
Disadvantages
  Because of the higher pull-up resistance, the delay is larger and the circuit is slower.
  More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases.
[Figure: dynamic gate with precharge transistor Mp (driven by Clk), PDN with inputs In1, In2, In3, evaluate transistor Me (driven by Clk), and output Out loaded by CL.]
Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1).

Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n (n N-type + n P-type) devices.
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors.

Dynamic CMOS Logic operation
Precharge
  When CLK goes low, the p-type transistor starts charging the precharge capacitance.
  The pulldown transistor controlled by the clock keeps that precharge node from being drained.
  The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.
Evaluate
  When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
  The evaluation phase begins, i.e. the n-type pulldown turns on.
  The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
  If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
  If the inputs do not create a conducting path, the gate's output is left charged at logic 1.


Page 4: Chp 3- Sub-system Design

Dealing with Complexity

Divide and conquer - limit the number of components you deal with at any one time

Group several components into larger components transistors form gates gates form functional units functional units form processing

elements

A System-on-a-Chip

Courtesy Philips

Major Levels of Design

Specification Description of requirements

Systems Level placing and interconnecting major functional units

Function Level specification and design of major functional units

LogicCircuit Level Gate level design gate interconnection design

Layout Level what will actually be patterned onto the chip how the chip

will be processed Physics Level

the physics of gate and switch operation

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, because of series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to all the series pull-down transistors, Zpu / (n Zpd), must be at least 4:1 to give a correct output voltage level.

nMOS Nand gate geometry reveals two factors:
• The area of a Nand gate is greater than that of an inverter, because there are more pull-down transistors and the length of the pull-up transistor increases correspondingly.
• The delay also increases, in direct proportion to the number of inputs added.
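The ratio rule can be turned into a quick check. This is a minimal sketch, assuming Z denotes the length-to-width ratio L/W of a transistor and that all n pull-down devices are identical; the function names are hypothetical.

```python
def min_pullup_z(n_inputs, z_pd, required_ratio=4.0):
    """Smallest pull-up Z that keeps Zpu / (n * Zpd) at or above the required ratio."""
    return required_ratio * n_inputs * z_pd


def ratio_ok(z_pu, n_inputs, z_pd, required_ratio=4.0):
    """True when the pull-up to total pull-down ratio satisfies the rule."""
    return z_pu / (n_inputs * z_pd) >= required_ratio


# With unit-sized pull-downs (Zpd = 1, an illustrative value) the pull-up
# must grow linearly with the number of inputs.
for n in (1, 2, 3, 4):
    print(n, "inputs -> Zpu >=", min_pullup_z(n, z_pd=1.0))
```

The linear growth of Zpu with n is the reason the Nand gate area and delay rise with every added input.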

NMOS Nor Gate

• The pull-down transistors are in parallel in a Nor gate, so the pull-up to pull-down ratio is the same for every pull-down transistor.
• The gate therefore has the same characteristics as an inverter.
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
• So the Nor gate is preferable to the Nand gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• A CMOS Nand gate has no ratio restriction like the nMOS Nand gate, but the geometry should be kept symmetrical by
  - allowing for the extended fall times of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on VDD/2
• A CMOS Nor gate has series p-transistors, which increase the resistance and the delay.
• This affects the transfer characteristic and reduces noise immunity.
• So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

Static CMOS Switch Delay Model

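[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates. Each transistor controlled by A or B is replaced by a switch with on-resistance Rn or Rp (Req in general), the output drives CL, and Cint is the capacitance of the internal node of the series stack.]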

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
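These three cases follow directly from which resistances are in the charging or discharging path of a two-input NAND. The sketch below assumes the first-order RC model with the internal node capacitance Cint ignored; the resistance and capacitance values are hypothetical.

```python
def nand2_delay(case, rn, rp, cl):
    """First-order NAND2 delay, 0.69 * R * CL, for the three input patterns."""
    path_resistance = {
        "both_low": rp / 2,    # two pMOS devices pull up in parallel
        "one_low": rp,         # a single pMOS device pulls up
        "both_high": 2 * rn,   # two nMOS devices pull down in series
    }
    return 0.69 * path_resistance[case] * cl


RN, RP, CL = 10e3, 20e3, 100e-15  # illustrative values only
for case in ("both_low", "one_low", "both_high"):
    print(f"{case:10s} {nand2_delay(case, RN, RP, CL) * 1e12:.0f} ps")
```

The model shows why the low-to-high delay halves when both inputs switch together, while the high-to-low delay always sees the full series stack.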


Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps], 0 to 400 ps, for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

Pseudo NMOS Logic

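Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one.

[Figure: static CMOS gate. Inputs In1, In2, ..., InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output is F.]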

Pseudo NMOS Operation

• The pulldown network of the gate is the same as for a fully complementary gate.
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate input is at VDD, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-nMOS voltage transfer characteristic, Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.

Disadvantages
• The higher pull-up resistance increases the delay, so circuits are slower.
• Higher static power dissipation, because of the conduction path between VDD and VSS.

Dynamic CMOS Logic

• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation by adding a clock input that sequences precharge and conditional evaluation phases.

[Figure: dynamic gate. A clocked p-type precharge transistor Mp connects Out (loaded by CL) to VDD; the PDN with inputs In1, In2, In3 sits in series with a clocked n-type evaluate transistor Me.]

Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - A fan-in of n requires 2n (n N-type + n P-type) devices.
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - A fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors.

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pulldown transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
• The evaluation phase begins, i.e. the n-type pulldown turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

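[Figure: example dynamic gate with PDN inputs A, B and C. During precharge (Clk = 0) Mp is on, Me is off and Out = 1. During evaluation (Clk = 1) Mp is off, Me is on and Out = ((A·B) + C)'.]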
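The precharge and evaluate behaviour described above can be captured in a few lines. This is a purely logical sketch (no timing, leakage or charge sharing), using the example pull-down function (A·B) + C; the class name is hypothetical.

```python
class DynamicGate:
    """Logic-level model of a precharged gate: the output node can only fall during evaluation."""

    def __init__(self, pdn):
        self.pdn = pdn    # function returning True when the pull-down network conducts
        self.out = None   # unknown until the first precharge

    def precharge(self):
        """CLK = 0: Mp on, Me off, the output node is charged to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: Mp off, Me on, the node discharges if the PDN conducts."""
        if self.out is None:
            raise RuntimeError("gate must be precharged before evaluation")
        if self.pdn(*inputs):
            self.out = 0
        return self.out


gate = DynamicGate(lambda a, b, c: bool((a and b) or c))

gate.precharge()
print(gate.evaluate(0, 0, 0))  # PDN off, the node stays charged
print(gate.evaluate(0, 0, 1))  # C rises, the node discharges
print(gate.evaluate(0, 0, 0))  # C falls again, but the charge is already lost
gate.precharge()
print(gate.out)                # only a new precharge restores logic 1
```

The third call illustrates why an input that goes 0 → 1 → 0 during evaluation corrupts the result: nothing recharges the node until the next precharge.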

Properties of Dynamic Gates

• Logic function is implemented by the PDN only
• Full-swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases
• Require the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

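[Figure: two cascaded dynamic inverters, each with a precharge device Mp and an evaluate device Me clocked by Clk. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The waveforms (V versus t) show Clk, In, Out1 and Out2, with the level VTn marked.]

Only 0 → 1 transitions are allowed at the inputs.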

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

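[Figure: domino gate. A dynamic stage (precharge device Mp, evaluate device Me, PDN with inputs In1, In2, In3) is followed by a static inverter that drives Out1.]

Domino logic is the combination of a dynamic logic gate and a static inverter.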

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
• The evaluation phase begins, i.e. the n-type pulldown turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  - the static inverter can be skewed, since only its L-H output transition matters
  - input capacitance is reduced (smaller logical effort)

Cascading Domino Logic Gates

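[Figure: two cascaded domino gates. Each stage has a precharge device Mp, an evaluate device Me, a PDN (inputs In1 to In3 for the first stage, In4 and In5 for the second) and a static inverter, producing Out1 and Out2.]

• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.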
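The effect of the output inverter on a chain of gates can be sketched at the logic level. The model below is an illustration only (no delays or charge sharing); the class and function names are hypothetical, and the example chain is an AND stage followed by two buffer stages.

```python
class DominoGate:
    """Dynamic stage plus static inverter: out = NOT(storage node)."""

    def __init__(self, pdn):
        self.pdn = pdn
        self.node = 1  # storage node, logic 1 after precharge

    @property
    def out(self):
        return 1 - self.node

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0  # node discharges, inverter output rises 0 -> 1
        return self.out


def run_chain(gates, first_inputs):
    """Precharge every stage, then let the evaluation ripple down the chain."""
    for g in gates:
        g.precharge()
    outputs_after_precharge = [g.out for g in gates]
    signal = gates[0].evaluate(*first_inputs)
    for g in gates[1:]:
        signal = g.evaluate(signal)
    return outputs_after_precharge, signal


chain = [
    DominoGate(lambda a, b: bool(a and b)),
    DominoGate(lambda x: bool(x)),
    DominoGate(lambda x: bool(x)),
]
print(run_chain(chain, (1, 1)))
print(run_chain(chain, (1, 0)))
```

Every stage output is 0 after precharge, so a later stage can only see a 0 → 1 input transition, and the stages fall one after another like dominoes.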

Clocked CMOS Logic

• It is a static CMOS gate with two additional clocked transistors:
  - Clk is applied to one additional NMOS transistor
  - Clk' is applied to one additional PMOS transistor
• A gate with fan-in n therefore needs 2n + 2 transistors.
• When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS transistor is on.
• When Clk goes low, the logic is not evaluated, because the clocked NMOS transistor is off.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
• More area
• More complexity

N-P CMOS Logic

N-P CMOS Logic

• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
• Stages of N logic alternate with stages of P logic.
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
• P logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]

BiCMOS Technology

CMOS properties
• Low power
• Slower

BJT properties
• Faster
• High power

Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT

Basic circuits
• Inverter
• Nand gate
• Nor gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS Nand gate.]

Structured Design

The digital block or standard cell is designed at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders

• 1-bit full adder
• Manchester carry chain adder
• Enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• Inverters
  i) NMOS inverter (8:1 and 4:1 ratio)
     - Butting contact: L = 22λ, W = 10λ
     - Buried contact: L = 30λ, W = 10λ
  ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0, independent of C
  K = ~A · ~B
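The three signals are mutually exclusive, and the carry out follows from them as Cout = G + P·C. A short sketch over single bits (0 or 1) makes this easy to check exhaustively; the function names are hypothetical.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b), [carry_out(a, b, c) for c in (0, 1)])
```

For every input pair exactly one of G, P, K is 1, which is what allows the carry to be built from simple switches in the Manchester chain.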

Manchester Carry Chain

Built from switch logic using propagate, generate and kill.

Manchester Chain adders

• The carry input is precharged by the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder. Inputs A1..A4 and B1..B4, sum outputs S1..S4, carry chain Cin → C1 → C2 → C3 → Cout.]
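The cascade of full adders can be sketched as a loop in which each stage waits for the carry of the previous one. Bit lists are least significant bit first; the helper names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); the carry ripples stage by stage."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value, n_bits):
    return [(value >> i) & 1 for i in range(n_bits)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), cout)  # 11 + 6 = 17 -> sum bits 0001 with carry out 1
```

Because the loop is strictly sequential in `carry`, the worst-case delay grows linearly with the word length, which is the motivation for the faster adders that follow.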

Carry Look-Ahead Adder (CLA)

• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries are generated in parallel.
• Each output carry is expressed in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
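Starting from C1 = G0 + P0·C0 and substituting repeatedly gives every carry as a function of the G and P signals and C0 alone. The sketch below writes out this standard expansion for four bits (it is the textbook form, not copied from the slide figures) and assumes P = A ⊕ B and bit lists ordered LSB first.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; returns (sum_bits, c4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    # Each carry depends only on G, P and C0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


def cla4_int(a, b, c0=0):
    """Convenience wrapper on integers 0..15; returns the 5-bit result."""
    bits = lambda v: [(v >> i) & 1 for i in range(4)]
    sums, c4 = cla4(bits(a), bits(b), c0)
    return sum(s << i for i, s in enumerate(sums)) + (c4 << 4)


print(cla4_int(9, 7))
```

The product terms grow with each bit position, which is why practical look-ahead blocks are usually limited to about four bits and then combined hierarchically.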

Carry Select Adder

• It is also referred to as the conditional sum adder.
• Each block of the adder is duplicated:
  - one copy computes with Cin = logical 0
  - the other copy computes with Cin = logical 1
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P K1 + (M - 1) K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
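Both the selection scheme and the delay expression can be sketched together. The code assumes equal-sized blocks built from ripple adders and LSB-first bit lists; the function names and the delay values in the example are hypothetical.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block; returns (sum_bits, carry_out)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, block_size, cin=0):
    """Each block is computed for Cin = 0 and Cin = 1; the real carry selects one result."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        s0, c0 = ripple_block(a_blk, b_blk, 0)  # speculative result for Cin = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)  # speculative result for Cin = 1
        block_sums, carry = (s1, c1) if carry else (s0, c0)  # 2:1 multiplexer
        sums.extend(block_sums)
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


def add_ints(a, b, n_bits=8, block_size=4):
    """Convenience wrapper on integers; returns the (n_bits + 1)-bit result."""
    bits = lambda v: [(v >> i) & 1 for i in range(n_bits)]
    sums, cout = carry_select_add(bits(a), bits(b), block_size)
    return sum(s << i for i, s in enumerate(sums)) + (cout << n_bits)


print(add_ints(200, 100))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```

In hardware the two speculative block results are computed in parallel, so only the first block's ripple time and one multiplexer delay per further block appear on the critical path.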

Carry Skip Adder (CSA)

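• It is also called the carry bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• It is typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to its carry output.

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, and carries ci and ci+1. The propagate signals pi and pi+1 form the skip signal that selects the outgoing carry ci+1.]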

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)
then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1) K1 + (M - 2) K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
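The skip condition and the delay formula can be tried out with a small functional model. It assumes P and M have the same meaning as for the carry select adder (cells per block and number of blocks), equal-sized ripple blocks and LSB-first bit lists; names and example values are hypothetical.

```python
def carry_skip_add(a_bits, b_bits, block_size, cin=0):
    """Ripple blocks with a bypass: if every bit propagates, the block carry-in is passed straight on."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        block_cin = carry
        skip = all(a ^ b for a, b in zip(a_blk, b_blk))  # P0 and P1 and ... = 1
        c = block_cin
        for a, b in zip(a_blk, b_blk):
            sums.append(a ^ b ^ c)
            c = (a & b) | (c & (a ^ b))
        carry = block_cin if skip else c  # bypass multiplexer
    return sums, carry


def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def skip_add_ints(a, b, n_bits=8, block_size=4):
    """Convenience wrapper on integers; returns the (n_bits + 1)-bit result."""
    bits = lambda v: [(v >> i) & 1 for i in range(n_bits)]
    sums, cout = carry_skip_add(bits(a), bits(b), block_size)
    return sum(s << i for i, s in enumerate(sums)) + (cout << n_bits)


print(skip_add_ints(0b01010101, 0b10101011))
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
```

The bypass never changes the result, because a block in which every bit propagates would ripple the same carry through anyway; it only shortens the path the carry has to take.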

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• It is an optimized column adder tree.
• It combines all partial products into 2 vectors (carry and sum).
• The carry and sum outputs are combined using a conventional adder.
• It compresses the number of stages of partial products.
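The reduction to two vectors can be illustrated with carry-save (3:2) compression applied to whole partial-product rows held as integers. This is a word-level sketch for unsigned operands, not a gate-level layout; the function names are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_vec + carry_vec."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (y & z) | (x & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a, b, n_bits):
    """Multiply unsigned n-bit a and b; returns (product, number of reduction levels)."""
    # One shifted partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):            # every group of three rows becomes two
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])            # leftover rows pass to the next level
        rows = reduced
        levels += 1
    return sum(rows), levels                   # final conventional addition


print(wallace_multiply(13, 11, 4))
print(wallace_multiply(200, 123, 8))
```

No carry propagates sideways inside the tree; the only carry-propagate addition is the final one, and the number of reduction levels grows roughly logarithmically with the number of partial products.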

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

• Complex systems are designed using a top-down approach with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

• It is also referred to as a conditional-sum adder.
• The adder is divided into two blocks: one block with a logical 0 Cin, another block with a logical 1 Cin.
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.
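A behavioural sketch of the selection scheme follows. It assumes an 8-bit adder split into 4-bit blocks, and it models each block adder with integer arithmetic rather than gates; `add_block` and `carry_select_add` are illustrative names.

```python
def add_block(a, b, cin, width):
    """Plain block adder: returns (sum masked to width bits, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition: each block is computed for Cin = 0 and Cin = 1,
    and the real carry only drives the 2:1 selection."""
    if width % block != 0:
        raise ValueError("width must be a multiple of block")
    mask = (1 << block) - 1
    result, carry = 0, cin
    for shift in range(0, width, block):
        a_blk, b_blk = (a >> shift) & mask, (b >> shift) & mask
        sum0, cout0 = add_block(a_blk, b_blk, 0, block)  # assumes Cin = 0
        sum1, cout1 = add_block(a_blk, b_blk, 1, block)  # assumes Cin = 1
        s, carry = (sum1, cout1) if carry else (sum0, cout0)  # 2:1 multiplexer
        result |= s << shift
    return result, carry
```

In hardware both candidate results of a block are ready before the actual carry arrives, so the carry only has to pass through one multiplexer per block.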

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
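As a worked example of the formula, the sketch below evaluates T for a hypothetical 16-bit adder. The values of K1 and K2 are made-up normalized delays, and the ripple-carry reference of n·K1 is an assumption for comparison, not a figure from the slides.

```python
def carry_select_delay(p_cells, m_blocks, k1, k2):
    """T = P*K1 + (M - 1)*K2 for a carry-select adder."""
    return p_cells * k1 + (m_blocks - 1) * k2

# Hypothetical numbers: 16 bits as 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5.
K1, K2 = 1.0, 0.5
t_select = carry_select_delay(p_cells=4, m_blocks=4, k1=K1, k2=K2)
t_ripple = 16 * K1  # assumed reference: one cell delay per bit

print(t_select, t_ripple)
```

Only the first block contributes its full ripple delay; every later block adds a single multiplexer delay.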

Carry Skip Adder (CSA)

• It is also called a carry-bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, then a bypass gate sends the stage's carry input directly to the carry output.

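Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carry signals ci and ci+1, propagate signals pi and pi+1, and a skip signal selecting the carry output]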

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;
else C3 = output carry of the stage-4 adder.
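The skip condition can be sketched for one block as follows. This is a behavioural model under the assumption of LSB-first bit lists; `carry_skip_block` is an illustrative name, and the block output is called `cout` here rather than C3.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block. Returns (sum_bits, cout, skipped)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = cin
    sum_bits = []
    for a, b, p in zip(a_bits, b_bits, propagate):
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)  # ordinary ripple inside the block
    skip = all(propagate)              # P0 and P1 and P2 and P3
    cout = cin if skip else carry      # bypass: carry in goes straight to carry out
    return sum_bits, cout, skip

# Every bit propagates (A = 0101, B = 1010, LSB first), so the carry is bypassed.
print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

The bypass never changes the logical result, since the rippled carry equals the carry input whenever every bit propagates; it only shortens the path the carry has to travel.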

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
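The worst-case formula can be evaluated in the same way. The sketch assumes P is the number of cells per block and M the number of blocks, and it uses made-up normalized delays for K1 and K2.

```python
def carry_skip_delay(p_cells, m_blocks, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2 for a carry-skip adder."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2

# Hypothetical numbers: 16 bits as 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5.
t_skip = carry_skip_delay(p_cells=4, m_blocks=4, k1=1.0, k2=0.5)
print(t_skip)
```

The factor of 2 reflects that, in the worst case, the carry ripples through both the first and the last block, while the blocks in between are skipped.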

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using the Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2(n).
• Optimized column adder tree.
• Combines all partial products into 2 vectors (carry and sum).
• Carry and sum outputs are combined using a conventional adder.
• Compresses the number of stages of partial products.
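The reduction to two vectors can be sketched with word-level carry-save adders (3:2 compressors). This is a simplified model that compresses whole partial-product rows three at a time, not a column-by-column gate netlist; the function names are illustrative.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_vector + carry_vector."""
    sum_vector = x ^ y ^ z
    carry_vector = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vector, carry_vector

def wallace_multiply(a, b, width=8):
    """Multiply by compressing partial products until two vectors remain.

    Returns (product, number of compression levels).
    """
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]  # partial products
    levels = 0
    while len(rows) > 2:
        groups = len(rows) // 3
        next_rows = []
        for g in range(groups):
            x, y, z = rows[3 * g:3 * g + 3]
            next_rows.extend(carry_save_add(x, y, z))  # 3 rows become 2
        next_rows.extend(rows[3 * groups:])            # leftover rows pass through
        rows = next_rows
        levels += 1
    return sum(rows), levels  # final conventional addition of sum and carry vectors

print(wallace_multiply(13, 11))
```

No carry propagates sideways inside a compression level, so the only full carry-propagate addition is the final one.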

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

• Complex systems are designed with a top-down approach, with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Chip Architecture - Floor plan

• After the high-level design is complete, it is necessary to decide how the design is to be implemented in silicon.
• The implementation plan is known as the floor plan.
• The first step in laying out a floor plan is the routing of supply and clock rails.
• In doing this, sufficient space must be left between power rails to allow for data buses and combinational logic cells.
• Decide on the relative positions of the major functional blocks.
• Use a routing algorithm (software).
• The routing algorithm will minimize the total routing area.

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?

1) Pass Transistors
2) Transmission gates

Pass Transistors

• We have assumed the source is grounded.
• What if the source is > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors

• Require one transistor and one gate signal.
• Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd - Vtn.
• When switch logic drives gate logic, n-type switches can cause electrical problems:
    An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1,
    since the pull-down current is weaker when a lower gate voltage is applied.
    The complementary gate's pull-down will not draw current off the output capacitance as fast as it should.

PMOS Pass Transistors

• When Vin = Vdd, then Vout = Vdd.
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|; the p-device then stops conducting.
• Logic 0 is therefore somewhat degraded through the p-device.

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figure: pass transistor circuits. An nMOS pass transistor passing VDD gives Vs = VDD - Vtn; a pMOS pass transistor passing VSS gives Vs = |Vtp|. In the chained nMOS circuits the labelled node voltages are VDD - Vtn and VDD - 2Vtn.]
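The degraded levels follow from a first-order rule: an nMOS device stops conducting once its source rises to one threshold below its gate, and a pMOS device stops once its source falls to one threshold above its gate. The sketch below applies that rule, ignoring body effect; the supply and threshold values are hypothetical.

```python
def nmos_pass_high(v_in, v_gate, vtn):
    """Highest voltage an nMOS pass transistor can deliver to its output."""
    return min(v_in, v_gate - vtn)

def pmos_pass_low(v_in, v_gate, vtp_abs):
    """Lowest voltage a pMOS pass transistor can deliver to its output."""
    return max(v_in, v_gate + vtp_abs)

VDD, VTN, VTP_ABS = 5.0, 1.0, 1.0  # hypothetical values, body effect ignored

# One nMOS pass transistor with its gate at VDD passing VDD.
single = nmos_pass_high(VDD, VDD, VTN)          # VDD - Vtn

# Several pass transistors in series, all gates at VDD: only one threshold is lost.
series = single
for _ in range(3):
    series = nmos_pass_high(series, VDD, VTN)   # still VDD - Vtn

# A degraded output driving the gate of the next pass transistor loses a second threshold.
gate_driven = nmos_pass_high(VDD, single, VTN)  # VDD - 2Vtn

# One pMOS pass transistor with its gate at 0 V passing 0 V.
low = pmos_pass_low(0.0, 0.0, VTP_ABS)          # |Vtp|

print(single, series, gate_driven, low)
```

The comparison shows why a pass-transistor output should not drive the gate of another pass transistor: the threshold losses accumulate in that configuration, but not along a series chain.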

Complementary Pass Transistor Logic

[Figure: (a) two pass-transistor networks driven by A, A', B, B'; one produces F and the inverse network produces F'. (b) Example gates: AND/NAND (F = AB, F' = (AB)'), OR/NOR (F = A+B, F' = (A+B)'), EXOR/NEXOR (F = A⊕B, F' = (A⊕B)').]

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

• The ratio of the pull-up to the sum of all pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce the correct output voltage level.
• The nMOS NAND gate geometry reveals two factors:
    The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
    The delay also increases in direct proportion to the number of inputs added.

NMOS Nor Gate

• The pull-down transistors are in parallel in the NOR gate, i.e. the pull-down ratio is the same for all transistors, so it has the same characteristics as the inverter.
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
• So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins
    VOH and VOL are at VDD and GND, respectively.
• No static power consumption
    There never exists a direct path between VDD and VSS (GND) in steady-state mode.
• Comparable rise and fall times
    (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS NAND gate has no restrictions like those of the nMOS NAND gate, but the geometry should be kept symmetric by:
    allowing for the extended fall times of the series nMOS transistors (due to series resistance)
    keeping the transfer characteristic centred on Vdd/2
• The CMOS NOR gate has series p-transistors, which increase the resistance and delay.
• This affects the transfer characteristic and reduces noise immunity.
• So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

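Static CMOS Switch Delay Model

[Figure: switch-level RC delay models for INV, NAND2 and NOR2, with pull-up resistances Rp, pull-down resistances Rn, equivalent resistance Req, load capacitance CL and internal node capacitance Cint]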

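Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs:
• Low-to-high transition
    both inputs go low: delay is 0.69 (Rp/2) CL
    one input goes low: delay is 0.69 Rp CL
• High-to-low transition
    both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with parallel pull-up resistances Rp (inputs A, B), series pull-down resistances Rn (inputs A, B), load capacitance CL and internal node capacitance Cint]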
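The three first-order delays can be evaluated numerically. The sketch below assumes the two-input NAND model in the figure and uses hypothetical resistance values; it ignores the internal node capacitance Cint.

```python
def nand2_delays(rp, rn, cl):
    """First-order 0.69*R*C delays of a two-input NAND for each input pattern."""
    return {
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,  # two pull-ups in parallel
        "low_to_high_one_input_falls": 0.69 * rp * cl,         # a single pull-up
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,  # two pull-downs in series
    }

# Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
for pattern, seconds in delays.items():
    print(f"{pattern}: {seconds * 1e12:.0f} ps")
```

The low-to-high delay halves when both inputs fall together, because both pull-up transistors then charge the load in parallel.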

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

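Pseudo-NMOS logic is obtained from static CMOS logic by reducing the number of pull-up network inputs from N to 1.

[Figure: static CMOS gate, with inputs In1, In2, ..., InN driving both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), and output F]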

Pseudo NMOS Operation

• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate input is Vdd, both the p-type and n-type transistors are on, and both act together to determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for pull-up ratios W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.

Disadvantages
• The higher pull-up resistance increases the delay, so circuits are slower.
• Higher static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic CMOS gate, with a clocked p-type precharge transistor Mp, a pull-down network (PDN) with inputs In1, In2, In3, a clocked n-type evaluation transistor Me, output Out and load capacitance CL]

Two-phase operation:
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
    A fan-in of n requires 2n (n N-type + n P-type) devices.
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
    A fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors.
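The two device counts can be tabulated for a few fan-in values. This is a trivial sketch of the arithmetic; the function names are illustrative.

```python
def static_cmos_devices(fan_in):
    """Static CMOS: n n-type plus n p-type transistors."""
    return 2 * fan_in

def dynamic_cmos_devices(fan_in):
    """Dynamic CMOS: n PDN transistors plus the clocked Me (n-type) and Mp (p-type)."""
    return fan_in + 2

for n in (2, 4, 8):
    print(n, static_cmos_devices(n), dynamic_cmos_devices(n))
```

The two styles need the same number of devices at a fan-in of 2; the saving of the dynamic gate grows with every additional input.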

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pull-down transistor controlled by the clock keeps the precharge node from being drained.
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: dynamic gate with PDN inputs A, B, C computing Out = ((A·B) + C)'. During precharge (Clk = 0) Mp is on, Me is off and Out = 1; during evaluation (Clk = 1) Mp is off and Me is on.]
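The precharge and evaluate behaviour can be captured in a small behavioural model. This is a sketch only: the class `DynamicGate` and its `step` method are illustrative, the output node is modelled as an ideal stored bit, and leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn  # function returning True when the PDN conducts
        self.out = 0    # arbitrary state before the first precharge

    def step(self, clk, **inputs):
        if clk == 0:
            self.out = 1           # precharge: Mp on, Me off
        elif self.pdn(**inputs):
            self.out = 0           # evaluate: conducting PDN discharges CL
        # otherwise the node floats and keeps the value stored on CL
        return self.out

# PDN of the example gate, Out = ((A.B) + C)'
gate = DynamicGate(lambda a, b, c: bool((a and b) or c))

print(gate.step(0, a=1, b=1, c=0))  # precharge
print(gate.step(1, a=1, b=1, c=0))  # evaluate: PDN conducts, output discharged
print(gate.step(1, a=0, b=0, c=0))  # inputs removed: output stays discharged
```

The third call illustrates the first condition above: removing the inputs does not restore the output, which only returns to 1 at the next precharge.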

Properties of Dynamic Gates

• The logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS:
    no glitching
    higher transition probabilities
    extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn:
    low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
• Requires the introduction of a clock signal.
• They are also more sensitive to noise.
• Their clocking signals also consume power and are difficult to turn off to save power.

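Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with clocked transistors Mp and Me. In drives the first stage, and Out1 drives the second stage, which produces Out2. The waveform plot shows Clk, In, Out1 and Out2 (voltage V versus time t), with the threshold VTn marked.]

Only 0 → 1 transitions are allowed at inputs.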

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.

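Domino CMOS Logic

[Figure: domino gate, consisting of a dynamic gate (Mp, PDN with inputs In1, In2, In3, Me, clocked by Clk) followed by a static inverter driving Out1]

Domino logic is the combination of a dynamic logic gate and a static inverter.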

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
• The evaluation phase begins, i.e. the n-type pull-down turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed:
    the static inverter can be skewed, since only the L-H transition matters
    input capacitance is reduced, giving a smaller logical effort

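Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first stage (PDN inputs In1, In2, In3) drives Out1 through its static inverter; Out1, together with In4 and In5, feeds the PDN of the second stage, which drives Out2.]

• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.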
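The effect of the static inverter on a cascade can be sketched behaviourally. The model below assumes a hypothetical two-stage chain of domino AND gates, with stage 1 computing a AND b and stage 2 computing Out1 AND c; the function names are illustrative.

```python
def domino_stage(clk, node, pdn_conducts):
    """One domino stage: returns (dynamic node value, inverter output)."""
    if clk == 0:
        node = 1          # precharge the dynamic node
    elif pdn_conducts:
        node = 0          # evaluate: conditional discharge
    return node, 1 - node  # the static inverter drives the next stage

def domino_and2_chain(a, b, c):
    """Returns (outputs after precharge, outputs after evaluation)."""
    # Precharge (CLK = 0): both dynamic nodes go to 1, so both outputs go to 0.
    node1, out1 = domino_stage(0, 0, False)
    node2, out2 = domino_stage(0, 0, False)
    precharged = (out1, out2)
    # Evaluate (CLK = 1): outputs can only rise, one stage after the other.
    node1, out1 = domino_stage(1, node1, bool(a and b))
    node2, out2 = domino_stage(1, node2, bool(out1 and c))
    return precharged, (out1, out2)

print(domino_and2_chain(1, 1, 1))
print(domino_and2_chain(1, 0, 1))
```

Because every stage output is 0 after precharge, the second stage cannot discharge until the first stage has actually evaluated to 1, which is the falling-domino behaviour that gives the logic family its name.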

Clocked CMOS Logic

• It is a combination of:
    static CMOS
    a Clk input given to one additional nMOS transistor
    a Clk' input given to one additional pMOS transistor
• A fan-in of n requires 2n + 2 transistors.
• When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS transistor is on.
• When Clk goes low, the logic is not evaluated, because the clocked nMOS transistor is off.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
• More area
• More complexity

N-P CMOS Logic

N-P CMOS Logic

• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).
• Alternate stages of N logic with stages of P logic.
    N logic stages use the true clock: normal precharge and evaluation phases, with the N logic tree in the pull-down leg.
    P logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P logic stages with outputs X, G and Z]

BiCMOS Technology

• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS & BJT
    Core logic: CMOS
    Interface: BJT
• Basic circuits: Inverter, Nand Gate, Nor Gate

BiCMOS Circuits

[Figure: BiCMOS inverter and NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter


Page 7: Chp 3- Sub-system Design

Sub-System Design Guidelines Define the requirements Partition the overall architecture into subsystems Consider interconnection paths between the

subsystems System floor plan on silicon chip Regular structures for replication Stick diagram for each leaf-cell (module) in the

system Convert the stick diagram of each leaf-cell into

layout and go for design rule check Simulate the performance of each cell

Design Validation

Must check at every step that errors have not been introduced the longer the error remains the more

expensive it becomes to remove it

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages

Advantages
Low area.
Higher speed than static complementary gates.

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

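Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk. In drives the first stage, Out1 drives the second stage, which produces Out2. Waveforms: voltage V against time t for Clk, In, Out1 and Out2, with VTn marked.]

Only 0 → 1 transitions are allowed at the inputs.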

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off only when Out1 has fallen to VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.

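Domino CMOS Logic

[Figure: domino gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me clocked by Clk, and a static inverter producing Out1.]

Domino logic is the combination of a dynamic logic stage and a static inverter.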

Domino CMOS Logic operation

Precharge Phase (same as Dynamic Logic)

Evaluate Phase (modification to Dynamic Logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to Static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, because only the L-H transition at the output matters.
Input capacitance is reduced, giving a smaller logical effort.

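Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first has inputs In1, In2, In3 and output Out1; the second has inputs In4 and In5 and output Out2. Each stage has Mp and Me clocked by Clk.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 → 1.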
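A minimal behavioural sketch of two cascaded domino stages is given here. The helper make_domino_stage and the chosen stage functions (Out1 = A AND B, Out2 = Out1 OR C) are illustrative assumptions, not circuits taken from the slides, and the model ignores all timing and charge effects.

```python
def make_domino_stage(pdn):
    """Hypothetical domino stage: a dynamic node followed by a static inverter."""
    state = {"node": 1}

    def precharge():
        state["node"] = 1          # dynamic node charged to logic 1
        return 1 - state["node"]   # inverter output is 0

    def evaluate(*inputs):
        if pdn(*inputs):           # conducting PDN discharges the node
            state["node"] = 0
        return 1 - state["node"]   # output can only go 0 -> 1

    return precharge, evaluate


pre1, eval1 = make_domino_stage(lambda a, b: a and b)   # Out1 = A AND B
pre2, eval2 = make_domino_stage(lambda x, c: x or c)    # Out2 = Out1 OR C


def run_cycle(a, b, c):
    out1, out2 = pre1(), pre2()    # both outputs are 0 after precharge
    precharged = (out1, out2)
    out1 = eval1(a, b)
    out2 = eval2(out1, c)          # stage 2 only ever sees 0 -> 1 on Out1
    return precharged, (out1, out2)


print(run_cycle(1, 1, 0))  # ((0, 0), (1, 1))
print(run_cycle(1, 0, 0))  # ((0, 0), (0, 0))
```

Because every stage output starts at 0 and can only rise, the second stage never sees the falling input that caused the erroneous discharge in plain cascaded dynamic gates. The same structure also shows why only non-inverting functions are available.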

Clocked CMOS Logic

It is a combination of:
Static CMOS.
Clk input given to one additional NMOS.
Clk' input given to one additional PMOS.

A fan-in of n requires 2n + 2 devices.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices.

Disadvantages
More area.
More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]

BiCMOS Technology

CMOS properties: low power, slower.
BJT properties: faster, high power.
Implementation: CMOS & BJT.
Core logic: CMOS.
Interface: BJT.
Basic circuits: Inverter, Nand Gate, Nor Gate.

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS Nand gate.]

Structured Design

Designing the digital block or standard cell proceeds through the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
32-bit Adders

1-bit full adder

CMOS Full Adder

[Figure: CMOS full adder, showing the sum output and carry output circuits.]

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So, for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1, independent of C
    G = A · B
Propagate: Cout = C
    P = A ⊕ B
Kill: Cout = 0, independent of C
    K = ~A · ~B
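The three carry conditions can be checked exhaustively in a few lines. The function names below are hypothetical. The sketch only restates G, P and K for single bits and confirms that Cout = G + P·C matches ordinary binary addition.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # G = A . B
    p = a ^ b              # P = A xor B
    k = (1 - a) & (1 - b)  # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        g, p, k = carry_signals(a, b)
        print(a, b, "G =", g, "P =", p, "K =", k)
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a carry chain be built from switches controlled by these signals.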

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

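Carry-Ripple Adder

Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, carry input Cin, internal carries C1..C3, sum outputs S1..S4 and carry output Cout.]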

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.
The expressions for the 4-bit CLA are:
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0

4-bit CMOS CLA
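A short sketch can confirm the look-ahead idea by expanding C1 = G0 + P0·C0 to four bits and comparing the result against integer addition. The function name cla4 is hypothetical, and the expanded carry equations are the standard ones obtained by substituting each carry into the next.

```python
def cla4(a, b, c0):
    """4-bit carry look-ahead addition; a and b are integers in 0..15."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate bits

    # Every carry depends only on g, p and c0, so all four can be
    # computed in parallel rather than rippling from stage to stage.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c4


print(cla4(9, 7, 0))  # (0, 1), i.e. 9 + 7 = 16
```

The price of the parallel carries is visible in the code: the product terms grow with each bit position, which is why look-ahead is normally limited to small groups such as four bits.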

Carry Select Adder

It is also referred to as the Conditional Sum Adder.
The adder is divided into two blocks:
One block with logical 0 as Cin.
Another block with logical 1 as Cin.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by
T = P·K1 + (M - 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
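The delay expression is easy to evaluate for a concrete partition. In the sketch below, the function name and the numbers (a 16-bit adder split into 4 blocks of 4 cells, K1 = 1.0 and K2 = 0.5 in arbitrary delay units) are assumptions chosen only for illustration.

```python
def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M - 1)*K2, as given for the carry select adder."""
    return p_cells * k1 + (m_blocks - 1) * k2


# Assumed example: 16 bits as M = 4 blocks of P = 4 cells.
t_select = carry_select_delay(m_blocks=4, p_cells=4, k1=1.0, k2=0.5)
print(t_select)  # 5.5
```

The first term is the time for one block to settle; the second is the chain of multiplexers that passes the selected carry from block to block.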

Carry Skip Adder (CSA)

It is also called the Carry Bypass Adder.
It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal.]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
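The skip rule can be modelled for one block as follows. The helper name skip_block and the LSB-first bit-list convention are assumptions made for this sketch. The model is purely functional, so it shows that bypassing gives the same carry as rippling, not the speed gain itself.

```python
def skip_block(a_bits, b_bits, c_in):
    """One carry-skip block; bit lists are LSB first (hypothetical helper)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate signals
    carry = c_in
    sums = []
    for a, b, pi in zip(a_bits, b_bits, p):
        sums.append(pi ^ carry)
        carry = (a & b) | (pi & carry)           # ripple inside the block
    skip = int(all(p))                           # P0 and P1 and ... = 1
    c_out = c_in if skip else carry              # bypass multiplexer
    return sums, c_out, skip


# 0101 + 1010 with carry in 1: every bit propagates, so the carry skips.
print(skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))  # ([0, 0, 0, 0], 1, 1)
```

When every bit propagates, the rippled carry would equal the carry-in anyway; the multiplexer simply delivers it without waiting for the ripple.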

Delay of CSA

The worst-case delay T is given by
T = 2(P - 1)·K1 + (M - 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
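The worst-case expression can be used to compare block sizes. This sketch assumes P and M have the same meaning as for the carry select adder (cells per block and number of blocks), and it uses made-up values K1 = 1.0 and K2 = 0.5 for a 16-bit adder.

```python
def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2, the worst-case carry-skip delay."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


# Assumed example: 16 bits partitioned into equal blocks of P cells.
delays = {p: carry_skip_delay(16 // p, p, k1=1.0, k2=0.5) for p in (2, 4, 8)}
print(delays)  # {2: 5.0, 4: 7.0, 8: 14.0}
```

With these assumed cell delays the small blocks win, because rippling through the first and last blocks dominates; a slower skip path (larger K2) would shift the balance toward larger blocks.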

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel Multiplier
Braun Array
2's Complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
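The column compression can be sketched for unsigned operands as shown here. This is a simplified Wallace-style reduction that uses only full adders as 3:2 compressors (a real design would also use half adders and a fixed wiring pattern); the function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits of one weight in, sum and carry out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def wallace_multiply(x, y, n=4):
    """Multiply two n-bit unsigned integers by column compression."""
    # columns[w] holds every partial-product bit of weight 2**w.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append((x >> i) & (y >> j) & 1)

    stages = 0
    while any(len(col) > 2 for col in columns):
        # One extra column leaves room for a carry out of the top column.
        reduced = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                s, c = full_adder(col[k], col[k + 1], col[k + 2])
                reduced[w].append(s)
                reduced[w + 1].append(c)
                k += 3
            reduced[w].extend(col[k:])  # leftover bits pass through
        columns = reduced
        stages += 1

    # At most two bits per column remain: the sum and carry vectors.
    sum_vec = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    carry_vec = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return sum_vec + carry_vec, stages  # final conventional addition


product, stages = wallace_multiply(13, 11)
print(product)  # 143
```

All compressors within a stage work in parallel, so the number of stages, not the number of partial products, sets the delay before the final adder.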

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of the adder and shifter is essential for the ALU.

Barrel Shifter in 4-bit ALU

Barrel Shifter - Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Design Validation

Must check at every step that errors have not been introduced: the longer an error remains, the more expensive it becomes to remove it.

Chip Architecture - Floor plan

After high level design is complete, it is necessary to decide how the design is to be implemented in silicon.
The implementation plan is known as the floor plan.
The first step in laying out a floor plan is the routing of supply and clock rails.
In doing this, sufficient space must be left between power rails to allow for data-buses and combinational logic cells.
Decide on the relative positions of major functional blocks.
Use a routing algorithm (software).
The routing algorithm will minimize total routing area.

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?
1) Pass Transistors
2) Transmission gates

Pass Transistors

We have assumed the source is grounded.
What if the source > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors

Require one transistor and one gate signal.
Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd - Vtn.
When switch logic drives gate logic, n-type switches can cause electrical problems:
An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1.
This is because the pull down current is weaker when a lower gate voltage is applied.
The complementary gate's pull down will not pull current off the output capacitance as fast as it should.

PMOS Pass Transistors

When Vin = Vdd, Vout = Vdd.
When Vin = 0, CL is discharged through the P-transistor until Vout = |Vtp|, at which point the P-device stops conducting.
Logic 0 is somewhat degraded through the p-device.
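The degraded levels can be estimated with a first-order model. The two functions below are hypothetical helpers that ignore body effect and leakage, and the supply and threshold values (VDD = 5 V, Vtn = |Vtp| = 1 V) are assumed purely for the example.

```python
def nmos_pass_high(v_gate, v_in, vtn):
    """Highest level an n-type pass transistor can deliver (ideal model)."""
    return min(v_in, v_gate - vtn)


def pmos_pass_low(v_gate, v_in, vtp_abs):
    """Lowest level a p-type pass transistor can deliver (ideal model)."""
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP_ABS = 5.0, 1.0, 1.0  # assumed values

one_drop = nmos_pass_high(VDD, VDD, VTN)           # passing VDD: VDD - Vtn
in_series = nmos_pass_high(VDD, one_drop, VTN)     # gates at VDD: still one drop
gate_driven = nmos_pass_high(one_drop, VDD, VTN)   # degraded gate: VDD - 2Vtn
weak_zero = pmos_pass_low(0.0, 0.0, VTP_ABS)       # passing 0: |Vtp|
print(one_drop, in_series, gate_driven, weak_zero)  # 4.0 4.0 3.0 1.0
```

The threshold loss accumulates only when a degraded output drives the gate of the next pass transistor, which is why pass transistor outputs should not be used as control signals for further pass transistors.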

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figures: pass transistor circuits with labelled node voltages. An n-type device passing VDD gives Vs = VDD - Vtn; a p-type device passing VSS gives Vs = |Vtp|; chains of n-type devices are labelled VDD - Vtn and, in one case, VDD - 2Vtn.]

Complementary Pass Transistor Logic

[Figure: (a) a pass-transistor network and an inverse pass-transistor network, both driven by A, A', B, B', producing F and F'; (b) example gates: AND/NAND (F = AB), OR/NOR (F = A+B) and EXOR/NEXOR (F = A ⊕ B), each with its complementary output.]

Advantages and Disadvantages

Advantages
Fewer transistors.
No static power consumption.

Disadvantages
Output voltage degrades.
Not an ideal switch due to series resistance.
Delay of series pass transistors is large.

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
Fewer transistors compared to CMOS.
No static power consumption.
Efficient building of complex gates.

Disadvantages
Not an ideal switch due to series resistance.
Delay of series transmission gates is large.

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
Zpu : Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce the correct output voltage level.

The nMOS Nand gate geometry reveals two factors:
The area of the Nand gate is greater than that of the inverter, because there are more pull-down transistors and a corresponding increase in the length of the pull-up transistor.
The delay also increases in direct proportion to the number of inputs added.

NMOS Nor Gate

The pull-down transistors are in parallel in the Nor gate, i.e. the pull-up to pull-down ratio is the same for every pull-down transistor.
So it has the same characteristics as the inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the Nor gate is preferable to the Nand gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND respectively.
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
• Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

The CMOS Nand gate has none of the restrictions of the NMOS Nand gate, but the geometry must be kept symmetrical by:
Allowing extended fall times for the series nMOS transistors (because of their series resistance).
Keeping the transfer characteristic centred on Vdd/2.
The CMOS Nor gate has series p-transistors, which increase the resistance and delay.
This affects the transfer characteristic and reduces noise immunity.
So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

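Static CMOS Switch Delay Model

[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates. Each transistor is replaced by a switch with on-resistance Rn or Rp (Req in general) driving the load capacitance CL; the NAND2 and NOR2 models also include the internal node capacitance Cint.]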

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.
Low to high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL
High to low transition:
both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with Rp for inputs A and B in parallel, Rn for inputs A and B in series, load CL and internal capacitance Cint.]
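The three first-order delay expressions can be evaluated directly. In this sketch the function name and the resistance values (Rn = 10 kΩ, Rp = 20 kΩ) are assumptions made for illustration, CL = 100 fF is used as a representative load, and Cint is ignored as in the expressions themselves.

```python
def nand2_switch_delays(rn, rp, cl):
    """First-order NAND2 delays from the switch model, in seconds."""
    return {
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,  # two Rp in parallel
        "low_to_high_one_input_falls": 0.69 * rp * cl,         # single Rp
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,  # two Rn in series
    }


# Assumed device values: Rn = 10 kOhm, Rp = 20 kOhm, CL = 100 fF.
delays = nand2_switch_delays(rn=10e3, rp=20e3, cl=100e-15)
for name, t in delays.items():
    print(f"{name}: {t * 1e12:.0f} ps")
```

With these numbers the output rises twice as fast when both inputs fall together as when only one does, which is the input-pattern dependence described above.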

Delay Dependence on Input Patterns

[Figure: output voltage [V] against time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input Data Pattern      Delay (psec)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 μm / 0.25 μm, PMOS = 0.75 μm / 0.25 μm, CL = 100 fF

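Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single transistor.

[Figure: static CMOS gate with a PUN and a PDN, each driven by inputs In1, In2, ... InN, between VDD and the output F.]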

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and both act to determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] against Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity.

Disadvantages
The higher pull-up resistance increases delay, so circuits are slower.
More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

The static power dissipation of Pseudo NMOS Logic motivates an alternative, Dynamic CMOS Logic.
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me clocked by Clk, output Out and load capacitance CL.]

Two-phase operation:
Precharge (CLK = 0)
Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path.
A fan-in of n requires 2n devices (n N-type + n P-type).
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes.
A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
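The two device counts can be tabulated to see where the saving starts. The function names are hypothetical; the formulas are the ones stated above.

```python
def static_cmos_devices(fan_in):
    """n N-type plus n P-type transistors."""
    return 2 * fan_in


def dynamic_cmos_devices(fan_in):
    """n + 1 N-type (PDN plus evaluation device) plus 1 P-type precharge device."""
    return fan_in + 2


for n in (2, 3, 4, 8):
    print(n, static_cmos_devices(n), dynamic_cmos_devices(n))
```

The counts are equal at a fan-in of 2 and the dynamic gate saves n - 2 transistors beyond that, so the area advantage grows with gate complexity.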


Page 9: Chp 3- Sub-system Design

Chip Architecture ndash Floor plan After high level design is complete it is necessary to

decide on how design is to be implemented in silicon The implementation plan is known as the floor plan First step in laying out a floor plan is the routing of

supply and clock rails In doing this sufficient space must be left between

power rails to allow for data-buses and combinational logic cells

Decide on relative positions of major functional blocks Use routing algorithm ( software ) Routing algorithm will minimize total routing area

Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with inputs A and B, switch resistances Rp and Rn, internal capacitance Cint and load capacitance CL]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with inputs A and B, resistances Rp and Rn, internal capacitance Cint and load CL]
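The three first-order delay expressions for the two-input Nand gate can be evaluated numerically. The sketch below is only illustrative: the resistance and capacitance values are assumed round numbers, not figures from the slides, and the internal capacitance Cint is ignored.

```python
def nand2_delays(r_p, r_n, c_l):
    """First-order RC delays of a 2-input Nand gate (Cint ignored)."""
    return {
        # both pMOS devices conduct in parallel -> Rp/2
        "low_to_high_both_inputs_fall": 0.69 * (r_p / 2) * c_l,
        # a single pMOS device charges the load -> Rp
        "low_to_high_one_input_falls": 0.69 * r_p * c_l,
        # two nMOS devices discharge in series -> 2Rn
        "high_to_low_both_inputs_rise": 0.69 * (2 * r_n) * c_l,
    }


# Assumed example values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF
for name, t in nand2_delays(10e3, 5e3, 100e-15).items():
    print(f"{name}: {t * 1e12:.1f} ps")
```

The fastest rising edge occurs when both inputs fall together, because the two pull-up devices then share the charging current.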

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A=B=1→0; A=1, B=1→0; and A=1→0, B=1]

Input data pattern      Delay (psec)
A=B=0→1                 67
A=1, B=0→1              64
A=0→1, B=1              61
A=B=1→0                 45
A=1, B=1→0              80
A=1→0, B=1              81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

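Pseudo-NMOS logic is static CMOS logic in which the pull-up network is reduced from N transistors to one.

[Figure: static CMOS gate in which inputs In1, In2, ..., InN drive both the pull-up network (PUN) and the pull-down network (PDN) between VDD and ground, with output F]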

Pseudo NMOS Operation
• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage
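Because the p-type load and the pull-down network conduct at the same time, the low output level is set by a voltage divider. The sketch below uses a deliberately simplified model in which both networks are treated as linear resistors; that model, the function names and the example values are assumptions for illustration and are not taken from the slides.

```python
def pseudo_nmos_vol(vdd, r_pullup, r_pulldown):
    """Output-low voltage of a ratioed gate, modelled as a resistive divider."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def static_power_when_low(vdd, r_pullup, r_pulldown):
    """Power drawn from VDD while the output is held low."""
    return vdd ** 2 / (r_pullup + r_pulldown)


# Assumed example: VDD = 2.5 V, pull-up 30 kOhm, pull-down 10 kOhm
print(pseudo_nmos_vol(2.5, 30e3, 10e3))        # 0.625
print(static_power_when_low(2.5, 30e3, 10e3))
```

A weaker (more resistive) pull-up lowers VOL and the static current, at the cost of a slower rising edge.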

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity

Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower
• Higher static power dissipation, because there is a conduction path between VDD and VSS whenever the output is low

Dynamic CMOS Logic
• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
• It avoids static power dissipation by adding a clock input that separates operation into a precharge phase and a conditional evaluation phase

[Figure: dynamic gate with clocked precharge transistor Mp, pull-down network (PDN) with inputs In1, In2, In3, clocked evaluation transistor Me, output Out and load capacitance CL]

Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation
• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n devices (n N-type + n P-type)
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 transistors (n+1 N-type + 1 P-type)
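The device counts quoted for static and dynamic gates are easy to tabulate. This is a small sketch with hypothetical function names, included only to make the saving concrete.

```python
def static_cmos_transistors(fan_in):
    """n N-type pull-down devices plus n P-type pull-up devices."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in):
    """n N-type logic devices, 1 N-type evaluate device, 1 P-type precharge device."""
    return fan_in + 2


for n in (2, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

For a fan-in of 2 the counts are equal (4 devices each); the dynamic gate only saves transistors for a fan-in of 3 or more.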

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pull-down transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create such a path, the gate's output is left charged at logic 1

Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

[Figure: dynamic gate with inputs A, B and C computing Out = ((AB)+C)'. During precharge (Clk = 0) Mp is on, Me is off and Out is 1; during evaluation (Clk = 1) Mp is off and Me is on]
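The precharge/evaluate behaviour and the "cannot be charged again" condition can be mimicked with a small behavioural model. This is a logic-level sketch only: the class name is hypothetical, and it ignores leakage, charge sharing and timing.

```python
class DynamicGate:
    """Logic-level model of a precharged gate with an n-type pull-down network."""

    def __init__(self, pdn_conducts):
        # pdn_conducts(*inputs) returns a truthy value when the PDN forms a path to ground
        self.pdn_conducts = pdn_conducts
        self.out = 1

    def precharge(self):
        """CLK = 0: the output node is charged to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: the node can only be discharged, never recharged."""
        if self.pdn_conducts(*inputs):
            self.out = 0
        return self.out


# PDN for Out = ((A.B) + C)'
gate = DynamicGate(lambda a, b, c: (a and b) or c)

gate.precharge()
print(gate.evaluate(1, 1, 0))   # 0: the PDN conducts and discharges the node
print(gate.evaluate(0, 0, 0))   # 0: inputs dropped, but the charge is already lost
gate.precharge()
print(gate.evaluate(0, 1, 0))   # 1: no path to ground, the node keeps its charge
```

The second call shows why inputs may make at most one transition during evaluation: a 0 to 1 to 0 glitch leaves the output wrongly discharged.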

Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full-swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation is usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases
• They require the introduction of a clock signal
• They are more sensitive to noise
• Their clocking signals consume power and are difficult to turn off to save power

Cascading Dynamic Gates

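[Figure: two cascaded dynamic inverters (In driving Out1, Out1 driving Out2), each with clocked transistors Mp and Me, together with waveforms of Clk, In, Out1 and Out2 versus time and the threshold VTn marked]

Only 0 → 1 transitions are allowed at the inputs.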

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter only turns off when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

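[Figure: domino gate: a dynamic stage (precharge transistor Mp, PDN with inputs In1, In2, In3, evaluation transistor Me) followed by a static inverter driving Out1]

Domino logic is the combination of a dynamic logic stage and a static inverter.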

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0
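A domino stage can be sketched at the logic level as a precharged storage node followed by an inverter. The class below is a hypothetical illustration, not a circuit simulation; it shows that every output is 0 after precharge, so a following stage can only ever see 0 → 1 transitions on its inputs.

```python
class DominoGate:
    """Precharged storage node plus static output inverter (logic-level model)."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts
        self.node = 1          # storage node, precharged high

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pdn_conducts(*inputs):
            self.node = 0      # node can only fall during evaluation

    @property
    def out(self):
        return 1 - self.node   # static inverter


and_stage = DominoGate(lambda a, b: a and b)   # out = A.B (non-inverting)
or_stage = DominoGate(lambda x, c: x or c)     # out = X + C

for stage in (and_stage, or_stage):
    stage.precharge()
print(and_stage.out, or_stage.out)             # 0 0 after precharge

and_stage.evaluate(1, 1)
or_stage.evaluate(and_stage.out, 0)            # evaluates like falling dominoes
print(and_stage.out, or_stage.out)             # 1 1
```

Because of the inverter, each stage realises a non-inverting function (AND, OR), which matches the restriction that domino logic can implement only non-inverting logic.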

Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced (smaller logical effort)

Cascading Domino Logic Gates

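[Figure: two cascaded domino gates; the first (inputs In1, In2, In3) drives Out1 through its static inverter, which feeds the second gate (inputs In4, In5) driving Out2]

• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 → 1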

Clocked CMOS Logic
• It combines a static CMOS gate with two additional clocked transistors: Clk is applied to one additional nMOS transistor and Clk' to one additional pMOS transistor
• A gate with fan-in n therefore needs 2n+2 transistors
• When Clk goes high, the logic of the circuit is evaluated, because the clocked nMOS transistor is on
• When Clk goes low, the logic is not evaluated, because the clocked nMOS transistor is off

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
• More area
• More complexity

N-P CMOS Logic

• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)
• Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (clk' is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (clk' is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three-stage N-P logic circuit with stage outputs X, G and Z]

BiCMOS Technology

CMOS properties
• Low power
• Slower

BJT properties
• Faster
• High power

Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT

Basic circuits
• Inverter
• Nand gate
• Nor gate

BiCMOS Circuits

[Figure: BiCMOS inverter and Nand gate]

Structured Design

The digital block or standard cell is designed at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter

Adders

• 1-bit full adder
• Manchester Carry Chain
• Adder Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
• 32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

• Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

• Generate: Cout = 1, independent of C
    G = A • B
• Propagate: Cout = C
    P = A ⊕ B
• Kill: Cout = 0, independent of C
    K = ~A • ~B
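The generate, propagate and kill definitions translate directly into bitwise operations. The sketch below (function names are illustrative) also checks the two properties the definitions rely on: exactly one of G, P, K is active for any input pair, and Cout = G + P·C.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position; a and b are 0 or 1."""
    g = a & b                # both inputs 1: carry out is 1 whatever C is
    p = a ^ b                # inputs differ: carry out follows C
    k = (1 - a) & (1 - b)    # both inputs 0: carry out is 0 whatever C is
    return g, p, k


def carry_out(a, b, c):
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b), [carry_out(a, b, c) for c in (0, 1)])
```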

Manchester Carry Chain

Built from switch logic using the propagate, generate & kill signals.

Manchester Chain adders
• The carry input is precharged with the clock signal instead of passing through the logic
• The carry path is gated by the propagate signal (P = A^B) through a single n-type pass transistor
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

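• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1-A4 and B1-B4, sum outputs S1-S4, internal carries C1-C3, carry input Cin and carry output Cout]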
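A carry-ripple adder is just a chain of full adders in which each carry-out feeds the next carry-in. The following is a minimal bit-level sketch; the function names and the least-significant-bit-first list convention are choices made for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # the carry ripples to the next stage
        sums.append(s)
    return sums, carry


# 4-bit example: 11 + 6 = 17 -> sum bits 0001 with carry out 1
print(ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0]))   # ([1, 0, 0, 0], 1)
```

The loop makes the critical path visible: the final carry depends on every earlier stage, so delay grows linearly with word length.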

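Carry Look-Ahead Adder (CLA)
• To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel
• The output carry is expressed in terms of the generate and propagate signals
• The expressions for the 4-bit CLA are
    C1 = G0 + P0C0
    C2 = G1 + P1G0 + P1P0C0
    C3 = G2 + P2G1 + P2P1G0 + P2P1P0C0
    C4 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0C0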
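The standard 4-bit look-ahead carries, obtained by expanding Ci+1 = Gi + Pi·Ci, can be written out and checked against ordinary addition. The sketch below is illustrative; the function name and the LSB-first bit lists are assumptions of this example.

```python
def cla4_carries(a_bits, b_bits, c0=0):
    """Return [C1, C2, C3, C4] for 4-bit operands given as LSB-first bit lists."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


# 1111 + 0001: the carry generated at bit 0 propagates through every stage
print(cla4_carries([1, 1, 1, 1], [1, 0, 0, 0]))   # [1, 1, 1, 1]
```

Every carry depends only on the G, P and C0 inputs, never on another carry, which is what allows all four to be produced in parallel.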

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

• It is also referred to as a conditional sum adder
• The adder is divided into two blocks
  - one block with a logical 0 Cin
  - another block with a logical 1 Cin
• S and Cout are selected by the actual Cin using a 2:1 multiplexer
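The select principle can be sketched as follows: each block is added twice, once assuming Cin = 0 and once assuming Cin = 1, and the real carry picks one of the two precomputed results. The helper names and the 4-bit block size are assumptions for this example; in hardware the two additions run in parallel rather than one after the other.

```python
def ripple_block(a_bits, b_bits, cin):
    """Plain ripple addition of one block (LSB-first bit lists)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a, b = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple_block(a, b, 0)    # result assuming carry-in 0
        s1, c1 = ripple_block(a, b, 1)    # result assuming carry-in 1
        # 2:1 multiplexer controlled by the actual carry into the block
        s, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s)
    return sums, carry


a = [(200 >> i) & 1 for i in range(8)]
b = [(100 >> i) & 1 for i in range(8)]
s, cout = carry_select_add(a, b)
print(sum(bit << i for i, bit in enumerate(s)) + (cout << 8))   # 300
```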

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M-1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
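A quick worked evaluation of the carry-select delay expression, using assumed unit delays (K1 = K2 = 1) that are not from the slides:

```python
def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M-1)*K2 for a carry-select adder."""
    return p_cells * k1 + (m_blocks - 1) * k2


# 16-bit adder split into M = 4 blocks of P = 4 cells, unit delays assumed
print(carry_select_delay(4, 4, 1, 1))    # 7
# The same 16 bits as a single ripple block (M = 1, P = 16)
print(carry_select_delay(1, 16, 1, 1))   # 16
```

Only the first block's ripple time appears in the total; every later block contributes just one multiplexer delay.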

Carry Skip Adder (CSA)

• It is also called a carry bypass adder
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

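[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sums si, si+1 and carries ci, ci+1; the propagate signals pi and pi+1 form the skip signal that selects the carry output]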

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder
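The skip condition can be sketched for one 4-bit stage as below. The function name and LSB-first bit lists are assumptions of this example. Logically the bypassed carry equals the rippled one; the benefit in hardware is that the bypass path is faster.

```python
def carry_skip_stage(a_bits, b_bits, cin):
    """One carry-skip stage: returns (sum_bits, carry_out, skipped)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]    # per-bit propagate
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):               # ordinary ripple inside the stage
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    skip = all(p)                                  # P0 and P1 and P2 and P3
    cout = cin if skip else carry                  # bypass multiplexer
    return sums, cout, skip


# 0101 + 1010: every bit position propagates, so the carry-in is bypassed
print(carry_skip_stage([1, 0, 1, 0], [0, 1, 0, 1], 1))   # ([0, 0, 0, 0], 1, True)
```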

Delay of CSA

Worst-case delay T is given by

T = 2(P-1)K1 + (M-2)K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
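A worked evaluation of the carry-skip delay expression, again with assumed unit delays. M and P are taken here to mean the number of blocks and the number of cells per block, as in the carry-select formula; that reading is an assumption.

```python
def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """T = 2(P-1)K1 + (M-2)K2: ripple in the first and last blocks, skip the rest."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


# 16-bit adder as M = 4 blocks of P = 4 cells, unit delays assumed
print(carry_skip_delay(4, 4, 1, 1))   # 8
# 32-bit adder as M = 8 blocks of P = 4 cells
print(carry_skip_delay(8, 4, 1, 1))   # 12
```

Doubling the word length at a fixed block size adds only skip delays, not ripple delays.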

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using the Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2 n
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
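The reduction of partial products to a sum vector and a carry vector can be sketched with 3:2 carry-save adders acting on whole rows. This is a simplified, row-wise illustration of the Wallace idea using Python integers as bit vectors; the function names are hypothetical, and a real Wallace tree is organised column by column.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole rows: three numbers in, (sum, carry) out."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a, b, n_bits=8):
    """Multiply two unsigned n-bit integers; returns (product, reduction_levels)."""
    # One partial-product row per multiplier bit
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        grouped = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, grouped, 3):             # compress each group of 3 rows to 2
            reduced.extend(carry_save_add(*rows[i:i + 3]))
        rows = reduced + rows[grouped:]            # leftover rows pass through
        levels += 1
    return sum(rows), levels                       # final conventional addition


print(wallace_multiply(13, 11, n_bits=4))          # (143, 2)
```

Each level shrinks the row count by roughly a factor of 3/2, so the number of levels grows logarithmically with the operand width.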

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

• Complex systems are designed with a top-down approach, with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Alpha 21364 Microprocessor Floor plan

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors?

1) Pass Transistors
2) Transmission gates

Pass Transistors

We have assumed that the source is grounded.

What if the source is > 0, e.g. a pass transistor passing VDD?

[Figure: pass transistor with VDD applied to both its gate and its input]

NMOS Pass Transistors
• Require one transistor and one gate signal
• Transmit 0 well, but when Vdd is applied to the drain, the voltage at the source is Vdd-Vtn
• When switch logic drives gate logic, n-type switches can cause electrical problems
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  - The pull-down current is weaker when a lower gate voltage is applied
  - The complementary gate's pull-down will not draw current off the output capacitance as fast as it should

PMOS Pass Transistors
• When Vin = Vdd, Vout = Vdd
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
• Logic 0 is therefore somewhat degraded through a p-device
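The level degradation of single pass transistors can be captured with two one-line models. These are idealised sketches that ignore the body effect; the function names and the 5 V / 1 V example values are assumptions, not figures from the slides.

```python
def nmos_pass(v_in, v_gate, v_tn):
    """An nMOS switch stops conducting once its output reaches Vgate - Vtn."""
    return min(v_in, v_gate - v_tn)


def pmos_pass(v_in, v_gate, v_tp_abs):
    """A pMOS switch stops conducting once its output falls to Vgate + |Vtp|."""
    return max(v_in, v_gate + v_tp_abs)


VDD, VTN, VTP = 5.0, 1.0, 1.0

print(nmos_pass(0.0, VDD, VTN))    # 0.0  good logic 0
print(nmos_pass(VDD, VDD, VTN))    # 4.0  degraded logic 1 (VDD - Vtn)
print(pmos_pass(VDD, 0.0, VTP))    # 5.0  good logic 1
print(pmos_pass(0.0, 0.0, VTP))    # 1.0  degraded logic 0 (|Vtp|)

# A degraded level driving the GATE of another nMOS switch loses a second threshold
degraded = nmos_pass(VDD, VDD, VTN)
print(nmos_pass(VDD, degraded, VTN))   # 3.0  (VDD - 2Vtn)
```

Pairing the two device types in parallel, as a transmission gate does, lets one device pass the level the other degrades.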

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figure: pass-transistor circuits annotated with node voltages: an n-type device passing VDD gives Vs = VDD-Vtn, a p-type device passing VSS gives Vs = |Vtp|, and circuits with further n-type devices show levels of VDD-Vtn and VDD-2Vtn]

Complementary Pass Transistor Logic

[Figure: (a) complementary pass-transistor logic: two pass-transistor networks driven by the inputs A, B and their complements, one producing F and the inverse network producing its complement; (b) example AND/NAND, OR/NOR and EXOR/NEXOR gates]

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, because of its series resistance
• Delay of series-connected pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors than the equivalent CMOS gate
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, because of its series resistance
• Delay of series-connected transmission gates is large


Page 11: Chp 3- Sub-system Design

Major Levels of Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors

• Require one transistor and one gate signal
• Transmit a 0 well, but when Vdd is applied to the drain the voltage at the source is only Vdd - Vtn
• When switch logic drives gate logic, n-type switches can cause electrical problems
  - An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
  - The pull-down current is weaker when a lower gate voltage is applied
  - The complementary gate's pull-down will not draw current off the output capacitance as fast as it should

PMOS Pass Transistors

• When Vin = Vdd, Vout = Vdd
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
• Logic 0 is therefore somewhat degraded through a p-device
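The threshold drops described above can be checked with a small steady-state model. This is only a sketch under stated assumptions: ideal devices, no body effect, the nMOS output node starting low and the pMOS output node starting high. The function names and the 5 V / 1 V values are hypothetical, chosen for illustration.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Steady-state output of an idealised nMOS pass transistor.

    The output can follow v_in only up to v_gate - vtn; above that the
    transistor cuts off (body effect is ignored in this sketch).
    """
    return min(v_in, max(v_gate - vtn, 0.0))


def pmos_pass(v_in, v_gate, vtp_abs):
    """Steady-state output of an idealised pMOS pass transistor.

    The output can follow v_in only down to v_gate + |Vtp|.
    """
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP = 5.0, 1.0, 1.0            # illustrative values only

degraded_one = nmos_pass(VDD, VDD, VTN)  # nMOS passing VDD: VDD - Vtn
good_zero = nmos_pass(0.0, VDD, VTN)     # nMOS passes a 0 without loss
degraded_zero = pmos_pass(0.0, 0.0, VTP) # pMOS passing 0: |Vtp|
# A degraded level used as the GATE voltage of the next switch loses a second Vtn.
twice_degraded = nmos_pass(VDD, degraded_one, VTN)
```

With these values the degraded logic 1 is 4 V, the degraded logic 0 is 1 V, and driving a second switch's gate from the degraded level leaves only VDD - 2Vtn = 3 V.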

Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figures: pass-transistor circuits annotated with node voltages. Labels recoverable from the slides: Vs = VDD - Vtn for an nMOS passing VDD, Vs = |Vtp| for a pMOS passing VSS, and VDD - Vtn and VDD - 2Vtn on nodes of the multi-transistor circuits.]

Complementary Pass Transistor Logic

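[Figure: (a) general CPL structure: the inputs A, A', B, B' drive a pass-transistor network producing F and an inverse pass-transistor network producing F'. (b) Example gates: AND/NAND (F = AB and its complement), OR/NOR (F = A + B and its complement), EXOR/NEXOR (F = A ⊕ B and its complement).]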

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors, Zpu/(n·Zpd), must be at least 4:1 to produce a correct output voltage level.

The nMOS NAND gate geometry reveals two factors:
• The area of a NAND gate is greater than that of an inverter, because of the additional pull-down transistors and the corresponding increase in the length of the pull-up transistor
• The delay also increases in direct proportion to the number of inputs
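The 4:1 rule above can be turned into a quick sizing check. The following is a minimal sketch, assuming every pull-down transistor has the same impedance Zpd; the function name and the unit value of Zpd are hypothetical.

```python
def min_pullup_impedance(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest Zpu that satisfies Zpu / (n * Zpd) >= ratio.

    n_inputs : number of series pull-down transistors (1 for an inverter)
    z_pd     : impedance (L/W) of each pull-down transistor, assumed equal
    ratio    : required pull-up to pull-down ratio (4:1 on the slide)
    """
    return ratio * n_inputs * z_pd


# Required pull-up impedance grows linearly with the number of inputs,
# which is why the pull-up transistor of a NAND gate gets longer.
zpu_by_inputs = {n: min_pullup_impedance(n) for n in (1, 2, 3)}
```

For unit-sized pull-downs this gives Zpu = 4 for the inverter, 8 for a 2-input NAND and 12 for a 3-input NAND.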

NMOS Nor Gate

• The pull-down transistors are in parallel in a NOR gate, so the pull-up to pull-down ratio is the same for every input
• It therefore has the same characteristics as an inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the NOR gate is preferable to the NAND gate

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively
• No static power consumption: there is never a direct path between VDD and VSS (GND) in steady state
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• A CMOS NAND gate has none of the ratio restrictions of the nMOS NAND gate, but the geometry should be kept symmetrical by
  - allowing for the extended fall time of the series nMOS transistors (series resistance)
  - keeping the transfer characteristic centred on Vdd/2
• A CMOS NOR gate has series p-transistors, which increase the resistance and the delay
• This affects the transfer characteristic and reduces noise immunity
• So the geometries of the nMOS and pMOS transistors should be adjusted

CMOS Logic

• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

Static CMOS Switch Delay Model

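[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2. Each transistor is replaced by a switch controlled by A or B with on-resistance Rn or Rp (Req in the generic switch), driving the load capacitance CL; the series stacks also show the internal node capacitance Cint.]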

Input Pattern Effects on Delay

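Delay depends on the pattern of the inputs (2-input NAND):
• Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with pull-up resistances Rp (inputs A and B in parallel), pull-down resistances Rn (A and B in series), internal capacitance Cint and load CL.]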
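The three first-order delay expressions above can be evaluated directly. This is a sketch of the switch-level model only (Cint is ignored); the resistance and capacitance values are hypothetical and do not correspond to the simulated numbers reported on the slides.

```python
def nand2_delays(rp, rn, cl):
    """First-order 0.69*R*C delays of a 2-input NAND for each input pattern.

    rp, rn : on-resistance of one pMOS / one nMOS transistor (ohms)
    cl     : load capacitance (farads); the internal node Cint is ignored
    """
    k = 0.69
    return {
        "low_to_high, both inputs fall": k * (rp / 2) * cl,  # two pMOS in parallel
        "low_to_high, one input falls": k * rp * cl,         # a single pMOS
        "high_to_low, both inputs rise": k * (2 * rn) * cl,  # two nMOS in series
    }


# Illustrative values only: Rp = 20 kOhm, Rn = 10 kOhm, CL = 100 fF.
delays = nand2_delays(rp=20e3, rn=10e3, cl=100e-15)
```

The model shows why the single-input low-to-high case is twice as slow as the case where both pull-ups conduct.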

Delay Dependence on Input Patterns

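[Plot: output voltage [V] versus time [ps] (0 to 400 ps) for the input patterns A = B = 1 -> 0; A = 1, B = 1 -> 0; and A = 1 -> 0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0 -> 1          67
A = 1, B = 0 -> 1       64
A = 0 -> 1, B = 1       61
A = B = 1 -> 0          45
A = 1, B = 1 -> 0       80
A = 1 -> 0, B = 1       81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF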

Pseudo NMOS Logic

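Pseudo-NMOS logic reduces the N-transistor pull-up network of a static CMOS gate to a single pMOS transistor.

[Figure: static CMOS gate for comparison, with inputs In1, In2, ... InN driving both the pull-up network (PUN) and the pull-down network (PDN), and output F.]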

Pseudo NMOS Operation

• The pull-down network of the gate is the same as for a fully complementary gate
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate input is Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Plot: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V] (0 to 2.5 V), for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity

Disadvantages
• The higher pull-up resistance increases delay, so circuits are slower
• More static power is dissipated, because of the conduction path between VDD and VSS

Dynamic CMOS Logic

• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
• It avoids static power dissipation and adds a clock input that defines precharge and conditional evaluation phases
• Two-phase operation
  - Precharge (CLK = 0)
  - Evaluate (CLK = 1)

[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluate transistor Me both clocked by Clk, and output Out loaded by CL.]

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  - a fan-in of n requires 2n devices (n N-type + n P-type)
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  - a fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type)

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pull-down transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is chosen so that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
• Otherwise the gate's output is left charged at logic 1

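Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

[Figure: example dynamic gate with PDN inputs A, B and C, precharge transistor Mp and evaluate transistor Me clocked by Clk, Out = ((A·B) + C)'. In precharge Mp is on and Me is off, so Out = 1; in evaluation Mp is off and Me is on.]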
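The precharge/evaluate behaviour and the "discharge only once" condition can be illustrated with a small behavioural model. This is a logic-level sketch only (no charge sharing, leakage or timing); the class name DynamicGate and its interface are hypothetical.

```python
class DynamicGate:
    """Logic-level model of a precharged (dynamic) CMOS gate."""

    def __init__(self, pdn):
        self.pdn = pdn   # function of the inputs; truthy when the PDN conducts
        self.out = 0     # arbitrary initial state before the first precharge

    def step(self, clk, *inputs):
        if clk == 0:
            self.out = 1                 # precharge: Mp on, Me off
        elif self.pdn(*inputs):
            self.out = 0                 # evaluate: PDN discharges the output
        # otherwise the output floats and CL keeps the stored value
        return self.out


# Example from the slide: Out = ((A.B) + C)'
gate = DynamicGate(lambda a, b, c: (a and b) or c)

trace = [
    gate.step(0, 0, 0, 0),   # precharge
    gate.step(1, 1, 1, 0),   # evaluate with A = B = 1: output discharges
    gate.step(1, 0, 0, 0),   # inputs fall again: output cannot recover
    gate.step(0, 0, 0, 0),   # next precharge restores the 1
    gate.step(1, 0, 1, 0),   # PDN off: the 1 is held on CL
]
```

The third step shows why an input glitch is fatal: once discharged, the output stays low until the next precharge.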

Properties of Dynamic Gates

• Logic function is implemented by the PDN only
• Full-swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation usually higher than static CMOS
  - no glitching
  - higher transition probabilities
  - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases
• Require the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

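[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me clocked by Clk. In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The accompanying waveforms plot Clk, In, Out1 and Out2 as voltage versus time, with the VTn level marked.]

Only 0 -> 1 transitions are allowed at the inputs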

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off only when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

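[Figure: domino gate: a dynamic stage (inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluate transistor Me clocked by Clk) followed by a static inverter that produces Out1.]

Domino logic is the combination of a dynamic logic stage and a static inverter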

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
• The evaluation phase begins, i.e. the clocked n-type pull-down turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1
• Otherwise the storage node is left charged at logic 1 and the gate's output is 0
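The effect of the output inverter can be seen in a logic-level sketch of two cascaded domino stages. The model below is an illustration under simplifying assumptions (ideal precharge, no timing); the class DominoGate and the AND/OR example functions are hypothetical.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (logic-level model)."""

    def __init__(self, pdn):
        self.pdn = pdn    # truthy when the pull-down network conducts
        self.node = 1     # internal storage node

    @property
    def output(self):
        return 1 - self.node            # static inverter

    def precharge(self):
        self.node = 1                   # node high, so the output is 0
        return self.output

    def evaluate(self, *inputs):
        if self.pdn(*inputs):
            self.node = 0               # node discharges, output rises 0 -> 1
        return self.output


and_stage = DominoGate(lambda a, b: a and b)   # output = A.B (non-inverting)
or_stage = DominoGate(lambda x, c: x or c)     # output = X + C

# After precharge every domino output is 0, so the next stage's PDN is off.
after_precharge = (and_stage.precharge(), or_stage.precharge())

# During evaluation the outputs can only rise, rippling like falling dominoes.
x = and_stage.evaluate(1, 1)
z = or_stage.evaluate(x, 0)
```

Because each stage output starts at 0 and can only rise, the second stage never sees the 1 -> 0 input transition that breaks directly cascaded dynamic gates.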

Properties of Domino Logic

• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  - the static inverter can be skewed, since only the L-H output transition matters
  - input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

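[Figure: two cascaded domino gates. The first stage (inputs In1, In2, In3) produces Out1 through its inverter; Out1, In4 and In5 feed the PDN of the second stage, which produces Out2. Annotations show the dynamic nodes making 1 -> 1 or 1 -> 0 transitions and the outputs making 0 -> 0 or 0 -> 1 transitions.]

• The inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
• Hence the only possible transition during evaluation is 0 -> 1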

Clocked CMOS Logic

• It is a static CMOS gate combined with two clocked transistors
  - the Clk input is given to one additional nMOS
  - the Clk' input is given to one additional pMOS
• A fan-in of n requires 2n + 2 transistors
• When Clk goes high, the logic is evaluated because the clocked nMOS is on
• When Clk goes low, the logic is not evaluated because the clocked nMOS is off

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
• More area
• More complexity

N-P CMOS Logic

• An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)
• Alternate stages of N logic with stages of P logic
  - N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
  - P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks

Cascading N-P Logic

Example logic, with stage outputs X, G and Z:
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'

BiCMOS Technology

• CMOS properties: low power, slower
• BJT properties: faster, high power
• Implementation: CMOS and BJT
  - Core logic: CMOS
  - Interface: BJT
• Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS NAND gate.]

Structured Design

Designing the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples Combinational Logic

• 5-way selector
• Multiplexers and demultiplexers
• Encoders and decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders

• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of
• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0, independent of C
  K = ~A · ~B
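The three carry conditions can be written out and checked exhaustively. This is a small sketch on single-bit integers (0 or 1); the function names are hypothetical.

```python
def carry_status(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                  # G = A . B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry out expressed with G and P: Cout = G + P.C"""
    g, p, _ = carry_status(a, b)
    return g | (p & c)


# Exactly one of G, P, K is true for every input pair, and the G/P form
# of the carry matches ordinary binary addition for all eight cases.
all_cases_agree = all(
    sum(carry_status(a, b)) == 1 and carry_out(a, b, c) == (a + b + c) >> 1
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
)
```

Since G, P and K are mutually exclusive, a carry chain needs only one of three switches closed per bit, which is what the Manchester chain exploits.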

Manchester Carry Chain

Built from switch logic using propagate, generate and kill

Manchester Chain adders

• The carry input is precharged with the clock signal instead of passing through the logic
• The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor

• Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

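• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder built from four full adders with inputs A1..A4 and B1..B4, sum outputs S1..S4, internal carries C1..C3, carry input Cin and carry output Cout.]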
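The cascade of full adders can be sketched at the bit level as follows. Bit lists are given least-significant bit first; the helper names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Carry-ripple adder: the carry of each cell feeds the next cell."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):      # LSB first
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
result = from_bits(sum_bits) + (cout << 4)   # 11 + 6 = 17
```

The loop makes the critical path visible: the last sum bit cannot be computed until the carry has passed through every earlier cell.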

Carry Look-Ahead Adder (CLA)

• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel
• The output carry is expressed in terms of generate and propagate signals
• The expressions for a 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
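The slide's own 4-bit expressions did not survive extraction; only C1 = G0 + P0·C0 is legible. The sketch below uses the standard expansion obtained by repeatedly substituting C(i+1) = Gi + Pi·Ci, which is consistent with that first equation but is an assumption about what the slide showed. The function name cla4 is hypothetical.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition of integers a, b in 0..15.

    Every carry is a two-level function of the G/P signals and c0,
    so no carry has to wait for the previous one to ripple.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # Pi = Ai xor Bi

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c4


sum_bits, carry = cla4(9, 7)   # 9 + 7 = 16: sum bits 0000, carry out 1
```

The price of the parallel carries is visible in c4: the product terms grow with the bit position, which is why look-ahead is usually limited to small groups.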

Carry Select Adder

• It is also referred to as a conditional sum adder
• The adder is divided into two blocks
  - one block with a logical 0 as Cin
  - another block with a logical 1 as Cin
• S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2, where
• M = number of blocks in the adder
• P = number of cells in each block
• K1 = delay through one adder cell
• K2 = delay through the multiplexer
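The delay expression can be evaluated for a few block arrangements. This is a sketch of the formula only; the unit delays and the 16-bit partitions are hypothetical values chosen for illustration.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


# 16-bit adder with K1 = K2 = 1 time unit (illustrative), split three ways.
options = {
    (16, 1): carry_select_delay(16, 1, 1, 1),   # one block: plain ripple
    (4, 4): carry_select_delay(4, 4, 1, 1),
    (2, 8): carry_select_delay(2, 8, 1, 1),
}
```

With these unit delays the 4 x 4 arrangement is fastest: shorter blocks reduce the ripple term but add multiplexer stages.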

Carry Skip Adder (CSA)

• It is also called a carry bypass adder
• Looks for cases in which the carry-out of a set of bits is identical to the carry-in
• Typically organized into m-bit stages
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

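[Figure: two full adders (FA) for bits i and i+1 with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into a skip signal that selects the carry output.]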

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage 4 adder
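The bypass condition can be sketched for one 4-bit block. The model is logic-level only and shows that bypassing never changes the result, it only shortens the carry path; the function name and bit ordering (LSB first) are assumptions made for this illustration.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block: returns (sum_bits, carry_out, skipped)."""
    carry = cin
    sums = []
    propagate = []
    for a, b in zip(a_bits, b_bits):          # LSB first
        propagate.append(a ^ b)               # Pi = Ai xor Bi
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))   # rippled carry
    skipped = all(propagate)                  # P0 and P1 and P2 and P3
    # The bypass multiplexer picks cin when every bit propagates,
    # otherwise the rippled carry from the last full adder.
    cout = cin if skipped else carry
    return sums, cout, skipped


# A = 0101, B = 1010 (LSB first below): every bit position propagates.
sums, cout, skipped = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], cin=1)
```

In this example all four propagate signals are 1, so the carry input of 1 is forwarded straight to the carry output while the sum bits are all 0.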

Delay of CSA

Worst-case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2, where
• K1 = delay through one adder cell
• K2 = delay for skipping the carry over a block
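The worst-case expression can be compared against a plain ripple adder. This sketch assumes P is the number of cells per block and M the number of blocks (the slide does not restate them here), and it takes the ripple delay as n·K1; the unit delays are hypothetical.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2 (worst case, as given on the slide)."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed ripple-carry reference: the carry crosses all n cells."""
    return n * k1


# 32-bit adder as 8 blocks of 4 cells, with K1 = K2 = 1 time unit.
skip_t = carry_skip_delay(p=4, m=8, k1=1, k2=1)
ripple_t = ripple_delay(32, 1)
```

The worst case ripples through the first and last blocks and skips the ones in between, which is where the 2(P - 1) and (M - 2) terms come from.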

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2(n)
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
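The reduction to two vectors can be sketched with word-level carry-save (3:2) compressors. This is a simplified illustration that groups whole partial-product rows in threes rather than optimising each column as a real Wallace tree does; unsigned operands are assumed and the function names are hypothetical.

```python
def carry_save(a, b, c):
    """3:2 compressor on whole words: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def wallace_multiply(x, y, bits=8):
    """Multiply unsigned x by y (y < 2**bits); returns (product, levels)."""
    # One partial product per multiplier bit.
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]
    while len(rows) < 2:
        rows.append(0)

    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):          # compress rows in threes
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows pass through
        rows = reduced
        levels += 1

    # The final two vectors (sum and carry) go to a conventional adder.
    return rows[0] + rows[1], levels


product, levels = wallace_multiply(13, 11, bits=8)
```

Eight partial products shrink as 8, 6, 4, 3, 2 rows, so only four compressor levels precede the final adder, reflecting the logarithmic completion time claimed above.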

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

• Complex systems are designed top-down with the help of CAD tools
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 12: Chp 3- Sub-system Design

Switch Logic

How do we build switches from MOS transistors

1) Pass Transistors 2) Transmission gates

Pass Transistors

We have assumed source is grounded

What if source gt 0 eg pass transistor passing VDD

VDDVDD

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer

Carry Skip Adder (CSA)

It is also called a carry-bypass adder

It looks for cases in which the carry out of a set of bits is identical to the carry in

Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

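Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal]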

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
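The skip condition can be sketched for a single block as below. The function name and the returned `skip` flag are illustrative assumptions; bit lists are least-significant bit first.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block: returns (sum bits, carry out, skip flag)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate per bit
    sums, c = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    skip = all(p)                    # P0 and P1 and ... = 1
    cout = cin if skip else c        # bypass the ripple chain when skipping
    return sums, cout, skip


# Every bit pair differs, so the carry in is forwarded unchanged.
sums, cout, skip = carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)
```

The bypass never changes the logical result, since a block in which every bit propagates would ripple the same carry through anyway; it only shortens the path the carry has to travel.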

Delay of CSA

Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
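The carry-select formula T = P·K1 + (M-1)·K2 and the carry-skip formula T = 2(P-1)·K1 + (M-2)·K2 can be evaluated side by side. In the sketch below the unit delays K1 = K2 = 1 and the 8 x 4 block organisation are purely illustrative assumptions, not measured values.

```python
def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M-1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = 2(P-1)*K1 + (M-2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 32-bit adder split into M = 8 blocks of P = 4 cells,
# with normalised delays K1 = K2 = 1.
t_select = carry_select_delay(p=4, m=8, k1=1.0, k2=1.0)
t_skip = carry_skip_delay(p=4, m=8, k1=1.0, k2=1.0)
```

With these assumed numbers the two adders come out close to each other (11 versus 12 unit delays); the real ranking depends on the actual cell, multiplexer and skip delays.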

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
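The reduction of the partial products to a sum vector and a carry vector can be sketched at word level. This is a simplified model under stated assumptions: whole rows are grouped three at a time with carry-save adders, whereas a real Wallace tree is wired column by column with full and half adders. The function names are hypothetical.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 carry-save adder on whole words: returns (sum vector, carry vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Multiply two unsigned n-bit numbers; returns (product, reduction levels)."""
    # One partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            s, c = csa(rows[i], rows[i + 1], rows[i + 2])
            nxt += [s, c]                          # three rows become two
        nxt += rows[len(rows) - len(rows) % 3:]    # leftover rows pass through
        rows = nxt
        levels += 1
    # The remaining vectors are combined by a conventional adder.
    return sum(rows), levels


product, levels = wallace_multiply(13, 11, 4)
```

Each level shrinks the row count by roughly a third, which is where the logarithmic completion time comes from; only the final addition needs carry propagation.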

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools
Partition the system sensibly
Aim for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Pass Transistors

We have assumed that the source is grounded

What if the source is > 0, e.g. a pass transistor passing VDD?

NMOS Pass Transistors

Require one transistor and one gate signal
Transmit 0 well, but when Vdd is applied to the drain the voltage at the source is Vdd - Vtn
When switch logic drives gate logic, n-type switches can cause electrical problems:
- An n-type switch driving a complementary gate causes the gate to run slower when the switch input = 1
- The pull-down current is weaker when a lower gate voltage is applied
- The complementary gate's pull-down will not draw current off the output capacitance as fast as it should

PMOS Pass Transistors

When Vin = Vdd, then Vout = Vdd
When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
Logic 0 is somewhat degraded through the p-device
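The degraded levels can be reproduced with a first-order threshold model. This sketch ignores body effect and leakage, and the supply and threshold values (VDD = 5 V, Vtn = |Vtp| = 1 V) are assumptions chosen only for illustration.

```python
def nmos_pass(v_in: float, v_gate: float, vtn: float) -> float:
    """n-type pass transistor: the output cannot rise above v_gate - vtn."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in: float, v_gate: float, vtp_abs: float) -> float:
    """p-type pass transistor: the output cannot fall below v_gate + |Vtp|."""
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP = 5.0, 1.0, 1.0   # assumed values

strong_zero = nmos_pass(0.0, VDD, VTN)     # a 0 passes undegraded
weak_one = nmos_pass(VDD, VDD, VTN)        # VDD - Vtn
weak_zero = pmos_pass(0.0, 0.0, VTP)       # |Vtp|
strong_one = pmos_pass(VDD, 0.0, VTP)      # a 1 passes undegraded

# Pass transistors in series with gates at VDD lose only one threshold,
# but driving a gate terminal with a degraded level loses a second one.
series = nmos_pass(nmos_pass(VDD, VDD, VTN), VDD, VTN)   # VDD - Vtn
gate_driven = nmos_pass(VDD, weak_one, VTN)              # VDD - 2 Vtn
```

The last line shows why the output of one n-type pass transistor should not drive the gate of another: each such stage costs a further threshold drop.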

Voltage degradation of Pass Transistors

Pass Transistor Circuits

[Figure: pass transistor circuits annotated with the resulting node voltages: Vs = VDD - Vtn for an n-type device passing VDD, Vs = |Vtp| for a p-type device passing VSS, VDD - Vtn along a series chain, and VDD - 2Vtn when a degraded level drives a gate]

Complementary Pass Transistor Logic

[Figure: (a) a pass-transistor network and its inverse network, both driven by A, A', B and B', producing F and F'; (b) example AND/NAND, OR/NOR and EXOR/NEXOR gates with outputs F = AB, F = A+B, F = A⊕B and their complements]

Advantages and Disadvantages

Advantages
Fewer transistors
No static power consumption

Disadvantages
Output voltage degrades
Not an ideal switch due to series resistance
Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
Fewer transistors compared to CMOS
No static power consumption
Efficient building of complex gates

Disadvantages
Not an ideal switch due to series resistance
Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
Zpu/Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to all n pull-down transistors (Zpu : nZpd) must be at least 4:1 to produce the correct output voltage level

The nMOS Nand gate geometry reveals two factors:
The area of the Nand gate is greater than that of the inverter, because there are more pull-down transistors and the pull-up transistor must be correspondingly longer
The delay also increases in direct proportion to the number of inputs added
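The ratio rule can be turned into a small sizing helper. This is only a sketch: impedances are expressed as length-to-width ratios, the function name is hypothetical, and a unit pull-down impedance Zpd = 1 is assumed in the example.

```python
def min_pullup_impedance(n_inputs: int, z_pd: float, ratio: float = 4.0) -> float:
    """Smallest Zpu that satisfies Zpu : (n * Zpd) >= ratio : 1."""
    return ratio * n_inputs * z_pd


# With Zpd = 1 per pull-down transistor (assumed), the pull-up of an
# inverter needs Zpu = 4, while a 3-input Nand gate needs Zpu = 12.
z_inverter = min_pullup_impedance(1, 1.0)
z_nand3 = min_pullup_impedance(3, 1.0)
```

The required pull-up length grows linearly with the number of series pull-down transistors, which is the area penalty noted above.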

NMOS Nor Gate

The pull-down transistors in a Nor gate are in parallel, so the pull-up to pull-down ratio is the same for every transistor
So it has the same characteristics as the inverter
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS Nand gate has no ratio restrictions like the nMOS Nand gate, but the geometry has to be kept symmetrical by:
Allowing for the extended fall times of the series nMOS transistors (due to series resistance)
Keeping the transfer characteristic centred on Vdd/2

The CMOS Nor gate has series p-transistors, which increase the resistance and delay
This affects the transfer characteristic and reduces noise immunity
So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic

Static CMOS logic
Pseudo-nMOS logic
Dynamic CMOS logic
Domino CMOS logic
Clocked CMOS logic
n-p CMOS logic

Static CMOS Switch Delay Model

[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with each transistor replaced by its on-resistance (Rn or Rp), the internal node capacitance Cint and the load capacitance CL]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low-to-high transition
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL

High-to-low transition
both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with parallel pull-up resistances Rp, series pull-down resistances Rn, internal capacitance Cint and load CL]
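The three delay expressions can be evaluated directly. The sketch below assumes the two-input NAND switch model; the resistance values Rn = 10 kΩ and Rp = 20 kΩ are illustrative assumptions, as is the dictionary layout.

```python
def nand2_delays(rn: float, rp: float, cl: float) -> dict[str, float]:
    """First-order propagation delays (seconds) of a two-input NAND gate."""
    return {
        # Both pMOS devices conduct in parallel: Rp / 2
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,
        # Only one pMOS device conducts: Rp
        "low_to_high_one_input_low": 0.69 * rp * cl,
        # Both nMOS devices conduct in series: 2 Rn
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl,
    }


# Assumed values: Rn = 10 kOhm, Rp = 20 kOhm, CL = 100 fF
delays = nand2_delays(rn=10e3, rp=20e3, cl=100e-15)
for name, t in delays.items():
    print(f"{name}: {t * 1e12:.0f} ps")
```

The model ignores the internal capacitance Cint, so it only captures the first-order dependence of the delay on which transistors are conducting.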

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1 -> 0; A = 1, B = 1 -> 0; and A = 1 -> 0, B = 1]

Input data pattern       Delay (ps)
A = B = 0 -> 1           67
A = 1, B = 0 -> 1        64
A = 0 -> 1, B = 1        61
A = B = 1 -> 0           45
A = 1, B = 1 -> 0        80
A = 1 -> 0, B = 1        81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

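Pseudo NMOS Logic

Pseudo-nMOS logic is static CMOS logic in which the N-input pull-up network is reduced to a single transistor

[Figure: static CMOS gate between VDD and ground, with inputs In1, In2, ..., InN driving both the pull-up network (PUN) and the pull-down network (PDN), and output F]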

Pseudo NMOS Operation

The pull-down network of the gate is the same as for a fully complementary gate
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is Vdd, both the p-type and n-type transistors are on, and both are operating to determine the gate's output voltage
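When both networks conduct, the output settles where the pull-up and pull-down currents balance. A rough way to see this is to treat both as linear resistors, which is a strong simplification of real transistor behaviour; the resistance values and the function name below are assumptions for illustration only.

```python
def pseudo_nmos_low_output(vdd: float, r_pullup: float, r_pulldown: float):
    """Resistive-divider estimate of the low output level and static power."""
    v_ol = vdd * r_pulldown / (r_pullup + r_pulldown)
    static_power = vdd ** 2 / (r_pullup + r_pulldown)
    return v_ol, static_power


# Assumed values: VDD = 2.5 V, pull-up 40 kOhm, pull-down 10 kOhm
v_ol, p_static = pseudo_nmos_low_output(2.5, 40e3, 10e3)
```

The estimate shows the ratioed nature of the gate: a weaker (higher-resistance) pull-up gives a lower output-low level and less static power, but a slower rising edge.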

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: Vout [V] versus Vin [V] for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity

Disadvantages
The higher pull-up resistance increases the delay, so the circuits are slower
More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-nMOS logic leads to an alternative, dynamic CMOS logic
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases

[Figure: dynamic gate with precharge transistor Mp and evaluation transistor Me, both clocked by Clk, a pull-down network (PDN) with inputs In1, In2 and In3, and output Out loaded by CL]

Two-phase operation
Precharge (CLK = 0)
Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
  a fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
  requires only n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pull-down transistor controlled by the clock keeps the precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
The evaluation phase begins, i.e. the n-type pull-down turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1
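The two phases can be modelled behaviourally with a stored node value. In the sketch below the class name and the pull-down function are illustrative assumptions; charge sharing and leakage are ignored.

```python
class DynamicGate:
    """Behavioural model of a precharge/evaluate gate."""

    def __init__(self, pdn):
        self.pdn = pdn    # function of the inputs: truthy if the PDN conducts
        self.node = 1     # value stored on the output capacitance

    def precharge(self):
        """CLK = 0: the p-type pull-up charges the node to logic 1."""
        self.node = 1

    def evaluate(self, *inputs):
        """CLK = 1: a conducting pull-down path discharges the node."""
        if self.pdn(*inputs):
            self.node = 0
        return self.node


# Pull-down network that conducts when (A and B) or C
gate = DynamicGate(lambda a, b, c: (a and b) or c)
gate.precharge()
first = gate.evaluate(0, 0, 0)    # no path: node stays at 1
second = gate.evaluate(0, 0, 1)   # C rises: node is discharged
third = gate.evaluate(0, 0, 0)    # C falls again: node cannot recover
```

The third call illustrates why inputs must rise monotonically: once the node has been discharged, only the next precharge phase can restore it.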

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

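[Figure: example dynamic gate with inputs A, B and C. During precharge Mp is on and Me is off, so Out = 1; during evaluation Mp is off and Me is on, so Out = ((A·B)+C)']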

Properties of Dynamic Gates

Logic function is implemented by the PDN only
Full-swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation usually higher than static CMOS
  no glitching
  higher transition probabilities
  extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn
  low noise margin (NML)

Advantages and Disadvantages

Advantages
Low area
Higher speed than static complementary gates

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
Requires the introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power

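Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk; input In drives the first stage, Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 versus time, with the level VTn marked]

Only 0 -> 1 transitions allowed at inputs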

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge

The pull-down network of the second dynamic inverter turns off when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

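Domino CMOS Logic

[Figure: dynamic gate with precharge transistor Mp, evaluation transistor Me, pull-down network (PDN) with inputs In1, In2 and In3, followed by a static inverter that produces Out1]

Domino logic is the combination of dynamic logic and a static inverter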

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off
The evaluation phase begins, i.e. the n-type pull-down turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
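The effect of the output inverter can be sketched with a two-stage chain. The functions and the example logic (Out1 = A·B, Out2 = Out1 + C) are hypothetical, chosen only to show the behaviour after evaluation settles.

```python
def domino_stage(pdn_conducts) -> int:
    """Value of a domino gate output after evaluation.

    The dynamic node is precharged to 1 and discharged if the pull-down
    network conducts; the static inverter then drives the gate output.
    """
    node = 0 if pdn_conducts else 1
    return 1 - node


def domino_chain(a: int, b: int, c: int) -> tuple[int, int]:
    """Two cascaded domino stages; every stage output is 0 after precharge."""
    out1 = domino_stage(a and b)       # Out1 = A . B
    out2 = domino_stage(out1 or c)     # Out2 = A . B + C
    return out1, out2


result = domino_chain(1, 1, 0)
```

Because every stage output starts at 0 and can only rise, a later stage never sees a falling input during evaluation, so the outputs fall into place one after another like a row of dominoes. The same property means that only non-inverting functions can be built.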

Properties of Domino Logic

Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
Very high speed
  the static inverter can be skewed, since only the L-H transition matters
Input capacitance is reduced: smaller logical effort

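Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first has a PDN with inputs In1, In2 and In3 and output Out1; the second has a PDN with inputs In4 and In5 and output Out2. Each has precharge transistor Mp and evaluation transistor Me clocked by Clk, followed by a static inverter]

The inverter ensures that all inputs to the domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -> 1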

Clocked CMOS Logic

It is a combination of:
Static CMOS
A Clk input given to one additional nMOS transistor
A Clk' input given to one additional pMOS transistor

A fan-in of n requires 2n + 2 transistors
When Clk goes high, the logic of the circuit is evaluated because the additional nMOS transistor is on
When Clk goes low, the logic is not evaluated because the additional nMOS transistor is off

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages
More area
More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)

Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use the complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P logic stages with outputs X, G and Z]

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: inverter, Nand gate, Nor gate

BiCMOS Circuits

[Figure: BiCMOS inverter and Nand gate]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter


Page 14: Chp 3- Sub-system Design

NMOS Pass Transistors Require one transistor and one gate signal Transmit 0 well but when Vdd is applied to the

drain the voltage at the source is Vdd-Vtn When switch logic drives gate logic n-type

switches can cause electrical problems 1048708When n-type switch driving a complementary gate

cause the gate to run slower when the switch input = 1

1048708Since pull down current is weaker when a lower gate voltage is applied

1048708The complementary gatersquos pull down will not suck current off the output capacitance as fast as it should be

PMOS Pass Transistors When Vin= Vdd then Vout= Vdd When Vin=0 CL will be discharged

through P-transistor until Vout= Vtp P-device will stop conducting Logic 0 is somewhat degraded through p-device

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)

• Alternate stages of N logic with stages of P logic
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
• P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'
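The three example stages can be checked with a few lines of Python. This sketch only evaluates the Boolean functions of the stages; it does not model clocking or precharge, and the function name `np_cascade` is hypothetical.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages for 0/1 inputs."""
    x = 1 - (a & b)            # Stage 1: X = (A · B)'
    g = (1 - x) | (1 - y)      # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # Stage 3: Z = (F · G + H)'
    return x, g, z


stages_all_high = np_cascade(1, 1, 1, 1, 0)
stages_a_low = np_cascade(0, 1, 1, 1, 0)
```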

BiCMOS Technology

CMOS properties
• Low power
• Slower

BJT properties
• Faster
• High power

Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT

Basic circuits
• Inverter
• Nand gate
• Nor gate

BiCMOS Circuits

Inverter, Nand Gate

Structured Design

Design the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter
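As an illustration of the last item in the list, a Gray code to binary converter can be sketched in software. Each binary bit is the XOR of all Gray bits at or above its position, which in hardware becomes a chain of XOR gates. The function names below are hypothetical.

```python
def gray_to_binary(gray):
    """Convert a Gray-coded integer to plain binary."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift   # fold in the higher-order Gray bits
        shift >>= 1
    return binary


def binary_to_gray(binary):
    """Inverse conversion, used here only to check the converter."""
    return binary ^ (binary >> 1)


example = gray_to_binary(0b1101)
```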

Adders

• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
• Multiplexers - L=11λ, W=7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact - L=22λ, W=10λ
   Buried contact - L=30λ, W=10λ
• ii) CMOS inverter - L=35λ, W=18λ
• Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1 independent of C, G = A · B
• Propagate: Cout = C, P = A ⊕ B
• Kill: Cout = 0 independent of C, K = ~A · ~B
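The three signals can be written directly as bit operations. The sketch below (function names are hypothetical) also shows that exactly one of G, P and K is active for any input pair, and that the carry-out follows Cout = G + P · C.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                  # G = A · B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = ~A · ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry-out of a full adder expressed through G and P."""
    g, p, _ = pgk(a, b)
    return g | (p & c)
```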

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor

Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

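Carry-Ripple Adder

• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1-A4 and B1-B4, sum outputs S1-S4, internal carries C1-C3, carry input Cin and carry output Cout.]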
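A bit-level sketch of the cascade makes the critical path visible: each stage has to wait for the carry of the stage before it. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical, and bit lists are LSB first.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from the LSB to the MSB."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))
```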

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel

The output carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
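The slides give only C1 = G0 + P0C0 explicitly. The sketch below fills in C2 to C4 with the standard look-ahead expansion, which is an assumption about what the missing figure showed rather than a transcription of it. The function name `cla4` is hypothetical.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (4-bit sum, carry out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # Gi = Ai · Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # Pi = Ai xor Bi

    # Every carry depends only on G, P and C0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c4
```

The widening product terms are the price of the parallel carries: each additional bit adds one more term and one more input to the widest gate.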

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks:
• One block with logical 0 Cin
• Another block with logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer
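The selection scheme can be sketched as follows. Each block is added twice, once per assumed carry-in, and the real carry then picks one result. Integer addition stands in for the two block adders, the width is assumed to be a multiple of the block size, and all names are hypothetical.

```python
def add_block(a, b, cin, width):
    """Stand-in for one block adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry select addition over `width` bits in blocks of `block` bits."""
    result, carry = 0, cin
    mask = (1 << block) - 1
    for shift in range(0, width, block):
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        s0, c0 = add_block(a_blk, b_blk, 0, block)   # block with Cin = 0
        s1, c1 = add_block(a_blk, b_blk, 1, block)   # block with Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)   # 2:1 multiplexer
        result |= s << shift
    return result, carry
```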

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
• M = number of blocks in the adder
• P = number of cells in each block
• K1 = delay through one adder cell
• K2 = delay through a multiplexer
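A short worked example of the formula, using made-up delay values (K1 = 1.0 and K2 = 0.5 in arbitrary time units, which are assumptions for illustration only):

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


# 16-bit adder split into M = 4 blocks of P = 4 cells (hypothetical delays).
t_select = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
```

With these values the first block contributes 4 × 1.0 and the three multiplexer stages contribute 3 × 0.5, giving 5.5 time units.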

Carry Skip Adder (CSA)

It is also called a carry bypass adder

It looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages

If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the outgoing carry.]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0

else C3 = output carry of the stage-4 adder
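The bypass rule can be sketched for one block. The block is modelled as a small ripple adder plus a multiplexer controlled by the AND of all propagate signals. Names are hypothetical, and the bit numbering (bit 0 is the LSB of the block) is an assumption.

```python
def carry_skip_block(a, b, cin, width=4):
    """One carry-skip block: returns (sum, carry_out, skip_signal)."""
    skip = 1
    carry = cin
    total = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                       # propagate: Ai != Bi
        skip &= p                         # skip only if every bit propagates
        total |= (p ^ carry) << i
        carry = (ai & bi) | (p & carry)   # rippled carry
    cout = cin if skip else carry         # bypass multiplexer
    return total, cout, skip
```

The bypass never changes the logical result, since the rippled carry equals Cin whenever every bit propagates. It only shortens the path the carry has to travel.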

Delay of CSA

Worst case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
• K1 = delay through one adder cell
• K2 = delay for skipping the carry over a block
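A worked example of the worst-case formula, assuming P is the number of cells per block and M the number of blocks (as in the carry select formula) and using made-up delays K1 = 1.0 and K2 = 0.5 in arbitrary units:

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P-1)*K1 + (M-2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# 16-bit adder with M = 4 blocks of P = 4 cells (hypothetical delays).
t_skip = carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5)
```

Here the carry ripples through the first and last blocks (2 × 3 × 1.0) and skips the two middle blocks (2 × 0.5), for a total of 7.0 time units.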

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
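The shift-and-add scheme can be sketched for unsigned operands as follows. This is a software illustration of the principle, not the circuit from the missing figure, and the function name is hypothetical.

```python
def shift_and_add_multiply(multiplicand, multiplier, n_bits=4):
    """Unsigned multiplication by accumulating shifted partial products."""
    product = 0
    for i in range(n_bits):
        if (multiplier >> i) & 1:            # examine one multiplier bit per step
            product += multiplicand << i     # add the shifted multiplicand
    return product
```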

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2 n
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
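The reduction idea can be sketched with word-level carry-save adders. Three rows are compressed into a sum vector and a carry vector until only two rows remain, and those are combined by a conventional add. This is a simplified software model for unsigned operands, not the slide's circuit, and the names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression: a + b + c == sum_vector + carry_vector."""
    sum_vector = a ^ b ^ c
    carry_vector = ((a & b) | (b & c) | (a & c)) << 1
    return sum_vector, carry_vector


def wallace_multiply(x, y, n_bits=8):
    """Multiply unsigned x and y by tree reduction of the partial products."""
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n_bits)]
    rows += [0] * max(0, 2 - len(rows))      # make sure two rows exist
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(*rows[i:i + 3]))
        leftover = len(rows) % 3
        if leftover:
            reduced.extend(rows[-leftover:])  # rows passed to the next level
        rows = reduced
    return rows[0] + rows[1]                  # final conventional adder
```

No carry propagates inside `carry_save_add`, so each level costs about one full-adder delay regardless of word length.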

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools

• Partition the system sensibly
• Aim for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


PMOS Pass Transistors

• When Vin = Vdd, then Vout = Vdd
• When Vin = 0, CL is discharged through the p-transistor until Vout = |Vtp|, at which point the p-device stops conducting
• Logic 0 is somewhat degraded through the p-device

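Voltage degradation of Pass Transistors

Pass Transistor Ckts

[Figure: pass transistor circuits. An nMOS device passing VDD gives Vs = VDD - Vtn; a pMOS device passing VSS gives Vs = |Vtp|; nMOS pass transistors in series, with their gates at VDD, give VDD - Vtn at the output; an nMOS device whose gate is driven by a degraded VDD - Vtn level gives VDD - 2Vtn.]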
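The degraded levels follow from a simple rule: an nMOS pass transistor cannot pull its output above its gate voltage minus Vtn, and a pMOS pass transistor cannot pull its output below its gate voltage plus |Vtp|. The sketch below applies this rule with the body effect ignored. VDD = 5.0 V and thresholds of 1.0 V are assumed values for illustration, and the function names are hypothetical.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Output of an nMOS pass transistor (body effect ignored)."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Output of a pMOS pass transistor (body effect ignored)."""
    return max(v_in, v_gate + vtp_abs)


VDD, VTN, VTP_ABS = 5.0, 1.0, 1.0   # hypothetical values

strong_zero = nmos_pass(0.0, VDD, VTN)         # nMOS passes 0 without loss
weak_one = nmos_pass(VDD, VDD, VTN)            # VDD - Vtn
series_one = nmos_pass(weak_one, VDD, VTN)     # series devices: still VDD - Vtn
gate_driven = nmos_pass(VDD, weak_one, VTN)    # gate at VDD - Vtn: VDD - 2Vtn
weak_zero = pmos_pass(0.0, 0.0, VTP_ABS)       # |Vtp|
strong_one = pmos_pass(VDD, 0.0, VTP_ABS)      # pMOS passes VDD without loss
```

The loss accumulates only when a degraded output drives the gate of another pass transistor, not when pass transistors are connected in series.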

Complementary Pass Transistor Logic

[Figure: (a) a pass-transistor network and its inverse network, both driven by A, A', B and B', producing F and F'; (b) example gates: AND/NAND (F = AB), OR/NOR (F = A + B) and EXOR/NEXOR (F = A ⊕ B).]

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio between the pull-up and the sum of all pull-down transistors, Zpu / (n·Zpd), must be at least 4:1 to give the correct output voltage level

nMOS Nand gate geometry reveals two factors:
• The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
• Delay also increases in direct proportion to the number of inputs added

NMOS Nor Gate

• Since the pull-down transistors are in parallel in the Nor gate, the pull-up to pull-down ratio is the same for every transistor, so it has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively

• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode

• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS Nand gate has none of the ratio restrictions of the NMOS Nand gate, but its geometry should be kept symmetrical by
   allowing for the extended fall times of the series nmos transistors (due to series resistance)
   keeping the transfer characteristic centred on Vdd/2
• The CMOS Nor gate has series p-transistors, which increase the resistance and delay
• This affects the transfer characteristic and reduces noise immunity
• So the geometries of the nmos and pmos transistors should be changed

CMOS Logic

• Static CMOS logic
• Pseudo NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

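Static CMOS Switch Delay Model

[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates. Each transistor is replaced by a switch with equivalent resistance Rn or Rp driving the load capacitance CL, with an internal node capacitance Cint inside the series stacks.]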

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition
• both inputs go low: delay is 0.69 (Rp/2) CL
• one input goes low: delay is 0.69 Rp CL

High to low transition
• both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch model with parallel pull-up resistances Rp, series pull-down resistances Rn, internal capacitance Cint and load CL.]
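The three NAND2 cases can be put into numbers with a first-order RC model. The resistance and capacitance values below are assumptions chosen only to illustrate the ratios, and Cint is ignored. The function name is hypothetical.

```python
import math


def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays from the switch model (Cint ignored)."""
    return {
        "lh_both_inputs_low": 0.69 * (rp / 2) * cl,   # two pMOS in parallel
        "lh_one_input_low": 0.69 * rp * cl,           # one pMOS conducts
        "hl_both_inputs_high": 0.69 * (2 * rn) * cl,  # two nMOS in series
    }


# Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
```

With Rp = 2·Rn, as assumed here, the single-input rising case and the falling case give the same delay, while the rising case with both pull-ups conducting is twice as fast.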

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps], from 0 to 400 ps, for the input patterns A = B = 1->0; A = 1, B = 1->0; and A = 1->0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0->1            67
A = 1, B = 0->1         64
A = 0->1, B = 1         61
A = B = 1->0            45
A = 1, B = 1->0         80
A = 1->0, B = 1         81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

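Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one

[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN), connected to VDD, and the pull-down network (PDN), with output F.]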

Pseudo NMOS Operation

• The pulldown network of the gate is the same as for a fully complementary gate
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate input is Vdd, both the p-type and n-type transistors are on, and both operate to determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages
• Because of the higher pull-up resistance, delay is greater and hence circuit speed is lower
• More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, which is dynamic CMOS logic

It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases

[Figure: dynamic gate. The PDN (inputs In1, In2, In3) sits between the output node Out, with load CL, and the evaluate transistor Me; the precharge transistor Mp connects Out to VDD. Both Mp and Me are driven by Clk.]

Two phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path
   a fan-in of n requires 2n (n N-type + n P-type) devices
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
   a fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors
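The device counts quoted for the different logic styles can be tabulated for a given fan-in. The static, dynamic and clocked CMOS counts are the ones stated in these slides. The pseudo-NMOS count of n + 1 is inferred from its single pull-up transistor. The function name is hypothetical.

```python
def transistor_count(style, n):
    """Number of transistors for a gate with fan-in n in a given logic style."""
    counts = {
        "static": 2 * n,           # n N-type + n P-type
        "pseudo_nmos": n + 1,      # n N-type + 1 always-on P-type (inferred)
        "dynamic": n + 2,          # n + 1 N-type + 1 P-type
        "clocked": 2 * n + 2,      # static gate plus two clocked devices
    }
    return counts[style]


four_input = {s: transistor_count(s, 4)
              for s in ("static", "pseudo_nmos", "dynamic", "clocked")}
```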

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pulldown transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the clocked n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high impedance state during and after evaluation (PDN off); the state is stored on CL

[Figure: dynamic gate with inputs A, B and C implementing Out = ((A · B) + C)'. During precharge (Clk = 0) Mp is on, Me is off and Out is charged to 1; during evaluation (Clk = 1) Mp is off and Me is on.]

Properties of Dynamic Gates

• Logic function is implemented by the PDN only
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds, due to reduced load capacitance
• Overall power dissipation is usually higher than static CMOS
   no glitching
   higher transition probabilities
   extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
   low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases
• Requires the introduction of a clock signal
• They are also more sensitive to noise
• Their clocking signals also consume power and are difficult to turn off to save power

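Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluate transistor Me driven by Clk. In drives the first stage and Out1 drives the second stage, which produces Out2. The waveforms show Clk, In, Out1 and Out2 against time t, with Out2 losing voltage while Out1 is still above VTn.]

Only 0 -> 1 transitions allowed at inputs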

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period
• Setting all inputs of the second gate to 0 during precharge will fix it
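The race can be reproduced with a coarse discrete-time model in which every gate needs one time step to react. This is a deliberately simplified sketch: real nodes discharge gradually, and here a node is either fully charged or fully discharged. The second function inserts static inverters after each dynamic node, which is one way to hold the second gate's input at 0 during precharge. All names are hypothetical.

```python
def cascade_dynamic(inp, steps=4):
    """Two directly cascaded dynamic inverters; returns (Out1, Out2)."""
    out1 = out2 = 1                          # both nodes precharged to 1
    for _ in range(steps):                   # evaluation, one gate delay per step
        next_out1 = 0 if inp else out1
        next_out2 = 0 if out1 else out2      # still sees the old, high Out1
        out1, out2 = next_out1, next_out2
    return out1, out2


def cascade_with_inverters(inp, steps=4):
    """Same cascade with a static inverter after each dynamic node."""
    node1 = node2 = 1                        # dynamic nodes precharged to 1
    for _ in range(steps):
        buf1 = 1 - node1                     # inverter output is 0 after precharge
        next_node1 = 0 if inp else node1
        next_node2 = 0 if buf1 else node2
        node1, node2 = next_node1, next_node2
    return 1 - node1, 1 - node2


broken = cascade_dynamic(1)          # Out2 is wrongly discharged
fixed_high = cascade_with_inverters(1)
fixed_low = cascade_with_inverters(0)
```

For In = 1 the directly cascaded pair ends with Out2 = 0, although the inverse of Out1 = 0 should be 1. The buffered version gives the expected non-inverting result for both input values.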

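Domino CMOS Logic

[Figure: domino gate. A dynamic stage (PDN with inputs In1, In2, In3, precharge transistor Mp and evaluate transistor Me, both driven by Clk) followed by a static inverter producing Out1. The dynamic node can only make a 1 -> 0 transition, so Out1 can only make a 0 -> 1 transition.]

The combination of dynamic logic and a static inverter is domino logic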

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the clocked n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0


Page 16: Chp 3- Sub-system Design

Voltage degradation of Pass Transistors

Pass Transistor Ckts

VDDVDD

VSS

VDD

VDD

VDD VDD VDD

VDD

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic
• Only non-inverting logic can be implemented
• Smaller area compared to static CMOS
• Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0
• Very high speed
  • the static inverter can be skewed, since only the L-H transition matters
  • input capacitance is reduced, giving a smaller logical effort

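Cascading Domino Logic Gates

[Figure: two cascaded domino gates; the first has inputs In1, In2, In3 and output Out1, the second has inputs In4, In5 and output Out2; each has a PDN, precharge transistor Mp, evaluation transistor Me and a static inverter; transition annotations 1 → 1, 1 → 0, 0 → 0, 0 → 1]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.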

Clocked CMOS Logic

Clocked CMOS logic is a static CMOS gate combined with two additional clocked transistors:
• the Clk input is given to one additional NMOS transistor
• the Clk' input is given to one additional PMOS transistor

A gate with n inputs therefore requires 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is ON.

When Clk goes low, the logic is not evaluated, because the clocked NMOS is OFF.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices

Disadvantages
• More area
• More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
• Alternate stages of N logic with stages of P logic
• N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
• P logic stages use the complement clock, with the P logic tree tied above the output node
• During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
• During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
• Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
• Stage 1 is X = (A · B)'
• Stage 2 is G = X' + Y'
• Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
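The three stage equations can be checked with a few lines of Python. The sketch assumes that stages 1 and 3 are N-logic stages and stage 2 is a P-logic stage, which follows the alternating scheme but is not stated explicitly in the example; the function and dictionary names are made up here.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three-stage example; returns (X, G, Z) as 0/1 values."""
    x = int(not (a and b))           # stage 1: X = (A . B)'
    g = int((not x) or (not y))      # stage 2: G = X' + Y'
    z = int(not ((f and g) or h))    # stage 3: Z = (F . G + H)'
    return x, g, z


# Assumed precharge values: N-logic outputs precharge to 1, P-logic outputs to 0.
PRECHARGE = {"X": 1, "G": 0, "Z": 1}

print(np_cascade(1, 1, 1, 1, 0))
print(np_cascade(0, 0, 1, 1, 0))
```

With A = B = 0 and Y = 1 every output keeps its assumed precharge value, so no node has to switch during evaluation.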

BiCMOS Technology

CMOS properties
• Low power
• Slower

BJT properties
• Faster
• High power

Implementation: CMOS & BJT
• Core logic: CMOS
• Interface: BJT

Basic circuits
• Inverter
• Nand gate
• Nor gate

BiCMOS Circuits

[Figure: BiCMOS inverter and Nand gate circuits]

Structured Design

Design the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers & Demultiplexers
• Encoders & Decoders
• Parity Generator
• Bus Arbitration Logic
• Gray Code to Binary Code Converter

Adders
• 1-bit full adder
• Manchester Carry Chain
• Adder enhancement techniques
  • Carry Select Adders
  • Carry Skip Adders
  • Carry Look-ahead Adders
• 32-bit Adders

1-bit full adder

CMOS Full Adder

[Figure: sum output and carry output circuits]

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
  • Butting contact: L = 22λ, W = 10λ
  • Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
• Generate: Cout = 1, independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0, independent of C
  K = ~A · ~B
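As a quick check of these definitions, the sketch below computes G, P and K for single bits and confirms that the carry out can be written as G + P·C. The function names are illustrative only.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                 # G = A . B
    p = a ^ b                 # P = A xor B
    k = (1 - a) & (1 - b)     # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_signals(a, b))
```

Exactly one of G, P and K is 1 for every input pair, which is what allows the carry chain to be built from switches.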

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester chain adders:
• The carry input is precharged with the clock signal instead of passing through the logic
• The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor
• Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

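Carry-Ripple Adder
• Simplest design: cascade full adders
• Critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1, C2, C3, carry input Cin and carry output Cout]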
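A carry-ripple adder is simply a loop over full adders in which each carry feeds the next stage. The following is a minimal bit-level sketch; the helper names and the LSB-first bit ordering are choices made for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def ripple_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from the LSB to the MSB."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


sums, cout = ripple_add(to_bits(11, 4), to_bits(6, 4))
print(sums, cout)   # sum bits are LSB first
```

The loop-carried variable `carry` is the critical path from Cin to Cout: each iteration has to wait for the previous one.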

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is represented in terms of generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0 · C0

4-bit CMOS CLA
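The slide's expressions for the 4-bit CLA did not survive extraction, apart from C1 = G0 + P0·C0. The sketch below uses the standard textbook expansion of the recurrence C(i+1) = Gi + Pi·Ci, which is an assumption about what the missing figure showed; the function names are illustrative.

```python
def cla4_carries(g, p, c0):
    """All four carries computed directly from G, P and C0 (no rippling)."""
    g0, g1, g2, g3 = g
    p0, p1, p2, p3 = p
    c1 = g0 | (p0 & c0)
    c2 = g1 | (p1 & g0) | (p1 & p0 & c0)
    c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & c0)
    c4 = (g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
          | (p3 & p2 & p1 & p0 & c0))
    return c1, c2, c3, c4


def cla4_add(a, b, c0=0):
    """Add two 4-bit integers; returns (4-bit sum, carry out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]   # Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]   # Pi = Ai xor Bi
    c1, c2, c3, c4 = cla4_carries(g, p, c0)
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return s, c4


print(cla4_add(9, 7))
```

Each carry depends only on the G and P signals and C0, so in hardware all four can be produced in parallel at the cost of wider gates.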

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks:
• one block with a logical 0 Cin
• another block with a logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
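The selection scheme can be sketched at word level as follows. The block width, the operand width and the function names are assumptions for illustration, and each block sum is computed with ordinary integer addition rather than gate by gate.

```python
def add_block(a, b, cin, width):
    """Add two width-bit values; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Carry-select addition; width is assumed to be a multiple of block."""
    result, carry = 0, cin
    mask = (1 << block) - 1
    for shift in range(0, width, block):
        xa, xb = (a >> shift) & mask, (b >> shift) & mask
        s0, c0 = add_block(xa, xb, 0, block)          # block assuming Cin = 0
        s1, c1 = add_block(xa, xb, 1, block)          # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)    # 2:1 multiplexer
        result |= s << shift
    return result, carry


print(carry_select_add(200, 100))
```

Both candidate results exist before the real carry arrives, so only the multiplexer delay is added per block.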

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer

Carry Skip Adder (CSA)

It is also called a Carry Bypass Adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

It is typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

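Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the carry output]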

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder
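The skip condition can be sketched in Python as below. The stage width, the operand width and the `skipped` counter are assumptions added for illustration. Because the ripple carry already equals the stage carry-in when every bit propagates, the bypass changes only the delay, not the result.

```python
def carry_skip_add(a, b, cin=0, width=8, block=4):
    """Carry-skip addition; returns (sum, carry_out, number of skipped stages).

    width is assumed to be a multiple of block.
    """
    result, carry, skipped = 0, cin, 0
    for shift in range(0, width, block):
        stage_cin = carry
        all_propagate = True
        for i in range(shift, shift + block):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi                      # propagate: Ai != Bi
            all_propagate = all_propagate and p == 1
            result |= (p ^ carry) << i       # sum bit
            carry = (ai & bi) | (p & carry)  # ripple inside the stage
        if all_propagate:
            carry = stage_cin                # bypass gate: Cout = Cin
            skipped += 1
    return result, carry, skipped


print(carry_skip_add(0b1111, 0b0000, cin=1, width=4, block=4))
```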

Delay of CSA

The worst case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
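The two delay expressions, for the carry select adder and the carry skip adder, are easy to compare numerically. In the sketch below the function names and the sample values of P, M, K1 and K2 are hypothetical and chosen only to exercise the formulas.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(p, m, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 16-bit adder: M = 4 blocks of P = 4 cells, delays in arbitrary units.
print(carry_select_delay(p=4, m=4, k1=1.0, k2=0.5))
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))
```

Note that K2 means a different thing in each formula, so the two results are only comparable if the multiplexer and skip delays are taken to be similar.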

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
• Serial-parallel Multiplier
• Braun Array
• 2's Complement multiplication using Baugh-Wooley method
• Pipelined Multiplier Array
• Modified Booth's Algorithm
• Wallace Tree Multipliers
• Recursive decomposition of Multiplication
• Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
• Completion time is proportional to log2 n
• Optimized column adder tree
• Combines all partial products into 2 vectors (carry and sum)
• Carry and sum outputs are combined using a conventional adder
• Compresses the number of stages of partial products
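The reduction can be sketched at word level with 3:2 compressors (carry-save adders). This is a simplified model rather than a gate-level Wallace tree: it works on whole partial-product rows, assumes unsigned operands, and the names `csa` and `wallace_multiply` are made up for the example.

```python
def csa(x, y, z):
    """3:2 compressor on whole words: returns (sum vector, carry vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a, b, n=8):
    """Multiply unsigned n-bit values; returns (product, reduction levels)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]  # partial products
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):          # every 3 rows become 2
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])              # leftover rows pass through
        rows = nxt
        levels += 1
    return sum(rows), levels                 # final conventional adder


print(wallace_multiply(200, 123))
```

Each level reduces the number of rows by roughly a third, which is where the logarithmic completion time comes from.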

Wallace Tree Multiplier Structure

Sequential Logic
• D flip-flop
• Two Phase Clocking
• Dynamic Shift Register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems: ALU

Complex systems are designed with a top-down approach with the help of CAD tools:
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
• Generate and verify each section of the design
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 17: Chp 3- Sub-system Design

Pass Transistor Ckts

[Figure: pass-transistor circuits annotated with node voltages VDD, VSS, Vs = VDD - Vtn, Vs = |Vtp| and VDD - 2Vtn]
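The voltage labels in the figure follow from the threshold drop of a pass transistor. The sketch below uses a simple first-order model, ignoring body effect, with hypothetical values VDD = 5 V and Vtn = |Vtp| = 1 V; the function names are illustrative.

```python
def nmos_pass(v_in, v_gate, vtn):
    """Highest level an nMOS pass transistor can pass: limited to Vgate - Vtn."""
    return min(v_in, v_gate - vtn)


def pmos_pass(v_in, v_gate, vtp_abs):
    """Lowest level a pMOS pass transistor can pass: limited to Vgate + |Vtp|."""
    return max(v_in, v_gate + vtp_abs)


VDD, VSS, VT = 5, 0, 1    # hypothetical values, in volts

one_stage = nmos_pass(VDD, VDD, VT)           # VDD - Vtn
series = nmos_pass(one_stage, VDD, VT)        # still VDD - Vtn
gate_driven = nmos_pass(VDD, one_stage, VT)   # VDD - 2Vtn
weak_zero = pmos_pass(VSS, VSS, VT)           # |Vtp|
print(one_stage, series, gate_driven, weak_zero)
```

In this model, transistors in series with all gates at VDD lose only one threshold, whereas driving a gate from a degraded output loses a further threshold per stage.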

Complementary Pass Transistor Logic

[Figure: (a) two pass-transistor networks driven by the inputs A, B and their complements, producing F and its inverse; (b) example gates: AND/NAND (F = AB), OR/NOR (F = A + B), EXOR/NEXOR (F = A ⊕ B)]

Advantages and Disadvantages

Advantages
• Fewer transistors
• No static power consumption

Disadvantages
• Output voltage degrades
• Not an ideal switch, due to series resistance
• Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
• Fewer transistors compared to CMOS
• No static power consumption
• Efficient building of complex gates

Disadvantages
• Not an ideal switch, due to series resistance
• Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to all n pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to give the correct output voltage level.

The nMOS Nand gate geometry reveals two factors:
• The area of the Nand gate is greater than the area of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor
• The delay also increases in direct proportion to the number of inputs added

NMOS Nor Gate
• The pull-down transistors are in parallel in the Nor gate, i.e. the pull-down ratio is the same for all transistors, so it has the same characteristics as the inverter
• The area occupied is reasonable, since there is no increase in the length of the pull-up transistor
• So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND respectively

• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode

• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS Nand gate has none of the ratio restrictions of the NMOS Nand gate, but the geometry has to be kept symmetrical by
  • allowing extended fall times for the series nMOS transistors (because of their series resistance)
  • keeping the transfer characteristic symmetrical about Vdd/2
• The CMOS Nor gate has series p-transistors, which increase the resistance and delay
• This affects the transfer characteristic and reduces noise immunity
• So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic
• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

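Static CMOS Switch Delay Model

[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates, with inputs A and B, pull-up resistances Rp, pull-down resistances Rn, equivalent resistance Req, internal capacitance Cint and load capacitance CL]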

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

Low to high transition
• both inputs go low: delay is 0.69 (Rp/2) CL
• one input goes low: delay is 0.69 Rp CL

High to low transition
• both inputs go high: delay is 0.69 (2Rn) CL

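[Figure: NAND2 switch-level model with inputs A and B, resistances Rp and Rn, internal capacitance Cint and load capacitance CL]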
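The three NAND2 delay expressions can be evaluated directly. In the sketch below the function name and the values of Rp, Rn and CL are hypothetical; only the 0.69·R·C form comes from the slide.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds for the three input patterns."""
    return {
        "low_to_high_both_inputs_low": 0.69 * (rp / 2) * cl,  # two pMOS in parallel
        "low_to_high_one_input_low": 0.69 * rp * cl,          # one pMOS conducts
        "high_to_low_both_inputs_high": 0.69 * (2 * rn) * cl, # two nMOS in series
    }


# Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
for pattern, delay in nand2_delays(10e3, 5e3, 100e-15).items():
    print(f"{pattern}: {delay * 1e12:.0f} ps")
```

The low-to-high delay doubles when only one input falls, because a single pull-up transistor then has to charge CL on its own.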

Delay Dependence on Input Patterns

[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1 → 0; A = 1, B = 1 → 0; A = 1 → 0, B = 1]

Input data pattern       Delay (psec)
A = B = 0 → 1            67
A = 1, B = 0 → 1         64
A = 0 → 1, B = 1         61
A = B = 1 → 0            45
A = 1, B = 1 → 0         80
A = 1 → 0, B = 1         81

NMOS = 0.5 μm/0.25 μm, PMOS = 0.75 μm/0.25 μm, CL = 100 fF

Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the number of pull-up transistors from N to 1.

[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the PUN and the PDN between VDD and the output F]

Pseudo NMOS Operation
• The pulldown network of the gate is the same as for a fully complementary gate
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
• The p-type transistor is used as a resistor
• When the gate input is 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
• When the gate input is Vdd, both the p-type and n-type transistors are on and both contribute to determining the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages
• Because of the higher pull-up resistance, the delay is larger and hence the circuits are slower
• More static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.

It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me driven by Clk, output Out and load capacitance CL]

Two phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low resistance path
  • a fan-in of n requires 2n (n N-type + n P-type) devices

• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes
  • a fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors
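The device counts quoted for the different logic styles can be tabulated with a small helper. The function name and the family keys are made up for this sketch; the counts are the ones given in these slides (2n for static CMOS, n + 1 for pseudo-NMOS with its single pull-up, n + 2 for dynamic, 2n + 2 for clocked CMOS).

```python
def transistor_count(n, family):
    """Number of transistors for an n-input gate in the given logic family."""
    counts = {
        "static": 2 * n,          # n N-type + n P-type
        "pseudo_nmos": n + 1,     # n N-type + 1 P-type load
        "dynamic": n + 2,         # n + 1 N-type + 1 P-type
        "clocked": 2 * n + 2,     # static gate + 2 clocked devices
    }
    return counts[family]


for family in ("static", "pseudo_nmos", "dynamic", "clocked"):
    print(family, transistor_count(4, family))
```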

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance
• The pulldown transistor controlled by the clock keeps the precharge node from being drained
• The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off
• The evaluation phase begins, i.e. the n-type pulldown turns on
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output
• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
• Inputs to the gate can make at most one transition during evaluation
• The output can be in the high impedance state during and after evaluation (PDN off); the state is stored on CL

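[Figure: example dynamic gate with inputs A, B and C, PDN function ((A·B)+C), precharge transistor Mp, evaluation transistor Me and output Out; during precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on]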
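The behaviour of this example gate over a sequence of clock phases can be traced with a short function. The sketch models logic levels only; the function name and the tuple format of `steps` are assumptions made for the example.

```python
def dynamic_gate_trace(steps):
    """Trace Out for the example gate whose PDN implements (A.B) + C.

    steps is an iterable of (clk, a, b, c) tuples; returns the list of Out values.
    """
    out = 1
    trace = []
    for clk, a, b, c in steps:
        if clk == 0:
            out = 1          # precharge: Mp on, Me off
        elif (a and b) or c:
            out = 0          # evaluate: PDN conducts and discharges Out
        # otherwise Out floats and keeps the value stored on CL
        trace.append(out)
    return trace


steps = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 1, 0), (1, 0, 0, 0), (0, 0, 0, 0)]
print(dynamic_gate_trace(steps))
```

In the fourth step the inputs have returned to 0, but Out stays discharged until the next precharge, as stated in the conditions on the output.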

Properties of Dynamic Gates
• Logic function is implemented by the PDN only
• Full swing outputs (VOL = GND and VOH = VDD)
• Non-ratioed: sizing of the devices does not affect the logic levels
• Faster switching speeds due to reduced load capacitance
• Overall power dissipation usually higher than static CMOS
  • no glitching
  • higher transition probabilities
  • extra load on Clk


Page 18: Chp 3- Sub-system Design

Pass Transistor Ckts

VDD

VDD V

s = V

DD-V

tn

VSS

Vs = |V

tp|

VDD

VDD

-Vtn V

DD-V

tn

VDD

-Vtn

VDD

VDD

VDD

VDD

VDD

VDD

-Vtn

VDD

-2Vtn

Complementary Pass Transistor Logic

A

B

A

B

B B B B

A

B

A

B

F=AB

F=AB

F=A+B

F=A+B

B B

A

A

A

A

F=AYacute

F=AYacute

ORNOR EXORNEXORANDNAND

F

F

Pass-Transistor

Network

Pass-TransistorNetwork

AABB

AABB

Inverse

(a)

(b)

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates - Problem
- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.
- The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
- Setting all inputs of the second gate to 0 during precharge fixes the problem.
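A minimal unit-delay sketch of the cascading problem follows. It is a logic-level abstraction under stated assumptions: each stage is a dynamic inverter, every stage updates once per unit delay, and the `buffered` option (a hypothetical parameter) places an inverting buffer between the stages so that the second gate sees 0 during precharge.

```python
def evaluate_chain(data_in, buffered):
    """Return the final (node1, node2) storage values of two cascaded dynamic inverters.

    A precharged node can only fall (1 -> 0) during evaluation.
    """
    node1, node2 = 1, 1  # both storage nodes precharged to logic 1
    for _ in range(3):   # a few unit delays are enough for this chain to settle
        # Input seen by stage 2: the raw node, or its inverse through a buffer
        stage2_in = (1 - node1) if buffered else node1
        # Both stages evaluate simultaneously, using values from the previous step
        next1 = 0 if data_in else node1
        next2 = 0 if stage2_in else node2
        node1, node2 = next1, next2
    return node1, node2


# Direct cascade with In = 1: Out1 falls, but Out2 has already been discharged
print(evaluate_chain(1, buffered=False))
# With the inverting buffer, stage 2 only ever sees a 0 -> 1 transition
print(evaluate_chain(1, buffered=True))
print(evaluate_chain(0, buffered=True))
```

In the direct cascade the second node ends at 0, although the ideal result for two inverters with In = 1 would leave it at 1. With the buffer, the second node discharges only after the first has, which is the behaviour the last bullet above asks for.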

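Domino CMOS Logic
[Figure: domino gate: a dynamic stage (precharge transistor Mp, PDN with inputs In1, In2, In3, evaluate transistor Me) followed by a static inverter driving Out1; during evaluation the dynamic node makes a 1 -> 1 or 1 -> 0 transition]
Domino logic is the combination of a dynamic logic stage and a static inverter.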

Domino CMOS Logic operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)
- When CLK goes high, precharging stops: the p-type pull-up turns off.
- The evaluation phase begins: the clocked n-type pull-down turns on.
- The input signals must rise monotonically. If an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
- If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
- If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
- Only non-inverting logic can be implemented.
- Smaller area compared to static CMOS.
- Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
- Very high speed:
  - the static inverter can be skewed, since only the L-H transition matters
  - input capacitance is reduced, giving a smaller logical effort

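Cascading Domino Logic Gates
[Figure: two cascaded domino gates (inputs In1 to In3 and In4, In5); the dynamic nodes make 1 -> 1 or 1 -> 0 transitions, and the inverter outputs Out1 and Out2 make 0 -> 0 or 0 -> 1 transitions]
- The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
- Hence the only possible transition during evaluation is 0 -> 1.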

Clocked CMOS Logic
- It combines a static CMOS gate with two clocked transistors:
  - the Clk input drives one additional NMOS transistor
  - the Clk' input drives one additional PMOS transistor
- A gate with n inputs therefore needs 2n + 2 transistors.
- When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on.
- When Clk goes low, the logic is not evaluated, because the clocked NMOS is off.

Advantages and Disadvantages
Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices.
Disadvantages
- More area
- More complexity

N-P CMOS Logic
- An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
- Stages of N logic alternate with stages of P logic.
- N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
- P logic stages use the complement clock, with the P logic tree tied above the output node.
- During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
- During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
- Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic
Example logic:
- Stage 1: X = (A · B)'
- Stage 2: G = X' + Y'
- Stage 3: Z = (F · G + H)'
[Figure: three cascaded N-P stages with outputs X, G and Z]
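The three stage equations can be checked with a short sketch. It only evaluates the Boolean functions listed above; the function name `np_cascade` is hypothetical, and Y, F and H are treated as free inputs because the slide does not say where they come from.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages for 0/1 inputs."""
    x = 1 - (a & b)            # stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)      # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # stage 3: Z = (F . G + H)'
    return x, g, z


print(np_cascade(1, 1, 1, 1, 0))
print(np_cascade(0, 1, 1, 1, 0))
```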

BiCMOS Technology
- CMOS properties: low power, slower
- BJT properties: faster, high power
- Implementation: CMOS and BJT
  - core logic: CMOS
  - interface: BJT
- Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits
[Figure: BiCMOS inverter and NAND gate]

Structured Design
Designing the digital block or standard cell at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic
- 5-way selector
- Multiplexers and Demultiplexers
- Encoders and Decoders
- Parity Generator
- Bus Arbitration Logic
- Gray Code to Binary Code Converter

Adders
- 1-bit full adder
- Manchester Carry Chain
- Adder Enhancement Techniques
  - Carry Select Adders
  - Carry Skip Adders
  - Carry Look-ahead Adders
- 32-bit Adders

1-bit full adder

CMOS Full Adder
[Figure: CMOS full adder with sum output and carry output]

Complementary Static CMOS Adder

Standard cell dimensions for Adder
The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts
So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout 2 of full adder

Propagate and Generate Signal
For a full adder, define what happens to carries:
- Generate: Cout = 1 independent of C; G = A · B
- Propagate: Cout = C; P = A ⊕ B
- Kill: Cout = 0 independent of C; K = ~A · ~B
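The generate, propagate and kill definitions can be checked against an ordinary full adder with a short sketch. The function names `pgk` and `carry_out` are hypothetical; the carry expression Cout = G + P·C is the usual consequence of the three cases.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position, inputs 0 or 1."""
    g = a & b              # G = A . B
    p = a ^ b              # P = A xor B
    k = (1 - a) & (1 - b)  # K = ~A . ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry-out from the G and P signals: Cout = G + P.C"""
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, pgk(a, b), [carry_out(a, b, c) for c in (0, 1)])
```

For every input pair exactly one of G, P and K is 1, which is why a single chain of switches per bit is enough to pass, set or clear the carry.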

Manchester Carry Chain
- Built from switch logic using propagate, generate and kill.
- In Manchester chain adders, the carry input is precharged with the clock signal instead of passing through the logic.
- The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
- Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

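Carry-Ripple Adder
- Simplest design: cascade full adders.
- The critical path goes from Cin to Cout.
- Design the full adder to have a fast carry delay.
[Figure: 4-bit carry-ripple adder with inputs A1 to A4 and B1 to B4, sums S1 to S4, carry-in Cin, internal carries C1 to C3 and carry-out Cout]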

Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
- The output carry is represented in terms of generate and propagate signals.
- The expressions for the 4-bit CLA are: [equation image not reproduced]

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
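As a sketch of how the carries can be produced in parallel, the code below expands the recurrence C(i+1) = G(i) + P(i)·C(i), whose first step is the C1 = G0 + P0C0 shown above. The expanded expressions are the standard textbook form, not taken from the slide (its equations were an image), and the function name `cla4` is hypothetical.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (4-bit sum, carry-out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # G_i = A_i . B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # P_i = A_i xor B_i

    # Every carry depends only on G, P and C0, so all can be formed in parallel
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # S_i = P_i xor C_i
    return s, c4


print(cla4(9, 7))      # 9 + 7 = 16: sum bits 0, carry-out 1
print(cla4(5, 2, 1))   # 5 + 2 + 1 = 8
```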

Carry Select Adder
- It is also referred to as the conditional sum adder.
- The adder is divided into two blocks:
  - one block computed with a logical 0 Cin
  - another block computed with a logical 1 Cin
- S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder
The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
- M = number of blocks in the adder
- P = number of cells in each block
- K1 = delay through one adder cell
- K2 = delay through the multiplexer
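The selection structure can be sketched behaviourally as below. This is an illustrative model, assuming the word width is a multiple of the block size; `ripple_block` and `carry_select_add` are hypothetical names, and the block adder uses Python's `+` in place of a gate-level ripple adder.

```python
def ripple_block(a, b, cin, width):
    """Behavioural stand-in for a width-bit adder block: returns (sum, carry-out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    """Add two width-bit numbers block by block, selecting precomputed results."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        # Both candidate results are computed without waiting for the real carry
        s0, c0 = ripple_block(xa, xb, 0, block)
        s1, c1 = ripple_block(xa, xb, 1, block)
        # 2:1 multiplexer: the actual carry-in picks one candidate
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry


print(carry_select_add(200, 100))  # 300 = 256 + 44
print(carry_select_add(15, 1))
```

Only the multiplexer sits on the carry path between blocks, which is where the (M-1)·K2 term in the delay expression comes from.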

Carry Skip Adder (CSA)
- It is also called the Carry Bypass Adder.
- It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
- Typically organized into m-bit stages.
- If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic
[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1 forming the skip signal]

2-Stage Carry Skip Adder

4-stage carry skip adder
If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA
The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
- K1 = delay through one adder cell
- K2 = delay for skipping the carry over a block
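The two delay expressions can be compared numerically with a short sketch. Assumptions: P and M mean cells per block and number of blocks in both formulas, the unit delays K1 = K2 = 1 are arbitrary illustrative values, and the ripple-carry figure of n·K1 is a simple baseline that does not come from the slides.

```python
def carry_select_delay(m, p, k1, k2):
    """T = P*K1 + (M-1)*K2, with K2 the multiplexer delay."""
    return p * k1 + (m - 1) * k2


def carry_skip_delay(m, p, k1, k2):
    """Worst case T = 2(P-1)*K1 + (M-2)*K2, with K2 the block skip delay."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed baseline: the carry ripples through all n cells."""
    return n * k1


# 16-bit adder split into M = 4 blocks of P = 4 cells, unit delays
m, p, k1, k2 = 4, 4, 1.0, 1.0
print(ripple_delay(m * p, k1))           # 16.0
print(carry_select_delay(m, p, k1, k2))  # 7.0
print(carry_skip_delay(m, p, k1, k2))    # 8.0
```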

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK
- Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
- Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
- Serial-parallel Multiplier
- Braun Array
- 2's Complement multiplication using the Baugh-Wooley method
- Pipelined Multiplier Array
- Modified Booth's Algorithm
- Wallace Tree Multipliers
- Recursive decomposition of Multiplication
- Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
- Completion time is proportional to log2 n.
- Optimized column adder tree.
- Combines all partial products into 2 vectors (carry and sum).
- Carry and sum outputs are combined using a conventional adder.
- Compresses the number of stages of partial products.

Wallace Tree Multiplier Structure
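A word-level sketch of the reduction is given below. It is a simplification under stated assumptions: whole partial-product rows are compressed three at a time with carry-save (3:2) adders rather than column by column, unsigned operands are assumed, and the names `csa` and `wallace_multiply` are hypothetical.

```python
def csa(a, b, c):
    """Carry-save (3:2) compression: three rows in, sum and carry rows out."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1


def wallace_multiply(x, y, n=8):
    """Multiply two unsigned n-bit numbers; returns (product, reduction levels)."""
    # One partial-product row per multiplier bit
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(*rows[i:i + 3]))  # each group of 3 rows becomes 2
        nxt.extend(rows[full:])              # leftover rows pass through
        rows = nxt
        levels += 1
    # The final two vectors (sum and carry) go to a conventional adder
    return sum(rows), levels


print(wallace_multiply(13, 11))    # (143, 4)
print(wallace_multiply(255, 255))  # (65025, 4)
```

Each level cuts the row count by roughly a third, so the number of levels grows logarithmically with n: 4 levels for 8 rows and 6 levels for 16 rows in this model.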

Sequential Logic
- D flip-flop
- Two Phase Clocking
- Dynamic Shift Register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU
- Complex systems are designed with a top-down approach, with the help of CAD tools.
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
- Generate and verify each section of the design.
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Complementary Pass Transistor Logic
[Figure: (a) general structure, in which the inputs A, B and their inverses drive two pass-transistor networks that produce F and its inverse; (b) example gates: AND/NAND (F = AB and its inverse), OR/NOR (F = A+B and its inverse), EXOR/NEXOR (F = A⊕B and its inverse)]

Advantages and Disadvantages
Advantages
- Fewer transistors
- No static power consumption
Disadvantages
- Output voltage degrades
- Not an ideal switch, due to series resistance
- Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages
Advantages
- Fewer transistors compared to CMOS
- No static power consumption
- Efficient building of complex gates
Disadvantages
- Not an ideal switch, due to series resistance
- Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter
The following parameters are calculated for different sizes of nMOS inverter:
- Zpu/Zpd ratio
- Pull-up and pull-down resistance
- Power dissipation
- Standard unit of capacitance
- Gate delay

Sizing of NMOS Nand Gate
The ratio of the pull-up to all n pull-down transistors in series, Zpu : n·Zpd, must be at least 4:1 to produce a correct output voltage level.
The nMOS NAND gate geometry reveals two factors:
- The area of the NAND gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
- Delay also increases, in direct proportion to the number of inputs added.

NMOS Nor Gate
- The pull-down transistors are in parallel in the NOR gate, so the pull-up to pull-down ratio is the same for every transistor.
- It therefore has the same characteristics as the inverter.
- The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
- So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates
- High noise margins: VOH and VOL are at VDD and GND respectively.
- No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
- Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics
- The CMOS NAND gate has no ratio restriction of the kind the nMOS NAND gate has, but the geometry must be kept symmetric by:
  - allowing for the extended fall times of the series nMOS transistors (because of their series resistance)
  - keeping the transfer characteristic centred about Vdd/2
- The CMOS NOR gate has series p-transistors, which increase the resistance and delay.
- This affects the transfer characteristic and reduces noise immunity.
- So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic
- Static CMOS Logic
- Pseudo NMOS Logic
- Dynamic CMOS Logic
- Domino CMOS Logic
- Clocked CMOS Logic
- n-p CMOS Logic

Static CMOS Switch Delay Model
[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates, with switch on-resistances Rp and Rn (Req for a single switch), load capacitance CL and internal node capacitance Cint]

Input Pattern Effects on Delay
Delay is dependent on the pattern of inputs.
- Low-to-high transition
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
- High-to-low transition
  - both inputs go high: delay is 0.69 (2Rn) CL
[Figure: NAND2 switch model with Rp, Rn, CL and Cint]
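The three first-order delay expressions can be evaluated with a short sketch. The resistance values used here (Rp = 2 kΩ, Rn = 1 kΩ) are hypothetical and chosen only for illustration; the function name `nand2_delays` is also hypothetical, and the internal capacitance Cint is ignored, as it is in the expressions above.

```python
RC_FACTOR = 0.69  # ln(2), from the 50% point of a first-order RC response


def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds for the three input patterns."""
    return {
        "rise, both inputs go low": RC_FACTOR * (rp / 2) * cl,  # two PMOS in parallel
        "rise, one input goes low": RC_FACTOR * rp * cl,        # one PMOS pulls up
        "fall, both inputs go high": RC_FACTOR * (2 * rn) * cl, # two NMOS in series
    }


# Hypothetical device resistances with a 100 fF load
for pattern, t in nand2_delays(rp=2e3, rn=1e3, cl=100e-15).items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

With these values the rising transition is twice as fast when both inputs fall as when only one does, which is the pattern dependence the slide describes.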

Delay Dependence on Input Patterns
[Figure: output voltage [V] versus time [ps] for the input patterns A = B = 1->0; A = 1, B = 1->0; A = 1->0, B = 1]

Input Data Pattern     Delay (psec)
A = B = 0->1           67
A = 1, B = 0->1        64
A = 0->1, B = 1        61
A = B = 1->0           45
A = 1, B = 1->0        80
A = 1->0, B = 1        81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

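Pseudo NMOS Logic
Pseudo-nMOS logic is obtained from static CMOS logic by reducing the pull-up network from N input-driven transistors to a single load transistor.
[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the PUN and the PDN between VDD and the output F]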

Pseudo NMOS Operation
- The pull-down network of the gate is the same as for a fully complementary gate.
- The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
- The p-type transistor is used as a resistor.
- When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
- When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and both act together to determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC
[Figure: pseudo-nMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages
Advantages
- The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.
Disadvantages
- Because of the higher pull-up resistance, delay is greater and hence circuit speed is lower.
- More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
- The static power dissipation of pseudo-nMOS logic motivates an alternative logic style: dynamic CMOS logic.
- It avoids static power dissipation and adds a clock input that defines precharge and conditional evaluation phases.
[Figure: dynamic gate with precharge transistor Mp, PDN with inputs In1, In2, In3, evaluate transistor Me, output Out and load capacitance CL]
Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)

Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - A fan-in of n requires 2n devices (n N-type + n P-type).
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).

Dynamic CMOS Logic operation: Precharge
- When CLK goes low, the p-type transistor starts charging the precharge capacitance.
- The pull-down transistor controlled by the clock keeps the precharge node from being drained.
- The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.


Page 20: Chp 3- Sub-system Design

Advantages and Disadvantages

Advantages Less no of Transistors No Static Power Consumption

Disadvantages Output voltage degrades Not an ideal switch due to series resistance Delay of series pass transistors is large

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates

Logic function is implemented by the PDN only

Full swing outputs (VOL = GND and VOH = VDD)

Non-ratioed: sizing of the devices does not affect the logic levels

Faster switching speeds due to reduced load capacitance

Overall power dissipation usually higher than static CMOS
    no glitching
    higher transition probabilities
    extra load on Clk

PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn
    low noise margin (NML)

Advantages and Disadvantages

Advantages
    low area
    higher speed than static complementary gates

Disadvantages
    Precharged gates introduce functional complexity because they must be operated in two distinct phases
    Requires introduction of a clock signal
    They are also more sensitive to noise
    Their clocking signals also consume power and are difficult to turn off to save power

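Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me driven by Clk; In drives the first stage and Out1 drives the second. Waveforms of Clk, In, Out1 and Out2 against time t, marking VTn on Out1 and the voltage drop on Out2.]

Only 0 -> 1 transitions allowed at inputs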

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge

The PDN of the second dynamic inverter turns off when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge fixes the problem

Domino CMOS Logic

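[Figure: domino gate: inputs In1, In2 and In3 drive the PDN of a dynamic stage with precharge transistor Mp and evaluate transistor Me clocked by Clk; a static inverter follows the dynamic node and drives Out1.]

Domino logic is the combination of a dynamic logic stage and a static inverter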

Domino CMOS Logic operation

Precharge Phase (same as dynamic logic)

Evaluate Phase (modification to dynamic logic)

When CLK goes high, precharging stops, i.e. the p-type pullup turns off

The evaluation phase begins, i.e. the clocked n-type pulldown turns on

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1

If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0

Properties of Domino Logic

Only non-inverting logic can be implemented

Smaller area compared to static CMOS

Free of glitches, because each dynamic node can only make a logic 1 to logic 0 transition during evaluation

Very high speed
    the static inverter can be skewed, since only its L-H output transition is critical
    input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

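[Figure: two cascaded domino gates. The first (inputs In1, In2, In3) produces Out1 through its static inverter; Out1, In4 and In5 feed the PDN of the second, which produces Out2. Each stage has its own Mp and Me clocked by Clk.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -> 1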
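A minimal sketch of this cascading rule follows. It assumes, purely for illustration, that each PDN is a series (AND) connection of its inputs; the function names `domino_stage` and `evaluate_two_stage` are hypothetical.

```python
def domino_stage(pdn_conducts) -> int:
    """Output of one domino stage at the end of evaluation.

    The dynamic node starts at 1 (precharged); the static inverter therefore
    starts at 0 and can only make a 0 -> 1 transition.
    """
    dynamic_node = 0 if pdn_conducts else 1
    return 1 - dynamic_node


def evaluate_two_stage(in1, in2, in3, in4, in5):
    """Return (outputs after precharge, outputs after evaluation)."""
    after_precharge = (0, 0)  # every domino output is 0 once precharge completes
    out1 = domino_stage(in1 and in2 and in3)    # assumed PDN: In1 AND In2 AND In3
    out2 = domino_stage(out1 and in4 and in5)   # assumed PDN: Out1 AND In4 AND In5
    return after_precharge, (out1, out2)


print(evaluate_two_stage(1, 1, 1, 1, 1))
print(evaluate_two_stage(1, 1, 0, 1, 1))
```

Because the second stage sees Out1 = 0 until the first stage has actually evaluated, it cannot discharge prematurely, which is the failure seen with directly cascaded dynamic gates.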

Clocked CMOS Logic

It is a static CMOS gate combined with two clocked transistors
    Clk input is given to one additional NMOS
    Clk' input is given to another additional PMOS

An n-input gate in this logic needs 2n + 2 transistors

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on

When Clk goes low, the logic is not evaluated because the clocked NMOS is off

Advantages and Disadvantages

Advantages
    Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices

Disadvantages
    More area
    More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic)

Stages of N logic alternate with stages of P logic

N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg

P logic stages use the complement clock, with the P logic tree tied above the output node

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:

Stage 1 is X = (A · B)'

Stage 2 is G = X' + Y'

Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
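The three stage equations can be checked with a short truth-value sketch. It only evaluates the Boolean functions; it does not model clocks or precharge, and the function name `np_example` is hypothetical. The signals Y, F and H are treated as free inputs because the slides do not say where they come from.

```python
def np_example(a, b, y, f, h):
    """Evaluate the three-stage example for 0/1 inputs and return (X, G, Z)."""
    x = 1 - (a & b)          # stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)    # stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)    # stage 3: Z = (F . G + H)'
    return x, g, z


print(np_example(1, 1, 1, 1, 0))
print(np_example(0, 1, 1, 1, 0))
```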

BiCMOS Technology

CMOS properties
    Low power
    Slower

BJT properties
    Faster
    High power

Implementation: CMOS & BJT
    Core logic: CMOS
    Interface: BJT

Basic circuits
    Inverter
    Nand gate
    Nor gate

BiCMOS Circuits

Inverter, Nand Gate

Structured Design

The digital block or standard cell is designed at the following levels
    Logic level
    Circuit level
    Stick diagram
    Layout

Examples: Combinational Logic
    5-way selector
    Multiplexers & Demultiplexers
    Encoders & Decoders
    Parity Generator
    Bus Arbitration Logic
    Gray Code to Binary Code Converter

Adders
    1-bit full adder
    Manchester Carry Chain
    Adder Enhancement Techniques
        Carry Select Adders
        Carry Skip Adders
        Carry Look-ahead adders
    32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (8:1 and 4:1 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal contacts

So for the Adder standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries

Generate: Cout = 1 independent of C
    G = A • B

Propagate: Cout = C
    P = A ⊕ B

Kill: Cout = 0 independent of C
    K = ~A • ~B
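The three definitions can be checked exhaustively. The sketch below (function names `pgk` and `carry_out` are hypothetical) confirms that exactly one of generate, propagate and kill holds for any input pair, and that Cout = G + P · C matches the arithmetic carry of a full adder.

```python
from itertools import product


def pgk(a, b):
    """Return (propagate, generate, kill) for one bit position."""
    g = a & b                # generate: Cout = 1 regardless of C
    p = a ^ b                # propagate: Cout = C
    k = (1 - a) & (1 - b)    # kill: Cout = 0 regardless of C
    return p, g, k


def carry_out(a, b, c):
    p, g, _ = pgk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    p, g, k = pgk(a, b)
    assert p + g + k == 1                          # the three cases are mutually exclusive
    assert carry_out(a, b, c) == (a + b + c) // 2  # carry of the 1-bit sum
```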

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic

The carry path is gated by the Propagate signal (P = A^B) with a single n-type pass transistor

Generate signal (G = A • B)

4-stage Dynamic Manchester Chain
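At the logical level, each stage of the chain either forces the carry (generate or kill) or passes the previous carry through the pass transistor (propagate). The sketch below models only these logical carry values, not the precharge polarity or timing of the real circuit; the function name `manchester_chain` is hypothetical.

```python
def manchester_chain(a_bits, b_bits, c_in=0):
    """Return the carry out of each stage; bit lists are LSB first."""
    carries = []
    c = c_in
    for a, b in zip(a_bits, b_bits):
        if a & b:
            c = 1          # generate switch forces the carry node to 1
        elif not (a | b):
            c = 0          # kill switch forces the carry node to 0
        # otherwise propagate: the pass transistor connects the previous carry
        carries.append(c)
    return carries


# 5 + 3 on four stages, LSB first: A = 1,0,1,0 and B = 1,1,0,0
print(manchester_chain([1, 0, 1, 0], [1, 1, 0, 0]))
```

In this example the carry generated in stage 0 is propagated through stages 1 and 2 and killed in stage 3.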

Carry-Ripple Adder

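Simplest design: cascade full adders

Critical path goes from Cin to Cout

Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder: four full adders with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1..C3, carry input Cin and carry output Cout]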
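A bit-level sketch of the cascade makes the critical path visible: each stage must wait for the carry of the stage before it. The function names and the default width of 4 are choices made for this example only.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a, b, cin=0, width=4):
    """Add two width-bit integers by chaining full adders; returns (sum, carry_out)."""
    result = 0
    c = cin
    for i in range(width):
        # The carry c produced by stage i-1 feeds stage i: this is the ripple path.
        s, c = full_adder((a >> i) & 1, (b >> i) & 1, c)
        result |= s << i
    return result, c


print(ripple_add(5, 3))
print(ripple_add(9, 8))
```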

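Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel

The output carry is expressed in terms of the generate and propagate signals

The expressions for the 4-bit CLA are

C1 = G0 + P0 C0

C2 = G1 + P1 G0 + P1 P0 C0

C3 = G2 + P2 G1 + P2 P1 G0 + P2 P1 P0 C0

C4 = G3 + P3 G2 + P3 P2 G1 + P3 P2 P1 G0 + P3 P2 P1 P0 C0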

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
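The look-ahead idea can be sketched by expanding C1 = G0 + P0 C0 recursively so that every carry depends only on the G and P signals and C0. The function name `cla4` is hypothetical; the expanded expressions are the standard ones obtained from that recursion.

```python
def cla4(a, b, c0=0):
    """4-bit carry look-ahead addition; returns (4-bit sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # generate, Gi = Ai . Bi
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # propagate, Pi = Ai xor Bi

    # Every carry is a two-level function of G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))  # Si = Pi xor Ci
    return total, c4


print(cla4(9, 8))
```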

Carry Select Adder

It is also referred to as the conditional sum adder

The adder is divided into two blocks
    One block with logical 0 Cin
    Another block with logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2

where
    M = number of blocks in the adder
    P = number of cells in each block
    K1 = delay through one adder cell
    K2 = delay through the multiplexer
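The selection scheme and the delay formula can be sketched together. This is an illustration only: the block adders are modelled with integer addition rather than at gate level, and all function names, the block width and the delay values in the usage lines are assumptions made for the example.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


def block_add(a, b, cin, width):
    """Add one block; returns (block sum, block carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, block_width=4, blocks=2):
    """Carry select addition; returns (sum, carry_out)."""
    mask = (1 << block_width) - 1
    result, c = 0, cin
    for i in range(blocks):
        shift = i * block_width
        xa, xb = (a >> shift) & mask, (b >> shift) & mask
        s0, c0 = block_add(xa, xb, 0, block_width)   # result assuming Cin = 0
        s1, c1 = block_add(xa, xb, 1, block_width)   # result assuming Cin = 1
        s, c = (s1, c1) if c else (s0, c0)           # 2:1 multiplexer driven by the actual carry
        result |= s << shift
    return result, c


print(carry_select_add(200, 100))            # 8-bit example: two 4-bit blocks
print(carry_select_delay(4, 2, 1.0, 0.5))    # illustrative unit delays
```

Both candidate results of a block are available before its carry input arrives, so each extra block adds only one multiplexer delay K2.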

Carry Skip Adder (CSA)

It is also called the Carry Bypass Adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages

If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) for bits i and i+1 with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1; the propagate signals pi and pi+1 are combined into the skip signal]

2-Stage Carry Skip Adder

4-stage carry skip adder

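If (P0 and P1 and P2 and P3 = 1) then C3 = C0

else C3 = output carry of the stage-4 adder

Delay of CSA

The worst-case delay T is given by T = 2(P - 1)K1 + (M - 2)K2

where
    P = number of cells in each block and M = number of blocks (as for the carry select adder)
    K1 = delay through one adder cell
    K2 = delay for skipping the carry over a block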
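A minimal sketch of one skip block and of the delay formula follows. The function names, the 4-bit block width and the unit delays used in the usage lines are assumptions for illustration.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P - 1)K1 + (M - 2)K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def carry_skip_block(a, b, cin, width=4):
    """One skip block; returns (block sum, carry_out, skip_taken)."""
    props = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    s, c = 0, cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        s |= (ai ^ bi ^ c) << i
        c = (ai & bi) | (c & (ai ^ bi))   # ripple carry inside the block
    skip = all(props)                      # P0 and P1 and P2 and P3
    cout = cin if skip else c              # bypass path versus rippled carry
    return s, cout, skip


print(carry_skip_block(0b1010, 0b0101, 1))   # every bit propagates: carry is bypassed
print(carry_skip_block(3, 1, 0))             # a generate occurs: carry comes from the ripple
print(carry_skip_delay(4, 4, 1, 1))
```

When every bit propagates, the rippled carry and the bypassed carry are logically identical; the bypass only makes the result available sooner.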

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
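The shift-and-add structure can be sketched in a few lines: for each set bit of the multiplier, a shifted copy of the multiplicand is added to the running product. The function name and the default 4-bit width are assumptions for this example.

```python
def shift_add_multiply(x, y, width=4):
    """Unsigned multiply of x by a width-bit multiplier y using shift and add."""
    product = 0
    for i in range(width):
        if (y >> i) & 1:          # examine multiplier bit i
            product += x << i     # add the multiplicand shifted by i places
    return product


print(shift_add_multiply(13, 11))
```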

Multipliers
    Serial-parallel Multiplier
    Braun Array
    2's Complement multiplication using Baugh-Wooley method
    Pipelined Multiplier Array
    Modified Booth's Algorithm
    Wallace Tree Multipliers
    Recursive decomposition of Multiplication
    Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay

Completion time is proportional to log2 n

Optimized column adder tree

Combines all partial products into 2 vectors (carry and sum)

Carry and sum outputs are combined using a conventional adder

Compresses the number of stages of partial products
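The reduction to two vectors can be sketched with word-level carry-save adders (3:2 compressors). This is a simplified model, not a gate-level Wallace tree: rows are grouped three at a time per level, and the final conventional adder is modelled by ordinary integer addition. The function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, width=4):
    """Return (product, number of carry-save levels used)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]  # partial products
    levels = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        leftover = len(rows) % 3
        if leftover:
            nxt.extend(rows[-leftover:])   # rows that did not fit a group of three
        rows = nxt
        levels += 1
    return sum(rows), levels               # final add of the sum and carry vectors


print(wallace_multiply(13, 11))
print(wallace_multiply(255, 255, width=8))
```

The number of levels grows slowly with the number of partial products (2 levels for 4 rows, 4 levels for 8 rows), which is the source of the logarithmic completion time.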

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems

Generate and verify each section of the design

Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 21: Chp 3- Sub-system Design

Transmission gates

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages
    Fewer transistors compared to CMOS
    No static power consumption
    Efficient building of complex gates

Disadvantages
    Not an ideal switch due to series resistance
    Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter
    Zpu/Zpd ratio
    Pull-up and pull-down resistance
    Power dissipation
    Standard unit of capacitance
    Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to the sum of all pull-down transistors (Zpu : nZpd) must be at least 4:1 to give the correct output voltage level

The nMOS Nand gate geometry reveals two factors

The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in length of the pull-up transistor

The delay also increases in direct proportion to the number of inputs added
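The ratio rule can be turned into a one-line sizing check. The function name and the use of dimensionless L/W ratios for Zpu and Zpd are assumptions for this sketch.

```python
def min_pullup_impedance(n_inputs: int, z_pd: float, ratio: float = 4.0) -> float:
    """Smallest Zpu that satisfies Zpu : (n * Zpd) >= ratio : 1."""
    return ratio * n_inputs * z_pd


for n in (1, 2, 3):
    print(n, min_pullup_impedance(n, z_pd=1.0))
```

With Zpd = 1 for each pull-down, the required Zpu grows from 4 for the inverter to 12 for a 3-input Nand, which is the increase in pull-up length noted above.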

NMOS Nor Gate

Since the pull-down transistors are in parallel in the Nor gate, the pull-up to pull-down ratio is the same for every pull-down transistor

So it has the same characteristics as the inverter

The area occupied is reasonable, since there is no increase in the length of the pull-up transistor

So the Nor gate is preferable to the Nand gate

CMOS Logic

Properties of CMOS Gates

• High noise margins
    VOH and VOL are at VDD and GND respectively

• No static power consumption
    There never exists a direct path between VDD and VSS (GND) in steady-state mode

• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS Nand gate has no ratio restriction like the nMOS Nand gate, but the geometry must be kept symmetrical by
    allowing extended fall times for the series nMOS transistors (because of their series resistance)
    keeping the transfer characteristic centred on Vdd/2

The CMOS Nor gate has series p-transistors, which increase the resistance and delay

This affects the transfer characteristics and reduces noise immunity

So the geometries of the nMOS and pMOS transistors should be changed

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

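[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates. Each transistor is replaced by a switch with on-resistance Rn or Rp (equivalent resistance Req), controlled by input A or B; CL is the load capacitance and Cint the internal node capacitance.]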

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition
    both inputs go low: delay is 0.69 (Rp/2) CL
    one input goes low: delay is 0.69 Rp CL

High to low transition
    both inputs go high: delay is 0.69 (2Rn) CL
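The three cases can be evaluated numerically with the switch model. In the sketch below the function name is hypothetical and the resistance values in the usage line (10 kΩ for both Rn and Rp) are illustrative assumptions; only CL = 100 fF is taken from the slides. Internal node capacitance is ignored.

```python
def nand2_delays(rn: float, rp: float, cl: float) -> dict:
    """First-order NAND2 propagation delays in seconds (0.69 * R * C model)."""
    return {
        # two parallel pull-up devices conduct: Rp / 2
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # a single pull-up device conducts: Rp
        "low_to_high_one_input_falls": 0.69 * rp * cl,
        # two series pull-down devices conduct: 2 * Rn
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }


for case, t in nand2_delays(rn=10e3, rp=10e3, cl=100e-15).items():
    print(f"{case}: {t * 1e12:.0f} ps")
```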

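[Figure: switch-level model of the NAND2 gate: parallel pull-up resistances Rp (inputs A and B), series pull-down resistances Rn (inputs A and B), internal capacitance Cint and load CL]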

Delay Dependence on Input Patterns

[Figure: voltage (V) against time (ps), 0 to 400 ps, for the input cases A = B = 1 -> 0; A = 1, B = 1 -> 0; and A = 1 -> 0, B = 1]

Input Data Pattern        Delay (psec)
A = B = 0 -> 1            67
A = 1, B = 0 -> 1         64
A = 0 -> 1, B = 1         61
A = B = 1 -> 0            45
A = 1, B = 1 -> 0         80
A = 1 -> 0, B = 1         81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

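Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one

[Figure: static CMOS gate: inputs In1, In2, ..., InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); the output is F]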

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate

The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on

The p-type transistor is used as a resistor

When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD

When the gate input is at Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage
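Treating both conducting devices as fixed resistors gives a first-order feel for the ratioed output level and the static current. This is a deliberate simplification (real transistors are not linear resistors), and the function names and component values below are illustrative assumptions, not figures from the slides.

```python
def pseudo_nmos_vol(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Output low voltage from a resistive divider between pull-up and pull-down."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def pseudo_nmos_static_power(vdd: float, r_pullup: float, r_pulldown: float) -> float:
    """Static power drawn while the output is low (direct VDD to VSS path)."""
    return vdd ** 2 / (r_pullup + r_pulldown)


print(pseudo_nmos_vol(2.5, 40e3, 10e3))            # weaker pull-up gives a lower VOL
print(pseudo_nmos_static_power(2.5, 40e3, 10e3))
```

A weaker (more resistive) pull-up lowers VOL and the static power but slows the rising output, which is the trade-off inherent in ratioed logic.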

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] against Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity

Disadvantages

The higher pull-up resistance increases delay, so circuits are slower

Higher static power dissipation, due to the conduction path between VDD and VSS

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic

It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases

[Figure: dynamic gate: inputs In1, In2 and In3 drive the PDN; precharge transistor Mp and evaluate transistor Me are driven by Clk; the output Out is stored on CL]

Two-phase operation
    Precharge (CLK = 0)
    Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path
    a fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
    requires only n + 2 (n + 1 N-type + 1 P-type) transistors


Page 22: Chp 3- Sub-system Design

Transmission gates (contd)

Logic with Transmission gates

Logic with Transmission gates

Advantages and Disadvantages

Advantages Less no of Transistors compared to CMOS No Static Power Consumption Efficient building of complex gates

Disadvantages Not an ideal switch due to series resistance Delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages

Advantages
- Low area
- Higher speed than static complementary gates

Disadvantages
- Precharged gates introduce functional complexity because they must be operated in two distinct phases.
- They require the introduction of a clock signal.
- They are more sensitive to noise.
- Their clocking signals consume power and are difficult to turn off to save power.

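Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with Mp and Me clocked by Clk, with input In and outputs Out1 and Out2; waveforms of Clk, In, Out1 and Out2 against time, with the VTn level marked.]

Only 0 -> 1 transitions are allowed at the inputs.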

Cascading Dynamic Gates - Problem
- Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
- The PDN of the second dynamic inverter turns off only when Out1 has fallen to VTn.
- Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
- Setting all inputs of the second gate to 0 during precharge fixes the problem.
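A crude time-stepped sketch can make the charge loss on Out2 concrete. The supply voltage, threshold and per-step discharge below are assumed values chosen for illustration, and both nodes are assumed to discharge at the same linear rate, which a real circuit would not do.

```python
VDD, VTN = 2.5, 0.5   # assumed supply and n-type threshold voltages
STEP = 0.25           # assumed voltage drop per time step while a PDN conducts


def cascade(steps=20):
    """Evaluate two cascaded dynamic inverters with In = 1; return (out1, out2)."""
    out1 = out2 = VDD                      # both nodes were precharged
    for _ in range(steps):
        second_pdn_on = out1 > VTN         # second PDN conducts while Out1 is above VTn
        out1 = max(0.0, out1 - STEP)       # first gate discharges towards GND
        if second_pdn_on:
            out2 = max(0.0, out2 - STEP)   # second gate loses charge it should have kept
    return out1, out2


out1, out2 = cascade()
```

Under these assumptions Out1 ends at 0 V as intended, but Out2 has also lost most of its charge before its PDN turns off, although logically it should have stayed at VDD.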

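Domino CMOS Logic

[Figure: domino gate: a dynamic stage (Mp, PDN with inputs In1, In2, In3, Me, clocked by Clk) followed by a static inverter driving Out1.]

Domino logic is the combination of a dynamic logic stage and a static inverter.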

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
- When CLK goes high, precharging stops: the p-type pullup turns off.
- The evaluation phase begins: the n-type pulldown turns on.
- The input signals must rise monotonically. If an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
- If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
- If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic
- Only non-inverting logic can be implemented.
- Smaller area compared to static CMOS.
- Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
- Very high speed: the static inverter can be skewed, since only its low-to-high output transition matters.
- Input capacitance is reduced, giving a smaller logical effort.

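Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first has a PDN with inputs In1, In2, In3 and drives Out1 through a static inverter; the second has a PDN with inputs In4, In5 and drives Out2.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.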
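A minimal sketch of two cascaded domino stages shows the property stated above. The PDN functions (a 3-input AND in the first stage, `out1 AND (in4 OR in5)` in the second) and all names are assumptions made for the example; the slides do not specify the logic inside each PDN.

```python
def domino_stage(pdn, inputs, clk):
    """Return (storage_node, output) of one domino stage for a given clock level."""
    if clk == 0:
        node = 1                           # precharge: node pulled up to VDD
    else:
        node = 0 if pdn(*inputs) else 1    # evaluate: conditional discharge
    return node, 1 - node                  # output is taken after the static inverter


def chain(in1, in2, in3, in4, in5, clk):
    """Two cascaded stages; the first output feeds the second PDN."""
    _, out1 = domino_stage(lambda a, b, c: a and b and c, (in1, in2, in3), clk)
    _, out2 = domino_stage(lambda o, a, b: o and (a or b), (out1, in4, in5), clk)
    return out1, out2


precharged = chain(1, 1, 1, 1, 0, clk=0)   # every stage output is 0 after precharge
evaluated = chain(1, 1, 1, 1, 0, clk=1)    # outputs can then only rise, like falling dominoes
```

Because every stage output is 0 at the end of precharge, a downstream PDN cannot conduct until its upstream stage has actually evaluated to 1.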

Clocked CMOS Logic
- It is a static CMOS gate combined with two clocked transistors: Clk drives one additional NMOS transistor and Clk' drives one additional PMOS transistor.
- A gate with fan-in n therefore requires 2n + 2 transistors.
- When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
- When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
- Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
- More area
- More complexity

N-P CMOS Logic
- An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).
- Stages of N logic alternate with stages of P logic.
- N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
- P logic stages use the complement clock, with the P logic tree tied above the output node.
- During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
- During evaluate, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd.
- Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
- Stage 1 is X = (A · B)'
- Stage 2 is G = X' + Y'
- Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]

BiCMOS Technology

CMOS properties
- Low power
- Slower

BJT properties
- Faster
- High power

Implementation (CMOS & BJT)
- Core logic: CMOS
- Interface: BJT

Basic circuits
- Inverter
- Nand gate
- Nor gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS Nand gate.]

Structured Design

A digital block or standard cell is designed at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic
- 5-way selector
- Multiplexers & demultiplexers
- Encoders & decoders
- Parity generator
- Bus arbitration logic
- Gray code to binary code converter

Adders
- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
- 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) NMOS inverter (8:1 and 4:1 ratio)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signals

For a full adder, define what happens to carries:
- Generate: Cout = 1 independent of C; G = A · B
- Propagate: Cout = C; P = A ⊕ B
- Kill: Cout = 0 independent of C; K = ~A · ~B
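The three signals can be checked with a few lines of code. The function names `gpk` and `carry_out` are hypothetical; the expressions are the ones defined above, with single-bit integer inputs.

```python
def gpk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b               # G = A · B
    p = a ^ b               # P = A xor B
    k = (1 - a) & (1 - b)   # K = ~A · ~B
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    g, p, _ = gpk(a, b)
    return g | (p & c)
```

For any input pair exactly one of G, P and K is 1, which is what lets a carry chain be built from mutually exclusive switches.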

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders
- The carry input is precharged with the clock signal instead of passing through the logic.
- The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.
- Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

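Carry-Ripple Adder
- Simplest design: cascade full adders.
- The critical path goes from Cin to Cout.
- Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, carry-in Cin, internal carries C1..C3, sums S1..S4 and carry-out Cout.]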
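The cascade of full adders can be sketched as follows. The function names and the little-endian bit-list representation are choices made for this example, not something taken from the slides.

```python
def full_adder(a, b, c):
    """One-bit full adder: return (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two little-endian bit lists; return (sum_bits, carry_out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each stage waits for the previous carry
        sums.append(s)
    return sums, carry


# 5 + 3 = 8, bits listed least significant first
result = ripple_add([1, 0, 1, 0], [1, 1, 0, 0])
```

The loop-carried `carry` variable is the critical path: every stage depends on the carry of the stage before it.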

Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
- The output carry is expressed in terms of the generate and propagate signals.

The expressions for a 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0

4-bit CMOS CLA
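The slides give only C1 = G0 + P0·C0 explicitly. The sketch below assumes the usual expansion of the recurrence C(i+1) = Gi + Pi·Ci for the remaining carries, so that every carry depends only on the G and P signals and on C0; the function names are hypothetical.

```python
def cla_carries(g, p, c0):
    """Return [c1, c2, c3, c4] from 4-element generate/propagate lists."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla_add4(a, b, c0=0):
    """Add two 4-bit integers; return (sum, carry_out)."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    c = [c0] + cla_carries(g, p, c0)          # all carries computed in parallel
    s = sum((p[i] ^ c[i]) << i for i in range(4))
    return s, c[4]
```

None of the carry expressions refers to another carry, which is what removes the ripple dependency at the cost of wider gates.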

Carry Select Adder
- It is also referred to as a conditional sum adder.
- The adder is divided into two blocks: one block computes with Cin = logical 0, the other with Cin = logical 1.
- S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
- M = number of blocks in the adder
- P = number of cells in each block
- K1 = delay through one adder cell
- K2 = delay through the multiplexer
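A behavioural sketch of the select step and of the delay formula follows. The function names and the integer-based block adder are assumptions made for the example; a real design would build each block from adder cells.

```python
def block_add(a, b, cin, width):
    """Add two width-bit blocks; return (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n_blocks, block_bits, cin=0):
    """Carry-select addition of two (n_blocks * block_bits)-bit integers."""
    mask = (1 << block_bits) - 1
    result, carry = 0, cin
    for i in range(n_blocks):
        a_blk = (a >> (i * block_bits)) & mask
        b_blk = (b >> (i * block_bits)) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, block_bits)   # precomputed for Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, block_bits)   # precomputed for Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)        # 2:1 multiplexer
        result |= s << (i * block_bits)
    return result, carry


def carry_select_delay(m, p, k1, k2):
    """T = P*K1 + (M-1)*K2 with the symbols defined above."""
    return p * k1 + (m - 1) * k2


total = carry_select_add(200, 100, n_blocks=2, block_bits=4)   # 8-bit example
delay = carry_select_delay(m=2, p=4, k1=1.0, k2=0.5)
```

Both candidate results of every block are ready at the same time, so after the first block only one multiplexer delay is added per block.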

Carry Skip Adder (CSA)
- It is also called a carry bypass adder.
- It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
- It is typically organized into m-bit stages.
- If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal that selects the carry output.]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder
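The bypass condition can be sketched for a single block. The function name `skip_block` and the little-endian bit lists are assumptions made for this example.

```python
def skip_block(a_bits, b_bits, cin):
    """One carry-skip block; return (sum_bits, carry_out, skipped)."""
    carry, sums, all_propagate = cin, [], 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)       # ripple carry inside the block
        all_propagate &= p                  # P0 and P1 and ... over the block
    cout = cin if all_propagate else carry  # bypass multiplexer
    return sums, cout, bool(all_propagate)


skipped = skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1)   # every bit propagates
rippled = skip_block([1, 1, 0, 0], [1, 0, 0, 0], 0)   # 3 + 1, no skip
```

When every bit propagates, the rippled carry and the carry input are equal anyway, so the bypass changes only the delay, not the result.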

Delay of CSA

The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
- K1 = delay through one adder cell
- K2 = delay for skipping the carry over a block
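Plugging numbers into the formula gives a feel for the saving. The ripple delay of n·K1 used for comparison is an assumption (one cell delay per bit), as are the example values for M, P, K1 and K2.

```python
def ripple_delay(n_bits, k1):
    """Assumed ripple-adder delay: one adder-cell delay per bit."""
    return n_bits * k1


def carry_skip_delay(m, p, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2, with M blocks of P cells each."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


# Hypothetical 32-bit adder: 8 blocks of 4 cells, K1 = 1.0, K2 = 0.5 (arbitrary units)
skip = carry_skip_delay(m=8, p=4, k1=1.0, k2=0.5)
ripple = ripple_delay(32, k1=1.0)
```

With these illustrative values the carry-skip delay is 9 units against 32 for the ripple adder.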

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK
- Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
- Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
- Completion time is proportional to log2(n).
- It is an optimized column adder tree.
- It combines all partial products into 2 vectors (carry and sum).
- The carry and sum outputs are combined using a conventional adder.
- It compresses the number of stages of partial products.
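The reduction to two vectors can be sketched with carry-save (3:2) adders operating on whole partial-product rows. This is a word-level simplification written for illustration, with hypothetical function names; a hardware Wallace tree applies the compressors column by column.

```python
def csa(x, y, z):
    """3:2 carry-save adder on integers: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, n_bits):
    """Multiply by reducing partial products with CSA stages; return (product, stages)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]   # partial products
    stages = 0
    while len(rows) > 2:
        nxt = []
        for i in range(0, len(rows) - 2, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))        # three rows become two
        nxt.extend(rows[len(rows) - len(rows) % 3:])                  # rows left over this stage
        rows, stages = nxt, stages + 1
    return sum(rows), stages   # final carry and sum vectors go to a conventional adder


product, stages = wallace_multiply(13, 11, 4)
```

Each stage shrinks the number of rows by roughly a third, which is where the logarithmic number of stages comes from.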

Wallace Tree Multiplier Structure

Sequential Logic

- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU
- Complex systems are designed with a top-down approach, with the help of CAD tools.
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
- Generate and verify each section of the design.
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Logic with Transmission gates

Advantages and Disadvantages

Advantages
- Fewer transistors than static CMOS
- No static power consumption
- Efficient building of complex gates

Disadvantages
- Not an ideal switch, due to series resistance
- The delay of series transmission gates is large

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
- Zpu : Zpd ratio
- Pull-up and pull-down resistance
- Power dissipation
- Standard unit of capacitance
- Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to all the series pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce the correct output voltage level.

The nMOS Nand gate geometry reveals two factors:
- The area of the Nand gate is greater than that of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.
- The delay also increases in direct proportion to the number of inputs added.
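The sizing rule can be turned into a one-line calculation. The function name is hypothetical, and Z is assumed to mean the length-to-width ratio of a transistor channel, normalised so that a pull-down device has Zpd = 1.

```python
def min_pullup_z(n_inputs, z_pd=1.0, ratio=4.0):
    """Smallest pull-up Z that keeps Zpu : n*Zpd at the required ratio."""
    return ratio * n_inputs * z_pd


inverter_zpu = min_pullup_z(1)   # a single pull-down, as in the inverter
nand3_zpu = min_pullup_z(3)      # three series pull-downs
```

The required pull-up Z grows linearly with the number of inputs, which is the increase in pull-up length, and hence in area, noted above.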

NMOS Nor Gate
- The pull-down transistors are in parallel in a Nor gate, so the pull-up to pull-down ratio is the same for every input.
- It therefore has the same characteristics as an inverter.
- The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
- So the Nor gate is preferable to the Nand gate.

CMOS Logic

Properties of CMOS Gates

- High noise margins: VOH and VOL are at VDD and GND respectively.
- No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
- Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

- The CMOS Nand gate does not have the ratio restrictions of the nMOS Nand gate, but its geometry should be kept symmetrical by:
  - allowing for the extended fall times of the series nMOS transistors (due to series resistance)
  - keeping the transfer characteristic centred on Vdd/2
- The CMOS Nor gate has series p-transistors, which increase the resistance and the delay.
- This affects the transfer characteristic and reduces noise immunity.
- So the geometries of the nMOS and pMOS transistors should be adjusted.

CMOS Logic

- Static CMOS logic
- Pseudo NMOS logic
- Dynamic CMOS logic
- Domino CMOS logic
- Clocked CMOS logic
- n-p CMOS logic

Static CMOS Switch Delay Model

[Figure: switch-level RC models of the NAND2, INV and NOR2 gates, built from the on-resistances Rp and Rn (equivalent resistance Req), the internal capacitance Cint and the load capacitance CL.]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.
- Low-to-high transition:
  - both inputs go low: delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
- High-to-low transition:
  - both inputs go high: delay is 0.69 (2Rn) CL
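The three cases can be evaluated directly. The function name and the example resistance and capacitance values are assumptions chosen for illustration; they are not taken from the slides.

```python
def nand2_delays(rp, rn, cl):
    """First-order delays (seconds) of a 2-input NAND for each input pattern."""
    return {
        "rise_both_inputs_fall": 0.69 * (rp / 2) * cl,   # two PMOS in parallel
        "rise_one_input_falls": 0.69 * rp * cl,          # a single PMOS pulls up
        "fall_both_inputs_rise": 0.69 * (2 * rn) * cl,   # two NMOS in series
    }


# Hypothetical values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
```

The model ignores the internal node capacitance Cint, so it captures only the first-order dependence on the input pattern.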

[Figure: NAND2 switch-level model with Rp, Rn, Cint and CL.]

Delay Dependence on Input Patterns

[Figure: output voltage (V) versus time (ps) for the input patterns A = B = 1->0; A = 1, B = 1->0; and A = 1->0, B = 1.]

Input data pattern      Delay (psec)
A = B = 0->1            67
A = 1, B = 0->1         64
A = 0->1, B = 1         61
A = B = 1->0            45
A = 1, B = 1->0         80
A = 1->0, B = 1         81

NMOS = 0.5 μm / 0.25 μm, PMOS = 0.75 μm / 0.25 μm, CL = 100 fF

Pseudo NMOS Logic

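Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to 1.

[Figure: static CMOS gate with a PUN and a PDN, each driven by inputs In1, In2, …, InN, connected between VDD and ground, with output F.]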

Pseudo NMOS Operation
- The pulldown network of the gate is the same as for a fully complementary gate.
- The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
- The p-type transistor is used as a resistor.
- When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
- When the gate inputs are at Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: Vout (V) versus Vin (V) for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages
- The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.

Disadvantages
- The higher pull-up resistance increases the delay, so the circuits are slower.
- Higher static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
- The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
- It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate with precharge transistor Mp, PDN with inputs In1, In2, In3, evaluate transistor Me, clock Clk, output Out and load capacitance CL.]

Two-phase operation:
- Precharge (CLK = 0)
- Evaluate (CLK = 1)

Dynamic CMOS Logic operation
- In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.
  - A fan-in of n requires 2n devices (n N-type + n P-type).
- Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
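The device counts can be tabulated for a given fan-in. The function name is hypothetical, and the pseudo-NMOS entry (n + 1) is inferred from its description as the same pulldown network plus a single p-type pull-up.

```python
def transistor_count(style, n):
    """Number of transistors for a gate of fan-in n in each logic style."""
    counts = {
        "static": 2 * n,        # n N-type + n P-type
        "pseudo_nmos": n + 1,   # n N-type + 1 always-on P-type (inferred)
        "dynamic": n + 2,       # n + 1 N-type + 1 P-type
    }
    return counts[style]


fan_in_4 = {style: transistor_count(style, 4)
            for style in ("static", "pseudo_nmos", "dynamic")}
```

For a fan-in of 4 this gives 8, 5 and 6 devices respectively, so the dynamic gate costs one clocked transistor more than pseudo-NMOS in exchange for removing the static current path.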



Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier A Wallace tree is an implementation of an adder

tree designed for minimum propagation delay Completion time is proportional to log2n Optimized column adder tree Combines all partial products into 2 vectors

(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Advantages and Disadvantages

Advantages
• Fewer transistors than the equivalent CMOS gate
• No static power consumption
• Complex gates can be built efficiently

Disadvantages
• Not an ideal switch, because of its series resistance
• Delay through transmission gates connected in series is large

Gate Logic

Sizing of NMOS Inverter

The following parameters are calculated for different sizes of nMOS inverter:
• Zpu/Zpd ratio
• Pull-up and pull-down resistance
• Power dissipation
• Standard unit of capacitance
• Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up impedance to the total impedance of the n series pull-down transistors, Zpu : n·Zpd, must be at least 4:1 to give a correct output voltage level.

The nMOS Nand gate geometry reveals two factors:
• The area of a Nand gate is greater than that of an inverter, because there are more pull-down transistors and the pull-up transistor must be correspondingly longer.
• The delay also increases, in direct proportion to the number of inputs added.
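The ratio rule can be checked numerically. The following is a minimal sketch, assuming every pull-down transistor has the same impedance Zpd; the function name `min_pullup_impedance` is illustrative and not part of the slides.

```python
def min_pullup_impedance(n_inputs: int, z_pd: float = 1.0, ratio: float = 4.0) -> float:
    """Smallest Zpu that satisfies Zpu : (n * Zpd) >= ratio : 1."""
    if n_inputs < 1:
        raise ValueError("a gate needs at least one input")
    return ratio * n_inputs * z_pd


for n in (1, 2, 3, 4):
    # Zpu is expressed in units of one pull-down impedance Zpd.
    print(f"{n}-input gate: Zpu >= {min_pullup_impedance(n):.0f} Zpd")
```

The required pull-up impedance grows linearly with the number of series inputs, which is why the pull-up transistor, and with it the gate area, gets longer.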

NMOS Nor Gate

• The pull-down transistors are in parallel in a Nor gate, so the pull-up to pull-down ratio is the same for every input.
• The gate therefore has the same characteristics as an inverter.
• The area occupied is reasonable, since the pull-up transistor does not need to be lengthened.
• The Nor gate is therefore preferable to the Nand gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND respectively
• No static power consumption
  There is never a direct path between VDD and VSS (GND) in steady state
• Comparable rise and fall times (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

• The CMOS Nand gate has none of the ratio restrictions of the nMOS Nand gate, but the geometry should still be kept symmetrical by
  - allowing for the extended fall time caused by the resistance of the series n-type transistors
  - keeping the transfer characteristic centred on Vdd/2
• The CMOS Nor gate has series p-type transistors, which increase the resistance and the delay.
• This affects the transfer characteristic and reduces noise immunity, so the geometries of the n-type and p-type transistors should be adjusted.

CMOS Logic

• Static CMOS Logic
• Pseudo NMOS Logic
• Dynamic CMOS Logic
• Domino CMOS Logic
• Clocked CMOS Logic
• n-p CMOS Logic

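Static CMOS Switch Delay Model

[Figure: switch-level RC models of the INV, NAND2 and NOR2 gates. Each transistor is replaced by a switch in series with its on-resistance (Rp or Rn, Req in general), driving the load capacitance CL and, in the two-input gates, the internal node capacitance Cint.]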

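Input Pattern Effects on Delay

Delay depends on the pattern of the inputs (two-input Nand gate).
• Low-to-high transition
  - both inputs go low: the two p-type transistors conduct in parallel, delay is 0.69 (Rp/2) CL
  - one input goes low: delay is 0.69 Rp CL
• High-to-low transition
  - both inputs go high: the two n-type transistors conduct in series, delay is 0.69 (2 Rn) CL

[Figure: NAND2 switch model with Rp, Rn, CL and Cint]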
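The three first-order delay expressions for the two-input Nand gate can be evaluated directly. This is a rough sketch that ignores the internal capacitance Cint; the resistance and capacitance values are hypothetical and chosen only for illustration.

```python
def nand2_delays(r_p: float, r_n: float, c_load: float) -> dict:
    """First-order NAND2 delays in seconds for each input pattern."""
    k = 0.69  # ln(2), rounded as on the slide
    return {
        "low_to_high_both_inputs_fall": k * (r_p / 2) * c_load,  # parallel pull-ups
        "low_to_high_one_input_falls": k * r_p * c_load,         # single pull-up
        "high_to_low_both_inputs_rise": k * (2 * r_n) * c_load,  # series pull-downs
    }


# Hypothetical device values: Rp = 10 kOhm, Rn = 5 kOhm, CL = 100 fF.
delays = nand2_delays(r_p=10e3, r_n=5e3, c_load=100e-15)
for pattern, t in delays.items():
    print(f"{pattern}: {t * 1e12:.0f} ps")
```

The fastest low-to-high transition is the one in which both pull-up transistors share the charging current.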

Delay Dependence on Input Patterns

[Figure: output voltage (V, -0.5 to 3) against time (ps, 0 to 400) for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

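Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to one.

[Figure: static CMOS gate. Inputs In1 ... InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), with the output F between them.]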

Pseudo NMOS Operation

• The pull-down network of the gate is the same as for a fully complementary gate.
• The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate inputs are at Vdd, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout (V, 0 to 3.0) against Vin (V, 0 to 2.5), for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in number of devices and in wiring complexity.

Disadvantages
• The higher pull-up resistance increases the delay, so the circuit is slower.
• Static power is dissipated whenever the output is low, because of the conduction path between VDD and VSS.

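Dynamic CMOS Logic

• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation by adding a clock input that creates a precharge phase and a conditional evaluation phase.

[Figure: dynamic gate. Clocked p-type precharge transistor Mp between VDD and the output Out, pull-down network (PDN) with inputs In1, In2 and In3, clocked n-type evaluation transistor Me to ground, and load capacitance CL on Out.]

Two-phase operation
• Precharge (CLK = 0)
• Evaluate (CLK = 1)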

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD through a low-resistance path.
  - a fan-in of n requires 2n devices (n n-type + n p-type)
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  - a fan-in of n requires only n + 2 transistors (n + 1 n-type + 1 p-type)
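The device counts quoted for the two styles can be tabulated for a few fan-in values. This is a small sketch; the function names are illustrative.

```python
def static_cmos_devices(fan_in: int) -> int:
    """n n-type plus n p-type transistors."""
    return 2 * fan_in


def dynamic_devices(fan_in: int) -> int:
    """n logic transistors, one n-type evaluation and one p-type precharge device."""
    return fan_in + 2


for n in (2, 4, 8):
    print(f"fan-in {n}: static {static_cmos_devices(n)}, dynamic {dynamic_devices(n)}")
```

The two counts are equal at a fan-in of 2, and the dynamic gate saves more devices as the fan-in grows.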

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The clocked n-type pull-down transistor is off during this phase, which keeps the precharge node from being drained.
• The CLK = 0 phase is made long enough to charge the storage node to a solid logic 1.

Evaluate
• When CLK goes high, precharging stops: the p-type pull-up turns off.
• The evaluation phase begins: the clocked n-type pull-down turns on.
• The input signals must rise monotonically. If an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If no conducting path is created, the gate's output is left charged at logic 1.

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

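[Figure: example dynamic gate with inputs A, B and C, computing Out = ((A·B)+C)'. Precharge (Clk = 0): Mp on, Me off, Out = 1. Evaluate (Clk = 1): Mp off, Me on.]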
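The precharge and evaluate behaviour of the example gate Out = ((A·B)+C)' can be mimicked with a small behavioural model. This is only a logic-level sketch: it ignores leakage and charge sharing, and the class name `DynamicGate` is an assumption made for illustration.

```python
class DynamicGate:
    """Behavioural model of the dynamic gate Out = ((A.B)+C)'."""

    def __init__(self):
        self.out = 1  # assume the output node starts precharged

    def precharge(self):
        # CLK = 0: Mp on, Me off, output node charged to logic 1.
        self.out = 1

    def evaluate(self, a, b, c):
        # CLK = 1: Mp off, Me on. A conducting PDN discharges the node,
        # and nothing can recharge it before the next precharge.
        if (a and b) or c:
            self.out = 0
        return self.out


gate = DynamicGate()
gate.precharge()
# Input C glitches 0 -> 1 -> 0 within one evaluation phase.
trace = [gate.evaluate(0, 0, 0), gate.evaluate(0, 0, 1), gate.evaluate(0, 0, 0)]
print(trace)  # [1, 0, 0]
```

The final inputs would give Out = 1 in a static gate, but the glitch on C has already discharged the node, which is why inputs may make at most one transition during evaluation.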

Properties of Dynamic Gates

• The logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
• The PDN starts to conduct as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn.
  - low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity, because they must be operated in two distinct phases.
• A clock signal must be introduced.
• They are more sensitive to noise.
• Their clock signals consume power and are difficult to turn off to save power.

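Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluation transistor Me clocked by Clk. In drives the first stage and Out1 drives the second stage, which produces Out2. Waveforms of Clk, In, Out1 and Out2 against time show Out2 losing a voltage ΔV before Out1 falls to VTn.]

Only 0 → 1 transitions are allowed at the inputs.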

Cascading Dynamic Gates - Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because Out1 takes a finite propagation delay to discharge to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off only when Out1 falls to VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.

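Domino CMOS Logic

[Figure: domino gate. A dynamic stage (Mp, PDN with inputs In1, In2 and In3, Me, clocked by Clk) followed by a static inverter that drives Out1. The dynamic node is annotated with its possible evaluation transitions, 1 → 1 and 1 → 0.]

Domino logic is the combination of a dynamic logic stage and a static inverter.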

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops: the p-type pull-up turns off.
• The evaluation phase begins: the clocked n-type pull-down turns on.
• The input signals must rise monotonically. If an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
• If no conducting path is created, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

• Only non-inverting logic can be implemented.
• Smaller area than static CMOS.
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed
  - the static inverter can be skewed, since only the low-to-high output transition matters
  - input capacitance is reduced, giving a smaller logical effort

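Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first (inputs In1, In2 and In3) drives Out1 through its static inverter; Out1 feeds the PDN of the second gate alongside In4 and In5, and the second gate drives Out2. Dynamic nodes are annotated 1 → 1 and 1 → 0, inverter outputs 0 → 0 and 0 → 1.]

• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.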
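The way evaluation ripples through cascaded domino stages can be sketched at the logic level. The model below assumes a chain of single-input domino buffers, which is a simplification of the gates in the figure; `domino_chain` is a hypothetical helper.

```python
def domino_chain(first_input: int, n_stages: int) -> list:
    """Inverter outputs of n cascaded domino buffers after one evaluation phase."""
    nodes = [1] * n_stages  # dynamic nodes, all precharged to 1
    outs = [0] * n_stages   # inverter outputs, all 0 after precharge
    signal = first_input
    for i in range(n_stages):
        if signal:          # PDN conducts: the dynamic node falls, the output rises
            nodes[i] = 0
            outs[i] = 1
        signal = outs[i]    # the next stage sees either no edge or a 0 -> 1 edge
    return outs


print(domino_chain(1, 3))
print(domino_chain(0, 3))
```

Every stage starts from 0 and can only rise, so each stage input makes at most one 0 → 1 transition, like a row of falling dominoes.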

Clocked CMOS Logic

• It is a combination of
  - a static CMOS gate
  - the Clk input, applied to one additional n-type transistor
  - the Clk' input, applied to one additional p-type transistor
• A fan-in of n requires 2n + 2 transistors.
• When Clk goes high, the additional n-type transistor is on and the logic is evaluated.
• When Clk goes low, the additional n-type transistor is off and the logic is not evaluated.

Advantages and Disadvantages

Advantages
• Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron problems in n-channel FET devices.

Disadvantages
• More area
• More complexity

N-P CMOS Logic

• An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is NP domino logic (also called NORA logic).
• Stages of N logic alternate with stages of P logic.
  - N-logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
  - P-logic stages use the complement clock, with the P logic tree tied above the output node.
• During precharge, clk is low (clk' is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
• During evaluation, clk is high (clk' is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground, while the P-logic tree conditionally pulls its output to Vdd.
• Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
• Stage 1: X = (A · B)'
• Stage 2: G = X' + Y'
• Stage 3: Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]
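The three stage equations of the example can be evaluated for any input combination. This sketch models only the final logic values, not the clocking; Y, F and H are treated as free inputs because the slide does not say where they come from.

```python
def np_example(a: int, b: int, y: int, f: int, h: int) -> tuple:
    """Return (X, G, Z) for the three-stage N-P example."""
    x = int(not (a and b))          # stage 1: X = (A.B)'
    g = int((not x) or (not y))     # stage 2: G = X' + Y'
    z = int(not ((f and g) or h))   # stage 3: Z = (F.G + H)'
    return x, g, z


print(np_example(1, 1, 1, 1, 0))  # (0, 1, 0)
print(np_example(0, 0, 1, 1, 0))  # (1, 0, 1)
```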

BiCMOS Technology

• CMOS properties
  - low power
  - slower
• BJT properties
  - faster
  - high power
• Implementation: CMOS & BJT
  - core logic: CMOS
  - interface: BJT
• Basic circuits
  - inverter
  - Nand gate
  - Nor gate

BiCMOS Circuits

[Figure: BiCMOS inverter and Nand gate]

Structured Design

Design the digital block or standard cell at the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic
• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter

Adders

• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

[Figure: sum output and carry output circuits]

Complementary Static CMOS Adder

Standard Cell Dimensions for Adder

The adder is made up of the following standard cells:
• Multiplexers: L = 11λ, W = 7λ
• i) nMOS inverter (8:1 and 4:1 ratio)
     butting contact: L = 22λ, W = 10λ
     buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So the adder standard cell is L = 190λ, W = 150λ.

Layout of full adder

Layout2 of full adder

Propagate and Generate Signals

For a full adder, define what happens to the carry:
• Generate: Cout = 1, independent of C
  G = A · B
• Propagate: Cout = C
  P = A ⊕ B
• Kill: Cout = 0, independent of C
  K = ~A · ~B
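The three definitions can be written out and checked against ordinary addition. This is a minimal sketch; the function names are illustrative.

```python
def carry_status(a: int, b: int) -> tuple:
    """Return (G, P, K) for one bit position."""
    g = a & b               # generate: Cout = 1 regardless of C
    p = a ^ b               # propagate: Cout = C
    k = (1 - a) & (1 - b)   # kill: Cout = 0 regardless of C
    return g, p, k


def carry_out(a: int, b: int, c: int) -> int:
    """Carry-out expressed with generate and propagate only."""
    g, p, _ = carry_status(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, carry_status(a, b))
```

Exactly one of G, P and K is 1 for any input pair, which is what allows a switch-logic carry chain to use one signal per path.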

Manchester Carry Chain

Built from switch logic using the propagate, generate & kill signals.

Manchester Chain Adders
• The carry node is precharged by the clock signal instead of being driven through the logic.
• The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor.
• Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

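Carry-Ripple Adder

• Simplest design: cascade full adders
• The critical path goes from Cin to Cout
• Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder. Full adders with inputs A1 to A4 and B1 to B4 and sum outputs S1 to S4, linked by the carries Cin, C1, C2, C3 and Cout.]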
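A cascade of full adders can be modelled bit by bit to show the carry rippling from Cin to Cout. This is a behavioural sketch; bit lists are least significant bit first, and `to_bits` is a helper introduced only for this example.

```python
def full_adder(a: int, b: int, c_in: int) -> tuple:
    """Return (sum, carry_out) for one bit position."""
    p = a ^ b
    return p ^ c_in, (a & b) | (p & c_in)


def ripple_carry_add(a_bits: list, b_bits: list, c_in: int = 0) -> tuple:
    """Add two equal-length bit lists (LSB first); return (sum_bits, carry_out)."""
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # each carry feeds the next stage
        sums.append(s)
    return sums, carry


def to_bits(value: int, width: int) -> list:
    return [(value >> i) & 1 for i in range(width)]


s, c_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(s, c_out)  # 11 + 6 = 17: sum bits [1, 0, 0, 0], carry-out 1
```

The loop makes the critical path visible: no stage can finish before the carry from the stage below it arrives.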

Carry Look-Ahead Adder (CLA)

• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
• The output carry is expressed in terms of the generate and propagate signals.
• The expressions for the 4-bit CLA are
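The slide's own equations did not survive extraction. The standard expansion, obtained by substituting the recurrence C(i+1) = G(i) + P(i)·C(i) into itself, is sketched below; the indexing from bit 0 is an assumption chosen to match the caption C1 = G0 + P0·C0.

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the G and P signals and on C0, so all four can be computed in parallel at the cost of progressively wider gates.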

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0 · C0

4-bit CMOS CLA

Carry Select Adder

• It is also referred to as the conditional sum adder.
• The adder is divided into two blocks
  - one block computed with a carry-in of logical 0
  - another block computed with a carry-in of logical 1
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
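The delay formula can be evaluated for a concrete partitioning. In this sketch the delay values K1 = 1.0 and K2 = 0.5 are hypothetical units, and the ripple-carry figure of n·K1 used for comparison is an assumption, not a number from the slides.

```python
def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


def ripple_delay(n_bits: int, k1: float) -> float:
    """Assumed reference: one adder-cell delay per bit."""
    return n_bits * k1


# 8-bit adder split into M = 2 blocks of P = 4 cells.
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))  # 4.5
print(ripple_delay(8, k1=1.0))                       # 8.0
```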

Carry Skip Adder (CSA)

• It is also called the carry bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

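Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, and carries ci and ci+1. The propagate signals pi and pi+1 form the skip signal that controls the outgoing carry.]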

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

The worst-case delay T is given by

T = 2(P - 1)·K1 + (M - 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
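Both the skip condition and the worst-case delay formula are easy to express in code. The sketch below assumes P and M mean cells per block and number of blocks, as in the carry select formula; the delay values are hypothetical units and the function names are illustrative.

```python
def block_carry_out(p_bits: list, c_in: int, ripple_carry: int) -> int:
    """Bypass the block when every bit propagates, else use the rippled carry."""
    return c_in if all(p_bits) else ripple_carry


def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


print(block_carry_out([1, 1, 1, 1], c_in=1, ripple_carry=0))  # 1: carry skips the block
print(block_carry_out([1, 0, 1, 1], c_in=1, ripple_carry=0))  # 0: carry comes from the ripple path
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))             # 7.0
```

The rippled carry is passed in as an argument here because the bypass only selects between the two paths and does not compute either one.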

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster than the carry-lookahead adder, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• It is an optimized column adder tree.
• It combines all partial products into 2 vectors (carry and sum).
• The carry and sum outputs are combined using a conventional adder.
• It compresses the number of stages of partial products.
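The logarithmic growth can be seen by counting reduction levels. This sketch assumes the tree is built from 3:2 carry-save adders, each of which turns three rows into two; the slides do not name the compressor, so that choice is an assumption.

```python
def wallace_levels(n_rows: int) -> int:
    """Levels of 3:2 compression needed to reduce n partial-product rows to 2."""
    if n_rows < 1:
        raise ValueError("need at least one partial product")
    levels = 0
    while n_rows > 2:
        # Each full group of three rows becomes two; leftover rows pass through.
        n_rows = 2 * (n_rows // 3) + n_rows % 3
        levels += 1
    return levels


for n in (4, 8, 16, 32):
    print(f"{n} partial products: {wallace_levels(n)} levels")
```

Each doubling of the operand width adds two levels in this range, compared with a linear increase for a simple array of adders.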

Wallace Tree Multiplier Structure

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing Complex Systems - ALU

• Complex systems are designed top-down with the help of CAD tools.
• Partition the system sensibly.
• Aim for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the layout dimensions of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 26: Chp 3- Sub-system Design

Gate Logic

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nmos inverter Zpu Zpd ratio Pull-up and Pull-down resistance Power dissipation Standard unit of capacitance Gate delay

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

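[Figure: domino gate: a dynamic stage (inputs In1, In2, In3 into the PDN, precharge transistor Mp, evaluation transistor Me, clock Clk) followed by a static inverter driving Out1.]

Domino logic is the combination of a dynamic logic stage and a static inverter.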

Domino CMOS Logic operation

Precharge phase: same as dynamic logic.

Evaluate phase: a modification of dynamic logic.

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the n-type pull-down turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.

If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.

Smaller area compared to static CMOS.

Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0 during evaluation.

Very high speed: the static inverter can be skewed, since only its low-to-high output transition matters.

Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

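[Figure: two cascaded domino gates. The first (inputs In1, In2, In3) produces Out1 and the second (inputs In4, In5) produces Out2; each has Mp and Me clocked by Clk. Annotations mark 1 → 0 transitions on the dynamic nodes and 0 → 1 transitions on the outputs.]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.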
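The cascading rule above can be illustrated with a short behavioural sketch. It assumes, purely for illustration, a chain of two-input domino AND stages; the function names `domino_and` and `evaluate_chain` are hypothetical and timing is not modelled.

```python
def domino_and(inputs):
    """One domino stage: a dynamic NAND node followed by a static inverter."""
    node = 0 if all(inputs) else 1  # dynamic node discharges if the PDN conducts
    return 1 - node                 # inverter output, so the stage is non-inverting


def evaluate_chain(primary, side_inputs):
    """Ripple a 0 -> 1 transition through cascaded domino AND stages."""
    outputs = []
    signal = primary
    for side in side_inputs:
        signal = domino_and([signal, side])
        outputs.append(signal)
    return outputs


precharged = [0, 0, 0]                    # every stage output is 0 after precharge
evaluated = evaluate_chain(1, [1, 1, 0])  # third stage is blocked by its side input
```

Every stage output either stays at 0 or rises to 1 during evaluation, which is the monotonic behaviour the next stage relies on.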

Clocked CMOS Logic

It is static CMOS with two additional clocked transistors: the Clk input drives one additional NMOS transistor and the Clk' input drives one additional PMOS transistor.

A gate with fan-in n therefore requires 2n + 2 devices.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is ON.

When Clk goes low, the logic is not evaluated because the clocked NMOS is OFF.

Advantages and Disadvantages

Advantages

Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages

More area.

More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic).

Stages of N logic alternate with stages of P logic.

N-logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.

P-logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:

Stage 1 is X = (A · B)'

Stage 2 is G = X' + Y'

Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z.]
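The three example stages can be checked with a purely logical sketch. The function name `np_cascade` is hypothetical, and the labelling of stages 1 and 3 as N logic and stage 2 as P logic is an assumption based on the alternating arrangement; clocking is not modelled.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three cascaded stages of the example (logic values only)."""
    x = 1 - (a & b)          # stage 1 (assumed N logic): X = (A . B)'
    g = (1 - x) | (1 - y)    # stage 2 (assumed P logic): G = X' + Y'
    z = 1 - ((f & g) | h)    # stage 3 (assumed N logic): Z = (F . G + H)'
    return x, g, z


result = np_cascade(a=1, b=1, y=1, f=1, h=0)
```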

BiCMOS Technology

CMOS properties: low power, slower.

BJT properties: faster, high power.

Implementation: CMOS & BJT. Core logic: CMOS. Interface: BJT.

Basic circuits: inverter, NAND gate, NOR gate.

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate.]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain adder
Enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ

i) NMOS inverter (8:1 and 4:1 ratio)

Butting contact: L = 22λ, W = 10λ

Buried contact: L = 30λ, W = 10λ

ii) CMOS inverter: L = 35λ, W = 18λ

Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1, independent of C
G = A · B

Propagate: Cout = C
P = A ⊕ B

Kill: Cout = 0, independent of C
K = ~A · ~B
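The three carry conditions can be written directly as bit operations. This is a minimal sketch; the function names `pgk` and `carry_out` are chosen here for illustration and are not from the slides.

```python
def pgk(a, b):
    """Return (propagate, generate, kill) for one bit position."""
    g = a & b              # generate: Cout = 1 regardless of C
    p = a ^ b              # propagate: Cout = C
    k = (1 - a) & (1 - b)  # kill: Cout = 0 regardless of C
    return p, g, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    p, g, _ = pgk(a, b)
    return g | (p & c)


table = {(a, b): pgk(a, b) for a in (0, 1) for b in (0, 1)}
```

For every input pair exactly one of P, G and K is 1, which is what allows a switch-logic carry chain to be built from them.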

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain Adders

The carry input is precharged with the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

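Carry-Ripple Adder

Simplest design: cascade full adders.

The critical path goes from Cin to Cout.

Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder: four full adders with inputs A1..A4 and B1..B4, sum outputs S1..S4, internal carries C1..C3, carry input Cin and carry output Cout.]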
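A bit-level sketch of the cascade makes the Cin-to-Cout critical path explicit: each stage has to wait for the carry of the stage before it. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are illustrative, and bits are stored least-significant first.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by chaining full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # carry ripples to the next stage
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
```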

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.

The output carry is represented in terms of the generate and propagate signals.

The expressions for a 4-bit CLA are
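The original slide showed these expressions as an image that did not survive extraction. As a reconstruction, the standard look-ahead carry expressions are written out below, assuming G_i = A_i · B_i and P_i = A_i ⊕ B_i as defined for the full adder. They are obtained by repeatedly substituting C_{i+1} = G_i + P_i C_i into itself.

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Each carry depends only on the operand bits and C_0, so all four can be computed in parallel.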

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks: one block computed with a logical 0 Cin and another block computed with a logical 1 Cin.

S and Cout are then selected by the actual Cin using a 2:1 multiplexer.
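The selection scheme can be sketched at bit level as follows. The function names and the default block size of 4 are illustrative assumptions, and the two speculative additions run one after the other here rather than in parallel as they would in hardware.

```python
def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry addition of two bit lists (LSB first)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Each block is added for Cin = 0 and Cin = 1; the real carry selects."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        sum0, cout0 = ripple_add(a_blk, b_blk, 0)  # speculative result, Cin = 0
        sum1, cout1 = ripple_add(a_blk, b_blk, 1)  # speculative result, Cin = 1
        # 2:1 multiplexers controlled by the actual carry into the block
        sums += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    return sums, carry


sums, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
total = from_bits(sums) + (cout << 8)
```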

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M − 1)·K2

where M = number of blocks in the adder

P = number of cells in each block

K1 = delay through one adder cell

K2 = delay through the multiplexer
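The formula can be evaluated directly. In this small sketch the numeric values (4 blocks of 4 cells, K1 = 1.0 and K2 = 0.5 in arbitrary time units) are made up for illustration and do not come from the slides.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


# Hypothetical 16-bit adder: 4 blocks of 4 cells
delay = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
```

Only the first block contributes a ripple term; every later block adds just one multiplexer delay.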

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

It is typically organized into m-bit stages.

If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

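[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sum outputs si, si+1, carries ci, ci+1, propagate signals pi, pi+1, and a skip signal that selects the carry output.]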

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;

else C3 = output carry of the stage-4 adder.

Delay of CSA

The worst-case delay T is given by T = 2(P − 1)·K1 + (M − 2)·K2

where K1 = delay through one adder cell

K2 = delay for skipping the carry over a block
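A small numeric sketch of the worst-case formula is given below. It assumes that P and M mean the same as for the carry select adder (cells per block and number of blocks). The ripple-adder delay of n·K1 used for comparison is a simple linear model assumed here, and the numeric values are made up for illustration.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case: T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed linear model: the carry passes through all n cells."""
    return n * k1


# Hypothetical 16-bit adder: 4 blocks of 4 cells, arbitrary time units
skip = carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5)
ripple = ripple_delay(n=16, k1=1.0)
```

The worst case ripples through the first and last blocks and skips over the blocks in between, which is where the 2(P − 1) and (M − 2) factors come from.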

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.

Completion time is proportional to log2(n).

It is an optimized column adder tree.

It combines all partial products into 2 vectors (carry and sum).

The carry and sum outputs are combined using a conventional adder.

It compresses the number of stages of partial products.
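The reduction idea can be sketched at word level. Here partial-product rows are Python integers and a bitwise 3:2 compressor (carry-save adder) turns three rows into a sum vector and a carry vector. A real Wallace tree applies full adders column by column, so this is only an illustration of the reduction to two vectors; the function names and the `width` parameter are assumptions.

```python
def carry_save(x, y, z):
    """Bitwise 3:2 compression: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, width=8):
    """Multiply non-negative integers by reducing partial products to 2 rows."""
    rows = [a << i for i in range(width) if (b >> i) & 1]  # partial products
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # rows left over
        rows = reduced
        levels += 1
    return sum(rows), levels  # final carry-propagate (conventional) addition


product, levels = wallace_multiply(13, 11)
```

The number of reduction levels grows roughly logarithmically with the number of partial products, which is the source of the log2(n) completion time.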

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.

Generate and verify each section of the design.

Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 27: Chp 3- Sub-system Design

Sizing of NMOS inverter

The following parameters are calculated for different sizes of nMOS inverter:
Zpu : Zpd ratio
Pull-up and pull-down resistance
Power dissipation
Standard unit of capacitance
Gate delay

Sizing of NMOS Nand Gate

The ratio of the pull-up to all the pull-down transistors (Zpu : n·Zpd) must be at least 4:1 to produce the correct output voltage level.

The nMOS NAND gate geometry reveals two factors:

The area of the NAND gate is greater than the area of the inverter, because of the larger number of pull-down transistors and the corresponding increase in the length of the pull-up transistor.

The delay also increases in direct proportion to the number of inputs added.

NMOS Nor Gate

Since the pull-down transistors are in parallel in the NOR gate, the pull-down ratio is the same for all transistors.

So it has the same characteristics as the inverter.

The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.

So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively.

• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.

• Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

The CMOS NAND gate has none of the restrictions of the nMOS NAND gate, but the geometry must be kept symmetrical by:

Allowing for extended fall times due to the series nMOS transistors (series resistance).

Keeping the transfer characteristic centred on Vdd/2.

The CMOS NOR gate has series p-transistors, which increase the resistance and delay.

This affects the transfer characteristics and reduces noise immunity.

So the geometries of the nMOS and pMOS transistors should be changed.

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

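[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2. Each transistor is replaced by a switch with equivalent resistance Rp or Rn (Req in general), driving the load capacitance CL, with internal node capacitance Cint.]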

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

Low-to-high transition:
both inputs go low: delay is 0.69 (Rp/2) CL
one input goes low: delay is 0.69 Rp CL

High-to-low transition:
both inputs go high: delay is 0.69 (2Rn) CL

[Figure: NAND2 switch-level model with pull-up resistances Rp, pull-down resistances Rn, load capacitance CL and internal capacitance Cint.]
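The three cases can be turned into a short first-order calculation. The resistance and capacitance values below (Rp = 10 kΩ, Rn = 5 kΩ, CL = 100 fF) are hypothetical and chosen only to show the arithmetic; the internal capacitance Cint is ignored, as it is in the expressions above.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 delays in seconds for the three input patterns."""
    return {
        # both PMOS devices conduct in parallel: Rp / 2
        "low_to_high_both_inputs_fall": 0.69 * (rp / 2) * cl,
        # only one PMOS device conducts: Rp
        "low_to_high_one_input_falls": 0.69 * rp * cl,
        # two NMOS devices conduct in series: 2 * Rn
        "high_to_low_both_inputs_rise": 0.69 * (2 * rn) * cl,
    }


delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
```

With these assumed values the single-PMOS pull-up takes twice as long as the case where both PMOS devices conduct, which is the input-pattern dependence described above.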

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage (V) versus time (ps), 0 to 400 ps, for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1.]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

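Pseudo NMOS Logic

Pseudo-nMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to 1.

[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN) and the pull-down network (PDN) between VDD and ground, with output F.]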

Pseudo NMOS Operation

The pull-down network of the gate is the same as for a fully complementary gate.

The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.

The p-type transistor is used as a resistor.

When the gate inputs are at 0 V, the n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.

When the gate inputs are at Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-nMOS voltage transfer characteristic, Vout (V) versus Vin (V) from 0 to 2.5 V, for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25.]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.

Disadvantages

The higher pull-up resistance increases the delay, so circuits are slower.

There is more static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

The static power dissipation of pseudo-nMOS logic motivates an alternative, dynamic CMOS logic.

It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate: inputs In1, In2, In3 into the PDN, precharge transistor Mp and evaluation transistor Me clocked by Clk, output Out with load capacitance CL.]

Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1).

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path.

A fan-in of n requires 2n devices (n N-type + n P-type).

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.

A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
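The two device counts can be compared for a few fan-in values. This is a trivial sketch with illustrative function names.

```python
def static_cmos_devices(n):
    """Static CMOS: n N-type plus n P-type transistors."""
    return 2 * n


def dynamic_devices(n):
    """Dynamic logic: n PDN transistors plus Me (N-type) and Mp (P-type)."""
    return n + 2


counts = [(n, static_cmos_devices(n), dynamic_devices(n)) for n in (2, 3, 4, 8)]
```

The two styles need the same number of devices at a fan-in of 2, and the dynamic gate's advantage grows with fan-in.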

Dynamic CMOS Logic operation

Precharge

When CLK goes low, the p-type transistor starts charging the precharge capacitance.

The pull-down transistor controlled by the clock keeps the precharge node from being drained.

The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the n-type pull-down turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.

If the inputs do not create such a path, the gate's output is left charged at logic 1.


Page 28: Chp 3- Sub-system Design

Sizing of NMOS Nand Gate

The ratio between pu to all pd transistors

(Zpu nZpd) must be minimum 41 for making correct

level of output voltage

nMOS Nand gate geometry reveals two factors

Area of nand gate is greater than area of inverter

because more no of pull down transistors and

corresponding increase in length of pull up

transistor

Delay is also increased due to direct proportion to

the number of inputs added

NMOS Nor Gate Since the pull down transistors are parallel

in Nor gate ie the pull down ratio for all transistors is same

So it has same characteristics as inverter The area occupied is reasonable since

there is no increase in length of pull-up transistor

So Nor gate is preferable than nand gate

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, because only its low-to-high (L-H) output transition matters.
Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

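[Figure: two cascaded domino gates (inputs In1, In2, In3 and In4, In5 driving the PDNs; transistors Mp and Me; clock Clk) with outputs Out1 and Out2 and their allowed transitions]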

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.
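The cascading rule can be illustrated with a small behavioural model. This is only a sketch under simplifying assumptions (ideal switches, no charge sharing or leakage); the class name `DominoGate` and its methods are hypothetical and are not taken from the slides.

```python
class DominoGate:
    """Behavioural sketch of a domino gate: dynamic node plus static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn    # pdn(*inputs) is truthy when the pull-down network conducts
        self.node = 1     # dynamic (precharged) node

    @property
    def out(self):
        # The static inverter drives the gate output from the dynamic node.
        return 1 - self.node

    def precharge(self):
        # Clk = 0: Mp charges the node to 1, so the output goes to 0.
        self.node = 1

    def evaluate(self, *inputs):
        # Clk = 1: the node can only be discharged, so the output can only rise.
        if self.pdn(*inputs):
            self.node = 0
        return self.out


and_gate = DominoGate(lambda a, b: a and b)
or_gate = DominoGate(lambda a, b: a or b)

for gate in (and_gate, or_gate):
    gate.precharge()
outputs_after_precharge = (and_gate.out, or_gate.out)   # both outputs are 0

x = and_gate.evaluate(1, 1)   # first stage output rises 0 -> 1
y = or_gate.evaluate(x, 0)    # second stage only ever sees a 0 -> 1 input
```

Because every domino output is 0 after precharge, the next stage cannot be discharged by a stale 1, which is the failure described for directly cascaded dynamic gates.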

Clocked CMOS Logic
It is static CMOS combined with two clocked transistors:
  Clk is applied to one additional NMOS transistor
  Clk' is applied to one additional PMOS transistor
A gate with n inputs therefore uses 2n+2 transistors.
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS transistor is on.

When Clk goes low, the logic is not evaluated, because the clocked NMOS transistor is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
More area
More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below.

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic below:

Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]

BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
  Core logic: CMOS
  Interface: BJT
Basic circuits: Inverter, Nand Gate, Nor Gate

BiCMOS Circuits

Inverter, Nand Gate

Structured Design

Design the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques
  Carry Select Adders
  Carry Skip Adders
  Carry Look-ahead Adders
32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratios)
  Butting contact: L = 22λ, W = 10λ
  Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts
So, for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to the carry (C is the carry input):
Generate: Cout = 1 independent of C, with G = A · B
Propagate: Cout = C, with P = A ⊕ B
Kill: Cout = 0 independent of C, with K = ~A · ~B
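As a quick check of these definitions, the following sketch computes G, P and K for single bits and derives the carry output as Cout = G + P·C. The function names `pgk` and `carry_out` are illustrative and do not come from the slides.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b               # Generate: Cout = 1 regardless of C
    p = a ^ b               # Propagate: Cout = C
    k = (1 - a) & (1 - b)   # Kill: Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry output of a full adder expressed through G and P."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


# Exactly one of G, P, K is active for every input pair.
table = {(a, b): pgk(a, b) for a in (0, 1) for b in (0, 1)}
```

For every input pair exactly one of the three signals is 1, which is what lets a carry chain be built from simple switches.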

Manchester Carry Chain

Built from switch logic using the propagate, generate and kill signals.

Manchester chain adders:
The carry input is precharged by the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor.

Generate signal: G = A · B (A' · B' is the kill signal K).

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1-A4 and B1-B4, sum outputs S1-S4, internal carries C1-C3, and Cin/Cout]
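The cascade of full adders can be sketched as follows. The bit lists are least-significant-bit first, and the helper names (`full_adder`, `ripple_carry_add`) are illustrative assumptions rather than anything defined in the slides.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length LSB-first bit lists; the carry ripples cell to cell."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # each cell waits for the previous carry
        sums.append(s)
    return sums, carry


# 11 + 6 = 17: 1011 + 0110, written LSB first
sum_bits, cout = ripple_carry_add([1, 1, 0, 1], [0, 1, 1, 0])
```

The loop makes the critical path visible: the carry into the last cell depends on every earlier cell, so the delay grows linearly with the word length.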

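Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

Each output carry is expressed in terms of the generate and propagate signals.

The expressions for a 4-bit CLA follow from expanding C(i+1) = Gi + Pi·Ci:
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0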

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
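A minimal sketch of a 4-bit look-ahead stage is given below. It starts from the relation C1 = G0 + P0·C0 and expands it for the higher carries so that each carry depends only on the G and P signals and on C0. The function name `cla4` and the LSB-first bit ordering are assumptions made for the example.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead add on LSB-first bit lists; returns (sum_bits, c4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals

    # Every carry is a two-level function of G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    sums = [p[i] ^ c for i, c in enumerate((c0, c1, c2, c3))]
    return sums, c4


def to_bits(value, width=4):
    """Integer to LSB-first bit list."""
    return [(value >> i) & 1 for i in range(width)]


sum_bits, c4 = cla4(to_bits(9), to_bits(7))   # 9 + 7 = 16
```

The product terms grow with the bit position, which is why look-ahead is usually applied to small groups (such as 4 bits) rather than to the full word.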

Carry Select Adder

It is also referred to as the conditional-sum adder.

Each adder section is duplicated into two blocks:
  one block computes with a carry input (Cin) of logical 0
  the other block computes with a carry input (Cin) of logical 1

S and Cout are then selected by the actual Cin using a 2:1 multiplexer.
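The selection scheme can be sketched as below: each block is added twice, once per assumed carry input, and the real carry picks the result. The block size of 4 and the helper names are assumptions chosen for the illustration.

```python
def ripple_add(a_bits, b_bits, cin):
    """Ripple-carry add of two LSB-first bit lists; returns (sum_bits, cout)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry-select adder: both candidate results per block, chosen by a 2:1 mux."""
    sums, carry = [], cin
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        s0, c0 = ripple_add(a_blk, b_blk, 0)   # block that assumes Cin = 0
        s1, c1 = ripple_add(a_blk, b_blk, 1)   # block that assumes Cin = 1
        sums += s1 if carry else s0            # 2:1 multiplexer on the sum bits
        carry = c1 if carry else c0            # 2:1 multiplexer on the carry
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
result = from_bits(sum_bits) + (cout << 8)   # 200 + 100
```

In hardware the two candidate additions run in parallel, so only the multiplexer delay is added per block; the price is roughly double the adder area.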

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by
T = P·K1 + (M - 1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
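A worked instance of the formula is sketched below. The numeric values of K1 and K2 are hypothetical, in arbitrary time units, and the ripple-carry figure assumes a delay of n·K1 for n = M·P cells.

```python
def carry_select_delay(m_blocks, p_cells, k1, k2):
    """T = P*K1 + (M - 1)*K2 for a carry-select adder."""
    return p_cells * k1 + (m_blocks - 1) * k2


# 16-bit adder split into M = 4 blocks of P = 4 cells (hypothetical delays).
K1, K2 = 1.0, 0.5
t_select = carry_select_delay(m_blocks=4, p_cells=4, k1=K1, k2=K2)
t_ripple = 4 * 4 * K1   # assumption: a ripple-carry adder takes n * K1
```

With these assumed numbers the carry-select delay is 5.5 units against 16 units for the ripple chain.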

Carry Skip Adder (CSA)

It is also called the carry-bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

It is typically organized into m-bit stages.
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

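[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and a skip signal selecting the output carry]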

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;

otherwise C3 = output carry of the stage-4 adder.
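The bypass condition can be sketched for one 4-bit stage as follows. The function name `carry_skip_block`, the LSB-first ordering and the returned `skip` flag are assumptions made for the illustration.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage on LSB-first bit lists; returns (sum_bits, cout, skip)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]

    # Normal ripple path through the stage.
    carry, sums = cin, []
    for a, b, p in zip(a_bits, b_bits, propagate):
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)

    # Bypass path: if every bit propagates, the carry-out equals the carry-in.
    skip = all(propagate)
    cout = cin if skip else carry
    return sums, cout, skip


# 1010 + 0101 with Cin = 1: every bit propagates, so the carry is bypassed.
sums, cout, skip = carry_skip_block([0, 1, 0, 1], [1, 0, 1, 0], 1)
```

The bypass never changes the result; it only makes the carry available earlier than the ripple path would.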

Delay of CSA

The worst-case delay T is given by
T = 2(P - 1)·K1 + (M - 2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
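The sketch below evaluates this formula for several ways of splitting a 16-bit adder. It assumes that M and P denote the number of blocks and the number of cells per block, as in the carry-select formula, and the K1 and K2 values are hypothetical.

```python
def carry_skip_delay(m_blocks, p_cells, k1, k2):
    """T = 2*(P - 1)*K1 + (M - 2)*K2 for a carry-skip adder."""
    return 2 * (p_cells - 1) * k1 + (m_blocks - 2) * k2


K1, K2 = 1.0, 0.5   # hypothetical delays in arbitrary units
N_BITS = 16

# Worst-case delay for block sizes P = 2, 4 and 8 (M = N_BITS / P blocks).
delays = {p: carry_skip_delay(N_BITS // p, p, K1, K2) for p in (2, 4, 8)}
```

Large blocks lengthen the ripple term in the first and last block, while small blocks add more skip stages, so the block size is a design trade-off.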

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel Multiplier
Braun Array
2's Complement multiplication using the Baugh-Wooley method
Pipelined Multiplier Array
Modified Booth's Algorithm
Wallace Tree Multipliers
Recursive decomposition of Multiplication
Dadda's Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2(n).
It is an optimized column adder tree.
It combines all partial products into 2 vectors (carry and sum).
The carry and sum outputs are combined using a conventional adder.
It compresses the number of stages of partial products.
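The reduction idea can be sketched at word level with carry-save (3:2) compressors: three rows are replaced by a sum row and a carry row until only two rows remain, which a conventional adder then combines. This is a simplified illustration that works on whole integers rather than on individual columns, and the function names are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compressor on whole words: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def wallace_multiply(a, b, n=8):
    """Multiply two n-bit unsigned integers; returns (product, reduction_levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(carry_save(*rows[i:i + 3]))   # 3 rows become 2
        next_rows.extend(rows[full:])                      # leftovers pass through
        rows = next_rows
        levels += 1
    return sum(rows), levels   # final conventional addition of the two vectors


product, levels = wallace_multiply(13, 11)
```

For eight partial products the row count goes 8, 6, 4, 3, 2, so four compressor levels are needed before the final adder, which reflects the logarithmic growth noted above.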

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two Phase Clocking
Dynamic Shift Register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 29: Chp 3- Sub-system Design

NMOS Nor Gate
Since the pull-down transistors are in parallel in a NOR gate, the pull-down ratio is the same for all transistors.
So the gate has the same characteristics as an inverter.
The area occupied is reasonable, since there is no increase in the length of the pull-up transistor.
So the NOR gate is preferable to the NAND gate.

CMOS Logic

Properties of CMOS Gates

• High noise margins
  VOH and VOL are at VDD and GND, respectively
• No static power consumption
  There never exists a direct path between VDD and VSS (GND) in steady-state mode
• Comparable rise and fall times
  (under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

The CMOS NAND gate has none of the restrictions of the NMOS NAND gate, but the geometry must be chosen to keep the gate symmetrical by:
  allowing for the extended fall times of the series n-type transistors (due to their series resistance)
  keeping the transfer characteristic centred on Vdd/2
The CMOS NOR gate has series p-type transistors, which increase the resistance and the delay.
This affects the transfer characteristic and reduces noise immunity.
So the geometries of the n-type and p-type transistors should be adjusted.

CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

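[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2 gates, showing the on-resistances Rp and Rn of the transistors driven by A and B, the load capacitance CL and the internal node capacitance Cint]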

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs.

Low-to-high transition:
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL

High-to-low transition:
  both inputs go high: delay is 0.69 (2Rn) CL
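The three cases can be evaluated numerically as in the sketch below. The resistance values are hypothetical and chosen only to make the arithmetic easy to follow; the load of 100 fF matches the value quoted for the simulation results that follow.

```python
def nand2_delays(rp, rn, cl):
    """First-order NAND2 propagation delays from the switch (RC) model."""
    return {
        # Both PMOS devices conduct in parallel: Rp / 2.
        "low_to_high_both_inputs": 0.69 * (rp / 2) * cl,
        # Only one PMOS device pulls the output up: Rp.
        "low_to_high_one_input": 0.69 * rp * cl,
        # Both NMOS devices conduct in series: 2 * Rn.
        "high_to_low": 0.69 * (2 * rn) * cl,
    }


# Hypothetical on-resistances (ohms) and a 100 fF load (farads).
delays = nand2_delays(rp=10e3, rn=5e3, cl=100e-15)
delays_ps = {name: value * 1e12 for name, value in delays.items()}
```

The model ignores the internal node capacitance Cint, which is one reason simulated delays differ between input patterns that this first-order estimate treats as identical.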

[Figure: NAND2 switch-level model with Rp, Rn, CL and Cint]

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A = B = 1 -> 0; A = 1, B = 1 -> 0; and A = 1 -> 0, B = 1]

Input data pattern      Delay (ps)
A = B = 0 -> 1          67
A = 1, B = 0 -> 1       64
A = 0 -> 1, B = 1       61
A = B = 1 -> 0          45
A = 1, B = 1 -> 0       80
A = 1 -> 0, B = 1       81

NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 100 fF

Pseudo NMOS Logic

Pseudo-NMOS logic is static CMOS logic in which the pull-up network is reduced from N transistors to a single p-type load.

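[Figure: static CMOS gate structure, with inputs In1, In2, ..., InN driving both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN), and output F]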

Pseudo NMOS Operation
The pull-down network of the gate is the same as for a fully complementary gate.
The pull-up network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.

The p-type transistor is used as a resistor.
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.

When the gate input is Vdd, both the p-type and n-type transistors are on, and together they determine the gate's output voltage.

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pull-up network, both in terms of number of devices and wiring complexity.

Disadvantages

Because of the higher pull-up resistance, the delay is larger and hence the circuit is slower.

More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative logic style: dynamic CMOS logic.

It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

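[Figure: dynamic CMOS gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me both driven by Clk, and output Out loaded by CL]

Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)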

Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n devices (n N-type + n P-type).

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pull-down transistor controlled by the clock keeps the precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.
The evaluation phase begins, i.e. the n-type pull-down turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.

Inputs to the gate can make at most one transition during evaluation.

The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

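[Figure: dynamic gate example with inputs A, B and C, transistors Mp and Me and their on/off states in each clock phase, implementing Out = ((A·B)+C)']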
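The output conditions can be made concrete with a small behavioural model of the example gate Out = ((A·B)+C)'. This is a sketch under idealised assumptions (no leakage or charge sharing); the function names are hypothetical.

```python
def pdn_conducts(a, b, c):
    """Pull-down network of the example gate: conducts when (A AND B) OR C."""
    return bool((a and b) or c)


def run_cycle(input_sequence):
    """Simulate one precharge/evaluate cycle.

    input_sequence lists the (A, B, C) values seen during evaluation;
    the returned trace holds the output after each of them.
    """
    out = 1                            # precharge (CLK = 0): Mp on, Me off
    trace = []
    for a, b, c in input_sequence:     # evaluate (CLK = 1): Mp off, Me on
        if pdn_conducts(a, b, c):
            out = 0                    # discharged: stays 0 until next precharge
        trace.append(out)
    return trace


# Inputs rise and then fall again during evaluation: the output stays discharged.
trace = run_cycle([(0, 0, 0), (1, 1, 0), (0, 0, 0)])
```

The last entry of the trace shows why inputs may make at most one transition during evaluation: once the node is discharged, removing the inputs does not restore the output.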

Properties of Dynamic Gates
The logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than in static CMOS:
  no glitching
  higher transition probabilities
  extra load on Clk
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn:
  low noise margin (NML)

Advantages and Disadvantages

Advantages
Low area
Higher speed than static complementary gates

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.


Page 30: Chp 3- Sub-system Design

CMOS Logic

Properties of CMOS Gates

bull High noise margins

VOH and VOL are at VDD and GND respectively

bull No static power consumption

There never exists a direct path between VDD and

VSS (GND) in steady-state mode

bull Comparable rise and fall times

(under appropriate sizing conditions)

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the digital block or standard cell proceeds through the following levels:
• Logic level
• Circuit level
• Stick diagram
• Layout

Examples: Combinational Logic

• 5-way selector
• Multiplexers & demultiplexers
• Encoders & decoders
• Parity generator
• Bus arbitration logic
• Gray code to binary code converter
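
Two of the listed examples, the Gray-to-binary converter and the parity generator, are easy to state at the logic level. The sketch below is a behavioural illustration only; the function names and the choice of even parity are assumptions.

```python
def gray_to_binary(gray):
    """Convert a Gray-coded non-negative integer to plain binary."""
    binary = gray
    shift = gray >> 1
    while shift:
        binary ^= shift         # each binary bit is the XOR of all higher Gray bits
        shift >>= 1
    return binary


def binary_to_gray(value):
    """Inverse conversion, used here only to check the converter."""
    return value ^ (value >> 1)


def even_parity_bit(word):
    """Return the bit that makes the total number of 1s even."""
    return bin(word).count("1") % 2
```

In hardware both functions reduce to chains or trees of XOR gates, which is why they are common structured-design examples.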

Adders

• 1-bit full adder
• Manchester carry chain
• Adder enhancement techniques: carry-select adders, carry-skip adders, carry look-ahead adders
• 32-bit adders

1-bit full adder

CMOS Full Adder

[Figure: sum output and carry output circuits]

Complementary Static CMOS Adder

Standard cell dimensions for adder

The adder is made up of the following standard cells:

• Multiplexers: L = 11λ, W = 7λ
• i) NMOS inverter (8:1 and 4:1 ratio)
    - Butting contact: L = 22λ, W = 10λ
    - Buried contact: L = 30λ, W = 10λ
• ii) CMOS inverter: L = 35λ, W = 18λ
• Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

• Generate: Cout = 1, independent of C
    G = A · B
• Propagate: Cout = C
    P = A ⊕ B
• Kill: Cout = 0, independent of C
    K = ~A · ~B
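
The generate, propagate and kill definitions can be checked against ordinary addition with a few lines of code. This is a minimal sketch; the function names are illustrative.

```python
from itertools import product


def gpk(a, b):
    """Generate, propagate and kill signals for one bit position."""
    g = a & b                  # generate: Cout = 1 regardless of C
    p = a ^ b                  # propagate: Cout = C
    k = (1 - a) & (1 - b)      # kill: Cout = 0 regardless of C
    return g, p, k


def full_adder(a, b, c):
    """1-bit full adder built from the generate and propagate signals."""
    g, p, _ = gpk(a, b)
    s = p ^ c
    cout = g | (p & c)
    return s, cout


for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert 2 * cout + s == a + b + c
    assert sum(gpk(a, b)) == 1   # exactly one of G, P, K is active
```

The second assertion reflects that the three cases are mutually exclusive and cover every input pair.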

Manchester Carry Chain

• Built from switch logic using the propagate, generate & kill signals.
• Manchester chain adders: the carry input is precharged with the clock signal instead of passing through the logic.
• The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.
• Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

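• Simplest design: cascade full adders.
• The critical path goes from Cin to Cout.
• Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, internal carries C1..C3, carry input Cin and carry output Cout]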
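
A bit-level model makes the ripple of the carry explicit. The sketch below assumes bit lists given least significant bit first; the helper names are illustrative.

```python
def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists, least significant bit first."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)   # carry ripples to the next cell
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
```

Because each cell waits for the carry of the previous one, the loop body is the critical path repeated once per bit.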

Carry Look-Ahead Adder (CLA)

• To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.
• The output carry is expressed in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
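
The sketch below expands the recurrence C(i+1) = Gi + Pi·Ci, of which C1 = G0 + P0·C0 is the first case, into the usual textbook two-level expressions for a 4-bit block. The expanded forms are the standard ones rather than a copy of the slide's figure, and the function names are illustrative.

```python
def cla4_carries(a_bits, b_bits, c0=0):
    """Return [C1, C2, C3, C4] for 4-bit operands given LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # Every carry depends only on G, P and C0, so all four can be formed in parallel
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """Add two 4-bit integers using the look-ahead carries."""
    a_bits = [(a >> i) & 1 for i in range(4)]
    b_bits = [(b >> i) & 1 for i in range(4)]
    carries = [c0] + cla4_carries(a_bits, b_bits, c0)
    total = sum((a_bits[i] ^ b_bits[i] ^ carries[i]) << i for i in range(4))
    return total, carries[4]
```

The growing number of product terms per carry is the reason look-ahead is usually limited to small blocks.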

Carry Select Adder

• It is also referred to as a conditional-sum adder.
• The adder is divided into two blocks: one computed with Cin = 0, the other with Cin = 1.
• S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by

T = P·K1 + (M − 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
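
The selection scheme and the delay expression can be sketched together. In this illustration every block, including the first, is computed for both carry-in values, the width is assumed to be a multiple of the block size, and the numeric delays used in any call are placeholders rather than values from the slides.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2, as given in the text."""
    return p * k1 + (m - 1) * k2


def ripple_block(a, b, cin, width):
    """Behavioural stand-in for one ripple block: returns (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, cin=0, width=8, block=4):
    result, carry = 0, cin
    mask = (1 << block) - 1
    for lo in range(0, width, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, c0 = ripple_block(a_blk, b_blk, 0, block)   # computed for Cin = 0
        sum1, c1 = ripple_block(a_blk, b_blk, 1, block)   # computed for Cin = 1
        s, carry = (sum1, c1) if carry else (sum0, c0)    # 2:1 multiplexer
        result |= s << lo
    return result, carry
```

For example, `carry_select_delay(4, 2, 1.0, 0.5)` models two 4-cell blocks with a unit cell delay and a half-unit multiplexer delay.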

Carry Skip Adder (CSA)

• It is also called a carry-bypass adder.
• It looks for cases in which the carry-out of a set of bits is identical to the carry-in.
• Typically organized into m-bit stages.
• If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

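[Figure: two full-adder (FA) cells with inputs ai, bi and ai+1, bi+1, sums si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and the skip signal that selects the outgoing carry]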

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,
else C3 = output carry of the stage-4 adder

Delay of CSA

The worst-case delay T is given by

T = 2(P − 1)·K1 + (M − 2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
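
The skip condition and the worst-case delay expression can be illustrated as follows. The sketch assumes P is the number of cells per block and M the number of blocks, as in the carry-select formula, and that a plain ripple adder takes n·K1; the function names are hypothetical.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P - 1)K1 + (M - 2)K2 from the text."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed reference: the carry crosses all n cells of a ripple adder."""
    return n * k1


def skip_block_carry(a_bits, b_bits, cin):
    """Carry out of one block, bypassing the ripple path when every bit propagates."""
    if all(a != b for a, b in zip(a_bits, b_bits)):
        return cin                      # skip: carry-out equals carry-in
    carry = cin
    for a, b in zip(a_bits, b_bits):    # otherwise ripple through the block
        carry = (a & b) | ((a ^ b) & carry)
    return carry
```

With 8 blocks of 4 cells and unit delays, `carry_skip_delay(4, 8, 1, 1)` gives 12 units against 32 for `ripple_delay(32, 1)`.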

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

• Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.
• Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
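
As a reminder of what the shift-and-add structure computes, here is a minimal behavioural sketch for unsigned operands. The function name and the default 4-bit width are illustrative choices, not taken from the slide's figure.

```python
def shift_add_multiply(multiplicand, multiplier, width=4):
    """Unsigned shift-and-add multiplication of two width-bit values."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:            # examine multiplier bit i
            product += multiplicand << i     # add the shifted partial product
    return product
```

Each loop iteration corresponds to one row of partial products in the array view of the multiplier.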

Multipliers

• Serial-parallel multiplier
• Braun array
• 2's complement multiplication using the Baugh-Wooley method
• Pipelined multiplier array
• Modified Booth's algorithm
• Wallace tree multipliers
• Recursive decomposition of multiplication
• Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

• A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
• Completion time is proportional to log2 n.
• It is an optimized column adder tree.
• It combines all partial products into two vectors (carry and sum).
• The carry and sum outputs are combined using a conventional adder.
• It compresses the number of stages of partial products.

Wallace Tree Multiplier Structure
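
The reduction of partial products to a carry vector and a sum vector can be sketched with word-level 3:2 compressors (carry-save adders). This is a simplified model: it reduces whole rows rather than individual columns, and the function names are hypothetical.

```python
def carry_save(x, y, z):
    """3:2 compressor on whole words: returns (sum vector, carry vector)."""
    return x ^ y ^ z, ((x & y) | (y & z) | (x & z)) << 1


def wallace_multiply(a, b, width=8):
    """Reduce the partial products with 3:2 compressors, then one final add."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])   # leftover rows pass through
        rows = reduced
        levels += 1
    return sum(rows), levels    # final conventional (carry-propagate) addition
```

For eight partial products the row count goes 8, 6, 4, 3, 2, so four compressor levels precede the final adder, which illustrates the logarithmic growth in depth.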

Sequential Logic

• D flip-flop
• Two-phase clocking
• Dynamic shift register
• RAM
• ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

• Complex systems are designed with a top-down approach, with the help of CAD tools.
• Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
• Generate and verify each section of the design.
• Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
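
A barrel shifter built from pass transistors behaves like a crossbar in which exactly one switch per output line is closed. The sketch below models that idea for a rotate operation; the rotation direction, the indexing of the switch matrix and the function name are assumptions, since the original circuit figure is not available here.

```python
def barrel_shift(word_bits, amount):
    """Rotate a list of bits by `amount` positions using a crossbar of switches."""
    n = len(word_bits)
    # switch[i][j] is closed when input line j is routed to output line i
    switch = [[(j - i) % n == amount % n for j in range(n)] for i in range(n)]
    return [next(word_bits[j] for j in range(n) if switch[i][j]) for i in range(n)]
```

For a 4-bit word this is a 4x4 array of switches, and the shift amount selects which diagonal of the array conducts.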

Routing Power rails for ALU


Properties of CMOS Gates

• High noise margins: VOH and VOL are at VDD and GND, respectively.
• No static power consumption: there never exists a direct path between VDD and VSS (GND) in steady-state mode.
• Comparable rise and fall times (under appropriate sizing conditions).

CMOS Nand and Nor Characteristics

• A CMOS NAND gate does not have the restrictions of an nMOS NAND gate, but the geometry must be chosen to keep the gate symmetrical:
    - allow for the extended fall times caused by the series resistance of the series nMOS transistors;
    - keep the transfer characteristic centred on Vdd/2.
• A CMOS NOR gate has series p-transistors, which increase the resistance and the delay.
• This affects the transfer characteristic and reduces noise immunity, so the geometries of the nMOS and pMOS transistors should be adjusted.

CMOS Logic

• Static CMOS logic
• Pseudo-NMOS logic
• Dynamic CMOS logic
• Domino CMOS logic
• Clocked CMOS logic
• n-p CMOS logic

Static CMOS Switch Delay Model

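[Figure: switch-level RC delay models of the inverter (INV), NAND2 and NOR2 gates. Each transistor is replaced by a switch in series with its on-resistance (Rn or Rp, Req in general); CL is the load capacitance and Cint the internal node capacitance.]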

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs (two-input NAND gate):

• Low-to-high transition
    - both inputs go low: delay is 0.69 · (Rp/2) · CL
    - one input goes low: delay is 0.69 · Rp · CL
• High-to-low transition
    - both inputs go high: delay is 0.69 · 2Rn · CL

[Figure: NAND2 switch model with Rp, Rn, Cint and CL]
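
The three first-order delay expressions can be evaluated numerically to see how strongly the input pattern matters. The resistance and capacitance values below are illustrative placeholders, not figures from the slides, and the function name is hypothetical.

```python
def nand2_delays(rn, rp, cl):
    """First-order 0.69*R*C delays of a 2-input NAND for each input pattern."""
    return {
        "low-to-high, both inputs fall": 0.69 * (rp / 2) * cl,   # two PMOS in parallel
        "low-to-high, one input falls": 0.69 * rp * cl,          # a single PMOS
        "high-to-low, both inputs rise": 0.69 * (2 * rn) * cl,   # two NMOS in series
    }


# Illustrative values only: 10 kOhm devices driving a 100 fF load
delays = nand2_delays(rn=10e3, rp=10e3, cl=100e-15)
```

With equal device resistances the slowest case is the series pull-down, four times slower than the parallel pull-up, which is why the series transistors are usually widened.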

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage [V] versus time [ps] for the input patterns A = B = 1→0; A = 1, B = 1→0; and A = 1→0, B = 1]

Input data pattern      Delay (ps)
A = B = 0→1             67
A = 1, B = 0→1          64
A = 0→1, B = 1          61
A = B = 1→0             45
A = 1, B = 1→0          80
A = 1→0, B = 1          81

NMOS = 0.5 μm/0.25 μm, PMOS = 0.75 μm/0.25 μm, CL = 100 fF

Pseudo NMOS Logic

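Pseudo-NMOS logic is static CMOS logic in which the N-input pull-up network is reduced to a single PMOS transistor.

[Figure: static CMOS gate with inputs In1, In2, ..., InN driving both the pull-up network (PUN) and the pull-down network (PDN) between VDD and the output F]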

Pseudo NMOS Operation

• The pulldown network of the gate is the same as for a fully complementary gate.
• The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
• The p-type transistor is used as a resistor.
• When the gate inputs are at 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
• When the gate inputs are at VDD, both the p-type and the n-type transistors are on, and together they determine the gate's output voltage.
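
Treating both conducting devices as plain resistors gives a rough feel for the ratioed output level and the static power. This is a crude linear approximation (real transistors are non-linear), and the function names and the resistance values are assumptions chosen only for illustration.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Output low level when the pulldown path conducts (resistive-divider model)."""
    return vdd * r_pulldown / (r_pullup + r_pulldown)


def pseudo_nmos_static_power(vdd, r_pullup, r_pulldown):
    """Static power drawn while the output is low."""
    return vdd ** 2 / (r_pullup + r_pulldown)


# Illustrative values: 2.5 V supply, pull-up four times weaker than the pull-down
vol = pseudo_nmos_low_output(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
power = pseudo_nmos_static_power(vdd=2.5, r_pullup=40e3, r_pulldown=10e3)
```

A weaker pull-up lowers the output low level and the static power, but it also slows the rising transition, which is the trade-off behind choosing the pull-up size.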

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for pull-up sizes W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
• The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in number of devices and in wiring complexity.

Disadvantages
• Because of the higher pull-up resistance, delay is greater and circuit speed is lower.
• More static power dissipation, because of the conduction path between VDD and VSS.

Dynamic CMOS Logic

• The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic.
• It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me both clocked by Clk, output Out and load capacitance CL]

Two-phase operation:
• Precharge (CLK = 0)
• Evaluate (CLK = 1)

Dynamic CMOS Logic operation

• In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path.
    - A fan-in of n requires 2n devices (n N-type + n P-type).
• Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
    - A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
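
The device counts of the logic families in this chapter can be tabulated for a given fan-in. The static, dynamic and clocked figures are the ones stated in the text; the pseudo-NMOS count (n pulldown devices plus one load) follows from its description, and the function name is illustrative.

```python
def transistor_count(style, fan_in):
    """Number of transistors for a gate with the given fan-in."""
    counts = {
        "static": 2 * fan_in,           # n N-type + n P-type
        "pseudo-nmos": fan_in + 1,      # n N-type + 1 P-type load
        "dynamic": fan_in + 2,          # n + 1 N-type + 1 P-type
        "clocked": 2 * fan_in + 2,      # static CMOS + 2 clocked devices
    }
    return counts[style]
```

For a 4-input gate this gives 8, 5, 6 and 10 devices respectively.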

Dynamic CMOS Logic operation

Precharge
• When CLK goes low, the p-type transistor starts charging the precharge capacitance.
• The pulldown transistor controlled by the clock keeps the precharged node from being drained.
• The length of the CLK = 0 phase is chosen to ensure that the storage node is charged to a solid logic 1.

Evaluate
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
• The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
• If the inputs do not create a conducting path, the gate's output is left charged at logic 1.
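
The precharge and evaluate rules, including the effect of a non-monotonic input, can be captured in a small behavioural model. The class and its methods are hypothetical helpers; the pulldown function (A·B) + C is used only as an example.

```python
class DynamicGate:
    """Behavioural model: Out = NOT pdn(inputs), stored on a capacitor."""

    def __init__(self, pdn):
        self.pdn = pdn      # function returning True when the PDN conducts
        self.out = 1

    def precharge(self):            # CLK = 0: Mp on, Me off
        self.out = 1

    def evaluate(self, *inputs):    # CLK = 1: Mp off, Me on
        if self.pdn(*inputs):
            self.out = 0            # charge is lost until the next precharge
        return self.out


gate = DynamicGate(lambda a, b, c: bool((a and b) or c))
gate.precharge()
first = gate.evaluate(0, 1, 0)     # no conducting path: output stays 1
glitch = gate.evaluate(0, 0, 1)    # C pulses high: output discharges
after = gate.evaluate(0, 0, 0)     # C returns to 0, but the output stays 0
```

The last call shows why inputs must rise monotonically: once the node has been discharged, removing the input does not restore the output.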

Conditions on Output

• Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
• Inputs to the gate can make at most one transition during evaluation.
• The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: example dynamic gate with inputs A, B and C and output Out = ((A·B) + C)'. During precharge Mp is on, Me is off and Out = 1; during evaluate Mp is off and Me is on.]

Properties of Dynamic Gates

• The logic function is implemented by the PDN only.
• Full-swing outputs (VOL = GND and VOH = VDD).
• Non-ratioed: sizing of the devices does not affect the logic levels.
• Faster switching speeds, due to reduced load capacitance.
• Overall power dissipation is usually higher than static CMOS:
    - no glitching
    - higher transition probabilities
    - extra load on Clk
• The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn:
    - low noise margin (NML)

Advantages and Disadvantages

Advantages
• Low area
• Higher speed than static complementary gates

Disadvantages
• Precharged gates introduce functional complexity because they must be operated in two distinct phases.
• They require the introduction of a clock signal.
• They are also more sensitive to noise.
• Their clocking signals also consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

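[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me clocked by Clk; In drives the first stage, and Out1 drives the second stage, which produces Out2. The waveforms of Clk, In, Out1 and Out2 versus time show a voltage drop on Out2 while Out1 is still above VTn.]

Only 0 → 1 transitions are allowed at inputs.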

Cascading Dynamic Gates- Problem

• Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
• The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
• Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.
• Setting all inputs of the second gate to 0 during precharge fixes the problem.
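
The failure can be reproduced with a unit-delay abstraction of two cascaded dynamic inverters. This is a worst-case digital caricature (a real Out2 only loses part of its charge), and the function name and the one-step gate delay are assumptions.

```python
def cascade_two_dynamic_inverters(in_value, steps=3):
    """Unit-delay model of two cascaded dynamic inverters during evaluation."""
    out1, out2 = 1, 1                      # both nodes precharged to 1
    history = [(out1, out2)]
    for _ in range(steps):
        # each stage reacts to the value its input had one time step earlier
        next_out1 = 0 if in_value else out1
        next_out2 = 0 if out1 else out2    # Out1 is still 1 at the first step
        out1, out2 = next_out1, next_out2
        history.append((out1, out2))
    return history


history = cascade_two_dynamic_inverters(in_value=1)
```

With In = 1 the ideal result is Out1 = 0 and Out2 = 1, but the model ends with Out2 = 0, because the second stage evaluated while Out1 was still at its precharged value.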

Domino CMOS Logic

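[Figure: domino gate: a dynamic stage (inputs In1, In2, In3 driving the PDN, precharge transistor Mp and evaluation transistor Me, both clocked by Clk) followed by a static inverter producing Out1. The annotations show the dynamic node going 1 → 0 and the inverter output going 0 → 1.]

Domino logic is the combination of dynamic logic and a static inverter.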

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
• When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
• The evaluation phase begins, i.e. the clocked n-type pulldown turns on.
• The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
• If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.
• If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

• Only non-inverting logic can be implemented.
• Smaller area compared to static CMOS.
• Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
• Very high speed:
    - the static inverter can be skewed, since only the L-H transition matters
    - input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

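[Figure: two cascaded domino gates. The first has inputs In1, In2, In3 and output Out1; the second has inputs In4, In5 and output Out2. Each has precharge transistor Mp and evaluation transistor Me clocked by Clk. The annotations show the dynamic nodes going 1 → 0 and the inverter outputs going 0 → 1.]

• The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
• Hence the only possible transition during evaluation is 0 → 1.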
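
The falling-dominoes behaviour can be illustrated with a unit-delay model of a chain of domino buffers. The chain of single-input stages, the one-step delay per stage and the function name are simplifying assumptions.

```python
def domino_chain(first_input, length=4):
    """Unit-delay model of a chain of domino buffers during evaluation."""
    nodes = [1] * length               # dynamic nodes precharged to 1
    outputs = [0] * length             # inverter outputs are 0 after precharge
    history = [tuple(outputs)]
    for _ in range(length):
        previous = [first_input] + outputs[:-1]
        for i, drive in enumerate(previous):
            if drive:                  # PDN conducts: dynamic node discharges
                nodes[i] = 0
        outputs = [1 - node for node in nodes]
        history.append(tuple(outputs))
    return history
```

Because every stage starts evaluation with a 0 on its input, each output can only make a single 0 → 1 transition, and the stages switch one after another.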


Page 32: Chp 3- Sub-system Design

CMOS Nand and Nor Characteristics

CMOS Nand Gate has no restrictions as NMOS Nand Gate but we have to keep the geometry symmetry by

Allowing extended fall times for series nmos transistors (for series resistance)

Keep the transfer characteristics for Vdd2 CMOS Nor Gate has series p-transistors which

increase the resistance and delay This effect the transfer characteristics and reduce

noise immunity So Geometries of nmos and pmos transistors should

change

CMOS Logic

Static CMOS Logic Pseudo NMOS Logic Dynamic CMOS Logic Domino CMOS Logic Clocked CMOS Logic n-p CMOS Logic

Static CMOS Switch Delay Model

A

Req

A

Rp

A

Rp

A

Rn CL

A

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

B

Rp

A

Rp

A

Rn

B

Rn CL

Cint

NAND2

INV

NOR2

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic



CMOS Logic

Static CMOS Logic
Pseudo NMOS Logic
Dynamic CMOS Logic
Domino CMOS Logic
Clocked CMOS Logic
n-p CMOS Logic

Static CMOS Switch Delay Model

[Figure: switch-level RC delay models of the INV, NAND2 and NOR2 gates, showing the equivalent resistances Rn and Rp, the internal node capacitance Cint and the load capacitance CL]

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs. For the two-input NAND gate:

Low to high transition
  both inputs go low: delay is 0.69 (Rp/2) CL
  one input goes low: delay is 0.69 Rp CL

High to low transition
  both inputs go high: delay is 0.69 (2Rn) CL

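[Figure: NAND2 switch model with pull-up resistances Rp on inputs A and B, series pull-down resistances Rn, internal capacitance Cint and load capacitance CL]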
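The three delay expressions above can be evaluated numerically. The following is a minimal sketch only: the resistance values are illustrative assumptions that do not come from the slides, and the function name `nand2_delays` is hypothetical.

```python
# First-order switch-model delays for a two-input NAND gate.
# Rp and Rn below are assumed example values, not data from the slides.
LN2 = 0.69  # ln(2), rounded as in the delay expressions


def nand2_delays(rp_ohm, rn_ohm, cl_farad):
    """Return the three first-order propagation delays in seconds."""
    return {
        # both PMOS devices conduct in parallel: Rp/2
        "low_to_high_both_inputs_fall": LN2 * (rp_ohm / 2) * cl_farad,
        # only one PMOS device conducts: Rp
        "low_to_high_one_input_falls": LN2 * rp_ohm * cl_farad,
        # both NMOS devices conduct in series: 2Rn
        "high_to_low_both_inputs_rise": LN2 * (2 * rn_ohm) * cl_farad,
    }


delays = nand2_delays(rp_ohm=10e3, rn_ohm=5e3, cl_farad=100e-15)
for name, seconds in delays.items():
    print(f"{name}: {seconds * 1e12:.0f} ps")
```

The model ignores the internal capacitance Cint, which is one reason simulated delays differ between input patterns that the simple expressions treat as equal.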

Delay Dependence on Input Patterns

[Figure: NAND2 output voltage (V) versus time (ps) for the input patterns A=B=1->0; A=1, B=1->0; and A=1->0, B=1]

Input data pattern    Delay (ps)
A = B = 0->1          67
A = 1, B = 0->1       64
A = 0->1, B = 1       61
A = B = 1->0          45
A = 1, B = 1->0       80
A = 1->0, B = 1       81

NMOS = 0.5 µm/0.25 µm, PMOS = 0.75 µm/0.25 µm, CL = 100 fF

Pseudo NMOS Logic

Pseudo-NMOS logic is obtained from static CMOS logic by reducing the pull-up network from N transistors to 1.

[Figure: static CMOS gate for comparison: inputs In1 ... InN drive both the pull-up network (PUN, connected to VDD) and the pull-down network (PDN); output F]

Pseudo NMOS Operation

The pulldown network of the gate is the same as for a fully complementary gate.
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on.
The p-type transistor is used as a resistor.
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD.
When the gate input is VDD, both the p-type and n-type transistors are on and both are operating to determine the gate's output voltage.
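To see why the output low level depends on both devices, the conducting transistors can be approximated as linear resistors forming a voltage divider. This is a rough sketch under that assumption (real devices are nonlinear); the function name and the example resistances are hypothetical.

```python
def pseudo_nmos_low_output(vdd, r_pullup, r_pulldown):
    """Resistor-divider estimate of a pseudo-NMOS gate with its pull-down on.

    Assumes the always-on p-type load and the conducting n-type network
    behave as fixed resistors, which is only a first-order approximation.
    Returns (output low voltage, static current, static power).
    """
    total = r_pullup + r_pulldown
    v_ol = vdd * r_pulldown / total   # output does not reach 0 V
    i_static = vdd / total            # current flows from VDD to VSS
    return v_ol, i_static, vdd * i_static


# Assumed example values: a weak 40 kOhm load against a 10 kOhm pull-down.
v_ol, i_static, p_static = pseudo_nmos_low_output(2.5, 40e3, 10e3)
print(f"VOL = {v_ol:.2f} V, I = {i_static * 1e6:.0f} uA, P = {p_static * 1e6:.0f} uW")
```

A weaker (more resistive) load lowers the output low voltage and the static power, at the cost of a slower rising transition.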

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: pseudo-NMOS voltage transfer characteristic, Vout (V) versus Vin (V), for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages

The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity.

Disadvantages

The higher pull-up resistance increases the delay, so circuits are slower.
More static power dissipation, due to the conduction path between VDD and VSS.

Dynamic CMOS Logic

The static power dissipation of pseudo-NMOS logic motivates an alternative, dynamic CMOS logic.
It avoids static power dissipation and adds a clock input for the precharge and conditional evaluation phases.

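[Figure: dynamic CMOS gate: p-type precharge transistor Mp and n-type evaluation transistor Me, both driven by Clk, around the pull-down network (PDN) with inputs In1, In2 and In3; output Out with load capacitance CL]

Two-phase operation: precharge (CLK = 0), evaluate (CLK = 1)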

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path.
  A fan-in of n requires 2n (n N-type + n P-type) devices.

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
  A fan-in of n requires only n + 2 (n + 1 N-type + 1 P-type) transistors.
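The device counts quoted for the logic styles can be tabulated for a few fan-in values. This is a small sketch; the function names are hypothetical, and the pseudo-NMOS count of n + 1 follows from its single p-type load.

```python
def static_cmos_transistors(fan_in):
    """n N-type devices in the pull-down plus n P-type devices in the pull-up."""
    return 2 * fan_in


def pseudo_nmos_transistors(fan_in):
    """n N-type devices plus one always-on P-type load."""
    return fan_in + 1


def dynamic_cmos_transistors(fan_in):
    """n N-type logic devices, one N-type evaluate device, one P-type precharge device."""
    return fan_in + 2


for n in (2, 4, 8):
    print(n, static_cmos_transistors(n), pseudo_nmos_transistors(n), dynamic_cmos_transistors(n))
```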

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pulldown transistors controlled by the clock keep that precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create such a path, the gate's output is left charged at logic 1.
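The precharge and evaluate behaviour described above can be mimicked with a small behavioural model. This is only a logic-level sketch that ignores leakage, charge sharing and timing; the class name `DynamicGate` and the example pull-down function A·B + C are assumptions for illustration.

```python
class DynamicGate:
    """Logic-level model of a dynamic CMOS gate with an n-type pull-down network."""

    def __init__(self, pdn_conducts):
        # pdn_conducts(inputs) returns True when the pull-down network conducts
        self.pdn_conducts = pdn_conducts
        self.node = 0  # stored charge on the output capacitance

    def step(self, clk, inputs):
        if clk == 0:
            self.node = 1            # precharge: Mp on, Me off
        elif self.pdn_conducts(inputs):
            self.node = 0            # evaluate: conditional discharge
        return self.node             # otherwise the stored value is held


gate = DynamicGate(lambda abc: (abc[0] and abc[1]) or abc[2])
print(gate.step(0, (0, 0, 0)))  # precharge phase
print(gate.step(1, (1, 1, 0)))  # evaluate, pull-down conducts
print(gate.step(1, (0, 0, 0)))  # inputs fall again, node stays discharged
```

The third call shows why inputs must not glitch: once discharged, the node cannot recover until the next precharge.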

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
Output can be in the high-impedance state during and after evaluation (PDN off); state is stored on CL.

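[Figure: example dynamic gate with inputs A, B and C. During precharge (Clk = 0) Mp is on, Me is off and Out = 1; during evaluate (Clk = 1) Mp is off, Me is on and Out is the complement of (A·B) + C]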

Properties of Dynamic Gates

Logic function is implemented by the PDN only.
Full swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk.
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn: low noise margin (NML).

Advantages and Disadvantages

Advantages
Low area.
Higher speed than static complementary gates.

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
Requires introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

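[Figure: two cascaded dynamic inverters (In -> Out1 -> Out2), each with precharge transistor Mp and evaluation transistor Me driven by Clk, with waveforms of Clk, In, Out1 and Out2 versus time and the threshold VTn marked]

Only 0 -> 1 transitions are allowed at inputs.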

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge fixes the problem.

Domino CMOS Logic

[Figure: domino gate: dynamic stage (Mp, PDN with inputs In1, In2 and In3, Me, driven by Clk) followed by a static inverter producing Out1]

Domino logic is a dynamic logic gate followed by a static inverter.

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, since only the L-H transition matters.
Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

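[Figure: two cascaded domino gates. The first (inputs In1, In2, In3) drives Out1 through its static inverter; Out1 together with In4 and In5 feeds the PDN of the second gate, which drives Out2. The dynamic nodes make 1 -> 0 transitions and the inverter outputs make 0 -> 1 transitions]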

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.
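The way the inverter makes cascading safe can be illustrated with a logic-level model of two domino gates in series. This is a sketch under simplifying assumptions (no timing, leakage or charge sharing); the class name `DominoGate`, the helper `run_cycle` and the chosen AND/OR functions are hypothetical.

```python
class DominoGate:
    """Dynamic stage followed by a static inverter (logic-level model)."""

    def __init__(self, pdn_conducts):
        self.pdn_conducts = pdn_conducts  # True when the pull-down network conducts
        self.node = 1                     # dynamic (precharged) node

    @property
    def out(self):
        return 1 - self.node              # static inverter

    def precharge(self):
        self.node = 1                     # forces out to 0

    def evaluate(self, *inputs):
        if self.pdn_conducts(*inputs):
            self.node = 0                 # out rises 0 -> 1


and_gate = DominoGate(lambda a, b: a and b)
or_gate = DominoGate(lambda x, c: x or c)


def run_cycle(a, b, c):
    """One clock cycle of (a AND b) OR c built from two domino gates."""
    and_gate.precharge()
    or_gate.precharge()
    outputs_after_precharge = (and_gate.out, or_gate.out)
    and_gate.evaluate(a, b)
    or_gate.evaluate(and_gate.out, c)     # sees only a 0 -> 1 transition, if any
    return outputs_after_precharge, or_gate.out


print(run_cycle(1, 1, 0))
print(run_cycle(0, 1, 0))
```

Both gate outputs are 0 after precharge, so the second gate can never be discharged by a stale high value from the first.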

Clocked CMOS Logic

It is a static CMOS gate with two additional clocked transistors: the Clk input is given to one additional NMOS and the Clk' input is given to another additional PMOS.
A gate with fan-in n requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices.

Disadvantages
More area.
More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use a complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic (figure not reproduced):

Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]

BiCMOS Technology

CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT (core logic in CMOS, interface in BJT)
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and BiCMOS NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact: L = 22λ, W = 10λ
   Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1, independent of C
  G = A • B
Propagate: Cout = C
  P = A ⊕ B
Kill: Cout = 0, independent of C
  K = ~A • ~B
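The three signals can be checked exhaustively against the full-adder carry. The following is a minimal sketch; the function names `pgk` and `carry_out` are hypothetical.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    generate = a & b
    propagate = a ^ b
    kill = (1 - a) & (1 - b)
    return generate, propagate, kill


def carry_out(a, b, c):
    """Carry out of a full adder expressed with generate and propagate."""
    generate, propagate, _ = pgk(a, b)
    return generate | (propagate & c)


for a in (0, 1):
    for b in (0, 1):
        g, p, k = pgk(a, b)
        print(f"A={a} B={b}: G={g} P={p} K={k}")
```

Exactly one of G, P and K is 1 for any input pair, which is what lets a switch network select between forcing the carry high, passing it, or forcing it low.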

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

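Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder: four cascaded full adders with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3 and output Cout]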
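A bit-level model makes the Cin-to-Cout critical path explicit: each stage must wait for the carry from the stage before it. This is a minimal behavioural sketch; the helper names and the LSB-first bit ordering are choices made for illustration.

```python
def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def full_adder(a, b, cin):
    total = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return total, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from stage to stage."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sums), cout)
```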

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
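Starting from C1 = G0 + P0C0 and substituting repeatedly gives every carry as a function of the generate and propagate signals and C0 only. The sketch below uses that standard textbook expansion, which is an assumption here since the slide's own 4-bit expressions did not survive extraction; the function names are hypothetical.

```python
def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def cla4(a_bits, b_bits, c0):
    """4-bit carry look-ahead addition; returns (sum bits LSB first, C4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    # Each carry depends only on g, p and c0, so all can be formed in parallel.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    sums = [p[i] ^ c for i, c in enumerate((c0, c1, c2, c3))]
    return sums, c4


sums, c4 = cla4(to_bits(9, 4), to_bits(7, 4), 0)
print(sums, c4)
```

The number of product terms grows with each bit position, which is why look-ahead is usually applied to small groups of bits.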

Carry Select Adder

It is also referred to as a conditional sum adder.

The adder is divided into two blocks: one block with a logical 0 Cin, another block with a logical 1 Cin.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2

where M = number of blocks in the adder
      P = number of cells in each block
      K1 = delay through one adder cell
      K2 = delay through the multiplexer
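Both the selection scheme and the delay expression can be sketched in a few lines. The model below is behavioural only; the function names, the LSB-first bit ordering and the unit delays used in the example are assumptions for illustration.

```python
def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def ripple_block(a_bits, b_bits, cin):
    """Plain ripple addition of one block; returns (sum bits, carry out)."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin, block_size):
    """Each block is computed for Cin = 0 and Cin = 1, then one result is selected."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        result_if_0 = ripple_block(a_blk, b_blk, 0)
        result_if_1 = ripple_block(a_blk, b_blk, 1)
        block_sums, carry = result_if_1 if carry else result_if_0  # 2:1 multiplexer
        sums.extend(block_sums)
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2, as given above."""
    return p * k1 + (m - 1) * k2


sums, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8), 0, block_size=4)
print(sums, cout, carry_select_delay(p=4, m=2, k1=1.0, k2=1.0))
```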

Carry Skip Adder (CSA)

It is also called a Carry Bypass Adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

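[Figure: simplified carry-skip adder logic: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 combined into the skip signal that selects the outgoing carry]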

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

Worst case delay T is given by T = 2(P-1)·K1 + (M-2)·K2

where K1 = delay through one adder cell
      K2 = delay for skipping the carry over a block
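The bypass condition and the worst-case delay can be sketched together. This is a behavioural model only: the function names are hypothetical, and P and M are assumed to mean cells per block and number of blocks, as in the carry select delay expression.

```python
def to_bits(value, width):
    """Integer to list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage; returns (sum bits, carry out, skip taken)."""
    skip = all(a ^ b for a, b in zip(a_bits, b_bits))  # every bit propagates
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    # The bypass gate forwards cin directly when the whole stage propagates.
    cout = cin if skip else carry
    return sums, cout, skip


def carry_skip_delay(p, m, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2, as given above."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


print(carry_skip_block(to_bits(0b1010, 4), to_bits(0b0101, 4), 1))
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=1.0))
```

The bypass never changes the logical result, since a fully propagating stage would ripple the same carry through anyway; it only shortens the path the carry has to take.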

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
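The reduction to two vectors can be illustrated at word level with 3:2 carry-save compression. This is a simplified sketch for unsigned operands, not a gate-level column design; the function names and the row-grouping order are assumptions.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied bitwise: x + y + z == sum_vec + carry_vec."""
    sum_vec = x ^ y ^ z
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1
    return sum_vec, carry_vec


def wallace_multiply(a, b, n_bits):
    """Multiply unsigned integers by reducing partial products three at a time.

    Returns (product, number of carry-save levels used).
    """
    # One shifted partial product per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])   # leftover rows pass to the next level
        rows = reduced
        levels += 1
    # The final sum and carry vectors go through one conventional adder.
    return sum(rows), levels


print(wallace_multiply(13, 11, 4))
```

No carry propagates sideways inside a compression level, so the only carry-propagate addition is the final one.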

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed top-down with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Page 35: Chp 3- Sub-system Design

Input Pattern Effects on Delay

Delay is dependent on the pattern of inputs

Low to high transition both inputs go low

delay is 069 Rp2 CL

one input goes low delay is 069 Rp CL

High to low transition both inputs go high

delay is 069 2Rn CL

CL

B

Rn

A

Rp

B

Rp

A

Rn Cint

Delay Dependence on Input Patterns

-05

0

05

1

15

2

25

3

0 100 200 300 400

A=B=10

A=1 B=10

A=1 0 B=1

time [ps]

Vo

ltage

[V]

Input DataPattern

Delay(psec)

A=B=01 67

A=1 B=01

64

A= 01 B=1

61

A=B=10 45

A=1 B=10

80

A= 10 B=1

81

NMOS = 05m025 mPMOS = 075m025 mCL = 100 fF

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path.

A fan-in of n requires 2n devices (n N-type + n P-type).

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.

A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
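The device counts quoted above can be tabulated with a small sketch. The static and dynamic formulas come from the text; the pseudo-NMOS count of n + 1 follows from its n-transistor pull-down network plus one p-type load. The function and style names are hypothetical.

```python
def transistor_count(fan_in, style):
    """Number of transistors for a single gate with the given fan-in."""
    if fan_in < 1:
        raise ValueError("fan_in must be at least 1")
    counts = {
        "static": 2 * fan_in,        # n N-type + n P-type
        "pseudo_nmos": fan_in + 1,   # n N-type + 1 always-on P-type load
        "dynamic": fan_in + 2,       # n + 1 N-type (incl. Me) + 1 P-type (Mp)
    }
    return counts[style]


if __name__ == "__main__":
    for n in (2, 4, 8):
        row = ", ".join(f"{s}: {transistor_count(n, s)}"
                        for s in ("static", "pseudo_nmos", "dynamic"))
        print(f"fan-in {n} -> {row}")
```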

Dynamic CMOS Logic operation

Precharge

When CLK goes low, the p-type transistor starts charging the precharge capacitance.

The clock-controlled pull-down transistor is off during this phase, which keeps the precharged node from being drained.

The length of the CLK = 0 phase is chosen so that the storage node is charged to a solid logic 1.

Evaluate

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the clocked n-type pull-down turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.

If the inputs do not create a conducting path, the gate's output is left charged at logic 1.

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.

Inputs to the gate can make at most one transition during evaluation.

The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.
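The precharge/evaluate behaviour and the conditions on the output can be mimicked with a purely behavioural sketch. It ignores leakage and charge sharing, and the class name `DynamicGate` and the example pull-down function are hypothetical.

```python
class DynamicGate:
    """Behavioural model of a dynamic gate: the output node stores charge."""

    def __init__(self, pulldown):
        # pulldown(*inputs) must be truthy when the PDN conducts.
        self.pulldown = pulldown
        self.out = 1

    def precharge(self):
        """CLK = 0: Mp charges the output node to logic 1."""
        self.out = 1

    def evaluate(self, *inputs):
        """CLK = 1: the node is discharged if the PDN conducts.

        Once discharged, the node stays at 0 until the next precharge.
        """
        if self.pulldown(*inputs):
            self.out = 0
        return self.out


if __name__ == "__main__":
    # PDN conducts when (A AND B) OR C is true, so Out is its complement.
    gate = DynamicGate(lambda a, b, c: (a and b) or c)
    gate.precharge()
    print(gate.evaluate(0, 0, 0))  # PDN off, node keeps its charge
    print(gate.evaluate(0, 0, 1))  # C rises, node is discharged
    print(gate.evaluate(0, 0, 0))  # C falls again, but the charge is gone
```

The last call shows why inputs may make at most one transition during evaluation: a 0 to 1 to 0 glitch leaves the output wrongly discharged.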

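[Figure: example dynamic gate with inputs A, B and C. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off, Me is on and Out is the complement of (A·B)+C]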

Properties of Dynamic Gates

Logic function is implemented by the PDN only.

Full-swing outputs (VOL = GND and VOH = VDD).

Non-ratioed: sizing of the devices does not affect the logic levels.

Faster switching speeds, due to reduced load capacitance.

Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.

The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn; the low noise margin (NML) is therefore small.

Advantages and Disadvantages

Advantages

Low area and higher speed than static complementary gates.

Disadvantages

Precharged gates introduce functional complexity because they must be operated in two distinct phases.

They require a clock signal.

They are more sensitive to noise.

Their clocking signals consume power and are difficult to turn off to save power.

Cascading Dynamic Gates

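[Figure: two cascaded dynamic inverters, In → Out1 → Out2, each with Mp and Me driven by Clk, with waveforms of Clk, In, Out1 and Out2 (V versus t) and the VTn level marked]

Only 0 → 1 transitions allowed at inputs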

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off only when Out1 falls to VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 → 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge fixes the problem.
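The cascading problem can be reproduced with a toy discrete-time model. It assumes normalised voltages (VDD = 1), a linear discharge of `rate` per time step and a threshold `vtn`; all of these values and the function name are hypothetical and only meant to show the mechanism, not to predict real waveforms.

```python
def cascade_two_dynamic_inverters(steps=20, rate=0.2, vtn=0.3):
    """Evaluation phase of In -> Out1 -> Out2, with In already at 1.

    Both nodes start precharged to 1.0. A node discharges by `rate` per
    step while the input of its pull-down transistor is above `vtn`.
    """
    out1, out2 = 1.0, 1.0
    vin = 1.0
    for _ in range(steps):
        # Decide from the voltages at the start of the step.
        discharge1 = vin > vtn
        discharge2 = out1 > vtn   # Out1 is still high while it is discharging
        if discharge1:
            out1 = max(0.0, out1 - rate)
        if discharge2:
            out2 = max(0.0, out2 - rate)
    return out1, out2


if __name__ == "__main__":
    out1, out2 = cascade_two_dynamic_inverters()
    print(f"Out1 = {out1:.2f} (correct), Out2 = {out2:.2f} (should be 1.00)")
```

Out2 loses charge during the time Out1 needs to fall below the threshold, and that charge cannot be restored until the next precharge.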

Domino CMOS Logic

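[Figure: domino gate, a dynamic gate (inputs In1, In2, In3, PDN, Mp and Me driven by Clk) followed by a static inverter producing Out1]

Domino logic is the combination of a dynamic logic gate and a static inverter.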

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the clocked n-type pull-down turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.

If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.

Smaller area compared to static CMOS.

Free of glitches, since each dynamic node can only make a transition from logic 1 to logic 0.

Very high speed: the static inverter can be skewed, since only its L-H output transition matters.

Input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

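[Figure: two cascaded domino gates. The first (inputs In1, In2, In3) produces Out1, which together with In4 and In5 drives the PDN of the second gate, producing Out2]

The inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 → 1.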
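A behavioural sketch of two cascaded domino gates is given below. It models each gate as a dynamic node followed by an inverter; the class name `DominoGate` and the two AND-type pull-down functions are hypothetical examples, not taken from the figure.

```python
class DominoGate:
    """Dynamic node followed by a static inverter."""

    def __init__(self, pulldown):
        self.pulldown = pulldown  # truthy when the PDN conducts
        self.node = 1             # dynamic (storage) node

    @property
    def out(self):
        """Inverter output: 0 after precharge, 1 once the node discharges."""
        return 1 - self.node

    def precharge(self):
        self.node = 1

    def evaluate(self, *inputs):
        if self.pulldown(*inputs):
            self.node = 0
        return self.out


if __name__ == "__main__":
    gate1 = DominoGate(lambda a, b, c: a and b and c)
    gate2 = DominoGate(lambda x, d: x and d)

    gate1.precharge()
    gate2.precharge()
    # After precharge every domino output is 0, so gate2 sees a 0 input.
    print(gate1.out, gate2.out)

    # Evaluation ripples through the chain like falling dominoes.
    out1 = gate1.evaluate(1, 1, 1)
    out2 = gate2.evaluate(out1, 1)
    print(out1, out2)
```

Because every output starts at 0, a downstream gate can only see 0 → 1 transitions, which is the condition for correct dynamic evaluation.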

Clocked CMOS Logic

It is static CMOS logic with two additional clocked transistors: the Clk input drives one additional NMOS and the Clk' input drives one additional PMOS.

A fan-in of n therefore requires 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.

When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages

Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effects in N-FET devices.

Disadvantages

More area and more complexity.

N-P CMOS Logic

An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is NP domino logic (also called NORA logic).

Stages of N logic alternate with stages of P logic.

N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.

P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:

Stage 1 is X = (A · B)'

Stage 2 is G = X' + Y'

Stage 3 is Z = (F · G + H)'
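The three-stage example can be checked at the Boolean level with a short sketch. It only evaluates the logic equations; it does not model clocks or precharge, and the function name is hypothetical. A, B, Y, F and H are treated as primary inputs.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages for 0/1 inputs."""
    x = 1 - (a & b)            # Stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)      # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # Stage 3: Z = (F . G + H)'
    return x, g, z


if __name__ == "__main__":
    for inputs in ((1, 1, 1, 1, 0), (0, 0, 1, 1, 0)):
        print(inputs, "->", np_cascade(*inputs))
```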

BiCMOS Technology

CMOS properties: low power, slower

BJT properties: faster, high power

Implementation: CMOS & BJT

Core logic: CMOS

Interface: BJT

Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figure: BiCMOS inverter and NAND gate]

Structured Design

A digital block or standard cell is designed at the following levels: logic level, circuit level, stick diagram, layout.

Examples: Combinational Logic

5-way selector, multiplexers & demultiplexers, encoders & decoders, parity generator, bus arbitration logic, Gray code to binary code converter

Adders

1-bit full adder

Manchester carry chain

Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders

32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers - L = 11λ, W = 7λ

i) NMOS inverter (8:1 and 4:1 ratio)

Butting contact - L = 22λ, W = 10λ

Buried contact - L = 30λ, W = 10λ

ii) CMOS inverter - L = 35λ, W = 18λ

Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:

Generate: Cout = 1, independent of C. G = A · B

Propagate: Cout = C. P = A ⊕ B

Kill: Cout = 0, independent of C. K = ~A · ~B
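The three carry signals can be checked against ordinary addition with a short sketch. The function names are hypothetical; the check confirms that exactly one of G, P and K is active for any input pair and that Cout = G + P·C.

```python
def carry_signals(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                # both inputs 1: carry-out is 1 regardless of C
    p = a ^ b                # inputs differ: carry-out equals C
    k = (1 - a) & (1 - b)    # both inputs 0: carry-out is 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out of a full adder expressed with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"A={a} B={b} -> G,P,K = {carry_signals(a, b)}")
```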

Manchester Carry Chain

Built from switch logic using the propagate, generate & kill signals.

Manchester Chain adders

The carry input is precharged with the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor.

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

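Carry-Ripple Adder

Simplest design: cascade full adders.

The critical path goes from Cin to Cout.

Design the full adder to have a fast carry delay.

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sums S1..S4, intermediate carries C1..C3, carry input Cin and carry output Cout]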
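A bit-level sketch of the carry-ripple structure is shown below. Bits are stored least-significant first, which is a convention chosen here for convenience; the function names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        # The carry of each stage feeds the next: this is the critical path.
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    # 11 + 5 with 4-bit operands, LSB first.
    print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0]))
```

The loop makes the linear dependence visible: stage i cannot finish before stage i-1 has produced its carry.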

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is expressed in terms of the generate and propagate signals.

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
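As a sketch of how the carries can be formed in parallel, the code below expands the recurrence C(i+1) = G(i) + P(i)·C(i), of which C1 = G0 + P0·C0 is the first step, for four bits. These are the textbook look-ahead expansions, not necessarily the exact form used in the slide's figures, and the function name is hypothetical. Bits are stored LSB first.

```python
def cla_4bit(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; returns (sum bits LSB first, C4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals

    # Every carry depends only on G, P and C0, never on another carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


if __name__ == "__main__":
    print(cla_4bit([1, 1, 0, 1], [1, 0, 1, 0]))  # 11 + 5, LSB first
```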

Carry Select Adder

It is also referred to as the conditional sum adder.

The adder is divided into two blocks: one computes with a logical 0 Cin, the other with a logical 1 Cin.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2

where M = number of blocks in the adder, P = number of cells in each block, K1 = delay through one adder cell, K2 = delay through the multiplexer
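The selection scheme and the delay formula can be sketched together as follows. Each block is computed twice, once per assumed carry-in, and the real carry only drives a multiplexer. The function names, the LSB-first bit order and the example delay values are assumptions made for illustration.

```python
def add_block(a_bits, b_bits, cin):
    """Ripple-add one block (LSB first); returns (sum bits, carry_out)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, block_size, cin=0):
    """Carry-select addition over blocks of `block_size` bits."""
    sums = []
    carry = cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        sum0, cout0 = add_block(a_blk, b_blk, 0)   # copy assuming Cin = 0
        sum1, cout1 = add_block(a_blk, b_blk, 1)   # copy assuming Cin = 1
        # 2:1 multiplexer controlled by the actual incoming carry.
        sums += sum1 if carry else sum0
        carry = cout1 if carry else cout0
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 from the text."""
    return p * k1 + (m - 1) * k2


if __name__ == "__main__":
    bits = lambda v: [(v >> i) & 1 for i in range(8)]
    print(carry_select_add(bits(200), bits(100), block_size=4))
    print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```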

Carry Skip Adder (CSA)

It is also called the carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages.

If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

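Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi, ai+1, bi+1, sums si, si+1, carries ci, ci+1, and propagate signals pi, pi+1 combined into the skip signal that selects the carry output]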

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0, else C3 = output carry of the stage 4 adder

Delay of CSA

The worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2

where K1 = delay through one adder cell, K2 = delay for skipping the carry over a block
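The skip condition and the worst-case delay can be sketched as below. The sketch assumes that P and M have the same meaning as for the carry select adder (cells per block and number of blocks), and it compares against a ripple adder delay of N·K1, which is an added assumption rather than a formula from the text. Function names and delay values are hypothetical.

```python
def block_propagate(a_bits, b_bits):
    """Skip signal: true when every bit position in the block propagates."""
    return all(a != b for a, b in zip(a_bits, b_bits))


def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2(P-1)*K1 + (M-2)*K2 from the text."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n_bits, k1):
    """Assumed ripple-carry delay: the carry crosses every adder cell."""
    return n_bits * k1


if __name__ == "__main__":
    print(block_propagate([1, 0, 1, 0], [0, 1, 0, 1]))  # all bits differ
    p, m, k1, k2 = 4, 4, 1.0, 0.5                       # 16-bit example
    print("carry skip:", carry_skip_delay(p, m, k1, k2))
    print("ripple:    ", ripple_delay(p * m, k1))
```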

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier, Braun array, 2's complement multiplication using the Baugh-Wooley method, pipelined multiplier array, modified Booth's algorithm, Wallace tree multipliers, recursive decomposition of multiplication, Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.

Completion time is proportional to log2 n.

Optimized column adder tree.

Combines all partial products into 2 vectors (carry and sum).

The carry and sum outputs are combined using a conventional adder.

Compresses the number of stages of partial products.
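The reduction idea can be sketched with whole partial-product rows held as integers and compressed three at a time by carry-save (3:2) adders. This is a simplified row-wise model for unsigned operands, not a gate-level column tree, and the function names are hypothetical.

```python
def compress_3_to_2(x, y, z):
    """Carry-save adder: three rows in, a sum row and a carry row out."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a, b, n_bits):
    """Multiply unsigned a and b (b < 2**n_bits); returns (product, levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n_bits)]
    levels = 0
    while len(rows) > 2:
        next_rows = []
        full_groups = len(rows) // 3
        for g in range(full_groups):
            next_rows.extend(compress_3_to_2(*rows[3 * g:3 * g + 3]))
        next_rows.extend(rows[3 * full_groups:])   # leftover rows pass through
        rows = next_rows
        levels += 1
    # Final sum and carry vectors go through a conventional adder.
    return sum(rows), levels


if __name__ == "__main__":
    print(wallace_multiply(200, 123, 8))
```

For 8 partial products the row count shrinks 8, 6, 4, 3, 2, so only four compressor levels precede the final adder.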

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop, two-phase clocking, dynamic shift register, RAM, ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools.

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.

Generate and verify each section of the design.

Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Page 37: Chp 3- Sub-system Design

Pseudo NMOS Logic

Reducing the noof inputs from N to 1 of Static CMOS Logic is Pseudo-NMOS Logic

VDD

F

In1In2

InN

In1In2

InN

PUN

PDN

helliphellip

Static CMOS

Pseudo NMOS Operation The pulldown network of the gate is the same as for a

fully complementary gate The pullup network is replaced by a single p-type

transistor whose gate is connected to VSS leaving the transistor permanently on

The p-type transistor is used as a resistor When the gate input is 0V both n-type transistors are

off and the p-type transistor pulls the gatersquos output up to VDD

When the gate input is Vdd both the p-type and n-type transistor are on and both are operating to determine the gatersquos output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation Precharge

When CLK goes low the p-type transistor starts charging the precharge capacitance

The pulldown transistors controlled by the clock keep that precharge node from being drained

The length of CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate When CLK goes high precharging stops ie the p-type pullup turns off The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from 0 to 1 and back to 0 it will inadvertently discharge the precharge

capacitance If the inputs create a conducting path through the pulldown network the

precharge capacitance is discharged forcing its gatersquos output to 0 If input is not 1 then the gatersquos output would be left charged at logic 1

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]

BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate]

Structured Design

Designing the digital block or standard cell at the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain
Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

Adder is made up of standard cells of:
Multiplexers: L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact: L = 22λ, W = 10λ
Buried contact: L = 30λ, W = 10λ
ii) CMOS inverter: L = 35λ, W = 18λ
Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signals

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
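The three carry cases can be checked with a few lines of Python. This is a minimal sketch; the function names pgk and carry_out are hypothetical and chosen only for illustration.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                # both inputs 1: carry out is 1 regardless of C
    p = a ^ b                # exactly one input 1: carry out equals C
    k = (1 - a) & (1 - b)    # both inputs 0: carry out is 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder expressed through G and P."""
    g, p, _ = pgk(a, b)
    return g | (p & c)
```

For any pair of input bits exactly one of G, P and K is 1, which is what lets a carry chain be built from three mutually exclusive switches per bit.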

Manchester Carry Chain

Built from switch logic using the propagate, generate & kill signals

In Manchester chain adders, the carry input is precharged with the clock signal instead of passing through the logic

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3, and Cout]
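The cascade of full adders can be sketched behaviourally as follows. The function names and the least-significant-bit-first list convention are assumptions made for this illustration.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    The carry produced by each cell feeds the next one, so the
    critical path runs from cin through every cell to the final carry.
    """
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry
```

For example, ripple_add([1, 0, 1, 0], [1, 1, 0, 0]) adds 5 and 3 and returns ([0, 0, 0, 1], 0), i.e. 8 with no carry out; the carry has to ripple through all four cells before S4 is valid.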

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel

Output carry is represented in terms of generate and propagate signals

The expressions for a 4-bit CLA are
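The expressions themselves did not survive in this transcript. The standard textbook forms, written with the G and P signals defined earlier and agreeing with the 1-bit case C1 = G0 + P0·C0, are given here as a reconstruction rather than as the original slide content:

```latex
\begin{aligned}
G_i &= A_i B_i, \qquad P_i = A_i \oplus B_i, \qquad S_i = P_i \oplus C_i \\
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the operand bits and C0, so all four can be evaluated at the same time instead of waiting for the previous stage.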

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks:
One block computed with Cin = 0
Another block computed with Cin = 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer
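The selection scheme can be sketched behaviourally as below. This is an illustration under assumptions: the function names, the block size of 4 and the least-significant-bit-first list convention are not taken from the slides.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block (bits least significant first)."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Carry-select addition: each block is computed for both carry-in values."""
    sums = []
    carry = cin
    for i in range(0, len(a_bits), block):
        a_blk = a_bits[i:i + block]
        b_blk = b_bits[i:i + block]
        s0, c0 = ripple_block(a_blk, b_blk, 0)   # block assuming Cin = 0
        s1, c1 = ripple_block(a_blk, b_blk, 1)   # block assuming Cin = 1
        # 2:1 multiplexer: the actual carry in picks one precomputed result
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s_sel)
    return sums, carry
```

In hardware the two versions of every block are computed in parallel, so once the real carry arrives only the multiplexer delay is added per block.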

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
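As a worked example of the formula, assume (hypothetically) a 16-bit adder split into M = 4 blocks of P = 4 cells, with normalized delays K1 = K2 = 1 unit. These numbers are chosen only for illustration:

```latex
T_{\text{select}} = P K_1 + (M-1) K_2 = 4(1) + (4-1)(1) = 7 \text{ units}
\qquad\text{versus}\qquad
T_{\text{ripple}} \approx n K_1 = 16(1) = 16 \text{ units}
```

Under these assumptions the carry select adder is more than twice as fast as a plain 16-cell ripple chain, at the cost of duplicating the adder blocks.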

Carry Skip Adder (CSA)

It is also called a Carry Bypass Adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages
If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

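Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sums si and si+1, propagate signals pi and pi+1, carries ci and ci+1, and a skip signal selecting the output carry]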

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0
else C3 = output carry of the stage-4 adder
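The skip condition can be sketched behaviourally as below. The function name, the block size of 4 and the least-significant-bit-first list convention are assumptions for this illustration; a behavioural model cannot show the speed gain, only that the bypass is logically safe.

```python
def carry_skip_add(a_bits, b_bits, cin=0, block=4):
    """Carry-skip addition over bit lists (least significant bit first)."""
    sums = []
    carry = cin
    for i in range(0, len(a_bits), block):
        a_blk = a_bits[i:i + block]
        b_blk = b_bits[i:i + block]
        block_cin = carry
        # Skip signal: every bit of the block propagates (Ai != Bi)
        skip = all(a ^ b for a, b in zip(a_blk, b_blk))
        for a, b in zip(a_blk, b_blk):
            sums.append(a ^ b ^ carry)
            carry = (a & b) | (carry & (a ^ b))
        if skip:
            # Bypass path: the block's carry out equals its carry in, so the
            # carry does not have to wait for the ripple through the block.
            carry = block_cin
    return sums, carry
```

When the skip signal is 1 the rippled carry and the bypassed carry always agree, which is why the bypass multiplexer can forward the carry input without changing the result.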

Delay of CSA

Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2, where
P = number of cells in each block
M = number of blocks in the adder
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
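As a worked example of the formula, assume (hypothetically) M = 4 blocks of P = 4 cells each and normalized delays K1 = K2 = 1 unit. These values are chosen only for illustration:

```latex
T_{\text{skip}} = 2(P-1) K_1 + (M-2) K_2 = 2(4-1)(1) + (4-2)(1) = 8 \text{ units}
```

The first term covers rippling through the first and last blocks; the second covers skipping over the M - 2 blocks in between.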

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
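The shift-and-add structure named above can be sketched as follows. This is a minimal behavioural illustration for unsigned operands; the function name and the default width n = 4 are assumptions.

```python
def shift_add_multiply(multiplicand, multiplier, n=4):
    """Unsigned n-bit multiplication by shift and add."""
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:
            # Bit i of the multiplier is 1: add the multiplicand shifted by i
            product += multiplicand << i
    return product
```

Each loop iteration corresponds to one partial product; a serial multiplier spends one cycle per iteration, while array and tree multipliers sum all partial products in hardware at once.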

Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
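The column compression idea can be sketched as below. This is a simplified Wallace-style reduction that uses only full adders (3:2 compressors); a real Wallace tree also uses half adders and may need fewer stages. The function name and the default width are assumptions.

```python
def wallace_multiply(x, y, n=4):
    """Multiply two unsigned n-bit integers by column compression."""
    # columns[w] holds the partial-product bits of weight 2**w
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    # Compress until every column holds at most two bits
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:
                a, b, c = col[k:k + 3]
                nxt[w].append(a ^ b ^ c)                          # sum bit
                nxt[w + 1].append((a & b) | (b & c) | (a & c))    # carry bit
                k += 3
            nxt[w].extend(col[k:])    # leftover bits pass through unchanged
        columns = nxt

    # Two remaining vectors, combined by a conventional adder
    row0 = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row0 + row1
```

All full adders within one pass of the while loop work in parallel in hardware, so the delay grows with the number of passes rather than with the number of partial products.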

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 38: Chp 3- Sub-system Design

Pseudo NMOS Operation
The pulldown network of the gate is the same as for a fully complementary gate
The pullup network is replaced by a single p-type transistor whose gate is connected to VSS, leaving the transistor permanently on
The p-type transistor is used as a resistor
When the gate input is 0 V, both n-type transistors are off and the p-type transistor pulls the gate's output up to VDD
When the gate input is VDD, both the p-type and n-type transistors are on, and both are operating to determine the gate's output voltage

Pseudo NMOS Characteristics

Pseudo NMOS VTC

[Figure: voltage transfer characteristic, Vout [V] versus Vin [V], for W/Lp = 4, 2, 1, 0.5 and 0.25]

Advantages and Disadvantages

Advantages
The main advantage of the pseudo-nMOS gate is the small size of the pullup network, both in terms of number of devices and wiring complexity

Disadvantages
Due to the higher pull-up resistance, delay is greater and hence circuit speed is lower
More static power dissipation due to the conduction path between VDD and VSS

Dynamic CMOS Logic
The static power dissipation of pseudo-NMOS logic motivates an alternative: dynamic CMOS logic
It avoids static power dissipation and adds a clock input for precharge and conditional evaluation phases

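[Figure: dynamic gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp, evaluate transistor Me, clock Clk, output Out and load capacitance CL]

Two-phase operation: Precharge (CLK = 0), Evaluate (CLK = 1)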

Dynamic CMOS Logic operation
In static circuits, at every point in time (except when switching) the output is connected to either GND or VDD via a low-resistance path
A fan-in of n requires 2n (n N-type + n P-type) devices
Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes
A fan-in of n requires only n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance
The pulldown transistor controlled by the clock keeps the precharge node from being drained
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0
If the inputs do not create a conducting path, the gate's output is left charged at logic 1

Conditions on Output
Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation
Inputs to the gate can make at most one transition during evaluation
Output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL

[Figure: dynamic gate with PDN inputs A, B and C, precharge transistor Mp and evaluate transistor Me, computing Out = ((A·B)+C)'. Precharge: Mp on, Me off, Out = 1. Evaluate: Mp off, Me on.]
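The two-phase behaviour and the conditions on the output can be illustrated with a small behavioural model. This is a sketch only: the class name DynamicGate and its step method are hypothetical, and charge leakage and charge sharing are ignored.

```python
class DynamicGate:
    """Behavioural model of an n-type dynamic gate."""

    def __init__(self, pdn):
        self.pdn = pdn    # callable: truthy when the pull-down network conducts
        self.out = 1      # value stored on the load capacitance CL

    def step(self, clk, *inputs):
        if clk == 0:
            self.out = 1              # precharge: Mp on, Me off
        elif self.pdn(*inputs):
            self.out = 0              # evaluate: Me on, PDN conducts, CL discharges
        # if the PDN is off during evaluate, CL simply holds its value
        return self.out


# Pull-down network of the example gate: conducts when (A AND B) OR C
gate = DynamicGate(lambda a, b, c: (a and b) or c)
```

With this model, gate.step(0, 0, 0, 0) precharges the output to 1, gate.step(1, 1, 1, 0) discharges it to 0, and a following gate.step(1, 0, 0, 0) still returns 0: once discharged, the output stays low until the next precharge, which is why an input glitch during evaluation corrupts the result.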

Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn, giving a low noise margin (NML)

Advantages and Disadvantages

Advantages
Low area
Higher speed than static complementary gates

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases
Requires introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power

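Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with Mp, Me and Clk, with input In and outputs Out1 and Out2; waveforms of Clk, In, Out1 and Out2 (voltage V versus time t) with VTn marked]

Only 0 -> 1 transitions allowed at inputs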

Cascading Dynamic Gates: Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge

The PDN of the second dynamic inverter turns off when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

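Domino CMOS Logic

[Figure: domino gate with inputs In1, In2, In3 driving the PDN, precharge transistor Mp, evaluate transistor Me, clock Clk, and a static inverter producing Out1; the dynamic node makes 1 -> 1 or 1 -> 0 transitions]

Domino logic is the combination of dynamic logic and a static inverter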

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off
The evaluation phase begins, i.e. the n-type pulldown turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1
If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0



Page 40: Chp 3- Sub-system Design

Pseudo NMOS VTC

00 05 10 15 20 2500

05

10

15

20

25

30

Vin [V]

Vou

t [V

]

WLp = 4

WLp = 2

WLp = 1

WLp = 025

WLp = 05

Advantages and Disadvantages

Advantages

Main advantage of the pseudo-nMOS gate is the small

size of the pullup network both in terms of number of

devices and wiring complexity

Disadvantages

Due to more pull-up resistance delay is more and

hence speed of circuits is less

More Static power dissipation due to conduction path

between VDD and VSS

Dynamic CMOS Logic The disadvantage of

Static Power dissipation in Pseudo NMOS Logic leads for an alternative logic which is Dynamic CMOS Logic

It avoids Static Power dissipation and adds a clock input for precharge and conditional evaluation phases

In1

In2 PDN

In3

Me

Mp

Clk

Clk

Out

CL

Two phase operation Precharge (CLK = 0) Evaluate (CLK = 1)

Dynamic CMOS Logic operation In static circuits at every point in time (except

when switching) the output is connected to either GND or VDD via a low resistance path

fan-in of n requires 2n (n N-type + n P-type) devices

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high impedance nodes

requires on n + 2 (n+1 N-type + 1 P-type) transistors

Dynamic CMOS Logic operation

Precharge

When CLK goes low, the p-type transistor starts charging the precharge capacitance.

The pull-down transistor controlled by the clock is off during this phase, which keeps the precharge node from being drained.

The length of the CLK = 0 phase is chosen so that the storage node is charged to a solid logic 1.

Evaluate

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the clocked n-type pull-down turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the gate's output to 0.

If the inputs do not create such a path, the gate's output is left charged at logic 1.
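The precharge and evaluate phases can be sketched as a small behavioural model in Python. This is only an illustration under simplifying assumptions (no timing, leakage or charge sharing); the class name DynamicGate and the example pull-down function are hypothetical.

```python
from typing import Callable, Iterable, Sequence


class DynamicGate:
    """Behavioural model of an n-type dynamic gate (logic values only)."""

    def __init__(self, pdn: Callable[[Sequence[int]], bool]):
        self.pdn = pdn  # returns True when the pull-down network conducts
        self.out = 1    # value stored on the output capacitance

    def precharge(self) -> None:
        """CLK = 0: the p-type device charges the output node to 1."""
        self.out = 1

    def evaluate(self, input_sequence: Iterable[Sequence[int]]) -> int:
        """CLK = 1: apply successive input vectors within one evaluate phase."""
        for inputs in input_sequence:
            if self.pdn(inputs):
                # Once discharged, the node stays at 0 until the next precharge.
                self.out = 0
        return self.out


def example_pdn(v: Sequence[int]) -> bool:
    """Pull-down network that conducts for (A AND B) OR C."""
    return bool((v[0] and v[1]) or v[2])


if __name__ == "__main__":
    gate = DynamicGate(example_pdn)
    gate.precharge()
    print(gate.evaluate([(0, 0, 0), (1, 0, 0)]))             # path never conducts
    gate.precharge()
    print(gate.evaluate([(0, 0, 0), (0, 0, 1), (0, 0, 0)]))  # C pulses 0 -> 1 -> 0
```

The second call shows why inputs must rise monotonically: a brief pulse on C discharges the node, and returning C to 0 cannot restore the lost charge.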

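Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.

Inputs to the gate can make at most one transition during evaluation.

The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

[Figure: example dynamic gate with inputs A, B and C, precharge transistor Mp, evaluate transistor Me and output Out = ((A·B)+C)'. During precharge Mp is on, Me is off and Out = 1; during evaluation Mp is off and Me is on]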

Properties of Dynamic Gates

The logic function is implemented by the PDN only

Full-swing outputs (VOL = GND and VOH = VDD)

Non-ratioed: sizing of the devices does not affect the logic levels

Faster switching speeds due to reduced load capacitance

Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and the clock carries extra load

The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn

This gives a low noise margin (NML)

Advantages and Disadvantages

Advantages

Low area

Higher speed than static complementary gates

Disadvantages

Precharged gates introduce functional complexity because they must be operated in two distinct phases

They require the introduction of a clock signal

They are also more sensitive to noise

Their clocking signals also consume power and are difficult to turn off to save power

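Cascading Dynamic Gates

[Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluate transistor Me driven by Clk; In drives the first stage, whose output Out1 drives the second stage, which produces Out2. The accompanying waveforms plot Clk, In, Out1 and Out2 as voltage V against time t, with the threshold VTn marked]

Only 0 -> 1 transitions are allowed at the inputs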

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because the input takes a finite propagation delay to discharge Out1 to GND, the second output also starts to discharge.

The PDN of the second dynamic inverter turns off only when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can make only a single 0 -> 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge fixes the problem.

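Domino CMOS Logic

[Figure: domino gate. Inputs In1, In2 and In3 drive the PDN, with precharge transistor Mp and evaluate transistor Me driven by Clk; a static inverter follows the dynamic node and produces Out1. The dynamic node is annotated with the transitions 1 -> 1 and 1 -> 0]

Domino logic is the combination of a dynamic logic stage and a static inverter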

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification of dynamic logic)

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the clocked n-type pull-down turns on.

The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing the storage node to 0 and the gate's output (through the inverter) to 1.

If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented

Smaller area compared to static CMOS

Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0

Very high speed: only the L-H output transition matters, so the static inverter can be skewed

Input capacitance is reduced, giving a smaller logical effort

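Cascading Domino Logic Gates

[Figure: two cascaded domino gates. The first has inputs In1, In2 and In3 and output Out1; the second has inputs In4 and In5 together with Out1, and output Out2. Each stage has its own PDN, precharge transistor Mp, evaluate transistor Me and static inverter, all clocked by Clk. Dynamic nodes are annotated 1 -> 1 and 1 -> 0, outputs 0 -> 0 and 0 -> 1]

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.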
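The way the static inverter makes cascading safe can be illustrated with a short behavioural sketch in Python. It models logic values only, with no timing; the names DominoGate, evaluate_chain and and_pdn are hypothetical and chosen for this example.

```python
from typing import Callable, List, Sequence


class DominoGate:
    """Dynamic n-type stage followed by a static inverter (logic values only)."""

    def __init__(self, pdn: Callable[[Sequence[int]], bool]):
        self.pdn = pdn
        self.node = 1  # precharged dynamic node

    @property
    def out(self) -> int:
        """Static inverter: the output is 0 whenever the node is precharged."""
        return 1 - self.node

    def precharge(self) -> None:
        self.node = 1

    def evaluate(self, inputs: Sequence[int]) -> int:
        if self.pdn(inputs):
            self.node = 0  # node falls 1 -> 0, so the output rises 0 -> 1
        return self.out


def and_pdn(inputs: Sequence[int]) -> bool:
    """Series pull-down network: conducts only when both inputs are 1."""
    return bool(inputs[0] and inputs[1])


def evaluate_chain(gates: List[DominoGate], first_input: int,
                   side_inputs: Sequence[int]) -> List[int]:
    """Evaluate gates in order; each gate sees the previous output and one side input."""
    outputs = []
    signal = first_input
    for gate, side in zip(gates, side_inputs):
        signal = gate.evaluate((signal, side))
        outputs.append(signal)
    return outputs


if __name__ == "__main__":
    chain = [DominoGate(and_pdn) for _ in range(3)]
    for gate in chain:
        gate.precharge()
    print([gate.out for gate in chain])          # every output is 0 after precharge
    print(evaluate_chain(chain, 1, [1, 1, 1]))   # outputs rise one after another
```

Because every stage output starts at 0, a later gate cannot discharge before its predecessor has actually evaluated, which is the falling-domino behaviour that gives the logic family its name.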

Clocked CMOS Logic

It is static CMOS with two additional clocked transistors: the Clk input drives one additional NMOS and the Clk' input drives one additional PMOS.

A gate with fan-in n therefore needs 2n + 2 transistors.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on.

When Clk goes low, the logic is not evaluated because the clocked NMOS is off.

Advantages and Disadvantages

Advantages

Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages

More area

More complexity

N-P CMOS Logic

An elegant solution to the "erroneous evaluation" problem of dynamic CMOS logic is NP domino logic (also called NORA logic).

Stages of N logic alternate with stages of P logic.

N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.

P logic stages use the complement clock, with the P logic tree tied above the output node.

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.

During evaluation, clk is high (-clk is low) and both types of stage evaluate: the N-logic tree conditionally evaluates to ground while the P-logic tree conditionally evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:

Stage 1: X = (A · B)'

Stage 2: G = X' + Y'

Stage 3: Z = (F · G + H)'

[Figure: three cascaded N-P stages with outputs X, G and Z]

BiCMOS Technology

CMOS properties: low power, slower

BJT properties: faster, high power

Implementation: CMOS and BJT combined, with core logic in CMOS and the interface in BJT

Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

[Figures: BiCMOS inverter and NAND gate]

Structured Design

The digital block or standard cell is designed at the following levels: logic level, circuit level, stick diagram, layout

Examples: Combinational Logic

5-way selector

Multiplexers & demultiplexers

Encoders & decoders

Parity generator

Bus arbitration logic

Gray code to binary code converter

Adders

1-bit full adder

Manchester carry chain

Adder enhancement techniques: carry select adders, carry skip adders, carry look-ahead adders

32-bit adders

1-bit full adder

CMOS Full Adder

[Figure: sum output and carry output circuits]

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers: L = 11λ, W = 7λ

i) NMOS inverter (8:1 and 4:1 ratio)

With butting contact: L = 22λ, W = 10λ

With buried contact: L = 30λ, W = 10λ

ii) CMOS inverter: L = 35λ, W = 18λ

Communication paths: 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to the carry:

Generate: Cout = 1 independent of C; G = A · B

Propagate: Cout = C; P = A ⊕ B

Kill: Cout = 0 independent of C; K = ~A · ~B
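The three carry cases can be checked exhaustively with a few lines of Python. This is a minimal sketch; the function names are illustrative, and P is taken as the exclusive OR of A and B.

```python
from itertools import product


def carry_signals(a: int, b: int):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b              # both inputs 1: carry-out is 1 regardless of carry-in
    p = a ^ b              # exactly one input 1: carry-out follows carry-in
    k = (1 - a) & (1 - b)  # both inputs 0: carry-out is 0 regardless of carry-in
    return g, p, k


def carry_out(a: int, b: int, c: int) -> int:
    """Carry-out of a full adder written with generate and propagate."""
    g, p, _ = carry_signals(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a, b, c in product((0, 1), repeat=3):
        print(a, b, c, carry_signals(a, b), carry_out(a, b, c))
```

For every input pair exactly one of G, P and K is 1, which is what lets a switch-based carry chain use them as mutually exclusive controls.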

Manchester Carry Chain

Built from switch logic using propagate, generate & kill

Manchester chain adders

The carry path is precharged by the clock signal instead of being passed through the logic

The carry path is gated by the propagate signal (P = A ^ B) with a single n-type pass transistor

Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

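Carry-Ripple Adder

Simplest design: cascade full adders

The critical path goes from Cin to Cout

Design the full adder to have a fast carry delay

[Figure: 4-bit carry-ripple adder with inputs A1..A4 and B1..B4, sum outputs S1..S4, internal carries C1..C3, carry-in Cin and carry-out Cout]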
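A carry-ripple adder is just a loop over full adders in which each carry-out feeds the next carry-in. The Python sketch below is illustrative only; bit lists are assumed to be least significant bit first, and the helper names are not from the slides.

```python
from typing import List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """Little-endian bit list (index 0 is the least significant bit)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder returning (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout


def ripple_carry_add(a_bits: List[int], b_bits: List[int], cin: int = 0):
    """Cascade full adders; the carry ripples from bit 0 to the top bit."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    sums, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
    print(from_bits(sums), cout)  # 11 + 6 = 17, i.e. sum bits 1 with carry-out 1
```

The loop makes the critical path visible: the final carry depends on every earlier iteration, so delay grows linearly with word length.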

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel.

The output carry is expressed in terms of the generate and propagate signals.

The expressions for a 4-bit CLA are
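For reference, with G_i = A_i · B_i and P_i = A_i ⊕ B_i, the carries of a 4-bit CLA are conventionally written as follows. The indexing is an assumption: bit 0 is taken as the least significant bit and C_0 as the carry-in, which agrees with the form C1 = G0 + P0C0.

```latex
\begin{aligned}
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Each line follows from substituting the previous carry into C_{i+1} = G_i + P_i C_i, so every carry depends only on the operand bits and C_0 and none has to wait for a ripple.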

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional-sum adder.

The adder is divided into two blocks: one block with a logical 0 Cin and another block with a logical 1 Cin.

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
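The selection scheme can be sketched functionally in Python. This is a simplified model rather than a gate-level design: each stage computes both candidate results with a small ripple block and then picks one. The default block size of 4 and all function names are assumptions made for the example.

```python
from typing import List, Tuple


def to_bits(value: int, width: int) -> List[int]:
    """Little-endian bit list (index 0 is the least significant bit)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_block(a_bits: List[int], b_bits: List[int], cin: int) -> Tuple[List[int], int]:
    """Small ripple adder used for both candidate results of a stage."""
    sums, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_add(a_bits: List[int], b_bits: List[int],
                     cin: int = 0, block_size: int = 4) -> Tuple[List[int], int]:
    sums: List[int] = []
    carry = cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        sum0, cout0 = ripple_block(a_blk, b_blk, 0)  # block assuming Cin = 0
        sum1, cout1 = ripple_block(a_blk, b_blk, 1)  # block assuming Cin = 1
        # 2:1 multiplexer: the actual carry-in selects one precomputed result.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        sums.extend(blk_sum)
    return sums, carry


if __name__ == "__main__":
    sums, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
    print(from_bits(sums), cout)  # 300 = 256 + 44, i.e. sum bits 44 with carry-out 1
```

In hardware both candidates of every stage are computed at the same time, so only the multiplexer sits on the carry path between stages; the sequential loop here models the function, not the timing.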

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2, where

M = number of blocks in the adder

P = number of cells in each block

K1 = delay through one adder cell

K2 = delay through the multiplexer
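A worked example of this delay formula in Python follows. The delay values K1 = 1.0 and K2 = 0.5 are arbitrary illustrative units, not figures from the slides, and the ripple comparison assumes one K1 per bit.

```python
def carry_select_delay(num_blocks: int, cells_per_block: int,
                       k1: float, k2: float) -> float:
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return cells_per_block * k1 + (num_blocks - 1) * k2


def ripple_delay(num_bits: int, k1: float) -> float:
    """Assumed reference: a plain ripple adder with one cell delay per bit."""
    return num_bits * k1


if __name__ == "__main__":
    # 16-bit adder built as M = 4 blocks of P = 4 cells
    print(carry_select_delay(4, 4, k1=1.0, k2=0.5))  # 4*1.0 + 3*0.5 = 5.5
    print(ripple_delay(16, k1=1.0))                  # 16.0
```

Only the first block contributes its ripple delay; every further block adds a single multiplexer delay.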

Carry Skip Adder (CSA)

It is also called a carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

It is typically organized into m-bit stages.

If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output.

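Simplified Carry-Skip Adder Logic

[Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 that form the skip signal selecting the outgoing carry]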

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0,

else C3 = output carry of the stage-4 adder
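The skip rule can be sketched in Python for one m-bit stage. This is a functional model with illustrative names; it shows that the bypassed carry always equals the carry the ripple path would eventually produce, so the bypass changes only the timing.

```python
from typing import List, Tuple


def carry_skip_block(a_bits: List[int], b_bits: List[int],
                     cin: int) -> Tuple[List[int], int, int]:
    """Return (sum bits, carry-out, skip signal) for one stage, LSB first."""
    sums = []
    carry = cin
    skip = 1  # AND of all propagate signals in the stage
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        skip &= p
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)
    # Bypass multiplexer: if every bit propagates, forward the carry-in directly.
    cout = cin if skip else carry
    return sums, cout, skip


if __name__ == "__main__":
    # A = 0101 and B = 1010 (LSB first): every bit propagates, so the carry skips
    print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], cin=1))
```

When the skip signal is 1 the stage's carry-out is available after one multiplexer delay instead of after m full-adder delays.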

Delay of CSA

The worst-case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2, where

K1 = delay through one adder cell

K2 = delay for skipping the carry over a block
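The worst-case formula can be evaluated for a sample configuration as below. This sketch assumes M and P keep the meaning they have in the carry-select delay formula (number of blocks and cells per block); the delay values are arbitrary illustrative units.

```python
def carry_skip_delay(num_blocks: int, cells_per_block: int,
                     k1: float, k2: float) -> float:
    """T = 2*(P - 1)*K1 + (M - 2)*K2 for M blocks of P cells each."""
    return 2 * (cells_per_block - 1) * k1 + (num_blocks - 2) * k2


if __name__ == "__main__":
    # 16-bit adder built as M = 4 blocks of P = 4 cells, K1 = 1.0, K2 = 0.5
    print(carry_skip_delay(4, 4, k1=1.0, k2=0.5))  # 2*3*1.0 + 2*0.5 = 7.0
```

The two ripple terms correspond to the first block, where the worst-case carry is created, and the last block, where it is absorbed; the blocks in between are skipped.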

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder.

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier

Braun array

2's complement multiplication using the Baugh-Wooley method

Pipelined multiplier array

Modified Booth's algorithm

Wallace tree multipliers

Recursive decomposition of multiplication

Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay

Completion time is proportional to log2 n

Optimized column adder tree

Combines all partial products into 2 vectors (carry and sum)

The carry and sum outputs are combined using a conventional adder

Reduces the number of stages of partial products
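The reduction idea can be sketched at word level in Python: groups of three partial-product rows are compressed into a sum row and a carry row until only two rows remain, which are then added conventionally. This is a simplification of a real Wallace tree, which works column by column with full and half adders; the function names are illustrative and unsigned operands are assumed.

```python
from typing import List, Tuple


def partial_products(a: int, b: int, n: int) -> List[int]:
    """One shifted copy of a for every set bit of the n-bit multiplier b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 compressor applied bitwise: x + y + z == sum_row + carry_row."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def wallace_multiply(a: int, b: int, n: int) -> Tuple[int, int]:
    """Return (product, number of reduction levels) for n-bit unsigned operands."""
    rows = partial_products(a, b, n)
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows: List[int] = []
        for i in range(0, full, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])  # leftover rows pass through unchanged
        rows = next_rows
        levels += 1
    return sum(rows), levels  # final step: a conventional carry-propagate add


if __name__ == "__main__":
    print(wallace_multiply(13, 11, 4))  # (143, 2)
```

Each level shrinks the row count by roughly a factor of 3/2 without propagating any carry, which is why the number of levels grows only logarithmically with the operand width.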

Wallace Tree Multiplier Structure



(carry and sum) Carry and sum outputs combined using a

conventional adder Compresses the no of stages of partial products

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop Two Phase Clocking Dynamic Shift Register RAM ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 43: Chp 3- Sub-system Design

Dynamic CMOS Logic operation

In static circuits, at every point in time (except when switching), the output is connected to either GND or VDD via a low-resistance path.
A fan-in of n requires 2n devices (n N-type + n P-type).

Dynamic circuits rely on the temporary storage of signal values on the capacitance of high-impedance nodes.
A fan-in of n requires only n + 2 transistors (n + 1 N-type + 1 P-type).
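The device counts above can be tabulated with a small sketch. The function names are illustrative only; the formulas are the 2n and n + 2 counts stated on the slide.

```python
def static_cmos_transistors(fan_in: int) -> int:
    """Complementary static CMOS: n NMOS in the pull-down, n PMOS in the pull-up."""
    return 2 * fan_in


def dynamic_cmos_transistors(fan_in: int) -> int:
    """Dynamic CMOS: n NMOS in the PDN, one NMOS evaluate device, one PMOS precharge device."""
    return fan_in + 2


# Print fan-in, static count and dynamic count side by side.
for n in (2, 3, 4, 8):
    print(n, static_cmos_transistors(n), dynamic_cmos_transistors(n))
```

The saving grows with fan-in, which is why dynamic gates are attractive for wide gates.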

Dynamic CMOS Logic operation

Precharge
When CLK goes low, the p-type transistor starts charging the precharge capacitance.
The pulldown transistor controlled by the clock is off, which keeps the precharge node from being drained.
The length of the CLK = 0 phase is adjusted to ensure that the storage node is charged to a solid logic 1.

Evaluate
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing the gate's output to 0.
If the inputs do not create such a path, the gate's output is left charged at logic 1.

Conditions on Output

Once the output of a dynamic gate is discharged, it cannot be charged again until the next precharge operation.
Inputs to the gate can make at most one transition during evaluation.
The output can be in the high-impedance state during and after evaluation (PDN off); the state is stored on CL.

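Figure: dynamic gate with precharge transistor Mp, evaluate transistor Me and a pull-down network with inputs A, B and C. Precharge (Clk = 0): Mp on, Me off, Out = 1. Evaluate (Clk = 1): Mp off, Me on, Out = ((A·B)+C)'.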
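The precharge/evaluate behaviour and the output conditions can be illustrated with a purely behavioural sketch. The class and function names are hypothetical, and the model assumes ideal binary levels with no leakage or charge sharing; the pull-down network is taken to be A·B + C, as in the gate pictured above.

```python
class DynamicGate:
    """Behavioural model of a dynamic gate whose PDN conducts when A·B + C is true."""

    def __init__(self):
        self.clk = 0
        self.out = 1

    def precharge(self):
        """Clk = 0: Mp on, Me off, the output node is charged to 1."""
        self.clk = 0
        self.out = 1

    def start_evaluate(self):
        """Clk = 1: Mp off, Me on."""
        self.clk = 1

    def apply(self, a, b, c):
        """Apply inputs; during evaluate a conducting PDN discharges the node."""
        if self.clk == 1 and ((a and b) or c):
            self.out = 0  # cannot be recharged until the next precharge
        return self.out


def run_sequence(inputs):
    """Precharge once, then apply each (a, b, c) tuple in order during evaluate."""
    gate = DynamicGate()
    gate.precharge()
    gate.start_evaluate()
    return [gate.apply(a, b, c) for a, b, c in inputs]


# C pulses 0 -> 1 -> 0: the node is discharged and stays at 0.
print(run_sequence([(0, 0, 0), (0, 0, 1), (0, 0, 0)]))
```

The last call shows why inputs may make at most one transition during evaluation: the glitch on C permanently destroys the stored 1 for that cycle.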

Properties of Dynamic Gates

Logic function is implemented by the PDN only.
Full-swing outputs (VOL = GND and VOH = VDD).
Non-ratioed: sizing of the devices does not affect the logic levels.
Faster switching speeds due to reduced load capacitance.
Overall power dissipation is usually higher than static CMOS: there is no glitching, but transition probabilities are higher and there is extra load on Clk.
The PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are all equal to VTn. This gives a low noise margin (NML).

Advantages and Disadvantages

Advantages
Low area.
Higher speed than static complementary gates.

Disadvantages
Precharged gates introduce functional complexity because they must be operated in two distinct phases.
They require the introduction of a clock signal.
They are also more sensitive to noise.
Their clocking signals also consume power and are difficult to turn off to save power.

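Cascading Dynamic Gates

Figure: two cascaded dynamic inverters, each with a precharge transistor Mp and an evaluate transistor Me clocked by Clk. In drives the first gate, Out1 drives the second gate, and the second gate produces Out2. The accompanying waveform plots Clk, In, Out1 and Out2 as voltage V against time t, with the threshold VTn marked.

Only 0 -> 1 transitions are allowed at inputs.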

Cascading Dynamic Gates - Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, because there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.
The PDN of the second dynamic inverter turns off when Out1 reaches VTn.
Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.
Setting all inputs of the second gate to 0 during precharge fixes the problem.
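A minimal sketch of this race, assuming a purely binary model with one unit of delay per gate (the function name and the unit-delay assumption are illustrative, not from the slides). In the real circuit Out2 loses only part of its charge before Out1 falls below VTn; the binary model exaggerates this to a full discharge so that the failure is visible.

```python
def cascade_dynamic_inverters(inp, steps=3):
    """Two cascaded dynamic inverters evaluated with unit gate delay.

    Returns (out1, out2) at the end of the evaluate phase.
    """
    out1, out2 = 1, 1  # both output nodes are precharged to 1
    for _ in range(steps):
        prev_out1 = out1  # the value gate 2 sees during this time step
        if inp == 1:
            out1 = 0  # gate 1 discharges after one delay
        if prev_out1 == 1:
            out2 = 0  # gate 2 already sees a 1 and discharges; it cannot recover
    return out1, out2


# Ideal logic for In = 1 would give Out1 = 0, Out2 = 1.
print(cascade_dynamic_inverters(1))
```

With In = 1 the model ends with both nodes at 0, whereas two ideal inverters would give Out2 = 1: the precharged 1 on Out1 acted as an unwanted 1 -> 0 input transition for the second gate.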

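Domino CMOS Logic

Figure: domino gate, consisting of a dynamic stage (precharge transistor Mp, pull-down network PDN with inputs In1, In2 and In3, evaluate transistor Me, clocked by Clk) followed by a static inverter that drives Out1. The dynamic node can only make 1 -> 1 or 1 -> 0 transitions.

Domino logic is the combination of a dynamic logic stage and a static inverter.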

Domino CMOS Logic operation

Precharge phase (same as dynamic logic)

Evaluate phase (modification to dynamic logic)
When CLK goes high, precharging stops, i.e. the p-type pullup turns off.
The evaluation phase begins, i.e. the n-type pulldown turns on.
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.
If the inputs create a conducting path through the pulldown network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.
If the inputs do not create such a path, the storage node is left charged at logic 1 and the gate's output is 0.

Properties of Domino Logic

Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, since only the L-H transition matters.
Input capacitance is reduced, giving a smaller logical effort.

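Cascading Domino Logic Gates

Figure: two cascaded domino gates, each with its own Mp and Me clocked by Clk. The first gate has a PDN with inputs In1, In2 and In3 and drives Out1 through its static inverter. Out1 feeds the PDN of the second gate together with In4 and In5, and the second gate drives Out2. The dynamic nodes make 1 -> 1 or 1 -> 0 transitions and the inverter outputs make 0 -> 0 or 0 -> 1 transitions.

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.
Hence the only possible transition during evaluation is 0 -> 1.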
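The "falling dominoes" behaviour can be sketched with a unit-delay behavioural model. This is only an illustration under stated assumptions: the function name is hypothetical, each stage is taken to be a 2-input domino AND gate, and each gate switches one time step after its inputs.

```python
def domino_chain(first_input, extra_inputs):
    """Chain of 2-input domino AND gates with unit gate delay.

    Gate k ANDs the previous gate's output (or first_input for k = 0)
    with extra_inputs[k]. Returns the inverter outputs after each time step.
    """
    n = len(extra_inputs)
    nodes = [1] * n  # dynamic nodes after precharge
    outs = [0] * n   # static inverter outputs after precharge
    history = [tuple(outs)]
    for _ in range(n):
        seen = [first_input] + outs[:-1]  # inputs as they stood at the start of the step
        for k in range(n):
            if seen[k] == 1 and extra_inputs[k] == 1:
                nodes[k] = 0  # PDN conducts, dynamic node discharges
        outs = [1 - node for node in nodes]
        history.append(tuple(outs))
    return history


for step in domino_chain(1, [1, 1, 1]):
    print(step)
```

Every output starts at 0 and can only rise, so each stage sees only 0 -> 1 input transitions, in contrast to directly cascaded dynamic gates.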

Clocked CMOS Logic

It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS

A gate with fan-in n requires 2n + 2 transistors.
When Clk goes high, the logic of the circuit is evaluated, because the clocked NMOS is on.
When Clk goes low, the logic is not evaluated, because the clocked NMOS is off.

Advantages and Disadvantages

Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages
More area.
More complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic).

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use the complement clock, with the P logic tree tied above the output node.
During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd.
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation: the N-logic tree conditionally pulls its output to ground while the P-logic tree conditionally pulls its output to Vdd.
Inverter outputs can be used to feed other N-blocks from N-blocks, or other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'

Figure: three cascaded stages with outputs X, G and Z.

BiCMOS Technology

CMOS properties: low power, slower.
BJT properties: faster, high power.
Implementation: CMOS & BJT. Core logic: CMOS. Interface: BJT.
Basic circuits: inverter, NAND gate, NOR gate.

BiCMOS Circuits

Figure: BiCMOS inverter and NAND gate.

Structured Design

Designing the digital block or standard cell at the following levels: logic level, circuit level, stick diagram, layout.

Examples: Combinational Logic

5-way selector
Multiplexers & Demultiplexers
Encoders & Decoders
Parity Generator
Bus Arbitration Logic
Gray Code to Binary Code Converter

Adders

1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques: Carry Select Adders, Carry Skip Adders, Carry Look-ahead Adders
32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder

The adder is made up of standard cells of:
Multiplexers - L = 11λ, W = 7λ
i) NMOS inverter (8:1 and 4:1 ratio)
   Butting contact - L = 22λ, W = 10λ
   Buried contact - L = 30λ, W = 10λ
ii) CMOS inverter - L = 35λ, W = 18λ
Communication paths - 3λ spacing between metal contacts

So for the adder standard cell: L = 190λ, W = 150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
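As a quick check of these definitions, the sketch below computes G, P and K for one bit position and confirms that Cout = G + P·C matches the carry of a full adder for every input combination. The function names are illustrative.

```python
from itertools import product


def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def carry_out(a, b, c):
    """Carry out expressed through generate and propagate: Cout = G + P·C."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


for a, b, c in product((0, 1), repeat=3):
    g, p, k = pgk(a, b)
    assert g + p + k == 1                      # exactly one of G, P, K is active
    assert carry_out(a, b, c) == (a + b + c) // 2
```

Exactly one of the three signals is 1 for any pair of input bits, which is what lets a switch-logic carry chain drive each carry node from a single source.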

Manchester Carry Chain

Built from switch logic using propagate, generate & kill.

Manchester chain adders
The carry input is precharged with the clock signal instead of passing through the logic.
The carry path is gated by the propagate signal (P = A^B) with a single n-type pass transistor.
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

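Carry-Ripple Adder

Simplest design: cascade full adders.
Critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and carry output Cout.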
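A bit-level sketch of the cascade, assuming bit lists ordered least significant bit first (the function names and the list representation are choices made for this example):

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first) by cascading full adders."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next stage
        sums.append(s)
    return sums, carry


# 11 + 5 = 16: every sum bit is 0 and the carry ripples through all four stages.
print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0]))
```

The loop makes the critical path explicit: each stage has to wait for the carry of the stage before it, so the delay grows linearly with the word length.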

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.
The output carry is represented in terms of generate and propagate signals.
The expressions for the 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
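Expanding the recurrence behind C1 = G0 + P0·C0 (that is, C(i+1) = Gi + Pi·Ci with Gi = Ai·Bi and Pi = Ai ⊕ Bi) gives the usual two-level expressions for a 4-bit look-ahead unit. The sketch below writes them out and checks them exhaustively against ordinary addition; the function names and the LSB-first bit lists are assumptions of this example, not taken from the slides.

```python
def to_bits(value, width):
    """Integer -> list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (LSB first) -> integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def cla_4bit(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead adder: all carries come directly from G, P and C0."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]
    return sums, c4


# Exhaustive check against integer addition.
for a in range(16):
    for b in range(16):
        for c in (0, 1):
            s, c4 = cla_4bit(to_bits(a, 4), to_bits(b, 4), c)
            assert from_bits(s) + (c4 << 4) == a + b + c
```

None of the carry expressions depends on another carry, only on the G and P signals and C0, which is what allows them to be evaluated in parallel.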

Carry Select Adder

It is also referred to as the conditional sum adder.
The adder is divided into two blocks: one block with logical 0 as Cin, another block with logical 1 as Cin.
S and Cout are selected by the actual Cin using a 2:1 multiplexer.

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2
where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
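A behavioural sketch of the selection scheme together with the delay formula. It assumes LSB-first bit lists and, for simplicity, computes both candidate results for every block including the first; the function names and the example delay values are illustrative.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block; returns (sum_bits, carry_out)."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, block_size, cin=0):
    """Each block is added twice (Cin = 0 and Cin = 1); the real carry picks one."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        s0, c0 = ripple_block(a_blk, b_blk, 0)
        s1, c1 = ripple_block(a_blk, b_blk, 1)
        # 2:1 multiplexer controlled by the actual incoming carry
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s_sel)
    return sums, carry


def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 with P cells per block and M blocks."""
    return p * k1 + (m - 1) * k2


a_bits = [(200 >> i) & 1 for i in range(8)]
b_bits = [(100 >> i) & 1 for i in range(8)]
print(carry_select_add(a_bits, b_bits, block_size=4))
print(carry_select_delay(p=4, m=2, k1=1.0, k2=0.5))
```

In hardware the two candidate additions of a block run at the same time, so only one block's ripple delay plus one multiplexer delay per later block appears on the critical path.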

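Carry Skip Adder (CSA)

It is also called the carry bypass adder.
Looks for cases in which the carry-out of a set of bits is identical to the carry-in.
Typically organized into m-bit stages.
If Ai ≠ Bi for every bit in the stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

Figure: two full adders (FA) for bits i and i+1, with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, and propagate signals pi and pi+1 that are combined into the skip signal selecting the outgoing carry.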

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0
else C3 = output carry of the stage-4 adder

Delay of CSA

Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2
where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
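The skip condition and the delay formula can be sketched as follows. The code is a functional model only (it cannot show the speed-up itself); it assumes LSB-first bit lists, takes P as the number of cells per block and M as the number of blocks, and uses illustrative function names and delay values.

```python
def carry_skip_add(a_bits, b_bits, block_size, cin=0):
    """Block-wise ripple addition with a bypass when every bit propagates."""
    sums, carry = [], cin
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        block_cin = carry
        skip = all(a ^ b for a, b in zip(a_blk, b_blk))  # P0·P1·...·P(m-1)
        for a, b in zip(a_blk, b_blk):
            sums.append(a ^ b ^ carry)
            carry = (a & b) | (carry & (a ^ b))
        if skip:
            # Bypass: same value as the rippled carry, but available earlier.
            carry = block_cin
    return sums, carry


def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2(P-1)*K1 + (M-2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """n cascaded adder cells, for comparison."""
    return n * k1


print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5), ripple_delay(16, 1.0))
```

With these example values a 16-bit adder built from four 4-bit blocks has a worst-case delay of 7 units against 16 for a plain ripple chain.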

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder.
Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder.

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay.
Completion time is proportional to log2 n.
Optimized column adder tree.
Combines all partial products into 2 vectors (carry and sum).
Carry and sum outputs are combined using a conventional adder.
Compresses the number of stages of partial products.
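The reduction can be sketched at word level with carry-save (3:2) adders. This is a simplified illustration, not a gate-accurate Wallace tree: whole partial-product rows are compressed three at a time using Python integers, and the function names are made up for the example.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied bitwise: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (y & z) | (x & z)) << 1
    return s, c


def wallace_multiply(a, b, width=8):
    """Multiply two unsigned width-bit numbers; returns (product, reduction levels)."""
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(partials) > 2:
        nxt = []
        full = len(partials) - len(partials) % 3
        for i in range(0, full, 3):
            nxt.extend(carry_save_add(*partials[i:i + 3]))  # 3 rows -> 2 rows
        nxt.extend(partials[full:])  # leftover rows pass to the next level
        partials = nxt
        levels += 1
    # The remaining sum and carry vectors go through one conventional adder.
    return sum(partials), levels


print(wallace_multiply(200, 123))
```

For eight partial products the row count shrinks 8 -> 6 -> 4 -> 3 -> 2, so four compressor levels are needed before the final adder, consistent with the logarithmic growth stated above.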

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop, two-phase clocking, dynamic shift register, RAM, ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed with a top-down approach with the help of CAD tools.
Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems.
Generate and verify each section of the design.
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area.

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 45: Chp 3- Sub-system Design

Conditions on Output Once the output of a

dynamic gate is discharged it cannot be charged again until the next precharge operation

Inputs to the gate can make at most one transition during evaluation

Output can be in the high impedance state during and after evaluation (PDN off) state is stored on CL

Out

Clk

Clk

A

BC

Mp

Me

on

off

1

off

on

((AB)+C)

Properties of Dynamic Gates Logic function is implemented by the PDN only Full swing outputs (VOL = GND and VOH = VDD) Non-ratioed sizing of the devices does not affect the logic

levels Faster switching speeds due to reduced load capacitance Overall power dissipation usually higher than static CMOS

no glitching higher transition probabilities extra load on Clk

PDN starts to work as soon as the input signals exceed VTn so VM VIH and VIL equal to VTn

low noise margin (NML)

Advantages and Disadvantages Advantages

low area higher speed than static complementary gates

Disadvantages Precharged gates introduce functional complexity

because they must be operated in two distinct phases

Requires introduction of a clock signal They are also more sensitive to noise Their clocking signals also consume power and are

difficult to turn off to save power

Cascading Dynamic Gates

Clk

Clk

Out1

In

Mp

Me

Mp

Me

Clk

Clk

Out2

V

t

Clk

In

Out1

Out2V

VTn

Only 0 1 transitions allowed at inputs

Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation However since there is a finite propagation delay for the input to discharge Out1 to GND the second output also starts to discharge

The second dynamic inverter turns off (PDN) when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -gt 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it

Domino CMOS Logic

In1

In2 PDN

In3

Me

Mp

Clk

Clk1 11 0

Out1

Combination of Dynamic Logic and Static inverter is Domino Logic

Domino CMOS Logic operation Precharge Phase (Same as Dynamic Logic) Evaluate Phase (Modification to Dynamic Logic)

When CLK goes high precharging stops ie the p-type pullup turns off

The evaluation phase begins ie the n-type pulldown turns on The input signals must monotonically risemdashif an input goes from

0 to 1 and back to 0 it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pulldown network the precharge capacitance is discharged forcing its value to 0 and the gatersquos output (through the inverter) to 1

If input is not 1 then the storage node would be left charged at logic 1 and the gatersquos output would be 0

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G = A · B)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay

(Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, sum outputs S1..S4, internal carries C1..C3 and output Cout)
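A behavioural sketch of the cascade, assuming bit lists ordered least significant bit first (the helper names are hypothetical). The loop makes the critical path visible: each stage must wait for the carry of the previous one.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ c
    cout = (a & b) | (c & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists, least significant bit first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples to the next stage
        sums.append(s)
    return sums, carry
```

For instance, ripple_carry_add([1, 0, 1, 0], [1, 1, 0, 0]) adds 5 and 3 and returns ([0, 0, 0, 1], 0), i.e. 8 with no carry out.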

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel

The output carry is expressed in terms of the generate and propagate signals

The expressions for the 4-bit CLA are obtained by expanding C(i+1) = G(i) + P(i)·C(i) for i = 0 to 3

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
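Only C1 = G0 + P0C0 is spelled out in the text. The sketch below assumes the usual expansion of the same recurrence for the remaining carries, so that each carry depends only on the G and P signals and C0. The function name cla_carries is illustrative.

```python
def cla_carries(a_bits, b_bits, c0=0):
    """Return [C1, C2, C3, C4] for 4-bit operands given LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate per bit
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate per bit
    # Each carry is a two-level expression in G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]
```

The widening product terms are the price of the parallelism, which is one reason look-ahead is normally limited to small groups such as 4 bits.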

Carry Select Adder

It is also referred to as a conditional sum adder

The adder is divided into two blocks: one block computed with Cin = 0, another block computed with Cin = 1

S and Cout are selected by the actual Cin using a 2:1 multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder
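A behavioural sketch of one carry-select block, assuming ripple addition inside each block and bit lists ordered LSB first. The names add_block and carry_select_block are hypothetical.

```python
def add_block(a_bits, b_bits, cin):
    """Ripple-add one block; bit lists are LSB first."""
    carry, sums = cin, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_block(a_bits, b_bits, cin):
    """Compute both candidate results, then select with the actual Cin."""
    result_if_0 = add_block(a_bits, b_bits, 0)   # block assuming Cin = 0
    result_if_1 = add_block(a_bits, b_bits, 1)   # block assuming Cin = 1
    return result_if_1 if cin else result_if_0   # 2:1 multiplexer
```

In hardware both candidate results are ready before the real carry arrives, so the block only adds a multiplexer delay to the carry path.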

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M - 1)·K2

where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through a multiplexer
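A worked instance of the formula, with made-up delay values (K1 = 1.0 and K2 = 0.5 in arbitrary units are assumptions for illustration only):

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


# 16-bit adder split into M = 4 blocks of P = 4 cells.
t = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
```

Here t evaluates to 4·1.0 + 3·0.5 = 5.5 units: the first block ripples through its 4 cells, and each later block contributes one multiplexer delay.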

Carry Skip Adder (CSA)

It is also called a carry bypass adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, then a bypass gate sends the stage's carry input directly to the carry output

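Simplified Carry-Skip Adder Logic

(Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and the skip signal)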

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0

else C3 = output carry of the stage-4 adder
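The bypass rule can be sketched for one m-bit stage as follows. This is a behavioural model with a hypothetical function name; bit lists are LSB first, and the stage carry is modelled as a ripple carry.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage: returns (sum_bits, carry_out, skip)."""
    carry, sums, skip = cin, [], 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        skip &= p                       # stays 1 only if every bit propagates
        sums.append(p ^ carry)
        carry = (a & b) | (p & carry)   # ripple carry inside the stage
    cout = cin if skip else carry       # bypass multiplexer
    return sums, cout, skip
```

When skip is 1 the rippled carry equals the carry input anyway, so the bypass changes only the delay, not the result.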

Delay of CSA

Worst-case delay T is given by T = 2(P - 1)·K1 + (M - 2)·K2, where K1 = delay through one adder cell and K2 = delay for skipping the carry over a block
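To see how the block size trades ripple delay against skip delay, the formula can be swept over the block sizes that divide the word length. The delay values are assumptions for illustration, and the sweep is restricted to M ≥ 2 blocks, since the (M - 2) term is not meaningful for a single block.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case: T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def sweep_block_size(n, k1, k2):
    """Delay for every block size P dividing n that leaves M = n/P >= 2."""
    return {p: carry_skip_delay(p, n // p, k1, k2)
            for p in range(1, n + 1)
            if n % p == 0 and n // p >= 2}
```

With n = 16, K1 = 1.0 and K2 = 0.5 the sweep gives 7.0, 5.0, 7.0 and 14.0 units for P = 1, 2, 4 and 8, so under these assumed delays small blocks are favoured.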

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
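The reduction to two vectors can be sketched at word level with carry-save adders (3:2 compressors). This is a simplified model under stated assumptions: operands are unsigned, rows are grouped three at a time rather than column by column as in a real Wallace layout, and the function names are hypothetical.

```python
def carry_save_add(x, y, z):
    """3:2 compressor applied bitwise to three non-negative integers."""
    s = x ^ y ^ z                              # sum vector
    c = ((x & y) | (y & z) | (x & z)) << 1     # carry vector, shifted left
    return s, c


def wallace_multiply(a, b, n=8):
    """Multiply two unsigned n-bit integers by reducing partial products."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):   # every full group of three rows
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])   # leftover rows pass through
        rows = reduced
    return sum(rows)                           # final conventional addition
```

Each pass cuts the row count by roughly a third without propagating carries, which is where the logarithmic number of levels comes from.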

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop, two-phase clocking, dynamic shift register, RAM, ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 46: Chp 3- Sub-system Design

Properties of Dynamic Gates
Logic function is implemented by the PDN only
Full swing outputs (VOL = GND and VOH = VDD)
Non-ratioed: sizing of the devices does not affect the logic levels
Faster switching speeds due to reduced load capacitance
Overall power dissipation is usually higher than static CMOS: no glitching, but higher transition probabilities and extra load on Clk
PDN starts to work as soon as the input signals exceed VTn, so VM, VIH and VIL are equal to VTn: low noise margin (NML)

Advantages and Disadvantages

Advantages: low area, higher speed than static complementary gates

Disadvantages:
Precharged gates introduce functional complexity because they must be operated in two distinct phases
Requires introduction of a clock signal
They are also more sensitive to noise
Their clocking signals also consume power and are difficult to turn off to save power

Cascading Dynamic Gates

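(Figure: two cascaded dynamic inverters, each with precharge transistor Mp and evaluation transistor Me driven by Clk, with input In and outputs Out1 and Out2; waveforms of Clk, In, Out1 and Out2 plotted as V against t, with the VTn level marked)

Only 0 -> 1 transitions allowed at inputs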

Cascading Dynamic Gates- Problem

Out2 should remain at VDD, since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge

The PDN of the second dynamic inverter turns off when Out1 reaches VTn

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period

Setting all inputs of the second gate to 0 during precharge will fix it
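The loss of charge on Out2 can be illustrated with a deliberately crude discrete-time model. Everything numeric here is an assumption: voltages are in millivolts, both nodes discharge linearly at the same rate, and the input is taken to rise at the very start of evaluation.

```python
def cascade_dynamic_inverters(vdd=1000, vtn=300, step=100, ticks=20):
    """Toy model of two cascaded dynamic inverters during evaluation."""
    out1 = out2 = vdd                     # both nodes precharged to VDD
    for _ in range(ticks):
        if out1 > vtn:                    # second PDN conducts while Out1 > VTn
            out2 = max(0, out2 - step)
        out1 = max(0, out1 - step)        # first PDN discharges Out1
    return out1, out2
```

With the default values the function returns (0, 300): Out1 reaches ground as intended, but Out2 has dropped from 1000 mV to 300 mV before the second pull-down network turned off, and a dynamic node cannot recover that charge until the next precharge.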

Domino CMOS Logic

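(Figure: domino gate with a PDN driven by inputs In1, In2 and In3, precharge transistor Mp and evaluation transistor Me driven by Clk, and output Out1)

Domino logic is the combination of a dynamic logic gate and a static inverter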

Domino CMOS Logic Operation
Precharge phase (same as dynamic logic)
Evaluate phase (modification to dynamic logic)

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off

The evaluation phase begins, i.e. the n-type pull-down turns on
The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1

If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output is 0
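The two phases and the monotonicity requirement can be captured in a small behavioural model. The class DominoGate and its method names are hypothetical; the pull-down network is passed in as a Python function that returns a truthy value when it conducts.

```python
class DominoGate:
    """Dynamic storage node followed by a static inverter."""

    def __init__(self, pdn):
        self.pdn = pdn       # function of the inputs: truthy if the PDN conducts
        self.node = 1        # precharged storage node

    def precharge(self):
        """Clk low: the p-type pull-up recharges the storage node."""
        self.node = 1

    def evaluate(self, *inputs):
        """Clk high: a conducting PDN discharges the node for good."""
        if self.pdn(*inputs):
            self.node = 0    # cannot be restored until the next precharge
        return self.out

    @property
    def out(self):
        return 1 - self.node  # static inverter
```

For a two-input AND gate built as DominoGate(lambda a, b: a and b), calling evaluate(1, 1) and then evaluate(0, 0) within the same phase leaves the output at 1, which is the inadvertent discharge described above.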

Properties of Domino Logic
Only non-inverting logic can be implemented
Smaller area compared to static CMOS
Free of glitches, because the dynamic node can only make a transition from logic 1 to logic 0
Very high speed: the static inverter can be skewed, since only its L-H transition matters, and the input capacitance is reduced, giving a smaller logical effort

Cascading Domino Logic Gates

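(Figure: two cascaded domino gates. The first has a PDN with inputs In1, In2 and In3 and output Out1; the second has a PDN with inputs In4 and In5 and output Out2. Each gate has precharge transistor Mp and evaluation transistor Me driven by Clk)

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -> 1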

Clocked CMOS Logic
It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to another additional PMOS

The transistor count for an n-input gate is 2n + 2 (2n for the static CMOS gate plus the two clocked transistors)
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is on

When Clk goes low, the logic is not evaluated because the clocked NMOS is off

Advantages and Disadvantages

Advantages: clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices

Disadvantages: more area, more complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP domino logic (also called NORA logic)

Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use a complement clock, with the P logic tree tied above the output node

During precharge, clk is low (-clk is high): the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd


Cascading Dynamic Gates- Problem

Out2 should remain at VDD since Out1 transitions to 0 during evaluation. However, since there is a finite propagation delay for the input to discharge Out1 to GND, the second output also starts to discharge.

The second dynamic inverter turns off (PDN) when Out1 reaches VTn.

Correct operation is guaranteed (ignoring charge redistribution and leakage) as long as the inputs can only make a single 0 -> 1 transition during the evaluation period.

Setting all inputs of the second gate to 0 during precharge will fix it.

Domino CMOS Logic

Figure: domino CMOS gate (labels: In1, In2, In3, PDN, Mp, Me, Clk, Out1).

Domino logic is the combination of dynamic logic and a static inverter.

Domino CMOS Logic operation
Precharge phase: same as dynamic logic.
Evaluate phase: a modification of dynamic logic.

When CLK goes high, precharging stops, i.e. the p-type pull-up turns off.

The evaluation phase begins, i.e. the n-type pull-down turns on. The input signals must rise monotonically: if an input goes from 0 to 1 and back to 0, it will inadvertently discharge the precharge capacitance.

If the inputs create a conducting path through the pull-down network, the precharge capacitance is discharged, forcing its value to 0 and the gate's output (through the inverter) to 1.

If the inputs do not create a conducting path, the storage node is left charged at logic 1 and the gate's output stays at 0.
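The precharge and evaluate behaviour described above can be mimicked with a small behavioural model. The class name `DominoGate` and the representation of the pull-down network as a Python callable are assumptions made for illustration; charge sharing, leakage, and timing are ignored.

```python
from typing import Callable, Sequence


class DominoGate:
    """Behavioural model: a dynamic node followed by a static inverter."""

    def __init__(self, pdn: Callable[[Sequence[int]], bool]):
        self.pdn = pdn      # returns True when the pull-down network conducts
        self.node = 1       # dynamic (precharged) storage node

    @property
    def output(self) -> int:
        return 1 - self.node    # static inverter

    def precharge(self) -> int:
        """CLK low: the p-type pull-up charges the node to 1."""
        self.node = 1
        return self.output

    def evaluate(self, inputs: Sequence[int]) -> int:
        """CLK high: a conducting path discharges the node; it cannot recharge."""
        if self.pdn(inputs):
            self.node = 0
        return self.output


if __name__ == "__main__":
    and_gate = DominoGate(lambda x: bool(x[0] and x[1]))
    print(and_gate.precharge())        # output after precharge
    print(and_gate.evaluate((1, 1)))   # conducting path found
    print(and_gate.evaluate((0, 1)))   # input fell back: node stays discharged
```

The last call shows why inputs must rise monotonically: once the node has been discharged, a later 1 -> 0 input change cannot restore it until the next precharge.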

Properties of Domino Logic
Only non-inverting logic can be implemented.
Smaller area compared to static CMOS.
Free of glitches, since the dynamic node can only make a transition from logic 1 to logic 0.
Very high speed: the static inverter can be skewed, because only the L-H transition matters, and the input capacitance is reduced, giving a smaller logical effort.

Cascading Domino Logic Gates

Figure: two cascaded domino gates (labels: In1, In2, In3, In4, In5, PDN, Mp, Me, Clk, Out1, Out2).

The static inverter ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period.

Hence the only possible transition during evaluation is 0 -> 1.

Clocked CMOS Logic
It is a combination of:
Static CMOS
Clk input given to one additional NMOS
Clk' input given to one additional PMOS

Fan-in for this logic is 2n+2.

When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS is ON.

When Clk goes low, the logic is not evaluated because the clocked NMOS is OFF.

Advantages and Disadvantages

Advantages: clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot-electron effect problems in N-FET devices.

Disadvantages: more area, more complexity.

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic), as shown below.

Alternate stages of N logic with stages of P logic.
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg.
P logic stages use a complement clock, with the P logic stage tied above the output node.

During precharge, clk is low (-clk is high); the P-logic output precharges to ground while the N-logic outputs precharge to Vdd.

During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd.

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks.

Cascading N-P Logic

Example logic:
Stage 1: X = (A · B)'
Stage 2: G = X' + Y'
Stage 3: Z = (F · G + H)'

Figure: cascaded N-P stages with outputs X, G and Z.
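The three stage equations can be checked at the Boolean level with a short Python sketch. It treats A, B, Y, F and H as free primary inputs, which is an assumption because the slide does not say where Y, F and H come from; the function name is hypothetical, and clocking and precharge are not modelled.

```python
def np_cascade(a: int, b: int, y: int, f: int, h: int) -> tuple[int, int, int]:
    """Evaluate X = (A.B)', G = X' + Y', Z = (F.G + H)' for 0/1 inputs."""
    x = int(not (a and b))              # stage 1
    g = int((not x) or (not y))         # stage 2
    z = int(not ((f and g) or h))       # stage 3
    return x, g, z


if __name__ == "__main__":
    print(np_cascade(a=1, b=1, y=0, f=1, h=0))
    print(np_cascade(a=0, b=0, y=1, f=1, h=0))
```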

BiCMOS Technology
CMOS properties: low power, slower.
BJT properties: faster, high power.
Implementation: CMOS & BJT. Core logic: CMOS. Interface: BJT.
Basic circuits: inverter, NAND gate, NOR gate.

BiCMOS Circuits

Inverter, NAND gate

Structured Design

Designing the digital block or standard cell proceeds through the following levels:
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic

5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter
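As one concrete instance from the list above, a Gray code to binary code converter can be sketched in Python. The function names are hypothetical; the conversion uses the usual rule that each binary bit is the XOR of all Gray bits at or above its position, and the reverse function is included only to check the round trip.

```python
def gray_to_binary(gray: int) -> int:
    """Convert a Gray-coded non-negative integer to plain binary."""
    binary = 0
    while gray:
        binary ^= gray      # fold in the Gray word at each shift position
        gray >>= 1
    return binary


def binary_to_gray(value: int) -> int:
    """Inverse mapping, used here to check the converter."""
    return value ^ (value >> 1)


if __name__ == "__main__":
    for n in range(8):
        g = binary_to_gray(n)
        print(f"{n}: gray={g:03b} back={gray_to_binary(g)}")
```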

Adders

1-bit full adder
Manchester carry chain
Adder enhancement techniques: carry-select adders, carry-skip adders, carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers - L=11λ, W=7λ

i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ, W=10λ
Buried contact - L=30λ, W=10λ

ii) CMOS inverter - L=35λ, W=18λ

Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries:
Generate: Cout = 1 independent of C; G = A · B
Propagate: Cout = C; P = A ⊕ B
Kill: Cout = 0 independent of C; K = ~A · ~B
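The definitions above translate directly into a 1-bit full adder. The Python sketch below is a Boolean-level illustration with hypothetical function names: it computes G, P and K, then forms the carry as G + P·C and the sum as P ⊕ C.

```python
def carry_signals(a: int, b: int) -> tuple[int, int, int]:
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b
    p = a ^ b
    k = (1 - a) & (1 - b)
    return g, p, k


def full_adder_pgk(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, cout) using the generate and propagate signals."""
    g, p, _ = carry_signals(a, b)
    cout = g | (p & c)      # carry is generated, or the incoming carry propagates
    s = p ^ c
    return s, cout


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, full_adder_pgk(a, b, c))
```

For every input pair exactly one of G, P and K is 1, which is what lets a switch-based carry chain use one transistor per signal.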

Manchester Carry Chain

Built from switch logic using propagate, generate & kill signals.

Manchester chain adders:
The carry input is precharged with the clock signal instead of passing through the logic.

The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor.

Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders.
The critical path goes from Cin to Cout.
Design the full adder to have a fast carry delay.

Figure: 4-bit carry-ripple adder (labels: Cin, Cout, A1-A4, B1-B4, S1-S4, C1-C3).
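The cascade of full adders can be sketched in Python as below. Bit lists are taken LSB first, which is an assumed convention, and the function names are hypothetical. The loop makes the critical path visible: each stage has to wait for the carry of the stage before it.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder returning (sum, carry out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits: list[int], b_bits: list[int],
                     cin: int = 0) -> tuple[list[int], int]:
    """Add two equal-length bit lists (LSB first) by rippling the carry."""
    carry = cin
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry feeds the next stage
        sum_bits.append(s)
    return sum_bits, carry


if __name__ == "__main__":
    # 11 + 5 with 4-bit operands, LSB first.
    print(ripple_carry_add([1, 1, 0, 1], [1, 0, 1, 0]))
```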

Carry Look-Ahead Adder (CLA)

To avoid the linear growth of the carry delay, we use a Carry Look-Ahead Adder (CLA), in which the carries can be generated in parallel.

The output carry is expressed in terms of generate and propagate signals.

The Expressions for 4-bit CLA are
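A standard expansion, consistent with C1 = G0 + P0C0 as given for the 1-bit CMOS CLA, is sketched here. Indexing the bits from 0 and naming the final carry C4 are assumptions; each carry is obtained by substituting the previous one into C(i+1) = Gi + Pi·Ci.

```latex
\begin{aligned}
G_i &= A_i B_i, \qquad P_i = A_i \oplus B_i \\
C_1 &= G_0 + P_0 C_0 \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{aligned}
```

Every carry depends only on the operand bits and C0, so all four can be evaluated in parallel at the cost of wider gates.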

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as the conditional sum adder.

The adder is divided into two blocks:
One block with logical 0 Cin
Another block with logical 1 Cin

S and Cout are selected by the actual Cin using a 2:1 multiplexer.
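A word-level Python sketch of this selection scheme follows. The function names, the 8-bit width, and the 4-bit block size are assumptions for illustration; every block is computed for both carry-in values and the true carry then picks one result, as a 2:1 multiplexer would.

```python
def add_block(a: int, b: int, cin: int, width: int) -> tuple[int, int]:
    """Add two width-bit values plus carry in; return (sum, carry out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, cin: int = 0,
                     width: int = 8, block: int = 4) -> tuple[int, int]:
    """Carry-select addition; width is assumed to be a multiple of block."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        sum0, cout0 = add_block(a_blk, b_blk, 0, block)   # block assuming Cin = 0
        sum1, cout1 = add_block(a_blk, b_blk, 1, block)   # block assuming Cin = 1
        s, carry = (sum1, cout1) if carry else (sum0, cout0)  # 2:1 multiplexer
        result |= s << lo
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(200, 100))
```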

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P·K1 + (M-1)·K2, where
M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through the multiplexer
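The delay expression can be evaluated numerically with a few lines of Python. The function name and the example delays are hypothetical values picked only to show how the two terms contribute.

```python
def carry_select_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = P*K1 + (M-1)*K2 for M blocks of P cells each."""
    return p * k1 + (m - 1) * k2


if __name__ == "__main__":
    # Hypothetical 16-bit adder: 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5.
    print(carry_select_delay(p=4, m=4, k1=1.0, k2=0.5))
```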

Carry Skip Adder (CSA)

It is also called the carry bypass adder.

It looks for cases in which the carry-out of a set of bits is identical to the carry-in.

Typically organized into m-bit stages. If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output.

Simplified Carry-Skip Adder Logic

Figure: simplified carry-skip adder logic with two full adders (labels: ai, bi, ai+1, bi+1, si, si+1, ci, ci+1, pi, pi+1, skip signal).

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1) then C3 = C0,

else C3 = output carry of the stage 4 adder.
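One m-bit stage of a carry-skip adder can be modelled in Python as below. The function name, the LSB-first bit lists, and the returned `skip` flag are assumptions for illustration. Functionally the bypass never changes the result; its benefit is that the carry-out is available without waiting for the ripple path when every bit propagates.

```python
def carry_skip_block(a_bits: list[int], b_bits: list[int],
                     cin: int) -> tuple[list[int], int, bool]:
    """One stage: returns (sum bits, carry out, whether the bypass was taken)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    carry = cin
    sum_bits = []
    for a, b, p in zip(a_bits, b_bits, propagate):
        sum_bits.append(p ^ carry)
        carry = (a & b) | (p & carry)    # normal ripple path inside the stage
    skip = all(propagate)                # Ai != Bi for every bit in the stage
    cout = cin if skip else carry        # bypass gate selects the carry source
    return sum_bits, cout, skip


if __name__ == "__main__":
    print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
    print(carry_skip_block([1, 1, 0, 0], [1, 0, 0, 0], 0))
```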

Page 52: Chp 3- Sub-system Design

Properties of Domino Logic Only non-inverting logic can be

implemented Smaller area compared to Static CMOS Free of glitches due to transistion from logic 1 to logic 0 only Very high speed

static inverter can be skewed only L-H transition Input capacitance reduced ndash smaller logical

effort

Cascading Domino Logic Gates

In1

In2 PDN

In3

Me

Mp

Clk

ClkOut1

In4 PDN

In5

Me

Mp

Clk

ClkOut21 1

1 00 00 1

Ensures all inputs to the Domino gate are set to 0 at the end of the precharge period

Hence the only possible transition during evaluation is 0 -gt 1

Clocked CMOS Logic It is a combination of

Static CMOS Clk input given to one additional NMOS Clkrsquo input given to another additional PMOS

Fan-in for this logic is 2n+2 When Clk goes high then logic of the

circuit is evaluated due to NMOS in ON condition

When Clk goes low then Logic is not evaluated due to NMOS in OFF condition

Advantages and Disadvantages

Advantages Clocked CMOS logic has been used for

very low power CMOS andor for minimizing hot electron effect problems in N-FET devices

Disadvantages More Area More Complexity

N-P CMOS Logic

N-P CMOS Logic An elegant solution to the dynamic CMOS logic

ldquoerroneous evaluationrdquo problem is to use NP Domino Logic (also called NORA logic) as shown below

Alternate stages of N logic with stages of P logic N logic stages use true clock normal precharge and evaluation

phases with N logic tree in the pull down leg P logic stages use a complement clock with P logic stage tied above the output node

During precharge clk is low (-clk is high) and the P-logic output precharges to ground while N-logic outputs precharge to Vdd

During evaluate clk is high (-clk is low) and both type stages go through evaluation N-logic tree logically evaluates to ground while P-logic tree logically evaluates to Vdd

Inverter outputs can be used to feed other N-blocks from N-blocks or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example Logic below

Stage 1 is X = (A middot B)rsquo Stage 2 is G = Xrsquo + Yrsquo Stage 3 is Z = (F middot G + H)rsquo

X

G

Z

BiCMOS Technology CMOS properties

Low Power Slower

BJT properties Faster High Power

Implementation CMOS amp BJT Core Logic CMOS Interface BJT

Basic Circuits Inverter Nand Gate Nor Gate

BiCMOS Circuits

InverterNand Gate

Structured Design

Designing the Digital block or standard cell in the following levels Logic level Circuit level Stick Diagram Layout

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3 = 1)then C3 = C0

else C3=output carry of stage4 adder

Delay of CSA

Worst case delay T is given by T=2(P-1)K1+(M-2)K2 where k1=delay through one adder

cell k2=delay for skipping the

carry over a block

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands a multi-level carry-skip adder

was 14 faster and its power dissipation was 58

of that of the carry-lookahead adder

Using 64-bit operands a one-level carry-skip adder

was 38 slower and its power consumption is 68

of the the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers Serial parallel Multiplier Braun Array 2rsquos Complement multiplication using Baugh-

Wooley method Pipelined Multiplier Array Modified Boothrsquos Algorithm Wallace Tree Multipliers Recursive decomposition of Multiplication Daddarsquos Method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Boothrsquos Multiplier

Wallace Tree Multiplier
A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2 n
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
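The reduction idea can be sketched in Python as a behavioural model. It is an assumption-laden simplification: it compresses whole partial-product rows three at a time with word-wide 3:2 compressors rather than modelling individual column adders, and the function names are hypothetical.

```python
def compress_3_to_2(x, y, z):
    """3:2 compressor applied bitwise to three rows: returns (sum_row, carry_row)."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1  # carries move one column left
    return sum_row, carry_row


def wallace_multiply(a, b, n=8):
    """Multiply two unsigned n-bit integers. Returns (product, reduction_levels)."""
    if not (0 <= a < (1 << n) and 0 <= b < (1 << n)):
        raise ValueError("operands must fit in n bits")
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    levels = 0
    while len(rows) > 2:
        groups = len(rows) // 3
        next_rows = []
        for i in range(0, 3 * groups, 3):
            next_rows.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[3 * groups:])  # leftover rows pass through unchanged
        rows = next_rows
        levels += 1
    # Final step: the remaining sum and carry vectors go through a conventional adder.
    return sum(rows), levels
```

Each level removes roughly a third of the rows, which is where the logarithmic number of levels comes from.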

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools
Partition the system sensibly
Aim for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

The design of the adder and the shifter is essential for the ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

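Cascading Domino Logic Gates
[Figure: two cascaded domino gates. The first has a pull-down network (PDN) with inputs In1, In2 and In3 and output Out1; the second has a PDN with inputs In4 and In5 and output Out2. Each gate has a precharge transistor Mp and an evaluate transistor Me driven by Clk.]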

The static inverter at each domino gate's output ensures that all inputs to the next domino gate are set to 0 at the end of the precharge period
Hence the only possible transition during evaluation is 0 -> 1
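The precharge and evaluate behaviour of two cascaded domino gates can be sketched in Python. This is a purely logical model: the pull-down functions (a 3-input AND feeding a second AND) and all names are assumptions chosen for illustration.

```python
def domino_gate(clk, pdn_conducts):
    """Buffered output of one domino gate for the given clock phase.

    clk = 0: precharge, the dynamic node is pulled high, so the inverter output is 0.
    clk = 1: evaluate, the dynamic node discharges only if the pull-down network conducts.
    """
    if clk == 0:
        dynamic_node = 1
    else:
        dynamic_node = 0 if pdn_conducts else 1
    return 1 - dynamic_node  # static inverter at the gate output


def cascade(clk, in1, in2, in3, in4, in5):
    """Two cascaded gates with assumed functions Out1 = In1.In2.In3, Out2 = Out1.In4.In5."""
    out1 = domino_gate(clk, in1 and in2 and in3)
    out2 = domino_gate(clk, out1 and in4 and in5)
    return out1, out2


if __name__ == "__main__":
    print(cascade(0, 1, 1, 1, 1, 1))  # precharge: both outputs held at 0
    print(cascade(1, 1, 1, 1, 1, 1))  # evaluate: outputs may only rise from 0 to 1
```

Because every buffered output is 0 after precharge, the second gate's pull-down network cannot conduct until the first gate has actually evaluated high.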

Clocked CMOS Logic
It is a combination of
Static CMOS
Clk input given to one additional NMOS transistor
Clk' input given to another additional PMOS transistor
Fan-in for this logic is 2n+2
When Clk goes high, the logic of the circuit is evaluated because the clocked NMOS transistor is ON
When Clk goes low, the logic is not evaluated because the clocked NMOS transistor is OFF

Advantages and Disadvantages
Advantages
Clocked CMOS logic has been used for very low power CMOS and/or for minimizing hot electron effect problems in N-FET devices
Disadvantages
More area
More complexity

N-P CMOS Logic

An elegant solution to the dynamic CMOS logic "erroneous evaluation" problem is to use NP Domino Logic (also called NORA logic) as shown below
Alternate stages of N logic with stages of P logic
N logic stages use the true clock, with normal precharge and evaluation phases and the N logic tree in the pull-down leg
P logic stages use a complement clock, with the P logic tree tied above the output node
During precharge, clk is low (-clk is high); the P-logic outputs precharge to ground while the N-logic outputs precharge to Vdd
During evaluate, clk is high (-clk is low) and both types of stage go through evaluation; the N-logic tree logically evaluates to ground while the P-logic tree logically evaluates to Vdd
Inverter outputs can be used to feed other N-blocks from N-blocks, or to feed other P-blocks from P-blocks

Cascading N-P Logic

Example logic below
Stage 1 is X = (A · B)'
Stage 2 is G = X' + Y'
Stage 3 is Z = (F · G + H)'
[Figure: three cascaded stages with outputs X, G and Z]
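The three stage equations can be checked with a short Python sketch. It evaluates only the steady-state Boolean functions at the end of the evaluate phase and does not model the clocking; the function name is hypothetical.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three cascaded stages for 0/1 inputs. Returns (x, g, z)."""
    x = 1 - (a & b)            # Stage 1: X = (A . B)'
    g = (1 - x) | (1 - y)      # Stage 2: G = X' + Y'
    z = 1 - ((f & g) | h)      # Stage 3: Z = (F . G + H)'
    return x, g, z


if __name__ == "__main__":
    for inputs in [(1, 1, 1, 1, 0), (0, 1, 1, 1, 0), (0, 0, 0, 0, 1)]:
        print(inputs, "->", np_cascade(*inputs))
```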

BiCMOS Technology
CMOS properties: low power, slower
BJT properties: faster, high power
Implementation: CMOS & BJT
Core logic: CMOS
Interface: BJT
Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits

Inverter, NAND Gate

Structured Design

Designing the digital block or standard cell at the following levels
Logic level
Circuit level
Stick diagram
Layout

Examples: Combinational Logic
5-way selector
Multiplexers & demultiplexers
Encoders & decoders
Parity generator
Bus arbitration logic
Gray code to binary code converter

Adders
1-bit full adder
Manchester carry chain
Adder enhancement techniques
Carry select adders
Carry skip adders
Carry look-ahead adders
32-bit adders

1-bit full adder

CMOS Full Adder

Sum output, Carry output

Complementary Static CMOS Adder

Standard cell dimensions for Adder
The adder is made up of the following standard cells
Multiplexers - L=11λ, W=7λ
i) NMOS inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ, W=10λ
Buried contact - L=30λ, W=10λ
ii) CMOS inverter - L=35λ, W=18λ
Communication paths - 3λ spacing between metal contacts
So for the adder standard cell - L=190λ, W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder, define what happens to carries
Generate: Cout = 1, independent of C
G = A · B
Propagate: Cout = C
P = A ⊕ B
Kill: Cout = 0, independent of C
K = ~A · ~B
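The generate, propagate and kill definitions can be checked exhaustively with a small Python sketch. The function names are hypothetical; inputs are single bits (0 or 1).

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one full-adder bit position."""
    g = a & b               # Generate: Cout = 1 regardless of C
    p = a ^ b               # Propagate: Cout = C
    k = (1 - a) & (1 - b)   # Kill: Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry-out expressed with generate and propagate: Cout = G + P.C"""
    g, p, _ = pgk(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, pgk(a, b))
```

Exactly one of the three signals is 1 for any input pair, which is what lets a carry chain be built from switches controlled by them.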

Manchester Carry Chain

Built from switch logic using propagate, generate & kill signals
Manchester chain adders: the carry input is precharged by the clock signal instead of passing through the logic
The carry path is gated by the propagate signal (P = A ⊕ B) with a single n-type pass transistor
Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design: cascade full adders
Critical path goes from Cin to Cout
Design the full adder to have a fast carry delay
[Figure: 4-bit carry-ripple adder with inputs A1-A4, B1-B4 and Cin, sum outputs S1-S4, internal carries C1-C3 and carry output Cout]
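A cascade of full adders can be sketched in Python as follows. This is a behavioural model with hypothetical function names; bit lists are least-significant bit first.

```python
def full_adder(a, b, c_in):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (c_in & (a ^ b))
    return s, c_out


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Cascade full adders; the carry ripples from Cin through every cell to Cout."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


if __name__ == "__main__":
    # 0b0111 + 0b0001, bits listed LSB first.
    print(ripple_carry_add([1, 1, 1, 0], [1, 0, 0, 0]))
```

The loop makes the critical path visible: each cell has to wait for the carry of the cell before it, so the delay grows linearly with the word length.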

Carry Look-Ahead Adder (CLA)
To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries can be generated in parallel
The output carry is expressed in terms of generate and propagate signals
The expressions for a 4-bit CLA are
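The slide's own expressions are not reproduced in this transcript. As a minimal sketch, the Python below assumes the standard textbook expansion of Ci+1 = Gi + Pi·Ci, in which every carry depends only on the G and P signals and C0. The function name is hypothetical and bit lists are least-significant bit first.

```python
def cla4(a_bits, b_bits, c0):
    """4-bit carry look-ahead addition. Returns (sum_bits, c4)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate signals G0..G3
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate signals P0..P3

    # Each carry is written directly in terms of G, P and C0 (no rippling).
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]  # Si = Pi xor Ci
    return sums, c4
```

The product terms grow with each bit position, which is why practical look-ahead units are usually limited to a few bits per group.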

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1 = G0 + P0·C0

4-bit CMOS CLA

Carry Select Adder

It is also referred to as a conditional sum adder
The adder is divided into two blocks: one block computed with Cin = logical 0, and another block computed with Cin = logical 1
S and Cout are selected by the actual Cin using a 2:1 multiplexer
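The selection scheme can be sketched in Python as a behavioural model. The function names, the 8-bit width and the 4-bit block size are assumptions for illustration; the sketch precomputes both results for every block, including the first.

```python
def block_add(a, b, c_in, width):
    """Add two width-bit values with a carry-in. Returns (sum, carry_out)."""
    total = a + b + c_in
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, c_in=0, width=8, block=4):
    """Carry select addition: each block is computed for Cin = 0 and Cin = 1,
    then the actual incoming carry picks one result (the 2:1 multiplexer)."""
    result, carry = 0, c_in
    for lo in range(0, width, block):
        w = min(block, width - lo)
        mask = (1 << w) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = block_add(a_blk, b_blk, 0, w)   # block assuming Cin = 0
        s1, c1 = block_add(a_blk, b_blk, 1, w)   # block assuming Cin = 1
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```

In hardware both versions of a block are computed in parallel, so only the multiplexer delay is added once the real carry arrives.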

Designing the complex systems - ALU

Complex systems are designed in Top-Down approach with the help of CAD tools

Partition the system sensibly Aiming for simple interconnection and high

regularity between sub-systems Generate and verify each section of the

design Calculate the dimensions of the layout of sub-

systems and check the proportion in the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 58: Chp 3- Sub-system Design

Cascading N-P Logic

Example logic:
- Stage 1: X = (A · B)'
- Stage 2: G = X' + Y'
- Stage 3: Z = (F · G + H)'

(Figure: cascaded stages with outputs X, G and Z)
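
The three stage equations above can be checked with a short Boolean sketch. It models only the logic functions, not the precharge and evaluate timing of N-P logic. Treating Y, F and H as independent inputs is an assumption, and the function name `np_cascade` is hypothetical.

```python
def np_cascade(a, b, y, f, h):
    """Evaluate the three example stages for Boolean inputs.

    Assumption: Y, F and H are free inputs; the slide does not say
    where they come from.
    """
    x = not (a and b)           # Stage 1: X = (A . B)'
    g = (not x) or (not y)      # Stage 2: G = X' + Y'
    z = not ((f and g) or h)    # Stage 3: Z = (F . G + H)'
    return x, g, z


if __name__ == "__main__":
    print(np_cascade(True, True, True, True, False))
    print(np_cascade(False, True, True, True, False))
```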

BiCMOS Technology
- CMOS properties: low power, slower
- BJT properties: faster, high power
- Implementation: CMOS and BJT
  - Core logic: CMOS
  - Interface: BJT
- Basic circuits: inverter, NAND gate, NOR gate

BiCMOS Circuits
- Inverter
- NAND gate

Structured Design

Design the digital block or standard cell at the following levels:
- Logic level
- Circuit level
- Stick diagram
- Layout

Examples: Combinational Logic
- 5-way selector
- Multiplexers and demultiplexers
- Encoders and decoders
- Parity generator
- Bus arbitration logic
- Gray code to binary code converter

Adders
- 1-bit full adder
- Manchester carry chain
- Adder enhancement techniques
  - Carry select adders
  - Carry skip adders
  - Carry look-ahead adders
- 32-bit adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cell dimensions for the adder

The adder is made up of the following standard cells:
- Multiplexers: L = 11λ, W = 7λ
- i) NMOS inverter (8:1 and 4:1 ratios)
  - Butting contact: L = 22λ, W = 10λ
  - Buried contact: L = 30λ, W = 10λ
- ii) CMOS inverter: L = 35λ, W = 18λ
- Communication paths: 3λ spacing between metal contacts

So the adder standard cell is L = 190λ, W = 150λ.

Layout of full adder

Layout2 of full adder

Propagate and Generate Signals

For a full adder, define what happens to the carry:
- Generate: Cout = 1, independent of C
  G = A · B
- Propagate: Cout = C
  P = A ⊕ B
- Kill: Cout = 0, independent of C
  K = ~A · ~B
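
A minimal sketch of the generate, propagate and kill definitions, using 0/1 integers. The function names `pgk` and `carry_out` are hypothetical, and the carry relation Cout = G + P·C is the usual way these signals are combined.

```python
def pgk(a, b):
    """Return (generate, propagate, kill) for one bit position."""
    g = a & b                # Generate:  Cout = 1 regardless of C
    p = a ^ b                # Propagate: Cout = C
    k = (1 - a) & (1 - b)    # Kill:      Cout = 0 regardless of C
    return g, p, k


def carry_out(a, b, c):
    """Carry out of a full adder written as Cout = G + P.C."""
    g, p, _ = pgk(a, b)
    return g | (p & c)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, pgk(a, b))
```

For any input pair exactly one of G, P and K is 1, which is what lets a switch-level carry chain drive each carry node from a single source.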

Manchester Carry Chain
- Built from switch logic using the propagate, generate and kill signals

Manchester chain adders
- The carry input is precharged by the clock signal instead of passing through the logic
- The carry path is gated by the propagate signal (P = A ⊕ B) through a single n-type pass transistor
- Generate signal (G = A'B')

4-stage Dynamic Manchester Chain

Carry-Ripple Adder
- Simplest design: cascade full adders
- Critical path goes from Cin to Cout
- Design the full adder to have a fast carry delay

(Figure: 4-bit carry-ripple adder with inputs A1..A4, B1..B4 and Cin, internal carries C1..C3, outputs S1..S4 and Cout)
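
The cascade of full adders can be sketched behaviourally as below. Bit lists are least significant bit first. The helper names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical and not part of the slides.

```python
def full_adder(a, b, cin):
    """One full-adder cell: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade full adders; the carry ripples from Cin to Cout."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry


def to_bits(value, width):
    """Integer to bit list, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Bit list (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    s, cout = ripple_carry_add(to_bits(9, 4), to_bits(7, 4))
    print(from_bits(s), cout)
```

Each cell waits for the carry of the cell before it, which is why the critical path runs from Cin to Cout and grows linearly with word length.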

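Carry Look-Ahead Adder (CLA)
- To avoid the linear growth of the carry delay, we use a carry look-ahead adder (CLA), in which the carries are generated in parallel
- Each output carry is expressed in terms of the generate and propagate signals

The expressions for the 4-bit CLA are:
C1 = G0 + P0·C0
C2 = G1 + P1·G0 + P1·P0·C0
C3 = G2 + P2·G1 + P2·P1·G0 + P2·P1·P0·C0
C4 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 + P3·P2·P1·P0·C0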

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA
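
A behavioural sketch of a 4-bit CLA, assuming the usual definitions G = A·B and P = A ⊕ B. Each carry is obtained by expanding C(i+1) = G(i) + P(i)·C(i), so it depends only on the inputs and C0. The function name `cla4` is hypothetical.

```python
def cla4(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition; bit lists are LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate

    # Every carry is a two-level function of G, P and C0 only.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    sums = [p[i] ^ carries[i] for i in range(4)]   # S = P xor C
    return sums, c4


if __name__ == "__main__":
    a = [1, 0, 0, 1]   # 9, LSB first
    b = [1, 1, 1, 0]   # 7, LSB first
    print(cla4(a, b))
```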

Carry Select Adder
- It is also referred to as a conditional sum adder
- The adder is divided into two blocks:
  - one block computed with Cin = 0
  - another block computed with Cin = 1
- S and Cout are selected by the actual Cin using a 2:1 multiplexer
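
A minimal behavioural sketch of the select scheme, assuming each block is a small ripple adder and the word is split into equal blocks. The names `ripple` and `carry_select_add` and the `block` parameter are hypothetical.

```python
def ripple(a_bits, b_bits, cin):
    """Ripple addition over one block; bit lists are LSB first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_select_add(a_bits, b_bits, cin=0, block=4):
    """Compute each block for Cin = 0 and Cin = 1, then select."""
    sums = []
    carry = cin
    for i in range(0, len(a_bits), block):
        a_blk = a_bits[i:i + block]
        b_blk = b_bits[i:i + block]
        s0, c0 = ripple(a_blk, b_blk, 0)   # block with logical 0 Cin
        s1, c1 = ripple(a_blk, b_blk, 1)   # block with logical 1 Cin
        # 2:1 multiplexer controlled by the actual carry-in
        s_sel, carry = (s1, c1) if carry else (s0, c0)
        sums.extend(s_sel)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


if __name__ == "__main__":
    s, cout = carry_select_add(to_bits(200, 8), to_bits(100, 8))
    print(s, cout)
```

In hardware the two versions of a block are evaluated at the same time, so only the multiplexer sits on the carry path between blocks.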

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by
T = P·K1 + (M - 1)·K2
where
- M = number of blocks in the adder
- P = number of cells in each block
- K1 = delay through one adder cell
- K2 = delay through the multiplexer
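
A worked example of the delay expression T = P·K1 + (M - 1)·K2. The values K1 = 1.0 and K2 = 0.5 are illustrative assumptions, not figures from the slides, and the ripple delay of n·K1 used for comparison is also an assumption.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M - 1)*K2 for M blocks of P adder cells each."""
    return p * k1 + (m - 1) * k2


def ripple_delay(n, k1):
    """Assumed ripple-carry delay: one cell delay per bit."""
    return n * k1


K1, K2 = 1.0, 0.5   # hypothetical cell and multiplexer delays

# 8-bit adder split into M = 2 blocks of P = 4 cells
t_select = carry_select_delay(p=4, m=2, k1=K1, k2=K2)
t_ripple = ripple_delay(n=8, k1=K1)

if __name__ == "__main__":
    print(t_select, t_ripple)
```

With these assumed values the select adder takes 4.5 delay units against 8.0 for the ripple adder.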

Carry Skip Adder (CSA)
- It is also called a carry bypass adder
- Looks for cases in which the carry-out of a set of bits is identical to the carry-in
- Typically organized into m-bit stages
- If Ai ≠ Bi for every bit in a stage, the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

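(Figure: two full adders (FA) with inputs ai, bi and ai+1, bi+1, sum outputs si and si+1, carries ci and ci+1, propagate signals pi and pi+1, and the skip signal)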

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0;
else C3 = output carry of the stage 4 adder
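
The skip rule can be sketched behaviourally as follows, assuming equal-sized stages built from ripple adders. It models only the logic: when every propagate bit in a stage is 1, the stage's carry-out equals its carry-in, so the bypass changes the delay, not the result. The names `ripple` and `carry_skip_add` and the `block` parameter are hypothetical.

```python
def ripple(a_bits, b_bits, cin):
    """Ripple addition over one stage; bit lists are LSB first."""
    carry = cin
    sums = []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def carry_skip_add(a_bits, b_bits, cin=0, block=4):
    """Add stage by stage, bypassing the carry when all Pi = 1."""
    sums = []
    carry = cin
    for i in range(0, len(a_bits), block):
        a_blk = a_bits[i:i + block]
        b_blk = b_bits[i:i + block]
        p_blk = [a ^ b for a, b in zip(a_blk, b_blk)]   # Pi = Ai xor Bi
        s_blk, ripple_cout = ripple(a_blk, b_blk, carry)
        skip = all(p_blk)                # skip signal: P0 and P1 and ...
        carry = carry if skip else ripple_cout
        sums.extend(s_blk)
    return sums, carry


def to_bits(value, width):
    return [(value >> i) & 1 for i in range(width)]


if __name__ == "__main__":
    # 0b0101 + 0b1010: every Pi is 1, so Cout follows Cin
    print(carry_skip_add(to_bits(5, 4), to_bits(10, 4), cin=1))
```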

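Delay of CSA

The worst-case delay T is given by
T = 2(P - 1)·K1 + (M - 2)·K2
where
- P = number of cells in each block, M = number of blocks (as for the carry select adder)
- K1 = delay through one adder cell
- K2 = delay for skipping the carry over a block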
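
A short sketch that evaluates T = 2(P - 1)·K1 + (M - 2)·K2 for several block sizes of a 16-bit adder. The word length and the values K1 = 1.0 and K2 = 0.5 are illustrative assumptions, and P and M are taken to be cells per block and number of blocks.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst-case delay T = 2*(P - 1)*K1 + (M - 2)*K2."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


K1, K2 = 1.0, 0.5   # hypothetical cell and skip delays
N_BITS = 16         # hypothetical word length

# Trade-off: small blocks mean many skips, large blocks mean long ripples.
delays = {p: carry_skip_delay(p, N_BITS // p, K1, K2) for p in (2, 4, 8)}

if __name__ == "__main__":
    for p, t in delays.items():
        print(f"P={p}, M={N_BITS // p}: T={t}")
```

Under these assumed values the delays are 5.0, 7.0 and 14.0 for P = 2, 4 and 8, so the ripple term dominates as blocks get larger.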

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK
- Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder
- Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add

Multipliers
- Serial-parallel multiplier
- Braun array
- 2's complement multiplication using the Baugh-Wooley method
- Pipelined multiplier array
- Modified Booth's algorithm
- Wallace tree multipliers
- Recursive decomposition of multiplication
- Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier
- A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
- Completion time is proportional to log2 n
- Optimized column adder tree
- Combines all partial products into 2 vectors (carry and sum)
- Carry and sum outputs are combined using a conventional adder
- Compresses the number of stages of partial products
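
A simplified sketch of the column reduction for unsigned operands. It uses only full adders as 3:2 compressors and passes leftover bits through unchanged, which is a simplification of a true Wallace tree (no half adders). The final two vectors are added with Python's integer addition standing in for the conventional adder. The function names are hypothetical.

```python
def full_adder(a, b, c):
    """3:2 compressor: three bits in, (sum, carry) out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def wallace_multiply(x, y, n=4):
    """Multiply two unsigned n-bit integers by column compression."""
    # Partial product bit x_i AND y_j has weight 2**(i + j).
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((x >> i) & 1) & ((y >> j) & 1))

    stages = 0
    # Compress until every column holds at most two bits.
    while max(len(col) for col in columns) > 2:
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            full = len(col) - len(col) % 3
            for t in range(0, full, 3):
                s, c = full_adder(col[t], col[t + 1], col[t + 2])
                nxt[k].append(s)        # sum stays in this column
                nxt[k + 1].append(c)    # carry moves one column up
            nxt[k].extend(col[full:])   # leftover bits pass through
        columns = nxt
        stages += 1

    # Two remaining vectors, combined by a conventional adder.
    sum_vec = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    carry_vec = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return sum_vec + carry_vec, stages


if __name__ == "__main__":
    product, stages = wallace_multiply(13, 11)
    print(product, stages)
```

Every full adder replaces three bits of one weight with a sum bit of the same weight and a carry bit of the next weight, so the weighted total of all columns is unchanged at each stage.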

Wallace Tree Multiplier Structure

Sequential Logic

- D flip-flop
- Two-phase clocking
- Dynamic shift register
- RAM
- ROM

Design of Subsystem - ALU

Data path of a Processor

Designing complex systems - ALU
- Complex systems are designed with a top-down approach, with the help of CAD tools
- Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
- Generate and verify each section of the design
- Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU

Page 63: Chp 3- Sub-system Design

Examples Combinational Logic

5-way selector Multiplexers amp Demultiplexers Encoders amp Decoders Parity Generator Bus Arbitration Logic Gray Code to Binary Code

Converter

Adders

1-bit full adder Manchester Carry Chain Adder Enhancement Techniques

Carry Select Adders Carry Skip Adders Carry Look-ahead adders

32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

Adder is made up of standard cells of

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (81 and 41 ratio)

Butting contact - L=22λ W=10λ

Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal

contacts

So for Adder Standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0

else C3 = output carry of the stage-4 adder
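The skip condition can be sketched for one 4-bit stage as follows. The function name `carry_skip_block` is hypothetical, bits are stored least-significant first, and the model is behavioural: it shows which path supplies the carry, not the timing.

```python
def carry_skip_block(a_bits: list[int], b_bits: list[int], cin: int):
    """One carry-skip stage; returns (sum bits, carry_out, skip flag)."""
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]  # Pi = 1 when Ai != Bi

    # Ordinary ripple path through the stage.
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))

    # Bypass path: if every bit propagates, the carry-in goes straight out.
    skip = all(propagate)
    cout = cin if skip else carry
    return sums, cout, skip


# A = 0101, B = 1010 (LSB first below): every bit pair differs, so the stage skips.
print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], cin=1))
```

When the skip flag is set, the ripple path would eventually produce the same carry value; the bypass only delivers it sooner, which is where the speed-up comes from.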

Delay of CSA

Worst-case delay T is given by T = 2(P-1)·K1 + (M-2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
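The worst-case expression can be explored numerically to see how block size trades ripple delay against skip delay. This sketch assumes that P is the number of cells per block and M the number of blocks, as in the carry-select formula, and that the blocks are equal so that n = P·M; the function names and delay values are hypothetical.

```python
def carry_skip_delay(p: int, m: int, k1: float, k2: float) -> float:
    """T = 2(P-1)*K1 + (M-2)*K2 for a carry-skip adder."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def best_block_size(n: int, k1: float, k2: float) -> int:
    """Block size P (a divisor of n, at least 2 blocks) with the lowest delay."""
    candidates = [p for p in range(2, n // 2 + 1) if n % p == 0]
    return min(candidates, key=lambda p: carry_skip_delay(p, n // p, k1, k2))


# Hypothetical 32-bit adder with K1 = K2 = 1.0 (arbitrary units).
for p in (2, 4, 8, 16):
    print(p, carry_skip_delay(p, 32 // p, 1.0, 1.0))
print("best P:", best_block_size(32, 1.0, 1.0))
```

Small blocks mean many skips, large blocks mean long ripples inside the first and last block, so the minimum falls at an intermediate block size.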

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
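The shift-and-add structure can be illustrated with a short unsigned model. This is a sketch only: the function name is hypothetical and the default 4-bit width is chosen to match the 4-bit multiplier mentioned above.

```python
def shift_and_add_multiply(multiplicand: int, multiplier: int, width: int = 4) -> int:
    """Unsigned multiply: add a shifted multiplicand for every 1 bit of the multiplier."""
    product = 0
    for i in range(width):
        if (multiplier >> i) & 1:          # examine multiplier bit i
            product += multiplicand << i   # partial product, shifted i places
    return product


print(shift_and_add_multiply(13, 11))  # 143
```

Each loop iteration corresponds to one partial product; a serial implementation would spend one clock cycle per multiplier bit, while an array multiplier forms all partial products at once.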

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Compresses the number of stages of partial products
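The reduction to two vectors can be sketched with carry-save (3:2) compression. This is a simplified word-level model, assumed here for illustration: a real Wallace tree groups bits column by column, whereas this sketch compresses whole partial-product rows three at a time. The names `carry_save` and `wallace_multiply` are hypothetical.

```python
def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression: returns (sum vector, carry vector) with x+y+z == s+c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def wallace_multiply(a: int, b: int, width: int = 4) -> tuple[int, int]:
    """Unsigned multiply via carry-save reduction; returns (product, levels)."""
    # One partial product row per multiplier bit.
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3   # rows that fit into groups of three
        next_rows = []
        for i in range(0, full, 3):
            s, c = carry_save(rows[i], rows[i + 1], rows[i + 2])
            next_rows += [s, c]
        next_rows += rows[full:]           # leftover rows pass through unchanged
        rows = next_rows
        levels += 1
    # The final two vectors (sum and carry) go to a conventional adder.
    return sum(rows), levels


print(wallace_multiply(13, 11))  # (143, 2)
```

No carry propagates inside the reduction levels; only the final addition needs a carry-propagating adder, and the number of levels grows roughly logarithmically with the number of rows.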

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems
Generate and verify each section of the design
Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit

Routing Power rails for ALU


Adders

1-bit full adder
Manchester Carry Chain
Adder Enhancement Techniques
- Carry Select Adders
- Carry Skip Adders
- Carry Look-ahead Adders
32-bit Adders

1-bit full adder

CMOS Full Adder

Sum Output Carry output

Complementary Static CMOS Adder

Standard cells dimensions for Adder

The adder is made up of the following standard cells:

Multiplexers - L=11λ W=7λ

i) NMOS Inverter (8:1 and 4:1 ratio)
Butting contact - L=22λ W=10λ
Buried contact - L=30λ W=10λ

ii) CMOS Inverter - L=35λ W=18λ

Communication paths - 3λ spacing between metal contacts

So for the adder standard cell - L=190λ W=150λ

Layout of full adder

Layout2 of full adder







Page 70: Chp 3- Sub-system Design

Layout2 of full adder

Propagate and Generate Signal

For a full adder define what happens to carries Generate Cout = 1 independent of C

G = A bull B Propagate Cout = C

P = A B Kill Cout = 0 independent of C

K = ~A bull ~B

Manchester Carry Chain

Build from switch logic using propagate generate amp kill

Manchester Chain adders The carry input is

precharged with clock signal instead of passing through the Logic

Carry path is gated by Propagate signal (P=A^B) with a single n-type pass transistor

Generate signal (G= ArsquoBrsquo)

4-stage Dynamic Manchester Chain

Carry-Ripple Adder

Simplest design cascade full adders Critical path goes from Cin to Cout Design full adder to have fast carry

delay

CinCout

B1A1B2A2B3A3B4A4

S1S2S3S4

C1C2C3

Carry Look-Ahead Adder (CLA) To avoid the linear growth of the carry delay we

use a Carry Look-Ahead Adder (CLA) in which the carries can be generated in parallel

Output Carry is represented in terms of generate and propagate signals

The Expressions for 4-bit CLA are

4-Stage Carry look-ahead adder

1-bit CMOS CLA

C1=G0+P0C0

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The Delay of n-bit adder is given by T=PK1+(M-1)K2

where M=No of Blocks in the adder

P=No of Cells of each block K1=Delay through one adder

cell

K2 = Delay through Multiplexer

Carry Skip Adder (CSA)

It is also called as Carry Bypass Adder

Looks for cases in carry-out of a set of bits is identical to carry in

Typically organized into m-bit stages If Ai neBi for every bit in stage then

bypass gate sends stagersquos carry input directly to carry output

Simplified Carry-Skip Adder Logic

FAFA

aibi

sisi+1

ai+1bi+1

cici+1

pipi+1

skip signal

ci+1

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1, then C3 = C0

else C3 = output carry of the stage 4 adder
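The bypass condition can be modelled for a single block in Python. This is a behavioural sketch with the hypothetical name `carry_skip_block`: the block propagate is the AND of the per-bit propagate signals, and when it is 1 the carry input is forwarded unchanged. The rippled carry has the same value in that case, so the bypass changes the delay and not the result.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip block (LSB first). Returns (sum_bits, cout, skip)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # per-bit propagate: Ai != Bi
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)      # normal ripple path
    skip = all(p)                                # P0 and P1 and P2 and P3
    cout = cin if skip else carry                # bypass multiplexer
    return sum_bits, cout, skip

# A = 5 and B = 10 (LSB first): every bit position propagates, so the carry skips.
print(carry_skip_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
# 5 + 10 + 1 = 16, so the sum bits are all 0, the carry out is 1 and skip is True
```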

Delay of CSA

The worst-case delay T is given by

T = 2(P-1)·K1 + (M-2)·K2

where
K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
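To get a feel for the formula, the Python sketch below compares it with a plain ripple-carry adder of the same width. Two things are assumptions of this example and not statements from the slides: P and M are taken to mean the number of cells per block and the number of blocks, as in the carry-select formula, and the ripple adder is modelled as one K1 delay per cell. The delay values are hypothetical.

```python
def carry_skip_delay(p, m, k1, k2):
    """T = 2(P-1)*K1 + (M-2)*K2 for a carry-skip adder."""
    return 2 * (p - 1) * k1 + (m - 2) * k2

def ripple_delay(p, m, k1):
    """Assumed model: the carry ripples through all P*M cells, K1 per cell."""
    return p * m * k1

# Hypothetical 16-bit adder: 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5 (arbitrary units).
print(carry_skip_delay(4, 4, 1.0, 0.5))  # 2*3*1.0 + 2*0.5 = 7.0
print(ripple_delay(4, 4, 1.0))           # 16 * 1.0 = 16.0
```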

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster, and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower, and its power consumption was 68% of that of the carry-lookahead adder

Comparison of Adders

MULTIPLIERS

Multipliers Basics

4-bit Multiplier

Multiplier structure using Shift and Add
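Shift-and-add multiplication forms one partial product per multiplier bit and accumulates the shifted copies of the multiplicand. A minimal Python sketch for unsigned operands follows; the function name `shift_add_multiply` and the default 4-bit width are assumptions of this example.

```python
def shift_add_multiply(a, b, n=4):
    """Unsigned n-bit shift-and-add multiplication of a by b."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:       # multiplier bit i selects the partial product
            product += a << i  # multiplicand shifted left by i positions
    return product

print(shift_add_multiply(13, 11))  # 13 * 11 = 143
```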

Multipliers

Serial-parallel multiplier
Braun array
2's complement multiplication using the Baugh-Wooley method
Pipelined multiplier array
Modified Booth's algorithm
Wallace tree multipliers
Recursive decomposition of multiplication
Dadda's method

Serial-Parallel Multiplier

Braun Array

Modified Booth Algorithm

Booth Multiplier Procedure

Structure of Booth's Multiplier

Wallace Tree Multiplier

A Wallace tree is an implementation of an adder tree designed for minimum propagation delay
Completion time is proportional to log2(n)
Optimized column adder tree
Combines all partial products into 2 vectors (carry and sum)
Carry and sum outputs are combined using a conventional adder
Reduces the number of stages of partial products
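The reduction to two vectors can be illustrated with a word-level model in Python. This is a simplification: a real Wallace tree places full adders column by column, while this sketch applies a 3:2 carry-save step to whole partial-product rows. The names `csa` and `wallace_multiply` are hypothetical.

```python
def csa(x, y, z):
    """3:2 carry-save step on whole words: x + y + z == s + c, with no carry propagation."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def wallace_multiply(a, b, n=4):
    """Unsigned n-bit multiply; returns (product, number of reduction levels)."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]  # partial products
    levels = 0
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3      # rows that form complete groups of three
        nxt = []
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        nxt.extend(rows[full:])               # leftover rows pass to the next level
        rows = nxt
        levels += 1
    return sum(rows), levels                  # final conventional (carry-propagate) add

print(wallace_multiply(13, 11))  # (143, 2): four rows reduce 4 -> 3 -> 2
```

Each level shrinks the row count by roughly a factor of 3/2, which is where the logarithmic number of levels comes from.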

Wallace Tree Multiplier Structure

Sequential Logic

D flip-flop
Two-phase clocking
Dynamic shift register
RAM
ROM

Design of Subsystem - ALU

Data path of a Processor

Designing the complex systems - ALU

Complex systems are designed using a top-down approach with the help of CAD tools

Partition the system sensibly, aiming for simple interconnection and high regularity between sub-systems

Generate and verify each section of the design

Calculate the dimensions of the layout of the sub-systems and check their proportion of the total chip area

Bit Slice design of ALU

Design of Adder and Shifter is essential for ALU

Barrel Shifter in 4-bit ALU

Barrel Shifter- Operation

4x4 Barrel Shifter Pass Transistor Circuit
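A 4x4 crossbar barrel shifter can be modelled as a grid of switches. This Python sketch assumes one pass transistor per (output, input) pair, one-hot shift control lines, and wrap-around (rotate) behaviour. The rotation direction and the name `barrel_shift_4x4` are assumptions of this example and are not taken from the circuit on the slide.

```python
def barrel_shift_4x4(in_bits, shift):
    """Model of a 4x4 crossbar barrel shifter with one-hot shift lines."""
    n = 4
    sh = [1 if k == shift % n else 0 for k in range(n)]  # exactly one control line is high
    out = [0] * n
    for i in range(n):              # output line i
        for j in range(n):          # input line j
            if sh[(j - i) % n]:     # switch (i, j) is gated by line (j - i) mod 4
                out[i] = in_bits[j] # closed switch passes input j to output i
    return out

print(barrel_shift_4x4([1, 0, 0, 0], 0))  # [1, 0, 0, 0]: shift 0 passes the word through
print(barrel_shift_4x4([1, 0, 0, 0], 1))  # [0, 0, 0, 1]: the word rotates by one position
```

Exactly one switch per output line is closed for any shift amount, so any shift is completed in a single pass through the array.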

Routing Power rails for ALU









Page 79: Chp 3- Sub-system Design

4-bit CMOS CLA

Carry Select Adder

It is also referred as Conditional sum adder

The adder is divided into two blocks One block with logical 0 Cin Another block with logical 1 Cin

S and Cout are selected by actual Cin using 21 Multiplexer

Carry Select Adder Structure

8-bit Carry Select Adder

Delay of Carry Select Adder

The delay of an n-bit adder is given by T = P*K1 + (M-1)*K2, where

M = number of blocks in the adder
P = number of cells in each block
K1 = delay through one adder cell
K2 = delay through a multiplexer
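As a worked example of the formula above, the following sketch evaluates T for a hypothetical 16-bit adder built from M = 4 blocks of P = 4 cells. The delay values K1 = 1.0 and K2 = 0.5 are arbitrary units chosen for illustration, not measured figures.

```python
def carry_select_delay(p, m, k1, k2):
    """T = P*K1 + (M-1)*K2 for a carry-select adder."""
    return p * k1 + (m - 1) * k2


# Hypothetical 16-bit adder: 4 blocks of 4 cells, arbitrary delay units.
t = carry_select_delay(p=4, m=4, k1=1.0, k2=0.5)
print(t)  # 4*1.0 + 3*0.5 = 5.5
```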

Carry Skip Adder (CSA)

It is also called the carry-bypass adder

Looks for cases in which the carry-out of a set of bits is identical to the carry-in

Typically organized into m-bit stages

If Ai ≠ Bi for every bit in a stage, then the bypass gate sends the stage's carry input directly to the carry output

Simplified Carry-Skip Adder Logic

[Figure: two full-adder (FA) cells; labels: ai, bi, ai+1, bi+1, si, si+1, ci, ci+1, pi, pi+1, skip signal]

2-Stage Carry Skip Adder

4-stage carry skip adder

If (P0 and P1 and P2 and P3) = 1 then C3 = C0

else C3 = output carry of the stage 4 adder
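The skip condition can be sketched behaviourally as follows, assuming one stage is given as little-endian bit lists. The propagate signal of each bit is Pi = Ai XOR Bi; when every Pi is 1 the stage's carry input is forwarded directly to its carry output. The name `carry_skip_block` is illustrative only.

```python
def carry_skip_block(a_bits, b_bits, cin):
    """One carry-skip stage; returns (sum_bits, cout, skip)."""
    s, c, skip = [], cin, 1
    for a, b in zip(a_bits, b_bits):
        p = a ^ b                    # propagate: 1 when the bits differ
        s.append(p ^ c)              # sum bit
        c = (a & b) | (p & c)        # ripple carry inside the stage
        skip &= p                    # skip stays 1 only if every bit propagates
    # Bypass gate: with all propagates high, carry-out equals carry-in.
    cout = cin if skip else c
    return s, cout, skip
```

The bypass does not change the logical result, since the rippled carry already equals the carry-in when every bit propagates; its purpose is to shorten the carry path.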

Delay of CSA

Worst-case delay T is given by T = 2(P-1)*K1 + (M-2)*K2, where

K1 = delay through one adder cell
K2 = delay for skipping the carry over a block
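A short numeric sketch of the worst-case formula, under the assumption that P and M have the same meaning as for the carry-select adder (cells per block and number of blocks). The ripple-carry figure n*K1 is added only for comparison and is likewise an assumption (n cells in series); all delay values are arbitrary units.

```python
def carry_skip_delay(p, m, k1, k2):
    """Worst case T = 2(P-1)*K1 + (M-2)*K2 for a carry-skip adder."""
    return 2 * (p - 1) * k1 + (m - 2) * k2


def ripple_delay(n, k1):
    """Assumed ripple-carry delay: the carry passes through all n cells in series."""
    return n * k1


# Hypothetical 16-bit adder: 4 blocks of 4 cells, K1 = 1.0, K2 = 0.5.
print(carry_skip_delay(p=4, m=4, k1=1.0, k2=0.5))  # 2*3*1.0 + 2*0.5 = 7.0
print(ripple_delay(n=16, k1=1.0))                  # 16.0
```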

Delay for Ripple and Carry Skip Adder

Comparison of CLA and CSK

Using 32-bit operands, a multi-level carry-skip adder was 14% faster and its power dissipation was 58% of that of the carry-lookahead adder

Using 64-bit operands, a one-level carry-skip adder was 38% slower and its power consumption was 68% of that of the carry-lookahead adder
