21bx21b booth 2 multiplier

Date post: 02-Jul-2015
Upload: bharat-biyani
Description:
Designed a 21b x 21b multiplier using the Booth-2 algorithm by constructing schematics of the decoder, partial product generation and compression, and the adder (Carry Look Ahead). Performed Hspice simulation to verify correct functionality, library characterization of the assembled netlist using Siliconsmart ACE, and RTL synthesis of the generated library. Timing and power consumption are analyzed through static timing analysis using Synopsys Primetime.
21b x 21b Multiplier Design

 

EE7325 Page 1  

Project Description

• 21b x 21b multiplier design with emphasis on speed.
• The schematic is designed in IBM 130 nm process technology.
• Input operands are positive.
• Design is verified using Hspice.
• Siliconsmart ACE is used to characterize the cells.
• Power and delay are found from Primetime.
• The design uses Booth-2, the partial products are compressed, and a carry propagate adder is used.

Introduction

Multiplication is a heavily used arithmetic operation, prominent in signal processing and scientific applications. It is hardware intensive, and the main criteria of interest are higher speed, lower cost, and smaller VLSI area. Classic multiplication is often realized as a number of shift-and-add cycles, so the main concern is to speed up the underlying multi-operand addition of partial products. This approach can be slow when there are many partial products, because the output must wait until each sum is performed. Hence we use Booth's algorithm, which cuts the number of required partial products roughly in half and in turn reduces the hardware and delay required to sum them.

Booth’s Algorithm

The Booth-2 algorithm examines overlapping groups of three adjacent bits of the N-bit multiplier (Y2i+1, Y2i, Y2i-1), including an implicit bit below the least significant bit, Y-1 = 0. Each group selects a partial product formed from the multiplicand by multiplying it by 0, by 1 (i.e. no change), by 2 (shift left by one bit), by -1 (2's complement), or by -2 (2's complement and shift left by one bit). The encodings are shown in Table 1. Each partial product after the first is shifted left by two bits relative to the previous one. The product is equal to the sum of these terms. This algorithm reduces the number of partial products from n to about n/2.

Y2i+1   Y2i   Y2i-1   Recoded Digit
  0      0      0           0
  0      0      1           1
  0      1      0           1
  0      1      1           2
  1      0      0          -2
  1      0      1          -1
  1      1      0          -1
  1      1      1           0

Table 1: Booth Encoding
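The recoding in Table 1 can be sketched in a few lines of Python. This is an illustrative software model only, not the schematic from the report; the function name booth2_digits and the choice of n_bits // 2 + 1 digits for an unsigned operand are assumptions made for the example. Each digit equals -2*Y2i+1 + Y2i + Y2i-1, which reproduces every row of the table.

```python
def booth2_digits(y: int, n_bits: int = 21) -> list[int]:
    """Recode an unsigned n_bits-wide multiplier into Booth-2 digits in {-2..2}.

    Digit i is computed from the triplet (y[2i+1], y[2i], y[2i-1]) with the
    implicit bit y[-1] = 0. Bits above n_bits read as 0 (unsigned operand).
    """
    digits = []
    prev = 0  # implicit bit below the LSB
    for i in range(n_bits // 2 + 1):
        y2i = (y >> (2 * i)) & 1
        y2i1 = (y >> (2 * i + 1)) & 1
        digits.append(-2 * y2i1 + y2i + prev)
        prev = y2i1  # the top bit of this group overlaps into the next group
    return digits


if __name__ == "__main__":
    all_ones = (1 << 21) - 1
    print(booth2_digits(all_ones))  # [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
```

Weighting digit i by 4^i and summing gives back the original multiplier, which is why the partial products are shifted by two bits per row.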

Architecture

In this project, we have designed a high-speed, low-power 21-bit x 21-bit multiplier using the Booth-2 algorithm. The 21-bit multiplier is divided into 11 overlapping groups of three bits, each group advancing by two bits. Each group is passed into a Booth encoder, whose output bits correspond to the operations described in Table 1. Each set of selection bits is sent to a Booth decoder block. The decoded bits are then used to select bits of the multiplicand through a partial product multiplexer (PPMUX), which outputs the appropriate partial product bits for the selected operation. The partial products are then sign extended, and the rows of partial products are compressed. The compressed output bits are added using a low-area carry propagate adder to produce the final product. A standard array multiplier would typically require 21 partial products; this implementation reduces the number to 11, significantly reducing the area and improving the speed. The block diagram of the complete design is shown in Figure 2.
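As a rough behavioral model of this architecture (not the transistor-level design), the sketch below forms the 11 Booth-2 partial products and sums them. The function name booth2_partial_products is hypothetical, and Python integers stand in for the sign-extended rows, compression tree, and adder.

```python
def booth2_partial_products(x: int, y: int, n_bits: int = 21) -> list[int]:
    """Return the signed, pre-shifted Booth-2 partial products of x * y.

    x and y are unsigned n_bits-wide operands. Row i is digit_i * x * 4**i,
    where digit_i follows Table 1.
    """
    partials = []
    prev = 0  # implicit y[-1] = 0
    for i in range(n_bits // 2 + 1):
        y2i = (y >> (2 * i)) & 1
        y2i1 = (y >> (2 * i + 1)) & 1
        digit = -2 * y2i1 + y2i + prev
        prev = y2i1
        partials.append((digit * x) << (2 * i))
    return partials


if __name__ == "__main__":
    x = y = (1 << 21) - 1
    rows = booth2_partial_products(x, y)
    print(len(rows))           # 11
    print(sum(rows) == x * y)  # True
```

Only 11 rows are summed instead of the 21 rows of a plain array multiplier, which is the saving the paragraph above describes.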

Figure 1: 21bx21b multiplier

Figure 2: Complete Block Diagram of the Multiplier

[Block diagram labels: Booth Encoder (inputs Y2i+1, Y2i, Y2i-1); PPMUX (select values 1, 2, 0, -1, -2; multiplicand inputs X2i-1, X2i); Add Block (0, 1); 11:2 Compression; Adder; Result]

Component Design

• Booth Encoder

The Booth encoding block is designed using Table 1. Each recoded digit is generated from the corresponding input logic. The gate-level schematic and the transistor-level schematic for one of the five recoded digits (code 0) are shown below.

Y2i+1   Y2i   Y2i-1   Recoded Digit
  0      0      0           0
  1      1      1           0

Figure 3: Gate level schematic for code 0
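One possible set of Boolean equations for the five recoded digits, derived directly from Table 1, is sketched below. The equations and the signal names (zero, pos1, pos2, neg1, neg2) are assumptions for illustration; the report's schematics may factor the logic differently.

```python
def booth_encoder(y2i1: int, y2i: int, y2im1: int) -> dict[str, int]:
    """One-hot select signals for a Booth-2 triplet (Y2i+1, Y2i, Y2i-1)."""
    a, b, c = y2i1 & 1, y2i & 1, y2im1 & 1
    na, nb, nc = a ^ 1, b ^ 1, c ^ 1
    return {
        "zero": (na & nb & nc) | (a & b & c),  # 000 or 111
        "pos1": na & (b ^ c),                  # 001 or 010
        "pos2": na & b & c,                    # 011
        "neg2": a & nb & nc,                   # 100
        "neg1": a & (b ^ c),                   # 101 or 110
    }


if __name__ == "__main__":
    print(booth_encoder(0, 0, 0)["zero"], booth_encoder(1, 1, 1)["zero"])  # 1 1
```

The "zero" term corresponds to the two rows listed for code 0: all inputs low or all inputs high.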

Figure 4: Schematic for code zero

Figure 5: Complete schematic of the Booth Encoder

Figure 6: Symbol view of the Booth encoder

• Partial Product Multiplexer (PPMUX)

Pass transistor logic (PTL) is used to form the PPMUX. The encoded bits decide which bits of the multiplicand are selected, and the PPMUX outputs the corresponding bits of the partial product. The schematic of the PPMUX, which acts as the Booth decoder, is shown in Figure 7. Using PTL greatly reduces the area of the decoder.

Figure 7: Schematic of the Partial Product Multiplexer

Figure 8: Symbol of the Partial Product Multiplexer
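The selection performed by the PPMUX can be modeled bit by bit, as in the hedged sketch below. For each output bit it picks 0, Xi, Xi-1, or the complement of one of those, depending on the recoded digit. The 22-bit row width and the function names are assumptions for the example; the extra +1 needed for negative digits is left to a separate add step.

```python
def ppmux_bit(x_i: int, x_im1: int, digit: int) -> int:
    """Select one partial product bit for a recoded digit in {-2, -1, 0, 1, 2}."""
    if digit == 1:
        return x_i          # multiplicand unchanged
    if digit == 2:
        return x_im1        # multiplicand shifted left by one
    if digit == -1:
        return x_i ^ 1      # complemented multiplicand
    if digit == -2:
        return x_im1 ^ 1    # complemented, shifted multiplicand
    return 0


def ppmux_row(x: int, digit: int, width: int = 22) -> int:
    """Build one partial product row (without the +1 for negative digits)."""
    row = 0
    for i in range(width):
        x_i = (x >> i) & 1
        x_im1 = (x >> (i - 1)) & 1 if i > 0 else 0
        row |= ppmux_bit(x_i, x_im1, digit) << i
    return row


if __name__ == "__main__":
    x = 0b1011
    print(ppmux_row(x, 2) == 2 * x)                       # True
    print((ppmux_row(x, -1) + 1) % (1 << 22) == (-x) % (1 << 22))  # True
```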

• Add Block

If the recoded digit is negative, we have to take the 2's complement of the multiplicand. The 2's complement is formed by complementing all the bits and adding one at the LSB. The complement operation is done by the PPMUX, and an add block is designed to add the one. The add block is also built using PTL, is driven by the recoded Booth digits, and is shown in Figure 9.

Figure 9: Schematic of the Add block

Figure 10: Symbol view of the Add block
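The split between complementing in the PPMUX and adding one in the add block can be checked with a small model. The sketch below assumes a 43-bit datapath (matching the 43-bit adder mentioned later) and sign-extends each complemented row to that width; the names rows_with_add_bits and booth2_product are hypothetical.

```python
WIDTH = 43                  # assumption: datapath as wide as the final adder
MASK = (1 << WIDTH) - 1


def rows_with_add_bits(x: int, y: int, n_bits: int = 21):
    """Return (rows, add_bits): complemented rows plus the separate +1 bits."""
    rows, add_bits = [], []
    prev = 0  # implicit y[-1] = 0
    for i in range(n_bits // 2 + 1):
        y2i = (y >> (2 * i)) & 1
        y2i1 = (y >> (2 * i + 1)) & 1
        digit = -2 * y2i1 + y2i + prev
        prev = y2i1
        magnitude = abs(digit) * x
        negative = 1 if digit < 0 else 0
        # Negative digits use the sign-extended ones' complement of the row.
        row = (~magnitude & MASK) if negative else magnitude
        rows.append((row << (2 * i)) & MASK)
        add_bits.append(negative << (2 * i))  # the +1 lands at the row's LSB
    return rows, add_bits


def booth2_product(x: int, y: int) -> int:
    rows, add_bits = rows_with_add_bits(x, y)
    return (sum(rows) + sum(add_bits)) & MASK


if __name__ == "__main__":
    print(booth2_product(0x0AAAAA, 0x155555) == 0x0AAAAA * 0x155555)  # True
```

Because the carries out of bit 42 are discarded, the complemented rows and their add bits together behave like true negative partial products.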

• Compression Module

In our design, we have 11 rows and 46 columns. We use 3:2 compressors (i.e. full adders) for compression. The idea is to compress the partial product rows in each column down to 2 rows, which are then added to get the output product. The carries generated in one column are dropped into the next column, where they are added to the sum of that column. This way, rippling of the carry is avoided. The compression in each column depends mainly on the number of inputs and on the carries passed in from the previous column. The 21st column has the maximum number of inputs, which is 11. So the approximate number of full adders required is given by,

N = Xin + Cin - D

where
Xin = number of inputs to be compressed = 11
Cin = carries passed from the previous column = n - 3 = 8
D = number of drops = 1

So N = Xin + Cin - D = 11 + 8 - 1 = 18

Number of full adders = N / 2 = 18 / 2 = 9 full adders

The block diagram of the compression block is shown in Figure 11.
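The carry-save idea can be illustrated at word level: each 3:2 compressor takes three rows and returns a sum row and a carry row shifted one column left, so no carry ripples within a stage. The sketch below is a simplified model with hypothetical names, and its grouping of rows is not necessarily the wiring of Figure 11. Reducing 11 rows to 2 takes 9 compressor stages, which agrees with the 9 full adders estimated above for the fullest column.

```python
def compress_3_to_2(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise full adders across all columns: returns (sum row, carry row)."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (b & c) | (a & c)) << 1  # carries drop into the next column
    return sum_row, carry_row


def compress_to_two(rows: list[int]) -> tuple[list[int], int]:
    """Reduce non-negative rows to two rows; also count the 3:2 stages used."""
    rows = list(rows)
    used = 0
    while len(rows) > 2:
        next_rows = []
        while len(rows) >= 3:
            a, b, c = rows.pop(), rows.pop(), rows.pop()
            next_rows.extend(compress_3_to_2(a, b, c))
            used += 1
        next_rows.extend(rows)  # leftover rows pass through to the next level
        rows = next_rows
    return rows, used


if __name__ == "__main__":
    eleven = [3 * k + 1 for k in range(11)]
    two, used = compress_to_two(eleven)
    print(len(two), used, sum(two) == sum(eleven))  # 2 9 True
```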

Figure 11: 11:2 Compression

[Diagram labels: nine full adders; inputs in0 to in10; intermediate carries C1 to C8; outputs S and C]

Figure 12: Generated partial products that form the inputs to the compression block

The full adder is designed using a mirror carry (MC) stage and a mirror sum (MS) stage, which gives the least possible area for an adder: 12 transistors (MC) and 16 transistors (MS), for a total of 28 transistors with no diffusion breaks. The design is only about 5% slower than the NAND-based sum but has much less area. The transistor-level schematic of the full adder is shown in Figure 13.

Figure 13: Schematic of Full Adder
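The logic usually associated with a mirror adder can be written as two inverting stages, where the sum stage reuses the inverted carry. The sketch below uses the standard textbook equations as an assumption; it does not claim to match the exact transistor arrangement in Figure 13.

```python
def mirror_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, cout) using the mirror-adder form of the equations."""
    # Carry stage produces the inverted carry: !(A*B + Cin*(A + B))
    cout_n = 1 ^ ((a & b) | (cin & (a | b)))
    # Sum stage reuses cout_n: !(A*B*Cin + (A + B + Cin) * !Cout)
    sum_n = 1 ^ ((a & b & cin) | ((a | b | cin) & cout_n))
    # Output inverters restore the true polarities.
    return sum_n ^ 1, cout_n ^ 1


if __name__ == "__main__":
    print(mirror_full_adder(1, 1, 0))  # (0, 1)
    print(mirror_full_adder(1, 1, 1))  # (1, 1)
```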

The half adder sum is an XOR of the two bits and the carry is an AND operation. The XOR is implemented using NOR2 + AOI21 combination and the AND is implemented as NAND2+INV. The schematic of the half adder is as shown in Figure 14.

Figure 14: Schematic of the Half Adder
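One way to obtain an XOR from a NOR2 and an AOI21 is to feed the NOR2 output into the single-input leg of the AOI21, as sketched below. This wiring is an assumption chosen because it yields the right truth table; the schematic in Figure 14 may connect the gates differently.

```python
def nor2(a: int, b: int) -> int:
    return 1 ^ (a | b)


def aoi21(a: int, b: int, c: int) -> int:
    """AND-OR-INVERT: !((a AND b) OR c)."""
    return 1 ^ ((a & b) | c)


def nand2(a: int, b: int) -> int:
    return 1 ^ (a & b)


def inv(a: int) -> int:
    return 1 ^ a


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry): sum = XOR via NOR2 + AOI21, carry = NAND2 + INV."""
    s = aoi21(a, b, nor2(a, b))
    carry = inv(nand2(a, b))
    return s, carry


if __name__ == "__main__":
    print([half_adder(a, b) for a in (0, 1) for b in (0, 1)])
    # [(0, 0), (1, 0), (1, 0), (0, 1)]
```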

• Carry Propagate Adder Design

Since the main concern for the design is speed, a Carry Lookahead adder with two trees is used. This adder is faster than a ripple carry adder because it calculates one or more carry bits ahead of the sum, which reduces the wait time for the higher-order result bits. However, it occupies more area than a ripple carry adder.

Figure 15: Schematic of the 43 bit Carry Lookahead Adder with two adders
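The lookahead principle can be sketched with generate and propagate signals combined in a parallel-prefix network. The example below uses a Kogge-Stone style prefix purely as an illustration; it is an assumption, not the two-tree structure of the report's adder, and the function name cla_add is hypothetical.

```python
def cla_add(a: int, b: int, width: int = 43) -> tuple[int, int]:
    """Add two width-bit values using generate/propagate prefix computation.

    Returns (sum modulo 2**width, carry out).
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # bit generates a carry
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagates a carry
    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        G_next, P_next = G[:], P[:]
        for i in range(dist, width):
            # Merge the group ending at i with the group ending at i - dist.
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist *= 2
    # After the prefix, G[i] is the carry out of bits 0..i.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]


if __name__ == "__main__":
    print(cla_add(5, 3))                # (8, 0)
    print(cla_add((1 << 43) - 1, 1))    # (0, 1)
```

All carries are available after about log2(43), i.e. 6, merge levels rather than after 43 ripple steps, which is the source of the speed advantage described above.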

The adder has a worst case delay of 372ps as shown in Figure 16.

Figure 16: Worst case delay of the CLA

Simulation Results

Figure 17 depicts the symbol view of the complete multiplier.

Figure 17: Symbol view of the complete multiplier

The functionality of the multiplier is tested using the following inputs, among others:

Case 1: X = Y = 1 1111 1111 1111 1111 1111

Expected Output: 0 1111 1111 1111 1111 1111 0000 0000 0000 0000 0000 01

Figure 18 shows the .mt0 file which depicts the obtained output

Figure 18: .mt0 file for Case 1

The obtained output is 0 1111 1111 1111 1111 1111 0000 0000 0000 0000 0000 01

Case 2: X = 0 1010 1010 1010 1010 1010

Y = 1 0101 0101 0101 0101 0101

Expected Output: 0 1110 0011 1000 1110 0010 0111 0001 1100 0111 0010

Figure 19 shows the .mt0 file which depicts the obtained output

Figure 19: .mt0 file for Case 2

The obtained output is 0 1110 0011 1000 1110 0010 0111 0001 1100 0111 0010
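The expected outputs for both cases can be cross-checked with ordinary integer multiplication. The snippet below is only a sanity check of the listed bit strings, read MSB first; the helper name bits is hypothetical.

```python
def bits(text: str) -> int:
    """Parse a space-grouped binary string, MSB first."""
    return int(text.replace(" ", ""), 2)


# Case 1: X = Y = all ones (21 bits)
case1_x = bits("1 1111 1111 1111 1111 1111")
case1_expected = bits("0 1111 1111 1111 1111 1111 0000 0000 0000 0000 0000 01")

# Case 2: alternating bit patterns
case2_x = bits("0 1010 1010 1010 1010 1010")
case2_y = bits("1 0101 0101 0101 0101 0101")
case2_expected = bits("0 1110 0011 1000 1110 0010 0111 0001 1100 0111 0010")

if __name__ == "__main__":
    print(case1_x * case1_x == case1_expected)  # True
    print(case2_x * case2_y == case2_expected)  # True
```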

Primetime Report

• Timing Report

****************************************
Report : timing
         -path_type full
         -delay_type min_max
         -input_pins
         -max_paths 1
         -transition_time
         -capacitance
         -sort_by slack
Design : final_design
Version: I-2013.12-SP3
Date   : Sat Aug 9 20:41:11 2014
****************************************

Startpoint: I1/I1227/b (internal pin)
Endpoint: m0 (output port)
Path Group: (none)
Path Type: min

Point                              Cap    Trans     Incr     Path
-----------------------------------------------------------------------------
I1/I1227/b (xor)                           0.00     0.00     0.00 f
I1/I1227/out (xor)               15.00     8.60     7.88     7.88 f
m0 (out)                                   8.60     0.00     7.88 f
data arrival time                                            7.88
-----------------------------------------------------------------------------
(Path is unconstrained)

Startpoint: I4/I53/I60/ximinusone (internal pin)
Endpoint: m41 (output port)
Path Group: (none)
Path Type: max

Point                              Cap    Trans     Incr     Path
-----------------------------------------------------------------------------
I4/I53/I60/ximinusone (ppmux)              0.00     0.00     0.00 f
I4/I53/I60/out (ppmux)            0.12     2.22     2.02     2.02 f
I4/I1409/cin (fulladder)                   2.22     0.00     2.02 f
I4/I1409/cout (fulladder)         0.04     0.12     0.61     2.63 f
I4/I1420/b (fulladder)                     0.12     0.00     2.63 f
I4/I1420/sum (fulladder)          0.03     0.05     0.24     2.87 f
I4/I1422/cin (fulladder)                   0.05     0.00     2.87 f
I4/I1422/sum (fulladder)          0.03     0.05     0.24     3.11 f
I4/I1423/cin (fulladder)                   0.05     0.00     3.11 f
I4/I1423/sum (fulladder)          0.04     0.05     0.24     3.35 f
I4/I1424/b (fulladder)                     0.05     0.00     3.35 f
I4/I1424/sum (fulladder)          0.03     0.05     0.22     3.57 f
I1/I888/a (xor)                            0.05     0.00     3.57 f
I1/I888/out (xor)                 0.06     0.10     0.13     3.71 f
I1/I1102/b (and)                           0.10     0.00     3.71 f
I1/I1102/out (and)                0.01     0.02     0.09     3.79 f
I1/I1132/a (or)                            0.02     0.00     3.79 f
I1/I1132/out (or)                 0.02     0.03     0.10     3.90 f
I1/I1034/a (and)                           0.03     0.00     3.90 f
I1/I1034/out (and)                0.01     0.02     0.06     3.96 f
I1/I1064/a (or)                            0.02     0.00     3.96 f
I1/I1064/out (or)                 0.01     0.03     0.09     4.05 f
I1/I1071/b (or)                            0.03     0.00     4.05 f
I1/I1071/out (or)                 0.01     0.03     0.08     4.13 f
I1/I1074/b (or)                            0.03     0.00     4.13 f
I1/I1074/out (or)                 0.02     0.03     0.09     4.22 f
I1/I1190/b (or)                            0.03     0.00     4.22 f
I1/I1190/out (or)                 0.06     0.05     0.11     4.34 f
I1/I1179/a (and)                           0.05     0.00     4.34 f
I1/I1179/out (and)                0.01     0.02     0.06     4.40 f
I1/I1191/a (or)                            0.02     0.00     4.40 f
I1/I1191/out (or)                 0.05     0.05     0.12     4.52 f
I1/I1180/a (and)                           0.05     0.00     4.52 f
I1/I1180/out (and)                0.01     0.02     0.06     4.58 f
I1/I1192/a (or)                            0.02     0.00     4.58 f
I1/I1192/out (or)                 0.03     0.04     0.11     4.69 f
I1/I1174/a (and)                           0.04     0.00     4.69 f
I1/I1174/out (and)                0.01     0.02     0.06     4.75 f
I1/I1196/a (or)                            0.02     0.00     4.75 f
I1/I1196/out (or)                 0.02     0.03     0.10     4.85 f
I1/I1267/a (xor)                           0.03     0.00     4.85 f
I1/I1267/out (xor)               15.00    29.06    20.97    25.82 r
m41 (out)                                 29.06     0.00    25.82 r
data arrival time                                           25.82
-----------------------------------------------------------------------------
(Path is unconstrained)

• Power Report

****************************************
Report : Averaged Power
Design : final_design
Version: I-2013.12-SP3
Date   : Sat Aug 9 20:41:12 2014
****************************************

Attributes
----------
    i - Including register clock pin internal power
    u - User defined power group

                 Internal    Switching   Leakage     Total
Power Group      Power       Power       Power       Power      (     %)  Attrs
--------------------------------------------------------------------------------
clock_network    0.0000      0.0000      0.0000      0.0000     ( 0.00%)  i
register         0.0000      0.0000      0.0000      0.0000     ( 0.00%)
combinational    0.0122      0.2191      9.688e-07   0.2312     (99.53%)
sequential       0.0000      0.0000      0.0000      0.0000     ( 0.00%)
memory           0.0000      0.0000      0.0000      0.0000     ( 0.00%)
io_pad           0.0000      0.0000      0.0000      0.0000     ( 0.00%)
black_box        5.952e-04   5.074e-04   2.286e-07   1.103e-03  ( 0.47%)

Net Switching Power  = 0.2196     (94.51%)
Cell Internal Power  = 0.0127     ( 5.49%)
Cell Leakage Power   = 1.197e-06  ( 0.00%)
                       ---------
Total Power          = 0.2323     (100.00%)

Conclusion

A 21 bit * 21 bit unsigned multiplier is successfully designed and simulated. The output results are as shown.

Worst Case Delay   25.82 (in the time units of the Primetime report)
Total Power        0.2323 mW

