Two's complement division without using the set of full precision comparisons

A.E. Bashagha
M.K. Ibrahim

Indexing terms: Two's complement division, Fast adders, Radix division

Abstract: It is well known that the existing two's complement radix-$2^k$ division methods require a set of full wordlength comparisons of the multiples of the divisor against the shifted remainder. For fast division, these comparisons should be implemented in parallel. Therefore, a huge area is required to implement such high radix division. The paper presents a novel two's complement radix-$2^k$ division algorithm. For the first time, the set of full precision additions is replaced with a set of (k + 2)-bit additions. Then, only two full wordlength additions are required to select the k-bit quotient digit out of two values selected by the (k + 2)-bit comparisons. As a result, the required area is reduced by 77% while the speed of the two algorithms is nearly the same. Moreover, the speed of the new algorithm can be made faster using one of the known fast adders because only two full precision additions (rather than the set of the full precision additions) per radix-$2^k$ quotient digit are required.

© IEE, 1998
IEE Proceedings online no. 195[...]771
Paper first received 11th December 1996 and in revised form 24th October 1997
The authors are with the Department of Electronic and Electrical Engineering, De Montfort University, Leicester LE1 9BH, UK
IEE Proc.-Comput. Digit. Tech., Vol. 145, No. 1, January 1998

1 Introduction

The design of efficient division algorithms is a central problem in computer arithmetic. Many division approaches have been proposed in the literature. They can roughly be classified into two classes: approximation techniques [1, 2] and digit recurrence algorithms [3, 4]. The approximation methods such as the Newton-Raphson algorithm [1] compute the quotient through multiplication of the dividend by the reciprocal of the divisor. Since these methods converge to the quotient quadratically, they can achieve higher speed at the expense of great circuit complexity. In addition to the complexity of the hardware, the other disadvantages of the approximation techniques [5] are that they require a good starting approximation, the final remainder is not available, and the result is never correct to the last bit as required by the IEEE 754 floating point standard [6].

The digit recurrence methods [1, 3, 4, 7], which are also known as direct methods, compute one digit of the quotient every iteration. In the direct algorithms, the quotient digit is selected through the comparison of the multiples of the divisor with the shifted remainder. Then the selected multiple of the divisor is subtracted from the shifted remainder to generate the next remainder. Since these methods converge to the quotient linearly, they are slower than the approximation methods. Despite this disadvantage, there are many advantages to using the direct methods and therefore most of the commercial implementations are based on this approach [5]. For instance, the implementation of these algorithms requires simple shift-add-subtract cells and, hence, the resultant hardware is simpler than that of the approximation techniques. The other advantages are that the final remainder is available at the end of the division operation and the result is correct up to the last bit, which satisfies the IEEE standard [6].

The data used in the digit recurrence algorithms can be represented in different forms. The most commonly used forms are the nonredundant two's complement representation [7] and the redundant signed digit one [8]. The signed digit division methods, which are known as SRT algorithms [3, 9-11], use data in redundant form and therefore require an inspection of only a few most significant digits to select the redundant quotient digit. As a result, these methods offer higher speed than the simple two's complement ones [1, 7, 12], which require full wordlength comparisons. However, the hardware required to process redundant data is more complex than that of the corresponding nonredundant one. For example, the addition of two radix-2 digits in the redundant form requires more than two full adders [13]. The number of possible values of the redundant quotient digit is larger than that of the nonredundant form (more than twice as large when an overredundant system [14] is used). As a result, more multiples of the divisor need to be generated and compared with the shifted remainder. The other disadvantage is the need to convert the input and output data from and to two's complement form.

To speed up the two's complement division methods while using simple binary elements, Mandelbaum [15] has suggested a method which uses multiple arrays acting in parallel. In Mandelbaum's radix-$2^k$ structure, all the $2^k - 1$ multiples of the divisor are subtracted in parallel from the current remainder to generate all the $2^k - 1$ possible values of the next remainder. The correct remainder is the minimum positive one and the radix-$2^k$ quotient digit (i.e. k quotient bits) is the one used to compute this possible value. This restoring algorithm requires a set of $2^k - 1$ full wordlength comparisons per quotient digit and, therefore, a huge area is required to implement it. This area can be reduced by using the radix-$2^k$ nonrestoring algorithm of [16]. This algorithm requires a set of only $2^{k-1}$ (rather than the $2^k - 1$ of [15]) full precision comparisons to select a radix-$2^k$ quotient digit. Despite the significant reduction in area, the cost of implementing this algorithm is still relatively high, especially for high values of k, because a set of full precision comparisons is still required. It is worth noting that the redundancy in data representation is used in the SRT algorithms [3, 9-11] in order to avoid this set of full precision comparisons and speed up the division operation [17].

This paper presents a novel restoring high radix two's complement division structure. For the first time, the set of full wordlength additions [15] per radix-$2^k$ quotient digit is replaced with a set of (k + 2)-bit additions. This set of k + 2 most significant bits (MSBs) additions is used to select two adjacent possible values of the remainder which have different carry out bits. As will be shown later, one of these two values is positive and the sign of the second value is unknown. Therefore, only one full wordlength addition is required to check the sign of the second value. If the second value is positive then it is the correct remainder; otherwise, the first value is the correct remainder. Since the set of full wordlength additions [15] is replaced with a set of (k + 2)-bit additions, the required area is reduced significantly without reducing the speed. Since only one full wordlength comparison (rather than the set of $2^k - 1$ comparisons of [15]) is required, the algorithm can be made faster by using one of the known fast adders. It is worth noting that the required area can be reduced further by extending the idea of the high radix nonrestoring division of [16] to the proposed algorithm.

2 Radix-$2^k$ division

Let A, D, and Q represent the dividend, divisor, and quotient respectively. The divisor, D, is assumed to be greater than or equal to 0.5 (i.e. $0.5 \le D < 1$) and $0 \le A < D$. Hereafter, $\{Q\}D$ and $\{PR_i\}$ will be used to represent the set of all the multiples of the divisor and the set of all the possible values of the remainder respectively, where:

$$\{Q\}D = D, 2D, 3D, \ldots, (2^k - 1)D \qquad (1)$$

$$\{PR_i\} = 2^k PR_{i-1} - \{Q\}D \qquad (2)$$

It should be noted that $PR_{i-1}$ and $PR_i$ are the current and next values of the partial remainder, respectively, where $PR_0 = A$. $2^k PR_{i-1}$ is the shifted value of the current remainder. The high radix algorithm of [15] and the proposed algorithm will be hereafter referred to as the original and new algorithms, respectively. The carry out bits of the k + 2 most significant bits (MSBs) addition, the N - 1 (where N is the operand wordlength) least significant bits (LSBs) addition, and the whole wordlength (i.e. the N + k + 1 bits) addition will be hereafter referred to as $c_{k+2}$, $c_{N-1}$, and $c_{N+k+1}$, respectively. It is worth noting that the carry coming from the N - 1 LSBs of the remainder is ignored in the k + 2 MSBs addition. The final carry $c_{N+k+1}$ can be obtained by adding $c_{N-1}$ to the result of the (k + 2)-bit addition. It is well known that a subtraction (such as that of eqn. 2) is a two's complement addition and therefore it will be hereafter referred to as an addition.

First, let us substitute eqn. 1 into eqn. 2, to obtain:

$$\{PR_i\} = \{2^k PR_{i-1} - D,\; 2^k PR_{i-1} - 2D,\; \ldots,\; 2^k PR_{i-1} - xD,\; 2^k PR_{i-1} - (x+1)D,\; \ldots,\; 2^k PR_{i-1} - (2^k - 1)D\} \qquad (3)$$

These possible values are shown graphically on the Robertson diagram of Fig. 1 for k = 3. In this Figure, the current value of the shifted remainder, $2^k PR_{i-1}$, is shown by a dot on the x-axis between 3D and 4D. The possible values of the next remainder, $PR_i$ (i.e. $2^k PR_{i-1} - \{Q\}D$), are shown by the little circles on the y-axis.

Fig. 1 Robertson diagram of high radix division
(filled dot: current value of remainder; open circle: possible values of next remainder)

Let the set of possible values of eqn. 3 be divided into two subsets, $\{PR_{i1}\}$ and $\{PR_{i2}\}$, where:

$$\{PR_{i1}\} = \{2^k PR_{i-1} - D,\; 2^k PR_{i-1} - 2D,\; \ldots,\; 2^k PR_{i-1} - xD\} \qquad (4)$$

$$\{PR_{i2}\} = \{2^k PR_{i-1} - (x+1)D,\; 2^k PR_{i-1} - (x+2)D,\; \ldots,\; 2^k PR_{i-1} - (2^k - 1)D\} \qquad (5)$$

The carry out bits of the possible values of $\{PR_{i1}\}$ are 1s and the carry out bits of the possible values of $\{PR_{i2}\}$ are 0s. The two adjacent possible values which have different carry out bits are $PR_{i,x}$, with carry out equal to 1, and $PR_{i,x+1}$, with carry out equal to 0, where:

$$PR_{i,x} = 2^k PR_{i-1} - xD \qquad (6)$$

$$PR_{i,x+1} = 2^k PR_{i-1} - (x+1)D \qquad (7)$$

It should be noted that the subtractions of eqns. 2-7 can be (N + k + 1)-bit (i.e. full wordlength of the shifted values) additions as in the original algorithm [15] or (k + 2)-bit additions only as in the proposed one. Next, the basic idea of the original algorithm is described and afterwards the new algorithm will be presented.

2.1 Original algorithm

In the original algorithm [15], (N + k + 1)-bit additions are performed and therefore it is clear that the possible values of $\{PR_{i1}\}$ (i.e. the values with $c_{N+k+1} = 1$) as given in eqn. 4 are positive and the possible values of $\{PR_{i2}\}$ (i.e. the values with $c_{N+k+1} = 0$) as given in eqn. 5 are negative. The correct remainder, $PR_i$, is the minimum positive possible value (i.e. $PR_i = PR_{i,x}$ and $Q_i = x$). If all the possible values are negative, the previous remainder, $PR_{i-1}$, is restored and $Q_i = 0$. In general, the ith iteration of the original algorithm [15] requires:
(i) $2^k - 1$ (N + k + 1)-bit subtractions of all the multiples of the divisor $\{Q\}D$ from the shifted remainder [...]

[...] carry out bits, $c_{k+2}$, [...]

2.3.1 All values of $\{PR_{i1}\}$ are positive: As mentioned [...], $c_{N+k+1}$ can be obtained by adding $c_{N-1}$ to the [...] the (k + 2)-bit addition. Therefore, it is clear that the values of the first subset [...] Fig. 1. As a result, $PR_{i,x}$ is positive and the minimum value of $\{PR_{i1}\}$.

2.3.2 $PR_{i,x+1}$ can be positive or negative: Although $c_{k+2}$ of $PR_{i,x+1}$ equals 0, the final carry, $c_{N+k+1}$ (which depends on $c_{N-1}$), can be 0 as shown in entry 1 and entry 2 of Table 1, where $PR_{i,x+1}$ is negative, or it can be 1 as shown in entry 3 of Table 1, where $PR_{i,x+1}$ is positive. The selection of $PR_{i,x}$ or $PR_{i,x+1}$ as the correct remainder $PR_i$ for the new algorithm is also shown in this Table. The first case (i.e. $PR_{i,x+1}$ is negative), shown in entry 1 and entry 2 of the Table, will be considered first. In entry 1, $PR_{i,x+1}$ is negative since $c_{k+2}$ is 0 and there is at least one 0 bit in the result of the (k + 2)-bit addition, such that any carry coming from the N - 1 LSBs is stopped by this 0 bit. The final carry, $c_{N+k+1}$, which results from adding $c_{N-1}$ to the k + 2 MSBs of $PR_{i,x+1}$, is always 0. In entry 2, all the k + 2 MSBs of $PR_{i,x+1}$ are 1s and $c_{k+2}$ is equal to 0. The carry, $c_{N-1}$, coming from the N - 1 LSBs is also equal to 0 and therefore the final carry, $c_{N+k+1}$, will be 0 (i.e. $PR_{i,x+1}$ is negative).

Now the other case, shown in entry 3 of Table 1, is considered. $PR_{i,x+1}$ (with $c_{k+2} = 0$) is positive only if the result of the k + 2 MSBs addition is 11...1 and the carry, $c_{N-1}$, coming from the N - 1 LSBs addition equals 1. In this case, it is clear that, if $c_{N-1}$ is added to the k + 2 MSBs, the final carry, $c_{N+k+1}$, becomes 1 and the k + 2 MSBs will be 00...0. It should be noted that the k + 2 MSBs of $PR_{i,x+1}$ are the k + 1 bits to the left of the radix point and the bit to its right. Therefore, in this case (i.e. entry 3), the value of $PR_{i,x+1}$ is less than 0.5 (i.e. $PR_{i,x+1} < D$). It is clear from the discussions of entry 2 and entry 3 of Table 1 that the (k + 2)-bit addition is the optimal choice for $D \ge 0.5$.
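The relationship between $c_{k+2}$, $c_{N-1}$ and $c_{N+k+1}$ discussed above can be checked numerically. The following is a minimal sketch, assuming operands are stored as unsigned integers scaled by $2^N$ (so D = 0.101 101 011 becomes 363 for N = 9); the function names and this integer encoding are illustrative assumptions and do not come from the paper.

```python
def split_carries(shifted, multiple, n, k):
    """Add `shifted` and the two's complement of `multiple` in two pieces.

    Both inputs are non-negative integers scaled by 2**n (assumed encoding).
    Returns (c_k2, hi_bits, c_n1, c_full):
      c_k2    carry out of the (k + 2)-MSB addition, low carry ignored
      hi_bits the (k + 2)-bit result of that addition
      c_n1    carry out of the (n - 1)-LSB addition
      c_full  carry out of the full (n + k + 1)-bit addition
    """
    width, low = n + k + 1, n - 1
    neg = (-multiple) & ((1 << width) - 1)      # two's complement word
    low_mask = (1 << low) - 1
    c_n1 = ((shifted & low_mask) + (neg & low_mask)) >> low
    hi_sum = (shifted >> low) + (neg >> low)    # no carry-in from the LSBs
    c_k2 = hi_sum >> (k + 2)
    hi_bits = hi_sum & ((1 << (k + 2)) - 1)
    c_full = ((shifted + neg) >> width) & 1
    return c_k2, hi_bits, c_n1, c_full


def final_carry(c_k2, hi_bits, c_n1, k):
    """Final carry rebuilt from the partial results, following Table 1."""
    all_ones = (1 << (k + 2)) - 1
    return 1 if c_k2 or (hi_bits == all_ones and c_n1) else 0


if __name__ == "__main__":
    # Entry 3 of Table 1: MSB result 11111, c_k2 = 0, c_n1 = 1, value positive.
    print(split_carries(2016, 5 * 363, 9, 3))
```

The call in the `__main__` guard uses the shifted remainder and the multiple 5D from the second iteration of the worked example, which is the case where the (k + 2)-bit carry alone cannot decide the sign.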

2.3.3 $PR_{i,x+2}$, $PR_{i,x+3}$, ..., $PR_{i,2^k-1}$ are negative: It is clear that, if $PR_{i,x+1}$ is negative (entry 1 and 2), the other values of $\{PR_{i2}\}$ (i.e. $\{2^k PR_{i-1} - (x+2)D,\; 2^k PR_{i-1} - (x+3)D,\; \ldots,\; 2^k PR_{i-1} - (2^k - 1)D\}$) are also negative. If $PR_{i,x+1}$ is positive (entry 3), $PR_{i,x+1}$ will be less than D, as explained above. Therefore, the next possible value $PR_{i,x+2}$, which is less than $PR_{i,x+1}$ by D, must be negative. The other possible values $\{2^k PR_{i-1} - (x+3)D,\; 2^k PR_{i-1} - (x+4)D,\; \ldots,\; 2^k PR_{i-1} - (2^k - 1)D\}$ are also negative.

2.3.4 $PR_i$ is either $PR_{i,x}$ or $PR_{i,x+1}$: It is well known that the correct remainder $PR_i$ of restoring division must be positive and less than D [1]. As explained above, $PR_{i,x}$ is the minimum positive possible value of the first subset $\{PR_{i1}\}$. Therefore, if $PR_{i,x+1}$ is negative (entry 1 and 2), $PR_{i,x}$ will be the minimum positive possible value of all the possible values $\{PR_i\}$ (i.e. the correct remainder $PR_i$ is equal to $PR_{i,x}$). On the other hand, if $PR_{i,x+1}$ is positive (entry 3), its value is less than D, and the other possible values of $\{PR_{i2}\}$ are negative as explained above. Since $PR_{i,x+1}$ is less than $PR_{i,x}$ by D, $PR_{i,x+1}$ becomes the minimum positive possible value of $\{PR_i\}$ (i.e. $PR_i$ is equal to $PR_{i,x+1}$). It is clear from the above discussion that the correct remainder $PR_i$ is either equal to $PR_{i,x}$ (if $PR_{i,x+1}$ is negative) or $PR_{i,x+1}$ (if $PR_{i,x+1}$ is positive), i.e. $PR_i$ is one of the two adjacent possible values which have different carry out bits $c_{k+2}$ resulting from the (k + 2)-bit additions. Therefore, it can be concluded that the (k + 2)-bit addition is enough to reduce the number of possible values from $2^k - 1$ to 2 values only.

2.3.5 Selection of $PR_i$ and $Q_i$: Now, it is known that $PR_i$ is either $PR_{i,x}$ or $PR_{i,x+1}$, but is it possible to select one of these two values as the correct remainder without a full wordlength comparison? The answer is no. This is the inherent problem of two's complement addition, where the sign cannot be detected unless the carry propagates from the LSB to the MSB. As explained above, the selection of $PR_i$ as $PR_{i,x}$ or $PR_{i,x+1}$ can be decided by checking one of two conditions as in eqn. 8. The first is to check the sign of $PR_{i,x+1}$ (i.e. $PR_i = PR_{i,x}$ and $Q_i = x$ if $PR_{i,x+1} < 0$, otherwise $PR_i = PR_{i,x+1}$ and $Q_i = x + 1$) and the second is to compare the positive value $PR_{i,x}$ with the value of the divisor D (i.e. $PR_i = PR_{i,x}$ and $Q_i = x$ if $PR_{i,x} < D$, otherwise $PR_i = PR_{i,x+1}$ and $Q_i = x + 1$). Both conditions require an inspection of all the bits of either $PR_{i,x}$ or $PR_{i,x+1}$. It should be noted that only the (k + 2) MSBs of $PR_{i,x}$ and $PR_{i,x+1}$ are available and therefore an (N + 2)-bit comparison is required either to compare $PR_{i,x}$ with D or to detect the sign of $PR_{i,x+1}$ as shown in eqn. 8. Whether $PR_{i,x+1}$ is negative or not, it is clear that the ranges of $PR_{i,x}$ and $PR_{i,x+1}$ are $0 \le PR_{i,x} < 2D$ and $-D \le PR_{i,x+1} < D$, respectively, and this is the reason for using an (N + 2)-bit comparison rather than the (N + k + 1)-bit one.
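The two-step selection described in Sections 2.3.1 to 2.3.5 can be sketched in software as follows. This is an illustrative model only, assuming remainders and the divisor are unsigned integers scaled by $2^N$; the names `top_carry`, `select_digit` and `divide_new` are hypothetical, and the final sign test is done on a plain Python integer rather than with the (N + 2)-bit adder the text describes.

```python
def top_carry(shifted, multiple, n, k):
    """Carry out c_{k+2} of the (k + 2)-MSB addition shifted + (-multiple).

    The carry from the n - 1 LSBs is deliberately ignored, as in the text.
    """
    width, low = n + k + 1, n - 1
    neg = (-multiple) & ((1 << width) - 1)
    return ((shifted >> low) + (neg >> low)) >> (k + 2)


def select_digit(prev_rem, d, n, k):
    """Return (Q_i, PR_i) for one radix-2**k iteration."""
    shifted = prev_rem << k
    # Stage 1: narrow additions only. x is the largest multiple whose
    # (k + 2)-bit carry is 1 (x = 0 if there is none).
    x = 0
    for m in range(1, 1 << k):
        if top_carry(shifted, m * d, n, k):
            x = m
    # Stage 2: one full-precision sign check on PR_{i,x+1}.
    candidate = shifted - (x + 1) * d
    if candidate < 0:
        return x, shifted - x * d
    return x + 1, candidate


def divide_new(a, d, n, k, steps):
    """Run `steps` iterations starting from PR_0 = a."""
    rem, digits = a, []
    for _ in range(steps):
        digit, rem = select_digit(rem, d, n, k)
        digits.append(digit)
    return digits, rem


if __name__ == "__main__":
    # A = 0.011 010 101 and D = 0.101 101 011 scaled by 2**9.
    print(divide_new(213, 363, 9, 3, 3))
```

In hardware the loop over the multiples would be a bank of parallel (k + 2)-bit subtracters; the sequential loop here is only a convenience of the software model.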

3 Examples of the high radix division

3.1 Example of the original algorithm [15]

The original algorithm is clarified by Example 1 for N = 9, k = 3, and N + k + 1 = 13. At the first iteration, all the seven 13-bit multiples of the divisor $\{Q\}D$ (i.e. {D, 2D, ..., 7D}) are subtracted in parallel from the 13-bit shifted dividend $2^3A$. The two values which have different carry outs, $c_{N+k+1}$, are $2^3A - 4D$ (with $c_{N+k+1} = 1$) and $2^3A - 5D$ (with $c_{N+k+1} = 0$). Since $c_{N+k+1}$ is the carry of the full wordlength addition, $2^3A - 4D$ is positive and $2^3A - 5D$ is negative. As explained earlier, $PR_{1,x}$ is the minimum positive possible value (i.e. $PR_1 = 2^3A - 4D$) and therefore the first quotient digit $Q_1$ is equal to 4 (i.e. $Q_1 = 100$). At the second iteration, $PR_1$ is shifted three bits to the left. Then, the seven multiples of the divisor are subtracted in parallel from $2^3PR_1$. As in the first iteration, the second remainder, $PR_2$, can be selected as the minimum positive value (i.e. $PR_2 = 2^3PR_1 - 5D$) and therefore $Q_2$ equals 101. Similarly, $PR_3$ equals $2^3PR_2 - 4D$ and $Q_3 = 100$. Therefore, the 3-digit quotient, Q, becomes 0.100 101 100.

Example 1

A = 0.011 010 101; D = 0.101 101 011
N + 1 = 10; k = 3; N + k + 1 = 13
{Q}D = {D; 2D; 3D; 4D; 5D; 6D; 7D}

  D  = 0 000.101 101 011
  2D = 0 001.011 010 110
  3D = 0 010.001 000 001
  4D = 0 010.110 101 100
  5D = 0 011.100 010 111
  6D = 0 100.010 000 010
  7D = 0 100.111 101 101

  $2^3A$ = 0 011.010 101 000

Iteration 1: $\{PR_1\} = 2^3A - \{Q\}D$, with $2^3A$ = 0011010101000

  Subtracted   Two's complement   $c_{N+k+1}$   Result
  D            1111010010101      1             0010100111101
  2D           1110100101010      1             0001111010010
  3D           1101110111111      1             0001001100111
  4D           1101001010100      1             0000011111100
  5D           1100011101001      0             1111110010001
  6D           1011101111110      0             1111000100110
  7D           1011000010011      0             1110010111011

Since the correct remainder, $PR_1$, equals the minimum positive value, $PR_1 = 2^3A - 4D$.
$Q_1$ = 100 and $PR_1$ = 0.011 111 100; $2^3PR_1$ = 0 011.111 100 000

Iteration 2: $\{PR_2\} = 2^3PR_1 - \{Q\}D$, with $2^3PR_1$ = 0011111100000

  Subtracted   Two's complement   $c_{N+k+1}$   Result
  D            1111010010101      1             0011001110101
  2D           1110100101010      1             0010100001010
  3D           1101110111111      1             0001110011111
  4D           1101001010100      1             0001000110100
  5D           1100011101001      1             0000011001001
  6D           1011101111110      0             1111101011110
  7D           1011000010011      0             1110111110011

$PR_2 = 2^3PR_1 - 5D$; $Q_2$ = 101 and $PR_2$ = 0.011 001 001; $2^3PR_2$ = 0 011.001 001 000

Iteration 3: $\{PR_3\} = 2^3PR_2 - \{Q\}D$, with $2^3PR_2$ = 0011001001000

  Subtracted   Two's complement   $c_{N+k+1}$   Result
  D            1111010010101      1             0010011011101
  2D           1110100101010      1             0001101110010
  3D           1101110111111      1             0001000000111
  4D           1101001010100      1             0000010011100
  5D           1100011101001      0             1111100110001
  6D           1011101111110      0             1110111000110
  7D           1011000010011      0             1110001011011

$(PR_3)_{min} = 2^3PR_2 - 4D$; $Q_3$ = 100 and $PR_3$ = 0.010 011 100
Q = 0.100 101 100
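The iterations of Example 1 can be reproduced with a few lines of code. The sketch below models the original restoring scheme as described in the text (subtract every multiple, keep the minimum non-negative result), assuming A and D are stored as integers scaled by $2^9$; the helper names `word`, `original_step` and `divide_original` are illustrative and not taken from [15].

```python
def word(value, width=13):
    """Render `value` as a `width`-bit two's complement bit string."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def original_step(prev_rem, d, k):
    """One radix-2**k restoring iteration: return (Q_i, PR_i)."""
    shifted = prev_rem << k
    # Restoring fallback: if every candidate is negative, Q_i = 0 and the
    # shifted remainder is kept unchanged.
    best_digit, best_rem = 0, shifted
    for m in range(1, 1 << k):
        rem = shifted - m * d          # full wordlength subtraction
        if 0 <= rem < best_rem:        # keep the minimum non-negative value
            best_digit, best_rem = m, rem
    return best_digit, best_rem


def divide_original(a, d, k, steps):
    """Run `steps` iterations starting from PR_0 = a."""
    rem, digits = a, []
    for _ in range(steps):
        digit, rem = original_step(rem, d, k)
        digits.append(digit)
    return digits, rem


if __name__ == "__main__":
    digits, rem = divide_original(213, 363, 3, 3)   # A and D scaled by 2**9
    print(digits, word(rem))
    print(word(1704 - 5 * 363))                     # 2^3 A - 5D as a 13-bit word
```

Each call to `original_step` performs $2^k - 1$ full-width subtractions, which is the cost the new algorithm is designed to avoid.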

3.2 Example of the new algorithm

The new algorithm is clarified by Example 2, using the same values of A and D as Example 1. As mentioned earlier, the new algorithm requires only (k + 2)-bit additions rather than the (N + k + 1)-bit additions. In this example, the results of the (k + 2)-bit additions of each iteration and their carry out bits $c_{k+2}$ are placed in a Table. To compare the new algorithm with the original one and to clarify the relationship between $c_{k+2}$, $c_{N-1}$, and $c_{N+k+1}$ given in Table 1, the results of the (N - 1)-bit additions and (N + k + 1)-bit additions are also placed in each of the three Tables of this example. At the first iteration, the five (i.e. k + 2 where k = 3) MSBs of all the multiples of the divisor $\{Q\}D$ (i.e. {D, 2D, ..., 7D}) are subtracted in parallel from the five MSBs of $2^3A$. From these (k + 2)-bit additions, it is clear that the two adjacent values $PR_{1,x}$ and $PR_{1,x+1}$, which have different carry out bits, $c_{k+2}$, are $2^3A - 4D$ and $2^3A - 5D$, respectively. According to eqn. 8, the first remainder $PR_1$ equals $2^3A - 4D$ if $2^3A - 5D$ is negative, otherwise $PR_1 = 2^3A - 5D$. From the (N + 2)-bit addition of $2^3A - 5D$, it is clear that this value is negative and therefore $PR_1$ equals $2^3A - 4D$ and the first quotient digit $Q_1$ equals 100.

At the second iteration, $PR_1$ is shifted three bits to the left to form $2^3PR_1$. Then, the five MSBs of all the multiples of the divisor $\{Q\}D$ are subtracted from the corresponding bits of $2^3PR_1$. From these (k + 2)-bit additions, $PR_{2,x}$ with $c_{k+2} = 1$ is $2^3PR_1 - 4D$ and $PR_{2,x+1}$ with $c_{k+2} = 0$ is $2^3PR_1 - 5D$. To select $PR_2$, we need the (N + 2)-bit addition of $PR_{2,x+1}$, and the result shows that this value is positive. Therefore, $PR_2$ equals $2^3PR_1 - 5D$ and the second quotient digit $Q_2$ equals 101. In a similar procedure, the third iteration can be performed, where $PR_3$ equals $2^3PR_2 - 4D$ and $Q_3$ equals 100. The 3-digit quotient Q equals 0.100 101 100, which is the same as that of Example 1.

Example 2

A = 0.011 010 101; D = 0.101 101 011
N + 1 = 10; k = 3; N + k + 1 = 13
{Q}D = {D; 2D; 3D; 4D; 5D; 6D; 7D}
k + 2 = 5; N - 1 = 8; N + 2 = 11

Iteration 1 (refer to Table 2)
  From the (k + 2)-bit addition, $PR_1$ equals $2^3A - 4D$ if $2^3A - 5D < 0$. Otherwise $PR_1 = 2^3A - 5D$.
  To detect the sign of $2^3A - 5D$, it is required to generate its N + 2 LSBs.
  From the (N + 2)-bit addition: $2^3A - 4D$ = 00.011 111 100 and $2^3A - 5D$ = 11.110 010 001
  Since $2^3A - 5D$ is negative, $PR_1 = 2^3A - 4D$ and $Q_1$ = 100
  $PR_1$ = 0.011 111 100; $2^3PR_1$ = 0 011.111 100 000

Iteration 2 (refer to Table 3)
  From the (k + 2)-bit addition, $PR_2$ is either $2^3PR_1 - 4D$ or $2^3PR_1 - 5D$.
  From the (N + 2)-bit addition: $2^3PR_1 - 4D$ = 01.000 110 100 and $2^3PR_1 - 5D$ = 00.011 001 001
  Since $2^3PR_1 - 5D$ is positive, $PR_2 = 2^3PR_1 - 5D$ and $Q_2$ = 101
  $PR_2$ = 0.011 001 001; $2^3PR_2$ = 0 011.001 001 000

Iteration 3 (refer to Table 4)
  From the (k + 2)-bit addition, $PR_3$ is either $2^3PR_2 - 4D$ or $2^3PR_2 - 5D$.
  From the (N + 2)-bit addition: $2^3PR_2 - 4D$ = 00.010 011 100 and $2^3PR_2 - 5D$ = 11.100 110 001
  Since $2^3PR_2 - 5D$ is negative, $PR_3 = 2^3PR_2 - 4D$ and $Q_3$ = 100
  $PR_3$ = 0.010 011 100

Q = 0.100 101 100

Table 1: Selection of $PR_i$ for the proposed radix-$2^k$ algorithm

  Entry   Result of the (k+2)-bit addition of $PR_{i,x+1}$   $c_{k+2}$   $c_{N-1}$   $c_{N+k+1}$   Sign of $PR_{i,x+1}$   Selection of $PR_i$
  1       At least one 0 bit in the result                   0           1 or 0      0             negative               $PR_i = PR_{i,x}$
  2       11...1                                             0           0           0             negative               $PR_i = PR_{i,x}$
  3       11...1                                             0           1           1             positive               $PR_i = PR_{i,x+1}$

Table 2: Iteration 1: $\{PR_1\} = 2^3A - \{Q\}D$

  Value         $c_{k+2}$   (k+2)-bit   $c_{N-1}$   (N-1)-bit   $c_{N+k+1}$   (N+k+1)-bit
  $2^3A - D$    1           00100       1           00111101    1             0010100111101
  $2^3A - 2D$   1           00011       0           11010010    1             0001111010010
  $2^3A - 3D$   1           00001       1           01100111    1             0001001100111
  $2^3A - 4D$   1           00000       0           11111100    1             0000011111100
  $2^3A - 5D$   0           11110       1           10010001    0             1111110010001
  $2^3A - 6D$   0           11101       1           00100110    0             1111000100110
  $2^3A - 7D$   0           11100       0           10111011    0             1110010111011

Table 3: Iteration 2: $\{PR_2\} = 2^3PR_1 - \{Q\}D$

  Value            $c_{k+2}$   (k+2)-bit   $c_{N-1}$   (N-1)-bit   $c_{N+k+1}$   (N+k+1)-bit
  $2^3PR_1 - D$    1           00101       1           01110101    1             0011001110101
  $2^3PR_1 - 2D$   1           00100       1           00001010    1             0010100001010
  $2^3PR_1 - 3D$   1           00010       1           10011111    1             0001110011111
  $2^3PR_1 - 4D$   1           00001       1           00110100    1             0001000110100
  $2^3PR_1 - 5D$   0           11111       1           11001001    1             0000011001001
  $2^3PR_1 - 6D$   0           11110       1           01011110    0             1111101011110
  $2^3PR_1 - 7D$   0           11101       0           11110011    0             1110111110011

Table 4: Iteration 3: $\{PR_3\} = 2^3PR_2 - \{Q\}D$

  Value            $c_{k+2}$   (k+2)-bit   $c_{N-1}$   (N-1)-bit   $c_{N+k+1}$   (N+k+1)-bit
  $2^3PR_2 - D$    1           00100       0           11011101    1             0010011011101
  $2^3PR_2 - 2D$   1           00011       0           01110010    1             0001101110010
  $2^3PR_2 - 3D$   1           00001       1           00000111    1             0001000000111
  $2^3PR_2 - 4D$   1           00000       0           10011100    1             0000010011100
  $2^3PR_2 - 5D$   0           11110       1           00110001    0             1111100110001
  $2^3PR_2 - 6D$   0           11101       0           11000110    0             1110111000110
  $2^3PR_2 - 7D$   0           11100       0           01011011    0             1110001011011

4 Implementation of the new algorithm

From the above description of the new algorithm, it is clear that its implementation requires two stages: first, the selection of the two adjacent possible values which have different carry out bits $c_{k+2}$ resulting from the (k + 2)-bit additions; secondly, a full wordlength comparator to select one of these two values as the correct remainder $PR_i$ with its quotient digit $Q_i$ by using the criteria of eqn. 8. The basic cell used to implement the ith iteration of the new radix-$2^k$ algorithm for k = 3 is shown in Fig. 2. In this cell, it is assumed that the multiples of the divisor $\{Q\}D$ are generated by using an add/shift operation, which is cheaper than implementing the multiplication one [16].

Fig. 2 Basic cell of new algorithm
(blocks: shifted remainder $2^3PR_{i-1}$, divisor D, (k + 2)-bit subtractor cells, multiplexer and encoder, (N + 2)-bit subtractor cells; cross-hatch: shift one bit to the left. Output rule: $PR_i = PR_{i,x}$ and $Q_i = x$ if $PR_{i,x+1} < 0$ (i.e. $c_{N+2} = 0$); $PR_i = PR_{i,x+1}$ and $Q_i = x + 1$ if $PR_{i,x+1} \ge 0$ (i.e. $c_{N+2} = 1$))

The first stage of the basic cell consists of seven (i.e. $2^k - 1$) 5-bit (i.e. (k + 2)-bit) subtracter cells and an 8-to-2 (N + 2)-bit multiplexer as well as an encoder circuit. First, the remainder, $PR_{i-1}$, is shifted three bits to the left to form $2^3PR_{i-1}$. The divisor D is shifted by one, two, and three bits to the left to form 2D, 4D, and 8D, respectively. These shifted values of D are used to generate the multiples $\{Q\}D$ (i.e. {D, 2D, ..., 7D}). For instance, D can be added to 2D to form 3D, and 2D should be added to 4D to form 6D, and so on. Some multiples require a subtraction; for instance, D should be subtracted from 8D to form 7D (i.e. $2^3PR_{i-1} - 7D$ can be rewritten as $2^3PR_{i-1} - 8D + D$). Therefore, the generation of such a possible value requires an addition as well as a subtraction and this can be carried out using a controlled add/subtract cell [16].

The (k + 2)-bit subtracter cell can be built by using a (k + 2)-bit controlled carry save adder/subtracter followed by a (k + 2)-bit carry propagate or carry look-ahead adder. It is worth noting that we are only interested in the carry out bits $c_{k+2}$ and we do not need the (k + 2)-bit subtraction results. The N + 2 LSBs of $2^3PR_{i-1}$ and the N + 2 LSBs of the multiples of the divisor $\{Q\}D$ (i.e. the shifted values of D) are fed to an 8-to-2 (N + 2)-bit multiplexer circuit. This multiplexer is controlled by the set of $2^k - 1$ carry out bits $\{c_{k+2}\}$ to select the two possible values $PR_{i,x}$ and $PR_{i,x+1}$, which have different carry out bits, out of the $2^k - 1$ possible values of the remainder $\{PR_i\}$. The N + 2 LSBs of $PR_{i,x}$ and $PR_{i,x+1}$ are not generated at the first stage of the cell but are left to the second stage. Therefore, the multiplexer only selects the shifted values of D which can be used to form xD and (x + 1)D. The $2^k - 1$ carry out bits $c_{k+2}$ are also fed to an encoder circuit to generate the two possible values of the k-bit quotient digit (i.e. x and x + 1).

The second stage of the basic cell is used to select the remainder $PR_i$ (either $PR_{i,x}$ or $PR_{i,x+1}$) and the quotient digit $Q_i$ (either x or x + 1). According to the criteria of eqn. 8, this selection depends on the sign of $PR_{i,x+1}$ or the value of $PR_{i,x}$ (compared with D). Therefore, an (N + 2)-bit comparator is required for the sign check or the value comparison. In Fig. 2, two (N + 2)-bit subtractor cells are used to generate the N + 2 LSBs of $PR_{i,x}$ and $PR_{i,x+1}$. The inputs to these two cells are the outputs of the first stage (i.e. the N + 2 LSBs of $2^kPR_{i-1}$) and the N + 2 LSBs of the shifted values of D which will be used to form xD and (x + 1)D. It is worth noting that only the sign of $PR_{i,x+1}$ is required, and this needs a propagation of the carry from the LSB to the MSB of $PR_{i,x+1}$, while $PR_{i,x}$ can be in a carry save form (i.e. sum and carry words). The addition of the sum and carry words of $PR_{i,x}$ can be done in parallel with that of $PR_{i,x+1}$, which does not affect the speed of the architecture, and this will save silicon area in the next iteration; therefore, two full wordlength adders are used in Fig. 2. The sign of $PR_{i,x+1}$ is then used to control a 2-to-1 (N + 2)-bit multiplexer to select $PR_{i,x}$ or $PR_{i,x+1}$ as the correct remainder $PR_i$. This sign bit can also be used to control a 2-to-1 k-bit multiplexer to select x or x + 1 as the quotient digit $Q_i$.
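The add/shift generation of the divisor multiples for k = 3 described in this section can be illustrated with a small sketch. The text gives 3D = D + 2D, 6D = 2D + 4D and 7D = 8D - D explicitly; forming 5D as 4D + D is an assumption made here to complete the set, and the function name `divisor_multiples` is hypothetical.

```python
def divisor_multiples(d):
    """Build {m: m * d} for m = 1..7 from shifts plus one add or subtract.

    `d` is the divisor as an unsigned integer (assumed fixed-point encoding).
    """
    d2, d4, d8 = d << 1, d << 2, d << 3   # 2D, 4D, 8D by wiring shifts
    return {
        1: d,
        2: d2,
        3: d2 + d,     # stated in the text
        4: d4,
        5: d4 + d,     # assumption: one natural choice, not given in the text
        6: d4 + d2,    # stated in the text
        7: d8 - d,     # stated in the text: needs a subtraction
    }


if __name__ == "__main__":
    for m, value in divisor_multiples(363).items():   # D scaled by 2**9
        print(m, format(value, "013b"))
```

No multiple needs more than one addition or subtraction of shifted copies of D, which is why a controlled add/subtract cell is sufficient.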

5 Evaluation of the new algorithm

Although the new algorithm does not require a set of full wordlength comparisons to select $PR_{i,x}$ and $PR_{i,x+1}$, it requires two full wordlength additions in parallel to select $PR_i$. The cycle time is only increased by that of the additional second multiplexer and therefore the speed is nearly the same as that of the original algorithm [15]. Since only two (rather than the $2^k - 1$ of [15]) full wordlength adders are required to implement the new algorithm, the speed can be made faster by using one of the known fast adders. Now, the area required to implement the new algorithm will be compared with the corresponding one of the original algorithm [15]. Let $A_{org}$ and $A_{new}$ represent the area per quotient digit (i.e. k bits) required to implement the original algorithm [15] and the new one, respectively. Let $A_{CCSAS}$ and $A_{FA}$ be the area of a controlled carry save adder/subtracter (CCSAS) [16] cell and a full adder, respectively. Therefore, $A_{org}$ and $A_{new}$ are given approximately by:

$$A_{org} = (2^k - 1)(N + k + 1)(A_{CCSAS} + A_{FA}) \qquad (9)$$

$$A_{new} = \left((2^k - 1)(k + 2) + 2(N + 2)\right)(A_{CCSAS} + A_{FA}) \qquad (10)$$

From the above equations, $A_{new}$ can be rewritten as a percentage of the original area [15], $A_{org}$, i.e.

$$\frac{A_{new}}{A_{org}}\% = \frac{(2^k - 1)(k + 2) + 2(N + 2)}{(2^k - 1)(N + k + 1)} \times 100 \qquad (11)$$

The area percentage of the new architecture, $A_{new}/A_{org}$, is shown in Fig. 3 for N = 32 and different values of k. For k = 1, the structure will become the conventional two's complement binary bit parallel architecture, where $A_{new}$ and $A_{org}$ are the same. It is clear from Fig. 3 that the optimal value of the area percentage is for values of k around 7. For instance, for k equals 6 to 8, the new area $A_{new}$ is about 23% of the original one [15], $A_{org}$.

Fig. 3 Area per quotient digit of new radix-$2^k$ structure, $A_{new}$, as percentage of area per quotient digit required by original radix-$2^k$ structure, $A_{org}$ [15]
(x-axis: digit size, k, from 1 to 15; y-axis: percentage, from 0 to 100)
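The area ratio of the new structure to the original one is straightforward to evaluate numerically. The sketch below simply codes the ratio of eqn. 10 to eqn. 9, in which the common cell area $A_{CCSAS} + A_{FA}$ cancels; the function name `area_percent_vs_original` is an illustrative choice.

```python
def area_percent_vs_original(n, k):
    """Area of the new cell as a percentage of the original one (eqn. 10 / eqn. 9)."""
    new = (2**k - 1) * (k + 2) + 2 * (n + 2)   # narrow adders + two wide adders
    org = (2**k - 1) * (n + k + 1)             # 2^k - 1 wide adders
    return 100.0 * new / org


if __name__ == "__main__":
    for k in range(2, 16):
        print(k, round(area_percent_vs_original(32, k), 1))
```

For N = 32 the printed values fall to a minimum in the region of k = 6 to 7 and rise again for larger k, which is the shape the text attributes to Fig. 3.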

The new structure is also compared with the two's complement binary bit parallel one [1], as shown in Fig. 4. In the binary bit parallel structure, one quotient bit is generated by each row of the architecture and k bits are generated by k rows. Therefore, the area per quotient digit $A_{bp}$ will be:

$$A_{bp} = k (N + 1) A_{CAS} \qquad (12)$$

where $A_{CAS}$ is the area of a controlled add/subtract cell [1], which is the same as $A_{CCSAS}$, i.e.

$$A_{CAS} = A_{FA} + A_{XOR} \qquad (13)$$

By using the above equation, $A_{CCSAS} + A_{FA}$ of eqns. 9 and 10 can be rewritten as:

$$A_{CCSAS} + A_{FA} = 2 A_{FA} + A_{XOR} \qquad (14)$$

If we assume $A_{FA}$ equals $3A_{XOR}$ [6], $(A_{CCSAS} + A_{FA})/A_{CAS}$ will equal 7/4 and the area percentage becomes:

$$\frac{A_{new}}{A_{bp}}\% = \frac{\left((2^k - 1)(k + 2) + 2(N + 2)\right) \times 700}{4k(N + 1)} \qquad (15)$$

The results are shown in Fig. 4, where it is clear that, for k = 2 to 4, the area of the new approach is only about two times that of the bit parallel one, while the speed of the new architecture is about k times that of the binary one (i.e. the speed can be up to four times higher with only a doubling of the area). Therefore, the area-time product AT of the new radix-$2^k$ structure (for k = 4) is approximately half that of the bit parallel one. It is worth noting that the area-time of recent two's complement digit serial algorithms [18] is similar to that of the bit parallel one. Therefore, a comparison between these algorithms and the new one can also be deduced from Fig. 4.
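The figures quoted above can be checked numerically. The following is a minimal sketch, assuming eqn. 15 reads as [(2^k - 1)(k + 2) + 2(N + 2)] * 700 / [4k(N + 1)] and that the speed-up over the bit parallel structure is exactly k (the text only says "about k times"). The function names are hypothetical and are not taken from the paper.

```python
def area_percentage_vs_bit_parallel(k: int, n: int = 32) -> float:
    """Area of the new radix-2^k structure as a percentage of the bit parallel one.

    Evaluates eqn. 15 under the stated assumption A_FA = 3 * A_XOR, which gives
    (A_CCSAS + A_FA) / A_CAS = 7/4.
    """
    new_cells = (2**k - 1) * (k + 2) + 2 * (n + 2)  # cell count of the new structure
    bit_parallel_cells = k * (n + 1)                # cell count of eqn. 12
    return new_cells * 7 * 100 / (4 * bit_parallel_cells)


def area_time_ratio(k: int, n: int = 32) -> float:
    """Area-time of the new structure relative to the bit parallel one.

    Assumption (not a measured result): the new structure is exactly k times faster.
    """
    return area_percentage_vs_bit_parallel(k, n) / 100 / k


# Tabulate the range plotted in Fig. 4 (k = 1 to 6, N = 32).
for k in range(1, 7):
    print(f"k={k}: area = {area_percentage_vs_bit_parallel(k):6.1f}%, "
          f"AT ratio = {area_time_ratio(k):.2f}")
```

Under these assumptions, k = 4 gives an area of roughly 209% and an area-time ratio of roughly 0.52, which agrees with the "two times the area" and "approximately half the area-time" statements.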

Fig. 4 Area per quotient digit of the new radix-$2^k$ structure, $A_{new}$, as a percentage of the area per quotient digit required by the binary bit parallel structure, $A_{bp}$ [1] (horizontal axis: digit size, k, from 1 to 6; vertical axis: percentage, from 0 to 600)

The signed digit (SD) architectures [3] might offer higher speed than the two’s complement ones at the expense of more complex circuits for quotient digit selection and input-output conversion. A comparison between one SD architecture and another is easy, since the selection and conversion circuits can be assumed to be the same. A comparison between the new algorithm and the SD ones is not straightforward, because it requires these additional circuits to be considered. However, the new algorithm improves on the existing two’s complement one by reducing the area by about 77%.

It is worth noting that the idea of using only the odd values of the quotient digit rather than all its values, as described in an earlier work [16], can also be applied to the proposed approach. In this case, the area can be reduced further by about 50%, and it can be shown that only about 13% of the original area [15] is required to implement such an approach for values of k around 7.

6 Conclusions

A new radix-$2^k$ two’s complement division algorithm has been proposed in this paper. It is the first time that the set of full wordlength comparisons of the existing two’s complement structures has been replaced with a set of comparisons on only a few MSBs (i.e. k + 2 bits). This set of (k + 2)-bit comparisons is used to select two values of the quotient digit out of its $2^k - 1$ possible values. Then, only two full wordlength additions per quotient digit are required to select one of these two values. For N = 32 and k equal to 6 to 8, it has been shown that the area of the new structure is only 23% of that of the original one [10], while the speed is nearly the same. The speed of the new algorithm might be increased if a fast adder is used to implement these two full precision additions. The new structure has also been compared with the two’s complement binary bit parallel structure, where the area is only doubled while the speed can be up to four times higher for k = 4.

7 References

1 HWANG, K.: ‘Computer arithmetic: principles, architecture, and design’ (John Wiley, New York, 1979)

2 KOREN, I.: ‘Computer arithmetic algorithms’ (Prentice-Hall, Englewood Cliffs, NJ, 1993)

3 ERCEGOVAC, M.D., and LANG, T.: ‘Division and square root: digit recurrence algorithms’ (Kluwer Academic Publishers, USA, 1994)

4 PREPARATA, F.P., and VUILLEMIN, J.E.: ‘Practical cellular dividers’, IEEE Trans., 1990, C-3, (5), pp. 606-614

5 OBERMAN, S.F., and FLYNN, M.J.: ‘An analysis of division algorithms and implementations’. Technical report CSL-TR-95-675, 1995, Computer Systems Laboratory, Stanford University

6 ‘IEEE standard for binary floating point arithmetic’. IEEE standard 754, IEEE Computer Society, 1985

7 CAVANAGH, J.J.F.: ‘Digital computer arithmetic: design and implementation’ (McGraw-Hill, New York, 1985)

8 AVIZIENIS, A.: ‘Signed digit number representations for fast parallel arithmetic’, IRE Trans., 1961, 10, pp. 389-400

9 MONTUSCHI, P., and CIMINIERA, L.: ‘Quotient prediction without prescaling’, IEE Proc. Comp. Digit. Tech., 1995, 142, (1), pp. 15-22

10 ROBERTSON, J.E.: ‘A new class of digital division methods’, IRE Trans., 1958, C-7, (9), pp. 218-222

11 SRINIVAS, H.R., and PARHI, K.K.: ‘A fast radix-4 division algorithm’. Proceedings of IEEE international symposium on Computer arithmetic, Santa Monica, CA, 1994, pp. 311-314

12 BASHAGHA, A.E., and IBRAHIM, M.K.: ‘Radix digit serial pipelined divider/square root architecture’, IEE Proc. Comp. Digit. Tech., 1994, 141, (6), pp. 375-380

13 IRWIN, M.J., and OWENS, R.M.: ‘Design issues in digit serial signal processors’. Proceedings of the IEEE international symposium on Circuits and systems, 1989, pp. 441-444

14 MONTUSCHI, P., and CIMINIERA, L.: ‘Radix-8 division with over redundant digit set’, J. VLSI Sig. Proc., 1994, 7, (3), pp. 259-270

15 MANDELBAUM, D.M.: ‘Multiple parallel-acting iterative arrays for fast division’, Int. J. Electr., 1988, 64, (6), pp. 885-896

16 BASHAGHA, A.E., and IBRAHIM, M.K.: ‘A new high radix nonrestoring divider architecture’, Int. J. Electr., 1995, 79, (4), pp. 455-470

17 TRIVEDI, K.S., and ERCEGOVAC, M.D.: ‘On-line algorithms for division and multiplication’, IEEE Trans., 1977, C-26, (7), pp. 681-687

18 PARHI, K.K.: ‘A systematic approach for design of digit serial signal processing architectures’, IEEE Trans., 1991, CAS-38, (4), pp. 358-375

