Radix digit-serial pipelined divider/square-root architecture

A.E. Bashagha, M.K. Ibrahim

Indexing terms: VLSI arithmetic, Architecture

Abstract: The paper presents a new digit-serial architecture for division and square-root which can be pipelined to the bit level to achieve high throughput. The architecture differs from the existing divider/square-root architectures in that it is based on the radix-$2^n$ algorithm. As a result, any type of adder can be used in the proposed digit-serial controlled add/subtract basic cell. The authors present two basic digit-serial controlled add/subtract cells. The first is based on the conventional carry feedback digit-serial adder. The second is based on the carry feed-forward adder, which results in the first reported digit-serial divider/square-root architecture that can be pipelined down to the bit level. An evaluation of the proposed architecture for different values of the digit size is also presented.

1 Introduction

Digital signal processing systems require different sampling rates, and therefore they are implemented using different approaches. The bit-parallel approach [1-7] is preferred for high-speed applications such as image processing, whereas the bit-serial approach [8-10] is used for low-speed applications such as speech and communications. The digit-serial approach [11-16] has been developed recently as a compromise between the fast and expensive bit-parallel approach and the slow and cheap bit-serial realisation. The digit-serial structure processes a number of bits, called the digit, n, in each cycle. The digit size varies between one bit and the wordlength N; the digit-serial structure reduces to a bit-serial structure when n equals 1, and it becomes a bit-parallel one when n equals the wordlength N. This flexibility enables designers to find the best tradeoff between the cost and the throughput rate. However, these digit-serial architectures are based on either folding the bit-parallel architectures [15] or unfolding the bit-serial ones [10]. As a result, they are fixed for all radices and are based on carry ripple adders. Moreover, the major drawback of this approach is that such architectures cannot be pipelined to the bit level, owing to the feeding back of the carry out bit to the same digit cell.

© IEE, 1994. Paper 1502E (C2, ES), first received 30th November 1993 and in revised form 27th June 1994. The authors are with the Department of Electrical & Electronic Engineering, University of Nottingham, Nottingham NG7 2RD, United Kingdom.

IEE Proc.-Comput. Digit. Tech., Vol. 141, No. 6, November 1994

Digit-serial notation is also used in the context of redundant number-based systems [17-20], which are also known as on-line arithmetic systems. The drawback of these systems is the increased size of the computational elements and the overhead of data conversion [15].

Another approach to the design of digit-serial systems has been developed recently and is based on radix-$2^n$ arithmetic [21-24]. The advantages of the radix approach are: (i) it has enabled for the first time the design of functionally correct digit-serial structures directly based on two's complement representation in a hierarchical way, without the need for bit-parallel or bit-serial designs as an initial starting point; and (ii) it only specifies the functionality of the basic cell, hence any internal architecture can be used so long as it satisfies the functionality specification of the cell.

Division and square-root are basic arithmetical functions and are used in many applications, such as QR decomposition [8]. Many workers have made efforts to discover suitable algorithms and design efficient hardware architectures. The architectures that perform both division and square-root operations are based on either the binary bit-parallel approach [4-7] or the redundant signed digit approach [25-28]. The performance of the nonredundant bit-parallel divide/square-root architectures [4-6] is limited by the use of carry-propagate arithmetic. Also, they cannot be pipelined to the bit level and their throughput rate is wordlength-dependent. Agrawal [7] presented a divide/square-root array that uses carry-save arithmetic instead of the carry-propagate approach to increase the throughput rate. However, this requires carry-lookahead circuitry for sign computation in each row, and so it cannot achieve the high throughput of the bit-level pipelined structures. The redundant systems [25-28] are based on redundant signed digit number representation of the data. Therefore, relatively complex circuits are required for the control and sign check of the partial remainder.

In this paper we present a new digit-serial architecture to carry out division and square-root which is based on radix-$2^n$ arithmetic. We present two types of digit-serial controlled add/subtract cells; the first is based on feeding back the carry output bit to the same digit-serial adder and the second is based on a digit carry-out-feedforward adder. The advantage of feeding forward the carry out bit is to enable the pipelining of the architecture to the bit level as well as the digit level. Moreover, any adder can be used to implement the proposed architecture; it can be a carry ripple adder, which is more suitable for small radices, or a carry lookahead adder, which can be used for high radices.

2 Nonrestoring division/square-root algorithms

Let us consider the following division and square-rooting operations:

$$Q = A/D \quad \text{and} \quad Q = \sqrt{A} \tag{1}$$

where A, D and Q are (2N + 1)-bit, (N + 1)-bit, and (N + 1)-bit binary fractions in two's complement fixed point representation, respectively, such that

$$A = a_{2N} a_{2N-1} a_{2N-2} \ldots a_1 a_0 \tag{2}$$

$$D = d_N d_{N-1} d_{N-2} \ldots d_1 d_0 \tag{3}$$

$$Q = q_N q_{N-1} q_{N-2} \ldots q_1 q_0 \tag{4}$$

where $a_{2N}$, $d_N$ and $q_N$ are the sign bits of A, D and Q, respectively. It should be noted that in the case of division A, D and Q can be positive or negative binary fractions, whereas A and Q must be positive in the case of square-rooting (i.e. $a_{2N}$ and $q_N$ are zeros). We assume that $|A| < |D|$ and $D \neq 0$.

Now we describe the nonrestoring general radix division algorithm for positive and negative operands [23]. The sign bit of the quotient, $q_N$, depends on the sign bits of A and D, and can be found directly as the output of an exclusive-or (XOR) gate whose inputs are the sign bits $a_{2N}$ and $d_N$, i.e. $q_N = a_{2N} \oplus d_N$. The 2N-bit shifted dividend A (excluding $a_{2N}$) and the (N + 1)-bit divisor D are rewritten as 2m-digit and m-digit numbers, respectively (it may be required to append a number of zero bits to A or D or both of them to obtain these numbers of digits), where each digit consists of n bits, such that:

$$A = a_{2m-1} a_{2m-2} \ldots a_1 a_0 \tag{5}$$

$$D = d_{m-1} d_{m-2} \ldots d_1 d_0 \tag{6}$$

The sign bit of D will be the most significant bit (MSB) of the most significant digit (MSD), $d_{m-1}$. Digit-serial nonrestoring division is illustrated in Fig. 1 for N = 5, n = 2 and m = 3. The dividend A and the divisor D are of different signs, and so the sign bit $q_N$ is equal to 1. The control signal for the first operation is equal to 0, and therefore the radix-4 divisor 223 is added to the shifted radix-4 dividend 112. The carry out of the first partial remainder 001 is equal to 1, and so the quotient bit $q_4$ is equal to 0 (since $d_N = 1$), and the next step will be addition. The radix-4 remainder 001 is shifted one bit to the left and the next MSB of A, 0, is appended to it. The appended remainder 002 is added to D, 223; the carry out of the result of this addition is equal to 0, and so $q_3 = 1$ and the next operation is subtraction. The subtraction is carried out by adding the 4's complement of 223, i.e. 111, to the second appended remainder 122. The remaining quotient bits are generated by a similar procedure and they are equal to the complement of the carry out (since $d_N = 1$), where these quotient bits are used to control the addition/subtraction operations. The procedure continues until we obtain the least significant bit (LSB) of Q, $q_0$, and it is noted that we ignore some of the least significant bits (zero bits) of A.

N = 5, n = 2, m = 3

Operand      Decimal     Binary     Radix-4    4's (2's) complement
A            0.34375     0.01011    023
Shifted A    0.6875      0.10110    112
D            -0.65625    1.01011    223        111 (0.10101)
Q = A/D      -0.53125    1.01111    233        101 (0.10001)

Sign bit: q5 = 0 ⊕ 1 = 1

Step   Appended PR   Control C        Operation     Operand   Result   Carry out   Quotient bit
1      112           NOT q5 = 0       add D         223       001      1           q4 = 0
2      002           q4 = 0           add D         223       231      0           q3 = 1
3      122           q3 = 1           subtract D    111       233      0           q2 = 1
4      132           q2 = 1           subtract D    111       303      0           q1 = 1
5      212           q1 = 1           subtract D    111       323      0           q0 = 1

Fig. 1 Nonrestoring digit-serial division example
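The steps of the Fig. 1 example can be re-traced with a short word-level sketch. It is only an illustration under stated assumptions: the function names `nonrestoring_divide` and `to_radix` are hypothetical, the arithmetic is done on whole (N + 1)-bit words rather than digit by digit, and N + 1 is assumed to be a multiple of the digit size n so that the radix-$2^n$ digits can be printed directly.

```python
def to_radix(word, n, m):
    """Render an (n*m)-bit word as m radix-2**n digits, MSD first."""
    mask = (1 << n) - 1
    return "".join(str((word >> (n * k)) & mask) for k in reversed(range(m)))


def nonrestoring_divide(a_word, d_word, N, n=2):
    """Word-level model of the nonrestoring division described in the text.

    a_word: (2N+1)-bit two's complement dividend, d_word: (N+1)-bit divisor.
    Returns the quotient bits [q_N, ..., q_0] and a per-step trace of
    (control, partial remainder in radix 2**n, carry out, quotient bit).
    """
    width = N + 1
    assert width % n == 0, "sketch assumes N + 1 is a multiple of n"
    m = width // n
    mask = (1 << width) - 1

    a_sign = (a_word >> (2 * N)) & 1
    d_sign = (d_word >> N) & 1
    q_bits = [a_sign ^ d_sign]                 # sign bit q_N

    shifted = a_word & ((1 << (2 * N)) - 1)    # 2N bits, sign bit excluded
    pr = shifted >> (N - 1)                    # N + 1 most significant bits
    tail = [(shifted >> k) & 1 for k in reversed(range(N - 1))]

    control = q_bits[0] ^ 1                    # first control = NOT q_N
    trace = []
    for step in range(N):
        # Subtraction = add the inverted divisor with a carry-in of 1.
        operand = (d_word ^ mask) if control else d_word
        total = pr + operand + control
        carry_out = total >> width
        pr = total & mask
        q = carry_out ^ d_sign                 # quotient bit
        q_bits.append(q)
        trace.append((control, to_radix(pr, n, m), carry_out, q))
        control = q                            # q decides the next operation
        if step < N - 1:
            pr = ((pr << 1) | tail[step]) & mask   # shift left, append next bit of A
    return q_bits, trace


if __name__ == "__main__":
    # Fig. 1 operands: A = 0.01011 (extended to 2N + 1 bits), D = 1.01011
    bits, steps = nonrestoring_divide(0b00101100000, 0b101011, N=5, n=2)
    print(bits)
    for row in steps:
        print(row)
```

With the Fig. 1 operands the trace lists the partial remainders 001, 231, 233, 303 and 323, and the quotient bits form 1.01111, in agreement with the worked example.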

Nonrestoring binary square rooting [4] is similar to binary division, except that the subtrahend (the radicand R) of the square-root changes from one step to another, whereas the subtrahend of the division process is the same for all the steps and is equal to the (N + 1)-bit divisor D. The radicand starts with two bits (01) in the first step and increases by one bit in every succeeding step, and the final radicand consists of N + 1 bits. In digit-serial square-rooting [24] the radicand should have the same number of bits for all the steps of the square-rooting process. This can be achieved by appending a number of zero bits to the right of the least significant bit (LSB) of each radicand [24], such that the ith radicand becomes:

$$R_i = q_{N-1} q_{N-2} \ldots q_{N-i+1} \bar{q}_{N-i+1} 1 0 \ldots 0 \tag{7}$$

The digit-serial square-rooting is similar to digit-serial division, except that the subtrahend for the square-rooting will be the m-digit ((N + 1)-bit) radicand $R_i$ of eqn. 7.
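As an illustration of how the subtrahend changes from step to step, the following sketch implements the textbook integer form of nonrestoring square rooting. It is an assumption-laden stand-in rather than the authors' circuit: it works on integers instead of fractions, it processes the operand two bits per step without the zero padding used for the fixed-width radicand, and the name `nonrestoring_sqrt` is hypothetical. The operand appended at each step is the partial root followed by 01 when the previous remainder is non-negative (subtract) or by 11 when it is negative (add).

```python
import math


def nonrestoring_sqrt(a, nbits):
    """Nonrestoring square root of a non-negative integer a < 4**nbits.

    Returns (q, r) with q = floor(sqrt(a)) and a == q*q + r.
    """
    q, r = 0, 0
    for i in reversed(range(nbits)):
        pair = (a >> (2 * i)) & 0b11            # next two bits of the operand
        if r >= 0:
            r = (r << 2) + pair - ((q << 2) | 0b01)   # subtract q followed by 01
        else:
            r = (r << 2) + pair + ((q << 2) | 0b11)   # add q followed by 11
        q = (q << 1) | (1 if r >= 0 else 0)     # root bit is 1 when r is non-negative
    if r < 0:
        r += (q << 1) | 1                       # final correction of the remainder
    return q, r


if __name__ == "__main__":
    for value in (24, 25, 1000):
        root, rem = nonrestoring_sqrt(value, 5)
        assert root == math.isqrt(value) and root * root + rem == value
        print(value, root, rem)
```

No partial remainder is ever restored: a negative remainder simply switches the next step from subtraction to addition, which is the same control pattern as in the division case.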

3 The combined architecture

3.1 The basic cell

The basic combined controlled add/subtract cells (BCCAS) for division and square-rooting are shown in Fig. 2. The carry-out bit of each digit (except the most significant digit (MSD)) of the partial remainder is delayed by one cycle and is either fed backwards to the same n-bit adder as shown in Fig. 2a, or fed forward to another n-bit adder as shown in Fig. 2b. The switch in Figs. 2a and b is a multiplexer which is used to connect the carry input bit to the control signal every m cycles (i.e. at the instant mk), and to the carry-out bit of each digit for other cycles (i.e. at the instants mk + 1, mk + 2, ..., mk + m - 1), where m is the number of digits of the kth sample. If the control mode M is equal to 1, the subtrahend B is equal to the divisor D and division is carried out. On the other hand, the square-root operation is performed when M is equal to 0, with B equal to the radicand R.

At every cycle the BCCAS cell subtracts (adds) one digit of the subtrahend B, $b_{n-1} \ldots b_1 b_0$, from (to) the corresponding digit of the input partial remainder PR, $P_{n-1} \ldots P_1 P_0$, if the control signal C is equal to 1 (0). The output sum, $P_{n-1} \ldots P_1 P_0$, represents one digit of the next partial remainder PR, and all the digits of PR are generated in m cycles. The carry-out bit $C_o$ of the MSD of PR is fed to an XOR gate with the divisor's sign bit, $M d_N$ ($d_N$ is equal to the MSB of the MSD $d_{m-1}$), to produce the ith bit $q_i$ of the division's quotient if M = 1, or the square-root bit if M = 0. This bit $q_i$ is fed forward to the next cell to decide whether the next operation is addition ($q_i = 0$) or subtraction ($q_i = 1$). The subtraction is carried out by inverting all the bits of B and adding 1 to the least significant bit (LSB) of the least significant digit (LSD) of B. The addition of this 1 bit is performed by connecting the carry input $C_i$ to C in the first cycle of each addition/subtraction operation.

Fig. 2 Basic combined controlled add/subtract cell (if M = 1, Q = A/D; if M = 0, Q = √A)
a With carry-out feedback
b With carry-out feedforward
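The behaviour of the carry-feedback cell described above can be sketched as a small cycle-by-cycle model. This is a behavioural illustration only: the names `bccas_feedback` and `quotient_bit` are hypothetical, digits are supplied least significant digit first, and one loop iteration stands for one clock cycle.

```python
def bccas_feedback(pr_digits, b_digits, control, n):
    """Carry-feedback controlled add/subtract over one m-digit word.

    pr_digits, b_digits: lists of m radix-2**n digits, LSD first.
    control: 0 to add B, 1 to subtract B.
    Returns (sum digits LSD first, carry out of the MSD).
    """
    mask = (1 << n) - 1
    carry = 0                                   # carry latch, fed back to the same adder
    out = []
    for cycle, (p, b) in enumerate(zip(pr_digits, b_digits)):
        # Multiplexer: control signal in the first cycle, stored carry afterwards.
        carry_in = control if cycle == 0 else carry
        operand = (b ^ mask) if control else b  # invert the bits of B to subtract
        total = p + operand + carry_in
        out.append(total & mask)
        carry = total >> n                      # delayed by one cycle
    return out, carry


def quotient_bit(carry_out, mode, d_sign):
    """q_i = C_o XOR (M AND d_N); mode 1 is division, mode 0 is square root."""
    return carry_out ^ (mode & d_sign)


if __name__ == "__main__":
    # First two operations of Fig. 1 (radix 4, digits LSD first): 112 + 223, 122 - 223
    print(bccas_feedback([2, 1, 1], [3, 2, 2], control=0, n=2))
    print(bccas_feedback([2, 2, 1], [3, 2, 2], control=1, n=2))
```

Because each digit needs the carry produced by the previous digit in the same adder, the loop carries a dependency from one cycle to the next, which is the feedback path that prevents bit-level pipelining of this cell.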

The fed-forward BCCAS cell is shown in Fig. 2b and is based on the fed-forward digit-serial adder [22]. This cell consists of two n-bit adders, where the carry out bit of each digit of the first adder is delayed by one cycle and fed forward to another adder, instead of being fed backwards to the same adder. The second adder adds this carry out to the output digit of the first adder to generate the next partial remainder PR, and the carry-out output bit of the second adder is always zero, except for the case when the carry input bit and all the bits of the input digit are equal to 1. To avoid feeding back this carry-out bit to the second adder, an AND-OR circuit is added to the fed-forward cell to detect this condition and to ensure that the correct carry bit is added to the input digit. The advantage of this digit-serial adder is to increase the throughput rate, since the architecture can be pipelined to the bit level as well as the digit level, and so the same architecture processes many samples simultaneously.
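A functional sketch of the two-adder arrangement is given below. It is one plausible reading of the description, not a gate-level copy of Fig. 2b: the names `feedforward_add`, `to_digits` and `from_digits` are hypothetical, and the AND-OR condition is modelled as "the first adder's carry, OR (all sum bits are 1 AND the incoming carry is 1)". The two terms cannot both be 1, because a digit sum that produces a carry is at most $2^n - 2$.

```python
def to_digits(word, n, m):
    """Split a word into m radix-2**n digits, LSD first."""
    mask = (1 << n) - 1
    return [(word >> (n * k)) & mask for k in range(m)]


def from_digits(digits, n):
    """Reassemble a word from radix-2**n digits given LSD first."""
    return sum(d << (n * k) for k, d in enumerate(digits))


def feedforward_add(a_digits, b_digits, n, carry_in=0):
    """Digit-serial addition with the carry fed forward to a second adder."""
    mask = (1 << n) - 1
    forwarded = carry_in            # carry presented to the second adder
    out = []
    for a, b in zip(a_digits, b_digits):
        first = a + b               # first adder: no carry input
        s, c = first & mask, first >> n
        out.append((s + forwarded) & mask)   # second adder: digit plus one carry bit
        # AND-OR: the second adder overflows only when s is all ones and the
        # incoming carry is 1; otherwise the next carry is the first adder's.
        forwarded = c | (int(s == mask) & forwarded)
    return out, forwarded


if __name__ == "__main__":
    digits, carry = feedforward_add(to_digits(0b011010, 2, 3),
                                    to_digits(0b010101, 2, 3), 2)
    print(from_digits(digits, 2), carry)
```

For subtraction the same model would be driven with the inverted digits of B and `carry_in` set to the control signal, as in the feedback cell.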

3.2 Division/square-root architecture

The radix-4 digit-serial architecture to perform division and square-root is shown in Fig. 3 for N = 3, n = 2 and m = 2. Also, it is assumed that the N least significant bits of A are zeros such that

$$A = a_3 a_2 a_1 a_0 0 0 0 \tag{8}$$

The 2N bits of A (excluding the sign bit $a_3$) are divided into three digits: $a_2 a_1$, $a_0 0$ and 00. The (N + 1)-bit divisor D (the first radicand $R_1$) is divided into two digits $d_3 d_2$ and $d_1 d_0$ (01 and 00). The two (m) most significant digits of A, the digits of the divisor and the radicand are fed digit-serially with the least significant digit first. The control mode M is fed to all the cells of the architecture so that the division is carried out if M = 1 and the square-rooting is performed when M = 0. The control signal C for the first cell is the complement of the MSB of Q, $q_3$, whereas the control signal for the other cells is equal to $q_i$ for i = N - 1, ..., 1.

Now we describe the division process: the sign bit of Q, $q_3$, is given by $M d_3 \oplus a_3$, and hence it will only be generated when $d_3$ and $a_3$ have become available, i.e. after one (m - 1) cycle. This bit $q_3$ is delayed by one cycle, inverted and then fed to the first cell for the next m cycles to decide whether the first operation is subtraction (when $q_3 = 0$) or addition (when $q_3 = 1$). The entry of the first digit of A, $a_0 0$, and the first digit of the divisor D, $d_1 d_0$, to the first cell is synchronised with the entry of the control signal $C = \bar{q}_3$. The m-digit divisor D is subtracted from (added to) the m digits of A during the second m cycles to generate $q_2$, which is delayed by one cycle and fed to the second cell to control the second operation in the third m cycles. The output sum of the first cell represents the first partial remainder $PR_1$ ($P_{13} P_{12}$, $P_{11} P_{10}$), which is shifted by one bit to the left, and the MSB of the next MSD of A, 0, is appended to it. The appended partial remainder ($P_{12} P_{11}$, $P_{10} 0$) is fed down to the second cell as well as the two-digit divisor D. The addition/subtraction operation is performed in the second cell during the third m cycles to generate another bit of Q, $q_1$, which controls the next operation in the third cell through the fourth m cycles. The least significant bit of Q, $q_0$, is generated from the last cell, and so all the quotient bits are generated within (N + 1)m cycles.

The square-rooting procedure is similar to the division procedure, except that the quotient’s sign bit $q_3$ is always equal to zero since both M and $a_3$ are equal to 0, and so the first operation for the square-rooting is always subtraction. In the case of division, the subtrahend (divisor) is the same for all the cells, whereas the square-rooting subtrahend (radicand) changes from one step to another and depends on the previous quotient bits, as explained in Section 2. Therefore, a new radicand is generated for every cell, as shown in Fig. 3. The switches in Fig. 3 are multiplexers which are used: to hold the control signal (quotient bit) for m cycles; to shift the partial remainder to the left and to modify the radicand at each step of the algorithm; and to convert the quotient bits available in bit-parallel form to digit-serial form.

4 Evaluation of the performance of the architecture

The use of the feedforward cell allows the pipelining of the architecture to the bit level, which cannot be achieved by using the carry feedback cell. The pipelining of the architecture to the bit level results in an increase in the throughput rate, but at the same time the area will increase owing to the increase in the number of the latches used to hold the data. The area A, time T and the area-time AT for a 32-bit digit-serial divider/square-root architecture as a function of the digit size and the number of pipelining stages are shown in Figs. 4-6. The calculations of A, T and AT are based on the assumption that a carry-propagate adder is used in the basic digit cell.

In these calculations it is assumed that the AND gate is equivalent to two NAND gates, the exclusive-or (XOR) is equivalent to three NAND gates, and so on [4]. The units of area and time, $\Delta_a$ and $\Delta_t$, represent the area required and the time taken by a NAND gate, respectively, where the values of $\Delta_a$ and $\Delta_t$ depend on the technology used. For example, the ES2 (European Silicon Structures) typical value of $\Delta_t$ using 1.0 µm CMOS technology is 0.46 ns.

It is clear from Figs. 4-6 that the performance of the architecture that uses a carry feedback cell (feedback (0)) is better than the one with carry feedforward without pipelining (feedforward (0)). However, the performance is enhanced when pipelining is used in the feedforward architecture. The number of pipelining levels is shown inside the brackets in the legend to these figures. It can be concluded that for each digit size there is a certain number of pipelining levels that achieves the best tradeoff between the cost (area) and the throughput rate. For example, the best performance of the digit-serial architecture for digit size 16 is achieved when 6 pipelining levels are used. Also, it is clear from Fig. 6 that the minimum area-time is achieved when the digit size is equal to 8 and the number of pipelining levels is 10. Furthermore, the performance of the divider/square-root architecture for digit sizes 8 and 16 is better than the performance of the bit-parallel architecture (i.e. when the digit size is equal to 32).

Fig. 4 Area of a 32-bit digit-serial divider/square-root architecture using feedback cell and feedforward cell for different levels of pipelining (shown inside parentheses in the legend), as a function of the digit size
[Plot: area against digit size (1, 2, 4, 8, 16, 32). Legend: feedback (0); feedforward (0), (1), (3), (4), (6), (8), (9), (10), (16), (18), (33)]

Fig. 5 Time of a 32-bit digit-serial divider/square-root architecture using feedback cell and feedforward cell for different levels of pipelining, as a function of the digit size
[Plot: time, in units of $\Delta_t$, against digit size. Legend as in Fig. 4]
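The NAND-equivalent accounting used in this section can be illustrated with a few lines of arithmetic. The sketch below uses only the two gate equivalences and the 0.46 ns unit delay quoted in the text; the function names and the gate counts in the usage example are hypothetical and are not taken from the paper's evaluation.

```python
# NAND equivalents stated in the text; other gate types are not specified there.
NAND_EQUIV = {"NAND": 1, "AND": 2, "XOR": 3}

# Typical NAND delay quoted in the text for ES2 1.0 um CMOS, in nanoseconds.
DELTA_T_NS = 0.46


def area_units(gate_counts):
    """Total area in NAND-gate units for a dict of {gate type: count}."""
    return sum(NAND_EQUIV[gate] * count for gate, count in gate_counts.items())


def time_ns(delay_units, delta_t_ns=DELTA_T_NS):
    """Convert a delay expressed in NAND-gate delays to nanoseconds."""
    return delay_units * delta_t_ns


def area_time(area, delay_units):
    """Area-time product in units of (NAND area) x (NAND delay)."""
    return area * delay_units


if __name__ == "__main__":
    # Hypothetical block: 4 XOR gates, 2 AND gates and 3 NAND gates.
    block = {"XOR": 4, "AND": 2, "NAND": 3}
    area = area_units(block)
    print(area, time_ns(100), area_time(area, 100))
```

Expressing both axes in gate units keeps the comparison between digit sizes and pipelining levels independent of the fabrication technology; only the final conversion to nanoseconds depends on it.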

The conventional digit-serial architectures [8] and [11-16] are based on feeding back the carry to the same basic cell, and hence they cannot be pipelined to the bit level. Therefore, the values of the area, time and area-time of these architectures are similar to those of our feedback architecture (feedback (0)). The comparison between our feedforward architecture for different pipelining levels and the conventional ones can be seen from Figs. 4-6, and is similar to the above comparison with our feedback architecture.

Fig. 6 Area-time of a 32-bit digit-serial divider/square-root architecture using feedback cell and feedforward cell for different levels of pipelining, as a function of the digit size
[Plot: area-time, in units of $\Delta_a \Delta_t$, against digit size (1, 2, 4, 8, 16, 32). Legend: feedback (0); feedforward (0), (1), (3), (4), (6), ..., (33)]

The real problems of the two’s complement division/square-rooting circuits are the serial dependency between successive additions/subtractions and the LSB to MSB nature of addition. We do not claim in this paper that our architecture will solve these inherent problems of the conventional nonredundant systems. However, the comparison between our architecture and the conventional digit-serial architectures shows that our architecture can be pipelined to the bit level, and therefore a higher throughput rate can be achieved. In addition, the basic adder in our architecture can be any adder. When a carry-propagate adder is used, the latency of the proposed architecture will be a function of the pipelining levels. The latency can be reduced if a carry-lookahead adder is used in the proposed architectures, but the price to be paid for that is the increase of the required silicon area. It should be pointed out that the ability to use any adder in the basic cell, which is unique to our architecture, has only been made possible because our approach is based on radix-$2^n$ arithmetic.

5 Conclusions

We have presented a new high-performance digit-serial architecture to perform division and square-rooting. The n-bit adder used in the basic cell can be any adder, and it depends on the digit size; the carry-propagate adder is more suitable for small digit sizes, and the lookahead adder is more suitable for large digit sizes. The basic combined controlled add/subtract cell either consists of one n-bit adder, with the carry out bit fed back to the same adder, or two n-bit adders, with the carry out bit of the first adder fed forward to the second one. The advantage of feeding forward the carry bit is in enabling the pipelining of the architecture down to the bit level, as well as the digit level. We have also shown that the performance of the pipelined digit-serial divide/square-root architecture is better than the bit-parallel one.

6 References

1 CAVANAGH, J.J.F.: ‘Digital computer arithmetic: design and implementation’ (McGraw-Hill, New York, 1985)

2 GUILD, H.H.: ‘Some cellular logic arrays for non-restoring binary division’, Radio Electron. Eng., 1970, 39, pp. 345-348

3 MAJITHIA, J.C.: ‘Non-restoring binary division using a cellular array’, Electron. Lett., 1970, 6, pp. 303-304

4 HWANG, K.: ‘Computer arithmetic: principles, architecture and design’ (Wiley, New York, 1979)

5 KAMAL, A.K., SINGH, H., and AGRAWAL, D.P.: ‘A generalized pipeline array’, IEEE Trans. Comput., 1974, 23, pp. 533-536

6 STEWART, R.W., and CHAPMAN, R.: ‘Fast stable Kalman filter algorithms utilising the square root’. IEEE International Conference on Acoustics, Speech and Signal Processing, 1990, pp. 1815-1818

7 AGRAWAL, D.P.: ‘High-speed arithmetic arrays’, IEEE Trans. Comput., 1979, 28, (3), pp. 215-224

8 PARHI, K.K.: ‘A systematic approach for design of digit-serial signal processing architectures’, IEEE Trans. Circuit Syst., 1991, 38, (4), pp. 358-375

9 STEWART, R.W., CHAPMAN, R., and DURRANI, T.S.: ‘Arithmetic implementation of the Givens QR array’. Proceedings of IEEE ICASSP ’89, Glasgow, UK, 1989, pp. 2405-2408

10 DENYER, P.B., and RENSHAW, D.: ‘VLSI signal processing: a bit-serial approach’ (Addison Wesley, Reading, MA, 1986)

11 SMITH, S.G., McGREGOR, M.S., and DENYER, P.B.: ‘Techniques to increase the computational throughput of bit-serial architectures’. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, 1987, Dallas, USA, pp. 543-546

12 SMITH, S.G., and DENYER, P.B.: ‘Serial data computation’ (Kluwer Academic, Boston, 1988)

13 SMITH, S.G., and MORGAN, R.W.: ‘Generic ASIC architecture and synthesis scheme for digital signal processing’. Proceedings of the IEEE ICASSP ’89, Glasgow, Scotland, 1989, pp. 2413-2416

14 HARTLEY, R.I., and CORBETT, P.F.: ‘A digit-serial silicon compiler’. Proceedings of the 25th IEEE Design Automation Conference, 1988, pp. 646-649

15 HARTLEY, R.I., and CORBETT, P.F.: ‘Digit-serial processing techniques’, IEEE Trans. Circuit Syst., 1990, 37, (6), pp. 707-719

16 PARHI, K.K., and WANG, C.Y.: ‘Digit-serial DSP architectures’. Proceedings of the IEEE Conference on Application Specific Array Processors, Princeton, NJ, 1990, pp. 341-351

17 IRWIN, M.J., and OWENS, R.M.: ‘Digit pipelined arithmetic as illustrated by the paste-up system: A tutorial’, IEEE Comput., 1987, 20, pp. 61-73

18 ERCEGOVAC, M.D.: ‘On-line arithmetic: an overview’, Proc. SPIE, Real-time Signal Processing VIII, 1984, 495, pp. 86-93

19 ERCEGOVAC, M.D.: ‘An on-line square-root algorithm’. Proceedings of the 4th IEEE Symposium on Computer Arithmetic, 1978, Anaheim, USA, pp. 183-189

20 CARTER, T.M., and ROBERTSON, J.E.: ‘Radix-16 signed digit division’, IEEE Trans. Comput., 1990, 39, (12), pp. 1424-1433

21 IBRAHIM, M.K.: ‘Radix-$2^n$ multiplier structures: a structured design methodology’, IEE Proc. E, 1993, 140, (4), pp. 185-190

22 AGGOUN, A., ASHUR, A., and IBRAHIM, M.K.: ‘A novel cell architecture for high performance digit-serial computations’, Electron. Lett., 1993, 29, (11), pp. 938-940

23 BASHAGHA, A.E., and IBRAHIM, M.K.: ‘A new digit-serial divider architecture’, Int. J. Electron., 1993, 75, (1), pp. 133-140

24 BASHAGHA, A.E., and IBRAHIM, M.K.: ‘Design of a square-root architecture: digit-serial approach’, Int. J. Electron., 1994, 76, (1), pp. 15-25

25 GOSLING, J.B., and BLAKELEY, C.M.S.: ‘Arithmetic unit for integral division and square root’, IEE Proc. E, 1987, 134, (1), pp. 17-23

26 McQUILLAN, S.E., McCANNY, J.C., and WOODS, R.F.: ‘High performance VLSI architecture for division and square root’, Electron. Lett., 1991, 27, (1), pp. 19-21

27 McQUILLAN, S.E., and McCANNY, J.C.: ‘VLSI module for high performance multiply, square root and divide’, IEE Proc. E, 1992, 139, (6), pp. 505-510

28 SRINIVAS, H.R., and PARHI, K.K.: ‘High speed VLSI processor architectures using hybrid number representation’, J. VLSI Signal Process., 1992, 4, pp. 177-198