
A NOVEL COMPRESSION APPROACH FOR DUAL QUALITY 32 BIT DADDA MULTIPLIER

Swayamvarapu Rajesh Kumar1

M-Tech scholar

Department of Electronics and Communication Engineering

Visakha Institute of Engineering and Technology, Vishakhapatnam, Andhra Pradesh, India.

Bighneswar Panda2

Associate Professor



Dr. Murali Krishna Gurram3

Senior Research Fellow

Department of Geospatial Analytics

National University, Singapore

J. Harini Nayana4

Assistant Professor



Abstract

In this paper, we propose four 4:2 compressors that can switch between exact and approximate modes of operation. In the approximate mode, these dual-quality compressors provide higher speed and lower power consumption at the cost of lower accuracy. Each compressor has its own level of accuracy in the approximate mode, as well as different delays and power dissipations in the approximate and exact modes. Using these compressors in parallel multiplier structures yields configurable multipliers whose precision can be changed during runtime. The efficiency of these compressors in a 32-bit Dadda multiplier is evaluated in a 45-nm standard CMOS technology by comparing their parameters with those of state-of-the-art approximate multipliers. The results show average reductions in delay and power consumption of 46% and 68%, respectively. The effectiveness of these compressors is also evaluated in some image processing applications. Compared with approximate multipliers that are not based on compressors, the errors of the proposed multipliers were higher, while the design parameters were considerably better. Finally, our studies showed that the multipliers realized with the proposed compressors have, on average, about 93% smaller FOM value than the considered approximate multipliers.

Keywords: Approximate computing, 4:2 compressor, accuracy, configurable, delay, power consumption.

Science, Technology and Development

Volume IX Issue VIII AUGUST 2020

ISSN : 0950-0707

Page No : 65

1. Introduction

In conventional digital VLSI design, one usually assumes that a usable circuit or system should always provide definite and precise results. In fact, such exact operations are seldom needed in our non-digital, everyday experience. The world accepts “analog computation,” which generates “good enough” results rather than totally accurate ones. The data processed by many digital systems may already contain errors. In many applications, such as a communication system, the analog signal coming from the outside world must first be sampled before being converted to digital data. The digital data are then processed and transmitted over a noisy channel before being converted back to an analog signal. Errors may occur anywhere in this process. Furthermore, with the advances in transistor size scaling, factors such as noise and process variations, which were previously insignificant, are becoming important in today’s digital IC design. Of course, not all digital systems can adopt the error-tolerant concept. In digital systems such as control systems, the correctness of the output signal is extremely important, which rules out the use of error-tolerant circuits. However, for many digital signal processing (DSP) systems that process signals relating to human senses such as hearing, sight, smell, and touch, e.g., image processing and speech processing systems, error-tolerant circuits may be applicable.

While there are many works on designing approximate multipliers, research on accuracy-configurable approximate multipliers is limited. In [10], a static segment method (SSM) is presented, which performs the multiplication on an m-bit segment starting from the leading 1 bit of the input operands, where m is equal to or greater than n/2. The resulting m × m multiplier consumes much less energy than an n × n multiplier. Also, a dynamic range unbiased multiplier (DRUM), which selects an m-bit segment starting from the leading 1 bit of the input operands and sets the least significant bit of the truncated values to “1,” has been proposed in [11]. In this structure, the truncated values are multiplied and shifted to the left to generate the final output. Although, by exploiting smaller values for m, the structure of [11] provides higher accuracy designs than those of [10], its approach requires extra, complex circuitry.
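As a rough illustration of the leading-one segment idea described above, the following Python sketch truncates each operand to an m-bit segment starting at its leading 1, sets the least significant bit of the truncated value to 1, multiplies the two segments, and shifts the product back to the left. The function names `leading_segment` and `segment_multiply`, and the choice to pass operands shorter than m bits through unchanged, are assumptions of this sketch and are not taken from [10] or [11].

```python
def leading_segment(x: int, m: int) -> tuple[int, int]:
    """Return (segment, shift) for a non-negative integer x.

    The segment is the m-bit slice of x that starts at its leading 1.
    Operands that already fit in m bits are returned unchanged
    (an assumption of this sketch).
    """
    width = x.bit_length()
    if width <= m:
        return x, 0
    shift = width - m
    # Keep the top m bits and force the LSB of the truncated value to 1.
    return (x >> shift) | 1, shift


def segment_multiply(a: int, b: int, m: int) -> int:
    """Approximate a * b from two m-bit leading segments."""
    seg_a, shift_a = leading_segment(a, m)
    seg_b, shift_b = leading_segment(b, m)
    # Multiply the short segments, then shift left to restore the weight.
    return (seg_a * seg_b) << (shift_a + shift_b)


if __name__ == "__main__":
    exact = 1000 * 1000
    approx = segment_multiply(1000, 1000, 6)
    print(exact, approx, abs(approx - exact) / exact)
```

Only an m × m multiplication is performed here regardless of the operand width, which is where the energy saving described in the text would come from.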

1.1. Exact 4:2 Compressor

To reduce the delay of the partial product summation stage of parallel multipliers, 4:2 and 5:2

compressors are widely employed [18]. Some compressor structures, which have been optimized for

one or more design parameters (e.g., delay, area, or power consumption), have been proposed. The

focus of this project is on approximate 4:2 compressors. First, some background on the exact 4:2

compressor is presented.

Fig. 1 Block diagram of 4:2 compressor.




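This type of compressor, shown schematically in Fig. 1, has four inputs (x1–x4) along with an input carry (Cin), and two outputs (sum and carry) along with an output Cout. The internal structure of an exact 4:2 compressor is composed of two serially connected full adders, as shown in Fig. 2. In this structure, the weights of all the inputs and the sum output are the same, whereas the weights of the carry and Cout outputs are one binary bit position higher. With x1, x2, and x3 feeding the first full adder, the outputs sum, carry, and Cout are obtained from

$$\mathrm{sum} = x_1 \oplus x_2 \oplus x_3 \oplus x_4 \oplus C_{in}$$

$$\mathrm{carry} = (x_1 \oplus x_2 \oplus x_3 \oplus x_4)\, C_{in} + \overline{(x_1 \oplus x_2 \oplus x_3 \oplus x_4)}\, x_4$$

$$C_{out} = (x_1 \oplus x_2)\, x_3 + \overline{(x_1 \oplus x_2)}\, x_1$$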

Fig. 2 Structure of the conventional 4:2 compressor.
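The two-full-adder structure can be checked with a small behavioral model. The sketch below is only an illustration in Python, not the authors' Verilog; the assignment of x1, x2, and x3 to the first full adder is an assumption, and the function names are hypothetical. It confirms the weight relation stated above: the five input bits add up to sum + 2·(carry + Cout) for all 32 input patterns.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three one-bit inputs."""
    total = a + b + c
    return total & 1, total >> 1


def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor built from two serially connected full adders."""
    s1, cout = full_adder(x1, x2, x3)    # first stage produces Cout
    s, carry = full_adder(s1, x4, cin)   # second stage produces sum and carry
    return s, carry, cout


def check_all_inputs() -> bool:
    """Verify x1+x2+x3+x4+Cin == sum + 2*(carry + Cout) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = exact_compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True


if __name__ == "__main__":
    print(check_all_inputs())
```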

II. Literature Survey

The work in [1] describes a novel multiplier architecture with tunable error characteristics that leverages a modified inaccurate 2×2 building block. The inaccurate multipliers achieve an average power saving of 31.78% to 45.4% over the corresponding accurate multiplier designs, for an average error of 1.39% to 3.32%. Using image filtering and JPEG compression as sample applications, the architecture is shown to achieve 2X to 8X better signal-to-noise ratio (SNR) for the same power savings when compared with recent voltage over-scaling based power-error tradeoff methods. The multiplier power savings for bigger designs are also presented, highlighting the fact that the benefits are strongly design-dependent. Finally, the design is enhanced with a residual adder to allow correct operation of the multiplier for non-error-resilient applications.

2.1 Multiplier structures for low power applications

The work in [2] explores energy-efficient serial and parallel multiplier structures to assess their suitability for the low and ultra-low power design regimes. 16 × 16-bit serial and state-of-the-art parallel multipliers are compared in 45 nm CMOS. A multiplier structure is proposed by optimizing the architecture, the gate sizes, and the supply voltage. For high-speed applications, the proposed structure provides 15% more throughput than a two-cycle parallel multiplier at the same energy consumption. In the low-speed design region, it provides a 3.7X energy reduction compared with the serial multiplier.




2.2 Voltage scalable high-speed robust hybrid arithmetic units using adaptive clocking

The work in [3] explores various arithmetic units for possible use in high-speed, high-yield ALUs operated at scaled supply voltage with adaptive clock stretching. It demonstrates that logic optimization of the existing arithmetic units (to create hybrid units) makes them further amenable to supply voltage scaling. Such hybrid units result from mixing the right amount of fast arithmetic into the slower ones. Simulations of different hybrid adders and multipliers in BPTM 70 nm technology show 18% to 50% improvements in power compared with standard adders, with only a 2% to 8% increase in die area at iso-yield. These optimized data path units can be used to construct voltage-scalable robust ALUs that can operate at a high clock frequency with minimal performance degradation due to occasional clock stretching.

2.3 A reconfigurable approximate carry look-ahead adder

In [4], a fast yet energy-efficient reconfigurable approximate carry look-ahead adder (RAP-CLA) is presented. This adder can switch between the approximate and exact operating modes, making it suitable for both error-resilient and exact applications. The structure, which is more area and power efficient than state-of-the-art reconfigurable approximate adders, is achieved by some modifications to the conventional carry look-ahead adder (CLA). The efficacy of the RAP-CLA adder is evaluated by comparing its characteristics with those of two state-of-the-art reconfigurable approximate adders as well as the conventional (exact) CLA in a 15 nm FinFET technology. The results reveal that, in the approximate operating mode, the 32-bit adder provides up to 55% and 28% delay and power reductions, respectively, compared with the exact CLA, at the cost of an error rate of up to 35.16%. It also provides up to 49% and 19% lower delay and power consumption, respectively, compared with the other approximate adders considered in that brief. Finally, the effectiveness of the adder in two image processing applications, smoothing and sharpening, is demonstrated.

2.4 Approximate data types for safe and general low-power computation

The work in [5] starts from the observation that energy is increasingly a first-order concern in computer systems. Exploiting energy-accuracy trade-offs is an attractive choice in applications that can tolerate inaccuracies, and recent work has explored exposing this trade-off in programming models. A key challenge, though, is how to isolate the parts of the program that must be precise from those that can be approximated, so that a program functions correctly even as quality of service degrades. The authors propose using type qualifiers to declare data that may be subject to approximate computation. Using these types, the system automatically maps approximate variables to low-power storage, uses low-power operations, and even applies more energy-efficient algorithms provided by the programmer. In addition, the system can statically guarantee isolation of the precise program component from the approximate component. This allows a programmer to control explicitly how information flows from approximate data to precise data. Importantly, employing static analysis eliminates the need for dynamic checks, further improving energy savings. As a proof of concept, EnerJ, an extension to Java that adds approximate data types, is implemented together with a hardware architecture that offers explicit approximate storage and computation.

2.5 Reducing area complexity multiplier reduction

Multipliers are a basic unit in signal processing and many other applications. They play a vital role in every technology advancement, where the targets are low power consumption, increased speed, reduced area, and so on. The amount of computation done by modern computers, including microcomputers and microprocessors, is astronomical. Even with high-speed computer chips, processing the data coming from devices all over the world requires efficient algorithms, and the chip area must be used effectively. The most frequently encountered computation in data processing or signal processing is multiplication. This architecture presents a novel solution that reduces the total area of the multiplier by modifying the partial product addition. Generally, to compute data at high speed, modern hardware uses the Wallace tree or Dadda multiplication techniques. Reducing the number of partial product additions reduces the number of gates needed to obtain the final result. In this method, the chip area is reduced by using more full adders in the earlier stages of the partial product addition, which is not done in conventional multipliers.

III. Existing System

To reduce the delay of the partial product summation in the adder stages of multipliers, and to provide high speed and lower power consumption with minimized area, compressors are used instead of regular adders. Depending on their size, compressors can reduce several inputs at a time, resulting in improved speed, reduced delay, smaller chip area, and lower power consumption. The basic adders, the half adder and the full adder, can reduce only two and three inputs at a time, respectively, and are therefore referred to as the 2:2 compressor and the 3:2 compressor.

Fig. 3 Half adder, also termed a 2:2 compressor.

Fig. 4 Full adder, also termed a 3:2 compressor.
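To make the compressor view of the basic adders concrete, the following gate-level Python sketch models the half adder as a 2:2 compressor and the full adder as a 3:2 compressor. It is a minimal illustration with hypothetical function names; in both cases the two output bits encode the number of ones among the inputs as sum + 2·carry.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """2:2 compressor: two input bits in, (sum, carry) out."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: three input bits in, (sum, carry) out."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def encodes_input_count(adder, n_inputs: int) -> bool:
    """Check that sum + 2*carry equals the number of ones at the inputs."""
    for bits in product((0, 1), repeat=n_inputs):
        s, carry = adder(*bits)
        if s + 2 * carry != sum(bits):
            return False
    return True


if __name__ == "__main__":
    print(encodes_input_count(half_adder, 2))
    print(encodes_input_count(full_adder, 3))
```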

IV. Proposed Dual-Quality 4:2 Compressors

The proposed DQ4:2Cs operate in two accuracy modes, approximate and exact. The general block diagram of the compressors is shown in Fig. 5. The diagram consists of two main parts, approximate and supplementary. During the approximate mode, only the approximate part is exploited while the supplementary part is power gated. During the exact operating mode, the supplementary part and some components of the approximate part are utilized.




Fig. 5 Block diagram of the proposed approximate 4:2 compressors.

The hachured box in the approximate part indicates the components that are not shared between this part and the supplementary part. In the proposed structure, to reduce the power consumption and area, most of the components of the approximate part are also used during the exact operating mode. We use the power gating technique to turn OFF the unused components of the approximate part. Also note that, as is evident from Fig. 5, in the exact operating mode, tristate buffers are utilized to disconnect the outputs of the approximate part from the primary outputs. In this design, the switching between the approximate and exact operating modes is fast. Thus, it provides the opportunity of designing parallel multipliers that are capable of switching between different accuracy levels during runtime. Next, we discuss the details of our four DQ4:2Cs based on the diagram shown in Fig. 5. The structures have different accuracies, delays, power consumptions, and area usages. Note that the ith proposed structure is denoted by DQ4:2Ci. The basic idea behind the approximate compressors was to minimize the difference (error) between the outputs of the exact and approximate compressors.

Structure 1 (DQ4:2C1): For the approximate part of the first proposed DQ4:2C structure, as shown in Fig. 4(a), the approximate output carry (i.e., carry_) is directly connected to the input x4 (carry_ = x4), and, in a similar approach, the approximate output sum (i.e., sum_) is directly connected to the input x1 (sum_ = x1). In the approximate part of this structure, the output Cout is ignored. While the approximate part of this structure is considerably fast and low power, its error rate is large (62.5%).

Structure 2 (DQ4:2C2): In the first structure, while ignoring Cout simplified the internal structure of the reduction stage of the multiplication, its error was large. In the second structure, in contrast to DQ4:2C1, the output Cout is generated by connecting it directly to the input x3 in the approximate part. Fig. 6 shows the internal structure of the approximate part and the overall structure of DQ4:2C2. While the error rate of this structure is the same as that of DQ4:2C1, namely, 62.5%, its relative error is lower.
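The quoted error rates of the first two structures can be reproduced by exhaustive enumeration. The Python sketch below models only the approximate parts as described in the text (sum_ = x1, carry_ = x4, and Cout either ignored or tied to x3). It assumes that the error rate is counted over the 16 combinations of x1 to x4 with Cin held at 0, and that an output is wrong when the value sum_ + 2·(carry_ + Cout) differs from x1 + x2 + x3 + x4; both are assumptions of this sketch, and the function names are hypothetical.

```python
from itertools import product


def exact_value(x1, x2, x3, x4):
    """Value encoded by an exact compressor (Cin assumed to be 0)."""
    return x1 + x2 + x3 + x4


def dq42c1_approx(x1, x2, x3, x4):
    """DQ4:2C1 approximate part: sum_ = x1, carry_ = x4, Cout ignored."""
    sum_, carry_, cout_ = x1, x4, 0
    return sum_ + 2 * (carry_ + cout_)


def dq42c2_approx(x1, x2, x3, x4):
    """DQ4:2C2 approximate part: as DQ4:2C1, with Cout tied to x3."""
    sum_, carry_, cout_ = x1, x4, x3
    return sum_ + 2 * (carry_ + cout_)


def error_rate(approx) -> float:
    """Fraction of the 16 input patterns whose encoded value is wrong."""
    patterns = list(product((0, 1), repeat=4))
    wrong = sum(approx(*p) != exact_value(*p) for p in patterns)
    return wrong / len(patterns)


if __name__ == "__main__":
    print(error_rate(dq42c1_approx))  # 0.625
    print(error_rate(dq42c2_approx))  # 0.625
```

Under these assumptions, both structures are wrong for 10 of the 16 patterns, which matches the 62.5% figure given above.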




Fig. 6 (a) Approximate part and (b) overall structure of DQ4:2C2.

Structure 3 (DQ4:2C3): The previous structures, in the approximate operating mode, had maximum

power and delay reductions compared with those of the exact compressor. In some applications,

however, a higher accuracy may be needed. In the third structure, the accuracy of the approximate

operating mode is improved by increasing the complexity of the approximate part whose internal

structure is shown in Fig. 7(a).

Fig. 7 (a) Approximate part and (b) overall structure of DQ4:2C3.

In this structure, the accuracy of the output sum_ is increased. Similar to DQ4:2C1, the approximate part of this structure does not support the output Cout. The error rate of this structure, however, is reduced to 50%. The overall structure of DQ4:2C3 is shown in Fig. 7(b), where the supplementary part is enclosed in a red dashed-line rectangle. Note that in this structure, the NAND gate of the approximate part (denoted by a blue dotted-line rectangle) is not used during the exact operating mode. Hence, during this operating mode, we suggest disconnecting the supply voltage of this gate by using power gating.

Structure 4 (DQ4:2C4): In this structure, we improve the accuracy of the output carry_ compared with that of DQ4:2C3, at the cost of larger delay and power consumption, and the error rate is reduced to 31.25%. The internal structure of the approximate part and the overall structure of DQ4:2C4 are shown in Fig. 8. The supplementary part is indicated by the red dashed-line rectangle, while the gates of the approximate part that are powered OFF during the exact operating mode are indicated by the blue dotted line. Note that the error rate corresponds to the occurrence of errors in the output over the complete range of the inputs.

Fig. 8 (a) Approximate part and (b) overall structure of DQ4:2C4.

4.1 Multiplier Design

Dadda multipliers realized with the proposed compressors are studied. A proper combination of the proposed compressors may be utilized to achieve a better tradeoff between the accuracy and the design parameters.

As one option, DQ4:2C1 and DQ4:2C4 may be used for the LSB and MSB parts of the multiplication, respectively. Essential design targets of a multiplier include high speed, low power consumption, and regularity of layout (and hence smaller area); a combination of these in one multiplier makes it suitable for various VLSI implementations.

The Dadda multiplier is a hardware multiplier similar in design to the Wallace multiplier. Unlike Wallace multipliers, which perform as many reductions as possible on each layer, Dadda multipliers perform as few reductions as possible. As a result, Dadda multipliers have a less expensive reduction phase, but the numbers may be a few bits longer, thus requiring slightly bigger adders. This means that fewer columns are compressed in the initial stages of the column compression tree and more columns in the later levels of the multiplier.
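The "as few reductions as possible" rule can be illustrated with the target column heights of the classic Dadda scheme, which is built from half and full adders. In that scheme the heights follow d_1 = 2 and d_{j+1} = floor(1.5 · d_j), and each stage reduces every column only as far as the next target height. The Python sketch below lists these targets for an n × n multiplier. The function name is hypothetical, and a tree built from 4:2 compressors, as in this work, would follow a different height ratio, so the sketch describes the classic scheme only.

```python
def dadda_stage_heights(n: int) -> list[int]:
    """Target column heights, stage by stage, for an n x n Dadda reduction.

    The partial product array starts with a maximum column height of n.
    Each stage reduces the columns only as far as the next target in the
    sequence 2, 3, 4, 6, 9, 13, ... (d_next = floor(1.5 * d)).
    """
    heights = []
    d = 2
    while d < n:
        heights.append(d)
        d = d * 3 // 2
    return heights[::-1]  # largest target first, ending at height 2


if __name__ == "__main__":
    print(dadda_stage_heights(8))        # [6, 4, 3, 2]
    print(len(dadda_stage_heights(32)))  # 8
```

For an 8-bit multiplier this gives four reduction stages, and for the 32-bit case eight stages, before the final two rows are added by a carry-propagate adder.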




Fig. 9 Reduction circuitry of an 8-bit Dadda multiplier.

V. Evaluated Results

In this section, the proposed 4:2 compressors in the approximate mode and the exact 4:2 compressors are described in Verilog, used in the multiplier, and simulated. The outputs are simulated for given inputs. By comparing the results with the conventional (theoretical) multiplier outputs, the error percentage of the multiplier is calculated.
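The comparison against the exact product can be sketched in software as follows. This Python example is not the Verilog flow used in the paper: it exhaustively compares an approximate multiplier function with the exact product for a small operand width and reports the error rate and the mean relative error. The `truncating_mul` function is a hypothetical stand-in that simply clears the low product bits; it is not one of the proposed compressor-based multipliers.

```python
from itertools import product


def error_metrics(approx_mul, bits: int) -> tuple[float, float]:
    """Return (error_rate, mean_relative_error) over all operand pairs.

    Pairs whose exact product is 0 contribute no relative error,
    a simplification made by this sketch.
    """
    total = 0
    wrong = 0
    rel_sum = 0.0
    for a, b in product(range(1 << bits), repeat=2):
        exact = a * b
        approx = approx_mul(a, b)
        total += 1
        if approx != exact:
            wrong += 1
        if exact:
            rel_sum += abs(approx - exact) / exact
    return wrong / total, rel_sum / total


def truncating_mul(a: int, b: int, drop: int = 2) -> int:
    """Stand-in approximate multiplier: clears the `drop` low product bits."""
    return ((a * b) >> drop) << drop


if __name__ == "__main__":
    rate, mre = error_metrics(truncating_mul, bits=4)
    print(f"error rate = {rate:.3f}, mean relative error = {mre:.4f}")
```

Exhaustive enumeration is practical only for small widths; for a 32-bit multiplier, a random sample of operand pairs would be a more realistic choice.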

Fig. 10 Simulation results of approximate multipliers




Fig. 11 RTL schematic of approximate multipliers

Fig. 12 Technology schematic of approximate multiplier




Fig. 13 Design report of approximate multipliers

Fig. 14 Timing report of approximate multiplier

VI. Conclusion

For 8-bit, 16-bit, 32-bit, and 64-bit multipliers, the delay, area, power, energy, and EDP of the Dadda multipliers using the proposed approximate compressors are improved compared with the Dadda multipliers using the exact compressor. The improvements increase with the bit length. From this, we can infer that system speed can be improved through the reduced area and delay of the approximate compressors. The design has been successfully synthesized and simulated using the Xilinx tool. The design can switch between the exact and approximate operating modes by using a control signal. The Dadda multiplier is faster than other multipliers, requires fewer gates than the Wallace multiplier, and has low power consumption. Each compressor used has its own accuracy level in the exact and approximate modes, with variable delay and lower power consumption.

In the future, the proposed work can be extended to reduce the remainder size to N/2 bits, as in the Karatsuba algorithm, with the remainder calculated based on a bit reduction technique. Such an extension could reduce the delay further and could be used in ultra-high-speed processors. The design can also be extended to Binary Coded Decimal (BCD) numbers and signed numbers. In the BCD number system, a group of binary bits is used to represent each of the 10 decimal digits; the BCD code is also referred to as the 8421 code. BCD numbers play a vital role in real-time applications: they are used to transfer decimal information into computers, and pocket calculators, electronic counters, digital voltmeters, and digital clocks are among their applications. Implementing arithmetic operations such as addition, multiplication, and division on BCD numbers using Vedic mathematics is expected to give better results than the conventional methods.

VII. References

[1] P. Kulkarni, P. Gupta, and M. Ercegovac, “Trading accuracy for power with an underdesigned

multiplier architecture,” in Proc. 24th Int. Conf. VLSI Design, Jan. 2011, pp. 346–351.

[2] D. Baran, M. Aktan, and V. G. Oklobdzija, “Multiplier structures for low power applications in

deep-CMOS,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2011, pp. 1061–1064.

[3] S. Ghosh, D. Mohapatra, G. Karakonstantis, and K. Roy, “Voltage scalable high-speed robust hybrid arithmetic units using adaptive clocking,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no. 9, pp. 1301–1309, Sep. 2010.

[4] O. Akbari, M. Kamal, A. Afzali-Kusha, and M. Pedram, “RAP-CLA: A reconfigurable

approximate carry look-ahead adder,” IEEE Trans. Circuits Syst. II, Express Briefs, doi:

10.1109/TCSII.2016.2633307.

[5] A. Sampson et al., “EnerJ: Approximate data types for safe and general low-power

computation,” in Proc. 32nd ACM SIGPLAN Conf. Program. Lang. Design Implement. (PLDI), 2011,

pp. 164–174.

[6] A. Raha, H. Jayakumar, and V. Raghunathan, “Input-based dynamic reconfiguration of

approximate arithmetic units for video encoding,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,

vol. 24, no. 3, pp. 846–857, May 2015.

[7] J. Joven et al., “QoS-driven reconfigurable parallel computing for NoC-based clustered

MPSoCs,” IEEE Trans. Ind. Informat., vol. 9, no. 3, pp. 1613–1624, Aug. 2013.

[8] R. Ye, T. Wang, F. Yuan, R. Kumar, and Q. Xu, “On reconfiguration-oriented approximate adder design and its application,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2013, pp. 48–54.

[9] M. Shafique, W. Ahmad, R. Hafiz, and J. Henkel, “A low latency generic accuracy configurable

adder,” in Proc. 52nd ACM/EDAC/IEEE Design Autom. Conf. (DAC), Jun. 2015, pp. 1–6.

[10] S. Narayanamoorthy, H. A. Moghaddam, Z. Liu, T. Park, and N. S. Kim, “Energy-efficient approximate multiplication for digital signal processing and classification applications,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 6, pp. 1180–1184, Jun. 2015.




[11] S. Hashemi, R. I. Bahar, and S. Reda, “DRUM: A dynamic range unbiased multiplier for approximate

applications,” in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Austin, TX, USA, Nov. 2015,

pp. 418–425.

[12] K. Y. Kyaw, W. L. Goh, and K. S. Yeo, “Low-power high-speed multiplier for error-tolerant

application,” in Proc. IEEE Int. Conf. Electron Devices Solid-State Circuits (EDSSC), Dec. 2010, pp. 1–4.

[13] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired imprecise computational

blocks for efficient VLSI implementation of soft-computing applications,” IEEE Trans. Circuits Syst. I, Reg.

Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.

[14] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate compressors

for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.

[15] C. H. Lin and I. C. Lin, “High accuracy approximate multiplier with error correction,” in Proc. IEEE 31st Int. Conf. Comput. Design (ICCD), Oct. 2013, pp. 33–38.



