Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms

Prof. Brian L. Evans

Embedded Signal Processing Laboratory

Dept. of Electrical and Computer Engineering

The University of Texas at Austin

July 4, 2006

Based on work by PhD student Kyungtae Han (now at Intel Research Labs)

2

Outline

• Introduction

• Background

• Optimize fixed-point wordlengths

• Reduce power consumption in arithmetic

• Automate transformations of systems

• Conclusion

3

Implementing Digital Signal Processing Algorithms

Introduction

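[Figure: design flow. Digital signal processing algorithms are first written as a floating-point program. Code conversion produces a fixed-point program with uniform wordlength, and wordlength optimization produces a fixed-point program with optimized wordlength. The candidate targets are a floating-point processor, a fixed-point processor, and a fixed-point ASIC, each rated high (H) or low (L) for price, power consumption, and hardware.]

ASIC: Application Specific Integrated Circuit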

4

Transformations to Fixed Point

• Advantages
  - Lower hardware complexity
  - Lower power consumption
  - Faster processing
• Disadvantages
  - Introduces distortion due to quantization error
  - Searching for optimum wordlengths by trial and error is time-consuming
• Research goals
  - Automate transformations to fixed point
  - Control distortion vs. complexity tradeoffs

[Figure: transformation of a floating-point program into a fixed-point program with optimized wordlength, with code conversion as the first step.]

Introduction

5

Outline

• Introduction

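• Background

• Optimize fixed-point wordlengths

• Reduce power consumption in arithmetic

• Automate transformations of systems

• Conclusion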

6

Fixed-Point Data Format

• Integer wordlength (IWL)
  - Number of bits assigned to the integer representation
  - Includes the sign bit
• Fractional wordlength (FWL)
  - Number of bits assigned to the fraction
• Wordlength: WL = IWL + FWL
• SystemC format (www.systemc.org)

[Figure: bit layout "S X X X X X" marking the sign bit, the integer wordlength, the binary point, and the fractional wordlength, which together make up the wordlength.]

π = 3.14159…(10) [Floating Point]

3.140625(10) = 011.001001(2) [WL=9; IWL=3; FWL=6]

3.141479492(10) = 011.0010010000111(2) [WL=16; IWL=3; FWL=13]
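The two π examples can be reproduced with a short sketch. It assumes truncation (rounding toward minus infinity) as the quantization mode, since that is the mode that yields the values on the slide; the function names `quantize` and `to_bits` are illustrative and do not come from the released software.

```python
import math

def quantize(value, iwl, fwl):
    """Truncate value onto a signed fixed-point grid with IWL integer bits
    (sign bit included) and FWL fractional bits, saturating on overflow."""
    scale = 1 << fwl
    code = math.floor(value * scale)           # truncation, not rounding
    lowest = -(1 << (iwl + fwl - 1))
    highest = (1 << (iwl + fwl - 1)) - 1
    code = max(lowest, min(highest, code))     # saturate to the representable range
    return code / scale

def to_bits(value, iwl, fwl):
    """Two's-complement bit string of the truncated value, with the binary point."""
    wordlength = iwl + fwl
    code = math.floor(value * (1 << fwl)) & ((1 << wordlength) - 1)
    bits = format(code, "0{}b".format(wordlength))
    return bits[:iwl] + "." + bits[iwl:]

print(quantize(math.pi, 3, 6), to_bits(math.pi, 3, 6))
print(quantize(math.pi, 3, 13), to_bits(math.pi, 3, 13))
```

With rounding to nearest instead of truncation, the 16-bit case would give 3.1416015625, so the choice of quantization mode matters even in this small example.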

Background

7

Distortion vs. Complexity Tradeoffs

• Different wordlengths give different tradeoffs between application distortion and implementation complexity
• Minimize implementation cost
• Minimize application distortion

[Figure: plane with axes implementation complexity c(w) and application distortion d(w), showing the feasible region and the optimal tradeoff curve.]

Vector of wordlengths: $\mathbf{w} = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$

c(w): Implementation cost function

Cmax: Constant for maximum implementation cost

d(w): Application distortion function

Dmax: Constant for maximum application distortion

$\underline{\mathbf{w}}$: Wordlength lower bounds

$\overline{\mathbf{w}}$: Wordlength upper bounds

8

Wordlength Optimization

Background

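• Multiple objective optimization

$$\min_{\mathbf{w} \in \mathbb{I}^n} \left[ c(\mathbf{w}),\; d(\mathbf{w}) \right] \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max}, \quad c(\mathbf{w}) \le C_{\max}, \quad \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$

• Single objective optimization

$$\min_{\mathbf{w} \in \mathbb{I}^n} \; \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w}) \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max}, \quad c(\mathbf{w}) \le C_{\max}, \quad \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$

Proposed work fixes integer wordlengths and searches for fractional wordlengths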

9

Genetic Algorithm

• Evolutionary algorithm
  - Inspired by Holland [1975]
  - Mimics processes of plant and animal evolution
  - Finds the optimum of a complex function

[Figure: genetic algorithm cycle with the stages new gene pool, function evaluation (genes with measure), selection (parental genes), mating (child genes), and mutation.] [Greg Rohling, Ph.D. Defense, Georgia Tech, 2004]

Background

10

Pareto Optimality

• Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Schick, 1970]
• Pareto optimal set is the set of nondominated solutions
  - E is dominated by C because all objectives for C are less than the corresponding objectives for E
  - Solutions A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is the boundary (tradeoff curve) that connects the solutions in the Pareto optimal set
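A nondominated set can be extracted with a direct pairwise comparison. The sketch below assumes both objectives are minimized and uses the common definition of dominance (no worse in every objective and strictly better in at least one), which is slightly weaker than the "all objectives less than" wording above. The coordinates are made up for illustration and are not read from the slide's figure.

```python
def dominates(p, q):
    """True if p dominates q when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better

def nondominated(points):
    """Names of the points that no other point dominates."""
    return sorted(
        name
        for name, p in points.items()
        if not any(dominates(q, p) for q in points.values())
    )

# Hypothetical (objective 1, objective 2) values
points = {
    "A": (1, 9), "B": (2, 6), "C": (4, 4), "D": (7, 2),
    "E": (6, 6), "F": (8, 8),
}
print(nondominated(points))
```

The pairwise check costs a number of comparisons quadratic in the population size, which is negligible next to system simulation for a population of 90.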

[Figure: Objective 2 vs. Objective 1. Nondominated points A, B, C, and D lie on the Pareto front; points E, F, G, H, and I are dominated.]

Background

11

Outline

• Introduction

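• Background

• Optimize fixed-point wordlengths

• Reduce power consumption in arithmetic

• Automate transformations of systems

• Conclusion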

12

Search for Optimum Wordlength

• Exhaustive search is impractical for many variables
• Gradient-based search (single objective)
  - Uses gradient information to determine the next candidates
  - Complexity measure (CM) [Sung & Kum, 1995]
  - Distortion measure (DM) [Han et al., 2001]
  - Complexity-and-distortion measure (CDM) [Han & Evans, 2004]
• Guided random search
  - Genetic algorithm for single objective [Leban & Tasic, 2000]
  - Multiple objective genetic algorithm [Han, Olson & Evans, 2006]

Optimize Fixed-Point Wordlengths

Next

Next

13

Complexity-and-Distortion Measure

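• Weighted combination of measures

$$f_{cd}(\mathbf{w}) = \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w}), \quad \text{where } \alpha_c + \alpha_d = 1,\; 0 \le \alpha_c \le 1,\; 0 \le \alpha_d \le 1$$

• Single objective function

$$\min_{\mathbf{w} \in \mathbb{I}^n} f_{cd}(\mathbf{w}) \quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max}, \quad c(\mathbf{w}) \le C_{\max}, \quad \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$

• Gradient-based search
  - Initialization
  - Iterative greedy search based on complexity and distortion gradient information

c(w): Complexity function

d(w): Distortion function

Dmax: Constant for maximum distortion

Cmax: Constant for maximum complexity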
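The greedy search can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm from the cited papers: it starts from a given minimum wordlength vector, evaluates the weighted objective for every one-bit increase, keeps the best neighbour, and stops once the distortion constraint holds. The complexity and distortion models (`toy_complexity`, `toy_distortion`) and their coefficients are hypothetical stand-ins for a hardware model and a system simulation, and both are assumed to be normalized to comparable scales.

```python
def cdm_search(complexity, distortion, w_min, w_max, d_max, alpha_c=0.5):
    """Greedy search: grow one wordlength per iteration until d(w) <= d_max."""
    alpha_d = 1.0 - alpha_c

    def objective(w):
        return alpha_c * complexity(w) + alpha_d * distortion(w)

    w = list(w_min)
    while distortion(w) > d_max:
        candidates = []
        for i in range(len(w)):
            if w[i] < w_max[i]:
                trial = list(w)
                trial[i] += 1
                candidates.append((objective(trial), trial))
        if not candidates:
            raise ValueError("distortion constraint cannot be met within the bounds")
        # Keep the neighbour with the smallest weighted objective f_cd.
        w = min(candidates, key=lambda item: item[0])[1]
    return w

# Hypothetical toy models (made-up coefficients).
COST = (3.0, 1.0, 2.0)
GAIN = (1.0, 4.0, 2.0)

def toy_complexity(w):
    return sum(c * bits for c, bits in zip(COST, w)) / (16 * sum(COST))

def toy_distortion(w):
    return sum(g * 2.0 ** -bits for g, bits in zip(GAIN, w))

result = cdm_search(toy_complexity, toy_distortion, [2, 2, 2], [16, 16, 16], d_max=0.1)
print(result, toy_distortion(result))
```

Setting `alpha_c` to 0 or 1 turns the same loop into a pure distortion-driven or pure complexity-driven search, which is how the weights relate the three measures.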


14

Case Study I: Filter Design

• Infinite impulse response (IIR) filter
  - Complexity measure: area model of field-programmable gate array (FPGA) [Constantinides, Cheung & Luk, 2003]
  - Distortion measure: root mean square (RMS) error
  - Seven fixed-point variables (indicated by slashes)

[Figure: IIR filter block diagram with input x[n], output y[n], a delay element, and coefficients b0, b1, and -a1.]


15

Case Study I: Gradient-Based Search

• CDM could lead to lower complexity and lower number of simulations compared to DM and CM

Search Method | Gradient Measure | Number of System Simulations | Complexity Estimate (LUT) | Distortion (RMS)*
Gradient | DM | 316 | 51.05 | 0.0981
Gradient | CDM | 145 | 49.85 | 0.0992
Gradient | CM | 417 | 51.95 | 0.0986
Complete | - | 16^7 ** | - | -

* Maximum distortion measured by root mean square (RMS) error is 0.1

** 16^7 = 268,435,456 (8.5 years at 1 second per simulation)


16

Case Study I: Genetic Algorithm

• Searches for the Pareto optimal set (nondominated solutions)
• Handles multiple objectives: error and area

[Figure: three plots of Error (RMS) vs. Area (LUTs) showing the Pareto front at the 100th generation (9,000 simulations), the 250th generation (22,500 simulations), and the 500th generation (45,000 simulations). The plot legends read: non-dom (90/90); non-dom (67/90), dom (23/90); non-dom (76/90), dom (14/90).]

* Population for one generation: 90

LUT: Lookup table


17

Case Study I: Comparison

• Gradient-based search (GS) results vs. GA results

• GS methods can get stuck in a local minimum

• GS methods reduce running time (CDM: 145 simulations)

* Maximum RMS errors required for the gradient-based searches: Dmax in {0.12, 0.1, 0.08}

[Figure: two plots of Error (RMS) vs. Area (LUTs) comparing the genetic algorithm population with the DM, CDM, and CM solutions. At the 500th generation (45,000 simulations) the legend reads non-dom (90/90); at the 50th generation (4,500 simulations) it reads non-dom (35/90), dom (55/90).]


18

Case Study II: Communication System

• Simple binary phase shift keying (BPSK) system
  - Complexity measure: area model of field-programmable gate array (FPGA) [Constantinides, Cheung & Luk, 2003]
  - Distortion measure: bit error rate (BER)
  - Four fixed-point variables (indicated by slashes)

[Figure: BPSK system block diagram with the blocks source data (1 or -1), carrier, AWGN, integration & dump, decision, and BER measurement.]

19

Case Study II: Gradient-Based Search

• CDM needs the fewest simulations (65, vs. 66 for DM and 193 for CM), although in this case its complexity estimate is not the lowest of the three

Search Method | Gradient Measure | Number of System Simulations | Complexity Estimate (LUT) | Distortion (BER)*
Gradient | DM | 66 | 40.65 | 0.083
Gradient | CDM | 65 | 43.65 | 0.085
Gradient | CM | 193 | 41.95 | 0.081
Complete | - | 65536 | - | -

* Maximum distortion measured by bit error rate (BER) is 0.1


20

Case Study II: Genetic Algorithm

• Searches for the Pareto optimal set
• Handles multiple objectives

[Figure: three plots of Error (bit error rate) showing the Pareto front at the 50th generation (4,500 simulations), the 100th generation (9,000 simulations), and the 200th generation (18,000 simulations). Preliminary results.]

Gradient-based search results, for comparison:

Measure | BER | LUT
DM | 0.083 | 40.65
CDM | 0.085 | 43.95
CM | 0.081 | 41.95

* Population for one generation: 90

LUT: Lookup table

21

Comparison of Proposed Methods

Property | Gradient-based search | Genetic algorithm
Type of Solution | One point | Family of points
Tradeoff Curve Found | No | Yes
Execution Time | Short | Long
Amount of Computation | Low | High
Parallelism | Low | High


22

Outline

• Introduction

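• Background

• Optimize fixed-point wordlengths

• Reduce power consumption in arithmetic

• Automate transformations of systems

• Conclusion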

23

Lower Power Consumption in DSP

• Minimize power dissipation because battery power and cooling capacity are limited
• Multipliers are often a major source of dynamic power consumption in typical DSP applications
  - Multi-precision multipliers select smaller multipliers (8, 16 or 24 bits) to reduce power consumption
  - Wordlength reduction can select any word size [Han, Evans & Swartzlander, 2004]
• In general, what reductions in power are possible in software when hardware has fixed wordlengths?

Reduce Power Consumption in Arithmetic

Next

24

Wordlength Reduction in Multiplication

• Input data wordlength reduction
  - Fewer bits can be enough to represent the operands, e.g. π × π ≈ 9
• Truncation
• Signed right shift
  - Moves bits toward the least significant bit (LSB)
  - Sign bit is extended for arithmetic right shift

(a) Original multiplication: 0001 0010 0011 0100 × 1101 1100 1010 1001

(b) Reduction by truncation: 0001 0010 0000 0000 × 1101 1100 0000 0000

(c) Reduction by signed right shift: 0000 0000 0001 0010 × 1111 1111 1101 1100
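The operand patterns in (a) to (c) correspond to the 16-bit words 0x1234 and 0xDCA9 reduced by 8 bits. The sketch below reproduces them; the helper names are illustrative only.

```python
MASK16 = 0xFFFF

def truncate(word, n):
    """Zero the n least significant bits of a 16-bit word."""
    return word & (MASK16 << n) & MASK16

def signed_right_shift(word, n):
    """Arithmetic right shift of a 16-bit two's-complement word by n bits."""
    signed = word - 0x10000 if word & 0x8000 else word
    return (signed >> n) & MASK16   # Python's >> extends the sign of negative ints

def nibbles(word):
    """Format a 16-bit word as four groups of four bits."""
    bits = format(word, "016b")
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))

for operand in (0x1234, 0xDCA9):
    print(nibbles(operand), "|", nibbles(truncate(operand, 8)),
          "|", nibbles(signed_right_shift(operand, 8)))
```

Truncation keeps the magnitude of the operand and clears its low bits, whereas the signed right shift scales the operand down and fills the upper bits with copies of the sign bit.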


25

Power Reduction via Wordlength Reduction

• Power consumption
  - Switching power consumption
  - Static power consumption
• Switching power consumption
  - Switching activity parameter, α
  - Reduce α by wordlength reduction

$$P_{switching} = \alpha\, C_L\, V_{dd}^2\, f_{clk}$$

C_L: Load capacitance

V_dd: Operating voltage

f_clk: Operating frequency

What is the relationship between reduced wordlength and the switching parameter α in power consumption?


26

Analytical Method

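Input | Switching expectation
Full length | L/2
Truncate N bits | M/2
N-bit signed right shift | L/2

[Figure: bit layouts for wordlength L = 16. No reduction: sign bit S followed by data bits, L bits in total. Truncation: the upper M bits are kept and the lower N bits are removed. Signed right shift: the sign bit S is extended into the upper bits.]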
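The three expectations can be checked by exhaustive enumeration. The sketch assumes that adjacent words in the data stream are independent and uniformly distributed, and it uses a small wordlength (L = 8, N = 3, so M = 5) to keep the enumeration cheap; these parameters are chosen for illustration and differ from the L = 16 case on the slide.

```python
from itertools import product

LENGTH = 8                      # wordlength L
N_BITS = 3                      # N bits removed or shifted, so M = L - N = 5
MASK = (1 << LENGTH) - 1

def truncate(word):
    """Zero the N least significant bits."""
    return word & ~((1 << N_BITS) - 1) & MASK

def signed_right_shift(word):
    """Arithmetic right shift by N bits within an L-bit two's-complement word."""
    signed = word - (1 << LENGTH) if word >> (LENGTH - 1) else word
    return (signed >> N_BITS) & MASK

def expected_switching(transform):
    """Mean number of bit flips between two independent, uniformly random words."""
    words = [transform(x) for x in range(1 << LENGTH)]
    flips = sum(bin(a ^ b).count("1") for a, b in product(words, repeat=2))
    return flips / len(words) ** 2

full = expected_switching(lambda word: word)
truncated = expected_switching(truncate)
shifted = expected_switching(signed_right_shift)
print(full, truncated, shifted)
```

The shift leaves the expectation at L/2 because the N + 1 copies of the sign bit all flip together with probability 1/2, which offsets the data bits that were shifted out.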


27

Dynamic Power Consumption for Wallace Multiplier (1 MHz)

[Figure: dynamic power consumption of a 16-bit × 16-bit Wallace multiplier, simulated on a Xilinx XC3S200-5FT256 FPGA, for truncation of the first argument and truncation of the second argument. The annotated reduction is 56%.]

Wallace multiplier used in TI 320C64 DSP


28

Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)

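[Figure: dynamic power consumption of a 16-bit × 16-bit radix-4 modified Booth multiplier, simulated on a Xilinx XC3S200-5FT256 FPGA, for truncation of the first argument and truncation of the second argument (recoded vs. non-recoded operand). Annotations: reduction (31%), sensitive (13%).]

• Swapping could have benefit
• Radix-4 modified Booth multiplier used in TI 320C62 DSP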


29

Comparison of Proposed Methods

• Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers

• Signed right shift gives no estimated power reduction in the Wallace multiplier (for any shift) and a 25% reduction in the Booth multiplier (for an 8-bit shift)

• Operand swapping reduces power consumption for the Booth multiplier but has negligible savings for the Wallace multiplier

• Power consumption in tree-based multiplier
  - Highly dependent on input data
  - Simulation matches analysis


30

Outline

• Introduction

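• Background

• Optimize fixed-point wordlengths

• Reduce power consumption in arithmetic

• Automate transformations of systems

• Conclusion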

31

Automating Transformations from Floating Point to Fixed Point

• Existing fixed-point tools
  - Support fixed-point simulation
  - Convert floating-point code to raw fixed-point code
  - Leave the optimum wordlength to be found manually by trial and error
• Automating transformations
  - Fully automate conversion and wordlength optimization

[Figure: floating-point program, code conversion, wordlength-optimized fixed-point program.]

Fixed-point tools:
• SNU gFix, Autoscaler
• CoWare SPW HDS
• Synopsys CoCentric
• MATLAB Fixed-point toolbox
• MATLAB Fixed-point blockset
• AccelChip DSP synthesis
• Catalytic RMS, MCS

Automatic Transformations of Systems

32

Automatic Transformation Flow

• Code generation
  - Parse floating-point program
  - Generate raw fixed-point program and auxiliary programs
• Range estimation
  - Estimate range to avoid overflow (analytical/simulation)
  - Determine integer wordlength (IWL)
• Wordlength optimization
  - Optimize wordlength according to the given input and error specification (analytical/simulation)
  - Determine fractional wordlength (FWL)

[Figure: flow from code generation to range estimation to wordlength optimization.]
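Simulation-based range estimation can be sketched as picking the smallest IWL whose two's-complement range covers every observed sample. This is an illustrative assumption about how the step could work, not the released tool's procedure; `integer_wordlength` is a hypothetical helper, and the IWL includes the sign bit as defined earlier in the talk.

```python
def integer_wordlength(samples):
    """Smallest IWL (sign bit included) such that every sample fits in
    the range [-2**(IWL-1), 2**(IWL-1))."""
    low, high = min(samples), max(samples)
    iwl = 1                                   # sign bit only: covers [-1, 1)
    while low < -(2 ** (iwl - 1)) or high >= 2 ** (iwl - 1):
        iwl += 1
    return iwl

print(integer_wordlength([3.14159, -0.5]))    # a signal that reaches pi
print(integer_wordlength([0.25, -0.9]))       # a purely fractional signal
```

Because the result depends on the observed minimum and maximum, a different input data set can yield a different IWL, which is why range estimation is driven by representative input data.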


33

Automating Transformation Environment for Wordlength Optimization

[Figure: transformation environment. A top program connects a search engine (gradient-based or genetic algorithm) with an evaluation program that computes the objectives. The environment also contains the fixed-point program, the floating-point program, error estimation, complexity estimation, and range estimation. Input data goes in and the optimum wordlength comes out.]

• Given a floating-point program and options, auxiliary programs are generated automatically
• Given input data, the optimum wordlength is searched


34

Demo of Released Software


35

Conclusion

• Search for optimum wordlength
  - Gradient-based search reduces execution time, but solutions could be trapped in a local optimum
  - Genetic algorithm can find the distortion vs. complexity tradeoff curve, but it requires longer execution time

• Reduce power consumption by wordlength reduction of multiplicands

• Automate transformations from floating-point programs to fixed-point programs

• Freely distributable software release available at http://www.ece.utexas.edu/~bevans/projects/wordlength/converter/

36

Future Work

• Advanced wordlength search algorithms
  - Hybrid wordlength optimization
  - Prune redundant wordlength variables (e.g. delay, adder)
  - Adaptive step size for gradient-based search methods
• Further analysis on search algorithms
  - Analysis of genetic algorithms with different settings
  - Comparison with simulated annealing
• Low power consumption
  - System level including memory [Powell and Chau, 1991]
  - Wordlength reduction for floating-point multipliers

Conclusion

37

Future Work (continued)

• Electronic design automation software
  - Enhanced code generator (e.g. rounding preferences)
  - Hybrid analytical/simulation range estimation
• Optimum DSP algorithms
  - Rearranging subsystems at block diagram level
  - Rearranging mathematical expressions in algorithm
• Developing more sophisticated hardware area models
  - Avoids having to route each design through synthesis tools
  - Transcendental functions

Conclusion

38

End

39

Backup Slides

40

Publications-I

• Conference Papers

1. K. Han, A. G. Olson, and B. L. Evans, "Automatic floating-point to fixed-point transformations," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov. 2006, Pacific Grove, CA USA, invited paper.

2. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Low-Power Multipliers with Data Wordlength Reduction," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA USA.

3. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Data Wordlength Reduction for Low-Power Signal Processing Software," Proc. IEEE Work. on Signal Processing Systems, Oct. 13-15, 2004, pp. 343-348, Austin, TX USA.

4. K. Han and B. L. Evans, "Wordlength Optimization with Complexity-And-Distortion Measure and Its Applications to Broadband Wireless Demodulator Design," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Proc., May 17-21, 2004, vol. 5, pp. 37-40, Montreal, Canada.

5. K. Han, I. Eo, K. Kim, and H. Cho, "Numerical Word-Length Optimization for CDMA Demodulator," Proc. IEEE Int. Sym. on Circuits and Systems, May 2001, vol. 4, pp. 290-293, Sydney, Australia.

6. K. Han, I. Eo, K. Kim, and H. Cho, "Bit Constraint Parameter Decision Method for CDMA Digital Demodulator," Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2, pp. 583-586, Seoul, Korea.

7. S. Nahm, K. Han, and W. Sung, "A CORDIC-based Digital Quadrature Mixer: Comparison with ROM-based Architecture," Proc. IEEE Int. Sym. on Circuits and Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA USA.

Publications

41

Publications-II

• Journal Articles

K. Han and B. L. Evans, "Optimum Wordlength Search Using A Complexity-And-Distortion Measure," EURASIP Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, vol. 2006, no. 5, pp. 103-116, 2006.

• Other Publications

1. K. Han, E. Soo, H. Jung, and K. Kim, Apparatus and Method for Short-Delay Multipath Searcher in Spread Spectrum Systems, U.S. Patent pending, Nov. 2001.

2. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, Apparatus and Method for Separating Carrier of Multicarrier Wireless Communication Receiver System, U.S. Patent pending, Sep. 2001.

3. K. Han, "Carrier Synchronization Scheme Using Input Signal Interpolation for Digital Receivers," Master's Thesis, Seoul National University, Seoul, Korea, Feb. 1998.

Publications

42

Research on Transformation

Backup

43

Simulation Flow

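[Figure: simulation flow. After the desired specification is set up, either the gradient-based search algorithm searches for a wordlength set, or the genetic search algorithm searches for wordlength sets, generates the Pareto front, and one of the sets is picked. The optimized fixed-point program is then generated.]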

Backup

44

Algorithm Design and Implementation

[Figure: algorithm design and algorithm implementation. Code conversion takes floating-point programs to uniform-wordlength fixed-point programs and then to optimized fixed-point programs. The targets are a floating-point processor, a fixed-point processor, and a fixed-point IC. High-to-low scales are shown for design time, hardware complexity, and power consumption.]

Backup

45

Wordlength Optimization Constraints

• Distortion constraint • Complexity constraint


Application-specific distortion d(w)

Dmax


Application-specific distortion d(w)

Cmax

Backup

46

Gradient-Based Search

• Gradient information can be used for the update direction
• Gradient information is measured in design parameters such as implementation complexity, precision distortion, or power consumption
  - Complexity measurement (CM) [Sung and Kum, 1995]
  - Distortion measurement (DM) [Han et al., 2001]
  - Complexity-and-distortion measurement (CDM) [Han and Evans, 2004] (proposed)

Backup

47

Gradient Information

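$$f(\mathbf{w}^{(h)}) = f(\{w_1^{(h)}, w_2^{(h)}, \ldots, w_n^{(h)}, \ldots, w_N^{(h)}\})$$

Here $w_n^{(h-1)}$ and $w_n^{(h)}$ are the values of variable $n$ at successive iterations.

[Figure: grid of objective values over wordlengths w1 and w2, with arrows marking the gradient and the search direction between points a and b.]

N: number of variables

h: iteration index

n: variable index

w: wordlength vector

f(w): objective function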

Backup

48

Gradient-Based Search Direction

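• Wordlength update (s: step size; $\boldsymbol{\delta}^{j}$: direction vector)

$$\mathbf{w}^{j+1} = \mathbf{w}^{j} + s\,\boldsymbol{\delta}^{j}$$

• Direction

$$\boldsymbol{\delta}^{j} = \begin{cases} (1, 0, 0, \ldots, 0) & \text{if } m^{j} = \partial d / \partial w_1 \\ (0, 1, 0, \ldots, 0) & \text{if } m^{j} = \partial d / \partial w_2 \\ \quad\vdots \\ (0, 0, 0, \ldots, 1) & \text{if } m^{j} = \partial d / \partial w_N \end{cases}$$

where

$$m^{j} = \max\left( \frac{\partial d}{\partial w_1}, \frac{\partial d}{\partial w_2}, \ldots, \frac{\partial d}{\partial w_N} \right)$$

and the derivatives are approximated by finite differences.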

Backup

49

Complexity and Distortion Function

• Complexity function, c(w)
  - Number of multiplications is counted
  - Hardware complexity is estimated by assuming that complexity increases linearly as wordlength increases
  - A given hardware model results in a more accurate complexity estimate
• Distortion function, d(w)
  - Difficult to derive a closed-form mathematical expression
  - Estimated by computer simulation measuring output SNR or bit error rate in digital communication systems

Backup

50

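Complexity Measure [Sung and Kum, 1995]

• Uses complexity sensitivity information as the direction in which to search for the optimum wordlength
• Advantage: minimizes complexity
• Disadvantage: demands a large number of iterations

Objective function: $f(\mathbf{w}) = c(\mathbf{w})$

Update direction: taken from the sensitivity of $c(\mathbf{w})$

Optimization problem: $\min_{\mathbf{w} \in \mathbb{I}^n} \{\, f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{\max},\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \,\}$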

Backup

51

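Distortion Measure [Han et al., 2001]

• Applies the application performance information to search for the optimum wordlengths
• Advantage: fewer iterations
• Disadvantage: not guaranteed to yield the optimum wordlength for complexity

Objective function: $f(\mathbf{w}) = d(\mathbf{w})$

Update direction: taken from the sensitivity of $d(\mathbf{w})$

Optimization problem: $\min_{\mathbf{w} \in \mathbb{I}^n} \{\, f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{\max},\; c(\mathbf{w}) \le C_{\max},\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \,\}$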

Backup

52

Feasible Solution Search [Sung and Kum, 1995]

• Exhaustive search of all possible wordlengths
• Advantages
  - Does not miss optimum points
  - Simple algorithm
• Disadvantage
  - Many trials (= experiments)
• Distance: $d = dw_1 + dw_2 + \cdots + dw_N$
• Expected number of iterations: $E_{FS}^{N}(d) = \dfrac{d(d+1)(d+2)\cdots(d+N-1)}{N!}$

[Figure: full search over wordlengths w1 and w2 from the minimum wordlengths w_b = {2,2} to the optimum wordlengths w_opt = {5,5}. The distance is d = 6 and the optimum is reached after 24 trials.]

Backup

53

Sequential Search [K. Han et al. 2001]

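• Greedy search based on sensitivity information (gradient)

$$\mathbf{w}^{j+1} = \mathbf{w}^{j} + s\,\boldsymbol{\delta}^{j} \quad (s: \text{step size},\; \boldsymbol{\delta}^{j}: \text{direction vector})$$

• Example
  - Minimum wordlengths {2,2}
  - Optimum wordlengths {5,5}
  - 12 iterations
• Advantage: fewer trials
• Disadvantage: could miss the global optimum point

[Figure: direction of sequential search over wordlengths w1 and w2 from w_b to w_opt = {5,5}.]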

Backup

54

Case Study: Receiver Design

[Figure: system block diagram. Transmitter: data, encoder, and multicarrier modulator. Wireless channel. Receiver: multicarrier demodulator, channel equalizer, and channel estimator, followed by a bit error rate tester. The wordlength variables w0 to w3 are marked in the receiver.]

w0: Input wordlength of the multicarrier demodulator, which performs a fast Fourier transform (FFT)

w1: Input wordlength of the equalizer

w2: Input wordlength of the channel estimator

w3: Output wordlength of the channel estimator

Backup

55

Simulation Results

• CDM leads to lower complexity compared to DM
• CDM reduces the number of trials compared to CM, feasible solution search [Sung and Kum, 1995], and exhaustive search
  - Fast searching

Search Method | Gradient Measure | αc | Number of Trials | Simulations | Wordlength for Variables | Complexity Estimate | Distortion (BER)*
Gradient | DM | 0 | 16 | 64 | {10,9,4,10} | 10781 | 0.0009
Gradient | CDM | 0.5 | 15 | 60 | {7,10,4,6} | 7702 | 0.0012
Gradient | CM | 1 | 69 | 69 | {7,7,4,6} | 7699 | 0.0015
Feasible | - | - | 210 | 210 | {7,7,4,6} | 7699 | 0.0015
Exhaustive | - | - | 26364 | 26364 | - | - | -

* Required BER ≤ 1.5 × 10^-3

Backup

56

Simulation Environments

• Assumptions
  - Internal wordlengths of blocks have been decided
  - Complexity increases linearly as wordlength increases
• Required application performance
  - Bit error rate of 1.5 × 10^-3 (without error correcting codes)
• Simulation tool
  - LabVIEW 7.0

Complexity vector c:

Input | Weight
FFT | 1024
Equalizer (right) | 1
Estimator | 128
Equalizer (upper) | 2

Complexity: C(w) = c^T w
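The linear complexity model can be checked against the complexity estimates reported on the Simulation Results slide. The sketch assumes that the weights map to the variables in the order w0 (FFT), w1 (equalizer, right), w2 (estimator), w3 (equalizer, upper); the slides do not state this ordering, but it is the one that reproduces the reported values.

```python
# Assumed ordering of the weights: (w0: FFT, w1: equalizer right,
# w2: estimator, w3: equalizer upper).
WEIGHTS = (1024, 1, 128, 2)

def complexity(wordlengths):
    """Linear complexity model C(w) = c^T w."""
    return sum(c * w for c, w in zip(WEIGHTS, wordlengths))

for wordlengths in [(10, 9, 4, 10), (7, 10, 4, 6), (7, 7, 4, 6)]:
    print(wordlengths, complexity(wordlengths))
```

The FFT weight dominates the total, so lowering w0 from 10 to 7 accounts for almost all of the complexity gap between the DM result and the CDM and CM results.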

Backup

57

FFT Cost

• N-tap FFT cost

$$\text{Cost}_{FFT} = \frac{N}{2} \log_2 N$$

• 256-tap FFT cost

$$\text{Cost}_{FFT} = \frac{256}{2} \log_2 256 = 1024$$

Backup

58

Minimum Wordlengths

• Change one wordlength variable while keeping the other variables at high precision
  - {1,16,16,16}, {2,16,16,16}, ...
  - {16,1,16,16}, {16,2,16,16}, ...
  - …
  - …, {16,16,16,15}, {16,16,16,16}

• Minimum wordlength vector is {5,4,4,4}
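The one-variable-at-a-time sweep can be sketched as below. The distortion model is a made-up stand-in for the receiver simulation: its gains and threshold are hypothetical and were chosen only so that the toy returns a small vector of the same shape as the one above. It says nothing about the actual receiver.

```python
def minimum_wordlengths(distortion, n_vars, d_max, high=16):
    """For each variable, find the smallest wordlength that meets d_max
    while all other variables stay at high precision."""
    minimums = []
    for i in range(n_vars):
        for bits in range(1, high + 1):
            trial = [high] * n_vars
            trial[i] = bits
            if distortion(trial) <= d_max:
                minimums.append(bits)
                break
        else:
            minimums.append(high)   # no smaller wordlength met the requirement
    return minimums

# Hypothetical distortion model: quantization noise of 2**-bits per
# variable, scaled by made-up gains.
GAINS = (1.0, 0.5, 0.5, 0.5)

def toy_distortion(w):
    return sum(g * 2.0 ** -bits for g, bits in zip(GAINS, w))

w_minimum = minimum_wordlengths(toy_distortion, n_vars=4, d_max=0.05)
print(w_minimum)
```

The sweep needs at most 16 simulations per variable, so its cost grows linearly with the number of variables, unlike the exhaustive search.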

Backup

59

Number of Trials

• Start at the minimum wordlength vector {5,4,4,4}
• Next wordlength vectors for the complexity measure (α = 1.0): {5,4,4,4}, {5,5,4,4}, …
• Increase wordlengths one by one until the required application performance is satisfied

Backup

60

Power Consumption

• Power consumption in CMOS circuits

$$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$$

$$P_{switching} = \alpha\, C\, V^2 f$$

α: Transition factor

C: Capacitance

V: Supply voltage

f: Frequency

• Significant power in CMOS circuits is dissipated when they are switching
• Power reduction in hardware [Chandrakasan and Brodersen, 1995]
  - Scaling down, minimizing area
  - Adjusting voltage and frequency during operation
• Power reduction in software [Tiwari, Malik and Wolfe, 1994] [Lee et al., 1997]
  - Instruction ordering and packing
  - Energy reductions varying from 26% to 73%

61

Wordlength for Low-Power Consumption

• Power model of wordlength [Choi and Burleson, 1994]
  - Wordlength is considered as capacitance
  - Power consumption is proportional to wordlength
  - Switching activity is not considered
• Data wordlength reduction technique [Han, Evans, and Swartzlander, 2004] (proposed)
  - Count node transitions for switching activity
  - Reduce input data wordlength to decrease power consumption

Low-Power Signal Processing

62

Dynamic and Static Power

Backup

Trends in dynamic and static power dissipation showing increasing contribution of static power

[S. Thompson, P. Packan, and M. Bohr. MOS Scaling: Transistor Challenges for the 21st Century. Intel Technology Journal, Q3 1998]

63

Power Dissipation of Multiplier Unit

• Multiply unit is usually a major source of power consumption in typical DSP applications
  - Multiply unit is required for digital communication and digital signal processing algorithms
  - Digital filters, equalizers, FFT/IFFT, digital down/upconverter, etc.

[Figure: TMS320C5x power dissipation characteristics, from www.ti.com.]

Backup

64

Wallace vs. Booth Multipliers

Tree dot diagram in 4-bit Wallace multiplier

Radix-4 multiplier based on Booth’s recoding (Χ ● a = P)

Asymmetric (one operand

recoded)

Symmetric

Backup

65

Radix-4 Modified Booth Multiplier

• One multiplicand is recoded

• The a and x are multiplicands

• P is product of multiplication

• Three bits in X are recoded to z

Backup

66

Switching Activity in Multipliers

• Logic delay and propagation cause glitches
• Proposed analytical method
  - Hard to estimate glitches in closed form
  - Analyzes switching activity with respect to input data wordlength
  - Does not consider multiplier architecture
• Simulation method
  - Counts all switching activities (transition counts in logic)
  - Power estimation (Xilinx XPower)
  - Considers multiplier architecture

Backup

67

Analytical Method

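• Stream of data for one multiplicand
• Compare two adjacent numbers in the stream after reduction
• Expectation of bit switching, x, with probability $P_x$

$$E[X] = \sum_{x=0}^{L} x\, P_x$$

  - L-bit input data: $E[X_L] = \dfrac{L}{2}$
  - Truncate input data to M bits (remove N bits): $E[X_{tr}] = \dfrac{L - N}{2} = \dfrac{M}{2}$
  - N-bit signed right shift in L-bit input (Y is the sign bit): $E[X_{rs}] = \dfrac{1}{2} E[X \mid Y = 0] + \dfrac{1}{2} E[X \mid Y = 1] = \dfrac{L}{2}$

[Figure: bit layouts showing the L-bit word with sign bit S, the truncated word with M kept bits and N removed bits, and the right-shifted word with the sign bit S extended into the upper bits.]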


68

Analytical Method

$$E[X_{rs}] = \frac{1}{2} E[X \mid Y = 0] + \frac{1}{2} E[X \mid Y = 1]$$

• X has a binomial distribution
• The expectation is always L/2 (independent of M and N)

Backup

69

Power Reduction in TI DSP

• TI TMS320VC5416 DSP Starter Kit
  - Radix-4 modified Booth multiplier
  - Measure average current for wordlength reduction of multiplicands

Assembly program (data_a and data_b have random data with wordlength w):

    loop:
        STM data_a, AR2;
        STM data_b, AR3;
        MPY *AR2+, *AR3+, a
        MPY *AR2+, *AR3+, a
        …
        MPY *AR2+, *AR3+, a
        B loop

[Figure: measured current in mA (about 574 to 581) vs. wordlength w (0 to 16) for the operand configurations (w,w), (16,w), (w,16), and (w_rsh,w_rsh).]

Backup

70

Code Generation for Fixed-Point Program

• Adder function in MATLAB

(a) Floating-point program for adder

    function [c] = adder(a, b)
    c = 0;
    c = a + b;

(b) Raw fixed-point program (S, WL, and FWL are determined by designers with trial and error)

    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;

(c) Converted fixed-point program for automating optimization

    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;

fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the fixed-point toolbox [S: signed, WL: wordlength, FWL: fraction length]

Backup

71

Code Generation

<Run Code Generation>

<Floating-point Program>

Backup

72

Running Transformation

• Just call top function with input data

• Range and optimum wordlengths depend on input statistic

> in = rand(1,1000)
> mac_top(in)

Backup

73

Advantages/disadvantages of wordlength search algorithms

Backup
