Automating Transformations from Floating Point to Fixed Point for Implementing
Digital Signal Processing Algorithms
Prof. Brian L. Evans
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
July 4, 2006
Based on work by PhD student Kyungtae Han (now at Intel Research Labs)
Outline
• Introduction
• Background
• Optimize fixed-point wordlengths
• Reduce power consumption in arithmetic
• Automate transformations of systems
• Conclusion
Implementing Digital Signal Processing Algorithms
[Figure: design flow for digital signal processing algorithms. Code conversion turns a floating-point program into a fixed-point program with uniform wordlength, and wordlength optimization then yields a fixed-point program with optimized wordlengths. Implementation targets are a floating-point processor, a fixed-point processor, and a fixed-point ASIC, each rated from high (H) to low (L) in price, power consumption, and hardware.]
ASIC: Application Specific Integrated Circuit
Transformations to Fixed Point
• Advantages
  - Lower hardware complexity
  - Lower power consumption
  - Faster speed in processing
• Disadvantages
  - Introduces distortion due to quantization error
  - Search for optimum wordlengths by trial & error is time-consuming
• Research goals
  - Automate transformations to fixed point
  - Control distortion vs. complexity tradeoffs
[Figure: the transformation consists of code conversion followed by wordlength optimization, taking a floating-point program to a fixed-point program with optimized wordlength]
Fixed-Point Data Format
• Integer wordlength (IWL)
  - Number of bits assigned to integer representation
  - Includes sign bit
• Fractional wordlength (FWL)
  - Number of bits assigned to fraction
• Wordlength: WL = IWL + FWL
[Figure: SystemC format (www.systemc.org). Bit layout S X X X X X, where S is the sign bit and the binary point separates the integer wordlength from the fractional wordlength]
π = 3.14159…(10) [Floating Point]
3.140625(10) = 011.001001(2) [WL=9; IWL=3; FWL=6]
3.141479492(10) = 011.0010010000111(2) [WL=16; IWL=3; FWL=13]
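The two fixed-point values of π above can be reproduced with a few lines of code. This is a minimal sketch, assuming truncation toward minus infinity and saturation on overflow; the slides do not state the rounding or overflow mode, and the helper names `quantize` and `to_binary` are hypothetical.

```python
import math


def quantize(x, iwl, fwl):
    """Truncate x to signed fixed point with iwl integer bits (sign included)
    and fwl fractional bits. Truncation and saturation are assumptions."""
    scale = 1 << fwl
    lo = -(1 << (iwl - 1 + fwl))      # most negative integer code
    hi = (1 << (iwl - 1 + fwl)) - 1   # most positive integer code
    code = math.floor(x * scale)      # drop bits below the LSB
    code = max(lo, min(hi, code))     # saturate instead of wrapping
    return code / scale


def to_binary(x, iwl, fwl):
    """Two's-complement bit string of an already quantized value."""
    wl = iwl + fwl
    code = int(round(x * (1 << fwl))) & ((1 << wl) - 1)
    bits = format(code, "0{}b".format(wl))
    return bits[:iwl] + "." + bits[iwl:]


if __name__ == "__main__":
    for iwl, fwl in ((3, 6), (3, 13)):
        q = quantize(math.pi, iwl, fwl)
        print(iwl + fwl, q, to_binary(q, iwl, fwl))
```

With IWL fixed at 3, each extra fractional bit halves the worst-case truncation error, which is the distortion side of the tradeoff discussed next.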
Distortion vs. Complexity Tradeoffs
• Different wordlengths have different application distortion and implementation complexity tradeoffs
• Minimize implementation cost
• Minimize application distortion
[Figure: application distortion d(w) versus implementation complexity c(w), showing the feasible region and the optimal tradeoff curve]
Vector of wordlengths: $\mathbf{w} = \{w_0, w_1, w_2, \ldots, w_{N-1}\}$
$\underline{\mathbf{w}}$ Wordlength lower bounds
$\overline{\mathbf{w}}$ Wordlength upper bounds
c(w) Implementation cost function
Cmax Constant for maximum implementation cost
d(w) Application distortion function
Dmax Constant for maximum application distortion
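Wordlength Optimization
• Multiple objective optimization
$$\min_{\mathbf{w} \in \mathbb{I}^n} \; [c(\mathbf{w}), d(\mathbf{w})] \quad \text{subject to} \quad d(\mathbf{w}) \le D_{max}, \;\; c(\mathbf{w}) \le C_{max}, \;\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$
• Single objective optimization
$$\min_{\mathbf{w} \in \mathbb{I}^n} \; \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w}) \quad \text{subject to} \quad d(\mathbf{w}) \le D_{max}, \;\; c(\mathbf{w}) \le C_{max}, \;\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$
Proposed work fixes integer wordlengths and searches for fractional wordlengths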
Genetic Algorithm
• Evolutionary algorithm
  - Inspired by Holland 1975
  - Mimic processes of plant and animal evolution
  - Find optimum of a complex function
[Figure: genetic algorithm cycle with the stages function evaluation, selection, mating, and mutation, operating on a new gene pool, genes w/ measure, parental genes, and child genes] [Greg Rohling, Ph.D Defense, Georgia Tech, 2004]
Pareto Optimality
• Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Schick, 1970]
• Pareto optimal set is set of nondominated solutions
  - E is dominated by C as all objectives for C are less than corresponding objectives for E
  - Solutions A, B, C, D are nondominated (not dominated by any solution)
• Pareto front is boundary (tradeoff curve) that connects Pareto optimal set solutions
[Figure: Objective 2 versus Objective 1 for solutions A through I, with nondominated and dominated points marked differently and the Pareto front connecting the nondominated solutions]
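The nondominated set described above can be extracted from a list of candidate solutions with a simple pairwise test. This is a minimal sketch for minimization problems; it uses the common convention that a solution dominates another if it is no worse in every objective and strictly better in at least one, which is slightly weaker than the "all objectives are less" wording on the slide. The function names and the sample points are hypothetical.

```python
def dominates(p, q):
    """True if p dominates q when every objective is minimized."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def pareto_set(points):
    """Return the nondominated points, keeping their input order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]


if __name__ == "__main__":
    # Hypothetical (objective 1, objective 2) pairs, e.g. (area, error).
    candidates = [(1, 9), (2, 6), (4, 4), (7, 2), (5, 7), (8, 8)]
    print(pareto_set(candidates))
```

The pairwise test costs O(n^2) comparisons per call, which is negligible next to the system simulations needed to evaluate each candidate.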
Search for Optimum Wordlength
• Exhaustive search impractical for many variables
• Gradient-based search (single objective)
  - Utilizes gradient information to determine next candidates
  - Complexity measure (CM) [Sung & Kum, 1995]
  - Distortion measure (DM) [Han et al., 2001]
  - Complexity-and-distortion measure (CDM) [Han & Evans, 2004]
• Guided random search
  - Genetic algorithm for single objective [Leban & Tasic, 2000]
  - Multiple objective genetic algorithm [Han, Olson & Evans, 2006]
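Complexity-and-Distortion Measure
• Weighted combination of measures
$$f_{cd}(\mathbf{w}) = \alpha_c\, c(\mathbf{w}) + \alpha_d\, d(\mathbf{w})$$
where $\alpha_c + \alpha_d = 1$, $0 \le \alpha_c \le 1$, $0 \le \alpha_d \le 1$
• Single objective function
$$\min_{\mathbf{w} \in \mathbb{I}^n} f_{cd}(\mathbf{w}) \quad \text{subject to} \quad d(\mathbf{w}) \le D_{max}, \;\; c(\mathbf{w}) \le C_{max}, \;\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}}$$
• Gradient-based search
  - Initialization
  - Iterative greedy search based on complexity and distortion gradient information
c(w) Complexity function
d(w) Distortion function
Dmax Constant for maximum distortion
Cmax Constant for maximum complexity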
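The weighted combination can be written as a one-line function. This is a minimal sketch, assuming the complexity and distortion values have already been computed (and, in practice, scaled to comparable ranges, which the slide does not specify); the names `cdm` and `feasible` are hypothetical.

```python
def cdm(c_value, d_value, alpha_c):
    """Complexity-and-distortion measure: alpha_c * c + alpha_d * d,
    with alpha_d = 1 - alpha_c and 0 <= alpha_c <= 1."""
    if not 0.0 <= alpha_c <= 1.0:
        raise ValueError("alpha_c must lie in [0, 1]")
    alpha_d = 1.0 - alpha_c
    return alpha_c * c_value + alpha_d * d_value


def feasible(c_value, d_value, c_max, d_max):
    """Check the two constraints c(w) <= Cmax and d(w) <= Dmax."""
    return c_value <= c_max and d_value <= d_max


if __name__ == "__main__":
    # alpha_c = 0 reduces to the distortion measure, alpha_c = 1 to the
    # complexity measure, and values in between blend the two.
    for alpha_c in (0.0, 0.5, 1.0):
        print(alpha_c, cdm(10.0, 2.0, alpha_c))
```

Setting the weight to 0, 0.5, or 1 corresponds to the DM, CDM, and CM rows in the case-study tables.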
Case Study I: Filter Design
• Infinite impulse response (IIR) filter
  - Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung & Luk 2003]
  - Distortion measure: Root mean square (RMS) error
  - Seven fixed-point variables (indicated by slashes)
[Figure: IIR filter block diagram with input x[n], output y[n], a delay element, and coefficients b0, b1, and -a1]
Case Study I: Gradient-Based Search
• CDM could lead to lower complexity and lower number of simulations compared to DM and CM

Search Method   Gradient Measure   Number of System Simulations   Complexity Estimate (LUT)   Distortion (RMS)*
Gradient        DM                 316                            51.05                       0.0981
Gradient        CDM                145                            49.85                       0.0992
Gradient        CM                 417                            51.95                       0.0986
Complete        -                  16^7 **                        -                           -

* Maximum distortion measured by root mean square (RMS) error is 0.1
** 16^7 = 268,435,456 (8.5 years, if 1 second per 1 simulation)
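The cost of the complete search in the table follows from simple counting. The sketch below assumes 16 candidate wordlengths per variable, which is inferred from the 16^7 figure for seven variables rather than stated on the slide; the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def exhaustive_cost(num_variables, candidates_per_variable=16,
                    seconds_per_simulation=1.0):
    """Number of simulations and run time in years for a complete search."""
    simulations = candidates_per_variable ** num_variables
    years = simulations * seconds_per_simulation / SECONDS_PER_YEAR
    return simulations, years


if __name__ == "__main__":
    sims, years = exhaustive_cost(7)   # seven fixed-point variables
    print(sims, round(years, 1))
```

Each additional variable multiplies the count by 16, which is why the gradient-based searches, at a few hundred simulations, are attractive.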
Case Study I: Genetic Algorithm
• Search Pareto optimal set (nondominated)
• Handles multiple objectives: Error and Area
[Figure: three panels of Error (RMS) versus Area (LUTs), each marking nondominated and dominated solutions and the Pareto front: 100th generation (9,000 simulations), 250th generation (22,500 simulations), and 500th generation (45,000 simulations). Panel legends report 90/90 nondominated; 67/90 nondominated and 23/90 dominated; 76/90 nondominated and 14/90 dominated.]
* Population for one generation: 90
LUT: Lookup table
Case Study I: Comparison
• Gradient-based search (GS) results vs. GA results
• GS methods can get stuck in a local minimum
• GS methods reduce running time (CDM: 145 simulations)
[Figure: two panels of Error (RMS) versus Area (LUTs) overlaying the DM, CDM, and CM solutions on the GA results: 500th generation (45,000 simulations) with 90/90 nondominated, and 50th generation (4,500 simulations) with 35/90 nondominated and 55/90 dominated]
* Required maximum RMS errors for gradient-based search are Dmax = {0.12, 0.1, 0.08}
Case Study II: Communication System
• Simple binary phase shift keying (BPSK) system
  - Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003]
  - Distortion measure: Bit error rate (BER)
  - Four fixed-point variables (indicated by slashes)
[Figure: BPSK system block diagram with source data (1 or -1), carrier, additive white Gaussian noise (AWGN), integration & dump, decision, and BER measurement]
Case Study II: Gradient-Based Search
• CDM could lead to lower complexity and lower number of simulations compared to DM and CM

Search Method   Gradient Measure   Number of System Simulations   Complexity Estimate (LUT)   Distortion (BER)*
Gradient        DM                 66                             40.65                       0.083
Gradient        CDM                65                             43.65                       0.085
Gradient        CM                 193                            41.95                       0.081
Complete        -                  65536                          -                           -

* Maximum distortion measured by bit error rate (BER) is 0.1
Case Study II: Genetic Algorithm
• Search Pareto optimal set
• Handles multiple objectives
[Figure: three panels plotting Error (Bit Error Rate) with the Pareto front marked: 50th generation (4,500 simulations), 100th generation (9,000 simulations), and 200th generation (18,000 simulations)]
* Population for one generation: 90
LUT: Lookup table
Gradient-based search results, for comparison:
       BER     LUT
DM     0.083   40.65
CDM    0.085   43.95
CM     0.081   41.95
Preliminary results
Comparison of Proposed Methods

                        Gradient-based search   Genetic algorithm
Type of Solution        One point               Family of points
Tradeoff Curve Found    No                      Yes
Execution Time          Short                   Long
Amount of Computation   Low                     High
Parallelism             Low                     High
Lower Power Consumption in DSP
• Minimize power dissipation due to limited battery power and cooling system
• Multipliers are often a major source of dynamic power consumption in typical DSP applications
  - Multi-precision multipliers select smaller multipliers (8, 16 or 24 bits) to reduce power consumption
  - Wordlength reduction to select any word size [Han, Evans & Swartzlander 2004]
• In general, what reductions in power are possible in software when hardware has fixed wordlengths?
Wordlength Reduction in Multiplication
• Input data wordlength reduction
  - Fewer bits are often enough to represent the data, e.g. π x π ≈ 9
• Truncation
• Signed right shift
  - Move toward the least significant bit (LSB)
  - Sign bit extended for arithmetic right shift
(a) Original multiplication: 0001 0010 0011 0100 x 1101 1100 1010 1001
(b) Reduction by truncation: 0001 0010 0000 0000 x 1101 1100 0000 0000
(c) Reduction by signed right shift: 0000 0000 0001 0010 x 1111 1111 1101 1100 (the leading bits repeat the sign bit)
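The two reductions in examples (b) and (c) can be checked on the 16-bit operands shown in (a). This is a minimal sketch that models a 16-bit two's-complement register with Python integers; the function names are hypothetical, and the 8-bit reduction matches the slide's examples.

```python
def truncate(x, n, width=16):
    """Zero the n least significant bits of a width-bit word."""
    mask = (1 << width) - 1
    return x & mask & ~((1 << n) - 1)


def signed_right_shift(x, n, width=16):
    """Arithmetic right shift of a width-bit two's-complement word.
    The result is returned as an unsigned bit pattern."""
    mask = (1 << width) - 1
    signed = x - (1 << width) if x & (1 << (width - 1)) else x
    return (signed >> n) & mask   # Python's >> extends the sign


if __name__ == "__main__":
    a, b = 0b0001001000110100, 0b1101110010101001  # operands from (a)
    for name, f in (("truncate", truncate), ("shift", signed_right_shift)):
        print(name, format(f(a, 8), "016b"), format(f(b, 8), "016b"))
```

Truncation keeps the operand's scale and clears low bits, while the signed right shift rescales the operand and fills the upper bits with copies of the sign.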
Power Reduction via Wordlength Reduction
• Power consumption
  - Switching power consumption
  - Static power consumption
• Switching power consumption
  - Switching activity parameter, α
  - Reduce α by wordlength reduction
$$P_{switching} = \alpha\, C_L\, V_{dd}^2\, f_{clk}$$
CL Load capacitance
Vdd Operating voltage
fclk Operating frequency
• Relationship between reduced wordlength and switching parameter α in power consumption?
Analytical Method
Wordlength (L) = 16

Input                      Switching expectation
Full length                L/2
Truncate N bits            M/2
N-bit signed right shift   L/2

[Figure: bit layouts of the L-bit input with sign bit S. No reduction: all L bits used. Reduction by truncation: M bits kept and N bits removed. Reduction by signed right shift: sign bit S repeated in the upper bits]
Dynamic Power Consumption for Wallace Multiplier (1 MHz)
[Figure: dynamic power consumption of a 16-bit x 16-bit Wallace multiplier (simulated on Xilinx XC3S200-5FT256 FPGA) when truncating the 1st argument and the 2nd argument (recode, nonrecode); annotation: reduction (56%)]
• Wallace multiplier used in TI 320C64 DSP
Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz)
[Figure: dynamic power consumption of a 16-bit x 16-bit radix-4 modified Booth multiplier (simulated on Xilinx XC3S200-5FT256 FPGA) when truncating the 1st argument and the 2nd argument (recode, nonrecode); annotations: reduction (31%), sensitive (13%)]
• Swapping could have benefit
• Radix-4 modified Booth multiplier used in TI 320C62 DSP
Comparison of Proposed Methods
• Truncation to 8 bits reduces est. power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers
• Signed right shift has no est. power reduction in Wallace multiplier (for any shift) and 25% reduction in Booth (for 8-bit shift) multiplier
• Operand swapping reduces power consumption for Booth but has negligible savings for Wallace multiplier
• Power consumption in tree-based multiplier
  - Highly dependent on input data
  - Simulation matches analysis
Automating Transformations from Floating Point to Fixed Point
• Existing fixed-point tools
  - Support fixed-point simulation
  - Convert floating-point code to raw fixed-point code
  - Manually find optimum wordlength by trial and error
• Automating transformations
  - Fully automate conversion and wordlength optimization
[Figure: floating-point program, code conversion, wordlength optimization, wordlength-optimized fixed-point program]
Fixed-point tools
• SNU gFix, Autoscaler
• CoWare SPW HDS
• Synopsys CoCentric
• MATLAB Fixed-point toolbox
• MATLAB Fixed-point blockset
• AccelChip DSP synthesis
• Catalytic RMS, MCS
Automatic Transformation Flow
• Code generation
  - Parse floating-point program
  - Generate raw fixed-point program and auxiliary programs
• Range estimation
  - Estimate range to avoid overflow (Analytical/Simulation)
  - Determine integer wordlength (IWL)
• Wordlength optimization
  - Optimize wordlength according to given input and error specification (Analytical/Simulation)
  - Determine fractional wordlength (FWL)
[Figure: flow through code generation, range estimation, and wordlength optimization]
Automating Transformation Environment for Wordlength Optimization
• Given floating-point program and options, auxiliary programs are automatically generated
• Given input data, optimum wordlength is searched
[Figure: transformation environment. Blocks: top program, search engine (gradient-based or genetic algorithm), evaluation program (objectives), fixed-point program, floating-point program, error estimation, complexity estimation, range estimation. Input: input data. Output: optimum wordlength]
Conclusion
• Search for optimum wordlength
  - Gradient-based search reduces execution time, but solutions could be trapped in a local optimum
  - Genetic algorithm can find distortion vs. complexity tradeoff curve, but it requires longer execution time
• Reduce power consumption by wordlength reduction of multiplicands
• Automate transformations from floating-point programs to fixed-point programs
• Freely distributable software release available at http://www.ece.utexas.edu/~bevans/projects/wordlength/converter/
Future Work
• Advanced wordlength search algorithms
  - Hybrid wordlength optimization
  - Prune redundant wordlength variables (e.g. delay, adder)
  - Adaptive step size for gradient-based search methods
• Further analysis on search algorithms
  - Analysis of genetic algorithms with different settings
  - Comparison with simulated annealing
• Low power consumption
  - System level including memory [Powell and Chau, 1991]
  - Wordlength reduction for floating-point multipliers
Future Work (continued)
• Electronic design automation software
  - Enhanced code generator (e.g. rounding preferences)
  - Hybrid analytical/simulation range estimation
• Optimum DSP algorithms
  - Rearranging subsystems at block diagram
  - Rearranging mathematical expressions in algorithm
• Developing more sophisticated hardware area models
  - Avoids having to route each design through synthesis tools
  - Transcendental functions
Publications-I
• Conference Papers
  1. K. Han, A. G. Olson, and B. L. Evans, "Automatic Floating-Point to Fixed-Point Transformations," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Nov. 2006, Pacific Grove, CA USA, invited paper.
  2. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Low-Power Multipliers with Data Wordlength Reduction," Proc. IEEE Asilomar Conf. on Signals, Systems, and Computers, Oct. 30-Nov. 2, 2005, pp. 1615-1619, Pacific Grove, CA USA.
  3. K. Han, B. L. Evans, and E. E. Swartzlander, Jr., "Data Wordlength Reduction for Low-Power Signal Processing Software," Proc. IEEE Work. on Signal Processing Systems, Oct. 13-15, 2004, pp. 343-348, Austin, TX USA.
  4. K. Han and B. L. Evans, "Wordlength Optimization with Complexity-And-Distortion Measure and Its Applications to Broadband Wireless Demodulator Design," Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Proc., May 17-21, 2004, vol. 5, pp. 37-40, Montreal, Canada.
  5. K. Han, I. Eo, K. Kim, and H. Cho, "Numerical Word-Length Optimization for CDMA Demodulator," Proc. IEEE Int. Sym. on Circuits and Systems, May 2001, vol. 4, pp. 290-293, Sydney, Australia.
  6. K. Han, I. Eo, K. Kim, and H. Cho, "Bit Constraint Parameter Decision Method for CDMA Digital Demodulator," Proc. CDMA Int. Conf. & Exhibition, Nov. 2000, vol. 2, pp. 583-586, Seoul, Korea.
  7. S. Nahm, K. Han, and W. Sung, "A CORDIC-based Digital Quadrature Mixer: Comparison with ROM-based Architecture," Proc. IEEE Int. Sym. on Circuits and Systems, Jun. 1998, vol. 4, pp. 385-388, Monterey, CA USA.
Publications-II
• Journal Articles
  1. K. Han and B. L. Evans, "Optimum Wordlength Search Using A Complexity-And-Distortion Measure," EURASIP Journal on Applied Signal Processing, special issue on Design Methods for DSP Systems, vol. 2006, no. 5, pp. 103-116, 2006.
• Other Publications
  1. K. Han, E. Soo, H. Jung, and K. Kim, Apparatus and Method for Short-Delay Multipath Searcher in Spread Spectrum Systems, U.S. Patent pending, Nov. 2001.
  2. K. Han, I. Lim, E. Soo, H. Seo, K. Kim, H. Jung, and H. Cho, Apparatus and Method for Separating Carrier of Multicarrier Wireless Communication Receiver System, U.S. Patent pending, Sep. 2001.
  3. K. Han, "Carrier Synchronization Scheme Using Input Signal Interpolation for Digital Receivers," Master's Thesis, Seoul National University, Seoul, Korea, Feb. 1998.
Simulation Flow
[Figure: simulation flow. Set up desired specification, then follow one of two branches. Gradient-based search algorithm: search wordlength set. Genetic search algorithm: search wordlength sets, generate Pareto front, pick one of sets. Both branches end with generating the optimized fixed-point program]
Algorithm Design and Implementation
[Figure: from algorithm design to algorithm implementation. Code conversion turns floating-point programs into uniform wordlength fixed-point programs, and wordlength optimization turns those into optimized fixed-point programs. Targets are a floating-point processor, a fixed-point processor, and a fixed-point IC. Scales labeled high to low: design time, hardware complexity, power consumption]
Wordlength Optimization Constraints
• Distortion constraint
• Complexity constraint
[Figure: two plots of application-specific distortion d(w) versus implementation complexity c(w), one marking the distortion constraint Dmax and the other marking the complexity constraint Cmax]
Gradient-Based Search
• Gradient information can be used for update direction
• Gradient information is measured in design parameters such as implementation complexity, precision distortion, or power consumption
  - Complexity measurement (CM) [Sung and Kum, 1995]
  - Distortion measurement (DM) [Han et al., 2001]
  - Complexity-and-distortion measurement (CDM) [Han and Evans, 2004] (proposed)
Gradient Information
$$\frac{\partial f}{\partial w_n} \approx \frac{f(\{w_1^{(h)}, w_2^{(h)}, \ldots, w_n^{(h+1)}, \ldots, w_N^{(h)}\}) - f(\mathbf{w}^{(h)})}{w_n^{(h+1)} - w_n^{(h)}}$$
[Figure: objective values on a grid of wordlengths w1 and w2, with the gradient and the search direction marked between points a and b]
N number of variables
h iteration index
n variable index
w wordlength vector
f(w) objective function
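Gradient-Based Search Direction
• Wordlength update (s: step size)
$$\mathbf{w}_{j+1} = \mathbf{w}_{j} + s\, \boldsymbol{\delta}_{j}$$
• Direction
$$\boldsymbol{\delta}_{j} = \begin{cases} (1, 0, 0, \ldots, 0) & \text{if } m_j = \partial d / \partial w_1 \\ (0, 1, 0, \ldots, 0) & \text{if } m_j = \partial d / \partial w_2 \\ \quad \vdots & \\ (0, 0, 0, \ldots, 1) & \text{if } m_j = \partial d / \partial w_N \end{cases}$$
where
$$m_j = \max\left( \frac{\partial d}{\partial w_1}, \frac{\partial d}{\partial w_2}, \ldots, \frac{\partial d}{\partial w_N} \right)$$
and each derivative is estimated by a finite difference.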
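The update rule above amounts to a greedy loop: at each step, try increasing every wordlength by the step size, and keep the single change with the largest finite-difference improvement. The sketch below is an illustration under stated assumptions, not the released tool: `greedy_search`, its parameters, and the toy distortion model (a sum of 2^-w terms) are all hypothetical, and the stopping rule (distortion at or below Dmax) is assumed.

```python
def greedy_search(measure, distortion, w_min, w_max, d_max, step=1):
    """Start from the minimum wordlengths and repeatedly increase the one
    wordlength with the largest finite-difference decrease of `measure`
    until distortion(w) <= d_max. Returns the final wordlength vector."""
    w = list(w_min)
    while distortion(w) > d_max:
        base = measure(w)
        best_n, best_gain = None, None
        for n in range(len(w)):
            if w[n] + step > w_max[n]:
                continue                       # respect the upper bound
            trial = list(w)
            trial[n] += step
            gain = (base - measure(trial)) / step   # finite difference
            if best_gain is None or gain > best_gain:
                best_n, best_gain = n, gain
        if best_n is None:
            raise ValueError("no feasible wordlength within the bounds")
        w[best_n] += step
    return w


def toy_distortion(w):
    """Hypothetical distortion model: each variable adds 2**-w_n."""
    return sum(2.0 ** -wn for wn in w)


if __name__ == "__main__":
    print(greedy_search(toy_distortion, toy_distortion,
                        w_min=[2, 2], w_max=[16, 16], d_max=0.1))
```

Passing the distortion as `measure` mimics the DM search; passing a weighted combination of complexity and distortion would mimic the CDM search, provided the two terms are scaled comparably.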
Complexity and Distortion Function
• Complexity function, c(w)
  - Number of multiplications is counted
  - Hardware complexity is estimated by assuming that complexity linearly increases as wordlength increases
  - Given hardware model results in accurate complexity
• Distortion function, d(w)
  - Difficult to derive closed-form mathematical expression
  - Estimated by computer simulation measuring output SNR or bit error rate in digital communication systems
Complexity Measure [Sung and Kum, 1995]
• Uses complexity sensitivity information as direction to search for optimum wordlength
• Advantage: minimizes complexity
• Disadvantage: demands large number of iterations
Objective function: $f(\mathbf{w}) = c(\mathbf{w})$
Update direction: $\nabla c(\mathbf{w})$
Optimization problem: $\min_{\mathbf{w} \in \mathbb{I}^n} \{ f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{max},\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \}$
Distortion Measure [Han et al., 2001]
• Applies the application performance information to search for the optimum wordlengths
• Advantage: Fewer iterations
• Disadvantage: Not guaranteed to yield optimum wordlength for complexity
Objective function: $f(\mathbf{w}) = d(\mathbf{w})$
Update direction: $\nabla d(\mathbf{w})$
Optimization problem: $\min_{\mathbf{w} \in \mathbb{I}^n} \{ f(\mathbf{w}) \mid d(\mathbf{w}) \le D_{max},\; c(\mathbf{w}) \le C_{max},\; \underline{\mathbf{w}} \le \mathbf{w} \le \overline{\mathbf{w}} \}$
Feasible Solution Search [Sung and Kum, 1995]
• Exhaustive search of all possible wordlengths
• Advantages
  - Does not miss optimum points
  - Simple algorithm
• Disadvantage
  - Many trials (= experiments)
• Distance: $d = dw_1 + dw_2 + \cdots + dw_N$
• Expected number of iterations: $E_{FS}^{N}(d) = \dfrac{d(d+1)(d+2)\cdots(d+N-1)}{N!}$
[Figure: direction of full search in the (w1, w2) plane from the minimum wordlengths wb = {2,2} to the optimum wordlengths wopt = {5,5}, with distance components dw1 and dw2; d = 6, trials = 24]
Sequential Search [K. Han et al. 2001]
• Greedy search based on sensitivity information (gradient)
$$\mathbf{w}_{j+1} = \mathbf{w}_{j} + s\, \boldsymbol{\delta}_{j}$$
• Example
  - Minimum wordlengths {2,2}
  - Optimum wordlengths {5,5}
  - 12 iterations
• Advantage: Fewer trials
• Disadvantage: Could miss global optimum point
[Figure: direction of sequential search in the (w1, w2) plane from wb to wopt = {5,5}, with distance components dw1 and dw2]
Case Study: Receiver Design
[Figure: transmitter (data, encoder, multicarrier modulator), wireless channel, and receiver (multicarrier demodulator, channel equalizer, channel estimator) followed by a bit error rate tester; the wordlength variables w0 to w3 are marked on the receiver signals]
w0 Input wordlength of a multicarrier demodulator which performs a fast Fourier transform (FFT)
w1 Input wordlength of equalizer
w2 Input wordlength of channel estimator
w3 Output wordlength of channel estimator
Simulation Results
• CDM leads to lower complexity compared to DM
• CDM reduces the number of trials compared to CM, feasible solution search [Sung and Kum 1995], and exhaustive search
  - Fast searching

Search Method   Gradient Measure   αc    Number of Trials   Simulations   Wordlengths for Variables   Complexity Estimate   Distortion (BER)*
Gradient        DM                 0     16                 64            {10,9,4,10}                 10781                 0.0009
Gradient        CDM                0.5   15                 60            {7,10,4,6}                  7702                  0.0012
Gradient        CM                 1     69                 69            {7,7,4,6}                   7699                  0.0015
Feasible        -                  -     210                210           {7,7,4,6}                   7699                  0.0015
Exhaustive      -                  -     26364              26364         -                           -                     -

* Required BER ≤ 1.5 x 10^-3
Simulation Environments
• Assumptions
  - Internal wordlengths of blocks have been decided
  - Complexity increases linearly as wordlength increases
• Required application performance
  - Bit error rate of 1.5 x 10^-3 (without error correcting codes)
• Simulation tool
  - LabVIEW 7.0
Complexity vector

Input               Weight
FFT                 1024
Equalizer (right)   1
Estimator           128
Equalizer (upper)   2

Complexity: $C(\mathbf{w}) = \mathbf{c}^T \mathbf{w}$
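The linear complexity model can be checked against the complexity estimates reported in the simulation results. In the sketch below, the assignment of the four weights to the variables w0 to w3 is an assumption, chosen because it reproduces the reported estimates; the constant and function names are hypothetical.

```python
# Assumed order (w0, w1, w2, w3): FFT input, equalizer (right) input,
# estimator input, equalizer (upper) input fed by the estimator output.
COMPLEXITY_WEIGHTS = (1024, 1, 128, 2)


def complexity(w, weights=COMPLEXITY_WEIGHTS):
    """Linear complexity model C(w) = c^T w."""
    if len(w) != len(weights):
        raise ValueError("wordlength vector and weights differ in length")
    return sum(c * wn for c, wn in zip(weights, w))


if __name__ == "__main__":
    for w in ((10, 9, 4, 10), (7, 10, 4, 6), (7, 7, 4, 6)):
        print(w, complexity(w))
```

Because the FFT weight dominates, reducing w0 from 10 to 7 accounts for nearly all of the complexity difference between the DM result and the CDM and CM results.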
Minimum Wordlengths
• Change one wordlength variable while keeping other variables at high precision
  - {1,16,16,16}, {2,16,16,16}, ...
  - {16,1,16,16}, {16,2,16,16}, ...
  - ...
  - ..., {16,16,16,15}, {16,16,16,16}
• Minimum wordlength vector is {5,4,4,4}
Number of Trials
• Start at {5,4,4,4} wordlength
• Next wordlength vectors for complexity measure (α = 1.0)
  - {5,4,4,4}, {5,5,4,4}, …
• Increase wordlength one-by-one until satisfying required application performance
Power Consumption
• Power consumption in CMOS circuits
$$P_{avg} = P_{switching} + P_{short\text{-}circuit} + P_{leakage}$$
• Significant power in CMOS circuits is dissipated when they are switching
$$P_{switching} = \alpha\, C\, V^2 f$$
  α: Transition factor
  C: Capacitance
  V: Supply voltage
  f: Frequency
• Power reduction in hardware part [Chandrakasan and Brodersen, 1995]
  - Scaling down, minimizing area
  - Adjusting voltage and frequency during operation
• Power reduction in software part [Tiwari, Malik and Wolfe, 1994] [Lee et al., 1997]
  - Instruction ordering and packing
  - Energy reductions varying from 26% to 73%
Wordlength for Low-Power Consumption
• Power model of wordlength [Choi and Burleson, 1994]
  - Wordlength is considered as capacitance
  - Power consumption is proportional to wordlength
  - Switching activity is not considered
• Data wordlength reduction technique [Han, Evans, and Swartzlander, 2004] (proposed)
  - Count node transitions for switching activity
  - Reduce input data wordlength to decrease power consumption
Dynamic and Static Power
[Figure: trends in dynamic and static power dissipation showing increasing contribution of static power]
[S. Thompson, P. Packan, and M. Bohr. MOS Scaling: Transistor Challenges for the 21st Century. Intel Technology Journal, Q3 1998]
Power Dissipation of Multiplier Unit
• Multiply unit is usually a major source of power consumption in typical DSP applications
  - Multiply unit required for digital communication & digital signal processing algorithms
  - Digital filters, equalizers, FFT/IFFT, digital down/upconverter, etc.
[Figure: TMS320C5x power dissipation characteristics, from www.ti.com]
Wallace vs. Booth Multipliers
[Figure: tree dot diagram in 4-bit Wallace multiplier (symmetric)]
[Figure: radix-4 multiplier based on Booth's recoding, X ● a = P (asymmetric: one operand recoded)]
Radix-4 Modified Booth Multiplier
• One multiplicand is recoded
• a and X are the multiplicands
• P is the product of the multiplication
• Three bits in X are recoded to z
Switching Activity in Multipliers
• Logic delay and propagation cause glitches
• Proposed analytical method
  - Hard to estimate glitches in closed form
  - Analyze switching activity w/r to input data wordlength
  - Does not consider multiplier architecture
• Simulation method
  - Count all switching activities (transition counts in logic)
  - Power estimation (Xilinx XPower)
  - Considers multiplier architecture
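Analytical Method
• Stream of data for one multiplicand
• Compare two adjacent numbers in stream after reduction
• Expectation of bit switching, x, with probability Px
$$E(X) = \sum_{x=0}^{L} x\, P_X(x)$$
  - L-bit input data: $E_L(X) = \frac{L}{2}$
  - Truncate input data to M bits (remove N bits): $E_{tr}(X) = \frac{L - N}{2} = \frac{M}{2}$
  - N-bit signed right shift in L-bit input (Y is sign bit): $E_{rs}(X) = \frac{1}{2} E(X \mid Y = 0) + \frac{1}{2} E(X \mid Y = 1) = \frac{L}{2}$
[Figure: bit layouts of the L-bit input with sign bit S: the full L bits, M bits kept with N bits removed by truncation, and the sign bit S repeated in the upper bits after the signed right shift]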
Analytical Method
$$E_{rs}(X) = \frac{1}{2} E(X \mid Y = 0) + \frac{1}{2} E(X \mid Y = 1)$$
• X has binomial distribution
• Always L/2 (independent of M and N)
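The three switching expectations can be confirmed by brute force for a small wordlength. This is a minimal sketch, assuming adjacent inputs are independent and uniformly distributed over all L-bit words (an assumption made here for the enumeration); it uses L = 8 and N = 3 to keep the enumeration small, and all function names are hypothetical.

```python
from itertools import product


def truncate(x, n, width):
    """Zero the n least significant bits of a width-bit word."""
    return x & ((1 << width) - 1) & ~((1 << n) - 1)


def signed_right_shift(x, n, width):
    """Arithmetic right shift, result as an unsigned width-bit pattern."""
    signed = x - (1 << width) if x >> (width - 1) else x
    return (signed >> n) & ((1 << width) - 1)


def expected_switching(reduce, width):
    """Mean number of bit toggles between two adjacent words, averaged
    over every ordered pair of width-bit inputs after reduction."""
    words = [reduce(x) for x in range(1 << width)]
    toggles = sum(bin(a ^ b).count("1") for a, b in product(words, repeat=2))
    return toggles / len(words) ** 2


if __name__ == "__main__":
    L, N = 8, 3
    print(expected_switching(lambda x: x, L))                            # L/2
    print(expected_switching(lambda x: truncate(x, N, L), L))            # M/2
    print(expected_switching(lambda x: signed_right_shift(x, N, L), L))  # L/2
```

The right-shift case stays at L/2 because the N extended bits toggle together with the sign bit, which offsets the N data bits that were shifted out.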
Power Reduction in TI DSP
• TI TMS320VC5416 DSP STARTER KIT
  - Radix-4 modified Booth multiplier
  - Measure average current for wordlength reduction of multiplicands
Assembly program (data_a and data_b have random data with wordlength w):
    loop:
        STM data_a, AR2;
        STM data_b, AR3;
        MPY *AR2+, *AR3+, a
        MPY *AR2+, *AR3+, a
        ….
        ….
        MPY *AR2+, *AR3+, a
        B loop
[Figure: measured current (mA, 574 to 581) versus wordlength w (0 to 16) for the operand wordlength pairs (w,w), (16,w), (w,16), and (w_rsh, w_rsh)]
Code Generation for Fixed-Point Program
• Adder function in MATLAB
(a) Floating point program for adder
    function [c] = adder(a, b)
    c = 0;
    c = a + b;
(b) Raw fixed-point program
    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;
(c) Converted fixed-point program for automating optimization
    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;
fi(a, S, WL, FWL) is a constructor function for a fixed-point object in fixed-point toolbox [S: Signed, WL: Wordlength, FWL: Fraction length]. In (b), S, WL, and FWL are determined by designers with trial and error.
Running Transformation
• Just call top function with input data
• Range and optimum wordlengths depend on input statistics
> in = rand(1,1000)
> mac_top(in)