Pipelining and Parallel Processing
Shao-Yi Chien
Shao-Yi Chien, DSP in VLSI Design
Introduction
Pipelining and parallel processing are the most important design techniques in VLSI DSP systems.
Both make use of the inherent parallelism of DSP algorithms.
Pipelining: different function units working in parallel.
Parallel processing: duplicated function units working in parallel.
An Example: Car Production Line
Throughput: how many cars are produced in one hour.
Latency: how long it takes to produce one car.
Single station (one car takes mT): Cost: 1, Throughput: 1/mT, Latency: mT
Pipelined line (m stations, T each): Cost: >1, Throughput: 1/T, Latency: >mT
Parallel lines (n stations, mT each): Cost: >n, Throughput: (1/mT)*n, Latency: >mT
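The three production-line configurations can be compared with a minimal Python sketch. The function names are hypothetical, and the latency values are the ideal lower bound mT; the slide gives ">mT" for the pipelined and parallel cases, presumably because of overhead, which this sketch ignores.

```python
def single_station(m, T):
    """One station builds a whole car, taking m*T per car."""
    return {"throughput": 1 / (m * T), "latency": m * T}


def pipelined_line(m, T):
    """m stations in series, each taking T; one car leaves every T."""
    return {"throughput": 1 / T, "latency": m * T}  # ideal latency, no overhead


def parallel_lines(m, T, n):
    """n duplicated stations, each building a whole car in m*T."""
    return {"throughput": n / (m * T), "latency": m * T}  # ideal latency


if __name__ == "__main__":
    m, T, n = 4, 0.5, 3  # illustrative values only
    for name, result in [
        ("single", single_station(m, T)),
        ("pipelined", pipelined_line(m, T)),
        ("parallel", parallel_lines(m, T, n)),
    ]:
        print(name, result)
```

Both techniques raise throughput; neither shortens the time needed to finish one individual car.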
Pipelining of Digital Filters (1/3)
FIR filter
Critical path: TM + 2TA (one multiplication time TM plus two addition times TA)
Pipelining of Digital Filters (2/3)
Pipelined FIR filter
$T_{sample} \ge T_M + T_A$
$f_{sample} \le \dfrac{1}{T_M + T_A}$
Pipelining of Digital Filters (3/3)
Schedule
Pipeline
Pipelining reduces the critical path, which raises the achievable working frequency and sample rate.
Critical path: TM + 2TA → TM + TA
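The effect of the pipelining latches can be illustrated with a small cycle-by-cycle simulation. This is only a sketch: it assumes a 3-tap FIR filter y(n) = a·x(n) + b·x(n-1) + c·x(n-2), which is consistent with the critical path TM + 2TA but is not spelled out in the slides, and the names `sum_reg` and `x2_reg` are hypothetical pipeline latches.

```python
def fir_direct(x, a, b, c):
    """Direct 3-tap FIR: one multiplication and two additions in series."""
    x1 = x2 = 0  # delay elements holding x(n-1) and x(n-2)
    y = []
    for xn in x:
        y.append(a * xn + b * x1 + c * x2)
        x2, x1 = x1, xn
    return y


def fir_pipelined(x, a, b, c):
    """Same filter with one pipeline stage inserted after the first adder."""
    x1 = x2 = 0
    sum_reg = 0  # latch on a*x(n) + b*x(n-1)
    x2_reg = 0   # latch on the x(n-2) branch (same feed-forward cutset)
    y = []
    for xn in x:
        # Stage 2 reads the latches: one multiplication + one addition.
        y.append(sum_reg + c * x2_reg)
        # Stage 1 writes the latches: one multiplication + one addition.
        sum_reg, x2_reg = a * xn + b * x1, x2
        x2, x1 = x1, xn
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    print(fir_direct(x, 2, 3, 5))
    print(fir_pipelined(x, 2, 3, 5))
```

The pipelined output is the direct output delayed by one clock cycle: the results are unchanged, latency grows by one cycle, and each stage now contains only one multiplication and one addition.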
Drawbacks of Pipelining
Increased latency (in cycles)
For an M-level pipelined system, the number of delay elements in any path from input to output is (M-1) greater than in the original system.
Increased number of latches (registers)
How to Do Pipelining?
Put pipelining latches across any feed-forward cutset of the graph.
Cutset: a set of edges of a graph such that, if these edges are removed from the graph, the graph becomes disjoint.
Feed-forward cutset: a cutset in which the data move in the forward direction on all of its edges.
How to Do Pipelining?
Feed-forward cutset
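The two definitions can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of directed edges and that one node is designated as the input; the function name and the example graph are hypothetical and are not taken from the slide figures.

```python
from collections import defaultdict


def is_feed_forward_cutset(nodes, edges, cut, source):
    """Return True if removing `cut` disconnects the graph and every
    cut edge points from the input side to the other side."""
    remaining = [e for e in edges if e not in cut]
    neighbours = defaultdict(set)  # connectivity ignores edge direction
    for u, v in remaining:
        neighbours[u].add(v)
        neighbours[v].add(u)

    # Collect every node still connected to the input node.
    input_side = {source}
    stack = [source]
    while stack:
        u = stack.pop()
        for v in neighbours[u]:
            if v not in input_side:
                input_side.add(v)
                stack.append(v)

    if len(input_side) == len(nodes):
        return False  # graph is still connected: not a cutset
    # Feed-forward: all cut edges leave the input side.
    return all(u in input_side and v not in input_side for u, v in cut)


if __name__ == "__main__":
    nodes = ["x", "m1", "m2", "add", "y"]
    edges = [("x", "m1"), ("x", "m2"), ("m1", "add"), ("m2", "add"), ("add", "y")]
    print(is_feed_forward_cutset(nodes, edges, [("m1", "add"), ("m2", "add")], "x"))
    print(is_feed_forward_cutset(nodes, edges, [("m1", "add")], "x"))
```

Placing one latch on every edge of a set that passes this check adds the same delay to every input-to-output path, which is why the computed results are preserved.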
Example
In the SFG in Fig. (a), the computation time of each node is 1 u.t.
(a) Calculate the critical path.
(b) The critical path is reduced to 2 u.t. by inserting 3 extra delay elements. Is this a valid pipelining?
Answer
(a) The critical path is A3 → A5 → A4 → A6, which takes 4 u.t.
(b) No, it is not a valid pipelining. Fig. (c) shows the corrected version.
Fine-Grain Pipelining
Critical path (TM = 10, TA = 2)
Original: TM + 2TA = 14
Pipelined: TM + TA = 12
Fine-grain pipelined: TM1 = 6 or TM2 + TA = 6
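The critical-path figures above can be reproduced by treating each register-to-register path as a list of unit delays and taking the longest one. This is a sketch with a hypothetical helper `critical_path`; the value TM2 = 4 is inferred from TM2 + TA = 6.

```python
def critical_path(paths):
    """Longest register-to-register delay among the given paths."""
    return max(sum(path) for path in paths)


T_M, T_A = 10, 2
T_M1, T_M2 = 6, 4  # multiplier split into two parts (T_M2 inferred)

original = critical_path([[T_M, T_A, T_A]])        # multiplier + two adders
pipelined = critical_path([[T_M, T_A]])            # multiplier + one adder
fine_grain = critical_path([[T_M1], [T_M2, T_A]])  # latch inside the multiplier

if __name__ == "__main__":
    print(original, pipelined, fine_grain)
```

Fine-grain pipelining pays off here because the latch inside the multiplier leaves two paths of equal length.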
Notes for Pipelining (1/2)
Pipelining is a very simple design technique that maintains the input/output data configuration and the sampling frequency.
Tclk = Tsample
Supported in many EDA tools
Still has some limitations:
  Pipeline bubbles
  Problems with recursive systems
  Large hardware cost for 2-D or 3-D data
  Communication bound
Notes for Pipelining (2/2)
Effective pipelining
  Put pipelining registers on the critical path
  Balance the pipeline stages
10 split as 2 + 8: critical path = 8
10 split as 5 + 5: critical path = 5
Parallel Processing of Digital Filters (1/5)
Single-input single-output (SISO) system
Multiple-input multiple-output (MIMO) system
3-parallel system
Parallel Processing of Digital Filters (2/5)
Parallel processing is also called block processing.
Block size (L): the number of data samples processed at the same time.
Block delay (L-slow): one latch delays the data by L sample periods.
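Block processing can be sketched in Python by producing L outputs per clock cycle and comparing them with the sample-by-sample filter. The 3-tap FIR y(n) = a·x(n) + b·x(n-1) + c·x(n-2) and all names below are assumptions made for illustration; `prev_block` plays the role of the block delay.

```python
def fir_sequential(x, a, b, c):
    """Reference filter: one output sample per clock cycle."""
    padded = [0, 0] + list(x)
    return [a * padded[n + 2] + b * padded[n + 1] + c * padded[n]
            for n in range(len(x))]


def fir_block(x, a, b, c, L=3):
    """L-parallel filter: each loop iteration models one clock cycle
    that consumes L input samples and produces L output samples."""
    assert L >= 2 and len(x) % L == 0
    y = []
    prev_block = [0] * L  # block delay: one latch holds L samples
    for k in range(0, len(x), L):
        block = list(x[k:k + L])
        window = prev_block + block  # x(k-L) ... x(k+L-1)
        for i in range(L):           # L duplicated filter units
            n = L + i
            y.append(a * window[n] + b * window[n - 1] + c * window[n - 2])
        prev_block = block
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    print(fir_sequential(x, 2, 3, 5))
    print(fir_block(x, 2, 3, 5))
```

Both functions produce the same samples; the block version simply needs only one clock cycle for every L samples.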
Parallel Processing of Digital Filters (3/5)
The critical path is the same as in the original filter.
Tclk is not equal to Tsample: each clock cycle processes L samples, so Tsample = Tclk / L (L-slow).
Parallel Processing of Digital Filters (4/5)
Whole system
Parallel Processing of Digital Filters (5/5)
Parallel-to-serial converter
Serial-to-parallel converter
Parallel Processing vs. Pipelining
Parallel processing is superior to pipelining when the system is limited by an I/O bottleneck (communication bound).
Pipelining can only increase the clock rate.
System clock rate = sample rate for a pipelined system.
Parallel processing can further lower the required working frequency.
Pipelining-Parallel Architecture
Notes for Parallel Processing
The input/output data access scheme should be carefully designed; it can sometimes be costly.
Tclk > Tsample, fclk < fsample
Large hardware cost
Can be combined with pipelining
Low Power Issues
Pipelining and parallel processing are also beneficial for low-power design.
Propagation delay
Power consumption
Assume the sampling frequency is the same.
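The propagation delay and power consumption named above are commonly modeled for CMOS circuits as follows. This is a sketch of the standard model, not necessarily the exact form used on the original slide; the symbols V_0 (supply voltage), V_t (threshold voltage), k (technology-dependent constant), C_charge (capacitance charged in one clock cycle), C_total (total switched capacitance) and f (clock frequency) are assumptions chosen to match the names used elsewhere in these slides.

```latex
\begin{align}
T_{pd} &= \frac{C_{charge}\, V_0}{k\,(V_0 - V_t)^2} \\
P &= C_{total}\, V_0^{2}\, f
\end{align}
```

Lowering V_0 reduces the power quadratically but lengthens the propagation delay, so any slack gained from pipelining or parallel processing can be traded for a lower supply voltage.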
Pipelining for Low Power (1/2)
For M-level pipelining:
The critical path is reduced to 1/M of its original length.
The capacitance charged in one clock cycle is also reduced to Ccharge/M.
The supply voltage can be reduced to βV0 (0 < β < 1), and the propagation delay remains unchanged.
Pipelining for Low Power (2/2)
Power consumption of the pipelined system
How is the parameter β determined?
β is obtained by solving the equation that keeps the propagation delay unchanged.
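A worked derivation may help here. It is a sketch that assumes the usual CMOS model T_pd = C_charge V_0 / (k (V_0 - V_t)^2) and P = C_total V_0^2 f, and writes the reduced supply voltage as βV_0; the subscripts seq and pip (original and pipelined system) are labels introduced only for this derivation.

```latex
\begin{align}
P_{pip} &= C_{total}\,(\beta V_0)^2 f = \beta^2 P_{seq} \\
T_{seq} &= \frac{C_{charge}\, V_0}{k\,(V_0 - V_t)^2}, \qquad
T_{pip} = \frac{(C_{charge}/M)\,\beta V_0}{k\,(\beta V_0 - V_t)^2} \\
T_{pip} = T_{seq} \;&\Longrightarrow\; M\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2
\end{align}
```

The last line is a quadratic in β; only a root with βV_0 > V_t is physically meaningful.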
Example
Parameters
TM = 10 u.t.
TA = 2 u.t.
Tm1 = 6 u.t.
Tm2 = 4 u.t.
CM = 5CA
Vt = 0.6 V
Normal Vcc = 5 V
(a) New supply voltage?
(b) Power saving percentage?
Answer
(a) Original system:
Pipelined system:
(b)
Note: one root is an invalid value, less than the threshold voltage.
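One way to work through this example is sketched below. It assumes that the critical path of the original system charges one multiplier and one adder (C_M + C_A = 6C_A) and that the fine-grain pipelined version charges half of that (3C_A), so M = 2; these capacitances are assumptions about the figure, and the relation M(βV_0 - V_t)^2 = β(V_0 - V_t)^2 is the standard propagation-delay condition.

```latex
\begin{align}
2\,(5\beta - 0.6)^2 &= \beta\,(5 - 0.6)^2 \\
50\beta^2 - 31.36\,\beta + 0.72 &= 0 \\
\beta &\approx 0.6033 \quad \text{or} \quad \beta \approx 0.0239 \\
\beta V_0 &\approx 0.6033 \times 5\ \text{V} \approx 3.02\ \text{V} \\
\frac{P_{pip}}{P_{seq}} &= \beta^2 \approx 0.364
\end{align}
```

The root β ≈ 0.0239 gives a supply of about 0.12 V, below V_t = 0.6 V, so it is discarded. Under these assumptions the new supply voltage is about 3.02 V and the power saving is about 63.6%.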
Parallel Processing for Low Power (1/2)
For an L-parallel system:
Clock period: Tseq → L·Tseq
Ccharge remains unchanged
Ctotal → L·Ctotal
With more time to charge the capacitance, the supply voltage can be lowered.
Parallel Processing for Low Power (2/2)
Power consumption of the L-parallel system
Deriving the voltage scaling parameter β
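The corresponding derivation for the L-parallel case can be sketched as follows, again assuming the usual CMOS model for propagation delay and power, a reduced supply voltage βV_0, and the subscripts seq and par as labels for the original and parallel systems.

```latex
\begin{align}
P_{par} &= (L\,C_{total})\,(\beta V_0)^2\,\frac{f}{L} = \beta^2 P_{seq} \\
T_{seq} &= \frac{C_{charge}\, V_0}{k\,(V_0 - V_t)^2}, \qquad
L\,T_{seq} = \frac{C_{charge}\,\beta V_0}{k\,(\beta V_0 - V_t)^2} \\
&\Longrightarrow\; L\,(\beta V_0 - V_t)^2 = \beta\,(V_0 - V_t)^2
\end{align}
```

The power ratio β² has the same form as for pipelining; only the reason the voltage can be lowered differs (a longer clock period instead of a smaller charging capacitance).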
Example
Parameters
TM = 8 u.t.
TA = 1 u.t.
Tsample = 9 u.t.
CM = 8CA
Vt = 0.45 V
Normal Vcc = 3.3 V
(a) New supply voltage?
(b) Power saving percentage?
Answer
(a) Original system:
2-parallel system:
(b)
Note: one root is an invalid value, less than the threshold voltage.
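The quadratic for β can also be solved numerically. The sketch below is an assumption-laden illustration rather than the slide's own solution: `solve_beta` and `parallel_beta` are hypothetical helpers, and `cap_ratio` (charging capacitance of the parallel system divided by that of the original) defaults to 1.0 to match the statement that Ccharge remains unchanged. If the critical path of the parallel structure in the figure charges a different capacitance, that ratio has to be supplied.

```python
import math


def solve_beta(m, v0, vt):
    """Solve m*(beta*v0 - vt)**2 == beta*(v0 - vt)**2.

    Returns the root whose scaled supply beta*v0 stays above the
    threshold voltage vt, or None if no such root exists.
    """
    a = m * v0 ** 2
    b = -(2 * m * v0 * vt + (v0 - vt) ** 2)
    c = m * vt ** 2
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    roots = [(-b + sign * math.sqrt(disc)) / (2 * a) for sign in (1, -1)]
    valid = [r for r in roots if r * v0 > vt]  # drop the sub-threshold root
    return max(valid) if valid else None


def parallel_beta(L, v0, vt, cap_ratio=1.0):
    """Voltage scaling factor for an L-parallel system.

    cap_ratio = C_charge(parallel) / C_charge(original); assumed 1.0.
    """
    return solve_beta(L / cap_ratio, v0, vt)


if __name__ == "__main__":
    v0, vt = 3.3, 0.45
    beta = parallel_beta(2, v0, vt)
    print(f"beta = {beta:.4f}")
    print(f"new supply voltage = {beta * v0:.3f} V")
    print(f"power saving = {(1 - beta ** 2) * 100:.1f} %")
```

The same `solve_beta` applies to M-level pipelining by passing m = M, since both cases lead to an equation of the same form.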
Pipelining-Parallel for Low Power
Conclusion
Pipelining
  Reduces the critical path
  Increases the clock speed
  Reduces power consumption at the same speed
Parallel processing
  Increases the effective sampling rate
  Reduces power consumption at the same speed