Ch4_pipelining and Parallel Processing

Date post: 03-Jun-2018
Upload: rasigan-ur
  • 8/12/2019 Ch4_pipelining and Parallel Processing

    VLSI DSP 2008 Y.T. Hwang 5-1

    Chapter 4: Pipelining and Parallel Processing

    Introduction (1)

    Pipelining

    Reduction in critical path

    Increase the clock speed

    Reduce power consumption at same speed

    Parallel processing

    Parallelism

    Increase effective sampling speed

    Reduction of power consumption

    Introduction (2)

    A 3-tap FIR filter

    y(n)=ax(n)+bx(n-1)+cx(n-2)

    Critical path: 1 multiply and 2 add

    $$T_{sample} \ge T_M + 2T_A$$

    $$f_{sample} \le \frac{1}{T_M + 2T_A}$$
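
The filter equation and the sample-period bound above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the filter starts from a zero initial state, and the timing values in the test (10 units per multiply, 2 per add) are borrowed from the fine-grain pipelining slide rather than stated here.

```python
def fir3(x, a, b, c):
    """Direct-form 3-tap FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2).

    Samples before n = 0 are assumed to be zero.
    """
    y = []
    for n in range(len(x)):
        x1 = x[n - 1] if n >= 1 else 0
        x2 = x[n - 2] if n >= 2 else 0
        y.append(a * x[n] + b * x1 + c * x2)
    return y


def min_sample_period(t_mul, t_add):
    """Critical path of the direct form: one multiply followed by two adds."""
    return t_mul + 2 * t_add


def max_sample_freq(t_mul, t_add):
    """Upper bound on the sampling frequency implied by the critical path."""
    return 1.0 / min_sample_period(t_mul, t_add)
```

Feeding a unit impulse through `fir3` returns the coefficients `a, b, c` in order, which is a quick sanity check on the tap ordering.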

    Introduction (3)

    Pipelining or parallel processing to increase the sampling frequency

    Critical path: 2 add

    Pipelining

    Parallel processing

    Pipelining of FIR digital filters (1)

    Feed-forward cut set

    Two iterations are computed concurrently

    Critical path reduced from TM + 2TA to TM + TA

    Latency increased from 1 to 2
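
A clock-by-clock simulation makes the effect of the cut-set latches concrete. The sketch below assumes one particular placement (one latch on the edge between the two adders and one on the edge feeding the c multiplier) and zero initial register contents; the register names are hypothetical. Under those assumptions the pipelined output is the direct-form output delayed by one clock.

```python
def fir3_direct(x, a, b, c):
    """Reference: y(n) = a*x(n) + b*x(n-1) + c*x(n-2), zero initial state."""
    def at(i):
        return x[i] if i >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir3_pipelined(x, a, b, c):
    """3-tap FIR with latches on an assumed feed-forward cut set."""
    d1 = d2 = 0   # delay line holding x(n-1) and x(n-2)
    r_sum = 0     # pipeline latch on the adder-to-adder edge
    r_x2 = 0      # pipeline latch on the edge feeding the c multiplier
    y = []
    for xn in x:
        # Stage 2: one multiply and one add on values latched last clock.
        y.append(r_sum + c * r_x2)
        # Stage 1: one multiply and one add, then latch the results.
        r_sum = a * xn + b * d1
        r_x2 = d2
        # Advance the delay line.
        d2, d1 = d1, xn
    return y
```

Each stage now contains at most one multiply and one add between latches, which is where the TM + TA critical path comes from.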

    Pipelining of FIR digital filters (2)

    Drawbacks of pipelining

    Increase in the number of latches and in system latency

    Observations

    The clock period is limited by the longest path between

    Two latches

    An input and a latch

    A latch and an output

    An input and an output

    Critical path can be reduced by suitably placing the pipelining latches

    Pipelining latches can be placed across any feed-forward cutset of the graph

    Pipelining of FIR digital filters (3)

    Cut set

    A set of edges of a graph such that if these edges are removed from the graph, the graph becomes disjoint

    Feed-forward cut set

    The data move in the forward direction on all the edges of the cut set

    We can arbitrarily place latches on a feed-forward cut set without affecting the functionality of the algorithm
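
The link between latch placement and clock period can be checked with a small longest-path routine. This is a sketch under stated assumptions: the graph encoding (node delays plus edges tagged with a latch count), the node names, and the timing values TM = 10, TA = 2 are illustrative choices, and the latch-free part of the graph is assumed to be acyclic.

```python
from functools import lru_cache


def critical_path(delay, edges):
    """Longest combinational path, i.e. over edges that carry no latch.

    delay: {node: computation time}
    edges: list of (src, dst, n_latches)
    """
    succ = {u: [] for u in delay}
    for u, v, latches in edges:
        if latches == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in delay)


T_M, T_A = 10, 2  # assumed multiply and add times
node_delay = {"x2": 0, "Ma": T_M, "Mb": T_M, "Mc": T_M, "A1": T_A, "A2": T_A}


def fir_edges(latched):
    """Direct-form 3-tap FIR; the cut set is {A1->A2, x2->Mc}."""
    n = 1 if latched else 0
    return [("Ma", "A1", 0), ("Mb", "A1", 0), ("A1", "A2", n),
            ("x2", "Mc", n), ("Mc", "A2", 0)]


before = critical_path(node_delay, fir_edges(False))  # TM + 2*TA
after = critical_path(node_delay, fir_edges(True))    # TM + TA
```

Removing the two cut-set edges separates {Ma, Mb, A1, x2} from {Mc, A2}, and data crosses both edges in the same direction, so the pair qualifies as a feed-forward cut set.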

    Pipelining of FIR digital filters (4)

    Example 3.2.1

    Figure labels: incorrect pipelining, correct pipelining

    Original critical path: A3 → A5 → A4 → A6

    After pipelining: A3 → A5 or A4 → A6

    Critical path is reduced by one half

    Direct vs. transpose form

    Direct form with long critical path

    Transpose form with data broadcast structure

    Critical path is reduced to TM + TA

    Fine-Grain pipelining

    Pipelining the function unit

    Assume TM = 10 units, TA = 2 units

    After pipelining, the critical path is 6 units
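
One way to arrive at the 6-unit figure, written as a worked equation. It rests on two assumptions not stated on this slide: the multiplier is split into stages of 6 and 4 units (the split used in the low-power example later in these slides), and the second stage feeds a single adder before the next latch, as in the data-broadcast structure.

```latex
\[
T_{\text{critical}}
  = \max\left(T_{m1},\; T_{m2} + T_A\right)
  = \max(6,\; 4 + 2)
  = 6 \text{ units}
\]
```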

    Parallel processing of FIR filter (1)

    Block processing of size L

    y(n)=ax(n)+bx(n-1)+cx(n-2)

    y(3k)=ax(3k)+bx(3k-1)+cx(3k-2)

    y(3k+1)=ax(3k+1)+bx(3k)+cx(3k-1)

    y(3k+2)=ax(3k+2)+bx(3k+1)+cx(3k)

    Block delay (L-slow): placing a latch at any line of MIMO

    structures produces an effective delay of L clocks at the

    sample rate
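
The three block equations can be checked against the serial filter directly. The sketch below is a functional model only (it says nothing about hardware timing); the function names are hypothetical and samples before index 0 are assumed to be zero.

```python
def fir3_serial(x, a, b, c):
    """Serial reference: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)."""
    def at(i):
        return x[i] if i >= 0 else 0
    return [a * at(n) + b * at(n - 1) + c * at(n - 2) for n in range(len(x))]


def fir3_block3(x, a, b, c):
    """3-parallel version: each loop iteration stands for one clock cycle
    that consumes x(3k), x(3k+1), x(3k+2) and produces three outputs."""
    if len(x) % 3 != 0:
        raise ValueError("input length must be a multiple of the block size 3")

    def at(i):
        return x[i] if i >= 0 else 0

    y = []
    for k in range(len(x) // 3):
        y0 = a * at(3 * k) + b * at(3 * k - 1) + c * at(3 * k - 2)
        y1 = a * at(3 * k + 1) + b * at(3 * k) + c * at(3 * k - 1)
        y2 = a * at(3 * k + 2) + b * at(3 * k + 1) + c * at(3 * k)
        y.extend([y0, y1, y2])
    return y
```

Note that x(3k-1) and x(3k-2) in the first equation are the previous block's x(3k+2) and x(3k+1), which is why a single latch on a block line corresponds to a delay of 3 samples.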

    Parallel processing of FIR filter (2)

    Block size 3

    3 times hardware

    Critical path remains unchanged: TM + 2TA

    Tclk ≥ TM + 2TA

    3 samples are produced in 1 clock cycle

    Effective iteration period is

    $$T_{iter} = T_{sample} = \frac{1}{L} T_{clk} \ge \frac{1}{3}(T_M + 2T_A)$$

    Note: Tclk ≠ Tsample

    Parallel processing of FIR filter (3)

    MIMO system

    Complete parallel processing

    System with block size 4

    A serial-to-parallel converter

    A parallel-to-serial converter

    Pipelining vs. parallel processing

    Limitation of pipelining

    Input/output bottleneck, i.e. a communication-bounded system

    Pipelining period cannot be smaller than the communication or I/O bound

    pipelining & parallel processing

    Combined fine-grain pipelining and parallel processing for 3-tap FIR filter

    L = 3, M = 2

    $$T_{iter} = T_{sample} = \frac{1}{LM} T_{clk} \ge \frac{1}{6}(T_M + 2T_A) = \frac{14}{6}$$
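
The scaling of the sample period with the block size L and the pipelining level M can be sketched as below. The function name is hypothetical, and the model is idealized: it assumes M-stage pipelining divides the original critical path evenly and ignores latch overhead, so the result is a lower bound rather than an achievable figure.

```python
from fractions import Fraction


def sample_period_bound(t_critical, L=1, M=1):
    """Lower bound on the sample period for an L-parallel, M-stage
    pipelined design whose unpipelined critical path is t_critical.

    t_critical, L and M are integers; an exact fraction is returned.
    """
    if L < 1 or M < 1:
        raise ValueError("L and M must be positive integers")
    return Fraction(t_critical, L * M)


# TM = 10 and TA = 2 (values from the fine-grain pipelining slide)
# give an unpipelined critical path of TM + 2*TA = 14 units.
parallel_only = sample_period_bound(14, L=3)
combined = sample_period_bound(14, L=3, M=2)
```

`Fraction` keeps 14/6 exact instead of rounding it to a float, which makes the comparison between configurations easier to read.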

    Pipelining & parallel processing for low power

    Advantages of pipelining and parallel processing

    High speed

    Low power

    CMOS circuit model: 1st-order analysis

    Propagation delay

    $$T_{pd} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$

    Power consumption

    $$P = C_{total} V_0^2 f$$
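
The first-order model is easy to explore numerically. In this sketch the function names are hypothetical and the technology constant k defaults to 1, so delays are in arbitrary units; only ratios between operating points are meaningful.

```python
def propagation_delay(c_charge, v0, vt, k=1.0):
    """First-order CMOS delay: T_pd = C_charge * V0 / (k * (V0 - Vt)^2)."""
    if v0 <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return c_charge * v0 / (k * (v0 - vt) ** 2)


def dynamic_power(c_total, v0, f):
    """First-order CMOS dynamic power: P = C_total * V0^2 * f."""
    return c_total * v0 ** 2 * f
```

Lowering the supply voltage cuts power quadratically but lengthens the delay, which is the trade-off that pipelining and parallel processing exploit in the following slides.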

    Pipelining for low power (1)

    Sequential version

    $$P_{seq} = C_{total} V_0^2 f, \quad f = 1/T_{seq}$$

    M-level pipelined version

    Working at the same frequency, i.e. f = 1/Tseq remains unchanged

    Capacitance in each pipeline stage is reduced to Ccharge/M

    Only βV0 (β < 1) is needed to charge Ccharge/M in Tseq

    $$P_{pip} = C_{total} \beta^2 V_0^2 f = \beta^2 P_{seq}$$

    Pipelining for low power (2)

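    Calculation of β

    $$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}$$

    $$T_{pip} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

    Let $T_{seq} = T_{pip}$:

    $$M (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$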
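
Expanding the condition M(βV0 − Vt)² = β(V0 − Vt)² gives an ordinary quadratic in β. The expansion below is a worked step added for convenience, not part of the original slide; the remark on which root to keep follows from requiring the scaled supply βV0 to stay above Vt.

```latex
\[
M V_0^2\, \beta^2 \;-\; \bigl(2 M V_0 V_t + (V_0 - V_t)^2\bigr)\, \beta \;+\; M V_t^2 \;=\; 0
\]
\[
\beta = \frac{\bigl(2 M V_0 V_t + (V_0 - V_t)^2\bigr)
        \pm \sqrt{\bigl(2 M V_0 V_t + (V_0 - V_t)^2\bigr)^2 - 4 M^2 V_0^2 V_t^2}}
       {2 M V_0^2}
\]
```

Only a root with βV0 > Vt is physically meaningful; in the worked examples of these slides that is the larger of the two roots.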

    Pipelining for low power (3)

    Example

    3-tap FIR filter

    Tm = 10, Ta = 2, Cm = 5Ca

    Pipelined multiplier, Tm1 = 6, Tm2 = 4, Cm1 = 3Ca , Cm2 = 2Ca

    V0 = 5 V, Vt = 0.6 V

    Supply voltage calculation

    Sequential: Ccharge = Cm + Ca = 6Ca

    Pipelined: Ccharge = Cm1 = Cm2 + Ca = 3Ca

    $$50\beta^2 - 31.36\beta + 0.72 = 0 \;\Rightarrow\; \beta = 0.6033$$

    Vpip = βV0 = 3.0165 V

    Power consumption ratio = β² = 36.4%
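
The numbers in this example can be reproduced with the quadratic formula. The helper below is a sketch with a hypothetical name; it solves n(βV0 − Vt)² = β(V0 − Vt)² for the larger root, where n is the factor by which the charged capacitance shrinks (here 6Ca / 3Ca = 2).

```python
import math


def solve_beta(n, v0, vt):
    """Larger root of n*(beta*v0 - vt)**2 = beta*(v0 - vt)**2."""
    qa = n * v0 ** 2
    qb = -(2 * n * v0 * vt + (v0 - vt) ** 2)
    qc = n * vt ** 2
    disc = qb * qb - 4 * qa * qc
    return (-qb + math.sqrt(disc)) / (2 * qa)


V0, VT = 5.0, 0.6
beta = solve_beta(2, V0, VT)   # coefficients: 50, -31.36, 0.72
v_pip = beta * V0              # scaled supply voltage
power_ratio = beta ** 2        # P_pip / P_seq
```

The smaller root of the same quadratic would put βV0 below Vt, so it is discarded.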

    Parallel processing for low power (1)

    L-parallel version

    Working at one L-th of the frequency, i.e. f = 1/(L·Tseq)

    Total capacitance is increased to L·Ccharge

    Since each Ccharge is charged in L·Tseq, only βV0 (β < 1) is needed to charge it

    Parallel processing for low power (2)

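    Calculation of β

    $$T_{seq} = \frac{C_{charge} V_0}{k (V_0 - V_t)^2}, \qquad L\, T_{seq} = \frac{C_{charge}\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

    $$L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$

    $$P_{par} = (L C_{charge}) (\beta V_0)^2 \frac{f}{L} = \beta^2 C_{charge} V_0^2 f = \beta^2 P_{seq}$$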

    Parallel processing for low power (3)

    Example of 2-parallel version

    4-tap FIR filter

    Tm = 8, Ta = 1, Cm = 8Ca

    Tseq = 9

    V0 = 3.3 V, Vt = 0.45 V

    Parallel processing for low power (4)

    2-parallel FIR filter design

    Note each delay is 2-slow

    x(2k-1)

    x(2k-2)

    Parallel processing for low power (5)

    Supply voltage calculation

    Ccharge = Cm + Ca = 9Ca

    2-parallel: Ccharge = Cm + 2Ca = 10Ca

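    $$T_{seq} = \frac{9 C_a V_0}{k (V_0 - V_t)^2}, \qquad T_{par} = \frac{10 C_a\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

    Let $T_{par} = 2 T_{sample} = 2 T_{seq}$:

    $$9 (\beta V_0 - V_t)^2 = 5 \beta (V_0 - V_t)^2$$

    $$98.01\beta^2 - 67.3425\beta + 1.8225 = 0$$

    β = 0.6589 or 0.0282 (infeasible, since βV0 < Vt)

    Vpar = βV0 = 2.17437 V

    Power consumption ratio = β² = 43.41%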
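
As a cross-check that does not rely on expanding the quadratic, β can also be found by bisection directly on the delay model. This is a sketch with hypothetical function names; capacitances are expressed in units of Ca and the constant k is set to 1 because it cancels in the comparison.

```python
def delay(c_charge, v, vt, k=1.0):
    """First-order CMOS delay C*V / (k*(V - Vt)^2)."""
    return c_charge * v / (k * (v - vt) ** 2)


def solve_beta_bisect(c_seq, c_par, L, v0, vt, iters=60):
    """Find beta in (Vt/V0, 1) such that the parallel design's delay at
    beta*V0 equals L times the sequential delay at V0."""
    target = L * delay(c_seq, v0, vt)
    lo, hi = vt / v0 + 1e-9, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delay(c_par, mid * v0, vt) > target:
            lo = mid   # too slow at this voltage: raise the supply
        else:
            hi = mid   # timing slack left: lower the supply further
    return 0.5 * (lo + hi)


# Values from this example: Ccharge = 9Ca (sequential), 10Ca (2-parallel).
beta = solve_beta_bisect(9.0, 10.0, 2, 3.3, 0.45)
v_par = beta * 3.3
power_ratio = beta ** 2
```

Bisection works here because the delay decreases monotonically as the supply rises above Vt, so there is exactly one crossing in the search interval.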

    Parallel processing for low power (6)

    Area efficient 2-parallel version

    Multipliers: 8 → 6, adders: 6 → 7, delays: 3 → 4

    Parallel processing for low power (7)

    Architecture verification

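    $$y(n) = h_0 x(n) + h_1 x(n-1) + h_2 x(n-2) + h_3 x(n-3)$$

    $$y_A = h_0 x(2k) + h_2 x(2k-2)$$

    $$y_B = (h_0 + h_1)(x(2k) + x(2k+1)) + (h_2 + h_3)(x(2k-2) + x(2k-1))$$

    $$y_C = h_1 x(2k+1) + h_3 x(2k-1)$$

    $$y(2k) = y_A + y_C \ \text{[after 1 block delay]} = h_0 x(2k) + h_1 x(2k-1) + h_2 x(2k-2) + h_3 x(2k-3)$$

    $$y(2k+1) = y_B - y_A - y_C = h_0 x(2k+1) + h_1 x(2k) + h_2 x(2k-1) + h_3 x(2k-2)$$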
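
The verification can also be done numerically by comparing the three-subfilter structure against the plain 4-tap filter. The sketch below is a functional model with hypothetical names; it assumes zero samples outside the input range, an even input length, and that the block delay on y_C starts at zero.

```python
def fir4_direct(x, h):
    """Reference 4-tap FIR: y(n) = sum over i of h[i] * x(n - i)."""
    def at(i):
        return x[i] if i >= 0 else 0
    return [sum(h[i] * at(n - i) for i in range(4)) for n in range(len(x))]


def fir4_area_efficient_2parallel(x, h):
    """2-parallel 4-tap FIR built from subfilter outputs y_A, y_B, y_C."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even for block size 2")

    def at(i):
        return x[i] if 0 <= i < len(x) else 0

    y = [0] * len(x)
    y_c_prev = 0  # y_C delayed by one block
    for k in range(len(x) // 2):
        y_a = h[0] * at(2 * k) + h[2] * at(2 * k - 2)
        y_b = ((h[0] + h[1]) * (at(2 * k) + at(2 * k + 1))
               + (h[2] + h[3]) * (at(2 * k - 2) + at(2 * k - 1)))
        y_c = h[1] * at(2 * k + 1) + h[3] * at(2 * k - 1)
        y[2 * k] = y_a + y_c_prev
        y[2 * k + 1] = y_b - y_a - y_c
        y_c_prev = y_c
    return y
```

Counting the products in the loop body gives 2 for y_A, 2 for y_B and 2 for y_C, i.e. 6 multiplications per block instead of the 8 needed by two independent 4-tap filters.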

    Parallel processing for low power (8)

    Supply voltage calculation

    Ccharge = Cm + Ca = 9Ca

    2-parallel: Ccharge = Cm + 4Ca = 12Ca

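    $$T_{seq} = \frac{9 C_a V_0}{k (V_0 - V_t)^2}, \qquad T_{par} = \frac{12 C_a\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

    Let $T_{par} = 2 T_{sample} = 2 T_{seq}$:

    $$\frac{12 C_a\, \beta V_0}{k (\beta V_0 - V_t)^2} = 2 \cdot \frac{9 C_a V_0}{k (V_0 - V_t)^2}$$

    $$32.67\beta^2 - 25.155\beta + 0.6075 = 0$$

    β = 0.745 or 0.025 (infeasible, since βV0 < Vt)

    Vpar = βV0 = 2.4585 V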

    Parallel processing for low power (9)

    Power consumption ratio

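    $$C_{total}^{(seq)} = 4C_m + 3C_a = 35C_a, \qquad P_{seq} = C_{total}^{(seq)} V_0^2 f_{seq}$$

    $$C_{total}^{(par)} = 6C_m + 7C_a = 55C_a, \qquad P_{par} = C_{total}^{(par)} \beta^2 V_0^2 f_{par}$$

    $$f_{par} = \tfrac{1}{2} f_{seq} = \tfrac{1}{2} f_s$$

    $$P_{seq} = 35 C_a V_0^2 f_s, \qquad P_{par} = \tfrac{1}{2} \cdot 55 C_a \beta^2 V_0^2 f_s$$

    $$\text{ratio} = \frac{P_{par}}{P_{seq}} = \frac{0.555 \times 55}{2 \times 35} = 43.6\%$$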
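
The power ratio follows from the component counts and the value of β. The sketch below uses a hypothetical helper, expresses all capacitances in units of Ca (so Cm = 8), and takes β = 0.745 from the supply voltage calculation rather than recomputing it.

```python
def total_capacitance(n_mul, n_add, c_mul, c_add=1.0):
    """Total switched capacitance of the multipliers and adders."""
    return n_mul * c_mul + n_add * c_add


C_M = 8.0                                   # Cm = 8 Ca
c_seq = total_capacitance(4, 3, C_M)        # sequential 4-tap filter
c_par = total_capacitance(6, 7, C_M)        # area-efficient 2-parallel filter

beta = 0.745                                # from the supply voltage calculation
freq_scale = 0.5                            # f_par = f_seq / 2

# P is proportional to C * V^2 * f, so V0^2 and f_seq cancel in the ratio.
power_ratio = (c_par * beta ** 2 * freq_scale) / c_seq
```

Unlike the plain 2-parallel design, the ratio here is not simply β², because the total capacitance grows by 55/35 rather than by exactly the block size.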

    Combining pipelining and parallel processing

    Pipelining

    Reduces the capacitance to be charged/discharged in 1 clock period

    Parallel processing

    Increases the clock period for charging/discharging the original capacitance

    3-parallel

    2-stage pipelining

    pipelining + parallel processing

    Propagation delay of the parallel pipelined filter

    $$L\, T_{pd} = \frac{L\, C_{charge} V_0}{k (V_0 - V_t)^2} = \frac{(C_{charge}/M)\, \beta V_0}{k (\beta V_0 - V_t)^2}$$

    Solution of β

    $$M L (\beta V_0 - V_t)^2 = \beta (V_0 - V_t)^2$$

