CS222: Pipeline Processor · 2017. 4. 12.

Date post: 01-Feb-2021
CS222: Pipeline Processor Design, Dr. A. Sahu, Dept of Comp. Sc. & Engg., Indian Institute of Technology Guwahati
Transcript
  • CS222: Pipeline Processor Design

    Dr.  A. Sahu

    Dept of Comp. Sc. & Engg.

    Indian Institute of Technology Guwahati

  • Outline

    • Pipeline processor
    • Basic structure of pipeline
    • Hazards
      – Data hazards (data dependency)
      – Resource hazards (same resource used in two stages)
      – Control hazards (branch instruction)

  • Problems with single cycle design

    • Slowest instruction pulls down the clock frequency
    • Resource utilization is poor
    • There are some instructions which are impossible to implement in this manner
      – Think: which instructions are these?

  • 1. Clock period in single cycle design

    [Figure: delay bars for R-class, lw, sw, beq and j instructions, built from the component delays tI, tR, tA, tM and t+; the clock period is marked against the bars.]

  • 1. Clock period in multi-cycle design

    [Figure: delay bars for R-class, lw, sw, beq and j instructions (components tI, tR, tA, tM, t+), with the multi-cycle clock period marked against them.]
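The two clock-period slides can be sketched numerically: in a single-cycle design the period must cover the whole path of the slowest instruction, while in a multi-cycle design it only has to cover the slowest single step. The delay values and the per-instruction component lists below are illustrative assumptions, not numbers taken from the slides.

```python
# Hypothetical component delays in ns (assumed values, not from the slides).
DELAY = {"tI": 2.0, "tR": 1.0, "tA": 2.0, "tM": 2.0, "t+": 1.0}

# Assumed sequence of components each instruction class passes through.
PATHS = {
    "R-class": ["tI", "tR", "tA", "tR"],
    "lw": ["tI", "tR", "tA", "tM", "tR"],
    "sw": ["tI", "tR", "tA", "tM"],
    "beq": ["tI", "tR", "tA"],
    "j": ["tI", "t+"],
}


def single_cycle_period(paths, delay):
    """Clock period must fit the slowest instruction end to end."""
    return max(sum(delay[c] for c in path) for path in paths.values())


def multi_cycle_period(paths, delay):
    """Clock period must fit only the slowest individual step."""
    return max(delay[c] for path in paths.values() for c in path)


if __name__ == "__main__":
    print("single-cycle period:", single_cycle_period(PATHS, DELAY))
    print("multi-cycle period:", multi_cycle_period(PATHS, DELAY))
```

With these assumed delays, lw sets the single-cycle period, while the multi-cycle period is set by the slowest component.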

  • Single Cycle Datapath

    [Figure: single-cycle datapath with PC, IM (ad, ins), RF (rad1, rad2, wad, wd, rd1, rd2), ALU, DM (ad, rd, wd), sx and s2 blocks, adders and multiplexers; labelled fields include ins[25-0], ins[25-21], ins[20-16], ins[15-11], ins[15-0], PC+4[31-28] and ja[31-0].]

  • Multi-Cycle: Resource Utilization

    • Merge IM/DM
      – Lw: IM/PC++ - R - ALU - DM - R
      – Sw: IM/PC++ - R - ALU - DM
    • Eliminate 1st adder and use ALU
      – As 1st adder is used in 1st cycle and ALU is free in 1st cycle
    • Eliminate 2nd adder and use ALU
      – As 2nd adder is used in 2nd cycle and ALU is free in 2nd cycle

  • Pipeline Design

    • Single Cycle
      – Poor resource utilization, TC >= longest instruction latency
    • Multi Cycle
      – TC >= longest stage latency, better utilization, but performance still needs to improve using a pipeline
      – When decoding INSi you can fetch INSi+1
    • Pipeline

  • Instruction Pipeline

    [Figure: successive instructions, each passing through the stages IF, D, EX, Mem, WB, overlapped in time.]

    Performance: 1 instruction per cycle

    All the stages work in parallel; no resource can be shared by stages
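The overlap on this slide can be reproduced with a small sketch that prints one row per instruction, each shifted right by one cycle. It assumes an ideal pipeline with no stalls; the function names are illustrative, not from the slides.

```python
STAGES = ["IF", "D", "EX", "Mem", "WB"]  # stage names as on the slide


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return one text row per instruction, each shifted right by one cycle."""
    rows = []
    for i in range(num_instructions):
        cells = [""] * i + list(stages)  # i idle cycles before IF
        rows.append(" ".join(f"{c:<4}" for c in cells).rstrip())
    return rows


def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to finish all instructions in an ideal pipeline (no stalls)."""
    return num_stages + num_instructions - 1


if __name__ == "__main__":
    for row in pipeline_diagram(6):
        print(row)
    print("cycles for 6 instructions:", total_cycles(6))
```

Once the pipeline is full, one instruction completes every cycle, which is where the "1 instruction per cycle" figure comes from under these idealized assumptions.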

  • Single cycle datapath (abstract)

    [Figure: abstract single-cycle datapath with PC, +4 adder, IM (ad, ins), RF (rad, wad, wd, rd1, rd2), ALU and DM (ad, rd, wd).]

  • Pipelined datapath

    [Figure: the abstract datapath (PC, +4 adder, IM, RF, ALU, DM) divided into the stages IF, ID, EX, Mem, WB by the pipeline registers IF/ID, ID/EX, EX/Mem, Mem/WB.]

  • Don't share resources in stages

    • In multi-cycle design
      – ALU used for PC++ and offset adding
      – Used for 1st adder and 2nd adder
      – Register file is used in 2nd and 4th cycle
    • In pipeline
      – Use separate resources: 1st adder, 2nd adder & ALU
      – Register file is accessed in 1st half of 2nd cycle and 2nd half of 4th cycle

  • Put back multiplexers

    [Figure: pipelined datapath (stages IF, ID, EX, Mem, WB; registers IF/ID, ID/EX, EX/Mem, Mem/WB) with the multiplexers, s2 and sx blocks and the rad1/rad2 register ports restored.]

  • Correction for WB stage

    [Figure: pipelined datapath with multiplexers (stages IF, ID, EX, Mem, WB; registers IF/ID, ID/EX, EX/Mem, Mem/WB), redrawn with the WB stage corrected.]

  • Abstract: Adding control

    [Figure: pipelined datapath with a control block and an ALU control block (Actrl) added.]

  • Control signals with delays

    [Figure: pipelined datapath with the control block and Actrl, with control signals delayed through the pipeline stages.]

  • Correction for RF write signal

    [Figure: pipelined datapath with the control block and Actrl, redrawn with the RF write signal corrected.]

  • Types of Pipelined processors

    • Degree of overlap
      – Serial, Overlapped, Pipelined, Super-pipelined
    • Depth
      – Shallow, Deep
    • Structure
      – Linear, Non-linear
    • Scheduling of operations
      – Static, Dynamic

  • Degree of overlap and depth

    [Figure: timing sketches contrasting degree of overlap (Serial, Overlapped, Pipelined) and depth (Shallow, Deep).]

  • Pipeline Structure

    [Figure: stages A, B, C arranged as a linear pipeline and as a non-linear pipeline.]

    Sequence: A, B, C, B, C, A, C, A

  • Scheduling/timing alternatives

    • Static
      – same sequence of stages for all instructions
      – all actions in order
      – if one instruction stalls, all subsequent instructions are delayed
    • Dynamic
      – above conditions are relaxed
      – higher throughput is achieved

  • Dynamic Scheduling

    • type 1 : beginnings (decode) and endings (put away) in order
    • type 2 : only beginnings in order
    • type 3 : no order restrictions except dependencies
    • type 1 extended : beginnings in order, references that affect memory state are in order [note that a memory reference may lead to page fault]

  • Pipelining and CPI

    Type                          CPI
    Serial                        5 – 6
    Overlapped                    3
    Pipelined (static)            1.5 – 2
    Pipelined (dynamic)           1.2 – 1.5
    Multiple instruction issue    < 1.0
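To read the CPI table as relative performance, the sketch below turns the CPI ranges into speedup ranges over the serial case. It assumes the same instruction count and the same clock period for every processor type, which the slides do not state; the function name is illustrative.

```python
# CPI ranges (min, max) copied from the table on this slide.
CPI_RANGE = {
    "Serial": (5.0, 6.0),
    "Overlapped": (3.0, 3.0),
    "Pipelined (static)": (1.5, 2.0),
    "Pipelined (dynamic)": (1.2, 1.5),
}


def speedup_over_serial(kind):
    """Return (worst, best) speedup versus serial execution.

    Assumes equal instruction count and clock period, so the
    speedup is simply the ratio of the CPI values.
    """
    serial_lo, serial_hi = CPI_RANGE["Serial"]
    lo, hi = CPI_RANGE[kind]
    return serial_lo / hi, serial_hi / lo


if __name__ == "__main__":
    for kind in CPI_RANGE:
        worst, best = speedup_over_serial(kind)
        print(f"{kind}: {worst:.2f}x to {best:.2f}x")
```

In practice a deeper pipeline usually also changes the clock period, so these ratios are only a first-order comparison.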

  • Hazards in Pipelining

    • Data dependencies => Data hazards
      – RAW (read after write)
      – WAR (write after read)
      – WAW (write after write)
    • Resource conflicts => Structural hazards
      – use of same resource in different stages
    • Procedural dependencies => Control hazards
      – conditional and unconditional branches, calls/returns
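The three data hazard classes can be illustrated by comparing the register sets of two instructions in program order. This is a minimal sketch: the `Instr` type and the concrete operands (`$t0`, `$t2`, the `0($t0)` address) are hypothetical, chosen only to make the example complete.

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    text: str
    writes: FrozenSet[str]  # registers written
    reads: FrozenSet[str]   # registers read


def data_hazards(earlier: Instr, later: Instr) -> Set[str]:
    """Classify the dependences of `later` on `earlier` (program order)."""
    hazards = set()
    if earlier.writes & later.reads:
        hazards.add("RAW")  # later reads what earlier writes
    if earlier.reads & later.writes:
        hazards.add("WAR")  # later overwrites what earlier still reads
    if earlier.writes & later.writes:
        hazards.add("WAW")  # both write the same register
    return hazards


# Hypothetical operands; only $t1 and $s1 echo the lw/add pair used later.
lw = Instr("lw $t1, 0($t0)", frozenset({"$t1"}), frozenset({"$t0"}))
add = Instr("add $s1, $t1, $t2", frozenset({"$s1"}), frozenset({"$t1", "$t2"}))

if __name__ == "__main__":
    print(data_hazards(lw, add))
```

Whether a given dependence actually causes a stall depends on the pipeline; in a simple in-order pipeline it is typically only RAW that matters.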

  • Data Hazards

    [Figure: read/write points of the previous and current instructions on a timeline; delay = 3.]

  • Structural Hazards

    Caused by resource conflicts

    • Use of a hardware resource in more than one cycle
      [Figure: three overlapped instructions, each using resources A B A C]
    • Different sequences of resource usage by different instructions
      [Figure: one instruction using A B C D, another using A C B D]
    • Non-pipelined multi-cycle resources
      [Figure: two overlapped instructions, each using F D X X]
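The three cases on this slide can be checked mechanically: lay each instruction's resource sequence on a cycle axis and report any cycle in which two instructions need the same resource. The sketch assumes each resource is occupied for exactly one cycle per use and that instructions issue one cycle apart; the function name is illustrative.

```python
from collections import defaultdict


def resource_conflicts(usages, issue_cycles):
    """Find structural hazards.

    usages: one resource sequence per instruction, e.g. ["A", "B", "A", "C"].
    issue_cycles: cycle in which each instruction starts.
    Returns sorted (cycle, resource, instruction_indices) tuples.
    """
    busy = defaultdict(list)  # (cycle, resource) -> instruction indices
    for idx, (seq, start) in enumerate(zip(usages, issue_cycles)):
        for offset, resource in enumerate(seq):
            busy[(start + offset, resource)].append(idx)
    return sorted(
        (cycle, resource, tuple(ids))
        for (cycle, resource), ids in busy.items()
        if len(ids) > 1
    )


if __name__ == "__main__":
    abac = list("ABAC")
    # Same resource used in more than one cycle.
    print(resource_conflicts([abac, abac, abac], [0, 1, 2]))
    # Different usage orders in different instructions.
    print(resource_conflicts([list("ABCD"), list("ACBD")], [0, 1]))
    # Multi-cycle, non-pipelined resource X.
    print(resource_conflicts([list("FDXX"), list("FDXX")], [0, 1]))
```

In the first case the third instruction collides with the first on resource A; in the other two cases adjacent instructions collide on C and on X respectively.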

  • Handling Data Hazards

    [Figure: write (W) of the previous instruction and read (R) of the current instruction, with two remedies:]

    1. Data forwarding

    2. Instruction reordering

  • Stalls due to data hazards

    [Figure: pipeline timing (IM, RF, ALU, DM, RF) for instruction I: lw $t1,... followed by I+1: add $s1,$t1,..]

  • Stalls due to control hazards

    [Figure: pipeline timing (IM, RF, ALU, DM, RF) for I: beq ...,L, the inline instructions I+1 and I+2, and the branch target L: add ...]

  • Control Hazards

    [Figure: branch instruction with its cond eval and target addr gen steps; next inline instruction, delay = 2; target instruction, delay = 5.]

    • the order of cond eval and target addr gen may be different
    • cond eval may be done in previous instruction

  • Handling hazards

    • Data hazards
      – detect instructions with data dependence
      – introduce nop instructions (bubbles) in the pipeline
      – more complex: data forwarding
    • Control hazards
      – detect branch instructions
      – flush inline instructions if branching occurs
      – more complex: branch prediction
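The "detect and insert bubbles" approach for data hazards can be sketched as a pass over the instruction list. The required gap between a producer and its consumer depends on the actual pipeline; the default `gap=2` below is an assumed value, and the instruction tuples with their operands are hypothetical.

```python
def insert_bubbles(program, gap=2):
    """Insert nops so a consumer is at least `gap` slots behind its producer.

    program: list of (text, dest_register_or_None, source_registers).
    gap: assumed number of slots that must separate a producer from a
         dependent instruction (pipeline-specific, not from the slides).
    """
    out = []  # (text, dest) pairs, including inserted nops
    for text, dest, sources in program:
        needed = 0
        recent = out[max(0, len(out) - gap):]  # only the last `gap` slots matter
        for distance, (_, prev_dest) in enumerate(reversed(recent), start=1):
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, gap - distance + 1)
        out.extend([("nop", None)] * needed)
        out.append((text, dest))
    return [text for text, _ in out]


if __name__ == "__main__":
    program = [
        ("lw $t1, 0($t0)", "$t1", ("$t0",)),
        ("add $s1, $t1, $t2", "$s1", ("$t1", "$t2")),
    ]
    for line in insert_bubbles(program):
        print(line)
```

Placing an independent instruction between the producer and the consumer reduces the number of nops needed.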

  • Pipeline Data Hazards

    • Stalls due to data hazards
    • Control to introduce stall cycles
    • Detecting data hazard conditions
    • Data forwarding paths
    • Data forwarding control
    • Stalls with data forwarding

  • Stalls due to data hazards

    [Figure: instruction view of I: lw $t1,... followed by I+1: add $s1,$t1,.. through IM, RF, ALU, DM, RF, showing one timing marked × and one marked √.]

  • Actual forwarding paths

    [Figure: forwarding paths across the EX, Mem and WB stages (registers ID/EX, EX/Mem, Mem/WB), with multiplexers around the ALU and DM selected by fwdA, fwdB and fwdC.]
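One common way to drive forwarding multiplexers such as fwdA and fwdB is the textbook-style comparison of the ALU source registers against the destination registers held in EX/Mem and Mem/WB. The sketch below follows that scheme as an assumption. The select encoding (0 = register file, 1 = Mem/WB, 2 = EX/Mem), the `StageReg` type and the treatment of register 0 as never forwarded are not taken from the slides, and fwdC is not modelled.

```python
from dataclasses import dataclass


@dataclass
class StageReg:
    """Hypothetical slice of a pipeline register (EX/Mem or Mem/WB)."""
    reg_write: bool  # instruction in this stage writes the register file
    dest: int        # destination register number


# Assumed mux select encoding, not confirmed by the slides.
NO_FWD, FROM_MEM_WB, FROM_EX_MEM = 0, 1, 2


def forward_select(src, ex_mem, mem_wb):
    """Pick the newest in-flight value for ALU source register `src`."""
    if ex_mem.reg_write and ex_mem.dest != 0 and ex_mem.dest == src:
        return FROM_EX_MEM  # most recent producer wins
    if mem_wb.reg_write and mem_wb.dest != 0 and mem_wb.dest == src:
        return FROM_MEM_WB
    return NO_FWD


def forwarding_control(rs, rt, ex_mem, mem_wb):
    """Return (fwdA, fwdB) selects for the two ALU operands."""
    return forward_select(rs, ex_mem, mem_wb), forward_select(rt, ex_mem, mem_wb)


if __name__ == "__main__":
    ex_mem = StageReg(reg_write=True, dest=9)
    mem_wb = StageReg(reg_write=True, dest=10)
    print(forwarding_control(9, 10, ex_mem, mem_wb))
```

Checking EX/Mem before Mem/WB means that when both stages hold a write to the same register, the younger result is the one forwarded.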
