
Chapter 8 - Pipe Lining

Date post: 07-Apr-2018
Upload: vikas-singhal

  • 8/6/2019 Chapter 8 - Pipe Lining


    Chapter 8. Pipelining


    Overview

    Pipelining is widely used in modern processors.

    Pipelining improves system performance in terms of throughput.

    Pipelined organization requires sophisticated compilation techniques.


    Basic Concepts


    Making the Execution of Programs Faster

    Use faster circuit technology to build the processor and the main memory.

    Arrange the hardware so that more than one operation can be performed at the same time.

    In the latter approach, the number of operations performed per second increases even though the elapsed time needed to perform any one operation does not change.


    Traditional Pipeline Concept

    Laundry Example

    Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.

    Washer takes 30 minutes

    Dryer takes 40 minutes

    Folder takes 20 minutes


    Traditional Pipeline Concept

    Sequential laundry takes 6 hours for 4 loads.

    If they learned pipelining, how long would laundry take?

    [Figure: sequential laundry timeline for loads A to D, 6 PM to midnight; each load takes 30 + 40 + 20 minutes before the next one starts]


    Traditional Pipeline Concept

    Pipelined laundry takes 3.5 hours for 4 loads.

    [Figure: pipelined laundry timeline, loads A to D in task order, starting at 6 PM; segments of 30, 40, 40, 40, 40, and 20 minutes]
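The 6-hour and 3.5-hour figures can be checked with a short simulation. This is a minimal sketch under stated assumptions: each load goes washer, dryer, folder in order, a machine takes the next load as soon as it is free, and a load that has finished one step may wait (say, in a basket) until the next machine frees up. The function name `laundry_finish_time` is hypothetical, not from the slides.

```python
def laundry_finish_time(stage_minutes, n_loads):
    """Minute at which the last load is folded, with loads overlapped."""
    # machine_free[s]: time at which machine s finishes its previous load
    machine_free = [0] * len(stage_minutes)
    for _ in range(n_loads):
        t = 0  # time at which this load is ready for its next step
        for s, minutes in enumerate(stage_minutes):
            start = max(t, machine_free[s])  # wait for the load and for the machine
            t = start + minutes
            machine_free[s] = t
    return machine_free[-1]


STAGES = [30, 40, 20]  # washer, dryer, folder (minutes)
sequential = 4 * sum(STAGES)                # one load at a time
pipelined = laundry_finish_time(STAGES, 4)  # loads overlapped
print(sequential / 60, pipelined / 60)      # 6.0 3.5
```

After the first load, one load finishes every 40 minutes, the dryer's time, which is why the slowest stage sets the rate.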


    Traditional Pipeline Concept

    Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.

    Pipeline rate is limited by the slowest pipeline stage.

    Multiple tasks operate simultaneously using different resources.

    Potential speedup = number of pipe stages.

    Unbalanced lengths of pipe stages reduce speedup.

    Time to fill the pipeline and time to drain it reduce speedup.

    Stall for dependences.

    [Figure: pipelined laundry timeline, loads A to D in task order, 6 PM to 9 PM; segments of 30, 40, 40, 40, 40, and 20 minutes]
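The speedup bullets can be written out as timing formulas. This sketch assumes n tasks, each visiting all k stages in order, stage i taking time τ_i, a stage starting the next task as soon as it is free, and no dependence stalls. The first line is the laundry case; the last line is the clocked case, in which every stage gets one clock period equal to the slowest stage.

```latex
\begin{aligned}
T_{\text{seq}} &= n \sum_{i=1}^{k} \tau_i, \qquad
T_{\text{pipe}} = \sum_{i=1}^{k} \tau_i + (n-1)\max_i \tau_i \\
\text{Laundry } (n = 4):\quad T_{\text{seq}} &= 4\,(30+40+20) = 360\ \text{min}, \qquad
T_{\text{pipe}} = 90 + 3 \cdot 40 = 210\ \text{min} \\
\text{Clocked, } \tau = \max_i \tau_i:\quad T_{\text{pipe}} &= (k+n-1)\,\tau, \qquad
S = \frac{n \sum_i \tau_i}{(k+n-1)\,\tau} \;\xrightarrow{\;n\to\infty\;}\; \frac{\sum_i \tau_i}{\max_i \tau_i} \le k
\end{aligned}
```

The limit equals k only when all stages are balanced, and the k − 1 term in the denominator is the cost of filling and draining the pipeline.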


    Use the Idea of Pipelining in a Computer

    [Figure 8.1. Basic idea of instruction pipelining. (a) Sequential execution: F1 E1 F2 E2 F3 E3 for instructions I1 to I3, each instruction doing fetch and then execution. (b) Hardware organization: an instruction fetch unit and an execution unit separated by interstage buffer B1. (c) Pipelined execution over clock cycles 1 to 4: the fetch of each instruction overlaps the execution of the previous one.]


    Use the Idea of Pipelining in a Computer

    [Figure 8.2. A 4-stage pipeline. (a) Instruction execution divided into four steps: instructions I1 to I4 each pass through Fi, Di, Ei, Wi in successive clock cycles 1 to 7. (b) Hardware organization: four stages (Fetch, Decode, Execution, Write) separated by interstage buffers B1, B2, B3.]

    F: Fetch instruction

    D: Decode instruction and fetch operands

    E: Execute operation

    W: Write results

    Textbook page: 457
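In the ideal case where every step takes exactly one clock cycle and nothing stalls, instruction Ii performs step s in cycle i + s - 1, so n instructions finish in k + n - 1 cycles. A small sketch that prints the diagram of Figure 8.2a under that assumption (the function name `timing_diagram` is hypothetical):

```python
def timing_diagram(n_instr, stages=("F", "D", "E", "W")):
    """Return (total cycles, printable rows) for an ideal, stall-free pipeline."""
    k = len(stages)
    total_cycles = k + n_instr - 1
    rows = ["Cycle " + "".join(f"{c:>4}" for c in range(1, total_cycles + 1))]
    for i in range(n_instr):
        cells = [""] * total_cycles
        for s, name in enumerate(stages):
            # instruction i+1 is in stage s during cycle i+s+1 (0-based index i+s)
            cells[i + s] = f"{name}{i + 1}"
        rows.append(f"I{i + 1:<5}" + "".join(f"{cell:>4}" for cell in cells))
    return total_cycles, rows


total, rows = timing_diagram(4)
print("\n".join(rows))
print("total cycles:", total)  # 7
```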


    Role of Cache Memory

    Each pipeline stage is expected to complete in one clock cycle.

    The clock period should be long enough to let the slowest pipeline stage complete. Faster stages can only wait for the slowest one to complete.

    Since main memory is very slow compared to execution, if each instruction had to be fetched from main memory, pipelining would be almost useless.

    Fortunately, we have cache.
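To see why the slowest stage, and therefore a fetch from main memory, dominates, here is a sketch assuming the clock period equals the slowest stage delay, an unpipelined processor spends the sum of the stage delays on each instruction, and latch overhead is ignored. The nanosecond delays, including the tenfold slower main-memory fetch, are made-up illustrative numbers, not figures from the slides; `pipeline_speedup` is a hypothetical name.

```python
def pipeline_speedup(stage_delays_ns, n_instr):
    """Return (clock period, speedup over an unpipelined processor)."""
    k = len(stage_delays_ns)
    period = max(stage_delays_ns)           # the slowest stage sets the clock
    pipelined = (k + n_instr - 1) * period  # fill the pipeline, then 1 per cycle
    unpipelined = n_instr * sum(stage_delays_ns)
    return period, unpipelined / pipelined


# Hypothetical F, D, E, W delays: fetch hits in the cache vs. goes to main memory
print(pipeline_speedup([1, 1, 1, 1], 1000))   # period 1 ns, speedup close to 4
print(pipeline_speedup([10, 1, 1, 1], 1000))  # period 10 ns, speedup about 1.3
```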


    Pipeline Performance

    The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.

    However, this increase would be achieved only if all pipeline stages required the same time to complete and there were no interruptions throughout program execution. Unfortunately, this is not the case in practice.


    Pipeline Performance

    [Figure 8.3. Effect of an execution operation taking more than one clock cycle: instructions I1 to I5 over clock cycles 1 to 9. E2 occupies three cycles, so the later steps of I3, I4, and I5 are delayed.]
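The stall in Figure 8.3 can be reproduced with a simple recurrence, sketched here under the assumption that each stage holds one instruction and cannot accept the next one until the current one has moved on to the following stage. `durations[i][s]` is the number of cycles instruction i spends in stage s, in the order F, D, E, W; all names are hypothetical.

```python
def schedule(durations):
    """Return (start, end) cycle tables, 0-based, for an in-order pipeline."""
    n, k = len(durations), len(durations[0])
    start = [[0] * k for _ in range(n)]
    end = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            # The instruction must have finished its previous stage ...
            ready = end[i][s - 1] if s > 0 else 0
            if i > 0:
                # ... and the previous instruction must have vacated stage s,
                # i.e. entered stage s+1 (or finished, if s is the last stage).
                vacated = start[i - 1][s + 1] if s + 1 < k else end[i - 1][s]
                ready = max(ready, vacated)
            start[i][s] = ready
            end[i][s] = ready + durations[i][s]
    return start, end


def total_cycles(durations):
    """Clock cycle in which the last instruction completes its Write step."""
    _, end = schedule(durations)
    return end[-1][-1]


ideal = [[1, 1, 1, 1] for _ in range(4)]
fig_8_3 = [[1, 1, 1, 1], [1, 1, 3, 1], [1, 1, 1, 1], [1, 1, 1, 1]]  # E2 takes 3 cycles
print(total_cycles(ideal), total_cycles(fig_8_3))  # 7 9
```

The two extra cycles are the two-cycle stall described on the next slide. Making the first entry of I2 equal to 4 instead (a fetch that misses in the cache) models Figure 8.4 and gives 9 cycles for three instructions; the same function can be used to check the quiz.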


    Pipeline Performance

    The previous pipeline is said to have been stalled for two clock cycles.

    Any condition that causes a pipeline to stall is called a hazard.

    Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.

    Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.

    Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.


    Pipeline Performance

    [Figure 8.4. Pipeline stall caused by a cache miss in F2 (an instruction hazard). (a) Instruction execution steps in successive clock cycles 1 to 9 for I1 to I3: the fetch of I2 occupies cycles 2 to 5. (b) Function performed by each processor stage in successive clock cycles: while F2 repeats, the Decode, Execute, and Write stages each sit idle for three cycles; these idle periods are the stalls (bubbles).]


    Pipeline Performance

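    [Figure 8.5. Effect of a Load instruction on pipeline timing, clock cycles 1 to 7: I2 is Load X(R1), R2, which needs an extra memory-access step M2 between E2 and W2. I2 and I3 then need the same hardware resource in the same cycle (a structural hazard), so I3 and the following instructions I4 and I5 are delayed.]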


    Pipeline Performance

    Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.

    Throughput is measured by the rate at which instruction execution is completed.

    Pipeline stalls degrade pipeline performance.

    We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.


    Quiz

    Four instructions are executed, and I2 takes two clock cycles in the Execute stage. Please draw the timing diagram for the 4-stage pipeline, and work out the total number of clock cycles needed for the four instructions to complete.


    Data Hazards


    Data Hazards

    We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.

    Hazard occurs:

    A ← 3 + A
    B ← 4 × A

    No hazard:

    A ← 5 × C
    B ← 20 + C

    When two operations depend on each other, they must be executed sequentially in the correct order. Another example:

    Mul R2, R3, R4

    Add R5, R4, R6
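The requirement that pipelined results match sequential results can be illustrated by evaluating each pair of operations two ways: strictly in order, and "concurrently", meaning both operations read the values that existed before either one wrote. This is a sketch; the starting values (A = 5, C = 2) and function names are arbitrary assumptions for illustration.

```python
def sequential(ops, env):
    """Apply (destination, expression) pairs one after another."""
    env = dict(env)
    for dest, expr in ops:
        env[dest] = expr(env)  # each operation sees earlier results
    return env


def concurrent(ops, env):
    """Every operation reads the original values; writes happen afterwards."""
    snapshot = dict(env)
    result = dict(env)
    for dest, expr in ops:
        result[dest] = expr(snapshot)
    return result


with_hazard = [("A", lambda e: 3 + e["A"]), ("B", lambda e: 4 * e["A"])]
no_hazard = [("A", lambda e: 5 * e["C"]), ("B", lambda e: 20 + e["C"])]

init = {"A": 5, "B": 0, "C": 2}
print(sequential(with_hazard, init))  # {'A': 8, 'B': 32, 'C': 2}
print(concurrent(with_hazard, init))  # {'A': 8, 'B': 20, 'C': 2}
print(sequential(no_hazard, init) == concurrent(no_hazard, init))  # True
```

The overlapped version of the first pair reads the old A and gets B = 20 instead of 32; the second pair agrees either way because neither operation reads what the other writes.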


    Data Hazards

    [Figure 8.6. Pipeline stalled by data dependency between D2 and W1, clock cycles 1 to 9: I1 (Mul) proceeds F1 D1 E1 W1; I2 (Add) cannot complete decoding (D2, later D2A) until W1 has written the result, which delays E2 and W2 and the following instructions I3 and I4.]


    Operand Forwarding

    Instead of reading the operand from the register file, the second instruction can get the data directly from the output of the ALU as soon as the previous instruction completes its Execute step.

    A special arrangement needs to be made to forward the output of the ALU to the input of the ALU.


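    [Figure 8.7. Operand forwarding in a pipelined processor. (a) Datapath: the register file supplies Source 1 and Source 2 to registers SRC1 and SRC2 at the ALU inputs, and the ALU result goes to RSLT and on to the destination; a forwarding path leads from RSLT back to the ALU inputs. (b) Position of the source and result registers in the processor pipeline: SRC1 and SRC2 feed E: Execute (ALU), and RSLT feeds W: Write (register file).]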
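A behavioural sketch of the forwarding decision, not a hardware description: when the instruction whose result is in RSLT is about to write a register that the next instruction reads, the ALU input takes RSLT instead of the register file. It assumes, as the dependency in Figure 8.6 implies, that the last register in Mul R2, R3, R4 is the destination; the register contents and function names are hypothetical.

```python
ALU = {"Mul": lambda a, b: a * b, "Add": lambda a, b: a + b}


def read_operand(reg, regs, rslt_dst, rslt):
    """Pick an ALU input: the forwarded RSLT value if it belongs to `reg`."""
    if rslt_dst is not None and reg == rslt_dst:
        return rslt   # forwarding path: ALU output fed straight back to an ALU input
    return regs[reg]  # otherwise read the (possibly stale) register file


def execute_back_to_back(i1, i2, regs, forwarding):
    """Run i2's Execute step in the cycle after i1's, before i1 is written back.

    Instructions are (op, src1, src2, dst) tuples.
    """
    op1, a1, b1, dst1 = i1
    op2, a2, b2, _ = i2
    rslt = ALU[op1](regs[a1], regs[b1])  # i1's result, held in RSLT, not yet written
    rslt_dst = dst1 if forwarding else None
    return ALU[op2](read_operand(a2, regs, rslt_dst, rslt),
                    read_operand(b2, regs, rslt_dst, rslt))


regs = {"R2": 3, "R3": 4, "R4": 0, "R5": 10}  # hypothetical contents; R4 is stale
mul = ("Mul", "R2", "R3", "R4")               # R4 <- R2 * R3
add = ("Add", "R5", "R4", "R6")               # R6 <- R5 + R4
print(execute_back_to_back(mul, add, regs, forwarding=True))   # 22
print(execute_back_to_back(mul, add, regs, forwarding=False))  # 10 (stale R4)
```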


    Handling Data Hazards in Software

    Let the compiler detect and handle the hazard:

    I1: Mul R2, R3, R4

    NOP

    NOP

    I2: Add R5, R4, R6

    The compiler can reorder the instructions to perform some useful work during the NOP slots.
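A compiler pass that pads dependent instructions with NOPs might look like this sketch. It assumes the two-instruction spacing the slide uses (`gap = 2`) and represents each instruction only by the registers it reads and writes; the `Instr` type and `insert_nops` are hypothetical names, not part of any real compiler.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    srcs: Tuple[str, ...]  # registers read
    dst: Optional[str]     # register written (None for NOP)


NOP = Instr("NOP", (), None)


def insert_nops(program: List[Instr], gap: int = 2) -> List[Instr]:
    """Pad `program` so that an instruction is separated from any earlier
    instruction whose result it reads by at least `gap` other instructions."""
    out: List[Instr] = []
    for ins in program:
        needed = 0
        recent = out[-gap:] if gap > 0 else []
        # back = 1 is the slot right before `ins`, back = 2 the one before that, ...
        for back, prev in enumerate(reversed(recent), start=1):
            if prev.dst is not None and prev.dst in ins.srcs:
                needed = max(needed, gap + 1 - back)
        out.extend([NOP] * needed)
        out.append(ins)
    return out


mul = Instr("Mul R2, R3, R4", ("R2", "R3"), "R4")
add = Instr("Add R5, R4, R6", ("R5", "R4"), "R6")
for ins in insert_nops([mul, add]):
    print(ins.text)
```

If the compiler can move an independent instruction (a hypothetical `Sub R7, R8, R9`) between the two, the same function inserts only one NOP, which is the reordering idea in the line above.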


    Side Effects

    The previous example is explicit and easily detected. Sometimes, however, an instruction changes the contents of a register other than the one named as the destination.

    When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. Example: condition code flags:

    Add R1, R3

    AddWithCarry R2, R4

    Instructions designed for execution on pipelined hardware should have few side effects.
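A sketch of why the flag creates a hidden dependency, assuming 32-bit registers, a single carry flag C, and the same operand order as the Mul/Add example (the last register is the destination). The pair below adds two 64-bit numbers held in register pairs; the register assignments and values are illustrative.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit registers


def add(regs, flags, src, dst):
    """Add src, dst  =>  dst <- src + dst.  Side effect: sets the carry flag C."""
    total = regs[src] + regs[dst]
    regs[dst] = total & MASK
    flags["C"] = total >> 32  # written even though C is not a named operand


def add_with_carry(regs, flags, src, dst):
    """AddWithCarry src, dst  =>  dst <- src + dst + C.  Implicitly reads C."""
    total = regs[src] + regs[dst] + flags["C"]
    regs[dst] = total & MASK
    flags["C"] = total >> 32


# 64-bit addition: x in (R2:R1) plus y in (R4:R3), result in (R4:R3)
x, y = 0x1_FFFF_FFFF, 0x1
regs = {"R1": x & MASK, "R2": x >> 32, "R3": y & MASK, "R4": y >> 32}
flags = {"C": 0}
add(regs, flags, "R1", "R3")             # low words; produces the carry
add_with_carry(regs, flags, "R2", "R4")  # high words; consumes the carry
result = (regs["R4"] << 32) | regs["R3"]
print(hex(result))  # 0x200000000
```

Nothing in the text of AddWithCarry R2, R4 names C, so hardware or a compiler that only compares named registers would not see that it must wait for Add R1, R3.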


    Instruction Hazards


    Overview

    Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls.

    Cache miss

    Branch


    Unconditional Branches

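    [Figure 8.8. An idle cycle caused by a branch instruction, clock cycles 1 to 6: I1 (F1 E1) and I2 (Branch) (F2 E2) execute normally; I3 is fetched (F3) and then discarded (X); the branch target Ik (Fk Ek) and Ik+1 (Fk+1 Ek+1) follow, leaving the execution unit idle for one cycle.]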


    Branch Timing

    [Figure 8.9. Branch timing in the 4-stage pipeline for I1, I2 (Branch), I3, Ik, and Ik+1. (a) Branch address computed in the Execute stage: I3 and I4 are fetched and then discarded (X) before the target Ik is fetched. (b) Branch address computed in the Decode stage: only I3 is fetched and discarded.]

    - Branch penalty

    - Reducing the penalty


    Instruction Queue and Prefetching

    [Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b: the instruction fetch unit fills an instruction queue, from which the dispatch/decode unit feeds the Execute and Write stages.]

    F: Fetch instruction

    D: Dispatch/Decode unit

    E: Execute instruction

    W: Write results
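A toy cycle-level sketch of the idea, not the textbook's design: assume the fetch unit needs `fetch[i]` cycles for instruction i (a cache miss simply takes longer), the queue holds `queue_size` instructions, and the dispatch side hands the oldest queued instruction to the execution unit whenever it is free, which then stays busy for `execute[i]` cycles. The function `run` and the workload numbers are hypothetical. With a deeper queue, the fetch unit keeps prefetching while execution is busy, and dispatch keeps draining the queue while fetch is stalled.

```python
from collections import deque


def run(fetch, execute, queue_size):
    """Return the clock cycle in which the last instruction finishes executing.

    fetch[i]   : cycles the fetch unit needs for instruction i
    execute[i] : cycles the execution unit is busy with instruction i (>= 1)
    queue_size : capacity of the instruction queue between fetch and dispatch
    """
    n = len(fetch)
    queue = deque()
    next_fetch = 0                     # instruction the fetch unit is working on
    fetch_left = fetch[0] if n else 0  # cycles still needed for that fetch
    exec_left = 0                      # cycles left for the executing instruction
    done = 0
    cycle = 0
    while done < n:
        cycle += 1
        # Dispatch: an idle execution unit takes the oldest queued instruction.
        if exec_left == 0 and queue:
            exec_left = execute[queue.popleft()]
        if exec_left > 0:
            exec_left -= 1
            if exec_left == 0:
                done += 1
        # Fetch: keep fetching ahead while there is room in the queue.
        if next_fetch < n:
            if fetch_left > 0:
                fetch_left -= 1
            if fetch_left == 0 and len(queue) < queue_size:
                queue.append(next_fetch)
                next_fetch += 1
                if next_fetch < n:
                    fetch_left = fetch[next_fetch]
    return cycle


FETCH = [1, 1, 1, 1, 1, 4, 1, 1]    # I6 misses in the cache
EXECUTE = [1, 3, 1, 1, 1, 1, 1, 1]  # I2 keeps the execution unit busy for 3 cycles
print(run(FETCH, EXECUTE, queue_size=1))  # 14
print(run(FETCH, EXECUTE, queue_size=4))  # 12
```

With a four-entry queue, the instructions fetched while I2 occupies the execution unit are dispatched during I6's miss, hiding part of that miss.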


    Conditional Branches

    A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction. The decision to branch cannot be made until the execution of that instruction has been completed.

    Branch instructions represent about 20% of the dynamic instruction count of most programs.
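A rough way to quantify the cost, assuming every branch pays the full penalty read off Figure 8.9 (two cycles when the branch address is computed in the Execute stage, one when it is computed in the Decode stage), one instruction per cycle otherwise, and the roughly 20% branch frequency quoted above. Conditional branches that must also wait for a preceding result could cost more; the function name `effective_cpi` is hypothetical.

```python
def effective_cpi(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Average clock cycles per instruction when each branch adds idle cycles."""
    return base_cpi + branch_fraction * penalty_cycles


for where, penalty in (("Execute", 2), ("Decode", 1)):
    cpi = effective_cpi(0.20, penalty)
    print(f"branch address computed in {where}: CPI = {cpi:.2f}, "
          f"{1 / cpi:.0%} of ideal throughput")
```

Moving the branch decision one stage earlier halves the penalty term, which is why computing the branch address in the Decode stage is one way of reducing the penalty.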

