
08-speculation_2 [Compatibility Mode].pdf

Date post: 02-Jun-2018
Upload: arezo-shafiee

Transcript
  • 8/10/2019 08-speculation_2 [Compatibility Mode].pdf


    Dynamic ILP

    Speculation


    Outline

    Speculation

    Re-order buffers

    Limits to ILP


    Speculation

    Branch Prediction

    Out-of-Order Execution


    Branch Prediction and Speculative Execution

    Speculation means running instructions based on a prediction; predictions could be wrong.

    Branch prediction: cannot be avoided, could be very accurate

    Misprediction is a less frequent event, but can we ignore it?

    Example:

    for (i=0; i


    Exception Behavior

    Preserving exception behavior: exceptions must be raised exactly as in sequential execution

    Same sequence as sequential

    No extra exceptions

    Example:

    DADDU R2,R3,R4
    BEQZ R2,L1
    LW R1,0(R2)

    L1:

    Problem with moving LW before BEQZ?

    Again, a dynamic execution must look like a sequential execution at any time when it is stopped
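
One way to see the problem with hoisting the load is a small software model of the three instructions. The sketch below is only an illustration: the `MemoryFault` exception, the dictionary used as memory, and the function names are assumptions made for this example, not part of the slides.

```python
class MemoryFault(Exception):
    """Hypothetical fault raised when an unmapped address is accessed."""


def load_word(memory, addr):
    # Model LW: fault on any address that is not mapped.
    if addr not in memory:
        raise MemoryFault(f"invalid address {addr:#x}")
    return memory[addr]


def run_in_order(r3, r4, memory, r1=0):
    """Execute DADDU, BEQZ, LW in program order and return R1."""
    r2 = r3 + r4                      # DADDU R2,R3,R4
    if r2 != 0:                       # BEQZ R2,L1 falls through only if R2 != 0
        r1 = load_word(memory, r2)    # LW R1,0(R2)
    return r1                         # L1:


def run_hoisted(r3, r4, memory, r1=0):
    """Same code with the load moved above the branch."""
    r2 = r3 + r4
    loaded = load_word(memory, r2)    # runs even when the branch would skip it
    if r2 != 0:
        r1 = loaded
    return r1


memory = {8: 42}                      # address 0 is unmapped in this model
print(run_in_order(4, 4, memory))     # both versions agree when R2 != 0
print(run_in_order(0, 0, memory))     # branch taken, no load, no fault
try:
    run_hoisted(0, 0, memory)
except MemoryFault as fault:
    print("extra exception:", fault)  # the sequential program never raises this
```

When R2 is zero the sequential program skips the load, so the hoisted version raises an exception that sequential execution would never produce.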


    Exceptions in Order

    Solutions:

    Early detection of FP exceptions

    The use of software mechanisms to restore a precise exception state before resuming execution

    Delaying instruction completion until we know an exception is impossible


    Precise Interrupts

    An interrupt is precise if the saved process state corresponds with a sequential model of program execution where one instruction completes before the next begins.

    Tomasulo had:

    In-order issue, out-of-order execution, and out-of-order completion

    Need to fix the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.


    Short Seminar: Precise Exceptions

    1. 01277582(Implementation of precise exception in a 5-stage pipeline embedded processor - CNF03).pdf

    2. 01354393(A 0.18-spl mu-m CMOS implementation of an area efficient precise exception handling unit for processing-in-memory systems - CNF04).pdf

    3. 00004607(Implementing precise interrupts in pipelined processors - JNL88).pdf


    Branch Prediction vs. Precise Interrupt

    Misprediction is an "exception" on the branch instruction

    Execution branches out on exceptions

    Every instruction is predicted not to take the branch to the interrupt handler

    Same technique for handling both issues: in-order completion or commit, i.e., change register/memory only in program order (sequential)

    How does it ensure correctness?


    HW Support for More ILP

    Speculation: allow an instruction to issue that is dependent on a branch predicted to be taken, without any consequences (including exceptions) if the branch is not actually taken (HW "undo")

    Combine branch prediction with dynamic scheduling to execute before branches are resolved

    Separate speculative bypassing of results from real bypassing of results

    When an instruction is no longer speculative, write boosted results (instruction commit) or discard boosted results

    Execute out-of-order but commit in-order to prevent irrevocable action (update state or exception) until instruction commits


    Reorder Buffer Implementation


    Result Shift Register

    "Result Shift Register" is used to control the result bus

    N is the length of the longest functional unit pipeline

    An instruction that takes i clock periods reserves stage i

    If the stage already contains valid control information, then issue is held until the next clock period

    Issuing instruction places control information in the result shift register:

    the functional unit that will be supplying the result

    the destination register

    This control information is also marked "valid"

    Each clock period, the control information is shifted down one stage toward stage one.

    When it reaches stage one, it is used during the next clock period to control the result bus
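
The reservation and shifting rules above can be sketched as a small software model. This is a simplified illustration, not the hardware design: the class name, the method names, and the use of `None` for an invalid stage are assumptions made for the example.

```python
class ResultShiftRegister:
    """Toy model: index 0 is stage one, index n-1 is stage N."""

    def __init__(self, n_stages):
        # None stands for a stage with no valid control information.
        self.stages = [None] * n_stages

    def try_issue(self, latency, unit, dest_reg):
        """Reserve stage `latency`; return False if issue must be held."""
        slot = latency - 1
        if self.stages[slot] is not None:
            return False              # stage already valid: hold issue this clock
        self.stages[slot] = (unit, dest_reg)
        return True

    def tick(self):
        """Advance one clock period.

        Returns the entry that was in stage one (it controls the result bus)
        and shifts every other entry one stage toward stage one.
        """
        on_bus = self.stages[0]
        self.stages = self.stages[1:] + [None]
        return on_bus


rsr = ResultShiftRegister(n_stages=4)
rsr.try_issue(3, "FMUL", "F4")        # three-cycle operation reserves stage 3
rsr.tick()                            # FMUL entry moves to stage 2
held = not rsr.try_issue(2, "FADD", "F6")
print("FADD held:", held)             # stage 2 is taken, so issue is held
```

In this model a conflict shows up exactly when two instructions would need the result bus in the same clock period, which is the case the slide describes as holding issue.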


    The Hardware: Reorder Buffer

    If instructions write results in program order, reg/memory always get the correct values

    Reorder buffer (ROB): reorders out-of-order instructions into program order at the time of writing reg/memory (commit)

    If some instruction goes wrong, handle it at the time of commit: just flush the instructions afterwards

    An instruction cannot write reg/memory immediately after execution, so the ROB also buffers the results

    No such place in the original Tomasulo

    [Figure: block diagram with IM, Fetch Unit, Decode, Rename, Reorder Buffer, RS, FU1, FU2, L-buf, S-buf, DM, Regfile]


    Four Steps of Speculative Tomasulo Algorithm

    1. Issue: get instruction from FP Op Queue

    If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage sometimes called "dispatch")

    2. Execution: operate on operands (EX)

    When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (sometimes called "issue")

    3. Write result: finish execution (WB)

    Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.

    4. Commit: update register with reorder result

    When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (sometimes called "graduation")
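
The interplay of issue, write result, and commit can be sketched with a very small model. It is a sketch under stated assumptions: reservation stations and the execute step are left out, and the class name `SpeculativeCore`, its methods, and the dictionary-based entries are hypothetical names chosen for this example.

```python
from collections import deque


class SpeculativeCore:
    """Toy model of issue, write result, and in-order commit."""

    def __init__(self, rob_size=4):
        self.rob_size = rob_size
        self.rob = deque()            # program order; left end is the head
        self.regs = {}                # architectural register file
        self.next_tag = 0

    def issue(self, dest):
        """Step 1: allocate a ROB slot, or return None if the ROB is full."""
        if len(self.rob) >= self.rob_size:
            return None               # no free slot: issue stalls
        tag = self.next_tag
        self.next_tag += 1
        self.rob.append({"tag": tag, "dest": dest, "value": None, "ready": False})
        return tag

    def write_result(self, tag, value):
        """Step 3: a finished instruction deposits its result in the ROB."""
        for entry in self.rob:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["ready"] = True
                return
        raise KeyError(tag)

    def commit(self):
        """Step 4: retire ready entries from the head, in program order only."""
        committed = []
        while self.rob and self.rob[0]["ready"]:
            entry = self.rob.popleft()
            self.regs[entry["dest"]] = entry["value"]
            committed.append(entry["tag"])
        return committed


core = SpeculativeCore()
first = core.issue("F0")
second = core.issue("F2")
core.write_result(second, 2.0)        # the younger instruction finishes first
print(core.commit(), core.regs)       # nothing commits past the unfinished head
core.write_result(first, 1.0)
print(core.commit(), core.regs)       # both retire, oldest first
```

The point of the sketch is that completion order and commit order are decoupled: a result can sit in the buffer as long as an older instruction has not finished.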


    Reorder Buffer Details

    Holds instruction type: branch, store, ALU register operation

    Holds branch valid and exception bits

    Flush pipeline when any bit is set

    Holds dest, result and PC

    Write results to dest at the time of commit

    Which PC to hold?

    A ready bit indicates if the instruction has completed execution and the value is ready

    Supplies operands between execution complete and commit

    ROB replaces the Store Buffer also

    [Figure: reorder buffer entry fields: Dest reg, Result, Exceptions?, Program Counter, Branch or L/W?, Ready?]
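
The entry fields listed on this slide map naturally onto a record type. The sketch below is one possible reading, with hypothetical field and function names; in particular, treating a mispredicted branch and an exception the same way at commit is a simplification made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    kind: str                         # "branch", "store", or "alu"
    dest: Optional[str]               # register name, or address key for a store
    pc: int
    result: Optional[int] = None
    ready: bool = False
    mispredicted: bool = False        # branch bit
    exception: bool = False           # exception bit


def commit_head(rob, regs, mem):
    """Try to commit the oldest entry; return 'stall', 'commit', or 'flush'."""
    if not rob or not rob[0].ready:
        return "stall"
    head = rob.pop(0)
    if head.mispredicted or head.exception:
        rob.clear()                   # discard every younger instruction
        return "flush"
    if head.kind == "alu":
        regs[head.dest] = head.result
    elif head.kind == "store":
        mem[head.dest] = head.result  # memory is updated only at commit
    return "commit"


rob = [
    RobEntry("alu", "R1", pc=0, result=5, ready=True),
    RobEntry("branch", None, pc=4, ready=True, mispredicted=True),
    RobEntry("alu", "R2", pc=8, result=9, ready=True),
]
regs, mem = {}, {}
print(commit_head(rob, regs, mem), regs)   # R1 is written
print(commit_head(rob, regs, mem), regs)   # flush: R2 never reaches the registers
```

Because the bits are only examined at the head of the buffer, the wrong-path write to R2 is dropped before it can change architectural state.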


    Speculative Execution Recovery

    Flush the pipeline on misprediction

    MIPS 5-stage pipeline used flushing on taken branches

    Where is the flush signal from?

    When to flush?

    [Figure: block diagram with IM, Fetch Unit, Decode, Rename, Reorder Buffer, RS, FU1, FU2, L-buf, S-buf, DM, Regfile]


    Changes to Other Components

    Use ROB index as tag

    Why not RS index any more?

    Why is ROB index a valid choice?

    Renaming table maps architectural registers to ROB index if the register is renamed

    Reservation stations now use ROB index for tracking dependence and for wakeup

    Again tag (now ROB index) and data are broadcast on CDB at writeback

    Inst may receive values from reg/mem, data broadcasting, or ROB
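
The three operand sources named above (register file, ROB, CDB broadcast) can be sketched as a lookup followed by a wakeup. This is a minimal illustration under assumptions: the rename table and ROB are plain dictionaries, and the names `read_operand`, `broadcast`, and `srcs` are invented for the example.

```python
def read_operand(reg, rename_table, rob, regfile):
    """Return ('value', v) if available now, else ('tag', rob_index)."""
    tag = rename_table.get(reg)
    if tag is None:
        return ("value", regfile[reg])        # not renamed: read the register file
    entry = rob[tag]
    if entry["ready"]:
        return ("value", entry["value"])      # finished but not yet committed
    return ("tag", tag)                       # wait for this ROB index on the CDB


def broadcast(tag, value, rob, waiting):
    """Write-back: fill the ROB entry and wake up matching reservation stations."""
    rob[tag]["value"] = value
    rob[tag]["ready"] = True
    for station in waiting:
        for i, (kind, payload) in enumerate(station["srcs"]):
            if kind == "tag" and payload == tag:
                station["srcs"][i] = ("value", value)


regfile = {"R1": 10, "R2": 20, "R3": 30}
rob = {0: {"ready": True, "value": 7}, 1: {"ready": False, "value": None}}
rename_table = {"R2": 0, "R3": 1}             # R1 is not renamed

station = {"srcs": [read_operand("R3", rename_table, rob, regfile),
                    read_operand("R2", rename_table, rob, regfile)]}
print(station["srcs"])                        # one tag to wait on, one value from the ROB
broadcast(1, 99, rob, [station])
print(station["srcs"])                        # the tag has been replaced by the data
```

Using the ROB index as the tag works in this model because each in-flight instruction owns exactly one ROB entry from issue until commit.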


    Complexity of ROB

    Assume dual-issue superscalar

    Load/Store machine with three-operand instructions

    64 registers

    16-entry circular buffer

    Hardware support needed for ROB

    Two write ports

    Four read ports (two source operands of two instructions)

    Four 6-bit comparators for associative lookup

    Limited capacity of ROB is a structural hazard

    Repeated writes to same register actually happen

    This is not the case in classical Tomasulo
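
Two of the numbers above can be checked with a short sketch: naming one of 64 registers takes 6 bits, and because the same register may be written by several in-flight instructions, a lookup has to return the newest matching entry. The circular-buffer layout (a `head` index plus a `count` of valid entries) and the function name are assumptions made for this example.

```python
NUM_REGS = 64
ROB_SIZE = 16
SPECIFIER_BITS = (NUM_REGS - 1).bit_length()   # bits needed to name a register


def newest_match(rob, head, count, reg):
    """Return the ROB index of the youngest entry writing `reg`, or None.

    `head` is the index of the oldest valid entry and `count` the number of
    valid entries, so the scan runs from youngest to oldest with wrap-around.
    """
    for age in range(count - 1, -1, -1):
        idx = (head + age) % ROB_SIZE
        if rob[idx]["dest"] == reg:
            return idx
    return None


rob = [None] * ROB_SIZE
rob[14] = {"dest": 5}     # oldest entry
rob[15] = {"dest": 7}
rob[0] = {"dest": 5}      # a second, younger write to register 5
rob[1] = {"dest": 9}      # youngest entry

print(SPECIFIER_BITS)
print(newest_match(rob, head=14, count=4, reg=5))
```

In hardware the scan would be done by comparing the register specifier against all entries in parallel; the loop here only models which match must win.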


    Code Example

    Loop: LD R2, 0(R1)
          DADDIU R2, R2, #1
          SD R2, 0(R1)
          DADDIU R1, R1, #4
          BNE R2, R3, Loop

    How would this code be executed?

    Inst   Issue  Exec  Memory read  Write results  Commit
    LD     1      2     3            4              5


    Summary

    Reservation stations: implicit register renaming to a larger set of registers + buffering source operands

    Prevents registers as bottleneck

    Avoids WAR, WAW hazards of Scoreboard

    Not limited to basic blocks when compared to static scheduling (integer unit gets ahead, beyond branches)

    Today, helps cache misses as well

    Don't stall for L1 data cache miss

    Can support memory-level parallelism

    Lasting Contributions

    Dynamic scheduling

    Register renaming

    Load/store disambiguation

    360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264


    Advantages of HW (Tomasulo) vs. SW (VLIW) Speculation

    HW determines address conflicts

    HW better branch prediction

    HW maintains precise exception model

    Works across multiple implementations

    SW speculation is much easier for HW design
