08-speculation_2 [Compatibility Mode].pdf


Dynamic ILP

Speculation


Outline

Speculation

Re-order buffers

Limits to ILP


Speculation

Branch Prediction + Out-of-Order Execution


Branch Prediction and Speculative Execution

Speculation means running instructions based on a prediction; predictions could be wrong.

Branch prediction: cannot be avoided, could be very accurate

Misprediction is a less frequent event, but can we ignore it?

Example:

for (i=0; i



Exception Behavior

Preserving exception behavior: exceptions must be raised exactly as in sequential execution

Same sequence as sequential

No extra exceptions

Example:

    DADDU R2,R3,R4
    BEQZ  R2,L1
    LW    R1,0(R2)
L1:

Problem with moving LW before BEQZ?

Again, a dynamic execution must look like a sequential execution whenever it is stopped
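The problem with hoisting LW above BEQZ can be sketched in a few lines of Python. This is only an illustrative model: the `MEMORY` contents, the `MemoryFault` exception and both function names are assumptions made for the example, not part of the slides. When R2 is zero the branch skips the load in program order, but the hoisted load still executes and raises an exception that sequential execution would never see.

```python
class MemoryFault(Exception):
    """Raised when a load touches an unmapped address (hypothetical model)."""


# Hypothetical memory: only address 100 is mapped.
MEMORY = {100: 42}


def load_word(addr):
    if addr not in MEMORY:
        raise MemoryFault(f"bad address {addr}")
    return MEMORY[addr]


def sequential(r3, r4):
    """Program order: DADDU, BEQZ, LW. Returns the final R1 (None if not loaded)."""
    r1 = None
    r2 = r3 + r4                 # DADDU R2,R3,R4
    if r2 != 0:                  # BEQZ R2,L1 skips the load when R2 == 0
        r1 = load_word(r2)       # LW R1,0(R2)
    return r1                    # L1:


def hoisted(r3, r4):
    """LW moved above BEQZ with no speculation support."""
    r2 = r3 + r4
    loaded = load_word(r2)       # runs even when the branch would skip it
    return loaded if r2 != 0 else None
```

With `r3=60, r4=40` both versions return 42. With `r3=0, r4=0`, `sequential` returns `None` while `hoisted` raises `MemoryFault`, which is the "extra exception" the slide rules out.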


Exceptions in Order

Solutions:

Early detection of FP exceptions

The use of software mechanisms to restore a precise exception state before resuming execution

Delaying instruction completion until we know an exception is impossible


Precise Interrupts

An interrupt is precise if the saved process state corresponds with a sequential model of program execution where one instruction completes before the next begins.

Tomasulo had:

In-order issue, out-of-order execution, and out-of-order completion

Need to fix the out-of-order completion aspect so that we can find a precise breakpoint in the instruction stream.
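The definition of a precise state can be made concrete with a small check. The sketch below is an assumption-laden illustration (the two-instruction `PROGRAM`, the register values and the helper names are all made up): a saved state is precise if it equals the state at some instruction boundary of a purely sequential run.

```python
def run_prefix(program, count, initial):
    """Register state after the first `count` instructions run in order."""
    regs = dict(initial)
    for dest, compute in program[:count]:
        regs[dest] = compute(regs)
    return regs


def is_precise(saved, program, initial):
    """True if `saved` matches the state at some sequential instruction boundary."""
    return any(run_prefix(program, n, initial) == saved
               for n in range(len(program) + 1))


# Hypothetical program: a slow divide followed by a fast, independent add.
PROGRAM = [
    ("R1", lambda r: r["R1"] // 2),   # long latency
    ("R2", lambda r: r["R2"] + 1),    # short latency
]
INITIAL = {"R1": 8, "R2": 1}

# Out-of-order completion: the add has written R2, the divide has not written R1.
out_of_order_state = {"R1": 8, "R2": 2}
# In-order completion interrupted after the first instruction.
in_order_state = {"R1": 4, "R2": 1}
```

Here `is_precise(out_of_order_state, PROGRAM, INITIAL)` is `False` because no sequential prefix produces that register file, while `is_precise(in_order_state, PROGRAM, INITIAL)` is `True`.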


Short Seminar: Precise Exceptions

1. 01277582(Implementation of precise exception in a 5-stage pipeline embedded processor - CNF03).pdf

2. 01354393(A 0.18-µm CMOS implementation of an area efficient precise exception handling unit for processing-in-memory systems - CNF04).pdf

3. 00004607(Implementing precise interrupts in pipelined processors - JNL88).pdf


Branch Prediction vs. Precise Interrupt

A misprediction is an exception on the branch instruction

Execution branches out on exceptions

Every instruction is predicted not to take the branch to the interrupt handler

The same technique handles both issues:

In-order completion or commit: change register/memory only in program order (sequential)

How does it ensure correctness?


HW Support for More ILP

Speculation: allow an instruction that depends on a branch predicted taken to issue, with no consequences (including exceptions) if the branch is not actually taken ("HW undo")

Combine branch prediction with dynamic scheduling to execute before branches are resolved

Separate speculative bypassing of results from real bypassing of results

When an instruction is no longer speculative, write boosted results (instruction commit) or discard boosted results

Execute out-of-order but commit in-order to prevent any irrevocable action (update state or exception) until the instruction commits


Reorder Buffer Implementation


Result Shift Register

The "Result Shift Register" is used to control the result bus

N is the length of the longest functional unit pipeline

An instruction that takes i clock periods reserves stage i

If the stage already contains valid control information, then issue is held until the next clock period

An issuing instruction places control information in the result shift register:

the functional unit that will be supplying the result

the destination register

This control information is also marked "valid"

Each clock period, the control information is shifted down one stage toward stage one.

When it reaches stage one, it is used during the next clock period to control the result bus
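The reservation and shifting rules above can be sketched as a small Python class. This is a minimal model under stated assumptions: the class name, the method names and the use of `None` for "not valid" are illustrative choices, not taken from the slides.

```python
class ResultShiftRegister:
    """Stages are numbered 1..n; the entry in stage 1 controls the result bus."""

    def __init__(self, n):
        self.stages = [None] * (n + 1)    # index 0 is unused

    def try_issue(self, latency, unit, dest):
        """Reserve stage `latency`; return False (hold issue) if it is taken."""
        if self.stages[latency] is not None:
            return False
        self.stages[latency] = (unit, dest)   # a non-None entry means "valid"
        return True

    def clock(self):
        """Return the entry in stage 1, then shift every entry down one stage."""
        at_bus = self.stages[1]
        self.stages[1:] = self.stages[2:] + [None]
        return at_bus
```

For example, with `rsr = ResultShiftRegister(4)`, issuing a 3-cycle multiply reserves stage 3. After one `clock()` it sits in stage 2, so a 2-cycle add issued at that point is held (`try_issue` returns `False`) because both would need the result bus in the same cycle, while a 1-cycle operation can still issue.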


The Hardware: Reorder Buffer

If instructions write results in program order, registers/memory always get the correct values

The reorder buffer (ROB) reorders out-of-order instructions into program order at the time of writing registers/memory (commit)

If some instruction goes wrong, handle it at the time of commit: just flush the instructions after it

Instructions cannot write registers/memory immediately after execution, so the ROB also buffers the results

There is no such place in the original Tomasulo design
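A minimal sketch of this idea, assuming a simple FIFO model: entries are allocated in program order, may become ready in any order, and update the register file only from the head. The `Entry` fields and method names are illustrative assumptions, and a real ROB would be a fixed-size circular buffer rather than an unbounded deque.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    dest: str
    value: int = 0
    ready: bool = False
    exception: bool = False


class ReorderBuffer:
    def __init__(self):
        self.entries = deque()           # head (oldest instruction) on the left

    def allocate(self, dest):
        entry = Entry(dest)
        self.entries.append(entry)       # allocated in program order
        return entry

    def commit(self, regfile):
        """Retire ready entries from the head; stop and flush on an exception."""
        while self.entries and self.entries[0].ready:
            head = self.entries.popleft()
            if head.exception:
                self.entries.clear()     # discard every younger instruction
                return head              # caller would invoke the handler
            regfile[head.dest] = head.value
        return None
```

If three instructions are allocated and the youngest finishes first, `commit` does nothing until the head is ready. If the second instruction is marked as faulting, the first still commits, the fault is reported at commit time, and the third never touches the register file, so the saved state is precise.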

[Figure: block diagram of the speculative pipeline: IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, reservation stations (RS) feeding FU1 and FU2, L-buf, S-buf, DM]


Four Steps of Speculative Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue

If reservation station and reorder buffer slot free, issue instr & send operands & reorder buffer no. for destination (this stage is sometimes called "dispatch")

2. Execution: operate on operands (EX)

When both operands ready then execute; if not ready, watch CDB for result; when both in reservation station, execute; checks RAW (this stage is sometimes called "issue")

3. Write result: finish execution (WB)

Write on Common Data Bus to all awaiting FUs & reorder buffer; mark reservation station available.

4. Commit: update register with reorder result

When instr. at head of reorder buffer & result present, update register with result (or store to memory) and remove instr from reorder buffer. Mispredicted branch flushes reorder buffer (this stage is sometimes called "graduation")
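Steps 2 and 3 are the ones that differ most from a simple in-order pipeline, so here is a rough sketch of them. The class, the `("value", v)` / `("tag", rob_index)` operand encoding and the two supported operations are assumptions chosen for illustration; issue and commit are left out.

```python
class ReservationStation:
    def __init__(self, op, src1, src2, rob_tag):
        # Each source is ("value", v) when ready or ("tag", rob_index) when pending.
        self.op = op
        self.srcs = [src1, src2]
        self.rob_tag = rob_tag
        self.busy = True

    def ready(self):
        """Step 2: an instruction may execute once both operands are values."""
        return all(kind == "value" for kind, _ in self.srcs)

    def snoop(self, tag, value):
        """Capture a CDB broadcast whose tag matches a pending source."""
        self.srcs = [("value", value) if src == ("tag", tag) else src
                     for src in self.srcs]

    def execute(self):
        a, b = (v for _, v in self.srcs)
        return {"add": a + b, "mul": a * b}[self.op]


def write_result(station, stations, rob_values):
    """Step 3: broadcast (tag, value) on the CDB, then free the station."""
    value = station.execute()
    rob_values[station.rob_tag] = value    # the reorder buffer also listens
    for other in stations:
        other.snoop(station.rob_tag, value)
    station.busy = False
    return value
```

A multiply whose first source is `("tag", 0)` stays not ready until the add holding ROB entry 0 calls `write_result`; the broadcast replaces the tag with the value 5, and the multiply can then execute.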


Reorder Buffer Details

Holds instruction type: branch, store, ALU register operation

Holds branch valid and exception bits

Flush pipeline when any bit is set

Holds dest, result and PC

Write results to dest at the time of commit

Which PC to hold?

A ready bit indicates if the instruction has completed execution and the value is ready

Supplies operands between execution complete and commit

ROB also replaces the store buffer

[Figure: Reorder Buffer entry fields: Dest reg, Result, Exceptions?, Program Counter, Branch or L/W?, Ready?]


Speculative Execution Recovery

Flush the pipeline on misprediction

MIPS 5-stage pipeline used flushing on taken branches

Where is the flush signal from?

When to flush?

[Figure: block diagram of the speculative pipeline: IM, Fetch Unit, Decode, Rename, Regfile, Reorder Buffer, reservation stations (RS) feeding FU1 and FU2, L-buf, S-buf, DM]


Changes to Other Components

Use ROB index as tag

Why not RS index any more?

Why is ROB index a valid choice?

Renaming table maps architectural registers to ROB index if the register is renamed

Reservation stations now use ROB index for tracking dependences and for wakeup

Again, tag (now ROB index) and data are broadcast on CDB at writeback

Inst may receive values from reg/mem, data broadcasting, or ROB
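The renaming table and the three operand sources can be sketched as follows. This is a simplified model under explicit assumptions: the `Renamer` class and its method names are hypothetical, ROB indices grow without bound instead of wrapping around a circular buffer, and the caller passes the destination and tag to `commit` in program order.

```python
class Renamer:
    def __init__(self, regfile):
        self.regfile = regfile
        self.rename = {}         # architectural register -> ROB index
        self.rob = {}            # ROB index -> value, or None until writeback
        self.next_tag = 0

    def read_operand(self, reg):
        """Return ("value", v) if available now, else ("tag", rob_index)."""
        if reg not in self.rename:
            return ("value", self.regfile[reg])      # committed register value
        tag = self.rename[reg]
        if self.rob[tag] is not None:
            return ("value", self.rob[tag])          # completed, not yet committed
        return ("tag", tag)                          # wait for the CDB broadcast

    def issue(self, dest, src1, src2):
        # Read sources before renaming dest, so "ADD R1,R1,R2" sees the old R1.
        operands = (self.read_operand(src1), self.read_operand(src2))
        tag = self.next_tag
        self.next_tag += 1
        self.rob[tag] = None
        self.rename[dest] = tag
        return tag, operands

    def writeback(self, tag, value):
        self.rob[tag] = value

    def commit(self, dest, tag):
        self.regfile[dest] = self.rob.pop(tag)
        if self.rename.get(dest) == tag:    # no newer writer has renamed dest
            del self.rename[dest]
```

One way to read the slide's questions: a reservation station is freed at writeback, so its index stops identifying the result, whereas the ROB entry stays allocated until commit and holds the value in between. In the sketch, a consumer issued after `writeback` but before `commit` gets its operand from `self.rob` rather than from the register file or the CDB.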


Complexity of ROB

Assume dual-issue superscalar

Load/Store machine with three-operand instructions

64 registers

16-entry circular buffer

Hardware support needed for ROB

Two write ports

Four read ports (two source operands of two instructions)

Four 6-bit comparators for associative lookup

Limited capacity of ROB is a structural hazard

Repeated writes to same register actually happen

This is not the case in classical Tomasulo
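The port and comparator counts above follow from the stated assumptions by simple arithmetic. The helper below is one reading of those numbers, not a hardware model: the function name and the rule "one comparator per source operand issued per cycle" are assumptions made to reproduce the figures on the slide.

```python
def rob_port_estimate(issue_width, sources_per_inst, num_registers):
    """Rough ROB port counts for a given issue width and register count."""
    return {
        # One result may be allocated/written per issued instruction.
        "write_ports": issue_width,
        # Every source operand of every issued instruction may read the ROB.
        "read_ports": issue_width * sources_per_inst,
        # A register specifier needs enough bits to name any register.
        "comparator_bits": (num_registers - 1).bit_length(),
        # Assumed: one tag comparison per source operand per cycle.
        "comparators": issue_width * sources_per_inst,
    }


estimate = rob_port_estimate(issue_width=2, sources_per_inst=2, num_registers=64)
```

For the dual-issue, 64-register machine this gives two write ports, four read ports, and four 6-bit comparators, matching the list above.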


Code Example

Loop: LD     R2, 0(R1)
      DADDIU R2, R2, #1
      SD     R2, 0(R1)
      DADDIU R1, R1, #4
      BNE    R2, R3, Loop

How would this code be executed?

Inst   Issue   Exec   Memory read   Write results   Commit
LD     1       2      3             4               5


Summary

Reservation stations: implicit register renaming to a larger set of registers + buffering source operands

Prevents registers from becoming a bottleneck

Avoids WAR, WAW hazards of Scoreboard

Not limited to basic blocks when compared to static scheduling (integer unit gets ahead, beyond branches)

Today, helps with cache misses as well

Don't stall for L1 data cache miss

Can support memory-level parallelism

Lasting Contributions

Dynamic scheduling

Register renaming

Load/store disambiguation

360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264


Advantages of HW (Tomasulo) vs. SW (VLIW) Speculation

HW determines address conflicts

HW better branch prediction

HW maintains precise exception model

Works across multiple implementations

SW speculation is much easier for HW design
