Computer Architecture

Post on 06-Jan-2016


Lecture 6

Overview of Branch Prediction

Prediction accuracy of a 4096-entry 2-bit prediction buffer vs. infinite buffer

Frequency of mispredictions

Benchmark    4096 entries, 2 bits per entry    Unlimited entries, 2 bits per entry
li           10%                               10%
eqntott      (value missing)                   (value missing)
espresso     5%                                5%
gcc          12%                               11%
fpppp        9%                                9%
spice        9%                                9%
matrix300    0%                                0%
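As a rough illustration of what this figure measures, the sketch below models a prediction buffer of 2-bit saturating counters indexed by the low-order bits of the branch address. The class name, the word-aligned indexing (`pc >> 2`), the initial counter value, and the synthetic trace are assumptions made for the example, not details from the lecture.

```python
class TwoBitPredictor:
    """Branch prediction buffer of 2-bit saturating counters (illustrative)."""

    def __init__(self, entries=4096):
        self.entries = entries
        # Counter values 0 and 1 predict not taken; 2 and 3 predict taken.
        # Starting at 1 (weakly not taken) is an arbitrary choice.
        self.table = [1] * entries

    def _index(self, pc):
        # Assumes 4-byte instructions, so the two low bits carry no information.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


def misprediction_rate(predictor, trace):
    """trace is a list of (pc, taken) pairs; returns the fraction mispredicted."""
    wrong = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong / len(trace)


# Synthetic loop branch: taken nine times, then not taken once, repeated 10 times.
trace = [(0x400, i % 10 != 9) for i in range(100)]
rate = misprediction_rate(TwoBitPredictor(), trace)
```

With a finite `entries` value, two branches whose addresses map to the same index share a counter; that aliasing is the effect the 4096-entry versus unlimited comparison isolates.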

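Comparison of 2-bit predictors

[Figure: frequency of mispredictions (0% to 18%) for li, eqntott, espresso, gcc, fpppp, spice, and matrix300 under three predictors: a local predictor with 4096 entries of 2 bits, a local predictor with unlimited entries of 2 bits, and a (2,2) predictor with 1024 entries. The two local predictors repeat the values of the previous figure; the bar labels of the (2,2) predictor read 5%, 5%, 11%, 4%, 6%, 5%.]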
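A (2,2) predictor can be sketched as follows: 2 bits of global branch history select one of four 2-bit counters within the entry chosen by the branch address. The class and helper names, the indexing, and the test trace are assumptions for illustration. Under the reading that each of the 1024 entries holds four 2-bit counters, the storage is 1024 × 4 × 2 = 8192 bits, the same as 4096 × 2 bits for the local buffer.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global-history bits pick one of 2**m n-bit counters."""

    def __init__(self, entries=1024, m=2, n=2):
        self.entries = entries
        self.m = m
        self.max_count = (1 << n) - 1
        self.threshold = 1 << (n - 1)
        self.history = 0  # outcomes of the last m branches, newest in bit 0
        # One row per entry, one counter per global-history pattern.
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _row(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte instructions

    def predict(self, pc):
        return self.table[self._row(pc)][self.history] >= self.threshold

    def update(self, pc, taken):
        row = self.table[self._row(pc)]
        count = row[self.history]
        row[self.history] = min(self.max_count, count + 1) if taken else max(0, count - 1)
        mask = (1 << self.m) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def count_mispredictions(predictor, trace):
    wrong = 0
    for pc, taken in trace:
        wrong += int(predictor.predict(pc) != taken)
        predictor.update(pc, taken)
    return wrong


# A branch that alternates taken / not taken: hard for a single 2-bit counter,
# easy once the previous outcomes are part of the index.
alternating = [(0x400, i % 2 == 0) for i in range(100)]
wrong = count_mispredictions(CorrelatingPredictor(), alternating)
```

After a short warm-up the history pattern identifies which phase of the alternation the branch is in, so the remaining predictions are correct.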

Tournament Predictor

[Figure: state diagram of a 2-bit selector. States 11 and 10 use predictor P1; states 01 and 00 use predictor P2. Transitions between states are labeled "P1 Correct" and "P2 Correct".]
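The selector in this state diagram behaves like a 2-bit saturating counter. The sketch below is a minimal model of it; the class name is hypothetical, and the rule that the counter moves only when exactly one of the two predictors was correct is an assumption (the usual scheme), since the extracted diagram labels do not spell it out.

```python
class TournamentSelector:
    """2-bit selector: states 3 (11) and 2 (10) use P1; 1 (01) and 0 (00) use P2."""

    def __init__(self, state=3):
        self.state = state

    def choose(self):
        return "P1" if self.state >= 2 else "P2"

    def update(self, p1_correct, p2_correct):
        # Assumption: the counter moves only when exactly one predictor was right.
        if p1_correct and not p2_correct:
            self.state = min(3, self.state + 1)
        elif p2_correct and not p1_correct:
            self.state = max(0, self.state - 1)


sel = TournamentSelector()
choices = []
for p1_ok, p2_ok in [(False, True), (False, True), (True, True), (True, False)]:
    choices.append(sel.choose())
    sel.update(p1_ok, p2_ok)
```

Two consecutive wins by P2 are needed before the selector switches away from P1, which gives the choice the same hysteresis a 2-bit counter gives a single prediction.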

Misprediction rate of three predictors

• Predictors must be compared at equal capacity. The size of each level has to be chosen to optimize prediction accuracy. Influencing factors: the degree of interference between branches, and whether the program is likely to benefit from local or global history.

[Figure: conditional branch misprediction rate (0% to 8%) against total predictor size (0 to 512 Kbits) for three predictors: local 2-bit predictor, correlating predictor, and tournament predictor.]

Why Prediction

Prediction Reduces Branch hazards in Pipelined Processors.

Used in almost all pipelined processors

[Figure: next-PC selection with a two-input mux (inputs 0 and 1). Labeled elements: branch prediction buffer with its T/NT branch prediction, branch target address cache, PC+4, and actual next PC.]

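A Branch Target Buffer

[Figure: the PC of the instruction to fetch is looked up in the branch target buffer. Each entry holds a branch address, a predicted PC, and a prediction (branch predicted taken or untaken) maintained by prediction hardware such as a counter; the table height is the number of entries in the branch target buffer. On a match (=), the instruction is a branch and the predicted PC is used as the new PC. With no match, it is not a branch instruction and fetch proceeds normally.]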
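The lookup described by the branch target buffer figure can be sketched in a few lines. This is a simplification under stated assumptions: a Python dictionary stands in for the fixed-size tagged table, instructions are 4 bytes, only taken branches are stored, and the class and method names are made up for the example.

```python
class BranchTargetBuffer:
    """Maps the address of a branch to its predicted next PC (illustrative)."""

    def __init__(self):
        self.entries = {}  # branch PC -> predicted target PC

    def next_fetch_pc(self, pc):
        # Hit: the instruction is a branch predicted taken; fetch from the target.
        # Miss: treat it as a non-branch and fetch sequentially (PC + 4).
        return self.entries.get(pc, pc + 4)

    def record_taken(self, pc, target):
        # Called once a branch resolves as taken.
        self.entries[pc] = target

    def remove(self, pc):
        # Called when a stored branch turns out not to be taken.
        self.entries.pop(pc, None)


btb = BranchTargetBuffer()
first = btb.next_fetch_pc(0x1000)    # miss: falls through to PC + 4
btb.record_taken(0x1000, 0x0F00)     # branch resolved as taken
second = btb.next_fetch_pc(0x1000)   # hit: predicted PC
```

Because the lookup uses only the fetch address, the predicted PC is available before the instruction has been decoded.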

Handling an instruction with a branch-target buffer

IF: Send PC to memory and branch-target buffer. Entry found in the branch-target buffer?
  No:  ID: Is the instruction a taken branch?
         No:  Normal instruction execution.
         Yes: EX: Enter branch instruction address and next PC into the branch-target buffer.
  Yes: Send out predicted PC. Taken branch?
         No:  Mispredicted branch; kill the fetched instruction.
         Yes: Branch correctly predicted; continue execution with no stalls.

Penalties for possible combinations of whether the branch is in the buffer

Instruction in buffer    Prediction    Actual branch    Penalty cycles
Yes                      Taken         Taken            0
Yes                      Taken         Not taken        2
No                       -             Taken            2
No                       -             Not taken        0
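The penalty table lends itself to a small expected-value calculation. The sketch below encodes the four rows and weights the two penalized cases by their probabilities. The function name and the three rates passed in are hypothetical values chosen for the example, not figures from the lecture.

```python
# (in buffer, actual outcome) -> penalty cycles, taken from the table above.
PENALTY = {
    (True, "taken"): 0,
    (True, "not taken"): 2,
    (False, "taken"): 2,
    (False, "not taken"): 0,
}


def expected_penalty(hit_rate, wrong_given_hit, taken_given_miss):
    """Average penalty cycles per branch.

    hit_rate:          fraction of branches found in the buffer
    wrong_given_hit:   fraction of buffer hits that turn out not taken
    taken_given_miss:  fraction of buffer misses that turn out taken
    """
    hit_wrong = hit_rate * wrong_given_hit * PENALTY[(True, "not taken")]
    miss_taken = (1 - hit_rate) * taken_given_miss * PENALTY[(False, "taken")]
    return hit_wrong + miss_taken


# Hypothetical rates: 90% hit rate, 10% of hits mispredicted, 60% of misses taken.
penalty = expected_penalty(hit_rate=0.9, wrong_given_hit=0.1, taken_given_miss=0.6)
```

With these assumed rates the two terms are 0.9 × 0.1 × 2 = 0.18 and 0.1 × 0.6 × 2 = 0.12 cycles per branch.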

Static Super Scalar pipeline in operation

– Fetch 64 bits per clock cycle; Int on left, FP on right
– Can only issue 2nd instruction if 1st instruction issues
– More ports for FP registers to do FP load & FP op in a pair

Type               Pipe stages
Int. instruction   IF   ID   EX   MEM  WB
FP instruction     IF   ID   EX   MEM  WB
Int. instruction        IF   ID   EX   MEM  WB
FP instruction          IF   ID   EX   MEM  WB
Int. instruction             IF   ID   EX   MEM  WB
FP instruction               IF   ID   EX   MEM  WB

– A 1-cycle load delay expands to 3 instructions in a superscalar: the instruction in the right half cannot use the loaded value, nor can the instructions in the next slot

Dynamic Super Scalar pipeline in operation

[Figure: dynamic superscalar pipeline. The instruction cache feeds two issue slots over a wider bus; each slot issues and renames to a reservation station (RS) after checking for a free RS and for RAW hazards, then reads registers. Instructions wait for operands in front of the integer unit (EX), the load/store unit (EXTAC, MemAccess), the FP pipeline (A1 to A4), the multiply pipeline (M1 to M7), and the divider. Results are broadcast on two common data buses (CDB #1 and CDB #2) and written to the registers.]

Example 1

Loop: L.D    F0,0(R1)     ;F0=array element
      ADD.D  F4,F0,F2
      S.D    F4,0(R1)     ;store result
      DADDIU R1,R1,#-8    ;8 bytes (per DW)
      BNE    R1,R2,Loop   ;branch R1!=R2

Dual issue, 1 Integer Unit, FP add = 3 cc

Iteration  Instruction         Issues  Executes  Mem access  Write CDB  Comment
1          L.D F0,0(R1)        1       2         3           4          First issue
1          ADD.D F4,F0,F2      1       5-7                   8          Wait for L.D
1          S.D F4,0(R1)        2       3         9                      Wait for ADD.D
1          DADDIU R1,R1,#-8    2       4                     5          Wait for ALU
1          BNE R1,R2,Loop      3       6                                Wait for DADDIU
2          L.D F0,0(R1)        4       7         8           9          Wait for BNE
2          ADD.D F4,F0,F2      4       10-12                 13         Wait for L.D
2          S.D F4,0(R1)        5       8         14                     Wait for ADD.D
2          DADDIU R1,R1,#-8    5       9                     10         Wait for ALU
2          BNE R1,R2,Loop      6       11                               Wait for DADDIU
3          L.D F0,0(R1)        7       12        13          14         Wait for BNE
3          ADD.D F4,F0,F2      7       15-17                 18         Wait for L.D
3          S.D F4,0(R1)        8       13        19                     Wait for ADD.D
3          DADDIU R1,R1,#-8    8       14                    15         Wait for ALU
3          BNE R1,R2,Loop      9       16                               Wait for DADDIU

Dual issue, 2 Integer Units

Iteration  Instruction         Issues  Executes  Mem access  Write CDB  Comment
1          L.D F0,0(R1)        1       2         3           4          First issue
1          ADD.D F4,F0,F2      1       5-7                   8          Wait for L.D
1          S.D F4,0(R1)        2       3         9                      Wait for ADD.D
1          DADDIU R1,R1,#-8    2       3                     4          Executes earlier
1          BNE R1,R2,Loop      3       5                                Wait for DADDIU
2          L.D F0,0(R1)        4       6         7           8          Wait for BNE
2          ADD.D F4,F0,F2      4       9-11                  12         Wait for L.D
2          S.D F4,0(R1)        5       7         13                     Wait for ADD.D
2          DADDIU R1,R1,#-8    5       6                     7          Executes earlier
2          BNE R1,R2,Loop      6       8                                Wait for DADDIU
3          L.D F0,0(R1)        7       9         10          11         Wait for BNE
3          ADD.D F4,F0,F2      7       12-14                 15         Wait for L.D
3          S.D F4,0(R1)        8       10        16                     Wait for ADD.D
3          DADDIU R1,R1,#-8    8       9                     10         Executes earlier
3          BNE R1,R2,Loop      9       11                               Wait for DADDIU
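To compare the two configurations numerically, the sketch below takes the cycle in which BNE executes in each iteration (read from the completed 1-integer-unit and 2-integer-unit tables) and derives the spacing between iterations. Treating three iterations as representative of steady state is an assumption, and the names are made up for the example.

```python
# Cycle in which BNE executes in iterations 1-3, read from the two tables.
BNE_EXECUTE = {
    "1 integer unit": [6, 11, 16],
    "2 integer units": [5, 8, 11],
}
INSTRUCTIONS_PER_ITERATION = 5


def cycles_per_iteration(cycles):
    """Average gap between the same instruction in consecutive iterations."""
    gaps = [later - earlier for earlier, later in zip(cycles, cycles[1:])]
    return sum(gaps) / len(gaps)


summary = {}
for name, cycles in BNE_EXECUTE.items():
    cpi_loop = cycles_per_iteration(cycles)
    # (cycles per loop iteration, instructions executed per cycle)
    summary[name] = (cpi_loop, INSTRUCTIONS_PER_ITERATION / cpi_loop)
```

In both tables a new iteration issues every 3 cycles, so with one integer unit execution falls progressively behind issue, while the second integer unit lets execution keep pace.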

Speculative Execution

Needed to overcome branch hazards and to support precise exceptions

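Speculative Pipeline

[Figure: speculative pipeline. Instructions issue and are renamed to a reservation station (RS) after checking for a free RS and for RAW hazards, then read registers. They wait for operands in front of the integer unit (EX), the load/store unit (EXTAC, MemAccess), the FP pipeline (A1 to A4), the multiply pipeline (M1 to M7), and the divider. Results go over the CDB to the reorder buffer (ROB) before being written to the registers.]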

The Hardware: Reorder Buffer

If instructions write results in program order, registers and memory always get the correct values

Reorder buffer (ROB): reorders out-of-order instructions into program order at the time of writing registers/memory (commit)

If an instruction goes wrong, handle it at the time of commit: just flush the instructions after it

Instructions cannot write registers/memory immediately after execution, so the ROB also buffers the results

The original Tomasulo algorithm has no such place

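[Figure: block diagram of the speculative datapath. Labeled blocks: fetch unit, IM, decode, rename, reorder buffer, reservation stations (RS) in front of FU1 and FU2, L-buf, S-buf, DM, and register file.]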

Speculative Tomasulo Algorithm

Issue: get instruction from the FP Op Queue
  Condition: a free RS at the required FU
  Actions: (1) decode the instruction; (2) allocate an RS and ROB entry; (3) do source register renaming; (4) do dest register renaming; (5) read register file; (6) dispatch the decoded and renamed instruction to the RS and ROB

Execution: operate on operands (EX)
  Condition: at a given FU, at least one instruction is ready
  Action: select a ready instruction and send it to the FU

Write result: finish execution (WB)
  Condition: at a given FU, some instruction finishes FU execution
  Actions: (1) FU writes to CDB, broadcast to all RSs and to the ROB; (2) FU broadcasts tag (ROB index) to all RSs; (3) de-allocate the RS. Note: no register status update at this time

Speculative Tomasulo Algorithm

Commit: update register with reorder result
  Condition: ROB is not empty and the ROB head instruction has finished execution
  Actions if no misprediction/exception: (1) write result to register/memory; (2) update register status; (3) de-allocate the ROB entry
  Actions on misprediction/exception: flush the pipeline, e.g. (1) flush IFQ; (2) clear register status; (3) flush all RSs and reset FUs; (4) reset ROB
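The commit rule can be illustrated with a toy reorder buffer. This is a minimal sketch under stated assumptions: entries are plain dictionaries, only register destinations are modeled (no memory, reservation stations, or register status table), and all names are hypothetical.

```python
from collections import deque


class ReorderBuffer:
    """In-order commit of out-of-order results (illustrative)."""

    def __init__(self):
        self.entries = deque()   # program order, head at the left
        self.registers = {}      # architectural register file

    def issue(self, dest):
        entry = {"dest": dest, "value": None, "done": False, "mispredicted": False}
        self.entries.append(entry)
        return entry             # the entry doubles as the renaming tag

    def write_result(self, entry, value, mispredicted=False):
        # Results are buffered here; registers are not touched until commit.
        entry["value"] = value
        entry["done"] = True
        entry["mispredicted"] = mispredicted

    def commit(self):
        """Retire finished instructions from the head; flush on a misprediction."""
        while self.entries and self.entries[0]["done"]:
            head = self.entries.popleft()
            if head["mispredicted"]:
                self.entries.clear()   # discard everything after the bad branch
                break
            if head["dest"] is not None:
                self.registers[head["dest"]] = head["value"]


rob = ReorderBuffer()
a = rob.issue("R1")
b = rob.issue("R2")
rob.write_result(b, 20)   # the younger instruction finishes first
rob.commit()              # nothing retires: the head is still executing
before = dict(rob.registers)
rob.write_result(a, 10)
rob.commit()              # both retire, in program order
```

Because a result reaches the register file only when its entry is at the head, flushing the buffer on a mispredicted branch is enough to discard every speculative result behind it.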

Loop: LD     R2,0(R1)
      DADDIU R2,R2,#1
      SD     R2,0(R1)     ;store result
      DADDIU R1,R1,#4     ;increment pointer
      BNE    R2,R3,Loop   ;branch if not last element

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 16 Wait for BNE

3 ADDIU R2,R2,#1 7 17 18 Wait for LW

3 S.D R2,0(R1) 8 15 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 Wait for DADDIU

Non-Speculative execution:Dual issue, 2 CDB

Iteration

Instructions Issues Executes Mem access

Write CDB

Comment

1 L.D R2,0(R1) 1 2 3 4 First issue

1 ADDIU R2,R2,#1 1 5 6 Wait for BNE

1 S.D R2,0(R1) 2 3 7 Wait for DADDIU

1 DADDIU R1,R1,#4 2 3 4 Execute directly

1 BNE R2,R3,Loop 3 7 Wait for DADDIU

2 L.D R2,0(R1) 4 8 9 10 Wait for BNE

2 ADDIU R2,R2,#1 4 11 12 Wait for LW

2 S.D R2,0(R1) 5 9 13 Wait for DADDIU

2 DADDIU R1,R1,#4 5 8 9 Wait for BNE

2 BNE R2,R3,Loop 6 13 Wait for DADDIU

3 L.D R2,0(R1) 7 14 15 16 Wait for BNE

3 ADDIU R2,R2,#1 7 17 18 Wait for LW

3 S.D R2,0(R1) 8 15 19 Wait for DADDIU

3 DADDIU R1,R1,#4 8 14 15 Wait for BNE

3 BNE R2,R3,Loop 9 19 Wait for DADDIU
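
The "Wait for BNE" entries in the non-speculative timings above follow one rule: nothing from an iteration executes until the previous iteration's branch has executed. A minimal sketch that checks this rule against the execute cycles of the completed table is given below. The dictionary layout and the function name are illustrative assumptions, not part of the lecture.

```python
# Execute cycle of every instruction, copied from the completed
# non-speculative table (dual issue, 2 CDB). Keys are iteration numbers.
NONSPEC_EXECUTE = {
    1: {"L.D": 2, "ADDIU": 5, "S.D": 3, "DADDIU": 3, "BNE": 7},
    2: {"L.D": 8, "ADDIU": 11, "S.D": 9, "DADDIU": 8, "BNE": 13},
    3: {"L.D": 14, "ADDIU": 17, "S.D": 15, "DADDIU": 14, "BNE": 19},
}


def waits_for_previous_branch(execute_by_iteration):
    """Return True if no instruction executes at or before the cycle in
    which the previous iteration's BNE executes (hypothetical helper)."""
    for iteration, cycles in execute_by_iteration.items():
        previous = execute_by_iteration.get(iteration - 1)
        if previous is None:
            continue  # the first iteration has no earlier branch to wait for
        if any(cycle <= previous["BNE"] for cycle in cycles.values()):
            return False
    return True


if __name__ == "__main__":
    print(waits_for_previous_branch(NONSPEC_EXECUTE))
```

A schedule in which iteration 2 starts executing before cycle 8 would fail this check, which is exactly what speculation permits.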

Speculative execution: Dual issue, 2 CDB

Iteration  Instruction        Issues  Executes  Mem access  Write CDB  Commit
1          L.D R2,0(R1)       1       2         3           4          5
1          ADDIU R2,R2,#1     1       5         -           6          7
1          S.D R2,0(R1)       2       3         7           -          7
1          DADDIU R1,R1,#4    2       3         -           4          8
1          BNE R2,R3,Loop     3       7         -           -          8
2          L.D R2,0(R1)       4       5         6           7          9
2          ADDIU R2,R2,#1     4       8         -           9          10
2          S.D R2,0(R1)       5       6         -           -          10
2          DADDIU R1,R1,#4    5       6         -           7          11
2          BNE R2,R3,Loop     6       10        -           -          11
3          L.D R2,0(R1)       7       8         9           10         12
3          ADDIU R2,R2,#1     7       11        -           12         13
3          S.D R2,0(R1)       8       9         -           -          13
3          DADDIU R1,R1,#4    8       9         -           10         14
3          BNE R2,R3,Loop     9       13        -           -          14
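
The benefit of speculation can be read off the branch rows of the two completed tables: the BNE executes in cycles 7, 13, 19 without speculation and in cycles 7, 10, 13 with it. The sketch below turns those cycles into a rough cycles-per-iteration and sustained-IPC estimate. Only three iterations are available, so this is an estimate under that assumption, and the helper names are illustrative.

```python
# Cycle in which each iteration's BNE executes, copied from the completed
# non-speculative and speculative tables (dual issue, 2 CDB).
NONSPEC_BNE_EXECUTE = [7, 13, 19]
SPEC_BNE_EXECUTE = [7, 10, 13]

# The loop body is L.D, ADDIU, S.D, DADDIU, BNE.
INSTRUCTIONS_PER_ITERATION = 5


def cycles_per_iteration(branch_cycles):
    """Average gap between consecutive branch executions."""
    gaps = [later - earlier
            for earlier, later in zip(branch_cycles, branch_cycles[1:])]
    return sum(gaps) / len(gaps)


def sustained_ipc(branch_cycles):
    """Instructions per cycle once the loop reaches its steady rhythm."""
    return INSTRUCTIONS_PER_ITERATION / cycles_per_iteration(branch_cycles)


if __name__ == "__main__":
    for label, cycles in (("non-speculative", NONSPEC_BNE_EXECUTE),
                          ("speculative", SPEC_BNE_EXECUTE)):
        print(label, cycles_per_iteration(cycles), round(sustained_ipc(cycles), 2))
```

On these numbers the speculative machine starts a new iteration every 3 cycles instead of every 6, so its sustained rate is about twice as high.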

IDEAL/Perfect Processor

• Register renaming: infinite virtual registers are available.
• Branch prediction: all conditional branches are predicted exactly.
• Jump prediction: all jumps are perfectly predicted.
• Memory address alias analysis: all memory addresses are known exactly.

ILP available in a perfect processor for six SPEC92 programs

Program     Instruction issues per cycle
gcc         54.8
espresso    62.6
li          17.9
fpppp       75.2
doduc       118.7
tomcatv     150.1

Effects of reducing the size of the window

[Figure: line chart. x-axis: window size (Infinite, 2K, 512, 128, 32, 8, 4). y-axis: instruction issues per cycle (0 to 160). Series: tomcatv, doduc, fpppp, li. Annotation: "Practical possibilities".]

Another view of the last slide

Instruction issues per cycle, by program and window size:

Program     Infinite  2K   512  128  32
gcc         55        36   10   10   8
espresso    63        41   15   13   8
li          18        15   12   11   9
fpppp       75        61   49   35   14
doduc       119       59   16   15   9
tomcatv     150       60   45   34   14
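
Why a smaller window lowers the issue rate can be shown with a toy model. The sketch below assumes unit latency, unlimited issue width, and a scheduler that looks only at the oldest `window` unexecuted instructions each cycle. The synthetic workload (8 independent dependence chains of 8 instructions, laid out one chain after another) is an assumption for illustration and is unrelated to the SPEC92 programs.

```python
def ipc_with_window(deps, window):
    """Issue rate of a toy machine with a limited instruction window.

    deps[i] lists the earlier instructions that instruction i depends on.
    Each cycle, only the oldest `window` unexecuted instructions are
    examined; any whose producers finished in an earlier cycle execute.
    """
    done_cycle = [None] * len(deps)
    pending = list(range(len(deps)))  # program order
    cycle = 0
    while pending:
        cycle += 1
        candidates = pending[:window]
        ready = [i for i in candidates
                 if all(done_cycle[d] is not None and done_cycle[d] < cycle
                        for d in deps[i])]
        for i in ready:
            done_cycle[i] = cycle
        ready_set = set(ready)
        pending = [i for i in pending if i not in ready_set]
    return len(deps) / cycle


def chained_workload(chains=8, length=8):
    """Synthetic program: `chains` independent chains, stored back to back."""
    deps = []
    for i in range(chains * length):
        deps.append([] if i % length == 0 else [i - 1])
    return deps


if __name__ == "__main__":
    program = chained_workload()
    for window in (64, 8, 1):
        print(window, ipc_with_window(program, window))
```

With a window that covers the whole program, all eight chains advance together; with a window of one, the machine degenerates to one instruction per cycle.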

Effect of branch-prediction schemes (1)

[Figure: line chart. x-axis: branch-prediction scheme (Perfect, Tournament predictor, Standard 2-bit, Static, None). y-axis: instruction issues per cycle (0 to 60). Series: fpppp, doduc, tomcatv, li. Annotation: "Practical possibilities".]

Effect of branch-prediction schemes (2)

Instruction issues per cycle, by program and branch-prediction scheme:

Program     Perfect  Selective predictor  Standard 2-bit  Static  None
gcc         35       9                    6               6       2
espresso    41       12                   7               6       2
li          16       10                   6               7       2
fpppp       61       48                   46              45      29
doduc       58       15                   13              14      4
tomcatv     60       46                   45              45      19
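
One way to read the bar chart above is to ask what fraction of the perfect-prediction issue rate survives with a realistic predictor. The sketch below does that arithmetic for the selective predictor. It assumes the flattened bar labels map to programs in the order gcc, espresso, li, fpppp, doduc, tomcatv; the function name is illustrative.

```python
# Instruction issues per cycle read from the bar chart. The mapping of
# bar labels to programs is an assumption about the flattened slide text.
PERFECT = {"gcc": 35, "espresso": 41, "li": 16,
           "fpppp": 61, "doduc": 58, "tomcatv": 60}
SELECTIVE = {"gcc": 9, "espresso": 12, "li": 10,
             "fpppp": 48, "doduc": 15, "tomcatv": 46}


def fraction_retained(realistic, ideal):
    """Per-program ratio of realistic to ideal issue rate."""
    return {program: realistic[program] / ideal[program] for program in ideal}


if __name__ == "__main__":
    for program, ratio in fraction_retained(SELECTIVE, PERFECT).items():
        print(f"{program:9s} {ratio:.2f}")
```

Under that reading, the floating-point programs fpppp and tomcatv keep most of their ideal issue rate, while gcc keeps roughly a quarter.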

Branch-prediction accuracy for conditional branches in SPEC92

Program     Profile-based  2-bit counter  Tournament
li          88%            77%            98%
espresso    86%            82%            96%
fpppp       86%            82%            98%
tomcatv     99%            99%            100%
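
For reference, the 2-bit counter scheme in this comparison can be simulated in a few lines. The sketch below uses the usual saturating-counter formulation, with states 0 and 1 predicting not taken and states 2 and 3 predicting taken. The state encoding, the initial state, and the loop-branch trace are assumptions chosen for illustration; the slides do not specify them.

```python
def two_bit_accuracy(outcomes, initial_state=3):
    """Prediction accuracy of a single 2-bit saturating counter.

    outcomes is a sequence of booleans (True = branch taken).
    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """
    state = initial_state
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


if __name__ == "__main__":
    # Hypothetical loop branch: taken 9 times, then falls through, 10 times over.
    trace = ([True] * 9 + [False]) * 10
    print(two_bit_accuracy(trace))
```

On this trace the counter mispredicts only the loop exit, because a single not-taken outcome moves it from state 3 to state 2, which still predicts taken on re-entry.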

Intel processors based on the P6 microarchitecture

Processor          First ship date  Clock rate range  L1 cache                 L2 cache
Pentium Pro        1995             100-200 MHz       8KB instr. + 8KB data    256 KB-1024 KB
Pentium II         1998             233-450 MHz       16KB instr. + 16KB data  256 KB-512 KB
Pentium II Xeon    1999             400-450 MHz       16KB instr. + 16KB data  512 KB-2 MB
Celeron            1999             500-900 MHz       16KB instr. + 16KB data  128 KB
Pentium III        1999             450-1100 MHz      16KB instr. + 16KB data  256 KB-512 KB
Pentium III Xeon   2000             700-900 MHz       16KB instr. + 16KB data  1 MB-2 MB

P6 Architecture (P-II Onwards…)

Instruction name       Pipeline stages  Repeat rate
Integer ALU            1                1
Integer load           3                1
Integer multiply       4                1
FP add                 3                1
FP multiply            5                2
FP divide (64 bits)    32               32
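
The two columns of the latency table above combine into a simple timing estimate: the first result appears after "pipeline stages" cycles, and each further independent operation of the same type can start "repeat rate" cycles after the previous one. The sketch below applies that formula. It assumes a single functional unit per operation type and no other stalls; the function name is illustrative.

```python
# (pipeline stages, repeat rate) per operation, copied from the P6 table.
P6_UOPS = {
    "Integer ALU": (1, 1),
    "Integer load": (3, 1),
    "Integer multiply": (4, 1),
    "FP add": (3, 1),
    "FP multiply": (5, 2),
    "FP divide (64 bits)": (32, 32),
}


def cycles_for_independent_ops(name, count):
    """Cycles to finish `count` independent operations on one unit.

    Assumes back-to-back issue limited only by the repeat rate.
    """
    if count <= 0:
        return 0
    stages, repeat = P6_UOPS[name]
    return stages + (count - 1) * repeat


if __name__ == "__main__":
    for name in ("Integer ALU", "FP multiply", "FP divide (64 bits)"):
        print(name, cycles_for_independent_ops(name, 10))
```

A repeat rate of 1 means the unit is fully pipelined, so ten integer ALU operations finish in ten cycles, while the unpipelined 64-bit divide pays its full latency every time.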

P6 processor pipeline

Instruction fetch (16 bytes per cycle)
  -> 16 bytes -> Instruction decode (3 instructions per cycle)
  -> 6 uops -> Renaming (3 uops per cycle)
  -> Reservation stations (20)
  -> Execution units (5 total)
  -> Graduation unit (3 uops per cycle)
Reorder buffer (40 entries)

The P6 processor pipeline, showing the throughput of each stage and the total buffering provided between stages.

Speculation factor

Percentage of instructions that do not commit in the Pentium III

[Figure: bar chart. x-axis: percentage of instructions that do not commit (0 to 60). y-axis: benchmarks (gcc, tomcatv, perl, compress, go, li, vortex, apsi, fpppp, hydro2d).]

Performance: Pentium 4 vs Pentium III

[Figure: bar chart comparing the two processors on SPEC2000 benchmarks (gcc, mgrid, vortex, applu). y-axis: SPEC ratio (0 to 1000).]