Superscalar - summary
• Superscalar machines have multiple functional units (FUs)
  • e.g. 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store
• Requires a complex IFU (instruction fetch unit)
  • Able to issue multiple instructions/cycle (typ 4)
  • Able to detect hazards (unavailability of operands)
  • Able to re-order instruction issue
• Aim to keep all the FUs busy
• Typically, 6-way superscalars can achieve instruction level parallelism of 2-3
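Why a wide machine achieves an ILP well below its issue width can be seen in a toy model. The sketch below is an illustration only, not a model of any real processor: the function name `cycles_needed`, the one-cycle latency, and the example dependency list are all assumptions, and functional-unit types are ignored.

```python
def cycles_needed(deps, width):
    """Cycles to issue every instruction on an idealised machine.

    deps[i] is the set of earlier instructions whose results instruction i
    needs. Assumptions (not from the slides): every instruction takes one
    cycle, any FU can run any instruction, and issue may be re-ordered.
    """
    issued_in = {}                      # instruction index -> issue cycle
    waiting = list(range(len(deps)))
    cycle = 0
    while waiting:
        cycle += 1
        batch = []
        for i in waiting:
            if len(batch) == width:
                break
            # Ready only if every operand was produced in an earlier cycle.
            if all(d in issued_in for d in deps[i]):
                batch.append(i)
        for i in batch:
            issued_in[i] = cycle
            waiting.remove(i)
    return cycle


# Hypothetical 6-instruction fragment with two short dependency chains.
example = [set(), {0}, set(), {2}, {1, 3}, set()]
ilp = len(example) / cycles_needed(example, width=6)
```

In this fragment the hazards, not the issue width, set the limit: the longest dependency chain is 3 instructions, so the fragment needs 3 cycles and the ILP is 2 however wide the machine is.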
Computer Architecture
Speculation & Branching
Iolanthe II approaches Rangitoto
Speculation
• High Tech Gambling?
• Data Prefetch
  • Cache instruction dcbt: data cache block touch
    (PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …)
  • Attempts to bring data into cache
    • so that it will be “close” when needed
  • Allows SIU to use idle bus bandwidth
    • if there’s no spare bandwidth, this read can be given low priority
  • Speculative because
    • a branch may occur before it’s used
    • we speculate that this data may be needed
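The benefit of touching data early can be sketched with a simple timing model. Python has no cache-touch instruction, so this is a simulation rather than real prefetch code; the name `loop_cycles`, the latencies, and the assumption of unlimited bus bandwidth are all hypothetical.

```python
def loop_cycles(n_items, miss_latency, work_cycles, prefetch_distance=0):
    """Total cycles for a loop that loads item i and then works on it.

    Assumptions (illustrative only): every first access misses the cache,
    a touch costs no cycles itself, and the bus always has spare bandwidth.
    """
    ready_at = {}   # item -> cycle at which its data arrives in the cache
    cycle = 0
    for i in range(n_items):
        # dcbt-style hint: start fetching a future item now, without stalling.
        if prefetch_distance:
            j = i + prefetch_distance
            if j < n_items and j not in ready_at:
                ready_at[j] = cycle + miss_latency
        # Demand load: an item nobody touched earlier is an ordinary miss.
        if i not in ready_at:
            ready_at[i] = cycle + miss_latency
        cycle = max(cycle, ready_at[i])   # stall until the data has arrived
        cycle += work_cycles              # then do the useful work
    return cycle


without_prefetch = loop_cycles(100, miss_latency=20, work_cycles=10)
with_prefetch = loop_cycles(100, miss_latency=20, work_cycles=10,
                            prefetch_distance=2)
```

With these made-up numbers the loop without prefetch pays the full miss latency on every item, while the prefetching loop stalls only on the first two items; after that the data is already “close” when it is needed.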
Speculation - General
• Some functional units are almost always idle
  • Make them do some (possibly useful) work rather than idle
  • If the speculation was incorrect, results are simply abandoned
  • No loss in efficiency; chance of a gain
• Researchers are actively looking at software prefetch schemes
  • Fetch data well before it’s needed
  • Reduce latency when it’s actually needed
• Speculative operations have low priority and use idle resources
Branching
• Expensive
  • 2-3 cycles lost in pipeline
    • All instructions following branch ‘flushed’
  • Bandwidth wasted fetching unused instructions
  • Stall while branch target is fetched
• We can speculate about the target of a branch
• Terminology
  • Branch Target: address to which branch jumps
  • Branch Taken: control transfers to non-sequential address (target)
  • Branch Not Taken: next sequential instruction is executed
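The cost of lost cycles can be estimated with a back-of-envelope formula. The sketch below assumes a simple additive model; the function name `effective_cpi`, the base CPI of 1, and the 20% branch frequency are illustrative assumptions, while the 3-cycle penalty is the upper figure quoted above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once branch penalties are included.

    Simple additive model (an assumption): every mispredicted branch costs
    a fixed number of flushed cycles and nothing else changes.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 1 instruction in 5 is a branch, 3 cycles lost each time.
no_prediction = effective_cpi(1.0, 0.20, 1.0, 3)    # every branch pays the penalty
good_prediction = effective_cpi(1.0, 0.20, 0.1, 3)  # only 1 branch in 10 pays it
```

Under these assumptions the unpredicted machine runs at roughly 1.6 cycles per instruction and the well-predicted one at roughly 1.06, which is why the rest of this section concentrates on guessing branches correctly.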
Branching - Prediction
• Branches can be
  • unconditional: branch is always taken
    • e.g. call subroutine, return from subroutine
  • conditional: branch depends on state of computation
    • e.g. has loop terminated yet?
• Unconditional branches are simple
  • New instructions are fetched as soon as the branch is recognized
    • As early in the pipeline as possible
  • Branch units often placed with fetch & decode stages
Branching - Branch Unit
• PowerPC 603 logical layout
Branching - Speculation
• We have the following code:
    if ( cond ) s1; else s2;
• Superscalar machine
  • Multiple functional units
  • Start executing both branches (s1 and s2)
  • Keep idle functional units busy!
• One is speculative and will be abandoned
• Processor will eventually calculate the branch condition and select which result should be retained (written back)
• MIPS R10000 - up to 4 speculative branches at once
Branching - Speculation
• MIPS R10000
  • Up to 4 speculative branches at once
  • Instructions are “tagged” with a 4 bit mask
    • Indicates to which branch instruction it belongs
  • As soon as condition is determined, mis-predicted instructions are aborted
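The tagging idea can be sketched in a few lines. This is a simplified illustration, not the R10000's actual logic: the class `SpeculativeWindow`, its method names, and the bit-allocation policy are all hypothetical. Each in-flight instruction carries one mask bit per unresolved branch it was issued after.

```python
class SpeculativeWindow:
    """Toy model of branch-mask tagging with at most 4 unresolved branches."""

    def __init__(self):
        self.free_bits = [0, 1, 2, 3]   # one mask bit per speculative branch
        self.pending = []               # unresolved branch bits, program order
        self.active_mask = 0            # bits of all unresolved branches
        self.in_flight = []             # (instruction name, mask) pairs

    def predict_branch(self):
        """Start speculating past a branch; returns the bit assigned to it."""
        if not self.free_bits:
            raise RuntimeError("4 branches already speculative: must stall")
        bit = self.free_bits.pop(0)
        self.pending.append(bit)
        self.active_mask |= 1 << bit
        return bit

    def issue(self, name):
        """Tag a new instruction with every branch it currently depends on."""
        self.in_flight.append((name, self.active_mask))

    def resolve(self, bit, predicted_correctly):
        """Branch condition is now known: keep or abort dependent work."""
        flag = 1 << bit
        if predicted_correctly:
            # Instructions no longer depend on this branch: clear its bit.
            self.in_flight = [(n, m & ~flag) for n, m in self.in_flight]
            released = [bit]
        else:
            # Abort everything tagged with this branch; younger branches
            # were on the wrong path too, so release their bits as well.
            self.in_flight = [(n, m) for n, m in self.in_flight
                              if not m & flag]
            released = self.pending[self.pending.index(bit):]
        for b in released:
            self.pending.remove(b)
            self.active_mask &= ~(1 << b)
            self.free_bits.append(b)


w = SpeculativeWindow()
w.issue("add")              # not speculative: mask 0b0000
b0 = w.predict_branch()
w.issue("lw")               # depends on branch b0
b1 = w.predict_branch()
w.issue("sub")              # depends on b0 and b1
w.resolve(b0, predicted_correctly=True)
w.resolve(b1, predicted_correctly=False)
```

After the two resolutions only `add` and `lw` remain in flight, both with an empty mask; `sub` was issued past the mispredicted branch and has been aborted.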
Branching - Prediction
• We have a sequence of instructions:
    L1: add       ; some mixture of arithmetic,
        lw        ; load, store, etc, instructions
        sub
        brne L1   ; branch on some condition
    L2: or        ; some more arithmetic,
        st        ; load, store, etc, instructions
• If you were asked to guess which branch should be preferred, which would you choose:
  • Next sequential instruction (L2)?
  • Branch target (L1)?
Branching - Prediction
• Studies show that branches are taken most of the time!
• Because of loops:
    L1: add       ; any mix of arith,
        lw        ; load, store, etc,
        sub       ; instructions
        brne L1   ; branch back to loop start
    L2: or        ; some more arith,
        st        ; memory, etc instructions
Branching - Prediction Rule
• A simple prediction rule, “take backward branches”, works amazingly well!
• For a loop with n iterations, this is wrong in 1/n cases only (the final iteration, when the loop exits)!
• A system working on this rule alone would
  • detect the backward branch and
  • start fetching from the branch target rather than the next instruction
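The 1/n figure can be checked with a short simulation. This is a minimal sketch: the function name `backward_taken_mispredictions` and the two addresses are made up for illustration, and only the single loop-closing branch is modelled.

```python
def backward_taken_mispredictions(n_iterations):
    """Count wrong guesses for the loop-closing branch of an n-iteration loop."""
    branch_pc = 0x10C    # hypothetical address of the brne at the loop end
    target_pc = 0x100    # hypothetical address of the loop start (L1)

    # The rule: a branch whose target lies at a lower address is predicted taken.
    predicted_taken = target_pc < branch_pc

    mispredicted = 0
    for i in range(n_iterations):
        # The branch jumps back on every iteration except the last one.
        actually_taken = i < n_iterations - 1
        if predicted_taken != actually_taken:
            mispredicted += 1
    return mispredicted


n = 10
miss_rate = backward_taken_mispredictions(n) / n
```

For n = 10 the rule is wrong once, on loop exit, so the miss rate is 1/10.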
Branching - Improving the prediction
• Static prediction systems
  • Compiler can mark branches
    • Likely to be taken or not
  • Instruction fetch unit will use the marking as advice on which instruction to fetch
• Compiler often able to give the right advice
  • Loops are easily detected
  • Other patterns in conditions can be recognized
    • Checking for EOF when reading a file
    • Error checking
Branching - Improving the prediction
• Dynamic prediction systems
  • Program history determines most likely branch
  • Branch Target Buffers - Another cache!
Branching - Branch Target Buffer
• Instruction address bits [11:3] select the BTB entry
• Tag determines “hit”
• Stats select taken/not taken
• R10000: 87% prediction accuracy (SPEC’92 integer)
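The lookup can be sketched as a small direct-mapped cache. Only the index bits [11:3] come from the slide; the class `BranchTargetBuffer`, the use of the address bits above bit 11 as the tag, and the 2-bit saturating counter used for the stats are assumptions made for illustration.

```python
class BranchTargetBuffer:
    """Toy direct-mapped BTB indexed by instruction address bits [11:3]."""

    def __init__(self):
        self.entries = [None] * 512          # 9 index bits -> 512 entries

    @staticmethod
    def _split(pc):
        index = (pc >> 3) & 0x1FF            # address bits [11:3]
        tag = pc >> 12                       # assumed tag: the bits above bit 11
        return index, tag

    def predict(self, pc):
        """Return the predicted target, or None for 'not taken'."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            return None                      # miss: fetch next sequential
        # Assumed stats: 2-bit counter, values 2 and 3 mean 'taken'.
        return entry["target"] if entry["counter"] >= 2 else None

    def update(self, pc, taken, target):
        """Record the real outcome once the branch has been resolved."""
        index, tag = self._split(pc)
        entry = self.entries[index]
        if entry is None or entry["tag"] != tag:
            if taken:                        # allocate only for taken branches
                self.entries[index] = {"tag": tag, "target": target,
                                       "counter": 2}
            return
        if taken:
            entry["counter"] = min(3, entry["counter"] + 1)
            entry["target"] = target
        else:
            entry["counter"] = max(0, entry["counter"] - 1)


def correct_predictions(btb, branch_pc, target_pc, iterations, passes):
    """Run a loop-closing branch through the BTB and count correct guesses."""
    correct = 0
    for _ in range(passes):
        for i in range(iterations):
            taken = i < iterations - 1       # last iteration falls through
            guess = btb.predict(branch_pc)
            if (guess == target_pc) == taken:
                correct += 1
            btb.update(branch_pc, taken, target_pc)
    return correct


btb = BranchTargetBuffer()
hits = correct_predictions(btb, branch_pc=0x2008, target_pc=0x2000,
                           iterations=10, passes=2)
```

In this run the first pass mispredicts twice (a cold miss and the loop exit) and the second pass only once, because the entry built from program history is still in the buffer.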