Date posted: 18-Jan-2018
Uploaded by: osborne-dorsey
COMPSYS 304
Computer Architecture
Speculation & Branching
Morning visitors - Paradise Bay, Bay of Islands
Speculation
• High Tech Gambling?
• Data Prefetch
  • Cache instruction dcbt: data cache block touch
  • Attempts to bring data into the cache so that it will be “close” when needed
  • Allows the SIU (System Interface Unit) to use idle bus bandwidth
    • if there’s no spare bandwidth, this read can be given low priority
  • Speculative because
    • a branch may occur before the data is used
    • we speculate that this data may be needed
PowerPC mnemonic; similar opcodes are found in other architectures: SPARC v9, MIPS, …
Speculation - General
• Some functional units are almost always idle
  • Make them do some (possibly useful) work rather than sit idle
  • If the speculation was incorrect, the results are simply abandoned
  • No loss in efficiency; chance of a gain
• Researchers are actively looking at software prefetch schemes
  • Fetch data well before it’s needed
  • Reduce latency when it’s actually needed
• Speculative operations have low priority and use idle resources
Branching
• Expensive
  • 2-3 cycles lost in the pipeline
  • All instructions following the branch are ‘flushed’
  • Bandwidth wasted fetching unused instructions
  • Stall while the branch target is fetched
• We can speculate about the target of a branch
• Terminology
  • Branch Target: address to which the branch jumps
  • Branch Taken: control transfers to a non-sequential address (the target)
  • Branch Not Taken: the next sequential instruction is executed
Branching - Prediction
• Branches can be
  • unconditional: branch is always taken, e.g.
    • call subroutine
    • return from subroutine
  • conditional: branch depends on the state of the computation, e.g.
    • has the loop terminated yet?
• Unconditional branches are simple
  • New instructions are fetched as soon as the branch is recognized
    • as early in the pipeline as possible
  • Branch units are often placed with the fetch & decode stages
Branching - Branch Unit
[Figure: PowerPC 603 logical layout]
Branching - Speculation
• We have the following code: if ( cond ) s1; else s2;
• Superscalar machine
  • Multiple functional units
  • Start executing both branches (s1 and s2)
  • Keep idle functional units busy!
• Both paths are speculative until the condition is known; one will be abandoned
  • The processor eventually calculates the branch condition and selects which result should be retained (written back)
• MIPS R10000: can speculate past up to 4 branches at once
Branching - Speculation
• MIPS R10000
  • Up to 4 unresolved branches speculated at once
  • Instructions are “tagged” with a 4-bit mask
    • Indicates which unresolved branch(es) each instruction depends on
  • As soon as a branch condition is determined, mis-predicted instructions are aborted
Branching - Prediction
• We have a sequence of instructions:
        add        ; some mixture of arithmetic,
        lw         ; load, store, etc, instructions
        sub
        brne L1    ; branch on some condition
    L2: or         ; some more arithmetic,
        st         ; load, store, etc, instructions
• If you were asked to guess which branch should be preferred, which would you choose:
  • Next sequential instruction (L2)?
  • Branch target (L1)?
Branching - Prediction
• Studies show that branches are taken most of the time!
• Because of loops:
    L1: add        ; any mix of arith,
        lw         ; load, store, etc,
        sub        ; instructions
        brne L1    ; branch back to loop start
    L2: or         ; some more arith,
        st         ; memory, etc instructions
Branching - Prediction Rule
• A simple prediction rule:
  • Take backward branches
• Works amazingly well!
  • For a loop with n iterations, the loop-closing branch is mispredicted only once in n executions (on loop exit)
• A system working on this rule alone would
  • detect the backward branch and
  • start fetching from the branch target rather than the next instruction
Branching - Improving the prediction
• Static prediction systems
  • Compiler can mark branches as likely to be taken or not
  • Instruction fetch unit uses the marking as advice on which instruction to fetch
  • Compiler is often able to give the right advice
    • Loops are easily detected
    • Other patterns in conditions can be recognized
      • Checking for EOF when reading a file
      • Error checking
Branching - Improving the prediction
• Dynamic prediction systems
  • Program history determines the most likely branch
  • Branch Target Buffers: another cache!
Branching - Branch Target Buffer
• Instruction address bits [11:3] select the BTB entry
• Tag determines a “hit”
• Stats select taken/not taken
• Pentium 4: >91% prediction accuracy, 4K-entry BHT (Branch History Table)
• G4e: 2K entries
Superscalar - summary
• Superscalar machines have multiple functional units (FUs)
  • e.g. 2 x integer ALU, 1 x FPU, 1 x branch, 1 x load/store
• Requires a complex IFU (instruction fetch unit)
  • Able to issue multiple instructions/cycle (typically 4)
  • Able to detect hazards (unavailability of operands)
  • Able to re-order instruction issue
• Aim to keep all the FUs busy
• Typically, 6-way superscalars can achieve instruction-level parallelism of 2-3