André Seznec Caps Team
IRISA/INRIA
A 256 Kbits L-TAGE branch predictor
André Seznec
IRISA/INRIA/HIPEAC
Directly derived from:
A case for (partially) tagged branch predictors, A. Seznec and P. Michaud, JILP, Feb. 2006
+ Tricks:
• Loop predictor
• Kernel/user histories
TAGE: TAgged GEometric history length predictors
The genesis
Back around 2003
2bcgskew was state-of-the-art, but was lagging behind neural-inspired predictors on a few benchmarks
Just wanted to get the best of both behaviors and maintain:
• Reasonable implementation cost:
  • Use only global history
  • Medium number of tables
• In-time response
The basis: a multiple length global history predictor
[Figure: tables T0 to T4, indexed with global history lengths L(0) to L(4); how to combine their outputs is left open (?)]
GEometric History Length predictor
$L(0) = 0, \quad L(i) = \alpha^{i-1} L(1)$
The set of history lengths forms a geometric series, e.g. {0, 2, 4, 8, 16, 32, 64, 128}
What is important: L(i)-L(i-1) is drastically increasing
• Capture correlation on very long histories
• Spend most of the storage on short histories!
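As a minimal sketch of the series on this slide, the following Python computes L(0) = 0 and L(i) = α^(i-1) × L(1). The function name and the rounding to the nearest integer are assumptions made for illustration; the slide only states that the lengths form a geometric series.

```python
def geometric_history_lengths(num_tables, l1, alpha):
    """Return [L(0), L(1), ..., L(num_tables - 1)] with L(0) = 0.

    Rounding to the nearest integer is an assumption; the slide only
    says that the history lengths form a geometric series.
    """
    lengths = [0]
    for i in range(1, num_tables):
        lengths.append(int(alpha ** (i - 1) * l1 + 0.5))
    return lengths


lengths = geometric_history_lengths(num_tables=8, l1=2, alpha=2)
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]
```

With L(1) = 2 and α = 2 this reproduces the example set shown on the slide.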
Combining multiple predictions?
Classical solution:
• Use of a meta-predictor
• “Wasting” storage!? Choosing among 5 or 10 predictions??
Neural-inspired predictors (Jimenez and Lin, 2001):
• Use an adder tree instead of a meta-predictor
Partial matching (Chen et al., 1996; Michaud, 2005):
• Use tagged tables and the longest matching history
CBP-1 (2004): OGEHL
[Figure: tables T0 to T4, indexed with history lengths L(0) to L(4), feed an adder (∑)]
Final computation through a sum
Prediction = Sign
12 components: 3.670 misp/KI
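The "sum, then sign" computation can be sketched as follows. This is an illustrative simplification: the function name is hypothetical, the counters are assumed to be signed values already read from each table, a zero sum is assumed to mean taken, and any bias or centering term a real design might add is left out.

```python
def gehl_predict(counters):
    """Predict taken when the sum of the selected signed counters is >= 0.

    `counters` holds one signed counter per table, as read for the
    current branch. Treating a zero sum as taken is an assumption.
    """
    total = sum(counters)
    return total >= 0, total


taken, total = gehl_predict([3, -1, -4, 2, 1])
print(taken, total)  # True 1
```

The magnitude of the sum can also serve as a confidence estimate, which is why the sketch returns it alongside the direction.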
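TAGE
Geometric history length + PPM-like + optimized update policy
[Figure: a tagless base predictor indexed with pc, plus tagged tables indexed with a hash of pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u and tag fields, the stored tag is compared (=?) with a second hash, and the comparison results select the final prediction]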
[Figure: tag comparisons (=?) on the tagged tables, with two hits and one miss; the labels Pred and Altpred mark the two candidate predictions]
Prediction computation
General case:
• Longest matching component provides the prediction
Special case:
• Many mispredictions on newly allocated entries: weak Ctr
• On many applications, Altpred is more accurate than Pred
• Property dynamically monitored through a single 4-bit counter
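The selection between Pred and Altpred might look like the sketch below. Several details are assumptions that the slide does not give: the `Hit` structure, the signed encoding of Ctr (-4 to 3, with -1 and 0 as the weak values), using "weak Ctr and U = 0" as the test for a newly allocated entry, and treating the 4-bit monitoring counter as signed with non-negative values favoring Altpred.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """Fields read from a matching tagged entry (hypothetical structure)."""
    ctr: int  # assumed signed 3-bit counter in [-4, 3]; >= 0 means taken
    u: int    # 2-bit useful counter


def tage_predict(base_taken: bool, hits: List[Optional[Hit]], use_alt_on_weak: int):
    """Return (final prediction, altpred).

    hits[i] is the matching entry of tagged table i (larger i = longer
    history) or None on a tag miss. use_alt_on_weak stands for the 4-bit
    monitoring counter, assumed signed in [-8, 7].
    """
    matching = [h for h in hits if h is not None]
    if not matching:
        return base_taken, base_taken
    provider = matching[-1]  # longest matching component
    pred = provider.ctr >= 0
    # Altpred: next longest match, or the base predictor if there is none.
    altpred = matching[-2].ctr >= 0 if len(matching) > 1 else base_taken
    looks_new = provider.ctr in (-1, 0) and provider.u == 0
    if looks_new and use_alt_on_weak >= 0:
        return altpred, altpred
    return pred, altpred
```

A possible use: `tage_predict(False, [Hit(3, 1), None, Hit(-1, 0)], 0)` falls back to the prediction of the shorter matching component because the longest match looks newly allocated.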
TAGE update policy
General principle:
Minimize the footprint of the prediction:
• Just update the longest history matching component
• Allocate at most one entry on mispredictions
A tagged table entry
• Ctr: 3-bit prediction counter
• U: 2-bit useful counter (was the entry recently useful?)
• Tag: partial tag
[Figure: entry layout Tag | Ctr | U]
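To make the entry layout concrete, here is a small sketch that packs the three fields into one integer in the order Tag | Ctr | U. The helper names are hypothetical, and Ctr is stored here as a raw unsigned 3-bit value; the slide does not specify an encoding.

```python
CTR_BITS = 3  # prediction counter width, from the slide
U_BITS = 2    # useful counter width, from the slide


def entry_bits(tag_bits):
    """Total width of one tagged entry for a given partial tag width."""
    return tag_bits + CTR_BITS + U_BITS


def pack_entry(tag, ctr, u, tag_bits):
    """Pack the fields as Tag | Ctr | U (Ctr as a raw 3-bit value)."""
    assert 0 <= tag < (1 << tag_bits)
    assert 0 <= ctr < (1 << CTR_BITS)
    assert 0 <= u < (1 << U_BITS)
    return (tag << (CTR_BITS + U_BITS)) | (ctr << U_BITS) | u


def unpack_entry(word):
    """Inverse of pack_entry: return (tag, ctr, u)."""
    u = word & ((1 << U_BITS) - 1)
    ctr = (word >> U_BITS) & ((1 << CTR_BITS) - 1)
    tag = word >> (CTR_BITS + U_BITS)
    return tag, ctr, u
```

For example, a 7-bit tag gives a 12-bit entry and a 15-bit tag gives a 20-bit entry.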
Updating the U counter
If (Altpred ≠ Pred) then
• Pred = taken (correct prediction): U = U + 1
• Pred ≠ taken (misprediction): U = U - 1
Graceful aging:
• Periodic shift of all U counters
• Implemented through the reset of a single bit
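A sketch of the U counter rules, under stated assumptions: the counter saturates at 0 and 3 (the slide does not say what happens at the bounds), and "reset of a single bit" is read as clearing one bit column of every U counter, with the caller alternating between the most and least significant bit. That reading is one interpretation, not something the slide spells out.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Apply the U update rule; `taken` is the resolved branch outcome."""
    if pred != altpred:
        if pred == taken:
            return min(u + 1, U_MAX)  # saturation is an assumption
        return max(u - 1, 0)
    return u


def age_useful(u_values, clear_msb):
    """Clear one bit column of all U counters (assumed aging scheme)."""
    mask = 0b01 if clear_msb else 0b10
    return [u & mask for u in u_values]


print(age_useful([3, 2, 1, 0], clear_msb=True))   # [1, 0, 1, 0]
print(age_useful([3, 2, 1, 0], clear_msb=False))  # [2, 2, 0, 0]
```

Clearing a single bit column ages every entry at once without reading and rewriting whole counters, which is presumably why the slide highlights it.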
Allocating a new entry on a misprediction
Find a single “useless” entry with a longer history:
• Privilege the smallest possible history
  • To minimize footprint
• But not too much
  • To avoid ping-pong phenomena
Initialize Ctr as weak and U as zero
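The allocation rule can be sketched as below. The data layout and function name are hypothetical. Two choices are assumptions rather than slide content: candidates are tried from the shortest history upward with probability 1/2 each (one way to favor short histories "but not too much"), and when no useless entry exists the U counters of the longer-history entries are decremented.

```python
import random

WEAK_TAKEN, WEAK_NOT_TAKEN = 0, -1  # assumed signed 3-bit counter encoding


def allocate_on_mispredict(entries, provider, taken, new_tags, rng):
    """Try to allocate one entry in a table with a longer history.

    entries[i]  : dict with keys "tag", "ctr", "u" for the entry this
                  branch maps to in tagged table i (larger i = longer history)
    provider    : index of the component that gave the prediction,
                  or -1 when it came from the base predictor
    new_tags[i] : tag the branch would get in table i
    Returns the index of the allocated table, or None.
    """
    longer = range(provider + 1, len(entries))
    candidates = [i for i in longer if entries[i]["u"] == 0]
    if not candidates:
        # Assumption (not on the slide): age the entries that blocked
        # the allocation so that a later attempt can succeed.
        for i in longer:
            entries[i]["u"] -= 1
        return None
    # Each candidate is picked with probability 1/2; otherwise the next
    # longer one is considered, and the last one is the fallback.
    chosen = candidates[-1]
    for i in candidates[:-1]:
        if rng.random() < 0.5:
            chosen = i
            break
    entries[chosen].update(
        tag=new_tags[chosen],
        ctr=WEAK_TAKEN if taken else WEAK_NOT_TAKEN,
        u=0,
    )
    return chosen
```

Passing a seeded `random.Random` instance as `rng` keeps a simulation reproducible.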
Improve the global history
Address + conditional branch history: path confusion on short histories
Address + path: Direct hashing leads to path confusion
1. Represent all branches in branch history
2. Also use path history (1 bit per branch, limited to 16 bits)
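The two histories could be maintained as in the sketch below. The 16-bit path history and the 1 bit per branch come from the slide; the function name, the choice of the lowest address bit, and the 640-bit cap on the branch history are assumptions for illustration.

```python
PATH_BITS = 16       # from the slide: path history limited to 16 bits
MAX_HIST_BITS = 640  # assumption: cap the branch history at 640 bits


def update_histories(ghist, phist, pc, taken):
    """Shift one branch into the branch history and the path history.

    Meant to be called for every branch, not only conditional ones
    (point 1 of the slide). Both histories are plain integers used as
    shift registers with the newest bit in position 0. Which address
    bit feeds the path history is an assumption.
    """
    ghist = ((ghist << 1) | int(taken)) & ((1 << MAX_HIST_BITS) - 1)
    phist = ((phist << 1) | (pc & 1)) & ((1 << PATH_BITS) - 1)
    return ghist, phist


ghist, phist = update_histories(0, 0, pc=0x401, taken=True)
ghist, phist = update_histories(ghist, phist, pc=0x400, taken=False)
print(bin(ghist), bin(phist))  # 0b10 0b10
```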
Design tradeoff for CBP2 (1)
13 components:
• Bring the best accuracy on distributed traces
• 8 components not very far!
History length:
• Min = 4, Max = 640
• Could use any Min in [2, 6] and any Max in [300, 2000]
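Given a minimum and a maximum history length, the common ratio of the series follows directly. The sketch below assumes 12 tagged tables (13 components minus the base predictor) and rounding to the nearest integer; the function name is hypothetical and the exact lengths used in the submission are not stated on the slide.

```python
def lengths_from_min_max(num_tagged, l_min, l_max):
    """Geometric history lengths from l_min to l_max over num_tagged tables.

    Returns (lengths, alpha), where alpha is the common ratio.
    Rounding to the nearest integer is an assumption.
    """
    alpha = (l_max / l_min) ** (1.0 / (num_tagged - 1))
    lengths = [int(l_min * alpha ** i + 0.5) for i in range(num_tagged)]
    return lengths, alpha


lengths, alpha = lengths_from_min_max(num_tagged=12, l_min=4, l_max=640)
print(alpha, lengths)
```

Changing `l_min` within [2, 6] or `l_max` within [300, 2000] only changes the ratio, which fits the slide's remark that many settings work.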
Design tradeoff for CBP2 (2)
Tag width tradeoff:
• (Destructive) false match is better tolerated on shorter history
• 7 bits on T1 to 15 bits on T12
Tuning the number of table entries:
• Smaller number for very long histories
• Smaller number for very short histories
Adding a loop predictor
The loop predictor captures the number of iterations of a loop
When the same number of iterations is encountered 4 times in a row, the loop predictor provides the prediction
Advantages:
• Very reliable
• Small storage budget: 256 52-bit entries
Complexity?
• Might be difficult to manage speculative iteration numbers on deep pipelines
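One loop predictor entry can be sketched as follows. This is a behavioral illustration only: the class and field names are hypothetical, the 52-bit entry layout is not modeled, the loop branch is assumed to be taken on every iteration and not taken on exit, and the first sighting of an iteration count is assumed to count as one of the 4.

```python
CONFIDENCE_THRESHOLD = 4  # from the slide: same count seen 4 times


class LoopEntry:
    """Tracks the iteration count of one loop branch (illustrative)."""

    def __init__(self):
        self.past_iter = 0     # taken count of the last complete execution
        self.current_iter = 0  # taken count so far in this execution
        self.confidence = 0

    def predict(self):
        """Return the predicted direction, or None when not confident."""
        if self.confidence < CONFIDENCE_THRESHOLD:
            return None
        return self.current_iter < self.past_iter

    def update(self, taken):
        if taken:
            if self.current_iter >= self.past_iter:
                # Loop runs longer than remembered: stop trusting the entry.
                self.confidence = 0
            self.current_iter += 1
            return
        # Loop exit: compare with the remembered iteration count.
        if self.current_iter == self.past_iter:
            self.confidence = min(self.confidence + 1, CONFIDENCE_THRESHOLD)
        else:
            self.past_iter = self.current_iter
            self.confidence = 1
        self.current_iter = 0
```

The sketch updates the entry with resolved outcomes only. The difficulty raised on the slide is that a deep pipeline would need `current_iter` to be maintained speculatively and repaired after a misprediction.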
Using a kernel history and a user history
Traces mix user and kernel activities:
• Kernel activity after exception
• Global history pollution
Solution: use two separate global histories
• User history is updated only in user mode
• Kernel history is updated in both modes
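The two histories can be sketched as below. The update rules follow the slide; the class name, the default history width, and the rule that the predictor reads the kernel history in kernel mode and the user history otherwise are assumptions.

```python
class SplitHistory:
    """Separate user and kernel global histories (illustrative sketch)."""

    def __init__(self, bits=640):  # width is an assumption
        self.mask = (1 << bits) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        # Kernel history is updated in both modes.
        self.kernel = ((self.kernel << 1) | int(taken)) & self.mask
        # User history is updated only in user mode.
        if not kernel_mode:
            self.user = ((self.user << 1) | int(taken)) & self.mask

    def current(self, kernel_mode):
        """History used for prediction in the given mode (assumed rule)."""
        return self.kernel if kernel_mode else self.user


h = SplitHistory(bits=8)
h.update(True, kernel_mode=False)
h.update(False, kernel_mode=True)  # kernel activity leaves user history intact
h.update(True, kernel_mode=False)
print(bin(h.user), bin(h.kernel))  # 0b11 0b101
```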
L-TAGE submission accuracy (distributed traces)
3.314 misp/KI
Reducing L-TAGE complexity
Included 241.5 Kbits TAGE predictor: 3.368 misp/KI
Loop predictor beneficial only on gzip:
• Might not be worth the extra complexity
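As a quick check of the storage budget, the arithmetic below combines the TAGE size quoted here with the loop predictor size quoted earlier (256 entries of 52 bits). It assumes 1 Kbit = 1024 bits, and the variable names are illustrative.

```python
LOOP_ENTRIES = 256     # from the loop predictor slide
LOOP_ENTRY_BITS = 52   # from the loop predictor slide
TAGE_KBITS = 241.5     # from this slide
BUDGET_KBITS = 256     # from the title

loop_kbits = LOOP_ENTRIES * LOOP_ENTRY_BITS / 1024  # assumes 1 Kbit = 1024 bits
total_kbits = TAGE_KBITS + loop_kbits
print(loop_kbits, total_kbits, total_kbits <= BUDGET_KBITS)  # 13.0 254.5 True
```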
Using fewer tables
8-component 256 Kbits TAGE predictor: 3.446 misp/KI
TAGE prediction computation time?
3 successive steps:
• Index computation
• Table read
• Partial match + multiplexor
Does not fit in a single cycle:
• But can be ahead pipelined!
Ahead pipelining a global history branch predictor (principle)
Initiate branch prediction X+1 cycles in advance to provide the prediction in time
Use information available:
• X-block ahead instruction address
• X-block ahead history
To ensure accuracy:
• Use intermediate path information
Practice
Ahead pipelined TAGE: 4 parallel prediction computations
[Figure: blocks A, B, C, with labels Ha and b, c]
3-branch ahead pipelined 8-component 256 Kbits TAGE
3.552 misp/KI
A final case for the Geometric History Length predictors
• Delivers state-of-the-art accuracy
• Uses only global information:
  • Very long history: 300+ bits!
• Can be ahead pipelined
• Many effective design points:
  • OGEHL or TAGE
  • Number of tables, history lengths
The End