A 256 Kbits L-TAGE branch predictor

André Seznec, Caps Team

IRISA/INRIA/HIPEAC

Directly derived from:

A case for (partially) tagged branch predictors, A. Seznec and P. Michaud JILP Feb. 2006

+ Tricks:

• Loop predictor
• Kernel/user histories


TAGE: TAgged GEometric history length predictors

The genesis


Back around 2003

2bcgskew was state-of-the-art, but it was lagging behind neural-inspired predictors on a few benchmarks.

Just wanted to get the best of both behaviors and maintain:

• Reasonable implementation cost:
  • Use only global history
  • Medium number of tables
• In-time response


The basis: a multiple-length global history predictor

[Figure: tables T0 to T4, indexed with global history lengths L(0) to L(4); how to combine their outputs is left open ("?")]


GEometric History Length predictor

$L(0) = 0$

$L(i) = \alpha^{i-1} L(1)$

The set of history lengths forms a geometric series, for example:

{0, 2, 4, 8, 16, 32, 64, 128}

What is important: L(i) - L(i-1) increases drastically with i.

• Most of the storage is spent on short histories!
• Correlation on very long histories is still captured
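As an illustration of the series above, here is a minimal sketch that generates the history lengths from L(1) and α. The function name and the rounding to the nearest integer are assumptions made for this example; the slide only gives the formula.

```python
def geometric_lengths(l1, alpha, n_tables):
    """Return [L(0), ..., L(n_tables - 1)] with L(0) = 0 and
    L(i) = alpha**(i - 1) * L(1), rounded to the nearest integer
    (the rounding rule is an assumption of this sketch)."""
    return [0] + [int(alpha ** (i - 1) * l1 + 0.5) for i in range(1, n_tables)]


lengths = geometric_lengths(l1=2, alpha=2.0, n_tables=8)
print(lengths)  # [0, 2, 4, 8, 16, 32, 64, 128]

# The gap between consecutive lengths grows with i.
gaps = [b - a for a, b in zip(lengths, lengths[1:])]
print(gaps)  # [2, 2, 4, 8, 16, 32, 64]
```

With α = 2 and L(1) = 2 this reproduces the example series from the slide.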


Combining multiple predictions?

• Classical solution: use a meta-predictor
  • "Wasting" storage!?!
  • Choosing among 5 or 10 predictions??

• Neural-inspired predictors (Jimenez and Lin 2001): use an adder tree instead of a meta-predictor

• Partial matching (Chen et al. 96, Michaud 2005): use tagged tables and the longest matching history


CBP-1 (2004): OGEHL

[Figure: tables T0 to T4, indexed with global history lengths L(0) to L(4), feeding an adder (∑)]

• Final computation through a sum
• Prediction = Sign
• 12 components: 3.670 misp/KI
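The "sum then sign" computation can be sketched as follows. This is a simplified illustration, not the OGEHL submission itself: the function name is hypothetical, the counters are assumed to be small signed integers read from each table, and treating a zero sum as "taken" is an assumption.

```python
def gehl_predict(counters):
    """Predict from the signed counters read in each table.

    counters: one signed integer per component (assumed representation).
    Returns True for taken, False for not taken.
    """
    total = sum(counters)  # the adder tree
    return total >= 0      # prediction = sign; tie-break on 0 is an assumption


# Three components lean taken, two lean not taken.
print(gehl_predict([3, 1, 2, -1, -2]))  # True
print(gehl_predict([-4, 1, 2]))         # False
```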


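TAGE: Geometric history length + PPM-like + optimized update policy

[Figure: a tagless base predictor indexed with pc, and tagged tables indexed with hashes of pc and h[0:L1], h[0:L2], h[0:L3]; each tagged entry holds ctr, u and tag; a second hash is compared (=?) with the stored tag, and the comparison results drive the multiplexers that select the prediction]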


[Figure: tag comparison (=?) outcomes on the tagged tables (Hit, Hit, Miss) and the resulting Pred and Altpred]


Prediction computation

General case: the longest matching component provides the prediction (Pred).

Special case: many mispredictions occur on newly allocated entries (weak Ctr).

• On many applications, Altpred is more accurate than Pred for such entries
• This property is dynamically monitored through a single 4-bit counter
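The selection between Pred and Altpred can be sketched as below. This is a minimal sketch under stated assumptions: the `Entry` class and `tage_predict` function are hypothetical names, Ctr is modeled as a signed value in [-4, 3] with "weak" meaning -1 or 0, and the 4-bit monitoring counter is modeled as a signed value in [-8, 7] that favors Altpred when it is non-negative. The slides do not specify these encodings.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int = 0
    ctr: int = 0  # 3-bit signed counter in [-4, 3]; ctr >= 0 means taken (assumed)
    u: int = 0    # 2-bit useful counter


def tage_predict(base_taken, hits, use_alt_on_weak):
    """Return the final prediction.

    base_taken: prediction of the tagless base predictor.
    hits: matching tagged entries, ordered from shortest to longest history.
    use_alt_on_weak: the 4-bit monitoring counter, modeled in [-8, 7] (assumed).
    """
    if not hits:
        return base_taken
    provider = hits[-1]
    pred = provider.ctr >= 0
    # Altpred: next longest match, or the base predictor if there is none.
    altpred = hits[-2].ctr >= 0 if len(hits) > 1 else base_taken
    weak = provider.ctr in (-1, 0)  # likely a newly allocated entry
    if weak and use_alt_on_weak >= 0:
        return altpred
    return pred
```

For example, with `hits = [Entry(ctr=3), Entry(ctr=-1)]` the longest match is weak, so the result follows Altpred (taken) when the monitoring counter is non-negative and Pred (not taken) otherwise.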


TAGE update policy

General principle: minimize the footprint of the prediction.

• Update only the longest-history matching component
• Allocate at most one entry on a misprediction


A tagged table entry

• Ctr: 3-bit prediction counter
• U: 2-bit useful counter (was the entry recently useful?)
• Tag: partial tag

Entry layout: Tag | Ctr | U


Updating the U counter

If (Altpred ≠ Pred) then:
• Pred = taken: U = U + 1
• Pred ≠ taken: U = U - 1

Graceful aging: periodic shift of all U counters
• Implemented through the reset of a single bit
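The two rules above can be sketched as follows. The function names are hypothetical, U is assumed to saturate in [0, 3], and alternating between clearing the high bit and the low bit on successive aging periods is an assumption of this sketch; the slide only says that a single bit is reset.

```python
U_MAX = 3  # 2-bit useful counter


def update_useful(u, pred, altpred, taken):
    """Update U only when the provider and the alternate prediction differ."""
    if pred != altpred:
        if pred == taken:
            u = min(u + 1, U_MAX)  # the entry was useful
        else:
            u = max(u - 1, 0)      # the entry was harmful
    return u


def age_useful(u_counters, period):
    """Graceful aging: clear one bit in every U counter.

    Even periods clear the high bit, odd periods the low bit
    (the alternation is an assumption of this sketch).
    """
    bit = 0b10 if period % 2 == 0 else 0b01
    return [u & ~bit for u in u_counters]


print(update_useful(1, pred=True, altpred=False, taken=True))  # 2
print(age_useful([3, 2, 1], period=0))  # [1, 0, 1]
print(age_useful([3, 2, 1], period=1))  # [2, 2, 0]
```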


Allocating a new entry on a misprediction

Find a single "useless" entry in a component with a longer history.

Privilege the smallest possible history:
• To minimize the footprint

But not too much:
• To avoid ping-pong phenomena

Initialize Ctr as weak and U as zero.
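One possible way to express this policy is sketched below. It is an illustration only: the function name is hypothetical, a "useless" entry is taken to be one with U = 0, and the 1/2 probability of skipping a shorter-history candidate is an assumption chosen to show the "but not too much" idea. The slides do not say what happens when no useless entry exists, so the sketch simply returns `None`.

```python
import random


def pick_table_to_allocate(u_values, provider, rng=random.random):
    """Choose the tagged table in which to allocate after a misprediction.

    u_values[i]: U counter of the entry the branch maps to in table i,
                 tables ordered from shortest to longest history.
    provider:    index of the table that gave the prediction.
    rng:         returns a float in [0, 1); injectable for testing.
    """
    candidates = [i for i in range(provider + 1, len(u_values))
                  if u_values[i] == 0]
    if not candidates:
        return None
    # Favor short histories, but sometimes skip one to avoid ping-pong.
    for i in candidates[:-1]:
        if rng() < 0.5:  # assumed probability
            return i
    return candidates[-1]


# Table 1 is not useless; tables 2 and 3 are candidates.
print(pick_table_to_allocate([0, 1, 0, 0], provider=0, rng=lambda: 0.1))  # 2
print(pick_table_to_allocate([0, 1, 0, 0], provider=0, rng=lambda: 0.9))  # 3
```

The chosen entry would then get a weak Ctr and U = 0, as stated on the slide.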


Improve the global history

Address + conditional branch history: path confusion on short histories

Address + path: Direct hashing leads to path confusion

1. Represent all branches in branch history

2. Also use path history (1 bit per branch, limited to 16 bits)
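A minimal sketch of how the two histories might be maintained together is given below. The function name is hypothetical, and both the choice of the lowest address bit for the path history and the convention that non-conditional branches are recorded with `taken=True` are assumptions; the slides only state that all branches are represented and that the path history holds 1 bit per branch, up to 16 bits.

```python
PATH_BITS = 16  # from the slide: path history limited to 16 bits


def update_histories(ghist, phist, pc, taken, hist_len):
    """Shift one bit into the branch history and one into the path history.

    Called for every branch, conditional or not (assumed convention:
    pass taken=True for non-conditional branches).
    """
    ghist = ((ghist << 1) | int(taken)) & ((1 << hist_len) - 1)
    phist = ((phist << 1) | (pc & 1)) & ((1 << PATH_BITS) - 1)
    return ghist, phist


g, p = update_histories(0b101, 0, pc=0x401, taken=True, hist_len=4)
print(bin(g), bin(p))  # 0b1011 0b1
```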


Design tradeoff for CBP2 (1)

13 components:
• Bring the best accuracy on the distributed traces
• 8 components are not very far behind!

History length:
• Min = 4, Max = 640
• Could use any Min in [2, 6] and any Max in [300, 2000]
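To make the Min/Max parameters concrete, the sketch below derives a geometric series between the two bounds for the 12 tagged tables (T1 to T12; the 13th component is the tagless base predictor). The function name and the round-to-nearest rule are assumptions, so the intermediate values are illustrative rather than a statement of the submitted configuration.

```python
def lengths_from_range(l_min, l_max, n_tagged):
    """Geometric history lengths from l_min (T1) to l_max (last tagged table).

    alpha is chosen so that l_min * alpha**(n_tagged - 1) == l_max;
    values are rounded to the nearest integer (assumed rounding rule).
    """
    alpha = (l_max / l_min) ** (1.0 / (n_tagged - 1))
    return [int(l_min * alpha ** i + 0.5) for i in range(n_tagged)]


lengths = lengths_from_range(4, 640, 12)
print(lengths[0], lengths[-1], len(lengths))  # 4 640 12
```

Here α comes out close to 1.59, so each table sees a history roughly 1.6 times longer than the previous one.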


Design tradeoff for CBP2 (2)

Tag width tradeoff:
• (Destructive) false match is better tolerated on shorter histories
• 7 bits on T1 to 15 bits on T12

Tuning the number of table entries:
• Smaller number for very long histories
• Smaller number for very short histories


Adding a loop predictor

The loop predictor captures the number of iterations of a loop.

When the same number of iterations has been seen 4 times in a row, the loop predictor provides the prediction.

Advantages:
• Very reliable
• Small storage budget: 256 52-bit entries

Complexity?
• It might be difficult to manage speculative iteration numbers on deep pipelines
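The behavior described above can be sketched with a single entry, as below. This is a simplified, non-speculative model: the class name and fields are hypothetical and do not reflect the 52-bit entry layout, and the loop branch is assumed to be taken on every iteration except the last.

```python
class LoopEntry:
    """One loop-predictor entry (illustrative fields, not the real layout)."""

    CONF_MAX = 4  # from the slide: same iteration count seen 4 times in a row

    def __init__(self):
        self.trip = 0  # iteration count of the last completed execution
        self.cur = 0   # branch occurrences seen in the current execution
        self.conf = 0  # consecutive executions that ended with `trip`

    def predict(self):
        """Return True/False when confident, None when not providing."""
        if self.conf < self.CONF_MAX:
            return None
        return self.cur + 1 < self.trip  # not taken on the last iteration

    def update(self, taken):
        self.cur += 1
        if taken:
            return
        # Loop exit: compare this execution with the recorded count.
        if self.cur == self.trip:
            self.conf = min(self.conf + 1, self.CONF_MAX)
        else:
            self.trip = self.cur
            self.conf = 1
        self.cur = 0


entry = LoopEntry()
for _ in range(4):                      # four executions of a 3-iteration loop
    for outcome in (True, True, False):
        entry.update(outcome)
predictions = []
for outcome in (True, True, False):     # fifth execution is now predicted
    predictions.append(entry.predict())
    entry.update(outcome)
print(predictions)  # [True, True, False]
```

On a deep pipeline, `cur` would have to be updated speculatively at prediction time and repaired on a misprediction, which is the complexity concern raised on the slide.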


Using a kernel history and a user history

Traces mix user and kernel activities:
• Kernel activity after an exception
• This pollutes the global history

Solution: use two separate global histories
• The user history is updated only in user mode
• The kernel history is updated in both modes
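The update rule for the two histories can be sketched as follows. The class name is hypothetical, and selecting the kernel history for prediction in kernel mode and the user history in user mode is an assumption; the slide only specifies when each history is updated.

```python
class SplitHistory:
    """Two global histories: one for user mode, one for kernel mode."""

    def __init__(self, length=640):
        self.mask = (1 << length) - 1
        self.user = 0
        self.kernel = 0

    def update(self, taken, kernel_mode):
        # The kernel history sees every branch.
        self.kernel = ((self.kernel << 1) | int(taken)) & self.mask
        # The user history is left untouched by kernel activity.
        if not kernel_mode:
            self.user = ((self.user << 1) | int(taken)) & self.mask

    def current(self, kernel_mode):
        """History used for prediction in the given mode (assumed selection)."""
        return self.kernel if kernel_mode else self.user


h = SplitHistory(length=8)
h.update(True, kernel_mode=False)
h.update(False, kernel_mode=True)   # exception handler branch
h.update(True, kernel_mode=True)    # exception handler branch
h.update(True, kernel_mode=False)
print(bin(h.user), bin(h.kernel))   # 0b11 0b1011
```

After returning from the handler, the user history looks as if the exception never happened.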


L-TAGE submission accuracy (distributed traces)

3.314 misp/KI


Reducing L-TAGE complexity

The included 241.5 Kbits TAGE predictor alone:

3.368 misp/KI

The loop predictor is beneficial only on gzip:

• It might not be worth the extra complexity


Using fewer tables

8-component 256 Kbits TAGE predictor: 3.446 misp/KI


TAGE prediction computation time?

3 successive steps:
• Index computation
• Table read
• Partial match + multiplexor

Does not fit in a single cycle:
• But can be ahead pipelined!


Ahead pipelining a global history branch predictor (principle)

Initiate the branch prediction X+1 cycles in advance to provide the prediction in time.

Use the information available:
• X-block ahead instruction address
• X-block ahead history

To ensure accuracy:
• Use intermediate path information


Practice

Ahead pipelined TAGE: 4 parallel prediction computations

[Figure: successive blocks A, B, C, with history Ha and intermediate path bits b, c]


3-branch ahead pipelined 8 component 256 Kbits TAGE

3.552 misp/KI


A final case for the Geometric History Length predictors

• Delivers state-of-the-art accuracy

• Uses only global information:
  • Very long history: 300+ bits!!

• Can be ahead pipelined

• Many effective design points:
  • OGEHL or TAGE
  • Number of tables, history lengths


The End
