Branch Mispredictions in Quicksort

K. Kaligosi¹, C. Martínez², P. Sanders³

¹Max-Planck-Inst., Germany

²Univ. Politècnica de Catalunya, Spain

³Univ. Karlsruhe, Germany

AofA 2006

Alden Biesen, Belgium

Introduction

• Modern hardware executes several sequential instructions in a pipelined fashion

• Jump instructions pose a major challenge!

• So we try to predict which branch will be taken ...

• Branch mispredictions are expensive: we have to roll back the pipeline

Introduction

• In comparison-based algorithms, we want comparisons to yield as much information as possible ⇒ difficult to predict!

• In static branch prediction, jump instructions are statically predicted as TAKEN or NOT TAKEN

• In dynamic branch prediction, the hardware predicts what to do during execution, taking the past into account

  ◦ 1-bit: We predict the instruction will take the same direction it took the last time it was executed

  ◦ 2-bit: We must be wrong twice before we change the prediction

  ◦ . . .
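The dynamic schemes above can be made concrete with a tiny simulator. It replays a sequence of branch outcomes (true = taken) and counts mispredictions. This is a minimal sketch. The function names and the initial 1-bit prediction (not taken) are assumptions made for illustration. For comparison, it also includes a static predictor that always guesses the same direction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Static prediction: always guess the same direction.
std::size_t static_mispredictions(const std::vector<bool>& outcomes,
                                  bool predict_taken) {
    std::size_t misses = 0;
    for (bool taken : outcomes)
        if (taken != predict_taken) ++misses;
    return misses;
}

// 1-bit dynamic prediction: guess the direction seen the last time.
// Assumption: before the first execution the predictor says "not taken".
std::size_t one_bit_mispredictions(const std::vector<bool>& outcomes) {
    bool last = false;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (taken != last) ++misses;
        last = taken;  // the single bit of history
    }
    return misses;
}
```

In this model, the 1-bit predictor misses exactly when the outcome changes direction. For a loop branch, that happens on every entry and on every exit.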

2-bit Predictor

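[Figure: state diagram of the 2-bit predictor with four states. States 00 and 01 are labeled PNT (predict not taken), states 02 and 03 are labeled PT (predict taken), and the transitions are labeled T (taken) and NT (not taken).]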
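A common way to implement a 2-bit scheme is a saturating counter over four states:

- states 0 and 1 predict not taken;
- states 2 and 3 predict taken;
- a taken branch moves the counter one step up, and a not-taken branch moves it one step down.

The exact transitions of the diagram above cannot be recovered from this text. This sketch therefore assumes the saturating-counter variant and an initial state of 0. The class and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed 2-bit variant: a saturating counter with states 0..3.
// States 0 and 1 predict "not taken", states 2 and 3 predict "taken".
class TwoBitPredictor {
public:
    bool predict() const { return state_ >= 2; }
    void update(bool taken) {
        if (taken && state_ < 3) ++state_;   // move toward "strongly taken"
        if (!taken && state_ > 0) --state_;  // move toward "strongly not taken"
    }
private:
    int state_ = 0;  // assumed initial state: strongly not taken
};

std::size_t two_bit_mispredictions(const std::vector<bool>& outcomes) {
    TwoBitPredictor predictor;
    std::size_t misses = 0;
    for (bool taken : outcomes) {
        if (predictor.predict() != taken) ++misses;
        predictor.update(taken);
    }
    return misses;
}
```

Once this predictor has warmed up, each loop exit costs one miss. A single not-taken outcome only moves the counter from state 3 to state 2, so the next re-entry is still predicted correctly.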

Partition

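#include <utility>  // std::swap

// We have to partition A[i..j] around the pivot
// that we have already put on A[i].
// Requires a sentinel A[j+1] >= A[i], so that Loop S always stops.
template <typename Elem>
int partition(Elem A[], int i, int j) {
    int l = i; int u = j + 1; Elem pv = A[i];
    for ( ; ; ) {
        do ++l; while (A[l] < pv);  // Loop S
        do --u; while (A[u] > pv);  // Loop G
        if (l >= u) break;
        std::swap(A[l], A[u]);
    }
    std::swap(A[i], A[u]);
    int k = u;  // final position of the pivot
    return k;
}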
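To run the partition above on a whole array, this sketch stores a sentinel at the end that is not smaller than any key, so that Loop S always stops. This is a common convention and an assumption here. The recursive calls then reuse the previous pivot as the sentinel for the left part. The pivot is always the first element of each subarray, and the function names are hypothetical.

```cpp
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Partition A[i..j] around the pivot A[i] (same loops as on the slide).
// Requires a sentinel: A[j+1] >= A[i]. Returns the pivot's final position k.
template <typename Elem>
int partition_sentinel(std::vector<Elem>& A, int i, int j) {
    int l = i, u = j + 1;
    Elem pv = A[i];
    for (;;) {
        do ++l; while (A[l] < pv);  // Loop S
        do --u; while (A[u] > pv);  // Loop G
        if (l >= u) break;
        std::swap(A[l], A[u]);
    }
    std::swap(A[i], A[u]);
    return u;
}

// Sort A[i..j], assuming A[j+1] exists and is >= every element of A[i..j].
template <typename Elem>
void quicksort_range(std::vector<Elem>& A, int i, int j) {
    if (i >= j) return;
    int k = partition_sentinel(A, i, j);
    quicksort_range(A, i, k - 1);  // the pivot A[k] acts as sentinel here
    quicksort_range(A, k + 1, j);  // the inherited A[j+1] acts as sentinel here
}

// Hypothetical entry point: append a +infinity sentinel, sort, remove it.
void quicksort(std::vector<int>& v) {
    v.push_back(std::numeric_limits<int>::max());
    quicksort_range(v, 0, static_cast<int>(v.size()) - 2);
    v.pop_back();
}
```

A sampling variant would first move the chosen sample element to `A[i]` and then call the same partition.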

Setting up the Recurrences

• Probability that the chosen pivot is the $k$th smallest element out of the $n$: $\pi_{n,k}$

• Average number of branch mispredictions when partitioning an array of size $n$ and the pivot is the $k$th: $b_{n,k}$

• Average number of branch mispredictions when partitioning an array of size $n$:

$$b_n = \sum_{1\le k\le n} \pi_{n,k}\cdot b_{n,k}$$







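• Average number of branch mispredictions $B_n$ to sort $n$ elements:

$$B_n = b_n + \sum_{k=1}^{n} \pi_{n,k}\cdot\left(B_{k-1} + B_{n-k}\right)$$

• We will later consider the total cost $T_n$, which satisfies the same recurrence with toll function ($\xi$ being the cost of a branch misprediction)

$$t_n = n + \xi\cdot b_n + o(n)$$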
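The recurrences above can be evaluated numerically for small $n$. This sketch makes two assumptions:

- no sampling, so $\pi_{n,k} = 1/n$;
- as an example toll, the static-prediction cost $b_{n,k} = \min(k-1, n-k)$ analyzed later in the talk.

It computes $b_n$ as the weighted sum and then $B_n$ from the recurrence, with $B_0 = 0$. The names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// b_n = sum_k pi_{n,k} * b_{n,k} with pi_{n,k} = 1/n and
// b_{n,k} = min(k-1, n-k) (static prediction, no sampling).
double toll_b(int n) {
    double sum = 0.0;
    for (int k = 1; k <= n; ++k)
        sum += std::min(k - 1, n - k);
    return sum / n;
}

// B_n = b_n + sum_k pi_{n,k} (B_{k-1} + B_{n-k}), B_0 = 0, for n = 0..N.
std::vector<double> solve_B(int N) {
    std::vector<double> B(N + 1, 0.0);
    for (int n = 1; n <= N; ++n) {
        double rec = 0.0;
        for (int k = 1; k <= n; ++k)
            rec += B[k - 1] + B[n - k];
        B[n] = toll_b(n) + rec / n;
    }
    return B;
}
```

With these choices, the first general theorem predicts $B_n/(n\ln n) \to \beta(1,0)/\mathcal{H}(1,0) = (1/4)/(1/2) = 1/2$. Because of the $O(n)$ term, the convergence is slow.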



Sampling

• It is well known that using samples to select the pivot of each recursive stage improves the average performance of quicksort and reduces the probability of worst-case behavior

• For quicksort with samples of size $s$ from which we pick the $(p+1)$th element as the pivot, we have

$$\pi_{n,k} = \frac{\binom{k-1}{p}\binom{n-k}{s-1-p}}{\binom{n}{s}}$$
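The formula for $\pi_{n,k}$ counts the samples in which exactly $p$ of the $s$ sampled elements are smaller than the pivot and $s-1-p$ are larger. Here is a direct sketch with binomials computed in floating point, which is adequate for moderate $n$. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Binomial coefficient C(n, r) in floating point (0 outside 0 <= r <= n).
double binom(int n, int r) {
    if (r < 0 || r > n) return 0.0;
    double c = 1.0;
    for (int i = 1; i <= r; ++i)
        c = c * (n - r + i) / i;  // c == C(n - r + i, i) after this step
    return c;
}

// pi_{n,k}: probability that the pivot is the k-th smallest of n elements when
// it is the (p+1)-th smallest of a random sample of s elements.
double pivot_prob(int n, int k, int s, int p) {
    return binom(k - 1, p) * binom(n - k, s - 1 - p) / binom(n, s);
}
```

With $s = 1$ and $p = 0$, it reduces to the uniform case $\pi_{n,k} = 1/n$. Summing over $k$ gives 1 (Vandermonde's identity).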


Sampling

• A typical case is to pick the median of the sample, with $s = 2t+1$ and $p = t$

• We can use variable-size samples with $s = s(n)$; then $s\to\infty$ as $n\to\infty$, but $s$ must grow sublinearly, $s = o(n)$. We use $\alpha$ to denote the relative rank of the pivot within the sample ⇒ e.g., $\alpha = 1/2$ means choosing the median of the sample
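Picking the $(p+1)$th element of a random sample of size $s$ can be sketched with the C++17 standard library:

- `std::sample` draws the sample without replacement;
- `std::nth_element` selects the element of rank $p$ in linear expected time.

The helper name and the use of `std::mt19937` are assumptions. For median-of-$(2t+1)$ one would call it with $s = 2t+1$ and $p = t$.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: draw s elements of v without replacement and return the
// (p+1)-th smallest of them (0-based rank p). Assumes 0 <= p < s <= v.size().
int sample_pivot(const std::vector<int>& v, std::size_t s, std::size_t p,
                 std::mt19937& gen) {
    std::vector<int> sample;
    sample.reserve(s);
    std::sample(v.begin(), v.end(), std::back_inserter(sample), s, gen);
    // Partial selection: afterwards sample[p] holds the element of rank p.
    std::nth_element(sample.begin(),
                     sample.begin() + static_cast<std::ptrdiff_t>(p),
                     sample.end());
    return sample[p];
}
```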


General results

Theorem

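The average number of branch mispredictions to sort $n$ elements with quicksort using samples of size $s$ and choosing the $(p+1)$th element of the sample at each stage is

$$B_n = \frac{\beta(s,p)}{\mathcal{H}(s,p)}\, n\ln n + O(n),$$

where ($H_m$ denotes the $m$th harmonic number)

$$\mathcal{H}(s,p) = H_{s+1} - \frac{p+1}{s+1}H_{p+1} - \frac{s-p}{s+1}H_{s-p}$$

and

$$\beta(s,p) = \lim_{n\to\infty}\frac{b_n}{n} = \lim_{n\to\infty}\frac{1}{n}\sum_{1\le k\le n}\pi^{(s,p)}_{n,k}\, b_{n,k}$$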
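$\mathcal{H}(s,p)$ is easy to evaluate from harmonic numbers. As a sanity check, $s = 1$ (no sampling) gives $\mathcal{H}(1,0) = 1/2$ and median-of-3 gives $\mathcal{H}(3,1) = 7/12$. Their reciprocals, 2 and 12/7, are the familiar leading constants of $n\ln n$ in the number of comparisons of plain and median-of-3 quicksort. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Harmonic number H_m = 1 + 1/2 + ... + 1/m.
double harmonic(int m) {
    double h = 0.0;
    for (int i = 1; i <= m; ++i) h += 1.0 / i;
    return h;
}

// H(s, p) = H_{s+1} - (p+1)/(s+1) H_{p+1} - (s-p)/(s+1) H_{s-p}.
double entropy_sp(int s, int p) {
    const double s1 = s + 1.0;
    return harmonic(s + 1) - (p + 1.0) / s1 * harmonic(p + 1)
           - static_cast<double>(s - p) / s1 * harmonic(s - p);
}
```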

General results

Theorem

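For variable-size sampling, if $s\to\infty$ as $n\to\infty$ with $s = o(n)$, and $p/s\to\alpha$, then

$$B_n = \frac{\beta(\alpha)}{\mathcal{H}(\alpha)}\, n\ln n + o(n\log n),$$

with $\beta(\alpha) = \lim_{n\to\infty}\beta(s, \alpha\cdot s + o(s))$ and

$$\mathcal{H}(x) = -\left(x\ln x + (1-x)\ln(1-x)\right)$$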
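For the median ($s = 2t+1$, $p = t$), the definition of $\mathcal{H}(s,p)$ simplifies to $H_{2t+2} - H_{t+1}$. This should approach $\mathcal{H}(1/2) = \ln 2$ as $t$ grows, in line with the variable-size theorem. Here is a small sketch to check this numerically. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// H(x) = -(x ln x + (1-x) ln(1-x)), for 0 < x < 1.
double entropy(double x) {
    return -(x * std::log(x) + (1.0 - x) * std::log(1.0 - x));
}

// Harmonic number H_m = 1 + 1/2 + ... + 1/m.
double harmonic(int m) {
    double h = 0.0;
    for (int i = 1; i <= m; ++i) h += 1.0 / i;
    return h;
}

// Median of a sample of s = 2t+1: H(2t+1, t) = H_{2t+2} - H_{t+1}.
double entropy_median(int t) {
    return harmonic(2 * t + 2) - harmonic(t + 1);
}
```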

General results

Theorem

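The total cost $T_n$ of quicksort is given by

$$T_n = \frac{1 + \xi\cdot\beta(s,p)}{\mathcal{H}(s,p)}\, n\ln n + O(n), \qquad s = \Theta(1)$$

and

$$T_n = \frac{1 + \xi\cdot\beta(\alpha)}{\mathcal{H}(\alpha)}\, n\ln n + o(n\log n), \qquad s = \omega(1),\ s = o(n)$$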
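For fixed-size samples, the first formula gives the leading coefficient of $T_n$ directly. This sketch assumes the 1-bit scheme. It uses $\beta(2t+1,t) = (t+1)/(2t+3)$ from the 1-bit analysis and $\mathcal{H}(2t+1,t) = H_{2t+2} - H_{t+1}$, which is the definition of $\mathcal{H}(s,p)$ with $s = 2t+1$ and $p = t$. The cost $\xi$ is left as a parameter, and the names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Harmonic number H_m = 1 + 1/2 + ... + 1/m.
double harmonic(int m) {
    double h = 0.0;
    for (int i = 1; i <= m; ++i) h += 1.0 / i;
    return h;
}

// Leading coefficient of T_n / (n ln n) for median-of-(2t+1) sampling with
// 1-bit prediction: (1 + xi * beta(2t+1, t)) / H(2t+1, t), where
// beta(2t+1, t) = (t+1)/(2t+3) and H(2t+1, t) = H_{2t+2} - H_{t+1}.
double total_cost_coeff(int t, double xi) {
    double beta = (t + 1.0) / (2.0 * t + 3.0);
    double h = harmonic(2 * t + 2) - harmonic(t + 1);
    return (1.0 + xi * beta) / h;
}
```

With $\xi = 0$ it returns the comparison constants 2 ($t = 0$) and 12/7 ($t = 1$). Comparing several $t$ at a fixed $\xi$ shows whether a larger median sample still pays off.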

General results

• In order to compute $\beta(s,p)$, we can use, under suitable conditions,

$$\beta(s,p) = \frac{s!}{p!\,(s-1-p)!}\int_0^1 x^p(1-x)^{s-1-p}\, b(x)\,dx$$

with

$$b(x) = \lim_{n\to\infty}\frac{b_{n,x\cdot n}}{n}$$

• Computing $\beta(\alpha)$ is easier!

$$\beta(\alpha) = b(\alpha)$$
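The integral formula can be evaluated numerically for any $b(x)$. This sketch uses composite Simpson's rule, a choice made here and not taken from the talk. The test uses two cases from later in the talk:

- the 1-bit cost $b(x) = 2x(1-x)$, for which $\beta(2t+1,t)$ should equal $(t+1)/(2t+3)$;
- the static cost $b(x) = \min(x, 1-x)$, which gives $\beta(1,0) = 1/4$.

The function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// beta(s, p) = s! / (p! (s-1-p)!) * int_0^1 x^p (1-x)^(s-1-p) b(x) dx,
// approximated by composite Simpson's rule with m (even) subintervals.
double beta_sp(int s, int p, const std::function<double(double)>& b,
               int m = 2000) {
    double c = 1.0;  // normalizing constant s! / (p! (s-1-p)!)
    for (int i = 1; i <= s; ++i) c *= i;
    for (int i = 1; i <= p; ++i) c /= i;
    for (int i = 1; i <= s - 1 - p; ++i) c /= i;

    auto f = [&](double x) {
        return std::pow(x, p) * std::pow(1.0 - x, s - 1 - p) * b(x);
    };
    const double h = 1.0 / m;
    double sum = f(0.0) + f(1.0);
    for (int i = 1; i < m; ++i)
        sum += (i % 2 == 1 ? 4.0 : 2.0) * f(i * h);
    return c * sum * h / 3.0;
}
```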


General results

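• The optimal value $\alpha^*$ for $\alpha$ minimizes the total cost, i.e., it minimizes

$$f_\xi(\alpha) = \frac{1 + \xi\cdot\beta(\alpha)}{\mathcal{H}(\alpha)}$$

and depends on $\xi$

• It is not difficult to prove that, for any $s$ and $p$,

$$\frac{1 + \xi\cdot\beta(s,p)}{\mathcal{H}(s,p)} > \frac{1 + \xi\cdot\beta(\alpha^*)}{\mathcal{H}(\alpha^*)}$$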


General results

• In general, there exists a threshold value $\xi_c$ such that if $\xi \le \xi_c$ (branch mispredictions are not too expensive), then we have to take the median of the samples, i.e., $\alpha^* = 1/2$

• If $\xi > \xi_c$ (which can often happen in practice!), then $\alpha^* < 1/2$ and it is given by the unique solution in $[0, 1/2)$ of the equation

$$\xi\cdot b'(\alpha)\,\mathcal{H}(\alpha) = \left(1 + \xi\cdot b(\alpha)\right)\mathcal{H}'(\alpha)$$

(provided that $b(x)$ is in $C^2[0, 1/2)$)


General results

• The threshold value $\xi_c$ is the solution of

$$\left.\frac{d^2 f_\xi(\alpha)}{d\alpha^2}\right|_{\alpha=1/2} = 0$$

• That is,

$$\xi_c = -\frac{4}{b''(1/2)\ln 2 + 4\,b(1/2)}$$
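The threshold formula can be evaluated for any given $b(x)$. This sketch assumes $b$ is smooth near $1/2$ and approximates $b''(1/2)$ by a central difference. For the 1-bit scheme, $b(x) = 2x(1-x)$, it should reproduce $\xi_c = 2/(2\ln 2 - 1)$. The function name and the step size are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// xi_c = -4 / (b''(1/2) ln 2 + 4 b(1/2)). b''(1/2) is estimated by a central
// difference with step h (assumes b is smooth around 1/2).
double threshold_xi(const std::function<double(double)>& b, double h = 1e-4) {
    double b2 = (b(0.5 + h) - 2.0 * b(0.5) + b(0.5 - h)) / (h * h);
    return -4.0 / (b2 * std::log(2.0) + 4.0 * b(0.5));
}
```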


Static branch prediction

• We analyze here optimal prediction: if the position of the pivot is $k \le n/2$, then we predict Loop S not taken and Loop G taken, and the other way around otherwise

• If $k \le n/2$, we incur a branch misprediction every time there is an element which is smaller than the pivot; symmetrically, if $k > n/2$, then the number of branch mispredictions is $n-k$

• Hence, $b_{n,k} = \min(k-1, n-k)$, $b(\alpha) = \min(\alpha, 1-\alpha)$ and

$$f_\xi(\alpha) = \frac{1 + \xi\cdot\min(\alpha, 1-\alpha)}{\mathcal{H}(\alpha)}$$


[Figure: plot of $\alpha^*$ (vertical axis) against $\xi$ from 0 to 30 (horizontal axis)]

The value of $\alpha^*$ as a function of $\xi$
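The curve above can be approximated numerically. On $(0, 1/2)$, $b(\alpha) = \alpha$, so the stationarity condition from the general results becomes $\xi\,\mathcal{H}(\alpha) = (1 + \xi\alpha)\,\mathcal{H}'(\alpha)$, with $\mathcal{H}'(\alpha) = \ln((1-\alpha)/\alpha)$.

Because $b$ has a kink at $1/2$, the left derivative of $f_\xi$ there is $\xi/\ln 2 > 0$. Hence, for every $\xi > 0$, the minimum lies strictly below $1/2$.

The sketch solves the condition by bisection. It is an illustration under these assumptions, not necessarily how the plot was produced. The bracket $[10^{-12}, 1/2]$ and the names are choices made here.

```cpp
#include <cassert>
#include <cmath>

// H(x) = -(x ln x + (1-x) ln(1-x)) for 0 < x < 1.
double entropy(double x) {
    return -(x * std::log(x) + (1.0 - x) * std::log(1.0 - x));
}

// Static prediction: f_xi(alpha) = (1 + xi * min(alpha, 1 - alpha)) / H(alpha).
double static_cost(double alpha, double xi) {
    return (1.0 + xi * std::fmin(alpha, 1.0 - alpha)) / entropy(alpha);
}

// alpha* for static prediction (assumes xi > 0): root on (0, 1/2) of
// g(a) = xi * H(a) - (1 + xi * a) * ln((1 - a) / a), the numerator of f'_xi.
// g < 0 near 0 and g(1/2) = xi ln 2 > 0, so bisection keeps a valid bracket.
double static_alpha_star(double xi) {
    auto g = [xi](double a) {
        return xi * entropy(a) - (1.0 + xi * a) * std::log((1.0 - a) / a);
    };
    double lo = 1e-12, hi = 0.5;
    for (int it = 0; it < 200; ++it) {
        double mid = 0.5 * (lo + hi);
        if (g(mid) < 0.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```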

1-bit branch prediction

• The number of branch mispredictions is twice the number of exchanges: we incur a misprediction each time we abandon the loops S and G

• Hence, $b_{n,k} = \frac{2(k-1)(n-k)}{n-1}$ and $b(\alpha) = 2\alpha(1-\alpha)$
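The exchange count behind this formula can be checked by brute force. Take distinct keys and a pivot of rank $k$. Each exchange moves one key larger than the pivot out of the $k-1$ positions that finally hold the smaller keys. So the number of exchanges equals the number of larger keys initially in those positions, which averages $(k-1)(n-k)/(n-1)$ over all orders.

The sketch enumerates all $(n-1)!$ orders for a small $n$ and compares the total with $(n-1)!\,(k-1)(n-k)/(n-1)$. It checks the counting argument only, not the misprediction model. `count_exchanges` is a hypothetical instrumented copy of the Partition slide's loop.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical instrumented copy of the slide's partition loop: A[0] is the
// pivot, A.back() is a sentinel >= pivot; returns the number of exchanges.
long count_exchanges(std::vector<int> A) {
    int l = 0, u = static_cast<int>(A.size()) - 1;  // u starts at the sentinel
    int pv = A[0];
    long exchanges = 0;
    for (;;) {
        do ++l; while (A[l] < pv);  // Loop S
        do --u; while (A[u] > pv);  // Loop G
        if (l >= u) break;
        std::swap(A[l], A[u]);
        ++exchanges;
    }
    return exchanges;
}

// Sum of exchanges over all (n-1)! orders of the keys 1..n other than the
// pivot k (keys are distinct; the sentinel n+1 exceeds all of them).
long total_exchanges(int n, int k) {
    std::vector<int> rest;
    for (int v = 1; v <= n; ++v)
        if (v != k) rest.push_back(v);  // sorted, so the loop visits every order
    long total = 0;
    do {
        std::vector<int> A{k};
        A.insert(A.end(), rest.begin(), rest.end());
        A.push_back(n + 1);
        total += count_exchanges(A);
    } while (std::next_permutation(rest.begin(), rest.end()));
    return total;
}
```

For $n = 6$ there are $5! = 120$ orders, so the expected total for pivot rank $k$ is $24\,(k-1)(6-k)$.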




• We can analyze in full detail the performance when using fixed-size samples. For example, for median-of-$(2t+1)$ we have

$$\beta(2t+1, t) = \frac{t+1}{2t+3}$$

• For variable-size samples, $\beta(\alpha) = 2\alpha(1-\alpha)$.

• The threshold is then at $\xi_c = 2/(2\ln 2 - 1) \approx 5.177\ldots$, and $\alpha^*$ is the solution of

$$\ln\alpha + 2\xi\alpha^2\ln\alpha = \ln(1-\alpha) + 2\xi(1-\alpha)^2\ln(1-\alpha)$$


[Figure: plot of $\alpha^*$ (vertical axis) against $\xi$ from 0 to 30 (horizontal axis)]

The value of $\alpha^*$ as a function of $\xi$
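The 1-bit equation for $\alpha^*$ has no closed form, but it is easy to solve numerically. Write $g(\alpha)$ for its left side minus its right side. Then $g$ is negative near 0. It vanishes at $1/2$ with derivative $2(2 - \xi(2\ln 2 - 1))$, which is negative exactly when $\xi > \xi_c$, so in that case $g$ is positive just below $1/2$ and bisection applies.

This is a sketch with hypothetical names. Stopping $10^{-7}$ short of $1/2$ is an assumption that works unless $\xi$ is extremely close to $\xi_c$.

```cpp
#include <cassert>
#include <cmath>

// Threshold for the 1-bit scheme: xi_c = 2 / (2 ln 2 - 1).
double one_bit_threshold() { return 2.0 / (2.0 * std::log(2.0) - 1.0); }

// g(a) = ln a + 2 xi a^2 ln a - ln(1-a) - 2 xi (1-a)^2 ln(1-a).
double g_one_bit(double a, double xi) {
    return std::log(a) + 2.0 * xi * a * a * std::log(a)
         - std::log(1.0 - a) - 2.0 * xi * (1.0 - a) * (1.0 - a) * std::log(1.0 - a);
}

// alpha*(xi) for 1-bit prediction: 1/2 if xi <= xi_c; otherwise bisection on
// [1e-12, 1/2 - 1e-7], where g < 0 at the left end and g > 0 at the right end.
double one_bit_alpha_star(double xi) {
    if (xi <= one_bit_threshold()) return 0.5;
    double lo = 1e-12, hi = 0.5 - 1e-7;
    for (int it = 0; it < 200; ++it) {
        double mid = 0.5 * (lo + hi);
        if (g_one_bit(mid, xi) < 0.0) lo = mid; else hi = mid;
    }
    return 0.5 * (lo + hi);
}
```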


2-bit branch prediction

• In (Kaligosi, Sanders, 2006), an approximate model to compute $b_{n,k}$ is given, from which

$$b(x) = \frac{2x^4 - 4x^3 + x^2 + x}{1 - x(1-x)}$$

follows

• We are working on a more refined analysis of $b_{n,k}$ for this prediction scheme; once $b_{n,k}$ has been found, we should only have to apply the machinery shown here
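Applying the threshold formula from the general results to this approximate model is mechanical. The sketch evaluates $b(x)$ and estimates $b''(1/2)$ by a central difference. Analytically, this $b$ has $b(1/2) = 1/2$ and $b''(1/2) = -20/3$, which the test compares against. Any threshold obtained this way inherits the approximations of the model. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate 2-bit model: b(x) = (2x^4 - 4x^3 + x^2 + x) / (1 - x(1 - x)).
double b_two_bit(double x) {
    double num = 2.0 * std::pow(x, 4) - 4.0 * std::pow(x, 3) + x * x + x;
    return num / (1.0 - x * (1.0 - x));
}

// xi_c = -4 / (b''(1/2) ln 2 + 4 b(1/2)), with b''(1/2) from a central difference.
double two_bit_threshold(double h = 1e-4) {
    double b2 = (b_two_bit(0.5 + h) - 2.0 * b_two_bit(0.5) + b_two_bit(0.5 - h))
                / (h * h);
    return -4.0 / (b2 * std::log(2.0) + 4.0 * b_two_bit(0.5));
}
```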



Some real data

[Figure: time / (n lg n) in ns against lg n, for random pivot, median of 3, exact median and skewed pivot n/10]

Time vs. size on a Pentium 4 (from (Kaligosi, Sanders, 2006))

Some real data

[Figure: time / (n lg n) in ns against 1/α, for n = 2^12, 2^19 and 2^26]

Time vs. 1/α on a Pentium 4

Some real data

[Figure: time / (n lg n) in ns against lg n]

Time vs. size on an Athlon 64

Some real data

[Figure: time / (n lg n) in ns against lg n]

Time vs. size on an Opteron

Some real data

[Figure: time / (n lg n) in ns against lg n]

Time vs. size on a Sun

Future work

• Complete the analysis of static branch prediction with fixed-size samples (it's not easy to obtain $\beta(s,p)$ for general $s$ and $p$!)

• Analyze the 2-bit prediction scheme and possibly others

• Conduct additional experiments and compare the theoretical analysis to real data

• Analyze branch mispredictions and their impact on the performance of other algorithms
