Page 1: Statistical Curse of the Second Half Rank, Eulerian …lptms.u-psud.fr/ressources/journee_du_labo/seminairerace...Statistical Curse of the Second Half Rank, Eulerian numbers and Diaconis

Statistical Curse of the Second Half Rank, Eulerian numbers and Diaconis Riffle Shuffle

J. Desbois, A. Polychronakos, S.O., JSTAT (2011) P01025

S.O., special issue of Markov Processes and Related Fields (2014), in honour of Leonid Pastur

a problem from real life which leads to complicated combinatorics with a nice set of special numbers (and also to interesting open problems, see the end) :

ranking expectations in boat regattas (or in student exams, etc.)


an example : the Spi Ouest France at La Trinité-sur-Mer (Brittany)


the Spi Ouest France regatta involves :

a ”large” number of identical boats nb ∼ 100

in a ”large” number of races nr ∼ 10, i.e. 2 or 3 races per day during 4 days at Easter


in each of the 10 races each boat gets a rank 1 ≤ rank ≤ 100

no equal rank (no ex-aequo)

how does the race committee determine the final rank of the boats and thus the winner ?

1) for each boat add its ranks in each of the 10 races → its score nt

here nb = 100 and nr = 10 ⇒ 10 ≤ nt ≤ 1000

nt = 10 → lowest score, always 1st

nt = 1000 → highest score, always 100th

nt = 10×50 = 500 → the middle score

2) order the scores ⇒ the final ranks :

the boat with the lowest score ⇒ winner 1st

the next boat after the winner ⇒ second 2nd

. . .
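The two-step procedure above can be sketched in a few lines of Python. This is only an illustration: the function name `final_ranks`, the toy rank table, and the tie-breaking rule (lower boat index first) are assumptions, since the slides do not say how equal scores are handled.

```python
def final_ranks(rank_table):
    """rank_table[i][k] is the rank of boat i in race k.

    Returns (scores, ranks), where ranks[i] is the final rank of boat i
    and 1 means winner.
    """
    # Step 1: the score of a boat is the sum of its ranks over all races.
    scores = [sum(ranks) for ranks in rank_table]
    # Step 2: order the boats by increasing score.
    # Ties are broken by boat index (an assumption, not from the slides).
    order = sorted(range(len(scores)), key=lambda i: (scores[i], i))
    ranks = [0] * len(scores)
    for position, boat in enumerate(order, start=1):
        ranks[boat] = position
    return scores, ranks


# Toy example: 3 boats, 3 races, each race is a permutation of (1, 2, 3).
toy_table = [[1, 2, 1], [2, 1, 3], [3, 3, 2]]
print(final_ranks(toy_table))  # scores [4, 6, 8], final ranks [1, 2, 3]
```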


this ranking procedure is clearly legitimate but it has a ”curse” :

consider for example the ranks of a given boat (the LPTMS boat) to be 51, 67, 76, 66, 55, 39, 67, 59, 66, 54 → its score nt = 51 + 67 + . . . = 600

this boat has a mean rank 600/10 = 60

→ clearly on average it has been 60th

→ its crew might ”naively” expect its final rank to be around 60th

no way : its final rank will rather be around 70th → the ”curse”

from the Spi Ouest 2009 result sheet (but this is a recurrent problem) :


why this ”curse” : there is a simple qualitative explanation

given the ranks of the LPTMS boat 51,67,76,66,55,39,67,59,66,54

in each race assume that the ranks of the other boats are random

remember that the boats are identical → random ranks is a good assumption if the crews are more or less equally worthy (which is in part the case)

since there are no ex aequo, random ranks means :

ranks of the other boats = a random permutation

in the first race : random permutation of (1,2,3, ...,50,52, ...,100), i.e. all ranks except 51

in the second race : random permutation of (1,2,3, ...,66,68, ...,100), i.e. all ranks except 67

. . .


each race is obviously independent from the others

→ the score of each boat is a sum of 10 independent random variables

10 is already a large number in probability calculus :

→ Central Limit Theorem applies

→ scores are random variables with gaussian probability density centered around the middle score 10×50 = 500


gaussian distribution ⇒ a lot of boats with scores packed around 500

if the score of a boat is > 500, its final rank is pushed upward from its naive mean rank

⇒ statistical ”curse”

on the contrary, if the score of a boat is < 500, its final rank is pushed downward from its naive mean rank

⇒ statistical ”blessing”


compute things more precisely : given the score nt of the LPTMS boat, what is the probability distribution Pnt (m) for its final rank m to be 1,2, . . . ,nb ?

a little complication : Pnt (m) does not only depend on nt but also on the ranks of the boat in each race

for example : nr = 3, nb = 3, score nt = 6

it is easy to check by complete enumeration that $P_{6=2+2+2}(m) \neq P_{6=1+2+3}(m)$ (distributions are similar but different)

→ a simplification : consider nb boats with in each race random ranks

i.e ranks = random permutation of (1,2,3, . . . ,nb)

⊕ an additional virtual boat specified only by its score nt

→ same question : given the score nt of the virtual boat, what is the probability distribution Pnt (m) for its final rank to be m = 1,2, . . . ,nb +1 ?

→ almost the same problem but simpler
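The simplified problem lends itself to a quick Monte Carlo check. The sketch below is an illustration under the random-rank assumption only: the function name, the number of trials and the seed are arbitrary choices, and the final rank of the virtual boat is taken as 1 plus the number of boats with a strictly lower score, matching the convention ni < nt used later in the slides.

```python
import random


def simulate_mean_final_rank(score, n_boats=100, n_races=10, trials=200, seed=0):
    """Average final rank of a virtual boat with the given score.

    In every race the n_boats real boats receive a uniformly random
    permutation of the ranks 1..n_boats (the random-rank assumption).
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        scores = [0] * n_boats
        for _ in range(n_races):
            ranks = list(range(1, n_boats + 1))
            rng.shuffle(ranks)
            for boat, rank in enumerate(ranks):
                scores[boat] += rank
        # Final rank m: m - 1 boats have a score strictly below the virtual boat.
        total += 1 + sum(1 for s in scores if s < score)
    return total / trials


# Score 600 corresponds to a naive mean rank of 60 over 10 races.
print(simulate_mean_final_rank(600))
```

Comparing the printed value with the naive mean rank 60 gives a feel for the size of the curse in this idealised model; real regatta data need not follow it exactly.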


label the boats : #1, . . . , #i, . . . , #nb

in a given race 1 ≤ k ≤ nr : call ni,k the rank of boat i

the ni,k’s are a random permutation of (1,2, . . . ,nb)

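mean : $\langle n_{i,k} \rangle = \frac{n_b+1}{2}$

sum rule : $\sum_{i=1}^{n_b} n_{i,k} = 1+2+3+\ldots+n_b = \frac{n_b(n_b+1)}{2}$

correlations : $\langle n_{i,k}\, n_{j,k} \rangle - \langle n_{i,k} \rangle \langle n_{j,k} \rangle = \frac{n_b+1}{12}\,(n_b\,\delta_{i,j} - 1)$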
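The mean and the correlations of a random permutation can be checked exactly for a small number of boats. This is a minimal sketch by complete enumeration; the function name `rank_moments` is an assumption, and by symmetry only boat 1 (and the pair of boats 1 and 2) is examined.

```python
from fractions import Fraction
from itertools import permutations


def rank_moments(n_boats):
    """Exact mean, variance and covariance of the ranks in one race."""
    perms = list(permutations(range(1, n_boats + 1)))
    count = len(perms)
    mean = Fraction(sum(p[0] for p in perms), count)
    # i = j case: variance of the rank of one boat.
    variance = Fraction(sum(p[0] * p[0] for p in perms), count) - mean ** 2
    # i != j case: covariance between the ranks of two different boats.
    covariance = Fraction(sum(p[0] * p[1] for p in perms), count) - mean ** 2
    return mean, variance, covariance


n_b = 5
mean, variance, covariance = rank_moments(n_b)
# Compare with (n_b + 1)/2 and (n_b + 1)/12 * (n_b * delta_ij - 1).
print(mean == Fraction(n_b + 1, 2))
print(variance == Fraction(n_b + 1, 12) * (n_b - 1))
print(covariance == Fraction(n_b + 1, 12) * (-1))
```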


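nr races ⇒ $n_i = \sum_{k=1}^{n_r} n_{i,k}$ is the score of boat i

large nr limit → Central Limit Theorem for correlated random variables

⇒ joint probability density

$$f(n_1,\ldots,n_{n_b}) = \sqrt{2\pi\lambda n_b}\;\delta\!\left(\sum_{i=1}^{n_b}\left(n_i - n_r\frac{n_b+1}{2}\right)\right)\left(\frac{1}{\sqrt{2\pi\lambda}}\right)^{n_b}\exp\left[-\frac{1}{2\lambda}\sum_{i=1}^{n_b}\left(n_i - n_r\frac{n_b+1}{2}\right)^2\right]$$

$\lambda = \frac{n_r\, n_b\,(n_b+1)}{12}$, $\quad n_r\frac{n_b+1}{2}$ = middle score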
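A short consistency check, not on the slide, shows where this value of λ comes from. Summing the single-race correlations over the nr independent races gives the covariance of the scores:

$$\langle n_i\, n_j \rangle - \langle n_i \rangle \langle n_j \rangle = n_r\,\frac{n_b+1}{12}\,(n_b\,\delta_{i,j} - 1) = \lambda\left(\delta_{i,j} - \frac{1}{n_b}\right), \qquad \lambda = \frac{n_r\, n_b\,(n_b+1)}{12}$$

This is the covariance of nb independent gaussian variables of variance λ once they are constrained to have a fixed sum, which is the role of the δ function in the joint density.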


for the virtual boat with score nt :

Pnt (m) is the probability for m−1 boats among the nb to have a score ni < nt and for the other nb − m + 1 to have a score ni ≥ nt

$$P_{n_t}(m) = \binom{n_b}{m-1}\int_{-\infty}^{n_t} dn_1 \ldots dn_{m-1} \int_{n_t}^{\infty} dn_m \ldots dn_{n_b}\; f(n_1,\ldots,n_{n_b})$$

take also the large number of boats limit → saddle point approximation to finally get ⟨m⟩ = cumulative probability distribution of a normal variable

$$\langle m \rangle = \frac{n_b}{\sqrt{2\pi\lambda}}\int_{-\infty}^{\bar n_t}\exp\left[-\frac{n^2}{2\lambda}\right] dn$$

$\bar n_t = n_t - n_r\frac{n_b+1}{2}$ = score of the virtual boat − middle score
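The gaussian result for ⟨m⟩ is easy to evaluate numerically. The sketch below writes the normal cumulative distribution with the error function; the function name `mean_final_rank` is an assumption, and the formula is only the large nr, large nb approximation quoted above.

```python
from math import erf, sqrt


def mean_final_rank(n_t, n_b, n_r):
    """Gaussian approximation of the mean final rank <m> for a score n_t."""
    lam = n_r * n_b * (n_b + 1) / 12      # lambda, the variance parameter
    shifted = n_t - n_r * (n_b + 1) / 2   # score minus middle score
    # Cumulative distribution of a centered normal variable of variance lam.
    cumulative = 0.5 * (1 + erf(shifted / sqrt(2 * lam)))
    return n_b * cumulative


for score in (400, 505, 600):
    print(score, mean_final_rank(score, n_b=100, n_r=10))
```

At the middle score the approximation gives exactly nb/2; above it the mean final rank grows faster than the naive mean rank nt/nr, and below it the reverse happens.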


[Figure: mean final rank ⟨m⟩/nb as a function of the score nt (horizontal axis from 200 to 1000, vertical axis from 0.2 to 1.0), for nb = 100 and nr = 10; middle score = 500; regions labelled blessing/curse]


(* f[ni] = 1/(nr (nb - 1)) : uniform on the interval [nr, nr nb] *)

p[nt_, nb_, nr_, m_] := Binomial[nb, m - 1] (1/(nr (nb - 1)))^nb (nt - nr)^(m - 1) (nr nb - nt)^(nb - m + 1)

output : $\left(\frac{1}{(n_b-1)\,n_r}\right)^{n_b}\binom{n_b}{m-1}\,(n_t-n_r)^{m-1}\,(n_b n_r-n_t)^{-m+n_b+1}$

mean[nt_, nb_, nr_] := Sum[p[nt, nb, nr, m] m, {m, 1, nb + 1}]

[Figure: plot with horizontal axis from 200 to 1000 and vertical axis from 0.2 to 1.0]


→ the random ranks assumption is a bit crude : see later Diaconis


up to now a ”large” number of races via the Central Limit Theorem

now do combinatorics for a given number of races nr = 2,3, . . .

the simplest case : nr = 2 races like a ”2-body” problem

⇒ exact solution for Pnt (m)

how to proceed :

i) represent the rank configurations of the nb boats in the 2 races by points on a nb × nb square lattice

no ex aequo ⇒ 1 point per line and per column → nb! such configurations

ii) for a given nt enumerate all the configurations with m−1 points below the diagonal nt


[Figure: a 6 × 6 square lattice with both axes labelled 1 to 6, showing an m = 3 configuration]


combinatorics → complete enumeration on the square lattice :

nt = 2, . . . , nb+1, nb+2, . . . , 2nb

⇒ for nt = 2, . . . , nb+1 :

$$P_{n_t}(m) = (n_b+1)\sum_{k=0}^{m-1}(-1)^k\,(n_b+1-n_t+m-k)^{n_t-1}\,\frac{(n_b-n_t+m-k)!}{k!\,(n_b+1-k)!\,(m-k-1)!}$$

and by symmetry for nt = nb+2, . . . , 2nb

particular case (simple) : middle score nt = nb+1

$$P_{n_t=n_b+1}(m) = (n_b+1)\sum_{k=0}^{m-1}(-1)^k\,\frac{(m-k)^{n_b}}{k!\,(n_b+1-k)!}$$
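The closed form can be compared with the lattice enumeration for small nb. This is a sketch under stated assumptions: the function names are hypothetical, the first race is fixed to the identity ordering (which loses no generality since only relative ranks matter), and the formula is only used in its range 2 ≤ nt ≤ nb+1.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def p_two_races(n_t, m, n_b):
    """Closed form for P_nt(m) with 2 races, for 2 <= n_t <= n_b + 1."""
    total = Fraction(0)
    for k in range(m):
        numerator = ((-1) ** k
                     * (n_b + 1 - n_t + m - k) ** (n_t - 1)
                     * factorial(n_b - n_t + m - k))
        denominator = factorial(k) * factorial(n_b + 1 - k) * factorial(m - k - 1)
        total += Fraction(numerator, denominator)
    return (n_b + 1) * total


def p_brute_force(n_t, m, n_b):
    """Fraction of the n_b! configurations with m - 1 points below the diagonal n_t."""
    count = 0
    for sigma in permutations(range(1, n_b + 1)):
        # Boat i has rank i in race 1 and rank sigma(i) in race 2.
        below = sum(1 for i, s in enumerate(sigma, start=1) if i + s < n_t)
        if below == m - 1:
            count += 1
    return Fraction(count, factorial(n_b))


n_b = 4
for n_t in range(2, n_b + 2):
    row = [p_two_races(n_t, m, n_b) for m in range(1, n_b + 2)]
    check = [p_brute_force(n_t, m, n_b) for m in range(1, n_b + 2)]
    print(n_t, row == check)
```

Exact rational arithmetic (`Fraction`) is used so that the comparison is an equality test rather than a floating-point tolerance.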


tabulate $P_{n_t=n_b+1}(m)$, with m ∈ [1, nb+1], for nb = 1, 2, ..., 7 :

nb = 1 : {1,0}

nb = 2 : (1/2!) {1,1,0}

nb = 3 : (1/3!) {1,4,1,0}

nb = 4 : (1/4!) {1,11,11,1,0}

nb = 5 : (1/5!) {1,26,66,26,1,0}

nb = 6 : (1/6!) {1,57,302,302,57,1,0}

nb = 7 : (1/7!) {1,120,1191,2416,1191,120,1,0}

= Eulerian numbers
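The rows of this table can be regenerated with the standard recurrence for Eulerian numbers, A(n,k) = (k+1) A(n−1,k) + (n−k) A(n−1,k−1). The sketch below is an illustration; the notation A(n,k) and the function name `eulerian_row` are not from the slides.

```python
def eulerian_row(n):
    """Eulerian numbers A(n, 0), ..., A(n, n-1) for n >= 1."""
    row = [1]  # A(1, 0) = 1
    for size in range(2, n + 1):
        new_row = []
        for k in range(size):
            same_k = row[k] if k < len(row) else 0    # A(size-1, k)
            prev_k = row[k - 1] if k >= 1 else 0      # A(size-1, k-1)
            new_row.append((k + 1) * same_k + (size - k) * prev_k)
        row = new_row
    return row


for n in range(1, 8):
    print(n, eulerian_row(n))
```

Dividing row n by n! and appending a final 0 reproduces the tabulated probabilities for nb = n.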


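$\sum_{i\ge 1} i^1\, p^i = \frac{p\,(1)}{(1-p)^2}$

$\sum_{i\ge 1} i^2\, p^i = \frac{p\,(1+p)}{(1-p)^3}$

$\sum_{i\ge 1} i^3\, p^i = \frac{p\,(1+4p+p^2)}{(1-p)^4}$

$\sum_{i\ge 1} i^4\, p^i = \frac{p\,(1+11p+11p^2+p^3)}{(1-p)^5}$

. . .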
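The pattern in these sums can be written once for all n. Using the notation A(n,k) for the Eulerian numbers (a notation assumed here, with k = 0, ..., n−1), the numerators are the Eulerian polynomials:

$$\sum_{i\ge 1} i^n\, p^i = \frac{p\,\sum_{k=0}^{n-1} A(n,k)\, p^k}{(1-p)^{n+1}}, \qquad |p| < 1$$

For n = 4 the coefficients A(4,k) = 1, 11, 11, 1 give back the last line written above.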


why Eulerian numbers show up in the 2 races problem when nt = nb +1 ?

because they count the number of permutations (1, . . . , i, . . . , nb) → (a(1), . . . , a(i), . . . , a(nb))

with a given number of elements a(i) > i = ”exceedances”

⇒ so called Eulerian statistics

Eulerian numbers count a lot of other things :

→ for example they count the number of permutations with a given number of elements greater than the previous element a(i) > a(i−1) = ”ascents”

→ they also count the number of permutations with r ”rising sequences”

”rising sequences” interesting in a different context : deck of cards shuffling
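That exceedances and ascents are both counted by the same numbers can be seen by complete enumeration for a small n. This is a minimal sketch; the helper names are assumptions chosen for readability.

```python
from collections import Counter
from itertools import permutations


def exceedances(perm):
    """Number of positions i (1-based) with a(i) > i."""
    return sum(1 for i, value in enumerate(perm, start=1) if value > i)


def ascents(perm):
    """Number of positions i with a(i) > a(i-1)."""
    return sum(1 for prev, cur in zip(perm, perm[1:]) if cur > prev)


def distribution(statistic, n):
    """How many permutations of (1..n) have 0, 1, ..., n-1 for the statistic."""
    counts = Counter(statistic(p) for p in permutations(range(1, n + 1)))
    return [counts[k] for k in range(n)]


print(distribution(exceedances, 5))
print(distribution(ascents, 5))
```

Both lines print the same list, which is the nb = 5 row of Eulerian numbers.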


an ”ordered” poker deck : spades, diamonds, clubs, hearts


use a well-known card shuffle : the riffle shuffle


a riffle shuffle


what has been done here : for ex with a deck of 9 cards 123456789

cut : 123456789 → 12345|6789 ⊕ interleave : 12345|6789 → 612347859

relative ordering ⇒ a permutation with 2 rising sequences 12345 and 6789

this was a 2-shuffle (with 2 hands) → an a-shuffle (with a hands)

with 3 hands 123456789 → 123|45|6789 → 461237859

461237859 has 3 rising sequences 123, 45 and 6789

with a hands → at most a rising sequences
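The cut-and-interleave description above can be turned into a small simulation. The sketch below counts rising sequences and performs a random a-shuffle; the random model used (multinomial cut, then cards dropped from each packet with probability proportional to its current size) is the usual Gilbert-Shannon-Reeds convention and is an assumption here, since the slide does not spell out the probabilities. Function names are hypothetical.

```python
import random


def rising_sequences(perm):
    """Number of rising sequences of a permutation of (1..n)."""
    position = {value: index for index, value in enumerate(perm)}
    # A new rising sequence starts each time v + 1 sits before v in the deck.
    breaks = sum(1 for v in range(1, len(perm)) if position[v + 1] < position[v])
    return breaks + 1


def a_shuffle(deck, a, rng):
    """One random a-shuffle: cut the deck into a packets, then interleave."""
    n = len(deck)
    sizes = [0] * a
    for _ in range(n):                      # multinomial cut sizes
        sizes[rng.randrange(a)] += 1
    packets, start = [], 0
    for size in sizes:
        packets.append(list(deck[start:start + size]))
        start += size
    shuffled = []
    for remaining in range(n, 0, -1):
        # Drop the next card from a packet chosen proportionally to its size.
        x = rng.randrange(remaining)
        for packet in packets:
            if x < len(packet):
                shuffled.append(packet.pop(0))
                break
            x -= len(packet)
    return shuffled


print(rising_sequences([6, 1, 2, 3, 4, 7, 8, 5, 9]))  # the 2-shuffle example
print(rising_sequences([4, 6, 1, 2, 3, 7, 8, 5, 9]))  # the 3-hand example
print(a_shuffle(list(range(1, 10)), 2, random.Random(0)))
```

Because each packet keeps its internal order, the shuffled deck never has more than a rising sequences.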

Diaconis (1992) (also Shannon, ...) : the probability of getting a given permutation with 1 ≤ r ≤ a rising sequences after an a-shuffle on a deck of n cards is

$$\frac{1}{a^n}\binom{(a-r)+n}{n}$$
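This probability depends on the permutation only through its number r of rising sequences, so it is simple to tabulate. The sketch below is an illustration with a hypothetical function name; the values n = 5 and a = 7 are just sample parameters.

```python
from fractions import Fraction
from math import comb, factorial


def shuffle_probability(a, n, r):
    """Probability of one given permutation with r rising sequences
    after an a-shuffle of n cards: C(a - r + n, n) / a^n."""
    return Fraction(comb(a - r + n, n), a ** n)


n, a = 5, 7
uniform = Fraction(1, factorial(n))
for r in range(1, n + 1):
    ratio = shuffle_probability(a, n, r) / uniform
    print(r, float(shuffle_probability(a, n, r)), float(ratio))
```

The last column is the ratio to the uniform probability 1/n!; a ratio above 1 means the permutation is favoured by the shuffle. Weighting each probability by the number of permutations with r rising sequences (an Eulerian number) and summing over r gives 1.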


⇒ Diaconis again (Markov, Poincare, ...) : with 7 successive 2-shuffles a deck of 52 cards is ”almost” random

each successive 2-shuffle tends to double the number of rising sequences

the maximum number of rising sequences for a deck of 52 cards is 52

= the reversed permutation (52,51,50, . . . ,1)

indeed : 2^7 = 128 > 52

this also explains a magician trick

NB: 2500 successive overhand shuffles would be needed


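Diaconis $\frac{1}{a^n}\binom{(a-r)+n}{n}$ versus uniform $\frac{1}{n!}$

n = 5, a = 7

r = 1 : identity permutation

r = 2,3,4

r = 5 : reversed permutation

far from uniform probability

identity (reversed) permutation = most (least) probable

[Figure: the Diaconis probability compared with the uniform probability for n = 5, a = 7; horizontal axis ticks 2 to 5, vertical axis from 0.005 to 0.025]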


use this property for the race problem :

label the nb boats by their a priori strength (from previous regattas)

boat labelled # 1 is the a priori best boat

boat labelled # 2 is the next one

etc ⇒ an ordered set of boats

in each race : instead of using the uniform probability

use Diaconis a-shuffle probability

the riffle shuffling of boats keeps track of their a priori strength

1-shuffle (1 hand) = identity with prob 1 ⇒ the flat distribution

∞-shuffle = uniform ⇒ the gaussian distribution

also relevant (in part) for what is actually taking place during a race

again Eulerian numbers should play a central role


in the 2 races problem :

narrows down to doing the combinatorics in a given r1 sector for the first race and a given r2 sector for the second race

difficult problem but complete enumerations for small numbers of boats (up to nb = 6) indicate again a simplification when nt = nb +1

in terms of Eulerian numbers

and partitions of product of Eulerian numbers

open problem yet to be solved (work in progress)

