Don't Compare Averages Holger Bast Max-Planck-Institut für Informatik (MPII) Saarbrücken, Germany joint work with Ingmar Weber WEA 2005 May 10 – May 13, Santorini Island, Greece
Transcript
Slide 1
Don't Compare Averages
Holger Bast, Max-Planck-Institut für Informatik (MPII), Saarbrücken, Germany
joint work with Ingmar Weber
WEA 2005, May 10 – May 13, Santorini Island, Greece
Slide 2
Two famous quotes
"There are three kinds of lies: lies, damn lies, and statistics" (Benjamin Disraeli, 1804–1881, reported by Mark Twain)
"Never believe any statistics you haven't forged yourself" (Winston Churchill, 1874–1965)
Slide 3
A typical figure
[Figure: "Theirs" vs. "Ours". Each point represents an average over a number of iterations. Y-axis: some cost measure; X-axis: input size.]
Slide 4
Changing the cost measure... by a monotone function, say from $c$ to $2^c$
This is from authentic data!
[Figure: two panels, one with cost measure $c$ and one with cost measure $2^c$.]
Slide 5
No deep mathematics here
Even for strictly monotone $f$, certainly $E f(X) \neq f(E X)$ in general,
but also $E X \le E Y$ does not in general imply $E f(X) \le E f(Y)$
Example:
$X$: 4, 4 (average 4)
$Y$: 1, 5 (average 3)
$2^X$: $2^4$, $2^4$ (average 16)
$2^Y$: $2^1$, $2^5$ (average 17)
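The example on this slide can be checked in a few lines. The following is a minimal sketch; the helper name `averages_before_and_after` is made up for this illustration, and the data are exactly the four values from the slide.

```python
from statistics import mean

def averages_before_and_after(xs, ys, f):
    """Return ((E X, E Y), (E f(X), E f(Y))) for two lists of measurements."""
    raw = (mean(xs), mean(ys))
    transformed = (mean(f(x) for x in xs), mean(f(y) for y in ys))
    return raw, transformed

# Data from the slide: X = 4, 4 and Y = 1, 5; cost measure changed from c to 2^c.
raw, transformed = averages_before_and_after([4, 4], [1, 5], lambda c: 2 ** c)

print("averages in c:  ", raw)          # X is larger on this scale
print("averages in 2^c:", transformed)  # Y is larger on this scale
```

The order of the two averages flips although every single measurement was transformed by the same strictly increasing function.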
Slide 6
Examples of multiple cost measures
Language modeling: for a given probability distribution $p_1, \dots, p_n$, find a distribution $q_1, \dots, q_n$ from a constrained class that
minimizes cross-entropy $\sum_i p_i \log_2 (p_i / q_i)$
minimizes perplexity $\prod_i (p_i / q_i)^{p_i} = 2^{\text{cross-entropy}}$
Algorithm A uses algorithm B as a subroutine: B produces a result of average quality $q$; the complexity of A depends on, say, $q^2$
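To make the relation between the two language-modeling measures concrete, here is a small sketch that evaluates both formulas as written on the slide and checks that perplexity equals 2 raised to the cross-entropy. The function names and the two distributions `p` and `q` are invented for this illustration, and the logarithm is assumed to be base 2.

```python
import math

def cross_entropy(p, q):
    """The slide's cross-entropy: sum_i p_i * log2(p_i / q_i)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def perplexity(p, q):
    """The slide's perplexity: prod_i (p_i / q_i) ** p_i."""
    return math.prod((pi / qi) ** pi for pi, qi in zip(p, q) if pi > 0)

# Hypothetical distributions, chosen only to have simple numbers.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

h = cross_entropy(p, q)
ppl = perplexity(p, q)
print(h, ppl, 2 ** h)
```

Because the two measures differ by the monotone map $c \mapsto 2^c$, averages taken over several runs in one measure need not rank two methods the same way as averages taken in the other.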
Slide 7
Can this also happen with error bars?
Error bars for $c$ don't overlap, yet reversal for $f(c)$?
Yes, this can also happen!
[Figure: two panels, $c$ and $f(c)$.]
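A hand-made data set shows the effect. This is a sketch under stated assumptions: the measurements `xs` and `ys` are hypothetical (not the authentic data behind the slide), the error bar is taken to be the average plus or minus the absolute deviation $E\,|Z - E Z|$, and $f(c) = 2^c$.

```python
from statistics import mean

def abs_deviation(values):
    """Absolute deviation E|Z - E Z| of an empirical distribution."""
    m = mean(values)
    return mean(abs(v - m) for v in values)

def error_bar(values):
    """Return (lower, upper) = (E Z - deviation, E Z + deviation)."""
    m, d = mean(values), abs_deviation(values)
    return (m - d, m + d)

def f(c):
    return 2 ** c  # the monotone change of cost measure

# Hypothetical measurements, chosen by hand for this illustration.
xs = [3] * 10        # always cost 3
ys = [0] * 9 + [10]  # usually cost 0, occasionally cost 10

x_bar, y_bar = error_bar(xs), error_bar(ys)
fx_bar = error_bar([f(x) for x in xs])
fy_bar = error_bar([f(y) for y in ys])

# In c, the bar of Y ends below the bar of X ...
bars_disjoint_in_c = y_bar[1] < x_bar[0]
# ... yet in f(c) the average of Y exceeds the average of X ...
averages_reversed = mean(f(y) for y in ys) > mean(f(x) for x in xs)
# ... although the bar of f(Y) does not lie entirely above the bar of f(X).
complete_reversal = fy_bar[0] > fx_bar[1]

print(bars_disjoint_in_c, averages_reversed, complete_reversal)
```

In this example the averages reverse, but the error bars in $f(c)$ overlap, so the reversal is not a complete one.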
Slide 8
Can this also happen with error bars?
Complete reversal with error bars?
[Figure: two panels, $c$ and $f(c)$.]
Slide 9
Can this also happen with error bars?
Complete reversal with error bars?
[Figure: two panels, $c$ and $f(c)$.]
Slide 10
Can this also happen with error bars?
Complete reversal with error bars?
$E Y + \delta Y < E X - \delta X$ (in $c$), yet $E f(Y) - \delta f(Y) > E f(X) + \delta f(X)$ (in $f(c)$)
$\delta Z = E\,|Z - E Z|$ (absolute deviation)
$\delta Z = \sqrt{E\,(Z - E Z)^2}$ (standard deviation)
Slide 11
Can this also happen with error bars?
Complete reversal with error bars?
if $E Y + \delta Y \le E X - \delta X$ (in $c$)
then $E f(Y) - \delta f(Y) \le E f(X) + \delta f(X)$ (in $f(c)$)
Theorem: complete reversal can never happen!
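The theorem can be probed empirically. The sketch below is not a proof and not the authors' method: it draws small random integer samples, uses the absolute deviation as the error bar and $f(c) = 2^c$ as the monotone function (both assumptions made for this illustration), and counts how often a complete reversal occurs. Exact rational arithmetic avoids rounding effects at the boundary.

```python
import random
from fractions import Fraction

def mean(values):
    """Exact average of a list of integers or Fractions."""
    return Fraction(sum(values), len(values))

def abs_deviation(values):
    """Exact absolute deviation E|Z - E Z|."""
    m = mean(values)
    return mean([abs(v - m) for v in values])

def complete_reversal(xs, ys, f):
    """True if X's bar lies entirely above Y's in c, yet entirely below it in f(c)."""
    fx, fy = [f(x) for x in xs], [f(y) for y in ys]
    above_in_c = mean(ys) + abs_deviation(ys) <= mean(xs) - abs_deviation(xs)
    below_in_fc = mean(fy) - abs_deviation(fy) > mean(fx) + abs_deviation(fx)
    return above_in_c and below_in_fc

def search(trials=2000, seed=0):
    """Count complete reversals among random pairs of small integer samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.randint(0, 12) for _ in range(rng.randint(1, 6))]
        ys = [rng.randint(0, 12) for _ in range(rng.randint(1, 6))]
        if complete_reversal(xs, ys, lambda c: 2 ** c):
            hits += 1
    return hits

print("complete reversals found:", search())
```

A count of zero is what the theorem predicts for every strictly increasing $f$ and every pair of samples.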
Slide 12
Can this also happen with error bars?
Complete reversal with error bars?
if $E Y + \delta Y \le E X - \delta X$ (in $c$)
then $E f(Y) - \delta f(Y) \le E f(X) + \delta f(X)$ (in $f(c)$)
If only one of the four $\delta$ is dropped, the theorem no longer holds in general.
Slide 13
Our first proof
Slide 14
The canonical proof
1. The medians $M X$ and $M Y$ do commute with $f$:
   $\mathrm{Prob}(X \le M X) = \tfrac{1}{2} = \mathrm{Prob}(f(X) \le f(M X))$, so $f(M X) = M f(X)$ and $f(M Y) = M f(Y)$
2. and hence cannot reverse their order:
   $M X \le M Y \Rightarrow f(M X) \le f(M Y)$ because $f$ is monotone
   $\Rightarrow M f(X) \le M f(Y)$ because $M$ and $f$ commute
3. Expectation and median are related as
   $|E X - M X| \le \delta X = E\,|X - E X|$
   $|E Y - M Y| \le \delta Y = E\,|Y - E Y|$
   Nothing new, but hardly any computer scientist seems to know this.
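Steps 1 and 3 are easy to check on a sample. The following sketch uses a hypothetical random sample of integers (odd length, so that the median is an actual sample value) and $f(c) = 2^c$; neither choice comes from the talk.

```python
import random
from statistics import mean, median

def abs_deviation(values):
    """Absolute deviation E|Z - E Z| of an empirical distribution."""
    m = mean(values)
    return mean(abs(v - m) for v in values)

def f(c):
    return 2 ** c  # a strictly increasing function

# Hypothetical sample; an odd length makes the median one of the sample values.
rng = random.Random(1)
xs = [rng.randint(0, 20) for _ in range(101)]

# Step 1: the median commutes with f, i.e. f(M X) = M f(X).
median_commutes = f(median(xs)) == median([f(x) for x in xs])

# Step 3: the expectation is within one absolute deviation of the median.
median_close_to_mean = abs(mean(xs) - median(xs)) <= abs_deviation(xs)

# By contrast, the expectation does not commute with f in general.
mean_commutes = f(mean(xs)) == mean(f(x) for x in xs)

print(median_commutes, median_close_to_mean, mean_commutes)
```

The median survives the change of scale unchanged, which is what makes it the natural tool for the proof.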
Slide 15
The canonical proof
Now assume this would happen: $E Y + \delta Y < E X - \delta X$ and $E f(Y) - \delta f(Y) > E f(X) + \delta f(X)$
then $M Y \le M X$, yet $M f(Y) > M f(X)$
This contradicts the fact that the medians cannot reverse.
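Spelled out, the contradiction follows from applying step 3 of the proof to each of the four quantities. The chain below is a sketch of that reasoning, writing $\delta$ for the absolute deviation; since the standard deviation is never smaller than the absolute deviation, the same chain goes through for it as well.

```latex
\begin{align*}
E\,Y + \delta Y < E\,X - \delta X
  &\;\Longrightarrow\;
  M\,Y \le E\,Y + \delta Y < E\,X - \delta X \le M\,X, \\
E\,f(Y) - \delta f(Y) > E\,f(X) + \delta f(X)
  &\;\Longrightarrow\;
  M\,f(Y) \ge E\,f(Y) - \delta f(Y) > E\,f(X) + \delta f(X) \ge M\,f(X).
\end{align*}
```

The first line gives $M Y < M X$, the second gives $M f(Y) > M f(X)$, and step 2 of the proof rules out this combination for monotone $f$.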
Slide 16
Conclusion
Average comparison is a deceptive thing, even with error bars!
There are more effects of this kind, e.g. non-overlapping error bars are not statistically significant for a particular order of the expectations (or medians)
e.g. for normally distributed $X$, $Y$: $\mathrm{Prob}(X + \delta X \le Y - \delta Y \mid E X > E Y)$ is up to 8%
Better always look at the complete histogram, and at least check maximum and minimum