Sort in GPDB
Feng Tian, GreenPlum Inc.
WARNING: NON-TECH SLIDES
Why (NOW)?
  Real customers, real problems.
  About to get the code into MAIN.
  Make Joy/Brian's code reading experience easier.
Outline (Doesn't this look familiar?)
Motivation
Review of Current Status
Improve Sort Performance
Remaining Work
Sort in Database
One of the most important operators
  Order by
  Group by
  OLAP
    Rollup and Cube
    Window (partition by and order by)
  Merge Join
  Build index
Sort in GPDB
One of the most mysterious operators
  "Sort is slow" vs. "Sort is OK"
  Fix the planner to avoid sort vs. fix sort
Sort is fun
One of the most extensively studied algorithms
In-memory sorting algorithms
  CK always has some interesting links
  Jie challenged my interview question
  Sedgewick: Quicksort is optimal
  Bentley & McIlroy, 93.
External sort
  TAOCP
GPDB Sort is funny
Good
  Honest TAOCP
  Honest BM93.
Bad
  Equal keys
  Lots of columns
  Sort strings
Ugly
  Combination of the bads
Goal
Get rid of the ugly part of GPDB Sort.
GPDB Sort
Quicksort if entries fit in memory
External sort
  An honest implementation from TAOCP
  I/O pattern is pretty good
  Amount of I/O when sorting tuples is OK
    No compression
  Sorting datums is terrible, but not a concern at this moment
    Only used for distinct
    May eventually be replaced by hash
  Uses a heap to merge
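The external sort outlined above (sorted runs, then a heap-based merge) can be sketched roughly as follows. This is a minimal illustration, not the GPDB code: the names make_runs, merge_runs and external_sort are hypothetical, runs are kept as in-memory lists instead of tapes, and each run is produced by sorting a fixed-size chunk, which is a simplification and not necessarily how TAOCP-style run generation works.

```python
import heapq
from itertools import islice


def make_runs(items, run_size):
    """Cut the input into sorted runs of at most run_size entries each.

    A real external sort would write each run to a temp file or tape;
    here the runs stay in memory to keep the sketch self-contained.
    """
    it = iter(items)
    runs = []
    while True:
        chunk = list(islice(it, run_size))
        if not chunk:
            return runs
        chunk.sort()
        runs.append(chunk)


def merge_runs(runs):
    """Merge sorted runs using a heap that holds one head entry per run."""
    heap = [(run[0], run_no, 0) for run_no, run in enumerate(runs)]
    heapq.heapify(heap)
    out = []
    while heap:
        value, run_no, pos = heapq.heappop(heap)
        out.append(value)
        # Refill the heap from the run the popped entry came from.
        if pos + 1 < len(runs[run_no]):
            heapq.heappush(heap, (runs[run_no][pos + 1], run_no, pos + 1))
    return out


def external_sort(items, run_size):
    """Sort items while only ever sorting run_size entries at a time."""
    return merge_runs(make_runs(items, run_size))
```

For example, external_sort(data, 3) on seven entries builds three runs and merges them with a three-entry heap.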
GPDB Sort
Details
Cost of comparison
  Non-trivial overhead
  (Unicode) string compare is extremely slow
    strcoll vs. strxfrm + strcmp
Cost of memtuple_getattr
  It is way better than heap_getattr
  Postgres devs have known this for a long time
  Cache first sort column
    Sort (1, 'a'), (2, 'a'), (3, 'c') ... is fast.
    Sort (1, 'a'), (1, 'b'), (1, 'c') ... is miserably slow.
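Why the two inputs above behave so differently can be seen in a small sketch. It assumes a comparator that looks at the cached first column and only falls back to an attribute fetch on ties. Entry, get_attr and sort_rows are hypothetical names, and get_attr merely stands in for an expensive call such as memtuple_getattr.

```python
from functools import cmp_to_key

# Counts calls to the (pretend) expensive attribute fetch.
calls = {"getattr": 0}


class Entry:
    def __init__(self, row):
        self.row = row
        self.key0 = row[0]  # first sort column, cached once per entry


def get_attr(entry, col):
    """Stand-in for an expensive fetch such as memtuple_getattr."""
    calls["getattr"] += 1
    return entry.row[col]


def compare(a, b):
    # Fast path: the cached first column decides most comparisons.
    if a.key0 != b.key0:
        return -1 if a.key0 < b.key0 else 1
    # Slow path: on a tie, every later column is fetched again.
    for col in range(1, len(a.row)):
        x, y = get_attr(a, col), get_attr(b, col)
        if x != y:
            return -1 if x < y else 1
    return 0


def sort_rows(rows):
    """Return (sorted rows, number of expensive fetches)."""
    calls["getattr"] = 0
    entries = sorted((Entry(r) for r in rows), key=cmp_to_key(compare))
    return [e.row for e in entries], calls["getattr"]
```

With distinct first columns the fetch count stays at zero. With all-equal first columns every comparison pays for two fetches, and the same attribute is fetched repeatedly.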
Goal
It should be “invisible”
  No API change
  Keep fast cases fast
  Slow cases? What slow cases?
Planner can honestly optimize a query, without worrying about “avoiding” sort
User can write a query, without trying to be creative
In the cases where a sort cannot be avoided, it may save our neck.
Quicksort Is Optimal (Sedgewick)
Equal keys
  Equal keys are good (Bentley & McIlroy)
Do not special-case small n
  Why? Not sure. Cache oblivious?
Multi-column sort keys
  Comparisons get slower and slower
Quicksort
As in the old algorithm, cache the first sort column
Quicksort on the first column
For each range with equal first column, cache the second sort column and quicksort the range
Continue until all sort columns are processed
  May stop early.
Sort (1, 'a'), (2, 'b'), (3, 'c') will not compare strings at all.
Sort (1, 'a'), (1, 'b'), (1, 'c') will only call memtuple_getattr when necessary.
Example
(1, ?), (3, ?), (2, ?), (0, ?), (3, ?), (2, ?)
Choose pivot (2, ?)
  (2, ?), (1, ?), (0, ?) :: (3, ?), (3, ?), (2, ?)
Swap to middle
  (0, ?), (1, ?) :: (2, ?), (2, ?) :: (3, ?), (3, ?)
Recursive Down
Quicksort each partition
  For left and right, just quicksort.
  For the middle part, expand to level k+1
    (2, ?), (2, ?) ... (2, ?) to (2, 'a'), (2, 'x'), (2, 'd') ... (2, 'z')
    Of course, only if the middle has not expanded all levels
NO EXTRA LEVEL EXPANSION
NO EXTRA COMPARISON
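The recursion described above can be sketched as a small multi-key quicksort. This is an illustration under assumptions, not the contents of mk_qsort.c. The name mk_qsort is reused only as a label, get_attr is a hypothetical stand-in for the attribute fetch, and the partition is a plain three-way (less, equal, greater) scan and not the swap-to-ends scheme shown in the example slide.

```python
def mk_qsort(rows, get_attr=lambda row, col: row[col]):
    """Sort a list of equal-length tuples in place, one sort column at a time."""
    ncols = len(rows[0]) if rows else 0
    if ncols == 0:
        return rows
    # cache[i] holds the datum of rows[i] at the level its range is sorted on.
    cache = [None] * len(rows)

    def swap(i, j):
        rows[i], rows[j] = rows[j], rows[i]
        cache[i], cache[j] = cache[j], cache[i]

    def sort_range(lo, hi, level, expanded):
        # Sort rows[lo:hi], already known to be equal on columns < level.
        while hi - lo > 1:
            if not expanded:
                # Expand this range to `level`: one fetch per row, once.
                for i in range(lo, hi):
                    cache[i] = get_attr(rows[i], level)
                expanded = True
            pivot = cache[(lo + hi) // 2]
            lt, i, gt = lo, lo, hi
            while i < gt:  # three-way partition: < pivot | == pivot | > pivot
                if cache[i] < pivot:
                    swap(lt, i)
                    lt += 1
                    i += 1
                elif cache[i] > pivot:
                    gt -= 1
                    swap(i, gt)
                else:
                    i += 1
            # Left and right parts: plain quicksort at the same level.
            sort_range(lo, lt, level, True)
            sort_range(gt, hi, level, True)
            # Middle part: equal on this level, so move to level + 1,
            # unless every sort column has already been used.
            if level + 1 >= ncols:
                return
            lo, hi, level, expanded = lt, gt, level + 1, False

    sort_range(0, len(rows), 0, False)
    return rows
```

A range is expanded to the next column only when it holds more than one row with equal keys so far, so rows with a unique first column are never fetched beyond that column.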
Heapsort
Used in external sort (both to produce runs and to merge runs)
Cache the first sort column when inserting into the heap
Expand to the (n+1)th sort column only when the first n columns equal those of the heap top
  Remember the lv (level) of expansion
  Maintain an array of datums d, with entry.sort_column[x] = d[x] if x < lv
Siftup and Siftdown
  Siftdown hole
HeapSort Continued
NO EXTRA EXPANSION
NO EXTRA COMPARISON
However, the code became more complicated.
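The idea of expanding a heap entry to deeper sort columns only on ties can be illustrated with a simplified sketch. This is not the scheme from the slides: instead of one cached datum per entry plus a shared array d, each entry here keeps a private list of the columns it has expanded so far, which costs more memory but is much shorter to write. LazyEntry and mk_heapsort are hypothetical names, and get_attr stands in for the attribute fetch.

```python
import heapq


class LazyEntry:
    """Heap entry that fetches sort columns only when a comparison needs them."""

    def __init__(self, row, get_attr):
        self.row = row
        self.get_attr = get_attr
        self.keys = [get_attr(row, 0)]  # first sort column cached on insert

    @property
    def lv(self):
        """Number of sort columns expanded so far."""
        return len(self.keys)

    def key(self, level):
        # Expand up to `level`, fetching each column at most once.
        while len(self.keys) <= level:
            self.keys.append(self.get_attr(self.row, len(self.keys)))
        return self.keys[level]

    def __lt__(self, other):
        # Walk the sort columns until one differs; deeper columns are
        # touched only when all earlier ones compare equal.
        for level in range(len(self.row)):
            a, b = self.key(level), other.key(level)
            if a != b:
                return a < b
        return False


def mk_heapsort(rows, get_attr=lambda row, col: row[col]):
    """Return the rows in sorted order using a heap of lazily expanded entries."""
    heap = []
    for row in rows:
        heapq.heappush(heap, LazyEntry(row, get_attr))
    return [heapq.heappop(heap).row for _ in range(len(heap))]
```

An entry whose first column is unique in the heap never leaves lv == 1, so its later columns are never fetched.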
Handling String
When caching a sort column, cache the strxfrm result
  Comparison uses strcmp
Equal strings
  Collapse equal strings
    Compare pointer value first
    Save memory
Problems
  Memory consumption
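The strxfrm idea above can be sketched with Python's locale module, which exposes both strcoll and strxfrm. This is a minimal illustration, not the GPDB code: sort_with_strcoll and sort_with_strxfrm are hypothetical names, and the dictionary is only a stand-in for collapsing equal strings so that they share one transformed key.

```python
import locale
from functools import cmp_to_key


def sort_with_strcoll(strings):
    """Baseline: every comparison calls strcoll on the raw strings."""
    return sorted(strings, key=cmp_to_key(locale.strcoll))


def sort_with_strxfrm(strings):
    """Transform each distinct string once, then compare the transformed keys.

    Equal input strings map to the same cached key object, so the
    transformed form is computed and stored only once per distinct value.
    """
    xfrm_cache = {}

    def cached_key(s):
        key = xfrm_cache.get(s)
        if key is None:
            key = xfrm_cache[s] = locale.strxfrm(s)
        return key

    return sorted(strings, key=cached_key)
```

Both functions should return the same order under the current LC_COLLATE setting. The trade-off is the one listed under Problems: the transformed keys have to be kept in memory for the duration of the sort.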
Minor improvements
Fast path for some basic types
  Int, maybe float later
Limit Sort: Use heapsort instead of insertion sort
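A limit sort with a heap can be sketched as below: keep only the best `limit` values in a bounded heap, so each input value costs at most O(log limit) work, where insertion into a sorted array can cost O(limit). limit_sort is a hypothetical name. The sketch assumes numeric keys, because it builds a max-heap by negating values on top of Python's min-heap heapq.

```python
import heapq


def limit_sort(values, limit):
    """Return the `limit` smallest numbers in ascending order."""
    if limit <= 0:
        return []
    heap = []  # max-heap of the current best values, stored negated
    for v in values:
        if len(heap) < limit:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            # v beats the worst value kept so far: replace the heap top.
            heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)
```

For example, limit_sort([5, 1, 4, 2, 3], 2) keeps a two-entry heap for the whole scan and returns [1, 2].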
“Honest” Implementation
Cutting corners in a performance prototype is dangerous
  Error handling
  Special cases
Relatively honest
  Does not handle unique check etc.
  Passes make installcheck-good.
  Passes TPCH and opperf if hashagg and hashjoin are turned off
TPCH 1G Q1
Hashagg ~5.7 sec
Old sort ~15 sec
New sort ~8 sec
  Aggregate computing takes ~4 sec
  Hashagg proper ~1.5 sec
  New sort generated 3 runs, motioned 6M tuples, and did one more comparison in Agg, all in less than 4 sec.
    The extra comparison takes more than 1 sec
    Sort proper is ~2 sec
Building index
On ship_instruction, ship_mode, comment
  Old: All take 24 to 26 sec
  New: 4 sec, 6 sec, 11 sec
On two columns
  Old: 70+ sec
  New: 16?
OLAP (Cube and Rollup)
For “Big” OLAP CUBE/ROLLUP queries, 10~15% faster
Not much on “smaller” ones; some may even see a small regression
  Unstable timing, regression comes and goes
  Our OLAP plans have many sorts on 1 or 2 integer columns, so this is expected
However, we can now finish some “machine freezing” queries
Yahoo Hashagg
Slightly slower
  Heapsort overhead :-(
  On par once I fastpath-ed int4cmp
More Improvements
We know the level of key change
  Important for sort agg
  Important for OLAP
  Important for merge join
Take (more) advantage of unique, limit, aggregate.
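One way to read "level of key change" is as the index of the first sort column in which a row differs from the previous row in sorted order. The sketch below assumes that reading and shows how a sort agg could find group boundaries with an integer test, without comparing keys again. change_levels and count_groups are hypothetical names. Here the levels are recomputed from the sorted rows for illustration, whereas the point of the slide is that the sort already knows them.

```python
def change_levels(sorted_rows):
    """For each row, the first column index where it differs from the previous row.

    The first row gets level 0; an exact duplicate gets len(row).
    """
    levels = []
    prev = None
    for row in sorted_rows:
        lv = 0
        if prev is not None:
            while lv < len(row) and row[lv] == prev[lv]:
                lv += 1
        levels.append(lv)
        prev = row
    return levels


def count_groups(sorted_rows, levels, ngroup_cols):
    """count(*) grouped by the first ngroup_cols sort columns (ngroup_cols >= 1).

    A new group starts exactly when the key change happened inside the
    grouping columns, so no key comparison is needed here.
    """
    out = []
    for row, lv in zip(sorted_rows, levels):
        if lv < ngroup_cols:
            out.append([row[:ngroup_cols], 1])
        else:
            out[-1][1] += 1
    return [(key, n) for key, n in out]
```

The same levels serve several grouping depths over one sorted input, which is the shape of work a ROLLUP needs.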
Improve the code
Heap code (maybe) is (more) complicated (than necessary); don't know how to improve it yet.
Memory management.
Explain analyze accounting and reporting.
Code Review
Code is at ftian_main_cr2 branch
tuplesort.c
  Should make it tuplesortnew.c, and probably GUC it.
  Uses memtuple and logtape as before.
  Uses new quicksort and heapsort.
mk_qsort.c
  Multi-key quicksort. Straightforward.
mk_heap.c
  Multi-key heapsort. 700 lines of heapsort :-(
About time to port into MAIN.
Feedback (Thanks!)
Welcome ideas, new improvements and critique of the approach.