
Recursive Query Throwdown

Recursive Query Throwdown in MySQL 8

Bill Karwin

Percona Live Open Source Database Conference 2017

Bill Karwin

Software developer, consultant, trainer

Using MySQL since 2000

Senior Database Architect at SchoolMessenger

Author of SQL Antipatterns: Avoiding the Pitfalls of Database Programming

Oracle ACE Director

How to Query a Tree?

Hierarchical data:
§ Organization charts
§ Categories and sub-categories
§ Parts explosion
§ Threaded discussions

https://commons.wikimedia.org/wiki/File:Staff_Organisation_Diagram,_1896.jpg

Example: Threaded Comments

Adjacency List Example Data

comment_id  parent_id  author  comment
1           NULL       Fran    What’s the cause of this bug?
2           1          Ollie   I think it’s a null pointer.
3           2          Fran    No, I checked for that.
4           1          Kukla   We need to check valid input.
5           4          Ollie   Yes, that’s a bug.
6           4          Fran    Yes, please add a check
7           6          Kukla   That fixed it.

Can’t Easily Query Deep Trees

SELECT * FROM Comments c1
LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id)
LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id)
LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id)
LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id)
LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id)
LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id)
LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id)
LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id)
LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id)
...

MySQL Workarounds

MySQL lacked support for recursive queries, so workarounds were needed.

These are all denormalized designs, and most don’t have referential integrity:

§ Path enumeration
§ Nested sets
§ Closure table

Path Enumeration Example Data

comment_id  path      author  comment
1           1/        Fran    What’s the cause of this bug?
2           1/2/      Ollie   I think it’s a null pointer.
3           1/2/3/    Fran    No, I checked for that.
4           1/4/      Kukla   We need to check valid input.
5           1/4/5/    Ollie   Yes, that’s a bug.
6           1/4/6/    Fran    Yes, please add a check
7           1/4/6/7/  Kukla   That fixed it.

Path Enumeration Example Queries

Query ancestors of comment #7:

SELECT * FROM Comments
WHERE '1/4/6/7/' LIKE CONCAT(path, '%');

Query descendants of comment #4:

SELECT * FROM Comments
WHERE path LIKE '1/4/%';
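The two path enumeration queries can be tried without a MySQL server. The following is a minimal sketch that assumes Python's bundled sqlite3 module as a stand-in; SQLite spells CONCAT(path, '%') as path || '%', and the comment text column is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, "1/", "Fran"), (2, "1/2/", "Ollie"), (3, "1/2/3/", "Fran"),
     (4, "1/4/", "Kukla"), (5, "1/4/5/", "Ollie"), (6, "1/4/6/", "Fran"),
     (7, "1/4/6/7/", "Kukla")])

# Ancestors of comment #7: every stored path that is a prefix of 7's path.
ancestors = [row[0] for row in conn.execute(
    "SELECT comment_id FROM Comments "
    "WHERE '1/4/6/7/' LIKE path || '%' ORDER BY path")]

# Descendants of comment #4: every path that starts with 4's path.
descendants = [row[0] for row in conn.execute(
    "SELECT comment_id FROM Comments WHERE path LIKE '1/4/%' ORDER BY path")]

print(ancestors)    # [1, 4, 6, 7]
print(descendants)  # [4, 5, 6, 7]
```

Both results include the starting comment itself, because a path is a prefix of itself.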

Path Enumeration Pros and Cons

Pros:
§ Single non-recursive query to get a tree or a subtree

Cons:
§ Complex updates to add or remove a node
§ Numbers are stored in a string, so there is no referential integrity

Nested Sets

Each comment encodes its descendants using two numbers:
§ A comment’s left number is less than all numbers used by the comment’s descendants.
§ A comment’s right number is greater than all numbers used by the comment’s descendants.
§ A comment’s numbers fall between the left and right numbers of each of the comment’s ancestors.

References:
§ “Recursive Hierarchies: The Relational Taboo!” Michael J. Kamfonas, Relational Journal, Oct/Nov 1992
§ “Trees and Hierarchies in SQL For Smarties,” Joe Celko, 2004
§ “Managing Hierarchical Data in MySQL,” Mike Hillyer, 2005

Nested Sets Example

Nested Sets Example Data

comment_id  nsleft  nsright  author  comment
1           1       14       Fran    What’s the cause of this bug?
2           2       5        Ollie   I think it’s a null pointer.
3           3       4        Fran    No, I checked for that.
4           6       13       Kukla   We need to check valid input.
5           7       8        Ollie   Yes, that’s a bug.
6           9       12       Fran    Yes, please add a check
7           10      11       Kukla   That fixed it.

Nested Sets Example Queries

Query ancestors of comment #7:

SELECT ancestor.*
FROM Comments child
JOIN Comments ancestor
  ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
WHERE child.comment_id = 7;

Query subtree under comment #4:

SELECT descendant.*
FROM Comments parent
JOIN Comments descendant
  ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
WHERE parent.comment_id = 4;
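The nested sets queries can be checked against the example data. This is a minimal sketch that assumes Python's bundled sqlite3 module in place of MySQL and omits the comment text column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, "
    "nsleft INTEGER, nsright INTEGER, author TEXT)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?, ?)",
    [(1, 1, 14, "Fran"), (2, 2, 5, "Ollie"), (3, 3, 4, "Fran"),
     (4, 6, 13, "Kukla"), (5, 7, 8, "Ollie"), (6, 9, 12, "Fran"),
     (7, 10, 11, "Kukla")])

# Ancestors of comment #7: rows whose interval contains 7's left number.
ancestors = [row[0] for row in conn.execute("""
    SELECT ancestor.comment_id
    FROM Comments child
    JOIN Comments ancestor
      ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
    WHERE child.comment_id = 7
    ORDER BY ancestor.nsleft""")]

# Subtree under comment #4: rows whose left number lies inside 4's interval.
subtree = [row[0] for row in conn.execute("""
    SELECT descendant.comment_id
    FROM Comments parent
    JOIN Comments descendant
      ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
    WHERE parent.comment_id = 4
    ORDER BY descendant.nsleft""")]

print(ancestors)  # [1, 4, 6, 7]
print(subtree)    # [4, 5, 6, 7]
```

Ordering by nsleft returns the rows in depth-first tree order.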

Nested Sets Pros and Cons

Pros:
§ Single non-recursive query to get a tree or a subtree

Cons:
§ Complex updates to add or remove a node
§ Numbers are not foreign keys, so there is no referential integrity
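To illustrate why adding a node is a complex update, here is a sketch of one common way to insert a reply as the last child of an existing comment. It assumes Python's sqlite3 module in place of MySQL; add_reply is a hypothetical helper, and comment #8 is an invented row used only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, "
    "nsleft INTEGER, nsright INTEGER)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13),
     (5, 7, 8), (6, 9, 12), (7, 10, 11)])

def add_reply(conn, new_id, parent_id):
    """Insert new_id as the last child of parent_id (hypothetical helper)."""
    (parent_right,) = conn.execute(
        "SELECT nsright FROM Comments WHERE comment_id = ?",
        (parent_id,)).fetchone()
    # Open a gap of two numbers at the parent's right edge.
    shifted = conn.execute(
        "UPDATE Comments SET nsright = nsright + 2 WHERE nsright >= ?",
        (parent_right,)).rowcount
    conn.execute(
        "UPDATE Comments SET nsleft = nsleft + 2 WHERE nsleft > ?",
        (parent_right,))
    # The new leaf occupies the gap.
    conn.execute("INSERT INTO Comments VALUES (?, ?, ?)",
                 (new_id, parent_right, parent_right + 1))
    return shifted

shifted = add_reply(conn, 8, 5)
layout = conn.execute(
    "SELECT comment_id, nsleft, nsright FROM Comments ORDER BY nsleft").fetchall()
print(shifted)  # 5 existing rows had their right number rewritten
print(layout)
```

Adding a single leaf under comment #5 rewrites five of the seven existing rows, which is the cost the slide refers to.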

Closure Table

Many-to-many table

Stores every path from each node to each of its descendants

A node even connects to itself

CREATE TABLE Closure (
  ancestor INT NOT NULL,
  descendant INT NOT NULL,
  length INT NOT NULL,
  PRIMARY KEY (ancestor, descendant),
  FOREIGN KEY (ancestor) REFERENCES Comments(comment_id),
  FOREIGN KEY (descendant) REFERENCES Comments(comment_id)
);

Closure Table Example

Closure Table Example Data

comment_id  author  comment
1           Fran    What’s the cause of this bug?
2           Ollie   I think it’s a null pointer.
3           Fran    No, I checked for that.
4           Kukla   We need to check valid input.
5           Ollie   Yes, that’s a bug.
6           Fran    Yes, please add a check
7           Kukla   That fixed it.

ancestor  descendant  length
1         1           0
1         2           1
1         3           2
1         4           1
1         5           2
1         6           2
1         7           3
2         2           0
2         3           1
3         3           0
4         4           0
4         5           1
4         6           1
4         7           2
5         5           0
6         6           0
6         7           1
7         7           0

Closure Table Example Queries

Query ancestors of comment #7:

SELECT c.*
FROM Comments c
JOIN Closure t ON (c.comment_id = t.ancestor)
WHERE t.descendant = 7;

Query subtree under comment #4:

SELECT c.*
FROM Comments c
JOIN Closure t ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4;
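The closure rows do not have to be typed in by hand. The sketch below, which assumes Python's sqlite3 module in place of MySQL, builds them one comment at a time and then runs both queries. add_comment is a hypothetical helper showing one common way to maintain the table: a self-reference row plus a copy of every path that ends at the parent, extended by one step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE Closure (
        ancestor   INTEGER NOT NULL REFERENCES Comments(comment_id),
        descendant INTEGER NOT NULL REFERENCES Comments(comment_id),
        length     INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    );
""")

def add_comment(comment_id, parent_id, author):
    """Insert a comment and its closure rows (hypothetical helper)."""
    conn.execute("INSERT INTO Comments VALUES (?, ?)", (comment_id, author))
    # Every node connects to itself with length 0.
    conn.execute("INSERT INTO Closure VALUES (?, ?, 0)", (comment_id, comment_id))
    if parent_id is not None:
        # Extend each path that ends at the parent by one step.
        conn.execute(
            "INSERT INTO Closure (ancestor, descendant, length) "
            "SELECT ancestor, ?, length + 1 FROM Closure WHERE descendant = ?",
            (comment_id, parent_id))

for cid, parent, author in [(1, None, "Fran"), (2, 1, "Ollie"), (3, 2, "Fran"),
                            (4, 1, "Kukla"), (5, 4, "Ollie"), (6, 4, "Fran"),
                            (7, 6, "Kukla")]:
    add_comment(cid, parent, author)

ancestors = [row[0] for row in conn.execute(
    "SELECT c.comment_id FROM Comments c "
    "JOIN Closure t ON c.comment_id = t.ancestor "
    "WHERE t.descendant = 7 ORDER BY t.length DESC")]
subtree = [row[0] for row in conn.execute(
    "SELECT c.comment_id FROM Comments c "
    "JOIN Closure t ON c.comment_id = t.descendant "
    "WHERE t.ancestor = 4 ORDER BY t.length, c.comment_id")]
closure_rows = conn.execute("SELECT COUNT(*) FROM Closure").fetchone()[0]

print(ancestors)     # [1, 4, 6, 7]
print(subtree)       # [4, 5, 6, 7]
print(closure_rows)  # 18, the same number of rows as the example data
```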

Closure Table Pros and Cons

Pros:
§ Single non-recursive query to get a tree or a subtree
§ Referential integrity!

Cons:
§ Extra table is required
§ Hierarchy is stored redundantly, too easy to mess up
§ Lots of joins to do most kinds of queries

ANSI SQL Recursive CTE

WITHer Recursive Queries in MySQL?

SQL vendors gradually implemented SQL-99 WITH syntax:
§ IBM DB2 UDB 8 (Dec. 2002)
§ Microsoft SQL Server 2005 (Oct. 2005)
§ Sybase SQL Anywhere 11 (Aug. 2008)
§ Firebird 2.1 (Sep. 2008)
§ PostgreSQL 8.4 (Jul. 2009)
§ Oracle 11g release 2 (Sep. 2009)
§ Teradata (date and version of support unknown, at least 2009)
§ HSQLDB 2.3 (Jul. 2013)
§ SQLite 3.8.3.1 (Feb. 2014)
§ H2 (date and version unknown)

https://www.percona.com/blog/2014/02/11/wither-recursive-queries/

ANSI SQL Recursive Common Table Expression

WITH RECURSIVE cte_name (col_name, col_name, col_name) AS
(
  subquery base case
  UNION ALL
  subquery referencing cte_name
)
SELECT ... FROM cte_name ...

https://dev.mysql.com/doc/refman/8.0/en/with.html
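Conceptually, the base case runs once and the recursive subquery is then re-run against only the rows produced in the previous round, until a round produces nothing. The following Python sketch emulates that loop for the UNION ALL form; evaluate_recursive_cte is a hypothetical helper for illustration, not part of any MySQL API, and it says nothing about how the server implements the iteration internally.

```python
def evaluate_recursive_cte(base_rows, recursive_step):
    """Emulate WITH RECURSIVE ... UNION ALL (hypothetical helper)."""
    result = list(base_rows)    # rows produced by the base-case subquery
    working = list(base_rows)   # rows the recursive subquery sees as cte_name
    while working:
        # Run the recursive subquery against only the rows added last round.
        working = [new for row in working for new in recursive_step(row)]
        result.extend(working)
    return result

# Equivalent of: SELECT 1 UNION ALL SELECT 1+n FROM cte_name WHERE n < 5
series = evaluate_recursive_cte([1], lambda n: [n + 1] if n < 5 else [])
print(series)  # [1, 2, 3, 4, 5]
```

The WHERE condition in the recursive subquery is what ends the loop; without it the iteration would never produce an empty round.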

Generating a Series of Numbers

WITH RECURSIVE MySeries (n) AS
(
  SELECT 1 AS n
  UNION ALL
  SELECT 1+n FROM MySeries WHERE n < 10
)
SELECT * FROM MySeries;

+------+
| n    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
+------+
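The same statement runs unchanged on other engines that support WITH RECURSIVE. As a quick check, this sketch assumes Python's bundled sqlite3 module in place of MySQL 8:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
numbers = [row[0] for row in conn.execute("""
    WITH RECURSIVE MySeries (n) AS (
        SELECT 1 AS n
        UNION ALL
        SELECT 1+n FROM MySeries WHERE n < 10
    )
    SELECT * FROM MySeries
""")]
print(numbers)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```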

Generating a Series of Dates

WITH RECURSIVE MyDates (d) AS
(
  SELECT CURRENT_DATE() AS d
  UNION ALL
  SELECT d + INTERVAL 1 DAY FROM MyDates
  WHERE d < CURRENT_DATE() + INTERVAL 7 DAY
)
SELECT * FROM MyDates;

+------------+
| d          |
+------------+
| 2017-04-24 |
| 2017-04-25 |
| 2017-04-26 |
| 2017-04-27 |
| 2017-04-28 |
| 2017-04-29 |
| 2017-04-30 |
| 2017-05-01 |
+------------+
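The date series returns eight rows, not seven, because the start date is included and the last recursive step still fires for the row one day before the limit. A sketch that reproduces this, assuming Python's sqlite3 module in place of MySQL: SQLite writes the interval arithmetic as date(d, '+1 day'), and the start date is fixed to 2017-04-24 instead of CURRENT_DATE() so the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
start = "2017-04-24"  # fixed stand-in for CURRENT_DATE()
dates = [row[0] for row in conn.execute("""
    WITH RECURSIVE MyDates (d) AS (
        SELECT date(:start) AS d
        UNION ALL
        SELECT date(d, '+1 day') FROM MyDates
        WHERE d < date(:start, '+7 days')
    )
    SELECT d FROM MyDates
""", {"start": start})]

print(len(dates))           # 8
print(dates[0], dates[-1])  # 2017-04-24 2017-05-01
```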

Query ancestors of comment #7

WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS
(
  SELECT comment_id, parent_id, author, comment, 0 AS depth
  FROM Comments
  WHERE comment_id = 7
  UNION ALL
  SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
  FROM CommentTree ct
  JOIN Comments c ON (ct.parent_id = c.comment_id)
)
SELECT * FROM CommentTree;

Query subtree under comment #4

WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS
(
  SELECT comment_id, parent_id, author, comment, 0 AS depth
  FROM Comments
  WHERE comment_id = 4
  UNION ALL
  SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
  FROM CommentTree ct
  JOIN Comments c ON (ct.comment_id = c.parent_id)
)
SELECT * FROM CommentTree;
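The two comment queries differ only in the direction of the join condition. The sketch below runs both against the adjacency list example data, assuming Python's sqlite3 module in place of MySQL 8; walk is a hypothetical helper, the comment text column is omitted, and an ORDER BY is added so the row order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Comments (
    comment_id INTEGER PRIMARY KEY,
    parent_id  INTEGER REFERENCES Comments(comment_id),
    author     TEXT)""")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, None, "Fran"), (2, 1, "Ollie"), (3, 2, "Fran"), (4, 1, "Kukla"),
     (5, 4, "Ollie"), (6, 4, "Fran"), (7, 6, "Kukla")])

def walk(start_id, join_condition):
    """Run the recursive CTE from start_id (hypothetical helper).

    join_condition is one of two fixed literals below, never user input.
    """
    sql = f"""
        WITH RECURSIVE CommentTree (comment_id, parent_id, author, depth) AS (
            SELECT comment_id, parent_id, author, 0 AS depth
            FROM Comments WHERE comment_id = ?
            UNION ALL
            SELECT c.comment_id, c.parent_id, c.author, ct.depth+1
            FROM CommentTree ct
            JOIN Comments c ON ({join_condition})
        )
        SELECT comment_id, depth FROM CommentTree ORDER BY depth, comment_id"""
    return conn.execute(sql, (start_id,)).fetchall()

ancestors = walk(7, "ct.parent_id = c.comment_id")  # walk up toward the root
subtree = walk(4, "ct.comment_id = c.parent_id")    # walk down to the leaves

print(ancestors)  # [(7, 0), (6, 1), (4, 2), (1, 3)]
print(subtree)    # [(4, 0), (5, 1), (6, 1), (7, 2)]
```

The upward walk stops at comment #1 because its parent_id is NULL and matches no row.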

Recursive CTE Pros and Cons

Pros:
§ ANSI SQL-99 Standard
§ Compatible with other SQL implementations
§ Works with Adjacency List (single source of authority)
§ Referential integrity!

Cons:
§ Not compatible with earlier MySQL versions
§ Use of materialized temporary tables may cause performance problems

MySQL CTE Implementation: 💯

Thanks to @MarkusWinand for his preview analysis based on 8.0.1-dmr

http://modern-sql.com/feature/with

Big Hierarchies

ITIS: Sample Hierarchical Data

Integrated Taxonomic Information System (https://www.itis.gov/)
§ Biological database of species of animals, plants, fungi
§ One big tree of 544,954 nodes
§ Data comes in adjacency list & path enumeration format
§ I converted to closure table for query tests

ITIS Data Model

mysql> select * from longnames
       where completename = 'Eschscholzia californica';
+--------+---------------------------+
| tsn    | completename              |
+--------+---------------------------+
|  18956 | Eschscholzia californica  |
+--------+---------------------------+

mysql> select * from hierarchy where TSN = '18956'\G
             TSN: 18956
      Parent_TSN: 18954
           level: 11
   ChildrenCount: 8
hierarchy_string: 202422-954898-846494-954900-846496-846504-18063-846547-18409-18880-18954-18956

Indexes

mysql> ALTER TABLE hierarchy ADD KEY (tsn, parent_tsn);
Query OK, 0 rows affected (1.30 sec)

Breadcrumbs Query

WITH RECURSIVE taxonomy AS (
  SELECT base.tsn, base.parent_tsn, 0 as depth
  FROM hierarchy base
  WHERE tsn = '18956'
  UNION ALL
  SELECT next.tsn, next.parent_tsn, t.depth+1
  FROM hierarchy next JOIN taxonomy t
  WHERE t.parent_tsn = next.tsn
)
SELECT * FROM taxonomy JOIN longnames USING (tsn)
ORDER BY depth DESC;

Breadcrumbs Query Result

+--------+------------+-------+--------------------------+
| tsn    | parent_tsn | depth | completename             |
+--------+------------+-------+--------------------------+
| 202422 |          0 |    11 | Plantae                  |
| 954898 |     202422 |    10 | Viridiplantae            |
| 846494 |     954898 |     9 | Streptophyta             |
| 954900 |     846494 |     8 | Embryophyta              |
| 846496 |     954900 |     7 | Tracheophyta             |
| 846504 |     846496 |     6 | Spermatophytina          |
|  18063 |     846504 |     5 | Magnoliopsida            |
| 846547 |      18063 |     4 | Ranunculanae             |
|  18409 |     846547 |     3 | Ranunculales             |
|  18880 |      18409 |     2 | Papaveraceae             |
|  18954 |      18880 |     1 | Eschscholzia             |
|  18956 |      18954 |     0 | Eschscholzia californica |
+--------+------------+-------+--------------------------+
12 rows in set (0.00 sec)
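The breadcrumbs query can be reproduced on a small sample. This sketch loads only the twelve rows of the breadcrumbs result (the real ITIS tables hold 544,954 nodes) and assumes Python's sqlite3 module in place of MySQL 8. The alias next is renamed nxt, and the join condition is written with ON, purely as a precaution for the stand-in engine.

```python
import sqlite3

# (tsn, parent_tsn, completename) for the twelve rows of the result.
TAXA = [
    (202422, 0, "Plantae"), (954898, 202422, "Viridiplantae"),
    (846494, 954898, "Streptophyta"), (954900, 846494, "Embryophyta"),
    (846496, 954900, "Tracheophyta"), (846504, 846496, "Spermatophytina"),
    (18063, 846504, "Magnoliopsida"), (846547, 18063, "Ranunculanae"),
    (18409, 846547, "Ranunculales"), (18880, 18409, "Papaveraceae"),
    (18954, 18880, "Eschscholzia"),
    (18956, 18954, "Eschscholzia californica"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hierarchy (tsn INTEGER, parent_tsn INTEGER);
    CREATE INDEX hierarchy_tsn ON hierarchy (tsn, parent_tsn);
    CREATE TABLE longnames (tsn INTEGER PRIMARY KEY, completename TEXT);
""")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)",
                 [(tsn, parent) for tsn, parent, _ in TAXA])
conn.executemany("INSERT INTO longnames VALUES (?, ?)",
                 [(tsn, name) for tsn, _, name in TAXA])

breadcrumbs = conn.execute("""
    WITH RECURSIVE taxonomy (tsn, parent_tsn, depth) AS (
        SELECT base.tsn, base.parent_tsn, 0
        FROM hierarchy base WHERE base.tsn = ?
        UNION ALL
        SELECT nxt.tsn, nxt.parent_tsn, t.depth+1
        FROM hierarchy nxt JOIN taxonomy t ON t.parent_tsn = nxt.tsn
    )
    SELECT depth, completename
    FROM taxonomy JOIN longnames USING (tsn)
    ORDER BY depth DESC
""", (18956,)).fetchall()

print(" > ".join(name for _, name in breadcrumbs))
```

The recursion ends at Plantae because no hierarchy row has tsn 0.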

Breadcrumbs Query EXPLAIN Plan

§New note in Extra: "Recursive"

§Using index (covering index) for both base case and recursive case

§I can eliminate the filesort if I allow natural order (base case first)

§No "Using Temporary"? Not so fast…

+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
| id | select_type | table      | type   | possible_keys | key     | key_len | ref          | rows | filtered | Extra                       |
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL          | NULL    | NULL    | NULL         |    4 |   100.00 | Using where; Using filesort |
|  1 | PRIMARY     | longnames  | eq_ref | PRIMARY,tsn   | PRIMARY | 4       | taxonomy.tsn |    1 |   100.00 | NULL                        |
|  2 | DERIVED     | base       | ref    | TSN           | TSN     | 4       | const        |    1 |   100.00 | Using index                 |
|  3 | UNION       | t          | ALL    | NULL          | NULL    | NULL    | NULL         |    2 |   100.00 | Recursive; Using where      |
|  3 | UNION       | next       | ref    | TSN           | TSN     | 4       | t.parent_tsn |    1 |   100.00 | Using index                 |
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+

Breadcrumbs Query Performance

mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
                   query: WITH RECURSIVE `taxonomy` AS ( ... `tsn` ) ORDER BY `depth` DESC
                      db: itis
              exec_count: 1
           total_latency: 10.05 ms
       memory_tmp_tables: 1
         disk_tmp_tables: 0
avg_tmp_tables_per_query: 1
  tmp_tables_to_disk_pct: 0
              first_seen: 2017-04-24 22:07:56
               last_seen: 2017-04-24 22:07:56
                  digest: 8438633360bedce178823bb868589fd0

Breadcrumbs Query Stages

mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
+------+--------------------------------+-------+---------------+-------------+
| user | event_name                     | total | total_latency | avg_latency |
+------+--------------------------------+-------+---------------+-------------+
| root | stage/sql/System lock          |    40 | 6.62 ms       | 165.60 us   |
| root | stage/sql/Opening tables       |   191 | 3.16 ms       | 16.52 us    |
| root | stage/sql/checking permissions |    45 | 1.50 ms       | 33.44 us    |
| root | stage/sql/Creating sort index  |     1 | 239.63 us     | 239.63 us   |
| root | stage/sql/closing tables       |   191 | 191.03 us     | 1.00 us     |
| root | stage/sql/starting             |     2 | 188.44 us     | 94.22 us    |
| root | stage/sql/Sending data         |     6 | 138.96 us     | 23.16 us    |
| root | stage/sql/statistics           |     4 | 122.42 us     | 30.60 us    |
| root | stage/sql/query end            |   191 | 56.67 us      | 296.00 ns   |
| root | stage/sql/preparing            |     4 | 33.57 us      | 8.39 us     |
| root | stage/sql/freeing items        |     2 | 27.93 us      | 13.96 us    |
| root | stage/sql/optimizing           |     5 | 20.03 us      | 4.01 us     |
| root | stage/sql/executing            |     7 | 15.39 us      | 2.20 us     |
| root | stage/sql/removing tmp table   |     4 | 9.35 us       | 2.34 us     |
| root | stage/sql/init                 |     3 | 8.76 us       | 2.92 us     |
| root | stage/sql/Sorting result       |     2 | 4.16 us       | 2.08 us     |
| root | stage/sql/end                  |     3 | 1.93 us       | 644.00 ns   |
| root | stage/sql/cleaning up          |     2 | 1.43 us       | 715.00 ns   |
+------+--------------------------------+-------+---------------+-------------+

Tree Expansion Query Result

See Demo

Tree Expansion Query

WITH RECURSIVE ancestors (tsn, parent_tsn) AS (
  SELECT h.tsn, h.parent_tsn
  FROM hierarchy AS h
  WHERE h.tsn = %s
  UNION ALL
  SELECT h.tsn, h.parent_tsn
  FROM hierarchy AS h
  JOIN ancestors AS base ON h.tsn = base.parent_tsn
),
breadcrumbs (tsn, parent_tsn, depth, breadcrumbs) AS (
  SELECT h.tsn, h.parent_tsn, 0 AS depth,
    CAST(LPAD(h.tsn, 8, '0') AS CHAR(255)) AS breadcrumbs
  FROM hierarchy AS h
  WHERE h.parent_tsn = 0
  UNION ALL
  SELECT h.tsn, h.parent_tsn, base.depth+1 AS depth,
    CONCAT(base.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
  FROM hierarchy AS h
  JOIN ancestors AS a ON h.tsn = a.tsn
  JOIN breadcrumbs AS base ON h.parent_tsn = base.tsn
)
SELECT l.tsn, l.completename, b.depth, b.breadcrumbs
FROM breadcrumbs AS b
JOIN longnames AS l ON b.tsn = l.tsn
UNION
SELECT l.tsn, l.completename, b.depth+1,
  CONCAT(b.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
FROM breadcrumbs AS b
JOIN hierarchy AS h ON b.tsn = h.parent_tsn
JOIN longnames AS l ON l.tsn = h.tsn
ORDER BY breadcrumbs
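The shape of the tree expansion query is easier to follow on a small tree. The sketch below applies the same three steps to the seven-comment example instead of the ITIS data: collect the ancestors of the selected node, walk down from the root along that path while building a zero-padded breadcrumb string, then union in the direct children of every node on the path. It assumes Python's sqlite3 module in place of MySQL 8, uses printf('%08d', ...) where the original uses LPAD, names the string column crumbs, and treats a NULL parent_id as the root marker instead of parent_tsn = 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, "
    "parent_id INTEGER, author TEXT)")
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, None, "Fran"), (2, 1, "Ollie"), (3, 2, "Fran"), (4, 1, "Kukla"),
     (5, 4, "Ollie"), (6, 4, "Fran"), (7, 6, "Kukla")])

rows = conn.execute("""
    WITH RECURSIVE ancestors (comment_id, parent_id) AS (
        -- Step 1: the selected node and everything above it.
        SELECT c.comment_id, c.parent_id FROM Comments c WHERE c.comment_id = ?
        UNION ALL
        SELECT c.comment_id, c.parent_id
        FROM Comments c JOIN ancestors base ON c.comment_id = base.parent_id
    ),
    breadcrumbs (comment_id, depth, crumbs) AS (
        -- Step 2: walk down from the root, but only along the ancestor path.
        SELECT c.comment_id, 0, printf('%08d', c.comment_id)
        FROM Comments c WHERE c.parent_id IS NULL
        UNION ALL
        SELECT c.comment_id, base.depth + 1,
               base.crumbs || ',' || printf('%08d', c.comment_id)
        FROM Comments c
        JOIN ancestors a ON c.comment_id = a.comment_id
        JOIN breadcrumbs base ON c.parent_id = base.comment_id
    )
    -- Step 3: nodes on the path plus the direct children of each of them.
    SELECT b.comment_id AS comment_id, b.depth AS depth, b.crumbs AS crumbs
    FROM breadcrumbs b
    UNION
    SELECT c.comment_id, b.depth + 1,
           b.crumbs || ',' || printf('%08d', c.comment_id)
    FROM breadcrumbs b JOIN Comments c ON c.parent_id = b.comment_id
    ORDER BY crumbs
""", (6,)).fetchall()

for comment_id, depth, crumbs in rows:
    print("  " * depth + str(comment_id))
```

With comment #6 selected, the output lists comments 1, 2, 4, 5, 6 and 7 in tree order. Comment #3 is absent because its parent, comment #2, is not on the path to #6 and so is shown collapsed.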

Tree Expansion Query EXPLAIN

+------+--------------+------------+--------+-------------+---------+-------------------+--------+----------+----------------------------------------------------+
| id   | select_type  | table      | type   | key         | key_len | ref               | rows   | filtered | Extra                                              |
+------+--------------+------------+--------+-------------+---------+-------------------+--------+----------+----------------------------------------------------+
|    1 | PRIMARY      | <derived2> | ALL    | NULL        | NULL    | NULL              | 250230 |   100.00 | Using where                                        |
|    1 | PRIMARY      | l          | eq_ref | PRIMARY     | 4       | b.tsn             |      1 |   100.00 | NULL                                               |
|    2 | DERIVED      | h          | index  | TSN         | 9       | NULL              | 500466 |    10.00 | Using where; Using index                           |
|    3 | UNION        | base       | ALL    | NULL        | NULL    | NULL              |  50046 |   100.00 | Recursive; Using where                             |
|    3 | UNION        | <derived4> | ALL    | NULL        | NULL    | NULL              |      4 |   100.00 | Using where; Using join buffer (Block Nested Loop) |
|    3 | UNION        | h          | ref    | TSN         | 9       | a.tsn,base.tsn    |      1 |   100.00 | Using index                                        |
|    4 | DERIVED      | h          | ref    | TSN         | 4       | const             |      1 |   100.00 | Using index                                        |
|    5 | UNION        | base       | ALL    | NULL        | NULL    | NULL              |      2 |   100.00 | Recursive; Using where                             |
|    5 | UNION        | h          | ref    | TSN         | 4       | base.parent_tsn   |      1 |   100.00 | Using index                                        |
|    8 | UNION        | h          | index  | TSN         | 9       | NULL              | 500466 |   100.00 | Using where; Using index                           |
|    8 | UNION        | l          | eq_ref | PRIMARY     | 4       | itis.h.TSN        |      1 |   100.00 | NULL                                               |
|    8 | UNION        | <derived2> | ref    | <auto_key0> | 5       | itis.h.Parent_TSN |     10 |   100.00 | NULL                                               |
| NULL | UNION RESULT | <union1,8> | ALL    | NULL        | NULL    | NULL              |   NULL |     NULL | Using temporary; Using filesort                    |
+------+--------------+------------+--------+-------------+---------+-------------------+--------+----------+----------------------------------------------------+

Maybe I need more indexes? Unfortunately, I ran out of time to analyze.

Tree Expansion Query Performance

mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
                   query: WITH RECURSIVE `ancestors` ( ` ... `l` . `completename` , `b` .
                      db: itis
              exec_count: 1
           total_latency: 1.24 s
       memory_tmp_tables: 3
         disk_tmp_tables: 0
avg_tmp_tables_per_query: 3
  tmp_tables_to_disk_pct: 0
              first_seen: 2017-04-27 01:33:14
               last_seen: 2017-04-27 01:33:14
                  digest: 86c1417d2ff3679863db754eff425e94

Tree Expansion Query Stages

mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
+------+--------------------------------+-------+---------------+-------------+
| user | event_name                     | total | total_latency | avg_latency |
+------+--------------------------------+-------+---------------+-------------+
| root | stage/sql/Sending data         |    12 | 979.42 ms     | 81.62 ms    |
| root | stage/sql/System lock          |    40 | 6.34 ms       | 158.52 us   |
| root | stage/sql/Opening tables       |   191 | 3.34 ms       | 17.51 us    |
| root | stage/sql/checking permissions |    53 | 1.35 ms       | 25.45 us    |
| root | stage/sql/starting             |     2 | 356.31 us     | 178.16 us   |
| root | stage/sql/statistics           |    12 | 271.01 us     | 22.58 us    |
| root | stage/sql/closing tables       |   191 | 179.15 us     | 937.00 ns   |
| root | stage/sql/preparing            |    12 | 98.18 us      | 8.18 us     |
| root | stage/sql/query end            |   191 | 57.60 us      | 301.00 ns   |
| root | stage/sql/freeing items        |     2 | 47.93 us      | 23.96 us    |
| root | stage/sql/Creating sort index  |     1 | 37.38 us      | 37.38 us    |
| root | stage/sql/optimizing           |    13 | 30.60 us      | 2.35 us     |
| root | stage/sql/executing            |    13 | 30.27 us      | 2.33 us     |
| root | stage/sql/removing tmp table   |    14 | 24.44 us      | 1.74 us     |
| root | stage/sql/init                 |     3 | 14.78 us      | 4.93 us     |
| root | stage/sql/cleaning up          |     2 | 11.66 us      | 5.83 us     |
| root | stage/sql/Sorting result       |     2 | 3.67 us       | 1.84 us     |
| root | stage/sql/end                  |     3 | 3.04 us       | 1.01 us     |
+------+--------------------------------+-------+---------------+-------------+

Conclusions

§ Overall, MySQL 8 support for recursive CTE queries is worth the wait.
§ Exotic cases exist that are beyond any optimizer.
§ I'm excited to upgrade to MySQL 8.0.x ASAP!
§ Now that virtually all major SQL brands support recursive CTEs, we need developer tools and popular apps to use them!

License and Copyright

Copyright 2017 Bill Karwin

http://www.slideshare.net/billkarwin

Released under a Creative Commons 3.0 License: http://creativecommons.org/licenses/by-nc-nd/3.0/

You are free to share (to copy, distribute, and transmit this work) under the following conditions:

Attribution. You must attribute this work to Bill Karwin.

Noncommercial. You may not use this work for commercial purposes.

No Derivative Works. You may not alter, transform, or build upon this work.

