Histogram Support in MySQL 8 · How to Calculate Condition Filter Effect, MySQL 5.7 SELECT * FROM...

Date post: 16-Oct-2020

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |

Histogram Support in MySQL 8.0

Øystein Grøvlen
Senior Principal Software Engineer
MySQL Optimizer Team, Oracle
February 2018


Program Agenda

1. Motivating example
2. Quick start guide
3. How are histograms used?
4. Query example
5. Some advice


Motivating Example

EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;

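JOIN Query

+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+
| id | select_type | table    | type   | possible_keys             | key         | key_len | ref                     | rows     | filtered | Extra       |
+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+
|  1 | SIMPLE      | orders   | ALL    | i_o_orderdate,i_o_custkey | NULL        | NULL    | NULL                    | 15000000 |    31.19 | Using where |
|  1 | SIMPLE      | customer | eq_ref | PRIMARY                   | PRIMARY     | 4       | dbt3.orders.o_custkey   |        1 |    33.33 | Using where |
+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+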


Motivating Example

EXPLAIN SELECT /*+ JOIN_ORDER(customer, orders) */ * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;

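Reverse join order

+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+
| id | select_type | table    | type   | possible_keys             | key         | key_len | ref                     | rows     | filtered | Extra       |
+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+
|  1 | SIMPLE      | customer | ALL    | PRIMARY                   | NULL        | NULL    | NULL                    |  1500000 |    33.33 | Using where |
|  1 | SIMPLE      | orders   | ref    | i_o_orderdate,i_o_custkey | i_o_custkey | 5       | dbt3.customer.c_custkey |       15 |    31.19 | Using where |
+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+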


Comparing Join Order

Performance

[Figure: query execution time (seconds) for the two join orders, orders → customer and customer → orders]


Histograms

ANALYZE TABLE customer UPDATE HISTOGRAM ON c_acctbal WITH 1024 BUCKETS;

EXPLAIN SELECT * FROM orders JOIN customer ON o_custkey = c_custkey WHERE o_orderdate < '1993-01-01' AND c_acctbal < -1000;

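Create histogram to get a better plan

+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+
| id | select_type | table    | type   | possible_keys             | key         | key_len | ref                     | rows     | filtered | Extra       |
+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+
|  1 | SIMPLE      | customer | ALL    | PRIMARY                   | NULL        | NULL    | NULL                    |  1500000 |     0.00 | Using where |
|  1 | SIMPLE      | orders   | ref    | i_o_orderdate,i_o_custkey | i_o_custkey | 5       | dbt3.customer.c_custkey |       15 |    31.19 | Using where |
+----+-------------+----------+--------+---------------------------+-------------+---------+-------------------------+----------+----------+-------------+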


Histograms

Column statistics

• Information about the value distribution of a column
• Data values are grouped into buckets
  – Frequency is calculated for each bucket
  – Maximum 1024 buckets
• May use sampling to build the histogram
  – Sample rate depends on available memory
• Automatically chooses between two histogram types:
  – Singleton: one value per bucket
  – Equi-height: multiple values per bucket
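A minimal sketch of the type choice, assuming the rule described in the MySQL reference manual: a singleton histogram is built when the number of distinct values is at most the number of buckets requested, and an equi-height histogram otherwise. `choose_histogram_type` is a hypothetical helper for illustration, not a MySQL API.

```python
MAX_BUCKETS = 1024  # upper limit on WITH n BUCKETS


def choose_histogram_type(distinct_values: int, buckets: int) -> str:
    """Return the histogram type, assuming the distinct-count rule above."""
    if not 1 <= buckets <= MAX_BUCKETS:
        raise ValueError("buckets must be between 1 and 1024")
    # One bucket per distinct value is only possible if there are enough buckets.
    if distinct_values <= buckets:
        return "singleton"
    return "equi-height"


print(choose_histogram_type(7, 1024))    # e.g. a column with 7 distinct values
print(choose_histogram_type(5000, 100))  # many distinct values, few buckets
```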


Singleton Histogram

[Figure: frequency of each column value (0, 1, 2, 3, 5, 6, 7, 8, 9, 10), one bucket per value]

• One value per bucket

• Each bucket stores:

– Value

– Cumulative frequency

• Well suited to estimate both equality and range predicates
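The sketch below builds a singleton histogram as (value, cumulative frequency) pairs and reads both equality and range selectivities off it, which is why this type suits both kinds of predicate. It assumes no NULLs and no sampling; the function names and sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def build_singleton(values):
    """Return [(value, cumulative_frequency), ...] in value order."""
    counts = Counter(values)
    total = len(values)
    buckets, running = [], 0
    for value in sorted(counts):
        running += counts[value]
        buckets.append((value, running / total))
    return buckets


def _cumulative(buckets, value, inclusive):
    """Fraction of rows with col <= value (inclusive) or col < value."""
    keys = [v for v, _ in buckets]
    idx = bisect_right(keys, value) if inclusive else bisect_left(keys, value)
    return buckets[idx - 1][1] if idx > 0 else 0.0


def equal_selectivity(buckets, value):
    # Step in cumulative frequency at this value (0 if the value is absent).
    return _cumulative(buckets, value, True) - _cumulative(buckets, value, False)


def less_than_selectivity(buckets, value):
    return _cumulative(buckets, value, False)


hist = build_singleton([1, 1, 2, 3, 3, 3, 5, 7])
print(hist)
print(equal_selectivity(hist, 3))      # 3 of the 8 rows have value 3
print(less_than_selectivity(hist, 5))  # values 1, 2 and 3: 6 of 8 rows
print(equal_selectivity(hist, 4))      # 4 does not occur
```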


Equi-Height Histogram

[Figure: frequency per bucket for the buckets 0 - 0, 1 - 1, 2 - 3, 5 - 6 and 7 - 10]

• Multiple values per bucket
• Not quite equi-height
  – Values are not split across buckets ⇒ frequent values end up in separate buckets
• Each bucket stores:
  – Minimum value
  – Maximum value
  – Cumulative frequency
  – Number of distinct values
• Best suited for range predicates
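A greedy sketch of how buckets can be filled without splitting a value: each bucket takes whole distinct values until it holds at least total/num_buckets rows. This is an assumed algorithm for illustration, not MySQL's exact implementation; `build_equi_height` is hypothetical, and the sample data is made up to reproduce the bucket ranges in the figure.

```python
from collections import Counter


def build_equi_height(values, num_buckets):
    """Return [(min, max, cumulative_frequency, distinct_count), ...].

    Assumed greedy rule: close a bucket once it holds at least
    total/num_buckets rows; a value is never split across buckets.
    """
    counts = Counter(values)
    total = len(values)
    target = total / num_buckets
    buckets = []
    lo = None
    in_bucket = ndv = running = 0
    for value in sorted(counts):
        if lo is None:
            lo = value
        in_bucket += counts[value]
        running += counts[value]
        ndv += 1
        hi = value
        if in_bucket >= target:
            buckets.append((lo, hi, running / total, ndv))
            lo, in_bucket, ndv = None, 0, 0
    if lo is not None:  # leftover values form the last bucket
        buckets.append((lo, hi, running / total, ndv))
    return buckets


sample = [0] * 6 + [1] * 6 + [2, 3] * 3 + [5, 6] * 3 + [7, 8, 9, 10]
for lo, hi, cumfreq, ndv in build_equi_height(sample, 5):
    print(f"{lo} - {hi}: cumulative {cumfreq:.3f}, {ndv} distinct")
```

Because a bucket closes as soon as it reaches its share of rows, the frequent values 0 and 1 each fill a bucket by themselves, and the sketch never produces more than num_buckets buckets.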


Usage

• Create or refresh histogram(s) for column(s):

  ANALYZE TABLE table UPDATE HISTOGRAM ON column [, column] WITH n BUCKETS;

  – Note: Updates only the histogram, not other statistics
• Drop histogram:

  ANALYZE TABLE table DROP HISTOGRAM ON column [, column];

• Based on the entire table or on a sample:
  – Depends on available memory: histogram_generation_max_mem_size (default: 20 MB)
• New storage engine API for sampling
  – Default implementation: full table scan, even when sampling
  – Storage engines may implement more efficient sampling
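To see how a memory budget can translate into a sampling rate, here is a rough model. It assumes every sampled row costs a fixed `bytes_per_row` bytes; that figure and the formula are assumptions for illustration, not MySQL's internal computation. Only the 20 MB default of histogram_generation_max_mem_size comes from the slide.

```python
DEFAULT_MAX_MEM = 20_000_000  # histogram_generation_max_mem_size default, in bytes


def sampling_rate(row_count: int, bytes_per_row: int,
                  max_mem: int = DEFAULT_MAX_MEM) -> float:
    """Fraction of rows that fits in the memory budget, capped at 1.0.

    Assumed model: every sampled row costs bytes_per_row bytes.
    """
    if row_count <= 0:
        return 1.0
    return min(1.0, max_mem / (row_count * bytes_per_row))


print(sampling_rate(1_000_000, 8))   # fits in memory: whole table is used
print(sampling_rate(10_000_000, 8))  # only a quarter of the rows fit
```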


Storage

• Stored in a JSON column in data dictionary

• Can be inspected in Information Schema table:

SELECT JSON_PRETTY(histogram)
FROM information_schema.column_statistics
WHERE schema_name = 'dbt3_sf1'
  AND table_name = 'lineitem'
  AND column_name = 'l_linenumber';


Histogram content

{
  "buckets": [
    [1, 0.24994938524948698],
    [2, 0.46421066400720523],
    [3, 0.6427401784471978],
    [4, 0.7855470933802572],
    [5, 0.8927398868395817],
    [6, 0.96423707532558],
    [7, 1]
  ],
  "data-type": "int",
  "null-values": 0.0,
  "collation-id": 8,
  "last-updated": "2018-02-03 21:05:21.690872",
  "sampling-rate": 0.20829115437457252,
  "histogram-type": "singleton",
  "number-of-buckets-specified": 1024
}
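Because the histogram is plain JSON, it can also be processed outside the server. This sketch parses an abridged copy of the l_linenumber histogram above (some keys omitted) with Python's standard json module, then derives per-value frequencies and a range selectivity.

```python
import json

# Abridged copy of the l_linenumber histogram shown above.
HISTOGRAM_JSON = """
{"buckets": [[1, 0.24994938524948698], [2, 0.46421066400720523],
             [3, 0.6427401784471978], [4, 0.7855470933802572],
             [5, 0.8927398868395817], [6, 0.96423707532558], [7, 1]],
 "data-type": "int", "null-values": 0.0,
 "histogram-type": "singleton", "sampling-rate": 0.20829115437457252}
"""

hist = json.loads(HISTOGRAM_JSON)

previous = 0.0
for value, cumulative in hist["buckets"]:
    # In a singleton histogram, a value's frequency is the step in
    # cumulative frequency from the previous bucket.
    print(f"l_linenumber = {value}: {cumulative - previous:.4f}")
    previous = cumulative

# For l_linenumber <= 3, the selectivity is the cumulative frequency at 3.
le3 = next(c for v, c in hist["buckets"] if v == 3)
print(f"l_linenumber <= 3: {le3:.4f}")
```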


Strings

• Max. 42 characters considered

• Base64 encoded

SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq
FROM information_schema.column_statistics,
     JSON_TABLE(histogram->'$.buckets', '$[*]'
                COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';

+-------+--------------------+
| value | cumulfreq          |
+-------+--------------------+
| F     | 0.4862529264385756 |
| O     |  0.974029654577566 |
| P     | 0.9999999999999999 |
+-------+--------------------+
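The SQL above skips everything up to the first colon found from position 10 onwards and Base64-decodes the rest. A Python equivalent, assuming bucket values of the form base64:type254:<payload> (an assumed format consistent with that search); the three sample strings are hypothetical encodings of F, O and P.

```python
import base64


def decode_string_bucket(v: str) -> str:
    """Python mirror of FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1))."""
    # LOCATE(':', v, 10) starts at 1-based position 10, i.e. index 9.
    payload = v[v.index(":", 9) + 1:]
    return base64.b64decode(payload).decode("utf-8")


for raw in ["base64:type254:Rg==", "base64:type254:Tw==", "base64:type254:UA=="]:
    print(decode_string_bucket(raw))
```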


Calculate Bucket Frequency

Use window function

SELECT FROM_BASE64(SUBSTR(v, LOCATE(':', v, 10) + 1)) value, c cumulfreq,
       c - LAG(c, 1, 0) OVER (ORDER BY c) freq
FROM information_schema.column_statistics,
     JSON_TABLE(histogram->'$.buckets', '$[*]'
                COLUMNS(v VARCHAR(60) PATH '$[0]', c double PATH '$[1]')) hist
WHERE column_name = 'o_orderstatus';

+-------+--------------------+----------------------+
| value | cumulfreq          | freq                 |
+-------+--------------------+----------------------+
| F     | 0.4862529264385756 |   0.4862529264385756 |
| O     |  0.974029654577566 |  0.48777672813899037 |
| P     | 0.9999999999999999 | 0.025970345422433927 |
+-------+--------------------+----------------------+
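The same differencing can be done client-side. This sketch mirrors c - LAG(c, 1, 0) with rows ordered by cumulative frequency; the cumulative values are copied from the result above, and `bucket_frequencies` is a hypothetical helper.

```python
def bucket_frequencies(cumulative):
    """Mirror of c - LAG(c, 1, 0) OVER (ORDER BY c)."""
    ordered = sorted(cumulative)
    previous = [0.0] + ordered[:-1]  # LAG with default 0 for the first row
    return [c - p for c, p in zip(ordered, previous)]


cumulfreq = [0.4862529264385756, 0.974029654577566, 0.9999999999999999]
for status, freq in zip("FOP", bucket_frequencies(cumulfreq)):
    print(status, freq)
```

Since the differences telescope, the per-bucket frequencies always sum to the last cumulative frequency.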


When are Histograms useful? Estimate cost of join

• tx JOIN tx+1, with ref access into tx+1
• records(tx+1) = records(tx) * condition_filter_effect * records_per_key
  – records(tx): number of records read from tx
  – condition_filter_effect: fraction of those records passing the table conditions on tx
  – records_per_key: cardinality statistics for the index used for ref access
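Plugging the EXPLAIN numbers from the motivating example into the formula above gives a feel for the estimates. This is a simplification (the optimizer's cost model also includes access costs), and `records_after_join` is a hypothetical helper.

```python
def records_after_join(records_tx: float, condition_filter_effect: float,
                       records_per_key: float) -> float:
    """records(tx+1) = records(tx) * condition_filter_effect * records_per_key"""
    return records_tx * condition_filter_effect * records_per_key


# orders -> customer: 15,000,000 rows read from orders, 31.19 % pass its
# conditions, eq_ref on customer's primary key gives 1 row per key.
print(records_after_join(15_000_000, 0.3119, 1))

# customer -> orders: 1,500,000 rows read from customer, 33.33 % pass,
# ref access on i_o_custkey gives about 15 rows per key.
print(records_after_join(1_500_000, 0.3333, 15))
```

By this measure the customer-first order looks worse without a histogram (about 7.5 million versus 4.7 million rows). Once the histogram lowers customer's filter effect to a value EXPLAIN displays as 0.00, the customer-first estimate drops close to zero.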


How to Calculate Condition Filter Effect, MySQL 5.7

Filter estimate based on what is available:

1. Range estimate
2. Index statistics
3. Guesstimate

   Condition      Guesstimate
   =              0.1
   <=, <, >, >=   1/3
   BETWEEN        1/9
   NOT <op>       1 - SEL(<op>)
   AND            P(A and B) = P(A) * P(B)
   OR             P(A or B) = P(A) + P(B) - P(A and B)
   …              …

SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
      employee.name = 'John' AND age > 21 AND
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';
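The guesstimate rules combine mechanically. A sketch, assuming independent predicates as the AND and OR rules do; the constant and function names are illustrative only.

```python
# Guesstimates from the table above, used when no better estimate exists.
EQ, RANGE_OP, BETWEEN = 0.1, 1 / 3, 1 / 9


def sel_not(p):
    return 1 - p


def sel_and(p_a, p_b):
    # Assumes A and B are independent.
    return p_a * p_b


def sel_or(p_a, p_b):
    return p_a + p_b - sel_and(p_a, p_b)


print(round(sel_and(EQ, RANGE_OP), 4))  # name = 'John' AND age > 21
print(round(sel_not(BETWEEN), 4))       # NOT (x BETWEEN a AND b)
print(round(sel_or(EQ, EQ), 4))         # x = 1 OR y = 2
```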


How to Calculate Condition Filter Effect, MySQL 8.0

Filter estimate based on what is available:

1. Range estimate
2. Index statistics
3. Histograms
4. Guesstimate

   Condition      Guesstimate
   =              0.1
   <=, <, >, >=   1/3
   BETWEEN        1/9
   NOT <op>       1 - SEL(<op>)
   AND            P(A and B) = P(A) * P(B)
   OR             P(A or B) = P(A) + P(B) - P(A and B)
   …              …

SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND
      employee.name = 'John' AND age > 21 AND
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';


Calculating Condition Filter Effect for Tables

Example without histograms

SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND                 -- 0.03 (index)
      employee.name = 'John' AND                        -- 0.1 (guesstimate)
      age > 21 AND                                      -- 0.33 (guesstimate)
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';  -- 0.29 (range)

Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.33 ≈ 0.01


Calculating Condition Filter Effect for Tables

Example with histogram

SELECT *
FROM office JOIN employee ON office.id = employee.office_id
WHERE office_name = 'San Francisco' AND                 -- 0.03 (index)
      employee.name = 'John' AND                        -- 0.1 (guesstimate)
      age > 21 AND                                      -- 0.95 (histogram)
      hire_date BETWEEN '2014-01-01' AND '2014-06-01';  -- 0.29 (range)

Condition filter effect for tables:
– office: 0.03
– employee: 0.29 * 0.1 * 0.95 ≈ 0.03
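The two employee estimates differ only in the age predicate. This sketch multiplies the per-predicate selectivities annotated on the slides (AND-ed predicates, assuming independence); the dictionary keys are just labels.

```python
from math import prod

# Per-predicate selectivities for employee, as annotated on the slides.
without_histogram = {
    "name = 'John'": 0.1,        # guesstimate
    "age > 21": 0.33,            # guesstimate
    "hire_date BETWEEN": 0.29,   # range estimate
}
# The histogram on age replaces only the age guesstimate.
with_histogram = {**without_histogram, "age > 21": 0.95}

for label, sels in [("without histogram", without_histogram),
                    ("with histogram", with_histogram)]:
    # AND-ed predicates: multiply, assuming independence.
    print(f"employee, {label}: {prod(sels.values()):.4f}")
```

With office's 0.03 unchanged, the histogram roughly triples the employee estimate (≈ 0.01 to ≈ 0.03), which may be enough to change the chosen join order.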


Computing Selectivity From Histogram

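[Figure: cumulative frequency (0 to 1) of an equi-height histogram on age, with buckets 0 - 7, 8 - 16, 17 - 24, 25 - 31, 32 - 38, 39 - 46, 47 - 53, 54 - 61, 62 - 70 and 71 - 104]

Example: age <= 21
• 21 falls in bucket 17 - 24; the cumulative frequency is 0.203 before this bucket and 0.306 at its end
• 5 of the bucket's 8 values (17 to 21) satisfy the condition:
  Selectivity = 0.203 + (0.306 - 0.203) * 5/8 = 0.267

Example: age > 21
• Selectivity = 1 - 0.267 = 0.733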
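The interpolation above can be sketched as a small function over (min, max, cumulative frequency) buckets. It assumes an integer column with values spread evenly inside a bucket, which matches the slide's 5/8 arithmetic. The bucket list is a coarsened, hypothetical stand-in: only the cumulative frequencies 0.203 and 0.306 come from the slide, and the first and last entries merge several of the slide's buckets.

```python
def leq_selectivity(buckets, value):
    """Estimate P(col <= value) from equi-height buckets (lo, hi, cumfreq).

    Assumes an integer column and evenly spread values inside a bucket.
    """
    previous = 0.0
    for lo, hi, cumfreq in buckets:
        if value < lo:
            return previous  # value falls in a gap between buckets
        if value <= hi:
            fraction = (value - lo + 1) / (hi - lo + 1)
            return previous + (cumfreq - previous) * fraction
        previous = cumfreq
    return previous  # value is above the last bucket


# Hypothetical coarsened histogram on age (see lead-in).
age_buckets = [(0, 16, 0.203), (17, 24, 0.306), (25, 104, 1.0)]
le21 = leq_selectivity(age_buckets, 21)
print(round(le21, 3), round(1 - le21, 3))
```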


DBT-3 Query 7

SELECT supp_nation, cust_nation, l_year, SUM(volume) AS revenue
FROM (
    SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation,
           EXTRACT(YEAR FROM l_shipdate) AS l_year,
           l_extendedprice * (1 - l_discount) AS volume
    FROM supplier, lineitem, orders, customer, nation n1, nation n2
    WHERE s_suppkey = l_suppkey
      AND o_orderkey = l_orderkey
      AND c_custkey = o_custkey
      AND s_nationkey = n1.n_nationkey
      AND c_nationkey = n2.n_nationkey
      AND ((n1.n_name = 'RUSSIA' AND n2.n_name = 'FRANCE')
        OR (n1.n_name = 'FRANCE' AND n2.n_name = 'RUSSIA'))
      AND l_shipdate BETWEEN '1995-01-01' AND '1996-12-31'
) AS shipping
GROUP BY supp_nation, cust_nation, l_year
ORDER BY supp_nation, cust_nation, l_year;

Volume Shipping Query


DBT-3 Query 7

Query plan without histogram


DBT-3 Query 7

Query plan with histogram


DBT-3 Query 7

Performance

[Figure: query execution time (seconds) of DBT-3 Query 7 without and with histogram]


Some advice

Which columns to create histograms for?

• Histograms are useful for columns that are
  – not the first column of any index, and
  – used in WHERE conditions of
    • JOIN queries
    • Queries with IN-subqueries
    • ORDER BY ... LIMIT queries
• Best fit
  – Low cardinality columns (e.g., gender, orderStatus, dayOfWeek, enums)
  – Columns with uneven distribution (skew)
  – Stable distribution (does not change much over time)


Some more advice

• When not to create histograms:
  – First column of an index
  – Never used in WHERE clause
  – Monotonically increasing column values (e.g., date columns)
    • Histogram will need frequent updates to be accurate
    • Consider creating an index
• How many buckets?
  – If possible, enough to get a singleton histogram
  – For equi-height, 100 buckets should be enough
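The bucket advice can be turned into a small statement generator. A sketch with hypothetical helpers: the distinct count is an input you would obtain yourself (for example with COUNT(DISTINCT ...)), o_orderstatus has the 3 values shown earlier, and the 1,000,000 figure for c_acctbal is made up.

```python
MAX_BUCKETS = 1024
EQUI_HEIGHT_BUCKETS = 100  # "100 buckets should be enough" (slide advice)


def suggested_buckets(distinct_values: int) -> int:
    """Bucket count following the advice above (hypothetical helper)."""
    if distinct_values <= MAX_BUCKETS:
        # One bucket per distinct value gives a singleton histogram.
        return max(1, distinct_values)
    return EQUI_HEIGHT_BUCKETS


def update_histogram_sql(table: str, column: str, distinct_values: int) -> str:
    # Identifiers are not quoted or validated: use only with trusted names.
    n = suggested_buckets(distinct_values)
    return f"ANALYZE TABLE {table} UPDATE HISTOGRAM ON {column} WITH {n} BUCKETS;"


print(update_histogram_sql("orders", "o_orderstatus", 3))
print(update_histogram_sql("customer", "c_acctbal", 1_000_000))
```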


More information

• MySQL Server Team blog
  – http://mysqlserverteam.com/
  – https://mysqlserverteam.com/histogram-statistics-in-mysql/ (Erik Frøseth)
• My blog
  – http://oysteing.blogspot.com/
• MySQL forums:
  – Optimizer & Parser: http://forums.mysql.com/list.php?115
  – Performance: http://forums.mysql.com/list.php?24


