Relational Division Explained

publications.sqltopia.com/Relational Division.pdf · 2013. 11. 12.

Relational Division

What it is, when to use it and how to use it.

Who am I?

• Teacher Institute of Education (1990-1992)

– Mathematics and Physics

• Self-employed (1993-1998)

• Long time consultant (1993-)

• MVP – Microsoft Most Valuable Professional (2009-)

• SolidQ (2011-)

• Active on several forums

– SwePeso, Peso or Pesomannen

• www.sqlteam.com

• www.sqlservercentral.com

• msdn.microsoft.com

• Blog at

– http://weblogs.sqlteam.com/peterl

– http://www.sqltopia.com

BACKGROUND

Relational database theory

Codd & Date Consulting Group

Edgar Frank "Ted" Codd

• August 23, 1923 – April 18, 2003

• British computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases.

Chris Date

• January 18, 1941 –

• Is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.

• The Third Manifesto - 1995

Relational Algebra

• Similar to normal algebra, except we use relations as values instead of numbers.

• Not used directly as a query language in any actual DBMS.

• Relations are seen as sets of tuples, which means that no duplicates are allowed.

– SQL behaves differently in some cases.

• T-SQL is declarative, which means that you tell the DBMS what you want, but not how it is to be calculated.

– A .NET program is procedural, which means that you have to state, step by step, exactly how the result should be calculated.

• Relational algebra consists of mathematical expressions.
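
The "no duplicates" point can be seen in a small sketch. It uses Python's standard sqlite3 module, with SQLite standing in for SQL Server (an assumption made only to keep the example runnable), and a three-row subset of the PilotSkills table used later in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT)")
conn.executemany(
    "INSERT INTO PilotSkills VALUES (?, ?)",
    [("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"), ("Smith", "B-52 Bomber")],
)

# A plain projection keeps duplicates: SQL returns a bag, not a set.
bag = [name for (name,) in conn.execute(
    "SELECT PilotName FROM PilotSkills ORDER BY PilotName")]

# DISTINCT restores the set semantics that relational algebra assumes.
as_set = [name for (name,) in conn.execute(
    "SELECT DISTINCT PilotName FROM PilotSkills ORDER BY PilotName")]

print(bag)     # ['Jones', 'Jones', 'Smith']
print(as_set)  # ['Jones', 'Smith']
```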

Defining relational division

• Relational division is a binary operation involving two sets, R and S.

– The operation can be written just like a mathematical division, R ÷ S.

– Set R should have at least two attributes, A1 and A2.

– The result, Q, is the set of A1 values in R that are paired with an A2 value matching every row in S.

• The relational division operation is

– Normally a time-consuming and complex operation.

– Effectively the opposite of the Cartesian product.
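
The definition above can be sketched directly with Python sets. The function name divide is made up for this illustration, the data is the PilotSkills and Hangar example from Celko used in the following slides, and the version shown is division with remainder.

```python
def divide(r, s):
    """R ÷ S with remainder: r is a set of (a1, a2) pairs, s is a set of a2 values."""
    s = set(s)
    candidates = {a1 for a1, _ in r}
    # Keep every a1 whose own a2 values cover all of s.
    return {a1 for a1 in candidates if s <= {a2 for x, a2 in r if x == a1}}


pilot_skills = {
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"), ("Wilson", "F-14 Fighter"),
    ("Wilson", "F-17 Fighter"),
}
hangar = {"B-1 Bomber", "B-52 Bomber", "F-14 Fighter"}

quotient = divide(pilot_skills, hangar)
print(sorted(quotient))  # ['Smith', 'Wilson']

# "Opposite of the Cartesian product": quotient x divisor is contained in the dividend.
product = {(pilot, plane) for pilot in quotient for plane in hangar}
print(product <= pilot_skills)  # True
```

The last two lines show in what sense division undoes a product: multiplying the quotient by the divisor gives back rows of R, and whatever is left over in R is the remainder.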

Practical uses

• Running a dating site and want to search for an arbitrary number of attributes

• Pattern matching; for example when you have only a part of a song and want to search for the full song, either binary or by the lyrics

• Which customers have been served by both James and Gabrielle

• Which rental cars have had both engine overhaul and a paint job

Examples and visualization

Joe Celko

• Is an American relational database expert from Austin, Texas

• He has participated in the ANSI X3H2 Database Standards Committee, and helped write the SQL-89 and SQL-92 standards

Examples

• We’ll use his well-known example with the PilotSkills and Hangar tables

Main example

PilotSkills

PilotName PlaneType

Celko Piper Cub

Higgins B-52 Bomber

Higgins F-14 Fighter

Higgins Piper Cub

Jones B-52 Bomber

Jones F-14 Fighter

Smith B-1 Bomber

Smith B-52 Bomber

Smith F-14 Fighter

Wilson B-1 Bomber

Wilson B-52 Bomber

Wilson F-14 Fighter

Wilson F-17 Fighter

Hangar

PlaneType

B-1 Bomber

B-52 Bomber

F-14 Fighter

PilotSkills ÷ Hangar

PilotName

Smith

Wilson

Exact division

[Figure: the pilots Celko, Higgins, Jones, Smith and Wilson mapped against the plane types Piper Cub, B-1 Bomber, B-52 Bomber, F-14 Fighter and F-17 Fighter]

Division with remainder

[Figure: the same pilots and plane types, illustrating division with remainder]

DIVIDED WE STAND

Have I already used division?

Single record, exact division

[Figure: the five plane types, with Celko set apart from Higgins, Jones, Smith and Wilson]

Single record, with remainder

[Figure: the five plane types, with Celko and Higgins set apart from Jones, Smith and Wilson]

What did we just do?

• Single record

• Single column

• Division with remainder

• WHERE clause!

– WHERE Col1 = 99
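
To make the WHERE point concrete, here is a minimal sketch, assuming SQLite through Python's sqlite3 module in place of SQL Server and using the PilotSkills table from the main example. The divisor is the single row {Piper Cub}, and the plain WHERE clause is compared with the same division spelled out as a set test.

```python
import sqlite3

PILOT_SKILLS = [
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"), ("Wilson", "F-14 Fighter"),
    ("Wilson", "F-17 Fighter"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?)", PILOT_SKILLS)

# Division with remainder by a one-row, one-column divisor is just a WHERE clause.
via_where = [name for (name,) in conn.execute(
    "SELECT PilotName FROM PilotSkills WHERE PlaneType = 'Piper Cub' ORDER BY PilotName")]

# The same division as a set test: keep pilots whose planes cover the divisor.
divisor = {"Piper Cub"}
pilots = {name for name, _ in PILOT_SKILLS}
via_sets = sorted(
    p for p in pilots if divisor <= {plane for name, plane in PILOT_SKILLS if name == p})

print(via_where)              # ['Celko', 'Higgins']
print(via_where == via_sets)  # True
```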

Is WHERE really a division?

[Figure: the five plane types, with Celko and Higgins set apart from Jones, Smith and Wilson]

Types of Relational Division

Rows      Columns   Remainder  You
Single    Single    Yes
Single    Single    No
Single    Multiple  Yes
Single    Multiple  No
Multiple  Single    Yes
Multiple  Single    No
Multiple  Multiple  Yes
Multiple  Multiple  No

You are already an expert!

• Division is the most common arithmetic operation

• However, it only works for a single-record divisor

• Wait, multiple columns too?

• Yes

– WHERE Col1 = 101 AND Col2 = 9
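
The multiple-column case looks the same. The sketch below borrows the three-column PilotSkills table (PilotName, PlaneType, Model) from the final example in this deck and divides by the single row (F-14 Fighter, J). SQLite through Python's sqlite3 module is an assumption made to keep it runnable.

```python
import sqlite3

PILOT_SKILLS = [
    ("Celko", "Piper Cub", "A"),
    ("Higgins", "B-52 Bomber", "A"), ("Higgins", "F-14 Fighter", "J"),
    ("Higgins", "Piper Cub", "A"),
    ("Jones", "B-52 Bomber", "A"), ("Jones", "F-14 Fighter", "J"),
    ("Smith", "B-1 Bomber", "A"), ("Smith", "B-52 Bomber", "B"),
    ("Smith", "F-14 Fighter", "K"),
    ("Wilson", "B-1 Bomber", "A"), ("Wilson", "B-52 Bomber", "B"),
    ("Wilson", "F-14 Fighter", "J"), ("Wilson", "F-17 Fighter", "C"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT, Model TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?, ?)", PILOT_SKILLS)

# One divisor row with two columns: both predicates are combined with AND.
matches = [name for (name,) in conn.execute(
    """
    SELECT PilotName
    FROM PilotSkills
    WHERE PlaneType = ? AND Model = ?
    ORDER BY PilotName
    """,
    ("F-14 Fighter", "J"),
)]

print(matches)  # ['Higgins', 'Jones', 'Wilson']
```

Smith drops out because his F-14 Fighter is model K, not J.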

EXACT DIVISION

The simple way

Exact division, common

Exact division, advanced

A new Relational Division algorithm

COMMON SET-BASED ALGORITHMS

With remainder

Codd

• Uses double negation

– “There ain't no planes in this hangar that I can't fly!”

• Can’t be used for exact division
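
A sketch of the double-negation form, written with nested NOT EXISTS. It may differ in detail from the query shown on the original slide, and it runs against SQLite through Python's sqlite3 module as a stand-in for SQL Server.

```python
import sqlite3

PILOT_SKILLS = [
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"), ("Wilson", "F-14 Fighter"),
    ("Wilson", "F-17 Fighter"),
]
HANGAR = [("B-1 Bomber",), ("B-52 Bomber",), ("F-14 Fighter",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT)")
conn.execute("CREATE TABLE Hangar (PlaneType TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?)", PILOT_SKILLS)
conn.executemany("INSERT INTO Hangar VALUES (?)", HANGAR)

# "There is no plane in the hangar for which this pilot has no matching skill."
DOUBLE_NEGATION = """
SELECT DISTINCT ps1.PilotName
FROM PilotSkills AS ps1
WHERE NOT EXISTS (
    SELECT *
    FROM Hangar AS h
    WHERE NOT EXISTS (
        SELECT *
        FROM PilotSkills AS ps2
        WHERE ps2.PilotName = ps1.PilotName
          AND ps2.PlaneType = h.PlaneType
    )
)
ORDER BY ps1.PilotName
"""

result = [name for (name,) in conn.execute(DOUBLE_NEGATION)]
print(result)  # ['Smith', 'Wilson']
```

Wilson is returned even though he also flies the F-17 Fighter. The query never looks at skills outside the hangar, which is why this form only gives division with remainder.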

Chris Date (Celko #1)

• Made popular by Celko
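
The form usually attributed to Date in Celko's writing counts matches instead of negating. The sketch below assumes that this is the query the slide refers to, assumes neither table contains duplicate rows, and again uses SQLite through Python's sqlite3 module.

```python
import sqlite3

PILOT_SKILLS = [
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"), ("Wilson", "F-14 Fighter"),
    ("Wilson", "F-17 Fighter"),
]
HANGAR = [("B-1 Bomber",), ("B-52 Bomber",), ("F-14 Fighter",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT)")
conn.execute("CREATE TABLE Hangar (PlaneType TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?)", PILOT_SKILLS)
conn.executemany("INSERT INTO Hangar VALUES (?)", HANGAR)

# Keep only skills that are in the hangar, then require one match per hangar row.
COUNTING = """
SELECT ps.PilotName
FROM PilotSkills AS ps
JOIN Hangar AS h ON h.PlaneType = ps.PlaneType
GROUP BY ps.PilotName
HAVING COUNT(ps.PlaneType) = (SELECT COUNT(*) FROM Hangar)
ORDER BY ps.PilotName
"""

result = [name for (name,) in conn.execute(COUNTING)]
print(result)  # ['Smith', 'Wilson']
```

One difference from the double-negation form: with an empty Hangar table the nested NOT EXISTS query returns every pilot, while this counting query returns none, because the join produces no rows to group.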

Celko #2

Todd’s Division

• Written by Pierre Mullin

COMMON SET-BASED ALGORITHMS

Exact Division

Celko #3
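
One common way to write exact division is an outer join with two counts. The sketch below illustrates that technique and is not necessarily the query labelled Celko #3 on the slide. It assumes no duplicate rows in either table and uses SQLite through Python's sqlite3 module.

```python
import sqlite3

PILOT_SKILLS = [
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"), ("Wilson", "F-14 Fighter"),
    ("Wilson", "F-17 Fighter"),
]
HANGAR = [("B-1 Bomber",), ("B-52 Bomber",), ("F-14 Fighter",)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT)")
conn.execute("CREATE TABLE Hangar (PlaneType TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?)", PILOT_SKILLS)
conn.executemany("INSERT INTO Hangar VALUES (?)", HANGAR)

# COUNT(ps.PlaneType) counts all of a pilot's skills; COUNT(h.PlaneType) counts
# only the skills that matched a hangar row (NULLs from the outer join are skipped).
# Exact division requires both to equal the size of the hangar.
EXACT = """
SELECT ps.PilotName
FROM PilotSkills AS ps
LEFT JOIN Hangar AS h ON h.PlaneType = ps.PlaneType
GROUP BY ps.PilotName
HAVING COUNT(ps.PlaneType) = (SELECT COUNT(*) FROM Hangar)
   AND COUNT(h.PlaneType) = (SELECT COUNT(*) FROM Hangar)
ORDER BY ps.PilotName
"""

result = [name for (name,) in conn.execute(EXACT)]
print(result)  # ['Smith']
```

Wilson is excluded here because his F-17 Fighter skill is a remainder, and Higgins because Piper Cub replaces the B-1 Bomber in his three skills.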

COMPARISON GRAPH AND CHART

Exact Division

Celko algorithms

New Relational Division algorithms

• http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832

• http://weblogs.sqlteam.com/peterl/archive/2010/08/19/checksum-weakness-explained.aspx

Exact division Comparison Graph

[Graph: Sum of Reads and Sum of Scans for Hangar, PilotSkills and Worktable per algorithm; axis from 0 to 40]

Comparison Chart

                    Hangar        PilotSkills   Worktable     Total
Algorithm           Reads  Scans  Reads  Scans  Reads  Scans  Reads  Scans
Celko 1                20      2      6      3                   26      5
Celko 2                32      3      8      4                   40      7
Celko 3                10      3      2      1     34      3     46      7
Celko 4                38     19      9      2     34      3     81     24
Celko 5                23      3      2      1                   25      4
Peso experimental       2      1      2      1                    4      2
Peso fast               8      3      4      2                   12      5
Grand Total           133     34     33     14     68      6    234     54

COMPARISON GRAPH AND CHART

With remainder

Remainder division Comparison Graph

[Graph: Sum of Reads and Sum of Scans for Hangar, PilotSkills and Worktable, for Celko 1, Celko 3, Celko 4, Celko 5, Celko 6 and Peso fast; axis from 0 to 40]

Comparison Chart

                    Hangar        PilotSkills   Worktable     Total
Algorithm           Reads  Scans  Reads  Scans  Reads  Scans  Reads  Scans
Celko 1                20      2      6      3                   26      5
Celko 3                18      3      2      1     31      3     51      7
Celko 4                38     19     16      8     31      3     85     30
Celko 5                23      3      2      1                   25      4
Celko 6                20     10      2      1                   22     11
Peso fast              16      8      6      3                   22     11
Grand Total           135     45     34     17     62      6    231     68

What’s next?

• As you can see, none of the previously shown algorithms handles all types of division

FINDING A COMMON ALGORITHM

Multiple columns

Comparison Graph

[Graph: Sum of Scans and Sum of Reads for Hangar and PilotSkills, for Celko 5 (exact), Celko 5 (remainder), Peso (exact) and Peso (remainder); axis from 0 to 25]

Comparison Chart

                       Hangar        PilotSkills   Total
Algorithm              Scans  Reads  Scans  Reads  Scans  Reads
Celko 5 (exact)            3     23      1      2      4     25
Celko 5 (remainder)        3     23      1      2      4     25
Peso (exact)               4      8      2      4      6     12
Peso (remainder)           8     16      3      6     11     22
Grand Total               18     70      7     14     25     84

Comparison Graph, megasize

[Graph: Sum of CPU, Sum of Duration and Sum of Reads for Celko 5 (megasize, exact) and Peso (megasize, exact); axis from 0 to 20 000 000]

Comparison Chart, megasize

Algorithm                         Sum of CPU  Sum of Duration     Sum of Reads
Celko 5 (megasize, exact)         18 472 449       18 181 022       11 300 678
Peso (megasize, exact)                61 962           39 050           51 879
Grand Total                       18 534 411       18 220 072       11 352 557

Final example

PilotSkills

PilotName PlaneType Model

Celko Piper Cub A

Higgins B-52 Bomber A

Higgins F-14 Fighter J

Higgins Piper Cub A

Jones B-52 Bomber A

Jones F-14 Fighter J

Smith B-1 Bomber A

Smith B-52 Bomber B

Smith F-14 Fighter K

Wilson B-1 Bomber A

Wilson B-52 Bomber B

Wilson F-14 Fighter J

Wilson F-17 Fighter C

Hangar

SetID PlaneType Model

1 B-1 Bomber A

1 B-52 Bomber B

2 F-14 Fighter J

PilotSkills ÷ Hangar

SetID PilotName

1 Smith

1 Wilson

2 Higgins

2 Jones

2 Wilson

One query to rule them all
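
The author's own query is not in this transcript. As a rough sketch under stated assumptions, a plain join-and-count can reproduce the result table of the final example: division with remainder, two columns, and one quotient per SetID. It uses SQLite through Python's sqlite3 module, assumes no duplicate rows, and is not the algorithm benchmarked above.

```python
import sqlite3

PILOT_SKILLS = [
    ("Celko", "Piper Cub", "A"),
    ("Higgins", "B-52 Bomber", "A"), ("Higgins", "F-14 Fighter", "J"),
    ("Higgins", "Piper Cub", "A"),
    ("Jones", "B-52 Bomber", "A"), ("Jones", "F-14 Fighter", "J"),
    ("Smith", "B-1 Bomber", "A"), ("Smith", "B-52 Bomber", "B"),
    ("Smith", "F-14 Fighter", "K"),
    ("Wilson", "B-1 Bomber", "A"), ("Wilson", "B-52 Bomber", "B"),
    ("Wilson", "F-14 Fighter", "J"), ("Wilson", "F-17 Fighter", "C"),
]
HANGAR = [(1, "B-1 Bomber", "A"), (1, "B-52 Bomber", "B"), (2, "F-14 Fighter", "J")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT, Model TEXT)")
conn.execute("CREATE TABLE Hangar (SetID INTEGER, PlaneType TEXT, Model TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?, ?)", PILOT_SKILLS)
conn.executemany("INSERT INTO Hangar VALUES (?, ?, ?)", HANGAR)

# Match skills to divisor rows on both columns, then keep the (SetID, pilot)
# pairs whose match count equals the number of rows in that divisor set.
MULTI_DIVISOR = """
SELECT h.SetID, ps.PilotName
FROM Hangar AS h
JOIN PilotSkills AS ps
  ON ps.PlaneType = h.PlaneType AND ps.Model = h.Model
JOIN (SELECT SetID, COUNT(*) AS Items FROM Hangar GROUP BY SetID) AS s
  ON s.SetID = h.SetID
GROUP BY h.SetID, ps.PilotName, s.Items
HAVING COUNT(*) = s.Items
ORDER BY h.SetID, ps.PilotName
"""

rows = conn.execute(MULTI_DIVISOR).fetchall()
for set_id, pilot in rows:
    print(set_id, pilot)
```

The rows printed match the PilotSkills ÷ Hangar table of the final example: Smith and Wilson for SetID 1, and Higgins, Jones and Wilson for SetID 2.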

And as a bonus, several divisors!

• http://connect.microsoft.com/SQLServer/feedback/details/670531/move-t-sql-language-closer-to-completion-with-a-divide-by-operator

• http://bit.ly/u3318c

Successful implementations and PoC

• An international Telecom company

– About 100 million rows

• 2 seconds vs ~1300 seconds

• An archiving medical system

– Projected one billion patient notes

– Already at 300 million rows

• 0.005 seconds vs ~65 seconds

Help make a difference!

• Buy the book!

Want to know more about me?

• Homepage and contact

– http://www.sqltopia.com

– http://www.developerworkshop.net/

• Twitter

– @SwePeso

• Co-founder of PASS Scania

– http://www.pass-scania.se/

• Local mentor

– http://www.SQLUG.se

• Phil Factor Speed Phreak challenges

– 3-time winner