Relational Division
What it is, when to use it and how to use it.
Who am I?
• Teacher, Institute of Education (1990-1992)
– Mathematics and Physics
• Self-employed (1993-1998)
• Long time consultant (1993-)
• MVP – Microsoft Most Valuable Professional (2009-)
• SolidQ (2011-)
• Active on several forums
– SwePeso, Peso or Pesomannen
• www.sqlteam.com
• www.sqlservercentral.com
• msdn.microsoft.com
• Blog at
– http://weblogs.sqlteam.com/peterl
– http://www.sqltopia.com
BACKGROUND
Relational database theory
Codd & Date Consulting Group
Edgar Frank "Ted" Codd
• August 23, 1923 – April 18, 2003
• British computer scientist who, while
working for IBM, invented the relational
model for database management, the
theoretical basis for relational
databases.
Chris Date
• January 18, 1941 –
• Is an independent author, lecturer,
researcher, and consultant,
specializing in relational database
theory.
• The Third Manifesto - 1995
Relational Algebra
• Similar to normal algebra, except we use relations as values instead of numbers.
• Not used directly as a query language in actual DBMSs.
• Relations are seen as sets of tuples, which means that no duplicates are allowed.
– SQL tables can contain duplicate rows, so SQL behaves differently in some cases.
• T-SQL is declarative, which means that you tell the DBMS what you want, but not
how it is to be calculated
– A .Net-program is procedural, which means that you have to state, step by step, exactly how
the result should be calculated.
• Relational algebra consists of mathematical expressions
Defining relational division
• Relational division is a binary operation involving two sets, R and S.
– The operation can be written just like a mathematical division, R ÷ S.
– Set R should have at least two attributes, A1 and A2.
– The result, Q, is the set of A1 values in R for which there is a corresponding A2 value matching every row in S.
• The relational division operation is
– Normally a time-consuming and complex operation.
– Effectively the opposite of the Cartesian product.
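For reference, the same definition can be written with the primitive operators of relational algebra (projection π, Cartesian product × and set difference −), assuming R has the attributes A1 and A2 and S has the single attribute A2:

```latex
R \div S = \pi_{A_1}(R) - \pi_{A_1}\bigl( (\pi_{A_1}(R) \times S) - R \bigr)

(R \div S) \times S \subseteq R, \qquad (A \times B) \div B = A \quad \text{for } B \neq \emptyset
```

The inner term pairs every candidate A1 value with every row of S and removes the pairs that really exist in R, leaving only the missing pairs; subtracting their A1 values drops every candidate that lacks at least one match. The second line is the sense in which division undoes the Cartesian product.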
Practical uses
• Running a dating site where members search on an arbitrary number of attributes
• Pattern matching, for example when you have only part of a song and want to find the full song, either by its binary content or by its lyrics
• Finding which customers have been served by both James and Gabrielle
• Finding which rental cars have had both an engine overhaul and a paint job
Examples and visualization
Joe Celko
• Is an American relational database
expert from Austin, Texas
• He served on the ANSI X3H2
Database Standards Committee and
helped write the SQL-89 and SQL-92
standards
Examples
• We’ll use his well-known example with
the PilotSkills and Hangar tables
Main example
PilotSkills
PilotName  PlaneType
Celko      Piper Cub
Higgins    B-52 Bomber
Higgins    F-14 Fighter
Higgins    Piper Cub
Jones      B-52 Bomber
Jones      F-14 Fighter
Smith      B-1 Bomber
Smith      B-52 Bomber
Smith      F-14 Fighter
Wilson     B-1 Bomber
Wilson     B-52 Bomber
Wilson     F-14 Fighter
Wilson     F-17 Fighter

Hangar
PlaneType
B-1 Bomber
B-52 Bomber
F-14 Fighter

PilotSkills ÷ Hangar
PilotName
Smith
Wilson
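The definition can be checked on this data with a few lines of Python. This is a minimal sketch, assuming the two tables are held as Python sets (so, as in relational theory, there are no duplicates); the names PILOT_SKILLS, HANGAR and divide are illustrative and do not come from the original material.

```python
# Celko's main example as a set of (PilotName, PlaneType) tuples.
PILOT_SKILLS = {
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"),
    ("Wilson", "F-14 Fighter"), ("Wilson", "F-17 Fighter"),
}
HANGAR = {"B-1 Bomber", "B-52 Bomber", "F-14 Fighter"}


def divide(r, s):
    """R ÷ S: every A1 value in r that is paired with every A2 value in s."""
    candidates = {a1 for a1, _ in r}
    return {a1 for a1 in candidates if all((a1, a2) in r for a2 in s)}


print(sorted(divide(PILOT_SKILLS, HANGAR)))  # ['Smith', 'Wilson']
```

With an empty divisor, all() is vacuously true and every pilot is returned, which matches the algebraic definition of R ÷ S.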
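Exact division
[Diagram: pilots Celko, Higgins, Jones, Smith and Wilson linked to the plane types Piper Cub, B-1 Bomber, B-52 Bomber, F-14 Fighter and F-17 Fighter]
Only Smith can fly exactly the planes in the hangar and no others.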
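Division with remainder
[Diagram: pilots Celko, Higgins, Jones, Smith and Wilson linked to the plane types Piper Cub, B-1 Bomber, B-52 Bomber, F-14 Fighter and F-17 Fighter]
Smith and Wilson can both fly every plane in the hangar; Wilson's F-17 Fighter is the remainder.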
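The difference between the two kinds of division can be sketched in Python, assuming each pilot's skills are first grouped into a set; the helper names planes_by_pilot, divide_exact and divide_with_remainder are hypothetical. Division with remainder is a subset test, exact division an equality test:

```python
from collections import defaultdict

PILOT_SKILLS = [
    ("Celko", "Piper Cub"),
    ("Higgins", "B-52 Bomber"), ("Higgins", "F-14 Fighter"), ("Higgins", "Piper Cub"),
    ("Jones", "B-52 Bomber"), ("Jones", "F-14 Fighter"),
    ("Smith", "B-1 Bomber"), ("Smith", "B-52 Bomber"), ("Smith", "F-14 Fighter"),
    ("Wilson", "B-1 Bomber"), ("Wilson", "B-52 Bomber"),
    ("Wilson", "F-14 Fighter"), ("Wilson", "F-17 Fighter"),
]
HANGAR = {"B-1 Bomber", "B-52 Bomber", "F-14 Fighter"}


def planes_by_pilot(rows):
    """Group (pilot, plane) rows into {pilot: set of planes}."""
    grouped = defaultdict(set)
    for pilot, plane in rows:
        grouped[pilot].add(plane)
    return grouped


def divide_with_remainder(rows, divisor):
    # The pilot must fly every plane in the divisor; extra planes are allowed.
    return {p for p, planes in planes_by_pilot(rows).items() if divisor <= planes}


def divide_exact(rows, divisor):
    # The pilot must fly exactly the divisor's planes: no more, no fewer.
    return {p for p, planes in planes_by_pilot(rows).items() if planes == divisor}


print(sorted(divide_exact(PILOT_SKILLS, HANGAR)))           # ['Smith']
print(sorted(divide_with_remainder(PILOT_SKILLS, HANGAR)))  # ['Smith', 'Wilson']
```

Passing {"Piper Cub"} as the divisor reproduces the single-record slides that follow: Celko for exact division, Celko and Higgins with remainder.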
DIVIDED WE STAND
Have I already used division?
Single record, exact division
[Diagram: a divisor holding the single plane type Piper Cub. Result: Celko]
Single record, with remainder
[Diagram: a divisor holding the single plane type Piper Cub. Result: Celko and Higgins]
What did we just do?
• Single record
• Single column
• Division with remainder
• WHERE clause!
– WHERE Col1 = 99
Is WHERE really a division?
[Diagram: filtering PilotSkills on the plane type Piper Cub returns Celko and Higgins]
Types of Relational Division
Rows      Columns   Remainder  You
Single    Single    Yes
Single    Single    No
Single    Multiple  Yes
Single    Multiple  No
Multiple  Single    Yes
Multiple  Single    No
Multiple  Multiple  Yes
Multiple  Multiple  No
You are already an expert!
• In the form of a WHERE clause, division is the most common operation of all
• However, a WHERE clause only works for a single-row divisor
• Wait, multiple columns too?
• Yes
– WHERE Col1 = 101
  AND Col2 = 9
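The claim can be tried out with Python's built-in sqlite3 module standing in for SQL Server (an assumption: this is SQLite, not T-SQL). The sketch uses the three-column PilotSkills data from the final example and compares a WHERE clause with a division by a one-row divisor computed in Python; the helper divide is hypothetical.

```python
import sqlite3

# The three-column PilotSkills data from the final example: (PilotName, PlaneType, Model).
ROWS = [
    ("Celko", "Piper Cub", "A"),
    ("Higgins", "B-52 Bomber", "A"), ("Higgins", "F-14 Fighter", "J"),
    ("Higgins", "Piper Cub", "A"),
    ("Jones", "B-52 Bomber", "A"), ("Jones", "F-14 Fighter", "J"),
    ("Smith", "B-1 Bomber", "A"), ("Smith", "B-52 Bomber", "B"),
    ("Smith", "F-14 Fighter", "K"),
    ("Wilson", "B-1 Bomber", "A"), ("Wilson", "B-52 Bomber", "B"),
    ("Wilson", "F-14 Fighter", "J"), ("Wilson", "F-17 Fighter", "C"),
]


def divide(pairs, divisor):
    """Division with remainder over (key, value) pairs: keys whose values cover the divisor."""
    values = {}
    for key, value in pairs:
        values.setdefault(key, set()).add(value)
    return sorted(key for key, found in values.items() if divisor <= found)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT, Model TEXT)")
conn.executemany("INSERT INTO PilotSkills VALUES (?, ?, ?)", ROWS)

# Single column: WHERE Col1 = ... versus division by the one-row divisor {'Piper Cub'}.
single_where = [name for (name,) in conn.execute(
    "SELECT PilotName FROM PilotSkills WHERE PlaneType = 'Piper Cub' ORDER BY PilotName")]
single_div = divide(((p, t) for p, t, _ in ROWS), {"Piper Cub"})

# Two columns: WHERE Col1 = ... AND Col2 = ... versus division by {('B-52 Bomber', 'B')}.
multi_where = [name for (name,) in conn.execute(
    "SELECT PilotName FROM PilotSkills "
    "WHERE PlaneType = 'B-52 Bomber' AND Model = 'B' ORDER BY PilotName")]
multi_div = divide(((p, (t, m)) for p, t, m in ROWS), {("B-52 Bomber", "B")})

print(single_where, single_div)  # ['Celko', 'Higgins'] ['Celko', 'Higgins']
print(multi_where, multi_div)    # ['Smith', 'Wilson'] ['Smith', 'Wilson']
```

The two results agree because every (PilotName, PlaneType, Model) row is unique; with duplicate rows the WHERE clause would repeat a pilot.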
EXACT DIVISION
The simple way
Exact division, common
Exact division, advanced
A new Relational Division algorithm
COMMON SET-BASED ALGORITHMS
With remainder
Codd
• Uses double negation
– “There ain't no planes in this hangar that I can't fly!”
• Can’t be used for exact division
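A sketch of the double-negation query on the main example, run through Python's sqlite3 module rather than SQL Server (an assumption). The column names follow the PilotSkills and Hangar tables of the main example; this is the textbook NOT EXISTS form, not a copy of the query on the original slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT,
                              PRIMARY KEY (PilotName, PlaneType));
    CREATE TABLE Hangar (PlaneType TEXT PRIMARY KEY);
    INSERT INTO PilotSkills VALUES
        ('Celko', 'Piper Cub'),
        ('Higgins', 'B-52 Bomber'), ('Higgins', 'F-14 Fighter'), ('Higgins', 'Piper Cub'),
        ('Jones', 'B-52 Bomber'), ('Jones', 'F-14 Fighter'),
        ('Smith', 'B-1 Bomber'), ('Smith', 'B-52 Bomber'), ('Smith', 'F-14 Fighter'),
        ('Wilson', 'B-1 Bomber'), ('Wilson', 'B-52 Bomber'),
        ('Wilson', 'F-14 Fighter'), ('Wilson', 'F-17 Fighter');
    INSERT INTO Hangar VALUES ('B-1 Bomber'), ('B-52 Bomber'), ('F-14 Fighter');
""")

DOUBLE_NEGATION_SQL = """
    SELECT DISTINCT ps1.PilotName
    FROM PilotSkills AS ps1
    WHERE NOT EXISTS (              -- there is no plane in the hangar ...
        SELECT * FROM Hangar AS h
        WHERE NOT EXISTS (          -- ... that this pilot cannot fly
            SELECT * FROM PilotSkills AS ps2
            WHERE ps2.PilotName = ps1.PilotName
              AND ps2.PlaneType = h.PlaneType))
    ORDER BY ps1.PilotName
"""
pilots = [name for (name,) in conn.execute(DOUBLE_NEGATION_SQL)]
print(pilots)  # ['Smith', 'Wilson']
```

Wilson is returned even though he also flies the F-17 Fighter: the query only checks that no hangar plane is missing, never that nothing extra is present, which is why it gives division with remainder only.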
Chris Date (Celko #1)
• Made popular by Celko
Celko #2
Todd’s Division
• Written by Pierre Mullin
COMMON SET-BASED ALGORITHMS
Exact Division
Celko #3
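One widely used alternative counts matches instead of nesting NOT EXISTS. The sketch below, again using sqlite3 as a stand-in for SQL Server, shows a with-remainder and an exact version; it is a common counting formulation and not necessarily identical to the Celko #2 or Celko #3 queries on the slides. It assumes PilotSkills has no duplicate rows, which the primary key enforces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT,
                              PRIMARY KEY (PilotName, PlaneType));
    CREATE TABLE Hangar (PlaneType TEXT PRIMARY KEY);
    INSERT INTO PilotSkills VALUES
        ('Celko', 'Piper Cub'),
        ('Higgins', 'B-52 Bomber'), ('Higgins', 'F-14 Fighter'), ('Higgins', 'Piper Cub'),
        ('Jones', 'B-52 Bomber'), ('Jones', 'F-14 Fighter'),
        ('Smith', 'B-1 Bomber'), ('Smith', 'B-52 Bomber'), ('Smith', 'F-14 Fighter'),
        ('Wilson', 'B-1 Bomber'), ('Wilson', 'B-52 Bomber'),
        ('Wilson', 'F-14 Fighter'), ('Wilson', 'F-17 Fighter');
    INSERT INTO Hangar VALUES ('B-1 Bomber'), ('B-52 Bomber'), ('F-14 Fighter');
""")

# With remainder: a pilot qualifies if he matches as many hangar planes as the hangar holds.
REMAINDER_SQL = """
    SELECT ps.PilotName
    FROM PilotSkills AS ps
    JOIN Hangar AS h ON h.PlaneType = ps.PlaneType
    GROUP BY ps.PilotName
    HAVING COUNT(*) = (SELECT COUNT(*) FROM Hangar)
    ORDER BY ps.PilotName
"""

# Exact: the LEFT JOIN keeps every skill row. COUNT(ps.PlaneType) counts all planes the
# pilot flies, COUNT(h.PlaneType) only those found in the hangar; both must equal its size.
EXACT_SQL = """
    SELECT ps.PilotName
    FROM PilotSkills AS ps
    LEFT JOIN Hangar AS h ON h.PlaneType = ps.PlaneType
    GROUP BY ps.PilotName
    HAVING COUNT(ps.PlaneType) = (SELECT COUNT(*) FROM Hangar)
       AND COUNT(h.PlaneType) = (SELECT COUNT(*) FROM Hangar)
    ORDER BY ps.PilotName
"""

with_remainder = [name for (name,) in conn.execute(REMAINDER_SQL)]
exact = [name for (name,) in conn.execute(EXACT_SQL)]
print(with_remainder)  # ['Smith', 'Wilson']
print(exact)           # ['Smith']
```

With an empty hangar the with-remainder query returns no pilots, whereas the algebraic definition would return every pilot, so the counting form needs a guard if empty divisors are possible.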
COMPARISON GRAPH AND CHART
Exact Division
Celko algorithms
New Relational Division algorithms
• http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=70832
• http://weblogs.sqlteam.com/peterl/archive/2010/08/19/checksum-weakness-explained.aspx
Exact division Comparison Graph
[Bar chart: Sum of Reads and Sum of Scans for the Hangar, PilotSkills and Worktable tables, per algorithm]
Comparison Chart (sum of reads and scans per table)
                   Hangar         PilotSkills    Worktable      Total
Algorithm          Reads  Scans   Reads  Scans   Reads  Scans   Reads  Scans
Celko 1            20     2       6      3       -      -       26     5
Celko 2            32     3       8      4       -      -       40     7
Celko 3            10     3       2      1       34     3       46     7
Celko 4            38     19      9      2       34     3       81     24
Celko 5            23     3       2      1       -      -       25     4
Peso experimental  2      1       2      1       -      -       4      2
Peso fast          8      3       4      2       -      -       12     5
Grand Total        133    34      33     14      68     6       234    54
COMPARISON GRAPH AND CHART
With remainder
Remainder division Comparison Graph
[Bar chart: Sum of Reads and Sum of Scans for the Hangar, PilotSkills and Worktable tables for Celko 1, Celko 3, Celko 4, Celko 5, Celko 6 and Peso fast]
Comparison Chart (sum of reads and scans per table)
                   Hangar         PilotSkills    Worktable      Total
Algorithm          Reads  Scans   Reads  Scans   Reads  Scans   Reads  Scans
Celko 1            20     2       6      3       -      -       26     5
Celko 3            18     3       2      1       31     3       51     7
Celko 4            38     19      16     8       31     3       85     30
Celko 5            23     3       2      1       -      -       25     4
Celko 6            20     10      2      1       -      -       22     11
Peso fast          16     8       6      3       -      -       22     11
Grand Total        135    45      34     17      62     6       231    68
What’s next?
• As you can see, none of the algorithms
shown so far handles all types of
division
FINDING A COMMON ALGORITHM
Multiple columns
Comparison Graph
[Bar chart: Sum of Scans and Sum of Reads for the Hangar and PilotSkills tables for Celko 5 (exact), Celko 5 (remainder), Peso (exact) and Peso (remainder)]
Comparison Chart (sum of scans and reads per table)
                     Hangar         PilotSkills    Total
Algorithm            Scans  Reads   Scans  Reads   Scans  Reads
Celko 5 (exact)      3      23      1      2       4      25
Celko 5 (remainder)  3      23      1      2       4      25
Peso (exact)         4      8       2      4       6      12
Peso (remainder)     8      16      3      6       11     22
Grand Total          18     70      7      14      25     84
Comparison Graph, megasize
[Bar chart: Sum of CPU, Sum of Duration and Sum of Reads for Celko 5 (megasize, exact) and Peso (megasize, exact)]
Comparison Chart, megasize
Algorithm                   Sum of CPU   Sum of Duration   Sum of Reads
Celko 5 (megasize, exact)   18 472 449   18 181 022        11 300 678
Peso (megasize, exact)      61 962       39 050            51 879
Grand Total                 18 534 411   18 220 072        11 352 557
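The chart gives absolute sums only; dividing the Celko 5 figures by the Peso figures shows the relative difference. A small Python check that uses nothing but the numbers in the megasize chart (the units are whatever the original measurements used):

```python
# Figures copied from the megasize comparison chart.
celko5 = {"CPU": 18_472_449, "Duration": 18_181_022, "Reads": 11_300_678}
peso = {"CPU": 61_962, "Duration": 39_050, "Reads": 51_879}

# How many times larger each Celko 5 sum is than the corresponding Peso sum.
ratios = {metric: celko5[metric] / peso[metric] for metric in celko5}
for metric, ratio in ratios.items():
    print(f"{metric}: about {ratio:.0f}x")
# CPU: about 298x
# Duration: about 466x
# Reads: about 218x
```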
Final example
PilotSkills
PilotName  PlaneType     Model
Celko      Piper Cub     A
Higgins    B-52 Bomber   A
Higgins    F-14 Fighter  J
Higgins    Piper Cub     A
Jones      B-52 Bomber   A
Jones      F-14 Fighter  J
Smith      B-1 Bomber    A
Smith      B-52 Bomber   B
Smith      F-14 Fighter  K
Wilson     B-1 Bomber    A
Wilson     B-52 Bomber   B
Wilson     F-14 Fighter  J
Wilson     F-17 Fighter  C

Hangar
SetID  PlaneType     Model
1      B-1 Bomber    A
1      B-52 Bomber   B
2      F-14 Fighter  J

PilotSkills ÷ Hangar
SetID  PilotName
1      Smith
1      Wilson
2      Higgins
2      Jones
2      Wilson
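A sketch of division by several divisors on multiple columns, again with sqlite3 as a stand-in for SQL Server. It is a counting formulation of division with remainder, not the author's algorithm; it assumes neither table contains duplicate rows (the primary keys enforce this), and the CTE name DivisorSize is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PilotSkills (PilotName TEXT, PlaneType TEXT, Model TEXT,
                              PRIMARY KEY (PilotName, PlaneType, Model));
    CREATE TABLE Hangar (SetID INTEGER, PlaneType TEXT, Model TEXT,
                         PRIMARY KEY (SetID, PlaneType, Model));
    INSERT INTO PilotSkills VALUES
        ('Celko', 'Piper Cub', 'A'),
        ('Higgins', 'B-52 Bomber', 'A'), ('Higgins', 'F-14 Fighter', 'J'),
        ('Higgins', 'Piper Cub', 'A'),
        ('Jones', 'B-52 Bomber', 'A'), ('Jones', 'F-14 Fighter', 'J'),
        ('Smith', 'B-1 Bomber', 'A'), ('Smith', 'B-52 Bomber', 'B'),
        ('Smith', 'F-14 Fighter', 'K'),
        ('Wilson', 'B-1 Bomber', 'A'), ('Wilson', 'B-52 Bomber', 'B'),
        ('Wilson', 'F-14 Fighter', 'J'), ('Wilson', 'F-17 Fighter', 'C');
    INSERT INTO Hangar VALUES
        (1, 'B-1 Bomber', 'A'), (1, 'B-52 Bomber', 'B'), (2, 'F-14 Fighter', 'J');
""")

QUERY = """
    WITH DivisorSize AS (          -- number of rows in each divisor set
        SELECT SetID, COUNT(*) AS Items FROM Hangar GROUP BY SetID
    )
    SELECT h.SetID, ps.PilotName
    FROM Hangar AS h
    JOIN PilotSkills AS ps
      ON ps.PlaneType = h.PlaneType AND ps.Model = h.Model   -- match on both columns
    JOIN DivisorSize AS d ON d.SetID = h.SetID
    GROUP BY h.SetID, ps.PilotName, d.Items
    HAVING COUNT(*) = d.Items      -- pilot matched every row of this set
    ORDER BY h.SetID, ps.PilotName
"""
result = conn.execute(QUERY).fetchall()
print(result)
# [(1, 'Smith'), (1, 'Wilson'), (2, 'Higgins'), (2, 'Jones'), (2, 'Wilson')]
```

Grouping by SetID lets one query divide by every divisor set at once, and adding further columns to the join condition is all it takes to divide on more attributes.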
One query to rule them all
And as a bonus, several divisors!
• http://connect.microsoft.com/SQLServer/feedback/details/670531/move-t-sql-language-closer-to-completion-with-a-divide-by-operator
• http://bit.ly/u3318c
Successful implementations and PoCs
• An international telecom company
– About 100 million rows
• 2 seconds vs ~1300 seconds
• A medical archiving system
– Projected one billion patient notes
– Already at 300 million rows
• 0.005 seconds vs ~65 seconds
Help make a difference!
• Buy the book!
Want to know more about me?
• Homepage and contact
– http://www.sqltopia.com
– http://www.developerworkshop.net/
• Twitter
– @SwePeso
• Co-founder of PASS Scania
– http://www.pass-scania.se/
• Local mentor
– http://www.SQLUG.se
• Phil Factor Speed Phreak challenges
– 3-time winner