Date posted: 19-Jan-2017
Uploaded by: stew-ashton
Ranges, Ranges Everywhere!
Stew Ashton (stewashton.wordpress.com)
UKOUG Tech 2016
Can you read the following line? If not, please move closer.
It's much better when you can read the code ;)

Agenda
• Defining ranges
• Relating ranges: gaps, overlaps
• Range DDL: sensible data
• Ranges in one table
• Ranges in two tables

Who am I?
• 36 years in IT
  – Developer, Technical Sales Engineer, Technical Architect
  – Aeronautics, IBM, Finance
  – Mainframe, client-server, Web apps
• 12 years using Oracle database
  – SQL performance analysis
  – Replace Java with SQL
• 4 years as in-house “Oracle Development Expert”
• Conference speaker since 2014
• Currently independent

What is a range?
• Two values that can be compared
  – Always use the same datatype
  – Comparable datatypes:
    • integer, date (without time)
    • number, datetime, interval, (n)(var)char
    • rowid
• Range design questions:
  – Is the "end" value part of the range?
  – Are NULLs allowed?

Allen’s Interval Algebra
Relation                            A (start end)  B (start end)  Category
A precedes B / B preceded by A      1 2            3 4            Gap
A meets B / B met by A              1 2            2 3            Meet
A overlaps B / B overlapped by A    1 3            2 4            "Overlap"
A finished by B / B finishes A      1 3            2 3            "Overlap"
A contains B / B during A           1 4            2 3            "Overlap"
A starts B / B started by A         1 2            1 3            "Overlap"
A and B are equal                   1 2            1 2            "Overlap"

End value: Inclusive or Exclusive
• Design must allow ranges to "meet"
• Discrete quantities can be inclusive
  – [1-3] meets [4-6] : no intermediate integer
  – [Jan. 1-31] meets [Feb. 1-28] : no intermediate date
• Continuous quantities require exclusive
  – Most ranges are continuous (including dates, really)

Votes for Exclusive end values
• SQL:2011 and Oracle 12c Temporal Validity
  – "Period": date/time range
    • [Closed-Open): includes start time but not end time
• WIDTH_BUCKET() function
  – Puts values in equiwidth histogram
  – Buckets must touch
  – [Closed-Open): upper boundary value goes in higher bucket
• Me!
  – Exclusive end values work for every kind of range
  – Except: ROWID ranges must be inclusive
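A quick way to see the closed-open behaviour of WIDTH_BUCKET is to feed it values on and around a boundary. This is only a sketch: the range 0 to 10 and the two buckets are arbitrary values chosen for illustration, not taken from the slides.

```sql
-- Two equiwidth buckets over [0, 10): bucket 1 = [0, 5), bucket 2 = [5, 10)
-- sys.odcinumberlist is used here only as a convenient inline list of numbers
select column_value                         as val,
       width_bucket(column_value, 0, 10, 2) as bucket
from   table(sys.odcinumberlist(4.99, 5, 9.99, 10))
order  by val;
```

The value 5 should land in bucket 2 rather than bucket 1, and 10 should fall into the overflow bucket 3, because each upper boundary is excluded from its bucket.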

DDL: make sure data is sensible
• Start_range < End_range
• If date without time, CHECK( dte = trunc(dte))
• If integer, say so
• Is NULL allowed?
  – If so, what does it mean?
  – Ex. Temporal Validity: NULL end value means "until the end of time"
• Are overlaps allowed?
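The checklist above could translate into DDL along these lines. This is a minimal sketch: the table, column and constraint names are hypothetical, and it assumes exclusive end values with a NULL end date meaning "until the end of time".

```sql
-- Hypothetical table: integer ranges, NULLs not allowed
create table int_ranges (
  start_n integer not null,
  end_n   integer not null,
  constraint int_ranges_order_ck check (start_n < end_n)
);

-- Hypothetical table: date ranges without a time component.
-- A NULL end_date is allowed; the CHECK conditions then evaluate
-- to UNKNOWN, which a check constraint accepts.
create table date_ranges (
  start_date date not null,
  end_date   date,
  constraint date_ranges_start_ck check (start_date = trunc(start_date)),
  constraint date_ranges_end_ck   check (end_date = trunc(end_date)),
  constraint date_ranges_order_ck check (start_date < end_date)
);
```

None of these constraints says anything about overlaps between rows; that question needs a separate answer.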

Overlaps avoided by unique constraints
Relation                            A     B     Unique(start,end)  Unique(start)  Unique(end)
A overlaps B / B overlapped by A    1 3   2 4   No constraint works
A finished by B / B finishes A      1 3   2 3   -                  -              Y
A contains B / B during A           1 4   2 3   No constraint works
A starts B / B started by A         1 2   1 3   -                  Y              -
A and B are equal                   1 2   1 2   Y                  Y              Y

Avoiding Overlaps: 3 solutions
1. Triggers
   – Hard to do right, not very scalable
2. "Refresh on commit" materialized views
   – Not scalable?
3. Virtual ranges

Virtual range: no gaps, no overlaps
• One column: start value
• End value is calculated: = next row's start
  – Putting identical value in 2 rows is denormalization
• Last row has unlimited end
• Maybe OK for audit trails?
Physical (table)
START_VALUE
16-11-15 08:30
16-11-15 09:30
16-11-15 18:30
Virtual (view)
START_VALUE     END_VALUE
16-11-15 08:30  16-11-15 09:30
16-11-15 09:30  16-11-15 18:30
16-11-15 18:30  (null)
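A virtual range like this can be sketched with LEAD() in a view. The table and view names below are hypothetical; only the START_VALUE and END_VALUE column names come from the slide.

```sql
-- Hypothetical physical table: only the start value is stored.
-- The primary key guarantees that no two ranges start at the same point.
create table audit_starts (
  start_value date not null primary key
);

-- Virtual range: the end value is the next row's start,
-- and NULL (unlimited) for the last row.
create or replace view audit_ranges as
select start_value,
       lead(start_value) over (order by start_value) as end_value
from   audit_starts;
```

Because the end value is never stored, two rows cannot disagree about where one range stops and the next begins.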

Semi-Virtual range: no overlaps
• Start column always used
• End column optional:
  – If null, use next row's start
  – If not null, use lesser of end column and next row's start
  – Last row can have limited end
• Or: intermediate row with 'not exists' flag
  – ≅ Change Data Capture format
Optional end column:
START_VALUE     END_VALUE
16-11-15 08:30  16-11-15 09:30
16-11-15 18:30  (null)
'Not exists' flag (D):
START_VALUE     D
16-11-15 08:30
16-11-15 09:30  D
16-11-15 18:30
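The rules for the optional end column could be expressed in a view such as the following. It is a sketch under assumptions: the table and view names are hypothetical, and the CASE expression is one possible reading of "lesser of end column and next row's start".

```sql
-- Hypothetical physical table: start is mandatory, end is optional
create table semi_starts (
  start_value date not null primary key,
  end_value   date,
  constraint semi_starts_order_ck check (start_value < end_value)
);

create or replace view semi_ranges as
select start_value,
       case
         when end_value  is null then next_start  -- no end stored: run to next start
         when next_start is null then end_value   -- last row: keep its limited end
         else least(end_value, next_start)        -- never run past the next start
       end as end_value
from (
  select start_value,
         end_value,
         lead(start_value) over (order by start_value) as next_start
  from   semi_starts
);
```

The explicit CASE is used because LEAST returns NULL as soon as one argument is NULL, which would lose the stored end of the last row.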

Range-related SQL
• Why hard?
  – Can't use BETWEEN
  – Inequality joins impact performance
  – With overlaps, 1 value point can be in any number of rows
  – Joining 2 tables with overlaps -> row explosion
  – NULLs have special meanings
• Common problems
  – Find gaps
  – Intersect: find overlaps
  – Union: packing ranges between gaps
  – Joins
• Today, ends are exclusive, everything is NOT NULL (unless specified)
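Why BETWEEN does not fit exclusive end values can be shown with two ranges that meet. This is an illustrative sketch; the two rows are made up for the purpose.

```sql
-- Hypothetical ranges [1,3) and [3,5) that meet at the value 3
with r (start_n, end_n) as (
  select 1, 3 from dual union all
  select 3, 5 from dual
)
select start_n,
       end_n,
       -- BETWEEN includes both boundaries
       case when 3 between start_n and end_n then 'Y' else 'N' end as between_hit,
       -- closed-open test: start included, end excluded
       case when start_n <= 3 and 3 < end_n  then 'Y' else 'N' end as half_open_hit
from   r
order  by start_n;
```

BETWEEN reports the value 3 in both rows, while the closed-open test places it only in the second range.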

Gaps, ex. Free time in calendar
FROM_TM  TO_TM
07:00    08:00
09:00    10:50
10:00    10:45
12:00    12:45
18:00    23:00

Comparing each end with the next start:
select to_tm as gap_from,
  lead(from_tm) over(order by from_tm) as gap_to
from t

FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:45     12:00
12:00    12:45     18:00
18:00    23:00

Using the latest end so far, max(to_tm):
FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:50     12:00
12:00    12:45     18:00
18:00    23:00

select * from (
  select max(to_tm) over(order by from_tm) as gap_from,
    lead(from_tm) over(order by from_tm) as gap_to
  from t
)
where gap_from < gap_to;

GAP_FROM  GAP_TO
08:00     09:00
10:50     12:00
12:45     18:00
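To try the gap query without creating a table, the calendar rows can be supplied inline. This sketch assumes the times are held as 'HH24:MI' strings, which compare correctly as text; the slides do not state the actual datatype.

```sql
with t (from_tm, to_tm) as (
  select '07:00', '08:00' from dual union all
  select '09:00', '10:50' from dual union all
  select '10:00', '10:45' from dual union all
  select '12:00', '12:45' from dual union all
  select '18:00', '23:00' from dual
)
select *
from (
  -- latest end seen so far, compared with the next row's start
  select max(to_tm)    over (order by from_tm) as gap_from,
         lead(from_tm) over (order by from_tm) as gap_to
  from   t
)
where  gap_from < gap_to   -- also drops the last row, whose gap_to is NULL
order  by gap_from;
```

The running MAX is what keeps the 10:00 to 10:45 appointment, which sits inside 09:00 to 10:50, from producing a false gap starting at 10:45.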

Intersect: finding Overlaps
Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      2
02:meets        2      3
03:overlaps     1      3
03:overlaps     2      4
04:finished by  1      3
04:finished by  2      3
05:contains     1      4
05:contains     2      3
06:starts       1      2
06:starts       1      3
07:equals       1      2
07:equals       1      2

select test_case, dte, col
from t
unpivot (dte for col in (start_date as 1, end_date as -1))

Example: A overlaps B (1 3) / B overlapped by A (2 4) splits into segments 1 2, 2 3, 3 4
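The test cases above can be loaded into a table so that the queries can be run as written. This is a sketch: the column names start_date and end_date are taken from the queries, but the NUMBER datatype and the column sizes are assumptions, since the slides only show the values 1 to 4.

```sql
-- Assumed structure for the test table t used by the queries
create table t (
  test_case  varchar2(16) not null,
  start_date number       not null,
  end_date   number       not null,
  constraint t_order_ck check (start_date < end_date)
);

insert into t (test_case, start_date, end_date)
select '01:precedes',    1, 2 from dual union all
select '01:precedes',    3, 4 from dual union all
select '02:meets',       1, 2 from dual union all
select '02:meets',       2, 3 from dual union all
select '03:overlaps',    1, 3 from dual union all
select '03:overlaps',    2, 4 from dual union all
select '04:finished by', 1, 3 from dual union all
select '04:finished by', 2, 3 from dual union all
select '05:contains',    1, 4 from dual union all
select '05:contains',    2, 3 from dual union all
select '06:starts',      1, 2 from dual union all
select '06:starts',      1, 3 from dual union all
select '07:equals',      1, 2 from dual union all
select '07:equals',      1, 2 from dual;

commit;
```

The table deliberately has no unique constraint, because test case 07 needs two identical rows.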

Result of the unpivot:
Test case    Dte  Col
01:precedes  1     1
01:precedes  2    -1
01:precedes  3     1
01:precedes  4    -1
02:meets     1     1
02:meets     2    -1
02:meets     2     1
02:meets     3    -1
03:overlaps  1     1
03:overlaps  3    -1
03:overlaps  2     1
03:overlaps  4    -1

select test_case, dte "Start",
  lead(dte,1,dte) over(
    partition by test_case order by dte, col desc
  ) "End",
  sum(col) over(
    partition by test_case order by dte, col desc
  ) "Rows"
from t
unpivot (dte for col in (start_date as 1, end_date as -1))

Result with "End" and "Rows" (✖ = empty segment, Start = End):
Test case    Start  End  Rows
01:precedes  1      2    1
01:precedes  2      3    0
01:precedes  3      4    1
01:precedes  4      4    0   ✖
02:meets     1      2    1
02:meets     2      2    2   ✖
02:meets     2      3    1
02:meets     3      3    0   ✖
03:overlaps  1      2    1
03:overlaps  2      3    2
03:overlaps  3      4    1
03:overlaps  4      4    0   ✖

select * from (
  select test_case, dte "Start",
    lead(dte,1,dte) over(
      partition by test_case order by dte, col desc
    ) "End",
    sum(col) over(
      partition by test_case order by dte, col desc
    ) "Rows"
  from t
  unpivot (dte for col in (start_date as 1, end_date as -1))
)
where "Start" < "End";

Result without empty segments:
Test case    Start  End  Rows
01:precedes  1      2    1
01:precedes  2      3    0
01:precedes  3      4    1
02:meets     1      2    1
02:meets     2      3    1
03:overlaps  1      2    1
03:overlaps  2      3    2
03:overlaps  3      4    1

select * from (
  select test_case, dte "Start",
    lead(dte,1,dte) over(
      partition by test_case order by dte, col desc
    ) "End",
    sum(col) over(
      partition by test_case order by dte, col desc
    ) "Rows"
  from t
  unpivot (dte for col in (start_date as 1, end_date as -1))
)
where "Rows" > 1
and "Start" < "End";

Test case       Start  End  Rows
03:overlaps     2      3    2
04:finished by  2      3    2
05:contains     2      3    2
06:starts       1      2    2
07:equals       1      2    2

Packing Ranges
Input: the same 14 test rows as for Intersect.

select * from t
match_recognize(
  partition by test_case
  order by end_date, start_date
  measures min(start_date) start_date, last(end_date) end_date
  pattern(a* b)
  define a as end_date >= next(start_date)
);

Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      3
03:overlaps     1      4
04:finished by  1      3
05:contains     1      4
06:starts       1      3
07:equals       1      2

With NULL end values:
select * from t
match_recognize(
  partition by test_case
  order by end_date, start_date
  measures min(start_date) start_date, last(end_date) end_date
  pattern(a* b)
  define a as end_date >= next(start_date) or end_date is null
);

Test case       Start  End
01:precedes     1      2
01:precedes     3      (null)
02:meets        1      (null)
03:overlaps     1      (null)
04:finished by  1      (null)
05:contains     1      (null)
06:starts       1      (null)
07:equals       1      (null)

JOIN: range to range
> create table A(start_n, end_n) as
  select level, level+1 from dual
  connect by level <= 10000;

> create table B as
  select start_n+9995 start_n, end_n+9996 end_n
  from A;

> select * from A
  join B
    on (A.start_n <= B.start_n and B.start_n < A.end_n)
    or (B.start_n <= A.start_n and A.start_n < B.end_n);

Elapsed: 00:00:13.332
Exadata? All data in buffer cache
InMemory? Elapsed: 00:00:09.842

JOIN: range to range
-------------------------------------------------------------------------------------------
| Id  | Operation               | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |      1 |        |      1 |00:00:17.82 |      90 |
|   1 |  SORT AGGREGATE         |       |      1 |      1 |      1 |00:00:17.82 |      90 |
|   2 |   CONCATENATION         |       |      1 |        |     10 |00:00:00.01 |      90 |
|   3 |    MERGE JOIN           |       |      1 |     55 |     10 |00:00:00.01 |      45 |
|   4 |     SORT JOIN           |       |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|   5 |      TABLE ACCESS FULL  | T_NEW |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|*  6 |     FILTER              |       |  10000 |        |     10 |00:00:00.01 |      21 |
|*  7 |      SORT JOIN          |       |  10000 |  10000 |     55 |00:00:00.01 |      21 |
|   8 |       TABLE ACCESS FULL | T_OLD |      1 |  10000 |  10000 |00:00:00.02 |      21 |
|   9 |    MERGE JOIN           |       |      1 |     55 |      0 |00:00:17.80 |      45 |
|  10 |     SORT JOIN           |       |      1 |  10000 |  10000 |00:00:00.02 |      24 |
|  11 |      TABLE ACCESS FULL  | T_NEW |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|* 12 |     FILTER              |       |  10000 |        |      0 |00:00:17.78 |      21 |
|* 13 |      SORT JOIN          |       |  10000 |  10000 |    99M |00:01:21.50 |      21 |
|  14 |       TABLE ACCESS FULL | T_OLD |      1 |  10000 |  10000 |00:00:00.01 |      21 |
-------------------------------------------------------------------------------------------

Join, or Sort and Match?
Row                Start  End  Match with first A
A                  1      4
B is equal         1      4    ✔
B started by A     1      5    ✔
B during A         2      3    ✔
B finishes A       3      4    ✔
B overlapped by A  3      5    ✔
B met by A         4      5    ✖
B preceded by A    5      6    ?
another A          5      7

select A_start_n, A_end_n, B_start_n, B_end_n
from (
  select 'A' ttype, A.* from A
  union all
  select 'B' ttype, B.* from B
)
match_recognize (
  order by start_n, end_n
  measures
    decode(f.ttype,'A',f.start_n, o.start_n) A_start_n,
    decode(f.ttype,'A',f.end_n, o.end_n) A_end_n,
    decode(f.ttype,'B',f.start_n, o.start_n) B_start_n,
    decode(f.ttype,'B',f.end_n, o.end_n) B_end_n
  all rows per match
  after match skip to next row
  pattern ( {-f-} (o|{-x-})+ )
  define o as ttype != f.ttype and start_n < f.end_n,
    x as start_n < f.end_n
);

Elapsed: 00:00:00.063

Pattern syntax:
{- ... -}  exclusion
( ... )    grouping
+          at least one
A | B      alternation