Ranges, ranges everywhere (Oracle SQL)

Ranges, Ranges Everywhere!

Stew Ashton (stewashton.wordpress.com)
UKOUG Tech 2016

Can you read the following line? If not, please move closer.

It's much better when you can read the code ;)

http://stewashton.wordpress.com/

Agenda

• Defining ranges
• Relating ranges: gaps, overlaps
• Range DDL: sensible data
• Ranges in one table
• Ranges in two tables

Who am I?

• 36 years in IT
  – Developer, Technical Sales Engineer, Technical Architect
  – Aeronautics, IBM, Finance
  – Mainframe, client-server, Web apps
• 12 years using Oracle database
  – SQL performance analysis
  – Replace Java with SQL
• 4 years as in-house “Oracle Development Expert”
• Conference speaker since 2014
• Currently independent

Questions

What is a range?

• Two values that can be compared
  – Always use the same datatype
  – Comparable datatypes:
    • integer, date (without time)
    • number, datetime, interval, (n)(var)char
    • rowid
• Range design questions:
  – Is the "end" value part of the range?
  – Are NULLs allowed?

Allen’s Interval Algebra

Positions are on a scale of 1 2 3 4.

Relation (and its inverse)           A     B     Category
A precedes B / B preceded by A       1-2   3-4   Gap
A meets B / B met by A               1-2   2-3   Meet
A overlaps B / B overlapped by A     1-3   2-4   "Overlap"
A finished by B / B finishes A       1-3   2-3   "Overlap"
A contains B / B during A            1-4   2-3   "Overlap"
A starts B / B started by A          1-2   1-3   "Overlap"
A and B are equal                    1-2   1-2   "Overlap"
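The three groups above (gap, meet, "overlap") can be told apart with two comparisons. The following is a minimal sketch, not taken from the slides: it assumes exclusive end values, and the inline view `pairs` with its column names is hypothetical sample data holding the seven A/B pairs listed above. Column aliases in the WITH clause need Oracle 11.2 or later.

```sql
-- Hypothetical sample data: one row per A/B pair from the slide.
with pairs(a_start, a_end, b_start, b_end) as (
  select 1, 2, 3, 4 from dual union all  -- A precedes B
  select 1, 2, 2, 3 from dual union all  -- A meets B
  select 1, 3, 2, 4 from dual union all  -- A overlaps B
  select 1, 3, 2, 3 from dual union all  -- A finished by B
  select 1, 4, 2, 3 from dual union all  -- A contains B
  select 1, 2, 1, 3 from dual union all  -- A starts B
  select 1, 2, 1, 2 from dual            -- A and B are equal
)
select a_start, a_end, b_start, b_end,
       case
         -- Each range starts before the other one ends: some shared portion.
         when a_start < b_end and b_start < a_end then 'overlap'
         -- One range ends exactly where the other starts.
         when a_end = b_start or b_end = a_start  then 'meet'
         else 'gap'
       end as relation
from pairs;
```

With this data the first pair is classified as a gap, the second as a meet and the remaining five as overlaps. The test is symmetric, so it does not matter which range is called A.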

End value: Inclusive or Exclusive

• Design must allow ranges to "meet"
• Discrete quantities can be inclusive
  – [1-3] meets [4-6] : no intermediate integer
  – [Jan. 1-31] meets [Feb. 1-28] : no intermediate date
• Continuous quantities require exclusive
  – Most ranges are continuous (including dates, really)

Votes for Exclusive end values

• SQL:2011 and Oracle 12c Temporal Validity
  – "Period": date/time range
    • [Closed-Open): includes start time but not end time
• WIDTH_BUCKET() function
  – Puts values in equiwidth histogram
  – Buckets must touch
  – [Closed-open): upper boundary value goes in higher bucket
• Me!
  – Exclusive end values work for every kind of range
  – Except: ROWID ranges must be inclusive
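The closed-open behaviour of WIDTH_BUCKET can be checked directly. This is a small illustrative query, not from the slides; the bounds (0 to 10) and the number of buckets (5) are arbitrary choices.

```sql
-- Five equiwidth buckets over [0, 10): [0,2) [2,4) [4,6) [6,8) [8,10)
select level - 1 as val,
       width_bucket(level - 1, 0, 10, 5) as bucket
from dual
connect by level <= 11
order by val;
```

Each boundary value lands in the higher bucket: 2 goes to bucket 2, 4 to bucket 3, and so on. The value 10, which equals the upper bound, falls outside the last bucket and is reported as the overflow bucket 6.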

DDL: make sure data is sensible

• Start_range < End_range
• If date without time, CHECK( dte = trunc(dte))
• If integer, say so
• Is NULL allowed?
  – If so, what does it mean?
  – Ex. Temporal Validity: NULL end value means "until the end of time"
• Are overlaps allowed?
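As an illustration of these rules, here is a sketch of a table for date ranges without a time component. The table, column and constraint names are hypothetical, and treating a NULL end as "until the end of time" is an assumed convention borrowed from the Temporal Validity example.

```sql
create table date_ranges (
  start_date date not null,
  end_date   date,  -- assumption: NULL means "until the end of time"
  -- Dates must not carry a time component.
  constraint date_ranges_start_no_time check (start_date = trunc(start_date)),
  constraint date_ranges_end_no_time   check (end_date = trunc(end_date)),
  -- A range must not be empty or reversed.
  constraint date_ranges_order         check (start_date < end_date)
);
```

A check constraint only rejects rows for which the condition is false, so a NULL end_date passes both constraints that mention it. None of these constraints says anything about overlaps between rows.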

Overlaps avoided by unique constraints

Relation                            A     B     Unique(start,end)  Unique(start)  Unique(end)
A overlaps B / B overlapped by A    1-3   2-4   No constraint works
A finished by B / B finishes A      1-3   2-3                                     Y
A contains B / B during A           1-4   2-3   No constraint works
A starts B / B started by A         1-2   1-3                      Y
A and B are equal                   1-2   1-2   Y                  Y              Y

Avoiding Overlaps: 3 solutions

1. Triggers
   – Hard to do right, not very scalable
2. "Refresh on commit" materialized views
   – Not scalable?
3. Virtual ranges

Virtual range: no gaps, no overlaps

• One column: start value
• End value is calculated: = next row's start
  – Putting identical value in 2 rows is denormalization
• Last row has unlimited end
• Maybe OK for audit trails?

Physical (table)

START_VALUE
16-11-15 08:30
16-11-15 09:30
16-11-15 18:30

Virtual (view)

START_VALUE      END_VALUE
16-11-15 08:30   16-11-15 09:30
16-11-15 09:30   16-11-15 18:30
16-11-15 18:30   (null)
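One way to build such a view is with the LEAD analytic function. This is a minimal sketch; the table name `events` and the view name `events_v` are hypothetical.

```sql
-- Physical table: only the start value is stored.
create table events (
  start_value date not null primary key
);

-- Virtual range: the end of each row is the start of the next row.
create or replace view events_v as
select start_value,
       lead(start_value) over (order by start_value) as end_value
from events;
```

The last row has no successor, so LEAD returns NULL for it, which gives the unlimited end described above. The primary key on the start value is what rules out overlaps.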

Semi-Virtual range: no overlaps

• Start column always used
• End column optional:
  – If null, use next row's start
  – If not null, use lesser of end column and next row's start
  – Last row can have limited end
• Or: intermediate row with 'not exists' flag
  – ≅ Change Data Capture format

START_VALUE      END_VALUE
16-11-15 08:30   16-11-15 09:30
16-11-15 18:30   (null)

START_VALUE      D
16-11-15 08:30
16-11-15 09:30   D
16-11-15 18:30
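The rule for the optional end column can be sketched as a view. This is an assumption-laden example, not the author's code: the table `semi_ranges` (columns start_value and end_value, both DATE) and the view name are hypothetical.

```sql
create or replace view semi_ranges_v as
select start_value,
       -- LEAST returns NULL if either argument is NULL, so COALESCE
       -- falls back to whichever of the two values is present.
       coalesce(least(end_value, next_start), end_value, next_start) as end_value
from (
  select start_value,
         end_value,
         lead(start_value) over (order by start_value) as next_start
  from semi_ranges
);
```

When both the stored end and the next start are present, the lesser one wins, so ranges cannot overlap. When both are NULL (last row, no stored end), the calculated end stays NULL.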

Range-related SQL

• Why hard?
  – Can't use BETWEEN
  – Inequality joins impact performance
  – With overlaps, 1 value point can be in any number of rows
  – Joining 2 tables with overlaps -> row explosion
  – NULLs have special meanings
• Common problems
  – Find gaps
  – Intersect: find overlaps
  – Union: packing ranges between gaps
  – Joins
• Today, ends are exclusive, everything is NOT NULL (unless specified)
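To see why BETWEEN does not fit exclusive end values, consider two ranges that meet. The sketch below uses hypothetical inline data (the ranges 1 to 2 and 2 to 3) and looks up the value 2 both ways.

```sql
with r(start_n, end_n) as (
  select 1, 2 from dual union all
  select 2, 3 from dual
)
-- BETWEEN includes both bounds, so it also matches the range ending at 2.
select 'between' as method, count(*) as hits
from r
where 2 between start_n and end_n
union all
-- Closed-open test: start included, end excluded.
select 'closed-open' as method, count(*) as hits
from r
where 2 >= start_n and 2 < end_n;
```

The BETWEEN version finds the value in both ranges (2 hits), while the closed-open test finds it only in the range that starts at 2 (1 hit).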

Gaps, ex. Free time in calendar

FROM_TM  TO_TM
07:00    08:00
09:00    10:50
10:00    10:45
12:00    12:45
18:00    23:00

First attempt: end of each row, start of the next row.

select to_tm as gap_from,
  lead(from_tm) over(order by from_tm) as gap_to
from t

FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:45     12:00
12:00    12:45     18:00
18:00    23:00

Using the latest end so far, max(to_tm) over(order by from_tm), as GAP_FROM:

FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:50     12:00
12:00    12:45     18:00
18:00    23:00

Keeping only the real gaps:

select * from (
  select max(to_tm) over(order by from_tm) as gap_from,
    lead(from_tm) over(order by from_tm) as gap_to
  from t
)
where gap_from < gap_to;

GAP_FROM  GAP_TO
08:00     09:00
10:50     12:00
12:45     18:00

Intersect: finding Overlaps

Test case        Start  End
01:precedes      1      2
01:precedes      3      4
02:meets         1      2
02:meets         2      3
03:overlaps      1      3
03:overlaps      2      4
04:finished by   1      3
04:finished by   2      3
05:contains      1      4
05:contains      2      3
06:starts        1      2
06:starts        1      3
07:equals        1      2
07:equals        1      2

select test_case, dte, col
from t
unpivot (dte for col in (
  start_date as 1, end_date as -1))

A overlaps B (1 to 3), B overlapped by A (2 to 4): segments 1-2, 2-3, 3-4
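To try the overlap queries, the test cases can be loaded as follows. This is a hypothetical setup script, not shown in the slides. The slides display the values 1 to 4 under columns named start_date and end_date, so the sketch assumes NUMBER columns; real DATE columns would work the same way. Note that the gaps example uses a different table that is also called t.

```sql
create table t (
  test_case  varchar2(16) not null,
  start_date number       not null,  -- assumption: numbers stand in for dates
  end_date   number       not null,
  constraint t_range_order check (start_date < end_date)
);

insert into t values ('01:precedes',    1, 2);
insert into t values ('01:precedes',    3, 4);
insert into t values ('02:meets',       1, 2);
insert into t values ('02:meets',       2, 3);
insert into t values ('03:overlaps',    1, 3);
insert into t values ('03:overlaps',    2, 4);
insert into t values ('04:finished by', 1, 3);
insert into t values ('04:finished by', 2, 3);
insert into t values ('05:contains',    1, 4);
insert into t values ('05:contains',    2, 3);
insert into t values ('06:starts',      1, 2);
insert into t values ('06:starts',      1, 3);
insert into t values ('07:equals',      1, 2);
insert into t values ('07:equals',      1, 2);
commit;
```

There is no unique constraint because test case 07 contains two identical rows on purpose.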

Intersect: finding Overlaps

Test case     Dte  Col
01:precedes   1    1
01:precedes   2    -1
01:precedes   3    1
01:precedes   4    -1
02:meets      1    1
02:meets      2    -1
02:meets      2    1
02:meets      3    -1
03:overlaps   1    1
03:overlaps   3    -1
03:overlaps   2    1
03:overlaps   4    -1

select test_case, dte "Start",
  lead(dte,1,dte) over(
    partition by test_case order by dte, col desc
  ) "End",
  sum(col) over(
    partition by test_case order by dte, col desc
  ) "Rows"
from t
unpivot (dte for col in (
  start_date as 1, end_date as -1))

Intersect: finding Overlaps

Test case     Start  End  Rows
01:precedes   1      2    1
01:precedes   2      3    0
01:precedes   3      4    1
01:precedes   4      4    0     ✖
02:meets      1      2    1
02:meets      2      2    2     ✖
02:meets      2      3    1
02:meets      3      3    0     ✖
03:overlaps   1      2    1
03:overlaps   2      3    2
03:overlaps   3      4    1
03:overlaps   4      4    0     ✖

select * from (
  select test_case, dte "Start",
    lead(dte,1,dte) over(
      partition by test_case order by dte, col desc
    ) "End",
    sum(col) over(
      partition by test_case order by dte, col desc
    ) "Rows"
  from t
  unpivot (dte for col in (
    start_date as 1, end_date as -1))
)
where "Start" < "End";

Intersect: finding Overlaps

Test case     Start  End  Rows
01:precedes   1      2    1
01:precedes   2      3    0
01:precedes   3      4    1
02:meets      1      2    1
02:meets      2      3    1
03:overlaps   1      2    1
03:overlaps   2      3    2
03:overlaps   3      4    1

select * from (
  select test_case, dte "Start",
    lead(dte,1,dte) over(
      partition by test_case order by dte, col desc
    ) "End",
    sum(col) over(
      partition by test_case order by dte, col desc
    ) "Rows"
  from t
  unpivot (dte for col in (
    start_date as 1, end_date as -1))
)
where "Rows" > 1
and "Start" < "End";

Test case        Start  End  Rows
03:overlaps      2      3    2
04:finished by   2      3    2
05:contains      2      3    2
06:starts        1      2    2
07:equals        1      2    2

Packing Ranges

Test case        Start  End
01:precedes      1      2
01:precedes      3      4
02:meets         1      2
02:meets         2      3
03:overlaps      1      3
03:overlaps      2      4
04:finished by   1      3
04:finished by   2      3
05:contains      1      4
05:contains      2      3
06:starts        1      2
06:starts        1      3
07:equals        1      2
07:equals        1      2

select * from t
match_recognize(
  partition by test_case
  order by end_date, start_date
  measures min(start_date) start_date,
    last(end_date) end_date
  pattern(a* b)
  define a as end_date >= next(start_date)
);

Test case        Start  End
01:precedes      1      2
01:precedes      3      4
02:meets         1      3
03:overlaps      1      4
04:finished by   1      3
05:contains      1      4
06:starts        1      3
07:equals        1      2

select * from t
match_recognize(
  partition by test_case
  order by end_date, start_date
  measures min(start_date) start_date,
    last(end_date) end_date
  pattern(a* b)
  define a as end_date >= next(start_date)
    or end_date is null
);

Test case        Start  End
01:precedes      1      2
01:precedes      3
02:meets         1
03:overlaps      1
04:finished by   1
05:contains      1
06:starts        1
07:equals        1

JOIN: range to range

> create table A(start_n, end_n) as
  select level, level+1 from dual
  connect by level <= 10000;

> create table B as
  select start_n+9995 start_n, end_n+9996 end_n
  from A;

> select * from A
  join B on (A.start_n <= B.start_n and B.start_n < A.end_n)
         or (B.start_n <= A.start_n and A.start_n < B.end_n);

Elapsed: 00:00:13.332

Exadata? All data in buffer cache
Elapsed: 00:00:13.332

InMemory?
Elapsed: 00:00:09.842

JOIN: range to range

------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |      1 |        |      1 |00:00:17.82 |      90 |
|   1 |  SORT AGGREGATE        |       |      1 |      1 |      1 |00:00:17.82 |      90 |
|   2 |   CONCATENATION        |       |      1 |        |     10 |00:00:00.01 |      90 |
|   3 |    MERGE JOIN          |       |      1 |     55 |     10 |00:00:00.01 |      45 |
|   4 |     SORT JOIN          |       |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|   5 |      TABLE ACCESS FULL | T_NEW |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|*  6 |     FILTER             |       |  10000 |        |     10 |00:00:00.01 |      21 |
|*  7 |      SORT JOIN         |       |  10000 |  10000 |     55 |00:00:00.01 |      21 |
|   8 |       TABLE ACCESS FULL| T_OLD |      1 |  10000 |  10000 |00:00:00.02 |      21 |
|   9 |    MERGE JOIN          |       |      1 |     55 |      0 |00:00:17.80 |      45 |
|  10 |     SORT JOIN          |       |      1 |  10000 |  10000 |00:00:00.02 |      24 |
|  11 |      TABLE ACCESS FULL | T_NEW |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|* 12 |     FILTER             |       |  10000 |        |      0 |00:00:17.78 |      21 |
|* 13 |      SORT JOIN         |       |  10000 |  10000 |    99M |00:01:21.50 |      21 |
|  14 |       TABLE ACCESS FULL| T_OLD |      1 |  10000 |  10000 |00:00:00.01 |      21 |
------------------------------------------------------------------------------------------

Join, or Sort and Match?

Row                 Start  End
A                   1      4
B is equal          1      4
B started by A      1      5
B during A          2      3
B finishes A        3      4
B overlapped by A   3      5
B met by A          4      5
B preceded by A     5      6
another A           5      7

[Diagram: each row is drawn as a bar and marked ✔, ✖ or ?]

select A_start_n, A_end_n, B_start_n, B_end_n
from (
  select 'A' ttype, A.* from A
  union all
  select 'B' ttype, B.* from B
)
match_recognize (
  order by start_n, end_n
  measures
    decode(f.ttype,'A',f.start_n, o.start_n) A_start_n,
    decode(f.ttype,'A',f.end_n, o.end_n) A_end_n,
    decode(f.ttype,'B',f.start_n, o.start_n) B_start_n,
    decode(f.ttype,'B',f.end_n, o.end_n) B_end_n
  all rows per match
  after match skip to next row
  pattern ( {-f-} (o|{-x-})+ )
  define o as ttype != f.ttype and start_n < f.end_n,
    x as start_n < f.end_n
);

Elapsed: 00:00:00.063

{- exclusion -}
( grouping )
+ at least one
Alternation A | B

Child's play

More!

• Overlapping ranges with priority
• Data warehouses with date ranges:
  – Trickle feed
• Impact on foreign keys
• OLTP

• Take advantage of MATCH_RECOGNIZE ,

Date post:	19-Jan-2017
Category:	Technology
Upload:	stew-ashton