Ranges, ranges everywhere (Oracle SQL)

Date post: 19-Jan-2017
Upload: stew-ashton

Ranges, Ranges Everywhere!

Stew Ashton (stewashton.wordpress.com)
UKOUG Tech 2016

Can you read the following line? If not, please move closer.

It's much better when you can read the code ;)

Agenda

• Defining ranges
• Relating ranges: gaps, overlaps
• Range DDL: sensible data
• Ranges in one table
• Ranges in two tables

Who am I?

• 36 years in IT
  – Developer, Technical Sales Engineer, Technical Architect
  – Aeronautics, IBM, Finance
  – Mainframe, client-server, Web apps
• 12 years using Oracle database
  – SQL performance analysis
  – Replace Java with SQL
• 4 years as in-house “Oracle Development Expert”
• Conference speaker since 2014
• Currently independent


Questions

What is a range?

• Two values that can be compared
  – Always use the same datatype
  – Comparable datatypes:
    • integer, date (without time)
    • number, datetime, interval, (n)(var)char
    • rowid
• Range design questions:
  – Is the "end" value part of the range?
  – Are NULLs allowed?

Allen’s Interval Algebra

Relation (and its inverse)          A      B      Group
A precedes B / B preceded by A      1 2    3 4    Gap
A meets B / B met by A              1 2    2 3    Meet
A overlaps B / B overlapped by A    1 3    2 4    "Overlap"
A finished by B / B finishes A      1 3    2 3    "Overlap"
A contains B / B during A           1 4    2 3    "Overlap"
A starts B / B started by A         1 2    1 3    "Overlap"
A and B are equal                   1 2    1 2    "Overlap"

End value: Inclusive or Exclusive

• Design must allow ranges to "meet"
• Discrete quantities can be inclusive
  – [1-3] meets [4-6] : no intermediate integer
  – [Jan. 1-31] meets [Feb. 1-28] : no intermediate date
• Continuous quantities require exclusive
  – Most ranges are continuous (including dates, really)

Votes for Exclusive end values

• SQL:2011 and Oracle 12c Temporal Validity
  – "Period": date/time range
    • [Closed-Open): includes start time but not end time
• WIDTH_BUCKET() function
  – Puts values in equiwidth histogram
  – Buckets must touch
  – [Closed-Open): upper boundary value goes in higher bucket
• Me!
  – Exclusive end values work for every kind of range
  – Except: ROWID ranges must be inclusive
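The closed-open behaviour of WIDTH_BUCKET() can be checked with a one-row query. This is an illustrative sketch, not taken from the slides; the bounds 0 and 20 and the 4 buckets are arbitrary choices.

```sql
-- Four equal-width buckets over [0, 20): [0,5) [5,10) [10,15) [15,20)
select width_bucket(4.99, 0, 20, 4) as below_boundary, -- 1: still inside [0,5)
       width_bucket(5,    0, 20, 4) as on_boundary,    -- 2: 5 opens bucket [5,10)
       width_bucket(20,   0, 20, 4) as on_maximum      -- 5: overflow bucket, 20 is excluded
from dual;
```

A value equal to an upper boundary always lands in the next bucket, which is what lets the buckets touch without sharing a value.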

DDL: make sure data is sensible

• Start_range < End_range
• If date without time, CHECK( dte = trunc(dte))
• If integer, say so
• Is NULL allowed?
  – If so, what does it mean?
  – Ex. Temporal Validity: NULL end value means "until the end of time"
• Are overlaps allowed?
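The checklist above might translate into DDL along these lines. A minimal sketch only: the table names date_ranges and int_ranges and all constraint names are hypothetical, and the first table assumes date-only values with a NULL end meaning "until the end of time".

```sql
-- Date ranges without a time component; end_date may be NULL (assumed: "no end").
create table date_ranges (
  start_date date not null,
  end_date   date,
  -- Reject any time-of-day component. A NULL end_date passes,
  -- because a CHECK only fails when its condition is FALSE.
  constraint date_ranges_start_ck check (start_date = trunc(start_date)),
  constraint date_ranges_end_ck   check (end_date = trunc(end_date)),
  -- Start must come strictly before end (also passes when end_date is NULL).
  constraint date_ranges_order_ck check (start_date < end_date)
);

-- Integer ranges: declare the columns as integers rather than plain NUMBER.
create table int_ranges (
  start_n integer not null,
  end_n   integer not null,
  constraint int_ranges_order_ck check (start_n < end_n)
);
```

None of these constraints addresses overlaps; they only make each row sensible on its own.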

Overlaps avoided by unique constraints

Relation                            A      B      Unique(start,end)    Unique(start)  Unique(end)
A overlaps B / B overlapped by A    1 3    2 4    No constraint works
A finished by B / B finishes A      1 3    2 3                                        Y
A contains B / B during A           1 4    2 3    No constraint works
A starts B / B started by A         1 2    1 3                         Y
A and B are equal                   1 2    1 2    Y                    Y              Y

Avoiding Overlaps: 3 solutions

1. Triggers
   – Hard to do right, not very scalable
2. "Refresh on commit" materialized views
   – Not scalable?
3. Virtual ranges

Virtual range: no gaps, no overlaps

• One column: start value
• End value is calculated: = next row's start
  – Putting identical value in 2 rows is denormalization
• Last row has unlimited end
• Maybe OK for audit trails?

Physical (table)

START_VALUE
16-11-15 08:30
16-11-15 09:30
16-11-15 18:30

Virtual (view)

START_VALUE     END_VALUE
16-11-15 08:30  16-11-15 09:30
16-11-15 09:30  16-11-15 18:30
16-11-15 18:30  (null)
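One way to get from the physical table to the virtual view is the LEAD() analytic function. The slides do not show the view definition, so this is a sketch under assumptions: the names events and events_range_v are hypothetical.

```sql
-- Physical table: only the start value is stored.
create table events (
  start_value date not null primary key
);

-- Virtual range: each row ends where the next row starts.
-- The last row has no next row, so its end_value is NULL (unlimited end).
create or replace view events_range_v as
select start_value,
       lead(start_value) over (order by start_value) as end_value
from events;
```

Because the end is derived from the neighbouring row, two ranges can never overlap and no gap can appear between them.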

Semi-Virtual range: no overlaps

• Start column always used
• End column optional:
  – If null, use next row's start
  – If not null, use lesser of end column and next row's start
  – Last row can have limited end

START_VALUE     END_VALUE
16-11-15 08:30  16-11-15 09:30
16-11-15 18:30  (null)

• Or: intermediate row with 'not exists' flag
  – ≅ Change Data Capture format

START_VALUE     D
16-11-15 08:30
16-11-15 09:30  D
16-11-15 18:30
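The "end column optional" rules can be sketched as a view over the stored start and optional end. This is not from the slides; the names semi_ranges and semi_ranges_v are hypothetical.

```sql
-- Physical table: start is mandatory, end is optional.
create table semi_ranges (
  start_value date not null primary key,
  end_value   date
);

create or replace view semi_ranges_v as
select start_value,
       case
         when end_value  is null then next_start   -- no stored end: use next row's start
         when next_start is null then end_value    -- last row: keep its own (limited) end
         else least(end_value, next_start)         -- otherwise the lesser of the two
       end as end_value
from (
  select start_value,
         end_value,
         lead(start_value) over (order by start_value) as next_start
  from semi_ranges
);
```

A row can stop early and leave a gap, but it can never run past the next row's start, so overlaps stay impossible.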

Range-related SQL

• Why hard?
  – Can't use BETWEEN
  – Inequality joins impact performance
  – With overlaps, 1 value point can be in any number of rows
  – Joining 2 tables with overlaps -> row explosion
  – NULLs have special meanings
• Common problems
  – Find gaps
  – Intersect: find overlaps
  – Union: packing ranges between gaps
  – Joins
• Today, ends are exclusive, everything is NOT NULL (unless specified)
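With exclusive ends, the basic predicates look like the sketch below. It is an illustration rather than code from the talk; the table ranges_t(start_n, end_n) and the bind variable :p are hypothetical.

```sql
-- Rows whose range [start_n, end_n) contains the value :p.
-- BETWEEN is inclusive at both ends, so it would also return
-- rows where :p = end_n; hence two explicit comparisons.
select *
from ranges_t
where start_n <= :p
  and :p < end_n;

-- Pairs of rows whose ranges overlap, assuming start_n < end_n in every row.
-- Ranges that merely "meet" (a.end_n = b.start_n) are not returned.
select a.start_n as a_start_n, a.end_n as a_end_n,
       b.start_n as b_start_n, b.end_n as b_end_n
from ranges_t a
join ranges_t b
  on  a.start_n < b.end_n
  and b.start_n < a.end_n
  and a.rowid < b.rowid;   -- skip self-pairs and report each pair once
```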

Gaps, ex. Free time in calendar

Table t:

FROM_TM  TO_TM
07:00    08:00
09:00    10:50
10:00    10:45
12:00    12:45
18:00    23:00

First attempt: compare each row's TO_TM with the next row's FROM_TM.

    select to_tm as gap_from,
           lead(from_tm) over(order by from_tm) as gap_to
    from t

FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:45     12:00
12:00    12:45     18:00
18:00    23:00     (null)

With max(to_tm) over(order by from_tm) as gap_from:

FROM_TM  GAP_FROM  GAP_TO
07:00    08:00     09:00
09:00    10:50     10:00
10:00    10:50     12:00
12:00    12:45     18:00
18:00    23:00     (null)

Final query, keeping only the rows where gap_from < gap_to:

    select * from (
      select max(to_tm) over(order by from_tm) as gap_from,
             lead(from_tm) over(order by from_tm) as gap_to
      from t
    )
    where gap_from < gap_to;

GAP_FROM  GAP_TO
08:00     09:00
10:50     12:00
12:45     18:00

Intersect: finding Overlaps

Table t:

Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      2
02:meets        2      3
03:overlaps     1      3
03:overlaps     2      4
04:finished by  1      3
04:finished by  2      3
05:contains     1      4
05:contains     2      3
06:starts       1      2
06:starts       1      3
07:equals       1      2
07:equals       1      2

Example: A overlaps B (1 3) / B overlapped by A (2 4) splits into the segments 1 2, 2 3 and 3 4.

Step 1: one row per boundary, +1 for a start and -1 for an end.

    select test_case, dte, col
    from t
    unpivot (dte for col in (start_date as 1, end_date as -1))

Test case    Dte  Col
01:precedes  1    1
01:precedes  2    -1
01:precedes  3    1
01:precedes  4    -1
02:meets     1    1
02:meets     2    -1
02:meets     2    1
02:meets     3    -1
03:overlaps  1    1
03:overlaps  3    -1
03:overlaps  2    1
03:overlaps  4    -1

Step 2: pair each boundary with the next one and keep a running count of open ranges.

    select test_case,
           dte "Start",
           lead(dte,1,dte) over(
             partition by test_case order by dte, col desc
           ) "End",
           sum(col) over(
             partition by test_case order by dte, col desc
           ) "Rows"
    from t
    unpivot (dte for col in (start_date as 1, end_date as -1))

Test case    Start  End  Rows
01:precedes  1      2    1
01:precedes  2      3    0
01:precedes  3      4    1
01:precedes  4      4    0
02:meets     1      2    1
02:meets     2      2    2
02:meets     2      3    1
02:meets     3      3    0
03:overlaps  1      2    1
03:overlaps  2      3    2
03:overlaps  3      4    1
03:overlaps  4      4    0

Step 3: drop the empty segments.

    select * from (
      select test_case,
             dte "Start",
             lead(dte,1,dte) over(
               partition by test_case order by dte, col desc
             ) "End",
             sum(col) over(
               partition by test_case order by dte, col desc
             ) "Rows"
      from t
      unpivot (dte for col in (start_date as 1, end_date as -1))
    )
    where "Start" < "End";

Test case    Start  End  Rows
01:precedes  1      2    1
01:precedes  2      3    0
01:precedes  3      4    1
02:meets     1      2    1
02:meets     2      3    1
03:overlaps  1      2    1
03:overlaps  2      3    2
03:overlaps  3      4    1

Step 4: keep only the segments covered by more than one row.

    select * from (
      select test_case,
             dte "Start",
             lead(dte,1,dte) over(
               partition by test_case order by dte, col desc
             ) "End",
             sum(col) over(
               partition by test_case order by dte, col desc
             ) "Rows"
      from t
      unpivot (dte for col in (start_date as 1, end_date as -1))
    )
    where "Rows" > 1
    and "Start" < "End";

Test case       Start  End  Rows
03:overlaps     2      3    2
04:finished by  2      3    2
05:contains     2      3    2
06:starts       1      2    2
07:equals       1      2    2

Packing Ranges

Table t:

Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      2
02:meets        2      3
03:overlaps     1      3
03:overlaps     2      4
04:finished by  1      3
04:finished by  2      3
05:contains     1      4
05:contains     2      3
06:starts       1      2
06:starts       1      3
07:equals       1      2
07:equals       1      2

    select * from t
    match_recognize(
      partition by test_case
      order by end_date, start_date
      measures min(start_date) start_date,
               last(end_date) end_date
      pattern(a* b)
      define a as end_date >= next(start_date)
    );

Test case       Start  End
01:precedes     1      2
01:precedes     3      4
02:meets        1      3
03:overlaps     1      4
04:finished by  1      3
05:contains     1      4
06:starts       1      3
07:equals       1      2

With NULL end values:

    select * from t
    match_recognize(
      partition by test_case
      order by end_date, start_date
      measures min(start_date) start_date,
               last(end_date) end_date
      pattern(a* b)
      define a as end_date >= next(start_date) or end_date is null
    );

Test case       Start  End
01:precedes     1      2
01:precedes     3      (null)
02:meets        1      (null)
03:overlaps     1      (null)
04:finished by  1      (null)
05:contains     1      (null)
06:starts       1      (null)
07:equals       1      (null)
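A variant often used for the same packing problem sorts by start and compares the running maximum end with the next start, so it does not depend on the rows being adjacent in end-date order. This is a sketch that is not taken from the slides; it assumes the same table t and NOT NULL end values.

```sql
select * from t
match_recognize(
  partition by test_case
  order by start_date, end_date
  measures first(start_date) as start_date,  -- earliest start in the packed range
           max(end_date)     as end_date     -- latest end seen in the packed range
  pattern(a* b)
  -- Row "a" is followed by more rows of the same packed range as long as
  -- the highest end so far (current row included) reaches the next start.
  define a as max(end_date) >= next(start_date)
);
```

The running max(end_date) matters when a long range contains several short ones: the next start is then compared with the containing range's end, not just with the current row's end.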

JOIN: range to range

    create table A(start_n, end_n) as
    select level, level+1 from dual
    connect by level <= 10000;

    create table B as
    select start_n+9995 start_n, end_n+9996 end_n
    from A;

    select * from A
    join B
      on (A.start_n <= B.start_n and B.start_n < A.end_n)
      or (B.start_n <= A.start_n and A.start_n < B.end_n);

Elapsed: 00:00:13.332

Exadata? All data in buffer cache
Elapsed: 00:00:13.332

InMemory?
Elapsed: 00:00:09.842

JOIN: range to range

------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |      1 |        |      1 |00:00:17.82 |      90 |
|   1 |  SORT AGGREGATE        |       |      1 |      1 |      1 |00:00:17.82 |      90 |
|   2 |   CONCATENATION        |       |      1 |        |     10 |00:00:00.01 |      90 |
|   3 |    MERGE JOIN          |       |      1 |     55 |     10 |00:00:00.01 |      45 |
|   4 |     SORT JOIN          |       |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|   5 |      TABLE ACCESS FULL | T_NEW |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|*  6 |     FILTER             |       |  10000 |        |     10 |00:00:00.01 |      21 |
|*  7 |      SORT JOIN         |       |  10000 |  10000 |     55 |00:00:00.01 |      21 |
|   8 |       TABLE ACCESS FULL| T_OLD |      1 |  10000 |  10000 |00:00:00.02 |      21 |
|   9 |    MERGE JOIN          |       |      1 |     55 |      0 |00:00:17.80 |      45 |
|  10 |     SORT JOIN          |       |      1 |  10000 |  10000 |00:00:00.02 |      24 |
|  11 |      TABLE ACCESS FULL | T_NEW |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|* 12 |     FILTER             |       |  10000 |        |      0 |00:00:17.78 |      21 |
|* 13 |      SORT JOIN         |       |  10000 |  10000 |     99M|00:01:21.50 |      21 |
|  14 |       TABLE ACCESS FULL| T_OLD |      1 |  10000 |  10000 |00:00:00.01 |      21 |
------------------------------------------------------------------------------------------

Join, or Sort and Match?

Row                  Start  End
A                    1      4
B is equal           1      4
B started by A       1      5
B during A           2      3
B finishes A         3      4
B overlapped by A    3      5
B met by A           4      5
B preceded by A      5      6
another A            5      7

    select A_start_n, A_end_n, B_start_n, B_end_n
    from (
      select 'A' ttype, A.* from A
      union all
      select 'B' ttype, B.* from B
    )
    match_recognize (
      order by start_n, end_n
      measures
        decode(f.ttype,'A',f.start_n, o.start_n) A_start_n,
        decode(f.ttype,'A',f.end_n,   o.end_n)   A_end_n,
        decode(f.ttype,'B',f.start_n, o.start_n) B_start_n,
        decode(f.ttype,'B',f.end_n,   o.end_n)   B_end_n
      all rows per match
      after match skip to next row
      pattern ( {-f-} (o|{-x-})+ )
      define o as ttype != f.ttype and start_n < f.end_n,
             x as start_n < f.end_n
    );

Elapsed: 00:00:00.063

Pattern syntax:
• {- exclusion -}
• ( grouping )
• + at least one
• Alternation A | B

Child's play

More!

• Overlapping ranges with priority
• Data warehouses with date ranges:
  – Trickle feed
• Impact on foreign keys
• OLTP

• Take advantage of MATCH_RECOGNIZE ,

