Oracle8i Analytical SQL Features
04/22/23Willie Albino2
Analytical SQL Features Overview
Available in Oracle 8.1.6 and above
Analytical Features/Enhancements– GROUP BY Extensions
SQL GROUP BY clause has been augmented to make querying and reporting easier
– Analytical SQL Functions Analytical functions enabling rankings, moving window calculations,
lead/lag analysis – CASE Expressions
Increased/Efficient ‘if-then-else’ capabilities provided
Oracle8i Analytical SQL Features
GROUP BY Extensions
04/22/23Willie Albino4
GROUP BY Extensions
ROLLUP– Calculates subtotals at increasing levels of
aggregation from the most detail to a grand total– used to generate simple cross tabular reports
CUBE– Calculates all possible combinations of subtotals– used to generate full cross tabular reports
04/22/23Willie Albino5
GROUP BY Extensions Syntax
ROLLUP: SELECT <column list> FROM <table…> GROUP BY ROLLUP(column_list);
CUBE: SELECT <column list> FROM <table…> GROUP BY CUBE(column_list);
04/22/23Willie Albino6
Standard GROUP BY Example
select region, product, SUM(amount) from region, product, product_sales where region_id = reg_id and region = 'East' and product_id = prod_id group by region,product
REGION PRODUCT SUM(AMOUNT)------------ ------------ -----------East Hats 75East Jackets 100East Pants 100East Shirts 75East Shoes 130East Suits 90East Sweaters 75East T-Shirts 20East Ties 60
04/22/23Willie Albino7
GROUP BY with ROLLUP
select region, product, SUM(amount) from region, product, product_sales where region_id = reg_id and region = 'East' and product_id = prod_id group by ROLLUP(region,product)
REGION PRODUCT SUM(AMOUNT)------------ ------------ -----------East Hats 75East Jackets 100East Pants 100East Shirts 75East Shoes 130East Suits 90East Sweaters 75East T-Shirts 20East Ties 60East - 725- - 725
04/22/23Willie Albino8
GROUP BY with CUBE
select region, product, SUM(amount) from region, product, product_sales where region_id = reg_id and region = 'East' and product_id = prod_id group by CUBE(region,product)
REGION PRODUCT SUM(AMOUNT)------------ ------------ ----------- : : :East Ties 60East - 725- Hats 75- Jackets 100- Pants 100- Shirts 75- Shoes 130- Suits 90- Sweaters 75- T-Shirts 20- Ties 60- - 725
04/22/23Willie Albino9
Comments about ROLLUP/CUBE
ROLLUP: creates subtotals at n+1 levels where n equals the number of grouping columns
CUBE: creates 2n combinations of subtotals where n equals the number of grouping columns
Sub-total generation more efficient than equivalent SQL code (a 4 column CUBE grouping, has a 93.75% reduction in table access, ROLLUP has 80%)
04/22/23Willie Albino10
Comments about ROLLUP/CUBE
Partial rollups/cubes can be specified – GROUP BY exp1, CUBE(exp2, exp3, ....)
ROLLUP/CUBE can be used with all aggregating functions (MAX, MIN, AVG, etc.)
HAVING clause applies to all the data returned
NULLs are generated for dimensions at subtotal levels
04/22/23Willie Albino11
GROUPING() Function
Used to distinguish between NULLs in data and NULLs generated by ROLLUP/CUBE extensions
GROUPING() return values: – 1 for extension-generated NULLs, 0 for NULL data values– Can be passed to DECODE for custom interpretation
SYNTAX:– SELECT .. GROUPING(column name) .. GROUP BY ..– SELECT .. DECODE(GROUPING(col), 1, ‘Sub’, col)) …
04/22/23Willie Albino12
GROUPING() Function Example
select DECODE(GROUPING(region), 1, 'All Regions', 0, region) region, DECODE(GROUPING(product), 1, 'All Products', 0, product) product, SUM(amount)from region, product, product_saleswhere region_id = reg_id and region = 'East' and product_id = prod_idgroup by CUBE(region,product)
REGION PRODUCT SUM(AMOUNT)------------ ------------ ----------- : : :East Ties 60East All Products 725All Regions Hats 75All Regions Jackets 100All Regions Pants 100All Regions Shirts 75All Regions Shoes 130All Regions Suits 90All Regions Sweaters 75All Regions T-Shirts 20All Regions Ties 60All Regions All Products 725
Oracle8i Analytical SQL Features
Analytical SQLFunctions
04/22/23Willie Albino14
Analytical SQL Functions
Analytical Function Categories:– Ranking Functions– Windowing Functions– Reporting Functions– Lag/Lead Functions– Statistics Functions
Functions are applied after all joins, WHERE, GROUP BY and HAVING clauses are performed, but before the ORDER BY clause is applied
04/22/23Willie Albino15
Basic Analytical Function Syntax
Syntax:<function_name>() OVER ( [PARTITION BY <exp1> [, …]] ORDER BY <exp2> [ASC|DESC] [NULLS FIRST|NULLS LAST])
Example:SELECT RANK(amount) OVER (PARTITION BY region ORDER BY amount) FROM REG_SALES;
04/22/23Willie Albino16
Function Syntax Comments
PARTITION BY <exp> [, …] - this clause divides the query result into groups within which the analytical function operates
– If the PARTITION BY clause is missing, the function operates over the entire dataset
– <exp> can be any valid expression involving column references
04/22/23Willie Albino17
Function Syntax Comments
ORDER BY <exp>[ASC|DESC] [NULLS FIRST|NULLS LAST]
– specifies how the data is ordered within a group (partition)– ASC|DESC specifies the sorting order for the grouping. The
default sorting order is ASC.– The presence of ORDER BY affects the outcome of analytical
functions With ORDER BY, the set of rows used is the current row and all
preceding rows in the partition (a growing window) Without ORDER BY, all the rows in the partition will be used The ORDER BY clause can be used to resolve ties between
repeated values in a set.
04/22/23Willie Albino18
Function Syntax Comments
ORDER BY <exp>[ASC|DESC] [NULLS FIRST|NULLS LAST]
– The NULLS FIRST|NULLS LAST clause determines the position of NULLs in the ordered sequence. If omitted, the position depends on the ASC, DESC arguments.
NULLs are considered to be larger than any other values.
– It is not guaranteed that the data will be sorted on the measures. Use the ORDER BY clause to specify the ordering sequence.
04/22/23Willie Albino19
Ranking Functions
Computes the rank of a record with respect to other records in the dataset based on the values of a set of measures
Ranking Functions:– RANK() and DENSE_RANK()– CUME_DIST() and PERCENT_RANK()– NTILE()– ROW_NUMBER()
04/22/23Willie Albino20
RANK() and DENSE_RANK() Functions
The RANK() and DENSE_RANK() functions allow you to rank items in a dataset or sub-group.
The RANK() function leaves gaps in the ranking sequence when there are ties in the rankings.
The DENSE_RANK() function does not leave gaps in the ranking sequence when there are ties in the rankings.
04/22/23Willie Albino21
Example of RANK()/DENSE_RANK()
select amount, RANK() OVER (ORDER BY amount) AS rank_asc, DENSE_RANK() OVER (ORDER BY amount) AS dense_rank from product_sales, product, region where prod_id=product_id and region_id=reg_id and region='East' order by amount;
AMOUNT RANK_ASC DENSE_RANK------------ ------------ ----------- 20 1 1 60 2 2 75 3 3 75 3 3 75 3 3 90 6 4 100 7 5 100 7 5 130 9 6
NOTES:- Ranking value will repeat (leaving gaps) when the same data values are found in the dataset- The order or the rows with repeated values is non-deterministic- DENSE_RANK() does not leave gaps in the rank values for repeated data values (RANK() does)- The largest rank value produced by DENSE_RANK equals the number of distinct values in the dataset
04/22/23Willie Albino22
Using RANK() For Top-N Values List
select * from ( select region, product,SUM(amount) amt, SUM(profit) profit, RANK() OVER (PARTITION BY region ORDER BY SUM(amount) DESC) AS rank from product_sales, product, region where prod_id=product_id and region_id=reg_id GROUP BY region, product )where rank_sum_amt < 4
NOTES:- Using RANK() or DENSE_RANK(), you can get the top N ranks within a dataset- DENSE_RANK() could yield different results than RANK() depending on repeated values within the dataset.
(The function ROW_NUMBER() would provide more accurate results.)- The bottom N rankings can be generated by changing the ordering sequence within the rank expression
(e.g., ORDER BY SUM(amount) ASC))
Region Product Amt Profit Rank------ ------- ---- ------ ----Central Sweaters 242 92 1Central Shirts 213 87 2Central Pants 123 57 3East Shoes 130 30 1East Jackets 100 28 2East Pants 100 24 2West Shoes 100 13 1West Jackets 99 17 2West T-Shirts 89 23 3
04/22/23Willie Albino23
CUME_DIST() and PERCENT_RANK()
CUME_DIST() computes the position of a specified value relative to the set of values (also known as inverse of percentile in statistics books)
– CD = (# values different from or equal to x)/(total # of values)– Return values are between 0 and 1
PERCENT_RANK() returns the percent rank of a value relative to a group of values
– PR = (rank of row in partition - 1)/(# of rows in the partition - 1)
04/22/23Willie Albino24
Example of CUME_DIST() & PERCENT_RANK()
select region, product, SUM(amount),
CUME_DIST() OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS cume_dist,
PERCENT_RANK() OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS pct_rnk
from product_sales, product, regionwhere prod_id=product_id and region_id=reg_idGROUP BY region, product
Region Product Amt Cume_Dist Pct_Rnk
Central Belts 85 0.1429 0Central Suits 98 0.2857
0.1667Central Ties 104 0.4286
0.3333Central Hats 111 0.5714 0.5Central Pants 123 0.7143
0.6667Central Shirts 213 0.8571
0.8333Central Sweaters 242 1 1
East T-Shirts 20 0.1111 0
East Ties 60 0.2222 0.125
East Hats 75 0.5556 0.25
East Shirts 75 0.5556 0.25
East Sweaters 75 0.5556 0.25
East Suits 90 0.6667 0.625
East Jackets 100 0.8889 0.75
East Pants 100 0.8889 0.75
East Shoes 130 1 1 : : : : : : : : : :
04/22/23Willie Albino25
NTILE(n) and ROW_NUMBER()
NTILE(n)– Divides dataset into a specified number of buckets– Takes the number of buckets as an argument
ROW_NUMBER()– assigns a unique number to each row within a partition– row numbers start with 1 and increase sequentially within each
partition– better than RANK() or DENSE_RANK() for top-N queries
Rows with rankings that are ties will not necessarily be assigned to the same bucket (if they span buckets) or the same row number in subsequent runs of the query using the same dataset
04/22/23Willie Albino26
Example of NTILE() & ROW_NUMBER()
select region, product, SUM(amount),
NTILE(3) OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS bucket,
ROW_NUMBER() OVER (PARTITION BY region ORDER BY SUM(amount) ASC) AS row
from product_sales, product, regionwhere prod_id=product_id and region_id=reg_idGROUP BY region, product
Region Product Amt Bucket Row
----------- ------------- ------- ------------ -----Central Belts 85 1
1Central Suits 98 1
2Central Ties 104 1
3Central Hats 111 2
4Central Pants 123 2
5Central Shirts 213 3
6Central Sweaters 242 3
7East T-Shirts 20 1
1East Ties 60 1
2East Hats 75 1
3East Shirts 75 2
4East Sweaters 75 2
5East Suits 90 2
6East Jackets 100 3
7East Pants 100 3
8East Shoes 130 3
9
Oracle8i Analytical SQL Features
Windowing Functions
04/22/23Willie Albino28
Windowing Functions
Used to compute cumulative, moving or centered aggregates
Returns a value for each row in a dataset which depends on other rows in the corresponding window
Windowing functions include moving sum, moving average, moving min/max, cumulative sum and statistical functions, first and last value in window
04/22/23Willie Albino29
Windowing Function Syntax
{SUM|AVG|MAX|MIN|COUNT|FIRST_VALUE|LAST_VALUE} OVER ( PARTITION BY <exp1> [, …] ORDER BY <exp2> [ASC|DESC] [NULLS FIRST|NULLS LAST] ROWS | RANGE { UNBOUNDED PRECEDING | <exp3> PRECEDING} | BETWEEN {UNBOUNDED PRECEDING | <exp4> PRECEDING} AND {CURRENT ROW | <exp5> FOLLOWING} } )
04/22/23Willie Albino30
Windowing Function Syntax Comments
<exp>– Must be a constant or an expression which
evaluates to a positive value– If ROWS was specified, it’s a physical offset which
represents number of rows in the window– If RANGE was specified, it’s a logical offset (value
or interval literal) An interval literal is specified as follows: RANGE INTERVAL n DAYS|MONTHS|YEARS RANGE x PRECEDING|FOLLOWING
04/22/23Willie Albino31
Windowing Function Syntax Comments
ROWS | RANGE– ROWS specifies the window in physical units – RANGE specifies the window as a logical offset
BETWEEN … AND …– Specifies the start and end point of the window– If BETWEEN is omitted and an end point is
specified, that point will be considered the start point and the current row will be used as the end point
04/22/23Willie Albino32
Windowing Function Syntax Comments
UNBOUNDED PRECEDING– Specifies that the window starts at the first row of
the partition, (or the start of the dataset, if the PARTITION BY clause is omitted)
UNBOUNDED FOLLOWING– Specifies that the window ends at the last row of the
partition, (or the last row of the dataset, if the PARTITION BY clause is omitted)
04/22/23Willie Albino33
Windowing Function Syntax Comments
CURRENT_ROW– As a start point:
If ROWS was specified, makes the current row the start of the window.
If RANGE was specified, then the current value is the start of the window
– As an end point: If ROWS was specified, makes the current row the end of
the window. If RANGE was specified, then the current value is the end
of the window
04/22/23Willie Albino34
Windowing Function Syntax Comments
<exp> FOLLOWING– If this is the start point, then the end point must be <exp>
FOLLOWING or UNBOUNDED FOLLOWING
<exp> PRECEDING– If this is the end point, then the start point must be <exp>
PRECEDING or UNBOUNDED PRECEDING
This applies whether ROWS or RANGE was specified
04/22/23Willie Albino35
Example of a Partition (Sub-Grouping) Based Moving Window
QUERY: Calculate a running total of the amount of sales by region
select region, amount, SUM(amount) OVER (PARTITION BY region ORDER BY amount ROWS UNBOUNDED PRECEDING) as mov_amt_sum from product_sales,
product, region where prod_id=product_id and region_id=reg_id
REGION AMOUNT MOV_AMT_SUM------------ ------------ -----------Central 85 85Central 98 183Central 104 287Central 111 398Central 123 521Central 213 734Central 242 976East 20 20East 60 80East 75 155East 75 230East 75 305East 90 395East 100 495East 100 595East 130 725
04/22/23Willie Albino36
Example of Date Based Moving Window Summaries
select cust_id, trans_dt, amt, sum(amt) over (partition by cust_id
order by trans_dt range interval '1' month
preceding) sum_1_mnth, sum(amt) over (partition by cust_id
order by trans_dt range between interval '1' month
preceding and interval '1' month following)
sum_2_mnth, sum(amt) over (partition by cust_id
order by trans_dt range between interval '7' DAY
preceding and interval '7' DAY following)
sum_wkfrom cust_daily_summaryorder by cust_id, trans_dt
cust_id Trans_dt Amt Sum_1_mth Sum_2_mth Sum_wk------- ------- ------ ------ --------- ------1643060 3/15/00 -3.08 -3.08 118.78 -3.081643060 4/15/00 121.86 118.78 103.27 121.861643060 5/11/00 -6.80 115.06 101.85 -15.511643060 5/12/00 0.25 115.31 100.88 -15.511643060 5/13/00 -0.85 114.46 96.63 -15.511643060 5/14/00 -1.68 112.78 92.06 -15.511643060 5/15/00 -6.43 106.35 88.95 -15.511643060 6/11/00 -4.50 -20.01 -36.91 -17.401643060 6/12/00 -0.97 -14.18 -32.51 -17.401643060 6/13/00 -4.25 -18.68 -35.76 -17.401643060 6/14/00 -4.57 -22.40 -35.42 -17.401643060 6/15/00 -3.11 -23.83 -34.63 -17.401643060 7/11/00 -4.00 -21.40 -28.95 -10.801643060 7/12/00 -2.40 -19.30 -27.57 -10.801643060 7/13/00 -3.00 -21.33 -27.73 -10.801643060 7/14/00 -0.51 -17.59 -24.83 -10.801643060 7/15/00 -0.89 -13.91 -23.82 -10.80
Oracle8i Analytical SQL Features
Reporting Functions
04/22/23Willie Albino38
Reporting Functions
Allow for the calculation of aggregate values within a data partition
Return the same aggregate value for every row in a partition
Syntax: – {SUM | AVG | MAX | MIN | COUNT | STDDEV | VARIANCE} ([ALL | DISTINCT] {<value expression1> | *}) OVER ([PARTITION BY <value expression2>[,...]])
04/22/23Willie Albino39
Example of Using Reporting Functions
Query: Find the region where each product was best seller
SELECT product, region, sum_amtFROM (SELECT product, region, SUM(amount) AS sum_amt, MAX(SUM(amount)) OVER (PARTITION BY product) AS max_sum_amt FROM product_sales, region, product WHERE region_id=reg_id AND product_id=prod_id GROUP BY product, region)WHERE sum_amt = max_sum_amt
REGION REGION SUM_AMT------------ ------------ ---------BELTS Central 85HATS Central 111JACKETS East 100JEANS West 50PANTS Central 123SHIRTS Central 213SHOES East 130SOCKS West 78SUITS Central 98SWEATERS Central 242T-SHIRTS West 89TIES Central 104
04/22/23Willie Albino40
New Reporting Functions
RATIO_TO_REPORT(exp)– Computes the ratio of a value to the sum of a set of values
LEAD() and LAG()– Useful for comparing values in different time period– Allows access to more than one row in a table without a self-join– LAG() provides access to a prior row (at a given offset)– LEAD() provides access to a row after the current position– These functions are position, not value based
04/22/23Willie Albino41
New Reporting Function Syntax
Syntax– RATIO_TO_REPORT(<exp1>) OVER ([PARTITION BY <exp2> [,…]])– LEAD | LAG (<exp1> [,<offset> [, <default>]]) OVER ([PARTITION BY <exp2> [,…]]) ORDER BY <exp3> [ASC|DESC] [NULLS FIRST | NULLS LAST] [,…])
<offset> is optional and defaults to 1 <default> is optional and is the value returned if the <offset> falls outside the bounds of the dataset
04/22/23Willie Albino42
Example of RATIO_TO_REPORT()
Query: Find ratio of total sales per product to total sales
SELECT product, SUM(amount) AS sum_amt, SUM(SUM(amount)) OVER() AS total_amt, RATIO_TO_REPORT(SUM(amount)) OVER () AS ratio FROM product_sales, product, regionWHERE prod_id = product_id AND region='East'GROUP BY product
PRODUCT SUM_AMT TOTAL_AMT RATIO-------- ------- ---------- -------Belts 150 2443 .061Hats 186 2443 .076Jackets 199 2443 .081Jeans 50 2443 .020Pants 268 2443 .110Shirts 363 2443 .149Shoes 230 2443 .094Socks 78 2443 .032Suits 188 2443 .077Sweaters 392 2443 .160T-Shirts 109 2443 .045Ties 230 2443 .094
04/22/23Willie Albino43
Example of LAG Function
Query: Compare cust’s presentamount to the amount 2 days ago
SELECT cust_id, acct_date, sum(amt_usd) amount, LAG(SUM(amt_usd),2,-999) OVER (PARTITION BY cust_id ORDER BY acct_trans_date ) AS old_amt FROM acctWHERE acct_date > '01-NOV-00'GROUP BY cust_id, acct_date
cust_ID ACCT_DATE AMOUNT OLD_AMT
-------- ---------- ------- -------1643060 11/11/2000 147.47 -9991643060 11/12/2000 -5.5 -9991643060 11/13/2000 -2.27 147.471643060 11/14/2000 -11.72 -5.51643060 11/15/2000 -4.25 -2.271643060 12/11/2000 -7.15 -11.721643060 12/12/2000 -2.25 -4.251643060 12/13/2000 -2.05 -7.151643060 12/14/2000 -15.13 -2.251643060 12/15/2000 -0.71 -2.051659880 11/11/2000 169.17 -9991659880 11/12/2000 -14.25 -9991659880 11/13/2000 -12.5 169.17
Oracle8i Analytical SQL Features
CASE Expressions
04/22/23Willie Albino45
CASE Expressions
Used for bucketing data– allows for differently sized buckets
Very similar to DECODE statement– Provides more flexibility and logical power– Offers better performance and is easier to read
Syntax:– CASE WHEN <cond1> THEN <v1> WHEN <cond2> THEN <v2> … [ ELSE <vn> ] END
04/22/23Willie Albino46
Example #1 of Using CASE Expressions
SELECT SUM(CASE WHEN SUM(amount) BETWEEN 0 AND 50 THEN 1 ELSE 0 END) AS "0-50", SUM(CASE WHEN SUM(amount) BETWEEN 51 AND 150 THEN 1 ELSE 0 END) AS "51-150", SUM(CASE WHEN SUM(amount) BETWEEN 151
AND 250 THEN 1 ELSE 0 END) AS "151-250", SUM(CASE WHEN SUM(amount) > 251 THEN 1 ELSE 0 END) "251+" FROM product_sales, product, regionWHERE prod_id = product_id AND region='East'GROUP BY product
0-50 51-150 151-250 251+---- ------ ------- ---- 1 3 5 5
04/22/23Willie Albino47
Example #2 of Using CASE Expressions
SELECT CASE WHEN amount BETWEEN 0 AND 50 THEN ' 0-50' WHEN amount BETWEEN 51 AND 150 THEN ' 51-150' WHEN amount BETWEEN 151 AND 250 THEN '151-250' WHEN amount > 250 THEN '251+' END bucket, COUNT(*) cnt, SUM(amount) amt FROM product_sales, product, regionWHERE prod_id = product_id AND region='East' GROUP BY CASE WHEN amount BETWEEN 0 AND 50 THEN ’ 0-50' WHEN amount BETWEEN 51 AND 150 THEN ’ 51-150' WHEN amount BETWEEN 151 AND 250 THEN '151-250' WHEN amount > 250 THEN '251+' END ORDER BY bucket
BUCKET CNT AMT------ --- ----0-50 3 11551-150 21 1873151-250 2 455251+ 3 176
04/22/23Willie Albino48
Summary
New analytical functionality in Oracle 8.1.6 (+)– Makes it easier to code certain types of SQL– Allows for more efficient SQL code when compared to the
equivalent pure SQL implementation
Enhancements include– SQL GROUP BY clause has been augmented to make querying
and reporting easier – Analytical functions enabling rankings, moving window calculations,
lead/lag analysis – Better ‘if-then-else’ capabilities provided through CASE