OLAP Functions - Part2

Date post: 07-Apr-2015
Page 1: OLAP Functions - Part2

OLAP Functions Part 2

Patrice Bérubé

Solution Architect

Teradata Canada

Page 2: OLAP Functions - Part2

OLAP Analytics - Agenda

• History & Recap

• RESET WHEN

• New OLAP

• OLAP in transforms

• Summary

Page 3: OLAP Functions - Part2

History - V2R3 and V2R6 OLAP

Page 4: OLAP Functions - Part2

History – V2R12

[Slide image: list of OLAP functions by release; several entries are marked (V2R6).]

Page 5: OLAP Functions - Part2

History - Clauses

• V2R13: PARTITION BY, ORDER BY, RESET WHEN, ROWS

• V2R6: PARTITION BY, ORDER BY, ROWS

• V2R3: None

Page 6: OLAP Functions - Part2

Traditional SQL Requests vs. Ordered Analytical Functions

[Diagram labels: Calculation, Aggregation, Ordered Analytical Functions]

Page 7: OLAP Functions - Part2

Ordered Analytical Functions - Permutations: Four Categories

Group Window, Cumulative Window, Moving Window, Remaining Window

Aggregates: SUM() OVER, COUNT() OVER, AVG() OVER, MIN() OVER, MAX() OVER (each aggregate can be combined with each of the four window types)

Group Window Function

• Use of keywords: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

• Absence of keywords: omitting the ROWS clause also gives ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Remaining Window Function

• Use of keywords: ROWS BETWEEN CURRENT ROW (or # PRECEDING / # FOLLOWING) AND UNBOUNDED FOLLOWING

• Absence of keywords: UNBOUNDED PRECEDING

Moving Window Function

• Use of keywords: ROWS BETWEEN # PRECEDING AND # FOLLOWING

• Absence of keywords: UNBOUNDED

Cumulative Window Function

• Use of keywords: ROWS UNBOUNDED PRECEDING, or ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (or # PRECEDING / # FOLLOWING)

• Absence of keywords: UNBOUNDED FOLLOWING
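The four categories differ only in where the frame starts and ends relative to the current row. The Python sketch below illustrates this and is not Teradata's implementation. It evaluates SUM() OVER a ROWS frame for one partition whose rows are already in ORDER BY sequence. The helper name `window_sums` and its offset convention are assumptions made for the example: negative means PRECEDING, positive means FOLLOWING, 0 means CURRENT ROW, and None means UNBOUNDED.

```python
from typing import List, Optional


def window_sums(values: List[float], start: Optional[int], end: Optional[int]) -> List[Optional[float]]:
    """Hypothetical helper: SUM() over a ROWS frame for every row of one ordered partition.

    start/end are offsets from the current row: negative = PRECEDING,
    positive = FOLLOWING, 0 = CURRENT ROW, None = UNBOUNDED.
    """
    n = len(values)
    result: List[Optional[float]] = []
    for i in range(n):
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        # An empty frame yields NULL (None), as SUM does in SQL.
        result.append(sum(values[lo:hi + 1]) if lo <= hi else None)
    return result


sales = [10, 20, 30, 40]  # hypothetical sample data
print("group     ", window_sums(sales, None, None))  # UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
print("cumulative", window_sums(sales, None, 0))     # UNBOUNDED PRECEDING AND CURRENT ROW
print("moving    ", window_sums(sales, -1, 1))       # 1 PRECEDING AND 1 FOLLOWING
print("remaining ", window_sums(sales, 0, None))     # CURRENT ROW AND UNBOUNDED FOLLOWING
```

With `start=-1, end=-1` (1 PRECEDING AND 1 PRECEDING), the same helper returns the previous row's value. The RESET WHEN examples rely on that trick to compare each row with the one before it.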

Page 8: OLAP Functions - Part2

OLAP Analytics - Agenda

• History & Recap

• RESET WHEN

• New OLAP

• OLAP in transforms

• Summary

Page 9: OLAP Functions - Part2

RESET WHEN - Rules

A RESET WHEN condition can contain the following:

• Ordered analytical functions that do not include the RESET WHEN clause

• Scalar subqueries

• Aggregate operators

• DEFAULT functions

A RESET WHEN condition cannot contain the following:

• Ordered analytical functions that include the RESET WHEN clause

• SELECT statement

• LOB columns

• UDT expressions, including UDFs that return a UDT value

However, a RESET WHEN condition can include an expression that contains UDTs as long as that expression returns a result that has a predefined data type.

Page 10: OLAP Functions - Part2

RESET WHEN (1 of 7)

Finds cumulative sales for all periods of increasing sales for each region

SUM(sales) OVER (PARTITION BY region
                 ORDER BY day_of_calendar
                 RESET WHEN sales < SUM(sales) OVER (PARTITION BY region
                                                      ORDER BY day_of_calendar
                                                      ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                 ROWS UNBOUNDED PRECEDING)

Callouts: current row sales (sales); preceding row sales (the inner SUM over ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
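RESET WHEN splits a window partition into dynamic partitions. Whenever the condition is true for a row, a new running total starts at that row. The following Python sketch emulates the expression above for a single region whose rows are already ordered by day_of_calendar. The function name and the sample figures are hypothetical.

```python
from typing import List


def reset_cumulative_sum(sales: List[float]) -> List[float]:
    """Hypothetical emulation, for one region ordered by day_of_calendar, of
    SUM(sales) OVER (... RESET WHEN sales < <previous row's sales> ROWS UNBOUNDED PRECEDING)
    """
    result = []
    running = 0
    for i, amount in enumerate(sales):
        # The previous row's sales (1 PRECEDING AND 1 PRECEDING) is NULL on the first row,
        # so the condition is unknown there and no reset happens.
        if i > 0 and amount < sales[i - 1]:
            running = 0  # condition true: a new dynamic partition starts at this row
        running += amount
        result.append(running)
    return result


print(reset_cumulative_sum([100, 120, 90, 95, 130, 60]))  # hypothetical daily sales
```

Each drop in sales (120 to 90, 130 to 60) starts a new cumulative total at the row where the drop occurs.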

Page 11: OLAP Functions - Part2

RESET WHEN (2 of 7)

Finds sequences of increasing balances

Reset whenever the current balance is less than or equal to the preceding balance

Page 12: OLAP Functions - Part2

RESET WHEN (3 of 7)

Finds sequences of increasing balances

Reset whenever the current balance is less than or equal to the preceding balance

SELECT account_key, month, balance,
       ROW_NUMBER() OVER (PARTITION BY account_key
                          ORDER BY month
                          RESET WHEN balance <= SUM(balance) OVER (PARTITION BY account_key
                                                                   ORDER BY month
                                                                   ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                         ) - 1 /* to get the count started at 0 */
       AS balance_increase
FROM accounts;

Callouts: preceding row balance; current row balance
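The same idea applies to ROW_NUMBER(). The numbering restarts whenever the balance fails to increase, and subtracting 1 turns the row number into a count of consecutive increases. The Python sketch below illustrates this for one account whose rows are already ordered by month. The helper name and balances are hypothetical.

```python
from typing import List


def balance_increase(balances: List[float]) -> List[int]:
    """Hypothetical emulation, for one account ordered by month, of
    ROW_NUMBER() OVER (... RESET WHEN balance <= <previous month's balance>) - 1
    """
    result = []
    row_number = 0
    for i, balance in enumerate(balances):
        if i > 0 and balance <= balances[i - 1]:
            row_number = 0  # balance did not increase: restart the numbering
        row_number += 1
        result.append(row_number - 1)  # "- 1" makes the count start at 0
    return result


print(balance_increase([60, 99, 94, 90, 80, 88, 90, 120]))  # hypothetical monthly balances
```

Each value is the number of consecutive month-over-month increases ending at that month.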

Page 13: OLAP Functions - Part2

RESET WHEN (4 of 7)

Finds sequences of increasing balances by quarter

Months must first be rolled up to quarters; the condition is then checked on the quarterly totals

Page 14: OLAP Functions - Part2

RESET WHEN (5 of 7)

Finds sequences of increasing balances by quarter; reset whenever the current balance is less than or equal to the preceding balance

SELECT account_key, quarter, sum(balance),
       ROW_NUMBER() OVER (PARTITION BY account_key
                          ORDER BY quarter
                          RESET WHEN sum(balance) <= SUM(sum(balance)) OVER (PARTITION BY account_key
                                                                             ORDER BY quarter
                                                                             ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
                         ) - 1 /* to get the count started at 0 */
       AS balance_increase
FROM accounts
GROUP BY account_key, quarter;

Callouts: preceding row balance; current row balance

Page 15: OLAP Functions - Part2

RESET WHEN (6 of 7)

Finds sequences of consecutive balances below the credit limit

Accounts Data

Account_Limit Data

Results Data

Page 16: OLAP Functions - Part2

RESET WHEN (7 of 7)

Finds sequences of consecutive balances below the credit limit

select a.account_No, a.balance,
       (select Credit_limit from Account_Limit L where l.account_No = a.account_No) Credit_Limit,
       ROW_NUMBER() OVER (partition by a.account_No
                          order by statement_date
                          RESET WHEN balance > Credit_Limit)
from accounts A;

Callout: scalar select on the credit limit
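Here the reset condition compares each balance with a per-account value. In the query, that value comes from the scalar subquery on Account_Limit. The Python sketch below imitates this with a dictionary lookup standing in for the subquery. The function name, account numbers, balances and limits are all hypothetical.

```python
from typing import Dict, List, Tuple


def consecutive_below_limit(rows: List[Tuple[str, float]], credit_limits: Dict[str, float]) -> List[int]:
    """Hypothetical helper. rows: (account_no, balance) pairs sorted by account_no, then statement_date.

    Emulates ROW_NUMBER() OVER (PARTITION BY account_no ORDER BY statement_date
    RESET WHEN balance > credit_limit).
    """
    result = []
    previous_account = None
    row_number = 0
    for account, balance in rows:
        credit_limit = credit_limits[account]  # stands in for the scalar subquery on Account_Limit
        if account != previous_account or balance > credit_limit:
            row_number = 0  # new PARTITION BY group, or the RESET WHEN condition is true
        row_number += 1
        result.append(row_number)
        previous_account = account
    return result


rows = [("A", 100), ("A", 200), ("A", 600), ("A", 300), ("B", 50), ("B", 900)]
print(consecutive_below_limit(rows, {"A": 500, "B": 800}))
```

As in the query, the statement that exceeds the limit is itself numbered 1, because the new dynamic partition starts on the row where the condition is true.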

Page 17: OLAP Functions - Part2

OLAP Analytics - Agenda

• History & Recap

• RESET WHEN

• New OLAP

• OLAP in transforms

• Summary

Page 18: OLAP Functions - Part2

Updated function list, including the new OLAP functions

Page 19: OLAP Functions - Part2

Statistical OLAP #1 (1 of 2)

What is the average Plan Price without discount, compared to the normal average?

Sales Data, Discount Data, Plan Data

Page 20: OLAP Functions - Part2

Statistical OLAP #1 (2 of 2)

Select
  avg(Discount_Amount) OVER (partition by extract(year from Trans_Date)),
  avg(Plan_Price) OVER (partition by extract(year from Trans_Date)),
  regr_avgx(Plan_Price, Discount_Amount) OVER (partition by extract(year from Trans_Date)),
  regr_avgy(Plan_Price, Discount_Amount) OVER (partition by extract(year from Trans_Date)),
  regr_count(Discount_Amount, Plan_Price) OVER (partition by extract(year from Trans_Date))
from Sales_Trans t
inner join Plan_Price p
  on t.Plan_No = p.Plan_No
inner join Discount d
  on t.Discount_No = d.Discount_No

Regressions: REGR_AVGX, REGR_AVGY, REGR_COUNT
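AVG and the REGR_AVG functions differ in which rows they use. REGR_AVGX(Y, X), REGR_AVGY(Y, X) and REGR_COUNT(Y, X) only consider rows where both arguments are non-NULL, while AVG uses every non-NULL value of its single argument. The Python sketch below shows that rule, with NULL represented as None. The helper name and the prices and discounts are made up for illustration.

```python
from typing import List, Optional, Tuple


def regr_avg_stats(y: List[Optional[float]], x: List[Optional[float]]) -> Tuple[Optional[float], Optional[float], int]:
    """Hypothetical helper returning (REGR_AVGX(y, x), REGR_AVGY(y, x), REGR_COUNT(y, x)) for one partition.

    Only rows where BOTH y and x are non-NULL take part, unlike plain AVG.
    """
    pairs = [(yi, xi) for yi, xi in zip(y, x) if yi is not None and xi is not None]
    if not pairs:
        return None, None, 0  # the averages are NULL and the count is 0
    n = len(pairs)
    avg_x = sum(xi for _, xi in pairs) / n
    avg_y = sum(yi for yi, _ in pairs) / n
    return avg_x, avg_y, n


plan_price = [30.0, 40.0, 50.0, 60.0]     # hypothetical plan prices
discount_amount = [5.0, None, 10.0, None]  # hypothetical discounts; None = no discount
print(sum(plan_price) / len(plan_price))             # plain AVG(Plan_Price)
print(regr_avg_stats(plan_price, discount_amount))  # (avg discount, avg price where discounted, pair count)
```

In this sample the plain average price is 45.0, but REGR_AVGY only sees the two discounted sales and returns 40.0.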

Page 21: OLAP Functions - Part2

Statistical OLAP #2 (1 of 2)

• How to identify the impact of a change? In other words: is there a relationship between the number of sales reps and revenue?

• About correlation: correlation is a measure of association between two variables.

• The value of a correlation coefficient can vary from minus one to plus one.

• Negative correlation between two variables: as the value of one variable increases, the value of the other decreases, and vice versa. In other words, the variables work opposite each other.

• Zero means there is no linear relationship between the two variables.

• Positive correlation between two variables: as the value of one variable increases, the value of the other also increases. The variables move together.

Page 22: OLAP Functions - Part2

Statistical OLAP #2 (2 of 2)

In which market should new sales reps be added?

Sales Data, Correlation

select Region_no, CORR(Sales_Rep_No, Nb_Sales)
OVER (partition by Region_no)
from Sales_rep
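CORR computes the Pearson correlation coefficient, separately for each Region_no partition in the query above. The Python sketch below performs the same calculation per region on hypothetical (region, sales reps, sales) rows and assumes there are no NULLs. The query repeats each region's coefficient on every row; the sketch returns one value per region instead.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def corr_by_region(rows: List[Tuple[str, float, float]]) -> Dict[str, Optional[float]]:
    """Hypothetical helper: Pearson correlation of (x, y) per region.

    Assumes no NULLs; returns None when x or y has no variance (CORR gives NULL then).
    """
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for region, x, y in rows:
        groups[region].append((x, y))
    result: Dict[str, Optional[float]] = {}
    for region, pairs in groups.items():
        n = len(pairs)
        mean_x = sum(x for x, _ in pairs) / n
        mean_y = sum(y for _, y in pairs) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
        sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
        syy = sum((y - mean_y) ** 2 for _, y in pairs)
        result[region] = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else None
    return result


# Hypothetical rows: (region, number of sales reps, number of sales)
rows = [("East", 2, 20), ("East", 3, 30), ("East", 4, 40),
        ("West", 2, 40), ("West", 3, 30), ("West", 4, 20)]
print(corr_by_region(rows))
```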

Page 23: OLAP Functions - Part2

Statistical OLAP #3 (1 of 3)

• How to derive basic predictions from existing data? In other words: can I leverage past data to predict future results?

• Regression: simple regression examines the relationship between one dependent and one independent variable.

• The regression statistics can be used to predict the dependent variable when the independent variable is known.

• Regression goes beyond correlation by adding prediction capabilities.

• Regression analysis usually requires three OLAP functions:

• Y = intercept + (slope * X), i.e. Y = REGR_INTERCEPT(Y,X) + (REGR_SLOPE(Y,X) * X)

• REGR_R2 returns the coefficient of determination: the proportion of the variance in Y explained by the formula.

Adv_Actual Data, Adv_Budget Data
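The three functions follow the usual least-squares formulas. The Python sketch below computes the same three quantities for one partition and then applies Y = intercept + (slope * X). The budget and sales figures are invented, and NULL handling is left out.

```python
from typing import List, Optional, Tuple


def simple_regression(y: List[float], x: List[float]) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Hypothetical helper: least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r2), the quantities REGR_SLOPE(y, x),
    REGR_INTERCEPT(y, x) and REGR_R2(y, x) describe.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        return None, None, None  # X has no variance: the SQL functions return NULL
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


adv_budget = [1000.0, 2000.0, 3000.0, 4000.0]   # hypothetical X values
sales = [12000.0, 19000.0, 31000.0, 38000.0]    # hypothetical Y values
slope, intercept, r2 = simple_regression(sales, adv_budget)
print(slope, intercept, round(r2, 4))
print("predicted sales for a 5000 budget:", intercept + slope * 5000.0)
```

Here the fit gives slope 9.0 and intercept 2500.0, so a budget of 5000 predicts sales of 47500.0. R² is about 0.988, meaning the line explains most of the variance in this made-up data.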

Page 24: OLAP Functions - Part2

Statistical OLAP #3 (2 of 3)

Can I leverage past data to predict future results?

Results Data

Regression results

[Chart: "Sales Forecast", monthly Adv_Budget and Sales, Sep-09 to Aug-10; series: Actual, Predicted]

Page 25: OLAP Functions - Part2

Statistical OLAP #3 (3 of 3)

Can I leverage past data to predict future results?

select Media, Month_Start_Date, Adv_Budget, Sales, 'Real'
from Adv_Actual
union all
select f.Media, f.Month_Start_Date, f.Adv_Budget,
       Intercept + (slope * f.Adv_Budget) predicted, 'Esti'
from Adv_Budget f
inner join
  (select Media, Month_Start_Date,
          Regr_Intercept(sales, Adv_Budget) OVER (partition by Media order by Month_Start_Date) Intercept,
          Regr_Slope(sales, Adv_Budget) OVER (partition by Media order by Month_Start_Date) Slope
   from Adv_Actual) as h(Media, Month_Start_Date, Intercept, Slope)
  on f.Media = h.Media
 and f.Month_Start_Date = add_months(h.Month_Start_Date, 6)
order by 1, 2;

Page 26: OLAP Functions - Part2

Statistical OLAP #4 (1 of 3)

• How to identify a strong change? In other words: who spent with significant variance over their prior 12-month history?

• Statistical Process Control (SPC) can identify outliers.

• Standard deviation (sigma, σ) is a measure of dispersion: the square root of the variance.

• For normally distributed data, the share of observations within +/- k sigma of the mean is:

• within +/- 1 sigma = 68.27%
• within +/- 2 sigma = 95.45%
• within +/- 3 sigma = 99.7%
• within +/- 4 sigma = 99.99%
• within +/- 5 sigma = 99.9999%
• within +/- 6 sigma = 99.9999998%

• Values beyond +/- 5 sigma are treated as outliers by definition (a very high threshold).

• The Upper Control Limit (UCL) and Lower Control Limit (LCL) can be set manually or derived from sigma.
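The percentages in this list come from the normal distribution: the share of values within +/- k sigma of the mean equals erf(k / √2). The Python check below reproduces them with the standard library. It assumes normally distributed data, which real billing amounts may not follow.

```python
import math


def share_within(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in range(1, 7):
    print(f"within +/- {k} sigma = {share_within(k):.10%}")
```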

Page 27: OLAP Functions - Part2

Statistical OLAP #4 (2 of 3)

• Who spent an alarmingly high amount that should be proactively investigated?

Client A - GREEN

• Very consistent spending, no significant variance observed

Client B - RED

• This client had a history of $61 monthly spending

• In Nov 2008 the client spent $27,849, clearly exceeding the UCL

• SPC identifies the outlier

[Chart: Client A, monthly tot_chrg_amt with UCL and LCL, Apr-08 to May-09]

Month    Bill date    tot_chrg_amt   UCL       LCL
Apr-08   4/20/2008    $71.03         $-        $-
May-08   5/20/2008    $68.58         $71.03    $71.03
Jun-08   6/20/2008    $89.12         $75.93    $63.68
Jul-08   7/20/2008    $65.81         $122.04   $30.44
Aug-08   8/20/2008    $65.11         $119.28   $27.99
Sep-08   9/20/2008    $64.75         $116.17   $27.69
Oct-08   10/20/2008   $62.48         $113.28   $28.19
Nov-08   11/20/2008   $65.98         $111.51   $27.60
Dec-08   12/20/2008   $65.89         $108.79   $29.42
Jan-09   1/20/2009    $64.45         $106.51   $30.99
Feb-09   2/20/2009    $64.81         $104.72   $31.92
Mar-09   3/20/2009    $65.58         $103.07   $32.93
Apr-09   4/20/2009    $64.50         $101.54   $34.06
May-09   5/20/2009    $63.76         $100.90   $33.61

[Chart: Client B, monthly tot_chrg_amt with UCL and LCL, Apr-08 to May-09]

Month    Bill date    tot_chrg_amt   UCL          LCL
Apr-08   4/11/2008    $61.15         $-           $-
May-08   5/11/2008    $61.26         $61.15       $61.15
Jun-08   6/11/2008    $61.06         $61.48       $60.93
Jul-08   7/11/2008    $61.06         $61.57       $60.75
Aug-08   8/11/2008    $61.06         $61.54       $60.72
Sep-08   9/11/2008    $61.06         $61.51       $60.72
Oct-08   10/11/2008   $61.06         $61.49       $60.73
Nov-08   11/11/2008   $27,849.99     $61.46       $60.74
Dec-08   12/11/2008   $61.06         $49,486.27   $(42,416.84)
Jan-09   1/11/2009    $61.06         $46,814.78   $(40,517.28)
Feb-09   2/11/2009    $61.06         $44,523.33   $(38,843.3…)
Mar-09   3/11/2009    $61.06         $42,531.09   $(37,356.38)
Apr-09   4/11/2009    $61.06         $40,779.06   $(36,025.41)
May-09   5/11/2009    $61.40         $40,779.07   $(36,025.43)

Page 28: OLAP Functions - Part2

Statistical OLAP #4 (3 of 3)

select
     bss.accs_id
    ,bss.bl_dt
    ,bss.tot_chrg_amt
    ,AVG(tot_chrg_amt) OVER
        (PARTITION BY bss.accs_id
         order by bss.bl_dt
         ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
        ) as AVG_12MO_TOT_CHRG_AMT
    ,count(*) OVER
        (PARTITION BY bss.accs_id
         order by bss.bl_dt
         ROWS unbounded PRECEDING
        ) as BILL_CNT
    ,STDDEV_POP(tot_chrg_amt) OVER
        (PARTITION BY bss.accs_id
         order by bss.bl_dt
         ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
        ) as STDDEV_12MO_TOT_CHRG_AMT
    ,(STDDEV_12MO_TOT_CHRG_AMT * 5) as SIGMA
    ,(AVG_12MO_TOT_CHRG_AMT + SIGMA) AS UCL
    ,(AVG_12MO_TOT_CHRG_AMT - SIGMA) AS LCL
    ,case
        when BILL_CNT > 4 and ( bss.tot_chrg_amt > UCL or bss.tot_chrg_amt < LCL )
        then 'Y'
        else ''
     end as INV_PROB_IND
from bl_stmnt_sys bss
where bss.bl_confirm_ind = 'Y'
qualify INV_PROB_IND = 'Y'  /* OLAP results are filtered in QUALIFY; WHERE is evaluated before them */
order by accs_id, bl_dt

• Who spent an alarmingly high amount?

• The standard deviation OLAP function easily calculates a rolling 12-month statistic

• Ignores the first 4 observations (BILL_CNT > 4) to allow the system to “calibrate”
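The query's logic for a single account is reproduced in the Python sketch below. The control limits use AVG and STDDEV_POP over the 12 bills before the current one, and a bill is flagged only after the first 4 bills. The function name and its default parameters are assumptions that mirror the constants in the SQL. The sample amounts are Client B's monthly charges from the previous slide.

```python
import statistics
from typing import List, Optional, Tuple


def spc_flags(amounts: List[float], window: int = 12, k: float = 5.0,
              min_bills: int = 4) -> List[Tuple[Optional[float], Optional[float], str]]:
    """Hypothetical helper: for one account's bills, ordered by bill date, return (UCL, LCL, INV_PROB_IND).

    Mirrors AVG and STDDEV_POP over ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING,
    a running COUNT(*) for BILL_CNT, and the BILL_CNT > 4 calibration rule.
    """
    result = []
    for i, amount in enumerate(amounts):
        history = amounts[max(0, i - window):i]  # up to 12 earlier bills; the current bill is excluded
        bill_cnt = i + 1                          # COUNT(*) ... ROWS UNBOUNDED PRECEDING
        if not history:
            result.append((None, None, ""))       # empty window: UCL and LCL are NULL
            continue
        avg = statistics.fmean(history)
        sigma = k * statistics.pstdev(history)    # population standard deviation, like STDDEV_POP
        ucl, lcl = avg + sigma, avg - sigma
        flag = "Y" if bill_cnt > min_bills and (amount > ucl or amount < lcl) else ""
        result.append((ucl, lcl, flag))
    return result


# Client B's monthly charges, Apr-08 to May-09, from the previous slide.
client_b = [61.15, 61.26, 61.06, 61.06, 61.06, 61.06, 61.06, 27849.99,
            61.06, 61.06, 61.06, 61.06, 61.06, 61.40]
for ucl, lcl, flag in spc_flags(client_b):
    print(ucl, lcl, flag)
```

On this data only Nov-08 is flagged. The calibration rule matters for Client A: its Jun-08 charge ($89.12) is above that month's UCL ($75.93), but it is only the third bill, so no flag is raised.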

Page 29: OLAP Functions - Part2

OLAP Analytics - Agenda

• History & Recap

• RESET WHEN

• New OLAP

• OLAP in transforms

• Summary

Page 30: OLAP Functions - Part2

Transform #1 (1 of 2)

Prorate weekly units sold over the working days.

Allocate the uneven remainder to the last working day.

Weekly Data, Prorated Daily Data

Page 31: OLAP Functions - Part2

Transform #1 (2 of 2)

case when bd.calendar_dt = bd.last_day
     then o.unit_wk - (cast((o.unit_wk/nb_days) as integer) * (nb_days-1))
     else cast((o.unit_wk/nb_days) as integer)
end as NRC_QTY_,
...
(select TR.SRC_CO_CD, TR.SO_REGION_CD, extract(year from B.calendar_dt) as yr, W.SMB_RPTG_WK,
        B.calendar_dt,
        cast(COUNT(B.calendar_dt) OVER (partition by TR.SRC_CO_CD, TR.SO_REGION_CD, YR, W.SMB_RPTG_WK
                                        rows between unbounded preceding and unbounded following) AS INTEGER),
        MAX(B.calendar_dt) OVER (partition by TR.SRC_CO_CD, TR.SO_REGION_CD, YR, W.SMB_RPTG_WK
                                 rows between unbounded preceding and unbounded following)
 from TSO_RGN TR, TSO_RGN_PROV TRP, TBUSDAY B, TSMB_RPTG_WK W
 where B.bus_day_flg = 'Y'
   AND TR.SRC_CO_CD = TRP.SRC_CO_CD AND TR.SO_REGION_CD = TRP.SO_REGION_CD
   AND TRP.PROV_STATE_CD = B.PROV_STATE_CD AND B.CALENDAR_DT = W.CALENDAR_DT
 group by 1,2,3,4,5)
as BD(SRC_CO_CD, SO_REGION_CD, THE_YR, SMB_RPTG_WK, calendar_dt, nb_days, last_day)

The OLAP functions provide the number of working days and the last working day of each week.

Callouts: working day; last working day
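The CASE expression gives every working day the truncated share of the weekly units, and the last working day absorbs whatever is left. The Python sketch below illustrates that allocation for one hypothetical week. The function name and sample dates are made up, and `//` matches the SQL's truncating cast only for non-negative unit counts.

```python
from datetime import date
from typing import Dict, List


def prorate_week(unit_wk: int, working_days: List[date]) -> Dict[date, int]:
    """Hypothetical helper: spread one week's units over its working days; the last working day takes the remainder."""
    nb_days = len(working_days)      # role of COUNT(calendar_dt) OVER (week partition)
    last_day = max(working_days)     # role of MAX(calendar_dt) OVER (week partition)
    base = unit_wk // nb_days        # matches CAST(unit_wk / nb_days AS INTEGER) for non-negative units
    return {
        day: (unit_wk - base * (nb_days - 1)) if day == last_day else base
        for day in working_days
    }


# A hypothetical week with four working days and 23 units sold.
week = [date(2010, 3, 2), date(2010, 3, 3), date(2010, 3, 4), date(2010, 3, 5)]
print(prorate_week(23, week))
```

Each of the first three days receives 5 units and the last working day receives 8, so the daily figures still add up to the weekly total of 23.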

Page 32: OLAP Functions - Part2

Transform #2 (1 of 3)

Use OLAP to identify when a client became FIRST TIME PAYING.

Timeline

Client #1 subscriptions
• Obtain a Guest Pass
• Guest Pass expired
----- client inactive ------
• Obtain a Free contract
• Free contract expired
----- client inactive ------
• Obtain a Guest Pass
• Guest Pass expired
----- client inactive ------
• Purchase a Paid contract
• Paid contract expired
----- client inactive ------
• Obtain a Guest Pass

Client #2 subscriptions
• Purchase a Paid contract
• Paid contract expired
----- client inactive ------
• Obtain a Guest Pass

Page 33: OLAP Functions - Part2

Transform #2 (2 of 3)

FROM (Select Sub_Duration, User_Account_Id, SUBS_STATUS_CD, Revenue_Type_CD
        , count(case when Revenue_Type_CD = 'paid'
                      and SUBS_STATUS_CD = 'active'
                     then 1
                     else null
                end) OVER (PARTITION BY User_Account_Id ORDER BY From_Day ROWS UNBOUNDED PRECEDING)
          as Nb_Paying_Contract
        , count(case when Revenue_Type_CD in ('free', 'free w/packaged good', 'free promotion')
                      and SUBS_STATUS_CD = 'active'
                     then 1
                     else null
                end) OVER (PARTITION BY User_Account_Id ORDER BY From_Day ROWS UNBOUNDED PRECEDING)
          as Nb_Guess_Pass
      FROM Client_subs_all_JA
      WHERE THRU_Day >= FROM_DAY

group by 1,2,3,412

) subs

Use OLAP to derive running counts, to date, of paid contracts and guest passes.

Page 34: OLAP Functions - Part2

Transform #2 (3 of 3)

--Monthly First Time Paying With Guest Pass - A subscriber who on this day is starting a paying monthly
--subscription, had never had a paying subscription in the past, but had a guest pass
, count(CASE WHEN subs.SUBS_STATUS_CD = 'active'
              and subs.Revenue_Type_CD = 'paid'
              and Nb_Paying_Contract = 1
              and Nb_Guess_Pass > 0
              and subs.sub_duration = 'Monthly'
             THEN 1
             ELSE null
        END
  ) as Monthly_First_Time_Paying_WGP
--Monthly First Time Paying Without Guest Pass - A subscriber who on this day is starting a paying monthly
--subscription, had never had a paying subscription in the past, nor had a guest pass
, count(CASE WHEN subs.SUBS_STATUS_CD = 'active'
              and subs.Revenue_Type_CD = 'paid'
              and Nb_Paying_Contract = 1
              and Nb_Guess_Pass = 0
              and subs.sub_duration = 'Monthly'
             THEN 1
             ELSE null
        END
  ) as Monthly_First_Time_Paying

Count subscribers on the day they become FIRST TIME PAYING.
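The transform has two steps. First, it keeps running counts per client of paid contracts and of free passes (the SQL's Nb_Paying_Contract and Nb_Guess_Pass). Second, it tests the row where the paid count first reaches 1. The Python sketch below illustrates that logic for one client at a time. The dictionary keys and sample rows are hypothetical, and the sketch does not reproduce the outer query's grouping.

```python
from typing import Dict, List

# Revenue types counted as passes, as in the Nb_Guess_Pass CASE expression.
FREE_TYPES = {"free", "free w/packaged good", "free promotion"}


def first_time_paying(subs: List[Dict[str, str]]) -> Dict[str, int]:
    """Hypothetical helper. subs: one client's subscription rows ordered by from_day,
    with hypothetical keys 'status', 'revenue_type' and 'duration'.

    Counts monthly first-time-paying starts with and without an earlier guest pass.
    """
    nb_paying_contract = 0
    nb_guest_pass = 0
    counts = {"with_guest_pass": 0, "without_guest_pass": 0}
    for sub in subs:
        active = sub["status"] == "active"
        paid = sub["revenue_type"] == "paid"
        # Running counts include the current row, like ... ROWS UNBOUNDED PRECEDING.
        if active and paid:
            nb_paying_contract += 1
        if active and sub["revenue_type"] in FREE_TYPES:
            nb_guest_pass += 1
        if active and paid and nb_paying_contract == 1 and sub["duration"] == "Monthly":
            key = "with_guest_pass" if nb_guest_pass > 0 else "without_guest_pass"
            counts[key] += 1
    return counts


client_1 = [
    {"status": "active", "revenue_type": "free", "duration": "Monthly"},  # guest pass
    {"status": "active", "revenue_type": "free", "duration": "Monthly"},  # free contract
    {"status": "active", "revenue_type": "paid", "duration": "Monthly"},  # first paid contract
]
client_2 = [{"status": "active", "revenue_type": "paid", "duration": "Monthly"}]
print(first_time_paying(client_1), first_time_paying(client_2))
```

A later paid contract raises the running count to 2, so it is not counted again as first-time paying.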

Page 35: OLAP Functions - Part2

OLAP Analytics - Agenda

• History & Recap

• RESET WHEN

• New OLAP

• OLAP in transforms

• Summary

Page 36: OLAP Functions - Part2

Summary

• The new RESET WHEN clause extends OLAP usability

• Increased compatibility with other databases

• Simplifies coding

• New statistical OLAP functions extend SQL possibilities

• Open new possibilities

• Simplify coding

• Transformation

• Enables a single pass over the data

• Coding closer to the transformation rules

Page 37: OLAP Functions - Part2

Thanks and Questions

• Questions?

[email protected]

Thank you all!

