Information Engineering Associates
DAMA-CHICAGO, JUNE 15, 2016
Predictive Analytics: A Statistical Primer for Data Modelers
Presenter: Bob Conway
Information Engineering [email protected]
Copyright 2014© Information Engineering Associates. All Rights Reserved.
Predictive Analytics: Agenda
• What – Contrast to Descriptive Analytics
• Why – Value Proposition for Predictive Analytics
• How – Statistical basis of Predictive Analytics
• Getting Started with Predictive Analytics
• Q&A
‘Traditional’ DW/BI Architecture
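[Diagram: systems of record (ERP, CRM, …) feed the data warehouse/business intelligence environment through three ETL stages: ETL1 loads the A_ERP and A_CRM areas, ETL2 loads the integration layer (INTG), and ETL3 loads the reporting layer (RPTG).]
• A_ERP, A_CRM: source-centric, non-transformed
• INTG: normalized, granular
• RPTG: denormalized, multi-dimensional, aggregation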
Descriptive Analytics (OLAP)
Tools (reporting layer, loaded by ETL3): COGNOS, Business Objects, QlikView, …
Example questions:
• Retail: How effective was that promotional campaign?
• Manufacturing: What is the MTBF for part components? Is it supplier dependent?
• Financial services: Are customer transaction patterns coincident with fraud?
Descriptive Analytics – What has happened?
• Trends, patterns, exceptions in historic data
• Monitor/control, business process improvement
Predictive Analytics
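Tools: SAS, MATLAB, Excel
Example questions:
• Retail: Forecast future sales? → Inventory level → Labor needs?
• Equipment operation: Optimal equipment maintenance schedule?
• Airlines: Future jet fuel prices for hedge contracts?
Predictive Analytics – What will happen?
• Advanced statistics → mathematical models
• Forecast future state/behavior
[Diagram: the DW/BI layers (ERP and CRM sources, A_E and A_C, INT, RPT, …) plus external data (ext) feed, via ETL, a P.A. Sandbox holding granular, denormalized data (OBFT).]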
Predictive Analytics Lifecycle
[Diagram: six-step cycle centered on "Value"]
1. Business Understanding
2. Data Preparation
3. Data Exploration
4. Statistical Modeling
5. Evaluation
6. Deployment
Statistics 101
Mean (average): $\bar{X} = \sum X_i / N$
Variance: $s_x^2 = \sum (X_i - \bar{X})^2 / (N-1)$
Standard deviation: $s_x = \sqrt{s_x^2}$
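The three definitions above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the `sales` values are made-up numbers, not data from the presentation.

```python
import math

def mean(xs):
    """Arithmetic mean: sum of the values divided by N."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sample variance: squared deviations from the mean, divided by N - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_std(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(xs))

# Hypothetical observations, for illustration only
sales = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print("mean     =", mean(sales))
print("variance =", round(sample_variance(sales), 3))
print("std dev  =", round(sample_std(sales), 3))
```

Dividing by N - 1 rather than N gives the sample (not population) variance, matching the formula on the slide.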
Simple Linear Regression
Covariance: $s_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / (N-1)$
Slope: $b = s_{xy} / s_x^2$
Intercept: $a = \bar{Y} - b\bar{X}$
$Y = a + bX$
Best fit: minimizes the sum of squared differences between observed and predicted values (least squares)
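As a rough sketch of the least-squares fit, the slope can be computed as the covariance of X and Y divided by the variance of X, and the intercept from the two means. The function name `fit_line` and the data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample covariance of X and Y, and sample variance of X
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_xx = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data lying exactly on Y = 1 + 2X
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print("intercept a =", a, "slope b =", b)
```

Because the sample points sit exactly on a line, the fit recovers that line; with real data the fitted line would only approximate the observations.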
Simple Correlation
[Figure: scatter plot, dollar values ($0 to $800) against X (0 to 1000); r = 0.58]
Correlation coefficient: $r = s_{xy} / (s_x s_y)$, with $-1 \le r \le +1$
Correlation does not imply cause and effect
$0 \le r^2 \le 1$; $r^2$ is the fraction of the variance in Y attributable to X
[Figure: scatter plots, dollar values ($0 to $800) against X (0 to 1000); r = 0.8 and r = 0.0]
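A small sketch of the correlation coefficient, computed as the covariance divided by the product of the two standard deviations. The `income` and `spend` values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical paired observations
income = [10, 20, 30, 40, 50]
spend = [12, 25, 29, 46, 48]

r = pearson_r(income, spend)
print("r   =", round(r, 3))
print("r^2 =", round(r ** 2, 3))  # fraction of Y variance attributable to X
```

A value of r near +1 or -1 only says the points lie close to a line; it says nothing about which variable, if either, drives the other.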
Multiple Correlation & Regression
[Figure: three scatter plots of sales (dollar values) against HH Income (x1000), r = 0.8; Customer Age, r = 0.6; and Cars/HH, r = 0.3]
Mercedes Sales = a + b(Income) + c(Age) + d(HH Cars), $R^2 = 0.89$
Variable   Sales   Income   Age   Cars
Sales      1.0     0.8      0.6   0.3
Income             1.0      0.7   0.4
Age                         1.0   0.3
Cars                              1.0
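A model of the form Sales = a + b(Income) + c(Age) + d(HH Cars) can be fitted by least squares. The sketch below solves the normal equations in plain Python; the helper names and the six data rows are hypothetical, and the sales figures are generated from known coefficients so the result is easy to check. In practice a statistics package would do this step.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_multiple(X, y):
    """Least-squares coefficients [a, b, c, ...] for y = a + b*x1 + c*x2 + ..."""
    rows = [[1.0] + list(r) for r in X]  # leading 1.0 carries the intercept
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

def r_squared(X, y, coef):
    """R^2 = 1 - (residual sum of squares / total sum of squares)."""
    y_bar = sum(y) / len(y)
    pred = [coef[0] + sum(c * x for c, x in zip(coef[1:], r)) for r in X]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical rows: (HH income x1000, customer age, cars per household)
X = [(50, 30, 1), (80, 45, 2), (120, 50, 2), (60, 35, 3), (150, 60, 1), (90, 40, 4)]
# Noise-free sales built from assumed coefficients a=5, b=2, c=0.5, d=3
y = [5 + 2 * inc + 0.5 * age + 3 * cars for inc, age, cars in X]

coef = fit_multiple(X, y)
print("coefficients:", [round(c, 4) for c in coef])
print("R^2:", round(r_squared(X, y, coef), 4))
```

With noise-free inputs the fit recovers the assumed coefficients and R² is 1; real data would give a lower R², such as the 0.89 quoted on the slide.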
Partial Correlation
Mercedes Sales = a' + b'(HH Income) - c'(Age)

Original Correlation Matrix
Variable   Sales   Income   Age   Cars
Sales      1.0     0.8      0.6   0.3
Income             1.0      0.7   0.4
Age                         1.0   0.3
Cars                              1.0

Stepwise (Partial) Correlation Matrix
Variable   Sales   Income   Age*    Cars*
Sales      1.0     0.79     -0.48   0.09
Income             1.0
Age                         1.0
Cars                                1.0
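For readers unfamiliar with partial correlation, the textbook first-order formula measures how two variables correlate once a third has been held constant. The sketch below is a generic illustration with made-up inputs; it is not the presenter's stepwise procedure and is not claimed to reproduce the starred values in the matrix above.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical pairwise correlations
r_xy, r_xz, r_yz = 0.50, 0.60, 0.70

print("raw r_xy         =", r_xy)
print("partial r_xy | z =", round(partial_corr(r_xy, r_xz, r_yz), 3))
```

When z is strongly related to both x and y, much of their raw correlation can disappear, or even change sign, once z is controlled for.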
Nonlinear Regression
[Figure: curve plot, X from 0 to 25, Y from 0 to 120] $Y = a + bX + cX^2$
[Figure: curve plot, X from 0 to 14, Y from 0 to 120] $Y = a \cdot 10^{-bX}$
Does the mathematical model make business sense?
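One common way to fit a decay curve of the form Y = a·10^(-bX) is to take log10 of Y and fit a straight line, since log10(Y) = log10(a) - bX. The sketch below assumes that approach; the function name and the data are hypothetical, and all Y values must be positive.

```python
import math

def fit_exponential(xs, ys):
    """Fit Y = a * 10**(-b*X) by regressing log10(Y) on X (requires Y > 0)."""
    logs = [math.log10(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    slope = (sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = l_bar - slope * x_bar
    # Undo the transform: intercept is log10(a), slope is -b
    return 10 ** intercept, -slope

# Hypothetical noise-free data generated with a = 100, b = 0.1
xs = [0, 2, 4, 6, 8, 10]
ys = [100 * 10 ** (-0.1 * x) for x in xs]

a, b = fit_exponential(xs, ys)
print("a =", round(a, 4), "b =", round(b, 4))
```

Fitting on the log scale minimizes relative rather than absolute errors, so the result can differ from a direct nonlinear fit on noisy data.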
Periodic (Cyclic) Models
[Figure: DJIA plotted against Time (periods 1 to 27), values from $0 to $4,000]
$\text{DJIA} = a + bT + c\,\sin(T_c - T) + d\,\sin(T_d - T) + \dots$
$b \approx 1.10$
Fourier transform
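To illustrate how a Fourier transform can expose a cycle, the sketch below removes a straight-line trend from a series and then looks for the strongest component of a plain discrete Fourier transform. This is one possible approach, not necessarily the presenter's; the function names and the synthetic series (trend plus a 12-period cycle) are assumptions.

```python
import cmath
import math

def detrend(ys):
    """Remove the least-squares line a + b*T from a series sampled at T = 0, 1, 2, ..."""
    n = len(ys)
    t_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    b = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(ys))
         / sum((t - t_bar) ** 2 for t in range(n)))
    a = y_bar - b * t_bar
    return [y - (a + b * t) for t, y in enumerate(ys)]

def dominant_period(ys):
    """Period (in samples) of the strongest DFT component of the detrended series."""
    resid = detrend(ys)
    n = len(resid)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(r * cmath.exp(-2j * math.pi * k * t / n)
                    for t, r in enumerate(resid))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k

# Synthetic series: linear trend plus one cycle that repeats every 12 periods
series = [100 + 2 * t + 10 * math.sin(2 * math.pi * t / 12) for t in range(48)]

print("dominant period:", dominant_period(series))
```

Once a cycle length has been estimated this way, it can be used to set up the sine terms of a trend-plus-cycle model.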
Getting Started Suggestions
1. Bone up on statistics/predictive modeling
   • www.coursera.org – free, online classes
   • Books – “Predictive Analytics” by Conrad Carlberg
   • Leverage in-house methods/tools expertise
2. Engage a user-partner/business problem
   • Evaluate current forecast process and impact
   • Explore the data – metrics, sensitivity variables
   • Prototype a model – test it against actual data
3. Estimate business impact ($) – what-if scenarios, precasting
4. Get started with simple tools (Excel with advanced stat plug-ins)
5. Run parallel trials with the current process and the proposed predictive model
   • Continuous monitoring and refinement
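Testing a prototype against actual data, and running it in parallel with the current process, could look roughly like the sketch below: hold back the most recent periods, forecast them both ways, and compare the errors. The "current process" is assumed here to be a simple carry-forward of the last value, the prototype is a straight-line trend, and the sales figures are invented.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def naive_forecast(history, horizon):
    """Stand-in for the current process: repeat the last observed value."""
    return [history[-1]] * horizon

def trend_forecast(history, horizon):
    """Prototype model: extend the least-squares line fitted to the history."""
    n = len(history)
    t_bar = (n - 1) / 2
    y_bar = sum(history) / n
    b = (sum((t - t_bar) * (y - y_bar) for t, y in enumerate(history))
         / sum((t - t_bar) ** 2 for t in range(n)))
    a = y_bar - b * t_bar
    return [a + b * (n + h) for h in range(horizon)]

# Hypothetical monthly sales; the last three months are held out as "actuals"
sales = [100, 104, 109, 115, 118, 124, 129, 133, 140, 144, 149, 155]
history, holdout = sales[:9], sales[9:]

naive_error = mape(holdout, naive_forecast(history, len(holdout)))
trend_error = mape(holdout, trend_forecast(history, len(holdout)))
print("current process MAPE:", round(naive_error, 2), "%")
print("prototype model MAPE:", round(trend_error, 2), "%")
```

The gap between the two error figures is a starting point for the business-impact estimate in step 3.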
Questions