© 2012 SAP AG. All rights reserved. 3
Digital Era: Big Data
GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops
The potential to connect “things” (and get data) is immense
Source: Beecham Research
• Buildings (Commercial, Institutional, Industrial): HVAC, Transport, Fire & Safety, Lighting, Security, Access, etc.
• Energy (Supply/Demand, Alternative, Oil/Gas): Turbines, Windmills, UPS, Batteries, Generators, Meters, Drills, Fuel Cells, etc.
• Consumer & Home (Infrastructure, Awareness & Safety, Comfort & Convenience): TVs, Power Systems, Dishwashers, Lighting, Washers/Dryers, Meters/Lights, Alarms, etc.
• Healthcare & Life Sciences (Care, In Vivo/Home, Research): MRI, PDAs, Implants, Surgical Equipment, Pumps, Monitors, Telemedicine, etc.
• Industrial (Resource Automation, Fluid/Processes, Converting/Discrete): Pumps, Valves, Vats, Conveyors, Pipelines, Motors, Drives, Converting, Fabrication, Assembly/Packaging, Vessels/Tanks, etc.
• Transportation (Trans Systems, Vehicles, Non-Vehicular): Vehicles, Lights, Ships, Planes, Signage, Tolls, Containers, etc.
• Retail (Stores, Hospitality, Specialty): POS Terminals, Tags, Cash Registers, Vending Machines, Signs, etc.
• Security / Public Safety (Tracking, Equipment, Surveillance): Cars, Ambulances, Fire, Breakdown, Lone Worker, Homeland Security, Environment Monitor, etc.
• IT & Networks (Enterprise, Public): Servers, Storage, PCs, Routers, Switches, PBXs, etc.
Predictive Analytics: Extracting Information/Knowledge from Data
Data sources: CRM, ERP, Billing, Clickstreams, Mobile, Social Media, Logs, Sensors, Email, IoT
Example: Machine Health Prediction
The same principles apply to other predictions, e.g. churn, propensity-to-buy or Ebola detection.
• Predict machine/part failure to lower service costs and increase machine up-time
• Potentially interesting attributes (predictors):
  • Sensor data such as temperatures, pressures, machine conditions
  • Failure codes
  • Status sequences
  • Machine master data
Example: Machine Health Prediction Using Sensor Data
• Measurements: every 10 seconds
• Dataset: 1,200 variables, 140,000 records
• Target variable: failure in the next 24h
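A target variable such as "failure in the next 24h" has to be derived from recorded failure events. The sketch below shows one possible way to label each measurement; the function name, the timestamps and the single failure event are hypothetical and not taken from the actual dataset.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

HORIZON = timedelta(hours=24)  # prediction horizon named on the slide


def label_failure_in_next_24h(measurement_times, failure_times, horizon=HORIZON):
    """Return 1 for each measurement followed by a failure within `horizon`, else 0."""
    failures = sorted(failure_times)
    labels = []
    for t in measurement_times:
        # Index of the first failure strictly after the measurement time.
        i = bisect_right(failures, t)
        hit = i < len(failures) and failures[i] - t <= horizon
        labels.append(1 if hit else 0)
    return labels


# Hypothetical data: one reading every 10 seconds and one recorded failure.
start = datetime(2012, 1, 1, 0, 0, 0)
readings = [start + timedelta(seconds=10 * k) for k in range(6)]
failures = [start + timedelta(hours=24, seconds=30)]
labels = label_failure_in_next_24h(readings, failures)
```

Only the readings taken 24 hours or less before the failure receive the label 1; earlier readings stay 0.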
What is Predictive Analytics? Typical questions predictive analytics / data mining may answer
• Classification: Who will (need intervention | be at risk of fraud) next (week | month | year)?
• Prediction: What will the (debt | crime | budget) be next (week | month)?
• Forecasting: How will the (budget | debt | grant) develop over the next (year | month)?
• Clustering/Segmentation: What are the groups of (constituents | businesses | employees) with a similar (behaviour | profile)?
• (Social) Network Analysis: Analyse interactions to identify (communities | influencers)
• Association Rules: Analyse transactions to identify events likely to occur together
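As an illustration of the association-rules question, the sketch below counts which pairs of events occur together and reports support and confidence for each rule. It is a minimal pair-only version under assumed data; the event names and the `min_support` default are hypothetical, and it is not the algorithm of any specific product.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return {(a, b): (support, confidence)} for rules a -> b over item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # share of transactions containing both items
        if support >= min_support:
            rules[(a, b)] = (support, count / item_counts[a])
            rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical event logs: each inner list is one transaction.
logs = [
    ["overheat", "vibration"],
    ["overheat", "vibration", "leak"],
    ["overheat"],
    ["leak"],
]
rules = pair_rules(logs)
```

Here the rule vibration -> overheat has confidence 1.0, because every transaction with a vibration event also contains an overheat event.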
Doing Predictions Is Basically a Four-Step Approach
Historic data is used to learn. These learnings are used to create prediction models. These models can then be applied to current data. They need to be systematically controlled and maintained to ensure the best possible results.
• Historic Data: transactions, demographics, sensors, ...
• Prediction Models
• Current Data: transactions, sensors, ...
• Control & Maintenance
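The four steps can be sketched end to end with a deliberately tiny model: a single cut-off on one sensor value. This is an illustrative assumption, not the modelling technique used by the product; the data, the function names and the 0.9 accuracy limit are hypothetical.

```python
def learn_threshold(historic):
    """Steps 1-2: learn from (value, failed) pairs the cut-off with the fewest errors."""
    best_cut, best_errors = None, len(historic) + 1
    for cut in sorted({value for value, _ in historic}):
        errors = sum((value >= cut) != bool(failed) for value, failed in historic)
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


def predict(cut, values):
    """Step 3: apply the model to current data."""
    return [int(value >= cut) for value in values]


def accuracy(predictions, outcomes):
    """Step 4: control by comparing predictions with what actually happened."""
    return sum(p == o for p, o in zip(predictions, outcomes)) / len(outcomes)


# Hypothetical historic data: (temperature, failed afterwards?).
historic = [(61, 0), (64, 0), (70, 0), (83, 1), (88, 1), (91, 1)]
cut = learn_threshold(historic)

current = [65, 90, 72, 85]
scores = predict(cut, current)

# Once the real outcomes are known, a drop in accuracy signals maintenance.
needs_retraining = accuracy(scores, [0, 1, 1, 1]) < 0.9
```

In this example the learned cut-off misses one later failure, so the accuracy check flags the model for retraining.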
Doing Predictions...
Historic Data Sensors, transactions,...
Prediction Models
Control & Maintenance
Current Data Sensors ...
Creating Analytical Dataset Example
[Figure: an analytical record whose variables (Var1, Var2, Var3, ...) are grouped by domain (Domain 1, Domain 2, …) and sourced from CRM, ERP, sensor data and DWH]
• Real-time sensor data (billion records/year)
• CRM, ERP, EAM, DWH data
Doing Predictions...
Historic Data Sensors, transactions,...
Prediction Models
Control & Maintenance
Derived attributes
Current Data Sensors ...
Enriching Analytical Dataset: Derived Attributes
[Figure: an analytical record with variables Var1, Var2, Var3, ..., Var n-2, Var n-1, Var n, grouped into Domain 1, Domain 2, …, Domain N-1, Domain N and sourced from CRM, ERP, sensor data and DWH]
Derived Attributes:
• Time Window Aggregates
• Sequences
• Text/Log Analytics
• Link/Network/Social Attributes
• Co-location Events/transactions
• Geolocation Path Identification
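Time window aggregates are the simplest kind of derived attribute. The sketch below computes mean, max, min and count over a trailing window for each sensor reading; the 30-second window, the field names and the readings are hypothetical choices for illustration.

```python
from collections import deque


def window_aggregates(readings, window_seconds):
    """Aggregate values over the trailing window (ts - window, ts] for each reading.

    `readings` is a list of (timestamp_in_seconds, value) sorted by timestamp.
    """
    window = deque()
    rows = []
    for ts, value in readings:
        window.append((ts, value))
        # Drop readings that have fallen out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            window.popleft()
        values = [v for _, v in window]
        rows.append({
            "ts": ts,
            "mean": sum(values) / len(values),
            "max": max(values),
            "min": min(values),
            "n": len(values),
        })
    return rows


# Hypothetical temperature readings taken every 10 seconds.
readings = [(0, 70.0), (10, 72.0), (20, 71.0), (30, 90.0)]
features = window_aggregates(readings, window_seconds=30)
```

Each row can then be joined to the analytical record as additional variables, one set per window length.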
SAP InfiniteInsight Explorer: Analytical data sets with clicks, not code
Prepare: Reusable, Reduces Human Error, Self-Service
• Metadata-based modelling
• Create 1,000s of derived attributes
• Define metadata once
• Select time-stamped population
• Builds analytic dataset automatically
Prediction modelling: Families of Problems & Algorithms Used for Predictive Analytics
• Classification: Who will (need intervention | be at risk of fraud) next (week | month | year)?
• Prediction: What will the (debt | crime | budget) be next (week | month)?
• Forecasting: How will the (budget | debt | grant) develop over the next (year | month)?
• (Social) Network Analysis: Analyse interactions to identify (communities | influencers)
• Clustering/Segmentation: What are the groups of (constituents | businesses | employees) with a similar (behaviour | profile)?
• Association Rules: Analyse transactions to identify events likely to occur together
Modelling with SAP InfiniteInsight
Automation of repeatable and time-consuming modeling steps:
• Missing data
• Outliers
• Skewed distributions
• Correlations
• Data encoding, etc.
Automated model building and optimization
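To make three of these preparation steps concrete (missing data, outliers, data encoding), here is a minimal sketch using common textbook techniques: median imputation, capping at 1.5 times the interquartile range, and one-hot encoding. These are assumptions chosen for illustration; the slides do not say which techniques SAP InfiniteInsight applies internally, and the sample values are hypothetical.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def cap_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted distinct levels."""
    levels = sorted(set(categories))
    return [[int(c == level) for level in levels] for c in categories]


# Hypothetical temperature column with one gap and one sensor glitch.
temps = [70, 71, 69, None, 72, 70, 71, 70, 72, 71, 500]
cleaned = cap_outliers(impute_missing(temps))
encoded = one_hot(["A", "B", "A"])
```

The gap is filled with the median of the observed values, and the glitch value is pulled back to the upper cap while all other readings are unchanged.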
SAP InfiniteInsight Modeler: Predictive power in days, not months
Build: Easy to Use, Time to Market, More Models
Fully automated modeling process:
• Regression
• Classification
• Segmentation
• Time series forecasting
• Association rules
Identify key variables
Executive and operational reports
SAP InfiniteInsight (Social) Networks Analysis: Improve insight with (social) networks
Social: Improve Insight, Extend Reach, Boost ROI
• Use link/social variables for enhanced prediction
• Identify communities amongst your customers
• Find influencers to make your campaigns viral
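A rough sketch of the two ideas, communities and influencers, on a small call graph. It uses strong simplifications that are assumptions of this example: a community is a connected component, and an influencer is the node with the most links. Real social network analysis uses more refined measures, and the names below are hypothetical.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected adjacency map from (caller, callee) pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def communities(graph):
    """Connected components, found with an iterative depth-first search."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def influencers(graph, top=1):
    """Nodes with the most direct links (ties broken alphabetically)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:top]


# Hypothetical call records (who called whom).
calls = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("eve", "fay")]
graph = build_graph(calls)
groups = communities(graph)
top_nodes = influencers(graph)
```

The number of links per node, or the size of a node's community, could then be added to the analytical record as link/social variables.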
SAP InfiniteInsight Scorer: Put scores into action
Deploy: Non-Intrusive, Time to Value, Repeatable
• One-click deployment of scores into production environment
• In-database scoring (SQL)
• Interface with business apps via scoring equations in C++, Java, PMML, SAS
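In-database scoring means the model is expressed as a SQL expression that the database evaluates row by row. The sketch below renders a logistic-regression equation as a SELECT statement. It is an illustration only, not the code SAP InfiniteInsight generates; the coefficients, the table name `machine_features` and the column names are hypothetical.

```python
def scoring_sql(intercept, coefficients, table, key):
    """Render a logistic-regression score as a SQL SELECT statement.

    Table and column names are inserted as-is, so they must come from
    trusted model metadata, never from user input.
    """
    terms = " + ".join(f"({weight!r} * {column})" for column, weight in coefficients.items())
    linear = f"{intercept!r} + {terms}"
    return (
        f"SELECT {key}, 1.0 / (1.0 + EXP(-({linear}))) AS failure_score "
        f"FROM {table}"
    )


# Hypothetical model: two derived attributes and their weights.
sql = scoring_sql(
    intercept=-4.0,
    coefficients={"temp_avg_1h": 0.05, "pressure_max_1h": 0.02},
    table="machine_features",
    key="machine_id",
)
```

Because the score is computed where the data lives, no records need to leave the database for scoring.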
SAP InfiniteInsight Factory: Every model at peak performance
Improve: No Programming, Scale, Manage By Exception
• Refresh analytic data sets and models automatically
• Deploy scores to production
• Alert on data and model deviations
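One common way to detect data deviations is the population stability index (PSI), which compares the distribution of a variable at training time with its current distribution. The sketch below is an assumption about how such an alert could work, not a description of the Factory component; the bin edges, the samples and the 0.25 alert limit (a widely used rule of thumb) are hypothetical choices.

```python
from math import log


def psi(expected, actual, edges):
    """Population stability index between two samples over shared bin edges."""
    def shares(sample):
        counts = [0] * (len(edges) + 1)
        for x in sample:
            # Bin index = number of edges the value has reached.
            counts[sum(x >= e for e in edges)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))


ALERT_LIMIT = 0.25  # rule-of-thumb threshold, an assumption of this sketch

# Hypothetical temperature samples.
training = [60, 62, 65, 68, 70, 72, 75, 78]
stable = [61, 63, 66, 67, 71, 73, 76, 79]
shifted = [80, 82, 85, 88, 90, 92, 95, 98]
edges = [65, 70, 75]

alert_stable = psi(training, stable, edges) > ALERT_LIMIT
alert_shifted = psi(training, shifted, edges) > ALERT_LIMIT
```

Only the shifted sample raises an alert, which is the "manage by exception" idea: models are reviewed when their inputs change, not on a fixed schedule.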
Challenge No. 1: The traditional predictive analytics approach takes too long ... to prepare, deploy and manage models
Source: Adapted from Phases of the Pattern Mining Process by Gartner
• Manual
• Repetitive
• Prone to error
Many models needed to cover all needs
• Machine health: N machine types x M failure types = N x M models
• Marketing: K marketing campaigns x N customer segments x M communication channels = K x N x M models
[Diagram: the prediction process (historic data from sensors, transactions, ...; derived attributes; prediction models; current sensor data; control & maintenance)]
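The multiplication is easy to underestimate, so here is a small sketch that enumerates one model per combination. The machine types, failure types and counts are hypothetical, and `train_model` is a placeholder standing in for an automated training run.

```python
from itertools import product

machine_types = ["pump", "turbine", "conveyor"]   # N = 3 (hypothetical)
failure_types = ["bearing", "overheating"]        # M = 2 (hypothetical)


def train_model(machine_type, failure_type):
    """Placeholder for an automated training run; returns a model identifier."""
    return f"{machine_type}:{failure_type}"


# One model per (machine type, failure type) combination: N x M models.
models = {
    combo: train_model(*combo)
    for combo in product(machine_types, failure_types)
}

# The marketing case grows the same way: K x N x M models.
K, N, M = 4, 5, 3  # campaigns, segments, channels (hypothetical)
marketing_models = K * N * M
```

Even these small numbers give 6 machine models and 60 marketing models, which is why manual model building does not scale.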
Need for Faster and Better Predictive Models
• Advanced techniques & tools for ADS (analytical dataset) building
• Automation of model creation
• Automated data preparation
• High quality – comparable/better models
• Short “Time-to-Market”
• Low TCO per model
• Production use in real-time environments
• Control & Maintenance of models
Model transformed into SQL code
Dataset automation
SQL Code
To real time environments
SAP InfiniteInsight
Challenge No. 2: Traditional predictive modeling can’t handle Big Data
Can’t scale across wide data sets
Hard to interpret semi-structured and unstructured data
Exhausts data scientists’ “a priori” knowledge
High Dimensionality Problems: data preparation & encoding, model overfitting, scalability of modeling & deployment
CRM ERP Billing
Profile Products
Purchase History
Usage
Before 2010 (Transactions)
SmartGrid Web Mobile
Social Media
Now (Behaviors)
Logs Sensors IoT M2M
CRM, ERP, Campaign
100’s of Derived Attributes Big Data
Handcrafted SAP InfiniteInsight
SAP InfiniteInsight – Automated Machine Learning Approach
The more data, the better the models; more data (generally) “beats” better algorithms
• 20 variables: demographics + simple aggregates
• 500 variables: time-pivoted behavior
• Social communities
SAP InfiniteInsight – Automated Machine Learning Approach
Identify any and all information that has predictive power
Real-life Examples: Deep Insight from Big Data
Customer                No. of Variables
UniCredit               675
Cox                     800
Sears                   900
Large Wireless Telco    1,000
Lowe’s                  1,100
Mobilink                1,100
Large UK Retail Bank    2,000
Experian                2,000
Vodafone D2             2,500
MonotaRO                2,500
Bell Canada             3,000
Rogers Wireless         3,000
Discover                10,000
U.S. eBusiness          15,000
Shutterfly              28,000+
Why SAP InfiniteInsight?
• Productive: Predictive analytics process made efficient. Automated data preparation, modeling and deployment tasks. Models in minutes or hours.
• No PhD Required: Easy yet sophisticated. Model building and deployment in clicks.
• Big Data Made Easy: Scales for terabytes and petabytes of data. Rapid insight from 1,000s and 10,000s of variables with no expert intervention.
• Fast & Accurate: Automation cuts human time. Increased accuracy by including all potentially predictive variables and eliminating manual errors.
• Corporate Knowledge: Models incorporated into the business process. Knowledge shared and retained across the organization.
• Quick Win: Quick installation. Short training. Leapfrog to best-in-class analytics.
• Lower TCO: Leverage existing infrastructure. No need for additional resources. Payback in weeks.
SAP InfiniteInsight & Big data
Not just large records, but high dimensions
Lack of data knowledge, complex domain knowledge
Textual information, weblogs, transactions, phone calls, location data, sensor signals…
Fast modeling & scoring, (social) network analysis, recommendations
Some Proof Points
• Allegro: 100M+ personalized recommendations a day
• Mobilink: Social graphs on 70M distinct nodes and 900M links out of 4.3 billion CDRs
• Shutterfly: Model with 28,000+ columns
• Vodafone: Churn and X-sell management with 700 models
• Rhapsody: Survival analysis
• Mobilink: Find the influencers for a variety of business questions
• E.ON: Analyze call center logs (text) to enhance customer targeting
• Firmenich: Text, memos & chemical attributes used to predict the likelihood of a fragrance to sell
• Vodafone: Variety of data types (transactions, geo, …)
• Mobilink: Graphs built in 30 hours
• Shutterfly: Model built in a day
• Telco: Processing SNA analysis on billions of CDR transactions in hours