Leonardo Framework – SAP’s Innovation Platform (sapnow.hu/wp-content/uploads/2018/08/03_SAP...)

Post on 21-Jul-2020


CUSTOMER

Dirk Miethe, SAP

September 2018

Leonardo Framework – SAP’s Innovation Platform for Machine Learning

SAP Data Framework as Foundation for Data Innovation

2CONFIDENTIAL© 2018 SAP SE or an SAP affiliate company. All rights reserved. ǀ

“You can spend the whole day waiting on the ocean without catching a wave.”


SAP Leonardo

[Word cloud: New Technology, Design Services, Natural Language, Image Recognition, IoT, Deep Learning, Connected Data, Blockchain, APIs, Machine Learning, Business Process Innovation, Big Data, Real-time Analyses, Workshop, Experience, Start Ups, Discover, Innovation Center, Prototype, The Machine Learning Akademie, Innovation Center Potsdam, Microservices, Data Hub, Berlin]


Innovation: getting ideas into production

Scaling the last mile is much harder than developing MVPs

[Chart: # Projects, # People, # Users over time]

PIONEER (DATA LAB): PILOT, EASY DATA ACCESS, NEW, INDEPENDENT

SETTLER (DATA FACTORY): PRODUCTIVE, STABLE, AUTOMATE & RE-USE, INTEGRATED

CITY BUILDER (INTELLIGENT ENTERPRISE): INDUSTRIALIZED IT, DATA GOVERNANCE, SECURITY, PLATFORM STRATEGY

ALL AT ONCE


Our startup found an ML algorithm!


Data Lab Data Factory Intelligent Enterprise

Move beyond initial innovation successes

PIONEER SETTLER CITY BUILDER

[Diagram: Manual data extraction, <Transform>, Feature Matrix, Data ready for consumption, <ML>, <Stats>]

5% * / 95% *

Technical Debt in ML Systems

* Google Research Paper on ML Systems


Plant Protection Force
Plant Control Station
Pipeline
Base Material Processing

Data Scientist
IT Operations
BW & BI Expert
Data Engineer
IT Developer


Move beyond initial innovation successes

PIONEER SETTLER CITY BUILDER

[Diagram: Manual data extraction, <Transform>, Features, Data ready for consumption, <ML>, <Stats>, Train & deploy ML Models, Data Pipelines, ETL, Data Catalogue, Push-Down Execution, Monitoring, Connections, Dashboard App]

Data Lab Data Factory Intelligent Enterprise


Example: Typical Machine Learning System

PIONEER SETTLER CITY BUILDER

[Diagram: Manual data extraction, <Transform>, Features, Data ready for consumption, <ML>, <Stats>, Train & deploy ML Models, Dashboard App, Data Pipelines, ETL, Data Catalogue, Push-Down Execution, Monitoring, Connections, App, API, Micro-Service, API, SAP Leonardo]

Data Lab Data Factory Intelligent Enterprise


Data Scientist
Business (User / Analyst)
Data Engineer
Operations

Accelerate transformation towards Intelligent Enterprise


Do you prefer an architect & building contractor … or integrate / coordinate yourself?

Building the foundation of the Intelligent Enterprise

In-memory database
Flexible data access
Machine Learning Tools & Capabilities
Application Development Environment
ETL
Process Orchestration
Authority Management
Compliance
User Interface
IT Operations
Visualization
Automation


Required building blocks to develop Machine Learning content

Application
Data
Engines for specific data types
Library
Coding Tools
Data Preparation

› Actionable, business-critical insights


Component mapping

Data Scientist
BI Consumer
Business Analyst
Business User & Customer

Data Apps
Ad-hoc analysis
BI
Data Lab

Cloud Infrastructure
On Premise Infrastructure

SAP HANA
SAP Data Hub
ML Foundation
SAP Enterprise Integration
SAP Analytics Cloud
SAP Agile Data Preparation
HANA (XSA)
Fiori, etc.
Any Tool (e.g. Matlab, R-Studio, Jupyter, Zeppelin)

PAL, APL, Geo, Text, Graph

pre-trained / training / bring your own model


Example ML service: facial recognition

https://budgetapiwebsdciot.hana.ondemand.com/budgetapiweb/#


Example of a Shopping App

Shop Recommendation with HANA Geo-Spatial Engine

Product Recommendation with HANA PAL

Image Recognition with TensorFlow


Image Recognition with HANA-Tensorflow Integration

TensorFlow Libraries

› TensorFlow

Image Recognition with TensorFlow

HANA


Product Recommendation with HANA PAL or APL

PAL/APL Libraries

› HANA Predictive Analytics Library (PAL)

› HANA Automated Predictive Library (APL)

HANA

Product Recommendation with HANA PAL


Different algorithms for different types of questions: examples

Grouping customers

› E.g. K-Means Clustering

Probability to buy a certain product

› E.g. Logistic regression, Decision Tree

Predict the value spent

› E.g. Multiple Linear Regression

Seasonality

› E.g. Triple Exponential Smoothing

Product Recommendation with HANA PAL
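To make the "grouping customers" example concrete, here is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. It is not the HANA PAL API and does not run in the database; the function name `k_means`, the customer features, and the sample values are all assumptions made up for illustration.

```python
from math import dist


def k_means(points, k, iterations=20):
    """Plain k-means (Lloyd's algorithm) on a list of numeric tuples."""
    # Deterministic start: use the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (store visits per month, average basket value)
customers = [(1, 20), (12, 180), (2, 25), (11, 200), (1, 30), (13, 190)]
labels, centroids = k_means(customers, k=2)
print(labels)     # cluster index per customer
print(centroids)  # one centroid per customer group
```

In practice the features would be scaled first, since a raw basket value dominates the distance over a visit count.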


SAP HANA Predictive Analytics Library (PAL) – Recent Enhancements

Classification Analysis
▪ CART
▪ C4.5 Decision Tree Analysis
▪ CHAID Decision Tree Analysis
▪ K Nearest Neighbour
▪ Logistic Regression Elastic Net
▪ Back-Propagation (Neural Network)
▪ Naïve Bayes
▪ Support Vector Machine
▪ Random Forests
▪ Gradient Boosting Decision Tree (GBDT)*
▪ Linear Discriminant Analysis (LDA)*
▪ Confusion Matrix
▪ Area Under Curve (AUC)
▪ Parameter Selection / Model Evaluation

Regression
▪ Multiple Linear Regression Elastic Net
▪ Polynomial, Exponential, Bi-Variate Geometric, Bi-Variate Logarithmic Regression
▪ Generalized Linear Model (GLM)*
▪ Cox Proportional Hazards Model*

Association Analysis
▪ Apriori, Apriori Lite
▪ FP-Growth
▪ KORD – Top K Rule Discovery
▪ Sequential Pattern Mining*

Probability Distribution
▪ Distribution Fit / Weibull Analysis
▪ Cumulative Distribution Function
▪ Quantile Function
▪ Kaplan-Meier Survival Analysis

Outlier Detection
▪ Inter-Quartile Range Test (Tukey’s Test)
▪ Variance Test
▪ Anomaly Detection
▪ Grubbs Outlier Test

Recommender Systems
▪ Factorized Polynomial Regression Models**

Link Prediction
▪ Common Neighbors, Jaccard’s Coefficient, Adamic/Adar, Katzβ

Statistical Functions
▪ Mean, Median, Variance, Standard Deviation, Kurtosis, Skewness
▪ Covariance Matrix
▪ Pearson Correlations Matrix
▪ Chi-squared Tests:
  - Test of Quality of Fit
  - Test of Independence
▪ F-test (variance equal test)
▪ Data Summary*
▪ ANOVA**
▪ One-sample Median Test**
▪ T Test**
▪ Wilcox Signed Rank Test**

Data Preparation
▪ Sampling, Binning, Scaling, Partitioning
▪ Principal Component Analysis (PCA) / PCA Projection
▪ Factor Analysis
▪ Multi-dimensional Scaling

Other
▪ Weighted Scores Table
▪ Substitute Missing Values

Cluster Analysis
▪ ABC Classification
▪ DBSCAN
▪ K-Means / Accelerated K-Means**
▪ K-Medoid Clustering
▪ K-Medians
▪ Kohonen Self Organized Maps
▪ Agglomerate Hierarchical
▪ Affinity Propagation
▪ Latent Dirichlet Allocation (LDA)
▪ Gaussian Mixture Model (GMM)
▪ Cluster Assignment

Time Series Analysis
▪ Single / Double / Brown / Triple Exp. Smoothing
▪ Forecast Smoothing
▪ Auto-ARIMA / Seasonal ARIMA
▪ Croston Method
▪ Forecast Accuracy Measure
▪ Linear Regression with Damped Trend and Seasonal Adjust
▪ Test for White Noise, Trend, Seasonality
▪ Fast Fourier Transform (FFT)*
▪ Correlation Function*

* New in HANA 2 SPS0
** New in HANA 2 SPS01
*** New in HANA 2 SPS02
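As one small illustration of the catalogue above, the Inter-Quartile Range Test (Tukey's Test) listed under Outlier Detection can be sketched in a few lines of plain Python. This is an assumption-laden sketch, not PAL itself: the function name `iqr_outliers`, the 1.5 multiplier default, the quartile method (Python's default "exclusive" method), and the sample data are all chosen for illustration, and PAL may compute quartiles differently.

```python
from statistics import quantiles


def iqr_outliers(values, multiplier=1.5):
    """Return the values outside Tukey's fences [Q1 - m*IQR, Q3 + m*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower = q1 - multiplier * iqr
    upper = q3 + multiplier * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical daily order counts with one obvious spike
orders = [12, 14, 13, 15, 14, 13, 12, 95, 14, 13]
outliers = iqr_outliers(orders)
print(outliers)
```

A larger multiplier widens the fences and flags fewer points, so it is the main knob for how aggressive the test is.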


Integrated non-SAP libraries

Other Libraries

› R library

› Python library

› others

HANA


Nearest shop search with Geo-Spatial engine in HANA

Pre-Processing of specific data types

Shop Recommendation with HANA Geo-Spatial Engine

Geo Spatial
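The idea behind a nearest-shop search can be sketched outside the database with the haversine great-circle distance. This is a plain Python illustration under stated assumptions, not the HANA geo-spatial engine or its SQL interface: the function names, the shop list, and the coordinates are hypothetical, and a spatial engine would typically use a spatial index rather than scanning every shop.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (latitude, longitude) points in km."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_shop(customer, shops):
    """Return (name, distance_km) of the shop closest to the customer."""
    name, (lat, lon) = min(
        shops.items(), key=lambda item: haversine_km(*customer, *item[1])
    )
    return name, haversine_km(*customer, lat, lon)


# Hypothetical shop locations (approximate city-centre coordinates)
shops = {
    "Budapest": (47.4979, 19.0402),
    "Vienna": (48.2082, 16.3738),
    "Berlin": (52.5200, 13.4050),
}
customer = (52.3906, 13.0645)  # roughly Potsdam
shop, distance_km = nearest_shop(customer, shops)
print(shop, round(distance_km, 1))
```

The linear scan is fine for a handful of shops; the point of doing this in a spatial engine is that the same question can be answered next to the data and at much larger scale.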


Other engines in HANA

Pre-Processing of specific data types

Text Analysis
Graph Analysis
Geo Spatial

Demo: SAP Data Hub

Thank you.

Contact information:

Dirk Miethe & Karsten Haldenwang
Center of Excellence Database and Data Management
Middle & Eastern Europe