PowerPoint Presentationdownload.microsoft.com/download/D/E/7/DE7AE181-EE05-4699...开源软件...

Date post: 27-Mar-2020

Source: Gartner

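Value

Difficulty

Descriptive analytics: What happened?

Diagnostic analytics: Why did it happen?

Predictive analytics: What will happen?

Prescriptive analytics: What should we do?

Traditional BI

Advanced analytics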

Common solutions

Machine Learning or Microsoft R Server

Cortana Advanced Analytics Suite

SQL Server 2016 or Microsoft R Server

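1. Business understanding

2. Data understanding

3. Data preparation (cleaning/preparing)

4. Create/train a model from the data

5. Test/validate the model

6. Use the model to produce predictions

7. Integrate with business applications

8. Update the model periodically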
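
Steps 3 through 6 above can be sketched in a few lines of plain Python. This is only an illustration of the workflow, not code from the deck: the data set, the 80/20 split, and the helper names (`clean`, `train`, `predict`, `rmse`) are all assumptions made for the example.

```python
import random

def clean(rows):
    """Step 3: drop records with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def train(rows):
    """Step 4: fit y = intercept + slope * x by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def predict(model, x):
    """Step 6: score a new observation with the fitted model."""
    intercept, slope = model
    return intercept + slope * x

def rmse(model, rows):
    """Step 5: root-mean-square error on held-out rows."""
    return (sum((predict(model, x) - y) ** 2 for x, y in rows) / len(rows)) ** 0.5

# Hypothetical data: y = 3 + 2x plus noise, with two incomplete records.
rng = random.Random(42)
raw = [(float(x), 3.0 + 2.0 * x + rng.gauss(0, 0.5)) for x in range(100)]
raw[10] = (None, 1.0)
raw[20] = (20.0, None)

rows = clean(raw)
rng.shuffle(rows)
split = int(0.8 * len(rows))          # assumed 80/20 train/test split
train_rows, test_rows = rows[:split], rows[split:]

model = train(train_rows)
holdout_error = rmse(model, test_rows)
forecast = predict(model, 120.0)
```

Step 8 would amount to rerunning `train` on fresh data on a schedule and replacing the stored model.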

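Open-source software

Analysis, computation, modeling

Worldwide community

Millions of users, 9,000+ packages

Big data support, healthy ecosystem

Extensibility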

$?

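Open source brings additional cost overhead

Lack of enterprise-grade support services

Integrating R with existing, constantly changing data architectures

Limited scalability

Performance bottlenecks with large data volumes

Data movement adds cost and security risks

24x7 support

Enterprise-grade platform

Included in SQL Server

Unified analytics applications across multiple data platforms

Smarter decisions

Faster analytical results

Hybrid cloud support

Open-source R code can be reused directly

Data size is not limited by available memory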

US flight data for 20 years

Linear Regression on Arrival Delay

Run on 4 core laptop, 16GB RAM and 500GB SSD
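
One general way to fit a regression on data larger than memory is to read it in chunks and accumulate only the sums the normal equations need. The sketch below illustrates that idea in plain Python; it is not Microsoft R Server's implementation, and the `chunks` generator, the noise-free data, and the chunk size are assumptions standing in for blocks read from disk.

```python
def chunks(n_rows, chunk_size):
    """Yield (xs, ys) blocks; stands in for reading a large file block by block."""
    for start in range(0, n_rows, chunk_size):
        xs = [float(i) for i in range(start, min(start + chunk_size, n_rows))]
        ys = [5.0 + 0.25 * x for x in xs]  # hypothetical, noise-free relationship
        yield xs, ys

def fit_in_chunks(chunk_iter):
    """Fit y = intercept + slope * x while holding one chunk in memory at a time."""
    n = sx = sy = sxx = sxy = 0.0
    for xs, ys in chunk_iter:
        n += len(xs)
        sx += sum(xs)
        sy += sum(ys)
        sxx += sum(x * x for x in xs)
        sxy += sum(x * y for x, y in zip(xs, ys))
    # Solve the 2x2 normal equations from the accumulated sums.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

intercept, slope = fit_in_chunks(chunks(n_rows=10_000, chunk_size=1_000))
```

Because each chunk contributes only a handful of running sums, memory use depends on the chunk size rather than on the total number of rows.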

Open-source community

Revolution R Open

R Open

Enterprise users

Revolution R Enterprise

R Server

R Open

Microsoft R Server

DeployR

DevelopR

ConnectR

• High-speed & direct connectors

Available for:

• High-performance XDF
• SAS, SPSS, delimited & fixed format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC

ScaleR

• Ready-to-use high-performance big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Range of predictive functions
• User tools for distributing customized R algorithms across nodes
• Wide data sets supported – thousands of variables

DistributedR

• Distributed computing framework
• Delivers cross-platform portability

R+CRAN

• Open source R interpreter
• R 3.1.2
• Freely-available huge range of R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages

RevoR

• Performance enhanced R interpreter
• Based on open source R
• Adds high-performance math library to speed up linear algebra functions

Custom parallelization

PEMA-R API

rxDataStep

rxExec
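
The custom-parallelization idea, running the same user function over many inputs on a pool of workers and collecting the results, can be illustrated with a rough Python analogy. This is not the rxExec API: `run_everywhere` and `simulate_total` are hypothetical names, and a local thread pool stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_total(n_terms):
    """Hypothetical unit of work: sum of the squares 0^2 .. (n_terms - 1)^2."""
    return sum(i * i for i in range(n_terms))

def run_everywhere(func, arg_list, max_workers=4):
    """Apply func to each argument on a pool of workers; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, arg_list))

results = run_everywhere(simulate_total, [10, 100, 1000])
```

The caller supplies an ordinary function and a list of arguments; how the work is spread over workers stays hidden behind one call.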

Data step

Data import – Delimited, fixed, SAS, SPSS, ODBC

Variable creation & transformation

Recode variables

Factor variables

Missing value handling

Sort, merge, split

Aggregate by category (means, sums)

Descriptive statistics

Min/max, mean, median (approx.)

Quantiles (approx.)

Standard deviation

Variance

Correlation

Covariance

Sum of squares (cross-product matrix for set variables)

Pairwise cross tabs

Risk ratio & odds ratio

Cross-tabulation of data (standard tables & long form)

Marginal summaries of cross tabulations
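
Statistics such as the mean and variance can be computed per chunk and then merged, which is what makes them easy to parallelize. The following sketch uses the standard pairwise update for combining chunk summaries; it illustrates the technique only and makes no claim about how ScaleR computes these values. The two chunks are made-up data.

```python
def summarize(values):
    """Return (count, mean, m2) for one chunk; m2 is the sum of squared deviations."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2

def merge(a, b):
    """Combine two chunk summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2

chunk_1 = [2.0, 4.0, 4.0, 4.0]   # hypothetical data, first block
chunk_2 = [5.0, 5.0, 7.0, 9.0]   # hypothetical data, second block
n, mean, m2 = merge(summarize(chunk_1), summarize(chunk_2))
variance = m2 / (n - 1)          # sample variance of all eight values
```

Exact medians and quantiles cannot be merged this way, which is consistent with the list above marking them as approximate.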

Statistical tests

Chi Square Test

Kendall Rank Correlation

Fisher’s Exact Test

Student’s t-Test

Sampling

Subsample (observations & variables)

Random sampling

Predictive models

Sum of squares (cross-product matrix for set variables)

Multiple linear regression

Generalized linear models (GLM) exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions: cauchit, identity, log, logit, probit. User defined distributions & link functions.

Covariance & correlation matrices

Logistic regression

Classification & regression trees

Predictions/scoring for models

Residuals for all models

Simulation

Simulation (e.g., Monte Carlo)

Parallel random number generation
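
A Monte Carlo simulation parallelizes naturally when each worker draws from its own random stream. The sketch below estimates pi with four independently seeded generators. Using distinct seeds is a simplification chosen for the example (dedicated parallel generators give stronger guarantees about stream independence), and the seeds and draw counts are arbitrary assumptions.

```python
import random

def estimate_pi(seed, n_draws):
    """One worker's estimate: share of random points that land inside the unit circle."""
    rng = random.Random(seed)  # private generator, so workers do not share state
    hits = sum(
        1 for _ in range(n_draws)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * hits / n_draws

# Hypothetical setup: four workers, each with its own seed.
seeds = [101, 102, 103, 104]
estimates = [estimate_pi(seed, 50_000) for seed in seeds]
pi_hat = sum(estimates) / len(estimates)
```

The loop here runs sequentially, but each call depends only on its seed, so the calls could be dispatched to separate workers without changing the result.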

Cluster analysis

K-Means

Classification

Decision trees

Decision forests

Gradient-boosted decision trees

Naïve Bayes

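Data scientists: analyze data interactively

SQL developers/DBAs: manage data, analyze data

Extend to R

Examples: sales forecasting

Inventory optimization

Predictive maintenance

Credit card transaction protection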

Relational data

Analytics library

T-SQL interface

R integration

Built into SQL Server 2016

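Analyze transactional data in real time without moving the data

R combined with SQL Server's in-memory database

Data scientists: access data directly and interactively, and publish algorithms

Run R over the entire data set

Execute and test inside the database

Deploy to the local database

SQL developers: manage and analyze data together more easily

Easily call R scripts or models

Call R code from T-SQL

DBA

Manage data more conveniently

Unified performance management

Securely govern R execution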

-- List the extended events exposed by the SQLSatellite package
SELECT o.name, o.description
FROM sys.dm_xe_objects AS o
JOIN sys.dm_xe_packages AS p
  ON o.package_guid = p.guid
WHERE o.object_type = 'event' AND p.name = 'SQLSatellite'
ORDER BY o.name;

Sensors

Machines

Data Suppliers

Legacy Sources

Data Sources

EDW ERP/MRP

SQL Server

Azure Data Platform

Business Analysts

Power Analysts (R Studio, DevelopR, etc.)

Line of Business users (Analytic Apps, Rules Engines, etc.)

Analytics Consumers

Math Servers and Clusters

Data

Models

Execution

Ingest

Scored Data

Structured Data

Events Stream

Processing

Models

Edge Computing

Scores

Visualization

Big Data

• Transformation
• Aggregation
• Exploration
• Modeling
• Model Evaluation
• Data Scoring

Financial services, Insurance

Healthcare & pharma, Digital companies, Analytics service providers

Manufacturing & high tech

https://catalog.imagine.microsoft.com/en-us/Catalog/Product/105

https://msdn.microsoft.com/en-us/library/mt591993.aspx

https://blogs.msdn.microsoft.com/business-intelligence
