ownR platform technical introduction

Date post: 22-Jan-2018
Upload: david-kun
Page 1: ownR platform technical introduction

ownR Enterprise R platform

ownR Enterprise Analytics Platform for R and Python

A technical introduction

Page 2: ownR platform technical introduction

Motivation

1. Can new hires get set up in the environment to run analyses on their first day?

2. Can data scientists utilize the latest tools/packages without help from IT?

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?

The “Joel Test” for Data Science

Page 3: ownR platform technical introduction

Motivation

5. Does collaboration happen through a system other than email?

6. Can predictive models be deployed to production without custom engineering or infrastructure work?

7. Is there a single place to search for past research and reusable data sets, code, etc.?

8. Do your data scientists use the best tools money can buy?

Source: https://blog.dominodatalab.com/joel-test-data-science/

+1 Can you access your statistics without using R?

The “Joel Test” for Data Science

Page 4: ownR platform technical introduction

Ideal Process

From code to production

[Diagram: SCM, Package Repo, and three Prod environments]

Page 5: ownR platform technical introduction

Ideal Process

From code to production

[Diagram: SCM, CI, Package Repo, and three Prod environments]

CI

• Documentation
• Unit tests
• Build

Page 6: ownR platform technical introduction

Ideal Process

From code to production

[Diagram: SCM, CI, Package Repo, and three Prod environments]

CI

• Upload to repository

Page 7: ownR platform technical introduction

Ideal Process

From code to production

[Diagram: SCM, CI, Package Repo, and three Prod environments]

CI

• Deploy to separate applications
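
The CI steps listed on the preceding slides (documentation, unit tests, build, upload to the repository, deployment) can be pictured as an ordered pipeline that stops at the first failing stage. The following is only a minimal sketch under that assumption. The function `run_pipeline` and the placeholder stages are hypothetical and are not part of ownR or of any CI server.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, step in stages:
        if not step():
            break  # later stages depend on earlier ones, so stop here
        completed.append(name)
    return completed


# Hypothetical stand-ins for real CI jobs; each one simply succeeds.
stages = [
    ("documentation", lambda: True),
    ("unit tests", lambda: True),
    ("build", lambda: True),
    ("upload to repository", lambda: True),
    ("deploy to applications", lambda: True),
]

print(run_pipeline(stages))
```

Ordering the stages this way means a package reaches the repository only after its tests and build have passed, and an application is deployed only from an uploaded package.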

Page 8: ownR platform technical introduction

The ownR product suite

The enterprise analytics platform

Page 9: ownR platform technical introduction

laiR

1. Multiple repositories

• Shared and Private repositories

• CRAN, Bioconductor & public Python repos

2. Upload any tar.gz with a DESCRIPTION

3. Search packages

4. Dependency management across all repositories

5. Maintain multiple (historical) versions of the same package

6. Standard web application with zero-config installation

7. Integration with LDAP, Active Directory, etc.

The analytics application repository
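
Item 2 above relies on the fact that an R source package is a tar.gz archive whose top-level directory contains a DESCRIPTION file in "Field: value" form, with indented continuation lines. As a rough sketch of what a repository could read from an upload, the Python below extracts those fields. The helper names `parse_dcf`, `read_description` and `make_example_archive`, and the `demo` package, are hypothetical illustrations and not laiR's actual implementation.

```python
import io
import tarfile
from typing import Dict


def parse_dcf(text: str) -> Dict[str, str]:
    """Parse 'Field: value' lines; indented lines continue the previous field."""
    fields: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            current = name.strip()
            fields[current] = value.strip()
    return fields


def read_description(archive: bytes) -> Dict[str, str]:
    """Return the DESCRIPTION fields of a source package given as tar.gz bytes."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            parts = member.name.split("/")
            # Expect <package>/DESCRIPTION directly under the top-level folder.
            if member.isfile() and len(parts) == 2 and parts[1] == "DESCRIPTION":
                return parse_dcf(tar.extractfile(member).read().decode("utf-8"))
    raise ValueError("no top-level DESCRIPTION file found")


def make_example_archive() -> bytes:
    """Build a tiny in-memory package archive for demonstration (made-up package)."""
    description = (
        "Package: demo\n"
        "Version: 0.1.0\n"
        "Depends: R (>= 3.2.0),\n"
        "    stats\n"
    ).encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="demo/DESCRIPTION")
        info.size = len(description)
        tar.addfile(info, io.BytesIO(description))
    return buffer.getvalue()


fields = read_description(make_example_archive())
print(fields["Package"], fields["Version"])
```

The `Package`, `Version` and `Depends` fields are the ones a repository would most plausibly need for search, version history and dependency management.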

Page 10: ownR platform technical introduction

laiR

1. Can new hires get set up in the environment to run analyses on their first day?

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?

The “Joel Test” for Data Science

Page 11: ownR platform technical introduction

laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work?

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R?

The “Joel Test” for Data Science

Page 12: ownR platform technical introduction

roveR

1. Preconfigured, separated R environments

2. Open-source R package

3. Linking R environments to laiR installations

4. Release packages into laiR

5. Install dependencies with specific versions using laiR API

6. Integrate R application deployment with Jenkins, Hudson, etc.

Note: Python solves this with virtualenv, YAML and pip without third-party tools.

R container management
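
Item 5 above (installing dependencies with specific versions) can be illustrated with a small resolver over an in-memory index. This is a sketch under stated assumptions only: the index layout, the repository names `private` and `shared`, the package names, and the functions `locate` and `install_plan` are all hypothetical and do not reflect the real roveR or laiR API. It also assumes the dependency graph has no cycles.

```python
from typing import Dict, List, Optional, Tuple

# repository -> package -> version -> pinned (dependency, version) pairs
Index = Dict[str, Dict[str, Dict[str, List[Tuple[str, str]]]]]
Step = Tuple[str, str, str]  # (repository, package, version)


def locate(index: Index, name: str, version: str) -> str:
    """Return the first repository (in index order) holding this exact version."""
    for repo, packages in index.items():
        if version in packages.get(name, {}):
            return repo
    raise LookupError(f"{name} {version} not found in any repository")


def install_plan(index: Index, name: str, version: str,
                 plan: Optional[List[Step]] = None) -> List[Step]:
    """Build a dependencies-first list of exact versions to install."""
    if plan is None:
        plan = []
    repo = locate(index, name, version)
    for dep_name, dep_version in index[repo][name][version]:
        install_plan(index, dep_name, dep_version, plan)
    step = (repo, name, version)
    if step not in plan:  # a shared dependency is installed only once
        plan.append(step)
    return plan


# Made-up index: two historical versions of one package in a private repository.
index: Index = {
    "private": {
        "riskmodel": {
            "1.0.0": [("basestats", "0.3.1")],
            "1.1.0": [("basestats", "0.4.0")],
        }
    },
    "shared": {"basestats": {"0.3.1": [], "0.4.0": []}},
}

print(install_plan(index, "riskmodel", "1.0.0"))
```

Because every dependency is pinned to an exact version, requesting `riskmodel` 1.0.0 yields the same plan each time, which is the property that makes past results reproducible.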

Page 13: ownR platform technical introduction

roveR + laiR

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Page 14: ownR platform technical introduction

roveR + laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Page 15: ownR platform technical introduction

exposeR

1. Access selected R & Python calculations via REST API

2. roveR container management for R - CRUD

3. Select functions to expose in packages / modules in container

4. Allows other applications to integrate analytics calculations developed in R or Python without porting to a third language

5. R & Python in a dedicated server environment

6. Zero-config installation

7. Secure access using API keys

REST API for R containers
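
The combination of items 3 and 7 above (only selected functions are exposed, and callers authenticate with an API key) can be sketched as a small dispatch function that maps a request to an HTTP-style status code and a JSON body. This is an illustrative assumption about how such a layer might behave, not exposeR's actual interface. The names `handle_call` and `mean`, and the key `secret-key`, are hypothetical.

```python
import hmac
import json
from typing import Any, Callable, Dict, Set, Tuple


def handle_call(exposed: Dict[str, Callable[..., Any]], api_keys: Set[str],
                api_key: str, function_name: str, body: str) -> Tuple[int, str]:
    """Check the API key, look up an exposed function, call it with JSON arguments."""
    # Constant-time comparison avoids leaking key contents through timing.
    if not any(hmac.compare_digest(api_key, known) for known in api_keys):
        return 401, json.dumps({"error": "invalid API key"})
    func = exposed.get(function_name)
    if func is None:
        # Functions that were not explicitly selected stay unreachable.
        return 404, json.dumps({"error": "function not exposed"})
    try:
        kwargs = json.loads(body) if body else {}
        result = func(**kwargs)
    except (ValueError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the function signature.
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})


def mean(values):
    """Example calculation standing in for an R or Python model function."""
    return sum(values) / len(values)


exposed = {"mean": mean}
api_keys = {"secret-key"}

print(handle_call(exposed, api_keys, "secret-key", "mean", '{"values": [1, 2, 3]}'))
```

A calling application only needs the function name, JSON arguments and a key, which is why it does not have to port the calculation to its own language.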

Page 16: ownR platform technical introduction

exposeR + roveR + laiR

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Page 17: ownR platform technical introduction

exposeR + roveR + laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Page 18: ownR platform technical introduction

How does it work?

Division of labour

Page 19: ownR platform technical introduction

Reference Implementation

The Delta-Lloyd case study

R DEPLOYMENT PROCESS Functional Finances Ltd | January 13, 2016

[Diagram: R deployment process in four swimlanes. Labels recovered from the slide, grouped by swimlane as far as the extraction order allows:]

• Sandbox (Model Development Start): R Container (MPI grid), roveR::create, GIT (clone, checkout, commit), test, build, install
• Development (Development Start): R Container, roveR::create, GIT (commit), Jenkins (checkout), test, build, install, roveR::release, roveR::install, laiR
• Test (Test Start): R Container, roveR::create, roveR::install, Jenkins, Shiny, CLI, execute, download, release, laiR
• Production (Production Start): R Container (MPI grid), roveR::create, roveR::install, Shiny, CLI, execute

Page 20: ownR platform technical introduction

Reference Implementation

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops? ✅

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Page 21: ownR platform technical introduction

Reference Implementation

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy? ✅

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Page 22: ownR platform technical introduction

Questions?

[email protected]

