ownR platform technical introduction

Posted on 22-Jan-2018


ownR Enterprise R platform

ownR Enterprise Analytics Platform for R and Python

A technical introduction

Motivation

1. Can new hires get set up in the environment to run analyses on their first day?

2. Can data scientists utilize the latest tools/packages without help from IT?

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?

The “Joel Test” for Data Science

Motivation

5. Does collaboration happen through a system other than email?

6. Can predictive models be deployed to production without custom engineering or infrastructure work?

7. Is there a single place to search for past research and reusable data sets, code, etc.?

8. Do your data scientists use the best tools money can buy?

Source: https://blog.dominodatalab.com/joel-test-data-science/

+1 Can you access your statistics without using R?

The “Joel Test” for Data Science

Ideal Process

From code to production

SCM → Package Repo → Prod, Prod, Prod

CI

• Documentation

• Unit tests

• Build

• Upload to repository

• Deploy to separate applications
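The process above (SCM, CI running documentation, unit tests and build, upload to the package repository, then deployment to several separate applications) amounts to a fail-fast sequence of stages. The Python sketch below illustrates only that control flow and is not ownR code: `Artifact`, the stage functions and the application names are hypothetical, the package repository is an in-memory dict, and in a real R pipeline the placeholder stages would call tools such as `R CMD build` and `R CMD check`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Artifact:
    """A package version moving from SCM to production (hypothetical type)."""
    name: str
    version: str
    log: List[str] = field(default_factory=list)


# A stage takes the artifact and returns True on success.
Stage = Callable[[Artifact], bool]


def run_pipeline(artifact: Artifact, stages: List[Tuple[str, Stage]]) -> bool:
    """Run the stages in order and stop at the first failure (fail fast)."""
    for name, stage in stages:
        ok = stage(artifact)
        artifact.log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False
    return True


# CI placeholders: a real pipeline would run e.g. R CMD build / R CMD check here.
def documentation(a: Artifact) -> bool:
    return True


def unit_tests(a: Artifact) -> bool:
    return True


def build(a: Artifact) -> bool:
    return True


def upload(repo: Dict[str, List[str]]) -> Stage:
    """Record the built version in an in-memory stand-in for the package repo."""
    def stage(a: Artifact) -> bool:
        repo.setdefault(a.name, []).append(a.version)
        return True
    return stage


def deploy(apps: Dict[str, str], targets: List[str]) -> Stage:
    """Point every target application at the same released version."""
    def stage(a: Artifact) -> bool:
        for app in targets:
            apps[app] = f"{a.name} {a.version}"
        return True
    return stage


if __name__ == "__main__":
    repo: Dict[str, List[str]] = {}
    apps: Dict[str, str] = {}
    art = Artifact("mymodel", "1.2.0")  # hypothetical package
    ok = run_pipeline(art, [
        ("documentation", documentation),
        ("unit tests", unit_tests),
        ("build", build),
        ("upload", upload(repo)),
        ("deploy", deploy(apps, ["prod-a", "prod-b", "prod-c"])),
    ])
    print(ok, repo, apps)
```

If any stage returns False, the later stages never run, so a package that fails its unit tests is neither uploaded nor deployed. Deploying one recorded version to every target is what keeps the separate Prod applications consistent.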

The ownR product suite

The enterprise analytics platform

laiR

1. Multiple repositories

• Shared and private repositories

• CRAN, Bioconductor & public Python repos

2. Upload any package tarball (tar.gz) that contains a DESCRIPTION file

3. Search packages

4. Dependency management across all repositories

5. Maintain multiple (historical) versions of the same package

6. Standard web application with zero-config installation

7. Integration with LDAP, Active Directory, etc.

The analytics application repository
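laiR accepts any tar.gz that contains a DESCRIPTION file. R's DESCRIPTION files use the Debian Control File (DCF) format: one `Field: value` per line, indented lines continuing the previous field, and dependency fields such as `Imports` listing packages with optional version constraints. The Python sketch below shows how a repository might read that metadata and keep several versions of one package side by side. `RepoIndex` and its methods are hypothetical and are not laiR's implementation.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_dcf(text: str) -> Dict[str, str]:
    """Parse a single DCF record such as an R DESCRIPTION file."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Indented line: continuation of the previous field.
            fields[current] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


# "name" or "name (constraint)", e.g. "dplyr (>= 1.0.0)".
_DEP = re.compile(r"^\s*([A-Za-z][A-Za-z0-9.]*)\s*(?:\(([^)]*)\))?\s*$")


def parse_deps(value: str) -> List[Tuple[str, Optional[str]]]:
    """Split a Depends/Imports value into (package, constraint-or-None) pairs."""
    deps: List[Tuple[str, Optional[str]]] = []
    for item in value.split(","):
        match = _DEP.match(item)  # empty items (trailing commas) do not match
        if match:
            constraint = match.group(2)
            deps.append((match.group(1), constraint.strip() if constraint else None))
    return deps


def version_key(version: str) -> Tuple[int, ...]:
    """R versions are integers separated by '.' or '-', e.g. 1.2-3."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


class RepoIndex:
    """In-memory index keeping every uploaded version (hypothetical, not laiR)."""

    def __init__(self) -> None:
        self._pkgs: Dict[str, Dict[str, Dict[str, str]]] = {}

    def add(self, description: str) -> Tuple[str, str]:
        meta = parse_dcf(description)
        name, version = meta["Package"], meta["Version"]
        self._pkgs.setdefault(name, {})[version] = meta  # older versions are kept
        return name, version

    def versions(self, name: str) -> List[str]:
        """All stored versions in numeric (not string) order."""
        return sorted(self._pkgs.get(name, {}), key=version_key)

    def search(self, term: str) -> List[str]:
        """Case-insensitive match on package name or Title."""
        term = term.lower()
        return sorted(
            name for name, versions in self._pkgs.items()
            if term in name.lower()
            or any(term in meta.get("Title", "").lower() for meta in versions.values())
        )
```

For example, adding DESCRIPTION files for versions 1.10.0 and 1.9.2 of the same package keeps both, and `versions()` returns `['1.9.2', '1.10.0']`, which plain string sorting would get wrong.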

laiR

1. Can new hires get set up in the environment to run analyses on their first day?

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?

The “Joel Test” for Data Science

laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work?

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R?

The “Joel Test” for Data Science

roveR

1. Preconfigured, separated R environments

2. Open-source R package

3. Linking R environments to laiR installations

4. Release packages into laiR

5. Install dependencies with specific versions using laiR API

6. Integrate R application deployment with Jenkins, Hudson, etc.

Note: Python solves this with virtualenv, YAML and pip, without third-party tools.

R container management
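roveR installs dependencies at specific versions through the laiR API, across several repositories. How roveR and laiR resolve this internally is not described in these slides; the Python sketch below is only an assumed illustration of the general technique: honour exact pins, otherwise take the newest version from the first repository (private before public) that has the package, and order the result so that dependencies are installed before the packages that need them. The `Repos` layout, the repository names and the package data are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# repository -> package -> version -> names of the packages it depends on
Repos = Dict[str, Dict[str, Dict[str, List[str]]]]


def version_key(version: str) -> Tuple[int, ...]:
    return tuple(int(p) for p in version.replace("-", ".").split("."))


def pick(repos: Repos, order: List[str], pkg: str,
         pinned: Optional[str]) -> Tuple[str, str]:
    """Return (repository, version): the pinned version if given, else the
    newest, taken from the first repository in `order` that can supply it."""
    for repo in order:
        versions = repos[repo].get(pkg, {})
        if pinned is not None and pinned in versions:
            return repo, pinned
        if pinned is None and versions:
            return repo, max(versions, key=version_key)
    raise LookupError(f"cannot find {pkg} {pinned or '(any version)'}")


def install_plan(repos: Repos, order: List[str], roots: List[str],
                 pins: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Depth-first resolution: every dependency is listed before its dependants."""
    plan: List[Tuple[str, str, str]] = []
    done: Set[str] = set()
    visiting: Set[str] = set()

    def visit(pkg: str) -> None:
        if pkg in done:
            return
        if pkg in visiting:
            raise ValueError(f"dependency cycle involving {pkg}")
        visiting.add(pkg)
        repo, version = pick(repos, order, pkg, pins.get(pkg))
        for dep in repos[repo][pkg][version]:
            visit(dep)
        visiting.remove(pkg)
        done.add(pkg)
        plan.append((pkg, version, repo))

    for root in roots:
        visit(root)
    return plan


if __name__ == "__main__":
    repos: Repos = {  # illustrative data only
        "private": {"riskmodel": {"1.2.0": ["dplyr"]}},
        "cran": {"dplyr": {"1.0.10": ["rlang"], "1.1.4": ["rlang"]},
                 "rlang": {"1.1.0": []}},
    }
    # dplyr is pinned to an older release; rlang is unpinned, so the newest is used.
    print(install_plan(repos, ["private", "cran"], ["riskmodel"], {"dplyr": "1.0.10"}))
    # [('rlang', '1.1.0', 'cran'), ('dplyr', '1.0.10', 'cran'), ('riskmodel', '1.2.0', 'private')]
```

A production resolver must also honour range constraints such as `(>= 1.0.0)` from DESCRIPTION files; this sketch only honours exact pins.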

roveR + laiR

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

roveR + laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

exposeR

1. Access selected R & Python calculations via a REST API

2. roveR container management for R (CRUD)

3. Select which functions from the packages/modules in a container are exposed

4. Allows other applications to integrate analytics calculations developed in R or Python without porting them to a third language

5. R & Python in a dedicated server environment

6. Zero-config installation

7. Secure access using API keys

REST API for R containers
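exposeR lets other applications call selected R or Python functions over REST, secured with API keys. The Python sketch below shows the general pattern with the standard-library WSGI tooling: a whitelist of exposed functions, an API-key check, and JSON in and out. The `/call/<function>` route, the `X-API-Key` header, the `demo-key` value and the `mean` function are all assumptions for illustration, not exposeR's actual API.

```python
import json
from typing import Callable, Dict, Iterable, List
from wsgiref.simple_server import make_server


# Hypothetical analytics function chosen for exposure.
def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Only functions listed here can be called over HTTP.
EXPOSED: Dict[str, Callable] = {"mean": mean}
# Hypothetical key store; a real deployment would not hard-code keys.
API_KEYS = {"demo-key"}


def app(environ, start_response) -> Iterable[bytes]:
    """WSGI app: POST /call/<function> with a JSON object of keyword arguments."""

    def reply(status: str, payload: dict) -> List[bytes]:
        body = json.dumps(payload).encode("utf-8")
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(body)))])
        return [body]

    if environ.get("HTTP_X_API_KEY") not in API_KEYS:
        return reply("401 Unauthorized", {"error": "missing or invalid API key"})
    path = environ.get("PATH_INFO", "")
    if environ.get("REQUEST_METHOD") != "POST" or not path.startswith("/call/"):
        return reply("404 Not Found", {"error": "unknown endpoint"})
    func = EXPOSED.get(path[len("/call/"):])
    if func is None:
        return reply("404 Not Found", {"error": "function not exposed"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        kwargs = json.loads(environ["wsgi.input"].read(length) or b"{}")
        return reply("200 OK", {"result": func(**kwargs)})
    except (ValueError, TypeError, ZeroDivisionError) as exc:
        # Bad JSON, wrong arguments or a failing calculation.
        return reply("400 Bad Request", {"error": str(exc)})


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the development server until interrupted."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

After calling `serve()`, a POST to http://127.0.0.1:8000/call/mean with the header `X-API-Key: demo-key` and the body `{"values": [1, 2, 3]}` returns `{"result": 2.0}`. A request without a valid key gets 401, and any function not listed in `EXPOSED` gets 404, so callers can reach only what was deliberately selected.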

exposeR + roveR + laiR

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

exposeR + roveR + laiR

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy?

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

How does it work?

Division of labour

Reference Implementation

The Delta-Lloyd case study

[Figure: R DEPLOYMENT PROCESS, Functional Finances Ltd, January 13, 2016. Swimlanes: Development, Test, Production, Sandbox. Development: roveR::create an R container, GIT commit, then test, build, install; Jenkins checks out the code, repeats test, build, install and calls roveR::release to publish to laiR. Test (started by Jenkins): roveR::create an R container, roveR::install from laiR, execute via Shiny and CLI. Production: roveR::create an R container (MPI grid), roveR::install from laiR, execute via Shiny and CLI. Sandbox (Model Development): roveR::create an R container (MPI grid), GIT clone, checkout and commit, test, build, install, roveR::install.]
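In the reference implementation each stage (Development, Test, Production) gets its own R container created with roveR::create, and the Test and Production containers install their packages from laiR. One way to picture the reproducibility this gives is the promotion of an exact package manifest from one container to the next. The Python sketch below is a hypothetical model of that idea; `Container`, `promote` and the fingerprint scheme are not part of roveR. It refuses any package version that was never released to the repository, and identical manifests get identical fingerprints.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class Container:
    """An isolated environment described by an exact package -> version manifest."""
    name: str
    packages: Dict[str, str] = field(default_factory=dict)

    def fingerprint(self) -> str:
        """Short SHA-256 of the manifest; identical manifests give identical values."""
        blob = json.dumps(self.packages, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()[:12]


def promote(source: Container, target: Container,
            released: Set[Tuple[str, str]]) -> None:
    """Give `target` exactly the versions in `source`, but only released ones."""
    missing = sorted(set(source.packages.items()) - released)
    if missing:
        raise ValueError(f"not released to the package repository: {missing}")
    target.packages = dict(source.packages)  # copy, so later edits do not leak


if __name__ == "__main__":
    released = {("riskmodel", "1.2.0"), ("dplyr", "1.0.10")}  # illustrative
    dev = Container("development", {"riskmodel": "1.2.0", "dplyr": "1.0.10"})
    test, prod = Container("test"), Container("production")
    promote(dev, test, released)
    promote(test, prod, released)
    print(prod.fingerprint() == dev.fingerprint())  # True
```

Comparing fingerprints across containers is a cheap way to confirm that Test and Production run exactly the package set that was validated in Development.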

Reference Implementation

1. Can new hires get set up in the environment to run analyses on their first day? ✅

2. Can data scientists utilize the latest tools/packages without help from IT? ✅

3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops? ✅

4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅

The “Joel Test” for Data Science

Reference Implementation

5. Does collaboration happen through a system other than email? ✅

6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅

7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅

8. Do your data scientists use the best tools money can buy? ✅

+1 Can you access your statistics without using R? ✅

The “Joel Test” for Data Science

Questions?

info@functionalfinances.com