ownR Enterprise R platform
ownR Enterprise Analytics Platform for R and Python
A technical introduction
Motivation
1. Can new hires get set up in the environment to run analyses on their first day?
2. Can data scientists utilize the latest tools/packages without help from IT?
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?
The “Joel Test” for Data Science
Motivation
5. Does collaboration happen through a system other than email?
6. Can predictive models be deployed to production without custom engineering or infrastructure work?
7. Is there a single place to search for past research and reusable data sets, code, etc.?
8. Do your data scientists use the best tools money can buy?
Source: https://blog.dominodatalab.com/joel-test-data-science/
+1 Can you access your statistics without using R?
The “Joel Test” for Data Science
Ideal Process
From code to production
SCM Package Repo
Prod, Prod, Prod
Ideal Process
From code to production
SCM Package Repo
Prod, Prod, Prod
CI
• Documentation
• Unit tests
• Build
Ideal Process
From code to production
SCM Package Repo
Prod, Prod, Prod
CI
• Upload to repository
Ideal Process
From code to production
SCM Package Repo
Prod, Prod, Prod
CI
• Deploy to separate applications
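The CI steps named on these slides (documentation, unit tests, build, upload to repository, deploy) can be sketched as an ordered pipeline that stops at the first failing stage. This is a minimal illustration only: `Stage`, `run_pipeline` and the placeholder stage bodies are hypothetical names, not part of ownR or of any CI server.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True when the stage succeeds


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure.

    Returns (overall success, names of the stages that completed).
    """
    completed: List[str] = []
    for stage in stages:
        if not stage.run():
            return False, completed
        completed.append(stage.name)
    return True, completed


# Hypothetical stage body; a real CI job would call the build tooling here.
def always_ok() -> bool:
    return True


pipeline = [
    Stage("documentation", always_ok),
    Stage("unit tests", always_ok),
    Stage("build", always_ok),
    Stage("upload to repository", always_ok),
    Stage("deploy to separate applications", always_ok),
]

if __name__ == "__main__":
    ok, done = run_pipeline(pipeline)
    print(ok, done)
```

Stopping at the first failure means a package whose unit tests fail is never uploaded or deployed.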
The ownR product suite
The enterprise analytics platform
laiR
1. Multiple repositories
• Shared and Private repositories
• CRAN, Bioconductor & public Python repos
2. Upload any tar.gz with a DESCRIPTION
3. Search packages
4. Dependency management across all repositories
5. Maintain multiple (historical) versions of the same package
6. Standard web application with zero-config installation
7. Integration with LDAP, Active Directory, etc.
The analytics application repository
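To make points 1, 4 and 5 concrete, here is a rough sketch of how a lookup across several repositories with multiple historical versions could behave. It is not laiR's implementation or API: the index layout, the `find_package` function, the search order and the package names `riskmodel` and `reporting` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index: repository name -> package name -> available versions.
Index = Dict[str, Dict[str, List[str]]]


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.10.2' (or R-style '1.10-2') into (1, 10, 2) for numeric comparison."""
    return tuple(int(part) for part in version.replace("-", ".").split("."))


def find_package(index: Index, search_order: List[str], name: str,
                 version: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (repository, version) from the first repository that can serve the request.

    With version=None the newest version in that repository is chosen;
    otherwise only an exact match counts, so historical versions stay installable.
    """
    for repo in search_order:
        versions = index.get(repo, {}).get(name, [])
        if version is None and versions:
            return repo, max(versions, key=version_key)
        if version in versions:
            return repo, version
    return None


# Illustrative data only.
index: Index = {
    "private": {"riskmodel": ["0.9.0", "1.0.0", "1.2.0"]},
    "shared": {"riskmodel": ["1.1.0"], "reporting": ["2.0.1"]},
    "CRAN": {"data.table": ["1.9.6", "1.10.0"]},
}
order = ["private", "shared", "CRAN"]

if __name__ == "__main__":
    print(find_package(index, order, "riskmodel"))
    print(find_package(index, order, "riskmodel", "1.1.0"))
```

Searching private repositories before public ones is one possible policy; the slides do not state which order laiR uses.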
laiR
1. Can new hires get set up in the environment to run analyses on their first day?
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions?
The “Joel Test” for Data Science
laiR
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work?
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R?
The “Joel Test” for Data Science
roveR
1. Preconfigured, separated R environments
2. Open-source R package
3. Linking R environments to laiR installations
4. Release packages into laiR
5. Install dependencies with specific versions using laiR API
6. Integrate R application deployment with Jenkins, Hudson, etc.
Note: Python solves this with virtualenv, YAML and pip, without third-party tools.
R container management
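Point 5 (installing dependencies with specific versions) amounts to comparing a pinned specification with what an environment currently contains. The sketch below shows that comparison in isolation; `install_plan` and the package data are hypothetical and say nothing about how roveR or the laiR API actually do this.

```python
from typing import Dict, List, Tuple


def install_plan(pinned: Dict[str, str],
                 installed: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Compare pinned versions with what the environment has.

    Returns (action, package, version) tuples sorted by package name, where
    action is 'install' for a missing package and 'replace' for a version
    mismatch. Packages already at the pinned version are left alone.
    """
    plan: List[Tuple[str, str, str]] = []
    for package, wanted in sorted(pinned.items()):
        current = installed.get(package)
        if current is None:
            plan.append(("install", package, wanted))
        elif current != wanted:
            plan.append(("replace", package, wanted))
    return plan


# Illustrative data only.
pinned = {"data.table": "1.9.6", "riskmodel": "1.0.0", "ggplot2": "2.0.0"}
installed = {"data.table": "1.9.6", "riskmodel": "0.9.0"}

if __name__ == "__main__":
    for step in install_plan(pinned, installed):
        print(step)
```

Because the plan depends only on the pinned specification, two separated environments built from the same specification end up with the same package versions.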
roveR + laiR
1. Can new hires get set up in the environment to run analyses on their first day? ✅
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅
The “Joel Test” for Data Science
roveR + laiR
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R? ✅
The “Joel Test” for Data Science
exposeR
1. Access selected R & Python calculations via REST API
2. roveR container management for R (CRUD: create, read, update, delete)
3. Select which functions in a container's packages / modules to expose
4. Allows other applications to integrate analytics calculations developed in R or Python without porting them to a third language
5. R & Python in a dedicated server environment
6. Zero-config installation
7. Secure access using API keys
REST API for R containers
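Points 1, 3 and 7 together describe a request flow: check the API key, check that the requested function was selected for exposure, then run the calculation and return the result. A minimal sketch of that flow, with no real HTTP server, is shown below. The names `handle_call`, `EXPOSED` and `API_KEYS`, the status codes and the JSON shapes are assumptions for illustration and not exposeR's actual interface.

```python
import json
from typing import Any, Callable, Dict, List, Set, Tuple


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def internal_helper(values: List[float]) -> List[float]:
    return sorted(values)


# Only functions listed here are reachable; internal_helper is deliberately absent.
EXPOSED: Dict[str, Callable[..., Any]] = {"mean": mean}
API_KEYS: Set[str] = {"demo-key"}  # hypothetical key store


def handle_call(api_key: str, function: str, body: str) -> Tuple[int, str]:
    """Return an (HTTP-style status, JSON body) pair for one call."""
    if api_key not in API_KEYS:
        return 401, json.dumps({"error": "invalid API key"})
    target = EXPOSED.get(function)
    if target is None:
        return 404, json.dumps({"error": "function not exposed"})
    try:
        kwargs = json.loads(body)   # arguments arrive as a JSON object
        result = target(**kwargs)
    except (ValueError, TypeError, ZeroDivisionError) as exc:
        return 400, json.dumps({"error": str(exc)})
    return 200, json.dumps({"result": result})


if __name__ == "__main__":
    print(handle_call("demo-key", "mean", '{"values": [1, 2, 3]}'))
    print(handle_call("demo-key", "internal_helper", '{"values": [3, 1]}'))
```

The calling application only needs to send JSON and read JSON, which is what lets it use the calculation without knowing the language it was written in.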
exposeR + roveR + laiR
1. Can new hires get set up in the environment to run analyses on their first day? ✅
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops?
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅
The “Joel Test” for Data Science
exposeR + roveR + laiR
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy?
+1 Can you access your statistics without using R? ✅
The “Joel Test” for Data Science
How does it work?
Division of labour
Reference Implementation
The Delta-Lloyd case study
[Figure: R deployment process (Functional Finances Ltd, January 13, 2016). Lanes: Development, Test, Production, Sandbox. Components: R Container, R Container (MPI grid), GIT, Jenkins, laiR, Shiny, CLI. Labelled steps: roveR::create, roveR::install, roveR::release, commit, clone, checkout, "test, build, install", download, release, execute.]
Reference Implementation
1. Can new hires get set up in the environment to run analyses on their first day? ✅
2. Can data scientists utilize the latest tools/packages without help from IT? ✅
3. Can data scientists use on-demand and scalable compute resources without help from IT/dev ops? ✅
4. Can data scientists find and reproduce past experiments and results, using the original code, data, parameters, and software versions? ✅
The “Joel Test” for Data Science
Reference Implementation
5. Does collaboration happen through a system other than email? ✅
6. Can predictive models be deployed to production without custom engineering or infrastructure work? ✅
7. Is there a single place to search for past research and reusable data sets, code, etc.? ✅
8. Do your data scientists use the best tools money can buy? ✅
+1 Can you access your statistics without using R? ✅
The “Joel Test” for Data Science