DF1 - R - Roark - H2O Overview

Date post: 16-Jan-2017
Upload: moscowdatafest
Page 1: DF1 - R - Roark - H2O Overview

H2O.ai
Machine Intelligence

ML is the new SQL
Prediction is the new Search

Page 2: DF1 - R - Roark - H2O Overview

Company Overview

Company

• Founded: 2011, venture-backed
• Team: 40
• Distributed Systems Engineers doing Machine Learning
• HQ: Mountain View, CA

Product

• Fast, scalable machine and deep learning
• Predictive analytics
• Open Source Applications

Page 3: DF1 - R - Roark - H2O Overview

Executive Team

Sri Satish Ambati, CEO & Co-founder
Cliff Click, CTO & Co-founder
Tom Kraljevic, VP of Engineering
Oleg Rogynskyy, VP of Marketing

Backgrounds: DataStax; Sun, Java Hotspot; Abrizio, Intel; Lexalytics

Board of Directors
Jishnu Bhattacharjee // Nexus Ventures
Ash Bhardwaj // Flextronics

Scientific Advisory Council
Trevor Hastie
Stephen Boyd
Rob Tibshirani

Page 4: DF1 - R - Roark - H2O Overview

H2O Product Overview

H2O is: Open Source, Distributed, In-Memory, Predictive Analytics Platform!

Speed Matters!

• Time is valuable
• In-memory is faster
• Intelligence as a service
• High speed AND accuracy

No Sampling

• Scale to big data
• Access data links
• Use all data without sampling

Interactive UI

• Online modeling with H2O Flow
• Model comparison
• R, Python, Web, REST API

Cutting-Edge Algos

• Suite of cutting-edge algorithms
• Deep Learning
• NanoFast Scoring Engine
• Move from model to production extremely fast

Page 5: DF1 - R - Roark - H2O Overview

Algorithms on H2O

Supervised Learning

Statistical Analysis

• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Cox Proportional Hazards Models
• Naïve Bayes

Ensembles

• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations

Deep Neural Networks

• Deep Learning: Creates multi-layer feed-forward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations
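
The "increasingly refined approximations" idea behind a Gradient Boosting Machine can be illustrated with a toy sketch. This is not H2O's implementation: it assumes a single numeric feature, squared-error loss, one-split trees (stumps), and at least two distinct feature values. The names `fit_stump`, `fit_gbm`, `predict` and `mse` are hypothetical helpers made up for this example.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Pick the threshold on x whose two-sided means best fit the residuals."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = mean(left), mean(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv

def predict(model, x):
    """Base value plus the shrunken sum of all stump outputs."""
    base, rate, stumps = model
    return base + rate * sum(lv if x <= t else rv for t, lv, rv in stumps)

def fit_gbm(xs, ys, n_trees=50, rate=0.1):
    """Each new stump is fitted to what the current ensemble still gets wrong."""
    base = mean(ys)
    stumps = []
    for _ in range(n_trees):
        current = (base, rate, stumps)
        residuals = [y - predict(current, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, rate, stumps

def mse(model, xs, ys):
    """Mean squared training error of the ensemble."""
    return mean((y - predict(model, x)) ** 2 for x, y in zip(xs, ys))
```

On a small training set, the training error should shrink as trees are added, which is the sense in which each tree refines the approximation of the ones before it.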

Page 6: DF1 - R - Roark - H2O Overview

Algorithms on H2O

Unsupervised Learning

Clustering

• K-means: Partitions observations into k clusters/groups of the same spatial size

Dimensionality Reduction

• Principal Component Analysis: Linearly transforms correlated variables to uncorrelated components

Anomaly Detection

• Autoencoders: Find outliers using nonlinear dimensionality reduction with deep learning
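
As a rough illustration of the K-means bullet, here is a minimal single-machine sketch of Lloyd's algorithm. It is not H2O's distributed implementation. The initialization (first k points as starting centroids) is a simplifying assumption, and `kmeans` is a hypothetical name for this example.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points (assumption)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Coordinate-wise mean of the cluster's members.
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: dist(p, centroids[i])) for p in points]
    return centroids, labels
```

For example, `kmeans([(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)], k=2)` separates the points near the origin from the points near (10, 10).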

Page 7: DF1 - R - Roark - H2O Overview

Accuracy with Speed and Scale

Page 8: DF1 - R - Roark - H2O Overview

Accuracy with Speed and Scale

[Architecture diagram]

• Data sources: HDFS, S3, SQL, NoSQL
• Fast Modeling Engine: In-Memory, Map Reduce/Fork Join, Columnar Compression
• Modeling: Classification, Regression, Feature Engineering, Munging, Deep Learning, PCA, GLM, Cox, Random Forest / GBM Ensembles, Matrix Factorization, Clustering
• Output: Streaming, Nano Fast Java Scoring Engines

Page 9: DF1 - R - Roark - H2O Overview

H2O: The Killer-App for Spark

[Diagram] HDFS = DATA; MLlib, H2O, SQL; H2ORDD

• In-Memory: Big Data, Columnar
• ML: 100x faster Algos
• R: CRAN, API, fast engine
• API: Spark API, Java MM
• Community: Devs, Data Science

Page 10: DF1 - R - Roark - H2O Overview

H2O Flow Interface

Page 11: DF1 - R - Roark - H2O Overview

Adoption and Growth

• 125 Meetups
• 15,000 Attendees
• 13,200+ Installations
• 2,300+ Corporations
• 1st annual H2O World Conference

Page 12: DF1 - R - Roark - H2O Overview

H2O’s Install Base
ML is the new SQL

Active Users

             Mar 2014    July 2014    Mar 2015
Companies    103         634          2,329
Users        463         2,887        13,237

• Open Source
• 135+ Meetups
• Word-of-Mouth

Page 13: DF1 - R - Roark - H2O Overview

Sparkling Water cluster of size 3 on YARN

[Diagram] Hadoop + HDFS

• Driver: Scala main program; client + MLlib client
• 3 × YARN node manager, each with a YARN container running a Spark executor (worker + MLlib worker)

Page 14: DF1 - R - Roark - H2O Overview

H2O Software Stack

[Layered diagram, top to bottom]

• Clients: JavaScript, R, Python, Excel/Tableau, Flow
• Network
• Rapids Expression Evaluation Engine, Scala, Customer Algorithm
• Algorithms: Parse, GLM, GBM, RF, Deep Learning, K-Means, PCA, Customer Algorithm
• In-H2O Prediction Engine
• Fluid Vector Frame, Distributed K/V Store, Non-blocking Hash Map
• Job, MRTask, Fork/Join
• Runs on: Spark, Hadoop, Standalone H2O
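
The MRTask and Fork/Join layers of the stack can be pictured with a small map/reduce sketch over column chunks. This is only an analogy in plain Python and not H2O's MRTask API: a thread pool stands in for Fork/Join, and `split_into_chunks`, `map_chunk`, `reduce_partials` and `distributed_mean` are hypothetical names. It assumes a non-empty column.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split_into_chunks(column, chunk_size):
    """Cut one column into contiguous chunks, as a columnar store might."""
    return [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]

def map_chunk(chunk):
    """Map step: each chunk independently produces a small partial result."""
    return (sum(chunk), len(chunk))

def reduce_partials(a, b):
    """Reduce step: combine two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(column, chunk_size=4, workers=3):
    """Fork the map step across workers, then join and reduce the partials."""
    chunks = split_into_chunks(column, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_partials, partials, (0, 0))
    return total / count
```

Because each partial result is just a (sum, count) pair, only a few numbers per chunk need to travel between workers, regardless of how large the chunks are.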

Page 15: DF1 - R - Roark - H2O Overview

Reading Data from HDFS into H2O with R

STEP 1

R user

h2o_df = h2o.importFile("hdfs://path/to/data.csv")

Page 16: DF1 - R - Roark - H2O Overview

Reading Data from HDFS into H2O with R

STEP 2

2.1 R function call: h2o.importFile()
2.2 HTTP REST API request to H2O has HDFS path
2.3 H2O Cluster: Initiate distributed ingest
2.4 Request data from HDFS (data.csv)

Page 17: DF1 - R - Roark - H2O Overview

Reading Data from HDFS into H2O with R

STEP 3

3.1 HDFS provides data (data.csv)
3.2 Distributed H2OFrame in DKV (H2O Cluster)
3.3 Return pointer to data in REST API JSON Response
3.4 h2o_df object created in R

h2o_df H2OFrame object in R holds: Cluster IP, Cluster Port, Pointer to Data
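
The key point of Steps 1 to 3 is that the client ends up holding only a pointer while the data stays in the cluster. The sketch below mimics that pattern with an in-process stand-in. `FakeCluster`, `FrameHandle`, `import_file`, the `destination_key` JSON field and the `.hex` key suffix are all hypothetical names for illustration, not H2O's API. The `rows` argument stands in for the data HDFS would provide.

```python
import json
from dataclasses import dataclass

class FakeCluster:
    """In-process stand-in for an H2O cluster; NOT the real H2O server."""

    def __init__(self, ip, port):
        self.ip, self.port = ip, port
        self.dkv = {}  # stand-in for the distributed K/V store

    def handle_import(self, path, rows):
        """Store the data under a key and answer with a JSON pointer to it."""
        key = path.rsplit("/", 1)[-1] + ".hex"  # hypothetical key naming
        self.dkv[key] = rows
        return json.dumps({"destination_key": key, "rows": len(rows)})

@dataclass(frozen=True)
class FrameHandle:
    """Client-side object: cluster IP, cluster port and a pointer, no data."""
    ip: str
    port: int
    key: str

def import_file(cluster, path, rows):
    """Client call: send the path, get back JSON, keep only the pointer."""
    response = json.loads(cluster.handle_import(path, rows))
    return FrameHandle(cluster.ip, cluster.port, response["destination_key"])
```

The handle stays the same small size no matter how many rows were ingested, which is why the R session does not need enough memory to hold the data set.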

Page 18: DF1 - R - Roark - H2O Overview

R Script Starting H2O GLM

[Diagram] Legend: User process, H2O process

Standard R process (user process)

• R script
• h2o.glm()
• .h2o.startModelJob(): POST /3/ModelBuilders/glm
• HTTP, REST/JSON

TCP/IP

H2O process

• Network layer: HTTP
• REST layer: REST/JSON, /3/ModelBuilders/glm endpoint
• H2O - algos: Job, GLM algorithm, GLM tasks
• H2O - core: Fork/Join framework, K/V store framework

Page 19: DF1 - R - Roark - H2O Overview

R Script Retrieving H2O GLM Result

[Diagram] Legend: User process, H2O process

Standard R process (user process)

• R script
• h2o.glm()
• h2o.getModel(): GET /3/Models/glm_model_id
• HTTP, REST/JSON

TCP/IP

H2O process

• Network layer: HTTP
• REST layer: REST/JSON, /3/Models endpoint
• H2O - algos
• H2O - core: Fork/Join framework, K/V store framework
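
The two REST calls named on the last two slides can be sketched as plain HTTP requests. The code only builds the requests and sends nothing. The endpoint paths come from the slides. The base URL, the form-encoded body, and the parameter names in the usage note are assumptions for illustration, and `build_glm_start_request` and `build_model_fetch_request` are hypothetical helpers.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://localhost:54321"  # assumed address of an H2O node

def build_glm_start_request(params):
    """POST /3/ModelBuilders/glm: ask the cluster to start a GLM job."""
    body = urlencode(params).encode("utf-8")  # body encoding is an assumption
    return Request(BASE_URL + "/3/ModelBuilders/glm", data=body, method="POST")

def build_model_fetch_request(model_id):
    """GET /3/Models/<model_id>: fetch the finished model's description."""
    return Request(BASE_URL + "/3/Models/" + quote(model_id, safe=""), method="GET")
```

A call such as `build_glm_start_request({"training_frame": "data.hex", "response_column": "y", "family": "binomial"})` yields a request object that could be passed to `urllib.request.urlopen` against a running cluster. The parameter values here are placeholders.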

