Date posted: 23-Jan-2018
Category: Technology
Uploaded by: avkash-chauhan
H2O Core Architecture & Algorithms
Avkash Chauhan
[email protected]
@avkashchauhan
https://www.linkedin.com/in/avkashchauhan
Please visit: http://www.h2o.ai/customers/#
H2O Users List: http://www.h2o.ai/user-list/
H2O Platform(s)
In-Memory, Distributed Machine Learning Algorithms with H2O Flow GUI
H2O AI Open Source Engine Integration with Spark
DEEP WATER
Key features
• Open Source (Apache 2.0)
• All supported ML algorithms are coded by our engineers
• Designed for speed, scalability and for super large data-sets
• Same distribution for open source community & enterprise
• Very active development, with a release every other week
• Vibrant open source community
o https://community.h2o.ai
• Enterprise Support portal
o https://support.h2o.ai
• 70,000 users and 8,000 organizations, growing daily
Usage: Simple Solution
o Single deployable compiled Java code (jar)
o Ready-to-use point-and-click FLOW interface
o Connection from R and Python after specific packages are installed
o Use Java and Scala natively, and any other language through the RESTful API
o Deployable models - Binary & Java (POJO & MOJO)
o One-click prediction/scoring engine
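The RESTful API bullet above can be sketched in a few lines. This is a minimal illustration, not an official client: it assumes an H2O node is already running on the default address http://localhost:54321 and that cluster status is served at /3/Cloud. The helper names endpoint_url and cluster_status are hypothetical, and only the Python standard library is used.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: a locally started H2O node listening on the default port.
DEFAULT_BASE = "http://localhost:54321"


def endpoint_url(path, base=DEFAULT_BASE):
    """Build the full URL for a REST endpoint such as '/3/Cloud'."""
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


def cluster_status(base=DEFAULT_BASE, timeout=5):
    """Fetch cluster status as a dict (requires a running H2O node)."""
    with urlopen(endpoint_url("/3/Cloud", base), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Any language that can issue an HTTP GET and parse JSON could do the same, which is the point of the bullet; calling cluster_status() only works once a node is up.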
Usage: Complex Solution
o Multi-node Deployment
o Spark and Hadoop distributed environment
• Sparkling Water (Spark + H2O)
o Data ingested from various inputs
• S3, HDFS, NFS, JDBC, Object store etc.
• Streaming support in Spark (through Sparkling Water)
o Distributed machine learning for every algorithm in platform
o Prediction service deployment on several machines
[Diagram: several H2O Core nodes, each running on multiple CPUs and optionally under YARN, together form the H2O Distributed In-Memory layer used for model building. Data is ingested from SQL, NFS, and S3. Models are exported as Binary, MOJO, or POJO.]
[Diagram: H2O clients on a local machine communicate over REST/JSON with an H2O cluster made up of JVM 1, JVM 2, ... JVM N. A data frame with columns x1, x2, x3, xp, y appears alongside the cluster JVMs.]
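One common way to compute over a data frame spread across several JVMs is to have each node summarize its own rows and then combine the partial results. The sketch below simulates that pattern in plain Python for a column mean. It is an illustration under stated assumptions, not H2O's implementation: the contiguous row split and the function names are hypothetical.

```python
from functools import reduce


def split_into_chunks(rows, n_nodes):
    """Assign contiguous row ranges to n_nodes simulated JVMs."""
    size, extra = divmod(len(rows), n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def map_partial(chunk):
    """Map step: each node summarizes only its local rows."""
    return (sum(chunk), len(chunk))


def reduce_partials(a, b):
    """Reduce step: combine two partial (sum, count) summaries."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(rows, n_nodes):
    """Mean of a non-empty column computed from per-node partials."""
    partials = [map_partial(c) for c in split_into_chunks(rows, n_nodes)]
    total, count = reduce(reduce_partials, partials, (0.0, 0))
    return total / count
```

Only the small (sum, count) pairs would need to travel between nodes, never the rows themselves, which is why this style suits in-memory data that is too large for one machine.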
Current Algorithm Overview
Statistical Analysis
• Linear Models (GLM)
• Naïve Bayes
Ensembles
• Random Forest
• Distributed Trees
• Gradient Boosting Machine
• Stacking / Super Learner
Deep Neural Networks
• MLP
• Autoencoder
• Anomaly Detection
• Deep Features
• CNN, RNN (Deep Water)
Clustering
• K-Means (Auto-K)
Dimension Reduction
• Principal Component Analysis
• Generalized Low Rank Models
Word Embedding
• Word2Vec
Time Series
• iSAX
Machine Learning Tuning
• Hyperparameter Search
• Early Stopping
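The two tuning ideas listed above can be shown together in a toy example: a grid search tries every parameter combination, and each training run stops early once its loss stops improving. This is a self-contained sketch on a one-dimensional quadratic, not H2O's grid search API; the function names, the patience and tol parameters, and the loss function are all assumptions chosen for illustration.

```python
from itertools import product


def train_with_early_stopping(step, max_rounds, patience=3, tol=1e-6):
    """Minimize f(w) = (w - 3)^2 by gradient descent, stopping once the
    loss has failed to improve by more than tol for `patience` rounds.
    Assumes max_rounds >= 1."""
    w, best, stale = 0.0, float("inf"), 0
    for rounds in range(1, max_rounds + 1):
        w -= step * 2.0 * (w - 3.0)  # gradient of (w - 3)^2 is 2(w - 3)
        loss = (w - 3.0) ** 2
        if loss < best - tol:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best, rounds


def grid_search(grid):
    """Try every combination in the grid; return (best_loss, rounds, params)."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        loss, rounds = train_with_early_stopping(**params)
        results.append((loss, rounds, params))
    return min(results, key=lambda r: r[0])
```

For example, grid_search({"step": [0.01, 0.1, 0.5], "max_rounds": [50]}) evaluates three runs, and the run that converges quickly uses far fewer than 50 rounds because early stopping cuts it short.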
H2O Flow
H2O R Interface
H2O Python Interface
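A minimal sketch of what using the Python interface can look like, assuming the h2o Python package (H2O-3) and a Java runtime are installed. The function names train_gbm and predictor_columns, the 80/20 split, and the GBM settings are illustrative choices rather than recommendations from the slides.

```python
def predictor_columns(columns, response):
    """All columns except the response are used as predictors."""
    return [c for c in columns if c != response]


def train_gbm(csv_path, response, seed=42):
    """Train a GBM on a CSV file and return (model, test performance).

    Needs the `h2o` package and a Java runtime; they are imported lazily
    so the helper above stays usable without them.
    """
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()  # start or connect to a local H2O instance
    frame = h2o.import_file(csv_path)
    train, test = frame.split_frame(ratios=[0.8], seed=seed)

    model = H2OGradientBoostingEstimator(ntrees=50, max_depth=5, seed=seed)
    model.train(
        x=predictor_columns(frame.columns, response),
        y=response,
        training_frame=train,
    )
    return model, model.model_performance(test_data=test)
```

The data stays in the H2O cluster throughout; the Python objects are handles to frames and models held there.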
Deployment Code
[Diagram: H2O Distributed In-Memory model building on CPUs under YARN, with data from SQL, NFS, and S3, producing Models.]
Deployment Code: Plain Old Java Object (POJO)
POJO
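As a hedged sketch of how deployment code might be obtained from the Python client, the helper below asks a trained model for both its POJO and its MOJO. It assumes the model object exposes download_pojo and download_mojo methods that accept a path argument, as H2O-3 model objects in the Python package do; the function name export_for_deployment and the returned dictionary layout are hypothetical.

```python
import os


def export_for_deployment(model, out_dir):
    """Write the model's POJO and MOJO into out_dir.

    `model` is assumed to be a trained H2O model (or any object with
    compatible download_pojo / download_mojo methods).
    """
    os.makedirs(out_dir, exist_ok=True)
    return {
        "pojo": model.download_pojo(path=out_dir),  # generated Java source
        "mojo": model.download_mojo(path=out_dir),  # compact model archive
    }
```

The exported files can then be used for scoring from Java code outside the cluster that trained the model.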
Current Algorithm Overview
Statistical Analysis
• Linear Models (GLM)
• Naïve Bayes
Ensembles
• Random Forest
• Distributed Trees
• Gradient Boosting Machine
• R Package - Stacking / Super Learner
Deep Neural Networks
• Multi-layer Feed-Forward Neural Network
• Auto-encoder
• Anomaly Detection
Clustering
• K-Means
Dimension Reduction
• Principal Component Analysis
• Generalized Low Rank Models
Solvers & Optimization
• Generalized ADMM Solver
• L-BFGS (Quasi Newton Method)
• Ordinary Least-Square Solver
• Stochastic Gradient Descent
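To make the difference between two of the solvers listed above concrete, the sketch below fits the same line y = a*x + b twice: once with the closed-form ordinary least-squares solution and once with stochastic gradient descent. It is a plain-Python illustration, not H2O's solver code; the function names, learning rate, and epoch count are assumptions.

```python
import random


def ols_fit(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def sgd_fit(xs, ys, lr=0.01, epochs=2000, seed=0):
    """Stochastic gradient descent on squared error, one sample per update."""
    rng = random.Random(seed)
    a, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            err = (a * xs[i] + b) - ys[i]
            a -= lr * err * xs[i]  # gradient of 0.5 * err^2 w.r.t. a
            b -= lr * err          # gradient of 0.5 * err^2 w.r.t. b
    return a, b
```

The closed form needs all the data at once but gives the answer in one pass, while SGD touches one row at a time and approaches the same coefficients gradually, a trade-off that matters more as data sets grow.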
Data Munging
• Scalable Data Frames
• Sort, Slice, Log Transform
• Data.table (1B rows groupBy record)
Text Processing
• Word2Vec
What is new – Driverless AI
• https://techcrunch.com/2017/07/06/h2o-ais-driverless-ai-automates-machine-learning-for-businesses/
Helpful resources
• Docs
o http://docs.h2o.ai/h2o/latest-stable/index.html
• H2O User Guide
o http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html
• Source Code
o https://github.com/h2oai/
o https://github.com/h2oai/h2o-3
o https://github.com/h2oai/sparkling-water
o https://github.com/h2oai/deepwater
• Meetup content
o https://github.com/h2oai/h2o-meetups
• Tutorials
o https://github.com/h2oai/h2o-tutorials
Thank you so much!!
Thank you very much (ありがとうございました)