
Arno Candel: H2O, A Platform for Big Math (Hadoop Summit, June 2016)

Date post: 21-Apr-2017
Upload: jo-fai-chow
Transcript

H2O.ai Machine Intelligence

H2O: A Platform for Big Math

Arno Candel, PhD, Chief Architect

or: How to make A.I. and TensorFlow work for you

+20 more


Who Am I?

Arno Candel: Chief Architect, Physicist & Hacker at H2O.ai

• PhD Physics, ETH Zurich 2005
• 10+ yrs Supercomputing (HPC)
• 6 yrs at SLAC (Stanford Linear Accelerator)
• 4.5 yrs Machine Learning
• 2.5 yrs at H2O.ai

Fortune Magazine Big Data All Star

Follow me @ArnoCandel


Overview

Machine Learning (ML)

Artificial Intelligence (A.I.)

Computer Science (CS)

H2O.ai

Deep Learning (DL): hot hot hot hot hot

A Simple Deep Learning Model: Artificial Neural Network

IN (data): heartbeat, blood pressure, oxygen

OUT (prediction): send to regular care, or send to intensive care unit (ICU)

• nodes: neuron activations (real numbers), which represent features
• arrows: connecting weights (real numbers), which are learned during training
• non-linearity x -> f(x): adds model complexity

This kind of model dates from the 1970s and is now rebranded as DL.
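The network described above can be sketched as a single forward pass. This is a minimal illustration, not H2O code: the four hidden units, the tanh non-linearity, the softmax output, the standardized input values and every weight are assumptions chosen for the example; a real model learns its weights during training.

```python
import math

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b), one output per row of W."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(features, params):
    """Forward pass: inputs -> hidden layer (tanh) -> two output classes."""
    hidden = dense(features, params["W1"], params["b1"], math.tanh)
    logits = dense(hidden, params["W2"], params["b2"], lambda v: v)  # linear output layer
    return softmax(logits)  # [P(regular care), P(ICU)]

# Hypothetical weights: 3 inputs -> 4 hidden units -> 2 outputs.
params = {
    "W1": [[0.5, -0.2, -0.8], [-0.3, 0.4, 0.1], [0.2, 0.2, -0.5], [0.1, -0.6, 0.3]],
    "b1": [0.0, 0.1, -0.1, 0.05],
    "W2": [[0.7, -0.4, 0.2, 0.1], [-0.7, 0.4, -0.2, -0.1]],
    "b2": [0.0, 0.0],
}

# Hypothetical standardized inputs: heartbeat, blood pressure, oxygen.
probs = predict([1.2, -0.5, -1.0], params)
print(probs)
```

Training would adjust `W1`, `b1`, `W2` and `b2` to reduce prediction error; the forward pass itself stays the same.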

Brief History of A.I., ML and DL

John McCarthy: Princeton, Bell Labs, Dartmouth; later MIT, Stanford

1955: “A proposal for the Dartmouth summer research project on Artificial Intelligence”

with Marvin Minsky (MIT), Claude Shannon (Bell Labs) and Nathaniel Rochester (IBM)

http://www.asiapacific-mathnews.com/04/0403/0015_0020.pdf

A step back: A.I. was coined over 60 years ago

Step 1: Great Algorithms + Fast Computers

http://nautil.us/issue/18/genius/why-the-chess-computer-deep-blue-played-like-a-human

1997: Playing Chess (IBM Deep Blue beats Kasparov)

Computer Science: 30 custom CPUs, 60 billion moves in 3 mins

“No computer will ever beat me at playing chess.”

Step 2: More Data + Real-Time Processing

http://cs.stanford.edu/group/roadrunner/old/presskit.html

2005: Self-driving Cars: DARPA Grand Challenge, 132 miles (won by Stanford A.I. lab*)

Sensors & Computer Science: video, radar, laser, GPS, 7 Pentium computers

“No computer will ever drive a car!?”

*A.I. lab was established by McCarthy et al. in the early 60s

Step 3: Big Data + In-Memory Clusters

2011: Jeopardy (IBM Watson)

In-Memory Analytics/ML: 4 TB of data (incl. Wikipedia), 90 servers, 16 TB RAM, Hadoop, 6 million logic rules

https://www.youtube.com/watch?v=P18EdAKuC1U https://en.wikipedia.org/wiki/Watson_(computer)

Note: IBM Watson received the question in electronic written form, and was often able to press the answer button faster than the competing humans.

“No computer will ever answer random questions!?”

Step 4: Deep Learning

“No computer will ever speak any language!?”

2014: Google (acquired Quest Visual)

Deep Learning: Convolutional and Recurrent Neural Networks, with training data from users

• Translate between 103 languages by typing
• Instant camera translation: Use your camera to translate text instantly in 29 languages
• Camera Mode: Take pictures of text for higher-quality translations in 37 languages
• Conversation Mode: Two-way instant speech translation in 32 languages
• Handwriting: Draw characters instead of using the keyboard in 93 languages

Step 5: Augmented Deep Learning

2014: Atari Games (DeepMind)

2016: AlphaGo (Google DeepMind)

Deep Learning + reinforcement learning, tree search, Monte Carlo, GPUs, playing against itself, …

https://deepmind.com

Go board has approx. 2 × 10^170 (2E170) possible positions.

trained from raw pixel values, no human rules

“No computer will ever beat the best Go master!?”

Step 6: A.I. Chatbots have Opinions too!

Microsoft had won the 2015 ImageNet Large Scale Visual Recognition Challenge: http://image-net.org/challenges/LSVRC/2015/

What Will Change?

Today / Tomorrow

Better Data, Better Models, Better Results

Example: Fraud Prediction

H2O.ai: Makers of H2O

H2O: AI for Business Transformation
• Scalable and Distributed Data Science and Machine Learning: Deep Learning, Gradient Boosting, Random Forest, Decision Trees, Logistic Regression, Generalized Linear Modeling, K-Means, PCA, GLRM, …
• Fast, accurate, robust, proven, fully featured
• Apache v2 open source (github.com/h2oai)

Easy to Use and Deploy
• h2o.ai/download and run anywhere, immediately
• Client APIs: R, Python, Java, Scala, REST, Flow GUI
• Spark (cf. Sparkling Water), Hadoop, Standalone
• Java scoring code auto-generated

H2O.ai: Growing Rapidly

(Team photo captions: author of data.table; r2d3.us; CEO; grammar of graphics; Pure Software; Kaggle master; found Pentium bug; POSIX)

+ 10 more recent hires

Many, many talents at H2O…

h2o.ai/careers

High Level Architecture of H2O

(Diagram)
• Data sources: HDFS, S3, NFS, Local, SQL
• Parallel Parser into Distributed In-Memory storage with Lossless Compression
• H2O Compute Engine: Exploratory & Descriptive Analysis; Feature Engineering & Selection; Supervised & Unsupervised Modeling; Model Evaluation & Selection; Predict; Data & Model Storage
• Model Export: Plain Old Java Object; Data Prep Export: Plain Old Java Object; Production Scoring Environment; Your Imagination
• Security and transport: LDAP, Kerberos, SSL, HTTPS; HTTP
• Native APIs: Java, Scala; REST APIs: R, Python, Flow, JavaScript, Java


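R:

library(h2o)
h2o.init()
h2o.deeplearning(x = 1:4, y = 5, training_frame = as.h2o(iris))

Python:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
h2o.init()
iris = h2o.import_file("iris.csv")  # assumed local copy of the iris data with a "Species" column
dl = H2ODeepLearningEstimator()
dl.train(x=list(range(4)), y="Species", training_frame=iris)  # first 4 columns are the predictors

Scala:

import _root_.hex.deeplearning.DeepLearning
import _root_.hex.deeplearning.DeepLearningModel.DeepLearningParameters
// iris: an H2O Frame assumed to be loaded already
val dlParams = new DeepLearningParameters()
dlParams._train = iris._key
dlParams._response_column = "Species"
val dl = new DeepLearning(dlParams)
val dlModel = dl.trainModel.get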

All heavy lifting is done by the backend!

Built-in interactive GUI and notebook: no coding necessary!

Easily Bring Models into Production

Gradient Boosting Machine Tree Model (nano-fast)

Auto-generated Java scoring code to easily Operationalize Data Science
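The idea behind auto-generated scoring code can be sketched in a few lines: each tree of a boosted model is emitted as plain nested if/else source, and the score is the initial value plus the learning rate times the sum of tree outputs. H2O emits Java (a POJO); this Python analogue only illustrates the concept. The tree encoding `(feature, threshold, left, right)`, the `gen_tree`/`gen_model` names, the trees, `init` and `learning_rate` are all hypothetical.

```python
def gen_tree(node, indent="    "):
    """Emit nested if/else source for one regression tree.

    A node is either a float (leaf value) or a tuple
    (feature, threshold, left_subtree, right_subtree); rows with
    row[feature] < threshold go left.
    """
    if not isinstance(node, tuple):
        return f"{indent}return {node!r}\n"
    feature, threshold, left, right = node
    return (f"{indent}if row[{feature!r}] < {threshold!r}:\n"
            + gen_tree(left, indent + "    ")
            + f"{indent}else:\n"
            + gen_tree(right, indent + "    "))

def gen_model(trees, init, learning_rate):
    """Emit standalone source: one function per tree plus a score() function.
    Assumes at least one tree."""
    src = [f"def tree_{i}(row):\n" + gen_tree(tree) for i, tree in enumerate(trees)]
    terms = " + ".join(f"tree_{i}(row)" for i in range(len(trees)))
    src.append(f"def score(row):\n    return {init!r} + {learning_rate!r} * ({terms})\n")
    return "\n".join(src)

# Two hypothetical trees of a tiny boosted model.
trees = [
    ("age", 25, ("income", 50000, 0.3, 0.1), -0.2),
    ("income", 80000, 0.05, -0.15),
]
source = gen_model(trees, init=0.1, learning_rate=0.5)
print(source)

# Compile the generated source; in production it would be shipped as a standalone file.
namespace = {}
exec(source, namespace)
score = namespace["score"]
print(score({"age": 30, "income": 90000}))
```

Because the generated code has no dependency on the training system, it can be dropped into any scoring environment; that is the design choice the slide refers to.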

Spark + H2O = Sparkling Water

• Spark 2.0 API compatibility
• Use H2O algorithms in conjunction with, or instead of, MLlib algorithms on Spark
• Build Ensembles using H2O and MLlib Algorithms
• Visual Intelligence for Spark: Run Spark, MLlib, Scala in Flow
• Export MLlib models as POJOs
• Toolchain for ML pipelines and debugging support

(Logo: Sparkling Water 2.0)

Live H2O Deep Learning Demo: Predict Airplane Delays

10 nodes: all 320 cores busy

real-time, interactive model inspection in Flow

116M rows, 6GB CSV file 800+ predictors (numeric + categorical)

model trained in <1 min: 2M+ samples/second

Deep Learning Model

Significant Performance Gains with Deep Learning

Predict departure delay (Y/N) on 20 years of airline flight data (116M rows, 12 cols, categorical + numerical data with missing values)

• H2O Elastic Net (GLM): 10 secs, alpha=0.5, lambda=1.379e-4 (auto); AUC: 0.656
• H2O Deep Learning: 45 secs, 4 hidden ReLU layers of 20 neurons, 1 epoch; AUC: 0.703

AUC: higher is better; 0.5 corresponds to random guessing and 1 to a perfect ranking.

Feature importances: features have non-linear impact; Chicago, Atlanta, Dallas: often delayed

10 nodes: Dual E5-2650 (8 cores, 2.6GHz), 10GbE
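To make the AUC comparison concrete, here is a minimal sketch that computes AUC as the probability that a randomly chosen positive example (a delayed flight) receives a higher score than a randomly chosen negative one, counting ties as one half. The `auc` name and the toy labels and scores are illustrative assumptions; this quadratic pairwise loop is fine for small data, while a sort-based method would be needed for 116M rows.

```python
def auc(labels, scores):
    """ROC AUC via pairwise comparison (Mann-Whitney statistic).

    labels: 0/1 ground truth (1 = delayed); scores: model outputs,
    higher meaning "more likely delayed". Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A model that gives every flight the same score gets exactly 0.5, which is why 0.5 is the "random guessing" baseline.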

So How Does It Work?

Distributed In-Memory Data Frames
• Data matrix is chunked into columnar blocks
• Algorithms can parallelize over these blocks
• Scalable to many TBs: each node fills its memory with data
• Columns are separate entities (fast add/remove/modify)
• Similar to data frames in R, Pandas, and now also Spark
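The columnar, chunked layout described above can be sketched as a toy single-process data structure: each column is stored as its own list of fixed-size row blocks, so adding or dropping a column never touches the others, and block `i` across all columns is the natural unit of parallel work. `ChunkedFrame`, `chunk_rows` and the sample columns are hypothetical names for this illustration; H2O's real frames are distributed across nodes and compressed.

```python
from dataclasses import dataclass, field

@dataclass
class ChunkedFrame:
    """Toy columnar frame: each column is a list of row blocks of size chunk_rows."""
    chunk_rows: int
    columns: dict = field(default_factory=dict)

    def add_column(self, name, values):
        """Split a column into blocks; all columns must share the same block layout."""
        blocks = [values[i:i + self.chunk_rows]
                  for i in range(0, len(values), self.chunk_rows)]
        if self.columns:
            existing = next(iter(self.columns.values()))
            if [len(b) for b in existing] != [len(b) for b in blocks]:
                raise ValueError("column length does not match the frame")
        self.columns[name] = blocks

    def remove_column(self, name):
        """Drop one column; the other columns are untouched."""
        del self.columns[name]

    def chunk(self, i):
        """Row block i across all columns: the unit an algorithm works on in parallel."""
        return {name: blocks[i] for name, blocks in self.columns.items()}

    @property
    def num_chunks(self):
        return len(next(iter(self.columns.values()))) if self.columns else 0

frame = ChunkedFrame(chunk_rows=3)
frame.add_column("age", [23, 41, 35, 62, 19, 50, 28])
frame.add_column("income", [30, 85, 52, 91, 12, 70, 40])
print(frame.num_chunks, frame.chunk(2))
```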

Parallel Parse into Distributed Rows

(Figure: a data set of N rows and p cols from HDFS, S3, NFS, SQL, … is read by a massively parallel parser; each of 6 nodes ends up holding N/6 rows of all p cols.)
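One common way to parse a file in parallel is to cut the raw bytes into equal ranges and let each worker parse the lines that *start* inside its range, skipping the partial line at its beginning and finishing the line that straddles its end. The sketch below shows that technique with threads on one machine; it is an assumption-laden illustration, not H2O's parser. It assumes no newlines inside quoted fields, and `parse_range`, `parallel_parse` and the sample CSV are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_range(data: bytes, start: int, end: int):
    """Parse the CSV lines whose first byte lies in data[start:end]."""
    if start == 0:
        pos = 0
    else:
        # A line starts at p if p == 0 or data[p - 1] is a newline.
        # Lines that began before `start` belong to the previous worker.
        nl = data.find(b"\n", start - 1)
        if nl == -1:
            return []
        pos = nl + 1
    rows = []
    while pos < end:
        nl = data.find(b"\n", pos)
        if nl == -1:
            nl = len(data)  # last line without a trailing newline
        line = data[pos:nl]
        if line:
            rows.append(line.decode("utf-8").split(","))
        pos = nl + 1
    return rows

def parallel_parse(data: bytes, n_workers: int):
    """Split the bytes into n_workers ranges and parse them concurrently.
    Returns one list of rows per worker, standing in for per-node chunks."""
    bounds = [len(data) * i // n_workers for i in range(n_workers + 1)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda se: parse_range(data, *se),
                             zip(bounds, bounds[1:])))

csv_bytes = b"12,ORD\n-3,ATL\n45,DFW\n0,SFO\n7,ORD\n"
for i, rows in enumerate(parallel_parse(csv_bytes, 3)):
    print(i, rows)
```

Every line is parsed exactly once, whatever the number of workers, because ownership is decided by where a line starts.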

Compute Paradigm: Fine-Grain Map/Reduce

(Figure: the driver algorithm calls an M/R task; map() calls run on every chunk on every node, and nested reduce() calls combine their outputs into the final result.)

Data Parallelism: all CPU cores are at work

map(): process data, reduce(): aggregate results
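The map()/reduce() split above can be sketched on one machine: `map_chunk` summarizes each chunk independently (in parallel threads here, standing in for cores and nodes), and an associative `reduce_pair` merges partial results pairwise, like a reduction tree, into the final mean and variance. The function names and the example chunks are hypothetical; this illustrates the paradigm rather than H2O's MRTask implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def map_chunk(chunk):
    """map(): summarize one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(v * v for v in chunk))

def reduce_pair(a, b):
    """reduce(): merge two partial results; must be associative."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def tree_reduce(parts):
    """Combine partial results pairwise, level by level (assumes parts is non-empty)."""
    while len(parts) > 1:
        merged = [reduce_pair(parts[i], parts[i + 1])
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:
            merged.append(parts[-1])  # odd one out moves up a level
        parts = merged
    return parts[0]

def mean_and_variance(chunks, workers=4):
    """Driver: run map() on every chunk in parallel, then reduce to the final result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    n, s, ss = tree_reduce(partials)
    mean = s / n
    return mean, ss / n - mean * mean  # population variance (textbook formula)

print(mean_and_variance([[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]))
```

Since only small partial results travel between workers, the data itself never has to move, which is what lets all cores stay busy.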

Implementation Details

• Distributed in-memory data store holds data, models, etc.
• Columnar compression (often better than gzip on disk)
• Low-level Java code (byte[], float[], bit operations, etc.)
• Data read/write access at memory bandwidth speeds
• Custom serialization, networking and execution layer
• Auto-generated REST client-server API (R, Python, Flow, …)
• Standalone scoring code auto-generated for every model
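One simple form of columnar compression stores an integer column as a bias (its minimum) plus small offsets packed into the narrowest fixed-width type that fits the value range. The sketch below shows that idea using Python's `array` module; it is a simplified illustration of the general technique, not H2O's actual chunk encoding, and the function names and sample column are hypothetical.

```python
from array import array

def compress_int_column(values):
    """Return (bias, offsets): offsets are stored in the narrowest unsigned
    array type that can hold max(values) - min(values)."""
    bias = min(values)
    span = max(values) - bias
    for typecode in ("B", "H", "I", "Q"):  # 1, 2, (usually) 4, 8 bytes per value
        limit = 2 ** (8 * array(typecode).itemsize) - 1
        if span <= limit:
            return bias, array(typecode, [v - bias for v in values])
    raise OverflowError("value range too large for this sketch")

def decompress_int_column(bias, offsets):
    """Undo the bias to recover the original integers exactly (lossless)."""
    return [bias + v for v in offsets]

# Hypothetical column: flight years span less than 256, so 1 byte per value suffices.
years = [1987, 1990, 2003, 2008, 1999, 2008]
bias, packed = compress_int_column(years)
print(bias, packed.typecode, packed.itemsize * len(packed), "bytes vs", 8 * len(years))
```

Decompression is a single addition per value, so reads can stay close to memory bandwidth while the column uses a fraction of its 8-byte-per-value footprint.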

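Distributed Gradient Boosting Machine

• H2O: First open-source implementation of scalable, distributed Gradient Boosting Machine, fully featured
• Parallelized Individual Tree Construction
• Discretization (binning) for speedup without loss of accuracy

(Figure: a tree node holding all data splits on "age < 25?" into Y and N branches. Finding the optimal split (feature & value) means searching each feature's range, e.g., age from 12 to 118 and income from 1k to 1M, over the analytical error landscape; here the best split is age 25. In H2O each feature is discretized into bins, so the search runs over bin boundaries.)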
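Binned split finding can be sketched with per-bin sums of w, w*y and w*y², the same three quantities the histogram slide lists. From these sums, the weighted squared error of any group is Σwy² − (Σwy)²/Σw, so every candidate split between bins can be scored without revisiting the rows. Equal-width bins, squared-error loss, the toy ages and responses, and all function names are assumptions for this sketch; it is not H2O's implementation.

```python
def to_bins(values, lo, hi, n_bins):
    """Equal-width binning: map each value to a bin index in [0, n_bins)."""
    width = (hi - lo) / n_bins
    return [min(n_bins - 1, int((v - lo) / width)) for v in values]

def build_histogram(bins, y, w, n_bins):
    """Per-bin sums of w, w*y and w*y^2 for one feature."""
    hist = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for b, yi, wi in zip(bins, y, w):
        hist[b][0] += wi
        hist[b][1] += wi * yi
        hist[b][2] += wi * yi * yi
    return hist

def sse(sw, swy, swy2):
    """Weighted squared error around the weighted mean of a group."""
    return swy2 - swy * swy / sw if sw > 0 else 0.0

def best_split(hist):
    """Try every boundary between bins; rows with bin < split go left.
    Returns (split_bin, total_error); split_bin is None if no split helps."""
    total = [sum(h[k] for h in hist) for k in range(3)]
    left = [0.0, 0.0, 0.0]
    best = (None, sse(*total))
    for split in range(1, len(hist)):
        for k in range(3):
            left[k] += hist[split - 1][k]
        right = [total[k] - left[k] for k in range(3)]
        err = sse(*left) + sse(*right)
        if err < best[1]:
            best = (split, err)
    return best

# Hypothetical data: young customers (y = 1) vs older ones (y = 0).
ages = [12, 15, 18, 20, 40, 45, 60, 118]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w = [1.0] * len(ages)
lo, hi, n_bins = 12, 118, 10
hist = build_histogram(to_bins(ages, lo, hi, n_bins), y, w, n_bins)
split, err = best_split(hist)
print(split, err, "threshold age <", lo + split * (hi - lo) / n_bins)
```

Only one pass over the rows is needed to fill the histogram; the split search then costs O(number of bins) per feature instead of O(number of rows).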

Scalable Distributed Histogram Calculation

(Figure: the driver algorithm calls a histogram method; map() tasks on every chunk build local histograms, one each for w, w*y and w*y², which are combined into a global histogram.)

Data parallelism: global histogram computation

w: observation weights, y: response

Same results as if computed on a single compute node
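Why the distributed histogram matches the single-node one can be seen in a short sketch: each chunk builds a local histogram of w, w*y and w*y² sums, and merging is element-wise addition. Addition is associative and commutative in exact arithmetic, so the merge order does not matter; with floating point, the last bits can differ, which is why the toy data here uses small integer values. The chunking, bin indices, responses and weights are hypothetical, and the function names are illustrative rather than H2O's.

```python
def local_histogram(bins, y, w, n_bins):
    """map(): per-bin sums of w, w*y and w*y^2 over the rows of one chunk."""
    hist = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for b, yi, wi in zip(bins, y, w):
        hist[b][0] += wi
        hist[b][1] += wi * yi
        hist[b][2] += wi * yi * yi
    return hist

def merge(h1, h2):
    """reduce(): element-wise sum of two histograms."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(h1, h2)]

def global_histogram(chunks, n_bins):
    """Combine the local histogram of every chunk into one global histogram."""
    result = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for bins, y, w in chunks:
        result = merge(result, local_histogram(bins, y, w, n_bins))
    return result

# Hypothetical rows: bin index, response and observation weight.
bins = [0, 2, 1, 0, 3, 2, 2, 1, 0]
y = [3, 1, 4, 1, 5, 9, 2, 6, 5]
w = [1, 2, 1, 1, 1, 2, 1, 1, 3]
chunks = [(bins[i:i + 3], y[i:i + 3], w[i:i + 3]) for i in range(0, 9, 3)]

single_node = local_histogram(bins, y, w, 4)
distributed = global_histogram(chunks, 4)
print(single_node == distributed)
```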

Over 7000 enterprises use H2O

Financial • Insurance • Marketing • Telecom • Healthcare

User-Based Insurance

“H2O is an enabler in how people are thinking about data.”

“We have many plans to use H2O across the different business units.”

Today’s Keynote!

Digital Marketing: Campaigns

“H2O gave us the capability to do Big Modeling. There is no limit to scaling in H2O.”

“Working with the H2O team has been amazing.”

“The business value that we have gained from advanced analytics is enormous.”

Matching TV Watching Behavior with Buying Behavior

“Unlike other systems where I had to buy the whole package and just use 10-20%, I can customize H2O to suit my needs.”

“I am a big fan of open source. H2O is the best fit in terms of cost as well as ease of use and scalability and usability.”

Insurance: Risk Assessment

“Predictive analytics is the differentiator for insurance companies going forward in the next couple of decades.”

“Advanced analytics was one of the key investments that we decided to make.”

Fintech: Fraud/Risk/Churn/etc.

“H2O is a great solution because it's designed to be enterprise ready and can operate on very large datasets.”

“H2O has been a one-stop shop that helps us do all our modeling in one framework.”

“H2O is the best solution to be able to iterate very quickly on large datasets and produce meaningful models.”


Today’s Keynote!

H2O Booklets

Come get your booklets at our booth!

R • Python • Deep Learning • GLM • GBM • Sparkling Water

Data Scientists Love This Stuff

H2O GBM Model Tuning Tutorial for R/Python/Flow: https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/

KDnuggets Poll about Deep Learning Tools & Platforms

http://www.kdnuggets.com

H2O and TensorFlow are tied (usage of Deep Learning tools in past year)

TensorFlow + H2O + Apache Spark = Anything is possible

In progress: Integration with GPU DL tools (TensorFlow/Caffe/mxnet/etc.)

https://github.com/h2oai/sparkling-water/blob/master/py/examples/notebooks/TensorFlowDeepLearning.ipynb https://www.youtube.com/watch?v=62TFK641gG8

H2O.ai Now Focused On Experience Beyond Algorithms and Data

(Diagram layers: Algorithms, Experience, Data, Verticals, Data Products)

• H2O Flow: Single web-based Document for code execution, text, mathematics, plots and rich media
• Visual Intelligence: UX and Interpretability for AI
• Steam: Elastic ML & AutoML

Operationalize Data Science

Steam: Automated Platform to Build and Scale Smart Data Products

Roles: DevOps/Data Engineers, Data Scientists, Advanced Data Scientists, Software Engineers, Application Software Engineers

Data • Business Insights

AI/Machine Learning • Automation • Scalability • Visualization

Coming Soon

H2O OPEN TOUR: www.open.h2o.ai

We’re coming to a town near you in NYC/TX

Visit our Booth Today!

Summary

A.I. and Deep Learning are hot (again)! Make your own smart data products with H2O!

Try H2O today: installs in minutes!

h2o.ai/download
https://www.youtube.com/user/0xdata/videos
https://github.com/h2oai/h2o-3
H2O Google Group
@h2oai

We’re hiring: h2o.ai/careers/

