Date posted: 21-Apr-2017
Category: Data & Analytics
Uploaded by: jo-fai-chow
H2O.ai Machine Intelligence
H2O: A Platform for Big Math
or: How to make A.I. and TensorFlow work for you
Arno Candel, PhD, Chief Architect
Who Am I?
Arno Candel Chief Architect, Physicist & Hacker at H2O.ai
PhD Physics, ETH Zurich 2005
10+ yrs Supercomputing (HPC)
6 yrs at SLAC (Stanford Linear Accelerator)
4.5 yrs Machine Learning
2.5 yrs at H2O.ai
Fortune Magazine Big Data All Star
Follow me @ArnoCandel
Overview
Machine Learning (ML)
Artificial Intelligence (A.I.)
Computer Science (CS)
H2O.ai
Deep Learning (DL): hot hot hot hot hot
A Simple Deep Learning Model: Artificial Neural Network
IN: data (heartbeat, blood pressure, oxygen)
OUT: prediction (send to regular care, or send to intensive care unit (ICU))
nodes: neuron activations (real numbers), represent features
arrows: connecting weights (real numbers), learned during training
non-linearity x -> f(x): adds model complexity
From the 1970s, now rebranded as DL
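A minimal sketch of such a network's forward pass, in plain Python, may help make the nodes, arrows and non-linearity concrete. This is not H2O's implementation. The layer sizes, the hand-picked weights, the tanh and sigmoid choices and the name `predict_icu_probability` are all illustrative assumptions; in a real model the weights are learned during training.

```python
import math

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs per neuron, then a non-linearity f(x)."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical, hand-picked weights (the arrows); training would learn these.
HIDDEN_W = [[0.8, -0.5, 0.3], [-0.4, 0.9, -0.7]]  # 2 hidden neurons x 3 inputs
HIDDEN_B = [0.1, -0.2]
OUTPUT_W = [[1.2, -1.5]]                          # 1 output neuron x 2 hidden neurons
OUTPUT_B = [0.05]

def predict_icu_probability(heartbeat, blood_pressure, oxygen):
    """IN: data (assumed already scaled to roughly [-1, 1]); OUT: a value in (0, 1)."""
    hidden = dense([heartbeat, blood_pressure, oxygen], HIDDEN_W, HIDDEN_B, math.tanh)
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]

p = predict_icu_probability(0.6, -0.3, 0.9)
decision = "send to intensive care unit (ICU)" if p > 0.5 else "send to regular care"
print(f"{p:.3f} -> {decision}")
```

The 0.5 cut-off used for the decision is also an assumption; in practice the threshold would be chosen from validation data.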
Brief History of A.I., ML and DL
John McCarthy: Princeton, Bell Labs, Dartmouth, later: MIT, Stanford
1955: “A proposal for the Dartmouth summer research project on Artificial Intelligence”
with Marvin Minsky (MIT), Claude Shannon (Bell Labs) and Nathaniel Rochester (IBM)
http://www.asiapacific-mathnews.com/04/0403/0015_0020.pdf
A step back: A.I. was coined over 60 years ago
Step 1: Great Algorithms + Fast Computers
http://nautil.us/issue/18/genius/why-the-chess-computer-deep-blue-played-like-a-human
1997: Playing Chess (IBM Deep Blue beats Kasparov)
Computer Science: 30 custom CPUs, 60 billion moves in 3 mins
“No computer will ever beat me at playing chess.”
Step 2: More Data + Real-Time Processing
http://cs.stanford.edu/group/roadrunner/old/presskit.html
2005: Self-driving Cars: DARPA Grand Challenge, 132 miles (won by Stanford A.I. lab*)
Sensors & Computer Science: video, radar, laser, GPS, 7 Pentium computers
“No computer will ever drive a car!?”
*A.I. lab was established by McCarthy et al. in the early 60s
Step 3: Big Data + In-Memory Clusters
2011: Jeopardy (IBM Watson)
In-Memory Analytics/ML: 4 TB of data (incl. Wikipedia), 90 servers, 16 TB RAM, Hadoop, 6 million logic rules
https://www.youtube.com/watch?v=P18EdAKuC1U https://en.wikipedia.org/wiki/Watson_(computer)
Note: IBM Watson received the question in electronic written form, and was often able to press the answer button faster than the competing humans.
“No computer will ever answer random questions!?”
Step 4: Deep Learning
“No computer will ever speak any language!?”
2014: Google (acquired Quest Visual)
Deep Learning: Convolutional and Recurrent Neural Networks, with training data from users
• Translate between 103 languages by typing
• Instant camera translation: Use your camera to translate text instantly in 29 languages
• Camera Mode: Take pictures of text for higher-quality translations in 37 languages
• Conversation Mode: Two-way instant speech translation in 32 languages
• Handwriting: Draw characters instead of using the keyboard in 93 languages
Step 5: Augmented Deep Learning
2014: Atari Games (DeepMind)
2016: AlphaGo (Google DeepMind)
Deep Learning + reinforcement learning, tree search, Monte Carlo, GPUs, playing against itself, …
https://deepmind.com
Go board has approx. 2E170 (2 × 10^170) possible positions.
trained from raw pixel values, no human rules
“No computer will ever beat the best Go master!?”
Step 6: A.I. Chatbots have Opinions too!
Microsoft had won the Visual Recognition challenge: http://image-net.org/challenges/LSVRC/2015/
What Will Change?
Today, Tomorrow
Better Data, Better Models, Better Results
Example: Fraud Prediction
H2O.ai - Makers of H2O
H2O - AI for Business Transformation
• Scalable and Distributed Data Science and Machine Learning: Deep Learning, Gradient Boosting, Random Forest, Decision Trees, Logistic Regression, Generalized Linear Modeling, K-Means, PCA, GLRM, …
• Fast, accurate, robust, proven, fully featured
• Apache v2 open source (github.com/h2oai)
Easy to Use and Deploy
• h2o.ai/download and run anywhere, immediately
• Client APIs: R, Python, Java, Scala, REST, Flow GUI
• Spark (cf. Sparkling Water), Hadoop, Standalone
• Java scoring code auto-generated
H2O.ai - Growing Rapidly
many many talents at H2O…
[Team photo labels: author data.table, r2d3.us, ceo, grammar of graphics, pure software, kaggle master, found Pentium bug, POSIX]
+ 10 more recent hires
h2o.ai/careers
High Level Architecture of H2O
[Architecture diagram labels]
HDFS, S3, NFS, Local, SQL
Distributed In-Memory: Parallel Parser, Lossless Compression
H2O Compute Engine: Exploratory & Descriptive Analysis; Feature Engineering & Selection; Supervised & Unsupervised Modeling; Model Evaluation & Selection; Predict; Data & Model Storage
Production Scoring Environment: Model Export (Plain Old Java Object), Data Prep Export (Plain Old Java Object), Your Imagination
LDAP, Kerberos, SSL, HTTPS, HTTP
Native APIs: Java, Scala. REST APIs: R, Python, Flow, JavaScript, Java

R:
library(h2o)
h2o.init()
h2o.deeplearning(x = 1:4, y = 5, training_frame = as.h2o(iris))

Python:
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
h2o.init()
iris = h2o.import_file("iris.csv")  # assumed path: a CSV with 4 predictor columns and a "Species" column
dl = H2ODeepLearningEstimator()
dl.train(x=list(range(0, 4)), y="Species", training_frame=iris)  # columns 0-3 are the predictors

Scala:
import _root_.hex.deeplearning.DeepLearning
import _root_.hex.deeplearning.DeepLearningParameters
val dlParams = new DeepLearningParameters()
dlParams._train = iris.hex
dlParams._response_column = 'Species
val dl = new DeepLearning(dlParams)
val dlModel = dl.trainModel.get

All heavy lifting is done by the backend!
Built-in interactive GUI and notebook - no coding necessary!
Easily Bring Models into Production
Gradient Boosting Machine Tree Model (nano-fast)
Auto-generated Java scoring code to easily operationalize Data Science
Spark + H2O = Sparkling Water
Sparkling Water 2.0
• Spark 2.0 API compatibility
• Use H2O algorithms in conjunction with, or instead of, MLLib algorithms on Spark
• Build Ensembles using H2O and MLLib Algorithms
• Visual Intelligence for Spark. Run Spark, MLLib, Scala in Flow
• Export MLLib models as POJOs
• Toolchain for ML pipelines and debugging support
Live H2O Deep Learning Demo: Predict Airplane Delays
10 nodes: all 320 cores busy
real-time, interactive model inspection in Flow
116M rows, 6GB CSV file, 800+ predictors (numeric + categorical)
model trained in <1 min: 2M+ samples/second
Deep Learning Model
Significant Performance Gains with Deep Learning
Predict departure delay (Y/N) on 20 years of airline flight data (116M rows, 12 cols, categorical + numerical data with missing values)
H2O Elastic Net (GLM): 10 secs, alpha=0.5, lambda=1.379e-4 (auto), AUC: 0.656
H2O Deep Learning: 45 secs, 4 hidden ReLU layers of 20 neurons, 1 epoch, AUC: 0.703 (higher is better, ranges from 0.5 to 1)
Features have non-linear impact
Feature importances
Chicago, Atlanta, Dallas: often delayed
10 nodes: Dual E5-2650 (8 cores, 2.6GHz), 10GbE
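For readers unfamiliar with the AUC metric quoted above, here is a minimal sketch of how it can be computed. It uses the pairwise definition: the fraction of (positive, negative) pairs in which the positive example gets the higher score, so 0.5 corresponds to random ranking and 1 to perfect ranking. The labels and scores are made up for illustration, and this is not how H2O computes AUC internally.

```python
def auc(labels, scores):
    """AUC = probability that a random positive outscores a random negative (ties count 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Made-up labels (1 = delayed) and model scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(auc(labels, scores))
```

The double loop is quadratic in the number of rows; on large data a sort-based or histogram-based computation would be used instead.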
So How Does It Work?
Distributed In-Memory Data Frames
• Data matrix is chunked into columnar blocks
• Algorithms can parallelize over these blocks
• Scalable to many TBs: Each node fills its memory with data
• Columns are separate entities (fast add/remove/modify)
• Similar to data frames in R, Pandas, and now also Spark
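The chunked, columnar layout described on this slide can be sketched with a toy column store. This is an illustration only, not H2O's data structure: the class name `ChunkedFrame`, its methods and the sample values are assumptions, and the chunks live in one process rather than being spread across nodes.

```python
class ChunkedFrame:
    """Toy column store: each column is kept separately and split into row chunks."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.columns = {}  # column name -> list of chunks (each chunk is a list of values)

    def add_column(self, name, values):
        # Columns are separate entities, so adding one does not touch the others.
        self.columns[name] = [
            values[i:i + self.chunk_size]
            for i in range(0, len(values), self.chunk_size)
        ]

    def remove_column(self, name):
        del self.columns[name]

    def map_chunks(self, name, fn):
        # Each chunk can be processed independently (here one after another).
        return [fn(chunk) for chunk in self.columns[name]]

frame = ChunkedFrame(chunk_size=3)
frame.add_column("distance", [100, 250, 80, 1200, 640, 90, 300])
frame.add_column("delayed", [0, 1, 0, 1, 1, 0, 0])
partial_sums = frame.map_chunks("distance", sum)
print(partial_sums, sum(partial_sums))
```

Because every chunk is processed on its own, the per-chunk calls could be handed to different cores or machines without changing the result.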
Parallel Parse into Distributed Rows
[Figure: a massively parallel parser reads N rows × p cols from HDFS, S3, NFS, SQL, … and distributes them as six blocks of N/6 rows × p cols]
Compute Paradigm: Fine-Grain Map/Reduce
[Figure: on the driver, the algorithm calls an M/R Task; map() calls run in parallel and their outputs are combined by a tree of reduce() calls into the final result]
Data Parallelism - all CPU cores are at work
map(): process data, reduce(): aggregate results
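The map()/reduce() split can be illustrated with a small sketch that computes a mean over chunked data. The function names and the sample chunks are assumptions, and a thread pool stands in for cluster nodes and CPU cores; H2O's own M/R Task is Java code and is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def map_chunk(chunk):
    """map(): process one chunk of data into a small partial result (sum, count)."""
    return (sum(chunk), len(chunk))

def reduce_pair(a, b):
    """reduce(): aggregate two partial results into one."""
    return (a[0] + b[0], a[1] + b[1])

def distributed_mean(chunks, workers=4):
    # Threads stand in for cluster nodes and CPU cores in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials)
    return total / count

chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(distributed_mean(chunks))
```

Here the partial results are reduced left to right for simplicity. Because `reduce_pair` is associative, the same partials could equally be combined pairwise in a tree.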
Implementation Details
• Distributed in-memory data store holds data, models, etc.
• Columnar compression (often better than gzip on disk)
• Low-level Java code (byte[], float[], bit operations, etc.)
• Data read/write access at memory bandwidth speeds
• Custom serialization, networking and execution layer
• Auto-generated REST client-server API (R, Python, Flow, …)
• Standalone scoring code auto-generated for every model
Distributed Gradient Boosting Machine
• H2O: First open-source implementation of scalable, distributed Gradient Boosting Machine - fully featured
• Parallelized Individual Tree Construction
• Discretization (binning) for speedup without loss of accuracy
[Figure: all data is split at "age < 25?" (Y/N) by finding the optimal split (feature & value) over age (12 to 118) and income (1k to 1M). Analytical error landscape: best split at age 25. H2O: discretized into bins, best split at age 25.]
Scalable Distributed Histogram Calculation
[Figure: on the driver, the algorithm calls the histogram method; map() calls on each node build local histograms, which are merged into a global histogram]
data parallelism - global histogram computation
local histogram - one each for w, w*y and w*y^2
w: observation weights, y: response
Same results as if computed on a single compute node
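A minimal sketch of this scheme: each node builds a local histogram holding per-bin sums of w, w*y and w*y^2, and the local histograms are added bin by bin to form the global one. The function names, the bin layout and the sample rows are assumptions made for illustration; the nodes are simulated by plain lists.

```python
def local_histogram(rows, lo, width, n_bins):
    """map(): per-bin sums of w, w*y and w*y^2 for one node's rows (x, y, w)."""
    hist = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for x, y, w in rows:
        b = min(int((x - lo) / width), n_bins - 1)
        hist[b][0] += w
        hist[b][1] += w * y
        hist[b][2] += w * y * y
    return hist

def merge_histograms(a, b):
    """reduce(): add two histograms bin by bin."""
    return [[p + q for p, q in zip(bin_a, bin_b)] for bin_a, bin_b in zip(a, b)]

# Made-up rows of (x, y, w), split across three hypothetical nodes.
node_rows = [
    [(12, 1.0, 1.0), (24, 1.0, 2.0)],
    [(26, 0.0, 1.0), (45, 0.0, 1.0), (22, 1.0, 1.0)],
    [(90, 0.0, 3.0), (31, 2.0, 1.0)],
]
lo, width, n_bins = 10, 20.0, 6

global_hist = local_histogram(node_rows[0], lo, width, n_bins)
for rows in node_rows[1:]:
    global_hist = merge_histograms(global_hist, local_histogram(rows, lo, width, n_bins))

# Reference: the same histogram computed over all rows on a single node.
all_rows = [row for rows in node_rows for row in rows]
single_node_hist = local_histogram(all_rows, lo, width, n_bins)
print(global_hist == single_node_hist)
```

The comparison is exact here because the sample values are exactly representable as floats. With arbitrary floating-point data the order of additions can change the last bits, so a tolerance-based comparison would be the safer check.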
User Based Insurance
“H2O is an enabler in how people are thinking about data.”
“We have many plans to use H2O across the different business units.”
Today’s Keynote!
Digital Marketing - Campaigns
“H2O gave us the capability to do Big Modeling. There is no limit to scaling in H2O.”
“Working with the H2O team has been amazing.”
“The business value that we have gained from advanced analytics is enormous.”
Matching TV Watching Behavior with Buying Behavior
“Unlike other systems where I had to buy the whole package and just use 10-20%, I can customize H2O to suit my needs.”
“I am a big fan of open source. H2O is the best fit in terms of cost as well as ease of use and scalability and usability.”
Insurance - Risk Assessment
“Predictive analytics is the differentiator for insurance companies going forward in the next couple of decades.”
“Advanced analytics was one of the key investments that we decided to make.”
Fintech - Fraud/Risk/Churn/etc.
“H2O is a great solution because it's designed to be enterprise ready and can operate on very large datasets.”
“H2O has been a one-stop shop that helps us do all our modeling in one framework.”
“H2O is the best solution to be able to iterate very quickly on large datasets and produce meaningful models.”
Today’s Keynote!
H2O Booklets
Come get your booklets at our booth!
R, Python, Deep Learning, GLM, GBM, Sparkling Water
Data Scientists Love This Stuff
H2O GBM Model Tuning Tutorial for R/Python/Flow
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/
KDNuggets Poll about Deep Learning Tools & Platforms
http://www.kdnuggets.com
H2O and TensorFlow are tied
usage of Deep Learning tools in past year
TensorFlow + H2O + Apache Spark = Anything is possible
In progress: Integration with GPU DL tools (TensorFlow/Caffe/mxnet/etc.)
https://github.com/h2oai/sparkling-water/blob/master/py/examples/notebooks/TensorFlowDeepLearning.ipynb
https://www.youtube.com/watch?v=62TFK641gG8
H2O.ai Now Focused On Experience Beyond Algorithms and Data
[Diagram labels: H2O, ALGORITHMS, EXPERIENCE, DATA, VERTICALS, DATA PRODUCTS]
• H2O Flow: Single web-based Document for code execution, text, mathematics, plots and rich media
• Visual Intelligence: UX and Interpretability for AI
• Steam: Elastic ML & AutoML
Operationalize Data Science
Steam - Automated Platform to Build and Scale Smart Data Products
DevOps / Data Engineers
Data Scientists
Advanced Data Scientists
Software Engineers
Application Software Engineers
DATA, BUSINESS, INSIGHTS
AI – Machine Learning, Automation, Scalability, Visualization
Coming Soon
H2O OPEN TOUR: WWW.OPEN.H2O.AI
We’re coming to a town near you in NYC/TX
Visit our Booth Today!
Summary
A.I. and Deep Learning are hot (again)! Make your own smart data products with H2O!
Try H2O today - installs in minutes!
h2o.ai/download
https://www.youtube.com/user/0xdata/videos
https://github.com/h2oai/h2o-3
H2O Google Group
@h2oai
We’re hiring: h2o.ai/careers/