Date posted: 11-May-2015
Category: Technology
Uploaded by: adatao-inc
Adatao Live Demo at the First Spark Summit Dec 2, 2013, San Francisco (Video at the end of this deck)
Christopher Nguyen, PhD, Co-Founder & CEO
DATA INTELLIGENCE FOR ALL
Hadoop distributed/streaming analytics, Yahoo Hadoop Eng, UIUC PhD
Machine learning & machine vision, US Army Research Lab, Johns Hopkins PhD
Big-Data Compute Engines, Google Apps Engineering Director, Google Founders’ Award, HKUST Prof, 2 successful enterprise exits, Stanford PhD
Deep engineering & business experience from Google, Yahoo et al. PhDs in data mining (DM) & machine learning (ML) from UIUC, Georgia Tech, Stanford, Berkeley, ...
Powerful In-Memory Data Mining & Machine Learning
Big Analytics Platform
BIG COMPUTE
BIG DATA (Hadoop HDFS, Cassandra, SQL DBMS, Streaming Data)
Business Users, Data Scientists, Data Engineers
Visually Beautiful
Interactive Data Exploration
Narrative Web App
BIG INSIGHTS
ONE Integrated Platform for Business & Data Science & Engineering
Architecture Design
One integrated platform for Business Users, Data Scientists, and Data Engineers
vs.
OTHERS: a separate stack for business users, a separate stack for data science, and a separate stack for data engineering
for Data Scientists & Engineers
Powerful In-Memory Data Mining & Machine Learning—Model Terabytes in Seconds
Interactive, Cluster-Scale Data Munging & Modeling with Native R, R-Studio, Python, SQL, and Java Front-ends
Real-Time Scoring Directly From Trained Models
Share reproducible, live data analysis documents
Hadoop, Cassandra, RDBMS, Streaming Data
Big Data Mining & Machine Learning for Business Users
A Beautiful New Way to Create & Share Visual Narratives of Your Analysis
Perform Ad Hoc Queries in Plain English
Publish Streaming, Interactive Dashboards
Collaborate With Others in Real Time
Query Terabytes in Seconds
Predictive Decision Making
Demo Deployment Diagram: a CLIENT, a MASTER, and several WORKER nodes
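A client/master/worker layout works well when each worker can shrink its share of the data to a small summary that the master combines. As a hedged illustration only (a generic technique, not Adatao's implementation), a simple linear model y = a + b*x can be fitted this way: each worker reduces its partition to the sufficient statistics n, Σx, Σy, Σx², Σxy, and the master sums them and solves the normal equations. The partitions, the thread pool standing in for WORKER nodes, and all function names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_stats(partition):
    """WORKER step (hypothetical): reduce (x, y) pairs to sufficient statistics."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return (n, sx, sy, sxx, sxy)


def combine(stats):
    """MASTER step: add the per-worker statistics element-wise."""
    return tuple(sum(col) for col in zip(*stats))


def fit_simple_ols(partitions, max_workers=4):
    """Fit y = a + b*x without shipping raw rows to the master.

    A thread pool simulates the WORKER nodes; on a real cluster each
    partition would be reduced on the machine that stores it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        stats = list(pool.map(partial_stats, partitions))
    n, sx, sy, sxx, sxy = combine(stats)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("x has zero variance; slope is undefined")
    b = (n * sxy - sx * sy) / denom  # least-squares slope
    a = (sy - b * sx) / n            # intercept from the means
    return a, b


# Two hypothetical partitions drawn from y = 1 + 2x:
print(fit_simple_ols([[(0, 1), (1, 3)], [(2, 5), (3, 7)]]))  # (1.0, 2.0)
```

Only five numbers per worker cross the network, however large the partition, which is why the result does not depend on how rows are split across workers.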
Demo Config
Cluster: 8 nodes, each with 8 cores, 30 GB RAM, and 1 TB disk
Data sets: 12 GB to 100 GB, 100M to 1B rows
Airline arrival data, 1988-2008, from the US Department of Transportation (DoT)
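A quick back-of-envelope check of the demo configuration, assuming the cluster line means 8 nodes that each have 8 cores, 30 GB RAM, and 1 TB disk. The `usable_fraction` headroom factor is a hypothetical parameter (memory left for the OS, JVM overhead, and intermediate results), and the on-disk data size is only a rough proxy for the in-memory size.

```python
# Assumption: per-node specs, 8 nodes in total.
NODES = 8
CORES_PER_NODE = 8
RAM_GB_PER_NODE = 30
DISK_TB_PER_NODE = 1


def cluster_totals(nodes, cores, ram_gb, disk_tb):
    """Aggregate capacity across all nodes."""
    return {
        "cores": nodes * cores,
        "ram_gb": nodes * ram_gb,
        "disk_tb": nodes * disk_tb,
    }


def fits_in_memory(dataset_gb, totals, usable_fraction=0.5):
    """Hypothetical rule of thumb: data should fit in a fraction of total RAM."""
    return dataset_gb <= totals["ram_gb"] * usable_fraction


def rows_per_core(rows, totals):
    """Rows each core handles if the data is spread evenly."""
    return rows / totals["cores"]


totals = cluster_totals(NODES, CORES_PER_NODE, RAM_GB_PER_NODE, DISK_TB_PER_NODE)
print(totals)                               # {'cores': 64, 'ram_gb': 240, 'disk_tb': 8}
print(fits_in_memory(100, totals))          # True (100 GB <= 120 GB)
print(rows_per_core(1_000_000_000, totals)) # 15625000.0
```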
Algorithms
- Linear models (LM) & supporting statistics (AIC, log-likelihood, R², cross-validation)
- Binning
- Classification metrics: confusion matrix, ROC, AUC, F1
- Logistic Regression with a reference level for categorical variables
- k-Means
- Random Forest
- Naive Bayes
- Linear SVM
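For readers unfamiliar with the classification metrics in the list, here is a minimal single-machine sketch of how a confusion matrix, F1, and ROC AUC are typically computed for 0/1 labels. It is a textbook formulation, not the platform's code; the function names and sample data are hypothetical. AUC is computed as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), which equals the area under the ROC curve.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, tn, fn) for labels assumed to be 0 or 1."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when there are no true positives)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC via pairwise ranking; O(P*N), fine for illustration only."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold
tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
print((tp, fp, tn, fn))                    # (2, 1, 2, 1)
print(round(f1_score(tp, fp, fn), 3))      # 0.667
print(round(roc_auc(y_true, scores), 3))   # 0.778
```

At cluster scale the pairwise AUC loop would be replaced by a sort on scores, but the quantity being computed is the same.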
Algorithm Roadmap
- Hierarchical Clustering
- Text Mining (token, POS, LDA, …)
- SVD
- Markov Chain Models
- Ensemble Models
- …
Thank you!
See the demo video at http://youtu.be/5UAdk7oHoPE?t=7m