Date post: | 26-Jan-2015 |
Category: | Technology |
Upload: | lucboudreau |
F**** around with Big Data and Predictive Analytics
Featuring Kettle, Weka & Hadoop.
© 2012, Pentaho. All Rights Reserved. pentaho.com. Worldwide +1 (866) 660-7555
Pentahuh?
What’s Pentaho exactly?
CENTRAL ADMINISTRATION, AUDITING & MONITORING
DELIVER When & Where Users Need It
STREAMLINE Information Delivery
VISUALIZE & Report Information In Any Style
ACCESS All Enterprise Data Sources
ISV & Packaged Applications
SaaS / Cloud Applications
EMBEDDED
Web
Mobile
STANDALONE
‣ Advanced & Predictive Analytics
DATA MINING
‣ Interactive
‣ Operational
‣ Enterprise
REPORTING
‣ Ad hoc Exploration
‣ Multi-Dimensional
ANALYSIS
‣ Interactive Metrics
‣ Rich Visualizations
DASHBOARDS
ERP / CRM / Enterprise Apps (e.g. SAP, Oracle)
Hadoop & NoSQL Data
Unstructured & semi-structured (XML, Excel, Files, etc.)
Relational Data Sources
Cloud (e.g. Salesforce, Amazon, Dell)
‣ Data Integration
‣ Graphical ETL Designer
INTEGRATE, CLEANSE, & ENRICH DATA
‣ In Memory Caching
‣ High Performance
ANALYTICS ACCELERATOR
‣ Direct Access
‣ Hadoop Clustering / Scheduling
‣ Instant OLAP Cubes
‣ Enterprise Scalability
We do open source analytics.
Why does Pentaho claim to have anything to do with Big Data??
Project Kettle: powerful Extraction, Transformation and Loading (ETL) capabilities
using an innovative, metadata-driven approach
Bring the code to the data
[Diagram build over three slides; labels: JDBC, Kettle]
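To make "bring the code to the data" concrete, here is a minimal sketch, not Kettle's actual mechanism. It assumes three hypothetical partitions standing in for rows stored on separate nodes, and uses threads to simulate those nodes. The names `PARTITIONS`, `sum_units_by_customer`, `merge` and `run_on_nodes` are invented for illustration. Each simulated node aggregates its own rows, and only the small partial results travel back to be merged.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows held by one node.
PARTITIONS = [
    [("c1", 5), ("c2", 3)],
    [("c1", 2), ("c3", 7)],
    [("c2", 4), ("c3", 1)],
]


def sum_units_by_customer(rows):
    """The 'code' shipped to a node: aggregate locally, return a small summary."""
    totals = {}
    for customer_id, unit_sales in rows:
        totals[customer_id] = totals.get(customer_id, 0) + unit_sales
    return totals


def merge(partials):
    """Combine the per-node summaries into one result on the client side."""
    merged = {}
    for partial in partials:
        for customer_id, subtotal in partial.items():
            merged[customer_id] = merged.get(customer_id, 0) + subtotal
    return merged


def run_on_nodes(task, partitions):
    """Simulate running the task where each partition lives (threads, not real nodes)."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(task, partitions))


totals = merge(run_on_nodes(sum_units_by_customer, PARTITIONS))
print(totals)
```

The point of the design is that raw rows never leave their partition. Only one small dictionary per node crosses the (simulated) network.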
Project Weka: a comprehensive set of tools for machine learning and data mining
Bring Weka to the data
[Diagram labels: Kettle, JDBC]
JDBC Services for Kettle: runtime optimization and SQL pushdown
A smart(er) JDBC Layer
[Diagram labels: Kettle, Kettle JDBC; queries shown in the diagram:]
SELECT CUSTOMER_ID, SUM(UNIT_SALES)
FROM SALES_FACT
WHERE AGE_GROUP_ID > 3
GROUP BY CUSTOMER_ID;
SELECT CUSTOMER_ID
FROM SALES_FACT;
SELECT CUSTOMER_ID, SUM(UNIT_SALES)
FROM SALES_FACT
WHERE AGE_GROUP_ID > 3
GROUP BY CUSTOMER_ID;
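The idea of SQL pushdown can be sketched as follows. This is an illustration only, not the Kettle JDBC layer: an in-memory SQLite database stands in for the data store, the sample rows are invented, and the function names `without_pushdown` and `with_pushdown` are hypothetical. Without pushdown the client pulls every row and filters and aggregates itself. With pushdown the `WHERE` and `GROUP BY` run inside the store and only the aggregated rows come back.

```python
import sqlite3

# Assumption: SQLite stands in for the remote store; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SALES_FACT ("
    "CUSTOMER_ID INTEGER, AGE_GROUP_ID INTEGER, UNIT_SALES INTEGER)"
)
conn.executemany(
    "INSERT INTO SALES_FACT VALUES (?, ?, ?)",
    [(1, 4, 10), (1, 5, 5), (2, 2, 8), (2, 4, 1), (3, 1, 9)],
)


def without_pushdown(conn):
    """Fetch every row, then filter and aggregate on the client."""
    totals = {}
    cursor = conn.execute(
        "SELECT CUSTOMER_ID, AGE_GROUP_ID, UNIT_SALES FROM SALES_FACT"
    )
    for customer_id, age_group_id, unit_sales in cursor:
        if age_group_id > 3:
            totals[customer_id] = totals.get(customer_id, 0) + unit_sales
    return totals


def with_pushdown(conn):
    """Let the store filter and aggregate; only summary rows are returned."""
    cursor = conn.execute(
        "SELECT CUSTOMER_ID, SUM(UNIT_SALES) FROM SALES_FACT "
        "WHERE AGE_GROUP_ID > 3 GROUP BY CUSTOMER_ID"
    )
    return dict(cursor)


print(without_pushdown(conn))
print(with_pushdown(conn))
```

Both functions return the same totals. The difference is how many rows cross the connection: all five in the first case, two in the second.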
The gains
• Job design and administration become trivial.
• Runs the rich Kettle plugin environment directly on the nodes.
• Performs much better than Hive.
• The JDBC layer is pretty neat.
The caveats
• True parallel machine learning algorithms are rare and hard to design.
• Not an actual production-ready design.
• Clients might have caches, which the Big Data (BD) store must notify of updates.
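The cache caveat can be illustrated with a small sketch. It assumes a simple in-process observer pattern; `BigDataStore` and `CachingClient` are hypothetical classes invented here, not part of Kettle or Hadoop. The store calls every subscribed listener on each write, and the client drops the affected cache entry so that its next read goes back to the store.

```python
class BigDataStore:
    """Hypothetical store that notifies subscribers whenever a key changes."""

    def __init__(self):
        self._data = {}
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
        # Notify every client so it can invalidate its cached copy.
        for listener in self._listeners:
            listener(key)


class CachingClient:
    """Hypothetical client that caches reads and invalidates on notification."""

    def __init__(self, store):
        self._store = store
        self._cache = {}
        store.subscribe(self._invalidate)

    def _invalidate(self, key):
        self._cache.pop(key, None)

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._store.get(key)
        return self._cache[key]


store = BigDataStore()
client = CachingClient(store)
store.put("sales_total", 100)
first = client.get("sales_total")   # read through and cache
store.put("sales_total", 250)       # the write triggers invalidation
second = client.get("sales_total")  # re-read from the store
print(first, second)
```

In a real deployment the notification would have to cross the network and tolerate lost messages, which is part of why the slide lists this as a caveat.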
Demo!
Thank you!
Join the conversation. You can find us on:
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
Want to learn more?
Learning Linear Models in Hadoop with Weka: http://markahall.blogspot.ca/2013/03/learning-linear-models-in-hadoop-with.html
Introduction to MapReduce with Pentaho Data Integration: http://www.youtube.com/watch?v=KZe1UugxXcs