
Big Data with KNIME is as easy as 1, 2, 3, ...4!

Date post: 09-Jan-2017
Upload: knimeslides

Copyright © 2015 KNIME.com AG

Big Data Science is just a Click Away!

Rosaria Silipo

KNIME.com


Variety, Volume, Velocity

Variety:

• integrating heterogeneous data (and tools)

Volume:

• from small files...

• ...to distributed data repositories (Hadoop)

• bring the tools to the data

Velocity:

• from distributing computationally heavy computations...

• ...to real time scoring of millions of records/sec.


Every Minute…


IoT


The Challenge


Energy Usage Prediction from Smart Meters Data

• Read Smart Meter Energy Data (176 million rows)

• Clean Up and Aggregate total Energy Usage by hour, day, week, month, year

• Calculate Behavioral Measures for each Smart Meter

• Cluster Smart Meters with Similar Behavior (k-Means)

• Predict Energy Usage in Clustered Smart Meters (Auto-Regressive Time Series Prediction)
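The aggregation and behavioral-measure steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the column layout (meter id, timestamp, kWh) and the `evening_share` measure are hypothetical and are not taken from the original workflow, which runs these steps as KNIME nodes.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical readings: (meter_id, timestamp, kWh used in that interval).
readings = [
    ("m1", "2010-01-04 08:00:00", 0.5),
    ("m1", "2010-01-04 08:30:00", 0.7),
    ("m1", "2010-01-04 20:00:00", 1.2),
    ("m1", "2010-01-05 08:00:00", 0.4),
    ("m2", "2010-01-04 08:00:00", 0.1),
    ("m2", "2010-01-04 23:30:00", 2.0),
]

def aggregate(rows, bucket_format):
    """Sum kWh per (meter, time bucket); bucket_format is a strftime pattern."""
    totals = defaultdict(float)
    for meter, stamp, kwh in rows:
        bucket = datetime.strptime(stamp, TIME_FORMAT).strftime(bucket_format)
        totals[(meter, bucket)] += kwh
    return dict(totals)

hourly = aggregate(readings, "%Y-%m-%d %H")
daily = aggregate(readings, "%Y-%m-%d")
monthly = aggregate(readings, "%Y-%m")

def evening_share(rows, meter):
    """Illustrative behavioral measure: share of a meter's usage at or after 18:00."""
    total = evening = 0.0
    for m, stamp, kwh in rows:
        if m != meter:
            continue
        total += kwh
        if datetime.strptime(stamp, TIME_FORMAT).hour >= 18:
            evening += kwh
    return evening / total if total else 0.0

print(daily)
print(evening_share(readings, "m1"))
```

Measures of this kind, computed per meter, are what a clustering step such as k-Means would then take as input.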


Workflow 1

Workflow 2

Workflow 3


Workflow 1: PrepareData


~ 2 days


Big Data


Big Data Support

• KNIME Big Data Access Nodes

– preconfigured connectors

– in database processing

• Big Data Platforms

– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, Actian, any big data platform really!

• Spark MLlib integration (coming soon)

• Streaming Executor (coming soon)


Hadoop Sandboxes

• Hortonworks:

http://hortonworks.com/products/hortonworks-sandbox/

• Cloudera:

http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html

• VirtualBox

https://www.virtualbox.org/

• VMware Player

http://www.vmware.com/


… as easy as 1, 2, 3, … 4

1. Access Big Data

2. Select Table

3. In-DB Processing

4. Into KNIME


1. Database Connector

Generic Database Connector

– Can connect to any JDBC source

– Register new JDBC driver via preferences page
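A generic JDBC connection needs two things: a driver class and a database URL. As a small sketch, the helper below builds a HiveServer2 URL of the usual form `jdbc:hive2://host:port/database` (the matching driver class is `org.apache.hive.jdbc.HiveDriver`). The function name and the host name are hypothetical, and other platforms use different URL schemes.

```python
def hive_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL: jdbc:hive2://host:port/database."""
    if not host:
        raise ValueError("host must not be empty")
    return f"jdbc:hive2://{host}:{port}/{database}"

# Hypothetical sandbox host name; replace it with your own server.
url = hive_jdbc_url("sandbox.example.com")
print(url)
```

Port 10000 is the usual HiveServer2 default; a given installation may be configured differently.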


1. Register JDBC Driver


Open KNIME and go to File -> Preferences

Increase the connection timeout for long-running retrieval operations


1. Dedicated Connectors

Dedicated pre-configured connectors

– Bundling necessary JDBC drivers

– Easy to use

– DB specific behavior/capability

Some dedicated connectors are part of the open source KNIME Analytics Platform, some belong to the commercial KNIME Big Data Extension


Works for most Hadoop Hive installations, including Hortonworks

free


2. Data Table Selection


3. In-Database Processing

• Filter rows and columns

• Join tables/queries

• Sort your data

• Write your own query

• Aggregate* your data


Similar Settings as GroupBy node

Similar Settings as Joiner node

* Database GroupBy node exposes DB specific aggregation methods
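The idea behind in-database processing is that each step only extends a SQL query, and the database does the actual work when the final query runs. The sketch below imitates that with plain Python functions. It assumes an in-memory SQLite database as a stand-in for Hive or Impala; the table names, column names and helper functions are hypothetical and are not the KNIME nodes themselves.

```python
import sqlite3

# In-memory SQLite stands in for the remote database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meter_data (meter_id TEXT, day TEXT, kwh REAL);
    CREATE TABLE meters (meter_id TEXT, region TEXT);
    INSERT INTO meter_data VALUES
        ('m1', '2010-01-04', 2.0), ('m1', '2010-01-05', 3.0),
        ('m2', '2010-01-04', 1.0), ('m2', '2010-01-05', 0.0);
    INSERT INTO meters VALUES ('m1', 'north'), ('m2', 'south');
""")

# Each helper wraps the incoming query as a subquery; nothing is executed yet.
def row_filter(query, condition):
    return f"SELECT * FROM ({query}) AS t WHERE {condition}"

def joiner(left, right, column):
    return f"SELECT * FROM ({left}) AS l JOIN ({right}) AS r USING ({column})"

def group_by(query, group_col, aggregation):
    return f"SELECT {group_col}, {aggregation} FROM ({query}) AS t GROUP BY {group_col}"

def sorter(query, column):
    return f"SELECT * FROM ({query}) AS t ORDER BY {column}"

query = "SELECT * FROM meter_data"                          # select table
query = row_filter(query, "kwh > 0")                        # filter rows
query = joiner(query, "SELECT * FROM meters", "meter_id")   # join tables/queries
query = group_by(query, "region", "AVG(kwh) AS avg_kwh")    # aggregate
query = sorter(query, "region")                             # sort

# Only here does the database do any work; only the small result is transferred.
result = conn.execute(query).fetchall()
print(result)
```

Building SQL by string formatting is acceptable for a fixed illustration like this one, but it is not safe with untrusted input.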


3. Queries for Average Measures


3. Average Monthly Values
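One plausible reading of this step is a two-level aggregation: total the usage per meter and calendar month, then average those monthly totals per meter. The sketch below shows that query under assumptions: SQLite replaces the big data platform, and the table and column names are hypothetical. Date functions are database specific, so `strftime` would be replaced by the equivalent function of the target platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meter_data (meter_id TEXT, reading_time TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO meter_data VALUES (?, ?, ?)",
    [
        ("m1", "2010-01-04 08:00:00", 1.0),
        ("m1", "2010-01-20 08:00:00", 3.0),
        ("m1", "2010-02-02 08:00:00", 4.0),
        ("m2", "2010-01-04 08:00:00", 2.0),
    ],
)

# Inner query: total usage per meter and calendar month.
# Outer query: average of those monthly totals per meter.
avg_monthly_sql = """
    SELECT meter_id, AVG(month_total) AS avg_monthly_kwh
    FROM (
        SELECT meter_id,
               strftime('%Y-%m', reading_time) AS month,
               SUM(kwh) AS month_total
        FROM meter_data
        GROUP BY meter_id, strftime('%Y-%m', reading_time)
    ) AS monthly
    GROUP BY meter_id
    ORDER BY meter_id
"""
avg_monthly = conn.execute(avg_monthly_sql).fetchall()
print(avg_monthly)
```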


4. Import Data from Database


< 30 min


New Big Data Platform?


No problem! Just change the connector node!


Other Useful Database Nodes

• Drop table

– missing table handling

– cascade option

• Execute any SQL statement

• Manipulate existing queries


Executes several queries separated by ; and a new line
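The statement-splitting convention can be sketched as follows: a script is cut wherever a semicolon is followed by a line break, and each piece is sent to the database in turn. This is a minimal illustration, assuming SQLite as a stand-in database; `split_statements` and the table name are hypothetical. `DROP TABLE IF EXISTS` plays the role of missing-table handling here; SQLite has no `CASCADE` option, so that part is not shown.

```python
import sqlite3

def split_statements(script):
    """Split a script on ';' followed by a line break, dropping empty pieces."""
    statements = []
    for part in script.replace("\r\n", "\n").split(";\n"):
        cleaned = part.strip().rstrip(";").strip()
        if cleaned:
            statements.append(cleaned)
    return statements

script = """DROP TABLE IF EXISTS tmp_usage;
CREATE TABLE tmp_usage (meter_id TEXT, kwh REAL);
INSERT INTO tmp_usage VALUES ('m1', 1.5);
"""

conn = sqlite3.connect(":memory:")
statements = split_statements(script)
for statement in statements:
    conn.execute(statement)  # one statement per call

rows = conn.execute("SELECT * FROM tmp_usage").fetchall()
print(len(statements), rows)
```

A semicolon inside a single line is not treated as a separator, so string literals that contain `;` stay intact as long as no line break follows them.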


KNIME Big Data Extension


KNIME Big Data Extension

• KNIME Big Data Access Nodes

– preconfigured connectors

– HDFS File Handling

– Hive/Impala Loader

• Big Data Platforms

– HDFS, Hive, Impala, HP Vertica, Hortonworks, ParStream, Actian, SAP HANA (planned), …

• Spark MLlib integration (coming soon)

• Streaming Executor (coming soon)


HDFS File Handling

• KNIME & Extensions -> KNIME File Handling Nodes

• HDFS Connection and HDFS File Permission nodes


Hive/Impala Loader


• Upload a KNIME data table to Hive/Impala


KNIME Big Data Extension: Download and Install

KNIME.com Extension Store

License Required!

Installation Instructions

http://tech.knime.org/installation-instructions

Product Description

http://www.knime.org/knime-big-data-extension


License on KNIME Store

http://tech.knime.org/knime-store

30-day trial license available with special Promotion [email protected]


References

• Whitepaper “KNIME opens the Doors to Big Data”

http://www.knime.org/files/big_data_in_knime_1.pdf

• Blog Post “Integrating Big Data is as Easy as 1,2,3, … 4”

http://www.knime.org/blog/integrating-big-data-is-as-easy-as-1-2-3-4

• The Big Data Extension Product Description

http://www.knime.org/knime-big-data-extension


Thank You!

[email protected]

• Twitter: @KNIME

• LinkedIn Group: KNIME

• KNIME Blog: http://www.knime.org/blog
