002 Ugm2013 Whats New Final

Date post: 12-Nov-2014
Page 1: 002 Ugm2013 Whats New Final

What’s new…

Bernd Wiswedel

KNIME.com AG, Zurich, Switzerland

Two feature releases last year: 2.6 & 2.7

Documented in the changelog, in the “What’s new summary” and as a video on YouTube

What’s new page on knime.org

KNIMETV YouTube Channel

Outline

Illustrative examples

• Swiss Survival Analysis

• KNIME Forum Analysis

• (Next Best Offer)

New Features in 2.6 & 2.7

Swiss Survival Analysis

• Survival Analysis / Actuarial Tables

• Using population and deaths data to predict longevity

• Creating the tables

• Investigating the tables

• Creating customer tables for:
  • Overall
  • Personal
  • Historical
  • Forecasting

• Make it easy to use for the non-expert!
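
The core of such an actuarial table can be sketched as a period life table. This is a minimal sketch, assuming single-year age groups, mid-year population and death counts per age, deaths falling on average halfway through the year, and an open-ended last age group; the function `life_table` and its inputs are hypothetical and not the workflow's actual implementation.

```python
def life_table(population, deaths, radix=100_000):
    """Simple period life table from per-age population and death counts.

    population[x] and deaths[x] refer to single-year age x; population must be > 0.
    The last age is treated as an open-ended group in which everyone dies.
    Returns one dict per age with m (death rate), q (probability of dying),
    l (survivors), d (deaths), L (person-years lived) and e (life expectancy).
    """
    last = len(population) - 1
    rows = []
    alive = float(radix)                    # l_x out of `radix` newborns
    for x in range(last + 1):
        m = deaths[x] / population[x]       # central death rate m_x
        # q_x from m_x, assuming deaths happen on average mid-year (open group: q = 1)
        q = 1.0 if x == last else min(1.0, m / (1 + 0.5 * m))
        died = alive * q
        rows.append({"age": x, "m": m, "q": q, "l": alive, "d": died})
        alive -= died
    for r in rows:
        if r["age"] == last:
            r["L"] = r["l"] / r["m"] if r["m"] > 0 else 0.0
        else:
            r["L"] = r["l"] - 0.5 * r["d"]  # survivors minus half of the year's deaths
    remaining = 0.0                         # T_x: person-years still to be lived
    for r in reversed(rows):
        remaining += r["L"]
        r["e"] = remaining / r["l"] if r["l"] > 0 else 0.0
    return rows
```

The remaining life expectancy `e` at each age is what a "personal" longevity table would look up for a given age.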

KNIME Forum Analysis

Learn something about the KNIME forum:

http://tech.knime.org/forum

Challenges:

• Get data into KNIME

• Extract simple statistics (how many posts, response time, response length)

• Classify topics and detect topic shifts

• Identify content and users

Forum Analysis – Get Data

Two alternatives:

• Connect to the underlying database and read the content
  • Doable but complicated and not generic: 7+ tables need to be read, prepared and joined
• Crawl the web page and parse the HTML
  • Use the XML parser and Palladian’s HTML retriever nodes
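
The crawling idea itself (fetch a listing page, pull out the links to discussion threads) can be illustrated outside KNIME. This is a minimal stand-in using only Python's standard `html.parser`, not the Palladian or XML nodes used in the workflow; the idea that thread links can be recognized by a URL prefix is an assumption, and `thread_links` is a hypothetical helper.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects (absolute URL, link text) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:          # text inside the current link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def thread_links(html, base_url, path_prefix):
    """Return links whose absolute URL starts with path_prefix (hypothetical URL pattern)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return [(url, text) for url, text in parser.links if url.startswith(path_prefix)]
```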

Forum Analysis – Structure of forum

Several categories: “KNIME General”, “KNIME Reporting”, “Palladian”, … (~20 in total)

Discussion threads on several sub-pages

Each thread consists of an initial post and a variable number of comments
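
Before analysis, this nested structure (category, thread, initial post plus comments) is usually flattened into one table row per post or comment. A minimal sketch of that data model, assuming hypothetical class and column names (`Entry`, `Thread`, `category`, `thread`, `is_comment`, `author`, `created`, `body`) rather than the crawler's real output columns:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Entry:
    """A single post or comment."""
    author: str
    created: datetime
    body: str


@dataclass
class Thread:
    category: str                                        # e.g. "KNIME General"
    title: str
    post: Entry                                          # the initial post
    comments: list[Entry] = field(default_factory=list)  # variable number of comments


def flatten(threads):
    """Return one row (dict) per post or comment; is_comment separates the two."""
    rows = []
    for thread in threads:
        for position, entry in enumerate([thread.post, *thread.comments]):
            rows.append({
                "category": thread.category,
                "thread": thread.title,
                "is_comment": position > 0,
                "author": entry.author,
                "created": entry.created,
                "body": entry.body,
            })
    return rows
```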

Forum Analysis – Crawler Flow

Input for all subsequent workflows!
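
Because the threads of a category are spread over several sub-pages, the crawler has to loop until it runs out of pages. A minimal sketch of such a loop, assuming a hypothetical `?page=N` URL scheme and caller-supplied `fetch` and `extract_links` functions (injected so the loop can be tried offline); it stops as soon as a page contributes no new links:

```python
def crawl_listing(fetch, extract_links, base_url, max_pages=1000):
    """Visit base_url?page=0, ?page=1, ... until a page yields no new links.

    fetch(url) -> str and extract_links(html) -> list[str] are supplied by the
    caller; the '?page=N' scheme is an assumption about the forum's URLs.
    """
    seen = []          # keeps crawl order
    seen_set = set()   # fast duplicate check
    for page in range(max_pages):
        html = fetch(f"{base_url}?page={page}")
        new = [url for url in extract_links(html) if url not in seen_set]
        if not new:
            break      # empty or repeated page: end of the listing
        seen.extend(new)
        seen_set.update(new)
    return seen
```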

Forum Analysis – Simple Statistics

Input table from crawler workflow

Meta nodes perform simple preprocessing, e.g. average number of active users per month

Many different reporting nodes with different statistics. Reporting extension to generate PDF, DOC, …

Number of active users per year

An active user is a user with at least one comment or one post in that year.
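
Under this definition the statistic is a count of distinct authors per calendar year, taken over posts and comments alike. A minimal sketch, assuming rows shaped like a flattened crawler table with hypothetical `author` and `created` columns; the sample rows are made up:

```python
from collections import defaultdict
from datetime import date


def active_users_per_year(rows):
    """Distinct authors with at least one post or comment, per calendar year."""
    authors_by_year = defaultdict(set)
    for row in rows:
        authors_by_year[row["created"].year].add(row["author"])
    return {year: len(authors) for year, authors in sorted(authors_by_year.items())}


rows = [  # made-up sample data
    {"author": "anna", "created": date(2011, 1, 5), "is_comment": False},
    {"author": "anna", "created": date(2011, 3, 1), "is_comment": True},
    {"author": "ben", "created": date(2011, 7, 9), "is_comment": True},
    {"author": "anna", "created": date(2012, 2, 2), "is_comment": False},
]
print(active_users_per_year(rows))  # {2011: 2, 2012: 1}
```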

Number of posts per year

Numbers are just posts (new discussion threads), not comments
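
Counting only thread-starting posts means filtering out comments before grouping by (year, month). A minimal sketch, assuming hypothetical `created` and `is_comment` columns in the flattened crawler table; the sample rows are made up:

```python
from collections import Counter
from datetime import date


def posts_per_month(rows):
    """Number of new discussion threads per (year, month); comments are skipped."""
    counts = Counter(
        (row["created"].year, row["created"].month)
        for row in rows
        if not row["is_comment"]
    )
    return dict(sorted(counts.items()))


rows = [  # made-up sample data
    {"created": date(2011, 1, 3), "is_comment": False},
    {"created": date(2011, 1, 20), "is_comment": False},
    {"created": date(2011, 1, 21), "is_comment": True},
    {"created": date(2011, 2, 1), "is_comment": False},
]
print(posts_per_month(rows))  # {(2011, 1): 2, (2011, 2): 1}
```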

Number of posts per month and year

Big increase in early 2011. Coincidentally, that is when Simon Richards (richards99) joined.

Who comments/answers on posts?

Response time
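
The slides do not define response time precisely. One plausible definition, used here purely as an assumption, is the time from the initial post to the first comment written by someone other than the thread's author, with unanswered threads left out. A minimal sketch with a hypothetical input shape and made-up data:

```python
from datetime import datetime
from statistics import median


def response_times(threads):
    """Hours until the first comment by someone other than the thread's author.

    threads: iterable of (author, posted_at, [(commenter, commented_at), ...]).
    Unanswered threads are skipped. This definition is an assumption.
    """
    hours = []
    for author, posted_at, comments in threads:
        replies = [when for who, when in comments if who != author]
        if replies:
            hours.append((min(replies) - posted_at).total_seconds() / 3600)
    return hours


threads = [  # made-up sample data
    ("alice", datetime(2012, 3, 1, 9, 0),
     [("alice", datetime(2012, 3, 1, 9, 30)), ("bob", datetime(2012, 3, 1, 12, 0))]),
    ("carol", datetime(2012, 3, 2, 8, 0), []),
]
print(median(response_times(threads)))  # 3.0
```

The median is used here because a few very late answers would otherwise dominate a mean.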

Number of comments per post

Forum Analysis – Classify Posts

• Use text mining to classify forum posts into categories such as ‘io’, ‘manipulation’, ‘mining’, …
• No training set available, so (mis-)use the KNIME node descriptions
• See the evolution of discussion topics over the years

We want to classify forum posts (only the first post, no comments)…

… using KNIME node description text as a labeled training set

Reads node descriptions from XML dumps (generated with the KNIME command line tool)

Uses the forum data input file and prepares it with text mining tools

Unzips an archive with all XML files into a temp location

XML files are read in a loop and preprocessed (header and footer removed)
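
The unzip-and-loop step can be sketched in plain Python. Instead of cutting header and footer lines, this sketch parses each XML file and keeps only the text of one description element. The element name `fullDescription`, the absence of XML namespaces, and the helper `read_node_descriptions` are all assumptions about the dump format, not the workflow's actual configuration.

```python
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def read_node_descriptions(archive, text_tag="fullDescription"):
    """Unzip `archive` into a temp directory and return {file stem: description text}.

    The element name 'fullDescription' is an assumption about the XML dump format;
    namespaced documents would need a namespace-aware tag instead.
    """
    result = {}
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(tmp)
        for path in sorted(Path(tmp).rglob("*.xml")):    # loop over all XML files
            root = ET.parse(path).getroot()
            elem = root.find(f".//{text_tag}")
            if elem is not None:
                # itertext() drops the markup; split/join normalizes whitespace
                result[path.stem] = " ".join("".join(elem.itertext()).split())
    return result
```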

The description is converted into a KNIME text document, from which (stemmed) terms are extracted
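
Term extraction boils down to tokenizing, dropping stop words and reducing words to a common stem. This sketch uses a deliberately crude suffix-stripping rule as a stand-in for a real stemmer such as Porter's (which the KNIME text processing nodes offer); the stop-word list and suffix list are illustrative assumptions.

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for", "with", "this", "that"}
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s")  # crude, ordered longest-first


def stem(word):
    """Very crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_terms(text):
    """Lower-case, tokenize on letters/digits, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(token) for token in tokens if token not in STOPWORDS]
```

For example, `extract_terms("Reading rows")` gives `["read", "row"]`.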

Training data extracted. Learning attributes are keyword occurrences; the target is the document category
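
A keyword-occurrence table can be sketched as a binary document-term matrix: a vocabulary built from the training documents, one 0/1 column per term, and the category as the target. A minimal sketch with hypothetical helper names, assuming documents are already reduced to term lists:

```python
from collections import Counter


def build_vocabulary(term_lists, min_docs=1):
    """Sorted list of terms that occur in at least `min_docs` documents."""
    doc_freq = Counter()
    for terms in term_lists:
        doc_freq.update(set(terms))      # count each term once per document
    return sorted(term for term, n in doc_freq.items() if n >= min_docs)


def occurrence_vector(terms, vocabulary):
    """1 if the vocabulary term occurs in the document, else 0; unknown terms are ignored."""
    present = set(terms)
    return [1 if term in present else 0 for term in vocabulary]


def training_table(docs):
    """docs: list of (terms, category) pairs -> (vocabulary, rows, targets)."""
    vocab = build_vocabulary([terms for terms, _ in docs])
    rows = [occurrence_vector(terms, vocab) for terms, _ in docs]
    targets = [category for _, category in docs]
    return vocab, rows, targets
```

Because `occurrence_vector` ignores terms outside the vocabulary, the same function can later turn forum posts into rows with exactly the training columns.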

Verify the model by splitting the data into train/test sets.

A random forest classifier is used to cope with the high dimensionality of this small (and sparse) data set
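
To show what the random forest does with such 0/1 keyword columns, here is a deliberately small from-scratch sketch: bootstrap samples, about sqrt(#features) random candidate features per split, Gini impurity, and a majority vote, plus a hold-out split for verification. It is a simplified illustration under those assumptions, not KNIME's Random Forest learner; all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def build_tree(rows, labels, depth, max_depth, n_feat, rng):
    """Grow one tree on 0/1 features. Leaves are labels; inner nodes are (feature, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    features = list(range(len(rows[0])))
    rng.shuffle(features)
    best = None
    for k, f in enumerate(features):
        # Examine n_feat random features; continue only while no valid split was found.
        if k >= n_feat and best is not None:
            break
        left = [i for i, row in enumerate(rows) if row[f] == 0]
        right = [i for i, row in enumerate(rows) if row[f] != 0]
        if not left or not right:
            continue
        score = (len(left) * gini([labels[i] for i in left])
                 + len(right) * gini([labels[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority  # no feature separates these rows
    _, f, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, n_feat, rng)

    return (f, grow(left), grow(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, left, right = node
        node = left if row[f] == 0 else right
    return node


def train_forest(rows, labels, n_trees=25, max_depth=6, seed=0):
    """Bagging plus random feature subsets (about sqrt(#features) per split)."""
    rng = random.Random(seed)
    n_feat = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in range(len(rows))]  # bootstrap
        forest.append(build_tree([rows[i] for i in sample], [labels[i] for i in sample],
                                 0, max_depth, n_feat, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - round(len(idx) * test_fraction)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


def accuracy(forest, rows, labels):
    hits = sum(predict_forest(forest, row) == label for row, label in zip(rows, labels))
    return hits / len(labels)
```

Random feature subsets keep individual trees from all latching onto the same few frequent keywords, which is one reason forests tend to cope well with many sparse columns and few rows.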

… continuing with the main input branch (input table from crawler workflow)

Preprocessing similar to before, extracting date, author, title, …

Extracting the attribute table using the keywords from the node description (training) data.

The remainder of the workflow ranks the predictions and prepares them for the report.
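
To see topic shifts, the per-post predictions can be aggregated into a ranked topic share per year. A minimal sketch, assuming predictions arrive as hypothetical (year, predicted topic) pairs; it is one way to prepare such a report table, not necessarily how the workflow ranks its predictions:

```python
from collections import Counter, defaultdict


def topic_share_by_year(predictions):
    """predictions: iterable of (year, topic) -> {year: [(topic, share), ...] ranked by share}."""
    per_year = defaultdict(Counter)
    for year, topic in predictions:
        per_year[year][topic] += 1
    ranked = {}
    for year in sorted(per_year):
        total = sum(per_year[year].values())
        ranked[year] = [(topic, n / total) for topic, n in per_year[year].most_common()]
    return ranked
```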

Hot topics have always been manipulation and mining … tasks that KNIME is very good at.

Note also the increase of ‘flowcontrol’ over the years and the low ‘r’ traffic (R has a separate forum category, not part of this data set)

Forum Analysis – Content & Users

• Look at individual categories (KNIME General, Developer, Reporting, …)
• Learn what is discussed
• See who is contributing

Input: all discussions in one forum category…

Output is a multi-page report with a tag cloud and a user connection graph

Combines KNIME’s text and network mining extensions

Input table from crawler workflow

Main loop over all ~20 categories

General statistics per category

User network analysis

Text analytics

Text analysis: forum posts are converted to documents and tagged (persons, node names, node categories)

Terms are fed into the tag cloud; colors represent persons (‘kilian’), nodes (‘bow creator’), node categories (‘xml’), …
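
Dictionary tagging followed by frequency counting is enough to drive such a tag cloud: the frequency sets the font size, the tag type sets the color. A minimal sketch; the dictionaries, the color palette and the helper names are hypothetical examples built from the terms mentioned on the slide, not the workflow's actual dictionaries:

```python
import re
from collections import Counter

# Hypothetical dictionaries; the workflow tags persons, node names and node categories.
DICTIONARIES = {
    "person": {"kilian"},
    "node": {"bow creator"},
    "node category": {"xml"},
}
COLORS = {"person": "red", "node": "blue", "node category": "green"}  # arbitrary palette


def tag_terms(posts):
    """Count dictionary matches over all posts; returns Counter {(term, tag): frequency}."""
    counts = Counter()
    for text in posts:
        # Normalize to lower-case words separated by single spaces so that
        # multi-word entries such as "bow creator" match regardless of punctuation.
        normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
        for tag, terms in DICTIONARIES.items():
            for term in terms:
                hits = len(re.findall(r"\b" + re.escape(term) + r"\b", normalized))
                if hits:
                    counts[(term, tag)] += hits
    return counts


def tag_cloud_entries(counts):
    """(term, size = frequency, color by tag type), most frequent first."""
    return [(term, n, COLORS[tag]) for (term, tag), n in counts.most_common()]
```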

Network analysis: user connections (content ignored)

Network analysis: ignore topics, only look at user relationships. Network nodes represent users; connections represent (directed) relationships between users
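
Such a user graph can be derived from the thread structure alone. This sketch assumes that an edge points from a commenter to the author of the thread's initial post, weighted by the number of comments, and drops self-replies; the edge direction and the input shape are assumptions, not necessarily what the workflow does.

```python
from collections import Counter


def user_graph(threads):
    """Directed, weighted edges commenter -> thread author; content is ignored.

    threads: iterable of (author, [commenter, ...]). Self-replies are dropped.
    """
    edges = Counter()
    for author, commenters in threads:
        for commenter in commenters:
            if commenter != author:
                edges[(commenter, author)] += 1
    return edges


def out_degree(edges):
    """How many distinct users each user has answered."""
    degree = Counter()
    for source, _target in edges:
        degree[source] += 1
    return degree
```

A high out-degree marks users who answer many different people, which is what makes a category look "dominated" in the graph.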

Network analysis: very simple user graph, visualized with the standard KNIME graph viewer

Data is collected and sent to the reporting extension

Multi-page PDF output for different forum categories

Text Mining forum category

RDKit (community chemistry extension)

KNIME Users – not dominated by any particular user

Reviewing all workflows

• All workflows rely on the same input data
• Requires a re-run of the “Crawler” workflow and updating parameters in the analysis flow

What do all these flows have in common?

They all require the “Crawler” data

• Better: Use a meta node and share it between all instances
Now use it in all the analysis flows

Nice … but now all workflows fetch the data each time they execute!

Let’s add a cache option.

Quickform node defining a switch:
- Get data from the web, or
- use the cached file (lives on the server)
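
The logic behind such a cache switch is small: with the switch on and a cache file present, read the file; otherwise fetch fresh data and refresh the cache. A minimal sketch in which `use_cache` plays the role of the Quickform switch; the JSON file format, the caller-supplied `fetch()` and the name `load_forum_data` are assumptions for illustration:

```python
import json
from pathlib import Path


def load_forum_data(fetch, cache_file, use_cache=True):
    """Return crawler rows either from a cached JSON file or by fetching them anew.

    use_cache mirrors the Quickform switch; fetch() and the JSON format are assumptions.
    """
    cache = Path(cache_file)
    if use_cache and cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    rows = fetch()
    cache.write_text(json.dumps(rows), encoding="utf-8")   # refresh the cache
    return rows
```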

Meta Node Templates

• Meta nodes as isolated functional units
• Shared on the KNIME Server (or teamspace) for use in other workflows or by other users
• Quickforms to expose relevant parameters in the meta node dialog or in wizard execution
• Can also be used on the KNIME Server…

KNIME Web Portal

NBO as a Typical Project

• Collect training data from multiple sources:
  - DB tables
  - text files
  - Excel files
  - SAS files
  - binary tables
  - map files
• Define file paths and parameters
• Train and evaluate a number of prediction algorithms to predict the variable Target
• Retrieve the old model that has been working decently so far
• Compare performances and choose the best model
• Recalculate predictions based on the best model and save them
• Save the best model
• Read current data
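
The "compare performances and choose the best model" step can be sketched as: score every candidate, including the previously working model, on the same evaluation data and keep the lowest error, letting the old model win ties so it is only replaced when actually beaten. The metric (mean absolute error in % of the actual value) is an assumption inspired by the report's "Mean Error in %", and the names are hypothetical.

```python
def mean_abs_pct_error(actual, predicted):
    """Mean absolute error in % of the actual values (assumed metric; actual must be non-zero)."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def choose_best_model(actual, predictions_by_model, incumbent="old model"):
    """Return (best model name, errors); the incumbent wins ties so it is only replaced when beaten."""
    errors = {name: mean_abs_pct_error(actual, preds)
              for name, preds in predictions_by_model.items()}
    best = min(errors, key=lambda name: (errors[name], name != incumbent))
    return best, errors
```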

NBO as an Example

• Collect training data from multiple sources
• Select the best prediction model
• Apply the best model to score data
• Select files and define parameters
• Build a report

NBO Report

(Chart: Mean Error in %)

e-mail notification

Global flow variables

Quickform dialogs

Execution wizard

File Upload Quickforms

Value Selection Quickform

Integer Input Quickform

“Workflow Stopped” light

Status

“Workflow Running” icon

“Workflow Running” light

Errors and Warnings

Report

Export report as

Results of past executions

New Features in 2.6 & 2.7 - Highlights

• Enhanced database functionality

• File Handling node collection

• More flexible R integration

• Streaming API

• Better (Java) scripting support

• Hypothesis testing nodes

• UI Changes

Enhanced DB functionality

• Database update and delete
• New type support: Boolean and Blobs
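
Conceptually, update and delete nodes turn each input row into a parameterized UPDATE or DELETE statement keyed on chosen columns. This sketch shows that pattern with Python's built-in `sqlite3` module, including a Boolean column (stored by SQLite as 0/1) and a BLOB column; it is an illustration of the SQL pattern, not the KNIME nodes' implementation, and the table and helper names are hypothetical.

```python
import sqlite3


def update_rows(conn, table, set_cols, where_cols, rows):
    """Run one parameterized UPDATE per input row (rows are dicts keyed by column name).

    Table and column names are formatted into the SQL, so they must come from
    trusted configuration; the values themselves are passed as parameters.
    """
    sql = (f"UPDATE {table} SET " + ", ".join(f"{c} = ?" for c in set_cols)
           + " WHERE " + " AND ".join(f"{c} = ?" for c in where_cols))
    params = [[row[c] for c in set_cols] + [row[c] for c in where_cols] for row in rows]
    with conn:  # commits on success, rolls back on error
        conn.executemany(sql, params)


def delete_rows(conn, table, where_cols, rows):
    """Run one parameterized DELETE per input row."""
    sql = f"DELETE FROM {table} WHERE " + " AND ".join(f"{c} = ?" for c in where_cols)
    with conn:
        conn.executemany(sql, [[row[c] for c in where_cols] for row in rows])


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # SQLite stores BOOLEAN values as 0/1 integers; bytes values are stored as BLOBs.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, active BOOLEAN, photo BLOB)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, True, b"\x89PNG"), (2, True, None)])
    update_rows(conn, "customers", ["active"], ["id"], [{"id": 1, "active": False}])
    delete_rows(conn, "customers", ["id"], [{"id": 2}])
    print(conn.execute("SELECT id, active, photo FROM customers").fetchall())
    # [(1, 0, b'\x89PNG')]
```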

File Handling Nodes

• Set of nodes to read, (un)zip, copy, move, convert, … files
• Adds the notion of uniform resource identifiers (URIs) and MIME types; used in 3rd party extensions
• Nodes to upload and download files: ssh, http, ftp, …
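
Treating files as URIs with a MIME type means a location is described by scheme, host and path, and its content type can be guessed from the extension. A minimal sketch of that idea with Python's standard `urllib.parse` and `mimetypes` modules; it is not KNIME's URI API, and `describe_resource` is a hypothetical helper.

```python
import mimetypes
from urllib.parse import urlparse


def describe_resource(uri):
    """Split a URI into scheme/host/path and guess its MIME type from the file extension."""
    parts = urlparse(uri)
    mime, _encoding = mimetypes.guess_type(parts.path)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path, "mime": mime}
```

The same description works regardless of whether the scheme is `ssh`, `http` or `ftp`, which is what lets upload and download nodes share one notion of a file location.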

Hypothesis Testing Nodes

• Collection of nodes to extract statistical measures
• Different t-tests
• ANOVA
• (Crosstab)
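
The test statistics behind such nodes are short formulas. This sketch computes Welch's two-sample t statistic with its approximate degrees of freedom and the F statistic of a one-way ANOVA using Python's `statistics` module; p-values (which need the t and F distributions) are left out, and this is an illustration of the formulas rather than the nodes' implementation.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def welch_df(a, b):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    return (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))


def one_way_anova_f(*groups):
    """F statistic of a one-way ANOVA: between-group vs. within-group mean squares."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)
```

For two groups with equal sizes and variances, the ANOVA F equals the square of the t statistic.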

Flexible R integration

• Before KNIME 2.7:
• With KNIME 2.7:

Scripting – Java Snippet & friends

• Enhanced functionality:
  • define multiple outputs at once
  • Script templates
• Better editor
  • Syntax highlighting
  • Auto completion

Streaming API

Enhanced programming interface in KNIME enabling nodes to be streamed and distributed.
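
The benefit of streaming is that rows flow through a chain of nodes one at a time, so downstream steps start before upstream ones finish and no intermediate table has to be materialized. A conceptual analogy using Python generators (not KNIME's Java streaming API; all names are hypothetical):

```python
from itertools import islice


def read_rows(n):
    """Source: yields rows one at a time instead of materializing a whole table."""
    for i in range(n):
        yield {"id": i, "value": i * i}


def filter_rows(rows, predicate):
    """Row filter: passes matching rows downstream as soon as they arrive."""
    for row in rows:
        if predicate(row):
            yield row


def add_column(rows, name, compute):
    """Column appender: emits each row with one extra computed column."""
    for row in rows:
        yield {**row, name: compute(row)}


# Chain the three steps; nothing runs until rows are pulled from the end of the chain.
pipeline = add_column(
    filter_rows(read_rows(10**9), lambda r: r["id"] % 2 == 0),
    "half",
    lambda r: r["value"] / 2,
)
first = list(islice(pipeline, 3))  # only the first few source rows are ever produced
print(first)
```

Even though the source nominally has a billion rows, only the handful needed for the first three results are ever generated.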

KNIME UI Changes

• KNIME Explorer replaces “Workflow Projects”
• Customizable node repository (getting from 1500+ nodes to <100)

Tons more …

Summary

Discussed KNIME usage examples; check the “Examples” server for even more

New functionality is constantly being added, thanks to the community, partners and customers

And more is coming…

