
1 © DYMATRIX CONSULTING GROUP KNIME Meetup Italia 2013

Big Data Analytics

Analysis of high-volume and unstructured Data

Stefan Weingaertner, DYMATRIX CONSULTING GROUP

KNIME Meetup Italia, 10th October 2013


Agenda

1 Company Introduction

2 Big Data - an Introduction

3 Big Data Analytics on high-volume Data

4 Big Data Analytics on unstructured Data

5 Live Demo: Advanced Email Classification

6 Q & A


Company Introduction


DYMATRIX – The analytical CRM Company

» Solution provider for Customer Intelligence, Marketing Automation and Advanced Predictive Analytics

» Consulting, development and implementation know-how, based upon more than 900 projects with mid- and large-cap companies across industries

» Goal- and client-oriented project execution based upon award-winning, established solutions

» Owner-managed and independent


Our Consulting Competence Centers

Business Intelligence
» Conception of (big) data warehouse and business intelligence architectures
» Enterprise Reporting Systems
» Dashboards
» Sales Controlling
» Planning & Forecasting
» Balanced Scorecard

Advanced Analytics
» Customer Segmentation
» Customer Value Analysis
» Propensity Modeling (Cross-/Upsell/Churn)
» Shopping Basket Analysis
» Credit Rating Analysis & Credit Scoring
» Text Mining
» Data Mining Automation
» Big Data Analytics

Campaign Management
» Design and Optimization of Campaign Processes and Workflows
» Implementation of Campaign Management Systems
» Integration of Data Mining Models in Campaign Processes
» Campaign Optimization
» Consulting & Implementation of Next Best Activity Processes

E-commerce insight
» Web Tracking
» Web Controlling
» Web Mining
» Real Time Recommendation
» Social Media Tracking & Analysis
» Web Performance Measurement
» Customer Journey Analytics

Analysis of client-oriented processes: Initial situation – Analysis – Conception of processes for customer retention and its optimization – customer reactivation and new customer activation – benchmarking against industry leaders


Solution Portfolio – The Customer Insight Suite

DynaCampaign

» Intelligent multi-touchpoint campaign management platform

» Planning, target group selection, execution and response measurement of campaigns

» Event-triggered realtime campaigning

DynaMine

» End2end automation of data mining processes

» Intelligent model management for automation of preprocessing, training & scoring of models

DynaCision

» Realtime decision management platform

» Design & execution of complex embedded decision processes

DynaSocial

» Social CRM platform to listen, track, identify and quantify customer needs and sentiments


Our KNIME Solution Nodes & KNIME Consulting Services

PMML2SQL / PMML2SAS Converter

» Convert PMML to executable SQL Code for In-Database-Scoring

» Convert PMML to executable SAS Code for Model Scoring within SAS

Big Data Integration

» Access any Hadoop large-scale distributed batch processing infrastructure from KNIME

» Efficiently distribute large amounts of data & preprocessing across a set of machines

Uplift Modeling

» Predictive Modeling Nodes to predict the incremental response to marketing actions

» For up-sell, cross-sell, churn and retention activities

Interactive Scorecard Builder

» Interactive Scorecard Building Nodes for Design of Credit or Marketing Scorecards

+ Business Consulting + Analytical Consulting + Technical Consulting + Trainings
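
To illustrate what converting a model into SQL for in-database scoring can look like, the sketch below renders a tiny hand-written decision tree as a nested SQL CASE expression. It is not the PMML2SQL node itself and it does not parse PMML: the tree layout, the column names (`age`, `monthly_spend`) and the table name `customers` are assumptions made purely for illustration.

```python
# Illustrative only: a tiny decision tree as nested dicts (not real PMML).
# Column names, thresholds and scores are hypothetical.
TREE = {
    "field": "age", "threshold": 30,
    "left": {"score": 0.10},                 # branch taken when age <= 30
    "right": {                               # branch taken when age > 30
        "field": "monthly_spend", "threshold": 50,
        "left": {"score": 0.25},
        "right": {"score": 0.60},
    },
}


def tree_to_sql(node):
    """Recursively render a tree node as a SQL CASE expression."""
    if "score" in node:                      # leaf: emit the score literal
        return str(node["score"])
    return (
        f"CASE WHEN {node['field']} <= {node['threshold']} "
        f"THEN {tree_to_sql(node['left'])} "
        f"ELSE {tree_to_sql(node['right'])} END"
    )


def scoring_query(tree, table):
    """Wrap the CASE expression in a SELECT so the database does the scoring."""
    return f"SELECT *, {tree_to_sql(tree)} AS score FROM {table}"


print(scoring_query(TREE, "customers"))
```

The point of this design is that the model travels to the data: the database evaluates the expression row by row, so no records have to be exported for scoring.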


References

Sectors: Telecommunication; Travel, Transportation; Retail, Service Provider


References

Sectors: Media; Banks, Insurances; Utilities, Industries, Public

Schwäbisch Hall


Big Data - an Introduction


A Characterization of Big Data

[Figure: Big Data characterized along three axes: Volume (Terabyte to Zettabyte), data structure (Structured to Structured & Unstructured) and processing mode (Batch to Streaming)]

Source: Understanding Big Data (Zikopoulos et al.), 2012


Challenge: Big Data Collection & Integration

[Figure: customer journey stages: Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember]

Source: Phil Winters, 2011


Big Data Analytics: Learn, Target & Influence!

[Figure: customer journey stages: Needs, Possibilities, Decisions, Approach, Purchase, Delivery, Usage, Service & Support, Remember]

Source: Phil Winters, 2011


Big Data Analytics on high-volume Data

[Figure: Big Data characterization diagram: Volume (Terabyte to Zettabyte), Structured to Structured & Unstructured, Batch to Streaming]


Big Data Access

[Figure: Hadoop stack diagram. Layers: Big Data Sources, Hadoop Core, Hadoop Extensions, Analytic Applications. Components shown: Hadoop Distributed File System (HDFS), MapReduce, Hive, HBase, Mahout, MapReduce Routines]
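
As a reminder of the programming model behind the MapReduce layer named on this slide, here is a minimal single-process sketch of a word count in Python. It only mimics the map, shuffle and reduce phases in memory and does not use Hadoop; the function names are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data analytics", "big data access"])
print(counts)
```

On a real cluster the map and reduce functions run in parallel on different machines and the shuffle moves data between them; the logic per record stays the same.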


Big Data Analytics

[Figure: Hadoop stack diagram. Layers: Big Data Sources, Hadoop Core, Hadoop Extensions, Analytic Applications. Components shown: Hadoop Distributed File System (HDFS), MapReduce, Hive, HBase, Mahout, MapReduce Routines, PMML2SQL Converter]


Big Data Analytics on unstructured Data

[Figure: Big Data characterization diagram: Volume (Terabyte to Zettabyte), Structured to Structured & Unstructured, Batch to Streaming]


Big Data is not just about structured data…

» 80% of the world’s data is unstructured.

» Unstructured data is growing at 15 times the rate of structured data.

Source: Google Trends April 6, 2012


Imagine…

» …to classify all customer-related text messages by
• Source / Origin
• Sentiment
• Product or Service
• Business Transaction
• Context
• etc.

» …to identify unknown trends

» …to identify cause and effect relations

» …to react to that information, e.g.
• Technical Problems
• Needs
• Usability
• Competition
• etc.

The KNIME platform supports these efforts with comprehensive Text Analytics & Network Analytics capabilities!


Deutsche Telekom: Social Earthquake

[Figure: Facebook Posts & Comments, March & April 2013. Counts over time (y-axis 0 to 1000, x-axis 1 March to 26 April), split by sentiment: Negative, Neutral, Positive]

» First rumours: limitation of bandwidth (21.3. – 23.3.)

» „DSL-Drossel“: official press release on the limitation of bandwidth leads to a social earthquake (22.4. – 27.4.)


DYMATRIX Text Mining Process


DYMATRIX Text Mining Process (KNIME Text Processing)

Text Datasources
» Datasources
• Facebook
• Twitter
• Emails
• Data Provider like GNIP, Datasift etc.
• Crawled Data
• etc.
» For Machine Learning
• Provide Training Data for Classification (e.g. Sentiment)

Text Enrichment
» Language Detection
• English
• German
• Many more…
» Language-specific NLP
» POS Tagging
• Penn Treebank Tagger
• STTS Tagger
» Text Cleansing
• Stop Words
• Punctuations
• Stemming
» Sentiment Amplifier
• Matching of Sentiment- & Emoticon-Dictionaries

Subject Matching
» Text Tagging with any Subjects
• Products
• Brands
• Business Transactions
• Service
• Complaints
• Requests
• etc.
» Fuzzy Matching with Dictionary Tagger
• Matching of Subject-Dictionaries

Sentiment Classification
» Text Vectorization
• Creation of text predictors to predict sentiments
» Machine Learning
• Classification with Predictive Analytics (e.g. Decision Tree)
» Retraining Interface
• Adjustment of misclassified messages for permanent optimization of classification

Information Delivery
» Text Data Mart
• Make information available in central Text Data Mart for visualization, alerting etc.
» Fields of Application
• Email-Routing
• Event triggered Campaign Management
• etc.


DYMATRIX Text Mining Process: Datasources

Process steps: Text Datasources > Text Enrichment > Subject Matching > Sentiment Classification > Information Delivery

Access any Text Datasource to start the Text Mining Process

» Facebook

» Twitter

» Emails

» Crawler

» Data Provider like GNIP, Datasift etc.

Example post on a Facebook fan page: Vodafone UK


DYMATRIX Text Mining Process: Text Enrichment

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Penn Treebank POS Tagger (English Messages)

Why[WRB] not[RB] sort[VBG] your[PRP] signal[VBP] issues [VBZ] out[IN] instead[RB] of[IN] bringing[VBG] new[JJ] phones[NNS]!!!![SYM] Wk[NNP] 3[CD] of[IN] crap[NN] but[CC] yet[RB] paying[VBG] FULL[NNP] monthly[RB] contract[NN] ![SYM] Vodafone[NNP] sort[VBG] it[PRP] .[SYM]

Removal of Stop Words & Punctuations

sort[VBG] signal[VBP] issues [VBZ] instead[RB] bringing[VBG] phones[NNS] Wk[NNP] 3[CD] crap[NN] paying[VBG] monthly[RB] contract[NN] Vodafone[NNP]

Sentiment Amplifier

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap [----] signal but yet paying FULL monthly contract! Vodafone sort it.
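
The cleansing and dictionary-matching steps on this slide can be sketched in a few lines of plain Python. This is not the KNIME Text Processing implementation: the stop word list and the sentiment dictionary below are tiny hypothetical stand-ins, and the score of -4 for "crap" is an assumption that simply mirrors the four minus signs in the slide's `[----]` marker.

```python
import string

# Hypothetical mini dictionaries; real stop word and sentiment lists are far larger.
STOP_WORDS = {"why", "not", "your", "out", "of", "new", "but", "yet", "it", "full"}
SENTIMENT = {"crap": -4, "issues": -1}

MESSAGE = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)


def tokenize(message):
    """Split on whitespace and strip surrounding punctuation from each token."""
    tokens = (t.strip(string.punctuation) for t in message.split())
    return [t for t in tokens if t]


def remove_stop_words(tokens):
    """Drop every token that appears in the stop word list (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def sentiment_hits(tokens):
    """Look up each token in the sentiment dictionary and keep the matches."""
    return {t: SENTIMENT[t.lower()] for t in tokens if t.lower() in SENTIMENT}


cleaned = remove_stop_words(tokenize(MESSAGE))
hits = sentiment_hits(cleaned)
print(cleaned)
print(hits)
```

A production pipeline would apply these steps after language detection, since stop words and sentiment terms differ per language.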


DYMATRIX Text Mining Process: Subject Matching

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Subject Matching (Fuzzy Matching)

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal [NETWORK] but yet paying FULL monthly contract! Vodafone sort it [COMPLAINT].

Resulting subjects

» BUSINESS TRANSACTION: Complaint

» NETWORK: No Signal

» PRODUCT: Nokia Lumia 925
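
A rough idea of fuzzy dictionary tagging, as named on this slide, is sketched below using `difflib.SequenceMatcher` from the Python standard library. The subject dictionary, the similarity threshold of 0.8 and the function names are assumptions for illustration; the KNIME Dictionary Tagger may well use a different similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical subject dictionary: dictionary term -> subject tag.
SUBJECTS = {
    "signal": "NETWORK",
    "reception": "NETWORK",
    "complaint": "BUSINESS TRANSACTION",
    "contract": "BUSINESS TRANSACTION",
}


def similarity(a, b):
    """Similarity ratio between two strings in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tag_subjects(tokens, threshold=0.8):
    """Return (token, tag) for every token close enough to a dictionary term."""
    tags = []
    for token in tokens:
        # Pick the dictionary term that is most similar to this token.
        term, score = max(
            ((t, similarity(token, t)) for t in SUBJECTS), key=lambda pair: pair[1]
        )
        if score >= threshold:
            tags.append((token, SUBJECTS[term]))
    return tags


# "signl" is a deliberate misspelling to show the fuzzy part of the match.
tagged = tag_subjects(["crap", "signl", "monthly", "contract"])
print(tagged)
```

Lowering the threshold catches more misspellings at the cost of more false matches, so it is usually tuned per dictionary.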


DYMATRIX Text Mining Process: Sentiment Classification

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

Output from Text Enrichment

Text Vectorization (Transformation)

Predictors relevant for Text Classification, e.g.
• Emoticons positive/negative
• Fragments positive/negative
• Words positive/negative
• Author-related Inputs
• Length of message
• Likes
• Comments
• Other linguistic Inputs

Text Classification with Decision Tree

Resulting Classification
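
The vectorization step above can be pictured as turning each message into a handful of numeric predictors that a classifier then consumes. The sketch below is a simplified illustration in plain Python: the word lists are hypothetical, and `classify` is a hand-written rule standing in for a trained decision tree, with split conditions invented for the example rather than learned from data.

```python
# Hypothetical word lists; a real system would use full sentiment dictionaries.
POSITIVE = {"great", "thanks", "love"}
NEGATIVE = {"crap", "issues", "problem"}

MESSAGE = (
    "Why not sort your signal issues out instead of bringing new phones out!!!! "
    "Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it."
)


def vectorize(message):
    """Turn a raw message into a small dictionary of numeric predictors."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    return {
        "length": len(message),
        "exclamations": message.count("!"),
        "positive_words": sum(w in POSITIVE for w in words),
        "negative_words": sum(w in NEGATIVE for w in words),
    }


def classify(features):
    """Hand-written stand-in for a trained decision tree (splits are made up)."""
    if features["negative_words"] > features["positive_words"]:
        return "negative"
    if features["positive_words"] > 0:
        return "positive"
    return "neutral"


features = vectorize(MESSAGE)
print(features, classify(features))
```

In practice the tree would be trained on labelled messages, and misclassified messages fed back through a retraining step to adjust it over time.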


DYMATRIX Text Mining Process: Information Delivery

Make information available in central Text Data Mart

Visualization in DynaSocial

Original Facebook Message

Why not sort your signal issues out instead of bringing new phones out!!!! Wk 3 of crap signal but yet paying FULL monthly contract! Vodafone sort it.

[Figure: DynaSocial view of the message with the columns Sentiment, Business Transaction, Product, Network and Relevance]

Other Fields of Application

» Subject-oriented Email-Classification & Email-Routing


DYMATRIX Text Mining Process: KNIME Workflow


Benefits


Benefits

» Text Enrichment & Classification Workflows can be used for classification of any electronic text message (e.g. Social Content, Blogs, Emails).

» KNIME Server-based Text Enrichment & Classification Workflows can be deployed as a webservice and called easily from any other application.

» Uniform Sentiment- and Classification-Handling for all customer-related messages.

» Batch- or Realtime-Execution from any application.

KNIME Server: Develop once, deploy everywhere!


Application Integration I: DynaSocial

Social Media Monitoring & Analytics


DynaSocial – Social Media Excellence Architecture

[Figure: architecture diagram with the following labelled components]
• Facebook
• Twitter
• Social Media Data Provider
• Social Service Platforms
• Client individual Sources
• Emails
• Social Media Analytics Content Extractor
• Social Media Analytics Data Management
• Generic Big Data Model
• Advanced Social Media Analytics: Text Mining & Network Mining
• Text Enrichment & Classification
• Network Insights
• Social Media Analytics Dashboard
• Social Engagement
• Integrated Social Inbox including all Social Touchpoints
• DynaSocial Configuration Center: Data Sources, Sentiments & Classifications, Reports & Dashboard


DynaSocial Management Dashboard

Activities

Sentiment Ratio

Key Influencer

Platform Distribution

Trends compared to competition (Share of Voice)

Geographic Distribution

Overall Sentiments

Top Keywords

Flexible Selection of Time Windows


DynaSocial Management Dashboard (Project Example)


Application Integration II: Advanced Email-Classification

Multidimensional realtime Email-Classification


Email Classification: MS Exchange Connector

Components: KNIME Server, Microsoft Exchange Webservice, .NET Batch, Microsoft Outlook, Microsoft Outlook Webaccess, Other Email-Clients

1 Incoming Email

2 Call .NET Procedure and transfer email contents to KNIME Server via Webservice Call.

3 Call KNIME Text Enrichment & Classification Workflows and return classification results.

4 Classification results are returned to Exchange Server and are saved persistently with object categories.

5 Any clients having access to Exchange Server get the same classification.
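
The round trip on this slide (email arrives, a batch job sends its content to a classification service, the result is stored centrally so every client sees it) can be sketched as follows. Everything here is a hypothetical stand-in written in Python: no real Exchange, .NET or KNIME Server API is used, and the keyword rules inside `knime_classify` merely take the place of the actual classification workflows.

```python
# All names below are hypothetical stand-ins; no real Exchange or KNIME Server API is used.

def knime_classify(email_body):
    """Stand-in for the web service call to the classification workflows."""
    text = email_body.lower()
    return {
        "subject": "Complaint" if "complain" in text or "refund" in text else "Request",
        "sentiment": "negative" if "not working" in text else "neutral",
    }


def batch_classify(mailbox, classify=knime_classify):
    """Send each unclassified email for classification and store the result."""
    for email in mailbox:
        if "categories" not in email:
            result = classify(email["body"])               # transfer content, get results
            email["categories"] = sorted(result.values())  # persist on the mail server
    return mailbox


# An in-memory list plays the role of the server-side mailbox.
mailbox = [{"id": 1, "body": "My router is not working, I want a refund."}]
batch_classify(mailbox)

# Every client reading the mailbox now sees the same stored categories.
print(mailbox[0]["categories"])
```

Storing the categories on the server rather than in a client is what keeps the classification consistent across Outlook, web access and other email clients.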


Live Demo

Realtime Email-Classification


Q & A


Thank you for your attention. We are happy to answer any of your questions!

Contact

DYMATRIX CONSULTING GROUP GmbH
Zeppelin Carré
Lautenschlagerstrasse 2
D-70173 Stuttgart

Your Contact: Stefan Weingaertner
Phone: +49.711.22.007.88 - 12
Fax: +49.711.22.007.88 - 88
E-Mail: [email protected]
Web: www.dymatrix.de

