DATA SCIENCE PROCESS MODEL
20.11.14 The unbelievable Machine Company 3
Florian Dohmann Data Scientist @ *um
*um The unbelievable Machine Company GmbH
SPECIALIST FOR
CLOUD SERVICES
INTERNET APPLICATIONS
BIG DATA
FROM BERLIN
We create data
solutions.
From idea
to cable.
BORN
2008
DATA SCIENCE
2011
2013
2015
~100 Employees in Berlin & Vienna
Data Science
Custom Big
Data Analytics,
Machine
Learning & Co.
Workshops
& Training
Support in building
your own Data Science
Teams
= Full-Service
„We need a perfect
interplay of human &
machine intelligence“
Process?
http://en.wikipedia.org/wiki/Software_development_process#mediaviewer/File:Three_software_development_patterns_mashed_together.svg
Data Science Process Model: Stages
Data Science Process Model
“What is needed to provide our customers with a perfect search result?”
“Why does my conversion rate rise/fall?”
“How can we analyze 100GB+ of user data in a few hours for our recommender system?”
“How can we automatically tag our content to optimize targeting, advertising & search?”
“How can we segment customers based on interests?”
“In which context do customers talk about my products online?”
Data Science Process Model
“Why does my conversion rate rise/fall?”
Data Science Process Model: Stages & Lanes
Stages: Ideation / Storyline, Data Pool Creation, Explorative Analysis, Automated Analysis, Delivery & Storytelling, Production / Data Product
Lanes: Business Lane, Data Science Lane, Technology Lane
Stages: Ideation / Storyline, Data Pool Creation, Explorative Analysis, Automated Analysis, Delivery & Storytelling, Production / Data Product
Lanes: Business Lane, Data Science Lane, Technology Lane
Roles: Business Owner, Data Scientist, Machine Intelligence, System Engineer, Infrastructure/Cloud
Human-Machine Interplay
Data Science Process Model
created 2013 by *um - The unbelievable Machine Company GmbH
log files
~ 100 GB
~ 2 months
Sample: 200 K
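One way to draw a fixed-size sample from log files too large to hold in memory is reservoir sampling. The sketch below is only an illustration, not necessarily how the sample on this slide was produced. It assumes the sample unit is a log line and that "200 K" means 200,000 lines; the function name `reservoir_sample` and the file name in the usage comment are hypothetical.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length.

    Classic reservoir sampling (Algorithm R): one pass, O(k) memory.
    If the iterable holds fewer than k items, all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


# Hypothetical usage on a large access log (file name is an assumption):
# with open("access.log") as f:
#     sample = reservoir_sample(f, 200_000, seed=42)
```

Because the file is streamed line by line, memory use depends on the sample size rather than on the ~100 GB of input.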
80.18xxxx - - [19/Nov/2014:15:55:21 +0100] "GET /hd_de/checkout/onepage/success/ HTTP/1.1" 200 10341 "https://adomain/hd_de/checkout/onepage/" "Mozilla/5.0 (iPad; CPU OS 7_0_4 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11B554a Safari/9537.53" 1635552 https
Conversion Rate Based On ..
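A minimal sketch of how a log line like the one above could be turned into a conversion rate per group. Several things here are assumptions rather than details from the talk: the line is treated as Apache "combined" format with extra trailing fields that are ignored, a request to a path ending in `/checkout/onepage/success/` counts as a conversion, the client address stands in for a visitor, and the helper names (`parse_line`, `device_class`, `conversion_rate_by`) are hypothetical.

```python
import re
from collections import defaultdict

# Leading fields of the Apache "combined" log format; trailing extras are ignored.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Assumption: this path marks a completed checkout.
SUCCESS_SUFFIX = "/checkout/onepage/success/"


def parse_line(line):
    """Return the named fields of one log line as a dict, or None if it does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def device_class(agent):
    """Very rough device bucket derived from the user-agent string."""
    if "iPad" in agent:
        return "tablet"
    if "Mobile" in agent or "iPhone" in agent:
        return "phone"
    return "desktop"


def conversion_rate_by(records, key):
    """Share of distinct visitors (by client address) per group that converted."""
    visitors = defaultdict(set)
    converted = defaultdict(set)
    for rec in records:
        group = key(rec)
        visitors[group].add(rec["ip"])
        if rec["status"] == "200" and rec["path"].endswith(SUCCESS_SUFFIX):
            converted[group].add(rec["ip"])
    return {g: len(converted[g]) / len(visitors[g]) for g in visitors}
```

Passing a different `key` function (referrer, hour of day, landing page) gives the conversion rate "based on" that dimension without changing the rest of the code.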
Exploration
Pattern: 0000000000000000001000000000000
Pattern: 0000000000010
Pattern: 00000000011
…
1
01
1000
11
010
100000
001
10
111
0001
0000001
0100
100
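The slide does not say how these 0/1 patterns were built. One possible reading, shown here purely as an assumption, is that each pattern is one visitor's request sequence, with 1 marking a request that counts as a conversion and 0 any other request. Under that reading, a sketch for building and counting such patterns could look like this (the function names and the event tuple shape are hypothetical):

```python
from collections import Counter, defaultdict


def visit_patterns(events):
    """Build one 0/1 string per visitor.

    `events` is an iterable of (visitor_id, converted) pairs in chronological
    order; `converted` is True when the request counts as a conversion.
    """
    per_visitor = defaultdict(list)
    for visitor, converted in events:
        per_visitor[visitor].append("1" if converted else "0")
    return {visitor: "".join(bits) for visitor, bits in per_visitor.items()}


def pattern_counts(patterns):
    """Count how often each distinct pattern occurs across visitors."""
    return Counter(patterns.values())
```

Sorting the counts by frequency is a simple way to see which visit shapes dominate before any modelling is attempted.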
Automation
*umDataMonitor
Development
Thanks!
The unbelievable Machine Company GmbH Grolmanstr. 40
10623 Berlin
Contact:
Florian Dohmann
Tel. +49-30-889 26 56 – 36
Mobile +49-173 75 22 140
What we also
need ..
A Great Team
+ Algorithms
Machine Learning
Magic?
Machine Learning & Co.
+ Programming
Python & Co.
e.g. PYTHON
Hadoop
Hadoop & Co.
+ Stream
processing
Storm & Co.
+ Infrastructure
Dedicated
Cloud
Data visualisation
Visuals /
Interfaces
Thanks 2!