Data Science ≠ Machine Learning
Prof. Dr. Jens Dittrich
bigdata.uni-saarland.de
daimond.ai
twitter.com/jensdittrich
April 19, 2018
Prof. Dr. Jens Dittrich Data Science ≠ Machine Learning 1 / 12
Data Science (one possible View)
Application Domain
Machine Learning
A.I.
Big Data Management
Data Science
Data Mining
Statistics
Data Science (another View)
Artificial Intelligence
- Machine Learning
Data Management
Data Science
Data Mining
Statistics
Math
Programming
Visualization
Application Domain
The Data Science Cake
Artificial Intelligence / Machine Learning
Data Management
Data Mining
Application Domain
Ingredients:
- 50g statistics
- 120g linear algebra
- 200g programming
- 1kg visualisation
- 300g software engineering
Additional skills:
- creativity
- out of the box thinking
- grit
- team spirit
© istock.com sasilsolutions
Data Science = Three Thirds
Definition (Data Science)
Data Science =
1/3 Artificial Intelligence (⊃ Machine Learning) +
1/3 Data Mining +
1/3 Data Management.
Preach and get involved!
It is our job as a community to spread the word about this and get involved! Otherwise we will again witness the reinvention of the wheel (e.g., as with NoSQL).
Opportunity!
In Data Science there is tremendous opportunity for data management.
Translates to:
In Data Science there is tremendous opportunity for us!
The Data Science Pipeline/Waterfall Model
Databases in the waterfall model?
data collection
data acquisition
data profiling, exploration & visualization
data cleaning
feature engineering
modeling
model training
model testing
result interpretation
Note: Data transformation can take place at any point and is therefore not drawn here as a separate step.
This is at the same time a process model and a dataflow.
AI/ML as the Goal
(Same pipeline as in the waterfall model above: data collection through result interpretation.)
AI & ML
Data collection through data cleaning has a single goal here: enable AI & ML.
Data Curation/OLAP as the Goal
(Same pipeline as in the waterfall model above: data collection through result interpretation.)
Database Management System (DBMS)
Data collection through data cleaning has a single goal here: enable data curation/OLAP.
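As a rough illustration of this variant, the sketch below loads collected records into a DBMS, cleans them there, and then answers an OLAP-style aggregate query. It uses Python's built-in `sqlite3` module as a stand-in DBMS; the `sales` table, its columns, and the rows are invented for the example and do not come from the slides.

```python
import sqlite3

# In-memory database standing in for the DBMS at the end of the pipeline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")

# data collection/acquisition: hypothetical raw records, one with a missing amount
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", 2017, 100.0),
        ("north", 2018, 150.0),
        ("south", 2017, 80.0),
        ("south", 2018, 120.0),
        ("south", 2018, None),
    ],
)

# data cleaning: remove records that cannot be aggregated
conn.execute("DELETE FROM sales WHERE amount IS NULL")

# OLAP-style query: total amount per region
rollup = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rollup)
conn.close()
```

No model is trained anywhere in this sketch: the curated, queryable table is itself the end product.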
Alternative: DBMS as an Intermediate Tool
In principle, this is possible at any step.
Examples:
- DeepDive, HoloClean
- other steps: MonetDB/TensorFlow marriage
- any data transformation: relational algebra-style, e.g. Pandas, stateless DBMS, Spark/Flink, etc.
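A "relational algebra-style" transformation can be sketched without any particular system. The example below is an assumption-laden illustration in plain Python, not the API of Pandas, Spark, or Flink: the helpers `select`, `project`, and `join`, and the `users`/`clicks` data, are hypothetical. Each operator is stateless, taking rows in and returning new rows, so operators compose freely at any pipeline step.

```python
def select(rows, predicate):
    # selection (sigma): keep rows satisfying the predicate
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    # projection (pi): keep only the named columns
    return [{c: r[c] for c in columns} for r in rows]


def join(left, right, key):
    # equi-join on a shared key column, implemented as a hash join
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical input relations
users = [{"uid": 1, "name": "ada"}, {"uid": 2, "name": "bob"}]
clicks = [
    {"uid": 1, "page": "home"},
    {"uid": 1, "page": "docs"},
    {"uid": 2, "page": "home"},
]

# pi_{name,page}( sigma_{page='home'}( users JOIN clicks ) )
features = project(
    select(join(users, clicks, "uid"), lambda r: r["page"] == "home"),
    ["name", "page"],
)
print(features)
```

In a dataframe library or a dataflow engine the same three operators would typically appear as filter, column selection, and merge/join calls; the composition pattern stays the same.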
Summary: Data Science = Three Thirds
Definition (Data Science)
Data Science =
1/3 Artificial Intelligence (⊃ Machine Learning) +
1/3 Data Mining +
1/3 Data Management.
Preach and get involved!
It is our job as a community to spread the word about this and get involved! Otherwise we will again witness the reinvention of the wheel (e.g., as with NoSQL).