How to Build a Successful Data Team - Florian Douetteau (@Dataiku)

Posted on 21-Jan-2017

How to Build a Successful Data Team

March 2016

Hi! I’m Florian Douetteau, CEO of Dataiku

It’s me!

It’s our software!

…and our software is the most complete Data Science platform

Deployment

Dataiku - Data Tuesday

Meet Hal Alowne

Big Guys
• $10B+ revenue
• 100M+ customers
• 100+ data scientists

Hal Alowne, BI Manager, Dim’s Private Showroom

“Hey Hal! We need a big data platform like the big guys. Let’s just do as they do!”
Dim Sum, CEO & Founder, Dim’s Private Showroom

Average e-commerce web site
• $100M revenue
• 1 million customers
• 1 data analyst (Hal himself)

Big Data Copycat Project

Technology Disconnect

Welcome to Technoslavia!

LOL PLATFORM ANTI-PATTERN

Test and Invest in Infrastructure == Skilled People
or
Go For Cloud / Packaged Infrastructure

Your brand new Hadoop cluster is perceived as slow, rarely used and unreliable

TECHNO MISMATCH ANTI-PATTERN

Assume Being Polyglot
or
Be a Dictator

The Python Clan vs. The R Tribe

The Old Elephant Fraternity vs. The New Elephant Club

PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY

Website: the 2000s winners

Companies that were able to release fast

"Artificial Intelligence with Data for Internet of Things": the 2010s winners

Companies able to put intelligence in production

?

Design a way to put “PREDICTIVE MODELS” IN PRODUCTION

PEOPLE DISCONNECT

Classic Business Intelligence Team Organization

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

BI Solution Architect

Model Designer

ETL Developer

Dashboard / Report Designer

DBA / IT Data Owner

Specs

Data Science Team Organization

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

Data Team Manager

Data Engineer

Data Analyst

Data System Engineer / Data Architect

Specs

Data Scientist

Built From Scratch

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

DBA / IT Data Owner

Specs

DATA SCIENTISTS EVERYWHERE

Built From Engineering

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

Specs

DATA ENGINEERS

DATA ANALYSTS

Built From Analysts

Business Leader Data Consumer

Line-of-business Data Consumer

Business Project Sponsor

Specs

Manage Expectations

Data Plumber

Data Engineer

Data Scientist

Data Waiter

Data Cleaner

Data Analyst

REAL JOB

DREAM JOB

Perfectly Natural Hidden thoughts

Business Project Sponsor

Data Team Manager

Data Engineer

Data Analyst

Data Scientist

Managing Extreme Personalities

Data SCIENTIST

Highly Creative

Passionate

Hard to hire?

Hard to manage?

Ambitious: want to take your job?

Paired for Data

Data Analyst: Discover patterns, fight data entropy

Data Engineer: Make things work, fight tech entropy

What do you prefer?

One analyst, one engineer and one data scientist who work together?

Four data scientists?

Data Disconnect

What is the main reason data projects fail?

DATA NOT AVAILABLE

BUT FOR ONLY INCREMENTAL GAIN

Contribution to the overall project performance:
- Business goal definition and data: 50%
- Feature engineering: 30%
- Algorithm: 20%

How to Get Data if you don’t have it

THE CICADA, THE SPIDER, THE FOX

The Cicada: Optimistic and Opportunistic Data

THE CICADA

As a startup
- Build a new product using open data

As a group inside a company
- Benefit from the data sharing initiative within your company
- Wait for data to be available in your data lake

The Spider: Power of the Network

THE SPIDER

As a startup
- Create a network of (web trackers | sensors)
- Make it available for free
- Build your service on people’s collected data

As a group inside a company
- Make a web service available to collect data
- Promote it internally so that people use it

The Fox: Hunt for the Big Money first

THE FOX

As a startup
- Hunt for a business group within a large company with a problem
- Build a SaaS solution using their data
- Replicate to competitors

As a group inside a company
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally

PRODUCT DISCONNECT

What is Big Data about?

The Age Of Distributed Intelligence

Global, Personalised and Real Time Data Driven Services

Data to Visualize or Data to Automate?

[Chart, 2013 to 2018: "Visualize To Decide" vs. "Automated Decision"]

Moving to a world of automated decision making

Involve product team

Product Feature: Personalised Item Ranking

Product Feature: Notify User Only when Needed

Product Feature: Historical Data For Path Optimisation

Have Product Management Deeply Involved In the Data Team

Where is your added value?

Is the problem at the core of my business process?

Is it a common problem, with shared data?

Go for a best-of-breed SaaS solution

Can I solve it on my own?

Really?

Build by the data team

Build by the data team ?

Build by the data team

Hire Consultants and Learn

Yes

Yes No

I can’t Ok, I can try

Yes!

No!

No

Be aware of the comfort zone

Mission Critical

Small, Structured

Large, Diverse

Sheer Curiosity

Reporting for Finance in Any Industry

Analyze Each Tweet

Web Navigation for E-Merchant

Ticket Data for Discounts in Retail

Phone Call Logs for Security

RTB Data for Advertising

Customer Consumption for Anti-Churn in Utilities

Optimization

Filings for Fraud in Insurance

Not Enough Data To Learn From?

Not Enough "Hard" Examples So That You Can Learn

Create an "API" Culture

Do not share
• Random pieces of code
• Flat files

Do share
• Reproducible, documented workflows
• Clean, documented APIs
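As a rough illustration of the "do share" side, here is a minimal Python sketch. The `Order` record and the `revenue_per_customer` function are hypothetical names invented for this example and do not come from the talk. Instead of handing a colleague a flat file and a loose script, the logic sits behind a small, typed, documented function that anyone on the team can call.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Order:
    """One order record (hypothetical schema, for illustration only)."""
    customer_id: str
    amount: float


def revenue_per_customer(orders: Iterable[Order]) -> Dict[str, float]:
    """Return the total revenue per customer.

    Args:
        orders: order records; amounts are assumed to share one currency.

    Returns:
        A mapping from customer_id to the sum of that customer's amounts.
    """
    totals: Dict[str, float] = {}
    for order in orders:
        # Accumulate each order into its customer's running total.
        totals[order.customer_id] = totals.get(order.customer_id, 0.0) + order.amount
    return totals
```

The docstring and type hints are the "documentation" here: a caller does not need to know where the orders were loaded from, only what goes in and what comes out.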

Food for thought: www.dataiku.com/blog

Free Data Science Software

www.dataiku.com/dss

THANK YOU!

Data Science Is no longer a science