Building a Data Lake - An App Dev's Perspective

Date post: 12-Apr-2017
Upload: geeknighthyderabad
Page 1: Building a Data Lake - An App Dev's Perspective

Building a Data Lake - An App Dev’s Perspective

GeekNight Hyderabad - March 8th 2017

Geetha Balasundaram

[email protected]

© 2017 ThoughtWorks Technologies Pvt. Limited

Page 2: Building a Data Lake - An App Dev's Perspective

ABOUT ME

Developer @ ThoughtWorks

Building a data lake in the enterprise ecosystem

Helping a retail business make sense of it (data-guided org)

Have been part of the web development space (enterprise rewrite)

As startled as everyone else by the data engineering space

Sharing know-how and do-how from our team’s experience


Page 3: Building a Data Lake - An App Dev's Perspective

AGENDA

What is data in the true sense…

Data Warehouse in an enterprise ecosystem...

What is a data lake...

Data lake implementation in an enterprise ecosystem…

How to make effective use of a data lake: technology+process+people

Cluster Administration tool - Cloudera Manager

Pitfalls to avoid


Page 4: Building a Data Lake - An App Dev's Perspective

Question ???

How did R.Ashwin perform in the last Test match?

HIGH LEVEL

PROBLEM STATEMENT


Page 5: Building a Data Lake - An App Dev's Perspective

COMPLEX HISTORICAL DATA

Why?

Exploit it and derive as many new insights as possible

Match Made

Enterprise systems produce this kind of complexity


Page 6: Building a Data Lake - An App Dev's Perspective

DATA WAREHOUSE

https://martinfowler.com/articles/microservices.html

ETL


Page 7: Building a Data Lake - An App Dev's Perspective

DID MICROSERVICES CAUSE THIS PROBLEM ?

Decentralised Data

https://martinfowler.com/articles/microservices.html

Page 8: Building a Data Lake - An App Dev's Perspective

MICROSERVICES HELPED

Break down business unit

Break down complexity

Understand the nature of data


Page 9: Building a Data Lake - An App Dev's Perspective

Question ???

R.Ashwin performed well ( 6/41 ) in yesterday’s match!

Complex historical data can quantify how well he has performed

Can we say why he did well in this particular match? What factors contributed to his improved performance?


Page 10: Building a Data Lake - An App Dev's Perspective

FACT is a FACT

… even when we don’t know how it can be used


Page 11: Building a Data Lake - An App Dev's Perspective

KEY DIFFERENCE

https://martinfowler.com/bliki/DataLake.html

Page 12: Building a Data Lake - An App Dev's Perspective

What is a data lake?


Page 13: Building a Data Lake - An App Dev's Perspective

LAKE is...

... a large body of water in a more natural state.

The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

https://jamesdixon.wordpress.com/2010/10/14/pentaho-hadoop-and-data-lakes/


Page 14: Building a Data Lake - An App Dev's Perspective

DATA LAKE is...

... a large body of data facts in a more natural state.

The contents of the lake stream in from a source to fill the lake, and various users of the lake can come to analyse, build models, or use a subset for specific use cases.


Page 15: Building a Data Lake - An App Dev's Perspective

KEY DIFFERENCE

https://martinfowler.com/bliki/DataLake.html

Page 16: Building a Data Lake - An App Dev's Perspective

Implementation


Page 17: Building a Data Lake - An App Dev's Perspective

OUR IMPLEMENTATION - TECH STACK

DATA SOURCE

DATA INGESTION

DATA LAKE

DATA MARTS

DATA ANALYSIS

Staging / Queue



Page 19: Building a Data Lake - An App Dev's Perspective

How to make effective use of a data lake:

technology+process+people


Page 20: Building a Data Lake - An App Dev's Perspective

Functionality Vs Reality

I need a feature so that I can do this action…

to

I need this insight so that I can take this action…

e.g. I need functionality to order items any time before or during a promotion…

to

…I need to know in time whether I have to order items before or during a promotion,

so that I can improve promotion sales

People


Page 21: Building a Data Lake - An App Dev's Perspective

Start Simple

There is no data lake yet…

Carve out portions of data that are easy wins yet critical to arriving at the insight stated earlier

Set up the infrastructure and pipeline

Get your hands dirty

e.g. Sales is an important factor when analysing or predicting anything in the retail space

Technology


Page 22: Building a Data Lake - An App Dev's Perspective

How much should I know about the data ?

As a consumer of data (read: not a consumer of a service)

How much should I know about it?

Schema ⇔ Contracts

Nature of the data:

versioned vs latest

transactional vs reference

facts vs aggregates

frequency of change

…..

Technology
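
One way to make these questions concrete is to record, for every dataset that enters the lake, a small descriptor a consumer can read. The sketch below is only an illustration under stated assumptions: the class, the field names and the example `sales` dataset are hypothetical and are not taken from the team's implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Versioning(Enum):
    VERSIONED = "versioned"  # every change is kept as a new record
    LATEST = "latest"        # only the current state is available


class Kind(Enum):
    TRANSACTIONAL = "transactional"
    REFERENCE = "reference"


class Grain(Enum):
    FACT = "fact"
    AGGREGATE = "aggregate"


@dataclass(frozen=True)
class DatasetDescriptor:
    """Hypothetical contract between a data producer and its consumers."""
    name: str
    schema: dict            # column name -> type name
    versioning: Versioning
    kind: Kind
    grain: Grain
    change_frequency: str   # free text, e.g. "per sale" or "daily"

    def missing_columns(self, record: dict) -> list:
        """Return the schema columns that a record does not carry."""
        return sorted(col for col in self.schema if col not in record)


# Assumed example dataset; the column names are illustrative only.
sales = DatasetDescriptor(
    name="sales",
    schema={"store_id": "int", "item_id": "int",
            "quantity": "int", "sold_at": "timestamp"},
    versioning=Versioning.VERSIONED,
    kind=Kind.TRANSACTIONAL,
    grain=Grain.FACT,
    change_frequency="per sale",
)

incomplete = sales.missing_columns({"store_id": 1, "item_id": 2})
```

Checking an incoming record against the descriptor is one cheap way to notice when a producer has changed the schema without telling its consumers.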


Page 23: Building a Data Lake - An App Dev's Perspective

DATA INSIGHT - Part 1

Incrementally add new data to the lake

Serve data for analysis

e.g. What promotions-related data do I need to bring into the data lake?

Sales → improve promotion sales

Technology


Page 24: Building a Data Lake - An App Dev's Perspective

DATA INSIGHT - Part 2

Sales + Promotions → improve promotion sales

How does adding more data to the lake help in arriving at new insights?

history of past promotion sales = how much to order for this promotion

history of past promotion sales + ‘X’ = how much to order for this promotion

history of past promotion sales + ‘X’ + ‘Y’ …… = how much to order for this promotion

e.g. seasonality has a strong correlation with sales

history of past promotion sales + ‘X’ + ‘Y’ …… + ‘A’ = how much to order for this promotion after it has started

People
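
The progression above can be sketched as an estimate that starts from history alone and is refined as each new factor arrives in the lake. This is a deliberately naive illustration: the deck does not say how ‘X’ and ‘Y’ are combined, so treating each factor as a multiplier, along with the function name and the sample numbers, is an assumption. A real model would be worked out with the data scientists.

```python
from statistics import mean


def estimate_order_quantity(past_promotion_sales, factors=None):
    """Mean of past promotion sales, scaled by each known factor.

    `factors` maps a factor name to a multiplier (assumed form);
    with no factors the estimate is the historical mean alone.
    """
    if not past_promotion_sales:
        raise ValueError("need at least one past promotion to estimate from")
    estimate = mean(past_promotion_sales)
    for multiplier in (factors or {}).values():
        estimate *= multiplier
    return round(estimate)


history = [100, 120, 140]  # made-up units sold in three past promotions

# history of past promotion sales
baseline = estimate_order_quantity(history)

# history of past promotion sales + 'X'
with_x = estimate_order_quantity(history, {"seasonality": 1.5})

# history of past promotion sales + 'X' + 'Y'
with_x_and_y = estimate_order_quantity(
    history, {"seasonality": 1.5, "weekend": 0.5}
)
```

The point is the shape rather than the arithmetic: each new kind of data added to the lake becomes one more input, and the same question can be answered again with a better estimate.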


Page 25: Building a Data Lake - An App Dev's Perspective

Think Agile

Sales + Promotions + X factor → improve promotion sales

Near perfect list of parameters

Progressive set of parameters

Sales + Promotions → is the quantity arrived at from these factors (known to the business) ordered on time?

Process


Page 26: Building a Data Lake - An App Dev's Perspective

DataMarts

... as a store of bottled water – cleansed and packaged and structured for easy consumption


Page 27: Building a Data Lake - An App Dev's Perspective

DataMarts

... as a store of data subsets - curated from meaningful facts and bundled into logical groups for arriving at useful insights
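
As a rough illustration of "curated and bundled", the sketch below derives a small mart from raw sale facts by keeping only promoted items and grouping them by item and day. The function name, the record layout and the sample facts are assumptions made for this example, not the team's pipeline.

```python
from collections import defaultdict


def build_promotion_sales_mart(sale_facts, promotion_item_ids):
    """Curate raw sale facts into units sold per (item, day) for promoted items."""
    totals = defaultdict(int)
    for fact in sale_facts:
        # Keep only the facts that matter for the promotion insight.
        if fact["item_id"] in promotion_item_ids:
            totals[(fact["item_id"], fact["date"])] += fact["quantity"]
    return dict(totals)


# Raw facts as they might sit in the lake (made-up values).
lake_facts = [
    {"item_id": "A", "date": "2017-03-01", "quantity": 3},
    {"item_id": "A", "date": "2017-03-01", "quantity": 2},
    {"item_id": "B", "date": "2017-03-01", "quantity": 7},
    {"item_id": "A", "date": "2017-03-02", "quantity": 1},
]

mart = build_promotion_sales_mart(lake_facts, promotion_item_ids={"A"})
```

The raw facts stay untouched in the lake, so a different mart can be cut from the same facts when a new question comes up.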


Page 28: Building a Data Lake - An App Dev's Perspective

Easy Insight

Sales + Promotions →

is the quantity arrived at from these factors (known to the business) ordered on time?

System: tells me the quantity that is supposed to be ordered for this promotion

System: tells me in real time the quantity that has been ordered

Technology
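
A minimal sketch of that comparison, assuming the expected quantity and the quantity ordered so far are already available as numbers. The function name and the shape of the result are hypothetical.

```python
def order_status(expected_quantity, ordered_quantity):
    """Compare what should be ordered with what has been ordered so far."""
    shortfall = max(expected_quantity - ordered_quantity, 0)
    return {
        "expected": expected_quantity,
        "ordered": ordered_quantity,
        "shortfall": shortfall,
        "on_track": shortfall == 0,
    }


behind = order_status(expected_quantity=180, ordered_quantity=150)
ahead = order_status(expected_quantity=180, ordered_quantity=200)
```

Surfacing the shortfall while the promotion is still open is what turns the two numbers into an insight someone can act on.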


Page 29: Building a Data Lake - An App Dev's Perspective

Cluster Administration Tool

Cloudera Manager


Page 30: Building a Data Lake - An App Dev's Perspective

Think DevOps

Scale | Performance | Memory | Resource Contention | Optimization | Stability

Need for an ecosystem to monitor how well the different tools play together without chaos

Tools


Page 31: Building a Data Lake - An App Dev's Perspective

QUICK RECAP

What is data in the true sense…

Data Warehouse in an enterprise ecosystem...

What is a data lake...

Data lake implementation in an enterprise ecosystem...

How to make effective use of a data lake…

Cluster Administration tool - Cloudera Manager


Page 32: Building a Data Lake - An App Dev's Perspective

PITFALLS TO AVOID

Data envy - Ref: https://martinfowler.com/bliki/Datensparsamkeit.html

Tool envy

Reliable data is a luxury

Understanding the nature of data is a must

Dialogue with the data scientist

Treating the data lake like an RDBMS

Keeping the business involved

Data flow state visibility
