
Build and Operationalize Enterprise Data Lake in Big

Date post: 21-Jan-2017
Upload: julius-remigio-cbip
Transcript
Page 1: Build and Operationalize Enterprise Data Lake in Big

© 2012 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1

Alex Garbarini, Data Lake Service Owner

Page 2

•  What is a Data Lake in Today’s Climate?

•  Starting the Data Lake Journey

•  Data Management Options

•  Automated Data Ingestion Pipelines

•  File System Layout and Security

•  Enterprise Processes

•  Cisco Operational Use Cases

Page 3

•  Data Lake - a place to store practically unlimited amounts of data of any format, schema and type that is relatively inexpensive and massively scalable. Data processing software like Hadoop can transform the data from its raw state to a finished product.

~ Revelytix

•  If you think of a datamart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

~ Pentaho

Page 4

Data Lake

Data Reservoir

Data Swamp

Data Ponds

“Tread carefully, you must, or the DARK side of the swamp you will find.”

Page 5

Can you taste the rainbow… of problems?

[Diagram: separate App and Data pairs, each siloed, on a shared Hadoop Platform]

Page 6

Initial Data Lake Objectives:

•  Eliminate Silos and Enable Data Reuse

•  Optimize Data Ingestion from Source Systems

•  Metadata Management

•  Data on Tap

•  Provide All (Useful, Enterprise only) Data on One Platform

[Diagram: separate App and Data pairs, each siloed, on a shared Hadoop Platform]

Page 7

[Diagram: separate App and Data pairs, each siloed, on a shared Hadoop Platform]

Key Decisions (Best Choice: It Depends!)

•  “Build In-House” or “Buy”

•  “Data Lake” or “Data Reservoir”

•  “Self-Serviced” or “Managed”

Page 8

[Diagram: separate App and Data pairs, each siloed, on a shared Hadoop Platform]

Key Decisions: Our Choice

•  “Build In-House” or “Buy”: Build In-House

•  “Data Lake” or “Data Reservoir”: Data Lake

•  “Self-Serviced” or “Managed”: Managed

Page 9

•  This translates to “Data on Tap.”

•  “Automated Data Ingestion Pipeline” sounds fancy, but it simply means creating easy ways for new data to be incorporated into the Lake.

•  Build for the most common data sources: relational, file system, streaming, web service…

•  Employ best practices (e.g., security, governance, compliance, impact assessment…).

Typical Process: Design → Develop → Implement → Audit → Normalize

Automated Process: Add an Entry to your Metadata → Get Coffee
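In the automated process, a single metadata entry replaces the hand-built design-to-normalize steps. The sketch below shows one possible shape for such an entry, with a validation step standing in for the best-practice checks (security, governance, ownership). It is an illustration under assumptions: the `SourceEntry` fields, the `validate` helper and the source-type names are hypothetical, not Cisco's actual schema.

```python
from dataclasses import dataclass

# Hypothetical source-type names, based on the slide's list of common sources.
SOURCE_TYPES = {"relational", "file_system", "streaming", "web_service"}


@dataclass
class SourceEntry:
    """One hypothetical metadata-repository entry describing a data source."""
    name: str
    source_type: str     # one of SOURCE_TYPES
    owner: str           # accountable data owner (governance)
    classification: str  # "public" or "restricted" (security)
    target_path: str     # where the data lands in the lake
    schedule: str        # e.g. "daily"; interpreted by the scheduler, not checked here


def validate(entry: SourceEntry) -> list[str]:
    """Return a list of problems; an empty list means the entry can be onboarded."""
    problems = []
    if entry.source_type not in SOURCE_TYPES:
        problems.append(f"unsupported source type: {entry.source_type}")
    if not entry.owner:
        problems.append("missing data owner")
    if entry.classification not in {"public", "restricted"}:
        problems.append(f"unknown classification: {entry.classification}")
    if not entry.target_path.startswith("/"):
        problems.append("target path must be absolute")
    return problems
```

An entry that passes `validate` is ready for ingestion; one that fails returns a readable list of what is missing, so the checks are enforced once in code rather than re-audited for every new feed.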

Page 10

•  Cisco solves this problem with an automated ingest engine driven from a metadata repository:

[Diagram: Data Sources flow into the Hadoop Platform, driven by the Metadata Repo and a Scheduler]
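A minimal sketch of how a metadata-driven ingest engine and a scheduler could fit together, assuming the repository stores a source type, target path and run interval for each feed. `METADATA_REPO`, `HANDLERS`, `run_due` and the `/data/...` paths are hypothetical, and the handlers are placeholders that return a description instead of moving any data.

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical metadata repository; in practice this would be a database table.
METADATA_REPO = [
    {"name": "orders", "type": "relational",
     "target": "/data/enterprise/sales/orders", "interval_hours": 24, "last_run": None},
    {"name": "web_logs", "type": "file_system",
     "target": "/data/enterprise/services/web_logs", "interval_hours": 1, "last_run": None},
]


def ingest_relational(entry: dict) -> str:
    # Placeholder: a real handler might run a bulk extract from the database here.
    return f"extracted {entry['name']} -> {entry['target']}"


def ingest_files(entry: dict) -> str:
    # Placeholder: a real handler might copy files into HDFS here.
    return f"copied {entry['name']} -> {entry['target']}"


# One generic handler per source type, rather than one hand-built job per feed.
HANDLERS: dict[str, Callable[[dict], str]] = {
    "relational": ingest_relational,
    "file_system": ingest_files,
}


def is_due(entry: dict, now: datetime) -> bool:
    """An entry is due if it never ran or its interval has elapsed."""
    last = entry["last_run"]
    return last is None or now - last >= timedelta(hours=entry["interval_hours"])


def run_due(repo: list[dict], now: datetime) -> list[str]:
    """Called by the scheduler: ingest every due entry and record the run time."""
    results = []
    for entry in repo:
        if is_due(entry, now):
            results.append(HANDLERS[entry["type"]](entry))
            entry["last_run"] = now
    return results
```

The design choice is that onboarding a new feed means adding a row to the repository, not writing a new job; an external scheduler (cron or a workflow scheduler) would only need to call `run_due` periodically.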

Page 11

•  Control and segregate ingested data by domain and by access level.

•  Security by design is better, but it is not always realistic or necessary to the same degree.

•  Keep PII, SOX-regulated and non-owned data under separate restrictions.

•  Understand the purpose of the data being imported, or it will have to be moved later. (This is the most common cause of a lake turning into a swamp.)

•  Store data in the format in which it will be consumed.

•  You can’t please everyone, so don’t compromise the implementation trying to; you will end up pleasing no one.

•  Use impact assessments to understand the security each dataset requires.
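The segregation rules on this slide can be expressed as a small decision function fed by an impact assessment. This is a sketch, assuming each import is described by the hypothetical `ImpactAssessment` fields below and lands under a hypothetical `/data/enterprise/<domain>/<level>/` layout; real criteria would come from an organization's own compliance policy.

```python
from dataclasses import dataclass


@dataclass
class ImpactAssessment:
    """Hypothetical answers captured before a dataset is imported."""
    domain: str         # e.g. "sales"; data is segregated by domain
    contains_pii: bool  # personally identifiable information
    sox_relevant: bool  # subject to Sarbanes-Oxley controls
    owned: bool         # False for data licensed from or owned by another party
    purpose: str        # why the data is being imported


def access_level(a: ImpactAssessment) -> str:
    """PII, SOX and non-owned data go under separate (restricted) controls."""
    if a.contains_pii or a.sox_relevant or not a.owned:
        return "restricted"
    return "public"


def landing_path(a: ImpactAssessment, dataset: str) -> str:
    """Segregate by domain first, then by access level (hypothetical layout)."""
    if not a.purpose.strip():
        # Importing data with no stated purpose is how a lake becomes a swamp.
        raise ValueError(f"no stated purpose for {dataset}")
    return f"/data/enterprise/{a.domain}/{access_level(a)}/{dataset}"
```

Refusing an import with no stated purpose mirrors the slide's warning: data whose purpose is unknown is data that will probably have to move.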

Page 12

[Diagram: multiple Apps sharing a single Data Lake on the Hadoop Platform]

Page 13

[Diagram: file system layout. Labels: Enterprise Data (Supply Chain, Services, Reference, Sales; Channels, Pre-Sales, Post-Sales …; Public and Restricted areas); Internal Data; External Data (LinkedIn, Twitter, Facebook); Projects]

•  Unix-level control:
   •  Data access groups for each restricted, final level
   •  Mode assignment for restricted groups

•  Simple metadata-driven access definitions

Permission modes (owner / group / other):
Public: RWX R-X R-X
Restricted: RWX R-X ---
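Because access definitions live in metadata, applying them can be mechanical. The sketch below assumes a hypothetical list of definitions (path, level, group), maps the slide's two modes to octal (`rwxr-xr-x` is 755, `rwxr-x---` is 750) and builds the corresponding `hdfs dfs -chgrp` and `hdfs dfs -chmod` commands without running them. The paths and group names are illustrative only.

```python
# Modes from the slide: Public = rwx r-x r-x, Restricted = rwx r-x ---.
MODES = {"public": "rwxr-xr-x", "restricted": "rwxr-x---"}


def to_octal(symbolic: str) -> str:
    """Convert a 9-character rwx string (owner, group, other) to octal, e.g. '755'."""
    digits = []
    for i in range(0, 9, 3):
        r, w, x = symbolic[i:i + 3]
        value = (4 if r == "r" else 0) + (2 if w == "w" else 0) + (1 if x == "x" else 0)
        digits.append(str(value))
    return "".join(digits)


# Hypothetical access definitions, as they might be stored in the metadata repository.
ACCESS_DEFINITIONS = [
    {"path": "/data/enterprise/sales/public", "level": "public", "group": "datalake"},
    {"path": "/data/enterprise/sales/restricted", "level": "restricted",
     "group": "sales_restricted"},
]


def permission_commands(definitions: list[dict]) -> list[list[str]]:
    """Build (but do not run) the HDFS shell commands that apply each definition."""
    commands = []
    for d in definitions:
        commands.append(["hdfs", "dfs", "-chgrp", "-R", d["group"], d["path"]])
        commands.append(["hdfs", "dfs", "-chmod", "-R", to_octal(MODES[d["level"]]), d["path"]])
    return commands
```

On a cluster edge node, each command list could be passed to `subprocess.run(cmd, check=True)`; generating the commands rather than executing them keeps the mapping from metadata to permissions easy to review and test.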

Page 14

Subscription

Data Catalogs

Self-Service

Automation

Self Healing

Page 15

Page 16

Thank You

