Post on 22-May-2020

HOW TO SELL AN AZURE DATA LAKE PROJECT FOR YOUR ORGANIZATION’S BENEFIT

Presented by:

Victor Karamalis

TTI Corp.

WHO I AM

20 Years on a broad range of Sectors in Information Technology Services

Education & Affiliations:

Master of Science in Management & Systems (NYU)

Project Management Professional (PMI.ORG)

Data Management International (DAMA.ORG)

Fellow, Royal Society of Arts, Manufactures & Commerce (thersa.org)

Large Scale Artificial Intelligence Projects with Multi-National Companies

System and Data Integrations in Enterprise ERP & IIoT

Innovative Proof of Concepts (PoC) with Formal Sponsor Support

Product Management with multiple global teams

Past Contributor in Leading Silicon Valley Tech Blogs

WHAT WE WILL COVER

1. DATA LAKE DESIGN: LEAN DATA GOVERNANCE, MACHINE LEARNING, NOSQL/MS-SQL

2. What is a Data Lake? Explanation of Azure Data Lake Storage GEN 2

3. USE CASE SCENARIO

4. AZURE Data Lake IaaS VS. PaaS

4.1. IaaS

4.2. PaaS

5. EXAMPLE IaaS Architecture

6. DEMONSTRATION BASED ON BASIC ACCOUNT SUBSCRIPTION

7. LESSONS LEARNED

THE ROSETTA STONE @ THE BRITISH MUSEUM

YOUR ORGANIZATION AS A ‘LEAN STARTUP’

“Somebody has a theory about what’s going to work and what the benefit will be. We don’t measure it. We don’t actually see if it did what we thought it was going to do. And we keep doing it. And then it doesn’t work, so we do something else. And then we layer on program after program that doesn’t actually meet its objectives. And if we actually brought in the mind-set that said, “No, actually we’re going to figure out if we actually accomplish what we set out to accomplish; and if we don’t, we’re going to change it,” that would be huge.”

-Eric Ries, Lean Startup

DATA MANAGEMENT LESSONS

Data Governance: must support business strategy and goals. An organization’s business strategy and goals inform both the enterprise data strategy and how data governance and data management activities need to be operationalized in the organization.

Must contribute to the organization by identifying and delivering on specific benefits

Formalized via Project Charter

Enterprise Data Architecture: Enterprise Data Model (EDM)

Data Flow Design

Maintain compliance throughout data lifecycle

HIPAA

GDPR

DPA UK

PIPEDA (Canada)

MACHINE LEARNING IN A NUTSHELL

Requires data scientists to teach the system how to learn

Suited to problems where good performance is difficult or infeasible using traditional programming techniques

The complete logic or formula to implement the solution is not known or does not currently exist

The data size is significant enough to require substantial compute

Business Questions Answered

Which products are likely to be bought together? Collaborative Filtering

How much, or what will be the number of...? Regression

Who are my best customers? Clustering

What will be the price of a stock in a month? Gradient Boosted Tree

Is fraud occurring? Decision Tree

Is that image a known intruder? Support Vector Machine (a supervised learning method)
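To make the "How much?" row concrete, here is a minimal sketch of regression using ordinary least squares in plain Python. The monthly sales figures and the names `fit_line`, `months`, and `units_sold` are illustrative assumptions, not data from this presentation.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: month number and units sold in that month.
months = [1, 2, 3, 4]
units_sold = [30, 50, 70, 90]

slope, intercept = fit_line(months, units_sold)

# Answer the business question: how many units next month (month 5)?
forecast_month_5 = slope * 5 + intercept
print(round(forecast_month_5, 1))
```

In a real project the same question would usually be answered with a library model trained on data read from the lake; the arithmetic above only shows what "regression" means in this table.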

AI VS. ML VS. DL

EXAMPLE OF RECOGNIZING A PICTURE

Artificial Intelligence

Requires one or more programmers to write all the code required for a computer to recognize a picture of an object (e.g. a cat).

Machine Learning

Requires data scientists to teach the system how to learn what a cat looks like by feeding images and correcting its analysis until the system becomes accurate.
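The "feed examples and correct until accurate" loop described above can be sketched with a perceptron, one of the simplest learning algorithms. This is an illustration only: the two numeric features (ear pointiness and whisker length) and all sample values are hypothetical stand-ins for real image data.

```python
def predict(weights, bias, features):
    """Return 1 (cat) if the weighted sum is positive, else 0 (not a cat)."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train(samples, labels, learning_rate=0.1, max_epochs=100):
    """Perceptron training: correct the weights after every wrong answer."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in zip(samples, labels):
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                # Nudge the weights toward the correct answer.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


# Hypothetical features: (ear pointiness, whisker length), scaled 0..1.
samples = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7),   # cats
           (0.1, 0.2), (0.2, 0.1), (0.3, 0.3)]   # not cats
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train(samples, labels)
predictions = [predict(weights, bias, s) for s in samples]
print(predictions == labels)
```

Nobody wrote a rule for what a cat looks like; the weights were adjusted from labeled examples, which is the distinction this slide draws.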

DEEP LEARNING

Divide the task of recognizing an object into different layers

1st layer of the algorithm learns to recognize a cat body part

2nd layer learns another cat body part

Final layer connects the previous layers

MACHINE LEARNING ALGORITHMS

NOSQL/MS-SQL MIGRATION OPTIONS

NO-SQL

SPARK

COUCHDB

HADOOP

COSMOS

RDBMS/SQL

AZURE SQL

MS-SQL SERVER

ORACLE

SAP

WHAT IS A DATA LAKE?

Unlike a data warehouse, a data lake is an organic store of data, kept without regard for the perceived value or structure of the data:

Unstructured

Semi-structured

Structured

A Data Warehouse is a highly structured store of data.
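The contrast above is often described as schema-on-read (lake) versus schema-on-write (warehouse). The sketch below illustrates that idea with two toy in-memory classes; `ToyDataLake`, `ToyWarehouse`, and the sample paths are hypothetical names for illustration and do not correspond to any Azure API.

```python
import json


class ToyDataLake:
    """Stores raw bytes under a path; structure is interpreted only when read."""

    def __init__(self):
        self._objects = {}

    def put(self, path, raw_bytes):
        self._objects[path] = raw_bytes  # no validation on the way in

    def read_json(self, path):
        return json.loads(self._objects[path])  # schema applied on read


class ToyWarehouse:
    """Accepts only rows that match a fixed schema (schema-on-write)."""

    def __init__(self, schema):
        self.schema = schema  # column name -> required Python type
        self.rows = []

    def insert(self, row):
        if set(row) != set(self.schema):
            raise ValueError("row columns do not match the schema")
        for column, expected_type in self.schema.items():
            if not isinstance(row[column], expected_type):
                raise ValueError(f"column {column!r} has the wrong type")
        self.rows.append(row)


lake = ToyDataLake()
lake.put("raw/app/events.json", b'{"user": "u1", "clicks": 3}')  # semi-structured
lake.put("raw/erp/orders.csv", b"id,total\n1,9.99\n")            # structured
lake.put("raw/sensor/frame.bin", bytes([0, 255, 16]))             # unstructured

warehouse = ToyWarehouse({"id": int, "total": float})
warehouse.insert({"id": 1, "total": 9.99})
try:
    warehouse.insert({"id": "one", "note": "free text"})
    rejected = False
except ValueError:
    rejected = True
```

The lake accepts all three payloads as-is, while the warehouse rejects the row that does not fit its model.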

Data Lakes Market segment by Type:

Data Discovery (Insight)

Data Integration and Management

Data Lakes Analytics

Data Visualization

WHAT MAKES A DATA LAKE SO GREAT?

Massive Scale: Petabyte scale, data accessible everywhere, growth on demand

Granular, Multi-layered Security: Granular security & protection against accidental data loss

Optimized for Maximum Performance: Extremely fast job execution

Integration Friendly: Supports multiple methods of data ingress, processing, egress, and visualization

Cost Effectiveness: Cloud economic model with the ability to intelligently manage costs

RICH DATA MANAGEMENT & GOVERNANCE (Standards Compliant & Available Everywhere)

A “NO COMPROMISES” DATA LAKE

A Secure, performant, massively scalable Data Lake Storage that brings the cost & scale of object storage together with the performance and analytics feature set of data lake storage

Secure

Manageable

Fast

Scalable

Cost Effective

Integration Ready

AZURE DATA LAKE STORAGE GEN 2: ADLS GEN 2

SECURE

Support for fine-grained Access Control Lists, protecting data at file & folder level

Multi-layered protection via at-rest storage service encryption and Azure Active Directory integration

MANAGEABLE

Automated Lifecycle Policy Management

Object Level Tiering

FAST

Atomic file operations mean jobs complete faster

SCALABLE

No limits on data store size

Global footprint (54 regions), including Government Clouds

COST EFFECTIVE

Object store pricing levels

File system operations minimize transactions required for job completion

INTEGRATION READY

Optimized for Spark & Hadoop analytic engines

Tightly integrated with Azure end-to-end analytics solutions
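As an illustration of automated lifecycle policy management and object-level tiering, the sketch below builds a lifecycle rule as JSON. The field names follow the Azure Storage lifecycle management policy format as commonly documented and should be verified against current Microsoft documentation before use. The rule name, the `raw/sensor` prefix, and the 30/90/365-day thresholds are assumptions chosen for the example.

```python
import json


def lifecycle_rule(name, prefix, cool_after_days, archive_after_days, delete_after_days):
    """Build one lifecycle rule that tiers and then deletes aging block blobs."""
    return {
        "enabled": True,
        "name": name,
        "type": "Lifecycle",
        "definition": {
            # Apply only to block blobs whose path starts with the prefix.
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
            "actions": {
                "baseBlob": {
                    "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                    "tierToArchive": {"daysAfterModificationGreaterThan": archive_after_days},
                    "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                }
            },
        },
    }


# Hypothetical policy: sensor data cools after 30 days, is archived after 90,
# and is deleted after one year.
policy = {"rules": [lifecycle_rule("ageOutRawSensorData", "raw/sensor", 30, 90, 365)]}
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON is the kind of document that would be attached to a storage account through the portal, CLI, or an ARM template, so that tiering happens without a scheduled job.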

GEN 1 STORAGE DIFFERENCES

Blob Storage

Large Partner Ecosystem

Global Scale: All 57 Regions

Durability Options

Tiered – Hot/Cool/Archive

Cost Efficient

Data Lake Store

Built for Hadoop

Hierarchical Namespace

ACL, AAD, & RBAC

Performance Tuned for Big Data

Very High Scale Capacity & Throughput
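The hierarchical namespace is what makes file- and folder-level ACLs meaningful: reading a file requires execute (traverse) permission on every parent folder plus read permission on the file itself. The sketch below is a deliberately simplified, POSIX-style illustration of that rule. It ignores groups, the mask entry, and RBAC role assignments, and the paths, user names, and ACL strings are hypothetical.

```python
def parse_acl(acl_text):
    """Parse 'user::rwx,user:alice:r-x,other::---' into {(scope, name): perms}."""
    entries = {}
    for entry in acl_text.split(","):
        scope, name, perms = entry.split(":")
        entries[(scope, name)] = perms
    return entries


def has_permission(acl_text, owner, user, wanted):
    """Simplified check: owner entry, then named-user entry, then 'other'."""
    entries = parse_acl(acl_text)
    if user == owner:
        perms = entries[("user", "")]
    elif ("user", user) in entries:
        perms = entries[("user", user)]
    else:
        perms = entries[("other", "")]
    return wanted in perms


def parent_folders(path):
    """Return every folder above the file, starting at the root."""
    parts = path.strip("/").split("/")
    return ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]


def can_read(path, acls, owner, user):
    """Need 'x' on each parent folder and 'r' on the file itself."""
    for folder in parent_folders(path):
        if not has_permission(acls[folder], owner, user, "x"):
            return False
    return has_permission(acls[path], owner, user, "r")


# Hypothetical ACLs: alice may read the sales file, everyone else may not.
acls = {
    "/": "user::rwx,other::--x",
    "/sales": "user::rwx,user:alice:r-x,other::---",
    "/sales/q1.csv": "user::rw-,user:alice:r--,other::---",
}

alice_can_read = can_read("/sales/q1.csv", acls, owner="admin", user="alice")
bob_can_read = can_read("/sales/q1.csv", acls, owner="admin", user="bob")
print(alice_can_read, bob_can_read)
```

In a flat object store there is no folder to attach the traverse permission to, which is one reason this kind of control is listed under Data Lake Store rather than Blob Storage.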

DATA LAKE DESIGN

Cloud/On-premises, Hybrid Cloud, Multi-Cloud (AZURE)

Storage (AZURE SQL DATA BLOB Storage)

Processing (AZURE DATA LAKE)

Data Management (AZURE DATA STORE)

Advanced Analytics, Enterprise Reporting, Apps (Power BI)

USE CASE SCENARIO

BUSINESS CONSIDERATIONS

SPONSOR/MANAGEMENT SUPPORT

AUGMENT DEFINED BUSINESS INSIGHTS

TIME TO MARKET FOR KEY INSIGHTS (aka AGILITY)

BUDGET CONSIDERATIONS

TECHNICAL SKILLS CONSIDERATIONS

MINIMAL DEPENDENCE ON IT FOR DRASTIC CHANGES

RIGIDITY OF SINGLE DATA MODEL

ABILITY TO HANDLE STREAMING DATA

SCALABILITY

MINIMAL SIZE FOR A BUSINESS ADLS PROJECT TEAM

Project Manager

Solution Architect

Data Engineer/Lead

Data Scientist

IAAS ADLS VS. PAAS: ADLS (GEN 2)

INGESTING DATA FROM VARIOUS SOURCES

MIGRATE FROM EXISTING ON-PREMISES DATA WAREHOUSE

MOBILE DATA

ERP DATA WAREHOUSE

APP DATA

SENSOR DATA

MASTER DATA

PROGRAMMATIC
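When data arrives from several sources, a common convention is to land each file untouched in a "raw" zone, partitioned by source and ingestion date, so that later jobs can find and reprocess it. The sketch below shows one such path scheme. The zone name `raw`, the source labels, and the function name `landing_path` are assumptions for illustration, not part of ADLS itself.

```python
from datetime import date

# Hypothetical source labels matching the list above.
KNOWN_SOURCES = {"warehouse", "mobile", "erp", "app", "sensor", "master"}


def landing_path(source, file_name, ingested_on):
    """Build a date-partitioned path such as raw/sensor/2020/05/22/readings.json."""
    if source not in KNOWN_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return f"raw/{source}/{ingested_on:%Y/%m/%d}/{file_name}"


sensor_path = landing_path("sensor", "readings.json", date(2020, 5, 22))
erp_path = landing_path("erp", "orders.csv", date(2020, 5, 22))
print(sensor_path)
print(erp_path)
```

Keeping the source and date in the path also makes it easier to scope ACLs and lifecycle rules to one feed at a time.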

MACHINE LEARNING SERVICES WITH LITTLE OR NO CODE

Run & Monitor Experiments

Register Models

Build Docker Images

Deploy Models

Create Pipeline

DEMONSTRATION ON AZURE FOR ADLS GEN 2

LESSONS LEARNED

The Soft Skills

Get Buy-In from Technical Staff (IT)

Ensure security policies are understood and only approved VMs are used

Ensure Business/Technical Stakeholders are informed regularly.

The Technical Matters

ADDRESS: DATA GOVERNANCE, INTEGRITY, SECURITY

ACCESS: ONLY DATA YOU NEED; REGULATIONS May Add Cost (Transport & Store)

ADD ALERTS: MONITORING FOR ANY & ALL VMS + SERVICES

AFTER FINISHED: SHUT DOWN V-NET, RESOURCE GROUPS

UPDATED: AUTHORIZED AD USERS, POLICIES

IMPORTANT URLS

Azure updates: https://azure.microsoft.com/en-us/updates/

Azure Blogs: https://azure.microsoft.com/en-us/blog/

Azure Data Lake Storage Gen2:

https://azure.microsoft.com/en-us/blog/under-the-hood-performance-scale-security-for-cloud-analytics-with-adls-gen2/

https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs#blob-storage-and-azure-data-lake-gen2

SPARK to SQL SERVER: https://docs.microsoft.com/en-us/sql/big-data-cluster/spark-mssql-connector?view=sql-server-ver15

AZURE V-NET: https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoints-overview

Exam Ref AZ-300: https://www.microsoftpressstore.com/store/exam-ref-az-300-microsoft-azure-architect-technologies-9780135802540

THANK YOU FOR COMING!

Contact information:

E: Victor@tticorp.tech

P: 954-707-7545