Infrastructure Agnostic Machine Learning Workload Deployment

Date posted: 21-Feb-2022
Page 1

Infrastructure Agnostic Machine Learning Workload Deployment

Abi Akogun Data Science Consultant (MavenCode)

Charles Adetiloye ML Platforms Engineer (MavenCode)

Page 2

About MavenCode

MavenCode is an Artificial Intelligence solutions company located in Dallas, Texas. We provide training, product development, and consulting services in the following areas:

● Provisioning Scalable Data Processing Pipelines on Cloud Infrastructure

● Development & Deployment of Machine Learning and Artificial Intelligence Platforms

● Streaming and Big Data Analytics, Edge IoT and Sensors

Page 3

About The Presenters

Charles Adetiloye is an ML Platforms Engineer at MavenCode. He has well over 15 years of experience building large-scale, distributed applications, and extensive experience working and consulting with several companies implementing production-grade ML and AI platforms.

twitter.com/cadetiloye

Abiodun Akogun is a Machine Learning and Data Science Consultant at MavenCode. He has extensive experience building and deploying large-scale Machine Learning applications in industries that include Healthcare, Finance, Telecommunications, and Insurance. He has solved a range of business problems using Data Analytics, Sentiment Analysis, Topic Modelling, Named Entity Recognition (NER), Opinion Mining, Data Mining, Time Series, Spatial Statistics, and Marketing Analytics.

twitter.com/akogz

Page 4

Agenda

▪ Overview of Machine Learning Model Deployment Workflow

▪ Various Approaches to model training, management, and serving in the Cloud

▪ Deploying Machine Learning Workloads in the Cloud

▪ Implementing a Feature Store backend for ML model training

▪ Running Spark Workloads for ML training on Kubernetes with Kubeflow

Page 5

Overview of Machine Learning Deployment Workflow

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

Page 6

Machine Learning Workload Deployment

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

Google Cloud, AWS, Azure, On-Prem

Page 7

Machine Learning Deployment Effort

Data Verification

Configuration

Feature Extraction

Data Validation

Machine Resource Management

Serving Infrastructure

Monitoring

Analysis Tool

Machine Learning Code

Data Preparation + Storage

Efficient Compute Resource Management

Page 8

Overview of Machine Learning Deployment Workflow

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

[Chart: share of effort across the workflow stages; values shown: 32%, 10%, 36%, 2%, 4%, 16%]

Page 9

A Typical Machine Learning Developer Workflow

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

(1) Storage: Azure Storage, Google Storage, AWS S3 Storage (Raw Data → Transformation → Processed Data)

Raw Data Processing and Transformation Pipeline

(2) Compute: Google Cloud AI, AWS SageMaker, Azure ML

Cloud Training Platforms

Data scientists and ML engineers pull or process data first, before starting ML training on a managed cloud service.

Page 10

What an Enterprise Machine Learning Workflow in the Cloud Looks Like!

Data Sourcing

Preprocessing

Feature Engineering

(1) Storage: Azure Storage, Google Storage, AWS S3 Storage (Raw Data → Transformation → Processed Data)

(2) Compute: Team A, Team B, Team C, and Team D, using Google Cloud AI, AWS SageMaker (two teams), and Azure ML

Running ML workflows across the enterprise, with multiple teams using different cloud provider technology stacks

Page 11

Implementing Machine Learning solutions in the cloud comes at a cost, with Compute and Storage costs at the top of the list.

Page 12

If we plan to be Cloud Neutral, can we abstract our:

● Machine Learning Compute Workload → Kubernetes?

● Machine Learning Storage → Feature Store?

Page 13

A Typical Machine Learning Developer Workflow

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

(1) Storage: Azure Storage, Google Storage, AWS S3 Storage (Data Source → Transformation → Processed Data)

(2) Compute: Google Cloud AI, AWS SageMaker, Azure ML

Page 14

Towards A Cloud Neutral ML Deployment Environment

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

(1) Storage → Feature Store

(2) Compute → Kubernetes

Page 15

Why Do We Need a Cloud Agnostic Deployment Infrastructure?

Page 16

● Makes it easier to migrate workloads in a Hybrid Cloud Environment

● We are not tied to a particular Cloud Infrastructure technology stack

● It's easier to implement best-practice patterns and solutions

● Your teams share a common denominator for all Enterprise ML workloads

● It's easier to control cost, manage utilization, and forecast demand

Page 17

Cloud Agnostic Machine Learning Development

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

(1) Storage → Feature Store (on Azure Storage, Google Storage, or AWS S3 Storage)

(2) Compute → Kubernetes

Page 18

What's a Feature Store All About?

A Feature is a measurable, observable attribute that is part of the input to a Machine Learning Model.

Model Training: [X1, X2, X3, …, Xn] (Feature Vector) → Model

Page 19

What's a Feature Store All About?

Model Training: [X1, X2, X3, …, Xn] (Feature Vector) → Model (Model 1)

Features are derived from

● Raw Datastore

● Streaming Datasource

● Aggregates of Raw Inputs

● Time Windows (minutes, hourly, daily, weekly)
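The "aggregates of raw inputs" and "windows" bullets can be made concrete with a small sketch. It assumes raw events arrive as (entity_id, unix_timestamp, value) tuples and derives count, sum, and mean per entity over tumbling one-hour windows; the event layout, the function name, and the choice of hourly windows are illustrative assumptions, not the presenters' code.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_window_features(events):
    """Aggregate raw (entity_id, unix_ts, value) events into per-entity hourly features.

    Returns {(entity_id, window_start_iso): {"count": n, "sum": s, "mean": m}}.
    Hypothetical helper for illustration only.
    """
    buckets = defaultdict(list)
    for entity_id, ts, value in events:
        # Truncate the timestamp to the start of its hour (tumbling 1-hour window)
        window_start = datetime.fromtimestamp(ts - ts % 3600, tz=timezone.utc)
        buckets[(entity_id, window_start.isoformat())].append(value)

    features = {}
    for key, values in sorted(buckets.items()):
        total = sum(values)
        features[key] = {"count": len(values), "sum": total, "mean": total / len(values)}
    return features
```

Swapping `3600` for 60, 86400, or 604800 gives the minute, daily, and weekly windows from the slide; in a real pipeline the same grouping would typically run in Spark rather than in plain Python.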

Page 20

Features Change Over Time!

Model Training

[X1, X2, X3, …, Xn] → [X1, X2, X3, …, Xn] → [X1, X2, X3, …, Xn]

Time →
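Because feature values change over time, a training example should use the values that were known at that example's timestamp, not the latest ones; otherwise future information leaks into training. A minimal in-memory sketch of such a point-in-time ("as of") lookup, using a hypothetical `FeatureHistory` class that is not part of any feature store library:

```python
import bisect


class FeatureHistory:
    """Keeps timestamped values of one feature per entity and answers
    'what was the value as of time t?' (a point-in-time lookup)."""

    def __init__(self):
        self._times = {}   # entity_id -> sorted list of timestamps
        self._values = {}  # entity_id -> values aligned with _times

    def record(self, entity_id, ts, value):
        times = self._times.setdefault(entity_id, [])
        values = self._values.setdefault(entity_id, [])
        i = bisect.bisect_right(times, ts)  # keep timestamps sorted, even for late arrivals
        times.insert(i, ts)
        values.insert(i, value)

    def as_of(self, entity_id, ts):
        """Latest value recorded at or before ts, or None if there is none."""
        times = self._times.get(entity_id, [])
        i = bisect.bisect_right(times, ts)
        return self._values[entity_id][i - 1] if i else None
```

For example, after recording values at t=10, 20, and 30, `as_of(entity, 25)` returns the value from t=20, which is what a training row labelled at t=25 should see.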

Page 21

Machine Learning Feature Store

● Makes it easy to operationalize our ML workload, most importantly Data Management and Storage for Model training

● Features can be shared easily among teams running different Model training pipelines

● We can version datasets and track changes easily

● Consistency in Feature input attributes between Model Training and Serving
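Training/serving consistency mostly comes down to computing every feature from one shared definition. A small sketch under that assumption: a hypothetical `FEATURE_SPEC` fixes the name, transformation, and order of each feature, and both the batch (training) path and the online (serving) path call the same function. The feature names and transformations are made up for illustration.

```python
import math

# Hypothetical feature definitions: one ordered list shared by training and serving,
# so both paths produce vectors with the same attributes in the same order.
FEATURE_SPEC = [
    ("age", lambda r: float(r["age"])),
    ("log_income", lambda r: math.log1p(float(r["income"]))),
    ("is_member", lambda r: 1.0 if r.get("member") else 0.0),
]


def to_feature_vector(raw):
    """Turn one raw record into [X1, ..., Xn] using the shared spec."""
    return [fn(raw) for _, fn in FEATURE_SPEC]


def build_training_matrix(rows):
    # Batch path: used when materializing training data
    return [to_feature_vector(r) for r in rows]


def build_serving_vector(request):
    # Online path: the same function, applied to a single request
    return to_feature_vector(request)
```

Keeping the spec in one place means a new or changed feature cannot reach training without also reaching serving, which is the property a feature store enforces at larger scale.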

Page 22

Types of Feature Store

● Offline Feature Store → Batch Training

● Online Feature Store → Inferencing / Serving
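The two store types differ mainly in what they keep and how they are read. A toy in-memory sketch, assuming the offline store keeps the full history for bulk reads over a time range and the online store keeps only the latest values per entity for fast lookups; the class and function names are hypothetical, and production systems would back these with, for example, a Hudi table and a key-value database.

```python
class OfflineStore:
    """Append-only history of feature rows, read in bulk for batch training."""

    def __init__(self):
        self.rows = []

    def write(self, entity_id, ts, features):
        self.rows.append({"entity_id": entity_id, "ts": ts, **features})

    def training_frame(self, start_ts, end_ts):
        # All rows in [start_ts, end_ts), including older versions of each entity
        return [r for r in self.rows if start_ts <= r["ts"] < end_ts]


class OnlineStore:
    """Only the latest features per entity, for low-latency lookups at serving time."""

    def __init__(self):
        self.latest = {}

    def write(self, entity_id, ts, features):
        current = self.latest.get(entity_id)
        if current is None or ts >= current[0]:  # ignore late, older updates
            self.latest[entity_id] = (ts, dict(features))

    def get(self, entity_id):
        entry = self.latest.get(entity_id)
        return entry[1] if entry else None


def ingest(offline, online, entity_id, ts, features):
    """Write the same feature values to both stores so training and serving agree."""
    offline.write(entity_id, ts, features)
    online.write(entity_id, ts, features)
```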

Page 23

Implementing Offline Feature Storage with Apache Hudi

Streaming Source: data sources with streaming inputs such as MQTT, Kafka, Pub/Sub, etc.

Batch Job Operations: batch operations on databases, file storage, distributed storage, etc.

Feature Store: Feature Store implementation on any of the major cloud storage services (Azure Storage, Google Storage, AWS S3 Storage)

Workflow scheduling and orchestration with Kubeflow Pipelines or Airflow DAGs on Kubernetes
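Running the same feature store on any major cloud storage largely reduces to changing the table's base URI, while the Spark/Hudi code stays the same. A sketch of that idea; the `feature-store/` prefix and the function name are assumptions. Each scheme also needs its connector on the Spark classpath and credentials configured: `s3a://` uses the hadoop-aws module, `gs://` the Google Cloud Storage connector, and `abfss://` the hadoop-azure (ABFS) driver for ADLS Gen2.

```python
def feature_table_uri(cloud: str, bucket: str, table: str, account: str = "") -> str:
    """Build the base path of a feature table on the chosen object store.

    Hypothetical helper: only this URI changes between clouds; the code that
    reads and writes the table stays the same.
    """
    if cloud == "aws":
        return f"s3a://{bucket}/feature-store/{table}"
    if cloud == "gcp":
        return f"gs://{bucket}/feature-store/{table}"
    if cloud == "azure":
        if not account:
            raise ValueError("Azure needs a storage account name")
        # ADLS Gen2 via ABFS: abfss://<container>@<account>.dfs.core.windows.net/<path>
        return f"abfss://{bucket}@{account}.dfs.core.windows.net/feature-store/{table}"
    raise ValueError(f"unsupported cloud: {cloud}")
```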

Page 24

Why did we use Apache Hudi?

● We needed a unified platform where new data can be made available alongside historical data within minutes.

● We needed quick computation (or derivation) of Feature vectors in order to make them available as model input.

● Incremental versioning of our Feature collections lets us time-travel and use a particular set of features for Model training.

● Our Hudi datasets can be stored in the Azure, Google Cloud, or AWS cloud storage layer.

● Everything we need to do can be implemented easily with Spark and PySpark.
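The versioning and time-travel points map onto Hudi's Spark read options. A sketch, assuming a Hudi release with time-travel support (`as.of.instant`, available from Hudi 0.9.0 to our knowledge) and the Hudi bundle on the Spark classpath; the helper names are hypothetical and the instant values are placeholder Hudi commit times.

```python
def snapshot_as_of_options(instant: str) -> dict:
    """Time-travel query: the table as it was at `instant` (a Hudi commit time)."""
    return {"as.of.instant": instant}


def incremental_options(begin_instant: str, end_instant: str = "") -> dict:
    """Incremental query: only records changed after begin_instant."""
    opts = {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }
    if end_instant:
        opts["hoodie.datasource.read.end.instanttime"] = end_instant
    return opts


def read_features(spark, base_path: str, options: dict):
    """Load a Hudi feature table with the given query options."""
    return spark.read.format("hudi").options(**options).load(base_path)
```

Pinning a training run to one `as.of.instant` makes the feature set it used reproducible, while incremental reads let downstream jobs pick up only the newest feature rows.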

Page 25

Getting Data into Hudi Feature Store with Kubeflow Pipeline

from kfp import components

# KafkaDatastreamer, ValidatorOnSchema, PreProcessor and HudiTableWriter are the pipeline
# step functions (not shown on the slide); each one is wrapped as a Kubeflow Pipelines component.
KafkaDatastreamer_op = components.create_component_from_func(KafkaDatastreamer, base_image="python:3.7.1")
ValidatorOnSchema_op = components.create_component_from_func(ValidatorOnSchema, base_image="python:3.7.1")
PreProcessor_op = components.create_component_from_func(PreProcessor, base_image="python:3.7.1")
# The Hudi writer needs Spark, so it runs on a Spark 3.1.1 base image instead of plain Python
HudiTableWriter_op = components.create_component_from_func(HudiTableWriter, base_image="mavencode.io/spark:v3.1.1")
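The step functions themselves are not shown. Because `create_component_from_func` packages a plain Python function into a container step, each function should be self-contained: imports go inside the body, and inputs and outputs are simple annotated types. A hypothetical sketch of what validation and preprocessing steps could look like; the function names, the JSON-string record format, and the cleaning rules are assumptions, not the presenters' code.

```python
def validate_on_schema(records_json: str, required_fields: str) -> str:
    """Hypothetical validation step: keep records whose required fields are all non-null."""
    import json  # imported inside the body so the function is self-contained

    fields = [f.strip() for f in required_fields.split(",") if f.strip()]
    records = json.loads(records_json)
    valid = [r for r in records if all(r.get(f) is not None for f in fields)]
    return json.dumps(valid)


def preprocess(records_json: str) -> str:
    """Hypothetical preprocessing step: normalize keys and trim string values."""
    import json

    cleaned = []
    for record in json.loads(records_json):
        cleaned.append({
            k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in record.items()
        })
    return json.dumps(cleaned)
```

Either function could be wrapped the same way as on the slide, for example `components.create_component_from_func(validate_on_schema, base_image="python:3.7.1")`.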

Page 26

The Hudi Data Store Writer

Configure the Spark Session with the packages needed to run Hudi and Avro

Hudi configuration options

Write the data into our Hudi data store in the right format
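The writer code itself is not in the transcript. A sketch of the three annotated steps, assuming a Spark DataFrame `df` of feature rows written with upserts into a Copy-on-Write table. The table name, key fields, and the package coordinates passed to `hudi_spark_conf` are placeholders; match the Hudi bundle and `spark-avro` versions to your Spark version.

```python
def hudi_spark_conf(packages: str) -> dict:
    """Spark settings Hudi needs: the Hudi/Avro jars and the Kryo serializer."""
    return {
        "spark.jars.packages": packages,  # placeholder coordinates, e.g. Hudi bundle + spark-avro
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }


def hudi_write_options(table_name: str, record_key: str, precombine_field: str,
                       partition_field: str) -> dict:
    """Options for upserting into a Copy-on-Write Hudi table.

    Rows with the same record key are deduplicated; the row with the larger
    precombine value (e.g. an event timestamp) wins.
    """
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
    }


def write_features(df, base_path: str, options: dict) -> None:
    """Upsert a Spark DataFrame of features into the Hudi table at base_path."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


# Usage sketch (requires pyspark and access to the listed packages):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("hudi-feature-writer")
# for key, value in hudi_spark_conf(PACKAGES).items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```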

Page 27

Cloud Agnostic Machine Learning Development

Data Sourcing

Preprocessing

Feature Engineering

Model Training / Evaluation

Model Scoring / Management

Model Inferencing

(1) Storage → Feature Store

(2) Compute → Kubernetes

Cloud Native ML Workload Deployment with Operators on Kubeflow

Cloud Native ML Training Deployment

● Containerized Workload

● Scalable + Can Run in Distributed Mode

● Efficient Compute Utilization

● Language Agnostic!

Page 28

Machine Learning Operators with Kubeflow on Kubernetes

● A Machine Learning Operator helps with deploying, monitoring, and managing a model training life-cycle

● Some ML Operators found in Kubeflow are:
   ○ TF-operator → TensorFlow Job
   ○ PyTorch-operator → PyTorch Job
   ○ XGBoost-operator → XGBoost Job
   ○ Spark-operator → Spark and Spark ML Jobs
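Each operator watches its own custom resource; for the TensorFlow operator that resource is a `TFJob`. A sketch of a minimal manifest built as a Python dict, assuming the `kubeflow.org/v1` TFJob API; the job name, image, command, and replica count are placeholders. The training container is named `tensorflow`, the name the operator looks for by default. Requesting `nvidia.com/gpu` means the pods can only be scheduled on nodes that expose GPUs through the NVIDIA device plugin, such as a GPU node pool.

```python
def tfjob_manifest(name, namespace, image, workers, command, gpus_per_worker=0):
    """Build a minimal TFJob custom resource (kubeflow.org/v1) as a dict."""
    container = {"name": "tensorflow", "image": image, "command": command}
    if gpus_per_worker:
        # Only nodes advertising NVIDIA GPUs can satisfy this limit
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus_per_worker}}
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "tfReplicaSpecs": {
                "Worker": {
                    "replicas": workers,
                    "restartPolicy": "OnFailure",
                    "template": {"spec": {"containers": [container]}},
                }
            }
        },
    }
```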

Page 29

Cloud Agnostic Machine Learning Development

MLOps Model Training and Deployment Platform

Four Kubeflow Jupyter Notebooks, each in its own Namespace

Spark Operator, Spark Operator, TensorFlow Operator, TensorFlow Operator

Auto-Scalable CPU Node Pool, Auto-Scalable GPU Node Pool

Machine Learning Operators running with Kubeflow

Cloud Infrastructure Layer running Auto-Scaling Node Pools with Kubernetes

Feature Store

Page 30

Using the Spark Operator for ML Training Steps

PySpark ML Code → Containerize the Python Code → Create the SparkApplication Kubernetes YAML Deployment → Apply the Deployment to Kubernetes

Page 31

Spark Operator on Kubernetes

[Diagram: API, Scheduler, Spark Driver, Executors]

Page 32

Elastic Compute Resource ML Jobs

[Diagram: API, Scheduler]

kubectl apply -f ...

Page 33

Deployment Configuration YAML

Spark Application config that describes the job and the namespace where the job will run

Container that will run our Spark ML code

Spark Driver and Executor configuration
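The YAML on this slide did not survive extraction. A sketch of a SparkApplication manifest covering the three annotated parts (job metadata and namespace, the container image holding the PySpark code, and driver/executor settings), built as a Python dict against the Spark Operator's `sparkoperator.k8s.io/v1beta2` API. The image, file path, namespace, resource sizes, and the `spark` service account name are placeholders; the service account must exist and be allowed to create pods in that namespace.

```python
import json


def spark_application(name, namespace, image, main_file, executors=2,
                      driver_memory="1g", executor_memory="2g",
                      service_account="spark"):
    """Build a SparkApplication custom resource for a PySpark training job."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",
            "pythonVersion": "3",
            "mode": "cluster",
            "image": image,  # container with the PySpark ML code baked in
            "imagePullPolicy": "IfNotPresent",
            "mainApplicationFile": main_file,  # e.g. local:///path/inside/image.py
            "sparkVersion": "3.1.1",
            "restartPolicy": {"type": "Never"},
            "driver": {"cores": 1, "memory": driver_memory,
                       "serviceAccount": service_account},
            "executor": {"cores": 1, "instances": executors,
                         "memory": executor_memory},
        },
    }


def write_manifest(manifest, path):
    """Save the manifest as JSON; kubectl apply -f accepts JSON as well as YAML."""
    with open(path, "w") as f:
        json.dump(manifest, f, indent=2)
```

Writing the dict with `write_manifest(m, "spark-train.json")` (a hypothetical file name) and running `kubectl apply -f spark-train.json` hands the job to the operator, which then creates the driver and executor pods.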

Page 34

Connecting to Feature Store with Kubeflow Pipeline

Page 35

Cost Comparison with Managed Cloud Services on AWS

[Chart comparing Managed Services Running on AWS with a Kubeflow + S3 Feast Storage ML workload across Compute Utilization Cost, Compute Startup Uptime, and Team Agility & Productivity; values shown: 30%, 100%, 15s, 66s, 6x Productivity]

Page 36

Summary

● Implementing a Cloud neutral ML deployment approach simplifies most of the complexities in a Multi-Cloud environment

● After the initial hump of the learning curve, overall team efficiency improves significantly

● Teams are not locked in to a particular Cloud Infrastructure stack

● It's easier to control cost and forecast future capacity demands

Page 37

THANK YOU!

Page 38

Thank You!

If you are interested in learning more about how to run your Machine Learning Workloads on any Cloud Infrastructure or on-prem, reach out to us.

Drop us an email: [email protected]

Visit us online: https://www.mavencode.com

Follow us: https://www.twitter.com/mavencode

