Productionizing your Machine Learning Models
Date posted: 20-May-2020
Page 1

Productionizing your Machine Learning Models

Jon Peck

Making state-of-the-art algorithms

discoverable and accessible to everyone

Fullstack Developer & Advocate

[email protected]

@peckjon

bit.ly/AI-DW-19

Page 2


The Problem: ML is in a huge growth phase, making it difficult and expensive for DevOps to keep up

Initially:

● A few models, a couple frameworks, 1-2 languages

● Dedicated hardware or VM Hosting

● IT Team or self-managed DevOps

● High time-to-deploy, manual discoverability

● Few end-users, heterogeneous APIs (if any)

Pretty soon...

● > 9,500 algorithms (95k versions) on many runtimes / frameworks

● > 100k algorithm developers: heterogeneous, largely unpredictable

● Each algorithm: 1 to 1,000 calls/second, with high variance

● Need auto-deploy, discoverability, low (10-15ms) latency

● Common API, composability, fine-grained security

Page 3


Challenges of deploying models in the enterprise

Machine learning

● CPU / GPU / Specialized hardware

● Multiple frameworks, languages, dependencies

● Called from different devices & architectures

“Snowflake” environments

● Unique cloud hardware and services

● DevOps teams not used to the specific considerations of ML hosting

Security and Audit

● Stringent security and access controls

● “Who called what when” for audit & compliance

Uncharted territory

● Deployment is a new problem for data science teams; not a lot of literature / examples

● Redundant work across teams, lack of re-use

● New experience buying & managing infrastructure or working w/ DevOps team

● How to handle chargebacks and billing

"Expecting your engineering and DevOps teams to deploy ML models well is like showing up to SeaWorld with a giraffe, since they are already handling large mammals."

Page 4

MACHINE LEARNING !=

PRODUCTION MACHINE LEARNING

Page 5

Training vs Production


Data Scientists build and iterate over a model until it is ready to move to production

DevOps manages servers, task scheduling, etc. to support execution of concurrent models

TRAINING

● Long compute cycle
● Fixed load (inelastic)
● Stateful
● Single user

INFERENCE

● Short compute bursts
● Elastic
● Stateless
● Many users

Users and Services run models ad-hoc (need: elasticity), and rarely from the same language they’re developed in (need: APIs)

Page 6

Training vs Production


Page 7

Deploying Models: raw server or cloud VM

1. Set up server
○ Select proper balance of CPU, GPU, memory, cost
○ Laborious to configure the first time, but fairly easy to replicate
○ Expensive for higher-powered machines (especially GPUs)

2. Create microservice
○ Write API wrapper (e.g., Flask)
○ Usable from any language or environment
○ How to secure, meter, disseminate?

3. Add scaling
○ Cloud VMs can scale by adding more copies (usually billed per machine-hour)
○ Write/configure automation to predict load & create VMs

4. Repeat for each unique environment
○ Separate server for each model?
○ Or deal with dependency & resource conflicts?

Flask source: Jeff Klukas
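A minimal sketch of the "Create microservice" step. Assumptions are loud here: predict() is a toy linear scorer standing in for a real trained model, and the /predict route and payload shape are illustrative, not any platform's actual API. The Flask wiring lives in a factory so the scoring logic stays importable and testable on its own.

```python
# Toy model: in production, predict() would call a real trained model
# (a pickled scikit-learn estimator, a TensorFlow SavedModel, ...)
def predict(features):
    # hypothetical linear scorer standing in for the real model
    score = sum(f * w for f, w in zip(features, [0.4, 0.6]))
    return {"label": "fruit" if score > 0.5 else "veggie", "score": score}

def create_app():
    # Flask is imported here so the pure scoring logic above carries
    # no web-framework dependency
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_route():
        # e.g. curl -X POST -H 'Content-Type: application/json' \
        #        -d '{"features": [1.0, 1.0]}' localhost:8080/predict
        return jsonify(predict(request.get_json()["features"]))

    return app

if __name__ == "__main__":
    create_app().run(port=8080)  # one microservice per model
```

Securing, metering, and disseminating this endpoint, the open question in step 2, is exactly the work a platform layer has to take over.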

Page 8

Deploying Models: serverless functions

● Initially, this looks great
○ Simple setup: just fill out a function body
○ Automatic API wrappers or configurable API gateway
○ No DevOps: maintenance handled by provider
○ Instant, elastic scaling (big cost savings)
○ Cheap: usually billed per-second, and free when not in use

● But there are some significant limitations
○ Not optimized for ML
○ Languages: Node & some Python, Java, C#
○ Limited dependency support
○ No GPUs!
○ Max execution time: 5-15 minutes
○ Little/no consumer-facing UI


Page 9

What should a mature solution have?

● Broad lang & lib support: any language & dependencies

● GPU support: fast exec & memory for GPU models

● Elasticity & concurrency: instantly scale up/down with demand; many copies of different models

● Automatic API: data scientists not responsible for serializing JSON or managing server frameworks

● Pipelining: common API across models, data passing

● Built-in security: auth, process isolation, user data

● Long timeouts: predictions may take ms or an hour

● Versioning and Grouping: public / private / group visibility of models, all old versions executable (no broken services)

● Portability: run in-house or on any cloud(s)

● Discoverability / model-management UI: find & share well-described models, “run an example”, cut-and-paste API code in every language


Page 10

Building it: start with containers, add scaling / replication

[Architecture diagram: a User reaches a Web Load Balancer and an API Load Balancer, which route to Web Servers and API Servers; each Cloud Region (#1, #2) runs N Workers, each hosting Docker containers for algorithm #1 … algorithm #n]

● ML models as serverless microservices: allows isolation, promotes model re-use and modularity
● Ability to replicate containers and move between regions allows for scaling, portability, low latency

Page 11

Design containers to support all languages, flexible enough to add any library

[Diagram: example container images — FoodClassifier, FruitClassifier, VeggieClassifier]

...don’t forget to make GPU versions, too

Page 12

Make it easy for data scientists to add new models


● Continuous Deployment speeds production: Git code management, develop locally or in a Web IDE
● User and group namespaces, private / public / group visibility, pricing & dept chargebacks

Page 13


Add pipelining and intelligent orchestration

Known:

1. Typical execution path

2. Compute & memory per algo

Optimize for:

1. Minimum network latency

2. Maximum throughput

3. Minimum resource use

[Diagram: three chained algorithms A, B, and C, each with its own CPU, Memory, GPU, and I/O profile]

cat foo.txt | keyword.sh | ranker.sh
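The shell pipe above has a direct analogue in a model pipeline: each stage is a deployed model whose output becomes the next stage's input, and the orchestrator can place stages to minimize latency. A toy sketch, where extract_keywords and rank are illustrative stand-ins for deployed models, not real services:

```python
from collections import Counter

# Each function stands in for a deployed model behind the common API.
def extract_keywords(text):
    # toy stage: lowercase, keep words longer than 3 characters
    return [w.lower() for w in text.split() if len(w) > 3]

def rank(keywords):
    # toy stage: order keywords by frequency
    return [w for w, _ in Counter(keywords).most_common()]

# Pipeline: output of one model feeds the next, just like
# `cat foo.txt | keyword.sh | ranker.sh`
doc = "machine learning models serve machine learning predictions"
print(rank(extract_keywords(doc)))
```

Because the platform knows the typical execution path and each stage's compute profile, it can co-locate A, B, and C to cut network hops between stages.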

Page 14


● Semantic versioning for models, just like with any other software (1.2.x)

● All versions of model are runnable at any time

● Compare versions of the model to verify and see changes in performance (speed, accuracy), and manage model drift

● App Devs can stay a version behind, or use different versions for different contexts

● Rolling, non-interruptive deployments: model improvements that don’t break existing code

Support standardized versioning
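The "stay a version behind" behavior falls out of how a partial semver pin resolves against the deployed versions. A hedged sketch (resolve_version is a hypothetical helper, not any platform's API):

```python
def resolve_version(requested, available):
    """Resolve a possibly-partial semver pin to a concrete version.

    "1.2.3" matches exactly; "1.2" means the latest patch of 1.2;
    "1" means the latest minor.patch within major version 1.
    """
    req = requested.split(".")
    matches = [v for v in available if v.split(".")[:len(req)] == req]
    if not matches:
        raise ValueError(f"no deployed version matching {requested}")
    # highest match wins, compared numerically component by component
    return max(matches, key=lambda v: tuple(map(int, v.split("."))))

deployed = ["1.1.0", "1.2.0", "1.2.1", "2.0.0"]
print(resolve_version("1.2", deployed))    # latest 1.2.x patch
print(resolve_version("2.0.0", deployed))  # exact pin
```

An app pinned to "1.2" picks up non-breaking patch releases automatically, while an app pinned to "1.2.0" is untouched by new deployments, which is what makes rolling, non-interruptive releases safe.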

Page 15


Key production metrics:

● Latency

● Resources used (CPU/GPU, I/O)

● System Capacity

● Scale up and Scale down

● Authentication

● API timing metrics and calls

● Error rates

But also:

● What teams are using the models

● What applications are using them

● Billing & chargebacks

● Understand if AI investments are paying off

● See business impact across organization

Provide logging and analytics
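Most of the metrics above can come from one choke point: wrap every model invocation so that latency, caller identity, model version, and errors land in a single structured audit record ("who called what when"). A minimal sketch; the record fields and logged_call helper are illustrative, not a specific platform's schema:

```python
import json
import time

def logged_call(model, version, user, fn, payload):
    # Wrap a model invocation and emit one structured audit record.
    start = time.perf_counter()
    error = None
    try:
        result = fn(payload)
    except Exception as exc:
        error = str(exc)
        result = None
    record = {
        "model": model,
        "version": version,
        "user": user,
        "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        "error": error,
    }
    print(json.dumps(record))  # in practice: ship to the central dashboard
    return result, record

# toy invocation standing in for a real model call
logged_call("demo/classifier", "1.0.0", "alice", lambda p: p * 2, 21)
```

Aggregating these records by user, team, or application is what makes billing, chargebacks, and the "are AI investments paying off" questions answerable.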

Page 16


                 AWS                    Google Cloud      Azure           VMware / on-prem
Compute          EC2                    CE                VM              ESX
Autoscaling      Autoscaling Group      Autoscaler        Scale Set       Orchestrator / BYO
Load Balancing   Elastic Load Balancer  Load Balancer     Load Balancer   NSX / BYO
Database         RDS                    Cloud SQL         Azure SQL DB    BYO
Object Storage   S3                     Cloud Storage     Azure Blobs     BYO
Block Storage    Elastic Block Store    Persistent Disk   Azure Disks     VMFS

Build abstraction layers for all infrastructure providers

Page 17


Expose user-friendly storage abstraction

# No storage abstraction
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()

# With storage abstraction
data = client.file("blob://records.csv").get()

s3://foo/bar

blob://foo/bar

hdfs://foo/bar

dropbox://foo/bar

etc.
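Behind a call like client.file("blob://records.csv").get() usually sits a registry that dispatches on the URI scheme. A hedged sketch of that dispatch; real backends (boto3 for s3://, an Azure client for blob://, etc.) would register themselves, while MemoryBackend here is a stand-in so the sketch runs without cloud credentials:

```python
from urllib.parse import urlparse

# Registry mapping URI schemes to backend classes.
_backends = {}

def register(scheme):
    # decorator that adds a backend class to the registry
    def deco(cls):
        _backends[scheme] = cls
        return cls
    return deco

@register("memory")
class MemoryBackend:
    # in-memory stand-in for a real object store
    store = {"records.csv": b"a,b\n1,2\n"}

    def get(self, path):
        return self.store[path]

def get(uri):
    # same call shape for any scheme: get("s3://..."), get("blob://...")
    parsed = urlparse(uri)
    if parsed.scheme not in _backends:
        raise ValueError(f"unsupported scheme: {parsed.scheme}")
    return _backends[parsed.scheme]().get(parsed.netloc + parsed.path)

print(get("memory://records.csv"))
```

Adding a new provider then means registering one class, with no change to model code that reads and writes by URI.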

Page 18

Build a model portfolio UI for easy discovery & testing


● Models are only as useful as their docs: creators write descriptions which live with the model
● Categories / tags / search for users to find the models they need (and see only the ones allowed)
● Test models right inside the catalog, before integrating into app code
● Encourage model re-use and improve efficiency across teams, while respecting access rights

Page 19

Design a consistent API with clients in every language


● Models are often written in one lang but consumed in another (or many)
● Provide cut-and-paste code for any model / language combination
● ZERO time from model deployment to usability: drastically reduce the length of total dev pipeline
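One way to get consistent cut-and-paste clients in every language: each client is a thin wrapper over the same REST call. A stdlib-only sketch; the /v1/algo/{path}/{version} route and the Simple-key auth header are hypothetical, modeled loosely on such platforms, and building the request is separated from sending it so the call shape is testable offline:

```python
import json
from urllib.request import Request, urlopen

def build_request(base_url, model_path, version, payload, api_key):
    # Hypothetical route shape: POST {base}/v1/algo/{path}/{version}
    return Request(
        f"{base_url}/v1/algo/{model_path}/{version}",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Simple {api_key}",
            "Content-Type": "application/json",
        },
    )

def call_model(base_url, model_path, version, payload, api_key):
    # the actual network call, kept separate from request construction
    with urlopen(build_request(base_url, model_path, version,
                               payload, api_key)) as resp:
        return json.load(resp)
```

A JavaScript or Java client would serialize the same payload to the same route, so the code a user copies from the catalog differs only in surface syntax.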

Page 20


Make the public platform available to anyone, anywhere

Page 21


ALGORITHMIA ENTERPRISE - your company’s private ML inventory & model-as-a-service platform

Deploy: Develop models in any language, framework, or infrastructure

Scale: Expose models as highly-reliable versioned APIs that autoscale to 100s of reqs/second

Discover: Describe your model in a central catalog where peers can easily discover & use it

Monitor: House thousands of models under one roof with a uniform REST interface and a central dashboard

Make your platform deployable on any org's private cloud

Page 22

Try it yourself: deploy a model on Algorithmia

http://bit.ly/algodev -> digit_recognition

Looking for more?

Page 23

Jon Peck Developer Advocate

FREE STUFF

$50 free at Algorithmia.com signup code: AI-DW-19

WE ARE HIRING

algorithmia.com/jobs ● Seattle or Remote ● Bright, collaborative env ● Unlimited PTO ● Dog-friendly

[email protected]

@peckjon

bit.ly/AI-DW-19

THANK YOU!

Tell the world that the future is here

Page 24

Appendix



Jon Peck Developer Advocate

FREE STUFF

$50 free at Algorithmia.com signup code: AI-DW-19

WE ARE HIRING

algorithmia.com/jobs ● Seattle or Remote ● Bright, collaborative env ● Unlimited PTO ● Dog-friendly

[email protected]

@peckjon

bit.ly/AI-DW-19

THANK YOU!


Recommended