OS for AI: Serverless, Productionized Machine Learning

Jon Peck

Making state-of-the-art algorithms discoverable and accessible to everyone

Full-Spectrum Developer & Advocate

[email protected]

@peckjon

bit.ly/dev-week-austin-19

The Problem: ML is in a huge growth phase, difficult/expensive for DevOps to keep up

Initially:

● A few models, a couple frameworks, 1-2 languages

● Dedicated hardware or VM Hosting

● IT Team or self-managed DevOps

● High time-to-deploy, manual discoverability

● Few end-users, heterogeneous APIs (if any)

Pretty soon...

● > 9,500 algorithms (95k versions) on many runtimes / frameworks

● > 100k algorithm developers: heterogeneous, largely unpredictable

● Each algorithm: 1 to 1,000 calls/second, a lot of variance

● Need auto-deploy, discoverability, low (10-15ms) latency

● Common API, composability, fine-grained security

Challenges of deploying ML models in the enterprise

Machine learning

● CPU / GPU / Specialized hardware

● Multiple frameworks, languages, dependencies

● Called from different devices & architectures

“Snowflake” environments

● Unique cloud hardware and services

● DevOps teams not used to the specific considerations of ML hosting

Security and Audit

● Stringent security and access controls

● “Who called what when” for audit & compliance

Uncharted territory

● Deployment is a new problem for data science teams; not a lot of literature / examples

● Redundant work across teams, lack of re-use

● New experience buying & managing infrastructure or working w/ DevOps team

● How to handle chargebacks and billing

MACHINE LEARNING !=

PRODUCTION MACHINE LEARNING

“Expecting your engineering and DevOps teams to deploy ML models well is like showing up to SeaWorld with a giraffe since they are already handling large mammals.” - Mike Anderson

An Operating System:

• Provides common functionality needed by many programs

• Standardizes conventions to make systems easier to work with

• Presents a higher level abstraction of the underlying hardware

Modern operating systems followed a long evolution, as we collectively learned what the common problems were and what abstractions to build to increase our productivity.

What might that evolution look like for AI?

The Need: an “Operating System for AI”

Punch Cards 1970s

Unix: Multi-tenancy, Composability

DOS: Hardware Abstraction

GUI (Win/Mac): Accessibility

iOS/Android: Built-in App Store (Discoverability)

Training vs Production ≅ Building vs Running Apps

Data Scientists build and iterate over a model until it is ready to move to production. Similar to building an app.

DevOps manages servers, task scheduling, etc. to support execution of concurrent models.

Users and Services run models ad-hoc (need: elasticity), and rarely from the same language they’re developed in (need: APIs). Similar to running an app in an OS.

Training vs Production

TRAINING                  INFERENCE
Long compute cycle        Short compute bursts
Fixed load (Inelastic)    Elastic
Stateful                  Stateless
Single user               Many users

Deploying Models: raw server or cloud VM

1. Set up server

○ Select proper balance of CPU, GPU, memory, cost

○ Laborious to configure first time, but fairly easy to replicate

○ Expensive for higher-powered machines (especially GPUs)

2. Create microservice

○ Write API wrapper (e.g., Flask)

○ Will be usable from any language, environment

○ How to secure, meter, disseminate?

3. Add scaling

○ Cloud VMs can scale by adding more copies (usually billed per machine-hour)

○ Write/config automation to predict load & create VMs

4. Repeat for each unique environment

○ Separate server for each model?

○ Or deal with dependency & resource conflicts?

Flask source: Jeff Klukas
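Step 2 above (wrapping a model in an HTTP API) can be sketched roughly as follows. This is only an illustration: it uses Python's standard-library WSGI interface in place of Flask so that it runs without extra dependencies, `predict` is a hypothetical stand-in for a real model, and the JSON request/response shape, host, and port are assumptions.

```python
import io
import json
from wsgiref.simple_server import make_server


def predict(features):
    """Hypothetical stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def app(environ, start_response):
    """WSGI app: accepts a JSON body {"features": [...]}, returns {"prediction": ...}."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps({"prediction": predict(payload["features"])})
        status = "200 OK"
    except (KeyError, TypeError, ValueError, ZeroDivisionError) as exc:
        body = json.dumps({"error": str(exc)})
        status = "400 Bad Request"
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call_locally(payload):
    """Invoke the WSGI app in-process (no network); returns (status, parsed JSON)."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST",
               "CONTENT_LENGTH": str(len(raw)),
               "wsgi.input": io.BytesIO(raw)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    chunks = app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


def serve(host="127.0.0.1", port=8000):
    """Serve the app over HTTP until interrupted (host and port are arbitrary)."""
    make_server(host, port, app).serve_forever()
```

Calling `call_locally({"features": [1, 2, 3]})` exercises the wrapper without opening a socket; `serve()` exposes the same app over HTTP. The open questions from the slide (how to secure, meter, and disseminate the service) are untouched by this sketch.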

Deploying Models: serverless functions

● Initially, this looks great

○ Simple setup: just fill out a function body

○ Automatic API wrappers or configurable API gateway

○ No DevOps: maintenance handled by provider

○ Instant, elastic scaling (big cost savings)

○ Cheap: usually billed per-second, and free when not in use

● But there are some significant limitations

○ Not optimized for ML

○ Languages: Node & some Python, Java, C#

○ Limited dependency support, or local container build

○ No GPUs!

○ Max execution time: 5-15 minutes

○ Little/no consumer-facing UI

● Broad lang & lib support: any language & dependencies

● GPU support: fast exec & memory for GPU models

● Elasticity & concurrency: instantly scale up/down with demand; many copies of different models

● Automatic API: data scientists not responsible for serializing JSON or managing server frameworks

● Pipelining: common API across models, data passing

● Built-in security: auth, process isolation, user data

● Long timeouts: predictions may take ms or an hour

● Versioning and Grouping: public / private / group visibility of models, all old versions executable (no broken services)

● Portability: run in-house or on any cloud(s)

● Discoverability / model-management UI: find & share well-described models, “run an example”, cut-and-paste API code in every language

The Need: an “Operating System for AI”

AI/ML scalable infrastructure on demand + marketplace/UI

Building it: start with containers, add scaling / replication

[Architecture diagram: User; Web Load Balancer and Web Servers; API Load Balancer and API Servers; Cloud Region #1 and Cloud Region #2, each with Worker xN running Docker(algorithm#1) .. Docker(algorithm#n)]

● ML models as serverless microservices: allows isolation, promotes model re-use and modularity

● Ability to replicate containers and move between regions allows for scaling, portability, low latency
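As a rough illustration of the scaling decision behind container replication, the sketch below estimates how many copies of one model's container are needed for a given call rate. The sizing rule (Little's law: in-flight requests are roughly arrival rate times service time), the headroom factor, and all parameter names are assumptions made for this example; the slides do not specify how replicas are sized.

```python
import math


def replicas_needed(calls_per_second, seconds_per_call,
                    concurrency_per_replica=1, headroom=0.25):
    """Estimate container replicas for one model.

    In-flight requests ~= arrival rate * service time (Little's law).
    `headroom` adds spare capacity for bursts. All values are illustrative.
    """
    if calls_per_second < 0 or seconds_per_call < 0:
        raise ValueError("rate and service time must be non-negative")
    if concurrency_per_replica < 1:
        raise ValueError("concurrency_per_replica must be at least 1")
    in_flight = calls_per_second * seconds_per_call
    # Round up: a fractional replica still needs a whole container.
    return math.ceil(in_flight * (1 + headroom) / concurrency_per_replica)
```

With no traffic the estimate is zero replicas, which matches the serverless goal of paying nothing for idle models; a real scheduler would also weigh cold-start latency and GPU availability.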

Design containers to support all languages, flexible enough to add any library

FoodClassifier

FruitClassifier VeggieClassifier

...don’t forget to make GPU versions, too

Make it easy for data scientists to add new models

● Continuous Deployment speeds production: Git code management, develop locally or in a Web IDE

● User and group namespaces, private / public / group visibility, pricing & dept chargebacks

Add pipelining and intelligent orchestration

Known:

1. Typical execution path

2. Compute & memory per algo

Optimize for:

1. Minimum network latency

2. Maximum throughput

3. Minimum resource use

[Diagram: algorithms A, B, and C, each with its own CPU, Memory, GPU, and I/O requirements]

cat foo.txt | keyword.sh | ranker.sh
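The shell pipe above has a direct analogue when models share a common calling convention. The following sketch composes stages left to right; the `keyword` and `ranker` functions are hypothetical stand-ins named after the scripts in the example, not real algorithms, and the orchestration concerns listed above (latency, throughput, resource use) are not modeled.

```python
from functools import reduce


def pipeline(*stages):
    """Compose stages left to right, like `a | b | c` in a shell."""
    def run(data):
        return reduce(lambda value, stage: stage(value), stages, data)
    return run


def keyword(text):
    """Hypothetical stage: lowercase words longer than three letters."""
    words = (w.strip(".,").lower() for w in text.split())
    return [w for w in words if len(w) > 3]


def ranker(words):
    """Hypothetical stage: rank words by frequency, then alphabetically."""
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return sorted(counts, key=lambda w: (-counts[w], w))


# Equivalent in spirit to: cat foo.txt | keyword.sh | ranker.sh
keyword_then_rank = pipeline(keyword, ranker)
```

Because each stage takes one value and returns one value, an orchestrator that knows the typical execution path could place `keyword` and `ranker` on the same worker to avoid a network hop.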

● Semantic versioning for models, just like with any other software (1.2.x)

● All versions of model are runnable at any time

● Compare versions of the model, to verify and see changes in performance (speed, accuracy), and manage model drift

● App Devs can stay a version behind, or use different versions for different contexts

● Rolling, non-interruptive deployments: model improvements that don’t break existing code

Support standardized versioning
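One way to picture the "1.2.x" style of version pinning is a resolver that picks the newest published version matching a wildcard spec. This is a minimal sketch under assumed rules (`x` or `*` matches that position and everything after it); the function names are hypothetical and the platform's actual resolution logic is not described in the slides.

```python
def parse(version):
    """Split "1.2.3" into a tuple of ints: (1, 2, 3)."""
    return tuple(int(part) for part in version.split("."))


def resolve(spec, available):
    """Return the highest available version matching a spec.

    Specs look like "1.2.x", "1.x" or an exact "1.2.3"; `x` (or `*`) is
    treated as a wildcard for that position and everything after it.
    """
    prefix = []
    for part in spec.split("."):
        if part in ("x", "*"):
            break
        prefix.append(int(part))
    matches = [v for v in available if list(parse(v)[:len(prefix)]) == prefix]
    if not matches:
        raise LookupError(f"no published version matches {spec!r}")
    # Compare numerically so that 1.2.10 ranks above 1.2.3.
    return max(matches, key=parse)
```

An app pinned to "1.2.x" picks up patch releases automatically, while an app pinned to an exact version keeps running unchanged, which is what keeping all old versions executable makes possible.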

Key production metrics:

● Latency

● Resources used (CPU/GPU, I/O)

● System Capacity

● Scale up and Scale down

● Authentication

● API timing metrics and calls

● Error rates

But also:

● What teams are using the models

● What applications are using them

● Billing & chargebacks

● Understand if AI investments are paying off

● See business impact across organization

Provide logging and analytics
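A small sketch of how per-call records could feed both the production metrics and the team-level questions listed above. The record fields, the in-memory `CALL_LOG`, and the `sentiment` model are all assumptions for illustration; a real platform would write to a durable log store.

```python
import functools
import time

CALL_LOG = []  # in-memory stand-in for a real log store


def audited(model_name, version):
    """Decorator recording caller, model, version, timestamp, latency and outcome."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(caller, *args, **kwargs):
            started = time.time()
            t0 = time.perf_counter()
            status = "ok"
            try:
                return fn(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on success and on failure, so errors are logged too.
                CALL_LOG.append({
                    "caller": caller,
                    "model": model_name,
                    "version": version,
                    "timestamp": started,
                    "latency_ms": (time.perf_counter() - t0) * 1000.0,
                    "status": status,
                })
        return wrapper
    return decorate


@audited("sentiment", "1.2.0")
def sentiment(text):
    """Hypothetical model: crude keyword-based sentiment."""
    return "positive" if "good" in text.lower() else "neutral"


def calls_per_caller(log):
    """Aggregate call counts per caller, e.g. as a basis for chargebacks."""
    totals = {}
    for entry in log:
        totals[entry["caller"]] = totals.get(entry["caller"], 0) + 1
    return totals
```

The wrapped function takes the caller identity as its first argument, for example `sentiment("team-a", "Good stuff")`, so every record answers "who called what when".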

                 AWS                     Google Cloud      Azure           OpenStack
Compute          EC2                     CE                VM              Nova
Autoscaling      Autoscaling Group       Autoscaler        Scale Set       Heat Scaling Policy
Load Balancing   Elastic Load Balancer   Load Balancer     Load Balancer   LBaaS
Remote Storage   Elastic Block Store     Persistent Disk   File Storage    Block Storage

Partial source: Sam Ghods, KubeCon 2016

Build abstraction layers for all infrastructure providers
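The table above can be read as a lookup from a provider-neutral capability to each provider's service name. The sketch below encodes it that way. It assumes the four columns correspond to AWS, Google Cloud, Azure, and OpenStack (the extracted slide text does not label them), and the dictionary keys and function names are illustrative only.

```python
# Generic capability -> provider-specific service name, copied from the table above.
# Assumption: the columns are AWS, Google Cloud, Azure and OpenStack, in that order;
# the keys below are illustrative labels, not official identifiers.
SERVICE_NAMES = {
    "compute": {"aws": "EC2", "gcp": "CE", "azure": "VM", "openstack": "Nova"},
    "autoscaling": {"aws": "Autoscaling Group", "gcp": "Autoscaler",
                    "azure": "Scale Set", "openstack": "Heat Scaling Policy"},
    "load_balancing": {"aws": "Elastic Load Balancer", "gcp": "Load Balancer",
                       "azure": "Load Balancer", "openstack": "LBaaS"},
    "remote_storage": {"aws": "Elastic Block Store", "gcp": "Persistent Disk",
                       "azure": "File Storage", "openstack": "Block Storage"},
}


def service_for(capability, provider):
    """Look up the provider-specific service that backs a generic capability."""
    try:
        return SERVICE_NAMES[capability][provider]
    except KeyError:
        raise ValueError(
            f"unknown capability/provider: {capability!r}/{provider!r}") from None


def plan_deployment(provider):
    """Describe one provider's stack in provider-neutral terms."""
    return {capability: service_for(capability, provider)
            for capability in SERVICE_NAMES}
```

Platform code written against the generic capability names stays the same when the deployment target changes; only the lookup result differs.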

Expose user-friendly storage abstraction

# No storage abstraction
import boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()

# With storage abstraction (assumes `client` is an already-constructed platform client)
data = client.file("s3://bucket-name/records.csv").get()

s3://foo/bar

blob://foo/bar

hdfs://foo/bar

dropbox://foo/bar

etc.
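To show how a single `client.file(uri).get()` call could sit in front of several storage systems, here is a minimal sketch that dispatches on the URI scheme. `StorageClient`, `DataFile`, and `InMemoryBackend` are hypothetical names invented for this example; the in-memory backend stands in for real services so the code runs without credentials, and it is not the platform's actual client.

```python
from urllib.parse import urlsplit


class InMemoryBackend:
    """Stand-in for a real storage service (S3, HDFS, ...); keeps bytes in a dict."""

    def __init__(self):
        self._objects = {}

    def read(self, container, key):
        return self._objects[(container, key)]

    def write(self, container, key, data):
        self._objects[(container, key)] = data


class DataFile:
    """Handle for one object, independent of which backend stores it."""

    def __init__(self, backend, container, key):
        self._backend = backend
        self._container = container
        self._key = key

    def get(self):
        return self._backend.read(self._container, self._key)

    def put(self, data):
        self._backend.write(self._container, self._key, data)


class StorageClient:
    """Routes URIs such as s3://foo/bar or hdfs://foo/bar to a registered backend."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def file(self, uri):
        parts = urlsplit(uri)
        if parts.scheme not in self._backends:
            raise ValueError(f"no backend registered for scheme {parts.scheme!r}")
        # netloc is the bucket/container; the path (minus leading "/") is the key.
        return DataFile(self._backends[parts.scheme], parts.netloc,
                        parts.path.lstrip("/"))
```

After `client.register("s3", backend)`, model code only ever sees URIs, so moving data from one store to another changes a string rather than the code that reads it.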

Build a model portfolio UI for easy discovery & testing

● Models are only as useful as their docs: creators write descriptions which live with the model

● Categories / tags / search for users to find the models they need (and see only the ones allowed)

● Test models right inside the catalog, before integrating into app code

● Encourage model re-use and improve efficiency across teams, while respecting access rights

Design a consistent API with clients in every language

● Models are often written in one lang but consumed in another (or many)

● Provide cut-and-paste code for any model / language combination

● ZERO time from model deployment to usability: drastically reduce the length of total dev pipeline
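Cut-and-paste code for every model and language can be generated from templates once all models share one calling convention. The sketch below produces a curl and a Python snippet for a given model path. The endpoint `https://api.example.com/v1/algo`, the bearer-token header, and the function names are placeholders assumed for illustration, not the platform's real API.

```python
import json

API_BASE = "https://api.example.com/v1/algo"  # hypothetical endpoint, not a real service


def curl_snippet(model, sample_input):
    """Render a curl command (assumes the JSON body contains no single quotes)."""
    body = json.dumps(sample_input)
    return (
        f"curl -X POST {API_BASE}/{model} \\\n"
        "  -H 'Content-Type: application/json' \\\n"
        "  -H 'Authorization: Bearer YOUR_API_KEY' \\\n"
        f"  -d '{body}'"
    )


def python_snippet(model, sample_input):
    """Render a standard-library Python call for the same request."""
    url = f"{API_BASE}/{model}"
    return "\n".join([
        "import json",
        "import urllib.request",
        "",
        "request = urllib.request.Request(",
        f"    {url!r},",
        f"    data=json.dumps({sample_input!r}).encode('utf-8'),",
        "    headers={'Content-Type': 'application/json',",
        "             'Authorization': 'Bearer YOUR_API_KEY'},",
        ")",
        "print(json.load(urllib.request.urlopen(request)))",
    ])


SNIPPET_GENERATORS = {"curl": curl_snippet, "python": python_snippet}


def snippets_for(model, sample_input):
    """Return one ready-to-paste snippet per supported client language."""
    return {lang: render(model, sample_input)
            for lang, render in SNIPPET_GENERATORS.items()}
```

Supporting another client language means adding one more renderer to `SNIPPET_GENERATORS`; every deployed model then gets that snippet with no extra work from its author.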

Make the public platform available to anyone, anywhere

ALGORITHMIA ENTERPRISE - your company’s private ML inventory & model-as-a-service platform

Deploy: Develop models in any language, framework, or infrastructure

Scale: Expose models as highly-reliable versioned APIs that autoscale to 100s of reqs/second

Discover: Describe your model in a central catalog where peers can easily discover & use it

Monitor: House thousands of models under one roof with a uniform REST interface and a central dashboard

Make the platform deployable on any org’s private cloud

Jon Peck Developer Advocate

FREE STUFF

$50 free at Algorithmia.com signup code: dev-week-austin-19

WE ARE HIRING

algorithmia.com/jobs ● Seattle or Remote ● Bright, collaborative env ● Unlimited PTO ● Dog-friendly

[email protected]

@peckjon


THANK YOU!

http://algorithmia.com/jobs

Appendix

Try it yourself: deploy a model on Algorithmia

http://bit.ly/algodev -> digit_recognition

Looking for more?

http://bit.ly/algodev

https://github.com/algorithmiaio/sample-apps/tree/master/algo-dev-demo/digit_recognition
