
How Microservices and Serverless Computing Enable the Next Gen of Machine Intelligence

Jon Peck

Making state-of-the-art algorithms discoverable and accessible to everyone

Full-Spectrum Developer & Advocate

jpeck@algorithmia.com | @peckjon

Algorithmia.com


The Problem: ML is in a huge growth phase, and it is difficult and expensive for DevOps to keep up.

Initially:

● A few models, a couple of frameworks, 1-2 languages
● Dedicated hardware or VM hosting
● IT team for DevOps
● High time-to-deploy, manual discoverability
● Few end users, heterogeneous APIs (if any)

Pretty soon...

● > 5,000 algorithms (50k versions) on many runtimes / frameworks
● > 60k algorithm developers: heterogeneous, largely unpredictable
● Each algorithm: 1 to 1,000 calls/second, with a lot of variance
● Need auto-deploy, discoverability, low (15ms) latency
● Common API, composability, fine-grained security


The Need: an “Operating System for AI”
Scalable AI/ML infrastructure on demand, plus a marketplace

● Function-as-a-service for Machine & Deep Learning

● Discoverable, live inventory of AI via APIs

● Anyone can contribute & use

● Composable, Monetizable

● Every developer on earth can make their app intelligent

An Operating System for AI: what did the evolution of the OS look like?

● Punch Cards (1970s)
● Unix: multi-tenancy, composability
● DOS: hardware abstraction
● GUI (Win/Mac): accessibility
● iOS/Android: built-in App Store (discoverability)


General-purpose computing had a long evolution as we learned what the common problems were and which abstractions to build. AI is in the earlier stages of that evolution.

An Operating System:

• Provides common functionality needed by many programs
• Standardizes conventions to make systems easier to work with
• Presents a higher-level abstraction of the underlying hardware

Use Case: Jian Yang made an app to recognize food, “SeeFood”. (© HBO, all rights reserved)

Use Case: He deployed his trained model to a GPU-enabled server.

Use Case: The app is a hit!

Use Case: … and now his server is overloaded.

Characteristics of AI:

• Two distinct phases: training and inference
• Lots of processing power
• Heterogeneous hardware (CPU, GPU, FPGA, TPU, etc.)
• Limited by compute rather than bandwidth
• “TensorFlow is open source; scaling it is not.”

TRAINING vs. INFERENCE

TRAINING
● Long compute cycles
● Fixed load (inelastic)
● Stateful
● Single user
● Owner: Data Scientists
● Analogous to a dev toolchain: building and iterating on a model is similar to building an app.
● Runs on bare metal or VMs.

INFERENCE
● Short compute bursts
● Elastic
● Stateless
● Multiple users
● Owner: DevOps
● Analogous to an OS: running concurrent models requires task scheduling.
● Runs on containers, orchestrated by Kubernetes.

Microservices & Serverless Computing => ML Hosting

MICROSERVICES: the design of a system as independently deployable, loosely coupled services.

ADVANTAGES
• Maintainable, scalable
• Software & hardware agnostic
• Rolling deployments

SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time compute model.

ADVANTAGES
• Elasticity, cost efficiency
• Concurrency
• Improved latency
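To make the serverless model concrete, here is a minimal sketch of a model wrapped as a per-request function. The apply(input) entry point mirrors the function-per-request convention; the model path and predict() interface are illustrative assumptions, not the deck's actual code.

# Minimal sketch of a model exposed as a per-request serverless function.
# Model path and predict() interface are illustrative assumptions.
import pickle

_model = None  # loaded once per container, reused across warm requests

def apply(input):
    """Invoked once per request with a JSON-decoded payload; returns JSON-serializable output."""
    global _model
    if _model is None:                          # just-in-time initialization on first call
        with open("food_classifier.pkl", "rb") as f:
            _model = pickle.load(f)
    label = _model.predict(input["image_url"])  # assumed model interface
    return {"label": label}

The platform starts, reuses, and stops instances of this function per request, which is what gives the elasticity and cost profile discussed below.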

Why Serverless - Cost Efficiency

[Chart: calls per second (max and average) and GPU server instances across a 24-hour day, 12AM-10PM.]

Jian Yang’s “SeeFood” is most active during lunchtime.

Traditional Architecture - Design for Maximum

[Chart: a fixed fleet of GPU server instances sized for the daily peak in calls per second.]

40 machines for 24 hours: $648 * 40 = $25,920 per month.

Autoscale Architecture - Design for Local Maximum

[Chart: GPU server instances scale up and down to track the local peaks in calls per second.]

19 machines for 24 hours: $648 * 19 = $12,312 per month.

Serverless Architecture - Design for Minimum

[Chart: serverless capacity tracks the average load of 21 calls per second rather than the peak.]

Average of 21 calls/sec, equivalent to 6 machines: $648 * 6 = $3,888 per month.
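The comparison across the three architectures boils down to simple arithmetic; a quick worked version, using the $648/month-per-GPU-instance figure from the slides:

# Worked version of the cost comparison across the three architectures above.
COST_PER_INSTANCE = 648  # $/month per GPU server instance (figure from the slides)

def monthly_cost(instances):
    return COST_PER_INSTANCE * instances

print(monthly_cost(40))  # traditional: sized for the 24h peak      -> 25920
print(monthly_cost(19))  # autoscaled: average fleet over the day   -> 12312
print(monthly_cost(6))   # serverless: avg 21 calls/s ~= 6 machines -> 3888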

Why Serverless - Concurrency

[Diagram: many concurrent clients hitting a load balancer in front of a pool of GPU-enabled servers.]

Why Serverless - Improved Latency

Portability = low latency.

Almost there! We also need: GPU memory management, job scheduling, cloud abstraction, discoverability, authentication, logging, etc.

Elastic Scale

[Diagram: User -> Web Load Balancer -> Web Servers -> API Load Balancer -> API Servers -> Cloud Region #1 and Cloud Region #2, each running N workers, each worker hosting Docker containers for algorithm #1 .. algorithm #n.]

Elastic Scaling with Intelligent Orchestration

Knowing that:

● Algorithm A always calls Algorithm B
● Algorithm A consumes X CPU, X memory, etc.
● Algorithm B consumes X CPU, X memory, etc.

we can slot them in a way that:

● Reduces network latency
● Increases cluster utilization
● Builds dependency graphs

[Example dependency graph: FoodClassifier -> FruitClassifier, VeggieClassifier]

Runtime Abstraction
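A toy sketch of that co-location idea, assuming we know each algorithm's resource needs and call graph; the numbers and the greedy first-fit policy are purely illustrative, not the platform's actual scheduler.

# Toy sketch of dependency-aware slotting: co-locate algorithms that call each
# other onto the same node when resources allow. Numbers and policy are illustrative.
requirements = {             # (CPU cores, GiB of memory) per algorithm -- assumed values
    "FoodClassifier":   (1, 2),
    "FruitClassifier":  (2, 4),
    "VeggieClassifier": (2, 4),
}
calls = {"FoodClassifier": ["FruitClassifier", "VeggieClassifier"]}  # dependency graph
NODE_CAPACITY = (8, 16)      # (CPU cores, GiB memory) per node

def slot(algorithms):
    nodes = []  # each node: {"free": [cpu, mem], "algos": [...]}
    def place(algo):
        cpu, mem = requirements[algo]
        for node in nodes:                       # first-fit: reuse an existing node
            if node["free"][0] >= cpu and node["free"][1] >= mem:
                node["free"] = [node["free"][0] - cpu, node["free"][1] - mem]
                node["algos"].append(algo)
                return
        nodes.append({"free": [NODE_CAPACITY[0] - cpu, NODE_CAPACITY[1] - mem],
                      "algos": [algo]})
    for algo in algorithms:
        place(algo)
        for dep in calls.get(algo, []):          # place callees right after their caller,
            place(dep)                           # so first-fit tends to co-locate them
    return nodes

for node in slot(["FoodClassifier"]):
    print(node["algos"], "free:", node["free"])
# -> ['FoodClassifier', 'FruitClassifier', 'VeggieClassifier'] free: [3, 6]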

Composability

Composability is critical for AI workflows because of data-processing pipelines and ensembles.

[Diagram: a “Fruit or Veggie” classifier composed from FruitClassifier and VeggieClassifier, analogous to a shell pipeline: cat file.csv | grep foo | wc -l]
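A minimal sketch of that kind of composition, assuming the Algorithmia Python client's client.algo(...).pipe(...) pattern; the algorithm paths and payloads here are hypothetical.

# Sketch: composing two models like a shell pipeline, assuming the Algorithmia
# Python client's client.algo(...).pipe(...) pattern. Algorithm paths are hypothetical.
import Algorithmia

client = Algorithmia.client("YOUR_API_KEY")

def classify_produce(image_url):
    # First stage: decide whether the image is fruit or vegetable.
    kind = client.algo("demo/FruitOrVeggieClassifier/1.0.0").pipe(image_url).result
    # Second stage: route to the specialist model, like `grep` feeding `wc -l`.
    specialist = ("demo/FruitClassifier/1.0.0" if kind == "fruit"
                  else "demo/VeggieClassifier/1.0.0")
    return client.algo(specialist).pipe(image_url).result

print(classify_produce("https://example.com/banana.jpg"))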

Cloud Abstraction - Storage

# No storage abstraction
import boto3
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()

# With storage abstraction
data = client.file("blob://records.csv").get()

The same code works against any backing store:
s3://foo/bar
blob://foo/bar
hdfs://foo/bar
dropbox://foo/bar
etc.

Cloud Abstraction

                  AWS                     Google Cloud      Azure           OpenStack
Compute           EC2                     Compute Engine    VM              Nova
Autoscaling       Autoscaling Group       Autoscaler        Scale Set       Heat Scaling Policy
Load Balancing    Elastic Load Balancer   Load Balancer     Load Balancer   LBaaS
Remote Storage    Elastic Block Store     Persistent Disk   File Storage    Block Storage

Partial source: Sam Ghods, KubeCon 2016

An Operating System for AI: the “AI Layer”

Kernel:
● Runtime Abstraction: support any programming language or framework, including interoperability between mixed stacks.
● Elastic Scale: prioritize and automatically optimize execution of concurrent short-lived jobs.
● Cloud Abstraction: provide portability to algorithms across public and private clouds.

Shell & Services:
● Discoverability, Authentication, Instrumentation, etc.

Discoverability: an App Store for AI


Algorithmia’s OS for AI: 1. Discover a model

● App Store-like interface
● Categorized, tagged, rated
● Well-described (purpose, source, API)

Algorithmia’s OS for AI: 2. Execute a model from any language

● Raw JSON, or language stubs
● Common syntax
● Autoscaled, elastic cloud execution
● Secure, isolated
● Concurrent, orchestrated
● 15ms overhead
● Hardware agnostic
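As a rough sketch of the “raw JSON from any language” idea, a call might look like the following; the algorithm path and payload are hypothetical, and the endpoint and auth header are assumptions based on Algorithmia's public HTTP API rather than something to copy verbatim.

# Sketch of invoking a hosted model with plain JSON over HTTP.
# Endpoint, algorithm path, and auth header are assumptions for illustration.
import json
import urllib.request

API_KEY = "YOUR_API_KEY"
url = "https://api.algorithmia.com/v1/algo/demo/SeeFoodClassifier/1.0.0"

payload = json.dumps({"image_url": "https://example.com/hotdog.jpg"}).encode()
req = urllib.request.Request(
    url,
    data=payload,
    headers={
        "Authorization": "Simple " + API_KEY,
        "Content-Type": "application/json",
    },
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))   # e.g. a JSON body with result and metadata fields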

Algorithmia’s OS for AI: 3. Add new models

● Many languages and frameworks
● Instant JSON API
● Call other models seamlessly (regardless of language)
● Granular permissions
● GPU environments
● Namespaces & versioning

Thank you!

Jon Peck, Developer Advocate
jpeck@algorithmia.com | @peckjon
Algorithmia.com

FREE STUFF: $50 free at Algorithmia.com, signup code WOSC18

WE ARE HIRING: algorithmia.com/jobs
● Seattle or remote
● Bright, collaborative environment
● Unlimited PTO
● Dog-friendly