
(CMP305) Deep Learning on AWS Made Easy

Date post: 20-Mar-2017
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Danny Bickson, Co-founder DATO CMP305 Deep Learning on AWS Made Easy October 2015
Transcript
Page 1: (CMP305) Deep Learning on AWS Made EasyCmp305

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Danny Bickson, Co-founder DATO

CMP305

Deep Learning on AWS Made Easy

October 2015

Page 2: (CMP305) Deep Learning on AWS Made EasyCmp305


Who is Dato?

Seattle-based Machine Learning Company

45+ and growing fast!

Page 3: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning example

©Dato

Page 4: (CMP305) Deep Learning on AWS Made EasyCmp305

Image classification

Input: x (image pixels)

Output: y (predicted object)

Page 5: (CMP305) Deep Learning on AWS Made EasyCmp305

Neural networks

Learning *very* non-linear features

Page 6: (CMP305) Deep Learning on AWS Made EasyCmp305


Linear classifiers (binary)

Score(x) > 0 Score(x) < 0

Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd
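
The score function above can be sketched in a few lines of plain Python. This is only an illustration: the helper names `score` and `classify` and the example weights are assumptions, not part of the talk.

```python
def score(w0, w, x):
    """Score(x) = w0 + w1*x1 + ... + wd*xd."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))


def classify(w0, w, x):
    """Binary decision: 1 when Score(x) > 0, otherwise 0."""
    return 1 if score(w0, w, x) > 0 else 0


# Hypothetical weights for a 2-feature problem (illustrative values only).
w0, w = -1.0, [2.0, 0.5]

print(classify(w0, w, [1.0, 1.0]))  # Score = -1.0 + 2.0 + 0.5 = 1.5
print(classify(w0, w, [0.0, 1.0]))  # Score = -1.0 + 0.0 + 0.5 = -0.5
```

The decision boundary is the set of points where Score(x) = 0, which is a line in two dimensions and a hyperplane in general.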

Page 7: (CMP305) Deep Learning on AWS Made EasyCmp305

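Graph representation of classifier: useful for defining neural networks

[Figure: inputs 1, x1, x2, …, xd feed a single output node y through weighted edges (w0, w1, w2, …, wd). The node outputs 1 if Score(x) > 0 and 0 if Score(x) < 0.]

Score(x) = w0 + w1 x1 + w2 x2 + … + wd xd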

Page 8: (CMP305) Deep Learning on AWS Made EasyCmp305

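What can a linear classifier represent?

x1 OR x2: Score(x) = -0.5 + x1 + x2

x1 AND x2: Score(x) = -1.5 + x1 + x2

[Figure: each function drawn as a single unit with inputs 1, x1, x2 and output y. The weight on the constant input is -0.5 for OR and -1.5 for AND; both input weights are 1.]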
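
A quick check of the OR and AND units, using the weights from the slide (constant-input weight -0.5 for OR and -1.5 for AND, input weights 1 and 1). The helper name `unit` is an assumption made for this sketch.

```python
from itertools import product


def unit(bias, w1, w2, x1, x2):
    """Single linear unit: output 1 when bias + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if bias + w1 * x1 + w2 * x2 > 0 else 0


def OR(x1, x2):
    return unit(-0.5, 1, 1, x1, x2)


def AND(x1, x2):
    return unit(-1.5, 1, 1, x1, x2)


# Rows are (x1, x2, x1 OR x2, x1 AND x2).
truth_table = [(x1, x2, OR(x1, x2), AND(x1, x2))
               for x1, x2 in product([0, 1], repeat=2)]
for row in truth_table:
    print(row)
```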

Page 9: (CMP305) Deep Learning on AWS Made EasyCmp305

What can’t a simple linear classifier represent?

XOR: the counterexample to everything

Need non-linear features

Page 10: (CMP305) Deep Learning on AWS Made EasyCmp305

Solving the XOR problem: adding a layer

XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)

z1 = x1 AND NOT x2: Score = -0.5 + x1 - x2

z2 = NOT x1 AND x2: Score = -0.5 - x1 + x2

y = z1 OR z2: Score = -0.5 + z1 + z2

Each unit is thresholded to 0 or 1
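
The two-layer XOR construction can be verified directly. The sketch below uses the weights shown on the slide; the function names `step` and `xor_net` are assumptions for illustration.

```python
def step(s):
    """Threshold a score to 0 or 1."""
    return 1 if s > 0 else 0


def xor_net(x1, x2):
    # Hidden layer: two linear units followed by a threshold.
    z1 = step(-0.5 + 1 * x1 - 1 * x2)   # x1 AND NOT x2
    z2 = step(-0.5 - 1 * x1 + 1 * x2)   # NOT x1 AND x2
    # Output layer: OR of the two hidden units.
    return step(-0.5 + 1 * z1 + 1 * z2)


outputs = {(x1, x2): xor_net(x1, x2) for x1 in (0, 1) for x2 in (0, 1)}
print(outputs)
```

No single linear unit can produce this table; the hidden layer supplies the non-linear features that make it possible.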

Page 11: (CMP305) Deep Learning on AWS Made EasyCmp305

A neural network

• Layers and layers and layers of linear models and non-linear transformations

• Around for about 50 years

• In the last few years, a big resurgence

- Impressive accuracy on several benchmark problems

- Advances in hardware make the computation feasible (e.g., AWS G2 instances)

[Figure: two-layer network with inputs 1, x1, x2, hidden units 1, z1, z2, and output y]

Page 12: (CMP305) Deep Learning on AWS Made EasyCmp305

Application of deep learning to computer vision

Page 13: (CMP305) Deep Learning on AWS Made EasyCmp305

Feature detection – traditional approach

• Features = local detectors

- Combined to make prediction

- (in reality, features are more low-level)

[Figure: detectors for Eye, Eye, Nose, and Mouth combine into the prediction "Face!"]

Page 14: (CMP305) Deep Learning on AWS Made EasyCmp305

Many hand-created features exist for finding interest points…

• SIFT [Lowe ’99]
• Spin Images [Johnson & Hebert ’99]
• Textons [Malik et al. ’99]
• RIFT [Lazebnik ’04]
• GLOH [Mikolajczyk & Schmid ’05]
• HoG [Dalal & Triggs ’05]
• …

Page 15: (CMP305) Deep Learning on AWS Made EasyCmp305

Standard image classification approach

Input → Extract features (hand-created features) → Use simple classifier (e.g., logistic regression, SVMs) → Face?

Page 16: (CMP305) Deep Learning on AWS Made EasyCmp305

Many hand-created features exist for finding interest points…

• SIFT [Lowe ’99]
• Spin Images [Johnson & Hebert ’99]
• Textons [Malik et al. ’99]
• RIFT [Lazebnik ’04]
• GLOH [Mikolajczyk & Schmid ’05]
• HoG [Dalal & Triggs ’05]
• …

Hand-created features … but very painful to design

Page 17: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning: implicitly learns features

Layer 1 → Layer 2 → Layer 3 → Prediction

[Figure: example detectors learned and example interest points detected at each layer] [Zeiler & Fergus ’13]

Page 18: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning performance

Page 19: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning accuracy

• German traffic sign recognition benchmark

- 99.5% accuracy (IDSIA team)

• House number recognition

- 97.8% accuracy per character [Goodfellow et al. ’13]

Page 20: (CMP305) Deep Learning on AWS Made EasyCmp305

ImageNet 2012 competition: 1.2M training images, 1000 categories

[Bar chart: error (best of 5 guesses), axis from 0 to 0.3, for the top 3 teams: SuperVision, ISI, OXFORD_VGG. Annotations: "Huge gain"; "Exploited hand-coded features like SIFT".]

Page 21: (CMP305) Deep Learning on AWS Made EasyCmp305

ImageNet 2012 competition: 1.2M training images, 1000 categories

Winning entry: SuperVision

8 layers, 60M parameters [Krizhevsky et al. ’12]

Achieving these amazing results required:

• New learning algorithms

• GPU implementation

Page 22: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning performance

• ImageNet: 1.2M images

[Bar chart: running time (hours), axis from 0 to 60, for g2.xlarge vs. g2.8xlarge]

Page 23: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning in computer vision

Page 24: (CMP305) Deep Learning on AWS Made EasyCmp305

Scene parsing with deep learning

[Farabet et al. ‘13]

Page 25: (CMP305) Deep Learning on AWS Made EasyCmp305

Retrieving similar images

Input image → Nearest neighbors

Page 26: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning usability

Page 27: (CMP305) Deep Learning on AWS Made EasyCmp305

Designed a simple user interface

import graphlab

# Train the model ('label' is an assumed name for the target column)
model = graphlab.neuralnet_classifier.create(train_images, target='label')

# Predict classes for new images
outcome = model.predict(test_images)

Page 28: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning demo

Page 29: (CMP305) Deep Learning on AWS Made EasyCmp305

Challenges of deep learning

Page 30: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning score card

Pros

• Enables learning of features rather than hand tuning

• Impressive performance gains

- Computer vision

- Speech recognition

- Some text analysis

• Potential for more impact

Page 31: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning workflow

Lots of labeled data → split into Training set and Validation set

Training set → Learn deep neural net → Validate (on the validation set)

Validate → Adjust parameters, network architecture, … → Learn again
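
The learn / validate / adjust loop can be sketched in plain Python. To keep the example short and dependency-free, a tiny logistic regression stands in for the deep neural net; the synthetic data, the candidate learning rates, and all function names are assumptions made for illustration.

```python
import math
import random

random.seed(0)

# Synthetic labeled data (assumption): label is 1 when x1 + x2 > 1.
data = []
for _ in range(200):
    x = (random.random(), random.random())
    data.append((x, 1 if x[0] + x[1] > 1 else 0))

# Split into a training set and a validation set.
train, valid = data[:150], data[150:]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fit(examples, lr, epochs):
    """Learn: stochastic gradient descent on the logistic loss."""
    w0 = w1 = w2 = 0.0
    for _ in range(epochs):
        for (x1, x2), y in examples:
            g = sigmoid(w0 + w1 * x1 + w2 * x2) - y
            w0 -= lr * g
            w1 -= lr * g * x1
            w2 -= lr * g * x2
    return w0, w1, w2


def accuracy(model, examples):
    """Validate: fraction of examples classified correctly."""
    w0, w1, w2 = model
    hits = sum((1 if w0 + w1 * x1 + w2 * x2 > 0 else 0) == y
               for (x1, x2), y in examples)
    return hits / len(examples)


# Adjust: try several settings and keep the one that validates best.
results = []
for lr in (0.01, 0.1, 1.0):
    model = fit(train, lr=lr, epochs=50)
    results.append((accuracy(model, valid), lr, model))

best_acc, best_lr, best_model = max(results, key=lambda r: r[0])
print("best learning rate:", best_lr, "validation accuracy:", best_acc)
```

With a real deep network the same loop applies, but each `fit` call is far more expensive and there are many more settings to adjust.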

Page 32: (CMP305) Deep Learning on AWS Made EasyCmp305


Many tricks needed to work well…

Different types of layers, connections,… needed for high accuracy

[Krizhevsky et al. ’12]

Page 33: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning score card

Pros

• Enables learning of features rather than hand tuning

• Impressive performance gains

- Computer vision

- Speech recognition

- Some text analysis

• Potential for more impact

Cons

• Requires a lot of data for high accuracy

• Computationally really expensive

• Extremely hard to tune

- Choice of architecture

- Parameter types

- Hyperparameters

- Learning algorithm

- …

Computational cost + so many choices = incredibly hard to tune

Page 34: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep features:

Deep learning

+

Transfer learning

Page 35: (CMP305) Deep Learning on AWS Made EasyCmp305

Standard image classification approach

Input → Extract features (hand-created features) → Use simple classifier (e.g., logistic regression, SVMs) → Face?

Can we learn features from data, even when we don’t have data or time?

Page 36: (CMP305) Deep Learning on AWS Made EasyCmp305

What’s learned in a neural net

Neural net trained for Task 1: cat vs. dog

Earlier layers: more generic. Can be used as feature extractor.

vs.

Final layers: very specific to Task 1. Should be ignored for other tasks.

Page 37: (CMP305) Deep Learning on AWS Made EasyCmp305

Transfer learning in more detail…

Neural net trained for Task 1: cat vs. dog

Earlier layers: more generic. Can be used as feature extractor. Keep weights fixed!

Final layers: very specific to Task 1. Should be ignored for other tasks.

For Task 2, predicting 101 categories, learn only end part of neural net

Use simple classifier (e.g., logistic regression, SVMs, nearest neighbor, …) → Class?
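
A minimal sketch of the idea, assuming a frozen feature extractor and a nearest-neighbor classifier on top. The weights in `FROZEN_W` are made-up placeholders standing in for layers learned on Task 1, and the Task 2 examples and class names are hypothetical.

```python
import math

# Stand-in for the early layers of a network trained on Task 1.
# Placeholder values: in practice these come from the pre-trained net.
FROZEN_W = [[0.8, -0.3, 0.1],
            [-0.2, 0.9, 0.4]]


def deep_features(x):
    """Run the frozen layers (linear map + ReLU). The weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]


def nearest_neighbor_classifier(train_x, train_y):
    """Task 2 'training' only stores the deep features of the labeled examples."""
    feats = [deep_features(x) for x in train_x]

    def predict(x):
        f = deep_features(x)
        dists = [math.dist(f, g) for g in feats]
        return train_y[dists.index(min(dists))]

    return predict


# Hypothetical Task 2 data: only a handful of labeled examples.
train_x = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.2]]
train_y = ["class_a", "class_a", "class_b", "class_b"]

predict = nearest_neighbor_classifier(train_x, train_y)
print(predict([0.95, 0.05, 0.0]))
print(predict([0.0, 1.0, 0.1]))
```

Only the simple classifier depends on the Task 2 data, which is why a small labeled set can be enough.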

Page 38: (CMP305) Deep Learning on AWS Made EasyCmp305

Careful where you cut: later layers may be too task specific

Layer 1 → Layer 2 → Layer 3 → Prediction

[Figure: example detectors learned and example interest points detected at each layer] [Zeiler & Fergus ’13]

Earlier layers: use these!

Later layers: too specific for new task

Page 39: (CMP305) Deep Learning on AWS Made EasyCmp305

Transfer learning with deep features workflow

Some labeled data → Extract features with neural net trained on different task → split into Training set and Validation set

Training set → Learn simple classifier → Validate (on the validation set)

Page 40: (CMP305) Deep Learning on AWS Made EasyCmp305

How general are deep features?

Page 41: (CMP305) Deep Learning on AWS Made EasyCmp305

Barcelona Buildings

Page 42: (CMP305) Deep Learning on AWS Made EasyCmp305

Architectural transition

Page 43: (CMP305) Deep Learning on AWS Made EasyCmp305

Deep learning in production on AWS

Page 44: (CMP305) Deep Learning on AWS Made EasyCmp305

How to use deep learning in production?

Predictive: Understands input & takes actions or makes decisions

Interactive: Responds in real time

Learning: Improves its performance with experience

Page 45: (CMP305) Deep Learning on AWS Made EasyCmp305

Intelligent service at the core…

Page 46: (CMP305) Deep Learning on AWS Made EasyCmp305

[Figure: "Your intelligent application" exchanges real-time data and predictions & decisions with an "Intelligent backend service". Separately, historical data feeds a machine learning model that produces predictions & decisions.]

Most ML research here…

But ML research useless without great solution here…

Page 47: (CMP305) Deep Learning on AWS Made EasyCmp305

Essential ingredients of intelligent service

Responsive: Intelligent applications are interactive

Need low latency, high throughput & high availability

Adaptive: ML models out-of-date the moment learning is done

Need to constantly understand & improve end-to-end performance

Manageable: Many thousands of models, created by hundreds of people

Need versioning, attribution, provenance & reproducibility

Page 48: (CMP305) Deep Learning on AWS Made EasyCmp305

Responsive: Now and Always


Page 49: (CMP305) Deep Learning on AWS Made EasyCmp305

Addressing latency

Page 50: (CMP305) Deep Learning on AWS Made EasyCmp305

Challenge: Scoring Latency

Compute predictions in < 20ms for complex models, queries (top K), and features, all while under heavy query load

SELECT * FROM users JOIN items, click_logs, pages WHERE …

Page 51: (CMP305) Deep Learning on AWS Made EasyCmp305

The Common Solutions to Latency

Faster Online Model Scoring: “Execute Predict(query) in real-time as queries arrive”

Pre-Materialization and Lookup: “Pre-compute Predict(query) for all queries and lookup answer at query time”

Dato Predictive Services does both

Page 52: (CMP305) Deep Learning on AWS Made EasyCmp305

Faster Online Model Scoring: Highly optimized machine learning

• SFrame: Native code, optimized data frame

- Available open-source (BSD)

• Model querying acceleration with native code, e.g.,

- TopK and Nearest Neighbor eval: LSH, Ball Trees, …

Page 53: (CMP305) Deep Learning on AWS Made EasyCmp305


Page 54: (CMP305) Deep Learning on AWS Made EasyCmp305

Smart Materialization Caching

[Plot: query frequency vs. unique queries]

Example: top 10% of all unique queries cover 90% of all queries performed.

Caching a small number of unique queries has a very large impact.
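
The effect of caching under skewed traffic can be illustrated with Python's built-in `functools.lru_cache`. This is an in-process sketch, not how Dato Predictive Services is implemented; `slow_predict`, the traffic mix, and the cache size are assumptions.

```python
from functools import lru_cache

calls = {"model": 0}


def slow_predict(query):
    """Stand-in for an expensive model evaluation."""
    calls["model"] += 1
    return len(query) % 3  # placeholder "prediction"


@lru_cache(maxsize=2)
def cached_predict(query):
    """Look up the answer in the cache; score the model only on a miss."""
    return slow_predict(query)


# Skewed traffic: a few unique queries account for most requests.
traffic = ["shoes"] * 6 + ["bikes"] * 3 + ["tents"]
for q in traffic:
    cached_predict(q)

info = cached_predict.cache_info()
print("requests:", len(traffic), "model calls:", calls["model"], "cache hits:", info.hits)
```

Here 10 requests trigger only 3 model evaluations. A shared cache across serving nodes follows the same lookup-then-score pattern.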

Page 55: (CMP305) Deep Learning on AWS Made EasyCmp305

Distributed shared caching

Distributed Shared Cache (Redis)

Cache:

• Model query results

• Common features (e.g., product info)

Scale-out improves throughput and latency

Page 56: (CMP305) Deep Learning on AWS Made EasyCmp305


Dato Latency by the numbers

Easy Case: cache hit ~2ms

Hard Case: cache miss

• Simple Linear Models: 5-6ms

• Complex Random Forests: 7-8ms

- P99: ~ 15ms

[using aws m3.xlarge instance]

Page 57: (CMP305) Deep Learning on AWS Made EasyCmp305

Challenge: Availability

Heavy load → substantial delays

Frequent model updates → cache misses

Machine failures

Page 58: (CMP305) Deep Learning on AWS Made EasyCmp305


Scale-Out availability under load

Heavy Load

Elastic Load Balancing load balancer

Page 59: (CMP305) Deep Learning on AWS Made EasyCmp305

Adaptive:

Accounting for Constant Change


Page 60: (CMP305) Deep Learning on AWS Made EasyCmp305

Change at Different Scales and Rates

Rate of change: Months ↔ Minutes

Granularity of change: Population ↔ Session

[Figure: examples "Shopping for Mom" and "Shopping for Me"]

Page 61: (CMP305) Deep Learning on AWS Made EasyCmp305

Change at Different Scales and Rates

Rate of change: Months ↔ Minutes

Granularity of change: Population ↔ Session

Individual and Session Level Change

• Small Data

• Online learning

• Bandits to Assess Models

[Figure: examples "Shopping for Mom" and "Shopping for Me"]

Page 62: (CMP305) Deep Learning on AWS Made EasyCmp305

The Dangerous Feedback Loop

I once looked at cameras on Amazon …

[Figure: recommendations shown: bags, similar cameras and accessories]

If this is all they showed, how would they learn that I also like bikes and shoes?

Page 63: (CMP305) Deep Learning on AWS Made EasyCmp305

Exploration / Exploitation Tradeoff

Systems that can take actions can adversely affect future data

Exploration (random action): Learn more about what is good and bad

Exploitation (best action): Make the best use of what we believe is good.
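
One common way to balance the two is an epsilon-greedy bandit: take a random action with a small probability, otherwise take the action that currently looks best. The talk does not say which bandit algorithm Dato uses, so this is only a sketch; the click-through rates and parameter values are assumptions.

```python
import random


def epsilon_greedy(true_rates, epsilon=0.1, rounds=5000, seed=0):
    """Simulate an epsilon-greedy bandit over actions with hidden success rates."""
    rng = random.Random(seed)
    n = len(true_rates)
    pulls = [0] * n   # how often each action was taken
    wins = [0] * n    # how often each action was rewarded
    for _ in range(rounds):
        if rng.random() < epsilon or not any(pulls):
            arm = rng.randrange(n)  # exploration: random action
        else:
            # exploitation: action with the best observed success rate
            arm = max(range(n),
                      key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls, wins


# Hypothetical click-through rates for three recommendation strategies.
pulls, wins = epsilon_greedy([0.05, 0.10, 0.30])
print("pulls per action:", pulls)
```

Without the exploration branch, the system would keep showing whatever happened to work first, which is the feedback loop described in the camera example.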

Page 64: (CMP305) Deep Learning on AWS Made EasyCmp305


Dato Solution to Adaptivity

Rapid offline learning with GraphLab Create

Online bandit adaptation in Predictive Services

• Demo

Page 65: (CMP305) Deep Learning on AWS Made EasyCmp305

Manageable:

Unification and simplification


Page 66: (CMP305) Deep Learning on AWS Made EasyCmp305

Ecosystem of Intelligent Services

[Figure: three layers. Data Infrastructure: MySQL, MySQL, TableA, TableB. Data Science: ModelA, ModelB. Serving: Service A, Service B.]

Complicated! Many systems, with overlapping roles, no single source of truth for Intelligent Service.

Page 67: (CMP305) Deep Learning on AWS Made EasyCmp305


Dato Predictive Services

Responsive Adaptive Manageable

Page 68: (CMP305) Deep Learning on AWS Made EasyCmp305

Model Management: like code management, but for life cycle of intelligent applications

Provenance & Reproducibility

• Track changes & rollback

• Cover code, model type, parameters, data…

Collaboration

• Review, blame

• Share

• Common feature engineering pipelines

Continuous Integration

• Deploy & update

• Measure & improve

• Avoid down time and impact on end-users

Page 69: (CMP305) Deep Learning on AWS Made EasyCmp305

Dato Predictive Services: Responsive, Adaptive, Manageable

Dato Predictive Services: Serving Models and Managing the Machine Learning Lifecycle

GraphLab Create: Accurate, Robust, and Scalable Model Training

Page 70: (CMP305) Deep Learning on AWS Made EasyCmp305

GraphLab Create: Sophisticated machine learning made easy

High-level ML toolkits

AutoML: tune params, model selection, … so you can focus on creative parts

Reusable features: transferable feature engineering, accuracy with less data & less effort

Page 71: (CMP305) Deep Learning on AWS Made EasyCmp305

High-level ML toolkits: get started with 4 lines of code, then modify, blend, add yours…

Recommender, Image search, Sentiment analysis, Data matching, Auto tagging, Churn predictor, Object detector, Product sentiment, Click prediction, Fraud detection, User segmentation, Data completion, Anomaly detection, Document clustering, Forecasting, Search ranking, Summarization, …

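import graphlab as gl

# Load the ratings table
data = gl.SFrame.read_csv('my_data.csv')

# Train a recommender on (user, movie, rating) rows
model = gl.recommender.create(data,
                              user_id='user',
                              item_id='movie',
                              target='rating')

# Top 5 recommendations per user
recommendations = model.recommend(k=5)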

Page 72: (CMP305) Deep Learning on AWS Made EasyCmp305

SFrame ❤️ all ML tools SGraph

SFrame: Sophisticated machine learning made scalable

Page 73: (CMP305) Deep Learning on AWS Made EasyCmp305

Opportunity for Out-of-Core ML

Capacity and throughput:

• 0.1 TB at 1 GB/s

• 1 TB at 0.5 GB/s

• 10 TB at 0.1 GB/s

Fast, but significantly limits data size

Opportunity for big data on 1 machine

For sequential reads only! Random access very slow

Out-of-core ML opportunity is huge

Usual design → Lots of random access → Slow

Design to maximize sequential access for ML algo patterns

GraphChi: early example

SFrame: data frame for ML
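
The sequential-access pattern can be illustrated with a single streaming pass over a file on disk, keeping only a running total in memory. This uses Python's standard `csv` module rather than SFrame, and the file contents and the function name `streaming_mean` are assumptions for the sketch.

```python
import csv
import os
import tempfile


def streaming_mean(path, column):
    """One sequential pass over the file; memory use does not grow with file size."""
    total, count = 0.0, 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            total += float(row[column])
            count += 1
    return total / count if count else float("nan")


# Small demo file (a real out-of-core workload would be far larger than RAM).
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["user", "rating"])
    writer.writerows([["a", 4], ["b", 5], ["c", 3]])
    demo_path = f.name

mean_rating = streaming_mean(demo_path, "rating")
os.remove(demo_path)
print(mean_rating)
```

Algorithms written as a series of such passes read the disk at its sequential throughput instead of paying for random access.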

Page 74: (CMP305) Deep Learning on AWS Made EasyCmp305

Performance of SFrame/SGraph

Connected components in Twitter graph (41 million nodes, 1.4 billion edges)

[Bar chart, axis from 0 to 2,250 sec]

• GraphLab Create (SGraph, 1 machine): 70 sec

• GraphX (16 machines): 251 sec

• Giraph (16 machines): 200 sec

• Spark (16 machines): 2,128 sec

Source(s): Gonzalez et al. (OSDI 2014)

Page 75: (CMP305) Deep Learning on AWS Made EasyCmp305

SFrame & SGraph

Optimized out-of-core computation for ML

High performance: 1 machine can handle TBs of data, 100s of billions of edges

Optimized for ML:

• Columnar transformation

• Create features

• Iterators

• Filter, join, group-by, aggregate

• User-defined functions

• Easily extended through SDK

Tables, graphs, text, images

Open-source ❤️ BSD license

Page 76: (CMP305) Deep Learning on AWS Made EasyCmp305

The Dato Machine Learning Platform

Predictive Services: Serve Models and Manage the Machine Learning Lifecycle

GraphLab Create: Train Accurate, Robust, and Scalable models

Page 77: (CMP305) Deep Learning on AWS Made EasyCmp305


Our customers
