First Steps with Keras 2: A Tutorial with Examples


Keras 2
"You have just found Keras"

Felipe Almeida
Rio Machine Learning Meetup / June 2017

First Steps


Content
● Intro
● Neural Networks
● Keras
● Examples
● Keras concepts
● Resources


Intro
● Neural nets are versatile, but there was a need for a simple framework to design and experiment with them.
● Neural nets (particularly with multiple layers) need a lot of time to be trained.
● Recent advances in algorithms (layer-wise training, contrastive divergence, etc.) and in hardware (leveraging GPUs for tensor operations), as well as the massive amounts of available data, have made deep learning popular.


Neural Networks
● Generally speaking, neural networks are nonlinear machine learning models.
● They can be used for supervised or unsupervised learning.
● Deep learning refers to training neural nets with multiple layers.
○ They are more powerful, but only if you have lots of data to train them on.
● Keras is used to create neural network models.


Neural Networks - Sample Architectures

[Figures: sample network architectures. Source: neuralnetworksanddeeplearning.com]

Keras
● Models created by Keras can be executed on a backend:
○ TensorFlow (default)
○ Theano
○ CNTK (beta)
○ MXNet (beta)
● Keras has built-in GPU support with CUDA.
○ CUDA is a framework for using the GPU on Nvidia video cards for mathematical (tensor) operations.
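Not from the slides, but as a quick reference: the backend is chosen in ~/.keras/keras.json or via the KERAS_BACKEND environment variable, and can be queried from Python. A minimal sketch:

```python
import os

# Must be set before Keras is imported; "tensorflow" is the default anyway.
os.environ["KERAS_BACKEND"] = "tensorflow"

from keras import backend as K
print(K.backend())  # prints the active backend, e.g. "tensorflow"
```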

Keras
● Keras is the de facto deep learning frontend.

Source: @fchollet, Jun 3 2017


Keras
● Keras is among the libraries supported by Apple’s CoreML.

Source: @fchollet, Jun 5 2017


Example #1
● The MNIST dataset contains 60,000 labelled handwritten digits (for training) and 10,000 for testing.


Example #1
● We can train a neural net to classify a digit’s pixels into one of the 10 digit classes:

NOTEBOOK - MNIST MLP

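The notebook itself is not reproduced in this transcript; the sketch below shows the kind of MLP it trains, using the standard keras.datasets MNIST loader. Layer sizes, dropout rates and the number of epochs are illustrative, not necessarily those used in the talk.

```python
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Load MNIST and flatten the 28x28 images into 784-dimensional vectors.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255

# One-hot encode the 10 digit classes.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# A small fully-connected network: two hidden layers with dropout.
model = Sequential()
model.add(Dense(512, activation="relu", input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="rmsprop",
              metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))
print(model.evaluate(x_test, y_test))
```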

Example #2
● The MNIST dataset can also be used to train multi-layer convolutional neural networks (CNNs).
○ The results with a regular NN are already good, but it is worth showing how to train a CNN.

● NOTEBOOK - MNIST CNN

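Again as a sketch rather than the exact notebook: a small convolutional network for the same task. Here the images stay shaped as (28, 28, 1) instead of being flattened; filter counts, kernel sizes and the choice of optimizer are illustrative.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Input: 28x28 grayscale images, shaped (28, 28, 1).
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation="relu",
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(10, activation="softmax"))

model.compile(loss="categorical_crossentropy",
              optimizer="adadelta",
              metrics=["accuracy"])
# model.fit(...) as in the MLP sketch, with x_train reshaped to (n, 28, 28, 1).
```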

Example #2 - What are CNNs?
● While the model is being trained, let’s understand what a CNN looks like and what it’s good for.
● CNNs use convolution operations to extract features that are position-invariant.
○ In other words, they make it possible to train models that detect features no matter what position they occupy in the input samples.


Example #2 - What are CNNs?
● For this reason, they are often used for image classification.


Example #3
● CNNs can also be used for text classification.
○ In fact, they produce state-of-the-art results in tasks such as:
■ Text classification
■ Sentiment analysis
● Let’s train a CNN model to classify documents in the newsgroup_20 dataset.

● NOTEBOOK - IMDB CNN
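As a sketch of what such a text-classification CNN looks like (not the exact notebook): an embedding layer followed by a 1D convolution and global max pooling, assuming the documents have already been converted to padded integer sequences. The sigmoid output matches a binary sentiment task like IMDB; a multi-class dataset like newsgroup_20 would use a softmax output with categorical cross-entropy instead. Vocabulary size, sequence length and filter counts are illustrative.

```python
from keras.models import Sequential
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout

max_features = 5000   # assumed vocabulary size
maxlen = 400          # assumed padded sequence length

model = Sequential()
model.add(Embedding(max_features, 50, input_length=maxlen))
model.add(Dropout(0.2))
model.add(Conv1D(250, 3, padding="valid", activation="relu"))
model.add(GlobalMaxPooling1D())
model.add(Dense(250, activation="relu"))
model.add(Dropout(0.2))
model.add(Dense(1, activation="sigmoid"))  # binary label, as in IMDB sentiment

model.compile(loss="binary_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```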

Keras: Models
● The most important objects in Keras are models.
● Model = layers + a loss + an optimizer.
● These are the objects you add layers to and call compile() and fit() on.
● Models can be saved and checkpointed for later use.

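A minimal sketch of saving and checkpointing; the toy model and filenames are illustrative.

```python
from keras.models import Sequential, load_model
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint

# A toy model just to have something to save (hypothetical shapes).
model = Sequential([Dense(10, activation="softmax", input_shape=(784,))])
model.compile(loss="categorical_crossentropy", optimizer="sgd")

# Checkpoint the best weights seen so far at the end of each epoch.
checkpoint = ModelCheckpoint("weights.best.h5", monitor="val_loss",
                             save_best_only=True)
# model.fit(x_train, y_train, validation_split=0.1, callbacks=[checkpoint])

# Save the full model (architecture + weights + optimizer state)...
model.save("my_model.h5")
# ...and restore it later.
model = load_model("my_model.h5")
```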

Keras: Layers
● Layers are used to define what your architecture looks like.
● Examples of layers are:
○ Dense layers (the normal, fully-connected layer)
○ Convolutional layers (apply convolution operations on the previous layer)
○ Pooling layers (used after convolutional layers)
○ Dropout layers (used for regularization, to avoid overfitting)

Keras: Loss Functions
● Loss functions are used to compare the network’s predicted output with the real output in each pass of the backpropagation algorithm.
○ The loss function tells the model how the weights should be updated.
● Common loss functions are:
○ Mean squared error
○ Cross-entropy
○ etc.

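In Keras the loss is passed to compile(), either by name or as a function from keras.losses; the tiny model below is just an illustrative placeholder.

```python
from keras import losses
from keras.models import Sequential
from keras.layers import Dense

# A toy model; only the compile() calls matter here.
model = Sequential([Dense(10, activation="softmax", input_shape=(20,))])

model.compile(loss="categorical_crossentropy", optimizer="sgd")
# Equivalent, using the function object instead of the string name:
model.compile(loss=losses.categorical_crossentropy, optimizer="sgd")
```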

Keras: Optimizers
● Optimizers are the strategies used to update the network’s weights in the backpropagation algorithm.
● The simplest optimizer is stochastic gradient descent (SGD), but there are many others to choose from, such as:
○ RMSProp
○ Adagrad


Keras: Optimizers
● Most optimizers can be tuned using hyperparameters, such as:
○ The learning rate to use
○ Whether or not to use momentum

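A short sketch of configuring an optimizer explicitly instead of passing it by name; the learning-rate and momentum values are illustrative.

```python
from keras.optimizers import SGD, RMSprop
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(1, input_shape=(10,))])

# Plain SGD with a tuned learning rate and momentum.
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(loss="mse", optimizer=sgd)

# Other optimizers can be swapped in the same way, e.g. RMSprop(lr=0.001).
```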

Keras: CPU / GPU
● If your computer has a good graphics card, it can be used to speed up model training.
● All models up to now were trained using the GPU.
○ Let’s see what happens if we disable the GPU and force Keras to use the CPU instead.

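One common way to do this with the TensorFlow backend (a sketch, not necessarily how it was done in the talk) is to hide the CUDA devices before Keras is imported:

```python
import os

# Hide all CUDA devices so the TensorFlow backend falls back to the CPU.
# This must be set before TensorFlow/Keras is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from keras.models import Sequential  # imported after the environment is set
```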

Keras: Other information
● Feature preprocessing
○ Although you can use any other method for feature preprocessing, Keras has a couple of utilities to help, such as:
■ to_categorical (to one-hot encode data)
■ Text preprocessing utilities, such as tokenizers

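A small sketch of both utilities; the example data is made up.

```python
from keras.utils import to_categorical
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# One-hot encode integer class labels.
labels = [0, 2, 1, 2]
print(to_categorical(labels, num_classes=3))

# Tokenize raw texts into padded integer sequences.
texts = ["keras makes this easy", "so does the tokenizer"]
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
print(pad_sequences(sequences, maxlen=10))
```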

Keras: Other information
● You can integrate Keras models into a scikit-learn Pipeline.
○ There are wrapper classes available in Keras to help you implement the methods expected by a scikit-learn classifier, such as fit(), predict(), predict_proba(), etc.
○ You can also use things like scikit-learn’s grid search to do model selection on Keras models, i.e. to decide which hyperparameters are best for a given task.

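A sketch using the KerasClassifier wrapper from keras.wrappers.scikit_learn together with scikit-learn’s GridSearchCV; the architecture, the 20-feature input shape and the parameter grid are illustrative.

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV

def build_model(units=32):
    # Hypothetical architecture; input shape assumes a 20-feature dataset.
    model = Sequential()
    model.add(Dense(units, activation="relu", input_shape=(20,)))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["accuracy"])
    return model

clf = KerasClassifier(build_fn=build_model, epochs=5, batch_size=32, verbose=0)
grid = GridSearchCV(clf, param_grid={"units": [16, 32, 64]}, cv=3)
# grid.fit(X, y)  # X, y: your training data as numpy arrays
```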

Keras: Other information
● Nearly everything in Keras can be regularized. In addition to the Dropout layer, there are all sorts of other regularizers available, such as:
○ Weight regularizers
○ Bias regularizers
○ Activity regularizers

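A sketch of attaching all three kinds of regularizer to a single Dense layer; the penalty coefficients are illustrative.

```python
from keras import regularizers
from keras.layers import Dense

# L2 penalties on the weights and bias, and an L1 penalty on the activations.
layer = Dense(64, activation="relu",
              kernel_regularizer=regularizers.l2(0.01),
              bias_regularizer=regularizers.l2(0.01),
              activity_regularizer=regularizers.l1(0.001))
```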