Getting Started with Neural Networks and Fundamentals of Machine Learning May 1, 2019 [email protected] http://cross-entropy.net/ML410/Deep_Learning_3.pdf
Page 1: Getting Started with Neural Networks and Fundamentals of … · 2019-05-02 · Networks of Layers •A deep learning model is a directed, acyclic graph of layers •Most common instance

Getting Started with Neural Networks and Fundamentals of Machine Learning

May 1, 2019

[email protected]

http://cross-entropy.net/ML410/Deep_Learning_3.pdf


Agenda for Tonight

• Homework Review

• [DLP] Chapter 2: Getting Started with Neural Networks

• [DLP] Chapter 3: Fundamentals of Machine Learning


Getting Started with Neural Networks

1. Anatomy of a Neural Network

2. Introduction to Keras

3. Setting Up a Deep Learning Workstation

4. Classifying Movie Reviews: a Binary Classification Example

5. Classifying Newswires: a MultiClass Classification Example

6. Predicting House Prices: a Regression Example

7. Chapter Summary


Neural Networks

Training involves …

Anatomy of a Neural Network


Relationship between the Network, Layers, Loss Function, and Optimizer

Anatomy of a Neural Network


Layers as the Lego Bricks of Deep Learning ☺

The first layer requires an input_shape parameter [the dimensions of a single observation], while additional layers do not require this parameter

Anatomy of a Neural Network


Networks of Layers

• A deep learning model is a directed, acyclic graph of layers

• Most common instance is a “linear” (simple, sequential) stack of layers

• Other common instances include …

• Two-branch networks; e.g. a question goes down one branch and a text passage goes down another

• Multi-input and multi-head networks; e.g. we have both an image and a text description as inputs, or we predict several outputs (“heads”) from one network

• Inception blocks; e.g. we want to use a few different convolution (input filtering) approaches in parallel

Anatomy of a Neural Network


Loss Function and Optimizers

• Loss functions for this class include: cross entropy, mean squared error, mean absolute error, content loss, style loss, total variation loss, Kullback-Leibler loss, temporal difference loss, actor loss, and critic loss

• Optimizers for this class include: Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMSProp), and Adaptive Moment Estimation (Adam: RMSProp + momentum)

Anatomy of a Neural Network
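
To make the relationship between the three optimizers concrete, here is a minimal pure-Python sketch of their update rules on a toy loss f(w) = w**2. The function names, the state dictionary, and the hyperparameter values are illustrative assumptions, not code from the course or from Keras.

```python
import math

def sgd_step(w, g, state, lr=0.01):
    """Plain SGD: step against the gradient (state is unused)."""
    return w - lr * g

def rmsprop_step(w, g, state, lr=0.001, rho=0.9, eps=1e-7):
    """RMSProp: divide the step by the root of a running mean of squared gradients."""
    state["v"] = rho * state.get("v", 0.0) + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Adam: a running mean of gradients (momentum) with RMSProp-style scaling."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * g
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])  # bias correction for early steps
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def minimize(step, w=5.0, iterations=2000, **hyperparameters):
    """Minimize the toy loss f(w) = w**2, whose gradient is 2*w."""
    state = {}
    for _ in range(iterations):
        w = step(w, 2 * w, state, **hyperparameters)
    return w

for name, step, lr in [("SGD", sgd_step, 0.1),
                       ("RMSProp", rmsprop_step, 0.01),
                       ("Adam", adam_step, 0.01)]:
    print(name, minimize(step, lr=lr))
```

All three should end close to the minimum at w = 0; the adaptive methods take steps of roughly the learning rate regardless of the gradient's scale.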


Keras Features

Introduction to Keras


Google Search Interest for Deep Learning Frameworks

Introduction to Keras


Deep Learning Software and Hardware Stack

• Nvidia Graphics Processing Units (GPUs) and Google Tensor Processing Units (TPUs) support efficient deep learning

• Nvidia’s Compute Unified Device Architecture (CUDA) Application Programming Interface (API) and the CUDA Deep Neural Network library (cuDNN) provide an interface to Nvidia GPUs

• Eigen library implements the Basic Linear Algebra Subprograms (BLAS) specification, allowing tensor manipulation on Central Processing Units (CPUs)

Theano and CNTK are no longer maintained

Introduction to Keras


Typical Keras Workflow

Introduction to Keras


Network Definition: Sequential Model versus the Functional API

Same model with both methods …

Introduction to Keras


Model Configuration and Training

Introduction to Keras


Two Options for Getting Keras Running

Setting Up a Deep Learning Workstation


Loading the Internet Movie Database (IMDB) Sentiment Analysis Data

num_words is the size of the vocabulary

Classifying Movie Reviews


Decoding a Document

Classifying Movie Reviews


Turning Lists of Integers into Tensors

Note: if more than one value in a row is one, we should refer to this as a multi-hot encoding

• one-hot encoding for identifying a class in a dense target vector

• multi-hot encoding to identify the tokens present in a document in a dense input vector

Classifying Movie Reviews
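
The multi-hot encoding described above can be sketched in a few lines of plain Python. The function name and the tiny example documents are assumptions for illustration; a real pipeline would typically build a NumPy array instead of nested lists.

```python
def vectorize_sequences(sequences, dimension):
    """Multi-hot encode: row i gets a 1.0 at every token index present in document i."""
    results = [[0.0] * dimension for _ in sequences]
    for i, sequence in enumerate(sequences):
        for token_index in sequence:
            results[i][token_index] = 1.0  # repeated tokens still yield a single 1.0
    return results

# Two toy "documents" as lists of token indices, with a vocabulary of 8 tokens.
documents = [[3, 5, 3], [0]]
matrix = vectorize_sequences(documents, dimension=8)
for row in matrix:
    print(row)
```

The first row has two ones (tokens 3 and 5), which is why "multi-hot" is the more accurate name for this input encoding.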


Encoding the Integer Sequences into a Binary Matrix

Classifying Movie Reviews


Architecture Decisions for Simple Feedforward Network

• How many layers to use

• How many hidden units to choose for each layer

• Which activation functions to use

• Do *not* forget to include activation functions: unexplained suboptimality will ensue

Classifying Movie Reviews


Common Activation Functions

Rectified Linear Unit (ReLU): max(0,x)

[no saturation for positive inputs]

Sigmoid: 1/(1+exp(-x))

[usually used for output layer]

Classifying Movie Reviews
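
The two activation formulas above translate directly into code. This is a small sketch with assumed function names; the sigmoid is written in a branch form to avoid overflow for large negative inputs, and sigmoid_gradient is included only to illustrate saturation.

```python
import math

def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x)), computed without overflowing."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_gradient(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(relu(-2.0), relu(3.0))
print(sigmoid(0.0))
print(sigmoid_gradient(0.0), sigmoid_gradient(10.0))
```

The gradient at x = 10 is tiny compared with the gradient at x = 0, which is the saturation effect; ReLU keeps a gradient of 1 for every positive input.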


IMDB Network Architecture

• Features flow from bottom to top

• An output is called a “head”

• Two hidden layers and an output layer with weights

• It’s a deep neural network

• We’ll get to more than one hundred layers soon enough

Classifying Movie Reviews


Model Definition

Classifying Movie Reviews


Parameters and Outputs for a Dense Layer

• Parameters

• (Number of Inputs from Previous Layer + 1) * (Number of “Units”)

• + 1 for bias weights: one for each “unit”

• We used to refer to “units” as neurons

• The names have been changed to protect the innocent? Our approach was inspired by neuroscience, but our brains aren’t using RMSProp ☺

• These are the same weight vectors we’ve come to know and love: projecting inputs to a new representation, one feature at a time [the number of “units” is the number of new features for the new representation]

• Output Shape

• (Batch Size) x (Number of “Units”)

Classifying Movie Reviews
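
As a worked instance of the parameter formula, the sketch below counts the weights of a three-layer Dense stack. The 10,000-wide input is an assumption (the slides only say that num_words sets the vocabulary size), and the 16-16-1 widths assume 16 units per hidden layer and a single output unit.

```python
def dense_param_count(num_inputs, units):
    """(inputs + 1) * units: one weight per input plus one bias weight, for each unit."""
    return (num_inputs + 1) * units

# Assumed layer widths: input vector, two hidden layers, one output unit.
layer_sizes = [10000, 16, 16, 1]
counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts)  # [160016, 272, 17]
print(total)   # 160305
```

Almost all of the parameters sit in the first layer, because its input is as wide as the vocabulary.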


Why Are Activation Functions Necessary?

Try omitting activation functions from 1) the output layer and 2) hidden layers … so you can recognize this issue later

Classifying Movie Reviews
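
One way to see why omitting hidden-layer activations hurts: without a nonlinearity, two stacked Dense layers reduce to a single affine map. A short derivation, where W_1, b_1, W_2, and b_2 are assumed names for the weights and biases of the two layers:

```latex
\begin{aligned}
h &= W_1 x + b_1 \\
y &= W_2 h + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)
\end{aligned}
```

So the stack can represent nothing beyond one linear layer with weights W_2 W_1 and bias W_2 b_1 + b_2, however many layers are added.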


Compiling the Model

Classifying Movie Reviews


Setting Aside a Validation Set

Classifying Movie Reviews


Training the Model

Classifying Movie Reviews


Plotting the Training and Validation Loss

Classifying Movie Reviews


Where Do We Start Overfitting?

Classifying Movie Reviews


Plotting the Training and Validation Accuracy

Classifying Movie Reviews


Where Do We Start Overfitting?

Classifying Movie Reviews


Retraining the Model from Scratch

Why are we “retraining the model from scratch”?

Classifying Movie Reviews


Generating Predictions on New Data

Classifying Movie Reviews


Ideas for Experiments

Classifying Movie Reviews


Wrapping Up the IMDB Example

Classifying Movie Reviews


Loading the Reuters Dataset

Classifying Newswires


Decoding Newswires Back to Text

Classifying Newswires


Preparing the Document Matrices

Classifying Newswires


Preparing the Target Matrices

Classifying Newswires


Defining the Model

Classifying Newswires


Notes About the Architecture

Classifying Newswires


Validating the Approach

Classifying Newswires


Where Do We Start Overfitting?

Classifying Newswires


Where Do We Start Overfitting?

Classifying Newswires


Retraining a Model from Scratch

Classifying Newswires

Why are we “retraining a model from scratch”?


Comparing to Random [and a Majority Classifier]

Nota bene (note well): 813 of the 2,246 test examples belonged to class 3

The accuracy of a majority classifier is 36.2%

Classifying Newswires
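
Both baselines can be computed without a model. The sketch below uses assumed function names and a toy label list rather than the real Reuters labels: the majority baseline always predicts the most frequent training class, and the random baseline compares the test labels with a shuffled copy of themselves.

```python
import random
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Always predict the most frequent training class; return (class, test accuracy)."""
    majority_class, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_class)
    return majority_class, hits / len(test_labels)

def shuffled_label_accuracy(test_labels, seed=0):
    """Accuracy of 'predictions' obtained by shuffling the true test labels."""
    shuffled = list(test_labels)
    random.Random(seed).shuffle(shuffled)
    hits = sum(1 for a, b in zip(test_labels, shuffled) if a == b)
    return hits / len(test_labels)

# Toy labels for illustration only.
train = [3, 3, 3, 4, 4, 1]
test = [3, 4, 3, 1, 3]
print(majority_baseline_accuracy(train, test))  # (3, 0.6)
print(shuffled_label_accuracy(test))
```

With imbalanced classes the majority baseline is usually the harder one to beat, so it is worth reporting alongside the random one.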


Generating Predictions for New Data

Classifying Newswires


Dense Versus Sparse Labels

Classifying Newswires
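
Dense (one-hot) and sparse (integer) labels carry the same information, and the matching cross-entropy losses give the same value. A minimal sketch with hand-written loss functions; in Keras the corresponding loss names are categorical_crossentropy and sparse_categorical_crossentropy, while the helper names and example probabilities below are assumptions.

```python
import math

def to_one_hot(labels, num_classes):
    """Dense labels: one row per example with a single 1.0 at the class index."""
    return [[1.0 if j == label else 0.0 for j in range(num_classes)]
            for label in labels]

def categorical_crossentropy(one_hot_target, probabilities):
    """Cross entropy for a dense (one-hot) target."""
    return -sum(t * math.log(p)
                for t, p in zip(one_hot_target, probabilities) if t > 0)

def sparse_categorical_crossentropy(label, probabilities):
    """Cross entropy for a sparse (integer) target."""
    return -math.log(probabilities[label])

predicted = [0.1, 0.7, 0.2]   # assumed softmax output for one example
label = 1
dense_target = to_one_hot([label], num_classes=3)[0]

print(dense_target)
print(categorical_crossentropy(dense_target, predicted))
print(sparse_categorical_crossentropy(label, predicted))
```

The sparse form avoids materializing a wide one-hot matrix, which matters when the number of classes is large.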


Model With an Information Bottleneck

71% accuracy: an 8% absolute drop

Classifying Newswires


Further Experiments

Classifying Newswires


Wrapping Up

Classifying Newswires


Loading the Boston Housing Dataset

1970s home prices in thousands of dollars

Predicting House Prices


Brief Discussion of Bias

• Boston Housing dataset has been used by many popular textbooks

• The data explicitly offers a race-related variable for modeling

• Avoid using proxy variables that lead to discrimination based on race, gender, religion, etc.

• Example: don’t ask about gender if all you really want to know is whether the candidate can lift X pounds

Predicting House Prices

http://lib.stat.cmu.edu/datasets/boston


Normalizing the Data

Predicting House Prices


Model Definition

Predicting House Prices


3-Fold Cross-Validation

Predicting House Prices


4-Fold Cross-Validation Implementation

Predicting House Prices


Cross-Validation Loop

Predicting House Prices


Cross-Validation Results

Predicting House Prices


Alternative Implementation [saved history]

Predicting House Prices


Plotting the Average Mean Absolute Error (MAE)

Predicting House Prices


Visualization Suggestions

Predicting House Prices


Smoothing the Curve

The smoothed_points expression should look familiar: it is the same exponential moving average used by RMSProp (0.9 decay for the squared gradients) and Adam (0.9 for the gradients and 0.999 for the squared gradients)

Predicting House Prices
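
A sketch of the smoothing step as an exponential moving average. The function name and the 0.9 default factor are assumptions for illustration; the variable smoothed_points matches the name used on the slide.

```python
def smooth_curve(points, factor=0.9):
    """Exponential moving average: keep `factor` of the previous smoothed value."""
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)  # the first point has nothing to average with
    return smoothed_points

noisy_mae = [3.0, 2.0, 2.8, 2.1, 2.6, 2.0]  # made-up values for illustration
print(smooth_curve(noisy_mae))
```

A larger factor gives a smoother but more delayed curve, the same trade-off the optimizers make when they average gradients.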


Plotting the Smoothed MAE

Predicting House Prices


Training the Final Model

Predicting House Prices


Wrapping Up

Predicting House Prices


Chapter Summary


Fundamentals of Machine Learning

1. Four Branches of Machine Learning

2. Evaluating Machine Learning Models

3. Data Preprocessing, Feature Engineering, and Feature Learning

4. Overfitting and Underfitting

5. The Universal Workflow of Machine Learning


Supervised Learning Examples

Four Branches of Machine Learning


Unsupervised Learning

• Dimensionality Reduction

• Clustering

Four Branches of Machine Learning


Self-Supervised Learning

• Learning Without Human Annotated Labels

• Autoencoders

• Trying to predict the next word given previous words

• Trying to predict the next frame given previous frames

Four Branches of Machine Learning


Reinforcement Learning

• Google Deep Mind used reinforcement learning to create a model to play Atari games

• AlphaGo was created to play Go

• Occasional rewards

• Examples of possible applications include: self-driving cars, robotics, resource management, and education

Four Branches of Machine Learning


From Yann LeCun …

https://t.co/2LSb622114

Four Branches of Machine Learning


Also from Yann LeCun …

https://t.co/2LSb622114

Four Branches of Machine Learning


Classification and Regression Glossary

Four Branches of Machine Learning


Classification and Regression Glossary

Four Branches of Machine Learning


Simple Hold-out Validation Split

Evaluating Machine Learning Models


Hold-out Validation Implementation [note the concatenation]

Evaluating Machine Learning Models


K-Fold Cross-Validation

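Used for smaller data sets

• If K is too small, each model trains on a smaller share of the data, so the validation estimate tends to be pessimistic (high bias)

• If K is too large, each validation fold is small and more models must be trained, so the estimate becomes noisier (high variance) and more expensive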

Evaluating Machine Learning Models


K-Fold Cross Validation Implementation

Evaluating Machine Learning Models


Iterated K-Fold Cross-Validation with Shuffling

• history = []

• for i in range(iterationCount):

•     shuffle(data)

•     history.append(crossValidation(data, K = k))

• Requires building iterationCount * K + 1 models (iterationCount * K cross-validation models, plus 1 final model trained on all of the data)

Evaluating Machine Learning Models
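
The pseudocode above can be made runnable with a stand-in model. Everything here is a sketch under assumptions: the helper names, the fit/score callback interface, and the trivial "predict the mean" model are illustrative and would be replaced by real Keras training and evaluation.

```python
import random
from statistics import mean

def k_fold_indices(num_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        stop = start + fold_size if fold < k - 1 else num_samples
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, num_samples))
        yield train, validation

def cross_validation(data, k, fit, score):
    """Train one model per fold and return the mean validation score."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in val_idx]))
    return mean(scores)

def iterated_cross_validation(data, k, iteration_count, fit, score, seed=0):
    """Shuffle, run K-fold cross-validation, and repeat iteration_count times."""
    rng = random.Random(seed)
    data = list(data)
    history = []
    for _ in range(iteration_count):
        rng.shuffle(data)
        history.append(cross_validation(data, k, fit, score))
    return history

# Stand-in model: predict the mean target; score with mean absolute error.
def fit_mean(rows):
    return mean(y for _, y in rows)

def mean_absolute_error(model, rows):
    return mean(abs(y - model) for _, y in rows)

data = [(x, 2.0 * x) for x in range(20)]
history = iterated_cross_validation(data, k=4, iteration_count=3,
                                    fit=fit_mean, score=mean_absolute_error)
print(history, mean(history))
```

The loop fits iteration_count * k models; the final model trained on all available data is not included in this sketch.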


Things to Keep in Mind

Evaluating Machine Learning Models


Value Normalization

• Dividing by 255 was an example of min-max normalization:

• value = (value – min(value)) / (max(value) – min(value))

• The max pixel value was 255 and the min pixel value was 0

• Alternatively, you can use center-and-scale normalization:

• value = (value – mean(value)) / standard_deviation(value)

[-1,1] is fine too

Consider removing outliers

Data Preprocessing, Feature Engineering, and Feature Learning
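
Both normalization schemes fit in a few lines. This is a sketch with assumed function names that operates on one feature at a time; it uses the population standard deviation, and neither function guards against a constant feature (which would divide by zero).

```python
from statistics import mean, pstdev

def min_max_normalize(values):
    """Rescale to [0, 1]: (value - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def center_and_scale(values):
    """Subtract the mean and divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

pixels = [0, 51, 255]
print(min_max_normalize(pixels))   # same as dividing by 255 when min is 0 and max is 255

standardized = center_and_scale([1.0, 2.0, 3.0])
print(standardized)
```

In practice the minimum, maximum, mean, and standard deviation would be computed on the training data only and then reused for validation and test data.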


Missing Values

• “In general, with neural networks, it’s safe to input missing values as 0, with the condition that zero isn’t a meaningful value”

• It’s possible to add indicator variables: 1 if missing; 0 otherwise

• If you expect missing values at test time, be sure to train with missing values:

• We train like we deploy and deploy like we train

Data Preprocessing, Feature Engineering, and Feature Learning
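
A minimal sketch of zero-filling combined with indicator variables. The function name and the use of None to mark a missing value are assumptions; the output row holds the filled features followed by one "was missing" flag per feature.

```python
def impute_with_indicator(rows, fill_value=0.0):
    """Replace None with fill_value and append a 1.0/0.0 missing flag per feature."""
    encoded = []
    for row in rows:
        values = [fill_value if v is None else float(v) for v in row]
        flags = [1.0 if v is None else 0.0 for v in row]
        encoded.append(values + flags)
    return encoded

rows = [[1.5, None], [None, 2]]
encoded = impute_with_indicator(rows)
for row in encoded:
    print(row)
```

The flags let the network distinguish a filled-in 0 from a genuine 0, which matters when zero is a meaningful value for that feature.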


Feature Engineering Example

Three different inputs for the “What time is it?” model …

Why no radius on the polar coordinates?

Data Preprocessing, Feature Engineering, and Feature Learning
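
For the clock example, the step from hand-tip coordinates to an angle can be sketched as below. The coordinate convention (clock center at the origin, y axis pointing up, angle measured clockwise from 12 o'clock) and the function names are assumptions.

```python
import math

def hand_angle(x, y):
    """Clockwise angle in radians from 12 o'clock for a hand tip at (x, y)."""
    return math.atan2(x, y) % (2 * math.pi)

def hour_from_angle(theta):
    """Convert the hour hand's angle to a position on the 12-hour dial."""
    return theta / (2 * math.pi) * 12

for tip in [(0, 1), (1, 0), (0, -1), (-1, 0)]:
    print(tip, hour_from_angle(hand_angle(*tip)))

# Doubling the hand length leaves the angle unchanged.
print(hand_angle(1, 0) == hand_angle(2, 0))
```

This also suggests an answer to the radius question: the length of a hand does not change with the time, so only the angle carries information.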


Feature Engineering

• Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks?

• No …

Data Preprocessing, Feature Engineering, and Feature Learning


Original versus Lower Capacity Model

Original Model: 16 units for each hidden layer

Lower Capacity Model: 4 units for each hidden layer

Overfitting and Underfitting


Original versus Lower Capacity Model

Smaller network starts overfitting later and its performance degrades more slowly

Overfitting and Underfitting


Original versus Higher Capacity Model: Validation Data

Validation Loss Noisier for Higher Capacity Model (512 versus 16 units for each hidden layer)

Overfitting and Underfitting


Original versus Higher Capacity Model: Training Data

More capacity gives a model the ability to more quickly model the training data, but it also makes it susceptible to overfitting

Overfitting and Underfitting


Regularization [for Smaller Weights]

Overfitting and Underfitting


Example for Effect of Weight Regularization

Note: the goal of weight regularization is to improve generalization performance ☺

Overfitting and Underfitting


Additional Weight Regularizers for Keras

Overfitting and Underfitting
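
Weight regularization adds a penalty on the size of the weights to the loss being minimized. The sketch below shows the L1 and L2 penalty terms in plain Python; the function names, rates, and weight values are assumptions for illustration, and in Keras the corresponding regularizers are l1, l2, and l1_l2 in keras.regularizers.

```python
def l1_penalty(weights, rate):
    """L1 regularization: rate times the sum of absolute weight values."""
    return rate * sum(abs(w) for w in weights)

def l2_penalty(weights, rate):
    """L2 regularization (weight decay): rate times the sum of squared weights."""
    return rate * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Total training loss: data loss plus both penalty terms."""
    return data_loss + l1_penalty(weights, l1) + l2_penalty(weights, l2)

weights = [1.0, -2.0]
print(l1_penalty(weights, 0.001))                  # about 0.003
print(l2_penalty(weights, 0.001))                  # about 0.005
print(regularized_loss(0.25, weights, l2=0.001))   # about 0.255
```

Because the penalty only applies during training, the training loss of a regularized model is expected to be higher than its loss at test time for the same predictions.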


Adding Dropout(dropoutRate)

Overfitting and Underfitting
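
What a dropout layer does to its inputs can be sketched in plain Python. This is the common "inverted dropout" formulation, where surviving activations are scaled up during training so nothing changes at test time; the function signature is an assumption and is not the Keras Dropout implementation.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # dropout is a no-op outside of training
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

layer_output = [1.0] * 10
print(dropout(layer_output, rate=0.5, rng=random.Random(0)))
print(dropout(layer_output, rate=0.5, training=False))
```

With a rate of 0.5, roughly half of the values become 0.0 and the rest become 2.0, so the expected sum of the layer's output stays the same.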


Adding Dropout to the IMDB Network

Overfitting and Underfitting


Recap of Most Common Ways to Prevent Overfitting

Overfitting and Underfitting


Define the Problem

Universal Workflow of Machine Learning


Hypothesis

Universal Workflow of Machine Learning


Choosing a Measure of Success

• Accuracy

• Precision and Recall

• Area Under the Receiver Operating Characteristic (ROC) Curve (AUC)

• Maximize Recall subject to a constraint on the False Positive Rate?

• Mean Average Precision
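As a rough illustration of the first three measures in the list above, the sketch below computes accuracy, precision, recall, and ROC AUC for a tiny binary problem. The labels, scores, and the 0.5 threshold are made-up values, and AUC is computed with the pairwise-ranking definition rather than by tracing the curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]               # hypothetical labels
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # hypothetical model scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
auc = roc_auc(y_true, scores)
print(accuracy, precision, recall, auc)
```

Accuracy, precision, and recall depend on the chosen threshold, while AUC depends only on how the scores rank the examples.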

Universal Workflow of Machine Learning

Page 101

Deciding on an Evaluation Protocol
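The body of this slide was lost in extraction. Assuming K-fold cross-validation is one of the protocols under consideration (alongside a simple hold-out validation set), a minimal sketch of how the index splits could be produced is shown below. The helper name `k_fold_splits` is hypothetical, and in practice the data would usually be shuffled first.

```python
def k_fold_splits(num_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= num_samples:
        raise ValueError("k must be between 2 and num_samples")
    indices = list(range(num_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [num_samples // k + (1 if i < num_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_splits(num_samples=10, k=3))
for train, validation in folds:
    print(len(train), validation)
```

Each sample is used for validation exactly once, and the reported score is typically the average of the k validation scores.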

Universal Workflow of Machine Learning

Page 102

Preparing Your Data

Universal Workflow of Machine Learning

Page 103

Key Choices for Your First Iteration

Universal Workflow of Machine Learning

Page 104

Choosing the Last-Layer Activation and Loss Function
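The table from this slide did not survive extraction. As a sketch of the commonly recommended pairings (sigmoid with binary cross-entropy for binary classification, softmax with categorical cross-entropy for single-label multiclass classification), the pure-Python functions below show what each activation and loss computes. The function names mirror the Keras string identifiers, but the implementations are illustrative only.

```python
import math


def sigmoid(z):
    """Squash a single logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_crossentropy(y_true, p):
    """Loss for one binary label (0 or 1) and a predicted probability p."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def categorical_crossentropy(one_hot, probs):
    """Loss for one one-hot label and a predicted probability vector."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# Binary classification: one sigmoid unit in the last layer.
binary_loss = binary_crossentropy(1, sigmoid(0.0))

# Single-label multiclass classification: one softmax unit per class.
class_probs = softmax([0.0, 0.0, 0.0])
multiclass_loss = categorical_crossentropy([0, 1, 0], class_probs)
print(binary_loss, multiclass_loss)
```

For regression to arbitrary values, the last layer would typically have no activation and be trained with mean squared error.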

Universal Workflow of Machine Learning

Page 105

How Big Should the Model Be?

Developing a model that overfits …

Universal Workflow of Machine Learning

Page 106

Regularizing the Model

Universal Workflow of Machine Learning

Page 107

Tuning the Model

We call these hyperparameters to distinguish them from the parameters of the model; i.e. the weights.

Note: We tune against validation data. As with a “private leaderboard”, we get only one look at test performance.

Universal Workflow of Machine Learning

Page 108

Hyperas: Keras + Hyperopt

Bayesian optimization [as supported by Hyperas]

• Start with random sets of hyperparameters for evaluation

• Iteratively select new hyperparameters for evaluation based on history

• Partition the hyperparameter sets into two groups:

• Sets where evaluated loss is lower than threshold tau

• Sets where evaluated loss is greater than or equal to threshold tau

• Hyperparameters can be viewed as having a hierarchy; e.g. we only need to optimize the number of hidden units for layer ‘k’ if we decide to add layer ‘k’

• Tree-structured Parzen Estimators (TPE) are used to select hyperparameter values based on Expected Improvement, which is driven by the ratio $\frac{p(x \mid \text{greaterThanOrEqual})}{p(x \mid \text{lessThan})}$; a smaller ratio means a larger Expected Improvement

Universal Workflow of Machine Learning

https://en.wikipedia.org/wiki/Kernel_density_estimation

https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
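The Parzen-estimator selection step described on this slide can be sketched for a single continuous hyperparameter as follows. This is a simplified illustration, not Hyperopt's implementation: the fixed Gaussian bandwidth, the `gamma` quantile used to set tau, the explicit candidate list, and the function names are all assumptions made for the example.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Gaussian kernel density estimate of p(x) from observed samples."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm


def pick_candidate(history, candidates, gamma=0.25, bandwidth=0.1):
    """Pick the candidate minimizing p(x | loss >= tau) / p(x | loss < tau)."""
    if len(history) < 2:
        raise ValueError("need at least two evaluated hyperparameter sets")
    ordered = sorted(history, key=lambda pair: pair[1])
    n_good = min(len(ordered) - 1, max(1, math.ceil(gamma * len(ordered))))
    good = [x for x, _ in ordered[:n_good]]  # lowest losses (below tau, up to ties)
    bad = [x for x, _ in ordered[n_good:]]   # remaining losses (at or above tau)
    eps = 1e-12                              # avoids division by zero far from the data

    def ratio(x):
        return (parzen_density(x, bad, bandwidth) + eps) / (
            parzen_density(x, good, bandwidth) + eps)

    return min(candidates, key=ratio)


# Hypothetical history of (hyperparameter value, validation loss) pairs.
history = [(x / 10, (x / 10 - 0.7) ** 2) for x in range(0, 11, 2)]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]
best = pick_candidate(history, candidates)
print(best)
```

In this toy history the low-loss evaluations cluster around 0.6 to 0.8, so the candidate 0.7 has the smallest density ratio and is chosen for the next evaluation.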

Page 109

Installing Hyperas and Running an Example

• pip install hyperas

• wget https://raw.githubusercontent.com/maxpumperla/hyperas/master/examples/mnist_readme.py

• python mnist_readme.py

model.add(Dropout({{uniform(0, 1)}}))
model.add(Dense({{choice([256, 512, 1024])}}))
model.add(Activation({{choice(['relu', 'sigmoid'])}}))
if {{choice(['three', 'four'])}} == 'four':
    model.add(Dense(100))
    model.add({{choice([Dropout(0.5), Activation('linear')])}})
optimizer={{choice(['rmsprop', 'adam', 'sgd'])}}
batch_size={{choice([64, 128])}}
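Each double-brace marker in the excerpt above names a distribution from which a hyperparameter value is drawn. As a rough sketch of what one draw from this search space looks like, the code below samples a configuration with Python's `random` module. It is plain random sampling for illustration, not how Hyperas or Hyperopt works internally, and the dictionary keys are hypothetical names.

```python
import random


def sample_configuration(rng):
    """Draw one hyperparameter set from the search space in the excerpt."""
    config = {
        "dropout_rate": rng.uniform(0.0, 1.0),
        "hidden_units": rng.choice([256, 512, 1024]),
        "activation": rng.choice(["relu", "sigmoid"]),
        "num_layers": rng.choice(["three", "four"]),
        "optimizer": rng.choice(["rmsprop", "adam", "sgd"]),
        "batch_size": rng.choice([64, 128]),
    }
    # Conditional hyperparameter: it only exists if the fourth layer is added.
    if config["num_layers"] == "four":
        config["extra_layer"] = rng.choice(["Dropout(0.5)", "Activation('linear')"])
    return config


rng = random.Random(42)
samples = [sample_configuration(rng) for _ in range(50)]
print(samples[0])
```

The `extra_layer` key shows the hierarchy among hyperparameters: the choice is made only for configurations that include the additional layer.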

Universal Workflow of Machine Learning

Page 110

Chapter Summary

