Nakul Gopalan
Georgia Tech
Neural Networks
Introduction
Machine Learning CS 4641
These slides are from Vivek Srikumar, Mahdi Roozbahani and Chao Zhang.
Outline
• Perceptron
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backpropagation
Linear Classifiers
These slides are from Vivek Srikumar
Perceptron
These slides are from Vivek Srikumar
Perceptron Algorithm
These slides are from Vivek Srikumar
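The slide body for the perceptron algorithm did not survive extraction, so the following is only a minimal sketch of the standard mistake-driven perceptron update, not necessarily the exact formulation on the slide. The names `predict`, `train_perceptron`, `lr`, and the toy AND dataset are assumptions made for illustration.

```python
def predict(w, b, x):
    """Linear threshold unit: +1 if w.x + b >= 0, otherwise -1."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1


def train_perceptron(data, epochs=20, lr=1.0):
    """Mistake-driven updates on (x, y) pairs with labels y in {-1, +1}."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            if predict(w, b, x) != y:
                # On a mistake, move the hyperplane toward the example.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


# Hypothetical toy data: logical AND, which is linearly separable.
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w, b = train_perceptron(and_data)
predictions = [predict(w, b, x) for x, _ in and_data]
print(predictions)
```

Because the data is linearly separable, the updates stop once every example is classified correctly; on non-separable data this loop would simply run for all `epochs`.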
Outline
• Perceptron
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backpropagation
Linear Threshold Unit
Features for Linear Threshold Unit
Features from Classifiers
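As a hedged illustration of why stacking linear threshold units helps, the sketch below uses the outputs of two threshold classifiers as features for a third one, which lets the stack compute XOR (something no single linear threshold unit can do). The weights and the names `step`, `ltu`, and `xor_net` are hand-picked assumptions, not values taken from the slides.

```python
def step(z):
    """Threshold activation: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


def ltu(weights, bias, inputs):
    """A single linear threshold unit."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_net(x1, x2):
    """Two hidden threshold units feed a third threshold unit."""
    h_or = ltu([1.0, 1.0], -0.5, [x1, x2])    # fires if at least one input is 1
    h_and = ltu([1.0, 1.0], -1.5, [x1, x2])   # fires only if both inputs are 1
    # Output fires when OR is on and AND is off.
    return ltu([1.0, -1.0], -0.5, [h_or, h_and])


xor_table = {(a, b): xor_net(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```

The hidden units act as learned features: in the (OR, AND) feature space the XOR labels become linearly separable.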
Outline
• Perceptron
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backpropagation
Neural Networks
Inspiration from Biological Neurons
Artificial Neurons
Activation Functions
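The list of activation functions on this slide was lost in extraction. As a rough sketch, here are four commonly used choices; which of them the slide actually covered is an assumption.

```python
import math


def threshold(z):
    """Hard threshold, as used by a linear threshold unit."""
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    """Logistic sigmoid: a smooth, differentiable squashing to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Hyperbolic tangent: squashes to (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, z)


for name, fn in [("threshold", threshold), ("sigmoid", sigmoid),
                 ("tanh", tanh), ("relu", relu)]:
    print(name, [round(fn(z), 3) for z in (-2.0, 0.0, 2.0)])
```

The threshold function has zero gradient almost everywhere, which is one reason gradient-based training typically uses the smooth or piecewise-linear alternatives.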
Neural Network
A Brief History of Neural Networks
Outline
• Perceptron
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backpropagation
A Single Neuron with Threshold Activation
Two Layers with Threshold Activation
Figure from [Shai Shalev-Shwartz and Shai Ben-David, 2014]
Three Layers with Threshold Activation
Figure from [Shai Shalev-Shwartz and Shai Ben-David, 2014]
NNs are Universal Function Approximators
Outline
• Perceptron
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backpropagation
Predicting with Neural Networks
Predicting with Neural Nets: The Forward Pass
The Forward Pass
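The forward pass can be sketched as repeatedly applying "weighted sum, add bias, apply activation" layer by layer. The network shape, the weights, and the names `layer`, `forward`, and `params` below are hypothetical; the worked example on the slides is not recoverable from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


def layer(W, b, inputs, activation):
    """One layer: activation(W . inputs + b), with W given as a list of rows."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + bi)
            for row, bi in zip(W, b)]


def forward(x, params):
    """Feed x through each (W, b, activation) triple in order."""
    h = x
    for W, b, activation in params:
        h = layer(W, b, h, activation)
    return h


# Hypothetical 2-2-1 network: sigmoid hidden layer, linear output.
params = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -0.5], sigmoid),
    ([[1.0, -2.0]], [0.25], identity),
]
y = forward([1.0, 1.0], params)
print(y)
```

Each layer only needs the outputs of the layer before it, so prediction is a single left-to-right sweep through the network.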
Outline
• Perceptron
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backpropagation
Backpropagation
• Smarter chain rule for derivatives
• What is the chain rule?
Univariate case:
Multivariate case (Partial derivatives):
These slides are from Roger Grosse
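The formulas for the two chain-rule cases were lost in extraction. For reference, the standard statements are given below; the symbols $f$, $g$, $x$, $y$, and $t$ are chosen here for illustration and may differ from the notation on the original slide.

```latex
% Univariate case: f and g are functions of a single variable.
\frac{d}{dt} f\bigl(g(t)\bigr) = f'\bigl(g(t)\bigr)\, g'(t)

% Multivariate case: f depends on t through both x(t) and y(t).
\frac{d}{dt} f\bigl(x(t), y(t)\bigr)
  = \frac{\partial f}{\partial x}\frac{dx}{dt}
  + \frac{\partial f}{\partial y}\frac{dy}{dt}
```

In the multivariate case the contributions along every path from $t$ to $f$ are summed, which is the pattern backpropagation exploits when a value feeds into several later computations.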
Backpropagation
These slides are from Matt Gormley
Backpropagation
$\frac{1}{2}(y - y^*)^2$
$(y - y^*)$
These slides are from Matt Gormley
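To make the bookkeeping concrete, here is a small sketch of backpropagation for the squared loss $\frac{1}{2}(y - y^*)^2$, whose derivative with respect to $y$ is $(y - y^*)$. The one-hidden-unit model, the parameter names, and the finite-difference check are assumptions added for illustration; they are not the network used on the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(params, x, y_star):
    """Squared loss of the model y = w2 * sigmoid(w1 * x + b1) + b2."""
    w1, b1, w2, b2 = params
    y = w2 * sigmoid(w1 * x + b1) + b2
    return 0.5 * (y - y_star) ** 2


def backprop(params, x, y_star):
    """Gradient of the loss w.r.t. [w1, b1, w2, b2] via the chain rule."""
    w1, b1, w2, b2 = params
    # Forward pass, keeping the intermediate values.
    z = w1 * x + b1
    h = sigmoid(z)
    y = w2 * h + b2
    # Backward pass: each line reuses the derivative computed just before it.
    dy = y - y_star            # dL/dy
    dw2, db2 = dy * h, dy      # dL/dw2, dL/db2
    dh = dy * w2               # dL/dh
    dz = dh * h * (1.0 - h)    # dL/dz, using sigmoid'(z) = h * (1 - h)
    dw1, db1 = dz * x, dz      # dL/dw1, dL/db1
    return [dw1, db1, dw2, db2]


def numeric_grad(params, x, y_star, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    grads = []
    for i in range(len(params)):
        up, down = list(params), list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, y_star) - loss(down, x, y_star)) / (2 * eps))
    return grads


params = [0.3, -0.1, 0.8, 0.2]   # hypothetical parameter values
analytic = backprop(params, x=1.5, y_star=1.0)
numeric = numeric_grad(params, x=1.5, y_star=1.0)
print(analytic)
print(numeric)
```

The backward pass visits the computation in reverse order and reuses `dy`, `dh`, and `dz`, so each intermediate derivative is computed once rather than re-derived for every parameter.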
Backpropagation
• Backprop is used to train the overwhelming majority of neural nets today.
Even optimization algorithms much fancier than gradient descent (e.g. second-order methods) use backprop to compute the gradients.
• Despite its practical success, backprop is believed to be neurally implausible. No evidence for biological signals analogous to error derivatives.
All the biologically plausible alternatives we know about learn much more slowly (on computers). So how on earth does the brain learn?
Slide from Roger Grosse
Take-Home Messages
• Stacking Linear Threshold Units
• Neural Networks
• Expressivity of Neural Networks
• Predicting with Neural Networks
• Backprop is the chain rule with some bookkeeping