Deep Learning Platforms
• Caffe - Berkeley Vision and Learning Center
• AlexNet - U Toronto
• cuda-convnet - fork of the AlexNet code
• ConvNetJS - runs in your browser
• CaffeOnSpark - Yahoo Labs
• TensorFlow - Google
• Neuromorphic processor (IBM), DaDianNao
Outline
• Neural networks (NNs)
  • Overview
  • Brain analogy
  • How NNs work
  • Parallel NN computation
• Convolutional Neural Networks (ConvNets)
  • Layers
• Assignment
Neural Network Mechanism
Example: a computational graph with one add node (q = x + y) and one multiply node (f = qz)

f = (x + y)z = qz, where q = x + y

gradients (influence):
  df/dq = z    df/dz = q
  df/dx = df/dq * dq/dx = z
  df/dy = df/dq * dq/dy = z

Worked example from the figure: x = -2, y = 5, z = -4 gives q = 3 and f = -12.
Backward pass: df/df = 1, df/dz = q = 3, df/dq = z = -4, df/dx = df/dy = z = -4.
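To make the flow concrete, here is a minimal sketch of the forward and backward passes on this toy graph, using the figure's values x = -2, y = 5, z = -4 (plain Python, no framework assumed):

# forward pass through the graph f = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass (chain rule), starting from df/df = 1
df_df = 1.0
df_dq = z * df_df    # -4
df_dz = q * df_df    #  3
df_dx = df_dq * 1.0  # dq/dx = 1, so -4
df_dy = df_dq * 1.0  # dq/dy = 1, so -4

print(f, df_dx, df_dy, df_dz)  # -12.0 -4.0 -4.0 3.0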
Neural Network Mechanism
• Forward propagation calculates the values at each node
  • each node/neuron/unit computes a linear combination of its inputs: w1x1 + w2x2 + … + wkxk
  • an activation function, e.g. sigmoid, is then applied to that result
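As a minimal sketch of what one unit computes during forward propagation (the weights, inputs, and bias below are illustrative, and NumPy is assumed):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unit_forward(w, x, b=0.0):
    # w1*x1 + w2*x2 + ... + wk*xk, then the activation
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.5, -1.0, 0.25])
x = np.array([1.0, 2.0, -4.0])
print(unit_forward(w, x))  # sigmoid(0.5 - 2.0 - 1.0) = sigmoid(-2.5)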
Neural Network Mechanism
• Backward propagation
  • the error/cost/loss is measured at the output layer
  • the gradient of the loss with respect to each weight is calculated
  • the gradient and a learning rate are used to update the weights, as in gradient descent
  • regularization is also used
• Forward and backward propagation run in alternating passes over many iterations
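A minimal sketch of one such update for a single sigmoid unit; the squared-error loss, learning rate, and L2 regularization strength are illustrative assumptions, not values from the slides:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sgd_step(w, x, target, learning_rate=0.1, reg=0.01):
    out = sigmoid(np.dot(w, x))          # forward pass
    err = out - target                   # dLoss/dout for 0.5*(out - target)^2
    grad = err * out * (1.0 - out) * x   # chain rule through sigmoid and dot product
    grad += reg * w                      # L2 regularization term (assumed form)
    return w - learning_rate * grad      # gradient-descent weight update

w = np.array([0.1, -0.2, 0.3])
x = np.array([1.0, 0.5, -1.5])
for _ in range(100):                     # alternating forward/backward passes
    w = sgd_step(w, x, target=1.0)
print(sigmoid(np.dot(w, x)))             # output moves toward the target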
Parallelism in NN computation
• Embarrassingly parallel
  • CPUs: cores, vectors
  • GPUs
  • distributed machines
• Many levels of parallelism
  • process training examples in parallel (similar to stochastic gradient descent); see the sketch after this list
  • pipeline parallelism across layers
  • unit/neuron parallelism within a layer
  • weight-multiplication parallelism within a neuron
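As one concrete instance of the example-level parallelism above, a minimal NumPy sketch that pushes a whole batch through a layer as a single matrix product (shapes are illustrative); the underlying BLAS library can then spread the work across cores and vector units:

import numpy as np

batch = np.random.randn(64, 100)    # 64 training examples, 100 features each
weights = np.random.randn(100, 10)  # one layer with 10 units

# all 64 examples flow through the layer in parallel as one matrix product
activations = 1.0 / (1.0 + np.exp(-(batch @ weights)))
print(activations.shape)            # (64, 10)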
Convolution
• output is a dot product of the kernel and the input patch: w1x1 + w2x2 + … + wkxk
• each kernel looks for some feature
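A minimal sketch of that operation: slide the kernel over the input and take a dot product at each position (stride 1, no padding; the kernel below is an illustrative feature detector, and NumPy is assumed):

import numpy as np

def conv2d(image, kernel):
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i+kh, j:j+kw]
            out[i, j] = np.sum(patch * kernel)  # w1x1 + w2x2 + ... + wkxk
    return out

image = np.arange(25, dtype=float).reshape(5, 5)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # responds to vertical edges
print(conv2d(image, edge_kernel))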
ConvNet Layers
• Convolution
• Activation, e.g. sigmoid, ReLU, etc.
• Pooling: downsampling (see the sketch after this list)
  • e.g. 2x2 -> 1x1
  • take the max value in each 2x2 block
• Usually convolution and activation layers come in pairs, interspersed with pooling layers
• The task is to learn the weights in the convolution layers
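A minimal sketch of the 2x2 max-pooling step, assuming NumPy and even input dimensions:

import numpy as np

def max_pool_2x2(x):
    # each 2x2 block collapses to its maximum, halving each spatial dimension
    H, W = x.shape
    assert H % 2 == 0 and W % 2 == 0, "this illustrative version needs even dims"
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(x))  # 4x4 -> 2x2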
Applying ConvNets
• What to do when there is:
• High bias => underfit NN => make NN bigger
• High variance => overfit NN => use more training examples