@nfmcclure
Introduction to Neural Networks with Tensorflow
Nick McClure
July 27th, 2016
Seattle, WA
Who am I?
• ddd
www.github.com/nfmcclure/tensorflow_cookbook
Why Does Any of This Matter?
Why Neural Networks?
• The human brain can be considered to be one of the best processors. (Estimated to contain ~10^11 neurons.)
• Studies show that our brain can process the equivalent of ~20Mb/sec just through the optic nerves.
• If we can copy this design, maybe we can solve the “hard for a computer – easy for humans” problems.
  • Speech recognition
  • Facial identification
  • Reading emotions
  • Recognizing images
  • Sentiment analysis
  • Driving a vehicle
  • Disease diagnosis
The Basic Unit: Operational Gate
• F(x,y) = x * y
• How can we change the inputs to decrease the output?
• The intuitive answer is to decrease 3 and/or decrease 4.
• Mathematically, the answer is to use the derivative…
  • We know if we increase the (3) input by 1, the output goes up by a total of 4. (4*4=16)
[Figure: a multiplication gate with inputs 3 and 4 and output 12.]
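The gate above can be sketched in a few lines of plain Python. This is only an illustration: the function names and the step size of 0.01 are assumptions, not part of the talk. It uses the derivatives dF/dx = y and dF/dy = x to nudge both inputs so that the output decreases.

```python
def multiply_gate(x, y):
    """Forward pass of the gate F(x, y) = x * y."""
    return x * y


def multiply_gate_grad(x, y):
    """Analytic partial derivatives: dF/dx = y and dF/dy = x."""
    return y, x


x, y = 3.0, 4.0
step = 0.01  # assumed step size, chosen only for illustration

dF_dx, dF_dy = multiply_gate_grad(x, y)

# Move each input a small step against its derivative to decrease the output.
new_x = x - step * dF_dx
new_y = y - step * dF_dy

before = multiply_gate(x, y)
after = multiply_gate(new_x, new_y)
print(before, after)
```

The derivative with respect to x is 4, which matches the slide: raising the input 3 by 1 raises the output by 4.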
Multiple Gates
• F(x, y, z) = (x + y) * z
• If G(a, b) = a + b, then F(x,y,z) = G(x, y) * z
• Or (change notation) F = G*z (we have solved this problem before)
• Mathematically, we will use the “chain rule” to get the derivatives:
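[Figure: gate diagram in which inputs 2 and 3 feed the + gate, its result and the input 4 feed the * gate, and the output is 20.]
$\frac{dF}{dx} = \frac{dF}{dG} \cdot \frac{dG}{dx}$
$\frac{dF}{dx} = z \cdot 1$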
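As a rough sketch of the chain rule for F(x, y, z) = (x + y) * z, the forward and backward passes can be written out by hand. The function and variable names are illustrative assumptions; the input values 2, 3, and 4 are the ones shown in the diagram.

```python
def forward_backward(x, y, z):
    """Return F = (x + y) * z and its derivatives with respect to x, y, z."""
    # Forward pass through the two gates.
    g = x + y        # G(x, y) = x + y
    f = g * z        # F = G * z

    # Backward pass: local derivatives of each gate.
    dF_dG = z        # multiply gate, with respect to G
    dF_dz = g        # multiply gate, with respect to z
    dG_dx = 1.0      # add gate
    dG_dy = 1.0      # add gate

    # Chain rule: dF/dx = dF/dG * dG/dx, and likewise for y.
    dF_dx = dF_dG * dG_dx
    dF_dy = dF_dG * dG_dy
    return f, (dF_dx, dF_dy, dF_dz)


f, grads = forward_backward(2.0, 3.0, 4.0)
print(f, grads)
```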
Loss Functions
• You can’t learn unless you know how bad/good your past results have been.
• This is measured by a ‘loss function’.
• The idea is we have a labeled data set and we look at each data point. We run the data point forward through the network and then see if we need to increase or decrease the output.
• We then change the parameters according to the derivatives in the network.
• We loop through this many times until we reach the lowest error in our training.
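The loop described above might look like the following in plain Python. Everything concrete here is an assumption for illustration: the toy data set (points on y = 2x), the one-parameter model, the squared-error loss, and the step size.

```python
# Hypothetical labeled data set, generated from y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

a = 0.0                # single model parameter; the model is prediction = a * x
learning_rate = 0.05   # assumed step size


def mean_loss(a):
    """Mean squared error of the model over the whole data set."""
    return sum((a * x - y) ** 2 for x, y in data) / len(data)


initial_loss = mean_loss(a)

for epoch in range(200):              # loop through the data many times
    for x, y in data:                 # look at each data point
        prediction = a * x            # run the point forward through the model
        error = prediction - y        # positive: output too high, negative: too low
        grad_a = 2.0 * error * x      # derivative of (a * x - y) ** 2 with respect to a
        a -= learning_rate * grad_a   # change the parameter according to the derivative

final_loss = mean_loss(a)
print(a, initial_loss, final_loss)
```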
Learning Rate
• The learning rate controls how much of a change we make to our model parameters.
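To make the effect concrete, here is a small sketch that runs gradient descent on the toy loss w² with three different learning rates. The loss, the starting point, and the three rates are all assumptions picked for illustration.

```python
def minimize(learning_rate, steps=20, start=5.0):
    """Run gradient descent on loss(w) = w ** 2, whose derivative is 2 * w."""
    w = start
    for _ in range(steps):
        w -= learning_rate * 2.0 * w
    return w


small = minimize(0.01)  # small steps: still far from the minimum at 0
good = minimize(0.4)    # moderate steps: very close to 0
large = minimize(1.1)   # steps too large: overshoots and moves away from 0

print(small, good, large)
```

Too small a rate makes slow progress, while too large a rate makes each update overshoot the minimum.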
Logistic Regression as a Neural Network
• A neural network is very similar to what we have been doing, except that we bound the outputs between 0 and 1 with a sigmoid function. (Called an activation function)
  $F(x, a, b) = \sigma(ax + b), \quad \sigma(x) = \frac{1}{1 + e^{-x}}$
• The sigmoid function is nice, because it turns out that the derivative is related to itself:
  $\frac{d\sigma}{dx} = \sigma(x) \cdot (1 - \sigma(x))$
• We can use this fact in our ‘chain rule’ for derivatives.
[Figure: two equivalent diagrams of the model. In one, x passes through a multiply-by-a gate and an add-b gate to the output; in the other, inputs x and 1 connect to the output with weights a and b.]
https://github.com/nfmcclure/tensorflow_cookbook/tree/master/03_Linear_Regression/08_Implementing_Logistic_Regression
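A short plain-Python sketch of the sigmoid, its derivative, and the chain rule for F(x, a, b) = σ(ax + b). The helper names and the finite-difference check are illustrative assumptions and are not taken from the linked cookbook code.

```python
import math


def sigmoid(x):
    """sigma(x) = 1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """d(sigma)/dx = sigma(x) * (1 - sigma(x))"""
    s = sigmoid(x)
    return s * (1.0 - s)


def logistic_forward(x, a, b):
    """F(x, a, b) = sigma(a * x + b)"""
    return sigmoid(a * x + b)


def logistic_grads(x, a, b):
    """Chain rule: dF/da = sigma'(a*x + b) * x and dF/db = sigma'(a*x + b) * 1."""
    g = sigmoid_grad(a * x + b)
    return g * x, g


def numeric_grad(f, x, eps=1e-6):
    """Central finite difference, used only to sanity-check the formula."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


print(sigmoid(0.0), sigmoid_grad(0.0))
print(logistic_grads(2.0, 0.5, -1.0))
```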
Why Activation Functions?
• Turns out, no matter how many additions and multiplications we pile on, we will never be able to model non-linear outputs without having non-linear transformations.
• The sigmoid function is just one example of a gate function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
• There is also the Rectified Linear Unit (ReLU): $\sigma(x) = \max(x, 0)$
• Softplus: $\sigma(x) = \ln(1 + e^{x})$
• Leaky ReLU: $\sigma(x) = \max(x, ax)$
  • Where $0 < a \ll 1$
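The formulas above translate directly into plain Python. This is a minimal sketch; the default slope a = 0.01 for the leaky ReLU is an assumed value that satisfies 0 < a << 1.

```python
import math


def relu(x):
    """Rectified Linear Unit: max(x, 0)."""
    return max(x, 0.0)


def softplus(x):
    """Softplus: ln(1 + e^x), a smooth approximation of ReLU."""
    return math.log(1.0 + math.exp(x))


def leaky_relu(x, a=0.01):
    """Leaky ReLU: max(x, a * x), with a small positive slope a for x < 0."""
    return max(x, a * x)


for value in (-2.0, 0.0, 3.0):
    print(value, relu(value), softplus(value), leaky_relu(value))
```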
Many Layers
• Neural networks can have many ‘hidden layers’.
• We can make them as deep as we want.
• Neural networks can also have as many inputs or outputs as we would like.
[Figure: fully connected layers, with the depth of the neural network marked.]
What is Tensorflow?
• An open source framework for creating computational graphs.
• Computational graphs are actually extremely flexible.
  • Linear Regression (regular, multiple, logistic, lasso, ridge, elasticnet,…)
  • Support Vector Machines (any kernel)
  • Nearest Neighbor methods
  • Neural Networks
  • ODE/PDE Solvers
Why Tensorflow?
• Compare to other open source deep learning frameworks
[Figure: chart comparing frameworks by popularity and speed/size; labels include Caffe and others.]
Why Tensorflow?
• Tensorflow is also very portable and flexible!
  • Virtually no code change to go from CPU -> GPU or…
How is the Tensorflow Community?
• Very active GitHub community
• In approximately 6 months: >5k commits, >250 contributors
• Over 2,000 questions tagged on Stack Overflow (>50% answered within 24 hours)
How do Tensorflow Algorithms Work?
[Figure: diagram with the labels Computational Graph, Model Variables, Output, and Loss Function.]
A One Hidden Layer NN in Tensorflow
• Iris dataset: Predict petal width from (sepal length, sepal width, and petal length)
[Figure: network with three inputs (S.L., S.W., P.L.) plus a bias, five hidden nodes (h1 to h5) plus a bias, and one output. The diagram also labels the placeholders and the computational graph.]
• 15 weights + 5 biases + 5 weights + 1 bias = 26 variables
https://github.com/nfmcclure/tensorflow_cookbook/tree/master/06_Neural_Networks/04_Single_Hidden_Layer_Network
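To show where the 26 variables come from, here is a forward pass of the same 3-5-1 shape in plain Python rather than Tensorflow. It is a sketch only: the random initial values, the ReLU activation on the hidden layer, and the sample input are assumptions and are not taken from the linked code.

```python
import random

random.seed(0)

n_inputs, n_hidden, n_outputs = 3, 5, 1

# Hidden layer: 3 x 5 = 15 weights and 5 biases.
W1 = [[random.gauss(0.0, 1.0) for _ in range(n_hidden)] for _ in range(n_inputs)]
b1 = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]

# Output layer: 5 x 1 = 5 weights and 1 bias.
W2 = [random.gauss(0.0, 1.0) for _ in range(n_hidden)]
b2 = random.gauss(0.0, 1.0)


def relu(value):
    return max(value, 0.0)


def predict(features):
    """Forward pass: inputs -> hidden layer (ReLU assumed) -> single output."""
    hidden = [
        relu(sum(features[i] * W1[i][j] for i in range(n_inputs)) + b1[j])
        for j in range(n_hidden)
    ]
    return sum(hidden[j] * W2[j] for j in range(n_hidden)) + b2


n_variables = n_inputs * n_hidden + n_hidden + n_hidden * n_outputs + n_outputs
print(n_variables)

# Hypothetical input: sepal length, sepal width, petal length.
print(predict([5.1, 3.5, 1.4]))
```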
More in Tensorflow: Distributed
• Computational graphs can take very long to train.
• Quicker training across multiple machines.
• Just need a dictionary mapping jobs to network addresses:
import tensorflow as tf

# Map each job name to the network addresses of its tasks.
job_spec = {"my_workers": ["worker1.location.com:22", "worker2.location.com:22"],
            "other_workers": ["gpu_worker.location.com:22"]}
cluster = tf.train.ClusterSpec(job_spec)
with tf.device("/job:my_workers/task:0"):
    pass  # do task
with tf.device("/job:other_workers/task:0"):
    pass  # do another task
• Scheduling and communication are done for you (in an efficient and greedy manner)
More in Tensorflow: GPU Capabilities
• Sometimes neural networks can have hundreds of millions of parameters to train.
• Graphics cards (GPUs) are built to handle matrix calculations very fast and efficiently.
  • It’s quite common to get 10x to 50x speedup when switching the same code to a GPU
(1) Install GPU version of T.F.
(2) Done.
As a side note, you can also tell T.F. to use specific devices.
/cpu:0 - The CPU of the machine.
/gpu:0 - The GPU of the machine.
/gpu:100 - The GPU at index 100 of the machine (indices start at 0).
More in Tensorflow: Tensorboard
• A built in way to visualize the computational graph, accuracies, loss functions, gradients…
• (1) Specify a specific way to write and store summaries to a log file.
• (2) Point Tensorboard at logging directory, and navigate to 0.0.0.0:6006. (Can view during training).
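# Summary ops for the values we want to track.
tf.histogram_summary('variable_histogram', variable_value)
tf.scalar_summary('loss_value', loss)
# Merge all summaries into one op and write them to the logging directory.
merged = tf.merge_all_summaries()
my_writer = tf.train.SummaryWriter('my_logging_directory', sess.graph)
$ tensorboard --logdir=my_logging_directory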
More in Tensorflow: Tensorboard
Collapsible/Expandable Named Scopes!!
with tf.name_scope('hidden1') as hidden_scope:
    pass  # Perform calculations.
More in Tensorflow: skflow
• Easier scikit-learn type implementation of common algorithms.
import tensorflow.contrib.learn as skflow
• 3 layer NN:
my_model = skflow.TensorFlowDNNClassifier(hidden_units=[N1, N2, N3], n_classes=n_out)
my_model.fit(x_data, y_target)
• RNN:
my_rnn = skflow.TensorFlowRNNClassifier(…)
• Easy to add custom layers by adding in layer functions from vanilla Tensorflow.
What is a Convolutional Neural Network?
• A CNN is a large pattern recognition neural network that uses various parameter reducing techniques.
• E.g. Alexnet (2012 ImageNet Winner)
Reduction of Parameters
• CNNs use tricks to reduce the number of parameters to fit.
  • Convolution layers: Assume weights are the same on arrows in the same direction of a window we move across data.
[Figure: a convolution window with weights w1, w2, and w3.] There are only three weights in this layer, w1, w2, and w3.
Note: Usually multiple convolutions are performed, creating ‘Layers’.
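A one-dimensional sketch of the weight sharing described above, in plain Python. The data values and the three weights are made up for illustration, and the window moves one position at a time with no padding (both assumptions).

```python
def conv1d(values, weights):
    """Slide one shared set of weights across the data (stride 1, no padding)."""
    width = len(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + width]))
        for i in range(len(values) - width + 1)
    ]


data = [1.0, 2.0, 3.0, 4.0, 5.0]
shared_weights = [0.5, 0.0, -0.5]  # w1, w2, w3: the only parameters in this layer

outputs = conv1d(data, shared_weights)
print(outputs)

# A fully connected layer from 5 inputs to the same 3 outputs would need 15 weights.
fully_connected_weights = len(data) * len(outputs)
print(len(shared_weights), fully_connected_weights)
```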
Other Tricks - Pooling
• Layer that is an aggregation across a window from prior layer.
• Types of pooling:
  • Max, min, mean, median, …
[Figure: aggregation over three nodes in prior layer, e.g. Max().]
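Pooling over a window of three nodes could be sketched as follows. The helper name, the sample values, and the default of non-overlapping windows are assumptions made for the example.

```python
def pool1d(values, window, aggregate=max, stride=None):
    """Aggregate each window of the prior layer; windows do not overlap by default."""
    stride = stride or window
    return [
        aggregate(values[i:i + window])
        for i in range(0, len(values) - window + 1, stride)
    ]


def mean(window_values):
    return sum(window_values) / len(window_values)


prior_layer = [1.0, 5.0, 2.0, 8.0, 3.0, 4.0]

max_pooled = pool1d(prior_layer, 3)                   # max over each window of three
mean_pooled = pool1d(prior_layer, 3, aggregate=mean)  # mean over each window of three
overlapping = pool1d(prior_layer, 3, stride=1)        # window moved one node at a time

print(max_pooled, mean_pooled, overlapping)
```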
Other Tricks - Dropout
• Dropout is a way to make training more robust and regularize the parameters.
• During training, we randomly select activations and set their outputs to zero.
  • Compare this to ‘zoneout’, which randomly replaces activations with their values from the prior iteration.
• This helps the network find multiple pathways for prediction. Because of this, the network will not rely on just one connection or edge for prediction.
• This also helps the training of other parameters in the network.
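A minimal dropout sketch in plain Python. The rescaling of the surviving activations by 1 / keep probability (often called inverted dropout) is an assumption that goes beyond the slide; it keeps the expected output unchanged. The function name and the drop probability of 0.5 are also illustrative.

```python
import random


def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob during training.

    Survivors are scaled by 1 / keep_prob (an assumed convention) so that the
    expected value of each output matches the input.
    """
    keep_prob = 1.0 - drop_prob
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]


rng = random.Random(0)
layer_output = [1.0] * 1000
dropped = dropout(layer_output, 0.5, rng)

zero_fraction = dropped.count(0.0) / len(dropped)
print(zero_fraction)
```

At prediction time dropout is normally switched off, which here corresponds to `drop_prob=0.0`.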
Larger Network Shorthand
• Since neural networks can get quite large, we do not want to write out all the connections and nodes.
• Here’s how to convey the meaning in a shorter, less complicated way:
[Figure: shorthand network diagram. Annotation: “Max of a set of features”, e.g., here it appears to be a max of 2 features.]
What is a Recurrent Neural Network?
• Recurrent Neural Networks (RNN) are networks that can ‘see’ outputs from prior layers (earlier steps in the sequence). Very helpful for sequences.
• A common usage is to predict the next letter from the prior letters. We train this network on large text samples.
[Figure: RNN diagram. U, V, W = weight parameters; S = layer; X = input; O = output.]
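One possible reading of the diagram's legend, sketched with scalars in plain Python: U weights the input, W weights the previous state, and V maps the state to the output. The tanh activation, the scalar sizes, and the exact update rule are assumptions, since the slide only names the symbols.

```python
import math


def rnn_forward(inputs, U, W, V, initial_state=0.0):
    """Run a scalar recurrent layer over a sequence.

    Assumed update rule: s_t = tanh(U * x_t + W * s_(t-1)) and o_t = V * s_t.
    """
    state = initial_state
    outputs = []
    for x in inputs:
        state = math.tanh(U * x + W * state)  # new state sees the previous state
        outputs.append(V * state)
    return outputs


# The same input value gives different outputs depending on what came before it.
with_memory = rnn_forward([1.0, 1.0], U=0.5, W=0.8, V=1.0)
without_memory = rnn_forward([1.0, 1.0], U=0.5, W=0.0, V=1.0)
print(with_memory, without_memory)
```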
Regional Convolutional Networks
• We can convolve a whole network over patches of the image and see what regions score highly for a category. (Object Detection)
https://www.youtube.com/watch?v=lr1qVxhGUDw
Regional CNN + Recurrent NN = Image Captioning
• We take the object detection output and feed it through a recurrent neural network to generate text describing the image.
Microsoft Research 2015
Deep Dream (2015)
• We can select specific nodes in an image recognition pipeline and upweight the output…
https://www.youtube.com/watch?v=DgPaCWJL7XI
http://www.fastcodesign.com/3057368/inside-googles-first-deepdream-art-show
Stylenet (2015)
• Train network on a “style image”. Try to reconstruct another image from that same pretrained network.
https://vimeo.com/139123754
Recoloring of black and white photos
CNN for Text (2015)
• Zhang, X., et al. released a paper (http://arxiv.org/abs/1509.01626) that showed great results for treating strings as 1-D numerical vectors.
• Ways to get strings into numerical vectors:
  • Word level one-hot encoding.
  • Word2Vec encoding or similar (GloVe or Doc2Vec or …)
  • Character level one-hot encoding.
• Can’t resize vectors like pictures, so instead we pick a max length and pad many entries with zeros.
• Like many NN results, these depend heavily on dataset and architecture.
• Paper shows good results for low # of classes (ratings or sentiment).
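Character level one-hot encoding with zero padding might be sketched like this. The alphabet (lowercase letters plus space), the max length of 12, and the function name are assumptions for the example and are not the settings used in the paper.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz "  # assumed alphabet, for illustration only
CHAR_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode(text, max_length):
    """Return max_length rows, each a one-hot vector over ALPHABET.

    Text longer than max_length is truncated, shorter text is padded with
    all-zero rows, and characters outside the alphabet become all-zero rows.
    """
    rows = []
    for ch in text.lower()[:max_length]:
        row = [0] * len(ALPHABET)
        if ch in CHAR_INDEX:
            row[CHAR_INDEX[ch]] = 1
        rows.append(row)
    while len(rows) < max_length:
        rows.append([0] * len(ALPHABET))  # zero padding up to the max length
    return rows


encoded = encode("Good film", 12)
print(len(encoded), len(encoded[0]))
```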
Parsey McParseface
• Google released a model called ‘Syntaxnet’ in May 2016, based on Tensorflow.
• Syntaxnet is arguably one of the best part-of-speech parsers available.
• They trained an English version, called ‘Parsey McParseface’.
• This is a free and open source model to use.
The Possibilities are Endless
• Predict age from facial photo - https://how-old.net
• Predict dog breed from photo - https://www.what-dog.net
• Inside a self driving car brain : https://www.youtube.com/watch?v=ZJMtDRbqH40
• Memory networks: https://arxiv.org/pdf/1410.3916.pdf
  • Frodo journeyed to Mount-Doom. Frodo dropped the ring there. Sauron died.
  • Frodo went back to the Shire. Bilbo travelled to the Grey-havens.
  • Where is the ring? A: Mount-Doom
  • Where is Bilbo now? A: Grey-havens
• “Synthia” - Virtual driving school for self driving cars:
  • http://www.gizmag.com/synthia-dataset-self-driving-cars/43895/
• Solving hand-written chalk board problems in real time:
  • http://www.willforfang.com/computer-vision/2016/4/9/artificial-intelligence-for-handwritten-mathematical-expression-evaluation
<Insert Name Here>-Neural Net
• Last Month (June-July 2016):
  • YodaNN: http://arxiv.org/abs/1606.05487
  • CMS-RCNN: http://arxiv.org/abs/1606.05413
  • Zoneout RNN: https://arxiv.org/abs/1606.01305
  • Pixel CNN: http://arxiv.org/abs/1606.05328
  • Memory CNN: http://arxiv.org/abs/1606.05262
  • DISCO Nets: http://arxiv.org/abs/1606.02556
• This Year (2016):
  • Stochastic Depth Net: https://arxiv.org/abs/1603.09382
  • Squeeze Net: https://arxiv.org/abs/1602.07360
  • Binary Nets: http://arxiv.org/abs/1602.02830
• Last Year:
  • PoseNet: http://arxiv.org/abs/1505.07427
  • YOLO Nets: http://arxiv.org/abs/1506.02640
  • Style Net: http://arxiv.org/abs/1508.06576
  • Residual Net: https://arxiv.org/abs/1512.03385
  • And so many more…
• What’s Your Architecture?
How to Keep ML Current
• Machine learning advances happen more and more frequently.
• Can’t rely on journals that take multiple years to publish.
• http://arxiv.org/
• Others?
More Resources
• More on CNNs:
• https://www.reddit.com/r/MachineLearning/
• https://www.reddit.com/r/deepdream/
• http://karpathy.github.io/
• RNNs:
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Recent Research: (Jan 16-June 16)
• Residual Conv. Nets: https://www.youtube.com/watch?v=1PGLj-uKT1w
• Determine location of a random image: https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/
• Stochastic Depth Networks: http://arxiv.org/abs/1603.09382
Pre-trained models: http://www.vlfeat.org/matconvnet/pretrained/
Questions?
• Slides:
  • http://www.slideshare.net/NicholasMcClure1/introduction-to-neural-networks-in-tensorflow
• Code and more:
  • https://github.com/nfmcclure/tensorflow_cookbook