NICTA Copyright 2012 From imagination to impact
Compacting ConvNets
for End-to-End Learning
Jose M. Alvarez
Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.
Success of CNNs
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet
Classification with Deep Convolutional Neural Networks, NIPS, 2012
Image Classification
Success of CNNs
from Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN:
Towards Real-Time Object Detection with Region Proposal Networks,
arXiv:1506.01497
Object Detection
Success of CNNs
Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to
Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640
Semantic Segmentation
Success of CNNs
Andrej Karpathy, Li Fei-Fei, Deep Visual-Semantic Alignments for Generating
Image Descriptions, CVPR, 2015
Image Captioning
Video classification …
Keys to success
• Better training algorithms
– Batch normalization
– Initializations
– Momentum
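As a concrete illustration of one of these ingredients, the momentum update can be sketched in a few lines; the learning rate and momentum coefficient below are illustrative defaults, not values from any specific paper:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    """One SGD-with-momentum update: v <- mu*v - lr*grad, then w <- w + v."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

# Toy run: minimize f(w) = w^2 (gradient 2*w), starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(100):
    w, v = sgd_momentum_step(w, 2 * w, v)
# w has decayed close to the minimum at 0
```

The velocity term accumulates past gradients, which damps oscillations and speeds convergence along consistent descent directions.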
Keys to success
• Better training algorithms
• Large amount of data / labels
Keys to success
• Better training algorithms
• Large amount of data / labels
• Hardware / Storage
– GPU, parallel systems
[Chart: GPU memory in GB for GTX-580, Titan Black ('14), and Titan X ('15)]
Keys to success
• Better training algorithms
• Large amount of data / labels
• Hardware / Storage
• Larger community of researchers
Keys to success
• Enabled larger networks
[Chart: number of parameters in millions for LeNet-5, AlexNet, and VGGNet-16]
Challenges
Embedded devices with limited resources / power
2014 – Jetson TK1; 2015/16 – Jetson TX1
Challenges
Embedded devices with limited resources / power
- Memory is a limiting factor
- Real-time operation
Computational Cost
The forward pass is time consuming (AlexNet)
Computational Cost
Memory bottleneck (AlexNet)
Computational Cost
Memory bottleneck
VGGNet parameters:
conv3-64 x 2  :      38,720
conv3-128 x 2 :     221,440
conv3-256 x 3 :   1,475,328
conv3-512 x 3 :   5,899,776
conv3-512 x 3 :   7,079,424
fc1           : 102,764,544
fc2           :  16,781,312
fc3           :   4,097,000
TOTAL         : 138,357,544
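The totals above can be reproduced directly from the layer shapes: 3x3 kernels with biases, and a 7x7x512 feature map entering fc1 (from a 224x224 input). A quick sanity check:

```python
# Reproduce the VGGNet totals above: 3x3 kernels, weights + biases per layer,
# and a 7x7x512 feature map entering fc1 (from a 224x224 input).
def conv_params(c_in, c_out, k=3):
    return k * k * c_in * c_out + c_out          # weights + biases

def block(c_in, c_out, n):
    total = conv_params(c_in, c_out)             # first layer changes width
    total += (n - 1) * conv_params(c_out, c_out)
    return total

def fc_params(d_in, d_out):
    return d_in * d_out + d_out

total = (block(3, 64, 2) + block(64, 128, 2) + block(128, 256, 3)
         + block(256, 512, 3) + block(512, 512, 3)
         + fc_params(512 * 7 * 7, 4096)          # fc1: 102,764,544
         + fc_params(4096, 4096)                 # fc2: 16,781,312
         + fc_params(4096, 1000))                # fc3: 4,097,000
print(total)                                     # 138357544, matching the slide
```

Note that the three fully connected layers alone account for roughly 90% of the 138M parameters.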
Do we need all these parameters?
Over-Parameterization
• ‘Needed for highly non-convex optimization’1
1 Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, Yann LeCun.
The Loss Surfaces of Multilayer Networks. AISTATS 2015
Over-Parameterization
• ‘Needed for highly non-convex optimization’
• Deeper structures, larger learning capacity1
1 Guido Montúfar, Razvan Pascanu, Kyunghyun Cho, Yoshua Bengio. On the Number of
Linear Regions of Deep Neural Networks. NIPS 2014
Over-Parameterization
• ‘Needed for highly non-convex optimization’
• Deeper structures, larger learning capacity
• From images to video -> even larger nets?
A. Karpathy et al. Large-scale Video Classification with Convolutional
Neural Networks. CVPR 2014.
Compacting CNN
Compacting CNN
• Network distillation
• Network pruning
• Structured parameters
– Ours
Compacting CNN
• Network distillation
Compacting CNN
• Network distillation
– Large network learns from the data
– Generate labels using the trained network
– Train smaller nets on its soft outputs (soft targets)
Geoffrey Hinton, Oriol Vinyals, Jeff Dean. Distilling the Knowledge in a Neural Network.
NIPS Workshop 2015
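A minimal sketch of that training signal: cross-entropy on the hard labels mixed with a soft-target term computed at temperature T, following Hinton et al.'s idea. The function names and the values of T and alpha here are illustrative choices, not the paper's exact settings:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hard cross-entropy on true labels + cross-entropy against the teacher's
    softened outputs, scaled by T^2 as in Hinton et al."""
    p_s = softmax(student_logits)
    n = len(labels)
    hard = -np.log(p_s[np.arange(n), labels] + 1e-12).mean()
    soft_t = softmax(teacher_logits, T)
    soft_s = softmax(student_logits, T)
    soft = -(soft_t * np.log(soft_s + 1e-12)).sum(axis=-1).mean() * T * T
    return alpha * hard + (1 - alpha) * soft

rng = np.random.default_rng(0)
student = rng.normal(size=(8, 10))   # student logits for a batch of 8
teacher = rng.normal(size=(8, 10))   # teacher logits for the same batch
labels = rng.integers(0, 10, size=8)
loss = distillation_loss(student, teacher, labels)
print(loss)
```

The high temperature flattens the teacher's distribution so that the relative probabilities of wrong classes (the "dark knowledge") carry signal to the student.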
Compacting CNN
• Network distillation (II)
– Use intermediate layers to guide the training
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo
Gatta and Yoshua Bengio. FitNets: Hints for Thin Deep Nets. ICLR 2015
Compacting CNN
• Pros
– In general, better generalization and faster inference.
– Equal or slightly better performance.
• Cons
– Requires a larger, fully trained network to learn from.
Compacting CNN
• Network distillation
• Network pruning
– Directly remove unimportant parameters during
training
• Requires second derivatives.
– Remove parameters + quantization1
• Good compression rates (orthogonal to other approaches)
1S. Han, H. Mao, and W. J. Dally. Deep compression: Compressing deep neural network
with pruning, trained quantization and huffman coding. CoRR, abs/1510.00149, 2015
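Magnitude-based threshold pruning, the first stage of the Deep Compression pipeline, can be sketched as follows. This is a simplified one-shot version: the full pipeline alternates pruning with retraining and follows up with quantization and Huffman coding:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights.
    One-shot threshold pruning; Deep Compression additionally retrains the
    surviving weights and then quantizes and Huffman-codes them."""
    flat = np.sort(np.abs(weights).ravel())
    k = int(sparsity * flat.size)
    threshold = flat[k] if k < flat.size else np.inf
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))             # stand-in for a layer's weights
pruned, mask = magnitude_prune(w, sparsity=0.9)
print(round(1 - mask.mean(), 3))            # fraction zeroed, approximately 0.9
```

The surviving weights can then be stored in a sparse format, which is where the memory savings come from.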
Compacting CNN
• Network distillation
• Network pruning
• Structured parameters
Compacting CNN: Structured parameters
Max Jaderberg, Andrea Vedaldi, Andrew Zisserman. Speeding up Convolutional Neural
Networks with Low Rank Expansions. BMVC 2014
• Low rank approximations
Compacting CNN: Structured parameters
Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting
Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014
• Low rank approximations (II)
Compacting CNN: Structured parameters
• Low rank approximations (III)
– Weights are approximated by a sum of rank-1
tensors.
Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting
Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014
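The rank-1 decomposition above is essentially an SVD: a 2-D filter f equals the sum of terms s_i * u_i v_i^T, and truncating the sum gives a low-rank approximation. A small numpy illustration (the 7x7 filter here is random, purely for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=(7, 7))          # stand-in for a learned 7x7 filter

# SVD writes f as a sum of rank-1 outer products s_i * u_i v_i^T.
u, s, vt = np.linalg.svd(f)
rank1_terms = [s[i] * np.outer(u[:, i], vt[i]) for i in range(len(s))]

approx = sum(rank1_terms[:3])        # keep only the 3 largest terms
err = np.linalg.norm(f - approx) / np.linalg.norm(f)
print(round(err, 3))                 # relative error; keeping all 7 terms recovers f exactly
```

A d x d filter kept at rank r costs 2*d*r parameters (plus the r scales) instead of d*d, and the convolution can be applied as cheaper separable passes.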
Compacting CNN: Structured parameters
• Weak points
– Needs a fully trained full-rank network.
– Not all filters can be approximated.
– Theoretical speed-ups come with a drop in performance.
Emily Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus. Exploiting
Linear Structure Within Convolutional Networks for Efficient Evaluation. NIPS 2014
Compacting CNN: Structured parameters
• Weak points
– Needs a fully trained full-rank network.
– Not all filters can be approximated.
– Drop in performance.
• Strengths
– Can aid regularization during or after training.
– Parameter sharing within the layer.
Compacting CNN: Structured parameters
K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image
Recognition. ICLR, 2015
• Low rank approximations (IV)
– VGG nets restrict filters during training.
– Same ‘receptive field’ (three stacked 3x3 layers cover 7x7).
– Deeper networks (more nonlinearities).
– Fewer parameters (49C² for one 7x7 layer vs. 3x(3x3)C² = 27C² for three 3x3 layers).
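The parameter comparison in the last bullet is easy to verify; C below is set to 512 as in VGG's widest layers, and biases are omitted as on the slide:

```python
# Parameters for C input/output channels: one 7x7 layer vs. three stacked
# 3x3 layers with the same 7x7 receptive field (biases omitted).
C = 512
one_7x7 = 7 * 7 * C * C          # 49 * C^2
three_3x3 = 3 * (3 * 3 * C * C)  # 27 * C^2
print(one_7x7, three_3x3)        # three 3x3 layers use 27/49 of the parameters
```

So the stacked design is cheaper by a factor of 49/27, roughly 1.8x, while adding two extra nonlinearities.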
Compacting CNN: Structured parameters
• Low rank approximations (Ours1)
– Filter restriction during training.
– Larger receptive fields.
– Deeper networks (more nonlinearities).
– Parameter sharing.
– Fewer parameters.
1Joint work with Lars Petersson. Under review
Compacting CNN: Structured parameters
• Low rank approximations (Ours)
– ImageNet Results (AlexNet).
Baseline: Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton. ImageNet Classification with
Deep Convolutional Neural Networks. NIPS 2012
Compacting CNN: Structured parameters
• Low rank approximations (Ours)
– Stereo matching.
[Results: Ours-1 (32K), Ours-1 (48K), Ours-3 (32K)]
Baseline: Jure Zbontar, Yann LeCun. Computing the Stereo Matching Cost With a
Convolutional Neural Network. CVPR 2015
Memory?
Memory Bottleneck
• Sparse constraints during training (Ours2)
– Directly reduce the number of neurons.
– Select the optimal number of neurons.
– Significant memory reductions with a minor drop in
performance.
2Joint work with Hao Zhou, Fatih Porikli. Under review
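One common way to realize such a constraint (a sketch of the general group-sparsity idea, not necessarily the exact formulation of the work under review) is a group-lasso penalty over each neuron's weight vector, so the regularizer can drive entire neurons, not just individual weights, to zero:

```python
import numpy as np

def group_lasso_penalty(W, lam=1e-3):
    """Sum of l2 norms over output-neuron groups (rows of W): lam * sum_i ||W_i||_2.
    Unlike an elementwise l1 penalty, this drives whole rows (neurons) to zero."""
    return lam * np.sqrt((W ** 2).sum(axis=1)).sum()

def count_active_neurons(W, tol=1e-6):
    """Neurons whose weight vector has not been zeroed out."""
    return int((np.sqrt((W ** 2).sum(axis=1)) > tol).sum())

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))      # 64 neurons, 128 inputs each
W[10:20] = 0.0                      # rows the sparse regularizer has driven to zero
print(count_active_neurons(W))      # 54 active neurons remain
```

Removing a zeroed neuron shrinks both this layer's weight matrix and the next layer's input dimension, which is where the memory reduction comes from.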
Do we need all these parameters?
Compacting ConvNets
for End-to-End Learning
Jose M. Alvarez
Joint work with Lars Petersson, Hao Zhou, Fatih Porikli.