Date posted: 16-Apr-2017
Category: Technology
Uploaded by: christoph-koerner
Applications of Deep Learning in Computer Vision
Christoph Körner
Outline
1) Introduction to Neural Networks
2) Deep Learning
3) Applications in Computer Vision
4) Conclusion
Why Deep Learning?
● Wins every computer vision challenge (classification, segmentation, etc.)
● Can be applied in various domains (speech recognition, game prediction, computer vision, etc.)
● Beats human accuracy
● Big communities and resources
● Hardware for Deep Learning
Perceptron (1958)
● Weighted sum of inputs
● Threshold operator
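The two bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the original 1958 formulation: the function name `perceptron` and the AND weights are assumptions chosen for the example.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0  # threshold operator


# Hypothetical hand-picked weights that make the unit act like a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

and_outputs = [
    perceptron([a, b], AND_WEIGHTS, AND_BIAS)
    for a in (0, 1)
    for b in (0, 1)
]
print(and_outputs)
```

A single unit like this can only separate its inputs with a straight line, which is why AND works here while XOR does not.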
Artificial Neural Network (1960)
● Universal function approximator
● Can solve the XOR problem
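To make the XOR bullet concrete, here is a sketch of a two-layer network of threshold units that computes XOR. The weights are hand-picked for illustration (an assumption, not learned values), and the names `step` and `xor_network` are hypothetical.

```python
def step(z):
    """Threshold activation: 1 for positive input, 0 otherwise."""
    return 1 if z > 0 else 0


def xor_network(a, b):
    """Two hidden threshold units (OR and AND) feeding one output unit."""
    h_or = step(a + b - 0.5)    # fires if at least one input is 1
    h_and = step(a + b - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)  # OR but not AND


xor_outputs = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)
```

The hidden layer is what makes the difference: no choice of weights for a single threshold unit reproduces this truth table.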
Backpropagation (1982)
● Propagate the error through the network
● Allows optimization (SGD, etc.)
● Enables training of multi-layer networks
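A minimal sketch of the idea, assuming a toy two-weight network `y = w2 * tanh(w1 * x)` with a squared-error loss (the network, the starting values and the learning rate are all assumptions made for this example). The error is propagated backwards with the chain rule, and plain gradient descent uses the result to update both weights.

```python
import math


def forward(w1, w2, x):
    """Forward pass: hidden activation h and output y."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(w1, w2, x, target):
    """Squared error between the network output and the target."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x, target):
    """Backward pass: gradients of the loss with respect to w1 and w2."""
    h, y = forward(w1, w2, x)
    dl_dy = y - target                    # error at the output
    dl_dw2 = dl_dy * h                    # gradient for the output weight
    dl_dh = dl_dy * w2                    # error propagated to the hidden unit
    dl_dw1 = dl_dh * (1.0 - h * h) * x    # tanh'(z) = 1 - tanh(z)^2
    return dl_dw1, dl_dw2


def train(w1, w2, x, target, learning_rate=0.1, steps=50):
    """Gradient descent on a single (x, target) pair."""
    for _ in range(steps):
        g1, g2 = backward(w1, w2, x, target)
        w1 -= learning_rate * g1
        w2 -= learning_rate * g2
    return w1, w2


x, target = 1.0, 0.8
w1, w2 = 0.5, 0.3
initial_loss = loss(w1, w2, x, target)
trained_w1, trained_w2 = train(w1, w2, x, target)
final_loss = loss(trained_w1, trained_w2, x, target)
print(initial_loss, final_loss)
```

Real frameworks automate the backward pass, but the mechanics are the same: each layer multiplies the incoming error by its local derivative and hands it to the layer before it.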
Convolution and Pooling (1989)
● Fewer parameters than fully connected hidden layers
● More efficient training
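The following sketch shows both operations on a tiny 5x5 input, using plain Python lists. The image, the 2x2 kernel and the function names are assumptions for illustration; the convolution is implemented as cross-correlation without padding, as is common in deep learning libraries.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; incomplete border windows are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]  # hypothetical diagonal-difference filter

feature_map = conv2d(image, kernel)  # 4x4 output
pooled = max_pool2d(feature_map)     # 2x2 output

# Weight counts for mapping the 5x5 input to a 4x4 output (biases ignored).
conv_weights = len(kernel) * len(kernel[0])  # shared 2x2 kernel
dense_weights = (5 * 5) * (4 * 4)            # one weight per input-output pair
print(pooled, conv_weights, dense_weights)
```

The same four kernel weights are reused at every position, which is where the parameter saving over a fully connected layer comes from.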
Handwritten ZIP Codes (1989)
● 30 training passes
● Achieved 92% accuracy
What happened until 2011?
● Better initialization
● Better non-linearities: ReLU
● 1000 times more training data
● More computing power
● Speedup by a factor of 1 million in training time through parallelization on GPUs
Deep Learning
● Convolutional, pooling and fully connected layers
● ReLU activations
● Deep nested models with many parameters
● New layer types and structures
● New techniques to reduce overfitting
● Loads of training data and compute power
● 10,000,000 images
● Weeks of training on multi-GPU machines
AlexNet (2012)
● 62,378,344 parameters (250 MB)
● 24 layers
VGGNet (2013)
● 102,908,520 parameters (412 MB)
● 23 layers
GoogLeNet (2014)
● 6,998,552 parameters (28 MB)
● 143 layers
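The sizes quoted for AlexNet, VGGNet and GoogLeNet are consistent with storing each parameter in 4 bytes. The slides do not state the precision, so the 32-bit float below is an assumption, as are the names used in this small check.

```python
BYTES_PER_PARAM = 4  # assumption: 32-bit floats; the slides do not state the precision

# Parameter counts as given on the slides.
PARAM_COUNTS = {
    "AlexNet": 62_378_344,
    "VGGNet": 102_908_520,
    "GoogLeNet": 6_998_552,
}


def model_size_mb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


sizes = {name: round(model_size_mb(count)) for name, count in PARAM_COUNTS.items()}
print(sizes)
```

Note that GoogLeNet has far more layers than the other two models but needs roughly a tenth of AlexNet's storage.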
Inception Module
● Heavy use of 1x1 convolutions (applied along the depth dimension)
● Very efficient
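A 1x1 convolution mixes channels at each pixel without looking at neighbouring pixels. The sketch below shows that operation on plain lists, followed by a weight count that illustrates why reducing the depth first can be cheaper. The channel sizes (256, 32, 64) and the 5x5 kernel are hypothetical numbers picked for the example, not figures from GoogLeNet.

```python
def conv1x1(feature_maps, weights):
    """Apply a 1x1 convolution.

    feature_maps: list of C_in channels, each an H x W list of lists.
    weights: C_out rows, each with C_in weights (biases omitted).
    """
    c_in = len(feature_maps)
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [
            [
                sum(row[c] * feature_maps[c][i][j] for c in range(c_in))
                for j in range(width)
            ]
            for i in range(height)
        ]
        for row in weights
    ]


# Three 2x2 input channels reduced to two output channels.
x = [
    [[1, 2], [3, 4]],
    [[0, 1], [1, 0]],
    [[2, 2], [2, 2]],
]
w = [
    [1, 0, 0.5],  # output channel 0 = channel 0 + 0.5 * channel 2
    [0, 1, -1],   # output channel 1 = channel 1 - channel 2
]
y = conv1x1(x, w)

# Weight counts (biases ignored) for a 5x5 convolution from 256 to 64 channels,
# with and without a 1x1 reduction to 32 channels first (hypothetical sizes).
direct_weights = 5 * 5 * 256 * 64
reduced_weights = 1 * 1 * 256 * 32 + 5 * 5 * 32 * 64
print(y, direct_weights, reduced_weights)
```

With these assumed sizes the reduced variant needs about a seventh of the weights of the direct 5x5 convolution.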
ResNet (2015)
● Residual learning
● 152 layers
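Residual learning means a block outputs its input plus a learned correction, `x + F(x)`, instead of a completely new mapping. The sketch below shows only that skip connection; the two residual functions are hypothetical stand-ins for the stacked convolutional layers a real block would contain.

```python
def residual_block(x, residual_fn):
    """Return x + F(x), where F is the residual mapping of the block."""
    return [xi + fi for xi, fi in zip(x, residual_fn(x))]


def zero_residual(x):
    """A residual of zero turns the block into the identity function."""
    return [0.0 for _ in x]


def small_residual(x):
    """Hypothetical stand-in for learned layers: a small correction to x."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)
adjusted_out = residual_block(x, small_residual)
print(identity_out, adjusted_out)
```

Because a zero residual reproduces the input, extra blocks can fall back to the identity, which is one common explanation for why very deep stacks remain trainable.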
Applications in Computer Vision
Classification
● One class per image
● Softmax layer at the end
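The softmax layer turns the raw scores of the last layer into a probability for each class. A minimal sketch follows; the scores and class names are made up for the example.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
class_names = ["cat", "dog", "car"]
logits = [2.0, 1.0, -1.0]

probabilities = softmax(logits)
predicted = class_names[probabilities.index(max(probabilities))]
print(probabilities, predicted)
```

The predicted class is simply the one with the highest probability, which matches the one-class-per-image setting.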
Localization
● Bounding box regression
● Sigmoid layer with 4 outputs at the end
● Via Classification
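For the regression variant, one way to read the four sigmoid outputs is as box corners expressed as fractions of the image size. The slides do not specify the encoding, so the order `(x_min, y_min, x_max, y_max)` and the function names below are assumptions.

```python
import math


def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(raw_outputs, image_width, image_height):
    """Map four raw network outputs to pixel coordinates.

    Assumed order: x_min, y_min, x_max, y_max, each a fraction of the image size.
    """
    x_min, y_min, x_max, y_max = (sigmoid(z) for z in raw_outputs)
    return (
        x_min * image_width,
        y_min * image_height,
        x_max * image_width,
        y_max * image_height,
    )


# Hypothetical raw outputs for a 200x100 pixel image.
box = decode_box([-2.0, -1.0, 0.0, 1.0], image_width=200, image_height=100)
print(box)
```

Because the sigmoid bounds every output between 0 and 1, the decoded box always lies inside the image.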
Detection
● Multiple objects, multiple classes
● Solved using multiple networks
Segmentation
More Applications
● Compression: auto-encoders, self-organizing maps
● Image captioning: solved with a recurrent architecture
● Image stylization
● Clustering
● Many more...
Conclusion
● Powerful: learns from data instead of relying on hand-crafted feature extraction
● Better than humans
● Deeper is always better
● Overfitting
● More data is always better
● Data quality
● Ground truth
Thank you!
Christoph Körner