Copyright © 2016 Auviz Systems 1
Semantic Segmentation for Scene Understanding:
Algorithms and Implementations
Nagesh Gupta
May 3, 2016
Topics
• Auviz Systems
• Introduction to Semantic Segmentation
• Quick survey of techniques
• Fully Convolutional Network
• Implementation architectures & results
• FPGA & GPU implementations
• References
Auviz Systems
• ISV that specializes in implementing & optimizing algorithms on FPGAs
• Offers libraries for different classes of algorithms
  • AuvizCV: optimized OpenCV algorithms
  • AuvizLA: optimized BLAS
  • AuvizDNN: optimized deep neural networks
• Develops applications in computer vision, linear algebra, deep learning & machine learning
• Libraries are available as OpenCL function calls, which hide the complexity of using an FPGA from software users
• Visit our booth & see semantic segmentation running on a Xilinx FPGA!
Introduction — Image Classification
[Figure: an input image passes through a computer vision system, which outputs the single label "Giraffe"]
Introduction — Semantic Segmentation
[Figure: an input image passes through a computer vision system, which labels each pixel with its class]
Object Detection vs. Semantic Segmentation
Applications of Semantic Segmentation
Automotive: Free space detection
Monocular depth estimation
Boundary prediction
A Survey of Different Methods for Semantic Segmentation
Reference paper | SIFT Flow pixel accuracy (%)
C. Liu, J. Yuen, and A. Torralba, "SIFT flow: Dense correspondence across scenes and its applications" | 76.7
D. Eigen and R. Fergus, "Nonparametric image parsing using adaptive neighbor sets" | 77.1
H. J. Myeong, Y. Chang, and K. M. Lee, "Learning object relationships via graph-based context model" | 77.1
P. H. Pinheiro and R. Collobert, "Recurrent convolutional neural networks for scene parsing" | 77.7
C. Farabet, C. Couprie, L. Najman, and Y. LeCun, "Learning hierarchical features for scene labeling" | 78.5
J. Tighe and S. Lazebnik, "Finding things: Image parsing with regions and per-exemplar detectors" | 78.6
J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation" | 85.2
G. Lin, C. Shen, A. van den Hengel, and I. Reid, "Exploring context with deep structured models for semantic segmentation" | 88.1
Classification Networks
• As an input image passes through successive convolution layers, the network retains global features but loses local detail
• A CNN has several sub-sampling layers, each of which reduces the spatial size of the feature maps derived from the input image
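The effect of sub-sampling can be illustrated with a minimal sketch. It assumes 2x2 max pooling with stride 2 as the sub-sampling operation and uses a made-up 8x8 feature map; neither detail comes from the slides.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# Hypothetical 8x8 feature map with one strong activation at row 5, column 2.
feature_map = np.zeros((8, 8))
feature_map[5, 2] = 1.0

pooled = feature_map
shapes = [pooled.shape]
for _ in range(3):                 # three sub-sampling layers
    pooled = max_pool_2x2(pooled)
    shapes.append(pooled.shape)

print(shapes)   # the spatial size halves at every layer
print(pooled)   # the activation survives, but its position is gone
```

After three pooling layers the map is a single value: the network still knows that the feature is present, but no longer where it was.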
From Classification to Semantic Segmentation
• Replacing the fully connected layers in a CNN with convolutions preserves a spatial heat map of class scores instead of collapsing it into a single label
• The heat map is then used to segment the original image
• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation"
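A small sketch of why this replacement works: the weights of a fully connected classifier can be reshaped into convolution kernels and slid over a larger feature map, giving one score per class at every position. All sizes, weights, and function names here are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K, NUM_CLASSES = 4, 3, 5          # hypothetical sizes

# Fully connected classifier defined on a C x K x K feature map.
fc_weights = rng.standard_normal((NUM_CLASSES, C * K * K))
fc_bias = rng.standard_normal(NUM_CLASSES)

def fully_connected(window):
    """Score one C x K x K window: returns NUM_CLASSES scores."""
    return fc_weights @ window.reshape(-1) + fc_bias

def as_convolution(features):
    """Slide the same weights over a larger C x H x W map as a K x K convolution."""
    kernels = fc_weights.reshape(NUM_CLASSES, C, K, K)
    _, h, w = features.shape
    out = np.empty((NUM_CLASSES, h - K + 1, w - K + 1))
    for y in range(h - K + 1):
        for x in range(w - K + 1):
            window = features[:, y:y + K, x:x + K]
            out[:, y, x] = np.tensordot(kernels, window, axes=3) + fc_bias
    return out

features = rng.standard_normal((C, 6, 8))   # larger than the classifier's window
heat_map = as_convolution(features)
print(heat_map.shape)                        # one score map per class
```

Each position of `heat_map` holds exactly the scores that `fully_connected` would give for the window at that position, so no retraining is needed to obtain the heat map.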
Fully Convolutional Networks (FCN)
• Multiple convolution layers followed by deconvolution layers and a classifier
• Weights for all layers are learned through training using backpropagation (gradient descent)
[Figure: FCN pipeline with three 3D convolution stages and three sub-sampling stages, followed by deconvolution and a softmax classifier; example output labels: Bird, Person]
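The deconvolution and softmax stages can be sketched as follows. This is a simplified illustration, not the presenter's implementation: the kernel is fixed to ones (in an FCN it is learned), and the coarse score maps and class names are invented.

```python
import numpy as np

def deconvolve(scores, kernel, stride):
    """Transposed convolution of a 2-D map: each input value adds a scaled
    copy of the kernel to the output, with copies spaced `stride` apart."""
    h, w = scores.shape
    k = kernel.shape[0]
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for y in range(h):
        for x in range(w):
            out[y * stride:y * stride + k,
                x * stride:x * stride + k] += scores[y, x] * kernel
    return out

def softmax(scores):
    """Softmax over the class axis (axis 0) of a classes x H x W array."""
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

# Hypothetical coarse score maps for 2 classes on a 2x2 grid.
coarse = np.array([[[2.0, 0.0], [0.0, 0.0]],    # class 0, e.g. "Bird"
                   [[0.0, 0.0], [0.0, 2.0]]])   # class 1, e.g. "Person"

kernel = np.ones((2, 2))   # fixed for this sketch; learned in a real FCN
upsampled = np.stack([deconvolve(c, kernel, stride=2) for c in coarse])
probabilities = softmax(upsampled)
labels = probabilities.argmax(axis=0)           # one class index per pixel
print(upsampled.shape)
print(labels)
```

The 2x2 score maps are upsampled to 4x4, and the softmax turns the per-class scores at each pixel into probabilities from which a label is picked.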
Skip Layers: Improve Pixel Accuracy
• High-resolution local information is lost to down-sampling as data moves through the network (left to right in the figure)
• Skip layers overcome this by combining the global semantic information from deep layers with shallow features taken from layers prior to down-sampling
• Figure adapted from: J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation"
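A minimal sketch of the fusion step, assuming the coarse and fine predictions are combined by element-wise summation as in the FCN paper by Long et al. Nearest-neighbour upsampling stands in for the learned deconvolution, and the score arrays are random placeholders.

```python
import numpy as np

def upsample_2x(scores):
    """Nearest-neighbour 2x upsampling of a classes x H x W array
    (a stand-in for the learned deconvolution)."""
    return scores.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(1)
NUM_CLASSES = 3

# Hypothetical class scores predicted at two depths of the network.
deep_scores = rng.standard_normal((NUM_CLASSES, 4, 4))      # coarse, global semantics
shallow_scores = rng.standard_normal((NUM_CLASSES, 8, 8))   # finer, from before a sub-sampling layer

# Skip connection: bring the deep scores to the shallow resolution, then sum.
fused = upsample_2x(deep_scores) + shallow_scores
print(fused.shape)
```

The fused map has the resolution of the shallow layer, so the final upsampling to the input size has less detail to recover.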
Key parts of an FCN — Convolutions &
De-convolutions
Implementation Results: GPU
• Results of implementing an FCN with Caffe on a Tesla K40c GPU
• The FCN built from VGG16 gives the best mean IoU, at the cost of additional latency
Metric | FCN-AlexNet | FCN-VGG16 | FCN-GoogLeNet
Mean IoU (%) | 39.8 | 56.0 | 42.5
Forward time | 50 ms | 210 ms | 59 ms
Conv layers | 8 | 16 | 22
Max stride | 32 | 32 | 32
IoU (Intersection over Union) = |Sseg ∩ Shum| / |Sseg ∪ Shum|, where
Sseg: pixels from segmentation
Shum: pixels from ground truth
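Mean IoU can be computed from two label maps as sketched below. The sketch assumes the mean is taken over classes that appear in either map, which is one common convention; the slides do not say which convention was used. The label maps are invented.

```python
import numpy as np

def mean_iou(predicted, ground_truth, num_classes):
    """Average per-class IoU over the classes present in either label map."""
    ious = []
    for c in range(num_classes):
        seg = predicted == c        # Sseg: pixels the segmentation assigns to class c
        hum = ground_truth == c     # Shum: pixels the ground truth assigns to class c
        union = np.logical_or(seg, hum).sum()
        if union == 0:
            continue                # class absent from both maps
        ious.append(np.logical_and(seg, hum).sum() / union)
    return float(np.mean(ious))

# Hypothetical 3x3 label maps with three classes.
predicted = np.array([[0, 0, 1],
                      [0, 1, 1],
                      [2, 2, 2]])
ground_truth = np.array([[0, 0, 0],
                         [0, 1, 1],
                         [2, 2, 1]])

print(mean_iou(predicted, ground_truth, num_classes=3))
```

For these maps the per-class IoU values are 3/4, 2/4 and 2/3, so the mean is about 0.639.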
Implementation Architectures: FPGA
• GEMM
  • Convolutions and de-convolutions can be mapped onto a GEMM kernel [6]
  • This requires significant data re-mapping, which costs additional resources and latency
  • Re-mapping the data on the host CPU is an easy alternative within the OpenCL development environment
• Convolutions
  • Implement convolutions & de-convolutions using convolution kernels
  • Some data re-mapping is needed to use the convolution kernel for de-convolutions
  • Possible to achieve higher performance in the FPGA
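The GEMM mapping can be sketched with the widely used im2col re-mapping. This is a generic illustration under simplifying assumptions (stride 1, no padding, random data), not the kernel described in [6] or in AuvizDNN.

```python
import numpy as np

def im2col(image, k):
    """Re-map a C x H x W image so that every K x K window becomes one column."""
    c, h, w = image.shape
    out_h, out_w = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, out_h * out_w))
    for y in range(out_h):
        for x in range(out_w):
            cols[:, y * out_w + x] = image[:, y:y + k, x:x + k].reshape(-1)
    return cols

def conv_gemm(image, kernels):
    """Stride-1, unpadded CNN convolution expressed as a single matrix multiply."""
    m, c, k, _ = kernels.shape
    _, h, w = image.shape
    result = kernels.reshape(m, c * k * k) @ im2col(image, k)   # the GEMM
    return result.reshape(m, h - k + 1, w - k + 1)

def conv_direct(image, kernels):
    """Reference: the same convolution computed window by window."""
    m, _, k, _ = kernels.shape
    _, h, w = image.shape
    out = np.empty((m, h - k + 1, w - k + 1))
    for y in range(h - k + 1):
        for x in range(w - k + 1):
            out[:, y, x] = np.tensordot(kernels, image[:, y:y + k, x:x + k], axes=3)
    return out

rng = np.random.default_rng(2)
image = rng.standard_normal((3, 6, 6))        # hypothetical 3-channel input
kernels = rng.standard_normal((4, 3, 3, 3))   # 4 output channels, 3x3 kernels
print(np.allclose(conv_gemm(image, kernels), conv_direct(image, kernels)))
print(image.size, im2col(image, 3).size)      # data volume before and after re-mapping
```

In this example the re-mapped matrix holds 432 values against 108 in the input, which is the kind of overhead the re-mapping bullet refers to.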
FPGA Accelerator: Resource & Performance Estimates
• OpenCL is a simpler and faster way to implement an FPGA accelerator
  • Xilinx SDAccel tools provide the OpenCL infrastructure
  • Altera (Intel) supports OpenCL
• The following infrastructure blocks are needed in addition to the accelerator
  • PCIe & DMA
  • External memory interface
• In a mid-range 28 nm FPGA such as the Xilinx Virtex-7 690T, 25-30% of the device is taken up by infrastructure blocks
• 60-70% of the FPGA is available to implement the accelerator kernel
• Expect to get 1024-1536 MACs, running in the frequency range of 200-300 MHz
• A good design can thus achieve 400-600 GOPS
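The arithmetic behind the GOPS estimate can be checked with a short sketch. It assumes each MAC counts as two operations per cycle (one multiply, one add) and that all MACs are busy every cycle; the slides do not state either assumption.

```python
def peak_gops(num_macs, freq_mhz, ops_per_mac=2):
    """Peak throughput in GOPS, counting each MAC as a multiply plus an add."""
    return num_macs * ops_per_mac * freq_mhz / 1000.0

for macs in (1024, 1536):
    for mhz in (200, 300):
        print(f"{macs} MACs @ {mhz} MHz -> {peak_gops(macs, mhz):.1f} GOPS")
```

Under these assumptions, 1024 MACs at 200-300 MHz give roughly 410-614 GOPS, which matches the quoted 400-600 GOPS. The corner of 1536 MACs at 300 MHz would give about 920 GOPS, so the quoted range presumably reflects that the largest MAC count and the highest clock are not reached together; that reading is an inference, not something the slides state.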
Use Model — GPU
[Figure: GPU use model; labeled blocks: fully connected, forward convolution]
Use Model — FPGA
[Figure: FPGA use model; labeled blocks: forward convolution, fully connected]
FPGA Implementation Using OpenCL
• OpenCL is becoming the method of choice to implement CNNs [6] [7]
• AuvizDNN is a flexible framework built using OpenCL
• Host code
  • API calls are initiated by the host
  • Calling APIs with different parameters creates new networks
  • Recompile on the CPU to create new networks
  • Use model similar to CPU/GPU
• Kernel binary
  • Highly optimized for performance
  • Supports a wide range of API parameters
  • FPGA recompilation/timing closure not needed
  • No FPGA tools expertise needed
  • Available for different accelerator boards supported by FPGA vendors
FPGA — Implementation Results
• Semantic segmentation with 2-21 classes on a 500x500 image
• Network similar to AlexNet
• Results for the XC7VX690 device are based on achieved performance; the rest are projected
[Figure: bar chart of throughput in images/second per device; vertical axis from 0 to 140]
Implementation Choice: FPGA/GPU
GPU | FPGA
Mature use model and rich set of libraries available | Libraries and use model are beginning to catch up to GPU
Used extensively for training of CNNs | Serious contender for deployment in the data center & embedded applications
Traditionally higher in power | Typically lower power draw
Well integrated into most CNN R&D frameworks such as Caffe | Loosely integrated with Caffe
Entrenched in the research community; used by most publications & researchers | FPGAs are extensively used in embedded applications
References
• [1] Chen, L. C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2014). Semantic image segmentation with deep convolutional nets and fully connected CRFs. arXiv preprint arXiv:1412.7062.
• [2] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning hierarchical features for scene labeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8), 1915-1929.
• [3] Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431-3440).
• [4] Badrinarayanan, V., Handa, A., & Cipolla, R. (2015). SegNet: A deep convolutional encoder-decoder architecture for robust semantic pixel-wise labeling. arXiv preprint arXiv:1505.07293.
• [5] Liu, C., Yuen, J., & Torralba, A. (2011). SIFT flow: Dense correspondence across scenes and its applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5), 978-994.
• [6] Suda, N., et al. (2016). Throughput Optimized OpenCL-based FPGA Accelerator for Large-Scale CNNs. ISFPGA 2016.
• [7] Efficient Implementation of Neural Network Systems Built on FPGAs, Programmed with OpenCL. Altera White Paper.