A brief introduction to recent segmentation methods

  • A Brief Introduction to Recent Segmentation Methods

    Shunta Saito, Researcher at Preferred Networks, Inc.

  • Semantic Segmentation? Classifying all pixels, so it's also called pixel labeling

    * Feature Selection and Learning for Semantic Segmentation (Caner Hazirbas), Master's thesis, Technical University Munich, 2014.

    [Figure: an example image with every pixel labeled by its class, shown as single letters (C, B, S, R)]

  • Tackling this problem with a CNN, it is usually formulated as:

    Typical formulation:

    Image → CNN → Prediction, compared against the Label with a cross-entropy loss

    The loss is calculated for each pixel independently. This leads to the problem: how can the model consider the context to make a single prediction for a pixel?
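
A minimal sketch of this per-pixel formulation, assuming PyTorch for illustration (the deck itself uses Chainer; the shapes and class count are made up):

```python
import torch
import torch.nn.functional as F

# illustrative shapes: batch of 2 images, 21 classes, 64x64 prediction map
logits = torch.randn(2, 21, 64, 64)         # raw CNN outputs: one score per class per pixel
labels = torch.randint(0, 21, (2, 64, 64))  # ground-truth class index per pixel

# cross-entropy is evaluated at every pixel independently, then averaged;
# no pixel sees the predictions of its neighbours
loss = F.cross_entropy(logits, labels)
```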

  • Common problems:

    How to leverage context information

    How to use low-level features in upper layers to make detailed predictions

    How to create dense predictions

  • Fully Convolutional Networks for Semantic Segmentation, Jonathan Long and Evan Shelhamer et al., appeared on arXiv on Nov. 14, 2014

    Fully Convolutional Network (1: Reinterpret classification as a coarse prediction)

    The fully connected layers in a classification network can be viewed as convolutions with kernels that cover their entire input regions

    The spatial output maps of these convolutionalized models make them a natural choice for dense problems like semantic segmentation.

    See Caffe's Net Surgery example: https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb

    fc6 → 6x6 kernel over the 256-channel map; fc7, fc8 → 1x1 kernels over 4096-channel maps

    If the input is 451x451, the output is an 8x8 map with 1000 channels
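
A minimal sketch of this "convolutionalization", assuming PyTorch for illustration; the 256 / 6x6 / 4096 shapes follow the Net Surgery example above:

```python
import torch.nn as nn

# fc6 maps a 256-channel 6x6 feature map to 4096 units, which is exactly
# a 6x6 convolution with 256 input channels and 4096 output channels
fc6 = nn.Linear(256 * 6 * 6, 4096)
conv6 = nn.Conv2d(256, 4096, kernel_size=6)

# same parameters, new shape: copy the fc weights into the conv kernel
conv6.weight.data.copy_(fc6.weight.data.view(4096, 256, 6, 6))
conv6.bias.data.copy_(fc6.bias.data)
```

On a larger input the kernel now slides, so the network emits a spatial grid of classifications (e.g., the 8x8 map of 1000 channels above) instead of a single vector.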

  • Fully Convolutional Network (2: Coarse to dense) One possible way:

    Shift-and-Stitch trick proposed in the OverFeat paper (21 Dec, 2013). OverFeat was the winner of the localization task of ILSVRC 2013 (not detection)

    Shift the input and stitch (= interlace) the outputs

  • Fully Convolutional Network (2: Coarse to dense) To understand OverFeat's Shift-and-Stitch trick, see: https://www.youtube.com/watch?v=h785loU4bh4

  • Fully Convolutional Network (2: Coarse to dense) Another way: removing subsampling layers (like max pooling)

    This has a tradeoff: the filters see finer information, but have smaller receptive fields and take longer to compute (due to larger feature maps)

    Movies and images are from: http://cs231n.github.io/convolutional-networks/

  • Fully Convolutional Network (2: Coarse to dense)

    Instead of the approaches listed above, they finally employed upsampling to make coarse predictions denser

    In a sense, upsampling with a factor f is convolution with a fractional input stride of 1/f

    So a natural way to upsample is backwards convolution (sometimes called deconvolution) with an output stride of f

    [Figure: hand-drawn illustration of upsampling by deconvolution with an output stride of f]
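
A minimal sketch of such a learnable upsampling layer, assuming PyTorch. FCN initializes the deconvolution weights to bilinear interpolation; the helper below is a common way to build that kernel, not code from the paper:

```python
import torch
import torch.nn as nn

def bilinear_kernel(channels, kernel_size):
    """Weights that make a transposed convolution perform bilinear upsampling."""
    factor = (kernel_size + 1) // 2
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    og = torch.arange(kernel_size, dtype=torch.float32)
    filt = 1 - torch.abs(og - center) / factor
    weight = torch.zeros(channels, channels, kernel_size, kernel_size)
    for c in range(channels):
        weight[c, c] = filt[:, None] * filt[None, :]  # per-channel 2-D kernel
    return weight

f = 2  # upsampling factor (output stride)
up = nn.ConvTranspose2d(21, 21, kernel_size=2 * f, stride=f,
                        padding=f // 2, bias=False)
up.weight.data.copy_(bilinear_kernel(21, 2 * f))  # start as bilinear, then learn

x = torch.randn(1, 21, 16, 16)
print(up(x).shape)  # torch.Size([1, 21, 32, 32])
```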

  • Fully Convolutional Network (3: Patch-wise training or whole image training)

    Whole image fully convolutional training is identical to patchwise training where each batch consists of all the receptive fields of the units below the loss for an image


    Patchwise training is loss sampling

    We performed spatial sampling of the loss by making an independent choice to ignore each final layer cell with some probability 1 − p

    To avoid changing the effective batch size, we simultaneously increase the number of images per batch by a factor 1/p
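
A minimal sketch of this loss sampling, assuming PyTorch; p and the shapes are illustrative:

```python
import torch
import torch.nn.functional as F

p = 0.25                                    # probability of keeping a cell
logits = torch.randn(2, 21, 64, 64)
labels = torch.randint(0, 21, (2, 64, 64))

# ignore each final-layer cell independently with probability 1 - p
keep = torch.rand(labels.shape) < p
per_pixel = F.cross_entropy(logits, labels, reduction='none')  # shape (2, 64, 64)
loss = per_pixel[keep].mean()

# to keep the effective batch size constant, the paper also grows the
# number of images per batch by a factor of 1/p
```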

  • Fully Convolutional Network (4: Skip connection)

    Fuse coarse, semantic information and local, appearance information

    [Figure legend: deconvolution (initialized as bilinear upsampling, and learned); bilinear upsampling (fixed); streams are fused by addition]
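
A minimal sketch of this kind of fusion in the style of FCN-16s, assuming PyTorch; the names (score_pool4, score7) and shapes are illustrative, and in practice the transposed convolutions would be bilinear-initialized as in the earlier sketch:

```python
import torch
import torch.nn as nn

n = 21                                          # number of classes
score_pool4 = nn.Conv2d(512, n, kernel_size=1)  # 1x1 scores on the pool4 map
up2 = nn.ConvTranspose2d(n, n, 4, stride=2, padding=1, bias=False)     # learned
up16 = nn.ConvTranspose2d(n, n, 32, stride=16, padding=8, bias=False)  # fixed bilinear

def fuse(pool4_feat, score7):
    coarse = up2(score7)            # deep, semantic stream, upsampled x2
    fine = score_pool4(pool4_feat)  # shallow, appearance stream
    return up16(coarse + fine)      # fuse by addition, then upsample x16

pool4_feat = torch.randn(1, 512, 14, 14)  # 1/16-resolution features
score7 = torch.randn(1, n, 7, 7)          # 1/32-resolution scores
print(fuse(pool4_feat, score7).shape)     # torch.Size([1, 21, 224, 224])
```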

  • Fully Convolutional Network (5: Training scheme)

    1. Prepare a trained model for ILSVRC12 (1000-class image classification)
    2. Discard the final classifier layer
    3. Convolutionalize all remaining fully-connected layers
    4. Append a 1x1 convolution with the target number of classes as channels

    Other training settings: MomentumSGD (momentum: 0.9), batch size: 20, fixed LR: 10^-4 for FCN-VGG-16 (doubled LR for biases), weight decay: 5^-4, zero-initialized class scoring layer, fine-tuning for all layers
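
A minimal sketch of steps 1-4, assuming PyTorch and torchvision's VGG-16 for illustration (the paper used Caffe, and fc6/fc7 here are freshly constructed rather than surgically copied):

```python
import torch.nn as nn
from torchvision.models import vgg16

n_classes = 21
backbone = vgg16()            # 1. in practice, load ILSVRC12-trained weights here
features = backbone.features  # 2. keep the conv trunk, drop the classifier
fc6 = nn.Conv2d(512, 4096, kernel_size=7)          # 3. convolutionalized fc6
fc7 = nn.Conv2d(4096, 4096, kernel_size=1)         # 3. convolutionalized fc7
score = nn.Conv2d(4096, n_classes, kernel_size=1)  # 4. 1x1 class scoring layer
nn.init.zeros_(score.weight)                       # zero-initialize the scoring layer
nn.init.zeros_(score.bias)

head = nn.Sequential(features, fc6, nn.ReLU(inplace=True),
                     fc7, nn.ReLU(inplace=True), score)
```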

  • Fully Convolutional Network

    Summary

    1. Replaced all fully-connected layers with convolutions
    2. Upsampled by backwards convolution, a.k.a. deconvolution (and bilinear upsampling)
    3. Applied skip connections to use local, appearance information in the final layer

  • Learning Deconvolution Network for Semantic Segmentation, Hyeonwoo Noh, et al., appeared on arXiv on May 17, 2015

    Deconvolution Network

    * Unpooling here corresponds to Chainer's Upsampling2D function
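
Unpooling places each value back at the location of the max that was recorded during pooling (the "switches"). A minimal sketch, assuming PyTorch's MaxPool2d/MaxUnpool2d rather than the Chainer function named above:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, stride=2, return_indices=True)  # remember argmax positions
unpool = nn.MaxUnpool2d(2, stride=2)

x = torch.randn(1, 64, 28, 28)
y, switches = pool(x)   # 28x28 -> 14x14, plus the max locations
z = unpool(y, switches) # 14x14 -> 28x28; values return to their switch
                        # positions, everything else is filled with zeros
print(z.shape)          # torch.Size([1, 64, 28, 28])
```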

  • Fully Convolutional Network (FCN) has limitations: a fixed-size receptive field yields inconsistent labels for large objects

    Skip connections can't solve this because there is an inherent trade-off between boundary details and semantics

    Interpolating the 16x16 output to the original input size produces blurred results. The absence of a deep deconvolution network trained on a large dataset makes it difficult to reconstruct highly non-linear structures of object boundaries accurately.

    Deconvolution Network

    Let's build one to perform proper upsampling

  • Deconvolution Network

    Feature extractor Shape generator

  • Deconvolution Network

    Shape generator

    Activations of each layer: 14x14 (deconvolutional layer) → 28x28 (unpooling layer) → 28x28 (deconvolutional layer) → 56x56 (unpooling layer) → 56x56 (deconvolutional layer) → 112x112 (unpooling layer) → 112x112 (deconvolutional layer) → 224x224 (unpooling layer) → 224x224 (deconvolutional layer)

  • Deconvolution Network

    One might think: isn't a skip connection missing?

  • U-Net U-Net: Convolutional Networks for Biomedical Image Segmentation, Olaf Ronneberger, Philipp Fischer, Thomas Brox, 18 May 2015

  • U-Net

  • SegNet SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image

    Segmentation, Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla, 2 Nov, 2015

  • SegNet The training procedure is a bit complicated: encoder-decoder pair-wise training

    There's a Chainer implementation, pfnet-research/chainer-segnet: https://github.com/pfnet-research/chainer-segnet

  • Dilated convolutions Multi-Scale Context Aggregation by Dilated Convolutions, Fisher Yu, Vladlen Koltun, 23 Nov, 2015. A.k.a. atrous convolution, convolution with holes. Enlarges the receptive field without losing resolution (see the sketch below)

    The figure is from WaveNet: A Generative Model for Raw Audio
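
A minimal sketch, assuming PyTorch; a 3x3 kernel with dilation 2 covers a 5x5 window, yet with matching padding the output keeps the input resolution:

```python
import torch
import torch.nn as nn

# 3x3 kernel, dilation 2: the taps are spread out to cover a 5x5 window
conv = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)

x = torch.randn(1, 64, 56, 56)
print(conv(x).shape)  # torch.Size([1, 64, 56, 56]): no resolution lost
```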

  • Dilated convolutions For example, the feature maps of ResNet are downsampled 5 times; 4 of the 5 downsamplings are done by convolutions with stride 2 (only one is by pooling with stride 2)

    Resolutions after each downsampling: 1/2 → 1/4 → 1/8 → 1/16 → 1/32

  • Dilated convolutions By using dilated convolutions instead of vanilla convolutions, the resolution after the first poolings can be kept the same through to the end (see the sketch below)

    Resolutions after each stage: 1/2 → 1/4 → 1/8 → 1/8 → 1/8
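
A minimal sketch of the replacement, assuming PyTorch; the stride-2 layer halves the map, while the dilated stride-1 layer keeps the resolution and uses dilation so the receptive field still grows:

```python
import torch.nn as nn

# original: a stride-2 3x3 convolution that halves the feature map
downsample = nn.Conv2d(256, 256, kernel_size=3, stride=2, padding=1)

# replacement: stride 1 with dilation 2 keeps the resolution; subsequent
# layers use (at least) the same dilation to preserve their receptive fields
dilated = nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=2, dilation=2)
```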

  • Dilated convolutions

    But the output is still 1/8 of the input resolution

    Resolutions after each stage: 1/2 → 1/4 → 1/8 → 1/8 → 1/8

  • RefineNet RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic

    Segmentation, Guosheng Lin, Anton Milan, Chunhua Shen, Ian Reid, 20 Nov. 2016

  • RefineNet Each intermediate feature map is refined through a RefineNet module

  • RefineNet

    * An implementation has been done in Chainer; the code will be made public soon

  • Semantic Segmentation using Adversarial Networks Semantic Segmentation using Adversarial Networks, Pauline Luc, Camille Couprie,

    Soumith Chintala, Jakob Verbeek, 25 Nov. 2016

