Date post: 22-Mar-2020
Encoder-Decoder Networks for Semantic Segmentation Sachin Mehta
Transcript
Page 1: Encoder-Decoder Networks for Semantic SegmentationDecoder Encoder • Takes an input image and generates a high-dimensional feature vector • Aggregate features at multiple levels

Encoder-Decoder Networks for Semantic Segmentation

Sachin Mehta

Page 2:

Outline

> Overview of Semantic Segmentation
> Encoder-Decoder Networks
> Results

Page 3:

What is Semantic Segmentation?

Input: RGB Image
Output: A segmentation mask (one class label per pixel)
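
To make the input/output relationship concrete, here is a minimal sketch in plain Python. It assumes a network has already produced one score per class for every pixel; the class names and score values are made up for illustration and do not come from the slides. The segmentation mask is then the index of the highest-scoring class at each pixel.

```python
# Hypothetical class names, for illustration only.
CLASSES = ["background", "object"]

def scores_to_mask(scores):
    """Turn scores[c][y][x] into mask[y][x] = index of the best class."""
    num_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            best = max(range(num_classes), key=lambda c: scores[c][y][x])
            row.append(best)
        mask.append(row)
    return mask

# Made-up per-class scores for a 2x2 image.
scores = [
    [[0.9, 0.2], [0.6, 0.1]],  # scores for "background"
    [[0.1, 0.8], [0.4, 0.9]],  # scores for "object"
]
mask = scores_to_mask(scores)
print(mask)  # [[0, 1], [0, 1]]
```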

Page 4:

Encoder-Decoder Networks

Encoder
• Takes an input image and generates a high-dimensional feature vector
• Aggregates features at multiple levels

Decoder
• Takes a high-dimensional feature vector and generates a semantic segmentation mask
• Decodes features aggregated by the encoder at multiple levels
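
A rough way to picture the two halves is to trace the spatial size of the feature maps. The sketch below assumes each encoder stage halves the resolution and each decoder stage doubles it; the 256-pixel input and the four levels are illustrative assumptions, not values stated on the slide.

```python
def encoder_sizes(size, levels):
    """Spatial size after each assumed 2x down-sampling stage."""
    sizes = [size]
    for _ in range(levels):
        size //= 2
        sizes.append(size)
    return sizes

def decoder_sizes(size, levels):
    """Spatial size after each assumed 2x up-sampling stage."""
    sizes = [size]
    for _ in range(levels):
        size *= 2
        sizes.append(size)
    return sizes

enc = encoder_sizes(256, 4)      # hypothetical 256x256 input, 4 levels
dec = decoder_sizes(enc[-1], 4)  # decoder starts where the encoder ends
print(enc)  # [256, 128, 64, 32, 16]
print(dec)  # [16, 32, 64, 128, 256]
```

The decoder ends at the input resolution, which is what allows the output mask to carry one label per input pixel.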

Page 5:

Building Blocks of CNNs

> Convolution
> Down-Sampling
> Up-Sampling

Page 6:

Convolution

a0 a1 a2
a3 a4 a5
a6 a7 a8

x0 x1 x2
x3 x4 x5
x6 x7 x8

Filter weights are learned from data
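
As a minimal sketch of the operation itself, the function below slides a filter over an image and sums the element-wise products at each position. It is written in plain Python for readability and uses no padding. The filter weights here are fixed by hand purely for illustration, whereas in a trained network they would be learned. As in most CNN libraries, the filter is applied without flipping (cross-correlation).

```python
def conv2d(image, kernel, stride=1):
    """2-D convolution without padding, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += image[i * stride + u][j * stride + v] * kernel[u][v]
            row.append(acc)
        out.append(row)
    return out

image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
# Hand-picked filter that copies the centre pixel of each 3x3 window.
kernel = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
result = conv2d(image, kernel)
print(result)  # [[5, 6], [8, 9]]
```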

Page 7:

Down-Sampling

> Max-pooling
> Average Pooling
> Strided Convolution

Input (4x4):
a0 a1 b0 b1
a2 a3 b2 b3
c0 c1 d0 d1
c2 c3 d2 d3

Max-Pooling:
a0 b1
c1 d3

Avg. Pooling:
ā b̄
c̄ d̄

Strided Convolution (stride = 2):
x x
x x
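
The three options can be sketched on a concrete 4x4 grid. The numbers below are made up, chosen so that the largest value in each 2x2 block sits where the slide places a0, b1, c1 and d3. The strided convolution here uses a 2x2 filter with stride 2 for brevity; the filter size and its weights are assumptions for illustration.

```python
def pool2x2(grid, reduce_fn):
    """Apply reduce_fn to every non-overlapping 2x2 block of grid."""
    out = []
    for i in range(0, len(grid), 2):
        row = []
        for j in range(0, len(grid[0]), 2):
            block = [grid[i][j], grid[i][j + 1],
                     grid[i + 1][j], grid[i + 1][j + 1]]
            row.append(reduce_fn(block))
        out.append(row)
    return out

def strided_conv2x2(grid, weights):
    """2x2 convolution with stride 2; weights is a flat list of 4 values."""
    return pool2x2(grid, lambda block: sum(w * v for w, v in zip(weights, block)))

grid = [
    [9, 1, 2, 8],
    [3, 4, 5, 6],
    [1, 7, 0, 2],
    [2, 3, 1, 5],
]
max_pooled = pool2x2(grid, max)
avg_pooled = pool2x2(grid, lambda block: sum(block) / 4)
strided = strided_conv2x2(grid, [1, 0, 0, 0])  # hypothetical weights

print(max_pooled)  # [[9, 8], [7, 5]]
print(avg_pooled)  # [[4.25, 5.25], [3.25, 2.0]]
print(strided)     # [[9, 2], [1, 0]]
```

All three halve the resolution. Only the strided convolution has weights that can be learned.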

Page 8:

Up-Sampling

> Un-pooling
> Deconvolution

Un-Pooling

Input (4x4):
a0 a1 b0 b1
a2 a3 b2 b3
c0 c1 d0 d1
c2 c3 d2 d3

After max-pooling:
a0 b1
c1 d3

After un-pooling (each value is put back at the location of its maximum):
a0 0 0 b1
0 0 0 0
0 c1 0 0
0 0 0 d3

Deconvolution

Zero-padded input:
0 0 0 0
0 a0 b1 0
0 c1 d3 0
0 0 0 0

Output:
x0 x1 x2 x3
x4 x5 x6 x7
x8 x9 x10 x11
x12 x13 x14 x15
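
The un-pooling example can be reproduced with a short sketch: max-pooling records where each maximum came from, and un-pooling writes the pooled values back to those positions and fills the rest with zeros. The grid values are made up so that the maxima fall at the a0, b1, c1 and d3 positions, and the function names are illustrative.

```python
def max_pool_with_indices(grid):
    """2x2 max-pooling that also returns the (row, col) of each maximum."""
    pooled, indices = [], []
    for i in range(0, len(grid), 2):
        pooled_row, index_row = [], []
        for j in range(0, len(grid[0]), 2):
            cells = [(i, j), (i, j + 1), (i + 1, j), (i + 1, j + 1)]
            r, c = max(cells, key=lambda rc: grid[rc[0]][rc[1]])
            pooled_row.append(grid[r][c])
            index_row.append((r, c))
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices

def unpool(pooled, indices, height, width):
    """Place pooled values at their recorded positions; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out

grid = [
    [9, 1, 2, 8],
    [3, 4, 5, 6],
    [1, 7, 0, 2],
    [2, 3, 1, 5],
]
pooled, indices = max_pool_with_indices(grid)
restored = unpool(pooled, indices, 4, 4)
for row in restored:
    print(row)
# [9, 0, 0, 8]
# [0, 0, 0, 0]
# [0, 7, 0, 0]
# [0, 0, 0, 5]
```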

Page 9:

Encoder-Decoder Networks

Encoder Decoder

Page 10:

Encoder-Decoder Networks Different Encoding Block Types

• VGG • Inception • ResNet

Page 11:

Encoder-Decoder Networks Different Encoding Block Types

• VGG • Inception • ResNet

[Figure: VGG-style encoding block. Layers shown between Input and Output: Conv 3x3 (x3), Max-Pool]

Page 12:

Encoder-Decoder Networks Different Encoding Block Types

• VGG • Inception • ResNet

[Figure: Inception-style encoding block. Layers shown between Input and Output: Conv 1x1 (x4), Conv 3x3, Conv 5x5, Max-Pool (x2), Concat]

Page 13:

Encoder-Decoder Networks Different Encoding Block Types

• VGG • Inception • ResNet

[Figure: ResNet-style encoding block. Layers shown between Input and Output: Conv 3x3 (x2), Sum]
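
The block types differ mainly in how they merge branches: the Inception block concatenates the outputs of its parallel branches, while the ResNet block adds the block input to the output of its convolutions. A minimal sketch of the two merge operations follows, using flat lists as stand-ins for feature maps. The `transform` argument is a hypothetical placeholder for the two 3x3 convolutions, not an actual layer.

```python
def concat_branches(branches):
    """Inception-style merge: join branch outputs along the channel axis."""
    merged = []
    for branch in branches:
        merged.extend(branch)
    return merged

def residual_block(x, transform):
    """ResNet-style merge: element-wise sum of transform(x) and x."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("the sum needs matching shapes")
    return [a + b for a, b in zip(fx, x)]

# Three hypothetical branches with 2, 1 and 2 channels.
merged = concat_branches([[1, 2], [3], [4, 5]])
# Stand-in transform that doubles every value.
summed = residual_block([1, 2, 3], lambda v: [2 * t for t in v])

print(merged)  # [1, 2, 3, 4, 5]
print(summed)  # [3, 6, 9]
```

Concatenation grows the channel count, whereas the sum keeps it unchanged and requires both inputs to have the same shape.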

Page 14:

Different Encoding Block Types: Performance on the ImageNet 2012 Validation Dataset

[Figure: four bar charts comparing VGG, Inception and ResNet-18. Panels: Memory per image (in MB), Parameters (in millions), Inference Time (in ms), Classification Error (in %)]

Page 15:

Encoder-Decoder Networks

Encoder Decoder

Page 17:

Encoder-Decoder Networks Different Decoding Block Types

• VGG • Inception • ResNet

Page 18:

Encoder-Decoder Networks Different Decoding Block Types

• VGG • Inception • ResNet

[Figure: VGG-style decoding block. Layers shown between Input and Output: Un-Pool, Conv 3x3 (x3)]

Page 19:

Encoder-Decoder Networks Different Decoding Block Types

• VGG • Inception • ResNet

[Figure: Inception-style decoding block. Layers shown between Input and Output: Deconv 1x1, Conv 1x1 (x4), Conv 3x3, Conv 5x5, Max-Pool, Concat]

Page 20:

Encoder-Decoder Networks Different Decoding Block Types

• VGG • Inception • ResNet

[Figure: ResNet-style decoding block. Layers shown between Input and Output: DeConv 3x3, Sum]

Page 21:

Classification vs Segmentation

[Figure: two bar charts comparing VGG and Inception. (a) ImageNet Classification Validation Set: Top-5 Classification Error (in %). (b) PASCAL VOC 2011 Validation Set: FCN-32s Segmentation Accuracy (in %)]

Source: Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." TPAMI. 2016.

Page 22:

Our Work on Segmenting GigaPixel Breast Biopsy Images

Page 23:

Challenges with the dataset

> Limited computational resources
> Sliding window approach is promising but
  – Size of patch determines the context
  – Some biological structures may cover several patches
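
The sliding-window idea can be sketched as a function that lists the top-left corner of every patch. The 256-pixel patch size matches the training details given later in the talk. The stride, the image size, and the choice to shift the last row and column inward so that patches stay inside the image are all assumptions made for illustration.

```python
def patch_origins(height, width, patch=256, stride=256):
    """Top-left corners of patches that together cover the whole image."""
    def starts(length):
        if length <= patch:
            return [0]
        positions = list(range(0, length - patch + 1, stride))
        # Add a final, inward-shifted patch if the border is not covered.
        if positions[-1] != length - patch:
            positions.append(length - patch)
        return positions
    return [(y, x) for y in starts(height) for x in starts(width)]

origins = patch_origins(600, 512)  # hypothetical 600x512 region
print(len(origins))  # 6
print(origins)
# [(0, 0), (0, 256), (256, 0), (256, 256), (344, 0), (344, 256)]
```

Each patch sees only a 256x256 neighbourhood, which is why a structure larger than one patch is never visible to the network in full.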

Page 24:

Challenges with the dataset

> Some biological structures are rare
  – Necrosis and Secretion account for less than 1% of all the pixels

Page 25:

Training details

> Training Set: 30 ROIs
  – 25,992 patches of size 256x256 with augmentation
  – Split into training and validation sets using a 90:10 ratio

> Test Set: 28 ROIs

> Stochastic Gradient Descent for optimization

> Implemented in Torch – http://torch.ch/
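
For a sense of scale, the 90:10 split of the 25,992 patches can be sketched as follows. This is plain Python rather than Torch, and the shuffling, the fixed seed and the function name are assumptions; the slides do not say how the split was drawn.

```python
import random

def split_train_val(items, train_fraction=0.9, seed=0):
    """Shuffle items and split them into training and validation lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

# Patch indices stand in for the actual 256x256 patches.
train, val = split_train_val(range(25992))
print(len(train), len(val))  # 23392 2600
```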

Page 26:

Segmentation Results

[Figure: RGB Image and Ground Truth Label]

Page 27:

Segmentation Results

Encoder-Decoder Network with skip connection

[Figure: RGB Image and Prediction]

Page 28:

Segmentation Results

Multi-Resolution Encoder-Decoder Network

[Figure: RGB Image and Prediction]

Page 29:

Segmentation Results

[Figure: four panels: RGB Image, Ground Truth, Plain, Multi-Resolution]

Page 30:

Segmentation Results

F1-Score
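
For reference, the F1-score of a class is 2·TP / (2·TP + FP + FN). The sketch below computes it per class by comparing a predicted mask with the ground truth pixel by pixel. The slides do not state how the scores were aggregated, so the per-class, pixel-wise formulation and the toy masks are assumptions.

```python
def f1_per_class(pred, truth, num_classes):
    """Pixel-wise F1-score for each class label."""
    scores = []
    for c in range(num_classes):
        tp = fp = fn = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                if p == c and t == c:
                    tp += 1
                elif p == c:
                    fp += 1
                elif t == c:
                    fn += 1
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

truth = [[0, 0, 1],
         [0, 1, 1]]
pred = [[0, 1, 1],
        [0, 1, 1]]
scores = f1_per_class(pred, truth, 2)
print([round(s, 3) for s in scores])  # [0.8, 0.857]
```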

Page 31:

Why Segmentation?

> Segmented whole dataset (428 ROIs) with the model trained on 30 ROIs

> Extracted histograms from segmentation masks and then trained different classifiers

> Weak classifiers are as good as strong classifiers

Results on Diagnosis
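
One simple reading of "histograms from segmentation masks" is the fraction of pixels assigned to each class, which yields a short fixed-length feature vector per ROI. The sketch below follows that reading. The exact histogram definition used in the work is not given on the slide, so this formulation and the toy mask are assumptions.

```python
def mask_histogram(mask, num_classes):
    """Fraction of pixels per class label in a segmentation mask."""
    counts = [0] * num_classes
    total = 0
    for row in mask:
        for label in row:
            counts[label] += 1
            total += 1
    return [count / total for count in counts]

# Toy mask with three hypothetical classes.
mask = [[0, 0, 1, 2],
        [0, 1, 1, 2]]
features = mask_histogram(mask, 3)
print(features)  # [0.375, 0.375, 0.25]
```

A vector like `features` could then be passed to any off-the-shelf classifier.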

Page 32:

Thank You!!

Page 33:

References

1. Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." TPAMI. 2016. (FCN-8s)

2. Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." ICCV. 2015. (DeConvNet)

3. Badrinarayanan, V., A. Kendall, and R. Cipolla. "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Scene Segmentation." TPAMI. 2017. (SegNet)

4. Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." ICLR. 2016. (Dilation)

5. Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." arXiv preprint arXiv:1606.00915 (2016). (DeepLab)

6. Zheng, Shuai, et al. "Conditional random fields as recurrent neural networks." ICCV. 2015. (CRFasRNN)

7. Hariharan, Bharath, et al. "Hypercolumns for object segmentation and fine-grained localization." CVPR. 2015. (HyperColumn)

Page 34:

Two stacked 3x3 filters cover the same receptive field as one 5x5 filter

Source: Rethinking the Inception Architecture for Computer Vision by Szegedy et al.
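
The arithmetic behind this can be checked in a few lines. With stride 1, each k x k layer widens the receptive field by k - 1, so two 3x3 layers reach 1 + 2 + 2 = 5. The weight count below is per input/output channel pair and ignores biases, a simplification made for this sketch.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field

def weights_per_channel_pair(kernel_sizes):
    """Filter weights per input/output channel pair, ignoring biases."""
    return sum(k * k for k in kernel_sizes)

stacked_field = receptive_field([3, 3])
single_field = receptive_field([5])
stacked_weights = weights_per_channel_pair([3, 3])
single_weights = weights_per_channel_pair([5])

print(stacked_field, single_field)      # 5 5
print(stacked_weights, single_weights)  # 18 25
```

The stacked version covers the same 5x5 area with 18 weights instead of 25.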

