PR-057: Mask R-CNN

Post on 22-Jan-2018

Yonsei University MVP Lab.

Bbox Regression

Classification

RoI from Selective Search

RoI Pooling: Fixed Size Representation

Bbox Regression

Classification

RoI Pooling: Fixed Size Representation

Bbox Regression

Objectness

RPN: Region Proposal Network

32x32x3

Conv1

Pool1

16x16x64

Conv2

Pool2

8x8x128

Conv3

Pool3

4x4x256

Conv4

Pool4

2x2x512

Conv5

Pool5

1x1x512

1x1x512 Conv

1x1 Heatmap

x32 Upsample

Softmax

Remove Pooling, 1x1 Conv for Heatmap Output
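The shape progression on this slide can be traced with a few lines of Python. This is only a sketch under stated assumptions: every conv is assumed to preserve the spatial size, every pool is assumed to halve it, and `conv_pool_shapes` and `upsample_nearest` are hypothetical helper names. Nearest-neighbour upsampling stands in for the learned or bilinear upsampling an actual FCN would use.

```python
def conv_pool_shapes(size=32, channels=(64, 128, 256, 512, 512)):
    """Shape (H, W, C) after each Conv+Pool stage.

    Assumption: every conv preserves the spatial size and every
    pool halves it, as the sizes on the slide suggest.
    """
    shapes = []
    for c in channels:
        size //= 2  # 2x2 pooling with stride 2
        shapes.append((size, size, c))
    return shapes


def upsample_nearest(heatmap, factor):
    """Nearest-neighbour upsampling of a 2D list (a simplification of
    the upsampling used in practice)."""
    return [
        [value for value in row for _ in range(factor)]
        for row in heatmap
        for _ in range(factor)
    ]


shapes = conv_pool_shapes()
for stage, shape in enumerate(shapes, start=1):
    print(f"Pool{stage}: {shape[0]}x{shape[1]}x{shape[2]}")

# A 1x1 heatmap for one class, upsampled x32 back to the input size.
heatmap = [[0.7]]
upsampled = upsample_nearest(heatmap, 32)
print(len(upsampled), len(upsampled[0]))
```

Five halvings take 32 down to 1, which is why a x32 upsample is needed to return to the input resolution.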

Slide from Mask R-CNN Tutorial, K. He. ICCV 2017

[Figure: example image with labelled objects (Sheep, Dog, Human)]

BBox + Classification (Faster R-CNN): Can Separate, Cannot Segment

Segmentation + Classification (FCN): Cannot Separate, Can Segment

Segmentation in BBox + Classification: Faster R-CNN + FCN = FCN on BBox!

FCN
• Pixel-level Classification
• Per Pixel Softmax (Multinomial) → Per Pixel Sigmoid (Binary)
• Multi Instance

Faster R-CNN
• Classification
• Instance Level RoI

BBox + Class + Mask

$L = L_{cls} + L_{box} + L_{mask}$

• $L_{cls}$: Softmax Cross Entropy
• $L_{box}$: Regression
• $L_{mask}$: Binary Cross Entropy

Training Phase

$L_{mask} = L_{c_1} + L_{c_2} + \cdots + L_{c_k}$

If the GT class is 3: $L_{mask} = L_{c_3}$

Mask Branch Only Learns How to Mask, Independent of Class
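The training rule above can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: `binary_cross_entropy` and `mask_loss` are hypothetical helper names, masks are small nested lists of probabilities, and the loss is assumed to be averaged over pixels. Only the mask channel of the ground-truth class enters the loss, so the other channels can change freely without affecting it.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean per-pixel binary cross entropy between a predicted mask
    (probabilities after a sigmoid) and a 0/1 target mask."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count


def mask_loss(class_masks, gt_mask, gt_class):
    """L_mask for one RoI: only the GT class channel contributes."""
    return binary_cross_entropy(class_masks[gt_class], gt_mask)


gt_mask = [[1, 0], [1, 0]]
class_masks = {
    1: [[0.5, 0.5], [0.5, 0.5]],
    2: [[0.2, 0.8], [0.2, 0.8]],
    3: [[0.9, 0.1], [0.9, 0.1]],
}
loss = mask_loss(class_masks, gt_mask, gt_class=3)

# Changing the mask of another class leaves the loss untouched.
class_masks[1] = [[0.0, 1.0], [0.0, 1.0]]
loss_after = mask_loss(class_masks, gt_mask, gt_class=3)
print(loss, loss_after)
```

Because each class has its own sigmoid output, the classes do not compete per pixel as they would under a softmax.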

Test Phase

• Predicts Human Mask
• Predicts Car Mask
• Predicts Horse Mask
• Predicts ...

Winner Takes All
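A minimal sketch of the winner-takes-all step, assuming the classification branch provides one score per class and the mask branch provides one soft mask per class. The helper name `select_mask`, the example scores, and the 0.5 binarization threshold are assumptions made for illustration.

```python
def select_mask(class_scores, class_masks, threshold=0.5):
    """Keep only the mask of the highest-scoring class and binarize it.

    class_scores: dict mapping class name -> classification score
    class_masks:  dict mapping class name -> 2D list of probabilities
    """
    winner = max(class_scores, key=class_scores.get)
    binary = [
        [1 if p >= threshold else 0 for p in row]
        for row in class_masks[winner]
    ]
    return winner, binary


class_scores = {"human": 0.80, "car": 0.15, "horse": 0.05}
class_masks = {
    "human": [[0.9, 0.2], [0.4, 0.7]],
    "car": [[0.1, 0.1], [0.9, 0.9]],
    "horse": [[0.6, 0.6], [0.6, 0.6]],
}
winner, binary_mask = select_mask(class_scores, class_masks)
print(winner, binary_mask)
```

The masks predicted for the losing classes are simply discarded.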

Faster R-CNN, S. Ren, NIPS 2015

Deconv 2x2, stride 2

3x3 Conv, 4 Layers

1x1 Conv

Bbox Regression

Classification

RoI Pooling: Fixed Size Representation

Pooled Feature 7x7

RoI Pooling (Fast R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature

RoI Align (Mask R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature

Note: Region Proposal Network RoI Prediction = Floating Point Representation

[Figure sequence: RoI overlaid on the Feature Map, followed by Max Pooling]
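RoI Pooling can be sketched as follows. Because the RoI arrives in floating-point coordinates, it has to be snapped to whole feature-map cells before max pooling, and the bin boundaries are rounded as well. This is a simplified sketch: `roi_pool` is a hypothetical helper, the RoI is assumed to be given directly in feature-map coordinates as `(x1, y1, x2, y2)`, and a 2x2 output is used instead of 7x7 to keep the example small.

```python
import math


def roi_pool(feature, roi, out_size=2):
    """Max-pool an RoI into an out_size x out_size grid.

    feature: 2D list indexed as feature[row][col]
    roi:     (x1, y1, x2, y2) floats in feature-map coordinates
    """
    height, width = len(feature), len(feature[0])
    # Quantization 1: snap the floating-point RoI onto integer cells.
    x1, y1, x2, y2 = (int(math.floor(v + 0.5)) for v in roi)
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)

    pooled = []
    for i in range(out_size):
        # Quantization 2: bin boundaries are rounded to whole cells.
        r0 = min(max(y1 + math.floor(i * roi_h / out_size), 0), height)
        r1 = min(max(y1 + math.ceil((i + 1) * roi_h / out_size), 0), height)
        row = []
        for j in range(out_size):
            c0 = min(max(x1 + math.floor(j * roi_w / out_size), 0), width)
            c1 = min(max(x1 + math.ceil((j + 1) * roi_w / out_size), 0), width)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(max(cells) if cells else 0)
        pooled.append(row)
    return pooled


# 6x6 feature map whose value encodes its position: 6 * row + col.
feature = [[6 * r + c for c in range(6)] for r in range(6)]
pooled = roi_pool(feature, roi=(0.6, 0.6, 4.4, 4.4))
print(pooled)
```

The RoI (0.6, 0.6, 4.4, 4.4) is treated as (1, 1, 4, 4), so the pooled values come from a slightly shifted region.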

2x2 Subcells for Precision

= 0.15 + 0.25 + 0.25 + 0.35

[Figure: RoI on the Feature Map]

2x2 Subcell Max Pooling
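A minimal sketch of RoI Align with 2x2 subcells per bin. The RoI stays in floating-point coordinates, each subcell is sampled by bilinear interpolation, and the samples in a bin are max-pooled as on the slide. The helpers `bilinear` and `roi_align` are hypothetical, and the coordinate convention (the value `feature[r][c]` sits at the point `(r, c)`) is an assumption of this sketch.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate feature at the real-valued point (y, x)."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    return (
        feature[y0][x0] * (1 - dy) * (1 - dx)
        + feature[y0][x1] * (1 - dy) * dx
        + feature[y1][x0] * dy * (1 - dx)
        + feature[y1][x1] * dy * dx
    )


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool an RoI without quantization: sample samples x samples points
    per bin by bilinear interpolation, then take their max."""
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    pooled = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    # Centre of each subcell inside bin (i, j).
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    values.append(bilinear(feature, y, x))
            row.append(max(values))
        pooled.append(row)
    return pooled


# 6x6 feature map whose value encodes its position: 6 * row + col.
feature = [[6 * r + c for c in range(6)] for r in range(6)]
aligned = roi_align(feature, roi=(0.6, 0.6, 4.4, 4.4))
print(aligned)
```

Since no coordinate is rounded, small shifts of the RoI produce small changes in the pooled values, which matters for pixel-accurate masks.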

Bbox Regression

Classification

RoI Align

Bbox Regression

Objectness

RPN

Binary Mask

Paste Back
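The paste-back step can be sketched as resizing the small per-RoI mask to the box size and writing it into an empty image-sized canvas. This is a rough sketch under assumptions: `paste_back` is a hypothetical helper, the box is given in integer pixels as `(x1, y1, x2, y2)` with exclusive right and bottom edges, nearest-neighbour lookup stands in for a smoother resize, and 0.5 is an assumed binarization threshold.

```python
def paste_back(mask, box, image_h, image_w, threshold=0.5):
    """Resize an m x m soft mask to the box and paste it into a canvas.

    mask: 2D list of probabilities (m x m)
    box:  (x1, y1, x2, y2) integer pixel box, right/bottom exclusive
    """
    x1, y1, x2, y2 = box
    m = len(mask)
    box_h, box_w = y2 - y1, x2 - x1
    canvas = [[0] * image_w for _ in range(image_h)]
    for y in range(max(y1, 0), min(y2, image_h)):
        for x in range(max(x1, 0), min(x2, image_w)):
            # Nearest-neighbour lookup into the m x m mask.
            my = min(int((y - y1) * m / box_h), m - 1)
            mx = min(int((x - x1) * m / box_w), m - 1)
            canvas[y][x] = 1 if mask[my][mx] >= threshold else 0
    return canvas


mask = [[0.9, 0.2], [0.1, 0.8]]
canvas = paste_back(mask, box=(1, 1, 5, 5), image_h=6, image_w=6)
for row in canvas:
    print(row)
```

Pixels outside the box stay 0, so each detected instance ends up with its own full-image binary mask.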

• Faster R-CNN + ResNet: Deep Residual Learning for Image Recognition, K. He, CVPR 2016

• Faster R-CNN + FPN: Feature Pyramid Networks for Object Detection, T.-Y. Lin, CVPR 2017

Faster R-CNN + Binary Mask Prediction + FCN + RoIAlign

Detection Performance Improvement

Q&A?