Object Detection in Videos with Tubelet Proposal Networks
Proposals in Video Object Detection
Tubelet Proposal Networks
Qualitative Results
Framework
Encoder-decoder LSTM
YouTube-Objects Dataset
Experimental Settings
Kai Kang1, Hongsheng Li1, Tong Xiao1, Wanli Ouyang1,4, Junjie Yan2, Xihui Liu3, Xiaogang Wang1
1The Chinese University of Hong Kong, 2SenseTime Group Limited, 3Tsinghua University, 4The University of Sydney
Results on ImageNet VID Dataset
[Figure: video frames drawn along the x, y and t axes, comparing per-frame static proposals, regression to GT boxes (RegressBox), and regression to GT movement (RegressMove, ours).]
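The RegressMove label contrasts regressing to ground-truth movement with regressing to ground-truth boxes. The poster text does not spell out the target parameterization, so the following is only a minimal sketch under an assumption: R-CNN-style normalized offsets, computed relative to the first-frame box. The function name `movement_targets` and the `(cx, cy, w, h)` box layout are hypothetical choices made for illustration.

```python
import numpy as np

def movement_targets(gt_boxes):
    """Relative movement of each later frame's box w.r.t. the first frame.

    gt_boxes: array of shape (T, 4) holding (cx, cy, w, h) per frame.
    Returns an array of shape (T - 1, 4) holding (dx, dy, dw, dh).
    ASSUMPTION: R-CNN-style normalization, not confirmed by the poster text.
    """
    gt_boxes = np.asarray(gt_boxes, dtype=float)
    first, later = gt_boxes[0], gt_boxes[1:]
    dx = (later[:, 0] - first[0]) / first[2]   # center shift, in first-box widths
    dy = (later[:, 1] - first[1]) / first[3]   # center shift, in first-box heights
    dw = np.log(later[:, 2] / first[2])        # log width ratio
    dh = np.log(later[:, 3] / first[3])        # log height ratio
    return np.stack([dx, dy, dw, dh], axis=1)

# Two-frame example: the box moves right and down and doubles in width.
example = movement_targets([[10.0, 10.0, 4.0, 2.0],
                            [12.0, 11.0, 8.0, 2.0]])
```

Under this assumed parameterization, a stationary object yields all-zero targets, so the regressor only has to model how the object moves away from its first-frame location.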
[Figure: framework overview. Labeled components: spatial anchors (x, y, t axes), Tubelet Proposal Network (tubelet CNN, motion prediction), classification CNN, tubelet features, encoder-decoder LSTM (encoder LSTM, decoder LSTM), class label.]
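[Figure: Tubelet Generation. Iterations 1 to 5 over a video of length l; Parallel Generation.]
[Figure: Block Initialization. A layer W2, b2 (dimensions 2f and 4) made of blocks A and B initializes a layer W5, b5 (dimensions 5f and 16) that contains four copies of A and four copies of B.]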
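The Block Initialization labels (W2, b2 with dimensions 2f and 4; W5, b5 with dimensions 5f and 16; blocks A and B, each repeated four times) suggest that the regression layer for a longer temporal window is built from the blocks of a trained 2-frame layer. The sketch below shows one plausible reading of those labels, not a confirmed implementation. It assumes that A is the block acting on the first frame's features, that B acts on the target frame's features, that the weights are stored as (inputs, outputs), and that the bias is simply tiled. The function name `block_init` is hypothetical.

```python
import numpy as np

def block_init(W2, b2, w):
    """Build a w-frame regression layer from a 2-frame one.

    W2: (2f, 4) weights, rows [0:f] = block A (first frame),
        rows [f:2f] = block B (second frame).  ASSUMED layout.
    b2: (4,) bias.
    Returns Ww of shape (w*f, 4*(w-1)) and bw of shape (4*(w-1),).
    """
    two_f, out = W2.shape
    f = two_f // 2
    A, B = W2[:f], W2[f:]
    Ww = np.zeros((w * f, out * (w - 1)), dtype=W2.dtype)
    for k in range(w - 1):
        cols = slice(out * k, out * (k + 1))
        Ww[:f, cols] = A                           # first-frame features feed every output
        Ww[f * (k + 1):f * (k + 2), cols] = B      # frame k+2 features feed its own output
    bw = np.tile(b2, w - 1)                        # ASSUMPTION: bias copied per frame
    return Ww, bw

# Example with a small feature size f = 3 and window w = 5.
rng = np.random.default_rng(0)
W2, b2 = rng.standard_normal((6, 4)), rng.standard_normal(4)
W5, b5 = block_init(W2, b2, 5)   # W5 has shape (5f, 16) = (15, 16)
```

With this layout, the initialized 5-frame layer predicts for each later frame exactly what the 2-frame layer would predict from the first frame paired with that frame, which gives training a sensible starting point before the off-block zeros are learned.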
[Figure: encoder-decoder LSTM. Labeled components: tubelet features, encoder LSTM, decoder LSTM, class label.]
Object localization on YouTube-Objects (YTO) dataset
Qualitative results on ImageNet VID validation set
Results on ImageNet VID validation set
Initialization of tubelet proposal networks