Date post: | 21-Jan-2018 |
Category: | Technology |
Upload: | universitat-politecnica-de-catalunya |
T-CNN: Object Detection from Video
Kang, Kai and Ouyang, Wanli and Li, Hongsheng and Wang, Xiaogang
CVPR 2016
Slides by Andrea Ferri ([email protected])
Computer Vision Reading Group @ UPC BarcelonaTech (Spring 2016)
Summary:
• Introduction;
• Architecture:
   I. Still-Image Detection;
   II. MCS & MGP;
   III. Tubelet Re-Scoring;
• Experiments.
Introduction:
The DET (still-image) and VID (video) detection challenges are strongly different.
Applying a DET detector frame by frame to VID leads to:
→ Large temporal fluctuations in the detection scores;
→ False positives.
T-CNN stands for Tubelets with Convolutional Neural Networks, where tubelets are bounding box sequences that carry:
• Temporal information;
• Contextual information.
Architecture:
T-CNN combines current state-of-the-art components:
• Still-image object detection;
• An object tracking algorithm;
• A lot of cool tricks.
I. Still-Image Detection
The detectors used are:
• DeepID-Net (an improvement of R-CNN);
• CRAFT (an extension of Fast R-CNN).
The two use different region-proposal pre-trained models and training strategies.
II. MCS & MGP
Multi-Context Suppression
→ Sort the detection scores of all proposals in a video in descending order;
→ The classes with high-ranking scores are treated as the confident classes;
→ The scores of low-ranking classes are suppressed, while the scores of confident classes remain unchanged.
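The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `(box_id, class_id, score)` layout, and suppression by subtracting a fixed `penalty` are hypothetical choices, not taken from the slides. The default `top_ratio` reuses the 0.0003 value listed under the main parameters.

```python
def multi_context_suppression(scores, top_ratio=0.0003, penalty=0.4):
    """Suppress scores of classes that never rank highly in a video.

    scores: list of (box_id, class_id, score) for every proposal in one video.
    top_ratio: fraction of top-ranked entries that define the confident classes.
    penalty: amount subtracted from low-ranking classes (an assumed value).
    """
    # Step 1: rank all detection scores of the video in descending order.
    ranked = sorted(scores, key=lambda s: s[2], reverse=True)

    # Step 2: classes that appear among the top-ranked entries are confident.
    # Keep at least one entry so that short videos still get a confident class.
    n_top = max(1, int(len(ranked) * top_ratio))
    confident = {class_id for _, class_id, _ in ranked[:n_top]}

    # Step 3: leave confident classes untouched, suppress all the others.
    return [
        (box_id, class_id, score if class_id in confident else score - penalty)
        for box_id, class_id, score in scores
    ]
```

For example, with three detections and `top_ratio=0.34`, only the class of the single best detection stays confident, and every other class has its score lowered by `penalty`.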
Motion-Guided Propagation
→ In each frame, some objects are missed by the detector. However, detections on adjacent frames are complementary to each other;
→ Detections are propagated to adjacent frames, with optical flow guiding the propagation;
→ Propagation results in redundant boxes, which are easily handled by non-maximum suppression (NMS).
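A minimal sketch of the propagation and clean-up steps, assuming a dense flow field stored as `flow[y][x] = (dx, dy)` and boxes given as `(x1, y1, x2, y2)`. Shifting a box by the mean flow inside it, the greedy NMS variant, and the 0.5 IoU threshold are illustrative assumptions. All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def propagate_box(box, flow):
    """Shift a box to the adjacent frame by the mean optical flow inside it."""
    height, width = len(flow), len(flow[0])
    # Clamp the pixel range to the frame so indexing never goes out of bounds.
    x1 = max(0, int(round(box[0])))
    y1 = max(0, int(round(box[1])))
    x2 = min(width, int(round(box[2])))
    y2 = min(height, int(round(box[3])))
    vectors = [flow[y][x] for y in range(y1, y2) for x in range(x1, x2)]
    if not vectors:
        return tuple(box)  # nothing to average, leave the box where it is
    dx = sum(v[0] for v in vectors) / len(vectors)
    dy = sum(v[1] for v in vectors) / len(vectors)
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs, visiting boxes by descending score."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept
```

In use, each detection of a frame would be propagated to its neighbours within the propagation window, appended to their own detections with its original score, and the merged list passed through `nms`.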
III. Tubelet Re-Scoring
1. High Confidence Tracking;
2. Spatial Max Pooling;
3. Temporal Re-Scoring.
High Confidence Tracking
1 → Obtain detection results from still-image detectors;
2 → Choose high-confidence detections as starting points (anchors) for tracking;
3 → Obtain tubelets, which are bounding box sequences generated from tracking algorithms.
Spatial Max Pooling
- Still-image detection results that have large overlaps with tubelet boxes are chosen for each tubelet;
- Only the detection with the maximum detection score is kept after spatial max-pooling.
A Kalman filter is used to smooth the bounding box locations.
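The pooling step for a single tubelet box can be sketched as follows. The 0.5 overlap threshold, the `(box, score)` layout, and the function names are assumptions made for illustration. The Kalman smoothing of the resulting box sequence is not shown.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def spatial_max_pool(tubelet_box, detections, min_iou=0.5):
    """Pick the best still-image detection that overlaps a tubelet box.

    detections: list of (box, score) pairs from the same frame.
    Returns the overlapping (box, score) with the highest score,
    or None when no detection overlaps the tubelet box enough.
    """
    candidates = [d for d in detections if iou(tubelet_box, d[0]) >= min_iou]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[1])
```

Applied frame by frame, this replaces each tracked box with the strongest nearby still-image detection. A high-scoring detection elsewhere in the frame is ignored because it does not overlap the tubelet.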
Temporal Re-Scoring
• Tubelet Classification. Classify tubelets based on statistics of detection scores (mean, median, top-k). A linear classifier is learnt based on the statistics;
• Tubelet Re-scoring. Map detection scores of positive tubelets to [0.5, 1], negative ones to [0, 0.5].
A Bayesian classifier is used.
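A small sketch of the two ingredients above: the per-tubelet score statistics and the mapping of scores into the two half-ranges. The classifier itself is left out. The min-max form of the mapping, the reading of "top-k" as the mean of the k highest scores, and both function names are assumptions for illustration.

```python
import statistics


def tubelet_statistics(scores, k=3):
    """Features for tubelet classification: mean, median, mean of the top-k scores."""
    top_k = sorted(scores, reverse=True)[:k]
    return (
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mean(top_k),
    )


def rescore_tubelet(scores, is_positive):
    """Min-max map scores to [0.5, 1] for positive tubelets, [0, 0.5] otherwise."""
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # A tubelet with constant scores is placed in the middle of its range.
    unit = [(s - lo) / span if span > 0 else 0.5 for s in scores]
    offset = 0.5 if is_positive else 0.0
    return [offset + 0.5 * u for u in unit]
```

With this mapping every box of a positive tubelet outranks every box of a negative one, while the relative order of the scores inside each tubelet is preserved.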
Experiments:
• Tricky work behind the training dataset (dataset ratio DET:VID = 2:1);
• Main parameters:
   • MGP: 7 frames;
   • MCS: 0.0003 top classes of boxes.
Results:
Reference:
• Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang. T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos.
Andrea Ferri, [email protected]