
Minimizing Annotation Effort

Dr. Antonio M. López

antonio@cvc.uab.es

June 9th, 2019

ACKNOWLEDGMENTS

ICREA Academia Programme


MICINN Project TIN2017-88709-R ("DANA")

AGAUR 2017-SGR-01597

CERCA (Centres de Recerca de Catalunya)

ACCIÓ (Generalitat de Catalunya)


Divide-&-Conquer Engineering View: Modular approach (Perception → Local Maneuver)


Deep CNNs Need Annotated Data

Let’s label data for fun!


[Timeline slide, 2008-2019, milestones in the use of synthetic data:]

• 1st object detector fully trained using videogame data; domain adaptation (DA) Virtual→Real for DPM.

• Deep Learning “starts” for Computer Vision.

• TASK-CV workshop series, Transferring & Adapting Source Knowledge in Computer Vision: ECCV’14, ICCV’15, ECCV’16, ICCV’17, ECCV’18*.

• VARVAI workshops, Virtual/Augmented Reality for Visual Artificial Intelligence: ECCV’16 & ACM-MM’16.

• ’18: Computer Graphics for Autonomous Driving.

• Explosion in the use of synthetic data in Computer Vision: GTA-V, Internet models, ...

• AD Challenge @ CVPR’19.

(*) Friday, full day at room N1095ZG, VisDA Challenge.


Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving


Imitation Learning: No manual supervision


ALVINN (1988)¹ DAVE (2005)²

1. D. Pomerleau. ALVINN: An autonomous land vehicle in a neural network. NIPS, 1988.

2. Y. LeCun, U. Muller, J. Ben, E. Cosatto, and B. Flepp. Off-road obstacle avoidance through end-to-end learning. NIPS, 2005.


Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)

Still, many diverse experiences are required!

Index

• SYNTHIA: co-training object detectors

• CARLA: multimodal end-to-end driving



Self-Learning, under domain shift (source: SYNTHIA, target: real-world dataset)

[Diagram: Unlabelled real-world data → Object detector → detections used as labelled data → Self-labelled real-world data → retrain the detector.]

Basic assumption: the source-trained model is already reasonably good at detecting on target data.

Basic idea:

1. Start with a detector trained on SYNTHIA.

2. Use the detector to process the images of an unlabelled real-world dataset (e.g. KITTI).

3. Select the M images with the highest detection scores (score threshold set for high precision, low recall).

4. Use the detections and backgrounds from those M images as self-labelled real-world data.

5. Retrain the detector with the SYNTHIA data plus the self-labelled data.

6. Repeat steps 2-5 for C cycles, as sketched in the code below.
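
To make the loop concrete, here is a minimal Python sketch of the self-learning cycle above. The helpers `train(dataset)` and `detect(detector, image)` are hypothetical stand-ins for a real detection pipeline; they are not code from the talk.

```python
# Hypothetical helpers (not from the talk): `train(dataset)` returns a
# detector trained on (image, annotations) pairs; `detect(detector, image)`
# returns scored detections, each with a `.score` attribute.

def self_learning(synthia_data, unlabeled_images, M=100, C=5, score_thr=0.9):
    """Self-labelling under domain shift: SYNTHIA (source) -> e.g. KITTI (target)."""
    detector = train(synthia_data)                       # step 1
    for _ in range(C):                                   # step 6: C cycles
        # step 2: run the current detector on the unlabelled target images
        results = [(img, [d for d in detect(detector, img)
                          if d.score >= score_thr])      # high precision, low recall
                   for img in unlabeled_images]
        # step 3: keep the M images with the highest detection scores
        results.sort(key=lambda r: max((d.score for d in r[1]), default=0.0),
                     reverse=True)
        # step 4: detections (and backgrounds) of those M images become labels
        self_labelled = [(img, dets) for img, dets in results[:M] if dets]
        # step 5: retrain with SYNTHIA data plus the self-labelled data
        detector = train(synthia_data + self_labelled)
    return detector
```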

Co-Training, under domain shift (source: SYNTHIA, target: real-world dataset)

[Diagram: Object detectors #1 and #2 both process the unlabelled real-world data; the detections of each detector become self-labelled real-world data (#1, #2) used to train the other detector.]

Basic assumptions:

1. The source-trained models are reasonably good at detecting on target data.

2. The two detectors behave essentially differently.

Basic idea:

1. ~ Self-learning: one detector (#1) sends to the other (#2) the M images with the most confident detections.

2. ~ Discrepancy: from those M images, the other detector (#2) keeps only the N with the lowest confidence, N < M.

3. Parallel training.

4. Repeat steps 1-3 for C cycles, as in the sketch below.
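
A minimal sketch of the co-training loop, under the same assumptions as before. `train` and `detect` are the same hypothetical helpers; `confidence(detector, image)` is an assumed per-image score (e.g. the mean score of that detector's detections on the image).

```python
# Hypothetical helpers (not from the talk): `train`, `detect` as above;
# `confidence(detector, image)` gives a per-image detection confidence.

def co_training(synthia_data, unlabeled_images, M=100, N=20, C=5):
    det1 = train(synthia_data)   # assumption 2: the two detectors should
    det2 = train(synthia_data)   # differ, e.g. different architectures

    def exchange(sender, receiver):
        # ~ self-learning: the sender picks its M most confident images ...
        ranked = sorted(unlabeled_images,
                        key=lambda img: confidence(sender, img), reverse=True)
        candidates = ranked[:M]
        # ~ discrepancy: ... and the receiver keeps only the N (< M) it is
        # least confident about, i.e. where it has the most to learn
        candidates.sort(key=lambda img: confidence(receiver, img))
        return [(img, detect(sender, img)) for img in candidates[:N]]

    for _ in range(C):                        # step 4: C cycles
        new1 = exchange(det2, det1)           # #2 labels images for #1
        new2 = exchange(det1, det2)           # #1 labels images for #2
        det1 = train(synthia_data + new1)     # step 3: parallel training
        det2 = train(synthia_data + new2)
    return det1, det2
```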

Index

• SYNTHIA: co-training object detectors

• CARLA: multimodal end-to-end driving


Pure Data-Driven AI View & Naturalistic View: End-to-End Autonomous Driving (P&LP)

… by Imitation/demonstration (behavior cloning)


Straight? Left? Right? Nothing? The camera view alone does not tell the network which maneuver is intended.

Trajectory Planning


Branched Architecture

“End-to-End Driving via Conditional Imitation Learning”, Codevilla et al., ICRA 2018
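
As an illustration of the branched idea, here is a minimal PyTorch-style sketch: a shared image encoder plus one control head per high-level command, where the command selects which head produces the action. Layer sizes and the command list are assumptions for illustration, not the exact configuration of Codevilla et al.

```python
import torch
import torch.nn as nn

class BranchedCIL(nn.Module):
    """Conditional imitation learning: one control branch per command."""
    COMMANDS = ["follow", "left", "right", "straight"]  # illustrative set

    def __init__(self, feat_dim=512, n_actions=3):      # steer, throttle, brake
        super().__init__()
        self.backbone = nn.Sequential(                   # shared image encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        self.branches = nn.ModuleList(                   # one head per command
            nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                          nn.Linear(256, n_actions))
            for _ in self.COMMANDS
        )

    def forward(self, image, command_idx):
        # image: (B, 3, H, W); command_idx: (B,) long tensor of branch indices
        feats = self.backbone(image)
        out = torch.stack([branch(feats) for branch in self.branches], dim=1)
        # select each sample's branch according to its high-level command
        idx = command_idx.view(-1, 1, 1).expand(-1, 1, out.size(-1))
        return out.gather(1, idx).squeeze(1)             # (B, n_actions)
```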


“Monocular Depth Estimation by Learning from Heterogeneous Datasets”, A. Gurram, O. Urfalioglu, I. Halfaoui, F. Bouzaraa, A.M. López, IEEE Intelligent Vehicles Symposium, 2018

Depth ground truth: KITTI LiDAR

Semantic ground truth: Cityscapes semantic segmentation


Phase 1 – Discrete depth estimation (i.e. classification).


Phase 1 – Semantic segmentation (classification).


Phase 2 – Depth regression.
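
To make the two phases concrete, here is a minimal numpy sketch of the Phase 1 discretization: continuous LiDAR depth becomes a K-way classification target, and a predicted class maps back to a metric depth for the Phase 2 regression stage. The log-spaced binning and the helper names are assumptions for illustration, not necessarily the paper's exact scheme.

```python
import numpy as np

def depth_to_class(depth_m, k=80, d_min=1.0, d_max=80.0):
    """Quantize metric depth (meters) into one of k discrete classes
    using log-spaced bins (assumed scheme, for illustration)."""
    edges = np.geomspace(d_min, d_max, k + 1)   # log-spaced bin edges
    return np.clip(np.digitize(depth_m, edges) - 1, 0, k - 1)

def class_to_depth(cls, k=80, d_min=1.0, d_max=80.0):
    """Map a depth class back to the geometric center of its bin."""
    edges = np.geomspace(d_min, d_max, k + 1)
    centers = np.sqrt(edges[:-1] * edges[1:])
    return centers[cls]

# e.g. a LiDAR point at 23.7 m becomes a mid-range class and back:
c = depth_to_class(np.array([23.7]))
print(c, class_to_depth(c))
```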


KITTI: Training set (LiDAR ground truth) & Testing set


Quantitative results

[Table caption] Eigen et al. KITTI split. DRN: depth regression network; DC-DRN: depth regression network with a pre-trained classification network; DSC-DRN: depth regression network trained with the conditional flow approach, for depth ranges 1-80 m and 1-50 m. In Godard's approaches, "K" means training on KITTI and "CS + K" means also using Cityscapes. Bold stands for best, italics for second best.


Cityscapes Testing! (cross-domain generalization)


Photo-realistic SYNTHIA


Multimodal end-to-end driving: RGB+D multisensory / single-sensor (monocular)

Xiao et al. (arXiv:1906.03199)
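
For illustration, a minimal sketch of the simplest fusion option for RGB+D driving: early fusion, where depth is appended as a fourth input channel before the encoder. The toy encoder is an assumption; the study also considers other fusion points, and in the single-sensor setting the depth channel would come from a monocular depth network like the one above.

```python
import torch
import torch.nn as nn

class EarlyFusionEncoder(nn.Module):
    """Early fusion of RGB and depth: concatenate along channels (sketch)."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 5, stride=2), nn.ReLU(),   # 4 = RGB + depth
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )

    def forward(self, rgb, depth):
        # rgb: (B, 3, H, W); depth: (B, 1, H, W), from a depth sensor
        # (multisensory) or from monocular depth estimation (single-sensor)
        return self.net(torch.cat([rgb, depth], dim=1))
```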


Address

Edifici O, Campus UAB

08193 Bellaterra

Barcelona

Phone & Fax

Direct Line: +34 93 581 2561

Fax: +34 93 581 1670

www.cvc.uab.es

E-contact

www.cvc.uab.es/~antonio

antonio@cvc.uab.es

Dr. Antonio M. López, Principal Investigator UAB & CVC ADAS Group

In conclusion, we are lazy annotators!!!

Many Thanks!!! Questions?