Training models for road scene understanding with automated ground truth
Dan Levi
With: Noa Garnett, Ethan Fetaya, Shai Silberstein, Rafi Cohen, Shaul Oron, Uri Verner, Ariel Ayash, Kobi Horn, Vlad Goldner, Tomer Peer
GM Advanced Technical Center in Israel (ATCI)
Agenda
• Road scene understanding
• Acquiring training data with automated ground truth (AGT)
• Test cases:
- General obstacle detection
- Road segmentation
- General obstacle classification
- Curb detection
- Freespace
• Challenges and limitations
• Summary and future work
On-board road scene understanding
Static:
• Road edge
• Road markings, complex lane understanding
• Signs
• Obstacles: clutter, construction zone cones
Dynamic:
• Classified objects (cars, pedestrians, bicycles, animals …)
• General obstacles: animals, carts
Obstacle detection: general and category based
General obstacles, freespace, road segmentation
- Road segmentation (vision): mono-camera using semantic segmentation
- General obstacle detection, freespace (all non-flat road delimiters): 3D sensors (stereo, Lidar)
Training Data
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [Sun et al. 2017]
Manual Annotation
• Time: ~60 min per image
• ~1000 annotators
The Cityscapes Dataset for Semantic Urban Scene Understanding [Cordts et al. 2016]
Computer graphics simulated data
• Photo-realism
• Scenario generation
Simulated data for optical flow
FlowNet: Learning Optical Flow with Convolutional Networks [Dosovitskiy et al. 2015]
Automated ground truth (AGT) / Cross-sensor learning
Velodyne LIDAR
Depth from a single image
Semi-Supervised Deep Learning for Monocular Depth Map Prediction [Kuznietsov et al. 2017]
Map supervised road detection
Map-supervised road detection [Laddha et al. 2016]
AGT for road scene understanding – general setup
“Target” sensors
Prerequisite: Full alignment and synchronization between sensors
“Supervising” sensors
AGT for road scene understanding: scheme
“Supervising” sensors:
“Target” sensors:
Task: object detection
Data
AGT: 1. Compute task on supervising sensors:
- Offline
- Temporal
Ground truth
AGT: 2. Project output to target sensor domain
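The projection step can be illustrated with a minimal sketch, assuming the supervising sensor outputs 3D points and the target sensor is a camera described by a pinhole model. The function name `project_to_image`, the calibration parameters (`rotation`, `translation`, `fx`, `fy`, `cx`, `cy`) and the pinhole assumption itself are illustrative choices, not taken from the slides; a fisheye camera would need a different projection model.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_to_image(
    points_sensor: Sequence[Point3D],
    rotation: Sequence[Sequence[float]],  # 3x3, supervising sensor frame -> camera frame
    translation: Sequence[float],         # 3-vector, same convention
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Pixel]]:
    """Project 3D points into pixel coordinates (hypothetical pinhole camera).

    Returns None for points that lie behind the camera (z <= 0).
    """
    pixels: List[Optional[Pixel]] = []
    for p in points_sensor:
        # Rigid transform into the camera frame: p_cam = R * p + t
        x, y, z = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        if z <= 0.0:
            pixels.append(None)
            continue
        # Perspective division followed by the intrinsic parameters
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels
```

The quality of this step depends directly on the calibration and synchronization between the sensors, since any error in `rotation` or `translation` shifts the projected labels in the image.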
Automated ground truth / Cross-sensor learning
1. Solve an “easier” problem
- Run time
- Completeness
2. Promise
- Scalability
- Continuous (unbounded) improvement
1. Challenging setup
2. Annotation quality / accuracy
3. Inherent limitations of “supervisor”:
- Learning beyond supervisor capabilities
- Learning from the same sensor (bootstrapping)
AGT for General obstacle detection
“Supervising” sensors:
“Target” sensors:
Task: general obstacle detection
Data
Ground truth
AGT
Velodyne HDL64
Front camera
StixelNet: Monocular obstacle detection
Levi, Dan, Noa Garnett, Ethan Fetaya. StixelNet: A Deep Convolutional Network for Obstacle Detection and Road Segmentation. In BMVC 2015.
StixelNet column based approach
INPUT
OUTPUT
AGT for obstacle detection – version I
KITTI Dataset [Geiger et al. 2013]
Velodyne LIDAR
AGT for obstacle detection – version I
• Raw images: 56 sequences (50 train, 6 test)
• 6,000 train images (every 5th frame) and 800 test
• Ground truth result:
- After GT: 331K training columns and 57K testing
AGT for general obstacles in image plane
Figure taken from [Fernandes et al. 2015]
1. Project Lidar points to image plane
2. Interpolate depth to all pixels
3. Find columns with a depth profile typical of the transition from road to a roughly vertical obstacle
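Step 3 can be sketched as follows, under stated assumptions: along a road surface, depth grows when moving up an image column, while on a roughly vertical obstacle it stays nearly constant. The function `find_obstacle_bottom` and its thresholds (`min_road_step`, `max_obstacle_step`, `min_obstacle_rows`) are hypothetical and only illustrate the idea, not the actual AGT rule.

```python
from typing import Optional, Sequence


def find_obstacle_bottom(
    depth_column: Sequence[float],
    min_road_step: float = 0.05,      # hypothetical: minimal depth increase per row on road
    max_obstacle_step: float = 0.02,  # hypothetical: maximal depth change per row on an obstacle
    min_obstacle_rows: int = 3,       # hypothetical: rows of near-constant depth required
) -> Optional[int]:
    """Return the index of the first obstacle row in one image column.

    depth_column[0] is the bottom image row and the index grows upward.
    Returns None when no road-to-obstacle transition is found.
    """
    n = len(depth_column)
    # steps[i] = depth change when moving up from row i to row i + 1
    steps = [depth_column[i + 1] - depth_column[i] for i in range(n - 1)]
    for r in range(1, n - min_obstacle_rows + 1):
        road_below = steps[r - 1] >= min_road_step
        flat_above = all(
            abs(s) <= max_obstacle_step
            for s in steps[r:r + min_obstacle_rows - 1]
        )
        if road_below and flat_above:
            return r
    return None
```

Columns for which the function returns None carry no label under this rule, which is one way a low-coverage ground truth can arise.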
General obstacles AGT examples
Limitations:
- Cannot handle: close obstacles, ‘’clear’’ columns
- Low coverage (~30%)
StixelNet (v1) 5 Layer CNN
Figure: network architecture from INPUT to OUTPUT. Layers 1 and 2 are convolutional, with max pooling stages (8X4 and 4X3); layers 3 to 5 are fully connected (dense).
Experimental Results (Max Probability)
Stereo [Badino et al. 2009] / “StixelNet”
Comparison with stereo
AGT for Obstacle classification
“Supervising” sensors:
“Target” sensors:
Task: Obstacle classification
Data
Ground truth
AGT
Front camera
AGT for obstacle classification
Image based detection
Source: http://self-driving-future.com/the-eyes/velodyne/
Lidar based verification
Obstacle classification trained net result: pedestrians
AGT for General obstacle detection (ver 2)
“Supervising” sensors:
“Target” sensors:
Task: general obstacle detection
Data
Ground truth
AGT
Velodyne HDL64
Front camera
Unified network: StixelNet + Object detection + Object pose estimation
Noa Garnett, Shai Silberstein, Shaul Oron, Ethan Fetaya, Uri Verner, Ariel Ayash, Vlad Goldner, Rafi Cohen, Kobi Horn, Dan Levi. Real-time category-based and general obstacle detection for autonomous driving. CVRSUAD Workshop, ICCV 2017.
Object-centric obstacle detection AGT
• Estimate and subtract road plane
• 3D clustering (objects above 20cm)
• Project to image, smooth
• Bottom contour via dynamic programming
• Detect clear columns: no object above 5cm + far enough returns
• ‘’near’’ obstacles:
1. Below lidar coverage
2. During training
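The bottom contour step can be illustrated with a generic dynamic programming sketch. It assumes a per-column, per-row cost that is low where the obstacle bottom is likely, and a penalty on row jumps between neighbouring columns. The function `smooth_bottom_contour`, the cost layout and `jump_penalty` are assumptions made for illustration; the slides do not specify the cost terms actually used.

```python
from typing import List, Sequence


def smooth_bottom_contour(
    cost: Sequence[Sequence[float]],  # cost[col][row]: low where the bottom is likely
    jump_penalty: float = 1.0,        # hypothetical weight on row jumps between columns
) -> List[int]:
    """Pick one row per column minimizing unary cost plus a smoothness penalty."""
    n_cols = len(cost)
    n_rows = len(cost[0])
    # best[r] = minimal total cost of a contour ending at row r in the current column
    best = list(cost[0])
    back: List[List[int]] = []
    for c in range(1, n_cols):
        new_best: List[float] = []
        pointers: List[int] = []
        for r in range(n_rows):
            # Cheapest predecessor row in the previous column
            prev = min(
                range(n_rows),
                key=lambda q: best[q] + jump_penalty * abs(r - q),
            )
            new_best.append(best[prev] + jump_penalty * abs(r - prev) + cost[c][r])
            pointers.append(prev)
        back.append(pointers)
        best = new_best
    # Backtrack from the cheapest end row to recover the full contour
    row = min(range(n_rows), key=lambda q: best[q])
    contour = [row]
    for pointers in reversed(back):
        row = pointers[row]
        contour.append(row)
    contour.reverse()
    return contour
```

With a positive `jump_penalty`, an isolated column whose cheapest row disagrees with its neighbours is pulled back onto the neighbouring contour; with a zero penalty each column is decided independently.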
General obstacles: old vs. new AGT
New general obstacle dataset with fisheye lens camera
Dataset         #images   #instances (columns)
KITTI-train     6K        5M
Internal-train  16K       20M
KITTI-test      760       11K
Internal-test   910       19K
StixelNet2: New network architecture
Improved results on KITTI
Figure: bar chart comparing Old and New (scale 0 to 1) on KITTI max Pr., Internal max Pr., KITTI avg. Pr. and Internal avg. Pr.
Experimental results with new AGT
Figure: bar chart comparing Old and New (scale 0 to 1) on KITTI max Pr., Internal max Pr., KITTI avg. Pr. and Internal avg. Pr.
Edge cases excluded (“near”, “clear”)
Cross dataset generalization
Figure: bar chart (scale 0 to 1) of kitti-test max, kitti-test avg, internal-test max and internal-test avg for three training sets: train on KITTI, train on internal, train on both.
AGT for car pose estimation
“Supervising” sensors:
“Target” sensors:
Task: pose estimation
Data
Ground truth
AGT
IMU
AGT for pose estimation
Source: http://self-driving-future.com/the-eyes/velodyne/
Multi sensor, temporal object detection
8 orientation bins pose representation
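A minimal sketch of an 8-bin orientation encoding, assuming a yaw angle in radians as input. The function `orientation_bin` and the convention that bin 0 is centered on yaw = 0 are assumptions for illustration; the slides do not state how the bins are aligned.

```python
import math


def orientation_bin(yaw_rad: float, n_bins: int = 8) -> int:
    """Map a yaw angle (radians, any range) to one of n_bins orientation bins."""
    bin_width = 2.0 * math.pi / n_bins
    # Shift by half a bin so that bin 0 is centered on yaw = 0 (assumed convention)
    shifted = (yaw_rad + bin_width / 2.0) % (2.0 * math.pi)
    # The final modulo guards against floating-point rounding at the wrap-around
    return int(shifted // bin_width) % n_bins
```

With 8 bins each bin spans 45 degrees, so a yaw of 20 degrees falls in bin 0 and a yaw of 25 degrees in bin 1 under this convention.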
Dynamic / Static
Pose estimation trained with mixed AGT and manual annotation
AGT for Curb detection
Curb detection trained net result examples
AGT for freespace
“Supervising” sensors:
“Target” sensors:
Task: freespace
Data
Ground truth
AGT
AGT for freespace with 3D beams
Analyze single Lidar “Beam”
Estimate and subtract road plane
Project freespace limit to image plane, find ‘’near’’ and ‘’clear’’
Project limit to ground plane
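The per-beam analysis can be sketched as below, assuming a sensor frame with x forward, y left and z up, a road plane given as coefficients (a, b, c, d) of a*x + b*y + c*z + d = 0 with the normal pointing up, and a height threshold above which a return counts as a freespace delimiter. The names `height_above_plane` and `freespace_limit` and the threshold value are hypothetical; the road plane estimation itself is not shown.

```python
import math
from typing import Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (a, b, c, d): a*x + b*y + c*z + d = 0


def height_above_plane(point: Point3D, plane: Plane) -> float:
    """Signed distance of a point from the road plane (subtracting the road plane)."""
    a, b, c, d = plane
    norm = math.sqrt(a * a + b * b + c * c)
    return (a * point[0] + b * point[1] + c * point[2] + d) / norm


def freespace_limit(
    beam_points: Sequence[Point3D],
    plane: Plane,
    obstacle_height: float = 0.2,  # hypothetical threshold in meters
) -> Optional[float]:
    """Ground range of the nearest return in one beam that rises above the road.

    Returns None when every return in the beam stays close to the road plane.
    """
    limit: Optional[float] = None
    for p in beam_points:
        if height_above_plane(p, plane) > obstacle_height:
            ground_range = math.hypot(p[0], p[1])
            if limit is None or ground_range < limit:
                limit = ground_range
    return limit
```

The returned range lies along the beam direction on the ground plane; projecting that limit point into the image would use the camera calibration.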
Velodyne
Velodyne scan direction
Obstacles vs. Freespace AGT
Freespace + object detection + car 3D pose
Finetuning from AGT: road segmentation
1. Fine-tune on KITTI Road segmentation (manually labelled)
2. Graph-cut segmentation
3. State-of-the-art accuracy among non-anonymous submissions (94.88% MaxF)
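MaxF is the maximum F1-measure over the classifier's confidence thresholds. A minimal sketch of that computation for per-pixel road scores follows; the function `max_f_measure`, the flat list inputs and the choice of thresholds are illustrative assumptions and not the benchmark's evaluation code.

```python
from typing import Sequence


def max_f_measure(
    scores: Sequence[float],      # predicted road confidence per pixel
    labels: Sequence[bool],       # True where the pixel is road in the ground truth
    thresholds: Sequence[float],  # candidate decision thresholds (hypothetical choice)
) -> float:
    """Maximum F1 = 2PR / (P + R) over the given thresholds."""
    best = 0.0
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2.0 * precision * recall / (precision + recall))
    return best
```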
AGT challenges: How accurate is the AGT?
AGT challenges: calibration, synchronization
AGT Perception mistakes
Non-flat road
Assumptions / coverage
Thank you!