Date post: 18-Mar-2020

PIXOR: Real-time 3D Object Detection from Point Clouds

Bin Yang, Wenjie Luo, Raquel Urtasun

Uber Advanced Technologies Group, University of Toronto

Summary

- 3D object detection is crucial for autonomous driving.
- LIDAR data is widely used for accurate 3D perception.
- Most LIDAR-based 3D detectors run slowly, either because of the 3D LIDAR representation or because of a two-stage, proposal-based detection framework.
- Approach: a single-shot, proposal-free detector that operates on a bird's eye view (BEV) LIDAR representation.
- Performance: state-of-the-art 3D object detection (1st on KITTI) with real-time speed (~28 FPS).

BEV Car Detection on KITTI

- Dataset: 7,481 frames for training; 7,518 frames for testing.
- Input: X [0, 70 m], Y [-40 m, 40 m], 0.1 m resolution
- Runtime breakdown on a TITAN Xp GPU: 35 ms = 1 ms voxelization + 31 ms network + 3 ms NMS
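The runtime and input figures above fix the frame rate and the BEV grid size by simple arithmetic. The following is a minimal sketch of that calculation; the variable names are illustrative and do not come from the poster.

```python
# Runtime breakdown quoted on the poster, in milliseconds.
stages_ms = {"voxelization": 1, "network": 31, "nms": 3}
total_ms = sum(stages_ms.values())
fps = 1000.0 / total_ms  # frames per second implied by the total latency

# BEV grid implied by the KITTI input range and resolution.
x_range_m = (0.0, 70.0)
y_range_m = (-40.0, 40.0)
resolution_m = 0.1
cells_x = round((x_range_m[1] - x_range_m[0]) / resolution_m)
cells_y = round((y_range_m[1] - y_range_m[0]) / resolution_m)

print(total_ms, round(fps, 1), cells_x, cells_y)
```

The 35 ms total corresponds to roughly 28.6 frames per second, in line with the ~28 FPS quoted in the summary. The grid works out to 700 by 800 cells. The network diagram lists 704 rather than 700 along one axis, which would be consistent with padding to a multiple of 16, although the poster does not say so.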

Conclusion

- 3D detection can be accurate and real-time at the same time!

LIDAR Representation

- BEV voxelization: height as channels
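To make "height as channels" concrete, here is a minimal sketch of a sparse BEV voxelization in plain Python. It is not the authors' implementation. The height range `z_range`, the vertical resolution `z_res`, and the choice of a mean intensity per ground cell are all assumptions, since the poster gives only the X/Y extent and the ground-plane resolution.

```python
import math
from collections import defaultdict

def voxelize_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                 z_range=(-2.5, 1.0), xy_res=0.1, z_res=0.1):
    """Sparse BEV voxelization in which height slices act as channels.

    points: iterable of (x, y, z, intensity) tuples, coordinates in metres.
    z_range and z_res are hypothetical values, not taken from the poster.
    Returns (occupancy, intensity):
      occupancy: set of (ix, iy, iz) cells that contain at least one point
      intensity: dict (ix, iy) -> mean intensity of the points in that column
    """
    occupancy = set()
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z, refl in points:
        in_roi = (x_range[0] <= x < x_range[1]
                  and y_range[0] <= y < y_range[1]
                  and z_range[0] <= z < z_range[1])
        if not in_roi:
            continue  # drop points outside the region of interest
        ix = int(math.floor((x - x_range[0]) / xy_res))
        iy = int(math.floor((y - y_range[0]) / xy_res))
        iz = int(math.floor((z - z_range[0]) / z_res))  # height slice = channel
        occupancy.add((ix, iy, iz))
        sums[(ix, iy)] += refl
        counts[(ix, iy)] += 1
    intensity = {cell: sums[cell] / counts[cell] for cell in sums}
    return occupancy, intensity

cloud = [
    (10.02, 0.03, -1.03, 0.2),
    (10.04, 0.01, -1.01, 0.4),  # same voxel as the first point
    (10.03, 0.02, 0.52, 0.6),   # same ground cell, higher slice
    (80.0, 0.0, 0.0, 0.9),      # beyond the X range, discarded
]
occ, inten = voxelize_bev(cloud)
print(sorted(occ), inten)
```

A dense tensor for a network would be filled from `occ` and `inten`. The sparse form is used here only to keep the example small.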

Method      Data   Time (ms)  AP_mod.  AP_easy  AP_hard
3D FCN      LIDAR  >5000      62.54    69.54    55.94
MV3D        LIDAR  240        77.00    85.82    68.94
VxNet       LIDAR  225        79.26    89.35    77.39
NVLidarNet  LIDAR  100        80.04    84.44    74.31
PIXOR       LIDAR  35         81.92    87.25    76.01

Network Architecture

Detection Loss

- Object parameterization: {cos2θ, sin2θ, dx, dy, log(W), log(L)}
- Multi-task loss: focal loss + smooth L1 loss
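The multi-task loss can be sketched per pixel as follows. This is a minimal illustration, not the poster's exact formulation. The focal loss defaults `alpha=0.25` and `gamma=2.0`, the unweighted sum of the two terms, and the choice to apply the regression term only to positive pixels are all assumptions, as the poster does not state them.

```python
import math

def focal_loss(p, label, alpha=0.25, gamma=2.0):
    """Binary focal loss for one pixel; p is the predicted vehicle probability."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)  # clamp to avoid log(0)
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def smooth_l1(pred, target):
    """Smooth L1: quadratic for small errors, linear for large ones."""
    d = abs(pred - target)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def detection_loss(pixels):
    """pixels: list of (p, label, pred_box, target_box).

    Boxes are 6-tuples for positive pixels and None for negatives.
    Assumption: regression is only supervised on positive pixels.
    """
    cls = sum(focal_loss(p, y) for p, y, _, _ in pixels)
    reg = sum(smooth_l1(a, b)
              for _, y, pred_box, target_box in pixels if y == 1
              for a, b in zip(pred_box, target_box))
    return cls + reg

pixels = [
    (0.9, 1, (0.0, 0.0, 0.0, 0.0, 0.0, 0.0), (0.5, 0.0, 0.0, 0.0, 0.0, 0.0)),
    (0.1, 0, None, None),
]
total = detection_loss(pixels)
print(total)
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of confidently classified pixels. That matters for a dense output map in which most pixels are background.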

BEV Car Detection on TOR4D

- TOR4D: a large-scale 3D object detection benchmark collected at Uber ATG with over 1 million frames.
- Training/validation/testing sets: 5000/500/1000 video sequences
- Input: X [-100 m, 100 m], Y [-40 m, 40 m], 0.2 m resolution
- Inference time: 24 ms (network) on a 1080 Ti GPU

Network architecture diagram (layer labels):

Input: 800×704×23

Backbone
- 3×3, 32
- 3×3, 32
- Res_block_2: 24-24-96, /2, #3
- 1×1, 196
- Res_block_3: 48-48-192, /2, #6
- Res_block_4: 64-64-256, /2, #6
- Res_block_5: 96-96-384, /2, #3
- Upsample_6: 128, ×2
- Upsample_7: 96, ×2
- Upsample detail: Deconv 3×3, 128, ×2 + Conv 1×1, 128

Header
- 3×3, 96
- 3×3, 96
- 3×3, 96
- 3×3, 96
- 3×3, 1 (output 200×176×1)
- 3×3, 6 (output 200×176×6)

- ResNet backbone with FPN multi-scale feature fusion.
- Fully-convolutional header shared by classification and regression tasks.
- Output pixel-wise dense predictions.
- No pre-trained weights used.
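The stride labels in the architecture diagram ("/2" on each residual block, "×2" on each upsample stage) account for the 200×176 output maps given an 800×704 input. The sketch below traces only the spatial size. It assumes the initial 3×3 convolutions and the header preserve resolution, which the poster does not state, and the function name `trace_shapes` is illustrative.

```python
def trace_shapes(height=800, width=704):
    """Follow the spatial size through the labelled backbone stages."""
    shapes = [("input", height, width)]
    h, w = height, width
    for name in ("Res_block_2", "Res_block_3", "Res_block_4", "Res_block_5"):
        h, w = h // 2, w // 2  # each residual block is labelled "/2"
        shapes.append((name, h, w))
    for name in ("Upsample_6", "Upsample_7"):
        h, w = h * 2, w * 2    # each upsample stage is labelled "x2"
        shapes.append((name, h, w))
    return shapes

for name, h, w in trace_shapes():
    print(f"{name:12s} {h} x {w}")
```

Four halvings followed by two doublings give an overall 4× reduction, so each output pixel corresponds to a 4×4 block of input cells.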

[Figure: object parameterization, showing the offsets dx and dy, the heading angle θ, and the vehicle heading.]

- Rescaled version of ground-truth box with factor ρ_pos. Pixels inside are positive.
- Rescaled version of ground-truth box with factor ρ_neg. Pixels outside are negative.
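The pixel labelling rule and the regression targets can be sketched together. Several choices here are assumptions rather than facts from the poster: "cos2θ, sin2θ" is read as cos(2θ) and sin(2θ), dx and dy are taken as the offset from the pixel to the box centre in metres, pixels between the two rescaled boxes are ignored, and the values `rho_pos=0.3` and `rho_neg=1.2` are illustrative only.

```python
import math

def label_pixel(px, py, box, rho_pos=0.3, rho_neg=1.2):
    """Return 1 (positive), 0 (negative) or None (ignored) for a pixel centre.

    box = (cx, cy, w, l, theta), in metres and radians.
    rho_pos and rho_neg are hypothetical scale factors.
    """
    cx, cy, w, l, theta = box
    ox, oy = px - cx, py - cy
    # Rotate the offset into the box frame; "along" follows the heading.
    along = ox * math.cos(theta) + oy * math.sin(theta)
    across = -ox * math.sin(theta) + oy * math.cos(theta)

    def inside(rho):
        return abs(along) <= rho * l / 2 and abs(across) <= rho * w / 2

    if inside(rho_pos):
        return 1
    if not inside(rho_neg):
        return 0
    return None  # between the two rescaled boxes

def encode_target(px, py, box):
    """Six regression targets for a positive pixel (assumed conventions)."""
    cx, cy, w, l, theta = box
    return (math.cos(2 * theta), math.sin(2 * theta),
            cx - px, cy - py, math.log(w), math.log(l))

def decode_target(px, py, target):
    """Invert encode_target; the heading is recovered modulo pi."""
    c2, s2, dx, dy, log_w, log_l = target
    theta = 0.5 * math.atan2(s2, c2)
    return (px + dx, py + dy, math.exp(log_w), math.exp(log_l), theta)

box = (20.0, 5.0, 1.8, 4.5, math.radians(30))
centre_label = label_pixel(20.0, 5.0, box)
far_label = label_pixel(30.0, 5.0, box)
decoded = decode_target(19.0, 4.0, encode_target(19.0, 4.0, box))
print(centre_label, far_label, decoded)
```

Under the cos(2θ), sin(2θ) reading, a box and its 180° flip encode to the same target, so the decoded heading is only defined up to that flip.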

[Figure: a 3D LIDAR point cloud is voxelized into the BEV LIDAR representation, with occupancy and intensity channels.]

[Figure: plot comparing detectors: AVOD, F-PointNet, NVLidarNet, VxNet, MV3D, PIXOR.]
