Ben Eckart, NVIDIA Research, Learning and Perception Group, 3/20/2019 GPU-ACCELERATED 3D POINT CLOUD PROCESSING WITH HIERARCHICAL GAUSSIAN MIXTURES
3D POINT CLOUD DATA

Basic data type for unstructured 3D data

Emergence of commercial depth sensors has made it ubiquitous

POINT CLOUD PROCESSING CHALLENGES

Points are non-differentiable, non-probabilistic

Large amounts of often noisy data

Often spatially redundant, with wide-ranging variance in density

PREVIOUS APPROACHES

What have people done before?

Discrete Approaches

Voxel Grids/Lists, Octrees, TSDFs

Though efficient, they inherit the same non-differentiable, non-probabilistic problems as point clouds

OctoMap

PREVIOUS APPROACHES

What have people done before?

Continuous Approaches

Gaussian Mixture Models, Gaussian Processes

Though theoretically attractive, in practice tend to be too slow for many applications

GMM Gaussian Process

Proposal: Hierarchical Gaussian Mixture

[Figure: tree of GMM nodes, each with J = 8, with levels labeled “Level 2” GMM, “Level 3”, and “Level 4”]

Goals:

Efficiency benefits of hierarchical structures like octrees

Theoretical benefits of a probabilistic generative model

Talk Overview

• Background
– Theory of generative modeling for point clouds

• Single-Layer Model (GMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Compact and Data-Parallel
– Limitations: Scaling with model size, lack of memory coherence

• Hierarchical Models (HGMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Fast and Parallelizable on GPU
– Application: Registration

STATISTICAL / GENERATIVE MODELS

Interpret point cloud data (PCD) as an iid sampling of some unknown latent spatial probabilistic function

Generative property: Full joint probability space is represented

Model

Modeling as an MLE Optimization

• Given a parametric model, find the parameter values that best “explain” the data (Maximum Data Likelihood)

[Figure labels: Model, Data]
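The optimization described on this slide can be written compactly. As a sketch, assuming the points $z_i$ in the cloud $Z$ are iid samples and writing $\Theta$ for the model parameters (notation chosen here to match the later EM slides, not taken from this slide):

```latex
\hat{\Theta}
  = \arg\max_{\Theta} \; p(Z \mid \Theta)
  = \arg\max_{\Theta} \; \sum_{i=1}^{N} \ln p(z_i \mid \Theta)
```

The second form follows from the iid assumption: the joint likelihood factors into a product over points, and taking the logarithm turns that product into a sum without moving the maximizer.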

Parametric Model as a Modified GMM

Interpret point cloud data as an iid sampling from a small number (J << N) of Gaussian and Uniform Distributions:
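The formula that followed this sentence on the slide did not survive extraction. One common way to write such a model, offered as an assumption rather than as the slide's exact notation, is a mixture of $J$ Gaussians plus one uniform component that absorbs noise and outliers:

```latex
p(z_i \mid \Theta)
  = \sum_{j=1}^{J} \pi_j \, \mathcal{N}(z_i \mid \mu_j, \Sigma_j)
  + \pi_{J+1} \, \mathcal{U}(z_i),
\qquad
\sum_{j=1}^{J+1} \pi_j = 1
```

Here $\pi_j$ are the mixing weights, $\mu_j$ and $\Sigma_j$ are the mean and covariance of the $j$-th Gaussian, and $\mathcal{U}$ is a uniform density over the bounding volume of the data. The symbols $\pi_j$ and $\mathcal{U}$ are illustrative names.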

GMM for Point Clouds: Intuition

Point samples representing pieces of the same local geometry could be aggregated into clusters with the local geometry encoded inside the covariance of that cluster.

SOLVING FOR THE MLE GMM PARAMETERS

Typically done via the Expectation Maximization (EM) Algorithm

[Figure: EM Algorithm loop. Inputs are the point cloud and Θ_init. The E Step updates point-cluster associations, the M Step updates Θ, and the loop outputs Θ_final.]

E Step: A Single Point

For each point z_i, we want to find the relative likelihood (expectation) of its having been generated by each cluster

[Figure: point cloud Z with a single point z_i highlighted; annotation 𝑂(𝑁)]

E Step: Expectation Vector

We calculate the probability of each point with respect to each of the J Gaussian clusters. The expected associations are denoted by the NxJ matrix γ

[Figure: point cloud Z with point z_i and its J cluster associations; annotation 𝑂(𝐽)]
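The E step described here can be sketched in a few lines. This is a minimal CPU illustration, not the CUDA implementation from the talk: it assumes axis-aligned (diagonal) covariances to stay short, and the function names `gaussian_pdf_diag` and `e_step` are hypothetical.

```python
import math


def gaussian_pdf_diag(z, mean, var):
    """Density of a 3D Gaussian with diagonal covariance (assumption) at point z."""
    norm = 1.0
    quad = 0.0
    for zk, mk, vk in zip(z, mean, var):
        norm *= 2.0 * math.pi * vk
        quad += (zk - mk) ** 2 / vk
    return math.exp(-0.5 * quad) / math.sqrt(norm)


def e_step(points, weights, means, variances):
    """Return the N x J matrix gamma of expected point-cluster associations."""
    gamma = []
    for z in points:
        # Weighted likelihood of z under each of the J clusters: O(J) per point.
        lik = [w * gaussian_pdf_diag(z, m, v)
               for w, m, v in zip(weights, means, variances)]
        total = sum(lik)
        if total == 0.0:
            # Every likelihood underflowed; fall back to a uniform row.
            gamma.append([1.0 / len(lik)] * len(lik))
        else:
            # Normalize so each row sums to one.
            gamma.append([l / total for l in lik])
    return gamma
```

Each row of `gamma` depends only on its own point, which is what makes the step data-parallel: one thread per point is a natural mapping.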

M STEP: CLOSED FORM WEIGHTED SUMS

For the GMM case, the M Step has closed-form solutions given the NxJ matrix γ:

“Probabilistic generalization of K-Means Clustering”
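The closed-form solutions themselves are missing from the transcript. For a plain GMM (ignoring any uniform noise component), the standard M-step updates are the following weighted sums; the symbols $N_j$, $\pi_j$, $\mu_j$, $\Sigma_j$ are conventional names assumed here:

```latex
N_j = \sum_{i=1}^{N} \gamma_{ij},
\qquad
\pi_j = \frac{N_j}{N},
\qquad
\mu_j = \frac{1}{N_j} \sum_{i=1}^{N} \gamma_{ij}\, z_i,
\qquad
\Sigma_j = \frac{1}{N_j} \sum_{i=1}^{N} \gamma_{ij}\,
           (z_i - \mu_j)(z_i - \mu_j)^{\top}
```

If every row of $\gamma$ is forced to be a hard 0/1 assignment, the update for $\mu_j$ becomes the K-Means centroid update, which is the sense in which EM for GMMs generalizes K-Means.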

GPU Data Parallelism

GMM Model Limitations

• Each point needs to access all J cluster parameters in CUDA (poor memory locality and linear scaling with J)

• The NxJ expectation matrix is mostly sparse (thus wasted computation)

• The number of Gaussians is static and must be set a priori

[Figure annotation: each point z_i costs 𝑂(𝐽)]

HIERARCHICAL GAUSSIAN MIXTURE

Suppose we restrict J to be only 8 Gaussians

The model would then fit entirely in shared memory for each CUDA threadblock, removing the need for global memory accesses

The expectation matrix will be dense (Nx8)

HIERARCHICAL GAUSSIAN MIXTURE

After convergence of the J=8 GMM, we can use the Nx8 expectation matrix as a partition function

Each point is partitioned via its maximum expectation

Now we have 8 partitions of roughly size N/8
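Partitioning by maximum expectation amounts to a row-wise argmax over the expectation matrix. A minimal sketch, with the hypothetical name `partition_by_max_expectation` and a plain list-of-lists standing in for the Nx8 matrix:

```python
def partition_by_max_expectation(gamma):
    """Map each point (row of gamma) to the index of its most likely cluster.

    Ties are broken in favor of the lowest cluster index.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in gamma]


# Three points, three clusters (a small stand-in for the Nx8 case).
labels = partition_by_max_expectation([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.6, 0.1],
])
```

The "roughly N/8" partition size holds only when the eight clusters attract similar numbers of points; nothing in the argmax itself enforces balance.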

HIERARCHICAL GAUSSIAN MIXTURE

We can now run the algorithm recursively on each partition

Each partition contains ~N/8 points that will be modeled as another J=8 GMM

Note that this will produce 64 clusters in total

PARALLEL PARTITIONING USING CUDA

Given each point's max expectation and associated cluster index, we can "invert" this index using parallel scans to group together the point IDs that share a partition number:

[0 0 1 0 1 1 1 2 0 2 2 2] ➔ [[0 1 3 8] [2 4 5 6] [7 9 10 11]]

Now we can run a 2D CUDA kernel where

Dimension 1: index into original point cloud

Dimension 2: cluster of the parent

e.g. 3 clusters, 12 points, 2 threads/threadblock ➔ grid size of (2, 3)

Cluster 1 Cluster 2 Cluster 3
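The index inversion on this slide can be reproduced sequentially with a histogram, an exclusive prefix sum (the scan), and a scatter. This is a CPU sketch of the idea, not the parallel CUDA version; `invert_partition` and `grid_size` are hypothetical names, and sizing the grid by the largest partition is an assumption.

```python
from itertools import accumulate


def invert_partition(labels, num_clusters):
    """Group point IDs by cluster label using count, exclusive scan, scatter."""
    # 1. Histogram: how many points fall in each cluster.
    counts = [0] * num_clusters
    for c in labels:
        counts[c] += 1
    # 2. Exclusive scan: start offset of each cluster's segment.
    offsets = [0] + list(accumulate(counts))[:-1]
    # 3. Scatter each point ID into its cluster's segment, preserving order.
    cursor = list(offsets)
    order = [0] * len(labels)
    for point_id, c in enumerate(labels):
        order[cursor[c]] = point_id
        cursor[c] += 1
    return [order[offsets[c]:offsets[c] + counts[c]]
            for c in range(num_clusters)]


def grid_size(groups, threads_per_block):
    """2D grid: (blocks needed for the largest partition, number of clusters)."""
    largest = max(len(g) for g in groups)
    blocks = -(-largest // threads_per_block)  # ceiling division
    return (blocks, len(groups))


groups = invert_partition([0, 0, 1, 0, 1, 1, 1, 2, 0, 2, 2, 2], 3)
grid = grid_size(groups, threads_per_block=2)
```

With the slide's input, `groups` is `[[0, 1, 3, 8], [2, 4, 5, 6], [7, 9, 10, 11]]` and `grid` is `(2, 3)`, matching the slide's 3-cluster, 12-point, 2-threads-per-block example.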

HGMM COMPLEXITY

Even though we now have 64 clusters, we only need to query 8 clusters for each point (avoiding the computation of all NxJ (sparse) expectations)

Due to the 2D CUDA grid and indexing structure, this segmentation of the points into 64 clusters has exactly the same complexity and speed as the original "simple" J=8 GMM.

Thus, each additional level increases the complexity of the model eightfold while incurring only a linear time penalty in the number of levels
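The scaling argument can be checked with simple arithmetic. The sketch below assumes a fixed branching factor of 8, as on the slides, and counts per-point cluster evaluations; the function name `hgmm_counts` is hypothetical.

```python
def hgmm_counts(levels, branching=8):
    """Return (leaf clusters, hierarchical evaluations, flat evaluations) per point."""
    leaves = branching ** levels
    # Hierarchical: 8 evaluations at each level, summed over all levels.
    hierarchical = branching * levels
    # Flat GMM of the same size: every cluster is evaluated for every point.
    flat = leaves
    return leaves, hierarchical, flat


for level in range(1, 5):
    print(level, hgmm_counts(level))
```

At two levels this gives 64 leaf clusters for 16 cumulative evaluations per point (8 per level), against 64 for a flat model; at four levels, 4096 clusters for 32 evaluations.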

HGMM ALGORITHM

Small EM algorithms (8 clusters at a time) are recursively performed on increasingly smaller partitions of the point cloud data

E Step: Associate points to clusters

M Step: Update mixture means, covariances, and weights

Partition Step: Before each recursion step, new point partitions are determined by maximum likelihood point-cluster associations from last E Step
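The recursion described above can be illustrated on a toy problem. To keep the sketch short it uses 1-D points and J=2 instead of 3-D points and J=8, and runs sequentially on the CPU; those simplifications, and the names `em_1d` and `build_hgmm`, are assumptions for illustration only.

```python
import math


def em_1d(xs, J=2, iters=20):
    """Small EM for a 1-D GMM. Returns (weights, means, variances) and the last gamma."""
    lo, hi = min(xs), max(xs)
    means = [lo + (hi - lo) * (j + 0.5) / J for j in range(J)]
    var = [max((hi - lo) ** 2 / (4.0 * J * J), 1e-6)] * J
    weights = [1.0 / J] * J
    gamma = []
    for _ in range(iters):
        # E step: associate points to clusters.
        gamma = []
        for x in xs:
            lik = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                   for w, m, v in zip(weights, means, var)]
            s = sum(lik) or 1.0
            gamma.append([l / s for l in lik])
        # M step: update weights, means, and variances from weighted sums.
        for j in range(J):
            nj = sum(g[j] for g in gamma)
            if nj < 1e-12:
                continue  # leave an empty cluster unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(g[j] * x for g, x in zip(gamma, xs)) / nj
            var[j] = max(sum(g[j] * (x - means[j]) ** 2
                             for g, x in zip(gamma, xs)) / nj, 1e-6)
    return (weights, means, var), gamma


def build_hgmm(xs, depth, J=2, min_points=4):
    """Fit a small GMM, partition by max expectation, and recurse on each partition."""
    params, gamma = em_1d(xs, J)
    node = {"params": params, "children": [None] * J}
    if depth > 1:
        # Partition step: use the associations from the last E step.
        parts = [[] for _ in range(J)]
        for x, g in zip(xs, gamma):
            parts[max(range(J), key=g.__getitem__)].append(x)
        for j, part in enumerate(parts):
            if len(part) >= min_points:
                node["children"][j] = build_hgmm(part, depth - 1, J, min_points)
    return node


tree = build_hgmm([0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3], depth=2)
```

The `min_points` cutoff is a design choice of this sketch: it stops the recursion when a partition is too small to support another mixture.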

HGMM DATA STRUCTURE

[Figure: tree of GMM nodes, each with J = 8, with levels labeled “Level 2” GMM, “Level 3”, and “Level 4”]

Efficiency benefits of hierarchical structures like octrees

Theoretical benefits of a probabilistic generative model

E Step Performance

Compactness vs Fidelity

COMPACTNESS VS FIDELITY

Reconstruction Error (PSNR) vs Model Size (kB)

20 kB

MODELING LARGE POINT CLOUDS

HGMM Level 6: <12 MB

Volume created from stochastically sampled Marching Cubes

Visualization is real-time: ~20 fps on Titan X

Endeavor Snapshots: ~80 GB of Point Cloud Data each

ENDEAVOR DATA: BILLIONS OF POINTS

APPLICATION: RIGID REGISTRATION

Point-sampled surfaces displaced by some rigid transformation

Recover the translation and rotation that best overlap the point clouds

Registration as EM with HGMM

MLE over Space of Rotations, Translations

Goal: Maximize data likelihood over T given some probability model θ
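The goal on this slide can be stated as an optimization over rigid transformations. As a sketch, assuming $T$ acts on a point as a rotation followed by a translation and that the model $\theta$ is a mixture with weights $\pi_j$, means $\mu_j$, and covariances $\Sigma_j$ (symbols assumed here):

```latex
\hat{T}
  = \arg\max_{T} \; p\big(T(Z) \mid \theta\big)
  = \arg\max_{T} \; \sum_{i=1}^{N} \ln \sum_{j=1}^{J}
      \pi_j \, \mathcal{N}\big(T(z_i) \mid \mu_j, \Sigma_j\big),
\qquad
T(z) = R z + t, \quad R \in SO(3), \; t \in \mathbb{R}^{3}
```

The sum inside the logarithm is what makes direct maximization awkward, and it is the usual motivation for introducing latent point-cluster associations and solving with EM.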

Outdoor Urban Velodyne Data

• Velodyne VLP-16

– ~15k pts/frame

– ~10 frames/sec

• Frame-to-Frame model-building and registration with overlap estimation

HGMM-Based Registration

• Average Frame-to-Frame Error: 0.0960

Robust Point-to-Plane ICP

• Average Frame-to-Frame Error: 0.1519

• best result on libpointmatcher

Speed vs Accuracy Trade-Off

Test: Random transformations of point cloud pairs while varying the subsampling rate.

Less subsampling yields better accuracy, but slower speeds.

Bottom left is fastest and most accurate.

Our proposed methods are red/teal/black.

HGMM COMING TO ISAAC

~350 fps on Titan Xp

~30 fps on Xavier

Error: ~0.05° yaw (median, 4 Hz updates)

DRIVEWORKS (Future Release)

With Velodyne HDL-64E:

~300 FPS on Titan Xp

~30 FPS on Xavier

DNN-BASED STEREO DEPTH MAPS

FINAL REMARKS

HGMMs have many nice properties for modeling point clouds:

Efficient: Fast to compute via CUDA/GPU, even scaling to billions of points

Multi-Level: Can model the data distribution well at multiple levels simultaneously

Probabilistic: allows Bayesian optimization for applications like registration

Compact and Continuous: no voxels and no aliasing artifacts, easy to transform

QUESTIONS?

REGISTRATION FROM DNN-BASED STEREO

Noisy point cloud output is well-suited for HGMM representation

Stanford Lounge Dataset (Kinect)

Frame-to-frame registration from point cloud data only (no depth maps), subsampled to 2000 points, first 100 frames. Histograms of average Euler angle error per frame shown.

[Figure: error histograms for GMM-Based, ICP-Based, and Proposed methods]

Noise Handling

• Test: Random (uniform) noise injected at increasing amounts

• Result: Mixture components “stick” to geometrically coherent, dense areas, disregarding areas of noise

SAMPLING FOR PROBABILISTIC OCCUPANCY

$\hat{p} = L_{\Sigma}\, p + \mu \quad \forall\, \mu, \Sigma \in \Theta$
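The sampling rule above maps a standard-normal draw p to a sample from one mixture component. The sketch below assumes that L_Σ denotes the lower-triangular Cholesky factor of Σ (the slide does not say which matrix square root is meant); `cholesky3` and `sample_gaussian` are hypothetical names.

```python
import math
import random


def cholesky3(S):
    """Lower-triangular L with L L^T = S, for a symmetric positive-definite 3x3 S."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def sample_gaussian(mu, S, rng):
    """Draw one sample: p ~ N(0, I), then return L p + mu."""
    L = cholesky3(S)
    p = [rng.gauss(0.0, 1.0) for _ in range(3)]
    return [sum(L[i][k] * p[k] for k in range(3)) + mu[i] for i in range(3)]


rng = random.Random(0)
sample = sample_gaussian([1.0, 2.0, 3.0],
                         [[4.0, 2.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 2.0]],
                         rng)
```

Repeating the draw for each (μ, Σ) in Θ, in proportion to the mixture weights, gives a resampled point set from the model.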

MESHING UNDER NOISE

ADAPTIVE MULTI-SCALE

MULTI-SCALE MODELING

Multilevel cross-sections can be adaptively chosen for robustness

E Step: Parallelized Tree Search

Point-model associations are found through parallelized adaptive tree search in CUDA.

Complexity() is defined to be [expression not recovered from the slide], but other suitable heuristics are possible.

Adaptive Thresholding Finds the Most Appropriate Scale to Associate Point Data to the Point Cloud Model
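The per-point descent can be sketched as a loop that follows the most likely child at each level and stops once a node is judged simple enough. Because the slide's definition of Complexity() is missing, the sketch takes `complexity` and `likelihood` as caller-supplied functions; those parameters, the `threshold` stopping rule, and the dictionary node layout are all assumptions.

```python
def adaptive_tree_search(z, root, likelihood, complexity, threshold):
    """Return the node that point z is associated with.

    Descends while the current node has children, choosing the child with the
    highest likelihood for z, and stops early when the chosen node's
    complexity is at or below the threshold (assumed stopping rule).
    """
    node = root
    while node["children"]:
        node = max(node["children"], key=lambda child: likelihood(z, child))
        if complexity(node) <= threshold:
            break
    return node


# Toy 1-D tree: likelihood is negative distance to the node mean,
# complexity is a stored "spread" value (both stand-ins).
def leaf(mean, spread):
    return {"mean": mean, "spread": spread, "children": []}


a = {"mean": 0.0, "spread": 5.0, "children": [leaf(-1.0, 0.1), leaf(1.0, 0.1)]}
b = {"mean": 10.0, "spread": 0.5, "children": [leaf(9.0, 0.1), leaf(11.0, 0.1)]}
root = {"mean": 5.0, "spread": 9.0, "children": [a, b]}

near = lambda z, n: -abs(z - n["mean"])
spread = lambda n: n["spread"]

fine = adaptive_tree_search(0.8, root, near, spread, threshold=1.0)
coarse = adaptive_tree_search(9.7, root, near, spread, threshold=1.0)
```

In this toy run the first point descends to a leaf because its branch is still too complex at the first level, while the second stops one level higher. Different points ending at different scales is the adaptive behavior the slide describes.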

M-Step: Mahalanobis Estimation

We seek the transformation that maximizes the expected joint log-likelihood of our data and latent associations with respect to the posterior over our current association estimates.

The resulting form (1) is a weighted sum of squared Mahalanobis distances, further reduced to (2) by writing it in terms of the sufficient statistics $M_j^{\{0,1\}}$.

Lastly, covariance eigendecomposition produces an equivalent weighted point-to-plane distance measure (3), which we can solve efficiently with least squares.

[Equations (1), (2), and (3) appeared as images on the original slide.]
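The three numbered forms can be sketched from the prose alone. The following is a reconstruction under assumptions, not the slide's own typesetting: $\gamma_{ij}$ are the E-step associations, $M_j^0 = \sum_i \gamma_{ij}$ and $M_j^1 = \sum_i \gamma_{ij} z_i$ are taken to be the zeroth and first moments, and $\lambda_{jl}$, $n_{jl}$ are the eigenvalues and unit eigenvectors of $\Sigma_j$.

```latex
% (1) weighted sum of squared Mahalanobis distances
\hat{T} = \arg\min_{T} \sum_{i=1}^{N} \sum_{j=1}^{J}
  \gamma_{ij}\, \big(T(z_i) - \mu_j\big)^{\top} \Sigma_j^{-1} \big(T(z_i) - \mu_j\big)

% (2) the same objective in terms of the sufficient statistics
\hat{T} = \arg\min_{T} \sum_{j=1}^{J}
  M_j^0 \Big(T\big(\tfrac{M_j^1}{M_j^0}\big) - \mu_j\Big)^{\top}
  \Sigma_j^{-1}
  \Big(T\big(\tfrac{M_j^1}{M_j^0}\big) - \mu_j\Big)

% (3) weighted point-to-plane form after eigendecomposition of each covariance
\hat{T} = \arg\min_{T} \sum_{j=1}^{J} \sum_{l=1}^{3}
  \frac{M_j^0}{\lambda_{jl}}
  \Big( n_{jl}^{\top} \Big(T\big(\tfrac{M_j^1}{M_j^0}\big) - \mu_j\Big) \Big)^{2}
```

The step from (2) to (3) uses $\Sigma_j^{-1} = \sum_{l} \lambda_{jl}^{-1}\, n_{jl} n_{jl}^{\top}$, so each cluster contributes three point-to-plane residuals weighted by $M_j^0 / \lambda_{jl}$. As written, (2) keeps only the weighted mean of each cluster's points, so relative to (1) it omits a term involving the scatter of points about that mean.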

