Ben Eckart, NVIDIA Research, Learning and Perception Group, 3/20/2019
GPU-ACCELERATED 3D POINT CLOUD PROCESSING WITH HIERARCHICAL GAUSSIAN MIXTURES
3D POINT CLOUD DATA
Basic data type for unstructured 3D data
Emergence of commercial depth sensors has made it ubiquitous
POINT CLOUD PROCESSING CHALLENGES
Points are non-differentiable, non-probabilistic
Large amounts of often noisy data
Often spatially redundant, with wide-ranging density variance
PREVIOUS APPROACHES
What have people done before?
Discrete Approaches
Voxel Grids/Lists, Octrees, TSDFs
Though efficient, they inherit the same non-differentiable, non-probabilistic problems as point clouds
OctoMap
PREVIOUS APPROACHES
What have people done before?
Continuous Approaches
Gaussian Mixture Models, Gaussian Processes
Though theoretically attractive, they tend to be too slow in practice for many applications
GMM Gaussian Process
Proposal: Hierarchical Gaussian Mixture
[Figure: a tree of J=8 GMMs, with rows labeled “Level 2” GMM, “Level 3”, and “Level 4”]
Goals:
Efficiency benefits of hierarchical structures like octrees
Theoretical benefits of a probabilistic generative model
Talk Overview
• Background
– Theory of generative modeling for point clouds
• Single-Layer Model (GMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Compact and Data-Parallel
– Limitations: Scaling with model size, lack of memory coherence
• Hierarchical Models (HGMMs)
– GPU-Accelerated Construction Algorithm
– Benefits: Fast and Parallelizable on GPU
– Application: Registration
STATISTICAL / GENERATIVE MODELS
Interpret point cloud data (PCD) as an iid sampling of some unknown latent spatial probabilistic function
Generative property: Full joint probability space is represented
Modeling as an MLE Optimization
• Given a parametric model, find the parameter values that best “explain” the data (maximum data likelihood)
[Figure: Model, Data]
Parametric Model as a Modified GMM
Interpret point cloud data as an iid sampling from a small number (J << N) of Gaussian and Uniform Distributions:
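The equation that followed this sentence did not survive on the slide. One form consistent with the description, offered as a sketch, is a mixture of J Gaussians plus one uniform component. The symbols π_j (mixing weights), μ_j, Σ_j (cluster mean and covariance), and η (volume of the uniform component's support) are notational assumptions, not taken from the slide:

```latex
p(z_i \mid \Theta) = \sum_{j=1}^{J} \pi_j \, \mathcal{N}(z_i \mid \mu_j, \Sigma_j) + \pi_{J+1} \, \frac{1}{\eta},
\qquad \sum_{j=1}^{J+1} \pi_j = 1

\hat{\Theta} = \arg\max_{\Theta} \sum_{i=1}^{N} \ln p(z_i \mid \Theta)
```

The second line is the maximum-likelihood objective under the iid assumption: the log-likelihood of the cloud is the sum of per-point log-likelihoods.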
GMM for Point Clouds: Intuition
Point samples representing pieces of the same local geometry could be aggregated into clusters with the local geometry encoded inside the covariance of that cluster.
SOLVING FOR THE MLE GMM PARAMETERS
Typically done via the Expectation Maximization (EM) Algorithm
[Diagram: EM Algorithm. Starting from the point cloud and Θinit, the E Step (update point-cluster associations) and the M Step (update Θ) alternate until Θfinal.]
E Step: A Single Point
For each point zi in the point cloud Z, we want to find the relative likelihood (expectation) of it having been generated by each cluster. [Figure labels: Z, zi, O(N)]
E Step: Expectation Vector
We calculate the probability of each point with respect to each of the J Gaussian clusters. The expected associations are denoted by the NxJ matrix γ. [Figure labels: Z, zi, O(J)]
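A minimal NumPy sketch of this E step, assuming full 3x3 covariances and omitting the uniform noise component for brevity. The function name `e_step` and its argument layout are illustrative, not the talk's CUDA implementation:

```python
import numpy as np

def e_step(points, weights, means, covs):
    """Return the N x J matrix gamma of expected point-cluster associations."""
    n, d = points.shape
    j = len(weights)
    log_p = np.empty((n, j))
    for k in range(j):
        diff = points - means[k]                    # (N, d)
        chol = np.linalg.cholesky(covs[k])          # covs[k] = chol @ chol.T
        sol = np.linalg.solve(chol, diff.T)         # (d, N)
        maha = np.sum(sol ** 2, axis=0)             # squared Mahalanobis distances
        log_det = 2.0 * np.sum(np.log(np.diag(chol)))
        log_p[:, k] = np.log(weights[k]) - 0.5 * (maha + log_det + d * np.log(2 * np.pi))
    # Normalize each row in log space so that every row of gamma sums to 1
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
```

Each row of `gamma` depends only on one point, which is why this step maps naturally onto one GPU thread per point, with O(J) work per thread.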
M STEP: CLOSED FORM WEIGHTED SUMS
For the GMM case, the M Step has closed-form solutions given the NxJ matrix γ:
“Probabilistic generalization of K-Means Clustering”
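The closed-form updates are the standard γ-weighted sums for mixture weights, means, and covariances. A minimal NumPy sketch follows; the name `m_step` and the small regularizer `eps` are assumptions added for illustration:

```python
import numpy as np

def m_step(points, gamma, eps=1e-6):
    """Closed-form GMM updates from the N x J expectation matrix gamma."""
    n, d = points.shape
    j = gamma.shape[1]
    mass = gamma.sum(axis=0)                        # soft point count per cluster
    weights = mass / n
    means = (gamma.T @ points) / mass[:, None]      # gamma-weighted centroids
    covs = np.empty((j, d, d))
    for k in range(j):
        diff = points - means[k]
        covs[k] = (gamma[:, k, None] * diff).T @ diff / mass[k]
        covs[k] += eps * np.eye(d)                  # assumed regularizer, keeps covs invertible
    return weights, means, covs
```

When every row of `gamma` is a one-hot vector, the mean update reduces to the K-Means centroid update, which is the sense in which EM for GMMs generalizes K-Means. The sketch assumes every cluster has nonzero mass.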
GPU Data Parallelism
GMM Model Limitations
• Each point needs to access all J cluster parameters in CUDA (poor memory locality and linear scaling with J)
• The NxJ expectation matrix is mostly sparse (thus wasted computation)
• Static number of Gaussians that must be set a priori
HIERARCHICAL GAUSSIAN MIXTURE
Suppose we restrict J to be only 8 Gaussians
The model would fit entirely in shared memory for each CUDA threadblock, removing the need for global memory accesses
The expectation matrix will be dense (Nx8)
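A rough size estimate shows why a J=8 model fits comfortably in shared memory. The storage layout here is an assumption (float32 values, a 3-vector mean, the 6 unique entries of a symmetric 3x3 covariance, and one weight per Gaussian); the talk does not specify it:

```python
def gmm_bytes(j=8, dim=3, bytes_per_float=4):
    """Bytes needed to store a J-component GMM under the assumed packed layout."""
    mean_floats = dim
    cov_floats = dim * (dim + 1) // 2   # unique entries of a symmetric covariance
    weight_floats = 1
    return j * (mean_floats + cov_floats + weight_floats) * bytes_per_float

print(gmm_bytes())  # 320
```

Under these assumptions the whole model takes a few hundred bytes, far below the tens of kilobytes of shared memory typically available to a threadblock.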
HIERARCHICAL GAUSSIAN MIXTURE
After convergence of the J=8 GMM, we can use the Nx8 expectation matrix as a partition function
Each point is partitioned via its maximum expectation
Now we have 8 partitions, each of roughly N/8 points
HIERARCHICAL GAUSSIAN MIXTURE
We can now run the algorithm recursively on each partition
Each partition contains ~N/8 points that will be modeled as another J=8 GMM
Note that this will produce 64 clusters in total
PARALLEL PARTITIONING USING CUDA
Given each point's max expectation and associated cluster index, we can "invert" this index using parallel scans to group together point IDs having the same partition number:
[0 0 1 0 1 1 1 2 0 2 2 2] ➔ [[0 1 3 8] [2 4 5 6] [7 9 10 11]]
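The inversion can be sketched as a counting pass, an exclusive scan over the counts, and a scatter. The sequential Python below only illustrates the data flow; on the GPU each loop would be a parallel primitive, and the function name is hypothetical:

```python
def invert_partition(labels, num_clusters):
    """Group point IDs by cluster label via histogram, exclusive scan, and scatter."""
    # 1. Histogram: number of points per cluster
    counts = [0] * num_clusters
    for c in labels:
        counts[c] += 1
    # 2. Exclusive scan: start offset of each cluster in the output array
    offsets = [0] * num_clusters
    running = 0
    for c in range(num_clusters):
        offsets[c] = running
        running += counts[c]
    # 3. Scatter: write each point ID into the next free slot of its cluster
    cursor = list(offsets)
    order = [0] * len(labels)
    for point_id, c in enumerate(labels):
        order[cursor[c]] = point_id
        cursor[c] += 1
    return [order[offsets[c]:offsets[c] + counts[c]] for c in range(num_clusters)]

print(invert_partition([0, 0, 1, 0, 1, 1, 1, 2, 0, 2, 2, 2], 3))
# [[0, 1, 3, 8], [2, 4, 5, 6], [7, 9, 10, 11]]
```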
Now we can run a 2D CUDA kernel where
Dimension 1: index into original point cloud
Dimension 2: cluster of the parent
e.g. 3 clusters, 12 points, 2 threads/threadblock ➔ grid size of (2, 3)
Cluster 1 Cluster 2 Cluster 3
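The grid arithmetic in the example can be checked with a small sketch. It assumes equally sized partitions, as in the slide's example; with unequal partitions the first grid dimension would presumably be sized for the largest one. The function names and the `groups` list are illustrative:

```python
def launch_grid(num_points, num_clusters, threads_per_block):
    """Grid size (blocks along points, parent clusters) for equally sized partitions."""
    points_per_cluster = -(-num_points // num_clusters)              # ceiling division
    blocks_per_cluster = -(-points_per_cluster // threads_per_block)
    return (blocks_per_cluster, num_clusters)

def thread_to_point(block_x, thread_x, cluster, threads_per_block, groups):
    """Map a (block, thread, cluster) coordinate to an original point ID, or None."""
    slot = block_x * threads_per_block + thread_x
    members = groups[cluster]
    return members[slot] if slot < len(members) else None

groups = [[0, 1, 3, 8], [2, 4, 5, 6], [7, 9, 10, 11]]
print(launch_grid(12, 3, 2))                 # (2, 3)
print(thread_to_point(1, 0, 2, 2, groups))   # 10
```

The second grid dimension selects the parent cluster, so every thread knows which 8 child Gaussians to load without inspecting its point.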
HGMM COMPLEXITY
Even though we now have 64 clusters, we only need to query 8 clusters for each point (avoiding the computation of all NxJ (sparse) expectations)
Due to the 2D CUDA grid and indexing structure, this segmentation of the points into 64 clusters has exactly the same complexity/speed as the original "simple" J=8 GMM.
Thus, each additional level multiplies the number of clusters by eight while adding only one more J=8 pass, so run time grows only linearly with the number of levels
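The scaling argument can be made concrete by counting Gaussian evaluations per point. This is a simple sketch with a hypothetical helper name, comparing a flat model that evaluates every leaf cluster against a hierarchy that evaluates 8 children per level:

```python
def per_point_evaluations(levels, j=8):
    """Gaussian evaluations per point: flat model vs. hierarchical descent."""
    flat = j ** levels           # every leaf cluster is evaluated
    hierarchical = j * levels    # J children evaluated at each of the levels
    return flat, hierarchical

for levels in (1, 2, 3, 4):
    print(levels, per_point_evaluations(levels))
# 1 (8, 8)
# 2 (64, 16)
# 3 (512, 24)
# 4 (4096, 32)
```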
HGMM ALGORITHM
Small EM algorithms (8 clusters at a time) are recursively performed on increasingly smaller partitions of the point cloud data
E Step: Associate points to clusters
M Step: Update mixture means, covariances, and weights
Partition Step: Before each recursion step, new point partitions are determined by the maximum likelihood point-cluster associations from the last E Step
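The three steps can be sketched as a recursive CPU routine in NumPy. This is an illustration under stated assumptions, not the talk's CUDA implementation: it processes partitions one after another rather than in a single 2D kernel, uses a fixed iteration count instead of a convergence test, omits the uniform noise term, and introduces hypothetical names (`fit_gmm`, `build_hgmm`, `min_points`, which must be at least `j`):

```python
import numpy as np

def fit_gmm(points, j, iters, rng):
    """Plain EM for a J-component GMM with a fixed number of iterations."""
    n, d = points.shape
    means = points[rng.choice(n, size=j, replace=False)]
    covs = np.tile(np.cov(points.T) + 1e-6 * np.eye(d), (j, 1, 1))
    weights = np.full(j, 1.0 / j)
    for _ in range(iters):
        # E step: associate points to clusters (constant terms cancel on normalization)
        log_p = np.empty((n, j))
        for k in range(j):
            diff = points - means[k]
            maha = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(covs[k]), diff)
            log_p[:, k] = np.log(weights[k]) - 0.5 * (maha + np.linalg.slogdet(covs[k])[1])
        gamma = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M step: update mixture weights, means, and covariances
        mass = gamma.sum(axis=0) + 1e-12
        weights = mass / n
        means = (gamma.T @ points) / mass[:, None]
        for k in range(j):
            diff = points - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / mass[k] + 1e-6 * np.eye(d)
    return weights, means, covs, gamma

def build_hgmm(points, levels, j=8, iters=10, min_points=32, rng=None):
    """Recursively fit J-component GMMs on partitions of the point cloud."""
    rng = np.random.default_rng(0) if rng is None else rng
    weights, means, covs, gamma = fit_gmm(points, j, iters, rng)
    node = {"weights": weights, "means": means, "covs": covs, "children": [None] * j}
    if levels > 1:
        labels = gamma.argmax(axis=1)    # partition step: maximum expectation
        for k in range(j):
            part = points[labels == k]
            if len(part) >= min_points:  # skip partitions too small to refine
                node["children"][k] = build_hgmm(part, levels - 1, j, iters, min_points, rng)
    return node
```

Calling `build_hgmm(points, levels=2)` on an N x 3 array returns a root node with 8 Gaussians and up to 8 children, for at most 64 Gaussians at the second level.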
HGMM DATA STRUCTURE
[Figure: a tree of J=8 GMMs, with rows labeled “Level 2” GMM, “Level 3”, and “Level 4”]
Efficiency benefits of hierarchical structures like octrees
Theoretical benefits of a probabilistic generative model
E Step Performance
Compactness vs Fidelity
COMPACTNESS VS FIDELITY
Reconstruction Error (PSNR) vs Model Size (kB)
20 kB
MODELING LARGE POINT CLOUDS
HGMM Level 6: <12 MB
Volume created from stochastically sampled Marching Cubes
Visualization is real-time: ~20 fps on Titan X
Endeavor Snapshots: ~80 GB of Point Cloud Data each
ENDEAVOR DATA: BILLIONS OF POINTS
APPLICATION: RIGID REGISTRATION
Point-sampled surfaces displaced by some rigid transformation
Recover the translation and rotation that best overlap the point clouds
MLE over Space of Rotations, Translations
Goal: Maximize data likelihood over T given some probability model θ
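Written out, the goal might take the following form. This is a sketch: treating T as a rotation R plus a translation t applied to each point z_i of the moving cloud, with θ the fixed model built from the other cloud, is an assumption consistent with the rigid registration setup described earlier:

```latex
\hat{T} = \arg\max_{T} \sum_{i=1}^{N} \ln p\big(T(z_i) \mid \theta\big),
\qquad T(z) = R z + t, \quad R \in SO(3), \; t \in \mathbb{R}^{3}
```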
Registration as EM with HGMM
Outdoor Urban Velodyne Data
• Velodyne VLP-16
– ~15k pts/frame
– ~10 frames/sec
• Frame-to-Frame model-building and registration with overlap estimation
HGMM-Based Registration
• Average Frame-to-Frame Error: 0.0960
Robust Point-to-Plane ICP
• Average Frame-to-Frame Error: 0.1519
• best result on libpointmatcher
Speed vs Accuracy Trade-Off
Test: Random transformations of point cloud pairs while varying the subsampling rate.
Less subsampling yields better accuracy, but slower speeds.
Bottom left is fastest and most accurate.
Our proposed methods are red/teal/black.
HGMM COMING TO ISAAC
~350 fps on Titan Xp
~30 fps on Xavier
Error: ~0.05° yaw (median, 4 Hz updates)
DRIVEWORKS (Future Release)
With Velodyne HDL-64E:
~300 FPS on Titan Xp
~30 FPS on Xavier
DNN-BASED STEREO DEPTH MAPS
FINAL REMARKS
HGMMs have many nice properties for modeling point clouds:
Efficient: Fast to compute via CUDA/GPU, even scaling to billions of points
Multi-Level: Can model the data distribution well at multiple levels simultaneously
Probabilistic: Allows Bayesian optimization for applications like registration
Compact and Continuous: No voxels and no aliasing artifacts, easy to transform
QUESTIONS?
REGISTRATION FROM DNN-BASED STEREO
Noisy point cloud output is well-suited for HGMM representation
Stanford Lounge Dataset (Kinect)
Frame-to-frame registration from point cloud data only (no depth maps), subsampled to 2000 points, first 100 frames. Histograms of average Euler angle error per frame shown.
[Figure panels: GMM-Based, ICP-Based, Proposed]
Noise Handling
• Test: Random (uniform) noise injected at increasing amounts
• Result: Mixture components “stick” to geometrically coherent, dense areas, disregarding areas of noise
SAMPLING FOR PROBABILISTIC OCCUPANCY
$\hat{p} = L_{\Sigma}\, p + \mu \quad \forall\, \mu, \Sigma \in \Theta$
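One reading of this formula, stated here as an assumption, is that L_Σ is the Cholesky factor of Σ and p is a standard normal sample, so each Gaussian in Θ maps unit-normal noise to samples of its own distribution. A minimal NumPy sketch under that reading, with an illustrative function name:

```python
import numpy as np

def sample_hgmm_level(weights, means, covs, num_samples, rng=None):
    """Draw samples from a GMM: pick a component, then apply L_Sigma @ p + mu."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    comp = rng.choice(len(weights), size=num_samples, p=weights / weights.sum())
    chols = np.linalg.cholesky(covs)                  # (J, d, d), one factor per Gaussian
    p = rng.standard_normal((num_samples, means.shape[1]))
    return np.einsum("nij,nj->ni", chols[comp], p) + means[comp]
```

Because every sample is independent, the same transform could run with one thread per sample on the GPU.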
MESHING UNDER NOISE
ADAPTIVE MULTI-SCALE
MULTI-SCALE MODELING
Multilevel cross-sections can be adaptively chosen for robustness
E Step: Parallelized Tree Search
Point-model associations are found through parallelized adaptive tree search in CUDA.
Complexity() is defined to be , but other suitable heuristics are possible.
Adaptive Thresholding Finds the Most Appropriate Scale to Associate Point Data to the Point Cloud Model
M-Step: Mahalanobis Estimation
We seek the transformation that maximizes the expected joint log-likelihood of our data and latent associations with respect to the posterior over our current association estimates.
The resulting form (1) is a weighted sum of squared Mahalanobis distances, further reduced to (2) by writing it in terms of the sufficient statistics $M_j^{\{0,1\}}$.
Lastly, covariance eigendecomposition produces an equivalent weighted point-to-plane distance measure (3), which we can solve efficiently with least squares.
[Equations (1), (2), (3)]
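The link between Mahalanobis distance and point-to-plane distance rests on a standard identity: if Σ has eigenvalues λ_l and unit eigenvectors n_l, then (x − μ)ᵀ Σ⁻¹ (x − μ) = Σ_l (n_lᵀ (x − μ))² / λ_l. Each term is a squared distance to a plane through μ with normal n_l, weighted by 1/λ_l. The NumPy check below is only a numerical illustration of that identity, not the talk's solver; the function names are hypothetical:

```python
import numpy as np

def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance of x from a Gaussian (mu, cov)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(cov, diff))

def weighted_point_to_plane_sq(x, mu, cov):
    """Same quantity as a weighted sum of squared point-to-plane distances."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # cov = V diag(eigvals) V^T
    proj = eigvecs.T @ (x - mu)              # signed distances to planes through mu
    return float(np.sum(proj ** 2 / eigvals))
```

For a nearly planar cluster the smallest eigenvalue dominates the sum, so the measure behaves like a point-to-plane distance along the surface normal.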