Maximizing Parallelism in the Construction of BVHs, Octrees, and k-d Trees
Tero Karras
NVIDIA Research
Trees
Path tracing (Pharr & Humphreys)
Real-time ray tracing (NVIDIA)
Collision detection (Ageia)
Particle simulation (NVIDIA)
Voxel-based global illumination (Crassin et al.)
Surface reconstruction (Amenta et al.)
Photon mapping (Uchida)
Better Faster
Outline
Fastest existing methods are sequential
  Parallelize within each hierarchy level
  But not between levels
Lack of parallelism
  Small workloads bottlenecked by top levels
  Sub-linear scaling of performance
Novel way to build the entire tree in parallel
  Two algorithmic “building blocks”
  Fast, scalable
Main focus: BVHs
  Point-based octrees and k-d trees also covered in the paper
Bounding volume hierarchy
LBVH - Lauterbach et al. [2009]
1. Assign Morton codes
2. Sort primitives
3. Generate hierarchy
4. Fit bounding boxes
[Figure: Morton code of a point p. The binary fractions px = 0.1010, py = 0.0111, pz = 0.1100 are interleaved bit by bit to form the code.]
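Step 1 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the talk: the names `interleave3` and `morton_code` are made up here, coordinates are assumed to lie in [0, 1), and the bit order (x, then y, then z, most significant bit first) is an assumption. The default of 10 bits per axis gives 30-bit codes.

```python
def interleave3(x: int, y: int, z: int, bits: int) -> int:
    """Interleave the low `bits` bits of x, y, z as x y z x y z ... (MSB first)."""
    code = 0
    for b in range(bits - 1, -1, -1):
        code = (code << 3) \
            | (((x >> b) & 1) << 2) \
            | (((y >> b) & 1) << 1) \
            | ((z >> b) & 1)
    return code


def morton_code(px: float, py: float, pz: float, bits: int = 10) -> int:
    """Quantize each coordinate in [0, 1) to `bits` bits, then interleave."""
    scale = 1 << bits

    def quantize(v: float) -> int:
        # Clamp so that values outside [0, 1) still map to a valid cell.
        return min(max(int(v * scale), 0), scale - 1)

    return interleave3(quantize(px), quantize(py), quantize(pz), bits)


# The point from the figure, using 4 bits per axis:
# px = 0.1010b = 0.625, py = 0.0111b = 0.4375, pz = 0.1100b = 0.75
example = morton_code(0.625, 0.4375, 0.75, bits=4)
print(format(example, "012b"))
```

With these inputs the sketch prints `101011110010`, which is the three 4-bit fractions interleaved one bit at a time.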
Binary radix tree
[Figure: binary radix tree over 8 sorted 5-bit keys]
Key index:  0      1      2      3      4      5      6      7
Key:        00001  00010  00100  00101  10011  11000  11001  11110
Internal nodes, labeled by the common prefix of the keys below them: 00, 000, 0010, 1, 11, 1100 (plus the root, whose prefix is empty)
n leaf nodes, n-1 internal nodes
Longest common prefix
δ(i,j) = length of the longest common prefix between keys i and j
δ(5,6) = 4: keys 11000 and 11001 share the prefix 1100
δ(0,3) = 2: keys 00001 and 00101 share the prefix 00
[Figure: the radix tree over the 8 keys, with the internal nodes covering keys 5-6 and keys 0-3 labeled by these prefix lengths]
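A small Python sketch of δ(i,j) for fixed-width integer keys, using the position of the highest set bit of the XOR. The function name `delta`, the `bits` parameter, and the convention of returning -1 for an out-of-range index are assumptions made for this illustration.

```python
# The eight sorted 5-bit keys from the slides.
KEYS = [0b00001, 0b00010, 0b00100, 0b00101,
        0b10011, 0b11000, 0b11001, 0b11110]


def delta(keys, i, j, bits=5):
    """Length of the longest common prefix of keys[i] and keys[j].

    Returns -1 when j is out of range (an assumed convention), so that
    any real prefix length compares as larger.
    """
    if j < 0 or j >= len(keys):
        return -1
    differing = keys[i] ^ keys[j]
    # bit_length() is the position of the highest differing bit;
    # everything above it is shared.
    return bits - differing.bit_length()


print(delta(KEYS, 5, 6))  # 4
print(delta(KEYS, 0, 3))  # 2
```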
Garanzha et al. [2011]
[Figure: the tree over the 8 keys built one level at a time, starting from the root]
Level 0: 1 node
Level 1: 2 nodes
Level 2: 3 nodes
Level 3: 1 node
Our method
Define a numbering scheme for the nodes
  Gain some knowledge of their identity
  Establish a connection with the keys
Find the children of a given node
  Only look at node index and nearby keys
Do this for all nodes in parallel
Numbering scheme
[Figure: the radix tree over the 8 keys with its internal nodes numbered 0-6. Each internal node's index coincides with the index of the first or last key in its range.]
Node 0 (root): keys 0-7, children are nodes 3 and 4
Node 3: keys 0-3, children are nodes 1 and 2
Node 4: keys 4-7, children are leaf 4 and node 5
Node 1: keys 0-1
Node 2: keys 2-3
Node 5: keys 5-7, children are node 6 and leaf 7
Node 6: keys 5-6
Algorithm
[Figure: worked example for internal node 3 on the 8 keys]
Direction: δ(2,3) = 4 and δ(3,4) = 0, so the range of node 3 extends toward lower indices
Expand: δ(1,3) = 2 and δ(0,3) = 2 both exceed δ(3,4) = 0, so the range covers keys 0-3
Split: the whole range shares δ(0,3) = 2 bits; δ(2,3) = 4 exceeds this but δ(1,3) = 2 does not, so the split falls between keys 1 and 2
Children: nodes 1 and 2
For each node i = 0..n-2 in parallel:
1. Determine direction of the range
2. Expand the range as far as possible
3. Find where to split the range
4. Identify children
Binary search
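The four steps can be sketched in Python as below. This is a sequential illustration under stated assumptions, not the talk's CUDA code: the loop over `i` stands in for one thread per internal node, keys are assumed sorted and unique, and the names `build_radix_tree` and `delta` and the `("leaf", k)` / `("internal", k)` child encoding are invented for this example. Steps 2 and 3 use binary search over the range length.

```python
def delta(keys, i, j, bits):
    """Common-prefix length of keys[i] and keys[j]; -1 if j is out of range."""
    if j < 0 or j >= len(keys):
        return -1
    return bits - (keys[i] ^ keys[j]).bit_length()


def build_radix_tree(keys, bits):
    """Return, for each internal node i, its (left, right) children."""
    n = len(keys)
    children = []
    for i in range(n - 1):  # each iteration is independent of the others
        # 1. Direction: extend toward the neighbor sharing the longer prefix.
        d = 1 if delta(keys, i, i + 1, bits) > delta(keys, i, i - 1, bits) else -1
        delta_min = delta(keys, i, i - d, bits)

        # 2. Expand: find an upper bound on the length, then binary search.
        l_max = 2
        while delta(keys, i, i + l_max * d, bits) > delta_min:
            l_max *= 2
        length, t = 0, l_max // 2
        while t >= 1:
            if delta(keys, i, i + (length + t) * d, bits) > delta_min:
                length += t
            t //= 2
        j = i + length * d  # other end of the range

        # 3. Split: binary search for the last key that shares more than
        #    the prefix common to the whole range.
        delta_node = delta(keys, i, j, bits)
        s, t = 0, length
        while True:
            t = (t + 1) // 2
            if delta(keys, i, i + (s + t) * d, bits) > delta_node:
                s += t
            if t == 1:
                break
        gamma = i + s * d + min(d, 0)  # left child ends at key gamma

        # 4. Children: a child covering a single key is a leaf.
        left = ("leaf", gamma) if min(i, j) == gamma else ("internal", gamma)
        right = ("leaf", gamma + 1) if max(i, j) == gamma + 1 else ("internal", gamma + 1)
        children.append((left, right))
    return children


KEYS = [0b00001, 0b00010, 0b00100, 0b00101,
        0b10011, 0b11000, 0b11001, 0b11110]
tree = build_radix_tree(KEYS, bits=5)
print(tree[3])  # (('internal', 1), ('internal', 2))
```

For node 3 the sketch reproduces the worked example: the range covers keys 0-3 and the children are nodes 1 and 2. Because no iteration reads another iteration's result, the loop body could run for all nodes at once.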
Duplicate keys
The algorithm only works with unique keys
  Duplicates are common in practice
Trick: Augment each key with its index
  Distinguishes between duplicates
  Keys are still in lexicographical order
Tie-break when evaluating δ(i,j)
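One way to realize the tie-break, sketched in Python: when two keys are equal, continue the prefix comparison on their indices, which behaves as if each key had its index appended. The function name `delta_with_ties` and the 32-bit index width are assumptions for this illustration.

```python
def delta_with_ties(keys, i, j, key_bits, index_bits=32):
    """Common-prefix length of keys[i] and keys[j], with indices as tie-break.

    Behaves as if each key were extended by its index, without storing
    the extended keys. Returns -1 if j is out of range.
    """
    if j < 0 or j >= len(keys):
        return -1
    differing = keys[i] ^ keys[j]
    if differing:
        return key_bits - differing.bit_length()
    # Equal keys: all key bits match, so compare the indices instead.
    return key_bits + index_bits - (i ^ j).bit_length()


dup_keys = [0b00100, 0b00100, 0b00101]  # keys 0 and 1 are duplicates
print(delta_with_ties(dup_keys, 0, 1, key_bits=5))  # 36
print(delta_with_ties(dup_keys, 1, 2, key_bits=5))  # 4
```

The duplicates now share a longer prefix with each other (36) than with any distinct key, yet the value stays finite and differs between index pairs, so the range and split searches remain well defined.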
LBVH
1. Assign Morton codes
2. Sort primitives
3. Generate hierarchy
4. Fit bounding boxes
Lauterbach et al. [2009]
Our method
Need a different approach
  How many levels are there?
  Which nodes are located on a given level?
Traverse paths in the tree in parallel
  Start from leaves, advance toward the root
  Terminate threads using per-node atomic flags
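The bottom-up traversal can be sketched sequentially in Python. This is an illustration under assumptions, not the talk's CUDA code: the outer loop stands in for one thread per leaf, a plain counter stands in for the per-node atomic flag, and the function names, the `("leaf", k)` / `("internal", k)` child encoding, and the `(min_corner, max_corner)` box format are invented for this example.

```python
def union(a, b):
    """Smallest axis-aligned box enclosing boxes a and b."""
    (amin, amax), (bmin, bmax) = a, b
    return (tuple(map(min, amin, bmin)), tuple(map(max, amax, bmax)))


def fit_bounding_boxes(children, leaf_boxes):
    """children[i] = (left, right) for internal node i; node 0 is the root."""
    parent_of_leaf = [None] * len(leaf_boxes)
    parent_of_internal = [None] * len(children)
    for node, pair in enumerate(children):
        for kind, idx in pair:
            if kind == "leaf":
                parent_of_leaf[idx] = node
            else:
                parent_of_internal[idx] = node

    visits = [0] * len(children)  # stands in for the per-node atomic flag
    boxes = [None] * len(children)

    def box_of(child):
        kind, idx = child
        return leaf_boxes[idx] if kind == "leaf" else boxes[idx]

    for leaf in range(len(leaf_boxes)):  # one "thread" per leaf
        node = parent_of_leaf[leaf]
        while node is not None:
            visits[node] += 1  # would be an atomic increment on a GPU
            if visits[node] == 1:
                break  # first arrival: the sibling subtree is not finished
            # Second arrival: both children are complete, so process the node.
            left, right = children[node]
            boxes[node] = union(box_of(left), box_of(right))
            node = parent_of_internal[node]
    return boxes


# Tiny hand-made tree: root 0 has children (node 1, leaf 2); node 1 has leaves 0 and 1.
small_tree = [(("internal", 1), ("leaf", 2)), (("leaf", 0), ("leaf", 1))]
small_leaves = [((0, 0, 0), (1, 1, 1)), ((2, 0, 0), (3, 1, 1)), ((0, 5, 0), (1, 6, 2))]
small_boxes = fit_bounding_boxes(small_tree, small_leaves)
print(small_boxes[0])  # ((0, 0, 0), (3, 6, 2))
```

Each internal node is processed exactly once, by whichever path reaches it second, so no knowledge of the tree's levels is required.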
Results
Evaluate performance on GTX 480 (Fermi)
  CUDA, 30-bit Morton codes
Compare against Garanzha et al. [2011]
  Identical tree (top-level SAH splits disabled)
Simulate large GPUs
  N times as many cores
  N times the memory bandwidth
[Chart: Fairy Forest, 174K triangles. Time in milliseconds (axis 0 to 2) for the Morton, Sort, Build, and AABB stages, our method vs. Garanzha et al., at 1×, 2×, and 4× cores]
Speedup over Garanzha et al.: Build 12.5× (1× cores) and 33.6× (4× cores); AABB 1.3× (1× cores) and 2.4× (4× cores)
Other chart labels: 1.7×, 1.1×
[Chart: Turbine Blade, 1.77M triangles. Time in milliseconds (axis 0 to 5) for the Morton, Sort, Build, and AABB stages, our method vs. Garanzha et al.]
Speedup over Garanzha et al.: Build 3.7× (1× cores) and 7.0× (4× cores); AABB 0.8× (1× cores) and 1.0× (4× cores)
[Chart: Stanford Dragon, 871K triangles. Time in milliseconds (axis 0 to 6) for the Morton, Sort, Build, and AABB stages, our method vs. Garanzha et al.; horizontal axis ticks from 1 to 7.875 in steps of 0.625]
Acknowledgements
Timo Aila, Samuli Laine, David Luebke, Jacopo Pantaleoni, Jaakko Lehtinen
For helpful suggestions and proofreading.
Thank You
Questions
Backup slides
Pseudocode
Results (times in milliseconds)

                                      Common         Build          AABB
Scene                         Cores   Eval   Sort    Our    Prev    Our    Prev
Fairy Forest (174K tris)      1x      0.05   0.56    0.15   1.88    0.23   0.29
                              2x      0.03   0.33    0.09   1.75    0.13   0.22
                              4x      0.02   0.30    0.05   1.68    0.08   0.19
Conference Room (283K tris)   1x      0.08   0.78    0.24   1.93    0.35   0.38
                              2x      0.04   0.51    0.13   1.72    0.19   0.26
                              4x      0.03   0.31    0.08   1.58    0.12   0.20
Stanford Dragon (871K tris)   1x      0.22   1.67    0.65   3.14    1.10   1.03
                              2x      0.12   1.09    0.34   2.38    0.57   0.60
                              4x      0.06   0.64    0.18   1.97    0.30   0.38
Turbine Blade (1.77M tris)    1x      0.45   2.73    1.28   4.73    2.10   1.77
                              2x      0.23   1.63    0.65   3.19    1.07   0.96
                              4x      0.12   1.08    0.34   2.37    0.56   0.55
Algorithm
[Figure: worked example for internal node 2 on the 8 keys]
Direction: δ(1,2) = 2 and δ(2,3) = 4, so the range of node 2 extends toward higher indices
Expand: δ(2,4) = 0 does not exceed δ(1,2) = 2, so the range covers keys 2-3
Split: the range holds only two keys, so the split falls between keys 2 and 3
Children: leaves 2 and 3