Advances in High-Performance GPU Ray Tracing for Physics-Based Simulation
Christiaan Gribble & Lee A. Butler
GPU Technology Conference
21 March 2013
Introductions
Christiaan Gribble, SURVICE Engineering
Lee A. Butler, US Army Research Laboratory
Alexis Naveros, SURVICE Engineering
Mark Butkiewicz, SURVICE Engineering
SURVICE Engineering
• Support DoD community
• Focus on combat systems
– Safety
– Survivability
– Effectiveness
• 400+ employees
• 10 locations nationally
US Army Research Laboratory
• US Army RDECOM
– Corporate laboratory
– 2000 civilian employees
• Directorates
– SLAD
– Army Research Office
– Many others
• Still in the Top 500 list
Agenda
• Application domains
• Technical motivation
• Rayforce GPU ray tracing engine
• Cognition-Driven Simulation
• Visual Simulation Laboratory
Application domains
• Ballistic penetration
• Radio frequency propagation
• Thermal radiative transport
• High-energy particle transport
Technical motivation
Interval computation Interval generation
• Difficult or impossible
– Negative epsilon hacks
– Missed/repeated hits
• Performance impacts
– Traversal restart
– Operational overhead
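To illustrate why interval generation is awkward with first-hit queries alone, here is a minimal sketch. It is not Rayforce code: `first_hit`, `hits_by_restart`, and the scene (a plain list of hit distances) are assumptions made for illustration. The sketch restarts a first-hit query just past each reported hit. Surfaces closer together than the offset are skipped, and every reported hit costs a fresh query.

```python
def first_hit(t_values, t_min):
    """Return the nearest hit distance strictly beyond t_min, or None."""
    beyond = [t for t in t_values if t > t_min]
    return min(beyond) if beyond else None


def hits_by_restart(t_values, eps):
    """Collect hits by restarting a first-hit query at t + eps after each hit.

    Returns (hits found, number of first-hit queries issued).
    """
    found = []
    queries = 0
    t_min = 0.0
    while True:
        queries += 1
        t = first_hit(t_values, t_min)
        if t is None:
            return found, queries
        found.append(t)
        t_min = t + eps  # positive offset: step just past the reported hit


# Hypothetical ray: two coincident surfaces at t = 1.0 and two nearly
# coincident surfaces around t = 2.0.
surfaces = [1.0, 1.0, 2.0, 2.00001]
found, queries = hits_by_restart(surfaces, eps=1e-4)
print(found, queries)
```

In this sketch only two of the four surfaces are reported. A negative `eps` would avoid the skip but would report the same hit again on the next query (and the loop as written would not terminate), which corresponds to the repeated-hit case.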
Rayforce
• Programmable ray tracing engine
• Designed for NVIDIA GPUs
• High performance
– Modern techniques
– Novel acceleration structure
– Multiple traversal algorithms
State-of-the-art ray tracing
• Leverages modern techniques
– Ray packets
– Frustum tracing
• Exploits hardware features
– SIMD processing (v2.1)
– Architecture-specific optimizations
Proven techniques bolster high performance
Acceleration structure
• kd-tree
• Binary Space Partitioning tree
• Regular grid
• Bounding Volume Hierarchy
Graph-based spatial indexing
• Efficient
– Uses memory very carefully
– Improves cache performance
– Reduces memory bandwidth
• Flexible
– Several traversal algorithms
– Minimal overhead
– User-configurable pipelines
• Scalable
– Handles complex scenes
– Performance depends only on complexity along a ray
Traversal algorithms
• First-hit
– Nearest intersected primitive?
– Visibility/bounce rays
• Any-hit
– Is any primitive intersected?
– Shadow/ambient occlusion rays
• Multi-hit
– Which primitives are intersected?
– Transparency & non-optical rendering
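The three queries differ only in what they return for the same set of ray/primitive tests. A minimal sketch follows, assuming each primitive has already been tested and reduced to a hit distance `t` or `None` for a miss. The function names and this data layout are illustrative assumptions, not the Rayforce API.

```python
def first_hit(t_values):
    """Nearest intersection along the ray, or None if nothing is hit."""
    hits = [t for t in t_values if t is not None]
    return min(hits) if hits else None


def any_hit(t_values):
    """True as soon as one intersection is found; distance order is irrelevant."""
    return any(t is not None for t in t_values)


def multi_hit(t_values):
    """All intersections, ordered by distance along the ray."""
    return sorted(t for t in t_values if t is not None)


# Hypothetical per-primitive results for one ray: a distance t, or None for a miss.
candidates = [4.0, None, 1.5, 3.0]
print(first_hit(candidates), any_hit(candidates), multi_hit(candidates))
```

Note that `any_hit` can stop at the first intersection it encounters, while `first_hit` must consider every candidate that could be nearer, and `multi_hit` must keep them all.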
Performance – tests
Coherent workloads
• vis
– first-hit visibility
– N · V shading
• x-ray
– all multi-hit intersections
– alpha blending
Incoherent workloads
• ao
– first-hit visibility
– 32 AO rays/intersection
• kajiya
– first-hit visibility
– shadows + 2 diffuse bounces
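For readers unfamiliar with the two coherent workloads, the sketch below shows one common way to compute N · V shading and to alpha-blend an ordered list of hits. The slides do not give the exact formulas, so the clamp to zero, the single shared opacity, and the function names are assumptions.

```python
def n_dot_v(normal, view):
    """Shade a hit point as the clamped dot product of unit vectors N and V.

    Assumes V points from the surface toward the viewer.
    """
    dot = sum(n * v for n, v in zip(normal, view))
    return max(0.0, dot)


def blend_front_to_back(shades, alpha):
    """Composite shaded hits, ordered nearest first, with one opacity for all surfaces."""
    color = 0.0
    transmittance = 1.0  # fraction of light still passing through
    for shade in shades:
        color += transmittance * alpha * shade
        transmittance *= 1.0 - alpha
    return color


# vis-style shading of a single hit facing the viewer.
print(n_dot_v((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)))
# x-ray-style blending of two ordered hits with hypothetical shades.
print(blend_front_to_back([1.0, 0.5], alpha=0.5))
```

Front-to-back blending relies on hits arriving in distance order, which is the ordering multi-hit traversal provides.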
Performance – scenes
Images rendered at 1024x768 pixels on an NVIDIA GeForce GTX 690
• ktank: 1M tris
• conference: 282K tris
• san miguel: 10M tris
Performance – results
[Bar chart: throughput in Mrps (axis from 0 to 1000) for vis and x-ray (coherent workloads) and ao and kajiya (incoherent workloads)]
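Mrps means millions of rays per second. As a worked example of the unit, the sketch below counts rays for a 1024x768 frame and converts a frame time into Mrps. The frame time is a hypothetical value chosen only to show the arithmetic, not a measurement from the talk, and the helper names are illustrative.

```python
def rays_per_frame(width, height, secondary_per_pixel=0):
    """One primary ray per pixel plus a fixed number of secondary rays per pixel."""
    return width * height * (1 + secondary_per_pixel)


def mrps(ray_count, seconds):
    """Throughput in millions of rays per second."""
    return ray_count / seconds / 1e6


primary = rays_per_frame(1024, 768)        # vis-style frame: primary rays only
with_ao = rays_per_frame(1024, 768, 32)    # upper bound: every primary ray hits
frame_time = 0.05                          # hypothetical, not a measured value
print(primary, with_ao, mrps(with_ao, frame_time))
```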
Just for Fun …
• 1920x1080 vs 1024x768
• Single hit
• No color, Lambertian only
[Bar chart: vis throughput in Mrps; axis from 0 to 1400]
Multi-hit traversal
• Which primitives are intersected?
– One or more, & possibly all
– Ordered by t-value along ray
• Core operation in Rayforce
– Avoids negative epsilon hacks
– Alleviates traversal restart
• Critical to interval generation
– Handles bad geometry gracefully
– Enables early exit
• Applications
– Physically based simulation
– Order-independent transparency
– …
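Once hits arrive ordered by t-value, interval generation reduces to pairing entry and exit hits. The sketch below is one possible pairing scheme, not the method used in Rayforce or in any particular simulation code. It assumes each hit is already classified as entering or exiting (for example, from the sign of the dot product of the surface normal and the ray direction), and it merges nested or overlapping solids into a single interval. The name `build_intervals` is hypothetical.

```python
def build_intervals(hits):
    """Pair ordered hits into (t_in, t_out) intervals.

    hits: list of (t, entering) tuples sorted by t along the ray.
    Unmatched exits, such as those from open or flipped geometry, are skipped.
    """
    intervals = []
    depth = 0      # number of solids the ray is currently inside
    t_in = None
    for t, entering in hits:
        if entering:
            if depth == 0:
                t_in = t
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                intervals.append((t_in, t))
        # An exit seen while depth == 0 has no matching entry and is ignored.
    return intervals


# Hypothetical ordered hits: one simple solid, then two overlapping solids.
hits = [(1.0, True), (2.0, False), (3.0, True), (3.5, True), (4.0, False), (6.0, False)]
print(build_intervals(hits))
```

Because the hits are consumed in order, a caller that only needs the first interval could stop as soon as it closes, which is where early exit pays off.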
Naïve multi-hit
function TRAVERSE(root, ray)
  // Find all hits
  INITIALIZE(hitList)
  node ← root
  while VALID(node) do
    if !EMPTY(node) then
      for tri in node do
        if INTERSECT(tri, ray) then
          hitData ← (t-value, u, v, …)
          ADD(hitList, hitData)
        end if
      end for
    end if
    node ← NEXT(node)
  end while
  // Process desired hits
  for hitData in hitList do
    if !USERHIT(ray, hitData) then
      goto fini
    end if
  end for
label fini:
  USEREND(ray)
end function
Simple & effective, but potentially slow
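The naïve pseudocode can be exercised with a small runnable sketch. The scene model here is an assumption made for illustration: cells are visited in traversal order, each primitive is reduced to a precomputed hit distance (or `None` for a miss), and the hit list is sorted before it is handed to the user callback. None of these names come from Rayforce.

```python
def naive_multi_hit(cells, user_hit):
    """Collect every hit in every cell, then hand the hits to the user in order.

    cells: list of cells in traversal order; each cell is a list of
           (prim_id, t) pairs, where t is None when the ray misses.
    user_hit: callback(t, prim_id) -> bool; returning False stops processing.
    Returns the number of intersection tests performed.
    """
    tests = 0
    hit_list = []
    # Find all hits.
    for cell in cells:
        for prim_id, t in cell:
            tests += 1                    # stands in for INTERSECT(tri, ray)
            if t is not None:
                hit_list.append((t, prim_id))
    hit_list.sort()                       # assumption: hits are delivered by t-value
    # Process desired hits.
    for t, prim_id in hit_list:
        if not user_hit(t, prim_id):
            break
    return tests


cells = [[(0, 1.0), (1, None)], [(2, 2.5), (3, 2.0)], [(4, 5.0)]]
seen = []


def stop_after_two(t, prim_id):
    """Hypothetical user callback: record the hit and stop after two hits."""
    seen.append(prim_id)
    return len(seen) < 2


tests = naive_multi_hit(cells, stop_after_two)
print(seen, tests)
```

Even though the callback stops after two hits, all five primitives are tested, which is the cost the slide's "potentially slow" remark points at.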
Rayforce multi-hit
function TRAVERSE(root, ray)
  node ← root
  while VALID(node) do
    if !EMPTY(node) then
      SET(flags, INIT)
      while TRUE do
        // Find some hits
        INITIALIZE(hitList)
        for tri in node do
          if !DONE(hitMask, tri) then
            if INTERSECT(tri, ray) then
              hitData ← (t-value, u, v, …)
              if ADD(hitList, hitData) then
                SET(flags, REPEAT)
              end if
            end if
          end if
        end for
        if GET(flags) == (INIT | REPEAT) then  // both flags set
          INITIALIZE(hitMask)
          UNSET(flags, INIT)
        end if
        for hitData in hitList do
          if !USERHIT(ray, hitData) then
            goto fini  // Early exit
          end if
          if GET(flags) == REPEAT then
            DONE(hitMask, hitData, TRUE)
          end if
        end for
        if GET(flags) != REPEAT then
          break
        end if
        UNSET(flags, REPEAT)
      end while
    end if
    node ← NEXT(node)
  end while
label fini:
  USEREND(ray)  // Per-ray cleanup
end function
Gains efficiency with early exit
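The per-node structure can be sketched in runnable form as follows. This is a simplified reading of the pseudocode, not the Rayforce implementation, and it makes several assumptions: cells are visited front to back, each primitive's hit lies in exactly one cell, the hit list holds at most `capacity` entries, and a Python set stands in for `hitMask`. Unlike the pseudocode, the sketch marks every processed hit as done rather than only when a repeat pass is pending.

```python
def multi_hit_per_cell(cells, user_hit, capacity=4):
    """Process hits cell by cell so the user callback can stop traversal early.

    cells: list of cells in front-to-back order; each cell is a list of
           (prim_id, t) pairs, where t is None when the ray misses.
    user_hit: callback(t, prim_id) -> bool; returning False ends traversal.
    capacity: maximum number of hits held per pass (hypothetical parameter).
    Returns the number of intersection tests performed.
    """
    tests = 0
    for cell in cells:
        done = set()                      # plays the role of hitMask
        while True:
            hits = []                     # bounded hit list for this pass
            overflow = False              # plays the role of the REPEAT flag
            # Find some hits.
            for prim_id, t in cell:
                if prim_id in done:
                    continue
                tests += 1                # stands in for INTERSECT(tri, ray)
                if t is None:
                    continue
                hits.append((t, prim_id))
                hits.sort()
                if len(hits) > capacity:
                    hits.pop()            # drop the farthest hit; revisit it later
                    overflow = True
            for t, prim_id in hits:
                if not user_hit(t, prim_id):
                    return tests          # early exit: later cells are never tested
                done.add(prim_id)
            if not overflow:
                break
    return tests


cells = [[(0, 1.0), (1, None)], [(2, 2.5), (3, 2.0)], [(4, 5.0)]]
seen = []


def stop_after_two(t, prim_id):
    """Hypothetical user callback: record the hit and stop after two hits."""
    seen.append(prim_id)
    return len(seen) < 2


tests = multi_hit_per_cell(cells, stop_after_two)
print(seen, tests)
```

With this callback the third cell is never visited, so fewer intersection tests run than in an approach that gathers every hit along the ray first. A pass that overflows the hit list costs extra tests for the primitives it revisits, so the benefit depends on how early the callback stops.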
Early Exit Buys Performance
[Bar chart: multi-hit performance for ktank, conf, and san miguel; axis from 0 to 250; bar labels +39.05%, +91.00%, +104.01%]
Rayforce multi-hit outperforms naïve algorithm by 1.8x
Rayforce
• Battle-tested techniques
• Novel acceleration structure
• Multi-hit ray traversal
• Hand-tuned for CUDA
Demonstrated high-performance GPU ray tracing
first-hit
any-hit
multi-hit
Demonstration
Quadro 3000M: 240 Fermi CUDA cores @ 900 MHz
Public LGPL v2.0 release of Rayforce now available!
Cognition-Driven Simulation
• Perform visualization during simulation
– As a by-product of computation
– As computation progresses
• Key advantages
– Enables exploration & steering
– Drives understanding & confidence
– User cognition must be managed:
  • Too fast: details missed
  • Too slow: user disengages
• Managed computation
– Focus on most interesting features
– Avoid uninteresting parts of parameter space
Visual Simulation Laboratory
• A cross-platform, open-source application framework
– Qt, OpenSceneGraph, & other technologies
• The foundation used for several CDS simulation applications
Public LGPL v2.0 release of VSL now available!