8/6/2019 A Hardware Pipeline for Accelerating Ray Traversal Algorithms
A Hardware Pipeline for Accelerating
Ray Traversal Algorithms on Streaming Processors
Introduction
Ray tracing
Ray tracing algorithms
Ray traversal hardware pipeline
Streaming processors
GPGPU
Performance degradation of 1.5X-2.5X
Roll No: 7, MTech CSIS, FISAT, January 11
Introduction
2 stage traversal process
1. Hardware implementation
2. User defined algorithm
Introduction
Performance Simulator created
streaming processor architecture
Kd tree as software traversal algorithm
Software traversal steps reduced by 32X
Instructions executed reduced by 2.15X.
Previous Work
Accelerated Data Structures
Hierarchical Space Subdivision Schemes
Bounding Volume Hierarchies
GPU implementations
Vector operations
Graphics Hardware
Large programmable multi-core architectures
Graphics computations in parallel
Multiple threads on each processor
Software kernels
Vector operations and vectorized processors
Pipeline Traversal Algorithm
Group Uniform Grid (GrUG)
Axis-aligned subdivision of space
Two hierarchical layers
Top Layer
Lower Layer
Grid Concepts
Spatial Subdivisions
Stepping Between Neighbours
DDA (digital differential analyzer) method is used
tmax, delta and step
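A minimal sketch of stepping between neighbouring grid cells with a 3D DDA, assuming a cubic grid of equal-sized cells. The function name `dda_cells` and its parameters are hypothetical and not taken from the slides; only the per-axis quantities tmax, delta and step come from the slide text.

```python
import math

def dda_cells(origin, direction, grid_min, cell_size, resolution, max_steps=64):
    """Return the grid cells a ray visits, in order (illustrative names only)."""
    cell, step, t_max, t_delta = [], [], [], []
    for o, d, g in zip(origin, direction, grid_min):
        c = int(math.floor((o - g) / cell_size))  # starting cell on this axis
        cell.append(c)
        if d > 0:
            step.append(1)
            t_max.append((g + (c + 1) * cell_size - o) / d)  # t at next boundary
            t_delta.append(cell_size / d)                    # t to cross one cell
        elif d < 0:
            step.append(-1)
            t_max.append((g + c * cell_size - o) / d)
            t_delta.append(-cell_size / d)
        else:
            step.append(0)            # ray never crosses a boundary on this axis
            t_max.append(math.inf)
            t_delta.append(math.inf)

    visited = []
    for _ in range(max_steps):
        if any(c < 0 or c >= resolution for c in cell):
            break                     # the ray has left the grid
        visited.append(tuple(cell))
        axis = t_max.index(min(t_max))  # the nearest boundary decides the step
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited
```

Each iteration advances along the axis whose boundary is closest, so only one comparison and one addition per step are needed after setup.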
Ray projection from the original GrUG grouping in A to the next GrUG
grouping in B. To compute the next point along the ray for the
hash function, the ray is projected by the tmin value.
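KD-Tree
[Figure: space divided by split planes X, Y and Z into leaves A, B, C and D, shown beside the matching tree (root X; inner nodes Y and Z; leaves A, B, C, D). The ray interval is marked from tmin to tmax.]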
KD-Tree Traversal
[Figure: the same scene and tree (split planes X, Y, Z; leaves A, B, C, D) during traversal.]
Observation
[Figure: the same scene and tree (split planes X, Y, Z; leaves A, B, C, D) with the ray passing from one leaf to the next.]
Current leaf's tmax = next leaf's tmin
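One stackless traversal that relies on this observation is kd-restart, known from GPU kd-tree work; the slides do not state which variant was used, so the following is only an illustrative sketch. The `Node` class, `kd_restart` and `example_tree` are hypothetical names, the example is 2D, and the split positions are made up. A real tracer would intersect the primitives in each leaf and stop at the first hit.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    axis: int = -1                   # -1 marks a leaf
    split: float = 0.0
    left: Optional["Node"] = None    # child on the low side of the split plane
    right: Optional["Node"] = None   # child on the high side

def kd_restart(root, origin, direction, scene_tmin, scene_tmax):
    """Return leaf names in the order the ray visits them."""
    visited = []
    t_min = t_max = scene_tmin
    while t_max < scene_tmax:
        node = root
        # Restart from the root: the previous leaf's tmax becomes the new tmin.
        t_min, t_max = t_max, scene_tmax
        while node.axis >= 0:
            a, d = node.axis, direction[node.axis]
            below = origin[a] < node.split or (origin[a] == node.split and d <= 0)
            near, far = (node.left, node.right) if below else (node.right, node.left)
            if d == 0:               # parallel to the plane: stay on the near side
                node = near
                continue
            t_split = (node.split - origin[a]) / d
            if t_split >= t_max or t_split < 0:
                node = near
            elif t_split <= t_min:
                node = far
            else:
                node = near
                t_max = t_split      # clip the interval to the near child
        visited.append(node.name)
    return visited

def example_tree():
    """Root X splits x at 0; Y and Z split y at 0; leaves A, B, C, D."""
    y = Node("Y", axis=1, split=0.0, left=Node("A"), right=Node("B"))
    z = Node("Z", axis=1, split=0.0, left=Node("C"), right=Node("D"))
    return Node("X", axis=0, split=0.0, left=y, right=z)
```

Because every restart begins at the root, no per-ray stack is needed, at the cost of repeating the upper levels of the descent for each leaf.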
Overview of GrUG
2 spatial separation methods
Uniform Grid
GrUG groups
Traversal of GrUG
Hash Table
Performs 2 mappings
Input: ray location
Output: memory address of GrUG group
Hash function starting with X,Y,Z coordinates and outputting the
memory address of a GrUG grouping that can be passed to a software
traversal algorithm.
Hash function implementation
3 axes concatenated to form CellID
Allows parallel processing
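A small sketch of concatenating three per-axis indices into one CellID. The 9-bit width matches the per-axis output quoted elsewhere in the deck; the bit order (x in the high bits) and the function names are assumptions made for illustration.

```python
BITS_PER_AXIS = 9  # per-axis hash width quoted in the deck

def pack_cell_id(ix, iy, iz, bits=BITS_PER_AXIS):
    """Concatenate three axis indices into one integer (x high, z low: assumed)."""
    mask = (1 << bits) - 1
    if not all(0 <= v <= mask for v in (ix, iy, iz)):
        raise ValueError("axis index out of range")
    return (ix << (2 * bits)) | (iy << bits) | iz

def unpack_cell_id(cell_id, bits=BITS_PER_AXIS):
    """Split a CellID back into its three axis indices."""
    mask = (1 << bits) - 1
    return (cell_id >> (2 * bits)) & mask, (cell_id >> bits) & mask, cell_id & mask
```

Since each axis index is computed independently and only joined at the end, the three per-axis computations can run side by side.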
Hash Function Implementation
Architecture of Group Uniform Grid
Data Structure Creation
2 memory spaces
Hash table
User defined tree data structure
Starts at GrUG groupings
Kd tree is used
Uniform grid structure
Only leaf nodes need to be present in memory
Pipeline Architecture
Standalone processing block inside processor
Fixed Hardware
Memory address registers
Ray Projection
Ray undergoes GrUG traversal
Read bounding box of the GrUG groups
tmax value is computed
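A sketch of how tmax could be computed from a group's bounding box and then used to project the ray, assuming an axis-aligned box and a ray origin inside it. The helper names `exit_distance` and `project` are hypothetical; the fixed hardware stages themselves are not described at this level in the slides.

```python
import math

def exit_distance(origin, direction, box_min, box_max):
    """Distance t at which a ray starting inside the box leaves it."""
    t_exit = math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d > 0:
            t_exit = min(t_exit, (hi - o) / d)   # leaves through the high face
        elif d < 0:
            t_exit = min(t_exit, (lo - o) / d)   # leaves through the low face
        # d == 0: the ray never crosses this pair of faces
    return t_exit

def project(origin, direction, t):
    """Point on the ray at parameter t."""
    return tuple(o + t * d for o, d in zip(origin, direction))
```

The point returned by `project` at the exit distance lies on the boundary of the current group; it is the location that would be hashed to find the next group.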
Pipeline architecture
Rays per clock cycle
Pipeline stages can be vectorized
Ideal for streaming processors
Integration of the GrUG pipeline into a multi-core graphics processor, and the fixed hardware stages for the GrUG pipeline.
Hash Function
Determine grid cell of a ray
Grid cell id to memory address
Locate root node for software traversal
Input: Ray location (x,y,z)
Output: 9-bit value from each hash function pipeline
Maximum supported grid size: 512 X 512 X 512
Floating point values from -1.0 to 1.0
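A software sketch of the per-axis mapping from a coordinate in [-1.0, 1.0] to a 9-bit cell index for a 512 grid. This is plain arithmetic for illustration, not the hardware datapath; the name `axis_hash` and the clamping of +1.0 into the last cell are assumptions.

```python
def axis_hash(coord, grid_size=512):
    """Map a coordinate in [-1.0, 1.0] to a cell index in [0, grid_size - 1]."""
    if not -1.0 <= coord <= 1.0:
        raise ValueError("coordinate outside the normalized scene range")
    index = int((coord + 1.0) * 0.5 * grid_size)  # scale [-1, 1] to [0, grid_size]
    return min(index, grid_size - 1)              # keep +1.0 inside the last cell
```

With 512 cells per axis, every index fits in 9 bits, which is where the 512 X 512 X 512 limit comes from.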
Architecture of GrUG hash function for
one axis using a 512 grid
Implementation
Simulator
GPGPU-Sim simulator
PTX assembly files generated by the NVIDIA NVCC compiler
PTX assembly code modification
Implementation
Kernel Code
Ray generation
Post GrUG traversal operation
Read selected GrUG grouping bounding box
Compute ray's tmax value
Kd tree algorithm
Radius CUDA
Ray triangle intersection
Wald's algorithm
Kernel Code
Benchmark Scenes
8 scenes
Resolution 512 X 512
Results
a) Performance
[Figure: relative speedup over brute-force intersection for the Box, Bunny, Robots and Kitchen scenes; the only surviving data label is 12.9.]
Performance Results
Reduced the number of tree traversal steps by 32.5X for visible rays.
Overall speedup: average 1.6X for visible rays
Performance for a grid size of 128 is improved over the software implementation by 1.9X, compared to 2.15X for a grid size of 512.
[Figure: Conference benchmark scene at resolution 128]
Results
b) Memory
Memory Requirements
Overhead of storing hash table in memory
4 bytes / grid cell -> 4,294,967,296 GrUG groups
512 MB hash table
2 bytes / grid cell -> 65536 GrUG groups
256 MB hash table
Smaller grid size -> up to 4 MB hash table
128 grid size -> 1.5 times memory of kd tree
512 grid size -> 27.6 times memory of kd tree
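The hash table sizes above follow from one entry per grid cell. A short check of the arithmetic, with hypothetical helper names:

```python
MIB = 1024 * 1024

def hash_table_bytes(grid_size, bytes_per_cell):
    """Size of a hash table holding one entry per cell of a cubic grid."""
    return grid_size ** 3 * bytes_per_cell

def addressable_groups(bytes_per_cell):
    """Number of distinct GrUG groups an entry of this width can reference."""
    return 2 ** (8 * bytes_per_cell)

for grid, width in [(512, 4), (512, 2), (128, 2)]:
    size_mib = hash_table_bytes(grid, width) // MIB
    print(f"grid {grid}, {width} bytes/cell: {size_mib} MB, "
          f"{addressable_groups(width)} groups")
```

A 512 grid has 512^3 = 134,217,728 cells, so 4-byte entries give 512 MB and 2-byte entries give 256 MB; a 128 grid with 2-byte entries needs 4 MB.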
Memory Requirements
Smaller grid sizes are more efficient
Balance between performance and memory
Stores kd tree structure
bounding dimensions of threshold nodes
Similar memory requirement for storing a full kd tree.
Results
c) Bandwidth
Bandwidth requirements
Average memory bandwidth per frame is smaller
Fewer down-tree traversals -> fewer device memory transactions
Bandwidth is used for post-GrUG software traversal
GrUG memory bandwidth + down-tree traversal < down-tree traversals by full software implementation
Advantages
Maintains user programmability
Increases ray tracing performance
Diverse implementation scope
Conclusion
New graphics hardware architecture
Small fixed hardware pipeline
Offload part of the acceleration traversal computations
Diverse implementation scope of processor architecture
User programmability
Overall run time performance
Future Work
Thank you