Photon Mapping on Programmable Graphics Hardware
Timothy J. Purcell
Mike Cammarano
Pat Hanrahan
Stanford University
Craig Donner
Henrik Wann Jensen
University of California, San Diego
Motivation
• Interactive global illumination on the GPU
  • Nearly have sufficient compute power and flexibility
• Explore GPU-based computation algorithms
Related Work
• CPU-based interactive global illumination
  • Supercomputers [Parker et al.]
  • Clusters [Tole et al., Wald et al.]
• Global illumination on programmable GPUs
  • Ray tracing [Carr et al., Purcell et al.]
  • Photon mapping [Ma et al.]
  • Radiosity [Carr et al., Coombe et al.]
  • Translucency [Carr et al., Stamminger et al.]
Photon Mapping Algorithm Review
• Photon tracing
  • Emission, scattering, storing into kd-tree
  • Similar to ray tracing
• Rendering
  • Ray tracing for direct illumination
  • Photon map visualization
  • Indirect bounce
Computational Challenge for GPUs #1
• Constructing an irregular or sparse data structure
Computational Challenge for GPUs #2
• Adaptive nearest neighbor search
  • Noise vs. blur
Photon Mapping on the CPU
• Balanced kd-tree
  • Compact storage of photons
  • Efficient
  • O(log n) search
• Priority queue
  • Nearest neighbor search
  • Incremental insertion and removal of photons
Algorithmic Changes for the GPU
• Direct visualization of photon map
  • Keeps rendering costs low
• Use grid instead of kd-tree
  • Tried kd-tree…
    • Kd-tree construction is difficult
    • Radiance estimate
      – Fixed radius search works fine
      – Adaptive search needs priority queue
• No priority queue
  • Can’t build on GPU
    • Too much state
Contributions
• Mapped complete grid-based photon mapping algorithm onto the GPU
  • Including photon tracing, ray tracing, etc.
• Implemented an adaptive k-nearest neighbor search
  • kNN-grid
• Show how to construct a sparse data structure on the GPU
  • Bitonic merge sort with binary search
  • Stencil routing
Configuring the GPU for Computing
• GPU as data parallel compute engine
  • Fragment programs execute compute kernels
  • Screen sized quad initializes computation
    • SIMD execution
• Floating point texture memory
  • Render-to-texture for intermediate results
  • Data structure storage
    • Pointer dereferencing via dependent fetches
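
To make "pointer dereferencing via dependent fetches" concrete, here is a minimal CPU-side sketch in Python. It is not the fragment program from the talk: the arrays `data` and `pointers` and the function `gather_kernel` are hypothetical stand-ins for two textures and a compute kernel.

```python
# Hypothetical example data: one "texture" holding values, one holding indices.
data = [10.0, 20.0, 30.0, 40.0]
pointers = [3, 0, 2, 2]

def gather_kernel(i):
    """Emulate one fragment: fetch an address, then fetch the value it points to."""
    address = pointers[i]   # first texture fetch yields an address
    return data[address]    # dependent fetch dereferences that address

# The screen-sized quad runs the same kernel once per output element.
output = [gather_kernel(i) for i in range(len(pointers))]
```

Every output element reads from an address it computes itself (gather), but writes only to its own location. That restriction is why scatter needs the workarounds discussed next.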
Computational Challenge #1
Building a Sparse Data Structure
Building a Sparse Data Structure
• Requires scatter
  • Dependent texture write
• Why don’t we have fragment scatter?
  • Fragment processing has highly coherent blocked memory writes
  • Extra hardware support would be needed
    • Write hazards
    • Memory latencies
Scatter on the GPU
• Sort photons into grid cells
  • Grid cell is sort key
• Simulate scatter with fragment programs
  • Bitonic merge sort followed by binary search
    • Compact grid
    • O(log² n) rendering passes
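
As a rough illustration of "grid cell is sort key", the sketch below computes a flattened cell index for each photon and sorts on it. The function `cell_key`, the grid parameters, and the photon positions are all assumptions made for the example, and Python's built-in `sorted` stands in for the GPU sort.

```python
import math

def cell_key(pos, grid_min, cell_size, dims):
    """Flattened index of the uniform-grid cell containing pos (clamped to the grid)."""
    ix, iy, iz = (
        min(max(int(math.floor((p - g) / cell_size)), 0), d - 1)
        for p, g, d in zip(pos, grid_min, dims)
    )
    return ix + dims[0] * (iy + dims[1] * iz)

# Hypothetical photon positions in a 2 x 2 x 1 grid with cells of size 0.5.
photons = [(0.9, 0.1, 0.1), (0.1, 0.1, 0.1), (0.6, 0.7, 0.2), (0.2, 0.3, 0.1)]
dims = (2, 2, 1)

# Sorting by key places all photons of one cell next to each other.
keyed = sorted((cell_key(p, (0.0, 0.0, 0.0), 0.5, dims), p) for p in photons)
keys = [k for k, _ in keyed]
```

After the sort, each cell's photons occupy one contiguous run of the list, so a cell only needs to know where its run starts.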
Bitonic Merge Sort
[Figure: stages of a bitonic merge sort on an eight-element example, ending in the sorted sequence 1 2 3 4 5 6 7 8]
• O(log² n) rendering passes
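
The pass count can be checked with a small CPU sketch of a bitonic sorting network. This is a standard textbook formulation written in gather style (each output element reads its partner and keeps either the minimum or the maximum), not the fragment program from the talk. The input values are an arbitrary example.

```python
def bitonic_sort(values):
    """Sort a power-of-two-length list; return (sorted list, number of passes)."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(values)
    passes = 0
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared elements
        while j >= 1:
            out = list(a)      # one "rendering pass": read old buffer, write new one
            for i in range(n):
                partner = i ^ j
                ascending = (i & k) == 0
                lo, hi = min(a[i], a[partner]), max(a[i], a[partner])
                # Lower index keeps the minimum in ascending blocks, the maximum otherwise.
                keep_min = (i < partner) == ascending
                out[i] = lo if keep_min else hi
            a = out
            passes += 1
            j //= 2
        k *= 2
    return a, passes

sorted_values, passes = bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])
```

For n = 8 the network takes 6 passes, which is log n (log n + 1) / 2 and therefore O(log² n), matching the bound on the slide.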
Binary Search
• Grid cell searches for self in photon list
  • If none, find first element in next cell
    • Empty grid cells waste compute
  • Log(n) + 1 steps
[Figure: binary search over a sorted photon list with cell keys v0, v2, and v5, searching for the first v5 photon; rows show initialize, then steps 1 to 4]
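
A CPU sketch of this search, under the assumption that it behaves like a lower-bound binary search with a fixed iteration count (a GPU pass cannot exit early, so every cell runs the same number of steps). The function name and the key list are hypothetical and only loosely follow the slide's v0/v2/v5 example.

```python
def first_photon_index(sorted_keys, cell):
    """Index of the first photon whose key is >= cell, using a fixed number of steps."""
    n = len(sorted_keys)
    lo, hi = 0, n
    steps = n.bit_length()           # equals floor(log2 n) + 1 for n > 0
    for _ in range(steps):
        if lo < hi:                  # converged searches idle for the remaining steps
            mid = (lo + hi) // 2
            if sorted_keys[mid] < cell:
                lo = mid + 1
            else:
                hi = mid
    return lo

# Hypothetical sorted key list: three photons in cell 0, two in cell 2, two in cell 5.
sorted_keys = [0, 0, 0, 2, 2, 5, 5]
starts = [first_photon_index(sorted_keys, c) for c in range(6)]
```

Empty cells (1, 3, and 4 here) land on the first photon of the next occupied cell, which is the "if none, find first element in next cell" behaviour. They still run the full search, which is the wasted compute the slide mentions.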
Scatter on the GPU
• Vertex programs can scatter
  • Draw point to buffer
    • Collisions?
• Stencil routing
  • Limit photon count per grid cell
    – Pre-allocate grid cell space
  • Draw photons as points
    – Vertex program computes grid cell
  • Stencil buffer controls location within cell
  • Single rendering pass
Stencil Routing
• Fix each grid cell size to n² pixels
• Draw fat points to cover each fat cell
  • glPointSize(n)
[Figure: Vertex (photon_pos), Vertex Program, Flattened Grid; label: 4 pixels]
Stencil Routing
• Control location written to with stencil
  • Pass when stencil is n² - 1
  • Stencil always increments
  • Location written depends on draw order
[Figure: Vertex (photon_pos), Vertex Program, Flattened Grid, Stencil; labels: 1 pixel, 4 pixels; stencil values 0 1 / 2 3 in each cell, shown as 1 2 / 3 4 in one cell]
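
The stencil rules above can be simulated on the CPU. The sketch below assumes, following the stencil values in the figure, that each cell's n² stencil pixels start at 0, 1, ..., n² - 1. The function `stencil_route` and its list-based buffers are hypothetical; real code would use the OpenGL stencil test and point rendering.

```python
def stencil_route(photon_cells, num_cells, n):
    """Route photons (given by cell index, in draw order) into n*n slots per cell."""
    slots = n * n
    # Assumed initial stencil pattern per cell: 0, 1, ..., n*n - 1.
    stencil = [list(range(slots)) for _ in range(num_cells)]
    grid = [[None] * slots for _ in range(num_cells)]
    for photon_id, cell in enumerate(photon_cells):
        for pixel in range(slots):                 # the fat point covers the whole cell
            if stencil[cell][pixel] == slots - 1:  # stencil test: pass at n*n - 1
                grid[cell][pixel] = photon_id
            stencil[cell][pixel] += 1              # increment whether or not the test passed
    return grid, stencil

# Hypothetical draw order: photon 1 lands in cell 0, the other five in cell 1.
grid, stencil = stencil_route([1, 0, 1, 1, 1, 1], num_cells=2, n=2)
```

Each photon lands in exactly one pixel of its cell, chosen by how many photons were drawn there before it. Once a cell has received n² photons, no pixel can pass the test again, so later photons for that cell (photon 5 here) are dropped. This is the per-cell photon limit mentioned on the earlier slide.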
Computational Challenge #2
Adaptive Nearest Neighbor Search
Adaptive Nearest Neighbor Search
• Iterative algorithm
• Accept or reject photons in cell visit order
kNN-grid Algorithm
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate]
kNN-grid Algorithm
• Candidate photons must be within max search radius
• Visit voxels in order of distance to sample point
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate]
kNN-grid Algorithm
• If current number of photons in estimate is less than number requested, grow search radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 1]
kNN-grid Algorithm
• If current number of photons in estimate is less than number requested, grow search radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 2]
kNN-grid Algorithm
• Don’t add photons outside maximum search radius
• Don’t grow search radius when photon is outside maximum radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 2]
kNN-grid Algorithm
• Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 3]
kNN-grid Algorithm
• Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 4]
kNN-grid Algorithm
• Don’t expand search radius if enough photons already found
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 4]
kNN-grid Algorithm
• Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 5]
kNN-grid Algorithm
• Visit all other voxels accessible within determined search radius
• Add photons within search radius
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 6]
kNN-grid Algorithm
• Finds all photons within a sphere centered about sample point
• May locate more than requested k-nearest neighbors
[Figure legend: sample point, photons in estimate, candidate photon; want a 4 photon estimate; count shown: 6]
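
The walk-through above can be condensed into a short CPU sketch. It is one reading of the slides, not the authors' GPU implementation: the function `knn_grid`, the dictionary-based grid, and the example points are assumptions, and only occupied voxels are visited for brevity.

```python
import math
from collections import defaultdict

def knn_grid(sample, photons, k, max_radius, cell_size):
    """Return (photons in estimate, final search radius) for one sample point."""
    grid = defaultdict(list)
    for p in photons:
        grid[tuple(int(math.floor(c / cell_size)) for c in p)].append(p)

    def voxel_distance(cell):
        # Distance from the sample point to the nearest point of the voxel.
        d2 = 0.0
        for c, s in zip(cell, sample):
            nearest = min(max(s, c * cell_size), (c + 1) * cell_size)
            d2 += (s - nearest) ** 2
        return math.sqrt(d2)

    radius, estimate = 0.0, []
    # Visit voxels in order of distance to the sample point.
    for vdist, cell in sorted((voxel_distance(c), c) for c in grid):
        limit = radius if len(estimate) >= k else max_radius
        if vdist > limit:
            break                          # no remaining voxel can contribute
        for p in grid[cell]:
            d = math.dist(sample, p)
            if d > max_radius:
                continue                   # never add, never grow past the maximum radius
            if len(estimate) < k:
                radius = max(radius, d)    # too few photons: grow the search radius
                estimate.append(p)
            elif d <= radius:
                estimate.append(p)         # enough photons: radius fixed, still add inside it
    return estimate, radius

# Hypothetical 2D example with k = 2.
sample = (0.5, 0.5)
photons = [(0.9, 0.9), (0.6, 0.5), (1.05, 0.5), (1.8, 0.5), (5.0, 5.0)]
estimate, radius = knn_grid(sample, photons, k=2, max_radius=2.0, cell_size=1.0)
```

In this example the estimate ends with three photons although only two were requested. The radius was fixed by the first voxel's photons, and a photon in the next voxel also fell inside it, which mirrors the "may locate more than requested" case on the last slide.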
System Implementation
• NVIDIA GeForce FX 5900 Ultra (NV35)
• Cg compiler 1.1
[Figure: system stages labeled Trace Photons, Build Photon Map, Ray Trace Scene, Compute Radiance Estimate, Compute Lighting, Render Image]
Open Issues (1)
• How to prevent program execution over a subset of pixels?
  • Non-uniform pixel computation distribution
    • Radiance estimate
  • KILL is only a write mask
  • Early-z occlusion culling
    • No pixel level control
  • Compute mask, branching, or stream buffer?
  • Improve radiance estimate speed by 30-70% over tiling
Open Issues (2)
• Scatter
  • Makes (a programmer’s) life easier
  • Is it worth implementing?
    • Gain factor of log² n avoiding sort
Future Work
• Kd-trees
• Photon power redistribution
• Adaptive sampling
• Progressive refinement
Conclusions
• The GPU can compute an entire global illumination solution
  • Nearly interactive
• Implemented an adaptive k-nearest neighbor query for the GPU
  • kNN-grid
• Shown how to construct sparse data structures on the GPU
  • Bitonic merge sort and binary search
  • Stencil routing
• Sorting and searching algorithms applicable to other computations
Acknowledgments
• Stanford FlashG
  • Ian Buck, Mike Houston, Kekoa Proudfoot
• Stencil routing
  • Kurt Akeley, Matt Papakipos
• Hardware and drivers
  • David Kirk, Nick Triantos
• Funding
  • NVIDIA, DARPA, NSF, 3Com