Afrigraph 2003 Course on
Advanced Interactive Ray Tracing
and
Interactive Global Illumination
Ingo Wald, Carsten Benthin, Philipp Slusallek
Saarland University
First: What is Ray Tracing?
Ray-Generation
Ray-Traversal
Intersection
Shading
Framebuffer
Feb 3rd, 2003 Afrigraph 2003 3
Agenda
• Introduction & Motivation
  – Why Interactive Ray Tracing at all?
• Part I – Interactive Ray Tracing Architectures
  – Software Ray Tracing
  – Ray Tracing on Programmable GPUs
  – Dedicated Ray Tracing Hardware
• Part II – Advanced Ray Tracing Issues
  – Handling Dynamic Scenes
  – The OpenRT Interactive Ray Tracing API
• Part III – New Applications
  – Industrial Application: Interactive Visualization of Car Headlights
  – Interactive Global Illumination
• Summary and Conclusions
Why Interactive Ray Tracing?
We have NVidia – so what do we need Ray Tracing for?
• Because it is high quality…
  – Fully programmable, arbitrary shading operations
  – All operations performed in floating point
  – Flexibility: can shoot arbitrary rays
    • Shadows, reflections, refractions, …
    • Even suitable for global illumination
  – Simple programming model
    • No need for multiple passes or OpenGL ‘tricks’
    • For indirect effects (like shadows): just shoot a ray!
  – Automatic ‘correctness’
    • No need for approximations (like reflection maps)
Ray tracing is a much more flexible and powerful rendering algorithm than ‘classical’ triangle rasterization
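To make the "just shoot a ray" point concrete, here is a minimal Python sketch of a shadow test. The one-sphere scene, the function names `hit_sphere` and `in_shadow`, and the small epsilon offset are assumptions made for illustration; none of them come from the course material or from OpenRT.

```python
import math

def hit_sphere(origin, direction, center, radius, t_max):
    """Return True if origin + t*direction hits the sphere for some t in (1e-4, t_max).

    `direction` is assumed to be normalized.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return False
    sqrt_disc = math.sqrt(disc)
    for t in (-b - sqrt_disc, -b + sqrt_disc):
        if 1e-4 < t < t_max:
            return True
    return False

def in_shadow(point, light, occluders):
    """Shoot one shadow ray from `point` towards `light`; any hit before the light means shadow."""
    to_light = [l - p for l, p in zip(light, point)]
    dist = math.sqrt(sum(x * x for x in to_light))
    direction = [x / dist for x in to_light]
    return any(hit_sphere(point, direction, c, r, dist) for c, r in occluders)

occluders = [((0.0, 1.0, 0.0), 0.5)]   # hypothetical scene: one sphere above the origin
light = (0.0, 3.0, 0.0)
print(in_shadow((0.0, 0.0, 0.0), light, occluders))  # True: directly below the sphere
print(in_shadow((2.0, 0.0, 0.0), light, occluders))  # False: off to the side
```

The shadow query needs no extra rendering pass or shadow map: it is the same ray-scene intersection used for primary rays, limited to the distance to the light.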
We have NVidia – so what do we need Ray Tracing for?
• But not only that: it’s also efficient!
  – Cost is logarithmic in scene complexity
    • Useful for increasingly complex scenes (“1 Mtri, no problem!” …)
  – No multiple rendering passes
  – ‘Automatic’ visibility culling & occlusion culling
    • Hidden geometry not even touched …
    • Depth complexity not an issue
  – No overdraw, shading performed exactly once per ray
    • Very useful for increasingly costly shading
  – Small bandwidth requirements (if you do it right…)
    • Memory access coherence + culling + single shading + …
We have NVidia – so what do we need Ray Tracing for?
To summarize:
• … it’s highly flexible
• … it’s high-quality
• … it’s efficient
• And: all of that combines automatically
  – Can do some of that sometimes in HW, but usually not all together
“If it’s so good, then why isn’t it real?”
• 1.) Better asymptotic complexity, but huge constants
  – 1 ray ~ 1000 CPU cycles
  – Runs on hardware that it doesn’t really fit…
    • Uses only a tiny fraction of today’s CPUs, no parallelism, …
  – Need many rays/sec for full interactivity
    • ~ 1 Mpix/frame * 4-fold antialiasing * 25 frames/sec * 10 rays/pixel
    → One billion rays per second …
• 2.) Graphics users don’t have the choice
  – Rasterization has highly sophisticated HW implementations
    → HW technology for rasterization is 10 years ahead of RT HW…
  – There is no interactive ray tracing chip (yet), no matter the cost…
    → All applications are designed for OpenGL
    → There is no market for interactive ray tracing (really?)
    → Still more money/time/effort spent on improving rasterization
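The ray budget on this slide can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above, plus one labeled assumption: a 1.5 GHz clock for a single CPU, which is not stated on the slide.

```python
# Figures taken from the slide.
pixels_per_frame = 1_000_000   # ~1 Mpix per frame
antialiasing = 4               # 4-fold antialiasing
frames_per_sec = 25
rays_per_pixel = 10
cycles_per_ray = 1000          # "1 ray ~ 1000 CPU cycles"

rays_per_sec = pixels_per_frame * antialiasing * frames_per_sec * rays_per_pixel
cycles_per_sec = rays_per_sec * cycles_per_ray

# ASSUMPTION (not from the slides): one CPU clocked at 1.5 GHz.
assumed_cpu_hz = 1.5e9
cpus_needed = cycles_per_sec / assumed_cpu_hz

print(rays_per_sec)         # 1000000000
print(cycles_per_sec)       # 1000000000000
print(round(cpus_needed))   # 667
```

Under that assumption, the full budget would take several hundred such CPUs, which is why the constants matter as much as the asymptotic complexity.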
Why is there no Ray Tracing Hardware?
Because graphics hardware evolved 20 years ago!
• And: rasterization was the better choice back then…
  – Small scenes → (asymptotic) complexity doesn’t matter for small N
  – Large triangles → coherence: incremental ops & interpolation, low bandwidth
  – Simple (integer) operations, highly pipelined
    • FPU requirements of ray tracing were unthinkable 10 years ago…
  – No fragment ops except interpolation
  – Programmability not an issue
  → Very deep pipelines: no dependencies, no branches, no nothing, …
  → Can be built in HW very efficiently, very fast, very cheaply
• Note: all of this is changing today!
  – E.g. today, the GeForce 3 already has more FPU power than any CPU…
Today’s State of the Art in Realtime Ray Tracing
Software implementations are slowly becoming available
• Michael Muuss, Army Research Labs
  – Huge cluster of SGI machines…
• Parker et al., University of Utah
  – 32-128 CPU SGI Origin
• Saarland University
  – 4 dual PIIIs in 2000, up to 24 dual Athlon 1800+ today
Hardware architectures are already being designed
• SaarCOR (Schmittler et al., HWWS 2002)
• Ray Tracing on Programmable GPUs (Purcell, SIGGRAPH 2002)
• Hybrid Software/GPU system (Hart, HWWS 2002)
Several alternatives for future realtime ray tracing
  – Can’t yet decide which is best, only know: “It’ll come”
Today’s State of the Art in Realtime Ray Tracing
• Even today, interactive ray tracing (IRT) solves tasks that even high-end graphics hardware still cannot handle!
  – Highly complex models (Muuss, Utah, Saarland [RW2001])
  – High-quality isosurface and volume visualization (Utah)
  – Shadows, reflections, arbitrary shading… [Saarland, Utah]
  – High-quality reflection simulation of car headlights [PGV2002]
  – Interactive Global Illumination [RW2002]
Today’s State of the Art - Some Snapshots
Video
Part I
Different Approaches to Realtime Ray Tracing
Different Approaches to Realtime Ray Tracing
Basically three choices:
• Pure software implementations
  – Today: highly parallel
    • Shared memory (Utah), or PC clusters (Saarland)
  – Future: single PC?
    • Moore’s Law also holds for CPUs!
    • Perhaps with streaming co-processors (e.g. “SSE++”)
• Mixed SW/HW: RT on programmable GPUs
  – Purcell et al., Stanford
  – Converges to the ‘coprocessor’ approach
• Pure HW
  – Dedicated RT hardware (Schmittler et al., SaarCOR)
The following parts summarize all three approaches
Alternative I
Software Ray Tracing
(using the Saarland engine as an example)
The OpenRT Interactive Ray Tracing Engine
Features of OpenRT:
• Highly efficient implementation of RT kernels
  – On a single Athlon MP 1800+ CPU: ~ 500,000 to 1.5 million rays per second for average models (100 ktri – 1 Mtri)
  – Up to the 10 million rps (rays/sec) range (no shading, simple scenes)
• Sophisticated parallelization on a cluster of PCs
  – Dynamic load balancing
  – Using up to 24 dual Athlon MP 1800+ or 25 dual P4 Xeon 2.4 GHz
• Dynamically loadable, fully programmable shaders
  – Arbitrary C-code shading, arbitrary rays
  – RenderMan-like shading language
• Can handle dynamic scenes (later)
• OpenGL-like API (later)
Where does the speed come from?
Speed depends on several factors…
• Using the fastest available hardware
  – Fast CPUs, and many CPUs
• Good algorithms – avoid operations in the first place
  – Fast intersection and traversal (kd-trees)
  – Minimize intersections and traversal steps with high-quality BSPs
• Just as important – make sure you’re using your silicon correctly!
  – Highly efficient implementation
  – Machine-dependent code, if necessary (SSE)
Where does the speed come from?
Keep the computational units busy!
• Make sure the CPU doesn’t stall
  – Avoiding pipeline stalls has top priority
  → Look at memory, caches and bandwidth!
  – Example: a cache miss during triangle intersection costs about 4 times as much as the computations themselves!
  → Packing, aligning, cache-friendly data layout, prefetching, …
• But: no details here
  – Already covered that at Afrigraph 2001
  – It’s not one single method, it’s more a principle
Distributed Ray Tracing
• One CPU still not fast enough
  – 1 Mray/sec is fast, but not enough
  – Need more CPUs → clusters are cheap ($20k-$50k)
• Many approaches:
  – Static vs dynamic load balancing
  – Object-space vs image-space vs ray-based task partitioning, …
  – Pixel-interleaved (load balancing) vs tiles (coherence)
  – …
• Problem: interactivity constraint
  – Have to finish the whole frame in 1/10th of a second
  – Little time for sophisticated reordering/scheduling
Distributed Ray Tracing
Our approach (mostly Carsten Benthin)
• Image-based task partitioning
  → Break image up into ‘tiles’ (usually 16x16 or 32x32)
  – Since API: can dynamically change task partitioning scheme
• Strongly varying workload
  → Need dynamic load balancing: let clients ask for work …
• Have to care about network latencies
  – (10 ms network latency = 10,000 rays!)
  – Highly efficient networking/communication code
  → Double-buffering, prefetching, packing, streaming, asynchronous sending and rendering, interleaving of different tasks, multithreading, …
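The "let clients ask for work" scheme can be sketched as a shared tile queue. The following Python sketch is only an illustration of demand-driven load balancing: threads stand in for the networked clients, `render_tile` is a placeholder instead of real ray tracing, and all function names are hypothetical rather than taken from OpenRT.

```python
import queue
import threading

def make_tiles(width, height, tile):
    """Split the image into tile x tile blocks (edge tiles may be smaller)."""
    return [(x, y, min(tile, width - x), min(tile, height - y))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]

def render_tile(t):
    """Placeholder for real ray tracing: returns the number of pixels in the tile."""
    x, y, w, h = t
    return w * h

def client(work, results, lock):
    """A client keeps asking for the next tile until none are left."""
    while True:
        try:
            t = work.get_nowait()
        except queue.Empty:
            return
        pixels = render_tile(t)
        with lock:
            results.append((t, pixels))

def render_frame(width, height, tile, n_clients):
    """Fill the work queue with tiles and let n_clients drain it concurrently."""
    work = queue.Queue()
    for t in make_tiles(width, height, tile):
        work.put(t)
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=client, args=(work, results, lock))
               for _ in range(n_clients)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

results = render_frame(640, 480, 32, n_clients=8)
print(len(results))                 # 300 tiles
print(sum(p for _, p in results))   # 307200 pixels
```

Because each client pulls a new tile only when it has finished the previous one, expensive tiles do not hold up the others; a real implementation would additionally hide network latency by requesting the next tile before the current one is done.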
Distributed Ray Tracing: Results
• Can efficiently use many CPUs
  – 32x32 tiles at 640x480 = 300 tiles → enough for many CPUs
• Usually the limiting factor: pixels/second (not rays/sec)
  – Bandwidth limited at server: 640x480 at 10-15 frames/sec
  – For < 10 fps: usually achieve 90-99% client utilization
  – Client bandwidth usually not an issue … (100 Mbit)
• Rendering complexity helps!
  – More costly tiles = better compute/BW ratio, fewer pixels/sec
    • Can use more CPUs without hitting the bandwidth limit
  – Doubling rays/pixel is easier than doubling the framerate
    • Framerate scales linearly only up to the max framerate
    • But always scales linearly in rays/pixel
• Better networking hardware would definitely help
Realtime Ray Tracing: Approach II
Ray Tracing on Programmable GPUs
Ray Tracing on Programmable GPUs
Graphics hardware today
• GPUs are extremely powerful
  – Already more transistors than a P4
  – Full IEEE floating point!
  – Many, many, many parallel FPUs
  – Moore’s Law: faster growth than for CPUs
• GPUs are becoming more and more programmable
  – First: ‘Register Combiners’
  – Then: ‘Vertex Shaders’
    • Programmable per vertex
    • Linear interpolation between the vertices
  – Today: ‘Pixel Shaders’, ‘Fragment Programs’
    • Fully programmable for each fragment
Ray Tracing on Programmable GPUs
GPU programmability today:
• Full IEEE floating point
• SIMD computations
• Access to ‘memory’ (textures) in every instruction
• Multiple indirections (pointer chasing) now possible
  – “dependent texture reads”
• Still: several restrictions
  – Conditionals, loops, recursion, dependent texture writes …
• Typically programmed in ‘GPU assembler’
• Most recent: high-level ‘meta’ languages
  – E.g. ‘Cg’ (‘C’ for GPUs)
Streaming Computations on Programmable GPUs
Idea: use the GPU as a streaming co-processor
  – Don’t use it for rasterizing at all…
• Pixels form a ‘stream’ of elements
  – Apply a small program (‘kernel’) to the whole stream
    • Render a screen-aligned quad with a fragment shader
    → Fragment program executed for each screen pixel
• Each pixel operates on different data
  – Read data from textures
    • Screen-aligned textures: 1 texel for each pixel
  – Output to framebuffer: 1 ‘pixel’ for each fragment program
  – Feedback loop: copy framebuffer to textures
  – Future: directly write into textures
[Figure, shown as a multi-slide animation: Ray Tracing on Programmable GPUs. Labels: Screen-aligned Quad, Kernel (Fragment Shader), Memory (Textures), Data (Texels), Fragment Output, Frame Buffer, Feedback!]
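The streaming model with its feedback loop can be emulated on the CPU in a few lines. In this Python sketch, nested lists stand in for textures and the framebuffer, `run_kernel` stands in for drawing a screen-aligned quad, and `increment_kernel` is a toy kernel; all of these names are assumptions for illustration, not GPU API calls.

```python
def run_kernel(kernel, textures, width, height):
    """Emulate drawing a screen-aligned quad: run `kernel` once per pixel.

    The kernel may read any texel from `textures`, but its only output is
    the one value written at its own (x, y) position in the framebuffer.
    """
    return [[kernel(textures, x, y) for x in range(width)] for y in range(height)]

def increment_kernel(textures, x, y):
    """Toy kernel: read the texel belonging to this pixel and add one."""
    return textures["state"][y][x] + 1.0

width, height = 4, 2
textures = {"state": [[0.0] * width for _ in range(height)]}

for _ in range(3):
    framebuffer = run_kernel(increment_kernel, textures, width, height)
    # Feedback loop: copy the framebuffer back into the texture for the next pass.
    textures["state"] = [row[:] for row in framebuffer]

print(textures["state"][0])   # [3.0, 3.0, 3.0, 3.0]
```

Each pass reads the result of the previous one, which is how an iterative algorithm is expressed on hardware whose kernels cannot loop or write to arbitrary memory.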
Ray Tracing on Programmable GPUs
Mapping ray tracing to the GPU
• Use textures for storing ‘variables’
  – Ray: ‘origin’ and ‘direction’ 2D textures (3 floats each)
  – Hit: 2D texture (3 floats: u, v, id)
  – Vertices: 1D texture of vertex positions (3 floats each)
  – Triangles: 1D texture of vertex ids (1 float each)
  – Acceleration structure: e.g. 3D texture for a simple grid
• Multiple indirections no problem
  – E.g. use triangle[i] as texture coordinate into vertex[] texture
  – Up to 4 indirections (grid → triangle list → triangle → vertex)
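The chain of dependent reads can be illustrated with plain Python containers standing in for textures. The data layout below (a dict for the grid, an offset/count pair per cell, the names `grid_tex`, `cell_list_tex`, `triangle_tex`, `vertex_tex`) is an assumption chosen for readability; the slides only state that such textures exist, not how they are laid out.

```python
# "Textures" emulated as Python containers; every name here is illustrative.
vertex_tex = [
    (0.0, 0.0, 0.0),   # vertex 0
    (1.0, 0.0, 0.0),   # vertex 1
    (0.0, 1.0, 0.0),   # vertex 2
    (1.0, 1.0, 0.0),   # vertex 3
]
# Three vertex ids per triangle, stored as floats because there are no int textures.
triangle_tex = [0.0, 1.0, 2.0,    # triangle 0
                1.0, 3.0, 2.0]    # triangle 1
# Per grid cell: offset into cell_list_tex and number of triangles in the cell.
grid_tex = {(0, 0, 0): (0.0, 2.0)}
cell_list_tex = [0.0, 1.0]        # triangle ids referenced by the cells

def fetch_index(value):
    """Turn a float texel into an index; rounding guards against values like 2.9999."""
    return int(round(value))

def triangles_in_cell(cell):
    """grid -> triangle list -> triangle -> vertex: four dependent reads."""
    offset, count = (fetch_index(v) for v in grid_tex[cell])              # 1: grid
    result = []
    for k in range(count):
        tri = fetch_index(cell_list_tex[offset + k])                      # 2: triangle list
        ids = [fetch_index(triangle_tex[3 * tri + j]) for j in range(3)]  # 3: triangle
        result.append([vertex_tex[i] for i in ids])                       # 4: vertex
    return result

tris = triangles_in_cell((0, 0, 0))
print(tris[1][1])   # (1.0, 1.0, 0.0)
```

Storing ids as floats is why the explicit rounding step is needed: a texel read that comes back slightly below an integer would otherwise select the wrong element.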
Ray Tracing on Programmable GPUs
Write ‘kernels’ for the different ray tracing ops
• Ray generation
  – Get pixel position from texture coordinates
  – Somehow get camera settings (e.g. from quad color, or texture)
  – Compute corresponding ray
  – Write to ‘origin’, ‘direction’, ‘state’ textures
• Triangle intersection
  – Read ID of the triangle to be intersected from state
  – Get triangle vertices from textures
  – Intersect
  – Update state texture
• Similar for traversal, triangle list intersection, shading, …
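As an illustration of the "Intersect" step of such a kernel, the sketch below uses the well-known Möller-Trumbore ray-triangle test. The slides do not say which intersection algorithm the GPU prototype uses, so this is a stand-in chosen for brevity, written in Python rather than as a fragment program.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moeller-Trumbore test. Returns (t, u, v) on a hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det       # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det   # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det      # distance along the ray
    return (t, u, v) if t > eps else None

tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
hit = intersect_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *tri)
miss = intersect_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *tri)
print(hit)    # (1.0, 0.25, 0.25)
print(miss)   # None
```

In the GPU setting, the returned u and v together with the triangle id would be what gets written to the 'hit' texture.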
Ray Tracing on Programmable GPUs
• Have kernels for ray generation, traversal, intersection, etc.
• Each ray is in exactly one ‘state’
  – E.g. in ‘intersection’ state
  – Make sure only rays in the ‘correct’ state are processed
    • E.g. apply intersection kernel only to rays in intersect state
    • Usual GL masking methods, e.g. stencil bits, early pixel kill etc.
    → Can generate overhead, but usually ok …
  – Fragment program can change the state of a ray
    • E.g. change from ‘traversal’ to ‘intersection’ in a non-empty voxel
• Combine different kernels by just calling them in turn
  – E.g. rendering an ‘intersection’ quad will do one intersection step (but only for rays in intersect state!)
  – Secondary rays are relatively easy for the ‘Shader’ kernel
    • Update origin & direction textures, go back to ‘traversal’ state…
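The multi-pass state machine described above can be sketched as follows. This Python sketch is a toy model under stated assumptions: each ray walks a precomputed list of voxels (empty, "miss" or "hit") instead of a real grid, the `if r["state"] == state` test stands in for stencil masking, and all names are hypothetical.

```python
TRAVERSE, INTERSECT, SHADE, DONE = range(4)

def new_ray(voxels):
    """voxels: per-ray list of None (empty), "miss" or "hit" (non-empty)."""
    return {"voxels": voxels, "voxel": -1, "state": TRAVERSE, "hit": None, "color": None}

def traverse_kernel(ray):
    """Step to the next voxel; switch to INTERSECT when the voxel is non-empty."""
    ray["voxel"] += 1
    if ray["voxel"] >= len(ray["voxels"]):
        ray["state"] = SHADE            # left the grid without a hit
    elif ray["voxels"][ray["voxel"]]:
        ray["state"] = INTERSECT

def intersect_kernel(ray):
    """Toy intersection: a voxel marked "hit" ends the search, otherwise keep traversing."""
    if ray["voxels"][ray["voxel"]] == "hit":
        ray["hit"] = ray["voxel"]
        ray["state"] = SHADE
    else:
        ray["state"] = TRAVERSE

def shade_kernel(ray):
    ray["color"] = "background" if ray["hit"] is None else "surface"
    ray["state"] = DONE

KERNELS = [(TRAVERSE, traverse_kernel), (INTERSECT, intersect_kernel), (SHADE, shade_kernel)]

def render(rays):
    """Call the kernels in turn until every ray is DONE; returns the number of passes."""
    passes = 0
    while any(r["state"] != DONE for r in rays):
        for state, kernel in KERNELS:
            # One "quad" per kernel; only rays in the matching state are processed.
            for r in rays:
                if r["state"] == state:
                    kernel(r)
            passes += 1
    return passes

rays = [new_ray([None, "miss", None, "hit"]), new_ray([None, None])]
passes = render(rays)
print([r["color"] for r in rays])   # ['surface', 'background']
```

Every pass touches all rays even when only a few are in the matching state, which is the masking overhead the slide mentions and the reason the number of distinct states should stay small.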
Ray Tracing on Programmable GPUs
Results:
• Easy to exploit parallelism in the GPU
  – Many more pixels than fragment pipelines
• Comparable performance to a single CPU
  – Even though it’s only a prototype implementation
  – Limited by the fragment pipeline very soon…
• Main limitations
  – Fragment processing speed
  – Texture memory
    • Need many textures for each pixel
    • Also need to store the whole scene in textures
  – Bandwidth
  – Number of different states must be small!
Ray Tracing on Programmable GPUs
Additional limitations of current GPUs
• Bandwidth problems due to missing loops
  – Often have to write data just to save it for the next iteration
• Overhead due to missing ‘write’ capability
• Accuracy problems – no ints, all floats
  – E.g. rounding modes when reading IDs from a texture …
• Problems due to missing ‘dependent writes’
  – Many textures for input, but only one framebuffer for output
    • Need multiple passes when computing more than 3 values per pixel
  – Each fragment shader writes to exactly one predetermined position
  – Hard to do recursive operations with that limitation
    • Kd-tree construction?
Ray Tracing on Programmable GPUs
Ray tracing on GPUs in the future?
• Many limitations will (probably) change
  – Loops, branches, dependent writes, int textures, texture memory, early pixel kill …
• Performance will increase faster than for CPUs
→ Might soon be faster than, and similarly flexible as, ray tracing on a CPU!
Realtime Ray Tracing: Approach III
Dedicated Ray Tracing Hardware
Dedicated Ray Tracing Hardware
• Relatively low efficiency when using a GPU for RT
  – Many units not needed at all (rasterization, z-buffer, clipping, lighting, …)
  – Lots of overhead
  – Programmable units can never be as efficient as dedicated HW
  → Dedicated ray tracing HW should be more efficient
• Building RT HW is feasible today
  – FPU power not a problem any more (see GeForce 3 FPU performance)
  – Die size/number of transistors not a problem any more
  – Main problem: off-chip bandwidth!
    • Already between chip and cache
Dedicated Ray Tracing Hardware
Bandwidth: same problem as in SW
• Approach in SW: bandwidth reduction by Coherent Ray Tracing (packet traversal)
• HW: much larger packets (64x64 vs 2x2!)
  → Much bigger bandwidth saving
• Target: realtime full-screen resolutions
  → Larger packet sizes not a problem
  → Lots of coherence
• Avoiding overhead is simple in HW
  – Much simpler than with SSE
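Why larger packets save bandwidth can be seen from a deliberately idealized model: if every ray of a packet visits the same acceleration-structure nodes, each node is fetched once per packet instead of once per ray. The function name and the figure of 50 visited nodes below are assumptions for illustration only; real packets are not perfectly coherent, so actual savings are smaller.

```python
def node_fetches_per_ray(nodes_visited, packet_width, packet_height):
    """Idealized model: one node fetch per packet, shared by all rays in it."""
    return nodes_visited / (packet_width * packet_height)

nodes = 50  # ASSUMPTION: nodes visited along a ray; illustrative value only

print(node_fetches_per_ray(nodes, 1, 1))     # 50.0           (single rays)
print(node_fetches_per_ray(nodes, 2, 2))     # 12.5           (2x2 packets, as in SW)
print(node_fetches_per_ray(nodes, 64, 64))   # 0.01220703125  (64x64 packets, as in HW)
```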
SaarCOR Architecture
Features
• Based on the interactive software ray tracer
  – Exactly the same data structures, …
• Kd-trees as acceleration structure
• Packets of rays to reduce bandwidth
• Fixed OpenGL-like shading…
• … plus shadow and reflection rays
Goals:
• Simple low-bandwidth memory interface
• Half the floating point requirements of a GeForce 3
• Achieve frame rates comparable to today’s graphics cards
SaarCOR Architecture: System overview
SaarCOR Architecture: Features
• Scalable
• Fully pipelined
• Multi threading for latency hiding
• Simple communication pattern (no routing)
• Highly asynchronous
SaarCOR – Current Status
Simulation on register-transfer level
• Core @ 533 MHz, Memory 64 Bit @ 133 MHz (simple SD-RAM, no DDR!)
• Each pipeline uses 36 FP-units
• Standard SaarCOR:
  – 4 pipelines
  – 16 threads per pipe
  – 1 GB/s bandwidth to memory (!)
  – 272 KB for caches (!)
• Four pipes ~ ½ FP-resources of GeForce 3
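The quoted memory bandwidth and FP-unit count follow from the other figures on this slide. The short check below assumes a single-data-rate bus that transfers one 64-bit word per memory clock; the constant names are illustrative.

```python
# Figures taken from the slide.
MEM_BUS_BITS = 64
MEM_CLOCK_HZ = 133e6
PIPELINES = 4
FP_UNITS_PER_PIPELINE = 36

# Peak rate of a single-data-rate bus: bytes per cycle times cycles per second.
peak_bandwidth_bytes = (MEM_BUS_BITS // 8) * MEM_CLOCK_HZ
total_fp_units = PIPELINES * FP_UNITS_PER_PIPELINE

print(f"peak memory bandwidth: {peak_bandwidth_bytes / 1e9:.3f} GB/s")
print(f"FP units in standard SaarCOR: {total_fp_units}")
```

Under that assumption the peak rate is about 1.06 GB/s, which matches the "1 GB/s" on the slide, and the four pipelines together hold 144 FP units.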
Issues
On-chip memory of standard SaarCOR
• Caches: 272 KB
• RF for rays: 288 KB
• RF for stack: 535 KB
Register level simulations only
Simple shading only
Benchmarks: Scenes
OpenGL-Like Shading:
• No shadow rays
• No reflection rays
Full screen resolution: 1024 x 768 pixels
Benchmarks: Scenes (2)
Benchmarks: Results
Today’s CPUs: 0.5 – 0.8 Mrays/s (factor of 100-200!)
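To put the CPU figure in perspective, the sketch below converts ray rates into frame rates at the 1024 x 768 benchmark resolution. It assumes one primary ray per pixel and no secondary rays. It also reads the "factor of 100-200" as a speedup applied to the low and high end of the CPU range, which is one interpretation of the slide rather than a stated result.

```python
WIDTH, HEIGHT = 1024, 768   # resolution used in the benchmarks
CPU_MRAYS = (0.5, 0.8)      # "today's CPUs", million rays per second
SPEEDUP = (100, 200)        # factor quoted on the slide


def frames_per_second(mrays_per_s, rays_per_pixel=1):
    """Frame rate if every pixel costs `rays_per_pixel` rays (assumption: 1)."""
    return mrays_per_s * 1e6 / (WIDTH * HEIGHT * rays_per_pixel)


for mrays in CPU_MRAYS:
    print(f"{mrays} Mrays/s -> {frames_per_second(mrays):.2f} fps")

# Ray rates implied by pairing low with low and high with high (assumption).
implied = (CPU_MRAYS[0] * SPEEDUP[0], CPU_MRAYS[1] * SPEEDUP[1])
print(f"implied range: {implied[0]:.0f} to {implied[1]:.0f} Mrays/s")
```

At roughly one frame per second or less, a single CPU of that era is far from interactive at full-screen resolution, which is the gap the hardware design targets.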
Benchmarks: Results (3)
Efficiency of standard SaarCOR
Performance scales with number of pipelines, threads, cache size and bandwidth.
16 threads → 32 threads: +10%
What about shading?
• Right now: Shading only coarsely approximated
  – Fixed Phong shader w/ bilinear texturing
  – Programmable Shading currently evaluated
• Shading packets of rays exploits coherence
• BQD scene with bilinear textures
  – 14 MB for shading data per frame
  – 300 – 600 MB/s bandwidth
• Shading BW ~ Ray Tracing BW
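The two shading figures are linked by the frame rate: bandwidth is the data touched per frame multiplied by the frames per second. The check below assumes that the 14 MB is fetched once per frame with no reuse across frames, which the slide does not state explicitly.

```python
SHADING_MB_PER_FRAME = 14         # from the slide (BQD scene)
BANDWIDTH_MB_PER_S = (300, 600)   # from the slide

# bandwidth = data per frame * fps, so fps = bandwidth / data per frame
fps_range = tuple(bw / SHADING_MB_PER_FRAME for bw in BANDWIDTH_MB_PER_S)
print(f"implied frame rates: {fps_range[0]:.1f} to {fps_range[1]:.1f} fps")
```

Under that assumption the quoted bandwidth corresponds to roughly 21 to 43 frames per second.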
Conclusions
SaarCOR architecture
• Scales well in the number of pipelines
• Highly efficient
  – Uses half the FP power of GeForce3
  – Requires very low bandwidth
• Provides full featured ray tracing
• Same frame rates as today’s graphics cards
Current Work
• Programmable shading
• API: OpenRT [Wald’02]
• Virtual Memory Management
• Incorporate Features and Algorithms from SW system
  – Large Models [Wald’01]
  – Dynamic scenes [Wald’02]
  – Global Illumination [Wald’02]
• Building a prototype …
Realtime Ray Tracing Approaches I-III
Summary and Conclusions
Realtime Ray Tracing
Summary:
• Different upcoming (and competing !) architectures.
• All these have different advantages / disadvantages
  – PC clusters: most flexible, but not useful for consumer market
  – GPUs: better performance growth, cheap, but awkward to use
  – HW: best performance, best efficiency, but costly
Cannot yet predict which one will “win”…
But:
The question is not “will realtime ray tracing ever come ?”
The question rather is “how” and “when” it will come.
End of Part I - Questions ?
Part II
Advanced Ray Tracing Issues
Advanced Ray Tracing Issues
• Conclusions from Part I: Realtime Ray Tracing will come
• Problem: All these architectures mostly focus only on the core ray tracing algorithms, i.e. traversal & intersection
• Ubiquitous Realtime Ray Tracing opens new problems
  – Dynamic Scenes ?
  – Suitable API(s) ?
  – Implications for future Applications / SceneGraph libraries ?
Interactive Ray Tracing
So far:
• Interactive RT possible even today, can already beat SGI/NVidia
  – Complex models
  – High-Quality Applications
  → Can do high-quality, interactive walkthroughs
• But: “Walkthrough” is not really interactive
  – Not if scene remains static…
Issue I : Dynamic Scenes
• Fact: Ray Tracing needs acceleration structure
  – Building it is very costly
  – Precomputation only works for static scenes
• But: ‘Real’ scenes usually aren’t static…
  “What is ‘interactive’ if I cannot interact with it ?”
• Problem: Little research on this topic…
  – Just wasn’t interesting before interactive ray tracing…
  – Previous work: Usually on special cases
    • Utah ‘Hack’: Keep dynamic objects out of accel structure…
    • [Reinhard RW2001]: Incremental updates of Uniform Grid
      – Costly, not hierarchical
    • [Moeller, EG2001]: Only rigid-body animation
Handling Dynamic Scenes
• Different kinds of dynamic behavior
  – Hierarchical, rigid-body motion vs unstructured motion
  – Constrained unstructured motion (e.g. maximum displacement)
  – All triangles animated vs few triangles animated
  – Amortized over many rays/frames or over few rays
  – …
• Inherently different problems need different solutions…
• One single algorithm will hardly do the job
Handling Dynamic Scenes
Alternative approach:
• Offer suite of different techniques
  – Hierarchical animation of whole objects
  – Fast rebuild of objects for unstructured motion (with sacrifices in traversal speed)
  – High-quality BSPs for often-used static objects (with relatively long rebuild time)
• Let the application decide which one is best for what !
  – If anybody knows what’s best, it’s the application programmer
  – Just like OpenGL: Applications build display lists, not the drivers !
  – Allow combination of techniques
    • E.g. ‘some’ unstructured motion but otherwise hierarchically animated
→ App needs good API to do that !
Handling Dynamic Scenes
Combining techniques in a hierarchical way
• Application groups geometry into ‘objects’
  – Similar to building display lists (→ API)
  – Each object has separate BSP (just like PowerPlant)
• ‘Hints’ can be given to control quality/speed tradeoff
  – E.g. whether the object will be static or unstructured
• Objects can be ‘instantiated’
  – Just like ‘calling’ a display list (→ API)
  → Hierarchical animation: Just re-instantiate with new transform…
• Objects are kept in additional hierarchy level
  – With separate, fast and high-quality BSP
  – During traversal, just transform the rays when they hit an object
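The two-level scheme can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the OpenRT implementation: all class and function names are hypothetical, a sphere stands in for an object with its own BSP, the top level is a linear scan rather than a top-level BSP, and transforms are restricted to a rotation about z followed by a translation.

```python
import math


def intersect_sphere(origin, direction, radius=1.0):
    """Nearest positive hit distance with a sphere at the local origin, or None."""
    # direction is assumed to be unit length
    b = sum(o * d for o, d in zip(origin, direction))
    c = sum(o * o for o in origin) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None


class GeometryObject:
    """Geometry in its own local space; stands in for an object with its own BSP."""

    def __init__(self, radius):
        self.radius = radius

    def intersect(self, origin, direction):
        return intersect_sphere(origin, direction, self.radius)


class Instance:
    """An object plus a rigid transform (rotation about z, then translation)."""

    def __init__(self, obj, translation, angle_z=0.0):
        self.obj = obj
        self.set_transform(translation, angle_z)

    def set_transform(self, translation, angle_z=0.0):
        # Re-instantiating with a new transform never touches the geometry.
        self.translation = translation
        self.cos, self.sin = math.cos(angle_z), math.sin(angle_z)

    def _rotate_inverse(self, v):
        x, y, z = v
        return (self.cos * x + self.sin * y, -self.sin * x + self.cos * y, z)

    def intersect(self, origin, direction):
        # Transform the ray into object space instead of moving the geometry.
        # A rigid transform preserves lengths, so the hit distance is unchanged.
        shifted = tuple(o - t for o, t in zip(origin, self.translation))
        return self.obj.intersect(self._rotate_inverse(shifted),
                                  self._rotate_inverse(direction))


def trace(instances, origin, direction):
    """Top level: a linear scan here, standing in for the top-level BSP."""
    hits = [t for t in (i.intersect(origin, direction) for i in instances)
            if t is not None]
    return min(hits) if hits else None


ball = GeometryObject(radius=1.0)
scene = [Instance(ball, (0.0, 0.0, 0.0)), Instance(ball, (5.0, 0.0, 0.0))]
print(trace(scene, (5.0, 0.0, -10.0), (0.0, 0.0, 1.0)))
scene[1].set_transform((5.0, 3.0, 0.0))   # "animate" the second instance
print(trace(scene, (5.0, 0.0, -10.0), (0.0, 0.0, 1.0)))
```

Both instances share one `GeometryObject`, so animating an instance only changes its transform. The object's own acceleration structure would not need rebuilding, only the (small) top-level structure over the instances.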
Handling Dynamic Scenes - Results
• Side Effect: Instantiation is for free
  – Terrain: 1000 instances of 20ktri-tree: 20 Mtri (and dynamic !)
  – Sunflowers: 36.000 x 24ktri-sunflowers: 1 GigaTri (dynamic !)
• TopLevel BSP reconstruction tolerable
  – Some milliseconds even for a few thousand objects
  – But: scalability bottleneck (redundant computation on each client)
• Hierarchical animation is cheap
  – Transformations are cheap (compared with the rest)
• But: Unstructured motion still costly
  – Especially for big objects (→ have to use low(er)-quality BSPs)
  – High bandwidth requirements for sending data over network !!!
  – Tolerable for moderately complex objects (16k-64ktri)
• In practice: Total overhead usually ~10-20%
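The instancing figures can be checked with simple arithmetic. The helper below is illustrative only. It assumes each scene consists solely of the listed instances of a single object, so the stored geometry is one copy of that object.

```python
def triangle_counts(instances, tris_per_object):
    """Return (triangles stored once, triangles seen by the ray tracer)."""
    return tris_per_object, instances * tris_per_object


for name, n, tris in [("terrain", 1000, 20_000), ("sunflowers", 36_000, 24_000)]:
    stored, rendered = triangle_counts(n, tris)
    print(f"{name}: {stored:,} stored, {rendered:,} rendered")
```

The terrain comes out at exactly 20 million triangles, and the sunflower field at 864 million, which the slide rounds up to one GigaTri. In both cases only a few tens of thousands of triangles are actually stored.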
Handling Dynamic Scenes - Conclusions
• Works for many different scenes (BART Benchmark suite)
  – ‘Robots’: Game-like scene, hierarchical animation of 161 Objects
  – ‘Kitchen’: Mostly static, with many secondary effects
  – ‘Museum’: Completely unstructured motion
    • Correct (inter-)reflections, shadows, etc. also on moving triangles !
• Also works for all applications we have built so far
  – OpenRT based VRML97 viewer with VRML animations
  – Inventor-’port’ under way…
  – Dynamic scenes in Interactive Global Illumination application
Handling Dynamic Scenes - Results
Video
Handling Dynamic Scenes - Remaining Problems
• Lots of potential for future research !
  – Faster kd-tree generation ?
  – Kd-tree generation in HW ?
  – On-demand generation of kd-trees ?
  – More efficient solutions for special problems
    • Skinning, morphing, progressive meshes, …
  – …
Issue II – API Issues
So far:
• Fast, cheap, efficient, …
• Flexible, powerful shading …
• Can do big models and dynamic scenes, …
So why is nobody using it ?
Because without a proper API, you can’t !
Issue II – API Issues
• Why do we need an API for Interactive Ray Tracing ?
  – Side Effect: An API helps to ‘divide-n-conquer’ problems (e.g. shaders, globillum, raytracing kernels, …) …
    • E.g., can work separately on frontend and backend…
    • Can abstract from dynamic scene issues in globillum shader and so on.
  – It helps to create a ‘critical mass’ of users
    • Rasterization only really took off after OpenGL
    • Enables code portability
  – Without an API, nobody will (or can) use it - except ‘insiders’
    • Not everybody has his own realtime raytracer
    • Not everybody wants to - or should - know all implementation details
→ For widespread Realtime Ray Tracing, we do need an API
Issue II – API Issues
• Problem: There are no suitable APIs
• API has to support both “interactive” and “ray tracing”
  – OpenGL: interactive, but not suitable for ray tracing
  – RenderMan/Rayshade/POV-Ray: ray tracing capable, but inherently offline …
→ Need to find new API(s)…
Issue II – API Issues
Goals for an Interactive Ray Tracing API:
• As easy to learn and use as (standard) OpenGL
  – Leverage existing programmers’ experience with OpenGL
• As powerful in Shading as RenderMan
Our Approach (OpenRT): Combine the best of both
• Application API much like OpenGL/GLUT
  – With necessary modifications for Ray Tracing (Shaders, Objects)
• Shader API like RenderMan
The OpenRT Interactive Ray Tracing API
• Application API very OpenGL-like
  – Geometry: rtVertex3f, rtNormal3f, …
  – Primitives: rtBegin/End(RT_TRIANGLES, RT_QUAD, …)
  – Transformation: rtPushMatrix(), rtMatrixMode(…), …
  – Geometry Objects
    • Just like Display Lists (except: no side effects)
    • rtNewObjects(), rtBeginObject(), rtEndObject(), rtInstantiate(),…
  – Shader Objects
    • Surface, Light, and Pixel Shaders, exchangeable ‘Renderer Object’
• Even support GLUT-like functionality …
→ Porting GL/GLUT-applications relatively easy (except multi-pass, of course, …)
The OpenRT Interactive Ray Tracing API
• Shader Objects
  – Similar to Stanford Programmable Shading API
  – Dynamically loaded from DLLs/.so’s:
    • rtShaderFile(), rtCreateShader(), rtBindShader()
  – Light shaders: rtCreateShader(…), rtUseLight(…)
  – Application-to-Shader communication via Shader Parameters:
    • rtDeclareParam(), rtParameterHandle(…), rtParameter3f(…), …
    • Parameters can be per vertex, per triangle, per shader, …
• Retained-Mode / Frame Semantics:
  – Rendering uses Shader Parameters active at ‘end of frame’, NOT at the time that shader/triangle was created…
  – Actual rendering triggered at ‘rtSwapBuffers’
  – Rendering always done asynchronously
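The frame semantics differ from immediate-mode rendering in a way that is easy to miss, so a toy model may help. The class `RetainedFrame` and its methods are hypothetical stand-ins and do not reproduce OpenRT signatures. The sketch only models the rule that rendering sees the parameter values active when the frame ends, and it ignores the asynchronous execution mentioned on the slide.

```python
class RetainedFrame:
    """Toy model of retained-mode frame semantics (names are hypothetical)."""

    def __init__(self):
        self.params = {}     # current shader parameter values
        self.rendered = []   # snapshots actually used for rendering

    def set_param(self, name, value):
        # Only records the value; nothing is rendered yet.
        self.params[name] = value

    def swap_buffers(self):
        # Rendering is triggered here and sees the values active right now.
        snapshot = dict(self.params)
        self.rendered.append(snapshot)
        return snapshot


frame = RetainedFrame()
frame.set_param("color", (1.0, 0.0, 0.0))   # value when the triangle is created
frame.set_param("color", (0.0, 0.0, 1.0))   # overwritten before end of frame
used = frame.swap_buffers()
print(used["color"])
```

In an immediate-mode API the first triangle would have been drawn with the first color. Here the later value wins for the whole frame, because nothing is rendered before the swap.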
The OpenRT Interactive Ray Tracing API
• Shader API – Or how to write a shader
  – Declare and Export Shader Parameters
    • Store as member variables
  – Write callback-functions
    • ‘Shade()’, ‘Illuminate()’, …
  – Access Scene Data with RenderMan like API
    • Geometry: rtsShadingNormal(), …
    • Lights: rtsIlluminate(), rtsOccluded(), rtsLightTransparency(), …
  – Shoot Arbitrary Secondary Rays
    • rtsTrace(…)
→ Porting RenderMan shaders relatively easy, too…
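The overall shape of such a shader, parameters held as member variables plus a shading callback, can be sketched as follows. The class and method names are illustrative only and are not the OpenRT shader interface. The slides do not give the signatures of calls such as rtsShadingNormal() or rtsOccluded(), so the normal, light direction, and occlusion result are passed in as plain arguments here instead of being queried.

```python
class DiffuseShader:
    """Toy surface shader; class and method names are illustrative only."""

    def __init__(self):
        # Shader parameter, stored as a member variable as the slide suggests.
        self.color = (1.0, 1.0, 1.0)

    def shade(self, normal, light_dir, occluded):
        """Callback invoked per hit point.

        normal, light_dir: unit vectors; occluded: result of a shadow-ray query.
        """
        if occluded:
            return (0.0, 0.0, 0.0)
        cos_theta = max(0.0, sum(n * l for n, l in zip(normal, light_dir)))
        return tuple(c * cos_theta for c in self.color)


shader = DiffuseShader()
shader.color = (0.8, 0.4, 0.2)   # set by the application via a shader parameter
print(shader.shade((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), occluded=False))
```

A reflective shader would follow the same pattern, with the callback additionally shooting a secondary ray and mixing in the returned color.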
The OpenRT Interactive Ray Tracing API
• OpenRT: Summary
  – Fast and Interactive Rendering
  – Dynamic Scenes
  – Very Powerful Shading
  – API for using it …
→ OpenRT is a complete 3D Rendering Engine …
Kernel behind OpenRT: Saarland RTRT
  – Might be changed to e.g. SaarCOR as soon as available…
OpenRT Example 1
VRML97 @OpenRT
• Example 1: VRML97 Viewer ported from OpenGL
  – Porting relatively easy, almost all functionality was there
  – Only Modification: Have to gather small objects into fewer bigger objects for performance reasons…
• Results
  – Can render all of VRML97
    • Almost no matter how big…
  – Can put any kind of shader on any triangle (e.g. GlobIllum…)
  – Can do VRML animations, move objects, edit shaders & lights
Car Headlight, 800.000 tris
Soda Hall Floor, 400.000 tris
OpenRT Example 2
The BART Benchmark
• Example 2: The BART Benchmark scenes
  – To our knowledge, only system so far to render those at all…
  – All different kinds of dynamic behavior, including reflections, refractions, shadows, …
  – With ‘GL’ Shader: > 10 frames per second
  – With ‘Raytracing’ Shader: 2-5 frames per second
OpenRT Example 3
Complex Outdoor Scene
• Example 3: Massive Instantiation for Outdoor Scenes
  – Pixel-accurate shadows !
OpenRT Example 4
Massive Model Visualization
• Example 4: The PowerPlant
  – 12.5-37.5 million triangles
  – Currently: With replication, without demand-loading/reordering
  – Just recently: Can now also move the furnace ;-)
OpenRT Example 5
Complex Shading Stress Test
• Example 5: Shading Stress Test
  – Volume Shader (CT Head)
    • Applied to a ‘box’ of geometry
  – Lightfield Shader – on simple quad
  – Procedural Wood and Marble
  – Procedural Bump-Mapping on mirror
    → Procedurally bump-mapped reflections
• Result: Everything combines perfectly:
  – Transparent Shadow from Volume on Procedural Wood Shader
  – Lightfield reflected in procedurally bump-mapped mirror…
  – … attenuated by semi-transparent volume
  – Multiple interreflections
  – Of course, everything is interactive and fully dynamic
OpenRT Example 6
Interactive Global Illumination
Implementation: Not now…
• Fully implemented in OpenRT
• GlobIllum Application is ‘Shader’ like any other
  – Automatically inherit capability for handling dynamic scenes, distribution, …
• Same frontend as e.g. BART/Office
  – Automatically inherit parser, user interface, etc…
  – Can be used from different applications (e.g. VRML viewer)
• Algorithms & Implementation: Later (Part III)
Questions ?
For more info, also visit http://www.OpenRT.de
Part III
New Applications enabled by Realtime Ray Tracing
For more information on OpenRT, see http://www.OpenRT.de
The Saarland Interactive Ray Tracing Project
• Started Jan 1st, 2000
• (Original) Goal:
  – Evaluate practicability of RT as an Interactive Rendering Engine
  – Do a fair comparison and analysis of “RT vs GL”
    • “What are the advantages and disadvantages ?”
    • Compare on common ground: OpenGL like+shadows+reflections
      – No global illumination, no shading, no advanced features, …
  – And: Find out why it is so slow…
→ Therefore, needed to build Fast Ray Tracer
The Saarland Interactive Ray Tracing Project
• Goals have constantly changed since then
  – It worked, so continue working on it …
  – One CPU not fast enough, so distribute …
  – Great for many triangles, so work on really large models …
  – People demand high quality, build full-featured ray tracer …
  – If it’s good in Software, why not build it in hardware …
  – Static scenes too limiting, make it dynamic …
  – Others want to use it, so build an API …
  – And, if we have it anyway, why not do global illumination …
  – …
Ray Tracing on Programmable GPUs
• Application program relatively easy:
  – Just render many screen-aligned quads with different fragment shaders
• Need some way of ‘load balancing’
  – Want to not execute ‘shade’ kernel if no ray is in shade state
• Important: Approach is not SIMD
  – 1 Quad (=1 fragment program) for whole screen, but
  – Different rays can be in different states
    → Different pixels in fact behave differently
    • No problem to already shade pixel 2 while still intersecting pixel 1…
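The multi-pass scheme can be simulated on the CPU to make the per-ray state idea concrete. This is a simplified sketch with hypothetical names: each full-screen pass stands in for one screen-aligned quad with one fragment program, every ray carries a state, and a pass is skipped entirely when no ray is in the matching state. Real traversal and intersection are replaced by a step counter.

```python
TRAVERSE, INTERSECT, SHADE, DONE = "traverse", "intersect", "shade", "done"


def run_pass(kernel_state, rays, step):
    """Run one full-screen pass: only rays in `kernel_state` do any work."""
    if not any(r["state"] == kernel_state for r in rays):
        return False          # load balancing: skip the pass entirely
    for ray in rays:
        if ray["state"] == kernel_state:
            step(ray)
    return True


def traverse(ray):
    # Stand-in for one traversal step through the acceleration structure.
    ray["steps_left"] -= 1
    if ray["steps_left"] <= 0:
        ray["state"] = INTERSECT


def intersect(ray):
    ray["state"] = SHADE


def shade(ray):
    ray["state"] = DONE


def render(rays):
    """Cycle through the kernels until every ray is done; return passes run."""
    kernels = ((TRAVERSE, traverse), (INTERSECT, intersect), (SHADE, shade))
    executed = []
    while any(r["state"] != DONE for r in rays):
        for state, kernel in kernels:
            if run_pass(state, rays, kernel):
                executed.append(state)
    return executed


# Two pixels whose rays need a different number of traversal steps.
rays = [{"state": TRAVERSE, "steps_left": 1},
        {"state": TRAVERSE, "steps_left": 3}]
print(render(rays))
```

In this run the first pixel is already shaded while the second is still traversing, and the intersect and shade passes are skipped in the round where no ray needs them.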