B-KD Trees for Hardware Accelerated Ray Tracing of Dynamic Scenes

Sven Woop, Gerd Marmitt, Philipp Slusallek

Saarland University, Germany

Outline

• Previous Work

• B-KD Tree as new Spatial Index Structure

• DynRT Architecture

• Traversal Processing Unit

• Update Processor

• Prototype Implementation

• Live Demo

• Conclusion

Previous Work

• Ray Tracers for Static Scenes

• CPU based: [OpenRT], [MLRT SIGGRAPH05]

• GPU based: Purcell (Grids) [SIGGRAPH02], Foley et al. (KD Trees) [GH05]

• Custom Hardware: Commercial Hardware (ART-VPS), Schmittler (KD Trees) [GH04], RPU (KD Trees) [SIGGRAPH05]

• Ray Tracers for Dynamic Scenes

• CPU based: Wald (Grids) [SIGGRAPH06], Wald (AABVHs) [TOG / Tech. Rep. 2006]

• Custom Hardware: Woop (B-KD Trees) [GH06]

Definition of B-KD Trees

B-KD Tree (Bounded KD-Tree)

• Binary Tree

• 1D bounding intervals for each child

• Leaf nodes point to a single primitive

[Figure: inner node with its split axis and the 1D bounding intervals of its two children T0 and T1]
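
A minimal sketch of how such a node could be represented in software. The class and field names (`BKDInner`, `BKDLeaf`, `bounds`, `count_nodes`) are hypothetical and do not reflect the DynRT memory layout; the point is that an inner node stores one split axis plus a separate (possibly overlapping) interval per child, and every leaf holds exactly one primitive.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Interval = Tuple[float, float]  # (lo, hi) along the node's split axis

@dataclass
class BKDLeaf:
    primitive: int  # index of the single primitive stored in this leaf

@dataclass
class BKDInner:
    axis: int                              # split axis: 0 = x, 1 = y, 2 = z
    bounds: Tuple[Interval, Interval]      # 1D bounding interval of each child on `axis`
    children: Tuple["BKDNode", "BKDNode"]

BKDNode = Union[BKDInner, BKDLeaf]

def count_nodes(node: BKDNode) -> Tuple[int, int]:
    """Return (#inner nodes, #leaves) of a subtree."""
    if isinstance(node, BKDLeaf):
        return 0, 1
    inner, leaves = 1, 0
    for child in node.children:
        i, l = count_nodes(child)
        inner += i
        leaves += l
    return inner, leaves

# Two triangles split along x; unlike a KD tree, the child intervals may overlap.
tree = BKDInner(axis=0, bounds=((0.0, 1.0), (0.8, 2.0)),
                children=(BKDLeaf(0), BKDLeaf(1)))
print(count_nodes(tree))  # (1, 2)
```

Because every leaf holds one primitive and every inner node has two children, a tree over n primitives has n leaves and n - 1 inner nodes.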

B-KD Tree Subdivision

• Bounding Volume Hierarchy (partially unbounded)

• Each node can be associated with a full bounding box

• Bounds may overlap

Primitives in single leaf nodes

More traversal steps than for a KD tree

Support for dynamic scenes

[Figure: recursive B-KD subdivision with nodes T0, T1, T00, T01, T10, T11]

B-KD Tree Construction

• If #primitives > 1 then

• Compute center of mass

• Spatial Median

• Object Median
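
The two simple split strategies above can be sketched as follows. This is an illustration under stated assumptions, not the authors' builder: primitives are triangles given as three vertices, the split axis is taken to be the widest extent of the centers of mass, and the helper names (`centroid`, `widest_axis`, `spatial_median_split`, `object_median_split`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

def centroid(tri: Sequence[Vec3]) -> Vec3:
    """Center of mass of a triangle's three vertices."""
    return tuple(sum(v[k] for v in tri) / 3.0 for k in range(3))

def widest_axis(points: List[Vec3]) -> int:
    extents = [max(p[k] for p in points) - min(p[k] for p in points) for k in range(3)]
    return extents.index(max(extents))

def object_median_split(tris):
    """Sort by center of mass along the widest axis and cut the list in half."""
    cs = [centroid(t) for t in tris]
    axis = widest_axis(cs)
    order = sorted(range(len(tris)), key=lambda i: cs[i][axis])
    half = len(order) // 2
    return order[:half], order[half:]

def spatial_median_split(tris):
    """Split at the middle of the centers' extent along the widest axis."""
    cs = [centroid(t) for t in tris]
    axis = widest_axis(cs)
    mid = 0.5 * (min(c[axis] for c in cs) + max(c[axis] for c in cs))
    left = [i for i, c in enumerate(cs) if c[axis] < mid]
    right = [i for i, c in enumerate(cs) if c[axis] >= mid]
    if not left or not right:   # all centers coincide: fall back to the object median
        return object_median_split(tris)
    return left, right

def tri_at(x: float):
    return [(x - 0.5, 0.0, 0.0), (x + 0.5, 0.0, 0.0), (x, 1.0, 0.0)]

tris = [tri_at(0.0), tri_at(1.0), tri_at(10.0)]
print(spatial_median_split(tris))  # ([0, 1], [2])
print(object_median_split(tris))   # ([0], [1, 2])
```

The spatial median adapts to gaps in the geometry, while the object median always balances the primitive count; neither looks at the resulting box sizes.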

B-KD Tree Construction

• If #primitives > 1 then

• Compute center of mass

• Sort geometry along all three dimensions

• Each partitioning corresponds to splitting the sorted list at one position

• Build all possible partitionings in all three dimensions

• Find the partitioning with the smallest SAH cost

• Create node and recurse

• Else if #primitives = 1 then

• Create leaf node
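
The construction steps above can be sketched as a sweep over every cut position of the sorted lists. This is a sketch under assumptions, not the authors' builder: it uses a simplified SAH cost SA(L)·|L| + SA(R)·|R| (constant traversal cost and parent-area normalization dropped, since they do not change which cut wins), re-sorts at every level for brevity, and encodes nodes as hypothetical tuples `(axis, left interval, right interval, left, right)`.

```python
from functools import reduce
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)

def tri_box(tri: Sequence[Vec3]) -> Box:
    return (tuple(min(v[k] for v in tri) for k in range(3)),
            tuple(max(v[k] for v in tri) for k in range(3)))

def merge(a: Box, b: Box) -> Box:
    return (tuple(min(a[0][k], b[0][k]) for k in range(3)),
            tuple(max(a[1][k], b[1][k]) for k in range(3)))

def area(b: Box) -> float:
    dx, dy, dz = (b[1][k] - b[0][k] for k in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def best_sah_split(tris) -> Tuple[int, List[int], List[int]]:
    """Try every cut of the sorted list on every axis; minimize SA(L)*|L| + SA(R)*|R|."""
    n = len(tris)
    boxes = [tri_box(t) for t in tris]
    best = None
    for axis in range(3):
        # Sorting by the vertex sum orders primitives by center of mass along `axis`.
        order = sorted(range(n), key=lambda i: sum(v[axis] for v in tris[i]))
        prefix = [boxes[order[0]]]              # prefix[j] bounds order[:j + 1]
        for i in order[1:]:
            prefix.append(merge(prefix[-1], boxes[i]))
        suffix = [boxes[order[-1]]]             # built back to front, then reversed
        for i in reversed(order[:-1]):
            suffix.append(merge(suffix[-1], boxes[i]))
        suffix.reverse()                        # suffix[j] bounds order[j:]
        for cut in range(1, n):
            cost = area(prefix[cut - 1]) * cut + area(suffix[cut]) * (n - cut)
            if best is None or cost < best[0]:
                best = (cost, axis, order[:cut], order[cut:])
    return best[1], best[2], best[3]

def build(tris, ids=None):
    """Leaf: a primitive index. Inner: (axis, left interval, right interval, left, right)."""
    ids = list(range(len(tris))) if ids is None else ids
    if len(ids) == 1:
        return ids[0]                           # #primitives = 1: create leaf node
    axis, l, r = best_sah_split([tris[i] for i in ids])
    left_ids = [ids[i] for i in l]
    right_ids = [ids[i] for i in r]
    lbox = reduce(merge, (tri_box(tris[i]) for i in left_ids))
    rbox = reduce(merge, (tri_box(tris[i]) for i in right_ids))
    return (axis, (lbox[0][axis], lbox[1][axis]), (rbox[0][axis], rbox[1][axis]),
            build(tris, left_ids), build(tris, right_ids))

def tri_at(x: float):
    return [(x, 0.0, 0.0), (x + 0.5, 0.0, 0.0), (x, 0.5, 0.0)]

tree = build([tri_at(0.0), tri_at(1.0), tri_at(10.0), tri_at(11.0)])
print(tree[:3])  # (0, (0.0, 1.5), (10.0, 11.5))
```

The prefix/suffix box arrays make evaluating all n - 1 cuts on an axis linear after sorting, which is what makes "build all possible partitionings" affordable.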

B-KD Tree Construction

• Rendering Performance

• 20% to 100% better than center splitting approaches

• Two-level B-KD Trees

• Top-level B-KD tree over object instances

• Bottom-level B-KD tree for each object
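
With two-level trees, a ray leaving the top-level tree at an instance must be moved into that object's coordinate system before the bottom-level tree is traversed (the Geometry Unit lists "Ray transformations" among its tasks). A minimal sketch, assuming each instance stores a hypothetical 3x4 affine world-to-object matrix; the function names are illustrative only.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat34 = Sequence[Sequence[float]]  # rows of a 3x4 affine matrix [A | b]

def xform_point(m: Mat34, p: Vec3) -> Vec3:
    """A p + b: points are affected by the translation."""
    return tuple(m[r][0] * p[0] + m[r][1] * p[1] + m[r][2] * p[2] + m[r][3] for r in range(3))

def xform_vector(m: Mat34, v: Vec3) -> Vec3:
    """A v: directions ignore the translation."""
    return tuple(m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] for r in range(3))

def ray_to_object_space(world_to_object: Mat34, org: Vec3, d: Vec3) -> Tuple[Vec3, Vec3]:
    # The direction is deliberately NOT renormalized: a hit distance t found in the
    # bottom-level tree is then directly comparable with t values of other instances.
    return xform_point(world_to_object, org), xform_vector(world_to_object, d)

m = ((2.0, 0.0, 0.0, -1.0), (0.0, 2.0, 0.0, 0.0), (0.0, 0.0, 2.0, 0.0))
o, d = ray_to_object_space(m, (1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
print(o, d)  # (1.0, 0.0, 0.0) (2.0, 0.0, 0.0)
```

Since the map is affine, the world point org + t·d maps to o + t·d in object space, so the same t describes the same hit in both spaces.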

B-KD Trees for Dynamic Scenes

• On changed object geometry

• B-KD tree bounds are updated from bottom up

• B-KD tree structure remains constant

Linear update complexity in the number of nodes

Examples

• Bounding approaches perform well for

• Continuous motion

• Structure of motion must match tree structure

• E.g. skinned meshes, characters, water surfaces, ...

Examples

• Bounding volume approaches are less efficient for

• Non-continuous motion

• Structure of motion does not match tree structure

• High traversal cost due to large overlapping boxes

Examples

• Bounding volume approaches fail for

• Non-continuous motion

• Structure of motion does not match tree structure

• High traversal cost due to large overlapping boxes

Comparison for Gael Scene

Gael scene, 52k triangles:

Index type   Index size   Traversal cost   Triangle intersections
KD           1.4 MB       31               4.8
B-KD         1.1 MB       116              6.8
AABVH        2.2 MB       253              5.3

[Figure panels: KD tree | B-KD tree | AABVH]

DynRT Architecture

• Extension of RPU approach

[Figure: DynRT block diagram. The Shading Unit (128-bit Shader Cache) writes to the framebuffer; the Traversal Processing Unit (128-bit Node Cache) and the Geometry Unit (128-bit Vertex Cache) read from memory; the Skinning Processor and the Update Processor read instructions and vertices from memory and write vertices and nodes back to memory.]

DynRT Architecture

• Rendering Units

• Highly multi-threaded, for higher hardware utilization

• Synchronous execution of packets of 4 rays, reducing memory bandwidth

• First-level caches, further reducing memory bandwidth

DynRT Architecture

• Programmable Shading Unit

• Similar to RPU shading processor

• Ray generation tasks

• Material shading

• Calls Ray Casting Units to cast rays

DynRT Architecture

• Programmable Shading Unit

• Ray Casting Units

• Traversal Processing Unit

• Efficient traversal of B-KD trees

• Two level B-KD trees supported

• Geometry Unit

• Ray transformations

• Vertex-based ray/triangle intersection [Möller-Trumbore]

• Shared vertices reduce memory by 6x
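
The cited Möller-Trumbore test needs nothing but the three vertex positions, which is why triangles can index into a shared vertex array instead of storing precomputed per-triangle data. A plain software sketch of the standard algorithm (the hardware version in the Geometry Unit is not shown; `moller_trumbore` and the helpers are illustrative names):

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def moller_trumbore(org: Vec3, d: Vec3, v0: Vec3, v1: Vec3, v2: Vec3,
                    eps: float = 1e-9) -> Optional[Tuple[float, float, float]]:
    """Return (t, u, v) of the hit, or None if the ray misses the triangle."""
    e1 = sub(v1, v0)
    e2 = sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:              # ray parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(org, v0)
    u = dot(s, p) * inv             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv             # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv            # distance along the ray
    if t <= eps:
        return None
    return t, u, v

print(moller_trumbore((0.25, 0.25, -1.0), (0.0, 0.0, 1.0),
                      (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
# (1.0, 0.25, 0.25)
```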

DynRT Architecture

• Programmable Shading Unit

• Ray Casting Units

• Scene Changes

• Skinning Processor (see paper)

• Skeleton Subspace Deformation

• Re-uses Geometry Unit

• Pure stream architecture

• Update Processor

• Stream-like architecture

• Partial breadth-first execution

• Peak of one B-KD node update per clock cycle
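
The slide defers the Skinning Processor to the paper. For orientation, here is the standard linear-blend formulation usually meant by skeleton subspace deformation, v' = Σ w_i M_i v, as a plain sketch; it does not reproduce the Skinning Processor's instruction set, and the names (`transform`, `skin_vertex`, `skin_mesh`) and the 3x4 bone-matrix format are assumptions.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat34 = Sequence[Sequence[float]]  # 3x4 affine bone transform [R | t]

def transform(m: Mat34, v: Vec3) -> Vec3:
    return tuple(m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] for r in range(3))

def skin_vertex(v: Vec3, influences, bones) -> Vec3:
    """v' = sum_i w_i * (M_i v) over the bones influencing v (weights assumed to sum to 1)."""
    out = [0.0, 0.0, 0.0]
    for bone, weight in influences:
        p = transform(bones[bone], v)
        for k in range(3):
            out[k] += weight * p[k]
    return tuple(out)

def skin_mesh(vertices, influences, bones) -> List[Vec3]:
    # Each output vertex depends only on its own input vertex and weights,
    # so the vertices can be processed as an independent stream.
    return [skin_vertex(v, inf, bones) for v, inf in zip(vertices, influences)]

identity = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
shift_x2 = ((1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
print(skin_mesh([(1.0, 0.0, 0.0)], [[(0, 0.5), (1, 0.5)]], [identity, shift_x2]))
# [(2.0, 0.0, 0.0)]
```

The per-vertex independence is what allows a pure stream architecture: skinned vertices are written back to memory, after which the Update Processor refits the B-KD bounds.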

Traversal of B-KD Trees

• Early ray termination

• Clipping of the near/far interval against both bounding intervals

• Take the closer child, push the farther child onto the stack

• Traversal order does not affect correctness

Complexity

• 4x the computational cost of a KD tree traversal step

• 2x the stack memory of a KD tree

[Figure: ray R with its near/far interval, clipped against the bounding intervals I0 and I1 of the closer and the farther child]
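
A single-ray software sketch of this traversal, assuming a hypothetical tuple encoding (leaf = primitive index, inner = `(axis, (lo0, hi0), (lo1, hi1), child0, child1)`) and a caller-supplied `intersect` callback; it is not the hardware's packet algorithm. Because child bounds may overlap, a hit in the closer child does not end traversal: the farther child is still visited whenever its clipped interval starts before the current hit, which is why traversal order affects only speed, not correctness.

```python
from typing import Callable

def clip(org, d, axis, lo, hi, near, far):
    """Clip the ray interval [near, far] against the slab lo <= x_axis <= hi."""
    if d[axis] == 0.0:
        inside = lo <= org[axis] <= hi
        return (near, far) if inside else (1.0, 0.0)   # (1, 0) encodes an empty interval
    t0 = (lo - org[axis]) / d[axis]
    t1 = (hi - org[axis]) / d[axis]
    if t0 > t1:
        t0, t1 = t1, t0
    return max(near, t0), min(far, t1)

def traverse(root, org, d, intersect: Callable, t_max: float = float("inf")):
    """Closest hit as (t, primitive), or (t_max, None) if nothing is hit."""
    best_t, best_prim = t_max, None
    stack = [(root, 0.0, t_max)]
    while stack:
        node, near, far = stack.pop()
        if near > far or near >= best_t:       # early ray termination
            continue
        if isinstance(node, int):              # leaf: exactly one primitive
            t = intersect(node, org, d)
            if t is not None and t < best_t:
                best_t, best_prim = t, node
            continue
        axis, (lo0, hi0), (lo1, hi1), c0, c1 = node
        far = min(far, best_t)
        n0, f0 = clip(org, d, axis, lo0, hi0, near, far)   # clip against both intervals
        n1, f1 = clip(org, d, axis, lo1, hi1, near, far)
        if n0 <= n1:
            closer, farther = (c0, n0, f0), (c1, n1, f1)
        else:
            closer, farther = (c1, n1, f1), (c0, n0, f0)
        if farther[1] <= farther[2]:
            stack.append(farther)              # pushed first, popped later
        if closer[1] <= closer[2]:
            stack.append(closer)               # visited next
    return best_t, best_prim

tree = (0, (0.0, 1.0), (0.5, 2.0), 0, 1)       # overlapping child intervals along x
hits = {0: 1.8, 1: 1.6}                        # hypothetical hit distances per primitive
print(traverse(tree, (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), lambda p, o, dd: hits.get(p)))
# (1.6, 1)
```

In the example the closer child yields a hit at t = 1.8, but the farther child's interval starts at t = 1.5, so it is still visited and supplies the actual closest hit.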

Traversal Processing Unit

• Stack control computes next address

• Next node is fetched from cache

• 4 traversal slices compute 4x4 distances to bounding planes

• 4 Decision Units compute per ray traversal decision

• Packet Decision Unit computes packet traversal decision

• Packet goes left if there exists a ray that goes left

• Packet goes right if there exists a ray that goes right

• Packet goes from left to right if there exists a ray that goes into both children from left to right

Incoherent packets are possible

[Figure: Traversal Processing Unit. The Stack Control Unit starts traversal and drives the Memory Access Unit (128-bit Node Cache, fed from memory); Slices 0-3 feed Decide0-Decide3, whose results are combined by the Packet Decision Unit; output goes to the Geometry Unit; traversal finishes when the stack is empty.]
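
The packet rules above amount to OR-ing the four per-ray decisions. A minimal sketch with hypothetical names (`RayDecision`, `packet_decision`); the slides do not say which order is used when no ray visits both children from left to right, so this sketch assumes right-to-left in that case.

```python
from typing import List, NamedTuple, Sequence

class RayDecision(NamedTuple):
    go_left: bool      # ray's clipped interval overlaps the left child's bounds
    go_right: bool     # ray's clipped interval overlaps the right child's bounds
    left_first: bool   # if both, the ray enters the left child first

def packet_decision(rays: Sequence[RayDecision]) -> List[str]:
    """Combine per-ray decisions into the order in which the whole packet visits children."""
    go_left = any(r.go_left for r in rays)
    go_right = any(r.go_right for r in rays)
    left_to_right = any(r.go_left and r.go_right and r.left_first for r in rays)
    if go_left and go_right:
        # Assumption: right-to-left unless some ray visits both children left to right.
        return ["left", "right"] if left_to_right else ["right", "left"]
    if go_left:
        return ["left"]
    if go_right:
        return ["right"]
    return []          # no ray overlaps either child: continue with the stack

packet = [RayDecision(True, False, True), RayDecision(False, True, False),
          RayDecision(False, False, False), RayDecision(True, True, True)]
print(packet_decision(packet))  # ['left', 'right']
```

Rays that do not overlap the visited child simply travel along with the packet, which is how incoherent packets are handled at the cost of wasted work.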

Update of B-KD Trees

• Leaf Node

• Fetch vertices

• Compute leaf boxes

• Inner Node

• Update 1D node bounds

• Merge boxes of both children

[Figure: bottom-up update over a tree with three leaves, showing leaf boxes and merged boxes in x and y]
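
A recursive software sketch of this bottom-up update, assuming triangles are index triples into a shared vertex array and a hypothetical `Inner` node class; the hardware instead uses partial breadth-first execution. The topology and split axes never change; only the 1D child bounds are rewritten, and every node is visited once, which gives the linear update cost.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple, Union

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]
Interval = Tuple[float, float]

@dataclass
class Inner:
    axis: int                 # split axis, fixed at build time
    left: "Node"
    right: "Node"
    bounds: Tuple[Interval, Interval] = ((0.0, 0.0), (0.0, 0.0))  # rewritten by refit

Node = Union[Inner, int]      # int = leaf holding one triangle index

def merge(a: Box, b: Box) -> Box:
    return (tuple(min(a[0][k], b[0][k]) for k in range(3)),
            tuple(max(a[1][k], b[1][k]) for k in range(3)))

def refit(node: Node, triangles: Sequence[Tuple[int, int, int]], vertices: Sequence[Vec3]) -> Box:
    """Recompute all bounds bottom-up and return the subtree's box."""
    if isinstance(node, int):
        # Leaf node: fetch the three vertices and compute the leaf box.
        vs = [vertices[i] for i in triangles[node]]
        return (tuple(min(v[k] for v in vs) for k in range(3)),
                tuple(max(v[k] for v in vs) for k in range(3)))
    lbox = refit(node.left, triangles, vertices)
    rbox = refit(node.right, triangles, vertices)
    a = node.axis
    # Inner node: update the 1D node bounds from the children's boxes ...
    node.bounds = ((lbox[0][a], lbox[1][a]), (rbox[0][a], rbox[1][a]))
    # ... and merge both boxes for the parent.
    return merge(lbox, rbox)

vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
            (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
triangles = [(0, 1, 2), (3, 4, 5)]
root = Inner(axis=0, left=0, right=1)
vertices[4] = (5.0, 0.0, 0.0)          # animate one vertex
refit(root, triangles, vertices)
print(root.bounds)  # ((0.0, 1.0), (2.0, 5.0))
```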

Update Processor

• ¼ more memory for instructions

• Optimized Instruction Set

• Load vertex

• Merge 3 vertices to a box

• Merge 2 boxes (plus update node)

• 64 Vertex and 64 Box Registers

• Optimal re-use of data

• Stream Based

• Reads one instruction stream

• Writes a sequential node stream

• Vertices are accessed as sequentially as possible

[Figure: Update Processor pipeline. Instruction Fetch (from memory) and Instruction Scheduler; Register Access (read 3 vertices or 2 boxes); Fetch Vertex Unit (vertex address and destination, from memory); Merge Unit (merge vertices, merge boxes) with vertex and box writeback; Node Update (child bounds, nodes to memory); 128-bit and 4x32-bit datapaths.]
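
A toy interpreter illustrating how the three instruction types could drive an update from a precomputed instruction stream. The opcode names, operand order, and node output format are hypothetical; only the instruction kinds and the 64 vertex / 64 box registers come from the slide. The example loads the shared vertex 2 once and re-uses it from its register for both triangles.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]

NUM_REGS = 64  # 64 vertex and 64 box registers

def run_update_program(program, vertices) -> List[tuple]:
    """Execute the instruction stream; return the sequential node stream it writes."""
    vregs: List[Vec3] = [(0.0, 0.0, 0.0)] * NUM_REGS
    bregs: List[Box] = [((0.0, 0.0, 0.0), (0.0, 0.0, 0.0))] * NUM_REGS
    nodes = []
    for instr in program:
        op = instr[0]
        if op == "LOAD_VERTEX":            # vreg[dst] <- vertex from memory
            _, dst, index = instr
            vregs[dst] = vertices[index]
        elif op == "MERGE_VERTICES":       # breg[dst] <- box of 3 vertex registers
            _, dst, a, b, c = instr
            vs = (vregs[a], vregs[b], vregs[c])
            bregs[dst] = (tuple(min(v[k] for v in vs) for k in range(3)),
                          tuple(max(v[k] for v in vs) for k in range(3)))
        elif op == "MERGE_BOXES":          # breg[dst] <- union of 2 boxes, plus node update
            _, dst, a, b, axis = instr
            ba, bb = bregs[a], bregs[b]
            nodes.append((axis, (ba[0][axis], ba[1][axis]), (bb[0][axis], bb[1][axis])))
            bregs[dst] = (tuple(min(ba[0][k], bb[0][k]) for k in range(3)),
                          tuple(max(ba[1][k], bb[1][k]) for k in range(3)))
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return nodes

vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
program = [
    ("LOAD_VERTEX", 0, 0), ("LOAD_VERTEX", 1, 1), ("LOAD_VERTEX", 2, 2),
    ("LOAD_VERTEX", 3, 3), ("LOAD_VERTEX", 4, 4),
    ("MERGE_VERTICES", 0, 0, 1, 2),   # leaf box of triangle (0, 1, 2)
    ("MERGE_VERTICES", 1, 2, 3, 4),   # vertex register 2 is re-used, not re-fetched
    ("MERGE_BOXES", 2, 0, 1, 0),      # parent box on split axis x, emits one node
]
print(run_update_program(program, vertices))  # [(0, (0.0, 1.0), (1.0, 2.0))]
```

Since the instruction stream is generated once per tree topology, it can be re-run unchanged every frame after the vertices move; that stream is what costs the extra instruction memory.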

Prototype Implementation

Hardware

• FPGA board from Alpha Data

• Xilinx Virtex-4 LX160

• 128 MB DDR Memory

Implementation

• Packets of 4 rays

• 32 packets of rays

• 24 bit floating point

• 66 MHz

[Figure: Virtex-4 board]

Results

Update Performance

• 66 million B-KD tree node updates per second (peak)

• 200 updates per second for characters with 80k triangles

• 1% to 15% of rendering time

Ray Casting Performance

• 2 to 8 million rays per second

• 10 to 40 fps at 512x384
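
A rough consistency check of these figures. It assumes the peak rate of one node update per clock cycle at the 66 MHz prototype clock, a binary tree with one triangle per leaf (hence 2n - 1 nodes for n triangles, with every node counted as one update), and one primary ray per pixel at roughly 2 × 10^5 pixels per frame:

$$
\begin{aligned}
66 \times 10^{6}\ \tfrac{\text{cycles}}{\text{s}} \times 1\ \tfrac{\text{node}}{\text{cycle}} &= 66 \times 10^{6}\ \tfrac{\text{node updates}}{\text{s}} \\
\frac{66 \times 10^{6}}{2 \cdot 80\,000 - 1} &\approx 412.5\ \tfrac{\text{tree updates}}{\text{s}}\ \text{(peak)} \\
2 \times 10^{5}\ \tfrac{\text{rays}}{\text{frame}} \times (10 \ldots 40)\ \tfrac{\text{frames}}{\text{s}} &\approx (2 \ldots 8) \times 10^{6}\ \tfrac{\text{rays}}{\text{s}}
\end{aligned}
$$

Under these assumptions the reported 200 updates per second for 80k-triangle characters stays below the peak bound, and the frame-rate range matches 2 to 8 million rays per second when only primary rays are counted.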

Conclusions and Future Work

• Ray Tracing Hardware Design

• Efficient for coherent dynamic scenes

• Less efficient for non-continuous scene changes

• Working Prototype Implementation

• Even the FPGA achieves high performance

• 2x to 3x faster than OpenRT on a 2.6 GHz Pentium 4

• Post-layout ASIC results [RT06]

• 90 nm, 400 MHz, 200 mm², 19.5 GB/s

• Performs up to 40x faster (80-200 fps at 1024x768)

Live Demo

Questions?

• Project homepage: http://www.saarcor.de

• Computer Graphics Lab at Saarland University: http://graphics.cs.uni-sb.de

