PT-4055, Optimizing Raytracing on GCN with AMD Development Tools, by Tzachi Cohen

Presentation PT-4055 by Tzachi Cohen at the AMD Developer Summit (APU13) November 11-13, 2013.
OPTIMIZING RAYTRACING ON GCN WITH AMD DEVELOPMENT TOOLS

TZACHI COHEN NOVEMBER 2013

AGENDA

Overview of Raytracing & KD Trees

Review of GCN Architecture

Mapping Raytracing to GPUs

Optimizing Raytracing using CodeXL

Overview Of Raytracing

ACCELERATION STRUCTURES TRADE OFFS

[Figure: Bounding Volume Hierarchies, KD Tree, and Uniform Grid compared by Construction Speed and Tracing Speed.]

HIERARCHICAL KD TREE – 2D

[Figure: a 2D KD tree with nodes labeled A to G, drawn both as a hierarchy and as a subdivision of the plane.]

KD TREE – 3D

STACK BASED TRAVERSAL KD TREE – 2D

[Figure: a ray crossing the 2D KD tree (nodes A to G), with the parametric distances tMin, t1, t2, and tMax marked along the ray.]

TRAVERSING KD TREES – PSEUDO CODE

stack.push(KDroot, sceneMin, sceneMax)
tHit = infinity
while !(stack.empty()):
    (node, tStart, tEnd) = stack.pop()
    while !(node.isLeaf()):
        tSplit = (node.value - ray.origin[node.axis]) / ray.direction[node.axis]
        (near, far) = findNear(ray.origin[node.axis], node.left, node.right)
        if (tSplit >= tEnd or tSplit < 0):
            node = near
        else if (tSplit <= tStart):
            node = far
        else:
            stack.push(far, tSplit, tEnd)
            node = near
            tEnd = tSplit
    for prim in node.primitives():
        tHit = min(tHit, prim.Intersect(ray))
    if tHit < tEnd:
        return tHit
return tHit
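The pseudocode above can be tried on the CPU. What follows is a minimal Python sketch, not the presenter's kernel. The `Node` and `Sphere` classes, the `find_near` helper (which here also takes the split position), and the guard for rays parallel to the split plane are assumptions made for illustration. Rays that start exactly on a split plane are not handled specially.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    """Hypothetical primitive; works in any dimension."""
    center: tuple
    radius: float

    def intersect(self, origin, direction):
        # Nearest hit distance along the ray, or infinity on a miss.
        # Assumes `direction` has unit length.
        oc = [o - c for o, c in zip(origin, self.center)]
        b = sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0:
            return math.inf
        t = -b - math.sqrt(disc)
        return t if t >= 0 else math.inf


@dataclass
class Node:
    """Hypothetical KD node; a non-None `prims` list marks a leaf."""
    axis: int = 0
    value: float = 0.0
    left: "Node" = None
    right: "Node" = None
    prims: list = None


def find_near(origin_coord, split, left, right):
    # The child on the ray origin's side of the plane is the near one.
    return (left, right) if origin_coord < split else (right, left)


def trace_stack(root, origin, direction, scene_min, scene_max):
    stack = [(root, scene_min, scene_max)]
    t_hit = math.inf
    while stack:
        node, t_start, t_end = stack.pop()
        while node.prims is None:
            a = node.axis
            d = direction[a]
            # Assumption: a ray parallel to the plane never crosses it.
            t_split = (node.value - origin[a]) / d if d != 0 else math.inf
            near, far = find_near(origin[a], node.value, node.left, node.right)
            if t_split >= t_end or t_split < 0:
                node = near
            elif t_split <= t_start:
                node = far
            else:
                stack.append((far, t_split, t_end))
                node = near
                t_end = t_split
        for prim in node.prims:
            t_hit = min(t_hit, prim.intersect(origin, direction))
        if t_hit < t_end:
            return t_hit
    return t_hit


# Two leaves split at x = 5, one sphere in each.
root = Node(axis=0, value=5.0,
            left=Node(prims=[Sphere((2.0, 0.0), 1.0)]),
            right=Node(prims=[Sphere((8.0, 0.0), 1.0)]))
print(trace_stack(root, (0.0, 0.0), (1.0, 0.0), 0.0, 10.0))
```

Each stack entry holds a node and two distances, which is the per-thread "frame" whose storage cost is discussed later in the deck.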

GCN ARCHITECTURE

First introduced with the “Southern Islands” family of GPUs.

Is available with the upcoming “Kaveri” APU.

Scalar architecture.

ECC support (on some models).

Double precision support.

Multiple concurrent queues for compute.

GPU SCALAR ARCHITECTURE VS CPU SSE EXTENSIONS

float x;

x = x+1;

[Figure: the scalar code above shown against a grid of 16 threads, Thread 1 to Thread 16.]

Scalar code does not utilize the SSE capabilities of the CPU.

HOW SCALAR CODE IS EXECUTED

GCN

float x;

x = x+1;

[Figure: the scalar code above executed across threads T1 to T16 on GCN.]

IMPLICATIONS FOR RAY TRACING

Ray packetization – having a single thread trace several rays in one KD tree traversal to achieve better utilization of the SIMD units and cache.

No explicit ray packetization is required on GCN.

The hardware implicitly packetizes every 64 threads: all 64 threads of a wavefront execute the same instruction together.

A SEQUENCER FOR EVERY COMPUTE UNIT

[Figure: four compute units, each with its own sequencer (SQ).]

A sequencer is a HW block responsible for issuing program instructions.

A compute unit can run up to 40 Wavefronts each with a distinct program counter.

GPU under-utilization due to long traversing rays may happen only on the Wavefront level.
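To make the wavefront-level cost concrete, here is a deliberately simplified Python model. It assumes that a wavefront occupies its 64 lanes until its slowest ray finishes, and that every traversal step costs the same. The function name and the step counts are invented for illustration and are not measurements.

```python
WAVEFRONT_SIZE = 64  # threads per wavefront on GCN, per the slides


def lane_utilization(steps_per_ray):
    """Fraction of issued lane-steps that do useful work (toy model)."""
    useful = sum(steps_per_ray)
    issued = 0
    for start in range(0, len(steps_per_ray), WAVEFRONT_SIZE):
        wave = steps_per_ray[start:start + WAVEFRONT_SIZE]
        # Assumption: all 64 lanes stay occupied until the longest ray is done.
        issued += max(wave) * WAVEFRONT_SIZE
    return useful / issued


uniform = [10] * 64              # every ray takes 10 steps
one_long_ray = [10] * 63 + [100]  # a single ray takes 100 steps

print(lane_utilization(uniform))
print(lane_utilization(one_long_ray))
```

In this model a single long ray drags down only its own wavefront; other wavefronts on the same compute unit keep their own program counters and are unaffected.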

HOW MUCH ON CHIP MEMORY DO WE HAVE?

HD 7970 – “Tahiti”

256 KB VGPR per CU X 32 = 8.192 MB

8 KB SGPR per CU X 32 = 0.256 MB

16 KB L1 V-Data cache per CU X 32 = 0.512 MB

16 KB L1 S-Data cache per 4 CUs X 8 = 0.128 MB

32 KB instruction cache per 4 CUs X 8 = 0.256 MB

L2 Data Cache = 768 KB

LDS 64 KB per CU X 32 = 2.048 MB

Total: 12.16 MB
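The total can be checked with a few lines of Python. This sketch only re-adds the per-block figures from the slide. The dictionary keys are descriptive labels, and the division by 1000 follows the slide's apparent convention of 1 MB = 1000 KB.

```python
CUS = 32  # compute units on the HD 7970 "Tahiti", per the slide

on_chip_kb = {
    "VGPR (256 KB per CU)": 256 * CUS,
    "SGPR (8 KB per CU)": 8 * CUS,
    "L1 vector data cache (16 KB per CU)": 16 * CUS,
    "L1 scalar data cache (16 KB per 4 CUs)": 16 * (CUS // 4),
    "Instruction cache (32 KB per 4 CUs)": 32 * (CUS // 4),
    "L2 data cache": 768,
    "LDS (64 KB per CU)": 64 * CUS,
}

total_kb = sum(on_chip_kb.values())
total_mb = total_kb / 1000  # the slide treats 1 MB as 1000 KB

for name, kb in on_chip_kb.items():
    print(f"{name}: {kb} KB")
print(f"Total: {total_kb} KB = {total_mb} MB")
```

The vector register file alone accounts for roughly two thirds of the on-chip storage, which is why register pressure matters so much in the slides that follow.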

AMD CODEXL

Coherent, innovative and unified developer tools suite

‒ Debug, Profile, and Analyze applications

‒ Supports OpenCL™ and OpenGL.

‒ AMD CPUs, GPUs and APUs

‒ Standalone and integrated into Microsoft® Visual Studio®

‒ Supported on Windows® and Linux®

‒ Does not require source code modifications

BE SURE YOUR KERNEL SIZE DOES NOT EXCEED INSTRUCTION CACHE SIZE

Mapping Raytracing To GPUs

HOW CAN A GPU TRAVERSE A TREE?

[Figure: a binary tree of nodes, three levels deep.]

Place all the nodes in a single buffer and wrap the buffer with a CL mem object.

When using HSA we can leverage the unified memory architecture and access the tree as-is.
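One way to lay the nodes out in a single buffer is to replace child pointers with indices into a flat array. The Python sketch below illustrates the idea on the host side only. The record layout (axis, split, left index, right index), the use of `-1` to mark leaves, and the `flatten` function are hypothetical and are not taken from the presentation. No OpenCL calls are shown.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical pointer-based KD node; axis == -1 marks a leaf."""
    axis: int = -1
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# One fixed-size record per node: int axis, float split, int left, int right.
RECORD = struct.Struct("<ifii")


def flatten(root):
    """Serialize the tree in pre-order; children become array indices."""
    records = []

    def visit(node):
        index = len(records)
        records.append(None)  # reserve the slot before visiting children
        if node.axis < 0:
            records[index] = (-1, 0.0, -1, -1)
        else:
            left = visit(node.left)
            right = visit(node.right)
            records[index] = (node.axis, node.split, left, right)
        return index

    visit(root)
    return b"".join(RECORD.pack(*r) for r in records)


tree = Node(0, 5.0, Node(), Node(1, 2.0, Node(), Node()))
buffer = flatten(tree)
print(len(buffer) // RECORD.size, "nodes,", len(buffer), "bytes")
print(RECORD.unpack_from(buffer, 0))
```

A kernel would then follow a child by indexing into the buffer rather than dereferencing a host pointer. With shared virtual memory, as the HSA remark above suggests, this flattening step may not be needed.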

HOW MUCH MEMORY DO WE NEED FOR THE STACK?

Per wavefront = maximal depth of the tree X size of frame X 64 threads.

25 X 12 X 64 = ~19 KB

Leads to GPR spilling to local memory or low scheduling utilization.

GPRs spilled to local memory are also known as Scratch Registers.

The OpenCL compiler decides on GPR spilling at compile time.
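The estimate above is a straight product. The short Python sketch below spells it out. The reading of the slide's three factors as 25 levels, 12 bytes per frame, and 64 threads is an assumption, as are the variable names.

```python
max_tree_depth = 25      # deepest path the stack must be able to hold
frame_bytes = 12         # one stack entry, e.g. a node reference plus two distances
threads_per_wavefront = 64

stack_bytes = max_tree_depth * frame_bytes * threads_per_wavefront
stack_kib = stack_bytes / 1024

print(stack_bytes, "bytes per wavefront")
print(stack_kib, "KiB per wavefront")
```

That is 19,200 bytes, or 18.75 KiB, which the slide rounds to about 19 KB for a single wavefront.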

HOW TO DETECT SCRATCH REGISTERS USING CODEXL

STACKLESS TRACE – RESTART TRAVERSAL

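[Figure: a ray crossing the 2D KD tree (nodes A to G); the ray is divided into successive intervals bounded by tMin, t1, t2, t3, and tMax, and the traversal restarts from the root for each interval.]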

KD RESTART ALGORITHM

tStart = tEnd = sceneMin
tHit = infinity
while (tEnd < sceneMax):
    node = root
    tStart = tEnd
    tEnd = sceneMax
    while (not node.isLeaf()):
        axis = node.axis
        tSplit = (node.PlanePos - ray.origin[axis]) / ray.direction[axis]
        (near, far) = findNear(ray.origin[axis], node.left, node.right)
        if (tSplit >= tEnd or tSplit <= 0):
            node = near
        else if (tSplit <= tStart):
            node = far
        else:
            node = near
            tEnd = tSplit
    for prim in node.primitives():
        tHit = min(tHit, prim.Intersect(ray))
    if tHit < tEnd:
        return tHit
return tHit
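The restart loop can be tried on the CPU as well. The Python sketch below mirrors the pseudocode. The `Node` and `Sphere` classes, the `find_near` helper (which here also takes the split position), and the guard for rays parallel to the split plane are assumptions made for illustration and are not the presenter's code.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    """Hypothetical primitive; assumes unit-length ray directions."""
    center: tuple
    radius: float

    def intersect(self, origin, direction):
        oc = [o - c for o, c in zip(origin, self.center)]
        b = sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0:
            return math.inf
        t = -b - math.sqrt(disc)
        return t if t >= 0 else math.inf


@dataclass
class Node:
    """Hypothetical KD node; a non-None `prims` list marks a leaf."""
    axis: int = 0
    value: float = 0.0
    left: "Node" = None
    right: "Node" = None
    prims: list = None


def find_near(origin_coord, split, left, right):
    return (left, right) if origin_coord < split else (right, left)


def trace_restart(root, origin, direction, scene_min, scene_max):
    t_start = t_end = scene_min
    t_hit = math.inf
    while t_end < scene_max:
        # Restart from the root, resuming where the last leaf ended.
        node, t_start, t_end = root, t_end, scene_max
        while node.prims is None:
            a = node.axis
            d = direction[a]
            t_split = (node.value - origin[a]) / d if d != 0 else math.inf
            near, far = find_near(origin[a], node.value, node.left, node.right)
            if t_split >= t_end or t_split <= 0:
                node = near
            elif t_split <= t_start:
                node = far      # plane already passed in an earlier pass
            else:
                node = near     # nothing is pushed; just shorten the segment
                t_end = t_split
        for prim in node.prims:
            t_hit = min(t_hit, prim.intersect(origin, direction))
        if t_hit < t_end:
            return t_hit
    return t_hit


# Three leaves along x: [0, 5], [5, 7.5], [7.5, 10]; only the last has geometry.
last = Node(prims=[Sphere((9.0, 0.0), 0.5)])
right = Node(axis=0, value=7.5, left=Node(prims=[]), right=last)
root = Node(axis=0, value=5.0, left=Node(prims=[]), right=right)
print(trace_restart(root, (0.0, 0.0), (1.0, 0.0), 0.0, 10.0))
```

For this ray the outer loop runs three times, once per leaf along the ray. The traversal state is a handful of scalars rather than a per-thread stack, at the price of descending from the root again on each pass.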

EFFECT ON GPR SPILLAGE

Demo

Optimizing Raytracing using CodeXL

CAN THIS BE FURTHER REFINED?

Which on-chip memory aren’t we using?

LDS = Local Data Store.

Short Stack Algorithm – initialize a stack smaller than the maximum depth of the tree. If we overflow, fall back to the KD-Restart algorithm.

If we place the short stack in the LDS, what should be the depth of the “short stack”?
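The short-stack idea can be sketched in Python as follows. This is one common formulation and not necessarily the one used in the talk: on overflow the oldest entry is dropped, and when the stack runs dry before the ray has left the scene, traversal restarts from the root. The `Node` and `Sphere` classes, `find_near`, and the `depth` parameter are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Sphere:
    """Hypothetical primitive; assumes unit-length ray directions."""
    center: tuple
    radius: float

    def intersect(self, origin, direction):
        oc = [o - c for o, c in zip(origin, self.center)]
        b = sum(d * x for d, x in zip(direction, oc))
        c = sum(x * x for x in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0:
            return math.inf
        t = -b - math.sqrt(disc)
        return t if t >= 0 else math.inf


@dataclass
class Node:
    """Hypothetical KD node; a non-None `prims` list marks a leaf."""
    axis: int = 0
    value: float = 0.0
    left: "Node" = None
    right: "Node" = None
    prims: list = None


def find_near(origin_coord, split, left, right):
    return (left, right) if origin_coord < split else (right, left)


def trace_short_stack(root, origin, direction, scene_min, scene_max, depth):
    stack = []            # holds at most `depth` entries
    t_hit = math.inf
    t_end = scene_min
    while True:
        if stack:
            node, t_start, t_end = stack.pop()
        elif t_end < scene_max:
            # Stack ran dry early: entries were dropped, so restart.
            node, t_start, t_end = root, t_end, scene_max
        else:
            return t_hit
        while node.prims is None:
            a = node.axis
            d = direction[a]
            t_split = (node.value - origin[a]) / d if d != 0 else math.inf
            near, far = find_near(origin[a], node.value, node.left, node.right)
            if t_split >= t_end or t_split <= 0:
                node = near
            elif t_split <= t_start:
                node = far
            else:
                if depth > 0:
                    if len(stack) == depth:
                        stack.pop(0)  # overflow: forget the oldest entry
                    stack.append((far, t_split, t_end))
                node = near
                t_end = t_split
        for prim in node.prims:
            t_hit = min(t_hit, prim.intersect(origin, direction))
        if t_hit < t_end:
            return t_hit


left = Node(axis=0, value=2.5, left=Node(prims=[]), right=Node(prims=[]))
root = Node(axis=0, value=5.0, left=left,
            right=Node(prims=[Sphere((8.0, 0.0), 1.0)]))
for depth in (0, 1, 8):
    print(depth, trace_short_stack(root, (0.0, 0.0), (1.0, 0.0), 0.0, 10.0, depth))
```

With `depth=0` the function behaves like pure restart traversal, and with a depth at least as large as the tree it behaves like the full stack version. All three settings return the same hit distance for this scene.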

HOW MANY WAVEFRONTS ARE EXECUTED CONCURRENTLY

Use the CodeXL application trace to discover how many wavefronts are executed concurrently with stackless traversal.

OCCUPANCY GRAPHS

WHAT SHOULD BE THE SIZE OF THE SHORT STACK?

64 KB / 12 wavefronts / 64 threads / sizeof (Frame) = 7
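The division above can be reproduced directly. In this Python sketch the 12-byte frame size is an assumption carried over from the earlier stack-size estimate (25 X 12 X 64), and the 12 concurrent wavefronts are the figure quoted on this slide. The variable names are illustrative.

```python
lds_bytes = 64 * 1024          # LDS per compute unit
concurrent_wavefronts = 12     # as quoted on the slide
threads_per_wavefront = 64
frame_bytes = 12               # assumed sizeof(Frame), from the earlier estimate

bytes_per_thread = lds_bytes / (concurrent_wavefronts * threads_per_wavefront)
short_stack_depth = int(bytes_per_thread // frame_bytes)

print(bytes_per_thread, "bytes of LDS per thread")
print(short_stack_depth, "stack entries per thread")
```

Each thread gets about 85 bytes of LDS, which holds seven 12-byte frames with a little left over.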

Demo

RESULTS

[Figure: bar chart comparing four traversal variants (full stack, stackless, short stack, short stack on LDS); vertical axis from 60 to 120.]

Results are in millions of rays per second on Radeon™ HD 7970.

Questions?

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

OpenCL™ is a trademark of Apple Inc. which is licensed to the Khronos organization. Linux™ is the trademark of Linus Torvalds.

Microsoft™ and Windows™ are the trademarks of Microsoft Corp. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.
