GS-4147, TressFX 2.0, by Bill Bilodeau

Post on 05-Dec-2014



Presentation GS-4147 by Bill Bilodeau at the AMD Developer Summit (APU13) November 11-13, 2013


TressFX 2.0 AND BEYOND

BILL BILODEAU, AMD

DONGSOO HAN, AMD

2 TressFX 2.0 and Beyond NOVEMBER 12, 2013 | AMD DEVELOPER SUMMIT

TressFX 2.0 AND BEYOND

TressFX Overview

TressFX Rendering

‒ TressFX 2.0 improvements

TressFX Physics

Future Work

AGENDA

TressFX OVERVIEW


TressFX OVERVIEW

Realistic hair rendering and simulation

‒ Used in Tomb Raider

Goes beyond simple shells and fins representation used in games

Hair is rendered as thousands of strands with self shadowing, antialiasing and transparency

Physical simulation for each strand using GPU compute shaders

Very flexible to allow for different hair styles and different conditions

WHAT IS TressFX?


TressFX RENDERING

What goes into good hair?

‒ Anti-aliasing

‒ Volumetric self shadowing

‒ Transparency

WHAT MAKES IT LOOK GOOD

[Figure panels: Basic Rendering; Antialiasing; Antialiasing + Self Shadowing; Antialiasing + Self Shadowing + Transparency]

TressFX RENDERING


TressFX RENDERING

Kajiya-Kay Hair Lighting Model

‒ Anisotropic hair strand lighting model

‒ Uses the tangent along the strand instead of the normal for light reflections

‒ Instead of cos(N, H), use sin(T, H)

Marschner Model

‒ Two specular highlights

‒ Primary light colored highlight shifted towards the tip

‒ Secondary hair colored highlight shifted towards the root

‒ TressFX uses an approximation of the Marschner technique when rendering two highlights
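The sin(T, H) term can be sketched on the CPU as follows. This is a minimal illustration, not the TressFX shader: it assumes H is the half vector between the light and view directions, and `strand_specular`, `shift_tangent`, `exponent` and `shift` are hypothetical names. Tilting the tangent toward the normal is one common way to slide a highlight along the strand; which sign of `shift` moves it toward the tip or the root depends on the strand's orientation.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def strand_specular(tangent, light_dir, view_dir, exponent):
    """Kajiya-Kay style specular term: sin(T, H) raised to a power.

    Assumes light_dir + view_dir is not the zero vector.
    """
    t = normalize(tangent)
    l = normalize(light_dir)
    v = normalize(view_dir)
    h = normalize(tuple(a + b for a, b in zip(l, v)))  # half vector (assumed meaning of H)
    cos_th = dot(t, h)
    sin_th = math.sqrt(max(0.0, 1.0 - cos_th * cos_th))
    return sin_th ** exponent

def shift_tangent(tangent, normal, shift):
    """Tilt T toward N to move a highlight along the strand (assumed approximation)."""
    return normalize(tuple(t + shift * n for t, n in zip(tangent, normal)))
```

A primary and a secondary highlight would then call `strand_specular` twice with differently shifted tangents, exponents and colors.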

LIGHTING MODEL

Primary Highlights

Secondary Highlights


TressFX RENDERING

Every hair strand is anti-aliased manually

‒ Not using Hardware MSAA!

Compute pixel coverage on edges of hair strands and convert it to an alpha value
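As a rough illustration of turning coverage into alpha, the sketch below treats the problem in one dimension: the pixel is a unit-wide interval and the strand is an interval around its center line. The function name and the 1D box model are assumptions for illustration; the deck does not give the actual coverage computation.

```python
def strand_coverage(distance_px, half_width_px):
    """Fraction of a unit-wide pixel overlapped by a strand (1D box model).

    distance_px: distance from the pixel center to the strand center line, in pixels.
    half_width_px: half of the strand's projected width, in pixels.
    """
    d = abs(distance_px)
    pixel_lo, pixel_hi = d - 0.5, d + 0.5
    strand_lo, strand_hi = -half_width_px, half_width_px
    overlap = min(pixel_hi, strand_hi) - max(pixel_lo, strand_lo)
    return min(1.0, max(0.0, overlap))
```

The returned value would be used directly as the fragment's alpha, so a strand thinner than a pixel fades out instead of flickering.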

ANTI-ALIASING


TressFX RENDERING

Self Shadowing

‒ Uses a simplified Deep Shadow Map technique
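The deck does not spell out the simplification. One plausible reading, sketched here purely as an assumption, is to estimate how many fibers lie between the fragment and the nearest depth in an ordinary shadow map, and to attenuate light once per fiber. All names and parameters are hypothetical.

```python
def hair_shadow_light(fragment_depth, shadow_map_depth, fiber_spacing, fiber_alpha):
    """Fraction of light reaching a hair fragment.

    Assumption: fibers are evenly spaced in light-space depth, and each fiber
    blocks fiber_alpha of the light that reaches it.
    """
    depth_range = max(0.0, fragment_depth - shadow_map_depth)
    num_fibers = depth_range / fiber_spacing
    return (1.0 - fiber_alpha) ** num_fibers
```

A fragment at the shadow-map depth is fully lit, and light falls off smoothly for fragments deeper inside the hair volume.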

SELF SHADOWING

[Figure panels: No Self Shadows; With Self Shadows]


TressFX RENDERING

Order Independent Transparency (OIT) using a Per-Pixel Linked List (PPLL)

Fragments are stored in linked lists on the GPU

Nearest K fragments are rendered in back to front order

TRANSPARENCY

[Figure panels: No Transparency; With Transparency]


TressFX 1.0 RENDERING

TressFX 1.0 Rendering

‒ Render hair strand geometry into A-buffer

‒ Do lighting, shadowing, and antialiasing

‒ Store fragment color with depth and coverage in per-pixel linked list (PPLL)

‒ Render the K nearest fragments (K-buffer) in back to front order

‒ Blend nearest K fragments in the correct order with transparency

‒ Blend the remaining fragments without sorting

How rendering was done in version 1.0
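The last three steps can be sketched per pixel as below. This is an illustrative CPU model with scalar colors, not the shader: `resolve_pixel` and `over` are hypothetical names, and smaller depth is assumed to mean nearer to the camera.

```python
import heapq

def over(src_color, src_alpha, dst_color):
    """Standard 'over' blend of one fragment onto the color behind it."""
    return src_color * src_alpha + dst_color * (1.0 - src_alpha)

def resolve_pixel(fragments, background, k):
    """fragments: list of (depth, color, alpha) in arrival order."""
    indices = range(len(fragments))
    nearest = set(heapq.nsmallest(k, indices, key=lambda i: fragments[i][0]))
    color = background
    # Remaining fragments: blended in arrival order, without sorting.
    for i, (_, c, a) in enumerate(fragments):
        if i not in nearest:
            color = over(c, a, color)
    # Nearest K fragments: blended back to front, so the order is correct.
    for i in sorted(nearest, key=lambda i: fragments[i][0], reverse=True):
        _, c, a = fragments[i]
        color = over(c, a, color)
    return color
```

Only the K nearest fragments pay for sorting; everything behind them contributes to the result but its order errors are mostly hidden.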


TressFX 1.0 RENDERING A-BUFFER PASS

[Diagram: Hair Geometry → Vertex Shader → Pixel Shader (Coverage, Lighting, Shadows) → Head UAV and PPLL UAV. Each PPLL node stores depth, color, coverage, next.]


TressFX 1.0 RENDERING

GPU implementation of order independent transparency (OIT)

Head UAV

‒ Each pixel location has a “head pointer” to a linked list in the PPLL UAV

PPLL UAV

‒ As new fragments are rendered, they are added to the next open location in the PPLL (using UAV counter)

‒ A link is created to the fragment pointed to by the head pointer

‒ Head pointer then points to the new fragment

PER-PIXEL LINKED LIST

[Diagram: Head UAV and PPLL UAV]

// Retrieve current pixel count and increase counter
uint uPixelCount = LinkedListUAV.IncrementCounter();
uint uOldStartOffset;

// Exchange indices in LinkedListHead texture corresponding to pixel location
InterlockedExchange(LinkedListHeadUAV[address], uPixelCount, uOldStartOffset);

// Append new element at the end of the Fragment and Link Buffer
Element.uNext = uOldStartOffset;
LinkedListUAV[uPixelCount] = Element;
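The same insertion logic can be modelled on the CPU to see how the lists grow. This is a single-threaded sketch: the counter increment and head exchange are atomic on the GPU but plain assignments here, and the class name, the `END_OF_LIST` sentinel and the drop-on-overflow behavior are assumptions for illustration.

```python
END_OF_LIST = 0xFFFFFFFF  # assumed sentinel for "no next node"

class PerPixelLinkedList:
    def __init__(self, width, height, max_nodes):
        self.width = width
        self.head = [END_OF_LIST] * (width * height)  # plays the role of the Head UAV
        self.nodes = [None] * max_nodes               # plays the role of the PPLL UAV
        self.counter = 0                              # plays the role of the UAV counter

    def append(self, x, y, fragment):
        if self.counter >= len(self.nodes):
            return False  # node pool exhausted: the fragment is dropped
        index = self.counter            # IncrementCounter()
        self.counter += 1
        address = y * self.width + x
        old_head = self.head[address]   # InterlockedExchange on the head pointer
        self.head[address] = index
        self.nodes[index] = (fragment, old_head)  # new node links to the old head
        return True

    def fragments_at(self, x, y):
        index = self.head[y * self.width + x]
        while index != END_OF_LIST:
            fragment, index = self.nodes[index]
            yield fragment
```

Walking a pixel's list returns fragments newest first, which is why a later pass has to pick and order the nearest ones.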


TressFX 1.0 RENDERING K-BUFFER PASS

[Diagram: Full Screen Quad → Vertex Shader → Pixel Shader (K-Buffer, Transparency), reading PPLL nodes that each store depth, color, coverage]


TressFX 1.0 RENDERING

Observation

‒ All fragments are lit and shadowed equally

‒ Even the ones buried under dozens of hair fragments that you can’t see

Solution

‒ Defer the lighting and shadowing until the k-buffer pass

‒ Render the nearest K fragments with high quality

‒ Render the remaining fragments with lower quality (but faster)

HOW CAN WE MAKE IT FASTER?


TressFX 2.0 RENDERING A-BUFFER PASS

[Diagram: Hair Geometry → Vertex Shader → Pixel Shader (Coverage). Each PPLL node stores depth, coverage, tangent, next.]


TressFX 2.0 RENDERING K-BUFFER PASS

[Diagram: Full Screen Quad → Vertex Shader → Pixel Shader (K-Buffer, Lighting, Shadows, Transparency), reading PPLL nodes that each store depth, coverage, tangent]


TressFX 2.0 IMPROVEMENTS

Distance to camera can be used for reducing the density of the hair

‒ Uniformly remove hair strands from the rendering

‒ To compensate for missing strands, thicken the hair

‒ Adjust the minimum pixel coverage with distance
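One way to picture the continuous LOD is a density that falls linearly with distance, with strand thickness scaled by the inverse so the overall coverage stays similar. The linear ramp, the inverse scaling and all names below are assumptions for illustration, not the shipped parameters.

```python
def lod_parameters(distance, lod_start, lod_end, min_density):
    """Return (density, thickness_scale) for a given camera distance."""
    t = (distance - lod_start) / (lod_end - lod_start)
    t = min(1.0, max(0.0, t))
    density = 1.0 - t * (1.0 - min_density)   # fraction of strands still drawn
    thickness_scale = 1.0 / density           # thicken what remains to compensate
    return density, thickness_scale

def strand_is_rendered(strand_index, density):
    """Keep an evenly spread subset of strands for the given density."""
    return int((strand_index + 1) * density) != int(strand_index * density)
```

Because `density` changes smoothly with distance, strands drop out a few at a time instead of popping between discrete LOD levels.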

CONTINUOUS LODs

[Figure panels: Full Density Hair; Reduced Density Hair; Reduced Density with Thicker Strands]


TressFX 2.0 IMPROVEMENTS

TressFX11 Sample Code is much more modular

All of the necessary TressFX code is in separate files for

‒ Rendering

‒ Simulation

‒ Mesh management

‒ Asset loading

Code for head rendering and sample framework are completely separate

‒ Take the “TressFX” files to get just what you need

Better variable names

Removal of dead code

CODE RESTRUCTURING

[Diagram: module structure. Boxes: Main; TressFXSimulate; TressFXRender; SceneRender; TressFXMesh; TressFXAssetLoader; Gaussian Filter; DX11Mesh; ObjImport. Legend: TressFX Code]


TressFX 2.0 IMPROVEMENTS

Vertex shader optimizations for rendering

‒ Draw call for hair now uses an index buffer with a triangle list instead of looking up indices from a buffer

PPLL head buffer uses a RWTexture2D for better caching (tiled)

Hair shadow on model is softer and less blocky

Various shader code optimizations

Porting Guide

Download the new TressFX 2.0 sample soon from our Radeon SDK:

http://developer.amd.com

MISCELLANEOUS IMPROVEMENTS


TressFX 2.0 RENDERING

A-Buffer

‒ 2 UAVs

‒ Size determined by resolution

‒ Head of the Linked List UAV

    ‒ Screen resolution RWTexture2D, DXGI_FORMAT_R32_UINT

‒ Per-Pixel Linked List UAV

    ‒ Structured Buffer, size = (number of pixels) x (avg hair layers) x sizeof(LinkedListStructure)

    ‒ Default average number of hair layers is 8

    ‒ Linked list structure is currently 3 DWORDs: depth, coverage, tangent

Limited memory, but unbounded linked list

‒ This means too many fragments for a given pixel can overflow the PPLL

‒ Can cause artifacts

‒ Typically this only happens if the camera gets too close
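The size formula above can be worked through directly. The sketch below uses only the figures on the slide (one R32_UINT head pointer per pixel, 8 average layers, 3 DWORDs per node); the function name is hypothetical.

```python
DWORD_BYTES = 4

def a_buffer_bytes(width, height, avg_layers=8, node_dwords=3):
    """Return (head_bytes, ppll_bytes) for the two A-buffer UAVs."""
    pixels = width * height
    head_bytes = pixels * DWORD_BYTES  # one R32_UINT head pointer per pixel
    ppll_bytes = pixels * avg_layers * node_dwords * DWORD_BYTES
    return head_bytes, ppll_bytes

for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    head, ppll = a_buffer_bytes(w, h)
    print(name, round((head + ppll) / 1e6, 1), "MB")
```

With these inputs the per-pixel linked list dominates: about 88 MB at 720p and about 199 MB at 1080p, against a few MB for the head buffer.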

MEMORY CONSIDERATIONS

[Chart: Total A-Buffer Memory (MB) at 720p and 1080p, split into Linked List Head and Per-Pixel Linked List; axis from 0 to 250 MB]


TressFX 2.0 RENDERING PERFORMANCE RESULTS

[Charts: Total Hair Render Time (ms), split into A-Buffer Pass and K-Buffer Pass, for TressFX 1.0 and TressFX 2.0, on the R9 290X and the R9 280X]

> 2X performance increase!

TressFX SIMULATION


TressFX 1.0 Simulation Overview

‒ Main Interest

‒ Simulation Overview

‒ Constraints

    ‒ Global Shape Constraints

    ‒ Local Shape Constraints

    ‒ Edge Length Constraints

‒ Problems

TressFX Beyond

‒ General Constraint Formulation

‒ Tridiagonal Matrix-free Formulation

‒ Solving Linear System

‒ Benefits

TressFX Simulation Topics


Main Interest

Main interest of TressFX simulation

‒ Performance, performance and performance! – DirectCompute

‒ Styled hair – bending and twisting forces are important

‒ Stability – position based dynamics

‒ Conditions – wet, dry or heavy

‒ Wind – helps express dynamics even when the character is idle


Simulation Overview

CPU

    load hair data

    precompute rest-state values – can be offline

GPU – DirectCompute

    while simulation running do

        apply gravity

        integrate

        apply GSC (Global Shape Constraints)

        apply LSC (Local Shape Constraints)

        apply wind

        apply ELC (Edge Length Constraints)

        collision handling

GPU – Rendering pipeline

    vertex buffer
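The "apply gravity" and "integrate" steps are not detailed in the deck. A position-based scheme commonly pairs with Verlet integration, so the sketch below assumes that; `verlet_step` and `damping` are hypothetical names.

```python
def verlet_step(position, previous, acceleration, dt, damping=0.0):
    """Advance one vertex by one time step using position Verlet (assumed integrator).

    Returns (new_position, new_previous); velocity is implicit in the
    difference between the current and previous positions.
    """
    new_position = tuple(
        p + (1.0 - damping) * (p - q) + a * dt * dt
        for p, q, a in zip(position, previous, acceleration)
    )
    return new_position, position
```

Because velocity is never stored, the constraint passes that follow can move positions directly and the next step picks up the change automatically.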


GLOBAL SHAPE CONSTRAINTS

GSC (Global Shape Constraints)

‒ The initial positions of particles serve as the global goal positions

‒ The goal positions are rigid with respect to the character's head transform

‒ Think of the initial positions as a cage that traps the vertices during simulation

‒ Easy and cheap. Helps maintain the global shape, but loses simulation detail

[Figure labels: initial goal position; current position; final position]
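In position-based terms, pulling a vertex toward its goal can be sketched as a simple blend. The `stiffness` parameter and function name are assumptions; `goal` is taken to be the vertex's initial position already moved by the head transform.

```python
def apply_global_shape_constraint(position, goal, stiffness):
    """Move a vertex part of the way toward its global goal position.

    stiffness = 0 leaves the vertex free; stiffness = 1 snaps it to the goal.
    """
    return tuple(p + stiffness * (g - p) for p, g in zip(position, goal))
```

A stiffness that is high near the root and lower toward the tip would keep the hairstyle in place while letting the ends move.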


LOCAL SHAPE CONSTRAINTS

LSC (Local Shape Constraints)

‒ The goal positions are determined in local frames.

‒ The goal positions are then transformed into the world frame and applied to vertex positions.

[Figure labels: initial goal position; current position; final position]


LOCAL SHAPE CONSTRAINTS – CONT’

Local Transforms

‒ As in a robotic arm, an open-chain structure has joints, and each joint has a parent-child relationship with its connected joints.

‒ $T_{i-1}^{i}$ transforms (translates and rotates) child space $i$ into its parent space $i-1$

‒ Chaining the local transforms along the strand gives the global transform:

$$T_{w}^{i} = T_{w}^{0} \cdot T_{0}^{1} \cdots T_{i-2}^{i-1} \cdot T_{i-1}^{i}$$

‒ Local frames should be updated at each particle
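The chain product of local transforms can be sketched as a running multiplication from the root outward. For brevity this uses 2D homogeneous 3x3 matrices; the real frames are 3D, and all function names are hypothetical.

```python
import math

def make_transform(angle, tx, ty):
    """2D rotation by angle followed by translation (tx, ty), as a 3x3 matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def multiply(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def global_transforms(root_to_world, local_transforms):
    """Return the world transform of every joint in the chain.

    Each entry is the previous joint's world transform times the next
    child-to-parent local transform, so the walk is serial along one strand.
    """
    result = [root_to_world]
    for local in local_transforms:
        result.append(multiply(result[-1], local))
    return result
```

Each strand needs its own serial walk, but strands do not depend on one another, which is what makes one-thread-per-strand processing possible.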


LOCAL SHAPE CONSTRAINTS – CONT’

Initialize and update local and global transforms

‒ Initialization is performed only once, on the CPU or offline.

‒ The update is performed every frame on the GPU.

‒ The update is serial along a strand but independent of other strands, so many strands are updated in parallel on the GPU.

‒ With local and global transforms, we can calculate target vertex positions for local shape constraints.

‒ Finally, update two neighboring vertices to get stable convergence.

[Figure: vertices i-1 and i. Labels: Computing on local transform; Updating position; Zero]


EDGE LENGTH CONSTRAINTS

[Equation not recovered in this transcript. Surviving annotations: the factor 0.5; "how much stretched or compressed"; "unit edge vector"]
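The surviving annotations match the usual position-based distance constraint, in which each endpoint takes half of the length error along the unit edge vector. The sketch below is that standard form under an equal-mass assumption, not a recovery of the slide's equation; the function name is hypothetical.

```python
import math

def apply_edge_length_constraint(p0, p1, rest_length):
    """Move both endpoints so the edge returns to rest_length (equal weights assumed)."""
    edge = tuple(b - a for a, b in zip(p0, p1))
    length = math.sqrt(sum(c * c for c in edge))
    if length == 0.0:
        return p0, p1  # direction undefined; leave the vertices alone
    stretch = length - rest_length            # how much stretched (+) or compressed (-)
    unit = tuple(c / length for c in edge)    # unit edge vector
    new_p0 = tuple(a + 0.5 * stretch * u for a, u in zip(p0, unit))
    new_p1 = tuple(b - 0.5 * stretch * u for b, u in zip(p1, unit))
    return new_p0, new_p1
```

A pinned root vertex would take none of the correction, so its neighbor would absorb the full error instead of half.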


Problems

Extreme acceleration

‒ When the character makes a sudden move, it can generate extreme linear and angular acceleration, which stretches the hair far beyond its rest length.

‒ Even with many Edge Length Constraint iterations, the hair doesn't recover its original length and, as a result, can look too stretchy.

‒ One workaround was to enforce Edge Length Constraints serially from the root to the tip of the hair, with extra damping – used for Tomb Raider

‒ We need a better way! And we did research!


Problems EXTREME ACCELERATION

Future TressFX Simulation


General Constraint Formulation


Tridiagonal Matrix Formulation

Special Formulation for Chain Structure such as Hair

‒ We don’t want to solve a big matrix equation, especially on the GPU!

‒ Let’s take advantage of linear topology and serial indexing

General case. We don’t want this!

Special case. Much simpler!

Known. Easy to compute.

Unknown. This is what we are solving for.


SOLVING LINEAR SYSTEM

Solving Linear System

‒ The formulation doesn’t require an explicit matrix – good for the GPU!

‒ Only the diagonal, super-diagonal and sub-diagonal elements are non-zero – sparse!

‒ The system is diagonally dominant – a good fit for a direct solver!

‒ We can use the tridiagonal matrix algorithm (Thomas algorithm)

‒ So we can solve it on the GPU!
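For reference, the Thomas algorithm itself fits in a few lines and touches only the three diagonals, never a full matrix. This is the generic textbook solver on a small example system, not the hair-specific formulation; the function and argument names are hypothetical.

```python
def thomas_solve(sub, diag, sup, rhs):
    """Solve a tridiagonal system in O(n).

    sub[i] multiplies x[i-1], diag[i] multiplies x[i], sup[i] multiplies x[i+1];
    sub[0] and sup[-1] are ignored. Assumes a diagonally dominant system, so
    no pivoting is needed.
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = sup[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        denom = diag[i] - sub[i] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = sup[i] / denom
        d_prime[i] = (rhs[i] - sub[i] * d_prime[i - 1]) / denom
    # Back substitution.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

The cost is one forward sweep and one backward sweep per strand, which is where the fixed computation cost comes from.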


FUR CASTLE


FUR MUSHROOM


GRASS


BENEFITS

No more iterations for Edge Length Constraints

‒ No need to guess the number of iterations

‒ Fixed computation cost

‒ Fast convergence


TressFX 2.0

TressFX 2.0 renders hair faster than the previous version

‒ More than 2X faster in some cases

TressFX is now fast enough to use on consoles

More modular code structure means easier porting to your game

Realistic physics for hair simulation can now be extended to other objects

Stay tuned for more!

‒ Ongoing research to improve and expand the use of this technology

CONCLUSIONS


REFERENCE

Real-time Hair Simulation with Efficient Hair Style Preservation – Han, et al. VRIPHYS 2012

Tridiagonal Matrix Formulation for Inextensible Hair Strand Simulation – Han, et al. VRIPHYS 2013


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.