Slusallek-SIG2011course-AnySL

AnySL: Efficient and Portable Multi-Language Shading

Philipp Slusallek, Sebastian Hack, Ralf Karrenberg, Dmitri Rubinstein

German Research Center for Artificial Intelligence (DFKI)
Intel Visual Computing Institute

Saarland University

Monday, August 15, 2011

Saarbrücken


Saarland Campus


Computer Science at the Saarland Campus

Multimodal Computing and Interaction


Shaders
● Programmable Shading
– Allows for controlling core rendering features
● Material properties, light emission, participating media, …
– Today: many different shading languages
● HLSL, GLSL, Cg, RenderMan, MetaSL, OSL, OpenRL, many C++ dialects, …
● Mostly the same features, expressed differently
– We need a portable way to exchange materials
● Common specification of shading features
● Ease implementation for different renderers and HW
● Here: Efficient and Portable Implementation


Shaders
• A plug-in for the innermost loops
– From one-liners to thousands of lines of code
– Run for every new ray, surface hit, light sample, …
● Sometimes, once for every MADD along a ray
• Efficient implementation
– Low-overhead interface to renderer
● Ideally works directly on internal data structures
– Highly optimized code for specific HW architectures
● Use of SIMD (SSE, AVX, PTX, …)


Implementation Choices

[Diagram: renderer (data, code) connected through glue code and a C/C++ API/ABI to a C++ shader compiled into a shader DSO/DLL]

● Shader code in C++
– API specifies interface to renderer
– Separate C/C++ compilation to DLL/DSO
– API gets mapped directly to platform-specific ABI
● Predefined data layout, function call overhead
● No optimization across the interface
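The first option can be sketched as follows, assuming a hypothetical C ABI: the renderer copies its hit data into a fixed-layout struct and calls the shader through a function pointer (in practice obtained from the DSO/DLL, e.g. via `dlsym`). The names `ShadingContext`, `shade_fn`, `RendererHit` and `shade_hit` are illustrative, not taken from any real renderer. The sketch shows both costs listed above: the copy into a predefined layout and an indirect call the compiler cannot inline.

```cpp
#include <cassert>

// Hypothetical C ABI shared by renderer and shader DSO/DLL.
// The layout is fixed, so the renderer must copy its data into it.
extern "C" {
struct ShadingContext {
    float P[3];  // hit position
    float N[3];  // shading normal (unit length)
    float I[3];  // incident ray direction (unit length)
    float u, v;  // surface parameters
};

struct Color {
    float r, g, b;
};

// Signature every shader DSO is expected to export.
typedef Color (*shade_fn)(const ShadingContext*);
}

// A shader as it might be compiled separately into a DSO/DLL:
// "facing ratio", i.e. -dot(N, I) clamped to [0, 1].
extern "C" Color facing_ratio_shader(const ShadingContext* ctx) {
    float d = -(ctx->N[0] * ctx->I[0] + ctx->N[1] * ctx->I[1] + ctx->N[2] * ctx->I[2]);
    d = d < 0.0f ? 0.0f : (d > 1.0f ? 1.0f : d);
    return Color{d, d, d};
}

// Renderer side: internal hit record with its own (different) layout.
struct RendererHit {
    float pos[3], normal[3], dir[3], uv[2];
};

// Glue code: copy into the ABI struct, then make an indirect call
// that the compiler cannot inline across the DSO boundary.
Color shade_hit(const RendererHit& hit, shade_fn shader) {
    assert(shader != nullptr);
    ShadingContext ctx;
    for (int i = 0; i < 3; ++i) {
        ctx.P[i] = hit.pos[i];
        ctx.N[i] = hit.normal[i];
        ctx.I[i] = hit.dir[i];
    }
    ctx.u = hit.uv[0];
    ctx.v = hit.uv[1];
    return shader(&ctx);
}
```

A shader from another DSO only has to export a function with the same `shade_fn` signature; the renderer never sees its internals, and the shader never sees the renderer's.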



[Diagram: renderer (data, code) connected through a generated API/ABI to an SL shader compiled into a shader DSO/DLL]

● Using a Shading Language Compiler
– Compiler can transform and optimize shader code
● E.g. use of renderer-internal APIs: no glue code
● Transform shaders to SIMD
– Requires a renderer- and language-specific compiler
● Most renderers support only one shading language
– Renderer-specific code gets embedded in the result




● AnySL: Embedded SL Compiler
– Any language compiled into portable format
– Types, data layout, interface not fixed yet
– Renderer supplies implementations at runtime
– Embedded compiler links and optimizes code

[Diagram: SL shader and renderer glue code (data, code) passed through the API to the embedded compiler (LLVM), which produces an optimized shader working directly on the renderer's data and code]


AnySL: Portable Shading
• “Any” shading language supported
– Currently: RenderMan, C++ dialects, JavaScript, …
• Common intermediate format
– Independent of renderer and HW architecture
• Easy implementation by renderer
– Need only supply the glue code
• Different backends
– Ray tracing: PBRT, Manta, RTfact, …
– Rasterization: deferred shading (with RTT)
– HW: x86, SSE, AVX, PTX, OpenCL, GLSL, …


AnySL & XML3D: Interactive RenderMan in Your Web Browser


AnySL

Implementation


AnySL: Implementation
Designing an Interpreter: Options
− Many opcodes with a large switch() statement
− Replace opcodes with function calls: “Subroutine Threaded Code”
− Long list of function calls, even for control flow (“if”) and types (allocate a “float”)
− Nice for portability: implementations can be replaced, e.g. use predication for “if” or substitute own “float” type
− Can be directly encoded in compiled code: use LLVM bitcode for representation → efficiency
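The two designs can be contrasted on a toy program computing `(a + b) * c`. This is a hypothetical sketch, not AnySL's instruction set: `run_switch` decodes opcodes in a `switch()` loop, while `shader_threaded` is the same program as a straight list of calls into an `Ops` table that the host can replace. In AnySL such calls are encoded in LLVM bitcode and later inlined by the embedded compiler; `std::function` is used here only to make the replacement visible at runtime.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy register file: instructions read/write float registers by index.
struct Regs { float r[8] = {}; };

// Option 1: opcode interpreter with a large switch().
enum class Op { Const, Add, Mul };
struct Instr { Op op; int dst, a, b; float imm; };

void run_switch(const std::vector<Instr>& prog, Regs& regs) {
    for (const Instr& in : prog) {
        switch (in.op) {
            case Op::Const: regs.r[in.dst] = in.imm; break;
            case Op::Add:   regs.r[in.dst] = regs.r[in.a] + regs.r[in.b]; break;
            case Op::Mul:   regs.r[in.dst] = regs.r[in.a] * regs.r[in.b]; break;
        }
    }
}

// Option 2: subroutine threaded code. Every "opcode" is a call to a
// function that the host (the renderer) may replace with its own version.
struct Ops {
    std::function<float(float, float)> add = [](float x, float y) { return x + y; };
    std::function<float(float, float)> mul = [](float x, float y) { return x * y; };
};

// The program "(a + b) * c" as a straight-line list of calls.
float shader_threaded(const Ops& ops, float a, float b, float c) {
    float t = ops.add(a, b);
    return ops.mul(t, c);
}
```

Swapping `Ops::mul` for another implementation changes the behavior of the threaded program without touching its call list, which is the portability property the slide refers to.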


Subroutine Threaded Code
[Figure: conversion to threaded code; its implementation (supplied by renderer)]

Handling control flow: RM illuminance loop
[Figure: original shader code; mapping to threaded code; possible implementation (supplied by renderer)]
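A possible mapping can be sketched under simple assumptions: the renderer holds a list of point lights without falloff, and colors are reduced to a single float. The RenderMan loop `illuminance(P, N, PI/2) { Ci += Cl * max(0, normalize(L) . N); }` becomes an ordinary `while` loop over calls to `illuminance_begin`, `illuminance_next`, `light_L` and `light_Cl`. These function names are hypothetical; the point is that the loop structure lives in threaded calls whose implementation the renderer chooses (here, a cone test over a light array).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    return {v.x / len, v.y / len, v.z / len};
}

// ---- Renderer-supplied implementation (hypothetical) ----
struct PointLight { Vec3 pos; float intensity; };

struct IlluminanceState {
    const std::vector<PointLight>* lights;
    std::size_t next;     // next light to examine
    std::size_t current;  // light selected by the last successful next()
    Vec3 P, N;
    float cos_cone;       // cosine of the cone half-angle around N
};

IlluminanceState illuminance_begin(const std::vector<PointLight>& lights,
                                   Vec3 P, Vec3 N, float angle) {
    return {&lights, 0, 0, P, N, std::cos(angle)};
}

// Advances to the next light inside the cone; returns false when done.
bool illuminance_next(IlluminanceState& s) {
    while (s.next < s.lights->size()) {
        std::size_t i = s.next++;
        Vec3 L = sub((*s.lights)[i].pos, s.P);
        if (dot(normalize(L), normalize(s.N)) >= s.cos_cone) {
            s.current = i;
            return true;
        }
    }
    return false;
}

Vec3 light_L(const IlluminanceState& s) { return sub((*s.lights)[s.current].pos, s.P); }
float light_Cl(const IlluminanceState& s) { return (*s.lights)[s.current].intensity; }

// ---- Shader body after mapping to threaded code ----
// illuminance(P, N, PI/2) { Ci += Cl * max(0, normalize(L) . N); }
float diffuse_shader(const std::vector<PointLight>& lights, Vec3 P, Vec3 N) {
    const float kHalfPi = 1.57079632679f;
    float Ci = 0.0f;
    IlluminanceState s = illuminance_begin(lights, P, N, kHalfPi);
    while (illuminance_next(s)) {
        float d = dot(normalize(light_L(s)), normalize(N));
        Ci += light_Cl(s) * (d > 0.0f ? d : 0.0f);
    }
    return Ci;
}
```

A renderer with area lights could implement the same four calls by iterating over light samples instead, without any change to the shader side.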


But Interpreters are Slow?!?
• STC is used for portable representation only
− Eliminated at runtime by the embedded compiler
• “Type Replacement”
− Substitute own types and operators
− Inline all interpreter calls
− Perform all the usual scalar optimizations
• Can be used for special shader functionality
− Taking derivatives of arbitrary expressions
− Bounding the result of a shader over intervals
● E.g. using Affine Arithmetic [Heidrich et al., 1998]
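The effect of type replacement can be imitated in C++ with a template, as a rough analogy only (AnySL performs the substitution on LLVM bitcode, not through C++ templates). The shader is written once against a scalar type `F`; instantiated with `float` it compiles to plain arithmetic, while the hypothetical `CountingFloat` replacement reuses the unchanged shader to count its multiplications. A type carrying derivatives or intervals plugs in the same way.

```cpp
#include <cassert>

// Shader written once against an abstract scalar type F:
// kd * ndotl + ks * ndoth^4 (a simple diffuse + specular term).
template <typename F>
F shade(F kd, F ks, F ndotl, F ndoth) {
    F h2 = ndoth * ndoth;
    return kd * ndotl + ks * (h2 * h2);
}

// Hypothetical replacement type: behaves like float, but every
// multiplication is recorded in a global counter.
int g_mul_count = 0;

struct CountingFloat {
    float v;
};

inline CountingFloat operator*(CountingFloat a, CountingFloat b) {
    ++g_mul_count;
    return {a.v * b.v};
}

inline CountingFloat operator+(CountingFloat a, CountingFloat b) {
    return {a.v + b.v};
}
```

Because the operators are ordinary inline functions, an optimizing compiler removes all call overhead for `shade<float>`; the replacement type changes what happens without changing the shader source.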


How it All Fits Together


Special Functionality
• Derivatives of arbitrary expressions
− Implemented through “Automatic Differentiation”
● Each type stores and maintains (2) derivatives
● Each operation updates value and derivatives
● Input provides initial derivatives (e.g. w.r.t. screen space)
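A minimal forward-mode automatic differentiation type, assuming the two tracked derivatives are with respect to screen x and y: `Dual2` stores the value and both partials, and each operator applies the corresponding differentiation rule. The type name and the example expression are illustrative only.

```cpp
#include <cassert>
#include <cmath>

// Value plus two partial derivatives, e.g. w.r.t. screen-space x and y.
struct Dual2 {
    float v, dx, dy;
};

// Sum rule.
Dual2 operator+(Dual2 a, Dual2 b) { return {a.v + b.v, a.dx + b.dx, a.dy + b.dy}; }

// Product rule: (ab)' = a'b + ab'.
Dual2 operator*(Dual2 a, Dual2 b) {
    return {a.v * b.v, a.dx * b.v + a.v * b.dx, a.dy * b.v + a.v * b.dy};
}

// Chain rule: sin(a)' = cos(a) * a'.
Dual2 sin(Dual2 a) {
    float c = std::cos(a.v);
    return {std::sin(a.v), c * a.dx, c * a.dy};
}

// Example shader expression, evaluated unchanged on Dual2 values.
Dual2 shader_expr(Dual2 s, Dual2 t) { return s * s * t; }
```

Seeding an input with `{x, 1, 0}` marks it as varying in screen x. Evaluating `s * s * t` with `s = {3, 1, 0}` and `t = {2, 0, 1}` yields value 18 and partials 12 (= 2st) and 9 (= s²).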

• Bounding the value of a shader over an interval
− Implemented through Interval or Affine Arithmetic
● Each type stores and maintains value plus interval
● AA: plus terms for linear dependencies on (all) input values
● Each operation updates value and interval
● Input provides initial interval (e.g. w.r.t. parameter space)
• All maps nicely to Type Replacement
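A minimal interval type in the same spirit, assuming plain `float` bounds; a production version would also round outward so that floating-point error cannot shrink the bound. Each operator returns an interval containing every possible result of the operation. The type name is illustrative.

```cpp
#include <algorithm>
#include <cassert>

// Interval [lo, hi] bounding all values a quantity can take.
struct Interval {
    float lo, hi;
};

Interval operator+(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }

// Subtraction pairs the lower bound of a with the upper bound of b.
Interval operator-(Interval a, Interval b) { return {a.lo - b.hi, a.hi - b.lo}; }

// Product: min and max over all four endpoint products.
Interval operator*(Interval a, Interval b) {
    float p[4] = {a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi};
    return {*std::min_element(p, p + 4), *std::max_element(p, p + 4)};
}
```

For `x` in [0, 1], `x * (1 - x)` evaluates to [0, 1], which contains the true range [0, 0.25] but overestimates it, and `x - x` gives [-1, 1] rather than 0, because plain intervals do not know that both operands are the same variable. The linear dependency terms of affine arithmetic are what recover such correlations.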


Results: Automatic differentiation for anti-aliasing

[Images: point sampling vs. analytic AA (blend to average near Nyquist)]
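One way to implement such blending, sketched with assumed thresholds: a sinusoidal stripe pattern is faded toward its average 0.5 as its frequency, measured in cycles per pixel from the filter width `ds` (which automatic differentiation can supply), approaches the Nyquist limit of 0.5. The fade range [0.25, 0.5] and the function names are illustrative choices, not values from the course.

```cpp
#include <cassert>
#include <cmath>

// Linear blend from a to b by t in [0, 1].
static float mix(float a, float b, float t) { return a + (b - a) * t; }

// Hermite step from 0 to 1 as x goes from e0 to e1.
static float smoothstep(float e0, float e1, float x) {
    float t = (x - e0) / (e1 - e0);
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
    return t * t * (3.0f - 2.0f * t);
}

// Sinusoidal stripes with 'freq' cycles per unit of s, anti-aliased using
// the filter width 'ds' (change of s per pixel, e.g. |ds/dx| + |ds/dy|).
float aa_stripes(float s, float ds, float freq) {
    const float kPi = 3.14159265f;
    float pattern = 0.5f + 0.5f * std::sin(2.0f * kPi * freq * s);
    float cycles_per_pixel = freq * ds;
    // Assumed fade range: start at half of Nyquist, reach the average at Nyquist.
    float fade = smoothstep(0.25f, 0.5f, cycles_per_pixel);
    return mix(pattern, 0.5f, fade);
}
```

With `ds = 0` the pattern is returned unchanged; once a pixel covers a full cycle or more, the result is the constant average, so distant stripes turn gray instead of producing moiré.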


Optimization: Packet-Based Shading
• Modern ray tracers shoot packets of n rays
• Exploit SIMD instructions of modern CPUs
− Can execute an instruction on k ≤ n floats at once
− Current architectures: SSE (4), AVX (8), KNF (16), GPU (32)
• Shader function has to shade n hit points at once


AnySL: Packetized Shaders
• Writing packetized shaders is REALLY HARD
− Not an option for any application
• You may not want to do this by hand:


AnySL: Packetized Shaders
• Given:
− A shader is given by a control-flow graph of scalar instructions
• Needed:
− A packetized shader is a new shader that executes k instances of the original shader at once
• Control flow of instances can diverge!


Main Issues: Control Flow
• Diverging control flow of a shader
− Need to efficiently merge flows again!
• Shaders are nested in a deep recursion
− Must handle closures and reordering of packets


Packetized Shaders
• Approach
− Program transformation
− Flatten control flow
− Every instance executes all instructions
− Mask out wrong results
− Loops are iterated until the last instance is done
− Already exited instances are invalidated
− Simulate what GPUs do in HW
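The transformation can be illustrated by hand on a small example. `scalar_shader` contains an `if`/`else` and a data-dependent loop; `packet_shader` is what a packetizer might produce for k = 4 instances, written with `std::array` lanes in place of SSE registers. Both branches run for every lane and a mask selects the result, and the loop repeats while any lane is still active, with finished lanes masked out. This is an illustrative sketch, not output of the AnySL packetizer.

```cpp
#include <array>
#include <cassert>

constexpr int K = 4;               // SIMD width (SSE: 4 floats)
using F4 = std::array<float, K>;   // one value per instance (lane)
using M4 = std::array<bool, K>;    // per-lane mask

// Original scalar shader with divergent control flow.
float scalar_shader(float x) {
    float r;
    if (x > 0.5f) r = x * 2.0f;
    else          r = x * 0.5f;
    while (r >= 1.0f) r *= 0.5f;   // iteration count depends on the data
    return r;
}

static bool any(const M4& m) {
    for (bool b : m) if (b) return true;
    return false;
}

// Packetized version: every lane executes every instruction, and
// masks decide which results are kept.
F4 packet_shader(const F4& x) {
    F4 r;
    for (int i = 0; i < K; ++i) {
        bool cond = x[i] > 0.5f;         // vector compare -> mask
        float then_v = x[i] * 2.0f;      // both branches are computed...
        float else_v = x[i] * 0.5f;
        r[i] = cond ? then_v : else_v;   // ...and blended by the mask
    }
    M4 active;
    for (int i = 0; i < K; ++i) active[i] = r[i] >= 1.0f;
    while (any(active)) {                // iterate until the last lane is done
        for (int i = 0; i < K; ++i) {
            float next = r[i] * 0.5f;        // loop body runs for all lanes
            r[i] = active[i] ? next : r[i];  // exited lanes keep their value
            active[i] = active[i] && r[i] >= 1.0f;
        }
    }
    return r;
}
```

Each inner `for` over `K` lanes corresponds to one SIMD instruction (compare, multiply, blend); lane 2 of the input `{0.25, 0.75, 3.0, 0.5}` keeps the loop running for three iterations while the other lanes are already masked out.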


AnySL: Dealing With Data Divergence
• SSE has no gather/scatter support
− Data must be in multiples of four and properly aligned
• Need to resort to serial load/store
− Extract individual values from SSE vector
− Load/store
− Merge/blend results back into SSE register
− Very expensive (lots of dependencies)
• Calling non-packetized functions
− Essentially the same as scatter/gather
− E.g. hand-crafted SSE noise() function
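The serial fallback can be sketched with `std::array` lanes standing in for an SSE register: `gather` extracts each lane's index, performs a scalar load and inserts the value back, and `call_per_lane` applies a function that exists only in scalar form once per lane. `scalar_noise` is a hypothetical stand-in, not a real noise function.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using F4 = std::array<float, 4>;  // stands in for one SSE register
using I4 = std::array<int, 4>;

// Emulated gather: one extract, scalar load and insert per lane.
// On SSE each insert depends on the previous one, which serializes the work.
F4 gather(const std::vector<float>& table, const I4& idx) {
    F4 out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane) {
        int i = idx[lane];                                   // extract index
        assert(i >= 0 && static_cast<std::size_t>(i) < table.size());
        out[lane] = table[static_cast<std::size_t>(i)];      // load + insert
    }
    return out;
}

// Hypothetical scalar-only function (placeholder for e.g. noise()).
float scalar_noise(float x) {
    float f = x * 12.9898f;
    return f - std::floor(f);  // fractional part of x * 12.9898
}

// Calling a non-packetized function: same extract/call/insert pattern.
F4 call_per_lane(float (*fn)(float), const F4& x) {
    F4 out{};
    for (std::size_t lane = 0; lane < out.size(); ++lane) out[lane] = fn(x[lane]);
    return out;
}
```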


Packetized Shader Results
• Packet size of 4 (SSE)
− Completely automated (LLVM)
− Shaders are packetized automatically
− On average 3.2x speedup for complete rendering
• Not specific to graphics
− Can be used wherever data parallelism is available


AnySL Results


Applications Beyond Graphics
• Whole Function Vectorization
− Transform a function over one or more scalar parameters into a function over SIMD parameters
− Maintaining semantics within each SIMD lane
− Application to shader code & packet ray tracing
• OpenCL compiler
− Simply add an OpenCL frontend
− Re-use existing AnySL backends
− Currently fastest OpenCL compiler for CPUs & GPUs


AnyDSL Vision
− Language enabling domain-specific environments
● A new base language (others are too complex already)
● New environments can be written in AnyDSL
− Think libraries of types, code, syntax, etc.
− Meta programming
● Ensures predictable performance
● Programmer directly controls which parts of a program are evaluated at compile time
● Convenient syntax, no special templates
− Implicit support for parallelism
− Based on continuation passing style
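The slides do not show AnyDSL syntax, so the following is only a C++17 analogy for the compile-time control described above: the exponent of `pow_static` is a compile-time parameter, and `if constexpr` makes the compiler unfold the recursion into a fixed sequence of multiplications. It relies on exactly the kind of template machinery AnyDSL aims to make unnecessary; the function names are illustrative.

```cpp
#include <cassert>

// Exponent N is known at compile time: the recursion is evaluated by the
// compiler, leaving only multiplications in the generated code.
template <unsigned N>
constexpr float pow_static(float x) {
    if constexpr (N == 0) {
        return 1.0f;
    } else if constexpr (N % 2 == 0) {
        float h = pow_static<N / 2>(x);  // square-and-multiply
        return h * h;
    } else {
        return x * pow_static<N - 1>(x);
    }
}

// The same function with a run-time exponent, for comparison:
// the loop and its counter remain in the generated code.
float pow_dynamic(float x, unsigned n) {
    float r = 1.0f;
    while (n-- > 0) r *= x;
    return r;
}
```

The programmer decides, by where the exponent is passed, which part is specialized away at compile time and which part stays dynamic.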


ECOUSS Project
• “Efficient and Open Compiler Environment for Semantically Annotated Parallel Simulations”
• German national project
− Application partners
● Supercomputing Center HLRS, Stuttgart
● Cray Computer
● BMW Group
● Boehringer Ingelheim (pharmaceuticals)
− Research partners
● Intel Visual Computing Institute
● German Research Center for Artificial Intelligence (DFKI)
● Karlsruhe Institute of Technology


Conclusions
• AnySL
− Shaders are compiled to platform-independent code
● Can be produced from any shading language
− Reduces work for the renderer implementer
● Need only supply renderer-specific code and link to AnySL
− Highly optimizing JIT compiler within the renderer
● Eliminates interfaces and optimizes code
− High performance through packetization
● Significant speedup on benchmarks (~3.2x)
● Eliminates the need for SIMD shader coding
− Many applications beyond graphics

