Page 1: Slusallek-SIG2011course-AnySL

AnySL
Efficient and Portable Multi-Language Shading

Philipp Slusallek
Sebastian Hack, Ralf Karrenberg, Dmitri Rubinstein

German Research Center for Artificial Intelligence (DFKI)
Intel Visual Computing Institute
Saarland University

Monday, August 15, 2011

Page 2: Slusallek-SIG2011course-AnySL

Saarbrücken


Page 3: Slusallek-SIG2011course-AnySL

Saarland Campus


Page 4: Slusallek-SIG2011course-AnySL

Computer Science at the Saarland Campus

Page 9: Slusallek-SIG2011course-AnySL

Computer Science at the Saarland Campus

Multimodal Computing and Interaction

Page 11: Slusallek-SIG2011course-AnySL

Shaders

● Programmable Shading
  – Allows for controlling core rendering features
    ● Material properties, light emission, participating media, …
  – Today: Many different shading languages
    ● HLSL, GLSL, Cg, RenderMan, MetaSL, OSL, OpenRL, many C++ dialects, …
    ● Mostly the same features, expressed differently
  – We need a portable way to exchange materials
    ● Common specification of shading features
    ● Ease implementation for different renderers and HW
● Here: Efficient and Portable Implementation

Page 12: Slusallek-SIG2011course-AnySL

Shaders

● A plug-in for the innermost loops
  – From one-liners to thousands of lines of code
  – Run for every new ray, surface hit, light sample, …
    ● Sometimes, once for every MADD along ray
● Efficient implementation
  – Low overhead interface to renderer
    ● Ideally works directly on internal data structures
  – Highly optimized code for specific HW architectures
    ● Use of SIMD (SSE, AVX, PTX, …)

Page 13: Slusallek-SIG2011course-AnySL

Implementation Choices

[Diagram: Renderer (data, code) with glue code, connected through a C/C++ API/ABI to a C++ shader compiled into a shader DSO/DLL]

● Shader code in C++
  – API specifies interface to renderer
  – Separate C/C++ compilation to DLL/DSO
  – API gets mapped directly to platform-specific ABI
    ● Predefined data layout, function call overhead
    ● No optimization options in interface

Page 14: Slusallek-SIG2011course-AnySL

Implementation Choices

[Diagram: Renderer (data, code) connected through a generated API/ABI to a shader DSO/DLL produced from an SL shader]

● Using a Shading Language Compiler
  – Compiler can transform and optimize shader code
    ● E.g. use of renderer internal APIs: No glue code
    ● Transform shaders to SIMD
  – Requires renderer and language specific compiler
    ● Most renderers support only one shading language
  – Renderer-specific code gets embedded in result

Page 15: Slusallek-SIG2011course-AnySL

Implementation Choices

● AnySL: Embedded SL Compiler
  – Any language compiled into portable format
  – Types, data layout, interface not fixed yet
  – Renderer supplies implementations at runtime
  – Embedded compiler links and optimizes code

[Diagram: SL shader, API, and renderer glue code are fed into the embedded compiler (LLVM), which produces an optimized shader working directly on the renderer's data and code]

Page 16: Slusallek-SIG2011course-AnySL

AnySL: Portable Shading

• “Any” Shading Language Supported
  • Currently: RenderMan, C++ dialects, JavaScript, …
• Common Intermediate Format
  • Independent of renderer and HW architecture
• Easy Implementation by Renderer
  • Need only supply the glue code
• Different Backends
  • Ray Tracing: PBRT, Manta, RTfact, …
  • Rasterization: Deferred shading (with RTT)
  • HW: x86, SSE, AVX, PTX, OpenCL, GLSL, …

Page 17: Slusallek-SIG2011course-AnySL

AnySL & XML3D: Interactive RenderMan in Your Web Browser


Page 18: Slusallek-SIG2011course-AnySL

AnySL

Implementation


Page 19: Slusallek-SIG2011course-AnySL

AnySL: Implementation

● Designing an Interpreter: Options
  − Many OP-codes with large switch() statement
  − Replace OP-codes with function calls
● “Subroutine Threaded Code”
  − Long list of function calls
    Even for control flow (“if”) and types (allocate a “float”)
  − Nice for portability, implementations can be replaced
    E.g.: use predication for “if” or substitute own “float” type
  − Can be directly encoded in compiled code
    Use LLVM bitcode for representation → Efficiency
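The idea behind subroutine threaded code can be illustrated with a small sketch. It is written in Python for brevity and is not AnySL's actual interface: the names `ScalarOps`, `PacketOps`, and `shader` are hypothetical. The shader is expressed purely as a sequence of calls, including the "if" and the creation of a "float", so a renderer could swap in a different implementation (here, a predicated packet version) without touching the shader.

```python
# Illustrative sketch only: all names are hypothetical, not AnySL's API.

class ScalarOps:
    """One possible renderer-supplied implementation: plain scalar floats."""

    def make_float(self, value):
        return float(value)

    def mul(self, a, b):
        return a * b

    def less(self, a, b):
        return a < b

    def select(self, cond, if_true, if_false):
        # "if" is a call too, so the renderer decides how to implement it.
        return if_true if cond else if_false


class PacketOps:
    """Alternative implementation: each value is a list of lanes and
    "if" becomes predication (a per-lane blend)."""

    def __init__(self, width):
        self.width = width

    def _lanes(self, v):
        return v if isinstance(v, list) else [float(v)] * self.width

    def make_float(self, value):
        return self._lanes(value)

    def mul(self, a, b):
        return [x * y for x, y in zip(self._lanes(a), self._lanes(b))]

    def less(self, a, b):
        return [x < y for x, y in zip(self._lanes(a), self._lanes(b))]

    def select(self, cond, if_true, if_false):
        pairs = zip(cond, self._lanes(if_true), self._lanes(if_false))
        return [t if c else f for c, t, f in pairs]


def shader(ops, cos_theta):
    """Shader as a list of calls: cos_theta < 0 ? 0 : 0.8 * cos_theta."""
    zero = ops.make_float(0.0)
    albedo = ops.make_float(0.8)
    lit = ops.mul(albedo, cos_theta)
    return ops.select(ops.less(cos_theta, zero), zero, lit)


scalar_result = shader(ScalarOps(), 0.5)
packet_result = shader(PacketOps(4), [0.5, -1.0, 1.0, 0.0])
```

In this sketch the calls are dispatched dynamically; the point of encoding them in LLVM bitcode, as the slide states, is that a compiler can later inline them.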

Page 20: Slusallek-SIG2011course-AnySL

Subroutine Threaded Code

[Figure: original shader code, its conversion to threaded code, and its implementation (supplied by renderer)]

[Figure: handling control flow with the RenderMan illuminance loop: mapping to threaded code and a possible implementation (supplied by renderer)]

Page 21: Slusallek-SIG2011course-AnySL

But Interpreters are Slow?!?

● STC is used for portable representation only
  − Eliminated at runtime with embedded compiler
● “Type Replacement”
  − Substitute own types and operators
  − Inline all interpreter calls
  − Perform all usual scalar optimization
● Can be used for special shader functionality
  − Taking derivatives of arbitrary expressions
  − Bounding the result of shader over intervals
    − E.g. using Affine Arithmetic [Heidrich et al., 1998]
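As a rough illustration of type replacement used for bounding, the following Python sketch substitutes a hypothetical `Interval` type for the shader's "float". It uses plain interval arithmetic rather than the affine arithmetic cited on the slide, and the `shader` function is an invented example, not one from the talk.

```python
# Illustrative sketch only: Interval and shader are hypothetical names.

class Interval:
    """Closed interval [lo, hi] standing in for a shader's "float"."""

    def __init__(self, lo, hi=None):
        self.lo = float(lo)
        self.hi = float(lo if hi is None else hi)

    @staticmethod
    def lift(x):
        # Promote plain numbers to degenerate intervals [x, x].
        return x if isinstance(x, Interval) else Interval(x)

    def __add__(self, other):
        o = Interval.lift(other)
        return Interval(self.lo + o.lo, self.hi + o.hi)

    __radd__ = __add__

    def __mul__(self, other):
        o = Interval.lift(other)
        products = [self.lo * o.lo, self.lo * o.hi,
                    self.hi * o.lo, self.hi * o.hi]
        return Interval(min(products), max(products))

    __rmul__ = __mul__


def shader(u):
    # Unmodified expression; works for a float and for an Interval.
    return u * u + 2.0 * u + 1.0


point_value = shader(0.5)              # ordinary evaluation
bounds = shader(Interval(-1.0, 1.0))   # conservative bounds over [-1, 1]
```

Here `bounds` comes out as [-2, 4], while the true range of the expression over [-1, 1] is [0, 4]. The bound is valid but loose because interval arithmetic ignores that both terms depend on the same `u`; tracking such linear dependencies is what affine arithmetic adds.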

Page 22: Slusallek-SIG2011course-AnySL

How it All Fits Together


Page 23: Slusallek-SIG2011course-AnySL

Special Functionality

● Derivatives of arbitrary expressions
  − Implemented through “Automatic Differentiation”
    Each type stores and maintains (2) derivatives
    Each operation updates value and derivatives
    Input provides initial derivatives (e.g. w.r.t. screen space)
● Bounding the value of a shader over interval
  − Implemented through Interval or Affine Arithmetic
    Each type stores and maintains value plus interval
    − AA: plus terms for linear dependencies on (all) input values
    Each operation updates value and interval
    Input provides initial interval (e.g. w.r.t. parameter space)
● All maps nicely to Type Replacement
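A minimal sketch of forward-mode automatic differentiation as a replacement type, again in Python and with hypothetical names (`Dual2`, `sin`, `shader`). Each value carries two derivatives, assumed here to be with respect to screen-space x and y, and each operation updates the value and both derivatives.

```python
# Illustrative sketch only: Dual2, sin and shader are hypothetical names.
import math


class Dual2:
    """Value plus two partial derivatives (assumed: d/dx and d/dy on screen)."""

    def __init__(self, val, dx=0.0, dy=0.0):
        self.val, self.dx, self.dy = float(val), float(dx), float(dy)

    @staticmethod
    def lift(x):
        # Constants have zero derivatives.
        return x if isinstance(x, Dual2) else Dual2(x)

    def __add__(self, other):
        o = Dual2.lift(other)
        return Dual2(self.val + o.val, self.dx + o.dx, self.dy + o.dy)

    __radd__ = __add__

    def __mul__(self, other):
        o = Dual2.lift(other)
        # Product rule, applied to each derivative separately.
        return Dual2(self.val * o.val,
                     self.dx * o.val + self.val * o.dx,
                     self.dy * o.val + self.val * o.dy)

    __rmul__ = __mul__


def sin(x):
    """sin() for the replaced type (chain rule) or for plain floats."""
    if isinstance(x, Dual2):
        c = math.cos(x.val)
        return Dual2(math.sin(x.val), c * x.dx, c * x.dy)
    return math.sin(x)


def shader(s, t):
    # Unmodified procedural expression.
    return s * t + sin(s)


# Input provides the initial derivatives: s varies along x, t along y.
s = Dual2(0.0, dx=1.0, dy=0.0)
t = Dual2(3.0, dx=0.0, dy=1.0)
result = shader(s, t)
```

At this input the derivative along x is t + cos(s) = 4 and the derivative along y is s = 0, which is what `result.dx` and `result.dy` hold.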

Page 24: Slusallek-SIG2011course-AnySL

Results

Automatic differentiation for anti-aliasing

[Figure: point sampling compared with analytic AA (blend to average near Nyquist)]

Page 25: Slusallek-SIG2011course-AnySL

Optimization: Packet-Based Shading

● Modern ray tracers shoot packets of rays
● Exploit SIMD instructions of modern CPUs
  − Can execute instruction on k ≤ n floats at once
  − Current architectures: SSE (4), AVX (8), KNF (16), GPU (32)
● Shader function has to shade n hit points at once

Page 26: Slusallek-SIG2011course-AnySL

AnySL: Packetized Shaders

● Writing packetized shaders is REALLY HARD
  − Not an option for any application
● You may not want to do this by hand:

Page 27: Slusallek-SIG2011course-AnySL

AnySL: Packetized Shaders

● Given:
  − A shader is given by a control-flow graph of scalar instructions
● Needed:
  − A packetized shader is a new shader that executes k instances of the original shader at once
● Control flow of instances can diverge!

Page 28: Slusallek-SIG2011course-AnySL

Main Issues: Control Flow

● Diverging control flow of a shader
  − Need to efficiently merge flows again!
● Shaders are nested in a deep recursion
  − Must handle closures and reordering of packets

Page 29: Slusallek-SIG2011course-AnySL

Packetized Shaders

● Approach
  − Program transformation
  − Flatten control flow
  − Every instance executes all instructions
  − Mask out wrong results
  − Loops are iterated until last instance is done
  − Already exited instances are invalidated
  − Simulate what GPUs do in HW
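The approach can be sketched by hand on a toy function. The Python below is an illustration under stated assumptions, not the output of the AnySL transformation: `scalar_shader` is an invented example, lists stand in for SIMD registers, and `blend` stands in for a masked select instruction.

```python
# Illustrative sketch only: hand-written analogue of control-flow flattening.

def scalar_shader(x):
    """Original scalar code with divergent control flow."""
    if x < 0.0:
        y = -x
    else:
        y = x * 2.0
    while y > 1.0:
        y = y * 0.5
    return y


def blend(mask, if_true, if_false):
    """Per-lane select: take if_true where the mask is set."""
    return [t if m else f for m, t, f in zip(mask, if_true, if_false)]


def packet_shader(xs):
    """Flattened version: every lane runs every instruction, masks pick results."""
    # if/else: evaluate both sides for all lanes, then mask out wrong results.
    cond = [x < 0.0 for x in xs]
    then_val = [-x for x in xs]
    else_val = [x * 2.0 for x in xs]
    ys = blend(cond, then_val, else_val)

    # Loop: iterate until the last lane is done.
    active = [y > 1.0 for y in ys]
    while any(active):
        halved = [y * 0.5 for y in ys]
        ys = blend(active, halved, ys)  # exited lanes keep their value
        active = [a and y > 1.0 for a, y in zip(active, ys)]
    return ys


inputs = [-3.0, 0.25, 5.0, 0.0]
packet = packet_shader(inputs)
reference = [scalar_shader(x) for x in inputs]
```

The packet version runs the loop four times because the slowest lane (input 5.0) needs four iterations; lanes that finished earlier are carried along with their mask cleared, which is the cost of divergence.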

Page 30: Slusallek-SIG2011course-AnySL

AnySL: Dealing With Data Divergence

● SSE has no gather/scatter support
  − Data must be in multiple of four and properly aligned
● Need to resort to serial load/store
  − Extract individual values from SSE vector
  − Load/Store
  − Merge/blend results back into SSE register
  − Very expensive (lots of dependencies)
● Calling non-packetized functions
  − Essentially, the same as scatter/gather
  − E.g. hand-crafted SSE noise() function
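The serial fallback can be pictured with the following Python sketch. It only mirrors the structure of the steps listed above (extract, scalar load, merge); the function names are hypothetical, lists stand in for SSE registers, and nothing here reflects real instruction costs.

```python
# Illustrative sketch only: hypothetical helpers, lists stand in for registers.

def gather_serial(table, indices):
    """Emulated gather: one scalar load per lane, merged back into a packet."""
    result = [0.0] * len(indices)
    for lane, index in enumerate(indices):  # extract the lane's index
        value = table[index]                # scalar load
        result[lane] = value                # merge back into the packet
    return result


def call_scalar_per_lane(fn, packet, mask):
    """Calling a non-packetized function follows the same pattern:
    one scalar call per active lane, inactive lanes keep their value."""
    return [fn(v) if m else v for v, m in zip(packet, mask)]


texture = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
texels = gather_serial(texture, [7, 0, 3, 3])
squared = call_scalar_per_lane(lambda v: v * v,
                               [1.0, 2.0, 3.0, 4.0],
                               [True, False, True, True])
```

Each lane's load depends on first extracting its index, and the final packet depends on every load, which is the chain of dependencies the slide calls very expensive.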

Page 31: Slusallek-SIG2011course-AnySL

Packetized Shader Results

● Packet size of 4 (SSE)
  − Completely automated (LLVM)
  − Shaders are packetized automatically
  − On average 3.2x speedup for complete rendering
  − Not specific to graphics
  − Can be used wherever data parallelism is available

Page 32: Slusallek-SIG2011course-AnySL

AnySL Results


Page 33: Slusallek-SIG2011course-AnySL

Applications Beyond Graphics

● Whole Function Vectorization
  − Transform a function over one or more scalar parameters into a function over SIMD parameters
  − Maintaining semantics within each SIMD lane
  − Application to shader code & packet ray tracing
  − OpenCL compiler
    − Simply add an OpenCL frontend
    − Re-use existing AnySL backends
    − Currently fastest OpenCL compiler for CPUs & GPUs

Page 34: Slusallek-SIG2011course-AnySL

AnyDSL Vision

− Language, enabling domain specific environments
  A new base language (others are too complex already)
  New environments can be written in AnyDSL
  − Think libraries of types, code, syntax, etc.
− Meta programming
  Ensures predictable performance
  Programmer directly controls which parts of a program are evaluated at compile time
  Convenient syntax, no special templates
− Implicit support for parallelism
− Based on continuation passing style

Page 35: Slusallek-SIG2011course-AnySL

ECOUSS Project

● “Efficient and Open Compiler Environment for Semantically Annotated Parallel Simulations”
● German National Project
  − Application Partners
    − Supercomputing Center HLRS, Stuttgart
    − Cray Computer
    − BMW Group
    − Böhringer-Ingelheim (Pharmacy)
  − Research Partners
    − Intel Visual Computing Institute
    − German Research Center for Artificial Intelligence (DFKI)
    − Karlsruhe Institute of Technology

Page 36: Slusallek-SIG2011course-AnySL

Conclusions

● AnySL
  − Shaders are compiled to platform-independent code
    − Can be produced from any shading language
  − Reduces work for the renderer implementer
    − Need only supply renderer-specific code and link to AnySL
  − Highly-optimizing JIT compiler within the renderer
    − Eliminates interfaces and optimizes code
  − High performance through packetization
    − Significant speedup on benchmarks (~3.2x)
    − Eliminates need for SIMD shader coding
  − Many applications beyond graphics

