Advanced Uses of Pixel Local Storage
Marius Bjørge, Graphics Research Engineer
ARM Developer Day - London
December 3rd 2015
© ARM 2015 2
Agenda
Pixel Local Storage
Indirect lighting pipeline
Pixel Local Storage
Exposed as EXT_shader_pixel_local_storage
Per-pixel scratch memory available to fragment shaders
Automatically discarded once a tile is fully processed
No impact on external memory bandwidth
Shader declares a view of PLS memory
Re-interpret PLS between different passes
Can have separate input and output views
Independent of framebuffer format
Pixel Local Storage
Rendering pipeline changes slightly when PLS is enabled
Writing to PLS bypasses blending
Note
Fragment ordering and fragment tests still apply
PLS and color share the same memory location
[Diagram: tile execution stages (Primitive setup, Rasterization, Fragment Shading, Blending, Tilebuffer holding PLS / Color) and memory (Position data, Varyings, Textures, Uniforms, Framebuffer Writeback)]
Deferred Shading
Popular technique on PC and console games
Very memory bandwidth intensive
Traditionally not a good fit for mobile
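A rough back-of-the-envelope sketch of why a G-buffer in external memory is costly. The resolution, frame rate and 128 bits per pixel are illustrative assumptions, not figures from the talk, and the function name is hypothetical.

```python
def gbuffer_traffic_bytes_per_second(width, height, bits_per_pixel, fps):
    """External-memory traffic if the G-buffer is written once and read once per frame."""
    bytes_per_pixel = bits_per_pixel // 8
    write_then_read = 2  # one write in the geometry pass, one read in the lighting pass
    return width * height * bytes_per_pixel * write_then_read * fps


# Assumed figures: 1920x1080 target, 128 bits of G-buffer per pixel, 60 frames per second.
traffic = gbuffer_traffic_bytes_per_second(1920, 1080, 128, 60)
print(f"{traffic / 1e9:.2f} GB/s")  # prints "3.98 GB/s"
```

Keeping the same data in tile memory removes this traffic, which is the motivation for using PLS here.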
Order Independent Transparency
Depth peeling
Approximate approaches
Multi-Layer Alpha Blending [Salvi et al, 2014]
Adaptive Range
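A minimal sketch of the problem these techniques address: the standard "over" blend gives different results depending on submission order. This illustrates plain alpha blending only, not any of the listed OIT methods; the colours and alpha values are made up.

```python
def over(src_rgb, src_alpha, dst_rgb):
    """Classic 'over' blend of a non-premultiplied source onto a destination colour."""
    return tuple(s * src_alpha + d * (1.0 - src_alpha) for s, d in zip(src_rgb, dst_rgb))


background = (0.0, 0.0, 0.0)
red = ((1.0, 0.0, 0.0), 0.5)   # (colour, alpha)
blue = ((0.0, 0.0, 1.0), 0.5)

# Same two surfaces, blended in opposite orders.
red_then_blue = over(*blue, over(*red, background))
blue_then_red = over(*red, over(*blue, background))

print(red_then_blue)  # (0.25, 0.0, 0.5)
print(blue_then_red)  # (0.5, 0.0, 0.25)
```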
Pixel Local Storage
[Diagram: Opaque phase (Fill gbuffer, Light accumulation, Tonemap) followed by OIT phase (Init OIT, Transparent rendering, Resolve). Pixel Local Storage layouts shown: Color, RGB10A2, RGB10A2, RG16F, RG16F and R32UI, R32UI, R32UI, R32UI]
At this point we change the layout of the PLS
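A small sketch of what re-interpreting the layout means: the same 32-bit word can be read as one R32UI value or as an RGB10A2 colour. The bit order (red in the low bits, alpha in the top two) is an assumption for illustration; the hardware's internal PLS packing is not described in the slides, and the helper names are hypothetical.

```python
def pack_rgb10a2(r, g, b, a):
    """Quantise floats in [0, 1] to 10/10/10/2 bits and pack them into one 32-bit word."""
    def quantise(value, bits):
        max_code = (1 << bits) - 1
        clamped = min(max(value, 0.0), 1.0)
        return int(clamped * max_code + 0.5)

    # Assumed bit order: R in bits 0-9, G in 10-19, B in 20-29, A in 30-31.
    return (quantise(r, 10)
            | (quantise(g, 10) << 10)
            | (quantise(b, 10) << 20)
            | (quantise(a, 2) << 30))


def unpack_rgb10a2(word):
    """Read the same 32-bit word back as four normalised channels."""
    return ((word & 0x3FF) / 1023.0,
            ((word >> 10) & 0x3FF) / 1023.0,
            ((word >> 20) & 0x3FF) / 1023.0,
            ((word >> 30) & 0x3) / 3.0)


word = pack_rgb10a2(1.0, 0.0, 0.0, 1.0)
print(hex(word))             # the "R32UI view" of the storage: 0xc00003ff
print(unpack_rgb10a2(word))  # the "RGB10A2 view": (1.0, 0.0, 0.0, 1.0)
```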
Sample Code
http://malideveloper.arm.com/resources/sdks/mali-opengl-es-sdk-for-android/
Indirect Lighting Pipeline
Related Work
Efficient Rendering With Tile Local Storage [Bjorge14]
SH Irradiance Volumes [Tatarchuk05]
Reflective Shadow Maps [Dachsbacher09]
Virtual Point Lights
Light Propagation Volumes [Kaplanyan09]
Global illumination with ARM
Enlighten remains the GI solution of choice for mobile
Performant
High quality
Proven
PLS + Enlighten = bandwidth saving! [Bjorge14]
This R&D project is separate
Exploration of PLS capabilities
Indirect lighting as first challenging use case
Results remain an approximation
The Idea
A grid of probes is placed throughout the scene
Albedo, normal and depth information is stored
Resolution can be quite low (64x64, 32x32 or lower)
Selectively store as either cubemaps/latlon maps or spherical harmonics
Relighting does simple deferred shading style rendering per probe
Output: SH irradiance + reflection map
High-Level Overview
[Diagram: Offline stage: Probe scene for indirect lighting information, producing Probe data. Runtime stage: Relight probes (inputs: Probe data, Dynamic lights and reflector objects), Compute SH, producing Irradiance volume and Reflection maps. Image panels: Direct Lighting, Indirect Lighting, Reflections, Final Image, Lighting Only]
Offline
Offline – Probe Generation
For every probe in grid
Render scene from every direction, storing albedo, normal and depth
Save to a texture atlas
[Figure label: Depth == 1.0 marks the hemisphere]
Offline – Probe Storage
High frequency
Cubemaps / latlon maps
Low frequency
Spherical harmonics
Both are used in the demo application
Cubemaps for probes that are used for environment mapping
Spherical harmonics for the rest
Gives the best quality/performance trade-off
Offline – Probe Storage
Cubemaps: 6 KB
Latlon: 2 KB
Spherical harmonics: 40 bytes
Offline – Meta Data
Neighbour visibility information
Reduce amount of work required when updating probes
Snap texture coordinates to reduce light leakage
Hemisphere visibility
Avoid updating probes that are never affected by global lights (such as the hemisphere or sun light)
Local cubemap bounding box
Runtime
Runtime – Relighting
First step is to determine which probes require updating
Probe meta-data is really useful here
Dirty probes are pushed to queue for processing
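A minimal sketch of how the dirty-probe selection could look, assuming a baked per-probe hemisphere-visibility flag and a simple radius test for local light changes. The `Probe` and `LightChange` types and the distance test are hypothetical; the slides do not specify the actual criteria.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Probe:
    index: int
    position: tuple          # world-space (x, y, z)
    sees_hemisphere: bool    # baked "hemisphere visibility" meta-data


@dataclass
class LightChange:
    position: tuple
    radius: float
    is_global: bool = False  # e.g. the sun or hemisphere light


def collect_dirty_probes(probes, changes):
    """Queue every probe that at least one light change can affect."""
    queue = deque()
    for probe in probes:
        for change in changes:
            if change.is_global:
                # Global lights only matter to probes that can see the hemisphere.
                affected = probe.sees_hemisphere
            else:
                dist_sq = sum((p - c) ** 2 for p, c in zip(probe.position, change.position))
                affected = dist_sq <= change.radius ** 2
            if affected:
                queue.append(probe.index)
                break  # one hit is enough, avoid queuing the probe twice
    return queue


probes = [Probe(0, (0.0, 0.0, 0.0), True), Probe(1, (10.0, 0.0, 0.0), False)]
sun_moved = [LightChange((0.0, 0.0, 0.0), 0.0, is_global=True)]
print(list(collect_dirty_probes(probes, sun_moved)))  # [0]
```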
Runtime – Per Probe Pipeline
[Diagram: inputs are Dynamic lights and reflector objects and Probe data. Stages: Determine contributing lights, Accumulate lighting, Blit to reflection map, Parallel sum of lighting data, Compute 2nd order SH, Store to 3D textures]
Runtime – Deferred Probe Relighting
Deferred shading implementation
PLS is really useful here
Sample from the baked albedo/depth and normal maps
Possible to re-use existing deferred shading lighting code path
Makes it easy to support custom light types and reflector objects
Heavily batched
[Figure: batched layout with one row per probe (Probe 0, Probe 1, Probe 2, Probe 3, ... Probe n), each row holding the six cube faces -X +X -Y +Y -Z +Z]
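A sketch of the addressing such a batched layout implies, assuming one atlas row per probe and one column per cube face in the order shown. The function names and the face size are hypothetical.

```python
CUBE_FACES = ("-X", "+X", "-Y", "+Y", "-Z", "+Z")


def atlas_rect(probe_index, face, face_size):
    """Return (x, y, width, height) of one cube face inside the atlas, in texels."""
    column = CUBE_FACES.index(face)
    return (column * face_size, probe_index * face_size, face_size, face_size)


def atlas_size(probe_count, face_size):
    """Return (width, height) of an atlas holding every face of every probe."""
    return (len(CUBE_FACES) * face_size, probe_count * face_size)


print(atlas_rect(2, "+Y", 32))  # (96, 64, 32, 32)
print(atlas_size(4, 32))        # (192, 128)
```

With this layout a single draw can relight many probes at once, since each fragment can recover its probe and face from its atlas position.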
Runtime – Compute SH
Parallel sum with spherical harmonics coefficients
Compute shader
… and/or multiple fragment reduction passes
Output formats
Pack down to separate red, green and blue 3D irradiance textures
Store in FP16 for best visual quality
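A serial sketch of the sum that the compute or reduction passes evaluate in parallel. It assumes "2nd order" means the first two SH bands (four coefficients per colour channel), and it handles a single channel; the six-direction sample set is only a toy input.

```python
import math


def sh_basis(direction):
    """Real SH basis for bands 0 and 1 (four coefficients) at a unit direction."""
    x, y, z = direction
    return (0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x)


def project_to_sh(samples):
    """Sum radiance * basis * solid angle over (unit_direction, radiance, solid_angle) samples."""
    coeffs = [0.0, 0.0, 0.0, 0.0]
    for direction, radiance, solid_angle in samples:
        for i, basis in enumerate(sh_basis(direction)):
            coeffs[i] += radiance * basis * solid_angle
    return coeffs


# Toy input: constant radiance 1.0 seen along the six axis directions, equal solid angles.
axes = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
samples = [(axis, 1.0, 4.0 * math.pi / len(axes)) for axis in axes]
coeffs = project_to_sh(samples)
```

For constant radiance only the first coefficient is non-zero; the three linear terms cancel between opposite directions.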
Runtime – Irradiance Volume
2nd order SH
3x 3D textures for red, green and blue coefficients
Simple 3D texture lookup using world-space position
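A sketch of the lookup on the CPU side, assuming the volume covers an axis-aligned box and that each texel stores four SH coefficients per colour channel, already prepared for irradiance. The function names and the box extents are hypothetical.

```python
def world_to_volume_uvw(position, volume_min, volume_max):
    """Map a world-space position to [0, 1]^3 texture coordinates, clamped to the volume."""
    uvw = []
    for p, lo, hi in zip(position, volume_min, volume_max):
        t = (p - lo) / (hi - lo)
        uvw.append(min(max(t, 0.0), 1.0))
    return tuple(uvw)


def evaluate_sh(coeffs, normal):
    """Reconstruct one colour channel from four SH coefficients along a unit normal."""
    x, y, z = normal
    basis = (0.282095, 0.488603 * y, 0.488603 * z, 0.488603 * x)
    return sum(c * b for c, b in zip(coeffs, basis))


uvw = world_to_volume_uvw((5.0, 0.0, -2.0), (0.0, 0.0, 0.0), (10.0, 10.0, 10.0))
print(uvw)  # (0.5, 0.0, 0.0), the z coordinate is clamped to the volume
```

In a shader the three fetches (red, green, blue) would use the same coordinate, with hardware trilinear filtering interpolating between neighbouring probes.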
Runtime – Reflections
Output to cubemap matching size of probes
glGenerateMipmap
Cubemap filter too expensive for runtime
Runtime – Multiple Bounces
Temporal probe relighting
Slowly accumulate into irradiance volume
[Figure: Single Bounce vs. Multiple Bounces]
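One way to read "slowly accumulate" is an exponential moving average over frames. The sketch below assumes that interpretation; the blend factor and function name are hypothetical and not taken from the talk.

```python
def accumulate(previous, relit, blend=0.1):
    """Move each stored coefficient a fraction `blend` of the way towards the new value."""
    return [old + (new - old) * blend for old, new in zip(previous, relit)]


# Three relighting steps towards a constant target for one probe's coefficients.
volume = [0.0, 0.0, 0.0, 0.0]
for _ in range(3):
    volume = accumulate(volume, [1.0, 0.0, 0.0, 0.0])
```

Because each relighting pass can sample the irradiance volume from earlier frames, light picks up one more bounce per update while the blend keeps the result from flickering.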
Future Work
Move “everything” to compute
Octree representation
Occlusion culling?
For more information visit the Mali Developer Centre:
http://malideveloper.arm.com
• Revisit this talk in PDF and audio format post event
• Download tools and resources
The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its
subsidiaries) in the EU and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their
respective owners.
Copyright © 2015 ARM Limited
Thank you!
Questions?
References
1. http://www.ppsloan.org/publications/StupidSH36.pdf
2. http://www.crytek.com/cryengine/cryengine3/presentations/light-propagation-volumes-in-cryengine-3
3. http://developer.amd.com/wordpress/media/2012/10/Tatarchuk_Irradiance_Volumes.pdf
4. http://www.vis.uni-stuttgart.de/~dachsbcn/download/rsm.pdf
5. http://www.geomerics.com/wp-content/uploads/2014/11/Efficient-Rendering-with-Tile-Local-Storage.pdf