Battle-Tested Deferred Rendering on PS3, Xbox 360 and PC
Tibor Klajnscek,
Technical Director,
ZootFly
Overview
The G-Buffer
Rendering pipeline
Lighting details
Anti-aliasing
HDR
Platform-specific issues
G-Buffer
We use a full deferred shading approach
A single, heavily #ifdef-ed material shader writes the G-Buffer
3 RTs on consoles + native depth
32-bit (8888) RTs, 16 bytes/pixel
Using DX9 on PC, so 4 RTs since we have to write depth as well
G-Buffer shader
Supports all standard stuff (skinning, parallax, reflection...)
Detail texture (UV offset and normal bend)
Overlay texture (own UV set)
Rim light
Self illumination from texture
Vertex shader wind
Per-polygon billboarding
G-Buffer layout
Accumulation buffer needed for forward lighting (e.g. lightmaps, self-illumination, rim light, fog)
DOF amount calculated here to avoid extra depth reads in post process stage
      | R               | G               | B               | A
RT 0  | Accumulation R  | Accumulation G  | Accumulation B  | DOF Amount
RT 1  | Color R         | Color G         | Color B         | Spec Exponent
RT 2  | Normal Phi      | Normal Theta    | Translucency    | Spec Amount
RT 3  | Linear depth (encoded into 8-bit channels, PC only)
G-Buffer visualization (images): Color, Normal, Depth (exaggerated), Specular amount, Specular exponent
G-Buffer normals
Hemispheric normals looked bad
- Projection lets you see negative normals
- Shading can swim because of this
Straight RGB 888 world-space was good, but needed an extra channel
Stored in spherical coordinates
- Two 8-bit channels, just 16 bits
- Looks better than the other two
- Conversion cost can be quite high
- Lookup texture can be a win here
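The spherical-coordinate storage can be sketched on the CPU as below. This is a minimal illustration, not the shipped shader: the function names, the choice of Z as the polar axis and the exact 8-bit scaling are all assumptions.

```python
import math

def encode_normal(n):
    """Unit normal (x, y, z) -> (phi, theta) quantised to two 8-bit values."""
    x, y, z = n
    phi = math.atan2(y, x)                       # azimuth in [-pi, pi]
    theta = math.acos(max(-1.0, min(1.0, z)))    # inclination in [0, pi]
    phi8 = round((phi / math.pi * 0.5 + 0.5) * 255.0)
    theta8 = round(theta / math.pi * 255.0)
    return phi8, theta8

def decode_normal(phi8, theta8):
    """Two 8-bit values -> unit normal; this is the trig cost the slide mentions."""
    phi = (phi8 / 255.0 * 2.0 - 1.0) * math.pi
    theta = theta8 / 255.0 * math.pi
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))
```

With this scaling the round trip stays within about one degree of the input normal. Since the decode input is only 256 x 256 possible values, it is easy to see why a lookup texture can replace the trigonometry.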
G-Buffer position
On PC store linear Z as an RGB-encoded float
On consoles use the main z-buffer
- Undo projection in the light shader
World space position
- Interpolated camera-to-far-plane vector * linear Z + camera pos
Google “reconstruct position from depth”
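A rough CPU sketch of both ideas follows: packing linear Z into three 8-bit channels, and rebuilding the world-space position from it. The slides do not say how the float is encoded, so the 24-bit fixed-point packing, the corner ordering and all names here are assumptions.

```python
def pack_depth(d):
    """Linear depth in [0, 1) -> three 8-bit channel values (R, G, B)."""
    v = min(int(d * 16777216.0), 16777215)       # 24-bit fixed point
    return (v >> 16) & 255, (v >> 8) & 255, v & 255

def unpack_depth(r, g, b):
    """Inverse of pack_depth: three 8-bit values -> linear depth."""
    return ((r << 16) | (g << 8) | b) / 16777216.0

def lerp3(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

def reconstruct_position(camera_pos, far_corners, u, v, linear_z):
    """far_corners: camera-to-far-plane vectors at the screen corners, ordered
    (top-left, top-right, bottom-left, bottom-right); u, v in [0, 1] with
    (0, 0) at the top-left. linear_z is view depth divided by the far distance."""
    tl, tr, bl, br = far_corners
    # Bilinear blend stands in for the interpolation the rasteriser does for free
    ray = lerp3(lerp3(tl, tr, u), lerp3(bl, br, u), v)
    return tuple(c + r * linear_z for c, r in zip(camera_pos, ray))

# Hypothetical camera at (1, 2, 3) looking down -Z, far plane 100 units away
corners = [(-50, 25, -100), (50, 25, -100), (-50, -25, -100), (50, -25, -100)]
z = unpack_depth(*pack_depth(0.4))
pos = reconstruct_position((1.0, 2.0, 3.0), corners, 0.75, 0.75, z)
```

Here `pos` comes out at roughly (11, -3, -37), the point 40 units in front of the camera along that pixel's ray.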
The Pipeline
1. Opaque & Alpha Test
Lays down initial G-Buffer and Z
Fill accumulation buffer with ambient, IBL and self-illumination
Z-prepass was not a win for us
We render sorted by material first and still get good early Z
Just in case you forgot: make sure to render alpha test last
OUT: Z, accum, color, normal
2. Decals
Alpha test, alpha blend, multiply, additive
Can write all RTs except depth
Change normals & color before lighting
Can't change specular
- Output alpha is used for blending
- But specular is in the alpha channel
OUT: accum, color, normal
Color without decals
Color with decals
3. Background
Vanilla sky box (optional)
Any geometry labeled as background by the artists
Simple shader, no lighting
Up to artists to make it look good
Far 10% of Z range reserved for this pass
OUT: Z, accum
4. Lighting
Explained in detail in a moment
Most of the work happens here
We support all standard light types plus a few custom additions
Ambient, Point, Spot, Volume, Directional, Ortho
OUT: accum
5. Transparencies
Alpha geometry & particles, sorted
Forward shader
Lighting only via 3rd order SH
- Compute lighting for center of object
SH coefficients are efficient to calculate in jobs
Artists hand-tweak cases where it doesn’t look right
- Split mesh into more chunks
- Tweak mesh/vertex colors
Lighting details
Lighting - before
Lighting – after
Lighting – overdraw
33 lights in view, all pretty large
Three color light
Artists specify three diffuse colors
- Front color, weighted by N•L
- Mid color, weighted by 1-abs(N•L)
- Back color, weighted by -N•L
Wrap around(-ish)
- Back = black
- Mid = 0.5 * front
- Almost correct
Fewer lights needed
FASTER!
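As a quick sanity check of the wrap-around settings, the three weights can be evaluated on the CPU. This sketch ignores shadows and specular, and the function names are made up for illustration. With Back = black and Mid = 0.5 * front, the unshadowed result works out to a full wrap, (N•L + 1) / 2.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def three_color_diffuse(n_dot_l, front, mid, back):
    """Blend the three artist colors by the weights listed on the slide."""
    w_front = saturate(n_dot_l)
    w_mid = saturate(1.0 - abs(n_dot_l))
    w_back = saturate(-n_dot_l)
    return tuple(w_front * f + w_mid * m + w_back * b
                 for f, m, b in zip(front, mid, back))

def wrap_diffuse(n_dot_l, color):
    """Full wrap-around lighting, used here only as a reference."""
    return tuple(c * (n_dot_l + 1.0) * 0.5 for c in color)

front = (1.0, 0.8, 0.6)
mid = tuple(0.5 * c for c in front)
back = (0.0, 0.0, 0.0)
for n_dot_l in (-1.0, -0.5, 0.0, 0.25, 1.0):
    a = three_color_diffuse(n_dot_l, front, mid, back)
    b = wrap_diffuse(n_dot_l, front)
    assert all(abs(x - y) < 1e-9 for x, y in zip(a, b))
```

Any other choice of mid and back colors departs from plain wrap lighting, which is where the artistic control comes from.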
Three color light
Sub-surface scattering / Translucency
Just front color bleeding through to the back
We’re not actually doing proper scattering...
But looks really cool on leaves and other thin surfaces
Also helps noses, earlobes etc.
You also get shadows from behind!
Without SSS
With SSS
*Note the shadows
SSS Mask in the G-Buffer
Projected texture
Every light can project a texture
It’s just multiplied at the end
Cube texture for point lights
Had issues with MIP LOD calculation on Z discontinuities
Only solution was to manually override LOD (tex2Dlod)
Select LOD based on screen-space size, but be aggressive
Tweak selection until it looks OK
Lighting shader code
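float lightdot = dot( Normal , ToLight );
// Fake sub-surface scattering: front color bleeds through, dimmed in shadow
float3 SSSColor = FrontColor * SSSAmount;
SSSColor *= 0.3 + Shadow*0.7;
float3 Result;
// Front term: fades from front color to back color in shadow
Result = saturate( lightdot ) * lerp( BackColor , FrontColor , Shadow );
// Mid term: strongest where the surface is edge-on to the light
Result += saturate(1-abs(lightdot)) * MidColor;
// Back term: back color plus the fake SSS bleed
Result += saturate(-lightdot) * (BackColor + SSSColor);
Result *= PixelMaterialColor;
// Specular is masked by the shadow term as well
Result += SpecularBlinn( Normal , HalfVec ) * SpecColor * Shadow;
// Projected texture is just multiplied at the end
Result *= ProjectedMaskTexture;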
Excerpted just the relevant bits...
Light filters/groups
We have no filtering
Could use IDs, but didn’t
- Shader would run on tons of pixels that would get rejected in the end
- Needs an extra channel in the G-buffer
Artists use custom watertight meshes grouped under the light in Maya to contain lights
Multiplicative lights
All our lights can be set to use multiply as blend mode
Useful for adding in dark spots without many lights
Also helps if you need to add a dark spot in a hurry before shipping
Multiplicative lights Before
Multiplicative lights After
Ambient light
Box shape with a nice fade
It’s basically an SH light probe
- Group a bunch of point, spot and directional lights under it in Maya
- Plus a standard ambient term
- They all get baked into 3rd order SH
- Just look up with the pixel normal
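A minimal sketch of the bake-and-lookup idea for a single color channel is below. It is an assumption about the general technique, not the studio's baking code: it only handles directional lights plus a constant ambient term, uses the standard 9-coefficient real SH basis with cosine-lobe weights (pi, 2pi/3, pi/4 per band), and all function names are hypothetical.

```python
import math

def sh9_basis(d):
    """Real spherical harmonics basis, bands 0..2, for unit direction d."""
    x, y, z = d
    return [0.282095,
            0.488603 * y, 0.488603 * z, 0.488603 * x,
            1.092548 * x * y, 1.092548 * y * z,
            0.315392 * (3.0 * z * z - 1.0),
            1.092548 * x * z, 0.546274 * (x * x - y * y)]

# Per-coefficient weights that turn incoming radiance into diffuse irradiance
COS_LOBE = [math.pi] + [2.0 * math.pi / 3.0] * 3 + [math.pi / 4.0] * 5

def bake_probe(directional_lights, ambient):
    """directional_lights: list of (unit direction towards the light, intensity)."""
    coeffs = [0.0] * 9
    for direction, intensity in directional_lights:
        for i, b in enumerate(sh9_basis(direction)):
            coeffs[i] += COS_LOBE[i] * intensity * b
    coeffs[0] += ambient / 0.282095    # constant term evaluates back to `ambient`
    return coeffs

def lookup(coeffs, normal):
    """Per-pixel work: one dot product between coefficients and the basis."""
    return sum(c * b for c, b in zip(coeffs, sh9_basis(normal)))

probe = bake_probe([((0.0, 0.0, 1.0), 1.0)], ambient=0.0)
facing = lookup(probe, (0.0, 0.0, 1.0))     # about 1.0625 (exact N.L would be 1)
side = lookup(probe, (1.0, 0.0, 0.0))       # about 0.094 (exact would be 0)
away = lookup(probe, (0.0, 0.0, -1.0))      # about 0.0625 (exact would be 0)
```

The small overshoot and the light leaking to the back are the usual price of stopping at three bands; the lookup cost stays fixed no matter how many lights were grouped under the probe.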
Directional light
Cascaded shadow map
Cascades rendered as boxes
Final non-shadow pass is a fullscreen quad
- Quad at far plane to stencil mask out sky/background
Projector texture is tiled and animated
- Cheap, fake cloud shadows!
Early stencil rejection
Without it we’d run at about 4 fps so I can’t stress the importance of it enough!
Very simple to set up, but easy to break too
Very fast rendering
Cuts down light rendering time tremendously
Early stencil rejection
All lights are rendered as geometry
- Sphere for point, cone for spot etc.
- 50-100 polys
Use same geometry for stencil mask unless artist supplies a mesh
We use a standard Z-Fail approach
Yes, we should be using Z pass to get early Z in the masking pass
But this pass was always fast so we chose to fix other stuff first
Early stencil rejection
Mask pass (no pixel shader):
  TwoSidedStencilMode = true
  StencilFunc = Always
  StencilZFail = Invert
  CCW_StencilFunc = Always
  CCW_StencilZFail = Invert
  StencilWriteMask = 1
  SCull/HiZ = Equal to 1
Light shader pass:
  StencilFunc = Equal
  StencilRef = 1
This works well with SCull (PS3)
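To see why these states mark only the pixels inside a light volume, the mask pass can be simulated per pixel. This is a simplified model with assumptions: a convex volume with exactly one front and one back face over the pixel, a LESS depth test, and made-up function names.

```python
def mask_pass(stencil, scene_depth, front_depth, back_depth, write_mask=0x01):
    """Two-sided Z-fail / Invert: every face of the light volume that fails the
    depth test (lies behind the scene surface) inverts the masked stencil bits."""
    for face_depth in (front_depth, back_depth):
        if not (face_depth < scene_depth):       # depth test fails
            stencil ^= write_mask
    return stencil

def light_pass_runs(stencil, ref=1):
    """Light shader pass: StencilFunc = Equal, StencilRef = 1."""
    return stencil == ref

# Light volume spans depths 5..10 along this pixel's view ray
inside = mask_pass(0, scene_depth=7.0, front_depth=5.0, back_depth=10.0)
in_front = mask_pass(0, scene_depth=3.0, front_depth=5.0, back_depth=10.0)
behind = mask_pass(0, scene_depth=20.0, front_depth=5.0, back_depth=10.0)
```

Only the surface inside the volume ends up with stencil 1: a surface in front fails both faces (inverted twice, back to 0) and a surface behind fails neither.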
Early stencil example #1 (images): simple case, light geometry
Early stencil example #2 (images): custom geometry
Directional light stencil
Every cascade must only light pixels untouched by previous cascades
Cascade overlap unpredictable when FOV & settings change
Came up with a way to always keep stencil test EQUAL to 1
Plays nice with SCull
Every cascade rendered into stencil twice, but still plenty fast
Directional light stencil, step by step
(Images: green = stencil == 1, red = stencil > 1. Every pass uses Z Fail: Invert.)
1. Mask cascade #1 and do lighting. Write Mask: 00000001
2. Clear cascade #1. Write Mask: 00000010
3. Mask cascade #2 and do lighting. Write Mask: 00000001
4. Clear cascade #2. Write Mask: 00000100
5. Mask cascade #3 and do lighting. Write Mask: 00000001
6. Clear cascade #3. Write Mask: 00001000
7. Mask far plane and do final pass. Write Mask: 00000001
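The write-mask sequence can be traced for a single pixel to check that the test really stays at EQUAL 1 and that every pixel is lit exactly once. This is one reading of the slides, with assumptions: a pixel inside a cascade box is inverted once by a Z-fail pass, and the "clear" pass works by setting a higher bit so the value can never equal 1 again. The names are hypothetical.

```python
def zfail_invert(stencil, covered, write_mask):
    """Invert the masked bits if this pixel's surface lies inside the volume."""
    return stencil ^ write_mask if covered else stencil

def light_cascades(in_cascade, is_geometry=True):
    """in_cascade: one boolean per cascade for a single pixel.
    Returns the indices of the passes that lit it; index len(in_cascade)
    stands for the final non-shadowed far-plane pass."""
    stencil, lit_by = 0, []
    for i, inside in enumerate(in_cascade):
        stencil = zfail_invert(stencil, inside, 0b00000001)        # mask cascade i
        if stencil == 1:                                           # EQUAL 1 test
            lit_by.append(i)
        stencil = zfail_invert(stencil, inside, 0b00000010 << i)   # "clear" cascade i
    stencil = zfail_invert(stencil, is_geometry, 0b00000001)       # far-plane quad
    if stencil == 1:
        lit_by.append(len(in_cascade))
    return lit_by

assert light_cascades([True, False, False]) == [0]
assert light_cascades([True, True, False]) == [0]      # overlap: first cascade wins
assert light_cascades([False, True, True]) == [1]
assert light_cascades([False, False, False]) == [3]    # only the final pass
assert light_cascades([False, False, False], is_geometry=False) == []   # sky
```

Overlapping cascades are resolved in favour of the earlier one without ever changing the stencil compare value, which is what keeps SCull happy.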
Antialiasing overview
Render G-buffer into 2xMSAA RT
Perform lighting for each sample
Need render target access at sample, not pixel, level
Effectively supersample lighting
Can be expensive
There’s a ton of hacky methods that might work for you
Antialiasing hack #1
Distribute shadow sampling between MSAA samples
Suggested by the Guerrilla guys in their excellent presentation
We use it, works great
Just do it
Antialiasing hack #2
Render lighting at pixel resolution, but with 2xMSAA (per-sample stencil tests)
Light both samples in the shader and output averaged result
Saves output bandwidth compared with super sampled rendering
Stencil testing causes artifacts
Antialiasing hack #2
Edges on stencil discontinuities still darken on resolve
- Averaged in the shader already
- But then one sample rejected on stencil fail
- Can be fixed by sampling stencil in the shader as well, but it may not be trivial (e.g. Xbox 360 with float depth)
Not a whole lot of benefit from this alone, but allows for more optimizations in the shader
Antialiasing hack #3
Use one position for both samples
Manually loop just part of shader
Shadows, light falloff, projected textures all break on edges
Was too visible for us (lots of lights, shadows and projected textures everywhere)
Caused borders around characters
Might work for you
Antialiasing hack #4
Pre-resolve color buffer to avoid two lookups
Wrong lighting on edges since the color bleeds between background and foreground
Can work if your scenes are uniform enough
- Gray & brown are popular lately
- Can work for outdoors
Antialiasing on PC
Works properly with DX10.1
We only support DX9 so tough luck
Super-sampling was too slow
- PC resolution unpredictable
- But HW is becoming really fast
Hacky approximate AA using centroid sampling (google it)
- Couldn’t afford extra geometry pass
Edge-detection AA + filtering
- Slow and looks crappy
Antialiasing on PC
Some success with jittered 2X AA
- Apply sub-pixel offset to projection
- Alternate between frames
- Always show 50-50 blend
- Visible feedback if framerate is low
- Can use temporal re-projection to fix it somewhat or enable only if framerate is high enough (60+)
Left it out in the end, it was untested so we played it safe
Antialiasing on PS3
Confessions first...
We render at 1120x576 2xMSAA
- AA was added late in the project
- Lights & textures were already in
- Couldn’t afford 35+ MB buffers
- Fillrate was also an issue in certain cases
Preferred the image quality over 1280x720 and no MSAA.
We have lots of thin steel bars
Antialiasing on PS3
Alias same memory as:
- 1120x576 2xMSAA
- 2240x576 non-AA
Do ping-pong post between left and right half of 2240x576 RT
Our render targets:
- 2x 1280x720 Front/back buffer
- 3x 1120x576x2 RTs
- 1x 1120x576x2 Depth
Total memory: 26.7 MB
Antialiasing on PS3
1. Activate 1120 2xMSAA MRT
2. Render G-buffer
3. Switch to 2240 no-AA RT
   - PS3 has a nice MSAA layout for this
   - Reload ZCull!
4. Render lighting as usual, lighting each sample as a pixel
5. Switch back to 1120 2xMSAA RT
   - Don’t forget to reload ZCull!
6. Render transparencies with MSAA
7. Quincunx resolve at the end
   - Resolve into same memory!
   - To left part of 2240 no-AA texture
   - Didn’t cause any artifacts for us
Antialiasing on Xbox 360
Confessions again...
We render at 1120x576 2xMSAA
Same reasons as PS3, but fillrate was less of an issue
Lighting can render without tiling
Without this we'd have to cache all shadow maps
Not really an option with a bunch of shadow casting lights
Antialiasing on Xbox 360
Our render targets:
- 2x 1120x576x2 Accum./FB RT
- 1x 1120x576x2 Color RT
- 1x 1120x576x2 Normal RT
- 1x 1120x576x2 Depth RT
720p frame buffers in same memory as the accum. buffers
- Alternate between frames
- Can’t do this on PS3 due to tiled memory limitations
Total memory: 24.6 MB
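Both quoted render-target budgets (the PS3 list and the Xbox 360 list) can be reproduced with a few lines of arithmetic. The sketch assumes every target, depth included, is 4 bytes per sample and that "MB" means 2^20 bytes; the helper name is made up.

```python
BYTES_PER_SAMPLE = 4            # all targets are 32-bit (8888); depth assumed 32-bit too
MIB = 1024 * 1024

def target_bytes(width, height, samples=1, count=1):
    return width * height * samples * BYTES_PER_SAMPLE * count

ps3 = (target_bytes(1280, 720, count=2)                 # front/back buffer
       + target_bytes(1120, 576, samples=2, count=3)    # three 2xMSAA RTs
       + target_bytes(1120, 576, samples=2))            # depth

xbox360 = target_bytes(1120, 576, samples=2, count=5)   # 2 accum/FB, color, normal, depth

print(f"PS3: {ps3 / MIB:.1f} MB, Xbox 360: {xbox360 / MIB:.1f} MB")
# PS3: 26.7 MB, Xbox 360: 24.6 MB
```

The same helper also shows why the aliasing trick works: a 1120x576 2xMSAA surface and a 2240x576 non-AA surface are exactly the same number of bytes.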
Antialiasing on Xbox 360
1. Activate 1120 2x MSAA MRT
2. Render G-buffer
3. Resolve both samples of all RTs
4. Activate 2240 no-AA RT
5. Restore depth & accumulation to EDRAM as 2xWidth with custom shader (emulate PS3’s layout)
6. Render lighting as usual, lighting each sample as a pixel
7. Resolve to 2240 no-AA texture
8. Using a custom shader, average samples into a 1120 no-AA EDRAM surface
9. Render alpha, particles and the rest without MSAA
10. Resolve into the left part of 2240 no-AA texture
Antialiasing future work
We should really only be doing lighting twice for edge pixels
Huge potential speed boost
Didn’t research further at the time since it was fast enough
Must find a way to make it play well with SCull/Hi-Stencil without breaking our stencil masking
HDR
All our buffers are 8:8:8:8
We use Valve style HDR with histogram analysis
HDR multiplier is passed into all shaders that write to the accumulation buffer
Output color is multiplied before output
HDR
Not really correct, but hey, it looks convincing
In current project, exposure is limited to 0.5 – 2.0 range since HDR was added mid-project
Tried with larger exposure ranges and it still looked cool
Light blending fails if exposure is really low and light contribution is below 1/255
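The blending failure is a plain quantisation effect, which a small model makes visible. The sketch assumes round-to-nearest when writing to the 8-bit target (real hardware rounding may differ, which moves the exact threshold) and uses made-up function names.

```python
def to_8bit(value):
    """Quantise a [0, 1] value the way an 8-bit render target channel would."""
    return round(max(0.0, min(1.0, value)) * 255.0)

def accumulate(lights, exposure):
    """Additively blend pre-exposed light contributions into one 8-bit channel,
    quantising after every light as blending into an 8888 target does."""
    total = 0
    for contribution in lights:
        total = min(255, total + to_8bit(contribution * exposure))
    return total

dim_lights = [0.003] * 20                        # twenty faint, overlapping lights
blended = accumulate(dim_lights, exposure=0.5)   # every light rounds to 0
ideal = to_8bit(sum(dim_lights) * 0.5)           # what a float buffer would give
```

Here `blended` is 0 while `ideal` is 8: each light falls below the smallest representable step once the exposure is applied, so the sum of many faint lights is lost entirely.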
HDR comparison (images): Exposure = 0.5, Exposure = 1.0, Exposure = 2.5, Exposure = 5
Post-processing
This is one of the best things with deferred rendering
For each pixel you have access to: - Color - Normal - Position / Depth - Final lit result
You can do pretty much any post process you want with this
Post-processing
But it's very easy to absolutely devastate performance on both consoles so be careful
Cram as much as you can into a single shader to avoid re-reading data
Check the end of the slides for our post processing method
Platform specific issues
There are times where you just want to...
- ...burn your PC!
- ...smash your Xbox 360!
- ...make a grill out of your PS3!
We had those moments ourselves
Unfortunately dev kits cost too much...
So we had no choice but to solve the issues...
So here’s what we learned
PS3 Performance Killers
If you don’t set up MRT properly, your performance will be SLOW
Memory tiler makes reuse hard sometimes (pitch must match)
ZCull needs reloading to work
SCull hates any changes
Make sure you read all Sony docs on the subject, it’s already been covered a lot
PS3 SCull horrors
SCull is very, very touchy
Changing SCull compare value kills it for the frame (at least for us)
Best to just bind it once and leave it alone forever
All lights just use EQUAL to 1 as stencil pass criterion
Must clear stencil after every light
WARNING - This also applies to GeForce 6 & 7 series PC parts
PS3 improvements
We’re still doing all rendering on RSX
If you’re cross platform you’ll likely wind up with spare SPU time
Moving post processing to SPUs is an easy way to free up the RSX
You can even do parts or all of the shading with SPUs, but that’s a bit more involved
Remember: SPUs are FAST!
Xbox 360 EDRAM
VERY fast and generally awesome
But can be quite inflexible at times
Once you start running low you’re pretty much out of luck
But it’s mostly forgiven since it’s really fast
Plan your EDRAM use, otherwise you’ll be in a world of pain...
Xbox 360 EDRAM
When rendering shadow maps the accumulation buffer is evicted from EDRAM
- Restored for each shadow casting light, but fillrate was better than PS3 so we could afford this
Higher resolutions don't scale linearly
- Start requiring 3 tiles at g-buffer pass (much slower)
- 2 tiles for lighting (not good)
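A back-of-the-envelope estimate shows where those tile counts come from. The Xbox 360 has 10 MB of EDRAM; the sketch below simply divides the raw surface size by that and ignores the hardware's tile alignment rules, so treat it as an approximation. The target counts (three color RTs plus depth for the G-buffer pass, accumulation plus depth for lighting) are taken from the earlier slides; the function name is made up.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024      # Xbox 360 EDRAM size

def tiles_needed(width, height, samples, num_targets, bytes_per_sample=4):
    """Rough tile count: total bytes of all bound targets over EDRAM size."""
    total = width * height * samples * num_targets * bytes_per_sample
    return math.ceil(total / EDRAM_BYTES)

gbuffer_576 = tiles_needed(1120, 576, samples=2, num_targets=4)    # 2
lighting_576 = tiles_needed(1120, 576, samples=2, num_targets=2)   # 1, no tiling
gbuffer_720 = tiles_needed(1280, 720, samples=2, num_targets=4)    # 3
lighting_720 = tiles_needed(1280, 720, samples=2, num_targets=2)   # 2
```

By this estimate the lighting targets at 1120x576 2xMSAA fit in EDRAM with very little room to spare, which is consistent with lighting rendering untiled at that resolution and needing tiles as soon as it goes up.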
Xbox 360 gamma
Started paying attention too late
Had to undo 360 gamma correction to get a proper image
All our textures and lighting were done so there was no other way
Artifacts not really noticeable by the end user
Might just keep it like this since the image is consistent across all platforms
Final thoughts
Deferred rendering is cool and practical
Enables really large light counts
MSAA is not an issue
Some of the best looking games use some variant of deferred rendering
It’s my opinion that it makes cross platform development easier
QUESTIONS?
E-mail: [email protected]
Feel free to send spam, I already get lots
Slides available soon on www.zootfly.com
Stuff that didn’t make it into the talk, but is still cool
Our post process (flow diagram)
Inputs: Accumulation buffer, Z/Pos Buffer
Stages: Hi-Pass & SSAO at ½ res; Downsample to ¼ res (x2); Horizontal blur (x2); Vertical blur (x2); COMBINE
Resolution axis: 100%, 50%, 25%
Z-Downsampling
Use any applicable MSAA hacks when downsampling Z
Quarter res Z is needed for low res particle rendering anyway
Huge bandwidth savings when sampling from lower res texture
SSAO
Calculated at 50% resolution, but blurred at 25%?!
It’s much more stable this way - Higher frequency input
We also tried blurring at 50% res, but there was no visual difference except the framerate drop
Depth-of-field
We always apply DOF to geometry very close (<1m) to the camera
Hides low res textures this way and just looks cool
Very simple, just four parameters:
- Near plane distance
- Near plane fade
- Far plane distance
- Far plane fade
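The slides do not give the formula behind the four parameters, so the following is only one plausible interpretation: blur ramps up linearly over the fade distance on the near side of the near plane and on the far side of the far plane. The function and parameter names are hypothetical.

```python
def saturate(x):
    return max(0.0, min(1.0, x))

def dof_amount(depth, near_dist, near_fade, far_dist, far_fade):
    """Blur amount in [0, 1] for a given view depth (assumed linear ramps)."""
    near_blur = saturate((near_dist - depth) / near_fade)   # closer than near plane
    far_blur = saturate((depth - far_dist) / far_fade)      # beyond far plane
    return max(near_blur, far_blur)

# Hypothetical settings: sharp between 1 m and 50 m
very_close = dof_amount(0.2, near_dist=1.0, near_fade=0.5, far_dist=50.0, far_fade=25.0)
mid_range = dof_amount(10.0, near_dist=1.0, near_fade=0.5, far_dist=50.0, far_fade=25.0)
far_away = dof_amount(62.5, near_dist=1.0, near_fade=0.5, far_dist=50.0, far_fade=25.0)
```

With these made-up settings, geometry 20 cm from the camera is fully blurred (1.0), the mid range is sharp (0.0) and something at 62.5 m is half blurred (0.5). A value like this is what would be written to the DOF Amount channel of the G-buffer.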
Combine
Final step, munges it all together
Color correction before output
- Apply levels filter
- Apply curves filter
Controllable saturation of base and bloom images
On consoles upscale to 1280x720
Combine code
// DOF
Out = lerp( Accum, BlurredAccum, doffac );
// Ambient occlusion
Out *= AmbientOcclusionFactor;
// Bloom with adjustable saturation & intensity
Out = ApplySat(Out,BaseSat);
Bloom = ApplySat(Bloom,BloomSat) * BloomIntensity;
// Screen-style blend: darken the base where bloom is strong, then add the bloom
Out *= (1 - saturate(Bloom));
Out += Bloom;
// Levels filter
Out = saturate((Out + LevelsInAdd) * LevelsInMul);
Out = pow( Out , LevelsGamma );
// Curves: one 1D lookup per channel
Out.r = tex1D( CurvesSampler , Out.r ).r;
Out.g = tex1D( CurvesSampler , Out.g ).g;
Out.b = tex1D( CurvesSampler , Out.b ).b;
Our material system
Material textures
Texture | R            | G            | B              | A              | Format
Color   | Color RGB (R, G, B)                          | Alpha (OPT.)   | DXT1 / 5
Normal  | Self illum.  | N.Y          | N.Z            | N.X            | DXT5
Mask    | Reflection   | Spec Amount  | Spec Exp.      | Height (OPT.)  | DXT1 / 5
Env     | Cube Color RGB, additive                     | NULL           | DXT1
Overlay | Color RGB, Photoshop overlay blend mode      | NULL           | DXT1
Detail  | U Offset     | V Offset     | Normal Bend X  | Normal Bend Y  | 8888
All textures are optional (#ifdefs)
Detail:
- Tiled, uses multiplied primary UV set
- Offset UV for texture lookups
- Then bend the normal
DXT Red & Blue suck, but good enough for what they contain
Material settings
Bunch of checkboxes and sliders
Less is faster, more is better
No custom artist shaders
Allows programmer to optimize