A Bizarre Way to do Real-Time Lighting

Date post: 09-Jun-2015
Upload: steven-tovey
Transcript
Page 1: A Bizarre Way to do Real-Time Lighting

A Bizarre Way to do Real-Time Lighting

Stephen McAuley & Steven Tovey
Graphics Programmers, Bizarre Creations Ltd.

http://www.bizarrecreations.com/

Page 2: A Bizarre Way to do Real-Time Lighting

“Welcome, I think not!”

Let us start by wishing you a good bonfire night!

Page 3: A Bizarre Way to do Real-Time Lighting

Agenda

A sneak preview of Blur
Light Pre-Pass Rendering
10 Step Guide to free Lighting on PS3
The Future...

Page 4: A Bizarre Way to do Real-Time Lighting

Blur

Coming 2010 on X360, PS3 and PC.
Twenty cars on track for intense wheel-to-wheel racing.
Exciting power-ups bring depth and strategy to racing.
Real-world cars and locations, set between dusk and dawn.
Extensive multiplayer options.

Page 5: A Bizarre Way to do Real-Time Lighting
Page 6: A Bizarre Way to do Real-Time Lighting

Technical Analysis

So, we have twenty cars, racing around a track in the dark…

…they all have headlights, rear lights, brake lights…

…not to mention any other effects we might have going on around the track…

…therefore, we need some sort of real-time lighting solution.

Page 7: A Bizarre Way to do Real-Time Lighting

Light Pre-Pass

Many people came up with this… so you know it’s good!

Given its name by [Engel08].
Credits also due to [Balestra08].
Half-way between traditional and deferred rendering.

Page 8: A Bizarre Way to do Real-Time Lighting

Light Pre-Pass

[Diagram: the Light Pre-Pass pipeline. Labels: Geometry, Normals, Depth, Real-Time Lighting, Geometry, Final Colour.]

Page 9: A Bizarre Way to do Real-Time Lighting

Light Pre-Pass in Blur

Final Image

Page 10: A Bizarre Way to do Real-Time Lighting

Step #1: Render Pre-Pass

Render scene normals and depth.
We pack view space normals and depth into one RGBA8 surface:

  Channel    R          G          B          A
  Contents   normal x   normal y   depth hi   depth lo

This means all the info we need is in one texture, not two!
It’s also faster to calculate view space position than world space position.
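A minimal Python sketch of how two normal components might be stored in the R and G channels and the third rebuilt on read. The bias-and-scale encoding and both function names are assumptions made for illustration; the slides only state that x and y are stored, and later mention a positive z-component sign assumption.

```python
import math

def encode_normal_xy(nx, ny):
    """Map x and y from [-1, 1] to 8-bit values (assumed bias/scale encoding)."""
    def to_byte(v):
        return int(round((v * 0.5 + 0.5) * 255.0))
    return to_byte(nx), to_byte(ny)

def decode_normal(r, g):
    """Rebuild a unit view-space normal, assuming z is non-negative."""
    nx = r / 255.0 * 2.0 - 1.0
    ny = g / 255.0 * 2.0 - 1.0
    # z is not stored: recover it from the unit-length constraint.
    nz = math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return nx, ny, nz
```

The `max(0.0, ...)` guard matters because quantisation can push x² + y² slightly above one for normals that lie almost in the view plane.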

Page 11: A Bizarre Way to do Real-Time Lighting

Step #1: Render Pre-Pass

Pack depth:

    half2 vPackedDepth = half2( floor(fDepth * 255.f) / 255.f,
                                frac(fDepth * 255.f) );

Unpack depth:

    float fDepth = vPackedDepth.x + vPackedDepth.y * (1.f / 255.f);

(Note: here fDepth is in [0, 1] range)
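The pack and unpack expressions above can be checked off-line with a short Python sketch. `quantise8` is a hypothetical helper that imitates the rounding an 8-bit channel applies on write; it is not part of the original shader.

```python
import math

def pack_depth(depth):
    """Split a [0, 1] depth into a coarse 'hi' part and a fine 'lo' part."""
    scaled = depth * 255.0
    hi = math.floor(scaled) / 255.0
    lo = scaled - math.floor(scaled)  # frac()
    return hi, lo

def quantise8(value):
    """Imitate storing a [0, 1] value in an 8-bit channel (assumed round-to-nearest)."""
    return round(value * 255.0) / 255.0

def unpack_depth(hi, lo):
    """Recombine the two channels into a single depth value."""
    return hi + lo * (1.0 / 255.0)
```

Under that rounding assumption the worst-case error of a single 8-bit channel is about 1/510, while the two-channel split brings it down to about 1/130050.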

Page 12: A Bizarre Way to do Real-Time Lighting

Step #1: Render Pre-Pass

Get view space position from texture coordinates and depth:

    float3 vPosition = float3(g_vScale.xy * vUV + g_vScale.zw, 1.f) * fDepth;

fDepth: in [0, FarClip] range.
g_vScale: moves vUV into [-1, 1] range and scales by inverse projection matrix values.
In some circumstances, possible to move this to the vertex shader.
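One way to see what a constant like g_vScale has to contain is to derive it for a plain symmetric perspective projection. The sketch below is an assumption-laden illustration, not the shipped code: `make_scale`, `view_position` and `project_to_uv` are hypothetical names, texture coordinates are taken to run from 0 to 1 with no vertical flip, and depth is linear view-space z.

```python
import math

def make_scale(fov_y, aspect):
    """Return (scale_x, scale_y, offset_x, offset_y) for a symmetric projection."""
    tan_y = math.tan(fov_y * 0.5)
    tan_x = tan_y * aspect
    # uv * 2 - 1 moves into [-1, 1]; multiplying by tan undoes the projection scale.
    return (2.0 * tan_x, 2.0 * tan_y, -tan_x, -tan_y)

def view_position(uv, depth, scale):
    """Same arithmetic as the shader line: (scale.xy * uv + scale.zw, 1) * depth."""
    sx, sy, ox, oy = scale
    return ((sx * uv[0] + ox) * depth, (sy * uv[1] + oy) * depth, depth)

def project_to_uv(pos, fov_y, aspect):
    """Forward projection, used here only to check the round trip."""
    tan_y = math.tan(fov_y * 0.5)
    tan_x = tan_y * aspect
    ndc_x = pos[0] / (pos[2] * tan_x)
    ndc_y = pos[1] / (pos[2] * tan_y)
    return (ndc_x * 0.5 + 0.5, ndc_y * 0.5 + 0.5)
```

Because the scale and offset depend only on the projection, they can be computed once per frame, leaving one multiply-add and one multiply per pixel.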

Page 13: A Bizarre Way to do Real-Time Lighting

Step #1: Render Pre-Pass

[Images: the pre-pass buffer shown as Normals X & Y, as Depth Hi & Lo, and as the combined surface (Normal X, Normal Y, Depth Hi, Depth Lo).]

Page 14: A Bizarre Way to do Real-Time Lighting

Step #1: Render Pre-Pass

Some good advice: at this stage, it’s really best to render only what you need…
So don’t render geometry that isn’t affected by real-time lights!
Why not also try bringing in the far clip plane?
We also don’t render the very, very vertex-heavy cars.
  They get their real-time lighting from a spherical harmonic.
  Doesn’t look too bad!

Page 15: A Bizarre Way to do Real-Time Lighting

Step #2: The Lighting

We render the lighting to an RGBA8 texture.
  Lighting is in [0, 1] range.
  We just about got away with range and precision issues.
Two types of lights:
  Point lights
  Spot lights

Page 16: A Bizarre Way to do Real-Time Lighting

Step #2: Point Lights

First up, it’s the point lights’ turn.
Let’s copy [Balestra08] and render them tiled.
Split the screen into tiles:

    for each tile
        gather affecting lights
        select shader
        render tile
    end

Big savings!
  Save on fill rate.
  Minimise overhead of unpacking view space position and normal.
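The gather step of that loop can be sketched in Python as follows. This is only an illustration under simplifying assumptions: each light is given as a screen-space circle `(cx, cy, radius)` in pixels, and the names `circle_overlaps_rect` and `bin_lights` are hypothetical. How the real renderer bounds its lights is not described in the slides.

```python
def circle_overlaps_rect(cx, cy, radius, x0, y0, x1, y1):
    """True if a circle touches the axis-aligned rectangle [x0, x1] x [y0, y1]."""
    nearest_x = min(max(cx, x0), x1)
    nearest_y = min(max(cy, y0), y1)
    dx, dy = cx - nearest_x, cy - nearest_y
    return dx * dx + dy * dy <= radius * radius

def bin_lights(width, height, tile_size, lights):
    """Map (tile_x, tile_y) to the indices of the lights affecting that tile."""
    tiles = {}
    for y0 in range(0, height, tile_size):
        for x0 in range(0, width, tile_size):
            x1 = min(x0 + tile_size, width)
            y1 = min(y0 + tile_size, height)
            hits = [i for i, (cx, cy, r) in enumerate(lights)
                    if circle_overlaps_rect(cx, cy, r, x0, y0, x1, y1)]
            if hits:  # tiles with no lights are never rendered
                tiles[(x0 // tile_size, y0 // tile_size)] = hits
    return tiles
```

The length of each list is what a "select shader" step could key on, for example one shader variant per light count.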

Page 17: A Bizarre Way to do Real-Time Lighting

Step #2: Point Lights

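[Diagram: the screen split into tiles, each labelled with the number of point lights affecting it (1 or 2 in this example).]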

Page 18: A Bizarre Way to do Real-Time Lighting

Step #2: Point Lights Optimise: mask out the sky in the stencil buffer.

Page 19: A Bizarre Way to do Real-Time Lighting

Step #2: Point Lights

Real-Time Lighting (Point Lights)

Page 20: A Bizarre Way to do Real-Time Lighting

Step #2: Spot Lights

Next, it’s the spot lights.
Three different types:
  Bog standard.
  2D projected texture.
  Volume texture.
Render as volumes.
  A cone for the bog-standard and projected.
  A box for the volume textured.
If they’re big enough on screen, do a stencil test.

Page 21: A Bizarre Way to do Real-Time Lighting

Step #2: Spot Lights

Render back faces:
  Colour write disabled
  Depth test greater-equal
  Stencil write enabled

Page 22: A Bizarre Way to do Real-Time Lighting

Step #2: Spot Lights

Render front faces:
  Colour write enabled
  Depth test less-equal
  Stencil test enabled

Page 23: A Bizarre Way to do Real-Time Lighting

Step #2: Spot Lights

Hold on a minute… what happens if the camera goes inside the light volume?

Rendering the front faces doesn’t work any more…

Page 24: A Bizarre Way to do Real-Time Lighting

Step #2: Spot Lights

Worst case scenario! Not only does the light fill the whole screen, but…
You just have to bite your tongue and only render back faces.
  You lose your stencil test.
  And maybe even early-z too.
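The back-face and front-face passes from the preceding slides, plus the camera-inside fallback, reduce to a per-pixel depth comparison. The Python sketch below models that logic for one pixel; the function name is hypothetical, and using a greater-equal depth test for the back-face-only fallback is an assumption, since the slides do not state which test is used in that case.

```python
def spot_light_touches_pixel(scene_depth, front_depth, back_depth,
                             camera_inside=False):
    """Decide whether a light volume should shade a pixel.

    scene_depth: depth already in the depth buffer at this pixel.
    front_depth / back_depth: depth of the volume's front and back faces.
    """
    # Pass 1: back faces, depth test greater-equal, result written to stencil.
    stencil = back_depth >= scene_depth
    if camera_inside:
        # Front faces are clipped away, so only the back-face test remains.
        return stencil
    # Pass 2: front faces, depth test less-equal, gated by the stencil.
    return stencil and front_depth <= scene_depth
```

Together the two passes shade only surfaces that lie between the front and back of the volume, which is why losing the front-face pass costs fill rate when the camera is inside.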

Page 25: A Bizarre Way to do Real-Time Lighting

Step #2: Spot Lights

Page 26: A Bizarre Way to do Real-Time Lighting

Step #2: The Lighting

Real-Time Lighting

Page 27: A Bizarre Way to do Real-Time Lighting

Step #3: Render the Scene

Just do everything as you normally would…
Except that you now have a texture containing the real-time lighting for each pixel!
But remember to composite it properly…

Page 28: A Bizarre Way to do Real-Time Lighting

Step #3: Render the Scene

    half3 vDiffuseLighting = vStaticLighting.rgb + vDynamicLighting.rgb;
    half3 vFinalColour = vDiffuseLighting * vAlbedoColour.rgb + vSpecularLighting;

vStaticLighting: from our lightmaps.
vDynamicLighting: the real-time lighting from the texture.
vSpecularLighting: you’d probably want to do something clever involving a Fresnel term here.

Page 29: A Bizarre Way to do Real-Time Lighting

And Finally…

Page 30: A Bizarre Way to do Real-Time Lighting

Real-Time Lighting in Blur

Point Lights: brake lights, rear lights

Page 31: A Bizarre Way to do Real-Time Lighting

Real-Time Lighting in Blur

Point Lights: pick-ups

Page 32: A Bizarre Way to do Real-Time Lighting

Real-Time Lighting in Blur

Point Lights: power-up effects

Page 33: A Bizarre Way to do Real-Time Lighting

Real-Time Lighting in Blur

Spot Lights: headlights

Page 34: A Bizarre Way to do Real-Time Lighting

Real-Time Lighting in Blur

Spot Lights: start line effects

Page 35: A Bizarre Way to do Real-Time Lighting

Great, It Works!

But can we make it faster?

Deferred lighting is image processing – no rasterization required.
  See how we draw our point lights.

Seems like this suits the PLAYSTATION®3’s SPUs…

Page 36: A Bizarre Way to do Real-Time Lighting

PLAYSTATION®3: In Brief

Time to switch gears a little bit...
So you’ve heard this stuff a million times before...
Here are the important takeaway facts:
  PS3 has 6 SPUs.
  SPUs are fast!
    ○ (...Given the right data!)

Page 37: A Bizarre Way to do Real-Time Lighting

PLAYSTATION®3: In Brief

[Diagram: six SPUs and the RSX™, with Main Memory (XDR - 256MB) and Graphics Memory (GDDR3 - 256MB).]

Page 38: A Bizarre Way to do Real-Time Lighting

PLAYSTATION®3: In Brief

[Diagram: one SPE. Labels: SPU, SXU, Local Store (256KiB), MFC, Main Memory (256MB), Graphics Memory (256MB).]

Page 39: A Bizarre Way to do Real-Time Lighting

Goals for PLAYSTATION®3

Reduce overall frame latency to acceptable level (<33ms).
Preserve picture quality (and resolution).
  Blur runs @ 720p on X360 and PS3.
Preserve lighting accuracy.
Lighting and main scene must match:
  Cars move fast...
  Deferring the lighting simply not an option, works great in [Swoboda09] though.

Page 40: A Bizarre Way to do Real-Time Lighting

Step #1: Look At The Data

Data is *really* important!
Trivially easy in this case as we’re coming from a stream processing model, but never hurts to understand it anyway.
Kinda gives us a small glimpse of DX11 compute shaders.

Page 41: A Bizarre Way to do Real-Time Lighting

Step #1: Look At The Data

[Diagram labels: Lights, xform]

Page 42: A Bizarre Way to do Real-Time Lighting

Step #1: Look At The Data

[Diagram labels: Lights, xform]

Page 43: A Bizarre Way to do Real-Time Lighting

Step #2: Parallelism

Stream processing highly suited to parallelisation and we have 6 x SPUs.

The obvious question arises:
  What size should a unit of work be?

Answer: Look at the data again!

Page 44: A Bizarre Way to do Real-Time Lighting

Step #3: Look At The Data

Fun fact: Frame buffers are not usually linear!
  Many reasons for this (think filtering and RSX™ quads).
Our unit size is closely tied to the internal format of the frame buffer produced by the RSX™.
Not going to get into the exact formats here, it’s dull and it’s all in the Sony SDK Docs – RTFM!
Recommend PhyreEngine for good reference examples.

Page 45: A Bizarre Way to do Real-Time Lighting

Step #4: Arbitrating Work

Synchronisation points are fail. Keep to an absolute minimum.
Solution: Atomics are your friend!
  Target hardware has an ATO. Use it, <3 it...
Move through data in tiles, tile dictated by an index – DMA into the local store for processing.
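The shared-index scheme can be mimicked on any multi-threaded machine. In this Python sketch a `threading.Lock` stands in for the hardware atomic unit, and `TileIndex` and `run_workers` are hypothetical names; it shows only the arbitration pattern, not the SPU code.

```python
import threading

class TileIndex:
    """A shared counter that hands each tile out exactly once."""

    def __init__(self, tile_count):
        self._next = 0
        self._count = tile_count
        self._lock = threading.Lock()  # stand-in for an atomic fetch-and-increment

    def claim(self):
        """Return the next unprocessed tile index, or None when all are taken."""
        with self._lock:
            if self._next >= self._count:
                return None
            tile = self._next
            self._next += 1
            return tile

def run_workers(tile_count, worker_count=6):
    """Let several workers pull tiles until none remain; return who got what."""
    index = TileIndex(tile_count)
    claimed = [[] for _ in range(worker_count)]

    def worker(slot):
        while True:
            tile = index.claim()
            if tile is None:
                break
            claimed[slot].append(tile)  # real code would fetch and light the tile here

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

No worker ever waits on another: a slow tile on one worker simply means the others claim more of the remaining indices.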

Page 46: A Bizarre Way to do Real-Time Lighting

Step #4: Arbitrating Work

[Diagram: six SPUs all reading one shared Index.]

Page 47: A Bizarre Way to do Real-Time Lighting

Step #5: Multi-Buffering

Move data and process data at the same time.
Costs local store, but usually worth it.
Different tag group for each buffer.

Page 48: A Bizarre Way to do Real-Time Lighting

Step #5: Multi-Buffering

We used triple-buffering, since we’re decoding the normal/depth buffer.

[Diagram labels: Normal/Depth Buffer (Main), Lighting Buffer (Main), MFC, SXU.]
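A rough Python sketch of what a three-buffer rotation can look like: while one tile is being lit, the next is arriving and the previous result is leaving. The exact roles of the three buffers in the original implementation are not spelled out in the slides, so this in/light/out split, and the names `_slot` and `triple_buffer_schedule`, are assumptions.

```python
def _slot(tile, tile_count, buffers):
    """Return (tile, buffer index) or None if the tile index is out of range."""
    if 0 <= tile < tile_count:
        return (tile, tile % buffers)
    return None

def triple_buffer_schedule(tile_count, buffers=3):
    """List, per step, which tile is transferred in, lit, and transferred out."""
    steps = []
    for step in range(tile_count + 2):
        steps.append({
            "dma_in": _slot(step, tile_count, buffers),       # next tile arriving
            "light": _slot(step - 1, tile_count, buffers),    # tile being processed
            "dma_out": _slot(step - 2, tile_count, buffers),  # finished tile leaving
        })
    return steps
```

At every step the three activities touch three different buffers, which is what allows the transfers and the processing to overlap.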

Page 49: A Bizarre Way to do Real-Time Lighting

Step #6: Lighting (SOA)

SOA is basically a transpose of the obvious layout:

[Diagram: four X Y Z W vectors stored one per qword, transposed into one qword of X X X X, one of Y Y Y Y, one of Z Z Z Z and one of W W W W.]

1 x square length (~18 cycles):

    qword dot_xx = si_fm(v, v);
    qword dot_xx_r4 = si_rotqbyi(dot_xx, 4);
    dot_xx = si_fa(dot_xx, dot_xx_r4);
    qword dot_xx_r8 = si_rotqbyi(dot_xx, 8);
    dot_xx = si_fa(dot_xx, dot_xx_r8);
    return si_to_float(dot_xx);

Vs.

4 x square lengths (~12 cycles):

    qword dot_x = si_fm(x, x);
    qword dot_y = si_fma(y, y, dot_x);
    qword dot_z = si_fma(z, z, dot_y);
    return dot_z;
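The layout change itself is easy to show without SIMD. In this Python sketch, lists stand in for qwords and the function names are hypothetical; the point is that the SOA version performs the same three multiply-adds for four vectors at once, with no cross-lane shuffling.

```python
def to_soa(vectors):
    """Transpose [(x, y, z, w), ...] into separate x, y, z and w lists."""
    xs, ys, zs, ws = (list(component) for component in zip(*vectors))
    return xs, ys, zs, ws

def square_length_aos(v):
    """One squared length from one (x, y, z, w) vector."""
    x, y, z, _w = v
    return x * x + y * y + z * z

def square_lengths_soa(xs, ys, zs):
    """Four squared lengths at once: each line mirrors one vector instruction."""
    acc = [x * x for x in xs]                    # multiply
    acc = [y * y + a for y, a in zip(ys, acc)]   # multiply-add
    acc = [z * z + a for z, a in zip(zs, acc)]   # multiply-add
    return acc
```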

Page 50: A Bizarre Way to do Real-Time Lighting

Step #6: Lighting (SOA)

Pre-transpose lighting data, splat values across entire qword.
  16 byte aligned, single lqd.

    struct light
    {
        float m_x[4];              // 4 copies of world-space X, in each element of the array
        float m_y[4];
        float m_z[4];
        float m_inv_radius_sq[4];  // Never actually used radius, pre-compute (1/radius)^2
        float m_colour_r[4];
        float m_colour_g[4];
        float m_colour_b[4];
    };
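To illustrate why splatting and a pre-computed inverse squared radius are convenient, here is a Python sketch that builds such a record and evaluates a distance falloff for four pixels at once. The falloff formula `1 - d²/r²` is an assumption chosen for illustration (the slides do not give the attenuation function), and `splat`, `make_light` and `attenuation4` are hypothetical names.

```python
def splat(value, lanes=4):
    """Copy one scalar into every lane, as the struct above does."""
    return [float(value)] * lanes

def make_light(position, radius, colour):
    """Build a splatted light record with the same field names as the struct."""
    x, y, z = position
    r, g, b = colour
    return {
        "m_x": splat(x), "m_y": splat(y), "m_z": splat(z),
        "m_inv_radius_sq": splat(1.0 / (radius * radius)),
        "m_colour_r": splat(r), "m_colour_g": splat(g), "m_colour_b": splat(b),
    }

def attenuation4(light, px, py, pz):
    """Falloff for four pixel positions held in SOA form (assumed 1 - d^2 / r^2)."""
    result = []
    for lane in range(4):
        dx = light["m_x"][lane] - px[lane]
        dy = light["m_y"][lane] - py[lane]
        dz = light["m_z"][lane] - pz[lane]
        dist_sq = dx * dx + dy * dy + dz * dz
        # No square root and no divide: only the pre-computed 1 / r^2 is needed.
        result.append(max(0.0, 1.0 - dist_sq * light["m_inv_radius_sq"][lane]))
    return result
```

Because every field already holds four identical lanes, it can be combined directly with four pixels' worth of SOA data without any per-pixel broadcast.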

Page 51: A Bizarre Way to do Real-Time Lighting

Step #6: Lighting (Batch I)

qword everywhere. Batch reads and writes into 16 byte chunks.
  Read 4 pixels from normal/depth.
  Write 4 pixels to lighting buffer.

    qword clmp0 = si_cfltu(diffuse0, 0x20);
    qword clmp1 = si_cfltu(diffuse1, 0x20);
    qword clmp2 = si_cfltu(diffuse2, 0x20);
    qword clmp3 = si_cfltu(diffuse3, 0x20);
    qword r = si_ila(0x8000);
    qword scl = si_ilh(0xff00);
    dif0 = si_mpyhhau(clmp0, scl, r);
    dif1 = si_mpyhhau(clmp1, scl, r);
    dif2 = si_mpyhhau(clmp2, scl, r);
    dif3 = si_mpyhhau(clmp3, scl, r);
    const vector unsigned char _shuf_uint = {
        0xc0, 0x00, 0x04, 0x08, 0xc0, 0x10, 0x14, 0x18,
        0xc0, 0x00, 0x04, 0x08, 0xc0, 0x10, 0x14, 0x18 };
    qword shuf_ = (const qword)_shuf_uint;
    qword base_add = si_from_ptr(pResult);
    qword p0_1 = si_shufb(dif0, dif1, shuf_);
    qword p0_2 = si_shufb(dif2, dif3, shuf_);
    qword pix0 = si_selb(p0_1, p0_2, m_00ff);
    si_stqd(pix0, base_add, 0x0);

    qword depth_addr = si_from_ptr(depth_buf);
    qword depth0 = si_lqd(depth_addr, 0x00);
    qword depth1 = si_lqd(depth_addr, 0x10);
    qword depth2 = si_lqd(depth_addr, 0x20);
    qword depth3 = si_lqd(depth_addr, 0x30);
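The write side of the batch amounts to clamping four float colours, converting them to 8 bits per channel and emitting them as a single 16-byte store. The Python sketch below shows that data transformation only; it does not reproduce the fixed-point arithmetic of the intrinsics, and the byte order, the constant first byte and the names `to_byte` and `pack_four_pixels` are assumptions.

```python
def to_byte(value):
    """Clamp to [0, 1] and convert to an 8-bit integer, rounding to nearest."""
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)

def pack_four_pixels(pixels, pad=0xFF):
    """Pack four (r, g, b) float triples into one 16-byte chunk.

    Each pixel becomes 4 bytes: an assumed constant pad byte, then r, g, b.
    """
    if len(pixels) != 4:
        raise ValueError("expected exactly four pixels")
    out = bytearray()
    for r, g, b in pixels:
        out += bytes((pad, to_byte(r), to_byte(g), to_byte(b)))
    return bytes(out)
```

Sixteen bytes is exactly one quadword, so four lit pixels leave in one store instead of four partial writes.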

Page 52: A Bizarre Way to do Real-Time Lighting

Step #6: Lighting (Balance)

Lighting SPU program performance limited by number of instructions issued.
Pipeline balance is vital!
SPU dual issues if:
  Correctly aligned within single fetch group.
  No dependencies.
  Instructions are for correct pipelines.
Luckily, compiler maintained balance quite well with nop/lnop insertion and some instruction re-ordering.
Lighting larger batches helps out balance at the cost of register file usage.
  Mileage may vary here again, how badly are you hammering the even pipe?

Page 53: A Bizarre Way to do Real-Time Lighting

Step #6: Lighting (Batch II)

Fixed setup cost for a single line of our sub-tile size (32 pixels wide).
  Unfortunately, too many to process at once despite SPU’s massive register file.
  Loop is pipelined and lots of live variables to multiplex onto register file.
Settled for 16 pixels, no spilling.
Note: First attempt worked on 4 pixel batches like RSX™.
  Lots of wasted cycles in inner loop – less dual issue.

  Batch size    Result
  32 pixels     Register spilling...
  4 pixels      Wasted cycles and increased setup overhead
  16 pixels     Happy medium

Page 54: A Bizarre Way to do Real-Time Lighting

Step #7: Culling

Culling works on more granular sub-tiles.
  Allows us to potentially reject more tiles (of course, YMMV).
  (Note: diagram below is an example, it’s not our actual sub-tile size).
Similar to GPU, basically a tile is culled if...
  Max and min depth are both at the far clip.
  No lights intersect the frustum constructed for the tile.

[Diagram: screen tiles with one Sub-tile highlighted.]
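Both rejection tests can be written as a small early-out function. The Python sketch below swaps the tile frustum for a view-space bounding box to keep the intersection test short, which is a deliberate simplification rather than the method from the talk; the function names and the `(x, y, z, radius)` light layout are likewise hypothetical.

```python
def sphere_hits_box(centre, radius, box_min, box_max):
    """True if a sphere touches the axis-aligned box [box_min, box_max]."""
    dist_sq = 0.0
    for c, lo, hi in zip(centre, box_min, box_max):
        nearest = min(max(c, lo), hi)
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius * radius

def sub_tile_is_culled(min_depth, max_depth, far_clip, box_min, box_max, lights):
    """Reject a sub-tile that is all sky or that no light can reach."""
    if min_depth >= far_clip and max_depth >= far_clip:
        return True  # every pixel is at the far clip
    for x, y, z, radius in lights:
        if sphere_hits_box((x, y, z), radius, box_min, box_max):
            return False  # at least one light reaches the sub-tile
    return True
```

The whole decision happens once per sub-tile, so the branch cost is paid outside the per-pixel lighting loop.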

Page 55: A Bizarre Way to do Real-Time Lighting

Step #7: Culling

Remember, SPUs can execute general purpose code.
  Take advantage of high-level constructs where they are suitable – this means branches, early-outs, etc.
    ○ Note: Branches generally suck. Not suitable in lighting inner-loop, discard an entire sub-tile at once.

Page 56: A Bizarre Way to do Real-Time Lighting

Step #8: Synchronisation

Custom SPURS policy module made RSX™ initiated jobs easy.
Our jobs can optionally depend on a 128 byte line written by RSX™ (or PPU, whatever).
  Non-blocking: Freedom to run other scheduler tasks while waiting.
    ○ Really should investigate using SPE’s mailboxes to stop us from hammering the bus.
  Physics team happy again!
  Not pre-emptive.

Page 57: A Bizarre Way to do Real-Time Lighting

Step #8: Synchronisation

Can be painful! Expect hard to find bugs here.
  We had a couple, *ahem* both were other Steve’s fault ;-)
Worth it in the end though!
Keep an eye on overall timings.
  Originally lighting pushed out physics.
  Very easy to forget the bigger picture.
  Impossible to predict up front.

Page 58: A Bizarre Way to do Real-Time Lighting

Step #9: Slotting it in...

[Diagram: frame timeline with a GPU row and SPU rows. Labels: Pre-Pass, Mirror Reflection, Main Scene, Lighting #1, Lighting #2, Lighting #3, Physics, Car Damage, Audio, Command Buffer, Scene Graph.]

Page 59: A Bizarre Way to do Real-Time Lighting

Step #9: Slotting it in...

Ended up running the lighting on 3 SPUs, still easily within our timeframe and no longer pushed the physics out.

Page 60: A Bizarre Way to do Real-Time Lighting

Step #10: Profit!

SPU implementation faster than RSX™ even without parallelism. (~2-3ms on 3 SPUs).
Overall frame latency reduced by up to 25%!
More benefits:
  Blending in alternative colour space becomes trivial.
  Add value by outputting other useful stuff from SPU program – down-sampled Z buffer anyone?
Lighting becomes free*.

* - In the strictest computer science sense of the word ;-).

Page 61: A Bizarre Way to do Real-Time Lighting

The Future...

MSAA -- Big challenge, but solvable...
Experiment with different colour spaces?
Remove de-coding step...
  Upsets my OCD as not really needed for the data transformation –
  But also allows us to overlap input and output buffers.
Specular.
Better normals:
  Ideally higher precision for use in main pass.
  Fix positive z-component sign assumption.
    ○ Stereographic Projection
    ○ Lambert Azimuthal Equal-area Projection et al.

Page 62: A Bizarre Way to do Real-Time Lighting

References

[Engel08] W. Engel, “Light Pre-Pass Renderer”, http://diaryofagraphicsprogrammer.blogspot.com/2008/03/light-pre-pass-renderer.html, accessed on 4th July 2009.
[Balestra08] C. Balestra and P. Engstad, “The Technology of Uncharted: Drake’s Fortune”, GDC2008.
[Swoboda09] M. Swoboda, “Deferred Lighting and Post Processing on PLAYSTATION®3”, GDC2009.

Page 63: A Bizarre Way to do Real-Time Lighting

Special Thanks!

Matt Swoboda and Colin Hughes (SCE R&D)

and

The Bizarre Creations Core Tech Team

Page 64: A Bizarre Way to do Real-Time Lighting

Shameless Plug

Steve and I contributed to this book... It’s out March 2010, you should buy it for your desk, studio library, etc.

http://gpupro.blogspot.com

Page 65: A Bizarre Way to do Real-Time Lighting

Thanks for Listening! Questions?

Check out www.blurgame.com

