
Depth of Field and Motion Blur in Realtime Computer Graphics

Advanced Seminar by Prof. Dr. Alfred Nischwitz

University of Applied Sciences Munich


Thomas Post

Summer Term 2013

Abstract

Recent video games increasingly try to produce a more cinematic atmosphere. An important part of cinematography is depth of field and motion blur. With depth of field a director can focus a viewer's attention on a certain region of the scene; it can be seen as an artistic tool for motion pictures. To bring video games closer to motion pictures, this artistic tool is also widely used in recent video games. Although motion blur is more of a by-product of the physical limitations of photo and video cameras, it can also be used as an artistic tool to enable a better visual perception of movement and actions in motion pictures and video games.

This report takes a look at some state-of-the-art techniques for motion blur and depth of field rendering on modern graphics pipelines in realtime. Depth of field is covered with an older approach that is often used, extended with a fast aperture bokeh effect. Motion blur is covered with a rather new approach that can also be used for depth of field. This approach is intended for future generations of graphics hardware and is not production ready today.

Contents

1. Introduction
   1.1. Overview
   1.2. Research Papers
   1.3. Document Structure

2. Depth of Field
   2.1. Outline
   2.2. Basics of Camera Lenses
   2.3. Bokeh
   2.4. Related Work
   2.5. Screen-Space Depth of Field
   2.6. Screen-Space Bokeh
      2.6.1. Separable Filters
      2.6.2. Combining Filters
      2.6.3. Artefacts
      2.6.4. Implementation
      2.6.5. Performance & Results

3. Motion Blur
   3.1. Outline
   3.2. Shutter and Image Sensor
   3.3. Camera and Object Motion Blur
   3.4. Related Work
   3.5. Multi Sample Anti Aliasing
   3.6. Decoupled Sampling
      3.6.1. Implementation
      3.6.2. Transparency
      3.6.3. Performance and Results
   3.7. Conclusion

A. Code Listings

B. Data CD

C. Erklärung


1. Introduction

This work emerged from an advanced seminar with topics from computer graphics and digital image processing. It is part of the computer science master's degree course at the Munich University of Applied Sciences. The seminar was held by Prof. Dr. Alfred Nischwitz and focused on the current state of realtime computer graphics.

This work covers the topics of Depth of Field and Motion Blur. Two state-of-the-art research papers are evaluated and discussed. Where it suits the evaluation or discussion, implementations are either made or reviewed. The first paper is based on a widely used approach and extends it to add a bokeh effect to the depth of field. The second paper introduces a new approach on how to render depth of field and motion blur.

1.1. Overview

In real-time computer graphics a pin-hole camera model is used. While this model is mathematically ideal, in the real world no camera is a perfect pin-hole: neither the aperture nor the shutter speed can be infinitely small or short, and there are lenses involved that bring distortion and refraction with them. Since we are used to the effects produced by these non-ideal cameras, we want them in computer graphics as well. They produce a cinematographic look which appears more realistic, and they are used as an artistic tool. In motion pictures DOF is used to attract the viewer's attention to a certain region in an image or video. The same can be done in video games, e.g. in cut scenes.

The work here focuses on Depth of Field (DOF), Motion Blur and Bokeh, which in itself is an addition to DOF. DOF is caused by the curvature of the lens and the aperture, which is not infinitely small. Motion blur occurs because the shutter speed is not infinitely short, so images start looking blurry on fast movements of the camera or of parts of the scene.

1.2. Research Papers

The Motion Blur Rendering: State of the Art [Navarro et al., 2011] survey is a good starting point for motion blur rendering. All kinds of techniques are reviewed and compared there. According to this paper, one of the most recent and most promising motion blur techniques is Decoupled Sampling for Graphics Pipelines [Ragan-Kelley et al., 2011]. It introduces a new approach to decouple shading from visibility sampling. Despite the fact that this is a very new paper, there is already some research that takes this approach further. Since deferred shading¹ [Hargreaves, 2004] is becoming more and more popular in recent game engines, combining it with decoupled sampling seems popular, as seen in [Liktor and Dachsbacher, 2012] and [Petrik et al., 2013].

¹ "Deferred shading is a screen-space shading technique. It is called deferred because no shading is actually performed in the first pass of the vertex and pixel shaders: instead shading is deferred until a second pass." [Wikipedia, 2013b]


The latest research in stylized depth of field is Efficiently Simulating the Bokeh of Polygonal Apertures in a Post-Process Depth of Field Shader [McIntosh et al., 2012]. Here a rather simple and fast screen-space approach is taken to simulate polygonal apertures and the so-called bokeh effect. It is based on a very popular technique for DOF rendering first introduced by [Scheuermann and Tatarchuk, 2004] and [Riguer et al., 2004].

1.3. Document Structure

This introduction gave a general overview of the covered topics and the evaluated research papers. The following chapters cover these research papers in more detail. Chapter 2 Depth of Field covers the Efficiently Simulating the Bokeh of Polygonal Apertures in a Post-Process Depth of Field Shader paper. First a look is taken at how DOF works with a conventional photo or video camera and how bokeh comes into existence. The chapter then introduces the general problems of the screen-space approach and how they can be solved. An implementation of the paper was made and its results are presented.

The next chapter, 3 Motion Blur, covers the Decoupled Sampling for Graphics Pipelines paper. This is a relatively new approach to simulate stochastic rendering of motion blur and DOF. It is based on the idea of Multi Sample Anti Aliasing (MSAA). In addition, a more detailed look at two suggested implementations from [Liktor and Dachsbacher, 2012] is taken; they show a more practical implementation and extend decoupled sampling with deferred shading. Results from both [Ragan-Kelley et al., 2011] and [Liktor and Dachsbacher, 2012] are compared and evaluated.

Finally, the reviewed research is summed up and assessed with regard to the progress in these fields of research.


2. Depth of Field

2.1. Outline

This chapter covers depth of field rendering with bokeh in a screen-space technique, based on the research in [McIntosh et al., 2012].

First we take a look at the physical model of cameras and lenses and show how depth of field and bokeh come into existence. Sections 2.2 Basics of Camera Lenses and 2.3 Bokeh cover these physical aspects.

Section 2.4 Related Work takes a look at the work related to [McIntosh et al., 2012]. From this, the idea of screen-space rendering is demonstrated and how DOF is achieved with this technique. Section 2.5 Screen-Space Depth of Field discusses the implementation and shows the problems and inaccuracies compared to the physical model.

In the last section, 2.6 Screen-Space Bokeh, the bokeh technique of [McIntosh et al., 2012] is reviewed. A complete implementation was achieved with the rendering engine Irrlicht. A closer look is taken at this implementation, and the problems of this technique are exposed.

2.2. Basics of Camera Lenses

When a real camera with a lens takes a picture, the light rays are refracted by the lens and, depending on the distance between the lens and the object, they end up at different locations on the image plane. Figure 2.1 shows that objects at different distances (D_N, S, D_F) end up at different locations on the image plane (V_F, V, V_N). The blue object becomes sharp on the image plane, but the red and green objects become blurred, either because they are too far away or too close. The size of the aperture also changes c, which determines how intensely the object is blurred.

Since realtime rendering pipelines use a pinhole camera model, they do not have a lens that would generate DOF. This effect therefore has to be applied additionally. In non-realtime rendering with ray-tracing, a lens can be simulated relatively easily: multiple rays from slightly different positions can simply be shot through a focal point. For more on this see [Cook et al., 1984].

2.3. Bokeh

Figure 2.2.: Bokeh Photography (Rushing Mania / CC-BY-NC-2.0)

The word bokeh originates from the Japanese 暈け and denotes blur, or the aesthetic quality of the blur, in out-of-focus areas of an image [Wikipedia, 2013a]. Its origin lies in the shape of the aperture and lens.

Figure 2.1.: DOF with a Lens (Jeff Conrad / CC-BY-SA-3.0)

This means that the size and geometry of the aperture d in figure 2.1 affect how the various out-of-focus areas (V_F, V_N) look. In figure 2.2 a seven-edged polygon shape is used as an aperture. This results in seven-edged shaped fragments in the blurred areas.

2.4. Related Work

In [McIntosh et al., 2012] five different categories of depth of field techniques are listed:

• Distributing traced rays across the surface of a lens. This technique delivers the best quality, but it is meant for ray-tracing and is not very suitable for modern graphics pipelines and real-time computer graphics.

• Rendering the scene from multiple cameras (accumulation buffer). This also delivers qualitatively good results but is too slow for real-time computer graphics.

• Rendering and compositing multiple layers. This technique is becoming more and more popular and is possible to do in real time.

• Forward-mapped z-buffer. The influence of each pixel on other pixels is computed. This is not very suitable for modern graphics pipelines.

• Reverse-mapped z-buffer. The idea is the same as in the forward-mapped z-buffer, but for any one pixel all pixels influencing this pixel are considered. This is a lot more suitable for modern graphics pipelines; therefore this approach is chosen here.

A standard screen-space depth of field approach, as described in [Riguer et al., 2004] and [Scheuermann and Tatarchuk, 2004], is used. These are conventional reverse-mapped z-buffer techniques. They do not consider bokeh; they simply blur depending on the depth value in the z-buffer. For the blur a separable low-pass filter is used. This is a very standard approach that is widely used in modern computer graphics for all kinds of blurs and other effects. Often Gaussian or mean filters are used.

2.5. Screen-Space Depth of Field

Screen space: in screen-space techniques, usually the entire scene is rendered into a texture instead of rendering it directly to the screen. In a second pass some algorithm is applied to this texture. Together with additional information, such as a depth map or a normal map, a wide variety of effects can be applied. The advantage of this approach is that it is completely independent of the scene's complexity. It is called screen space since it operates in the 2D image or screen space, not in 3D space.

Circle of confusion: the reverse-mapped z-buffer technique uses a variable filter size that is related to the depth value of each pixel. To determine the filter size, the circle of confusion (CoC) formula 2.1 is used. The result of this formula is stored in the alpha channel of the image, so no extra render pass for the CoC map is needed.

c = A \cdot \frac{|S_2 - S_1|}{S_2} \cdot \frac{f}{S_1 - f}    (2.1)

c: circle of confusion size
A: the aperture's diameter
f: focal length
S1: focal distance
S2: the pixel's depth value from the z-buffer

This formula is derived from the well-known thin-lens model.
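As an illustration, a minimal GLSL sketch of such a CoC pass could look like the following (all uniform names are assumptions for this sketch, not taken from the paper):

#version 330 core
// CoC pass sketch: evaluate formula 2.1 per pixel and store the result
// in the alpha channel of the scene color.
uniform sampler2D uScene;       // rendered scene color
uniform sampler2D uDepth;       // linearized depth (view-space distance)
uniform float uAperture;        // A: the aperture's diameter
uniform float uFocalLength;     // f: focal length
uniform float uFocalDistance;   // S1: focal distance
in vec2 vTexCoord;
out vec4 fragColor;

void main() {
    float S2 = texture(uDepth, vTexCoord).r; // the pixel's depth value
    float coc = uAperture * abs(S2 - uFocalDistance) / S2
              * uFocalLength / (uFocalDistance - uFocalLength);
    fragColor = vec4(texture(uScene, vTexCoord).rgb, coc);
}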

Figure 2.3 shows a rendering with this simple depth of field approach. In the upper left, the red square shows the depth map of the rendered scene. This method, however, comes with a few downsides that are addressed only partially here:

Figure 2.3.: Simple Depth of Field Rendering

Intensity leakage or pixel bleeding: focused objects leak onto blurred objects in the background. Since an object in the background has a large CoC, its filter also includes the object in the foreground. [Riguer et al., 2004] and [Scheuermann and Tatarchuk, 2004] address this problem by weighting all samples according to their depth value in comparison to the average depth.

Lack of partial occlusion: on a photo taken by a camera, blurry objects in the foreground spread out smoothly over focused areas at their edges. In post-processing depth of field techniques, blurry foreground objects stay opaque. Figure 2.4 shows an example of this problem: a pillar in front covers a wall that is in focus in the background, and the edges of the pillar are opaque.

Figure 2.4.: Lack of Partial Occlusion and Sharp Silhouettes

In [McIntosh et al., 2012] this problem is unsolved, and it remains unsolved in the implementation for this seminar. New research [Schedl and Wimmer, 2012], however, applies a layered approach to screen-space DOF to solve this problem.

Sharp silhouettes: another artifact that can be seen in figure 2.4. The edges are not only opaque, they are also sharp. This problem can be reduced by filtering the CoC map, so that the differences between the CoC sizes of neighboring pixels in different focal areas are not as large anymore. Details about this solution can be found in [Hammon, 2007].

No realistic bokeh: since the shape of the filter kernel is normally a square in these approaches, there is no aperture bokeh. This is where [McIntosh et al., 2012] comes into action: it adds a bokeh effect by adjusting the shape of the filter kernel.

2.6. Screen-Space Bokeh

A naive approach to achieve bokeh would be to use a kernel that is shaped like the desired aperture. This would result in good bokeh and has the advantage that every desired shape is possible. However, its performance would be very poor: for a polygonal kernel like in figure 2.5, up to 120 samples would be required. Since current low-end graphics hardware is not even able to perform 32 texel fetches per fragment, this solution is not practicable.

Figure 2.5.: Naive Filter

This is where [McIntosh et al., 2012] comes up with a separable approach for polygonal shaped filters. It begins with a modified 'box-blur' filter, which can simulate the effect of any parallelogram-shaped aperture. By combining two or more of these parallelogram-shaped filters, many kinds of shapes can be generated.


2.6.1. Separable Filters

A fixed number of samples is used. These samples are spread according to the CoC of the currently processed pixel when filtering the image. In a second pass the same samples are used to filter the output of the first pass; this time, however, the samples are rotated by some angle. With this, any parallelogram-shaped filter can be generated. Figure 2.6 shows a filter kernel that is rotated by π/4.

Figure 2.6.: Parallelogram-Shaped Filter

2.6.2. Combining Filters

To achieve a more complex shape, multiple images filtered with different shapes can be combined with boolean operations. This technique is known from constructive solid geometry. Figure 2.7 shows two examples of such combinations.

Figure 2.7.: Combining Simple Shapes to Generate More Complex Shapes (intersection and union of two parallelogram filters)

The problem, however, is that we only have pre-filtered images. In these images every pixel has its own polygon that overlaps the polygons of all surrounding pixels.

Therefore a proper boolean intersection or union is not possible. However, the Min(x, y) and Max(x, y) functions accomplish a similar effect. The Min(x, y) function returns the less bright of x and y, so a boolean intersection that preserves only pixels that are bright in both images can be achieved. The boolean union can accordingly be approximated with the Max(x, y) function, which returns the brighter pixel of x and y. [McIntosh et al., 2012] suggests using Max(x, y) over Min(x, y), since the results show that it produces fewer artifacts.
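The combination pass itself is then a one-liner per pixel; a minimal sketch (texture names are assumptions):

#version 330 core
// Combine two pre-filtered images: max() approximates the boolean union
// of the bright regions, min() the intersection.
uniform sampler2D uFilterA; // result of the first parallelogram filter
uniform sampler2D uFilterB; // result of the second parallelogram filter
in vec2 vTexCoord;
out vec4 fragColor;

void main() {
    fragColor = max(texture(uFilterA, vTexCoord),
                    texture(uFilterB, vTexCoord));
}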

2.6.3. Artefacts

This technique produces good results, but there are still some flaws. If the CoCs of differently bright points start to overlap, bright defocused highlights with the wrong shape appear. The Min(x, y) function also decreases the image's intensity in the defocused areas. The Max(x, y) function, on the other hand, causes problems on overlapping defocused highlights if they do not overlap in the input images but should overlap in the combined image. Max(x, y) also slightly increases the brightness of the image.

The major artifacts appear due to the sampling of the filter kernel. With a fixed number of samples, the larger the filters are, the more artifacts arise. With more samples on larger filters this could be reduced while still maintaining good performance on small filters.

2.6.4. Implementation

The implementation developed for this seminar is based on the rendering engine Irrlicht. Irrlicht is a platform-independent engine written in C++ and has all the required features to implement this technique.


Figure 2.8.: Bokeh Rendering: (a) result of a default quadratic filter, (b) result of the first filter pass (angle −π/4), (c) result of the second filter pass (angle π/4), (d) result of the Max(x, y) function, (e) result of the Min(x, y) function


The prototype is organized in four steps:

1. Render the scene into a texture. Calculate the CoC for every fragment and store it in the alpha channel.

2. Apply the first separable filter in two render passes with the given rotation angles.

3. Apply the second separable filter in two render passes with the given rotation angles. If the first pass of the previous step uses the same filter, its output can be reused for this step.

4. Finally combine the results from steps 2 and 3 with the Min(x, y) or Max(x, y) function.

Listing A.2 in Appendix A shows the fragment shader used for steps 2 and 3. It is a simple implementation with an average filter.
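Since the appendix is not reproduced in this section, the following is only a hedged sketch of what such a separable average-filter pass could look like (the sample count and all uniform names are assumptions):

#version 330 core
// One separable filter pass: average a fixed number of samples spread
// over the pixel's CoC along a rotated direction, e.g. (cos a, sin a).
#define NUM_SAMPLES 8
uniform sampler2D uImage;   // input image with the CoC in the alpha channel
uniform vec2 uDirection;    // unit sampling direction of this pass
uniform vec2 uPixelSize;    // 1.0 / viewport size
in vec2 vTexCoord;
out vec4 fragColor;

void main() {
    float coc = texture(uImage, vTexCoord).a; // filter size from the CoC map
    vec4 sum = vec4(0.0);
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        float t = float(i) / float(NUM_SAMPLES - 1) - 0.5;
        sum += texture(uImage, vTexCoord + uDirection * (t * coc) * uPixelSize);
    }
    fragColor = sum / float(NUM_SAMPLES); // simple average filter
}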

A look at the implementation also makes it obvious that the entire process is performed on the GPU. Only mapping render targets and textures is initiated by the CPU; everything else takes place in three shaders: the CoC shader, the filter shader and the final combination shader. In the actual prototype an additional render pass is done to use the built-in shaders for rendering the light-mapped scene. However, this is just an Irrlicht implementation detail and should not be of any concern here.

2.6.5. Performance & Results

In [McIntosh et al., 2012] the performance of the separable approach is first compared to the naive approach. One abnormality there is that the naive approach with 256 and more samples does not compile anymore on a GeForce 8600 GT 256MB video card. In the implementation developed for this seminar, a test on a low-end Intel HD3000 graphics card allowed for no more than 8 samples per pass.

For the naive approach, the performance is estimated by the number of samples taken. For the separable approach, the size of the equivalent naive filter can be estimated as the number of samples per pass squared. Since the images are also combined afterwards this is not totally accurate, but it is enough for an approximation.

For the computational cost of a naive filter with 64 samples, a 16 × 16 filter with a combination of two images can be achieved instead; to obtain the same filter with a naive kernel, more than 256 samples would be needed. One can clearly see that the number of samples taken can be dramatically reduced by this separable approach.
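As a worked instance of this estimate, using the numbers above:

2 \text{ filters} \times 2 \text{ passes} \times 16 \text{ samples} = 64 \text{ samples (separable)} \qquad \text{vs.} \qquad 16 \times 16 = 256 \text{ samples (naive)}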

We see that this separable approach is much faster than a naive approach while still delivering acceptable results. A comparison with other DOF techniques is missing from [McIntosh et al., 2012] but is suggested in the future work section. Using this technique as a fallback for lower-end hardware is also proposed, if one wants to use a better but also more computationally intensive bokeh effect.


3. Motion Blur

This chapter covers motion blur and DOF rendering with Decoupled Sampling for Graphics Pipelines [Ragan-Kelley et al., 2011]. It is inspired by the idea of Multi Sample Anti Aliasing (MSAA) and RenderMan's Reyes architecture [Cook et al., 1987]. The paper covers motion blur and DOF, but the focus in this chapter is on motion blur, since the general part of DOF is already covered in the previous chapter.

3.1. Outline

The first part of this chapter covers motion blur in general. It shows how motion blur is generated in conventional video cameras and what different kinds of motion blur can occur. Sections 3.2 Shutter and Image Sensor and 3.3 Camera and Object Motion Blur cover this.

In section 3.4 Related Work a look is taken at the work upon which [Ragan-Kelley et al., 2011] is based. Some work that further extends [Ragan-Kelley et al., 2011] is also covered there. The details of the decoupled sampling approach are then covered in section 3.6 Decoupled Sampling.

The final section 3.7 Conclusion is a closing part that sums up the work.

3.2. Shutter and Image Sensor

When a camera takes an image, the image sensor has to be exposed to the light; therefore the shutter is opened for some time. If something in the scene changes during this time, multiple photo diodes are exposed by the same object and the object appears blurry on the final image. In order to have an image without any motion blur, the exposure time would have to be infinitely short. This is the case in a modern graphics pipeline but can never be achieved with a real-world camera. To achieve a realistic looking rendering, this effect therefore has to be simulated on a modern graphics pipeline.

3.3. Camera and Object Motion Blur

Motion blur is often split into two categories: Camera Motion Blur (CMB) and Object Motion Blur (OMB). CMB originates from the movement of the camera during the exposure, and OMB originates from objects in the scene moving during the exposure.

In recent video games CMB is often approximated with a very simple technique that delivers appealing results. It is very similar to the DOF technique used in the previous chapter 2 Depth of Field. The research of Valve in [Vlachos, 2008] and Crytek in [Sousa, 2008] covers these approaches for CMB in detail.

For OMB a lot more work has to be done, since it only applies to some objects in the scene. For most techniques, additional motion information is stored or computed for moving objects. These objects are then blurred according to their motion information. Decoupled sampling can produce accurate results for both kinds of motion blur. It also takes into account that when an object is exposed to multiple pixels it does not have to be shaded multiple times.

Let us use the example of a rocket flying through a scene in front of the eyes of a user or game player. Motion vectors for this rocket have to be calculated, and the rocket is then blurred and/or deformed according to these vectors. To simulate CMB with this technique, motion vectors have to be computed for every object in the scene.
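As an illustration of such motion information, a minimal GLSL sketch of a velocity pass could look like this, assuming the vertex shader passes the clip-space position under both this frame's and the previous frame's transform (all names are assumptions):

#version 330 core
// Velocity pass sketch: write per-fragment screen-space motion vectors.
in vec4 vCurrPos;       // clip-space position with this frame's MVP
in vec4 vPrevPos;       // clip-space position with last frame's MVP
out vec2 fragVelocity;  // rendered into a velocity buffer

void main() {
    vec2 curr = vCurrPos.xy / vCurrPos.w; // perspective divide to NDC
    vec2 prev = vPrevPos.xy / vPrevPos.w;
    // screen-space motion of this fragment during the frame
    fragVelocity = 0.5 * (curr - prev);
}

A blur pass can then stretch each moving object along its stored vector.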

3.4. Related Work

To render high-quality antialiasing, motion and defocus blur, an accumulation buffer [Haeberli and Akeley, 1990] or stochastic rasterization [Cook, 1986] [Akenine-Moller et al., 2007] can be used. These techniques deliver very high quality results but are still too expensive for real-time computer graphics today.

As mentioned before, another option is the technique used in the previous chapter 2 Depth of Field. It could also be used for motion blur but would suffer from a lot of approximation errors. This is where decoupled sampling comes in, since it does not suffer from the same approximation problems as screen-space techniques.

Decoupled sampling is inspired by Reyes' separation of shading and visibility rates [Cook et al., 1987]. Reyes uses so-called micro polygons: geometry primitives are tessellated into small quads, and these micro polygons are then shaded. This has the disadvantage that shading is done before the visibility test, and the splitting into micro polygons is expensive. Reyes decouples shading from visibility but couples the shading rate to the geometry sampling rate. [Fatahalian et al., 2009]

[Ragan-Kelley et al., 2011] decouples shading from both visibility and geometry sampling for motion and defocus blur. This is achieved by rendering in a way similar to how MSAA is performed on modern graphics hardware. With MSAA, shading is decoupled from visibility sampling, but the relationship between shading samples and visibility samples is always one-to-n: the visibility samples in a certain area always map to one shading sample. In decoupled sampling this becomes a flexible many-to-one relationship: a shading cache allows visibility samples, even from different pixels, to reuse the same shading sample. The shading rate in blurred areas of the scene is therefore reduced.

This technique is extended by [Liktor and Dachsbacher, 2012] and [Petrik et al., 2013] with deferred shading. This combination is, according to [Liktor and Dachsbacher, 2012], more "interactive" than "real time", however. [Ragan-Kelley et al., 2011] cannot make a statement about real performance since it was only implemented in a simulator.

3.5. Multi Sample Anti Aliasing

With MSAA, shading is performed at pixel resolution and visibility is processed at a supersampled resolution. The shading is then blended into the frame buffer at the supersampled resolution. In the final step the blended supersampled buffer is down-sampled to pixel resolution. Figure 3.1 gives a good overview of this process.
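The final down-sampling step can be written as a trivial resolve shader; a minimal GLSL sketch (uniform names are illustrative assumptions) could look like this:

#version 330 core
// Manual MSAA resolve: average all visibility samples of a multisample
// texture down to pixel resolution.
uniform sampler2DMS uFramebufferMS; // supersampled frame buffer
uniform int uNumSamples;            // e.g. 4 for 4x MSAA
out vec4 fragColor;

void main() {
    ivec2 coord = ivec2(gl_FragCoord.xy);
    vec4 sum = vec4(0.0);
    for (int i = 0; i < uNumSamples; ++i) {
        sum += texelFetch(uFramebufferMS, coord, i); // one visibility sample
    }
    fragColor = sum / float(uNumSamples); // down-sample to pixel resolution
}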

This technique only anti-aliases the edges of polygons; it does no anti-aliasing on the textures inside the polygons. For texture anti-aliasing other techniques such as mip-mapping can be used. The memory consumption of this technique is also critical, since 2× MSAA already uses twice the amount of memory as rendering without MSAA.


Figure 3.1.: Multi Sample Anti Aliasing (from [Ragan-Kelley et al., 2011]); the pipeline stages are Rasterizer, Shader, Blending and Framebuffer

Compared to full supersampling, however, this is still a good tradeoff.

3.6. Decoupled Sampling

Let us first take a look at the pseudo code of the decoupled sampling algorithm in listing 3.1:

Listing 3.1: Decoupled Sampling

for all primitives do
    setup, compute edge equations
    for all visibility samples do
        skip if not visible
        map to shading sample
        if not in cache then
            shade and cache
        else
            use cached value
        end if
        update frame buffer
    end for
end for

In the first step a primitive is rasterized against visibility samples. Extra data per vertex might be included here, such as t for the time dimension (motion blur) or u, v for the lens (DOF).

The next step maps the visibility sample to a shading sample. For example, a vertex moves over time and results in multiple visibility samples; the shading sample, however, is always the same, since it is assumed that the shading does not change over time. Visibility sampling is therefore decoupled from shading, resulting in a many-to-one relationship.

A cache is used for the shading samples. For every visibility sample a lookup is done, and the cached value is used if the shading was already computed for this shading sample. Otherwise the shading sample is computed and stored in the cache for later use. [Ragan-Kelley et al., 2011] uses timestamps in the cache: on every access the timestamp is set to the current time, and when the cache is full and a new shading sample has to be added, the entry with the oldest timestamp is replaced. This is a so-called LRU (least recently used) strategy. In [Liktor and Dachsbacher, 2012] a bucket hash array is used as a cache, and shading samples are replaced on hash collisions. This might result in some unnecessary re-shading but can be implemented faster than the LRU strategy.

In the last step the resulting color is blended into the frame buffer at full supersampled resolution. This is equivalent to MSAA, with the exception that it is not a strict one-to-n relationship.

3.6.1. Implementation

To implement this algorithm directly on a modern graphics pipeline, a few problems have to be solved first. The main problem is that the GPU runs fragment shaders in parallel; a cache of already computed fragments is therefore critical, since synchronization is necessary. The following sections cover two alternative implementations. They are both from [Liktor and Dachsbacher, 2012] and [Liktor and Dachsbacher, 2013], where the second is a GPU Pro 4 article based on the first one. The implementations from [Ragan-Kelley et al., 2011] are not discussed any further here, since the one intended for conventional GPUs is more theoretical and the other suggested implementation is for the Larrabee architecture, which itself is still only theoretical. Figure 3.2 gives a good overview of how all these implementations work in general.

Global Shading Cache First the range of shading sample IDs (ssIDs) for every primitive is generated. This can be done with a geometry shader and an atomic counter. Fragment shaders can then map visibility samples to shading samples and eliminate redundancy from the shading data. Listing 3.2 shows pseudo code for such a geometry shader. To implement this, at least OpenGL 4.2 is required.

Listing 3.2: Geometry Shader (excerpt from [Liktor and Dachsbacher, 2013])

// screen-space positions
in vec2 in_scrPos[];
// shading grid of the triangle
flat out ivec4 domain;
// ID of the first sample in the shading grid
flat out uint startID;

uniform float shadingRate;
// global ssID counter array
layout(size1x32) uniform uimageBuffer uCtrSSID;

void main() {
    // project the screen positions to the shading grid
    vec2 gPos0 = in_scrPos[0] * shadingRate;
    [...]
    vec2 minCorner = min(gPos0, min(gPos1, gPos2));
    vec2 maxCorner = max(gPos0, max(gPos1, gPos2));
    // shading grid: xy - top left corner, zw - grid size
    domain.x = int(minCorner.x) - 1;
    domain.y = int(minCorner.y) - 1;
    domain.z = int(maxCorner.x) - domain.x + 1;
    domain.w = int(maxCorner.y) - domain.y + 1;
    // allocate the ssID range with an atomic counter
    uint reserved = uint(domain.z * domain.w);
    startID = imageAtomicAdd(uCtrSSID, 0, reserved);
}
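The fragment-shader side then maps each visibility sample to its ssID inside this grid. The full code is in [Liktor and Dachsbacher, 2013]; the following is only a hedged sketch of how that mapping could look, with names mirroring the listing above:

// Sketch (not from the article): map a fragment to its shading sample ID
// using the grid emitted by the geometry shader in listing 3.2.
flat in ivec4 domain;   // shading grid: xy - top left corner, zw - size
flat in uint startID;   // first ssID of this primitive's grid
uniform float shadingRate;
out uvec4 fragSSID;     // written to an integer render target

uint mapToSSID(vec2 scrPos) {
    // cell of the shading grid this fragment falls into
    ivec2 gridPos = ivec2(scrPos * shadingRate) - domain.xy;
    // row-major index into the ssID range reserved for this primitive
    return startID + uint(gridPos.y * domain.z + gridPos.x);
}

void main() {
    fragSSID = uvec4(mapToSSID(gl_FragCoord.xy), 0u, 0u, 0u);
}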

To improve the performance of the cache lookup, the fact that only recently stored shading values are "interesting" is used. This originates from the streaming nature of GPUs, where fragments that are close to each other are also processed close to each other in time. Instead of searching the entire buffer, only the recently used ssIDs are searched.

Figure 3.2.: Decoupled Sampling Pipeline; primitives pass through the rasterizer and depth test, visible subpixels are mapped to shading requests through the cache, and the shaded, colored subpixels are written to the frame buffer

[Liktor and Dachsbacher, 2012] uses a bucket hash array with a simple modulo as the hash function h(). To ensure that an existing cache value does not originate from a hash collision, the actual ssID is stored along with the shading value. These tuples are stored in a FIFO queue per bucket. To determine whether a shading value has already been computed, h(ssID) is computed and the corresponding bucket is fetched. To ensure that multiple threads do not insert the same ssID into the cache, synchronization has to be introduced; since the same ssID always points to the same bucket, a binary semaphore per bucket is used. Listing 3.3 shows a pseudo-code implementation of such a synchronized cache. In this code, due to the deferred shading, only an address into the storage of the deferred shading data is stored in the cache instead of the shading sample itself.

Listing 3.3: Global Shading Cache (excerpt from [Liktor and Dachsbacher, 2012])

layout(rgba32ui) uniform uimageBuffer uShaderCache;
layout(r32ui) uniform volatile uimageBuffer uBucketLocks;
layout(binding = 0) uniform atomic_uint bufferTail;
// FREE, LOCKED and the MAX_* limits are constants defined elsewhere

int getCachedAddress(uint ssID, inout bool needStore) {
    int hash = hashSSID(ssID);
    uvec4 bucket = imageLoad(uShaderCache, hash);
    int address = searchBucket(ssID, bucket);
    int iAttempt = 0;
    // cache miss
    while (address < 0 && iAttempt++ < MAX_ATTEMPTS) {
        // this thread is competing for storing a sample
        uint lock = imageAtomicCompSwap(uBucketLocks, hash, FREE, LOCKED);
        if (lock == FREE) {
            address = int(atomicCounterIncrement(bufferTail));
            // update the cache
            bucket = storeBucket(ssID, hash, bucket);
            imageStore(uShaderCache, hash, bucket);
            needStore = true;
            memoryBarrier(); // release the lock
            imageStore(uBucketLocks, hash, uvec4(FREE));
        }
        if (lock == LOCKED) {
            int lockAttempt = 0;
            while (lock == LOCKED && lockAttempt++ < MAX_LOCK_ATTEMPTS) {
                lock = imageLoad(uBucketLocks, hash).x;
            }
            // now try to get the address again
            bucket = imageLoad(uShaderCache, hash);
            address = searchBucket(ssID, bucket);
        }
        // if everything failed, store the data redundantly
        if (address < 0) {
            address = int(atomicCounterIncrement(bufferTail));
            needStore = true;
        }
    }
    return address;
}

Per-Tile Shading Cache The above algorithm uses global memory atomics, which become the main bottleneck of the entire process. The following implementation therefore takes into account that a shading sample is normally only reused in a certain area of the screen and not at completely arbitrary locations. The image is split into uniform tiles; the cache is then only valid inside such a tile, and each tile is entirely processed in one thread, so no synchronization between threads is necessary. This has the disadvantage that some samples do not end up in the same tile and have to be shaded multiple times. The tile size selection is therefore important and might vary according to the amount of blur used. The per-tile shading has three steps:

• In the first step a depth map and all the ssIDs for the entire scene are generated. Instead of directly mapping the samples in the fragment shader (as in the previous approach), the ssIDs are written out and used in a second pass.

• The second pass then tiles the data and processes every tile in a single thread. There the ssIDs are mapped to shading data addresses (again because of the deferred shading). This pass can be implemented as a fully computational pass to control the thread execution; in [Liktor and Dachsbacher, 2012] OpenCL is suggested for this step (an illustrative compute-shader sketch follows this list). The output of this pass is an address for each visibility sample where the shading data can be stored.


• In the last step the shading data is physically stored. The depth buffer from the first pass is used here to ensure that only visible fragments are executed. This avoids writing false data, since there is no synchronization in this step.
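[Liktor and Dachsbacher, 2012] suggests OpenCL for the tiled pass; purely as an illustration of the idea, a GLSL compute-shader analogue could look as follows (tile size, cache size and all buffer names are assumptions, not taken from the paper):

#version 430
// Illustrative per-tile pass: one thread deduplicates the ssIDs of one
// tile with a small local cache, so no inter-thread synchronization is
// needed. All names and sizes are assumptions for this sketch.
layout(local_size_x = 1) in;

const uint TILE_PIXELS = 256u;  // e.g. 16x16 samples per tile
const uint CACHE_SIZE  = 32u;

layout(std430, binding = 0) buffer SSIDs { uint ssID[]; };    // per sample
layout(std430, binding = 1) buffer Addrs { uint address[]; }; // output
layout(binding = 0) uniform atomic_uint bufferTail;           // storage tail

void main() {
    uint base = gl_GlobalInvocationID.x * TILE_PIXELS; // first sample of tile
    uint cacheID[CACHE_SIZE];
    uint cacheAddr[CACHE_SIZE];
    uint used = 0u;
    for (uint i = 0u; i < TILE_PIXELS; ++i) {
        uint id = ssID[base + i];
        uint addr = 0xffffffffu;
        // linear search in the tile-local cache
        for (uint c = 0u; c < used; ++c) {
            if (cacheID[c] == id) { addr = cacheAddr[c]; break; }
        }
        if (addr == 0xffffffffu) {
            // miss: allocate storage for a new shading sample
            addr = atomicCounterIncrement(bufferTail);
            if (used < CACHE_SIZE) {
                cacheID[used]   = id;
                cacheAddr[used] = addr;
                ++used;
            }
        }
        address[base + i] = addr; // where the shading data will be stored
    }
}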

Figure 3.3.: Per-Tile Shading (from [Liktor and Dachsbacher, 2012]): z-fill and ssID mapping, per-tile caching with local caches, then only visible fragments interpolate and store their shading data

The main advantage of this algorithm over the first one is that the local per-tile cache is a lot faster than the global cache.


And with a good tile size selection the problem of over-shading can be minimized. Figure 3.3 shows this process: the tiling causes the corner of the yellow triangle to be shaded twice, since it does not end up entirely in one tile. For simplicity, the triangles are shaded flat in this example.

[Petrik et al., 2013] uses such a tile-based approach and extends it with adaptive anisotropic shading (AAS) [Vaidyanathan et al., 2012]. Figure 3.4 shows the over-shading at tile edges and that it is further reduced by AAS.

3.6.2. Transparency

With motion blur and depth of field also comes the requirement of transparency rendering. In [Liktor and Dachsbacher, 2012] it is suggested to use [Enderton et al., 2010] or [Yang et al., 2010] for order-independent transparency.

The first one uses a stochastic approach that works well with MSAA but adds noise. The second technique uses linked lists to generate a so-called A-buffer. An A-buffer is a depth buffer that stores not only the nearest fragment's position but a list of all fragments at that position, so the fragments can be evaluated in their correct order to produce a final shading value.
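As a rough illustration of the linked-list idea, an A-buffer construction pass in GLSL could look like the following sketch (all names are assumptions, and the head-pointer image must be cleared to an end-of-list marker every frame):

#version 430
// Sketch of A-buffer construction with per-pixel linked lists, in the
// spirit of [Yang et al., 2010]; not the paper's actual code.
layout(binding = 0, r32ui) uniform coherent uimage2D uHeadPtr; // list heads
layout(binding = 0) uniform atomic_uint uNodeCounter;          // next free node
layout(std430, binding = 1) buffer Nodes {
    uvec4 node[]; // packed color, depth bits, next pointer, unused
};
in vec4 vColor;

void main() {
    uint idx = atomicCounterIncrement(uNodeCounter); // allocate a node
    // link the new node in front of the pixel's current list head
    uint prev = imageAtomicExchange(uHeadPtr, ivec2(gl_FragCoord.xy), idx);
    node[idx] = uvec4(packUnorm4x8(vColor),
                      floatBitsToUint(gl_FragCoord.z), prev, 0u);
}

A second full-screen pass would then walk each pixel's list, sort the fragments by depth and blend them in order.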

[Ragan-Kelley et al., 2011] does not explicitly point to techniques for solving the transparency problem. It mentions, though, that RenderMan uses an A-buffer, but since an A-buffer needs a complex data structure, a stochastic approach would fit better.

3.6.3. Performance and Results

Figure 3.5 is from [Liktor and Dachsbacher, 2012] and shows their results. The first row is a reference implementation using OptiX; the second row is the result of the decoupled deferred rendering. In the right column the number of shading samples per pixel (sspp) is visualized. One can see that in the areas of high motion blur the shading rate is dramatically reduced.

In terms of speed and memory consumption this approach is still very expensive. As seen in figure 3.5, the gargoyle scene with motion blur needs about 400 ms to render; it is therefore still not capable of realtime. Further results show the rendering of the sponza⁵ scene blurred in 251 ms and sharp in 357 ms. The same effect can be seen there: blurred areas need less computational time since the caching is more effective. Memory consumption in this approach is also very high. Due to the deferred-rendering approach, the sponza scene at 1280 × 720 needs around 250 MB of graphics memory at 32× supersampling, and this is already using an optimized version of the regular G-buffer used in deferred rendering, called CG-buffer (compact geometry buffer): only the depth and reference buffers are stored at supersampled resolution, while normal, diffuse and specular data are only stored at the normal sampling rate. A fully supersampled deferred rendering at 32× supersampling would use up to 350 MB of graphics memory.

[Ragan-Kelley et al., 2011] does a very detailed performance and quality analysis, although most of the analysis goes into the comparison of the sort-last (GPU-style) and the sort-middle (Larrabee) implementation. The tests are performed with a "Direct3D 9 functional simulator". But since Larrabee is only a theoretical concept, these results are not worth a lot. The paper itself states:

⁵ http://www.crytek.com/cryengine/cryengine3/downloads


Figure 3.4.: Results from [Vaidyanathan et al., 2012] ((a) Citadel, (b) Per-Tile, (c) Per-Tile + AAS) show the over-shading at the tile edges and how it is reduced with AAS


Figure 3.5.: Results from [Liktor and Dachsbacher, 2012] (bottom row) compared to a reference implementation with OptiX (top row)

. . . it is challenging to directly predict the absolute performance of a hardware architecture that has not been built, . . .

More interesting is the comparison with an accumulation buffer and a stochastic supersampling implementation. As seen in figure 3.6, decoupled sampling shows results similar to those of [Liktor and Dachsbacher, 2012] in terms of shading samples per pixel (shading rate). The difference in shading rate between 8 and 64 visibility samples per pixel is only around 0.7 to 1.0. The main goal of decoupled sampling, keeping the shading rate low at a high visibility sampling rate, is therefore achieved extremely well.
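To put the reported shading rates into perspective, consider a rough, illustrative calculation (the 1280×720 render target is only an assumption for this example): supersampled deferred shading at 64 visibility samples per pixel executes 1280 · 720 · 64 ≈ 59 million shader invocations per frame, while decoupled sampling at a shading rate of about 3.5 samples per pixel needs only 1280 · 720 · 3.5 ≈ 3.2 million, a reduction of roughly 18×.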

3.7. Conclusion

The underlying technique used for the approach in chapter 2 is nearly ten years old. A lot of research has gone into this approach and is still ongoing, as we can see by [McIntosh et al., 2012] and [Schedl and Wimmer, 2012].

Figure 3.6.: Results from [Ragan-Kelley et al., 2011] (their Fig. 8): accumulation buffer rendering, decoupled sampling, and stochastic supersampling, each at 8, 27, and 64 visibility samples per pixel. Decoupled sampling shades only 2.6–3.0, 3.0–3.6, and 3.3–4.0 samples per pixel, respectively, while the other two techniques shade every visibility sample.

Due to improvements in graphics hardware over the last few years, this technique can now be used with expensive and complex filter kernels, and because these kernels scale, the technique can also be a good replacement for more sophisticated DOF on low-end hardware. It produces good results with the suggested improvements.

The second approach, from chapter 3, is an entirely different take on the topic. It is relatively new and needs top-end graphics hardware to run at even an interactive frame rate. It introduces stochastic sampling into a conventional forward-rendering graphics pipeline. Research on this technique is still ongoing, as we can see by the paper [Clarberg et al., 2013] that will be presented at SIGGRAPH 2013. A particular strength is that the technique reduces shading costs in blurred areas, which matches the visual impression a viewer gets from the rendering.

The screen-space approach from chapter 2, on the other hand, does exactly the opposite: it is more costly in blurred areas than in focused areas. If the cache mechanism gets better hardware support in the future, decoupled sampling might run in real time and, for some tasks, even become faster than the filter approach.

Looking to the future, [Liktor and Dachsbacher, 2012] suggests investigating better caching strategies for the tile-based approach. The generation of the ssID could also be made temporally coherent to reuse shading samples across multiple frames.


Bibliography

Akenine-Möller, T., Munkberg, J., and Hasselgren, J. (2007). Stochastic rasterization using time-continuous triangles. In Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware, GH '07, pages 7–16, Aire-la-Ville, Switzerland. Eurographics Association.

Clarberg, P., Toth, R., and Munkberg, J. (2013). A sort-based deferred shading architecture for decoupled sampling. ACM Trans. Graph., 32(4). (SIGGRAPH 2013).

Cook, R. L. (1986). Stochastic sampling in computer graphics. ACM Trans. Graph., 5(1):51–72.

Cook, R. L., Carpenter, L., and Catmull, E. (1987). The Reyes image rendering architecture. SIGGRAPH Comput. Graph., 21(4):95–102.

Cook, R. L., Porter, T., and Carpenter, L. (1984). Distributed ray tracing. SIGGRAPH Comput. Graph., 18(3):137–145.

Enderton, E., Sintorn, E., Shirley, P., and Luebke, D. (2010). Stochastic transparency. In I3D '10: Proceedings of the 2010 Symposium on Interactive 3D Graphics and Games, pages 157–164, New York, NY, USA.

Fatahalian, K., Luong, E., Boulos, S., Akeley, K., Mark, W. R., and Hanrahan, P. (2009). Data-parallel rasterization of micropolygons with defocus and motion blur. In Proceedings of the Conference on High Performance Graphics 2009, HPG '09, pages 59–68, New York, NY, USA. ACM.

Haeberli, P. and Akeley, K. (1990). The accumulation buffer: hardware support for high-quality rendering. In Proceedings of the 17th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '90, pages 309–318, New York, NY, USA. ACM.

Hammon, E. (2007). Practical post-process depth of field. In GPU Gems 3, chapter 28. Addison-Wesley.

Hargreaves, S. (2004). Deferred shading. In Game Developer's Conference.

Liktor, G. and Dachsbacher, C. (2012). Decoupled deferred shading for hardware rasterization. In Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, I3D '12, pages 143–150, New York, NY, USA. ACM.

Liktor, G. and Dachsbacher, C. (2013). Decoupled deferred shading on the GPU. In GPU Pro 4, chapter II-3, pages 81–89. CRC Press.

McIntosh, L., Riecke, B. E., and DiPaola, S. (2012). Efficiently simulating the bokeh of polygonal apertures in a post-process depth of field shader. Comput. Graph. Forum, 31(6):1810–1822.

Navarro, F., Serón, F. J., and Gutiérrez, D. (2011). Motion blur rendering: State of the art. Computer Graphics Forum, 30(1):3–26.


Ragan-Kelley, J., Lehtinen, J., Chen, J., Doggett, M., and Durand, F. (2011). Decoupled sampling for graphics pipelines. ACM Trans. Graph., 30(3):17:1–17:17.

Riguer, G., Tatarchuk, N., and Isidoro, J. (2004). Real-time depth of field simulation.

Schedl, D. and Wimmer, M. (2012). A layered depth-of-field method for solving partial occlusion. Journal of WSCG, 20(3):239–246.

Scheuermann, T. and Tatarchuk, N. (2004). Advanced depth of field. In ShaderX3. Charles River Media.

Sousa, T. (2008). Crysis next gen effects. In Game Developer's Conference.

Vaidyanathan, K., Toth, R., Salvi, M., Boulos, S., and Lefohn, A. (2012). Adaptive image space shading for motion and defocus blur. In Proceedings of the Fourth ACM SIGGRAPH / Eurographics Conference on High-Performance Graphics, EGGH-HPG '12, pages 13–21, Aire-la-Ville, Switzerland. Eurographics Association.

Vlachos, A. (2008). Post processing in the orange box. In Game Developer's Conference.

Wikipedia (2013a). Bokeh — Wikipedia, the free encyclopedia. [Online; accessed 12-April-2013].

Wikipedia (2013b). Deferred shading — Wikipedia, the free encyclopedia. [Online; accessed 16-June-2013].

Yang, J. C., Hensley, J., Grün, H., and Thibieroz, N. (2010). Real-time concurrent linked list construction on the GPU. In Proceedings of the 21st Eurographics Conference on Rendering, EGSR '10, pages 1297–1304, Aire-la-Ville, Switzerland. Eurographics Association.


List of Figures

2.1. DOF with Lens
2.2. Bokeh Photography
2.3. Simple Depth of Field Rendering
2.4. Lack of Partial Occlusion and Sharp Silhouettes
2.5. Naive Filter
2.6. Parallelogram Shaped Filter
2.7. Combining Simple Shapes to Generate More Complex Shapes
2.8. Bokeh Rendering

3.1. Multi Sample Anti Aliasing
3.2. Decoupled Sampling Pipeline
3.3. Per-Tile Shading
3.4. Results from [Vaidyanathan et al., 2012] show the over-shading at tile edges and how it is reduced with AAS
3.5. Results from [Liktor and Dachsbacher, 2012] (bottom row) compared to a reference implementation with OptiX (top row)
3.6. Results from [Ragan-Kelley et al., 2011]. Accumulation buffer rendering and stochastic supersampling are compared to decoupled sampling.


A. Code Listings

Listing A.1: Filter Vertex Shader

uniform mat4 mWorldViewProj;
uniform float uFilterAngle;

varying vec2 vSamples[6];

void main(void)
{
    gl_Position = mWorldViewProj * gl_Vertex;
    gl_TexCoord[0] = gl_MultiTexCoord0;

    float aspectRatio = 640.0 / 480.0;
    float radius = 0.1;
    float angle = uFilterAngle;
    int numbSamples = 6;

    // rotate the sample point by the filter angle
    vec2 point = vec2(radius * cos(angle), radius * sin(angle));
    point.x /= aspectRatio;

    // create sample positions in the vertex shader so they are
    // interpolated and dependent texture reads are avoided
    for (int i = 0; i < numbSamples; i++) {
        float t = float(i) / (float(numbSamples) - 1.0);
        vSamples[i] = mix(-point, point, t);
    }
}
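A note on the design of listing A.1: the six sample offsets lie on a line through the filter center, rotated by uFilterAngle, and are computed once per vertex rather than per fragment. The rasterizer interpolates them, so the fragment shader in listing A.2 only has to scale them by the per-pixel circle of confusion before sampling, which keeps the per-fragment arithmetic low.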

Listing A.2: Filter Fragment Shader

uniform sampler2D uColorTexture0;
uniform sampler2D uCoCTexture0;
uniform sampler2D uDepthTexture0;

varying vec2 vSamples[6];

void main(void)
{
    float bleedingMult = 0.5;
    float bleedingBias = 0.01;

    vec2 textureCoord = vec2(gl_TexCoord[0]);
    textureCoord = vec2(textureCoord.x, 1.0 - textureCoord.y);

    // read color, CoC and depth at the fragment position
    // (the center color itself is not part of the weighted sum below)
    vec4 color = texture2D(uColorTexture0, textureCoord);
    float CoC = texture2D(uCoCTexture0, textureCoord).a;
    float depth = texture2D(uDepthTexture0, textureCoord).r;

    int numbSamples = 6;
    vec4 outputColor = vec4(0.0);

    // iterate over every sample
    for (int i = 0; i < numbSamples; i++) {
        vec2 offset = vSamples[i];
        vec2 sampleCoords = textureCoord + offset * CoC;

        // read color, CoC and depth at the sample location
        vec4 sampleColor = texture2D(uColorTexture0, sampleCoords);
        float sampleCoC = texture2D(uCoCTexture0, sampleCoords).a;
        float sampleDepth = texture2D(uDepthTexture0, sampleCoords).r;

        // avoid light bleeding from focused objects onto background objects
        float weight = sampleDepth < depth ? sampleCoC * bleedingMult : 1.0;
        weight = (CoC > sampleCoC + bleedingBias) ? weight : 1.0;
        weight = clamp(weight, 0.0, 1.0);

        // add the sample to the total color
        outputColor.rgb += sampleColor.rgb * weight;
        outputColor.a += weight;
    }

    // average over all samples
    outputColor /= outputColor.a;

    gl_FragColor = vec4(outputColor.rgb, 1.0);
}
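The two listings above implement a single one-dimensional filter pass; chapter 2 combines several such passes into more complex bokeh shapes. The following fragment shader is a minimal illustrative sketch of such a combine step and is not part of the accompanying project: the texture names uBlurTexture0 and uBlurTexture1 are assumed inputs holding the results of two filter passes run with different uFilterAngle values, and the per-channel minimum, which keeps the intersection of the two blurred shapes, follows the idea of [McIntosh et al., 2012].

Listing A.3: Combine Fragment Shader (sketch)

uniform sampler2D uBlurTexture0; // result of the first filter pass
uniform sampler2D uBlurTexture1; // result of the second, rotated filter pass

void main(void)
{
    vec2 textureCoord = vec2(gl_TexCoord[0]);

    vec3 blur0 = texture2D(uBlurTexture0, textureCoord).rgb;
    vec3 blur1 = texture2D(uBlurTexture1, textureCoord).rgb;

    // the per-channel minimum keeps only the intersection of the two
    // filter shapes, so two rotated line/parallelogram blurs can be
    // combined into a polygonal aperture shape
    gl_FragColor = vec4(min(blur0, blur1), 1.0);
}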


B. Data CD

CD
  Documentation
    Seminar.pdf .............. A digital version of this document
  Presentations
    Konzept.pdf .............. Concept presentation from April 4, 2013
    Zwischenbericht-I.pdf .... Status report from May 2, 2013
    Zwischenbericht-II.pdf ... Status report from June 6, 2013
    FinalPresentation.pdf .... Final seminar presentation from June 27, 2013
  Code
    Bokeh .................... Xcode project with the bokeh DOF implementation using Irrlicht


C. Declaration

Thomas Post, Munich, 26.06.2013

IG/SS 2013

Declaration

Pursuant to §40 Abs. 1 in conjunction with §31 Abs. 7 RaPO

I hereby declare that I wrote this seminar paper independently, that I have not submitted it elsewhere for examination purposes, that I used no sources or aids other than those stated, and that I have marked verbatim and paraphrased quotations as such.

Thomas Post


