Fragment-Parallel Composite and Filter

Page 1: Fragment-Parallel Composite and Filter

Fragment-Parallel Composite and Filter
Anjul Patney, Stanley Tzeng, and John D. Owens
University of California, Davis

Page 2: Fragment-Parallel Composite and Filter

Parallelism in Interactive Graphics
• Well-expressed in hardware as well as APIs
• Consistently growing in degree & expression
  – More and more cores on upcoming GPUs
  – From programmable shaders to pipelines
• We should rethink algorithms to exploit this
• This paper provides one example
  – Parallelization of composite/filter stages

Page 3: Fragment-Parallel Composite and Filter

A Feed-Forward Rendering Pipeline

Primitives → Geometry Processing → Rasterization → Composite → Filter → Pixels

Page 4: Fragment-Parallel Composite and Filter

Composite & Filter
• Input
  – Unordered list of fragments
• Output
  – Pixel colors
• Assumption
  – No fragments are discarded

[Figure: a pixel and its subpixel sample locations]

Page 5: Fragment-Parallel Composite and Filter

Basic Idea

[Figure: pixel-parallel mapping of processors to pixels]

Page 6: Fragment-Parallel Composite and Filter

Basic Idea

[Figure: the pixel-parallel mapping suffers from insufficient parallelism and irregularity; the fragment-parallel mapping assigns processors to fragments instead]

Page 7: Fragment-Parallel Composite and Filter

Motivation
• Most applications have low depth complexity
  – Pixel-level parallelism is sufficient
• We are interested in applications with
  – Very high depth complexity
  – High variation in depth complexity
• Further
  – Future platforms will demand more parallelism
  – High depth complexity can limit pixel-parallelism

Page 8: Fragment-Parallel Composite and Filter

Motivation

[Figure: Distribution of Depth Complexity. Number of subpixels (log scale, 10 to 1,000,000) versus number of depth layers (10 to 730)]

Page 9: Fragment-Parallel Composite and Filter

Related Work
Order-Independent Transparency (OIT)
• Depth-Peeling [Everitt 01]
  – One pass per transparent layer
• Stencil-Routed A-buffer [Myers & Bavoil 07]
  – One pass per 8 depth layers¹
• Bucket Depth-Peeling [Liu et al. 09]
  – One pass per up to 32 layers²

¹ Maximum MSAA samples per pixel
² Maximum render targets
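As a rough illustration of why these approaches struggle with deep scenes, the pass counts listed above can be turned into an estimate. This is a minimal sketch, not taken from the paper: `passes_needed` and `LAYERS_PER_PASS` are hypothetical names, and it assumes every pass captures its full quota of layers (1, 8, or 32), so real counts may be higher. The deepest pixel in the frame sets the number of passes for all pixels.

```python
import math

# Maximum layers each technique can capture per rendering pass, as listed above.
# Assumption: every pass is fully utilized, so these give a lower bound on passes.
LAYERS_PER_PASS = {
    "depth_peeling": 1,              # one pass per transparent layer
    "stencil_routed_a_buffer": 8,    # limited by MSAA samples per pixel
    "bucket_depth_peeling": 32,      # limited by render targets
}


def passes_needed(max_depth_complexity: int, layers_per_pass: int) -> int:
    """Passes needed before the deepest pixel has had all of its layers captured."""
    return math.ceil(max_depth_complexity / layers_per_pass)
```

For a hypothetical frame whose deepest pixel has 100 layers, this gives 100, 13, and 4 passes respectively.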

Page 10: Fragment-Parallel Composite and Filter

Related Work
Order-Independent Transparency (OIT)
• OIT using Direct3D 11 [Gruen et al. 10]
  – Use fragment linked lists
  – Per-pixel sort and composite
• Hair Self-Shadowing [Sintorn et al. 09]
  – Each fragment computes its contribution
  – Assumes constant opacity
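The linked-list idea can be simulated on the CPU to show the data structure. This is a simplified sketch under stated assumptions, not the Direct3D 11 technique itself: a Python list stands in for the GPU node buffer, `Fragment`, `build_lists`, and `resolve_pixel` are hypothetical names, colors have a single channel, and on a GPU the two pointer updates in `build_lists` would have to be atomic.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    depth: float
    color: float  # one color channel keeps the sketch short
    alpha: float


def build_lists(num_pixels, incoming):
    """Build per-pixel linked lists from (pixel, Fragment) pairs in arbitrary order."""
    head = [-1] * num_pixels  # head pointer per pixel; -1 marks an empty list
    nodes = []                # shared node pool of (fragment, next_index) pairs
    for pixel, frag in incoming:
        # On a GPU, allocating a node and swapping the head would both be atomic.
        nodes.append((frag, head[pixel]))  # new node links to the old head
        head[pixel] = len(nodes) - 1       # pixel now starts at the new node
    return head, nodes


def resolve_pixel(head_index, nodes, background):
    """Walk one pixel's list, sort by depth, and composite back to front."""
    frags = []
    i = head_index
    while i != -1:
        frag, i = nodes[i]
        frags.append(frag)
    frags.sort(key=lambda f: f.depth)  # nearest fragment first
    color = background
    for f in reversed(frags):          # farthest first: apply "over" toward the eye
        color = f.alpha * f.color + (1.0 - f.alpha) * color
    return color
```

The per-pixel sort and composite is still serial within each pixel, which is the part the fragment-parallel formulation targets.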

Page 11: Fragment-Parallel Composite and Filter

Related Work
Programmable Rendering Pipelines
• RenderAnts [Zhou et al. 09]
  – Sort fragments globally
  – Per-pixel composite/filter
• FreePipe [Liu et al. 10]
  – Sort fragments globally
  – Per-pixel composite/filter

Page 12: Fragment-Parallel Composite and Filter

Pixel-Parallel Formulation

[Figure: pixels Pi, P(i+1), P(i+2) with subsamples Sj through S(j+6); each subsample is assigned one thread, with thread IDs j through j+6. P: Pixel, S: Subsample]
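The pixel-parallel mapping can be sketched as one loop iteration per subsample, where each "thread" gathers, sorts, and composites its own fragments serially. This is a hypothetical single-channel Python sketch (the function name and the `(subpixel_id, depth, color, alpha)` tuple layout are assumptions); the returned `work` list counts the fragments each thread handles, which exposes the irregularity that motivates the fragment-parallel version.

```python
from collections import defaultdict


def pixel_parallel_composite(fragments, num_subpixels, background):
    """fragments: unordered (subpixel_id, depth, color, alpha) tuples, one channel.

    Each iteration of the outer loop stands in for one thread (one subsample);
    the inner loop is the serial work that thread performs.
    """
    per_subpixel = defaultdict(list)
    for s, depth, color, alpha in fragments:
        per_subpixel[s].append((depth, color, alpha))

    result = [background] * num_subpixels
    work = [0] * num_subpixels  # fragments processed by each "thread"
    for s in range(num_subpixels):              # conceptually parallel
        frags = sorted(per_subpixel[s])         # front to back by depth
        transmittance = 1.0                     # running product of (1 - alpha)
        accumulated = 0.0
        for _depth, color, alpha in frags:      # serial within the thread
            accumulated += transmittance * alpha * color
            transmittance *= 1.0 - alpha
        result[s] = accumulated + transmittance * background
        work[s] = len(frags)
    return result, work
```

In a real kernel the subsample with the largest `work` entry bounds the latency of its thread group, so a few deep subsamples can leave many shallow ones idle.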

Page 13: Fragment-Parallel Composite and Filter

Fragment-Parallel Formulation

[Figure: the same pixels Pi, P(i+1), P(i+2) and subsamples Sj through S(j+6), now with one thread per fragment, thread IDs j through j+23. P: Pixel, S: Subsample]

Page 14: Fragment-Parallel Composite and Filter

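Fragment-Parallel Formulation
• How can this behavior be achieved?
• Revisit the composite equation (fragments 1 through N sorted front to back, followed by the background color $C_B$):

$$C_s = \underbrace{\alpha_1 C_1}_{\text{fragment 1}} + (1-\alpha_1)\Big(\underbrace{\alpha_2 C_2}_{\text{fragment 2}} + (1-\alpha_2)\big(\cdots\big(\alpha_N C_N + (1-\alpha_N)\underbrace{C_B}_{\text{background}}\big)\cdots\big)\Big)$$

$$C_s = 1\cdot\alpha_1 C_1 + (1-\alpha_1)\,\alpha_2 C_2 + (1-\alpha_1)(1-\alpha_2)\,\alpha_3 C_3 + \cdots + \underbrace{(1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1})}_{\text{global contribution } G_k}\;\underbrace{\alpha_k C_k}_{\text{local contribution } L_k} + \cdots + (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_N)\,C_B$$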
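The expansion can be checked numerically. The minimal sketch below (hypothetical function names, one color channel) evaluates the nested form by applying the over operator from the background forward, and the expanded form by computing each product $(1-\alpha_1)\cdots(1-\alpha_{k-1})$ explicitly, so the two should agree up to floating-point rounding.

```python
import math


def composite_nested(colors, alphas, background):
    """Nested form: start at the background and apply 'over' from fragment N to 1."""
    result = background
    for c, a in zip(reversed(colors), reversed(alphas)):
        result = a * c + (1 - a) * result
    return result


def composite_expanded(colors, alphas, background):
    """Expanded form: sum of G_k * L_k, plus the background weighted by all (1 - alpha)."""
    total = 0.0
    for k in range(len(colors)):  # 0-based k: alphas[:k] are the fragments in front
        g_k = math.prod(1 - a for a in alphas[:k])  # empty product = 1 for the first fragment
        l_k = alphas[k] * colors[k]
        total += g_k * l_k
    return total + math.prod(1 - a for a in alphas) * background
```

With two half-transparent fragments of colors 0 and 1 in front of a background of 0.1, both forms give 0.25 + 0.025 = 0.275.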

Page 15: Fragment-Parallel Composite and Filter

Fragment-Parallel Formulation

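• $L_k$ is trivially parallel (local computation)
• $G_k$ is the result of a scan operation (product); because it stops at fragment $k-1$, it is an exclusive scan
• For the list of input fragments
  – Compute G[ ] and L[ ], then multiply
  – Perform a reduction to add up the subpixel contributions

$$C_s = G_1 L_1 + G_2 L_2 + G_3 L_3 + \cdots + G_N L_N + G_{N+1} C_B$$

$$G_k = (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1}), \qquad L_k = \alpha_k C_k, \qquad G_1 = 1$$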
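The scan-based formulation can be sketched with Python's `itertools.accumulate`, which here stands in for a parallel product scan. This is an illustrative single-channel sketch, not the authors' CUDA code, and `composite_by_scan` is a hypothetical name. Passing `initial=1.0` produces N+1 prefix products, which are exactly $G_1$ through $G_{N+1}$, so the background term needs no special case.

```python
import operator
from itertools import accumulate


def composite_by_scan(colors, alphas, background):
    """C_s = sum_k G_k L_k + G_{N+1} C_B via an exclusive product scan (one channel)."""
    # L_k = alpha_k * C_k: independent per fragment, hence trivially parallel.
    local = [a * c for a, c in zip(alphas, colors)]
    # Prefix products of (1 - alpha_k) seeded with 1.0: N + 1 entries, G_1 .. G_{N+1}.
    g = list(accumulate((1 - a for a in alphas), operator.mul, initial=1.0))
    # Elementwise multiply G_k * L_k, then reduce with a sum; g[-1] weights the background.
    return sum(gk * lk for gk, lk in zip(g, local)) + g[-1] * background
```

Because `zip` stops at the shorter input, the products pair $G_1 \ldots G_N$ with $L_1 \ldots L_N$, leaving the last prefix product for the background.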

Page 16: Fragment-Parallel Composite and Filter

Fragment-Parallel Formulation
• Filter, for every pixel:

$$C_p = C_{s_1}\kappa_1 + C_{s_2}\kappa_2 + \cdots + C_{s_M}\kappa_M$$

• This can be expressed as another reduction
  – After multiplying with the subpixel weights $\kappa_m$
  – Can be merged with the previous reduction
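Merging the filter into the compositing reduction works because multiplication distributes over the per-subpixel sum: $\sum_m \kappa_m \sum_k G_k L_k$ equals one flat sum of $\kappa_m G_k L_k$ over every term of every subpixel in the pixel. The sketch below (hypothetical names, single channel, background term included in each subpixel's list) computes both versions.

```python
def filter_pixel(subpixel_colors, weights):
    """Separate filter pass: C_p = sum over m of C_{s_m} * kappa_m."""
    return sum(c * k for c, k in zip(subpixel_colors, weights))


def merged_composite_filter(contributions, weights):
    """One flat reduction over kappa_m * (G_k L_k) for every term of every subpixel.

    contributions[m] holds subpixel m's composite terms, background term included.
    """
    return sum(
        weights[m] * term
        for m, terms in enumerate(contributions)
        for term in terms
    )
```

The merged form never materializes the subpixel colors $C_{s_m}$, which is what lets the filter share the composite's reduction pass.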

Page 17: Fragment-Parallel Composite and Filter

Fragment-Parallel Composite & Filter

Final Algorithm
1. Two-key sort (subpixel ID, depth)
2. Segmented scan (obtain $G_k$)
3. Premultiply with weights ($L_k$, $\kappa_m$)
4. Segmented reduction
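The four steps can be strung together into one end-to-end sketch. Everything beyond the step list is an assumption made for illustration: the `(subpixel_id, depth, color, alpha)` tuple layout, the rule that `subpixel_id // samples_per_pixel` gives the pixel, one color channel, and the choice to append an opaque background fragment at infinite depth to every subpixel so that the $G_{N+1} C_B$ term falls out of the same reduction. The steps are written as sequential loops; on the GPU each one would be a data-parallel primitive.

```python
import math


def fragment_parallel_composite_filter(fragments, samples_per_pixel, num_pixels, kappa, background):
    """fragments: (subpixel_id, depth, color, alpha) tuples in any order, one channel.

    Hypothetical layout: pixel = subpixel_id // samples_per_pixel and
    kappa[m] is the filter weight of sample m within its pixel.
    """
    num_subpixels = num_pixels * samples_per_pixel
    # Assumption: an opaque background "fragment" behind everything in every
    # subpixel supplies the G_{N+1} * C_B term.
    frags = list(fragments) + [(s, math.inf, background, 1.0) for s in range(num_subpixels)]

    # 1. Two-key sort (subpixel ID, depth)
    frags.sort(key=lambda f: (f[0], f[1]))

    # 2. Segmented exclusive product scan of (1 - alpha) per subpixel -> G_k
    g = [0.0] * len(frags)
    running = 1.0
    for i, (s, _, _, alpha) in enumerate(frags):
        if i == 0 or s != frags[i - 1][0]:  # segment head: a new subpixel starts
            running = 1.0
        g[i] = running
        running *= 1.0 - alpha

    # 3. Premultiply: G_k * L_k * kappa_m, with L_k = alpha_k * C_k
    terms = [g[i] * alpha * color * kappa[s % samples_per_pixel]
             for i, (s, _, color, alpha) in enumerate(frags)]

    # 4. Segmented reduction (sum) keyed by pixel; segments are contiguous after the sort
    pixels = [0.0] * num_pixels
    for (s, _, _, _), t in zip(frags, terms):
        pixels[s // samples_per_pixel] += t
    return pixels
```

Each loop touches every fragment once, so the available parallelism tracks the number of fragments rather than the number of pixels.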

Page 18: Fragment-Parallel Composite and Filter

Fragment-Parallel Formulation

[Figure: fragments of pixels Pi, P(i+1), P(i+2), grouped by subsample, pass through a segmented scan (product) and then a segmented reduction (sum). P: Pixel, S: Subsample]
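The segmented primitives themselves can be expressed in a data-parallel form. The sketch below is a textbook Hillis-Steele style segmented scan over head flags (a flag marks the first element of each segment), written in Python for readability; it is not CUDPP's algorithm or API, and all function names are hypothetical. Each round reads only the previous round's values, so every element in a round could be handled by its own thread, and about log2(n) rounds suffice. A segmented reduction is then read off the last element of each segment.

```python
import operator


def segmented_inclusive_scan(values, heads, op):
    """Log-step segmented scan; heads[i] is True where a segment starts (heads[0] must be True)."""
    v = list(values)
    f = list(heads)
    n = len(v)
    d = 1
    while d < n:
        new_v, new_f = v[:], f[:]       # double-buffer: a round reads only old values
        for i in range(d, n):           # conceptually one thread per element
            if not f[i]:                # no segment boundary between i - d and i yet
                new_v[i] = op(v[i - d], v[i])
            new_f[i] = f[i] or f[i - d]
        v, f = new_v, new_f
        d *= 2
    return v


def segmented_exclusive_scan(values, heads, op, identity):
    """Shift the inclusive result right by one within each segment."""
    incl = segmented_inclusive_scan(values, heads, op)
    return [identity if heads[i] else incl[i - 1] for i in range(len(values))]


def segmented_reduce(values, heads, op):
    """One result per segment: the inclusive scan value at each segment's last element."""
    incl = segmented_inclusive_scan(values, heads, op)
    n = len(values)
    return [incl[i] for i in range(n) if i == n - 1 or heads[i + 1]]
```

In the composite/filter pipeline, the product scan would run over $1-\alpha$ with one segment per subsample (its exclusive result is $G_k$), and the sum reduction over the premultiplied terms with one segment per pixel.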

Page 19: Fragment-Parallel Composite and Filter

Implementation
• Hardware used: NVIDIA GeForce GTX 280
• We require fast segmented scan and reduce
  – CUDPP library provides that
  – Restricts implementation to NVIDIA CUDA
• No direct access to hardware rasterizer
  – We wrote our own

Page 20: Fragment-Parallel Composite and Filter

Example System – Polygons
• Applications
  – Games
• Depth Complexity
  – 1 to a few tens of layers
  – Suited to pixel-parallel
• Fragment-parallel software rasterizer

Page 21: Fragment-Parallel Composite and Filter

Example System – Particles
• Applications
  – Simulations, games
• Depth Complexity
  – Hundreds of layers
  – High depth variance
• Particle-parallel sprite rasterizer

Page 22: Fragment-Parallel Composite and Filter

Example System – Volumes
• Applications
  – Scientific visualization
• Depth Complexity
  – Tens to hundreds of layers
  – Low depth variance
• Major-axis-slice rasterizer

Page 23: Fragment-Parallel Composite and Filter

Example System – Reyes
• Applications
  – Offline rendering
• Depth Complexity
  – Tens of layers
  – Moderate depth variance
• Data-parallel micropolygon rasterizer

Page 24: Fragment-Parallel Composite and Filter

Performance Results

[Figure: Rendering time (ms, 0 to 600) for the Particles, Volume, Reyes (grass), and Polygon scenes, broken into fragment generation, pixel-parallel composite/filter, and fragment-parallel composite/filter]

Page 25: Fragment-Parallel Composite and Filter

Performance Variation

[Figure: Fragments per second (log scale, 10^5 to 10^8) versus depth complexity (0 to 1600) for fragment-parallel and pixel-parallel composite/filter]

Page 26: Fragment-Parallel Composite and Filter

Limitations
• Increased memory traffic
  – Several passes through CUDPP primitives
• Unclear how to optimize for special cases
  – Threshold opacity
  – Threshold depth complexity

Page 27: Fragment-Parallel Composite and Filter

Summary and Conclusion
• Parallel formulation of the composite equation
  – Maps well to known primitives
  – Can be integrated with filter
  – Consistent performance across varying workloads
• FPC (fragment-parallel composite) is applicable to future rendering pipelines
  – Exploits a higher degree of parallelism
  – Parallelism is better matched to the size of the rendering workload
• A tool for building programmable pipelines

Page 28: Fragment-Parallel Composite and Filter

Future Work
• Performance
  – Reduction in memory traffic
  – Extension to special-case scenes
  – Hybrid PPC-FPC (pixel-parallel/fragment-parallel composite) formulations
• Applications
  – Integration with hardware rasterizer
  – Cinematic rendering, Photoshop

Page 29: Fragment-Parallel Composite and Filter

Acknowledgments
• NSF Award 0541448
• SciDAC Institute for Ultrascale Visualization
• NVIDIA Research Fellowship
• Equipment donated by NVIDIA
• Discussions and feedback
  – Shubho Sengupta (UC Davis), Matt Pharr (Intel), Aaron Lefohn (Intel), Mike Houston (AMD)
  – Anonymous reviewers
• Implementation assistance
  – Jeff Stuart, Shubho Sengupta

Page 30: Fragment-Parallel Composite and Filter

Thanks!

