GPU-accelerated Path Rendering
Mark Kilgard & Jeff Bolz
NVIDIA Corporation
November 30, 2012
GPUs are good at a lot of stuff
Games
Battlefield 3, EA
Data visualization
Product design
Catia
Physics simulation
CUDA N-Body [Nyland et al., GPU Gems 3, 2007]
Interactive ray tracing
OptiX [Parker et al., SIGGRAPH 2010]
Game physics
PhysX [Tonge et al., SIGGRAPH 2012]
Molecular modeling
NCSA
Impressive stuff
What about advancing 2D graphics?
Can GPUs render & improve the immersive web?
Complete Web Pages Rendered via OpenGL without Pre-rendered Glyph Bitmaps and All on GPU
Not just zoomed & rotated, also perspective
No tricks
Every glyph is rendered from its outline; no render-to-texture
Magnify & minify with no transitional pixelization or tile popping artifacts
Synced to refresh rate; 60 Hz updates
Live demo!
Web page
Control points of TrueType glyphs visualized
Zoomed in
Projected
What is path rendering?
A rendering approach
– Resolution-independent two-dimensional graphics
– Occlusion & transparency depend on rendering order
– So called “Painter’s Algorithm”
– Basic primitive is a path to be filled or stroked
Path is a sequence of path commands
Commands are
– moveto, lineto, curveto, arcto, closepath, etc.
Standards
– Content: PostScript, PDF, TrueType fonts, Flash, Scalable Vector Graphics (SVG), HTML5 Canvas, Silverlight, Office drawings
– APIs: Apple Quartz 2D, Khronos OpenVG, Microsoft Direct2D, Cairo, Skia, Qt::QPainter, Anti-Grain Geometry
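As a rough illustration of "a path is a sequence of path commands", the sketch below stores a path as a command list plus a flat coordinate array. The `Path` class, the command table, and the restriction to four absolute commands are assumptions made for this example, not the layout used by any particular standard or by NV_path_rendering.

```python
from dataclasses import dataclass
from typing import List

# Scalar coordinates consumed by each command (absolute forms only).
# arcto is left out to keep the sketch short.
COORDS_PER_COMMAND = {
    "moveto": 2,
    "lineto": 2,
    "curveto": 6,    # cubic Bezier: two control points plus an end point
    "closepath": 0,
}


@dataclass
class Path:
    commands: List[str]
    coords: List[float]

    def validate(self) -> None:
        """Check that the coordinate array matches what the commands consume."""
        expected = sum(COORDS_PER_COMMAND[c] for c in self.commands)
        if expected != len(self.coords):
            raise ValueError(f"expected {expected} coordinates, got {len(self.coords)}")

    def segments(self):
        """Yield (command, coordinate tuple) pairs in drawing order."""
        i = 0
        for c in self.commands:
            n = COORDS_PER_COMMAND[c]
            yield c, tuple(self.coords[i:i + n])
            i += n


# A filled triangle: 4 commands, 6 scalar coordinates.
triangle = Path(
    commands=["moveto", "lineto", "lineto", "closepath"],
    coords=[0.0, 0.0, 10.0, 0.0, 5.0, 8.0],
)
triangle.validate()
```

Counting commands and scalar coordinates this way gives the same kind of scene statistics quoted on later slides (paths, commands, coordinates).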
Path Rendering Standards
Document Printing and Exchange
Immersive Web Experience
2D Graphics Programming Interfaces
Office Productivity Applications
Resolution-Independent Fonts
OpenType
TrueType
Flash
Open XML Paper (XPS)
Java 2D API
Mac OS X 2D API
Khronos API
Adobe Illustrator
Inkscape Open Source
Scalable Vector Graphics
QtGui API
HTML 5
Seminal Path Rendering Paper
John Warnock & Douglas Wyatt, Xerox PARC
Presented SIGGRAPH 1982
Warnock founded Adobe months later
John Warnock, Adobe founder
Reasons to GPU-accelerate Path Rendering
Power wall
More functionality with less latency…
…with less power
Increasing screen resolutions
Multi-touch
Increasing screen densities
Immersive 2D web content
Live Demo
New York Times rendered from its resolution-independent form
Flash content
Classic PostScript content
Complex text rendering
Live demo!
Dragon, and zoomed dragon
3D dice, but really 2D + gradients
Dashed stroking
Complex gradient content
Gradients with blending
Maps with text
Last Year’s SIGGRAPH Results in Real-time
“Digital Micrography” Ron Maharik, Mikhail Bessmeltsev, Alla Sheffer, Ariel Shamir, and Nathan Carr
SIGGRAPH 2011
“Girl with Words in Her Hair” scene
591 paths
338,507 commands
1,244,474 scalar coordinates
Our Contributions
A novel “stencil, then cover” programming interface for path rendering, well-suited to acceleration by GPUs
Our NV_path_rendering API
Our programming interface’s efficient implementation within OpenGL to avoid CPU bottlenecks
Productized, shipping in GeForce/Quadro drivers
Accompanying algorithms to handle
– tessellation-free stenciled stroking of paths
– standard stroking embellishments such as dashing
– clipping paths to arbitrary paths
– mixing 3D and path rendering
Notable Prior Art
Loop & Blinn 2005: Resolution independent curve rendering using programmable graphics hardware
Kokojima, et al. 2006: Resolution independent rendering of deformable vector objects using graphics hardware
Rueda, et al. 2008: GPU-based rendering of curved polygons using simplicial coverings
CPU vs. GPU at Rendering Tasks over Time
Goal of our research is to make path rendering a GPU task
Render all interactive pixels, whether 3D or 2D or web content with the GPU
[Figure: two charts, “Pipelined 3D Interactive Rendering” and “Path Rendering”, each plotting the CPU and GPU share of the task (0% to 100%) by year, 1998 to 2012.]
Our Approach
“Stencil, then Cover” (StC)
Map the path rendering task from a sequential algorithm…
…to a pipelined and massively parallel task
Break path rendering into two steps
– First, “stencil” the path’s coverage into stencil buffer
– Second, conservatively “cover” path
  Test against path coverage determined in the 1st step
  Shade the path
  And reset the stencil value to render next path
Step 1: Stencil
Step 2: Cover
Repeat
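The two steps can be sketched on the CPU with a small pixel grid. This is only a toy model under stated assumptions: one sample per pixel, polygonal paths, an even-odd coverage test standing in for the GPU's stencil step, and a bounding box as the conservative cover geometry. The function names are made up for the example and are not NV_path_rendering calls.

```python
import math


def point_in_polygon(x, y, poly):
    """Even-odd test: count edge crossings of a ray going in +x from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:
                inside = not inside
    return inside


def stencil_then_cover(color, stencil, poly, paint):
    height, width = len(color), len(color[0])
    # Step 1: stencil. Record coverage only; no color is written.
    for py in range(height):
        for px in range(width):
            if point_in_polygon(px + 0.5, py + 0.5, poly):
                stencil[py][px] = 1
    # Step 2: cover. Visit a conservative bounding box, test the stencil,
    # shade where the test passes, and reset the stencil for the next path.
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    x_lo, x_hi = max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)))
    y_lo, y_hi = max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))
    for py in range(y_lo, y_hi):
        for px in range(x_lo, x_hi):
            if stencil[py][px]:
                color[py][px] = paint
                stencil[py][px] = 0


# Two overlapping squares drawn in painter's order on an 8x8 grid.
color = [["." for _ in range(8)] for _ in range(8)]
stencil = [[0 for _ in range(8)] for _ in range(8)]
stencil_then_cover(color, stencil, [(1, 1), (5, 1), (5, 5), (1, 5)], "A")
stencil_then_cover(color, stencil, [(3, 3), (7, 3), (7, 7), (3, 7)], "B")
```

Because the cover step clears every stencil value it consumed, the buffer is back to all zeros after each path, which is what lets the next path reuse it without a full clear.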
Our Implemented System: NV_path_rendering
OpenGL extension to GPU-accelerate path rendering
Uses “stencil, then cover” (StC) approach via OpenGL calls
– Create a path object
– Step 1: “Stencil” the path object into the stencil buffer
  GPU provides fast stenciling of filled or stroked paths
– Step 2: “Cover” the path object and stencil test against its coverage stenciled by the prior step
  Application can configure arbitrary shading during the step
– More details later
Supports the union of functionality of all major path rendering standards
– Includes all stroking embellishments
– Includes first-class text and font support
– Allows functionality to mix with traditional 3D and programmable shading
[Diagram: OpenGL pipeline with the path pipeline shown alongside the vertex pipeline and pixel pipeline.]
Stages labeled: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Raster operations, Framebuffer, Display, Texture memory, Pixel assembly (unpack), Pixel operations, Pixel pack, transform feedback, readback
Path pipeline: Path specification, Transform path, Fill/Stroke Stenciling, Fill/Stroke Covering
Stencil Fill Process Visualized
Visualization of “invisible” stencil-only geometry generated during stencil step
Net result of stencil increments and decrements is path’s winding number
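The idea that increments and decrements sum to the winding number can be checked with a signed crossing count. The sketch below is a CPU analogue for polygonal contours, not the stencil geometry the GPU actually draws; `winding_number` and the sample contours are illustrative names chosen here.

```python
def winding_number(x, y, contours):
    """Sum signed crossings of a +x ray from (x, y) over all closed contours."""
    winding = 0
    for poly in contours:
        n = len(poly)
        for i in range(n):
            (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
            if (y0 > y) == (y1 > y):
                continue  # edge does not straddle the ray's scanline
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross > x:
                # Edges heading toward +y increment, edges heading toward -y decrement.
                winding += 1 if y1 > y0 else -1
    return winding


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
inner_same = [(3, 3), (7, 3), (7, 7), (3, 7)]        # same direction as outer
inner_reversed = [(3, 3), (3, 7), (7, 7), (7, 3)]    # opposite direction

both = winding_number(5, 5, [outer, inner_same])      # contours reinforce
hole = winding_number(5, 5, [outer, inner_reversed])  # contours cancel
```

With the same-direction inner contour the center has winding number 2, so a non-zero fill rule fills it while an even-odd rule does not; with the reversed contour the count is 0 and both rules leave a hole.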
Cover Fill Geometry Visualized
Stroking Approach
Stroked line segments are straightforward
– Drawn as rectangles into the stencil buffer
Curved stroked segments are involved
– Curved segments are broken into stroked quadratic segments
– Hulls are formed around each stroked quadratic segment
– An intricate fragment discard shader solves the cubic equation for every sample to determine the sample’s containment in the quadratic stroke segment
– If contained, the sample’s stencil value is updated
Caps & joins are also drawn into the stencil buffer
Covering geometry is computed as union of rectangles, hulls, and cap/join geometry
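Why a cubic shows up: for a quadratic Bezier C(t), the points where the line from a sample Q meets the curve at a right angle satisfy (C(t) - Q) · C'(t) = 0, which is cubic in t. The sketch below is a plain CPU version of that test, written as an assumption about the general technique rather than a transcription of the shader; caps and joins are ignored, and the helper names are invented for the example.

```python
import math


def real_cubic_roots(a, b, c, d, eps=1e-12):
    """Real roots of a*t^3 + b*t^2 + c*t + d = 0 (lower degrees handled too)."""
    if abs(a) < eps:
        if abs(b) < eps:
            return [] if abs(c) < eps else [-d / c]
        disc = c * c - 4.0 * b * d
        if disc < 0.0:
            return []
        s = math.sqrt(disc)
        return [(-c + s) / (2.0 * b), (-c - s) / (2.0 * b)]
    # Depressed cubic x^3 + p*x + q = 0 with t = x - b / (3a).
    shift = b / (3.0 * a)
    p = (3.0 * a * c - b * b) / (3.0 * a * a)
    q = (2.0 * b ** 3 - 9.0 * a * b * c + 27.0 * a * a * d) / (27.0 * a ** 3)
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc > 0.0:  # one real root (Cardano)
        s = math.sqrt(disc)
        cbrt = lambda v: math.copysign(abs(v) ** (1.0 / 3.0), v)
        return [cbrt(-q / 2.0 + s) + cbrt(-q / 2.0 - s) - shift]
    if abs(p) < eps:  # triple root
        return [-shift]
    # Three real roots (trigonometric form); p < 0 here.
    r = 2.0 * math.sqrt(-p / 3.0)
    phi = math.acos(max(-1.0, min(1.0, 3.0 * q / (p * r)))) / 3.0
    return [r * math.cos(phi - 2.0 * math.pi * k / 3.0) - shift for k in range(3)]


def in_quadratic_stroke(q, p0, p1, p2, width):
    """True if q is within width/2 of the curve, measured along a curve normal."""
    ax, ay = p1[0] - p0[0], p1[1] - p0[1]                          # A = P1 - P0
    bx, by = p0[0] - 2 * p1[0] + p2[0], p0[1] - 2 * p1[1] + p2[1]  # B = P0 - 2*P1 + P2
    mx, my = p0[0] - q[0], p0[1] - q[1]                            # M = P0 - Q
    # With C(t) = P0 + 2tA + t^2 B, (C(t) - Q) . C'(t) = 0 expands to:
    roots = real_cubic_roots(
        bx * bx + by * by,
        3.0 * (ax * bx + ay * by),
        2.0 * (ax * ax + ay * ay) + (mx * bx + my * by),
        mx * ax + my * ay,
    )
    half = width / 2.0
    for t in roots:
        if -1e-9 <= t <= 1.0 + 1e-9:  # only feet that land on the segment itself
            cx = p0[0] + 2.0 * t * ax + t * t * bx
            cy = p0[1] + 2.0 * t * ay + t * t * by
            if math.hypot(cx - q[0], cy - q[1]) <= half:
                return True
    return False
```

For the arch with control points (0, 0), (1, 1), (2, 0) and stroke width 0.3, the sample (1, 0.6) sits 0.1 above the apex and is contained, while (1, 1.0) is not. A sample just before the start point, such as (-0.1, -0.1), is rejected because its perpendicular foot falls at t < 0; covering that region is the job of the cap geometry.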
Quadratic Stroking Hulls Visualized
Simple quadratic Bezier segment, moving control points
Drawn with stroking
Non-convex hull used for the stroking stencil step is visualized
Intricate Path’s Stroking Example
Zoomed stroking
Same zoom: Stencil hull geometry
Join style geometry
Excellent Geometric Fidelity for Stroking
Correct stroking is hard
Lots of CPU implementations approximate stroking
GPU-accelerated stroking avoids such short-cuts
GPU has FLOPS to compute true stroke point containment
GPU-accelerated
OpenVG reference
Cairo
Qt
Stroking with tight end-point curve
Combined for Complex Scenes With Many Paths
Stencil Fill Geometry Cover Fill Geometry Filling-only Result
Stencil Stroke Geometry Cover Stroke Geometry Stroking-only Result
Complete Tiger
240 paths
2,510 commands
12,174 coordinates
NV_path_rendering Compared to Alternatives
Alternative APIs rendering same content
[Chart: frames per second (0 to 2,000) vs. window resolution in pixels (100x100 to 1100x1100). Series: Cairo, Qt, Skia Bitmap, Skia Ganesh FBO (16x), Skia Ganesh Aliased (1x), Direct2D GPU, Direct2D WARP.]
With Release 300 driver NV_path_rendering
[Chart: frames per second (0 to 2,000) vs. window resolution in pixels (100x100 to 1100x1100). Series: 16x, 8x, 4x, 2x, 1x.]
Configuration
GPU: GeForce GTX 480 (GF100)
CPU: Core i7 950 @ 3.07 GHz
Alternative approaches are all much slower
Detail on Alternatives
Same results, changed Y Axis
[Chart: frames per second (0 to 250) vs. window resolution in pixels (100x100 to 1100x1100). Series: Cairo, Qt, Skia Bitmap, Skia Ganesh FBO (16x), Skia Ganesh Aliased (1x), Direct2D GPU, Direct2D WARP.]
Alternative APIs rendering same content
[Chart: frames per second (0 to 2,000) vs. window resolution in pixels (100x100 to 1100x1100). Series: Cairo, Qt, Skia Bitmap, Skia Ganesh FBO (16x), Skia Ganesh Aliased (1x), Direct2D GPU, Direct2D WARP. Annotation: “Fast, but unacceptable quality”.]
Across a range of scenes…
Release 300 GeForce GTX 480 Speedups over Alternatives
[Chart: speedup factor on a logarithmic Y axis (0.10 to 1,000) vs. window resolution in pixels (100x100 to 1100x1100), repeated for each of 12 scenes.]
Scenes: tiger, Welsh_dragon, Celtic_round_dogs, butterfly, spikes, American_Samoa, cowboy, Buonaparte, Embrace_the_World, Yokozawa, Cougar, tiger_clipped_by_heart
Series: NVpr16/Cairo, NVpr16/SkiaBitmap, NVpr16/SkiaGanesh, NVpr16/Direct2D GPU, NVpr16/Direct2D WARP
Y axis is logarithmic: shows how many TIMES faster NV_path_rendering is than each competitor
Partial Solutions Not Enough
Path rendering has 30 years of heritage and history
Can’t do a 90% solution and expect software to change
Trying to “mix” CPU and GPU methods doesn’t work
Expensive to move software—needs to be an unambiguous win
Must surpass CPU approaches on all fronts
Performance
Quality
Functionality
Conformance to standards
More power efficient
Enable new applications
John Warnock, Adobe founder
Inspiration: Perceptive Pixel
Dashing Content Examples
Dashing character outlines for quilted look
Frosting on cake is dashed elliptical arcs with round end caps for “beaded” look; flowers are also dashing
Same cake missing dashed stroking details
Artist made windows with dashed line segment
Technical diagrams and charts often employ dashing
All content shown is fully GPU rendered
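Dashing amounts to switching the stroke on and off along the path's arc length. The sketch below computes the "on" intervals for a dash array in the style used by PostScript and SVG (alternating dash and gap lengths, with an optional starting offset). It is a simplified illustration: `dash_intervals` is a name made up here, and mapping the intervals back onto curves and applying caps is not shown.

```python
def dash_intervals(length, dash_array, offset=0.0):
    """Return (start, end) arc-length intervals where the stroke is drawn."""
    if not dash_array or any(d < 0 for d in dash_array) or sum(dash_array) <= 0:
        return [(0.0, length)]  # no usable pattern: solid stroke
    # An odd-length array is repeated once so dashes and gaps keep alternating.
    pattern = list(dash_array) * (2 if len(dash_array) % 2 else 1)
    period = sum(pattern)
    pos = 0.0 - (offset % period)  # start `offset` units into the pattern
    intervals = []
    i = 0
    while pos < length:
        end = pos + pattern[i % len(pattern)]
        # Even entries are dashes, odd entries are gaps.
        if i % 2 == 0 and end > pos and end > 0.0:
            intervals.append((max(pos, 0.0), min(end, length)))
        pos = end
        i += 1
    return intervals


plain = dash_intervals(10, [3, 2])              # [(0, 3), (5, 8)]
shifted = dash_intervals(10, [3, 2], offset=1)  # [(0, 2), (4, 7), (9, 10)]
```

The offset case shows the first dash starting part-way through the pattern and the last dash being cut off at the end of the path.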
First-class, Resolution-independent Font Support
Fonts are a standard, first-class part of all path rendering systems
– Foreign to 3D graphics systems such as OpenGL and Direct3D, but natural for path rendering
– Because letter forms in fonts have outlines defined with paths
  TrueType, PostScript, and OpenType fonts all use outlines to specify glyphs
NV_path_rendering makes font support easy
– Can specify a range of path objects with
  A specified font
  Sequence or range of Unicode character points
No requirement for applications to use a font API to load glyphs
– You can also load glyphs “manually” from your own glyph outlines
Rendering Paths Clipped to Some Other Arbitrary Path
Example clipping the PostScript tiger to a heart constructed from two cubic Bezier curves
unclipped tiger
tiger with pink background clipped to heart
Complex Clipping Example
cowboy clip is the union of 1,366 paths
tiger is 240 paths
result of clipping tiger to the union of all the cowboy paths
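One way to picture clipping a path to another path with a stencil buffer is to keep the clip path's coverage in a reserved stencil bit and only shade where both that bit and the drawn path's coverage are set. The sketch below is a toy CPU model under that assumption; the choice of the top bit, the even-odd test, and the function names are illustrative and not taken from the slides.

```python
CLIP_BIT = 0x80    # assumption: top stencil bit holds the clip path's coverage
COVER_MASK = 0x7F  # remaining bits hold the coverage of the path being drawn


def inside(x, y, poly):
    """Even-odd point-in-polygon test using a ray in +x."""
    hit = False
    n = len(poly)
    for i in range(n):
        (x0, y0), (x1, y1) = poly[i], poly[(i + 1) % n]
        if (y0 > y) != (y1 > y) and x0 + (y - y0) * (x1 - x0) / (y1 - y0) > x:
            hit = not hit
    return hit


def draw_clipped(width, height, clip_poly, draw_poly, paint, background="."):
    stencil = [[0] * width for _ in range(height)]
    color = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5
            if inside(cx, cy, clip_poly):
                stencil[y][x] |= CLIP_BIT  # stencil the clip path
            if inside(cx, cy, draw_poly):
                stencil[y][x] |= 1         # stencil the path to draw
    for y in range(height):
        for x in range(width):
            s = stencil[y][x]
            if (s & CLIP_BIT) and (s & COVER_MASK):
                color[y][x] = paint        # cover: shade only inside both
            stencil[y][x] = s & CLIP_BIT   # keep the clip bit for later paths
    return color, stencil


clip_square = [(0, 0), (4, 0), (4, 4), (0, 4)]
draw_square = [(2, 2), (6, 2), (6, 6), (2, 6)]
image, stencil_after = draw_clipped(8, 8, clip_square, draw_square, "T")
```

Leaving the clip bit set after the cover step means many paths, such as the 240 tiger paths, can be drawn against the same clip without stenciling the clip path again.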
NV_path_rendering is more than just matching CPU vector graphics
3D and vector graphics mix
2D in perspective is free
Superior quality
Arbitrary programmable shader on paths: bump mapping
GPU
CPU competitors
Mixing 3D Depth Buffering and Path Rendering
PostScript tigers surrounding Utah teapot
Plus overlaid TrueType font rendering
No textures involved, no multi-pass
Live demo!
Very fast
Teapots + tigers in same 3D scene
Zoom on tigers
All the detail is there
Solid or wireframe teapots
Handling Uncommon Path Rendering Functionality: Projection
Projection “just works”
Because GPU does everything with perspective-correct interpolation
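Perspective-correct interpolation has a compact form: interpolate attribute/w and 1/w linearly in screen space, then divide. The sketch below shows that arithmetic together with a 3x3 projective transform of a 2D point. Both helpers are generic illustrations written for this example, not NV_path_rendering entry points.

```python
def project(m, x, y):
    """Apply a 3x3 projective matrix (row-major nested lists) to a 2D point."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w, w


def perspective_correct(attr0, w0, attr1, w1, s):
    """Interpolate an attribute at screen-space fraction s between two vertices."""
    inv_w = (1.0 - s) / w0 + s / w1
    attr_over_w = (1.0 - s) * attr0 / w0 + s * attr1 / w1
    return attr_over_w / inv_w


# A matrix whose bottom row makes w grow with y, so distant points shrink.
tilt = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.5, 1.0]]
sx, sy, w = project(tilt, 2.0, 2.0)

# Halfway across the screen between a near vertex (w = 1) and a far one (w = 3).
mid = perspective_correct(0.0, 1.0, 1.0, 3.0, 0.5)
```

Here `mid` comes out as 0.25 rather than the 0.5 that plain linear interpolation would give, because the screen-space midpoint lies closer to the near vertex in the path's own coordinate space.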
Example of Bump Mapping on Path Rendered Text
Phrase “Brick wall!” is path rendered and bump mapped with a Cg fragment shader
light source position
Handling Common Path Rendering Functionality: Filtering
GPUs are highly efficient at image filtering
– Fast texture mapping
– Mipmapping
– Anisotropic filtering
– Wrap modes
CPUs aren't, really
GPU
Qt
Cairo
Moiré artifacts
Anti-aliasing Discussion
Good anti-aliasing is a big deal for path rendering
– Particularly true for font rendering of small point sizes
– Features of glyphs are often on the scale of a pixel or less
NV_path_rendering uses multiple stencil samples per pixel for reasonable antialiasing
– Otherwise, image quality is poor
– 4 samples/pixel bare minimum
– 8 or 16 samples/pixel is pretty sufficient
  But 16 requires expensive 2x2 supersampling of 4x multisampling
  16x is quite memory intensive
Alternative: quality vs. performance tradeoff
– Fast enough to render multiple passes to improve quality
– Approaches
  Accumulation buffer
  Alpha accumulation
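The effect of the sample count can be seen by estimating how much of one pixel a shape covers. The sketch below counts hits on a regular grid of sample points, which is a simplification: real multisample patterns are not regular grids, and `coverage` is a helper invented for this example.

```python
def coverage(inside, px, py, grid=4):
    """Fraction of grid*grid sample points in pixel (px, py) that `inside` accepts."""
    hits = 0
    for j in range(grid):
        for i in range(grid):
            sx = px + (i + 0.5) / grid
            sy = py + (j + 0.5) / grid
            if inside(sx, sy):
                hits += 1
    return hits / (grid * grid)


# A vertical edge that covers the left 30% of pixel (0, 0).
def left_of_edge(x, y):
    return x < 0.3


one_sample = coverage(left_of_edge, 0, 0, grid=1)      # 1 sample/pixel
four_samples = coverage(left_of_edge, 0, 0, grid=2)    # 4 samples/pixel
sixteen_samples = coverage(left_of_edge, 0, 0, grid=4) # 16 samples/pixel
```

With one sample the pixel is all or nothing (0.0 here), four samples give 0.5, and sixteen give 0.25, which is the closest of the three to the true 0.3. The 16-sample case corresponds to the slide's 2x2 supersampling of 4x multisampling, since 2 × 2 × 4 = 16.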
Real Flash scene
Conflation artifacts abound, rendered by Skia
Same scene, GPU-rendered without conflation
Conflation is aliasing & edge coverage percents are unpredictable in general; means conflated pixels flicker when animated slowly
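A small worked case shows where conflation comes from. Assume a white pixel split exactly in half by the shared edge of two black shapes. A renderer that converts each shape's coverage to an alpha value and blends lets a quarter of the background leak through, while per-sample coverage does not. The functions below are illustrative only and use a single grayscale channel.

```python
def blend_over(dst, src, alpha):
    """Standard 'over' blend of src onto dst with the given alpha."""
    return src * alpha + dst * (1.0 - alpha)


def conflated_pixel(background, shapes):
    """shapes: list of (color, coverage); coverage is blended as if it were alpha."""
    value = background
    for color, cov in shapes:
        value = blend_over(value, color, cov)
    return value


def sampled_pixel(background, shapes, samples):
    """shapes: list of (color, set of covered sample indices); samples stay separate."""
    values = [background] * samples
    for color, covered in shapes:
        for i in covered:
            values[i] = color
    return sum(values) / samples


WHITE, BLACK = 1.0, 0.0
leak = conflated_pixel(WHITE, [(BLACK, 0.5), (BLACK, 0.5)])
exact = sampled_pixel(WHITE, [(BLACK, {0, 1}), (BLACK, {2, 3})], samples=4)
```

The conflated result is 0.25 (a visible light seam along the shared edge) and the per-sample result is 0.0, since the two shapes together cover every sample.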
Improved Color Space: sRGB Path Rendering
Modern GPUs have native, perceptually-correct support for
– sRGB framebuffer blending
– sRGB texture filtering
No reason to tolerate uncorrected linear RGB color artifacts!
More intuitive for artists to control
Negligible expense for GPU to perform sRGB-correct rendering
– However quite expensive for software path renderers to perform sRGB rendering
– Not done in practice
Linear RGB: transition between saturated red and saturated blue has dark purple region
sRGB: perceptually smooth transition from saturated red to saturated blue
Radial color gradient example moving from saturated red to blue
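The red-to-blue midpoint can be computed both ways. The sketch below uses the standard sRGB transfer functions; "uncorrected" means mixing the stored 8-bit values directly (the case the slide labels linear RGB), while "sRGB-correct" decodes to linear light, mixes, and re-encodes. The function names are made up for this example.

```python
def srgb_to_linear(c):
    """Decode an sRGB-encoded value in [0, 1] to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c):
    """Encode a linear-light value in [0, 1] as sRGB."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1.0 / 2.4) - 0.055


def to_byte(v):
    """Round a value in [0, 1] to the nearest 8-bit level."""
    return int(v * 255.0 + 0.5)


def mix_uncorrected(a, b, t):
    """Mix 8-bit colors directly on their stored values."""
    return tuple(to_byte(((1.0 - t) * x + t * y) / 255.0) for x, y in zip(a, b))


def mix_srgb_correct(a, b, t):
    """Decode to linear light, mix, then re-encode as sRGB."""
    out = []
    for x, y in zip(a, b):
        lin = (1.0 - t) * srgb_to_linear(x / 255.0) + t * srgb_to_linear(y / 255.0)
        out.append(to_byte(linear_to_srgb(lin)))
    return tuple(out)


RED, BLUE = (255, 0, 0), (0, 0, 255)
dark_mid = mix_uncorrected(RED, BLUE, 0.5)     # (128, 0, 128)
smooth_mid = mix_srgb_correct(RED, BLUE, 0.5)  # (188, 0, 188)
```

The uncorrected midpoint (128, 0, 128) is the dark purple band; the sRGB-correct midpoint (188, 0, 188) is noticeably brighter, which is why the corrected gradient looks smoother.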
Trying Out NV_path_rendering
Operating system support
– Windows 2000, XP, Vista, and 7; Linux, FreeBSD, and Solaris
– Unfortunately no Mac support
GPU support
– GeForce 8 and up (easy rule: all CUDA-capable GPUs)
– Most efficient on Fermi and Kepler GPUs
– Current performance can be expected to improve
Shipping since NVIDIA’s Release 275 drivers
– Available since summer 2011
New Release 300+ drivers have remarkable NV_path_rendering performance
Try it, you’ll like it
There’s an SDK freely available with example code! https://developer.nvidia.com/nv-path-rendering
Future Work
Using NV_path_rendering in actual web and 2D applications
Standardizing the programming interface
Moving these algorithms to mobile devices
Path rendering test bed on Nexus 7