GPU Shading and Rendering
Shading Technology
8:30 Introduction (:30–Olano)
9:00 Direct3D 10 (:45–Blythe)
Languages, Systems and Demos
10:30 RapidMind (:50–McCool)
11:20 OpenGL Shading Language (:50–Olano)
1:45 Cg / NVIDIA (:50–Kilgard)
2:35 HLSL / ATI (:50–Scheuermann)
GPUs in Production Rendering
3:45 GPU Production Animation (:45–Wexler)
4:30 Interactive Cinematic Shading. Where are we? (:45–Pellacini)
Wrap Up
5:15 Discussion and Q&A (:15–All)
GPU Shading and Rendering: Introduction
Marc Olano
UMBC
America's Army
GPU
• GPU: Graphics Processing Unit
– Designed for real-time graphics
– Present in almost every PC
– Increasing realism and complexity
GPU computation
[Diagram: CPU, Vertex, Geometry and Fragment stages leading to Displayed Pixels, with Texture / Buffer access]
Low-level code
!!ARBvp1.0
# Transform the normal to view space
TEMP Nv,Np;
DP3 Nv.x,state.matrix.modelview.invtrans.row[0],vertex.normal;
DP3 Nv.y,state.matrix.modelview.invtrans.row[1],vertex.normal;
DP3 Nv.z,state.matrix.modelview.invtrans.row[2],vertex.normal;
MAD Np,Nv,{.9,.9,.9,0},{0,0,0,1};

# screen position from vertex
TEMP Vp;
DP4 Vp.x, state.matrix.mvp.row[0], vertex.position;
DP4 Vp.y, state.matrix.mvp.row[1], vertex.position;
DP4 Vp.z, state.matrix.mvp.row[2], vertex.position;
DP4 Vp.w, state.matrix.mvp.row[3], vertex.position;
[…]
# interpolate
MAD Np, Np, -vertex.color.x, Np;
MAD result.position, Vp, vertex.color.x, Np;
END
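For readers unfamiliar with the assembly mnemonics, the following Python sketch models what DP3, DP4 and MAD compute. It is a CPU-side illustration only: registers are modeled as 4-tuples, the function names are assumptions made for this example, and the real DP3/DP4 replicate their scalar result to all components before the write mask (.x, .y, ...) selects one.

```python
# CPU-side model of three ARB vertex program instructions.
# Registers are 4-tuples (x, y, z, w); all names here are illustrative.

def dp3(a, b):
    """DP3: dot product of the first three components."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dp4(a, b):
    """DP4: dot product of all four components."""
    return sum(x * y for x, y in zip(a, b))

def mad(a, b, c):
    """MAD: component-wise multiply-add, a * b + c."""
    return tuple(x * y + z for x, y, z in zip(a, b, c))

def lerp_via_mad(Np, Vp, k):
    """Models the two-instruction 'interpolate' step of the listing.

    The scalar k stands in for vertex.color.x, which the hardware
    replicates to all four components.
    """
    kkkk = (k, k, k, k)
    neg_k = (-k, -k, -k, -k)
    t = mad(Np, neg_k, Np)    # Np * (1 - k)
    return mad(Vp, kkkk, t)   # Vp * k + Np * (1 - k)

# MAD Np,Nv,{.9,.9,.9,0},{0,0,0,1}: scale xyz by .9 and force w to 1.
Nv = (0.0, 0.0, 1.0, 5.0)
Np = mad(Nv, (0.9, 0.9, 0.9, 0.0), (0.0, 0.0, 0.0, 1.0))
```

Written this way, the last two MAD instructions of the listing can be read as a linear interpolation between Np and Vp controlled by the red channel of the vertex color.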
High-level code
void main() {
    vec4 Kin = gl_Color; // key input

    // screen position from vertex, texture and normal
    vec4 Vp = ftransform();
    vec4 Tp = vec4(gl_MultiTexCoord0.xy*1.8-.9, 0,1);
    vec4 Np = vec4(nn*.9,1);

    // interpolate between Vp, Tp and Np
    gl_Position = Vp;
    gl_Position = mix(Tp,gl_Position,pow(1.-Kin.x,8.));
    gl_Position = mix(Np,gl_Position,pow(1.-Kin.y,8.));

    // copy to output
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_TexCoord[1] = Vp;
    gl_TexCoord[3] = Kin;
}
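To make the blending logic of the shader easier to follow, here is a small Python model of its position computation. It assumes only the standard GLSL definition of mix(a, b, t) as a*(1-t) + b*t; the function blend_position and its argument names are made up for this sketch and do not appear in the slides.

```python
# CPU-side model of the position blend in the GLSL vertex shader.
# Vectors are tuples of floats; blend_position is a hypothetical helper.

def mix(a, b, t):
    """GLSL-style mix: returns a when t == 0 and b when t == 1."""
    return tuple(x * (1.0 - t) + y * t for x, y in zip(a, b))

def blend_position(Vp, Tp, Np, kx, ky):
    """Blend between screen (Vp), texture (Tp) and normal (Np) positions.

    kx and ky play the role of Kin.x and Kin.y. The weight (1 - k)**8
    stays near 0 for most k > 0, so the position moves quickly toward
    Tp (or Np) as the key input rises from 0.
    """
    pos = Vp
    pos = mix(Tp, pos, (1.0 - kx) ** 8)
    pos = mix(Np, pos, (1.0 - ky) ** 8)
    return pos

Vp = (1.0, 2.0, 3.0, 1.0)
Tp = (0.5, 0.5, 0.0, 1.0)
Np = (0.0, 0.9, 0.0, 1.0)
unchanged = blend_position(Vp, Tp, Np, 0.0, 0.0)   # both keys off
texture_space = blend_position(Vp, Tp, Np, 1.0, 0.0)
normal_space = blend_position(Vp, Tp, Np, 0.0, 1.0)
```

With both key inputs at 0 the vertex keeps its normal screen position; setting one key to 1 moves it fully into texture space or normal space.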
Non-real time vs. Real-time
• Non-real time
– Developed from general CPU code
– Seconds to hours per frame
– 1000s of lines
– “Unlimited” computation, texture, memory, …
• Real-time
– Developed from fixed-function hardware
– Tens of frames per second
– 1000s of instructions
– Limited computation, texture, memory, …
Non-real time vs. Real-time
• Non-real time stages: Application, Displacement, Surface, Light, Volume, Atmosphere, Imager, Displayed Pixels
• Real-time stages: Application, Vertex, Geometry, Fragment, Displayed Pixels (with Texture / Buffer access)
History (not real-time)
• Testbed [Whitted and Weimer 1981]
• Shade Trees [Cook 1984]
• Image Synthesizer [Perlin 1985]
• RenderMan [Hanrahan and Lawson 1990]
• Multi-pass RenderMan [Peercy et al. 2000]
• GPU acceleration [Wexler et al. 2005]
History (real-time)
• Custom HW [Olano and Lastra 1998]
• Multi-pass standard HW [Peercy et al. 2000]
• Register combiners [NVIDIA 2000]
• Vertex programs [Lindholm et al. 2001]
• Compiling to mixed HW [Proudfoot et al. 2001]
• Fragment programs
• Standardized languages
• Geometry shaders [Blythe 2006]
Choices
• OS: Windows, Mac, Linux
• API: DirectX, OpenGL
• Language: HLSL, GLSL, Cg, …
• Compiler: DirectX, OpenGL, Cg, ASHLI
• Runtime: CgFX, ASHLI, OSG (& others), sample code
Major Commonalities
• Vertex & Fragment/Pixel
• C-like, if/while/for
• Structs & arrays
• Float + small vector and matrix
– Swizzle & mask (a.xyz = b.xxw)
• Common math & shading functions
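The swizzle and mask notation (a.xyz = b.xxw) may be unfamiliar outside shading languages. The Python sketch below models it on plain lists: the right-hand side selects and reorders components (swizzle), and the left-hand side names which components are written (mask). The helper names swizzle and masked_write are assumptions for illustration, not functions from any shading language.

```python
# CPU-side model of swizzle (read side) and write mask (write side).
IDX = {"x": 0, "y": 1, "z": 2, "w": 3}

def swizzle(v, pattern):
    """Return the components of v named by pattern, e.g. 'xxw'."""
    return [v[IDX[c]] for c in pattern]

def masked_write(dst, mask, values):
    """Return a copy of dst with only the components in mask replaced."""
    out = list(dst)
    for c, val in zip(mask, values):
        out[IDX[c]] = val
    return out

a = [0.0, 0.0, 0.0, 9.0]
b = [1.0, 2.0, 3.0, 4.0]

# Equivalent of the shading-language statement: a.xyz = b.xxw
a = masked_write(a, "xyz", swizzle(b, "xxw"))
```

After the assignment, a.x and a.y both hold b.x, a.z holds b.w, and a.w is untouched because it is not in the mask.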
GPU Parallelism
• Pipeline: Vertex, Geometry and Fragment stages, with Texture / Buffer access
• SPMD parallel: fragment stream
• SIMD parallel: 2x2 blocks of fragments
• Pipeline (NVIDIA): Shader Unit, Branch Unit, Fog, Texture Unit, L1 Cache, L2 Cache
• Vector parallel, limited MIMD: multiple ALUs
Managing GPU Programming
• Simplified computational model
– Bonus: consistent as hardware changes
• All stages SIMD
– Explicit 4-element SIMD vectors
• Fixed conversion / remapping between each stage
[Diagram: Buffer, Vertex (stream), Geometry (stream), Fragment (array)]
Vertex
• One element in / one out
• NO communication
• Can select fragment address
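The vertex stage properties above can be sketched as a scatter operation. This Python model is an assumption-laden illustration, not GPU code: each input item is processed independently (one in, one out, no communication), and the program chooses the output address that the value lands on. The names scatter_points and place_by_key are hypothetical.

```python
# CPU-side model of the vertex stage as an independent map plus scatter.

def scatter_points(items, vertex_program, width, height):
    """Run vertex_program on each item and write its value at the
    (x, y) address the program selects."""
    image = [[0.0] * width for _ in range(height)]
    for item in items:                       # no item can see another item
        x, y, value = vertex_program(item)   # one element in, one out
        if 0 <= x < width and 0 <= y < height:
            image[y][x] = value              # address chosen by the program
    return image

def place_by_key(item):
    """Example program: place a value at an address derived from its key."""
    key, value = item
    return key % 4, key // 4, value

image = scatter_points([(5, 1.5), (0, 2.5)], place_by_key, 4, 2)
```

As long as no two items select the same address, the result does not depend on processing order, which is what lets the hardware run the items in parallel.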
Geometry
• More next (Blythe talk)
• One element in / 0 to ~100 out
– Limited by hardware buffer sizes
• Like vertex:
– NO communication
– Can select fragment address
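The "one element in / 0 to ~100 out" behavior of the geometry stage can be modeled as follows. This is a minimal Python sketch under stated assumptions: the output cap of 100 stands in for the hardware buffer limit mentioned above, and run_geometry_stage and cull_or_split are hypothetical names, not part of Direct3D or OpenGL.

```python
# CPU-side model of a geometry stage with a variable output count.
MAX_OUTPUT = 100  # illustrative stand-in for the hardware buffer limit

def run_geometry_stage(primitives, geometry_program, max_output=MAX_OUTPUT):
    """Apply geometry_program to each primitive independently and
    concatenate whatever each invocation emits."""
    out = []
    for prim in primitives:
        emitted = list(geometry_program(prim))
        if len(emitted) > max_output:
            raise ValueError("program emitted more than the buffer allows")
        out.extend(emitted)
    return out

def cull_or_split(segment):
    """Example program: drop zero-length segments, split the rest in two."""
    a, b = segment
    if a == b:
        return              # emit nothing: the primitive is culled
    mid = (a + b) / 2
    yield (a, mid)
    yield (mid, b)

segments = run_geometry_stage([(0.0, 2.0), (1.0, 1.0)], cull_or_split)
```

Here the first segment produces two outputs and the second produces none, so the output stream has a different length than the input stream.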
Fragment
• Biggest computational resource
• One element in / 0 – 1 out
• Cannot change destination address
– I am element x,y in an array, what is my value?
• Effectively no communication
• Conditionals expensive
– Better if branches are coherent within a block
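The fragment stage rules above can be sketched in Python. In this model, which is an illustration rather than real shader code, each fragment program is told its own (x, y), may read (gather) from any texture address, and either returns a value for that one address or returns None to produce no output. The second function counts 2x2 blocks in which fragments disagree about a branch, a rough proxy for where conditionals become expensive. All names are hypothetical.

```python
# CPU-side model of the fragment stage: fixed write address, free reads.

def run_fragment_stage(width, height, fragment_program, texture, background=0.0):
    image = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            value = fragment_program(x, y, texture)
            if value is not None:      # None models a discarded fragment
                image[y][x] = value    # destination is always (x, y)
    return image

def flip_and_threshold(x, y, texture):
    """Example program: gather from the mirrored column, discard dark texels."""
    w = len(texture[0])
    src = texture[y][w - 1 - x]        # arbitrary read address
    return src if src >= 0.5 else None

def divergent_blocks(branch_taken):
    """Count 2x2 blocks whose fragments take different sides of a branch.
    Assumes even width and height."""
    count = 0
    for y in range(0, len(branch_taken), 2):
        for x in range(0, len(branch_taken[0]), 2):
            sides = {branch_taken[y + dy][x + dx] for dy in (0, 1) for dx in (0, 1)}
            if len(sides) > 1:
                count += 1
    return count

result = run_fragment_stage(2, 2, flip_and_threshold, [[0.1, 0.9], [0.7, 0.2]])
coherent = [[True, True, False, False] for _ in range(4)]
checker = [[(x + y) % 2 == 0 for x in range(4)] for y in range(4)]
```

A branch pattern that follows large regions (coherent) leaves every block uniform, while a checkerboard pattern makes every block divergent.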
Program / Multiple Passes
• Communication
– None in one pass
– Arbitrary read addresses between passes
• Data layout
– No persistent per-processor memory
– No penalty to change
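One way to picture "no communication in one pass, arbitrary read addresses between passes" is a multi-pass reduction. The Python sketch below is an assumed example, not taken from the slides: within a pass, each output element is computed independently from the previous pass's buffer, and the only way results combine is by running another pass. The names reduce_pass and multipass_sum are hypothetical.

```python
# CPU-side model of a multi-pass sum: communication happens only
# through the buffer written by the previous pass.

def reduce_pass(src):
    """One pass: output element i reads src[2i] and src[2i+1].
    Each output is independent of the other outputs in this pass."""
    half = (len(src) + 1) // 2
    return [
        src[2 * i] + (src[2 * i + 1] if 2 * i + 1 < len(src) else 0.0)
        for i in range(half)
    ]

def multipass_sum(values):
    """Repeat passes until one element remains; return (sum, pass count)."""
    buf = list(values)
    if not buf:
        return 0.0, 0
    passes = 0
    while len(buf) > 1:
        buf = reduce_pass(buf)   # previous output becomes the next input
        passes += 1
    return buf[0], passes

total, passes = multipass_sum([1.0, 2.0, 3.0, 4.0, 5.0])
```

Five inputs need three passes here (5 to 3 to 2 to 1 elements), and each pass is free to use a different data layout because nothing persists on the processors between passes.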
Multiple passes
• GPGPU
• Non-local effects
– Shadow maps
– Texture space
• Precomputation
– Fix some degrees of freedom
– Factor into functions of 1 to 3 dimensions
– Project input or output into another space
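The precomputation idea can be illustrated with a one-dimensional lookup table. In this Python sketch, which is an assumed example rather than anything from the slides, one degree of freedom (a specular-style exponent of 32) is fixed, the remaining 1D function is tabulated in a precomputation pass, and later evaluation is a linearly interpolated table read, similar in spirit to a filtered texture fetch. The names build_table and lookup are hypothetical.

```python
# CPU-side model of precomputing a 1D function into a table.

def build_table(f, size):
    """Sample f at size evenly spaced points on [0, 1]. Requires size >= 2."""
    return [f(i / (size - 1)) for i in range(size)]

def lookup(table, t):
    """Linearly interpolated read of the table at t, clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(table) - 1)
    i = min(int(pos), len(table) - 2)
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac

# Fix the exponent at 32 (one degree of freedom removed), then tabulate.
table = build_table(lambda c: c ** 32, 256)
approx = lookup(table, 0.9)
exact = 0.9 ** 32
```

The table costs one pass to fill and replaces a power function with a single read per use; the trade-off is a small interpolation error and the loss of the fixed parameter as a runtime input.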