USING OPENGL AND DIRECTX FOR HETEROGENEOUS COMPUTE
KARL HILLESLAND
| PRESENTATION TITLE | DECEMBER 4, 2013 | CONFIDENTIAL 2
AGENDA
THE GRAPHICS PIPELINE
PROGRAMMING THE GPU
FEEDING THE GPU
The Graphics Pipeline
GRAPHICS PIPELINE SHADER CENTRIC
OpenGL                              DirectX
Vertex Puller                       Input Assembler
Vertex Shader                       Vertex Shader
Tessellation Control Shader         Hull Shader
Tessellation Primitive Generator    Tessellator
Tessellation Evaluation Shader      Domain Shader
Geometry Shader                     Geometry Shader
Rasterizer                          Rasterizer
Fragment Shader                     Pixel Shader
Per-Fragment Operations             Output Merger
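GRAPHICS PIPELINE MORE DETAILS
[Diagram: data flow through the front half of the pipeline]
Input Assembler: reads indices and vertices; emits vertices
Vertex Shader: thread per vertex; emits vertices
Hull Shader: collects patches (patch verts n1); thread per output control point (n2); patch constant; emits control points and tess factors
Tessellator: consumes tess factors; emits barycentric coordinates
Domain Shader: collects patches (patch verts n2); thread per DS vertex (n3); emits DS vertices
Geometry Shader: collects prims (prim verts); emits vertices
Primitive Assembler: emits prims (continued on the next slide)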
[Diagram: back half of the pipeline]
Rasterizer 1: takes 1 prim; Hi-Z/Stencil, using Hi-Z/Stencil info
Rasterizer 2: Early-Z/Stencil
Pixel Shader: collects quads
Reordering; Depth/Stencil; Blending
Unroller: unrolling, masking
Conversion
Not shown: Any shader stage can read/write to memory, including atomics, filtering*, decompression, and sRGB conversion
WHAT’S THE POINT?
- The graphics pipeline has a lot more parts
  - Reorganizes threads
  - Tracks dependencies
  - Reorders
  - Extra fixed-function units
- Are they usable?
GRAPHICS IN THE NINETIES
Input Assembler
Transform and Lighting
Rasterizer
Texturing and Fog
Output Merger
VORONOI DIAGRAMS
- Color according to closest
  - Point
  - Line
- Could be weighted
- Useful for
  - Collision Detection
  - Surface Reconstruction
  - Robot Motion Planning
  - Non-Photorealistic Rendering
  - Surface Simplification
  - Mesh Generation
GPGPU WITHOUT SHADERS
VORONOI DIAGRAMS IN THE NINETIES
2-part discrete Voronoi diagram representation
Distance: Depth Buffer
Site IDs: Color Buffer
Simply rasterize the cones using graphics hardware
Haeberli90, Woo97
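The two-buffer representation above can be sketched on the CPU. This is only an illustration under stated assumptions, not code from the talk: the names `Site`, `VoronoiBuffers`, and `rasterizeCones` are hypothetical, and "rasterizing a cone" is emulated by evaluating the distance to each site at every pixel and applying a less-than depth test.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for illustration; not taken from the talk.
struct Site { float x, y; };

struct VoronoiBuffers {
    int width, height;
    std::vector<float> depth;  // distance to the closest site seen so far
    std::vector<int> color;    // ID of that closest site
};

// Emulates drawing one distance cone per site with a "less than" depth test.
VoronoiBuffers rasterizeCones(const std::vector<Site>& sites, int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t pixelCount = static_cast<std::size_t>(width) * height;
    VoronoiBuffers buf{width, height,
                       std::vector<float>(pixelCount, std::numeric_limits<float>::infinity()),
                       std::vector<int>(pixelCount, -1)};
    for (int id = 0; id < static_cast<int>(sites.size()); ++id) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                // The cone's height at this pixel equals the distance to its apex (the site).
                const float d = std::hypot(x - sites[id].x, y - sites[id].y);
                const std::size_t i = static_cast<std::size_t>(y) * width + x;
                if (d < buf.depth[i]) {  // depth test
                    buf.depth[i] = d;    // depth write: new closest distance
                    buf.color[i] = id;   // color write: new closest site ID
                }
            }
        }
    }
    return buf;
}
```

After all sites are processed, `color` holds the discrete Voronoi diagram and `depth` holds the distance field. On ties, the site drawn first wins, which mirrors a strict less-than depth test.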
OPENGL 1 SIMD MACHINE (PEERCY ET AL., SIGGRAPH 2000)
SIMD Concept             OpenGL 1 SIMD
Instruction              OpenGL call (CPU)
SIMD Lane                Pixel
SIMD Lane Input Data     Texel
SIMD Lane Output Data    Fragment
ALU                      Blend Operation
Conditionals             Alpha and Stencil Tests
float y;
float4 contrived_example() {
    float x = f(u,v);
    if (x*y > 0) { x = x + g(u,v); }
    return x*h(u,v);
}
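One way to read the table above is that every statement of the contrived example becomes a full-screen pass, with the conditional turned into a mask. The following is a CPU sketch of that reading, not code from the talk: `Image`, `addPass`, `multiplyPass`, and `contrivedExample` are hypothetical names, and the inputs `f`, `g`, `h` stand in for the per-pixel results of `f(u,v)`, `g(u,v)`, and `h(u,v)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<float>;  // one value per pixel, i.e. per SIMD lane

// One "instruction": a full-screen pass that adds src into dst.
// Lanes whose stencil entry is false are masked out, like a failed stencil test.
void addPass(Image& dst, const Image& src, const std::vector<bool>& stencil) {
    assert(dst.size() == src.size() && dst.size() == stencil.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        if (stencil[i]) dst[i] += src[i];  // additive blend
}

// Another "instruction": a full-screen multiplicative blend.
void multiplyPass(Image& dst, const Image& src) {
    assert(dst.size() == src.size());
    for (std::size_t i = 0; i < dst.size(); ++i)
        dst[i] *= src[i];
}

// Runs the contrived example as a sequence of passes over all lanes.
Image contrivedExample(const Image& f, const Image& g, const Image& h, float y) {
    assert(f.size() == g.size() && f.size() == h.size());
    Image x = f;                          // pass 1: x = f(u,v)
    std::vector<bool> stencil(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        stencil[i] = (x[i] * y > 0.0f);   // pass 2: record the test result in the mask
    addPass(x, g, stencil);               // pass 3: x = x + g(u,v) where the mask is set
    multiplyPass(x, h);                   // pass 4: x * h(u,v)
    return x;
}
```

Every lane executes every pass; the mask only decides whether a lane's result is kept, which is the same trade-off a SIMD machine makes on a divergent branch.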
USING EARLY-Z OR STENCIL
Applications of Explicit Early-Z Culling, Real-Time Shading Course, SIGGRAPH 2004.
[Figure captions: Pressure buffer used for sim culling; Texture-space blur with back-face culling]
The graphics pipeline gives you access to more
What’s the Point?
Programming the GPU
SHADER TYPES
OpenGL                         D3D
Compute (4.3)                  Compute (11)
Vertex (2, ES 2)               Vertex (8)
Tessellation Control (4)       Hull (11)
Tessellation Evaluation (4)    Domain (11)
Geometry (3)                   Geometry (10)
Fragment (2, ES 2)             Pixel (9)
#version 430
in vec3 Position;
in vec2 UV;
out PosUV //Not available in GLES
{
    vec3 vPositionWS;
    vec2 vUV;
} vs_output;
uniform mat4x4 mMVP;
uniform mat4x4 mM;
void main(void)
{
    gl_Position = mMVP * vec4(Position, 1.0);
    // The product is a vec4; keep xyz for the vec3 world-space output.
    vs_output.vPositionWS = (mM * vec4(Position, 1.0)).xyz;
    vs_output.vUV = UV;
}
BASIC GLSL VERTEX SHADER
in PosUV //Block name must match the vertex shader's output block. Not available in GLES
{
vec3 vPositionWS;
vec2 vUV;
} fs_input;
uniform sampler2D sDiffuse;
out vec4 color_out;
void main(void)
{
color_out = texture( sDiffuse, fs_input.vUV );
}
BASIC GLSL PIXEL SHADER
struct PosUV
{
float4 vPositionSS : SV_POSITION;
float3 vPositionWS : POSITION;
float2 vUV : TEXCOORD0;
};
float4x4 mMVP;
float4x4 mM;
PosUV main(
float3 Position : POSITION,
float2 UV: TEXCOORD0)
{
PosUV vs_output;
vs_output.vPositionSS = mul(mMVP, float4(Position, 1.0));
vs_output.vPositionWS = mul(mM, float4(Position, 1.0)).xyz;
vs_output.vUV = UV;
return vs_output;
}
BASIC HLSL VERTEX SHADER
struct fsInput
{
float3 vPositionWS : POSITION;
float2 vUV : TEXCOORD0;
};
sampler sWrapTriLin;
texture2D <float4> tDiffuse;
float4 main(fsInput i) : SV_TARGET
{
return tDiffuse.Sample(sWrapTriLin, i.vUV);
}
BASIC HLSL PIXEL SHADER
layout (triangles) in;
layout (triangle_strip, max_vertices = 3) out;
void main(void)
{
for(int i=0; i < gl_in.length(); i++)
{
gl_Position = gl_in[i].gl_Position;
EmitVertex();
}
EndPrimitive();
}
BASIC GEOMETRY SHADER
TESSELLATION
D3D11: Hull Shader (with Patch Constant Func) -> Tessellator -> Domain Shader
OpenGL 4.0: Tessellation Control -> Tessellator -> Tessellation Evaluation
Tessellator inputs in both: tess factors, topology
TESSELLATION
D3D11 OpenGL 4.0
// Hull Shader
[outputcontrolpoints(4)]
[patchconstantfunc("ConstantsHS")]
[domain("quad")]
[partitioning("integer")]
[outputtopology("triangle_cw")]
HS_OUTPUT HullShader(…)
// Domain Shader
DS_OUTPUT DomainShader(…)
// Tessellation Control
layout (vertices = 4) out;
void TCS(void)
{
    if (gl_InvocationID == 0)
    {
        gl_TessLevelInner[0] = 2.0;
        …
// Tessellation Evaluation
layout (quads, cw, equal_spacing) in;
void TES(void)
{
    …
out patch float tessFactor;
void main(void)
{
if (gl_InvocationID == 0)
{
gl_TessLevelInner[0] = 2.0;
…
tessFactor = 2.0;
}
barrier();
DoSomeWork(tessFactor, gl_InvocationID);
}
TESSELLATION CONTROL
Tessellation rate can be set by any instance
Values can be communicated across threads
COMPUTE SHADERS
- Groups can share local memory
- Threads can be synced at a group level
[Diagram: a global grid (global size x by global size y) divided into thread groups (group size x by group size y), each made up of threads]
OPENGL COMPUTE
buffer BlockName { int linearOutput[]; };
shared int var;
layout(local_size_x = 64, local_size_y = 1, local_size_z = 1) in;
void main() // ContrivedSample; GLSL requires the entry point to be named main
{
    const uvec3 localIdx = gl_LocalInvocationID;
    const uvec3 globalIdx = gl_GlobalInvocationID;
    const uvec3 groupIdx = gl_WorkGroupID;
    if(localIdx.x == 0)
        var = int(groupIdx.x);
    barrier();
    linearOutput[globalIdx.x] = var;
}
DIRECT COMPUTE
RWStructuredBuffer<int> linearOutput;
groupshared int var;
[numthreads(64, 1, 1)]
void ContrivedSample(
uint3 globalIdx : SV_DispatchThreadID,
uint3 localIdx : SV_GroupThreadID,
uint3 groupIdx : SV_GroupID )
{
if(localIdx.x == 0)
var = groupIdx.x;
GroupMemoryBarrierWithGroupSync();
linearOutput[globalIdx.x] = var;
}
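To make the group, local, and global indices in ContrivedSample concrete, here is a CPU emulation of a 1D dispatch. It is a sketch under stated assumptions rather than how a GPU schedules threads: the function name `emulateContrivedSample` is hypothetical, threads run sequentially, and the barrier is modeled by completing the first phase for every thread in a group before the second phase starts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU emulation of the ContrivedSample kernel for a 1D dispatch.
// groupSize plays the role of numthreads / local_size_x (64 on the slides).
std::vector<int> emulateContrivedSample(int numGroups, int groupSize) {
    assert(numGroups >= 0 && groupSize > 0);
    std::vector<int> linearOutput(static_cast<std::size_t>(numGroups) * groupSize);
    for (int group = 0; group < numGroups; ++group) {
        int shared = 0;  // one copy of the group-shared variable per thread group
        // Phase 1 (before the barrier): only local thread 0 writes the shared value.
        for (int local = 0; local < groupSize; ++local)
            if (local == 0) shared = group;
        // Phase 2 (after the barrier): every thread reads the shared value.
        for (int local = 0; local < groupSize; ++local) {
            const int global = group * groupSize + local;  // dispatch-thread index
            linearOutput[static_cast<std::size_t>(global)] = shared;
        }
    }
    return linearOutput;
}
```

With 3 groups of 64 threads, the first 64 outputs hold 0, the next 64 hold 1, and the last 64 hold 2, which is what the shader writes when the barrier orders thread 0's store before the other threads' loads.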
PROGRAMMING THE GPU SYNCHRONIZATION
MEMORY COHERENCE - GL / DX
[Diagram labels: Dispatch; CS, Mem, CS]
MEMORY COHERENCE - GL / DX 11.1
[Diagram labels: Draw; VS, GS, FS, RT; Mem; VS, GS, FS]
MEMORY COHERENCE - GL / DX 11.1
[Diagram labels: Draw; VS, GS, FS, RT; Mem]
Feeding the GPU
DRIVER STACKS (WINDOWS)
[Diagram labels]
OpenGL: OpenGL App, OpenGL32.dll, OpenGL ICD
DirectX: DirectX App, D3D11.dll, D3D UMD
Also shown: KMD, DXGI
DRIVER STACKS (LINUX)
[Diagram labels, with some layers marked as alternatives ("Or"): App; libGL; DRI; drm; libDRM-radeon; Gallium3D State tracker; Gallium3D WinSys; Hardware layer]
FEEDING THE GPU GPU-CPU SYNCHRONIZATION
DRIVER COMMAND QUEUE
[Diagram: Dr and Da commands plotted against time]
Application: Dr 1, Dr 2, Da 2, Dr 3, Dr 4, Da 4, Dr 5, Dr 6, Da 6
Driver/GPU: Da 1, Da 3, Da 5; Dr 5, Dr 6, Da 6
Reorder possible?
CPU/GPU MEMORY SYNCHRONIZATION BY DRIVER
[Diagram labels: App Memory, Driver Copy, GPU Read]
Usage hints: Stream, Static, Dynamic; Draw, Read, Copy
CPU/GPU MEMORY SYNCHRONIZATION MANUAL
[Diagram labels: Dr 1, Dr 2, Da 2, Dr 3, Dr 4, Da 4, Dr 5, Dr 6, Da 6; Da 1, Da 3, Da 5; App Memory, App Copy, GPU Read, Driver Copy; Fence]
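One common way to use a fence for manual synchronization is to rotate through several app-managed buffer regions and only rewrite a region once the fence that followed its last use has completed. The sketch below models that bookkeeping on the CPU. It is an assumption about how the slide's fence could be applied, not the talk's code: `FakeGpu` and `RegionRing` are hypothetical, and a real application would use API fences instead (for example `glFenceSync` with `glClientWaitSync` in OpenGL, or an event query in D3D11).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a GPU timeline: fences are issued in order and
// complete when the simulated GPU catches up.
struct FakeGpu {
    std::uint64_t lastIssued = 0;
    std::uint64_t lastCompleted = 0;
    std::uint64_t insertFence() { return ++lastIssued; }
    void finishThrough(std::uint64_t fence) {
        if (fence > lastCompleted) lastCompleted = fence;
    }
    bool isDone(std::uint64_t fence) const { return fence <= lastCompleted; }
};

// Ring of app-managed buffer regions; each remembers the fence guarding its last use.
class RegionRing {
public:
    explicit RegionRing(std::size_t count) : fences_(count, 0) { assert(count > 0); }

    // Returns the index of the region to write next, or -1 if the GPU may still read it.
    int tryAcquire(const FakeGpu& gpu) const {
        return gpu.isDone(fences_[next_]) ? static_cast<int>(next_) : -1;
    }

    // Call after submitting the draw that reads the current region.
    void release(FakeGpu& gpu) {
        fences_[next_] = gpu.insertFence();
        next_ = (next_ + 1) % fences_.size();
    }

private:
    std::vector<std::uint64_t> fences_;  // 0 means "never used", which is always done
    std::size_t next_ = 0;
};
```

With two regions, the application can fill the second region while the GPU reads the first; a third write has to wait (or pick another strategy) until the first fence completes.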
FEEDING THE GPU DATA
LEGACY OPENGL OBJECT MODEL
- glGenBuffers, glGenTextures, glGenSamplers, …
  - Creates name / handle
- glBindBuffer, glBindTexture
  - Sets as current
- glBufferData, glTexSubImage, glMapBuffer
  - Supplies data
BUFFER BINDING AND CREATION
OpenGL: glBindBuffer(target, name)
DirectX: desc.BindFlags = <Target>; pDevice->CreateBuffer(desc,…)
[Diagram labels: binding Target, BufferObject, BufferData, State, Usage]
SETTING DATA (SIMPLEST OPTION)
OpenGL: glBufferData(target, size, pData, usage)
DirectX: desc.Usage = <Usage>; desc.CPUAccessFlags = <RWUsage>; pDevice->CreateBuffer(desc,pData,)
[Diagram labels: binding Target, BufferObject, data]
BUFFER TARGETS
GL Name                Typical Purpose       DX Equivalent
ARRAY                  Vertices              VERTEX
ELEMENT_ARRAY          Indices               INDEX
UNIFORM                Read-only vars        CONSTANT
TEXTURE_BUFFER         Buffer-as-texture     CONSTANT (tbuffer)
SHADER_STORAGE         Read/write            SHADER_RESOURCE
TRANSFORM_FEEDBACK     Stream out            Stream out
DRAW_INDIRECT          Indirect draw         DRAWINDIRECT
ATOMIC_COUNTER         Global counter var    UAV_FLAG_COUNTER
COPY_READ, _WRITE      Copying (optional)    Staging?
PIXEL_PACK, _UNPACK    GPU <-> CPU           Staging?
DIRECTX OBJECTS AND VIEWS
- Resource (base class)
  - Usage: default, immutable, dynamic, staging
  - Bind flags: vertex, index, shader resource, …
- Buffer
- Texture2D, …
- DepthStencilView
- RenderTargetView
- ShaderResourceView
- UnorderedAccessView
D3D11_BUFFER_DESC desc;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
…
pDevice->CreateBuffer(&desc, data, &pBuffer);
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc;
srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
…
pDevice->CreateShaderResourceView(pBuffer, &srvDesc, &pView);
//at draw time
pContext->VSSetShaderResources(0, 1, &pView);
OBJECT AND VIEW EXAMPLE
DATA TYPES
Image Linear
glGenTextures(1, &texObjName);
glBindTexture(GL_TEXTURE_2D_ARRAY,
texObjName);
glTexStorage3D(GL_TEXTURE_2D_ARRAY, levels, internalformat,
width, height, depth);
glTexSubImage3D(GL_TEXTURE_2D_ARRAY,
0, 0, 0, 0, width, height, depth,
format, type, pData);
IMMUTABLE TEXTURES (4.2, GLES 3)
CreateTexture2D( desc, srcDataLayout, pData);
FEEDING THE GPU PROGRAMS
GLuint shader = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(…);
glCompileShader(shader);
GLuint program = glCreateProgram();
glAttachShader(program, shader);
glLinkProgram(program);
glUseProgram(program);
SHADER MANAGEMENT - OPENGL
[Diagram: a Program Object containing a Vertex Shader and a Pixel Shader]
D3DCompile(source,..,vs_5_0,..,&pByteCode)
pShader = CreateVertexShader(pByteCode);
VSSetShader(pShader,0,0);
- No program / link concept in API
SHADER MANAGEMENT - DX
PROGRAM BINARIES
OpenGL
glGetProgramBinary(program,…,format,pBinaryOut);
- Program level
- In theory: format choices
- In practice: somewhat final, non-portable
DirectX
D3DCompile(source,..,vs_5_0,..,&pByteCode)
- Shader level
- Portable byte code
DRAW CALLS
OpenGL                               D3D
glDrawArrays                         Draw
glDrawArraysInstanced                DrawInstanced(…,0)
glDrawArraysInstancedBaseInstance    DrawInstanced
glDrawArraysIndirect                 DrawInstancedIndirect
glMultiDrawArrays                    for(int i=0; i<n; ++i) Draw(count[i], start[i]);
glMultiDrawArraysIndirect            for(int i=0; i<n; ++i) DrawInstancedIndirect(…)
glDrawElements                       DrawIndexed
…And so forth
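The multi-draw rows of the table say that one OpenGL multi-draw call corresponds to a loop of single draws on the D3D side. A minimal sketch of that loop follows, assuming a stand-in callback in place of a real device context: `DrawFn` and `multiDrawArrays` are hypothetical names introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a D3D-style Draw(vertexCount, startVertex); hypothetical, for illustration only.
using DrawFn = std::function<void(int vertexCount, int startVertex)>;

// Issues one draw per (first, count) pair, following the loop shown in the table.
void multiDrawArrays(const std::vector<int>& first, const std::vector<int>& count,
                     const DrawFn& draw) {
    assert(first.size() == count.size());
    for (std::size_t i = 0; i < count.size(); ++i)
        draw(count[i], first[i]);
}
```

Passing a callback that records its arguments makes the equivalence easy to check: two ranges produce two draws, in order, each with its own count and start.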
COMPUTE SHADERS
OpenGL 4.3                                       D3D11
glDispatchCompute(nGroupsX,nGroupsY,nGroupsZ)    Dispatch(nGroupsX,nGroupsY,nGroupsZ)
glDispatchComputeIndirect(offset)                DispatchIndirect(pResource,offset)
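Both dispatch calls take a number of thread groups, not a number of threads, so the application usually has to round up. A small helper for that calculation might look like this; `groupCountFor` is a hypothetical name, and the group size is whatever the shader declares (64 in the earlier ContrivedSample).

```cpp
#include <cassert>

// Number of thread groups needed to cover itemCount items with groupSize threads per group.
unsigned groupCountFor(unsigned itemCount, unsigned groupSize) {
    assert(groupSize > 0);
    // Ceiling division written to avoid overflow for very large itemCount.
    return itemCount / groupSize + (itemCount % groupSize != 0 ? 1u : 0u);
}
```

When the item count is not a multiple of the group size, the last group contains threads past the end of the data, so the shader typically needs a bounds check on its global index.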
Wrap up
IMAGE-BASED MODELING
GENERATING THE MODEL
Render: projection, rasterization, texturing, depth buffering, …
HAIR
TressFX
- AMD technology for high-quality hair rendering
- Thousands of hair strands individually simulated and rendered on the GPU
- DirectCompute physics simulation
- Shader Model 5.0 pixel shader using compute capabilities for rendering
NOT EXPOSED IN GRAPHICS APIS (YET)
- Local shared memory restricted to
  - Compute
  - Tessellation Control, in a limited sense
- Some OpenCL extensions (e.g., 64 bit atomics)
- Numerical compliance
- Some OpenCL 1.2 additions
- OpenCL 2.0 additions
SUMMARY
The graphics pipeline gives you access to different hardware
Mix and match for the best of both compute and graphics
There are additional synchronization issues and opportunities
DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.