Post on 20-Dec-2015

Introduction to OpenGL

Joseph Kider

University of Pennsylvania

CIS 565 – Fall 2011 (Source: Patrick Cozzi)

Administrivia

Assignment 2 handed out

Upgrade your video card drivers [NVIDIA | ATI]

Agenda

Review Monday’s GLSL material

OpenGL
• Shaders and uniforms
• Vertex arrays and buffers
• Multithreading

Review Assignment 1

GLSL Review

Rewrite with one if and one compare:

if (dist < wPrime)
{
    if (dist < closestDistance)
    {
        closestDistance = dist;
    }
}
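One possible answer, sketched in C++ rather than GLSL: the nested ifs only update when dist is below both values, so a single compare against the smaller of the two is enough. The function name updateClosestDistance is made up for this sketch; in GLSL the built-in min() plays the role of std::min.

```cpp
#include <algorithm>

// Hypothetical helper: returns the updated closest distance.
// The update happens only when dist is below both wPrime and
// closestDistance, i.e. below the smaller of the two.
float updateClosestDistance(float dist, float wPrime, float closestDistance)
{
    if (dist < std::min(wPrime, closestDistance))
    {
        closestDistance = dist;
    }
    return closestDistance;
}
```

On a GPU, min() is typically cheap, so this trades a branch for an arithmetic instruction.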

GLSL Review

Implement this concisely:

bool PointInsideAxisAlignedBoundingBox(vec3 p, vec3 b0, vec3 b1)
{
    // ...
}

[Figure: axis-aligned box with corners b0 and b1, and a point p]

Does your code also work for vec2?
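A sketch of one concise answer, written in C++ so it can be compiled on the CPU. It assumes b0 is the minimum corner and b1 the maximum corner, which the slide does not state. Vec<N> is a hypothetical stand-in for GLSL's vecN; templating on N is what makes the same code work for vec2. In GLSL itself, something like all(greaterThanEqual(p, b0)) && all(lessThanEqual(p, b1)) expresses the same test.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for GLSL's vecN: N floats.
template <std::size_t N>
using Vec = std::array<float, N>;

// True when p lies inside or on the box whose minimum corner is b0
// and whose maximum corner is b1 (an assumption of this sketch).
template <std::size_t N>
bool PointInsideAxisAlignedBoundingBox(const Vec<N>& p, const Vec<N>& b0, const Vec<N>& b1)
{
    for (std::size_t i = 0; i < N; ++i)
    {
        if (p[i] < b0[i] || p[i] > b1[i])
        {
            return false;
        }
    }
    return true;
}
```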

GLSL Review

What is the difference between a fixed function and programmable stage?

Vertex shader
• What is its input? Output?

Fragment shader
• What is its input? Output?
• [true | false] Fragment shaders allow you to change the xy position

[true | false] A best practice is to roll your own functions instead of calling library functions
• In general, build vs buy

OpenGL

Is a C-based API

Is cross platform

Is run by the ARB: Architecture Review Board

Hides the device driver details

OpenGL vs Direct3D
• Not going there – at least not on record

OpenGL

We are using GL 3.3 core profile
• No fixed function vertex and fragment shading
• No legacy API calls: glBegin(), glRotatef(), glTexEnvf(), glAlphaFunc(), …

Why was the alpha test removed?

Recall the fixed function light map

OpenGL

Software stack, from top to bottom:

Application

OpenGL API

Device Driver

GPU

OpenGL

Major objects:

Shader Programs

Textures

Framebuffers

Vertex Arrays

Shader Objects

Vertex Buffers

Fixed Function State

Index Buffers

Pixel Buffers

Samplers

We are not covering everything. Just surveying the most relevant parts for writing GLSL shaders

Shaders

Shader object: an individual vertex, fragment, etc. shader
• Are provided shader source code as a string
• Are compiled

Shader program: Multiple shader objects linked together

Shader Objects

Compile a shader object:

const char *source = // ...
GLint sourceLength = // ...
GLuint v = glCreateShader(GL_VERTEX_SHADER);
glShaderSource(v, 1, &source, &sourceLength);
glCompileShader(v);

GLint compiled;
glGetShaderiv(v, GL_COMPILE_STATUS, &compiled);
// success: compiled == GL_TRUE
// ...
glDeleteShader(v);

v (returned by glCreateShader) is an opaque object
• What is it under the hood?
• How would you design this in C++?

OpenGL functions start with gl. Why? How would you design this in C++?
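One way to approach the C++ design question is an RAII wrapper that owns the integer handle and releases it in its destructor. This is only a sketch under stated assumptions: ScopedHandle and its Deleter type are hypothetical names, and the deleter is a plain function pointer so the code compiles without OpenGL headers. Real code might pass a small function that calls glDeleteShader.

```cpp
#include <utility>

// Hypothetical RAII wrapper around an integer handle such as the
// GLuint returned by glCreateShader.
class ScopedHandle
{
public:
    using Deleter = void (*)(unsigned int);

    ScopedHandle(unsigned int handle, Deleter deleter)
        : handle_(handle), deleter_(deleter) {}

    ~ScopedHandle() { reset(); }

    // A handle names a unique resource: movable, not copyable.
    ScopedHandle(const ScopedHandle&) = delete;
    ScopedHandle& operator=(const ScopedHandle&) = delete;

    ScopedHandle(ScopedHandle&& other) noexcept
        : handle_(std::exchange(other.handle_, 0u)), deleter_(other.deleter_) {}

    ScopedHandle& operator=(ScopedHandle&& other) noexcept
    {
        if (this != &other)
        {
            reset();
            handle_ = std::exchange(other.handle_, 0u);
            deleter_ = other.deleter_;
        }
        return *this;
    }

    unsigned int get() const { return handle_; }

private:
    void reset()
    {
        // 0 is treated as "no object", matching GL's convention for names.
        if (handle_ != 0u && deleter_ != nullptr)
        {
            deleter_(handle_);
        }
        handle_ = 0u;
    }

    unsigned int handle_;
    Deleter deleter_;
};
```

With a wrapper like this, cleanup happens even on early returns, and the gl prefix could be replaced by a namespace.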

Provide the shader’s source code (glShaderSource)

Where should the source come from?

Why can we pass more than one string?

Compile (glCompileShader), but what does the driver really do?

Good developers check for errors (glGetShaderiv). Again, how would you design this in C++?

Calling glGet* has performance implications. Why?

Good developers also clean up resources (glDeleteShader)

This process is just like compiling an OpenCL kernel. We will see that later this semester.

Shader Programs

Link a shader program:

GLuint v = glCreateShader(GL_VERTEX_SHADER);
GLuint f = glCreateShader(GL_FRAGMENT_SHADER);
// ...
GLuint p = glCreateProgram();
glAttachShader(p, v);
glAttachShader(p, f);
glLinkProgram(p);

GLint linked;
glGetProgramiv(p, GL_LINK_STATUS, &linked); // program objects are queried with glGetProgramiv
// success: linked == GL_TRUE
// ...
glDeleteProgram(p); // p is the program; v and f are shader objects

A program needs at least a vertex and fragment shader (glAttachShader)

Be a good developer again: check the link status and delete the program when done

Using Shader Programs

GLuint p = glCreateProgram();
// ...
glUseProgram(p);
glDraw*(); // * because there are lots of draw functions

The program bound with glUseProgram is part of the current state
• How do you draw different objects with different shaders?
• What is the cost of using multiple shaders?
• How do you reduce the cost?
  • Hint: write more CPU code – really.
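One common reading of the hint is to sort draw commands on the CPU so that draws sharing a shader program are adjacent, which cuts the number of glUseProgram calls. The sketch below only models that bookkeeping. DrawCommand, sortByProgram, and countProgramSwitches are hypothetical names, no GL calls are made, and concerns such as transparency ordering are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record: which shader program and which mesh to draw.
struct DrawCommand
{
    unsigned int program; // handle from glCreateProgram
    unsigned int mesh;    // application-defined mesh id
};

// Order commands so that draws sharing a program are adjacent.
// stable_sort keeps the submission order within each program.
void sortByProgram(std::vector<DrawCommand>& commands)
{
    std::stable_sort(commands.begin(), commands.end(),
        [](const DrawCommand& a, const DrawCommand& b) { return a.program < b.program; });
}

// Count how many glUseProgram calls drawing in this order would need.
std::size_t countProgramSwitches(const std::vector<DrawCommand>& commands)
{
    std::size_t switches = 0;
    for (std::size_t i = 0; i < commands.size(); ++i)
    {
        if (i == 0 || commands[i].program != commands[i - 1].program)
        {
            ++switches;
        }
    }
    return switches;
}
```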

Uniforms

GLuint p = glCreateProgram();
// ...
glLinkProgram(p);

// glGetUniformLocation returns a GLint; -1 means the name is not an active uniform
GLint m = glGetUniformLocation(p, "u_modelViewMatrix");
GLint l = glGetUniformLocation(p, "u_lightMap");

glUseProgram(p);
mat4 matrix = // ...
glUniformMatrix4fv(m, 1, GL_FALSE, &matrix[0][0]);
glUniform1i(l, 0);

Each active uniform has an integer index location.

mat4 is part of the C++ GLM library

GLM: http://www.g-truc.net/project-0016.html#menu

Uniforms can be changed as often as needed, but are constant during a draw call

GL_FALSE: not transposing the matrix

glUniform* for all sorts of datatypes
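Passing GL_FALSE works because GLM stores a mat4 in column-major order, which is the layout glUniformMatrix4fv expects when it is told not to transpose. The sketch below illustrates that layout with a plain array instead of GLM. Mat4, columnMajorIndex, and makeTranslation are hypothetical names for this example.

```cpp
#include <array>
#include <cstddef>

// 16 floats in column-major order: the four entries of column 0 come
// first, then column 1, and so on.
using Mat4 = std::array<float, 16>;

// Index of the element at (row, column) in column-major storage.
constexpr std::size_t columnMajorIndex(std::size_t row, std::size_t column)
{
    return column * 4 + row;
}

// Hypothetical helper: a translation matrix in column-major storage.
Mat4 makeTranslation(float x, float y, float z)
{
    Mat4 m{}; // all zeros
    for (std::size_t i = 0; i < 4; ++i)
    {
        m[columnMajorIndex(i, i)] = 1.0f; // identity diagonal
    }
    m[columnMajorIndex(0, 3)] = x; // the translation lives in the fourth column
    m[columnMajorIndex(1, 3)] = y;
    m[columnMajorIndex(2, 3)] = z;
    return m;
}
```

The translation therefore lands in elements 12, 13, and 14 of the flat array. A row-major matrix would need GL_TRUE, or a transpose on the CPU, instead.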

Why not glUniform*(p, …)?

Drawing

How do we transfer vertices from system memory to video memory?

How do we issue draw calls?

Drawing

It doesn’t matter if we’re using:

Efficiently transferring data between the CPU and GPU is critical for performance.

Drawing

Image from http://arstechnica.com/hardware/news/2009/10/day-of-nvidia-chipset-reckoning-arrives.ars

Typical pre-Nehalem Intel System

Separate system and video memory

Need to transfer vertices from one to the other quickly

• 4 GB/s reads and writes
• Theoretical 128M 32-byte vertices/second
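The 128M figure follows from dividing bandwidth by vertex size. The sketch below works through that arithmetic, assuming the slide's 4 GB/s is read with binary prefixes (4 GiB/s). The 60 frames per second used for the per-frame budget is an assumption added here, not a number from the slide.

```cpp
#include <cstdint>

// Theoretical vertex rate: bus bandwidth divided by the size of one vertex.
constexpr std::uint64_t verticesPerSecond(std::uint64_t bytesPerSecond, std::uint64_t bytesPerVertex)
{
    return bytesPerSecond / bytesPerVertex;
}

constexpr std::uint64_t kGiB = 1024ull * 1024ull * 1024ull;
constexpr std::uint64_t kMi = 1024ull * 1024ull;

// 4 GB/s (read as 4 GiB/s) and 32-byte vertices, as on the slide.
constexpr std::uint64_t kSlideRate = verticesPerSecond(4 * kGiB, 32);
static_assert(kSlideRate == 128 * kMi, "4 GiB/s / 32 B = 128 Mi vertices/s");

// Vertices that can cross the bus per frame at a given frame rate
// (integer division, so the result is rounded down).
constexpr std::uint64_t verticesPerFrame(std::uint64_t rate, std::uint64_t framesPerSecond)
{
    return rate / framesPerSecond;
}
```

At an assumed 60 frames per second that leaves roughly 2.2 million vertices per frame, which is why re-sending large models every frame is not an option.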

Drawing

How good is 128M vertices/second?

Image from http://graphics.cs.uni-sb.de/MassiveRT/boeing777.html

Boeing 777 model: ~350 million polygons

Image from http://www.vision.ee.ethz.ch/~pmueller/wiki/CityEngine/Documents

Procedurally generated model of Pompeii: ~1.4 billion polygons

Drawing

OpenGL has evolved since 1992 (GL 1.0)
• Immediate mode
• Display lists
• Client-side vertex arrays
• Vertex buffer objects (VBOs)

Drawing: Immediate Mode

GLfloat v0[3] = { 0.0f, 0.0f, 0.0f };
// ...
glBegin(GL_TRIANGLES);
glVertex3fv(v0);
glVertex3fv(v1);
glVertex3fv(v2);
glVertex3fv(v3);
glVertex3fv(v4);
glVertex3fv(v5);
glEnd();

Pro: really simple

What’s the con?

Drawing: Display Lists

GLuint dl = glGenLists(1);
glNewList(dl, GL_COMPILE);
glBegin(GL_TRIANGLES);
// ...
glEnd();
glEndList();
// ...
glCallList(dl);
// ...
glDeleteLists(dl, 1);

Create one display list, just like glCreateShader creates a shader

OpenGL commands between glNewList and glEndList are not executed immediately. Instead, they are compiled into the display list.

A single function call executes the display list. You can execute the same display list many times.

Pros
• Little function call overhead
• Optimized compiling: stored in video memory, perhaps vertex cache optimized, etc.

Cons
• Compiling is slow. How do you support dynamic data?
• Usability: what is compiled into a display list and what isn’t?

glDeleteLists(dl, 1): you guys are good developers

Drawing: Client-side Vertex Arrays

Point GL to an array in system memory

GLfloat vertices[] = {...}; // 2 triangles = 6 vertices = 18 floats
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, vertices);
glDrawArrays(GL_TRIANGLES, 0, 6); // count is in vertices, not floats
glDisableClientState(GL_VERTEX_ARRAY);

Store vertices in an array

Ugh, tell GL we have vertices (positions, actually)
• Managing global state is painful

Pointer to our vertices

Third argument of glVertexPointer: stride, in bytes, between vertices. 0 means tightly packed.

First two arguments of glVertexPointer: each vertex has 3 floating point components

Draw in a single GL call

Pro: little function call overhead

Con: bus traffic
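The stride rule for glVertexPointer can be sketched on the CPU: a stride of 0 is shorthand for "component count times component size", and any other value is the byte distance between the starts of consecutive vertices. The helper names below (effectiveStride, readComponent) are hypothetical and only mimic the address arithmetic; no GL calls are involved.

```cpp
#include <cstddef>
#include <cstring>

// Effective distance in bytes between consecutive vertices.
// A stride of 0 means tightly packed: componentCount * componentSize.
constexpr std::size_t effectiveStride(std::size_t stride, std::size_t componentCount, std::size_t componentSize)
{
    return stride != 0 ? stride : componentCount * componentSize;
}

// Hypothetical helper: read one float component of one vertex from a
// raw byte array laid out the way glVertexPointer would describe it.
float readComponent(const unsigned char* base, std::size_t stride, std::size_t componentCount,
                    std::size_t vertex, std::size_t component)
{
    const std::size_t step = effectiveStride(stride, componentCount, sizeof(float));
    float value;
    std::memcpy(&value, base + vertex * step + component * sizeof(float), sizeof(float));
    return value;
}
```

A non-zero stride is what lets a position array skip over padding or other attributes stored between vertices.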

Drawing: Vertex Buffer Objects

VBO: Vertex Buffer Object

Like client-side vertex arrays, but:
• Stored in driver-controlled memory, not an array in your application
• Provide hints to the driver about how you will use the buffer

VBOs are the only way to store vertices in GL 3.3 core profile. The other approaches are deprecated and not available in the core profile.

We can use textures, but let’s not jump ahead

Drawing: Vertex Buffer Objects

GLuint vbo;
GLfloat* vertices = new GLfloat[3 * numberOfVertices];
GLsizeiptr numberOfBytes = 3 * numberOfVertices * sizeof(GLfloat);

glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, numberOfBytes, vertices, GL_STATIC_DRAW);
// Also check out glBufferSubData

delete [] vertices; // safe: glBufferData copied the data
glDeleteBuffers(1, &vbo);

glBufferData: copy from application to driver-controlled memory. GL_STATIC_DRAW should imply video memory.

Does glBufferData block? Does glBufferSubData block?

Drawing: Vertex Buffer Objects

Usage Hint
• Static: 1-to-n update-to-draw ratio
• Dynamic: n-to-m update-to-draw ratio (n < m)
• Stream: 1-to-1 update-to-draw ratio

It’s a hint. Do drivers take it into consideration?
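The three ratios above can be turned into a small decision helper. This is a sketch, not a rule the driver enforces: UsageHint and chooseUsageHint are hypothetical names, the update and draw counts are assumed to be measured over the same period, and real code would pass GL_STATIC_DRAW, GL_DYNAMIC_DRAW, or GL_STREAM_DRAW to glBufferData.

```cpp
// Hypothetical mirror of the three draw-usage hints.
enum class UsageHint { Static, Dynamic, Stream };

// Pick a hint from how often a buffer is updated and drawn over the
// same period of time.
UsageHint chooseUsageHint(unsigned int updates, unsigned int draws)
{
    if (updates >= draws)
    {
        return UsageHint::Stream;  // about 1-to-1 update to draw
    }
    if (updates <= 1)
    {
        return UsageHint::Static;  // 1-to-n update to draw
    }
    return UsageHint::Dynamic;     // n-to-m update to draw, n < m
}
```

Since it is only a hint, a wrong choice affects performance at most, never correctness.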

Drawing: Vertex Buffer Objects

Image from http://developer.nvidia.com/object/using_VBOs.html

Map a pointer to driver-controlled memory
• Also map just a subset of the buffer

Drawing: Vertex Buffer Objects

Image from: http://upgifting.com/tmnt-pizza-poster

Immediate Mode

VBOs

In general:

Say no to drugs too, please.

Vertex Array Objects

VBOs are just buffers
• Raw bytes

VAOs: Vertex Array Objects
• Interpret VBOs as actual vertices
• Used when issuing glDraw*
• You are not responsible for the implementation details

VBO Layouts

Images courtesy of A K Peters, Ltd. www.virtualglobebook.com

Separate Buffers

Non-interleaved Buffer

Interleaved Buffer

VBO Layouts: Tradeoffs

Separate Buffers
• Flexibility, e.g.:
  • Combination of static and dynamic buffers
  • Multiple objects share the same buffer

Non-interleaved Buffer
• How is the memory coherence?

Interleaved Buffer
• Faster for static buffers
  • Proportional to the number of attributes

Hybrid?
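The difference between the non-interleaved and interleaved layouts comes down to where attribute a of vertex i sits in the buffer. The sketch below computes those byte offsets for both layouts. AttributeSizes, interleavedOffset, and nonInterleavedOffset are hypothetical names, and the example sizes in the comment (float3 position, float3 normal, float2 texture coordinate) are an assumption.

```cpp
#include <cstddef>
#include <vector>

// Byte size of each attribute for one vertex, e.g. {12, 12, 8} for a
// float3 position, float3 normal, and float2 texture coordinate.
using AttributeSizes = std::vector<std::size_t>;

// Interleaved: [p0 n0 t0][p1 n1 t1]...
std::size_t interleavedOffset(const AttributeSizes& sizes, std::size_t attribute, std::size_t vertex)
{
    std::size_t stride = 0;         // bytes per whole vertex
    std::size_t offsetInVertex = 0; // bytes before this attribute within a vertex
    for (std::size_t a = 0; a < sizes.size(); ++a)
    {
        if (a < attribute)
        {
            offsetInVertex += sizes[a];
        }
        stride += sizes[a];
    }
    return vertex * stride + offsetInVertex;
}

// Non-interleaved: [p0 p1 ...][n0 n1 ...][t0 t1 ...]
std::size_t nonInterleavedOffset(const AttributeSizes& sizes, std::size_t vertexCount,
                                 std::size_t attribute, std::size_t vertex)
{
    std::size_t arrayStart = 0; // bytes taken by the attribute arrays stored earlier
    for (std::size_t a = 0; a < attribute; ++a)
    {
        arrayStart += sizes[a] * vertexCount;
    }
    return arrayStart + vertex * sizes[attribute];
}
```

In the interleaved layout all attributes of one vertex are within one stride of each other, which is one way to think about the memory coherence question above.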

Vertex Throughput: VBO Layouts

Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

64k triangles per batch and n 4-float texture coordinates

Vertex Throughput: Batching

Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

Making lots of glDraw* calls is slow. Why?

Vertex Throughput Tips

Optimize for the vertex caches

Use smaller vertices
• Use less precision, e.g., half instead of float
• Compress, then decompress in vertex shader
• Pack, then unpack in vertex shader
• Derive attributes or components from other attributes
• How many components do you need to store a normal?
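Two of these tips can be sketched on the CPU side. The first pair of functions quantizes a component in [-1, 1] to a 16-bit signed integer and back, similar in spirit to a normalized GL_SHORT attribute. The last function derives z of a unit-length normal from x and y, which only works under the assumption that the sign of z is known (here z >= 0). All function names are hypothetical, and the unpack and derive steps would normally live in the vertex shader.

```cpp
#include <cmath>
#include <cstdint>

// Quantize a value in [-1, 1] to a signed 16-bit integer.
std::int16_t packSnorm16(float value)
{
    const float clamped = std::fmax(-1.0f, std::fmin(1.0f, value));
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Map the 16-bit integer back to [-1, 1].
float unpackSnorm16(std::int16_t packed)
{
    return std::fmax(static_cast<float>(packed) / 32767.0f, -1.0f);
}

// Derive z of a unit-length normal from x and y, assuming z >= 0.
float deriveZ(float x, float y)
{
    return std::sqrt(std::fmax(0.0f, 1.0f - x * x - y * y));
}
```

Storing two 16-bit components instead of three floats shrinks a normal from 12 bytes to 4, at the cost of a little precision and a few shader instructions.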

Vertex Throughput Tips

Know your architecture!

Image from http://www.sci.utah.edu/~csilva/papers/thesis/louis-bavoil-ms-thesis.pdf

GL_SHORT faster on NVIDIA… …slower on ATI

GL_SHORT normals faster than GL_FLOAT on NVIDIA
• But not ATI
• Still true today?

GL_BYTE normals use less memory than GL_SHORT or GL_FLOAT but are slower
• Why?
• Still true today?

Do you believe me yet?

Multithreaded Rendering

Quake 4 CPU usage
• 41% - driver
• 49% - engine

Split render work into two threads:

Image from http://mrelusive.com/publications/presentations/2008_gdc/GDC%2008%20Threading%20QUAKE%204%20and%20ETQW%20Final.pdf

Multithreaded Rendering

Tradeoffs
• Throughput vs latency
• Memory usage – double buffering
  • Cache pollution
• Synchronization
• Single core machines
  • DOOM III era

Multithreaded OpenGL Drivers

Image from http://developer.apple.com/library/mac/#technotes/tn2006/tn2085.html

Driver CPU overhead is moved to a separate core

Application remains unchanged

What happens when you call glGet*?

Not Covered Today

Textures

Framebuffers

State management

…

Useful for GPGPU – and graphics, obviously

Class Poll

Multithreaded graphics engine design class?

More graphics-related classes?

Itching to get to GPGPU and GPU computing?

OpenGL Resources

OpenGL/GLSL Quick Reference Card http://www.khronos.org/files/opengl-quick-reference-card.pdf

OpenGL Spec http://www.opengl.org/registry/doc/glspec33.core.20100311.pdf

OpenGL Forums http://www.opengl.org/discussion_boards/