7/30/2019 BobAndDerek GPU Part4
Speed-up of Algorithms With Graphics Processing Units (GPU): Part IV of IV
Robert H. Luke*# and Derek Anderson*#
* Electrical and Computer Engineering Department
# Predoctoral Fellows, NLM Training Grant
IEEE Computational Intelligence Society MU Chapter
And
National Library of Medicine Medical Informatics Training Grant
Special Seminar Series
Organization of Lectures
Part I Introduction to GPUs and shader languages
Part II Image processing (Morphology, Sobel, and Gaussian)
Part III Performance, multi-pass rendering, optimizations, and debugging
Part IV Using GPUs for non-image based processing (SOFM & CA)
General Purpose Programming
You already have all the tools required
  16/32-bit floating point values
  2D textures of any size [1,8192]
  Frame Buffer Objects (FBOs)
But a few extra tricks are helpful
  Ping Pong
  Reduction
Ping Pong
The name says it all
  Have two textures of the same dimensionality
  One fragment program
  Use texture A as input, B as output
  Then swap: B as input, A as output
[Figure: one fragment program reads Texture A and writes Texture B, then reads Texture B and writes Texture A]
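The swap described on this slide can be sketched on the CPU. This is a minimal illustration, not the authors' OpenGL code: two plain arrays stand in for textures A and B, and the hypothetical `smooth_pass` (averaging each cell's left and right neighbors, with clamped edges) stands in for the fragment program. The names `smooth_pass` and `ping_pong` are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the fragment program (hypothetical): each output cell is
   the average of its left and right input neighbors, edges clamped.
   It reads only from `in` and writes only to `out`, like a render pass. */
static void smooth_pass(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        size_t l = (i == 0) ? 0 : i - 1;
        size_t r = (i + 1 == n) ? i : i + 1;
        out[i] = 0.5f * (in[l] + in[r]);
    }
}

/* Runs `passes` passes, alternating the roles of buffers a and b.
   Returns a pointer to whichever buffer holds the final result. */
float *ping_pong(float *a, float *b, size_t n, int passes)
{
    assert(a != NULL && b != NULL && a != b && passes >= 0);
    float *src = a, *dst = b;
    for (int p = 0; p < passes; ++p) {
        smooth_pass(src, dst, n);
        float *tmp = src; src = dst; dst = tmp;   /* swap input and output */
    }
    return src;   /* after the swap, src is the most recently written buffer */
}
```

Calling `ping_pong(a, b, n, 2)` leaves the result in `a`; an odd number of passes leaves it in `b`. On the GPU the same swap is done by changing which texture is bound for reading and which is attached to the FBO for writing.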
Reduction
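Use the Ping Pong idea
  Reduce the size of the texture with each pass
  Distributes operations over more processors
[Figure: row-wise sum reduction of a 4x4 texture in two passes]
  Input      Pass 1      Pass 2
  M N O P    M+N O+P     M+N+O+P
  I J K L    I+J K+L     I+J+K+L
  E F G H    E+F G+H     E+F+G+H
  A B C D    A+B C+D     A+B+C+D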
Reduction OpenGL Side
Only draw to half the texture
Determine proper index in fragment code
  texRECT(2.0*(texCoordX-.5), texCoordY)
  texRECT(2.0*(texCoordX-.5)+1, texCoordY)
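As a rough CPU sketch of the reduction and index mapping above: each pass writes only the left half of the output, and output column x reads input columns 2x and 2x+1. The function names `reduce_pass` and `reduce_rows`, the row-major layout, and the power-of-two width are assumptions made for this example, not part of the original slides.

```c
#include <assert.h>
#include <stddef.h>

/* One pass: output column x of each row is the sum of input columns
   2x and 2x+1. Both buffers share the same row stride, and only the
   left half of the output is written. */
static void reduce_pass(const float *in, float *out,
                        size_t in_width, size_t height, size_t stride)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < in_width / 2; ++x)
            out[y * stride + x] = in[y * stride + 2 * x]
                                + in[y * stride + 2 * x + 1];
}

/* Sums every row of a width x height texture (width a power of two).
   a holds the input, b is scratch of the same size. Returns the buffer
   whose column 0 holds the row sums. */
float *reduce_rows(float *a, float *b, size_t width, size_t height)
{
    assert(a != NULL && b != NULL && a != b);
    assert(width > 0 && (width & (width - 1)) == 0);
    float *src = a, *dst = b;
    for (size_t w = width; w > 1; w /= 2) {
        reduce_pass(src, dst, w, height, width);
        float *tmp = src; src = dst; dst = tmp;   /* ping pong */
    }
    return src;
}
```

Replacing the `+` in `reduce_pass` with a minimum turns the same loop into a min reduction.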
SOFM (Self-Organizing Feature Map)
Clustering technique
  Used in many settings for generating a codebook
Have a 1D/2D/3D space of nodes (usually 2D)
  Each map node has dimensionality d
Input data is of size n x d
[Figure: n x d input data matrix next to a 2D grid of SOFM nodes]
SOFM Cont.
Go through each input vector
  Find the node with minimum distance to the current input vector
  Move the nearest node closer to the input vector
  Also move the nodes around the winning node closer to the input vector by a smaller amount
[Figure: the current input vector pulls the winning SOFM node and its neighbors closer]
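One training step as described above might look like the following on the CPU. This is a sketch under stated assumptions, not the authors' implementation: distance is squared Euclidean, the neighborhood is the eight grid cells touching the winner, and the winner and its neighbors use two fixed rates (`rate` and the smaller `neighbor_rate`). The function name `sofm_step` and the row-major node layout are also assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One training step on a grid_w x grid_h map of d-dimensional nodes.
   Node (x, y) starts at nodes[(y * grid_w + x) * d].
   Returns the flat index of the winning node. */
size_t sofm_step(float *nodes, size_t grid_w, size_t grid_h, size_t d,
                 const float *input, float rate, float neighbor_rate)
{
    assert(nodes != NULL && input != NULL);
    assert(grid_w > 0 && grid_h > 0 && d > 0);

    /* Find the node with minimum squared distance to the input. */
    size_t best = 0;
    float best_dist = 0.0f;
    for (size_t n = 0; n < grid_w * grid_h; ++n) {
        float dist = 0.0f;
        for (size_t k = 0; k < d; ++k) {
            float diff = nodes[n * d + k] - input[k];
            dist += diff * diff;
        }
        if (n == 0 || dist < best_dist) { best = n; best_dist = dist; }
    }

    /* Move the winner by `rate` and its grid neighbors by `neighbor_rate`. */
    size_t bx = best % grid_w, by = best / grid_w;
    for (size_t y = 0; y < grid_h; ++y) {
        for (size_t x = 0; x < grid_w; ++x) {
            size_t dx = x > bx ? x - bx : bx - x;
            size_t dy = y > by ? y - by : by - y;
            if (dx > 1 || dy > 1) continue;          /* outside neighborhood */
            float a = (dx == 0 && dy == 0) ? rate : neighbor_rate;
            float *node = nodes + (y * grid_w + x) * d;
            for (size_t k = 0; k < d; ++k)
                node[k] += a * (input[k] - node[k]);
        }
    }
    return best;
}
```

A common refinement is to shrink both the rates and the neighborhood radius as training proceeds; that schedule is left out here to keep the step itself visible.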
SOFM Results
Have a 2D representation of clusters
  Neighboring nodes are similar
GPU SOFM
Input Data
  Written numbers 0-9
  5 samples of each
  Grayscale image shrunk to 16x16 for each number: d=256
  Texture size for data: 50x256
[Figure: input texture, 50 x 256]
GPU SOFM
SOFM Nodes
  8x8 space
  Dimensionality of each node: 256
  Texture size for data: 8x2048 (8 x 8 nodes x 256 values = 8 x 2048 texels)
  Need 2 for Ping Pong
[Figure: two 8 x 2048 node textures, one input and one output]
GPU SOFM
Distance to current input vector
  Same size as SOFM node texture: 8x2048
  Need 2 for reduction
[Figure: two 8 x 2048 distance textures, one input and one output]
GPU SOFM
Min distance to current vector texture
  Dimensionality: 8x8
  Need 2 for reduction
[Figure: two 8 x 8 textures, one input and one output]
GPU SOFM Algorithm
Determine distance from each node to the input vector
  Done on a per pixel/index basis
[Figure: SOFM Node Data and the Input Vector are combined into Distances]
SUM The Distances
Sum the distance indexes
[Figure: Distances are reduced to Reduced Distances and then to Summation Distances]
Find Minimum Distance
Perform min reduction
[Figure: Distances are reduced to Min Reduced Distances and then to a single Min Distance]
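The three passes of the winner search (per-texel distance, sum, min) can be sketched on the CPU as below. The texture layout is an assumption, since the slides give only the texture sizes: texel (x, y) of the node texture is taken to hold component y % d of node (x, y / d), so an 8x8 map with d = 256 fills an 8 x 2048 texture. For brevity the sum and min are written as plain loops where the GPU version would use reduction passes. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Pass 1: per-texel squared difference between node data and the input.
   dist_tex has the same size as node_tex: w x (grid_h * d). */
static void distance_pass(const float *node_tex, const float *input,
                          float *dist_tex, size_t w, size_t grid_h, size_t d)
{
    for (size_t y = 0; y < grid_h * d; ++y)
        for (size_t x = 0; x < w; ++x) {
            float diff = node_tex[y * w + x] - input[y % d];
            dist_tex[y * w + x] = diff * diff;
        }
}

/* Pass 2: sum each block of d rows, giving one distance per node.
   sum_tex is w x grid_h. */
static void sum_pass(const float *dist_tex, float *sum_tex,
                     size_t w, size_t grid_h, size_t d)
{
    for (size_t gy = 0; gy < grid_h; ++gy)
        for (size_t x = 0; x < w; ++x) {
            float s = 0.0f;
            for (size_t k = 0; k < d; ++k)
                s += dist_tex[(gy * d + k) * w + x];
            sum_tex[gy * w + x] = s;
        }
}

/* Pass 3: minimum over all nodes. Returns the winner's index gy * w + x. */
size_t find_winner(const float *node_tex, const float *input,
                   float *dist_tex, float *sum_tex,
                   size_t w, size_t grid_h, size_t d)
{
    assert(w > 0 && grid_h > 0 && d > 0);
    distance_pass(node_tex, input, dist_tex, w, grid_h, d);
    sum_pass(dist_tex, sum_tex, w, grid_h, d);
    size_t best = 0;
    for (size_t i = 1; i < w * grid_h; ++i)
        if (sum_tex[i] < sum_tex[best]) best = i;
    return best;
}
```

Keeping the distance texture the same size as the node texture means pass 1 is a pure per-texel operation, which is what makes it a natural fit for a fragment program.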
Update Node Data
[Figure: SOFM Node Data, Input Vector, Distances, and Min Distance feed the update pass, which writes Updated SOFM Node Data]
And So On
Continue looping through these steps as many times as desired.
  Be sure to toggle which SOFM Node Data texture is being read from and written to.
Cellular Automata (CA)
Studied within mathematics, theory of computation, pattern recognition,
General idea (what we need to know in order to make a GPU program!)
  A grid of identical finite state automata whose next state is determined solely by their current state and the state of their neighbors
[Figure: a 1D grid evolving from a start state under a set of rules, with time increasing row by row]
Stephen Wolfram: A New Kind of Science
We took rules from A New Kind of Science and put them on a GPU
implementation of Cellular Automata (1D grid)
References
http://mathworld.wolfram.com/CellularAutomaton.html
http://www.wolframscience.com/nksonline/toc.html
[Figure from http://mathworld.wolfram.com/CellularAutomaton.html: same start state, different rules!]
[Figure: Active Head (red)]
CA on a GPU
Basic idea (for the 1D binary case)
  Pack the data into an image (one channel, i.e. Red)
  Use an FBO for multi-pass rendering (fast!)
Initialization
  Set the values in the first row of pixels
  Turn some on (1=black above) and some off (0=green above)
From i=2 to N [N is the number of rows in your image]
  Only render row i
  Fragment program (pixel j row i)
    Look at the three values below you
      Left, center, and right
    Look at the rules and determine your state (on or off)
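The row-by-row procedure above can be sketched on the CPU for the 1D binary case. This is an illustration under assumptions, not the fragment program from the slides: rules are given in Wolfram's 0-255 numbering, where bit (4*left + 2*center + right) of the rule number is the next state, and cells outside the image are treated as off. The names `ca_cell` and `ca_run` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Next state of one cell under an elementary (1D, binary, radius-1)
   rule in Wolfram numbering. Inputs must be 0 or 1. */
static unsigned char ca_cell(unsigned rule, int left, int center, int right)
{
    unsigned idx = (unsigned)(left << 2 | center << 1 | right);
    return (unsigned char)((rule >> idx) & 1u);
}

/* Fills rows 1..height-1 of a width x height image of 0/1 cells.
   Row 0 must already hold the start state. Each row is computed only
   from the row before it, like rendering one row per pass. */
void ca_run(unsigned char *image, size_t width, size_t height, unsigned rule)
{
    assert(image != NULL && width > 0 && rule < 256u);
    for (size_t i = 1; i < height; ++i) {
        const unsigned char *prev = image + (i - 1) * width;
        unsigned char *row = image + i * width;
        for (size_t j = 0; j < width; ++j) {
            int l = (j > 0) ? prev[j - 1] : 0;          /* off past the edge */
            int r = (j + 1 < width) ? prev[j + 1] : 0;
            row[j] = ca_cell(rule, l, prev[j], r);
        }
    }
}
```

With rule 90, for example, each cell becomes the XOR of its left and right neighbors, so a single on cell in row 0 spreads into a Sierpinski-like triangle.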
Selecting a Row to Render!
OpenGL Setup
  glViewport(0, 0, (GLsizei) imageWidth, (GLsizei) imageHeight);
  gluOrtho2D(0.0, imageWidth, 0.0, imageHeight);
Row counter
  int ca_counter;
Render Code
  glPolygonMode(GL_FRONT, GL_FILL);
  glBegin(GL_QUADS);
    glTexCoord2i(2, ca_counter-1); glVertex2f(2.0, ca_counter-1);
    glTexCoord2i(imageWidth-2, ca_counter-1); glVertex2f(imageWidth-2.0, ca_counter-1);
    glTexCoord2i(imageWidth-2, ca_counter); glVertex2f(imageWidth-2.0, ca_counter);
    glTexCoord2i(2, ca_counter); glVertex2f(2.0, ca_counter);
  glEnd();
Fragment Program
void FragmentProgram (
out float4 color0 : COLOR0 ,
float2 coords : TEXCOORD0 ,
uniform samplerRECT tex )
{
//I use this for the check below
half2 tul, tuc, tur, tul2, tur2;
//Calc the image index
half2 newindex = coords.xy;
static const half offset = 1.0;
tul = texRECT( tex , newindex + float2(-offset,-offset) ).rg;
tuc = texRECT( tex , newindex + float2(0.0,-offset) ).rg;
tur = texRECT( tex , newindex + float2(offset,-offset) ).rg;
if(tuc.r == 1.0){
if( tul.g == 1.0 ){
if( tuc.g == 1.0 ){
if( tur.g == 1.0 ){ //1 1 1
color0 = float4(0.0,0.0,0.0,1.0);
}else{ //1 1 0
color0 = float4(0.0,0.0,0.0,1.0);
}
}else{
if( tur.g == 1.0 ){ //1 0 1
color0 = float4(0.0,1.0,0.0,1.0);
}else{ //1 0 0
color0 = float4(0.0,1.0,0.0,1.0);
}
}
}else{
if( tuc.g == 1.0 ){
if( tur.g == 1.0 ){ //0 1 1
color0 = float4(0.0,0.0,0.0,1.0);
}else{ //0 1 0
color0 = float4(0.0,0.0,0.0,1.0);
}
}else{
if( tur.g == 1.0 ){ //0 0 1
color0 = float4(0.0,1.0,0.0,1.0);
}else{ //0 0 0
color0 = float4(0.0,1.0,0.0,1.0);
}
}
}}
else{
if( tul.r == 1.0 ){
tul2 = texRECT( tex , newindex + float2(-2.0*offset,-offset) ).rg;
if( (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 0.0) ||
(tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 1.0) ||
(tul2.g == 0.0 && tul.g == 1.0 && tuc.g == 0.0) ||
(tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 0.0) ||
(tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 1.0) ){
color0 = float4(1.0,tuc.g,0.0,1.0);
}else{
color0 = float4( tuc , 0.0 , 1.0 );
}
}else if( tur.r == 1.0 ){
tur2 = texRECT( tex , newindex + float2(2.0*offset,-offset) ).rg;
if( (tuc.g == 0.0 && tur.g == 1.0 && tur2.g == 1.0) || (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 0.0) || (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 1.0) ){
color0 = float4(1.0,tuc.g,0.0,1.0);
}else{
color0 = float4( tuc , 0.0 , 1.0 );
}
}else{
color0 = float4( tuc , 0.0 , 1.0 );
}
}
}
Conway's Game of Life
For a space that is populated
  Each cell with one or no neighbors dies
  Each cell with four or more neighbors dies, as if by overpopulation
  Each cell with two or three neighbors survives
For a space that is 'empty' or 'unpopulated'
  Each cell with three neighbors becomes populated
Life on a GPU
Simple GPU program!
  Use FBOs
  Render each pixel
  Sample the neighborhood
  Like CA, make a decision based on the rules of the game
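A per-pixel Life update can be sketched on the CPU as follows. The inner body plays the role of the fragment program: it samples the eight neighbors from the input grid and writes one cell of the output grid. The function name `life_step` and the choice to count cells outside the grid as dead are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Computes one generation. `in` and `out` are separate w x h grids of
   0/1 cells (the ping-pong pair); cells outside the grid count as dead. */
void life_step(const unsigned char *in, unsigned char *out,
               size_t w, size_t h)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            int n = 0;   /* number of live neighbors */
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;
                    long nx = (long)x + dx, ny = (long)y + dy;
                    if (nx < 0 || ny < 0 || nx >= (long)w || ny >= (long)h)
                        continue;
                    n += in[(size_t)ny * w + (size_t)nx];
                }
            }
            unsigned char alive = in[y * w + x];
            /* Populated: survive with 2 or 3 neighbors. Empty: born with 3. */
            out[y * w + x] = (unsigned char)(alive ? (n == 2 || n == 3)
                                                   : (n == 3));
        }
    }
}
```

To run several generations, call `life_step` repeatedly and swap the two grids between calls, exactly as with the ping-pong textures.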