BobAndDerek GPU Part4

Date post: 14-Apr-2018
  • 7/30/2019 BobAndDerek GPU Part4

    Speed-up of Algorithms With Graphics Processing Units (GPU): Part IV of IV

    Robert H. Luke*# and Derek Anderson*#

    * Electrical and Computer Engineering Department
    # Predoctoral Fellows, NLM Training Grant

    IEEE Computational Intelligence Society MU Chapter
    and
    National Library of Medicine Medical Informatics Training Grant
    Special Seminar Series

    Organization of Lectures

    Part I: Introduction to GPUs and shader languages
    Part II: Image processing (Morphology, Sobel, and Gaussian)
    Part III: Performance, multi-pass rendering, optimizations, and debugging
    Part IV: Using GPUs for non-image based processing (SOFM & CA)

    General Purpose Programming

    You already have all the tools required
      16/32-bit floating point values
      2D textures of any size [1,8192]
      Frame Buffer Objects

    But a few extra tricks are helpful
      Ping Pong
      Reduction

    Ping Pong

    The name says it all
      Have two textures of the same dimensionality
      One fragment program
      Use texture A as input, B as output
      Then swap; B as input, A as output

    [Figure: a single fragment program between Texture A and Texture B; each texture serves as input on one pass and as output on the next]
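
    The swap can be emulated on the CPU. What follows is a minimal Python sketch, not the authors' code: two lists stand in for textures A and B, `fragment_program` is a hypothetical per-element kernel (a 3-tap average chosen only for illustration), and `ping_pong` is an illustrative name.

```python
def fragment_program(src, i):
    """Hypothetical kernel: average of element i and its clamped neighbors."""
    left = src[max(i - 1, 0)]
    right = src[min(i + 1, len(src) - 1)]
    return (left + src[i] + right) / 3.0


def ping_pong(data, passes):
    """Run `passes` passes, alternating which buffer is read and which is written."""
    tex_a = list(data)          # stands in for texture A
    tex_b = [0.0] * len(data)   # stands in for texture B, same dimensions
    src, dst = tex_a, tex_b
    for _ in range(passes):
        for i in range(len(src)):
            dst[i] = fragment_program(src, i)   # read only src, write only dst
        src, dst = dst, src                     # swap roles for the next pass
    return src                                  # buffer written by the last pass
```

    Each pass reads from one buffer and writes to the other, so no element is read after it has been overwritten within the same pass.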

    Reduction

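    Use the Ping Pong idea
      Reduce the size of the texture with each pass
      Distributes operations over more processors

    [Figure: sum reduction of a 4x4 texture with rows (A B C D), (E F G H), (I J K L), (M N O P). The first pass halves the width: (A+B, C+D), (E+F, G+H), (I+J, K+L), (M+N, O+P). The second pass leaves one value per row: A+B+C+D, E+F+G+H, I+J+K+L, M+N+O+P.]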
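
    The halving passes in the figure can be written out for a single row. This is a CPU sketch in Python under stated assumptions: the row length is a power of two, and `reduce_pass` and `reduce_row` are illustrative names rather than anything from the slides.

```python
def reduce_pass(src, op):
    """One pass: output texel j combines input texels 2j and 2j+1."""
    return [op(src[2 * j], src[2 * j + 1]) for j in range(len(src) // 2)]


def reduce_row(row, op):
    """Halve the row repeatedly until a single value remains.

    Assumes len(row) is a power of two, as in the 4-wide example.
    """
    src = list(row)
    while len(src) > 1:
        src = reduce_pass(src, op)   # each pass is half the size of the last
    return src[0]
```

    Passing addition as `op` gives the sum shown in the figure; passing `min` gives a minimum reduction with the same pass structure.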

    Reduction OpenGL Side

    Only draw to half the texture
    Determine proper index in fragment code
      texRECT(2.0*(texCoordX-.5), texCoordY)
      texRECT(2.0*(texCoordX-.5)+1, texCoordY)
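
    The index arithmetic maps each output fragment to a pair of adjacent input columns. The Python sketch below assumes that fragment coordinates address pixel centers, so output column j arrives with texCoordX = j + 0.5; `source_columns` is a hypothetical helper name.

```python
def source_columns(tex_coord_x):
    """Return the two input columns read by the output fragment at tex_coord_x.

    Assumes pixel-center coordinates: output column j has tex_coord_x = j + 0.5.
    """
    first = int(2.0 * (tex_coord_x - 0.5))   # 2.0*(texCoordX-.5)
    return first, first + 1                  # ... and the same expression +1
```

    Under that assumption, output column 0 reads input columns 0 and 1, output column 1 reads columns 2 and 3, and so on.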

    SOFM (Self-Organizing Feature Map)

    Clustering technique
      Used in many settings for generating a codebook
    Have a 1D/2D/3D space of nodes (usually 2D)
      Each map node has dimensionality d
      Input data is of size n x d

    [Figure: the input data matrix (n rows, d columns) next to the grid of SOFM nodes]

    SOFM Cont.

    Go through each input vector
      Find the node with minimum distance to the current input vector
      Move the nearest node closer to the input vector
      Also move the nodes around the winning node closer to the input vector, by a smaller amount

    [Figure: the current input vector and the SOFM nodes; the winning node and its neighbors move closer to the input vector]
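
    For reference, one training step can be written on the CPU as follows. This Python sketch makes several assumptions that are not in the slides: squared Euclidean distance, a Gaussian neighborhood falloff, and the illustrative parameters `rate` and `radius`.

```python
import math


def sofm_step(nodes, x, rate=0.5, radius=1.0):
    """Update a 2D grid of nodes in place for one input vector x; return the winner.

    nodes[r][c] is a list of d weights. `rate` and `radius` are illustrative
    parameters, and the Gaussian falloff is one common neighborhood choice.
    """
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # Find the node with minimum distance to the input vector.
    cells = [(r, c) for r in range(len(nodes)) for c in range(len(nodes[0]))]
    win_r, win_c = min(cells, key=lambda rc: dist2(nodes[rc[0]][rc[1]]))

    # Move the winner by `rate`, and its neighbors by a smaller amount.
    for r, c in cells:
        grid_d2 = (r - win_r) ** 2 + (c - win_c) ** 2
        h = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
        w = nodes[r][c]
        for k in range(len(w)):
            w[k] += h * (x[k] - w[k])
    return win_r, win_c
```

    Calling `sofm_step` once per input vector, for as many sweeps over the data as desired, gives the training loop described above.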

    SOFM Results

    Have a 2D representation of clusters
      Neighboring nodes are similar

    GPU SOFM

    Input Data
      Written numbers 0-9
      5 samples of each
      Gray-scale image shrunk to 16x16 for each number: d = 256
      Texture size for data: 50x256

    [Figure: the input texture, 50 x 256]

    GPU SOFM

    SOFM Nodes
      8x8 space
      Dimensionality of each node: 256
      Texture size for data: 8x2048 (8 x 8 nodes x 256 values = 16,384 texels)
      Need 2 for Ping Pong

    [Figure: two 8x2048 node textures, one used as input and one as output]

    GPU SOFM

    Dist to current input vector
      Same size as SOFM node texture: 8x2048
      Need 2 for reduction

    [Figure: two 8x2048 distance textures, one used as input and one as output]

    GPU SOFM

    Min dist to current vector texture
      Dimensionality: 8x8
      Need 2 for reduction

    [Figure: two 8x8 textures, one used as input and one as output]

    GPU SOFM Algorithm

    Determine distance from each node to the input vector
      Done on a per pixel/index basis

    [Figure: the input vector and the SOFM node data are combined into the distances texture]
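
    A per-texel distance pass might look like the Python sketch below. The flat layout (each node's d weights stored consecutively) and the use of a squared difference are assumptions made for illustration; the slides do not state the texture layout or the distance measure.

```python
def distance_pass(node_tex, x):
    """Per-texel pass: squared difference between each node weight and the
    matching component of the input vector x (len(x) == d).

    Assumes node_tex stores each node's d weights consecutively.
    """
    d = len(x)
    return [(w - x[i % d]) ** 2 for i, w in enumerate(node_tex)]
```

    The output has the same size as the node data, which matches the slides' statement that the distance texture is the same size as the SOFM node texture.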

    Sum the Distances

    [Figure: Distances -> sum the distance indexes -> Reduced Distances -> Summation Distances]
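
    Summing the per-component distances down to one value per node is a reduction that stops early. The Python sketch below assumes a flat layout with d consecutive values per node and d a power of two (d = 256 on the slides); `sum_distances` is an illustrative name.

```python
def sum_distances(dist_tex, d):
    """Reduce n*d per-component distances to n per-node sums.

    Assumes d is a power of two, so adjacent pairs never straddle two nodes.
    """
    src = list(dist_tex)
    width = d                      # values still remaining per node
    while width > 1:
        src = [src[2 * j] + src[2 * j + 1] for j in range(len(src) // 2)]
        width //= 2
    return src
```

    With d = 256 this takes eight halving passes, after which 64 nodes leave 64 values, the size of the 8x8 texture.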

    Find Minimum Distance

    Perform min reduction

    [Figure: Distances -> Min Reduced Distances -> Min Distance]
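
    The min reduction yields only the smallest distance, not the node that produced it. One way a later pass could locate the winner, shown here as an assumption rather than the authors' method, is to compare each node's distance with the reduced minimum.

```python
def min_reduce(values):
    """Pairwise min reduction; assumes len(values) is a power of two."""
    src = list(values)
    while len(src) > 1:
        src = [min(src[2 * j], src[2 * j + 1]) for j in range(len(src) // 2)]
    return src[0]


def winner_mask(dists, min_dist):
    """1 where a node's distance equals the reduced minimum, else 0."""
    return [1 if d == min_dist else 0 for d in dists]
```

    The equality test is exact because the minimum is one of the input values copied through unchanged; ties would mark more than one node.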

    Update Node Data

    [Figure: the SOFM node data, input vector, distances, and min distance feed one pass that writes the updated SOFM node data]

    And So On

    Continue looping through these steps as many times as desired.
    Be sure to toggle which SOFM Node Data texture is being read from and written to.

    Cellular Automata (CA)

    Studied within mathematics, theory of computation, pattern recognition, ...

    General idea (what we need to know in order to make a GPU program!)
      A grid of identical finite state automata whose next state is determined solely by their current state and the state of their neighbors

    [Figure: a 1D grid with a start state, a set of rules, and time increasing from row to row]

    Stephen Wolfram: A New Kind of Science

    We took rules from A New Kind of Science and put them on a GPU implementation of Cellular Automata (1D grid)

    References

    http://mathworld.wolfram.com/CellularAutomaton.html

    http://www.wolframscience.com/nksonline/toc.html

    http://mathworld.wolfram.com/CellularAutomaton.html

    Same start state, different rules!

    [Figure label: Active Head (red)]

    CA on a GPU

    Basic idea (for the 1D binary case)
      Pack the data into an image (one channel, i.e. Red)
      Use an FBO for multi-pass rendering (fast!)

    Initialization
      Set the values in the first row of pixels
      Turn some on (1 = black above) and some off (0 = green above)

    From i = 2 to N [N is the number of rows in your image]
      Only render row i
      Fragment program (pixel j, row i)
        Look at the three values below you
          Left, center, and right
        Look at the rules and determine your state (on or off)
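
    The row-by-row scheme can be checked on the CPU. This Python sketch encodes the rules as a Wolfram rule number (0-255), the numbering used in the cited references; holding the edge cells at 0 is an assumption made for this sketch, and `ca_next_row` and `run_ca` are illustrative names.

```python
def ca_next_row(prev, rule):
    """Compute one row from the row before it using a Wolfram rule number (0-255).

    Edge cells are held at 0 (an assumption for this sketch).
    """
    row = [0] * len(prev)
    for j in range(1, len(prev) - 1):
        # Left, center, and right cells of the previous row form a 3-bit index.
        pattern = (prev[j - 1] << 2) | (prev[j] << 1) | prev[j + 1]
        row[j] = (rule >> pattern) & 1
    return row


def run_ca(first_row, rule, n_rows):
    """Fill an image row by row; row i depends only on row i - 1."""
    image = [list(first_row)]
    for _ in range(1, n_rows):
        image.append(ca_next_row(image[-1], rule))
    return image
```

    For example, `run_ca([0, 0, 0, 1, 0, 0, 0], 30, 3)` produces the first three rows of rule 30 from a single "on" cell.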

    Selecting a Row to Render!

    OpenGL Setup
      glViewport(0, 0, (GLsizei) imageWidth, (GLsizei) imageHeight);
      gluOrtho2D(0.0, imageWidth, 0.0, imageHeight);

    Row counter
      int ca_counter;

    Render Code
      glPolygonMode(GL_FRONT, GL_FILL);
      glBegin(GL_QUADS);
        glTexCoord2i(2, ca_counter-1);            glVertex2f(2.0, ca_counter-1);
        glTexCoord2i(imageWidth-2, ca_counter-1); glVertex2f(imageWidth-2.0, ca_counter-1);
        glTexCoord2i(imageWidth-2, ca_counter);   glVertex2f(imageWidth-2.0, ca_counter);
        glTexCoord2i(2, ca_counter);              glVertex2f(2.0, ca_counter);
      glEnd();

    Fragment Program

    void FragmentProgram (
        out float4 color0 : COLOR0 ,
        float2 coords : TEXCOORD0 ,
        uniform samplerRECT tex )
    {
        //I use this for the check below
        //(samples from the previous row: left, center, right, and two cells out)
        half2 tul, tuc, tur, tul2, tur2;

        //Calc the image index
        half2 newindex = coords.xy;
        static const half offset = 1.0;

        //The .r channel appears to flag the active head; .g carries the cell value
        tul = texRECT( tex , newindex + float2(-offset,-offset) ).rg;
        tuc = texRECT( tex , newindex + float2(0.0,-offset) ).rg;
        tur = texRECT( tex , newindex + float2(offset,-offset) ).rg;

        if( tuc.r == 1.0 ){
            //Head was on the center cell: choose the new cell value
            //from the (left, center, right) pattern in the comments
            if( tul.g == 1.0 ){
                if( tuc.g == 1.0 ){
                    if( tur.g == 1.0 ){ //1 1 1
                        color0 = float4(0.0,0.0,0.0,1.0);
                    }else{ //1 1 0
                        color0 = float4(0.0,0.0,0.0,1.0);
                    }
                }else{
                    if( tur.g == 1.0 ){ //1 0 1
                        color0 = float4(0.0,1.0,0.0,1.0);
                    }else{ //1 0 0
                        color0 = float4(0.0,1.0,0.0,1.0);
                    }
                }
            }else{
                if( tuc.g == 1.0 ){
                    if( tur.g == 1.0 ){ //0 1 1
                        color0 = float4(0.0,0.0,0.0,1.0);
                    }else{ //0 1 0
                        color0 = float4(0.0,0.0,0.0,1.0);
                    }
                }else{
                    if( tur.g == 1.0 ){ //0 0 1
                        color0 = float4(0.0,1.0,0.0,1.0);
                    }else{ //0 0 0
                        color0 = float4(0.0,1.0,0.0,1.0);
                    }
                }
            }
        }
        else{
            if( tul.r == 1.0 ){
                //Head was on the left neighbor: does it move onto this cell?
                tul2 = texRECT( tex , newindex + float2(-2.0*offset,-offset) ).rg;
                if( (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 0.0) ||
                    (tul2.g == 0.0 && tul.g == 0.0 && tuc.g == 1.0) ||
                    (tul2.g == 0.0 && tul.g == 1.0 && tuc.g == 0.0) ||
                    (tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 0.0) ||
                    (tul2.g == 1.0 && tul.g == 1.0 && tuc.g == 1.0) ){
                    color0 = float4(1.0,tuc.g,0.0,1.0);
                }else{
                    color0 = float4( tuc , 0.0 , 1.0 );
                }
            }else if( tur.r == 1.0 ){
                //Head was on the right neighbor: does it move onto this cell?
                tur2 = texRECT( tex , newindex + float2(2.0*offset,-offset) ).rg;
                if( (tuc.g == 0.0 && tur.g == 1.0 && tur2.g == 1.0) ||
                    (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 0.0) ||
                    (tuc.g == 1.0 && tur.g == 0.0 && tur2.g == 1.0) ){
                    color0 = float4(1.0,tuc.g,0.0,1.0);
                }else{
                    color0 = float4( tuc , 0.0 , 1.0 );
                }
            }else{
                //No head nearby: copy the previous row's value
                color0 = float4( tuc , 0.0 , 1.0 );
            }
        }
    }

    Conway's Game of Life

    For a space that is populated
      Each cell with one or no neighbors dies
      Each cell with four or more neighbors dies, as if by overpopulation
      Each cell with two or three neighbors survives

    For a space that is 'empty' or 'unpopulated'
      Each cell with three neighbors becomes populated

    Life on a GPU

    Simple GPU program!
      Use FBOs
      Render each pixel
      Sample the neighborhood
      Like CA, make a decision based on the rules of the game
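
    A CPU version of one Life pass shows the per-pixel work. This Python sketch assumes that cells beyond the grid edge count as dead, which the slides do not specify; `life_rule` and `life_step` are illustrative names. Reading from `src` and writing to a fresh `dst` mirrors the two-texture ping-pong used with FBOs.

```python
def life_rule(alive, neighbors):
    """Next state of one cell from its state and live-neighbor count."""
    if alive:
        return 1 if neighbors in (2, 3) else 0   # survives with 2 or 3
    return 1 if neighbors == 3 else 0            # born with exactly 3


def life_step(src):
    """Read the grid `src`, write a new grid; cells off the edge count as dead."""
    rows, cols = len(src), len(src[0])
    dst = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Sample the 8-cell neighborhood, clipped at the grid border.
            n = sum(
                src[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            )
            dst[r][c] = life_rule(src[r][c], n)
    return dst
```

    Calling `life_step` repeatedly, feeding each result back in as the next input, advances the simulation one generation per pass.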

