
Parallel Memetic Algorithm Implementation on CUDA

Date post: 07-Feb-2021
Transcript

    • Genome size is a parameter that determines the length of the genetic information carried by each individual.

    • Increasing the genome size lengthens execution on both the CPU and the GPU, because the amount of data processed is directly proportional to the genome size.

    ABSTRACT

    This paper describes a parallel memetic algorithm implementation for the CUDA platform. The conventional genetic operators are adapted to the GPU architecture. In this population-based optimization technique, there are one or more islands, and each island consists of a constant number of individuals. Each CUDA thread is responsible for the evolution of one individual, and islands are mapped to CUDA blocks to benefit from shared memory. The results show up to 38x speedup compared to the CPU implementation.

    GENERAL DESIGN

    Populations reside in thread blocks, so shared memory is used effectively for the genetic operators, including crossover, mutation and local search. Within each thread block, each thread is responsible for one individual.

    At the beginning of the kernel, each thread reads its genome data from global memory and stores it in shared memory. From that point on, all calculations performed by the threads operate on shared memory, except for the migration operator. Because migration requires communication between distinct islands (thread blocks), global memory has to be used for this operation.
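The mapping described in this section (block = island, thread = individual, one initial copy from global to shared memory) can be mimicked on the CPU. The sketch below is only an illustration in Python, not the authors' kernel: the row-major buffer layout and the names `genome_offset` and `load_island` are assumptions introduced here.

```python
from __future__ import annotations

# Assumed layout: genomes[island][individual][gene] flattened row-major
# into one "global memory" buffer. The poster does not state its layout.


def genome_offset(island: int, individual: int, gene: int,
                  population_size: int, genome_size: int) -> int:
    """Index of one gene in the flat global buffer (hypothetical layout)."""
    return (island * population_size + individual) * genome_size + gene


def load_island(global_buf: list[float], island: int,
                population_size: int, genome_size: int) -> list[list[float]]:
    """Copy one island's genomes into a local working set.

    The local list plays the role that shared memory plays inside a
    thread block: each "thread" (individual) copies its own genome once,
    and later work touches only the local copy.
    """
    local = []
    for individual in range(population_size):  # one iteration per "thread"
        start = genome_offset(island, individual, 0,
                              population_size, genome_size)
        local.append(global_buf[start:start + genome_size])
    return local
```

With this layout, consecutive individuals of an island occupy consecutive genome-sized slices, so an island is one contiguous region of the buffer.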

    EXPERIMENTS

    TESTING CONFIGURATION

    The performance of our implementation was evaluated on a PC with an Intel Core i7 870 processor @ 2.93 GHz and an NVIDIA GeForce GTX 460 GPU (336 cores). All experiments were repeated 20 times using the same configuration.

    The parallel memetic algorithm configuration:

    • population size: 16 to 512

    • number of islands: 16 to 1024

    • genome size: 4

    • tournament selection with winning probability of 0.8

    • mutation probability: 0.05

    • local search iteration number: 10

    • local search step size: 0.001

    • migration interval: 10
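The configuration values above can be collected in one place and used by the selection and local search steps. The following Python sketch is illustrative only. The poster gives the parameter values but not the operator details, so the binary tournament, the single-gene hill climb, and the names `MemeticConfig`, `tournament_select` and `local_search` are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MemeticConfig:
    # Values from the configuration list; each range is represented
    # here by its lower bound.
    population_size: int = 16        # individuals per island (16 to 512)
    num_islands: int = 16            # 16 to 1024
    genome_size: int = 4
    tournament_win_prob: float = 0.8
    mutation_prob: float = 0.05
    local_search_iters: int = 10
    local_search_step: float = 0.001
    migration_interval: int = 10


def tournament_select(fitness, cfg, rng):
    """Binary tournament (assumed form): of two random individuals, the
    fitter one wins with probability cfg.tournament_win_prob.
    Lower fitness is better (minimization)."""
    a, b = rng.sample(range(len(fitness)), 2)
    better, worse = (a, b) if fitness[a] <= fitness[b] else (b, a)
    return better if rng.random() < cfg.tournament_win_prob else worse


def local_search(genome, objective, cfg, rng):
    """Hill climb (assumed form): move one random gene by +/- step and
    keep the move only if the objective value decreases."""
    best = list(genome)
    best_val = objective(best)
    for _ in range(cfg.local_search_iters):
        cand = list(best)
        gene = rng.randrange(len(cand))
        cand[gene] += rng.choice((-1.0, 1.0)) * cfg.local_search_step
        val = objective(cand)
        if val < best_val:
            best, best_val = cand, val
    return best, best_val
```

With a winning probability of 0.8, the weaker individual of a pair is still selected one time in five, which keeps some diversity in the parent pool.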

    CONCLUSION

    • Although the magnitude depends on the fitness function, the GPU implementation outperforms the CPU implementation for every fitness function tested.

    • The best speedup, 38x, is achieved with the Griewangk function.

    • Future work will focus on more complex optimization problems that provide solutions to real-world problems.

    • In addition, different genetic operators (local search, mutation and crossover) are planned for implementation on CUDA to achieve better optimization results.

    Parallel Memetic Algorithm Implementation on CUDA

    Umut Cinar, Alptekin Temizel

    Graduate School of Informatics, Middle East Technical University, Ankara, Turkey

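    Function | Formula | Global Minimum | Properties

    Rosenbrock | $f_{Ros}(x) = \sum_{i=1}^{p-1} \left[ 100\,(x_{i+1} - x_i^2)^2 + (x_i - 1)^2 \right]$ | $x^* = (1, 1, \ldots, 1)$; $f_{Ros}(x^*) = 0$ | Not separable

    Rastrigin | $f_{Ras}(x) = 10p + \sum_{i=1}^{p} \left[ x_i^2 - 10\cos(2\pi x_i) \right]$ | $x^* = (0, 0, \ldots, 0)$; $f_{Ras}(x^*) = 0$ | Separable and multimodal

    Griewangk | $f_{Gri}(x) = 1 + \sum_{i=1}^{p} \frac{x_i^2}{4000} - \prod_{i=1}^{p} \cos\!\left(\frac{x_i}{\sqrt{i}}\right)$ | $x^* = (0, 0, \ldots, 0)$; $f_{Gri}(x^*) = 0$ | Not separable and multimodal

    Schwefel | $f_{Sch}(x) = \sum_{i=1}^{p} \left( \sum_{j=1}^{i} x_j \right)^2$ | $x^* = (0, 0, \ldots, 0)$; $f_{Sch}(x^*) = 0$ | Not separable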
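For reference, the four benchmark functions in the table can be written out directly. This is a plain Python sketch, not the authors' CPU or GPU code. The Griewangk term uses the standard x_i / sqrt(i) argument, which is an assumption where the extracted formula is damaged, and the Schwefel function is taken to be the double-sum form shown in the table.

```python
import math


def rosenbrock(x):
    # Sum over consecutive gene pairs; minimum 0 at (1, ..., 1).
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (x[i] - 1.0) ** 2
               for i in range(len(x) - 1))


def rastrigin(x):
    # Separable: each gene contributes independently; minimum 0 at the origin.
    return 10.0 * len(x) + sum(xi ** 2 - 10.0 * math.cos(2.0 * math.pi * xi)
                               for xi in x)


def griewangk(x):
    # The product term couples the genes, which makes it non-separable.
    total = sum(xi ** 2 for xi in x) / 4000.0
    prod = math.prod(math.cos(xi / math.sqrt(i))
                     for i, xi in enumerate(x, start=1))
    return 1.0 + total - prod


def schwefel(x):
    # Double-sum form: the square of every prefix sum of the genome.
    total, prefix = 0.0, 0.0
    for xi in x:
        prefix += xi
        total += prefix ** 2
    return total
```

Each function evaluates to 0 at the global minimum listed in the table, which gives a quick sanity check for any reimplementation.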

    KERNEL PSEUDO CODE

    initialize shared memory buffers
    WHILE termination criterion is not met DO
        evaluate individual
        local search
        tournament selection
        IF thread_id < warp_size
            offspring = crossover(parents)
            offspring = mutation(offspring)
            replace worst_individuals with offspring
        ELSE IF thread_id < warp_size*2 AND (MIGRATION_INTERVAL is true)
            migrate(best_individuals)
        END IF
    END WHILE
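The migration branch of the pseudocode can be illustrated with a sequential Python sketch. The poster does not say which islands exchange individuals or how many migrate, so the ring topology, the single best-replaces-worst exchange, and the names `migrate_ring` and `run` are assumptions made for this example.

```python
def migrate_ring(islands, fitness):
    """Copy each island's best individual over the worst individual of the
    next island in a ring (assumed topology). Lower fitness is better.

    islands: list of islands, each a list of genomes (lists of floats)
    fitness: matching list of lists of fitness values
    """
    # Snapshot the emigrants first so that every island sends its
    # pre-migration best, much as a kernel would stage them in global memory.
    emigrants = []
    for pop, fit in zip(islands, fitness):
        best = min(range(len(pop)), key=fit.__getitem__)
        emigrants.append((list(pop[best]), fit[best]))

    for src, (genome, value) in enumerate(emigrants):
        dst = (src + 1) % len(islands)
        worst = max(range(len(islands[dst])), key=fitness[dst].__getitem__)
        islands[dst][worst] = list(genome)
        fitness[dst][worst] = value


def run(islands, fitness, generations, migration_interval=10, step=None):
    """Loop skeleton. `step` is a hypothetical callback standing in for
    evaluation, local search, selection and variation."""
    migrations = 0
    for gen in range(1, generations + 1):
        if step is not None:
            step(islands, fitness)
        if gen % migration_interval == 0:  # "MIGRATION_INTERVAL is true"
            migrate_ring(islands, fitness)
            migrations += 1
    return migrations
```

Migration is the only step here that reads another island's data, which matches the poster's point that it is the one operator that has to go through global memory.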

    • Threads within a block continue in parallel until the variation and migration operators take place.

    • At this point, two warps of threads execute the crossover/mutation and migration operations. The first warp is assigned to crossover/mutation, while the second warp is assigned to migration (when a migration interval has been reached).

    • Hence, this part of the implementation is split by warp, with one whole warp for each operation. Because the branch boundaries coincide with warp boundaries, all threads of a warp follow the same path, which significantly decreases the cost of warp divergence.
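The effect of aligning the branch with warp boundaries can be checked with a small model. The Python sketch below is not GPU code. It assumes the 32-thread warp width of NVIDIA GPUs, and the names `role` and `warps_diverge` are hypothetical helpers that mirror the branch conditions of the pseudocode.

```python
WARP_SIZE = 32  # warp width on NVIDIA GPUs


def role(thread_id: int, is_migration_step: bool) -> str:
    """Which branch of the kernel pseudocode a thread takes."""
    if thread_id < WARP_SIZE:
        return "variation"   # crossover + mutation
    if thread_id < 2 * WARP_SIZE and is_migration_step:
        return "migration"
    return "idle"


def warps_diverge(block_size: int, is_migration_step: bool) -> bool:
    """True if any warp contains threads that take different branches."""
    for first in range(0, block_size, WARP_SIZE):
        last = min(first + WARP_SIZE, block_size)
        roles = {role(t, is_migration_step) for t in range(first, last)}
        if len(roles) > 1:
            return True
    return False
```

For any block size, `warps_diverge` returns `False` under this role assignment, because every thread of a given warp lands in the same branch.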

    • The experiments show that the results depend heavily on the type of fitness function. In a memetic algorithm, the fitness function is the most computationally expensive routine, which is why performance depends on the fitness function type.

    • The maximum speedup, 38x, is achieved with the Griewangk function.

    [Figure: Rosenbrock function on GPU (max speedup is x13). Speedup versus population size and number of islands.]

    [Figure: Rastrigin function on GPU (max speedup is x25). Speedup versus population size and number of islands.]

    [Figure: Schwefel function on GPU (max speedup is x9). Speedup versus population size and number of islands.]

    [Figure: Griewangk function on GPU (max speedup is x38). Speedup versus population size and number of islands.]

    Speedup has been investigated using several benchmarking functions including Rosenbrock, Rastrigin, Schwefel and Griewangk and various parameters.

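    INFLUENCE OF GENOME SIZE ON EXECUTION TIME

    [Figure: Genome Size vs Execution Time. Execution time (ms, logarithmic axis) versus genome size for CPU and GPU.]

    Data labels recovered from the figure (the assignment of series to CPU and GPU is inferred from the reported GPU speedup):

    Genome size | 2 | 4 | 6 | 8 | 10
    GPU (ms) | 53 | 97 | 141 | 185 | 229
    CPU (ms) | 1989 | 3408 | 5027 | 6626 | 8536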
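The data labels in the genome-size figure allow a quick arithmetic check of the speedup and of how execution time grows with genome size. The Python snippet below assumes that the smaller series (53 to 229 ms) is the GPU and the larger series (1989 to 8536 ms) is the CPU. The figure legend alone does not settle this, so it is inferred from the poster's claim that the GPU is faster.

```python
genome_sizes = [2, 4, 6, 8, 10]
gpu_ms = [53, 97, 141, 185, 229]          # assumed to be the GPU series
cpu_ms = [1989, 3408, 5027, 6626, 8536]   # assumed to be the CPU series

# CPU time divided by GPU time at each genome size.
speedups = [c / g for c, g in zip(cpu_ms, gpu_ms)]

# Increase in execution time for each step of two genes.
gpu_steps = [b - a for a, b in zip(gpu_ms, gpu_ms[1:])]
cpu_steps = [b - a for a, b in zip(cpu_ms, cpu_ms[1:])]

for size, s in zip(genome_sizes, speedups):
    print(f"genome size {size:2d}: speedup {s:.1f}x")
```

Under that assumption, the GPU time rises by exactly 44 ms for every two additional genes, so it grows linearly with genome size, and the CPU-to-GPU ratio stays between roughly 35x and 38x across the measured range.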

