Performance Analysis of AMD Multi-core Processor and Graphical Processing Units
Mohammad Ashraf Bhuiyan, Melissa C. Smith, Vivek K. Pallipuram
June 2011
This work supported in part by NSF Grant No. CCF-0916387
Motivation
The recent trend of computing:
- Multi-core and many-core processors
- Many-core GPUs
Various types of accelerators available, differing in:
- Number of cores and threads
- Memory hierarchy
- Programming models
- Code optimization techniques
Parallel program development requires knowledge of:
- Acceleration techniques and optimizations
- Application characteristics
This calls for:
- Performance analysis of accelerators for applications
- Understanding the match between accelerators and applications
Outline
Experimental System
- AMD 8-core and 32-core CPUs
- AMD 1600-core GPU
Spiking Neural Network
- Biological models
- Network design
Preliminary Results
- Effect of problem size
- Effect of optimizations
- Effect of threads/cores
Future Work
Experimental Systems
Utilizing several leading architectures:
- AMD 8-core CPU (Opteron 2356)
- AMD 32-core CPU (Opteron 6134)
- AMD 1600-core GPU (Radeon 5870)
Case Study: Neuron Models & Network
Two Layer Network:
SNN Model        FLOPs per Neuron Update   Memory Access per Neuron (Bytes)   FLOP/Byte Ratio
Izhikevich       13                        20                                 0.65
Wilson           38                        44                                 0.86
Morris-Lecar     132                       28                                 4.71
Hodgkin-Huxley   246                       44                                 6.02
[Figure: two-layer network: an input image feeds Level 1 neurons, which feed Level 2 neurons]
Network (Problem Size) Scaling
Image Size   Level 1 Neurons   Level 2 Neurons   Total Neurons
96×96        9,216             48                9,264
192×192      36,864            48                36,912
240×240      57,600            48                57,648
…            …                 …                 …
2400×2400    5,760,000         48                5,760,048
3120×3120    9,734,400         48                9,734,448
Preliminary Results
Accelerator performance study varying:
- Problem size
- Optimization techniques
- Accelerator configuration
  - Number of threads for the CPU
  - Local work-group size for the GPU
Problem Size Variation
[Figure: speedup vs. problem size for the Izhikevich (left) and Wilson (right) models]
Speedup over a serial implementation on an Intel Core 2 Quad, 2.66 GHz, built with all compiler optimizations
Problem Size Variation Cont.
[Figure: speedup vs. problem size for the Morris-Lecar (left) and Hodgkin-Huxley (right) models]
Optimization Techniques Used
AMD Multi-core:
1. pth: POSIX threads
2. SSE: Streaming SIMD Extensions 3
3. SP: software prefetching

AMD Radeon GPU:
1. MT: multithreading
2. SP: software prefetching
3. LM: local memory
4. MW: memory write
5. MAT: unsafe math and native math
6. RCS: reducing conditional statements
7. VEC: vector calculation
Optimization: AMD 8 core
[Figure: speedup with successive optimizations for the Izhikevich (left) and Wilson (right) models]
pth: POSIX threads, SSE: Streaming SIMD Extensions 3, SP: software prefetching
Optimization: AMD 8 core Cont.
[Figure: speedup with successive optimizations for the Morris-Lecar (left) and Hodgkin-Huxley (right) models]
Optimization: AMD 32 core
[Figure: speedup with successive optimizations for the Izhikevich (left) and Wilson (right) models]
Optimization: AMD 32 core Cont.
[Figure: speedup with successive optimizations for the Morris-Lecar (left) and Hodgkin-Huxley (right) models]
Optimization: AMD 1600 core GPU
[Figure: speedup with successive optimizations for the Izhikevich (left) and Wilson (right) models]
MT: multithreading, SP: software prefetching, LM: local memory, MW: memory write, RCS: reducing conditional statements, MAT: unsafe and native math, VEC: vector calculation
Optimization: AMD 1600 core GPU Cont.
[Figure: speedup with successive optimizations for the Morris-Lecar (left) and Hodgkin-Huxley (right) models]
Thread Effect: AMD 8 core
Thread Effect: AMD 32 core
Thread Effect: AMD 1600 core GPU
Performance Observations
Problem Size Effect
- Generally, performance improves with problem size
- Izhikevich model on the AMD 8-core CPU: 9x speedup for 9,000 neurons; 16x for 9.7 million neurons
- HH model on the AMD 1600-core GPU: 11x speedup for 9,000 neurons; 603x for 9.7 million neurons

FLOP:Byte Ratio Effect
- A higher ratio gives better performance
- Izhikevich (0.65): 12x; HH (6.02): 603x
Performance Observations Cont.
Architecture-Specific Optimizations
- Generally, performance improves with optimizations
- Also depends on problem size and FLOP:byte ratio

Threading Effect
- Generally, performance improves with more threads
- Also depends on problem size and the overhead of intra-processor communication
Future Work
Extend the experiments:
- Heterogeneous architectures (multi-core + GPU)
- Multi-node accelerators (supercomputers)
- Accelerators from other vendors
- Other application kernels, such as bioinformatics, molecular dynamics, and optimization problems (e.g., simulated annealing)
Related Publications
Journal
- Mohammad Bhuiyan, Melissa C. Smith, Vivek K. Pallipuram, “Performance, Optimization and Fitness: Connecting Applications to Architectures,” Journal of Concurrency and Computation: Practice and Experience, Wiley, December 2010, DOI: 10.1002/cpe.1688
- Vivek K. Pallipuram, Mohammad Bhuiyan, and Melissa C. Smith, “A Comparative Study of GPU Programming Models and Architectures,” Journal of Supercomputing, Springer, May 2011, DOI: 10.1007/s11227-011-0631-3

Conference
- Mohammad Bhuiyan, Ananth Nallamuthu, Melissa C. Smith, and Vivek K. Pallipuram, “Optimization and Performance Study of Large-scale Biological Networks for Reconfigurable Computing,” in Proceedings of HPRCTA, SC’10, New Orleans, April 2010
- Mohammad Bhuiyan, Vivek K. Pallipuram, and Melissa C. Smith, “Acceleration of Spiking Neural Networks in Emerging Multi-core and GPU Architectures,” in IEEE Proceedings of HiCOMB, IPDPS, Atlanta, GA, April 2010
- Kenneth Rice, Mohammad Bhuiyan, Tarek M. Taha, Christopher N. Vutsinas, Melissa C. Smith, “FPGA Implementation of Izhikevich Spiking Neural Networks for Character Recognition,” in Proceedings of ReConFig’09, pp. 451–456, December 2009
Thank you
Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

The contents of this presentation were provided by individual(s) and/or company listed on the title page. The information and opinions presented in this presentation may not represent AMD’s positions, strategies or opinions. Unless explicitly stated, AMD is not responsible for the content herein and no endorsements are implied.